History log of /PHP-8.1/ext/mbstring/tests/iso2022jp_kddi_encoding.phpt (Results 1 – 6 of 6)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
Revision tags: php-8.1.7RC1, php-8.1.4RC1, php-8.1.3, php-8.1.2RC1, php-8.1.0, php-7.3.33, php-7.3.32, php-7.3.31
# 776296e1 30-Aug-2021 Alex Dowad

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like "BAD+XXXX", where "XXXX"
would be the erroneous bytes expressed in hexadecimal. This mode could
be enabled by calling `mb_substitute_character("long")`.

However, accurately reproducing input byte sequences from the cached
state of a conversion filter is often tricky, and this significantly
complicates the implementation. Further, the means used for passing
the erroneous bytes through to where the "BAD+XXXX" text is generated
only allows for up to 3 bytes to be passed, meaning that some erroneous
byte sequences are truncated anyways.

More to the point, a search of publically available PHP code indicates
that nobody is really using this feature anyways.

Incidentally, this feature also provided error output like "JIS+XXXX"
if the input 'should have' represented a JISX 0208 codepoint, but it
decodes to a codepoint which does not exist in the JISX 0208 charset.
Similarly, specific error output was provided for non-existent
JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few
other charsets. All of that is now consigned to the flames.

However, "long" error markers also include a somewhat more useful
"U+XXXX" marker for Unicode codepoints which were successfully
decoded from the input text, but cannot be represented in the output
encoding. Those are still supported.

With this change, there is no need to use a variety of special values
in the high bits of a wchar to represent different types of error
values. We can (and will) just use a single error value. This will be
equal to -1.

One complicating factor: Text conversion functions return an integer to
indicate whether the conversion operation should be immediately
aborted, and the magic 'abort' marker is -1. Also, almost all of these
functions would return the received byte/codepoint to indicate success.
That doesn't work with the new error value; if an input filter detects
an error and passes -1 to the output filter, and the output filter
returns it back, that would be taken to mean 'abort'.

Therefore, amend all these functions to return 0 for success.

show more ...

Revision tags: php-7.3.30
# ae4c9560 30-Jul-2021 Alex Dowad

Add more tests for ISO-2022-JP-KDDI text conversion

# 57a81af0 18-Aug-2021 Alex Dowad

ISO-2022-JP-KDDI text conversion doesn't swallow PUA codepoints

There was a bit of legacy code here which looks like the original author
of mbstring intended to allow conversion of Unico

ISO-2022-JP-KDDI text conversion doesn't swallow PUA codepoints

There was a bit of legacy code here which looks like the original author
of mbstring intended to allow conversion of Unicode Private Use Area
codepoints to ISO-2022-JP-KDDI. However, that code never worked.
It set the output variable to values which were not matched by any
of the 'if' clauses below, which meant that nothing was actually
emitted to the output. In other words, if one tried to convert Unicode
to ISO-2022-JP-KDDI, and the Unicode string contained PUA codepoints,
they would be quietly 'swallowed' and disappear.

I don't know what ISO-2022-JP-KDDI byte sequences the author wanted
to map those PUA codepoints to, and anyways, this use case is so obscure
that there is little point in worrying about it. However, it is better
to remove the non-functioning code than to leave it in.

This means that if now one tries to convert PUA codepoints to
ISO-2022-JP-KDDI, those codepoints will be treated as erroneous rather
than silently ignored.

show more ...

# 51b9d7a5 27-Jul-2021 Alex Dowad

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
D

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
Depending on the situation, these error markers will either look like
BAD+XXXX (for general bad input), U+XXXX (when the input is OK, but it
converts to Unicode codepoints which cannot be represented in the
output encoding), or an encoding-specific marker like JISX+XXXX or
W932+XXXX.

We have almost no tests for this feature. Add a bunch of tests to
ensure that all our legacy encoding handlers work in a reasonable
way when 'long' error markers are enabled.

show more ...

Revision tags: php-7.3.29
# 39131219 11-Jun-2021 Nikita Popov

Migrate more SKIPIF -> EXTENSIONS (#7139)

This is a mix of more automated and manual migration. It should remove all applicable extension_loaded() checks outside of skipif.inc files.

Revision tags: php-7.3.28, php-7.3.27, php-7.3.26, php-7.3.26RC1, php-7.3.25, php-7.3.25RC1, php-7.3.24
# 570e89a9 23-Oct-2020 Alex Dowad

Fix mbstring support for ISO-2022-JP-KDDI encoding

- Treat it as an error if a multi-byte character or escape sequence is truncated
- When converting other encodings to ISO-2022-JP-KDDI,

Fix mbstring support for ISO-2022-JP-KDDI encoding

- Treat it as an error if a multi-byte character or escape sequence is truncated
- When converting other encodings to ISO-2022-JP-KDDI, don't swallow trailing
hash characters or digits
- Don't allow 'control' characters to appear in the middle of a multi-byte char

Note: I was not able to find any kind of official or even semi-official
specification for this legacy encoding. Therefore, the test suite for
ISO-2022-JP-KDDI is based largely on the behavior of the existing code.

Verifying the correctness of program code in this way is very questionable.
In a sense, all you are proving is that the code "does what it does". However,
the test suite will still expose any unintended _changes_ to behavior.

show more ...