History log of /PHP-8.2/ext/mbstring/tests/sjis_mobile_encodings.phpt (Results 1 – 9 of 9)
Revision Date Author Comments
# bfccdbd8 08-Aug-2022 Alex Dowad

SJIS-Mobile#SOFTBANK string can end immediately after special escape sequence

SJIS-Mobile#SOFTBANK text encoding supports special escape sequences,
which shift the decoder into a mode wh

SJIS-Mobile#SOFTBANK string can end immediately after special escape sequence

SJIS-Mobile#SOFTBANK text encoding supports special escape sequences,
which shift the decoder into a mode where each single byte represents
an emoji. To get out of this mode, a 0xF (SHIFT OUT) byte can be
used.

After one of these special escape sequences, the new conversion
code expected to see at least one more byte. However, there doesn't
seem to be any particular reason why it should be treated as an
error condition if a string ends abruptly after one of these
escapes. Well, the escape sequence is useless in that case, but
it is a complete and valid escape sequence.

The legacy conversion code did allow a string to end immediately
after one of these escape sequences. Amend the new code to allow
the same.

show more ...


# 7559bf77 24-Jun-2022 Alex Dowad

Fix new conversion filters for mobile SJIS variants ('0' at end of buffer)

Previously, I had adjusted this code so that if a character which could
be part of a special Docomo/Softbank/KD

Fix new conversion filters for mobile SJIS variants ('0' at end of buffer)

Previously, I had adjusted this code so that if a character which could
be part of a special Docomo/Softbank/KDDI 'keypad' emoji appeared at
the end of one buffer, and the 'keypad' character appeared at the
beginning of the next, they would still be combined. However, this
broke the handling of such a character appearing at the end of one
buffer, and a character which is NOT 'keypad' appearing at the
beginning of the next.

This was found while fuzzing the new implementation of
mb_decode_numericentity.

show more ...


# 67d83f57 23-Jan-2022 Alex Dowad

Implement fast text conversion interface for mobile SJIS variants


# bf940a13 03-Sep-2021 Alex Dowad

Add another test for SJIS-Mobile text conversion


# 776296e1 30-Aug-2021 Alex Dowad

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like "BAD+XXXX", where "XXXX"
would be the erroneous bytes expressed in hexadecimal. This mode could
be enabled by calling `mb_substitute_character("long")`.

However, accurately reproducing input byte sequences from the cached
state of a conversion filter is often tricky, and this significantly
complicates the implementation. Further, the means used for passing
the erroneous bytes through to where the "BAD+XXXX" text is generated
only allows for up to 3 bytes to be passed, meaning that some erroneous
byte sequences are truncated anyways.

More to the point, a search of publically available PHP code indicates
that nobody is really using this feature anyways.

Incidentally, this feature also provided error output like "JIS+XXXX"
if the input 'should have' represented a JISX 0208 codepoint, but it
decodes to a codepoint which does not exist in the JISX 0208 charset.
Similarly, specific error output was provided for non-existent
JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few
other charsets. All of that is now consigned to the flames.

However, "long" error markers also include a somewhat more useful
"U+XXXX" marker for Unicode codepoints which were successfully
decoded from the input text, but cannot be represented in the output
encoding. Those are still supported.

With this change, there is no need to use a variety of special values
in the high bits of a wchar to represent different types of error
values. We can (and will) just use a single error value. This will be
equal to -1.

One complicating factor: Text conversion functions return an integer to
indicate whether the conversion operation should be immediately
aborted, and the magic 'abort' marker is -1. Also, almost all of these
functions would return the received byte/codepoint to indicate success.
That doesn't work with the new error value; if an input filter detects
an error and passes -1 to the output filter, and the output filter
returns it back, that would be taken to mean 'abort'.

Therefore, amend all these functions to return 0 for success.

show more ...


# e6f1a722 08-Aug-2021 Alex Dowad

Add test suite for mobile variants of UTF-8 (and fix bugs)


# 51b9d7a5 27-Jul-2021 Alex Dowad

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
D

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
Depending on the situation, these error markers will either look like
BAD+XXXX (for general bad input), U+XXXX (when the input is OK, but it
converts to Unicode codepoints which cannot be represented in the
output encoding), or an encoding-specific marker like JISX+XXXX or
W932+XXXX.

We have almost no tests for this feature. Add a bunch of tests to
ensure that all our legacy encoding handlers work in a reasonable
way when 'long' error markers are enabled.

show more ...


# 39131219 11-Jun-2021 Nikita Popov

Migrate more SKIPIF -> EXTENSIONS (#7139)

This is a mix of more automated and manual migration. It should remove all applicable extension_loaded() checks outside of skipif.inc files.


# beef5971 20-Oct-2020 Alex Dowad

Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank variants of Shift-JIS)

Lots of problems here.

- Don't pass 'control' characters through silently in the middle of a

Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank variants of Shift-JIS)

Lots of problems here.

- Don't pass 'control' characters through silently in the middle of a
multi-byte character.
- Treat it as an error if a multi-byte character is truncated.
- For ESC sequences used to encode emoji on earlier Softbank phones, if an
invalid ESC sequence is found, don't pass it through. Rather, handle it as
an error and respect `mb_substitute_character`.
- In ranges used by mobile vendors for emoji, if a certain byte sequence
doesn't map to any emoji, don't emit a mangled value (actually a raw
(ku*94)+ten value, which may not even be a valid Unicode codepoint at all).
- When converting Unicode to SJIS-Mobile, don't mangle codepoints which fall
in the 2nd range of MicroSoft vendor extensions.

Some vendor-specific emoji have been mapped to standard Unicode codepoints
now, rather than 'private use area' codepoints. When the legacy code was
written, these codepoints may not have existed yet in the Unicode standard
which was current at that time.

Also do a major code cleanup -- remove dead code, rearrange what is left,
use some new macros and helper functions to make the code clearer...

show more ...