History log of /php-src/ext/mbstring/tests/gh10192_utf7.phpt (Results 1 – 3 of 3)
Revision Date Author Comments
# 7914b8ce 11-May-2023 Alex Dowad

Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode

In 6fc8d014df, pakutoma added specialized validity checking functions
for some legacy text encoding

Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode

In 6fc8d014df, pakutoma added specialized validity checking functions
for some legacy text encodings like ISO-2022-JP and UTF-7. These
check functions perform a more strict validity check than the encoding
conversion functions for the same text encodings. For example, the
check function for ISO-2022-JP verifies that the string ends in the
correct state required by the specification for ISO-2022-JP.

These check functions are already being used to make detection of text
encoding more accurate when 'strict' detection mode is enabled.

However, since the default is 'non-strict' detection (a bad API design
but we're stuck with it now), most users will not benefit from
pakutoma's work. I was previously reluctant to enable this new logic
for non-strict detection mode. My intention was to reduce the scope of
behavior changes, since almost *any* behavior change may affect *some*
user in a way we don't expect.

However, we definitely have users whose (production) code was broken
by the changes I made in 28b346bc06, and enabling pakutoma's check
functions for non-strict detection mode would un-break it. (See
GH-10192 as an example.) The added checks do also make sense.

In non-strict detection mode, we will not immediately reject candidate
encodings whose validity check function returns false; but they will
be much less likely to be selected. However, failure of the validity
check function is weighted less heavily than an encoding error detected
by the encoding conversion function.

show more ...


# b721d0f7 10-Mar-2023 pakutoma

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are ca

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are cases where we want to use different logic for validation
and conversion. For example, if a string ends up with missing input
required by the encoding, or if a character is input that is invalid
as an encoding but can be converted, the conversion should succeed and
the validation should fail.

To achieve this, a function pointer mb_check_fn has been added to
struct mbfl_encoding to implement the logic used for validation.
Also, added implementation of validation logic for UTF-7, UTF7-IMAP,
ISO-2022-JP and JIS.

(The same change has already been made to PHP 8.2 and 8.3; see
6fc8d014df. This commit is backporting the change to PHP 8.1.)

show more ...


# 6fc8d014 21-Mar-2023 pakutoma

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are ca

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are cases where we want to use different logic for validation
and conversion. For example, if a string ends up with missing input
required by the encoding, or if a character is input that is invalid
as an encoding but can be converted, the conversion should succeed and
the validation should fail.

To achieve this, a function pointer mb_check_fn has been added to
struct mbfl_encoding to implement the logic used for validation.
Also, added implementation of validation logic for UTF-7, UTF7-IMAP,
ISO-2022-JP and JIS.

show more ...