#
6fc8d014 |
| 21-Mar-2023 |
pakutoma |
Fix phpGH-10648: add check function pointer into mbfl_encoding Previously, mbstring used the same logic for encoding validation as for encoding conversion. However, there are ca
Fix phpGH-10648: add check function pointer into mbfl_encoding Previously, mbstring used the same logic for encoding validation as for encoding conversion. However, there are cases where we want to use different logic for validation and conversion. For example, if a string ends up with missing input required by the encoding, or if a character is input that is invalid as an encoding but can be converted, the conversion should succeed and the validation should fail. To achieve this, a function pointer mb_check_fn has been added to struct mbfl_encoding to implement the logic used for validation. Also, added implementation of validation logic for UTF-7, UTF7-IMAP, ISO-2022-JP and JIS.
show more ...
|
#
a6186823 |
| 19-Nov-2022 |
Alex Dowad |
For UTF-7, flag unnecessary extra trailing byte in Base64 section as error This bug was found when I was fuzzing a patch related to mb_strpos. In some cases, the legacy text conversion c
For UTF-7, flag unnecessary extra trailing byte in Base64 section as error This bug was found when I was fuzzing a patch related to mb_strpos. In some cases, the legacy text conversion code for UTF-7 (and UTF7-IMAP) would correctly recognize an error for a Base64-encoded section which was not correctly padded with zero bits, but the new (and faster) text conversion code would not. Specifically, if the input string ended abruptly after the 4th or 7th byte of a Base64-encoded section, the new conversion code would confirm that the trailing padding bits from the previous byte (3rd or 6th) were zeroes, but would not check whether the 4th or 7th byte itself encoded any non-zero bits. The legacy conversion code did perform this check and would treat the input string as invalid. Actually, even if the 4th or 7th byte does encode only (padding) zero bits, this is still a problem, because there is no reason to have a 4th (or 7th) byte in that case. The UTF-7 string should have ended on the previous byte instead. Apply the same fix for both UTF-7 and UTF7-IMAP.
show more ...
|
#
d3933e0b |
| 14-Nov-2022 |
Alex Dowad |
Fix regression test for GH-9535 on PHP-8.2+ Some of the legacy text encodings which were used in this regression test are deprecated in PHP-8.2+. The deprecation warnings break the e
Fix regression test for GH-9535 on PHP-8.2+ Some of the legacy text encodings which were used in this regression test are deprecated in PHP-8.2+. The deprecation warnings break the expected output. Since using these encodings in mbstring is now deprecated, I think there is little point in keeping them in this test. So they are now removed from it. Further, in 219fff376b, I made a change to avoid a situation where the legacy UTF7-IMAP conversion code gets stuck in a wrong state when its attempt to emit a character fails. When a Base64-encoded section of input ended with -, the previous code would FIRST emit a character if necessary (using the CK or "check" macro, which causes the function to return immediately if the downstream filter function returns an error code), and THEN update its own state to indicate that it is now in ASCII rather than Base64 mode. If the downstream filter function returned an error code, the CK macro would then cause the UTF7-IMAP filter function to return immediately WITHOUT setting its own state to indicate that the Base64-encoded section was done. I fixed this by updating the filter state as needed BEFORE calling CK... but I missed updating the filter state in the case where the Base64 section ends normally and there is no need to emit anything. Again, in 6d525a425e, I modified the legacy conversion code for ISO-2022-KR to try to comply more closely with the RFC for this text encoding. The RFC states that before any occurrence of 'Shift In' or 'Shift Out' codes in a ISO-2022-KR string, a special escape sequence must appear at least ONCE, at the beginning of a line. The previous code did not comply with this requirement. I made it comply by always emitting this escape sequence at the beginning of the first line. Since mb_strcut (wrongly) determines when it has consumed enough of the input string by looking at the length of its output in bytes, this extra escape sequence makes mb_strcut consume 4 bytes less of an ISO-2022-KR string than would otherwise be the case. When this strange behavior of mb_strcut is fixed, this test will have to be adjusted to restore the previous expected outputs for ISO-2022-KR.
show more ...
|
#
f3c8efd7 |
| 04-Aug-2022 |
Alex Dowad |
In legacy text conversion filters, reset filter state in 'flush' function Up until now, I believed that mbstring had been designed such that (legacy) text conversion filter objects shoul
In legacy text conversion filters, reset filter state in 'flush' function Up until now, I believed that mbstring had been designed such that (legacy) text conversion filter objects should not be re-used after the 'flush' function is called to complete a text conversion operation. However, it turns out that the implementation of _php_mb_encoding_handler_ex DID re-use filter objects after flush. That means that functions which were based on _php_mb_encoding_handler_ex, including mb_parse_str and php_mb_post_handler, would break in some cases; state left over from converting one substring (perhaps a variable name) would affect the results of converting another substring (perhaps the value of the same variable), and could cause extraneous characters to get inserted into the output. All this code should be deleted soon, but fixing it helps me to avoid spurious failures when fuzzing the new/old code to look for differences in behavior.
show more ...
|
#
219fff37 |
| 25-Jul-2022 |
Alex Dowad |
Fix legacy text conversion filter for UTF7-IMAP Make necessary updates to filter state before using CK macro. |
#
8533fccd |
| 06-Jun-2022 |
Alex Dowad |
Assert minimum size of wchar buffer in text conversion filters In all text conversion filters which require the wchar buffer used for output to have some minimum size, it's better to inc
Assert minimum size of wchar buffer in text conversion filters In all text conversion filters which require the wchar buffer used for output to have some minimum size, it's better to include an assertion; this will help us to catch bugs, and will also help future readers to understand what we expect of the function arguments. For UTF-7 and UTF7-IMAP, these assertions were already there, but I have added comments explaining why the minimum size is what it is.
show more ...
|
#
0d635d93 |
| 21-Jan-2022 |
Alex Dowad |
Implement fast text conversion interface for UTF7-IMAP The old code would convert a 0x00 byte in the input to 0x00 in the output, but this clearly violates the RFC which defines UTF7-IMA
Implement fast text conversion interface for UTF7-IMAP The old code would convert a 0x00 byte in the input to 0x00 in the output, but this clearly violates the RFC which defines UTF7-IMAP.
show more ...
|
#
3c732251 |
| 21-Jul-2021 |
Alex Dowad |
New internal interface for fast text conversion in mbstring When converting text to/from wchars, mbstring makes one function call for each and every byte or wchar to be converted. Typica
New internal interface for fast text conversion in mbstring When converting text to/from wchars, mbstring makes one function call for each and every byte or wchar to be converted. Typically, each of these conversion functions contains a state machine, and its state has to be restored and then saved for every single one of these calls. It doesn't take much to see that this is grossly inefficient. Instead of converting one byte or wchar on each call, the new conversion functions will either fill up or drain a whole buffer of wchars on each call. In benchmarks, this is about 3-10× faster. Adding the new, faster conversion functions for all supported legacy text encodings still needs some work. Also, all the code which uses the old-style conversion functions needs to be converted to use the new ones. After that, the old code can be dropped. (The mailparse extension will also have to be fixed up so it will still compile.)
show more ...
|
#
626f0fec |
| 30-Jul-2021 |
Alex Dowad |
Remove some dead code from mbstring mbstring has a great deal of dead code. Some common types are: - Default switch clauses which will never be taken - If clauses intended to co
Remove some dead code from mbstring mbstring has a great deal of dead code. Some common types are: - Default switch clauses which will never be taken - If clauses intended to convert codepoints which were not present in a conversion table... but the codepoint in question *is* in the table, so the if clause is not needed. - Bounds checks in places where it is not possible for a value to ever be out of bounds. - Checks to see if an unmatched Unicode codepoint is in CP932 extension range 3... but every codepoint in range 3 is also in range 2, so no codepoint will ever be matched and converted by that code.
show more ...
|
#
16a1e0a2 |
| 27-Aug-2021 |
Alex Dowad |
In UTF7-IMAP, reject the 2nd part of surrogate pair if it appears unexpectedly |
#
776296e1 |
| 30-Aug-2021 |
Alex Dowad |
mbstring no longer provides 'long' substitutions for erroneous input bytes Previously, mbstring had a special mode whereby it would convert erroneous input byte sequences to output like
mbstring no longer provides 'long' substitutions for erroneous input bytes Previously, mbstring had a special mode whereby it would convert erroneous input byte sequences to output like "BAD+XXXX", where "XXXX" would be the erroneous bytes expressed in hexadecimal. This mode could be enabled by calling `mb_substitute_character("long")`. However, accurately reproducing input byte sequences from the cached state of a conversion filter is often tricky, and this significantly complicates the implementation. Further, the means used for passing the erroneous bytes through to where the "BAD+XXXX" text is generated only allows for up to 3 bytes to be passed, meaning that some erroneous byte sequences are truncated anyways. More to the point, a search of publically available PHP code indicates that nobody is really using this feature anyways. Incidentally, this feature also provided error output like "JIS+XXXX" if the input 'should have' represented a JISX 0208 codepoint, but it decodes to a codepoint which does not exist in the JISX 0208 charset. Similarly, specific error output was provided for non-existent JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few other charsets. All of that is now consigned to the flames. However, "long" error markers also include a somewhat more useful "U+XXXX" marker for Unicode codepoints which were successfully decoded from the input text, but cannot be represented in the output encoding. Those are still supported. With this change, there is no need to use a variety of special values in the high bits of a wchar to represent different types of error values. We can (and will) just use a single error value. This will be equal to -1. One complicating factor: Text conversion functions return an integer to indicate whether the conversion operation should be immediately aborted, and the magic 'abort' marker is -1. Also, almost all of these functions would return the received byte/codepoint to indicate success. That doesn't work with the new error value; if an input filter detects an error and passes -1 to the output filter, and the output filter returns it back, that would be taken to mean 'abort'. Therefore, amend all these functions to return 0 for success.
show more ...
|
#
78dc160e |
| 04-Apr-2021 |
Alex Dowad |
Catch and handle errors in mUTF-7 (IMAP) conversion |
#
cef4b94e |
| 04-Apr-2021 |
Alex Dowad |
Code cleanup in mbfilter_utf7imap.c |
#
a06c20a1 |
| 18-Oct-2020 |
Alex Dowad |
Remove useless constant MBFL_ENCTYPE_MBCS This flag indicated that an encoding was 'multi-byte'; it can use a variable number of bytes to encode each character. As it turns out, we don't
Remove useless constant MBFL_ENCTYPE_MBCS This flag indicated that an encoding was 'multi-byte'; it can use a variable number of bytes to encode each character. As it turns out, we don't actually need to check this flag anywhere, so it's better to remove it.
show more ...
|
#
7bb5b435 |
| 11-Sep-2020 |
Alex Dowad |
mUTF-7 (UTF7-IMAP) conversion: handle illegal (non-RFC-compliant) input correctly Instead of looking the other way and letting things slide, report errors when the input does not follow
mUTF-7 (UTF7-IMAP) conversion: handle illegal (non-RFC-compliant) input correctly Instead of looking the other way and letting things slide, report errors when the input does not follow the RFC.
show more ...
|
#
b43a7dea |
| 12-Sep-2020 |
Alex Dowad |
Add 'mUTF-7' alias for UTF7-IMAP encoding |
#
b9758172 |
| 10-Sep-2020 |
Alex Dowad |
Add comment explaining mUTF-7 to mbfilter_utf7imap.c |
#
dcd6c604 |
| 16-Jul-2020 |
Alex Dowad |
Remove unneeded function mbfl_filt_conv_common_dtor This is a default destructor for mbfl_convert_filter structs. The thing is: there isn't really anything that needs to be done to those
Remove unneeded function mbfl_filt_conv_common_dtor This is a default destructor for mbfl_convert_filter structs. The thing is: there isn't really anything that needs to be done to those structs before freeing them. The default destructor just zeroed out some fields, but there's no reason why we should actually do that.
show more ...
|
#
cdc66404 |
| 11-Jul-2020 |
Alex Dowad |
Comment constants in mbfl_consts.h, remove unused ones These were unused, and almost certainly will never be used: - MBFL_ENCTYPE_MWC4BE - MBFL_ENCTYPE_MWC4LE - MBFL_ENCTYPE
Comment constants in mbfl_consts.h, remove unused ones These were unused, and almost certainly will never be used: - MBFL_ENCTYPE_MWC4BE - MBFL_ENCTYPE_MWC4LE - MBFL_ENCTYPE_SHFTCODE - MBFL_ENCTYPE_ENC_STRM For the latter two, there were some encodings which were marked with these flags; but nothing ever _checked_ these particular flags.
show more ...
|
#
62317d59 |
| 04-Jul-2020 |
Alex Dowad |
Remove redundant includes from mbstring (and make sure correct config.h is used) Very interesting... it turns out that when Valgrind support was enabled, `#include "config.h"` from withi
Remove redundant includes from mbstring (and make sure correct config.h is used) Very interesting... it turns out that when Valgrind support was enabled, `#include "config.h"` from within mbstring was actually including the file "config.h" from Valgrind, and not the one from mbstring!! This is because -I/usr/include/valgrind was added to the compiler invocation _before_ -Iext/mbstring/libmbfl. Make sure we actually include the file which was intended.
show more ...
|
#
363d87f2 |
| 21-Feb-2020 |
George Peter Banyard |
Fix [-Wmissing-field-initializers] compiler warning in mbstring Add missing NULL pointer for mbfl_convert_vtbl struct. |
Revision tags: php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1 |
|
#
7b152990 |
| 09-Aug-2019 |
Nikita Popov |
Don't short-circuit MBFL_OUTPUTFILTER_ILLEGAL_MODE_NONE Make sure we always go through mbfl_filt_conv_illegal_output(), so that the number of illegal characters gets counted. |
Revision tags: php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1, php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16, php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1, php-7.3.2RC1, php-5.6.40, php-7.1.26, php-7.3.1, php-7.2.14, php-7.2.14RC1, php-7.3.1RC1, php-5.6.39, php-7.1.25, php-7.2.13, php-7.0.33, php-7.3.0, php-7.1.25RC1, php-7.2.13RC1, php-7.3.0RC6, php-7.1.24, php-7.2.12, php-7.3.0RC5, php-7.1.24RC1, php-7.2.12RC1, php-7.3.0RC4 |
|
#
1ad08256 |
| 14-Oct-2018 |
Peter Kokot |
Sync leading and final newlines in source code files This patch adds missing newlines, trims multiple redundant final newlines into a single one, and trims redundant leading newlines.
Sync leading and final newlines in source code files This patch adds missing newlines, trims multiple redundant final newlines into a single one, and trims redundant leading newlines. According to POSIX, a line is a sequence of zero or more non-' <newline>' characters plus a terminating '<newline>' character. [1] Files should normally have at least one final newline character. C89 [2] and later standards [3] mention a final newline: "A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character." Although it is not mandatory for all files to have a final newline fixed, a more consistent and homogeneous approach brings less of commit differences issues and a better development experience in certain text editors and IDEs. [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206 [2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2 [3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
show more ...
|
Revision tags: php-7.1.23, php-7.2.11, php-7.3.0RC3 |
|
#
26f82a77 |
| 02-Oct-2018 |
Nikita Popov |
Fixed bug #76958 |
Revision tags: php-7.1.23RC1, php-7.2.11RC1, php-7.3.0RC2, php-5.6.38, php-7.1.22, php-7.3.0RC1, php-7.2.10, php-7.0.32, php-7.1.22RC1, php-7.3.0beta3, php-7.2.10RC1, php-7.1.21, php-7.2.9, php-7.3.0beta2, php-7.1.21RC1, php-7.3.0beta1, php-7.2.9RC1, php-5.6.37, php-7.1.20, php-7.3.0alpha4, php-7.0.31, php-7.2.8, php-7.1.20RC1, php-7.2.8RC1, php-7.3.0alpha3, php-7.3.0alpha2, php-7.1.19, php-7.2.7, php-7.1.19RC1, php-7.3.0alpha1, php-7.2.7RC1, php-7.1.18, php-7.2.6, php-7.2.6RC1, php-7.1.18RC1, php-5.6.36, php-7.2.5, php-7.1.17, php-7.0.30, php-7.1.17RC1, php-7.2.5RC1, php-5.6.35, php-7.0.29, php-7.2.4, php-7.1.16, php-7.1.16RC1, php-7.2.4RC1, php-7.1.15, php-5.6.34, php-7.2.3, php-7.0.28, php-7.2.3RC1, php-7.1.15RC1, php-7.1.14, php-7.2.2, php-7.1.14RC1, php-7.2.2RC1, php-7.1.13, php-5.6.33, php-7.2.1, php-7.0.27, php-7.2.1RC1, php-7.1.13RC1, php-7.0.27RC1, php-7.2.0, php-7.1.12, php-7.0.26, php-7.1.12RC1, php-7.2.0RC6, php-7.0.26RC1, php-7.1.11, php-5.6.32, php-7.2.0RC5, php-7.0.25, php-7.1.11RC1, php-7.2.0RC4, php-7.0.25RC1, php-7.1.10, php-7.2.0RC3, php-7.0.24, php-7.2.0RC2, php-7.1.10RC1, php-7.0.24RC1, php-7.1.9, php-7.2.0RC1, php-7.0.23, php-7.1.9RC1, php-7.2.0beta3, php-7.0.23RC1, php-7.1.8, php-7.2.0beta2, php-7.0.22, php-7.1.8RC1, php-7.2.0beta1, php-7.0.22RC1, php-5.6.31, php-7.0.21, php-7.1.7, php-7.2.0alpha3, php-7.1.7RC1, php-7.0.21RC1, php-7.2.0alpha2, php-7.1.6, php-7.2.0alpha1, php-7.0.20, php-7.1.6RC1, php-7.0.20RC1, php-7.1.5, php-7.0.19, php-7.0.19RC1, php-7.1.5RC1, php-7.1.4, php-7.0.18, php-7.1.4RC1, php-7.0.18RC1, php-7.1.3, php-7.0.17, php-7.1.3RC1, php-7.0.17RC1, php-7.1.2, php-7.0.16, php-7.0.16RC1, php-7.1.2RC1, php-5.6.30, php-7.0.15, php-5.6.30RC1, php-7.1.1RC1, php-7.0.15RC1, php-7.1.1, php-5.6.29, php-7.0.14, php-7.1.0, php-5.6.29RC1, php-7.0.14RC1, php-7.1.0RC6, php-5.6.28, php-7.0.13, php-5.6.28RC1, php-7.1.0RC5, php-7.0.13RC1, php-7.1.0RC4, php-5.6.27, php-7.0.12, php-7.1.0RC3, php-5.6.27RC1, php-7.0.12RC1, php-5.6.26, php-7.1.0RC2, php-7.0.11, php-5.6.26RC1, php-7.1.0RC1, php-7.0.11RC1, php-7.1.0beta3, php-5.6.25, php-7.0.10, php-7.1.0beta2, php-5.6.25RC1, php-7.0.10RC1, php-7.1.0beta1, php-5.6.24, php-7.0.9, php-5.5.38, php-5.6.24RC1, php-7.1.0alpha3, php-7.0.9RC1, php-7.1.0alpha2, php-7.0.8, php-5.6.23, php-5.5.37, php-5.6.23RC1, php-7.0.8RC1, php-7.1.0alpha1, php-5.6.22, php-5.5.36, php-7.0.7, php-5.6.22RC1, php-7.0.7RC1, php-7.0.6, php-5.6.21, php-5.5.35, php-5.6.21RC1, php-7.0.6RC1, php-5.6.20, php-5.5.34, php-7.0.5, php-5.6.20RC1, php-7.0.5RC1, php-5.6.19, php-5.5.33, php-7.0.4, php-5.6.19RC1, php-7.0.4RC1, php-5.6.18, php-7.0.3, php-5.5.32, php-5.6.18RC1, php-7.0.3RC1, php-5.6.17, php-5.5.31, php-7.0.2, php-7.0.2RC1, php-5.6.17RC1, php-7.0.1RC1, php-7.0.0, php-5.6.16, php-7.0.0RC8, php-7.0.0RC7, php-5.6.16RC1, php-5.6.15, php-7.0.0RC6, php-7.0.1, php-5.6.15RC1, php-7.0.0RC5, php-5.5.30, php-5.6.14, php-7.0.0RC4, php-5.6.14RC1, php-7.0.0RC3, php-5.6.13, php-7.0.0RC2, php-5.5.29, php-5.4.45, php-5.6.13RC1, php-7.0.0RC1, php-5.6.12, php-5.5.28, php-7.0.0beta3, php-5.4.44, php-5.6.12RC1, php-7.0.0beta2, php-7.0.0beta1, php-5.6.11, php-5.5.27, php-5.4.43, php-5.6.11RC1, php-5.5.27RC1, php-7.0.0alpha2, php-5.5.26, php-7.0.0alpha1, php-5.6.10, php-5.4.42, POST_PHP7_NSAPI_REMOVAL, PRE_PHP7_NSAPI_REMOVAL, php-5.6.10RC1, php-5.5.26RC1, php-5.5.25, php-5.6.9, php-5.4.41, php-5.6.9RC1, php-5.5.25RC1, php-5.6.8, php-5.5.24, php-5.4.40, php-5.6.8RC1, php-5.5.24RC1, php-5.6.7, php-5.5.23, php-5.4.39, php-5.6.7RC1, php-5.5.23RC1, POST_PHP7_EREG_MYSQL_REMOVALS, PRE_PHP7_EREG_MYSQL_REMOVALS, php-5.6.6, php-5.5.22, php-5.4.38, POST_PHP7_REMOVALS, PRE_PHP7_REMOVALS, php-5.6.6RC1, php-5.5.22RC1, php-5.5.21, php-5.6.5, php-5.4.37, php-5.5.21RC1, php-5.6.5RC1 |
|
#
b7a7b1a6 |
| 03-Jan-2015 |
Stanislav Malyshev |
trailing whitespace removal |