other_encodings.phpt - OpenGrok history log for /php-src/ext/mbstring/tests/other

Revision	Date	Author	Comments
Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3
# 983a29d3	06-Aug-2022	Alex Dowad	Legacy conversion code for '7bit' to '8bit' inserts error markers. The use of a special 'vtbl' for converting between '7bit' and '8bit' text meant that '7bit' text would not be converted to wchars before going to '8bit'. This meant that the special value MBFL_BAD_INPUT, which we use to flag an erroneous byte sequence in input text (and which is required by functions like mb_check_encoding), would pass directly to the output, instead of being converted to the error marker specified by mb_substitute_character. This issue dates back to the time when I removed the mbfl 'identify filters' and made encoding validity checking and encoding detection rely only on the conversion filters. /php-src/ext/mbstring/tests/other_encodings.phpt
# a4656895	04-Aug-2022	Alex Dowad	Imitate legacy behavior when converting non-encodings using mbstring. Fuzzing revealed that something was missed here when making the new encoding conversion code match the behavior of the old code. In the next major release of PHP, support for these non-encodings will be dropped, but in the meantime, it is better to match the legacy behavior. /php-src/ext/mbstring/tests/other_encodings.phpt
Revision tags: php-8.2.0beta2, php-8.1.9, php-8.0.22, php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30, php-8.1.7RC1, php-8.0.20RC1, php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1, php-8.0.18, php-8.1.5, php-7.4.29, php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1
# ff76694f	22-Feb-2022	Alex Dowad	Merge branch 'PHP-8.1' * PHP-8.1: mb_check_encoding($str, '7bit') rejects strings with bytes over 0x7F
# 8a8533d2	22-Feb-2022	Alex Dowad	mb_check_encoding($str, '7bit') rejects strings with bytes over 0x7F. This was the old behavior of mb_check_encoding() before 3e7acf901d, but yours truly broke it. If only we had more thorough tests at that time, this might not have slipped through the cracks. Thanks to divinity76 for the report. /php-src/ext/mbstring/tests/other_encodings.phpt
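The rule restored by this commit amounts to a byte-range check. The C function below is a hypothetical sketch, not mbstring's implementation (`is_valid_7bit` is an invented name): a string counts as valid '7bit' only if every byte is at most 0x7F.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an mbstring function: returns true only if
 * every byte in the buffer is in the range 0x00-0x7F, mirroring the rule
 * that mb_check_encoding($str, '7bit') rejects bytes over 0x7F. */
bool is_valid_7bit(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] > 0x7F) {
            return false; /* high bit set: not 7-bit text */
        }
    }
    return true; /* empty input or all bytes <= 0x7F */
}
```

For example, `is_valid_7bit((const unsigned char *)"caf\xC3\xA9", 5)` returns false because the UTF-8 bytes 0xC3 and 0xA9 are above 0x7F.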
Revision tags: php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1, php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32
# 9962aa97	19-Oct-2021	Alex Dowad	Merge branch 'PHP-8.1' * PHP-8.1: mb_detect_encoding will not return non-encodings Improve detection accuracy of mb_detect_encoding
Revision tags: php-7.4.25, php-8.0.12
# a2bc57e0	18-Oct-2021	Alex Dowad	mb_detect_encoding will not return non-encodings. Among the text encodings supported by mbstring are several which are not really 'text encodings'. These include Base64, QPrint, UUencode, HTML entities, '7 bit', and '8 bit'. Rather than providing an explicit list of text encodings which they are interested in, users may pass the output of mb_list_encodings to mb_detect_encoding. Since Base64, QPrint, and so on are included in the output of mb_list_encodings, mb_detect_encoding can return one of these as its 'detected encoding' (and in fact, this often happens). Before mb_detect_encoding was enhanced so it could detect any of the supported text encodings, this did not happen, and it is never desired. /php-src/ext/mbstring/tests/other_encodings.phpt
Revision tags: php-8.1.0RC4, php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1
# ae71bfde	02-Sep-2021	Alex Dowad	Add more tests for UCS-4 text conversion /php-src/ext/mbstring/tests/other_encodings.phpt
# fd0e0c73	02-Sep-2021	Alex Dowad	Add another test for UCS-2 text conversion /php-src/ext/mbstring/tests/other_encodings.phpt
Revision tags: php-8.1.0RC1
# 776296e1	30-Aug-2021	Alex Dowad	mbstring no longer provides 'long' substitutions for erroneous input bytes. Previously, mbstring had a special mode whereby it would convert erroneous input byte sequences to output like "BAD+XXXX", where "XXXX" would be the erroneous bytes expressed in hexadecimal. This mode could be enabled by calling `mb_substitute_character("long")`. However, accurately reproducing input byte sequences from the cached state of a conversion filter is often tricky, and this significantly complicates the implementation. Further, the means used for passing the erroneous bytes through to where the "BAD+XXXX" text is generated only allows for up to 3 bytes to be passed, meaning that some erroneous byte sequences are truncated anyways. More to the point, a search of publicly available PHP code indicates that nobody is really using this feature anyways. Incidentally, this feature also provided error output like "JIS+XXXX" if the input 'should have' represented a JISX 0208 codepoint, but it decodes to a codepoint which does not exist in the JISX 0208 charset. Similarly, specific error output was provided for non-existent JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few other charsets. All of that is now consigned to the flames. However, "long" error markers also include a somewhat more useful "U+XXXX" marker for Unicode codepoints which were successfully decoded from the input text, but cannot be represented in the output encoding. Those are still supported. With this change, there is no need to use a variety of special values in the high bits of a wchar to represent different types of error values. We can (and will) just use a single error value. This will be equal to -1.
One complicating factor: Text conversion functions return an integer to indicate whether the conversion operation should be immediately aborted, and the magic 'abort' marker is -1. Also, almost all of these functions would return the received byte/codepoint to indicate success. That doesn't work with the new error value; if an input filter detects an error and passes -1 to the output filter, and the output filter returns it back, that would be taken to mean 'abort'. Therefore, amend all these functions to return 0 for success. /php-src/ext/mbstring/tests/other_encodings.phpt
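The return-value conflict described in this commit message can be modeled in a few lines of C. This is a hypothetical sketch, not mbstring's code: `SKETCH_BAD_INPUT`, `SKETCH_ABORT`, `old_style_output_filter`, `new_style_output_filter`, and `caller_sees_abort` are invented names, and both constants are set to -1 only because the commit message states that the new error value and the 'abort' marker both equal -1.

```c
/* Hypothetical constants (not mbstring identifiers). Per the commit message,
 * both the new error value and the 'abort' marker are -1. */
#define SKETCH_BAD_INPUT (-1)
#define SKETCH_ABORT     (-1)

/* Old convention: a filter returns the received codepoint on success.
 * A real filter would also write c (or an error marker) to its output;
 * this sketch models only the return value. */
int old_style_output_filter(int c)
{
    return c;
}

/* New convention: return 0 on success, leaving -1 to mean "abort". */
int new_style_output_filter(int c)
{
    (void)c; /* output is not modeled in this sketch */
    return 0;
}

/* Returns 1 if the caller would interpret the filter's result as "abort". */
int caller_sees_abort(int (*filter)(int), int c)
{
    return filter(c) == SKETCH_ABORT;
}
```

Under the old convention, `caller_sees_abort(old_style_output_filter, SKETCH_BAD_INPUT)` returns 1, so an erroneous input byte would wrongly stop the conversion; under the new convention the same call returns 0.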
# 7472c82c	24-Aug-2021	Alex Dowad	Add tests for UCS-4 text conversion /php-src/ext/mbstring/tests/other_encodings.phpt
# 79015b23	24-Aug-2021	Alex Dowad	Add tests for UCS-2 text encoding /php-src/ext/mbstring/tests/other_encodings.phpt
Revision tags: php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3
# 34ef8f3c	12-Aug-2021	Alex Dowad	Add tests for '7bit' and '8bit' text encodings in mbstring /php-src/ext/mbstring/tests/other_encodings.phpt