mbstring.c - OpenGrok history log for /PHP-8.2/ext/mbstring/mbstring.c

Revision	Date	Author	Comments
# d6fc1650	15-Jul-2022	Christoph M. Becker	Drop useless TODO comment Cf. <https://github.com/php/php-src/pull/9018#issuecomment-1185481492>.
# 56137cd2	23-Jun-2022	Máté Kocsis	Declare ext/mbstring constants in stubs (#8798)
# 880803a2	13-May-2022	Alex Dowad	Use fast conversion filters to implement php_mb_ord Even for single-character strings, this is about 50% faster for ASCII, UTF-8, and UTF-16. For long strings, the performance gain is enormous, since the old code would convert the ENTIRE string, just to pick out the first codepoint.
# 950a7db9	02-May-2022	Alex Dowad	Use fast text conversion filters to implement mb_check_encoding Benchmarking reveals that this is about 8% slower for UTF-8 strings which have a bad codepoint at the very beginning of the string. For good strings, or those where the first bad codepoint is much later in the string, it is significantly faster (2-3 times faster in many cases).
# 2eb2f9d7	02-Jun-2022	Remi Collet	Fix GH-8685 mbstring requires pcre
# 49202116	28-May-2022	Alex Dowad	php_mb_convert_encoding{,_ex} returns zend_string That's what all existing callers want anyways. This avoids 2 unnecessary copies of the converted string.
# 0154a5ac	13-May-2022	Alex Dowad	Use fast text conversion filters to implement php_mb_convert_encoding_ex
# 03816fba	18-Jan-2022	Christoph M. Becker	Fix GH-7902: mb_send_mail may delimit headers with LF only Email headers are supposed to be separated with CRLF. Period. We introduce a `CRLF` macro for better comprehensibility right away. Closes GH-7907.
# 3c732251	21-Jul-2021	Alex Dowad	New internal interface for fast text conversion in mbstring When converting text to/from wchars, mbstring makes one function call for each and every byte or wchar to be converted. Typically, each of these conversion functions contains a state machine, and its state has to be restored and then saved for every single one of these calls. It doesn't take much to see that this is grossly inefficient. Instead of converting one byte or wchar on each call, the new conversion functions will either fill up or drain a whole buffer of wchars on each call. In benchmarks, this is about 3-10× faster. Adding the new, faster conversion functions for all supported legacy text encodings still needs some work. Also, all the code which uses the old-style conversion functions needs to be converted to use the new ones. After that, the old code can be dropped. (The mailparse extension will also have to be fixed up so it will still compile.)
# f07c1935	05-Dec-2021	Alex Dowad	mb_convert_encoding will not auto-detect input string as UUEncode, Base64, QPrint In a2bc57e0e5, mb_detect_encoding was modified to ensure it would never return 'UUENCODE', 'QPrint', or other non-encodings as the "detected text encoding". Before mb_detect_encoding was enhanced so that it could detect any supported text encoding, those were never returned, and they are not desired. Actually, we want to eventually remove them completely from mbstring, since PHP already contains other implementations of UUEncode, QPrint, Base64, and HTML entities. For more clarity on why we need to suppress UUEncode, etc. from being detected by mb_detect_encoding, the existing UUEncode implementation in mbstring never treats any input as erroneous. It just accepts everything. This means that it would always be treated as a valid choice by mb_detect_encoding, and would be returned in many, many cases where the input is obviously not UUEncoded. It turns out that the form of mb_convert_encoding where the user passes multiple candidate encodings (and mbstring auto-detects which one to use) was also affected by the same issue. Apply the same fix.
# 1a4f49f1	10-Nov-2021	Dmitry Stogov	Use cheaper memchr() instead of php_memnstr()
# 9308974f	18-Oct-2021	Alex Dowad	Deprecate use of mbstring to convert text to Base64/QPrint/HTML entities/etc The purpose of mbstring is for working with Unicode and legacy text encodings; but Base64, QPrint, etc. are not text encodings and don't really belong in mbstring. PHP already contains separate implementations of Base64, QPrint, and HTML entities. It will be better to eventually remove these non-encodings from mbstring. Regarding HTML entities... there is a bit more to say. mbstring's implementation of HTML entities is different from the other built-in implementation (htmlspecialchars and htmlentities). Those functions convert <, >, and & to HTML entities, but mbstring does not. It appears that the original author of mbstring intended for something to be done with <, >, and &. He used a table to identify which characters should be converted to HTML entities, and </>/& all have a special value in that table. However, nothing ever checks for that special value, so the characters are passed through unconverted. This seems like a very useless implementation of HTML entities. The most important characters which need to be expressed as entities in HTML documents are those three!
# d3d6d790	21-Oct-2021	Christoph M. Becker	Fix #76167: mbstring may use pointer from some previous request We must not reuse per-request memory across multiple requests, so this check triggered during RINIT makes no sense. As explained in the bug report[1], it can be even harmful, if some request startup fails, and the pointers refer to already freed memory in the next request. [1] <https://bugs.php.net/76167> Closes GH-7604.
# a2bc57e0	18-Oct-2021	Alex Dowad	mb_detect_encoding will not return non-encodings Among the text encodings supported by mbstring are several which are not really 'text encodings'. These include Base64, QPrint, UUencode, HTML entities, '7 bit', and '8 bit'. Rather than providing an explicit list of text encodings which they are interested in, users may pass the output of mb_list_encodings to mb_detect_encoding. Since Base64, QPrint, and so on are included in the output of mb_list_encodings, mb_detect_encoding can return one of these as its 'detected encoding' (and in fact, this often happens). Before mb_detect_encoding was enhanced so it could detect any of the supported text encodings, this did not happen, and it is never desired.
# dcaa010f	07-Aug-2020	Alex Dowad	Strict validation of conversion flags to mb_convert_kana mb_convert_kana is controlled by user-provided flags, which specify what it should convert and to what. These flags come in inverse pairs, for example "fullwidth numerals to halfwidth numerals" and "halfwidth numerals to fullwidth numerals". It does not make sense to combine inverse flags. But, clever reader of commit logs, you will surely say: What if I want all my halfwidth numerals to become fullwidth, and all my fullwidth numerals to become halfwidth? Much too clever, you are! Let's put aside the fact that this bizarre switch-up is ridiculous and will never be used, and face up to another stark reality: mb_convert_kana does not work for that case, and never has. This was probably never noticed because nobody ever tried. Disallowing useless combinations of flags gives freedom to rearrange the kana conversion code without changing behavior. We can also reject unrecognized flags. This may help users to catch bugs. Interestingly, the existing tests used a 'Z' flag, which is useless (it's not recognized at all).
# 78004912	22-Sep-2021	Alex Dowad	Inline SKIP_LONG_HEADER... macro which is only used once I don't find that pulling this code out into a macro makes anything clearer. Not at all.
# 8c32deb6	18-Oct-2020	Alex Dowad	Don't check for impossible error condition in mb_strwidth
# bf78070c	18-Oct-2020	Alex Dowad	Don't check for impossible error condition in mb_strlen
# d3f56e5a	18-Oct-2020	Alex Dowad	Rename php_mb_mbchar_bytes_ex to php_mb_mbchar_bytes ...And remove the original php_mb_mbchar_bytes, which was not being used.
# 774cd960	18-Oct-2020	Alex Dowad	No need to null-terminate buffer in php_mb_chr `mbfl_buffer_converter_feed_result` will not overrun the specified length.
# abf83e50	18-Oct-2020	Alex Dowad	Rename php_mb_safe_strrchr_ex to php_mb_safe_strrchr ...And remove the original php_mb_safe_strrchr, which was not being used anywhere.
# 46315def	23-Sep-2021	Nikita Popov	Use locale-independent case conversion in mb_send_mail() Headers should not be processed in a locale-dependent fashion. Switch from upper to lowercasing because that's the standard for PHP and we provide an ASCII implementation of this operation. This is adapted from GH-7506.
# 36c979e2	18-Oct-2020	Alex Dowad	Use stack-allocated buffer in php_mb_chr
# 1170981b	18-Sep-2021	Alex Dowad	Fix mb_str_split on empty strings in variable-length text encodings Previously, when passed an empty string, and given an encoding which uses a variable number of bytes per character (and which doesn't have a 'character length table'), mb_str_split would return an array containing a single empty string, rather than an empty array. The ISO-2022 encodings are among those which were affected by this bug.
# ca33ab59	06-Sep-2021	Alex Dowad	mb_detect_encoding with only one candidate encoding uses mb_check_encoding ...Because it's about 5% faster.