#
d6fc1650 |
| 15-Jul-2022 |
Christoph M. Becker |
Drop useless TODO comment
Cf. <https://github.com/php/php-src/pull/9018#issuecomment-1185481492>.
|
#
56137cd2 |
| 23-Jun-2022 |
Máté Kocsis |
Declare ext/mbstring constants in stubs (#8798)
|
#
880803a2 |
| 13-May-2022 |
Alex Dowad |
Use fast conversion filters to implement php_mb_ord
Even for single-character strings, this is about 50% faster for ASCII, UTF-8, and UTF-16. For long strings, the performance gain is enormous, since the old code would convert the ENTIRE string, just to pick out the first codepoint.
|
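The point about long strings can be illustrated with a minimal Python sketch. It is not the mbstring implementation; the function name `first_codepoint_utf8` is hypothetical and only UTF-8 is handled. It reads the lead byte, works out how long the first sequence is, and decodes just that slice, so the cost does not grow with the length of the input.

```python
def first_codepoint_utf8(data: bytes) -> int:
    """Return the first codepoint of UTF-8 `data` without decoding the rest.

    Hypothetical helper for illustration only; not part of mbstring.
    """
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    # The lead byte alone tells us how many bytes the first sequence occupies.
    if lead < 0x80:
        length = 1
    elif 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte")
    # Strict decoding of the short slice validates the continuation bytes;
    # a malformed or truncated sequence raises UnicodeDecodeError.
    return ord(data[:length].decode("utf-8"))
```

Only the first one to four bytes are ever examined, which is why the gain over converting the whole string grows with input length.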
#
950a7db9 |
| 02-May-2022 |
Alex Dowad |
Use fast text conversion filters to implement mb_check_encoding
Benchmarking reveals that this is about 8% slower for UTF-8 strings which have a bad codepoint at the very beginning of the string. For good strings, or those where the first bad codepoint is much later in the string, it is significantly faster (2-3 times faster in many cases).
|
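The benchmark above depends on where the first bad codepoint sits. As a rough illustration only (Python's own decoder, not mbstring's filters; `first_invalid_offset` is a hypothetical name), the following sketch reports the byte offset of the first invalid sequence, which is the quantity that distinguishes the two cases.

```python
def first_invalid_offset(data: bytes, encoding: str = "utf-8"):
    """Return the byte offset of the first invalid sequence, or None if valid.

    Hypothetical helper; uses Python's strict decoder as a stand-in validator.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # `start` is the index of the first byte the decoder rejected.
        return err.start
    return None
```

A result of `0` corresponds to the "bad codepoint at the very beginning" case, and `None` to a fully valid string.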
#
2eb2f9d7 |
| 02-Jun-2022 |
Remi Collet |
Fix GH-8685 mbstring requires pcre
|
#
49202116 |
| 28-May-2022 |
Alex Dowad |
php_mb_convert_encoding{,_ex} returns zend_string
That's what all existing callers want anyways. This avoids 2 unnecessary copies of the converted string.
|
#
0154a5ac |
| 13-May-2022 |
Alex Dowad |
Use fast text conversion filters to implement php_mb_convert_encoding_ex
|
#
03816fba |
| 18-Jan-2022 |
Christoph M. Becker |
Fix GH-7902: mb_send_mail may delimit headers with LF only
Email headers are supposed to be separated with CRLF. Period.
We introduce a `CRLF` macro for better comprehensibility right away.
Closes GH-7907.
|
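A minimal Python sketch of the rule stated above, assuming headers arrive as name/value pairs. The `CRLF` constant mirrors the macro named in the commit; `join_headers` is a hypothetical helper, not a PHP or mbstring function.

```python
CRLF = "\r\n"  # the only separator allowed between email header lines


def join_headers(headers):
    """Join (name, value) pairs into a header block delimited by CRLF.

    Hypothetical helper for illustration only.
    """
    lines = []
    for name, value in headers:
        # Refuse embedded line breaks so a value cannot smuggle in a bare LF.
        if any(ch in name + value for ch in "\r\n"):
            raise ValueError("line break inside header")
        lines.append(f"{name}: {value}")
    return CRLF.join(lines)
```

For example, `join_headers([("From", "a@example.com"), ("X-Test", "1")])` yields two header lines separated by `\r\n` rather than a lone `\n`.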
#
3c732251 |
| 21-Jul-2021 |
Alex Dowad |
New internal interface for fast text conversion in mbstring
When converting text to/from wchars, mbstring makes one function call for each and every byte or wchar to be converted. Typically, each of these conversion functions contains a state machine, and its state has to be restored and then saved for every single one of these calls. It doesn't take much to see that this is grossly inefficient.
Instead of converting one byte or wchar on each call, the new conversion functions will either fill up or drain a whole buffer of wchars on each call. In benchmarks, this is about 3-10× faster.
Adding the new, faster conversion functions for all supported legacy text encodings still needs some work. Also, all the code which uses the old-style conversion functions needs to be converted to use the new ones. After that, the old code can be dropped. (The mailparse extension will also have to be fixed up so it will still compile.)
|
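The buffer-at-a-time idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the C interface added by the commit: the names `utf8_to_wchars` and `convert_all` are hypothetical, only UTF-8 is handled, and the input is assumed to be well-formed. Each call fills a buffer of up to `capacity` codepoints and returns where it stopped, so the caller makes one call per buffer instead of one call per byte.

```python
def utf8_to_wchars(data: bytes, pos: int, capacity: int):
    """Decode up to `capacity` codepoints from `data`, starting at `pos`.

    Returns (codepoints, new_pos). Assumes well-formed UTF-8 input.
    Hypothetical sketch; not the mbstring function signature.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    out = []
    n = len(data)
    while pos < n and len(out) < capacity:
        lead = data[pos]
        # Work out the payload bits of the lead byte and the number of
        # continuation bytes that follow it.
        if lead < 0x80:
            cp, extra = lead, 0
        elif lead < 0xE0:
            cp, extra = lead & 0x1F, 1
        elif lead < 0xF0:
            cp, extra = lead & 0x0F, 2
        else:
            cp, extra = lead & 0x07, 3
        for b in data[pos + 1 : pos + 1 + extra]:
            cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        pos += 1 + extra
    return out, pos


def convert_all(data: bytes, capacity: int = 128):
    """Drain `data` one buffer at a time and collect every codepoint."""
    result = []
    pos = 0
    while pos < len(data):
        chunk, pos = utf8_to_wchars(data, pos, capacity)
        result.extend(chunk)
    return result
```

The only state carried between calls is the position, and it is saved once per buffer rather than once per byte, which is the source of the speedup the commit describes.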
#
f07c1935 |
| 05-Dec-2021 |
Alex Dowad |
mb_convert_encoding will not auto-detect input string as UUEncode, Base64, QPrint
In a2bc57e0e5, mb_detect_encoding was modified to ensure it would never return 'UUENCODE', 'QPrint', or other non-encodings as the "detected text encoding". Before mb_detect_encoding was enhanced so that it could detect any supported text encoding, those were never returned, and they are not desired. Actually, we want to eventually remove them completely from mbstring, since PHP already contains other implementations of UUEncode, QPrint, Base64, and HTML entities.
For more clarity on why we need to suppress UUEncode, etc. from being detected by mb_detect_encoding, the existing UUEncode implementation in mbstring *never* treats any input as erroneous. It just accepts everything. This means that it would *always* be treated as a valid choice by mb_detect_encoding, and would be returned in many, many cases where the input is obviously not UUEncoded.
It turns out that the form of mb_convert_encoding where the user passes multiple candidate encodings (and mbstring auto-detects which one to use) was also affected by the same issue. Apply the same fix.
|
#
1a4f49f1 |
| 10-Nov-2021 |
Dmitry Stogov |
Use cheaper memchr() instead of php_memnstr()
|
#
9308974f |
| 18-Oct-2021 |
Alex Dowad |
Deprecate use of mbstring to convert text to Base64/QPrint/HTML entities/etc
The purpose of mbstring is for working with Unicode and legacy text encodings; but Base64, QPrint, etc. are not text encodings and don't really belong in mbstring. PHP already contains separate implementations of Base64, QPrint, and HTML entities. It will be better to eventually remove these non-encodings from mbstring.
Regarding HTML entities... there is a bit more to say. mbstring's implementation of HTML entities is different from the other built-in implementation (htmlspecialchars and htmlentities). Those functions convert <, >, and & to HTML entities, but mbstring does not.
It appears that the original author of mbstring intended for something to be done with <, >, and &. He used a table to identify which characters should be converted to HTML entities, and </>/& all have a special value in that table. However, nothing ever checks for that special value, so the characters are passed through unconverted.
This seems like a very useless implementation of HTML entities. The most important characters which need to be expressed as entities in HTML documents are those three!
|
#
d3d6d790 |
| 21-Oct-2021 |
Christoph M. Becker |
Fix #76167: mbstring may use pointer from some previous request
We must not reuse per-request memory across multiple requests, so this check triggered during RINIT makes no sense. As explained in the bug report[1], it can be even harmful, if some request startup fails, and the pointers refer to already freed memory in the next request.
[1] <https://bugs.php.net/76167>
Closes GH-7604.
|
#
a2bc57e0 |
| 18-Oct-2021 |
Alex Dowad |
mb_detect_encoding will not return non-encodings
Among the text encodings supported by mbstring are several which are not really 'text encodings'. These include Base64, QPrint, UUencode, HTML entities, '7 bit', and '8 bit'.
Rather than providing an explicit list of text encodings which they are interested in, users may pass the output of mb_list_encodings to mb_detect_encoding. Since Base64, QPrint, and so on are included in the output of mb_list_encodings, mb_detect_encoding can return one of these as its 'detected encoding' (and in fact, this often happens). Before mb_detect_encoding was enhanced so it could detect any of the supported text encodings, this did not happen, and it is never desired.
|
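The filtering described above might look roughly like this in Python. Everything here is a sketch under assumptions: `NON_ENCODINGS` is an illustrative list of names (the exact identifiers mbstring uses are not given in the log), `detect_encoding` is a hypothetical function, and "valid" simply means that Python's strict decoder accepts the bytes.

```python
# Illustrative names only; assumed spellings, not taken from mbstring itself.
NON_ENCODINGS = {
    "base64", "uuencode", "qprint", "quoted-printable",
    "html-entities", "7bit", "8bit",
}


def detect_encoding(data: bytes, candidates):
    """Return the first candidate that decodes `data` cleanly, or None.

    Non-encodings are skipped up front, so a candidate list that happens to
    contain them (for example, a full list of supported names) is harmless.
    """
    for name in candidates:
        if name.lower() in NON_ENCODINGS:
            continue
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Invalid for this encoding, or a name Python does not know.
            continue
        return name
    return None
```

Skipping the non-encodings before any validation matters because, as the log notes, a decoder that accepts every input would otherwise always look like a valid match.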
#
dcaa010f |
| 07-Aug-2020 |
Alex Dowad |
Strict validation of conversion flags to mb_convert_kana
mb_convert_kana is controlled by user-provided flags, which specify what it should convert and to what. These flags come in inverse pairs, for example "fullwidth numerals to halfwidth numerals" and "halfwidth numerals to fullwidth numerals". It does not make sense to combine inverse flags.
But, clever reader of commit logs, you will surely say: What if I want all my halfwidth numerals to become fullwidth, and all my fullwidth numerals to become halfwidth? Much too clever, you are! Let's put aside the fact that this bizarre switch-up is ridiculous and will never be used, and face up to another stark reality: mb_convert_kana does not work for that case, and never has. This was probably never noticed because nobody ever tried.
Disallowing useless combinations of flags gives freedom to rearrange the kana conversion code without changing behavior.
We can also reject unrecognized flags. This may help users to catch bugs. Interestingly, the existing tests used a 'Z' flag, which is useless (it's not recognized at all).
|
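The validation rule can be sketched as below. This is a simplified illustration, not the check PHP performs: the pair table assumes that each lowercase flag is the inverse of its uppercase counterpart and that `V` is a standalone modifier, and the real set of rejected combinations may be larger. `validate_kana_flags` is a hypothetical name.

```python
# Assumed, simplified table: each tuple is treated as one inverse pair.
INVERSE_PAIRS = [
    ("r", "R"), ("n", "N"), ("a", "A"), ("s", "S"),
    ("k", "K"), ("h", "H"), ("c", "C"),
]
KNOWN_FLAGS = {flag for pair in INVERSE_PAIRS for flag in pair} | {"V"}


def validate_kana_flags(flags: str) -> None:
    """Raise ValueError for unrecognized flags or for both halves of a pair."""
    for ch in flags:
        if ch not in KNOWN_FLAGS:
            raise ValueError(f"unrecognized flag {ch!r}")
    for lower, upper in INVERSE_PAIRS:
        if lower in flags and upper in flags:
            raise ValueError(f"flags {lower!r} and {upper!r} are inverses")
```

Under this sketch `"KV"` passes, `"nN"` is rejected as an inverse pair, and `"Z"` is rejected as unrecognized, matching the two kinds of error the commit describes.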
#
78004912 |
| 22-Sep-2021 |
Alex Dowad |
Inline SKIP_LONG_HEADER... macro which is only used once
I don't find that pulling this code out into a macro makes anything clearer. Not at all.
|
#
8c32deb6 |
| 18-Oct-2020 |
Alex Dowad |
Don't check for impossible error condition in mb_strwidth
|
#
bf78070c |
| 18-Oct-2020 |
Alex Dowad |
Don't check for impossible error condition in mb_strlen
|
#
d3f56e5a |
| 18-Oct-2020 |
Alex Dowad |
Rename php_mb_mbchar_bytes_ex to php_mb_mbchar_bytes
...And remove the original php_mb_mbchar_bytes, which was not being used.
|
#
774cd960 |
| 18-Oct-2020 |
Alex Dowad |
No need to null-terminate buffer in php_mb_chr
`mbfl_buffer_converter_feed_result` will not overrun the specified length.
|
#
abf83e50 |
| 18-Oct-2020 |
Alex Dowad |
Rename php_mb_safe_strrchr_ex to php_mb_safe_strrchr
...And remove the original php_mb_safe_strrchr, which was not being used anywhere.
|
#
46315def |
| 23-Sep-2021 |
Nikita Popov |
Use locale-independent case conversion in mb_send_mail()
Headers should not be processed in a locale-dependent fashion. Switch from upper to lowercasing because that's the standard for PHP and we provide an ASCII implementation of this operation.
This is adapted from GH-7506.
|
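A minimal sketch of ASCII-only lowercasing, the kind of operation the commit switches to. The function name `ascii_lower` is hypothetical; the point is that only the bytes `A` to `Z` are mapped, so the result cannot vary with the process locale.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave everything else untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )
```

With this approach a header name such as `Content-Type` always compares equal to `content-type`, while non-ASCII characters pass through unchanged.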
#
36c979e2 |
| 18-Oct-2020 |
Alex Dowad |
Use stack-allocated buffer in php_mb_chr
|
#
1170981b |
| 18-Sep-2021 |
Alex Dowad |
Fix mb_str_split on empty strings in variable-length text encodings
Previously, when passed an empty string, and given an encoding which uses a variable number of bytes per character (and which doesn't have a 'character length table'), mb_str_split would return an array containing a single empty string, rather than an empty array.
The ISO-2022 encodings are among those which were affected by this bug.
|
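The expected behaviour can be shown with a small Python sketch. It is an analogy using Python's codecs, not the mbstring code path; `str_split` is a hypothetical function. Because the loop runs over the decoded characters, an empty input produces no iterations and therefore an empty list, never a list holding one empty string.

```python
def str_split(data: bytes, encoding: str, length: int = 1):
    """Split encoded text into chunks of `length` characters each.

    Hypothetical sketch: decode, slice by character count, re-encode.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    text = data.decode(encoding)
    # range() over an empty string yields nothing, so "" maps to [].
    return [
        text[i : i + length].encode(encoding)
        for i in range(0, len(text), length)
    ]
```

Calling `str_split(b"", "iso2022_jp")` returns `[]`, which is the result the fix restores for variable-length encodings.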
#
ca33ab59 |
| 06-Sep-2021 |
Alex Dowad |
mb_detect_encoding with only one candidate encoding uses mb_check_encoding
...Because it's about 5% faster.
|