#
dc5f3b95 |
| 11-Sep-2024 |
tekimen |
Fix GH-15824 mb_detect_encoding() invalid "UTF8" (#15829) I changed strcasecmp to strncasecmp. However, strncasecmp takes a size as its third parameter, so a length check was added for MIME names and aliases. Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
|
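The length check described in the commit above can be sketched as follows. This is a minimal illustration, not the mbstring code: `name_equals` is a hypothetical helper, and it assumes a POSIX system where `strncasecmp` is declared in `<strings.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical helper: returns true only when `name` (name_len bytes, not
 * necessarily NUL-terminated) equals the NUL-terminated `candidate`,
 * ignoring ASCII case. */
static bool name_equals(const char *name, size_t name_len, const char *candidate)
{
    /* strncasecmp() alone compares only the first name_len bytes, so the
     * prefix "UTF" would match "UTF-8". Comparing the lengths first rules
     * out such prefix matches. */
    return strlen(candidate) == name_len
        && strncasecmp(name, candidate, name_len) == 0;
}
```

Without the `strlen` comparison, any input that is a prefix of a known name or alias would be accepted as a match.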
#
da6766d7 |
| 30-Dec-2023 |
Niels Dossche <7771979+nielsdos@users.noreply.github.com> |
Use more optimal perfect hash table
|
#
5fdb2724 |
| 22-Dec-2023 |
Alex Dowad |
Add mbstring support for GB18030-2022 text encoding The previous version of the GB-18030 standard was published in 2005. This commit adds support for the updated (2022) version of this text encoding. The existing GB18030 implementation has been left unchanged for backwards compatibility; users who want to use the new standard must explicitly indicate the desired text encoding is 'GB18030-2022'. The document which defines GB18030-2022, published by the government of the People's Republic of China, defines three levels of standards compliance. This implementation is intended to achieve Implementation Level 3, which is the highest level of compliance. Experts in the GB18030 standard are requested to assess this implementation and report any deviation from the standard.
|
#
b0f7df1a |
| 05-Dec-2023 |
Alex Dowad |
Use optimized implementation of mb_strcut for Japanese mobile vendor UTF-8 variants To facilitate sharing of mb_cut_utf8, I combined mbfilter_utf8.c and mbfilter_utf8_mobile.c into a single source file.
|
#
3ad422eb |
| 18-Nov-2023 |
Niels Dossche <7771979+nielsdos@users.noreply.github.com> |
Avoid temporary string allocations in php_mb_parse_encoding_list() (#12714) This brings execution time down from 0.91s to 0.86s on the reference benchmark [1]. [1] https://github.com/php/php-src/issues/12684#issuecomment-1813799924
|
#
76582205 |
| 17-Nov-2023 |
Niels Dossche <7771979+nielsdos@users.noreply.github.com> |
Improve performance of mbfl_name2encoding() by using perfect hashing (#12707) mbfl_name2encoding() uses a linear loop through the encodings, comparing the name one by one, which is very slow. For the benchmark [1] just looking up the name takes about 50% of run-time. By using perfect hashing instead, we no longer have to loop over the list, and the number of string comparisons is reduced to just a single one. The perfect hashing table is generated using GNU gperf and amended manually to fit in with mbstring and manually changed to reduce the cache size. [1] https://github.com/php/php-src/issues/12684#issuecomment-1813799924
|
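The idea behind the perfect-hashing lookup in the commit above can be sketched with a toy table. Everything here is an assumption for illustration: the four names, the hand-picked hash function, and the `toy_` identifiers are not the gperf-generated table used by mbstring.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

#define TOY_TABLE_SIZE 8

/* Slot = (length + lowercased first byte) % 8. The function was chosen by
 * hand so that the four illustrative names land in distinct slots. */
static const char *const toy_name_table[TOY_TABLE_SIZE] = {
    [2] = "UTF-8",
    [3] = "EUC-JP",
    [6] = "ASCII",
    [7] = "SJIS",
};

static unsigned toy_hash(const char *name, size_t len)
{
    return (unsigned)((len + (size_t)tolower((unsigned char)name[0])) % TOY_TABLE_SIZE);
}

/* Returns the canonical name, or NULL if `name` is unknown. At most one
 * string comparison is performed, with no loop over the table. */
static const char *toy_name2encoding(const char *name)
{
    size_t len = strlen(name);
    if (len == 0) {
        return NULL;
    }
    const char *candidate = toy_name_table[toy_hash(name, len)];
    if (candidate != NULL
        && strlen(candidate) == len
        && strncasecmp(name, candidate, len) == 0) {
        return candidate;
    }
    return NULL;
}
```

Because every known name hashes to its own slot, an unknown name either hits an empty slot or fails the single comparison against the one candidate stored there.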
#
c717c79a |
| 14-Apr-2023 |
Alex Dowad |
Combine CJK encoding conversion code in a single source file This will make it easier to combine duplicated code between all the CJK text encodings (a significant amount is already combined in this commit, such as the repeated definitions of SJIS_DECODE and SJIS_ENCODE), but I hope to remove even more redundancy in the future. The table used to implement mb_strlen for CP932 has been changed to the same table as "SJIS-win".
|
#
117f2263 |
| 06-Feb-2023 |
Alex Dowad |
Remove unneeded function mbfl_no2preferred_mime_name
|
#
a85adb17 |
| 06-Feb-2023 |
Alex Dowad |
Remove unneeded function mbfl_name2no_encoding
|
Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3, php-8.2.0beta2, php-8.1.9, php-8.0.22, php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30, php-8.1.7RC1, php-8.0.20RC1, php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1, php-8.0.18, php-8.1.5, php-7.4.29 |
|
#
371367ce |
| 07-Apr-2022 |
Alex Dowad |
Reintroduce legacy 'SJIS-win' text encoding in mbstring

In e2459857af, I combined mbstring's "SJIS-win" text encoding into CP932. This was done after doing some testing which appeared to show that the mappings for "SJIS-win" were the same as those for "CP932". Later, it was found that there was actually a small difference prior to e2459857af when converting Unicode to CP932. The mappings for the following two codepoints were different:

          CP932   SJIS-win
U+203E    0x7E    0x81 0x50
U+00A5    0x5C    0x81 0x8F

As shown, mbstring's "CP932" mapped Unicode's 'OVERLINE' and 'YEN SIGN' to the ASCII bytes which have conflicting uses in most legacy Japanese text encodings. "SJIS-win" mapped these to equivalent JIS X 0208 fullwidth characters.

Since e2459857af was not intended to cause any user-visible change in behavior, I am rolling back the merge of "CP932" and "SJIS-win". It seems doubtful whether these two text encodings should be kept separate or merged in a future release. An extensive discussion of the related historical background and compatibility issues involved can be found in this GitHub thread: https://github.com/php/php-src/issues/8308
|
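The two differing mappings listed in the commit above can be written down as a small sketch. The byte values come from the commit message; the `toy_` names and the function shape are assumptions for illustration and do not reflect how mbstring structures its conversion filters.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_variant { TOY_CP932, TOY_SJIS_WIN };

/* Encodes only the two code points whose mappings differed. Writes the
 * bytes to `out` and returns how many were written, or 0 for any other
 * code point (a real converter would handle those elsewhere). */
static size_t toy_encode_special(enum toy_variant v, uint32_t cp, unsigned char out[2])
{
    if (cp != 0x203E && cp != 0x00A5) {
        return 0;
    }
    if (v == TOY_CP932) {
        /* CP932: single ASCII-range byte. */
        out[0] = (cp == 0x203E) ? 0x7E : 0x5C;
        return 1;
    }
    /* SJIS-win: two-byte JIS X 0208 fullwidth character. */
    out[0] = 0x81;
    out[1] = (cp == 0x203E) ? 0x50 : 0x8F;
    return 2;
}
```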
Revision tags: php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1, php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1, php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32, php-7.4.25, php-8.0.12, php-8.1.0RC4, php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1, php-8.1.0RC1 |
|
#
634f2e21 |
| 30-Aug-2021 |
Nikita Popov |
Don't expose wchar encoding to users (#7415) The "wchar" encoding isn't really an encoding -- it's what we internally use as the representation of decoded characters. In practice, it tends to behave a lot like the 8bit encoding when used from userland, because input code units end up being treated as code points. This patch removes the wchar encoding from the public encoding list and reserves it for internal use only.
|
Revision tags: php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3, php-8.0.10RC1, php-7.4.23RC1, php-8.1.0beta2, php-8.0.9, php-7.4.22, php-8.1.0beta1, php-7.4.22RC1, php-8.0.9RC1, php-8.1.0alpha3, php-7.4.21, php-7.3.29, php-8.0.8, php-8.1.0alpha2, php-7.4.21RC1, php-8.0.8RC1, php-8.1.0alpha1, php-8.0.7, php-7.4.20, php-8.0.7RC1, php-7.4.20RC1, php-8.0.6, php-7.4.19, php-7.4.18, php-7.3.28, php-8.0.5, php-8.0.5RC1, php-7.4.18RC1, php-8.0.4RC1, php-7.4.17RC1, php-8.0.3, php-7.4.16, php-8.0.3RC1, php-7.4.16RC1, php-8.0.2, php-7.4.15, php-7.3.27, php-8.0.2RC1, php-7.4.15RC2, php-7.4.15RC1, php-8.0.1, php-7.4.14, php-7.3.26, php-7.4.14RC1, php-8.0.1RC1, php-7.3.26RC1, php-8.0.0, php-7.3.25, php-7.4.13, php-8.0.0RC5, php-7.4.13RC1, php-8.0.0RC4, php-7.3.25RC1, php-7.4.12, php-8.0.0RC3, php-7.3.24 |
|
#
e2459857 |
| 22-Oct-2020 |
Alex Dowad |
Remove duplicate implementation of CP932 from mbstring Sigh. Double sigh. After fruitlessly searching the Internet for information on this mysterious text encoding called "SJIS-open", I wrote a script to try converting every Unicode codepoint from 0-0xFFFF and compare the results from different variants of Shift-JIS, to see which one "SJIS-open" would be most similar to. The result? It's just CP932. There is no difference at all. So why do we have two implementations of CP932 in mbstring? In case somebody, somewhere is using "SJIS-open" (or its aliases "SJIS-win" or "SJIS-ms"), add these as aliases to CP932 so existing code will continue to work.
|
#
34ece408 |
| 17-Oct-2020 |
Alex Dowad |
Remove useless mbstring encoding 'JIS-ms' MicroSoft invented three encodings very similar to ISO-2022-JP/JIS7/JIS8, called CP50220, CP50221, and CP50222. All three are supported by mbstring. Since these encodings are very similar, some code can be shared. Actually, conversion of CP50220/1/2 to Unicode is exactly the same operation; it's when converting from Unicode to CP50220/1/2 that some small differences arise in how certain katakana are handled. The most important common code was a function called `mbfl_filt_wchar_jis_ms`. The `jis_ms` part doubtless refers to the fact that these encodings are modified versions of 'JIS' invented by 'MS'. mbstring also went a step further and exported 'JIS-ms' to userland as a separate encoding from CP50220/1/2. If users requested 'JIS-ms' conversion, they got something like CP50220/1/2, minus their special ways of handling half-width katakana when converting from Unicode. But... that 'encoding' is not something which actually exists in the world outside of mbstring. CP50220/1/2 do exist in MicroSoft software, but not 'JIS-ms'. For a text encoding conversion library, inventing new variant encodings and implementing them is not very productive. Our interest is in handling text encodings which real people actually use for... you know, storing actual text and things like that.
|
Revision tags: php-8.0.0RC2, php-7.4.12RC1, php-7.3.24RC1 |
|
#
fcbe45de |
| 07-Oct-2020 |
Alex Dowad |
Remove useless mbstring encoding 'CP50220-raw' CP50220 is a variant of ISO-2022-JP invented by MicroSoft, which handles some Unicode characters which are not representable in ISO-2022-JP by converting them to similar characters which are representable. What, then, is CP50220-raw? An Internet search turns up absolutely nothing. Reference works which I consulted don't say anything about it. Other text conversion libraries don't support it. From looking at the code: It's just the same as CP50220, but it accepts unmapped JIS X 0208 characters passed through from other Japanese encodings and silently encodes them using the usual ISO-2022-JP escape sequence and representation for JIS X 0208 characters. It's hard to see how this could be useful. OK, let me come out and say it: it's _not_ useful. We can confidently jettison this (mis)feature.
|
#
e169ad3b |
| 03-Nov-2020 |
Alex Dowad |
Consolidate all single-byte encodings in one source file We can squeeze out a lot of duplicated code in this way. |
#
cc03c54c |
| 04-Nov-2020 |
Alex Dowad |
Remove useless byte{2,4}{be,le} encodings from mbstring There is no meaningful difference between these and UCS-{2,4}. They are just a little bit more lax about passing errors silently. They also have no known use. Alias to UCS-{2,4} in case someone, somewhere is using them.
|
Revision tags: php-7.2.34, php-8.0.0rc1, php-7.4.11, php-7.3.23, php-8.0.0beta4, php-7.4.11RC1, php-7.3.23RC1, php-8.0.0beta3, php-7.4.10, php-7.3.22, php-8.0.0beta2, php-7.3.22RC1, php-7.4.10RC1, php-8.0.0beta1, php-7.4.9, php-7.2.33, php-7.3.21, php-8.0.0alpha3, php-7.4.9RC1, php-7.3.21RC1 |
|
#
0ffc1f55 |
| 16-Jul-2020 |
Alex Dowad |
Refactor mbfl_ident.c, mbfl_encoding.c, mbfl_memory_device.c, mbfl_string.c
- Make everything less gratuitously verbose
- Don't litter the code with lots of unneeded NULL checks (for things which will never be NULL)
- Don't return success/failure code from functions which can never fail
- For encoding structs, don't use pointers to pointers to pointers for the list of alias strings. Pointers to pointers (2 levels of indirection) is what actually makes sense. This gets rid of some extraneous dereference operations.
|
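The alias-list point in the commit above (two levels of indirection rather than three) can be illustrated with a sketch. The struct layout and all `toy_` names are assumptions for illustration, not the actual libmbfl definitions.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical encoding struct: `aliases` is a pointer to the first element
 * of a NULL-terminated array of strings (pointer to pointer to char). */
typedef struct {
    const char *name;
    const char **aliases; /* may be NULL when there are no aliases */
} toy_alias_encoding;

static const char *toy_utf8_aliases[] = { "utf8", NULL };
static const toy_alias_encoding toy_utf8 = { "UTF-8", toy_utf8_aliases };

/* Returns 1 if `alias` is one of the encoding's aliases (ASCII
 * case-insensitive), 0 otherwise. Each step needs one dereference. */
static int toy_has_alias(const toy_alias_encoding *enc, const char *alias)
{
    if (enc->aliases == NULL) {
        return 0;
    }
    for (const char **p = enc->aliases; *p != NULL; p++) {
        if (strcasecmp(*p, alias) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With a third level of indirection, the loop would have to dereference a pointer to the array before indexing it, which is the extra operation the commit says it removed.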
Revision tags: php-7.4.8, php-7.2.32, php-8.0.0alpha2, php-7.3.20 |
|
#
62317d59 |
| 04-Jul-2020 |
Alex Dowad |
Remove redundant includes from mbstring (and make sure correct config.h is used) Very interesting... it turns out that when Valgrind support was enabled, `#include "config.h"` from within mbstring was actually including the file "config.h" from Valgrind, and not the one from mbstring!! This is because -I/usr/include/valgrind was added to the compiler invocation _before_ -Iext/mbstring/libmbfl. Make sure we actually include the file which was intended.
|
#
a64241b5 |
| 27-Jun-2020 |
Alex Dowad |
Remove unused functions from mbstring
- mbfl_buffer_converter_reset
- mbfl_buffer_converter_strncat
- mbfl_buffer_converter_getbuffer
- mbfl_oddlen
- mbfl_filter_output_pipe_flush
- mbfl_memory_device_output2
- mbfl_memory_device_output4
- mbfl_is_support_encoding
- mbfl_buffer_converter_feed2
- _php_mb_regex_globals_dtor
- mime_header_encoder_feed
- mime_header_decoder_feed
- mbfl_convert_filter_feed
|
Revision tags: php-8.0.0alpha1, php-7.4.8RC1, php-7.3.20RC1, php-7.4.7, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1, php-7.4.6, php-7.2.31 |
|
#
226d9dd3 |
| 07-May-2020 |
Nikita Popov |
Only allow "pass" as input/output encoding "pass" is not a real encoding, it just means "don't perform any conversion". Using it as an internal encoding or passing it to any of the mbstring functions will not work (and on master will commonly trigger an assertion).
|
Revision tags: php-7.4.6RC1, php-7.3.18RC1, php-7.2.30, php-7.4.5, php-7.3.17, php-7.4.5RC1, php-7.3.17RC1, php-7.3.18, php-7.4.4, php-7.2.29, php-7.3.16, php-7.4.4RC1, php-7.3.16RC1, php-7.4.3, php-7.2.28, php-7.3.15RC1, php-7.4.3RC1, php-7.3.15, php-7.2.27, php-7.4.2, php-7.3.14, php-7.3.14RC1, php-7.4.2RC1, php-7.4.1, php-7.2.26, php-7.3.13, php-7.4.1RC1, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1, php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1, php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16, php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1, php-7.3.2RC1, php-5.6.40, php-7.1.26, php-7.3.1, php-7.2.14, php-7.2.14RC1, php-7.3.1RC1, php-5.6.39, php-7.1.25, php-7.2.13, php-7.0.33, php-7.3.0, php-7.1.25RC1, php-7.2.13RC1, php-7.3.0RC6, php-7.1.24, php-7.2.12, php-7.3.0RC5, php-7.1.24RC1, php-7.2.12RC1, php-7.3.0RC4 |
|
#
ad6738e8 |
| 17-Oct-2018 |
Nikita Popov |
Merge branch 'PHP-7.3'
|
#
11515546 |
| 17-Oct-2018 |
Nikita Popov |
Remove the "auto" encoding "auto" is only meaningful in functions which accept an encoding *list* and support encoding detection. These functions have explicit checks for "auto". It cannot be used as a standalone encoding in any meaningful capacity, so I'm dropping it entirely.
|
Revision tags: php-7.1.23, php-7.2.11, php-7.3.0RC3, php-7.1.23RC1, php-7.2.11RC1, php-7.3.0RC2 |
|
#
d3ca28f5 |
| 15-Sep-2018 |
Peter Kokot |
Remove HAVE_STRING_H The C89 standard and later define the `<string.h>` header as part of the standard headers [1], and on current systems it is always present. Some files also included the `<strings.h>` header as an alternative. This kind of check was relevant on some older systems where the `<strings.h>` file included definitions for the C89-compliant `<string.h>`. Today such an alternative check is no longer required. The `<strings.h>` file is part of the POSIX definition these days. Autoconf also suggests doing this and relying on C89 or above [2] and [3]. This patch also cleans up a few unused `<strings.h>` inclusions in libmbfl. [1]: https://port70.net/~nsz/c/c89/c89-draft.html#4.1.2 [2]: http://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4 [3]: https://www.gnu.org/software/autoconf/manual/autoconf-2.69/autoconf.html
|
Revision tags: php-5.6.38, php-7.1.22, php-7.3.0RC1, php-7.2.10, php-7.0.32 |
|
#
6c1ff61a |
| 05-Sep-2018 |
Peter Kokot |
Remove HAVE_STDDEF_H The `<stddef.h>` header file is part of the standard C89 headers [1], and on current systems there is no need for a manual check of whether the header is present. Since PHP requires at least C89, the `HAVE_STDDEF_H` symbol isn't defined by Autoconf anywhere else anymore [2], and across the PHP source code the header is already included unconditionally. This patch also syncs this for the bundled libmbfl, which is maintained as a fork in php-src. Refs: [1] https://port70.net/~nsz/c/c89/c89-draft.html#4.1.2 [2] https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4
|
Revision tags: php-7.1.22RC1, php-7.3.0beta3, php-7.2.10RC1, php-7.1.21, php-7.2.9, php-7.3.0beta2, php-7.1.21RC1, php-7.3.0beta1, php-7.2.9RC1, php-5.6.37, php-7.1.20, php-7.3.0alpha4, php-7.0.31, php-7.2.8, php-7.1.20RC1, php-7.2.8RC1, php-7.3.0alpha3, php-7.3.0alpha2, php-7.1.19, php-7.2.7, php-7.1.19RC1, php-7.3.0alpha1, php-7.2.7RC1, php-7.1.18, php-7.2.6, php-7.2.6RC1, php-7.1.18RC1, php-5.6.36, php-7.2.5, php-7.1.17, php-7.0.30, php-7.1.17RC1, php-7.2.5RC1, php-5.6.35, php-7.0.29, php-7.2.4, php-7.1.16, php-7.1.16RC1, php-7.2.4RC1, php-7.1.15, php-5.6.34, php-7.2.3, php-7.0.28, php-7.2.3RC1, php-7.1.15RC1, php-7.1.14, php-7.2.2, php-7.1.14RC1, php-7.2.2RC1, php-7.1.13, php-5.6.33, php-7.2.1, php-7.0.27, php-7.2.1RC1, php-7.1.13RC1, php-7.0.27RC1, php-7.2.0, php-7.1.12, l, php-7.1.12RC1, php-7.2.0RC6, php-7.0.26RC1, php-7.1.11, php-5.6.32, php-7.2.0RC5, php-7.0.25, php-7.1.11RC1, php-7.2.0RC4, php-7.0.25RC1, php-7.1.10, php-7.2.0RC3, php-7.0.24, php-7.2.0RC2, php-7.1.10RC1, php-7.0.24RC1, php-7.1.9, php-7.2.0RC1, php-7.0.23, php-7.1.9RC1, php-7.2.0beta3, php-7.0.23RC1 |
|
#
633a471b |
| 04-Aug-2017 |
Nikita Popov |
Store input and output filters in mbfl encodings For functions like mb_chr() and mb_ord() just looking up the input/output filter for the encoding dominates the runtime. This commit stores the input/output filter for an encoding in the mbfl encoding structure, so it can be looked up directly, rather than scanning through filter function lists.
|
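The change described in the commit above, replacing a scan through filter lists with a pointer stored in the encoding structure, can be sketched as follows. All types, names, and the two toy filters are assumptions for illustration; the real libmbfl structures are more involved.

```c
#include <stddef.h>

typedef int (*toy_filter_fn)(int c);

enum toy_id { TOY_ASCII, TOY_LATIN1 };

/* Toy filters: return the code point if representable, -1 otherwise. */
static int toy_ascii_filter(int c)  { return (c >= 0 && c < 0x80)  ? c : -1; }
static int toy_latin1_filter(int c) { return (c >= 0 && c < 0x100) ? c : -1; }

/* Before: filters live in a separate list that must be scanned by id. */
typedef struct { enum toy_id id; toy_filter_fn fn; } toy_filter_entry;

static const toy_filter_entry toy_filter_list[] = {
    { TOY_ASCII,  toy_ascii_filter  },
    { TOY_LATIN1, toy_latin1_filter },
};

static toy_filter_fn toy_lookup_by_scan(enum toy_id id)
{
    size_t n = sizeof(toy_filter_list) / sizeof(toy_filter_list[0]);
    for (size_t i = 0; i < n; i++) {
        if (toy_filter_list[i].id == id) {
            return toy_filter_list[i].fn;
        }
    }
    return NULL;
}

/* After: the encoding struct carries its filter, so lookup is one load. */
typedef struct {
    enum toy_id id;
    const char *name;
    toy_filter_fn input_filter;
} toy_filter_encoding;

static const toy_filter_encoding toy_ascii  = { TOY_ASCII,  "ASCII",      toy_ascii_filter  };
static const toy_filter_encoding toy_latin1 = { TOY_LATIN1, "ISO-8859-1", toy_latin1_filter };

static toy_filter_fn toy_lookup_direct(const toy_filter_encoding *enc)
{
    return enc->input_filter;
}
```

Both lookups return the same function; the direct version simply avoids the loop, which matters for short calls where the lookup would otherwise dominate.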