#
67253987 |
| 04-Jun-2024 |
Gina Peter Banyard |
ext/mbstring: Fix some [-Wsign-compare] warnings
|
#
a9035863 |
| 09-Jan-2023 |
Alex Dowad |
Implement conditional casing for Greek letter sigma when title-casing text
|
#
290efe84 |
| 09-Jan-2023 |
Alex Dowad |
Adjust code which checks if encoding is ISO-8859-9 when converting case
Instead of checking the 'encoding number' to see if we are converting case for ISO-8859-9 text, compare pointers instead. This should free up 1 register in php_unicode_convert_case.
|
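The pointer comparison described in commit 290efe84 can be sketched as follows. This is a minimal illustration, not the mbstring code: the `toy_encoding` struct, its fields, and both helper functions are hypothetical names, and the sketch assumes each encoding is described by exactly one static descriptor, so that pointer identity implies encoding identity.

```c
#include <stdbool.h>

/* Hypothetical encoding descriptor; real mbstring structures differ. */
typedef struct {
    int no_encoding;   /* numeric ID (the "encoding number") */
    const char *name;
} toy_encoding;

static const toy_encoding toy_encoding_8859_9 = { 9, "ISO-8859-9" };
static const toy_encoding toy_encoding_utf8   = { 1, "UTF-8" };

/* Older style: load a field from the descriptor, then compare it. */
static bool toy_is_8859_9_by_number(const toy_encoding *enc) {
    return enc->no_encoding == 9;
}

/* Newer style: compare the descriptor's address directly.
 * Only valid under the one-descriptor-per-encoding assumption. */
static bool toy_is_8859_9_by_pointer(const toy_encoding *enc) {
    return enc == &toy_encoding_8859_9;
}
```

Both checks agree as long as no second descriptor for ISO-8859-9 exists; the pointer form avoids keeping the loaded number around.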
#
39b46a53 |
| 07-Jan-2023 |
Alex Dowad |
Implement Unicode conditional casing rules for Greek letter sigma
The capital Greek letter sigma (Σ) should be lowercased as σ except when it appears at the end of a word; in that case, it should be lowercased as the special form ς.
This rule is included in the Unicode data file SpecialCasing.txt. The condition for applying the rule is called "Final_Sigma" and is defined in Unicode technical report 21. The rule is:
• For the special casing form to apply, the capital letter sigma must be preceded by 0 or more "case-ignorable" characters, preceded by at least 1 "cased" character.
• Further, capital sigma must NOT be followed by 0 or more case-ignorable characters and then at least 1 cased character.
"Case-ignorable" characters include certain punctuation marks, like the apostrophe, as well as various accent marks. There are actually close to 500 different case-ignorable characters, including accent marks from Cyrillic, Hebrew, Armenian, Arabic, Syriac, Bengali, Gujarati, Telugu, Tibetan, and many other alphabets. This category also includes zero-width spaces, codepoints which indicate RTL/LTR text direction, certain musical symbols, etc.
Since the rule involves scanning over "0 or more" of such case-ignorable characters, it may be necessary to scan arbitrarily far to the left and right of capital sigma to determine whether the special lowercase form should be used or not. However, since we are trying to be both memory-efficient and CPU-efficient, this implementation limits how far to the left we will scan. Generally, we scan up to 63 characters to the left looking for a "cased" character, but not more.
When scanning to the right, we go up to the end of the string if necessary, even if it means scanning over thousands of characters.
Anyways, it is almost impossible to imagine that natural text will include "words" with more than 63 successive apostrophes (for example) followed by a capital sigma.
Closes GH-8096.
|
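The Final_Sigma rule described in commit 39b46a53 can be sketched roughly as below. This is an illustration under loud assumptions, not the mbstring implementation: the `toy_*` classifiers recognize only ASCII and basic Greek letters as "cased" and only three characters as "case-ignorable" (the real Unicode sets are far larger), the input is an array of already-decoded codepoints, and all function names are hypothetical. The 63-character leftward limit is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIGMA_UPPER 0x03A3u  /* Σ */
#define SIGMA_LOWER 0x03C3u  /* σ */
#define SIGMA_FINAL 0x03C2u  /* ς */
#define MAX_LOOKBEHIND 63    /* leftward scan limit from the commit message */

/* Toy stand-in (assumption): ASCII letters and basic Greek letters only. */
static bool toy_is_cased(uint32_t c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) ||
           (c >= 0x03B1 && c <= 0x03C9);
}

/* Toy stand-in (assumption): apostrophe, U+2019, combining acute U+0301. */
static bool toy_is_case_ignorable(uint32_t c) {
    return c == '\'' || c == 0x2019 || c == 0x0301;
}

/* Decide whether the capital sigma at s[pos] satisfies Final_Sigma. */
static bool toy_final_sigma_applies(const uint32_t *s, size_t len, size_t pos) {
    assert(pos < len && s[pos] == SIGMA_UPPER);

    /* Left: skip case-ignorable characters, then require a cased one.
     * Give up after examining MAX_LOOKBEHIND characters. */
    bool cased_before = false;
    size_t scanned = 0;
    for (size_t i = pos; i > 0 && scanned < MAX_LOOKBEHIND; scanned++) {
        uint32_t c = s[--i];
        if (toy_is_case_ignorable(c)) continue;
        cased_before = toy_is_cased(c);
        break;
    }
    if (!cased_before) return false;

    /* Right: skip case-ignorable characters with no limit; the rule
     * fails if the first other character is cased. */
    for (size_t i = pos + 1; i < len; i++) {
        if (toy_is_case_ignorable(s[i])) continue;
        return !toy_is_cased(s[i]);
    }
    return true;  /* reached end of string: sigma is word-final */
}

/* Lowercase the capital sigma at s[pos] according to its context. */
static uint32_t toy_lower_sigma_at(const uint32_t *s, size_t len, size_t pos) {
    return toy_final_sigma_applies(s, len, pos) ? SIGMA_FINAL : SIGMA_LOWER;
}
```

For the word ΟΔΟΣ the last letter would come out as ς, while a sigma at the start of a word, or one followed by further letters, would come out as σ. A sigma preceded by more than 63 case-ignorable characters falls back to σ because the leftward scan stops early.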
#
4427b2e1 |
| 10-Jan-2023 |
Alex Dowad |
Mark UTF-8 strings emitted by mbstring functions as valid UTF-8
We now have a couple of mbstring functions which have fast paths for strings marked as 'valid UTF-8'. Later, we may likely have more. So that these fast paths can be used more frequently, mark UTF-8 strings emitted by mbstring as 'valid UTF-8'. This is always a correct thing to do, because mbstring never returns invalid UTF-8 as the result of a conversion (or similar) operation.
Internally, we do have a conversion mode which deliberately emits invalid UTF-8 in some cases. (This is done to prevent unwanted matches when we are converting strings to UTF-8 before performing matching operations on them.) For such strings, don't set the 'valid UTF-8' flag. It probably wouldn't hurt anything to set it, because strings generated using that special conversion mode should *never* be returned to userland, and I don't think we do anything with them which cares about the IS_STR_VALID_UTF8 flag... but still, it would likely cause confusion for developers.
|
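The idea behind a 'valid UTF-8' flag, as described in commit 4427b2e1, can be sketched like this. Everything here is hypothetical: `toy_string`, `TOY_STR_VALID_UTF8`, and both functions are illustrative stand-ins and do not mirror PHP's `zend_string` or its `IS_STR_VALID_UTF8` handling. The sketch only shows why a cached flag lets later callers skip a full validation scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STR_VALID_UTF8 1u   /* hypothetical flag bit */

typedef struct {
    const unsigned char *data;
    size_t len;
    unsigned flags;
} toy_string;

/* Full scan: rejects bad lead/continuation bytes, overlong forms,
 * surrogates, and codepoints above U+10FFFF. */
static bool toy_utf8_validate(const unsigned char *p, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = p[i];
        size_t extra;
        uint32_t cp, min;
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;
        if (len - i <= extra) return false;          /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            if ((p[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

/* Fast path: trust the flag if set; otherwise scan once and cache the
 * result. full_scans counts how often the slow path actually ran. */
static bool toy_check_utf8(toy_string *s, unsigned *full_scans) {
    if (s->flags & TOY_STR_VALID_UTF8) return true;
    (*full_scans)++;
    bool ok = toy_utf8_validate(s->data, s->len);
    if (ok) s->flags |= TOY_STR_VALID_UTF8;
    return ok;
}
```

A producer that is known to emit only valid UTF-8 could set the flag up front, so consumers never pay for the scan; a producer that deliberately emits invalid sequences must leave it clear.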
#
744ca16e |
| 14-Dec-2022 |
Alex Dowad |
Speed boost for mb_stripos (when not using UTF-8)
Instead of case-folding a string and then converting it to UTF-8 as a separate operation, why not convert it to UTF-8 at the same time as we fold case? For non-UTF-8 encodings, this typically makes mb_stripos about 2x faster.
|
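Folding case and converting to UTF-8 in a single pass, the approach named in commit 744ca16e, might look like the sketch below. It is a toy under stated assumptions: the source encoding is fixed to ISO-8859-1, the "fold" is a simple lowercase shift rather than Unicode case folding (so ß is left alone instead of becoming "ss"), and `toy_fold_latin1_to_utf8` is a hypothetical name. The caller is assumed to supply an output buffer of at least `2 * len` bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* One pass over the input: lowercase each Latin-1 character, then
 * immediately encode it as UTF-8. Returns the number of bytes written. */
static size_t toy_fold_latin1_to_utf8(const unsigned char *in, size_t len,
                                      unsigned char *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = in[i];
        /* Toy fold: A-Z and U+00C0..U+00DE (except the multiplication
         * sign U+00D7) map to the character 0x20 above them. */
        if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
            c += 0x20;
        if (c < 0x80) {
            out[n++] = (unsigned char)c;
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Compared with folding into an intermediate buffer and converting afterwards, this touches each input character once and needs no temporary storage.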
#
3ce888a8 |
| 04-Oct-2022 |
Alex Dowad |
Use uint32_t for 'illegal_substchar' codepoint in mbstring
This value is a wchar, so the best type for it is uint32_t.
|
#
20769fb9 |
| 24-Sep-2022 |
Alex Dowad |
Make enum for valid case_mode values (for php_unicode_convert_case)
|
#
7eef2fb4 |
| 21-Sep-2022 |
Alex Dowad |
Use fast text conversion filters for mb_convert_case, mb_strtoupper, mb_strtolower
Speed increase is only about 50% for title casing, but 2-3x for other forms of case conversion.
|
Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3, php-8.2.0beta2, php-8.1.9, php-8.0.22, php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30, php-8.1.7RC1, php-8.0.20RC1, php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1, php-8.0.18, php-8.1.5, php-7.4.29, php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1, php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1, php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32, php-7.4.25, php-8.0.12, php-8.1.0RC4, php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1, php-8.1.0RC1, php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3, php-8.0.10RC1, php-7.4.23RC1, php-8.1.0beta2, php-8.0.9, php-7.4.22, php-8.1.0beta1, php-7.4.22RC1, php-8.0.9RC1, php-8.1.0alpha3, php-7.4.21, php-7.3.29, php-8.0.8, php-8.1.0alpha2, php-7.4.21RC1, php-8.0.8RC1, php-8.1.0alpha1, php-8.0.7, php-7.4.20, php-8.0.7RC1, php-7.4.20RC1, php-8.0.6, php-7.4.19, php-7.4.18, php-7.3.28, php-8.0.5, php-8.0.5RC1, php-7.4.18RC1, php-8.0.4RC1, php-7.4.17RC1, php-8.0.3, php-7.4.16, php-8.0.3RC1, php-7.4.16RC1, php-8.0.2, php-7.4.15, php-7.3.27, php-8.0.2RC1, php-7.4.15RC2, php-7.4.15RC1, php-8.0.1, php-7.4.14, php-7.3.26, php-7.4.14RC1, php-8.0.1RC1, php-7.3.26RC1, php-8.0.0, php-7.3.25, php-7.4.13, php-8.0.0RC5, php-7.4.13RC1, php-8.0.0RC4, php-7.3.25RC1, php-7.4.12, php-8.0.0RC3, php-7.3.24, php-8.0.0RC2, php-7.4.12RC1, php-7.3.24RC1, php-7.2.34, php-8.0.0rc1, php-7.4.11, php-7.3.23, php-8.0.0beta4, php-7.4.11RC1, php-7.3.23RC1, php-8.0.0beta3, php-7.4.10, php-7.3.22, php-8.0.0beta2, 
php-7.3.22RC1, php-7.4.10RC1, php-8.0.0beta1, php-7.4.9, php-7.2.33, php-7.3.21 |
|
#
4e51810f |
| 26-Jul-2020 |
Alex Dowad |
Optimize mbstring upper/lowercasing: use fast path in more cases
The 'fast path' in the uppercase/lowercase functions for Unicode text can be used for a slightly greater range of characters. This is not expected to have a big impact on performance, since the number of characters which will use the 'fast path' is only increased by about 50-60, and these are not very commonly used characters... but still, it doesn't cost anything.
|
#
a3126206 |
| 30-Jul-2021 |
Alex Dowad |
Remove redundant NULL checks in mbstring
Whoever originally wrote mbstring seems to have a deathly fear of NULL pointers lurking behind every corner. A common pattern is that one function will check if a pointer is NULL, then pass it to another function, which will again check if it is NULL, then pass to yet another function, which will yet again check if it is NULL... it's NULL checks all the way down.
Remove all the NULL checks in places where pointers could not possibly be NULL.
|
#
d2073179 |
| 24-Aug-2021 |
Nikita Popov |
Return bool from php_unicode_is_prop()
|
#
3be94217 |
| 24-Aug-2021 |
Nikita Popov |
Don't use sentinel value for unicode property lookup
0xffff was used to mark character properties without any members. This made the code unnecessarily complicated, because we need to check for 0xffff values when looking up the property ranges. We can simply encode this as an empty set of ranges.
|
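Encoding a memberless property as an empty set of ranges, the approach named in commit 3be94217, can be sketched as follows. The tables and names are hypothetical and tiny compared with the generated Unicode data: each property owns the slice of `toy_ranges` between two consecutive entries of `toy_offsets`, so an empty property is just two equal offsets and the lookup needs no special case. The function returns `bool`, in the spirit of commit d2073179.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } toy_range;

/* Sorted, non-overlapping ranges, grouped by property. */
static const toy_range toy_ranges[] = {
    { 'A', 'Z' },                       /* property 0: uppercase (toy) */
                                        /* property 1: no members      */
    { 'a', 'z' }, { 0x03B1, 0x03C9 },   /* property 2: lowercase (toy) */
};

/* Property p owns toy_ranges[toy_offsets[p] .. toy_offsets[p + 1]). */
static const size_t toy_offsets[] = { 0, 1, 1, 3 };
#define TOY_NUM_PROPS 3

/* Binary search within one property's slice. An empty slice makes
 * the loop body never run, so no sentinel check is needed. */
static bool toy_is_prop(uint32_t c, size_t prop) {
    assert(prop < TOY_NUM_PROPS);
    size_t lo = toy_offsets[prop], hi = toy_offsets[prop + 1];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < toy_ranges[mid].first)      hi = mid;
        else if (c > toy_ranges[mid].last)  lo = mid + 1;
        else return true;
    }
    return false;
}
```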
#
aff36587 |
| 29-Jun-2021 |
Patrick Allaert |
Fixed some spaces used instead of tabs
|
#
01b3fc03 |
| 06-May-2021 |
KsaR |
Update http->https in license (#6945)
1. Update: http://www.php.net/license/3_01.txt to https, as there is anyway server header "Location:" to https.
2. Update few license 3.0 to 3.01 as 3.0 states "php 5.1.1, 4.1.1, and earlier".
3. In some license comments is "at through the world-wide-web" while most is without "at", so deleted.
4. fixed indentation in some files before |
|
Revision tags: php-8.0.0alpha3, php-7.4.9RC1, php-7.3.21RC1, php-7.4.8, php-7.2.32, php-8.0.0alpha2, php-7.3.20 |
|
#
7eddcabe |
| 05-Jul-2020 |
Alex Dowad |
Don't guard mbstring code with #ifdef HAVE_MBSTRING
This is just a very silly feature of mbstring -- you can compile the source files with HAVE_MBSTRING undefined, and it will all just compile to (almost) nothing. What is the use of this? Why compile the source files and link against them if you don't want the mbstring extension? It doesn't make any kind of sense.
|
#
62317d59 |
| 04-Jul-2020 |
Alex Dowad |
Remove redundant includes from mbstring (and make sure correct config.h is used)
Very interesting... it turns out that when Valgrind support was enabled, `#include "config.h"` from within mbstring was actually including the file "config.h" from Valgrind, and not the one from mbstring!! This is because -I/usr/include/valgrind was added to the compiler invocation _before_ -Iext/mbstring/libmbfl.
Make sure we actually include the file which was intended.
|
#
ea3f0ee0 |
| 27-Jun-2020 |
Alex Dowad |
Optimize php_unicode_convert_case (cuts mbstring case conversion time ~15%)
This function uses various subfunctions to convert case of Unicode wchars. Previously, these subfunctions would store the case-converted characters in a buffer, and the parent function would then pass them (byte by byte) to the next filter in the filter chain.
Rather than passing around that buffer, it's better for the subfunctions to directly pass the case-converted bytes to the next filter in the filter chain. This speeds things up nicely.
|
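The change described in commit ea3f0ee0, having the case-conversion helper hand its output straight to the next stage instead of filling an intermediate buffer, can be sketched like this. The `toy_sink` callback type, the collector, and the uppercase rule (ASCII letters plus ß becoming "SS") are all assumptions for illustration; the actual mbstring filter interface is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "next filter": a callback plus its context. */
typedef struct {
    void (*emit)(uint32_t c, void *ctx);
    void *ctx;
} toy_sink;

/* Uppercase one codepoint and pass each resulting codepoint directly
 * to the next stage. One input may produce several outputs. */
static void toy_upper_direct(uint32_t c, const toy_sink *next) {
    if (c == 0xDF) {                       /* ß uppercases to "SS" */
        next->emit('S', next->ctx);
        next->emit('S', next->ctx);
    } else if (c >= 'a' && c <= 'z') {
        next->emit(c - 0x20, next->ctx);
    } else {
        next->emit(c, next->ctx);
    }
}

/* Example sink that just records what it receives. */
typedef struct { uint32_t buf[16]; size_t n; } toy_collector;

static void toy_collect(uint32_t c, void *ctx) {
    toy_collector *col = ctx;
    assert(col->n < 16);
    col->buf[col->n++] = c;
}
```

With this shape the helper never has to report how many characters it produced, and the caller has no second loop that copies them onward.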
Revision tags: php-8.0.0alpha1, php-7.4.8RC1, php-7.3.20RC1, php-7.4.7, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1 |
|
#
68164f40 |
| 12-May-2020 |
George Peter Banyard |
Fix [-Wundef] warning in MBString extension
|
Revision tags: php-7.4.6, php-7.2.31, php-7.4.6RC1, php-7.3.18RC1, php-7.2.30, php-7.4.5, php-7.3.17, php-7.4.5RC1, php-7.3.17RC1, php-7.3.18, php-7.4.4, php-7.2.29, php-7.3.16 |
|
#
ebdaeb85 |
| 12-Mar-2020 |
Christoph M. Becker |
Fix #79371: mb_strtolower (UTF-32LE): stack-buffer-overflow
We make sure that negative values are properly compared.
|
#
db848e14 |
| 12-Mar-2020 |
Christoph M. Becker |
Fix #79371: mb_strtolower (UTF-32LE): stack-buffer-overflow
We make sure that negative values are properly compared.
|
#
1fdffd1c |
| 12-Mar-2020 |
Christoph M. Becker |
Fix #79371: mb_strtolower (UTF-32LE): stack-buffer-overflow
We make sure that negative values are properly compared.
|
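The general class of bug behind "negative values are properly compared" can be illustrated with the sketch below. It is not the actual patch for #79371, whose details are not given in this listing: the table, its size, and the function names are hypothetical. The point is only that a signed upper-bound check lets negative values through, while comparing as unsigned rejects them.

```c
#include <stdbool.h>

#define TOY_TABLE_SIZE 4
static const int toy_table[TOY_TABLE_SIZE] = { 10, 20, 30, 40 };

/* Flawed check: a negative idx is "less than" the size, so it would
 * be accepted and later index memory before the table. */
static bool toy_index_ok_signed(int idx) {
    return idx < TOY_TABLE_SIZE;
}

/* Safer check: as unsigned, a negative idx becomes a huge value and
 * fails the bound test. */
static bool toy_index_ok_unsigned(int idx) {
    return (unsigned int)idx < (unsigned int)TOY_TABLE_SIZE;
}

/* Lookup guarded by the unsigned comparison. */
static int toy_lookup(int idx, int fallback) {
    return toy_index_ok_unsigned(idx) ? toy_table[idx] : fallback;
}
```

Input decoded from UTF-32 can carry arbitrary 32-bit values, which is one plausible way a negative number could reach such a check when the codepoint is held in a signed type.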
Revision tags: php-7.4.4RC1, php-7.3.16RC1, php-7.4.3, php-7.2.28, php-7.3.15RC1, php-7.4.3RC1, php-7.3.15, php-7.2.27, php-7.4.2, php-7.3.14, php-7.3.14RC1, php-7.4.2RC1, php-7.4.1, php-7.2.26, php-7.3.13, php-7.4.1RC1, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3 |
|
#
5d6e923d |
| 24-Sep-2019 |
Gabriel Caruso |
Remove mention of PHP major version in Copyright headers
Closes GH-4732.
|
Revision tags: php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1, php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1 |
|
#
8e8d129d |
| 12-Apr-2019 |
Nikita Popov |
Use EMPTY_SWITCH_DEFAULT_CASE in php_unicode.c
Avoids a potentially uninitialized variable warning.
|
Revision tags: php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16, php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1 |
|
#
92ac598a |
| 22-Jan-2019 |
Peter Kokot |
Remove local variables
This patch removes the so called local variables defined per file basis for certain editors to properly show tab width, and similar settings. These are mainly used by Vim and Emacs editors yet with recent changes the once working definitions don't work anymore in Vim without custom plugins or additional configuration. Neither are these settings synced across the PHP code base.
A simpler and better approach is EditorConfig and fixing code using some code style fixing tools in the future instead.
This patch also removes the so called modelines for Vim. Modelines allow Vim editor specifically to set some editor configuration such as syntax highlighting, indentation style and tab width to be set in the first line or the last 5 lines per file basis. Since the php test files have syntax highlighting already set in most editors properly and EditorConfig takes care of the indentation settings, this patch removes these as well for the Vim 6.0 and newer versions.
With the removal of local variables for certain editors such as Emacs and Vim, the footer is also probably not needed anymore when creating extensions using ext_skel.php script.
Additionally, Vim modelines for setting php syntax and some editor settings has been removed from some *.phpt files. All these are mostly not relevant for phpt files neither work properly in the middle of the file.
|