History log of /php-src/ext/mbstring/tests/casemapping.phpt (Results 1 – 6 of 6)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# a9035863 09-Jan-2023 Alex Dowad

Implement conditional casing for Greek letter sigma when title-casing text


# 39b46a53 07-Jan-2023 Alex Dowad

Implement Unicode conditional casing rules for Greek letter sigma

The capital Greek letter sigma (Σ) should be lowercased as σ except
when it appears at the end of a word; in that case,

Implement Unicode conditional casing rules for Greek letter sigma

The capital Greek letter sigma (Σ) should be lowercased as σ except
when it appears at the end of a word; in that case, it should be
lowercased as the special form ς.

This rule is included in the Unicode data file SpecialCasing.txt.
The condition for applying the rule is called "Final_Sigma" and is
defined in Unicode technical report 21. The rule is:

• For the special casing form to apply, the capital letter sigma must
be preceded by 0 or more "case-ignorable" characters, preceded by
at least 1 "cased" character.
• Further, capital sigma must NOT be followed by 0 or more
case-ignorable characters and then at least 1 cased character.

"Case-ignorable" characters include certain punctuation marks, like
the apostrophe, as well as various accent marks. There are actually
close to 500 different case-ignorable characters, including accent marks
from Cyrillic, Hebrew, Armenian, Arabic, Syriac, Bengali, Gujarati,
Telugu, Tibetan, and many other alphabets. This category also includes
zero-width spaces, codepoints which indicate RTL/LTR text direction,
certain musical symbols, etc.

Since the rule involves scanning over "0 or more" of such
case-ignorable characters, it may be necessary to scan arbitrarily far
to the left and right of capital sigma to determine whether the special
lowercase form should be used or not. However, since we are trying to
be both memory-efficient and CPU-efficient, this implementation limits
how far to the left we will scan. Generally, we scan up to 63 characters
to the left looking for a "cased" character, but not more.

When scanning to the right, we go up to the end of the string if
necessary, even if it means scanning over thousands of characters.

Anyways, it is almost impossible to imagine that natural text will
include "words" with more than 63 successive apostrophes (for example)
followed by a capital sigma.

Closes GH-8096.

show more ...


Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3, php-8.2.0beta2, php-8.1.9, php-8.0.22, php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30, php-8.1.7RC1, php-8.0.20RC1, php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1, php-8.0.18, php-8.1.5, php-7.4.29, php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1, php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1, php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32, php-7.4.25, php-8.0.12, php-8.1.0RC4, php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1, php-8.1.0RC1, php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3, php-8.0.10RC1, php-7.4.23RC1, php-8.1.0beta2, php-8.0.9, php-7.4.22, php-8.1.0beta1, php-7.4.22RC1, php-8.0.9RC1, php-8.1.0alpha3, php-7.4.21, php-7.3.29, php-8.0.8, php-8.1.0alpha2, php-7.4.21RC1, php-8.0.8RC1
# 4083600b 11-Jun-2021 Nikita Popov

Port mbstring to use EXTENSIONS

Revision tags: php-8.1.0alpha1, php-8.0.7, php-7.4.20, php-8.0.7RC1, php-7.4.20RC1, php-8.0.6, php-7.4.19, php-7.4.18, php-7.3.28, php-8.0.5, php-8.0.5RC1, php-7.4.18RC1, php-8.0.4RC1, php-7.4.17RC1, php-8.0.3, php-7.4.16, php-8.0.3RC1, php-7.4.16RC1, php-8.0.2, php-7.4.15, php-7.3.27, php-8.0.2RC1, php-7.4.15RC2, php-7.4.15RC1, php-8.0.1, php-7.4.14, php-7.3.26, php-7.4.14RC1, php-8.0.1RC1, php-7.3.26RC1, php-8.0.0, php-7.3.25, php-7.4.13, php-8.0.0RC5, php-7.4.13RC1, php-8.0.0RC4, php-7.3.25RC1, php-7.4.12, php-8.0.0RC3, php-7.3.24, php-8.0.0RC2, php-7.4.12RC1, php-7.3.24RC1, php-7.2.34, php-8.0.0rc1, php-7.4.11, php-7.3.23, php-8.0.0beta4, php-7.4.11RC1, php-7.3.23RC1, php-8.0.0beta3, php-7.4.10, php-7.3.22, php-8.0.0beta2, php-7.3.22RC1, php-7.4.10RC1, php-8.0.0beta1, php-7.4.9, php-7.2.33, php-7.3.21
# 73dcfb6f 30-Jul-2020 Alex Dowad

Fix typos in mbstring tests

Man, I can be pedantic sometimes. Tiny little things like misspelled words just
hurt me inside. So while it's not really a big deal, I couldn't leave these ty

Fix typos in mbstring tests

Man, I can be pedantic sometimes. Tiny little things like misspelled words just
hurt me inside. So while it's not really a big deal, I couldn't leave these typos
alone...

show more ...

Revision tags: php-8.0.0alpha3, php-7.4.9RC1, php-7.3.21RC1, php-7.4.8, php-7.2.32, php-8.0.0alpha2, php-7.3.20, php-8.0.0alpha1, php-7.4.8RC1, php-7.3.20RC1, php-7.4.7, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1, php-7.4.6, php-7.2.31, php-7.4.6RC1, php-7.3.18RC1, php-7.2.30, php-7.4.5, php-7.3.17, php-7.4.5RC1, php-7.3.17RC1, php-7.3.18, php-7.4.4, php-7.2.29, php-7.3.16, php-7.4.4RC1, php-7.3.16RC1, php-7.4.3, php-7.2.28, php-7.3.15RC1, php-7.4.3RC1, php-7.3.15, php-7.2.27, php-7.4.2, php-7.3.14, php-7.3.14RC1, php-7.4.2RC1, php-7.4.1, php-7.2.26, php-7.3.13, php-7.4.1RC1, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1, php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1, php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16, php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1, php-7.3.2RC1, php-5.6.40, php-7.1.26, php-7.3.1, php-7.2.14, php-7.2.14RC1, php-7.3.1RC1, php-5.6.39, php-7.1.25, php-7.2.13, php-7.0.33, php-7.3.0, php-7.1.25RC1, php-7.2.13RC1, php-7.3.0RC6, php-7.1.24, php-7.2.12, php-7.3.0RC5, php-7.1.24RC1, php-7.2.12RC1, php-7.3.0RC4, php-7.1.23, php-7.2.11, php-7.3.0RC3, php-7.1.23RC1, php-7.2.11RC1, php-7.3.0RC2, php-5.6.38, php-7.1.22, php-7.3.0RC1, php-7.2.10, php-7.0.32, php-7.1.22RC1, php-7.3.0beta3, php-7.2.10RC1, php-7.1.21, php-7.2.9, php-7.3.0beta2, php-7.1.21RC1, php-7.3.0beta1, php-7.2.9RC1, php-5.6.37, php-7.1.20, php-7.3.0alpha4, php-7.0.31, php-7.2.8, php-7.1.20RC1, php-7.2.8RC1, php-7.3.0alpha3, php-7.3.0alpha2, php-7.1.19, php-7.2.7, php-7.1.19RC1, php-7.3.0alpha1, php-7.2.7RC1, php-7.1.18, php-7.2.6, php-7.2.6RC1, php-7.1.18RC1, php-5.6.36, php-7.2.5, php-7.1.17, php-7.0.30, php-7.1.17RC1, php-7.2.5RC1, php-5.6.35, php-7.0.29, php-7.2.4, php-7.1.16, php-7.1.16RC1, php-7.2.4RC1, php-7.1.15, php-5.6.34, php-7.2.3, php-7.0.28, php-7.2.3RC1, php-7.1.15RC1
# c6c9e71a 03-Feb-2018 Gabriel Caruso

Add missing SKIPIF sections

Revision tags: php-7.1.14, php-7.2.2, php-7.1.14RC1, php-7.2.2RC1, php-7.1.13, php-5.6.33, php-7.2.1, php-7.0.27, php-7.2.1RC1, php-7.1.13RC1, php-7.0.27RC1, php-7.2.0, php-7.1.12, l, php-7.1.12RC1, php-7.2.0RC6, php-7.0.26RC1, php-7.1.11, php-5.6.32, php-7.2.0RC5, php-7.0.25, php-7.1.11RC1, php-7.2.0RC4, php-7.0.25RC1, php-7.1.10, php-7.2.0RC3, php-7.0.24, php-7.2.0RC2, php-7.1.10RC1, php-7.0.24RC1, php-7.1.9, php-7.2.0RC1, php-7.0.23, php-7.1.9RC1, php-7.2.0beta3, php-7.0.23RC1, php-7.1.8, php-7.2.0beta2, php-7.0.22
# 582a65b0 27-Jul-2017 Nikita Popov

Implement full case mapping

Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

Implement full case mapping

Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
is implemented. The only language-agnostic conditional case
mapping rule relates to Greek sigma in final position
(Final_Sigma). Correctly handling this requires both arbitrary
lookahead and lookbehind, which would require some larger
changes to how the case mapping is implemented. This is a
possible future extension.
* The only language-specific handling that is implemented is
for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
is used. This matches the previous behavior and makes sure
that no codepoints not supported by the encoding are
produced. A future extension would be to also handle the
Turkish mappings specified by SpecialCasing.txt based on
the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
operations continue to use simple case folding. The reason is
that full case folding of the haystack string may change the
position at which a match occurred. This would have to be
mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
mapping / folding, where full is the default. The constants
are:

* MB_CASE_LOWER (used by mb_strtolower)
* MB_CASE_UPPER (used by mb_strtolower)
* MB_CASE_TITLE
* MB_CASE_FOLD
* MB_CASE_LOWER_SIMPLE
* MB_CASE_UPPER_SIMPLE
* MB_CASE_TITLE_SIMPLE
* MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)

show more ...