History log of /PHP-8.2/ext/mbstring/mbstring.c (Results 126 – 150 of 857)
# 9e1447db 05-Sep-2021 Alex Dowad

Rename KANA2HIRA and HIRA2KANA constants (for mb_convert_kana)

mb_convert_kana is able to convert fullwidth katakana to fullwidth
hiragana (and vice versa). The constants referring to these modes had
names like MBFL_FILT_TL_ZEN2HAN_KANA2HIRA.

The "ZEN2HAN" part of the name is misleading, since these modes do not
convert fullwidth (zenkaku) kana to halfwidth (hankaku). The converted
characters are fullwidth both before and after the conversion. So...
let's name the constants accordingly.


# 776296e1 30-Aug-2021 Alex Dowad

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like "BAD+XXXX", where "XXXX"
would be the erroneous bytes expressed in hexadecimal. This mode could
be enabled by calling `mb_substitute_character("long")`.

However, accurately reproducing input byte sequences from the cached
state of a conversion filter is often tricky, and this significantly
complicates the implementation. Further, the means used for passing
the erroneous bytes through to where the "BAD+XXXX" text is generated
only allows for up to 3 bytes to be passed, meaning that some erroneous
byte sequences are truncated anyway.

More to the point, a search of publicly available PHP code indicates
that nobody is really using this feature anyway.

Incidentally, this feature also provided error output like "JIS+XXXX"
if the input 'should have' represented a JIS X 0208 codepoint, but it
decoded to a codepoint which does not exist in the JIS X 0208 charset.
Similarly, specific error output was provided for non-existent
JIS X 0212 codepoints, and likewise for JIS X 0213, CP932, and a few
other charsets. All of that is now consigned to the flames.

However, "long" error markers also include a somewhat more useful
"U+XXXX" marker for Unicode codepoints which were successfully
decoded from the input text, but cannot be represented in the output
encoding. Those are still supported.

With this change, there is no need to use a variety of special values
in the high bits of a wchar to represent different types of error
values. We can (and will) just use a single error value. This will be
equal to -1.

One complicating factor: Text conversion functions return an integer to
indicate whether the conversion operation should be immediately
aborted, and the magic 'abort' marker is -1. Also, almost all of these
functions would return the received byte/codepoint to indicate success.
That doesn't work with the new error value; if an input filter detects
an error and passes -1 to the output filter, and the output filter
returns it back, that would be taken to mean 'abort'.

Therefore, amend all these functions to return 0 for success.
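To make the conflict concrete, here is a minimal C sketch assuming a toy one-stage output filter. `FILTER_ERROR`, `FILTER_ABORT`, `sink`, `output_filter`, `output_filter_old`, and `run` are hypothetical names, not mbstring's real API; the only details taken from the commit are that the error value and the abort marker are both -1, and that success is now signalled by returning 0.

```c
#include <stddef.h>

/* Hypothetical names; both markers are -1, as described in the commit. */
#define FILTER_ERROR (-1) /* in-band value meaning "bad input here" */
#define FILTER_ABORT (-1) /* return value meaning "stop converting now" */

typedef struct {
    int out[16];
    size_t len;
} sink;

/* New convention: store the value (errors become '?') and return 0. */
static int output_filter(int c, sink *s)
{
    if (s->len >= sizeof s->out / sizeof s->out[0])
        return FILTER_ABORT; /* genuine abort: output buffer is full */
    s->out[s->len++] = (c == FILTER_ERROR) ? '?' : c;
    return 0;
}

/* Old convention applied to the new error value: echo the input back.
 * When c is FILTER_ERROR this returns -1, which the caller reads as abort. */
static int output_filter_old(int c, sink *s)
{
    if (s->len >= sizeof s->out / sizeof s->out[0])
        return FILTER_ABORT;
    s->out[s->len++] = (c == FILTER_ERROR) ? '?' : c;
    return c;
}

/* Feed decoded values to a filter, stopping on abort.
 * Returns how many values were processed without an abort. */
static size_t run(const int *in, size_t n, sink *s,
                  int (*filter)(int, sink *))
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (filter(in[i], s) == FILTER_ABORT)
            break;
    }
    return i;
}
```

With the input `{'A', FILTER_ERROR, 'B'}`, `run` gets through all three values using `output_filter`, but stops right after the error value using `output_filter_old`, because the echoed -1 is indistinguishable from an abort request.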


# 63901584 08-Jul-2021 Nikita Popov

Deprecate calling mb_check_encoding() without argument

Part of https://wiki.php.net/rfc/deprecations_php_8_1.


# e7135cb8 14-May-2021 George Peter Banyard

Use zend_string_equals_* API in a couple more places

Closes GH-6979


# aca6aefd 14-May-2021 George Peter Banyard

Remove 'register' type qualifier (#6980)

The compiler should be smart enough to optimize this on its own


# 01b3fc03 06-May-2021 KsaR

Update http->https in license (#6945)

1. Update http://www.php.net/license/3_01.txt to https, since the server redirects there anyway via a "Location:" header.
2. Update a few license 3.0 references to 3.01, as 3.0 states "php 5.1.1, 4.1.1, and earlier".
3. Some license comments read "at through the world-wide-web" while most omit "at", so "at" was deleted.
4. Fixed indentation in some files before the "|" character.


# 0cafd53d 04-May-2021 Christoph M. Becker

Fix #81011: mb_convert_encoding removes references from arrays

We need to dereference references.

Closes GH-6938.


# 09efad61 08-Apr-2021 George Peter Banyard

Use zend_string_equals_(literal_)ci() API more often

Also drive-by usage of zend_ini_parse_bool()

Closes GH-6844


# 5caaf40b 29-Sep-2020 George Peter Banyard

Introduce pseudo-keyword ZEND_FALLTHROUGH

And use it instead of comments
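A minimal sketch of how such a pseudo-keyword can be defined, assuming a compiler that understands GCC/Clang's `fallthrough` attribute and falling back to a no-op statement elsewhere. The macro name `MY_FALLTHROUGH` and the `tail_mix` helper are hypothetical; this is not claimed to be Zend's exact definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Use the compiler's fallthrough attribute when available, so that
 * -Wimplicit-fallthrough knows the fall-through is intentional. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define MY_FALLTHROUGH __attribute__((__fallthrough__))
# endif
#endif
#ifndef MY_FALLTHROUGH
# define MY_FALLTHROUGH ((void)0) /* harmless empty statement */
#endif

/* Mix the last (len % 4) bytes into a 32-bit value; each case
 * deliberately falls through to handle the remaining lower bytes. */
static uint32_t tail_mix(const unsigned char *p, size_t len)
{
    uint32_t h = 0;
    switch (len & 3) {
    case 3:
        h ^= (uint32_t)p[2] << 16;
        MY_FALLTHROUGH;
    case 2:
        h ^= (uint32_t)p[1] << 8;
        MY_FALLTHROUGH;
    case 1:
        h ^= p[0];
        break;
    default:
        break; /* length is a multiple of 4: no tail bytes */
    }
    return h;
}
```

Unlike a `/* fallthrough */` comment, the macro is a real statement, so the compiler itself can check that it really sits immediately before the next `case` label.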


# a06c20a1 18-Oct-2020 Alex Dowad

Remove useless constant MBFL_ENCTYPE_MBCS

This flag indicated that an encoding was 'multi-byte', meaning it could use a
variable number of bytes to encode each character. As it turns out, we don't
actually need to check this flag anywhere, so it's better to remove it.


# 3e01f5af 15-Jan-2021 Nikita Popov

Replace zend_bool uses with bool

We're starting to see a mix between uses of zend_bool and bool.
Replace all usages with the standard bool type everywhere.

Of course, zend_bool is retained as an alias.


# 72660c41 20-Sep-2020 Alex Dowad

Combine MBFL_ENCTYPE_WCS{2,4}{BE,LE} constants

These flags identify text encodings in mbstring which use a constant number of
bytes per character. While some parts of the code do use these flags, usually
to detect cases which can be optimized due to constant-width encoding, nothing
cares whether the encodings are 'LE' (little-endian) or 'BE' (big-endian).

So we can simplify things by combining constants.
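A sketch of the kind of shortcut constant-width encodings allow, assuming illustrative flags `ENC_FLAG_WCS2` and `ENC_FLAG_WCS4` (hypothetical names, not mbstring's real constants). The character count is the byte length divided by the unit width, and byte order never enters into the calculation, which is why separate BE and LE flags are unnecessary for this purpose.

```c
#include <stddef.h>

/* Hypothetical flags for constant-width encodings. */
enum {
    ENC_FLAG_WCS2 = 1 << 0, /* every character is 2 bytes (e.g. UCS-2BE/LE) */
    ENC_FLAG_WCS4 = 1 << 1  /* every character is 4 bytes (e.g. UCS-4BE/LE) */
};

/* Number of characters in byte_len bytes of a constant-width encoding,
 * or (size_t)-1 if the encoding is not constant-width and must be scanned. */
static size_t fixed_width_strlen(unsigned flags, size_t byte_len)
{
    if (flags & ENC_FLAG_WCS2)
        return byte_len / 2; /* same answer for big- or little-endian */
    if (flags & ENC_FLAG_WCS4)
        return byte_len / 4;
    return (size_t)-1;
}
```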


# e169ad3b 03-Nov-2020 Alex Dowad

Consolidate all single-byte encodings in one source file

We can squeeze out a lot of duplicated code in this way.


# 3e7acf90 04-Nov-2020 Alex Dowad

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.

One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.

Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same time, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.

Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.
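The scoring idea described above can be sketched in C as follows, assuming each candidate encoding has already been run through its conversion filter to produce an array of codepoints. `DECODE_ERROR`, `demerits`, and `pick_best` are hypothetical names, and the exact set of penalized codepoints here (C0 controls other than tab/LF/CR, DEL, and the BMP private use area U+E000..U+F8FF) is an assumption for illustration, not mbstring's precise rule.

```c
#include <stddef.h>
#include <stdint.h>

#define DECODE_ERROR UINT32_MAX /* hypothetical "invalid input" marker */

/* Count codepoints that make a candidate encoding look less likely.
 * Returns SIZE_MAX if the input was not valid in this encoding at all. */
static size_t demerits(const uint32_t *cps, size_t n)
{
    size_t score = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        if (c == DECODE_ERROR)
            return SIZE_MAX; /* invalid: reject this candidate */
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0x7F)
            score++; /* ESC, form feed, bell, ... */
        else if (c >= 0xE000 && c <= 0xF8FF)
            score++; /* private use area */
    }
    return score;
}

/* Index of the candidate with the fewest demerits (earliest wins ties).
 * A merely unlikely candidate still wins if nothing better exists. */
static size_t pick_best(const size_t *scores, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (scores[i] < scores[best])
            best = i;
    return best;
}
```

Penalizing rather than rejecting is what fixes the first problem the commit describes: a candidate with a few odd codepoints is ranked lower instead of being thrown out. A caller would still need to check whether the winning score is `SIZE_MAX`, meaning every candidate was invalid.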


# be1a2155 29-Aug-2020 Alex Dowad

Optimize (AND FIX) mb_check_encoding (cut execution time by 50%+)

Previously, `mb_check_encoding` did an awful lot of unneeded work. In order to
determine whether a string was valid or not, it would convert the whole string
into wchar (code points), which required dynamically allocating a (potentially
large) buffer. Then it would turn right around and convert that big ol' buffer
of code points back to the original encoding again. Finally, it would check
whether any invalid bytes were detected during that long and onerous process.

The thing is, mbstring _already_ has machinery for detecting whether a string
is valid in a certain encoding or not, and it doesn't require copying any data
around or allocating buffers. Better yet, it can fail fast when an invalid byte
is found. Why not use it? It's sure a lot faster!

Further, the legacy code was also badly broken. Why? Because aside from
checking whether illegal characters were detected, it would also check whether
the conversion to and from wchars was lossless. But, some encodings have
more than one valid encoding for the same character. In such cases, it is
not possible to make the conversion to and from wchars lossless for every
valid character. So `mb_check_encoding` would actually reject good strings
in a lot of encodings!
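To illustrate the fail-fast, allocation-free approach in isolation, here is a standalone UTF-8 validity check in C. It is a sketch of the general technique for one encoding only, not mbstring's code (which validates through its per-encoding filters); `utf8_valid` is a hypothetical name. It follows the RFC 3629 rules: no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if buf[0..len) is well-formed UTF-8. Returns at the first invalid
 * byte and never allocates or copies any data. */
static bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need = 0;       /* number of continuation bytes */
        unsigned long cp = 0;  /* decoded codepoint */
        if (c < 0x80) {
            i++;               /* ASCII */
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            return false;      /* stray continuation byte, C0/C1, or F5..FF */
        }
        if (len - i - 1 < need)
            return false;      /* sequence truncated by end of input */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((need == 2 && cp < 0x800) ||       /* overlong 3-byte form */
            (need == 3 && cp < 0x10000) ||     /* overlong 4-byte form */
            (cp >= 0xD800 && cp <= 0xDFFF) ||  /* surrogate */
            cp > 0x10FFFF)                     /* beyond Unicode range */
            return false;
        i += need + 1;
    }
    return true;
}
```

Because the loop returns as soon as it meets a bad byte, a string that is invalid near its start is rejected after a handful of bytes, in contrast to converting the whole string to codepoints and back first.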


# 7dc16374 12-Oct-2020 Alex Dowad

Remove unused IS_SJIS1 and IS_SJIS2 macros


# 9b4094c3 13-Oct-2020 Nikita Popov

Fix incorrect zpp parameter count in mb_substr() / mb_strcut()

These functions only accept 4 params.


# 124bce3c 13-Oct-2020 Nikita Popov

Fix argument nullability in mbstring

These arguments were declared nullable in stubs (and should be
nullable), but didn't accept null in zpp.


# 0ffc1f55 16-Jul-2020 Alex Dowad

Refactor mbfl_ident.c, mbfl_encoding.c, mbfl_memory_device.c, mbfl_string.c

- Make everything less gratuitously verbose
- Don't litter the code with lots of unneeded NULL checks (for things which
will never be NULL)
- Don't return success/failure code from functions which can never fail
- For encoding structs, don't use pointers to pointers to pointers for the
list of alias strings. Pointers to pointers (2 levels of indirection)
is what actually makes sense. This gets rid of some extraneous
dereference operations.


# e950ca13 20-Sep-2020 Máté Kocsis

Consolidate the usage of "either" and "one of" in error messages

Closes GH-6173


# c37a1cd6 10-Sep-2020 Máté Kocsis

Promote a few remaining errors in ext/standard

Closes GH-6110


# 1c81a345 14-Sep-2020 Máté Kocsis

Make mb_send_mail() consistent with mail()

The $additional_headers parameter shouldn't accept null.


# c98d4769 10-Sep-2020 Máté Kocsis

Consolidate new union type ZPP macro names

They will now follow the canonical order of types. Older macros are
left intact to maintain backward compatibility (BC).

Closes GH-6112


# f33fd9b7 11-Sep-2020 Nikita Popov

Throw ValueError on null bytes in mb_send_mail()

Instead of silently replacing with spaces.


# 5b78d76e 08-Sep-2020 Alex Dowad

mb_str_split is already documented on php.net

So remove TODO comment which implies that it's not.

