History log of /PHP-8.2/ext/mbstring/mbstring.c (Results 101 – 125 of 857)
Revision Date Author Comments
# d6fc1650 15-Jul-2022 Christoph M. Becker

Drop useless TODO comment

Cf. <https://github.com/php/php-src/pull/9018#issuecomment-1185481492>.


# 56137cd2 23-Jun-2022 Máté Kocsis

Declare ext/mbstring constants in stubs (#8798)


# 880803a2 13-May-2022 Alex Dowad

Use fast conversion filters to implement php_mb_ord

Even for single-character strings, this is about 50% faster for
ASCII, UTF-8, and UTF-16. For long strings, the performance gain is
enormous, since the old code would convert the ENTIRE string, just
to pick out the first codepoint.
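
As an illustration of the idea (not the code from this commit), the sketch below decodes only the first UTF-8 code point of a buffer and stops, instead of converting the whole string. The helper name `first_codepoint_utf8` and its return convention are hypothetical and do not exist in mbstring.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of mbstring: decode the first UTF-8 code
 * point of s. Returns the number of bytes consumed, or 0 if the input is
 * empty or its leading sequence is invalid. */
static size_t first_codepoint_utf8(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t need;      /* number of continuation bytes expected */
    uint32_t cp, min; /* min = smallest value allowed for this length */

    if (len == 0) return 0;
    if (s[0] < 0x80) { *out = s[0]; return 1; }

    if ((s[0] & 0xE0) == 0xC0)      { need = 1; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; cp = s[0] & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */

    if (len < need + 1) return 0; /* truncated sequence */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

    *out = cp;
    return need + 1;
}
```

The work done is bounded by the length of one character, so the cost no longer depends on the length of the string.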


# 950a7db9 02-May-2022 Alex Dowad

Use fast text conversion filters to implement mb_check_encoding

Benchmarking reveals that this is about 8% slower for UTF-8 strings
which have a bad codepoint at the very beginning of the string.
For good strings, or those where the first bad codepoint is much
later in the string, it is significantly faster (2-3 times faster
in many cases).


# 2eb2f9d7 02-Jun-2022 Remi Collet

Fix GH-8685 mbstring requires pcre


# 49202116 28-May-2022 Alex Dowad

php_mb_convert_encoding{,_ex} returns zend_string

That's what all existing callers want anyways. This avoids 2
unnecessary copies of the converted string.


# 0154a5ac 13-May-2022 Alex Dowad

Use fast text conversion filters to implement php_mb_convert_encoding_ex


# 03816fba 18-Jan-2022 Christoph M. Becker

Fix GH-7902: mb_send_mail may delimit headers with LF only

Email headers are supposed to be separated with CRLF. Period.

We introduce a `CRLF` macro for better comprehensibility right away.

Closes GH-7907.
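
A minimal sketch of what such a macro buys, assuming it expands to "\r\n" (the commit message names the macro but does not show its definition). The helper `append_header` is hypothetical and is not taken from mbstring.c.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed definition; the actual macro in mbstring.c may differ. */
#define CRLF "\r\n"

/* Hypothetical helper: write "name: value" followed by CRLF into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the line does not fit in cap bytes. */
static int append_header(char *buf, size_t cap, const char *name, const char *value)
{
    int n = snprintf(buf, cap, "%s: %s" CRLF, name, value);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Because adjacent string literals are concatenated at compile time, the delimiter is spelled once and cannot silently degrade to a bare "\n" at one call site.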


# 3c732251 21-Jul-2021 Alex Dowad

New internal interface for fast text conversion in mbstring

When converting text to/from wchars, mbstring makes one function call
for each and every byte or wchar to be converted. Typically, each of
these conversion functions contains a state machine, and its state has
to be restored and then saved for every single one of these calls.
It doesn't take much to see that this is grossly inefficient.

Instead of converting one byte or wchar on each call, the new
conversion functions will either fill up or drain a whole buffer of
wchars on each call. In benchmarks, this is about 3-10× faster.

Adding the new, faster conversion functions for all supported legacy
text encodings still needs some work. Also, all the code which uses
the old-style conversion functions needs to be converted to use the
new ones. After that, the old code can be dropped. (The mailparse
extension will also have to be fixed up so it will still compile.)
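
The buffer-at-a-time shape described above can be sketched as follows. This is a toy decoder for ISO-8859-1 with a hypothetical signature; the real mbstring function names, parameters, and state handling are not shown in this log and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-filling decoder for ISO-8859-1 (each byte is one
 * code point). One call converts as many bytes as fit into buf, then
 * advances *in and decrements *in_len so the caller can resume. */
static size_t latin1_to_wchar(const unsigned char **in, size_t *in_len,
                              uint32_t *buf, size_t buf_size)
{
    const unsigned char *p = *in;
    size_t n = *in_len < buf_size ? *in_len : buf_size;

    for (size_t i = 0; i < n; i++) {
        buf[i] = p[i];
    }
    *in = p + n;
    *in_len -= n;
    return n; /* number of wchars written to buf */
}

/* Example caller: one function call per 128 wchars rather than one per
 * byte, so any decoder state is saved and restored far less often. */
static size_t count_codepoints(const unsigned char *s, size_t len)
{
    uint32_t buf[128];
    size_t total = 0;

    while (len > 0) {
        total += latin1_to_wchar(&s, &len, buf, 128);
    }
    return total;
}
```

For a stateful encoding the decoder would also need to carry its state between calls; that part is omitted here to keep the sketch short.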


# f07c1935 05-Dec-2021 Alex Dowad

mb_convert_encoding will not auto-detect input string as UUEncode, Base64, QPrint

In a2bc57e0e5, mb_detect_encoding was modified to ensure it would never
return 'UUENCODE', 'QPrint', or other non-encodings as the "detected
text encoding". Before mb_detect_encoding was enhanced so that it could
detect any supported text encoding, those were never returned, and they
are not desired. Actually, we want to eventually remove them completely
from mbstring, since PHP already contains other implementations of
UUEncode, QPrint, Base64, and HTML entities.

For more clarity on why we need to suppress UUEncode, etc. from being
detected by mb_detect_encoding, the existing UUEncode implementation
in mbstring *never* treats any input as erroneous. It just accepts
everything. This means that it would *always* be treated as a valid
choice by mb_detect_encoding, and would be returned in many, many cases
where the input is obviously not UUEncoded.

It turns out that the form of mb_convert_encoding where the user passes
multiple candidate encodings (and mbstring auto-detects which one to
use) was also affected by the same issue. Apply the same fix.
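
To illustrate why an always-accepting validator breaks detection, here is a simplified sketch. The `candidate` struct, the `is_text_encoding` flag, and `detect` are hypothetical; mbstring's real detection logic is more involved and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate record, for illustration only. */
typedef struct {
    const char *name;
    bool (*valid)(const unsigned char *s, size_t len);
    bool is_text_encoding; /* false for UUEncode, Base64, QPrint, ... */
} candidate;

static bool valid_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x80) return false;
    }
    return true;
}

/* Stands in for a decoder that never reports an error. */
static bool accept_all(const unsigned char *s, size_t len)
{
    (void)s;
    (void)len;
    return true;
}

/* Return the first candidate that is a real text encoding and accepts
 * the input, or NULL if none does. Without the is_text_encoding check,
 * any candidate using accept_all would match every input. */
static const char *detect(const candidate *c, size_t n,
                          const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (!c[i].is_text_encoding) continue;
        if (c[i].valid(s, len)) return c[i].name;
    }
    return NULL;
}
```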


# 1a4f49f1 10-Nov-2021 Dmitry Stogov

Use cheaper memchr() instead of php_memnstr()


# 9308974f 18-Oct-2021 Alex Dowad

Deprecate use of mbstring to convert text to Base64/QPrint/HTML entities/etc

The purpose of mbstring is for working with Unicode and legacy text
encodings; but Base64, QPrint, etc. are not text encodings and don't
really belong in mbstring. PHP already contains separate implementations
of Base64, QPrint, and HTML entities. It will be better to eventually
remove these non-encodings from mbstring.

Regarding HTML entities... there is a bit more to say. mbstring's
implementation of HTML entities is different from the other built-in
implementation (htmlspecialchars and htmlentities). Those functions
convert <, >, and & to HTML entities, but mbstring does not.

It appears that the original author of mbstring intended for something
to be done with <, >, and &. He used a table to identify which
characters should be converted to HTML entities, and </>/& all have a
special value in that table. However, nothing ever checks for that
special value, so the characters are passed through unconverted.

This seems like a very useless implementation of HTML entities. The most
important characters which need to be expressed as entities in HTML
documents are those three!


# d3d6d790 21-Oct-2021 Christoph M. Becker

Fix #76167: mbstring may use pointer from some previous request

We must not reuse per-request memory across multiple requests, so this
check triggered during RINIT makes no sense. As explained in the bug
report[1], it can be even harmful, if some request startup fails, and
the pointers refer to already freed memory in the next request.

[1] <https://bugs.php.net/76167>

Closes GH-7604.


# a2bc57e0 18-Oct-2021 Alex Dowad

mb_detect_encoding will not return non-encodings

Among the text encodings supported by mbstring are several which are
not really 'text encodings'. These include Base64, QPrint, UUencode,
HTML entities, '7 bit', and '8 bit'.

Rather than providing an explicit list of text encodings which they are
interested in, users may pass the output of mb_list_encodings to
mb_detect_encoding. Since Base64, QPrint, and so on are included in
the output of mb_list_encodings, mb_detect_encoding can return one of
these as its 'detected encoding' (and in fact, this often happens).
Before mb_detect_encoding was enhanced so it could detect any of the
supported text encodings, this did not happen, and it is never desired.


# dcaa010f 07-Aug-2020 Alex Dowad

Strict validation of conversion flags to mb_convert_kana

mb_convert_kana is controlled by user-provided flags, which specify what it should convert
and to what. These flags come in inverse pairs, for example "fullwidth numerals to halfwidth
numerals" and "halfwidth numerals to fullwidth numerals". It does not make sense to combine
inverse flags.

But, clever reader of commit logs, you will surely say: What if I want all my halfwidth
numerals to become fullwidth, and all my fullwidth numerals to become halfwidth? Much too
clever, you are! Let's put aside the fact that this bizarre switch-up is ridiculous and
will never be used, and face up to another stark reality: mb_convert_kana does not work
for that case, and never has. This was probably never noticed because nobody ever tried.

Disallowing useless combinations of flags gives freedom to rearrange the kana conversion
code without changing behavior.

We can also reject unrecognized flags. This may help users to catch bugs.

Interestingly, the existing tests used a 'Z' flag, which is useless (it's not recognized
at all).
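
A rough sketch of this kind of validation, under the assumption that each lowercase/uppercase letter pair (r/R, n/N, a/A, s/S, k/K, h/H, c/C) names two inverse conversions and that 'V' has no inverse. The function name and error codes are hypothetical, and the checks in the actual commit may be stricter than this.

```c
#include <string.h>

enum { KANA_OK = 0, KANA_UNKNOWN_FLAG = -1, KANA_INVERSE_PAIR = -2 };

/* Hypothetical validator: rejects unrecognized flag letters and any
 * combination that requests both a conversion and its inverse. */
static int validate_kana_flags(const char *flags)
{
    static const char lower[] = "rnaskhc";
    static const char upper[] = "RNASKHC";
    unsigned seen_lower = 0, seen_upper = 0;

    for (const char *p = flags; *p != '\0'; p++) {
        const char *lo = strchr(lower, *p);
        const char *up = strchr(upper, *p);

        if (lo != NULL) {
            seen_lower |= 1u << (lo - lower);
        } else if (up != NULL) {
            seen_upper |= 1u << (up - upper);
        } else if (*p != 'V') {
            return KANA_UNKNOWN_FLAG; /* e.g. the 'Z' used by the old tests */
        }
    }
    /* Same bit set in both masks means a flag and its inverse were given. */
    return (seen_lower & seen_upper) ? KANA_INVERSE_PAIR : KANA_OK;
}
```

With this sketch, "KV" is accepted, "nN" is rejected as an inverse pair, and "Z" is rejected as unrecognized.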


# 78004912 22-Sep-2021 Alex Dowad

Inline SKIP_LONG_HEADER... macro which is only used once

I don't find that pulling this code out into a macro makes anything
clearer. Not at all.


# 8c32deb6 18-Oct-2020 Alex Dowad

Don't check for impossible error condition in mb_strwidth


# bf78070c 18-Oct-2020 Alex Dowad

Don't check for impossible error condition in mb_strlen


# d3f56e5a 18-Oct-2020 Alex Dowad

Rename php_mb_mbchar_bytes_ex to php_mb_mbchar_bytes

...And remove the original php_mb_mbchar_bytes, which was not being
used.


# 774cd960 18-Oct-2020 Alex Dowad

No need to null-terminate buffer in php_mb_chr

`mbfl_buffer_converter_feed_result` will not overrun the specified length.


# abf83e50 18-Oct-2020 Alex Dowad

Rename php_mb_safe_strrchr_ex to php_mb_safe_strrchr

...And remove the original php_mb_safe_strrchr, which was not being
used anywhere.


# 46315def 23-Sep-2021 Nikita Popov

Use locale-independent case conversion in mb_send_mail()

Headers should not be processed in a locale-dependent fashion.
Switch from upper to lowercasing because that's the standard for
PHP and we provide an ASCII implementation of this operation.

This is adapted from GH-7506.
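
For illustration, an ASCII-only lowercase comparison might look like the sketch below. The helper names are hypothetical; PHP ships its own ASCII case-conversion routines, which are not reproduced here.

```c
/* Hypothetical helper: lowercase A-Z only, ignoring the current locale. */
static char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Compare two header names case-insensitively using ASCII rules only.
 * Returns 1 if they match, 0 otherwise. */
static int header_name_equals(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (ascii_tolower(*a) != ascii_tolower(*b)) return 0;
        a++;
        b++;
    }
    return *a == *b; /* both strings must end at the same position */
}
```

The C library's tolower() and toupper() consult the active locale, and in some locales (a Turkish locale is the usual example) they may map 'I' or 'i' to a non-ASCII character, which would make a header-name comparison fail.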


# 36c979e2 18-Oct-2020 Alex Dowad

Use stack-allocated buffer in php_mb_chr


# 1170981b 18-Sep-2021 Alex Dowad

Fix mb_str_split on empty strings in variable-length text encodings

Previously, when passed an empty string, and given an encoding which
uses a variable number of bytes per character (and which doesn't have
a 'character length table'), mb_str_split would return an array
containing a single empty string, rather than an empty array.

The ISO-2022 encodings are among those which were affected by this bug.


# ca33ab59 06-Sep-2021 Alex Dowad

mb_detect_encoding with only one candidate encoding uses mb_check_encoding

...Because it's about 5% faster.

