History log of /PHP-8.1/ext/mbstring/libmbfl/mbfl/mbfl_convert.c (Results 1 – 25 of 51)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
Revision tags: php-8.1.7RC1, php-8.1.4RC1, php-8.1.3, php-8.1.2RC1
# 929d8471 03-Dec-2021 Christoph M. Becker

Fix #81693: mb_check_encoding(7bit) segfaults

`php_mb_check_encoding()` now uses conversion to `mbfl_encoding_wchar`.
Since `mbfl_encoding_7bit` has no `input_filter`, no filter can be

Fix #81693: mb_check_encoding(7bit) segfaults

`php_mb_check_encoding()` now uses conversion to `mbfl_encoding_wchar`.
Since `mbfl_encoding_7bit` has no `input_filter`, no filter can be
found. Since we don't actually need to convert to wchar, we encode to
8bit.

Closes GH-7712.

show more ...

Revision tags: php-8.1.0, php-7.3.33, php-7.3.32, php-7.3.31
# f303fc8a 30-Aug-2021 Alex Dowad

Use bool in mbfl_filt_conv_output_hex (rather than int)

# 776296e1 30-Aug-2021 Alex Dowad

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like "BAD+XXXX", where "XXXX"
would be the erroneous bytes expressed in hexadecimal. This mode could
be enabled by calling `mb_substitute_character("long")`.

However, accurately reproducing input byte sequences from the cached
state of a conversion filter is often tricky, and this significantly
complicates the implementation. Further, the means used for passing
the erroneous bytes through to where the "BAD+XXXX" text is generated
only allows for up to 3 bytes to be passed, meaning that some erroneous
byte sequences are truncated anyways.

More to the point, a search of publically available PHP code indicates
that nobody is really using this feature anyways.

Incidentally, this feature also provided error output like "JIS+XXXX"
if the input 'should have' represented a JISX 0208 codepoint, but it
decodes to a codepoint which does not exist in the JISX 0208 charset.
Similarly, specific error output was provided for non-existent
JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few
other charsets. All of that is now consigned to the flames.

However, "long" error markers also include a somewhat more useful
"U+XXXX" marker for Unicode codepoints which were successfully
decoded from the input text, but cannot be represented in the output
encoding. Those are still supported.

With this change, there is no need to use a variety of special values
in the high bits of a wchar to represent different types of error
values. We can (and will) just use a single error value. This will be
equal to -1.

One complicating factor: Text conversion functions return an integer to
indicate whether the conversion operation should be immediately
aborted, and the magic 'abort' marker is -1. Also, almost all of these
functions would return the received byte/codepoint to indicate success.
That doesn't work with the new error value; if an input filter detects
an error and passes -1 to the output filter, and the output filter
returns it back, that would be taken to mean 'abort'.

Therefore, amend all these functions to return 0 for success.

show more ...

Revision tags: php-7.3.30
# 97b7fc89 24-Jul-2021 Alex Dowad

Output illegal character marker for 4-byte illegal characters > 0x7FFFFFFF

Some text encodings supported by mbstring (such as UCS-4) accept 4-byte
characters. When mbstring encounters an

Output illegal character marker for 4-byte illegal characters > 0x7FFFFFFF

Some text encodings supported by mbstring (such as UCS-4) accept 4-byte
characters. When mbstring encounters an illegal byte sequence for the
encoding it is using, it should emit an 'illegal character' marker,
which can either be a single character like '?', an HTML hexadecimal
entity, or a marker string like 'BAD+XXXX'.

Because of the use of signed integers to hold 4-byte characters,
illegal 4-byte sequences with a 'negative' value (one with the high
bit set) were not handled correctly when emitting the illegal char
marker. The result is that such illegal sequences were just skipped
over (and the marker was not emitted to the output). Fix that.

show more ...

Revision tags: php-7.3.29, php-7.3.28, php-7.3.27, php-7.3.26, php-7.3.26RC1, php-7.3.25, php-7.3.25RC1, php-7.3.24
# e2459857 22-Oct-2020 Alex Dowad

Remove duplicate implementation of CP932 from mbstring

Sigh. Double sigh. After fruitlessly searching the Internet for information on
this mysterious text encoding called "SJIS-open", I

Remove duplicate implementation of CP932 from mbstring

Sigh. Double sigh. After fruitlessly searching the Internet for information on
this mysterious text encoding called "SJIS-open", I wrote a script to try
converting every Unicode codepoint from 0-0xFFFF and compare the results from
different variants of Shift-JIS, to see which one "SJIS-open" would be most
similar to.

The result? It's just CP932. There is no difference at all. So why do we have
two implementations of CP932 in mbstring?

In case somebody, somewhere is using "SJIS-open" (or its aliases "SJIS-win" or
"SJIS-ms"), add these as aliases to CP932 so existing code will continue to
work.

show more ...

# a900ec33 03-Jan-2021 Alex Dowad

Remove unneeded 'filter_ctor' member from mbfl_convert_filter struct

This function pointer is only called when initializing the struct. After that
nothing is done with it. Therefore, the

Remove unneeded 'filter_ctor' member from mbfl_convert_filter struct

This function pointer is only called when initializing the struct. After that
nothing is done with it. Therefore, there is no need to keep it in the struct.

show more ...

# e169ad3b 03-Nov-2020 Alex Dowad

Consolidate all single-byte encodings in one source file

We can squeeze out a lot of duplicated code in this way.

# b05ad511 09-Nov-2020 Alex Dowad

Don't redundantly flush mbstring filters multiple times

Each flush function in a chain of mbstring conversion filters always
calls the next flush function in the chain. So it is not nece

Don't redundantly flush mbstring filters multiple times

Each flush function in a chain of mbstring conversion filters always
calls the next flush function in the chain. So it is not necessary to
explicitly flush the second filter in a chain. (Due to this bug, in many
cases, flush functions were actually being called three times.)

show more ...

# 3e7acf90 04-Nov-2020 Alex Dowad

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.

One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.

Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.

Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.

show more ...

# cc03c54c 04-Nov-2020 Alex Dowad

Remove useless byte{2,4}{be,le} encodings from mbstring

There is no meaningful difference between these and UCS-{2,4}. They are
just a little bit more lax about passing errors silently.

Remove useless byte{2,4}{be,le} encodings from mbstring

There is no meaningful difference between these and UCS-{2,4}. They are
just a little bit more lax about passing errors silently. They also have
no known use.

Alias to UCS-{2,4} in case someone, somewhere is using them.

show more ...

# fde77945 18-Oct-2020 Alex Dowad

Remove dead code from mbfilter_iso8859_{2,4,5,9,10,13,14,15,16}.c

...Plus some dead code related to ISO-8859-1.

Revision tags: php-7.3.24RC1, php-7.3.23
# 3f1851de 19-Sep-2020 Alex Dowad

Avoid compiler warnings related to mbstring flush functions

# b1c5532a 15-Sep-2020 Remi Collet

fix mbfl function prototypes
re-add mbfl_convert_filter_feed API
re-add pointer cast

Revision tags: php-7.3.23RC1, php-7.3.22, php-7.3.22RC1, php-7.3.21, php-7.3.21RC1
# dcd6c604 16-Jul-2020 Alex Dowad

Remove unneeded function mbfl_filt_conv_common_dtor

This is a default destructor for mbfl_convert_filter structs. The thing is: there
isn't really anything that needs to be done to those

Remove unneeded function mbfl_filt_conv_common_dtor

This is a default destructor for mbfl_convert_filter structs. The thing is: there
isn't really anything that needs to be done to those structs before freeing them.
The default destructor just zeroed out some fields, but there's no reason why
we should actually do that.

show more ...

# 409aa20a 15-Jul-2020 Alex Dowad

Refactor mbfl_convert.c

Revision tags: php-7.3.20
# 62317d59 04-Jul-2020 Alex Dowad

Remove redundant includes from mbstring (and make sure correct config.h is used)

Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from withi

Remove redundant includes from mbstring (and make sure correct config.h is used)

Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!

This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.

Make sure we actually include the file which was intended.

show more ...

# a64241b5 27-Jun-2020 Alex Dowad

Remove unused functions from mbstring

- mbfl_buffer_converter_reset
- mbfl_buffer_converter_strncat
- mbfl_buffer_converter_getbuffer
- mbfl_oddlen
- mbfl_filter_output_pipe_

Remove unused functions from mbstring

- mbfl_buffer_converter_reset
- mbfl_buffer_converter_strncat
- mbfl_buffer_converter_getbuffer
- mbfl_oddlen
- mbfl_filter_output_pipe_flush
- mbfl_memory_device_output2
- mbfl_memory_device_output4
- mbfl_is_support_encoding
- mbfl_buffer_converter_feed2
- _php_mb_regex_globals_dtor
- mime_header_encoder_feed
- mime_header_decoder_feed
- mbfl_convert_filter_feed

show more ...

# d4ef7ef1 27-Jun-2020 Alex Dowad

Inline unneeded indirection for mbstring memory management

All memory allocation and deallocation for mbstring bounces through a table of
function pointers before going to emalloc/efree/

Inline unneeded indirection for mbstring memory management

All memory allocation and deallocation for mbstring bounces through a table of
function pointers before going to emalloc/efree/etc. But this is unnecessary.
The allocators are never swapped out. Better to just call them directly.

show more ...

Revision tags: php-7.3.20RC1, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1
# a0cae937 04-May-2020 Nikita Popov

Spec mbfl allocators as infallible

And remove all NULL checks.

Revision tags: php-7.3.18RC1, php-7.2.30, php-7.3.17, php-7.3.17RC1, php-7.3.18, php-7.3.16, php-7.3.16RC1, php-7.3.15RC1, php-7.3.15, php-7.3.14, php-7.3.14RC1, php-7.3.13, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1
# 7b152990 09-Aug-2019 Nikita Popov

Don't short-circuit MBFL_OUTPUTFILTER_ILLEGAL_MODE_NONE

Make sure we always go through mbfl_filt_conv_illegal_output(), so
that the number of illegal characters gets counted.

Revision tags: php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1, php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16, php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1, php-7.3.2RC1, php-5.6.40, php-7.1.26, php-7.3.1, php-7.2.14, php-7.2.14RC1, php-7.3.1RC1, php-5.6.39, php-7.1.25, php-7.2.13, php-7.0.33, php-7.3.0, php-7.1.25RC1, php-7.2.13RC1, php-7.3.0RC6, php-7.1.24, php-7.2.12, php-7.3.0RC5, php-7.1.24RC1, php-7.2.12RC1, php-7.3.0RC4
# 1ad08256 14-Oct-2018 Peter Kokot

Sync leading and final newlines in source code files

This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.

Sync leading and final newlines in source code files

This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.

According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.

C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."

Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2

show more ...

Revision tags: php-7.1.23, php-7.2.11, php-7.3.0RC3
# aec64214 02-Oct-2018 Nikita Popov

Merge branch 'PHP-7.3'


# 9cfd8f43 02-Oct-2018 Nikita Popov

Don't fall back to vtbl_pass if no matching vtbl found

If we don't know how to convert between two encodings, make sure
we error instead of ignoring the issue.

Explicitly use vt

Don't fall back to vtbl_pass if no matching vtbl found

If we don't know how to convert between two encodings, make sure
we error instead of ignoring the issue.

Explicitly use vtbl_pass if we are round-tripping wchar->wchar or
8bit->8bit. Fingers crossed that nothing else relies on the
vtbl_pass fallback...

show more ...

Revision tags: php-7.1.23RC1, php-7.2.11RC1, php-7.3.0RC2, php-5.6.38, php-7.1.22, php-7.3.0RC1, php-7.2.10, php-7.0.32
# 6c1ff61a 05-Sep-2018 Peter Kokot

Remove HAVE_STDDEF_H

The `<stddef.h>` header file is part of the standard C89 headers [1] and
on current systems there is no need for a manual check if header is
present.

Si

Remove HAVE_STDDEF_H

The `<stddef.h>` header file is part of the standard C89 headers [1] and
on current systems there is no need for a manual check if header is
present.

Since PHP requires at least C89 the `HAVE_STDDEF_H` symbol isn't defined
by Autoconf anywhere else anymore [2] and accross the PHP source code
the header is included unconditionally already.

This patch syncs this also for the bundled libmbfl which is maintaned as
a fork in php-src.

Refs:
[1] https://port70.net/~nsz/c/c89/c89-draft.html#4.1.2
[2] https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4

show more ...

Revision tags: php-7.1.22RC1, php-7.3.0beta3, php-7.2.10RC1, php-7.1.21, php-7.2.9, php-7.3.0beta2, php-7.1.21RC1, php-7.3.0beta1, php-7.2.9RC1, php-5.6.37, php-7.1.20, php-7.3.0alpha4, php-7.0.31, php-7.2.8, php-7.1.20RC1, php-7.2.8RC1, php-7.3.0alpha3, php-7.3.0alpha2, php-7.1.19, php-7.2.7, php-7.1.19RC1, php-7.3.0alpha1, php-7.2.7RC1, php-7.1.18, php-7.2.6, php-7.2.6RC1, php-7.1.18RC1, php-5.6.36, php-7.2.5, php-7.1.17, php-7.0.30, php-7.1.17RC1, php-7.2.5RC1, php-5.6.35, php-7.0.29, php-7.2.4, php-7.1.16, php-7.1.16RC1, php-7.2.4RC1, php-7.1.15, php-5.6.34, php-7.2.3, php-7.0.28, php-7.2.3RC1, php-7.1.15RC1, php-7.1.14, php-7.2.2, php-7.1.14RC1, php-7.2.2RC1, php-7.1.13, php-5.6.33, php-7.2.1, php-7.0.27, php-7.2.1RC1, php-7.1.13RC1, php-7.0.27RC1, php-7.2.0, php-7.1.12, l, php-7.1.12RC1, php-7.2.0RC6, php-7.0.26RC1, php-7.1.11, php-5.6.32, php-7.2.0RC5, php-7.0.25, php-7.1.11RC1, php-7.2.0RC4, php-7.0.25RC1, php-7.1.10, php-7.2.0RC3, php-7.0.24, php-7.2.0RC2, php-7.1.10RC1, php-7.0.24RC1, php-7.1.9, php-7.2.0RC1, php-7.0.23, php-7.1.9RC1, php-7.2.0beta3, php-7.0.23RC1
# f24db768 04-Aug-2017 Nikita Popov

Optimize mb_ord()

Don't perform a full encoding conversion into UCS4-BE, instead only
perform an input conversion into a wchar device.

123