History log of /php-src/ext/mbstring/libmbfl/filters/mbfilter_utf8.c (Results 1 – 25 of 41)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# b0f7df1a 05-Dec-2023 Alex Dowad

Use optimized implementation of mb_strcut for Japanese mobile vendor UTF-8 variants

To facilitate sharing of mb_cut_utf8, I combined mbfilter_utf8.c and
mbfilter_utf8_mobile.c into a sin

Use optimized implementation of mb_strcut for Japanese mobile vendor UTF-8 variants

To facilitate sharing of mb_cut_utf8, I combined mbfilter_utf8.c and
mbfilter_utf8_mobile.c into a single source file.

show more ...


# 1f0cf133 30-Sep-2023 Alex Dowad

Add fast mb_strcut implementation for UTF-8

The old implementation runs through the entire string to pick out the
part which should be returned by mb_strcut. This creates significant

Add fast mb_strcut implementation for UTF-8

The old implementation runs through the entire string to pick out the
part which should be returned by mb_strcut. This creates significant
performance overhead. The new specialized implementation of mb_strcut
for UTF-8 usually only examines a few bytes around the starting and
ending cut points, meaning it generally runs in constant time.

For UTF-8 strings just a few bytes long, the new implementation is
around 10% faster (according to microbenchmarks which I ran locally).
For strings around 10,000 bytes in length, it is 50-300x faster.
(Yes, that is 300x and not 300%.)

The new implementation behaves identically to the old one on VALID
UTF-8 strings; a fuzzer was used to help ensure this is the case.
On invalid UTF-8 strings, there is a difference: in some cases, the
old implementation will pass invalid byte sequences through unchanged,
while in others it will remove them. The new implementation has
behavior which is perhaps slightly more predictable: it simply backs
up the starting and ending cut points to the preceding "starter
byte" (one which is not a UTF-8 continuation byte).

show more ...


# b721d0f7 10-Mar-2023 pakutoma

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are ca

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are cases where we want to use different logic for validation
and conversion. For example, if a string ends up with missing input
required by the encoding, or if a character is input that is invalid
as an encoding but can be converted, the conversion should succeed and
the validation should fail.

To achieve this, a function pointer mb_check_fn has been added to
struct mbfl_encoding to implement the logic used for validation.
Also, added implementation of validation logic for UTF-7, UTF7-IMAP,
ISO-2022-JP and JIS.

(The same change has already been made to PHP 8.2 and 8.3; see
6fc8d014df. This commit is backporting the change to PHP 8.1.)

show more ...


# 6fc8d014 21-Mar-2023 pakutoma

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are ca

Fix phpGH-10648: add check function pointer into mbfl_encoding

Previously, mbstring used the same logic for encoding validation as for
encoding conversion.

However, there are cases where we want to use different logic for validation
and conversion. For example, if a string ends up with missing input
required by the encoding, or if a character is input that is invalid
as an encoding but can be converted, the conversion should succeed and
the validation should fail.

To achieve this, a function pointer mb_check_fn has been added to
struct mbfl_encoding to implement the logic used for validation.
Also, added implementation of validation logic for UTF-7, UTF7-IMAP,
ISO-2022-JP and JIS.

show more ...


# 092ad3e4 05-Jan-2023 Alex Dowad

Optimize branch structure of UTF-8 decoder routine

I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!

(The asm is noticeably ti

Optimize branch structure of UTF-8 decoder routine

I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!

(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.

...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. Pretty slick stuff there, guys.)

Benchmark results:

UTF-8, short - to UTF-16LE faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short - to UTF-16BE faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long - to UTF-16BE faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long - to UTF-16LE faster by 1.01% (0.0209 vs 0.0211)

show more ...


# 0109aa62 28-Nov-2022 Alex Dowad

Simplify decoding filter for UTF-8

When decoding a 3-byte UTF-8 code unit, redundant checks for overlong
code unit and for illegal codepoints from U+D800-DFFF were included.
Both of

Simplify decoding filter for UTF-8

When decoding a 3-byte UTF-8 code unit, redundant checks for overlong
code unit and for illegal codepoints from U+D800-DFFF were included.
Both of these conditions are caught by the line which reads:

if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {

As such, there is no reason to check for the same error conditions again.

Likewise, when decoding a 4-byte UTF-8 code unit, there was a
redundant check for overlong code unit. That was already caught by the
line which reads:

if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {

show more ...


Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3
# 5812b4fe 04-Aug-2022 Alex Dowad

In legacy text conversion filters, reset filter state in 'flush' function

Up until now, I believed that mbstring had been designed such
that (legacy) text conversion filter objects shoul

In legacy text conversion filters, reset filter state in 'flush' function

Up until now, I believed that mbstring had been designed such
that (legacy) text conversion filter objects should not be
re-used after the 'flush' function is called to complete a
text conversion operation.

However, it turns out that the implementation of
_php_mb_encoding_handler_ex DID re-use filter objects
after flush. That means that functions which were based on
_php_mb_encoding_handler_ex, including mb_parse_str and
php_mb_post_handler, would break in some cases; state left
over from converting one substring (perhaps a variable name)
would affect the results of converting another substring
(perhaps the value of the same variable), and could cause
extraneous characters to get inserted into the output.

All this code should be deleted soon, but fixing it helps me
to avoid spurious failures when fuzzing the new/old code to
look for differences in behavior.

(This bug fix commit was originally applied to PHP-8.2 when fuzzing
the new mbstring text conversion code to check for differences with
the old code. Later, Kentaro Ohkouchi kindly reported a problem with
mb_encode_mimeheader under PHP 8.1 which was caused by the same issue.
Hence, this commit was backported to PHP-8.1.)

Fixes GH-9683.

show more ...


# f3c8efd7 04-Aug-2022 Alex Dowad

In legacy text conversion filters, reset filter state in 'flush' function

Up until now, I believed that mbstring had been designed such
that (legacy) text conversion filter objects shoul

In legacy text conversion filters, reset filter state in 'flush' function

Up until now, I believed that mbstring had been designed such
that (legacy) text conversion filter objects should not be
re-used after the 'flush' function is called to complete a
text conversion operation.

However, it turns out that the implementation of
_php_mb_encoding_handler_ex DID re-use filter objects
after flush. That means that functions which were based on
_php_mb_encoding_handler_ex, including mb_parse_str and
php_mb_post_handler, would break in some cases; state left
over from converting one substring (perhaps a variable name)
would affect the results of converting another substring
(perhaps the value of the same variable), and could cause
extraneous characters to get inserted into the output.

All this code should be deleted soon, but fixing it helps me
to avoid spurious failures when fuzzing the new/old code to
look for differences in behavior.

show more ...

# 128768a4 05-Aug-2022 Alex Dowad

Adjust number of error markers emitted for truncated UTF-8 code units

In 04e59c916f, I amended the UTF-8 conversion code, so that when given
invalid input, it would emit a number of erro

Adjust number of error markers emitted for truncated UTF-8 code units

In 04e59c916f, I amended the UTF-8 conversion code, so that when given
invalid input, it would emit a number of errors markers harmonizing
with the WHATWG's specification of the standard UTF-8 decoding
algorithm. (Which, gentle reader of commit logs, you can find online
at https://encoding.spec.whatwg.org/#utf-8-decoder.) However, the code
in 04e59c916f was faulty in the case that a truncated UTF-8 code unit
starts with 0xF1.

Then, in dc1ba61d09, when making a small refactoring to a different
part of the UTF-8 conversion code, I inexplicably broke part of the
working code, causing the same fault which was already present with
truncated UTF-8 code units starting with 0xF1 to also occur with
0xF2 and 0xF3 as well. I don't remember what inane thoughts I was
thinking when I pulled off this feat of utter mental confusion.

None of these cases were covered by unit tests, by the way.

Thankfully, my trusty fuzzer picked up on this when testing the
new implementation of mb_parse_str (since the legacy UTF-8
conversion filter did not suffer from the same problem, and I was
fuzzing to find any differences in behavior between the old and
new implementations).

Fortuitously, the fuzzer also picked up another issue which was
present in 04e59c916f. I was emitting only one error marker for
truncated code units starting with 0xE0 or 0xED, in cases where
the WHATWG standard indicates two should be emitted. Examples
are 0xE0 0x9F <END OF STRING> or 0xED 0xA0 <END OF STRING>.

Code units starting with 0xE0-0xED should have 3 bytes. If the
first byte is 0xE0, the second MUST be 0xA0 or greater. (Otherwise,
the codepoint could have fit in a two-byte code unit.) And if the
first byte is 0xED, the second MUST be 0x9F or less. According to
the WHATWG algorithm, step 4, if the second byte is outside the
legal range, then the decoder should emit an error... AND
reprocess the out-of-range byte. The reprocessing will then
cause another error. That's why the decoder should indicate two
errors and not one.

show more ...

Revision tags: php-8.2.0beta2, php-8.1.9, php-8.0.22, php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30, php-8.1.7RC1, php-8.0.20RC1, php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1
# dc1ba61d 23-Apr-2022 Alex Dowad

Simplify code for converting UTF-8

An overly complex boolean test was used to check if a 3-byte code unit
was valid. Convert it to an equivalent test with fewer terms.

# 3f12d26e 16-Apr-2022 Alex Dowad

Merge branch 'PHP-8.1'

* PHP-8.1:
Error handling for UTF-8 complies with WHATWG specification


# 04e59c91 15-Apr-2022 Alex Dowad

Error handling for UTF-8 complies with WHATWG specification

In 7502c86342, I adjusted the number of error markers emitted on
invalid UTF-8 text to be more consistent with mbstring's beha

Error handling for UTF-8 complies with WHATWG specification

In 7502c86342, I adjusted the number of error markers emitted on
invalid UTF-8 text to be more consistent with mbstring's behavior on
other text encodings (generally, it emits one error marker for one
unexpected byte). I didn't expect that anybody would actually care one
way or the other, but felt that it was better to be consistent than
not.

Later, Martin Auswöger kindly pointed out that the WHATWG encoding
specification, which governs how various text encodings are handled
by web browsers, does actually specify how many error markers should
be generated for any given piece of invalid UTF-8 text.

Until now, we have never really paid much attention to the WHATWG
specification, but we do want to comply with as many relevant
specifications as possible. And since PHP is commonly used for web
applications, compatibility with the behavior of web browsers is
obviously a good thing.

show more ...

Revision tags: php-8.0.18, php-8.1.5, php-7.4.29, php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1, php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1, php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32, php-7.4.25, php-8.0.12, php-8.1.0RC4, php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1, php-8.1.0RC1, php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3, php-8.0.10RC1, php-7.4.23RC1, php-8.1.0beta2, php-8.0.9, php-7.4.22
# 3c732251 21-Jul-2021 Alex Dowad

New internal interface for fast text conversion in mbstring

When converting text to/from wchars, mbstring makes one function call
for each and every byte or wchar to be converted. Typica

New internal interface for fast text conversion in mbstring

When converting text to/from wchars, mbstring makes one function call
for each and every byte or wchar to be converted. Typically, each of
these conversion functions contains a state machine, and its state has
to be restored and then saved for every single one of these calls.
It doesn't take much to see that this is grossly inefficient.

Instead of converting one byte or wchar on each call, the new
conversion functions will either fill up or drain a whole buffer of
wchars on each call. In benchmarks, this is about 3-10× faster.

Adding the new, faster conversion functions for all supported legacy
text encodings still needs some work. Also, all the code which uses
the old-style conversion functions needs to be converted to use the
new ones. After that, the old code can be dropped. (The mailparse
extension will also have to be fixed up so it will still compile.)

show more ...

# 626f0fec 30-Jul-2021 Alex Dowad

Remove some dead code from mbstring

mbstring has a great deal of dead code. Some common types are:

- Default switch clauses which will never be taken
- If clauses intended to co

Remove some dead code from mbstring

mbstring has a great deal of dead code. Some common types are:

- Default switch clauses which will never be taken
- If clauses intended to convert codepoints which were not present in
a conversion table... but the codepoint in question *is* in the table,
so the if clause is not needed.
- Bounds checks in places where it is not possible for a value to ever
be out of bounds.
- Checks to see if an unmatched Unicode codepoint is in CP932 extension
range 3... but every codepoint in range 3 is also in range 2, so no
codepoint will ever be matched and converted by that code.

show more ...

# 776296e1 30-Aug-2021 Alex Dowad

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like

mbstring no longer provides 'long' substitutions for erroneous input bytes

Previously, mbstring had a special mode whereby it would convert
erroneous input byte sequences to output like "BAD+XXXX", where "XXXX"
would be the erroneous bytes expressed in hexadecimal. This mode could
be enabled by calling `mb_substitute_character("long")`.

However, accurately reproducing input byte sequences from the cached
state of a conversion filter is often tricky, and this significantly
complicates the implementation. Further, the means used for passing
the erroneous bytes through to where the "BAD+XXXX" text is generated
only allows for up to 3 bytes to be passed, meaning that some erroneous
byte sequences are truncated anyways.

More to the point, a search of publically available PHP code indicates
that nobody is really using this feature anyways.

Incidentally, this feature also provided error output like "JIS+XXXX"
if the input 'should have' represented a JISX 0208 codepoint, but it
decodes to a codepoint which does not exist in the JISX 0208 charset.
Similarly, specific error output was provided for non-existent
JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few
other charsets. All of that is now consigned to the flames.

However, "long" error markers also include a somewhat more useful
"U+XXXX" marker for Unicode codepoints which were successfully
decoded from the input text, but cannot be represented in the output
encoding. Those are still supported.

With this change, there is no need to use a variety of special values
in the high bits of a wchar to represent different types of error
values. We can (and will) just use a single error value. This will be
equal to -1.

One complicating factor: Text conversion functions return an integer to
indicate whether the conversion operation should be immediately
aborted, and the magic 'abort' marker is -1. Also, almost all of these
functions would return the received byte/codepoint to indicate success.
That doesn't work with the new error value; if an input filter detects
an error and passes -1 to the output filter, and the output filter
returns it back, that would be taken to mean 'abort'.

Therefore, amend all these functions to return 0 for success.

show more ...

# 51b9d7a5 27-Jul-2021 Alex Dowad

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
D

Test behavior of 'long' illegal character markers

After mb_substitute_character("long"), mbstring will respond to
erroneous input by inserting 'long' error markers into the output.
Depending on the situation, these error markers will either look like
BAD+XXXX (for general bad input), U+XXXX (when the input is OK, but it
converts to Unicode codepoints which cannot be represented in the
output encoding), or an encoding-specific marker like JISX+XXXX or
W932+XXXX.

We have almost no tests for this feature. Add a bunch of tests to
ensure that all our legacy encoding handlers work in a reasonable
way when 'long' error markers are enabled.

show more ...

Revision tags: php-8.1.0beta1, php-7.4.22RC1, php-8.0.9RC1, php-8.1.0alpha3, php-7.4.21, php-7.3.29, php-8.0.8, php-8.1.0alpha2, php-7.4.21RC1, php-8.0.8RC1, php-8.1.0alpha1, php-8.0.7, php-7.4.20, php-8.0.7RC1, php-7.4.20RC1, php-8.0.6, php-7.4.19, php-7.4.18, php-7.3.28, php-8.0.5, php-8.0.5RC1, php-7.4.18RC1, php-8.0.4RC1, php-7.4.17RC1, php-8.0.3, php-7.4.16, php-8.0.3RC1, php-7.4.16RC1, php-8.0.2, php-7.4.15, php-7.3.27, php-8.0.2RC1, php-7.4.15RC2, php-7.4.15RC1, php-8.0.1, php-7.4.14, php-7.3.26, php-7.4.14RC1, php-8.0.1RC1, php-7.3.26RC1, php-8.0.0, php-7.3.25, php-7.4.13, php-8.0.0RC5, php-7.4.13RC1, php-8.0.0RC4, php-7.3.25RC1, php-7.4.12, php-8.0.0RC3, php-7.3.24, php-8.0.0RC2
# 7502c863 13-Oct-2020 Alex Dowad

Add test suite for UTF-{7,8,16,32}

Also fix a couple small problems with UTF-32 and UTF-8 support:

- UTF-32 would pass very large codepoints (>= 0x80000000), which are
not val

Add test suite for UTF-{7,8,16,32}

Also fix a couple small problems with UTF-32 and UTF-8 support:

- UTF-32 would pass very large codepoints (>= 0x80000000), which are
not valid.
- UTF-8 would sometimes emit two error marker characters for a single
bad input byte.

show more ...

# a06c20a1 18-Oct-2020 Alex Dowad

Remove useless constant MBFL_ENCTYPE_MBCS

This flag indicated that an encoding was 'multi-byte'; it can use a variable
number of bytes to encode each character. As it turns out, we don't

Remove useless constant MBFL_ENCTYPE_MBCS

This flag indicated that an encoding was 'multi-byte'; it can use a variable
number of bytes to encode each character. As it turns out, we don't actually
need to check this flag anywhere, so it's better to remove it.

show more ...

# 3e7acf90 04-Nov-2020 Alex Dowad

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.

One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.

Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.

Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.

show more ...

Revision tags: php-7.4.12RC1, php-7.3.24RC1, php-7.2.34, php-8.0.0rc1, php-7.4.11, php-7.3.23, php-8.0.0beta4, php-7.4.11RC1, php-7.3.23RC1, php-8.0.0beta3, php-7.4.10, php-7.3.22, php-8.0.0beta2, php-7.3.22RC1, php-7.4.10RC1, php-8.0.0beta1, php-7.4.9, php-7.2.33, php-7.3.21, php-8.0.0alpha3, php-7.4.9RC1, php-7.3.21RC1
# 0ffc1f55 16-Jul-2020 Alex Dowad

Refactor mbfl_ident.c, mbfl_encoding.c, mbfl_memory_device.c, mbfl_string.c

- Make everything less gratuitously verbose
- Don't litter the code with lots of unneeded NULL checks (for thi

Refactor mbfl_ident.c, mbfl_encoding.c, mbfl_memory_device.c, mbfl_string.c

- Make everything less gratuitously verbose
- Don't litter the code with lots of unneeded NULL checks (for things which
will never be NULL)
- Don't return success/failure code from functions which can never fail
- For encoding structs, don't use pointers to pointers to pointers for the
list of alias strings. Pointers to pointers (2 levels of indirection)
is what actually makes sense. This gets rid of some extraneous
dereference operations.

show more ...

# a2b40ee9 16-Jul-2020 Alex Dowad

Remove unneeded function mbfl_filt_ident_common_dtor

This was the default destructor for mbfl_identify_filter structs, but there's nothing
we actually need to do to those structs before

Remove unneeded function mbfl_filt_ident_common_dtor

This was the default destructor for mbfl_identify_filter structs, but there's nothing
we actually need to do to those structs before freeing them.

show more ...

# dcd6c604 16-Jul-2020 Alex Dowad

Remove unneeded function mbfl_filt_conv_common_dtor

This is a default destructor for mbfl_convert_filter structs. The thing is: there
isn't really anything that needs to be done to those

Remove unneeded function mbfl_filt_conv_common_dtor

This is a default destructor for mbfl_convert_filter structs. The thing is: there
isn't really anything that needs to be done to those structs before freeing them.
The default destructor just zeroed out some fields, but there's no reason why
we should actually do that.

show more ...

Revision tags: php-7.4.8, php-7.2.32, php-8.0.0alpha2, php-7.3.20
# 62317d59 04-Jul-2020 Alex Dowad

Remove redundant includes from mbstring (and make sure correct config.h is used)

Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from withi

Remove redundant includes from mbstring (and make sure correct config.h is used)

Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!

This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.

Make sure we actually include the file which was intended.

show more ...

Revision tags: php-8.0.0alpha1, php-7.4.8RC1, php-7.3.20RC1, php-7.4.7, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1, php-7.4.6, php-7.2.31, php-7.4.6RC1, php-7.3.18RC1, php-7.2.30, php-7.4.5, php-7.3.17, php-7.4.5RC1, php-7.3.17RC1, php-7.3.18, php-7.4.4, php-7.2.29, php-7.3.16, php-7.4.4RC1, php-7.3.16RC1
# 363d87f2 21-Feb-2020 George Peter Banyard

Fix [-Wmissing-field-initializers] compiler warning in mbstring

Add missing NULL pointer for mbfl_convert_vtbl struct.

Revision tags: php-7.4.3, php-7.2.28, php-7.3.15RC1, php-7.4.3RC1, php-7.3.15, php-7.2.27, php-7.4.2, php-7.3.14, php-7.3.14RC1, php-7.4.2RC1, php-7.4.1, php-7.2.26, php-7.3.13, php-7.4.1RC1, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1
# 7b152990 09-Aug-2019 Nikita Popov

Don't short-circuit MBFL_OUTPUTFILTER_ILLEGAL_MODE_NONE

Make sure we always go through mbfl_filt_conv_illegal_output(), so
that the number of illegal characters gets counted.

12