History log of /php-src/ext/mbstring/tests/mb_detect_encoding.phpt (Results 1 – 25 of 29)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 81e236cd 25-Oct-2023 Alex Dowad

Fix infinite loop when mb_detect_encoding is used on UTF-8 BOM

This bug was introduced in cb840799b4.

Thanks to Ignace Nyamagana Butera for discovering this bug and
to Sebastian

Fix infinite loop when mb_detect_encoding is used on UTF-8 BOM

This bug was introduced in cb840799b4.

Thanks to Ignace Nyamagana Butera for discovering this bug and
to Sebastian Bergmann for doing an initial investigation and opening
a bug ticket.

show more ...


# 81faab92 22-Aug-2023 Alex Dowad

Improve mb_detect_encoding accuracy for text containing vowels with macrons

Among other world languages, the Māori language commonly uses vowels
with macrons.


# 7914b8ce 11-May-2023 Alex Dowad

Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode

In 6fc8d014df, pakutoma added specialized validity checking functions
for some legacy text encoding

Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode

In 6fc8d014df, pakutoma added specialized validity checking functions
for some legacy text encodings like ISO-2022-JP and UTF-7. These
check functions perform a more strict validity check than the encoding
conversion functions for the same text encodings. For example, the
check function for ISO-2022-JP verifies that the string ends in the
correct state required by the specification for ISO-2022-JP.

These check functions are already being used to make detection of text
encoding more accurate when 'strict' detection mode is enabled.

However, since the default is 'non-strict' detection (a bad API design
but we're stuck with it now), most users will not benefit from
pakutoma's work. I was previously reluctant to enable this new logic
for non-strict detection mode. My intention was to reduce the scope of
behavior changes, since almost *any* behavior change may affect *some*
user in a way we don't expect.

However, we definitely have users whose (production) code was broken
by the changes I made in 28b346bc06, and enabling pakutoma's check
functions for non-strict detection mode would un-break it. (See
GH-10192 as an example.) The added checks do also make sense.

In non-strict detection mode, we will not immediately reject candidate
encodings whose validity check function returns false; but they will
be much less likely to be selected. However, failure of the validity
check function is weighted less heavily than an encoding error detected
by the encoding conversion function.

show more ...


# 3ab10da7 30-Apr-2023 Alex Dowad

Take order of candidate encodings into account when guessing text encoding

The documentation for mb_detect_encoding says that this function
"Detects the most likely character encoding fo

Take order of candidate encodings into account when guessing text encoding

The documentation for mb_detect_encoding says that this function
"Detects the most likely character encoding for string `string` from an
ordered list of candidates".

Prior to 28b346bc06, mb_detect_encoding did not really attempt to
determine the "most likely" text encoding for the input string. It
would just return the first candidate encoding for which the string was
valid. In 28b346bc06, I amended this function so that it uses heuristics
to try to guess which candidate encoding is "most likely".

However, the caller did not have any way to indicate which candidate
text encoding(s) they consider to be more likely, in case the
heuristics applied are inconclusive. In the language of Bayesian
probability, there was no way for the caller to indicate their 'prior'
assignment of probabilities.

Further, the documentation for mb_detect_encoding also says that the
second parameter `encodings` is "a list of character encodings to try,
in order". The documentation clearly implies that the order of
the `encodings` argument should be significant.

Therefore, amend mb_detect_encoding so that while it still uses
heuristics to guess the most likely text encoding for the input string,
it favors those which are earlier in the list of candidate encodings.

One complication is that many callers of mb_detect_encoding use it
in this way:

mb_detect_encoding($string, mb_list_encodings());

In a majority of cases, this is bad code; mb_detect_encoding will both
be much slower and the results will be less reliable than if a smaller
list of candidates is used. However, since such code already exists and
people are using it in production, we should not unnecessarily break it.
The order of candidate encodings obviously does not express any prior
belief of which candidates are more likely in this case, and treating
it as if it did will degrade the accuracy of the result.

Since mb_list_encodings now returns a single, immutable array on each
call, we can avoid that problem by turning off the new behavior when
we receive the array of encodings returned by mb_list_encodings.
This implementation means that if the user does this:

$a = mb_list_encodings();
mb_detect_encoding($string, $a);

...then the order of candidate encodings will not be considered.
However, if the user explicitly initializes their own array of all
supported legacy text encodings, then the order *will* be considered.

The other functions which also follow this new behavior are:

• mb_convert_variables
• mb_convert_encoding (when multiple candidate input encodings are
listed)

Other places where "detection" (or really "guessing") of text encoding
may be performed include:

• mb_send_mail
• Zend engine, when determining the encoding of a PHP script
• mbstring processing of HTTP request contents, when http_input INI
parameter is set to a list

In these cases, the new logic based on order of candidate encodings
is *not* enabled. It *might* be logical to consider the order of
candidate encodings in some or all of these cases, but I'm not sure if
that is true, so it seems wiser to avoid more behavior changes than is
necessary. Further, ever since the new encoding detection heuristics
were implemented in 28b346bc06, we have not received any complaints of
user code being broken in these areas. So I am reluctant to "fix what
isn't broken".

Well, some might say that applying the new detection heuristics
to mb_send_mail, etc. in 28b346bc06 was "fixing what wasn't broken",
but (cough cough) I don't have any comment on that...

show more ...


# 6df7557e 02-Apr-2023 Alex Dowad

mb_parse_str, mb_http_input, and mb_convert_variables use fast text conversion code for automatic encoding detection

For mb_parse_str, when mbstring.http_input (INI parameter) is a list of

mb_parse_str, mb_http_input, and mb_convert_variables use fast text conversion code for automatic encoding detection

For mb_parse_str, when mbstring.http_input (INI parameter) is a list of
multiple possible text encodings (which is not the case by default),
this new implementation is about 25% faster.

When mbstring.http_input is a single value, then nothing is changed.
(No automatic encoding detection is done in that case.)

show more ...


# cb840799 12-Jan-2023 Alex Dowad

mb_detect_encoding is more accurate on strings with UTF-8/16 BOM

Thanks to the GitHub user 'titanz35' for pointing out that the new
implementation of mb_detect_encoding had poor detectio

mb_detect_encoding is more accurate on strings with UTF-8/16 BOM

Thanks to the GitHub user 'titanz35' for pointing out that the new
implementation of mb_detect_encoding had poor detection accuracy on
UTF-8 and UTF-16 strings with a byte-order mark.

show more ...


Revision tags: php-8.2.0RC1, php-8.1.10, php-8.0.23, php-8.0.23RC1, php-8.1.10RC1, php-8.2.0beta3, php-8.2.0beta2, php-8.1.9, php-8.0.22
# 0e7160b8 20-Jul-2022 Alex Dowad

Implement mb_detect_encoding using fast text conversion filters

Regarding the optional 3rd `strict` argument to mb_detect_encoding,
the documentation states:

Controls the beha

Implement mb_detect_encoding using fast text conversion filters

Regarding the optional 3rd `strict` argument to mb_detect_encoding,
the documentation states:

Controls the behaviour when string is not valid in any of the listed encodings.
If strict is set to false, the closest matching encoding will be returned;
if strict is set to true, false will be returned.

(Ref: https://www.php.net/manual/en/function.mb-detect-encoding.php)

Because of bugs in the implementation, mb_detect_encoding did not always
behave according to this description when `strict` was false.
For example:

<?php
echo var_export(mb_detect_encoding("\xc0\x00", "UTF-8", false));
// Before this commit, prints: false
// After this commit, prints: 'UTF-8'

Because `strict` is false in the above example, mb_detect_encoding
should return the 'closest matching encoding', which is UTF-8, since
that is the only candidate encoding. (Incidentally, this example shows
that using mb_detect_encoding with a single candidate encoding in
non-strict mode is useless.)

The new implementation fixes this bug. It also fixes another problem
with the old implementation as regards non-strict detection mode:

The old implementation would stop processing of the input string using
a particular candidate encoding as soon as it saw an error in that
encoding, even in non-strict mode. This means that it could not really
detect the 'closest matching encoding'; rather, what it would return
in non-strict mode was 'the encoding in which the first decoding error
is furthest from the beginning of the input string'.

In non-strict mode, the new implementation continues trying to process
the input string to its end even after seeing an error. This makes it
possible to determine in which candidate encoding the string has the
smallest number of errors, i.e. the 'closest matching encoding'.

Rejecting candidate encodings as soon as it saw an error gave the old
implementation a marked performance advantage in non-strict mode;
however, the new implementation still beats it in most cases. Here are
a few sample microbenchmark results:

UTF-8, ~100 codepoints, strict mode
Old: 0.080s (100,000 calls)
New: 0.026s (" " )

UTF-8, ~100 codepoints, non-strict mode
Old: 0.079s (100,000 calls)
New: 0.033s (" " )

UTF-8, ~10000 codepoints, strict mode
Old: 6.708s (60,000 calls)
New: 1.383s (" " )

UTF-8, ~10000 codepoints, non-strict mode
Old: 6.705s (60,000 calls)
New: 3.044s (" " )

Notice that the old implementation had almost identical performance
between strict and non-strict mode, while the new suffers a significant
performance penalty for non-strict detection. This is the cost of
implementing the behavior specified in the documentation.

A couple more sample results:

SJIS, ~10000 codepoints, strict mode
Old: 4.563s
New: 1.084s

SJIS, ~10000 codepoints, non-strict mode
Old: 4.569s
New: 2.863s

This is the only case I found where the new implementation loses:

UTF-16LE, ~10000 codepoints, non-strict mode
Old: 1.514s
New: 2.813s

The reason is because the test strings happened to be invalid right from
the first few bytes for all the candidate encodings except for UTF-16LE;
so the old implementation would immediately reject all those encodings
and only process the entire string in UTF-16LE.

I believe mb_detect_encoding could be made much faster if we identified
good criteria for when to reject candidate encodings before reaching
the end of the input string.

show more ...


# f40c3fca 29-Dec-2022 Alex Dowad

Improve mb_detect_encoding's recognition of Turkish text

Add 4 codepoints commonly used to write Turkish text to our table
of 'commonly used' Unicode codepoints. These are:

• U+

Improve mb_detect_encoding's recognition of Turkish text

Add 4 codepoints commonly used to write Turkish text to our table
of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA

show more ...


Revision tags: php-8.1.9RC1, php-8.2.0beta1, php-8.0.22RC1, php-8.0.21, php-8.1.8, php-8.2.0alpha3, php-8.1.8RC1, php-8.2.0alpha2, php-8.0.21RC1, php-8.0.20, php-8.1.7, php-8.2.0alpha1, php-7.4.30
# 58d0aad7 25-May-2022 Alex Dowad

mb_detect_encoding recognizes all letters in Hungarian alphabet

Revision tags: php-8.1.7RC1, php-8.0.20RC1
# 6a4b6d23 24-May-2022 Alex Dowad

mb_detect_encoding recognizes all letters in Czech alphabet

Revision tags: php-8.1.6, php-8.0.19, php-8.1.6RC1, php-8.0.19RC1
# 9bb97ee8 25-Apr-2022 Alex Dowad

Fix mb_detect_encoding's recognition of Slavic names

Thanks to Côme Chilliet for reporting that mb_detect_encoding was not
detecting the desired text encoding for strings containing š or

Fix mb_detect_encoding's recognition of Slavic names

Thanks to Côme Chilliet for reporting that mb_detect_encoding was not
detecting the desired text encoding for strings containing š or Ž.
These characters are used in Czech, Serbian, Croatian, Bosnian,
Macedonian, etc. names.

show more ...

Revision tags: php-8.0.18, php-8.1.5, php-7.4.29, php-8.1.5RC1, php-8.0.18RC1, php-8.1.4, php-8.0.17, php-8.1.4RC1, php-8.0.17RC1, php-8.1.3, php-8.0.16, php-7.4.28, php-8.1.3RC1, php-8.0.16RC1, php-8.1.2, php-8.0.15, php-8.1.2RC1, php-8.0.15RC1, php-8.0.14, php-8.1.1, php-7.4.27, php-8.1.1RC1, php-8.0.14RC1, php-7.4.27RC1
# 1a2c6080 25-Nov-2021 Alex Dowad

Add unit tests for mb_detect_encoding on Polish text

Revision tags: php-8.1.0, php-8.0.13, php-7.4.26, php-7.3.33, php-8.1.0RC6, php-7.4.26RC1, php-8.0.13RC1, php-8.1.0RC5, php-7.3.32, php-7.4.25, php-8.0.12
# a2bc57e0 18-Oct-2021 Alex Dowad

mb_detect_encoding will not return non-encodings

Among the text encodings supported by mbstring are several which are
not really 'text encodings'. These include Base64, QPrint, UUencode,

mb_detect_encoding will not return non-encodings

Among the text encodings supported by mbstring are several which are
not really 'text encodings'. These include Base64, QPrint, UUencode,
HTML entities, '7 bit', and '8 bit'.

Rather than providing an explicit list of text encodings which they are
interested in, users may pass the output of mb_list_encodings to
mb_detect_encoding. Since Base64, QPrint, and so on are included in
the output of mb_list_encodings, mb_detect_encoding can return one of
these as its 'detected encoding' (and in fact, this often happens).
Before mb_detect_encoding was enhanced so it could detect any of the
supported text encodings, this did not happen, and it is never desired.

show more ...

Revision tags: php-8.1.0RC4
# 28b346bc 09-Oct-2021 Alex Dowad

Improve detection accuracy of mb_detect_encoding

Originally, `mb_detect_encoding` essentially just checked all candidate
encodings to see which ones the input string was valid in. Howeve

Improve detection accuracy of mb_detect_encoding

Originally, `mb_detect_encoding` essentially just checked all candidate
encodings to see which ones the input string was valid in. However, it
was only able to do this for a limited few of all the text encodings
which are officially supported by mbstring.

In 3e7acf901d, I modified it so it could 'detect' any text encoding
supported by mbstring. While this is arguably an improvement, if the
only text encodings one is interested in are those which
`mb_detect_encoding` could originally handle, the old
`mb_detect_encoding` may have been preferable. Because the new one has
more possible encodings which it can guess, it also has more chances to
get the answer wrong.

This commit adjusts the detection heuristics to provide accurate
detection in a wider variety of scenarios. While the previous detection
code would frequently confuse UTF-32BE with UTF-32LE or UTF-16BE with
UTF-16LE, the adjusted code is extremely accurate in those cases.
Detection for Chinese text in Chinese encodings like GB18030 or BIG5
and for Japanese text in Japanese encodings like EUC-JP or SJIS is
greatly improved. Detection of UTF-7 is also greatly improved. An 8KB
table, with one bit for each codepoint from U+0000 up to U+FFFF, is
used to achieve this.

One significant constraint is that the heuristics are completely based
on looking at each codepoint in a string in isolation, treating some
codepoints as 'likely' and others as 'unlikely'. It might still be
possible to achieve great gains in detection accuracy by looking at
sequences of codepoints rather than individual codepoints. However,
this might require huge tables. Further, we might need a huge corpus
of text in various languages to derive those tables.

Accuracy is still dismal when trying to distinguish single-byte
encodings like ISO-8859-1, ISO-8859-2, KOI8-R, and so on. This is
because the valid bytes in these encodings are basically all the same,
and all valid bytes decode to 'likely' codepoints, so our method of
detection (which is based on rating codepoints as likely or unlikely)
cannot tell any difference between the candidates at all. It just
selects the first encoding in the provided list of candidates.

Speaking of which, if one wants to get good results from
`mb_detect_encoding`, it is important to order the list of candidate
encodings according to your prior belief of which are more likely to
be correct. When the function cannot tell any difference between two
candidates, it returns whichever appeared earlier in the array.

show more ...

Revision tags: php-8.0.12RC1, php-7.4.25RC1, php-8.1.0RC3, php-8.0.11, php-7.4.24, php-7.3.31, php-8.1.0RC2, php-7.4.24RC1, php-8.0.11RC1
# c25a1ef8 06-Sep-2021 Alex Dowad

Bug #81390: mb_detect_encoding should not prematurely stop processing input

As a performance optimization, mb_detect_encoding tries to stop
processing the input string early when there i

Bug #81390: mb_detect_encoding should not prematurely stop processing input

As a performance optimization, mb_detect_encoding tries to stop
processing the input string early when there is only one 'candidate'
encoding which the input string is valid in. However, the code which
keeps count of how many candidate encodings have already been rejected
was buggy. This caused mb_detect_encoding to prematurely stop
processing the input when it should have continued.

As a result, it did not notice that in the test case provided by Alec,
the input string was not valid in UTF-16.

show more ...

Revision tags: php-8.1.0RC1, php-7.4.23, php-8.0.10, php-7.3.30, php-8.1.0beta3, php-8.0.10RC1, php-7.4.23RC1, php-8.1.0beta2, php-8.0.9, php-7.4.22, php-8.1.0beta1, php-7.4.22RC1, php-8.0.9RC1, php-8.1.0alpha3, php-7.4.21, php-7.3.29, php-8.0.8, php-8.1.0alpha2, php-7.4.21RC1, php-8.0.8RC1
# 39131219 11-Jun-2021 Nikita Popov

Migrate more SKIPIF -> EXTENSIONS (#7139)

This is a mix of more automated and manual migration. It should remove all applicable extension_loaded() checks outside of skipif.inc files.

Revision tags: php-8.1.0alpha1, php-8.0.7, php-7.4.20, php-8.0.7RC1, php-7.4.20RC1, php-8.0.6, php-7.4.19, php-7.4.18, php-7.3.28, php-8.0.5, php-8.0.5RC1, php-7.4.18RC1, php-8.0.4RC1, php-7.4.17RC1, php-8.0.3, php-7.4.16, php-8.0.3RC1, php-7.4.16RC1, php-8.0.2, php-7.4.15, php-7.3.27, php-8.0.2RC1, php-7.4.15RC2, php-7.4.15RC1, php-8.0.1, php-7.4.14, php-7.3.26, php-7.4.14RC1, php-8.0.1RC1, php-7.3.26RC1, php-8.0.0, php-7.3.25, php-7.4.13, php-8.0.0RC5, php-7.4.13RC1, php-8.0.0RC4, php-7.3.25RC1
# 3e7acf90 04-Nov-2020 Alex Dowad

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.

Remove mbstring identify filters

mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.

One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.

Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.

Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.

show more ...

Revision tags: php-7.4.12, php-8.0.0RC3, php-7.3.24, php-8.0.0RC2, php-7.4.12RC1, php-7.3.24RC1, php-7.2.34, php-8.0.0rc1, php-7.4.11, php-7.3.23
# cafceea7 24-Sep-2020 Nikita Popov

Update mbstring parameter names

Closes GH-6207.

Revision tags: php-8.0.0beta4, php-7.4.11RC1, php-7.3.23RC1, php-8.0.0beta3, php-7.4.10, php-7.3.22, php-8.0.0beta2, php-7.3.22RC1, php-7.4.10RC1, php-8.0.0beta1, php-7.4.9, php-7.2.33, php-7.3.21
# 73dcfb6f 30-Jul-2020 Alex Dowad

Fix typos in mbstring tests

Man, I can be pedantic sometimes. Tiny little things like misspelled words just
hurt me inside. So while it's not really a big deal, I couldn't leave these ty

Fix typos in mbstring tests

Man, I can be pedantic sometimes. Tiny little things like misspelled words just
hurt me inside. So while it's not really a big deal, I couldn't leave these typos
alone...

show more ...

Revision tags: php-8.0.0alpha3, php-7.4.9RC1, php-7.3.21RC1, php-7.4.8, php-7.2.32, php-8.0.0alpha2, php-7.3.20, php-8.0.0alpha1, php-7.4.8RC1, php-7.3.20RC1, php-7.4.7, php-7.3.19, php-7.4.7RC1, php-7.3.19RC1, php-7.4.6, php-7.2.31, php-7.4.6RC1, php-7.3.18RC1, php-7.2.30, php-7.4.5, php-7.3.17
# fa3b8c75 01-Apr-2020 George Peter Banyard

Promote unknown encoding throws in encoding array/string list

For the string list we emit still emit a warning by comparing arg_num to 0

Closes GH-5337

Revision tags: php-7.4.5RC1, php-7.3.17RC1
# cd5a29b8 30-Mar-2020 Nikita Popov

Properly report unknown encoding in encoding lists

And clean up the related array and list parsing code.

Revision tags: php-7.3.18, php-7.4.4, php-7.2.29, php-7.3.16, php-7.4.4RC1, php-7.3.16RC1, php-7.4.3, php-7.2.28, php-7.3.15RC1, php-7.4.3RC1, php-7.3.15, php-7.2.27, php-7.4.2, php-7.3.14, php-7.3.14RC1, php-7.4.2RC1, php-7.4.1, php-7.2.26, php-7.3.13, php-7.4.1RC1, php-7.3.13RC1, php-7.2.26RC1, php-7.4.0, php-7.2.25, php-7.3.12, php-7.4.0RC6, php-7.3.12RC1, php-7.2.25RC1, php-7.4.0RC5, php-7.1.33, php-7.2.24, php-7.3.11, php-7.4.0RC4, php-7.3.11RC1, php-7.2.24RC1, php-7.4.0RC3, php-7.2.23, php-7.3.10, php-7.4.0RC2, php-7.2.23RC1, php-7.3.10RC1, php-7.4.0RC1, php-7.1.32, php-7.2.22, php-7.3.9, php-7.4.0beta4, php-7.2.22RC1, php-7.3.9RC1, php-7.4.0beta2, php-7.1.31, php-7.2.21, php-7.3.8, php-7.4.0beta1, php-7.2.21RC1, php-7.3.8RC1, php-7.4.0alpha3, php-7.3.7, php-7.2.20, php-7.4.0alpha2, php-7.3.7RC3, php-7.3.7RC2, php-7.2.20RC2, php-7.4.0alpha1, php-7.3.7RC1, php-7.2.20RC1, php-7.2.19, php-7.3.6, php-7.1.30, php-7.2.19RC1, php-7.3.6RC1, php-7.1.29, php-7.2.18, php-7.3.5, php-7.2.18RC1, php-7.3.5RC1, php-7.2.17, php-7.3.4, php-7.1.28, php-7.3.4RC1, php-7.2.17RC1, php-7.1.27, php-7.3.3, php-7.2.16
# 852485d8 19-Feb-2019 Nikita Popov

Adjust tests for zpp TypeError change

# 4bd18db8 05-Mar-2019 Nikita Popov

Remove custom error handler in mbstring tests

To make it more obvious what is tested and what the error messages
are.

Revision tags: php-7.3.3RC1, php-7.2.16RC1, php-7.2.15, php-7.3.2, php-7.2.15RC1, php-7.3.2RC1, php-5.6.40, php-7.1.26, php-7.3.1, php-7.2.14, php-7.2.14RC1, php-7.3.1RC1, php-5.6.39, php-7.1.25, php-7.2.13, php-7.0.33, php-7.3.0, php-7.1.25RC1, php-7.2.13RC1, php-7.3.0RC6, php-7.1.24, php-7.2.12, php-7.3.0RC5, php-7.1.24RC1, php-7.2.12RC1, php-7.3.0RC4
# d679f022 15-Oct-2018 Peter Kokot

Sync leading and final newlines in *.phpt sections

This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in al

Sync leading and final newlines in *.phpt sections

This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.

According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.

C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."

Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2

show more ...

# d7a3edd4 14-Oct-2018 Peter Kokot

Trim trailing whitespace in *.phpt

12