ChangeLog - OpenGrok cross reference for /PHP-5.5/ext/pcre/pcrelib/ChangeLog

Lines Matching refs:character
64 15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
87 22. Some patterns with character classes involving [: and \\ were incorrectly
159 42. In a character class such as [\W\p{Any}] where both a negative-type escape
160     ("not a word character") and a property escape were present, the property
165 44. A sequence such as [[:punct:]b], that is, a POSIX character class followed
166     by a single ASCII character in a class item, was incorrectly compiled in
167     UCP mode. The POSIX class got lost, but only if the single character
207 7.  A UTF pattern containing a "not" match of a non-ASCII character and a
312     assertion after (?(. The code was failing to check the character after
407 12. A pattern such as /^s?c/mi8 where the optional character has more than
415 14. If a character class started [\Qx]... where x is any character, the class
418 15. If a pattern that started with a caseless match for a character with more
452 2.  The auto-possessification of character sets was improved: a normal
453     and an extended character set can be compared now. Furthermore
454     the JIT compiler optimizes more character set checks.
469 6.  Improve character range checks in JIT. Characters are read by an imprecise
470     function now, which returns with an unknown value if the character code is
499 12. In a caseless character class with UCP support, when a character with more
500     than one alternative case was not the first character of a range, not all
591     negated single-character class with a character that occupied more than one
597     recognizing that \h, \H, \v, \V, and \R must match a character.
611     (b) If the match point in a subject started with modifier character, and
613         point, and potentially beyond the first character in the subject,
644 16. Unicode character properties were updated from Unicode 6.3.0.
653 18. The character VT has been added to the default ("C" locale) set of
691     character types such as \d or \w, too many callouts were inserted, and the
747 40. Document that the same character tables must be used at compile time and
795     in repeated character loops from pcre_uchar to pcre_uint32 also gave speed
837     does not actually inspect the previous character. This is to ensure that,
838     in partial multi-segment matching, at least one character from the old
909     character types now use tail recursion, which reduces stack usage.
943 1.  Improved JIT compiler optimizations for first character search and single
944     character iterators.
949 3.  Single character iterator optimizations in the JIT compiler.
951 4.  Improved JIT compiler optimizations for character ranges.
1020 19. Improving the first n character searches.
1032         CaseFolding.txt instead of UnicodeData.txt for character case
1035     (b) The code for adding characters or ranges of characters to a character
1045     (d) The processing of \h, \H, \v, and \V in character classes now makes use
1046         of the new class addition function, using character lists defined as
1055 22. Unicode character properties were updated from Unicode 6.2.0
1059 24. Add support for 32-bit character strings, and UTF-32
1069     pcre_compile.c when checking for a zero character.
1118     of more than one character:
1134         a partial match for a CR character at the end of the subject string.
1143 8.  OP_NOT now supports any UTF character not just single-byte ones.
1195     \w+ when the character tables indicated that \x{c4} was a word character.
1233 37. Optimizing single character iterators in JIT.
1246     character. The three items that might have provoked this were recursions,
1279 6.  Add support for 16-bit character strings (a large amount of work involving
1347     (*THEN), \h, \H, \v, \V, and single character negative classes with fixed
1406 22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
1407     not work when the shorter character appeared right at the end of the
1500     character after the value is now allowed for.
1571     the failing character and a reason code are placed in the vector.
1574     now returned is for the first byte of the failing character, instead of the
1585     back over a single character (\n). This seems wrong (because it treated the
1618     opcodes that mean there is no starting character; this means that when new
1698     first character it looked at was a mark character.
1728 36. \g was being checked for fancy things in a character class, when it should
1847 3.  If \s appeared in a character class, it removed the VT character from
1855     changed to the "earliest inspected character" point, because the returned
1856     data for a partial match starts at this character. This means that, for
1872     If a UTF-8 multi-byte character included the byte 0x85 (e.g. U+0445, whose
1906         mode when an empty string match preceded an ASCII character followed by
1907         a non-ASCII character. (The code for advancing by one character rather
1920     starting offset points to the beginning of a UTF-8 character was
1939 19. If \c was followed by a multibyte UTF-8 character, bad things happened. A
1941     character, that is, a byte less than 128. (In EBCDIC mode, the code is
1969 3.  Inside a character class, \B is treated as a literal by default, but
1973 4.  Inside a character class, PCRE always treated \R and \X as literals,
1977 5.  Added support for \N, which always matches any character other than
1992 9.  Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes
2009     added property types that matched character-matching opcodes).
2047     standard character tables, thus making it possible to include the tests
2290     was a minimum greater than 1 for a wide character in a possessive
2293     character. Chaos in the form of incorrect output or a compiling loop could
2299     slots in the offset vector, the offset of the earliest inspected character
2310     given only if matching could not proceed because another character was
2314     final character ended with (*FAIL).
2317     if the pattern had a "must contain" character that was already found in the
2319     example, with the pattern /dog.(body)?/, the "must contain" character is
2324     changed so that it starts at the first inspected character rather than the
2325     first character of the match. This makes a difference only if the pattern
2351 19. If an odd number of negated classes containing just a single character
2356     [The bug was that it was starting one character too far in when skipping
2357     over the character class, thus treating the ] as data rather than
2462 11. Unicode property support in character classes was not working for
2475     from Martin Jerabek that uses macro names for all relevant character and
2595     the data contained the byte 0x85 as part of a UTF-8 character within its
2604     pcre_exec() in ovector are byte offsets, not character counts.
2644     NUL character as backslash + 0 rather than backslash + NUL, because PCRE
2674     (a) A lone ] character is disallowed (Perl treats it as data).
2679     (c) A data ] in a character class must be notated as \] because if the
2680         first data character in a class is ], it defines an empty class. (In
2683         The negative empty class [^] matches any one character, independently
2687     non-existent subpattern following a character class starting with ']' and
2725 1.  A character class containing a very large number of characters with
2776 2.  Negative specials like \S did not work in character classes in UTF-8 mode.
2814 11. The program that makes PCRE's Unicode character property table had a bug
2816     characters that have the same character type, but are in different scripts.
2857     character property caused pcre_compile() to compile bad code, which led at
2884     UTF-8 newline character). The key issue is that the pattern starts .*;
2891     character classes. PCRE was not treating the sequence [:...:] as a
2892     character class unless the ... were all letters. Perl, however, seems to
2896     for example, whereas PCRE did not - it did not recognize a POSIX character
2914     the change happens only if \r or \n (or a literal CR or LF) character is
2999     character was \x{1ec5}). *Character* 0x85 is one of the "any" newline
3001     of another character. The bug was that, for an unlimited repeat of . in
3053     because the ] is interpreted as the first data character and the
3098 21. An orphan \E inside a character class could cause a crash.
3127     character were causing crashes (broken optimization).
3215 12. If \p or \P was used in non-UTF-8 mode on a character greater than 127
3300         character were of different lengths in their UTF-8 codings (there are
3307         the other case of a UTF-8 character when checking ahead for a match
3309         matching a wide character, but failed, corruption could cause an
3311         character.
3419     that \n is character 10 (hex 0A), but it also went horribly wrong when
3424     character can have). Though this value is never used (the check for end of
3425     line is "zero bytes in current character"), it caused compiler complaints.
3495     subpattern has no definite first character. For example, (a*|b*)[cd] would
3497     first character must be a, b, c, or d.
3512     as a*b as a*+b. More specifically, if something simple (such as a character
3533     the first matched character to be a colon. This applied both to named and
3548     when there were unescaped parentheses in a character class, parentheses
3595     character classes, for example with patterns like [\Qa\E-\Qz\E] where the
3669  5. A negated single-character class was not being recognized as fixed-length
3725 17. A character class other than a single negated character that had a minimum
3727     correctly by pcre_dfa_exec(). It would match only one character.
3729 18. A valid (though odd) pattern that looked like a POSIX character
3730     class but used an invalid character after [ (for example [[,abc,]]) caused
3743       the Unicode character scheme when presented with Unicode data--or
3756     translate to the appropriate multibyte character.
3763 21. PCRE has not included VT (character 0x0b) in the set of whitespace
3768     as a possible starting character. Of course, this did no harm; it just
3891     when they appeared in character classes, but not when they appeared outside
3892     a character class. The bit map for "word" characters is now created
3894     upper, lower, and digit maps. (Plus the underscore character, of course.)
3896  5. The above bug also affected the handling of POSIX character classes such as
3903  6. The [[:blank:]] character class matches horizontal, but not vertical space.
3906     subtraction was done in the overall bitmap for a character class, meaning
3986 18. Changes to the handling of Unicode character properties:
4004 19. In UTF-8 mode, a backslash followed by a non-ASCII character was not
4005     matching that character.
4292     passes PCRE_DOTALL to the pcre_compile() function, making the "." character
4315     containing multiple characters in a single byte-string. Each character
4317     byte in the character in UTF-8 mode.
4353  8. Negated POSIX character classes that used a combination of internal tables
4363     character. This was a failing pattern: "(?!.bcd).*". The bug is now fixed.
4366     starting at the last subject character, bytes beyond the end of the subject
4385 15. Added optional support for general category Unicode character properties
4410           the character tables forced to be NULL. The study data, if any, is
4479     to a byte that is the start of a UTF-8 character. If not, it returns
4517 11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
4532     "internal error: code overflow...". This applied to any character class
4543     this pattern is that a match can start with any character.
4549  1. In UTF-8 mode, a character class containing characters with values between
4617    (iii) PCRE was using its character types table to recognize decimal and
4619          only 0-9, a-f, and A-F, but the character types table is locale-
4623          character types table is still used for matching digits in subject
4696    option, a class that contained a single character with a value between 128
4728 literal character that is needed in the subject for a match, and scans along to
4736 first character of an anchored pattern as "needed", thus provoking a search
4738 fail. The "needed" character is now not set for anchored patterns, unless it
4757 4. From 5.004, Perl has not included the VT character (0x0b) in the set defined
4786 For compatibility with Perl, \Q...\E sequences are recognized inside character
4842 19. Although correctly diagnosing a missing ']' in a character class, PCRE was
4871 26. The handling of the optimization for finding the first character of a
4872 non-anchored pattern, and for finding a character that is required later in the
5121 (i)   A character class whose characters are all within 0-255 is handled as
5123       character > 255 always failed to match such a class; however it should
5126 (ii)  A negated character class with a single character < 255 is coded as
5127       "not this character" (OP_NOT). This wasn't working properly when the test
5128       character was multibyte, either singly or repeated.
5133 (iv)  The character escapes \b, \B, \d, \D, \s, \S, \w, and \W (either
5139 (v)   Classes may now contain characters and character ranges with values
5157 55. Unknown escapes inside character classes (e.g. [\M]) and escapes that
5164 which is run to generate the source of the default character tables. They
5279 1. If an octal character was given, but the value was greater than \377, it
5300 7. Added the beginnings of support for UTF-8 character strings.
5400 11. Added support for POSIX character classes like [:alpha:], which Perl is
5446 causing the entire string to be ignored, instead of just the last character.
5452 character in the pattern, and pre-searching the subject to ensure it is present
5478 4. PCRE wasn't doing the "first character" optimization for patterns starting
5557 to character tables built by pcre_maketables() in the current locale. If NULL
5569 3. The first character computation wasn't working for (?>) groups.
5610 1. A negated single character class followed by a quantifier with a minimum
5613 containing more than one character, or to minima other than one.
5643 1. Negated character classes containing more than one character were failing if
5683 4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
5684 fail on data ending with that character. (It was going on too far, and checking
5685 the next character, typically a binary zero.) This was specific to the
5686 optimized code for single-character negative classes.
5734 like /([ab]*)*/, that is, for classes with more than one character in them.
5752 1. Fixed bug in code for optimizing classes with only one character. It was
5790 2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
5813       escape is read. Inside a character class, it's always an octal escape,
5816   (d) An escaped but undefined alphabetic character is taken as a literal,
5826 6. Changed the handling of character classes; they are now done with a 32-byte
5855 1. /(b)|(:+)/ was computing an incorrect first character.
5870 6. The character tables are now in a separate module whose source is generated
5883 1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
5886 2. Caseless matching was not working in character classes if the characters in
5903 options, and the first character, if set.
5905 9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.
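
Several of the matched lines above (1872, 2595, 2604, 2999) turn on two facts about UTF-8: the byte 0x85 can occur inside a multi-byte character such as U+0445, and the offsets that pcre_exec() places in ovector are byte offsets, not character counts. The sketch below illustrates both in plain Python. It does not call PCRE; the helper names `utf8_bytes` and `byte_offset_to_char_index` are hypothetical and exist only for this illustration.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 encoding of a single code point."""
    return chr(code_point).encode("utf-8")


def byte_offset_to_char_index(subject: bytes, byte_offset: int) -> int:
    """Convert a byte offset into UTF-8 data to a character index.

    Raises UnicodeDecodeError if the offset falls inside a
    multi-byte character rather than at its start.
    """
    return len(subject[:byte_offset].decode("utf-8"))


# U+0445 is encoded as two bytes, the second of which is 0x85.
encoded = utf8_bytes(0x0445)
contains_0x85 = 0x85 in encoded

# In the subject "\u0445b", the letter "b" sits at byte offset 2
# but at character index 1.
subject = "\u0445b".encode("utf-8")
char_index = byte_offset_to_char_index(subject, 2)
```

A caller that needs character positions would have to do a conversion like this on each returned offset, since a byte-oriented scan for 0x85 alone cannot tell a newline from the tail of another character.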