Lines Matching refs:character

64 15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
87 22. Some patterns with character classes involving [: and \\ were incorrectly
159 42. In a character class such as [\W\p{Any}] where both a negative-type escape
160 ("not a word character") and a property escape were present, the property
165 44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
166 by a single ASCII character in a class item, was incorrectly compiled in
167 UCP mode. The POSIX class got lost, but only if the single character
207 7. A UTF pattern containing a "not" match of a non-ASCII character and a
312 assertion after (?(. The code was failing to check the character after
407 12. A pattern such as /^s?c/mi8 where the optional character has more than
415 14. If a character class started [\Qx]... where x is any character, the class
418 15. If a pattern that started with a caseless match for a character with more
452 2. The auto-possessification of character sets was improved: a normal
453 and an extended character set can be compared now. Furthermore
454 the JIT compiler optimizes more character set checks.
469 6. Improve character range checks in JIT. Characters are read by an imprecise
470 function now, which returns with an unknown value if the character code is
499 12. In a caseless character class with UCP support, when a character with more
500 than one alternative case was not the first character of a range, not all
591 negated single-character class with a character that occupied more than one
597 recognizing that \h, \H, \v, \V, and \R must match a character.
611 (b) If the match point in a subject started with modifier character, and
613 point, and potentially beyond the first character in the subject,
644 16. Unicode character properties were updated from Unicode 6.3.0.
653 18. The character VT has been added to the default ("C" locale) set of
691 character types such as \d or \w, too many callouts were inserted, and the
747 40. Document that the same character tables must be used at compile time and
795 in repeated character loops from pcre_uchar to pcre_uint32 also gave speed
837 does not actually inspect the previous character. This is to ensure that,
838 in partial multi-segment matching, at least one character from the old
909 character types now use tail recursion, which reduces stack usage.
943 1. Improved JIT compiler optimizations for first character search and single
944 character iterators.
949 3. Single character iterator optimizations in the JIT compiler.
951 4. Improved JIT compiler optimizations for character ranges.
1020 19. Improving the first n character searches.
1032 CaseFolding.txt instead of UnicodeData.txt for character case
1035 (b) The code for adding characters or ranges of characters to a character
1045 (d) The processing of \h, \H, \v, and \V in character classes now makes use
1046 of the new class addition function, using character lists defined as
1055 22. Unicode character properties were updated from Unicode 6.2.0
1059 24. Add support for 32-bit character strings, and UTF-32
1069 pcre_compile.c when checking for a zero character.
1118 of more than one character:
1134 a partial match for a CR character at the end of the subject string.
1143 8. OP_NOT now supports any UTF character not just single-byte ones.
1195 \w+ when the character tables indicated that \x{c4} was a word character.
1233 37. Optimizing single character iterators in JIT.
1246 character. The three items that might have provoked this were recursions,
1279 6. Add support for 16-bit character strings (a large amount of work involving
1347 (*THEN), \h, \H, \v, \V, and single character negative classes with fixed
1406 22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
1407 not work when the shorter character appeared right at the end of the
1500 character after the value is now allowed for.
1571 the failing character and a reason code are placed in the vector.
1574 now returned is for the first byte of the failing character, instead of the
1585 back over a single character (\n). This seems wrong (because it treated the
1618 opcodes that mean there is no starting character; this means that when new
1698 first character it looked at was a mark character.
1728 36. \g was being checked for fancy things in a character class, when it should
1847 3. If \s appeared in a character class, it removed the VT character from
1855 changed to the "earliest inspected character" point, because the returned
1856 data for a partial match starts at this character. This means that, for
1872 If a UTF-8 multi-byte character included the byte 0x85 (e.g. U+0445, whose
1906 mode when an empty string match preceded an ASCII character followed by
1907 a non-ASCII character. (The code for advancing by one character rather
1920 starting offset points to the beginning of a UTF-8 character was
1939 19. If \c was followed by a multibyte UTF-8 character, bad things happened. A
1941 character, that is, a byte less than 128. (In EBCDIC mode, the code is
1969 3. Inside a character class, \B is treated as a literal by default, but
1973 4. Inside a character class, PCRE always treated \R and \X as literals,
1977 5. Added support for \N, which always matches any character other than
1992 9. Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes
2009 added property types that matched character-matching opcodes).
2047 standard character tables, thus making it possible to include the tests
2290 was a minimum greater than 1 for a wide character in a possessive
2293 character. Chaos in the form of incorrect output or a compiling loop could
2299 slots in the offset vector, the offset of the earliest inspected character
2310 given only if matching could not proceed because another character was
2314 final character ended with (*FAIL).
2317 if the pattern had a "must contain" character that was already found in the
2319 example, with the pattern /dog.(body)?/, the "must contain" character is
2324 changed so that it starts at the first inspected character rather than the
2325 first character of the match. This makes a difference only if the pattern
2351 19. If an odd number of negated classes containing just a single character
2356 [The bug was that it was starting one character too far in when skipping
2357 over the character class, thus treating the ] as data rather than
2462 11. Unicode property support in character classes was not working for
2475 from Martin Jerabek that uses macro names for all relevant character and
2595 the data contained the byte 0x85 as part of a UTF-8 character within its
2604 pcre_exec() in ovector are byte offsets, not character counts.
2644 NUL character as backslash + 0 rather than backslash + NUL, because PCRE
2674 (a) A lone ] character is disallowed (Perl treats it as data).
2679 (c) A data ] in a character class must be notated as \] because if the
2680 first data character in a class is ], it defines an empty class. (In
2683 The negative empty class [^] matches any one character, independently
2687 non-existent subpattern following a character class starting with ']' and
2725 1. A character class containing a very large number of characters with
2776 2. Negative specials like \S did not work in character classes in UTF-8 mode.
2814 11. The program that makes PCRE's Unicode character property table had a bug
2816 characters that have the same character type, but are in different scripts.
2857 character property caused pcre_compile() to compile bad code, which led at
2884 UTF-8 newline character). The key issue is that the pattern starts .*;
2891 character classes. PCRE was not treating the sequence [:...:] as a
2892 character class unless the ... were all letters. Perl, however, seems to
2896 for example, whereas PCRE did not - it did not recognize a POSIX character
2914 the change happens only if \r or \n (or a literal CR or LF) character is
2999 character was \x{1ec5}). *Character* 0x85 is one of the "any" newline
3001 of another character. The bug was that, for an unlimited repeat of . in
3053 because the ] is interpreted as the first data character and the
3098 21. An orphan \E inside a character class could cause a crash.
3127 character were causing crashes (broken optimization).
3215 12. If \p or \P was used in non-UTF-8 mode on a character greater than 127
3300 character were of different lengths in their UTF-8 codings (there are
3307 the other case of a UTF-8 character when checking ahead for a match
3309 matching a wide character, but failed, corruption could cause an
3311 character.
3419 that \n is character 10 (hex 0A), but it also went horribly wrong when
3424 character can have). Though this value is never used (the check for end of
3425 line is "zero bytes in current character"), it caused compiler complaints.
3495 subpattern has no definite first character. For example, (a*|b*)[cd] would
3497 first character must be a, b, c, or d.
3512 as a*b as a*+b. More specifically, if something simple (such as a character
3533 the first matched character to be a colon. This applied both to named and
3548 when there were unescaped parentheses in a character class, parentheses
3595 character classes, for example with patterns like [\Qa\E-\Qz\E] where the
3669 5. A negated single-character class was not being recognized as fixed-length
3725 17. A character class other than a single negated character that had a minimum
3727 correctly by pcre_dfa_exec(). It would match only one character.
3729 18. A valid (though odd) pattern that looked like a POSIX character
3730 class but used an invalid character after [ (for example [[,abc,]]) caused
3743 the Unicode character scheme when presented with Unicode data--or
3756 translate to the appropriate multibyte character.
3763 21. PCRE has not included VT (character 0x0b) in the set of whitespace
3768 as a possible starting character. Of course, this did no harm; it just
3891 when they appeared in character classes, but not when they appeared outside
3892 a character class. The bit map for "word" characters is now created
3894 upper, lower, and digit maps. (Plus the underscore character, of course.)
3896 5. The above bug also affected the handling of POSIX character classes such as
3903 6. The [[:blank:]] character class matches horizontal, but not vertical space.
3906 subtraction was done in the overall bitmap for a character class, meaning
3986 18. Changes to the handling of Unicode character properties:
4004 19. In UTF-8 mode, a backslash followed by a non-ASCII character was not
4005 matching that character.
4292 passes PCRE_DOTALL to the pcre_compile() function, making the "." character
4315 containing multiple characters in a single byte-string. Each character
4317 byte in the character in UTF-8 mode.
4353 8. Negated POSIX character classes that used a combination of internal tables
4363 character. This was a failing pattern: "(?!.bcd).*". The bug is now fixed.
4366 starting at the last subject character, bytes beyond the end of the subject
4385 15. Added optional support for general category Unicode character properties
4410 the character tables forced to be NULL. The study data, if any, is
4479 to a byte that is the start of a UTF-8 character. If not, it returns
4517 11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
4532 "internal error: code overflow...". This applied to any character class
4543 this pattern is that a match can start with any character.
4549 1. In UTF-8 mode, a character class containing characters with values between
4617 (iii) PCRE was using its character types table to recognize decimal and
4619 only 0-9, a-f, and A-F, but the character types table is locale-
4623 character types table is still used for matching digits in subject
4696 option, a class that contained a single character with a value between 128
4728 literal character that is needed in the subject for a match, and scans along to
4736 first character of an anchored pattern as "needed", thus provoking a search
4738 fail. The "needed" character is now not set for anchored patterns, unless it
4757 4. From 5.004, Perl has not included the VT character (0x0b) in the set defined
4786 For compatibility with Perl, \Q...\E sequences are recognized inside character
4842 19. Although correctly diagnosing a missing ']' in a character class, PCRE was
4871 26. The handling of the optimization for finding the first character of a
4872 non-anchored pattern, and for finding a character that is required later in the
5121 (i) A character class whose characters are all within 0-255 is handled as
5123 character > 255 always failed to match such a class; however it should
5126 (ii) A negated character class with a single character < 255 is coded as
5127 "not this character" (OP_NOT). This wasn't working properly when the test
5128 character was multibyte, either singly or repeated.
5133 (iv) The character escapes \b, \B, \d, \D, \s, \S, \w, and \W (either
5139 (v) Classes may now contain characters and character ranges with values
5157 55. Unknown escapes inside character classes (e.g. [\M]) and escapes that
5164 which is run to generate the source of the default character tables. They
5279 1. If an octal character was given, but the value was greater than \377, it
5300 7. Added the beginnings of support for UTF-8 character strings.
5400 11. Added support for POSIX character classes like [:alpha:], which Perl is
5446 causing the entire string to be ignored, instead of just the last character.
5452 character in the pattern, and pre-searching the subject to ensure it is present
5478 4. PCRE wasn't doing the "first character" optimization for patterns starting
5557 to character tables built by pcre_maketables() in the current locale. If NULL
5569 3. The first character computation wasn't working for (?>) groups.
5610 1. A negated single character class followed by a quantifier with a minimum
5613 containing more than one character, or to minima other than one.
5643 1. Negated character classes containing more than one character were failing if
5683 4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
5684 fail on data ending with that character. (It was going on too far, and checking
5685 the next character, typically a binary zero.) This was specific to the
5686 optimized code for single-character negative classes.
5734 like /([ab]*)*/, that is, for classes with more than one character in them.
5752 1. Fixed bug in code for optimizing classes with only one character. It was
5790 2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
5813 escape is read. Inside a character class, it's always an octal escape,
5816 (d) An escaped but undefined alphabetic character is taken as a literal,
5826 6. Changed the handling of character classes; they are now done with a 32-byte
5855 1. /(b)|(:+)/ was computing an incorrect first character.
5870 6. The character tables are now in a separate module whose source is generated
5883 1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
5886 2. Caseless matching was not working in character classes if the characters in
5903 options, and the first character, if set.
5905 9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.