Lines Matching refs:a

4 Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
10 1. If a group that contained a recursive back reference also contained a
11 forward reference subroutine call followed by a non-forward-reference
21 3. A repeated conditional group whose condition was a reference by name caused
22 a buffer overflow if there was more than one group with the given name.
25 4. A recursive back reference by name within a group that had the same name as
26 another group caused a buffer overflow. For example:
29 5. A forward reference by name to a group whose number is the same as the
31 a buffer overflow at compile time. This bug was discovered by the LLVM
34 6. A lookbehind assertion within a set of mutually recursive subpatterns could
35 provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
37 7. Another buffer overflow bug involved duplicate named groups with a
38 reference between their definition, with a group that reset capture
46 being treated as a literal 'l' instead of causing an error.
48 10. There was a buffer overflow if pcre_exec() was called with an ovector of
51 11. If a non-capturing group containing a conditional group that could match
56 \a and \e in test subject lines.
58 13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
64 15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
69 error (correctly) when used outside a class, but did not give an error
70 within a class.
72 17. \h within a class was incorrectly compiled in EBCDIC environments.
74 18. A pattern with an unmatched closing parenthesis that contained a backward
75 assertion which itself contained a forward reference caused buffer
85 /(?:|a|){100}x/ are analysed.
92 pcre_compile() to run for a very long time.
98 25. If (?R was followed by - or + incorrect behaviour happened instead of a
104 27. Similar to (4) above: in a pattern with duplicated named groups and an
106 reference to become recursive if a later named group with the relevant
107 number is encountered. This could lead to a buffer overflow. Wen Guanxing
110 28. If pcregrep was given the -q option with -c or -l, or when handling a
114 control verbs. This issue was found by Karl Skomski with a custom LLVM
120 31. Added a check for integer overflow in conditions (?(<digits>) and
124 32. Handling recursive references such as (?2) when the reference is to a group
126 It has been re-written for PCRE2. Here in PCRE1, a check has been added to
129 33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
130 This issue was found by Karl Skomski with a custom LLVM fuzzer.
133 repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
136 Skomski with a custom LLVM fuzzer.
144 with a custom LLVM fuzzer.
146 38. Fixed a corner case of range optimization in JIT.
154 with a custom LLVM fuzzer.
159 42. In a character class such as [\W\p{Any}] where both a negative-type escape
160 ("not a word character") and a property escape were present, the property
165 44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
166 by a single ASCII character in a class item, was incorrectly compiled in
173 46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
175 When a Unicode property was also in the class (if PCRE_UCP is set, escapes
190 2. If an assertion condition was quantified with a minimum of zero (an odd
193 3. If a pattern in pcretest input had the P (POSIX) modifier followed by an
194 unrecognized modifier, a crash could occur.
196 4. An attempt to do global matching in pcretest with a zero-length ovector
197 caused a crash.
199 5. Fixed a memory leak during matching that could occur for a subpattern
203 6. Catch a bad opcode during auto-possessification after compiling a bad UTF
204 string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad
207 7. A UTF pattern containing a "not" match of a non-ASCII character and a
210 8. When a pattern is compiled, it remembers the highest back reference so that
212 use instead. A conditional subpattern whose condition is a check on a
214 /^(?:(a)|b)(?(1)A|B)/, is another kind of back reference, but it was not
217 was no other kind of back reference (a situation which is probably quite
219 FALSE when the capture could not be consulted, leading to an incorrect
222 9. A reference to a duplicated named group (either a back reference or a test
223 for being set in a conditional) that occurred in a part of the pattern where
227 10. A mutually recursive set of back references such as (\2)(\1) caused a
232 11. If an assertion that was used as a condition was quantified with a minimum
234 unlimited repetition and could match an empty string, a segfault was
240 12. A possessive capturing group such as (a)*+ with a minimum repeat of zero
247 (a) A crash if /K and /F were both set with the option to save the compiled
250 (b) Another crash if the option to print captured substrings in a callout
251 was combined with setting a null ovector, for example \O\C+ as a subject
254 14. A pattern such as "((?2){0,1999}())?", which has a group containing a
255 forward reference repeated a large (but limited) number of times within a
256 repeated outer group that has a zero minimum quantifier, caused incorrect
263 23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
264 call within a group that also contained a recursive back reference caused
269 24. Computing the size of the JIT read-only data in advance has been a source
277 mode. In that example, the range a-j was left out of the class.
286 when this assertion was used as a condition, for example (?(?!)a|b). In
295 30. A pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
299 compile the pattern, leading to a buffer overflow. This bug was discovered
306 32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
307 between a subroutine call and its quantifier was incorrectly compiled,
313 (?(?< for the ! or = that would indicate a lookbehind assertion. This bug
316 34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
317 a fixed maximum following a group that contains a subroutine reference was
321 35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
322 caused a stack overflow instead of the diagnosis of a non-fixed length
325 36. The use of \K in a positive lookbehind assertion in a non-anchored pattern
328 37. There was a similar problem to 36 in pcretest for global matches.
330 38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
331 and a subsequent item in the pattern caused a non-match, backtracking over
333 causing reference to random memory and/or a segfault. There were also some
337 39. The function for finding the minimum length of a matching string could take
338 a very long time if mutual recursion was present many times in a pattern,
355 overflow would give a negative number. The tests are now applied as the
372 7. Fixed a bug concerned with zero-minimum possessive groups that could match
380 the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
382 for an empty string was breaking the inner loop and carrying on at a lower
383 level, when possessive repeated groups should always return to a higher
387 8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
390 9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
393 10. Change 8.34/15 introduced a bug that caused the amount of memory needed
394 to hold a pattern to be incorrectly computed (too small) when there were
402 /(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
404 the reference to 'Name' was incorrectly treated as a reference to a
411 13. When a pattern starting with \s was studied, VT was not included in the
415 14. If a character class started [\Qx]... where x is any character, the class
418 15. If a pattern that started with a caseless match for a character with more
421 optimization improvement, not a bug fix.
425 17. Fixed a number of memory leaks in pcregrep.
427 18. Avoid a compiler warning (from some compilers) for a function call with
428 a cast that removes "const" from an lvalue by using an intermediate
431 19. Incorrect code was compiled if a group that contained an internal recursive
432 back reference was optional (had quantifier with a minimum of zero). This
433 example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
436 20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
437 group that is quantified with an indefinite repeat, caused a compile-time
438 loop which used up all the system stack and provoked a segmentation fault.
452 2. The auto-possessification of character sets was improved: a normal
460 of a match later than the end of the match. The pcretest program was not
462 binary zero. It now reports this situation in a message, and outputs the
471 above a certain threshold (e.g: 256). The only limitation is that the value
475 7. The macros whose names start with RAWUCHAR are placeholders for a future
492 backtracking path, when more than four alternatives are present inside a
499 12. In a caseless character class with UCP support, when a character with more
500 than one alternative case was not the first character of a range, not all
506 enabled. This is not used in Windows, so I have put this test inside a
509 14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT.
510 The algorithm provides a way to skip certain starting offsets, and usually
522 test 3 failed; it now does. If further versions of a French locale ever
525 16. If --with-pcregrep-bufsize was given a non-integer value such as "50K",
526 there was a message during ./configure, but it did not stop. This now
528 If a value less than the minimum is given, the minimum value has always
529 been used, but now a warning is given.
532 was a bug in the test system, which is now fixed. Also, the list of various
542 this issue can be fixed by a performance optimization.
545 not take account of existing stack usage. There is now a new global
551 behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did
579 3. Make a small performance improvement in strlen16() and strlen32() in
586 5. Cleaned up a "may be uninitialized" compiler warning in pcre_exec.c.
588 6. In UTF mode, the code for checking whether a group could match an empty
590 breaking an infinite loop) was broken when the group contained a repeated
591 negated single-character class with a character that occupied more than one
592 data item and had a minimum repetition of zero (for example, [^\x{100}]* in
596 7. The code for checking whether a group could match an empty string was not
597 recognizing that \h, \H, \v, \V, and \R must match a character.
603 that were repeated with a maximizing qualifier (e.g. \X* or \X{2,5}) when
606 (a) If the rest of the pattern did not match after a maximal run of
608 did not always back up over a full grapheme when characters that do not
611 (b) If the match point in a subject started with modifier character, and
614 leading to a segfault or an incorrect match result.
617 recording an incorrect first data item for a match if no other first data
618 item was recorded. For example, the pattern (?(?=ab)ab) recorded "a" as a
622 11. Change 40 for 8.33 (allowing pcregrep to find empty strings) showed up a
623 bug that caused the command "echo a | ./pcregrep -M '|a'" to loop.
639 15. A back reference to a named subpattern when there is more than one of the
647 on a patch by Zoltan Herczeg. It now happens after instead of during
651 (*NO_AUTO_POSSESS) at the start of a pattern.
668 "source" is a bash-ism.
675 literal characters 8 and 9 instead of a binary zero followed by the
690 25. If PCRE_AUTO_CALLOUT and PCRE_UCP were set for a pattern that contained
711 properties for \w, \d, etc) is present in a test regex. Otherwise if the
722 33. There is now a limit (default 250) on the depth of nesting of parentheses.
727 34. Character classes such as [A-\d] or [a-[:digit:]] now cause compile-time
734 when a quantifier follows (?!). I can't see any use for this, but it makes
738 change also in PCRE. It simplifies the code a bit.
740 37. In extended mode, Perl ignores spaces before a + that indicates a
741 possessive quantifier. PCRE allowed a space before the quantifier, but not
744 38. The use of \K (reset reported match start) within a repeated possessive
745 group such as (a\Kb)*+ was not working.
763 mean "start of word" and "end of word", respectively, as a transition aid.
765 43. A minimizing repeat of a class containing codepoints greater than 255 in
789 to unsigned int is reported to make a quite noticeable speed difference in
790 a specific Windows environment. Testing on Linux did also appear to show
811 (a) Unoptimized capturing brackets incorrectly reset on backtrack.
816 cases when there was a capture on one path that was subsequently abandoned
817 after a backtrack. Also, the capture_last value is now reset after a
821 case when an overflowing capture is in a branch that is subsequently
822 abandoned after a backtrack.
825 now prints out the matched string after a yield of 0 or 1.
836 17. The \A escape now records a lookbehind value of 1, though its execution
839 segment is retained when a new segment is processed. Otherwise, if there
841 of a new segment.
857 22. When a pattern was compiled with automatic callouts (PCRE_AUTO_CALLOUT) and
858 there was a conditional group that depended on an assertion, if the
863 condition for a conditional group, for compatibility with automatic
864 callouts, which always insert a callout at this point.
866 24. In 8.31, (*COMMIT) was confined to within a recursive subpattern. Perl also
871 26. Fix infinite loop when /(?<=(*SKIP)ac)a/ is matched against aa.
900 33. An opening parenthesis in a MARK/PRUNE/SKIP/THEN name in a pattern that
901 contained a forward subroutine reference caused a compile error.
911 37. The value of the max lookbehind was not correctly preserved if a compiled
912 and saved regex was reloaded on a host of different endianness.
921 with a pattern such as ^$. It has taken 4 years for anybody to notice! The
927 41. Applied a user patch to fix a number of spelling mistakes in comments.
937 a segmentation fault.
956 6. The PCRE_STARTLINE bit, indicating that a match can occur only at the start
957 of a line, was being set incorrectly in cases where .* appeared inside
958 atomic brackets at the start of a pattern, or where there was a subsequent
964 8. Fixed a number of issues in pcregrep, making it more compatible with GNU
967 (a) There is now no limit to the number of patterns to be matched.
969 (b) An error is given if a pattern is too long.
977 just to those obtained from scanning a directory recursively.
981 (g) In a Windows environment, the default for -d has been changed from
983 of a directory in the file list provokes an error.
989 10. Changed the meaning of \X so that it now matches a Unicode extended
992 11. Patch by Daniel Richard G to the autoconf files to add a macro for sorting
1005 explicit references to (e.g.) \x0a instead of CHAR_LF. There has been a
1007 not quite right. There is now a test that can be run on ASCII systems to
1008 check some of the EBCDIC-related things (but it is not a full test).
1011 in a small tidy to the code.
1029 (a) The Unicode property table now has offsets into a new table of sets of
1035 (b) The code for adding characters or ranges of characters to a character
1036 class has been abstracted into a generalized function that also handles
1040 (c) A bug that is fixed as a result of (b) is that codepoints less than 256
1064 26. Applied a modified version of Daniel Richard G's patch to create
1068 27. Added a definition for CHAR_NULL (helpful for the z/OS port), and use it in
1069 pcre_compile.c when checking for a zero character.
1071 28. Introducing a native interface for JIT. Through this interface, the compiled
1077 29. If pcre_exec() or pcre_dfa_exec() was called with a negative value for
1079 was confusing. There is now a new error PCRE_ERROR_BADLENGTH for this case.
1083 the "old" RFC 2279). Instead, it ended up passing a negative length to
1088 32. Running "pcretest -C pcre8" or "pcretest -C pcre16" gave a spurious error
1091 33. There is now support for generating a code coverage report for the test
1099 25. (*UTF) can now be used to start a pattern in any of the three libraries.
1107 1. Fixing a wrong JIT test case and some compiler warnings.
1109 2. Removed a bashism from the RunTest script.
1111 3. Add a cast to pcre_exec.c to fix the warning "unary minus operator applied
1120 (a) /^(..)\1/ did not partially match "aba" because checking references was
1124 (b) \R did not give a hard partial match if \r was found at the end of the
1127 (c) \X did not give a hard partial match after matching one or more
1130 (d) When newline was set to CRLF, a pattern such as /a$/ did not recognize
1131 a partial match for the string "\r".
1134 a partial match for a CR character at the end of the subject string.
1141 or /S+[+] with a digit between 1 and 7.
1152 12. Applied a (slightly modified) user-supplied patch that improves performance
1154 recursion). Instead of malloc and free for each heap frame each time a
1155 logical recursion happens, frames are retained on a chain and re-used where
1158 13. As documented, (*COMMIT) is now confined to within a recursive subpattern
1161 14. As documented, (*COMMIT) is now confined to within a positive assertion.
1172 19. Added binary file support to pcregrep, including the -a, --binary-files,
1178 21. Fixed a bug for backward assertions with REVERSE 0 in the JIT compiler.
1185 24. Fixed a very old bug in pcretest that caused errors with restarted DFA
1187 retained). Also added to pcre_dfa_exec() a simple plausibility check on
1188 some of the workspace data at the beginning of a restart.
1191 was not doing so when it should - probably a typo introduced by SVN 528
1195 \w+ when the character tables indicated that \x{c4} was a word character.
1196 There were several related cases, all because the tests for doing a table
1199 27. If a pattern contains capturing parentheses that are not used in a match,
1224 SUPPORT_LIBBZ2. This caused a build problem when bzip2 but not gzip (zlib)
1229 36. When /((?:a?)*)*c/ or /((?>a?)*)*c/ was matched against "aac", it set group
1237 codepoints that are too big for the mode are faulted, and in a UTF mode,
1245 both cases the values are those that cannot be the first data item in a UTF
1263 2. Fixed a bug in fixed-length calculation for lookbehinds that would show up
1269 4. For a non-anchored pattern, if (*SKIP) was given with a name that did not
1270 match a (*MARK), and the match failed at the start of the subject, a
1279 6. Add support for 16-bit character strings (a large amount of work involving
1284 from a file.
1292 10. Get rid of a number of -Wunused-but-set-variable warnings.
1295 "x". The similar pattern /(?=(*:x))((*:y)q|)/ did not return a mark at all.
1305 13. Applied Dmitry V. Levin's patch for a more portable method for linking with
1311 than the heap when not using the stack for recursion. This gives a
1330 parentheses, for example, (?>a(*:m)), were not being passed out. This bug
1336 6. Lookbehinds such as (?<=a{2}b) that contained a fixed repetition were
1340 7. While fixing 6 above, I noticed that a number of other items were being
1342 opcodes had not been added to the fixed-length checking code. I have (a)
1348 repetitions, e.g. [^a]{3}, with and without PCRE_CASELESS.
1355 (A)(A)++ which meant that, after a subsequent mismatch, backtracking into
1358 10. Add a cast and remove a redundant test from the code.
1368 14. Perl does not support \N without a following name in a [] class; PCRE now
1371 15. If a forward reference was repeated with an upper limit of around 2000,
1376 the default depends on LINK_SIZE. There is a new upper limit (for safety)
1380 16. A repeated forward reference in a pattern such as (a)(?2){2}(.) was
1381 incorrectly expecting the subject to contain another "a" after the start.
1383 17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier
1391 is a non-match for a non-anchored pattern. For example, if
1392 /b(*:m)f|a(*:n)w/ is matched against "abc", the non-match returns the name
1393 "m", where previously it did not return a name. A side effect of this
1400 19. If the /S+ option was used in pcretest to study a pattern using JIT,
1406 22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
1420 26. If study data was being saved in a file and studying had not found a set of
1427 28. Fixed a possible uninitialized memory bug in pcre_jit_compile.c.
1436 1. Change 37 of 8.13 broke patterns like [:a]...[b:] because it thought it had
1437 a POSIX class. After further experiments with Perl, which convinced me that
1438 Perl has bugs and confusions, a closing square bracket is no longer allowed
1439 in a POSIX name. This bug also affected patterns with classes that started
1442 2. If a pattern such as /(a)b|ac/ is matched against "ac", there is no
1447 be totally fixed without adding another stack variable, which seems a lot
1448 of expense for an edge case. However, I have improved the situation in cases
1449 such as /(a)(b)x|abc/ matched against "abc", where the return code
1453 3. Related to (2) above: when there are more back references in a pattern than
1466 5. When the number of matches in a pcre_dfa_exec() run exactly filled the
1471 6. If a subpattern that was called recursively or as a subroutine contained
1472 (*PRUNE) or any other control that caused it to give a non-standard return,
1476 7. If a pattern such as /a(*SKIP)c|b(*ACCEPT)|/ was studied, it stopped
1479 computing a minimum subject length in the presence of *ACCEPT is difficult
1481 so that no minimum is registered for a pattern that contains *ACCEPT.
1483 8. If (*THEN) was present in the first (true) branch of a conditional group,
1489 10. A pathological pattern such as /(*ACCEPT)a/ was miscompiled, thinking that
1490 the first byte in a match must be "a".
1493 /a(?:.)*?a/ drastically. I've improved things by remembering whether a
1495 optimizations are restored. It would be nice to do this on a per-group
1505 14. If (*THEN) appeared in a group that was called recursively or as a
1510 matched, but there is a failure in C so that it backtracks to (*THEN), PCRE
1513 characters to be part of a surrounding alternative, whereas PCRE was
1519 16. Related to 15 above: Perl does not treat the | in a conditional group as
1520 creating alternatives. Such a group is treated in the same way as an
1524 17. If a user had set PCREGREP_COLO(U)R to something other than 1:31, the
1528 inevitable for groups that contain captures, but it can lead to a lot of
1533 suppress the check for a minimum subject length at run time. (If it was
1541 is implemented as a macro that evaluates to its argument more than once,
1542 contravening the C 90 Standard (I haven't checked a later standard). There
1559 4. There were a number of related bugs in the code for matching backreferences
1563 (a) A reference to 3 copies of a 2-byte code matched only 2 of a 3-byte
1564 code. (b) A reference to 2 copies of a 3-byte code would not match 2 of a
1571 the failing character and a reason code are placed in the vector.
1576 enough not to be a problem. It makes the returned offset consistent with
1579 7. pcretest now gives a text phrase as well as the error number when
1580 pcre_exec() or pcre_dfa_exec() fails; if the error is a UTF-8 check
1583 8. When \R was used with a maximizing quantifier it failed to skip backwards
1584 over a \r\n pair if the subsequent match failed. Instead, it just skipped
1585 back over a single character (\n). This seems wrong (because it treated the
1586 two characters as a single entity when going forwards), conflicts with the
1595 which was always a bit of a fudge. It also means that there is one less
1598 (?i:([^b]))(?1) should not match "ab", but previously PCRE gave a match.
1604 11. While implementing 10, a number of bugs in the handling of groups were
1607 (?<=(a)+) was not diagnosed as invalid (non-fixed-length lookbehind).
1608 (a|)*(?1) gave a compile-time internal error.
1609 ((a|)+)+ did not notice that the outer group could match an empty string.
1610 (^a|^)+ was not marked as anchored.
1611 (.*a|.*)+ was not marked as matching at start or after a newline.
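
The first case in the list above, a lookbehind whose length is not fixed, can be illustrated with a rough sketch. It uses Python's standard `re` module purely as a stand-in: `re` is not PCRE, its diagnostics differ, and the helper name `lookbehind_is_accepted` is hypothetical. The only point shown is that an engine requiring fixed-length lookbehinds should reject `(?<=(a)+)` at compile time while accepting a fixed repetition such as `(?<=a{2}b)`.

```python
import re


def lookbehind_is_accepted(pattern: str) -> bool:
    """Return True if Python's re compiles the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python's re reports variable-width lookbehinds as a compile error.
        return False


# A repeated group inside a lookbehind has no fixed length.
variable_width = lookbehind_is_accepted(r"(?<=(a)+)b")

# A fixed repetition inside a lookbehind has a known length (three characters).
fixed_width = lookbehind_is_accepted(r"(?<=a{2}b)c")

# The fixed-length form can then be used for matching.
match = re.search(r"(?<=a{2}b)c", "aabc")
```

Here `variable_width` records the rejection and `fixed_width` the acceptance. PCRE itself reports such errors through the error arguments of pcre_compile() rather than through an exception.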
1614 function. Special calls to this function are now indicated by setting a
1615 value in a variable in the "match data" data block.
1631 15. When (*ACCEPT) was used in a subpattern that was called recursively, the
1635 16. If a recursively called subpattern ended with (*ACCEPT) and matched an
1637 pattern had matched an empty string, and so incorrectly returned a no
1641 and also for the obeyed branch of a conditional subexpression, which used
1647 18. If a pattern containing \R was studied, it was assumed that \R always
1651 19. If a pattern containing (*ACCEPT) was studied, the minimum subject length
1654 20. If /S is present twice on a test pattern in pcretest input, it now
1662 22. When an atomic group that contained a capturing parenthesis was
1664 capturing was not being forgotten if a higher numbered group was later
1665 captured. For example, /(?>(a))b|(a)c/ when matching "ac" set capturing
1666 group 1 to "a", when in fact it should be unset. This applied to multi-
1672 subject after a captured substring, to make it easier to tell which of a
1677 values correctly. For example, if ((?>(a+)b)+aabab) is matched against
1679 "aaa". Previously, it would have been "a". As part of this code
1684 (?(?=(a))a) was matched against "a", no capturing was returned.
1686 26. When studying a pattern that contained subroutine calls or assertions,
1687 the code for finding the minimum length of a possible match was handling
1689 group 1 called group 2 while simultaneously a separate group 2 called group
1698 first character it looked at was a mark character.
1707 31. If \k was not followed by a braced, angle-bracketed, or quoted name, PCRE
1708 compiled something random. Now it gives a compile-time error (as does
1711 32. A *MARK encountered during the processing of a positive assertion is now
1714 33. If --only-matching or --colour was set on a pcregrep call whose pattern
1715 had alternative anchored branches, the search for a second match in a line
1718 with a backwards assertion. For example /\b01|\b02/ also matched "0102"
1728 36. \g was being checked for fancy things in a character class, when it should
1729 just be a literal "g".
1731 37. PCRE was rejecting [:a[:digit:]] whereas Perl was not. It seems that the
1732 appearance of a nested POSIX class supersedes an apparent external class.
1733 For example, [:a[:digit:]b:] matches "a", "b", ":", or a digit. Also,
1735 example, [:a[:abc]b:] gives unknown class "[:abc]b:]". PCRE now behaves
1738 38. PCRE was giving an error for \N with a braced quantifier such as {1,} (this
1745 such as ((?1))((?2)). There is now a runtime test that gives an error if a
1746 subgroup is called recursively as a subpattern for a second time at the
1750 41. A pattern such as /(?(R)a+|(?R)b)/ is quite safe, as the recursion can
1751 happen only once. PCRE was, however, incorrectly giving a compile time error
1754 PCRE is compiling a conditional subpattern, but actual runaway loops are
1757 42. It seems that Perl allows any characters other than a closing parenthesis
1768 (a) The default value of the buffer size parameter has been increased from
1780 (e) If a line being scanned overflows pcregrep's buffer, an error is now
1783 45. Add a pointer to the latest mark to the callout data block.
1785 46. The pattern /.(*F)/, when applied to "abc" with PCRE_PARTIAL_HARD, gave a
1789 47. The pattern /f.*/8s, when applied to "for" with PCRE_PARTIAL_HARD, gave a
1790 complete match instead of a partial match. This bug was dependent on both
1793 48. For a pattern such as /\babc|\bdef/ pcre_study() was failing to set up the
1800 1. Fixed some typos in the markup of the man pages, and wrote a script that
1803 2. On a big-endian 64-bit system, pcregrep did not correctly process the
1806 went into the wrong half of a long int.)
1810 of course, ignore a request for colour when reporting lines that do not
1814 -M (multiline) and the pattern match finished with a line ending.
1816 5. In pcregrep, when a pattern that ended with a literal newline sequence was
1825 7. If pcregrep was compiled under Windows, there was a reference to the
1828 reported by a user. I've moved the definition above the reference.
1835 to it in the current branch. For example, in ((a|b)(*THEN)(*F)|c..) it
1841 2. (*COMMIT) was not overriding (*THEN), as it does in Perl. In a pattern
1847 3. If \s appeared in a character class, it removed the VT character from
1849 in [\x00-\xff\s]. (This was a bug related to the fact that VT is not part
1856 data for a partial match starts at this character. This means that, for
1857 example, /(?<=abc)def/ gives a partial match for the subject "abc"
1862 previously a full match would be given. However, setting PCRE_PARTIAL_HARD
1863 has an implication that the given string is incomplete (because a partial
1864 match is preferred over a full match). For this reason, these items now
1865 give a partial match in this situation. [Aside: previously, the one case
1866 /t\b/ matched against "cat" with PCRE_PARTIAL_HARD set did return a partial
1867 match rather than a full match, which was wrong by the old rules, but is
1870 6. There was a bug in the handling of #-introduced comments, recognized when
1872 If a UTF-8 multi-byte character included the byte 0x85 (e.g. U+0445, whose
1873 UTF-8 encoding is 0xd1,0x85), this was misinterpreted as a newline when
1885 cater for a lack of strerror(). These oversights have been fixed.
1891 11. When the -o option was used, pcregrep was setting a return code of 1, even
1901 (a) When the newline convention was "crlf", pcretest got it wrong, skipping
1905 (b) The pcretest code also had a bug, causing it to loop forever in UTF-8
1907 a non-ASCII character. (The code for advancing by one character rather
1911 the cases when CRLF is a valid newline sequence.
1914 as a starting offset was within the subject string. There is now a new
1920 starting offset points to the beginning of a UTF-8 character was
1923 16. Added PCRE_ERROR_SHORTUTF8 to make it possible to distinguish between a
1933 18. At a user's suggestion, the macros GETCHAR and friends (which pick up UTF-8
1934 characters from a string of bytes) have been redefined so as not to use
1939 19. If \c was followed by a multibyte UTF-8 character, bad things happened. A
1941 character, that is, a byte less than 128. (In EBCDIC mode, the code is
1944 20. Recognize (*NO_START_OPT) at the start of a pattern to set the PCRE_NO_
1969 3. Inside a character class, \B is treated as a literal by default, but
1973 4. Inside a character class, PCRE always treated \R and \X as literals,
1983 result to (void) does not stop the warnings; a more elaborate fudge is
1984 needed. I've used a macro to implement this.
1986 7. Minor change to pcretest.c to avoid a compiler warning.
1993 use Unicode properties. (*UCP) at the start of a pattern can be used to set
1999 11. In UTF-8 mode, if a pattern that was compiled with PCRE_CASELESS was
2000 studied, and the match started with a letter with a code point greater than
2005 12. If a pattern that was studied started with a repeated Unicode property
2011 13. pcre_study() now recognizes \h, \v, and \R when constructing a bit map of
2015 \R, and also a number of cases that involve Unicode properties, both
2018 15. If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8
2023 16. Added a lot of (int) casts to avoid compiler warnings in systems where
2026 17. Added a check for running out of memory when PCRE is compiled with
2029 18. If the last data line in a file for pcretest does not have a newline on
2030 the end, a newline was missing in the output.
2033 less than 128) in its various bitmaps. However, there is a facility for
2038 caused a problem in UTF-8 mode when pcre_study() was used to create a list
2039 of bytes that can start a match. For \s, it was including 0x85 and 0xa0,
2050 21. A pattern such as (?&t)(?#()(?(DEFINE)(?<t>a)) which has a forward
2051 reference to a subpattern the other side of a comment that contains an
2052 opening parenthesis caused either an internal compiling error, or a
2065 original author of that file, following a query about its status.
2068 inttypes.h instead. This fixes a bug that was introduced by change 8.01/8.
2070 5. A pattern such as (?&t)*+(?(DEFINE)(?<t>.)) which has a possessive
2071 quantifier applied to a forward-referencing subroutine call, could compile
2082 I've fixed the data, and added a kludgy way of testing at compile time that
2085 8. Following on from 7, I added a similar kludge to check the length of the
2089 much relocation at load time. To find a text, the string is searched,
2091 which could happen if a new error number was added without updating the
2094 10. \K gave a compile-time error if it appeared in a lookbehind assertion.
2096 11. \K was not working if it appeared in an atomic group or in a group that
2097 was called as a "subroutine", or in an assertion. Perl 5.11 documents that
2108 item in branch that calls a recursion is a subroutine call - as in the
2111 was not correctly checking the subroutine for matching a non-empty string.
2114 overrun had occurred. This is a "should never occur" error, but it can be
2124 1. If a pattern contained a conditional subpattern with only one branch (in
2125 particular, this includes all (*DEFINE) patterns), a call to pcre_study()
2129 2. For patterns such as (?i)a(?-i)b|c where an option setting at the start of
2136 3. A pattern such as ^(?!a(*SKIP)b) where a negative assertion contained one
2143 assertion subpattern, including such a pattern used as a condition,
2157 (a) Change DEBUG to PCRE_DEBUG.
2160 called "current" as "current_branch", to prevent a collision with the
2161 Linux macro when compiled as a kernel module.
2164 prevent a collision with the Linux macro when compiled as a kernel
2169 when building, a check for int64_t is made, and if it is found, it is used
2177 10. Change the standard AC_CHECK_LIB test for libbz2 in configure.ac to a
2184 - The compiler thus generates a "C" signature for the test function.
2198 most of the time, it *can* run out if it is given a pattern that contains a
2203 version may start with zero. Using 08 or 09 is a bad idea because users
2206 configure.ac, and also added a check that gives an error if 08 or 09 are
2211 in a UTF-8 pattern where \W was quantified with a minimum of 3.
2229 comment: "Figure out how to create a longlong from a string: strtoll and
2230 equivalent. It's not enough to call AC_CHECK_FUNCS: hpux has a strtoll, for
2233 18. A subtle bug concerned with back references has been fixed by a change of
2234 specification, with a corresponding code fix. A pattern such as
2235 ^(xa|=?\1a)+$ which contains a back reference inside the group to which it
2241 moved on to the rest of the pattern, a later failure that backtracks into
2244 any group that contains a reference to itself to be an atomic group; that
2257 2. Changed the call to open a subject file in pcregrep from fopen(pathname,
2258 "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
2259 in a Windows environment.
2276 (with a space rather than an '='). The man page documented the '=' forms,
2285 is correctly used as a library, but I received one complaint about 50K of
2287 program rather than using a library. Anyway, it does no harm.
2290 was a minimum greater than 1 for a wide character in a possessive
2292 which had an unlimited repeat of a nested, fixed maximum repeat of a wide
2293 character. Chaos in the form of incorrect output or a compiling loop could
2296 9. The restrictions on what a pattern can contain when partial matching is
2305 PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
2311 needed. This makes a difference in some odd cases such as Z(*FAIL) with the
2316 12. Restarting a match using pcre_dfa_exec() after a partial match did not work
2317 if the pattern had a "must contain" character that was already found in the
2323 13. The string returned by pcre_dfa_exec() after a partial match has been
2325 first character of the match. This makes a difference only if the pattern
2326 starts with a lookbehind assertion or \b or \B (\K is not supported by
2330 14. Added a pcredemo man page, created automatically from the pcredemo.c file,
2335 libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
2338 16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
2343 17. If a caller to the POSIX matching function regexec() passes a non-zero
2344 value for nmatch with a NULL value for pmatch, the value of
2347 18. RunGrepTest did not have a test for the availability of the -u option of
2351 19. If an odd number of negated classes containing just a single character
2352 interposed, within parentheses, between a forward reference to a named
2355 subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
2365 21. If the maximum number of capturing subpatterns in a recursion was greater
2371 PCRE did not set those parentheses (unlike Perl). I have now found a way to
2379 pattern matches a fixed length string. PCRE did not allow this; now it
2382 25. I finally figured out how to implement a request to provide the minimum
2383 length of subject string that was needed in order to match a given pattern.
2385 on.) This code has now been added to pcre_study(); it finds a lower bound
2394 Oops. What is worse, even when it was passed study data, there was a bug in
2402 confusion. (This is a difference from Perl.)
2405 numbers, as required by 27 above), and a test is made by name in a
2406 conditional pattern, either for a subpattern having been matched, or for
2407 recursion in such a pattern, all the associated numbered subpatterns are
2451 start or after a newline", because the conditional assertion was not being
2455 9. If auto-callout was enabled in a pattern with a conditional group whose
2474 17. Implemented support for UTF-8 encoding in EBCDIC environments, a patch
2478 18. Added to pcre_internal.h two configuration checks: (a) If both EBCDIC and
2486 (and probably a segfault) for patterns such as ^"((?(?=[a])[^"])|b)*"$
2490 20. If a pattern that was compiled with callouts was matched using pcre_dfa_
2491 exec(), but without supplying a callout function, matching went wrong.
2493 21. If PCRE_ERROR_MATCHLIMIT occurred during a recursion, there was a memory
2495 is smaller, the saved offsets during recursion go onto a local stack
2500 22. There was a missing #ifdef SUPPORT_UTF8 round one of the variables in the
2511 (a) PCRE_BUILD_TESTS can be set OFF not to build the tests, including
2521 ^X(?3)(a)(?|(b)|(q))(Y) is an example.
2523 26. Changed a few more instances of "const unsigned char *" to USPTR, making
2524 the feature of a custom pointer more pervasive (as requested by a user).
2530 28. Added support for (*UTF8) at the start of a pattern.
2540 Muncher (http://www.admuncher.com/) by Peter Kankowski. This uses a two-
2541 stage table and inline lookup instead of a function, giving speed ups of 2
2549 3. Change 12 for 7.7 introduced a bug in pcre_study() when a pattern contained
2550 a group with a zero quantifier. The result of the study could be incorrect,
2558 a UTF-8 string, even in non-UTF-8 mode. Now it generates a single byte in
2559 non-UTF-8 mode. If the value is greater than 255, it gives a warning about
2571 and a #define of that name to empty if it is not externally set. This is to
2577 11. An option change at the start of a pattern that had top-level alternatives
2578 could cause overwriting and/or a crash. This command provoked a crash in
2585 12. For a pattern where the match had to start at the beginning or immediately
2586 after a newline (e.g. /.*anything/ without the DOTALL flag), pcre_exec() and
2591 13. The change to pcretest in 12 above threw up a couple more cases when pcre_
2595 the data contained the byte 0x85 as part of a UTF-8 character within its
2601 16. Added a missing copyright notice to pcrecpp_internal.h.
2606 18. Tidied a few places to stop certain compilers from issuing warnings.
2609 supplied by Stefan Weber. I made a further small update for 7.8 because
2610 there is a change of source arrangements: the pcre_searchfuncs.c module is
2617 1. Applied Craig's patch to sort out a long long problem: "If we can't convert
2618 a string to a long long, pretend we don't even have a long long." This is
2622 pre-7.6 versions, which defined a global no_arg variable instead of putting
2625 3. Remove a line of dead code, identified by coverity and reported by Nuno
2650 8. Applied Craig's patch to pcrecpp.cc to fix a problem in OS X that was
2651 caused by fix #2 above. (Subsequently also a second patch to fix the
2652 first patch. And a third patch - this was a messy problem.)
2664 12. Previously, a group with a zero repeat such as (...){0} was completely
2666 was called as a subroutine from elsewhere in the pattern, things went wrong
2668 pattern, with a new opcode that causes them to be skipped at execution
2674 (a) A lone ] character is dis-allowed (Perl treats it as data).
2679 (c) A data ] in a character class must be notated as \] because if the
2680 first data character in a class is ], it defines an empty class. (In
2686 14. A pattern such as /(?2)[]a()b](abc)/ which had a forward reference to a
2687 non-existent subpattern following a character class starting with ']' and
2700 16. The implementation of 13c above involved the invention of a new opcode,
2702 cannot be changed at match time, I realized I could make a small
2716 19. There was a typo in the file ucpinternal.h where f0_rangeflag was defined
2725 1. A character class containing a very large number of characters with
2726 codepoints greater than 255 (in UTF-8 mode, of course) caused a buffer
2736 - Fixed a problem with static linking.
2739 - Added a number of HAVE_XXX tests, including HAVE_WINDOWS_H and
2744 4. A user submitted a patch to Makefile that makes it easy to create
2763 of a program that users should build themselves after PCRE is installed, so
2773 1. Applied a patch from Craig: "This patch makes it possible to 'ignore'
2784 defined or documented. It seems to have been a typo for PCRE_STATIC, so
2787 5. The construct (?&) was not diagnosed as a syntax error (it referenced the
2788 first named subpattern) and a construct such as (?&a) would reference the
2789 first named subpattern whose name started with "a" (in other words, the
2791 expected" is now given for (?&) (a zero-length name), and this patch also
2793 was a reference to a non-existent subpattern).
2795 6. The erroneous patterns (?+-a) and (?-+a) give different error messages;
2799 7. Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
2800 than the number used in the conditional) now cause a compile-time error.
2803 seems to me that giving a diagnostic is better.
2814 11. The program that makes PCRE's Unicode character property table had a bug
2817 It amalgamated them into a single range, with the script of the first of
2835 12. The -o option (show only the matching part of a line) for pcregrep was not
2836 compatible with GNU grep in that, if there was more than one match in a
2840 13. If the -o and -v options were combined for pcregrep, it printed a blank
2847 15. The pattern (?=something)(?R) was not being diagnosed as a potentially
2849 being skipped when checking for a possible empty match (negative lookaheads
2856 17. Specifying a possessive quantifier with a specific limit for a Unicode
2867 RE::GlobalReplace(). As a result, the number of replacements returned was
2868 double what it should be. I removed one of the increments, but Craig sent a
2882 22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
2883 crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
2885 this means that the match must be either at the beginning, or after a
2886 newline. The bug was in the code for advancing after a failed match and
2887 checking that the new position followed a newline. It was not taking
2891 character classes. PCRE was not treating the sequence [:...:] as a
2896 for example, whereas PCRE did not - it did not recognize a POSIX character
2897 class. This seemed a bit dangerous, so the code has been changed to be
2911 means that a class such as [\s] counted as "explicit reference to CR or
2914 the change happens only if \r or \n (or a literal CR or LF) character is
2920 moved the internal flags into a new 16-bit field to free up more option
2923 3. The appearance of (?J) at the start of a pattern set the DUPNAMES option,
2949 relocations when a shared library is dynamically loaded. A technique of
2950 using a single long string with a table of offsets can drastically reduce
2993 I have a vague recollection that the change was concerned with compiling in
3000 characters but of course it shouldn't be taken as a newline when it is part
3003 characters when looking for a newline.
3015 7. Change 7.0/38 introduced a new limit on the number of nested non-capturing
3017 limit also applies to "virtual nesting" when a pattern is recursive, and in
3021 immediately return the result unconditionally, it uses a "tail recursion"
3022 feature to save stack. However, when a subpattern that can match an empty
3024 optimization. That gives it a stack frame in which to save the data for
3037 set at 30,000 so that the product of these two numbers did not overflow a
3041 made it possible to implement the integer overflow checks in a much more
3046 10. Fixed a bug in the documentation for get/copy named substring when
3051 11. Because Perl interprets \Q...\E at a high level, and ignores orphan \E
3060 a conditional subpattern that could match an empty string if that
3065 12. A pattern like \X?\d or \P{L}?\d in non-UTF-8 mode could cause a backtrack
3074 15. Updated the test for a valid UTF-8 string to conform to the later RFC 3629.
3087 18. An unterminated class in a pattern like (?1)\c[ with a "forward reference"
3091 something other than just ASCII characters) inside a group that had an
3092 unlimited repeat caused a loop at compile time (while checking to see
3095 20. Debugging a pattern containing \p or \P could cause a crash. For example,
3098 21. An orphan \E inside a character class could cause a crash.
3100 22. A repeated capturing bracket such as (A)? could cause a wild memory
3103 23. There are several functions in pcre_compile() that scan along a compiled
3105 behind). There were bugs in these functions when a repeated \p or \P was
3110 (a) An item such as \p{Yi}{3} in a lookbehind was not treated as fixed
3113 (b) An item such as \pL+ within a repeated group could cause crashes or
3130 \p or \P) caused a compile-time loop.
3132 28. More problems have arisen in unanchored patterns when CRLF is a valid line
3141 there's a new PCRE_INFO_HASCRORLF option for finding out whether a compiled
3164 stack recursion. This gives a massive performance boost under BSD, but just
3165 a small improvement under Linux. However, it saves one field in the frame
3170 (a) (?-n) (where n is a string of digits) is a relative subroutine or
3173 (b) (?+n) is also a relative subroutine call; it refers to the nth next
3187 (g) (?| introduces a group in which the numbering of parentheses in each
3199 9. A pattern with a very large number of alternatives (more than several
3211 pcrecpp::RE("a*").FullMatch("aaa") matches, while
3212 pcrecpp::RE("a*?").FullMatch("aaa") does not, and
3213 pcrecpp::RE("a*?\\z").FullMatch("aaa") does again.
3215 12. If \p or \P was used in non-UTF-8 mode on a character greater than 127
3227 2. Part of the patch fixed a problem with the pcregrep tests. The test of -r
3231 appropriate test as a short-term fix. In the longer term there may be an
3253 6. A Windows user reported a minor discrepancy with test 2, which turned out
3254 to be caused by a trailing space on an input line that had got lost in his
3261 as "const" (a) because they are and (b) because it helps the PHP
3262 maintainers who have recently made a script to detect big data structures
3275 called when UTF-8 support is disabled. Otherwise there are problems with a
3280 (a) It was defining its arguments as char * instead of void *.
3283 a long time ago when I wrote it, but is no longer the case.
3299 (a) For a maximizing quantifier, if the two different cases of the
3302 back up over a mixture of the two cases, it incorrectly assumed they
3307 the other case of a UTF-8 character when checking ahead for a match
3308 while processing a minimizing repeat. If the check also involved
3309 matching a wide character, but failed, corruption could cause an
3310 erroneous result when trying to check for a repeat of the original
3315 (a) The RunTest script now detects the internal link size and whether there
3325 there was also a test in test 3 (the locale tests) that used /B and
3327 I have added a new /Z option to pcretest that replaces the length and
3331 14. If erroroffset was passed as NULL to pcre_compile, it provoked a
3335 ^$ would give a match between the \r and \n of a subject such as "A\r\nB".
3337 ending, and so does not match in that case. It's only a pattern such as ^$
3351 string at the end of a line did not allow for this case. They now check for
3356 buffer for a data line had to be extended.
3359 CRLF as a newline sequence.
3367 24. Added a man page for pcre-config.
3373 1. Fixed a signed/unsigned compiler warning in pcre_compile.c, shown up by
3382 default when a C program starts up. In most systems, only ASCII printing
3386 (a) When it is outputting text in the compiled version of a pattern, bytes
3389 (b) When it is outputting text that is a matched part of a subject string,
3390 it does the same, unless a different locale has been set for the match
3393 4. Fixed a major bug that caused incorrect computation of the amount of memory
3394 required for a compiled pattern when options that changed within the
3399 or a glibc crash with a message such as "pcretest: free(): invalid next
3411 5. Applied patches from Google to: (a) add a QuoteMeta function to the C++
3412 wrapper classes; (b) implement a new function in the C++ scanner that is
3414 recursion in the regex matching; (c) add a paragraph to the documentation
3423 was being set to -1 for the "end of line" case (supposedly a value that no
3428 8. In pcre_version.c, the version string was being built by a sequence of
3430 string (as it is for production releases) called a macro with an empty
3436 9. On the advice of a Windows user, included <io.h> and <fcntl.h> in Windows
3463 line or as a continued sequence of lines) by extending its input buffer if
3465 a string of junk being passed to pcre_compile() if the pattern was longer
3468 17. I have done a major re-factoring of the way pcre_compile() computes the
3469 amount of memory needed for a compiled pattern. Previously, there was code
3470 that made a preliminary scan of the pattern in order to do this. That was
3473 have been a number of bugs (see for example, 4 above). I have now found a
3474 cunning way of running the real compile function in a "fake" mode that
3476 ever using a few hundred bytes of working memory and without too many
3479 depth of parentheses has been removed (though this was never a serious
3480 limitation, I suspect). However, there is a downside: pcre_compile() now
3482 hope this isn't a big issue. There is no effect on runtime performance.
3484 18. Fixed a minor bug in pcretest: if a pattern line was not terminated by a
3485 newline (only possible for the last line of a file) and it was a
3486 pattern that set a locale (followed by /Lsomething), pcretest crashed.
3489 matching only, not compiling. (2) Both -t and -tm can be followed, as a
3490 separate command line item, by a number that specifies the number of
3494 20. Extended pcre_study() to be more clever in cases where a branch of a
3495 subpattern has no definite first character. For example, (a*|b*)[cd] would
3497 first character must be a, b, c, or d.
3500 a subpattern (or the entire pattern) that was being tested for matching an
3501 empty string contained only one non-empty item after a nested subpattern.
3505 22. The pcretest program now has a new pattern option /B and a command line
3512 as a*b as a*+b. More specifically, if something simple (such as a character
3513 or a simple class like \d) has an unlimited quantifier, and is followed by
3517 24. A recursive reference to a subpattern whose number was greater than 39
3521 25. Realized that a little bit of performance could be had by replacing
3526 27. Possessive quantifiers such as a++ were previously implemented by turning
3527 them into atomic groups such as (?>a+). Now they have their own opcodes,
3531 28. A pattern such as (?=(\w+))\1: which simulates an atomic group using a
3533 the first matched character to be a colon. This applied both to named and
3538 30. I was sent a "project" file called libpcre.a.dev which I understand makes
3541 31. There is now a check in pcretest against a ridiculously large number being
3542 returned by pcre_exec() or pcre_dfa_exec(). If this happens in a /g or /G
3548 when there were unescaped parentheses in a character class, parentheses
3549 escaped with \Q...\E, or parentheses in a #-comment in /x mode.
3555 34. Added a number of extra features that are going to be in Perl 5.10. On the
3560 (a) Named groups can now be defined as (?<name>...) or (?'name'...) as well
3564 (b) A recursion or subroutine call to a named group can now be defined as
3567 (c) A backreference to a named group can now be defined as \k<name> or
3571 (d) A conditional reference to a named group can now use the syntax
3577 is always false. There may be only one alternative in such a group.
3584 (g) The escape \gN or \g{N} has been added, where N is a positive or
3600 this code in non-simple cases. For a pattern such as ^(a()*)* matched
3601 against aaaa the result was just "a" rather than "aaaa", for example. Two
3606 capturing bracket numbers. This is a tidy that I avoided doing when I
3622 43. Updated pcregrep to support "--newline=any". In the process, I fixed a
3626 44. Added a number of casts and did some reorganization of signed/unsigned int
3628 "this" as "item" because it is a C++ keyword.
3635 definition from the final binary if PCRE is built into a static library and
3638 46. For an unanchored pattern, if a match attempt fails at the start of a
3648 necessary. The code is crude, but this _is_ just a test program. The
3657 was missing a "static" storage class specifier.
3660 containing an extended class (one that cannot be represented by a bitmap
3662 [\pZ]). Almost always one would set UTF-8 mode when processing such a
3664 [Detail: two cases were found: (a) a repeated subpattern containing an
3665 extended class; (b) a recursive reference to a subpattern that followed a
3675 write a Perl script that can interpret lines of an input file either as
3682 7. In multiline (/m) mode, PCRE was matching ^ after a terminating newline at
3688 a pointer to an int instead of a pointer to an unsigned long int. This
3691 9. Applied a patch from the folks at Google to pcrecpp.cc, to fix "another
3695 length of a subpattern name. The product of these values is used to compute
3696 the size of the memory block for a compiled pattern. By supplying a very
3697 long subpattern name and a large number of named subpatterns, the size
3706 could in principle occur. The compiled length of a repeated subpattern is
3711 13. Added the ability to use a named substring as a condition, using the
3717 15. In UTF-8 mode, with the PCRE_DOTALL option set, a quantified dot in the
3725 17. A character class other than a single negated character that had a minimum
3729 18. A valid (though odd) pattern that looked like a POSIX character
3732 in some cases to crash with a glibc free() error. This could even happen if
3733 the pattern terminated after [[ but there just happened to be a sequence of
3734 letters, a binary zero, and a closing ] in the memory that followed.
3744 instead uses a traditional byte scheme when presented with byte
3747 Sadly, a wide octal escape does not cause a switch, and in a string with
3750 /\500|\x{1ff}/ matches \500 or \777 because the whole thing is treated as a
3761 a warning about an unused variable.
3766 with the documentation.] However, when a pattern was studied with
3768 as a possible starting character. Of course, this did no harm; it just
3771 22. Removed a now-redundant internal flag bit that recorded the fact that case
3773 byte" processing, but is no longer used. This recovers a now-scarce options
3776 the days when it was an int rather than a uint) to free up another bit for
3788 recursions. This makes a big difference to stack usage for some patterns.
3790 26. If a subpattern containing a named recursion or subroutine reference such
3792 the space required for the compiled pattern went wrong and gave too small a
3797 27. Applied patches from Google (a) to support the new newline modes and (b) to
3800 28. Change free() to pcre_free() in pcredemo.c. Apparently this makes a
3805 \q<number> in a data line sets the "match limit" value
3806 \Q<number> in a data line sets the "match recursion limit" value
3815 1. Change 16(a) for 6.5 broke things, because PCRE_DATA_SCOPE was not defined
3818 2. Change 25 for 6.5 broke compilation in a build directory out-of-tree
3819 because pcre.h is no longer a built file.
3831 /1234/, partially matching against "123" and then "a4" gave a match.
3835 (a) All non-match returns from pcre_exec() were being treated as failures
3840 specifying a regex that has nested indefinite repeats, for instance).
3848 used to set a locale for matching. The --locale=xxxx long option has
3849 been added (no short equivalent) to specify a locale explicitly on the
3869 (k) Allow "-" to be used as a file name for -f as well as for a data file.
3877 items such as (?R) or (?1), when the recursion could match a number of
3879 outside the recursion, there was a failure, the code tried to back up into
3892 a character class. The bit map for "word" characters is now created
3898 permanent tables. Instead, the bit maps for such a class were previously
3906 subtraction was done in the overall bitmap for a character class, meaning
3907 that a class such as [\x0c[:blank:]] was incorrect because \x0c would not
3912 (a) pcrecpp.cc: "to handle a corner case that may or may not happen in
3915 (b) pcrecpp.cc: "corrects a bug when negative radixes are used with
3922 "configure" and the latter not, in order to fix a problem somebody had
3925 (e) Improve the error-handling of the C++ wrapper a little bit.
3930 have a standard memmove() function (and is therefore rarely compiled),
3931 contained two bugs: (a) use of int instead of size_t, and (b) it was not
3932 returning a result (though PCRE never actually uses the result).
3934 9. In the POSIX regexec() interface, if nmatch is specified as a ridiculously
3941 11. The POSIX flag REG_NOSUB is now supported. When a pattern that was compiled
3956 15. Added some code to make it possible, when PCRE is compiled as a C++
3957 library, to replace subject pointers for pcre_exec() with a smart pointer
3970 (a) pcreposix.h still had just "extern" instead of either of these macros;
3978 17. Added a new limit, MATCH_LIMIT_RECURSION, which limits the depth of nesting
3988 (a) Updated the table to Unicode 4.1.0.
3992 (c) I revised the way the table is implemented to a much improved format
3997 considerably. I realized I did not need to use a tree structure after
3998 all - a binary chop search is just as efficient. Having reduced the
4004 19. In UTF-8 mode, a backslash followed by a non-ASCII character was not
4007 20. When matching a repeated Unicode property with a minimum greater than zero,
4011 there is a check for at least the minimum number of bytes.
4020 23. Recognize \x{...} as a code point specifier, even when not in UTF-8 mode,
4021 but give a compile time error if the value is greater than 0xff.
4029 "configure". I have turned pcre.h into a distributed file, no longer built
4031 no longer a pcre.h.in file.
4033 However, this change necessitated a change to the pcre-config script as
4044 1. Change 6.0/10/(l) to pcregrep introduced a bug that caused separator lines
4047 consider it to be a bug, and have restored the previous behaviour.
4055 certain files from the library's source, which is a bit cleaner.
4070 (a) If C++ support was not built, "make install" and "make test" still
4078 (d) The use of @CPP_OBJ@ directly caused a blank line preceded by a
4079 backslash in a target when C++ was disabled. This confuses some
4087 4. Added a setting of -export-symbols-regex to the link command to remove
4090 "_pcre_". This is not a perfect job, because (a) we have to except some
4092 available (and never for static libraries). I have made a note to try to
4093 find a way round (a) in the future.
4101 a minimum quantifier for a parenthesized subpattern overflowed and became
4113 5. Named capturing subpatterns were not being correctly counted when a pattern
4114 was compiled. This caused two problems: (a) If there were more than 100
4141 from "make" can be reduced a bit by putting "@" in front of each libtool
4144 5. Patch from the folks at Google for configure.in to be a bit more thorough
4145 in checking for a suitable C++ installation before trying to compile the
4146 C++ stuff. This should fix a reported problem when a compiler was present,
4151 retained in the file doc/pcre.txt, which is a concatenation in text format
4155 files that come with release 6. I also added a few comments about the C++
4171 below) to a single monolithic source would have made it really too
4172 unwieldy, quite apart from causing all the code to be included in a
4184 a different (DFA) algorithm. Although it is slower than the original
4189 including restarting after a partial match.
4199 the /f option on a pattern can be used to set this.
4206 (a) Refactored how -x works; insert ^(...)$ instead of setting
4210 (b) Added the -w (match as a word) option.
4213 than one at a time available.
4215 (d) Implemented a pcregrep test script.
4228 same. (This required a bit of code, as the output is generated
4229 automatically from a table. It wasn't just a text change.)
4232 option but starts with a hyphen. Could be a pattern or a path name
4233 starting with a hyphen, for instance.
4235 (h) "-" can be given as a file name to represent stdin.
4241 (j) The option --label=xxx can be used to supply a name to be used for
4258 greps, it now suppresses the error message for a non-existent or non-
4259 accessible file (but not the return code). There is a new option called
4277 15. Added a second compiling function called pcre_compile2(). The only
4278 difference is that it has an extra argument, which is a pointer to an
4279 integer error code. When there is a compile-time failure, this is set
4282 (but then you may as well call pcre_compile(), which is now just a
4283 wrapper). This facility is provided because some applications need a
4291 17. Added a new option, REG_DOTALL, to the POSIX function regcomp(). This
4315 containing multiple characters in a single byte-string. Each character
4316 is now matched using a separate opcode. However, there may be more than one
4327 4. On the advice of a Windows user, the lines
4337 for the benefit of those environments where the "b" makes a difference.
4340 to know about it. I have put a hack into configure.in that adds in code
4341 to set GCC=yes if CC=icc. This seems to end up at a point in the
4353 8. Negated POSIX character classes that used a combination of internal tables
4360 start at the start point or following a newline. The same bug applied to
4362 preceding ".*" at the start, unless the pattern required a fixed first
4363 character. This was a failing pattern: "(?!.bcd).*". The bug is now fixed.
4365 10. In UTF-8 mode, when moving forwards in the subject after a failed match
4370 users. (Previously there was a macro definition, but it apparently wasn't
4375 a compiled regex to be saved and re-used at a later time by a different
4378 13. Modified the pcre-config script so that, when run on Solaris, it shows a
4379 -R library as well as a -L library.
4381 14. The debugging options of pcretest (-d on the command line or D on a
4383 that contained multibyte characters and which was followed by a quantifier.
4409 This is a straight binary dump of the data, with the saved pointer to
4411 written too. After writing, pcretest reads a new pattern.
4413 (ii) If, instead of a pattern, "<rest-of-line" is given, pcretest reads a
4416 pcretest will instead treat the initial "<" as a pattern delimiter.
4421 and 16-bit fields in a compiled pattern, to simulate a pattern that
4422 was compiled on a host of opposite endianness.
4429 to pcre_exec() should be used to pass in a tables address if a value
4432 22. Calling pcre_exec() with a negative value of the "ovecsize" parameter is
4433 now diagnosed as an error. Previously, most of the time, a negative number
4435 NULL, a crash could occur.
4438 new versions from the libtool 1.5 distribution (the last one is a copy of
4439 a file called libtool.m4). This seems to have fixed the need to patch
4451 that it can be compiled in a version that does not call itself recursively.
4453 each "recursion" in a frame on the heap, and gets/frees frames whenever it
4455 of setjmp/longjmp. The whole thing is implemented by a set of macros that
4477 3. When matching a UTF-8 string, the test for a valid string at the start has
4479 to a byte that is the start of a UTF-8 character. If not, it returns
4486 that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings
4489 5. Fixed a bug (appearing twice) that I could not find any way of exploiting!
4494 6. I had used a variable called "isblank" - this is a C99 function, causing
4497 7. Cosmetic: (a) only output another newline at the end of pcretest if it is
4510 (a) Some "const" qualifiers were missing.
4517 11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
4519 went into a loop.
4521 12. A recursive reference to a subpattern that was within another subpattern
4522 that had a minimum quantifier of zero caused PCRE to crash. For example,
4523 (x(y(?2))z)? provoked this bug with a subject that got as far as the
4524 recursion. If the recursively-called subpattern itself had a zero repeat,
4527 13. In pcretest, the buffer for reading a data line was set at 30K, but the
4533 that was followed by a possessive quantifier.
4535 15. Modified the Makefile to add libpcre.la as a prerequisite for
4536 libpcreposix.la because I was told this is needed for a parallel build to
4539 16. If a pattern that contained .* following optional items at the start was
4542 matching string must start with a or b or c. The correct conclusion for
4543 this pattern is that a match can start with any character.
4549 1. In UTF-8 mode, a character class containing characters with values between
4555 might give a very teeny performance improvement.
4557 3. Documentation bug: the value of the capture_top field in a callout is *one
4561 in incorrectly linking with a previously installed version. They now link
4568 7. If a pattern was successfully studied, and the -d (or /D) flag was given to
4570 output. Unfortunately, the structure contains a field that has a different
4572 showed this size failed. As the block is currently always of a fixed size,
4578 standard C has not required this for well over a decade. Sigh.
4581 callout_data field, which is a void * field. However, some picky compilers
4588 is found. There is an option for disabling this check in cases where the
4591 11. In response to a bug report, I changed one line in Makefile.in from
4593 -Wl,--out-implib,.libs/lib@WIN_PREFIX@pcreposix.dll.a \
4595 -Wl,--out-implib,.libs/@WIN_PREFIX@libpcreposix.dll.a \
4619 only 0-9, a-f, and A-F, but the character types table is locale-
4621 table is now used for this - though it costs 256 bytes, a table is
4638 provoke a segmentation fault.
4640 6. A lookbehind at the start of a pattern in UTF-8 mode could also cause PCRE
4643 7. A lookbehind in a pattern matched in non-UTF-8 mode on a PCRE compiled with
4645 contained bytes with the 0x80 bit set and the 0x40 bit unset in a lookbehind
4681 to give trouble on HP-UX 11.0, so getting rid of it seems like a good idea
4689 . In pcretest the fact that a const uschar * doesn't automatically cast to
4690 a void * provoked a warning.
4693 and a few more missing casts.
4696 option, a class that contained a single character with a value between 128
4700 option, a class that contained several characters, but with at least one
4707 1. Compiling with gcc -pedantic found a couple of places where casts were
4708 needed, and a string in dftables.c that was longer than standard compilers are
4711 2. Compiling with Sun's compiler found a few more places where the code could
4717 compiled code will be run. I can't find a reference for HOST_CFLAGS, but by
4727 6. A problem with one of PCRE's optimizations was discovered. PCRE remembers a
4728 literal character that is needed in the subject for a match, and scans along to
4731 Problem: the scan can take a lot of time if the subject is very long (e.g.
4736 first character of an anchored pattern as "needed", thus provoking a search
4747 1. If a comment in an extended regex that started immediately after a meta-item
4749 all kinds of weird effects. Example: /#/ was bad; /()#/ was bad; /a#/ was not.
4755 from a single perltest script.
4760 class [:space:] *does* include VT, thereby creating a mess.
4762 5. Added the class [:blank:] (a GNU extension from Perl 5.8) to match only
4765 6. Perl 5.005 was a long time ago. It's time to amalgamate the tests that use
4768 7. Perl 5.8 has changed the meaning of patterns like /a(?i)b/. Earlier versions
4790 floating-point constant arithmetic" warnings from a Microsoft compiler. Added a
4795 option for pcretest, so I've replaced it by a simple function that does just
4804 as (?>x*). In other words, if what is inside (?>...) is just a single repeated
4809 13. A change of greediness default within a pattern was not taking effect at
4810 the current level for patterns like /(b+(?U)a+)/. It did apply to parenthesized
4811 subpatterns that followed. Patterns like /b+(?U)a+/ worked because the option
4818 alternatives of a regex begin with \G, the expression is anchored to the start
4823 "a(?x: b c )d" did not match "XabcdY" but did match "Xa b c dY". It should have
4829 POSIX classes only within a class (e.g. /[[:alpha:]]/).
4842 19. Although correctly diagnosing a missing ']' in a character class, PCRE was
4854 23. Added a new extension: a condition to go with recursion. If a conditional
4858 24. When there was a very long string of literal characters (over 255 bytes
4864 start of line for a non-DOTALL pattern) when a pattern started with (.*) and
4865 there was a subsequent back reference to those brackets. This meant that, for
4869 references whatsoever. (See below for a better fix that came later.)
4871 26. The handling of the optimization for finding the first character of a
4872 non-anchored pattern, and for finding a character that is required later in the
4878 28. Added a new feature that provides some of the functionality that Perl
4879 provides with (?{...}). The facility is termed a "callout". The way it is done
4881 pcre_callout to its entry point. Like pcre_malloc and pcre_free, this is a
4885 This provides a means of identifying different callout points. When PCRE
4886 reaches such a point in the regex, if pcre_callout has been set, the external
4887 function is called. It is provided with data in a structure called
4889 matching continues; if it returns a non-zero value, the match at the current
4893 29. pcretest is upgraded to test the callout functionality. It provides a
4899 \C- do not supply a callout function
4903 30. If pcregrep was called with the -l option and just a single file name, it
4904 output "<stdin>" if a match was found, instead of the file name.
4907 slots is less than POSIX_MALLOC_THRESHOLD, use a block on the stack to pass to
4908 pcre_exec(). This saves a malloc/free per call. The default value of
4912 32. The default maximum size of a compiled pattern is 64K. There have been a
4921 (a) Moved the debugging function for printing out a compiled regex into
4926 (b) Defined the list of op-code names for debugging as a macro in
4929 (c) Defined a table of op-code lengths for simpler skipping along compiled
4930 code. This is again a macro in internal.h so that it is next to the
4941 used to name a group. Names consist of alphanumerics and underscores, and must
4943 (?P>name) which is a PCRE extension to the Python extension. Groups still have
4945 a name/number map. There are three relevant calls:
4949 PCRE_INFO_NAMETABLE yields a pointer to the map.
4951 The map is a vector of fixed-size entries. The size of each entry depends on
4960 38. There was a case of malloc(0) in the POSIX testing code in pcretest. Avoid
4961 calling malloc() with a zero argument.
4963 39. Change 25 above had to resort to a heavy-handed test for the .* anchoring
4964 optimization. I've improved things by keeping a bitmap of backreferences with
4966 fact referenced, the optimization can be applied. It is unlikely that a
4968 the match to follow \n) will appear inside brackets with a number greater than
4971 40. Added a new compile-time option PCRE_NO_AUTO_CAPTURE. This has the effect
4979 failure while inside a recursive subpattern call now causes the
4982 42. It is now possible to set a limit on the number of times the match()
4983 function is called in a call to pcre_exec(). This facility makes it possible to
4984 limit the amount of recursion and backtracking, though not in a directly
4985 obvious way, because the match() function is used in a number of different
4987 string (for non-anchored patterns). The default limit is, for compatibility, a
4990 (a) When configuring PCRE before making, you can use --with-match-limit=n
4991 to set a default value for the compiled library.
4993 (b) For each call to pcre_exec(), you can pass a pcre_extra block in which
4994 a different value is set. See 45 below.
4998 43. Added a new function pcre_config(int, void *) to enable run-time extraction
5026 of match() calls in a pcre_exec() execution. See 42 above.
5042 flags a bitmap indicating which of the following fields are set
5044 match_limit a way of specifying a limit on match() calls for a specific
5061 in a pcre_extra block provided by pcre_study(), or create your own pcre_extra
5064 46. pcretest has been extended to test the PCRE_EXTRA_MATCH_LIMIT feature. If a
5071 47. There's a new option for pcre_fullinfo() called PCRE_INFO_STUDYSIZE. It
5072 returns the size of the data block pointed to by the study_data field in a
5075 created by pcre_study(). The fourth argument should point to a size_t variable.
5076 pcretest has been extended so that this information is shown after a successful
5093 values. In particular, returning PCRE_ERROR_NOMATCH forces a standard
5097 (ii) The pcre_extra structure (see 45 above) has a void * field called
5099 pcre_callout_block structure has a field of the same name. The contents of
5103 testing, the pcretest program has a new data escape
5107 If the callout function in pcretest receives a non-zero value as
5114 51. Extensions to UTF-8 support are listed below. These all apply when (a) PCRE
5122 a bit map, and the map is inverted for negative classes. Previously, a
5123 character > 255 always failed to match such a class; however it should
5124 match if the class was a negative one (e.g. [^ab]). This has been fixed.
5126 (ii) A negated character class with a single character < 255 is coded as
5142 (vi) pcregrep now has a --utf-8 option (synonym -u) which makes it call
5146 PCRE_INFO_FIRSTBYTE because it is a byte value. However, the old name is
5147 retained for backwards compatibility. (Note that LASTLITERAL is also a byte
5151 a number of separate man pages. These also give rise to individual HTML pages;
5152 these are now put in a separate directory, and there is an index.html page that
5221 3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a
5229 relinked by libtool. The documentation has been turned into a man page, so
5236 (iv) Added -f, --file to read patterns from a file.
5241 7. Upgraded Makefile.in to allow for compiling in a different directory from
5252 10. A new release of gcc defines printf() as a macro, which broke pcretest
5253 because it had an ifdef in the middle of a string argument for printf(). Fixed
5260 12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
5265 The new limit is 65535, which I hope will not be a "real" limit.
5283 2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
5284 the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.
5291 4. Add "make test" as a synonym for "make check". Corrected some comments in
5297 6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
5311 This is purely a bug fixing release.
5319 2. The pcretest program was not imitating Perl correctly for the pattern /a*/g
5326 systems, as it is not a Standard C header. It has been removed.
5339 HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which
5376 required a bigger vector, with some working space on the end. This means that
5381 information files, and making it build pcre-config (a GNU standard). Also added
5382 libtool support for building PCRE as a shared library, which is now the
5393 9. Added a new function, pcre_fullinfo() with an extensible interface. It can
5409 the next newline as if a previous match had failed.
5412 and could get into a loop if a null string was matched other than at the start
5418 5. Added Paul Sokolovsky's minor changes to make it easy to compile a Win32 DLL
5432 3. Typo in pcretest.c; a cast of (unsigned char *) in the POSIX regexec() call
5437 However, I haven't made this a standard facility. The documentation doesn't
5445 7. Fixed bug: a zero repetition after a literal string (e.g. /abcde{0}/) was
5448 8. If a pattern like /"([^\\"]+|\\.)*"/ is applied in the normal way to a
5449 non-matching string, it can take a very, very long time, even for strings of
5453 before running the real match. In other words, it applies a heuristic to detect
5455 with a string that has no trailing " it gives "no match" very quickly.
5470 occurrences in a string.
5474 /+ outputs the rest of the string that follows a match
5480 it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
5481 the letter 'x'. On long subject strings, this gives a significant speed-up.
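A minimal Python sketch of the first-character idea in the entry above, not PCRE's actual implementation: because any match of /\bxyz/ must begin with 'x', a scanner can jump directly to each 'x' in the subject and attempt a match only there. The helper name `search_with_first_char` is hypothetical, and Python's `re` module stands in for PCRE purely for illustration.

```python
import re


def search_with_first_char(pattern, first_char, subject):
    """Search for pattern, trying only positions where first_char occurs.

    Assumes the caller knows every match must start with first_char
    (for /\\bxyz/ that character is 'x').
    """
    compiled = re.compile(pattern)
    pos = subject.find(first_char)
    while pos != -1:
        # match() with a start offset still sees the preceding character,
        # so \b is evaluated against the whole subject, not a slice.
        found = compiled.match(subject, pos)
        if found is not None:
            return found
        pos = subject.find(first_char, pos + 1)
    return None


if __name__ == "__main__":
    result = search_with_first_char(r"\bxyz", "x", "axyz xyz")
    print(result.start() if result else None)
```

On a long subject with few occurrences of the first character, most positions are skipped by the plain character scan and never reach the regex matcher.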
5490 2. Fixed a bug which caused patterns starting with .* not to work correctly
5493 not pass a newline unless PCRE_DOTALL is set. It now assumes anchoring only if
5503 If such patterns were nested a few deep, this could multiply and become a real
5506 2. Added /M option to pcretest to show the memory requirement of a specific
5507 pattern. Made -m a synonym of -s (which does this globally) for compatibility.
5510 compiled in such a way that the backtracking after subsequent failure was
5511 pessimal. Something like (a){0,3} was compiled as (a)?(a)?(a)? instead of
5512 ((a)((a)(a)?)?)? with disastrous performance if the maximum was of any size.
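The two expansions named in this entry can be generated and compared with a short Python sketch. It only illustrates the shape of the rewrite; the function names `flat_optional` and `nested_optional` are hypothetical, and Python's `re` module is used as a stand-in rather than PCRE itself.

```python
import re


def flat_optional(atom, maximum):
    """Expand (atom){0,maximum} as a flat run: (a)?(a)?(a)?"""
    return f"({atom})?" * maximum


def nested_optional(atom, maximum):
    """Expand (atom){0,maximum} as nested optionals: ((a)((a)(a)?)?)?"""
    if maximum == 0:
        return ""
    pattern = f"({atom})?"  # innermost copy
    for _ in range(maximum - 1):
        # Wrap the pattern so far behind one more copy of the atom.
        pattern = f"(({atom}){pattern})?"
    return pattern


if __name__ == "__main__":
    flat = flat_optional("a", 3)
    nested = nested_optional("a", 3)
    print(flat)
    print(nested)
    # Both forms accept the same strings: zero to three copies of "a".
    for subject in ["", "a", "aa", "aaa", "aaaa"]:
        same = bool(re.fullmatch(flat, subject)) == bool(re.fullmatch(nested, subject))
        print(repr(subject), same)
```

In the nested form, a failure to match one copy of the atom abandons all the later copies in a single step, whereas the flat form leaves each later optional group to be tried separately.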
5525 pattern. Locked out the use of \ as a delimiter. If \ immediately follows
5528 4. Added the convenience functions for extracting substrings after a successful
5550 a building problem on Windows NT with a FAT file system.
5556 1. Changed the API for pcre_compile() to allow for the provision of a pointer
5586 such a setting is global if at outer level; local otherwise
5590 A backreference to itself in a repeated group matches the previous
5602 11. Added tests from the Perl 5.005_02 distribution. This showed up a few
5610 1. A negated single character class followed by a quantifier with a minimum
5628 1. A pattern such as /((a)*)*/ was not being diagnosed as in error (unlimited
5629 repeat of a potentially empty string).
5655 3. Fixed memory leak which occurred when a regex with back references was
5673 1. An erroneous regex with a missing opening parenthesis was correctly
5681 3. The erroneous regex a[]b caused an array overrun reference.
5683 4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
5685 the next character, typically a binary zero.) This was specific to the
5688 5. Added a contributed patch from the TIN world which does the following:
5690 + Add an undef for memmove, in case the system defines a macro for it.
5692 + Add a definition of offsetof(), in case there isn't one. (I don't know
5712 form of a regex was going wrong in the case of back references followed by
5721 2. Applied a contributed patch to get rid of places where it used to remove
5737 as /((?>a*))*/ (a PCRE_EXTRA facility).
5753 initializing a 32-byte map regardless, which could cause it to run off the end
5771 2. Optimized negated single characters not to use a bit map.
5781 6. Added the POSIX-style API wrapper in pcreposix.a and testing facilities in
5788 1. Added a simple "pgrep" utility to the distribution.
5790 2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
5794 3. Fixed serious bug. If a pattern had a back reference, but the call to
5795 pcre_exec() didn't supply a large enough ovector to record the related
5801 4. Increased the compatibility with Perl in a number of ways:
5803 (a) . no longer matches \n by default; an option PCRE_DOTALL is provided
5806 (b) $ matches before a terminating newline by default; an option
5810 (c) The handling of \ followed by a digit other than 0 is now supposed to be
5813 escape is read. Inside a character class, it's always an octal escape,
5814 even if it is a single digit.
5816 (d) An escaped but undefined alphabetic character is taken as a literal,
5826 6. Changed the handling of character classes; they are now done with a 32-byte
5837 \x20 at the start of a run of normal characters. These were being treated as
5848 2. Get pcre_study() to generate a bitmap of initial characters for non-
5863 4. Set the anchored flag if a branch starts with .* or .*? because that tests
5866 5. Split up into different modules to avoid including unneeded functions in a
5870 6. The character tables are now in a separate module whose source is generated
5883 1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
5884 (e.g. /a{1,3}/) was broken (I mis-optimized it).
5891 4. Make PCRE_ANCHORED public and accept as a compile option.
5905 9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.
5912 match the empty string as in /(a*)*/. It was looping and ultimately crashing.
5915 a subpattern that had matched an empty string, e.g. /(a|)\1*/. It now does what