Lines Matching refs:pattern

32 sion pattern matching using the same syntax and semantics as Perl, with
111 feature that allows users to turn on UTF support from within a pattern,
113 pattern that begins with "(*UTF8)" or "(*UTF)" turns on UTF-8 mode,
115 instead of individual 8-bit characters. This causes both the pattern
122 pcre_fullinfo() function to check the compiled pattern's options for
124 option at compile time. This causes a compile time error if a pattern
132 Another way that performance can be hit is by running a pattern that
134 Nested unlimited repeats in a pattern are a common example. PCRE pro-
210 pcre16 *pcre16_compile(PCRE_SPTR16 pattern, int options,
214 pcre16 *pcre16_compile2(PCRE_SPTR16 pattern, int options,
327 you must take care when processing any particular pattern to use func-
463 given when a compiled pattern is passed to a function that processes
464 patterns in the other mode, for example, if a pattern compiled with
481 If there is an error while compiling a pattern, the error text that is
542 pcre32 *pcre32_compile(PCRE_SPTR32 pattern, int options,
546 pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options,
659 pattern to use functions from just one library. For example, if you
660 want to study a pattern that was compiled with pcre32_compile(), you
792 The error PCRE_ERROR_BADMODE is given when a compiled pattern is passed
794 if a pattern compiled with pcre_compile() is passed to pcre32_exec().
809 If there is an error while compiling a pattern, the error text that is
979 appropriate) when you call one of the pattern compiling functions.
993 ters. If you want to be able to use the pattern escapes \P, \p, and \X,
1058 By default, the sequence \R in a pattern matches any Unicode newline
1087 Within a compiled pattern, offset values are used to point from one
1091 for a compiled pattern of around 64K. This is sufficient to handle all
1138 edly (sometimes recursively) when matching a pattern with the
1386 match a pattern, the two algorithms give the same answer. A difference
1388 the pattern
1403 resented as a tree structure. An unlimited repetition in the pattern
1405 pattern to a given subject string (from a given starting point) can be
1415 depth-first search of the pattern tree. That is, it proceeds along a
1430 in the pattern.
1434 strings that are matched by portions of the pattern in parentheses.
1464 subject. If the pattern
1474 ter repeats at the end of a pattern (as well as internally). For exam-
1475 ple, the pattern "a\d+" is compiled as if it were "a\d++" because there
1489 match what is quantified, for example in a pattern like this:
1493 This pattern matches "aaab!" but not "aaa!", which would be matched by
1495 it is matched as if it were a standalone pattern at the current point,
1497 pattern.
1591 pcre *pcre_compile(const char *pattern, int options,
1595 pcre *pcre_compile2(const char *pattern, int options,
1795 compiled pattern. The function pcre_version() returns a pointer to a
1799 block containing a compiled pattern. This is provided for the benefit
1829 starts to compile a parenthesized part of a pattern. When parentheses
1850 pattern is compiled, or when it is matched.
1854 the start of the pattern itself; this overrides any other settings. See
1862 ment for a non-anchored pattern. There is more detail about this in the
1879 ing, so the same compiled pattern can safely be used by several threads
1983 since it allows the compiled pattern to be up to 64K in size. Larger
1996 parentheses (of any kind) in a pattern. This limit is imposed to cap
1997 the amount of system stack used when a pattern is compiled. It is spec-
2029 pcre *pcre_compile(const char *pattern, int options,
2033 pcre *pcre_compile2(const char *pattern, int options,
2039 to compile a pattern into an internal form. The only difference between
2045 The pattern is a C string terminated by a binary zero, and is passed in
2046 the pattern argument. A pointer to a single block of memory that is
2062 unset from within the pattern (see the detailed description in the
2064 different parts of the pattern, the contents of the options argument
2071 if compilation of a pattern fails, pcre_compile() returns NULL, and
2074 try to free it. Normally, the offset from the start of the pattern to
2081 Some errors are not detected until the whole pattern has been scanned;
2082 in these cases, the offset passed back is the length of the pattern.
2096 compiled pattern, and used again by pcre_exec() and pcre_dfa_exec()
2097 when the pattern is matched. For more discussion, see the section on
2107 "^A.*Z", /* the pattern */
2118 If this bit is set, the pattern is forced to be "anchored", that is, it
2121 achieved by appropriate constructs in the pattern itself, which is the
2127 all with number 255, before each pattern item. For discussion of the
2136 PCRE is built. It can be overridden from within the pattern, or by set-
2137 ting an option when a compiled pattern is matched.
2141 If this bit is set, letters in the pattern match both upper and lower
2143 changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
2154 If this bit is set, a dollar metacharacter in the pattern matches only
2159 Perl, and no way to set it within a pattern.
2163 If this bit is set, a dot metacharacter in the pattern matches a char-
2168 be changed within a pattern by a (?s) option setting. A negative class
2175 not be unique. This can be helpful for certain types of pattern when it
2182 If this bit is set, most white space characters in the pattern are
2198 within a pattern by a (?x) option setting.
2202 of the pattern, as described in the section entitled "Newline conven-
2204 of comment is a literal newline sequence in the pattern; escape
2210 sequences in a pattern, for example within the sequence (?( that intro-
2217 little use. When set, any backslash in a pattern that is followed by a
2224 within a pattern.
2228 If this option is set, an unanchored pattern is required to match
2238 (1) A lone closing square bracket in a pattern causes a compile-time
2240 as a data character). Thus, the pattern AB]CD becomes illegal when this
2245 tive to fail). A pattern such as (\1)(a) succeeds when this option is
2279 changed within a pattern by a (?m) option setting. If there are no new-
2280 lines in a subject string, or no occurrences of ^ or $ in a pattern,
2285 This option locks out interpretation of the pattern as UTF-8 (or UTF-16
2287 vents the creator of the pattern from switching to UTF interpretation
2288 by starting the pattern with (*UTF). This may be useful in applications
2328 The only time that a line break in a pattern is specially recognized
2341 theses in the pattern. Any opening parenthesis that is not followed by
2360 time, it is remembered with the compiled pattern and assumed at match-
2381 within the pattern.
2385 This option causes PCRE to regard both the pattern and the subject as
2393 When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
2397 pattern is valid, and you want to skip this check for performance rea-
2399 effect of passing an invalid UTF-8 string as a pattern is undefined. It
2417 1 \ at end of pattern
2418 2 \c at end of pattern
2516 If a compiled pattern is going to be used several times, it is worth
2519 tern as its first argument. If studying the pattern produces additional
2527 passed; these are described below in the section on matching a pattern.
2529 If studying the pattern does not produce any useful information,
2546 the pattern is further compiled into machine code that executes much
2566 When you are finished with a pattern, you can free the memory used for
2569 freed with pcre_free(), just like the pattern itself. This will still
2579 re = pcre_compile("pattern", 0, &error, &erroroffset, NULL);
2590 Studying a pattern does two things: first, a lower bound for the length
2591 of subject string that is needed to match the pattern is computed. This
2598 Studying a pattern is also useful for non-anchored patterns that do not
2608 You might want to do this if your pattern contains callouts or (*MARK)
2630 the PCRE_UCP option can be set when a pattern is compiled; this causes
2670 pattern, and the same tables are used via this pointer by pcre_study()
2677 sion below in the section on matching a pattern). This facility is pro-
2681 the reloaded pattern is matched. Attempting to use this facility to
2682 match a pattern in a different locale from the one in which it was com-
2696 pattern. The second argument is the result of pcre_study(), or NULL if
2697 the pattern was not studied. The third argument specifies which piece
2705 PCRE_ERROR_BADENDIANNESS the pattern was compiled with different
2710 The "magic number" is placed at the start of each compiled pattern as
2712 anness error can occur if a compiled pattern is saved and reloaded on a
2714 the length of the compiled pattern:
2729 Return the number of the highest back reference in the pattern. The
2735 Return the number of capturing subpatterns in the pattern. The fourth
2749 a non-anchored pattern. The name of this option refers to the 8-bit
2758 pattern such as (cat|cow|coyote), its value is returned. In the 8-bit
2765 (a) the pattern was compiled with the PCRE_MULTILINE option, and every
2768 (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
2769 set (if it were set, the pattern would be anchored),
2771 -1 is returned, indicating that the pattern matches only at the start
2790 a non-anchored pattern. The fourth argument should point to an int
2794 pattern such as (cat|cow|coyote), 1 is returned, and the character
2798 (a) the pattern was compiled with the PCRE_MULTILINE option, and every
2801 (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
2802 set (if it were set, the pattern would be anchored),
2804 2 is returned, indicating that the pattern matches only at the start of
2810 If the pattern was studied, and this resulted in the construction of a
2818 Return 1 if the pattern contains any explicit matches for CR or LF
2825 Return 1 if the (?J) or (?-J) option setting is used in the pattern,
2831 Return 1 if the pattern was studied with one of the JIT options, and
2834 available in this version of PCRE, or that the pattern was not studied
2836 ticular pattern. See the pcrejit documentation for details of what can
2841 If the pattern was successfully studied with a JIT option, return the
2852 example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
2862 Return 1 if the pattern can match an empty string, otherwise 0. The
2867 If the pattern set a match limit by including an item of the form
2876 lookbehind assertion in the pattern. This information is useful when
2882 is processed. Otherwise, if there are no lookbehinds in the pattern, \A
2887 If the pattern was studied and a minimum length for matching subject
2930 found in the pattern. In the absence of (?| this is the order of
2935 pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
2953 to be different for each compiled pattern.
2957 Return 1 if the pattern can be used for partial matching with
2966 Return a copy of the options with which the pattern was compiled. The
2969 by any top-level option settings at the start of the pattern itself. In
2971 starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with
2975 A pattern is automatically anchored by PCRE if all of its top-level
2989 If the pattern set a recursion limit by including an item of the form
2997 Return the size of the compiled pattern in bytes (for all three
3003 the pcre structure. Studying a compiled pattern, with or without JIT,
3013 section entitled "Studying a pattern" above). The format of the
3027 lows something of variable length. For example, for the pattern
3044 the data block that contains a compiled pattern. It is provided for the
3047 pattern, but you want to free the block when they are all done.
3049 When a pattern is compiled, the reference count field is initialized to
3057 if a pattern is compiled on one host and then transferred to a host
3068 compiled pattern, which is passed in the code argument. If the pattern
3072 strings with the same pattern.
3079 In most applications, the pattern will have been compiled (and option-
3091 NULL, /* we didn't study the pattern */
3142 search trees. The classic example is a pattern that uses nested unlim-
3152 When pcre_exec() is called with a pattern that was successfully studied
3167 start of a pattern of the form
3195 start of a pattern of the form
3208 then reloaded, because the tables that were used to compile a pattern
3214 that were used when the pattern was compiled. If this is not the case,
3215 the behaviour of pcre_exec() is undefined. Therefore, when a pattern is
3218 matically passed with the compiled pattern from pcre_compile() to
3222 set to point to a suitable variable. If the pattern contains any back-
3226 names are within the compiled pattern; if you wish to retain such a
3227 name you must copy it before freeing the memory of a compiled pattern.
3241 If the pattern was successfully studied with one of the just-in-time
3251 matching position. If a pattern was compiled with PCRE_ANCHORED, or
3261 choice that was made or defaulted when the pattern was compiled.
3270 defaulted when the pattern was compiled. For details, see the descrip-
3274 match failure for an unanchored pattern.
3277 set, and a match attempt for an unanchored pattern fails when the cur-
3278 rent position is at a CRLF sequence, and the pattern contains no
3284 expected. For example, if the pattern is .+A (and the PCRE_DOTALL
3287 However, the pattern [\r\n]A does match that string, because it con-
3298 pattern.
3320 set. If there are alternatives in the pattern, they are tried. If all
3322 example, if the pattern
3334 not at the start of the subject is permitted. If the pattern is
3335 anchored, such a match can occur only if the pattern contains \K.
3338 PCRE_NOTEMPTY_ATSTART, but it does make a special case of a pattern
3362 pre-scan of the subject that takes place before the pattern is run.
3375 operation. Consider the pattern
3388 result is "no match". If the pattern is studied, more start-up opti-
3390 may be recorded. Consider the pattern
3396 finally an empty string. If the pattern is studied, the final attempt
3399 pattern does not affect the overall match result, which is still "no
3469 sets are valid). Unlike the pattern string, the subject may contain
3475 string and setting PCRE_NOTBOL in the case of a pattern that begins
3476 with any kind of lookbehind. For example, consider the pattern
3491 Finding all the matches in a subject is tricky when the pattern can
3502 If a non-zero starting offset is passed when the pattern is anchored,
3504 if the pattern does not require the match to be at the start of the
3509 In general, a pattern matches a certain portion of the subject, and in
3511 parts of the pattern. Following the usage in Jeffrey Friedl's book,
3513 subpattern" is used for a fragment of a pattern that picks out a sub-
3540 portion of the subject string matched by the entire pattern. The next
3563 match. For example, consider the pattern
3577 subpatterns there are in a compiled pattern. The smallest size for
3579 offsets of the substring matched by the whole pattern, is (n+1)*3.
3583 if the string "abc" is matched against the pattern (a|(z))(bc) the
3590 matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
3597 spond to capturing parentheses in the pattern are never changed. That
3598 is, if a pattern contains n capturing parentheses, no more than ovec-
3612 The subject string did not match the pattern.
3627 pattern that was compiled in an environment of one endianness is run in
3633 While running the pattern match, an unknown item was encountered in the
3634 compiled pattern. This error could be caused by a bug in PCRE or by
3635 overwriting of the compiled pattern.
3639 If a pattern contains back references, but the ovector that is passed
3694 PCRE_PARTIAL option was used with a compiled pattern containing items
3701 by a bug in PCRE or by overwriting of the compiled pattern.
3735 the pattern. Specifically, it means that either the whole pattern or a
3744 This error is returned when a pattern that was successfully studied
3751 This error is given if a pattern that was compiled by the 8-bit library
3756 This error is given if a pattern that was compiled and saved is
3758 pcre_pattern_to_host_byte_order() can be used to convert such a pattern
3763 This error is returned when a pattern that was successfully studied
3901 zero extracts the substring that matched the entire pattern, whereas
3964 ber. For example, for this pattern
3971 piled pattern, and the second is the name. The yield of the function is
3987 to the compiled pattern. This is needed in order to gain access to the
3995 Warning: If the pattern uses the (?| feature to set up multiple subpat-
4009 When a pattern is compiled with the PCRE_DUPNAMES option, names for
4028 first argument is the compiled pattern, and the second is the name. The
4034 tion entitled Information about a pattern above. Given all the rele-
4090 against a compiled pattern, using a matching algorithm that scans the
4107 keeping track of multiple paths through the pattern tree. More
4118 NULL, /* we didn't study the pattern */
4179 if the pattern
4209 character repeats at the end of a pattern (as well as internally). For
4210 example, the pattern "a\d+" is compiled as if it were "a\d++" because
4305 ily passing control to the caller of PCRE in the middle of pattern
4314 default value is zero. For example, this pattern has two callout
4319 If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled,
4321 item in the pattern. For example, if PCRE_AUTO_CALLOUT is used with the
4322 pattern
4331 alternation bar. If the pattern contains a conditional group whose con-
4341 Automatic callouts can be used for tracking the progress of pattern
4342 matching. The pcretest program has a pattern qualifier (/C) that sets
4345 to optimize the performance of a particular pattern.
4356 compiled as if it were a++[bc]. The pcretest output when this pattern
4369 passing PCRE_NO_AUTO_POSSESS to pcre_compile(), or starting the pattern
4386 callouts. For example, if the pattern is
4395 If the pattern is studied, PCRE knows the minimum length of a matching
4401 MIZE option to the matching function, or by starting the pattern with
4439 piled into the pattern (that is, the number after ?C for manual call-
4455 modified starting point. If the pattern is not anchored, the callout
4456 function may be called several times from the same point in the pattern
4484 pattern string.
4488 pattern string. When the callout immediately precedes an alternation
4489 bar, a closing parenthesis, or the end of the pattern, the length is
4565 they are not allowed in a pattern string because it is passed as a nor-
4567 the pattern to represent a binary zero.
4573 its pattern matching engine. If any of these are encountered by PCRE,
4620 rounding pattern. This is not always the case in Perl. In particular,
4626 11. If a pattern contains more than one backtracking control verb, the
4627 first one that is backtracked onto acts. For example, in the pattern
4636 captured strings when part of a pattern is repeated. For example,
4637 matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2
4641 pattern names is not as general as Perl's. This is a consequence of the
4643 ble to translate between numbers and names. In particular, a pattern
4690 (e) PCRE_ANCHORED can be used at matching time to force a pattern to be
4713 of a pattern that set overall options that cannot be changed within the
4714 pattern.
4769 set by special items at the start of a pattern. These are not Perl-com-
4770 patible, but are provided to make these options accessible to pattern
4773 together right at the start of the pattern string, and the letters must
4785 pattern must start with one of these special sequences:
4793 libraries. Starting a pattern with such a sequence is equivalent to
4794 setting the relevant option. How setting a UTF mode affects pattern
4805 Another special sequence that may appear at the start of a pattern is
4813 If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as
4821 If a pattern starts with (*NO_START_OPT), it has the same effect as
4846 sequence, the pattern
4850 changes the convention to CR. That pattern matches "a\nb" because LF is
4869 a pattern with nested unlimited repeats) and to avoid running out of
4872 by items at the start of the pattern of the form
4879 pcre_exec() for it to have any effect. In other words, the pattern
4896 A regular expression is a pattern that is matched against a subject
4898 pattern, and match the corresponding characters in the subject. As a
4899 trivial example, the pattern
4914 alternatives and repetitions in the pattern. These are encoded in the
4915 pattern by the use of metacharacters, which do not stand for themselves
4919 nized anywhere in the pattern except within square brackets, and those
4939 Part of a pattern that is in square brackets is called a "character
4960 pattern. This escaping action applies whether or not the following
4970 If a pattern is compiled with the PCRE_EXTENDED option, most white
4971 space in the pattern (other than in a character class), and characters
4974 or # character as part of the pattern.
4991 is not followed by \E later in the pattern, the literal interpretation
4992 continues to the end of the pattern (that is, \E is assumed at the
5001 terminates a pattern, but when a pattern is being prepared by text
5054 if the pattern character that follows is itself an octal digit.
5299 PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
5303 specify these settings by starting a pattern string with one of the
5312 Perl-compatible, are recognized only at the very start of a pattern,
5315 newline convention; for example, a pattern can start with:
5461 do so by setting the PCRE_UCP option or by starting the pattern with
5541 be included in the final matched sequence. For example, the pattern:
5550 when the pattern
5559 pattern such as (?=ab\K) matches, the reported start of the match can
5620 If all the alternatives of a pattern begin with \G, the expression is
5638 Circumflex need not be the first character of the pattern if a number
5640 alternative in which it appears if the pattern is ever to match that
5642 if the pattern is constrained to match only at the start of the sub-
5643 ject, it is said to be an "anchored" pattern. (There are also other
5644 constructs that can cause a pattern to be anchored.)
5650 last character of the pattern if a number of alternatives are involved,
5667 For example, the pattern /^abc$/ matches the subject string "def\nabc"
5676 and end of the subject in both modes, and if all branches of a pattern
5683 Outside a character class, a dot in the pattern matches any one charac-
5806 ter of a range. A pattern such as [W-]46] is interpreted as a class of
5958 tions" above), and in a Perl-style pattern the preceding or following
5967 example, the pattern
5976 rest of the main pattern as well as the alternative in the subpattern.
5983 within the pattern by a sequence of Perl option letters enclosed
6004 the pattern that follows. If the change is placed right at the start of
6005 a pattern, PCRE extracts it into the global options (and it will there-
6016 in different parts of the pattern. Any changes made in one alternative
6029 some cases the pattern can contain special leading sequences such as
6044 nested. Turning part of a pattern into a subpattern does two things:
6046 1. It localizes a set of alternatives. For example, the pattern
6054 that, when the whole pattern matches, that portion of the subject
6062 string "the red king" is matched against the pattern
6075 matched against the pattern
6101 consider this pattern:
6106 turing parentheses are numbered one. Thus, when the pattern matches,
6121 that is set for that number by any subpattern. The following pattern
6127 to the first one in the pattern with the given number. The following
6128 pattern matches "abcabc" or "defabc":
6154 to capturing parentheses from other parts of the pattern, such as back
6162 to-number translation table from a compiled pattern. There is also a
6165 By default, a name must be unique within a pattern, but it is possible
6172 both cases you want to extract the abbreviation. This pattern (ignoring
6191 elsewhere in the pattern, the subpatterns to which the name refers are
6192 checked in the order in which they appear in the overall pattern. The
6270 in the pattern (but see also the section entitled "Defining subpatterns
6272 have a {0} quantifier are omitted from the compiled pattern.
6295 causing the rest of the pattern to fail. The classic example of where
6299 pattern
6312 the pattern
6325 only way the rest of the pattern matches.
6334 required for the compiled pattern, in proportion to the size of the
6337 If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
6339 the pattern is implicitly anchored, because whatever follows will be
6342 first. PCRE normally treats such a pattern as though it were preceded
6351 reference elsewhere in the pattern, a match at the start may fail where
6357 ter. For this reason, such a pattern is not implicitly anchored.
6361 fail where a later one succeeds. Consider this pattern:
6388 rest of the pattern to match. Sometimes it is useful to prevent this,
6390 than it otherwise might, when the author of the pattern knows there is
6393 Consider, for example, the pattern \d+foo when applied to the subject
6411 This kind of parenthesis "locks up" the part of the pattern it con-
6412 tains once it has matched, and a failure further into the pattern is
6417 the string of characters that an identical standalone pattern would
6424 rest of the pattern match, (?>\d+) can only match an entire sequence of
6455 ple pattern constructs. For example, the sequence A+B is treated as
6459 When a pattern contains an unlimited repeat inside a subpattern that
6462 very long time indeed. The pattern
6479 in the string.) If the pattern is changed so that it uses an atomic
6491 pattern earlier (that is, to its left) in the pattern, provided there
6535 pattern in the current subject string, rather than anything matching
6537 of doing that). So the pattern
6563 A subpattern that is referenced by name may appear in the pattern
6568 references to it always fail by default. For example, the pattern
6576 Because there may be many capturing parentheses in a pattern, all dig-
6578 ence number. If the pattern continues with a digit character, some
6588 patterns. For example, the pattern
6595 work, the pattern must be such that the first iteration does not need
6635 as if it were {0,1}. At run time, the rest of the pattern match is
6656 that the apparently similar pattern
6665 If you want to force a matching failure at some point in a pattern, the
6723 end of subject strings. Consider a simple pattern such as
6729 and then see if what follows matches the rest of the pattern. If the
6730 pattern is specified as
6738 so we are no better off. However, if the pattern is written as
6758 three characters are not "999". This pattern does not match "foo" pre-
6761 foo". A pattern to do that is
6778 is another pattern that matches "foo" preceded by three digits and any
6790 (?(condition)yes-pattern)
6791 (?(condition)yes-pattern|no-pattern)
6793 If the condition is satisfied, the yes-pattern is used; otherwise the
6794 no-pattern (if present) is used. If there are more than two alterna-
6798 applies only at the level of the condition. This pattern fragment is an
6822 Consider the following pattern, which contains non-significant white
6834 yes-pattern is executed and a closing parenthesis is required. Other-
6835 wise, since no-pattern is not present, the subpattern matches nothing.
6836 In other words, this pattern matches a sequence of non-parentheses,
6839 If you were embedding this pattern in a larger one, you could use a
6845 pattern.
6862 Checking for pattern recursion
6865 name R, the condition is true if a recursive call to the whole pattern
6885 skipped if control reaches this point in the pattern; the idea of
6888 example, a pattern to match an IPv4 address such as "192.168.23.245"
6894 The first part of the pattern is a DEFINE group inside which another
6897 this part of the pattern is skipped because DEFINE acts like a false
6898 condition. The rest of the pattern uses references to the named group
6906 assertion. Consider this pattern, again containing non-significant
6916 otherwise it is matched against the second. This pattern matches
6927 make up a comment play no part in the pattern matching.
6933 newline character or character sequence in the pattern. Which charac-
6938 in the pattern; escape sequences that happen to represent a newline do
6939 not count. For example, consider this pattern when PCRE_EXTENDED is
6945 for a newline in the pattern. The sequence \n is still literal at this
6954 that can be done is to use a pattern that matches up to some fixed
6961 expression itself. A Perl pattern using code interpolation to solve the
6967 refers recursively to the pattern in which it appears.
6970 it supports special syntax for recursion of the entire pattern, and
6982 This PCRE pattern solves the nested parentheses problem (assume the
6989 recursive match of the pattern itself (that is, a correctly parenthe-
6994 If this were part of a larger pattern, you would not want to recurse
6995 the entire pattern, so instead you could use this:
6999 We have put the pattern into parentheses, and caused the recursion to
7000 refer to them instead of the whole pattern.
7002 In a larger pattern, keeping track of parenthesis numbers can be
7004 of (?1) in the pattern above you can write (?-2) to refer to the second
7024 This particular example pattern that we have been looking at contains
7027 tern to strings that do not match. For example, when this pattern is
7040 tion). If the pattern above is matched against
7046 pattern is not matched at the top level, its final captured value is
7050 If there are more than 15 capturing parentheses in a pattern, PCRE has
7056 recursion. Consider this pattern, which matches text in angle brack-
7063 In this pattern, (?(R) is the start of a conditional subpattern, with
7074 illustrated by the following pattern, which purports to match a palin-
7081 characters surrounding a sub-palindrome. In Perl, this pattern works;
7082 in PCRE it does not if the pattern is longer than three characters.
7096 pattern is written with the alternatives in the other order, things are
7108 To change the pattern so that it matches all palindromic strings, not
7110 the pattern to this:
7122 If you want to match typical palindromic phrases, the pattern has to
7127 If run with the PCRE_CASELESS option, this pattern matches phrases such
7147 pattern:
7151 In PCRE, this pattern matches "bab". The first capturing parentheses
7155 In Perl, the pattern fails to match because inside the recursive call
7171 An earlier example pointed out that the pattern
7176 not "sense and responsibility". If instead the pattern
7193 be changed for different calls. For example, consider this pattern:
7240 The default value is zero. For example, this pattern has two callout
7246 outs are automatically installed before each item in the pattern. They
7247 are all numbered 255. If there is a conditional group in the pattern
7259 position in the pattern, and, optionally, one item of data originally
7289 pattern.
7292 them can be used only when the pattern is to be matched using one of
7311 pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
7326 of the pattern. However, when it is inside a subpattern that is called
7363 instances of (*MARK) as you like in a pattern, and their names do not
7429 tracking to reach it. Even if the pattern is unanchored, no further
7442 If there is more than one backtracking verb in a pattern, a different
7447 Note that (*COMMIT) at the start of a pattern is not the same as an
7457 For this pattern, PCRE knows that any match must start with "a", so the
7458 optimization skips along the subject to "a" before applying the pattern
7463 to the first character. The pattern is now applied starting at "x", and
7471 ing to reach it. If the pattern is unanchored, the normal "bumpalong"
7478 any other way. In an anchored pattern (*PRUNE) has the same effect as
7489 the pattern is unanchored, the "bumpalong" advance is not to the next
7507 is triggered, the previous path through the pattern is searched for the
7521 that it can be used for a pattern-based if-then-else block:
7525 If the COND1 pattern matches, FOO is tried (and possibly further items
7541 the enclosing alternative. Consider this pattern, where A, B, etc. are
7542 complex pattern fragments that do not contain any | characters at this
7566 If the subject is "ba", this pattern does not match. Because .*? is
7571 part of the single alternative that comprises the whole pattern, and so
7579 character (for an unanchored pattern). (*SKIP) is similar, except that
7585 If more than one backtracking verb is present in a pattern, the one
7587 tern, where A, B, etc. are complex pattern fragments:
7937 The following are recognized only at the very start of a pattern or
7957 These are recognized only at the very start of the pattern or after
7969 These are recognized only at the very start of the pattern or after
8001 (?R) recurse whole pattern
8019 (?(condition)yes-pattern)
8020 (?(condition)yes-pattern|no-pattern)
8046 so only if the pattern is not anchored.
8101 the PCRE_UTF8 option flag, or the pattern must start with the sequence
8102 (*UTF8) or (*UTF). When either of these is the case, both the pattern
8112 option flag, as appropriate. Alternatively, the pattern must start with
8114 be used with either library. When UTF mode is set, both the pattern and
8174 time or at run time, PCRE assumes that the pattern or subject it is
8179 the check for the pattern; it does not also apply to subject strings.
8204 run time, PCRE assumes that the pattern or subject it is given (respec-
8227 run time, PCRE assumes that the pattern or subject it is given (respec-
8254 JIT optimization is requested for a UTF pattern that contains \C, it
8312 speed up pattern matching. However, it comes at the cost of extra pro-
8314 when the same pattern is going to be matched many times. This does not
8315 necessarily mean many calls of a matching function; if the pattern is
8373 each compiled pattern, and pass the resulting pcre_extra block to
8409 pattern is matched using interpretive code.
8416 ignored, and no JIT data is created. Otherwise, the compiled pattern is
8425 cution. There are also some pattern items that JIT cannot handle.
8437 pattern by calling pcre_fullinfo() with the PCRE_INFO_JIT option. A
8439 means that JIT support is not available, or the pattern was not studied
8441 handle the pattern.
8443 Once a pattern has been studied, with or without JIT, it can be used as
8454 The only unsupported pattern items are \C (match a single data unit)
8461 When a pattern is matched using JIT execution, the return values are
8471 searching a very large pattern tree goes on for too long, as it is in
8482 other data of a compiled pattern. Saving and restoring compiled pat-
8485 run pcre_study() on a saved and restored pattern, and thereby recreate
8488 original pattern.
8510 pattern.
8519 The extra argument must be the result of studying a pattern with
8542 You may safely use the same JIT stack for more than one pattern (either
8599 The owner of the stack is the user program, not the JIT studied pattern
8601 by pcre_exec(), (that is, it is assigned to the pattern currently run-
8610 pcre_exec() again. When you assign the stack to a pattern, only a
8613 call pcre_exec() with a pattern pointing to an already freed stack, as
8616 pattern at any time. You can even free the previous stack before
8628 if a pattern causes stack overflow with a stack of 1M? Is that 1M kept
8655 re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
8726 entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances
8732 might be a date in the form ddmmmyy, defined by this pattern:
8762 partial matches on the same pattern. If the appropriate JIT study mode
8766 mizations. PCRE remembers the last literal data unit in a pattern, and
8769 might match only partially. If the pattern was studied, PCRE knows the
8800 ple, consider this pattern:
8804 This pattern matches "123", but only if it is preceded by "abc". If the
8817 ing continues as normal, and other alternatives in the pattern are
8822 tial match. All the various matching items in a pattern behave as if
8828 provides the data that is returned. Consider this pattern:
8861 trated by a pattern such as:
8869 On the other hand, if the pattern is made ungreedy the result is dif-
8882 The second pattern will never match "dogsbody", because it will always
8906 ungreedy pattern shown above:
8917 If a pattern ends with one of the sequences \b or \B, which test for word
8919 intuitive results. Consider this pattern:
8943 repeated metasequences. If PCRE_PARTIAL was set for a pattern that did
8947 pattern can be used for partial matching now always returns 1.
8971 plete pattern, but the first two are partial matches. Similar output is
9002 That means that, for an unanchored pattern, if a continued match fails,
9029 \z, \Z, \b, \B, and $. Consider an unanchored pattern that matches
9042 Note: If the pattern contains lookbehind assertions, or \K, or starts
9051 Certain types of pattern may give problems with multi-segment matching,
9054 1. If the pattern contains a test for the beginning of a line, you need
9062 hind assertion later in the pattern could require even earlier charac-
9066 lookbehind in the pattern. This length is given in characters, not
9079 For example, if the pattern "(?<=123)abc" is partially matched against
9083 maximum lookbehind for that pattern is 3, so taking that away from 5
9104 match of an empty string" when the pattern contains lookbehinds.
9110 arises if the pattern ends with \b or \B. Another kind of difference
9151 start with the same pattern item may not work as expected when
9152 PCRE_DFA_RESTART is used. For example, consider this pattern:
9222 match the pattern. The matching functions return PCRE_ERROR_BADENDIAN-
9223 NESS if they detect a pattern with the wrong endianness.
9227 saving and restoring a compiled pattern loses any JIT optimization
9234 memory that holds the compiled pattern and associated data. You can
9238 8-bit library that compiles a pattern and writes it to a file. It
9245 re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
9252 In this example, the bytes that comprise the compiled pattern are
9258 If you want to write more than one pattern to a file, you will have to
9262 binary, one pattern to a line.
9269 If the pattern has been studied, it is also possible to save the normal
9270 study data in a similar way to the compiled pattern itself. However, if
9275 block. Its format is defined in the section on matching a pattern in
9286 Re-using a precompiled pattern is straightforward. Having reloaded it
9292 pattern was compiled (the tableptr argument of pcre[16|32]_compile()),
9297 pattern in the pcreapi documentation.
9300 the same as those that were used when the pattern was compiled. If this
9303 If you did not provide custom character tables when the pattern was
9304 compiled, the pointer in the compiled pattern is NULL, which causes the
9308 If you saved study data with the compiled pattern, you need to create
9313 function in the usual way. If the pattern was studied for just-in-time
9349 cessing time. The way you express your pattern as a regular expression
9357 there is one case where the memory usage of a compiled pattern can be
9360 is repeated in the compiled code. For example, the pattern
9374 an embarrassment. For example, the very simple pattern
9380 limit on a compiled pattern is 64K data units, and this is reached with
9381 the above pattern if the outer repetition is increased from 3 to 4.
9387 of PCRE's "subroutine" facility. Re-writing the above pattern as
9392 even with the outer repetition increased to 100. However, this pattern
9397 speed when executing the modified pattern. Nevertheless, if the atomic
9406 kinds of pattern can cause it to use large amounts of the process
9409 the most frequently raised problem with PCRE. Rewriting your pattern
9427 needs a character's property. If you can find an alternative pattern
9439 When a pattern begins with .* not in parentheses, or in parentheses
9441 is set, the pattern is implicitly anchored by PCRE, since it can match
9445 lines, the pattern may match from the character immediately following
9446 one of them instead of from the very start. For example, the pattern
9455 If you are using such a pattern with subject strings that do not con-
9457 or starting the pattern with ^.* or ^.*? to indicate explicit anchor-
9463 Consider the pattern fragment
9471 the pattern is such that the entire match is going to fail, PCRE has in
9487 with the pattern above. The former gives a failure almost instantly
9520 int regcomp(regex_t *preg, const char *pattern,
9576 The function regcomp() is called to compile a pattern into an internal
9577 form. The pattern is a C string terminated by a binary zero, and is
9578 passed in the argument pattern. The preg argument is a pointer to a
9627 compilation to the native function. This causes the pattern itself and
9686 The function regexec() is called to match a compiled pattern preg
9719 If the pattern was compiled with the REG_NOSUB flag, no data about any
9798 pattern exactly. If pointer arguments are supplied, it copies matched
9847 NULL (the corresponding matched sub-pattern is not copied)
9852 a. "text" matches "pattern" exactly;
9858 string captured as the "i"th sub-pattern. If you pass in
9861 number of sub-patterns, "i"th captured sub-pattern is
9864 CAVEAT: An optional sub-pattern that does not exist in the matched
9898 You can use the "PartialMatch" operation when you want the pattern to
9913 By default, pattern and text are plain text, one byte per character.
9914 The UTF8 flag, passed to the constructor, causes both pattern and
9917 UTF-8 than the pattern, but the match returned may depend on the UTF8
9956 "?:" modifier within the pattern itself. e.g. (?:ab|cd) does not cap-
9997 RE(pattern,
10002 RE(pattern,
10076 You can replace the first match of "pattern" in "str" with "rewrite".
10085 pattern matches and a replacement occurs, false otherwise.
10088 of the pattern in the string with the rewrite. Replacements are not
10097 Extract is like Replace, except that if the pattern matches, "rewrite"
10214 The maximum length of a compiled pattern is approximately 64K data
10281 pattern, in order to remember the state of the match so that it can
10285 circumstances, for example, whenever a parenthesized sub-pattern is
10296 interpretive manner. If the pattern was studied with the
10305 sion or subroutine call in the pattern. This includes the processing of
10320 from the process stack. For certain kinds of pattern and data, very
10323 fore the amount of stack used, by modifying the pattern that is being
10324 matched. Consider, for example, this pattern:
10329 end of the data, and is the kind of pattern that might be used when
10335 rewritten pattern, which matches exactly the same strings:
10384 the smallest limits that allow a particular pattern to match a given