Lines Matching refs:by

33        possible was done by Zoltan Herczeg.
38 32-bit libraries. The work to make this possible was done by Christian
75 not supported by PCRE are given in separate documents. See the pcrepat-
87 data tables that are used by more than one of the exported external
88 functions, but which are not intended for use by external callers.
121 Another way that performance can be hit is by running a pattern that
177 followed by the two digits 10, at the domain cam.ac.uk.
306 work to make this possible was done by Zoltan Herczeg. The two
333 normally be accessed by adding -lpcre16 to the command for linking an
354 The type of the user-accessible structure that is returned by
380 byte order. This may be changed by byte-order marks (BOMs) anywhere in
400 returned by the matching functions are also in 16-bit units rather than
432 types for characters less than 0xff can therefore be influenced by the
441 A UTF-16 string can indicate its endianness by special code known as a
471 passed back by pcre16_compile() or pcre16_compile2() is still an 8-bit
492 When PCRE is being built, the RunTest script that is called by "make
636 well as or instead of the original 8-bit library. This work was done by
637 Christian Persch, based on the work done by Zoltan Herczeg for the
664 normally be accessed by adding -lpcre32 to the command for linking an
685 The type of the user-accessible structure that is returned by
711 byte order. This may be changed by byte-order marks (BOMs) anywhere in
731 returned by the matching functions are also in 32-bit units rather than
763 ter types for characters less than 0xff can therefore be influenced by
771 A UTF-32 string can indicate its endianness by special code known as a
799 passed back by pcre32_compile() or pcre32_compile2() is still an 8-bit
820 When PCRE is being built, the RunTest script that is called by "make
864 "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should
874 lected by providing options to configure before running the make com-
880 by editing the config.h file, or by passing parameter settings to the
885 obtained by running
904 gle-unit characters or UTF-16 strings, by adding
911 UTF-32 strings, by adding
928 and static libraries by default. You can suppress one of these by
942 8-bit strings). You can disable this by adding
997 Just-in-time compiler support is included in the build by specifying
1016 systems. You can compile PCRE to use carriage return (CR) instead, by
1024 Alternatively, you can specify that line endings are to be indicated by
1029 to the configure command. There is a fourth option, specified by
1034 CRLF as indicating a line ending. Finally, a fifth option, specified by
1067 longer used is 10; it can be changed by adding a setting such as
1083 use three-byte or four-byte offsets by adding a setting such as
1098 ing by making recursive calls to an internal function called match().
1130 be placed on the resources used by a single call to pcre_exec(). The
1132 tation. The default is 10 million, but this can be changed by adding a
1146 by adding, for example,
1169 have to do so "by hand".)
1174 PCRE assumes by default that it will run in an environment where the
1177 ever, be compiled to run in an EBCDIC environment by adding
1187 the value 0x15 by default. However, in some EBCDIC environments, 0x25
1206 with libz or libbz2, respectively, by adding one or both of
1220 it finds a match. The size of the buffer is controlled by a parameter
1224 You can change the default parameter value by adding, for example,
1229 this value by specifying a run-time option.
1361 ject string. The "standard" algorithm is the one provided by the
1368 An alternative algorithm is provided by the pcre_dfa_exec(),
1391 The set of strings that are matched by a regular expression can be rep-
1397 matching algorithms provided by PCRE.
1411 branches are tried is controlled by the greedy or ungreedy nature of
1423 strings that are matched by portions of the pattern in parentheses.
1431 string from left to right, once, character by character, and as it does
1472 supported by the alternative matching algorithm. They are as follows:
1482 This pattern matches "aaab!" but not "aaa!", which would be matched by
1534 possible to do multi-segment matching using the standard algorithm by
1686 8.32), by means of two additional libraries. They can be built as well
1696 replaced by UTF16 or UTF32, respectively. This facility is in fact just
1719 libpcre. It can normally be accessed by adding -lpcre to the command
1742 request that it be used if available, by setting an option that is
1763 string that is matched by pcre_exec(). They are:
1812 by the caller to a "callout" function, which PCRE will then call at
1817 set by the caller to a function that is called by PCRE whenever it
1835 Each of the first three conventions is used by at least one operating
1841 At compile time, the newline convention can be specified by the options
1842 argument of pcre_compile(), or it can be specified by special text at
1856 which is controlled in a similar way, but by separate options.
1862 the proviso that the memory management functions pointed to by
1864 callout and stack-checking functions pointed to by pcre_callout and
1865 pcre_stack_guard, are shared by all threads.
1868 ing, so the same compiled pattern can safely be used by several threads
1879 later time, possibly by a different program, and even on a host other
1959 the \R escape sequence matches by default. A value of 0 means that \R
1988 into account the stack that may already be used by the calling applica-
2008 running pcre_exec() is implemented by recursive function calls that use
2034 The pattern is a C string terminated by a binary zero, and is passed in
2061 sets the variable pointed to by errptr to point to a textual error mes-
2065 placed in the variable pointed to by erroffset, which must not be NULL
2085 compiled pattern, and used again by pcre_exec() and pcre_dfa_exec()
2110 achieved by appropriate constructs in the pattern itself, which is the
2125 PCRE is built. It can be overridden from within the pattern, or by set-
2132 changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
2157 be changed within a pattern by a (?s) option setting. A negative class
2187 within a pattern by a (?x) option setting.
2189 Which characters are interpreted as newlines is controlled by the
2190 options passed to pcre_compile() or by a special sequence at the start
2206 little use. When set, any backslash in a pattern that is followed by a
2209 backslash followed by a letter with no special meaning is treated as a
2210 literal. (Perl can, however, be persuaded to give an error for this, by
2212 controlled by this option. It can also be set by a (?X) option setting
2228 error, because this is illegal in JavaScript (by default it is treated
2233 an empty string (by default this causes the current matching alterna-
2235 set (assuming it can find an "a" in the subject), whereas it fails by
2238 (3) \U matches an upper case "U" character; by default \U causes a com-
2241 (4) \u matches a lower case "u" character unless it is followed by four
2246 (5) \x matches a lower case "x" character unless it is followed by two
2250 for example, \xz matches a binary zero character followed by z).
2268 changed within a pattern by a (?m) option setting. If there are no new-
2277 by starting the pattern with (*UTF). This may be useful in applications
2289 newline is indicated by a single character (CR or LF, respectively).
2290 Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
2330 theses in the pattern. Any opening parenthesis that is not followed by
2331 ? behaves as if it were followed by ?: but named parentheses can still
2368 are not greedy by default, but become greedy if followed by "?". It is
2369 not compatible with Perl. It can also be set by a (?U) option setting
2398 The following table lists the error codes that may be returned by
2399 pcre_compile2(), along with the error messages that may be returned by
2434 29 (?R or (?[+-]digits must be followed by )
2463 57 \g is not followed by a braced, angle-bracketed, or quoted
2464 name/number or by a plain number
2477 68 \c must be followed by an ASCII character
2478 69 \k is not followed by a braced, angle-bracketed, or quoted name
2487 78 setting UTF is disabled by the application
2515 tains other fields that can be set by the caller before the block is
2519 pcre_study() returns NULL by default. In that circumstance, if the
2542 terns the benefit of faster execution might be offset by a much slower
2543 study time. Not all patterns can be optimized by the JIT compiler. For
2556 the study data by calling pcre_free_study(). This function was added to
2583 avoid wasting time by trying to match strings that are shorter than the
2595 and the information is also used by the JIT compiler. The optimiza-
2596 tions can be disabled by setting the PCRE_NO_START_OPTIMIZE option.
2613 letters, digits, or whatever, by reference to a set of tables, indexed
2614 by character code point. When running in UTF-8 mode, or in the 16- or
2634 The internal tables can always be overridden by tables supplied by the
2639 External tables are built by calling the pcre_maketables() function,
2659 pattern, and the same tables are used via this pointer by pcre_study()
2660 and also by pcre_exec() and pcre_dfa_exec(). Thus, for any single pat-
2731 information call is provided for internal use by the pcre_study() func-
2732 tion. External callers can cause PCRE to use its internal tables by
2856 If the pattern set a match limit by including an item of the form
2893 strings by name. It is also possible to extract the data directly, by
2897 described by these three values.
2924 pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
2958 by any top-level option settings at the start of the pattern itself. In
2964 A pattern is automatically anchored by PCRE if all of its top-level
2974 by pcre_fullinfo().
2978 If the pattern set a recursion limit by including an item of the form
2989 by pcre_compile(). The value that is passed as the argument to
2991 the compiled data is the value returned by this option plus the size of
2993 does not alter the value returned by this option.
2998 pointed to by the study_data field in a pcre_extra block. If pcre_extra
3000 ment should point to a size_t variable. The study_data field is set by
3039 zero. It is changed only by calling this function, whose action is to
3124 returned by pcre_study(), together with the appropriate flag bits. You
3125 should not set these yourself, but you may add to the block by setting
3135 repeatedly (sometimes recursively). The limit set by match_limit is
3150 cases. You can override the default by supplying pcre_exec() with a
3155 A value for the match limit may also be supplied by an item at the
3161 d is less than the limit set by the caller of pcre_exec() or, if no
3178 match_limit. You can override the default by supplying pcre_exec() with
3183 A value for the recursion limit may also be supplied by an item at the
3189 d is less than the limit set by the caller of pcre_exec() or, if no
3214 nated) is placed in the variable pointed to by the mark field. The
3217 If there is no name to pass back, the variable pointed to by the mark
3241 turned out to be anchored by virtue of its contents, it cannot be made
3269 advanced by two characters instead of one, in other words, to after the
3277 tains an explicit CR or LF reference, and so advances only by one char-
3330 matching a null string by first trying the match again at the same off-
3332 fails, by advancing the starting offset (see below) and trying an ordi-
3336 and if so, and the current character is CR followed by LF, advance the
3337 starting offset by two characters instead of one.
3427 matching continues by testing any remaining alternatives. Only if no
3444 The string to be matched by pcre_exec()
3455 and this is by far the most common case. In UTF-8 or UTF-16 mode, the
3462 in the same subject by calling pcre_exec() again after a previous suc-
3478 discover that it is preceded by a letter.
3481 match an empty string. It is possible to emulate Perl's /g behaviour by
3488 the current character is CR followed by LF, advance the starting offset
3489 by two characters instead of one.
3499 addition, further substrings from the subject may be picked out by
3513 of the vector is used as workspace by pcre_exec() while matching cap-
3529 portion of the subject string matched by the entire pattern. The next
3531 returned by pcre_exec() is one more than the highest numbered pair that
3568 offsets of the substring matched by the whole pattern, is (n+1)*3.
3588 tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements (in
3623 compiled pattern. This error could be caused by a bug in PCRE or by
3640 This error is used by the pcre_copy_substring(), pcre_get_substring(),
3642 returned by pcre_exec().
3646 The backtracking limit, as specified by the match_limit field in a
3652 This error is never generated by pcre_exec() itself. It is provided for
3653 use by callout functions that want to yield a distinctive error code.
3690 by a bug in PCRE or by overwriting of the compiled pattern.
3698 The internal recursion limit, as specified by the match_limit_recursion
3740 This error is given if a pattern that was compiled by the 8-bit library
3764 Error numbers -16 to -20, -22, and 30 are not used by pcre_exec().
3788 nally defined by RFC 2279) allows for up to 6 bytes, and this is
3804 A character that is valid by the RFC 2279 rules is either 5 or 6 bytes
3805 long; these code points are excluded by RFC 3629.
3810 are excluded by RFC 3629.
3815 range of code points are reserved by RFC 3629 for use with UTF-16, and
3825 for a value that can be represented by fewer bytes, which is invalid.
3862 Captured substrings can be accessed directly by using the offsets
3863 returned by pcre_exec() in ovector. For convenience, the functions
3867 by number. The next section describes functions for extracting named
3872 string. However, you can process such a string by referring to the
3873 length that is returned by pcre_copy_substring() and pcre_get_sub-
3882 were captured by the match, including the substring that matched the
3883 entire regular expression. This is the value returned by pcre_exec() if
3886 be the number of elements in the vector divided by three.
3892 string(), the string is placed in buffer, whose length is given by
3911 the list of string pointers. The end of the list is marked by a NULL
3923 string by inspecting the appropriate offset in ovector, which is nega-
3927 string_list() can be used to free the memory returned by a previous
3929 tively. They do nothing more than call the function pointed to by
3952 To extract a substring by name, you first have to find associated num-
3959 name by calling pcre_get_stringnumber(). The first argument is the com-
3970 named functions that extract by number. As these are described in the
4000 allowed for subpatterns with the same number, created by using the (?|
4018 third and fourth are pointers to variables which are updated by the
4035 need to find all possible matches, you can kludge it up by making use
4051 that is used by pcre_exec(), to help them set recursion limits, as
4053 by pcretest when called with the -m and -C options is obtained by call-
4187 been saved by giving this only once, but it was decided to retain some
4295 matching. The caller of PCRE provides an external function by putting
4302 identified by putting a number less than 256 after the letter C. The
4357 tracks do not occur. You can disable the auto-possessify feature by
4389 You can disable these optimizations by passing the PCRE_NO_START_OPTI-
4390 MIZE option to the matching function, or by starting the pattern with
4398 tion defined by pcre_callout or pcre[16|32]_callout is called (if it is
4432 passed by the caller to the matching function. When pcre_exec() or
4507 reserved for use by callout functions; it will never be used by PCRE
4555 mal C string, terminated by zero. The escape sequence \0 can be used in
4559 \U, and \N when followed by a character name or Unicode value. (\N on
4561 are implemented by Perl's general string-handling and are not part of
4562 its pattern matching engine. If any of these are encountered by PCRE,
4563 an error is generated by default. However, if the PCRE_JAVASCRIPT_COM-
4585 \Qabc$xyz\E abc$xyz abc followed by the
4671 (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
4676 fiers is inverted, that is, by default they are not greedy, but if fol-
4677 lowed by a question mark they are.
4687 CRLF by the PCRE_BSR_ANYCRLF option.
4693 (j) Patterns compiled by PCRE can be saved and re-used at a later time,
4695 does not apply to optimized data created by the just-in-time compiler.
4730 by PCRE are described in detail below. There is a quick-reference syn-
4740 Expressions", published by O'Reilly, covers regular expressions in
4744 This document discusses the patterns that are supported by PCRE when
4758 set by special items at the start of a pattern. These are not Perl-com-
4805 the repeated item. For example, by default a+b is treated as a++b. For
4824 It is also possible to specify a newline convention by starting a pat-
4829 (*CRLF) carriage return, followed by linefeed
4857 are provoked by patterns with huge matching trees (a typical example is
4859 system stack by too much recursion. When one of these limits is
4861 by items at the start of the pattern of the form
4867 ting must be less than the value set (or defaulted) by the caller of
4869 writer can lower the limits set by the programmer, but not raise them.
4904 pattern by the use of metacharacters, which do not stand for themselves
4915 . match any character except newline (by default)
4934 [ POSIX character class (only if followed by POSIX
4943 The backslash character has several uses. Firstly, if it is followed by
4966 ters, you can do so by putting them between \Q and \E. This is differ-
4973 \Qabc$xyz\E abc$xyz abc followed by the
4979 classes. An isolated \E that is not preceded by \Q is ignored. If \Q
4980 is not followed by \E later in the pattern, the literal interpretation
4990 terminates a pattern, but when a pattern is being prepared by text
5028 sequence \0\x\07 specifies two binary zeros followed by a BEL character
5032 The escape \o must be followed by a sequence of octal digits, enclosed
5038 For greater clarity and unambiguity, it is best to avoid following \ by
5043 The handling of a backslash followed by a digit other than 0 is compli-
5066 \0113 is a tab followed by the character "3"
5075 syntax must not be introduced by a leading zero, because no more than
5078 By default, after \x that is not followed by {, from zero to two hexa-
5085 is as just described only when it is followed by two hexadecimal dig-
5087 mode, support for code points greater than 256 is provided by \u, which
5088 must be followed by four hexadecimal digits; otherwise it matches a
5091 Characters whose value is less than 256 can be defined by either of the
5092 two syntaxes for \x (or by \u in JavaScript mode). There is no differ-
5119 they are treated as the literal characters "B", "R", and "X" by
5125 In Perl, the sequences \l, \L, \u, and \U are recognized by its string
5129 \u can be used to define a character by code point, as described in the
5134 The sequence \g followed by an unsigned or a negative number, option-
5141 For compatibility with Oniguruma, the non-Perl syntax \g followed by a
5165 not set. Perl also uses \N to match characters by name; PCRE does not
5188 trolled by PCRE's low-valued character tables, and may vary if locale-
5192 are used for accented letters, and these are then matched by \w. The
5216 ASCII characters by default, these always match certain high-valued
5255 Outside a character class, by default, the escape sequence \R matches
5263 CR followed by LF, or one of the single characters LF (linefeed,
5274 the complete set of Unicode line endings) by setting the option
5279 specify these settings by starting a pattern string with one of the
5286 tion, but they can themselves be overridden by options given to a
5297 an unrecognized escape sequence, and so matches the letter "R" by
5312 The property names represented by xx above are limited to the Unicode
5316 sicalSymbols" are not currently supported by PCRE. Note that \P{Any}
5351 ified by a two-letter abbreviation. For compatibility with Perl, nega-
5352 tion can be specified by including a circumflex between the opening
5416 so cannot be tested by PCRE, unless UTF validity checking has been
5422 \p{Letter}) are not supported by PCRE, nor is it permitted to prefix
5433 Matching characters by Unicode property is not fast, because PCRE has
5436 not use Unicode properties in PCRE by default, though you can make them
5437 do so by setting the PCRE_UCP option or by starting the pattern with
5450 by zero or more characters with the "mark" property. Characters with
5455 cated kinds of composite character by giving each character a grapheme
5471 be followed by an L, V, LV, or LVT character; an LV or V character may
5472 be followed by a V or T character; an LVT or T character may be followed
5473 only by a T character.
5505 ter that can be represented by a Universal Character Name in C++ and
5556 character class, by default it matches the corresponding literal char-
5565 UTF mode, the meanings of \w and \W can be changed by setting the
5575 tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
5584 the start point of the match, as specified by the startoffset argument
5624 before a newline at the end of the string (by default). Note, however,
5631 very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at
5660 ter in the subject string except (by default) a character that signi-
5665 not match CR if it is immediately followed by LF, but otherwise it
5680 affected by the PCRE_DOTALL option. In other words, it matches any
5682 \N to match characters by name; PCRE does not support this.
5697 (and by default it checks this at the start of processing unless the
5720 character's individual bytes are then captured by the appropriate num-
5726 An opening square bracket introduces a character class, terminated by a
5728 cial by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set,
5736 character must be in the set of characters defined by the class, unless
5738 case the subject character must not be in the set defined by the class.
5745 characters that are in the class by enumerating those that are not. A
5752 or by using the \x{ escaping mechanism.
5783 two characters ("W" and "-") followed by a literal string "46]", so it
5786 preted as a class containing a range followed by two other characters.
5818 as the literal characters "B", "N", "R", and "X" by default, but cause
5840 enclosed by [: and :] within the enclosing square brackets. PCRE also
5873 by a ^ character after the colon. For example,
5884 character properties are used. This is achieved by replacing certain
5885 POSIX classes by other sequences, as follows:
5959 within the pattern by a sequence of Perl option letters enclosed
5968 ble to unset these options by preceding the letter with a hyphen, and a
5975 can be changed in the same way as the Perl-compatible options by using
5982 fore show up in data extracted by the pcre_fullinfo() function).
6003 Note: There are other PCRE-specific options that can be set by the
6019 Subpatterns are delimited by parentheses (round brackets), which can be
6048 by a question mark and a colon, the subpattern does not do any captur-
6097 that is set for that number by any subpattern. The following pattern
6118 Identifying capturing parentheses by number is simple, but it can be
6131 references, recursion, and conditions, can be made by name as well as
6132 by number.
6139 convenience function for extracting a captured substring by name.
6142 to relax this constraint by setting the PCRE_DUPNAMES option at compile
6161 The convenience function for extracting the data by name returns the
6184 true. This is the same behaviour as testing by number. For further
6198 Repetition is specified by quantifiers, which can follow any of the
6213 ber of permitted matches, by giving the two numbers in curly brackets
6214 (braces), separated by a comma. The numbers must be less than 65536,
6238 of which is represented by a two-byte sequence in a UTF-8 string. Simi-
6247 for use by reference only" below). Items other than subpatterns that
6257 It is possible to construct infinite loops by following a subpattern
6274 characters may appear. An attempt to match C comments by applying the
6286 However, if a quantifier is followed by a question mark, it ceases to
6300 which matches one digit by preference, but can match two if that is the
6304 Perl), the quantifiers are not greedy by default, but individual ones
6305 can be made greedy by following them with a question mark. In other
6319 by \A.
6443 digits, or digits enclosed in <>, followed by either ! or ?. When it
6465 Outside a character class, a backslash followed by a digit greater than
6489 must be followed by an unsigned number or a negative number, optionally
6507 are created by joining together fragments that contain references
6539 A subpattern that is referenced by name may appear in the pattern
6544 references to it always fail by default. For example, the pattern
6573 the example above, or by a quantifier with a minimum of zero.
6626 matches a word followed by a semicolon, but does not include the semi-
6631 matches any occurrence of "foo" that is not followed by "bar". Note
6636 does not find an occurrence of "bar" that is preceded by something
6654 does find an occurrence of "bar" that is not preceded by "foo". The
6683 to temporarily move the current position back by the fixed length and
6730 matches "foo" preceded by three digits that are not "999". Notice that
6735 ceded by six characters, the first of which are digits and the last
6749 matches an occurrence of "baz" that is preceded by "bar" which in turn
6750 is not preceded by "foo", while
6754 is another pattern that matches "foo" preceded by three digits and any
6783 Checking for a used subpattern by number
6792 most recently opened parentheses can be referenced by (?(-1), the next
6793 most recent by (?(-2), and so on. Inside loops it can also make sense
6823 Checking for a used subpattern by name
6826 used subpattern by name. For compatibility with earlier versions of
6842 or any subpattern has been made. If digits or a name preceded by amper-
6856 Defining subpatterns for use by reference only
6889 optional sequence of non-letters followed by a letter. In other words,
6900 by PCRE. In both cases, the start of the comment must not be in a char-
6910 ters are interpreted as newlines is controlled by the options passed to
6911 a compiling function or by a special sequence at the start of the pat-
6935 sions to recurse (amongst other things). It does this by interpolating
6951 A special item that consists of (? followed by a number greater than
6979 tricky. This is made easier by the use of relative references. Instead
6985 It is also possible to refer to subsequently opened parentheses, by
7028 by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
7050 illustrated by the following pattern, which purports to match a palin-
7137 If the syntax for a recursive subpattern call (either by number or by
7179 For compatibility with Oniguruma, the non-Perl syntax \g followed by a
7188 PCRE supports an extension to Oniguruma: if a number is preceded by a
7208 an external function by putting its entry point in the global variable
7236 supplied by the caller of the matching function. The callout function
7257 ing parenthesis followed by an asterisk. They are generally of the form
7272 encountered by a DFA matching function.
7280 PCRE contains some optimizations that are used to speed up matching by
7286 by setting the PCRE_NO_START_OPTIMIZE option when calling pcre_com-
7287 pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
7297 be followed by a name.
7314 tured by the outer parentheses.
7403 This verb, which may not be followed by a name, causes the whole match
7406 attempts to find a match by advancing the starting point take place. If
7436 ond set of data, the escape sequence \Y is interpreted by the pcretest
7489 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
7490 ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
7663 ported by PCRE are described in the pcrepattern documentation. This
7775 represented by a Universal Character Name
7829 In PCRE, POSIX character set names recognize only ASCII characters by
7928 the limits set by the caller of pcre_exec(), not increase them.
7938 (*CRLF) carriage return followed by linefeed
7964 \n reference by number (can be ambiguous)
7965 \gn reference by number
7966 \g{n} reference by number
7967 \g{-n} relative reference by number
7968 \k<name> reference by name (Perl)
7969 \k'name' reference by name (Perl)
7970 \g{name} reference by name (Perl)
7971 \k{name} reference by name (.NET)
7972 (?P=name) reference by name (Python)
7978 (?n) call subpattern by absolute number
7979 (?+n) call subpattern by relative number
7980 (?-n) call subpattern by relative number
7981 (?&name) call subpattern by name (Perl)
7982 (?P>name) call subpattern by name (Python)
7983 \g<name> call subpattern by name (Oniguruma)
7984 \g'name' call subpattern by name (Oniguruma)
7985 \g<n> call subpattern by absolute number (Oniguruma)
7986 \g'n' call subpattern by absolute number (Oniguruma)
7987 \g<+n> call subpattern by relative number (PCRE extension)
7988 \g'+n' call subpattern by relative number (PCRE extension)
7989 \g<-n> call subpattern by relative number (PCRE extension)
7990 \g'-n' call subpattern by relative number (PCRE extension)
8069 and UTF-32 (from release 8.32), by means of two additional libraries.
8115 may optionally be prefixed by "Is", for compatibility with Perl 5.6.
8121 and subjects are (by default) checked for validity on entry to the rel-
8132 Characters in the "Surrogate Area" of Unicode are reserved for use by
8134 greater than 0xFFFF. The code points that are encoded by UTF-16 pairs
8165 are passed as patterns and subjects are (by default) checked for valid-
8188 are passed as patterns and subjects are (by default) checked for valid-
8210 1. Codepoints less than 256 can be specified in patterns by either
8229 supported in UTF mode by the JIT optimization of pcre[16|32]_exec(). If
8231 will not succeed, and so the matching will be carried out by the normal
8235 test characters of any code value, but, by default, the characters that
8299 used. The code for this support was written by Zoltan Herczeg.
8329 port is available by calling pcre_config() with the PCRE_CONFIG_JIT
8413 pattern by calling pcre_fullinfo() with the PCRE_INFO_JIT option. A
8438 the same as those given by the interpretive pcre_exec() code, with the
8446 The error code PCRE_ERROR_MATCHLIMIT is returned by the JIT code if
8450 code is never returned by JIT execution.
8455 The code that is generated by the JIT compiler is architecture-spe-
8482 allocated by mmap or VirtualAlloc.)
8515 determine whether a match operation was executed by JIT or by the
8519 by assigning directly or by callback), as long as the patterns are all
8529 matching by multiple threads at the same time. For example, you can
8577 by pcre_exec(), (that is, it is assigned to the pattern currently run-
8578 ning), that stack must not be used by any other threads (to avoid over-
8585 You can free a JIT stack at any time, as long as it will not be used by
8590 that will cause SEGFAULT. (Also, do not free a stack currently used by
8653 patterns that have been successfully studied by JIT).
8679 Philip Hazel (FAQ by Zoltan Herczeg)
8708 might be a date in the form ddmmmyy, defined by this pattern:
8712 If the application sees the user's keystrokes one by one, and can check
8714 raise an error as soon as a mistake is made, by beeping and not
8721 PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
8780 This pattern matches "123", but only if it is preceded by "abc". If the
8837 trated by a pattern such as:
8852 match. It might be easier to follow this explanation by thinking of the
8864 The DFA functions move along the subject string character by character,
8957 is possible to continue the match by providing additional subject data
9039 ters to be inspected. You can handle this case by using the
9052 attempt started at the offsets[2] character by setting the startoffset
9157 match at offset n in the first buffer is followed by "no match" when
9209 The value returned by pcre[16|32]_compile() points to a single block of
9211 find the length of this block in bytes by calling
9254 block itself). The length of the study data can be obtained by calling
9290 optimization, that data cannot be saved, and so is lost by a
9331 Patterns are compiled by PCRE into a reasonably efficient interpretive
9417 is set, the pattern is implicitly anchored by PCRE, since it can match
9432 tain newlines, the best performance is obtained by setting PCRE_DOTALL,
9459 used. You can see the difference by comparing the behaviour of
9518 called pcreposix.a, so can be accessed by adding -lpcreposix to the
9529 There are also some other options that are not defined by POSIX. These
9553 form. The pattern is a C string terminated by a binary zero, and is
9559 defined by the following macros:
9612 It does not affect the way newlines are matched by . (they are not) or
9613 by a negative class such as [^a] (they are).
9655 The default POSIX newline handling can be obtained by setting
9663 against a given string, which is by default terminated by a zero byte
9689 nmatch. This is a BSD extension, compatible with but not specified by
9722 by a binary zero is placed in errbuf. The length of the message,
9763 The C++ wrapper for PCRE was provided by Google Inc. Some additional
9764 functionality was added by Giuseppe Maxia. This brief man page was con-
9931 (*) Both Perl and PCRE allow non capturing parentheses by means of the
9940 instance, PCRE_CASELESS is handled by
9969 ments and creates a set of flags that are off by default. The optional
10026 could extract all words from a string by repeatedly calling
10082 The C++ wrapper was contributed by Google Inc.
10158 This is caused by the way shared library support works on those sys-
10228 the size of a subject string that can be processed by certain patterns.
10284 complexity of pcre[16|32]_dfa_exec() is controlled by the amount of
10299 fore the amount of stack used, by modifying the pattern that is being
10307 either one character that is not "<" or a "<" that is not followed by
10317 sion happens only when a "<" character that is not followed by "inet"
10333 stack, PCRE obtains and frees memory by calling the functions that are
10334 pointed to by the pcre[16|32]_stack_malloc and pcre[16|32]_stack_free
10361 subject string. This is done by calling pcre[16|32]_exec() repeatedly
10370 actually needed. A better approximation can be obtained by running this
10393 common. You can find your default limit by running the command:
10399 mally increase the limit on stack size by code such as this: