Lines Matching refs:a

19 a list of current states and checking all of them as it advanced through the
20 subject string. In the terminology of Jeffrey Friedl's book, it was a "DFA
21 algorithm", though it was not a traditional Finite State Machine (FSM). When
33 a dummy mode in order to find out how much store will be needed, and then for
46 that used an amount of store bounded by a multiple of the number of characters
48 complexity in Perl regular expressions, I couldn't do this. In any case, a
59 for a 16-bit data quantity, and the word "unit" is used for a quantity that is
60 a byte in 8-bit mode, a short in 16-bit mode and a 32-bit word in 32-bit mode.
68 Up to and including release 6.7, PCRE worked by running a very degenerate first
69 pass to calculate a maximum store size, and then a second pass to do the real
70 compile - which might use a bit less than the predicted amount of memory. The
79 By the time I was working on a potential 6.8 release, the degenerate first pass
82 I had a flash of inspiration as to how I could run the real compile function in
83 a "fake" mode that enables it to compute how much memory it would need, while
84 actually only ever using a few hundred bytes of working memory, and without too
87 should make future maintenance and development easier. As this was such a major
92 depth of parentheses was removed. However, there is a downside: pcre_compile()
94 is doing a full analysis of the pattern. My hope was that this would not be a
97 At release 8.34, a limit on the nesting depth of parentheses was re-introduced
98 (default 250, settable at build time) so as to put a limit on the amount of
99 system stack used by pcre_compile(). This is a safety feature for environments
111 just-in-time (JIT) support, and studying a compiled pattern with JIT is
119 From PCRE 6.0, there is also a supplementary matching function called
120 pcre_dfa_exec(). This implements a DFA matching algorithm that searches
127 The algorithm that is used for pcre_dfa_exec() is not a traditional FSM,
128 because it may have a number of states active at one time. More work would be
129 needed at compile time to produce a traditional FSM where only one state is
147 The compiled form of a pattern is a vector of unsigned units (bytes in 8-bit
154 within the compiled pattern. LINK_SIZE always specifies a number of bytes. The
157 LINK_SIZE values are available only in 8-bit mode.) Specifying a LINK_SIZE
202 OP_PRUNE ) OP_CLOSE, each followed by a count that
214 OP_MARK is followed by the mark name, preceded by a one-unit length, and
215 followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with arguments,
223 The OP_CHAR opcode is followed by a single character that is to be matched
228 If there is only one character in a character class, OP_CHAR or OP_CHARI is
229 used for a positive class, and OP_NOT or OP_NOTI for a negative one (that is,
230 for something like [^a]).
236 The common repeats (*, +, ?), when applied to a single character, use the
262 Each of these is followed by a count and then the repeated character. OP_UPTO
263 matches from 0 to the given number. A repeat with a non-zero minimum and a
268 etc.) are used for repeated, negated, single-character classes such as [^a]*.
277 that instead of a character, the opcode for the type is stored in the data
298 OP_PROP and OP_NOTPROP are used for positive and negative matches of a
300 Each is followed by two units that encode the desired property as a type and a
301 value. The types are a set of #defines of the form PT_xxx, and the values are
314 If there is only one character in a class, OP_CHAR or OP_CHARI is used for a
315 positive class, and OP_NOT or OP_NOTI for a negative one (that is, for
316 something like [^a]).
322 When there is more than one character in a class, and all the code points are
323 less than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a
324 negative one. In either case, the opcode is followed by a 32-byte (16-short,
325 8-word) bit map containing a 1 bit for every character that is acceptable. The
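As an illustration of the class bit map described above, here is a minimal
sketch in C of a 32-byte map with one bit per code point below 256. The helper
names (class_map_clear, class_map_add, class_map_test) and the CLASS_MAP_BYTES
macro are hypothetical and are not part of PCRE. The bit order within each byte
(low-order bit first) is also an assumption made for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: illustrative helpers, not PCRE functions. */
#define CLASS_MAP_BYTES 32   /* 256 bits, one per code point below 256 */

/* Clear the map so that no character is acceptable. */
static void class_map_clear(uint8_t map[CLASS_MAP_BYTES])
{
    memset(map, 0, CLASS_MAP_BYTES);
}

/* Mark code point c as acceptable; values of 256 or more are ignored. */
static void class_map_add(uint8_t map[CLASS_MAP_BYTES], unsigned int c)
{
    if (c > 255) return;
    /* Assumed layout: byte c/8, bit c%8 counted from the low-order end. */
    map[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Return 1 if c has its bit set, 0 otherwise (including for c >= 256). */
static int class_map_test(const uint8_t map[CLASS_MAP_BYTES], unsigned int c)
{
    if (c > 255) return 0;
    return (map[c / 8] & (1u << (c % 8))) != 0;
}
```

In this sketch, a positive class such as [ac] would be built by clearing the
map and adding 'a' and 'c'; a test for 'b' then returns 0. How a negative
class is represented is not covered by the sketch.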
334 \p or \P, OP_XCLASS is used. It optionally uses a bit map if any code points
335 are less than 256, followed by a list of pairs (for a range) and single
338 OP_XCLASS is followed by a unit containing flag bits: XCL_NOT indicates that
339 this is a negative class, and XCL_MAP indicates that a bit map is present.
340 There follows the bit map, if XCL_MAP is set, and then a sequence of items
346 XCL_PROP a Unicode property (type, value) follows
347 XCL_NOTPROP a Unicode property (type, value) follows
349 If a range starts with a code point less than 256 and ends with one greater
351 This means that if no other items in the class set bits in the map, a map is
358 OP_REF (caseful) or OP_REFI (caseless) is followed by a count containing the
359 reference number if the reference is to a unique capturing group (either by
361 group with the same name. In this case, a reference by name generates OP_DNREF
403 capturing brackets and it used a different opcode for each one. From release
412 number is a count that immediately follows the offset.
417 LINK_SIZE bytes giving (as a positive number) the offset back to the matching
420 If a subpattern is quantified such that it is permitted to match zero times, it
423 subpattern entirely is a valid branch. In the case of the first two, not
425 when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
426 because it may be called as a subroutine from elsewhere in the regex.
433 A subpattern with a bounded maximum repetition is replicated in a nested
439 When a repeated subpattern has an unbounded upper limit, it is checked to see
449 When a repeated group (capturing or non-capturing) is marked as possessive by
452 of OP_SCBRA. The end of such a group is marked by OP_KETRPOS. If the minimum
462 for when there is a backtrack to before the group - any captures within the
478 is OP_REVERSE, followed by a count of the number of characters to move back the
479 pointer in the subject string. In ASCII mode, the count is a number of units,
482 each alternative of a lookbehind assertion, allowing them to have different
491 the condition is a back reference, this is stored at the start of the
492 subpattern using the opcode OP_CREF followed by a count containing the
493 reference number, provided that the reference is to a unique capturing group.
500 subpattern using the opcode OP_RREF (with a value of zero for "the whole
501 pattern") or OP_DNRREF (with data as for OP_DNCREF). For a DEFINE condition,
502 just the single unit OP_DEF is used (it has no associated data). Otherwise, a
514 not strictly a recursion.
520 OP_CALLOUT is followed by one unit of data that holds a callout number in the
522 cases there follows a count giving the offset in the pattern string to the