[![Build Status](https://travis-ci.org/kkos/oniguruma.svg?branch=master)](https://travis-ci.org/kkos/oniguruma)
[![Code Quality: Cpp](https://img.shields.io/lgtm/grade/cpp/g/kkos/oniguruma.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kkos/oniguruma/context:cpp)
[![Total Alerts](https://img.shields.io/lgtm/alerts/g/kkos/oniguruma.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kkos/oniguruma/alerts)

Oniguruma
=========

https://github.com/kkos/oniguruma

Oniguruma is a modern and flexible regular expressions library. It
encompasses features from different regular expression implementations
that traditionally exist in different languages.

Character encoding can be specified per regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
* doc/SYNTAX.md: contributed by seanofw


Version 6.9.4
-------------

* NEW API: RegSet (set of regexes)
* Fixed CVE-2019-19012
* Fixed CVE-2019-19203 (Does not affect UTF-8, UTF-16 and UTF-32 encodings)
* Fixed CVE-2019-19204 (Affects only PosixBasic, Emacs and Grep syntaxes)
* Fixed CVE-2019-19246
* Fixed some problems (found by libFuzzer test)


Version 6.9.3 (security fix release)
------------------------------------

* Fixed CVE-2019-13224
* Fixed CVE-2019-13225
* Fixed CVE-2019-16163
* Fixed many problems (found by libFuzzer test)


Version 6.9.2 (Reiwa)
---------------------

* add doc/SYNTAX.md
* Direct threaded code (for GCC and Clang)
* Update Unicode version 12.1.0
* NEW: Unicode Text Segment mode option (?y{g}) (?y{w})  (*original)


Version 6.9.1
-------------

* Speed improvement (* especially UTF-8)


Version 6.9.0
-------------

* Update Unicode version 11.0.0
* NEW: add Emoji properties


Version 6.8.2
-------------

* Fix: #80 UChar in header causes issue
* NEW API: onig_set_callout_user_data_of_match_param()  (* omission in 6.8.0)
* add doc/CALLOUTS.API and doc/CALLOUTS.API.ja


Version 6.8.1
-------------

* Update shared library version to 5.0.0 for API incompatible changes from 6.7.1


Version 6.8.0
-------------

* Retry-limit-in-match function enabled by default
* NEW: configure option --enable-posix-api=no  (* enabled by default)
* NEW API: onig_search_with_param(), onig_match_with_param()
* NEW: Callouts of contents  (?{...contents...}) (?{...}\[tag]\[X<>]) (?{{...}})
* NEW: Callouts of name      (*name) (*name\[tag]{args...})
* NEW: Builtin callouts  (*FAIL) (*MISMATCH) (*ERROR{n}) (*COUNT) (*MAX{n}) etc.
* Examples of Callouts program: [callout.c](sample/callout.c), [count.c](sample/count.c), [echo.c](sample/echo.c)


Version 6.7.1
-------------

* NEW: Mechanism of retry-limit-in-match (* disabled by default)


Version 6.7.0
-------------

* NEW: hexadecimal codepoint \uHHHH
* NEW: add ONIG_SYNTAX_ONIGURUMA (== ONIG_SYNTAX_DEFAULT)
* Disabled \N and \O on ONIG_SYNTAX_RUBY
* Reduced size of object file


Version 6.6.0
-------------

* NEW: ASCII only mode options for character type/property (?WDSP)
* NEW: Extended Grapheme Cluster boundary \y, \Y
* NEW: Extended Grapheme Cluster \X
* Range-clear (Absent-clear) operator restores previous range in retractions.


Version 6.5.0
-------------

* NEW: \K (keep)
* NEW: \R (general newline) \N (no newline)
* NEW: \O (true anychar)
* NEW: if-then-else   (?(...)...\|...)
* NEW: Backreference validity checker (?(xxx)) (*original)
* NEW: Absent repeater (?~absent)  \[is equal to (?\~\|(?:absent)|\O*)]
* NEW: Absent expression   (?~|absent|expr)  (*original)
* NEW: Absent stopper (?~|absent)     (*original)


Version 6.4.0
-------------

* Fix fatal problem of endless repeat on Windows
* NEW: call zero (call the total regexp) \g<0>
* NEW: relative backref/call by positive number \k<+n>, \g<+n>


Version 6.3.0
-------------

* NEW: octal codepoint \o{.....}
* Fixed CVE-2017-9224
* Fixed CVE-2017-9225
* Fixed CVE-2017-9226
* Fixed CVE-2017-9227
* Fixed CVE-2017-9228
* Fixed CVE-2017-9229


Version 6.1.2
-------------

* allow word bound, word begin and word end in look-behind.
* NEW option: ONIG_OPTION_CHECK_VALIDITY_OF_STRING

Version 6.1
-----------

* improved doc/RE
* NEW API: onig_scan()

Version 6.0
-----------

* Update Unicode 8.0 Property/Case-folding
* NEW API: onig_unicode_define_user_property()


License
-------

  BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (* only needed if the configure script is not found)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



### Case 2: Windows 64/32bit platform (Visual Studio)

   Execute make_win.bat

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)

      1. cd src
      2. copy ..\windows\testc.c .
      3. nmake -f Makefile.windows ctest

   (I have checked this with Visual Studio Community 2015)



Regular Expressions
-------------------

  See [doc/RE](doc/RE) or [doc/RE.ja](doc/RE.ja) for Japanese.


Usage
-----

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for Oniguruma API.

  To disable the definition of the UChar type (== unsigned char)
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before
  including oniguruma.h.

  To disable the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.

  Example compile/link command line on Unix or Cygwin
  (when prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/callout.c      |example of callouts                       |
|sample/count.c        |example of built-in callout *COUNT        |
|sample/echo.c         |example of user defined callouts of name  |
|sample/encode.c       |example of some encodings                 |
|sample/listcap.c      |example of the capture history            |
|sample/names.c        |example of the named group callback       |
|sample/posix.c        |POSIX API sample                          |
|sample/regset.c       |example of using RegSet API               |
|sample/scan.c         |example of using onig_scan()              |
|sample/simple.c       |example of the minimum (Oniguruma API)    |
|sample/sql.c          |example of the variable meta characters   |
|sample/user_property.c|example of user defined Unicode property  |


Test Programs
-------------

|File               |Description                            |
|:------------------|:--------------------------------------|
|sample/syntax.c    |Perl, Java and ASIS syntax test.       |
|sample/crnl.c      |--enable-crnl-as-line-terminator test  |



Source Files
------------

|File               |Description                                             |
|:------------------|:-------------------------------------------------------|
|oniguruma.h        |Oniguruma API header file (public)                      |
|onig-config.in     |configuration check program template                    |
|regenc.h           |character encodings framework header file               |
|regint.h           |internal definitions                                    |
|regparse.h         |internal definitions for regparse.c and regcomp.c       |
|regcomp.c          |compiling and optimization functions                    |
|regenc.c           |character encodings framework                           |
|regerror.c         |error message function                                  |
|regext.c           |extended API functions (deluxe version API)             |
|regexec.c          |search and match functions                              |
|regparse.c         |parsing functions.                                      |
|regsyntax.c        |pattern syntax functions and built-in syntax definitions|
|regtrav.c          |capture history tree data traverse functions            |
|regversion.c       |version info function                                   |
|st.h               |hash table functions header file                        |
|st.c               |hash table functions                                    |
|oniggnu.h          |GNU regex API header file (public)                      |
|reggnu.c           |GNU regex API functions                                 |
|onigposix.h        |POSIX API header file (public)                          |
|regposerr.c        |POSIX error message function                            |
|regposix.c         |POSIX API functions                                     |
|mktable.c          |character type table generator                          |
|ascii.c            |ASCII encoding                                          |
|euc_jp.c           |EUC-JP encoding                                         |
|euc_tw.c           |EUC-TW encoding                                         |
|euc_kr.c           |EUC-KR, EUC-CN encoding                                 |
|sjis.c             |Shift_JIS encoding                                      |
|big5.c             |Big5      encoding                                      |
|gb18030.c          |GB18030   encoding                                      |
|koi8.c             |KOI8      encoding                                      |
|koi8_r.c           |KOI8-R    encoding                                      |
|cp1251.c           |CP1251    encoding                                      |
|iso8859_1.c        |ISO-8859-1 (Latin-1)                                    |
|iso8859_2.c        |ISO-8859-2 (Latin-2)                                    |
|iso8859_3.c        |ISO-8859-3 (Latin-3)                                    |
|iso8859_4.c        |ISO-8859-4 (Latin-4)                                    |
|iso8859_5.c        |ISO-8859-5 (Cyrillic)                                   |
|iso8859_6.c        |ISO-8859-6 (Arabic)                                     |
|iso8859_7.c        |ISO-8859-7 (Greek)                                      |
|iso8859_8.c        |ISO-8859-8 (Hebrew)                                     |
|iso8859_9.c        |ISO-8859-9 (Latin-5 or Turkish)                         |
|iso8859_10.c       |ISO-8859-10 (Latin-6 or Nordic)                         |
|iso8859_11.c       |ISO-8859-11 (Thai)                                      |
|iso8859_13.c       |ISO-8859-13 (Latin-7 or Baltic Rim)                     |
|iso8859_14.c       |ISO-8859-14 (Latin-8 or Celtic)                         |
|iso8859_15.c       |ISO-8859-15 (Latin-9 or West European with Euro)        |
|iso8859_16.c       |ISO-8859-16 (Latin-10)                                  |
|utf8.c             |UTF-8    encoding                                       |
|utf16_be.c         |UTF-16BE encoding                                       |
|utf16_le.c         |UTF-16LE encoding                                       |
|utf32_be.c         |UTF-32BE encoding                                       |
|utf32_le.c         |UTF-32LE encoding                                       |
|unicode.c          |common codes of Unicode encoding                        |
|unicode_fold_data.c|Unicode folding data                                    |
|windows/testc.c    |Test program for Windows (VC++)                         |