Oniguruma
=========

https://github.com/kkos/oniguruma

Oniguruma is a regular expressions library.
Its distinguishing characteristic is that a different character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte


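One way to picture "a different character encoding for each regular expression object" is a pattern object that carries a pointer to its own encoding routines. The C sketch below is purely conceptual: `Encoding`, `Pattern`, `ENC_ASCII`, `ENC_UTF8`, and `pattern_char_count` are hypothetical names invented for illustration and are not Oniguruma types or functions. In the real library the encoding is chosen when a regex object is created; see doc/API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding descriptor: how many bytes the character at p uses. */
typedef struct {
    const char *name;
    size_t (*char_len)(const unsigned char *p);
} Encoding;

/* ASCII: every character is exactly one byte. */
static size_t ascii_char_len(const unsigned char *p)
{
    (void)p;
    return 1;
}

/* UTF-8: length is taken from the lead byte; malformed lead bytes count as 1. */
static size_t utf8_char_len(const unsigned char *p)
{
    if (*p < 0x80) return 1;
    if ((*p & 0xE0) == 0xC0) return 2;
    if ((*p & 0xF0) == 0xE0) return 3;
    if ((*p & 0xF8) == 0xF0) return 4;
    return 1;
}

const Encoding ENC_ASCII = { "ASCII", ascii_char_len };
const Encoding ENC_UTF8  = { "UTF-8", utf8_char_len };

/* Hypothetical pattern object: the encoding travels with the object. */
typedef struct {
    const char *source;
    const Encoding *enc;
} Pattern;

/* Count the characters of the pattern source according to its own encoding. */
size_t pattern_char_count(const Pattern *pat)
{
    const unsigned char *p;
    size_t n = 0;

    assert(pat != NULL && pat->source != NULL && pat->enc != NULL);

    p = (const unsigned char *)pat->source;
    while (*p != 0) {
        size_t len = pat->enc->char_len(p);
        size_t i;
        /* Advance at most len bytes, but never past the terminating NUL. */
        for (i = 0; i < len && p[i] != 0; i++)
            ;
        p += i;
        n++;
    }
    return n;
}
```

With these definitions, the two-byte sequence 0xC3 0xA9 counts as two characters under `ENC_ASCII` and as one character under `ENC_UTF8`, even though both pattern objects live in the same program.
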
New feature of version 6.3.0
--------------------------

* NEW SYNTAX: escape-o-brace (\o{...}) for octal codepoint.


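The escape-o-brace form writes a code point as octal digits between `\o{` and `}`; for example, `\o{101}` denotes code point 65 (`A`). The sketch below decodes one such escape in plain C to make the notation concrete. It is not Oniguruma's parser: the function name `parse_octal_brace`, its return convention, and the 32-bit overflow limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: decode one \o{...} escape at the start of s.
 *
 * Returns the number of bytes consumed, or 0 if s does not start with
 * a well-formed \o{...} escape. On success *codepoint receives the
 * value of the octal digits.
 */
size_t parse_octal_brace(const char *s, unsigned long *codepoint)
{
    size_t i = 3;                 /* index of the first char after "\o{" */
    unsigned long value = 0;

    assert(s != NULL && codepoint != NULL);

    if (s[0] != '\\' || s[1] != 'o' || s[2] != '{')
        return 0;

    if (s[i] == '}')              /* reject empty braces */
        return 0;

    while (s[i] >= '0' && s[i] <= '7') {
        if (value > (0xFFFFFFFFUL >> 3))   /* result would exceed 32 bits */
            return 0;
        value = (value << 3) | (unsigned long)(s[i] - '0');
        i++;
    }

    if (s[i] != '}')              /* non-octal digit or missing brace */
        return 0;

    *codepoint = value;
    return i + 1;                 /* include the closing brace */
}
```

Calling `parse_octal_brace("\\o{101}", &cp)` consumes all seven bytes of the escape and stores 65 in `cp`; input such as `\o{8}` or `\o{}` is rejected with a return value of 0.
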
New feature of version 6.1.2
--------------------------

* allow word boundary, word begin and word end anchors in look-behind.
* NEW option: ONIG_OPTION_CHECK_VALIDITY_OF_STRING

New feature of version 6.1
--------------------------

* improved doc/RE
* NEW API: onig_scan()

New feature of version 6.0
--------------------------

* Update Unicode 8.0 Property/Case-folding
* NEW API: onig_unicode_define_user_property()


License
-------

  BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (only needed if the configure script is not found)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



### Case 2: Windows 64/32bit platform (Visual Studio)

   execute make_win64 or make_win32

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)

      1. cd src
      2. copy ..\windows\testc.c .
      3. nmake -f Makefile.windows ctest

   (Checked with Visual Studio Community 2015.)



Regular Expressions
-------------------

  See [doc/RE](doc/RE), or [doc/RE.ja](doc/RE.ja) for the Japanese version.


Usage
-----

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for Oniguruma API.

  If you want to disable the UChar type (== unsigned char) definition
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION and then
  include oniguruma.h.

  If you want to disable the regex_t type definition in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION and then include oniguruma.h.

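The two ESCAPE macros above follow a common C pattern: a header introduces a short, collision-prone type name only when an opt-out macro has not been defined first. The single-file sketch below imitates that pattern so the effect is visible without the library. Everything in it is a hypothetical stand-in (the `MYLIB_` macro, `MyLibUChar`, the application's 16-bit `UChar`, and `count_bytes`); it does not reproduce the contents of oniguruma.h, so check the header of your version for the names it actually provides.

```c
#include <assert.h>
#include <stddef.h>

/* The application already has its own UChar (hypothetical: 16-bit units). */
typedef unsigned short UChar;

/* Opt out of the library's short alias BEFORE the header is included. */
#define MYLIB_ESCAPE_UCHAR_COLLISION

/* ---- begin stand-in for a library header (hypothetical) ---- */
typedef unsigned char MyLibUChar;   /* prefixed name is always available */
#ifndef MYLIB_ESCAPE_UCHAR_COLLISION
typedef MyLibUChar UChar;           /* short alias, skipped when opted out */
#endif
/* ---- end stand-in ---- */

/* Library-facing code uses the prefixed type; app code keeps its own UChar. */
size_t count_bytes(const MyLibUChar *s)
{
    size_t n = 0;

    assert(s != NULL);

    while (s[n] != 0)
        n++;
    return n;
}
```

Because the opt-out macro is defined before the header section, the application's own `UChar` stays in effect and no conflicting typedef is seen by the compiler. Removing the `#define` line in this sketch would produce a conflicting-types error, which is the collision the macro is meant to avoid.
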
  Example compile/link command line on Unix or Cygwin
  (for the prefix == /usr/local case):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/simple.c       |example of the minimum (Oniguruma API)    |
|sample/names.c        |example of the named group callback.      |
|sample/encode.c       |example of some encodings.                |
|sample/listcap.c      |example of the capture history.           |
|sample/posix.c        |POSIX API sample.                         |
|sample/scan.c         |example of using onig_scan().             |
|sample/sql.c          |example of the variable meta characters.  |
|sample/user_property.c|example of user defined Unicode property. |


Test Programs

|File               |Description                            |
|:------------------|:--------------------------------------|
|sample/syntax.c    |Perl, Java and ASIS syntax test.       |
|sample/crnl.c      |--enable-crnl-as-line-terminator test  |



Source Files
------------

|File               |Description                                             |
|:------------------|:-------------------------------------------------------|
|oniguruma.h        |Oniguruma API header file (public)                      |
|onig-config.in     |configuration check program template                    |
|regenc.h           |character encodings framework header file               |
|regint.h           |internal definitions                                    |
|regparse.h         |internal definitions for regparse.c and regcomp.c       |
|regcomp.c          |compiling and optimization functions                    |
|regenc.c           |character encodings framework                           |
|regerror.c         |error message function                                  |
|regext.c           |extended API functions (deluxe version API)             |
|regexec.c          |search and match functions                              |
|regparse.c         |parsing functions.                                      |
|regsyntax.c        |pattern syntax functions and built-in syntax definitions|
|regtrav.c          |capture history tree data traverse functions            |
|regversion.c       |version info function                                   |
|st.h               |hash table functions header file                        |
|st.c               |hash table functions                                    |
|oniggnu.h          |GNU regex API header file (public)                      |
|reggnu.c           |GNU regex API functions                                 |
|onigposix.h        |POSIX API header file (public)                          |
|regposerr.c        |POSIX error message function                            |
|regposix.c         |POSIX API functions                                     |
|mktable.c          |character type table generator                          |
|ascii.c            |ASCII encoding                                          |
|euc_jp.c           |EUC-JP encoding                                         |
|euc_tw.c           |EUC-TW encoding                                         |
|euc_kr.c           |EUC-KR, EUC-CN encoding                                 |
|sjis.c             |Shift_JIS encoding                                      |
|big5.c             |Big5      encoding                                      |
|gb18030.c          |GB18030   encoding                                      |
|koi8.c             |KOI8      encoding                                      |
|koi8_r.c           |KOI8-R    encoding                                      |
|cp1251.c           |CP1251    encoding                                      |
|iso8859_1.c        |ISO-8859-1 (Latin-1)                                    |
|iso8859_2.c        |ISO-8859-2 (Latin-2)                                    |
|iso8859_3.c        |ISO-8859-3 (Latin-3)                                    |
|iso8859_4.c        |ISO-8859-4 (Latin-4)                                    |
|iso8859_5.c        |ISO-8859-5 (Cyrillic)                                   |
|iso8859_6.c        |ISO-8859-6 (Arabic)                                     |
|iso8859_7.c        |ISO-8859-7 (Greek)                                      |
|iso8859_8.c        |ISO-8859-8 (Hebrew)                                     |
|iso8859_9.c        |ISO-8859-9 (Latin-5 or Turkish)                         |
|iso8859_10.c       |ISO-8859-10 (Latin-6 or Nordic)                         |
|iso8859_11.c       |ISO-8859-11 (Thai)                                      |
|iso8859_13.c       |ISO-8859-13 (Latin-7 or Baltic Rim)                     |
|iso8859_14.c       |ISO-8859-14 (Latin-8 or Celtic)                         |
|iso8859_15.c       |ISO-8859-15 (Latin-9 or West European with Euro)        |
|iso8859_16.c       |ISO-8859-16 (Latin-10)                                  |
|utf8.c             |UTF-8    encoding                                       |
|utf16_be.c         |UTF-16BE encoding                                       |
|utf16_le.c         |UTF-16LE encoding                                       |
|utf32_be.c         |UTF-32BE encoding                                       |
|utf32_le.c         |UTF-32LE encoding                                       |
|unicode.c          |common codes of Unicode encoding                        |
|unicode_fold_data.c|Unicode folding data                                    |
|win32/Makefile     |Makefile for Win32 (VC++)                               |
|win32/config.h     |config.h for Win32                                      |
