README
README  2016/05/06

Oniguruma  ----   (C) K.Kosako <kkosako0@gmail.com>

https://github.com/kkos/oniguruma

Oniguruma is a regular expression library.
The distinguishing feature of this library is that a different character
encoding can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (* only needed if the configure script is not present)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Windows 64/32bit platform (Visual Studio)

   Run make_win64 or make_win32.

     src/onig_s.lib:  static link library
     src/onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
     1. cd src
     2. copy ..\windows\testc.c .
     3. nmake -f Makefile.windows ctest

   (Tested with Visual Studio Community 2015.)



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program (Oniguruma API).
  See doc/API for the Oniguruma API reference.

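  If you want to disable the UChar type (== unsigned char) definition
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before including
  oniguruma.h. In that case UChar is not defined, and only the name
  OnigUChar is available.

  If you want to disable the regex_t type definition in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.
  In that case regex_t is not defined, and only the names
  OnigRegexType and OnigRegex are available.
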
  Example of a compile/link command line on Unix or Cygwin
  (when prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To link the static library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs

  sample/simple.c         minimal example (Oniguruma API)
  sample/names.c          example of the named group callback
  sample/encode.c         example of some encodings
  sample/listcap.c        example of the capture history
  sample/posix.c          POSIX API sample
  sample/sql.c            example of the variable meta characters
                          (SQL-like pattern matching)
  sample/user_property.c  example of a user-defined Unicode property

Test Programs
  sample/syntax.c         Perl, Java and ASIS syntax test.
  sample/crnl.c           --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h     Oniguruma API header file. (public)
  onig-config.in  configuration check program template.

  regenc.h        character encodings framework header file.
  regint.h        internal definitions
  regparse.h      internal definitions for regparse.c and regcomp.c
  regcomp.c       compiling and optimization functions
  regenc.c        character encodings framework.
  regerror.c      error message function
  regext.c        extended API functions. (deluxe version API)
  regexec.c       search and match functions
  regparse.c      parsing functions.
  regsyntax.c     pattern syntax functions and built-in syntax definitions.
  regtrav.c       capture history tree data traverse functions.
  regversion.c    version info function.
  st.h            hash table functions header file
  st.c            hash table functions

  oniggnu.h       GNU regex API header file. (public)
  reggnu.c        GNU regex API functions

  onigposix.h     POSIX API header file. (public)
  regposerr.c     POSIX error message function.
  regposix.c      POSIX API functions.

  mktable.c       character type table generator.
  ascii.c         ASCII encoding.
  euc_jp.c        EUC-JP encoding.
  euc_tw.c        EUC-TW encoding.
  euc_kr.c        EUC-KR, EUC-CN encoding.
  sjis.c          Shift_JIS encoding.
  big5.c          Big5 encoding.
  gb18030.c       GB18030 encoding.
  koi8.c          KOI8 encoding.
  koi8_r.c        KOI8-R encoding.
  cp1251.c        CP1251 encoding.
  iso8859_1.c     ISO-8859-1 encoding. (Latin-1)
  iso8859_2.c     ISO-8859-2 encoding. (Latin-2)
  iso8859_3.c     ISO-8859-3 encoding. (Latin-3)
  iso8859_4.c     ISO-8859-4 encoding. (Latin-4)
  iso8859_5.c     ISO-8859-5 encoding. (Cyrillic)
  iso8859_6.c     ISO-8859-6 encoding. (Arabic)
  iso8859_7.c     ISO-8859-7 encoding. (Greek)
  iso8859_8.c     ISO-8859-8 encoding. (Hebrew)
  iso8859_9.c     ISO-8859-9 encoding. (Latin-5 or Turkish)
  iso8859_10.c    ISO-8859-10 encoding. (Latin-6 or Nordic)
  iso8859_11.c    ISO-8859-11 encoding. (Thai)
  iso8859_13.c    ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  iso8859_14.c    ISO-8859-14 encoding. (Latin-8 or Celtic)
  iso8859_15.c    ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  iso8859_16.c    ISO-8859-16 encoding.
                  (Latin-10 or South-Eastern European with Euro)
  utf8.c          UTF-8 encoding.
  utf16_be.c      UTF-16BE encoding.
  utf16_le.c      UTF-16LE encoding.
  utf32_be.c      UTF-32BE encoding.
  utf32_le.c      UTF-32LE encoding.
  unicode.c       common codes of Unicode encoding.

  win32/Makefile  Makefile for Win32 (VC++)
  win32/config.h  config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana.
  ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
 ?? \X (== \PM\pM*)
 ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
 ?? search position advance stopper. (return ONIG_STOP from match_at())

And I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <kkosako0@gmail.com>

README.ja
README.ja  2016/05/06

Oniguruma  ----   (C) K.Kosako <kkosako0@gmail.com>

https://github.com/kkos/oniguruma

Oniguruma is a regular expression library.
The distinguishing feature of this library is that a character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin environment

   1. autoreconf -vfi   (* only if the configure script is not present)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Windows 64/32bit (Visual Studio) environment

   Run make_win64 or make_win32.

     onig_s.lib:  static link library
     onig.dll:    dynamic link library

   * operation test (ASCII/Shift_JIS)
     1. cd src
     2. copy ..\windows\testc.c .
     3. nmake -f Makefile.windows ctest

   (Verified with Visual Studio Community 2015.)



Regular Expressions

  See doc/RE.ja.


Usage

  Include oniguruma.h in the program that uses the library (Oniguruma API).
  For the Oniguruma API, see doc/API.ja.

  If you want to disable the type name UChar (== unsigned char) defined in
  oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before including
  oniguruma.h. In that case UChar is not defined, and only the name
  OnigUChar is available.

  If you want to disable the type name regex_t defined in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.
  In that case regex_t is not defined, and only the names OnigRegexType
  and OnigRegex are available.

  Example of compiling and linking on Unix/Cygwin
  (when prefix is /usr/local):
    cc sample.c -L/usr/local/lib -lonig

  Because GNU libtool is used, the shared library can be used if the
  platform supports shared libraries.
  Please look up for yourself how to choose between the static and the
  shared library, and how to set up the environment at run time.


  To link the static link library (onig_s.lib) on Win32, add
  -DONIG_EXTERN=extern to the compiler arguments when compiling.


Sample Programs

  sample/simple.c         minimal example (Oniguruma API)
  sample/names.c          example of the named group callback
  sample/encode.c         example of several character encodings
  sample/listcap.c        example of the capture history
  sample/posix.c          POSIX API example
  sample/sql.c            example of variable meta characters (SQL-like patterns)
  sample/user_property.c  example of a user-defined Unicode property


Test Programs
  sample/syntax.c         Perl, Java and ASIS syntax test
  sample/crnl.c           --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h     Oniguruma API header (public)
  onig-config.in  onig-config program template

  regenc.h        character encoding framework header
  regint.h        internal declarations
  regparse.h      internal declarations for regparse.c and regcomp.c
  regcomp.c       compilation and optimization functions
  regenc.c        character encoding framework
  regerror.c      error message function
  regext.c        extended API functions
  regexec.c       search and match functions
  regparse.c      regular expression pattern parsing functions
  regsyntax.c     regular expression pattern syntax functions and
                  built-in syntax definitions
  regtrav.c       capture history tree traversal functions
  regversion.c    version information function
  st.h            hash table function declarations
  st.c            hash table functions

  oniggnu.h       GNU regex API header (public)
  reggnu.c        GNU regex API functions

  onigposix.h     POSIX API header (public)
  regposerr.c     POSIX API error message function
  regposix.c      POSIX API functions

  mktable.c       character type table generator
  ascii.c         ASCII encoding
  euc_jp.c        EUC-JP encoding
  euc_tw.c        EUC-TW encoding
  euc_kr.c        EUC-KR, EUC-CN encoding
  sjis.c          Shift_JIS encoding
  big5.c          Big5 encoding
  gb18030.c       GB18030 encoding
  koi8.c          KOI8 encoding
  koi8_r.c        KOI8-R encoding
  cp1251.c        CP1251 encoding
  iso8859_1.c     ISO-8859-1 (Latin-1)
  iso8859_2.c     ISO-8859-2 (Latin-2)
  iso8859_3.c     ISO-8859-3 (Latin-3)
  iso8859_4.c     ISO-8859-4 (Latin-4)
  iso8859_5.c     ISO-8859-5 (Cyrillic)
  iso8859_6.c     ISO-8859-6 (Arabic)
  iso8859_7.c     ISO-8859-7 (Greek)
  iso8859_8.c     ISO-8859-8 (Hebrew)
  iso8859_9.c     ISO-8859-9 (Latin-5 or Turkish)
  iso8859_10.c    ISO-8859-10 (Latin-6 or Nordic)
  iso8859_11.c    ISO-8859-11 (Thai)
  iso8859_13.c    ISO-8859-13 (Latin-7 or Baltic Rim)
  iso8859_14.c    ISO-8859-14 (Latin-8 or Celtic)
  iso8859_15.c    ISO-8859-15 (Latin-9 or West European with Euro)
  iso8859_16.c    ISO-8859-16
                  (Latin-10 or South-Eastern European with Euro)
  utf8.c          UTF-8 encoding
  utf16_be.c      UTF-16BE encoding
  utf16_le.c      UTF-16LE encoding
  utf32_be.c      UTF-32BE encoding
  utf32_le.c      UTF-32LE encoding
  unicode.c       common processing for Unicode encodings

  win32/Makefile  Makefile for Win32 (for VC++)
  win32/config.h  config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana
  ? add ONIG_OPTION_NOTBOS/NOTEOS (\A, \z, \Z)
 ?? \X (== \PM\pM*)
 ?? implement the syntax element ONIG_SYN_CONTEXT_INDEP_ANCHORS
 ?? operator that stops advancing the search position (return ONIG_STOP from match_at())

And I'm thankful to Akinori MUSHA.


Mail address: K.Kosako <kkosako0@gmail.com>

README.md
Oniguruma
=========

https://github.com/kkos/oniguruma

Oniguruma is a regular expression library.
The distinguishing feature of this library is that a different character
encoding can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251: contributed by Byte


New features in version 6.3.0
-----------------------------

* NEW SYNTAX: escape-o-brace (`\o{...}`) for octal code points.


New features in version 6.1.2
-----------------------------

* Allow word boundary, word begin and word end anchors in look-behind.
* NEW option: ONIG_OPTION_CHECK_VALIDITY_OF_STRING

New features in version 6.1
---------------------------

* Improved doc/RE
* NEW API: onig_scan()

New features in version 6.0
---------------------------

* Update Unicode properties and case folding to Unicode 8.0
* NEW API: onig_unicode_define_user_property()


License
-------

BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

1. `autoreconf -vfi` (only needed if the configure script is not present)
2. `./configure`
3. `make`
4. `make install`

* uninstall: `make uninstall`
* configuration check:
  * `onig-config --cflags`
  * `onig-config --libs`
  * `onig-config --prefix`
  * `onig-config --exec-prefix`



### Case 2: Windows 64/32bit platform (Visual Studio)

Run `make_win64` or `make_win32`.

* onig_s.lib: static link library
* onig.dll: dynamic link library

Test (ASCII/Shift_JIS):

1. `cd src`
2. `copy ..\windows\testc.c .`
3. `nmake -f Makefile.windows ctest`

(Tested with Visual Studio Community 2015.)



Regular Expressions
-------------------

See [doc/RE](doc/RE), or [doc/RE.ja](doc/RE.ja) for Japanese.


Usage
-----

Include oniguruma.h in your program (Oniguruma API).
See [doc/API](doc/API) for the Oniguruma API.

If you want to disable the UChar type (== unsigned char) definition
in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before including
oniguruma.h. In that case UChar is not defined, and only the name
OnigUChar is available.

If you want to disable the regex_t type definition in oniguruma.h,
define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.
In that case regex_t is not defined, and only the names
OnigRegexType and OnigRegex are available.

Example of a compile/link command line on Unix or Cygwin
(when prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


To link the static library (onig_s.lib) on Win32,
add the option `-DONIG_EXTERN=extern` to the C compiler command line.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/simple.c       |minimal example (Oniguruma API)           |
|sample/names.c        |example of the named group callback       |
|sample/encode.c       |example of some encodings                 |
|sample/listcap.c      |example of the capture history            |
|sample/posix.c        |POSIX API sample                          |
|sample/scan.c         |example of using onig_scan()              |
|sample/sql.c          |example of the variable meta characters   |
|sample/user_property.c|example of a user-defined Unicode property|


Test Programs
-------------

|File               |Description                           |
|:------------------|:-------------------------------------|
|sample/syntax.c    |Perl, Java and ASIS syntax test       |
|sample/crnl.c      |--enable-crnl-as-line-terminator test |



Source Files
------------

|File               |Description                                             |
|:------------------|:-------------------------------------------------------|
|oniguruma.h        |Oniguruma API header file (public)                      |
|onig-config.in     |configuration check program template                    |
|regenc.h           |character encodings framework header file               |
|regint.h           |internal definitions                                    |
|regparse.h         |internal definitions for regparse.c and regcomp.c       |
|regcomp.c          |compiling and optimization functions                    |
|regenc.c           |character encodings framework                           |
|regerror.c         |error message function                                  |
|regext.c           |extended API functions (deluxe version API)             |
|regexec.c          |search and match functions                              |
|regparse.c         |parsing functions                                       |
|regsyntax.c        |pattern syntax functions and built-in syntax definitions|
|regtrav.c          |capture history tree data traverse functions            |
|regversion.c       |version info function                                   |
|st.h               |hash table functions header file                        |
|st.c               |hash table functions                                    |
|oniggnu.h          |GNU regex API header file (public)                      |
|reggnu.c           |GNU regex API functions                                 |
|onigposix.h        |POSIX API header file (public)                          |
|regposerr.c        |POSIX error message function                            |
|regposix.c         |POSIX API functions                                     |
|mktable.c          |character type table generator                          |
|ascii.c            |ASCII encoding                                          |
|euc_jp.c           |EUC-JP encoding                                         |
|euc_tw.c           |EUC-TW encoding                                         |
|euc_kr.c           |EUC-KR, EUC-CN encoding                                 |
|sjis.c             |Shift_JIS encoding                                      |
|big5.c             |Big5 encoding                                           |
|gb18030.c          |GB18030 encoding                                        |
|koi8.c             |KOI8 encoding                                           |
|koi8_r.c           |KOI8-R encoding                                         |
|cp1251.c           |CP1251 encoding                                         |
|iso8859_1.c        |ISO-8859-1 (Latin-1)                                    |
|iso8859_2.c        |ISO-8859-2 (Latin-2)                                    |
|iso8859_3.c        |ISO-8859-3 (Latin-3)                                    |
|iso8859_4.c        |ISO-8859-4 (Latin-4)                                    |
|iso8859_5.c        |ISO-8859-5 (Cyrillic)                                   |
|iso8859_6.c        |ISO-8859-6 (Arabic)                                     |
|iso8859_7.c        |ISO-8859-7 (Greek)                                      |
|iso8859_8.c        |ISO-8859-8 (Hebrew)                                     |
|iso8859_9.c        |ISO-8859-9 (Latin-5 or Turkish)                         |
|iso8859_10.c       |ISO-8859-10 (Latin-6 or Nordic)                         |
|iso8859_11.c       |ISO-8859-11 (Thai)                                      |
|iso8859_13.c       |ISO-8859-13 (Latin-7 or Baltic Rim)                     |
|iso8859_14.c       |ISO-8859-14 (Latin-8 or Celtic)                         |
|iso8859_15.c       |ISO-8859-15 (Latin-9 or West European with Euro)        |
|iso8859_16.c       |ISO-8859-16 (Latin-10)                                  |
|utf8.c             |UTF-8 encoding                                          |
|utf16_be.c         |UTF-16BE encoding                                       |
|utf16_le.c         |UTF-16LE encoding                                       |
|utf32_be.c         |UTF-32BE encoding                                       |
|utf32_le.c         |UTF-32LE encoding                                       |
|unicode.c          |common codes of Unicode encoding                        |
|unicode_fold_data.c|Unicode folding data                                    |
|win32/Makefile     |Makefile for Win32 (VC++)                               |
|win32/config.h     |config.h for Win32                                      |
