README
README  2007/05/31

Oniguruma  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma is a regular expression library.
Its distinguishing feature is that a different character encoding can be
specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin platform

   1. ./configure
   2. make
   3. make install

   * uninstall

     make uninstall

   * test (ASCII/EUC-JP)

     make atest

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 platform (VC++)

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for the Oniguruma API.

  If you want to disable the UChar type (== unsigned char) definition
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before you
  include oniguruma.h.

  If you want to disable the regex_t type definition in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before you include oniguruma.h.
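
The effect of such an escape macro can be sketched without the library
itself. The block below is a self-contained mock-up and not the text of
oniguruma.h: the part between the marker comments is a hypothetical
stand-in for a header that offers a short type name unless the escape
macro is defined. app_uchar_size and lib_uchar_size are illustrative
names; OnigUChar is the prefixed type name mentioned in README.ja.

```c
#include <stddef.h>

/* The application (or another library) already uses the name UChar,
 * here for a 16-bit unit. */
typedef unsigned short UChar;

/* Defined before the header so that the header leaves UChar alone. */
#define ONIG_ESCAPE_UCHAR_COLLISION

/* ---- begin: hypothetical stand-in for a header such as oniguruma.h ---- */
typedef unsigned char OnigUChar;      /* prefixed name, always defined */
#ifndef ONIG_ESCAPE_UCHAR_COLLISION
typedef OnigUChar UChar;              /* short name; would clash with the
                                         application's typedef above */
#endif
/* ---- end of stand-in ---- */

/* Size of the application's own UChar. */
size_t app_uchar_size(void) { return sizeof(UChar); }

/* Size of the library-side character unit. */
size_t lib_uchar_size(void) { return sizeof(OnigUChar); }
```

In this mock-up, removing the #define line makes the compiler reject the
second typedef of UChar. With a real installation the stand-in part would
be replaced by an include of oniguruma.h; the regex_t case with
ONIG_ESCAPE_REGEX_T_COLLISION is assumed to follow the same pattern.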

  Example of a compile/link command line on Unix or Cygwin
  (when prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  If you want to use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler.



Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback.
  sample/encode.c    example of some encodings.
  sample/listcap.c   example of the capture history.
  sample/posix.c     POSIX API sample.
  sample/sql.c       example of the variable meta characters.
                     (SQL-like pattern matching)
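
The sample sources themselves are not reproduced in this file. As a rough
sketch of what code written against a POSIX-style regex interface looks
like, the block below uses regcomp, regexec and regfree. So that it runs
without Oniguruma installed, it includes the system's <regex.h>; the
assumption is that including onigposix.h instead and linking with -lonig
provides the same calls. first_match is a hypothetical helper name and is
not taken from sample/posix.c.

```c
#include <regex.h>   /* assumption: with Oniguruma, onigposix.h instead */

/* Hypothetical helper: find the first match of the extended regular
 * expression `pattern` in `text`.
 * Returns 0 and stores the byte offsets of the match in *start and *end
 * (end is one past the last matched byte), 1 if there is no match, and
 * -1 if the pattern does not compile or the search fails. */
int first_match(const char *pattern, const char *text, int *start, int *end)
{
    regex_t reg;
    regmatch_t m[1];
    int rc;

    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&reg, text, 1, m, 0);
    regfree(&reg);               /* the compiled pattern is no longer needed */

    if (rc == REG_NOMATCH)
        return 1;
    if (rc != 0)
        return -1;

    *start = (int)m[0].rm_so;
    *end   = (int)m[0].rm_eo;
    return 0;
}
```

For example, first_match("[0-9]+", "abc 2007 def", &s, &e) returns 0 with
s == 4 and e == 8.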

Test Programs
  sample/syntax.c    Perl, Java and ASIS syntax test.
  sample/crnl.c      --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h        Oniguruma API header file. (public)
  onig-config.in     configuration check program template.

  regenc.h           character encodings framework header file.
  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework.
  regerror.c         error message function
  regext.c           extended API functions. (deluxe version API)
  regexec.c          search and match functions
  regparse.c         parsing functions.
  regsyntax.c        pattern syntax functions and built-in syntax definitions.
  regtrav.c          capture history tree data traverse functions.
  regversion.c       version info function.
  st.h               hash table functions header file
  st.c               hash table functions

  oniggnu.h          GNU regex API header file. (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file. (public)
  regposerr.c        POSIX error message function.
  regposix.c         POSIX API functions.

  enc/mktable.c      character type table generator.
  enc/ascii.c        ASCII encoding.
  enc/euc_jp.c       EUC-JP encoding.
  enc/euc_tw.c       EUC-TW encoding.
  enc/euc_kr.c       EUC-KR, EUC-CN encoding.
  enc/sjis.c         Shift_JIS encoding.
  enc/big5.c         Big5 encoding.
  enc/gb18030.c      GB18030 encoding.
  enc/koi8.c         KOI8 encoding.
  enc/koi8_r.c       KOI8-R encoding.
  enc/cp1251.c       CP1251 encoding.
  enc/iso8859_1.c    ISO-8859-1 encoding. (Latin-1)
  enc/iso8859_2.c    ISO-8859-2 encoding. (Latin-2)
  enc/iso8859_3.c    ISO-8859-3 encoding. (Latin-3)
  enc/iso8859_4.c    ISO-8859-4 encoding. (Latin-4)
  enc/iso8859_5.c    ISO-8859-5 encoding. (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6 encoding. (Arabic)
  enc/iso8859_7.c    ISO-8859-7 encoding. (Greek)
  enc/iso8859_8.c    ISO-8859-8 encoding. (Hebrew)
  enc/iso8859_9.c    ISO-8859-9 encoding. (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 encoding. (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 encoding. (Thai)
  enc/iso8859_13.c   ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 encoding. (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16 encoding.
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding.
  enc/utf16_be.c     UTF-16BE encoding.
  enc/utf16_le.c     UTF-16LE encoding.
  enc/utf32_be.c     UTF-32BE encoding.
  enc/utf32_le.c     UTF-32LE encoding.
  enc/unicode.c      Unicode information data.

  win32/Makefile     Makefile for Win32 (VC++)
  win32/config.h     config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana.
  ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
  ?? \X (== \PM\pM*)
  ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
  ?? transmission stopper. (return ONIG_STOP from match_at())

and I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

README.ja
README.ja  2007/05/31

Oniguruma (鬼車)  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma is a regular expression library.
The distinguishing feature of this library is that the character encoding
can be specified separately for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  Follows the BSD license.


Install

 Case 1: Unix and Cygwin environment

   1. ./configure
   2. make
   3. make install

   uninstall

     make uninstall

   test (ASCII/EUC-JP)

     make atest


   configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 (VC++) environment

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest



Regular Expressions

  See doc/RE.ja.


Usage

  Include oniguruma.h in the program that uses the library (when using
  the Oniguruma API).
  See doc/API.ja for the Oniguruma API.

  If you want to disable the type name UChar (== unsigned char) defined
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before you include
  oniguruma.h. In that case UChar is not defined, and only the
  definition under the name OnigUChar is available.

  If you want to disable the type name regex_t defined in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before you include oniguruma.h.
  In that case regex_t is not defined, and only the definitions under
  the names OnigRegexType and OnigRegex are available.

  Example of compiling and linking on Unix/Cygwin
  (when prefix is /usr/local):
    cc sample.c -L/usr/local/lib -lonig

  GNU libtool is used, so the shared library can be used if the platform
  supports shared libraries.
  For how to choose between the static and the shared library, and how
  to set up the run-time environment, please look it up yourself.


  To link the static link library (onig_s.lib) on Win32, add
  -DONIG_EXTERN=extern to the compiler arguments when compiling.


Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback
  sample/encode.c    example of several character encodings
  sample/listcap.c   example of the capture history feature
  sample/posix.c     POSIX API example
  sample/sql.c       example of the variable meta character feature
                     (SQL-like pattern)

Test Programs
  sample/syntax.c    Perl, Java and ASIS syntax test
  sample/crnl.c      --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h        Oniguruma API header (public)
  onig-config.in     onig-config program template

  regenc.h           character encoding framework header
  regint.h           internal declarations
  regparse.h         internal declarations for regparse.c and regcomp.c
  regcomp.c          compile and optimization functions
  regenc.c           character encoding framework
  regerror.c         error message function
  regext.c           extended API functions
  regexec.c          search and match functions
  regparse.c         regular expression pattern parsing functions
  regsyntax.c        pattern syntax functions and built-in syntax definitions
  regtrav.c          capture history tree traversal functions
  regversion.c       version info function
  st.h               hash table function declarations
  st.c               hash table functions

  oniggnu.h          GNU regex API header (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header (public)
  regposerr.c        POSIX API error message function
  regposix.c         POSIX API functions

  enc/mktable.c      character type table generator program
  enc/ascii.c        ASCII encoding
  enc/euc_jp.c       EUC-JP encoding
  enc/euc_tw.c       EUC-TW encoding
  enc/euc_kr.c       EUC-KR, EUC-CN encoding
  enc/sjis.c         Shift_JIS encoding
  enc/big5.c         Big5 encoding
  enc/gb18030.c      GB18030 encoding
  enc/koi8.c         KOI8 encoding
  enc/koi8_r.c       KOI8-R encoding
  enc/cp1251.c       CP1251 encoding
  enc/iso8859_1.c    ISO-8859-1  (Latin-1)
  enc/iso8859_2.c    ISO-8859-2  (Latin-2)
  enc/iso8859_3.c    ISO-8859-3  (Latin-3)
  enc/iso8859_4.c    ISO-8859-4  (Latin-4)
  enc/iso8859_5.c    ISO-8859-5  (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6  (Arabic)
  enc/iso8859_7.c    ISO-8859-7  (Greek)
  enc/iso8859_8.c    ISO-8859-8  (Hebrew)
  enc/iso8859_9.c    ISO-8859-9  (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 (Thai)
  enc/iso8859_13.c   ISO-8859-13 (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding
  enc/utf16_be.c     UTF-16BE encoding
  enc/utf16_le.c     UTF-16LE encoding
  enc/utf32_be.c     UTF-32BE encoding
  enc/utf32_le.c     UTF-32LE encoding
  enc/unicode.c      Unicode information

  win32/Makefile     Makefile for Win32 (for VC++)
  win32/config.h     config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana
  ? add ONIG_OPTION_NOTBOS/NOTEOS (\A, \z, \Z)
  ?? \X (== \PM\pM*)
  ?? implement the syntax element ONIG_SYN_CONTEXT_INDEP_ANCHORS
  ?? operator that stops the search position from advancing
     (return ONIG_STOP from match_at())

and I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
