README
README  2007/06/18

Oniguruma  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/
http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/oniguruma/

Oniguruma is a regular expression library.
Its distinguishing feature is that a different character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB 18030, KOI8-R, KOI8,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB 18030: contributed by KUBO Takehiro
* KOI8 is not included in the library archive by the default setup.
  (Edit the Makefile if you want to use it.)
------------------------------------------------------------

Install

 Case 1: Unix and Cygwin platform

   1. ./configure
   2. make
   3. make install

   * uninstall

     make uninstall

   * test (ASCII/EUC-JP)

     make atest

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 platform (VC++)

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest



License

  When this software is used with, or distributed with, Ruby,
  it follows the Ruby license.
  In all other cases, it follows the BSD license.



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program to use the Oniguruma API.
  See doc/API for the API reference.

  To suppress the definition of the UChar type (== unsigned char)
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before
  including oniguruma.h.

  To suppress the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.

  Example compile/link command line on Unix or Cygwin
  (when prefix is /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To link against the static library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback.
  sample/encode.c    example of some encodings.
  sample/listcap.c   example of the capture history.
  sample/posix.c     POSIX API sample.
  sample/sql.c       example of the variable meta characters.
                     (SQL-like pattern matching)
  sample/syntax.c    Perl, Java and ASIS syntax test.


Source Files

  oniguruma.h        Oniguruma API header file. (public)
  onig-config.in     configuration check program template.

  regenc.h           character encodings framework header file.
  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework.
  regerror.c         error message function
  regext.c           extended API functions. (deluxe version API)
  regexec.c          search and match functions
  regparse.c         parsing functions.
  regsyntax.c        pattern syntax functions and built-in syntax definitions.
  regtrav.c          capture history tree data traverse functions.
  regversion.c       version info function.
  st.h               hash table functions header file
  st.c               hash table functions

  oniggnu.h          GNU regex API header file. (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file. (public)
  regposerr.c        POSIX error message function.
  regposix.c         POSIX API functions.

  enc/mktable.c      character type table generator.
  enc/ascii.c        ASCII encoding.
  enc/euc_jp.c       EUC-JP encoding.
  enc/euc_tw.c       EUC-TW encoding.
  enc/euc_kr.c       EUC-KR, EUC-CN encoding.
  enc/sjis.c         Shift_JIS encoding.
  enc/big5.c         Big5 encoding.
  enc/gb18030.c      GB 18030 encoding (contributed by KUBO Takehiro)
  enc/koi8.c         KOI8 encoding.
  enc/koi8_r.c       KOI8-R encoding.
  enc/iso8859_1.c    ISO-8859-1 encoding. (Latin-1)
  enc/iso8859_2.c    ISO-8859-2 encoding. (Latin-2)
  enc/iso8859_3.c    ISO-8859-3 encoding. (Latin-3)
  enc/iso8859_4.c    ISO-8859-4 encoding. (Latin-4)
  enc/iso8859_5.c    ISO-8859-5 encoding. (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6 encoding. (Arabic)
  enc/iso8859_7.c    ISO-8859-7 encoding. (Greek)
  enc/iso8859_8.c    ISO-8859-8 encoding. (Hebrew)
  enc/iso8859_9.c    ISO-8859-9 encoding. (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 encoding. (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 encoding. (Thai)
  enc/iso8859_13.c   ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 encoding. (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16 encoding.
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding.
  enc/utf16_be.c     UTF-16BE encoding.
  enc/utf16_le.c     UTF-16LE encoding.
  enc/utf32_be.c     UTF-32BE encoding.
  enc/utf32_le.c     UTF-32LE encoding.
  enc/unicode.c      Unicode information data.

  win32/Makefile     Makefile for Win32 (VC++)
  win32/config.h     config.h for Win32



API differences from the Japanized GNU regex (version 0.12) of Ruby 1.8/1.6

   + re_compile_fastmap() is removed.
   + re_alloc_pattern() is added.



I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

README.ja
README.ja  2007/06/18

Oniguruma (鬼車)  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/
http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/oniguruma/

Oniguruma is a regular expression library.
Its distinguishing feature is that a character encoding can be
specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB 18030, KOI8-R, KOI8,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB 18030: contributed by KUBO Takehiro
* KOI8 is not included in the library by the default setup.
  (Edit the Makefile if you need it.)
------------------------------------------------------------

Install

 Case 1: Unix and Cygwin environment

   1. ./configure
   2. make
   3. make install

   * uninstall

     make uninstall

   * test (ASCII/EUC-JP)

     make atest


   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 (VC++) environment

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest


License

  When this software is used or distributed together with Ruby,
  it follows the Ruby license.
  In all other cases, it follows the BSD license.


Regular Expressions

  See doc/RE.ja.


Usage

  Include oniguruma.h in the program that uses the library
  (when using the Oniguruma API).
  See doc/API.ja for the Oniguruma API.

  To suppress the type name UChar (== unsigned char) defined in
  oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before including
  oniguruma.h. In that case UChar is not defined, and only the
  definition under the name OnigUChar is available.

  To suppress the type name regex_t defined in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.
  In that case regex_t is not defined, and only the definitions under
  the names OnigRegexType and OnigRegex are available.

  Example of compiling and linking on Unix/Cygwin
  (when prefix is /usr/local):
    cc sample.c -L/usr/local/lib -lonig

  Because GNU libtool is used, shared libraries can be used if the
  platform supports them.
  How to choose between the static and the shared library, and how to
  set up the run-time environment, is left for you to look up.


  To link against the static library (onig_s.lib) on Win32,
  add -DONIG_EXTERN=extern to the compiler arguments when compiling.


Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback
  sample/encode.c    example of several character encodings
  sample/listcap.c   example of the capture history feature
  sample/posix.c     POSIX API example
  sample/sql.c       example of the variable meta characters (SQL-like pattern)
  sample/syntax.c    test of the Perl, Java and ASIS syntaxes


Source Files

  oniguruma.h        Oniguruma API header (public)
  onig-config.in     template for the onig-config program

  regenc.h           character encodings framework header
  regint.h           internal declarations
  regparse.h         internal declarations for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework
  regerror.c         error message function
  regext.c           extended API functions
  regexec.c          search and match functions
  regparse.c         regular expression pattern parsing functions
  regsyntax.c        pattern syntax functions and built-in syntax definitions
  regtrav.c          capture history tree traversal functions
  regversion.c       version info function
  st.h               hash table function declarations
  st.c               hash table functions

  oniggnu.h          GNU regex API header (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header (public)
  regposerr.c        POSIX API error message function
  regposix.c         POSIX API functions

  enc/mktable.c      character type table generator program
  enc/ascii.c        ASCII encoding
  enc/euc_jp.c       EUC-JP encoding
  enc/euc_tw.c       EUC-TW encoding
  enc/euc_kr.c       EUC-KR, EUC-CN encoding
  enc/sjis.c         Shift_JIS encoding
  enc/big5.c         Big5 encoding
  enc/gb18030.c      GB 18030 encoding (contributed by KUBO Takehiro)
  enc/koi8.c         KOI8 encoding
  enc/koi8_r.c       KOI8-R encoding
  enc/iso8859_1.c    ISO-8859-1  (Latin-1)
  enc/iso8859_2.c    ISO-8859-2  (Latin-2)
  enc/iso8859_3.c    ISO-8859-3  (Latin-3)
  enc/iso8859_4.c    ISO-8859-4  (Latin-4)
  enc/iso8859_5.c    ISO-8859-5  (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6  (Arabic)
  enc/iso8859_7.c    ISO-8859-7  (Greek)
  enc/iso8859_8.c    ISO-8859-8  (Hebrew)
  enc/iso8859_9.c    ISO-8859-9  (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 (Thai)
  enc/iso8859_13.c   ISO-8859-13 (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding
  enc/utf16_be.c     UTF-16BE encoding
  enc/utf16_le.c     UTF-16LE encoding
  enc/utf32_be.c     UTF-32BE encoding
  enc/utf32_le.c     UTF-32LE encoding
  enc/unicode.c      Unicode information

  win32/Makefile     Makefile for Win32 (for VC++)
  win32/config.h     config.h for Win32



API differences from the Japanized GNU regex of Ruby 1.8/1.6

   + re_compile_fastmap() is removed.
   + re_alloc_pattern() is added.


I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>