README  2016/05/06

Oniguruma  ----   (C) K.Kosako <kkosako0@gmail.com>

https://github.com/kkos/oniguruma

Oniguruma is a regular expressions library.
Its distinguishing feature is that a different character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (only needed if the configure script is not found)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Windows 64/32bit platform (Visual Studio)

   Run make_win64 or make_win32.

     src/onig_s.lib:  static link library
     src/onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
     1. cd src
     2. copy ..\windows\testc.c .
     3. nmake -f Makefile.windows ctest

   (Tested with Visual Studio Community 2015)



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for the Oniguruma API.

  If you want to disable the UChar type (== unsigned char) definition
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION and then
  include oniguruma.h.

  If you want to disable the regex_t type definition in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION and then include oniguruma.h.

  Example of a compile/link command line on Unix or Cygwin
  (assuming prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs

  sample/simple.c          minimal example (Oniguruma API)
  sample/names.c           example of the named group callback.
  sample/encode.c          example of some encodings.
  sample/listcap.c         example of the capture history.
  sample/posix.c           POSIX API sample.
  sample/sql.c             example of the variable meta characters.
                           (SQL-like pattern matching)
  sample/user_property.c   example of a user-defined Unicode property.

Test Programs
  sample/syntax.c          Perl, Java and ASIS syntax test.
  sample/crnl.c            --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h       Oniguruma API header file. (public)
  onig-config.in    configuration check program template.

  regenc.h          character encodings framework header file.
  regint.h          internal definitions
  regparse.h        internal definitions for regparse.c and regcomp.c
  regcomp.c         compiling and optimization functions
  regenc.c          character encodings framework.
  regerror.c        error message function
  regext.c          extended API functions. (deluxe version API)
  regexec.c         search and match functions
  regparse.c        parsing functions.
  regsyntax.c       pattern syntax functions and built-in syntax definitions.
  regtrav.c         capture history tree data traverse functions.
  regversion.c      version info function.
  st.h              hash table functions header file
  st.c              hash table functions

  oniggnu.h         GNU regex API header file. (public)
  reggnu.c          GNU regex API functions

  onigposix.h       POSIX API header file. (public)
  regposerr.c       POSIX error message function.
  regposix.c        POSIX API functions.

  mktable.c         character type table generator.
  ascii.c           ASCII encoding.
  euc_jp.c          EUC-JP encoding.
  euc_tw.c          EUC-TW encoding.
  euc_kr.c          EUC-KR, EUC-CN encoding.
  sjis.c            Shift_JIS encoding.
  big5.c            Big5 encoding.
  gb18030.c         GB18030 encoding.
  koi8.c            KOI8 encoding.
  koi8_r.c          KOI8-R encoding.
  cp1251.c          CP1251 encoding.
  iso8859_1.c       ISO-8859-1 encoding. (Latin-1)
  iso8859_2.c       ISO-8859-2 encoding. (Latin-2)
  iso8859_3.c       ISO-8859-3 encoding. (Latin-3)
  iso8859_4.c       ISO-8859-4 encoding. (Latin-4)
  iso8859_5.c       ISO-8859-5 encoding. (Cyrillic)
  iso8859_6.c       ISO-8859-6 encoding. (Arabic)
  iso8859_7.c       ISO-8859-7 encoding. (Greek)
  iso8859_8.c       ISO-8859-8 encoding. (Hebrew)
  iso8859_9.c       ISO-8859-9 encoding. (Latin-5 or Turkish)
  iso8859_10.c      ISO-8859-10 encoding. (Latin-6 or Nordic)
  iso8859_11.c      ISO-8859-11 encoding. (Thai)
  iso8859_13.c      ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  iso8859_14.c      ISO-8859-14 encoding. (Latin-8 or Celtic)
  iso8859_15.c      ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  iso8859_16.c      ISO-8859-16 encoding.
                    (Latin-10 or South-Eastern European with Euro)
  utf8.c            UTF-8 encoding.
  utf16_be.c        UTF-16BE encoding.
  utf16_le.c        UTF-16LE encoding.
  utf32_be.c        UTF-32BE encoding.
  utf32_le.c        UTF-32LE encoding.
  unicode.c         common code for the Unicode encodings.

  win32/Makefile    Makefile for Win32 (VC++)
  win32/config.h    config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana.
  ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
 ?? \X (== \PM\pM*)
 ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
 ?? transmission stopper. (return ONIG_STOP from match_at())

I am also thankful to Akinori MUSHA.


Mail Address: K.Kosako <kkosako0@gmail.com>