==========================================
 README for I18N Package
==========================================

o Name and location of package

Name: php-3.0.18-i18n-ja-2
Location: http://www.happysize.co.jp/techie/php-ja-jp/
          ftp://ftp.happysize.co.jp/php-ja-jp/
          http://php.vdomains.org/
          ftp://ftp.vdomains.org/pub/php-ja-jp/
          http://php.jpnnet.com/

Currently, this I18N version of PHP only adds Japanese support to base
PHP. It allows you to use Japanese in scripts and to convert between
various Japanese encodings. It works fine with plain ASCII when the
i18n option is enabled. (Note: the executable is a bit larger due to
the Unicode table.) The basic design approach is to allow other
languages to be added in the future. Developers are encouraged to join
us!

For more information on Japanese encodings, please refer to the
section "Additional Notes."


o What is this package?

This package allows you to handle multiple Japanese encodings (SJIS, EUC,
UTF-8, JIS) in PHP. If you find any bugs in this package, please report
them to the appropriate mailing list. For now, the PHP-jp mailing list
is the best place for this.

PHP-jp ML  mailto:PHP-jp@sidecar.ics.es.osaka-u.ac.jp
           http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
           (discussions are in Japanese)


o Who should use this

Due to the lack of documentation, this package is not intended for
beginners. If something goes wrong, be prepared to fix it on your own.


o Warranty and Copyright

There is no warranty with this package. Use it at your own risk.

Please refer to the source code for the copyrights. In general, each
program's copyright is owned by its programmer. Unless you obey the
copyright holders' restrictions, you are not allowed to use it in any
form.


o Redistribution

As described in the source code, this package and its components may
be redistributed with certain restrictions.

Because this package is still in beta, please redistribute it as an
entire package and not in the form of a patch. We would prefer to have
it distributed as one single package (not a patch of a patch of a
patch), so please avoid releasing patches to this package.


o Who made this

A team of volunteers, PHP3 Internationalization, has been contributing
their free time to produce it. Although we are not related to the core
PHP programmers, we hope to have our modifications merged into the
core distribution in the near future. That is why we did not call this
a "Japanese Patch" (or distribution). Our final goal is a truly
internationalized PHP!

If you are interested in this project, please drop us a line.

Contact Address:
  phpj-dev@kage.net
  (Discussions are in Japanese, but feel free to write to us in English)

Webpage (English and Japanese):
  http://php.jpnnet.com/

Project Outline (Japanese):
  http://www.happysize.co.jp/techie/php-ja-jp/spec.htm

Developers:
  Hironori Sato <satoh@jpnnet.com>
  Shigeru Kanemoto <sgk@happysize.co.jp>
  Tsukada Takuya <tsukada@fminn.nagano.nagano.jp>
  U. Kenkichi <kenkichi@axes.co.jp>
  Tateyama <tateyan@amy.hi-ho.ne.jp>
  Other gracious contributors


o Future plans

- fulfilling what's written in the outline
- support for languages other than Japanese
- make the character conversion a library (?)
- more testing


o Special Thanks to

PHP Japanese webpage maintainer, Hirokawa-san
  http://www.cityfujisawa.ne.jp/%7Elouis/apps/phpfi/
PHP-JP ML's Yamamoto-san
  http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
Previous jp-patch developers



==========================================
 Advantages of using I18N package
==========================================

- allows you to use various character encodings for script files and
  http output
- detects the character encoding of POST/GET/COOKIE data
- proper mail output using JIS as body and MIME/Base64/JIS subject
- if the http output's Content-Type is text/html, the proper charset
  is set
- stable character encoding conversion
- multibyte regex



==========================================
 Installation
==========================================

o Summary

Add the --enable-i18n option when running configure. Add any other
options appropriate for your own setup as well.

Don't forget to copy php3.ini-dist to the desired location.
(ex. /usr/local/lib/php3.ini)

If you have already installed PHP3, copy all the entries in php3.ini-dist
which start with "i18n.xxxx" to php3.ini.


o configure option
  --enable-i18n
      include i18n features

  --enable-mbregex
      include multibyte regex library
      (without i18n enabled, mbregex functions will not work)


o creating cgi version

  % tar xvzf php-3.0.18-i18n-ja-2.tar.gz
  % cd php-3.0.18-i18n-ja-2
  % ./configure --enable-i18n --enable-mbregex
  % make


o creating Apache version (regular module)

  % tar xvzf php-3.0.18-i18n-ja-2.tar.gz
  % tar xvzf apache_1.3.x.tar.gz
  % cd apache_1.3.x
  % ./configure
  % cd ../php-3.0.18-i18n-ja-2
  % ./configure --with-apache=../apache_1.3.x --enable-i18n --enable-mbregex
  % make
  % make install
  % cd ../apache_1.3.x
  % ./configure --activate-module=src/modules/php3/libphp3.a
  % make
  % make install


o creating Apache DSO version

  create DSO capable Apache first
  % tar xvzf apache_1.3.x.tar.gz
  % cd apache_1.3.x
  % ./configure --enable-shared=max
  % make
  % make install

  now create php3
  % cd php-3.0.18-i18n-ja-2
  % ./configure --with-apxs=/usr/local/apache/bin/apxs --enable-i18n \
                --enable-mbregex
  % make
  % make install


==========================================
 Additional Notes
==========================================

o Multibyte regex library

From beta4, we have included the multibyte (mb) regex library which comes with
Ruby. With this addition, you can now use regex in EUC, SJIS and UTF-8
encoding. To avoid any conflicts with HSREGEX included with Apache,
each function name has been changed. Therefore, mb regex functions are
named differently from the original ereg functions in PHP. The character
encoding used in mb regex is configured in i18n.internal_encoding.


o Binary Output

If the http output encoding is set to anything other than 'pass', output
is automatically converted from the internal encoding to the http output
encoding. Thus, if you send raw binary data, your data may be
corrupted. In that case, set http_output to 'pass'.

ex.
  <?
    i18n_http_output("pass");
    ...
    echo $the_binary_data_string;
  ?>


o Content-Type

Depending on the setting of http_output, PHP will output the proper charset.
ex. Content-Type: text/html; charset="..."

Be aware of the following:

- If you set the Content-Type header using the header() function, that
  will override the automatic addition of charset.
- Be cautious about when you call i18n_http_output: if any output has
  been made before the call, the header may already have been sent to
  the client.


o In the event of trouble

If you find any bugs or trouble, please contact us at the above address.
It may help us to track the problem if you send us the script as well.

If you encounter any memory related error such as a segmentation violation,
add --enable-debug when you run configure. This will give you more
detailed information on where the error occurred. The error is written
to the server log, or to regular http output in CGI mode.


o About Japanese encodings

For historical reasons, there are multiple character encodings used
for Japanese. The most common encodings are: SJIS, EUC, JIS, and UTF-8.
Here is a (very) brief description of each:

EUC
  commonly used in UNIX environment
  8bit-8bit combo
  always >=0x80

SJIS
  commonly used in Mac or PCs
  similar to EUC
  mostly 8bit-8bit (some 8bit-7bit)
  mostly >=0x80
  there are some halfwidth (size of ASCII) multibytes

JIS
  commonly used in 7bit environment (nntp and smtp)
  starts with an escape character, \033, and a few more characters

UTF-8
  8bit-based, variable-length encoding of Unicode
  Unicode covers the scripts of many of the world's languages
  see http://www.unicode.org/ for more detail

Because of all these character encodings, PHP needs to translate
between them on the fly. Also, the addition of the mb regex
library allows you to handle mb strings without fear of getting a mb
character chopped in half.

Since Japanese is not the only language with multiple encodings, we
encourage other developers to modify our code to suit their needs. We
definitely need people to work with Korean, Chinese (both traditional
and simplified), and Russian. Let us know if you are interested in
this project!



==========================================
 php3.ini setting
==========================================

The following init options will allow you to change the default settings.
Define these settings in the global section of php3.ini.

All keywords are case-insensitive.

o Encoding naming

  For each encoding, there are three names: standardized, alias, MIME

  - UTF-8
      standard: UTF-8
      alias: N/A
      mime: UTF-8

  - ASCII
      standard: ASCII
      alias: N/A
      mime: US-ASCII

  - Japanese EUC
      standard: EUC-JP
      alias: EUC, EUC_JP, eucJP, x-euc-jp
      mime: EUC-JP

  - Shift JIS
      standard: SJIS
      alias: x-sjis, MS_Kanji
      mime: Shift_JIS

  - JIS
      standard: JIS
      alias: N/A
      mime: ISO-2022-JP

  - Quoted-Printable
      standard: Quoted-Printable
      alias: qprint
      mime: N/A

  - BASE64
      standard: BASE64
      alias: N/A
      mime: N/A

  - no conversion
      standard: pass
      alias: none
      mime: N/A

  - auto encoding detection
      standard: auto
      alias: unknown
      mime: N/A

  * N/A - Not Applicable

o i18n.http_output - default http output encoding

  i18n.http_output = EUC-JP|SJIS|JIS|UTF-8|pass
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    pass: no conversion

  The default is pass (internal encoding is used).
  It can be re-configured on the fly using i18n_http_output().


o i18n.internal_encoding - internal encoding

  i18n.internal_encoding = EUC-JP|SJIS|UTF-8
    EUC-JP : EUC
    SJIS: SJIS
    UTF-8: UTF-8

  The default is EUC-JP.

  The PHP parser is designed around ISO-8859-1. To use any other
  encoding, the following conditions have to be satisfied:
    - per byte encoding
    - single byte characters are in the range 00h-7fh and compatible
      with ASCII
    - multibyte characters contain no bytes in 00h-7fh
  In the case of Japanese, EUC-JP and UTF-8 are the only encodings that
  meet these criteria.

  If i18n.internal_encoding and i18n.http_output differ, conversion
  takes place at the time of output. If you convert any data within
  PHP scripts to URL encoding, BASE64 or Quoted-Printable, the encoding
  stays as defined in i18n.internal_encoding. Thus, if you would
  prefer to encode in compliance with i18n.http_output, you need
  to convert the encoding manually.

  ex. $str = urlencode( i18n_convert($str, i18n_http_output()) );

  Encodings that use escape sequences, such as ISO-2022-** and HZ,
  cannot be used as the internal encoding. If used, they
  result in the following errors:
    - the parser reports strange errors
    - magic_quotes_*** breaks the encoding (SJIS may have a similar problem)
    - string manipulation and regex will malfunction


o i18n.script_encoding - script encoding

  i18n.script_encoding = auto|EUC-JP|SJIS|JIS|UTF-8
    auto: automatic
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8

  The default is auto.
  The script's encoding is converted to i18n.internal_encoding before
  entering the script parser.

  Be aware that auto detection may fail under some conditions.
  For best auto detection, add a multibyte character at the beginning
  of the script.


o i18n.http_input - handling of http input (GET/POST/COOKIE)

  i18n.http_input = pass|auto
    auto: auto conversion
    pass: no conversion

  The default is auto.
  If set to pass, no conversion will take place.
  If set to auto, it will automatically detect the encoding. If
  detection is successful, it will convert the input to the internal
  encoding. If not, it will assume the input is in the encoding defined
  in i18n.http_input_default.

o i18n.http_input_default - default http input encoding

  i18n.http_input_default = pass|EUC-JP|SJIS|JIS|UTF-8
    pass: no conversion
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8

  The default is pass.
  This option is only effective when i18n.http_input is set to
  auto. If the auto detection fails, the http input is assumed to be in
  this encoding and is converted from it to the internal encoding.
  If set to pass, no conversion will take place.
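
  The way i18n.http_input and i18n.http_input_default interact, as
  described above, can be sketched as a small decision function. This
  is only an illustration written with core PHP functions:
  resolve_input_encoding() is a hypothetical helper and not part of
  this package, and $detected stands in for the result of the
  package's auto detection (an empty string meaning detection failed).

```php
<?php
// Hypothetical helper (not part of the package). Returns the encoding
// that http input would be assumed to be in, or "pass" when no
// conversion would take place.
//   $http_input          value of i18n.http_input ("pass" or "auto")
//   $detected            auto detection result, "" if detection failed
//   $http_input_default  value of i18n.http_input_default
function resolve_input_encoding($http_input, $detected, $http_input_default) {
    // Keywords are case-insensitive.
    if (strtolower($http_input) == "pass") {
        return "pass";              // conversion is switched off
    }
    if ($detected != "") {
        return $detected;           // auto detection succeeded
    }
    // Detection failed: fall back to the default, which may be "pass".
    return $http_input_default;
}

echo resolve_input_encoding("auto", "SJIS", "EUC-JP") . "\n";  // SJIS
echo resolve_input_encoding("auto", "", "EUC-JP") . "\n";      // EUC-JP
echo resolve_input_encoding("auto", "", "pass") . "\n";        // pass
echo resolve_input_encoding("PASS", "SJIS", "EUC-JP") . "\n";  // pass
```

  With the default settings (http_input = auto, http_input_default =
  pass), input whose encoding cannot be detected is therefore left
  unconverted.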

o sample settings

  1) For the most flexibility, we recommend the following settings.

    i18n.http_output = SJIS
    i18n.internal_encoding = EUC-JP
    i18n.script_encoding = auto
    i18n.http_input = auto
    i18n.http_input_default = SJIS

  2) To avoid unexpected encoding problems, try these:

    i18n.http_output = pass
    i18n.internal_encoding = EUC-JP
    i18n.script_encoding = pass
    i18n.http_input = pass
    i18n.http_input_default = pass



==========================================
 PHP functions
==========================================

The following describes the additional PHP functions.

All keywords are case-insensitive.

o i18n_http_output(encoding)
o encoding = i18n_http_output()

  This will set the http output encoding. Any output following this
  call is converted to the given encoding. If no argument is given,
  the current http output encoding setting is returned.

  encodings
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    pass: no conversion

  NONE is not allowed


o encoding = i18n_internal_encoding()

  Returns the current internal encoding as a string.

  internal encoding
    EUC-JP : EUC
    SJIS: SJIS
    UTF-8: UTF-8


o encoding = i18n_http_input()

  Returns the http input encoding.

  encodings
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    pass: no conversion (only if i18n.http_input is set to pass)


o string = i18n_convert(string, encoding)
o string = i18n_convert(string, encoding, pre-conversion-encoding)

  Returns the string converted to the desired encoding. If
  pre-conversion-encoding is not given, the given
  string is assumed to be in the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    pass: no conversion

  pre-conversion-encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    pass: no conversion
    auto: auto detection


o encoding = i18n_discover_encoding(string)

  Encoding of the given string is returned (as a string).

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    ASCII: ASCII (only 09h, 0Ah, 0Dh, 20h-7Eh)
    pass: unable to determine (text is too short to determine)
    unknown: unknown or possible error


o int = mbstrlen(string)
o int = mbstrlen(string, encoding)

  Returns character length of a given string. If no encoding is defined,
  the encoding of string is assumed to be the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8
    auto: automatic


o int = mbstrpos(string1, string2)
o int = mbstrpos(string1, string2, start)
o int = mbstrpos(string1, string2, start, encoding)

  Same as strpos. If no encoding is defined, the encoding of string
  is assumed to be the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8


o int = mbstrrpos(string1, string2)
o int = mbstrrpos(string1, string2, encoding)

  Same as strrpos. If no encoding is defined, the encoding of string
  is assumed to be the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8


o string = mbsubstr(string, position)
o string = mbsubstr(string, position, length)
o string = mbsubstr(string, position, length, encoding)

  Same as substr. If no encoding is defined, the encoding of string
  is assumed to be the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8


o string = mbstrcut(string, position)
o string = mbstrcut(string, position, length)
o string = mbstrcut(string, position, length, encoding)

  Similar to substr, with position and length counted in bytes, but it
  never splits a mb character. If position is the 2nd byte of a mb
  character, it will cut from the first byte of that character. It will
  not chop a single byte off a mb character. In other words, if you
  set length to 5, you will only get two mb characters. If no encoding
  is defined, the encoding of string is assumed to be the internal encoding.

  encoding
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8


o string = i18n_mime_header_encode(string)
  MIME encodes the string in the format of =?ISO-2022-JP?B?[string]?=.


o string = i18n_mime_header_decode(string)
  MIME decodes the string.


o string = i18n_ja_jp_hantozen(string)
o string = i18n_ja_jp_hantozen(string, option)
o string = i18n_ja_jp_hantozen(string, option, encoding)

  Conversion between fullwidth characters and halfwidth characters.

  option
    The following options are allowed. The default is "KV".
    Acronym: FW = fullwidth, HW = halfwidth

    "r" : FW alphabet -> HW alphabet

    "R" : HW alphabet -> FW alphabet

    "n" : FW number -> HW number

    "N" : HW number -> FW number

    "a" : FW alpha numeric (21h-7Eh) -> HW alpha numeric

    "A" : HW alpha numeric (21h-7Eh) -> FW alpha numeric

    "k" : FW katakana -> HW katakana

    "K" : HW katakana -> FW katakana

    "h" : FW hiragana -> HW katakana

    "H" : HW katakana -> FW hiragana

    "c" : FW katakana -> FW hiragana

    "C" : FW hiragana -> FW katakana

    "V" : merge dakuon (voiced sound mark) characters. Only works
          together with the "K" and "H" options

  encoding
    If no encoding is defined, the encoding of string is assumed to be
    the internal encoding.
    EUC-JP : EUC
    SJIS: SJIS
    JIS : JIS
    UTF-8: UTF-8


o int = mbereg(regex_pattern, string, string)
o int = mberegi(regex_pattern, string, string)
  mb version of ereg() and eregi()


o string = mbereg_replace(regex_pattern, string, string)
o string = mberegi_replace(regex_pattern, string, string)
  mb version of ereg_replace() and eregi_replace()


o string_array = mbsplit(regex, string, limit)
  mb version of split()



==========================================
 FAQ
==========================================

Here, we have gathered some commonly asked questions from the PHP-jp
mailing list.

o To use Japanese in GET method

If you need to pass Japanese text as an argument in a GET request, such as
xxxx.php?data=<Japanese text>, use the urlencode function in PHP. If not,
the text may not be passed on to the target php script properly.

ex: <a href="hoge.php?data=<? echo urlencode($data) ?>">Link</a>


o When passing data via GET/POST/COOKIE, \ character sneaks in

When using SJIS as the internal encoding, or when the passed-on data
includes ' " or \, PHP automatically inserts the escape character, \.
Set magic_quotes_gpc in php3.ini from On to Off. An alternative
workaround for this problem is to use StripSlashes().

If $quote_str is in SJIS and you would like to recover the Japanese text,
use ereg_replace as follows:

ereg_replace(sprintf("([%c-%c%c-%c]\\\\)\\\\",0x81,0x9f,0xe0,0xfc),
             "\\1",$quote_str);

This removes the extra \ inserted after a SJIS character whose second
byte is \ (5Ch), which restores the Japanese text in $quote_str.


o Sometimes, encoding detection fails

If i18n_http_input() returns 'pass', it's likely that PHP failed to
detect whether the input is SJIS or EUC. In that case, add <input
type=hidden value="some Japanese text"> to the form so that the incoming
text's encoding can be detected properly.
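
Detection can fail because some byte strings are valid in both SJIS
and EUC. The sketch below illustrates this using only core PHP
functions. could_be_eucjp() and could_be_sjis() are hypothetical
helpers, not functions of this package, and they only check simplified
byte ranges, so they are far less thorough than real detection.

```php
<?php
// Simplified validity checks, for illustration only. They look at byte
// ranges and ignore details such as 3-byte EUC-JP (8Fh) sequences.

// Returns 1 if $s could be EUC-JP, 0 otherwise.
function could_be_eucjp($s) {
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        $c = ord(substr($s, $i, 1));
        if ($c < 0x80) {
            continue;                   // ASCII
        }
        if ($i + 1 >= $n) {
            return 0;                   // truncated character
        }
        $d = ord(substr($s, $i + 1, 1));
        if ($c == 0x8e && $d >= 0xa1 && $d <= 0xdf) {
            $i++;                       // halfwidth katakana (8Eh + byte)
        } elseif ($c >= 0xa1 && $c <= 0xfe && $d >= 0xa1 && $d <= 0xfe) {
            $i++;                       // 2-byte character
        } else {
            return 0;
        }
    }
    return 1;
}

// Returns 1 if $s could be SJIS, 0 otherwise.
function could_be_sjis($s) {
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        $c = ord(substr($s, $i, 1));
        if ($c < 0x80 || ($c >= 0xa1 && $c <= 0xdf)) {
            continue;                   // ASCII or 1-byte halfwidth katakana
        }
        if (!(($c >= 0x81 && $c <= 0x9f) || ($c >= 0xe0 && $c <= 0xfc))) {
            return 0;                   // not a valid lead byte
        }
        if ($i + 1 >= $n) {
            return 0;                   // truncated character
        }
        $d = ord(substr($s, $i + 1, 1));
        if (!(($d >= 0x40 && $d <= 0x7e) || ($d >= 0x80 && $d <= 0xfc))) {
            return 0;                   // not a valid trail byte
        }
        $i++;
    }
    return 1;
}

// B1h B2h: two halfwidth katakana in SJIS, one 2-byte character in EUC.
$ambiguous = chr(0xb1) . chr(0xb2);
// 93h FAh 96h 7Bh 8Ch EAh: valid SJIS, but 93h cannot start an EUC character.
$sjis_only = chr(0x93) . chr(0xfa) . chr(0x96) . chr(0x7b) . chr(0x8c) . chr(0xea);

echo could_be_eucjp($ambiguous) . could_be_sjis($ambiguous) . "\n";  // 11
echo could_be_eucjp($sjis_only) . could_be_sjis($sjis_only) . "\n";  // 01
```

A hidden field with known Japanese text helps because, like the second
string above, longer Japanese text usually contains bytes that are
valid in only one of the two encodings.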



==========================================
 Japanese Manual
==========================================
Translated manual by the "PHP Japanese Manual Project":

http://www.php.net/manual/ja/manual.php

Starting with 3.0.18-i18n-ja, we have removed doc-jp from the tarball package.


==========================================
 Change Logs
==========================================

o 2000-10-28, Rui Hirokawa <hirokawa@php.net>

This patch is derived from php-3.0.15-i18n-ja as well as php-3.0.16 by
Kuwamura, applied to the original php-3.0.18. It also includes the
following fixes:

1) allows you to set charset in mail().
2) fixed mbregex definitions to avoid conflicts with system regex
3) php3.ini-dist now uses PASS for http_output instead of SJIS

o 2000-11-24, Hironori Sato <satoh@yyplanet.com>

Applied the above patch and added detection for gdImageStringTTF in configure.
The following setups are known to work:

gd-1.3-6, gd-devel-1.3-6, freetype-1.3.1-5, freetype-devel-1.3.1-5
  ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf",
               i18n_convert("日本語", "UTF-8"));
  ImageGif($im);

gd-1.7.3-1k1, gd-devel-1.7.3-1k1, freetype-1.3.1-5, freetype-devel-1.3.1-5
  ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf","日本語");
  ImagePng($im);
  * i18n_internal_encoding = EUC or SJIS

For any gd library before 1.6.2, you need to use i18n_convert. For
gd-1.5.2/3, upgrade to anything above 1.7 to use ImageTTFText without
using i18n_convert. As long as you have internal_encoding set to EUC or
SJIS, ImageTTFText should work without mojibake. Again, make sure you
call i18n_http_output("pass") before calling ImageGif, ImagePng, or ImageJpeg!

o 2000-12-09, Rui Hirokawa <hirokawa@php.net>

Fixed mail(), which was causing a segmentation fault when the header was null.