Chinese Web Sites Using PHP - v2.23, by Herong Yang
Generate 8-Bit Encoding Tables
This section provides a tutorial example on how to generate 8-bit encoding tables with a PHP script.
Since 8-bit encodings play an important role in generating corrupted Chinese text, I decided to write a PHP script that generates the encoding table for a given encoding name.
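<?php
#- 8-Bit-Encoding-Table.php
#- Copyright (c) 2005 HerongYang.com. All Rights Reserved.

   $encoding = $argv[1];

   # Header row: one column label per low hex digit
   $table = "   0123456789abcdef\n";
   for ($i=0; $i<16; $i++) {
      $line = "";
      for ($j=0; $j<16; $j++) {
         $code = dechex($i).dechex($j);
         # Rows 0x and 1x hold control characters; use 0x00 bytes instead
         if ($i==0 || $i==1) $code = "00";
         $line .= $code;
      }
      # Row label (high hex digit) followed by the 16 raw bytes
      $table .= dechex($i)."x ".hex2bin($line)."\n";
   }

   # Convert the raw byte table to UTF-8 for display
   $encoded = iconv($encoding, "UTF-8//IGNORE", $table);
   print($encoded);
?>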
This script, 8-Bit-Encoding-Table.php, uses a nested loop to build an 8-bit byte table. The first 2 rows (0x and 1x) are filled with 0x00 bytes instead of their real byte values, to avoid printing control characters. The iconv() function is used to generate the final encoding table, using "UTF-8" as the presentation encoding for my macOS computer.
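To see what the script does to a single table cell, here is a minimal sketch that decodes one byte value with the same hex2bin() and iconv() calls. The helper name decode_byte() is my own illustration and is not part of 8-Bit-Encoding-Table.php; it assumes the iconv extension is enabled and that the given name is a single-byte encoding.

```php
<?php
// Hypothetical helper (not in the original script): decode one byte value
// (0-255) from a single-byte encoding into a UTF-8 string.
function decode_byte(int $byte, string $encoding): string {
    // Two-digit hex form of the byte, e.g. 65 -> "41"
    $hex = str_pad(dechex($byte), 2, "0", STR_PAD_LEFT);
    // hex2bin() turns the hex digits into one raw byte
    $raw = hex2bin($hex);
    // Same conversion call as the table script; false means it failed
    $utf8 = iconv($encoding, "UTF-8//IGNORE", $raw);
    return $utf8 === false ? "" : $utf8;
}

print(decode_byte(0x41, "ISO-8859-1") . "\n");          // prints "A"
print(bin2hex(decode_byte(0xE9, "ISO-8859-1")) . "\n"); // prints "c3a9"
```

The second call shows that byte 0xE9 in ISO-8859-1 becomes the two-byte UTF-8 sequence C3 A9, which is what the terminal receives for that cell of the table.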
8-Bit-Encoding-Table.php can produce the output table for any 8-bit encoding supported by the "iconv" command. The supported encoding names are listed by "iconv -l", as shown below:
herong$ iconv -l | more
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ...
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
C99
CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 ...
ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ...
ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
CP1250 MS-EE WINDOWS-1250
CP1251 MS-CYRL WINDOWS-1251
CP1252 MS-ANSI WINDOWS-1252
CP1253 MS-GREEK WINDOWS-1253
CP1254 MS-TURK WINDOWS-1254
CP1255 MS-HEBR WINDOWS-1255
CP1256 MS-ARAB WINDOWS-1256
CP1257 WINBALTRIM WINDOWS-1257
CP1258 WINDOWS-1258
850 CP850 IBM850 CSPC850MULTILINGUAL
862 CP862 IBM862 CSPC862LATINHEBREW
866 CP866 IBM866 CSIBM866
...
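The list printed by the "iconv" command and the list accepted by PHP's iconv() function may differ from one system to another, so it can help to test a name from PHP before building a table with it. The sketch below is only an illustration: is_iconv_encoding() is a hypothetical helper, and probing with the single byte "A" is an assumption that only makes sense for single-byte encodings like the ones discussed in this section.

```php
<?php
// Hypothetical helper: report whether PHP's iconv() accepts an encoding
// name as a source encoding. Intended for single-byte encodings only.
function is_iconv_encoding(string $name): bool {
    // iconv() returns false and raises a notice or warning for unknown
    // names; "@" suppresses that message so only the result is checked.
    return @iconv($name, "UTF-8", "A") !== false;
}

foreach (["ISO-8859-1", "CP437", "NO-SUCH-ENCODING"] as $name) {
    $status = is_iconv_encoding($name) ? "supported" : "not supported";
    print($name . ": " . $status . "\n");
}
```

The result for a name like CP437 depends on the iconv library that PHP was built against, so the output is not the same on every computer.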
The picture below shows the encoding table for Extended ASCII or CP437 (IBM437) by running "php 8-Bit-Encoding-Table.php CP437":
The picture below shows the encoding table for ISO-8859-1 or Latin-1 by running "php 8-Bit-Encoding-Table.php ISO-8859-1":
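The two tables differ mostly in the upper half (rows 8x to fx). As a small sketch of that difference, the code below decodes the same byte, 0xE9, with both encodings. The function name show_byte() is my own and hypothetical, and the code assumes that the local iconv library supports both CP437 and ISO-8859-1.

```php
<?php
// Hypothetical helper: describe how one raw byte is decoded by an encoding.
function show_byte(string $byte, string $encoding): string {
    $utf8 = @iconv($encoding, "UTF-8", $byte);
    if ($utf8 === false) {
        return $encoding . ": conversion failed";
    }
    return $encoding . ": 0x" . bin2hex($byte) . " -> " . $utf8
        . " (UTF-8 bytes " . bin2hex($utf8) . ")";
}

// The same byte is a different character in each encoding
foreach (["CP437", "ISO-8859-1"] as $encoding) {
    print(show_byte("\xE9", $encoding) . "\n");
}
```

In CP437 the byte 0xE9 is the Greek letter Θ, while in ISO-8859-1 it is the accented letter é. Reading a byte with the wrong 8-bit encoding in this way is the same kind of mismatch that produces corrupted Chinese text.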