Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Commonly Used Character Sets and Encodings
This section provides a list of commonly used character sets and their encodings.
The following table summarizes some commonly used character sets and encodings:
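Character Set   Encoding      # of Bytes  Byte Type  Language
--------------  ------------  ----------  ---------  ---------------
ASCII           ASCII         1           7-bit      English
Latin1          ISO-8859-1    1           8-bit      Latin languages
GB2312-1980     GB            1-2         8-bit      Chinese
GB2312-1980     EUC-CN        1-2         8-bit      Chinese
GB2312-1980     HZ            1-2         7-bit      Chinese
GBK             GBK           1-2         8-bit      Chinese
GB18030-2000    GB18030-2000  1-4         8-bit      Chinese
Big5            Big5          1-2         8-bit      Chinese
CNS 11643-1992  EUC-TW        1-4         8-bit      Chinese
JIS             EUC-JP        1-2         8-bit      Japanese
JIS             ISO-2022-JP   1-2         7-bit      Japanese
JIS             Shift-JIS     1-2         8-bit      Japanese
KS              EUC-KR        1-2         8-bit      Korean
KS              ISO-2022-KR   1-2         7-bit      Korean
Unicode 3.0     UTF-7         1-3         7-bit      Multilingual
Unicode 3.0     UTF-8         1-3         8-bit      Multilingual
Unicode 3.0     UTF-16BE      2           8-bit      Multilingual
Unicode 3.0     UTF-16LE      2           8-bit      Multilingual
Unicode 3.1     UTF-8         1-4         8-bit      Multilingual

In this table, "# of Bytes" is the number of bytes used to encode a single character, not counting the escape or shift sequences used by stateful encodings such as HZ, ISO-2022-JP, ISO-2022-KR and UTF-7. "Byte Type" tells whether the encoded bytes are limited to 7 bits (0x00-0x7F) or may use all 8 bits (0x00-0xFF).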
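To see the byte counts and byte types from the table in practice, here is a minimal sketch assuming Python 3.9 or later and its built-in codecs. The codec names (gb2312, hz, euc_jp, iso2022_kr, and so on) are Python's own names, not necessarily the names used in the table; Python's gb2312 codec produces the EUC-CN form, and Python has no codec for the raw "GB" form or for EUC-TW, so those rows are skipped. The helper names byte_type() and describe() and the sample characters are hypothetical choices made for illustration. Keep in mind that one sample can only show "8-bit" if one of its bytes is 0x80 or above, and that the stateful encodings (HZ, ISO-2022-JP, ISO-2022-KR, UTF-7) add escape or shift sequences, so their totals are larger than the per-character counts in the table.

```python
# Sketch: encode sample characters with Python's built-in codecs and report
# how many bytes were produced and whether any byte uses the 8th (high) bit.
# Helper names and sample characters are illustrative, not from the table.

def byte_type(data: bytes) -> str:
    """Return '7-bit' if every byte is in 0x00-0x7F, otherwise '8-bit'."""
    return "7-bit" if all(b < 0x80 for b in data) else "8-bit"


def describe(text: str, codec: str) -> tuple[int, str]:
    """Encode text with a Python codec; return (byte count, byte type)."""
    data = text.encode(codec)
    return len(data), byte_type(data)


# (encoding name in the table, Python codec name, sample character)
SAMPLES = [
    ("ASCII",       "ascii",      "A"),
    ("ISO-8859-1",  "latin-1",    "\u00e9"),      # e with acute accent
    ("EUC-CN",      "gb2312",     "\u4e2d"),      # CJK ideograph "middle"
    ("HZ",          "hz",         "\u4e2d"),      # count includes ~{ and ~}
    ("GBK",         "gbk",        "\u4e2d"),
    ("GB18030",     "gb18030",    "\U0001F600"),  # outside the BMP
    ("Big5",        "big5",       "\u4e2d"),
    ("EUC-JP",      "euc_jp",     "\u3042"),      # Hiragana letter A
    ("ISO-2022-JP", "iso2022_jp", "\u3042"),      # count includes escapes
    ("Shift-JIS",   "shift_jis",  "\u3042"),
    ("EUC-KR",      "euc_kr",     "\ud55c"),      # Hangul syllable HAN
    ("ISO-2022-KR", "iso2022_kr", "\ud55c"),      # count includes SO/SI
    ("UTF-7",       "utf-7",      "\u4e2d"),      # count includes + and -
    ("UTF-8",       "utf-8",      "\U0001F600"),
    ("UTF-16BE",    "utf-16-be",  "\u00e9"),
    ("UTF-16LE",    "utf-16-le",  "\u00e9"),
]

if __name__ == "__main__":
    for label, codec, ch in SAMPLES:
        count, kind = describe(ch, codec)
        print(f"{label:12} U+{ord(ch):04X} -> {count} byte(s), {kind}")
```

For example, describe("\u4e2d", "utf-8") returns (3, '8-bit'), while UTF-16BE encodes the same character as the two bytes 0x4E 0x2D, which happen to fall below 0x80. That is why the sketch uses "\u00e9" as the UTF-16 sample.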