Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Commonly Used Character Sets and Encodings

This section provides a list of commonly used character sets and their encodings.

The following table summarizes some commonly used character sets and encodings, with the number of bytes each encoding uses per character:

Character      Encoding       # of    Byte    Language
Set                           Bytes   Type  

ASCII          ASCII          1       7-bit   English
Latin1         ISO-8859-1     1       8-bit   Western European
GB2312-1980    GB             1-2     8-bit   Chinese
GB2312-1980    EUC-CN         1-2     8-bit   Chinese
GB2312-1980    HZ             1-2     7-bit   Chinese
GBK            GBK            1-2     8-bit   Chinese
GB18030-2000   GB18030-2000   1-4     8-bit   Chinese
Big5           Big5           1-2     8-bit   Chinese
CNS 11643-1992 EUC-TW         1-4     8-bit   Chinese
JIS            EUC-JP         1-3     8-bit   Japanese
JIS            ISO-2022-JP    1-2     7-bit   Japanese
JIS            Shift-JIS      1-2     8-bit   Japanese
KS             EUC-KR         1-2     8-bit   Korean
KS             ISO-2022-KR    1-2     7-bit   Korean
Unicode 3.0    UTF-7          1-3     7-bit   Multilingual
Unicode 3.0    UTF-8          1-3     8-bit   Multilingual
Unicode 3.0    UTF-16BE       2       8-bit   Multilingual
Unicode 3.0    UTF-16LE       2       8-bit   Multilingual
Unicode 3.1    UTF-8          1-4     8-bit   Multilingual
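
Some of the byte counts in the table can be checked with a short script. The following Python sketch encodes a few sample characters and reports how many bytes each encoding produces. The codec names are Python's standard-library spellings, which are an assumption of this example and do not always match the labels in the table. The helper name encoded_length is hypothetical. Stateful encodings such as HZ and ISO-2022-JP are left out because their escape sequences would inflate the count for a single character.

```python
# Minimal sketch: count the bytes one character occupies in a given encoding.
# Codec names follow Python's standard library, not the table's labels.

def encoded_length(char, encoding):
    """Return the number of bytes `char` occupies when encoded with `encoding`."""
    return len(char.encode(encoding))

# (character, codec) pairs chosen so that each character exists in the codec.
EXAMPLES = [
    ("A", "ascii"),             # basic Latin letter
    ("\u00e9", "latin-1"),      # e with acute accent
    ("\u4e2d", "gb2312"),       # common Chinese character
    ("\u4e2d", "gbk"),
    ("\u4e2d", "gb18030"),
    ("\u4e2d", "big5"),
    ("\u4e2d", "utf-8"),
    ("\u4e2d", "utf-16-be"),
    ("\u4e2d", "utf-16-le"),
    ("\U00020000", "utf-8"),    # character outside the Basic Multilingual Plane
    ("\U00020000", "gb18030"),
]

if __name__ == "__main__":
    for char, codec in EXAMPLES:
        count = encoded_length(char, codec)
        print(f"U+{ord(char):04X}  {codec:10}  {count} byte(s)")
```

Encoding one character at a time keeps the comparison simple, but it only shows the size of that particular character; the full range listed in the table covers every character the encoding supports.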

Dr. Herong Yang, updated in 2009