GB2312 Tutorials - Herong's Tutorial Examples - v4.04, by Herong Yang
GB2312, GBK and GB18030
GBK Character Set is an extension of GB2312. GB18030 is an extension of GBK.
After GB2312 was introduced in 1980, the Chinese government extended the character set twice. So today we have 3 Chinese character set standards: GB2312, GBK, and GB18030.
Here are more detailed descriptions of these standards:
1. What Is GB2312 Character Set? GB2312 Character Set is a set of 7,445 characters, including 6,763 commonly used Chinese characters, established by the government of China in 1980.
GB2312 Encoding uses the following codepoints:
1-byte codes: {0x00-0x7F}
    Same as ASCII codes
2-byte codes: {0xA1-0xF7}{0xA1-0xFE}
    Derived from GB2312 native codes by adding 0x80 to both bytes.
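The "add 0x80 to both bytes" rule can be sketched in a few lines of Java. This is a minimal illustration, not code from the tutorial: the class name Gb2312CodeSketch and its method names are hypothetical, and the location-code step assumes the usual convention that a native code is the row and cell numbers (1-94 each) plus 0x20.

```java
// Hypothetical helper class for illustration; not part of the original tutorial.
final class Gb2312CodeSketch {

    // Assumption: location code (row 1-94, cell 1-94) becomes a native code
    // by adding 0x20 to the row byte and to the cell byte.
    static int locationToNative(int row, int cell) {
        if (row < 1 || row > 94 || cell < 1 || cell > 94) {
            throw new IllegalArgumentException("row and cell must be in 1..94");
        }
        return ((row + 0x20) << 8) | (cell + 0x20);
    }

    // Native code becomes the 2-byte GB2312 encoded value by adding 0x80
    // to both bytes, which is the same as adding 0x8080 to the 16-bit value.
    static int nativeToEncoded(int nativeCode) {
        return nativeCode + 0x8080;
    }

    public static void main(String[] args) {
        int nativeCode = locationToNative(16, 1); // row 16, cell 1
        int encoded = nativeToEncoded(nativeCode);
        System.out.printf("native=0x%04X encoded=0x%04X%n", nativeCode, encoded);
        // prints: native=0x3021 encoded=0xB0A1
    }
}
```

Because native bytes never exceed 0x7E, adding 0x80 to the low byte cannot carry into the high byte, so a single 16-bit addition is enough.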
2. What Is GBK Character Set? GBK (国标扩展码) Character Set is an extension of GB2312 with 21,886 characters. GBK was established by the government of China in 1995 to cover most Chinese characters introduced in Unicode 1.0.1.
GBK Encoding uses the following codepoints:
1-byte codes: {0x00-0x7F}
    Same as ASCII codes
2-byte codes: {0x81-0xFE}{0x40-0x7E} and {0x81-0xFE}{0x80-0xFE}
    GB2312 codes plus new characters added
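To make the GBK byte ranges concrete, here is a small Java sketch that tests whether a byte pair falls inside the 2-byte code space listed above and counts how many code positions that space contains. The class GbkRangeSketch and its methods are hypothetical names chosen for this illustration. The check only covers the byte ranges; it does not tell whether a character is actually assigned at a given position.

```java
// Hypothetical helper class for illustration; not part of the original tutorial.
final class GbkRangeSketch {

    // Lead byte range of a GBK 2-byte code: 0x81-0xFE.
    static boolean isLeadByte(int b) {
        return b >= 0x81 && b <= 0xFE;
    }

    // Trail byte range of a GBK 2-byte code: 0x40-0x7E or 0x80-0xFE (0x7F is skipped).
    static boolean isTrailByte(int b) {
        return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFE);
    }

    // True when the pair lies in the GBK 2-byte code space (range check only).
    static boolean isTwoByteCode(int lead, int trail) {
        return isLeadByte(lead) && isTrailByte(trail);
    }

    // Counts every (lead, trail) pair accepted by the range check.
    static int countCodePositions() {
        int count = 0;
        for (int lead = 0x00; lead <= 0xFF; lead++) {
            for (int trail = 0x00; trail <= 0xFF; trail++) {
                if (isTwoByteCode(lead, trail)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isTwoByteCode(0x81, 0x40)); // prints: true
        System.out.println(isTwoByteCode(0x81, 0x7F)); // prints: false
        System.out.println(countCodePositions());      // prints: 23940
    }
}
```

The count follows from 126 lead bytes times 190 trail bytes (63 in the low range plus 127 in the high range).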
3. What Is GB18030 Character Set? GB18030 Character Set is an extension of GBK established by the government of China in 2000 and revised in 2005. GB18030 uses a 4-byte encoding to match the capacity of the surrogate character mechanism introduced in Unicode 2.0.
GB18030 Encoding uses the following codepoints:
1-byte codes: {0x00-0x7F}
    Same as ASCII codes
2-byte codes: {0x81-0xFE}{0x40-0x7E} and {0x81-0xFE}{0x80-0xFE}
    Same as GBK codes
4-byte codes: {0x81-0xFE}{0x30-0x39}{0x81-0xFE}{0x30-0x39}
    Maps linearly to the Unicode code points not already covered by 1-byte and 2-byte codes:
    GB+81308130 ... = U+0080 ... U+FFFF
    GB+90308130 ... = U+10000 ... U+10FFFF
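The linear mapping of the 4-byte codes can be sketched as plain arithmetic. The Java below is a minimal illustration under stated assumptions: the class Gb18030FourByteSketch and its method names are hypothetical, and only the supplementary range (U+10000 to U+10FFFF, starting at GB+90308130) is handled. The range that starts at GB+81308130 is not covered here, because it skips code points that already have 1-byte or 2-byte codes and so needs a lookup table in addition to the arithmetic.

```java
// Hypothetical helper class for illustration; not part of the original tutorial.
final class Gb18030FourByteSketch {

    // Position of a 4-byte code counted from GB+81308130. The four bytes act
    // like digits of a mixed-radix number with 126, 10, 126 and 10 values.
    static int linearIndex(int b1, int b2, int b3, int b4) {
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30);
    }

    // Linear index of GB+90308130, the code that maps to U+10000.
    static final int SUPPLEMENTARY_BASE = linearIndex(0x90, 0x30, 0x81, 0x30);

    // Supplementary code point (U+10000..U+10FFFF) -> four GB18030 byte values.
    static int[] encodeSupplementary(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int index = codePoint - 0x10000 + SUPPLEMENTARY_BASE;
        int b4 = 0x30 + index % 10;
        index /= 10;
        int b3 = 0x81 + index % 126;
        index /= 126;
        int b2 = 0x30 + index % 10;
        int b1 = 0x81 + index / 10;
        return new int[] {b1, b2, b3, b4};
    }

    // Four GB18030 byte values in the supplementary range -> code point.
    static int decodeSupplementary(int b1, int b2, int b3, int b4) {
        int codePoint = 0x10000 + linearIndex(b1, b2, b3, b4) - SUPPLEMENTARY_BASE;
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("outside the supplementary range");
        }
        return codePoint;
    }

    public static void main(String[] args) {
        int[] first = encodeSupplementary(0x10000);
        int[] last = encodeSupplementary(0x10FFFF);
        System.out.printf("%02X%02X%02X%02X%n", first[0], first[1], first[2], first[3]);
        // prints: 90308130
        System.out.printf("%02X%02X%02X%02X%n", last[0], last[1], last[2], last[3]);
        // prints: E3329A35
    }
}
```

In practice a real converter would rely on the platform's GB18030 charset support; the arithmetic here is only meant to show why the two ranges in the table line up with consecutive Unicode code points.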