GB2312, GBK and GB18030

GBK Character Set is an extension of GB2312. GB18030 is an extension of GBK.

After GB2312 was introduced in 1980, the Chinese government extended the character set twice. So today we have 3 Chinese character set standards: GB2312, GBK and GB18030.

Here are more detailed descriptions of these standards:

1. What Is GB2312 Character Set? GB2312 Character Set is a set of 7,445 characters, including 6,763 commonly used Chinese characters, established by the government of China in 1980.

GB2312 Encoding uses the following codepoints:

1-byte codes: {0x00-0x7F}
   Same as ASCII codes

2-byte codes: {0xA1-0xF7}{0xA1-0xFE}
   Derived from GB2312 Native Codes {0x21-0x77}{0x21-0x7E} by adding 0x80 to both bytes.
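
The "add 0x80 to both bytes" rule can be sketched in Java as shown below. This is only an illustration under stated assumptions: the class name Gb2312Native and the method toEncoding are hypothetical, not part of any library, and the check assumes each native-code byte lies in 0x21-0x7E.

```java
public class Gb2312Native {
    // Hypothetical helper: converts a 2-byte GB2312 native code
    // (for example 0x3021) to its GB2312 encoding by adding 0x80
    // to both bytes, which sets the high bit of each byte.
    public static int toEncoding(int nativeCode) {
        int hi = (nativeCode >> 8) & 0xFF;
        int lo = nativeCode & 0xFF;
        // Assumption: native-code bytes are limited to 0x21-0x7E.
        if (nativeCode < 0 || nativeCode > 0xFFFF
                || hi < 0x21 || hi > 0x7E || lo < 0x21 || lo > 0x7E) {
            throw new IllegalArgumentException("not a GB2312 native code");
        }
        return ((hi + 0x80) << 8) | (lo + 0x80);
    }

    public static void main(String[] args) {
        // Native code 0x3021 (row 16, column 1) maps to 0xB0A1.
        System.out.printf("0x%04X%n", toEncoding(0x3021));
    }
}
```

Because both bytes end up with the high bit set, a 2-byte GB2312 code can never be confused with a 1-byte ASCII code in the same byte stream.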

2. What Is GBK Character Set? GBK (国标扩展码) Character Set is an extension of GB2312 with 21,886 characters. GBK was established by the government of China in 1995 to cover most Chinese characters introduced in Unicode 1.0.1.

GBK Encoding uses the following codepoints:

1-byte codes: {0x00-0x7F}
   Same as ASCII codes

2-byte codes: {0x81-0xFE}{0x40-0x7E} and {0x81-0xFE}{0x80-0xFE}
   GB2312 codes plus new characters added
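
To make the 2-byte ranges concrete, here is a small Java sketch that tests whether a byte pair falls inside the GBK ranges listed above, and whether it falls inside the region inherited from GB2312. The class and method names are hypothetical, the GB2312 region is assumed to be lead byte 0xA1-0xF7 with trail byte 0xA1-0xFE, and the code checks ranges only, not whether a character is actually assigned to the code.

```java
public class GbkRanges {
    // True if (lead, trail) lies in the GBK 2-byte ranges:
    // lead 0x81-0xFE, trail 0x40-0x7E or 0x80-0xFE (0x7F is excluded).
    public static boolean isGbkPair(int lead, int trail) {
        boolean leadOk = lead >= 0x81 && lead <= 0xFE;
        boolean trailOk = (trail >= 0x40 && trail <= 0x7E)
                       || (trail >= 0x80 && trail <= 0xFE);
        return leadOk && trailOk;
    }

    // True if the pair lies in the region inherited from GB2312,
    // assumed here to be lead 0xA1-0xF7 and trail 0xA1-0xFE.
    public static boolean isGb2312Pair(int lead, int trail) {
        return lead >= 0xA1 && lead <= 0xF7
            && trail >= 0xA1 && trail <= 0xFE;
    }

    public static void main(String[] args) {
        System.out.println(isGbkPair(0xB0, 0xA1));    // true
        System.out.println(isGb2312Pair(0xB0, 0xA1)); // true
        System.out.println(isGbkPair(0x81, 0x40));    // true
        System.out.println(isGb2312Pair(0x81, 0x40)); // false
        System.out.println(isGbkPair(0x81, 0x7F));    // false
    }
}
```

Note that a GBK trail byte may fall in the ASCII range 0x40-0x7E, so a decoder has to track lead bytes instead of inspecting each byte in isolation.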

3. What Is GB18030 Character Set? GB18030 Character Set is an extension of GBK established by the government of China, first published in 2000 and revised in 2005. GB18030 uses a 4-byte encoding to match the capacity of the surrogate character mechanism introduced in Unicode 2.0.

GB18030 Encoding uses the following codepoints:

1-byte codes: {0x00-0x7F}
   Same as ASCII codes

2-byte codes: {0x81-0xFE}{0x40-0x7E} and {0x81-0xFE}{0x80-0xFE}
   Same as GBK codes

4-byte codes: {0x81-0xFE}{0x30-0x39}{0x81-0xFE}{0x30-0x39}
   Maps linearly to the Unicode code points not already covered by 1-byte and 2-byte codes:
      GB+81308130 ... = U+0080 ... U+FFFF
      GB+90308130 ... = U+10000 ... U+10FFFF
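
For the supplementary range U+10000 ... U+10FFFF, the linear mapping starting at GB+90308130 can be computed with plain arithmetic. The Java sketch below is a minimal illustration with hypothetical class and method names. It deliberately leaves out the U+0080 ... U+FFFF range, because that range skips code points already covered by 2-byte codes and therefore needs a lookup table.

```java
public class Gb18030Supplementary {
    // Hypothetical helper: encodes a supplementary code point
    // (U+10000 ... U+10FFFF) as a 4-byte GB18030 code, returned as a
    // long so that values above 0x7FFFFFFF stay positive.
    public static long encode(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;
        // Bytes 2 and 4 have 10 values each (0x30-0x39);
        // byte 3 has 126 values (0x81-0xFE).
        int b4 = 0x30 + offset % 10;
        offset /= 10;
        int b3 = 0x81 + offset % 126;
        offset /= 126;
        int b2 = 0x30 + offset % 10;
        offset /= 10;
        int b1 = 0x90 + offset;
        return ((long) b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
    }

    public static void main(String[] args) {
        System.out.printf("GB+%08X%n", encode(0x10000));  // GB+90308130
        System.out.printf("GB+%08X%n", encode(0x10FFFF)); // GB+E3329A35
    }
}
```

The last supplementary code point ends at GB+E3329A35, well below the upper limit of the 4-byte range, so only part of the 4-byte space is needed to cover all of Unicode.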
