Herong's Tutorial Notes on Unicode
Dr. Herong Yang, Version 4.02

GB2312 Character Set and Encodings

GB2312 Character Set

GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning "national standard" in Chinese.

GB2312-1980: A coded character set established by the government of the People's Republic of China (PRC) in 1980.

Main features of GB2312-1980:

  • It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
  • It covers simplified Chinese characters only. Traditional Chinese characters are covered by the Big5 character set.
  • It is used mainly in China and Singapore.

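GB2312-1980 arranges characters into a matrix of 94 rows and 94 columns. The rows are called qu and the columns are called wei, so a character's position is given by its quwei code: the row number followed by the column number. The rows are organized as follows: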

Rows     # of
(Qu)     Chars   Characters
01         94    Special symbols
02         72    Paragraph numbers
03         94    GB 1988-80 (ISO 646-CN) 
04         83    Hiragana
05         86    Katakana
06         48    Greek
07         66    Cyrillic
08         63    Pinyin accented vowels and zhuyin symbols
09         76    Box and table drawing pieces
16-55    3755    Hanzi level 1, ordered by pinyin
56-87    3008    Hanzi level 2, ordered by radical, then stroke
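
As a quick consistency check, the per-row counts in the table can be added up and compared with the totals quoted earlier (682 non-Hanzi, 6763 Hanzi, 7445 in all). The sketch below simply copies the numbers from the table; the variable names are illustrative only.

```python
# Character counts copied from the table above (row number -> count).
NON_HANZI_ROWS = {1: 94, 2: 72, 3: 94, 4: 83, 5: 86, 6: 48, 7: 66, 8: 63, 9: 76}

# Hanzi counts for the two levels (rows 16-55 and rows 56-87).
HANZI_LEVELS = {"level 1": 3755, "level 2": 3008}

non_hanzi_total = sum(NON_HANZI_ROWS.values())
hanzi_total = sum(HANZI_LEVELS.values())
grand_total = non_hanzi_total + hanzi_total

# Total number of positions available in the 94 x 94 matrix.
matrix_capacity = 94 * 94

print("Non-Hanzi characters:", non_hanzi_total)
print("Hanzi characters:    ", hanzi_total)
print("All characters:      ", grand_total)
print("Matrix positions:    ", matrix_capacity)
```

The totals agree with the feature list, and they show that the standard fills only part of the 8836 available positions.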

GB2312-1980 is a Double-Byte Character Set (DBCS), in which each code point value requires a 2-byte integer to hold. This is very different from the ASCII and Latin 1 character sets, where every code point value can be held by a 1-byte integer.
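
The difference in size can be observed with a small sketch that uses the "ascii", "latin-1" and "gb2312" codecs shipped with Python. The sample characters are chosen here only for illustration; they do not come from the standard itself.

```python
# Sample characters (illustrative choices), written as escapes so the
# script does not depend on the encoding of the source file.
ascii_char = "A"         # U+0041, LATIN CAPITAL LETTER A
latin1_char = "\u00e9"   # U+00E9, LATIN SMALL LETTER E WITH ACUTE
hanzi_char = "\u4e2d"    # U+4E2D, the Hanzi "zhong" (middle)

ascii_bytes = ascii_char.encode("ascii")
latin1_bytes = latin1_char.encode("latin-1")
hanzi_bytes = hanzi_char.encode("gb2312")

# ASCII and Latin 1 characters fit in 1 byte; the Hanzi needs 2 bytes.
print("ASCII:  ", ascii_bytes.hex(), len(ascii_bytes), "byte(s)")
print("Latin 1:", latin1_bytes.hex(), len(latin1_bytes), "byte(s)")
print("GB2312: ", hanzi_bytes.hex(), len(hanzi_bytes), "byte(s)")
```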

EUC-CN Encoding

EUC-CN: An encoding scheme for the GB2312-1980 character set. EUC-CN stands for Extended Unix Code for China.

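EUC-CN is an 8-bit encoding scheme. Each GB2312-1980 character is stored as 2 bytes, both in the range 0xA1 to 0xFE, obtained by adding 0xA0 to the row number and to the column number.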

EUC-CN is the default encoding for GB2312-1980.
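
The mapping from a matrix position to EUC-CN bytes can be sketched as follows. This is a minimal illustration, assuming the usual rule that each byte is the row or column number plus 0xA0; the helper names euc_cn_bytes and quwei_from_bytes are hypothetical, and Python's built-in "gb2312" codec is used only to cross-check the result.

```python
def euc_cn_bytes(row: int, col: int) -> bytes:
    """Return the 2 EUC-CN bytes for a position in the 94 x 94 matrix."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in the range 1..94")
    # Adding 0xA0 sets the high bit, so both bytes land in 0xA1..0xFE.
    return bytes([0xA0 + row, 0xA0 + col])


def quwei_from_bytes(data: bytes) -> tuple[int, int]:
    """Reverse mapping: recover (row, column) from 2 EUC-CN bytes."""
    if len(data) != 2 or not all(0xA1 <= b <= 0xFE for b in data):
        raise ValueError("expected 2 bytes in the range 0xA1..0xFE")
    return data[0] - 0xA0, data[1] - 0xA0


# Row 16, column 1 is the first level 1 Hanzi.
first_hanzi = euc_cn_bytes(16, 1)
print(first_hanzi.hex(), first_hanzi.decode("gb2312"))

# Round trip through Python's codec for the Hanzi U+4E2D.
encoded = "\u4e2d".encode("gb2312")
print(encoded.hex(), quwei_from_bytes(encoded))
```

Because both bytes have the high bit set, they cannot be confused with 7-bit ASCII bytes, which is why ASCII text and GB2312 text can be mixed in a single EUC-CN byte stream.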

Dr. Herong Yang, updated in 2007