Herong's Tutorial Notes on GB2312 Character Set
Dr. Herong Yang, Version 3.05

Introduction to GB2312 Character Set

GB2312 Character Set

GB: An abbreviation of Guojia Biaozhun (often shortened to Guobiao), meaning "national standard" in Chinese.

GB2312: A coded character set established by the government of the People's Republic of China in 1980.

Main features of GB2312:

  • It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
  • It is for simplified Chinese characters only. Traditional Chinese characters are covered by the Big5 character set.
  • It is used mainly in mainland China and Singapore.

GB2312 arranges characters into a matrix of 94 rows and 94 columns based on the following rules. Rows not listed (10-15 and 88-94) are left unassigned.

Rows    # of Chars   Characters
01          94       Special symbols
02          72       Paragraph numbers
03          94       Latin characters
04          83       Hiragana characters
05          86       Katakana characters
06          48       Greek characters
07          66       Cyrillic characters
08          63       Pinyin accented vowels and zhuyin symbols
09          76       Box and table drawing symbols
16-55     3755       Hanzi level 1, ordered by pinyin
56-87     3008       Hanzi level 2, ordered by radical, then stroke
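As a quick consistency check, the per-row counts in the table add up to the totals stated at the top of this section: 682 non-Hanzi characters, 6763 Hanzi, 7445 characters in all. The small Python sketch below just does that arithmetic; the names NON_HANZI_ROWS, HANZI_LEVEL_1, HANZI_LEVEL_2 and totals() are my own, not part of any standard. It also shows that level 2 fills its 32 rows completely, while level 1 leaves 5 of its 40 x 94 = 3760 cells empty.

```python
# Per-row character counts, copied from the GB2312 row table (rows 1-9).
NON_HANZI_ROWS = {1: 94, 2: 72, 3: 94, 4: 83, 5: 86, 6: 48, 7: 66, 8: 63, 9: 76}
HANZI_LEVEL_1 = 3755  # rows 16-55, ordered by pinyin
HANZI_LEVEL_2 = 3008  # rows 56-87, ordered by radical, then stroke
COLUMNS = 94


def totals():
    """Return (non_hanzi, hanzi, total) computed from the row table."""
    non_hanzi = sum(NON_HANZI_ROWS.values())
    hanzi = HANZI_LEVEL_1 + HANZI_LEVEL_2
    return non_hanzi, hanzi, non_hanzi + hanzi


def empty_cells(first_row, last_row, used):
    """Number of cells in rows first_row..last_row that hold no character."""
    return (last_row - first_row + 1) * COLUMNS - used


print(totals())                            # (682, 6763, 7445)
print(empty_cells(16, 55, HANZI_LEVEL_1))  # 5
print(empty_cells(56, 87, HANZI_LEVEL_2))  # 0
```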

This book provides a list of all GB2312 characters with their row and column numbers.

GB2312 Codes

GB2312 assigns a 2-byte native code to each character. The first byte, called the high byte, contains the row number plus 32; the second byte, called the low byte, contains the column number plus 32. For example, if a character is located at row 16 and column 1, its high byte will be 16 + 32 = 48 (0x30), and its low byte will be 1 + 32 = 33 (0x21). Put together, its native code is 0x3021.
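The row/column to native code rule can be written as a pair of tiny Python functions. This is only a sketch of the arithmetic described above; native_code() and row_col() are hypothetical helper names, not a library API.

```python
from typing import Tuple


def native_code(row: int, col: int) -> int:
    """Return the 2-byte GB2312 native code for a 1-based (row, column) position."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError(f"row and column must be in 1..94, got ({row}, {col})")
    high = row + 32  # high byte: row number plus 32
    low = col + 32   # low byte: column number plus 32
    return (high << 8) | low


def row_col(code: int) -> Tuple[int, int]:
    """Inverse of native_code(): recover (row, column) from a native code."""
    return (code >> 8) - 32, (code & 0xFF) - 32


print(hex(native_code(16, 1)))  # 0x3021
print(row_col(0x3021))          # (16, 1)
```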

I guess the reason for adding 32 to both the row and column numbers is to keep the byte values out of the low value range (0x00-0x1F), which many computer systems reserve for control commands. With 32 added, every byte falls between 33 (0x21) and 126 (0x7E).
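To see the effect of the offset, the sketch below computes the smallest and largest byte a native code can contain and checks that none of them is an ASCII control code (0x00-0x1F, plus DEL at 0x7F). The function name native_byte_range() is my own, for illustration only.

```python
def native_byte_range():
    """Smallest and largest byte values that can appear in a GB2312 native code."""
    smallest = 1 + 32   # row or column 1
    largest = 94 + 32   # row or column 94
    return smallest, largest


lo, hi = native_byte_range()
print(hex(lo), hex(hi))  # 0x21 0x7e

# No native byte lands on an ASCII control code (0x00-0x1F or DEL 0x7F).
assert all(b > 0x1F and b != 0x7F for b in range(lo, hi + 1))
```

The same range, 0x21-0x7E, is exactly where the printable ASCII characters other than the space live, which is why the native bytes still clash with ASCII text.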

However, the byte values of GB2312 native codes still collide with ASCII codes, because every native-code byte lies in the range of printable ASCII characters. To resolve this problem, 128 (0x80) is added to both bytes of the native code. For example, the character at row 16 and column 1 has the native code 0x3021 and the modified code 0xB0A1 (0x30 + 0x80 = 0xB0, 0x21 + 0x80 = 0xA1).
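The sketch below illustrates both halves of that paragraph: read as ASCII, the native bytes 0x30 0x21 are just the text "0!", while the modified code 0xB0A1 is what Python's standard "gb2312" codec actually decodes. modified_code() is a hypothetical helper name; adding 0x8080 to the whole 16-bit value is safe here because neither native byte exceeds 0x7E, so no carry crosses between bytes.

```python
def modified_code(native: int) -> int:
    """Add 128 (0x80) to both bytes of a GB2312 native code."""
    return native + 0x8080  # each byte is <= 0x7E, so there is no carry between bytes


native = 0x3021            # row 16, column 1
code = modified_code(native)
print(hex(code))           # 0xb0a1

# Read as plain ASCII, the native bytes are ordinary text.
print(bytes([native >> 8, native & 0xFF]).decode("ascii"))  # 0!

# Python's built-in "gb2312" codec expects the modified (high-bit) bytes.
print(code.to_bytes(2, "big").decode("gb2312"))  # 啊 (U+554A)
```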

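These modified codes are adopted as the GB2312 standard codes. They can be safely mixed with ASCII codes, because every byte of a modified code is at least 0xA1, while ASCII codes never exceed 0x7F. (This byte layout is also known as the EUC-CN encoding.)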
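Because the two ranges never overlap, a program can tell ASCII bytes and GB2312 bytes apart just by looking at the high bit. Here is a minimal sketch of such a splitter; split_mixed() is a hypothetical name, and it assumes well-formed input (it does not check that the second byte is also 0xA1 or above).

```python
from typing import List, Union


def split_mixed(data: bytes) -> List[Union[str, int]]:
    """Split mixed ASCII / GB2312 bytes into ASCII characters (str) and GB2312 codes (int)."""
    items: List[Union[str, int]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            items.append(chr(b))                  # single-byte ASCII character
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated 2-byte GB2312 code at end of data")
            items.append((b << 8) | data[i + 1])  # 2-byte GB2312 code
            i += 2
    return items


data = b"A" + bytes([0xB0, 0xA1]) + b"1"
print([x if isinstance(x, str) else hex(x) for x in split_mixed(data)])
# ['A', '0xb0a1', '1']
print(data.decode("gb2312"))  # A啊1
```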

This book provides a list of all GB2312 characters and their codes.

GB2312 vs. Unicode

The GB2312 character set is a subset of the Unicode character set. This means that every character defined in GB2312 is also defined in Unicode.

However, GB2312 codes and Unicode codes are totally unrelated. For example, the GB2312 character with code value 0xB0A1 has the Unicode code value 0x554A. There is no mathematical formula to convert a GB2312 code to the Unicode code of the same character; the conversion has to go through a lookup table.
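Since conversion is a table lookup, the simplest approach in Python is to let the standard "gb2312" codec, which carries such a table, do the work. The sketch below assumes that codec; gb2312_to_unicode() is a hypothetical wrapper name. The two neighbouring GB2312 codes 0xB0A1 and 0xB0A2 land far apart in Unicode, which is why no simple formula exists.

```python
def gb2312_to_unicode(code: int) -> int:
    """Return the Unicode code point for a 2-byte GB2312 (modified) code.

    Relies on the mapping table inside Python's standard "gb2312" codec;
    raises UnicodeDecodeError for an unassigned code.
    """
    char = code.to_bytes(2, "big").decode("gb2312")
    return ord(char)


print(hex(gb2312_to_unicode(0xB0A1)))  # 0x554a
print(hex(gb2312_to_unicode(0xB0A2)))  # 0x963f
```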

This book provides a complete map of all GB2312 codes and their corresponding Unicode codes. The corresponding UTF-8 (Unicode Transformation Format - 8-bit) codes are also listed in the map.
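A map like this can be regenerated by walking every cell of the 94 x 94 matrix, forming its modified code (row and column plus 32, plus 128, i.e. plus 0xA0), and keeping the cells that Python's standard "gb2312" codec recognizes. This is a sketch under that assumption, not the tool used to produce the book's map; build_map() is a hypothetical name, and the exact entry count depends on the codec's table.

```python
def build_map():
    """Yield (gb2312_code, unicode_code_point, utf8_bytes) for every assigned cell."""
    for row in range(1, 95):
        for col in range(1, 95):
            code = ((row + 0xA0) << 8) | (col + 0xA0)  # row/column plus 32, plus 128
            try:
                char = code.to_bytes(2, "big").decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned cell, e.g. anywhere in rows 10-15 or 88-94
            yield code, ord(char), char.encode("utf-8")


if __name__ == "__main__":
    entries = list(build_map())
    print(len(entries))
    for code, cp, utf8 in entries[:3]:
        print(f"{code:04X}  U+{cp:04X}  {utf8.hex().upper()}")
```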

Dr. Herong Yang, updated in 2007