Introduction to GB2312 Character Set
GB2312 Character Set
GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning
"national standard" in Chinese.
GB2312: A coded character set established
by the government of the People's Republic of China in 1980.
Main features of GB2312:
- It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
- It covers simplified Chinese characters only. Traditional Chinese characters
are covered by the Big5 character set.
- It is used mainly in mainland China and Singapore.
GB2312 arranges characters into a matrix of 94 rows and 94 columns
based on the following rules:
Rows    Chars   Contents
01         94   Special symbols
02         72   Paragraph numbers
03         94   Latin characters
04         83   Hiragana characters
05         86   Katakana characters
06         48   Greek characters
07         66   Cyrillic characters
08         63   Pinyin accented vowels and zhuyin symbols
09         76   Box and table drawing symbols
16-55    3755   Hanzi level 1, ordered by pinyin
56-87    3008   Hanzi level 2, ordered by radical, then stroke
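
As a quick cross-check of the table above, the row ranges can be written down as data and summed. This is only a sketch: the names REGIONS and region_of are made up for illustration, and the counts are copied from the table rather than taken from the standard itself.

```python
# (first_row, last_row, character_count, contents), copied from the table above.
REGIONS = [
    (1, 1, 94, "Special symbols"),
    (2, 2, 72, "Paragraph numbers"),
    (3, 3, 94, "Latin characters"),
    (4, 4, 83, "Hiragana characters"),
    (5, 5, 86, "Katakana characters"),
    (6, 6, 48, "Greek characters"),
    (7, 7, 66, "Cyrillic characters"),
    (8, 8, 63, "Pinyin accented vowels and zhuyin symbols"),
    (9, 9, 76, "Box and table drawing symbols"),
    (16, 55, 3755, "Hanzi level 1, ordered by pinyin"),
    (56, 87, 3008, "Hanzi level 2, ordered by radical, then stroke"),
]


def region_of(row):
    """Return the table entry for a row number, or None if the row is not listed."""
    for first, last, _count, contents in REGIONS:
        if first <= row <= last:
            return contents
    return None


# Rows below 16 hold non-Hanzi characters; rows 16 and above hold Hanzi.
non_hanzi = sum(count for first, _last, count, _c in REGIONS if first < 16)
hanzi = sum(count for first, _last, count, _c in REGIONS if first >= 16)

print(non_hanzi, hanzi, non_hanzi + hanzi)
print(region_of(16))
```

The totals agree with the figures in the feature list: 682 non-Hanzi characters plus 6763 Hanzi gives 7445 characters.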
This book provides a list of all characters in GB2312 and their row and column numbers.
GB2312 assigns a 2-byte native code to each character. The first byte
is called the high byte and contains the row number plus 32; the second byte
is called the low byte and contains the column number plus 32. For example,
if a character is located at row 16 and column 1, its high byte will be
16 + 32 = 48 (0x30), and its low byte will be 1 + 32 = 33 (0x21). Put together,
its native code will be 0x3021.
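
The arithmetic above can be sketched in a few lines of Python. The function name native_code is only an illustration, not something defined by the standard, and the range check assumes the 94 by 94 matrix described earlier.

```python
def native_code(row, col):
    """Build the 2-byte GB2312 native code for a (row, column) position."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    high = row + 32   # high byte: row number plus 32
    low = col + 32    # low byte: column number plus 32
    return (high << 8) | low


# Row 16, column 1 is the example used in the text.
print(hex(native_code(16, 1)))  # 0x3021
```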
I guess the reason for adding 32 to both the row number and the column number
is to keep the byte values out of the low value range, which is usually
reserved for control codes in many computer systems.
However, the byte values of GB2312 native codes still collide with
ASCII codes. To resolve this problem, a value of 128 (0x80) is added to both bytes
of the native code. For example, if a character is located at row 16 and column 1,
its native code will be 0x3021, and its modified code will be 0xB0A1
(0x30 + 0x80 = 0xB0 and 0x21 + 0x80 = 0xA1).
These modified codes are adopted as the GB2312 standard codes,
which can be safely mixed with ASCII codes.
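
To illustrate why the modified codes mix safely with ASCII, here is a minimal sketch. The function names standard_code and split_mixed are hypothetical, and the splitter assumes its input contains only 7-bit ASCII bytes and well-formed 2-byte GB2312 standard codes.

```python
def standard_code(row, col):
    """Return the 2-byte GB2312 standard code for a (row, column) position."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    high = row + 32 + 128   # native high byte plus 128
    low = col + 32 + 128    # native low byte plus 128
    return bytes([high, low])


def split_mixed(data):
    """Split mixed ASCII / GB2312 bytes into one chunk per character.

    Any byte of 0x80 or above is treated as the first byte of a 2-byte
    GB2312 code; any byte below 0x80 is treated as a single ASCII character.
    """
    chunks = []
    i = 0
    while i < len(data):
        step = 2 if data[i] >= 0x80 else 1
        chunks.append(data[i:i + step])
        i += step
    return chunks


code = standard_code(16, 1)
print(code.hex())                       # b0a1
print(split_mixed(b"A" + code + b"B"))  # [b'A', b'\xb0\xa1', b'B']
```

Because both bytes of a standard code are 0xA1 or higher, they can never be mistaken for an ASCII byte, which is what lets the splitter decide character boundaries from the first byte alone.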
This book provides a list of all GB2312 characters and their codes.
GB2312 vs. Unicode
The GB2312 character set is a subset of the Unicode character set. This means that
every character defined in GB2312 is also defined in Unicode.
However, GB2312 codes and Unicode codes are totally unrelated.
For example, the GB2312 character with a code value of 0xB0A1 has a Unicode
code value of 0x554A.
There is no mathematical formula to convert a GB2312 code to the Unicode code
of the same character; a lookup table is required.
This book provides a complete map of all GB2312 codes and their
corresponding Unicode codes. The corresponding UTF-8 (Unicode Transformation
Format - 8-bit) byte sequences are also listed in the map.
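
Since the conversion needs a lookup table, one way to experiment with it is to borrow the table that ships with Python's built-in "gb2312" codec. This is only a sketch: the helper names below are made up, and the codec's table is an assumption that has not been checked against this book's map for every character.

```python
def gb2312_to_unicode(code):
    """Look up the Unicode code point for a 2-byte GB2312 standard code."""
    raw = code.to_bytes(2, "big")   # e.g. 0xB0A1 -> b'\xb0\xa1'
    char = raw.decode("gb2312")     # table lookup done by Python's codec
    return ord(char)


def gb2312_to_utf8(code):
    """Return the UTF-8 byte sequence for a 2-byte GB2312 standard code."""
    return code.to_bytes(2, "big").decode("gb2312").encode("utf-8")


# 0xB0A1 is the example used in the text (row 16, column 1).
print(hex(gb2312_to_unicode(0xB0A1)))  # 0x554a
print(gb2312_to_utf8(0xB0A1).hex())    # e5958a
```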