GB2312 Character Set and Encodings
GB2312 Character Set
GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning
"national standard" in Chinese.
GB2312-1980: A coded character set established
by the government of the People's Republic of China (PRC) in 1980.
Main features of GB2312-1980:
- It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
- It covers simplified Chinese characters only. Traditional Chinese characters
are covered by the Big5 character set.
- It is used mainly in China and Singapore.
GB2312-1980 arranges characters into a matrix of 94 rows and 94 columns.
Each row is called a qu and each column a wei, so a character's position
is known as its quwei code. The rows are organized as follows:
Rows (Qu)   # of Chars   Characters
01          94           Special symbols
02          72           Paragraph numbers
03          94           GB 1988-80 (ISO 646-CN)
04          83           Hiragana
05          86           Katakana
06          48           Greek
07          66           Cyrillic
08          63           Pinyin accented vowels and zhuyin symbols
09          76           Box and table drawing pieces
16-55       3755         Hanzi level 1, ordered by pinyin
56-87       3008         Hanzi level 2, ordered by radical, then stroke
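As a quick sanity check, the row counts in the table can be added up and compared with the totals quoted earlier (682 non-Hanzi, 6763 Hanzi, 7445 overall). The sketch below is only illustrative; the names NON_HANZI_ROWS and cell_index are hypothetical and are not part of any standard or library.

```python
# Character counts per row (qu), copied from the table above.
NON_HANZI_ROWS = {1: 94, 2: 72, 3: 94, 4: 83, 5: 86, 6: 48, 7: 66, 8: 63, 9: 76}
HANZI_LEVEL_1 = 3755  # rows 16-55
HANZI_LEVEL_2 = 3008  # rows 56-87

non_hanzi_total = sum(NON_HANZI_ROWS.values())
hanzi_total = HANZI_LEVEL_1 + HANZI_LEVEL_2
total = non_hanzi_total + hanzi_total


def cell_index(qu, wei):
    """Hypothetical helper: 0-based position of cell (qu, wei) in the 94 x 94 matrix."""
    if not (1 <= qu <= 94 and 1 <= wei <= 94):
        raise ValueError("qu and wei must both be in the range 1..94")
    return (qu - 1) * 94 + (wei - 1)


print(non_hanzi_total, hanzi_total, total)
print(cell_index(16, 1))  # position of the first level 1 Hanzi cell
```

Numbering rows and columns from 1 mirrors the way the table labels rows 01 through 87; the helper simply flattens that 1-based pair into a single index.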
GB2312-1980 is a Double-Byte Character Set (DBCS), in which each code point value
requires a 2-byte integer to hold. This is very different from
the ASCII and Latin 1 character sets, where every code point value can be held
in a 1-byte integer.
EUC-CN Encoding
EUC-CN: An encoding scheme for the GB2312-1980 character set.
EUC-CN stands for Extended Unix Code for China.
EUC-CN is an 8-bit encoding scheme.
EUC-CN is the encoding most commonly used for GB2312-1980.
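To make the relationship between a GB2312-1980 matrix position and its EUC-CN bytes concrete, here is a minimal sketch. The text above does not spell out the byte layout; the sketch relies on the usual EUC-CN rule that each character takes two bytes, obtained by adding 0xA0 to the row number and to the column number. The helper names quwei_to_euc_cn and euc_cn_to_quwei are hypothetical, and Python's built-in "gb2312" codec is used only as a cross-check.

```python
def quwei_to_euc_cn(qu, wei):
    """Hypothetical helper: EUC-CN bytes for the cell at row qu, column wei."""
    if not (1 <= qu <= 94 and 1 <= wei <= 94):
        raise ValueError("qu and wei must both be in the range 1..94")
    # Assumed rule: add 0xA0 to each coordinate so both bytes have the high bit set.
    return bytes([qu + 0xA0, wei + 0xA0])


def euc_cn_to_quwei(pair):
    """Hypothetical helper: recover (qu, wei) from a 2-byte EUC-CN sequence."""
    if len(pair) != 2:
        raise ValueError("expected exactly 2 bytes")
    return pair[0] - 0xA0, pair[1] - 0xA0


hanzi = "\u554a"  # the Hanzi at row 16, column 01 (first level 1 character)
encoded = hanzi.encode("gb2312")  # Python's gb2312 codec produces EUC-CN bytes

print(encoded.hex())             # bytes produced by the codec
print(euc_cn_to_quwei(encoded))  # matrix position recovered from those bytes
print(quwei_to_euc_cn(16, 1) == encoded)
print(len("A".encode("gb2312")))  # ASCII characters stay 1 byte in EUC-CN
```

Because both bytes of a GB2312 character have the high bit set, they never collide with 7-bit ASCII bytes, which is why ASCII text and Chinese text can be mixed in one EUC-CN stream.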