Unicode Tutorials - Herong's Tutorial Examples - v5.31, by Herong Yang
GB2312 Encoding for GB2312 Character Set
This section provides a quick introduction to the GB2312 encoding for the GB2312 character set. GB2312 is a 2-byte (8 bits per byte) encoding.
GB2312 encoding is the main encoding for the GB2312 character set. GB2312 encoding is based on native code values of GB2312 characters.
The native code value of each GB2312 character contains 2 bytes. The first byte is called the high byte and contains the row number plus 32; the second byte is called the low byte and contains the column number plus 32. For example, if a character is located at row 16 and column 1, its high byte will be 16 + 32 = 48 (0x30), and its low byte will be 1 + 32 = 33 (0x21). Putting them together, its native code value will be 0x3021.
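The row and column arithmetic above can be sketched in a few lines of Python. The function name native_code is hypothetical, and the range check assumes the usual 94 x 94 layout of the GB2312 code table, which this section does not state.

```python
def native_code(row: int, col: int) -> int:
    """Return the 2-byte GB2312 native code for a row/column position."""
    # Assumption: rows and columns are numbered 1..94 in the code table.
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    high = row + 32  # high byte: row number plus 32
    low = col + 32   # low byte: column number plus 32
    return (high << 8) | low


print(hex(native_code(16, 1)))  # 0x3021
```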
I guess that the reason for adding 32 to both the row number and the column number is to keep the byte values out of the low value range (0 to 31), which many computer systems reserve for control codes.
However, byte values of GB2312 native codes are not directly used as GB2312 encoding byte sequences, because they still collide with ASCII byte values: both bytes fall in the 7-bit range that ASCII uses. To resolve this problem, a value of 128 (0x80) is added to each byte of the native code. For example, if a character is located at row 16 and column 1, its native code will be 0x3021, and its modified code will be 0xB0A1 (0x30 + 0x80 = 0xB0 and 0x21 + 0x80 = 0xA1).
These modified codes are adopted as the GB2312 encoding, which can be safely mixed with the ASCII encoding, because every GB2312 byte has its high bit set while every ASCII byte does not.
GB2312 encoding is also called EUC-CN (Extended Unix Code for China).
The GB2312 character set has another encoding called HZ, which maps each GB2312 character to two 7-bit bytes and uses ~{ and ~} to separate GB2312 characters from ASCII characters.