GB2312 Tutorials - Herong's Tutorial Examples - v4.04, by Herong Yang
GB2312 Location Codes and Native Codes
GB2312 Location Codes represent locations of characters in the GB2312 table. Each GB2312 Native Code is a pair of 7-bit bytes derived from a Location Code.
As part of the GB2312 standard, each character is assigned 2 codes: a Location Code and a Native Code.
Here are more detailed descriptions of Location Code and Native Code:
1. What Is Location Code? - The Location Code of a GB2312 character is the combination of the row number (区) and the column number (位) of the character's location in the GB2312 table.
For example, the Chinese character 啊 is located at row 16 and column 1 in the GB2312 table. So the Location Code of 啊 is (16,1).
Since there are 94 rows and 94 columns in the GB2312 table, Location Codes range from (1,1) to (94,94).
2. What Is Native Code? - The Native Code of a GB2312 character is a sequence of 2 bytes that represents the character in computer systems. The first byte of the code is called the high byte, and the second byte of the code is called the low byte.
The high byte is derived from the row number of the character by adding 32 to the row number value.
The low byte is derived from the column number of the character by adding 32 to the column number value.
For example, the Chinese character 啊 has a Location Code of (16,1). So its high byte is 0x30, because 16 + 32 = 48, or 0x30. Its low byte is 0x21, because 1 + 32 = 33, or 0x21. Putting them together, the Native Code of 啊 is 0x3021.
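The arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class name `LocationToNative` and the method `toNativeCode` are hypothetical names, and packing the two bytes into a single `int` is a choice made here for easy printing, not something the GB2312 standard prescribes.

```java
// Sketch: convert a GB2312 Location Code (row, column) to its Native Code.
// Class and method names are hypothetical, chosen for illustration only.
class LocationToNative {
    static final int OFFSET = 32; // 0x20, added to both row and column

    // Returns the 2-byte Native Code packed as (highByte << 8) | lowByte.
    static int toNativeCode(int row, int column) {
        if (row < 1 || row > 94 || column < 1 || column > 94) {
            throw new IllegalArgumentException(
                "Location Code must be between (1,1) and (94,94)");
        }
        int highByte = row + OFFSET;
        int lowByte = column + OFFSET;
        return (highByte << 8) | lowByte;
    }

    public static void main(String[] args) {
        // 啊 is at row 16, column 1 in the GB2312 table.
        System.out.printf("0x%04X%n", toNativeCode(16, 1)); // prints 0x3021
    }
}
```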
I guess the reason for adding 32 to both the row number and the column number is to keep the resulting byte values out of the low value range, 0x00 to 0x20. In computer systems, byte values in that range are usually reserved for control codes and the space character.
Since there are only 94 (0x5E) rows and 94 (0x5E) columns in the GB2312 table, Native Codes range from (0x21,0x21) to (0x7E,0x7E).
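Going the other way, a Native Code can be checked against this range and turned back into a Location Code by subtracting 32 from each byte. The following is a minimal sketch under that assumption; `NativeToLocation` and `toLocationCode` are hypothetical names used only for this example.

```java
// Sketch: recover a GB2312 Location Code from the 2 bytes of a Native Code.
// Class and method names are hypothetical, chosen for illustration only.
class NativeToLocation {
    // Returns {row, column} for the given high and low bytes.
    static int[] toLocationCode(int highByte, int lowByte) {
        if (highByte < 0x21 || highByte > 0x7E
                || lowByte < 0x21 || lowByte > 0x7E) {
            throw new IllegalArgumentException(
                "Each byte of a Native Code must be between 0x21 and 0x7E");
        }
        // Undo the offset of 32 that was added to the row and the column.
        return new int[] { highByte - 32, lowByte - 32 };
    }

    public static void main(String[] args) {
        int[] location = toLocationCode(0x30, 0x21);
        // prints (16,1), the Location Code of 啊
        System.out.println("(" + location[0] + "," + location[1] + ")");
    }
}
```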
GB2312 Native Codes are perfectly good for storing Chinese documents as computer files and transmitting them over computer networks, because each byte is a 7-bit value in the range of 0x21 to 0x7E and never collides with a control code.
However, GB2312 Native Codes are not compatible with ASCII Codes. In other words, GB2312 Native Codes and ASCII Codes cannot be mixed together in a single file, because there is no way to tell whether a byte is an ASCII Code or the high or low byte of a GB2312 Native Code.
For example, the byte 0x30 in a file that mixes GB2312 Native Codes and ASCII Codes could be the ASCII character '0', or the high byte of the GB2312 character 啊.
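To make the ambiguity concrete, the short Java sketch below takes the 2 bytes of the Native Code of 啊, 0x30 and 0x21, and reads them as plain ASCII. The class name `AsciiAmbiguity` and the helper `asAscii` are hypothetical; the only library call used is the standard `String(byte[], Charset)` constructor.

```java
// Sketch: the 2 bytes of the Native Code of 啊 are also valid ASCII text.
// Class and method names are hypothetical, chosen for illustration only.
class AsciiAmbiguity {
    // Interprets the given bytes as ASCII characters.
    static String asAscii(byte[] bytes) {
        return new String(bytes, java.nio.charset.StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 0x3021 is the Native Code of 啊 (row 16, column 1).
        byte[] nativeCode = { 0x30, 0x21 };
        // Read as ASCII, the same 2 bytes are the characters '0' and '!'.
        System.out.println(asAscii(nativeCode)); // prints 0!
    }
}
```

A reader of the file sees only the bytes 0x30 0x21 and has nothing in the bytes themselves that says which interpretation was intended.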
The next section describes some solutions to this problem.