GB2312 Encodings

GB2312 Encoding transforms GB2312 Native Codes to the 0x8080-0xFFFF range to reserve 7-bit byte values for ASCII Codes. HZ and ISO-2022-CN Encodings use escape sequences to switch between GB2312 Native Codes and ASCII Codes.

To resolve the incompatibility between GB2312 Native Codes and ASCII Codes, several encoding schemes have been developed over the years: GB2312 Encoding, HZ Encoding, and ISO-2022-CN Encoding.

Here are more detailed descriptions of these encodings:

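1. What Is GB2312 Encoding? - GB2312 Encoding (also known as EUC-CN) is an encoding that transforms GB2312 Native Codes to the 0x8080-0xFFFF range to reserve 7-bit byte values for ASCII Codes. This is done by adding 0x80 to both the high byte and the low byte of a GB2312 Native Code. Since each byte of a Native Code lies in the 0x21-0x7E range, the encoded bytes actually fall in the 0xA1-0xFE range.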

For example, the Chinese character 啊 has a GB2312 Native Code of 0x3021. Its GB2312 Encoding will be 0xB0A1, because 0x30 + 0x80 = 0xB0, and 0x21 + 0x80 = 0xA1.
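The byte arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the book's own code: the function name native_to_gb2312 is hypothetical, and the built-in "gb2312" codec is used only to double-check the result.

```python
def native_to_gb2312(native: int) -> bytes:
    """Convert a 16-bit GB2312 Native Code to its 2-byte GB2312 Encoding.

    Hypothetical helper for illustration only.
    """
    high = (native >> 8) & 0xFF   # high byte of the Native Code
    low = native & 0xFF           # low byte of the Native Code
    # Both Native Code bytes must be printable 7-bit values (0x21-0x7E).
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a valid GB2312 Native Code: {native:#06x}")
    # Adding 0x80 sets the top bit of each byte, moving it out of ASCII range.
    return bytes([high + 0x80, low + 0x80])


encoded = native_to_gb2312(0x3021)
print(encoded.hex().upper())       # B0A1
print(encoded.decode("gb2312"))    # 啊
```

Validating the 0x21-0x7E range first keeps the helper from silently producing bytes that are not valid GB2312 Encoding values.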

GB2312 Encoding resolves the incompatibility problem with ASCII Codes nicely. But the resulting byte sequence contains 8-bit byte values, which are not safe to transmit over channels that only handle 7-bit data, such as some older email systems.

2. What Is HZ Encoding? - HZ Encoding is an encoding designed in 1989 by Fung Fung Lee that uses "~{" and "~}" to mark where a run of GB2312 Native Codes begins and ends, separating it from the surrounding ASCII Codes.

The advantage of HZ Encoding is that the resulting byte sequence has only 7-bit bytes, so it is still safe to be transmitted over computer networks. But the extra grouping sequences "~{" and "~}" may cause processing trouble.

For example, "2015~{Dj~} 1~{TB~} 1~{HU~}" is the HZ Encoding of "2015年 1月 1日".
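The HZ example can be reproduced with a small Python sketch. The function name hz_encode is hypothetical. It derives each Native Code by subtracting 0x80 from the bytes produced by Python's built-in "gb2312" codec, and it escapes a literal "~" as "~~", which is an HZ rule not covered in the text above. The built-in "hz" codec is used only to decode the result as a cross-check.

```python
def hz_encode(text: str) -> bytes:
    """Encode text as HZ: ASCII as-is, GB2312 runs wrapped in ~{ ... ~}.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    in_gb = False                      # True while inside a ~{ ... ~} group
    for ch in text:
        if ord(ch) < 0x80:             # ASCII character
            if in_gb:
                out += b"~}"           # close the GB2312 group
                in_gb = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:                          # must be a GB2312 character
            gb = ch.encode("gb2312")   # raises UnicodeEncodeError otherwise
            if not in_gb:
                out += b"~{"           # open a GB2312 group
                in_gb = True
            # Subtract 0x80 from each byte to get back the 7-bit Native Code.
            out += bytes(b - 0x80 for b in gb)
    if in_gb:
        out += b"~}"                   # close a group left open at the end
    return bytes(out)


encoded = hz_encode("2015年 1月 1日")
print(encoded.decode("ascii"))         # 2015~{Dj~} 1~{TB~} 1~{HU~}
print(encoded.decode("hz"))            # 2015年 1月 1日
```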

3. What Is ISO-2022-CN Encoding? - ISO-2022-CN Encoding is an encoding based on the ISO-2022 standard, which allows multiple character sets to be included in a single character encoding system by using different escape sequences to switch between them.

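When using ISO-2022-CN Encoding to mix GB2312 Native Codes with ASCII Codes, you need to use the "ESC $ ) A" escape sequence to start GB2312 Native Codes, and the "ESC ( B" escape sequence to start ASCII Codes. This is a simplified view: RFC 1922, which specifies ISO-2022-CN, additionally relies on the SO (0x0E) and SI (0x0F) shift characters to move between GB2312 and ASCII once GB2312 has been designated.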

Similar to HZ Encoding, ISO-2022-CN Encoding produces only 7-bit bytes and is safe to be transmitted over computer networks. But its escape sequences are longer than those of HZ Encoding.

For example, "<ESC>(B2015<ESC>$)ADj<ESC>(B 1<ESC>$)ATB<ESC>(B 1<ESC>$)AHU" is the ISO-2022-CN Encoding of "2015年 1月 1日".
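The escape-sequence switching shown in this example can be sketched in Python as follows. This only mirrors the simplified scheme described in this section (one escape sequence each time the character set changes). It is not an RFC 1922-conformant encoder, which would also use the SO and SI shift characters. The function name simplified_iso2022cn_encode is hypothetical, and Python's built-in "gb2312" codec is used to look up each character's bytes.

```python
ESC = b"\x1b"
TO_GB2312 = ESC + b"$)A"   # "ESC $ ) A": start GB2312 Native Codes
TO_ASCII = ESC + b"(B"     # "ESC ( B": start ASCII Codes


def simplified_iso2022cn_encode(text: str) -> bytes:
    """Emit an escape sequence whenever the character set changes.

    Hypothetical helper that follows the simplified scheme in the text,
    not the full ISO-2022-CN specification.
    """
    out = bytearray()
    mode = None                         # no character set selected yet
    for ch in text:
        if ord(ch) < 0x80:              # ASCII character
            if mode != "ascii":
                out += TO_ASCII
                mode = "ascii"
            out += ch.encode("ascii")
        else:                           # must be a GB2312 character
            if mode != "gb2312":
                out += TO_GB2312
                mode = "gb2312"
            # Subtract 0x80 from each byte to get the 7-bit Native Code.
            out += bytes(b - 0x80 for b in ch.encode("gb2312"))
    return bytes(out)


encoded = simplified_iso2022cn_encode("2015年 1月 1日")
# Show the ESC control character as <ESC> so the result is readable.
print(encoded.decode("ascii").replace("\x1b", "<ESC>"))
# <ESC>(B2015<ESC>$)ADj<ESC>(B 1<ESC>$)ATB<ESC>(B 1<ESC>$)AHU
```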

Of these 3 encodings, GB2312 Encoding is the most commonly used.

Now we have learned that a character in the GB2312 character set can be identified or represented in 3 ways: by its Location Code, by its Native Code, or by its GB2312 Encoding.

A list of all GB2312 characters with their Location Codes and GB2312 Encodings will be provided later in this book.
