Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
What Is ASCII
This section provides a quick introduction to the ASCII (American Standard Code for Information Interchange) character set and encoding.
Before we jump into the Unicode character set and Unicode encodings, we should first look at a much older and simpler character set, ASCII.
What Is ASCII? ASCII (American Standard Code for Information Interchange) is a character set and an encoding scheme for English letters, digits, punctuation marks and some control characters.
The ASCII specification was published as "American Standard Code for Information Interchange, ASA X3.4-1963" by the American Standards Association on June 17, 1963.
The ASCII character set contains 95 printable characters (including the space character) and 33 control characters, giving a total of 128 characters. Their code points are integers ranging from 0 to 127, so each code point fits in 7 bits in binary format.
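The counts above can be checked with a short Python sketch. It assumes the usual split of the ASCII range: code points 0 to 31 plus 127 (DEL) are control characters, and 32 to 126 are printable. The variable names are illustrative only.

```python
# Split the ASCII range 0..127 into control and printable code points.
# Assumption: control = 0..31 and 127 (DEL); printable = 32..126 (space through "~").
control = [cp for cp in range(128) if cp < 32 or cp == 127]
printable = [cp for cp in range(128) if 32 <= cp <= 126]

print(len(control))                    # number of control characters
print(len(printable))                  # number of printable characters
print(len(control) + len(printable))   # size of the whole character set

# The largest code point, 127, needs exactly 7 bits.
print((127).bit_length())
```

Running it should report 33 control characters, 95 printable characters, 128 characters in total, and a bit length of 7 for the largest code point.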
The ASCII encoding is simple: each character is mapped to 1 byte, with the leading bit set to 0 and the other 7 bits representing the character's code point as an integer.
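As a minimal sketch of this encoding rule, the following Python snippet encodes a string with the built-in "ascii" codec and shows each resulting byte as 8 binary digits. The helper name ascii_bits is hypothetical and exists only for this illustration.

```python
def ascii_bits(text):
    """Return (character, byte value, 8-bit binary string) for each character.

    Hypothetical helper for illustration; str.encode("ascii") raises
    UnicodeEncodeError if the text contains a non-ASCII character.
    """
    encoded = text.encode("ascii")  # one byte per character
    return [(ch, byte, format(byte, "08b")) for ch, byte in zip(text, encoded)]


for ch, value, bits in ascii_bits("Hi"):
    # The byte value equals the code point, and the leading bit is always 0.
    print(ch, value, bits)
```

For the letter "A" the byte value is 65, which is 01000001 in binary: a leading 0 followed by the 7-bit code point 1000001.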
[Figure: ASCII code chart]