This section provides a quick introduction to some basic concepts: character set, coded character set, code point, and character encoding.
What Is a Character Set?
A character set is a collection of characters used in a language, and/or
symbols used in a symbolic system. Examples of character sets: numeric digits,
alphabetical letters, and Chinese characters.
What Is a Coded Character Set?
A coded character set is a character set in which each character has an
assigned integer. Examples of coded character sets: US-ASCII, EBCDIC,
ISO-8859-1, GB2312-1980, and Unicode. Note that:
If character set B is a superset of character set A, and every character of A
keeps the same assigned number in B, we say B is backward compatible with A.
Since we are only interested in coded character sets, from now on I will
use the term "character set" to mean "coded character set".
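The backward compatibility note above can be checked with a small Python sketch. It relies on an assumption: Python's "ascii" and "latin-1" codecs are used as stand-ins for the US-ASCII and ISO-8859-1 coded character sets, which works here because in these single-byte sets the byte value equals the assigned number. The helper name same_assignments is hypothetical, chosen only for this illustration.

```python
def same_assignments(narrow, wide, codes):
    """Return True if both codecs map every number in `codes` to the same character."""
    return all(
        bytes([code]).decode(narrow) == bytes([code]).decode(wide)
        for code in codes
    )

# US-ASCII assigns numbers 0..127; ISO-8859-1 assigns 0..255.
ascii_in_latin1 = same_assignments("ascii", "latin-1", range(128))

# ISO-8859-1 numbers 0..255 match the first 256 Unicode code points.
latin1_in_unicode = all(
    ord(bytes([code]).decode("latin-1")) == code for code in range(256)
)

print(ascii_in_latin1, latin1_in_unicode)
```

If both values are true, ISO-8859-1 is backward compatible with US-ASCII, and Unicode is backward compatible with ISO-8859-1, in the sense used above.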
What Is a Code Point?
A code point is an integer assigned to a character in a coded character set.
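As a small illustration, Python strings are sequences of Unicode characters, so the built-in functions ord() and chr() convert between a character and its Unicode code point. The sample characters below are arbitrary choices for this sketch, not taken from the text above.

```python
# Sample characters, written as escapes so the source file encoding does not matter:
# "A", "é" (U+00E9), and "中" (U+4E2D).
samples = ["A", "\u00e9", "\u4e2d"]

# Map each character to its Unicode code point.
code_points = {ch: ord(ch) for ch in samples}

for ch, cp in code_points.items():
    # Code points are conventionally written as U+ followed by hexadecimal digits.
    print(ch, cp, f"U+{cp:04X}")

# chr() goes the other way: from code point back to character.
round_trip = [chr(cp) for cp in code_points.values()]
```

The same character can have different code points in different coded character sets; the values shown here apply to Unicode only.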
What Is a Character Encoding?
A character encoding is a mapping scheme between code points of a coded character
set and sequences of bytes. Note that:
One coded character set may have many character encodings.
One coded character set must have at least one character encoding.
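To make the first note concrete, here is a minimal Python sketch that encodes one Unicode code point with three different Unicode encodings. The choice of character (U+4E2D) and of the big-endian variants is an assumption made for this example; the codec names are the ones Python's standard library provides.

```python
ch = "\u4e2d"  # the Chinese character "中", Unicode code point U+4E2D

# One coded character set (Unicode), several character encodings.
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in encoded.items():
    # Show each byte sequence as hexadecimal digits.
    print(name, data.hex())

# Decoding with the matching encoding recovers the same code point.
decoded = {name: data.decode(name) for name, data in encoded.items()}
```

The code point stays the same in all three cases; only the byte sequence that represents it changes (3 bytes, 2 bytes, and 4 bytes respectively).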