Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

What Is Character Encoding?

This section provides a quick introduction to character encoding in the Java language and in the JDK package provided by Sun Microsystems.

Character Encoding: A mapping scheme between the code points of a coded character set and sequences of bytes.

Coded Character Set: A character set in which each character has an assigned integral number.

Code Point: An integral number assigned to a character in a coded character set.

Unicode: A coded character set that aims to contain all characters used in the written languages of the world, plus special symbols.
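To make these definitions concrete, here is a minimal sketch that takes one character, prints its code point, and then encodes it to bytes with three different character encodings. The class name EncodingDemo and its helper methods are hypothetical names chosen for this illustration. The sketch relies only on String.getBytes(String) and on the ISO-8859-1, UTF-8 and UTF-16BE encodings, which every Java platform is required to support.

```java
import java.io.UnsupportedEncodingException;

class EncodingDemo {
    // Formats a byte array as space-separated, two-digit hexadecimal values.
    static String toHex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            String h = Integer.toHexString(bytes[i] & 0xFF).toUpperCase();
            if (h.length() < 2) sb.append('0');
            sb.append(h);
        }
        return sb.toString();
    }

    // Encodes the text with the named character encoding and
    // returns the resulting byte sequence in hexadecimal form.
    static String encode(String text, String encodingName) {
        try {
            return toHex(text.getBytes(encodingName));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName);
        }
    }

    public static void main(String[] args) {
        // One character: LATIN SMALL LETTER E WITH ACUTE, code point U+00E9
        String text = "\u00E9";
        int codePoint = text.charAt(0);
        System.out.println("Code point: 0x" + Integer.toHexString(codePoint).toUpperCase());

        // The same code point maps to different byte sequences
        System.out.println("ISO-8859-1: " + encode(text, "ISO-8859-1")); // E9
        System.out.println("UTF-8:      " + encode(text, "UTF-8"));      // C3 A9
        System.out.println("UTF-16BE:   " + encode(text, "UTF-16BE"));   // 00 E9
    }
}
```

The code point stays the same in all three cases. Only the byte sequence changes, which is exactly the difference between a coded character set and a character encoding.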

As of version 1.4.1, the JDK supports Unicode 3.0, according to the reference documentation of the java.lang.Character class.

I am not sure how future versions of the JDK are going to support Unicode 3.1, because Unicode 3.1 contains characters with code points greater than U+FFFF, which is the maximum value of the 'char' type in Java.

Because of the 'char' limitation, JDK 1.4.1 can only support encoding and decoding code points in the 16-bit range: U+0000...U+FFFF.
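The 16-bit limit of 'char' can be seen with a small sketch like the one below. The class name CharRangeDemo and its methods are hypothetical names used only for this illustration. The code point U+10400 is picked here as an example of a value above U+FFFF. The sketch only shows what happens when such a value is forced into a single 'char'. It does not show how any particular JDK version handles such characters.

```java
class CharRangeDemo {
    // Stores a code point in a 'char' and returns the value that was kept.
    // A 'char' holds 16 bits, so any bits above 0xFFFF are lost in the cast.
    static int storeInChar(int codePoint) {
        char c = (char) codePoint;
        return c;
    }

    // Formats a value in the usual U+XXXX notation, with at least 4 hex digits.
    static String hex(int value) {
        String h = Integer.toHexString(value).toUpperCase();
        while (h.length() < 4) {
            h = "0" + h;
        }
        return "U+" + h;
    }

    public static void main(String[] args) {
        // The largest value a 'char' can hold
        System.out.println("Maximum char value: " + hex(Character.MAX_VALUE)); // U+FFFF

        // A code point inside the 16-bit range is kept as is
        System.out.println("U+00E9 in a char:   " + hex(storeInChar(0x00E9))); // U+00E9

        // A code point above U+FFFF does not fit and is truncated
        System.out.println("U+10400 in a char:  " + hex(storeInChar(0x10400))); // U+0400
    }
}
```

The last line of output shows the problem: the value that comes back is a different code point, so a single 'char' cannot represent a character beyond U+FFFF.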

Sections in This Chapter

What Is Character Encoding?

Supported Character Encodings in JDK 1.4.1

EncodingSampler.java - Testing encode() Methods

Examples of CP1252 and ISO-8859-1 Encodings

Examples of US-ASCII, UTF-8, UTF-16 and UTF-16BE Encodings

Examples of GB18030 Encoding

Testing decode() Methods

Dr. Herong Yang, updated in 2009