JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

Character Set Encoding Classes and Methods

This chapter provides tutorial notes and example code on character set encoding classes and methods. Topics include: what character encoding is; encodings supported by the JDK, such as CP1252, ISO-8859, ASCII, UTF-8, UTF-16, and GB18030; methods to encode characters into byte sequences; and methods to decode byte sequences back into characters.
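To make the idea of a character encoding concrete, here is a minimal sketch that encodes a few sample characters with each encoding named in this chapter and prints the resulting bytes in hex. The class name CharBytesDemo and the sample characters are hypothetical choices for illustration, not the tutorial's EncodingSampler.java. The code assumes Java 6 or later, because String.getBytes(Charset) did not exist in JDK 1.4.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not the tutorial's EncodingSampler.java:
// prints the bytes that each encoding produces for a few sample characters.
public class CharBytesDemo {

    // Encodes s with the named charset and returns the bytes as hex pairs.
    // getBytes(Charset) replaces unencodable characters with the charset's
    // replacement bytes ('?' = 0x3F for single-byte charsets such as US-ASCII).
    static String toHex(String s, String encoding) {
        byte[] bytes = s.getBytes(Charset.forName(encoding));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] encodings = {"Cp1252", "ISO-8859-1", "US-ASCII", "UTF-8", "UTF-16", "GB18030"};
        // Sample characters: capital A, e with acute accent, euro sign.
        String[] samples = {"A", "\u00E9", "\u20AC"};
        for (String enc : encodings) {
            StringBuilder line = new StringBuilder(String.format("%-10s", enc));
            for (String s : samples) {
                line.append(" | ").append(toHex(s, enc));
            }
            System.out.println(line);
        }
    }
}
```

The "A" column is 41 in every encoding except UTF-16, which writes a big-endian byte-order mark first (FE FF 00 41). The e-acute and euro columns show where the encodings diverge, including the 3F replacement byte where a character has no mapping.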

What Is Character Encoding?

Supported Character Encodings in JDK

Charset.encode() - Method to Encode Characters

Running EncodingSampler.java with CP1252 Encoding

Running EncodingSampler.java with ISO-8859-1 and US-ASCII

Running EncodingSampler.java with UTF-8, UTF-16, UTF-16BE

Running EncodingSampler.java with GB18030

Charset.decode() - Method to Decode Byte Sequences
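The set of encodings that a particular JDK supports can also be checked at run time. The sketch below, whose class and method names are hypothetical, prints Charset.availableCharsets() with each charset's aliases and resolves the names used in this chapter to their canonical forms. For example, "Cp1252" is an alias whose canonical name is "windows-1252". The exact list printed varies between JDK versions and vendors.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper class: lists the charsets this JDK provides and checks
// the encodings discussed in this chapter.
public class SupportedEncodings {

    // True if the running JDK supports the given charset name or alias.
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    // Resolves a name or alias (such as "Cp1252") to the canonical charset name.
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        // availableCharsets() maps canonical names to Charset objects, sorted by name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available");
        for (Map.Entry<String, Charset> e : all.entrySet()) {
            System.out.println(e.getKey() + " aliases=" + e.getValue().aliases());
        }

        String[] chapterEncodings = {"Cp1252", "ISO-8859-1", "US-ASCII",
                                     "UTF-8", "UTF-16", "UTF-16BE", "GB18030"};
        for (String name : chapterEncodings) {
            if (supports(name)) {
                System.out.println(name + " -> " + canonicalName(name));
            } else {
                System.out.println(name + " -> not supported");
            }
        }
    }
}
```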

Conclusion:

  • The JDK supports encodings for many commonly used character sets, such as CP1252, ISO-8859, ASCII, UTF-8, UTF-16, and GB18030.
  • The java.nio.charset.Charset.encode() method encodes characters into a byte sequence according to a given encoding.
  • The java.nio.charset.Charset.decode() method decodes a byte sequence back into characters according to a given encoding.
  • On Windows systems with a Western European locale, such as US English, the JDK default encoding is CP1252.
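The encode/decode round trip summarized above can be sketched with Charset.encode(String) and Charset.decode(ByteBuffer), both available since JDK 1.4. The class name, sample text, and encoding list are hypothetical. The sketch also calls Charset.defaultCharset(), which was added in Java 5, so it assumes a newer JDK than the 1.4.1 release these notes are based on. Both methods silently substitute replacement characters for anything that cannot be mapped, which is why the ISO-8859-1 round trip is lossy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical demo class: round-trips text through Charset.encode()
// and Charset.decode() for a few encodings.
public class EncodeDecodeDemo {

    // Encodes text; the returned ByteBuffer holds the encoded bytes between
    // its position and limit, so copy exactly remaining() bytes out of it.
    static byte[] encode(String text, String encoding) {
        ByteBuffer buf = Charset.forName(encoding).encode(text);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // Decodes a byte array back into a String with the named charset.
    static String decode(byte[] bytes, String encoding) {
        CharBuffer chars = Charset.forName(encoding).decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) {
        // "Cafe" with e-acute, a space, then two Chinese characters.
        String text = "Caf\u00E9 \u4E2D\u6587";
        String[] encodings = {"UTF-8", "UTF-16BE", "GB18030", "ISO-8859-1"};
        for (String enc : encodings) {
            byte[] bytes = encode(text, enc);
            String back = decode(bytes, enc);
            String status = back.equals(text) ? "round trip ok" : "lossy, decoded as " + back;
            System.out.println(enc + ": " + bytes.length + " bytes, " + status);
        }
        // Charset used by the JDK when no encoding is specified (Java 5+ API).
        System.out.println("Default charset: " + Charset.defaultCharset().name());
    }
}
```

The sample text takes 12 bytes in UTF-8 and 14 bytes in UTF-16BE. ISO-8859-1 has no mapping for the two Chinese characters, so they come back as "?". The default charset printed depends on the JDK and platform. Older JDKs on Western-locale Windows report windows-1252 (CP1252), while JDK 18 and later default to UTF-8 on all platforms.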

Notes and sample code below are based on JDK/J2SDK 1.4.1_01.

Dr. Herong Yang, updated in 2008