Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Character Encoding in Java

This chapter provides notes and tutorial examples on character encoding in Java. Topics include the encodings supported in JDK 1.4.1, using the encoding and decoding methods, and examples of encoded byte sequences in various encodings.

What Is Character Encoding?

Supported Character Encodings in JDK 1.4.1

EncodingSampler.java - Testing encode() Methods

Examples of CP1252 and ISO-8859-1 Encodings

Examples of US-ASCII, UTF-8, UTF-16 and UTF-16BE Encodings

Examples of GB18030 Encoding

Testing decode() Methods

Conclusions:

  • As of JDK 1.4, Java supports only Unicode version 3.0.
  • 48 encodings are supported in JDK 1.4.
  • JDK 1.4 offers 4 ways to encode and decode characters with any given encoding.
  • By default, the UTF-16 encoding in JDK 1.4 produces Big-Endian output preceded by a BOM (byte order mark).
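
The last two conclusions can be checked with a short sketch. This summary does not name the 4 ways of encoding and decoding, so the code below shows only two standard routes that have been in the JDK since 1.4: String.getBytes(String) and Charset.encode(String). The class name EncodingSketch and its helper methods are made up for illustration, and the code is written for a current JDK (it uses StringBuilder and String.format, which JDK 1.4 does not have).

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original tutorial.
public class EncodingSketch {

    // Formats bytes as uppercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Route 1: String.getBytes(String charsetName).
    static byte[] encodeWithString(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName, e);
        }
    }

    // Route 2: Charset.encode(String), from the java.nio.charset package.
    static byte[] encodeWithCharset(String text, String charsetName) {
        ByteBuffer buffer = Charset.forName(charsetName).encode(text);
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    // Decodes bytes back to a String with the named encoding.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "UTF-16" writes a BOM (FE FF) followed by big-endian code units.
        System.out.println(toHex(encodeWithString("A", "UTF-16")));    // FE FF 00 41
        System.out.println(toHex(encodeWithCharset("A", "UTF-16")));   // FE FF 00 41
        // "UTF-16BE" fixes the byte order and writes no BOM.
        System.out.println(toHex(encodeWithString("A", "UTF-16BE")));  // 00 41
        // Decoding with "UTF-16" consumes the BOM.
        System.out.println(decode(encodeWithString("A", "UTF-16"), "UTF-16")); // A
    }
}
```

Both routes give the same bytes for the same encoding name. The only difference between "UTF-16" and "UTF-16BE" in this output is the leading FE FF.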

Notes and sample code presented in this chapter are based on JDK 1.4.1_01.
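
Because the figures above are tied to one JDK release, readers on another release may want to list what their own runtime supports. The sketch below is one way to do that with Charset.availableCharsets(). The class name CharsetLister is hypothetical, and the number it prints will most likely differ from the count reported for JDK 1.4, since it counts canonical charset names on whatever JDK runs it.

```java
import java.nio.charset.Charset;
import java.util.Set;

// Hypothetical helper class, not part of the original tutorial.
public class CharsetLister {

    // Returns the canonical names of all charsets the running JVM supports.
    static Set<String> supportedNames() {
        return Charset.availableCharsets().keySet();
    }

    public static void main(String[] args) {
        Set<String> names = supportedNames();
        System.out.println("Supported charsets: " + names.size());
        for (String name : names) {
            System.out.println("  " + name);
        }
    }
}
```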

Dr. Herong Yang, updated in 2009