JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

Character Set Encoding Maps

This chapter provides tutorial notes and example code on character set encoding maps. Topics include an encoding map analyzer program; analyzing and printing encoding maps for US-ASCII, ISO-8859-1/Latin 1, Windows CP1252, Unicode UTF-8, UTF-16, UTF-16LE, and UTF-16BE; and a sample program to count valid characters in each character set encoding.

Character Set Encoding Map Analyzer

Character Set Encoding Maps - US-ASCII and ISO-8859-1/Latin 1

Character Set Encoding Maps - CP1252/Windows-1252

Character Set Encoding Maps - Unicode UTF-8

Character Set Encoding Maps - Unicode UTF-16, UTF-16LE, UTF-16BE

Character Counter Program for Any Given Encoding

Character Set Encoding Comparison

Conclusion:

  • A simple Java program can be used to print the encoding map of any given encoding supported by the JDK.
  • US-ASCII encoding has a code point range of 0x0000 - 0x007F.
  • ISO-8859-1/Latin 1 encoding has a code point range of 0x0000 - 0x00FF.
  • CP1252/Windows-1252 encoding has a code point range of 0x0000 - 0x00FF plus some values outside this range, such as 0x20AC (the euro sign).
  • UTF-8 encoding has a code point range of 0x0000 - 0xFFFF except the surrogate range 0xD800 - 0xDFFF.
  • A simple Java program can be used to count valid characters of any given encoding.
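
The two programs mentioned in the conclusions can be sketched roughly as follows. This is a minimal illustration, not the tutorial's original program: the class name EncodingMapSketch and the methods encode, mapLine and countValid are hypothetical names chosen here, the "XX" marker for an invalid character is an arbitrary choice, and the code assumes a JDK newer than 1.4 (it uses String.format and the for-each loop). It walks the 16-bit char values only and asks a CharsetEncoder, configured to report errors, whether each value can be encoded.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch class; not the tutorial's original program.
class EncodingMapSketch {

    // Encode one 16-bit char value; return null if the charset rejects it.
    static byte[] encode(Charset cs, char c) {
        CharsetEncoder encoder = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(new char[] {c}));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            return null; // unmappable character or unpaired surrogate
        }
    }

    // Format one line of the encoding map, e.g. code point, then encoded bytes.
    static String mapLine(String name, int codePoint) {
        byte[] bytes = encode(Charset.forName(name), (char) codePoint);
        StringBuilder line = new StringBuilder(String.format("%04X >", codePoint));
        if (bytes == null) {
            return line.append(" XX").toString(); // "XX" marks an invalid character
        }
        for (byte b : bytes) {
            line.append(String.format(" %02X", b & 0xFF));
        }
        return line.toString();
    }

    // Count the char values in 0x0000 - 0xFFFF that the encoding accepts.
    static int countValid(String name) {
        Charset cs = Charset.forName(name);
        int count = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            if (encode(cs, (char) i) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "US-ASCII";
        // Print a small slice of the map around the end of the ASCII range.
        for (int cp = 0x7E; cp <= 0x81; cp++) {
            System.out.println(mapLine(name, cp));
        }
        System.out.println(name + ": " + countValid(name) + " valid characters");
    }
}
```

Running the class with an encoding name as its argument, for example "java EncodingMapSketch ISO-8859-1", prints a few map lines and the count for that encoding. Because each char value is encoded on its own, both halves of a surrogate pair are rejected as unpaired surrogates, which is why the range 0xD800 - 0xDFFF is counted as invalid for the Unicode encodings.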

Notes and sample code below are based on JDK/J2SDK 1.4.1_01.

Dr. Herong Yang, updated in 2008