JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

Character Set Encoding Comparison

This section provides a tutorial example on how to compare some commonly used character set encodings in terms of the number of valid characters, byte sequence sizes, and US-ASCII compatibility.

Here is the output of my sample program, EncodingCounter.java, for the US-ASCII encoding:

US-ASCII encoding:
0000 > 00 - 007F > 7F = 128
0080 > XX - FFFF > XX = 65408
Total characters = 65536
Valid characters = 128
Invalid characters = 65408

This tells us that the US-ASCII character set has only 128 characters, 0x0000 through 0x007F, each encoded as a single byte with the same value.
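EncodingCounter.java itself is not listed in this section. The counting part can be sketched with the standard java.nio.charset API, assuming the program tests every char value from 0x0000 to 0xFFFF with CharsetEncoder.canEncode(char). This is only an illustrative sketch, not the author's program. The class name EncodingCountSketch and the method countValid() are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch: counts encodable chars; not the original EncodingCounter.java.
public class EncodingCountSketch {

    // Counts how many of the 65536 char values 0x0000-0xFFFF the named encoding can encode.
    static int countValid(String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        int valid = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            if (enc.canEncode((char) i)) valid++;
        }
        return valid;
    }

    public static void main(String[] args) {
        // Encoding name from the command line, defaulting to US-ASCII.
        String name = args.length > 0 ? args[0] : "US-ASCII";
        int valid = countValid(name);
        System.out.println(name + " encoding:");
        System.out.println("Total characters = " + 0x10000);
        System.out.println("Valid characters = " + valid);
        System.out.println("Invalid characters = " + (0x10000 - valid));
    }
}
```

Running "java EncodingCountSketch US-ASCII" reports 128 valid and 65408 invalid characters, the same totals as above. Note that surrogate values 0xD800 to 0xDFFF cannot be encoded on their own, which is why the Unicode encodings top out at 63488 rather than 65536.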

If you run EncodingCounter.java again with ISO-8859-1 (Latin 1) as the argument, you will get:

ISO-8859-1 encoding:
0000 > 00 - 00FF > FF = 256
0100 > XX - FFFF > XX = 65280
Total characters = 65536
Valid characters = 256
Invalid characters = 65280

This tells us that the ISO-8859-1 character set has only 256 characters, 0x0000 through 0x00FF, each encoded as a single byte.
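Each range line in the output reads "first char > its bytes - last char > its bytes = count", with "XX" marking characters the encoding cannot map. The sketch below reproduces that line format, assuming a new range starts only where validity changes (the original program may split ranges further, for example where the byte length changes). The class name EncodingRangeSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the range lines; not the original EncodingCounter.java.
public class EncodingRangeSketch {

    // Hex string of the bytes produced for one char, or "XX" if the char is unmappable.
    static String bytesOf(CharsetEncoder enc, char c) {
        if (!enc.canEncode(c)) return "XX";
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(new char[] {c}));
            StringBuilder sb = new StringBuilder();
            while (bb.hasRemaining()) sb.append(String.format("%02X", bb.get() & 0xFF));
            return sb.toString();
        } catch (CharacterCodingException e) {
            return "XX";
        }
    }

    // One line per run of consecutive chars that are all valid or all invalid.
    static List<String> ranges(String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        List<String> lines = new ArrayList<>();
        int start = 0;
        boolean startValid = enc.canEncode((char) 0);
        for (int i = 1; i <= 0x10000; i++) {
            // i == 0x10000 is a sentinel that closes the last run.
            boolean valid = i < 0x10000 && enc.canEncode((char) i);
            if (i == 0x10000 || valid != startValid) {
                int end = i - 1;
                lines.add(String.format("%04X > %s - %04X > %s = %d",
                        start, bytesOf(enc, (char) start),
                        end, bytesOf(enc, (char) end), end - start + 1));
                start = i;
                startValid = valid;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "ISO-8859-1";
        System.out.println(name + " encoding:");
        for (String line : ranges(name)) System.out.println(line);
    }
}
```

For US-ASCII and ISO-8859-1 this prints exactly the two range lines shown in the outputs above.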

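The following table is based on the output of the EncodingCounter.java program. It provides a brief comparison of some commonly used encodings. "Map Size" is the number of valid characters among the 65536 values 0x0000 to 0xFFFF, and the notes give the number of bytes the encoder outputs for one character: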

Encoding     Map      US-ASCII
Name         Size     Compatible   Notes

US-ASCII     128      Y            7-bit characters only
ISO-8859-1   256      Y            8-bit (single-byte) characters
CP1252       251      Y            1 byte, with code points up to 0x2122
UTF-8        63488    Y            1-3 bytes
UTF-16BE     63488    N            2 bytes, code point in big-endian byte order
UTF-16LE     63488    N            2 bytes, code point in little-endian byte order
UTF-16       63488    N            4 bytes: byte order mark FE FF + UTF-16BE bytes
GBK          24068    Y            1-2 bytes, Chinese 1993 standard
GB18030      63488    Y            1-4 bytes, superset of GBK, 2000 standard
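The "US-ASCII Compatible" column and the byte counts in the notes can also be derived programmatically. The sketch below assumes that "compatible" means every char 0x00 to 0x7F encodes to exactly one byte with the same value, and it measures byte lengths by encoding one char at a time, so Java's UTF-16 encoder contributes its 2-byte byte order mark to each result. CP1252, GBK and GB18030 are not among the charsets every JDK must support, so the sketch skips any that are missing. The class name EncodingTableSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch that recomputes two columns of the comparison table.
public class EncodingTableSketch {

    // Encodes a single char; the caller must have checked canEncode first.
    static ByteBuffer encodeOne(CharsetEncoder enc, char c) {
        try {
            return enc.encode(CharBuffer.wrap(new char[] {c}));
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("cannot encode U+" + Integer.toHexString(c), e);
        }
    }

    // True if every char 0x00-0x7F encodes to exactly one identical byte.
    static boolean asciiCompatible(String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        for (char c = 0; c < 0x80; c++) {
            if (!enc.canEncode(c)) return false;
            ByteBuffer bb = encodeOne(enc, c);
            if (bb.remaining() != 1 || bb.get(0) != (byte) c) return false;
        }
        return true;
    }

    // Smallest and largest byte-sequence length over all encodable chars 0x0000-0xFFFF.
    static int[] byteRange(String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        int min = Integer.MAX_VALUE, max = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            char c = (char) i;
            if (!enc.canEncode(c)) continue;
            int n = encodeOne(enc, c).remaining();
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8",
                          "UTF-16BE", "UTF-16LE", "UTF-16", "GBK", "GB18030"};
        for (String name : names) {
            if (!Charset.isSupported(name)) continue; // extended charsets may be absent
            int[] r = byteRange(name);
            System.out.printf("%-12s %s  %d-%d bytes%n",
                    name, asciiCompatible(name) ? "Y" : "N", r[0], r[1]);
        }
    }
}
```

Looking at single characters explains the table's Y/N split: UTF-8 outputs 41 for "A", while UTF-16BE outputs 00 41, so only the former can pass plain ASCII bytes through unchanged.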

Last update: 2006.

Dr. Herong Yang, updated in 2008