This section provides a tutorial example that compares some commonly used character set encodings by number of characters, byte sequence size, and US-ASCII compatibility.
Here is the output of my sample program, EncodingCounter.java, for the ISO-8859-1 encoding:
This tells us that the ISO-8859-1 character set has only 256 characters.
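The source of EncodingCounter.java is not shown here. The following hypothetical class, EncodingStats, is a rough sketch of how numbers like these could be computed with the java.nio.charset API. It is not the author's program. It assumes that only the 16-bit range U+0000 to U+FFFF is scanned. That assumption would be consistent with the Unicode encodings topping out at 63488, which is 65536 minus the 2048 surrogate code units that CharsetEncoder.canEncode(char) rejects.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/**
 * Hypothetical sketch, NOT the author's EncodingCounter.java.
 * Assumption: only the BMP (U+0000 to U+FFFF) is scanned.
 */
public class EncodingStats {

    // Counts the BMP characters the charset can encode.
    // canEncode(char) returns false for surrogates, so at most
    // 65536 - 2048 = 63488 characters can be counted.
    static int countEncodable(String name) {
        CharsetEncoder enc = Charset.forName(name).newEncoder();
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (enc.canEncode((char) cp)) {
                count++;
            }
        }
        return count;
    }

    // Returns {min, max} length of the byte sequence produced by
    // String.getBytes() for a single encodable character.
    static int[] byteRange(String name) {
        Charset cs = Charset.forName(name);
        CharsetEncoder enc = cs.newEncoder();
        int min = Integer.MAX_VALUE, max = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            char c = (char) cp;
            if (!enc.canEncode(c)) continue;
            int len = String.valueOf(c).getBytes(cs).length;
            min = Math.min(min, len);
            max = Math.max(max, len);
        }
        return new int[] {min, max};
    }

    // True if every US-ASCII character 0x00-0x7F encodes to the same single byte.
    static boolean isAsciiCompatible(String name) {
        Charset cs = Charset.forName(name);
        for (int c = 0; c < 0x80; c++) {
            byte[] b = String.valueOf((char) c).getBytes(cs);
            if (b.length != 1 || b[0] != c) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8",
                          "UTF-16BE", "UTF-16LE", "UTF-16", "GBK", "GB18030"};
        for (String name : names) {
            int[] r = byteRange(name);
            System.out.printf("%-12s %6d  %d-%d bytes  %s%n", name,
                    countEncodable(name), r[0], r[1],
                    isAsciiCompatible(name) ? "Y" : "N");
        }
    }
}
```

The UTF-16 byte range comes out as 4-4 because each String.getBytes() call with the UTF-16 charset writes a byte order mark before the character. Charset.forName throws UnsupportedCharsetException for any charset the running JVM does not provide. That could happen with legacy charsets such as GBK or GB18030 on a trimmed runtime.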
The following table is based on the output of the EncodingCounter.java program.
It briefly compares some commonly used encodings by map size (the number of characters each encoding can represent), byte sequence size, and US-ASCII compatibility:
Encoding    Map Size  US-ASCII Compatible  Notes
US-ASCII    128       Y                    7-bit characters only
ISO-8859-1  256       Y                    8-bit (single-byte) characters
CP1252      251       Y                    1 byte, with code points up to 0x2122
UTF-8       63488     Y                    1-3 bytes
UTF-16BE    63488     N                    2 bytes, same (big-endian) byte order as the code point
UTF-16LE    63488     N                    2 bytes, code point bytes in reversed (little-endian) order
UTF-16      63488     N                    4 bytes for one character: 2-byte byte order mark (FE FF), then the UTF-16BE bytes
GBK         24068     Y                    1-2 bytes, Chinese 1993 standard
GB18030     63488     Y                    1-4 bytes, superset of GBK, 2000 standard
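The Notes column is easier to see with actual bytes. The hypothetical class ByteSequenceDemo below is an illustrative sketch, not part of the original tutorial. It prints the hex bytes that String.getBytes() produces for a few sample characters in each encoding. The sample characters and the method name hex() are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;

/** Hypothetical demo: prints the byte sequence of sample characters per encoding. */
public class ByteSequenceDemo {

    // Encodes text with the named charset and returns the bytes as hex, e.g. "FE FF 00 41".
    // Unmappable characters are replaced with the charset's default replacement bytes.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print the unsigned value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A, e with acute, euro sign, CJK character U+4E2D
        String[] samples = {"A", "\u00E9", "\u20AC", "\u4E2D"};
        String[] names = {"US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8",
                          "UTF-16BE", "UTF-16LE", "UTF-16", "GBK", "GB18030"};
        for (String s : samples) {
            System.out.printf("U+%04X%n", (int) s.charAt(0));
            for (String name : names) {
                System.out.printf("  %-12s %s%n", name, hex(s, name));
            }
        }
    }
}
```

For "A", the single-byte and ASCII-compatible encodings all produce 41. UTF-16BE produces 00 41, UTF-16LE produces 41 00, and UTF-16 produces FE FF 00 41, which matches the UTF-16 rows of the table.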