Character Set Encoding Comparison

This section provides a tutorial example on how to compare some commonly used character set encodings by number of characters, byte sequence size, and ASCII compatibility.

Here is the output of my sample program, EncodingCounter.java, for US-ASCII encoding:

herong> java EncodingCounter.java US-ASCII

US-ASCII encoding:
0000 > 00 - 007F > 7F = 128
0080 > XX - FFFF > XX = 65408
Total characters = 65536
Valid characters = 128
Invalid characters = 65408

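This tells us that the US-ASCII character set has only 128 characters, code points 0x0000 through 0x007F. All other code points up to 0xFFFF are invalid in this encoding.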
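
The counting idea behind this output can be sketched as follows. This is a minimal illustration and not the original EncodingCounter.java: the class name EncodableCountSketch and the method countEncodable() are hypothetical, and the sketch assumes that CharsetEncoder.canEncode() is an acceptable test for a "valid" character. The original program may use a different test.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch, not the original EncodingCounter.java.
class EncodableCountSketch {

    // Counts the char values 0x0000-0xFFFF that the named charset can encode.
    static int countEncodable(String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        int valid = 0;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            // canEncode() returns false for unmappable characters
            // and for lone surrogate characters.
            if (encoder.canEncode((char) cp)) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // Assumption: the encoding name is the first argument,
        // as in the command lines shown in this section.
        String name = args.length > 0 ? args[0] : "US-ASCII";
        int total = 0x10000;
        int valid = countEncodable(name);
        System.out.println(name + " encoding:");
        System.out.println("Total characters = " + total);
        System.out.println("Valid characters = " + valid);
        System.out.println("Invalid characters = " + (total - valid));
    }
}
```

The sketch prints only the three summary lines. It does not reproduce the range lines such as "0000 > 00 - 007F > 7F = 128".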

If you run EncodingCounter.java again with ISO-8859-1 (Latin 1) as the argument, you will get:

herong> java EncodingCounter.java ISO-8859-1

ISO-8859-1 encoding:
0000 > 00 - 00FF > FF = 256
0100 > XX - FFFF > XX = 65280
Total characters = 65536
Valid characters = 256
Invalid characters = 65280

This tells us that the ISO-8859-1 character set has only 256 characters, code points 0x0000 through 0x00FF.
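
To see the difference between the two encodings on a single character, here is a small sketch that encodes U+00E9 (Latin small letter e with acute) with both. The class name ReplacementByteSketch and the helper encodeHex() are hypothetical names used for illustration only. The sketch relies on the documented behavior of String.getBytes(Charset), which replaces an unmappable character with the charset's default replacement bytes.

```java
import java.nio.charset.Charset;

// Hypothetical sketch for illustration only.
class ReplacementByteSketch {

    // Encodes the text with the named charset and returns the bytes as hex digits.
    static String encodeHex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String s = "\u00E9";
        // Valid in ISO-8859-1: encoded as the single byte E9.
        System.out.println("ISO-8859-1: " + encodeHex(s, "ISO-8859-1"));
        // Invalid in US-ASCII: replaced by 3F, the '?' character.
        System.out.println("US-ASCII:   " + encodeHex(s, "US-ASCII"));
    }
}
```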

The following table is based on the output of the EncodingCounter.java program. It provides a brief comparison of some commonly used encodings:

Encoding     Map     US-ASCII
Name         Size    Compatible   Notes
----------   -----   ----------   ------------------------------------------
US-ASCII     128     Y            7-bit characters only
ISO-8859-1   256     Y            8-bit (single byte) characters
CP1252       251     Y            1 byte, with code points up to 0x2122
UTF-8        63488   Y            1-3 bytes
UTF-16BE     63488   N            2 bytes, copying the code point bytes as is
UTF-16LE     63488   N            2 bytes, reversing the code point bytes
UTF-16       63488   N            4 bytes, last 2 bytes = UTF-16BE
GBK          24068   Y            1-2 bytes, Chinese 1993 standard
GB18030      63488   Y            1-4 bytes, superset of GBK, 2000 standard
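
The "US-ASCII Compatible" and byte size columns can be spot-checked with a short sketch like the one below. It is not part of EncodingCounter.java: the class name AsciiCompatibilitySketch and its methods are hypothetical. It also assumes that "compatible" means every code point from 0x00 to 0x7F is encoded as the single byte of the same value. Only the six charsets that every Java platform is required to support are listed, so CP1252, GBK and GB18030 are left out.

```java
import java.nio.charset.Charset;

// Hypothetical sketch for illustration only.
class AsciiCompatibilitySketch {

    // True if each code point 0x00-0x7F encodes to one byte with the same value.
    static boolean isAsciiCompatible(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        for (int cp = 0x00; cp <= 0x7F; cp++) {
            byte[] bytes = String.valueOf((char) cp).getBytes(cs);
            if (bytes.length != 1 || bytes[0] != (byte) cp) {
                return false;
            }
        }
        return true;
    }

    // Number of bytes produced when the text is encoded on its own.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String[] names = {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
        };
        for (String name : names) {
            System.out.println(name
                + ": ASCII compatible = " + (isAsciiCompatible(name) ? "Y" : "N")
                + ", bytes for 'A' = " + encodedLength("A", name));
        }
    }
}
```

Encoding a single character with UTF-16 gives 4 bytes because Java writes a 2-byte byte order mark before the 2-byte big-endian value, which matches the "last 2 bytes = UTF-16BE" note in the table.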
