JDK Tutorials - Herong's Tutorial Examples - v6.32, by Herong Yang
Character Set Encoding Comparison
This section provides a tutorial example on how to compare some commonly used character set encodings by number of valid characters, byte sequence size, and US-ASCII compatibility.
Here is the output of my sample program, EncodingCounter.java, for US-ASCII encoding:
herong> java EncodingCounter.java US-ASCII

US-ASCII encoding:
0000 > 00 - 007F > 7F = 128
0080 > XX - FFFF > XX = 65408

Total characters = 65536
Valid characters = 128
Invalid characters = 65408
This tells us that the US-ASCII character set has only 128 characters.
If you run EncodingCounter.java again with ISO-8859-1 (Latin 1) as the argument, you will get:
herong> java EncodingCounter.java ISO-8859-1

ISO-8859-1 encoding:
0000 > 00 - 00FF > FF = 256
0100 > XX - FFFF > XX = 65280

Total characters = 65536
Valid characters = 256
Invalid characters = 65280
This tells us that the ISO-8859-1 character set has only 256 characters.
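The counts above can be approximated with a short sketch. This is not the EncodingCounter.java source, which is not shown in this section. The class name EncodingCountSketch and the method countValid are hypothetical, and the sketch assumes that CharsetEncoder.canEncode() is an acceptable test for a "valid" character, which may differ from the test the original program uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch, not the tutorial's EncodingCounter.java.
class EncodingCountSketch {

    // Counts how many char values in 0x0000..0xFFFF the named
    // charset can encode when each one is taken on its own.
    public static int countValid(String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        int valid = 0;
        for (int c = 0x0000; c <= 0xFFFF; c++) {
            // Assumption: "valid" means canEncode() returns true.
            if (encoder.canEncode((char) c)) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "US-ASCII";
        int total = 0x10000;
        int valid = countValid(name);
        System.out.println(name + " encoding:");
        System.out.println("Total characters = " + total);
        System.out.println("Valid characters = " + valid);
        System.out.println("Invalid characters = " + (total - valid));
    }
}
```

Under that assumption, the sketch should report 128 valid characters for US-ASCII and 256 for ISO-8859-1, matching the output shown above. It does not print the per-range lines such as "0000 > 00 - 007F > 7F = 128".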
The following table is based on the output of the EncodingCounter.java program. It provides a brief comparison of some commonly used encodings:
Encoding    Map    US-ASCII
Name        Size   Compatible  Notes
----------  -----  ----------  ------------------
US-ASCII      128  Y           7-bit characters only
ISO-8859-1    256  Y           8-bit (single byte) characters
CP1252        251  Y           One byte output, with code points up to 0x2122
UTF-8       63488  Y           1-3 bytes
UTF-16BE    63488  N           2 bytes, carbon copying the code points
UTF-16LE    63488  N           2 bytes, reversing the code points
UTF-16      63488  N           4 bytes, last 2 bytes = UTF-16BE
GBK         24068  Y           1-2 bytes, Chinese 1993 standard
GB18030     63488  Y           1-4 bytes, superset of GBK, 2000 standard
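The "US-ASCII Compatible" and byte size columns can be checked with a sketch like the one below. The class EncodingCompareSketch and its methods isAsciiCompatible and byteLengthRange are hypothetical names introduced here for illustration. The sketch assumes "compatible" means that every character in 0x00..0x7F encodes to one byte of the same value, and it measures sizes one character at a time with String.getBytes().

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch for checking the comparison table.
class EncodingCompareSketch {

    // True if every US-ASCII character (0x00..0x7F) encodes to
    // exactly one byte with the same numeric value.
    public static boolean isAsciiCompatible(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        for (int c = 0x00; c <= 0x7F; c++) {
            byte[] b = String.valueOf((char) c).getBytes(cs);
            if (b.length != 1 || b[0] != (byte) c) {
                return false;
            }
        }
        return true;
    }

    // Returns {min, max} encoded length in bytes over all
    // char values 0x0000..0xFFFF that the charset can encode.
    public static int[] byteLengthRange(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        CharsetEncoder encoder = cs.newEncoder();
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c = 0x0000; c <= 0xFFFF; c++) {
            if (!encoder.canEncode((char) c)) {
                continue; // skip characters the charset cannot encode
            }
            int n = String.valueOf((char) c).getBytes(cs).length;
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // Only charsets that every Java platform must support are
        // listed; GBK and GB18030 may be missing on minimal runtimes.
        String[] names = {"US-ASCII", "ISO-8859-1", "UTF-8",
                          "UTF-16BE", "UTF-16LE", "UTF-16"};
        for (String name : names) {
            int[] range = byteLengthRange(name);
            System.out.println(name
                + ": ASCII compatible = " + (isAsciiCompatible(name) ? "Y" : "N")
                + ", bytes per character = " + range[0] + "-" + range[1]);
        }
    }
}
```

The 4-byte size for UTF-16 in the table comes from encoding one character at a time: Java's UTF-16 encoder writes a 2-byte byte order mark before the 2-byte big-endian code unit, so each single-character string produces 4 bytes.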