JDK Tutorials - Herong's Tutorial Examples - v6.32, by Herong Yang
Running EncodingSampler.java with UTF-8, UTF-16, UTF-16BE
This section provides a tutorial example on how to run the character encoding sample program with UTF-8, UTF-16 and UTF-16BE, three encodings of the Unicode character set.
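The source of EncodingSampler.java is not reproduced in this section. The sketch below is a rough, hypothetical stand-in for the program. It assumes that the String, Writer, Charset and Encoder columns come from String.getBytes(), an OutputStreamWriter, Charset.encode() and CharsetEncoder.encode(), and it prints rows in the same format. The class name EncodingRowSketch and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class EncodingRowSketch {
    // Formats bytes as space-separated two-digit hex, e.g. "C2 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Copies the remaining bytes of a ByteBuffer into a new array.
    static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Hypothetical helper: encodes one char four ways and formats one output row.
    static String row(char c, Charset cs) {
        try {
            String s = String.valueOf(c);

            // 1. String.getBytes()
            byte[] viaString = s.getBytes(cs);

            // 2. OutputStreamWriter over an in-memory byte stream
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bos, cs)) {
                w.write(c);
            }
            byte[] viaWriter = bos.toByteArray();

            // 3. Charset.encode()
            byte[] viaCharset = toArray(cs.encode(s));

            // 4. CharsetEncoder.encode() on a fresh encoder
            byte[] viaEncoder = toArray(cs.newEncoder().encode(CharBuffer.wrap(s)));

            return String.format("%04X, %s, %s, %s, %s", (int) c,
                hex(viaString), hex(viaWriter), hex(viaCharset), hex(viaEncoder));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "UTF-8");
        System.out.println(cs.name() + " encoding:");
        System.out.println("Char, String, Writer, Charset, Encoder");
        for (char c : new char[] { 0x0000, 0x007F, 0x0080, 0x0100, 0xFFFF }) {
            System.out.println(row(c, cs));
        }
    }
}
```

Running it as `java EncodingRowSketch.java UTF-16` prints rows with the same shape as the sampler output. It only covers a handful of sample characters.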
I think we are ready to try an encoding that is designed for the Unicode character set, UTF-8:
herong> java EncodingSampler.java UTF-8
UTF-8 encoding:
Char, String, Writer, Charset, Encoder
0000, 00, 00, 00, 00
003F, 3F, 3F, 3F, 3F
0040, 40, 40, 40, 40
007F, 7F, 7F, 7F, 7F
0080, C2 80, C2 80, C2 80, C2 80
00BF, C2 BF, C2 BF, C2 BF, C2 BF
00C0, C3 80, C3 80, C3 80, C3 80
00FF, C3 BF, C3 BF, C3 BF, C3 BF
0100, C4 80, C4 80, C4 80, C4 80
3FFF, E3 BF BF, E3 BF BF, E3 BF BF, E3 BF BF
4000, E4 80 80, E4 80 80, E4 80 80, E4 80 80
7FFF, E7 BF BF, E7 BF BF, E7 BF BF, E7 BF BF
8000, E8 80 80, E8 80 80, E8 80 80, E8 80 80
BFFF, EB BF BF, EB BF BF, EB BF BF, EB BF BF
C000, EC 80 80, EC 80 80, EC 80 80, EC 80 80
EFFF, EE BF BF, EE BF BF, EE BF BF, EE BF BF
F000, EF 80 80, EF 80 80, EF 80 80, EF 80 80
FFFF, EF BF BF, EF BF BF, EF BF BF, EF BF BF
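UTF-8 generates variable-length byte sequences:

- one byte for U+0000 to U+007F,
- two bytes for U+0080 to U+07FF,
- three bytes for U+0800 to U+FFFF.

Characters beyond U+FFFF take four bytes, but this test does not cover them.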
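The byte patterns in the table follow directly from the UTF-8 bit layout. The sketch below is a hand-rolled encoder for a single BMP char that reproduces those rows and checks them against the JDK. It assumes the char is not a surrogate (U+D800 to U+DFFF), so the 4-byte case is left out. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BmpSketch {
    // Hand-rolled UTF-8 encoder for one BMP char (U+0000..U+FFFF).
    // Assumption: c is not a surrogate; supplementary characters
    // (4-byte UTF-8 sequences) are deliberately not handled here.
    static byte[] encodeUtf8(char c) {
        if (c < 0x80) {                       // 1 byte:  0xxxxxxx
            return new byte[] { (byte) c };
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F)) };
        } else {                              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F)) };
        }
    }

    // Formats bytes as space-separated two-digit hex, e.g. "C2 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] samples = { 0x0000, 0x007F, 0x0080, 0x00FF, 0x0100, 0x3FFF, 0x8000, 0xFFFF };
        for (char c : samples) {
            byte[] mine = encodeUtf8(c);
            byte[] jdk = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            System.out.printf("%04X, %s, matches JDK: %b%n", (int) c, hex(mine), Arrays.equals(mine, jdk));
        }
    }
}
```

For example, U+0080 is `000 1000 0000` in binary. The top 5 bits go into `110xxxxx` (giving C2) and the low 6 bits go into `10xxxxxx` (giving 80).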
The second test is for another Unicode-related encoding, UTF-16:
herong> java EncodingSampler.java UTF-16
UTF-16 encoding:
Char, String, Writer, Charset, Encoder
0000, FE FF 00 00, FE FF 00 00, FE FF 00 00, FE FF 00 00
003F, FE FF 00 3F, FE FF 00 3F, FE FF 00 3F, FE FF 00 3F
0040, FE FF 00 40, FE FF 00 40, FE FF 00 40, FE FF 00 40
007F, FE FF 00 7F, FE FF 00 7F, FE FF 00 7F, FE FF 00 7F
0080, FE FF 00 80, FE FF 00 80, FE FF 00 80, FE FF 00 80
00BF, FE FF 00 BF, FE FF 00 BF, FE FF 00 BF, FE FF 00 BF
00C0, FE FF 00 C0, FE FF 00 C0, FE FF 00 C0, FE FF 00 C0
00FF, FE FF 00 FF, FE FF 00 FF, FE FF 00 FF, FE FF 00 FF
0100, FE FF 01 00, FE FF 01 00, FE FF 01 00, FE FF 01 00
3FFF, FE FF 3F FF, FE FF 3F FF, FE FF 3F FF, FE FF 3F FF
4000, FE FF 40 00, FE FF 40 00, FE FF 40 00, FE FF 40 00
7FFF, FE FF 7F FF, FE FF 7F FF, FE FF 7F FF, FE FF 7F FF
8000, FE FF 80 00, FE FF 80 00, FE FF 80 00, FE FF 80 00
BFFF, FE FF BF FF, FE FF BF FF, FE FF BF FF, FE FF BF FF
C000, FE FF C0 00, FE FF C0 00, FE FF C0 00, FE FF C0 00
EFFF, FE FF EF FF, FE FF EF FF, FE FF EF FF, FE FF EF FF
F000, FE FF F0 00, FE FF F0 00, FE FF F0 00, FE FF F0 00
FFFF, FE FF FF FF, FE FF FF FF, FE FF FF FF, FE FF FF FF
This was a surprise to me. Why does UTF-16 generate 32-bit sequences? Why not call it UTF-32?
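I found the answer later. The first 16 bits, 0xFEFF (bytes FE FF), are not part of the encoded character. They are the byte order mark (BOM), a flag indicating that the bytes that follow are in big-endian order, the same layout as UTF-16BE. Java's UTF-16 encoder writes big-endian output preceded by this BOM. Because the program encodes each character separately, every row starts with its own BOM.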
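The BOM behavior can be seen directly with the standard charsets. The minimal sketch below encodes U+0100 with UTF-16, UTF-16BE and UTF-16LE. It then decodes a little-endian byte sequence that starts with the BOM FF FE. When decoding, Java's UTF-16 charset uses the BOM to pick the byte order and does not include it in the resulting string. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomSketch {
    // Formats bytes as space-separated two-digit hex, e.g. "FE FF 01 00".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0100";
        // "UTF-16" encoding writes a big-endian BOM (FE FF), then big-endian code units.
        System.out.println("UTF-16:   " + hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 01 00
        // The explicit-endian charsets write no BOM.
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 01 00
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 00 01

        // Decoding with "UTF-16": the BOM FF FE selects little-endian and is consumed.
        byte[] leWithBom = { (byte) 0xFF, (byte) 0xFE, 0x00, 0x01 };
        String decoded = new String(leWithBom, StandardCharsets.UTF_16);
        System.out.printf("decoded: U+%04X, length %d%n", (int) decoded.charAt(0), decoded.length()); // U+0100, length 1
    }
}
```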
Here is the result of the third test, on another Unicode encoding, UTF-16BE:
herong> java EncodingSampler.java UTF-16BE
UTF-16BE encoding:
Char, String, Writer, Charset, Encoder
0000, 00 00, 00 00, 00 00, 00 00
003F, 00 3F, 00 3F, 00 3F, 00 3F
0040, 00 40, 00 40, 00 40, 00 40
007F, 00 7F, 00 7F, 00 7F, 00 7F
0080, 00 80, 00 80, 00 80, 00 80
00BF, 00 BF, 00 BF, 00 BF, 00 BF
00C0, 00 C0, 00 C0, 00 C0, 00 C0
00FF, 00 FF, 00 FF, 00 FF, 00 FF
0100, 01 00, 01 00, 01 00, 01 00
3FFF, 3F FF, 3F FF, 3F FF, 3F FF
4000, 40 00, 40 00, 40 00, 40 00
7FFF, 7F FF, 7F FF, 7F FF, 7F FF
8000, 80 00, 80 00, 80 00, 80 00
BFFF, BF FF, BF FF, BF FF, BF FF
C000, C0 00, C0 00, C0 00, C0 00
EFFF, EF FF, EF FF, EF FF, EF FF
F000, F0 00, F0 00, F0 00, F0 00
FFFF, FF FF, FF FF, FF FF, FF FF
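This seems to be the perfect encoding. For every character tested here, the output bytes are identical to the input char value, written high byte first, with no byte order mark.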
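The identity in the UTF-16BE table amounts to splitting each 16-bit char into its high and low bytes. The sketch below does that split by hand and compares the result with the JDK. The identity holds per 16-bit char value. A character above U+FFFF, such as U+1F600 (not part of the test above), is stored in Java as two chars (a surrogate pair), so UTF-16BE gives 4 bytes for it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BeSketch {
    // For a single BMP char, UTF-16BE is just the 16-bit value, high byte first.
    static byte[] encodeUtf16Be(char c) {
        return new byte[] { (byte) (c >> 8), (byte) c };
    }

    // Formats bytes as space-separated two-digit hex, e.g. "EF FF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] samples = { 0x0000, 0x00FF, 0x0100, 0x8000, 0xEFFF, 0xFFFF };
        for (char c : samples) {
            byte[] mine = encodeUtf16Be(c);
            byte[] jdk = String.valueOf(c).getBytes(StandardCharsets.UTF_16BE);
            System.out.printf("%04X, %s, matches JDK: %b%n", (int) c, hex(mine), Arrays.equals(mine, jdk));
        }
        // A supplementary character becomes a surrogate pair: two chars, four bytes.
        String grin = new String(Character.toChars(0x1F600));
        System.out.println("U+1F600: " + hex(grin.getBytes(StandardCharsets.UTF_16BE))); // D8 3D DE 00
    }
}
```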