JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

Character Set Encoding Maps - Unicode UTF-16, UTF-16LE, UTF-16BE

This section provides a tutorial example of analyzing and printing the character set encoding maps of three encodings for the Unicode character set: UTF-16, UTF-16LE, and UTF-16BE.

Here is the output of my sample program, EncodingAnalyzer.java, for UTF-16 encoding:

Code Point > Byte Sequence - Code Point > Byte Sequence

0000 > FE FF 00 00 - 00FF > FE FF 00 FF
0100 > FE FF 01 00 - 01FF > FE FF 01 FF
0200 > FE FF 02 00 - 02FF > FE FF 02 FF
......
D700 > FE FF D7 00 - D7FF > FE FF D7 FF
D800 > FE FF FF FD - DFFF > FE FF FF FD
E000 > FE FF E0 00 - E0FF > FE FF E0 FF
E100 > FE FF E1 00 - E1FF > FE FF E1 FF
E200 > FE FF E2 00 - E2FF > FE FF E2 FF
......
FF00 > FE FF FF 00 - FFFF > FE FF FF FF
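EncodingAnalyzer.java itself is not reproduced in this section. As a rough sketch of how a map like the one above can be produced, the class below (the name Utf16MapSketch is hypothetical) encodes the first and last code point of each 256-code-point block on its own with String.getBytes() and prints the bytes in the same "Code Point > Byte Sequence" format. It assumes Java 7 or later for StandardCharsets; on older JDKs, Charset.forName("UTF-16") gives the same charset. Unlike the listing above, it prints all 256 rows without "......" and does not merge the surrogate blocks into one line.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the author's EncodingAnalyzer.java: prints one row
// per 256-code-point block of 0x0000 - 0xFFFF, showing the bytes produced for
// the first and last code point of each block.
class Utf16MapSketch {

    // Encodes a single code point on its own. String.getBytes() replaces a
    // lone surrogate with the encoder's replacement bytes instead of failing.
    static byte[] encode(int codePoint, Charset cs) {
        return new String(Character.toChars(codePoint)).getBytes(cs);
    }

    // Formats bytes as upper-case hex pairs separated by spaces, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // One output row, "start > bytes - end > bytes", for the block beginning at start.
    static String row(int start, Charset cs) {
        int end = start + 0xFF;
        return String.format("%04X > %s - %04X > %s",
                start, hex(encode(start, cs)), end, hex(encode(end, cs)));
    }

    public static void main(String[] args) {
        // Optional argument: an encoding name such as UTF-16LE; defaults to UTF-16.
        Charset cs = args.length > 0 ? Charset.forName(args[0]) : StandardCharsets.UTF_16;
        System.out.println("Code Point > Byte Sequence - Code Point > Byte Sequence");
        for (int start = 0x0000; start <= 0xFF00; start += 0x100) {
            System.out.println(row(start, cs));
        }
    }
}
```

Running it with an argument, for example "java Utf16MapSketch UTF-16BE", prints the map for that encoding instead.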

The encoding map of UTF-16, another encoding used for the Unicode character set, is much simpler than that of UTF-8:

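  • Each code point in the range 0x0000 - 0xFFFF is encoded as 2 bytes, in big-endian order. The leading 0xFE 0xFF is not part of the character: it is the byte order mark (BOM), which Java's UTF-16 encoder writes at the start of its output. It appears in every row here because each code point is encoded separately. Code points above 0xFFFF, which this map does not cover, are encoded as 4-byte surrogate pairs.
  • It is not backward compatible with US-ASCII: an ASCII character such as 0x41 is encoded as 00 41, not 41.
  • The code points 0xD800 - 0xDFFF are not valid on their own, because they are reserved for surrogate pairs. Java replaces each of them with the replacement character 0xFFFD, which is why they all map to FF FD.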
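These properties can be checked by encoding a few strings directly. The sketch below (the class name Utf16Properties is hypothetical) uses StandardCharsets.UTF_16, which assumes Java 7 or later, and prints the bytes for "A", for a lone 0xD800, and for U+1F600, a code point beyond 0xFFFF that the map above does not cover.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: prints Java's UTF-16 bytes for a few sample strings.
class Utf16Properties {

    // Encodes a string with the "UTF-16" charset and returns the bytes as hex pairs.
    static String hexUtf16(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (0x0041): BOM FE FF, then 00 41. The 0x00 byte breaks ASCII compatibility.
        System.out.println("0041  > " + hexUtf16("A"));
        // Lone surrogate 0xD800: cannot be encoded alone, replaced by U+FFFD (FF FD).
        System.out.println("D800  > " + hexUtf16(String.valueOf((char) 0xD800)));
        // U+1F600: beyond 0xFFFF, so it becomes the surrogate pair D83D DE00 (4 bytes).
        System.out.println("1F600 > " + hexUtf16(new String(Character.toChars(0x1F600))));
    }
}
```

The three lines should show FE FF 00 41, FE FF FF FD (matching the D800 row of the map), and FE FF D8 3D DE 00.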

Here is the output for UTF-16LE encoding, the little-endian variation of UTF-16 encoding:

Code Point > Byte Sequence - Code Point > Byte Sequence

0000 > 00 00 - D7FF > FF D7
D800 > FD FF - DFFF > FD FF
E000 > 00 E0 - FFFF > FF FF

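The encoding map of UTF-16LE is very simple:

  • The output sequence has a fixed length of 2 bytes, and no byte order mark is written.
  • It is not backward compatible with US-ASCII.
  • The code points 0xD800 - 0xDFFF are not valid on their own; each is replaced with 0xFFFD, encoded as FD FF.
  • All other code points are encoded by swapping the two bytes of the code point value: the low-order byte comes first, then the high-order byte.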
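The byte-swapping rule can be verified by brute force. This sketch (the class name Utf16LeSwap is hypothetical; StandardCharsets assumes Java 7 or later) builds the low-order and high-order bytes by hand and compares them with String.getBytes(StandardCharsets.UTF_16LE) for every code point from 0x0000 to 0xFFFF except the surrogate range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: checks that UTF-16LE output is the code point's two
// bytes in reversed (low-order first) order, for all non-surrogate BMP code points.
class Utf16LeSwap {

    // Little-endian: low-order byte first, then high-order byte.
    static byte[] manualLE(int cp) {
        return new byte[] { (byte) (cp & 0xFF), (byte) (cp >>> 8) };
    }

    static boolean matchesForAllNonSurrogates() {
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue; // these become FD FF instead
            byte[] actual = String.valueOf((char) cp).getBytes(StandardCharsets.UTF_16LE);
            if (!Arrays.equals(actual, manualLE(cp))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("UTF-16LE is a byte swap: " + matchesForAllNonSurrogates());
    }
}
```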

Here is the output for UTF-16BE encoding, the big-endian variation of UTF-16 encoding:

Code Point > Byte Sequence - Code Point > Byte Sequence

0000 > 00 00 - 00FF > 00 FF
0100 > 01 00 - 01FF > 01 FF
0200 > 02 00 - 02FF > 02 FF
......
D700 > D7 00 - D7FF > D7 FF
D800 > FF FD - DFFF > FF FD
E000 > E0 00 - E0FF > E0 FF
E100 > E1 00 - E1FF > E1 FF
E200 > E2 00 - E2FF > E2 FF
......
FF00 > FF 00 - FFFF > FF FF

The encoding map of UTF-16BE is also simple:

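  • The output sequence has a fixed length of 2 bytes, and no byte order mark is written.
  • It is not backward compatible with US-ASCII.
  • The code points 0xD800 - 0xDFFF are not valid on their own; each is replaced with 0xFFFD, encoded as FF FD.
  • All other code points are encoded by copying the two bytes of the code point value unchanged: the high-order byte comes first, then the low-order byte. Apart from the missing byte order mark, this is the same as the UTF-16 map.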
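A similar brute-force sketch (the class name Utf16BeCopy is hypothetical; StandardCharsets assumes Java 7 or later) checks the UTF-16BE rule. It also checks that Java's UTF-16 output for a single character is simply the FE FF byte order mark followed by the UTF-16BE bytes, which is how the Java Charset documentation describes the UTF-16 encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: checks that UTF-16BE copies the code point's two bytes
// in order, and that UTF-16 equals FE FF followed by the UTF-16BE bytes.
class Utf16BeCopy {

    // Big-endian: high-order byte first, then low-order byte.
    static byte[] manualBE(int cp) {
        return new byte[] { (byte) (cp >>> 8), (byte) (cp & 0xFF) };
    }

    // Prepends the big-endian byte order mark FE FF to the given bytes.
    static byte[] withBom(byte[] be) {
        byte[] out = new byte[be.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(be, 0, out, 2, be.length);
        return out;
    }

    static boolean checkAllNonSurrogates() {
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue; // these become FF FD instead
            String s = String.valueOf((char) cp);
            byte[] be = s.getBytes(StandardCharsets.UTF_16BE);
            if (!Arrays.equals(be, manualBE(cp))) return false;
            if (!Arrays.equals(s.getBytes(StandardCharsets.UTF_16), withBom(be))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("UTF-16BE copies the bytes, UTF-16 = FE FF + UTF-16BE: "
                + checkAllNonSurrogates());
    }
}
```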

Last update: 2006.


Dr. Herong Yang, updated in 2008