This section provides a tutorial example of analyzing and printing the character set encoding maps of 3 encodings, UTF-16, UTF-16LE, and UTF-16BE, for the Unicode character set.
Here is the output of my sample program, EncodingAnalyzer.java, for UTF-16 encoding:
Code Point > Byte Sequence - Code Point > Byte Sequence
0000 > FE FF 00 00 - 00FF > FE FF 00 FF
0100 > FE FF 01 00 - 01FF > FE FF 01 FF
0200 > FE FF 02 00 - 02FF > FE FF 02 FF
......
D700 > FE FF D7 00 - D7FF > FE FF D7 FF
D800 > FE FF FF FD - DFFF > FE FF FF FD
E000 > FE FF E0 00 - E0FF > FE FF E0 FF
E100 > FE FF E1 00 - E1FF > FE FF E1 FF
E200 > FE FF E2 00 - E2FF > FE FF E2 FF
......
FF00 > FE FF FF 00 - FFFF > FE FF FF FF
The encoding map of UTF-16, which is another encoding used for the Unicode character set, is much simpler than UTF-8:
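The output sequence has a fixed length of 2 bytes for every code point in this map (0x0000 - 0xFFFF). Note that the leading 0xFEFF is the byte order mark (BOM), which flags the bytes that follow as big-endian; it is not part of the encoded character.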
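It is not backward compatible with US-ASCII, because even a US-ASCII character like 0x41 is encoded as 2 bytes, 00 41, instead of the single byte 41.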
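One section of code points is not valid on its own: 0xD800 - 0xDFFF. These code points are reserved as surrogates, and the output shows each of them replaced with FF FD, the encoding of the replacement character 0xFFFD.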
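The observations above can be reproduced with a few lines of Java. This is a minimal sketch, not the EncodingAnalyzer.java program itself, which is not listed in this section. The class name Utf16MapSketch and the helper encodeHex() are hypothetical names chosen for illustration. The sketch encodes a handful of sample code points with the standard "UTF-16" charset and prints each byte sequence in the same "Code Point > Byte Sequence" style as the map:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only; not the author's EncodingAnalyzer.
class Utf16MapSketch {

    // Encodes a single code point with the given charset and
    // returns the resulting bytes as spaced hex digits, e.g. "FE FF 00 41".
    static String encodeHex(int codePoint, Charset charset) {
        String text = new String(Character.toChars(codePoint));
        byte[] bytes = text.getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample code points: US-ASCII 'A', both ends of the surrogate
        // range, and the code points just outside that range.
        int[] samples = {0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000};
        for (int cp : samples) {
            System.out.println(String.format("%04X > %s",
                cp, encodeHex(cp, StandardCharsets.UTF_16)));
        }
    }
}
```

Each printed line should start with FE FF, the byte order mark. The line for 0041 should end with 00 41, which shows why UTF-16 is not compatible with US-ASCII. The lines for D800 and DFFF should end with FF FD, matching the D800 row of the map above.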
Here is the output for UTF-16LE encoding, the little-endian variation of UTF-16 encoding: