Let's try another Unicode-related encoding, UTF-16:
UTF-16 encoding:
Char, String, Writer, Charset, Encoder
0000, FE FF 00 00, FE FF 00 00, FE FF 00 00, FE FF 00 00
003F, FE FF 00 3F, FE FF 00 3F, FE FF 00 3F, FE FF 00 3F
0040, FE FF 00 40, FE FF 00 40, FE FF 00 40, FE FF 00 40
007F, FE FF 00 7F, FE FF 00 7F, FE FF 00 7F, FE FF 00 7F
0080, FE FF 00 80, FE FF 00 80, FE FF 00 80, FE FF 00 80
00BF, FE FF 00 BF, FE FF 00 BF, FE FF 00 BF, FE FF 00 BF
00C0, FE FF 00 C0, FE FF 00 C0, FE FF 00 C0, FE FF 00 C0
00FF, FE FF 00 FF, FE FF 00 FF, FE FF 00 FF, FE FF 00 FF
0100, FE FF 01 00, FE FF 01 00, FE FF 01 00, FE FF 01 00
3FFF, FE FF 3F FF, FE FF 3F FF, FE FF 3F FF, FE FF 3F FF
4000, FE FF 40 00, FE FF 40 00, FE FF 40 00, FE FF 40 00
7FFF, FE FF 7F FF, FE FF 7F FF, FE FF 7F FF, FE FF 7F FF
8000, FE FF 80 00, FE FF 80 00, FE FF 80 00, FE FF 80 00
BFFF, FE FF BF FF, FE FF BF FF, FE FF BF FF, FE FF BF FF
C000, FE FF C0 00, FE FF C0 00, FE FF C0 00, FE FF C0 00
EFFF, FE FF EF FF, FE FF EF FF, FE FF EF FF, FE FF EF FF
F000, FE FF F0 00, FE FF F0 00, FE FF F0 00, FE FF F0 00
FFFF, FE FF FF FF, FE FF FF FF, FE FF FF FF, FE FF FF FF
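This was a surprise to me. Why does UTF-16 generate a 32-bit sequence for each character? Why not call it UTF-32?
I found the answer later: 0xFEFF is a BOM (Byte Order Mark), which indicates that the byte sequence that follows is
in Big-Endian format. Only the last 2 bytes of each sequence encode the character itself. The BOM shows up in every row
because each character was encoded on its own. In other words, the JDK uses the Big-Endian with BOM format for UTF-16 encoding by default.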
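
To make the role of the BOM easier to see, here is a minimal sketch that encodes the same text with the JDK's "UTF-16", "UTF-16BE" and "UTF-16LE" charsets and prints the bytes in hex. The class name Utf16BomDemo and the helper methods hex() and encode() are made up for this illustration; only String.getBytes(Charset) and the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original test program.
public class Utf16BomDemo {

    // Formats bytes as space-separated uppercase hex, e.g. "FE FF 00 C0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the hex dump.
    static String encode(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        String one = "\u00C0"; // a single character from the table above
        String two = "AB";     // two characters encoded in one call

        // "UTF-16": BOM (FE FF) followed by Big-Endian code units
        System.out.println("UTF-16,   " + encode(one, StandardCharsets.UTF_16));
        // "UTF-16BE": Big-Endian code units, no BOM
        System.out.println("UTF-16BE, " + encode(one, StandardCharsets.UTF_16BE));
        // "UTF-16LE": Little-Endian code units, no BOM
        System.out.println("UTF-16LE, " + encode(one, StandardCharsets.UTF_16LE));
        // The BOM is written once per encoding call, not once per character
        System.out.println("UTF-16,   " + encode(two, StandardCharsets.UTF_16));
    }
}
```

If the 2 extra bytes are not wanted, naming the byte order explicitly with "UTF-16BE" or "UTF-16LE" avoids the BOM, at the cost that a reader of the bytes has to know the byte order in advance.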