JDK - Encoding Maps
Part:
1
2
3
(Continued from previous part...)
......
8FC0 > E8 BF 80 - 8FFF > E8 BF BF
9000 > E9 80 80 - 903F > E9 80 BF
9040 > E9 81 80 - 907F > E9 81 BF
9080 > E9 82 80 - 90BF > E9 82 BF
......
9FC0 > E9 BF 80 - 9FFF > E9 BF BF
A000 > EA 80 80 - A03F > EA 80 BF
A040 > EA 81 80 - A07F > EA 81 BF
A080 > EA 82 80 - A0BF > EA 82 BF
......
AFC0 > EA BF 80 - AFFF > EA BF BF
B000 > EB 80 80 - B03F > EB 80 BF
B040 > EB 81 80 - B07F > EB 81 BF
B080 > EB 82 80 - B0BF > EB 82 BF
......
BFC0 > EB BF 80 - BFFF > EB BF BF
C000 > EC 80 80 - C03F > EC 80 BF
C040 > EC 81 80 - C07F > EC 81 BF
C080 > EC 82 80 - C0BF > EC 82 BF
......
CFC0 > EC BF 80 - CFFF > EC BF BF
D000 > ED 80 80 - D03F > ED 80 BF
D040 > ED 81 80 - D07F > ED 81 BF
D080 > ED 82 80 - D0BF > ED 82 BF
......
D7C0 > ED 9F 80 - D7FF > ED 9F BF
D800 > 3F - DFFF > 3F
E000 > EE 80 80 - E03F > EE 80 BF
E040 > EE 81 80 - E07F > EE 81 BF
E080 > EE 82 80 - E0BF > EE 82 BF
......
EFC0 > EE BF 80 - EFFF > EE BF BF
F000 > EF 80 80 - F03F > EF 80 BF
F040 > EF 81 80 - F07F > EF 81 BF
F080 > EF 82 80 - F0BF > EF 82 BF
......
FFC0 > EF BF 80 - FFFF > EF BF BF
- This is the most popular encoding used for the Unicode character set.
- The output sequence has a variable number of bytes: 1 to 3 bytes for
the code points covered by this map.
- It is backward compatible with US-ASCII.
- This map is valid only for Unicode 3.0 and older versions, so it is
a partial UTF-8 map.
- One section of code points is not valid: 0xD800 - 0xDFFF (the surrogate
range). Each of these is replaced with 0x3F, the '?' character.
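The 3-byte rows above follow a fixed bit pattern, which can be checked against the JDK encoder. The following is a minimal sketch, not the original author's program: the class name `Utf8MapDemo` and its helper methods are hypothetical, and the hand-written formula is assumed to apply only to code points 0x0800 - 0xFFFF outside the surrogate range.

```java
import java.nio.charset.StandardCharsets;

class Utf8MapDemo {
    // Hand-written 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx.
    // Assumption: cp is in 0x0800 - 0xFFFF and is not a surrogate.
    static byte[] threeByteForm(int cp) {
        return new byte[] {
            (byte) (0xE0 | (cp >> 12)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    // What the JDK produces for a single UTF-16 code unit.
    static byte[] jdkEncode(int cp) {
        return String.valueOf((char) cp).getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes the way the map does, for example "E9 80 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0x8FC0, 0x9000, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0xFFFF};
        for (int cp : samples) {
            System.out.printf("%04X > %s%n", cp, hex(jdkEncode(cp)));
        }
    }
}
```

With the sample values above, the lines printed for 9000, D7FF, and D800 should read `E9 80 80`, `ED 9F BF`, and `3F`, matching the map. The `3F` case appears because `String.getBytes` substitutes the charset's replacement byte for an unpaired surrogate instead of throwing an exception.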
UTF-16
UTF-16 encoding:
First code point > bytes - last code point > bytes
0000 > FE FF 00 00 - 00FF > FE FF 00 FF
0100 > FE FF 01 00 - 01FF > FE FF 01 FF
0200 > FE FF 02 00 - 02FF > FE FF 02 FF
......
D700 > FE FF D7 00 - D7FF > FE FF D7 FF
D800 > FE FF FF FD - DFFF > FE FF FF FD
E000 > FE FF E0 00 - E0FF > FE FF E0 FF
E100 > FE FF E1 00 - E1FF > FE FF E1 FF
E200 > FE FF E2 00 - E2FF > FE FF E2 FF
......
FF00 > FE FF FF 00 - FFFF > FE FF FF FF
- This is another encoding used for the Unicode character set.
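- The output sequence is a fixed length for the code points in this map:
2 bytes, after the leading 0xFEFF. Note that the leading 0xFEFF is the
byte order mark (BOM), which signals big-endian byte order.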
- It is not backward compatible with US-ASCII.
- One section of code points is not valid: 0xD800 - 0xDFFF (the surrogate
range). Each of these is replaced with 0xFFFD, the Unicode replacement
character.
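Every row of the UTF-16 map starts with FE FF because each code point was presumably encoded as its own one-character string. The sketch below illustrates that the JDK "UTF-16" encoder writes the byte order mark once per encoded string, not once per character. The class name `Utf16BomDemo` and its helpers are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {
    // Format bytes the way the map does, for example "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode a whole string with the BOM-writing "UTF-16" charset.
    static String encodeHex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16));
    }

    public static void main(String[] args) {
        // One character: BOM followed by 2 bytes.
        System.out.println(encodeHex("A"));
        // Two characters: still only one BOM at the front.
        System.out.println(encodeHex("AB"));
        // An unpaired surrogate is replaced with FF FD.
        System.out.println(encodeHex(String.valueOf((char) 0xD800)));
        // A code point above 0xFFFF takes 4 bytes (a surrogate pair).
        System.out.println(encodeHex(new String(Character.toChars(0x1F600))));
    }
}
```

The first three lines printed should be `FE FF 00 41`, `FE FF 00 41 00 42`, and `FE FF FF FD`. The last line, `FE FF D8 3D DE 00`, shows that the fixed 2-byte length holds only for code points up to 0xFFFF, which is the range this map covers.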
UTF-16LE
UTF-16LE encoding:
First code point > bytes - last code point > bytes
0000 > 00 00 - D7FF > FF D7
D800 > FD FF - DFFF > FD FF
E000 > 00 E0 - FFFF > FF FF
- This is another encoding used for the Unicode character set.
- The output sequence is a fixed length, 2 bytes.
- It is not backward compatible with US-ASCII.
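- One section of code points is not valid: 0xD800 - 0xDFFF (the surrogate
range). Each of these is replaced with FD FF, the replacement character
0xFFFD in little-endian order.
- The rest of the code points are encoded by reversing the two bytes
of the code point, so the low byte comes first.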
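The byte-reversal rule can be written out by hand and compared with the JDK encoder. This is a minimal sketch under the assumption that only code points 0x0000 - 0xFFFF are passed in; `Utf16LeDemo`, `manualLe`, and `jdkLe` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Utf16LeDemo {
    // Hand-written little-endian form: low byte first, then high byte.
    static byte[] manualLe(int cp) {
        return new byte[] { (byte) (cp & 0xFF), (byte) (cp >> 8) };
    }

    // What the JDK produces for a single UTF-16 code unit.
    static byte[] jdkLe(int cp) {
        return String.valueOf((char) cp).getBytes(StandardCharsets.UTF_16LE);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0x0000, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0xFFFF};
        for (int cp : samples) {
            System.out.printf("%04X > manual %s, JDK %s%n",
                cp, hex(manualLe(cp)), hex(jdkLe(cp)));
        }
    }
}
```

For D7FF both columns should show `FF D7`. For D800 and DFFF the two columns differ: the hand-written form simply swaps the bytes, while the JDK emits `FD FF` because it replaces unpaired surrogates.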
UTF-16BE
UTF-16BE encoding:
First code point > bytes - last code point > bytes
0000 > 00 00 - 00FF > 00 FF
0100 > 01 00 - 01FF > 01 FF
0200 > 02 00 - 02FF > 02 FF
......
D700 > D7 00 - D7FF > D7 FF
D800 > FF FD - DFFF > FF FD
E000 > E0 00 - E0FF > E0 FF
E100 > E1 00 - E1FF > E1 FF
E200 > E2 00 - E2FF > E2 FF
......
FF00 > FF 00 - FFFF > FF FF
- This is another encoding used for the Unicode character set.
- The output sequence is a fixed length, 2 bytes.
- It is not backward compatible with US-ASCII.
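- One section of code points is not valid: 0xD800 - 0xDFFF (the surrogate
range). Each of these is replaced with FF FD, the replacement character
0xFFFD.
- The rest of the code points are encoded by copying the two bytes
of the code point unchanged, so the high byte comes first.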
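The whole UTF-16BE map can be checked in one loop instead of row by row. The sketch below assumes the rule stated above (high byte, then low byte, with FF FD for surrogates) and counts how many code points in 0x0000 - 0xFFFF disagree with the JDK encoder. `Utf16BeMapCheck` and `countMismatches` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Utf16BeMapCheck {
    // Count code points whose JDK encoding differs from the rule in the map.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            byte[] actual =
                String.valueOf((char) cp).getBytes(StandardCharsets.UTF_16BE);
            // Unpaired surrogates are expected to become 0xFFFD.
            int expected = Character.isSurrogate((char) cp) ? 0xFFFD : cp;
            boolean matches = actual.length == 2
                && (actual[0] & 0xFF) == (expected >> 8)
                && (actual[1] & 0xFF) == (expected & 0xFF);
            if (!matches) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches: " + countMismatches());
    }
}
```

A result of zero mismatches would confirm that the abbreviated rows (the "......" lines) follow the same rule as the rows that are listed.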
Source: Herong's Notes on JDK.