JDK Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 4.32, 2006

Encoding Maps


(Continued from previous part...)

7FC0 > E7 BF 80 - 7FFF > E7 BF BF
8000 > E8 80 80 - 803F > E8 80 BF
8040 > E8 81 80 - 807F > E8 81 BF
8080 > E8 82 80 - 80BF > E8 82 BF
......
8FC0 > E8 BF 80 - 8FFF > E8 BF BF
9000 > E9 80 80 - 903F > E9 80 BF
9040 > E9 81 80 - 907F > E9 81 BF
9080 > E9 82 80 - 90BF > E9 82 BF
......
9FC0 > E9 BF 80 - 9FFF > E9 BF BF
A000 > EA 80 80 - A03F > EA 80 BF
A040 > EA 81 80 - A07F > EA 81 BF
A080 > EA 82 80 - A0BF > EA 82 BF
......
AFC0 > EA BF 80 - AFFF > EA BF BF
B000 > EB 80 80 - B03F > EB 80 BF
B040 > EB 81 80 - B07F > EB 81 BF
B080 > EB 82 80 - B0BF > EB 82 BF
......
BFC0 > EB BF 80 - BFFF > EB BF BF
C000 > EC 80 80 - C03F > EC 80 BF
C040 > EC 81 80 - C07F > EC 81 BF
C080 > EC 82 80 - C0BF > EC 82 BF
......
CFC0 > EC BF 80 - CFFF > EC BF BF
D000 > ED 80 80 - D03F > ED 80 BF
D040 > ED 81 80 - D07F > ED 81 BF
D080 > ED 82 80 - D0BF > ED 82 BF
......
D7C0 > ED 9F 80 - D7FF > ED 9F BF
D800 > 3F - DFFF > 3F
E000 > EE 80 80 - E03F > EE 80 BF
E040 > EE 81 80 - E07F > EE 81 BF
E080 > EE 82 80 - E0BF > EE 82 BF
......
EFC0 > EE BF 80 - EFFF > EE BF BF
F000 > EF 80 80 - F03F > EF 80 BF
F040 > EF 81 80 - F07F > EF 81 BF
F080 > EF 82 80 - F0BF > EF 82 BF
......
FFC0 > EF BF 80 - FFFF > EF BF BF
  • This is the most popular encoding used for the Unicode character set.
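  • The output sequence has a variable number of bytes: 1, 2, or 3 bytes for the code points 0x0000 - 0xFFFF covered by this map.
  • It is backward compatible with US-ASCII.
  • This map is only valid for Unicode 3.0 and older versions, which assign no characters above 0xFFFF. Code points above 0xFFFF take 4 bytes in UTF-8 and are not listed, so this is a partial UTF-8 map.
  • One section of code points is not valid: 0xD800 - 0xDFFF, the surrogate range. Java's UTF-8 encoder replaces each of them with the single byte 0x3F, the "?" character.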
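Each 3-byte row in the map follows the UTF-8 bit layout 1110xxxx 10xxxxxx 10xxxxxx, which splits the 16 bits of the code point into groups of 4, 6 and 6 bits. Only the last 6 bits change within a row, so each row covers 64 (0x40) code points with a last byte that runs from 80 to BF. Below is a minimal sketch that packs code points by hand, checks the result against String.getBytes(), and rebuilds map rows in the same "start > bytes - end > bytes" layout. The class name Utf8MapDemo and its helper methods are hypothetical and not part of the original notes. The sketch also assumes Java 7 or later, because it uses StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Utf8MapDemo {

    // Hand-made UTF-8 encoding of one BMP code point (0x0000-0xFFFF).
    // Surrogates (0xD800-0xDFFF) are not handled here.
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            return new byte[] { (byte) cp };                 // 0xxxxxxx
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),                   // 110xxxxx
                (byte) (0x80 | (cp & 0x3F)) };               // 10xxxxxx
        } else {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),                  // 1110xxxx
                (byte) (0x80 | ((cp >> 6) & 0x3F)),          // 10xxxxxx
                (byte) (0x80 | (cp & 0x3F)) };               // 10xxxxxx
        }
    }

    // The JDK's UTF-8 encoding of a single char.
    static byte[] jdk(int cp) {
        return String.valueOf((char) cp).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as upper-case hex separated by spaces, e.g. "E7 BF 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Builds one map row for a block of 64 code points starting at 'start'.
    static String row(int start) {
        int end = start + 0x3F;
        return String.format("%04X > %s - %04X > %s",
                start, hex(jdk(start)), end, hex(jdk(end)));
    }

    public static void main(String[] args) {
        System.out.println(row(0x7FC0));
        System.out.println(row(0x8000));
        // Hand-packed bytes agree with the JDK for a non-surrogate code point.
        System.out.println(hex(encode(0xA040)).equals(hex(jdk(0xA040))));
        // A lone surrogate comes out as a single '?' byte.
        System.out.println(hex(jdk(0xD800)));
    }
}
```

Running main() should print the 7FC0 and 8000 rows exactly as they appear in the map, followed by true and then 3F.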

UTF-16

UTF-16 encoding:

Code Point > Encoded Bytes - Code Point > Encoded Bytes
0000 > FE FF 00 00 - 00FF > FE FF 00 FF
0100 > FE FF 01 00 - 01FF > FE FF 01 FF
0200 > FE FF 02 00 - 02FF > FE FF 02 FF
......
D700 > FE FF D7 00 - D7FF > FE FF D7 FF
D800 > FE FF FF FD - DFFF > FE FF FF FD
E000 > FE FF E0 00 - E0FF > FE FF E0 FF
E100 > FE FF E1 00 - E1FF > FE FF E1 FF
E200 > FE FF E2 00 - E2FF > FE FF E2 FF
......
FF00 > FE FF FF 00 - FFFF > FE FF FF FF
  • This is another encoding used for the Unicode character set.
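  • Each code point in this map is encoded as a fixed-length 2-byte unit. The leading FE FF is the byte order mark (BOM), which marks the output as big-endian. Java's UTF-16 encoder writes it once at the beginning of each encoded sequence, so a string of several characters gets only one FE FF.
  • It is not backward compatible with US-ASCII.
  • One section of code points is not valid: 0xD800 - 0xDFFF, the surrogate range. Each of them is replaced with FF FD, the replacement character U+FFFD.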
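The BOM behavior described above is easy to observe directly. Below is a minimal sketch that encodes a few strings with the JDK's "UTF-16" charset and prints the bytes in hex. It shows a BOM followed by big-endian 2-byte units, a single BOM for a two-character string, and FF FD for a lone surrogate. The class name Utf16MapDemo is hypothetical, and the sketch assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Utf16MapDemo {

    // Formats bytes as upper-case hex separated by spaces, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes a whole string with the JDK's "UTF-16" charset (BOM + big-endian units).
    static String utf16(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16));
    }

    public static void main(String[] args) {
        System.out.println(utf16("A"));                         // BOM, then 00 41
        System.out.println(utf16("AB"));                        // still only one BOM
        System.out.println(utf16(String.valueOf((char) 0xD800))); // BOM, then FF FD
    }
}
```

This should print FE FF 00 41, then FE FF 00 41 00 42, then FE FF FF FD. The first and last lines match the 0000 and D800 entries of the map.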

UTF-16LE

UTF-16LE encoding:

Code Point > Encoded Bytes - Code Point > Encoded Bytes
0000 > 00 00 - D7FF > FF D7
D800 > FD FF - DFFF > FD FF
E000 > 00 E0 - FFFF > FF FF
  • This is another encoding used for the Unicode character set.
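  • Each code point in this map is encoded as a fixed-length 2-byte unit, with no byte order mark.
  • It is not backward compatible with US-ASCII.
  • One section of code points is not valid: 0xD800 - 0xDFFF. Each of them is replaced with FD FF, the little-endian form of U+FFFD.
  • The remaining code points are encoded by reversing the two bytes of the code point: the low byte comes first, followed by the high byte.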
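The byte-reversal rule can be checked exhaustively. Below is a minimal sketch that builds the little-endian unit by hand (low byte, then high byte) and compares it with String.getBytes() for every non-surrogate code point from 0x0000 to 0xFFFF. The class name Utf16LeDemo and its helper methods are hypothetical, and the sketch assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16LeDemo {

    // Hand-made UTF-16LE unit for a BMP code point: low byte first, then high byte.
    static byte[] encodeLe(int cp) {
        return new byte[] { (byte) (cp & 0xFF), (byte) (cp >> 8) };
    }

    // The JDK's UTF-16LE encoding of a single char (no BOM is written).
    static byte[] jdkLe(int cp) {
        return String.valueOf((char) cp).getBytes(StandardCharsets.UTF_16LE);
    }

    // Counts non-surrogate code points where the hand-made bytes differ from the JDK's.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue;  // surrogates are replaced, not reversed
            if (!Arrays.equals(encodeLe(cp), jdkLe(cp))) mismatches++;
        }
        return mismatches;
    }

    // Formats bytes as upper-case hex separated by spaces, e.g. "FF D7".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
        System.out.println(hex(jdkLe(0xD7FF)));  // last code point before the surrogates
        System.out.println(hex(jdkLe(0xD800)));  // a lone surrogate
    }
}
```

It should report 0 mismatches and then print FF D7 and FD FF, which match the D7FF and D800 entries of the map.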

UTF-16BE

UTF-16BE encoding:

Code Point > Encoded Bytes - Code Point > Encoded Bytes
0000 > 00 00 - 00FF > 00 FF
0100 > 01 00 - 01FF > 01 FF
0200 > 02 00 - 02FF > 02 FF
......
D700 > D7 00 - D7FF > D7 FF
D800 > FF FD - DFFF > FF FD
E000 > E0 00 - E0FF > E0 FF
E100 > E1 00 - E1FF > E1 FF
E200 > E2 00 - E2FF > E2 FF
......
FF00 > FF 00 - FFFF > FF FF
  • This is another encoding used for the Unicode character set.
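  • Each code point in this map is encoded as a fixed-length 2-byte unit, with no byte order mark.
  • It is not backward compatible with US-ASCII.
  • One section of code points is not valid: 0xD800 - 0xDFFF. Each of them is replaced with FF FD (U+FFFD).
  • The remaining code points are encoded by copying the two bytes of the code point unchanged: the high byte comes first, followed by the low byte.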
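Comparing the three 2-byte maps suggests a simple relationship: UTF-16BE is the code point copied byte for byte, and UTF-16 is the same bytes preceded by FE FF. Below is a minimal sketch that checks both observations for every non-surrogate code point from 0x0000 to 0xFFFF. The class name Utf16BeDemo and its helper methods are hypothetical, and the sketch assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BeDemo {

    // Hand-made UTF-16BE unit for a BMP code point: high byte first, then low byte.
    static byte[] encodeBe(int cp) {
        return new byte[] { (byte) (cp >> 8), (byte) (cp & 0xFF) };
    }

    // The JDK's encoding of a single char in the given charset.
    static byte[] jdk(int cp, Charset cs) {
        return String.valueOf((char) cp).getBytes(cs);
    }

    // Counts non-surrogate code points where either check fails:
    // 1) the hand-made bytes equal the UTF-16BE output, and
    // 2) the UTF-16 output is FE FF followed by the UTF-16BE output.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue;  // surrogates are replaced, not copied
            byte[] be = jdk(cp, StandardCharsets.UTF_16BE);
            byte[] withBom = jdk(cp, StandardCharsets.UTF_16);
            boolean handMadeOk = Arrays.equals(encodeBe(cp), be);
            boolean bomOk = withBom.length == 4
                    && withBom[0] == (byte) 0xFE && withBom[1] == (byte) 0xFF
                    && Arrays.equals(Arrays.copyOfRange(withBom, 2, 4), be);
            if (!handMadeOk || !bomOk) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

It should print "mismatches: 0". The design choice here is to let the JDK encoder serve as the reference, so the hand-made rule is verified against real output rather than against the table alone.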
