Unicode Tutorials - Herong's Tutorial Examples
Java Language and Unicode Characters
This chapter provides notes and tutorial examples on Unicode support in the Java language. Topics include Unicode versions supported in different JDK versions; using the 'int' and 'String' data types to store single and multiple Unicode characters; Unicode utility methods in the 'Character' class; 'char' element indexes and character positions in 'String' objects.
Unicode Versions Supported in Java History
'int' and 'String' - Basic Data Types for Unicode
"Character" Class with Unicode Utility Methods
Character.toChars() - "char" Sequence of Code Point
Character.getNumericValue() - Numeric Value of Code Point
"String" Class with Unicode Utility Methods
String.length() Is Not Number of Characters
String.toCharArray() Returns the UTF-16BE Sequence
String Literals and Source Code Encoding
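As a rough illustration of the "Character" utility methods named in the section titles above, the following sketch converts code points to "char" sequences and looks up numeric values. The class name CodePointDemo and the helper utf16Hex() are hypothetical names chosen for this example; only the Character and Integer methods come from the JDK (5.0 or newer).

```java
class CodePointDemo {
    // Hypothetical helper: returns the UTF-16 code units of a code point
    // as lowercase hex values separated by spaces.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int bmp = 0x00E9;    // LATIN SMALL LETTER E WITH ACUTE, inside the BMP
        int supp = 0x1F600;  // GRINNING FACE, outside the BMP

        // A BMP code point needs one "char"; a supplementary one needs two.
        System.out.println(utf16Hex(bmp));              // e9
        System.out.println(utf16Hex(supp));             // d83d de00
        System.out.println(Character.charCount(supp));  // 2

        // getNumericValue() accepts a code point stored as an "int".
        System.out.println(Character.getNumericValue(0x0663)); // 3 (ARABIC-INDIC DIGIT THREE)
        System.out.println(Character.getNumericValue('A'));    // 10
    }
}
```

Storing the code point as an "int" keeps the example valid for characters beyond U+FFFF, which a single "char" cannot hold.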
Conclusions:
- From JDK 1.0 to JDK 1.4, Java could only support BMP (Basic Multilingual Plane) characters.
- From J2SE 5.0 (JDK 1.5) onward, Java supports the full range,
U+0000 to U+10FFFF, of Unicode characters.
- "Character" objects can no longer represent all Unicode characters, because a "char" holds only 16 bits.
Store code points as "int" values to represent single Unicode characters.
- The "Character" class offers static utility methods to help Unicode character processing.
- The length() method on a "String" object returns the number of "char" elements used to store Unicode characters
represented by the "String" object.
- The codePointCount() method on a "String" object returns the number of Unicode characters
represented by the "String" object.
- The toCharArray() method on a "String" object returns the UTF-16 "char" sequence of Unicode characters
represented by the "String" object, one "char" per 16-bit code unit, matching the UTF-16BE encoding when each "char" is written high byte first.
- Non-ASCII characters can be represented in Java String literals as \uXXXX escape sequences
that follow the UTF-16 encoding rule; a character outside the BMP needs two escape sequences, one per surrogate.
- Non-ASCII characters can also be entered directly in Java String literals as UTF-8 encoded
byte sequences. But the source code must be stored in UTF-8 encoding
and compiled with the "-encoding UTF8" option.
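The conclusions about length(), codePointCount() and toCharArray() can be checked with a minimal sketch like the one below. The class name StringLengthDemo and the helper codePointsOf() are hypothetical names for this example, and the helper relies on String.codePoints(), which assumes Java 8 or newer.

```java
class StringLengthDemo {
    // Hypothetical helper: returns one "int" code point per Unicode character.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, written as a surrogate pair of Unicode escapes.
        String s = "A\uD83D\uDE00";

        System.out.println(s.length());                       // 3 "char" elements
        System.out.println(s.codePointCount(0, s.length()));  // 2 Unicode characters
        System.out.println(s.toCharArray().length);           // 3 UTF-16 code units

        // Walk the string by code point instead of by "char" index.
        for (int cp : codePointsOf(s)) {
            System.out.println(String.format("U+%04X", cp));  // U+0041, then U+1F600
        }
    }
}
```

Because the literal uses only ASCII escape sequences, this source file compiles to the same result regardless of the encoding it is saved in.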