Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
'int' and 'String' - Basic Data Types for Unicode
This section provides an introduction to the basic data types for storing Unicode characters in the full range of U+0000 to U+10FFFF: 'int' for a single Unicode character and 'String' for a sequence of Unicode characters.
As we learned in the previous section, the primitive type "char" is no longer capable of supporting Unicode characters in the full range of U+0000 to U+10FFFF. The best way to write Unicode-friendly Java applications with J2SE 5.0 or higher is to use 'int' to store a single Unicode character (its code point) and 'String' to store a sequence of Unicode characters.
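A minimal sketch of this approach, assuming only J2SE 5.0 APIs (Character.toChars(), String.codePointCount(), String.codePointAt()). The class and method names IntAndStringDemo, fromCodePoint and countCharacters are hypothetical, chosen for illustration only.

```java
class IntAndStringDemo {

    // Converts a single code point (held in an int) to a String.
    // Character.toChars() returns one char for a BMP code point and
    // two chars (a surrogate pair) for a supplementary code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Counts Unicode characters (code points), not "char" units.
    static int countCharacters(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        int degreeCelsius = 0x2103;  // BMP character
        int squaredC = 0x1F132;      // Supplementary character

        String text = fromCodePoint(degreeCelsius) + fromCodePoint(squaredC);

        System.out.println(text.length());          // 3 ("char" units)
        System.out.println(countCharacters(text));  // 2 (characters)
        // Read the supplementary character back as a single int.
        System.out.println(Integer.toHexString(text.codePointAt(1)));  // 1f132
    }
}
```

Here length() reports 3 because the supplementary character occupies two "char" units, while the 'int' value returned by codePointAt() still holds the whole character.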
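Other primitive types and class types can still help manage Unicode characters, but you need to remember their risks and limitations. For example, a "char" holds only one 16-bit UTF-16 code unit, so a supplementary character (above U+FFFF) must be stored as two "char" values, a surrogate pair.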
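One consequence of the "char" limitation is that casting a supplementary code point to "char" compiles and runs without any error but silently loses data. The sketch below (hypothetical class and method names; standard J2SE 5.0 APIs only) shows the truncation and the surrogate pair that a String uses instead.

```java
class CharLimitDemo {

    // A cast from int to char keeps only the low 16 bits of the value,
    // so any code point above U+FFFF is silently corrupted.
    static char truncateToChar(int codePoint) {
        return (char) codePoint;
    }

    public static void main(String[] args) {
        int squaredC = 0x1F132;  // Supplementary character

        // (char) 0x1F132 keeps only the low 16 bits: 0xF132.
        System.out.println(Integer.toHexString(truncateToChar(squaredC)));  // f132

        // In a String, the same character takes two chars: a surrogate pair.
        String s = new String(Character.toChars(squaredC));
        System.out.println(s.length());                              // 2
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
    }
}
```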
Examples of using Unicode-friendly data types:
    int letterC = 0x43;          // ASCII character
    int degreeCelsius = 0x2103;  // BMP character
    int squaredC = 0x1F132;      // Supplementary character

    StringBuilder buffer = new StringBuilder();
    buffer.appendCodePoint(letterC);
    buffer.appendCodePoint(degreeCelsius);
    buffer.appendCodePoint(squaredC);
    String unicodeString = new String(buffer);
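To read the characters back out of unicodeString, one option is to step through it by code point rather than by "char". The sketch below, with hypothetical class and method names and J2SE 5.0 APIs only, rebuilds the same string and walks it with String.codePointAt() and Character.charCount().

```java
class CodePointLoopDemo {

    // Builds the same string as the example above, one code point at a time.
    static String buildSample() {
        StringBuilder buffer = new StringBuilder();
        buffer.appendCodePoint(0x43);     // ASCII character
        buffer.appendCodePoint(0x2103);   // BMP character
        buffer.appendCodePoint(0x1F132);  // Supplementary character
        return buffer.toString();
    }

    // Collects the code points of a String. Advancing the index by
    // Character.charCount(cp) skips both halves of a surrogate pair,
    // so each supplementary character is returned as one int.
    static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String unicodeString = buildSample();
        System.out.println(unicodeString.length());  // 4 ("char" units)
        for (int cp : toCodePoints(unicodeString)) {
            System.out.println(String.format("U+%04X", cp));
        }
        // Prints U+0043, U+2103, U+1F132
    }
}
```

On Java 8 or later, String.codePoints() offers a stream-based alternative to this manual loop.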