Unicode Tutorials - Herong's Tutorial Examples
"Character" Class with Unicode Utility Methods
This section introduces the static methods that have been added to the "Character" class since J2SE 5.0 as Unicode utility methods.
Since the designers of J2SE 5.0 did not change the internal storage size of the "Character" class, which remains 16 bits,
it cannot be used to represent Unicode supplementary characters in the range of U+10000 to U+10FFFF.
So you should avoid using the "Character" class to represent a single Unicode character
if you want your application to be Unicode-friendly.
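To make the storage limit concrete, here is a minimal sketch. The class name CharStorageDemo and the helper unitsNeeded() are made up for this illustration; Character.SIZE, Character.charCount() and Character.toChars() are standard "Character" members. The code point U+1F600 is only an example of a supplementary character.

```java
class CharStorageDemo {

    // Hypothetical helper: number of 16-bit "char" units needed
    // to store the given code point (1 for BMP, 2 for supplementary).
    static int unitsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int bmp = 0x00E9;            // a BMP code point
        int supplementary = 0x1F600; // a supplementary code point

        // A "char" (and so a "Character" object) holds 16 bits only.
        System.out.println(Character.SIZE);             // 16

        System.out.println(unitsNeeded(bmp));           // 1
        System.out.println(unitsNeeded(supplementary)); // 2

        // Casting to "char" silently drops the upper bits.
        char truncated = (char) supplementary;
        System.out.println(Integer.toHexString(truncated)); // f600

        // An "int" or a "String" keeps the full code point.
        String s = new String(Character.toChars(supplementary));
        System.out.println(s.length());                 // 2
    }
}
```

The cast to "char" compiles without any warning, which is why keeping code points in "int" variables is usually the safer habit.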
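However, the designers of J2SE 5.0 did add a number of static methods to the "Character" class as utility methods
to help with Unicode character processing. A few more, such as isBmpCodePoint, highSurrogate, lowSurrogate and getName,
were added later in Java SE 7. Here are some of them: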
- static boolean isValidCodePoint(int codePoint) - Determines whether the specified code point is a valid Unicode code point value, that is, a value in the range of U+0000 to U+10FFFF inclusive.
- static boolean isBmpCodePoint(int codePoint) - Determines whether the specified character (Unicode code point) is in
the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
- static boolean isSupplementaryCodePoint(int codePoint) - Determines whether the specified character
(Unicode code point) is in the supplementary character range, U+10000 to U+10FFFF.
- static int toCodePoint(char high, char low) - Converts the specified surrogate pair to its supplementary code point value. This method does not validate the specified surrogate pair.
The caller must validate it using isSurrogatePair if necessary.
- static int codePointAt(char[] a, int index) -
Returns the code point at the given index of the char array.
If the char value at the given index in the char array is in the high-surrogate range,
the following index is less than the length of the char array, and the char value
at the following index is in the low-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned.
Otherwise, the char value at the given index is returned.
- static char highSurrogate(int codePoint) -
Returns the leading surrogate (a high surrogate code unit) of the surrogate pair
representing the specified supplementary character (Unicode code point) in the UTF-16 encoding.
If the specified character is not a supplementary character, an unspecified char is returned.
- static char lowSurrogate(int codePoint) -
Returns the trailing surrogate (a low surrogate code unit) of the surrogate pair
representing the specified supplementary character (Unicode code point) in the UTF-16 encoding.
If the specified character is not a supplementary character, an unspecified char is returned.
- static char[] toChars(int codePoint) -
Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array.
If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value,
the resulting char array holds a single char with the same value as codePoint. If the specified code point is a supplementary code point,
the resulting char array holds the corresponding surrogate pair.
- static boolean isDefined(int codePoint) - Determines if a character (Unicode code point) is defined in Unicode.
A character is defined if at least one of the following is true: it has an entry in the UnicodeData file or
it has a value in a range defined by the UnicodeData file.
- static String getName(int codePoint) - Returns the Unicode name of the specified character codePoint,
or null if the code point is unassigned.
- static boolean isDigit(int codePoint) -
Determines if the specified character (Unicode code point) is a digit.
A character is a digit if its general category type, provided by getType(codePoint), is DECIMAL_DIGIT_NUMBER.
- static int getNumericValue(int codePoint) -
Returns the int value that the specified character (Unicode code point) represents.
For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
- static int getType(int codePoint) -
Returns a value indicating a character's general category.
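The methods listed above can be tried together in a short program. This is only a sketch: the class name CharacterUtilityDemo and the helpers roundTrip() and label() are made up for this illustration, and the two sample code points (U+10400 and U+216C) are arbitrary choices. It assumes Java SE 7 or later, because isBmpCodePoint, highSurrogate, lowSurrogate and getName are not available in J2SE 5.0.

```java
class CharacterUtilityDemo {

    // Hypothetical helper: splits a code point into UTF-16 chars
    // with toChars(), then reads it back with codePointAt().
    static int roundTrip(int codePoint) {
        char[] units = Character.toChars(codePoint);
        return Character.codePointAt(units, 0);
    }

    // Hypothetical helper: formats a code point in U+XXXX notation.
    static String label(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int deseret = 0x10400;   // supplementary: DESERET CAPITAL LETTER LONG I
        int romanFifty = 0x216C; // BMP: ROMAN NUMERAL FIFTY

        // Range checks
        System.out.println(Character.isValidCodePoint(0x110000));        // false
        System.out.println(Character.isBmpCodePoint(romanFifty));        // true
        System.out.println(Character.isSupplementaryCodePoint(deseret)); // true

        // Surrogate pair handling
        char high = Character.highSurrogate(deseret);
        char low = Character.lowSurrogate(deseret);
        System.out.println(Integer.toHexString(high) + " "
            + Integer.toHexString(low));                                 // d801 dc00
        // toCodePoint() does not validate its input, so check first.
        if (Character.isSurrogatePair(high, low)) {
            System.out.println(label(Character.toCodePoint(high, low))); // U+10400
        }
        System.out.println(label(roundTrip(deseret)));                   // U+10400

        // Character properties
        System.out.println(Character.isDefined(deseret));
        System.out.println(Character.getName(deseret));
        System.out.println(Character.isDigit(0x0037));                   // true
        System.out.println(Character.getNumericValue(romanFifty));       // 50
        System.out.println(
            Character.getType(romanFifty) == Character.LETTER_NUMBER);   // true
    }
}
```

Note that every call takes the code point as an "int", never as a "char", so the same code works for both BMP and supplementary characters.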
See the next section for a tutorial example on how to use "Character" class static methods.