"Character" Class with Unicode Utility Methods

Unicode Tutorials - Herong's Tutorial Examples

This section provides an introduction to the static methods added to the "Character" class since J2SE 5.0 as Unicode utility methods.

Since the designers of J2SE 5.0 did not change the internal storage size of the "Character" class, which still wraps a single 16-bit char value, it cannot represent Unicode supplementary characters in the range of U+10000 to U+10FFFF. So to keep your application Unicode-friendly, you should avoid using the "Character" class to represent a single Unicode character, and use an int code point value instead.
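To illustrate why a single 16-bit char is not enough, here is a minimal sketch. The class name SupplementaryDemo and the helper charsNeeded are made up for this illustration; U+00E9 and U+1F600 are just sample code points, one from the BMP and one supplementary.

```java
public class SupplementaryDemo {

    // Hypothetical helper: number of UTF-16 code units (char values)
    // needed to store the given code point.
    public static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Hypothetical helper: what is left after forcing a code point into one char.
    public static int forcedIntoChar(int codePoint) {
        return (char) codePoint; // narrowing keeps only the low 16 bits
    }

    public static void main(String[] args) {
        int bmp = 0x00E9;            // a BMP code point
        int supplementary = 0x1F600; // a supplementary code point

        System.out.println(charsNeeded(bmp));           // 1
        System.out.println(charsNeeded(supplementary)); // 2

        // A String stores the supplementary character as two char values.
        String s = new String(Character.toChars(supplementary));
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1

        // Casting to char silently loses the upper bits.
        System.out.println(String.format("%04X", forcedIntoChar(supplementary))); // F600
    }
}
```

The cast in forcedIntoChar shows the information loss: the value 0x1F600 does not survive being stored in a char, which is why an int is the safer type for a single code point.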

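However, the designers of J2SE 5.0 did add a number of static methods to the "Character" class as utility methods to help with Unicode character processing, and a few more, such as isBmpCodePoint, highSurrogate, lowSurrogate and getName, arrived in Java 7. Here are some of them: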

static boolean isValidCodePoint(int codePoint) - Determines whether the specified code point is a valid Unicode code point value.
static boolean isBmpCodePoint(int codePoint) - Determines whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
static boolean isSupplementaryCodePoint(int codePoint) - Determines whether the specified character (Unicode code point) is in the supplementary character range.
static int toCodePoint(char high, char low) - Converts the specified surrogate pair to its supplementary code point value. This method does not validate the specified surrogate pair. The caller must validate it using isSurrogatePair if necessary.
static int codePointAt(char[] a, int index) - Returns the code point at the given index of the char array. If the char value at the given index in the char array is in the high-surrogate range, the following index is less than the length of the char array, and the char value at the following index is in the low-surrogate range, then the supplementary code point corresponding to this surrogate pair is returned. Otherwise, the char value at the given index is returned.
static char highSurrogate(int codePoint) - Returns the leading surrogate (a high surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. If the specified character is not a supplementary character, an unspecified char is returned.
static char lowSurrogate(int codePoint) - Returns the trailing surrogate (a low surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. If the specified character is not a supplementary character, an unspecified char is returned.
static char[] toChars(int codePoint) - Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array holds a single char with the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array holds the corresponding surrogate pair.
static boolean isDefined(int codePoint) - Determines if a character (Unicode code point) is defined in Unicode. A character is defined if at least one of the following is true: it has an entry in the UnicodeData file or it has a value in a range defined by the UnicodeData file.
static String getName(int codePoint) - Returns the Unicode name of the specified character codePoint, or null if the code point is unassigned.
static boolean isDigit(int codePoint) - Determines if the specified character (Unicode code point) is a digit. A character is a digit if its general category type, provided by getType(codePoint), is DECIMAL_DIGIT_NUMBER.
static int getNumericValue(int codePoint) - Returns the int value that the specified character (Unicode code point) represents. For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
static int getType(int codePoint) - Returns a value indicating a character's general category.
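
The methods listed above can be tried out with a short program like the one below. This is only a sketch: the class name CodePointUtilDemo and the helpers classify, surrogates and roundTrip are made up for this example, and the sample code points are arbitrary choices. It assumes Java 7 or later, because isBmpCodePoint, highSurrogate, lowSurrogate and getName are not available in J2SE 5.0.

```java
public class CodePointUtilDemo {

    // Hypothetical helper: classifies a code point with the range-checking methods.
    public static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        return Character.isBmpCodePoint(codePoint) ? "BMP" : "supplementary";
    }

    // Hypothetical helper: returns the surrogate pair of a supplementary
    // code point as two hexadecimal values.
    public static String surrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return String.format("%04X %04X", (int) high, (int) low);
    }

    // Hypothetical helper: encodes a code point to UTF-16 and decodes it again.
    public static int roundTrip(int codePoint) {
        char[] units = Character.toChars(codePoint); // 1 or 2 char values
        if (units.length == 2 && Character.isSurrogatePair(units[0], units[1])) {
            // toCodePoint does not validate its input, so check the pair first.
            return Character.toCodePoint(units[0], units[1]);
        }
        return Character.codePointAt(units, 0);
    }

    public static void main(String[] args) {
        int face = 0x1F600; // a supplementary code point
        int fifty = 0x216C; // the Roman numeral fifty

        System.out.println(classify(0x0041));   // BMP
        System.out.println(classify(face));     // supplementary
        System.out.println(classify(0x110000)); // invalid

        System.out.println(surrogates(face));        // D83D DE00
        System.out.println(roundTrip(face) == face); // true

        // codePointAt combines a surrogate pair starting at the given index.
        char[] pair = Character.toChars(face);
        System.out.println(Character.codePointAt(pair, 0) == face); // true

        System.out.println(Character.getNumericValue(fifty));                 // 50
        System.out.println(Character.isDigit(fifty));                         // false
        System.out.println(Character.getType(fifty) == Character.LETTER_NUMBER); // true
        System.out.println(Character.isDigit(0x0663));                        // true
        System.out.println(Character.getName(0x0041)); // LATIN CAPITAL LETTER A
    }
}
```

Note that the Roman numeral fifty has a numeric value but is not a digit: isDigit is true only for the DECIMAL_DIGIT_NUMBER category, while U+216C belongs to LETTER_NUMBER.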