Character.toChars() - "char" Sequence of Code Point

This section provides a tutorial example on how to use the 'Character' class toChars() static method to convert Unicode code points to 'char' sequences. Each resulting 'char' sequence matches the byte sequence from the UTF-16BE encoding of the code point, with every 'char' contributing 2 bytes.

One interesting static method offered in the "Character" class is the "toChars(int codePoint)" method, which returns the "char" sequence for any valid Unicode code point. It returns 1 "char" if a BMP code point is given, and 2 "char"s (a surrogate pair) if a supplementary code point is given. If the given value is not a valid code point, it throws an IllegalArgumentException.
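
Before going through the full program, a minimal sketch may help show the 1 "char" versus 2 "char"s behavior in isolation. The class name "ToCharsSketch" and the helper "toCharsHex()" are hypothetical names introduced only for this illustration; the only library calls used are Character.toChars() and String.format().

```java
class ToCharsSketch {
   // Hypothetical helper: formats the "char" sequence of a code point
   // as space-separated 4-digit HEX numbers.
   static String toCharsHex(int codePoint) {
      // Throws IllegalArgumentException if codePoint is not valid
      char[] seq = Character.toChars(codePoint);
      StringBuilder sb = new StringBuilder();
      for (char c : seq) {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%04X", (int) c));
      }
      return sb.toString();
   }
   public static void main(String[] args) {
      System.out.println(toCharsHex(0x2103));   // BMP: 1 "char"
      System.out.println(toCharsHex(0x1F132));  // supplementary: 2 "char"s
      try {
         toCharsHex(0x20FFFF);                  // above 0x10FFFF
      } catch (IllegalArgumentException e) {
         System.out.println("Not a valid code point");
      }
   }
}
```

With the code points used in this section, the first call should produce "2103" and the second "D83C DD32".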

Here is a tutorial example on how to use "toChars()" and other related methods:

/* UnicodeCharacterToChars.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 */
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
class UnicodeCharacterToChars {
   static int[] unicodeList = {0x43, 0x2103, 0x1F132, 0x1F1A0, 
      0x20FFFF};
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] arg) {
      try {     
         for (int i=0; i<unicodeList.length; i++) {

// Starting with the code point value
            int codePoint  = unicodeList[i];

// Dumping data in HEX numbers
            System.out.print("\n");
            System.out.print("\n                 Code point: "
               +intToHex(codePoint));

// Getting Unicode character basic properties
            System.out.print("\n                isDefined(): "
               +Character.isDefined(codePoint));
            System.out.print("\n                  getName(): "
               +Character.getName(codePoint));
            System.out.print("\n           isBmpCodePoint(): "
               +Character.isBmpCodePoint(codePoint));
            System.out.print("\n isSupplementaryCodePoint(): "
               +Character.isSupplementaryCodePoint(codePoint));
            System.out.print("\n                charCount(): "
               +Character.charCount(codePoint));

// Getting surrogate char pair
            char charHigh = Character.highSurrogate(codePoint);
            char charLow = Character.lowSurrogate(codePoint);
            System.out.print("\n            highSurrogate(): "
               +charToHex(charHigh));
            System.out.print("\n             lowSurrogate(): "
               +charToHex(charLow));
            System.out.print("\n          isSurrogatePair(): "
               +Character.isSurrogatePair(charHigh, charLow));

// Getting char sequence
            char[] charSeq = Character.toChars(codePoint);
            System.out.print("\n                  toChars():");
            for (int j=0; j<charSeq.length; j++)
               System.out.print(" "+charToHex(charSeq[j]));

// Getting UTF-16BE byte sequence
            int[] intArray = {codePoint};
            String charString = new String(intArray, 0, 1);
            byte[] utf16Seq = charString.getBytes("UTF-16BE");
            System.out.print("\n     UTF-16BE byte sequence:");
            for (int j=0; j<utf16Seq.length; j++)
               System.out.print(" "+byteToHex(utf16Seq[j]));
         }
      } catch (Exception e) {
         System.out.print("\n"+e.toString());
      }
   }
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
   public static String charToHex(char c) {
      byte hi = (byte) (c >>> 8);
      byte lo = (byte) (c & 0xff);
      return byteToHex(hi) + byteToHex(lo);
   }
   public static String intToHex(int i) {
      char hi = (char) (i >>> 16);
      char lo = (char) (i & 0xffff);
      return charToHex(hi) + charToHex(lo);
   }
}

Compile and run it with Java 11:

C:\herong>javac UnicodeCharacterToChars.java

C:\herong>java UnicodeCharacterToChars

                 Code point: 00000043
                isDefined(): true
                  getName(): LATIN CAPITAL LETTER C
           isBmpCodePoint(): true
 isSupplementaryCodePoint(): false
                charCount(): 1
            highSurrogate(): D7C0
             lowSurrogate(): DC43
          isSurrogatePair(): false
                  toChars(): 0043
     UTF-16BE byte sequence: 00 43

                 Code point: 00002103
                isDefined(): true
                  getName(): DEGREE CELSIUS
           isBmpCodePoint(): true
 isSupplementaryCodePoint(): false
                charCount(): 1
            highSurrogate(): D7C8
             lowSurrogate(): DD03
          isSurrogatePair(): false
                  toChars(): 2103
     UTF-16BE byte sequence: 21 03

                 Code point: 0001F132
                isDefined(): true
                  getName(): SQUARED LATIN CAPITAL LETTER C
           isBmpCodePoint(): false
 isSupplementaryCodePoint(): true
                charCount(): 2
            highSurrogate(): D83C
             lowSurrogate(): DD32
          isSurrogatePair(): true
                  toChars(): D83C DD32
     UTF-16BE byte sequence: D8 3C DD 32

                 Code point: 0001F1A0
                isDefined(): false
                  getName(): null
           isBmpCodePoint(): false
 isSupplementaryCodePoint(): true
                charCount(): 2
            highSurrogate(): D83C
             lowSurrogate(): DDA0
          isSurrogatePair(): true
                  toChars(): D83C DDA0
     UTF-16BE byte sequence: D8 3C DD A0

                 Code point: 0020FFFF
                isDefined(): false
java.lang.IllegalArgumentException
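
The surrogate values in the output above can be reproduced by hand with the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then add the upper 10 bits to 0xD800 and the lower 10 bits to 0xDC00. The sketch below is only an illustration of that arithmetic, not the JDK implementation; the class name "SurrogateMathSketch" and the methods "split()" and "join()" are hypothetical.

```java
import java.util.Arrays;

class SurrogateMathSketch {
   // Splits a supplementary code point (0x10000 to 0x10FFFF)
   // into a high and a low surrogate "char".
   static char[] split(int codePoint) {
      int offset = codePoint - 0x10000;               // 20-bit value
      char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
      char low  = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits
      return new char[] {high, low};
   }
   // Rebuilds the code point from a surrogate pair.
   static int join(char high, char low) {
      return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
   }
   public static void main(String[] args) {
      char[] pair = split(0x1F132);
      System.out.println(String.format("%04X %04X",
         (int) pair[0], (int) pair[1]));
      // Compare the hand-made pair with the library results
      System.out.println(Arrays.equals(pair, Character.toChars(0x1F132)));
      System.out.println(join(pair[0], pair[1])
         == Character.toCodePoint(pair[0], pair[1]));
   }
}
```

Applying the same arithmetic to a BMP code point such as 0x43 gives D7C0 and DC43, the values printed by highSurrogate() and lowSurrogate() in the first output block. D7C0 is below the high surrogate range, which is consistent with isSurrogatePair() returning false there.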

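The output of UnicodeCharacterToChars.java confirms that:

toChars() returns 1 "char" for the BMP code points (0x43 and 0x2103) and 2 "char"s for the supplementary code points (0x1F132 and 0x1F1A0), in agreement with charCount().

For the supplementary code points, the "char" sequence from toChars() is the same surrogate pair returned by highSurrogate() and lowSurrogate().

In every case, the "char" sequence from toChars() matches the UTF-16BE byte sequence, 2 bytes per "char".

highSurrogate() and lowSurrogate() do not reject BMP code points. They return values such as D7C0 and DC43, which isSurrogatePair() reports as not being a surrogate pair.

toChars() works on 0x1F1A0 even though isDefined() returns false and getName() returns null for it. A valid code point does not need to be an assigned character.

0x20FFFF is not a valid code point. isDefined() returns false for it, and getName() throws an IllegalArgumentException, which ends the loop.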
