Unicode Versions Supported in Java History

Unicode Tutorials - Herong's Tutorial Examples

∟Unicode Versions Supported in Java History

This section provides a quick summary of different Unicode versions supported in Java history. A major change was introduced in J2SE 5 to support Unicode 4.0 which contains supplementary characters in the range of U+10000 to U+10FFFF.

Unicode has been supported in the Java language since its first release in 1996. Both Unicode and Java have evolved with multiple versions and releases since then. Here is a summary of which version of Unicode is supported in which Java releases.

Java version   Release date         Unicode version
------------   ------------         ---------------
Java 15        March 2020           Unicode 13.0
Java 13        March 2019           Unicode 12.1
Java 11        March 2018           Unicode 10.0
Java 10        September 2018       Unicode 8.0
Java 9         September 2017       Unicode 8.0
Java 8         March 2014           Unicode 6.2
Java SE 7      July 28, 2011        Unicode 6.0
Java SE 6      December 11, 2006    Unicode 4.0
J2SE 5.0       September 30, 2004   Unicode 4.0
J2SE 1.4       February 6, 2002     Unicode 3.0
J2SE 1.3       May 8, 2000          Unicode 2.1
J2SE 1.2       December 8, 1998     Unicode 2.1
JDK 1.1        February 19, 1997    Unicode 2.0
JDK 1.1.7      September 12, 1997   Unicode 2.1
JDK 1.1        February 19, 1997    Unicode 2.0
JDK 1.0        January 23, 1996     Unicode 1.1.5

From JDK 1.0 to JDK 1.4, Java can only support Unicode code points in the range of U+0000 to U+FFFF. Unicode characters in this range are called BMP (Basic Multilingual Plane) characters.

Designers of JDK 1.0 used the following primitive types and built-in class types to support Unicode BMP characters in very natural way:

char - A primitive type with 16-bit storage size, which is perfected designed to store the code point value of a single BMP character.

Character - A built-in class that wraps a value of the primitive type char in an object. The Character class is perfectly designed to represent a single BMP character as an object. The Character class also provides some useful static methods for managing individual BMP characters.

String - A built-in class that wraps an array of primitive type char in an object. The String class is perfectly designed to represent a sequence of BMP characters as an object. The String class also provides some useful static and instance methods for managing a string of BMP characters.

Starting in J2SE 5.0, support of Unicode code points in the range of U+10000 to U+10FFFF was introduced in Java. Unicode characters in this range are called supplementary characters, implemented in Unicode 3.1 in 2001.

Designers of J2SE 5.0 did not introduce any new primitive types and built-in class types to support Unicode supplementary characters. What they did was:

Use the existing primitive type int to represent a single supplementary character by storing its code point value as is.
Use the existing primitive array type char[] to represent a sequence of supplementary characters by one surrogate char pair per supplementary character.

The surrogate char pair of a supplementary character is really the UTF-16BE encoded char sequence of the supplementary character. See other sections in this book for the UTF-16BE encoding algorithm.

Using UTF-16BE encoded char sequences to represent supplementary characters does give one main advantage to the existing class type String: A String object can still be viewed as a char sequence that represent a sequence of any Unicode characters: BMP characters and/or supplementary characters.

If you are interested in why J2SE 5.0 designers did not choose to add a new primitive type like char32 to help support supplementary characters, read the article "Supplementary Characters in the Java Platform" by Norbert Lindenberg and Masayoshi Okutsu at oracle.com/us/technologies/java/supplementary-142654.html.