JDK Tutorials - Herong's Tutorial Examples - Version 6.00, by Dr. Herong Yang

\uxxxx - Entering Unicode Data in Java Programs

This section provides a tutorial example on how to enter Unicode characters using \uxxxx escape sequences in a Java program, and save them to a file in any given character set encoding.

Encoding conversion is about reading strings of characters stored in a file encoded with encoding A, and writing them into another file encoded with encoding B.
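As a minimal in-memory sketch of that idea (not the conversion program discussed later in this book), the bytes are first decoded with encoding A into a Java string, and the string is then encoded with encoding B. The class name ConvertSketch and the method name convert are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConvertSketch {
   // Hypothetical helper: decode bytes stored in encoding "from",
   // then encode the resulting string in encoding "to".
   static byte[] convert(byte[] in, Charset from, Charset to) {
      String s = new String(in, from);
      return s.getBytes(to);
   }
   public static void main(String[] a) {
      // The character U+4F60 stored as 3 bytes in UTF-8
      byte[] utf8 = {(byte)0xE4, (byte)0xBD, (byte)0xA0};
      byte[] utf16 = convert(utf8, StandardCharsets.UTF_8,
         StandardCharsets.UTF_16BE);
      // The same character takes 2 bytes in UTF-16BE: 4F 60
      System.out.println(utf16.length); // prints 2
   }
}
```

A file-based converter follows the same two steps, with a reader doing the decoding and a writer doing the encoding.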

Before going into details on encoding conversion, let's talk briefly about Unicode data entry. How do we enter Unicode characters into a file? There are several ways to do that:

  • Using encoding-specific word processors. Usually, one word processor allows you to enter characters of a particular language or encoding.
  • Using Hex editors to directly enter the byte sequences that represent the desired characters in a specific encoding.
  • Using a Unicode-based programming language to enter the desired characters as string literals.

Word processors are too specific to be discussed here.

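Hex editors are the ultimate data entry tools for Unicode characters, because they let you enter any byte sequence directly. They can also be used to inspect and repair encoded text files. But Hex editors are very hard to use, since you need to know the byte sequence of each character in the target encoding. Note that Notepad on Windows is not a Hex editor, but UltraEdit on Windows is a Hex editor.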
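To get a feeling for what you would have to type into a Hex editor, the following sketch prints the byte sequence of one character in two encodings. It is only an illustration; the class name HexSketch and the method name toHex are hypothetical and are not part of the programs in this book.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexSketch {
   // Hypothetical helper: encode a string with the given charset
   // and return the bytes as space-separated Hex values.
   static String toHex(String s, Charset cs) {
      StringBuilder sb = new StringBuilder();
      for (byte b : s.getBytes(cs)) {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF));
      }
      return sb.toString();
   }
   public static void main(String[] a) {
      String s = "\u7535"; // first character of the Simplified Chinese line
      System.out.println(toHex(s, StandardCharsets.UTF_16BE)); // 75 35
      System.out.println(toHex(s, StandardCharsets.UTF_8));    // E7 94 B5
   }
}
```

The same character needs different bytes in different encodings, which is why entering text with a Hex editor requires knowing the target encoding in advance.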

Using a Unicode-based programming language, like Java, to enter Unicode characters into a file is very interesting. For each character in a string literal, you can use the \uxxxx escape sequence to represent the character by entering its code value as 4 Hex digits.
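A small sketch may help to see what the escape sequence does. It assumes nothing beyond standard Java; the class name EscapeSketch and the method name sample are hypothetical names chosen for this illustration.

```java
class EscapeSketch {
   // Two characters entered by code value: U+0041 and U+4F60
   static String sample() {
      return "\u0041\u4F60";
   }
   public static void main(String[] a) {
      String s = sample();
      // Each escape sequence produces exactly one char
      System.out.println(s.length());                // prints 2
      // U+0041 is the ASCII letter A
      System.out.println(s.charAt(0) == 'A');        // prints true
      // The second char carries the code value that was entered
      System.out.println(s.charAt(1) == 0x4F60);     // prints true
   }
}
```

The Java compiler translates these escape sequences while reading the source file, so the source file itself can stay pure ASCII no matter which characters the string contains.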

Here is a sample program, UnicodeHello.java, showing you how to use \uxxxx escape sequences:

/* UnicodeHello.java
 * Copyright (c) 2014, HerongYang.com, All Rights Reserved.
 *
 * This program is a simple tool to allow you to enter several lines of
 * text, and write them into a file with the specified encoding
 * (charset name). The input text lines use the Java string convention,
 * which allows you to enter ASCII characters directly, and any
 * non-ASCII characters with escape sequences.
 *
 * This version of the program is to write out the "Hello computer!"
 * message in some different languages.
 */
import java.io.*;
class UnicodeHello {
   public static void main(String[] a) {
      // The following Array contains text to be saved into the output
      // File. To enter your own text, just replace this Array.
      String[] text = {
"Hello computer! - English", // ASCII
"\u7535\u8111\u4F60\u597D\uFF01 - Simplified Chinese", // GB2312
"\u96FB\u8166\u4F60\u597D\uFE57 - Traditional Chinese" // Big5
      };
      String outFile = "hello.utf-16be";
      if (a.length>0) outFile = a[0];
      String outCharsetName = "utf-16be";
      if (a.length>1) outCharsetName = a[1];
      String crlf = System.getProperty("line.separator");
      // try-with-resources closes the writer even if a write fails
      try (OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName)) {
         for (int i=0; i<text.length; i++) {
            out.write(text[i]);
            out.write(crlf);
         }
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}

As you can see from the source code, this program will write the "Hello computer!" message in several languages. Let's compile this program and run it to get the characters saved into a file with UTF-16BE encoding:

>javac UnicodeHello.java
>java UnicodeHello hello.utf-16be utf-16be

Now we have a text file with characters saved in UTF-16BE encoding. Read the next section on how to view and inspect this UTF-16BE encoded file.
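If you want a quick sanity check before inspecting the file, one option is to read the bytes back with the same charset name and compare the result with the original text. The following is only a sketch under that assumption; it works on a temporary file rather than on hello.utf-16be, and the class name ReadBackSketch and the method name roundTrip are hypothetical.

```java
import java.io.*;
import java.nio.file.Files;

class ReadBackSketch {
   // Hypothetical helper: write text to a temporary file in the given
   // encoding, then decode the file content with the same encoding.
   static String roundTrip(String text, String charsetName) {
      try {
         File f = File.createTempFile("hello", ".txt");
         try {
            try (Writer out = new OutputStreamWriter(
                  new FileOutputStream(f), charsetName)) {
               out.write(text);
            }
            byte[] bytes = Files.readAllBytes(f.toPath());
            return new String(bytes, charsetName);
         } finally {
            f.delete();
         }
      } catch (IOException e) {
         // Covers unsupported charset names as well as I/O failures
         throw new UncheckedIOException(e);
      }
   }
   public static void main(String[] a) {
      String text = "\u4F60\u597D";
      System.out.println(text.equals(roundTrip(text, "utf-16be")));
   }
}
```

If the target encoding cannot represent some of the characters, the text read back will differ from the original, so the comparison also works as a simple check that the chosen encoding fits the text.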

Last update: 2014.

