Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Unicode Signs in Different Encodings

This section provides a tutorial example on how to write sample programs to create some Unicode signs in various encodings and view them in a Web browser.

I wanted to try the utility programs mentioned in this chapter one more time, this time with some Unicode signs. So I copied UnicodeHello.java and modified it into UnicodeSign.java:

/**
 * UnicodeSign.java
 * Copyright (c) 2002 by Dr. Herong Yang
 *
 * This program is a simple tool that lets you enter several lines of
 * text and write them into a file in the specified encoding
 * (charset name). The input text lines use the Java string convention,
 * which allows you to enter ASCII characters directly, and any
 * non-ASCII characters with escape sequences.
 *
 * This version of the program is to write out some interesting signs.
 */
import java.io.*;
class UnicodeSign {
   public static void main(String[] a) {
      // The following Array contains text to be saved into the output
      // File. To enter your own text, just replace this Array.
      String[] text = {
"U+005C(\\)REVERSE SOLIDUS", //\u005C is '\', cannot be entered directly
"U+007E(\u007E)TILDE",
"U+00A2(\u00A2)CENT SIGN",
"U+00A3(\u00A3)POUND SIGN",
"U+00A5(\u00A5)YEN SIGN",
"U+00A6(\u00A6)BROKEN BAR",
"U+00A7(\u00A7)SECTION SIGN",
"U+00A9(\u00A9)COPYRIGHT SIGN",
"U+00AC(\u00AC)NOT SIGN",
"U+00AE(\u00AE)REGISTERED SIGN",
"U+2022(\u2022)BULLET",
"U+2023(\u2023)TRIANGULAR BULLET",
"U+203B(\u203B)REFERENCE MARK",
"U+2043(\u2043)HYPHEN BULLET",
"U+FF04(\uFF04)FULLWIDTH DOLLAR SIGN",
"U+FF05(\uFF05)FULLWIDTH PERCENT SIGN",
"U+FF08(\uFF08)FULLWIDTH LEFT PARENTHESIS",
"U+FF09(\uFF09)FULLWIDTH RIGHT PARENTHESIS",
"U+FF10(\uFF10)FULLWIDTH DIGIT ZERO",
"U+FF11(\uFF11)FULLWIDTH DIGIT ONE",
"U+FF21(\uFF21)FULLWIDTH LATIN CAPITAL LETTER A",
"U+FF22(\uFF22)FULLWIDTH LATIN CAPITAL LETTER B",
"U+FF41(\uFF41)FULLWIDTH LATIN SMALL LETTER A",
"U+FF42(\uFF42)FULLWIDTH LATIN SMALL LETTER B",
"U+FFE0(\uFFE0)FULLWIDTH CENT SIGN",
"U+FFE1(\uFFE1)FULLWIDTH POUND SIGN",
"U+FFE5(\uFFE5)FULLWIDTH YEN SIGN"
      };
      String outFile = "sign.utf-16be";
      if (a.length>0) outFile = a[0];
      String outCharsetName = "utf-16be";
      if (a.length>1) outCharsetName = a[1];
      String crlf = System.getProperty("line.separator");
      // try-with-resources closes both streams even if a write fails
      try (FileOutputStream fos = new FileOutputStream(outFile);
           OutputStreamWriter out = new OutputStreamWriter(
              fos, outCharsetName)) {
         for (int i=0; i<text.length; i++) {
            out.write(text[i]);
            out.write(crlf);
         }
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}

Then I ran this program, and converted the output file with different encodings:

javac UnicodeSign.java
java UnicodeSign sign.utf-16be utf-16be
java EncodingConverter sign.utf-16be utf-16be sign.utf-8 utf-8
java EncodingHtml sign.utf-8 utf-8
java EncodingConverter sign.utf-16be utf-16be sign.gbk gbk
java EncodingHtml sign.gbk gbk
java EncodingConverter sign.utf-16be utf-16be sign.shift_jis shift_jis
java EncodingHtml sign.shift_jis shift_jis
java EncodingConverter sign.utf-16be utf-16be sign.johab johab
java EncodingHtml sign.johab johab
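
To see what a conversion step like the ones above does to a single sign, here is a minimal sketch. It is not the EncodingConverter program from this chapter; the class name ConvertSketch and its two helper methods are hypothetical names made up for illustration. It decodes bytes from one charset into a Java string, encodes that string again in another charset, and prints both byte sequences in hex.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the tutorial's utility programs.
class ConvertSketch {
    // Decode the input bytes with one charset, then encode with another.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    // Format bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00A2 CENT SIGN, first as UTF-16BE bytes, then converted to UTF-8
        byte[] utf16 = "\u00A2".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = convert(utf16,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        System.out.println(toHex(utf16) + " -> " + toHex(utf8));
        // prints "00 A2 -> C2 A2"
    }
}
```

Note that String.getBytes(Charset) silently substitutes the charset's replacement bytes for any character the target charset cannot represent, so a conversion written this way can lose signs without reporting an error.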

Then I viewed the differently encoded text files with IE, and noticed that:

  • sign.utf-8.html - The signs looked very good except two: TRIANGULAR BULLET and HYPHEN BULLET.
  • sign.gbk.html - Many low-code-point signs were wrong, like CENT SIGN.
  • sign.shift_jis.html - Some signs were wrong, like FULLWIDTH CENT SIGN, but CENT SIGN was correct.
  • sign.johab.html - Like the gbk encoding, many low-code-point signs were wrong, like CENT SIGN.
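
One possible reason for a wrong sign is that the target encoding has no mapping for that code point, so the conversion replaces it with a substitute character. The observations above do not confirm that this is the cause in each case, but it can be checked. The sketch below is one way to do that, assuming the JRE includes the extended charsets; the class name EncodableCheck and its canEncode helper are hypothetical names for illustration. It asks each charset's encoder whether it can encode a given character.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the tutorial's utility programs.
class EncodableCheck {
    // Returns true if the named charset can encode the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // Charset names as used on the command lines in this section
        String[] charsets = {"utf-8", "gbk", "shift_jis", "johab"};
        // CENT SIGN and FULLWIDTH CENT SIGN
        char[] signs = {'\u00A2', '\uFFE0'};
        for (String name : charsets) {
            // Extended charsets may be missing from some JRE installations
            if (!Charset.isSupported(name)) {
                System.out.println(name + " is not supported by this JRE");
                continue;
            }
            for (char c : signs) {
                System.out.printf("%-10s U+%04X encodable: %b%n",
                    name, (int) c, canEncode(name, c));
            }
        }
    }
}
```

A result of false for a sign means the file written in that encoding cannot contain the sign at all, whatever the browser does with it. A result of true points instead to the browser or the font as the place to look.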

Last update: 2006.

Dr. Herong Yang, updated in 2009