Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Unicode Signs in Different Encodings
This section provides a tutorial example on how to write sample programs to create some Unicode signs in various encodings and view them in a Web browser.
I wanted to play with my utility programs mentioned in this chapter one more time with some Unicode signs. So I copied UnicodeHello.java and created UnicodeSign.java:
/* UnicodeSign.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 *
 * This program is a simple tool to allow you to enter several lines of
 * text, and write them into a file using the specified encoding
 * (charset name). The input text lines use Java string convention,
 * which allows you to enter ASCII characters directly, and any non
 * ASCII characters with escape sequences.
 *
 * This version of the program is to write out some interesting signs.
 */
import java.io.*;
class UnicodeSign {
   public static void main(String[] a) {

      // The following Array contains text to be saved into the output
      // File. To enter your own text, just replace this Array.
      String[] text = {
         "U+005C(\\)REVERSE SOLIDUS",
            //\u005C is '\', cannot be entered directly
         "U+007E(\u007E)TILDE",
         "U+00A2(\u00A2)CENT SIGN",
         "U+00A3(\u00A3)POUND SIGN",
         "U+00A5(\u00A5)YEN SIGN",
         "U+00A6(\u00A6)BROKEN BAR",
         "U+00A7(\u00A7)SECTION SIGN",
         "U+00A9(\u00A9)COPYRIGHT SIGN",
         "U+00AC(\u00AC)NOT SIGN",
         "U+00AE(\u00AE)REGISTERED SIGN",
         "U+2022(\u2022)BULLET",
         "U+2023(\u2023)TRIANGULAR BULLET",
         "U+203B(\u203B)REFERENCE MARK",
         "U+2043(\u2043)HYPHEN BULLET",
         "U+FF04(\uFF04)FULLWIDTH DOLLAR SIGN",
         "U+FF05(\uFF05)FULLWIDTH PERCENT SIGN",
         "U+FF08(\uFF08)FULLWIDTH LEFT PARENTHESIS",
         "U+FF09(\uFF09)FULLWIDTH RIGHT PARENTHESIS",
         "U+FF10(\uFF10)FULLWIDTH DIGIT ZERO",
         "U+FF11(\uFF11)FULLWIDTH DIGIT ONE",
         "U+FF21(\uFF21)FULLWIDTH LATIN CAPITAL LETTER A",
         "U+FF22(\uFF22)FULLWIDTH LATIN CAPITAL LETTER B",
         "U+FF41(\uFF41)FULLWIDTH LATIN SMALL LETTER A",
         "U+FF42(\uFF42)FULLWIDTH LATIN SMALL LETTER B",
         "U+FFE0(\uFFE0)FULLWIDTH CENT SIGN",
         "U+FFE1(\uFFE1)FULLWIDTH POUND SIGN",
         "U+FFE5(\uFFE5)FULLWIDTH YEN SIGN"
      };

      // Output file name and charset name, overridable from the command line
      String outFile = "sign.utf-16be";
      if (a.length>0) outFile = a[0];
      String outCharsetName = "utf-16be";
      if (a.length>1) outCharsetName = a[1];
      String crlf = System.getProperty("line.separator");

      try {
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName);
         for (int i=0; i<text.length; i++) {
            out.write(text[i]);
            out.write(crlf);
         }
         out.close();
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}
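To get a feel for what UnicodeSign.java writes for a single sign, the following minimal sketch encodes a string in memory and prints the resulting bytes in hex. The class name SignBytes and the helper toHex are hypothetical names introduced only for this illustration; they are not part of the utility programs in this chapter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not one of the tutorial's utility programs.
class SignBytes {

   // Encode the text with the given charset and return the bytes
   // as space-separated, two-digit hex values.
   static String toHex(String text, Charset cs) {
      StringBuilder sb = new StringBuilder();
      for (byte b : text.getBytes(cs)) {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF));
      }
      return sb.toString();
   }

   public static void main(String[] a) {
      String yen = "\u00A5";          // YEN SIGN
      String fullwidthYen = "\uFFE5"; // FULLWIDTH YEN SIGN

      System.out.println(toHex(yen, StandardCharsets.UTF_16BE));          // 00 A5
      System.out.println(toHex(yen, StandardCharsets.UTF_8));             // C2 A5
      System.out.println(toHex(fullwidthYen, StandardCharsets.UTF_16BE)); // FF E5
      System.out.println(toHex(fullwidthYen, StandardCharsets.UTF_8));    // EF BF A5
   }
}
```

In UTF-16BE every sign in the list takes exactly two bytes, while in UTF-8 the size grows from one byte for the ASCII signs to three bytes for the fullwidth forms.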
Then I ran this program, and converted the output file into different encodings:
javac UnicodeSign.java
java UnicodeSign sign.utf-16be utf-16be

java EncodingConverter sign.utf-16be utf-16be sign.utf-8 utf-8
java EncodingHtml sign.utf-8 utf-8

java EncodingConverter sign.utf-16be utf-16be sign.gbk gbk
java EncodingHtml sign.gbk gbk

java EncodingConverter sign.utf-16be utf-16be sign.shift_jis shift_jis
java EncodingHtml sign.shift_jis shift_jis

java EncodingConverter sign.utf-16be utf-16be sign.johab johab
java EncodingHtml sign.johab johab
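Legacy encodings such as GBK, Shift_JIS and Johab cover only part of Unicode, so some of the signs may not survive the conversion. As a rough sketch, assuming the JDK in use ships these charsets, the standard CharsetEncoder.canEncode() method can report which signs a target encoding cannot represent. The class name SignCoverage and the method unmappable are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not one of the tutorial's utility programs.
class SignCoverage {

   // Return the characters in 'signs' that the named charset cannot
   // encode, each formatted as U+XXXX. All signs used here are single
   // UTF-16 code units, so iterating by char is sufficient.
   static List<String> unmappable(String signs, String charsetName) {
      CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
      List<String> missing = new ArrayList<>();
      for (int i = 0; i < signs.length(); i++) {
         char c = signs.charAt(i);
         if (!enc.canEncode(c)) {
            missing.add(String.format("U+%04X", (int) c));
         }
      }
      return missing;
   }

   public static void main(String[] a) {
      // The same signs that UnicodeSign.java writes out
      String signs = "\\~\u00A2\u00A3\u00A5\u00A6\u00A7\u00A9\u00AC\u00AE"
         + "\u2022\u2023\u203B\u2043"
         + "\uFF04\uFF05\uFF08\uFF09\uFF10\uFF11\uFF21\uFF22\uFF41\uFF42"
         + "\uFFE0\uFFE1\uFFE5";

      String[] names = a.length > 0
         ? a : new String[] {"utf-8", "gbk", "shift_jis", "johab"};
      for (String name : names) {
         // Some JDK installations do not include the extended charsets
         if (!Charset.isSupported(name)) {
            System.out.println(name + ": charset not available");
            continue;
         }
         System.out.println(name + ": " + unmappable(signs, name));
      }
   }
}
```

An empty list means every sign has a mapping in that encoding. Any code point listed is one that a converter would have to replace or reject, which is worth keeping in mind when comparing the files in the browser.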
Then I viewed the differently encoded test files with IE, and noticed that: