Herong's Tutorial Notes on Unicode
Dr. Herong Yang, Version 4.02

JDK - Encoding Conversion

(Continued from previous part...)

Since the text file contains non-ASCII characters, we need to convert it into hexadecimal digits to be able to check the code values of the saved characters. Remember that UTF-16BE encoding writes each 16-bit code value as two bytes, high byte first, without any other changes. Here is a program to convert any data file into hexadecimal digits:

/**
 * HexWriter.java
 * Copyright (c) 2002 by Dr. Herong Yang
 * This program allows you to convert any data file to a new data file
 * in Hex format with 16 bytes (32 Hex digits) per line.
 */
import java.io.*;
class HexWriter {
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String inFile = a[0];
      String outFile = a[1];
      int bufSize = 16;
      byte[] buffer = new byte[bufSize];
      String crlf = System.getProperty("line.separator");
      try {
         FileInputStream in = new FileInputStream(inFile);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile));
         int n = in.read(buffer,0,bufSize);
         String s = null;
         int count = 0;
         while (n!=-1) {
            count += n;
            s = bytesToHex(buffer,0,n);
            out.write(s);
            out.write(crlf);
            n = in.read(buffer,0,bufSize);
         }
         in.close();
         out.close();
         System.out.println("Number of input bytes: "+count);
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   public static String bytesToHex(byte[] b, int off, int len) {
      StringBuffer buf = new StringBuffer();
      for (int j=0; j<len; j++)
         buf.append(byteToHex(b[off+j]));
      return buf.toString();
   }
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
}
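
One detail in byteToHex() deserves a closer look: the "& 0x0f" mask. Java bytes are signed, so any byte above 0x7F is negative and "b >> 4" fills the upper bits with ones. The mask throws those bits away and keeps a valid index into the 16-entry digit table. Here is a small sketch of that behavior. The class name SignedByteHexDemo and the HEX table are made up for this note and are not part of HexWriter.java:

```java
class SignedByteHexDemo {
   // Illustrative copy of the digit table used by HexWriter.
   static final char[] HEX = "0123456789ABCDEF".toCharArray();

   // Same masking as byteToHex() in HexWriter: after the shift, keep
   // only the low 4 bits so the sign-extended bits are discarded.
   static String byteToHex(byte b) {
      char[] a = { HEX[(b >> 4) & 0x0f], HEX[b & 0x0f] };
      return new String(a);
   }

   public static void main(String[] args) {
      byte b = (byte) 0x96;              // negative when stored as a Java byte
      System.out.println(b);             // prints -106
      System.out.println(b >> 4);        // prints -7 (sign-extended shift)
      System.out.println(byteToHex(b));  // prints 96
   }
}
```

Without the mask, a negative value like -7 would be used as an array index and the program would fail with an ArrayIndexOutOfBoundsException.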

Compile this program and run it to convert hello.utf-16be:

javac HexWriter.java
java HexWriter hello.utf-16be hello.hex

Okay, here is the content of hello.hex:

00480065006C006C006F00200063006F
006D0070007500740065007200210020
002D00200045006E0067006C00690073
0068000D000A753581114F60597DFF01
0020002D002000530069006D0070006C
00690066006900650064002000430068
0069006E006500730065000D000A96FB
81664F60597DFE570020002D00200054
007200610064006900740069006F006E
0061006C0020004300680069006E0065
00730065000D000A

If you know how to read hexadecimal numbers, you should be able to see:

  • "00480065006C006C006F" represents "Hello".
  • "753581114F60597DFF01" represents the Simplified Chinese message.
  • "96FB81664F60597DFE57" represents the Traditional Chinese message.
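
If you prefer not to read the hexadecimal digits by eye, the three strings above can also be checked with a few lines of code. The following is only a sketch. The class HexDecodeDemo and its helper methods are hypothetical names introduced here, and it assumes Java 8 or newer. It turns each hex string back into bytes, decodes them as UTF-16BE, and prints the resulting code points as U+XXXX values so that the output does not depend on the console encoding:

```java
import java.nio.charset.StandardCharsets;

class HexDecodeDemo {
   // Turn a string of hex digit pairs back into the bytes it describes.
   static byte[] hexToBytes(String hex) {
      byte[] out = new byte[hex.length() / 2];
      for (int i = 0; i < out.length; i++) {
         out[i] = (byte) Integer.parseInt(
            hex.substring(2 * i, 2 * i + 2), 16);
      }
      return out;
   }

   // Interpret the bytes as UTF-16BE text.
   static String decodeUtf16be(String hex) {
      return new String(hexToBytes(hex), StandardCharsets.UTF_16BE);
   }

   // List the code points of a string as U+XXXX values.
   static String codePoints(String s) {
      StringBuilder sb = new StringBuilder();
      s.codePoints().forEach(
         cp -> sb.append(String.format("U+%04X ", cp)));
      return sb.toString().trim();
   }

   public static void main(String[] args) {
      String[] samples = {
         "00480065006C006C006F",
         "753581114F60597DFF01",
         "96FB81664F60597DFE57"
      };
      for (String hex : samples) {
         System.out.println(hex + " -> "
            + codePoints(decodeUtf16be(hex)));
      }
   }
}
```

For the first string the program should list U+0048 U+0065 U+006C U+006C U+006F, which are the code points of "Hello". The other two strings differ only in the first, second, and last code points, which is where the Simplified and Traditional Chinese messages differ.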

Unicode Encoding Conversion

Now we have a text file with Unicode characters. Let's write an encoding conversion program:

/**
 * EncodingConverter.java
 * Copyright (c) 2002 by Dr. Herong Yang
 *
 * This program allows you to convert a text file in one encoding 
 * to another file in a different encoding.
 */
import java.io.*;
class EncodingConverter {
   public static void main(String[] a) {
      String inFile = a[0];
      String inCharsetName = a[1];
      String outFile = a[2];
      String outCharsetName = a[3];
      try {
         InputStreamReader in = new InputStreamReader(
            new FileInputStream(inFile), inCharsetName);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName);
         int c = in.read();
         int n = 0;
         while (c!=-1) {
            out.write(c);
            n++;
            c = in.read();
         }
         in.close();
         out.close();
         System.out.println("Number of characters: "+n);
         System.out.println("Number of input bytes: "
            +(new File(inFile)).length());
         System.out.println("Number of output bytes: "
            +(new File(outFile)).length());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}
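
The same decode-then-encode idea can be tried in memory, without any files, to get a feeling for how the byte counts change between encodings. This is a minimal sketch, not a replacement for EncodingConverter.java. The class InMemoryConversionDemo and its convert() method are hypothetical names used only for illustration:

```java
import java.nio.charset.Charset;

class InMemoryConversionDemo {
   // Decode the bytes with one charset, then encode the text with another.
   static byte[] convert(byte[] input, String inCharsetName,
         String outCharsetName) {
      String text = new String(input, Charset.forName(inCharsetName));
      return text.getBytes(Charset.forName(outCharsetName));
   }

   public static void main(String[] args) {
      // "Hello" followed by U+7535 and U+8111, as UTF-16BE bytes.
      byte[] utf16be = {
         0x00, 0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F,
         0x75, 0x35, (byte) 0x81, 0x11
      };
      byte[] utf8 = convert(utf16be, "UTF-16BE", "UTF-8");
      // prints 14: seven characters, two bytes each
      System.out.println("Number of input bytes: " + utf16be.length);
      // prints 11: five ASCII bytes plus two 3-byte sequences
      System.out.println("Number of output bytes: " + utf8.length);
   }
}
```

The file-based program above streams one character at a time, so it can handle files of any size. The in-memory version is only practical for small inputs, but it makes the effect of the two charset names easy to see.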

(Continued on next part...)

Dr. Herong Yang, updated in 2007