Encoding Conversion
(Continued from previous part...)
Since the text file contains non-ASCII characters, we need to convert it into
hexadecimal digits to be able to check the code values of the saved characters.
Remember that UTF-16BE encoding splits each 16-bit code value into two bytes,
high byte first, without any other changes.
Here is a program to convert any data file into hexadecimal digits:
/**
 * HexWriter.java
 * Copyright (c) 2002 by Dr. Herong Yang
 * This program allows you to convert any data file to a new data
 * file in Hex format with 16 bytes (32 Hex digits) per line.
 */
import java.io.*;
class HexWriter {
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String inFile = a[0];
      String outFile = a[1];
      int bufSize = 16;   // number of input bytes per output line
      byte[] buffer = new byte[bufSize];
      String crlf = System.getProperty("line.separator");
      try {
         FileInputStream in = new FileInputStream(inFile);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile));
         int n = in.read(buffer,0,bufSize);
         String s = null;
         int count = 0;
         // Write one line of Hex digits for each chunk of bytes read
         while (n!=-1) {
            count += n;
            s = bytesToHex(buffer,0,n);
            out.write(s);
            out.write(crlf);
            n = in.read(buffer,0,bufSize);
         }
         in.close();
         out.close();
         System.out.println("Number of input bytes: "+count);
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   public static String bytesToHex(byte[] b, int off, int len) {
      StringBuffer buf = new StringBuffer();
      for (int j=0; j<len; j++)
         buf.append(byteToHex(b[off+j]));
      return buf.toString();
   }
   public static String byteToHex(byte b) {
      // High 4 bits first, then low 4 bits
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
}
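The `& 0x0f` mask in byteToHex() deserves a closer look. Java bytes are signed, so a byte such as 0xFB is stored as a negative value, and `b >> 4` fills the upper bits with copies of the sign bit. The following is a minimal sketch of that step in isolation; the class name ByteToHexSketch is only an illustrative name and is not part of the tutorial's code:

```java
class ByteToHexSketch {
    static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Same logic as HexWriter.byteToHex(): the shift sign-extends negative
    // bytes, and the mask keeps only the lowest 4 bits as a table index.
    static String byteToHex(byte b) {
        return "" + HEX[(b >> 4) & 0x0f] + HEX[b & 0x0f];
    }

    public static void main(String[] args) {
        System.out.println(byteToHex((byte) 0x75)); // prints 75
        System.out.println(byteToHex((byte) 0xFB)); // prints FB
    }
}
```

Without the mask, `(byte) 0xFB >> 4` evaluates to -1, which would be an invalid array index.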
Compile this program and run it to convert hello.utf-16be:
javac HexWriter.java
java HexWriter hello.utf-16be hello.hex
Okay, here is the content of hello.hex:
00480065006C006C006F00200063006F
006D0070007500740065007200210020
002D00200045006E0067006C00690073
0068000D000A753581114F60597DFF01
0020002D002000530069006D0070006C
00690066006900650064002000430068
0069006E006500730065000D000A96FB
81664F60597DFE570020002D00200054
007200610064006900740069006F006E
0061006C0020004300680069006E0065
00730065000D000A
If you know how to read hexadecimal numbers, you should be able to see:
- "00480065006C006C006F" represents "Hello".
- "753581114F60597DFF01" represents the Simplified Chinese message.
- "96FB81664F60597DFE57" represents the Traditional Chinese message.
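To check these readings without decoding the digits by hand, the hexadecimal strings can be turned back into bytes and decoded as UTF-16BE. This is a small sketch only; the class and method names (HexDecodeSketch, hexToBytes, decodeUtf16be) are hypothetical helpers and not part of the tutorial's programs:

```java
import java.nio.charset.StandardCharsets;

class HexDecodeSketch {
    // Converts a string of Hex digits (two digits per byte) back into bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decodes the bytes as UTF-16BE: each pair of bytes is one 16-bit code value.
    static String decodeUtf16be(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf16be("00480065006C006C006F")); // prints Hello
        // List the code values of the Simplified Chinese message.
        for (char c : decodeUtf16be("753581114F60597DFF01").toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The loop prints five code values, U+7535 through U+FF01, one for each group of four Hex digits in the string.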
Unicode Encoding Conversion
Now we have a text file with Unicode characters. Let's write an encoding
conversion program:
/**
 * EncodingConverter.java
 * Copyright (c) 2002 by Dr. Herong Yang
 *
 * This program allows you to convert a text file in one encoding
 * to another file in a different encoding.
 */
import java.io.*;
class EncodingConverter {
   public static void main(String[] a) {
      String inFile = a[0];
      String inCharsetName = a[1];
      String outFile = a[2];
      String outCharsetName = a[3];
      try {
         InputStreamReader in = new InputStreamReader(
            new FileInputStream(inFile), inCharsetName);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName);
         int c = in.read();
         int n = 0;
         while (c!=-1) {
            out.write(c);
            n++;
            c = in.read();
         }
         in.close();
         out.close();
         System.out.println("Number of characters: "+n);
         System.out.println("Number of input bytes: "
            +(new File(inFile)).length());
         System.out.println("Number of output bytes: "
            +(new File(outFile)).length());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}
Compile this program and use it to convert our hello message file into several
encodings:
javac EncodingConverter.java
java EncodingConverter hello.utf-16be utf-16be hello.ascii ascii
java EncodingConverter hello.utf-16be utf-16be hello.iso-8859-1 iso-...
java EncodingConverter hello.utf-16be utf-16be hello.utf-8 utf-8
java EncodingConverter hello.utf-16be utf-16be hello.gbk gbk
java EncodingConverter hello.utf-16be utf-16be hello.big5 big5
java EncodingConverter hello.utf-16be utf-16be hello.shift_jis shift_jis
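Not every target encoding can represent every character in the message, so the output files will differ in size and content. As a rough preview, the sketch below encodes the first two characters of the Simplified Chinese message with String.getBytes() instead of the tutorial's OutputStreamWriter; both replace unmappable characters with the charset's default replacement bytes. It is limited to charsets that every Java platform must support, and the class name EncodedSizeSketch is only illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSizeSketch {
    // Encodes text into bytes; characters the charset cannot represent
    // are replaced with the charset's default replacement bytes.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // U+7535 U+8111: first two characters of the Simplified Chinese message
        String text = "\u7535\u8111";
        Charset[] targets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : targets) {
            System.out.println(cs.name() + ": " + encode(text, cs).length + " bytes");
        }
    }
}
```

US-ASCII and ISO-8859-1 each produce 2 bytes, both of them the replacement character "?", while UTF-8 needs 6 bytes and UTF-16BE needs 4 bytes for the same two characters.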
(Continued on next part...)