Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
HexWriter.java - Converting Encoded Byte Sequences to Hex Values
This section provides a tutorial example on how to write a sample program, HexWriter.java, that converts encoded byte sequences to Hex values to make encoded text files easier to inspect.
By running the sample program, UnicodeHello.java, presented in the previous section, I got a text file, hello.utf-16be, saved in UTF-16BE encoding. The next question is how to view and inspect this UTF-16BE encoded file. Normal text editors will not be able to show its content correctly.
I have two choices: open the file with a Hex editor, or convert the file to a Hex value file with a program.
I decided to write a simple Java program that converts UTF-16BE byte sequences into hexadecimal digits so that I can inspect the code values of the saved characters. Remember that, for characters in the Basic Multilingual Plane (U+0000 to U+FFFF), UTF-16BE encoding splits each 16-bit code value into two bytes, high byte first, without any change in the value. Here is a program to convert any data file into hexadecimal digits:
/* HexWriter.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 * This program allows you to convert any data file to a new data
 * file in Hex format with 16 bytes (32 Hex digits) per line.
 */
import java.io.*;
class HexWriter {
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
      '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String inFile = a[0];
      String outFile = a[1];
      int bufSize = 16;
      byte[] buffer = new byte[bufSize];
      String crlf = System.getProperty("line.separator");
      try {
         FileInputStream in = new FileInputStream(inFile);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile));
         // Read up to 16 bytes at a time; each read becomes one output line
         int n = in.read(buffer,0,bufSize);
         String s = null;
         int count = 0;
         while (n!=-1) {
            count += n;
            s = bytesToHex(buffer,0,n);
            out.write(s);
            out.write(crlf);
            n = in.read(buffer,0,bufSize);
         }
         in.close();
         out.close();
         System.out.println("Number of input bytes: "+count);
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   // Converts len bytes of b, starting at off, to a string of Hex digits
   public static String bytesToHex(byte[] b, int off, int len) {
      StringBuffer buf = new StringBuffer();
      for (int j=0; j<len; j++)
         buf.append(byteToHex(b[off+j]));
      return buf.toString();
   }
   // Converts one byte to 2 Hex digits: high 4 bits first, then low 4 bits
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
}
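The byteToHex() method above relies on bit masking, which is easy to misread because Java bytes are signed. Here is a minimal sketch that isolates that step and runs it on a few sample bytes. The class name NibbleDemo and the sample values are my own choices for illustration; they are not part of HexWriter.java.

```java
public class NibbleDemo {
    static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Same masking idea as byteToHex() in HexWriter.java.
    // The byte is sign-extended to int before shifting, so the
    // "& 0x0f" mask is what keeps only the 4 bits we want.
    public static String byteToHex(byte b) {
        char hi = HEX[(b >> 4) & 0x0f];  // high 4 bits
        char lo = HEX[b & 0x0f];         // low 4 bits
        return new String(new char[] {hi, lo});
    }

    public static void main(String[] args) {
        // 0x96 and 0xFB are negative when stored in a Java byte
        byte[] samples = {0x00, 0x48, (byte) 0x96, (byte) 0xFB};
        for (byte b : samples) {
            System.out.println(b + " -> " + byteToHex(b));
        }
    }
}
```

Running it should show that negative byte values such as -106 still come out as the expected two Hex digits (96), which is why the mask is applied after the shift.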
Compile this program and run it to convert hello.utf-16be:
C:\herong>javac HexWriter.java
C:\herong>java HexWriter hello.utf-16be hello.hex
Okay, here is the content of hello.hex:
00480065006C006C006F00200063006F
006D0070007500740065007200210020
002D00200045006E0067006C00690073
0068000D000A753581114F60597DFF01
0020002D002000530069006D0070006C
00690066006900650064002000430068
0069006E006500730065000D000A96FB
81664F60597DFE570020002D00200054
007200610064006900740069006F006E
0061006C0020004300680069006E0065
00730065000D000A
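Each group of 4 Hex digits in hello.hex is one 16-bit code value, because UTF-16BE writes the high byte first and the low byte second. Here is a small sketch, assuming only the standard java.nio.charset.StandardCharsets class, that encodes a string in UTF-16BE and prints the bytes as Hex digits. The class name Utf16beDemo and the helper toHex() are hypothetical names I picked for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16beDemo {
    // Hypothetical helper: encode text as UTF-16BE and return Hex digits
    public static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0048 is 'H', U+7535 is the first Chinese character in the file
        System.out.println(toHex("H"));       // 0048
        System.out.println(toHex("\u7535"));  // 7535
        System.out.println(toHex("\r\n"));    // 000D000A
    }
}
```

This only covers characters whose code values fit in 16 bits, which is the case for every character in hello.utf-16be.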
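If you know how to read hexadecimal numbers, you should be able to see the following in hello.hex:
0048 0065 006C 006C 006F are the code values of "Hello" at the start of the first line, which ends with "English".
7535 8111 4F60 597D FF01 are the code values of the 5 Chinese characters at the start of the second line, which ends with "Simplified Chinese".
96FB 8166 4F60 597D FE57 are the code values of the 5 Chinese characters at the start of the third line, which ends with "Traditional Chinese".
Remember to use the line break sequence 000D000A (\r\n) to help find the first character of each line.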
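If you want to check your reading of the Hex digits, you can go in the opposite direction. Here is a minimal sketch that converts a string of Hex digits back to bytes and decodes them as UTF-16BE. The class name HexReader and its two helper methods are hypothetical; they are not part of the tutorial's sample programs, and the sketch assumes the input has an even number of valid Hex digits.

```java
import java.nio.charset.StandardCharsets;

public class HexReader {
    // Hypothetical helper: turn pairs of Hex digits back into bytes
    public static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            String pair = hex.substring(2 * i, 2 * i + 2);
            out[i] = (byte) Integer.parseInt(pair, 16);
        }
        return out;
    }

    // Decode the bytes as UTF-16BE text
    public static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // First 5 code values copied from hello.hex
        System.out.println(decode("00480065006C006C006F"));

        // Print code values instead of the characters themselves,
        // since the console may not display Chinese characters.
        for (char c : decode("753581114F60597DFF01").toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The second loop prints code values rather than the characters because a command window may use an encoding that cannot show them.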