JDK Tutorials - Herong's Tutorial Examples - v6.32, by Herong Yang
Character Set Encoding Map Analyzer
This section provides a tutorial example on how to write a simple program to analyze and print out the encoding map showing relations between character code points and their encoded byte sequences of a given encoding.
As mentioned in the previous chapter, the JDK supports many built-in character set encodings.
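To see which encodings a particular JDK installation actually supports, a minimal sketch like the one below can list them with java.nio.charset.Charset.availableCharsets(). The class name CharsetLister and the helper supportedNames() are illustrative names, not part of the original tutorial, and the exact list depends on the JDK version and platform.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class CharsetLister {
    // Returns the canonical names of all charsets available in this JVM.
    // (Hypothetical helper, introduced only for this illustration.)
    static List<String> supportedNames() {
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: "
            + Charset.defaultCharset().name());
        for (String name : supportedNames()) {
            System.out.println(name);
        }
    }
}
```

Any name printed by this sketch should be acceptable as the encoding name argument of an analyzer program that calls String.getBytes(String charsetName).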
In order to figure out the encoding map (relations between character code points and their encoded byte sequences) of a specific supported encoding, I wrote the following program to analyze a given encoding and print a map between the code points (from 0x0000 to 0xFFFF) and the encoded byte sequences:
/* EncodingAnalyzer.java
 * Copyright (c) HerongYang.com. All Rights Reserved.
 */
import java.io.*;
class EncodingAnalyzer {
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String charset = null;
      if (a.length>0) charset = a[0];
      if (charset==null) System.out.println("Default encoding:");
      else System.out.println(charset+" encoding:");
      int lastByte = 0;
      int lastLength = 0;
      byte[] startSequence = null;
      char startChar = 0;
      byte[] endSequence = null;
      char endChar = 0;
      boolean isFirstChar = true;
      for (int i=0; i<0x010000; i++) {
         char c = (char) i;
         String s = String.valueOf(c);
         byte[] b = null;
         if (charset==null) {
            b = s.getBytes();
         } else {
            try {
               b = s.getBytes(charset);
            } catch (UnsupportedEncodingException e) {
               System.out.println(e.toString());
               break;
            }
         }
         int l = b.length;
         int lb = ((int) b[l-1]) & 0x00FF;
         if (isFirstChar==true) {
            isFirstChar = false;
            startSequence = b;
            startChar = c;
            lastByte = lb - 1;
            lastLength = l;
         }
         if (!(l==lastLength && (lb==lastByte+1 || lb==lastByte))) {
            System.out.print(charToHex(startChar)+" >");
            printBytes(startSequence);
            System.out.print(" - "+charToHex(endChar)+" >");
            printBytes(endSequence);
            System.out.println("");
            startSequence = b;
            startChar = c;
         }
         endSequence = b;
         endChar = c;
         lastLength = l;
         lastByte = lb;
      }
      System.out.print(charToHex(startChar)+" >");
      printBytes(startSequence);
      System.out.print(" - "+charToHex(endChar)+" >");
      printBytes(endSequence);
      System.out.println("");
   }
   public static void printBytes(byte[] b) {
      for (int j=0; j<b.length; j++)
         System.out.print(" "+byteToHex(b[j]));
   }
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
   public static String charToHex(char c) {
      byte hi = (byte) (c >>> 8);
      byte lo = (byte) (c & 0xff);
      return byteToHex(hi) + byteToHex(lo);
   }
}
Note that:
The output of this program will be discussed in the sections below.
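A single entry of such an encoding map can also be reproduced in isolation. The following is a minimal sketch, not the tutorial's own program: it encodes one character at a time with String.getBytes(Charset) and formats the resulting bytes as hex. The class name EncodeSample and the method encodeToHex() are hypothetical names chosen for this illustration, and only charsets from java.nio.charset.StandardCharsets are assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeSample {
    // Encodes a single char with the given charset and returns its bytes
    // as space-separated uppercase hex pairs, for example "C3 A9".
    // (Hypothetical helper, introduced only for this illustration.)
    static String encodeToHex(char c, Charset cs) {
        byte[] bytes = String.valueOf(c).getBytes(cs);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample code points: U+0041, U+00E9 and U+20AC
        char[] samples = {'A', '\u00E9', '\u20AC'};
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE
        };
        for (char c : samples) {
            for (Charset cs : charsets) {
                System.out.println(String.format("%04X", (int) c)
                    + " > " + encodeToHex(c, cs) + " (" + cs.name() + ")");
            }
        }
    }
}
```

With this sketch, U+00E9 is encoded as C3 A9 in UTF-8, while U+20AC is encoded as 3F in ISO-8859-1. The latter is the "?" replacement byte that getBytes() substitutes for characters the encoding cannot represent, which is worth keeping in mind when reading an encoding map.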