Java Tool Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.10

UTF-8 to \udddd Conversion with 'native2ascii -encoding'

This section provides a tutorial example on how to convert UTF-8 character strings to \udddd Unicode code sequences with the 'native2ascii -encoding' command.

Now let's see how we can fix the encoding problem with HelloUtf8.java demonstrated in the previous section.

1. Convert the non-ASCII characters in HelloUtf8.java to \udddd Unicode code sequences with the "native2ascii -encoding utf-8" command:

C:\herong>native2ascii -encoding utf-8 HelloUtf8.java HelloUtf8Converted.java

2. Open HelloUtf8Converted.java with an editor and change the class name to HelloUtf8Converted, so that it matches the new file name:

public class HelloUtf8Converted {
   public static void main(String[] a) {
      System.out.println("Hello world!");
      // Chinese for "Hello world!", written as Unicode escape sequences
      System.out.println("\u4e16\u754c\u4f60\u597d\uff01");
   }
}

3. Compile and run HelloUtf8Converted.java:

C:\herong>javac HelloUtf8Converted.java

C:\herong>java HelloUtf8Converted

Hello world!
?????
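The native2ascii command was removed from the JDK starting with JDK 9, so step 1 cannot be run as shown on newer JDKs. As a rough substitute, here is a minimal Java sketch of the same conversion, assuming the input file is UTF-8 encoded. It is not the native2ascii implementation, and the class name Utf8ToEscapes and its escape() method are hypothetical. It copies ASCII characters unchanged and writes every other char as a lowercase \udddd sequence, the format seen in step 2.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: a minimal stand-in for "native2ascii -encoding utf-8".
// Reads a UTF-8 file and writes a pure-ASCII copy in which every non-ASCII
// char is replaced by a lowercase Unicode escape sequence.
public class Utf8ToEscapes {

   // Escapes every char above 0x7F; ASCII chars are copied unchanged.
   static String escape(String text) {
      StringBuilder sb = new StringBuilder(text.length());
      for (int i = 0; i < text.length(); i++) {
         char c = text.charAt(i);
         if (c > 0x7F) {
            // Characters outside the BMP are stored as two UTF-16 chars
            // (a surrogate pair), so each half gets its own escape.
            sb.append(String.format("\\u%04x", (int) c));
         } else {
            sb.append(c);
         }
      }
      return sb.toString();
   }

   public static void main(String[] args) throws IOException {
      if (args.length != 2) {
         System.err.println("Usage: java Utf8ToEscapes <input> <output>");
         System.exit(1);
      }
      // Decode the input explicitly as UTF-8, not the platform default.
      String source = new String(Files.readAllBytes(Paths.get(args[0])),
            StandardCharsets.UTF_8);
      // The escaped text is pure ASCII, so US-ASCII output is safe.
      Files.write(Paths.get(args[1]),
            escape(source).getBytes(StandardCharsets.US_ASCII));
   }
}
```

For example, "java Utf8ToEscapes HelloUtf8.java HelloUtf8Converted.java" would stand in for step 1; the class name still has to be changed as in step 2.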

What happened to the Chinese string printed on the console? Why am I not getting Chinese characters back in the output?

The problem is not caused by the \udddd Unicode code sequences used to represent the Chinese string. Those sequences correctly put the five Chinese characters into the string literal. The problem is caused by the default encoding used by the System.out stream: when the string is written out, every character that the default encoding cannot represent is replaced by a "?". See the next section, "Setting UTF-8 Encoding in PrintStream", on how to fix this problem.
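One way to check this explanation without depending on the console is to encode the string to bytes directly, roughly as the stream does when it prints. The sketch below encodes with windows-1252 (Java's name for CP1252) explicitly, on the assumption that this is the default encoding System.out used here, as the earlier javac section suggests; Charset.defaultCharset() shows what the default actually is on a given JVM. The class name EncodingCheck is hypothetical. String.getBytes(Charset) replaces each character the charset cannot represent with that charset's replacement byte, which for windows-1252 is "?".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the string literal holds the right characters,
// and "?" only appears when it is encoded with a charset that has no
// Chinese characters.
public class EncodingCheck {

   // The same literal as in HelloUtf8Converted.java.
   static final String CHINESE = "\u4e16\u754c\u4f60\u597d\uff01";

   public static void main(String[] args) {
      // 5 chars, one per Chinese character: the string itself is intact.
      System.out.println("Length: " + CHINESE.length());

      // The JVM's default charset; the value depends on platform and JDK.
      System.out.println("Default charset: " + Charset.defaultCharset());

      // windows-1252 cannot represent these characters, so getBytes()
      // substitutes its replacement byte '?' for each of them.
      byte[] cp1252 = CHINESE.getBytes(Charset.forName("windows-1252"));
      System.out.println("windows-1252: "
            + new String(cp1252, StandardCharsets.US_ASCII));

      // UTF-8 can represent them: 3 bytes per character, 15 bytes in total.
      byte[] utf8 = CHINESE.getBytes(StandardCharsets.UTF_8);
      System.out.println("UTF-8 byte count: " + utf8.length);
   }
}
```

Apart from the default charset line, the program should print "Length: 5", "windows-1252: ?????" and "UTF-8 byte count: 15", which matches the five question marks seen in step 3.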

Sections in This Chapter

'native2ascii' - Encoding Converter Command and Options

'javac' Using CP1252 to Process Source File

UTF-8 to \udddd Conversion with 'native2ascii -encoding'

Setting UTF-8 Encoding in PrintStream

Converting \udddd Sequences Back with "-reverse" Option

Dr. Herong Yang, updated in 2008