JSP and JSTL Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 3.09, 2006

Localization / Internationalization - Non ASCII Characters in JSP Pages


(Continued from previous part...)

Java Strings - Byte Sequences Encoded for Local Languages

Let's try option 1 mentioned in the previous section first. Here is my sample JSP page:

<?xml version="1.0"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" 
   xmlns:c="http://java.sun.com/jstl/core" version="1.2"> 
<!-- HelpGB2312Java.jsp
     Copyright (c) 2002 by Dr. Herong Yang
-->
<jsp:directive.page contentType="text/html; charset=gb2312"/>
<jsp:declaration><![CDATA[
   private java.io.OutputStream outStream;
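   private void writeGB(String s) throws java.io.IOException {
      // Each char holds either an ASCII code (high byte 0) or a
      // two-byte GB2312 code packed as 0xHHLL. Write the high byte
      // only when it is non-zero, so ASCII text is not preceded by
      // a stray 0x00 byte, then always write the low byte.
      for (int i=0; i<s.length(); i++) {
         char c = s.charAt(i);
         byte b = (byte) (c>>8 & 0x00FF);
         if (b != 0)
            outStream.write(b);
         b = (byte) (c & 0x00FF);
         outStream.write(b);
      }
   }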
]]></jsp:declaration>
<jsp:scriptlet><![CDATA[
   outStream = response.getOutputStream();
   writeGB("<html>");
   writeGB("<meta http-equiv=\"Content-Type\""
      + " content=\"text/html; charset=gb2312\"/>");
   writeGB("<body>");
   writeGB("<b>\uCBB5\uC3F7</b>");
   writeGB("<p>\uD5E2\uCAC7\uD2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5"
      + "\uB5C4\uCBB5\uC3F7\uCAE9\uA1AD</p>");
   writeGB("</body>");
   writeGB("</html>");
]]></jsp:scriptlet>
</jsp:root>

When I opened HelpGB2312Java.jsp with IE, I saw Chinese characters correctly displayed on the screen. So option 1 works! But note that:

  • Figuring out the byte sequences of non ASCII characters in a particular encoding is not that hard. Simplified Chinese text files are usually written in byte sequences of GB2312 encoding.
  • Byte sequences can only be entered in Java statements as hex numbers. Here each two-byte GB2312 code is packed into one "\u" escape, high byte first: "\uCBB5" stands for the bytes 0xCB 0xB5.
  • response.getOutputStream() needs to be called before any other output statements.
  • Once response.getOutputStream() is called, you cannot call response.getWriter() any more. So the entire HTML document must be output in binary mode.
  • You cannot add any static HTML text, because static text is written through the implicit "out" object, which requires response.getWriter().
  • A JSP directive.page element is needed to set the Content-Type header of the HTTP response to the same charset value as the one declared in the HTML document.
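To check the byte packing outside of a JSP container, here is a minimal stand-alone sketch of the writeGB() logic that writes into a ByteArrayOutputStream instead of the response stream. It writes the high byte of each char only when it is non-zero, so ASCII characters come out as single bytes. The class and helper names (PackedGbDemo, encodePacked, hex) are my own, hypothetical illustrations, and the sketch assumes, like the JSP page, that every non-ASCII char holds a GB2312 byte pair packed as 0xHHLL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-alone version of the writeGB() logic from HelpGB2312Java.jsp.
final class PackedGbDemo {

    // Writes each char as raw bytes: the high byte only if it is non-zero,
    // then the low byte. Assumes non-ASCII chars hold a packed GB2312 pair 0xHHLL.
    static void writeGB(OutputStream out, String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int high = (c >> 8) & 0xFF;
            if (high != 0) {
                out.write(high);
            }
            out.write(c & 0xFF);
        }
    }

    // Collects the bytes writeGB() would send to the response stream.
    static byte[] encodePacked(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeGB(buffer, s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodePacked("<b>\uCBB5\uC3F7</b>")));
    }
}
```

Running it prints 3C 62 3E CB B5 C3 F7 3C 2F 62 3E: one byte for each ASCII character of the tags and two bytes for each packed Chinese character.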

Java Strings - Unicode Codes - Local Language Independent

Let's try option 2 now. Here is my sample JSP page:

<?xml version="1.0"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" 
   xmlns:c="http://java.sun.com/jstl/core" version="1.2"> 
<!-- HelpGB2312Unicode.jsp
     Copyright (c) 2002 by Dr. Herong Yang
-->
<jsp:scriptlet><![CDATA[
   response.setContentType("text/html; charset=gb2312");
   out.println("<html>");
   out.println("<meta http-equiv=\"Content-Type\""
      + " content=\"text/html; charset=gb2312\"/>");
   out.println("<body>");
   out.println("<b>\u8bf4\u660e</b><br/>");
   out.println("<p>\u8fd9\u662f\u4e00\u4efd\u975e\u5e38\u95f4\u5355"
      + "\u7684\u8bf4\u660e\u4e66\u2026</p>");
   out.println("</body>");
   out.println("</html>");
]]></jsp:scriptlet>
</jsp:root>

When I opened HelpGB2312Unicode.jsp with IE, I saw Chinese characters correctly displayed on the screen. Remember I have Unicode Chinese fonts installed on my system. So option 2 works! But note that:

  • Option 2 looks much simpler than option 1. No need to output HTML documents in binary mode.
  • response.setContentType() must be called before any output statements.
  • The implicit "out" object is then ready to use: it converts the Unicode characters into GB2312 bytes as it writes them.
  • The Chinese characters must be entered as Unicode codes, not GB2312 codes.
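Option 2 works because the writer behind "out" does the GB2312 conversion. The sketch below imitates that with a plain OutputStreamWriter bound to GB2312. It is only an analogy: a servlet container does the conversion inside the writer returned by response.getWriter(), and the details vary by container. The class and method names are hypothetical, and the sketch assumes the JDK in use includes the GB2312 charset, which belongs to the JDK's extended encoding set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo: a Writer bound to a charset turns Unicode chars into that charset's bytes.
final class CharsetWriterDemo {

    // Writes the text through an OutputStreamWriter for charsetName and returns the bytes.
    static byte[] encodeThroughWriter(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, Charset.forName(charsetName))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode codes of the two Chinese characters used in HelpGB2312Unicode.jsp
        byte[] gb = encodeThroughWriter("<b>\u8bf4\u660e</b>", "GB2312");
        System.out.println(hex(gb));
    }
}
```

It prints 3C 62 3E CB B5 C3 F7 3C 2F 62 3E, the same bytes that option 1 has to write by hand from "\uCBB5\uC3F7".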

If your Chinese text is in GB2312 encoding, you need to convert it to Unicode codes in "\u" format. One good tool for this is native2ascii from the JDK. Here is a sample command to convert HelpGB2312.html:

\jdk\bin\native2ascii -encoding gb2312 HelpGB2312.html test.html
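native2ascii is no longer shipped with JDK 9 and later. On newer JDKs, the basic conversion can be approximated with a few lines of Java. This is a minimal sketch under my own assumptions (the class name, the argument order, lowercase hex digits), not a replacement for every native2ascii option: it decodes the input file with the given charset and replaces each non-ASCII char with a "\u" escape.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical native2ascii-style converter for JDKs that no longer ship the tool.
final class Native2AsciiSketch {

    // Replaces every non-ASCII char with a Java backslash-u escape (4 lowercase hex digits).
    // ASCII chars pass through unchanged.
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Arguments: source charset, input file, output file.
    public static void main(String[] args) throws IOException {
        Charset source = Charset.forName(args[0]);
        String text = new String(Files.readAllBytes(Paths.get(args[1])), source);
        Files.write(Paths.get(args[2]),
                toUnicodeEscapes(text).getBytes(StandardCharsets.US_ASCII));
    }
}
```

For example, "java Native2AsciiSketch gb2312 HelpGB2312.html test.html" mirrors the command above, assuming the compiled class is on the class path.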

You could also type the non ASCII characters directly into the Java strings and save the JSP page in UTF-8 encoding. This is easy to do if you have a text editor that supports UTF-8 and provides an input method for your local language characters.
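Typing the characters directly only works if the bytes saved by the editor are decoded with the same encoding when the page is read. For JSP pages in XML syntax, like the ones above, the container reads the page as UTF-8 by default unless the XML declaration says otherwise. The small check below (class and method names hypothetical) decodes the UTF-8 bytes of the two characters in "\u8bf4\u660e" and confirms that they produce the same Java string as the escapes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check: UTF-8 bytes typed in an editor decode to the same chars as backslash-u escapes.
final class Utf8SourceDemo {

    // UTF-8 encoding of U+8BF4 followed by U+660E (three bytes each).
    static final byte[] UTF8_BYTES = {
        (byte) 0xE8, (byte) 0xAF, (byte) 0xB4,
        (byte) 0xE6, (byte) 0x98, (byte) 0x8E
    };

    // Decodes the given bytes as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = decodeUtf8(UTF8_BYTES);
        String escaped = "\u8bf4\u660e";
        System.out.println(typed.equals(escaped)); // true
        System.out.println(typed.length());        // 2
    }
}
```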

Entering Non ASCII Characters as Static HTML Text

Entering non ASCII characters as static HTML text is much harder than I initially thought. There are many factors that should be considered:

  • JSP page syntax - Using standard syntax or XML syntax.
  • Encoding scheme of the JSP page source code.
  • Encoding scheme of the converted Java source code.
  • Encoding scheme of the HTTP response.
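A quick way to see why the encoding of the JSP page source matters: if the GB2312 bytes CB B5 C3 F7 are read with the wrong encoding, they turn into different characters before the page is ever compiled. The sketch below uses ISO-8859-1 as the wrong guess, since that is the default page encoding for JSP pages in standard syntax when neither pageEncoding nor a contentType charset is given. The class and method names are hypothetical, and GB2312 support in the JDK is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same source bytes decode to different chars under different encodings.
final class PageEncodingDemo {

    // GB2312 bytes of the two Chinese characters used in the examples above.
    static final byte[] GB2312_BYTES = {
        (byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7
    };

    // Lists the chars of a string as upper-case U+XXXX values.
    static String charCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String right = new String(GB2312_BYTES, Charset.forName("GB2312"));
        String wrong = new String(GB2312_BYTES, StandardCharsets.ISO_8859_1);
        System.out.println("as GB2312:     " + charCodes(right)); // U+8BF4 U+660E
        System.out.println("as ISO-8859-1: " + charCodes(wrong)); // U+00CB U+00B5 U+00C3 U+00F7
    }
}
```

Read as GB2312, the four bytes become the two intended characters. Read as ISO-8859-1, they become four unrelated Latin-1 characters, and no response encoding chosen later can recover the original text.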

(Continued on next part...)


Dr. Herong Yang, updated in 2006