JSP and JSTL Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 3.09, 2006

Localization / Internationalization - Non ASCII Characters in JSP Pages

(Continued from previous part...)

Let's look at the second part first to see how non ASCII characters are stored in HTML documents, transferred from Web servers to browsers, and displayed on the screen. Here are some basic rules related to these steps:

  • Non ASCII characters must be encoded with a particular encoding scheme, like GB2312, Shift-JIS, or UTF-8.
  • You can use only one encoding scheme in a single HTML document.
  • The encoding scheme name should be given in a meta tag as the charset value. For an example, see my sample HTML document in this section.
  • Once encoded, non ASCII characters can be transferred safely from Web servers to browsers, because the document travels as a sequence of bytes.
  • The browser must decode HTML documents based on either the scheme name given in the document (auto mode) or the one set by the browser user (manual mode).
  • Once non ASCII characters are decoded correctly, the browser must be provided with font files that match the character set in which those non ASCII characters are defined.
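
The auto mode rule can be illustrated with a small Java sketch. This is only an approximation of what a browser does, not a description of IE's actual algorithm: it assumes the charset name appears in ASCII inside the document, and the class and method names (CharsetSniffer, declaredCharset, decode) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // Matches charset=NAME, with an optional quote before NAME.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset name declared in the document, or null if there is none.
    static String declaredCharset(byte[] html) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup
        // can be searched before the real encoding is known.
        String asciiView = new String(html, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(asciiView);
        return m.find() ? m.group(1) : null;
    }

    // "Auto mode": decode with the declared charset, else with the fallback
    // (the fallback plays the role of the user's manual setting).
    static String decode(byte[] html, String fallback) {
        String name = declaredCharset(html);
        return new String(html, Charset.forName(name != null ? name : fallback));
    }

    public static void main(String[] args) {
        byte[] head =
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\"/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredCharset(head)); // prints "gb2312"
    }
}
```

The sketch searches the raw bytes first and decodes second, which is why the charset name itself must be readable before decoding.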

To test these rules, I translated my HelpASCII.html into Chinese, encoded it with the GB2312 encoding scheme, and saved it in a file called HelpGB2312.html:

<html>
<!-- HelpGB2312.html
     Copyright (c) 2002 by Dr. Herong Yang
-->
<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
<body>
<b>说明</b><br/>
<p>这是一份非常间单的说明书…</p>
</body>
</html>

You may have trouble reading this file on this page, or copying it to your local system, because it contains non ASCII characters. Below is the same file in hexadecimal format. You can use it to fix or regenerate HelpGB2312.html.

3C68746D6C3E0D0A3C212D2D2048656C
704742323331322E68746D6C0D0A2020
202020436F7079726967687420286329
20323030342062792044722E20486572
6F6E672059616E670D0A2D2D3E0D0A3C
6D65746120687474702D65717569763D
22436F6E74656E742D54797065222063
6F6E74656E743D22746578742F68746D
6C3B20636861727365743D6762323331
32223E0D0A3C626F64793E0D0A3C623E
CBB5C3F73C2F623E3C62722F3E0D0A3C
703ED5E2CAC7D2BBB7DDB7C7B3A3BCE4
B5A5B5C4CBB5C3F7CAE9A1AD3C2F703E
0D0A3C2F626F64793E0D0A3C2F68746D
6C3E0D0A
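
One way to regenerate the file from the hex dump is a short Java program like the sketch below. It assumes you have pasted the dump into a text file first; the class name HexToFile and the input file name in the usage comment are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexToFile {
    // Converts pairs of hex digits into bytes, ignoring spaces and line breaks.
    static byte[] hexToBytes(String hex) {
        String digits = hex.replaceAll("\\s+", "");
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            // Example: java HexToFile HelpGB2312.hex HelpGB2312.html
            System.out.println("Usage: java HexToFile <hex-file> <output-file>");
            return;
        }
        String hex = new String(Files.readAllBytes(Paths.get(args[0])),
                                StandardCharsets.US_ASCII);
        // Write the raw bytes; no character encoding is applied on output.
        Files.write(Paths.get(args[1]), hexToBytes(hex));
    }
}
```

The output is written as raw bytes on purpose, so the GB2312 codes in the dump reach the file unchanged.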

When I opened HelpGB2312.html with IE (Internet Explorer), I saw the Chinese characters displayed correctly on the screen. I verified my IE encoding settings through the View menu and the Encoding command: "Auto-select" was checked, and Chinese Simplified (GB2312) was selected. I also verified my IE font settings through the Tools menu, the Internet Options command, and the Fonts button: fonts were installed for the Chinese Simplified language.

When I changed my IE encoding setting to another encoding, like UTF-8, I got strange characters on the screen, because I had forced IE to decode my GB2312 encoded document with the UTF-8 encoding scheme.
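
The same effect can be reproduced outside the browser. The sketch below decodes the four bytes CB B5 C3 F7 (the bold title in the hex dump) twice, once as GB2312 and once as UTF-8. It assumes the Java runtime ships the GB2312 charset, which standard JDK installations do; the class name WrongCharsetDemo is hypothetical.

```java
import java.nio.charset.Charset;

class WrongCharsetDemo {
    // GB2312 bytes of the two-character title in HelpGB2312.html.
    static final byte[] TITLE_BYTES =
        {(byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7};

    // The title as Unicode codes U+8BF4 and U+660E.
    static final String EXPECTED = "\u8BF4\u660E";

    // Decodes the same bytes with whichever charset the caller names.
    static String decodeAs(String charsetName) {
        return new String(TITLE_BYTES, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println("GB2312 decoding matches: "
            + decodeAs("GB2312").equals(EXPECTED));
        System.out.println("UTF-8 decoding matches: "
            + decodeAs("UTF-8").equals(EXPECTED));
    }
}
```

Only the decoding step differs between the two calls. The bytes are identical, which is what happens when the browser's encoding setting is switched.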

Entering Non ASCII Characters in Java Strings

Now let's look at the first part of the process to see how non ASCII characters can be entered in JSP pages, converted into Java programs, and output into HTML documents. The rules related to these steps are:

  • Non ASCII characters can be entered into JSP pages in two ways: as static HTML text, and in dynamic Java statements.
  • Java strings are sequences of 2-byte (16-bit) Unicode characters.
  • Non ASCII characters can be entered into Java string literals as Unicode codes, written as \u followed by four hex digits.
  • Non ASCII characters can also be entered into Java string literals as Unicode codes in UTF-8 encoded byte sequences. You may need a UTF-8 aware editor to enter your Java source code, because a regular text editor may not be able to recognize UTF-8 byte sequences.
  • Java can convert Unicode codes to various local language codes by applying an encoding process at the character based output stream level.
  • JSP server object "response" offers two output streams: response.getWriter(), and response.getOutputStream(). You can only use one of the two streams in a single JSP page.
  • response.getWriter() allows you to output characters, which are encoded with the character encoding specified by the response.setContentType() method.
  • response.getOutputStream() allows you to output binary bytes.
  • Static HTML text will be converted into out.write() statements.
  • A JSP page can also be written as an XML file, which must follow XML encoding rules.
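
The rules about Java strings can be checked with a few lines of plain Java. The sketch below enters the two-character title from HelpGB2312.html as \u sequences, then encodes it to GB2312 and to UTF-8. It uses String.getBytes() for brevity, which applies the same charset conversion that a writer would. The class and method names are hypothetical, and the GB2312 charset is assumed to be available in the Java runtime.

```java
import java.nio.charset.Charset;

class UnicodeEscapeDemo {
    // Two Chinese characters entered as Unicode codes; the source stays pure ASCII.
    static final String TITLE = "\u8BF4\u660E";

    // Formats bytes as upper-case hex digit pairs.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the Unicode text with the named charset and returns the bytes as hex.
    static String encodedHex(String text, String charsetName) {
        return toHex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        System.out.println(TITLE.length());              // 2 (one 16-bit char each)
        System.out.println(encodedHex(TITLE, "GB2312")); // CBB5C3F7, as in the hex dump
        System.out.println(encodedHex(TITLE, "UTF-8"));  // E8AFB4E6988E
    }
}
```

The string holds the same two Unicode characters in both cases. Only the encoding chosen at output time decides which bytes end up in the HTML document.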

Based on these rules, we have three options for outputting an HTML document with non ASCII characters:

  • 1. Enter non ASCII characters in the encoded form required by the HTML document as a sequence of bytes, and use the Java binary output stream to generate the HTML document.
  • 2. Enter non ASCII characters as Unicode codes, and use the Java writer output stream to generate the HTML document, with the stream set to the encoding required by the HTML document.
  • 3. Enter non ASCII characters as static HTML text, and let the JSP server convert them into out.write() statements to generate the HTML document.
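
Options 1 and 2 can be compared in plain Java without a JSP server. In the sketch below, an in-memory ByteArrayOutputStream stands in for the stream that response.getOutputStream() would return, and an OutputStreamWriter set to GB2312 stands in for response.getWriter(). Both stand-ins, and the class and method names, are assumptions made for illustration. Option 3 depends on the JSP server's own page translation, so it is not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

class OutputOptionsDemo {
    // Option 1: the text is already GB2312 encoded; send the bytes as they are.
    static void writeEncodedBytes(OutputStream out) throws IOException {
        byte[] gb2312Title = {(byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7};
        out.write(gb2312Title);
    }

    // Option 2: the text is Unicode; the writer encodes it on the way out.
    static void writeUnicodeText(OutputStream out, String charsetName)
            throws IOException {
        Writer w = new OutputStreamWriter(out, Charset.forName(charsetName));
        w.write("\u8BF4\u660E");
        w.flush(); // push the encoded bytes into the underlying stream
    }

    // Runs both options into in-memory buffers and compares the bytes.
    static boolean sameBytes() {
        try {
            ByteArrayOutputStream first = new ByteArrayOutputStream();
            ByteArrayOutputStream second = new ByteArrayOutputStream();
            writeEncodedBytes(first);
            writeUnicodeText(second, "GB2312");
            return Arrays.equals(first.toByteArray(), second.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
    }

    public static void main(String[] args) {
        System.out.println("Same bytes from both options: " + sameBytes());
    }
}
```

The two options differ only in where the encoding happens: before the source code is written (option 1) or inside the writer at run time (option 2).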

(Continued on next part...)

Dr. Herong Yang, updated in 2006