Localization / Internationalization - Non ASCII Characters in JSP Pages
(Continued from previous part...)
Java Strings - Byte Sequences Encoded for Local Languages
Let's try option 1 mentioned in the previous section first. Here is my sample JSP page:
<?xml version="1.0"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jstl/core" version="1.2">
<!-- HelpGB2312Java.jsp
Copyright (c) 2002 by Dr. Herong Yang
-->
<jsp:directive.page contentType="text/html; charset=gb2312"/>
<jsp:declaration><![CDATA[
private java.io.OutputStream outStream;
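private void writeGB(String s) throws Throwable {
for (int i=0; i<s.length(); i++) {
char c = s.charAt(i);
// High byte first. It is zero for plain ASCII characters, so skip it then.
byte b = (byte) (c>>8 & 0x00FF);
if (b!=0)
outStream.write(b);
// The low byte is always written.
b = (byte) (c & 0x00FF);
outStream.write(b);
}
}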
]]></jsp:declaration>
<jsp:scriptlet><![CDATA[
outStream = response.getOutputStream();
writeGB("<html>");
writeGB("<meta http-equiv=\"Content-Type\""
+ " content=\"text/html; charset=gb2312\"/>");
writeGB("<body>");
writeGB("<b>\uCBB5\uC3F7</b>");
writeGB("<p>\uD5E2\uCAC7\uD2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5"
+ "\uB5C4\uCBB5\uC3F7\uCAE9\uA1AD</p>");
writeGB("</body>");
writeGB("</html>");
]]></jsp:scriptlet>
</jsp:root>
When I opened HelpGB2312Java.jsp with IE, I saw Chinese characters
correctly displayed on the screen. So option 1 works! But note that:
- Figuring out the byte sequences of non ASCII characters in a particular encoding
is not that hard. Simplified Chinese text files are usually stored as byte sequences
in GB2312 encoding.
- Byte sequences can only be entered in Java statements in hex number format.
- response.getOutputStream() needs to be called before any other output statements.
- Once response.getOutputStream() is called, you cannot call response.getWriter() any more.
So the entire HTML document must be output in binary mode.
- You cannot add any static HTML text, because that requires response.getWriter().
- A JSP directive.page element is needed to set the Content-Type header of the HTTP response
with the same charset value as the HTML document.
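The byte packing used by option 1 can also be checked outside a JSP container. The following is a minimal sketch and not part of the original page: the class name PackedBytesDemo and the methods unpack() and toHex() are made up for illustration, and a zero high byte is skipped so that ASCII characters come out as a single byte. The decoding step assumes that the JVM ships the GB2312 charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class PackedBytesDemo {

    // Splits each char into its high and low byte, the way writeGB() does.
    // A zero high byte is skipped so ASCII characters stay one byte long.
    static byte[] unpack(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int high = (c >> 8) & 0xFF;
            int low = c & 0xFF;
            if (high != 0) {
                out.write(high);
            }
            out.write(low);
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The two "characters" in the middle are really GB2312 byte pairs.
        byte[] bytes = unpack("<b>\uCBB5\uC3F7</b>");
        System.out.println(toHex(bytes));

        // Assumption: the GB2312 charset is installed in this JVM.
        if (Charset.isSupported("GB2312")) {
            String decoded = new String(bytes, Charset.forName("GB2312"));
            // Compare against the real Unicode code points for the same text.
            System.out.println(decoded.equals("<b>\u8bf4\u660e</b>"));
        }
    }
}
```

The point of the sketch is that a Java string built this way is not real text. It is only a container for bytes, which is why it must bypass response.getWriter() and go straight to the output stream.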
Java Strings - Unicode Codes - Local Language Independent
Let's try option 2 now. Here is my sample JSP page:
<?xml version="1.0"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jstl/core" version="1.2">
<!-- HelpGB2312Unicode.jsp
Copyright (c) 2002 by Dr. Herong Yang
-->
<jsp:scriptlet><![CDATA[
response.setContentType("text/html; charset=gb2312");
out.println("<html>");
out.println("<meta http-equiv=\"Content-Type\""
+ " content=\"text/html; charset=gb2312\"/>");
out.println("<body>");
out.println("<b>\u8bf4\u660e</b><br/>");
out.println("<p>\u8fd9\u662f\u4e00\u4efd\u975e\u5e38\u95f4\u5355"
+ "\u7684\u8bf4\u660e\u4e66\u2026</p>");
out.println("</body>");
out.println("</html>");
]]></jsp:scriptlet>
</jsp:root>
When I opened HelpGB2312Unicode.jsp with IE, I saw Chinese characters
correctly displayed on the screen. Remember I have Unicode Chinese fonts installed
on my system. So option 2 works! But note that:
- Option 2 looks much simpler than option 1. No need to output HTML documents
in binary mode.
- response.setContentType() must be called before any output statements.
- The "out" object is then ready to use and encodes its output with the specified charset.
- The Chinese characters must be entered as Unicode codes, not GB2312
codes.
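What the "out" object does with the charset can be approximated in plain Java. The sketch below is only an illustration of the idea, not the container's actual code: a ByteArrayOutputStream stands in for the HTTP response, an OutputStreamWriter stands in for "out", and the class and method names are hypothetical. It assumes the GB2312 charset is available in the JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for the JSP "out" writer, for illustration only.
public class WriterEncodingDemo {

    // Sends Unicode text through a Writer bound to the named charset
    // and returns the bytes that would travel over the wire.
    static byte[] encode(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(sink, charsetName);
            out.write(text);
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            // Also covers an unsupported charset name.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode code points go in, GB2312 byte pairs come out
        // (assuming the gb2312 charset is supported).
        byte[] bytes = encode("\u8bf4\u660e", "gb2312");
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Because the writer performs the conversion, the Java source only has to hold the text as Unicode. The target encoding is chosen once, in the content type.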
If your Chinese text is in GB2312 encoding, you need to convert the
text to Unicode codes in "\u" format. One good tool for this is native2ascii
from the JDK. Here is a sample command to convert HelpGB2312.html:
\jdk\bin\native2ascii -encoding gb2312 HelpGB2312.html test.html
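If native2ascii is not at hand (the tool was dropped from the JDK starting with release 9), the conversion itself is small enough to sketch in Java. This is an illustrative approximation and not the tool's actual implementation. The class name, the method name, and the assumption that the input file is GB2312 encoded are mine.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
public class EscapeDemo {

    // Replaces every char above 0x7F with a backslash, the letter u,
    // and four hex digits. ASCII characters pass through unchanged.
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java EscapeDemo HelpGB2312.html test.html
            // Assumption: the input file is GB2312 encoded.
            byte[] raw = Files.readAllBytes(Paths.get(args[0]));
            String text = new String(raw, Charset.forName("GB2312"));
            byte[] ascii = toAsciiEscapes(text).getBytes(StandardCharsets.US_ASCII);
            Files.write(Paths.get(args[1]), ascii);
        } else {
            // Without file arguments, convert a short sample instead.
            System.out.println(toAsciiEscapes("<b>\u8bf4\u660e</b>"));
        }
    }
}
```

The escaped output can be pasted into Java string literals like the ones in HelpGB2312Unicode.jsp.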
You could also enter non ASCII characters directly as Unicode text saved in UTF-8 format. This
is very easy to do if you have a text editor that supports UTF-8 encoding
and an input method for your local language characters.
Entering Non ASCII Characters as Static HTML Text
Entering non ASCII characters as static HTML text
is much harder than I initially thought. Several factors
need to be considered:
- JSP page syntax - Using standard syntax or XML syntax.
- Encoding scheme of the JSP page source code.
- Encoding scheme of the converted Java source code.
- Encoding scheme of the HTTP response.
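A small experiment hints at why these factors matter: the same two characters become different byte sequences in different encodings, so every stage that reads or writes the page has to agree on which one is in use. The sketch below is illustrative only. The class and method names are hypothetical, and the GB2312 part assumes that charset is available in the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class EncodingCompareDemo {

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u8bf4\u660e";

        // Three bytes per character in UTF-8.
        System.out.println("UTF-8:  " + hex(text.getBytes(StandardCharsets.UTF_8)));

        // Assumption: the GB2312 charset is installed in this JVM.
        if (Charset.isSupported("GB2312")) {
            byte[] gb = text.getBytes(Charset.forName("GB2312"));
            // Two bytes per character in GB2312.
            System.out.println("GB2312: " + hex(gb));

            // Reading those bytes with a different charset does not
            // give back the original text.
            String wrong = new String(gb, StandardCharsets.ISO_8859_1);
            System.out.println("Round trip intact: " + wrong.equals(text));
        }
    }
}
```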
(Continued on next part...)