Static HTML Text - JSP Page in XML Syntax
In the third test, the static text is inserted into a JSP page in XML syntax:
<?xml version="1.0" encoding="gb2312"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jstl/core" version="1.2">
<jsp:directive.page contentType="text/html; charset=gb2312"/>
<!-- StaticGB2312.jsp
Copyright (c) 2002 by Dr. Herong Yang
-->
<html>
<body>
<p>
GB2312-binary: ˵=(0xCBB5C3F7)<br/>
GB2312-#xHEX: &#xCBB5;&#xC3F7;<br/>
GB2312-\uHEX: \uCBB5\uC3F7<br/>
Unicode-binary: ----=(0x8bf4660e)<br/>
Unicode-#xHEX: &#x8bf4;&#x660e;<br/>
Unicode-\uHEX: \u8bf4\u660e<br/>
Unicode-UTF8: 说明=(0xE8AFB4E6988E)<br/>
</p>
</body>
</html>
</jsp:root>
If you view this page with IE, you will see that only the Unicode-#xHEX line
is displayed correctly. This was a big surprise to me:
- The XML parser in Tomcat is not decoding my JSP page with gb2312.
- My JSP page seems to be decoded with ISO-8859-1 instead, which is close to
Cp1252, the Windows default encoding for Western languages.
- The 0x0E byte in the Unicode-binary line causes trouble for the Tomcat server,
so I had to remove those binary codes. U+000E is a control character that is
not allowed in an XML 1.0 document.
- The generated Java source file is written in UTF-8 encoding.
- The "out" object and the Content-Type header are set correctly to GB2312.
- The XML character references on the #xHEX lines are decoded into the
characters they represent. This is different from the standard syntax.
Here are the related lines of the generated Java source file:
...
response.setContentType("text/html; charset=gb2312");
...
out.write("<p>");
out.write("\nGB2312-binary: ˵Ã÷=(0xCBB5C3F7)");
out.write("<br/>");
out.write("\nGB2312-#xHEX: ");
out.write("쮵");
out.write("쏷");
out.write("<br/>");
out.write("\nGB2312-\\uHEX: \\uCBB5\\uC3F7");
out.write("<br/>");
out.write("\nUnicode-binary: ----=(0x8bf4660e)");
out.write("<br/>");
out.write("\nUnicode-#xHEX: ");
out.write("说");
out.write("明");
out.write("<br/>");
out.write("\nUnicode-\\uHEX: \\u8bf4\\u660e");
out.write("<br/>");
out.write("\nUnicode-UTF8: 说æ=(0xE8AFB4E6988E)");
out.write("<br/>");
out.write("</p>");
....
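The garbled literal on the GB2312-binary line (˵Ã÷) is what you get when the GB2312 bytes 0xCBB5C3F7 are read one byte per character as ISO-8859-1. The sketch below reproduces that outside Tomcat and shows that the damage is reversible, because ISO-8859-1 maps every byte to exactly one character. The class and method names are my own, for illustration only. It also assumes the JDK provides the GB2312 charset, which is an extended charset, not one of the few that every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces the garbled literals in the generated servlet: bytes written
// in one encoding, then read back as ISO-8859-1 (one byte = one char).
public class MojibakeDemo {

    // GB2312 bytes of the two characters on the GB2312-binary line.
    static final byte[] GB2312_BYTES = {
        (byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7
    };

    // UTF-8 bytes of the same two characters (Unicode-UTF8 line).
    static final byte[] UTF8_BYTES = {
        (byte) 0xE8, (byte) 0xAF, (byte) 0xB4, (byte) 0xE6, (byte) 0x98, (byte) 0x8E
    };

    // Decodes the bytes the way the XML-syntax page appears to be read.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps bytes 0x00-0xFF to U+0000-U+00FF one-to-one, so
    // re-encoding recovers the original bytes for the correct charset.
    static String repair(String garbled, Charset actual) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    // Lists UTF-16 code units as U+XXXX so output does not depend on the console.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gb = decodeAsLatin1(GB2312_BYTES);
        String utf = decodeAsLatin1(UTF8_BYTES);
        System.out.println(codeUnits(gb));   // U+00CB U+00B5 U+00C3 U+00F7
        System.out.println(codeUnits(utf));  // U+00E8 U+00AF U+00B4 U+00E6 U+0098 U+008E
        System.out.println(codeUnits(repair(gb, Charset.forName("GB2312")))); // U+8BF4 U+660E
        System.out.println(codeUnits(repair(utf, StandardCharsets.UTF_8)));   // U+8BF4 U+660E
    }
}
```

The last two UTF-8 bytes (0x98, 0x8E) become C1 control characters under ISO-8859-1. That is why the Unicode-UTF8 literal in the generated servlet looks truncated when displayed.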
I have tried changing the charset to UTF-8, but it did not work. In my tests, JSP pages
in XML syntax were always decoded as ISO-8859-1. Maybe there is a setting somewhere
to control this, but I don't know of it.
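For comparison, here is a small sketch of what a stand-alone XML parser does with an encoding declaration. It uses the JDK's built-in DOM parser, not Tomcat's JSP compiler, so it only illustrates the behavior I expected from the XML declaration, not what Tomcat actually does. The class name and the helpers document() and parseText() are hypothetical, and the sketch assumes the JDK supports the GB2312 charset. The same bytes should come back as the two Chinese characters when the declaration says gb2312, turn into the four Latin-1 characters seen in the generated servlet when it says ISO-8859-1, and make the document not well-formed when they contain a 0x0E byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: the JDK's XML parser picks the byte encoding from the XML declaration.
public class XmlDeclarationDemo {

    // Wraps raw body bytes in an ASCII XML declaration and a <p> element.
    static byte[] document(String declaredEncoding, byte[] body) {
        byte[] head = ("<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?><p>")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "</p>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(head, 0, head.length);
        buf.write(body, 0, body.length);
        buf.write(tail, 0, tail.length);
        return buf.toByteArray();
    }

    // Returns the text of the root element, or null if the parser rejects the bytes.
    static String parseText(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and errors but rethrows fatal errors,
            // which keeps the parser's default "[Fatal Error]" message off stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed under the declared encoding
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gb = "\u8bf4\u660e".getBytes(Charset.forName("GB2312")); // bytes 0xCBB5C3F7
        // Declared as gb2312: the parser decodes the bytes back to the two characters.
        System.out.println("\u8bf4\u660e".equals(parseText(document("gb2312", gb))));
        // Declared as ISO-8859-1: the same bytes become four Latin-1 characters.
        System.out.println("\u00cb\u00b5\u00c3\u00f7".equals(parseText(document("ISO-8859-1", gb))));
        // U+000E (byte 0x0E here) is a control character XML 1.0 does not allow.
        byte[] withControl = {(byte) 0x8b, (byte) 0xf4, 0x66, 0x0e};
        System.out.println(parseText(document("ISO-8859-1", withControl)) == null);
    }
}
```

If the JDK behaves as I expect, this prints true three times. The point is only that the declared encoding, not the platform default, decides how the bytes are read.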
Supporting Characters from Multiple Languages
If you are planning to write a page that has characters from multiple language encodings,
you have to use Unicode codes and the UTF-8 HTML document encoding. Here is an example
with characters from two encodings: GB2312 and Big5.
<?xml version="1.0"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jstl/core" version="1.2">
<!-- HelpUnicodeUTF8.jsp
Copyright (c) 2004 by Dr. Herong Yang
-->
<jsp:scriptlet><![CDATA[
response.setContentType("text/html; charset=utf-8");
out.println("<html>");
out.println("<head>");
out.println("<meta http-equiv=\"Content-Type\""
+ " content=\"text/html; charset=utf-8\"/>");
out.println("</head>");
out.println("<body>");
out.println("<b>\u8bf4\u660e</b><br/>");
out.println("<p>\u8fd9\u662f\u4e00\u4efd\u975e\u5e38\u7b80\u5355"
+ "\u7684\u8bf4\u660e\u4e66\u2026</p>");
out.println("<b>\u8aaa\u660e</b><br/>");
out.println("<p>\u9019\u662f\u4e00\u4efd\u975e\u5e38\u7c21\u55ae"
+ "\u7684\u8aaa\u660e\u66f8\u2026</p>");
out.println("</body>");
out.println("</html>");
]]></jsp:scriptlet>
</jsp:root>
If you view this page with IE, you should see the same message twice: once in
simplified Chinese and once in traditional Chinese.
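Why a single UTF-8 page is needed can be checked with Java's CharsetEncoder.canEncode(): GB2312 has no code for the traditional form 說, Big5 has no code for the simplified form 说, while UTF-8 covers both. This is a minimal sketch; the class name is made up, and it assumes the JDK provides the extended GB2312 and Big5 charsets.

```java
import java.nio.charset.Charset;

// Checks which encodings can carry both the simplified and traditional titles.
public class MultiLanguageCheck {

    static final String SIMPLIFIED = "\u8bf4\u660e";  // title from the GB2312 part of the page
    static final String TRADITIONAL = "\u8aaa\u660e"; // title from the Big5 part of the page

    // True if every character of the text has a code in the named charset.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String both = SIMPLIFIED + TRADITIONAL;
        for (String cs : new String[] {"GB2312", "Big5", "UTF-8"}) {
            System.out.println(cs + ": " + canEncode(cs, both));
        }
    }
}
```

On a full JDK this should report false for GB2312 and Big5 and true for UTF-8, which is the reason the page above declares charset=utf-8.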
Conclusion
As you can see from my notes in the previous sections, localizing or
internationalizing JSP pages is not an easy task. My recommendations are:
- Avoid using static text. Put the entire page inside a scriptlet, so all text
messages are generated from Java statements.
- Use Unicode codes, in UTF-8 format or \uHEX format, for string literals.
This lets you support characters from all local languages in a single encoding.
- Use UTF-8 as the HTML document encoding instead of the encoding of a particular
local language, like GB2312. This may cause problems for users on local systems
where Unicode fonts are not supported, but more and more local systems support
Unicode and the UTF-8 encoding.
- I still don't know how to control the source code encoding of JSP pages in XML syntax.
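Typing \uHEX codes by hand is error-prone. The sketch below shows one way to generate them: a hypothetical helper (UnicodeEscaper.escape() is my own name, not a library API) that turns any text into an ASCII-only body for a Java string literal, so the scriptlet source no longer depends on how the JSP file itself is decoded. It works on UTF-16 char values, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is what Java source expects.

```java
// Hypothetical helper: converts text into an ASCII-only Java string literal
// body, escaping every non-ASCII char as a backslash-u sequence.
public class UnicodeEscaper {

    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);          // keep the literal valid Java
            } else if (c < 0x80) {
                sb.append(c);                       // plain ASCII stays as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // lowercase hex, as in the page above
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form of the two-character title used in HelpUnicodeUTF8.jsp.
        System.out.println(escape("\u8bf4\u660e"));
    }
}
```

For example, passing the two characters of the title 说明 returns the text \u8bf4\u660e, ready to paste into an out.println() call like the ones in HelpUnicodeUTF8.jsp.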