JSP Tutorials - Herong's Tutorial Examples - v5.11, by Herong Yang
Presenting Non ASCII Characters in HTML Documents
This section provides a tutorial example on how to present non-ASCII characters in HTML documents, and rules to ensure that they are displayed correctly in Web browsers.
To ensure that non-ASCII characters entered in JSP files show up correctly on browser screens, we need to understand how they are processed from one step to the next. The processing steps can be grouped into two parts:
First, let's look at the second part to see how non-ASCII characters are stored in HTML documents, transferred from Web servers to browsers, and displayed on the screen. Here are some basic rules related to these steps:
To test these rules, I translated my HelpASCII.html into Chinese using the GB2312 encoding scheme, and saved it in a file called HelpGB2312.html:
<html>
<!-- HelpGB2312.html - Copyright (c) 2006 HerongYang.com. All Rights Reserved. -->
<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
<body>
<b>说明</b><br/>
<p>这是一份非常间单的说明书…</p>
</body>
</html>
You may have trouble reading this file on this page, or copying it to your local system, because it contains non-ASCII characters. Below is the same file in hexadecimal format. You can use it to fix or regenerate HelpGB2312.html.
3C68746D6C3E0D0A3C212D2D2048656C
704742323331322E68746D6C0D0A2020
202020436F7079726967687420286329
20323030342062792044722E20486572
6F6E672059616E670D0A2D2D3E0D0A3C
6D65746120687474702D65717569763D
22436F6E74656E742D54797065222063
6F6E74656E743D22746578742F68746D
6C3B20636861727365743D6762323331
32223E0D0A3C626F64793E0D0A3C623E
CBB5C3F73C2F623E3C62722F3E0D0A3C
703ED5E2CAC7D2BBB7DDB7C7B3A3BCE4
B5A5B5C4CBB5C3F7CAE9A1AD3C2F703E
0D0A3C2F626F64793E0D0A3C2F68746D
6C3E0D0A
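One possible way to regenerate HelpGB2312.html from the hex dump is sketched below in Java. This is not part of the original tutorial: the class name HexDumpToHtml, the helper hexToBytes, and the output path are assumptions made for illustration. The program writes the raw bytes without any charset conversion, so the GB2312 byte sequences reach the file unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; its name and the output path are illustrative only.
class HexDumpToHtml {

    // The hex dump shown on this page, one 16-byte group per string.
    static final String HEX = String.join("",
        "3C68746D6C3E0D0A3C212D2D2048656C",
        "704742323331322E68746D6C0D0A2020",
        "202020436F7079726967687420286329",
        "20323030342062792044722E20486572",
        "6F6E672059616E670D0A2D2D3E0D0A3C",
        "6D65746120687474702D65717569763D",
        "22436F6E74656E742D54797065222063",
        "6F6E74656E743D22746578742F68746D",
        "6C3B20636861727365743D6762323331",
        "32223E0D0A3C626F64793E0D0A3C623E",
        "CBB5C3F73C2F623E3C62722F3E0D0A3C",
        "703ED5E2CAC7D2BBB7DDB7C7B3A3BCE4",
        "B5A5B5C4CBB5C3F7CAE9A1AD3C2F703E",
        "0D0A3C2F626F64793E0D0A3C2F68746D",
        "6C3E0D0A");

    // Converts a string of hex digit pairs into the bytes they represent.
    // Whitespace between digits is ignored.
    static byte[] hexToBytes(String hex) {
        String clean = hex.replaceAll("\\s+", "");
        if (clean.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[clean.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = hexToBytes(HEX);
        // Write the bytes as they are; no text encoding is applied.
        Files.write(Paths.get("HelpGB2312.html"), bytes);
        System.out.println("Wrote " + bytes.length + " bytes to HelpGB2312.html");
    }
}
```

Working at the byte level avoids depending on the encoding of the Java source file or of the console, which is the kind of problem the hex dump is meant to sidestep.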
When I opened HelpGB2312.html with IE (Internet Explorer), I saw the Chinese characters displayed correctly on the screen. I checked my IE encoding settings through the View menu, Encoding command: "Auto-select" was checked and Chinese Simplified (GB2312) was selected. I also checked my IE font settings through the Tools menu, Internet Options command, Fonts button: fonts were installed for the Chinese Simplified language.
When I changed my IE encoding setting to another encoding, such as UTF-8, strange characters showed up on the screen, because I had forced IE to decode my GB2312-encoded document with the UTF-8 encoding scheme.
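The effect of decoding with the wrong encoding can be reproduced outside the browser. The Java sketch below is an illustration under stated assumptions, not the tutorial's own code: the class DecodeDemo and its helpers are hypothetical, and it assumes the Java runtime supports the "GB2312" charset, which is typical of a full JDK but not guaranteed on minimal runtimes. It decodes the four bytes CB B5 C3 F7, the bold title in HelpGB2312.html, once as GB2312 and once as UTF-8, and prints the resulting code points so that the console encoding does not interfere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class DecodeDemo {

    // GB2312 bytes of the bold title in HelpGB2312.html (CB B5 C3 F7).
    static final byte[] TITLE = {
        (byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7
    };

    // Decodes the bytes with the given charset. Malformed input is
    // replaced by the decoder's default replacement character.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Formats a string as a space-separated list of Unicode code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumes the runtime provides the GB2312 charset.
        String right = decode(TITLE, Charset.forName("GB2312"));
        String wrong = decode(TITLE, StandardCharsets.UTF_8);
        System.out.println("Decoded as GB2312: " + codePoints(right));
        System.out.println("Decoded as UTF-8:  " + codePoints(wrong));
    }
}
```

The same bytes yield different characters depending on the decoder, which is what happens when the browser's encoding setting does not match the encoding the document was saved in.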