Presenting Non ASCII Characters in HTML Documents

This section provides a tutorial example on how to present non-ASCII characters in HTML documents, along with rules to ensure they are displayed correctly in Web browsers.

To ensure that non-ASCII characters entered in JSP files show up correctly on browser screens, we need to understand how they are processed from one step to the next. The processing steps can be grouped into two parts:

First, let's look at the second part to see how non-ASCII characters are stored in HTML documents, transferred from Web servers to browsers, and displayed on the screen. Here are some basic rules related to these steps:

To test these rules, I translated my HelpASCII.html into Chinese with the GB2312 encoding scheme and saved it in a file called HelpGB2312.html:

<html>
<!-- HelpGB2312.html
 - Copyright (c) 2006 HerongYang.com. All Rights Reserved.
-->
<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
<body>
<b>&#xCBB5;&#xC3F7;</b><br/>
<p>&#xD5E2;&#xCAC7;&#xD2BB;&#xB7DD;&#xB7C7;
      &#xB3A3;&#xBCE4;&#xB5A5;&#xB5C4;&#xCBB5;
      &#xC3F7;&#xCAE9;&#xA1AD;</p>
</body>
</html>

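You may have trouble reading this file on this page, or copying it to your local system, because it contains non-ASCII characters. In the listing above, each Chinese character is shown in the form &#xHHHH;, where HHHH is the hexadecimal value of its two GB2312 bytes. Below is the same file in hexadecimal format. You can use it to fix or regenerate HelpGB2312.html.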

3C68746D6C3E0D0A3C212D2D2048656C
704742323331322E68746D6C0D0A2020
202020436F7079726967687420286329
20323030342062792044722E20486572
6F6E672059616E670D0A2D2D3E0D0A3C
6D65746120687474702D65717569763D
22436F6E74656E742D54797065222063
6F6E74656E743D22746578742F68746D
6C3B20636861727365743D6762323331
32223E0D0A3C626F64793E0D0A3C623E
CBB5C3F73C2F623E3C62722F3E0D0A3C
703ED5E2CAC7D2BBB7DDB7C7B3A3BCE4
B5A5B5C4CBB5C3F7CAE9A1AD3C2F703E
0D0A3C2F626F64793E0D0A3C2F68746D
6C3E0D0A
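The hex listing above can be turned back into a binary file with a few lines of Java. The following is only a minimal sketch and not part of the original tutorial: the class name HexToFile, the method name hexToBytes, and the input file name HelpGB2312.hex are hypothetical, and it assumes the hex listing has been saved as plain ASCII text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: rebuilds a binary file from a hex listing.
class HexToFile {

    // Converts a string of hex digits into raw bytes.
    // Whitespace and line breaks between digits are ignored.
    static byte[] hexToBytes(String hex) {
        String digits = hex.replaceAll("\\s+", "");
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            String pair = digits.substring(2 * i, 2 * i + 2);
            out[i] = (byte) Integer.parseInt(pair, 16);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: text file holding the hex listing (assumed name: HelpGB2312.hex)
        // args[1]: binary file to write (for example, HelpGB2312.html)
        byte[] text = Files.readAllBytes(Paths.get(args[0]));
        String hex = new String(text, StandardCharsets.US_ASCII);
        Files.write(Paths.get(args[1]), hexToBytes(hex));
    }
}
```

After compiling with javac, it could be run as "java HexToFile HelpGB2312.hex HelpGB2312.html". The bytes are written exactly as listed, with no character decoding in between, so the GB2312 byte pairs reach the output file unchanged.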

When I opened HelpGB2312.html with IE (Internet Explorer), the Chinese characters were displayed correctly on the screen. I checked my IE encoding settings through the View menu and the Encoding command: "Auto-select" was checked and Chinese Simplified (GB2312) was selected. I also checked my IE font settings through the Tools menu, the Internet Options command, and the Fonts button: fonts were installed for the Chinese Simplified language.

When I changed my IE encoding setting to another encoding, such as UTF-8, strange characters showed up on the screen, because I had forced IE to decode my GB2312-encoded document with the UTF-8 encoding scheme.
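The same effect can be reproduced outside the browser. The sketch below is an illustration only, not the author's test: the class name DecodeDemo and its members are hypothetical, and it assumes the Java runtime supports the "GB2312" charset name, which is typical for a full JDK but not guaranteed. It takes the four bytes CB B5 C3 F7 from the file above and decodes them once as GB2312 and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one byte sequence, two decoders.
class DecodeDemo {

    // The four bytes inside the <b> element of HelpGB2312.html.
    static final byte[] GB2312_BYTES = {
        (byte) 0xCB, (byte) 0xB5, (byte) 0xC3, (byte) 0xF7
    };

    // Decodes raw bytes into a Java string with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Formats each UTF-16 code unit of a string as U+XXXX.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumption: "GB2312" is available in this runtime.
        String right = decode(GB2312_BYTES, Charset.forName("GB2312"));
        String wrong = decode(GB2312_BYTES, StandardCharsets.UTF_8);

        // Decoded as GB2312, the bytes give the two intended Chinese characters.
        System.out.println("GB2312: " + codeUnits(right));
        // Decoded as UTF-8, the same bytes give unrelated or replacement characters.
        System.out.println("UTF-8:  " + codeUnits(wrong));
    }
}
```

The bytes never change between the two calls; only the decoder does. That is the situation created by switching the browser's encoding setting while viewing a GB2312-encoded document.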
