PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Non ASCII Characters in HTML documents

(Continued from previous part...)

Chinese Characters in HTML Documents - UTF-8 Encoding

Now let's play with Chinese characters. They are definitely harder to work with than French characters. My first example shows you how to handle Chinese characters in HTML documents with UTF-8 encoding.

1. On a Windows system, run Start > All Programs > Accessories > Notepad.

2. In Notepad, enter the following HTML document:

<html>
<!-- HelpUtf8Chinese.html
   Copyright (c) 2002 by Dr. Herong Yang, http://www.herongyang.com/
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<b>??</b><br/>
????????????<br/>
</body>
</html>

When I copied this HTML document into this book, I had to replace all UTF-8 encoded Chinese characters with "?" characters, because this book is written as HTML documents with the default encoding scheme, which cannot represent Chinese characters. To follow this tutorial, just enter any Chinese character wherever you see a "?".

3. As I mentioned earlier in this book, entering Chinese characters is not an easy job. You need to use a Chinese Windows system, or a Chinese input tool on a non-Chinese Windows system. If you don't have any Chinese input tool, you can simply go to the Yahoo Chinese Web site, http://www.yahoo.com.cn/, copy some Chinese characters, and paste them into Notepad. The Yahoo Chinese Web site is encoded in UTF-8.

4. Select menu File > Save As. Enter HelpUtf8Chinese.html as the file name, select "UTF-8" in the Encoding field, and click the Save button.

5. Copy HelpUtf8Chinese.html to c:\inetpub\wwwroot. Make sure Internet Information Services (IIS) is running the local default Web site.

6. Now run Internet Explorer (IE) with http://localhost/HelpUtf8Chinese.html. You should see the Chinese characters displayed correctly.

7. On the IE window, select menu View > Encoding. You should see UTF-8 is selected.
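
As a cross-check on steps 2 through 4, here is a minimal Python sketch that writes a similar page in UTF-8 and looks at the bytes that end up in the file. Python is used only for illustration; the sample text "说明", the variable names, and the temporary output directory are assumptions, not part of the original example. The last lines show one way Chinese characters can turn into "?" when text is forced into ISO-8859-1.

```python
# Sketch only: save a small HTML page as UTF-8 and inspect the bytes.
# The sample text and the temporary directory are illustrative assumptions.
import os
import tempfile

sample = "说明"  # hypothetical stand-in for the "??" placeholder

html = (
    "<html>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n'
    "<body>\n"
    f"<b>{sample}</b><br/>\n"
    "</body>\n"
    "</html>\n"
)

# Write the page with an explicit encoding, like choosing "UTF-8" in Save As.
path = os.path.join(tempfile.mkdtemp(), "HelpUtf8Chinese.html")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(html)

# Read the file back as raw bytes to see what was actually stored.
with open(path, "rb") as f:
    raw = f.read()

utf8_bytes = sample.encode("utf-8")
print(len(utf8_bytes))    # 6: each of the two characters takes 3 bytes
print(utf8_bytes in raw)  # True: the file contains those bytes

# ISO-8859-1 has no Chinese characters, so each one is replaced by "?".
latin1_view = sample.encode("iso-8859-1", errors="replace")
print(latin1_view)        # b'??'
```

The point of the sketch is that the charset named in the meta tag and the encoding used when saving the file must agree; the browser only sees the bytes and the declared charset.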

Chinese Characters in HTML Documents - GB2312 Encoding

Now we are ready to test Chinese characters in HTML documents with the GB2312 encoding scheme.

1. This time, we cannot use Notepad, because Notepad is not compatible with the GB2312 encoding. It will actually convert GB2312-encoded text to UTF-8. So don't use Notepad.

You need to get another text editor, such as Jext, to help you enter the Chinese characters in GB2312 encoding.

2. In a good text editor, enter the following HTML document:

<html>
<!-- HelpGb2312Chinese.html
   Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
</head>
<body>
<b>说明</b><br/>
这是一份非常简单的说明书…<br/>
</body>
</html>

Be careful: when you read the above code in this book, the Chinese characters may not be displayed correctly. The reason is again that this book is written in ISO-8859-1 encoding.

3. Entering Chinese characters in GB2312 encoding also requires a Chinese input tool. If you don't have one, you can simply go to my GB2312 page, http://www.herongyang.com/gb2312_gb/, open the source code of the page, copy some Chinese characters, and paste them into the editor. My GB2312 page is encoded in GB2312. Warning: do not copy Chinese characters from the IE browser window. The browser window's copy function assumes UTF-8 encoding and will corrupt the copied characters.

4. Select menu File > Save As. Enter HelpGb2312Chinese.html as the file name and click the Save button.

5. Copy HelpGb2312Chinese.html to c:\inetpub\wwwroot. Make sure Internet Information Services (IIS) is running the local default Web site.

6. Now run Internet Explorer (IE) with http://localhost/HelpGb2312Chinese.html. You should see the Chinese characters displayed correctly.

7. On the IE window, select menu View > Encoding. You should see GB2312 is selected.

Still not hard to do, right? The key point is to use an editor that is compatible with GB2312.
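
To make the difference between the two encodings concrete, here is a small Python sketch that saves a similar page in GB2312 and compares the stored bytes with their UTF-8 form. Python is used only for illustration; the sample text "说明", the variable names, and the temporary output directory are assumptions rather than part of the original example.

```python
# Sketch only: save a small HTML page as GB2312 and compare it with UTF-8.
# The sample text and the temporary directory are illustrative assumptions.
import os
import tempfile

sample = "说明"  # two Chinese characters taken from the example page

gb_bytes = sample.encode("gb2312")
utf8_bytes = sample.encode("utf-8")
print(len(gb_bytes))    # 4: GB2312 uses 2 bytes per Chinese character
print(len(utf8_bytes))  # 6: UTF-8 uses 3 bytes for each of these characters

html = (
    "<html>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>\n'
    "<body>\n"
    f"<b>{sample}</b><br/>\n"
    "</body>\n"
    "</html>\n"
)

# Write the page with GB2312 as the file encoding.
path = os.path.join(tempfile.mkdtemp(), "HelpGb2312Chinese.html")
with open(path, "w", encoding="gb2312", newline="") as f:
    f.write(html)

with open(path, "rb") as f:
    raw = f.read()

# Decoding with the matching encoding restores the original text.
print(raw.decode("gb2312") == html)  # True

# Decoding the same bytes as UTF-8 does not: invalid sequences are replaced,
# so the Chinese characters come out corrupted.
misread = raw.decode("utf-8", errors="replace")
print(misread != html)               # True
```

The same bytes read under the wrong encoding give the kind of corruption the warning in step 3 describes, which is why the editor, the saved file, and the charset in the meta tag all have to agree on GB2312.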

(Continued on next part...)

Dr. Herong Yang, updated in 2006