PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Non ASCII Characters in HTML documents


PHP Tutorials - Herong's Tutorial Notes © Dr. Herong Yang


(Continued from previous part...)

Characters of Multiple Languages in HTML Documents

After going through the above examples, you should now feel comfortable handling non-ASCII characters of any single language. For a single language, you can choose either UTF-8 or a language-specific encoding, such as GB2312 for simplified Chinese.

If you want characters of multiple languages in a single HTML document, you have to use UTF-8 encoding, because a language-specific encoding can represent only the characters of a limited set of languages. Here are the steps you can follow to make an HTML document in UTF-8 that contains several languages.
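To see why a single-language encoding is not enough, here is a minimal Python sketch (Python is used only for illustration). It checks whether each hello message can be encoded in UTF-8 and in ISO-8859-1 (Latin-1), a Western European encoding. The Korean and Chinese strings are sample translations chosen for this sketch, not the text of the original file.

```python
# Sketch: compare UTF-8 with a single-language encoding (ISO-8859-1)
# for text that mixes several languages. The Korean and Chinese
# strings are illustrative sample translations of "Hello world!".
SAMPLES = {
    "English": "Hello world!",
    "Spanish": "¡Hola mundo!",
    "Korean": "안녕하세요 세계!",
    "Chinese": "你好世界!",
}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        print(f"{language:8} utf-8={can_encode(text, 'utf-8')} "
              f"iso-8859-1={can_encode(text, 'iso-8859-1')}")
    # Characters that ISO-8859-1 cannot hold are replaced by "?" when the
    # "replace" error handler is used: the same kind of "?" corruption you
    # get when mixed-language text is stored in a single-language encoding.
    print(SAMPLES["Korean"].encode("iso-8859-1", errors="replace"))  # b'????? ??!'
```

English and Spanish fit in ISO-8859-1 because "¡" is a Latin-1 character. Korean and Chinese do not fit, but UTF-8 can hold all four lines in one file.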

1. On a Windows system, run Start > All Programs > Accessories > Notepad.

2. In Notepad, enter the following HTML document:

<html>
<!-- HelloUtf8MultiLanguages.html
   Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<b>Test</b><br/>
English: Hello world!<br/>
Spanish: ¡Hola mundo!<br/>
Korean: ???? ?? !<br/>
Chinese: ????!<br/>
</body>
</html>

Again, you will see some "?" characters in the above source in this book. This is because my book is stored in ISO-8859-1 encoding, which cannot represent Korean or Chinese characters.

3. Don't try to type those hello messages yourself. Go to the Google language tool site, http://www.google.com/language_tools. Enter "Hello world!" and translate it into other languages. On the translation output page, copy the translations and paste them into Notepad. This should cause no corruption, because the Google site, IE on Windows, and Notepad all support UTF-8.

4. Select menu File > Save as. Enter the file name as HelloUtf8MultiLanguages.html. Select "UTF-8" in the Encoding field and click the Save button.

5. Copy HelloUtf8MultiLanguages.html to c:\inetpub\wwwroot. Make sure your Internet Information Services (IIS) server is running the local Default Web Site.

6. Now run Internet Explorer (IE) with http://localhost/HelloUtf8MultiLanguages.html. You should see all characters displayed correctly.

7. In the IE window, select menu View > Encoding. You should see that UTF-8 is selected.
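Steps 2 to 4 can also be scripted, so the file's encoding does not depend on choosing the right option in an editor's Save as dialog. Here is a minimal Python sketch of that idea. It writes the same kind of page to HelloUtf8MultiLanguages.html, placing the <meta> tag inside a <head> element. The Korean and Chinese lines are sample translations chosen for this sketch, and the helper names build_page() and save_utf8() are hypothetical.

```python
# Sketch: generate a multi-language HTML page and save it as UTF-8,
# as an alternative to typing and saving it in Notepad. The helper
# names and the Korean/Chinese sample text are illustrative only.
from pathlib import Path

LINES = [
    ("English", "Hello world!"),
    ("Spanish", "¡Hola mundo!"),
    ("Korean", "안녕하세요 세계!"),
    ("Chinese", "你好世界!"),
]


def build_page(lines) -> str:
    """Return the HTML text; the <meta> tag declares charset=utf-8."""
    body = "\n".join(f"{name}: {text}<br/>" for name, text in lines)
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n'
        "</head>\n"
        "<body>\n"
        "<b>Test</b><br/>\n"
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )


def save_utf8(path: Path, html: str) -> None:
    # encoding="utf-8" makes the bytes on disk match the charset declared
    # in the <meta> tag (Python's "utf-8" codec writes no byte order mark).
    path.write_text(html, encoding="utf-8")


if __name__ == "__main__":
    target = Path("HelloUtf8MultiLanguages.html")
    save_utf8(target, build_page(LINES))
    print(f"wrote {target} ({len(target.read_bytes())} bytes)")
```

The generated file can then be copied to c:\inetpub\wwwroot and checked in IE as in steps 5 to 7.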

Conclusion

  • You can use only one encoding scheme in an HTML document. You should use a <meta> tag to specify the encoding name.
  • Entering non-ASCII characters into an HTML document in the desired encoding is a challenge. If you are not sure which encoding your editor used to store the HTML document, open the document with another editor to verify it.
  • Use UTF-8 as the HTML document encoding instead of an encoding for a particular local language, like GB2312. This may cause problems for users on local systems that do not support Unicode fonts, but more and more local systems now support Unicode and UTF-8 encoding.
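As a complement to checking a file with a second editor, here is a minimal Python sketch that covers the first two points. It reads the charset declared in the <meta> tag and reports whether the file's bytes actually decode with that charset. The regular expression is a simplification, not a full HTML parser, and the function names are hypothetical.

```python
# Sketch: report the charset declared in a <meta> tag and whether the
# file's bytes really decode with it. The regex is a simplification;
# declared_charset() and check_encoding() are illustrative names.
import re
import sys
from pathlib import Path

CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def declared_charset(data: bytes):
    """Return the charset named in the first <meta ... charset=...> tag, or None."""
    match = CHARSET_RE.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def check_encoding(data: bytes) -> str:
    charset = declared_charset(data)
    if charset is None:
        return "no charset declared in a <meta> tag"
    try:
        data.decode(charset)
    except LookupError:
        return f"unknown charset {charset!r}"
    except UnicodeDecodeError as err:
        return f"declared {charset}, but byte {err.start} is not valid {charset}"
    return f"OK: content decodes as declared charset {charset}"


if __name__ == "__main__":
    # Usage: python check_charset.py HelloUtf8MultiLanguages.html
    for name in sys.argv[1:]:
        print(name, "->", check_encoding(Path(name).read_bytes()))
```

A passing check only shows that the bytes are valid for the declared charset. For example, every byte sequence is valid ISO-8859-1, so viewing the page in a browser is still the final test.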


Dr. Herong Yang, updated in 2006