Non-ASCII Characters in HTML Documents
Characters of Multiple Languages in HTML Documents
After going through the examples in the previous parts, you should now feel comfortable handling
non-ASCII characters of any single language. For a single language, you can use either UTF-8 or a
language-specific encoding.
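To see this choice concretely, here is a small Python sketch (Python is our choice for illustration; the original steps use only Notepad and IE). It encodes a sample Chinese string in GB2312 and in UTF-8 and checks that both round-trip without loss. The helper name `encoded_sizes` and the sample text are our own, not part of the original page.

```python
from typing import Dict, Iterable


def encoded_sizes(text: str, encodings: Iterable[str]) -> Dict[str, int]:
    """Encode text in each encoding, verify the round trip, return byte counts."""
    sizes = {}
    for encoding in encodings:
        data = text.encode(encoding)  # raises UnicodeEncodeError if a character is missing
        if data.decode(encoding) != text:
            raise ValueError(f"{encoding} did not round-trip")
        sizes[encoding] = len(data)
    return sizes


# Sample Chinese text (our own example): four Chinese characters plus "!".
print(encoded_sizes("你好世界!", ["gb2312", "utf-8"]))  # {'gb2312': 9, 'utf-8': 13}
```

Both encodings hold the same characters; only the bytes differ. GB2312 uses two bytes per Chinese character and UTF-8 uses three for these characters, while the ASCII "!" takes one byte in both.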
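If you want characters of multiple languages in a single HTML document, you have to use UTF-8
encoding, because a language-specific encoding such as ISO-8859-1 or GB2312 covers only a limited
set of characters, while UTF-8 can represent every Unicode character. Here are the steps you can
follow to create an HTML document in UTF-8 that contains several languages: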
1. On a Windows system, run Start > All Programs > Accessories > Notepad.
2. In Notepad, enter the following HTML document:
<html>
<!-- HelloUtf8MultiLanguages.html
Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<b>Test</b><br/>
English: Hello world!<br/>
Spanish: ¡Hola mundo!<br/>
Korean: ???? ?? !<br/>
Chinese: ????!<br/>
</body>
</html>
Again, you will see some "?" characters in the source above. This is because this book is stored
in ISO-8859-1 encoding, which cannot represent Korean or Chinese characters.
3. Don't try to type those hello messages yourself. Go to the Google language tools site,
http://www.google.com/language_tools, enter "Hello world!", and translate it into the other languages.
On the translation output page, copy the translations and paste them into Notepad.
This should cause no corruption, because the Google site, Windows IE, and Notepad all support UTF-8.
4. Select menu File > Save As. Enter HelloUtf8MultiLanguages.html as the file name, select "UTF-8"
in the Encoding field, and click the Save button.
5. Copy HelloUtf8MultiLanguages.html to c:\inetpub\wwwroot. Make sure your Internet
Information Services (IIS) server is running the local Default Web Site.
6. Now run Internet Explorer (IE) with http://localhost/HelloUtf8MultiLanguages.html.
You should see all characters displayed correctly.
7. In the IE window, select menu View > Encoding. You should see that UTF-8 is selected.
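Why the steps insist on UTF-8 for a multi-language page can be checked directly. The minimal Python sketch below, assuming a sample string of our own that mixes English, Spanish, Korean, and Chinese, tries a strict encode with a few candidate encodings. The helper `supported_encodings` is hypothetical, written only for this illustration.

```python
from typing import List


def supported_encodings(text: str, candidates: List[str]) -> List[str]:
    """Return the candidate encodings that can represent every character of text."""
    result = []
    for encoding in candidates:
        try:
            text.encode(encoding)  # strict mode: raises on any unencodable character
        except UnicodeEncodeError:
            continue  # this encoding cannot hold the whole text
        result.append(encoding)
    return result


# Sample mixed-language text (our own example, not the page's exact strings).
mixed = "Hello world! ¡Hola mundo! 안녕하세요 你好世界"
print(supported_encodings(mixed, ["iso-8859-1", "gb2312", "utf-8"]))  # ['utf-8']

# errors="replace" substitutes "?" for each character the encoding lacks.
print("안녕 你好".encode("iso-8859-1", errors="replace"))  # b'?? ??'
```

Only UTF-8 can hold the whole string. The last line shows how characters missing from ISO-8859-1 turn into "?" when text is forced into that encoding, which is similar to the damage visible in the book's copy of the source.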
Conclusion
- You can use only one encoding scheme in an HTML document. You should use a <meta> tag
to specify the encoding name.
- Entering non-ASCII characters into an HTML document with the desired encoding is a challenge.
If you are not sure which encoding your editor used to save an HTML document, open the
document in another editor to verify it.
- Use UTF-8 as the HTML document encoding instead of an encoding for a particular
local language, like GB2312. This may cause problems for users on local systems
that do not support Unicode fonts, but more and more local systems now support
Unicode and UTF-8 encoding.
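The advice to declare the encoding in a <meta> tag and to double-check what the editor actually saved can also be automated. The sketch below is a rough check, not a full HTML parser: the helper name `check_html_encoding`, the regular expression, and the 1024-byte search window are our own assumptions. It reads the charset declared in a <meta> tag and tests whether the file's bytes really decode in that charset.

```python
import re
from typing import Optional, Tuple

# Assumption: the declaration looks like charset=NAME inside a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
UTF8_BOM = b"\xef\xbb\xbf"


def check_html_encoding(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (declared charset, True if the bytes decode cleanly with it)."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]  # Notepad's "UTF-8" option may write this marker
    match = META_CHARSET.search(data[:1024])  # assumed search window
    if match is None:
        return None, False  # no declared charset found
    charset = match.group(1).decode("ascii").lower()
    try:
        data.decode(charset)  # strict: any invalid byte sequence raises
    except (UnicodeDecodeError, LookupError):
        return charset, False
    return charset, True


page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8"/></head>'
    "<body>Spanish: ¡Hola mundo!</body></html>"
)
print(check_html_encoding(page.encode("utf-8")))       # ('utf-8', True)
print(check_html_encoding(page.encode("iso-8859-1")))  # ('utf-8', False)
```

The second call simulates an editor that saved the file as ISO-8859-1 even though the <meta> tag says UTF-8: the single byte used for "¡" is not valid UTF-8, so the check fails, which is the kind of mismatch the "open it in another editor" advice is meant to catch.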