Non-ASCII Characters in HTML Documents
Part:
1
2
3
4
(Continued from previous part...)
French Characters in HTML Documents - UTF-8 Encoding
Let's play with some French characters in UTF-8 encoding first.
1. On a Windows system, run Start > All Programs > Accessories > Notepad.
2. In Notepad, enter the following HTML document:
<html>
<!-- HelpUtf8French.html
Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<body>
<b>Help</b><br/>
English: System load is very high.<br/>
French: L'utilisation de système est très haute.<br/>
</body>
</html>
3. To enter the French character "e with grave", you can run Start >
All Programs > System Tools > Character Map. Select "e with grave" on the character map.
Click the Select button, then the Copy button. Go back to Notepad and press Ctrl-V
to paste "e with grave" into your HTML document.
4. Select menu File > Save as. Enter the file name as HelpUtf8French.html. Select "UTF-8"
in the Encoding field and click the Save button.
5. Copy HelpUtf8French.html to c:\inetpub\wwwroot. Make sure Internet
Information Services (IIS) is running the local default Web site.
6. Now run Internet Explorer (IE) with http://localhost/HelpUtf8French.html.
You should see the French characters displayed correctly as shown below:
Help
English: System load is very high.
French: L'utilisation de système est très haute.
7. In the IE window, select menu View > Encoding. You should see that UTF-8 is selected.
Note how I specify the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
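What "UTF-8" in the Encoding field means for the saved bytes can be checked outside Notepad. The following is only a minimal Python sketch, not part of the original steps; the variable names are my own. It encodes the French line from the document and shows that each "e with grave" takes two bytes in UTF-8:

```python
# Illustrative check only; variable names are hypothetical.
line = "French: L'utilisation de système est très haute."

# Encode the line the way a UTF-8 save would store its characters.
utf8_bytes = line.encode("utf-8")

# "e with grave" (U+00E8) becomes the two bytes 0xC3 0xA8 in UTF-8.
print("è".encode("utf-8").hex())          # c3a8

# The line contains two "è" characters, so it is 2 bytes longer
# than its character count.
print(len(utf8_bytes) - len(line))        # 2
```

A browser has to be told, or has to detect, that these byte pairs are UTF-8, which is what the charset value in the <meta> tag is for.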
Another interesting thing you should know is how Notepad stores a UTF-8 file. When you use UTF-8
encoding to store a file in Notepad, it inserts a UTF-8 marker (3 bytes) at the beginning of the file.
If you use the "type" command in a command window, you will see this:
>type HelpUtf8French.html
n++<html>
<!-- HelpUtf8French.html
Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<body>
<b>Help</b><br/>
English: System load is very high.<br/>
French: L'utilisation de système est très haute.<br/>
</body>
</html>
The hex value of the UTF-8 marker, also known as the byte order mark (BOM), is 0xEFBBBF. The IIS server will send it to the client.
The IE browser will not show it on the page, but it will use it to detect the encoding scheme, if needed. I am not sure how
other browsers handle this marker.
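The 3-byte marker can also be reproduced and inspected in a short script. This is a sketch under the assumption that Python's "utf-8-sig" codec is an acceptable stand-in for Notepad's UTF-8 save option; the sample HTML string is shortened and the variable names are hypothetical:

```python
import codecs

# Shortened stand-in for the tutorial's HTML file content.
html = "<html>\n<body>\nFrench: très haute.<br/>\n</body>\n</html>\n"

# "utf-8-sig" writes the 3-byte marker before the text.
with_marker = html.encode("utf-8-sig")
without_marker = html.encode("utf-8")

print(with_marker[:3].hex())                 # efbbbf
print(with_marker[:3] == codecs.BOM_UTF8)    # True
print(with_marker[3:] == without_marker)     # True

# Decoding with "utf-8-sig" drops the marker again if it is present.
decoded = with_marker.decode("utf-8-sig")
print(decoded == html)                       # True
```

Apart from the marker, the two byte sequences are identical, so a program that does not expect the marker will see three extra bytes at the start of the file.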
French Characters in HTML Documents - ISO-8859-1 Encoding
Now we know how to make French characters work in an HTML document with the UTF-8 encoding scheme.
Next let's see how to use French characters in HTML documents with the ISO-8859-1 encoding scheme.
1. On a Windows system, run Start > All Programs > Accessories > Notepad.
2. In Notepad, enter the following HTML document:
<html>
<!-- HelpIsoFrench.html
Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
-->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<body>
<b>Help</b><br/>
English: System load is very high.<br/>
French: L'utilisation de système est très haute.<br/>
</body>
</html>
3. To enter the French character "e with grave", you can run Start >
All Programs > System Tools > Character Map. Select "e with grave" on the character map.
Click the Select button, then the Copy button. Go back to Notepad and press Ctrl-V
to paste "e with grave" into your HTML document.
4. Select menu File > Save As. Enter the file name as HelpIsoFrench.html. Select "ANSI"
in the Encoding field and click the Save button. Note that ANSI is an encoding scheme defined by Microsoft
and used on Windows systems. ANSI contains more characters than ISO-8859-1, but it is compatible with ISO-8859-1
for the French characters used here.
5. Copy HelpIsoFrench.html to c:\inetpub\wwwroot. Make sure Internet
Information Services (IIS) is running the local default Web site.
6. Now run Internet Explorer (IE) with http://localhost/HelpIsoFrench.html.
You should see the French characters displayed correctly as shown below:
Help
English: System load is very high.
French: L'utilisation de système est très haute.
7. In the IE window, select menu View > Encoding. You should see that "Western European" is selected.
Again, "Western European" is a different name for ISO-8859-1.
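The relationship between ANSI and ISO-8859-1 mentioned in step 4 can be illustrated with a few lines of Python. This sketch assumes that "ANSI" on a Western European Windows system means Windows-1252, which Python calls "cp1252"; the euro sign is my own choice of example for a character that ANSI has and ISO-8859-1 lacks:

```python
# Assumption: "ANSI" here means Windows-1252 (Python codec name "cp1252").
e_grave = "è"

# "e with grave" is the single byte 0xE8 in both encodings.
print(e_grave.encode("iso-8859-1").hex())   # e8
print(e_grave.encode("cp1252").hex())       # e8

# The euro sign exists in Windows-1252 but not in ISO-8859-1.
euro = "€"
print(euro.encode("cp1252").hex())          # 80
try:
    euro.encode("iso-8859-1")
    euro_in_iso = True
except UnicodeEncodeError:
    euro_in_iso = False
print(euro_in_iso)                          # False
```

Because "e with grave" has the same byte value in both encodings, a file saved as ANSI displays correctly when it is declared as ISO-8859-1, as long as it stays within the characters the two encodings share.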
Easy to do, right? We could make it even easier. You can remove the <meta> tag from
HelpIsoFrench.html. French characters will still show up in the IE window. This is because ISO-8859-1
is the default encoding scheme for IE.
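To see why a default of ISO-8859-1 is enough for this file but would not be for UTF-8 bytes, here is a small Python sketch. It only models a decoder that assumes ISO-8859-1 for whatever bytes it receives; it does not claim to reproduce IE's actual detection logic, and the variable names are hypothetical:

```python
word = "système"

# The same word stored in the two encodings used in this tutorial.
utf8_bytes = word.encode("utf-8")
iso_bytes = word.encode("iso-8859-1")

# A decoder that assumes ISO-8859-1 reads the ISO bytes correctly...
as_iso_from_iso = iso_bytes.decode("iso-8859-1")
# ...but turns each 2-byte UTF-8 sequence into two separate characters.
as_iso_from_utf8 = utf8_bytes.decode("iso-8859-1")

print(as_iso_from_iso)    # système
print(as_iso_from_utf8)   # systÃ¨me
```

So the <meta> tag is optional only when the stored bytes already match the encoding the reader assumes by default.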
(Continued on next part...)