Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

UTF-8 Encoding Pages with GB18030 Characters

This section describes an error case where a page declared as UTF-8 contains characters encoded in GB18030.

The most common errors on Chinese Web pages occur when some characters are stored in an encoding different from the one declared for the page. For example, a Web page is declared with charset=utf-8, but some of its characters are entered in GB18030 encoding. The browser then decodes those GB18030 bytes as UTF-8, so those characters are not displayed correctly.
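To see why the browser gets this wrong, it helps to compare the raw bytes. The short sketch below uses Python purely for illustration (it is not part of the original PHP material). It encodes the same six simplified characters in both encodings, then decodes the GB18030 bytes as if they were UTF-8, which is roughly what a browser does with a page declared as charset=utf-8.

```python
# Compare how the same Chinese text is stored in UTF-8 and in GB18030,
# then decode the GB18030 bytes as if they were UTF-8 (what a browser
# effectively does when the page declares charset=utf-8).
text = "简体中文网页"

utf8_bytes = text.encode("utf-8")    # 3 bytes per character for these characters
gb_bytes = text.encode("gb18030")    # 2 bytes per character for these characters

print(len(utf8_bytes), utf8_bytes.hex(" "))
print(len(gb_bytes), gb_bytes.hex(" "))

# Strict decoding rejects the GB18030 bytes as invalid UTF-8.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason, "at byte", err.start)

# Lenient decoding substitutes U+FFFD for bytes it cannot decode,
# similar to the replacement marks a browser shows.
garbled = gb_bytes.decode("utf-8", errors="replace")
print(garbled)
```

The two byte sequences have different lengths and different values, so there is no way for a UTF-8 decoder to recover the intended characters from the GB18030 bytes.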

To demonstrate this problem, I created the following test Web page. The page declares charset=utf-8, and most of its Chinese characters are entered in UTF-8 encoding, but some Chinese characters are entered in GB18030 encoding.

<html>
<!-- Hello-UTF-8-Error.html
   Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<b>Chinese characters in UTF-8</b><br/>
Simplified characters: 简体中文网页<br/>
Traditional characters: 繁體中文網頁<br/>
<br/>
<b>Error: GB18030 characters included in a UTF-8 page</b><br/>
Simplified characters: ??????<br/>
</body>
</html>
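A file like this can arise, for example, when text from a GB18030 document is pasted into a UTF-8 page without conversion. The sketch below reproduces such a page as raw bytes. It is an illustration under stated assumptions: build_mixed_page is a hypothetical helper, the copyright comment is omitted, and because the original GB18030 characters cannot be recovered from the listing (they appear only as "??????"), the sketch assumes they were 简体中文网页.

```python
from pathlib import Path


def build_mixed_page() -> bytes:
    """Return page bytes that declare UTF-8 but contain one GB18030 line.

    build_mixed_page is a hypothetical helper written for this example.
    """
    head = (
        "<html>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n'
        "<body>\n"
        "<b>Chinese characters in UTF-8</b><br/>\n"
        "Simplified characters: 简体中文网页<br/>\n"
        "Traditional characters: 繁體中文網頁<br/>\n"
        "<br/>\n"
        "<b>Error: GB18030 characters included in a UTF-8 page</b><br/>\n"
        "Simplified characters: "
    )
    tail = "<br/>\n</body>\n</html>\n"
    # Most of the page is encoded as UTF-8, matching the declared charset,
    # but the assumed characters on the error line are encoded as GB18030.
    return (
        head.encode("utf-8")
        + "简体中文网页".encode("gb18030")
        + tail.encode("utf-8")
    )


def write_mixed_page(path: Path) -> None:
    # Write the raw bytes unchanged; opening the file in text mode with an
    # encoding would re-encode everything and hide the mismatch.
    path.write_bytes(build_mixed_page())
```

Calling write_mixed_page(Path("Hello-UTF-8-Error.html")) and placing the file in the Web server's document root should reproduce the behavior described below.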

As expected, this Web page, http://localhost/Hello-UTF-8-Error.html, does not display those GB18030 characters correctly:
(Figure: Chinese Web Page using UTF-8 with GB18030 Characters)
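When a page displays garbled characters like this, one way to locate the offending bytes is to scan the file for byte sequences that are not valid UTF-8. The sketch below is not a standard tool. find_invalid_utf8 is a hypothetical helper that relies only on Python's built-in UTF-8 decoder, repeatedly decoding from the current position and recording each range the decoder rejects.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, int]]:
    """Return merged (start, end) byte ranges that are not valid UTF-8.

    find_invalid_utf8 is a hypothetical helper written for this example.
    """
    spans: list[tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            # Merge with the previous range when the bad bytes are adjacent.
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
            pos = end  # resume decoding right after the rejected bytes
    return spans


# In-memory demo: a UTF-8 prefix followed by GB18030-encoded characters.
sample = "Simplified characters: ".encode("utf-8") + "简体".encode("gb18030")
for start, end in find_invalid_utf8(sample):
    print(f"bytes {start}-{end}: {sample[start:end].hex(' ')}")
```

To check a real file, pass its raw bytes, for example find_invalid_utf8(Path("Hello-UTF-8-Error.html").read_bytes()) after importing Path from pathlib. Note that some GB18030 byte pairs happen to form valid UTF-8 sequences (a lead byte in 0xC2 to 0xDF followed by a byte in 0x80 to 0xBF), so the reported ranges may not cover every stray character.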


Dr. Herong Yang, updated in 2007