Building Chinese Web Sites Using PHP - Version 2.20, by Dr. Herong Yang
UTF-8 Encoding Pages with GB18030 Characters
This section describes an error case where a UTF-8 encoding page contains GB18030 characters.
The most common error on Chinese Web pages is that some characters are encoded differently from the encoding declared for the page. For example, a Web page may declare charset=utf-8 while some of its characters are entered in GB18030 encoding. In this case, those GB18030 characters will not be displayed correctly.
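To see why the characters break, it helps to look at the raw bytes. The following Python sketch is an illustration, not part of the original PHP material. It encodes the same Chinese text in GB18030 and in UTF-8, then decodes the GB18030 bytes as UTF-8. A browser does the same thing when it trusts a charset=utf-8 declaration. The variable names are illustrative only.

```python
# Illustrative sketch: GB18030 bytes interpreted as UTF-8.
text = "简体中文网页"

gb_bytes = text.encode("gb18030")   # 2 bytes per character here
utf8_bytes = text.encode("utf-8")   # 3 bytes per character here

print("GB18030:", gb_bytes.hex(" "))
print("UTF-8:  ", utf8_bytes.hex(" "))

# Strict decoding refuses the GB18030 bytes outright.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# A lenient decoder (like a browser) substitutes U+FFFD for invalid sequences.
shown = gb_bytes.decode("utf-8", errors="replace")
print("displayed as:", shown)
```

The same six characters take different byte sequences in the two encodings. When the page declares the wrong encoding for some of those bytes, the browser cannot recover the intended characters.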
To demonstrate this problem, I created the following test Web page. The page is declared with charset=utf-8, and most of its Chinese characters are entered in UTF-8 encoding, but some are entered in GB18030 encoding.
<html>
<!-- Hello-UTF-8-Error.html
#- Copyright (c) 2005 HerongYang.com, All Rights Reserved.
-->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<body>
<b>Chinese characters in UTF-8</b><br/>
Simplified characters: 简体中文网页<br/>
Traditional characters: 繁體中文網頁<br/>
<br/>
<b>Error: GB18030 characters included in a UTF-8 page</b><br/>
Simplified characters: ??????<br/>
</body>
</html>
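A text editor saves a whole file in one encoding, so a mixed-encoding file like this one is awkward to create by hand. The Python sketch below is one way to reproduce it. It writes every part of the page as UTF-8 except one deliberately GB18030-encoded run. The file name matches the listing. The six characters in the GB18030 run are an assumption: the listing shows them only as ??????, so this sketch reuses 简体中文网页. The names build_mixed_page, HEADER, BAD_TEXT and FOOTER are hypothetical.

```python
from pathlib import Path

# Page text before the faulty run; this part is encoded as UTF-8.
HEADER = (
    "<html>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n'
    "<body>\n"
    "<b>Chinese characters in UTF-8</b><br/>\n"
    "Simplified characters: 简体中文网页<br/>\n"
    "Traditional characters: 繁體中文網頁<br/>\n"
    "<br/>\n"
    "<b>Error: GB18030 characters included in a UTF-8 page</b><br/>\n"
    "Simplified characters: "
)
# Assumed content of the ?????? run; encoded as GB18030 on purpose.
BAD_TEXT = "简体中文网页"
# Page text after the faulty run; encoded as UTF-8 again.
FOOTER = "<br/>\n</body>\n</html>\n"


def build_mixed_page() -> bytes:
    """Return the page bytes: UTF-8 everywhere except BAD_TEXT (GB18030)."""
    return (HEADER.encode("utf-8")
            + BAD_TEXT.encode("gb18030")
            + FOOTER.encode("utf-8"))


if __name__ == "__main__":
    Path("Hello-UTF-8-Error.html").write_bytes(build_mixed_page())
```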
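As expected, when this Web page is opened at http://localhost/Hello-UTF-8-Error.html, the browser does not display the GB18030 characters correctly. Their byte sequences are generally not valid UTF-8, so the browser shows replacement symbols or unrelated characters in their place.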
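To find the offending characters in a larger page, you can scan the file's bytes for sequences that are not valid UTF-8. The Python sketch below does this with the standard strict UTF-8 decoder. Each UnicodeDecodeError reports the start and end of the bad bytes, and the scan repeats from that point. Adjacent bad ranges are merged into one. The function name find_invalid_utf8 is hypothetical. The sketch re-slices the data after each error, which is simple but not efficient for very large files.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int]]:
    """Return merged (start, end) byte ranges of data that are not valid UTF-8."""
    spans: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict: raises at the first bad sequence
            break                       # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)  # extend the previous bad run
            else:
                spans.append((start, end))
            pos = end  # resume scanning after the offending bytes
    return spans


if __name__ == "__main__":
    from pathlib import Path

    page_path = Path("Hello-UTF-8-Error.html")  # assumed local copy of the page
    if page_path.exists():
        page = page_path.read_bytes()
        for start, end in find_invalid_utf8(page):
            print(f"bytes {start}-{end}: {page[start:end].hex(' ')}")
```

This check can miss some bytes. A pair of GB18030 bytes can happen to form a valid UTF-8 sequence, so some misencoded characters show up as unrelated characters rather than invalid bytes. The reported ranges show where to look, but they may not cover every misencoded character.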
Last update: 2015.