UTF-8 Encoding Pages with GB18030 Characters

This section describes an error case where a UTF-8 encoded page contains GB18030 characters.

The most common error on Chinese Web pages is that some characters use an encoding different from the page's declared encoding. For example, a Web page is set with charset=utf-8, but some characters are entered in GB18030 encoding. In this case, the browser decodes the GB18030 bytes as if they were UTF-8, so those characters will not be displayed correctly.
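The mismatch can be reproduced outside the browser. The following Python sketch is only an illustration and is not part of the original test page; the variable names are made up for this example. It encodes the same Chinese text in both encodings and then decodes the GB18030 bytes as UTF-8, which is roughly what a browser does when it trusts charset=utf-8.

```python
# Illustrative sketch: the same text produces different bytes in each encoding.
text = "简体中文网页"

utf8_bytes = text.encode("utf-8")    # 3 bytes per character here
gb_bytes = text.encode("gb18030")    # 2 bytes per character here

print(len(utf8_bytes), utf8_bytes.hex())
print(len(gb_bytes), gb_bytes.hex())

# A strict UTF-8 decoder rejects the GB18030 bytes outright.
try:
    gb_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as err:
    strict_ok = False
    print("strict decode failed:", err.reason)

# A lenient decoder, like a browser, substitutes U+FFFD for invalid bytes.
garbled = gb_bytes.decode("utf-8", errors="replace")
print(garbled)
```

The lenient result contains U+FFFD replacement characters instead of the original text, which matches the kind of garbling described in this section.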

To demonstrate this problem, I created the test Web page below. The page is set with charset=utf-8, and most Chinese characters are entered in UTF-8 encoding, but some Chinese characters are entered in GB18030 encoding.

<!-- Hello-UTF-8-Error.html
#- Copyright (c) 2015, HerongYang.com, All Rights Reserved.
-->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<b>Chinese characters in UTF-8</b><br/>
Simplified characters: 简体中文网页<br/>
Traditional characters: 繁體中文網頁<br/>
<b>Error: GB18030 characters included in a UTF-8 page</b><br/>
Simplified characters: ??????<br/>

As expected, this Web page, http://localhost/Hello-UTF-8-Error.html, does not display those GB18030 characters correctly:
Figure: Chinese Web Page using UTF-8 with GB18030 Characters
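To locate such characters in a page before it is published, the raw bytes can be scanned for sequences that are not valid UTF-8. The Python sketch below is one possible approach under simple assumptions; the function name find_invalid_utf8 and the sample data are hypothetical and not taken from the test page. It re-slices the data after each error, which is adequate for small pages but slow on very large files.

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return (start, end) byte offsets of runs that fail UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice being decoded.
            bad.append((pos + err.start, pos + err.end))
            pos += err.end  # resume scanning after the invalid bytes
    return bad


# Hypothetical sample: a UTF-8 line with two GB18030-encoded characters mixed in.
prefix = "Simplified characters: ".encode("utf-8")
mixed = prefix + "简体".encode("gb18030") + b"<br/>"
clean = "Simplified characters: 简体<br/>".encode("utf-8")

print(find_invalid_utf8(mixed))
print(find_invalid_utf8(clean))
```

Any reported offset marks bytes that need to be re-entered or converted to UTF-8. A clean scan shows only that the page is well-formed UTF-8. A scan with errors can also miss some bytes, because certain GB18030 byte pairs happen to be valid UTF-8.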

Last update: 2015.

