Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

Chinese Character Set Encoding Options

This section provides information on character set encoding options for Chinese Web pages: UTF-8, GB, and Big5.

By default, Web servers and Web browsers use ISO-8859-1 (Latin1) encoding to process Web pages. ISO-8859-1 is also the default encoding for many desktop applications and works well for many Western languages. But ISO-8859-1 is a single-byte encoding with room for only 256 characters, so it cannot represent Chinese characters.
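The limitation described above can be checked with a short PHP sketch. It assumes the mbstring extension is loaded, and the variable names are only illustrative. The Chinese text is written as UTF-8 byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// A minimal sketch, assuming the mbstring extension is available.

// "中文" as UTF-8 bytes (E4 B8 AD, E6 96 87).
$chinese = "\xE4\xB8\xAD\xE6\x96\x87";

// "café" as UTF-8 bytes; every character exists in ISO-8859-1.
$western = "caf\xC3\xA9";

// Convert each string to ISO-8859-1 and back to UTF-8.
$chineseRoundTrip = mb_convert_encoding(
    mb_convert_encoding($chinese, 'ISO-8859-1', 'UTF-8'),
    'UTF-8',
    'ISO-8859-1'
);
$westernRoundTrip = mb_convert_encoding(
    mb_convert_encoding($western, 'ISO-8859-1', 'UTF-8'),
    'UTF-8',
    'ISO-8859-1'
);

// Western text survives the round trip; Chinese text does not.
var_dump($westernRoundTrip === $western);
var_dump($chineseRoundTrip === $chinese);
```

The first comparison holds because ISO-8859-1 has a code for every character in the Western string. The second fails because the Chinese characters have no ISO-8859-1 code and are lost during the first conversion.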

To support Chinese characters in Web pages, we have 3 options:

1. Create and serve the static page containing Chinese text in UTF-8 encoding. This option works for both simplified and traditional Chinese characters. The Web page must contain a <meta> tag to inform the browser to use UTF-8 encoding:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

2. Create and serve the static page containing Chinese text in GB encoding. This option is mainly for simplified Chinese characters. The Web page must contain a <meta> tag to inform the browser to use GB encoding:

<meta http-equiv="Content-Type" content="text/html; charset=gb18030"/>

3. Create and serve the static page containing Chinese text in Big5 encoding. This option is mainly for traditional Chinese characters. The Web page must contain a <meta> tag to inform the browser to use Big5 encoding:

<meta http-equiv="Content-Type" content="text/html; charset=big5"/>
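If the pages are generated by a PHP script, the three <meta> tags listed above could be produced by one small helper. The following is only a sketch: the function name charset_meta_tag() and its list of allowed values are assumptions made for illustration, not part of PHP or of this book's code.

```php
<?php
// A minimal sketch. charset_meta_tag() is a hypothetical helper name.
// It returns the <meta> tag for one of the three encodings in this section.
function charset_meta_tag(string $charset): string
{
    // Only the three charset labels discussed in this section are accepted.
    $allowed = ['utf-8', 'gb18030', 'big5'];
    $charset = strtolower($charset);
    if (!in_array($charset, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported charset: $charset");
    }
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . $charset . '"/>';
}

// Print the tag for each option.
foreach (['utf-8', 'gb18030', 'big5'] as $name) {
    echo charset_meta_tag($name), "\n";
}
```

The tag only declares the encoding. The bytes of the page itself must actually be saved in the same encoding, or the browser will display garbled characters.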

I strongly suggest that you use UTF-8 encoding for your Chinese static text pages. If you have old Web pages written in GB or Big5 encoding, you can easily convert them to UTF-8 encoding.
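One possible way to do such a conversion in PHP is the mb_convert_encoding() function from the mbstring extension. The sketch below assumes that extension is loaded and that your PHP build supports the 'GB18030' and 'BIG-5' encoding names. The helper name to_utf8() is hypothetical.

```php
<?php
// A minimal sketch, assuming the mbstring extension is available.
// to_utf8() is a hypothetical helper name used only for illustration.
function to_utf8(string $bytes, string $sourceEncoding): string
{
    // Restrict input to the two legacy encodings discussed in this section.
    $allowed = ['GB18030', 'BIG-5'];
    if (!in_array($sourceEncoding, $allowed, true)) {
        throw new InvalidArgumentException(
            "Unsupported source encoding: $sourceEncoding"
        );
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// "中文" as GB18030 bytes (D6 D0, CE C4) and as Big5 bytes (A4 A4, A4 E5).
$gbText   = "\xD6\xD0\xCE\xC4";
$big5Text = "\xA4\xA4\xA4\xE5";

// Both conversions should produce the same UTF-8 byte sequence.
echo bin2hex(to_utf8($gbText, 'GB18030')), "\n";
echo bin2hex(to_utf8($big5Text, 'BIG-5')), "\n";
```

For a whole page, the same call could be applied to the result of file_get_contents(), with the output written back using file_put_contents(). Remember to change the charset value in the page's <meta> tag to utf-8 as well.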

Sections in This Chapter

Chinese Character Set Encoding Options

HTML Document Travel Path

Chinese Web Pages with UTF-8 Encoding

Chinese Web Pages with GB18030 Encoding

Chinese Web Pages with Big5 Encoding

UTF-8 Encoding Pages with GB18030 Characters

Dr. Herong Yang, updated in 2007