Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

UTF-8 Encoding Pages with Big5 Characters

This section describes an error case where a UTF-8 encoded page contains Big5 encoded character strings.

The most common error on Chinese Web pages generated from PHP scripts is that some character strings use an encoding different from the page's encoding setting. For example, a PHP script declares the output Web page as charset=utf-8, but some character strings are entered in Big5 encoding. Since PHP outputs string bytes exactly as they are stored, the browser decodes those Big5 bytes as UTF-8, and the Big5 characters are not displayed correctly.
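To see why the Big5 characters break, it helps to compare raw bytes. The sketch below is only an illustration, not part of the original script: the sample text "中文", the helper name is_valid_utf8() and the preg_match('//u') validity check are assumptions chosen for this example. It stores the same two characters as UTF-8 bytes and as Big5 bytes (written as hex escapes so the file itself can stay in UTF-8) and checks whether each byte sequence is valid UTF-8.

```php
<?php
// Illustration only: "中文" encoded two ways, written as hex escapes.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";  // "中文" in UTF-8 (3 bytes per character)
$big5 = "\xA4\xA4\xA4\xE5";          // "中文" in Big5 (2 bytes per character)

// Hypothetical helper: true if $s is a valid UTF-8 byte sequence.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function is_valid_utf8(string $s): bool {
    return preg_match('//u', $s) === 1;
}

printf("UTF-8 bytes: %s, valid UTF-8: %s\n",
    bin2hex($utf8), is_valid_utf8($utf8) ? 'yes' : 'no');
printf("Big5 bytes:  %s, valid UTF-8: %s\n",
    bin2hex($big5), is_valid_utf8($big5) ? 'yes' : 'no');
```

Running it with the PHP CLI should report the UTF-8 bytes as valid and the Big5 bytes as invalid UTF-8. A byte such as 0xA4 can only appear as a continuation byte in UTF-8, so a browser decoding the page as UTF-8 cannot map these Big5 bytes back to the intended characters.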

To demonstrate this problem, I created the following test PHP script. The output Web page is set to charset=utf-8, and most Chinese strings are entered in UTF-8 encoding, but one string is entered in Big5 encoding.

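<?php #String-UTF-8-Error.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
# This file is saved in UTF-8. The Big5 test string is generated with
# iconv() here, because raw Big5 bytes cannot be shown in a UTF-8 listing.
  $help_simplified = '这是一份非常间单的说明书…';
  $help_tradition = '這是一份非常間單的說明書…';
  # Same traditional Chinese text, but encoded in Big5 (2 bytes per character)
  $help_big5 = iconv('UTF-8', 'BIG5', $help_tradition);
  print('<html>');
  print('<head>');
  print('<meta http-equiv="Content-Type"'.
    ' content="text/html; charset=utf-8"/>');
  print('</head>');
  print('<body>');
  print('<b>Chinese string in UTF-8 in PHP</b><br/>');
  print($help_simplified.'<br/>');
  print($help_tradition.'<br/>');
  # Big5 bytes sent to a page declared as UTF-8: these will not display correctly
  print('<b>Big5 string included in a UTF-8 page</b><br/>');
  print($help_big5.'<br/>');
  print('</body>');
  print('</html>');
?>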

As expected, this Web page, http://localhost/String-UTF-8-Error.html, does not display those Big5 characters correctly:
[Screenshot: Chinese Web Page using UTF-8 with Big5 Characters]
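One way to avoid this error, sketched here as an assumption rather than a method given in this section, is to convert every Big5 string to UTF-8 before printing it on a page declared as charset=utf-8. The helper name big5_to_utf8() is hypothetical, and the sample bytes are the Big5 encoding of "中文". The conversion uses PHP's iconv() function, which requires the iconv extension and Big5 support in the underlying iconv library.

```php
<?php
// Sketch of a possible fix: convert Big5 strings to UTF-8 before output.
$big5 = "\xA4\xA4\xA4\xE5";  // "中文" in Big5, written as hex escapes

// Hypothetical helper: converts a Big5 string to UTF-8.
// Returns null if iconv() rejects the input as invalid Big5
// (the @ suppresses the notice iconv() raises for illegal characters).
function big5_to_utf8(string $s): ?string {
    $out = @iconv('BIG5', 'UTF-8', $s);
    return $out === false ? null : $out;
}

$utf8 = big5_to_utf8($big5);
print('<html><head>');
print('<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>');
print('</head><body>');
print('<b>Big5 string converted to UTF-8</b><br/>');
print($utf8.'<br/>');
print('</body></html>');
```

After the conversion, every string printed on the page uses the same encoding as the charset declaration, so the browser can decode all of them consistently.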


Dr. Herong Yang, updated in 2007