Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

Non-ASCII Test Result Analysis

This section provides a detailed analysis of the test results of a PHP script storing non-ASCII characters.

In previous sections, we reviewed MySQL's character set encoding conversions happened between client programs and MySQL server. Now I want to look again at the PHP test script that stores non-ASCII characters to Comment_Mixed table's text columns defined with different encodings.

<?php #MySQL-Insert-Non-ASCII-Review.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
  $con = mysql_connect("localhost", "Herong", "TopSecret");
  $ok = mysql_select_db("HerongDB", $con);
  $test_name = "Non-ASCII Test";

# Show character set encoding variables
  $sql = "SHOW VARIABLES LIKE 'character_set_%'";
  $res = mysql_query($sql, $con);
  while ($row = mysql_fetch_array($res)) {
    print($row['Variable_name']." = ".$row['Value']."\n");
  }  
  mysql_free_result($res);

# Delete the record
  $sql = "DELETE FROM Comment_Mixed WHERE Test_Name ='$test_name'";
  mysql_query($sql, $con);
  print("\nNumber of rows deleted: ".mysql_affected_rows()."\n");

# Build the SQL INSERT statement
  $sql = <<<END_OF_MESSAGE
INSERT INTO Comment_Mixed (Test_name, String_ASCII, 
    String_Latin1, String_UTF8, String_GBK, String_Big5)
  VALUES ('$test_name', 'Television', 
    X'54E96CE9766973696E6F', 
    X'E794B5E8A786E69CBA2FE99BBBE8A696E6A99F', 
    X'B5E7CAD3BBFA', 
    X'B971B5F8BEF7');
END_OF_MESSAGE;

# Run the SQL statement
  if (mysql_query($sql, $con)) {
    print("\nNumber of rows inserted: ".mysql_affected_rows()."\n");
  } else {
    print("SQL statement failed.\n");
    print(mysql_errno($con).": ".mysql_error($con)."\n"); 
  }

# Get the recod back
  $sql = "SELECT * FROM Comment_Mixed"
    . " WHERE Test_Name = '$test_name'";
  $res = mysql_query($sql, $con);
  if ($row = mysql_fetch_array($res)) {
    print("\nTest Name = ".$row['Test_Name']."\n");
    print("   String_ASCII: 0x".bin2hex($row['String_ASCII'])."\n");
    print("   String_Latin1: 0x".bin2hex($row['String_Latin1'])."\n");
    print("   String_UTF8: 0x".bin2hex($row['String_UTF8'])."\n");
    print("   String_GBK: 0x".bin2hex($row['String_GBK'])."\n");
    print("   String_Big5: 0x".bin2hex($row['String_Big5'])."\n");
  }  
  mysql_free_result($res);

  mysql_close($con); 
?>

The test PHP script was modified to show the current character set settings. Here is the output:

C:\>\local\php\php MySQL-Insert-Non-ASCII-Review.php
character_set_client = latin1
character_set_connection = latin1
character_set_database = latin1
character_set_filesystem = binary
character_set_results = latin1
character_set_server = latin1
character_set_system = utf8
character_sets_dir = \local\mysql\share\charsets\

Number of rows deleted: 1

Number of rows inserted: 1

Test Name = Non-ASCII Test
   String_ASCII: 0x54656c65766973696f6e
   String_Latin1: 0x54e96ce9766973696e6f
   String_UTF8: 0x3f3f3f2f3f3f3f
   String_GBK: 0x3f3f3f
   String_Big5: 0x3f3f3f

We know that the test failed on 3 columns: String_UTF8, String_GBK, and String_Big5. Based on MySQL character set variable settings, the INSERT statement was executed successfully and non-ASCII characters (represented as HEX values) were stored correctly. Here is my explanation:

  • The INSERT statement was created with pure ASCII characters. All non-ASCII characters were included in the SQL statement as HEX numbers representing their encoded bytes.
  • MySQL did not perform any conversion on the INSERT statement from "character_set_client" encoding to "character_set_connection" encoding, because they are identical.
  • MySQL executed the INSERT statement. HEX values of encoded byte strings were stored directly to different text columns. No conversion should happen to any input data included in the INSERT statement.

But when the SELECT statement was executed, non-ASCII characters fetched from those non-ASCII columns were not properly transferred from MySQL server to PHP. Remember that all text data fetched from database columns must be converted to "character_set_results" encoding, latin1. That means:

  • ASCII characters fetched from String_ASCII column were converted to Latin1. No problem. This is why the test shows correct output for String_ASCII.
  • Latin1 characters fetched from String_Latin1 column were converted to Latin1. No problem. This is why the test shows correct output for String_Latin1.
  • UTF-8 characters fetched from String_UTF8 column were converted to Latin1. Big problem. This is why the test shows incorrect output for String_UTF8.
  • GBK characters fetched from String_GBK column were converted to Latin1. Big problem. This is why the test shows incorrect output for String_GBK.
  • Big5 characters fetched from String_Big5 column were converted to Latin1. Big problem. This is why the test shows correct output for String_Big5.

Sections in This Chapter

Specifying Character Set for Text Columns

Creating a Table with Multiple Character Sets

Checking Character Set Setting

Storing ASCII Characters in Non-ASCII Columns

Storing Non-ASCII Characters with Encoded Bytes

Transmitting Non-ASCII Characters between PHP and MySQL

Viewing Character Set Variables

Non-ASCII Test Result Analysis

Fetching Non-ASCII Text from MySQL

Dr. Herong Yang, updated in 2007
Non-ASCII Test Result Analysis