This section provides a detailed analysis of the test results of a PHP script storing non-ASCII characters.
In previous sections, we reviewed MySQL's character set encoding conversions happened between client programs and MySQL server.
Now I want to look again at the PHP test script that stores non-ASCII characters to Comment_Mixed table's text columns defined with
different encodings.
<?php #MySQL-Insert-Non-ASCII-Review.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
$con = mysql_connect("localhost", "Herong", "TopSecret");
$ok = mysql_select_db("HerongDB", $con);
$test_name = "Non-ASCII Test";
# Show character set encoding variables
$sql = "SHOW VARIABLES LIKE 'character_set_%'";
$res = mysql_query($sql, $con);
while ($row = mysql_fetch_array($res)) {
print($row['Variable_name']." = ".$row['Value']."\n");
}
mysql_free_result($res);
# Delete the record
$sql = "DELETE FROM Comment_Mixed WHERE Test_Name ='$test_name'";
mysql_query($sql, $con);
print("\nNumber of rows deleted: ".mysql_affected_rows()."\n");
# Build the SQL INSERT statement
$sql = <<<END_OF_MESSAGE
INSERT INTO Comment_Mixed (Test_name, String_ASCII,
String_Latin1, String_UTF8, String_GBK, String_Big5)
VALUES ('$test_name', 'Television',
X'54E96CE9766973696E6F',
X'E794B5E8A786E69CBA2FE99BBBE8A696E6A99F',
X'B5E7CAD3BBFA',
X'B971B5F8BEF7');
END_OF_MESSAGE;
# Run the SQL statement
if (mysql_query($sql, $con)) {
print("\nNumber of rows inserted: ".mysql_affected_rows()."\n");
} else {
print("SQL statement failed.\n");
print(mysql_errno($con).": ".mysql_error($con)."\n");
}
# Get the recod back
$sql = "SELECT * FROM Comment_Mixed"
. " WHERE Test_Name = '$test_name'";
$res = mysql_query($sql, $con);
if ($row = mysql_fetch_array($res)) {
print("\nTest Name = ".$row['Test_Name']."\n");
print(" String_ASCII: 0x".bin2hex($row['String_ASCII'])."\n");
print(" String_Latin1: 0x".bin2hex($row['String_Latin1'])."\n");
print(" String_UTF8: 0x".bin2hex($row['String_UTF8'])."\n");
print(" String_GBK: 0x".bin2hex($row['String_GBK'])."\n");
print(" String_Big5: 0x".bin2hex($row['String_Big5'])."\n");
}
mysql_free_result($res);
mysql_close($con);
?>
The test PHP script was modified to show the current character set settings.
Here is the output:
C:\>\local\php\php MySQL-Insert-Non-ASCII-Review.php
character_set_client = latin1
character_set_connection = latin1
character_set_database = latin1
character_set_filesystem = binary
character_set_results = latin1
character_set_server = latin1
character_set_system = utf8
character_sets_dir = \local\mysql\share\charsets\
Number of rows deleted: 1
Number of rows inserted: 1
Test Name = Non-ASCII Test
String_ASCII: 0x54656c65766973696f6e
String_Latin1: 0x54e96ce9766973696e6f
String_UTF8: 0x3f3f3f2f3f3f3f
String_GBK: 0x3f3f3f
String_Big5: 0x3f3f3f
We know that the test failed on 3 columns: String_UTF8, String_GBK, and String_Big5.
Based on MySQL character set variable settings, the INSERT statement was executed successfully
and non-ASCII characters (represented as HEX values) were stored correctly. Here is my explanation:
The INSERT statement was created with pure ASCII characters. All non-ASCII characters were included in the SQL statement as HEX numbers
representing their encoded bytes.
MySQL did not perform any conversion on the INSERT statement from "character_set_client" encoding to "character_set_connection" encoding,
because they are identical.
MySQL executed the INSERT statement. HEX values of encoded byte strings were stored directly to different text columns.
No conversion should happen to any input data included in the INSERT statement.
But when the SELECT statement was executed, non-ASCII characters fetched from those non-ASCII columns were not
properly transferred from MySQL server to PHP. Remember that all text data fetched from database columns must be converted
to "character_set_results" encoding, latin1. That means:
ASCII characters fetched from String_ASCII column were converted to Latin1. No problem. This is why the test shows correct output for String_ASCII.
Latin1 characters fetched from String_Latin1 column were converted to Latin1. No problem. This is why the test shows correct output for String_Latin1.
UTF-8 characters fetched from String_UTF8 column were converted to Latin1. Big problem. This is why the test shows incorrect output for String_UTF8.
GBK characters fetched from String_GBK column were converted to Latin1. Big problem. This is why the test shows incorrect output for String_GBK.
Big5 characters fetched from String_Big5 column were converted to Latin1. Big problem. This is why the test shows correct output for String_Big5.