This section describes how to set character_set_results properly to retrieve non-ASCII character strings from MySQL server.
Based on the analysis provided in the previous section, I revised my PHP script to set character_set_results to utf8.
Hope this will allow me to fetch non-ASCII characters correctly from the String_UTF8 column:
<?php #MySQL-Fetch-Non-ASCII-UTF8.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
$con = mysql_connect("localhost", "Herong", "TopSecret");
$ok = mysql_select_db("HerongDB", $con);
$test_name = "Non-ASCII Test";
# Set character_set_results
$sql = "SET character_set_results=utf8";
mysql_query($sql, $con);
# Show character set encoding variables
$sql = "SHOW VARIABLES LIKE 'character_set_%'";
$res = mysql_query($sql, $con);
while ($row = mysql_fetch_array($res)) {
print($row['Variable_name']." = ".$row['Value']."\n");
}
mysql_free_result($res);
# Delete the record
$sql = "DELETE FROM Comment_Mixed WHERE Test_Name ='$test_name'";
mysql_query($sql, $con);
print("\nNumber of rows deleted: ".mysql_affected_rows()."\n");
# Build the SQL INSERT statement
$sql = <<<END_OF_MESSAGE
INSERT INTO Comment_Mixed (Test_name, String_ASCII,
String_Latin1, String_UTF8, String_GBK, String_Big5)
VALUES ('$test_name', 'Television',
X'54E96CE9766973696E6F',
X'E794B5E8A786E69CBA2FE99BBBE8A696E6A99F',
X'B5E7CAD3BBFA',
X'B971B5F8BEF7');
END_OF_MESSAGE;
# Run the SQL statement
if (mysql_query($sql, $con)) {
print("\nNumber of rows inserted: ".mysql_affected_rows()."\n");
} else {
print("SQL statement failed.\n");
print(mysql_errno($con).": ".mysql_error($con)."\n");
}
# Get the recod back
$sql = "SELECT * FROM Comment_Mixed"
. " WHERE Test_Name = '$test_name'";
$res = mysql_query($sql, $con);
if ($row = mysql_fetch_array($res)) {
print("\nTest Name = ".$row['Test_Name']."\n");
print(" String_ASCII: 0x".bin2hex($row['String_ASCII'])."\n");
print(" String_Latin1: 0x".bin2hex($row['String_Latin1'])."\n");
print(" String_UTF8: 0x".bin2hex($row['String_UTF8'])."\n");
print(" String_GBK: 0x".bin2hex($row['String_GBK'])."\n");
print(" String_Big5: 0x".bin2hex($row['String_Big5'])."\n");
}
mysql_free_result($res);
mysql_close($con);
?>
The output confirms that character_set_results=utf8 helped to stop the wrong conversion on non-ASCII characters fetched from the String_UTF8 column:
C:\>\local\php\php MySQL-Fetch-Non-ASCII-UTF8.php
character_set_client = latin1
character_set_connection = latin1
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8
character_set_server = latin1
character_set_system = utf8
character_sets_dir = \local\mysql\share\charsets\
Number of rows deleted: 1
Number of rows inserted: 1
Test Name = Non-ASCII Test
String_ASCII: 0x54656c65766973696f6e
String_Latin1: 0x54c3a96cc3a9766973696e6f
String_UTF8: 0xe794b5e8a786e69cba2fe99bbbe8a696e6a99f
String_GBK: 0xe794b5e8a786e69cba
String_Big5: 0xe99bbbe8a696e6a99f
However, other encoding conversions were performed on other non-ASCII columns:
ASCII characters fetched from String_ASCII column were converted to UTF8. No problem.
Latin1 characters fetched from String_Latin1 column were converted to UTF8. MySQL did a real conversion here.
Notice that the French character 0xE9 were converted to 0xC3A9.
UTF-8 characters fetched from String_UTF8 column were converted to UTF8. Perfect match. No conversion.
GBK characters fetched from String_GBK column were converted to UTF8. MySQL did a real conversion and did it correctly.
Notice that the GBK string 0xB5E7CAD3BBFA was converted to 0xE794B5E8A786E69CBA, which match the UTF-8 string I entered for the String_UTF8 column.
Big5 characters fetched from String_Big5 column were converted to UTF8. MySQL did a real conversion and did it correctly.
Notice that the Big5 string 0xB971B5F8BEF7 was converted to 0xE99BBBE8A696E6A99F, which match the UTF-8 string I entered for the String_UTF8 column.
My conclusion is that:
If you are working with a single character set encoding, set character_set_results to that encoding.
If you are working with multiple character set encodings, set character_set_results to the UTF-8, which is bigger than othe encodings.
MySQL will convert fetched text from the encoding of the original column to UTF-8 correctly for you.