Chinese Web Sites Using PHP - v2.23, by Herong Yang
MySQL Encoding Issue Example
This section provides an example of MySQL server encoding issue with Chinese characters, when incorrectly encoded string is stored in database and wrong character_set_* settings are used.
Here is a real example of MySQL encoding issue I have encountered recently. Take a look at the following query statement used in a PHP script and its output:
SELECT Code FROM Dictionary WHERE Code = '鸡' Code: (踡)
Why does MySQL server match and return a Chinese character that is different than the character given in the query statement?
To find out the root cause of the problem, I modified the query statement as shown below:
SELECT Code, HEX(Code) AS CodeHex, HEX('踡') AS OutHex, HEX('鸡') AS InHex FROM Dictionary WHERE Code = '鸡' Code: (踡) CodeHex: (C3A8C2B8C2A1) OutHex: (E8B8A1) InHex: (E9B8A1)
Based on my knowledge to Chinese character encodings, I can see what really happened when the above statement was executed.
1. Since Hex string 0xC3A8C2B8C2A1 from the database was able to be displayed as correct Chinese character of 踡, I can assume that the following:
2. Since the condition "Code = '鸡'" results to "True", I can also assume that the following:
To validate my assumptions, I wrote the following simulation PHP script.
<?php #- MySQL-Encoding-Issue.php #- Copyright (c) 2005 HerongYang.com. All Rights Reserved. mb_internal_encoding("utf8"); $con = mysqli_connect('127.0.0.1', 'test', 'test', 'test'); mysqli_query($con, "SET character_set_results=latin1"); mysqli_query($con, "SET character_set_connection=latin1"); mysqli_query($con, "SET character_set_client=latin1"); $res = mysqli_query($con, "SHOW VARIABLES LIKE 'character_set_%'"); while ($row = mysqli_fetch_array($res)) { print($row['Variable_name'].": ".$row['Value']."\n"); } mysqli_free_result($res); mysqli_query($con, "DROP TABLE Dictionary"); mysqli_query($con, "CREATE TABLE Dictionary (Code VARCHAR(64)) DEFAULT CHARSET=utf8"); mysqli_query($con, "INSERT INTO Dictionary (Code) VALUES (X'C3A8C2B8C2A1')"); $sql = "SELECT Code, HEX(Code) AS CodeHex," . " HEX('踡') AS OutHex, HEX('鸡') AS InHex" . " FROM Dictionary WHERE Code = X'C3A9C2B8C2A1'"; $res = mysqli_query($con, $sql); $row = mysqli_fetch_array($res); print("SQL: (".$sql.")\n"); print("Code: (".$row['Code'].")\n"); print("CodeHex: (".$row['CodeHex'].")\n"); print("OutHex: (".$row['OutHex'].")\n"); print("InHex: (".$row['InHex'].")\n"); mysqli_free_result($res); $sql = "SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8)" . " = CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False')" . " AS Comparison"; $res = mysqli_query($con, $sql); $row = mysqli_fetch_array($res); print("SQL: (".$sql.")\n"); print("Comparison: (".$row['Comparison'].")\n"); mysqli_free_result($res); mysqli_close($con); ?>
You can try the above PHP script on your MySQL server. And you will get similar output shown below:
herong$ php MySQL-Encoding-Issue.php character_set_client: latin1 character_set_connection: latin1 character_set_database: utf8mb4 character_set_filesystem: binary character_set_results: latin1 character_set_server: utf8 character_set_system: utf8 character_sets_dir: /usr/local/mysql-8.0.17-.../share/charsets/ SQL: (SELECT Code, HEX(Code) AS CodeHex, HEX('踡') AS OutHex, HEX('鸡') AS InHex FROM Dictionary WHERE Code = X'C3A9C2B8C2A1') Code: (踡) CodeHex: (C3A8C2B8C2A1) OutHex: (E8B8A1) InHex: (E9B8A1) SQL: (SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8) = CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False') AS Comparison) Comparison: (True)
Obviously, the issue can be resolved by:
However, one thing that I will don't understand today is why MySQL server evaluate Boolean expression of "CONVERT(X'C3A8C2B8C2A1' USING utf8) = CONVERT(X'C3A9C2B8C2A1' USING utf8)" to true. Please share it with me, if you know the answer.
Table of Contents
PHP Installation on Windows Systems
Integrating PHP with Apache Web Server
charset="*" - Encodings on Chinese Web Pages
Chinese Characters in PHP String Literals
Multibyte String Functions in UTF-8 Encoding
Input Text Data from Web Forms
Input Chinese Text Data from Web Forms
MySQL - Installation on Windows
MySQL - Connecting PHP to Database
►MySQL - Character Set and Encoding
Specifying Character Set for Text Columns
Creating a Table with Multiple Character Sets
Checking Character Set Setting
Storing ASCII Characters in Non-ASCII Columns
Storing Non-ASCII Characters with Encoded Bytes
Transmitting Non-ASCII Characters between PHP and MySQL
Viewing Character Set Variables
Non-ASCII Test Result Analysis
Fetching Non-ASCII Text from MySQL
MySQL - Sending Non-ASCII Text to MySQL
Retrieving Chinese Text from Database to Web Pages
Input Chinese Text Data to MySQL Database