MySQL Encoding Issue Example

This section provides an example of MySQL server encoding issue with Chinese characters, when incorrectly encoded string is stored in database and wrong character_set_* settings are used.

Here is a real example of MySQL encoding issue I have encountered recently. Take a look at the following query statement used in a PHP script and its output:

SELECT Code FROM Dictionary WHERE Code = ''

Code: ()

Why does MySQL server match and return a Chinese character that is different than the character given in the query statement?

To find out the root cause of the problem, I modified the query statement as shown below:

SELECT Code, HEX(Code) AS CodeHex, HEX('') AS OutHex, HEX('') AS InHex 
  FROM Dictionary WHERE Code = ''

Code: ()
CodeHex: (C3A8C2B8C2A1)
OutHex: (E8B8A1)
InHex: (E9B8A1)

Based on my knowledge to Chinese character encodings, I can see what really happened when the above statement was executed.

1. Since Hex string 0xC3A8C2B8C2A1 from the database was able to be displayed as correct Chinese character of , I can assume that the following:

2. Since the condition "Code = ''" results to "True", I can also assume that the following:

To validate my assumptions, I wrote the following simulation PHP script.

<?php 
#- MySQL-Encoding-Issue.php
#- Copyright (c) 2005 HerongYang.com. All Rights Reserved.
  mb_internal_encoding("utf8");
  $con = mysqli_connect('127.0.0.1', 'test', 'test', 'test');

  mysqli_query($con, "SET character_set_results=latin1");
  mysqli_query($con, "SET character_set_connection=latin1");
  mysqli_query($con, "SET character_set_client=latin1");

  $res = mysqli_query($con, "SHOW VARIABLES LIKE 'character_set_%'");
  while ($row = mysqli_fetch_array($res)) {
    print($row['Variable_name'].": ".$row['Value']."\n");
  }
  mysqli_free_result($res);

  mysqli_query($con, "DROP TABLE Dictionary");
  mysqli_query($con, 
    "CREATE TABLE Dictionary (Code VARCHAR(64)) DEFAULT CHARSET=utf8");
  mysqli_query($con, 
    "INSERT INTO Dictionary (Code) VALUES (X'C3A8C2B8C2A1')");

  $sql = "SELECT Code, HEX(Code) AS CodeHex,"
    . " HEX('') AS OutHex, HEX('') AS InHex" 
    . " FROM Dictionary WHERE Code = X'C3A9C2B8C2A1'";
  $res = mysqli_query($con, $sql);
  $row = mysqli_fetch_array($res);
  print("SQL: (".$sql.")\n");
  print("Code: (".$row['Code'].")\n");
  print("CodeHex: (".$row['CodeHex'].")\n");
  print("OutHex: (".$row['OutHex'].")\n");
  print("InHex: (".$row['InHex'].")\n");
  mysqli_free_result($res);

  $sql = "SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8)"
    . " = CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False')"
    . " AS Comparison";
  $res = mysqli_query($con, $sql);
  $row = mysqli_fetch_array($res);
  print("SQL: (".$sql.")\n");
  print("Comparison: (".$row['Comparison'].")\n");
  mysqli_free_result($res);

  mysqli_close($con);
?>

You can try the above PHP script on your MySQL server. And you will get similar output shown below:

herong$ php MySQL-Encoding-Issue.php 

character_set_client: latin1
character_set_connection: latin1
character_set_database: utf8mb4
character_set_filesystem: binary
character_set_results: latin1
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/local/mysql-8.0.17-.../share/charsets/

SQL: (SELECT Code, HEX(Code) AS CodeHex, HEX('') AS OutHex, HEX('') 
  AS InHex FROM Dictionary WHERE Code = X'C3A9C2B8C2A1')
Code: ()
CodeHex: (C3A8C2B8C2A1)
OutHex: (E8B8A1)
InHex: (E9B8A1)

SQL: (SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8) 
  = CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False') AS Comparison)
Comparison: (True)

Obviously, the issue can be resolved by:

However, one thing that I will don't understand today is why MySQL server evaluate Boolean expression of "CONVERT(X'C3A8C2B8C2A1' USING utf8) = CONVERT(X'C3A9C2B8C2A1' USING utf8)" to true. Please share it with me, if you know the answer.

Table of Contents

 About This Book

 PHP Installation on Windows Systems

 Integrating PHP with Apache Web Server

 charset="*" - Encodings on Chinese Web Pages

 Chinese Characters in PHP String Literals

 Multibyte String Functions in UTF-8 Encoding

 Input Text Data from Web Forms

 Input Chinese Text Data from Web Forms

 MySQL - Installation on Windows

 MySQL - Connecting PHP to Database

MySQL - Character Set and Encoding

 Specifying Character Set for Text Columns

 Creating a Table with Multiple Character Sets

 Checking Character Set Setting

 Storing ASCII Characters in Non-ASCII Columns

 Storing Non-ASCII Characters with Encoded Bytes

 Transmitting Non-ASCII Characters between PHP and MySQL

 Viewing Character Set Variables

 Non-ASCII Test Result Analysis

 Fetching Non-ASCII Text from MySQL

MySQL Encoding Issue Example

 MySQL - Sending Non-ASCII Text to MySQL

 Retrieving Chinese Text from Database to Web Pages

 Input Chinese Text Data to MySQL Database

 Chinese Text Encoding Conversion and Corruptions

 Archived Tutorials

 References

 Full Version in PDF/EPUB