Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

Sending Non-ASCII Text to MySQL in GBK Encoding

This section provides a test on sending non-ASCII character strings to MySQL server in GBK or GB2312 encoding.

The next test I did was to re-run the same SQL INSERT statement that included Latin1, UTF-8, GBK and Big5 characters directly. But I changed MySQL session settings with character_set_client=gbk and character_set_connection=gbk to see how MySQL applies the conversion when executing the INSERT statement.

<?php #MySQL-Send-Non-ASCII-GBK.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
  $con = mysql_connect("localhost", "Herong", "TopSecret");
  $ok = mysql_select_db("HerongDB", $con);
  $test_name = "Send Non-ASCII";

# Set character_set_results
  mysql_query("SET character_set_results=utf8", $con);

# Set character_set_client and character_set_connection
  mysql_query("SET character_set_client=gbk", $con);
  mysql_query("SET character_set_connection=gbk", $con);

# Show character set encoding variables
  $sql = "SHOW VARIABLES LIKE 'character_set_%'";
  $res = mysql_query($sql, $con);
  while ($row = mysql_fetch_array($res)) {
    print($row['Variable_name']." = ".$row['Value']."\n");
  }
  mysql_free_result($res);

# Delete the record
  $sql = "DELETE FROM Comment_Mixed WHERE Test_Name ='$test_name'";
  mysql_query($sql, $con);
  print("\nNumber of rows deleted: ".mysql_affected_rows()."\n");

# Build the SQL INSERT statement
  $sql = <<<END_OF_MESSAGE
INSERT INTO Comment_Mixed (Test_name, String_ASCII, 
    String_Latin1, String_UTF8, String_GBK, String_Big5)
  VALUES ('$test_name', 'Television', 
    'T?l?vision', '电视机/電視機', '???', '???');
END_OF_MESSAGE;

# Run the SQL statement
  if (mysql_query($sql, $con)) {
    print("\nNumber of rows inserted: ".mysql_affected_rows()."\n");
  } else {
    print("SQL statement failed.\n");
    print(mysql_errno($con).": ".mysql_error($con)."\n"); 
  }

# Get the recod back
  $sql = "SELECT * FROM Comment_Mixed"
    . " WHERE Test_Name = '$test_name'";
  $res = mysql_query($sql, $con);
  if ($row = mysql_fetch_array($res)) {
    print("\nTest Name = ".$row['Test_Name']."\n");
    print("   String_ASCII: 0x".bin2hex($row['String_ASCII'])."\n");
    print("   String_Latin1: 0x".bin2hex($row['String_Latin1'])."\n");
    print("   String_UTF8: 0x".bin2hex($row['String_UTF8'])."\n");
    print("   String_GBK: 0x".bin2hex($row['String_GBK'])."\n");
    print("   String_Big5: 0x".bin2hex($row['String_Big5'])."\n");
  }  
  mysql_free_result($res);

  mysql_close($con); 
?>

Note that string literals in the source code above will not be displayed properly, because this book uses UTF-8 encoding.

Here is the output:

C:\>\local\php\php MySQL-Send-Non-ASCII-GBK.php

character_set_client = gbk
character_set_connection = gbk
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8
character_set_server = latin1
character_set_system = utf8
character_sets_dir = \local\mysql\share\charsets\

Number of rows deleted: 1

Number of rows inserted: 1

Test Name = Send Non-ASCII
   String_ASCII: 0x54656c65766973696f6e
   String_Latin1: 0x543f3f6973696f6e
   String_UTF8: 0xe990a2e4bd833fe98f883fe99786e6988f3fe5a797
   String_GBK: 0xe794b5e8a786e69cba
   String_Big5: 0x3fe8b78c3f

The output is interesting:

  • Conversion from character_set_client, gbk, to character_set_connection, gbk - Identical encodings. No conversion will happen.
  • Conversion from character_set_connection, gbk, to column String_ASCII's encoding, ascii - Will happen with potential data loss. But the input text for String_ASCII is ASCII characters only and compatible with GBK. The output shows no problem.
  • Conversion from character_set_connection, gbk, to column String_Latin1's encoding, latin1 - Will happen with potential data corruption. MySQL took my Latin1 input string as a GBK string and converted to Latin1. Output of String_Latin1 shows some corrupted characters.
  • Conversion from character_set_connection, gbk, to column String_UTF8's encoding, utf8 - Will happen with potential data corruption. MySQL took my UTF-8 input string as a GBK string and converted to UTF-8. Output of String_UTF8 is totally corrupted.
  • Conversion from character_set_connection, gbk, to column String_GBK's encoding, gbk - Perfect match. No conversion will happen. Output of String_GBK is perfectly correct.
  • Conversion from character_set_connection, gbk, to column String_Big5's encoding, big5 - Will happen with potential data corruption. MySQL took my Big5 input string as a GBK string and converted to Big5. Output of String_Big5 is totally corrupted.

Sections in This Chapter

Sending Non-ASCII Text to MySQL in Latin1 Encoding

Sending Non-ASCII Text to MySQL in UTF8 Encoding

Sending Non-ASCII Text to MySQL in GBK Encoding

Sending Non-ASCII Text to MySQL in Big5 Encoding

Sending Non-ASCII Text to MySQL in Multiple Encoding

Summary - Sending and Fetching Non-ASCII Text

Dr. Herong Yang, updated in 2007
Sending Non-ASCII Text to MySQL in GBK Encoding