PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Receiving Non ASCII Characters from Input Forms





Receiving Non ASCII Characters in UTF-8

The previous scripts used "charset=iso-8859-1" for the input page. Now let's try "charset=utf-8". Here is my sample script:

<?php # InputUtf8Get.php
# Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
# 
#- Promoting CGI values to local variables
#  import_request_variables() was removed in PHP 5.4, so each value
#  is copied from $_REQUEST and HTML-escaped for reuse in the form.
   function requestValue($name) {
      if (!isset($_REQUEST[$name])) return '';
      return htmlspecialchars($_REQUEST[$name],
         ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
   }
   $r_English       = requestValue('English');
   $r_Spanish       = requestValue('Spanish');
   $r_Korean        = requestValue('Korean');
   $r_ChineseUtf8   = requestValue('ChineseUtf8');
   $r_ChineseGb2312 = requestValue('ChineseGb2312');

#- Generating HTML document (the meta tag belongs in <head>)
   print("<html><head>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=utf-8"/>');
   print("</head><body>\n");
   print("<form action=InputUtf8Get.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='$r_Spanish' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='$r_Korean' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   foreach ($_GET as $k => $v) {
      $v = htmlspecialchars($v, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
      print "$k = ($v)\n";
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\InputUtf8Get.txt", 'ab');
   $str = "------\n";
   fwrite($file, $str);
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = '';
   }
   fwrite($file, $str);

   $str = "------\n";
   fwrite($file, $str);
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str);
   }
   fclose($file);
?>

If you enter the same input strings as in the previous tests:

English ASCII: Hello world!
Spanish UTF-8: ola mundo!
Korean UTF-8: ???? ?? !
Chinese UTF-8: ????!
Chinese GB2312: 世界你好!

The returned page shows the input strings below the form, and they all look correct to me.

If you open the dump file, \temp\InputUtf8Get.txt, you will see how the input strings are URL encoded in the query string and decoded in $_REQUEST.

------
English=Hello+world%21&
Spanish=%C2%A1Hola+mundo%21&
Korean=%EC%97%AC%EB%B3%B4%EC%84%B8%EC%9A%94+%EC%84%B8%EA%B3%84+%21&
ChineseUtf8=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C%21&
ChineseGb2312=%C3%8A%C3%80%C2%BD%C3%A7%C3%84%C3%A3%C2%BA%C3%83%C2%A3%C2%A1
&submit=Submit------
------
English = (Hello world!)
Spanish = (ola mundo!)
Korean = (???? ?? !)
ChineseUtf8 = (????!)
ChineseGb2312 = (世界你好!)
submit = (Submit)
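The decoding step can be tried on its own by feeding the query string from the dump to PHP's parse_str(), which splits and decodes a query string in essentially the same way PHP fills $_GET. This is a standalone sketch, not part of InputUtf8Get.php; it leaves out the GB2312 field for brevity, and the variable names are illustrative.

```php
<?php
// Query string copied from the dump (GB2312 field omitted).
$query = 'English=Hello+world%21'
   . '&Spanish=%C2%A1Hola+mundo%21'
   . '&Korean=%EC%97%AC%EB%B3%B4%EC%84%B8%EC%9A%94+%EC%84%B8%EA%B3%84+%21'
   . '&ChineseUtf8=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C%21'
   . '&submit=Submit';

// parse_str() turns "+" into a space and each %xx into one raw byte,
// storing the results in the $fields array.
parse_str($query, $fields);

foreach ($fields as $k => $v) {
   // strlen() counts bytes, not characters.
   printf("%s = (%s) [%d bytes]\n", $k, $v, strlen($v));
}
```

The byte counts for the Korean and Chinese values are larger than their character counts (21 bytes for the Korean string, 13 for the Chinese one), a quick way to confirm that the decoded values are raw UTF-8 byte sequences.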

Again, the result matches the rules listed earlier in this chapter. Input strings are recorded as UTF-8 byte sequences when entered on the page. When the form is submitted, every byte outside a small set of safe ASCII characters (letters, digits and a few punctuation marks) is URL encoded as %xx, and each space is sent as "+". When input strings are parsed into $_REQUEST, they are decoded back to UTF-8 byte sequences.
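The encoding rule described above can be spelled out byte by byte in a few lines of PHP. This is an illustrative sketch: form_encode() is a hypothetical helper, and its safe character set is copied from PHP's urlencode() (letters, digits, "-", "_" and "."). Browsers use a very similar, but not necessarily identical, set.

```php
<?php
// Hypothetical helper: encodes a string byte by byte, the way a
// UTF-8 form value ends up in the query string.
function form_encode(string $s): string {
   // Bytes sent unchanged (same safe set as urlencode()).
   $safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
         . 'abcdefghijklmnopqrstuvwxyz0123456789-_.';
   $out = '';
   for ($i = 0; $i < strlen($s); $i++) {
      $c = $s[$i];                             // one byte, not one character
      if (strpos($safe, $c) !== false) {
         $out .= $c;
      } elseif ($c === ' ') {
         $out .= '+';
      } else {
         $out .= sprintf('%%%02X', ord($c));   // e.g. byte 0xC2 -> "%C2"
      }
   }
   return $out;
}

// "\u{...}" escapes build the UTF-8 strings independently of the
// encoding of this source file.
$samples = array(
   'Hello world!',
   "\u{00A1}Hola mundo!",
   "\u{4F60}\u{597D}\u{4E16}\u{754C}!",
);
foreach ($samples as $s) {
   echo form_encode($s), "\n";
}
```

The three printed lines match the English, Spanish and ChineseUtf8 values in the query string from the dump, and for these samples form_encode() returns the same result as urlencode().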

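One surprise to me is that the GB2312 characters are also recorded as UTF-8 byte sequences. A closer look at the query string shows why: each GB2312 byte appears to have been taken as a separate ISO-8859-1 character, which was then encoded in UTF-8. For example, the first byte, 0xCA, became the character Ê and was sent as %C3%8A.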
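The GB2312 result can be reproduced in PHP under one assumption: that the field held the ten GB2312 bytes CA C0 BD E7 C4 E3 BA C3 A3 A1 (世界你好 followed by a full-width ！). The sketch reads each byte as one ISO-8859-1 character and encodes that character in UTF-8. latin1_to_utf8() is a hypothetical helper written out by hand so that no mbstring or iconv extension is needed.

```php
<?php
// Assumed GB2312 bytes of the Chinese test string.
$gb2312 = "\xCA\xC0\xBD\xE7\xC4\xE3\xBA\xC3\xA3\xA1";

// Hypothetical helper: treats each byte as an ISO-8859-1 character
// (code point = byte value) and encodes it in UTF-8.
function latin1_to_utf8(string $bytes): string {
   $out = '';
   for ($i = 0; $i < strlen($bytes); $i++) {
      $b = ord($bytes[$i]);
      if ($b < 0x80) {
         $out .= chr($b);                      // ASCII: stays one byte
      } else {
         $out .= chr(0xC0 | ($b >> 6))         // lead byte: C2 or C3
               . chr(0x80 | ($b & 0x3F));      // continuation byte
      }
   }
   return $out;
}

$utf8 = latin1_to_utf8($gb2312);
echo urlencode($utf8), "\n";
```

The printed string is identical to the ChineseGb2312 value in the dump, which supports the reading that the ten GB2312 bytes were handled as ten separate ISO-8859-1 characters rather than as five Chinese characters.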


Dr. Herong Yang, updated in 2006