Receiving Non-ASCII Characters in UTF-8 Encoding

This section provides a tutorial example on how to enter non-ASCII characters in an HTML form and receive them correctly with the GET method, when the form page uses the Unicode UTF-8 encoding.

The previous scripts used "charset=iso-8859-1" for the input page. Now let's try "charset=utf-8". Here is my sample script:

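<?php
#  InputUtf8Get.php
#- Copyright 2009 (c) HerongYang.com. All Rights Reserved.
#
#- Reading CGI values. import_request_variables() was removed in
#  PHP 5.4, so each field is read from $_GET explicitly; a missing
#  (or non-string) field becomes an empty string.
   function getInput($name) {
      return (isset($_GET[$name]) && is_string($_GET[$name]))
         ? $_GET[$name] : '';
   }
   $r_English       = getInput('English');
   $r_Spanish       = getInput('Spanish');
   $r_Korean        = getInput('Korean');
   $r_ChineseUtf8   = getInput('ChineseUtf8');
   $r_ChineseGb2312 = getInput('ChineseGb2312');

#- Escaping input before echoing it into HTML, so that quotes and
#  "<" in the input cannot break the page or inject markup
   function h($s) {
      return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
   }

#- Generating HTML document. The HTTP header takes precedence over
#  the <meta> tag, so both declare UTF-8.
   header('Content-Type: text/html; charset=utf-8');
   print("<html><head>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=utf-8"/>');
   print("</head><body>\n");
   print("<form action=InputUtf8Get.php method=get>");
   print("English ASCII: <input name=English"
      ." value='".h($r_English)."' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='".h($r_Spanish)."' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='".h($r_Korean)."' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='".h($r_ChineseUtf8)."' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='".h($r_ChineseGb2312)."' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   foreach ($_GET as $k => $v) {
      if (!is_string($v)) continue;   # skip array-style parameters
      print h($k)." = (".h($v).")\n";
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file (a Windows path; adjust as needed)
   $file = fopen("\\temp\\InputUtf8Get.txt", 'ab');
   if ($file !== false) {
      fwrite($file, "------\n");
      # Raw query string, still URL encoded (no newline after it)
      $str = isset($_SERVER['QUERY_STRING'])
         ? $_SERVER['QUERY_STRING'] : '';
      fwrite($file, $str);

      fwrite($file, "------\n");
      # Values after PHP has decoded them into $_REQUEST
      foreach ($_REQUEST as $k => $v) {
         fwrite($file, "$k = ($v)\n");
      }
      fclose($file);
   }
?>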

If you enter the same input strings as in the previous tests and submit the form, the page comes back with the input strings displayed below the form:

English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)

They look correct to me.

If you open the dump file, \temp\InputUtf8Get.txt, you will see how the input strings are URL encoded in the query string and how they are decoded in $_REQUEST:

------
English=Hello+world%21&
Spanish=%C2%A1Hola+mundo%21&
Korean=%EC%97%AC%EB%B3%B4%EC%84%B8%EC%9A%94+%EC%84%B8%EA%B3%84+%21&
ChineseUtf8=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C%21&
ChineseGb2312=%C3%8A%C3%80%C2%BD%C3%A7%C3%84%C3%A3%C2%BA%C3%83%C2%A3
   %C2%A1
&submit=Submit------
------
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)
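To check the decoding step on its own, PHP's parse_str() can be run on part of the query string copied from the dump. This is a minimal sketch, assuming default PHP settings, under which parsing does not convert the character encoding:

```php
<?php
// Part of the query string from the dump file (the Korean and
// GB2312 pairs are left out to keep the lines short)
$query = 'English=Hello+world%21'
   . '&Spanish=%C2%A1Hola+mundo%21'
   . '&ChineseUtf8=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C%21'
   . '&submit=Submit';

// parse_str() splits the pairs on "&", turns "+" into a space and
// decodes each %XX escape back into a single byte.
parse_str($query, $fields);

foreach ($fields as $k => $v) {
   // bin2hex() shows the raw bytes of each decoded value
   printf("%s = (%s) bytes: %s\n", $k, $v, bin2hex($v));
}
```

The Spanish value comes back as the bytes c2a1486f6c61206d756e646f21: the 2-byte UTF-8 sequence for "¡" followed by plain ASCII.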

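Again, the result matches the rules listed earlier in this chapter. Because the page is declared as "charset=utf-8", the browser converts the input strings into UTF-8 byte sequences. When the form is submitted, each byte other than ASCII letters, digits and a few safe punctuation characters is URL encoded as %xx, and each space becomes "+". When PHP parses the query string into $_GET and $_REQUEST, it decodes the %xx sequences back into the same UTF-8 byte sequences.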
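The encoding side of these rules can be sketched with urlencode() and http_build_query(). For the characters used in this test, PHP's form-style encoding (space as "+", other bytes outside A-Z, a-z, 0-9 and "-_." as %XX) gives the same result as the browser. The sketch assumes the PHP file is saved in UTF-8, so the string literals are UTF-8 byte sequences:

```php
<?php
// Input strings from the test above; this file is assumed to be
// saved as UTF-8, so each literal is a UTF-8 byte sequence.
$inputs = array(
   'English' => 'Hello world!',
   'Spanish' => '¡Hola mundo!',
   'Korean'  => '여보세요 세계 !',
);

foreach ($inputs as $name => $value) {
   // bytes: the UTF-8 encoding; encoded: the %XX form sent to the server
   printf("%s: bytes=%s encoded=%s\n",
      $name, bin2hex($value), urlencode($value));
}

// http_build_query() encodes each value with urlencode() rules and
// joins the name=value pairs with "&", as a GET form submission does
$qs = http_build_query($inputs, '', '&');
echo $qs, "\n";
```

The last line printed matches the English, Spanish and Korean pairs in the dump file.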

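One surprise to me is that the GB2312 characters are also recorded as UTF-8 byte sequences. The browser encodes every character in the form with the page's charset, UTF-8, no matter where the characters came from. Decoding the bytes shows what happened: %C3%8A is U+00CA (Ê), %C3%80 is U+00C0 (À), %C2%BD is U+00BD (½), and so on. So the field held each byte of the GB2312 sequence CA C0 BD E7 C4 E3 BA C3 A3 A1 as a separate Latin-1 character, and the browser then encoded each of those characters as a 2-byte UTF-8 sequence.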
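The extra UTF-8 layer can be undone to get the original GB2312 bytes back. The sketch below uses a hypothetical helper, utf8ToLatin1(), written only for this example; it assumes every character in the decoded value is in the Latin-1 range U+0000 to U+00FF, as the %C2/%C3 pairs in the dump suggest. The final GB2312-to-UTF-8 step needs the mbstring extension and is skipped if it is not installed:

```php
<?php
// Hypothetical helper (written for this example): converts a UTF-8
// string whose characters are all in U+0000..U+00FF back to one
// byte per character (ISO-8859-1).
function utf8ToLatin1($s) {
   $out = '';
   $n = strlen($s);
   for ($i = 0; $i < $n; $i++) {
      $b = ord($s[$i]);
      if ($b < 0x80) {
         $out .= $s[$i];                  // ASCII: copy the byte
      } elseif (($b & 0xE0) === 0xC0 && $i + 1 < $n
            && (ord($s[$i + 1]) & 0xC0) === 0x80) {
         // 2-byte sequence 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
         $cp = (($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F);
         if ($cp > 0xFF) {
            throw new RuntimeException(sprintf('U+%04X is not Latin-1', $cp));
         }
         $out .= chr($cp);
         $i++;                            // skip the continuation byte
      } else {
         throw new RuntimeException("Unexpected byte at offset $i");
      }
   }
   return $out;
}

// The ChineseGb2312 pair, copied from the dump file
$query = 'ChineseGb2312=%C3%8A%C3%80%C2%BD%C3%A7%C3%84%C3%A3'
   . '%C2%BA%C3%83%C2%A3%C2%A1';
parse_str($query, $fields);
$utf8 = $fields['ChineseGb2312'];         // 10 characters, 20 bytes

$gb = utf8ToLatin1($utf8);
echo bin2hex($gb), "\n";                  // cac0bde7c4e3bac3a3a1

// Optional: decode the recovered bytes as GB2312, which mbstring
// calls EUC-CN. Only runs when the mbstring extension is loaded.
if (function_exists('mb_convert_encoding')) {
   echo mb_convert_encoding($gb, 'UTF-8', 'EUC-CN'), "\n";
}
```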
