Receiving Non-ASCII Characters with GET Method

This section provides a tutorial example on how to enter non-ASCII characters in HTML forms and receive them correctly with the GET method. The HTML form uses the iso-8859-1 encoding.

There are two methods you can use to submit input data from an HTML form: GET and POST. Let's work with the GET method first. I wrote the following PHP script to demonstrate how non-ASCII characters are handled in the steps described in the previous section.

<?php
#  InputIsoGet.php
#- Copyright 2009 (c) HerongYang.com. All Rights Reserved.
#
#- Promoting CGI values to local variables
#  (import_request_variables() was removed in PHP 5.4)
   foreach (array('English', 'Spanish', 'Korean', 'ChineseUtf8',
      'ChineseGb2312') as $name) {
      ${"r_$name"} = isset($_REQUEST[$name]) ? $_REQUEST[$name] : '';
   }

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=iso-8859-1"/>');
   print("<body>\n");
   print("<form action=InputIsoGet.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='$r_Spanish' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='$r_Korean' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   foreach ($_GET as $k => $v) {
      print "$k = ($v)\n";
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\InputIsoGet.txt", 'ab');
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = '';
   }
   fwrite($file, $str, strlen($str));

   $str = "------\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>

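A couple of notes on this script: the page declares charset=iso-8859-1 in its meta tag, so the browser treats the form as an iso-8859-1 document; and the script appends the raw query string and the decoded $_REQUEST values to \temp\InputIsoGet.txt, so they can be inspected after each submission.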

Now move this script to your Web server and open http://localhost/InputIsoGet.php in IE. You should see a simple Web form.

Next, enter the following strings on the form:

English ASCII: Hello world!
Spanish UTF-8: ¡Hola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: 你好世界!

Don't try to type those hello messages in UTF-8 yourself. Go to the Google language tools site, google.com/language_tools, enter "Hello world!" and translate it into the other languages. Then copy the translations from the output page and paste them into the form. This should cause no corruption, because the Google site, Windows IE, and Notepad all support UTF-8.

For the "Chinese GB2312" field, however, enter the message in a GB2312-compatible editor, then copy and paste it into the Web form. Do not type it directly into the Web form, because IE will convert it to Unicode without telling you.

When you are ready, click the Submit button. The form comes back with the input strings preserved in the fields. The input strings are also displayed below the form:

English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)

As you can see, the input strings returned to the Web page appear to match the values I entered on the form. But not all of them really match. If you view the source code of the page, you will see that the HTML document contains HTML entity encoded strings:

<html><meta http-equiv="Content-Type" content="text/html;
 charset=iso-8859-1"/><body>
<form action=InputIsoGet.php method=get>English ASCII:
 <input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='¡Hola mundo!' size=16><br>
Korean UTF-8: <input name=Korean value='&#50668;&#48372;&#49464;
 &#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836;
 &#49464;&#44228; !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='&#20320;&#22909;
 &#19990;&#30028;!' size=16><br>
Chinese GB2312: <input name=ChineseGb2312 value='????¡'
 size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;
 &#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)
ChineseGb2312 = (????¡)
submit = (Submit)
</pre></body></html>

This agrees with the rule that form input strings are stored in $_REQUEST in URL-decoded form, for example "&#20320;", where 20320 is the decimal value of the Unicode character "\u4F60".
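The relationship between the decimal entity and the code point can be checked with a few lines of PHP. This is only a minimal sketch and is not part of InputIsoGet.php. It assumes PHP 5.4 or later and uses the built-in dechex(), html_entity_decode() and bin2hex() functions; the variable names are chosen for illustration.

```php
<?php
# Decimal value found inside the numeric character reference "&#20320;"
$decimal = 20320;

# The same value written in hexadecimal is the Unicode code point 4F60
$hex = strtoupper(dechex($decimal));
print("U+$hex\n");

# Decoding the entity to UTF-8 gives the 3-byte sequence of that character
$utf8 = html_entity_decode("&#20320;", ENT_QUOTES, "UTF-8");
print(bin2hex($utf8)."\n");
```

The first line printed is the code point in the usual U+ notation; the second is the hexadecimal dump of the UTF-8 bytes that a UTF-8 page would need to display the same character.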

If you open the dump file, \temp\InputIsoGet.txt, you will see how the input strings are URL encoded in the query string and decoded in $_REQUEST:

------
English=Hello+world%21&Spanish=%A1Hola+mundo%21&
Korean=%26%2350668%3B%26%2348372%3B%26%2349464%3B%26%2350836%3B
+%26%2349464%3B%26%2344228%3B+%21%26%2350668%3B%26%2348372%3B
%26%2349464%3B%26%2350836%3B+%26%2349464%3B%26%2344228%3B+%21&
ChineseUtf8=%26%2320320%3B%26%2322909%3B%26%2319990%3B
%26%2330028%3B%21&ChineseGb2312=%CA%C0%BD%E7%C4%E3%BA%C3%A3%A1
&submit=Submit
------
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836;
&#49464;&#44228; !&#50668;&#48372;&#49464;
&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)
ChineseGb2312 = (????!)
submit = (Submit)
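The decoding step visible in this dump can be reproduced outside a Web request with the built-in parse_str() function, which applies URL decoding to each name/value pair. The sketch below is an illustration only: it copies three fields from the query string above into a hypothetical $query variable and assumes that parse_str() decodes values the same way PHP does when it fills $_GET.

```php
<?php
# Three fields copied from the query string recorded in the dump file
$query = "English=Hello+world%21&Spanish=%A1Hola+mundo%21"
   ."&ChineseUtf8=%26%2320320%3B%26%2322909%3B%26%2319990%3B%26%2330028%3B%21";

# parse_str() URL-decodes each value and stores it in the $fields array
parse_str($query, $fields);

foreach ($fields as $k => $v) {
   print("$k = ($v)\n");
}
```

Note that the Spanish value comes back as the single iso-8859-1 byte 0xA1 followed by ASCII text, while the Chinese value comes back as a string of numeric character references, not as Chinese characters.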

For example, the first Chinese character entered as Unicode was recorded by the browser as "&#20320;". It was then URL encoded into "%26%2320320%3B" when submitted to the server. The PHP CGI module recorded it in $_SERVER['QUERY_STRING'] without any changes, but decoded it back to "&#20320;" when copying it to $_REQUEST.
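The browser side of this round trip can be imitated in PHP. The sketch below rests on stated assumptions and is not a description of how IE is implemented: codePointOf() and encodeForIsoForm() are hypothetical helpers, the input is assumed to be valid UTF-8, and the handling of code points 0x80 to 0x9F (which browsers map through windows-1252) is ignored. Characters that fit in iso-8859-1 are kept as single bytes, all others are replaced by decimal numeric character references, and the result is URL encoded.

```php
<?php
# Hypothetical helper: Unicode code point of one UTF-8 encoded character
function codePointOf($ch) {
   $b = array_values(unpack("C*", $ch));
   switch (count($b)) {
      case 1: return $b[0];
      case 2: return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
      case 3: return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6)
                 | ($b[2] & 0x3F);
      default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
   }
}

# Hypothetical helper: imitates submitting a UTF-8 string from an
# iso-8859-1 form with the GET method
function encodeForIsoForm($utf8) {
   $out = "";
   # Split the string into characters (the "u" modifier enables UTF-8 mode)
   foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
      $cp = codePointOf($ch);
      # Keep iso-8859-1 characters as one byte, replace the rest by "&#N;"
      $out .= ($cp <= 0xFF) ? chr($cp) : "&#$cp;";
   }
   return urlencode($out);
}

print(encodeForIsoForm("\xE4\xBD\xA0")."\n");        # U+4F60 in UTF-8
print(encodeForIsoForm("\xC2\xA1Hola mundo!")."\n"); # starts with U+00A1
```

The two calls print %26%2320320%3B and %A1Hola+mundo%21, which are the forms recorded for the ChineseUtf8 and Spanish fields in the query string above.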
