PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Receiving Non ASCII Characters from Input Forms



(Continued from previous part...)

A couple of notes on this script:

  • The "global" statement declares all variables in the global scope. I don't really need them in the global scope, but the statement also makes those variables defined, which avoids the "undefined variable" error message at run time.
  • import_request_variables() is a convenient way to bring all input strings in $_GET, $_POST, and $_COOKIE in as variables. It saves a lot of code like this: "if (array_key_exists('English',$_GET)) { $r_English = $_GET['English']; } else { $r_English = NULL;}". Note that import_request_variables() was deprecated in PHP 5.3 and removed in PHP 5.4.
  • I enclosed the value in single quotes, as in "<input name=English value='$r_English' ...>", so that white space in the input does not break the tag syntax. But I did not escape any "'" contained in the input itself. I will leave that to you.
  • I dumped all input strings to a file in binary mode to see what we are really getting from the PHP CGI interface. "b" is used in fopen() to make sure the file is opened in binary mode, and fwrite() is used to write the binary data. The string length is given in fwrite() to stop magic_quotes_runtime from stripping slashes from the string.
  • iso-8859-1 is used as the HTML document encoding.
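For readers on PHP 7.1 or later, where import_request_variables() no longer exists, the points above can be sketched roughly as follows. This is not the original script: requestValue(), inputField() and dumpInputs() are hypothetical helpers, and the dump file name in the usage comment is an assumption. The sketch reads one field with a NULL default, escapes the value for a single-quoted attribute with htmlspecialchars(), and writes the raw inputs to a file opened in binary mode.

```php
<?php
// Sketch only: input handling without import_request_variables().
// requestValue(), inputField() and dumpInputs() are hypothetical helpers,
// not part of the original script.

// Return one submitted field as a string, or NULL when it is missing
// (this replaces the array_key_exists() boilerplate quoted above).
function requestValue(array $source, string $name): ?string
{
    return isset($source[$name]) && is_string($source[$name])
        ? $source[$name]
        : null;
}

// Build an <input> tag whose value sits inside single quotes.
// ENT_QUOTES turns ' into &#039;, so a quote typed by the user can no
// longer end the attribute early. ISO-8859-1 matches the page encoding:
// every byte is a valid character there, so raw non-ASCII bytes (for
// example GB2312) pass through unchanged instead of failing UTF-8 checks.
function inputField(string $name, ?string $value): string
{
    $safe = htmlspecialchars((string) $value, ENT_QUOTES, 'ISO-8859-1');
    return "<input name=$name value='$safe' size=16>";
}

// Write "name = (value)" lines to a file opened in binary mode ("wb"),
// so the bytes land on disk exactly as they were received.
function dumpInputs(string $path, array $inputs): void
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    foreach ($inputs as $name => $value) {
        $line = $name . ' = (' . (is_string($value) ? $value : '') . ")\n";
        fwrite($fh, $line, strlen($line)); // explicit length, as in the original
    }
    fclose($fh);
}

// Usage on the real page (the dump file name is hypothetical):
// $english = requestValue($_GET, 'English');
// echo inputField('English', $english), "<br>\n";
// dumpInputs('InputIsoGet.dump', $_GET);
```

The explicit ISO-8859-1 charset matters here: with 'UTF-8' and no ENT_SUBSTITUTE or ENT_IGNORE flag, htmlspecialchars() returns an empty string for input that is not valid UTF-8, which would silently wipe out the GB2312 field.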

Now, move this script to your IIS server and open http://localhost/InputIsoGet.php in IE. You should see a simple Web form.

Next, enter the following strings on the form:

English ASCII: Hello world!
Spanish UTF-8: ola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: 世界你好!

Don't try to type those hello messages in UTF-8 yourself. Instead, go to the Google language tools site, http://www.google.com/language_tools, enter "Hello world!", and translate it into other languages. Then copy the translations from the output page and paste them into the form. This should not corrupt the characters, because the Google site, Windows IE, and Notepad all support UTF-8.

For the "Chinese GB2312" field, however, enter the message in a GB2312-compatible editor first, then copy and paste it into the Web form. Do not type it directly on the Web form, because IE will convert it to Unicode without telling you.
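The difference between the two Chinese fields is a difference in bytes, not in glyphs. As a rough illustration, the sketch below (PHP 7 or later; hexBytes() and utf8Length() are hypothetical helper names) dumps the UTF-8 bytes of the "Chinese UTF-8" message. Each of these Chinese characters takes 3 bytes in UTF-8, while GB2312 uses 2 bytes per Chinese character, so the same visible text is a different byte string in each encoding.

```php
<?php
// Sketch: dump the UTF-8 bytes behind the "Chinese UTF-8" message.
// The text is built with "\u{...}" escapes (PHP 7+) so the result does
// not depend on the encoding this source file is saved in.
// hexBytes() and utf8Length() are hypothetical helper names.

// Space-separated hex dump of a byte string, e.g. "e4 bd a0".
function hexBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Count characters without mbstring: with the /u modifier, "." matches
// one complete UTF-8 sequence; /s lets it match newlines as well.
function utf8Length(string $s): int
{
    return (int) preg_match_all('/./us', $s);
}

$chineseUtf8 = "\u{4F60}\u{597D}\u{4E16}\u{754C}!"; // 你好世界!

echo 'bytes: ', strlen($chineseUtf8), "\n";
echo 'chars: ', utf8Length($chineseUtf8), "\n";
echo 'hex:   ', hexBytes($chineseUtf8), "\n";
```

It reports 13 bytes for 5 characters, with the hex string e4 bd a0 e5 a5 bd e4 b8 96 e7 95 8c 21: three bytes for each Chinese character plus one for "!".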

When you are ready, click the Submit button. The form comes back with the input strings preserved in their fields. The input strings are also displayed below the form as:

English = (Hello world!)
Spanish = (ola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (世界你好!)
submit = (Submit)

As you can see, the input strings returned to the Web page appear to match the values I entered on the form. But underneath, they do not match at all. If you view the source code of the page, you will see that the Korean and Chinese UTF-8 strings have been turned into HTML entity encoded strings in the HTML document:

<html><meta http-equiv="Content-Type" content="text/html;
 charset=iso-8859-1"/><body>
<form action=InputIsoGet.php method=get>English ASCII:
 <input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='ola mundo!' size=16><br>
Korean UTF-8: <input name=Korean value='&#50668;&#48372;&#49464;
 &#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836;
 &#49464;&#44228; !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='&#20320;&#22909;
 &#19990;&#30028;!' size=16><br>
Chinese GB2312: <input name=ChineseGb2312 value='世界你好!'
 size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>English = (Hello world!)
Spanish = (ola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;
 &#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)
ChineseGb2312 = (世界你好!)
submit = (Submit)
</pre></body></html>

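This matches well with the rule that form input strings are stored in $_REQUEST in HTML entity format, for example "&#20320;", where "20320" is the decimal value of the Unicode character "\u4F60" (你). ISO-8859-1 covers only U+0000 to U+00FF, so IE submits characters like these Korean and Chinese ones as numeric character references.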
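The rule can be simulated in a few lines. The sketch below is an approximation of the browser side, not IE's actual code: codePoint() and asLatin1FormValue() are hypothetical helpers, and it ignores the fact that browsers treat an ISO-8859-1 page as windows-1252, which sends a few characters such as the euro sign as single bytes. It keeps characters up to U+00FF as single bytes, replaces everything else with "&#N;", and then shows how a server could turn the references back into UTF-8 with html_entity_decode().

```php
<?php
// Sketch of the behaviour described above, not IE's actual code.
// codePoint() and asLatin1FormValue() are hypothetical helpers.

// Decode one UTF-8 character (1 to 4 bytes) into its Unicode code point.
function codePoint(string $ch): int
{
    $b = array_map('ord', str_split($ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Simulate submitting a form on an ISO-8859-1 page: characters up to
// U+00FF become single Latin-1 bytes, the rest become &#N; references
// where N is the decimal code point.
function asLatin1FormValue(string $utf8): string
{
    $out = '';
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $cp = codePoint($ch);
        $out .= $cp <= 0xFF ? chr($cp) : '&#' . $cp . ';';
    }
    return $out;
}

$typed = "\u{4F60}\u{597D}\u{4E16}\u{754C}!";   // 你好世界! as typed
$received = asLatin1FormValue($typed);          // what lands in $_REQUEST
echo $received, "\n";
printf("U+%04X = %d\n", 0x4F60, 0x4F60);        // hex vs decimal code point

// Server side: turn the references back into real UTF-8 characters.
$decoded = html_entity_decode($received, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $typed);
```

This prints "&#20320;&#22909;&#19990;&#30028;!", then "U+4F60 = 20320", then bool(true). One caveat with decoding on the server: the browser does not escape a literal "&", so a user who actually types the text "&#20320;" produces exactly the same bytes, and html_entity_decode() cannot tell the two cases apart.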

(Continued on next part...)


Dr. Herong Yang, updated in 2006