PHP Tutorials - Herong's Tutorial Examples - v5.18, by Herong Yang
Receiving Non-ASCII Characters with GET Method
This section provides a tutorial example on how to enter non-ASCII characters in HTML forms and receive them correctly with the GET method. The HTML form uses the iso-8859-1 encoding.
We know that there are two methods you can use to submit input data in an HTML form: GET and POST. Let's work with the GET method first. I wrote the following PHP script to demonstrate how non-ASCII characters are managed in the steps described in the previous section.
<?php # InputIsoGet.php
#- Copyright 2009 (c) HerongYang.com. All Rights Reserved.
#
#- Promoting CGI values to local variables
   global $r_English, $r_Spanish, $r_Korean, $r_ChineseUtf8;
   global $r_ChineseGb2312;
   import_request_variables("GPC","r_");

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=iso-8859-1"/>');
   print("<body>\n");
   print("<form action=InputIsoGet.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='$r_Spanish' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='$r_Korean' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   foreach ($_GET as $k => $v) {
      print "$k = ($v)\n";
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\InputIsoGet.txt", 'ab');
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = NULL;
   }
   fwrite($file, $str, strlen($str));
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>
Couple of notes for this script:
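One point worth keeping in mind on current PHP versions: import_request_variables() was deprecated in PHP 5.3 and removed in PHP 5.4, so the script as written only runs on older releases. The following is a minimal sketch of one possible replacement, not part of the original script. The helper name importPrefixed() and the $fields list are assumptions introduced here for illustration, and the sketch reads only $_GET rather than GET, POST, and cookie values.

```php
<?php
// Hypothetical helper (not in the original script): copies selected request
// values into an array whose keys carry a prefix, similar in spirit to
// import_request_variables("GPC", "r_") but limited to one source array.
function importPrefixed(array $source, array $names, string $prefix = 'r_'): array
{
    $vars = [];
    foreach ($names as $name) {
        // Missing or non-string inputs fall back to an empty string,
        // so the form fields render blank on the first visit.
        $value = $source[$name] ?? '';
        $vars[$prefix . $name] = is_string($value) ? $value : '';
    }
    return $vars;
}

// Field names taken from the form in InputIsoGet.php.
$fields = ['English', 'Spanish', 'Korean', 'ChineseUtf8', 'ChineseGb2312'];

// Defines $r_English, $r_Spanish, ... from a fixed list of known names.
extract(importPrefixed($_GET, $fields));
```

Because extract() is applied only to a fixed list of names, a request cannot define arbitrary variables in the script, which was one of the risks of the removed function.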
Now, move this script to your Web server. Open http://localhost/InputIsoGet.php in IE (Internet Explorer). You should see a simple Web form.
Next, enter the following strings on the form:
English ASCII: Hello world!
Spanish UTF-8: ¡Hola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: 你好世界!
Don't try to type those hello messages in UTF-8 yourself. Go to the Google language tool site google.com/language_tools. You can enter "Hello world!" and translate it to other languages. On the translation output page, just copy those translations and paste them into the Web form. This should cause no corruption, because the Google site, Windows IE, and Notepad all support UTF-8.
For the "Chinese GB2312" field, however, enter the message in a GB2312-compatible editor, then copy and paste it into the Web form. Do not type it directly into the Web form, because IE will convert it to Unicode encoding without telling you.
When you are ready, click the submit button. You will see the form come back with the input strings retained in the fields. The input strings are also displayed below the form as:
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)
As you can see, the input strings returned to the Web page appear to match the values I entered on the form. But they do not really match. If you view the source code of the page, you will see that the HTML document contains HTML entity encoded strings:
<html><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/><body>
<form action=InputIsoGet.php method=get>English ASCII: <input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='¡Hola mundo!' size=16><br>
Korean UTF-8: <input name=Korean value='&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='&#20320;&#22909;&#19990;&#30028;!' size=16><br>
Chinese GB2312: <input name=ChineseGb2312 value='????¡' size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)
ChineseGb2312 = (????¡)
submit = (Submit)
</pre></body></html>
This matches the rule that form input strings are stored in $_REQUEST in URL-decoded form, still carrying the HTML numeric character references created by the browser. One example is "&#20320;", where "20320" is the decimal value of the Unicode character "\u4F60".
If you open the dump file, \temp\InputIsoGet.txt, you will see how input strings are URL encoded in the query string, and decoded in $_REQUEST.
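The browser-side step can be imitated in a few lines. The sketch below is only a simplified model, assuming the browser keeps characters up to U+00FF as single ISO-8859-1 bytes and replaces everything else with a decimal numeric character reference; real browsers may treat the range 0x80 to 0x9F differently. The function names codePointOf() and toLatin1FormValue() are hypothetical.

```php
<?php
// Decode one UTF-8 encoded character into its Unicode code point.
function codePointOf(string $ch): int
{
    $b = array_values(unpack('C*', $ch));   // bytes as integers, 0-indexed
    switch (count($b)) {
        case 1:  return $b[0];
        case 2:  return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:  return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6)
                      | ($b[2] & 0x3F);
        default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                      | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Simplified model of what the browser does with a form value when the
// page is declared as iso-8859-1 (an assumption, see the text above).
function toLatin1FormValue(string $utf8): string
{
    $out = '';
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $ch) {
        $cp = codePointOf($ch);
        // Representable in ISO-8859-1: one byte. Otherwise: "&#N;".
        $out .= ($cp <= 0xFF) ? chr($cp) : "&#$cp;";
    }
    return $out;
}

// Source strings are written with \u{...} escapes (PHP 7 or later).
echo toLatin1FormValue("\u{4F60}\u{597D}\u{4E16}\u{754C}!"), "\n";
echo hexdec('4F60'), "\n";   // decimal value of U+4F60
```

The first line printed is the same entity string that appears in the page source for the ChineseUtf8 field, and the second confirms that hexadecimal 4F60 equals decimal 20320.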
------
English=Hello+world%21&Spanish=%A1Hola+mundo%21&
Korean=%26%2350668%3B%26%2348372%3B%26%2349464%3B%26%2350836%3B
+%26%2349464%3B%26%2344228%3B+%21%26%2350668%3B%26%2348372%3B
%26%2349464%3B%26%2350836%3B+%26%2349464%3B%26%2344228%3B+%21&
ChineseUtf8=%26%2320320%3B%26%2322909%3B%26%2319990%3B
%26%2330028%3B%21&ChineseGb2312=%CA%C0%BD%E7%C4%E3%BA%C3%A3%A1
&submit=Submit
------
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)
ChineseGb2312 = (????!)
submit = (Submit)
For example, the Chinese character entered as Unicode was recorded by the browser as "&#20320;". It was then URL encoded into "%26%2320320%3B" when submitted to the server. The PHP CGI module recorded it in $_SERVER['QUERY_STRING'] without any changes. But it was decoded back to "&#20320;" when the PHP CGI module copied it to $_REQUEST.
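The URL encoding and decoding steps can be reproduced outside a Web request with the standard functions urlencode(), urldecode(), and parse_str(). The sketch below is an illustration under the assumption that parse_str() decodes a query string the same way PHP does when it fills $_GET. The $query value is a shortened copy of the dump above, and the variable names are chosen for this example only.

```php
<?php
// Round trip for one numeric character reference.
$entity  = '&#20320;';
$encoded = urlencode($entity);    // what the browser puts in the query string
$decoded = urldecode($encoded);   // what ends up in $_GET / $_REQUEST

// A shortened copy of the recorded query string (Korean field omitted).
$query = 'English=Hello+world%21&Spanish=%A1Hola+mundo%21'
       . '&ChineseUtf8=%26%2320320%3B%26%2322909%3B%26%2319990%3B%26%2330028%3B%21'
       . '&ChineseGb2312=%CA%C0%BD%E7%C4%E3%BA%C3%A3%A1';

// parse_str() splits and URL-decodes the pairs into an array.
parse_str($query, $fields);

// Print each value's byte length and raw bytes in hex, which avoids any
// dependence on the terminal's own character encoding.
foreach ($fields as $name => $value) {
    printf("%-14s %2d bytes  hex=%s\n", $name, strlen($value), bin2hex($value));
}
```

The ChineseUtf8 value comes back as plain ASCII entity text, while the ChineseGb2312 value comes back as 10 raw bytes that PHP passes through without knowing their encoding.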