PHP Tutorials - Herong's Tutorial Examples - v5.17, by Herong Yang
Receiving Non ASCII Characters in UTF-8 Encoding
This section provides a tutorial example on how to enter non-ASCII characters in HTML forms and receive them correctly with the GET method. The HTML form uses the Unicode UTF-8 encoding.
In the previous scripts, "charset=iso-8859-1" is used for the input page. Now let's play with "charset=utf-8". Here is my sample script:
<?php # InputUtf8Get.php
#- Copyright 2009 (c) HerongYang.com. All Rights Reserved.
#
#- Promoting CGI values to local variables
   global $r_English, $r_Spanish, $r_Korean, $r_ChineseUtf8;
   global $r_ChineseGb2312;
   import_request_variables("GPC","r_");

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=utf-8"/>');
   print("<body>\n");
   print("<form action=InputUtf8Get.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='$r_Spanish' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='$r_Korean' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputing input strings back to HTML document
   print("<hr>");
   print("<pre>");
   foreach ($_GET as $k => $v) {
      print "$k = ($v)\n";
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\InputUtf8Get.txt", 'ab');
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = NULL;
   }
   fwrite($file, $str, strlen($str));
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>
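Note that import_request_variables() was deprecated in PHP 5.3 and removed in PHP 5.4, so the script above only runs as written on older engines. Here is a minimal sketch of how the same "promote request values" step might look on a newer PHP. It assumes a hypothetical helper named field_value(), which is not part of the original script. The helper reads one field from an array such as $_GET and escapes it before it is echoed back into the form:

```php
<?php
// Hypothetical helper, not part of the original script: returns one
// request field from the given array, escaped for use inside an HTML
// attribute. Missing or non-string fields become an empty string.
function field_value(array $source, string $name): string {
    $raw = (isset($source[$name]) && is_string($source[$name]))
        ? $source[$name]
        : '';
    // ENT_QUOTES also escapes single quotes, which the form uses around
    // its value='...' attributes. 'UTF-8' matches the page charset.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Same output line as in the original form, fed from $_GET directly.
$r_Spanish = field_value($_GET, 'Spanish');
print("Spanish UTF-8: <input name=Spanish"
   ." value='$r_Spanish' size=16><br>\n");
```

Escaping does not change the UTF-8 bytes of non-ASCII characters. It only rewrites the HTML special characters, so a value like "¡Hola mundo!" comes back unchanged.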
If you enter the same input strings as in the previous tests and submit the form, you will see:
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)
The returned page displays the input strings below the form. They look correct to me.
If you open the dump file, \temp\InputUtf8Get.txt, you will see how input strings are URL encoded in the query string and decoded in $_REQUEST.
------
English=Hello+world%21&
Spanish=%C2%A1Hola+mundo%21&
Korean=%EC%97%AC%EB%B3%B4%EC%84%B8%EC%9A%94+%EC%84%B8%EA%B3%84+%21&
ChineseUtf8=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C%21&
ChineseGb2312=%C3%8A%C3%80%C2%BD%C3%A7%C3%84%C3%A3%C2%BA%C3%83%C2%A3
%C2%A1
&submit=Submit------
------
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (????!)
submit = (Submit)
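The decoding step that fills $_GET and $_REQUEST can be reproduced outside a web request with parse_str(). The sketch below feeds it a shortened copy of the query string from the dump (only the first two fields, and the variable names are just for illustration) and prints each decoded value together with its bytes in hex:

```php
<?php
// Shortened copy of the query string from the dump file.
$query = 'English=Hello+world%21&Spanish=%C2%A1Hola+mundo%21';

// parse_str() applies the same URL decoding PHP uses for query strings:
// "+" becomes a space and each %xx becomes one byte.
parse_str($query, $fields);

foreach ($fields as $k => $v) {
    print "$k = ($v) hex: " . bin2hex($v) . "\n";
}
```

The hex column makes it easy to see that the Spanish value starts with the two bytes c2 a1, the UTF-8 sequence for "¡".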
Again, the result matches the rules listed earlier in this chapter. Input strings are recorded as UTF-8 byte sequences when entered on the page. When the form is sent to the server, each byte outside the URL-safe ASCII set is URL encoded as %xx, and each space is sent as "+". When input strings are parsed into $_REQUEST, they are decoded back to UTF-8 byte sequences.
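The encoding rule described above can be checked with PHP's own urlencode() and urldecode(). This is a sketch of the rule only, not of what any particular browser does internally. The test string is written with \x escapes so that the example does not depend on the encoding of the script file:

```php
<?php
// "¡Hola mundo!" as a UTF-8 byte sequence: "¡" (U+00A1) is C2 A1.
$utf8 = "\xC2\xA1Hola mundo!";

// urlencode() leaves ASCII letters and digits alone, turns the space
// into "+", and writes the remaining bytes of this string as %XX.
$encoded = urlencode($utf8);
print "$encoded\n";   // %C2%A1Hola+mundo%21

// urldecode() reverses the step and returns the original bytes.
$decoded = urldecode($encoded);
print bin2hex($decoded) . "\n";
```

The first printed line is the same as the Spanish field in the dumped query string.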
One surprise to me is that the GB2312 characters are also recorded as UTF-8 byte sequences.
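To look at what was actually sent for the GB2312 field, the percent-encoded value from the dump can be taken apart by hand. The sketch below relies on the fact that every character in this particular value is a two-byte UTF-8 sequence with a C2 or C3 lead byte. The helper name utf8_pairs_to_bytes() is hypothetical. The code shows that each UTF-8 character carries a code point between U+0080 and U+00FF, and that those code points, written out as single bytes, give a 10-byte string. Reading that string as the original GB2312 input is my interpretation, not something the dump states:

```php
<?php
// Value of ChineseGb2312 from the dump, with the line wraps removed.
$encoded = '%C3%8A%C3%80%C2%BD%C3%A7%C3%84%C3%A3%C2%BA%C3%83%C2%A3%C2%A1';

// Hypothetical helper: converts a string made only of two-byte UTF-8
// sequences into one byte per code point. This is only meaningful
// here because every code point in the value is below U+0100.
function utf8_pairs_to_bytes(string $utf8): string {
    $out = '';
    for ($i = 0; $i + 1 < strlen($utf8); $i += 2) {
        $lead  = ord($utf8[$i]);
        $trail = ord($utf8[$i + 1]);
        // Two-byte UTF-8: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
        $codePoint = (($lead & 0x1F) << 6) | ($trail & 0x3F);
        $out .= chr($codePoint);
    }
    return $out;
}

$utf8  = urldecode($encoded);          // 20 bytes of UTF-8
$bytes = utf8_pairs_to_bytes($utf8);   // 10 single bytes

print bin2hex($utf8) . "\n";
print bin2hex($bytes) . "\n";          // cac0bde7c4e3bac3a3a1

// Optional: if the mbstring extension is loaded, show the characters
// these 10 bytes stand for when read as GB2312 (EUC-CN in mbstring).
if (function_exists('mb_convert_encoding')) {
    print mb_convert_encoding($bytes, 'UTF-8', 'EUC-CN') . "\n";
}
```

If this reading is right, the browser treated each GB2312 byte typed into the field as a separate Latin-1 character and encoded that character in UTF-8, as it does for any other input on a UTF-8 page.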