Receiving Non ASCII Characters from Input Forms
Part: 1 2 3 4 5 6 7
(Continued from previous part...)
A couple of notes on this script:
- The "global" statement declares all variables in the global scope. I don't really need them to be
in the global scope, but this statement also defines those variables, which avoids the "undefined variable"
error message at run time.
- import_request_variables() is a handy function that brings in all input strings in $_GET, $_POST, and $_COOKIE
as variables. This saves a lot of coding like this: "if (array_key_exists('English',$_GET)) {
$r_English = $_GET['English']; } else { $r_English = NULL;}"
- I used single quotes (') in "<input name=English value='$r_English' ...>" to stop white space in the value
from breaking the tag syntax. But I did nothing to escape a "'" inside the value itself. I will leave that to you.
- I dumped all input strings to a file in binary mode to see what we are really getting from the PHP CGI
interface. "b" is used in fopen() to make sure the file is opened in binary mode. fwrite() is used to output binary
data. The string length is passed to fwrite() to stop magic_quotes_runtime from stripping slashes in the string.
- iso-8859-1 is used as the HTML document encoding.
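The notes above can be sketched in code roughly as follows. This is only a minimal sketch, not the script from the previous part: the helper names request_value(), attr_escape(), and dump_binary() are hypothetical, and the sketch reads $_REQUEST directly, which is one way to get the same "r_" variables on PHP versions where import_request_variables() is no longer available (it was removed in PHP 5.4).

```php
<?php
// Hypothetical helper: return one form field, or NULL if it was not submitted.
// This replaces the long array_key_exists() test quoted in the notes.
function request_value($name) {
    return array_key_exists($name, $_REQUEST) ? $_REQUEST[$name] : NULL;
}

// Hypothetical helper: make a string safe inside value='...'.
// ENT_QUOTES also converts the single quote (to &#039;). The last argument
// (double_encode = false) leaves references such as &#20320; untouched.
function attr_escape($s) {
    return htmlspecialchars((string)$s, ENT_QUOTES, 'ISO-8859-1', false);
}

// Hypothetical helper: append one input string to a dump file in binary mode.
function dump_binary($path, $label, $s) {
    $out = fopen($path, 'ab');             // "b" = binary mode
    $line = $label . ' = (' . $s . ")\n";
    fwrite($out, $line, strlen($line));    // explicit length, as in the note
    fclose($out);
}

$r_English = request_value('English');
echo "<input name=English value='" . attr_escape($r_English) . "' size=16>\n";
```

With attr_escape() in place, an input like "it's" no longer closes the value='...' attribute early.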
Now, move this script to your IIS server and open http://localhost/InputIsoGet.php in IE. You should
see a simple Web form.
Next, enter the following strings on the form:
English ASCII: Hello world!
Spanish UTF-8: ola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: 世界你好!
Don't try to type those hello messages in UTF-8 yourself. Go to the Google language tools site,
http://www.google.com/language_tools. There you can enter "Hello world!" and translate it to other languages.
On the translation output page, just copy the translations and paste them into the form.
This should cause no corruption, because the Google site, Windows IE, and Notepad all support UTF-8.
For the "Chinese GB2312" field, however, enter the message in a GB2312-compatible editor, then copy and paste
it into the Web form. Do not type it directly into the Web form, because IE will convert it to Unicode encoding without
telling you.
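To see why the encoding matters, here is a small sketch comparing the bytes of the same two characters, "你好", in UTF-8 and in GB2312. The byte values are written as escapes so the result does not depend on how the script file itself is saved; hex_bytes() is a hypothetical helper name.

```php
<?php
// The same two characters ("你好") in two different encodings.
$utf8   = "\xE4\xBD\xA0\xE5\xA5\xBD"; // UTF-8: 3 bytes per character
$gb2312 = "\xC4\xE3\xBA\xC3";         // GB2312: 2 bytes per character

// Hypothetical helper: show a string as upper-case hex, one pair per byte.
function hex_bytes($s) {
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

echo "UTF-8:  ", hex_bytes($utf8), " (", strlen($utf8), " bytes)\n";
echo "GB2312: ", hex_bytes($gb2312), " (", strlen($gb2312), " bytes)\n";
```

The two byte sequences have nothing in common, so a conversion made silently by the browser changes what the server receives.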
When you are ready, click the submit button. You will see the form come back with the input strings retained in
the fields. The input strings are also displayed below the form as:
English = (Hello world!)
Spanish = (ola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (世界你好!)
submit = (Submit)
As you can see, the input strings returned to the Web page appear to match the values I entered on the form.
But they do not match at all. If you view the source code of the page, you will see that HTML entity encoded strings
were generated in the HTML document:
<html><meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1"/><body>
<form action=InputIsoGet.php method=get>English ASCII:
<input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='ola mundo!' size=16><br>
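Korean UTF-8: <input name=Korean value='&#50668;&#48372;&#49464;
&#50836; &#49464;&#44228; !&#50668;&#48372;&#49464;&#50836;
&#49464;&#44228; !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='&#20320;&#22909;
&#19990;&#30028;!' size=16><br>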
Chinese GB2312: <input name=ChineseGb2312 value='世界你好!'
size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>English = (Hello world!)
Spanish = (ola mundo!)
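Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !&#50668;
&#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (&#20320;&#22909;&#19990;&#30028;!)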
ChineseGb2312 = (世界你好!)
submit = (Submit)
</pre></body></html>
This matches the rule that form input strings are stored in $_REQUEST in decoded form,
for example, "&#20320;", where "20320" is the decimal value of the Unicode character "\u4F60".
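The relation between "&#20320;" and "\u4F60" can be checked with a few lines of PHP. This is only a sketch: it uses html_entity_decode() with UTF-8 as the target charset, which is my choice for illustration and not something the script above does.

```php
<?php
// "&#20320;" is a numeric character reference: "&#", a decimal number, ";".
$ref  = '&#20320;';
$code = (int) substr($ref, 2, -1);        // the number between "&#" and ";"
echo strtoupper(dechex($code)), "\n";     // 4F60

// Decoding the reference gives the character itself, here as UTF-8 bytes.
$char = html_entity_decode($ref, ENT_QUOTES, 'UTF-8');
echo bin2hex($char), "\n";                // e4bda0

// The whole ChineseUtf8 value decodes the same way: 4 characters plus "!".
$value   = '&#20320;&#22909;&#19990;&#30028;!';
$decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
echo strlen($decoded), " bytes\n";        // 13 bytes
```

So the string held in $r_ChineseUtf8 is plain ASCII text, and it only becomes Chinese characters again when the browser renders the page.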
(Continued on next part...)