Receiving Non ASCII Characters from Input Forms
Part:
1
2
3
4
5
6
7
(Continued from previous part...)
Now enter the following input strings on InputIsoGetDecoded.php to see what happens:
English ASCII: Hello world!
Spanish UTF-8: ¡Hola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: ÊÀ½çÄãºÃ£¡
If you click the submit button, you will get:
Input strings before decoding:
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (ÊÀ½çÄãºÃ£¡)
submit = (Submit)
------
Input strings after decoding:
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (ä½ å¥½ä¸–ç•Œ!)
ChineseGb2312 = (ÊÀ½çÄãºÃ£¡)
submit = (Submit)
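The first section shows the input strings as they were received, with any characters that the page's charset cannot represent still in HTML entity encoding (for example, 여 arrives as &#50668;).
The second section shows the same strings after they have been decoded from HTML entity encoding into UTF-8.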
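Here is a minimal sketch of a decoding step that would produce output in this before/after shape. It is not the actual InputIsoGetDecoded.php from the earlier parts: the helper names printFields() and decodeFields() are hypothetical, and a fixed sample array stands in for $_REQUEST so the sketch runs from the command line. The decoding itself relies only on PHP's html_entity_decode() with the 'UTF-8' charset.

```php
<?php
// Hypothetical sketch; the real InputIsoGetDecoded.php may differ.

// Print each field as "name = (value)", one per line (plain text, not HTML).
function printFields(array $fields): void
{
    foreach ($fields as $name => $value) {
        echo "$name = ($value)\n";
    }
}

// Decode HTML numeric character references such as "&#50668;" into UTF-8.
function decodeFields(array $fields): array
{
    $decoded = [];
    foreach ($fields as $name => $value) {
        $decoded[$name] = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
    }
    return $decoded;
}

// Sample values standing in for $_REQUEST (assumed, for illustration only).
$input = [
    'English' => 'Hello world!',
    'Korean'  => '&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !',
    'submit'  => 'Submit',
];

echo "Input strings before decoding:\n";
printFields($input);
echo "------\n";
echo "Input strings after decoding:\n";
$output = decodeFields($input);
printFields($output);
```

ASCII-only values pass through html_entity_decode() unchanged, which is why the English and submit lines are identical in both sections.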
Conclusion
- How non-ASCII characters entered on a Web page are encoded depends on the "charset" setting of the page. Characters that the charset cannot represent are submitted as HTML entities.
- URL encoding is applied when input strings are transferred to the server.
- The PHP CGI module applies URL decoding when parsing input strings into $_REQUEST.
- My suggestion is to use "charset=utf-8" for your input pages. Then there is no need to worry about HTML entity conversion.
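The steps in the conclusion can be traced with a small simulation. This sketch assumes an ISO-8859-1 input page and a GET submission. It imitates the browser's side with PHP's own urlencode(), which is an approximation of what a browser does rather than the browser's actual code, and all variable names are illustrative. The last lines show the "charset=utf-8" alternative, where the character's UTF-8 bytes are URL-encoded directly and no HTML entity step is involved.

```php
<?php
// Illustrative simulation; assumes an ISO-8859-1 input page.

// Step 1 (browser): a character outside ISO-8859-1, such as the Korean
// syllable U+C5EC, is replaced by an HTML numeric character reference.
$codePoint = 0xC5EC;
$typed = '&#' . $codePoint . ';';                      // "&#50668;"

// Step 2 (browser): the form value is URL-encoded for transfer.
$onTheWire = urlencode($typed);                        // "%26%2350668%3B"

// Step 3 (PHP): the query string is URL-decoded into $_REQUEST.
$received = urldecode($onTheWire);                     // "&#50668;" again

// Step 4 (script): the HTML entity is decoded into UTF-8 bytes.
$utf8 = html_entity_decode($received, ENT_QUOTES, 'UTF-8');

// With charset=utf-8 on the page, the browser would instead send the
// character's UTF-8 bytes, URL-encoded, with no entity in between.
$utf8PageWire = urlencode("\u{C5EC}");                 // "%EC%97%AC"

echo $onTheWire, "\n";
echo bin2hex($utf8), "\n";
echo $utf8PageWire, "\n";
```

Running it prints %26%2350668%3B, ec97ac and %EC%97%AC. The last two lines carry the same three bytes, which is the point of the utf-8 suggestion: the UTF-8 page delivers them directly, while the ISO-8859-1 page needs the extra entity decoding step.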