PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Receiving Non ASCII Characters from Input Forms


PHP Tutorials - Herong's Tutorial Notes © Dr. Herong Yang



Decoding HTML Entities

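As you saw earlier in this chapter, if the page is served with "charset=iso-8859-1", characters that ISO-8859-1 cannot represent are received in $_REQUEST as HTML entities (numeric character references of the form "&#NNNNN;"). How can we convert them back to the original Unicode characters?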

I tried "urldecode()" and "rawurldecode()" first. They only decode URL percent-encoding ("%XX" sequences), which is enough for single-byte characters, but they leave HTML entities untouched, so they do not recover the multi-byte characters.
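The behavior described above can be checked with a short sketch. The input string is a hypothetical example, not taken from the original test run: it is the numeric entity for the Korean syllable U+D55C, which is what a browser would typically send from an ISO-8859-1 page.

```php
<?php
// Hypothetical input: U+D55C written as a numeric HTML entity.
$entity = "&#54620;";

// Neither function touches the entity, because it contains no %XX sequence.
$viaUrldecode    = urldecode($entity);
$viaRawurldecode = rawurldecode($entity);

// For comparison: the same character percent-encoded as UTF-8 bytes
// is decoded, because this is the encoding urldecode() understands.
$viaPercent = urldecode("%ED%95%9C");

print("urldecode:    $viaUrldecode\n");
print("rawurldecode: $viaRawurldecode\n");
print("percent form: " . bin2hex($viaPercent) . "\n");
?>
```

The two entity results are printed unchanged, while the percent-encoded form comes back as the three UTF-8 bytes ed 95 9c.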

PHP provides the function "html_entity_decode()" for this purpose. It decodes HTML entities, including those that stand for multi-byte characters. Here is the syntax of html_entity_decode():

   html_entity_decode(string[, quote_style[, charset]])

where "string" is the HTML entity encoded string; "quote_style" specifies how quote entities are handled (ENT_COMPAT, ENT_QUOTES, or ENT_NOQUOTES); and "charset" specifies the character set of the decoded output. Supported character sets include: ISO-8859-1, UTF-8, cp1251, GB2312, and Shift_JIS.
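As a minimal sketch of how the three parameters interact, the following lines decode one hypothetical string with two quote styles and one named entity with two character sets. The sample strings are assumptions chosen for illustration.

```php
<?php
// Hypothetical sample: a Korean syllable plus double and single quote entities.
$encoded = "&#54620; &quot;A&quot; &#039;B&#039;";

// ENT_COMPAT decodes double-quote entities and leaves single-quote entities alone.
$compat = html_entity_decode($encoded, ENT_COMPAT, "UTF-8");

// ENT_QUOTES decodes both kinds of quote entities.
$quotes = html_entity_decode($encoded, ENT_QUOTES, "UTF-8");

// The charset parameter selects the encoding of the result:
// "e with acute" is one byte in ISO-8859-1 and two bytes in UTF-8.
$latin1 = html_entity_decode("&eacute;", ENT_COMPAT, "ISO-8859-1");
$utf8   = html_entity_decode("&eacute;", ENT_COMPAT, "UTF-8");

print("ENT_COMPAT: $compat\n");
print("ENT_QUOTES: $quotes\n");
print("ISO-8859-1 bytes: " . bin2hex($latin1) . "\n");
print("UTF-8 bytes:      " . bin2hex($utf8) . "\n");
?>
```

Passing "UTF-8" as the charset is the usual choice when the decoded text is shown on a UTF-8 page, as in the script that follows.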

To show how html_entity_decode() is used, I modified InputIsoGet.php into InputIsoGetDecoded.php:

<?php # InputIsoGetDecoded.php
# Copyright (c) 2005 by Dr. Herong Yang, http://www.herongyang.com/
# 
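#- Promoting CGI values to local variables
#  (import_request_variables() was removed in PHP 5.4, so read $_REQUEST)
   foreach (array("English","Spanish","Korean","ChineseUtf8",
         "ChineseGb2312") as $name) {
      ${"r_".$name} = isset($_REQUEST[$name]) ? $_REQUEST[$name] : "";
   }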

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=utf-8"/>');
   print("<body>\n");
   print("<form action=InputIsoGetDecoded.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Spanish UTF-8: <input name=Spanish"
      ." value='$r_Spanish' size=16><br>\n");
   print("Korean UTF-8: <input name=Korean"
      ." value='$r_Korean' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   print("Input strings before decoding:\n");
   foreach ($_GET as $k => $v) {
      print "$k = ($v)\n";
   }
   print("</pre>");

#- Outputting input strings back to HTML document - decoded
   print("<hr>");
   print("<pre>");
   print("Input strings after decoding:\n");
   foreach ($_GET as $k => $v) {
      print("$k = (".html_entity_decode($v,ENT_COMPAT,"UTF-8").")\n");
   }
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\InputIsoGet.txt", 'ab');
   $str = "------\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = "";
   }
   fwrite($file, $str, strlen($str));

   $str = "------\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>
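The decoding loop in the script can also be factored into a reusable function. The sketch below is one possible way to do that; the function name decode_entities_in_array() and the sample array are hypothetical and not part of InputIsoGetDecoded.php.

```php
<?php
// Hypothetical helper: decode HTML entities in every string value of an
// array such as $_GET, returning a new array and leaving the input as is.
function decode_entities_in_array(array $input, $charset = "UTF-8") {
    $decoded = array();
    foreach ($input as $key => $value) {
        // Only strings are decoded; other values (for example nested
        // arrays from name[]=... parameters) are passed through unchanged.
        $decoded[$key] = is_string($value)
            ? html_entity_decode($value, ENT_COMPAT, $charset)
            : $value;
    }
    return $decoded;
}

// Sample data standing in for $_GET (assumed values).
$sample = array("English" => "Hello", "Korean" => "&#54620;");
$result = decode_entities_in_array($sample);

foreach ($result as $k => $v) {
    print("$k = ($v)\n");
}
?>
```

In the script itself, the call would be decode_entities_in_array($_GET), with the result printed in place of the second foreach loop.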


Dr. Herong Yang, updated in 2006