PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Managing Non ASCII Character Strings

(Continued from previous part...)

If you run it directly, you will get:

Current settings:
   internal_encoding = (UTF-8)
   http_input = ()
   http_output = (pass)
   func_overload = (pass)

Encoding detection:
1. ASCII for (\x48656c6c6f21)
2. ASCII for (\x00480065006c006c006f0021)
3. UTF-8 for (\xc2a1486f6c6121)
4. UTF-8 for (\xe4bda0e5a5bd21)
5. UTF-8 for (\xc4e3bac3a3a1)

String length:
1. 6 for (\x48656c6c6f21)
2. 6 for (\x00480065006c006c006f0021)
3. 6 for (\xc2a1486f6c6121)
4. 3 for (\xe4bda0e5a5bd21)
5. 3 for (\xc4e3bac3a3a1)

String conversion - ASCII <--> UTF-16:
   String in ASCII = (\x48656c6c6f21)
   Converted to UTF-16 = (\x00480065006c006c006f0021)
   Converted to ASCII = (\x48656c6c6f21)

String conversion - UTF-8 <--> UTF-16:
   String in UTF-8 = (\xc2a1486f6c6121)
   Converted to UTF-16 = (\x00a10048006f006c00610021)
   Converted to UTF-8 = (\xc2a1486f6c6121)

String conversion - UTF-8 <--> GB2312:
   String in UTF-8 = (\xe4bda0e5a5bd21)
   Converted to GB2312 = (\xc4e3bac321)
   Converted to UTF-8 = (\xe4bda0e5a5bd21)

String conversion - GB2312 <--> UTF-16:
   String in GB2312 = (\xc4e3bac3a3a1)
   Converted to UTF-16 = (\x4f60597dff01)
   Converted to GB2312 = (\xc4e3bac3a3a1)

Some interesting notes about this test:

  • I did set "mbstring.http_input = pass", but "mb_get_info()" reported no setting. I don't know why.
  • Encoding detection #2 did not recognize the string "\x00480065006c006c006f0021" as UTF-16 encoding.
  • Encoding detection #5 did not recognize the string "\xc4e3bac3a3a1" as GB2312. It was reported as UTF-8.
  • By telling "mbstring" the correct encoding name, mb_strlen() worked perfectly.
  • Encoding conversion worked nicely too. I am actually surprised to see UTF-8 and GB2312 conversion working correctly.
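
The detection misses noted above may be easier to understand with a small experiment. mb_detect_encoding() only chooses among a list of candidate encodings, so an encoding that is not in the list can never be reported. The sketch below is not the test script from this tutorial; it reuses three byte strings from the output above and assumes the mbstring extension is loaded. It passes an explicit candidate list with strict mode turned on, and uses mb_check_encoding() to ask the narrower question "are these bytes valid in the encoding I name?".

```php
<?php
// Byte strings copied from tests #2, #4 and #5 in the output above.
$utf16  = "\x00\x48\x00\x65\x00\x6c\x00\x6c\x00\x6f\x00\x21";
$utf8   = "\xe4\xbd\xa0\xe5\xa5\xbd\x21";
$gb2312 = "\xc4\xe3\xba\xc3\xa3\xa1";

// Validation against a named encoding: the GB2312 bytes are not
// well-formed UTF-8, while the UTF-8 bytes are.
$gbIsValidUtf8   = mb_check_encoding($gb2312, 'UTF-8');
$utf8IsValidUtf8 = mb_check_encoding($utf8, 'UTF-8');

// Strict detection (third argument true) over an explicit candidate
// list. Candidates that cannot decode the bytes are rejected.
$guess = mb_detect_encoding($gb2312, array('ASCII', 'UTF-8', 'GB2312'), true);

// Character counts with the encoding named explicitly, as in the
// "String length" test. UTF-16BE is used because the bytes have no BOM.
$lenUtf16 = mb_strlen($utf16, 'UTF-16BE');
$lenUtf8  = mb_strlen($utf8, 'UTF-8');
$lenGb    = mb_strlen($gb2312, 'GB2312');

echo "GB2312 bytes valid as UTF-8: ";
var_dump($gbIsValidUtf8);
echo "UTF-8 bytes valid as UTF-8: ";
var_dump($utf8IsValidUtf8);
echo "Strict detection result: ";
var_dump($guess);
echo "Lengths: $lenUtf16, $lenUtf8, $lenGb\n";
```

The exact name returned by the strict detection call can depend on the PHP version, so the script prints it instead of assuming it. The practical lesson is the same as in the notes above: when the encoding is known, name it explicitly and do not rely on detection.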

HTTP Input and Output Encoding

There are 3 approaches to managing HTTP input and output encodings:

1. Set the HTTP input encoding, the HTTP output encoding and the PHP script internal encoding to be exactly the same, like UTF-8 or GB2312. I strongly recommend this approach, since it avoids the need for conversion when receiving HTTP input and generating HTTP output.

2. Set the HTTP input encoding and the HTTP output encoding to be the same, and the PHP script internal encoding to be a different one. But do not let the PHP engine do automated conversion on HTTP input and output. Let the script manage the conversion explicitly.

3. Set the HTTP input encoding and the HTTP output encoding to be the same, and the PHP script internal encoding to be a different one. But let the PHP engine do automated conversion on HTTP input and output.
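
To make approach 2 more concrete, here is a minimal sketch in which the page is exchanged in GB2312 while the script works in UTF-8, converting explicitly at both boundaries. The names HTTP_ENCODING, SCRIPT_ENCODING, decode_input() and encode_output() are hypothetical and exist only for this illustration. The request value is simulated with the GB2312 bytes from the conversion test above, and the code assumes PHP 7 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical names for this sketch; they are not part of PHP or mbstring.
const HTTP_ENCODING   = 'GB2312'; // encoding used on the wire, in both directions
const SCRIPT_ENCODING = 'UTF-8';  // encoding used inside the script

// Make mb_* functions default to the script encoding.
mb_internal_encoding(SCRIPT_ENCODING);

// Convert a raw request value from the HTTP encoding to the script encoding.
function decode_input(string $raw): string
{
    return mb_convert_encoding($raw, SCRIPT_ENCODING, HTTP_ENCODING);
}

// Convert script text back to the HTTP encoding just before sending it.
function encode_output(string $text): string
{
    return mb_convert_encoding($text, HTTP_ENCODING, SCRIPT_ENCODING);
}

// Under a Web server, announce the output encoding to the browser.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=' . HTTP_ENCODING);
}

// Simulated request value: the GB2312 bytes from the conversion test.
// In a real page this would come from a request variable instead.
$raw   = "\xc4\xe3\xba\xc3\x21";
$text  = decode_input($raw);     // now UTF-8 inside the script
$count = mb_strlen($text);       // counted in the internal encoding
$reply = encode_output($text);   // back to GB2312 for the response

echo "Characters received: $count\n";
echo "Round trip preserved the bytes: ";
var_dump($reply === $raw);
```

If both constants are set to the same encoding, the two helper functions have nothing to convert, which is the situation approach 1 aims for. Approach 3 would hand the same two conversions over to the PHP engine and drop the helper calls from the script.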

(Continued on next part...)

Dr. Herong Yang, updated in 2006