Managing Non-ASCII Character Strings
(Continued from previous part...)
If you run it directly, you will get:
Current settings:
internal_encoding = (UTF-8)
http_input = ()
http_output = (pass)
func_overload = (pass)
Encoding detection:
1. ASCII for (\x48656c6c6f21)
2. ASCII for (\x00480065006c006c006f0021)
3. UTF-8 for (\xc2a1486f6c6121)
4. UTF-8 for (\xe4bda0e5a5bd21)
5. UTF-8 for (\xc4e3bac3a3a1)
String length:
1. 6 for (\x48656c6c6f21)
2. 6 for (\x00480065006c006c006f0021)
3. 6 for (\xc2a1486f6c6121)
4. 3 for (\xe4bda0e5a5bd21)
5. 3 for (\xc4e3bac3a3a1)
String conversion - ASCII <--> UTF-16:
String in ASCII = (\x48656c6c6f21)
Converted to UTF-16 = (\x00480065006c006c006f0021)
Converted to ASCII = (\x48656c6c6f21)
String conversion - UTF-8 <--> UTF-16:
String in UTF-8 = (\xc2a1486f6c6121)
Converted to UTF-16 = (\x00a10048006f006c00610021)
Converted to UTF-8 = (\xc2a1486f6c6121)
String conversion - UTF-8 <--> GB2312:
String in UTF-8 = (\xe4bda0e5a5bd21)
Converted to GB2312 = (\xc4e3bac321)
Converted to UTF-8 = (\xe4bda0e5a5bd21)
String conversion - GB2312 <--> UTF-16:
String in GB2312 = (\xc4e3bac3a3a1)
Converted to UTF-16 = (\x4f60597dff01)
Converted to GB2312 = (\xc4e3bac3a3a1)
Some interesting notes about this test:
- I did set "mbstring.http_input = pass", but mb_get_info() reported an empty value for http_input. I don't know why.
- Encoding detection #2 did not recognize the string "\x00480065006c006c006f0021" as UTF-16 encoding. It reported ASCII instead.
- Encoding detection #5 did not recognize the string "\xc4e3bac3a3a1" as GB2312. It reported UTF-8 instead.
- When "mbstring" was told the correct encoding name, mb_strlen() returned the correct character count for every string.
- Encoding conversion worked nicely too. I am actually surprised to see the UTF-8 and GB2312 conversion
working correctly.
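The notes above can be illustrated with a minimal sketch. It reuses two byte strings from the test output, passes the encoding name to mb_strlen() explicitly, and tries mb_detect_encoding() with an explicit candidate list and strict mode. The candidate list and the variable names are assumptions made for this illustration; they are not taken from the original test script, and detection results can differ between PHP versions.

```php
<?php
// Byte strings copied from the test output above.
$utf16  = "\x00\x48\x00\x65\x00\x6c\x00\x6c\x00\x6f\x00\x21"; // "Hello!" in UTF-16
$gb2312 = "\xc4\xe3\xba\xc3\xa3\xa1";                         // test string #5 in GB2312

// strlen() counts bytes; mb_strlen() counts characters
// once it is told which encoding the bytes are in.
$utf16Bytes  = strlen($utf16);
$utf16Chars  = mb_strlen($utf16, "UTF-16");
$gb2312Bytes = strlen($gb2312);
$gb2312Chars = mb_strlen($gb2312, "GB2312");

echo "UTF-16: $utf16Bytes bytes, $utf16Chars characters\n";
echo "GB2312: $gb2312Bytes bytes, $gb2312Chars characters\n";

// Assumption: a hand-picked candidate list plus strict mode, so that
// candidates for which the bytes are invalid get rejected.
$detected = mb_detect_encoding($gb2312, ["ASCII", "UTF-8", "GB2312"], true);
echo "Detected: ", var_export($detected, true), "\n";
```

The character counts should match the "String length" section of the test output (6 and 3). Note that mbstring may report GB2312 under its own name for that encoding rather than the alias passed in.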
HTTP Input and Output Encoding
There are 3 approaches to managing HTTP input and output encodings:
1. Set the HTTP input encoding, the HTTP output encoding and the PHP script internal encoding to be exactly the same, like
UTF-8 or GB2312. I strongly recommend this approach, since it avoids the need for conversion when receiving
HTTP input and generating HTTP output.
2. Set the HTTP input encoding and the HTTP output encoding to be the same, and the PHP script internal encoding to be a different one.
But do not let the PHP engine do automatic conversion on HTTP input and output. Let the script manage the conversion explicitly.
3. Set the HTTP input encoding and the HTTP output encoding to be the same, and the PHP script internal encoding to be a different one.
But let the PHP engine do automatic conversion on HTTP input and output.
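Approach 2 can be sketched as follows. This is only an illustration under stated assumptions: the HTTP side is taken to be GB2312 and the internal encoding UTF-8, and the helper names to_internal() and to_http() are hypothetical, not part of mbstring. The only mbstring calls used are mb_internal_encoding() and mb_convert_encoding().

```php
<?php
// Assumed encodings for this sketch.
const HTTP_ENCODING     = "GB2312"; // encoding used on the wire
const INTERNAL_ENCODING = "UTF-8";  // encoding used inside the script

// Hypothetical helper: convert received HTTP input to the internal encoding.
function to_internal(string $input): string {
    return mb_convert_encoding($input, INTERNAL_ENCODING, HTTP_ENCODING);
}

// Hypothetical helper: convert internal text back before sending it out.
function to_http(string $text): string {
    return mb_convert_encoding($text, HTTP_ENCODING, INTERNAL_ENCODING);
}

mb_internal_encoding(INTERNAL_ENCODING);

// Stand-in for a received form value: the GB2312 bytes from the test output.
$received = "\xc4\xe3\xba\xc3\x21";

$text  = to_internal($received); // work with UTF-8 inside the script
$reply = to_http($text);         // convert explicitly on the way out

echo bin2hex($text), "\n";  // e4bda0e5a5bd21, as in the test output above
echo bin2hex($reply), "\n"; // c4e3bac321
```

In a real script, to_internal() would be applied to each value read from $_GET or $_POST, and to_http() to the generated page before it is sent. With approach 1 neither helper is needed, which is why it is the simplest of the three.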
(Continued on next part...)