Managing Non-ASCII Character Strings
This chapter explains:
- "mbstring" Extension
- "mbstring" Functions
- "mbstring" Basic Tests
- HTTP Input and Output Encoding
"mbstring" Extension
PHP offers an extension called "mbstring" to help you manage non-ASCII strings. Here are the main features
of "mbstring":
- Provides multibyte specific string functions that properly detect the beginning or ending
of a multibyte character. For example, mb_strlen() and mb_split().
- Handles character encoding conversion between supported encoding pairs.
- Offers automatic encoding conversion for HTTP input and output.
- Supports a 'function overloading' feature, which enables you to add multibyte awareness
to regular string functions. For example, you can overload substr() with mb_substr(),
so that calling substr() is really calling mb_substr(). Note that this feature was
deprecated in PHP 7.2 and removed in PHP 8.0.
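To illustrate the first feature above, here is a minimal sketch comparing the byte-oriented strlen() and substr() with their multibyte counterparts. It assumes the "mbstring" extension is loaded and PHP 7 or later (for the "\u{...}" escape); the sample string is only an illustration.

```php
<?php
// "Zürich" built with a Unicode escape, so the result does not depend on
// how this source file is saved. In UTF-8, "ü" takes 2 bytes.
$text = "Z\u{FC}rich";

$byteCount = strlen($text);              // counts bytes
$charCount = mb_strlen($text, 'UTF-8');  // counts characters

// substr() cuts at byte offsets and can split "ü" in half;
// mb_substr() cuts at character offsets.
$byteCut = substr($text, 0, 2);
$charCut = mb_substr($text, 0, 2, 'UTF-8');

echo "$byteCount bytes, $charCount characters\n";
echo bin2hex($byteCut), " vs ", bin2hex($charCut), "\n";
```

The hex dump shows that the substr() result ends in the middle of the two-byte sequence for "ü", while the mb_substr() result contains the complete character.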
The "mbstring" extension needs to be installed and configured by updating the php.ini file.
To get started with "mbstring" features, I modified my php.ini as:
extension=php_mbstring.dll
...
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
mbstring.detect_order = auto
Note that:
- "mbstring.language = Neutral" selects the language-neutral setting, which uses UTF-8 as its default encoding.
- "mbstring.http_input = pass" means no decoding when receiving HTTP input.
- "mbstring.http_output = pass" means no encoding when generating HTTP output.
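After editing php.ini, it can help to confirm which settings are actually in effect. The following is a minimal sketch, assuming it is run on the same PHP installation whose php.ini was modified; the values in the comments are examples only and will depend on your configuration and PHP version.

```php
<?php
// Confirm that the extension is available before calling any mb_* function.
if (!extension_loaded('mbstring')) {
    exit("mbstring is not loaded; check the extension line in php.ini\n");
}

// Called without arguments, these functions return the current values.
$language = mb_language();           // for example "neutral"
$internal = mb_internal_encoding();  // for example "UTF-8"
$output   = mb_http_output();        // for example "pass" or "UTF-8"

echo "language: $language\n";
echo "internal encoding: $internal\n";
echo "HTTP output encoding: $output\n";
```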
"mbstring" Functions
Let's first look at some of the basic functions offered by "mbstring":
mb_get_info ( ) - Returns the current settings of the "mbstring" extension.
mb_internal_encoding ( [string encoding] ) - Sets the current internal encoding to the specified
encoding name. If no encoding is specified, it returns the current internal encoding.
int mb_strlen ( string str [, string encoding] ) - Returns the number of characters in the
specified string based on the specified encoding name. If no encoding is specified,
it uses the current internal encoding.
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
- Converts the specified string to a new encoding from an old encoding and returns the
converted string. If no old encoding is specified, it uses the current internal encoding.
mb_detect_encoding ( string str ) - Detects and returns the encoding name of the specified string.
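As a small illustration of mb_convert_encoding() and mb_detect_encoding(), the sketch below converts a UTF-8 string to ISO-8859-1 and back, then asks mbstring to detect the encoding of the original. The sample string and the choice of ISO-8859-1 are assumptions made for the example. The second and third arguments passed to mb_detect_encoding() (a candidate list and a strict flag) are optional parameters not shown in the short signature above.

```php
<?php
// "café" with "é" written as explicit UTF-8 bytes, so the example does not
// depend on how this source file is saved.
$utf8 = "caf\xC3\xA9";

// Convert UTF-8 to ISO-8859-1, where "é" is the single byte 0xE9.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n";

// Convert back to UTF-8; the round trip restores the original bytes.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($roundTrip), "\n";

// Detect the encoding, in strict mode, from a short candidate list.
$detected = mb_detect_encoding($utf8, ['ASCII', 'UTF-8'], true);
echo $detected, "\n";
```

Passing the source encoding explicitly avoids relying on the current internal encoding, which may differ between installations.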
mb_http_input ( string type ) - Detects and returns the encoding name of the specified HTTP input
type: "G" for GET, "P" for POST, "C" for COOKIE.
mb_http_output ( [string encoding] ) - Sets or returns the current HTTP output encoding.
mb_output_handler ( string contents, int status ) - Callback function for the output buffer. When used as
ob_start("mb_output_handler"), all strings going to the output buffer will be converted from the internal
encoding to the HTTP output encoding.
mb_parse_str ( string encoded_string [, array &result] ) - Parses URL encoded strings, like the query
string received with the GET method. Resulting variables will be stored in the specified array. If no array
is specified, resulting variables will be promoted into global variables. Note that calling it without
the array was deprecated in PHP 7.2 and is no longer allowed as of PHP 8.0.
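Here is a minimal sketch of mb_parse_str() with a result array. The query string is a hypothetical example of what a GET request might carry, and the internal encoding is set to UTF-8 explicitly so the sketch does not depend on php.ini.

```php
<?php
mb_internal_encoding('UTF-8');

// A hypothetical URL encoded query string; %C3%A9 is "é" and %C3%BC is "ü"
// in UTF-8.
$query = 'name=Jos%C3%A9&city=Z%C3%BCrich';

// Collect the parsed variables in an array instead of global variables.
$result = [];
mb_parse_str($query, $result);

print_r($result);
```

Storing the variables in an array keeps them out of the global scope, which avoids accidentally overwriting existing variables with user input.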
(Continued on next part...)