PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Managing Non ASCII Character Strings

This chapter explains:

  • "mbstring" Extension
  • "mbstring" Functions
  • "mbstring" Basic Tests
  • HTTP Input and Output Encoding

"mbstring" Extension

PHP offers a nice extension called "mbstring" to help you manage non ASCII strings. Here are the features of "mbstring":

  • Provides multibyte specific string functions that properly detect the beginning or ending of a multibyte character. For example, mb_strlen() and mb_split().
  • Converts strings between any pair of supported character encodings.
  • Offers automatic encoding conversion for HTTP input and output.
  • Supports a 'function overloading' feature, which adds multibyte awareness to regular string functions. For example, you can overload substr() with mb_substr(), so that calling substr() really calls mb_substr(). Note that this feature was deprecated in PHP 7.2 and removed in PHP 8.0.
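
To see why multibyte awareness matters, here is a minimal sketch comparing the byte-oriented strlen() and substr() with their "mbstring" counterparts. The sample string and variable names are only illustrative, and the string is assumed to be UTF-8 encoded, with "é" written as the two bytes 0xC3 0xA9:

```php
<?php
// "café" in UTF-8: the "é" takes two bytes (0xC3 0xA9)
$str = "caf\xC3\xA9";

$bytes = strlen($str);               // counts bytes: 5
$chars = mb_strlen($str, "UTF-8");   // counts characters: 4

// substr() cuts in the middle of "é" and leaves a broken byte sequence
$broken = bin2hex(substr($str, 0, 4));             // "636166c3"
// mb_substr() respects character boundaries
$whole = bin2hex(mb_substr($str, 0, 4, "UTF-8"));  // "636166c3a9"

print("$bytes bytes, $chars characters\n");
print("substr:    $broken\n");
print("mb_substr: $whole\n");
```

The hex dumps are used only so that the broken byte sequence is visible regardless of the terminal encoding.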

The "mbstring" extension needs to be installed and configured by updating the php.ini file. To get started with "mbstring" features, I modified my php.ini as follows:

extension=php_mbstring.dll
...
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
mbstring.detect_order = auto

Note that:

  • "mbstring.language = Neutral" does not select a specific language; in effect, it sets the default encoding to UTF-8.
  • "mbstring.http_input = pass" means no decoding when receiving HTTP input.
  • "mbstring.http_output = pass" means no encoding when generating HTTP output.
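
After restarting PHP, one way to confirm that the settings took effect is to read them back at run time. This is only a sketch: it relies on the standard extension_loaded() and ini_get() functions, and the list of setting names simply mirrors the php.ini lines above:

```php
<?php
// Check whether the extension was loaded at all
$loaded = extension_loaded("mbstring");
print("mbstring loaded: " . ($loaded ? "yes" : "no") . "\n");

// Setting names copied from the php.ini fragment
$names = array(
    "mbstring.language",
    "mbstring.http_input",
    "mbstring.http_output",
    "mbstring.encoding_translation",
    "mbstring.detect_order",
);
foreach ($names as $name) {
    // ini_get() returns false if the setting is unknown
    $value = ini_get($name);
    print($name . " = " . var_export($value, true) . "\n");
}
```

The values printed depend on your PHP version and configuration, so no expected output is shown here.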

"mbstring" Functions

Let's first look at some of the basic functions offered by "mbstring":

mb_get_info ( ) - Returns the current settings of "mbstring" extension.
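
As a quick illustration, the sketch below prints a few entries from the array returned by mb_get_info(). The three keys picked here are an assumption about which entries are most useful; the full array contains more, and its exact contents vary between PHP versions:

```php
<?php
// Called without arguments, mb_get_info() returns an array of settings
$info = mb_get_info();

// Print a few selected entries, skipping any that are missing
foreach (array("internal_encoding", "http_output", "language") as $key) {
    if (array_key_exists($key, $info)) {
        print($key . " = " . var_export($info[$key], true) . "\n");
    }
}
```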

mb_internal_encoding ( [string encoding] ) - Sets the current internal encoding with the specified encoding name. If no encoding is specified, it returns the current internal encoding.
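
A minimal sketch of both uses, setting and reading the internal encoding (variable names are illustrative only):

```php
<?php
// With an argument: sets the internal encoding, returns true on success
$ok = mb_internal_encoding("UTF-8");

// Without an argument: returns the current internal encoding name
$current = mb_internal_encoding();

print("Set succeeded: " . var_export($ok, true) . "\n");
print("Internal encoding: $current\n");
```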

int mb_strlen ( string str [, string encoding] ) - Returns the number of characters in the specified string based on the specified encoding name. If no encoding is specified, it uses the current internal encoding.
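
The encoding argument decides how the bytes are grouped into characters. The sketch below, using an illustrative two-byte string, counts the same bytes once as UTF-8 and once as ISO-8859-1:

```php
<?php
// Two bytes: one character ("é") in UTF-8, two characters in ISO-8859-1
$data = "\xC3\xA9";

$asUtf8 = mb_strlen($data, "UTF-8");         // 1
$asLatin1 = mb_strlen($data, "ISO-8859-1");  // 2

// With no encoding argument, the internal encoding is used
mb_internal_encoding("UTF-8");
$asDefault = mb_strlen($data);               // 1

print("UTF-8: $asUtf8, ISO-8859-1: $asLatin1, default: $asDefault\n");
```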

string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] ) - Converts the specified string to a new encoding from an old encoding and returns the converted string. If no old encoding is specified, it uses the current internal encoding.
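
For example, the following sketch converts a UTF-8 string to ISO-8859-1 and back again. The sample string is illustrative, and the hex dump is there only to make the byte-level change visible:

```php
<?php
// "café" in UTF-8 (5 bytes)
$utf8 = "caf\xC3\xA9";

// Convert to ISO-8859-1, where "é" is the single byte 0xE9
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

// Convert back to UTF-8
$back = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

print("UTF-8 bytes:      " . bin2hex($utf8) . "\n");    // 636166c3a9
print("ISO-8859-1 bytes: " . bin2hex($latin1) . "\n");  // 636166e9
print("Round trip ok:    " . var_export($back === $utf8, true) . "\n");
```

Passing the source encoding explicitly, as done here, avoids depending on whatever the current internal encoding happens to be.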

mb_detect_encoding ( string str ) - Detects and returns the encoding name of the specified string.
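
Detection is a guess based on the bytes in the string, so it is usually more reliable when the candidates are restricted. The sketch below passes an explicit candidate list and a strict flag, two optional arguments that the short signature above does not show; the candidate list itself is just an assumption for this example:

```php
<?php
// "café" encoded as UTF-8
$utf8 = "caf\xC3\xA9";

// Only consider these encodings, in this order
$candidates = array("ASCII", "UTF-8");

// Strict mode rejects encodings in which the bytes are not valid
$detected = mb_detect_encoding($utf8, $candidates, true);

print("Detected: " . var_export($detected, true) . "\n");
```

Since the bytes 0xC3 0xA9 are not valid ASCII, only UTF-8 remains as a match here. With strings that are valid in several candidate encodings, the result can differ between PHP versions.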

mb_http_input ( string type ) - Detects and returns the encoding name of the specified HTTP input type: "G" for GET, "P" for POST, "C" for COOKIE.

mb_http_output ( [string encoding] ) - Sets or returns the current HTTP output encoding.
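
The sketch below reads and sets the HTTP output encoding, then queries the GET input encoding. It is meant as an illustration only: when run from the command line there is no HTTP input, so mb_http_input() will typically return false:

```php
<?php
// Read the current HTTP output encoding
print("Before: " . mb_http_output() . "\n");

// Set a new HTTP output encoding, then read it back
$ok = mb_http_output("UTF-8");
$current = mb_http_output();
print("After:  " . $current . "\n");

// Query the encoding detected for GET input, if any was processed
$get = mb_http_input("G");
print("GET input encoding: " . var_export($get, true) . "\n");
```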

mb_output_handler ( string contents, int status ) - Callback function for the output buffer. When used as ob_start("mb_output_handler"), all strings going to the output buffer will be converted from the internal encoding to the HTTP output encoding.
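
A minimal sketch of this usage is shown below. The choice of ISO-8859-1 as the output encoding is an assumption made for the example. Whether the conversion is actually applied can depend on the response content type and on how the script is run, so treat this as an outline rather than a guaranteed result:

```php
<?php
// Strings in this script are UTF-8
mb_internal_encoding("UTF-8");

// Ask for output to be delivered as ISO-8859-1 (example choice)
mb_http_output("ISO-8859-1");

// Route all buffered output through the mbstring output handler
$started = ob_start("mb_output_handler");

print("caf\xC3\xA9\n");

// Flush the buffer, which passes its contents to the handler
ob_end_flush();
```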

mb_parse_str ( string encoded_string [, array &result] ) - Parses URL encoded strings, like the query string received with GET method. Resulting variables will be stored in the specified array. If no array is specified, resulting variables will be promoted into global variables.
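
The sketch below parses an illustrative query string into an array. Passing the result array is the safer form, since it avoids creating global variables, and PHP 8.0 and later require it:

```php
<?php
mb_internal_encoding("UTF-8");

// URL encoded query string; %C3%A9 is "é" in UTF-8
$query = "name=caf%C3%A9&lang=fr";

// Parsed variables are stored in $result
$result = array();
$ok = mb_parse_str($query, $result);

print("Parsed ok: " . var_export($ok, true) . "\n");
foreach ($result as $key => $value) {
    // Show the raw bytes so the decoded value is unambiguous
    print($key . " = " . bin2hex($value) . "\n");
}
```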

(Continued on next part...)

Dr. Herong Yang, updated in 2006