PHP Tutorials - Herong's Tutorial Examples - v5.17, by Herong Yang
Basic Rules of Using Non-ASCII Characters in HTML Documents
This section describes basic rules on how non-ASCII character strings should be managed at each step, so that localized text strings can be used in PHP script string literals and displayed correctly in the browser window.
As you can see from the previous chapters, when PHP scripts are involved in a Web-based application, they always run behind a Web server. PHP scripts are expected to generate HTML documents and pass them back to the Web server. There are four main ways non-ASCII characters can get into the HTML document through PHP scripts: a) Enter them as string literals; b) Receive them from HTTP requests; c) Retrieve them from files; d) Retrieve them from a database.
In this chapter, we will concentrate on how to include non-ASCII characters in PHP scripts as string literals. Here are the steps involved in this scenario:
A1. Key Sequences from keyboard
       |
       |- Text editor
       v
A2. PHP File
       |
       |- PHP CGI engine
       v
A3. HTML Document
Based on my experience, here are some basic rules related to those steps:
1. You must decide on the character encoding scheme to be used in your PHP script file. For most languages, you have two options: a) use an encoding scheme specific to that language; b) use a Unicode encoding scheme. For example, you can use either GB2312 (a simplified Chinese encoding scheme) or UTF-8 (a Unicode encoding scheme) for Chinese characters. My suggestion used to be "a". But today, I am suggesting "b", because a Unicode encoding scheme can represent the characters of all languages.
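To make the difference between options "a" and "b" concrete, here is a minimal sketch that encodes the same Chinese character both ways and prints the resulting bytes. It assumes the mbstring extension is loaded, and it uses EUC-CN, the mbstring name for the byte encoding normally used for GB2312 text. The helper name hex_bytes_in() is made up for this illustration.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
// hex_bytes_in() is a hypothetical helper, not part of PHP.

// Re-encode a UTF-8 string in $encoding and return its bytes as hex digits.
function hex_bytes_in($utf8Text, $encoding)
{
    $bytes = mb_convert_encoding($utf8Text, $encoding, 'UTF-8');
    return bin2hex($bytes);
}

// U+4E2D, written with byte escapes so this script itself stays pure ASCII.
$zhong = "\xE4\xB8\xAD";

print(hex_bytes_in($zhong, 'UTF-8') . "\n");   // e4b8ad (3 bytes)
print(hex_bytes_in($zhong, 'EUC-CN') . "\n");  // d6d0   (2 bytes)
```

The same character is stored as different byte sequences, which is why the decision has to be made once and then respected at every later step.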
2. From step "A1" to "A2", you need to select a good text editor that supports the encoding scheme you have decided on. The end goal of this step is simple: characters in string literals must be stored in the PHP file using the decided encoding scheme. Don't underestimate the difficulty of this step. It can be very frustrating, because most computer keyboards support alphabetic letters only. You may have to use language-specific input software to translate alphabetic letters into language-specific characters. The editor may also store characters in memory in one encoding scheme and offer you a different encoding scheme when saving files to the hard disk.
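One way to confirm what the editor really wrote to disk is to inspect the saved bytes. The sketch below assumes the decided encoding is UTF-8 and that the mbstring extension is loaded. The helper name describe_file_bytes() is made up for this illustration, and the two sample strings stand in for file contents.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
// describe_file_bytes() is a hypothetical helper, not part of PHP.

// Report whether raw file contents look like what a UTF-8 editor would save.
function describe_file_bytes($bytes)
{
    // Some editors prepend a UTF-8 byte order mark (EF BB BF).
    $hasBom = (substr($bytes, 0, 3) === "\xEF\xBB\xBF");
    $validUtf8 = mb_check_encoding($bytes, 'UTF-8');
    return array('valid_utf8' => $validUtf8, 'has_bom' => $hasBom);
}

$utf8Sample   = "caf\xC3\xA9";  // "cafe" with e-acute, as saved in UTF-8
$latin1Sample = "caf\xE9";      // the same word, as saved in ISO-8859-1

var_dump(describe_file_bytes($utf8Sample)['valid_utf8']);    // bool(true)
var_dump(describe_file_bytes($latin1Sample)['valid_utf8']);  // bool(false)
```

In practice the argument could come from file_get_contents() on the script file that was just saved. A byte order mark is worth checking for, because any bytes in front of the opening PHP tag are sent to the HTML document as output.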
3. In PHP, the string data type is defined as a sequence of bytes, as in the C language. This is different from the Java language, where the string data type is defined as a sequence of Unicode characters (stored as UTF-16 code units). String literals in PHP are also taken as sequences of bytes. This is a nice feature: it allows us to enter non-ASCII characters in almost any encoding scheme.
4. PHP's core built-in string functions assume that strings are sequences of bytes. For example, strlen() returns the number of bytes in the given string, not the number of characters of a specific language. To manage strings as sequences of characters, we need to use the Multibyte String functions, mb_*().
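Rules 3 and 4 can be seen side by side in a small sketch. It assumes the mbstring extension is loaded and uses a French word in UTF-8, written with byte escapes so that the result does not depend on how this example itself is saved.

```php
<?php
// Sketch only: assumes the mbstring extension is available.

// The French word for "summer" in UTF-8: e-acute, t, e-acute.
// Each e-acute takes 2 bytes (C3 A9).
$word = "\xC3\xA9t\xC3\xA9";

print(strlen($word) . "\n");              // 5 (bytes)
print(mb_strlen($word, 'UTF-8') . "\n");  // 3 (characters)

// substr() cuts by bytes and can split a character in half;
// mb_substr() cuts by characters.
print(bin2hex(substr($word, 0, 1)) . "\n");              // c3   (half a character)
print(bin2hex(mb_substr($word, 0, 1, 'UTF-8')) . "\n");  // c3a9 (one whole character)
```

Passing the encoding name explicitly to the mb_*() functions avoids relying on whatever internal encoding happens to be configured.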
5. From step "A2" to "A3", HTML documents are generated from PHP scripts mainly through the print() function. The print() function copies every byte of the specified string to the HTML document unchanged. This guarantees that any non-ASCII characters encoded in any encoding scheme will be copied correctly to the HTML document. Again, this is different from JSP pages, where strings are converted into a byte stream based on a specified encoding scheme if you are using character-based output stream functions.
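The byte-for-byte behavior of print() can be checked by capturing its output in a plain output buffer and comparing it with the original string. This is only a sketch: the helper name render_bytes() is made up for this illustration, and the buffer is started without any conversion handler.

```php
<?php
// Sketch only: render_bytes() is a hypothetical helper, not part of PHP.

// Capture what print() emits for $text, using a plain output buffer
// (no conversion handler installed on it).
function render_bytes($text)
{
    ob_start();
    print($text);
    return ob_get_clean();
}

$gb   = "\xD6\xD0";      // U+4E2D encoded in GB2312 (EUC-CN)
$utf8 = "\xE4\xB8\xAD";  // the same character encoded in UTF-8

var_dump(render_bytes($gb) === $gb);      // bool(true)
var_dump(render_bytes($utf8) === $utf8);  // bool(true)
```

Both byte sequences pass through untouched, whichever encoding scheme they belong to.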
6. If you do want to convert from one encoding scheme to another during the print() function call, you can use mb_output_handler as the callback function on the output buffer: ob_start("mb_output_handler").
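Here is a minimal sketch of rule 6, assuming the mbstring extension is loaded and the output is treated as a text document such as text/html. It sets the internal encoding to UTF-8 and the HTTP output encoding to ISO-8859-1, then captures what mb_output_handler produces. The helper name convert_on_output() and the outer capturing buffer are made up for this illustration. A real page would normally just call ob_start("mb_output_handler") once near the top of the script.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
// convert_on_output() is a hypothetical helper, not part of PHP.
// Note: it changes the script-wide mbstring settings as a side effect.

// Print $text through mb_output_handler and return the bytes it produced.
function convert_on_output($text, $internal, $httpOutput)
{
    mb_internal_encoding($internal);  // encoding of the strings in the script
    mb_http_output($httpOutput);      // encoding wanted in the HTML document

    ob_start();                       // outer buffer, only to capture the result
    ob_start('mb_output_handler');    // inner buffer that converts when flushed
    print($text);
    ob_end_flush();                   // run the handler, pass the result outward
    return ob_get_clean();
}

$utf8 = "caf\xC3\xA9";  // "cafe" with e-acute, encoded in UTF-8
print(bin2hex(convert_on_output($utf8, 'UTF-8', 'ISO-8859-1')) . "\n");
```

With these settings, the two-byte UTF-8 sequence C3 A9 is expected to leave the script as the single ISO-8859-1 byte E9, while the ASCII letters stay as they are.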