PHP Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 2.21

Handling Non ASCII Characters - Overview

This chapter explains:

  • What Is Localization / Internationalization?
  • Managing Characters in Web Based Applications
  • Character Traveling Paths
  • ASCII Characters in PHP Pages

What Is Localization / Internationalization?

Localization (software localization), sometimes shortened to "l10n", is the process of tailoring a software user interface to a user's specific geographical and cultural preferences.

Internationalization (software internationalization), sometimes shortened to "i18n", is the process of enabling software for localization.

A user's specific geographical and cultural preferences can be abstracted into the concept of a locale. A locale usually determines the following aspects of the user interface:

  • The language used for system output and user input.
  • The time zone and date/time format.
  • The currency and numbering system.
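
To make the idea of a locale more concrete, here is a minimal sketch that models a locale as a plain PHP array and uses it to format a time and an amount. The locale IDs, the settings in the table, and the function name format_for_locale() are all illustrative assumptions of mine, not a standard table or API. The sketch also assumes PHP 7 or later.

```php
<?php
// Sketch: a locale modeled as a plain array of preferences.
// The locale IDs and their settings are illustrative assumptions.
$locales = [
    'en_US' => [
        'timezone'      => 'America/New_York',
        'date_format'   => 'm/d/Y H:i',
        'decimal_point' => '.',
        'thousands_sep' => ',',
        'currency'      => 'USD',
    ],
    'de_DE' => [
        'timezone'      => 'Europe/Berlin',
        'date_format'   => 'd.m.Y H:i',
        'decimal_point' => ',',
        'thousands_sep' => '.',
        'currency'      => 'EUR',
    ],
];

// Hypothetical helper: formats a Unix timestamp and an amount
// according to the preferences stored in one locale entry.
function format_for_locale(array $locale, int $timestamp, float $amount): array {
    // The '@' prefix tells DateTime that the value is a Unix timestamp (UTC).
    $when = new DateTime('@' . $timestamp);
    $when->setTimezone(new DateTimeZone($locale['timezone']));

    return [
        'time'   => $when->format($locale['date_format']),
        'amount' => number_format(
            $amount,
            2,
            $locale['decimal_point'],
            $locale['thousands_sep']
        ) . ' ' . $locale['currency'],
    ];
}

foreach ($locales as $id => $locale) {
    $out = format_for_locale($locale, 0, 1234567.891);
    echo $id, ': ', $out['time'], ' | ', $out['amount'], "\n";
}
```

The same timestamp and the same amount come out differently for each locale, which is the point: the data stays the same, and only its presentation follows the user's preferences.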

The main task of localization is to communicate with the user in the local language. This means that software must both present information and accept input as characters of the local language.

In this part of the book, we will concentrate on presenting and taking information as characters of the local language.

Managing Characters in Web Based Applications

Managing characters of a specific language in a Web-based application is not a simple task, because a typical Web-based application involves a number of software components. We have to understand how each of them manages characters and how characters are transferred from one component to another.

First, let me try to abstract the involved software programs into the following list:

  • Web browser: End user interface program responsible for presenting and taking characters to and from users.
  • Web server: Communication program responsible for forwarding characters between Web browser and application program.
  • Application program: Program developed by you for processing characters received from the Web server and/or retrieved from the database server.
  • Database server: Storage program responsible for storing characters.
  • Development environment: Software program responsible for embedding characters into the source code of the application program.
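
As a rough illustration of where the application program sits in this list, here is a minimal sketch of a PHP page that takes characters from the browser (through the Web server), processes them, and sends them back. It is only a sketch under my own assumptions: PHP 7 or later, UTF-8 as the encoding agreed on by all components, a hypothetical request parameter called "name", and a hypothetical function build_greeting_page().

```php
<?php
// Sketch of the "application program" step: characters arrive from the
// browser via the Web server, are processed, and go back in the response.
// Assumptions: UTF-8 everywhere, and a hypothetical parameter "name".

function build_greeting_page(string $name): string {
    // Escape the received characters, telling PHP which encoding they use.
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

    return "<html><head><meta charset=\"UTF-8\"></head>"
         . "<body><p>Hello, {$safe}!</p></body></html>";
}

if (PHP_SAPI !== 'cli') {
    // Tell the browser how to decode the bytes that follow.
    header('Content-Type: text/html; charset=UTF-8');

    // Ignore non-string input such as name[]=... in the query string.
    $name = $_GET['name'] ?? '';
    echo build_greeting_page(is_string($name) ? $name : '');
}
```

Notice that the encoding is named twice: once when the received characters are processed, and once when the response is labeled for the browser. If any two components disagree on the encoding, characters are likely to be damaged on the way.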

Before we get into details of how characters are handled in those programs, we need to understand how characters of a specific language are represented in computers. Here is my understanding:

  • Characters of the English language can be represented by the ASCII encoding scheme. ASCII represents each English character as one byte.
  • Characters of a non-English language can be represented either in Unicode or in a specific encoding scheme designed for that language. For example, characters of simplified Chinese can be represented either in Unicode or in GB2312. Characters of Japanese can be represented either in Unicode or in Shift-JIS.
  • Unicode has several encoding variations: Unicode internal code (the code point value itself), UTF-8, UTF-16, etc.
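
The points above can be checked with a small sketch that prints the bytes of the same characters under different encodings. It assumes PHP 7 or later (for the "\u{...}" escape, which writes the character into the string as UTF-8). The conversions to UTF-16 and GB2312 also assume that the optional mbstring extension is installed, where the GB2312-based encoding is named "EUC-CN". The function name hex_bytes() is my own.

```php
<?php
// Sketch: the same character occupies different bytes in different encodings.

// Hypothetical helper: shows a byte string as uppercase hex digits.
function hex_bytes(string $bytes): string {
    return strtoupper(bin2hex($bytes));
}

$english = "A";          // U+0041, an ASCII character
$chinese = "\u{4E2D}";   // U+4E2D, stored in the string as UTF-8 bytes

$report = [
    'A in ASCII/UTF-8' => hex_bytes($english),
    'U+4E2D in UTF-8'  => hex_bytes($chinese),
];

// mbstring is optional, so only convert when it is available.
if (function_exists('mb_convert_encoding')) {
    $report['U+4E2D in UTF-16BE'] =
        hex_bytes(mb_convert_encoding($chinese, 'UTF-16BE', 'UTF-8'));
    $report['U+4E2D in EUC-CN (GB2312)'] =
        hex_bytes(mb_convert_encoding($chinese, 'EUC-CN', 'UTF-8'));
}

foreach ($report as $label => $hex) {
    // Two hex digits make one byte.
    echo $label, ': ', $hex, ' (', strlen($hex) / 2, " byte(s))\n";
}
```

The English character takes one byte, while the Chinese character takes a different number of bytes, with different values, depending on the encoding. This is why every component that handles the bytes needs to know which encoding is in use.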

To read more about character encoding and Unicode, see my other book: "Herong's tutorial notes on Unicode".

(Continued on next part...)

Dr. Herong Yang, updated in 2006