Handling Non ASCII Characters - Overview
Part: 1 2 3
This chapter explains:
- What Is Localization / Internationalization?
- Managing Characters in Web Based Applications
- Character Traveling Paths
- ASCII Characters in PHP Pages
What Is Localization / Internationalization?
Localization (software localization), sometimes shortened to "l10n", is the process
of tailoring a software user interface to the user's specific geographical
and cultural preferences.
Internationalization (software internationalization), sometimes shortened to "i18n", is the process of
enabling software for localization.
A user's specific geographical and cultural preferences can be abstracted into the concept of a locale.
A locale usually determines the following aspects of the user interface:
- The language used for system output and user input.
- The time zone and the date/time format.
- The currency and the numbering system.
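To make the idea of a locale more concrete, here is a minimal sketch of how one amount and one date could be presented differently for two locales. The $locales table and the formatForLocale() function are hypothetical names invented for this illustration. The table holds only a few hand-picked settings and is not real system locale data.

```php
<?php
// Hypothetical locale table, for illustration only. A real application
// would normally take these settings from system or library locale data.
$locales = [
    'en_US' => ['decimal' => '.', 'thousands' => ',', 'date' => 'm/d/Y', 'currency' => 'USD'],
    'de_DE' => ['decimal' => ',', 'thousands' => '.', 'date' => 'd.m.Y', 'currency' => 'EUR'],
];

// Format an amount and a date according to one entry of the table.
function formatForLocale(array $locales, string $name, float $amount, int $timestamp): string
{
    $l = $locales[$name];
    $number = number_format($amount, 2, $l['decimal'], $l['thousands']);
    // gmdate() keeps the sketch independent of the server time zone.
    $date = gmdate($l['date'], $timestamp);
    return "$date  $number {$l['currency']}";
}

$when = gmmktime(0, 0, 0, 3, 31, 2024); // 31 March 2024, 00:00 UTC
foreach (array_keys($locales) as $name) {
    echo $name, ': ', formatForLocale($locales, $name, 1234567.891, $when), "\n";
}
```

The same amount and date come out with different separators and a different field order for each entry, which is the kind of difference a locale is meant to capture.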
We know that the main task of localization is to communicate with the user in the local language.
This means that software must present information as characters of the local language, and accept
input as characters of the local language.
In this part of the book, we will concentrate on presenting and accepting information as characters
of the local language.
Managing Characters in Web Based Applications
Managing characters of a specific language in a Web based application is not a simple task,
because there are a number of software components involved in a typical Web based application.
We have to understand how each of them manages characters and how we transfer characters
from one component to another component.
First, let me try to summarize the software programs involved in the following list:
- Web browser: End user interface program responsible for presenting characters to users and taking characters from them.
- Web server: Communication program responsible for forwarding characters between Web browser and
application program.
- Application program: Program developed by you for processing characters received from the Web server
and/or retrieved from the database server.
- Database server: Storage program responsible for storing characters.
- Development environment: Software program responsible for embedding characters into the source code of
the application program.
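To see why every component in this list has to agree on how characters are handled, here is a small sketch of what can happen when one of them guesses wrong. It assumes that PHP's iconv extension is available. The choice of ISO-8859-1 as the wrong guess is only my example, and the variable names are hypothetical.

```php
<?php
// UTF-8 bytes of the Chinese character U+4E2D, as one component might
// hand them to the next. Hex escapes are used so that the result does
// not depend on how this source file is saved.
$sent = "\xE4\xB8\xAD";

// A component that wrongly assumes ISO-8859-1 treats each byte as a
// separate character and re-encodes all three of them into UTF-8.
$misread = iconv('ISO-8859-1', 'UTF-8', $sent);

// A component that agrees on UTF-8 passes the bytes through unchanged.
$passed = iconv('UTF-8', 'UTF-8', $sent);

echo bin2hex($sent), "\n";     // e4b8ad
echo bin2hex($misread), "\n";  // c3a4c2b8c2ad
echo bin2hex($passed), "\n";   // e4b8ad
```

The misread copy has grown from three bytes to six and no longer represents the original character, even though no component reported an error.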
Before we get into details of how characters are handled in those programs, we need to understand how
characters of a specific language are represented in computers. Here is my understanding:
- Characters of the English language can be represented by the ASCII encoding scheme. ASCII defines
128 characters as 7-bit codes, so each English character is stored in one byte.
- Characters of a non-English language can be represented either in Unicode or in an
encoding scheme designed specifically for that language. For example, characters of simplified Chinese can be represented
either in Unicode or in GB2312. Characters of Japanese can be represented either in Unicode or in Shift-JIS.
- Unicode assigns each character a code point, which can be encoded in several ways: UTF-8, UTF-16, UTF-32, etc.
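The points above can be checked with a short script that prints the bytes of a few characters in different encodings. This is only a sketch. It assumes that PHP's iconv extension is available and that the underlying iconv library accepts the encoding names used here. The bytesIn() function is a hypothetical helper written for this illustration.

```php
<?php
// Each string is written with hex escapes, so the result does not depend
// on the encoding in which this source file itself is saved.
$ascii    = "A";             // U+0041, an ASCII character
$chinese  = "\xE4\xB8\xAD";  // U+4E2D encoded in UTF-8
$japanese = "\xE6\x97\xA5";  // U+65E5 encoded in UTF-8

// Convert a UTF-8 string to the given encoding and return its bytes as hex.
function bytesIn(string $utf8, string $encoding): string
{
    $converted = iconv('UTF-8', $encoding, $utf8);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert to $encoding");
    }
    return strtoupper(bin2hex($converted));
}

echo 'A in ASCII:          ', bytesIn($ascii, 'ASCII'), "\n";         // 41
echo 'U+4E2D in UTF-8:     ', bytesIn($chinese, 'UTF-8'), "\n";       // E4B8AD
echo 'U+4E2D in UTF-16BE:  ', bytesIn($chinese, 'UTF-16BE'), "\n";    // 4E2D
echo 'U+4E2D in GB2312:    ', bytesIn($chinese, 'GB2312'), "\n";      // D6D0
echo 'U+65E5 in Shift-JIS: ', bytesIn($japanese, 'SHIFT_JIS'), "\n";  // 93FA
```

The English letter takes one byte, while the same Chinese character takes three bytes in UTF-8 and two bytes in both UTF-16 and GB2312, with different byte values in each case.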
To read more about character encoding and Unicode, see my other book: "Herong's tutorial notes on Unicode".
(Continued on next part...)