Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

Specifying Character Set for Text Columns

This section describes how to specify character set for text columns in MySQL database.

By default, MySQL uses character set (encoding) "Latin1" for all text columns like CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. But you can change it to any of the following character sets:

mysql> show character set;
+----------+-----------------------------+
| Charset  | Description                 |
+----------+-----------------------------+
| big5     | Big5 Traditional Chinese    |
| dec8     | DEC West European           |
| cp850    | DOS West European           |
| hp8      | HP West European            |
| koi8r    | KOI8-R Relcom Russian       |
| latin1   | cp1252 West European        |
| latin2   | ISO 8859-2 Central European |
| swe7     | 7bit Swedish                |
| ascii    | US ASCII                    |
| ujis     | EUC-JP Japanese             |
| sjis     | Shift-JIS Japanese          |
| hebrew   | ISO 8859-8 Hebrew           |
| tis620   | TIS620 Thai                 |
| euckr    | EUC-KR Korean               |
| koi8u    | KOI8-U Ukrainian            |
| gb2312   | GB2312 Simplified Chinese   |
| greek    | ISO 8859-7 Greek            |
| cp1250   | Windows Central European    |
| gbk      | GBK Simplified Chinese      |
| latin5   | ISO 8859-9 Turkish          |
| armscii8 | ARMSCII-8 Armenian          |
| utf8     | UTF-8 Unicode               |
| ucs2     | UCS-2 Unicode               |
| cp866    | DOS Russian                 |
| keybcs2  | DOS Kamenicky Czech-Slovak  |
| macce    | Mac Central European        |
| macroman | Mac West European           |
| cp852    | DOS Central European        |
| latin7   | ISO 8859-13 Baltic          |
| cp1251   | Windows Cyrillic            |
| cp1256   | Windows Arabic              |
| cp1257   | Windows Baltic              |
| binary   | Binary pseudo charset       |
| geostd8  | GEOSTD8 Georgian            |
| cp932    | SJIS for Windows Japanese   |
| eucjpms  | UJIS for Windows Japanese   |
+----------+-----------------------------+
36 rows in set (0.05 sec)

MySQL allows you to set "character set" at 3 levels:

1. Database Level: The "character set" defined at the database level applies to all text columns of all tables in this database. Here are the sample SQL commands to set and view database level character set:

CREATE DATABASE <database_name> CHARACTER SET utf-8;
SHOW CREATE DATABASE <database_name>;

2. Table Level: The "character set" defined at the table level applies to all text columns in this table and overrides the database level setting. Here are the sample SQL commands to set and view table level character set:

CREATE TABLE <table_name> (...) CHARACTER SET gbk;
SHOW CREATE TABLE <table_name>;

3. Column Level: The "character set" defined at the column level applies to this column only and overrides the table level and database level setting. Here are the sample SQL commands to set and view column level character set:

CREATE TABLE <table_name> (<column_name> VARCHAR(80) CHARACTER SET gbk, ...);
SHOW CREATE TABLE <table_name>;

Sections in This Chapter

Specifying Character Set for Text Columns

Creating a Table with Multiple Character Sets

Checking Character Set Setting

Storing ASCII Characters in Non-ASCII Columns

Storing Non-ASCII Characters with Encoded Bytes

Transmitting Non-ASCII Characters between PHP and MySQL

Viewing Character Set Variables

Non-ASCII Test Result Analysis

Fetching Non-ASCII Text from MySQL

Dr. Herong Yang, updated in 2007
Specifying Character Set for Text Columns