Unicode Tutorials - Herong's Tutorial Examples

https://www.herongyang.com/Unicode

Copyright © 1995-2024 Herong Yang. All rights reserved.

This Unicode tutorial book is a collection of notes and sample code written by the author while he was learning Unicode. Topics include Character Sets and Encodings; GB2312/GB18030 Character Set and Encodings; JIS X0208 Character Set and Encodings; Unicode Character Set; Basic Multilingual Plane (BMP); Unicode Transformation Formats (UTF); Surrogates and Supplementary Characters; Unicode Character Blocks; Python Support of Unicode Characters; Java Character Set and Encoding; Java Encoding Maps, Counts and Conversion. Updated in 2024 (Version v5.32) with minor changes.

Table of Contents

About This Book

Character Sets and Encodings

What Is Character Set

Commonly Used Character Sets and Encodings

ASCII Character Set and Encoding

What Is ASCII

Listing of ASCII Characters and Encoded Bytes

GB2312 Character Set and Encoding

GB2312 Character Set for Chinese Characters

GB2312 Encoding for GB2312 Character Set

Relation of GB2312 and Unicode

GB18030 Character Set and Encoding

History of GB Character Sets

GB18030 Encoding for GB18030 Character Set

JIS X0208 Character Set and Encodings

JIS X0208 Character Set for Japanese Characters

JIS X0208 Character Code Values

EUC-JP Encoding

ISO-2022-JP Encoding

Shift-JIS Encoding

Unicode Character Set

What Is Unicode

Examples of Unicode Characters

Unique Features of Unicode

Unicode Standard Releases

Code Point Blocks

Unicode 13.0 Character Samples

Unicode 8.0 Character Samples

Unicode 7.0 Character Samples

Unicode 6.0 Character Samples

Unicode 5.0 Character Samples

Unicode 4.0 Character Samples

UTF-8 (Unicode Transformation Format - 8-Bit)

UTF-8 Encoding

UTF-8 Encoding Algorithm

Features of UTF-8 Encoding

UTF-16, UTF-16BE and UTF-16LE Encodings

What Are Paired Surrogates

UTF-16 Encoding

UTF-16BE Encoding

UTF-16LE Encoding

UTF-32, UTF-32BE and UTF-32LE Encodings

UTF-32 Encoding

UTF-32BE Encoding

UTF-32LE Encoding

Python Language and Unicode Characters

Summary of Unicode Support in Python

Python Source Code Encoding

Unicode Support on "str" Data Type

Unicode Character Encoding and Decoding

"unicodedata" Module for Unicode Properties

Java Language and Unicode Characters

Unicode Versions Supported in Java History

'int' and 'String' - Basic Data Types for Unicode

"Character" Class with Unicode Utility Methods

Character.toChars() - "char" Sequence of Code Point

Character.getNumericValue() - Numeric Value of Code Point

"String" Class with Unicode Utility Methods

String.length() Is Not Number of Characters

String.toCharArray() Returns the UTF-16BE Sequence

String Literals and Source Code Encoding

Character Encoding in Java

What Is Character Encoding

List of Supported Character Encodings in Java

EncodingSampler.java - Testing encode() Methods

Examples of CP1252 and ISO-8859-1 Encodings

Examples of US-ASCII, UTF-8, UTF-16 and UTF-32 Encodings

Examples of GB18030 Encoding

Testing decode() Methods

Character Set Encoding Maps

Character Set Encoding Map Analyzer

Character Set Encoding Maps - US-ASCII and ISO-8859-1/Latin 1

Character Set Encoding Maps - CP1252/Windows-1252

Character Set Encoding Maps - Unicode UTF-8

Character Set Encoding Maps - Unicode UTF-16, UTF-16BE, UTF-16LE

Character Set Encoding Maps - Unicode UTF-32, UTF-32BE, UTF-32LE

Character Counter Program for Any Given Encoding

Character Set Encoding Comparison

Encoding Conversion Programs for Encoded Text Files

\uxxxx - Entering Unicode Data in Java Programs

HexWriter.java - Converting Encoded Byte Sequences to Hex Values

EncodingConverter.java - Encoding Conversion Sample Program

Viewing Encoded Text Files in Web Browsers

Unicode Signs in Different Encodings

Using Notepad as a Unicode Text Editor

What Is Notepad

Opening UTF-8 Text Files

Opening UTF-16BE Text Files

Opening UTF-16LE Text Files

Saving Files in UTF-8 Option

Byte Order Mark (BOM) - FEFF - EFBBBF

Saving Files in "Unicode Big Endian" Option

Saving Files in "Unicode" Option

Supported Save and Open File Formats

Using Microsoft Word as a Unicode Text Editor

What Is Microsoft Word

Opening UTF-8 Text Files

Opening UTF-16BE Text Files

Opening UTF-16LE Text Files

Saving Files in "Unicode (UTF-8)" Option

Saving Files in "Unicode (Big-Endian)" Option

Saving Files in Unicode Option

Supported Save and Open File Formats

Using Microsoft Excel as a Unicode Text Editor

What Is Microsoft Excel

Opening UTF-8 Text Files

Opening UTF-16BE Text Files

Opening UTF-16LE Text Files

Saving UTF-8 Text Files

Saving Files in "Unicode Text (*.txt)" Option

Opening UTF-16 Text Files

Supported Save and Open File Formats

Unicode Fonts

What Is a Font

What Is a Unicode Font

Downloading and Installing GNU Unifont

Windows Tool "Character Map"

Archived Tutorials

Archived: EncodingSampler.java - BMP Character Encoding

References

Full Version in PDF/EPUB
