Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
UTF-8 Encoding Algorithm
This section provides a tutorial example of an algorithm that encodes a character into a sequence of UTF-8 bytes.
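Here is an algorithm that encodes a single character in UTF-8. The number of bytes depends on the range of the code point. Each continuation byte carries 6 bits of the code point and starts with the bit pattern 10 (0x80). The first byte carries the remaining high-order bits and starts with a pattern that signals the sequence length: 110 (0xC0) for 2 bytes, 1110 (0xE0) for 3 bytes, and 11110 (0xF0) for 4 bytes.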
Input:
   unsigned integer c - the code point of the character to be encoded
Output:
   byte b1, b2, b3, b4 - the encoded sequence of bytes
Algorithm:
   if (c<0x80)
      b1 = c>>0  & 0x7F | 0x00
      b2 = null
      b3 = null
      b4 = null
   else if (c<0x0800)
      b1 = c>>6  & 0x1F | 0xC0
      b2 = c>>0  & 0x3F | 0x80
      b3 = null
      b4 = null
   else if (c<0x010000)
      b1 = c>>12 & 0x0F | 0xE0
      b2 = c>>6  & 0x3F | 0x80
      b3 = c>>0  & 0x3F | 0x80
      b4 = null
   else if (c<0x110000)
      b1 = c>>18 & 0x07 | 0xF0
      b2 = c>>12 & 0x3F | 0x80
      b3 = c>>6  & 0x3F | 0x80
      b4 = c>>0  & 0x3F | 0x80
   else
      error - c is not a valid Unicode code point
   end if
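The algorithm translates almost line for line into Python. This is a minimal sketch, not a reference implementation: the function name utf8_encode is hypothetical, it returns a bytes object of length 1 to 4 instead of four variables padded with null, and, like the pseudocode, it does not reject surrogate code points (0xD800 to 0xDFFF), which Python's own UTF-8 codec refuses to encode.

```python
def utf8_encode(c: int) -> bytes:
    """Encode one code point c into UTF-8 (hypothetical helper).

    Follows the range tests of the algorithm above. Raises ValueError
    for values outside 0x0..0x10FFFF; surrogates are NOT rejected.
    """
    if c < 0:
        raise ValueError(f"negative code point: {c}")
    if c < 0x80:
        # 1 byte: 0xxxxxxx (plain ASCII)
        return bytes([((c >> 0) & 0x7F) | 0x00])
    elif c < 0x0800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([((c >> 6) & 0x1F) | 0xC0,
                      ((c >> 0) & 0x3F) | 0x80])
    elif c < 0x010000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([((c >> 12) & 0x0F) | 0xE0,
                      ((c >> 6) & 0x3F) | 0x80,
                      ((c >> 0) & 0x3F) | 0x80])
    elif c < 0x110000:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([((c >> 18) & 0x07) | 0xF0,
                      ((c >> 12) & 0x3F) | 0x80,
                      ((c >> 6) & 0x3F) | 0x80,
                      ((c >> 0) & 0x3F) | 0x80])
    else:
        raise ValueError(f"not a valid code point: U+{c:X}")


if __name__ == "__main__":
    # One sample character from each length class
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ').upper()}")
    # U+0041 -> 41
    # U+00E9 -> C3 A9
    # U+20AC -> E2 82 AC
    # U+1F600 -> F0 9F 98 80
```

Python's built-in codec can serve as a cross-check for any non-surrogate code point; for example, chr(0x20AC).encode("utf-8") also gives b'\xe2\x82\xac'.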
Exercise: Write an algorithm to decode a UTF-8 encoded byte sequence.