Unicode Tutorials - Herong's Tutorial Notes
UTF-8 Encoding Algorithm
This section provides a tutorial example of an algorithm that encodes a single Unicode code point as a UTF-8 byte sequence.
Here is an algorithm for UTF-8 encoding of a single character. The expressions use C-style operator precedence: >> is evaluated before &, and & before |.
Input:
   unsigned integer c - the code point of the character to be encoded
Output:
   byte b1, b2, b3, b4 - the encoded sequence of bytes
Algorithm:
   if (c<0x80)
      b1 = c>>0  & 0x7F | 0x00
      b2 = null
      b3 = null
      b4 = null
   else if (c<0x0800)
      b1 = c>>6  & 0x1F | 0xC0
      b2 = c>>0  & 0x3F | 0x80
      b3 = null
      b4 = null
   else if (c<0x010000)
      b1 = c>>12 & 0x0F | 0xE0
      b2 = c>>6  & 0x3F | 0x80
      b3 = c>>0  & 0x3F | 0x80
      b4 = null
   else if (c<0x110000)
      b1 = c>>18 & 0x07 | 0xF0
      b2 = c>>12 & 0x3F | 0x80
      b3 = c>>6  & 0x3F | 0x80
      b4 = c>>0  & 0x3F | 0x80
   end if
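The algorithm above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The function name utf8_encode is hypothetical, and the two validity checks (rejecting values outside 0 to 0x10FFFF and rejecting the surrogate range 0xD800 to 0xDFFF) are additions that the algorithm above does not include. Instead of null placeholders, the sketch returns only the bytes that are actually used.

```python
def utf8_encode(c: int) -> bytes:
    """Encode one Unicode code point as a UTF-8 byte sequence (sketch)."""
    # Added check, not in the algorithm above: reject out-of-range values.
    if c < 0 or c >= 0x110000:
        raise ValueError("code point out of range: %r" % c)
    # Added check, not in the algorithm above: surrogates are not encodable.
    if 0xD800 <= c <= 0xDFFF:
        raise ValueError("surrogate code point: 0x%04X" % c)

    if c < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([c & 0x7F])
    if c < 0x0800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([((c >> 6) & 0x1F) | 0xC0,
                      (c & 0x3F) | 0x80])
    if c < 0x010000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([((c >> 12) & 0x0F) | 0xE0,
                      ((c >> 6) & 0x3F) | 0x80,
                      (c & 0x3F) | 0x80])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([((c >> 18) & 0x07) | 0xF0,
                  ((c >> 12) & 0x3F) | 0x80,
                  ((c >> 6) & 0x3F) | 0x80,
                  (c & 0x3F) | 0x80])


if __name__ == "__main__":
    # U+20AC (euro sign) needs three bytes.
    print(utf8_encode(0x20AC).hex())  # prints e282ac
```

A quick way to check a sketch like this is to compare its result with Python's built-in encoder, for example utf8_encode(0x20AC) == chr(0x20AC).encode("utf-8").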
Exercise: Write an algorithm to decode a UTF-8 encoded byte sequence.
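One possible starting point for the exercise is sketched below in Python. It is only an illustration under stated assumptions, not the book's solution: the function name utf8_decode_one and its return convention (the code point plus the number of bytes consumed) are hypothetical choices. The sketch reverses the bit operations of the encoder: the first byte determines the sequence length, and each continuation byte contributes 6 bits. It also rejects overlong forms, surrogates, and values at or above 0x110000.

```python
from typing import Tuple


def utf8_decode_one(data: bytes) -> Tuple[int, int]:
    """Decode the first code point in data; return (code_point, length)."""
    if not data:
        raise ValueError("empty input")
    b1 = data[0]
    if b1 < 0x80:
        # 0xxxxxxx: single-byte sequence
        return b1, 1
    if 0xC0 <= b1 < 0xE0:
        # 110xxxxx: 2-byte sequence, smallest valid value 0x80
        n, c, smallest = 2, b1 & 0x1F, 0x80
    elif 0xE0 <= b1 < 0xF0:
        # 1110xxxx: 3-byte sequence, smallest valid value 0x0800
        n, c, smallest = 3, b1 & 0x0F, 0x0800
    elif 0xF0 <= b1 < 0xF8:
        # 11110xxx: 4-byte sequence, smallest valid value 0x010000
        n, c, smallest = 4, b1 & 0x07, 0x010000
    else:
        raise ValueError("invalid first byte: 0x%02X" % b1)

    if len(data) < n:
        raise ValueError("truncated sequence")
    for b in data[1:n]:
        # Every continuation byte must look like 10xxxxxx.
        if (b & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte: 0x%02X" % b)
        c = (c << 6) | (b & 0x3F)

    # Reject overlong forms, surrogates, and out-of-range values.
    if c < smallest or c >= 0x110000 or 0xD800 <= c <= 0xDFFF:
        raise ValueError("invalid code point: 0x%X" % c)
    return c, n


if __name__ == "__main__":
    cp, length = utf8_decode_one(b"\xe2\x82\xac")
    print(hex(cp), length)  # prints 0x20ac 3
```

To decode a whole byte string, the function can be called repeatedly, advancing by the returned length each time.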