Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
UTF-32LE Encoding
This section provides a quick introduction to the UTF-32LE (Unicode Transformation Format - 32-bit Little Endian) encoding for the Unicode character set.
UTF-32LE: A character encoding scheme that maps each code point of the Unicode character set to a sequence of 4 bytes (32 bits). UTF-32LE stands for Unicode Transformation Format - 32-bit Little Endian.
Here is my understanding of the UTF-32LE specification. When UTF-32LE encoding is used to encode (serialize) Unicode characters into a byte stream for communication or storage, the code point of each character will be converted as a 32-bit integer into 4 bytes with the least significant byte first.
For example, these 3 Unicode characters, U+004D, U+0061 and U+10000, will be converted into 0x4D0000006100000000000100 when UTF-32LE is used. Split into 4-byte blocks, that is 4D 00 00 00, 61 00 00 00 and 00 00 01 00.
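The encoding step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reference implementation: the function name encode_utf32le is made up for this example, and the sketch does not reject surrogate code points the way a strict encoder would.

```python
import struct

def encode_utf32le(text: str) -> bytes:
    """Hypothetical helper: pack each code point as a 32-bit integer,
    least significant byte first ("<I" = little-endian unsigned 32-bit)."""
    return b"".join(struct.pack("<I", ord(ch)) for ch in text)

# The three characters from the example: U+004D, U+0061 and U+10000
sample = "\u004D\u0061\U00010000"
encoded = encode_utf32le(sample)

print(encoded.hex().upper())  # 4D0000006100000000000100

# Cross-check against Python's built-in codec, which also writes no BOM
print(encoded == sample.encode("utf-32-le"))  # True
```

The cross-check against the built-in "utf-32-le" codec is there to show that the hand-written packing and the standard library agree on this input.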
When UTF-32LE encoding is used to decode (deserialize) a byte stream into Unicode characters, the entire stream will be divided into blocks of 4 bytes. Each block is converted to a 32-bit integer to represent a Unicode code point assuming the least significant byte first.
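The decoding step can be sketched the same way. Again, this is a minimal illustration under stated assumptions: the function name decode_utf32le is hypothetical, and the validity checks (length multiple of 4, code point range, no surrogates) are my own choice of what a careful decoder might verify.

```python
import struct

def decode_utf32le(data: bytes) -> str:
    """Hypothetical helper: split the stream into 4-byte blocks and read
    each block as a 32-bit integer, least significant byte first."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32LE input length must be a multiple of 4 bytes")
    chars = []
    for offset in range(0, len(data), 4):
        (code_point,) = struct.unpack_from("<I", data, offset)
        # Assumed checks: reject values outside the Unicode range and surrogates
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"invalid code point 0x{code_point:X} at byte {offset}")
        chars.append(chr(code_point))
    return "".join(chars)

data = bytes.fromhex("4D0000006100000000000100")
decoded = decode_utf32le(data)

print([f"U+{ord(ch):04X}" for ch in decoded])  # ['U+004D', 'U+0061', 'U+10000']
print(decoded == data.decode("utf-32-le"))     # True
```

Because every character occupies exactly 4 bytes, the decoder can jump straight to the n-th character at byte offset 4*n, which is the main practical advantage of the UTF-32 family over variable-length encodings.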
Note that the use of BOM (Byte Order Mark) is not part of the UTF-32LE specification. So you should: