Unicode Tutorials - Herong's Tutorial Examples - Version 5.20, by Dr. Herong Yang
This section provides a quick introduction to the UTF-16LE (Unicode Transformation Format - 16-bit Little Endian) encoding for the Unicode character set. UTF-16LE is a variation of UTF-16.
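UTF-16LE: A character encoding that maps each code point of the Unicode character set to a sequence of 2 or 4 bytes (one or two 16-bit code units), with the least significant byte of each 16-bit unit stored first. UTF-16LE stands for Unicode Transformation Format - 16-bit Little Endian.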
Here is my understanding of the UTF-16LE specification. When UTF-16LE encoding is used to encode (serialize) Unicode characters into a byte stream for communication or storage, the resulting byte stream is identical to that of the "Little-Endian with BOM" format of the UTF-16 encoding, except that no BOM is prepended to the byte stream.
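For example, the 3 Unicode characters U+004D, U+0061 and U+10000 are converted into 0x4D00610000D800DC when UTF-16LE is used. U+10000 lies outside the 16-bit range, so it is first split into the surrogate pair 0xD800 0xDC00, and each 16-bit value is then written with its least significant byte first.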
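The example above can be reproduced with a minimal Python sketch. The function name `encode_utf16le` is hypothetical and chosen only for illustration; the surrogate-pair arithmetic is the standard UTF-16 rule, and the result is cross-checked against Python's built-in `utf-16-le` codec.

```python
def encode_utf16le(code_points):
    """Encode an iterable of Unicode code points as UTF-16LE bytes (no BOM).

    Hypothetical helper for illustration; not part of any library.
    """
    out = bytearray()
    for cp in code_points:
        # Surrogate values and values above U+10FFFF cannot be encoded.
        if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not an encodable code point: {cp:#x}")
        if cp < 0x10000:
            units = [cp]                      # one 16-bit code unit
        else:
            v = cp - 0x10000                  # 20-bit value to split
            units = [0xD800 | (v >> 10),      # high (lead) surrogate
                     0xDC00 | (v & 0x3FF)]    # low (trail) surrogate
        for unit in units:
            out += unit.to_bytes(2, "little")  # least significant byte first
    return bytes(out)


encoded = encode_utf16le([0x004D, 0x0061, 0x10000])
print(encoded.hex().upper())  # 4D00610000D800DC

# Cross-check against the built-in codec.
print(encoded == "Ma\U00010000".encode("utf-16-le"))  # True
```

Note that no BOM appears at the start of the output: the first two bytes are already the code unit for U+004D.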
When UTF-16LE encoding is used to decode (deserialize) a byte stream into Unicode characters, the entire stream is divided into blocks of 2 bytes. Each block is converted to a 16-bit integer, taking the least significant byte first. The resulting integer stream is then processed as described below:
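A minimal Python sketch of this decoding procedure follows. The name `decode_utf16le` is hypothetical, and raising `ValueError` on malformed input is an assumption made for this sketch; real decoders may instead substitute a replacement character. The processing of the integer stream uses the standard UTF-16 surrogate rules.

```python
def decode_utf16le(data):
    """Decode UTF-16LE bytes into a list of Unicode code points.

    Hypothetical helper for illustration; error handling is an assumption.
    """
    if len(data) % 2:
        raise ValueError("UTF-16LE input must have an even number of bytes")

    # Step 1: split into 2-byte blocks, least significant byte first.
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]

    # Step 2: turn 16-bit integers into code points.
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            low = units[i + 1]
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # Any other value is itself a code point.
            code_points.append(unit)
            i += 1
    return code_points


decoded = decode_utf16le(bytes.fromhex("4D00610000D800DC"))
print([f"U+{cp:04X}" for cp in decoded])  # ['U+004D', 'U+0061', 'U+10000']
```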
Note that the use of a BOM (Byte Order Mark) is not part of the UTF-16LE specification. So you should:
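The practical effect of leaving the BOM out of UTF-16LE can be observed with Python's built-in codecs, used here only as an illustration of one implementation's behavior. In this sketch the `utf-16-le` codec neither writes a BOM when encoding nor strips one when decoding, while the generic `utf-16` codec does both.

```python
import codecs

text = "Ma"

# Encoding: utf-16-le writes no BOM; utf-16 prepends one.
le_bytes = text.encode("utf-16-le")
generic_bytes = text.encode("utf-16")
print(le_bytes.hex())  # 4d006100
print(generic_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)))  # True

# Decoding a stream that starts with the bytes FF FE:
bom_stream = codecs.BOM_UTF16_LE + le_bytes
as_le = bom_stream.decode("utf-16-le")    # BOM bytes kept as the character U+FEFF
as_generic = bom_stream.decode("utf-16")  # BOM consumed as a byte order signature

print(as_le == "\ufeffMa")  # True
print(as_generic == "Ma")   # True
```

So when a stream is labeled UTF-16LE, a leading FF FE is decoded as an ordinary U+FEFF character rather than being silently removed.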
Last update: 2009.