Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

UTF-16, UTF-16BE and UTF-16LE Encodings

This chapter provides notes and tutorial examples on the UTF-16, UTF-16BE and UTF-16LE encodings. Topics include the encoding and decoding logic of UTF-16, UTF-16BE and UTF-16LE; an introduction to surrogate pairs; and an explanation of the use of the BOM (Byte Order Mark).

What Are Paired Surrogates?

UTF-16 Encoding

UTF-16BE Encoding

UTF-16LE Encoding

Conclusions:

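  • UTF-16, UTF-16BE and UTF-16LE encodings are all variable-length Unicode character encodings built on 16-bit (2-byte) code units: each character is encoded as either one unit (2 bytes) or a surrogate pair of two units (4 bytes).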
  • Output byte streams of UTF-16 encoding may have 3 valid formats: Big-Endian without BOM, Big-Endian with BOM, and Little-Endian with BOM.
  • UTF-16BE encoding is identical to the Big-Endian without BOM format of UTF-16 encoding.
  • UTF-16LE encoding is identical to the Little-Endian with BOM format of UTF-16 encoding, except that the BOM is omitted.
  • RFC 2781, "UTF-16, an encoding of ISO 10646", gives the official specifications of UTF-16, UTF-16BE and UTF-16LE encodings.
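
The conclusions above can be illustrated with a short Python sketch. It is not taken from the tutorial; the helper name to_surrogates and the sample string are assumptions chosen for illustration. The sketch computes a surrogate pair by hand, then builds the three valid UTF-16 byte stream formats from the BOM-free UTF-16BE and UTF-16LE outputs.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    a (high, low) surrogate pair. Hypothetical helper for illustration."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# Sample text (an assumption): "A" (U+0041) plus U+10400,
# which lies outside the BMP and so needs a surrogate pair.
text = "A\U00010400"

# UTF-16BE and UTF-16LE never write a BOM.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

# The three valid UTF-16 byte stream formats listed above.
be_without_bom = be
be_with_bom = b"\xfe\xff" + be             # BOM U+FEFF in big-endian order
le_with_bom = b"\xff\xfe" + le             # BOM U+FEFF in little-endian order

print([hex(u) for u in to_surrogates(0x10400)])
print("BE without BOM:", be_without_bom.hex(" "))
print("BE with BOM:   ", be_with_bom.hex(" "))
print("LE with BOM:   ", le_with_bom.hex(" "))

# Python's "utf-16" decoder reads the BOM to pick the byte order.
# Note: without a BOM it falls back to the platform's native order,
# so BOM-free big-endian data is decoded here with "utf-16-be" instead.
assert be_with_bom.decode("utf-16") == text
assert le_with_bom.decode("utf-16") == text
assert be_without_bom.decode("utf-16-be") == text
```

The explicit "utf-16-be" and "utf-16-le" codecs are used to build the byte streams because Python's plain "utf-16" encoder always writes a BOM and uses the platform's native byte order, which would make the output machine-dependent.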

Dr. Herong Yang, updated in 2009