Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

UTF-32, UTF-32BE and UTF-32LE Encodings

This chapter provides notes and tutorial examples on UTF-32, UTF-32BE and UTF-32LE encodings. Topics include the encoding and decoding logic of UTF-32, UTF-32BE and UTF-32LE, and an explanation of the use of the BOM (Byte Order Mark).

UTF-32 Encoding

UTF-32BE Encoding

UTF-32LE Encoding

Conclusions:

  • UTF-32, UTF-32BE and UTF-32LE encodings are all fixed-length 32-bit (4-byte) Unicode character encodings.
  • An output byte stream of UTF-32 encoding may take one of 3 valid formats: Big-Endian without BOM, Big-Endian with BOM, or Little-Endian with BOM.
  • UTF-32BE encoding is identical to the Big-Endian without BOM format of UTF-32 encoding.
  • UTF-32LE encoding is identical to the Little-Endian with BOM format of UTF-32 encoding, except that the BOM is omitted.
  • Unicode Standard Annex #19 - UTF-32 gives quick and precise definitions of UTF-32, UTF-32BE and UTF-32LE encodings.
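
The conclusions above can be illustrated with a minimal Python sketch. It is not taken from the tutorial itself: the helper names encode_utf32 and decode_utf32 are hypothetical, and the code assumes well-formed input (it does not reject surrogate code points). It writes each code point as a 4-byte integer and, when decoding, uses a leading BOM to pick the byte order, falling back to Big-Endian when no BOM is present.

```python
BOM = 0xFEFF  # U+FEFF, used as the Byte Order Mark


def encode_utf32(text, byte_order="big", with_bom=False):
    """Encode text as fixed-length 4-byte units (hypothetical helper).

    byte_order is "big" or "little"; with_bom prepends U+FEFF.
    """
    code_points = [ord(ch) for ch in text]
    if with_bom:
        code_points.insert(0, BOM)
    return b"".join(cp.to_bytes(4, byte_order) for cp in code_points)


def decode_utf32(data):
    """Decode a UTF-32 byte stream (hypothetical helper).

    A leading BOM selects the byte order and is dropped from the output;
    without a BOM the stream is assumed to be Big-Endian.
    """
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data length must be a multiple of 4 bytes")
    byte_order = "big"
    if data[:4] == b"\xff\xfe\x00\x00":    # BOM in Little-Endian order
        byte_order, data = "little", data[4:]
    elif data[:4] == b"\x00\x00\xfe\xff":  # BOM in Big-Endian order
        data = data[4:]
    return "".join(
        chr(int.from_bytes(data[i:i + 4], byte_order))
        for i in range(0, len(data), 4)
    )


# The three valid UTF-32 output formats for the single character "A" (U+0041)
be_no_bom = encode_utf32("A")                    # same bytes as UTF-32BE
be_bom = encode_utf32("A", "big", True)
le_bom = encode_utf32("A", "little", True)

# UTF-32LE is the Little-Endian form with the BOM omitted
le_no_bom = encode_utf32("A", "little")
```

All three UTF-32 formats decode back to "A" with decode_utf32. The UTF-32LE bytes (le_no_bom) do not, because without a BOM the sketch assumes Big-Endian order; a reader of UTF-32LE data has to know the byte order in advance.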

Dr. Herong Yang, updated in 2009