Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Byte Order Mark (BOM) - FEFF - EFBBBF

This section provides a brief introduction to the Byte Order Mark (BOM) character, U+FEFF, which is used as a Unicode character stream signature when prepended to a character stream. When encoded in UTF-8, U+FEFF becomes the 3-byte sequence EF BB BF.

What Is BOM (Byte Order Mark)? BOM is the informal name of the special Unicode character U+FEFF "ZERO WIDTH NO-BREAK SPACE" when it is prepended to a stream of Unicode characters as a "signature". This signature tells the receiver of the stream to expect Unicode characters and to pay attention to the serialization (byte) order of the encoded octets.
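
A receiver might use the signature roughly as follows. This is a minimal Python sketch, assuming only the three encodings covered in this chapter (UTF-8, UTF-16BE and UTF-16LE); the function name sniff_bom is hypothetical. It relies on the BOM byte constants from Python's standard codecs module.

```python
import codecs

def sniff_bom(data: bytes):
    """Guess the encoding from a leading BOM (hypothetical helper).

    Returns (encoding_name, bom_length), or (None, 0) when no
    recognized signature is present. Only UTF-8, UTF-16BE and
    UTF-16LE are checked; UTF-32 signatures are not handled
    (a UTF-32LE BOM, FF FE 00 00, would be reported as UTF-16LE).
    """
    signatures = [
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    ]
    for bom, name in signatures:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# U+FEFF followed by "Hi", serialized in big-endian UTF-16
sample = "\ufeffHi".encode("utf-16-be")
print(sniff_bom(sample))  # ('utf-16-be', 2)
```

The byte order of the 2-byte UTF-16 signature (FE FF versus FF FE) is exactly what lets the receiver tell big-endian from little-endian data.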

When the BOM character, U+FEFF, is serialized in UTF-8 encoding, it becomes the octet sequence EF BB BF (written \xEF\xBB\xBF as escaped bytes).
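
U+FEFF falls in the range U+0800 to U+FFFF, so UTF-8 uses the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx. The bits of FEFF (1111 1110 1111 1111) split into 1111 | 111011 | 111111, giving EF, BB and BF. The Python sketch below does this bit packing by hand; utf8_encode_3byte is a hypothetical helper for illustration, and the result is checked against Python's built-in UTF-8 encoder.

```python
def utf8_encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes.

    Hypothetical helper for illustration only; surrogate code
    points (U+D800..U+DFFF) are rejected.
    """
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point outside the 3-byte UTF-8 range")
    b1 = 0xE0 | (cp >> 12)          # 1110xxxx: top 4 bits
    b2 = 0x80 | ((cp >> 6) & 0x3F)  # 10xxxxxx: middle 6 bits
    b3 = 0x80 | (cp & 0x3F)         # 10xxxxxx: low 6 bits
    return bytes([b1, b2, b3])

manual = utf8_encode_3byte(0xFEFF)
print(manual.hex().upper())                # EFBBBF
print(manual == "\ufeff".encode("utf-8"))  # True
```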

As you can see from the previous tutorial, Notepad prepends U+FEFF to the text and encodes it as EF BB BF when saving the text in UTF-8 encoding. This is why I was getting those 3 extra bytes, EF BB BF, at the beginning of the saved UTF-8 text file.
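
The same effect can be reproduced outside Notepad. Python's standard "utf-8-sig" codec writes EF BB BF before the text, while plain "utf-8" writes no signature. A minimal sketch, assuming Python 3.8 or later (for bytes.hex with a separator); the file names are arbitrary and live in a temporary directory.

```python
import os
import tempfile

text = "Hello"

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.txt")
    bom_path = os.path.join(tmp, "bom.txt")

    # Plain UTF-8: no signature is written
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write(text)
    # utf-8-sig: prepends U+FEFF, i.e. the bytes EF BB BF
    with open(bom_path, "w", encoding="utf-8-sig") as f:
        f.write(text)

    # Read both files back as raw bytes
    with open(plain_path, "rb") as f:
        plain_bytes = f.read()
    with open(bom_path, "rb") as f:
        bom_bytes = f.read()

print(plain_bytes.hex(" ").upper())  # 48 65 6C 6C 6F
print(bom_bytes.hex(" ").upper())    # EF BB BF 48 65 6C 6C 6F
```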

With the introduction of the BOM character, we now need to be ready to support two variations of the UTF-8 text file format:

  • UTF-8 text file with no leading BOM character.
  • UTF-8 text file with the leading BOM character.
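
One way to accept both variations when reading is Python's "utf-8-sig" codec: it drops a single leading EF BB BF if one is present and otherwise decodes exactly like plain "utf-8". The sketch below is illustrative; other languages and libraries offer their own, different mechanisms.

```python
without_bom = b"Hello"
with_bom = b"\xef\xbb\xbfHello"

# 'utf-8-sig' strips one leading EF BB BF if present,
# and otherwise decodes exactly like plain 'utf-8'.
a = without_bom.decode("utf-8-sig")
b = with_bom.decode("utf-8-sig")
print(a == b == "Hello")  # True

# Plain 'utf-8' keeps the signature as a U+FEFF character.
print(repr(with_bom.decode("utf-8")))  # '\ufeffHello'
```

A reader that uses plain "utf-8" on a file saved with a BOM will therefore see an invisible U+FEFF as the first character of the text.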

Read RFC 3629, "UTF-8, a transformation format of ISO 10646", November 2003 at http://tools.ietf.org/html/rfc3629 for more information.

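Note that RFC 3629 does not actually recommend prepending the BOM character to UTF-8 text: it describes U+FEFF as an optional signature, and states that a protocol SHOULD forbid it for text elements that the protocol mandates to be always UTF-8.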

Dr. Herong Yang, updated in 2009