Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Supported Save and Open File Formats

This section provides a quick summary of how Notepad saves and opens Unicode files. Notepad opens Unicode files correctly when the BOM character is prepended, but it fails to open Unicode big endian files that do not have the BOM character prepended.

We have now learned that Notepad saves Unicode text files in 3 encoding formats:

  • UTF-8 format - Text files saved as UTF-8 byte sequences with the BOM, 0xEFBBBF, prepended.
  • Unicode big endian format - Text files saved as UTF-16 byte sequences in Big-Endian order with the BOM, 0xFEFF, prepended.
  • Unicode format - Text files saved as UTF-16 byte sequences in Little-Endian order with the BOM, 0xFFFE, prepended.
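
The byte layout of the three save formats can be sketched in Python. This is only an illustration under stated assumptions, not Notepad's own code: the helper name notepad_style_bytes and its option labels are hypothetical, chosen here to mirror the three Save As options listed above.

```python
import codecs

# Hypothetical helper (an assumption for illustration): builds the byte
# sequence that each of the three save options is described as producing.
def notepad_style_bytes(text, option):
    if option == "UTF-8":
        # BOM 0xEFBBBF followed by the UTF-8 bytes
        return codecs.BOM_UTF8 + text.encode("utf-8")
    if option == "Unicode big endian":
        # BOM 0xFEFF followed by UTF-16 bytes, high byte first
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    if option == "Unicode":
        # BOM 0xFFFE followed by UTF-16 bytes, low byte first
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    raise ValueError("unknown option: " + option)

# Show the bytes for the single character "A" (U+0041) in hexadecimal.
for option in ("UTF-8", "Unicode big endian", "Unicode"):
    print(option, notepad_style_bytes("A", option).hex())
```

For the character "A", the sketch prints efbbbf41, feff0041, and fffe4100, so the first 2 or 3 bytes alone are enough to tell the three formats apart.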

Notepad can open Unicode text files in 5 encoding formats:

  • UTF-8 format - Text files opened with encoding format automatically detected.
  • UTF-8 with BOM format - Text files opened with encoding format automatically detected.
  • UTF-16 (Big-Endian with BOM) - Text files opened with encoding format automatically detected.
  • UTF-16 (Little-Endian with BOM) - Text files opened with encoding format automatically detected.
  • UTF-16LE format - Text files opened with encoding format automatically detected.

Notepad cannot correctly open Unicode text files in UTF-16BE encoding format, that is, UTF-16 in Big-Endian order without the BOM.
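
The following Python sketch suggests why the BOM matters for big endian files. It is not Notepad's actual detection logic, which this tutorial does not document; sniff_bom is a hypothetical helper, and decoding the BOM-less bytes as little endian is an assumption used only to show how a wrong byte-order guess corrupts the text.

```python
import codecs

# Hypothetical helper (an assumption for illustration): returns a Python
# codec name based on the leading BOM bytes, or None when no BOM is found.
def sniff_bom(data):
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # this codec strips the UTF-8 BOM
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"      # this codec reads the BOM to pick the byte order
    return None

with_bom = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")   # FE FF 00 41
without_bom = "A".encode("utf-16-be")                      # 00 41

# With the BOM, the byte order is known and "A" is recovered.
print(sniff_bom(with_bom), with_bom.decode("utf-16"))

# Without the BOM, nothing identifies the byte order. If the bytes are
# then read as little endian, 00 41 becomes U+4100 instead of U+0041.
print(sniff_bom(without_bom), hex(ord(without_bom.decode("utf-16-le"))))
```

In the second case the sketch reports no BOM and the wrong-order decode yields code point 0x4100, a CJK character rather than "A".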

Dr. Herong Yang, updated in 2009