Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

Saving Files in "Unicode (UTF-8)" Option

This section provides a tutorial example on how to save text files with Microsoft Word by selecting the "Unicode (UTF-8)" encoding option in the File Conversion dialog box.

After testing the Word open function, I now want to test the save function with the "Unicode (UTF-8)" option.

1. Run Word and open hello.utf-8 correctly with the Unicode (UTF-8) encoding option selected.

2. Click the File > "Save As" menu. The "Save As" dialog box comes up.

3. Enter word_utf-8.txt as the new file name and select the "Plain Text (*.txt)" option in the "Save as Type" field. See the picture below:
[Picture: Word Save Text File]

4. Click the Save button. The File Conversion dialog box comes up.

5. Click the "Other encoding" radio button and select the "Unicode (UTF-8)" option.

6. Click the OK button. Word saves the text to a new file named word_utf-8.txt.

7. To see how my text is saved by Word, I need to run my HEX dump program on word_utf-8.txt:

C:\herong\unicode>java HexWriter word_utf-8.txt word_utf-8.hex
Number of input bytes: 107

C:\herong\unicode>type word_utf-8.hex
EFBBBF48656C6C6F20636F6D70757465
7221202D20456E676C6973680D0AE794
B5E88491E4BDA0E5A5BDEFBC81202D20
53696D706C6966696564204368696E65
73650D0AE99BBBE885A6E4BDA0E5A5BD
EFB997202D20547261646974696F6E61
6C204368696E6573650D0A
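The HexWriter program used above is not listed in this section. For readers who do not have it at hand, here is a minimal stand-in sketch that produces a dump in the same general shape: uppercase hex, 16 bytes per line. The class name SimpleHexDump and its single-argument command line are assumptions made for illustration; it is not the author's HexWriter and it prints to the console instead of writing an output file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for a hex dump tool; NOT the HexWriter program used above.
class SimpleHexDump {

    // Formats bytes as uppercase hex digits, 16 bytes (32 hex digits) per line.
    static String toHexLines(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            // Mask with 0xFF so that bytes >= 0x80 are not printed as negative values.
            out.append(String.format("%02X", data[i] & 0xFF));
            boolean endOfLine = (i % 16 == 15);
            boolean lastByte = (i == data.length - 1);
            if (endOfLine || lastByte) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file to dump, for example word_utf-8.txt
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Number of input bytes: " + data.length);
        System.out.print(toHexLines(data));
    }
}
```

After compiling, it could be run as "java SimpleHexDump word_utf-8.txt". The dump shown above has 6 full lines of 16 bytes plus a last line of 11 bytes, which matches the reported count of 107 input bytes.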

The UTF-8 text file saved by Word is identical to my original UTF-8 text file except for the 3 bytes at the beginning, "EFBBBF". If we ignore "EFBBBF", we can say that Word saves UTF-8 text files correctly.
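The comparison described above can also be done in code instead of by eye. The following is a rough sketch, assuming the original file and the Word-saved file are passed as two command line arguments; the class name BomCompare and its method names are hypothetical and chosen only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for comparing two UTF-8 files while ignoring a leading BOM.
class BomCompare {

    // Returns true if the data begins with the 3 bytes EF BB BF.
    static boolean startsWithBom(byte[] data) {
        return data.length >= 3
            && (data[0] & 0xFF) == 0xEF
            && (data[1] & 0xFF) == 0xBB
            && (data[2] & 0xFF) == 0xBF;
    }

    // Drops the leading EF BB BF if present; otherwise returns the data unchanged.
    static byte[] stripBom(byte[] data) {
        return startsWithBom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Returns true if both byte arrays are identical once any leading BOM is ignored.
    static boolean sameExceptBom(byte[] first, byte[] second) {
        return Arrays.equals(stripBom(first), stripBom(second));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the original file, for example hello.utf-8
        // args[1]: the file saved by Word, for example word_utf-8.txt
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] saved = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("Original starts with BOM: " + startsWithBom(original));
        System.out.println("Saved starts with BOM: " + startsWithBom(saved));
        System.out.println("Same content ignoring BOM: " + sameExceptBom(original, saved));
    }
}
```

Comparing raw bytes instead of decoded strings keeps the check independent of how a decoder treats the BOM.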

Of course, we know why Word prepends "EFBBBF" to the text file. "EFBBBF" is the UTF-8 byte sequence of the BOM (Byte Order Mark) character, U+FEFF. See the previous chapter for more information.
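To see where "EFBBBF" comes from, here is a small sketch that encodes U+FEFF by hand with the 3-byte UTF-8 bit pattern (1110xxxx 10xxxxxx 10xxxxxx) and compares the result with Java's built-in UTF-8 encoder. The class name BomBytes and the helper encodeThreeByte are made up for this illustration, and the helper only does a simple range check for code points from U+0800 to U+FFFF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that U+FEFF becomes EF BB BF in UTF-8.
class BomBytes {

    // Encodes a code point in U+0800..U+FFFF with the pattern 1110xxxx 10xxxxxx 10xxxxxx.
    // Simplified sketch: it does not reject surrogate code points.
    static byte[] encodeThreeByte(int codePoint) {
        if (codePoint < 0x0800 || codePoint > 0xFFFF) {
            throw new IllegalArgumentException("Not a 3-byte UTF-8 code point");
        }
        return new byte[] {
            (byte) (0xE0 | (codePoint >> 12)),          // top 4 bits
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),  // middle 6 bits
            (byte) (0x80 | (codePoint & 0x3F))          // low 6 bits
        };
    }

    // Formats bytes as uppercase hex digits without separators.
    static String toHex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] manual = encodeThreeByte(0xFEFF);
        byte[] library = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(manual));                  // prints EFBBBF
        System.out.println(Arrays.equals(manual, library)); // prints true
    }
}
```

In binary, U+FEFF is 1111 111011 111111 when split into groups of 4, 6 and 6 bits, which gives 11101111 (EF), 10111011 (BB) and 10111111 (BF).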


Dr. Herong Yang, updated in 2009