Unicode Tutorials - Herong's Tutorial Examples - Version 5.21, by Dr. Herong Yang
Saving Files with the "Unicode (UTF-8)" Option
This section provides a tutorial example on how to save text files with Word by selecting the "Unicode (UTF-8)" encoding option in the File Conversion dialog box.
After testing Word's open function, I now want to test its save function with the "Unicode (UTF-8)" option.
1. Run Word and open hello.utf-8 correctly with the Unicode (UTF-8) encoding option selected.
2. Click the File > "Save As" menu. The "Save As" dialog box comes up.
3. Enter word_utf-8.txt as the new file name and select the "Plain Text (*.txt)" option in the "Save as Type" field. See the picture below:
4. Click the Save button. The File Conversion dialog box comes up.
5. Click the "Other encoding" radio button and select the "Unicode (UTF-8)" option.
6. Click the OK button. Word saves the text to a new file named word_utf-8.txt.
7. To see how my text is saved by Word, I need to run my HEX dump program on word_utf-8.txt:
C:\herong\unicode>java HexWriter word_utf-8.txt word_utf-8.hex
Number of input bytes: 107

C:\herong\unicode>type word_utf-8.hex
EFBBBF48656C6C6F20636F6D70757465
7221202D20456E676C6973680D0AE794
B5E88491E4BDA0E5A5BDEFBC81202D20
53696D706C6966696564204368696E65
73650D0AE99BBBE885A6E4BDA0E5A5BD
EFB997202D20547261646974696F6E61
6C204368696E6573650D0A
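The source code of HexWriter is not shown in this section. To give a rough idea of what such a tool does, here is a minimal sketch of a hex dump program. The class name HexDumpSketch and its single-argument usage are hypothetical; this is not the author's HexWriter, which takes both an input and an output file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for a hex dump tool; not the HexWriter used above.
class HexDumpSketch {

    // Formats bytes as uppercase hex digits, 16 bytes per line.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            // Mask with 0xFF so negative byte values print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xFF));
            // Break the line after every 16th byte and after the last byte.
            if (i % 16 == 15 || i == data.length - 1) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java HexDumpSketch <input file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Number of input bytes: " + data.length);
        System.out.print(toHex(data));
    }
}
```

Unlike the command shown above, this sketch prints the hex digits to the console instead of writing them to a second file.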
The UTF-8 text file saved by Word is identical to my original UTF-8 text file except for the 3 bytes at the beginning, "EFBBBF". If we ignore "EFBBBF", we can say that Word saves UTF-8 text files correctly.
Of course, we know why Word prepends "EFBBBF" to the text file. "EFBBBF" is the UTF-8 encoding of the BOM (Byte Order Mark) character, U+FEFF. See the previous chapter for more information.
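The relationship between U+FEFF and "EFBBBF" can be checked with a few lines of Java. The following is only a sketch: the class name Utf8BomSketch and its helper methods are hypothetical and do not come from this tutorial. It encodes U+FEFF as UTF-8 and shows one way to ignore a leading BOM when comparing file contents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the tutorial's own code.
class Utf8BomSketch {

    // The UTF-8 encoding of the BOM character U+FEFF: EF BB BF.
    static final byte[] UTF8_BOM = "\uFEFF".getBytes(StandardCharsets.UTF_8);

    // Returns true if the data begins with the 3-byte UTF-8 BOM.
    static boolean startsWithBom(byte[] data) {
        return data.length >= UTF8_BOM.length
            && data[0] == UTF8_BOM[0]
            && data[1] == UTF8_BOM[1]
            && data[2] == UTF8_BOM[2];
    }

    // Returns the data without a leading BOM, or the data unchanged if there is none.
    static byte[] stripBom(byte[] data) {
        return startsWithBom(data)
            ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length)
            : data;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : UTF8_BOM) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        System.out.println(hex); // prints EFBBBF

        // In-memory stand-ins for a file saved with and without the BOM.
        byte[] plain = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = "\uFEFFHello".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(stripBom(withBom), plain)); // prints true
    }
}
```

To compare real files, the two byte arrays in main could be replaced with the contents of hello.utf-8 and word_utf-8.txt, read with Files.readAllBytes.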