Unicode Tutorials - Herong's Tutorial Examples - v5.31, by Herong Yang
Saving Files in UTF-8 Option
This section provides a tutorial example on how to save text files with Notepad by selecting the UTF-8 encoding option in the "Save As" dialog box.
Having tested Notepad's open function, I now want to test its save function with the UTF-8 encoding.
1. Run Notepad and open hello.utf-8 correctly with the UTF-8 encoding option selected.
2. Click the File > "Save As" menu. The "Save As" dialog box comes up.
3. Enter notepad_utf-8 as the new file name and select the UTF-8 option in the Encoding field.
4. Click the Save button. Notepad saves the text to a new file named notepad_utf-8.txt.
5. To see how my text is saved by Notepad, I need to run my HEX dump program on notepad_utf-8.txt:
C:\herong\unicode>java HexWriter notepad_utf-8.txt notepad_utf-8.hex
Number of input bytes: 107

C:\herong\unicode>type notepad_utf-8.hex
EFBBBF48656C6C6F20636F6D70757465
7221202D20456E676C6973680D0AE794
B5E88491E4BDA0E5A5BDEFBC81202D20
53696D706C6966696564204368696E65
73650D0AE99BBBE885A6E4BDA0E5A5BD
EFB997202D20547261646974696F6E61
6C204368696E6573650D0A
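The source code of HexWriter is not listed in this section. For readers who want to reproduce the dump, here is a minimal sketch of a similar tool, assuming the same layout of 16 bytes (32 hex digits) per line. The class name HexDumpSketch and the method toHexLines are hypothetical, and unlike HexWriter this sketch prints to the console instead of writing a .hex file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for a hex dump tool; this is not the HexWriter source.
class HexDumpSketch {

    // Formats bytes as uppercase hex digits, 16 bytes per line.
    static String toHexLines(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            // Mask with 0xFF so negative byte values print as two hex digits.
            out.append(String.format("%02X", data[i] & 0xFF));
            boolean endOfLine = (i % 16 == 15);
            boolean lastByte = (i == data.length - 1);
            if (endOfLine || lastByte) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java HexDumpSketch <input-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Number of input bytes: " + data.length);
        System.out.print(toHexLines(data));
    }
}
```

Running it as "java HexDumpSketch notepad_utf-8.txt" should report the byte count followed by the hex lines.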
6. To compare the UTF-8 text file created by Notepad with my original UTF-8 file, I need to run my HEX dump program on hello.utf-8:
C:\herong\unicode>java HexWriter hello.utf-8 hello_utf-8.hex
Number of input bytes: 104

C:\herong\unicode>type hello_utf-8.hex
48656C6C6F20636F6D70757465722120
2D20456E676C6973680D0AE794B5E884
91E4BDA0E5A5BDEFBC81202D2053696D
706C6966696564204368696E6573650D
0AE99BBBE885A6E4BDA0E5A5BDEFB997
202D20547261646974696F6E616C2043
68696E6573650D0A
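To check what these hex digits actually spell, the bytes can be decoded back into text. The following is a rough sketch, not part of the original tutorial programs; the class name HexToTextSketch and its methods are hypothetical. It decodes only the first 27 bytes of the dump (the English line and its line ending), because printing the Chinese lines correctly depends on the console encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that turns hex digits back into UTF-8 text.
class HexToTextSketch {

    // Converts a string of hex digits (whitespace allowed) into bytes.
    static byte[] parseHex(String hex) {
        String digits = hex.replaceAll("\\s+", "");
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            String pair = digits.substring(2 * i, 2 * i + 2);
            out[i] = (byte) Integer.parseInt(pair, 16);
        }
        return out;
    }

    // Decodes the bytes described by the hex digits as UTF-8 text.
    static String decodeUtf8(String hex) {
        return new String(parseHex(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First 27 bytes of hello.utf-8, copied from the dump above.
        String firstLine = "48656C6C6F20636F6D70757465722120"
                         + "2D20456E676C6973680D0A";
        // Prints "Hello computer! - English" followed by CR LF.
        System.out.print(decodeUtf8(firstLine));
    }
}
```

The same decodeUtf8 call can be applied to the remaining hex digits; for example, "E794B5E88491" decodes to the two characters U+7535 and U+8111.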
The UTF-8 text file saved by Notepad (107 bytes) is identical to my original UTF-8 text file (104 bytes) except for 3 extra bytes at the beginning, "EFBBBF". If we ignore "EFBBBF", we can say that Notepad saves UTF-8 text files correctly.
So what is this "EFBBBF" and why is it added? See the next section for a brief explanation.
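The comparison above was done by reading two hex dumps side by side. As an illustration only, the same check could be automated along the following lines. This sketch assumes that the only difference to ignore is a leading "EFBBBF"; the class name BomCompareSketch and its methods are hypothetical and are not part of the tutorial's programs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: compares two files while ignoring a leading EF BB BF.
class BomCompareSketch {

    // Returns true if the data starts with the 3 bytes EF BB BF.
    static boolean startsWithBom(byte[] data) {
        return data.length >= 3
            && data[0] == (byte) 0xEF
            && data[1] == (byte) 0xBB
            && data[2] == (byte) 0xBF;
    }

    // Returns the data without a leading EF BB BF, if one is present.
    static byte[] stripBom(byte[] data) {
        return startsWithBom(data)
            ? Arrays.copyOfRange(data, 3, data.length)
            : data;
    }

    // Compares the two byte sequences after removing any leading EF BB BF.
    static boolean sameTextBytes(byte[] a, byte[] b) {
        return Arrays.equals(stripBom(a), stripBom(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java BomCompareSketch <file1> <file2>");
            return;
        }
        byte[] first = Files.readAllBytes(Paths.get(args[0]));
        byte[] second = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(args[0] + " starts with EFBBBF: " + startsWithBom(first));
        System.out.println(args[1] + " starts with EFBBBF: " + startsWithBom(second));
        System.out.println("Same bytes after ignoring EFBBBF: "
            + sameTextBytes(first, second));
    }
}
```

It could be run as "java BomCompareSketch notepad_utf-8.txt hello.utf-8" to confirm the result read from the two dumps.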