Unicode Tutorials - Herong's Tutorial Examples - v5.31, by Herong Yang
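This section provides a quick introduction to the UTF-16BE (Unicode Transformation Format - 16-bit Big Endian) encoding for the Unicode character set. UTF-16BE is a variation of UTF-16.
UTF-16BE: A character encoding that maps each code point of the Unicode character set to a sequence of 2 or 4 bytes. Code points in the range U+0000 - U+FFFF are encoded as one 16-bit unit (2 bytes), and code points in the range U+10000 - U+10FFFF are encoded as a surrogate pair of two 16-bit units (4 bytes). Each 16-bit unit is written with the most significant byte first. UTF-16BE stands for Unicode Transformation Format - 16-bit Big Endian.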
Here is my understanding of the UTF-16BE specification. When UTF-16BE encoding is used to encode (serialize) Unicode characters into a byte stream for communication or storage, the resulting byte stream is identical to the Big-Endian without BOM Format of the UTF-16 encoding.
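For example, these 3 Unicode characters, U+004D, U+0061 and U+10000, will be converted into 0x004D0061D800DC00 when UTF-16BE is used. The first two characters take 2 bytes each (0x004D and 0x0061), while U+10000 takes 4 bytes as the surrogate pair 0xD800 0xDC00.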
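The example above can be reproduced with a short Python sketch. The function name `encode_utf16be` is hypothetical and chosen only for illustration; the sketch assumes the input is a Python string and rejects lone surrogate code points, which cannot be encoded.

```python
def encode_utf16be(text: str) -> bytes:
    """Encode a string as UTF-16BE by hand (illustrative sketch, no BOM)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points are reserved and are not characters.
            raise ValueError(f"cannot encode surrogate code point U+{cp:04X}")
        if cp < 0x10000:
            # One 16-bit unit, most significant byte first.
            out += cp.to_bytes(2, "big")
        else:
            # Split the 20-bit offset into a high and a low surrogate.
            offset = cp - 0x10000
            high = 0xD800 | (offset >> 10)
            low = 0xDC00 | (offset & 0x3FF)
            out += high.to_bytes(2, "big") + low.to_bytes(2, "big")
    return bytes(out)


encoded = encode_utf16be("\u004D\u0061\U00010000")
print(encoded.hex().upper())  # 004D0061D800DC00
```

Python's built-in codec gives the same bytes: `"\u004D\u0061\U00010000".encode("utf-16-be")`.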
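When UTF-16BE encoding is used to decode (deserialize) a byte stream into Unicode characters, the entire stream is divided into blocks of 2 bytes. Each block is converted to a 16-bit integer, assuming the most significant byte comes first. The converted integer stream is then processed as follows: an integer in the range 0x0000 - 0xD7FF or 0xE000 - 0xFFFF is taken directly as the code point of a character, while an integer in the range 0xD800 - 0xDBFF (a high surrogate) is combined with the next integer, which must be in the range 0xDC00 - 0xDFFF (a low surrogate), to form a code point in the range U+10000 - U+10FFFF.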
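The decoding procedure can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `decode_utf16be` is hypothetical, and the choice to raise an error on an unpaired surrogate is an assumption (real decoders may substitute a replacement character instead).

```python
def decode_utf16be(data: bytes) -> str:
    """Decode UTF-16BE bytes by hand (illustrative sketch)."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16BE data must contain an even number of bytes")

    # Step 1: split into 2-byte blocks, most significant byte first.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

    # Step 2: turn the 16-bit integers into code points.
    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("high surrogate without a low surrogate")
            low = units[i + 1]
            cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("low surrogate without a high surrogate")
        else:
            # Any other value is the code point itself.
            cp = unit
            i += 1
        chars.append(chr(cp))
    return "".join(chars)


decoded = decode_utf16be(bytes.fromhex("004D0061D800DC00"))
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+004D', 'U+0061', 'U+10000']
```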
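Note that the use of a BOM (Byte Order Mark) is not part of the UTF-16BE specification. So you should not prepend a BOM to the byte stream when encoding with UTF-16BE, and when decoding with UTF-16BE, you should treat a leading 0xFEFF as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE), not as a byte order mark.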
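As one illustration of the BOM behavior, here is how Python's built-in `utf-16-be` codec handles it, compared with the generic `utf-16` codec. This only shows what one implementation does; other languages and libraries may differ.

```python
text = "\u004D\u0061"

# Encoding with "utf-16-be" writes no BOM in front of the data.
encoded = text.encode("utf-16-be")
print(encoded.hex().upper())  # 004D0061

# Decoding with "utf-16-be" keeps a leading 0xFEFF as the character U+FEFF.
data = bytes.fromhex("FEFF004D0061")
as_utf16be = data.decode("utf-16-be")
print([f"U+{ord(c):04X}" for c in as_utf16be])  # ['U+FEFF', 'U+004D', 'U+0061']

# The generic "utf-16" codec instead consumes the BOM to detect the byte order.
as_utf16 = data.decode("utf-16")
print([f"U+{ord(c):04X}" for c in as_utf16])  # ['U+004D', 'U+0061']
```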