Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Shift-JIS Encoding
This section provides a quick introduction to Shift-JIS encoding, also called MS Kanji. Shift-JIS maps each JIS X0208 character to a 2-byte sequence using a complicated scheme developed by ASCII Corporation together with Microsoft.
Shift-JIS: An encoding for the JIS X0208 character set. It is an 8-bit encoding with 1 or 2 bytes per character:
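Number of Bytes   Byte 1 (valid range)          Byte 2 (valid range)
1                 0x00 - 0x7F (ASCII)
1                 0xA1 - 0xDF (half-width Katakana)
2                 0x81 - 0x9F or 0xE0 - 0xEF    0x40 - 0x7E or 0x80 - 0xFC

Either lead byte range of a 2-byte sequence can be followed by either trail byte range.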
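Because every lead byte falls outside the single-byte ranges, a Shift-JIS byte stream can be split into characters by looking at one byte at a time. The following is a minimal sketch of that idea in Python, assuming only the JIS X0208 ranges from the table above; the function names split_sjis, is_lead_byte and is_trail_byte are hypothetical. It checks byte ranges only, not whether a 2-byte code is actually assigned, and it rejects lead bytes above 0xEF, which Microsoft's code page 932 uses for extensions.

```python
def is_lead_byte(b: int) -> bool:
    """True if b can start a 2-byte sequence (JIS X0208 ranges only)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail_byte(b: int) -> bool:
    """True if b is in a valid range for the second byte of a 2-byte sequence."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift-JIS bytes into one chunk per character.

    Raises ValueError on a byte that fits none of the table's ranges.
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F or 0xA1 <= b <= 0xDF:
            # Single byte: ASCII or half-width Katakana
            chars.append(data[i:i + 1])
            i += 1
        elif is_lead_byte(b):
            # Two bytes: the next byte must be a valid trail byte
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"bad or missing trail byte after 0x{b:02X} at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid byte 0x{b:02X} at offset {i}")
    return chars


if __name__ == "__main__":
    # Python's built-in "shift_jis" codec produces the sample bytes
    sample = "Aｱ日本".encode("shift_jis")
    print([c.hex() for c in split_sjis(sample)])  # prints ['41', 'b1', '93fa', '967b']
```

The sample mixes all three row types of the table: an ASCII letter (1 byte), a half-width Katakana letter (1 byte) and two Kanji (2 bytes each).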
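Shift-JIS, also called MS Kanji, is implemented by Microsoft as code page 932 (CP932), which adds some vendor-specific extensions. The mapping from a JIS X0208 code to its two Shift-JIS bytes is not straightforward: two consecutive JIS rows share one lead byte, and the range of the second byte depends on whether the row number is odd or even. Please read http://en.wikipedia.org/wiki/Shift_JIS for more details.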
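The row/cell arithmetic behind the 2-byte mapping can be sketched in Python. This is a minimal illustration assuming the commonly published formula based on the JIS X0208 row (ku) and cell (ten) numbers, each from 1 to 94; the helpers kuten_to_sjis and kuten_of are hypothetical names. To obtain a character's row and cell, the sketch uses Python's built-in "euc_jp" codec, where each byte of a JIS X0208 character equals its row or cell value plus 0xA0, and it checks every result against the built-in "shift_jis" codec.

```python
def kuten_to_sjis(ku: int, ten: int) -> bytes:
    """Map a JIS X0208 row (ku) and cell (ten), each 1-94, to 2 Shift-JIS bytes."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must both be in 1..94")
    # Two consecutive rows share one lead byte:
    # rows 1-62 -> 0x81-0x9F, rows 63-94 -> 0xE0-0xEF
    if ku <= 62:
        lead = (ku + 0x101) >> 1
    else:
        lead = (ku + 0x181) >> 1
    if ku % 2 == 1:
        # Odd row: cells 1-94 -> 0x40-0x9E, skipping 0x7F
        trail = ten + 0x3F
        if trail >= 0x7F:
            trail += 1
    else:
        # Even row: cells 1-94 -> 0x9F-0xFC
        trail = ten + 0x9E
    return bytes([lead, trail])


def kuten_of(ch: str) -> tuple[int, int]:
    """Return the JIS X0208 (ku, ten) of ch, derived from its EUC-JP bytes."""
    euc = ch.encode("euc_jp")
    # JIS X0208 characters are exactly 2 bytes in EUC-JP, both 0xA1-0xFE
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X0208 character")
    return euc[0] - 0xA0, euc[1] - 0xA0


if __name__ == "__main__":
    for ch in "日本語ア":
        ku, ten = kuten_of(ch)
        sjis = kuten_to_sjis(ku, ten)
        # Cross-check the formula against Python's built-in codec
        assert sjis == ch.encode("shift_jis")
        print(f"{ch} kuten {ku:02d}-{ten:02d} -> 0x{sjis.hex().upper()}")
```

For example, 日 is row 38, cell 92: the even row gives lead byte (38 + 0x101) >> 1 = 0x93 and trail byte 92 + 0x9E = 0xFA, so its Shift-JIS code is 0x93FA.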