Unicode Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 5.00

UTF-8 (Unicode Transformation Format - 8-Bit)

This chapter provides notes and tutorial examples on UTF-8 encoding. Topics include an introduction to UTF-8 encoding, examples of encoded byte streams, and the UTF-8 encoding algorithm.

UTF-8 Encoding

UTF-8 Encoding Algorithm

Features of UTF-8 Encoding

Conclusions:

  • UTF-8 is a variable-length Unicode character encoding that uses 8-bit (1-byte) code units.
  • UTF-8 is compatible with ASCII encoding: every ASCII character is encoded as the same single byte. It is very efficient for Western language characters.
  • UTF-8 is not so efficient for CJK (Chinese, Japanese and Korean) language characters, which are encoded into 3 bytes per character most of the time.
  • The maximum number of encoded bytes per character is 4 for characters in the latest version of the Unicode character set - Unicode 5.0.
  • RFC 3629, "UTF-8, a transformation format of ISO 10646", gives the official specification of UTF-8 encoding.
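
The byte counts stated in the conclusions above can be checked with a short script. The following is a minimal sketch in Python, which is an assumption of this example and not the language used by the tutorial. The helper name utf8_length and the sample characters are hypothetical and chosen only for illustration.

```python
# Minimal sketch: count the UTF-8 bytes used by a few sample characters.
# The helper name and the sample characters are illustrative assumptions,
# not part of the original tutorial.

def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

# (character, description) pairs, written with escapes to avoid
# depending on the encoding of this source file.
SAMPLES = [
    ("A", "ASCII letter"),
    ("\u00e9", "Latin small letter e with acute"),
    ("\u4e2d", "CJK ideograph"),
    ("\U00010348", "Gothic letter hwair, outside the Basic Multilingual Plane"),
]

if __name__ == "__main__":
    for ch, description in SAMPLES:
        encoded = ch.encode("utf-8")
        hex_bytes = " ".join(f"{b:02X}" for b in encoded)
        print(f"U+{ord(ch):04X} ({description}): "
              f"{utf8_length(ch)} byte(s): {hex_bytes}")
```

With these samples, the ASCII letter takes 1 byte, the accented Latin letter 2 bytes, the CJK ideograph 3 bytes, and the character outside the Basic Multilingual Plane 4 bytes. The ASCII letter also encodes to the same byte value it has in ASCII, which is what the compatibility claim amounts to.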

Dr. Herong Yang, updated in 2009