JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

Supported Character Encodings in JDK

This section provides a tutorial example on how to display a list of the character encodings supported by the JDK, using the java.nio.charset.Charset.availableCharsets() method.

The JDK uses the java.nio.charset.Charset class to represent a character encoding. Each Charset object has an encode() method and a decode() method. The class also provides a static method, availableCharsets(), which returns a sorted map of all supported encodings, keyed by canonical name. Here is a program that displays all the character encodings supported by the JDK:

/**
 * Encodings.java
 * Copyright (c) 2002 by Dr. Herong Yang
 */
import java.nio.charset.*;
import java.util.*;
class Encodings {
   public static void main(String[] arg) {
      // Canonical charset names mapped to Charset objects, sorted by name
      SortedMap<String, Charset> m = Charset.availableCharsets();
      System.out.println("Canonical name, Display name,"
         +" Can encode, Aliases");
      for (Map.Entry<String, Charset> entry : m.entrySet()) {
         String n = entry.getKey();
         Charset e = entry.getValue();
         String d = e.displayName();
         boolean c = e.canEncode();
         System.out.print(n+", "+d+", "+c);
         // Append every alias registered for this charset
         for (String a : e.aliases()) {
            System.out.print(", "+a);
         }
         System.out.println();
      }
   }
}

Output (the exact list depends on the JDK version and platform):

Canonical name, Display name, Can encode, Aliases
Big5, Big5, true, csBig5
Big5-HKSCS, Big5-HKSCS, true, big5-hkscs, Big5_HKSCS, big5hkscs
EUC-CN, EUC-CN, true
EUC-JP, EUC-JP, true, eucjis, x-eucjp, csEUCPkdFmtjapanese, eucjp, 
   Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc-jp, euc_jp
euc-jp-linux, euc-jp-linux, true, euc_jp_linux
EUC-KR, EUC-KR, true, ksc5601, 5601, ksc5601_1987, ksc_5601, 
   ksc5601-1987, euc_kr, ks_c_5601-1987, euckr, csEUCKR
EUC-TW, EUC-TW, true, cns11643, euc_tw, euctw
GB18030, GB18030, true, gb18030-2000
GBK, GBK, true, GBK
ISCII91, ISCII91, true, iscii, ST_SEV_358-88, iso-ir-153, 
   csISO153GOST1976874
ISO-2022-CN-CNS, ISO-2022-CN-CNS, true, ISO2022CN_CNS
ISO-2022-CN-GB, ISO-2022-CN-GB, true, ISO2022CN_GB
ISO-2022-KR, ISO-2022-KR, true, ISO2022KR, csISO2022KR
ISO-8859-1, ISO-8859-1, true, iso-ir-100, 8859_1, ISO_8859-1, ISO8859_1,
   819, csISOLatin1, IBM-819, ISO_8859-1:1987, latin1, cp819, ISO8859-1,
   IBM819, ISO_8859_1, l1
ISO-8859-13, ISO-8859-13, true
ISO-8859-15, ISO-8859-15, true, 8859_15, csISOlatin9, IBM923, cp923, 
   923, L9, IBM-923, ISO8859-15, LATIN9, ISO_8859-15, LATIN0, 
   csISOlatin0, ISO8859_15_FDIS, ISO-8859-15
ISO-8859-2, ISO-8859-2, true
ISO-8859-3, ISO-8859-3, true
ISO-8859-4, ISO-8859-4, true
ISO-8859-5, ISO-8859-5, true
ISO-8859-6, ISO-8859-6, true
ISO-8859-7, ISO-8859-7, true
ISO-8859-8, ISO-8859-8, true
ISO-8859-9, ISO-8859-9, true
JIS0201, JIS0201, true, X0201, JIS_X0201, csHalfWidthKatakana
JIS0208, JIS0208, true, JIS_C6626-1983, csISO87JISX0208, x0208, 
   JIS_X0208-1983, iso-ir-87
JIS0212, JIS0212, true, jis_x0212-1990, x0212, iso-ir-159, 
   csISO159JISC02121990
Johab, Johab, true, ms1361, ksc5601_1992, ksc5601-1992
KOI8-R, KOI8-R, true
Shift_JIS, Shift_JIS, true, shift-jis, x-sjis, ms_kanji, shift_jis, 
   csShiftJIS, sjis, pck
TIS-620, TIS-620, true
US-ASCII, US-ASCII, true, IBM367, ISO646-US, ANSI_X3.4-1986, cp367, 
   ASCII, iso_646.irv:1983, 646, us, iso-ir-6, csASCII, ANSI_X3.4-1968,
   ISO_646.irv:1991
UTF-16, UTF-16, true, UTF_16
UTF-16BE, UTF-16BE, true, X-UTF-16BE, UTF_16BE, ISO-10646-UCS-2
UTF-16LE, UTF-16LE, true, UTF_16LE, X-UTF-16LE
UTF-8, UTF-8, true, UTF8
windows-1250, windows-1250, true
windows-1251, windows-1251, true
windows-1252, windows-1252, true, cp1252
windows-1253, windows-1253, true
windows-1254, windows-1254, true
windows-1255, windows-1255, true
windows-1256, windows-1256, true
windows-1257, windows-1257, true
windows-1258, windows-1258, true
windows-936, windows-936, true, ms936, ms_936
windows-949, windows-949, true, ms_949, ms949
windows-950, windows-950, true, ms950
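
The alias lists in the output above suggest that a charset can be looked up by any of its names. The following is a minimal sketch of that idea, not part of the original tutorial: the class name AliasLookup and the helper methods canonicalName() and roundTrip() are hypothetical names chosen for illustration. It relies only on Charset.isSupported(), Charset.forName(), name(), encode() and decode() from java.nio.charset.Charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class AliasLookup {
   // Hypothetical helper: resolves a charset name or alias to its
   // canonical name, or returns null if this JDK does not support it.
   static String canonicalName(String nameOrAlias) {
      if (!Charset.isSupported(nameOrAlias)) {
         return null;
      }
      return Charset.forName(nameOrAlias).name();
   }

   // Hypothetical helper: encodes text to bytes with the named charset,
   // then decodes the bytes back to a string. Characters the charset
   // cannot represent are replaced during encoding, so the result
   // equals the input only when every character is encodable.
   static String roundTrip(String nameOrAlias, String text) {
      Charset cs = Charset.forName(nameOrAlias);
      ByteBuffer bytes = cs.encode(text);
      return cs.decode(bytes).toString();
   }

   public static void main(String[] args) {
      // Aliases taken from the output listing above
      String[] names = {"latin1", "UTF8", "ASCII", "no-such-charset"};
      for (String n : names) {
         System.out.println(n + " -> " + canonicalName(n));
      }
      System.out.println(roundTrip("UTF8", "caf\u00e9"));
   }
}
```

On a JDK that registers the same aliases as the listing above, latin1 resolves to ISO-8859-1 and UTF8 resolves to UTF-8. A legal but unknown name makes canonicalName() return null instead of letting Charset.forName() throw UnsupportedCharsetException.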

Last update: 2006.

Dr. Herong Yang, updated in 2008