This section describes the first derived datatype of 'token': 'language'. Input strings are converted to 'token' values before they are matched against the '[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*' pattern.
In the XSD 1.1 specification, "token" is used to derive several other built-in datatypes
for specific applications, because it has a clean value set: no tabs or line breaks, no leading or trailing spaces, and no runs of two or more spaces.
The first built-in datatype derived from "token" is "language". Let's look at it now.
"language" is a datatype derived from the "token" datatype
by limiting its values to those that satisfy this regular expression pattern:
"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*".
With this definition, not all sequences of characters are valid "language" lexical representations.
To validate and evaluate "language" lexical representations, you can use these two steps:
First, convert the input lexical representation into an intermediate "token" value.
Then, match the whole "token" value against "[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*".
If a match is found, the input lexical representation is a valid "language" lexical representation and the "token" value is the "language" value.
If no match is found, the input lexical representation is not a valid "language" lexical representation.
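The two steps above can be sketched in Python. This is only an illustration, not part of any XSD processor: the function names "to_token" and "parse_language" are hypothetical, and the "token" conversion is approximated by replacing tabs and line breaks with spaces, collapsing runs of spaces, and trimming both ends. XSD patterns are implicitly anchored, so the sketch uses a full match.

```python
import re

# Pattern quoted in the text; fullmatch() makes it cover the whole value.
LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def to_token(lexical):
    """Step 1 (approximation): normalize whitespace the way "token" does."""
    # Replace tab, line feed and carriage return with a space.
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    # Collapse runs of spaces, then remove leading and trailing spaces.
    return re.sub(r" +", " ", replaced).strip(" ")

def parse_language(lexical):
    """Step 2: return the "language" value, or None if the input is invalid."""
    token = to_token(lexical)
    if LANGUAGE_PATTERN.fullmatch(token):
        return token
    return None

if __name__ == "__main__":
    for text in ["  en-GB\n", "nan-Hant-TW", "en US", "abcdefghi"]:
        print(repr(text), "->", parse_language(text))
```

Under these assumptions, "  en-GB" followed by a line break is accepted as "en-GB", while "en US" is rejected because a space remains inside the "token" value, and "abcdefghi" is rejected because its first part is longer than 8 letters.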
The "language" datatype is designed primarily to support the "xml:lang" attribute in the XML 1.1 specification,
which allows users to specify the language in which the content of an XML element is written.
Here is an example of two <p> elements, one written in British English and one in US English:
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
Other supported language codes are defined in the "IETF BCP 47" standard.
Some examples are listed below:
en-US For US English
fr-CA For Canadian French
pt-BR For Brazilian Portuguese
zh-Hans For Chinese written in Simplified Chinese script
zh-Hant For Chinese written in Traditional Chinese script
nan-Hant-TW For Min Nan Chinese written in Traditional Chinese script, as used in Taiwan
The global attribute "lang" in HTML documents is a good example of using "language" values.
Here is a sample XSD document that defines a sub element <Language> to use "language" values: