Re: WWW and non-English (was ISO charsets; Unicode)

Peter Svanberg (psv@nada.kth.se)
Tue, 27 Sep 1994 11:46:08 +0100


Quoting: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
>
> The document at CERN specifies language codes as ISO 639 with
> optional ISO 3166 country codes to specify a national variant.
:
> Do they cover enough? For example can you specify things like
> Patagonian Welsh (ha!). What about historical languages? The
> country may be called something different, or (in the general
> case) the area of use of a language or language variant may not
> mesh well with a specific modern country.

This was discussed during the work on the Language Tag for MIME
etc. that I mentioned (draft-ietf-mailext-lang-tag-00.txt). This
was the result (and I quote):

1. The Language tag

The language tag is composed of 2 parts: A language tag and a
subtag.

The syntax of this header is:

Language-Header ::= 'Content-language:' Language [',' Language]...
Language ::= ALPHA*8 [ '-' ALPHA*8 ]

The namespace of language tags and subtags is administered by the
IANA. The following registrations are predefined:

In the language tag:

- All 2-letter codes are interpreted according to ISO 639.

- All 3-letter codes are reserved for the (hopefully)
forthcoming revision to ISO 639

- The value "IANA" is reserved for IANA-defined
subregistrations

- The value "X" is reserved for private use. Subtags of "X"
will not be registered by the IANA.

- No other registration is allowed.

In the sublanguage tag:

- All 2-letter codes are interpreted as ISO 3166 country codes,
according to the rules laid down in ISO 639.

- Codes of 3 to 8 letters may be registered with the IANA by
anyone who feels a need for it. IANA has the right to reject
registrations that are felt to be misleading.

The information in the sublanguage tag may for instance be:

- Country identification, such as en-US (this usage is
described in ISO 639)

- Dialect information, such as no-NYNORSK or en-COCKNEY

- Languages not listed in ISO 639, which can be registered with
the IANA prefix, such as IANA-CHEROKEE

If multiple languages are used in the MIME body part, they are
listed with commas between them.
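
To make the quoted syntax concrete, here is a minimal Python sketch
of a parser for it. It is an illustration only and not part of the
draft. The function names are hypothetical, and two things are my
assumptions: that "ALPHA*8" means 1 to 8 letters, and that tags are
compared without regard to case.

```python
import re

# Assumption: "ALPHA*8" in the quoted grammar means 1 to 8 ASCII letters.
_LANGUAGE = re.compile(r"([A-Za-z]{1,8})(?:-([A-Za-z]{1,8}))?")


def parse_content_language(header):
    """Split a Content-language header into (tag, subtag) pairs.

    subtag is None when the language has no '-' part.
    """
    name, sep, value = header.partition(":")
    if not sep or name.strip().lower() != "content-language":
        raise ValueError("not a Content-language header")
    pairs = []
    for item in value.split(","):
        match = _LANGUAGE.fullmatch(item.strip())
        if match is None:
            raise ValueError("malformed language: %r" % item)
        pairs.append((match.group(1), match.group(2)))
    return pairs


def classify(tag, subtag=None):
    """Label a pair according to the predefined registrations quoted above."""
    upper = tag.upper()  # assumption: case is not significant
    if upper == "X":
        tag_kind = "private use"
    elif upper == "IANA":
        tag_kind = "IANA-defined subregistration"
    elif len(tag) == 2:
        tag_kind = "ISO 639 language"
    elif len(tag) == 3:
        tag_kind = "reserved for ISO 639 revision"
    else:
        tag_kind = "not allowed"

    if subtag is None:
        subtag_kind = None
    elif upper in ("X", "IANA"):
        # Under "X" or "IANA" the subtag is not a country code.
        subtag_kind = "defined under " + upper
    elif len(subtag) == 2:
        subtag_kind = "ISO 3166 country"
    elif len(subtag) >= 3:
        subtag_kind = "IANA-registered subtag"
    else:
        # The quoted text says nothing about 1-letter subtags.
        subtag_kind = "unspecified"
    return tag_kind, subtag_kind
```

For instance, parse_content_language("Content-language: en-US,
no-NYNORSK") returns [("en", "US"), ("no", "NYNORSK")], and
classify("en", "US") labels the pair as an ISO 639 language with an
ISO 3166 country.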

And, later:

3. Usage examples

Examples of protocol usage of this header are:

- WWW selection of an appropriate version of information for
display, based on a profile for the user listing languages
that are understood

- MIME usage of alternate body parts in E-mail
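
The WWW case could look roughly like the Python sketch below. The
draft excerpt does not define any matching rule, so everything here
is an assumption of mine: the function name choose_version is
hypothetical, the profile is taken to be ordered by preference, an
exact match on a tag is tried first, and otherwise any version with
the same first part (before the '-') is accepted.

```python
def choose_version(available, profile):
    """Pick one of the available language tags for a user.

    available: tags of the versions the server holds, e.g. ["en-US", "sv"]
    profile:   tags the user understands, most preferred first
    Returns the chosen tag exactly as given in available, or None.
    """
    # Assumption: tags are compared without regard to case.
    by_lower = {tag.lower(): tag for tag in available}
    for wanted in profile:
        key = wanted.lower()
        if key in by_lower:
            return by_lower[key]  # exact match on tag and subtag
        primary = key.split("-")[0]
        for tag in available:
            if tag.lower().split("-")[0] == primary:
                return tag  # same language, different or missing subtag
    return None
```

With versions ["en-US", "sv", "no-NYNORSK"] and a profile of
["no", "en"], this sketch picks "no-NYNORSK". A real server might
well prefer a different rule, for example never substituting one
dialect for another.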

So *this* part of the work is already done; it just needs to be used.

---
Peter Svanberg				    Email: psv@nada.kth.se
Dept of Num An & CS,
Royal Inst of Tech			    Phone: +46 8 790 71 40
S-100 44  Stockholm, SWEDEN		    Fax:   +46 8 790 09 30