Re: iso 8859 or escape sequencies?

Daniel W. Connolly (connolly@hal.com)
Tue, 12 Apr 1994 09:59:52 -0500


In message <9404121346.AA24247@freya.let.rug.nl>, Bert Bos writes:
>I think the question of ISO Latin-1 character entities in HTML can be
>summarized as follows:
>
> The following are all equivalent:
>
> 1) &ouml;
> 2) &#246;
> 3) the-8-bit-code-for-o-with-umlaut-that-my-mailer-refuses

True. The equivalence between (1) and (2) is via the definition
of ouml in the version of the ISOlat1 entity set used in HTML:
<!ENTITY ouml "&#246;" -- small o, dieresis or umlaut mark -->
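
The (1)/(2) equivalence can be checked with a minimal sketch. It assumes Python's standard `html.unescape` as a stand-in for entity resolution; that function follows HTML5 rules rather than a 1994 SGML parser, so it only illustrates that both references name the same character.

```python
import html

# html.unescape implements HTML5 entity handling, not an SGML parser;
# it is used here only to show that both references name one character.
named = html.unescape("&ouml;")     # entity reference, case (1)
numeric = html.unescape("&#246;")   # numeric character reference, case (2)

print(named == numeric)  # True
print(ord(named))        # 246
```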

The equivalence between (2) and (3) is via the document character set
in the SGML declaration:

    BASESET "ISO Registration Number 100//CHARSET
             ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
    DESCSET 128 32 UNUSED
            160 95 32
            255 1 UNUSED
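
As a rough illustration of how the DESCSET lines tie (2) to (3), the sketch below expands each entry into a table from document character numbers to base-set numbers. The `expand` helper and the tuple layout are hypothetical, not part of any SGML tool. It assumes the base set is a 96-character set numbered 32 to 127 that occupies the upper half of the 8-bit code, so base-set number n corresponds to byte n + 128.

```python
# Each entry: (first document character number, count, first base-set number).
# None stands for UNUSED. Values are copied from the declaration excerpt.
DESCSET = [(128, 32, None), (160, 95, 32), (255, 1, None)]

def expand(descset):
    """Hypothetical helper: map document numbers to base-set numbers."""
    table = {}
    for start, count, base in descset:
        for i in range(count):
            table[start + i] = None if base is None else base + i
    return table

table = expand(DESCSET)
base_number = table[246]                          # 32 + (246 - 160) = 118
# Assumption: the right part of Latin-1 sits in the upper half of the code.
raw = bytes([base_number + 128]).decode("latin-1")

print(base_number)  # 118
print(raw == "\u00f6")  # True: &#246; and the 8-bit code agree
```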

>I understand that HTTP is defined as 8-bit clean, but is the same true
>of HTML or HTML+? It should be, of course, but I don't think it is in
>the DTD. (I may be misreading the <!SGML declaration, though.)

The intent of the <!SGML declaration for HTML was to say "HTML is
defined in terms of the 8-bit character set ISO Latin-1." I think I
made a couple of mistakes in expressing that. For example, sgmls complains
when I use &#255; in an HTML document. I think it's responding correctly
to the
    255 1 UNUSED
line. I think that line should be taken out. But I don't fully grok SGML
character set declarations yet, so I haven't nailed it down fully.
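
The sgmls complaint about &#255; can be reproduced with a small sketch. `is_described` is a hypothetical helper, and `REPAIRED` is only one possible fix, assumed here for illustration: it drops the 255 line and extends the mapped range from 95 to 96 characters, since dropping the line alone would leave 255 undescribed. Only the upper-half entries from the excerpt are modeled.

```python
# Entries are (first document number, count, first base-set number or None).
DRAFT = [(128, 32, None), (160, 95, 32), (255, 1, None)]
# Assumption: one possible repair, not something the message confirms.
REPAIRED = [(128, 32, None), (160, 96, 32)]

def is_described(code, descset):
    """True if `code` lies in a range mapped to a base-set number."""
    return any(
        base is not None and start <= code < start + count
        for start, count, base in descset
    )

print(is_described(246, DRAFT))     # True
print(is_described(255, DRAFT))     # False: &#255; is rejected
print(is_described(255, REPAIRED))  # True
```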

Dan