Re: Baffling math problems [Was: HTML 3.0 DTD ]

Daniel W. Connolly (connolly@hal.com)
Mon, 12 Dec 1994 12:47:40 -0600


In message <9412121829.AA11308@dragget.hpl.hp.com>, Dave Raggett writes:
>Dan,
>
>Thanks for checking the details.
>
>I am still uncertain about how best to handle the latin-1 entities.
>I changed the name from %ISOlat1 to %HTMLlat1 following a suggestion
>by Terry (or was it Paul?). I would expect this file to include entity
>names for the Latin-1 character codes below 128 and hence would include
>&amp; and &quot; etc. Why were these omitted from the 2.0 spec?

Take care not to confuse the "Added Latin 1" entity set (from an
appendix to the SGML spec, ISO 8879) with the Latin 1 character set
(defined by ISO-8859-1).

&amp; and &quot; are not in the "Added Latin 1" entity set -- they're
in the iso-num set ("ISO 8879-1986//ENTITIES Numeric and Special
Graphic//EN"). But the rest of iso-num isn't used in HTML, so the few
definitions for amp, quot, lt, etc. are inlined in html.dtd.
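
As a rough sketch of what "inlined" means here, the declarations in
html.dtd could look something like the following. The CDATA form, the
numeric references (ASCII code positions) and the comment text are
assumptions for illustration, not a quote from the DTD:

```sgml
<!-- illustrative only: names from the text above, values assumed -->
<!ENTITY amp  CDATA "&#38;" -- ampersand    -->
<!ENTITY lt   CDATA "&#60;" -- less than    -->
<!ENTITY gt   CDATA "&#62;" -- greater than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->
```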

The Added Latin 1 entity set defines a bunch of names for Latin 1
characters. The SGML spec appendix that defines it makes no reference
to the Latin 1 character set (ISO-8859-1). It maps those names to
these thingies called SDATA entities -- system dependent data
entities. I believe the intention is that the SDATA entities are
supposed to be replaced on a per-SGML-system basis. So you might
see a TeX version of "ISO 8879-1986//ENTITIES Added Latin 1//EN", with:

<!ENTITY eacute SDATA "\eacute" -- for TeX -->

Since the document character set for HTML includes all the characters
referred to by those names, there's no need to use system-specific
mappings. The entities can be mapped to characters within the document
character set.
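
For comparison with the TeX example, an HTML-specific version of the
same declaration might map the name to a numeric character reference
instead of SDATA. This is only a sketch: it assumes the declaration uses
CDATA and that e-acute sits at code position 233, as it does in
ISO-8859-1; the comment text is made up:

```sgml
<!-- illustrative only: 233 is e-acute in ISO-8859-1 -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
```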

In response to the same feedback you saw, this set of definitions is
now called:

"ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"

See:
http://www.hal.com/%7Econnolly/html-spec/html-pubtext.html

for details.
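
Roughly, a DTD would pull in the set under its new public identifier
with a parameter entity declaration and reference, along these lines.
The parameter entity name HTMLlat1 is borrowed from Dave's message;
whether html.dtd uses exactly that name is an assumption:

```sgml
<!-- illustrative only: parameter entity name is assumed -->
<!ENTITY % HTMLlat1 PUBLIC
    "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%HTMLlat1;
```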

Dan