<draft-ietf-iiir-html-01.txt, .ps> to be deleted.

W. Eliot Kimber (drmacro@vnet.IBM.COM)
Tue, 15 Feb 1994 13:24:16 --100


Ref: Your note of Mon, 14 Feb 1994 18:57:33 -0600 (attached)

At the risk of letting pedantry sidetrack what appears to be a
productive dialog, I think it's important to clarify a few points
that are central to the issue of making HTML+ a real SGML
application.

| >> I think the HTML-Plus does a good job of getting a lot of interesting
| >> issues on the table, but its approach of throwing all the stuff into
| >> one DTD, and making the DTD extensible (thereby forcing clients to
| >> know how to _parse_ SGML DTD's) is a little off track.
| >
| >Actually, once you state that HTML is an SGML format, then formally each
| >document can extend the DTD.
|
| Nope. I took great pains in the specification to prevent WWW clients
| from having to deal with anything but _instances_ of the DTD I wrote:
|
| <!-- Regarding clause 6.1, SGML Document:
|
| [1] SGML document = SGML document entity,
| (SGML subdocument entity |
| SGML text entity | non-SGML data entity)*
|
| The role of SGML document entity is filled by this DTD,
| followed by the conventional HTML data stream.
| -->

I'm afraid that on this point there can be no compromise. If a document
is an SGML document then it *must* start with a DOCTYPE declaration
and include the document element *in the same entity*. The definition
of SGML document entity is quite clear on this. In fact, it is impossible
to know whether or not a given stream of data is valid SGML *unless*
there is a doctype declaration (and an SGML declaration, which
may be implied by the processing system).

Therefore, if you have an SGML application it *must* parse DTDs. There
are certain constraints you can apply, such as only recognizing certain
GIs (and issuing application error messages when other GIs are declared),
but you must be able to resolve entity references. Anything less is
"SGML-like".

But this shouldn't be a problem because parsing DTDs is about the
only really easy part of SGML, and building an entity resolution table
is easy. Having written crude parsers myself in Rexx and C, I have
a hard time buying the argument that such parsing is a burdensome
requirement to place on browsers. Surely real-time formatting is a
much harder problem to solve. Note that there is no requirement that
a conforming parser be a *validating* parser, nor that it support
the optional features of OMITTAG, SHORTTAG, and the like.
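
As a rough illustration of how small an entity resolution table can
be, here is a sketch in Python. It is not a conforming SGML parser:
it assumes only the two simplest declaration forms (a quoted
replacement text, or SYSTEM followed by a quoted system identifier),
ignores parameter entities and PUBLIC identifiers, and the function
names build_entity_table and expand_text_entities are made up for
this example.

```python
import re

# Assumption: only  <!ENTITY name "text">  and
# <!ENTITY name SYSTEM "sysid" ...>  are recognized.
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+([A-Za-z][A-Za-z0-9.-]*)\s+(SYSTEM\s+)?(["\'])(.*?)\3[^>]*>',
    re.IGNORECASE | re.DOTALL,
)

# A general entity reference: &name; (the closing ";" is treated as optional).
ENTITY_REF = re.compile(r'&([A-Za-z][A-Za-z0-9.-]*);?')


def build_entity_table(dtd_subset):
    """Return {name: (kind, value)} for each entity declared in the subset."""
    table = {}
    for name, system, _quote, value in ENTITY_DECL.findall(dtd_subset):
        kind = "system" if system else "text"
        # In SGML the first declaration of an entity name is the one that counts.
        table.setdefault(name, (kind, value))
    return table


def expand_text_entities(text, table):
    """Replace references to internal text entities; leave all others as-is."""
    def replace(match):
        entry = table.get(match.group(1))
        if entry is None or entry[0] != "text":
            return match.group(0)
        return entry[1]
    return ENTITY_REF.sub(replace, text)
```

References to external (SYSTEM) entities are deliberately left
untouched here, since fetching them is the application's business
rather than the table's.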

Note that given this requirement, you can cheat a little and refuse
to parse anything except entity declarations in DTD subsets. In the
interest of compromise I, for one, would look the other way, for what
it's worth. I don't think it's unreasonable for an application like
Mosaic, which is trying to stay as simple and easy to implement as
possible, to say "look, we just can't handle new element declarations
at the document level--the DTD's built in", as long as it does
require and recognize the doctype declaration. This would make
a minimal HTML+ document something like:

<!DOCTYPE HTMLPlus PUBLIC "-//WWW//DTD HTMLPlus/version 0.0.0//EN">
<HTMLPlus>
...
</HTMLPlus>

Note that by using the public identifier for the DTD, you can define
the "public part of the DTD" that the public ID maps to as the
hard-coded, unchanging parsing algorithms built into the browser,
rather than as a literal file containing element declarations
(this is analogous to applications using a "compiled" DTD). I see
no difference between a DTD that is compiled into an application-specific
object and a DTD that is compiled into procedural code.
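
A sketch of what "require and recognize the doctype declaration"
with a built-in DTD might look like, in Python. Everything named here
is an assumption for illustration: the BUILTIN_DTDS table, the
"builtin-htmlplus" label standing in for the browser's hard-coded
parsing routines, and the select_builtin_dtd function. It only
handles a PUBLIC identifier with no system identifier, and it finds
the end of a declaration subset naively (a "]" inside a literal
would confuse it).

```python
import re

# Hypothetical mapping from public identifier to the browser's built-in
# handling for that DTD; no DTD file is ever read.
BUILTIN_DTDS = {
    "-//WWW//DTD HTMLPlus/version 0.0.0//EN": "builtin-htmlplus",
}

DOCTYPE = re.compile(
    r'\s*<!DOCTYPE\s+(?P<name>[A-Za-z][A-Za-z0-9.-]*)\s+PUBLIC\s+'
    r'(?P<q>["\'])(?P<pubid>.*?)(?P=q)\s*(?:\[(?P<subset>.*?)\])?\s*>',
    re.IGNORECASE | re.DOTALL,
)


def select_builtin_dtd(document):
    """Return (handler, declaration_subset, rest_of_document).

    Raises ValueError if the document entity does not begin with a
    doctype declaration, or if its public identifier is not built in.
    """
    match = DOCTYPE.match(document)
    if match is None:
        raise ValueError("no DOCTYPE declaration at start of document entity")
    pubid = match.group("pubid")
    if pubid not in BUILTIN_DTDS:
        raise ValueError("unsupported public identifier: " + pubid)
    subset = match.group("subset") or ""
    return BUILTIN_DTDS[pubid], subset, document[match.end():]
```

The returned declaration subset, if any, is where a browser taking
the shortcut described above would look for entity declarations and
nothing else.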

Certainly I would prefer to see the browsers be more general, but I
am willing to admit (if grudgingly) that practical considerations
may outweigh concerns for complete correctness and generality.

| >I have investigated HyTime compliance with Yuri Rubinsky and Elliot Kimber
| >(Dr Macro), and know how to add this in. At the moment though, most people
| >in the WWW community see little value in switching to a model which forces
| >you to declare hypertext links at the start of the document.
|
| There are ways to exploit HyTime without using <!ENTITYs for all
| links. More on that later too...

I'd like to know what these ways are. The use of entities for hyperlinking
is really a base function of SGML that HyTime merely exploits. In other
words, a reference to a separate document in an SGML context can only
(in the view of the pedant) be expressed as a data entity reference (by
the rule that if a mechanism exists in the standard, you must use that
mechanism). This is fundamental to SGML. The one-level indirection
provided by entity-name/system-ID mapping is essential to any hope of
system and application independence for data, with the two-level indirection
of entity-name/public-ID/system-ID crucial for complete system and
application independence and wide interoperation.

You could, I suppose, use notation locations (notloc) in place of
entity references, but by my "in the standard" rule, the pedant in
me would have to object. Thus you could cast a URL-based link
as something like:

<p>See
<notloc id=book-1-loc notation=URL>ULR//FTP::/a/b/c/</notloc>
<a target=book-1-loc>this book</a> for more

Where the Notloc element is an inclusion at the document element,
and thus valid anywhere and otherwise transparent to the main content
processing routine (not to mention that record ends caused purely
by the notloc are not treated as data). Note that HyTime does let
you specify the constraint that forward references are not allowed,
removing the need to do lookahead to resolve references. Or you
could instead make Notloc only valid within the link elements themselves,
ensuring that it's in a predictable place (there's no real functional
difference between an attribute and a required subelement).

Once you can resolve an entity reference, resolving a reference to a
particular location within a referenced entity isn't that much more
difficult (you merely pass two values to your resolution function rather
than one). If you can link to IDs within the document you're browsing,
you should be able to link to IDs within any referenced document
entity with equal ease. Having done this, you've implemented as much
HyTime as you'll need for most online browsers. An indirect HyTime
link to an ID in another document is nothing more than a two-element
address where one element is the entity name (resolved to a system location)
and the other element is a target ID. Any system that does cross-book
linking to specific locations must already support some form of two-element
address, so doing it in a HyTime-compliant fashion should only involve
mapping the HyTime syntax onto the existing mechanism. You may also need
to define constraints on the complexity of HyTime expressions supported,
which is reasonable and expected (which is one of the reasons there are
so many options in HyTime).
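
The two-element address described above can be sketched in Python as
a resolution function that takes both values. This is an illustration
under stated assumptions, not HyTime syntax: resolve_link and
find_element_by_id are hypothetical names, the entity table is a
plain mapping from entity name to system identifier, the fetch
argument is whatever retrieval routine the browser already has, and
the ID lookup is a crude pattern match on start-tags rather than a
real parse.

```python
import re


def find_element_by_id(document_text, target_id):
    """Return the first start-tag whose id attribute equals target_id, or None.

    Matching is case-insensitive, on the assumption that names are
    case-folded as in the reference concrete syntax.
    """
    pattern = re.compile(
        r'<[A-Za-z][^<>]*?\bid\s*=\s*["\']?' + re.escape(target_id)
        + r'(?=["\'\s>])[^<>]*>',
        re.IGNORECASE,
    )
    match = pattern.search(document_text)
    return match.group(0) if match else None


def resolve_link(entity_name, target_id, entity_table, fetch):
    """Resolve the two-element address (entity name, target ID)."""
    system_id = entity_table[entity_name]   # entity name -> system location
    text = fetch(system_id)                 # caller-supplied retrieval function
    return find_element_by_id(text, target_id)
```

For example, with an entity table of {"book1": "book1.sgm"} and a
fetch function that returns the text of book1.sgm,
resolve_link("book1", "sec-2", table, fetch) looks up the element
with id=sec-2 in that document. A link within the current document is
the same lookup with the fetch step skipped.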

<aside>
Please note that my e-mail address has changed. As of 2/14 I am
no longer an IBM employee. I can be reached at kimber@passage.com
or drmacro@aol.com.
</aside>

--
<address id=drmacro HyTime=bibloc>
Eliot Kimber                      Internet: kimber@passage.com
Passage Systems                      Phone: 1-512-339-3618
9971 Quail Blvd, Suite 903     AltInternet: drmacro@aol.com
Austin, TX 78758
</address>