Evolution of HTML and other specs [Was: Browser support of HTML 2.0 ]

Daniel W. Connolly (connolly@hal.com)
Thu, 26 Jan 1995 19:19:56 +0100


In message <950125150731.5a@plewe.cit.buffalo.edu>, Brandon Plewe writes:
>While I salivate for HTML 3, and watch everyone clamor for this or that to
>be supported, I am reminded that currently, most browsers don't even support
>all of HTML 2 as it was designed (or as the spec says it should be handled).

The key there is "should," and not "must."

Your suggestions are perfectly reasonable "enhancement requests" of
current browsers, but they can't reasonably be called "defect
reports."

>I don't mean to sound argumentative, or negative about the current browsers.
>I think they're wonderful--I'm just wondering if there are plans for these
>"already existing" features; and if not, why are they in the RFC?

Already existing in what way?

I agree that "suggested rendering" is a tricky part of the RFC.

I'm a minimalist. My original proposal for an HTML specification was
to specify _only_ the syntax; i.e., enough information to decide
whether a sequence of characters constitutes a valid HTML document or
not, and if it does, enough info to make some sort of parse tree out
of it. (It was about 10 pages long, way back in Jan 1993.) So the
first step would be to get all the various implementations to agree
how to parse comments and attribute values, for example.

The question of what you _do_ with the resulting parse tree would
be outside the scope of that specification.
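
To make "some sort of parse tree" concrete, here is a minimal sketch,
not anything taken from the proposal itself. It assumes present-day
Python and its standard html.parser module, which is a lenient
tokenizer rather than a validating SGML parser, so it illustrates only
the tree-building half of the idea and not the validity check. The
names Node, TreeBuilder, parse and outline are hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """One element in the parse tree (hypothetical structure)."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes or text strings


class TreeBuilder(HTMLParser):
    """Builds a tree of Nodes; says nothing about rendering."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, along with any
        # unclosed elements nested inside it; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep text as-is, skipping whitespace-only runs.
        if data.strip():
            self.stack[-1].children.append(data)


def parse(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        if isinstance(child, Node):
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * (depth + 1) + repr(child))
    return lines


tree = parse('<h1 align="center">Title</h1><p>Body <!-- note --> text</p>')
print("\n".join(outline(tree)))
```

Note that the comment never reaches the tree, and that nothing here
decides how an h1 should look; that is exactly the part the syntax
specification would leave out.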

But folks weren't happy with that. They wanted something that says
"The H1 tag should be in a bigger font and have space above and
below." Blech. Now you're talking about a browser specification, not a
specification of the HTML language. And until you have some formalism
like DSSSL, it's very difficult to specify this sort of behaviour with
any level of rigor.

And they wanted a specification of ISMAP behaviour, and how you encode
forms data in application/x-www-form-urlencoded format, and ...
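
For readers who have not met the forms encoding: a rough illustration,
assuming present-day Python and its standard urllib.parse module (the
field names and values are made up). Fields are joined as name=value
pairs separated by "&", spaces become "+", and reserved characters are
percent-escaped.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, for illustration only.
fields = {"name": "Dan Connolly", "q": "a&b=c"}

# Encode the fields the way a browser would submit them.
encoded = urlencode(fields)
print(encoded)

# Decoding recovers the original values; each name maps to a list,
# since a form may submit the same field more than once.
decoded = parse_qs(encoded)
print(decoded)
```

This is browser and server behaviour, not document syntax, which is
why it sits awkwardly inside a language specification.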

In the future, I'm going to push very hard to split the existing HTML
specification into a set of smaller, more concise RFCs:

"HTML Syntax"
-- how to decide whether a sequence of characters is
a valid HTML document, and if so, how to create
a parse tree.

"Interpretation of HTML Idioms"
-- an informal description of the meaning and suggested
rendering of an HTML parse tree.

"The text/html Internet Media Type"
-- registration of HTML as a MIME type. Charset issues.
Newline Issues. Appendices specifically addressing
SMTP transport and HTTP transport issues. Security
issues.

"World-Wide Web User Agents"
-- Specific techniques: basic HREF links, ISINDEX, FORMS, ISMAP,
.mailcap, $WWW_HOME, mailto:, proxies, security issues.
Suggestions for documentation, default configuration, etc.

"World-Wide Web Hypermedia Architecture"
-- formal discussion of the WWW hypertext model: documents,
anchors, links, searching.
Formal discussion of common abstractions from ftp, http,
gopher, WAIS, etc. Definition of correct caching/proxy
behavior.

(All these in addition to the URL and HTTP RFCs. The Common Gateway
Interface (CGI) needs an official maintainer somewhere too.)

The job of revising the HTML 2.0 document to accommodate the proposed
HTML 3.0 features looks completely overwhelming. But as we revise the
2.0 spec, I suggest we split it up as above. The job of revising
any one of the above documents w.r.t. HTML 3.0 is a manageable task.

Here are the outstanding/upcoming issues I see in each area:

"HTML Syntax"
-- DTD changes for new elements.
ISO character entities, and how they show up in
the parse tree.
Perhaps we allow <!entity > declarations,
marked sections, and a few other SGML syntactic idioms.
RAST representation of parse tree for conformance
testing.

"Interpretation of HTML Idioms"
Table rendering. Figures. Super/subscript. DSSSL-Lite.
Toolbars (next/previous/up). Vendor- and
application-specific extensions.

"The text/html Internet Media Type"
Character sets, versions, levels, format negotiation
issues. Vendor- and application-specific extensions.

"World-Wide Web User Agents"
File upload. Embedded presentation. Mandatory display
of copyrights. Display of security information.
Desktop message bus (CCI/OLE/Tooltalk/AppleEvents).
Distributed editing, annotation,
and other forms of collaboration. Perhaps advances
in resource discovery technology (e.g. harvest,
verity) will have user interface implications.

"World-Wide Web Hypermedia Architecture"
Link relationships, embedding, compound
document architecture. The web as a knowledge base.
isomorphisms with HyTime. Publishing model (URNs/URCs,
copyright, payment, replication, authentication,
access control).

HTTP:
Security. Variations on Proxy: no-cache.
Session management, and application-level
packets. Transactions.
Desktop message-bus, UDP version of the protocol.

Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<connolly@hal.com> http://www.hal.com/%7Econnolly