Deploying new versions [Was: Versioning HTML at the server]

Daniel W. Connolly (connolly@hal.com)
Thu, 27 Oct 1994 16:40:27 -0500


In message <9410271613.AA20768@homer.spry.com>, cwilson@spry.com writes:
> Hey, if somebody comes up with a cool new
>feature, great, let's work it into HTML, if it can be agreed upon. If
>not, then it probably doesn't belong in HTML. Realistically speaking,
>we're not arguing about "experimental" tags here - I couldn't care
>less if someone puts a document on their server and serves it up as
>"text/html" with a few tags that they're trying out in their new
>browser. We're talking about the process by which HTML itself
>changes. I strongly believe that we MUST keep a strong core standard
>in order for the WWW to not degenerate into chaos in terms of document
>format standards ...

Yes, let's talk about this "process by which HTML itself changes"
... and this applies to HTTP as well, I think.

Chris: none of this is aimed at you. I'm mostly just thinking out
loud. It seems that it's not enough to have this format negotiation
stuff sketched out on info.cern.ch -- we have to hash out the details
periodically on this list.

A careful reading of the WWW design documents shows that it was
designed with experience of evolving distributed systems in mind.
Multiple formats are allowed. HTTP has versions. ...

But the existing mechanisms for evolving the WWW architecture just
don't seem to get the job done. Or maybe it's not the mechanisms as
designed, but how they are deployed that breaks down.

The criterion that I would use to test a mechanism for evolving the
protocols and data formats is:

* Once a piece of software has been implemented correctly
with respect to version N, it should interoperate correctly
with other pieces of software which are implemented correctly
with respect to versions >= N.

I think the prime example of breaking this rule was the introduction
of FORMs. WWW browsers that used to be up-to-spec were suddenly
rendered broken because they didn't do forms. Had html-with-forms been
introduced as a new content type, they would have continued working
merrily along, and servers could automatically send the plain text,
email-reply version of a form to clients that didn't Accept:
text/html-with-forms.
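To make the idea concrete, here is a minimal sketch of a server picking
between a forms version and a plain text version of the same document
based on the Accept header. The function names and the variant list are
hypothetical, and the parsing deliberately ignores wildcards and
quality values; it is an illustration, not how any deployed server
works.

```python
def accepted_types(accept_header):
    """Return the set of media types named in an Accept header value."""
    types = set()
    for item in accept_header.split(","):
        # Drop any parameters such as ";q=0.5" and normalize case.
        media_type = item.split(";")[0].strip().lower()
        if media_type:
            types.add(media_type)
    return types


def choose_variant(accept_header, variants):
    """Pick the first variant whose content type the client accepts.

    `variants` is an ordered list of (content_type, body) pairs, most
    preferred first. The last entry is used as the fallback.
    """
    accepted = accepted_types(accept_header)
    for content_type, body in variants:
        if content_type in accepted:
            return content_type, body
    return variants[-1]


# Hypothetical variants of one document.
variants = [
    ("text/html-with-forms", "<form>...</form>"),
    ("text/plain", "To respond, reply by email with the following fields: ..."),
]
```

A client that sends "Accept: text/html, text/plain" gets the plain text
variant; one that also lists text/html-with-forms gets the form.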

OK, water under the bridge. But let's try not to do it again!

Most of the Mozilla extensions to HTML don't break anything. But if
CENTER can really be safely ignored, then you can't rely on it causing
a paragraph break. In other words, you shouldn't write:

<h1>heading</h1>

Some stuff.

<center>Some centered stuff</center>.

because on browsers that only support HTML 2.0, the two paragraphs
will be blurred together as one. You should write:

<h1>heading</h1>

Some stuff.

<center><p>Some centered stuff</center>.

Tables are more like forms. The NCSA 2.5 browser should explicitly
Accept: text/html-ncsa-2.5 or some such, and there should be an easy
way for information providers to communicate to their server software
the fact that a given document has tables in it, like using a .thtml
extension. Granted, .thtml is a short-term hack that doesn't scale,
but it's better than breaking existing clients.

Eventually, server software should be enhanced to efficiently open the
file and find some magic cookie (like a <!DOCTYPE declaration...) in
the first few hundred bytes before deciding on the content type. Or
some other efficient, but easy-to-maintain format negotiation
mechanism should be deployed. I don't have enough experience to design
an optimal solution right now, but that's no excuse for folks to go
breaking existing clients. (I'll say it again: don't break existing
clients!)
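A rough sketch of the magic cookie idea, assuming the server reads at
most 512 bytes and looks for a DOCTYPE public identifier. The byte
limit, the mapping table, and the "-//EXAMPLE//..." identifier are all
assumptions made up for illustration; only the HTML 2.0 identifier and
the text/html-ncsa-2.5 name come from elsewhere.

```python
import re

SNIFF_BYTES = 512  # "the first few hundred bytes"; the exact figure is an assumption

# Hypothetical mapping from DOCTYPE public identifiers to content types.
DOCTYPE_TO_TYPE = {
    "-//IETF//DTD HTML 2.0//EN": "text/html",
    "-//EXAMPLE//DTD HTML with tables//EN": "text/html-ncsa-2.5",
}

DOCTYPE_RE = re.compile(rb'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def sniff_content_type(head, default="text/html"):
    """Guess a content type from the first bytes of a document."""
    match = DOCTYPE_RE.search(head[:SNIFF_BYTES])
    if match is None:
        return default
    public_id = match.group(1).decode("ascii", "replace")
    # Unknown identifiers fall back to the default rather than failing.
    return DOCTYPE_TO_TYPE.get(public_id, default)


def content_type_for_file(path):
    """Read only the head of the file, so large documents stay cheap."""
    with open(path, "rb") as f:
        return sniff_content_type(f.read(SNIFF_BYTES))
```

Documents without a recognizable declaration keep being served as
text/html, so nothing changes for existing files.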

Now let's look at the HTTP version mechanism. It seems explicit enough
that it should provide interoperability across versions. But let's
look at real scenarios:

Suppose HTTP 2.0 includes the ability to conduct multiple transactions
over the same connection.

So an HTTP 2.0 client connects to server.host:80 and writes:

GET / HTTP/2.0
Accept: text/html, text/plain

GET /foo.html HTTP/2.0

Now it happens that server.host is running an HTTP 1.0 server.
On seeing HTTP/2.0 in the request, should it:

* error out immediately
* attempt to service the first request, responding with
HTTP/1.0 0200 ca va bien
Content-Type: text/html

<title>...

and close the connection

And how do currently deployed servers respond in this case?
My guess is that a lot of them don't even look at the HTTP
version. A quick test agrees with this:

connolly@ulua ../logs[566] telnet www.hal.com 80
Trying 148.57.2.14 ...
Connected to hal-alt-backbone.
Escape character is '^]'.
GET / HTTP/2.0
Accept: text/html

HTTP/1.0 200 OK
Date: Thursday, 27-Oct-94 20:35:46 GMT
Server: NCSA/1.3
MIME-version: 1.0
Content-type: text/html
Last-modified: Tuesday, 30-Aug-94 07:22:28 GMT
Content-length: 1161

<title>HaL WWW External Tree</title>

OK... this is good enough. Our hypothetical 2.0 client can see the 1.0
response and assume that all requests after the first were ignored.

So it seems there are a couple of heuristics that everybody should know:

In an HTTP 1.0 server, any request with a version > 1.0 should
be treated as a 1.0 request, rather than causing an error.
(The request may cause an error for other reasons than the
version number, of course.)

Any HTTP requests for versions >= 1.0 must function syntactically
as HTTP 1.0 requests, since when a client sends a request,
it can't tell what version of the protocol the server implements.
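The first heuristic might be sketched like this for a server that only
speaks 1.0. The function names and the SERVER_VERSION constant are
hypothetical; the point is only that a higher version number in the
request line is clamped rather than rejected.

```python
SERVER_VERSION = (1, 0)  # the highest version this sketch server implements


def parse_request_line(line):
    """Split 'GET /path HTTP/x.y' into (method, path, (major, minor))."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError("malformed request line: %r" % line)
    method, path, version = parts
    major, _, minor = version[len("HTTP/"):].partition(".")
    return method, path, (int(major), int(minor or 0))


def effective_version(requested):
    """Version the server will actually speak: never more than its own."""
    return min(requested, SERVER_VERSION)
```

With this, "GET / HTTP/2.0" parses to version (2, 0) and is served as a
(1, 0) request, which matches what the NCSA/1.3 test above appears to
do.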

By the way... does the idea of the server gratuitously packing up
inline images with html files into a mime multipart/mixed body really
look like a good idea to anybody? What about client-side image
caching? What about per-image errors?

Doesn't it make more sense to just allow multiple transactions over
the same connection? For example:

C: GET /foo.html HTTP/2.0
Accept: text/html, text/plain
S: HTTP/2.0 0200 it is my pleasure to serve you this document...
Content-Type: text/html
Content-Transfer-Encoding: packet

43
<title>foo</title><img src="images/bar.gif">
-1
HTTP/2.0 0206 what next?
C: GET /images/bar.gif HTTP/2.0
Accept: image/gif, image/x-xbm
S: HTTP/2.0 0200 a picture is worth a thousand words...
Content-Type: image/gif
Content-Transfer-Encoding: packet

4006
... 4006 bytes of image data ...
-1
HTTP/2.0 0306 15 Other folks are waiting Come back in 15 seconds.
S: <closes connection>

* This example looks half-duplex, but the protocol should be full
duplex, e.g. the client can begin sending requests for inline images
etc. before the server has finished sending the original html
file. The client might send 15 requests, and the server might serve 10
of them and then close the connection.

* Henrik once suggested that we shouldn't rely on the requests and
responses staying in order -- a multi-threaded server might finish
requests in a strange order. This motivates transaction ids of some
sort. Perhaps:

C: GET /foo.html HTTP/2.0 1
C: GET /bar.html HTTP/2.0 2
...
S: HTTP/2.0 0200 2 here's the response to request 2
S: HTTP/2.0 0200 1 here's the response to request 1
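
A client-side sketch of how such transaction ids might be used to match
responses that arrive out of order. It assumes the id is the last token
of the request line and the token after the status code in the
response, as in the lines above; the class and method names are
hypothetical, and none of this is part of any HTTP specification.

```python
def parse_status_line(line):
    """Split 'HTTP/2.0 0200 2 reason...' into (version, status, txn_id, reason)."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        raise ValueError("malformed status line: %r" % line)
    version, status, txn_id = parts[:3]
    reason = parts[3] if len(parts) > 3 else ""
    return version, int(status), int(txn_id), reason


class PendingRequests:
    """Track requests sent on one connection until each is answered."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}  # transaction id -> requested path

    def send(self, path):
        """Register a request and return the request line to write."""
        txn_id = self._next_id
        self._next_id += 1
        self._pending[txn_id] = path
        return "GET %s HTTP/2.0 %d" % (path, txn_id)

    def receive(self, status_line):
        """Match a response to its request; return (path, status)."""
        _, status, txn_id, _ = parse_status_line(status_line)
        path = self._pending.pop(txn_id)  # KeyError if the id is unknown
        return path, status

    def unanswered(self):
        """Paths still waiting for a response, e.g. after an early close."""
        return [self._pending[k] for k in sorted(self._pending)]
```

If the server closes the connection early, unanswered() tells the
client which requests to retry on a new connection.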

Anyway... just rambling, mostly, I guess.

Dan