Re: meta information

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Wed, 08 Jun 1994 15:06:31 -0700


Peter Deutsch wrote:

> How about if we say that:
>
> MARK-UP <= META-INFORMATION
>
> This implies that we can have metainfo about a document
> that is not necessarily included in the view of the
> document provided to Internet-based clients for rendering or
> display.

Yep, that is (almost) always true, just as there is always meaningful
information about a paper document which never makes it onto the paper.

> By extension this implies that we will need alternate
> mechanisms (presumably at the protocol level) to allow us
> to extract this info. In many cases this seems the better
> route to take and one that seems not to be getting enough
> attention in this discussion.

This implies that we *would like additional* mechanisms -- mechanisms
which may be useful but which are not considered necessary to the
identification of the object. This is the role of collections/libraries/
search engines for non-minimal URC information. However, that also is not
what the original discussion was about.

>...
>
> Frankly I'd rather query a server for certain other kinds
> of info when I need it, not pile it all into the document
> where it will be excess baggage most of the time. This
> looks like a nice way to do it.

This is where the problem lies. The query server must get that information
from somewhere. Presumably, it should originate from an authoritative
source (in fact, the original author), although others may add to that
information at a later time. With HTML, the author has a choice of
embedding some (not necessarily all) metainformation within the document.
In doing so, they keep the information close and make it easier to keep
that information consistent with the document. In fact, practice has shown
that metainfo stored separately from the document rarely stays consistent.

> I've a real aversion to piling lots of metainfo into a
> document for all sorts of reasons. It certainly makes the
> document larger. I suspect it also makes it harder to
> maintain and harder to keep multiple copies consistent.

The last sentence is simply not the case, as has been proven by every software
development project since the 50s.

> The way I see it, what is delivered to Internet-literate
> clients should be merely one view of a document (based
> upon the degree of output formatting control we want,
> etc). I think as a model, the server should be responsible
> for managing the various info we might have on a document
> and let the client indicate what it needs or can handle at
> any particular time.

That seems reasonable and there is no reason that it can't be done
right now even with all the metainformation embedded in the document.
The server could easily remove it before transmission to the client.
However, the reality is that it's much faster for the server not to
parse the document and just spit it down the pipe. Future servers
(perhaps based on an object model where frequently-referenced documents
are held in memory pre-parsed) may live in a different reality.
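
As a rough illustration of the stripping step described above, here is a
minimal Python sketch of a server-side filter that removes META elements
before a document is sent. The function name strip_meta, the sample
document, and the "owner" value are hypothetical, and the NAME/CONTENT
attribute names are an assumption (they are not spelled out in this
message). The regular expression is deliberately naive (it would misfire
on a ">" inside an attribute value), which also hints at why not parsing
at all is the cheaper path for a server.

```python
import re

# Naive pattern: a META start tag plus any trailing whitespace and newline.
# Assumption: attribute values never contain a literal ">".
META_TAG = re.compile(r"<meta\b[^>]*>[ \t]*\r?\n?", re.IGNORECASE)


def strip_meta(html_text: str) -> str:
    """Return the document with every META element removed."""
    return META_TAG.sub("", html_text)


# Hypothetical document with embedded metainformation.
SAMPLE = (
    "<HTML><HEAD>\n"
    "<TITLE>Example</TITLE>\n"
    '<META NAME="owner" CONTENT="fielding@ics.uci.edu">\n'
    "</HEAD><BODY><P>Body text.</P></BODY></HTML>\n"
)

if __name__ == "__main__":
    print(strip_meta(SAMPLE))
```

Even this small filter has to scan every byte of the document on every
request, which is the cost being weighed against simply sending the file.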

In any case, that is also not what we were talking about. What is needed
is a mechanism for embedding open-ended metainformation within existing
HTML 2.0 documents such that they can be fed to existing HTTP servers and
existing HTML-consuming clients won't barf, while at the same time the metainfo
is available to other HTML-consuming clients that can make use of it.
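
To make that requirement concrete, the following Python sketch shows a
client that does make use of embedded metainfo: it collects META elements
and ignores everything else, while a client that knows nothing about META
can simply skip the unknown tags. The class and function names, the sample
document, and the "owner" and "reply-to" names are hypothetical, and the
NAME/CONTENT attribute pair is an assumption not stated in this message.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from META elements only."""

    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        if "name" in fields and "content" in fields:
            self.meta.append((fields["name"], fields["content"]))


def collect_meta(html_text):
    """Return the embedded metainformation as a list of (name, content)."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


# Hypothetical document; the metainfo names are illustrative only.
SAMPLE = (
    "<HTML><HEAD>\n"
    "<TITLE>Example</TITLE>\n"
    '<META NAME="owner" CONTENT="fielding@ics.uci.edu">\n'
    '<META NAME="reply-to" CONTENT="fielding@ics.uci.edu">\n'
    "</HEAD><BODY><P>Body text.</P></BODY></HTML>\n"
)

if __name__ == "__main__":
    for name, content in collect_meta(SAMPLE):
        print(name, "=", content)
```

Because the set of names is not fixed in the parser, new kinds of
metainformation can be added to documents without changing this code,
which is the open-ended property asked for above.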

Over time, what will be needed is a way to embed some metainformation
which must be authoritative and thus must reside with the original document.
This includes information like document owner (who is responsible for
maintaining it), a reply-to address (how do I get in touch with them),
and other semantic information that allows the author to describe how the
document is intended to relate to others (necessary for semantic navigation
and printing). All of these should be supportable by HTML 3.0 (assuming the
spec doesn't change radically from that of HTML+).

> Assuming people buy this model, I think it implies we
> probably need to provide the equivalent of a "COPY"
> command, which would pack up and include the meta info
> when delivering a document so that its total state can be
> preserved when needed. Meanwhile we should focus on
> keeping both the transfer protocol and basic display
> markup simple, with equally simple extensions for getting
> to more complex formats when needed. At that point, we can
> profit from the efforts of others where appropriate.

Simplicity was exactly why the META element was defined (otherwise,
I would have just used a malformed LINK element). However, I think
there is sufficient need for some information that a general mechanism
for embedding metainfo (associated with a general mechanism for hinting
that it should be included in a response header) is necessary for long-
term maintenance of the Web. I think Tim's proposal for using "http="
attributes is sufficient to cover both.
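
A sketch of how a server might act on such a hint, written in Python
under several assumptions: the attribute is looked up as "http" because
that is how it is written in this message, with "http-equiv" (the spelling
used in the HTML 2.0 specification) accepted as an alternative; the
CONTENT attribute, the function and class names, and the sample values are
hypothetical. META elements without the hint stay in the document as
ordinary embedded metainfo and produce no header.

```python
from html.parser import HTMLParser


class HeaderHints(HTMLParser):
    """Collect META elements that ask to be sent as response headers."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        # Assumption: the hint attribute is "http" or "http-equiv".
        field_name = fields.get("http") or fields.get("http-equiv")
        content = fields.get("content")
        if field_name and content is not None:
            self.headers.append((field_name, content))


def response_headers(html_text):
    """Return (header-name, value) pairs hinted at by the document."""
    parser = HeaderHints()
    parser.feed(html_text)
    parser.close()
    return parser.headers


# Hypothetical document mixing hinted and non-hinted metainfo.
SAMPLE = (
    "<HEAD>\n"
    '<META HTTP="Reply-To" CONTENT="fielding@ics.uci.edu">\n'
    '<META HTTP-EQUIV="Keywords" CONTENT="metainfo">\n'
    '<META NAME="owner" CONTENT="fielding@ics.uci.edu">\n'
    "</HEAD>\n"
)

if __name__ == "__main__":
    for name, value in response_headers(SAMPLE):
        print(f"{name}: {value}")
```

The same element thus serves both purposes: it is general embedded
metainfo for clients that read the document, and a hint for servers that
choose to parse it.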

....Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
<A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>