Re: meta information

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Mon, 06 Jun 1994 02:22:56 -0700


Tim writes:

> A few points on this topic.
>
> 1. Everything inside the HEAD is metainformation. That's what
> the head is for.

Yep.

> 2. The relationship between elements in the HTML HEAD and the HTTP
> headers is referred to in the HTTP spec which suggests allowing
> any header WWW-xxx where xxx is an HTML element. For example,
> HTTP gains WWW-Link: http://foo/bar rel=whatever by this method.
> If we have HTML headers which are defined on HTTP headers, we
> will have the dog chasing its own tail fairly quickly.
> Suppose we hypothesize that *anything* in an HTML HEAD element
> should be expressible in an HTTP header. This is true if anything
> about an HTML body can also apply to GIF. In that case, the HTTP
> becomes the master spec to which HTML should refer. It also
> suggests that HTTP might have to evolve faster than HTML.

While it is true that anything inside HEAD *could* be expressed as an
HTTP header, I can imagine many metainfo items which can and should be
embedded in the document but which should not be copied into http headers
(e.g. local index items). However, I think that fits nicely with your
suggestion below.

Is it necessary for the HTTP spec to define all headers? Certainly
the spec is needed for those involving client-server negotiation, but I
think the purely informational headers should be freely extensible,
with eventual standards laid out for those which are commonly useful
(e.g. URC stuff).

BTW, has there been a decision on whether or not the "WWW-" prefix
should be used? I vote no (if I have a vote ;) because such things are
always more trouble than they are worth and prevent consistency
between protocols. For example, I want

<REPLY-TO>fielding@ics.uci.edu (Roy Fielding)</REPLY-TO>

to generate:

Reply-to: fielding@ics.uci.edu (Roy Fielding)

and not WWW-Reply-To. [Since we are on the topic, I think "Reply-to" is
one response header that should be added to the spec, with syntax and semantics
equivalent to its definition in rfc1036. But then I suppose this discussion
should be on www-talk ... *sigh*]

> 3. I agree with both camps about how this shouldn't be done!
> I don't like the idea of changing a DTD every time someone
> wants to add a new HTTP element, as I can see those elements
> becoming very many, and having many headers used only by
> local communities (Content-Shelf-Number:, X-Compuserve-Xref:, etc)
> Can we have some way of tying them together? How about an
> architectural form? Or an attribute?
>
> <EXPIRES http-equiv="expires"> Jan 10,,,</EXPIRES>
>
> This "http-equiv" attribute binds the element to the header.
> It means that if you know the semantics of the http header
> "Expires" then you can process the contents based on a
> well defined syntactic mapping, whether or not your DTD
> tells you anything about it. So you can use it with
> "META" if really HTML has no use for the element, or
> (experimentally, walking in fear of the SGML purists)
> with your experimental element name, or when an
> element name has been defined, then using the element
> name in a new DTD.

That looks like an excellent compromise, although we will have to wait
for certain browsers to be fixed before we can make full use of it.
Can we just use "http=" instead of "http-equiv="? (I'm a lazy typist)
Should this attribute be added to existing HEAD elements (e.g. TITLE)?

I would then define the META element as follows:

<!--
The META element can be used to embed document metainformation not
defined by other HTML elements for use by servers/clients capable
of extracting that information. Although it is generally preferable
to use named elements which have well-defined semantics (e.g. TITLE),
this element is provided for situations where strict SGML parsing is
necessary but the local DTD is not extensible.

HTTP servers should read the document head (between <HEAD> and </HEAD>)
to generate response headers corresponding to any elements defining a
value for the attribute "http".

e.g. if the document contains:

<meta http="Expires" content="Tue, 04 Dec 1993 21:29:02 GMT">
<keywords http="Keywords">Fred, Barney, Wilma</keywords>

The server would include the headers:

Expires: Tue, 04 Dec 1993 21:29:02 GMT
Keywords: Fred, Barney, Wilma

as part of the HTTP response to a GET or HEAD request for that document.
When the http attribute is not present, the server should not generate
an HTTP header for this metainformation; e.g.

<meta name="IndexType" content="Service">

would not generate an HTTP header but would still allow clients or
other tools to make use of that metainformation.
-->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
id ID #IMPLIED -- to allow meta info --
http CDATA #IMPLIED -- HTTP header name --
name CDATA #IMPLIED -- metainformation name (if no http) --
content CDATA #IMPLIED -- associated value -->
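
A minimal sketch of the server-side behaviour described in the comment above, assuming the proposed "http" attribute and a lenient tag scanner rather than a strict SGML parser. The names HeadHeaderExtractor and response_headers are hypothetical, chosen only for illustration; Python's html.parser stands in for whatever a real server would use.

```python
from html.parser import HTMLParser


class HeadHeaderExtractor(HTMLParser):
    """Collect (header-name, value) pairs from HEAD elements that carry
    the proposed "http" attribute. Elements without it are ignored."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.headers = []
        self._pending = None  # (header name, text chunks) for a non-empty element

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        attrs = dict(attrs)
        name = attrs.get("http")
        if name is None:
            return  # e.g. <meta name=... content=...>: no header generated
        value = attrs.get("content")
        if value is not None:
            # Empty element such as META: the value is in "content".
            self.headers.append((name, value))
        else:
            # Named element such as KEYWORDS: the value is the element text.
            self._pending = (name, [])

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[1].append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            name, chunks = self._pending
            self.headers.append((name, "".join(chunks).strip()))
            self._pending = None
        if tag == "head":
            self.in_head = False


def response_headers(document):
    """Return the header lines a server might add for this document."""
    parser = HeadHeaderExtractor()
    parser.feed(document)
    parser.close()
    return ["%s: %s" % (name, value) for name, value in parser.headers]
```

Fed the two example elements from the comment, response_headers would yield the Expires and Keywords lines shown there, while the name="IndexType" element would produce nothing.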

> I can imagine defining an SGML architectural form which
> would allow one to specify that the http-equiv attribute
> had this significance in this DTD, though the treatment
> of unrecognised headers is something which a DTD can't
> even think about.

Hmmm...one step at a time I think.

> 4. If we have such a mapping, then we have to make sure that
> the order of elements in the HEAD has the same significance
> as the order of elements in the HTTP headers, which is
> (I think) currently not the case: HTML HEAD specifically
> specifies an *unordered* list of meta information,
> whereas I bet there are millions of constraints people
> have put on the order of RFC822 headers, including for
> example that MIME-Version must precede all MIME headers.

There are? Actually, the MIME-Version constraint is the only one I know
of (and even that one isn't necessary). In most cases, rfc822 headers
are parsed in a first pass (using leading whitespace to detect continuation
lines and the first colon to separate the name from the value). The content
is only looked at after parsing is completed and duplicate headers are
combined. In other words, I think the spec should specify it as "unordered".
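
The parsing approach described above can be sketched roughly as follows. This is an illustration only: the function name parse_headers is hypothetical, and joining duplicate headers with ", " and folding names to lower case are assumptions of the sketch, not something the message specifies.

```python
def parse_headers(text):
    """Parse an rfc822-style header block into a dict mapping the
    lower-cased header name to its (combined) value."""
    fields = []  # [name, value] pairs in order of appearance
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line[0] in " \t":
            # Leading whitespace marks a continuation of the previous header.
            if fields:
                fields[-1][1] += " " + line.strip()
            continue
        # Only the first colon separates the name from the value.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; skipped in this sketch
        fields.append([name.strip(), value.strip()])

    # Values are examined only after parsing; duplicates are combined here
    # (assumption: joined with ", ").
    combined = {}
    for name, value in fields:
        key = name.lower()
        if key in combined:
            combined[key] += ", " + value
        else:
            combined[key] = value
    return combined
```

Because the result is keyed by name, two header blocks that differ only in the relative order of distinct headers produce the same dictionary, which is the sense in which the headers are "unordered".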

> 5. If we move toward http headers as the defining point for
> metainformation, we are using a syntax which is poor
> in structure. This should not worry us. I see no reason
> for constraining HTTP headers to be unstructured, so if
> we find a need for the odd {bracket} which will of course
> map fine onto SGML, we should not be put off.
>
> MIC-header: {
> blah:
> blah:
> MIC-header: }
>
> or whatever.

Ummm, errrr, well .... it would break my simple parser, but then I've
never been one to let old technologies get in the way of the new.
In that case, nested headers would have to be strictly ordered and combining
duplicates would be bad (i.e. we wouldn't be talking about rfc822 headers
at all).

....Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
<A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>