Re: meta information

Daniel W. Connolly (connolly@hal.com)
Fri, 03 Jun 1994 11:34:25 -0500


In message <9406030136.aa18659@paris.ics.uci.edu>, "Roy T. Fielding" writes:
>Dan wrote:
>
>> Gee... this thing has really blown up. I see three issues:
>>
>> 1. How does the author express stuff that the server should
>> use and stick in the HTTP headers? My answer:
>> <EXPIRES http>...</expires>
>> or, until implementations are fixed,
>> <EXPIRES http content="...">
>
>How does the author know that the only purpose for that is the HTTP headers?
>It is possible that multiple tools may be applied to that information.
>Should the authors have to change all of their HTML files every time a new tool
>is introduced?

The expires tag above says two things:
"This document expires on ..."
and
"Please use this expiration date in the HTTP headers."

If new tools are looking for the expiration date of a document, this
markup will work for them too.
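
To make that concrete, here is a minimal sketch in Python of how a
server-side tool might read the element. The names ExpiresFinder and
expires_header, the sample date, and the choice of Python's html.parser
are all assumptions for illustration; no existing server is claimed to
work this way.

```python
from html.parser import HTMLParser


class ExpiresFinder(HTMLParser):
    """Looks for <EXPIRES http content="..."> and remembers what it says."""

    def __init__(self):
        super().__init__()
        self.expires = None      # the date string, if the element was seen
        self.for_http = False    # True if the author set the "http" flag

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag == "expires":
            attrs = dict(attrs)
            self.expires = attrs.get("content")
            self.for_http = "http" in attrs


def expires_header(doc):
    """Return an ("Expires", date) pair for the HTTP headers, or None."""
    finder = ExpiresFinder()
    finder.feed(doc)
    finder.close()
    if finder.expires and finder.for_http:
        return ("Expires", finder.expires)
    return None


# Hypothetical document; the date is made up for the example.
DOC = '<HEAD><EXPIRES http content="Sat, 04 Jun 1994 00:00:00 GMT"></HEAD>'
print(expires_header(DOC))
```

A tool that only wants the expiration date can read the "expires"
field and ignore the flag, which is the sense in which the same markup
serves both readers.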

If new tools need more information than this, then yes, the document
will have to be edited. If there is something that these hypothetical
tools have in common (such as a relational structure), then we can
plan for them. But if they're just arbitrary hypothetical tools, then
I don't see the value in standardizing on a way to give them
information.

>>
>> 3. New element names. It has worked so far.
>
>Side note: They don't work at all when the elements contain content
> which should not be rendered as normal text. Wouldn't it
> be nice if we had a general mechanism for telling clients
> to ignore a particular element's content if the tag is unknown?
> I guess we'll have to wait for full SGML parsing within the client.

I'm going to try to make it clear to HTML implementors that:
Inside the HEAD element, skip everything except elements
you recognize. (This could get tricky... it means we
can never introduce complex structure in the HEAD, really)

Inside the BODY element, skip tags of elements you don't recognize,
but use their content as if the tags weren't there.

This creates some real limitations as to what we can add to HTML in
the future. For example, if a document contained a heavily marked-up
structure such as an equation, it might turn to gibberish when the
tags are removed. Hence the need for "levels" in HTML... but that's
another discussion altogether...
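
The two skipping rules above, and the way they mangle heavy markup,
can be sketched in a few lines of Python. This is only an illustration
under stated assumptions: the set KNOWN_HEAD, the FRAC/NUM/DEN element
names, and the helper read are hypothetical, and html.parser stands in
for whatever tokenizer a client really uses.

```python
from html.parser import HTMLParser

# Hypothetical set of HEAD elements this client recognizes.
KNOWN_HEAD = {"title"}


class LenientReader(HTMLParser):
    """HEAD: keep only recognized elements. BODY: drop tags, keep text."""

    def __init__(self):
        super().__init__()
        self.section = None       # "head", "body", or None
        self.current_head = None  # recognized HEAD element now open, if any
        self.head = {}            # element name -> text content
        self.body_text = []       # text chunks found inside BODY

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif self.section == "head":
            # Unknown HEAD elements leave current_head unset, so their
            # content is skipped along with their tags.
            self.current_head = tag if tag in KNOWN_HEAD else None
        # In BODY, every tag is dropped; only handle_data keeps anything.

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None
            self.current_head = None
        elif self.section == "head":
            self.current_head = None

    def handle_data(self, data):
        if self.section == "head" and self.current_head:
            name = self.current_head
            self.head[name] = self.head.get(name, "") + data
        elif self.section == "body":
            self.body_text.append(data)


def read(doc):
    """Return (head elements kept, body text with all tags removed)."""
    reader = LenientReader()
    reader.feed(doc)
    reader.close()
    return reader.head, "".join(reader.body_text)


DOC = (
    "<html><head><title>Demo</title>"
    "<expires>Fri, 03 Jun 1994</expires></head>"
    "<body><p>Half is <frac><num>1</num><den>2</den></frac>.</p></body></html>"
)
head, body = read(DOC)
print(head)   # {'title': 'Demo'}
print(body)   # Half is 12.
```

The unknown EXPIRES element vanishes from the HEAD without harm, but
the fraction in the BODY collapses to "12" once its tags are dropped.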

>> How many WWW implementations don't include the "skip tags you
>> don't recognize" convention? I don't believe you have to write
>> code each time you want to _ignore_ another tag. And I don't believe
>> you can _act_ on a new tag _without_ writing more code.
>
>On the contrary, you certainly can if the tag follows an identifiable pattern.
>For instance, this is what MOMspider does while traversing a document:
>
> 1. GET it via the URL
> 2. Extract all the links and metainfo, e.g. (look out, it's Perl)

[nifty perl code deleted]

>
>Note that at no time does the program need to know exactly what tag
>names are used in the META elements -- it just acts as a filter.
>

Ah... now that perl code is a hack of the first order. Do you want ten
or twenty test cases that break it? If you're going to do this sort of
pseudo-parsing, why not use processing instructions or something?

I'd be happy to see:

<? meta expires ...>

Of course this doesn't play nicely with Mosaic 2.4, but it has the right
flavor... a sort of "this will work for the tools we're building right
now, but we haven't thought it through enough to cast it in stone."

And heck, really, you can just write

<meta name="expires" value="...">

as long as you realize that this will probably not become standard,
and you'll probably have to edit your documents to take advantage of
standard idioms for expressing these ideas in the future.
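
If you do go this route, a tool can at least pick the pairs up with a
real tokenizer instead of pattern matching. A minimal sketch in Python,
assuming html.parser stands in for the tool's parser; MetaCollector,
collect_meta, and the sample values are hypothetical:

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/value pairs from <meta name="..." value="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is not None:
            # Names are compared case-insensitively; values are kept as written.
            self.meta[name.lower()] = attrs.get("value")


def collect_meta(doc):
    """Return a dict of the meta name/value pairs found in doc."""
    collector = MetaCollector()
    collector.feed(doc)
    collector.close()
    return collector.meta


# One real element, then the same markup inside a comment. A scanner
# that only matches text patterns would be fooled by the second one.
DOC = (
    '<HEAD><META NAME="Expires" VALUE="Fri, 03 Jun 1994 11:34:25 -0500">\n'
    '<!-- <meta name="expires" value="never"> --></HEAD>'
)
print(collect_meta(DOC))
```

The commented-out element is one of the test cases I mean: the
tokenizer hands it to the comment handler, so it never reaches the
collector.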

>> Come on! It's not that tough to maintain the DTD as a community,
>> is it? Do we _have_ to escape out of SGML all over the place?
>
>If all clients parsed SGML and all clients kept track of DTD versions
>and all HTML files contained a pointer to their particular DTD version,
>then this type of argument would make sense. However, that is certainly
>not the case. Currently, the "official" DTD has not been updated for
>more than a year. This is not surprising given the constraints of our
>community and I don't expect the updates after 2.0 to be any more rapid.

I disagree. I expect Web applications will continue to expand their
support of more and more sophisticated SGML features. Time will tell
who is right.

Dan