Re: meta information

Bert Bos (bert@let.rug.nl)
Mon, 6 Jun 1994 15:53:48 +0200 (METDST)


The discussion about META has turned into much more than a simple
disagreement over the desirability of a META element...

I guess it's a question of what we want to use HTML for. The
constraints are clear: we want *documents* to express arbitrarily many
different things along many different dimensions, but we want *HTML*
to remain simple (and SGML conformant). Assuming the equality

MARK-UP = META-INFORMATION,

a few of the roles that mark-up can have are:

1) Lay-out

HTML provides lightweight, display-oriented markup. When more
visual aspects are needed, style sheets are the way to go.

2) Linking documents in a semantically meaningful way

The LINK element defines a few types of relations, the PRINT
attribute of the A tag adds a few more, but arbitrary semantics are
difficult to express. The semantics are meant for automated
searches, indexing, and other applications.

a) Private schemes: using PIs is probably best: <? whatever>

b) No machine-readable semantics at all: Ari Luotonen's WIT package
(see <http://info.cern.ch/wit>) shows that semantic links
(currently only `agree' and `disagree') can be expressed without
new link types, at least to human readers.

c) Fixed set of link types.

d) Extensible, hierarchical classification: new roles can be added
via IS-A relations. E.g., assuming `maker' is a primitive role,
we can define new roles `painter', `writer', and `co-writer' as
(sub-)subclasses of `maker':

<!element writer - - (#pcdata|img|%emph;)*>
<!attlist writer
    href  %URL  #implied
    is-a  cdata #fixed "maker"      -- a kind of `maker' --
    role  cdata #fixed "writer">
<!element painter - - (#pcdata|img|%emph;)*>
<!attlist painter
    href  %URL  #implied
    is-a  cdata #fixed "maker"      -- a kind of `maker' --
    role  cdata #fixed "painter">
<!element co-writer - - (#pcdata|img|%emph;)*>
<!attlist co-writer
    href  %URL  #implied
    is-a  cdata #fixed "writer"     -- a kind of `writer' --
    role  cdata #fixed "co-writer">

Tags like these can be added to HTML (3.0) via the `cextra'
entity and the RENDER element; there is no need to change HTML.
The presence of the IS-A attribute flags to the indexer that
this is a semantic link.

e) Extensible, hierarchical classification with multiple
inheritance: this allows us to express that a `writer' is not
only a kind of `maker', but also a kind of `human'.

For (c), (d) and (e) we will need a common set of primitive roles
and a procedure for registering new primitives.

The mechanism above is relatively simple and can be used without
changes to HTML (3.0), but it might be too simple. In HyTime the
position of the anchor of a link is marked in the text, but the
link info itself (i.e. the target URL and the type of relation) is
defined elsewhere, allowing for much more elaborate link
descriptions.

f) Don't try to use SGML at all: use

<link rel="semantics" href="semantics.sem">

and define a language for expressing semantics (in Prolog? Scheme?).
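
To illustrate how an indexer might use the IS-A attributes from (d) and
(e), here is a minimal Python sketch. It is not part of any proposal:
the table layout and the function names `ancestors' and `is_a' are
assumptions made for this example. Only the role names (`maker',
`writer', `painter', `co-writer', `human') come from the text above.

```python
# Hypothetical role table: each role maps to the roles it is declared
# to be a kind of (its IS-A parents). An indexer could build such a
# table from the IS-A and ROLE attributes it finds; the layout here is
# an assumption for illustration only.
IS_A = {
    "maker": [],                    # primitive role
    "human": [],                    # primitive role
    "writer": ["maker", "human"],   # case (e): more than one parent
    "painter": ["maker"],
    "co-writer": ["writer"],
}


def ancestors(role, table=IS_A):
    """Return every role that `role` is directly or indirectly a kind of."""
    seen = set()
    stack = list(table.get(role, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(table.get(parent, []))
    return seen


def is_a(role, target, table=IS_A):
    """True if `role` equals `target` or inherits from it via IS-A."""
    return role == target or target in ancestors(role, table)
```

With such a table, a search for all `maker' links would also find links
marked `co-writer', because `co-writer' is a `writer' and `writer' is a
`maker'. A role unknown to the indexer simply has no ancestors, which
is why a registered set of primitives is still needed.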

3) Providing parameters for the HTTP protocol

The proposal by TBL (see message <9406051937.AA19130@www0.cern.ch>)
and Roy Fielding (see message
<9406060223.aa24242@paris.ics.uci.edu>) for what I would call an
`HTTP architectural form' seems a good way to ensure that HTTP and
HTML can continue to be developed independently.

The idea is that none of the HTML elements is reserved for HTTP
headers, not even META. And whether there is ever going to be an
<EXPIRES> or <REPLY-TO> tag is immaterial. To find the HTTP headers
that are hidden in the HTML document, the server only looks at
attributes, never at element names. In this way, the following two
lines would yield the same HTTP header:

<meta http="Expires" content="Mon, 6 Jun 1994 11:24:21">
<expires http="Expires" content="Mon, 6 Jun 1994 11:24:21">

Alternative: omit the CONTENT attribute, put the value in the content
of the element, and close it with a </META> tag. This is possible
provided Dan Connolly's rule-of-thumb for the HEAD element is adopted:
all content in the HEAD is ignored by browsers.
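
The attribute-only lookup described above can be sketched in a few
lines of Python. This is an illustration under assumptions, not a
description of any existing server: the class and function names are
made up, and Python's `html.parser' module stands in for a real SGML
parser. The point is only that the element name is never inspected.

```python
from html.parser import HTMLParser


class HttpFormCollector(HTMLParser):
    """Collect (header, value) pairs from any element with HTTP and CONTENT.

    Hypothetical helper: the element name (`tag`) is deliberately ignored,
    so <meta ...> and <expires ...> are treated alike.
    """

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases attribute names before calling us.
        attributes = dict(attrs)
        name = attributes.get("http")
        value = attributes.get("content")
        if name is not None and value is not None:
            self.headers.append((name, value))


def extract_http_headers(html_text):
    """Return the HTTP headers hidden in `html_text`, in document order."""
    collector = HttpFormCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headers
```

Feeding either of the two example lines to `extract_http_headers'
gives the same single pair, ("Expires", "Mon, 6 Jun 1994 11:24:21").
The sketch covers only the form with a CONTENT attribute; the
alternative with the value in the element content would need extra
handling.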

SUMMARY: HTML is for simple, display-oriented markup; it has just
enough extensibility to allow extra info to be embedded invisibly. All
other dimensions of a document are added with mechanisms that can be
easily ignored by browsers:

- extensions in the head --> use META
- extensions in the body --> use `cextra'
- sophisticated layout --> style sheets
- indexing and semantics --> `semantic link' architectural form
- HTTP info --> `HTTP' architectural form

-- 
                     __________________________________
                    / _   Bert Bos <bert@let.rug.nl>   |
           ()       |/ \  Alfa-informatica,            |
            \       |\_/  Rijksuniversiteit Groningen  |
             \_____/|     Postbus 716                  |
                    |     9700 AS GRONINGEN            |
                    |     Nederland                    |
                    |     http://tyr.let.rug.nl/~bert/ |
                    \__________________________________|