Re: HTML DTD and related problems (rather long)

Tim Berners-Lee (timbl)
Wed, 15 Jul 92 00:33:09 MET DST


I am replying very late to a message from Frank Kappe,
dated 25 Jun 1992.

First of all, to clear up some things about W3 HTML.
You can't nest anchors.
You can't nest anything EXCEPT you can put any elements inside
an anchor (except for other anchors) and you can put an anchor
inside any elements.
Then there is the slight structure of the <DL>[<DT><DD>]*</DL>
etc which is required.

I think I mentioned in earlier messages that the lack of structure
is to make it easier to process HTML on systems which have styled
text (like most systems, MSWord etc).

Now it is interesting that you in Hyper-G allow anchors to be
any section of the text. This of course is counter to the
SGML philosophy of strict nesting (SGML people can get quite
religious about this, but I can't.) I think it is useful
to be able to refer to two separate overlapping anchors.
The problem is it is taken as given by any SGML DTD designer
that this sort of structure in a document is "BAD". This
means that SGML tools won't be able to process the <AS>
and <AE> (anchor start and anchor end) tags, you'll have to
write something special on top which keeps track of anchors.
SGML parsers won't be able to verify the anchor structure.
So though the DTD is valid, it doesn't in fact represent the restrictions.
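
As a rough illustration of the "something special on top", here is a
minimal Python sketch of a pass that tracks overlapping anchors by
itself. The tag syntax (<AS id=NAME> and <AE id=NAME>) and the function
name extract_anchors are assumptions made for this example; the actual
Hyper-G attribute syntax is not given in this message.

```python
import re

# Hypothetical syntax: <AS id=NAME> starts an anchor, <AE id=NAME> ends it.
TAG = re.compile(r"<(AS|AE) id=(\w+)>")


def extract_anchors(marked_up):
    """Strip anchor tags; return (plain_text, {id: (start, end)}).

    Offsets refer to plain_text. Anchors may overlap freely, which is
    exactly what a nesting-based SGML parser cannot check for us.
    """
    pieces = []      # text chunks with the tags removed
    offset = 0       # length of the plain text emitted so far
    open_at = {}     # anchor id -> offset where it started
    spans = {}       # anchor id -> (start, end)
    last = 0
    for match in TAG.finditer(marked_up):
        chunk = marked_up[last:match.start()]
        pieces.append(chunk)
        offset += len(chunk)
        last = match.end()
        kind, name = match.group(1), match.group(2)
        if kind == "AS":
            if name in open_at or name in spans:
                raise ValueError(f"anchor {name!r} started twice")
            open_at[name] = offset
        else:
            if name not in open_at:
                raise ValueError(f"anchor {name!r} ended without a start")
            spans[name] = (open_at.pop(name), offset)
    pieces.append(marked_up[last:])
    if open_at:
        raise ValueError(f"unclosed anchors: {sorted(open_at)}")
    return "".join(pieces), spans


text = "The <AS id=a>quick <AS id=b>brown<AE id=a> fox<AE id=b> jumps"
plain, spans = extract_anchors(text)
print(plain[slice(*spans["a"])])   # quick brown
print(plain[slice(*spans["b"])])   # brown fox
```

The checks for unmatched starts and ends are the part an SGML parser
would normally do for properly nested elements and here has to be done
by hand.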

You say that you feel it is better to store the links separately
rather than in the document. First of all let me say that W3 does NOT
require you to do that -- it just requires that the links, anchors
and text are transmitted at the same time on the net. That is very different.
Many systems generate the HTML on the fly from other sources of link
information.

Now looking at this question of where to store the links, the "link database"
model you propose is the Intermedia model of Norman Meyrowitz et al.
This was developed in a non-distributed environment, where a "web"
(database of links) was available to the readers and centrally coordinated.
If you expand that system to more than one web server, and scale it up
globally, then you find that the same problems of ensuring consistency
occur between the multiple link databases where before they occurred
between documents. You can't set up a system of bidirectional links for
example, which you could in the non-distributed case.

Link databases have their advantages, though. A model I am rather keen on
at the moment is of servers which are both link databases and
source code control (sccs, rcs etc) systems. In this world,
you don't need to store anchors at all -- you just quote the
character number AND VERSION number of the document. To find an
anchor, if you have a more recent version than the one referred to in the anchor,
you have to ask the server to translate the anchor's position in the
old document into a position in the new one. It may reply with
the same or a different position, or that as far as its "diff"
algorithms go, it can't find where the original anchor text would
be in the new document, or it might really have gone.
This would allow links to be made around source code (for which you want
source code control anyway).
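
The position translation described above might look roughly like the
following Python sketch. It uses the standard difflib module as a
stand-in for whatever "diff" algorithm a server would really run; the
function name translate_position is an assumption for this example,
and a real server would look both versions up by version number
instead of receiving them as strings.

```python
from difflib import SequenceMatcher


def translate_position(old, new, pos):
    """Map character offset pos in old to the matching offset in new.

    Returns None when the diff cannot place the character in the new
    version (it was deleted, or changed beyond recognition).
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "equal" blocks give a trustworthy correspondence.
        if tag == "equal" and i1 <= pos < i2:
            return j1 + (pos - i1)
    return None


old = "int count = 0;\nreturn count;\n"
new = "/* header */\n" + old

# The anchor moved because text was inserted in front of it.
moved = translate_position(old, new, old.index("return"))
print(new[moved:moved + 6])   # return

# The anchored text was removed, so the server has to say so.
print(translate_position("alpha beta gamma", "alpha gamma", 6))   # None
```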

There would be common problems when you do the sorts of changes which
diff can't handle intelligently, like you swap the two halves of a file,
or you change all the "a_"s to "A_". In these cases, a smart editor
would be able to explain each editing change to the server as it goes,
so the server would understand the relationship between the pre-edit
and post-edit versions. If you are going to do that, then you can
go a small extra step and make the server coordinate simultaneous edits
by many people, which gives you a group editor.
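
One way to picture an editor explaining its changes is as a log of
edit operations through which the server pushes each anchor position.
The Python sketch below is only an illustration under assumed names:
the Insert, Delete and Move records and the translate function are
hypothetical, not part of any existing W3 protocol.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    at: int        # offset at which new text was inserted
    length: int


@dataclass
class Delete:
    at: int        # offset of the first deleted character
    length: int


@dataclass
class Move:
    start: int     # the block [start, start + length) is cut out ...
    length: int
    to: int        # ... and put back at this offset of what remains


def map_through(op, pos):
    """Map one position through one edit; None if its text was deleted."""
    if isinstance(op, Insert):
        return pos + op.length if pos >= op.at else pos
    if isinstance(op, Delete):
        if pos < op.at:
            return pos
        if pos < op.at + op.length:
            return None
        return pos - op.length
    if isinstance(op, Move):
        if op.start <= pos < op.start + op.length:
            return op.to + (pos - op.start)
        # Outside the block: behaves like a cut followed by a paste.
        rest = pos if pos < op.start else pos - op.length
        return rest + op.length if rest >= op.to else rest
    raise TypeError(f"unknown edit operation: {op!r}")


def translate(ops, pos):
    """Map a pre-edit position through the whole edit log."""
    for op in ops:
        pos = map_through(op, pos)
        if pos is None:
            return None
    return pos


# Swapping the two halves of "abcdef" gives "defabc".
swap = [Move(start=0, length=3, to=3)]
print(translate(swap, 1))   # 4, where "b" now sits
print(translate(swap, 4))   # 1, where "e" now sits
```

Because the editor reports the swap as a Move, the positions survive a
change that a diff over the two versions would most likely report as a
deletion plus an unrelated insertion.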

Tim