Re: Byte ranges -- formal spec proposal

Gavin Nicol (gtn@ebt.com)
Sun, 21 May 1995 08:47:08 +0500


>Several good examples have been brought up of files that can be comprised of
>segments, where each of those segments is a valid file of the same data-type,
>as an argument for this proposal. However, in almost all of the examples,
>there were only *specific* byte ranges which would work, in which the
>requested object would really be usable. Thus, for most of these examples,
>you could just ask for "parts 0-3" or "2-5" or "3-end", and the right thing
>would happen. In only one of the examples was *true* random access
>necessary, and that was to resume downloading of a file if it was interrupted
>part of the way through. Keep this example off to the side for the next few
>paragraphs.

I've been meaning to write up an RFC on how DynaWeb handles large
files. As I've said, DynaWeb breaks a document into parts based on the
structure of the data. In particular, DynaWeb does runtime conversion
from SGML to HTML, and the smallest addressable part of a document in
DynaWeb is a single SGML element.

As you all probably know, an SGML document basically forms a
hierarchy of nested elements, or in other words, a tree. Filesystems,
in general, are also trees. It seemed natural to me to use the same
*type* of URL for files, and for sub-document addressing.

As such, DynaWeb actually supports 3 sub-document addressing modes,
which are pretty much taken straight from the TEI guidelines:

http://www.ebt.com/collection/book/doc=1/chap=2/sect=3
http://www.ebt.com/collection/book/1/2/3
http://www.ebt.com/collection/book/1

The first form accesses elements in the hierarchy by *typed* child
number, the second form accesses elements based on child number,
irrespective of type, and the last is a direct element address. In
practice, because few people ever access the server except by
browsing, the last form can be used in most cases. I would like to
argue that such an addressing scheme is applicable to many other types
of data as well.
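To make the three forms concrete, here is a rough Python sketch of how a server might resolve each one against an element tree. The `Element` class, the function names, 1-based child numbering, and treating the direct address as an element's position in document (preorder) order are all assumptions for illustration, not necessarily how DynaWeb does it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Element:
    gi: str                      # element type (generic identifier), e.g. "chap"
    label: str = ""              # hypothetical tag used only to identify nodes in the demo
    children: list[Element] = field(default_factory=list)


def nth(items: list[Element], n: int) -> Element:
    """Return the n-th item, counting from 1 (assumed numbering)."""
    if n < 1 or n > len(items):
        raise LookupError(f"no child number {n}")
    return items[n - 1]


def resolve_typed(root: Element, path: str) -> Element:
    """Form 1: doc=1/chap=2/sect=3 -> n-th child of the given type at each step."""
    node = root
    for step in path.strip("/").split("/"):
        gi, sep, num = step.partition("=")
        if not sep:
            raise ValueError(f"expected type=number, got {step!r}")
        node = nth([c for c in node.children if c.gi == gi], int(num))
    return node


def resolve_positional(root: Element, path: str) -> Element:
    """Form 2: 1/2/3 -> n-th child at each step, whatever its type."""
    node = root
    for step in path.strip("/").split("/"):
        node = nth(node.children, int(step))
    return node


def preorder(node: Element) -> Iterator[Element]:
    """Yield every descendant of node in document order."""
    for child in node.children:
        yield child
        yield from preorder(child)


def resolve_direct(root: Element, number: int) -> Element:
    """Form 3: a single element number, assumed to count elements in document order."""
    for i, el in enumerate(preorder(root), start=1):
        if i == number:
            return el
    raise LookupError(f"no element number {number}")


# A tiny sample book: one doc containing a title and two chapters.
book = Element("book", children=[
    Element("doc", "d", [
        Element("title", "t"),
        Element("chap", "c1", [Element("sect", "c1s1")]),
        Element("chap", "c2", [
            Element("title", "c2t"),
            Element("sect", "c2s1"),
            Element("sect", "c2s2"),
            Element("sect", "c2s3"),
        ]),
    ]),
])

print(resolve_typed(book, "doc=1/chap=2/sect=3").label)   # c2s3
print(resolve_positional(book, "1/3/4").label)            # c2s3
print(resolve_direct(book, 9).label)                      # c2s3
```

All three calls reach the same section by different routes. Note that a bare `/book/1` looks exactly like a one-step positional path, so a real server needs some rule to tell the forms apart; the sketch sidesteps this by exposing three separate functions.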

As I said before, my real problem with byte-ranges is that generally,
they don't make sense. Ranges of *parts* do make sense, however. One
other problem I have is that the format of a URL should really be
application-dependent, so why make recommendations for cases where it
is meaningless? Let's leave it to the application (i.e. the server),
until we are ready to design a far more general linking mechanism.
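By way of contrast with byte ranges, here is a minimal Python sketch of what a range of *parts* could look like on the server side, using the "2-5" and "3-end" style of request from the quoted message. The function names, zero-based numbering, and inclusive ends are assumptions for illustration; the point is only that every answer is a list of whole, usable parts.

```python
from __future__ import annotations


def parse_part_range(spec: str, count: int) -> range:
    """Turn "2-5" or "3-end" into a range of part indices.

    Assumed conventions (hypothetical): parts are numbered from 0,
    both ends are inclusive, and "end" means the last part.
    """
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        raise ValueError(f"expected start-end, got {spec!r}")
    start = int(start_s)
    end = count - 1 if end_s == "end" else int(end_s)
    if not 0 <= start <= end < count:
        raise ValueError(f"range {spec!r} outside 0..{count - 1}")
    return range(start, end + 1)


def select_parts(parts: list[str], spec: str) -> list[str]:
    """Return whole parts, never a slice through the middle of one."""
    return [parts[i] for i in parse_part_range(spec, len(parts))]


chapters = ["ch0", "ch1", "ch2", "ch3", "ch4", "ch5"]
print(select_parts(chapters, "2-5"))    # ['ch2', 'ch3', 'ch4', 'ch5']
print(select_parts(chapters, "3-end"))  # ['ch3', 'ch4', 'ch5']
```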

Look at http://www.ebt.com/ to see how DynaWeb works.

PS. I should note that the above naming scheme is very, very useful in
our case, but it drives spiders wild....