Re: Dienst, A Protocol for a Distributed Digital Document Library

Daniel W. Connolly (connolly@hal.com)
Mon, 08 Aug 1994 15:12:00 -0500


In message <9408081907.AA17028@martin.cs.cornell.edu>, Carl Lagoze writes:

>Jim Davis and I recently submitted an Internet Draft describing Dienst, a
>protocol for communication with distributed digital library servers.
>This protocol is embedded within HTTP. You are invited to look at the
>protocol document, available in HTML at
>http://cs-tr.cs.cornell.edu/Info/dienst_protocol.html or in ASCII at
>ftp://nis.nsf.net/internet/documents/internet-drafts/draft-lagoze-dienst
>protocol-00.txt. You might also want to look at a prototype
>implementation of dienst at http://cs-tr.cs.cornell.edu. We welcome
>your comments.

Excellent! Great stuff! This is a great way to address three issues on
my WWW Architecture Wishlist: resource discovery, replication, and a
compound document architecture.

The list is:

* resource discovery -- how do I find stuff? If I've got a good
description of a given document (author, publisher, pub date, that
sort of thing) and an Internet connection, I should be able to
submit a query and search the whole docuverse in one RPC (which
will probably cascade into many RPCs, but as far as I'm concerned,
it's one).

If I have only a vague description of the document I'm interested
in, I should be able to conduct the same search, but it may take
several RPCs, with some user interaction at each iteration.
(e.g. What libraries are available? ... Ok, from those three,
what databases relate to quantum physics? ...)

* replication -- documents should be highly available; that is,
given authorized access to sufficient connectivity, compute
resources, and disk space, I should be _able_ (not required)
to publish a document in such a way that there is no single
point of failure between me, the producer, and any of my consumers.

The USENET model addresses this requirement, but because its
operation is completely asynchronous, it lacks adequate fault
detection. (E.g., I can't determine whether my message made it
to foo.com.)

Another limitation of USENET is that documents are immutable.

* compound document architecture -- long story. But Dienst's
support for printing pages is one application of it.
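
To make the "one RPC that cascades into many" idea from the resource-discovery item concrete, here is a minimal Python sketch. It is not Dienst's protocol: the Library and Record classes, the search method, and the peer links are hypothetical stand-ins for real library servers, and the query only matches author and year. A visited set keeps the cascade from looping forever when libraries refer to each other.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Bibliographic description of one document (hypothetical fields)."""
    author: str
    title: str
    year: int


@dataclass
class Library:
    """Stand-in for a library server; peers are other libraries it knows about."""
    name: str
    records: list = field(default_factory=list)
    peers: list = field(default_factory=list)

    def search(self, author=None, year=None, _seen=None):
        """Match local records, then cascade the same query to each peer once."""
        seen = set() if _seen is None else _seen
        if self.name in seen:          # already queried: break cycles
            return []
        seen.add(self.name)
        hits = [(self.name, r) for r in self.records
                if (author is None or r.author == author)
                and (year is None or r.year == year)]
        for peer in self.peers:        # the "many RPCs" hidden from the caller
            hits.extend(peer.search(author=author, year=year, _seen=seen))
        return hits


if __name__ == "__main__":
    a = Library("lib-a", [Record("fred", "Apples", 1993)])
    b = Library("lib-b", [Record("wilma", "Oranges", 1994)], peers=[a])
    a.peers.append(b)                  # libraries may refer to each other
    print(b.search(author="fred"))     # a single call from the caller's side
```

The vaguer, iterative case (which libraries exist? which databases cover quantum physics?) would need additional listing calls; this sketch covers only the single-query cascade.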

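The fault-detection gap noted under replication (did my message make it to foo.com?) shrinks if each replica acknowledges what it stored. Below is a minimal Python sketch under that assumption: Replica.put is a hypothetical store call that returns an MD5 digest of the bytes it kept, and publish compares each acknowledgment against the publisher's own digest. A missing or mismatched acknowledgment is then a detected fault for that specific host. Because put overwrites by name, the same name can also be republished with new contents, unlike a USENET article.

```python
import hashlib


class Replica:
    """Stand-in for one mirror host (hypothetical interface)."""

    def __init__(self, host, up=True):
        self.host = host
        self.up = up
        self.store = {}

    def put(self, name, body):
        """Store body under name; acknowledge with the MD5 digest of what was stored."""
        if not self.up:
            raise ConnectionError(f"{self.host} unreachable")
        self.store[name] = body
        return hashlib.md5(body).hexdigest()


def publish(name, body, replicas):
    """Push body to every replica; report per host whether an intact copy arrived."""
    expected = hashlib.md5(body).hexdigest()
    status = {}
    for r in replicas:
        try:
            status[r.host] = r.put(name, body) == expected
        except ConnectionError:        # no ack: this copy is known to have failed
            status[r.host] = False
    return status


if __name__ == "__main__":
    mirrors = [Replica("foo.com"), Replica("bar.org", up=False)]
    print(publish("doc1", b"info on apples\n", mirrors))
```
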
The items on my wishlist that Dienst doesn't cover are:

* data integrity/fault detection: If fred says "see XXX for info on
apples," and I get XXX and it has info on oranges, I can't tell
if there was a fault, let alone where it was. A reference/link/citation
should be _able_ (not required) to contain integrity information
of various levels of reliability:

"see rfc822.txt; you'll know you've got the right rfc822.txt if
it came from ds.internic.net any time since 1990"
(allows replication by caching)

"see foo.tar.Z; you'll know you've got the right foo.tar.Z if
it has 1210921 bytes."

"see foo.tar.Z; you'll know you've got the right foo.tar.Z if
it has a gnu cksum checksum of 1203980123"

"see XXX; you'll know you've got the right XXX if
it has an MD5 checksum of 2342345234lksjw34"

"see XXX; you'll know you've got the right XXX if
it's been RSA signed by fred@foo.com"
(works for documents that change)


* democratic publishing model -- anybody should be able to
spontaneously create a lasting name for a document without doing
an RPC with a naming authority. Authorized users should be
able to create a name within a naming authority's namespace
by doing an authenticated RPC. Document names must be associated
only with copyright owners, not with service providers and the like.
(Witness the recent issues with 1-800 number portability between
AT&T and MCI.)
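
The size and checksum levels of the integrity item are easy to check mechanically. Here is a minimal Python sketch, assuming a reference carries optional size and md5 fields (Reference and verify are hypothetical names). verify returns the list of failed checks, so an empty list means no fault was detected, and a non-empty one says which claim the fetched copy violated. The origin-host and RSA-signature levels need more machinery (a trusted record of where the bytes came from, and public-key verification) and are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """A citation plus optional integrity claims (hypothetical field names)."""
    name: str
    size: Optional[int] = None
    md5: Optional[str] = None


def verify(ref: Reference, body: bytes) -> list:
    """Return the failed checks; an empty list means no fault was detected."""
    failures = []
    if ref.size is not None and len(body) != ref.size:
        failures.append(f"size: expected {ref.size}, got {len(body)}")
    if ref.md5 is not None:
        digest = hashlib.md5(body).hexdigest()
        if digest != ref.md5.lower():
            failures.append(f"md5: expected {ref.md5}, got {digest}")
    return failures


if __name__ == "__main__":
    apples = b"info on apples\n"
    ref = Reference("XXX", size=len(apples), md5=hashlib.md5(apples).hexdigest())
    print(verify(ref, apples))                  # no failures
    print(verify(ref, b"info on oranges\n"))    # size and md5 both fail
```
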

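For the democratic-publishing item, one familiar way to mint a name with no RPC to any authority is a random UUID. The Python sketch below illustrates that assumption; it is not something proposed in this thread. mint (a hypothetical helper) creates the name locally and binds it to the copyright owner, while the list of locations (the service providers) can change without touching the name. The example.com URLs are placeholders.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    name: str                                       # lasting, never reassigned
    owner: str                                      # copyright owner
    locations: list = field(default_factory=list)   # providers; free to change


def mint(owner):
    """Create a name locally. uuid4 uses 122 random bits, so independent
    minters are very unlikely to collide and need no central registry."""
    return NameRecord(name=uuid.uuid4().urn, owner=owner)


if __name__ == "__main__":
    rec = mint("fred@foo.com")
    rec.locations.append("http://provider-a.example.com/doc")
    old = rec.name
    rec.locations = ["http://provider-b.example.com/doc"]   # switch providers
    print(rec.name == old)                                   # the name survives the move
```
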
Dan