documents, files, types, and access methods

connolly@pixel.convex.com
Thu, 05 Dec 91 12:16:11 CST


Someone mentioned that WAIS should obviate the need for FTP. I disagree.
I think that the WAIS protocol is good for finding documents, but not
necessarily for transferring or displaying them.

There are two scenarios that WAIS is good for:

A. The database is built for WAIS. For example, DowQuest. That database is
stored so that it can be efficiently accessed and delivered through WAIS.

In this case, it makes sense to transfer the contents of the documents through
WAIS and to use the nifty chunking ideas.

B. The database is built for system X, and somebody sicked waisindex on it.
This is currently by far the most common case. Look at all the USENET
archives, biology databases, library catalogs, etc. that weren't designed for
use with WAIS, but they work pretty well.

In this case, it makes more sense to me to transfer and/or present the
documents using the clients that the database was designed for. The WAIS server
should send enough information to retrieve and/or display the document using the
other client.

Example: the archie database. As a user, I want to query the archie
database using WAIS's fulltext and relevance feedback queries, but I want to
retrieve the documents with FTP, and I may want to "present" them with
uncompress and tar, or lpr, or ghostscript, etc.

Example: USENET news. I want to query using WAIS, but read it with my
news reader.

Example: my mail box. Query with WAIS, display with Xmh, Elm, mh, emacs, etc.

Retrieving the whole document with WAIS and saving it to a file is no good in
this day and age of client-server computing. The WAIS client may be on a
machine with no disk space to spare. And I may want to use the file on a
different host.

So we see that the WAIS client needs to hand off documents to other clients.
This raises the question: what information should the WAIS search client pass
to the retrieval/display slave clients, and how?

The CNI-ARCH folks are discussing a standard for document identifiers. I
think this is definitely one of the things that WAIS should pass, but it's
not the only thing.

I'm beginning to look at documents sort of like records in a relational
database. The WAIS client should negotiate with the slave client what fields
they have or are interested in. An obvious representation for these records
is the RFC-822 mail message format.
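To make the "records as RFC-822 messages" idea concrete, here's a minimal sketch using Python's standard email package. The negotiation step is just an assumption on my part: each client lists the field names it knows, and only the common ones get sent. The function names are hypothetical; field names are compared case-insensitively, as RFC-822 header names are.

```python
from email.message import Message
from email.parser import Parser


def negotiate(offered, wanted):
    """Return the field names both clients understand (case-insensitive match).

    Hypothetical protocol: the sender offers its field names, the receiver
    says which it wants; the result keeps the sender's spelling and the
    receiver's order.
    """
    by_lower = {name.lower(): name for name in offered}
    return [by_lower[w.lower()] for w in wanted if w.lower() in by_lower]


def make_record(fields, agreed, body=""):
    """Serialize only the agreed-upon fields as an RFC-822 style message."""
    msg = Message()
    for name in agreed:
        msg[name] = fields[name]
    msg.set_payload(body)
    return msg.as_string()


def parse_record(text):
    """Parse a record back; header lookup on the result is case-insensitive."""
    return Parser().parsestr(text)
```

So if xwais offers CNI-ARCH-ID, SIZE-IN-BYTES, and FTP-HOST, and xftp only asks for ftp-host and cni-arch-id, the size field never goes over the wire, and neither side has to agree on capitalization.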

Example: the archie database.

I use my xwais client to query archie.src on "vgrind." My xwais client gets a
list of docids from the WAIS server. These docids contain at least the score
and the CNI-ARCH style docid, which in this case would be enough info to
construct a prospero file handle [I'm not sure there is such a thing as a
prospero file handle, but play along anyway...].

I play gui-games with xwais until I get the list of documents that I like.
Then, using some mechanism like the X selection mechanism or drag-and-drop
(combined with SMTP, perhaps), I select a document and give it to my xftp
application. The xwais client and the xftp client agreed earlier that they
would send messages like:

From: xwais@x.server.host
To: xftp@x.server.host
CNI-ARCH-ID: <12345@prospero:quiche.cs.mcgill.ca>
SIZE-IN-BYTES: 120034
FTP-HOST: export.lcs.mit.edu
FTP-USER: anonymous
FTP-CD: pub/util
FTP-GET: vgrind.tar.Z

blah blah blah about vgrind, perhaps explaining what query found this file,
or perhaps some stuff from the README in vgrind.tar.Z
.

I have already played gui-games with xftp to tell it where to put the files
it retrieves. When it gets this message, it does the HOST, USER, CD, and GET
commands, and presto! I've got my document.
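Here's a rough sketch of what xftp might do with such a message, in Python. It splits the work in two: `ftp_plan` turns the FTP-* fields into the HOST, USER, CD, GET steps (defaulting USER to anonymous is my assumption), and `fetch` carries them out with the standard ftplib module. The hosts in the example are from 1991 and almost certainly aren't answering anymore, so treat `fetch` as illustrative.

```python
import ftplib
import os
from email.parser import Parser


def ftp_plan(message_text):
    """Turn an xwais -> xftp message into an ordered list of FTP steps."""
    msg = Parser().parsestr(message_text)
    if msg["FTP-HOST"] is None or msg["FTP-GET"] is None:
        raise ValueError("message lacks FTP-HOST or FTP-GET")
    plan = [("HOST", msg["FTP-HOST"]), ("USER", msg["FTP-USER"] or "anonymous")]
    if msg["FTP-CD"]:
        plan.append(("CD", msg["FTP-CD"]))
    plan.append(("GET", msg["FTP-GET"]))
    return plan


def fetch(plan, dest_dir):
    """Execute the plan; dest_dir is wherever the user told xftp to put files."""
    steps = dict(plan)
    with ftplib.FTP(steps["HOST"]) as ftp:
        ftp.login(user=steps["USER"])  # ftplib supplies a password for anonymous
        if "CD" in steps:
            ftp.cwd(steps["CD"])
        name = steps["GET"]
        path = os.path.join(dest_dir, os.path.basename(name))
        with open(path, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
    return path
```

Keeping the plan separate from the network code means a GUI could show the user what's about to happen before anything is transferred.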

I think if we had a suite of these gui tools talking SMTP to each other, they
could get a lot of work done. More examples:

To: xtar@x.server.host
fopen: /home/connolly/vgrind.tar
or perhaps
popen: zcat /home/connolly/vgrind.tar.Z

xtar has a gui for selecting a place to extract the archive

To: xlpr@x.server.host
fopen: /home/connolly/vgrind-2.1/manual.ps
or
popen: zcat /home/connolly/vgrind-2.1/manual.ps.Z |

xlpr selects destination printer, copies, etc.
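Both xtar and xlpr need the same front end: get bytes from either a file (fopen:) or a command's output (popen:). A minimal Python sketch of that shared piece might look like this; `open_input` is a hypothetical name, and tolerating a trailing "|" (as in Perl's open) is my assumption. It reads everything into memory for simplicity. Since popen: runs a shell command, a real tool should only accept it from trusted senders.

```python
import subprocess
from email.parser import Parser


def open_input(message_text):
    """Return the bytes named by an fopen: or popen: field of a message."""
    msg = Parser().parsestr(message_text)
    if msg["fopen"]:
        with open(msg["fopen"], "rb") as f:
            return f.read()
    if msg["popen"]:
        # Accept a trailing "|" so "zcat file |" and "zcat file" both work.
        cmd = msg["popen"].rstrip().rstrip("|").rstrip()
        result = subprocess.run(cmd, shell=True, check=True, capture_output=True)
        return result.stdout
    raise ValueError("message has neither fopen: nor popen:")
```

xtar would feed the result to its extractor and xlpr to the printer, each after its own GUI dialog.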

Most tools fit in naturally. The $PAGER and $EDITOR, and perhaps $SHELL tools
could be MUCH more powerful if they could interoperate this way. [Has anybody
used mx and tx from John Ousterhout? Those and the Tk toolkit allow X
applications to send commands back and forth.]

For example, the World-Wide-Web browser would fit the role of $PAGER in this
environment. It would receive messages to display WWW nodes, containing their
HTTP address (or NNTP, FTP, etc.). It would then display the node and allow
the user to scroll around and choose anchors etc. It could handle most
anchors by itself, but it might want to let the user select a region of text
and send it to the WAIS client.
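A sketch of the two jobs that browser would have in this scheme, in Python: deciding which anchors it follows itself, and wrapping a selected region of text as a message for the WAIS client. The set of self-handled schemes and the WAIS-QUERY field are assumptions for illustration, not part of any WWW or WAIS spec.

```python
from email.message import Message
from urllib.parse import urlsplit

# Hypothetical: address schemes this pager displays without outside help.
SELF_HANDLED = {"http", "nntp", "news", "ftp"}


def route_anchor(address):
    """Return "display" if the pager follows the anchor itself, else "hand-off"."""
    scheme = urlsplit(address).scheme.lower()
    return "display" if scheme in SELF_HANDLED else "hand-off"


def selection_to_wais(selected_text, x_host):
    """Wrap a selected region as a query message for the xwais client."""
    msg = Message()
    msg["From"] = f"www@{x_host}"
    msg["To"] = f"xwais@{x_host}"
    # WAIS-QUERY is a made-up field; whitespace is collapsed onto one line.
    msg["WAIS-QUERY"] = " ".join(selected_text.split())
    return msg.as_string()
```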

I don't think there's an $EDITOR that fits very well, though emacs is always a
contender, and you have to have vi.
[I think the mouse support in emacs needs a LOT of work, but I probably
haven't seen the latest and greatest stuff.]

I'm not sure how $SHELL fits into all this, but, for example, folks send shell
commands in mail messages to each other all the time. You could just select
the shell command in your mail $PAGER, and drag it to your $SHELL x-client
for invocation.

I hope I get time to try to implement a couple of these ideas. Then we can
all see whether they're worth pursuing.

Dan