Re: WAIS

David C. Martin (dcmartin@library.ucsf.edu)
Tue, 20 Jul 1993 10:49:40 PDT


We are currently working with content that consists of TIFF page images,
OCR-generated text, PostScript, SGML "header" information (i.e. author,
title, abstract, etc.) and JPEG images (for half-tones). We are
modifying the WAIS code so that it indexes our information with URLs
that point to generated HTML documents; these documents incorporate the
structural information relevant to the content identified by the index.

We are integrating this code into the Plexus server.

dcm
--------
Martijn Koster writes:

David C. Martin writes:

> Exactly. However, you would like to separate the two functions as it
> would allow you to index information contained in various objects
> (files, etc...) and then utilize that index to return some compilation
> of those objects or some other object (e.g. indexing a database of faxes
> utilizing the OCR text, but actually returning the fax, not the OCR text
> file).

Why do I keep thinking we want the same thing? :-)

Yes, definitely separate the retrieval from the indexing.
But how are you going to display the results of the search to the user?
In your example, if I search the database of faxes, I'd want to see a list
of results like say

<LI><A HREF="...">Fax</A> from m.koster, 19 Jul 1993, 10K (Score 1000).

rather than just something like:

<LI><A HREF=".../fax_08932">fax_08932</A> (Score 1000)

Either you build a special server that generates this extra info on the fly,
or you need to generate it beforehand. The latter is far more general.
What I am after is how best to do it, preferably integrated into an HTTP server.
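
A minimal sketch of the "generate it beforehand" option, in Python. It
assumes a hypothetical catalog of descriptive records (DocInfo) built at
indexing time and keyed by the identifier the index returns; the field
names, the format_hit function and the example.org URL are illustrative
assumptions, not part of WAIS or Plexus.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class DocInfo:
    """Descriptive record stored at indexing time (hypothetical fields)."""
    url: str      # the object to return, e.g. the fax image, not the OCR text
    kind: str     # short label shown as the link text
    sender: str
    date: str
    size_kb: int


def format_hit(doc_id, score, catalog):
    """Render one search hit as an HTML list item.

    Uses the pre-generated record when one exists; otherwise falls back
    to the bare identifier and score, with no link.
    """
    info = catalog.get(doc_id)
    if info is None:
        return "<LI>{} (Score {})".format(escape(doc_id), score)
    return '<LI><A HREF="{}">{}</A> from {}, {}, {}K (Score {}).'.format(
        escape(info.url, quote=True),
        escape(info.kind),
        escape(info.sender),
        escape(info.date),
        info.size_kb,
        score,
    )


# Hypothetical catalog entry; the URL is a placeholder.
catalog = {
    "fax_08932": DocInfo(
        url="http://example.org/faxes/fax_08932",
        kind="Fax",
        sender="m.koster",
        date="19 Jul 1993",
        size_kb=10,
    ),
}

print(format_hit("fax_08932", 1000, catalog))
print(format_hit("fax_00001", 420, catalog))
```

Because the records are keyed by the identifier the index returns, the
search engine only ever has to hand back identifiers and scores, and the
same catalog can sit behind any server that formats the results.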

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A=Mark400; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NeXor Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html