Re: Client-side searching proposal

Alastair Aitken CLMS (ZPALASTAIR@CLUSTER.NORTH-LONDON.AC.UK)
Wed, 1 Feb 1995 20:34:23 +0100


Nick Arnett (narnett@verity.com) wrote on Wed, 1 Feb 1995 15:18:51 +0100
: At 7:40 AM 1/31/95, Gary.Adams@east.sun.com (Gary Adams - Sun Microsystems
: Labs BOS) wrote:

: > ... subdocument addressability ...

: That's a bit of a tough one, but everyone in the search business is looking
: at redefining the definition of a document in order to accommodate searches
: against logical units within logical units. Soon, one might search each
: server as if it were a single document, for example, in order to select
: which servers to search further.

This approach still seems to be server and client based rather than
information or document based. It seems to me that the information that
might be relevant to any user could come from a variety of different
servers as well as a variety of documents from those servers. Thus a query
might result in several USENET posts, several HTML files and a couple of
pieces of text from a gopher hole. This sort of searching requires some
form of information addressing along the lines of the Dewey Decimal System
in use in libraries.

Dewey assigns *all* printed material - books, papers, magazines, etc. - a
unique serial number based on the content. The code consists of three
digits, a full stop, and then any number of further digits. The first
three digits represent 1) a category, 2) a subcategory and 3) a
sub-subcategory. Thus number 513 might represent 1) 5 = science, 2) 1 =
maths, 3) 3 = calculus. The digits after the full stop can subdivide
further the category of information the document contains and also provide
author and publisher information.
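
As a rough illustration of the code layout described above, here is a
minimal Python sketch that splits a code such as "513.24" into its parts.
The names InfoCode and parse_code are hypothetical, made up for this
example, and the field meanings simply follow the "might represent" example
given in the text rather than any published classification schedule.

```python
import re
from typing import NamedTuple


class InfoCode(NamedTuple):
    """Hypothetical container for one Dewey-style information code."""
    category: int         # first digit, e.g. 5 = science in the example
    subcategory: int      # second digit, e.g. 1 = maths in the example
    subsubcategory: int   # third digit, e.g. 3 = calculus in the example
    extension: str        # digits after the full stop, "" if absent


# Three digits, then optionally a full stop and one or more digits.
_CODE_RE = re.compile(r"([0-9])([0-9])([0-9])(?:\.([0-9]+))?")


def parse_code(text: str) -> InfoCode:
    """Split a code such as '513.24' into its three levels and extension."""
    match = _CODE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a three-digit information code: {text!r}")
    cat, sub, subsub, ext = match.groups()
    return InfoCode(int(cat), int(sub), int(subsub), ext or "")
```

Under these assumptions, parse_code("513.24") gives category 5,
subcategory 1, sub-subcategory 3 and the extension "24", while a string
such as "51" is rejected because it lacks the three leading digits.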

I recognise that asking USENET posters to attach the correct Dewey number
to their posts is unreasonable, but if this approach were also wedded to a
keyword system, servers could have access to reference files of
information codes that index builders could then add to document indices.
A search of cyberspace for "calculus" information would therefore result in
all 513 information blocks being indexed and a menu returned to the client;
"maths" would result in all 51- information being menued, and "science"
would result in all 5-- information blocks being menued.
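
The keyword lookup sketched in the paragraph above might look something
like the following in Python. Everything here is an assumption made for
illustration: the KEYWORD_PREFIXES reference file, the INDEX entries, the
example.org locators and the menu_for function are all hypothetical, and
the keyword-to-code mapping follows the illustrative 5 / 51 / 513 example
rather than real Dewey schedules.

```python
# Hypothetical reference file: keyword -> leading digits of the code.
# "science" matches 5--, "maths" matches 51-, "calculus" matches 513.
KEYWORD_PREFIXES = {
    "science": "5",
    "maths": "51",
    "calculus": "513",
}

# Hypothetical index built by the server: (information code, locator).
# The locators are placeholders, not real documents.
INDEX = [
    ("513.2", "http://example.org/limits.html"),
    ("513", "usenet:sci.math/hypothetical-post"),
    ("512.1", "gopher://example.org/0/algebra.txt"),
    ("535", "http://example.org/optics.html"),
]


def menu_for(keyword, index=INDEX, prefixes=KEYWORD_PREFIXES):
    """Return locators whose code starts with the keyword's prefix."""
    prefix = prefixes.get(keyword.lower())
    if prefix is None:
        return []  # keyword not present in the reference file
    return [locator for code, locator in index if code.startswith(prefix)]
```

With this toy index, "calculus" selects only the two 513 entries, "maths"
adds the 512.1 entry, and "science" returns all four, which is the
narrowing and widening of the menu that the prefix scheme is meant to give.
The items in one menu can come from any mix of servers and protocols, since
the match is on the code and not on where the document lives.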

Does anyone think that encoding information in cyberspace in this way is
possible?

BTW - Please excuse the Subject:. I think client side searching is somewhat
off base as indexing is, by definition, a centralising function and therefore
the fief of the server - whatever that server is. I "replied" - sue me.

Alastair.