Re: Client-side searching proposal

Gary Adams - Sun Microsystems Labs BOS (Gary.Adams@east.sun.com)
Tue, 31 Jan 1995 15:39:07 +0100


> From gtn@ebt.com Tue Jan 31 09:03:59 1995
> Date: Tue, 31 Jan 1995 09:03:51 -0500
> From: Gavin Nicol <gtn@ebt.com>
> To: Gary.Adams@East
> Cc: www-talk@www0.cern.ch
> Subject: Re: Client-side searching proposal
> Content-Length: 704
> X-Lines: 14
>
> >There are standard SQL extensions that provide for a limited amount of
> >full/text query specification. The formulation of the query is only half of
> >the problem. The more difficult part of the problem (in my opinion) is how do
> >you handle the "sub-document" addressability for the relevant fragments of
> >the document to be retrieved or to be highlighted. Traditional database
>
> Sub-document addressing is not a hard problem for SGML documents. Have
> a look at the TEI schemes, or perhaps the HyTime schemes.

After a document is frozen on a CDROM, can I go back and impose new
addressing schemes beyond the originally named elements of the document?
For example, the third sentence of the Constitution.

>
> Can someone provide a URL for the online TEI specs?
>

The home page for TEI is <URL:http://etext.virginia.edu/TEI.html>

> Also, have a look at <URL:http://www.ebt.com/> for a server that has
> sub-document addressing capabilities based on the SGML tree structure.
>
>

The full SGML system at EBT addresses the need for authored structural addressability.
The type of sub-document addressability that I am looking for would allow a search
engine to refer to a region running from the last paragraph of chapter 2 to the first
paragraph of chapter 4 (potentially spanning 3 HTML files), and to present that region
to a user as the answer to a complex "how to" query.
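
To make the kind of region I mean concrete, here is a rough Python sketch. It
is only an illustration under assumptions of my own: the names ParagraphAddress,
resolve_region and last_paragraph are hypothetical, and the document is assumed
to be already split into chapters and paragraphs (one list of paragraphs per
chapter, as if each chapter were its own HTML file). It is not how EBT, TEI or
HyTime address anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ParagraphAddress:
    """Hypothetical address: fields compare in order (chapter, then paragraph)."""
    chapter: int    # 1-based chapter number
    paragraph: int  # 1-based paragraph index within that chapter


def last_paragraph(chapters, chap_no):
    """Address of the final paragraph of a chapter."""
    return ParagraphAddress(chap_no, len(chapters[chap_no]))


def resolve_region(chapters, start, end):
    """Return (chapter, paragraph, text) for every paragraph from start to end, inclusive."""
    region = []
    for chap_no in sorted(chapters):
        for para_no, text in enumerate(chapters[chap_no], start=1):
            here = ParagraphAddress(chap_no, para_no)
            if start <= here <= end:
                region.append((chap_no, para_no, text))
    return region


# Assumed toy document: chapter number -> list of paragraph texts.
chapters = {
    2: ["c2 p1", "c2 p2", "c2 p3"],
    3: ["c3 p1", "c3 p2"],
    4: ["c4 p1", "c4 p2"],
}

# "Last paragraph of chapter 2 through first paragraph of chapter 4."
region = resolve_region(chapters, last_paragraph(chapters, 2), ParagraphAddress(4, 1))
for chap_no, para_no, text in region:
    print(chap_no, para_no, text)
```

The point of the sketch is that the region is computed from positions imposed
after the fact, not from element names the author chose when the document was
written.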

I'd also like to construct complex standing queries about "the president of the
United States" in the news that return a conditional result. The selection
mechanism for a search engine can be distinct from both the scoring and highlighting
mechanisms. In an older document, the word "Clinton" might be highlighted incorrectly
if he was "governor Clinton" at the time.
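
A rough Python sketch of what I mean by a conditional result, with highlighting
kept separate from selection. Everything here is an assumption for illustration:
the OFFICE_HOLDERS table, holder_on and terms_to_highlight are hypothetical
names, and the table is a tiny hand-made sample, not a real data source. The
idea is that the terms to highlight depend on the date of the document, not only
on the words of the query.

```python
from datetime import date

# Assumed sample table: office -> list of (name, term start, term end).
# An end of None means the term is still open.
OFFICE_HOLDERS = {
    "president of the United States": [
        ("Bush", date(1989, 1, 20), date(1993, 1, 20)),
        ("Clinton", date(1993, 1, 20), None),
    ],
}


def holder_on(office, when):
    """Name of whoever held the office on the given date, or None if unknown."""
    for name, start, end in OFFICE_HOLDERS[office]:
        if start <= when and (end is None or when < end):
            return name
    return None


def terms_to_highlight(office, doc_date):
    """Highlight only the name that matched the office when the document was written."""
    name = holder_on(office, doc_date)
    return [name] if name else []


office = "president of the United States"
print(terms_to_highlight(office, date(1994, 6, 1)))
print(terms_to_highlight(office, date(1991, 6, 1)))
```

With this split, a 1991 article that mentions "governor Clinton" can still be
selected or scored by some other rule, but "Clinton" is not highlighted as a
match for the office.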