Re: Client-side searching proposal

Daniel W. Connolly (connolly@hal.com)
Thu, 26 Jan 1995 01:49:34 +0100


In message <ab4c74bd0d021004631d@[192.187.143.12]>, Nick Arnett writes:
>
>I suspect that if we could drag a proxy server developer into the thicket
>with us, we could see quicker results, since it'll be quicker to deploy
>first in a proxy than in clients.

Hmmm... I'm leery of anything so complex that a proxy server is required.

Proxy servers that do nifty searching things can be a good thing, but
let's not let that complicate this simple proposal.

Much of my trepidation is that the proxy server would have to alter
the document between the original server and the client; hence, it's
not "transparent" to the protocol. This would be a dangerous precedent.
And it interacts very badly with security techniques such as digital
signatures (not to mention encryption!).

Three cheers for Verity engineers spending time to design a solution,
though!

Hmmm... thinking about it a little, the first features I think we
should provide are the features that browsers like pg, more, less
provide: search for this string (perhaps regexp), and go to line
N. (but how do you define line N in html? It would have to be line N
of the source -- not the rendered document, I suppose.)
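To make the pager-style features concrete, here is a rough sketch in Python. It assumes "line N" means line N of the document source, as guessed above. The function names find_string and goto_line are made up for illustration, and Python's re module stands in for a POSIX regexp library (its syntax is similar but not identical).

```python
import re

def find_string(source, pattern):
    """Return (line, column) of the first regexp match in the source text.

    The line is 1-based and the column is 0-based; returns None if
    nothing matches. Hypothetical helper, not part of any browser.
    """
    m = re.search(pattern, source)
    if m is None:
        return None
    line = source.count("\n", 0, m.start()) + 1
    line_start = source.rfind("\n", 0, m.start()) + 1  # 0 when on the first line
    return line, m.start() - line_start

def goto_line(source, n):
    """Return the character offset where source line n (1-based) begins.

    Returns None when the document has no such line.
    """
    lines = source.splitlines(keepends=True)
    if n < 1 or n > len(lines):
        return None
    return sum(len(text) for text in lines[:n - 1])
```

A browser would then scroll to whatever rendered element contains that source offset; that mapping is the hard part and is not shown here.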

The first feature I'm interested in is the ability to link to some
point in a document that I can't/don't want to write to (or is not an
HTML document).

The ability to highlight/navigate search hits is something else... is
it really necessary, given that WWW documents are typically small
chunks, rather than long flows? I suppose in some cases it is.

Would the ability to specify a regexp cut it? e.g. suppose there
was a keyword search for frogs, dogs, and pigs, and all three
keywords were found in http://foo.com/filex.html. The search
results page might say:

<a href="http://foo.com/filex.html#!((frog)s?|(dog)s?|(pig)s?)">
On the nature of Animals</a>

(note my attempt to represent stemming).

At a minimum, the browser would just find the first match of that
regexp and scroll to there. A fancier browser might highlight all
matches and allow you to navigate between them with next/previous
match buttons. Well... at a bare minimum, the browser would do
the current thing: look for <a name="((frog..."> and fail to find it.
(Didya know Netscape leaves you scrolled to the _bottom_ of the
page in this case?)
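The browser side of this might look something like the following sketch. It assumes the "#!" prefix from the example link marks a regexp fragment, that the fragment is %XX-decoded before use, and that matching runs against the document source rather than the rendered text. The name fragment_matches is hypothetical, and Python's re module again stands in for a POSIX regexp library.

```python
import re
from urllib.parse import unquote

def fragment_matches(url, source):
    """Return (start, end) spans of every match for a '#!regexp' fragment.

    Returns None when the fragment does not start with '!', so the
    caller can fall back to the usual <a name="..."> lookup.
    """
    _, _, fragment = url.partition("#")
    if not fragment.startswith("!"):
        return None
    pattern = unquote(fragment[1:])  # undo %XX escapes before compiling
    return [m.span() for m in re.finditer(pattern, source)]
```

A minimal browser would scroll to the first span in the list; a fancier one would highlight every span and step through them with next/previous buttons.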

Folks would have to be careful to backslash their .'s, and to %XX-ify
/'s and such. It could get messy. But it's doable!
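For what it's worth, the escaping can be done mechanically by whatever generates the search results page. The sketch below is one way to do it, under some assumptions: the "#!" prefix and the "(keyword)s?" stemming form come from the example link above, the characters ( ) | ? are left unencoded only to mirror that example, and the function name regexp_fragment_href is invented. It relies on Python 3.7 or later, where re.escape backslashes only regexp metacharacters.

```python
import re
from urllib.parse import quote

def regexp_fragment_href(base_url, keywords):
    """Build a link whose fragment is a regexp matching any keyword.

    Each keyword is regexp-escaped (so '.' becomes '\\.'), wrapped as
    '(keyword)s?' for crude stemming, and the whole pattern is then
    %XX-encoded so that '/' and '\\' are safe inside a URL.
    """
    alternatives = "|".join("(%s)s?" % re.escape(k) for k in keywords)
    pattern = "(" + alternatives + ")"
    return base_url + "#!" + quote(pattern, safe="()|?")
```

Calling it with the keywords frog, dog, and pig reproduces the href in the example; a keyword containing a dot or a slash comes out with %5C and %2F in the fragment, which the browser would decode before compiling the regexp.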

Are POSIX regexp libraries widely available? Are they part of MS/Mac
development environments, or can the freely available implementations
be ported?

Dan