Re: Client-side searching proposal

Daniel W. Connolly (connolly@hal.com)
Wed, 25 Jan 1995 23:30:06 +0100


In message <199501251931.LAA00583@shell1.best.com>, David Glazer writes:
>
>However, just allowing a word (or list of words) will break down pretty
>quickly. Some examples of functionality I would want (with a bad example
>syntax):
>
> - n'th occurrence of a word (e.g. http://x.edu/xxx#/3:cheese)
> - occurrence of a word in a header (e.g. http://x.edu/xxx#/H2:cheese)
>
>Also, for the case of a search engine returning results, I would want what I
>think Dave H. is suggesting - the ability to highlight a whole list of terms
>in the returned document, and easily navigate between them. (For instance,
>with a "Next Relevant Region" button.)
>
>I seem to remember that HyTime has a syntax for doing things like this - does
>anyone know more about that?

It sure does. HyTime has about 27 different ways to refer to pieces of
text in a document. After studying them, it looks to me like somebody
just wrote down 27 different features, and made no attempt to find a
general, extensible mechanism based on a few well-chosen primitives.

For example, you can address "norms" -- space separated "names",
kinda like C identifiers -- and "numbers" -- sequences of digits. But
they didn't just go ahead and say "you can refer to character
sequences defined by a regular expression."
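
To make the regular-expression point concrete, here is a rough sketch
in Python. Nothing in it comes from HyTime: the function name
address() and the NAME and NUMBER patterns are my own guesses at what
"names" and "numbers" might mean. The point is only that one primitive
covers both cases, and an n'th-occurrence selector comes for free.

```python
import re

# Hypothetical patterns: guesses at HyTime-style "names" and "numbers".
NAME = r"[A-Za-z_][A-Za-z0-9_]*"   # roughly a C identifier
NUMBER = r"[0-9]+"                 # a sequence of digits


def address(text, pattern, n=1):
    """Return the (start, end) span of the n-th match of pattern, or None."""
    for i, match in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return match.span()
    return None


doc = "chapter 12 covers cheese_making in 3 steps"

first_name = address(doc, NAME, 1)       # span of the first "name"
second_number = address(doc, NUMBER, 2)  # span of the second "number"
```

Any other kind of text unit is then just another pattern, not another
addressing feature.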

And the syntax for doing it is verbose, messy, and tied to SGML.

Oh... except for HyQ, the HyTime query language. It's not SGML, and
it's not Lisp s-expressions, or anything else simple and
powerful. It's a bunch of cryptic little words (why such fear of the
eight character boundary?) and ill-defined semantics.

Somebody spent a LOT of time cooking up the HyTime standard. I once
believed that there were some powerful, novel features in
there somewhere that got bogged down by SGML syntax and the
international standards process. But as I look at it closely, it just
looks like a bunch of folks wrote down their 9 favorite mechanisms for
each problem. Little attempt to abstract, generalize, or simplify is
apparent. Proponents seem to say "HyperMedia is complex. No simple
system will work. It _has_ to be this complex, arbitrary, and
ill-specified."

Arguments to the contrary, as usual, are welcome.

>BTW - if addressing all of these issues gets too messy, I still like the idea
>of being able to navigate to the first occurrence of a particular word. It
>addresses some of the needs, just not all of them.

I heartily agree. In fact, this feature was requested a long time ago.
The proposed syntax back then was

scheme:path#!string

which I believe would have fewer backwards-compatibility problems than
path#/string.
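
For what it's worth, the client-side logic is small. Here is a rough
sketch in Python, not the actual Mosaic code: the function name
resolve() and its return shape are made up for illustration, and it
assumes a leading "!" in the fragment means "search the document
text" while anything else is an ordinary named anchor.

```python
def resolve(url, body):
    """Split url at '#' and decide what the client should scroll to.

    Returns (base_url, kind, target, offset). kind is "search" for a
    '#!string' fragment and "anchor" otherwise. offset is the character
    position of the first occurrence of the search string in body, or
    None if it is absent or the fragment is a plain anchor.
    """
    base, _, fragment = url.partition("#")
    if fragment.startswith("!"):
        word = fragment[1:]
        position = body.find(word)
        return base, "search", word, (position if position >= 0 else None)
    return base, "anchor", fragment, None


text = "We sell cheese and more cheese."
search_hit = resolve("http://x.edu/xxx#!cheese", text)
plain_anchor = resolve("http://x.edu/xxx#sec2", text)
```

A client that doesn't know about "!" would presumably just look for
an anchor named "!cheese", fail to find one, and show the top of the
document, which is the backwards-compatibility behavior one would want.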

This is a proposal whose value clearly outweighs its cost of
deployment. I'm for it. As TimBL said one time, it's a "downhill
step."

I'm sure the Mosaic 2.4 source code can be patched to support this in
about an hour. Anybody have the free hour? Has this already been done?

I too would like to see fully general, recursive, powerful, indirect,
secure, replicated, ... reference mechanisms. Anybody with

* a complete proposal
* user documentation
* robust sample implementation that integrates
  easily with existing clients

is encouraged to speak up!

In the meantime, let's do #! (or #* or whatever) (and SimpleMD5
authentication!) and take one step forward.

Dan