Re: Suggestion: URL string-search syntax

Rick Troth (troth@rice.edu)
Fri, 27 May 1994 10:25:16 -0500 (CDT)


> The suggestion is to extend this syntax to support reference to an
> arbitrary text string contained within the referenced document. ...

I like this idea.

> My suggestion is that #! be reserved as the header for a string to
> be searched for in this fashion. So <http://www.ai.mit.edu#!finance>
> would retrieve the URL <http://www.ai.mit.edu> and then search for
> the first occurrence of finance. ...

I'm not comfortable with your choice of "!", though.
Maybe it's because it "looks like" a shell escape. (not that I'd
ever advocate *using* such a thing) ;-) Maybe it's that I might
want to use "!" for negation.

Also, what about the Nth occurrence instead of the first?
Given that you're considering texts of which you are not the author,
this might be a really handy addition, eh?
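
For what it's worth, picking the Nth match is not much more work than
picking the first. Here is a rough sketch in Python; the function name
find_nth and the 1-based counting are my own assumptions, not part of
the proposal:

```python
def find_nth(text, needle, n=1):
    """Return the index of the n-th (1-based) occurrence of needle in text.

    Returns -1 if there are fewer than n occurrences, which a browser
    could treat as "go to the top of the document".
    """
    if n < 1 or not needle:
        return -1
    pos = -1
    for _ in range(n):
        # Resume the search one character past the previous match.
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos
```

With n=1 this behaves like the "first occurrence" rule, so the default
case costs nothing extra.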

> ... display that portion of the document (and probably hilight the string,
> or otherwise indicate where it is). If not found, I suggest just going
> to the top of the retrieved document with nothing hilighted. ...

Yes. Nice.

> I also suggest
> allowing spaces in the search string; this works at present in
> NCSA X Mosaic. Should they be escaped?

How about quoted?

> ... Eventually, I would like to see byte-offset ranges
> available as a way to refer to parts of other documents as well.

Problem: byte offsets are *not* an interoperable metric.
Some filesystems don't have a notion of "this file contains n bytes".
The number of bytes PROBABLY CHANGES as the document goes from server
host storage (disk) to TCP (on-the-wire). In many cases, the "document"
is the output of a program and you'd have to hold the whole thing,
count the bytes (or at least count up to that point), and then place
the pointer. A better metric would be a line offset or record offset,
but I suspect that even that isn't suitable for some case, somewhere.
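
To make the point concrete, here is a small Python sketch. It assumes,
purely for illustration, that the server stores lines ending in LF and
sends them ending in CR LF; the sample data and both helper names are
made up:

```python
def byte_offset(data, needle):
    """Byte position of the first occurrence of needle, or -1."""
    return data.find(needle)


def line_offset(data, needle):
    """1-based number of the first line containing needle, or -1."""
    # bytes.splitlines() treats LF, CR and CR LF all as line breaks.
    for number, line in enumerate(data.splitlines(), start=1):
        if needle in line:
            return number
    return -1


# Hypothetical document as stored on disk (LF line endings) ...
on_disk = b"intro\nbudget\nfinance\n"
# ... and as it might look on the wire (CR LF line endings).
on_wire = on_disk.replace(b"\n", b"\r\n")

print(byte_offset(on_disk, b"finance"), byte_offset(on_wire, b"finance"))
print(line_offset(on_disk, b"finance"), line_offset(on_wire, b"finance"))
```

The byte offsets differ between the two copies while the line offset
stays the same, though a server that rewraps or re-records the text
would break line offsets too.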

> ______________________________________________________________________________
>
> Mark Torrance Tel: (508) 442-0812
> Sun Microsystems Laboratories, Inc. Fax: (508) 250-5067
> 2 Elizabeth Drive (Mailstop: UCHL03-207) Net: torrance@east.sun.com
> Chelmsford, MA 01824-4195 USA
> ______________________________________________________________________________
>

[disclaimer: what I'm about to suggest comes from a (perhaps AR)
POV that quoting should be avoided unless required to resolve ambiguity]

About the quoting idea: what if we chose #" for your scheme?
So a URL like mydoc#myanchor would work as you'd expect today,
and a URL like mydoc#"mytext" would look for the text as you suggest.
Maybe someone will suggest another "quoting" method (in which case
I can throw away my disclaimer above as I embrace the better idea).
Parentheses? Brackets? Braces?
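
A client could tell the two forms apart with very little code. This is
only a sketch of the #" idea in Python; parse_fragment and the
"none"/"anchor"/"text" labels are names I made up, and it ignores any
escaping of quotes inside the search string:

```python
def parse_fragment(url):
    """Split url into (base, kind, value).

    kind is "none" when there is no fragment, "text" when the fragment
    is wrapped in double quotes (search for the text), and "anchor"
    otherwise (ordinary named anchor, as today).
    """
    base, sep, fragment = url.partition("#")
    if not sep:
        return base, "none", ""
    quoted = (
        len(fragment) >= 2
        and fragment.startswith('"')
        and fragment.endswith('"')
    )
    if quoted:
        # Strip the surrounding quotes; spaces inside are kept as-is.
        return base, "text", fragment[1:-1]
    return base, "anchor", fragment
```

So mydoc#myanchor comes back as an "anchor" and mydoc#"mytext" comes
back as a "text" search, with no change for existing URLs.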

-- 
Rick Troth <troth@rice.edu>, Rice University, Information Systems