LinkToLiving (was Re: Registrar)

Kevin Altis (kevin@scic.intel.com)
Tue, 13 Jul 1993 11:20:37 -0800


>Michael asks why you need to identify paragraphs within a document.
>The US Constitution may be fairly stable (in an agreed English
>version) so byte counts may be useful, but living documents
>often have ids for bits of their contents so that you can
>refer to them without having to go through the heuristics
>"the bit about protocols 2/3 of the way down your message".
>This is the classic "how do we make links to living documents"
>question -- I think there is a summary of the
>issues in
>http://info.cern.ch/hypertext/WWW/DesignIssues/LinkToLiving.html

The document is frozen
For "frozen text documents" it might be wise to ignore white space, line
delimiters, and other control characters. The same document (a mail message
or news article) may look identical on different platforms (Unix, Mac,
DOS/Windows) yet use different line endings, have tabs converted to spaces,
etc., so it might be "safer" to refer to character offsets that ignore white
space. This would still allow a link to a particular word, sentence,
paragraph, etc.
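
As a rough illustration of offsets that ignore white space, here is a
minimal Python sketch. The function names are hypothetical, and it assumes
that "white space" means whatever Python's str.isspace() reports (spaces,
tabs, CR, LF); other control characters would need an extra filter.

```python
def ws_free_index(text):
    """List the raw index of every non-whitespace character in text."""
    return [i for i, ch in enumerate(text) if not ch.isspace()]


def resolve_offsets(text, start, end):
    """Return the raw slice covering whitespace-free offsets [start, end)."""
    raw = ws_free_index(text)
    return text[raw[start]:raw[end - 1] + 1]


# The same text as it might arrive on two platforms (hypothetical samples).
unix_copy = "We the People\nof the United States"
dos_copy = "We the  People\r\nof\tthe United States"

# One link target, counted in non-whitespace characters, works for both.
print(resolve_offsets(unix_copy, 5, 11))
print(resolve_offsets(dos_copy, 5, 11))
```

Both calls select the same word even though the raw byte offsets differ
between the two copies.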

On the other hand, link references to frozen documents could be made such
as "words 2 to 4 of sentence 5 of paragraph 6", which would be the same on
all platforms if the terms "word", "line", "sentence", and "paragraph" are
defined. Apple uses this approach in its text object model today. Many
modern scripting environments (AppleScript, HyperCard, ToolBook, MetaCard,
etc.) also support this type of referencing.
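
A structural reference of this kind could be resolved along the following
lines. This is only a sketch in Python, not how AppleScript or the others
do it; the definitions are assumptions made for the example: a paragraph
is a run of text separated by blank lines, a sentence ends in ".", "!" or
"?" followed by white space, and a word is a run of non-whitespace
characters. All names are hypothetical and numbering is 1-based.

```python
import re


def paragraphs(text):
    """Split text into paragraphs at blank lines, after unifying line endings."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences(paragraph):
    """Split a paragraph after '.', '!' or '?' when white space follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def words(sentence):
    """Split a sentence into whitespace-delimited words."""
    return sentence.split()


def resolve_reference(text, paragraph, sentence, first_word, last_word):
    """Return words first_word..last_word (inclusive) of the given sentence."""
    chosen = sentences(paragraphs(text)[paragraph - 1])[sentence - 1]
    return words(chosen)[first_word - 1:last_word]


doc = ("First para. It has two sentences.\n\n"
       "Second para here. Links should survive line\nendings. Done.")

# "words 1 to 3 of sentence 2 of paragraph 2"
print(resolve_reference(doc, 2, 2, 1, 3))
```

Because line endings are normalized before splitting, the same reference
picks out the same words in a CR/LF copy of the document.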

Finally, references might be made to a word(s) or phrase(s) contained in a
document, so that the actual physical location of the link isn't determined
until the document is retrieved. Under this scenario, a browser also needs
to be able to find the next occurrence within the document so that the user
sees all references, not just the first. A link reference by search
"phrase" works for formatted documents as well as straight text, so the
search method will work for Microsoft Word, RTF, WordPerfect, FrameMaker,
TeX, plain text, etc. versions of the same raw information; the search just
ignores formatting information. It doesn't matter if the document changes,
since no exact location offset references are made; the worst case is that
the search phrase is removed from the document so that the link is
effectively gone. The On Location software for the Macintosh does this kind
of lookup; it maintains an index of all text on a
drive, but the index refers only to the document, not an exact location
within a document. On Location allows you to search for text "as is" as
well as the root of a word, so a link to "link" would match "links" or
"linking" if you wanted. I think this kind of link would fit in well with
the HTML+ specification.
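
To make the search-based link concrete, here is a small Python sketch.
It is not how On Location works; the function name and the match_root
flag are hypothetical. It assumes the phrase is made of plain words,
treats any run of white space between words as equivalent, ignores case,
and approximates "root of a word" matching with a crude prefix match
rather than real stemming. Stripping formatting codes from Word, RTF, TeX,
etc. is left out.

```python
import re


def find_phrase(text, phrase, match_root=False):
    """Return (start, end) spans of every occurrence of phrase in text."""
    tokens = [re.escape(t) for t in phrase.split()]
    if not tokens:
        return []
    # With match_root, each word may carry a suffix ("link" -> "linking").
    tail = r"\w*" if match_root else ""
    pattern = r"\b" + r"\s+".join(t + tail for t in tokens) + r"\b"
    return [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]


text = ("A link here.\nLinks and linking\r\n"
        "to living   documents; unlinked text.")

# Every occurrence is reported, so a browser can step through them all.
for start, end in find_phrase(text, "link", match_root=True):
    print(start, end, repr(text[start:end]))

# An empty result means the phrase was removed: the link is effectively gone.
print(find_phrase(text, "frozen documents"))
```

The spans are computed at retrieval time, so nothing stored in the link
depends on where the phrase currently sits in the document.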

I like the last approach, since it appears to work better with frozen
documents as well as documents with different versions. By referencing a
lookup word, phrase, etc. a browser or server can easily make an index of a
document as well as look for other documents that might be applicable. This
is one of those rare cases where the solution works well for the user and
the machine.

ka