Re: Reliable links [Was: Stab in the dark ]

Daniel W. Connolly (connolly@hal.com)
Fri, 18 Mar 1994 22:09:34 --100


In message <199403182020.OAA15469@austin.BSDI.COM>, Tony Sanders writes:
>"Daniel W. Connolly" writes:
>> But in either case, you can give the same url twice and there's no
>> mechanism to guarantee that you'll get the same thing back,
>This is true with a given URL but note the following from the HTTP spec
>where it talks about the URI: header:
>
> However, it is guaranteed that if an object is successfully retrieved
> using that URI it will be to a certain given degree the same object as
> this one. If the URI is used to refer to a set of variants, then the
> dimensions in which the variants may differ must be given with the "vary"
> parameter:
>
> Syntax URI: <uri> [ ; vary = dimension [ , dimension ]* ]
> dimension = content-type[12] | language[13] | version[14]
>
> If no "vary" parameters are given, then the URI may not return anything
> other than the same bit stream as this object.
>
> Multiple occurrences of this field give alternative access names or
>
>I think this addresses a lot of the points you made but even more important
>it makes it clear that reliable references to bitstreams have been thought
>about. However, *MOST* references should not be reliable in this fashion.
>For example, you almost always want a vary=language, vary=content-type.

Ah! So the issue has been addressed somewhere... but (1) the scope of
this mechanism is only HTTP -- I can't make reliable links to FTP
files, and (2) shouldn't the URI: header tell where this document sits
on the various dimensions, so that I can retrieve it again?

For example, suppose I ask:

GET /foo/bar

and the server says:

HTTP/1.0 200 Message follows
URI: http://host/foo/bar ; vary=version

How do I make a reference to this document? (or what do I scribble in
my cache to uniquely identify this doc?) It needs to say something like:

URI: http://host/foo/bar ; vary=version=1.0

so that I can write

<A HREF="http://host/foo/bar" VERSION="1.0">
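To make the proposal concrete, here is a minimal sketch of what a cache might do with such a header. It assumes the quoted syntax (URI, then "; vary = dim, dim") plus my proposed "dim=value" form for pinning a dimension; the names UriHeader, parse_uri_header and cache_key are made up for illustration and come from no spec or existing server. The point is that once a value is pinned, the cache has something unique to scribble down.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UriHeader:
    uri: str
    # dimension name -> pinned value, or None if that dimension may vary
    vary: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_uri_header(value: str) -> UriHeader:
    """Parse a URI: header value such as 'http://host/foo/bar ; vary=version'.

    Also accepts the proposed 'vary=version=1.0' form, which pins a value
    on a dimension (a hypothetical extension, not part of the quoted spec).
    """
    uri, *params = [part.strip() for part in value.split(";")]
    header = UriHeader(uri)
    for param in params:
        name, _, dims = param.partition("=")
        if name.strip().lower() != "vary":
            continue  # ignore parameters this sketch does not understand
        for dim in dims.split(","):
            dim_name, _, pinned = dim.partition("=")
            header.vary[dim_name.strip()] = pinned.strip() or None
    return header


def cache_key(header: UriHeader) -> str:
    """Key for a cache entry: the URI plus every pinned dimension value."""
    pinned = sorted((k, v) for k, v in header.vary.items() if v is not None)
    return header.uri + "".join(f";{k}={v}" for k, v in pinned)
```

With "URI: http://host/foo/bar ; vary=version=1.0" the key comes out as "http://host/foo/bar;version=1.0". With a bare "vary=version" the key is just the URI, which is exactly the problem: it doesn't say which version I got.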

I hashed this over with a friend last night, and we talked a lot about
what it would take to migrate documents around the net, along the lines
of NNTP broadcasting or IP routing tables. We decided there wasn't a
clear scalable strategy, but for this case we came up with a workable
solution. The GET request should say something like:

"Give me any copy of /foo/bar dated March 1 thru March 30"

and proxy servers keep a "lifetime" for each document in the cache.
Some documents, FAQ postings for example, explicitly contain the
lifetime. The NCSA folks worked out a set of heuristics for other
types of documents.

Then, when the proxy server gets a "GET(doc, t0, t1)" request, it
looks up doc in its cache, and if the lifetime intersects [t0, t1],
the query is resolved. Else, it turns around and makes the request to
the original server (or some neighbors or some such...).
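Roughly, in code. This is a minimal sketch assuming each cached copy carries its lifetime as a [valid_from, valid_until] date interval. The fetch_origin callback is a hypothetical stand-in for "go ask the original server (or a neighbor)", and none of these names come from any existing proxy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict


@dataclass
class CacheEntry:
    body: bytes
    valid_from: date   # start of this copy's lifetime
    valid_until: date  # end of this copy's lifetime


class Proxy:
    def __init__(self, fetch_origin: Callable[[str], CacheEntry]):
        self.cache: Dict[str, CacheEntry] = {}
        self.fetch_origin = fetch_origin  # hypothetical: asks the origin server

    def get(self, doc: str, t0: date, t1: date) -> bytes:
        """GET(doc, t0, t1): any copy of doc whose lifetime overlaps [t0, t1]."""
        entry = self.cache.get(doc)
        # Two intervals intersect iff each one starts before the other ends.
        if entry is not None and entry.valid_from <= t1 and t0 <= entry.valid_until:
            return entry.body
        # Miss, or the cached copy is outside the window: go upstream and
        # remember whatever the origin hands back.
        entry = self.fetch_origin(doc)
        self.cache[doc] = entry
        return entry.body
```

A copy good from Feb 1 to Mar 5 satisfies "March 1 thru March 30" without any network traffic. A request for "March 10 thru March 30" misses and goes upstream.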

This generalizes fairly well... things like CGI script results should
have very short lifetimes. RFC documents should have very long
lifetimes.

For the content-type dimension, the format negotiation algorithm in
HTTP works pretty well...
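For comparison, here is a deliberately simplified sketch of negotiation on that dimension. It assumes Accept entries look like "type/subtype; q=value" with "*" wildcards, and it picks the variant the client rates highest, preferring an exact match over "major/*" over "*/*". The function names are illustrative only, and the algorithm in the HTTP spec may weigh other factors as well.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse 'text/html, application/postscript; q=0.5' into (type, q) pairs."""
    result = []
    for item in header.split(","):
        media, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        result.append((media.lower(), q))
    return result


def quality(media: str, accepted: List[Tuple[str, float]]) -> float:
    """The q the client gives this type; the most specific match wins."""
    major = media.split("/")[0]
    for pattern in (media, major + "/*", "*/*"):
        for accepted_type, q in accepted:
            if accepted_type == pattern:
                return q
    return 0.0  # not acceptable at all


def choose_variant(variants: List[str], accept_header: str) -> Optional[str]:
    """Return the content type of the best acceptable variant, if any."""
    accepted = parse_accept(accept_header)
    scored = [(quality(t, accepted), t) for t in variants]
    scored = [(q, t) for q, t in scored if q > 0]
    return max(scored)[1] if scored else None
```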

But in all these cases, I'd like to be able to put version, format,
language, etc. info in the reference itself, if I choose. For example,
I may know that
ftp://foo.com/lksjfli4jlij43
is a PostScript file. But there's currently no way to express this.
And Mosaic, for example, will assume it's a plain text file.
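The client side of this would be trivial if the reference could carry the type. A minimal sketch follows, assuming a hypothetical TYPE attribute on the anchor (e.g. <A HREF="ftp://foo.com/lksjfli4jlij43" TYPE="application/postscript">). Nothing like that attribute exists today, and type_hint below just stands in for its value. Without the hint, all a client can do is guess from the filename, here via Python's standard mimetypes module.

```python
import mimetypes
from typing import Optional


def content_type_for(url: str, type_hint: Optional[str] = None) -> str:
    """Decide how to treat a retrieved file.

    type_hint stands in for format info carried by the reference itself
    (a hypothetical TYPE attribute, not an existing HTML feature).
    """
    if type_hint:
        return type_hint  # the link author knows better than a filename guess
    guessed, _encoding = mimetypes.guess_type(url)
    # With no recognizable extension there is nothing to go on, so fall
    # back to plain text, which is what happens with Mosaic today.
    return guessed or "text/plain"
```

So content_type_for("ftp://foo.com/lksjfli4jlij43") falls back to "text/plain", while passing type_hint="application/postscript" gets it right.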

Dan