Re: Can you cache the unknown? (was Re: Session-ID proposal)

Daniel W. Connolly (connolly@beach.w3.org)
Thu, 24 Aug 1995 16:55:18 -0400


In message <9508241613.AA14397@bingster>, Mark S. Friedman writes:
>Bob Wyman writes:
> > On a slightly different tack... It seems like this whole business of cachin
>g
> > is getting a bit complicated... [...] it would be useful to put some
> > effort into building at least an "informational" RFC or IETF-Draft on the
> > subject of caching. Is someone already doing this? Would it make sense? If
> > it isn't being done and it does make sense, I think I'll volunteer to try t
>o
> > fix this one...
>
>I certainly support this effort. I have yet to discover a source of
>comprehensive caching info and it would be much appreciated.

Quite. I have a message in my mh drafts folder called "HTTP
Caching: rules and Heuristics" that I can't ever seem to finish
and send out.

I'm trying to come up with some Larch[1] traits to formally
specify correct behaviour of information retrieval on the web.

Here's a snapshot of some working notes:

The idea is to capture the rules in Larch, ala:

WebArch: trait

includes
PartialOrder(Time for T)

introduces
rep: URI, Entity, Time, Time -> Bool
% for rep(r, e, t1, t2), read
% "from t1 to t2, e is a representation of r"
http: Request, Message -> Bool

addr: Request -> URI
date: Request -> Time

date: Message -> Time
ent: Message -> Entity
lmod: Message -> Time
exp: Message -> Time

asserts
forall r: URI, e: Entity, t1, t2, t3, t4: Time
rep(r, e, t1, t4) /\ t1 <= t2 /\ t2 <= t3 /\ t3 <= t4 =>
rep(r, e, t2, t3)

http(q, a) <=> rep(addr(q), ent(a), lmod(a), exp(a))
/\ lmod(a) <= date(q) /\ date(q) <= exp(a)

Then consider a protocol interaction like:

C1 -> P: GET http://S/path q1
P -> S: GET /path q2
S -> P: 200 ok a2
Date: tsresp
Last-Modified: tmod
Expires: texp rep('http://S/path', e, tmod, texp)
P -> C1: 200 ok a1
Forwarded: tPresp
Date: tsresp
Last-Modified: tmod
Expires: texp

C2 -> P: GET http://S/path q3

Try to prove that the proxy can return ent(a2) to C2.
1 thru 6 are the assumptions, 7 is the proof goal:

1. http(q2, a2) % origin server says so
2. addr(q3) = addr(q2) % proxy requires this
3. date(q3) < exp(a2) % proxy checks this
3' date(q3) > lmod(a2) % proxy checks this
4. ent(a3) = ent(a2) % proxy builds reply matching a2
5. lmod(a3) = lmod(a2)
6. exp(a3) = exp(a2)
7. Show http(q3, a3)
8. rep(addr(q2), ent(a2), lmod(a2), exp(a2)) 1, http I
9. rep(addr(q3), ent(a3), lmod(a3), exp(a3)) 8, 2, 4, 5, 6
10. lmod(a3) <= date(q3) /\ date(q3) < exp(a3) 3, 3', 5, 6
11. http(q3, a3) 9, 10, defn http

The trouble is that this keeps falling lower and lower on my
to-do list. :-(

Dan

p.s. note that format negotiation isn't addressed above. I'm
beginning to think that a server can't be bound to return
"the best" representation of a resource -- it just won't
scale. Rather, if a resource has multiple representations,
it must return one, and notify the client that others are
available.

And rather than asking "does the proxy return
exactly what the origin server would return?" we should ask
"does the proxy return something that the origin server
_could_ legally have returned?"

The net effect is that in the case of:
C1 prefers text to html, queries S for U via P, gets text D1
P caches U -> text D1
C2 prefers html to text, queries S for U via P, gets text D1
(because P had it cached)

P is acting correctly, provided that S has told P that the
html version is available (via URI: or Link: headers) and that P
passes this info along to C1 and C2. If the client doesn't like
what it gets, it can ask for the other version with "Pragma: no-cache"
in a subsequent transaction. The idea is that the proxy would
cache "the most popular" representation. The needs of the many...

Another idea that struck me is that the lack of an expires header
should be interpreted not as "it expires now" but "it expires
sometime soon, and clients can guess heuristically." In other
words, the onus is on the origin server to send out Expires:
headers; else proxies/clients are free to cache for some time.

The last idea I'll add here before I go is an answer to
"what does the Expires: header mean?" In the case of HTTP,
we're not so much concerned with when the data in the entity
becomes uninteresting, like a part announcement or a 3-month
sale. Rather, we're concerned with the binding between
a URL and an entity.

So in response to a query for U, an Expires header giving time t
says "the enclosed entity is a representation of the resource
identified by U until t."

Anybody who wants to help write this document is welcome to
step in!

Dan