Re: pragma no-cache -- Can we make it more useful?

karl@cavebear.com
Thu, 14 Jul 94 15:17:35 PDT


> > It doesn't always work.
>
> It *will* always work.

You are right; I misread your code as saying the "client" fetched
from the original server.

However, I think you are misunderstanding my subsequent objections.

The idea that the server puts an expiration date on data only solves
part of the overall problem. It warns clients & caches that the image
they have has expired.

The if-modified-since header modifier for GET is useful to ensure
that the copy one has is the absolute latest that is available.

The problem these mechanisms do not solve is that of the client who
wants recent data but who can get by without paying the cost for the
absolute latest version.

I assert that this is an extremely common need.

And it is extremely costly in network resources (and wasted user time)
to offer only the binary choice of living with what is in the cache or
verifying (and retrieving) the absolute latest version.

I was the one who put forth the example of the stock charts. Your
modified version of my example had new charts generated every 5
minutes, so the server could make a good guess as to the expiration
time. However, data that is generated at random times during the day,
for example the list of trades which have occurred, cannot be given a
useful "expire date"; it can only be updated at the master server. Or,
as another example, the database that comes back when you "finger
quake@geophys.washington.edu" is updated at quite random intervals.

So, suppose I want to know the list of earthquakes which have occurred
over the last 12 months. I probably don't care whether the list I get
back is 100% up to date, so I want to say to my WWW client, "get me the
list as it was at any time in the last 24 hours."

If I'm going through a cache, the current design doesn't give my
client enough information to decide whether the cached version is new
enough. I can learn that the document in the cache was generated 72
hours ago, but that doesn't give me any clue whether a newer version
may exist.

If, however, the cached image contains a timestamp showing when it was
last taken from the master image, then my client can know that the
document in the cache is accurate as of that time and can use that
time to determine whether the cached image is new enough for my needs.

It may very well be in practice that I end up refreshing the cache with
an identical copy -- identical that is except for the timestamp which
indicates when it was copied from the authoritative server.
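
To make the idea concrete, here is a minimal sketch of the check I have
in mind. The names (CacheEntry, verified_at, fresh_enough, revalidate)
are hypothetical and are not part of any existing client, cache, or
protocol; the only assumption is that the cache records when it last
checked its copy against the master server.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CacheEntry:
    body: bytes
    last_modified: datetime  # when the master copy last changed
    verified_at: datetime    # when the cache last checked with the master


def fresh_enough(entry, max_staleness, now):
    """True if the entry was checked against the master within max_staleness."""
    return now - entry.verified_at <= max_staleness


def revalidate(entry, master_last_modified, master_body, now):
    """Model a refresh from the master server.

    The body is replaced only if the master copy is newer, but the
    verification timestamp always advances to `now`.
    """
    if master_last_modified > entry.last_modified:
        return CacheEntry(master_body, master_last_modified, now)
    return CacheEntry(entry.body, entry.last_modified, now)


now = datetime(1994, 7, 14, 15, 0)
entry = CacheEntry(
    body=b"earthquake list",
    last_modified=now - timedelta(hours=72),  # generated 72 hours ago
    verified_at=now - timedelta(hours=2),     # confirmed current 2 hours ago
)

# "Get me the list as it was at any time in the last 24 hours."
usable = fresh_enough(entry, timedelta(hours=24), now)
```

With only the 72-hour-old generation time to go on, the client could not
tell whether the copy is acceptable; with the verification time it can.
A refresh that finds nothing new still returns an entry whose
verified_at has moved forward, which is the "identical copy except for
the timestamp" case.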

> Someone recently used a stock market as an example of a type of document
> that should have different expirations. This is an easy thing to solve:
>
> During the day, the quotes are updated every 5 (or some small n)
> minutes. So set the expiration for 5 minutes.
>
> The last quote of the day won't expire until 5 minutes after the
> opening of the market on the next business day. So set the
> expiration for then.
>
> The idea in the previous examples is to show that a server can often be
> "smarter" than a person in deciding when the document cache should be
> refreshed. Why should the system require a reader to understand when the
> information is old? Pure performance hints are something else... but there
> I would still like to see the reader put limits on the cost of the transaction
> instead of trying to explicitly tell the server how to do something.
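
For reference, the server-side rule quoted above could be sketched
roughly as follows. The session times (09:30 to 16:00), the
Monday-to-Friday week, and the function names are assumptions made up
for illustration; holidays are ignored.

```python
from datetime import datetime, time, timedelta

MARKET_OPEN = time(9, 30)   # assumed opening time
MARKET_CLOSE = time(16, 0)  # assumed closing time
QUOTE_INTERVAL = timedelta(minutes=5)


def is_business_day(day):
    """Monday through Friday; holidays are not considered."""
    return day.weekday() < 5


def next_open(now):
    """First market opening strictly after `now`."""
    day = now.date()
    while True:
        candidate = datetime.combine(day, MARKET_OPEN)
        if is_business_day(day) and candidate > now:
            return candidate
        day += timedelta(days=1)


def quote_expiration(now):
    """Expiration the server would attach to a quote served at `now`."""
    in_session = (is_business_day(now.date())
                  and MARKET_OPEN <= now.time() < MARKET_CLOSE)
    if in_session:
        # During the day, quotes are regenerated every five minutes.
        return now + QUOTE_INTERVAL
    # Otherwise the last quote stays valid until five minutes after the
    # next opening.
    return next_open(now) + QUOTE_INTERVAL
```

This works only because the update schedule is known in advance, which
is exactly what the trade list and the earthquake database lack.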

The publisher of information can only put an upper bound on its value.

And what I think people are missing is that there is a middle ground between
the absolutely most recent version of a document and the version that is in
the cache.

If I have a client that needs a recent, but not necessarily the absolute
latest, version of a document, the current architecture forces the client
to force a cache miss. That is extremely inefficient and wasteful.

--karl--