Re: forwarding cache requests

Markus Stumpf (stumpf@informatik.tu-muenchen.de)
Tue, 22 Mar 1994 00:33:46 --100


reinpost@info.win.tue.nl (Reinier Post) writes:
[ sorry, I've restructured the text a little ]

>Cache-date: <date>
> the time the document was served from the cache in answer
> to the present request

So is this the same as "Date:"?

>Cache-last-refreshed: <date>
> the time the document was last fetched into the cache,

By "fetched", do you mean "checked to be valid"?

>Cache-last-modified: <date>
> the time it was last fetched and found to be different from the
> previous version,

This is the only one where I am currently sure what you mean :/

>Cache-via: <url> [, <url>]*

Would an approach like the one SMTP mailers use with Received: lines
be sufficient here? I.e. allow an unlimited number of those tags,
and have each cache/proxy that gates the document add an "identifier"
that it can recognize again and thus detect loops.
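
To make the Received:-style idea concrete, here is a minimal sketch in
Python. The function name add_via, the exception ForwardingLoop and the
example identifiers are made up for illustration; the comma-separated
layout follows the "Cache-via: <url> [, <url>]*" proposal quoted above.

```python
class ForwardingLoop(Exception):
    """Raised when this cache finds its own identifier in Cache-via."""


def add_via(via_value, my_id):
    """Return the Cache-via value with my_id appended.

    via_value is the header value as received ("" if the header was absent).
    my_id is a hypothetical identifier (e.g. a URL) this cache recognizes.
    """
    hops = [hop.strip() for hop in via_value.split(",") if hop.strip()]
    if my_id in hops:
        # The request has already passed through this cache once.
        raise ForwardingLoop(my_id)
    hops.append(my_id)
    return ", ".join(hops)
```

Each cache would call this once before gating a request on, and refuse
the request if ForwardingLoop is raised.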

Okay, let me explain my thoughts on this topic.
I have a working proxy/cache server running, based on NCSA httpd-1.1.
I did the proxy module, Guenther Fischer from Chemnitz made the cache
module. The approach Guenther uses is as follows:

If you don't have the document in the cache, fetch it and put it in the cache.
If you have the document in the cache,
    check the last modification time of the file with stat().
    If it lies further back than a certain timeout,
        send a HEAD request and check whether the file has changed.
        If it has changed, update the cache,
        else update the last modification time of the file (utime()).
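
Roughly, in Python, the scheme could look like the sketch below. This is
not Guenther's code: the names serve_from_cache, fetch and has_changed
are made up, and the two callbacks stand in for the real GET and HEAD
requests so that the sketch needs no network.

```python
import os
import time


def _store(path, data, now):
    """Write data to the cache file and stamp it with the time `now`."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (now, now))


def serve_from_cache(path, timeout, fetch, has_changed, now=None):
    """Return the document bytes, refreshing the cache file if needed.

    path        -- cache file for this document
    timeout     -- seconds after which the copy must be re-validated
    fetch       -- hypothetical callback: GET the document, return bytes
    has_changed -- hypothetical callback: send HEAD, return True if changed
    """
    now = time.time() if now is None else now
    if not os.path.exists(path):
        # Not in the cache yet: fetch it and put it in the cache.
        _store(path, fetch(), now)
    elif now - os.stat(path).st_mtime > timeout:
        if has_changed():
            _store(path, fetch(), now)
        else:
            # Still valid: only bump the modification time.
            os.utime(path, (now, now))
    with open(path, "rb") as f:
        return f.read()
```

Note that the file's modification time then means "last checked", not
"last changed at the origin server".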

I currently don't know the strategy of the CERN server, but I think it's
rather similar.

What do we need for "good" caches?

1) "forwarders": if you want to reduce e.g. national and international
traffic, one could imagine a big national cache that acts as a proxy
to international sites and can be used by local proxy or cache
servers.
2) we should be able to have read/write and read/only caches.
We could have a "master" that writes the cache and "slaves" that
have only read/only access to the cache and ask the master to
update it if necessary. This would allow distributing
the load of fetching documents over several machines that access the
same cache, e.g. over NFS, while leaving the burden of updating the
cache, which is IMHO rare as most of the documents are rather static,
to one master.

What is needed for inter-cache communication:

o what I think is REALLY URGENTLY NEEDED is another way to handle
GET requests. I asked about this before but got no answer. I don't
see any problem with requesting more than one document over
one server connection. BUT currently all servers close the connection
after the last byte is sent, EVEN if there is a Content-length: field.
I'd like to propose that, if there is such a field, the server keeps
the connection open and it is up to the client to close it
if it doesn't want more documents from that server, or to send
another GET or whatever!
(but maybe this should be discussed under another subject).

o one great idea is the conditional GET via If-Modified-Since:
The only problem I currently see is: how do I determine, as a
forwarder or client, whether the other side supports it? If I send
a conditional GET and the server on the other side does not
support it, it will send the whole document, and this is currently
worse than the possible overhead of sending a HEAD followed by a GET.

o With the approach Guenther Fischer uses, a tag like
Cache-last-modified: would be sufficient, as the client or forwarder
could see from this date when the cache server last checked the
accuracy of the document in the cache. What would probably also be
informative is a Cache-Update-Interval: tag (in minutes) for this
specific document, to tell the forwarder when it is useful to
check this cache for the document again, or that it would
not get a newer version from that cache within the next n minutes
anyway. (of course Cache-via: is useful and needed too!)
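
To illustrate the last point, here is how a forwarder might use the two
tags together, as a Python sketch. The function names are made up, and
reading "date of last check plus interval" as the earliest useful time
to ask again is only my interpretation of the proposed
Cache-Update-Interval: tag.

```python
from datetime import datetime, timedelta


def next_useful_check(last_checked, update_interval_minutes):
    """Earliest time at which asking the cache again could pay off.

    last_checked            -- date from Cache-last-modified:, i.e. when
                               the cache last validated the document
    update_interval_minutes -- value of the proposed Cache-Update-Interval:
    """
    return last_checked + timedelta(minutes=update_interval_minutes)


def should_ask_cache(now, last_checked, update_interval_minutes):
    """True once the cache may hold a newer version than the one we have."""
    return now >= next_useful_check(last_checked, update_interval_minutes)
```

With a document last checked at 00:00 and an interval of 30 minutes, a
forwarder would not bother that cache again before 00:30.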

Is this sufficient?
Comments? Ideas?

\Maex