Caching and metainformation

Kevin Altis (altis@ibeam.jf.intel.com)
Tue, 15 Feb 1994 11:39:52 --100


At 8:54 AM 2/11/94 +0000, Rik Harris wrote:
>Has this project involved any discussion wrt caching? A proxy server
>would be the ideal place to cache documents that don't require
>authentication. I realise there are some types of documents that
>shouldn't be cached, but for a test system, a config file could
>specify that 'http://www.some.host/auto/*' should not be cached.
>
>I know the topic has been brought up on the list before, but is anyone
>actually working on it? Now that we have some clients that have been
>modified for proxy support, perhaps the caching discussion could be
>renewed?
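
Rik's config-file idea can be sketched in a few lines. This is a minimal sketch, not cern_httpd's actual configuration syntax: it assumes a hypothetical plain-text file with one URL pattern per line, using shell-style wildcards, and the function names are made up for illustration.

```python
from fnmatch import fnmatchcase


def load_no_cache_patterns(config_text: str) -> list:
    """Read one URL pattern per line; blank lines and '#' comments are skipped."""
    patterns = []
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def may_cache(url: str, patterns: list) -> bool:
    """Return False if the URL matches any configured no-cache pattern."""
    # Shell-style wildcards: '*' matches any run of characters, including '/'.
    return not any(fnmatchcase(url, p) for p in patterns)


# Hypothetical config contents, using the example pattern from the message.
CONFIG = """
# URLs that must never be cached
http://www.some.host/auto/*
"""

patterns = load_no_cache_patterns(CONFIG)
print(may_cache("http://www.some.host/auto/stock.html", patterns))  # False
print(may_cache("http://www.some.host/docs/index.html", patterns))  # True
```

Note that fnmatch also treats '?' and '[' as wildcards, so a pattern containing a literal query string would need escaping.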

Actually, cern_httpd 2.15 already supports caching in addition to proxying.
Unfortunately, we don't have much client or server support right now for
all of the fields that you would want to check to do proper caching. Here
is an excerpt from an earlier reply to me from TimBL.

At 12:20 PM 2/9/94 +0000, Tim Berners-Lee wrote:
>> Doing HEAD, expires, etc. is suddenly going to get important. Might put
>> some pressure on the URNs issue as well.
>
>Yes. Also, the Public: is important. We must get the default understanding
>completely clear. At the moment in HTTP it seems as though Public: is
>just informational, as in fact if anyone really wants to test access then
>they can just try it. With caching, the Public: allows the caching server
>to return it directly. If we specify the current assumption, that
>if nothing is specified then the document is public, this is
>NOT fail-safe. Would it be better to make that assumption ONLY if no
>Authorization: header was sent?
>
>I.e.
> If Public: present, it is definitive.
> Else if authorization was needed, then assume NO public access
> else if Allowed: is present, assume public access is same
> else assume public access is GET only.
>
>I'll put that in the spec -- if anyone has any troubles with this
>say now.
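
Tim's four-step default can be written as a small decision function. This is a sketch under assumptions: header names are spelled as in his message (Public:, Allowed:), their values are assumed to be comma-separated method lists, and whether authorization was needed is passed in by the caller. A caching server could then answer a GET from cache only when "GET" is among the returned methods.

```python
# Methods assumed to be publicly available when nothing else is known.
DEFAULT_PUBLIC = frozenset({"GET"})


def split_methods(value: str) -> frozenset:
    """Parse a header value such as 'GET, HEAD' into a set of method names."""
    return frozenset(m.strip().upper() for m in value.split(",") if m.strip())


def public_methods(response_headers: dict, authorization_needed: bool) -> frozenset:
    """Apply the proposed default rule to decide which methods the public may use."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    if "public" in headers:            # Public: present, so it is definitive
        return split_methods(headers["public"])
    if authorization_needed:           # authorization was needed: no public access
        return frozenset()
    if "allowed" in headers:           # Allowed: present, public access is the same
        return split_methods(headers["allowed"])
    return DEFAULT_PUBLIC              # otherwise assume GET only


print(sorted(public_methods({"Public": "GET, HEAD"}, authorization_needed=True)))
# ['GET', 'HEAD']
```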

A document that required authentication should not be cached, since
caching it would require the caching server to have the same
authentication rules and information as the server the document came
from. Also, a caching server is going to store its documents on a file
system that is probably accessible to people who are not
authorized/authenticated to read the information. Servers implementing
something like ChargeTo: probably don't want their information cached
either :)
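
A proxy could apply those two exclusions with a check like the one below. It is only a sketch: it assumes headers arrive as a simple name-to-value dictionary, and it looks for ChargeTo: on both the request and the response, since it is not settled which side would carry it.

```python
def _names(headers: dict) -> set:
    """Header names, compared case-insensitively."""
    return {name.lower() for name in headers}


def is_cacheable(request_headers: dict, response_headers: dict) -> bool:
    """Decide whether a proxy may store a response, following the reasoning above."""
    req = _names(request_headers)
    resp = _names(response_headers)
    # Credentials were sent: serving the reply from cache would bypass the origin
    # server's access checks, and the cached copy sits on a shared file system.
    if "authorization" in req:
        return False
    # Charged-for information: the origin server presumably wants to see every access.
    if "chargeto" in req or "chargeto" in resp:
        return False
    return True
```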

Object MetaInformation
For documents that are candidates for caching, we need to explicitly state
which HTTP metainformation "fields" a server needs to send in order for
the document to be cached correctly. Tim mentioned Public:; there is also
a need to send Version:, Date:, Expires:, and Last-Modified:, which may
imply different caching strategies depending on their values. There is
also the question of how the HTTP server is going to fill in that
metainformation. Is it part of the <HEAD> of an HTML document? If so, what
happens with binary files, which have no <HEAD>? I have two or three
documents, such as a stock price or weather document, that I never want a
client or proxy server to cache, so those need an "Expires: Always" kind
of metainformation. Most other objects that I serve can probably be cached
based on the modification date and time of the document in the file
system, or on a modified field in the information from a SQL database.
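
One way a server might fill in that metainformation without touching the document itself is sketched below. Assumptions: the set of volatile paths is hypothetical, "Expires: Always" is the informal marker suggested above rather than a defined value, and the modification time can come from a file's mtime or from a database column.

```python
import os
import time
from email.utils import formatdate
from typing import Optional

# Hypothetical URL paths whose content changes constantly and must never be cached.
VOLATILE_PATHS = {"/stocks/price.html", "/weather/today.html"}


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an RFC 1123 date in GMT."""
    return formatdate(timestamp, usegmt=True)


def metainformation(url_path: str, mtime: float, now: Optional[float] = None) -> dict:
    """Build caching headers for one object from its modification time."""
    now = time.time() if now is None else now
    headers = {
        "Date": http_date(now),
        "Last-Modified": http_date(mtime),
    }
    if url_path in VOLATILE_PATHS:
        # Informal marker from the message, not a value any spec defines.
        headers["Expires"] = "Always"
    return headers


def metainformation_for_file(url_path: str, fs_path: str) -> dict:
    """Variant for plain files, binary or not: use the file system's mtime."""
    return metainformation(url_path, os.path.getmtime(fs_path))


print(metainformation("/weather/today.html", 0.0, now=0.0))
# {'Date': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Last-Modified': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Expires': 'Always'}
```

Because the headers are derived from the file system (or a database row) rather than from an HTML <HEAD>, the same approach works for binary files.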

Given a URL, clients and proxy servers need to be able to ask for just the
HEAD, or metainformation, part of an object. It might be beneficial for
client and server writers to describe which fields they currently supply
or request, so they can see and agree on what to add.
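
A proxy revalidating a cached copy with HEAD might do something like the following. This is a sketch that assumes the server sends Last-Modified: as a GMT date; the function names are hypothetical, and the actual network round trip is left out so only the request building and header comparison are shown.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit


def head_request(url: str) -> bytes:
    """Build a minimal HTTP/1.0 HEAD request for an absolute http URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"HEAD {path} HTTP/1.0\r\n\r\n".encode("ascii")


def parse_headers(raw: bytes) -> dict:
    """Turn a raw response head into a lower-cased header dict (status line skipped)."""
    headers = {}
    for line in raw.decode("latin-1").split("\r\n")[1:]:
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def still_valid(cached_last_modified: str, head_headers: dict) -> bool:
    """True if the HEAD reply shows the object has not changed since it was cached."""
    current = head_headers.get("last-modified")
    if current is None:
        return False  # no way to tell, so refetch the whole object
    return parsedate_to_datetime(current) <= parsedate_to_datetime(cached_last_modified)
```

If still_valid() returns False, the proxy would fall back to a full GET and replace its cached copy.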

ka