Re: Customer pull on HTTP2

Kevin Hoadley (K.Hoadley@directory.rl.ac.uk)
Fri, 8 Jan 1993 14:32:21 +0000 (GMT)


Dave Raggett raised some interesting issues in his message. In
particular:

> Caching
> -------
>
> It will be desirable to avoid overloading servers with popular documents by
> supporting a caching scheme at local servers (or even at browsers?). This
As well as caching, replication would be nice. But this is
only practical if resource identifiers do not contain location
information (otherwise replication is only possible by making
all the peer servers appear to be one machine, as in the
DNS CNAME suggestion I made some time ago).
But if resource identifiers do not contain host information
then you need an external means of determining how to reach
the resource. This is analogous to routing protocols (an address
is not a route ...)
Such a system is probably over-ambitious for now. Anyway,
back to caching ...

> Servers need to be able to work out what documents to trash from
> their caches.
> A simple approach is to compare the date the document was received with the
> date it was originally created or last modified. Say it works out that when
> you got the document it was already one week old.
> Then one rough rule of thumb
> is to trash it after another week. You can be rather smarter if there is a
> valid expiry date included with the document:

I think this is silly. I haven't changed a document for
six months, therefore it is safe to say that it won't be
changed for the next six months ...
This also depends on hosts agreeing on the date. To quote
RFC 1128, talking about a 1988 survey of the time/date on
Internet hosts, "... a few had errors as much as two years".
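
For reference, the rule of thumb being criticised amounts to no
more than the following (a sketch in Python; the names are mine,
not part of any proposal):

    from datetime import datetime

    def heuristic_expiry(received: datetime, last_modified: datetime) -> datetime:
        # The document was already this old when it was fetched ...
        age_when_received = received - last_modified
        # ... so keep the cached copy for the same period again
        # before trashing it.
        return received + age_when_received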

> I think that we need to provide an operation in which the server returns a
> document only if it is later than a date/time supplied with the request.

This would be useful as part of a replication system,
as long as both ends exchange timestamps initially so
that the dates can be synchronised.
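
The server-side check is simple enough; something like this sketch
(again, the names are mine and no request syntax is implied):

    from datetime import datetime

    def conditional_fetch(last_modified: datetime, body: str,
                          if_modified_since: datetime):
        # Send the document only if it has changed since the date/time
        # supplied with the request; None means "your copy is current".
        if last_modified > if_modified_since:
            return body
        return None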

> Note that servers shouldn't cache documents with restricted readership since
> each server doesn't know the restrictions to apply. This requires a further
> header to identify such documents as being unsuitable for general caching:

and also ...

> What happens if a copyright protected document is saved in the cache of a
> local server? We have got to ensure that the rightful owners get paid for
> access even when the document is obtained from a local server's cache.

It may be stating the obvious, but once you allow a
user to access your data such that they can save it, there is
no technical way you can prevent them from publicly
redistributing your data. This is a social/legal problem,
not a technical one.
Accepting that nothing can be done to stop deliberate
abuse of licensed information, there is a need to prevent
accidental abuse. Probably the simplest way to do this is
to mark the document as one which should NOT be cached.

Perhaps this is leading towards a very simple-minded
caching scheme a la DNS, where information is returned
together with an indication of its "time to live" (TTL),
i.e. how long it can reasonably be cached. Setting a default
TTL for a server gives an idea of the "volatility" of the
information contained therein.
Unless a document is exported with world read access,
it should always have a TTL of 0.
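
In cache terms that decision is about as simple as it gets; a
sketch (my names again; the TTL would arrive with the document,
falling back to a per-server default):

    from datetime import datetime, timedelta

    def may_serve_from_cache(fetched_at: datetime, ttl_seconds: int,
                             now: datetime) -> bool:
        # A TTL of 0 means "never cache", e.g. documents that are not
        # exported with world read access.
        if ttl_seconds == 0:
            return False
        # Otherwise a cached copy is usable while younger than its TTL.
        return now - fetched_at < timedelta(seconds=ttl_seconds)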

Kevin Hoadley, Rutherford Appleton Laboratory, khoadley@directory.rl.ac.uk