HTTP2: caching and copyright

Tim Berners-Lee (timbl@www3.cern.ch)
Fri, 8 Jan 93 17:42:32 +0100


These are comments on the second part of Dave's note.
They refer to the new HTTP spec and a hypertext version of RFC850 which I made
in order to be able to cross-reference to it (and make it prettier).

> Caching
> -------
>

> It will be desirable to avoid overloading servers with popular documents by
> supporting a caching scheme at local servers (or even at browsers?). This
> implies that document headers should provide sufficient information to make
> this practical.

Like "Expires:" for example. See
http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Object_Headers.html

...>

> o the document header should *always* include a "Date:" field giving
> the date it was last written to

Agreed. Now why in the spec did I say that Date: was the creation date?
In any event, we need last-modified as well as created dates. I chose
to use the existing Date: field of RFC 850 as the creation date.
I guess the modified date is more available, but if you are tracing things
the creation date is a better thing to file under. For a mail/news article
of course they are the same.

> o the "Expires:" field is optional

agreed.

> o the date values should be in a prescribed format to simplify
> machine interpretation (Is this adequately defined by existing
> RFCs?)

agreed. yes it is, in RFC850: in

http://info.cern.ch/hypertext/WWW/Protocols/rfc850/rfc850.html#z10

(pretty soon we're going to HAVE to have hypertext mail!).
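
As a rough illustration of how mechanical the RFC850 date form is to read, here is a minimal Python sketch. The function name parse_rfc850_date is made up for this note, it only accepts the GMT zone, and it assumes two-digit years belong to the 1900s; none of that comes from the spec.

```python
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def parse_rfc850_date(value):
    """Parse 'Weekday, DD-Mon-YY HH:MM:SS GMT' into an aware datetime.

    Assumptions of this sketch: only the GMT zone is accepted, and a
    two-digit year YY is read as 19YY.
    """
    _weekday, date_part, time_part, zone = value.split()
    if zone != "GMT":
        raise ValueError("this sketch only handles GMT")
    day, month, year = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(1900 + int(year), MONTHS.index(month) + 1, int(day),
                    hour, minute, second, tzinfo=timezone.utc)

written = parse_rfc850_date("Friday, 08-Jan-93 17:42:32 GMT")
print(written.isoformat())  # 1993-01-08T17:42:32+00:00
```

Once dates are parsed into comparable values like this, "is my cached copy older than the server's?" becomes a plain comparison.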

> I think that we need to provide an operation in which the server returns a
> document only if it is later than a date/time supplied with the request. If
> it is the same (or earlier) the server should return a suitable status code
> and an optional "Cost:" header, see below.

Need to look at NNTP here. We end up getting very close indeed to it.
I would want the functionality of this search to map onto the NEWNEWS
very nicely. A newsgroup is just a hypertext list anyway.
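
To make the "only if later than" operation concrete, here is a small Python sketch of the server-side decision. The function name conditional_get and the status strings "OK" and "NOT-NEWER" are placeholders invented for this note, not anything from the HTTP spec or NNTP.

```python
from datetime import datetime, timezone

def conditional_get(last_modified, if_later_than, body, cost=None):
    """Return (status, headers, body) for a 'fetch only if newer' request.

    The status strings are placeholders for this sketch.
    """
    if last_modified > if_later_than:
        # The document changed after the client's copy: send it.
        return ("OK", {}, body)
    # Same or earlier: no body, just a status and the optional Cost: header.
    headers = {}
    if cost is not None:
        headers["Cost"] = cost
    return ("NOT-NEWER", headers, None)

jan8 = datetime(1993, 1, 8, 17, 42, 32, tzinfo=timezone.utc)
jan1 = datetime(1993, 1, 1, tzinfo=timezone.utc)
print(conditional_get(jan8, jan1, "new text"))
print(conditional_get(jan1, jan8, "old text", cost="4.05 US DOLLARS"))
```

The first call returns the document because it is newer than the client's date; the second returns no body and carries only the Cost: header.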

> Note that servers shouldn't cache documents with restricted readership since
> each server doesn't know the restrictions to apply. This requires a further
> header to identify such documents as being unsuitable for general caching:
>

> Distribution: restricted | unrestricted

Good point. Note that the distribution of other messages is in the form of
To: and Cc: and Newsgroup: and in fact Distribution:. (See
http://info.cern.ch/hypertext/WWW/Protocols/rfc850/rfc850.html#z12)
So you'll need a new fieldname. If we could only merge the functionality of
these systems in some cool way, it would be grand.

When looking at protection for more than just GET, I came up with the
Public: *<method>
Allow: *<method>
lines documented in the list of object headers I mentioned above.

> This header is only needed for documents with restricted readership.
> A dirty alternative would be to set the expiry date to the same value as
> supplied with the "Date:" header.

Yes but that would not be legally binding. It could also mean "this document
is a live status: fetch it as often as you can".

> Copyright & Payments
> --------------------
>

> Although the Internet backbone restricts profit making services, many
> subnets, such as University campuses, and company subnets such as HP's have
> no such problem. Indeed users strongly want access to copyrighted
> information for which a payment is due.
>

> My suggestion is that servers are responsible for tracking who accesses what
> information, and hence how much they owe. For use within Hewlett Packard for
> library services, we anticipate including some extra headers in the request:
>

> EmployeeNumber: 148689
> LocationCode: 8126 (an account number for cross charging)
>

> This would be stripped off when sending requests to servers outside the HP
> subnet. These headers are ignored by servers which conform to strict HTTP2.
>

> I would like the document header to include an optional cost header, e.g.
>

> Cost: 4.05 US DOLLARS
> Copyright: Reuters Inc.

I note here that both the copyright holder and the account for charging are
items in some address space, and we ought to be as flexible with these address
spaces as with the udi. So I would propose that something like

ChargeTo: HPInternal:/8126/148689 upto $2.00

would be better. But how does this fit in with authentication? Once you are
authenticated, your preferred method of paying will be known. You can't have
charging without authentication!

There is a little problem with sending an "upto $2.00" field as it requires
honesty of the server not to just charge you $2.00 flat! This is a real
problem, as otherwise one needs twice the round trip delays, which we really
object to, if one prefers the fairer

C GET junk
S No way: you pay $2.00 first
C GET junk I promise to pay $2.00
S *junk
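
The trade-off between the one-trip "upto" limit and the two-trip exchange can be sketched in Python as follows. The names parse_charge_to and serve, and the "PAYMENT-REQUIRED" status string, are invented for this illustration; the only thing taken from the text is the shape of the ChargeTo: value. An honest server charges the real price, never the limit.

```python
from decimal import Decimal

def parse_charge_to(value):
    """Split 'ACCOUNT upto $LIMIT' into (account, limit as Decimal)."""
    account, keyword, limit = value.split()
    if keyword != "upto" or not limit.startswith("$"):
        raise ValueError("unrecognised ChargeTo value: %r" % value)
    return account, Decimal(limit[1:])

def serve(price, charge_to=None):
    """Return (status, amount): the amount charged, or the price quoted.

    Status strings are placeholders for this sketch.
    """
    if charge_to is None:
        # No promise to pay: quote the price, forcing a second round trip.
        return ("PAYMENT-REQUIRED", price)
    _account, limit = parse_charge_to(charge_to)
    if price > limit:
        # The client's limit is too low: quote the real price.
        return ("PAYMENT-REQUIRED", price)
    # Honest server: charge the actual price, not the client's limit.
    return ("OK", price)

offer = "HPInternal:/8126/148689 upto $2.00"
print(serve(Decimal("1.25"), offer))
print(serve(Decimal("4.05"), offer))
```

With the limit sent up front, the first request succeeds in a single round trip; the second falls back to the quote-then-promise exchange.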

> This would let the users know how much a given document has cost them, as
> well as who owns the copyright. The latter heading is needed since you
> can't always put it in the document, e.g. think of photographic images.
>

>

> The "Cost: 4.03 US DOLLARS" field
>

>

> Copyright and Caching
> ---------------------
>

Have you read "Literary Machines"? That goes into this in a lot of detail,
or at least Xanadu should have done.

> What happens if a copyright protected document is saved in the cache of a
> local server? We have got to ensure that the rightful owners get paid for
> access even when the document is obtained from a local server's cache.
>

> My idea is that for each access, this server should inform the server on
> which the original document resides.

yes -- hence we need three classes: free, forpay, restricted. I wonder whether
these were Brewster's 3 types of flag in his proposed WAIS udi.

> The protocol ought to allow for multiple GOT statements (and associated
> headers) in the same message. For this it seems simple enough to require a
> terminating blank line.

Hey, that's not something you do for one method, it's a change to the whole
protocol to introduce pipelining.

A simple thing in the first instance is to say that it is illegal to cache
a for-pay document unless you have a private arrangement with the owner
about refunding him. This could be done using a completely separate billing
process.
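
Put as code, the simple rule for a caching server might look like the Python sketch below. The function name may_cache and the flag has_billing_arrangement are made up for this note; the three class names are the ones suggested above.

```python
def may_cache(doc_class, has_billing_arrangement=False):
    """Decide whether a local server may keep a copy of a document.

    doc_class is one of "free", "forpay" or "restricted".
    """
    if doc_class == "free":
        return True
    if doc_class == "forpay":
        # Only with a private arrangement to pay the owner back.
        return has_billing_arrangement
    if doc_class == "restricted":
        # The cache cannot know which readers are allowed.
        return False
    raise ValueError("unknown document class: %r" % doc_class)

for doc_class in ("free", "forpay", "restricted"):
    print(doc_class, may_cache(doc_class))
```

The billing itself stays outside the protocol here, as a separate process between the cache operator and the owner.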

> Naming Parts of a Multipart body
> --------------------------------
>

> It would be nice to use the MIME format's capability to send multiple
> documents as part of the same message, e.g. an HTML doc with several
> pictures. To make this work each separate part needs to include the
> Document Udi in its header, so that the browser can check if it has the
> document in its local cache (history stack) or whether it needs to make
> a network request for the picture etc.
>

> DocumentName: Udi

Sure. This is a very useful thing to put in a mail message anyway.
Like

Archived-as:
or
Available:

> Effective support for discussion groups
> ---------------------------------------
>

> My model is that discussion groups each have unique Udi's. Each discussion
> group has a sequence of base notes, and each base note is associated with a
> sequence of responses. I am unsure of how to deal with cross postings!

I agree that the POST method is well defined as a method of the
newsgroup class which takes an article as a parameter. In fact, as you say,
cross-posting makes a mess of this, as it involves many groups in one atomic
operation. This is a peculiarity of news which makes it difficult to map onto
the object model. Any ideas?

...
> You also need a way of retrieving a given response. One way is to ask for
> the list of Udi's for all the responses,

Yes.

> another is a command to get a particular
> response given the Udi for the base note and a sequence number, e.g.

No -- not sufficiently stateless.

> I assume the POST command can be accompanied by an html doc as a body.

yes

> Looking forward to your comments,
>

> David Raggett

as you see mine got shorter with time...

tim