Re: "Hits" pragma

Brian Behlendorf (brian@organic.com)
Sat, 12 Aug 1995 16:17:42 -0700 (PDT)


(taken off http-wg until we have a real proposal)

On Sat, 12 Aug 1995, Paul Burchard wrote:
> Brian Behlendorf <brian@organic.com> writes:
> > What would the proxy administrator get out of this? Well,
> > the more info that can be forwarded, the more likely
> > content providers will start putting useful Expires in
> > their documents. Web protocols of course should not be
> > designed around "who's more selfish", but hopefully
> > there's a common ground that can be reached.
>
> Finding this common ground is the crucial point. Could you perhaps
> whittle your "wish list" of reporting information down to a
> "requirements list" or even a "prevention of open rebellion list"?

Hmm. Okay. We can throw out User-Agent since there should be no statistical
significance between the user-agents hitting a cached copy and the ones
getting a fresh copy (just weight the numbers as per the hit counts). If
timestamps were thrown out then there'd be the temptation to make the
expiration time very short, so I'd vote we keep those in, in some highly
compressed format. If hostnames were thrown out that would prevent analysis
of top-level domains (.coms vs. .edu's) as soon as we start seeing
cross-top-domain caches (the Hensa cache is probably the only one doing this
currently), but that too could be compressed well since caches probably will
see a lot of common hosts. RequestID and Referer are useful for pathway
determinations - if one had to be chosen over the other it would probably be
Referer (n.b. - if Referer were a *required* header, then there would be
little need for request-ID).

So, it looks like a structure of

host timestamp referer

would satisfy most applications. I would think this information could
be *very* well compressed.
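
As a rough sketch of why such records should compress well, assuming
integer Unix timestamps: common hosts and referers can be replaced by small
table indexes, and timestamps by deltas from the previous hit, before a
general-purpose compressor is applied. The function names and the JSON
layout below are illustrative assumptions, not part of any proposal.

```python
import json
import zlib


def _intern(value, table, index):
    """Return a small integer id for value, adding it to the table if new."""
    if value not in index:
        index[value] = len(table)
        table.append(value)
    return index[value]


def pack_hits(hits):
    """Pack (host, timestamp, referer) tuples into compressed bytes."""
    hosts, referers = [], []
    host_index, referer_index = {}, {}
    rows = []
    previous = 0
    # Sort by time so each timestamp can be stored as a small delta.
    for host, timestamp, referer in sorted(hits, key=lambda hit: hit[1]):
        rows.append([
            _intern(host, hosts, host_index),
            timestamp - previous,
            _intern(referer, referers, referer_index),
        ])
        previous = timestamp
    payload = {"hosts": hosts, "referers": referers, "rows": rows}
    text = json.dumps(payload, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"), 9)


def unpack_hits(blob):
    """Invert pack_hits; records come back in timestamp order."""
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    hits = []
    timestamp = 0
    for host_id, delta, referer_id in payload["rows"]:
        timestamp += delta
        hits.append((payload["hosts"][host_id], timestamp,
                     payload["referers"][referer_id]))
    return hits
```

A cache would call pack_hits() on the hits it served from its copy and
forward the result with its next request to the origin server, which would
call unpack_hits() to recover the records.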

> My main point is that there _is_ a way to start a positive feedback
> loop and get out of this prisoners' dilemma ("who's more selfish"):
>
> (1) merge reporting into the ordinary, profit-making operations
> of the proxy (by forwarding "bundled" requests).

Right. Smart web servers could be configured to know, say, the Hensa
cache will provide them with reliable information, and make the Expires time
on documents served to Hensa much longer than for less reliable
caches. Cool!
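
One way a server might do this, as a minimal sketch: keep a table of caches
known to report reliably and pick the Expires lifetime from it. The host
name, the lifetimes, and the function name are all hypothetical; how the
server identifies the requesting cache is left open here.

```python
from email.utils import formatdate

# Hypothetical values: one hour for unknown caches, a week for trusted ones.
DEFAULT_LIFETIME = 3600
TRUSTED_CACHE_LIFETIMES = {
    "cache.example.ac.uk": 7 * 24 * 3600,  # placeholder host name
}


def expires_for(requesting_host, now):
    """Return an Expires header value for a request from requesting_host.

    now is the current time in seconds since the Unix epoch.
    """
    lifetime = TRUSTED_CACHE_LIFETIMES.get(requesting_host.lower(),
                                           DEFAULT_LIFETIME)
    # usegmt=True yields the HTTP date form, e.g. "... 01:00:00 GMT".
    return formatdate(now + lifetime, usegmt=True)
```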

> (2) make sure adoption of such reporting by major proxies will act
> as a positive incentive for servers to start using Expires
> correctly (if you use Expires right you get periodic reports
> automatically, while if you don't you get blocked!).

Yes.

> > *every* client of ours wants stats as to the busiest time
> > of day for their sites
>
> I don't get it....isn't the point of electronic commerce to break
> out of the constraints of space and time that limit ordinary
> commerce?

A peak usage time of Saturday nights at 9pm for one site and a peak of
Wednesday at noon (adjusted for origin of request when possible of course :)
for another can say a lot about the reasons people visit, and can influence
the decisions made about what other content to put on a site.
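
A sketch of how such a peak could be computed from reported timestamps,
assuming each hit carries a Unix timestamp plus a UTC offset (in hours)
guessed from the origin of the request. Both the input shape and the
function name are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def peak_slot(hits):
    """Return the busiest (day name, hour) in the requesters' local time.

    hits is an iterable of (unix_timestamp, utc_offset_hours) pairs.
    Returns None when there are no hits.
    """
    counts = Counter()
    for timestamp, utc_offset_hours in hits:
        local_zone = timezone(timedelta(hours=utc_offset_hours))
        local = datetime.fromtimestamp(timestamp, tz=local_zone)
        counts[(DAY_NAMES[local.weekday()], local.hour)] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```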

Far too many companies have been given Orwellian promises about what data
they can get, unfortunately.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/