Re: Common Log format

Roy T. Fielding (fielding@avron.ics.uci.edu)
Fri, 14 Apr 1995 04:43:04 +0500


> I have a serious problem with the Common Logfile format, as presented at
> <URL:http://w3.org/hypertext/WWW/Daemon/User/Config/Logging.html#common-logfile-format>.
> It indicates that the "request" portion of the
> log entry should be:
>
> The request line exactly as it came from the client.

Yes -- that is what a log is for.

> Unfortunately with directory indexing, this means that three different
> requests all have the same semantic meaning:
>
> GET /dirname
> GET /dirname/
> GET /dirname/index.html
>
> (Assuming that index.html is the dir index file, this too can vary.) Are
> the current logfile processing programs taking this vagary into
> account?

Yes, it is a trivial thing to do -- wwwstat has done it since v0.1.
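
To illustrate how a log analyzer can fold the three forms together at
report time while the log itself stays untouched, here is a minimal
sketch in Python. It is not wwwstat's actual code; the function name,
the index.html default, and the known_dirs set (paths the analyzer
already knows to be directories) are assumptions made for the example.

```python
def canonical_path(path, index_names=("index.html",), known_dirs=frozenset()):
    """Map equivalent directory requests onto one key ending in '/'.

    index_names and known_dirs are hypothetical inputs: the names the
    server treats as directory index files, and the paths known to be
    directories.
    """
    # /dirname/index.html -> /dirname/
    for name in index_names:
        if path.endswith("/" + name):
            return path[: -len(name)]
    # /dirname -> /dirname/ (only when we know it is a directory)
    if path in known_dirs:
        return path + "/"
    # /dirname/ and everything else pass through unchanged
    return path
```

Because the folding happens in the analyzer, the raw request lines
remain available for any report that needs to tell the forms apart.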

> I intend to log
>
> GET /dirname/index.html
>
> in all cases where index.html existed, and
>
> GET /dirname/
>
> in all cases where it doesn't, unless somebody can provide me with a
> really good reason not to.

Reason: it would be lying -- that is not the request it got, so it shouldn't
be logging it as if it were. For instance, I am usually interested in cases
where there are a large number of requests for

GET /dirname

since that usually means somebody has advertised (or included as a link)
the wrong URL for that dirname. Your scheme would prevent me from finding
those cases in the logfile.
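
A check of that kind only works if the request field is logged
verbatim. As a rough sketch, assuming Common Log Format lines with the
request in double quotes and a caller-supplied set of directory paths
(known_dirs, a hypothetical input), the slashless requests could be
counted like this in Python:

```python
import re
from collections import Counter

# Matches the quoted request field, e.g. "GET /dirname HTTP/1.0".
# The protocol part is optional because HTTP/0.9 requests omit it.
REQUEST_RE = re.compile(
    r'"(?P<method>[^\s"]+) (?P<path>[^\s"]+)(?: (?P<proto>[^"]*))?"'
)

def slashless_dir_hits(lines, known_dirs):
    """Count GET requests that name a directory without the trailing '/'.

    known_dirs holds directory paths written without a trailing slash;
    it is an assumption of this sketch, not part of the log format.
    """
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group("method") == "GET" and m.group("path") in known_dirs:
            counts[m.group("path")] += 1
    return counts
```

If the server rewrote every such request to GET /dirname/index.html
before logging it, this count would always be zero.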

> One of the features of the server I am
> writing will be reliable logging, so this is a little more important than
> it might sound.

In that case, don't do it -- you just introduced an unreliability.
If the server mucks with the request, I can't rely on it for maintenance
and security checks.

....Roy T. Fielding Department of ICS, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>