Announcing wwwstat-1.0 -- an NCSA httpd access log summary program

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Sat, 23 Apr 1994 02:29:28 -0700


This message is to announce the availability of wwwstat Version 1.0 --
a program for analyzing NCSA httpd_1.2 or earlier server access logs and
printing an HTML-formatted summary report. The program is written in Perl
and, once customized for your site, should work on any UNIX-based system
with Perl 4.019 or better.

As an example of what wwwstat can do for you, look
<A HREF="http://www.ics.uci.edu/Admin/wwwstats.html"> here </A>
to see UC Irvine's Department of Information and Computer Science
WWW server statistics.

For more information and access to the wwwstat-1.0 distribution,
point your World-Wide Web client at

<A HREF="http://www.ics.uci.edu/WebSoft/wwwstat/"> wwwstat-1.0 </A>.

For those of you without offsite http access but with ftp access, wwwstat is
also available via anonymous ftp at:

<ftp://liege.ics.uci.edu/pub/arcadia/wwwstat/wwwstat-1.0.tar.Z>

One of the nicest things about wwwstat is that it does not make any changes
to or write any files in the server directories. Thus, this program can be
safely run by any user with read access to the httpd server's access_log and
srm.conf files. This allows people to do specialized summaries of just the
things they are interested in.

Version 1.0 provides a plethora of options for creating customized
reports and for making it easier for webmasters to maintain their server.
It is also significantly different from version 1.0a (it has more features),
so if you picked that up over the past few days you will want to get this one
as well. I do not anticipate any more versions until mid-June at the
earliest, since I'll be in Europe for most of May (see ya at WWW'94!).

Usage: wwwstat [-helLoOuUrvx] [-s srmfile] [-i pathname]
               [-a IP_address] [-c code] [-d date] [-t hour] [-n archive_name]
               [-A IP_address] [-C code] [-D date] [-T hour] [-N archive_name]
               [logfile ...] [logfile.gz ...] [logfile.Z ...]

Display Options:
     -h  Help -- just display this message and quit.
     -e  Display all invalid log entries on STDERR.
     -l  Do display full IP address of clients in my domain.
     -L  Don't (i.e. strip the machine name from local addresses).
     -o  Do display full IP address of clients from other domains.
     -O  Don't (i.e. strip the machine name from non-local addresses).
     -u  Do display IP address from unresolved domain names.
     -U  Don't (i.e. group all "unresolved" addresses under that name).
     -r  Display table of requests by each remote ident or authuser.
     -v  Verbose display (to STDERR) of each log entry processed.
     -x  Display all requests of nonexistent files to STDERR.
Input Options:
     -s  Get the server directives from the following srm.conf file.
     -i  Include the following file (assumed to be a prior wwwstat output).
     ... Process the sequence of logfiles (compressed if extension (gz|Z|z)).
Search Options (include in summary only those log entries):
     -a  Containing a hostname/IP address matching the given perl regexp.
     -A  Not containing " " " " " " " "
     -c  Containing a server response code matching the given perl regexp.
     -C  Not containing " " " " " " " "
     -d  Containing a date ("Feb 2 1994") matching the given perl regexp.
     -D  Not containing " " " " " " " "
     -t  Containing an hour ("00" -- "23") matching the given perl regexp.
     -T  Not containing " " " " " " " "
     -n  Containing an archive (URL) name matching perl regexp (except +.).
     -N  Not containing " " " " " " " "
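
To illustrate how the paired search options relate to each other, here is a
minimal sketch in Python (not wwwstat's own Perl code). The Entry record, its
field names, the build_filter helper, and the sample hosts are all hypothetical.
It also assumes that several options combine so that an entry must satisfy
every one of them, which the usage message above does not state. Python's
regexp syntax is close to, but not identical to, Perl's.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical log record; the field names are illustrative only."""
    host: str
    code: str
    date: str     # e.g. "Feb 2 1994"
    hour: str     # "00" through "23"
    archive: str  # requested URL path


def build_filter(include=None, exclude=None):
    """Return a predicate built from {field name: regexp} mappings.

    `include` plays the role of the lowercase options (-a -c -d -t -n) and
    `exclude` the role of the uppercase ones (-A -C -D -T -N).
    """
    inc = {f: re.compile(p) for f, p in (include or {}).items()}
    exc = {f: re.compile(p) for f, p in (exclude or {}).items()}

    def keep(entry):
        # Assumption: every include pattern must match somewhere in its field.
        for field, pat in inc.items():
            if not pat.search(getattr(entry, field)):
                return False
        # Assumption: a single matching exclude pattern rejects the entry.
        for field, pat in exc.items():
            if pat.search(getattr(entry, field)):
                return False
        return True

    return keep


entries = [
    Entry("foo.ics.uci.edu", "200", "Apr 23 1994", "02", "/index.html"),
    Entry("bar.example.com", "404", "Apr 23 1994", "14", "/missing.html"),
    Entry("baz.example.com", "200", "Apr 22 1994", "23", "/index.html"),
]

# Roughly analogous to running with: -d 'Apr 23' -C '^4'
keep = build_filter(include={"date": r"Apr 23"}, exclude={"code": r"^4"})
selected = [e.host for e in entries if keep(e)]
print(selected)  # ['foo.ics.uci.edu']
```

Treating each uppercase option as the negation of its lowercase twin keeps the
whole filter down to two small loops over compiled patterns.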

What's new in this version:

Version 1.0 April 23, 1994
    Now supports the NCSA httpd_1.2 "common" log format.
      As a result, all attempts to figure out file size are gone
      and there is no longer any need for all those fstat tests.
    Code for srm parsing of aliases and scripts has been removed.
    Basically, the entire log parsing section was rewritten and
      then placed in a subroutine to allow for multiple logfiles.
    Bunches of unnecessary backslashes removed from print statements.
    Time of last update now includes GMT offset instead of full GMT.
    Tries to estimate size of headers and error messages to account
      for bytes that are not included in the log entry byte count.
    Allows perl regular expressions (where possible) in all searches.
    Allows multiple logfiles to be analyzed in sequence, with any
      compressed logfiles automatically recognized by their file extension.
    Removed -f and -z options because they are no longer needed.
    Added -c option for searching based on server response code.
    Added the uppercase options -A, -C, -D, -T, and -N which perform
      the negation of the corresponding lowercase letters, i.e. they
      force wwwstat to not include any log entries with the given pattern
      in the address, response code, date, time, or archive name.
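
For readers unfamiliar with the "common" log format and the multiple-logfile
handling mentioned for Version 1.0, the following Python sketch shows one way
such entries could be read. It is not wwwstat's Perl implementation. The
regular expression follows the usual layout of a common-format line (host,
ident, authuser, bracketed date, quoted request, status, byte count). The
function names, the latin-1 decoding, and the use of an external `gzip -dc`
for .Z files are assumptions made for illustration, and the sketch does not
attempt wwwstat's estimate of header and error-message sizes.

```python
import gzip
import io
import re
import subprocess

# host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not in common format."""
    m = CLF.match(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    # A "-" byte count means no body was sent; count it as zero.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


def open_log(path):
    """Open a logfile as text, choosing a reader by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    if path.endswith((".Z", ".z")):
        # Assumption: an external gzip on PATH that can read compress(1) files.
        out = subprocess.run(["gzip", "-dc", path], check=True,
                             capture_output=True).stdout
        return io.StringIO(out.decode("latin-1"))
    return open(path, encoding="latin-1")


def summarize(paths):
    """Process logfiles in sequence and total requests, bytes, bad lines."""
    totals = {"requests": 0, "bytes": 0, "invalid": 0}
    for path in paths:
        with open_log(path) as fh:
            for line in fh:
                rec = parse_line(line)
                if rec is None:
                    totals["invalid"] += 1
                    continue
                totals["requests"] += 1
                totals["bytes"] += rec["bytes"]
    return totals


# A made-up entry in the common log format.
sample = ('foo.ics.uci.edu - - [23/Apr/1994:02:29:28 -0700] '
          '"GET /index.html HTTP/1.0" 200 1043')
rec = parse_line(sample)
print(rec["host"], rec["status"], rec["bytes"])  # foo.ics.uci.edu 200 1043
```

Because the byte count is part of each entry, a summary can total the traffic
without ever touching the served files, which is the simplification the first
changelog item describes.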

Version 0.4 (now called oldwwwstat) April 19, 1994
    Removed escapes to allow regular expressions in -d and -t searches.
    Fixed minor bug of outputting </HEAD> instead of </HTML>.
    Made use of $startTag and $endTag explicit for report output.
    Added option to append subdomain info on end of local hosts.
    Added support for IdentityCheck (rfc931) logfile format.
    Added output of Totals by Remote Identifier when Do_Ident is requested.
    Added -r option to select Do_Ident when IdentityCheck is enabled.
      NOTE: For security reasons, you should not publish to the web any
            report that lists the Remote Identifiers. This option is
            intended for server maintenance only.

If you have any suggestions, bug reports, fixes, or enhancements,
send them to me at <fielding@ics.uci.edu>. Also, I would like to ask anyone
who uses wwwstat on a regular basis to please send me an e-mail message
indicating how and where it is being used (e.g. to publish stats, perform
research, assist in server maintenance, and/or just allow HTML authors to
see how much their work is appreciated) and also, if it is public information,
a URL to your site. This is, of course, entirely voluntary, and I don't want
anyone to divulge private information, but please understand that such
information allows free-software authors like me to justify the time and
effort needed to build quality tools.

Have fun,

...Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
-------------------------------------------------------------------------
This work has been sponsored in part by the Advanced Research Projects
Agency under Grant Number MDA972-91-J-1010. This software does not
necessarily reflect the position or policy of the U.S. Government and no
official endorsement should be inferred. Their support is appreciated.