Re: img statistics

Koen Holtman (koen@win.tue.nl)
Sat, 7 Oct 1995 14:26:11 +0100 (MET)


Hakon Lie:
>I'm looking for pointers to statistics on inlined images on the
>web. E.g., I'd like to know:
>
[...]
> - what percentage of HTTP traffic is image/gif, image/jpg, text/html
> etc.
[...]

>Thanks,
>
>-h&kon

I recently compiled some statistics on HTTP traffic through a local proxy
cache. Among other things, I computed media mix statistics.

I guess this is as good a time as any to write up the results. Here
is a small report. Table 2a answers your question above.

HTTP traffic through the www.win.tue.nl CERN proxy cache.
---------------------------------------------------------

Oct 7, 1995
Koen Holtman, koen@win.tue.nl

The www.win.tue.nl proxy cache serves some of the web clients in the
.tue.nl (Eindhoven University of Technology) domain.

In the statistics below, only HTTP requests for URLs on off-campus
servers, i.e. servers outside the .tue.nl domain, are considered.

Most of the clients served by the proxy cache are also configured to
cache things locally themselves, so the figures below do not accurately
reflect the media mix _seen by the users_; they only tell how much
network traffic was generated to get this media mix.

I took a sample of about 18 days' worth of traffic. To compute the
amount of traffic, only the content lengths of the responses were
counted; the request and response header overhead was ignored. I
estimate that counting the headers would add about 5% to the amount of
traffic.

1. Amount of HTTP traffic

HTTP traffic between proxy cache and off-campus servers:    145 Mb

HTTP traffic between local clients and proxy cache:         204 Mb
HTTP traffic served from cache memory to local clients:      59 Mb

Reduction in off-campus traffic caused by the proxy cache:   30%
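
The arithmetic behind these figures can be sketched in a few lines of
Python. This is only an illustration; the function name is made up here,
and the inputs are the rounded figures from the table, so the result
(about 29%) differs slightly from the 30% reported above.

```python
def cache_reduction(client_mb, offcampus_mb):
    """Return (traffic served from cache, fraction of client traffic saved).

    client_mb    -- traffic between local clients and the proxy cache
    offcampus_mb -- traffic between the proxy cache and off-campus servers
    """
    served_from_cache = client_mb - offcampus_mb
    return served_from_cache, served_from_cache / client_mb


# Rounded figures taken from section 1 of the report.
served, fraction = cache_reduction(204, 145)
print(f"served from cache: {served} Mb")
print(f"reduction in off-campus traffic: {fraction:.1%}")
```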

The media mix figures below are based on analyzing the syntax of the
request-URLs, not on the MIME types of the responses (because these
MIME types were not logged). For example, a response to a request for
the URL

http://hoohoo.ncsa.uiuc.edu/images/fwd.gif

is counted as a gif picture because of the .gif extension.

There are 4 different media categories below:
 text : plain text or HTML
 gif  : gif images
 jpeg : jpeg images
 other: data in some other format, for example .mpeg, .mov, .au, .gz, .zip, .ps.
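
A minimal sketch of this kind of extension-based classification, in
Python, could look as follows. It is not the script used for the report.
The extension sets are assumptions, and in particular the choice to count
URLs without any extension (such as directory URLs) as text is a guess
that the report does not confirm.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed extension sets; the report only names the four categories.
GIF_EXTENSIONS = {".gif"}
JPEG_EXTENSIONS = {".jpg", ".jpeg"}
# Assumption: URLs with no extension (e.g. directory URLs) count as text.
TEXT_EXTENSIONS = {".html", ".htm", ".txt", ""}


def media_category(url):
    """Classify a request-URL as text, gif, jpeg or other by its extension."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in GIF_EXTENSIONS:
        return "gif"
    if extension in JPEG_EXTENSIONS:
        return "jpeg"
    if extension in TEXT_EXTENSIONS:
        return "text"
    return "other"


print(media_category("http://hoohoo.ncsa.uiuc.edu/images/fwd.gif"))
```

Classifying on the URL alone will misjudge some responses (a script may
return an image, for instance), which is the price of not having the
response types in the log.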

2a. Media mix of HTTP traffic between proxy-cache and off-campus servers

 text    gif    jpeg    other    (gif+jpeg)
 30%     22%    20%     28%      (44%)

The 28% off-campus `other' traffic can be divided into

19% : some movie format
9% : not a movie format.

This 19% of all traffic was generated by only 35 requests (0.2% of all
requests) coming from only 9 different clients. Thus, the `other'
percentage above is probably not very meaningful in a statistical
sense. I expect large variations in the size and contents of the
`other' category across the web.

2b. Media mix of HTTP traffic between local clients and proxy-cache

 text    gif    jpeg    other    (gif+jpeg)
 41%     21%    16%     22%      (37%)

2c. Media mix of HTTP traffic served from proxy cache memory

 text    gif    jpeg    other    (gif+jpeg)
 68%     18%    8%      6%       (26%)
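
As a rough consistency check, the client-side mix in 2b should be the
volume-weighted combination of the off-campus mix (2a, 145 Mb) and the
mix served from cache memory (2c, 59 Mb). The Python sketch below does
that calculation; the function and variable names are made up for the
illustration, and because all inputs are rounded percentages the result
can only be expected to match table 2b to within about one percentage
point.

```python
OFFCAMPUS_MB = 145  # proxy cache <-> off-campus servers (section 1)
CACHE_MB = 59       # served from cache memory (section 1)

# Rounded percentages from tables 2a and 2c.
mix_offcampus = {"text": 30, "gif": 22, "jpeg": 20, "other": 28}
mix_cache = {"text": 68, "gif": 18, "jpeg": 8, "other": 6}


def combined_mix(mix_a, volume_a, mix_b, volume_b):
    """Volume-weighted average of two media mixes, per category, in percent."""
    total = volume_a + volume_b
    return {
        category: (mix_a[category] * volume_a + mix_b[category] * volume_b) / total
        for category in mix_a
    }


mix_clients = combined_mix(mix_offcampus, OFFCAMPUS_MB, mix_cache, CACHE_MB)
for category, percent in mix_clients.items():
    print(f"{category:5s} {percent:5.1f}%")
```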