Broken links, are we ever going to address them?

Martijn Koster (m.koster@nexor.co.uk)
Sun, 22 Jan 1995 20:09:41 +0100


Hi all,

I have just spent my weekly half hour looking at my server's error
logs, trying to find failed retrievals that are caused by broken
links. Thanks to the CERN HTTPD's handy logging of Referer, and an
error-log summarising Perl script I ended up writing, this is
reasonably effective. I fixed one local broken link, and mailed about
5 remote sites. The latter especially is tedious; you have to make
time to write a message (even if that is also facilitated by a Perl
script :-), and you're fixing other people's mistakes (some of the
broken URLs never even existed!)
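The kind of error-log summarising script I mean might look like this minimal sketch. The log line format here is an assumption (a simplified "<url> 404 referer=<referer>" line); a real CERN HTTPD error log would need a different regex:

```python
import re
from collections import Counter

def summarise(log_lines):
    """Group failed retrievals by (broken URL, referring page).

    Assumes a simplified "<url> 404 referer=<referer>" log line;
    adapt the regex to your server's actual error-log format.
    """
    pattern = re.compile(r"(\S+) 404 referer=(\S+)")
    counts = Counter()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Hypothetical log lines, most-repeated broken link first in the report.
log = [
    "/old/page.html 404 referer=http://remote.example/index.html",
    "/old/page.html 404 referer=http://remote.example/index.html",
    "/gone.html 404 referer=http://web.example.org/links.html",
]
for (url, ref), n in sorted(summarise(log).items(), key=lambda kv: -kv[1]):
    print(f"{n:3d}  {url}  <- {ref}")
```

Sorting by count means the links most worth fixing (or mailing about) come out on top.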

So I'm wondering if we're ever going to do something about this. It
is obvious that many people (including myself) don't (want to) run
local link-checking robots regularly enough. And the "offending"
servers don't know they're serving broken links unless they're local.
But that ought to change, IMHO.

Some random thoughts about this:

Idea 1: This could be changed by having clients that find a broken
URL send the offending server an HTTP/1.1 method BROKEN, with two
fields: URL (the broken URL) and Referer (the URL of the page with the
broken URL). A server can then log this, for later analysis by
humans/Perl scripts/whatever. Obviously a client doesn't do this if
the user cancelled or the connection timed out, or if there is no
Referer.
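On the wire, such a notification might look like the sketch below. To be clear, neither the BROKEN method nor this header layout exists in HTTP; this only illustrates the shape of what I'm proposing:

```python
def broken_request(broken_url, referer):
    """Render a hypothetical BROKEN notification as HTTP-style request text.

    BROKEN is the method proposed in this message, not part of any
    HTTP specification; the two fields are the broken URL (as the
    request target) and the Referer that pointed at it.
    """
    return (
        f"BROKEN {broken_url} HTTP/1.1\r\n"
        f"Referer: {referer}\r\n"
        "\r\n"
    )

print(broken_request("/pub/old.html", "http://remote.example/links.html"))
```

The client would fire this off asynchronously after a failed retrieval and ignore the response entirely, so the user never waits on it.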

This is an overhead for the user-agent (although it shouldn't be too
bad if it's asynchronous), but something might then be done about it,
which benefits the user-agent's user.

A drawback is that the server may get many repeated notifications,
although this could be a feature (you know which broken links to
tackle first :-)

Idea 2: rather than the client implementing this, the server can do so
instead; when finding a failed URL it can initiate the BROKEN method
to the server found in the Referer (pity so many Referers lie). This
also reduces the repeats if a server remembers it has flagged a
particular error situation.
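The "remembers it has flagged a particular error situation" part could be as simple as a set keyed on the (broken URL, referer) pair. A sketch, assuming the actual sending of the BROKEN notification is handled elsewhere:

```python
class BrokenLinkNotifier:
    """Suppresses repeat notifications for the same broken link.

    Remembers which (broken URL, referer) pairs have already been
    reported, so a popular page full of hits on the same dead link
    only triggers one BROKEN notification.
    """

    def __init__(self, send):
        self._send = send      # callable that performs the actual notification
        self._seen = set()

    def notify(self, broken_url, referer):
        key = (broken_url, referer)
        if key in self._seen:
            return False       # already flagged; stay quiet
        self._seen.add(key)
        self._send(broken_url, referer)
        return True

# Hypothetical usage: collect outgoing notifications in a list.
sent = []
notifier = BrokenLinkNotifier(lambda url, ref: sent.append((url, ref)))
notifier.notify("/gone.html", "http://remote.example/a.html")
notifier.notify("/gone.html", "http://remote.example/a.html")  # suppressed
print(len(sent))
```

A real server would want to expire entries eventually (the remote site may fix the link, then break it again), but the in-memory set shows the idea.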

Introducing this means that servers unaware of this method will
complain, which isn't really a problem as any errors need not be
shown to the user.

The unfortunate thing is that it requires the server's administrator to
take action. This can be either finding the relevant user (easy if you
use a UserDir feature), or publishing the list for people to look at.
I'd be quite prepared to do this, and I'm sure a server or a log
analyser can automate the process. But it would be better if the
actual maintainer of the document could automatically be notified.

Idea 3: send the responsible person's mail address in the HTTP
request. The client can then mail automatically when it finds a broken
link, and the user can mail manually with other comments. I don't
like that at all, as it relies on email, fills up your mailbox, and can
easily be abused.

Idea 4: Instead of sending an email address, send a URL that is to be
retrieved should the link be found broken. This can be a script that
either logs the event, or is clever enough to take another action.
This is a variant of Idea 1, but smells rather of a hack.

In all of the above, special consideration has to be given to caches:
obviously a cached copy of a document can still contain a broken link
even after the original has been rectified.

My favourite so far is number 2.

Does anybody else think this should be addressed? Does anyone else
check their logs/would they check such a BROKEN log? Are there any
other/better solutions?

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html