Re: Broken links, are we ever going to address them?

Martijn Koster (m.koster@nexor.co.uk)
Mon, 23 Jan 1995 13:37:56 +0100


Paul Phillips wrote:

> This isn't going to fly with the current Referer implementations.
> Too many browsers lie, especially all the Mozillas which constitute
> over half the web clients currently. Even if every version written
> from now on were accurate, the sheer number of liars deployed will
> result in too many false positives. I get dozens of the MCOM home
> URL in my Referer logs on a daily basis.

I don't think this is that much of a problem:

- As Paul Burchard pointed out, Netscape is likely to be fixed RSN.
- WWW users are like little kids; they always want the latest browser,
so upgrades propagate quite fast.
- In the specific case of the infamous MCOM home page it would result
in lots of BROKEN notifications to their server, not yours :-)
- If the server also sends a 'By-User-Agent' field you could even
filter out broken user agents.
- A script can actually confirm the reported misses (by retrieving the
page and scanning for the reportedly broken URL); see the sketch after
this list.
- If your documents have few broken links you won't have many
notifications, and therefore few false positives.
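To make the last two points concrete, here is a rough sketch of the
sort of script I have in mind (Python, purely for illustration; the
names KNOWN_LIARS and confirm_broken_link are made up, and the check
is a crude substring scan rather than proper HTML parsing):

    # Confirm a reported miss: fetch the referring page and check that
    # it really contains the reportedly broken URL, and skip reports
    # from clients on a (hypothetical) list of known-broken user agents.
    import urllib.request

    KNOWN_LIARS = ("Mozilla/0.9",)   # hypothetical list of buggy user agents

    def confirm_broken_link(referer, broken_url, user_agent=None):
        """Return True only if the referring page actually cites broken_url."""
        if user_agent and user_agent.startswith(KNOWN_LIARS):
            return False             # filter out reports from broken clients
        try:
            with urllib.request.urlopen(referer) as reply:
                page = reply.read().decode("latin-1", errors="replace")
        except OSError:
            return False             # can't fetch the referring page; ignore
        return broken_url in page    # crude scan; an HTML parser would be nicer

Only the reports that survive such a check need to generate a
notification, which should keep the false positives down to a trickle.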

I used to use Referer to make inverse maps. With the number of lying
implementations I had to give that up. But I still successfully use
Referer when investigating broken links. Because few people do
anything with Referer there is little pressure to fix the bugs. If it
were used in the way I proposed, this pressure would mount.
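For what it's worth, the broken-link investigation amounts to little
more than grouping 404s by their Referer. A minimal sketch (Python
again; it assumes a combined-format access log with the Referer quoted
after the status and byte count, so adjust the regexp to whatever your
server actually writes):

    # Group 404s by referring page, so you can see which remote
    # documents point at URLs that don't exist on your server.
    import re
    from collections import defaultdict

    LINE = re.compile(r'"(?:GET|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"')

    def broken_by_referer(logfile):
        hits = defaultdict(set)
        with open(logfile) as log:
            for line in log:
                m = LINE.search(line)
                if m and m.group("status") == "404" and m.group("referer") != "-":
                    hits[m.group("referer")].add(m.group("url"))
        return hits

    for referer, urls in sorted(broken_by_referer("access_log").items()):
        print(referer, "->", ", ".join(sorted(urls)))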

I think we have probably been too easy on browser writers, which may
stem from the time that they were written/maintained on a spare-time
best-effort basis. :-)

I really don't think we should design the mechanism around broken
implementations. Fix the implementations.

> There also needs to be a more reliable way of ascertaining the
> maintainer of a page. There are a few machine heuristics and a few
> more human ones that can work, but no reliable method. Even a ~user
> URL isn't guaranteed to be able to receive mail at the same machine.

My main concern with this is that you take the notification out of the
HTTP realm, and rely on external means of notification. These will be
more difficult to standardise and automate, even apart from the
difficulty of designing a solid method of identifying a user.

By keeping it in HTTP and between servers you can always add/change
this identification and notification process by changing how a server
handles it.
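To illustrate the shape of what I mean: a server that notices a 404
with a Referer could fire the notification straight back at the server
holding the referring document, along these lines (Python; the BROKEN
method and the Broken-URI/Referring-Document headers are purely
hypothetical -- they are the proposal, not anything servers implement
today):

    # Hypothetical server-to-server notification of a broken link.
    from http.client import HTTPConnection
    from urllib.parse import urlsplit

    def notify_broken(referer, broken_url):
        """Tell the server holding the referring page about the dead link."""
        parts = urlsplit(referer)
        conn = HTTPConnection(parts.netloc)
        conn.request("BROKEN", parts.path or "/",
                     headers={"Broken-URI": broken_url,
                              "Referring-Document": referer})
        reply = conn.getresponse()   # that server decides who (if anyone) to tell
        conn.close()
        return reply.status

The point is that the receiving server is free to map the notification
onto mail, a log file, or nothing at all, without the notifying side
having to know anything about its users.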

> I don't know if there's an easy answer to this, but I do think it's
> important that it be addressed. The web, like the Internet, has
> maintained the cooperation model for a long time, but the influx of
> corporate users is high. If a broken URL generates a noticeable
> number of errors to such a server, they may not be inclined to just
> patiently wait for it to get fixed.

Quite. I'm starting to wonder whether retrieving a broken URL from my
server constitutes acceptable use :-) It is especially infuriating to
find broken URLs that you know have never ever worked.

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html