Re: web roaming robot (was: strategy for HTML spec?)

Tony Johnson (TONYJ@scs.slac.stanford.edu)
Wed, 13 Jan 1993 21:07 PDT


>I have written a robot that does this, except it doesn't check for
>valid SGML -- it just tries to map out the entire web. I believe I
>found roughly 50 or 60 different sites (this was maybe 2 months ago --
>I'm sorry, I didn't save the output). It took the robot about half a
>day (a saturday morning) to complete.

If you do run your robot again I would be very interested if you could
generate a simple list of document titles and their corresponding
document IDs (or URLs). We have a powerful SPIRES database here,
interfaced to the web, into which we could easily import such a file to
create a VERONICA-like index of the web. I think that would be pretty
useful (unless someone is already doing it??).
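For what it's worth, here is a minimal sketch of the kind of robot output I
have in mind: one "title<TAB>URL" line per document, which is easy to bulk
load into a database. This is not the original robot's code; the seed URL,
page limit, politeness delay, and use of Python's standard urllib/html.parser
modules are all my own assumptions for illustration.

# Sketch only: crawl from a seed page and print "title<TAB>URL" per document.
import time
import urllib.request
import urllib.parse
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the document <title> text and all <a href> links."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()

def crawl(seed, max_pages=50):
    seen = set()
    queue = [seed]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        parser = PageParser()
        parser.feed(html)
        # One importable "title<TAB>URL" line per document.
        print(f"{parser.title or '(untitled)'}\t{url}")
        for href in parser.links:
            absolute = urllib.parse.urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                queue.append(absolute)
        time.sleep(1)  # be polite to the servers we visit

if __name__ == "__main__":
    crawl("http://example.org/")  # hypothetical seed URL

The tab-separated format is just one choice; anything with a stable delimiter
between title and URL would import equally well.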

One other problem to add to your list.....many documents are probably
only accessible by supplying a "keyword". Unless you can write a robot
that can successfully guess all possible keywords, you cannot
guarantee that it will traverse the whole web.

Tony