New service: The Unified CS TR Index

Marc VanHeyningen (mvanheyn@cs.indiana.edu)
Thu, 20 May 1993 13:21:53 -0500


Announcing the availability of an experimental new service within the
World Wide Web (WWW), the Unified Computer Science Technical Report
Index.

WHAT IT IS

It's pretty simple, really. A daemon runs and pulls index files from
many FTP sites which archive tech reports (and similar material). At
present the index covers 39 FTP sites and over 1,400 reports; both of
these numbers are growing rather
rapidly. This information is then converted into entries for each
tech report with hypertext anchors to the TR itself, producing a
really big file. This file is then searchable for keywords by a
Simple Index Keyword Search (SIKS). I believe it represents a
potentially nicer general interface to these information resources
than existing methods (e.g. WAIS pointers to FTP sites). It certainly
is not the ultimate information browsing tool, but I hope it may help
push things a little in that direction.
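
As a rough illustration of the two steps described above (turning index
records into entries with hypertext anchors, then searching them by
keyword), here is a minimal Python sketch. It is not the daemon's actual
code: the Report record, the to_entry and search functions, and the
example.edu host are all hypothetical names chosen for illustration, and
the search here only looks at titles.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Report:
    host: str   # FTP host that archives the report (hypothetical field)
    path: str   # path of the report file on that host
    title: str  # title or description taken from the site's index file


def to_entry(report: Report) -> str:
    """Render one report as an HTML list item whose anchor points at the TR."""
    url = f"ftp://{report.host}/{report.path.lstrip('/')}"
    return (
        f'<li><a href="{escape(url, quote=True)}">'
        f"{escape(report.title)}</a> ({escape(report.host)})</li>"
    )


def search(reports, keywords):
    """Return the reports whose title contains every keyword (case-insensitive)."""
    words = [k.lower() for k in keywords]
    return [r for r in reports if all(w in r.title.lower() for w in words)]


if __name__ == "__main__":
    reports = [
        Report("ftp.example.edu", "/pub/tr/tr101.ps.Z", "Lazy Evaluation in Scheme"),
        Report("ftp.example.edu", "pub/tr/tr102.ps", "Parallel Garbage Collection"),
    ]
    for hit in search(reports, ["scheme"]):
        print(to_entry(hit))
```

Requiring every keyword to match is just one possible choice; a real
keyword search could equally well rank partial matches.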

Note that this index only maintains pointers to papers that are
available online by a simple mouse click within XMosaic; following a
link will not entail walking to your local library or sending somebody
a check. I do not know of any other indexing system for CS papers
which is this large and which easily allows direct network access to
the documents themselves.

HOW TO USE IT

The URL is:

http://cs.indiana.edu/cstr/search

LIMITATIONS

This is still highly experimental, but I wanted to mention its
existence to the world so people can start to play around with it.

I'm sure there are some sites that archive TRs and the like that
aren't included in the TR listings I got my hands on. I'm not even
done looking through the listings I have, so for now please don't
bombard me with names of archive sites that aren't indexed yet unless
you are the maintainer of one.

There are a lot of different archive sites, and thus there are a lot
of different file formats for the indexes. Some sites don't have an
index at all. Some sites have file structures that are not easy to
grok. The daemon I have written understands several different types
of indexing, but does so in a rather crude way; thus, the results are
typically functional but may not always look pretty. If you don't
like this, then you'll have to go out and persuade every site that
archives TRs to agree on a standardized index file format. Good luck;
should take about 10 years. :-) If you would just like one specific
site (say, yours) to look a little nicer, write some code in perl to
do so and send it to me and I'll see what I can do.
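
To give a feel for what handling several index formats involves, here is
a small Python sketch of a per-format parser table. Both formats shown
are made up for illustration and are not the formats of any particular
archive site; the names parse_whitespace, parse_colon, PARSERS, and
parse_index are likewise hypothetical. (The daemon itself takes Perl, as
noted above; Python is used here only to show the idea.)

```python
def parse_whitespace(text):
    """Hypothetical format A: 'filename   title' on each line."""
    records = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:          # skip blank lines and bare filenames
            records.append((parts[0], parts[1].strip()))
    return records


def parse_colon(text):
    """Hypothetical format B: 'filename: title' per line, '#' starts a comment line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, title = line.partition(":")  # split at the first colon only
        if sep and title.strip():
            records.append((name.strip(), title.strip()))
    return records


# One parser per known index style; a new site style means one new entry.
PARSERS = {"whitespace": parse_whitespace, "colon": parse_colon}


def parse_index(fmt, text):
    """Dispatch to the parser for this format; unknown formats yield nothing."""
    parser = PARSERS.get(fmt)
    return parser(text) if parser else []


if __name__ == "__main__":
    print(parse_index("whitespace", "tr101.ps.Z  Lazy Evaluation in Scheme\n"))
    print(parse_index("colon", "# 1993 reports\ntr7.ps: Parallel Garbage Collection\n"))
```

Lines a parser cannot make sense of are dropped rather than guessed at,
which keeps the output functional even if it isn't always pretty.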

Obviously, if an FTP site happens to be down when the index is made,
its stuff won't be in there. Other errors (e.g. typos in index files)
can cause problems, but I can't really fix them. Since the index file
is so large, I don't check all (or even very many) of the entries in
it.
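
The behaviour described above (a site that is down is simply left out of
that run) might be sketched in Python as follows. This is an assumption
about the general shape, not the daemon's code: build_index and the
fetch callable are hypothetical, and the fetcher is passed in so the
sketch does not depend on any real FTP site. The only library name used
is ftplib.all_errors, the standard tuple of exceptions an FTP client can
raise.

```python
from ftplib import all_errors  # (ftplib.Error, OSError, EOFError)


def build_index(hosts, fetch):
    """Collect each host's index text; skip hosts that fail instead of aborting.

    `fetch` is any callable taking a host name and returning that site's
    index file as text (hypothetical; supplied by the caller).
    """
    indexes = {}
    skipped = []
    for host in hosts:
        try:
            indexes[host] = fetch(host)
        except all_errors as err:  # refused connection, timeout, FTP error, ...
            skipped.append((host, str(err)))
    return indexes, skipped


if __name__ == "__main__":
    def fake_fetch(host):
        # Stand-in for a real FTP fetch: one site pretends to be down.
        if host == "down.example.edu":
            raise ConnectionRefusedError("connection refused")
        return "tr101.ps.Z  Lazy Evaluation in Scheme\n"

    indexes, skipped = build_index(["ftp.example.edu", "down.example.edu"], fake_fetch)
    print(sorted(indexes), skipped)
```

Keeping a list of skipped hosts at least makes it possible to see which
sites were missing from a given run.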

There are some sites with rather unhelpful filenames. For example,
the XMosaic browser will automatically pop up a PostScript previewer
if the filename ends with .ps (or .ps.Z or the like); however, some
sites have PostScript files whose names lack that suffix. Not much I
can do about that.
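
For the curious, a suffix test of the kind described above could look
roughly like this in Python. It is only a guess at the general idea, not
the browser's actual logic: looks_like_postscript and
COMPRESSION_SUFFIXES are hypothetical names, and treating .gz like .Z is
an assumption standing in for "or the like".

```python
# Compression suffixes to strip before checking for ".ps".
# ".Z" comes from the text above; ".gz" is an assumed extra.
COMPRESSION_SUFFIXES = (".Z", ".gz")


def looks_like_postscript(filename):
    """Guess from the name alone whether a file is (compressed) PostScript."""
    name = filename
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]  # drop one compression suffix
            break
    return name.lower().endswith(".ps")


if __name__ == "__main__":
    for candidate in ("tr101.ps", "tr101.ps.Z", "report42", "notes.txt.Z"):
        print(candidate, looks_like_postscript(candidate))
```

A name like "report42" fails the test even if the file really is
PostScript, which is exactly the problem with such sites.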

Anyway, give it a try. Feel free to send constructive criticisms,
praise, or lavish gifts.

- Marc

--
Marc VanHeyningen   mvanheyn@cs.indiana.edu   MIME & RIPEM accepted

I'm married, I program computers, and I'm a grad student. If that doesn't give me permission to look like a slob, I don't know what does.