Re: The future of meta-indices/libraries?

Martijn Koster (m.koster@nexor.co.uk)
Tue, 15 Mar 1994 21:34:06 --100


> I think the WWW community should have addressed this long ago. This
> is the main area in which we are well behind the gopher community.

I think this is an example of what we lose by not having a Working Group.
It is easy to discuss problems and come up with solutions, but even
when a solution is proven to work there is no mechanism for
standardising it. As a result the same problems keep arising, and
people keep coming up with the same solutions.

In this case the problem has been addressed by ALIWEB. Have a look at
http://web.nexor.co.uk/aliweb/doc/aliweb.html

> In my opinion, one of the most important design criteria should be to
> eliminate the need for indexers (of whom there will likely be many) to
> walk the entire server tree. This can be annoying and in the worst
> cases disruptive.

I couldn't agree more. This is why I don't welcome the trend towards
Robots, and why I hope to help keep an eye on them by gathering
information on the Robots page (http://web.nexor.co.uk/mak/doc/robots/robots.html).

> A second important criterion would be giving the maintainer control
> over what is indexed.

> I would argue for a very simple document ....

ALIWEB does that.
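For illustration, a minimal sketch of reading such a document, assuming ALIWEB's IAFA-style layout: records of "Field: value" lines separated by blank lines, with indented lines continuing the previous field. The field names used here (Template-Type, Title, URI, Description, Keywords), the sample records, and the functions parse_templates and search are hypothetical illustrations, not ALIWEB's own code; the ALIWEB documentation above is the authority on the real format.

```python
from typing import Dict, List, Optional

# Hypothetical sample index file in IAFA-style template form.
SAMPLE = """\
Template-Type: DOCUMENT
Title: ALIWEB documentation
URI: /aliweb/doc/aliweb.html
Description: How to describe the services on your server
  so that ALIWEB can index them.
Keywords: indexing, resource discovery

Template-Type: DOCUMENT
Title: World Wide Web Robots
URI: /mak/doc/robots/robots.html
Keywords: robots, spiders
"""


def parse_templates(text: str) -> List[Dict[str, str]]:
    """Split template text into records of 'Field: value' pairs."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current, last_field = {}, None
            continue
        if raw[0] in " \t" and last_field is not None:
            # An indented line continues the previous field's value.
            current[last_field] += " " + raw.strip()
            continue
        field, sep, value = raw.partition(":")
        if not sep:
            continue  # Skip malformed lines that have no colon.
        last_field = field.strip()
        current[last_field] = value.strip()
    if current:
        records.append(current)
    return records


def search(records: List[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Naive case-insensitive substring search over a few descriptive fields."""
    term = term.lower()
    fields = ("Title", "Keywords", "Description")
    return [r for r in records if any(term in r.get(f, "").lower() for f in fields)]


print([r["Title"] for r in search(parse_templates(SAMPLE), "robots")])
# ['World Wide Web Robots']
```

Because the maintainer writes the file, nothing has to walk the tree: an indexer fetches one document per server.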

> As a server writer I would implement this by having my server create
> this document on the fly when it is first requested and then cache
> it for later use until it expires. Subsequent requests would get
> the cached version until its expiration after which a new version
> would be created and cached. The maintainer would set the expiration
> period and could mark any part (or all) of his tree as not to be
> indexed. The cached file would be extremely useful for features local
> to the server also. For example, a search of all titles on the server
> or WAIS searches which return a menu of *titles* of hits (this is done
> now by WWWWais, for example, but it must search each document corresponding
> to a hit to extract its title).

I'm not sure what you mean here. Indexing all the titles on a server
and searching those sounds attractive, but I doubt it will be sensible,
because you need to retain the context of the titles.
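The on-the-fly generation and caching scheme proposed in the quoted message could be sketched as follows. This assumes a hypothetical build() callback that walks the server tree once and returns a mapping from document path to title; the class name CachedSiteIndex, the prefix-based exclusion list and the titles_for() helper are illustrative assumptions, not part of any existing server. titles_for() shows the proposed benefit for WAIS-style searches: hits become a menu of titles read from the cached index rather than from each document.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


class CachedSiteIndex:
    """Build a site index on first request and reuse it until it expires."""

    def __init__(self, build: Callable[[], Dict[str, str]], max_age: float,
                 excluded: Iterable[str] = (),
                 clock: Callable[[], float] = time.monotonic):
        self._build = build                # hypothetical: returns {path: title}
        self._max_age = max_age            # maintainer-set expiry, in seconds
        self._excluded = tuple(excluded)   # path prefixes marked "do not index"
        self._clock = clock
        self._cache: Optional[Dict[str, str]] = None
        self._built_at = 0.0

    def _is_excluded(self, path: str) -> bool:
        # Prefix match, so prefixes should end in "/" to mark whole subtrees.
        return any(path.startswith(prefix) for prefix in self._excluded)

    def get(self) -> Dict[str, str]:
        now = self._clock()
        if self._cache is None or now - self._built_at >= self._max_age:
            # Missing or expired: regenerate, dropping excluded subtrees.
            self._cache = {p: t for p, t in self._build().items()
                           if not self._is_excluded(p)}
            self._built_at = now
        return self._cache

    def titles_for(self, hits: Iterable[str]) -> List[str]:
        """Map search hits (paths) to titles without re-reading the documents."""
        index = self.get()
        return [index[h] for h in hits if h in index]


if __name__ == "__main__":
    tree = {"/index.html": "Home", "/private/draft.html": "Draft"}
    site = CachedSiteIndex(lambda: dict(tree), max_age=3600, excluded=["/private/"])
    print(site.titles_for(["/index.html", "/private/draft.html"]))  # ['Home']
```

Passing the clock in as a parameter keeps the expiry logic easy to test. Note that the context problem raised above remains: the cache stores bare titles, with nothing about where each document sits in the tree.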

You mention marking part of a tree not to be indexed. Although it is
not quite what you mean, you may be interested in a proposal on the
Robots page to introduce a voluntary mechanism for excluding robots
from parts of a server's tree. I agree robots are the wrong solution
to the resource discovery problem, but they are going to be around,
and it makes sense to reduce the problems they cause.
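For illustration, a voluntary exclusion file might look like the rules below, written in the /robots.txt syntax associated with this kind of proposal; the proposal's exact syntax at the time may differ. Python's standard urllib.robotparser module reads that syntax, so a well-behaved robot could check each URL before fetching it. The rules, the agent name ExampleBot and the allowed() helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules a maintainer might publish for all robots.
RULES = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/
"""


def allowed(agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if the rules permit this robot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse supplied lines instead of fetching
    return parser.can_fetch(agent, url)


print(allowed("ExampleBot", "http://web.nexor.co.uk/aliweb/doc/aliweb.html"))  # True
print(allowed("ExampleBot", "http://web.nexor.co.uk/private/notes.html"))      # False
```

Compliance is entirely up to the robot: the file only states the maintainer's wishes and cannot enforce them.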

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html