Re: More on Indexing and Moving one higher than HTML etc

Marc VanHeyningen (mvanheyn@cs.indiana.edu)
Thu, 04 Aug 1994 12:09:12 -0500


> I talked about wanting to keep certain information in a file that may or may
> not be transparent (author, owner, keywords etc) and the more I think about
> it the more that I can see that people **won't** add this information to the
> files. After all why should they? It won't show up at the page view level so
> people tend look at the wider implications. How can we find a way, without
> inventing a submissions system, to enforce people to use this information?

Coercive authoring tools? Guns? A smart indexer that divines keywords
by taking short items within <DFN> tags? (though nobody uses those.)
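
As a rough illustration of the "smart indexer" idea, here is a minimal
sketch in Python that pulls short <DFN> contents out of a page as candidate
keywords. The class name, the function name, and the four-word cutoff for
"short" are all assumptions made up for this example, not part of any
existing indexer.

```python
from html.parser import HTMLParser


class DfnKeywordParser(HTMLParser):
    """Collect the text of short <DFN> elements as candidate keywords."""

    def __init__(self, max_words=4):
        super().__init__()
        self.max_words = max_words  # hypothetical cutoff for "short"
        self.keywords = []
        self._depth = 0             # > 0 while inside a <DFN> element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":            # HTMLParser lowercases tag names
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "dfn" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace inside the definition.
                text = " ".join("".join(self._buffer).split())
                self._buffer = []
                if text and len(text.split()) <= self.max_words:
                    self.keywords.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def dfn_keywords(html_text, max_words=4):
    """Return the short <DFN> phrases found in html_text, in document order."""
    parser = DfnKeywordParser(max_words)
    parser.feed(html_text)
    parser.close()
    return parser.keywords
```

Of course this only helps if authors actually mark up their terms, which is
the same social problem in a different form.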

> I'm fairly sure now that we will need to come up with our own indexing
> system. Again this is due to the number of documents we are looking at. It
> would need to be able to run on the files themselves rather than the HTTP
> output, it would need to automatically update the files (so the users don't
> need to run it when they add a file in), and as such it must be able to
> understand how to arrive at the URL for the file. Is this doable? I can't
> see a way unless I can get around the problem in the previous paragraph.

It's very much doable; I gave you pointers to two different programs that do
exactly that in my previous message. They probably aren't exactly what you
want, but do what you describe.
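
The "arrive at the URL for the file" part is usually the easy bit when the
server exports a single document root. A minimal sketch in Python, assuming
a one-to-one mapping from a document root directory to a base URL; the
function name and the example paths and host are hypothetical, and per-user
directories or server aliases would need extra rules:

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def url_for_file(file_path, doc_root, base_url):
    """Map a file under doc_root to its URL under base_url.

    Raises ValueError if file_path is not inside doc_root.
    """
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(doc_root))
    # Percent-encode characters that are not safe in a URL path,
    # keeping "/" as the path separator.
    return base_url.rstrip("/") + "/" + quote(rel.as_posix())
```

For example, url_for_file("/usr/local/www/biology/index.html",
"/usr/local/www", "http://www.example.edu/") gives
"http://www.example.edu/biology/index.html". Run from cron over the whole
tree, something like this means users never have to invoke the indexer
themselves.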

> (I was going to include my bit on ALIWEB here but I can't access its home
> page right now - timed out - but I think that the above should answer
> questions as to why we think we can't use it here...)

Haven't had any trouble accessing it from here. Anyway, no, the concerns you
describe above do not sound in any way incompatible with something that is
ALIWEB-based. The ALIWEB paradigm is, in essence:

1. A site prepares a collection of some IAFA templates describing what it has
2. A central indexer polls for that collection and builds an index out of it

What you are asking above is for an effective way to do step 1, unless you
consider IAFA templates unacceptable as a starting point. Again, you're
not being sufficiently specific about what exactly it is about ALIWEB that
you think makes it entirely inappropriate for your needs.
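
To make step 1 concrete, here is a minimal sketch in Python of generating
template records from per-document metadata. The field names used
(Template-Type, Title, URI, Description, Keywords) are my recollection of
the IAFA template style and should be checked against the IAFA draft before
relying on them; the function names and the record layout are assumptions
made up for this example.

```python
def iafa_template(title, uri, description="", keywords=(),
                  template_type="DOCUMENT"):
    """Format one template record as "Field: value" lines.

    Field names are assumed from memory of the IAFA template style.
    """
    fields = [("Template-Type", template_type), ("Title", title), ("URI", uri)]
    if description:
        fields.append(("Description", description))
    if keywords:
        fields.append(("Keywords", ", ".join(keywords)))
    # Collapse internal whitespace so every value stays on one line.
    lines = [f"{name}: {' '.join(str(value).split())}" for name, value in fields]
    return "\n".join(lines) + "\n"


def build_index(records):
    """Join records (dicts of iafa_template arguments), blank line between."""
    return "\n".join(iafa_template(**record) for record in records)
```

The point is that the records can be produced mechanically from whatever
metadata you do manage to collect; the hard part is still getting authors to
supply the title, description, and keywords.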

> I must apologise for trying to push this discussion along but we are
> currently stuck for a lot of answers and the structure we are going to end up
> with if we can't resolve some of these issues is going to be horrendous. (The
> biology skeleton pages went up today on our test server... they didn't really
> contain much information but came in at around 50 pages so I'm told. If we
> have 20 or so departments doing this I can't see that we can easily control
> the structure of things AFTER they have happened so we need answers now.)

It is indeed hard to do. The main constraint is whether the people who
prepare and maintain the information have an interest in making it readily
findable; if so, they'll likely put in the effort to make it findable if given
an easy way to do so. If they don't really care, well, you need either an
automated indexer (possible to do, hard to do well, likely to push some of the
work of figuring out how to use it onto the user population) or to hire a LIS
student to prepare the index. :-)