Re: More on Indexing and Moving one higher than HTML etc

Paul Everitt (paul@cminds.com)
Thu, 4 Aug 1994 11:33:24 -0400 (EDT)


[sorry for the mangled justification]

Paul Wain wrote:
> I talked about wanting to keep certain information in a file that may
> or may not be transparent (author, owner, keywords etc) and the more I
> think about it the more that I can see that people **won't** add this
> information to the files. After all, why should they? It won't show up
> at the page view level so people tend to look at the wider
> implications. How can we find a way, without inventing a submissions
> system, to enforce people to use this information?

If you can't get people to add the ALT tag for owner, then that is quite
a pickle! What are you going to try, a UID<->Name mapping?

> I'm fairly sure now that we will need to come up with our own indexing
> system. Again this is due to the number of documents we are looking
> at. It would need to be able to run on the files themselves rather
> than the HTTP output, it would need to automatically update the files
> (so the users don't need to run it when they add a file in), and as
> such it must be able to understand how to arrive at the URL for the
> file. Is this doable? I can't see a way unless I can get around the
> problem in the previous paragraph.

I am doing something like that in Python. I am toeing the line on the
Aliweb architecture (more specifically, the IAFA templates), but am using
my own internal code. There will still be a site.idx file for Aliweb to
look at. More on what I'm doing below.

> (I was going to include my bit on ALIWEB here but I can't access its
> home page right now - timed out - but I think that the above should
> answer questions as to why we think we can't use it here...)

As of today, it should be fixed. Martijn is implementing mirrors and
local search capability.

> Also there were very few ideas on how to track author and ownership.
> Does this mean that no one has looked at this issue?

I have *really* been looking at it. Here's a snapshot of what I'm doing.

I have a library of classes that implement HTML tags. These tag
instances have attributes like owner, which (right now) I am only
comparing against the REMOTE_HOST. Then, I have a Python CGI script that
reads in the site.idx file, creating a dictionary keyed by IAFA template
type (SITEINFO, ORGANIZATION, DOCUMENT, USER, etc.). Each dictionary
value is a list of all the templates in the site.idx file that have that
Template-Type: value.
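
A minimal sketch of that reading step, written in present-day Python
rather than the code described above. It assumes site.idx holds
"Field: value" blocks separated by blank lines, with indented lines
continuing the previous field; the function name parse_site_idx and the
sample records are made up for illustration.

```python
import re
from collections import defaultdict


def parse_site_idx(text):
    """Group templates by their Template-Type field (hypothetical helper)."""
    by_type = defaultdict(list)
    # Assumption: one template per block, blocks separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        template = {}
        last_field = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and last_field is not None:
                # Indented line: treat it as a continuation of the last field.
                template[last_field] += " " + line.strip()
                continue
            field, sep, value = line.partition(":")
            if not sep:
                continue  # not a "Field: value" line, skip it
            last_field = field.strip()
            template[last_field] = value.strip()
        if "Template-Type" in template:
            by_type[template["Template-Type"]].append(template)
    return dict(by_type)


# Made-up sample data, only to show the shape of the result.
SAMPLE = """\
Template-Type: DOCUMENT
Title: Example page
URI: /example.html
Description: A page that exists only
 for this example

Template-Type: USER
Name: Example User
"""

index = parse_site_idx(SAMPLE)
print(sorted(index))
```

Keeping a list per template type means a search or edit only has to walk
the templates of the type it cares about.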

So, I can now send methods to this dictionary object -- search, edit,
etc. All HTML is constructed from entity class instances -- the title,
owner, etc. objects go into a head object; the paragraphs, anchors, and
lists go into a body object. Head and body go into a document object,
and I then just say document.render(), and out comes a bunch of HTML.
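
The head/body/document nesting can be sketched roughly like this. It is
an illustration in present-day Python, not the library itself: the class
names (Element, Title, Paragraph, Head, Body, Document) and the owner
keyword are assumptions; only render() and the .owner attribute come
from the description above.

```python
from html import escape


class Element:
    """Hypothetical base class: one instance per HTML tag."""

    tag = "div"

    def __init__(self, *children, owner=None):
        self.children = list(children)
        self.owner = owner  # who is allowed to edit this object

    def render(self):
        # Render nested elements recursively; escape plain text children.
        inner = "".join(
            child.render() if isinstance(child, Element) else escape(str(child))
            for child in self.children
        )
        return f"<{self.tag}>{inner}</{self.tag}>"


class Title(Element):
    tag = "title"


class Paragraph(Element):
    tag = "p"


class Head(Element):
    tag = "head"


class Body(Element):
    tag = "body"


class Document(Element):
    tag = "html"


document = Document(
    Head(Title("Site index")),
    Body(Paragraph("Templates & owners")),
)
print(document.render())
```

Because every tag is an object, the script never concatenates raw markup
by hand; escaping and nesting happen in one place.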

For an edit to be accepted, the WWW client that sends it has to match
the .owner attribute on that object. This allows remote editing of the
templates. Finally, I have a function for printing a site.idx file
(conformant to what Aliweb is looking for) from the dictionary object in
memory.
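
Those last two pieces, the owner check and the site.idx printer, might
look something like the sketch below. The names may_edit and
format_site_idx are invented for illustration, the check is the plain
REMOTE_HOST comparison mentioned earlier, and the output layout assumes
the same blank-line-separated "Field: value" blocks; the exact format
Aliweb expects should be checked against its documentation.

```python
import os


def may_edit(owner, environ=os.environ):
    """Allow an edit only when the CGI REMOTE_HOST equals the owner."""
    return owner is not None and owner == environ.get("REMOTE_HOST")


def format_site_idx(by_type):
    """Serialize a {template type: [template dict, ...]} mapping as text."""
    blocks = []
    for template_type, templates in by_type.items():
        for template in templates:
            # Emit Template-Type first, then the remaining fields in order.
            lines = [f"Template-Type: {template_type}"]
            lines += [
                f"{field}: {value}"
                for field, value in template.items()
                if field != "Template-Type"
            ]
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Made-up in-memory index, only to show the round trip to text.
index = {
    "DOCUMENT": [{"Template-Type": "DOCUMENT", "Title": "Example page"}],
    "USER": [{"Template-Type": "USER", "Name": "Example User"}],
}
print(format_site_idx(index), end="")
```

A CGI script would call may_edit() before applying a change and then
rewrite site.idx from the updated dictionary.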

I'd like to add that the system (without implementing the choices for
enhancement) is *very* fast.

Sorry, I know this is a lot, but hey, you were lamenting the dearth of
feedback!

Paul Everitt V 703.785.7384 Email Paul.Everitt@cminds.com
Connecting Minds, Inc. F 703.785.7385 WWW http://www.cminds.com/