Re: Searchable Web info (was Finding CGI spec...)

CyberWeb (web@sowebo.CHARM.NET)
Mon, 9 Jan 1995 18:32:21 +0100

Nick Arnett wrote:
> At 7:09 PM 1/1/95, Carlos Miguel Paraz wrote:
> >Hello,
> >
> >hoohoo at NCSA doesn't seem to give out the CGI documents. ("not found")
> >Does anyone know another place to get them, preferably from a FTP site?
> >Thanks to all.
> You tried <URL:>? Just worked
> for me.

Me2. But anyway, I've just placed some CGI docs into
> There's also some info at <URL:>.

Thanks for the mention! May I remind everyone, or inform the
newcomers, that there's a whole lot of web developer info at
<URL:http://WWW.Charm.Net/~web/> (soon to become
<URL:>, but still on good ole Charm.Net :*)
> In the "teach a man to fish" category, I'll add the somewhat self-serving
> (excuse the pun) information that we've indexed a bunch of Web-related
> documents, including these, at <URL:>.
> It's still a bit rough -- we're building the list of sites that we'll keep
> indexed; we're also building a Web knowledgebase to go with it. My HTML
> archives of this list and others are included (but I'll warn you that the
> index and the documents are temporarily out of sync -- I just cleaned a lot
> of old data out of the mail archives).
> So, please whack on it when you're looking for information about the Web.
> Feedback is welcome, as always.

OK: it's nice and fast! I searched for "cyberweb" (I can be
self-serving too.. :*) and got back some docs that are only signposts
I put up after a re-org some months ago. I just re-read the exclusion
standards; robots.txt needs to go at the root of the Charm.Net URL?
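
As far as I read the exclusion standard, robots only look for the file
at the top level of the server, so for pages under a user directory it
would have to sit at the server root, not next to the pages. A minimal
Python sketch of that lookup rule (the helper name robots_url is made up
for illustration, and this is not any particular robot's code):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location a robot would check for page_url.

    Hypothetical helper: keeps the scheme and host, drops the path,
    query and fragment, and substitutes /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The file is looked up per server, not per user directory.
print(robots_url("http://WWW.Charm.Net/~web/"))
```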

I did a search on "cgi" and got back a doc with a name I didn't
recognise. Now although I have several hundred HTML files,
like my children, I know most of them by name :*) I think you got
the href from a file that has a Base tag pointing to another server.
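
To show what I mean, here is a small Python sketch of how a relative
href comes out differently once a Base tag is in play. The host names
and the resolve() helper are invented for the example; I don't know how
the indexer actually resolves links.

```python
from urllib.parse import urljoin


def resolve(href, doc_url, base_href=None):
    """Resolve href against the Base tag's URL if the document declares
    one, otherwise against the URL the document was fetched from."""
    return urljoin(base_href or doc_url, href)


doc = "http://www.example.org/~web/index.html"   # hypothetical page

# No Base tag: the link stays on the server the page came from.
print(resolve("cgi.html", doc))

# Base tag pointing elsewhere: the same href now names a document on
# the other server, which the page's owner would not recognise.
print(resolve("cgi.html", doc, "http://other.example.net/docs/"))
```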

Could indexers let indexed sites know they've been visited?
Then we could go in to check for such problems. People will
probably think it's a bad link at my site :*(

How often will you re-index? Another issue is updates and changes.
If there are major changes it would be nice to be able to ask the
indexers to re-index, else there will not be full info, or links may
appear broken.

I'd find it more helpful to have some text with an entry, e.g. an
excerpt, a la Lycos. The score helps some if one has supplied more
than one keyword. But this is part of the general indexing problem;
we need something (Meta?) to hold a document's abstract/keywords.
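
A rough sketch of what an indexer could do if authors did put an
abstract and keywords into Meta tags. The names "description" and
"keywords" are my assumption for the example, not something any indexer
is known to read, and MetaCollector / extract_meta are made-up names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from Meta tags in a document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of Meta name -> content found in html_text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


sample = (
    '<html><head><title>CGI notes</title>'
    '<META NAME="Keywords" CONTENT="cgi, forms">'
    '<meta name="description" content="A short abstract.">'
    '</head><body>...</body></html>'
)
print(extract_meta(sample))
```

The indexer could then show the description next to each hit instead of
just a title and a score.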

Why no Reset button on the search form?

Finally, I used Lynx for (some of) this. It would help if the Submit
button could be placed after the keyword entry field, not before. And
it's a pain to get down to a returned doc because there are 3 cols and
each is a link. The first looks to be of marginal use, especially when
it keeps leading to "Expired session handle" (how many seconds do we
get?), and the other two are the same link.
> We'll make a real announcement of it when we have a robust set of indexes
> and the knowledgebase...

CyberWeb has a What's New/Announcements page - if you send me a
short text I'll enter it there (email to
> Oh, I just realized that there's a crude knowledgebase there already. One
> thing that's in it is a bunch of synonyms and such for "World-Wide Web,"
> which is called "www-nicknames". For example, if you wanted to search for
> any reference to the Web (such as W3, WWW, etc.) and publishing, you might
> enter a query like this:
> www-nicknames <and> publish
> A few other Topics in the knowledgebase:
> www-servers (knows various names about httpds)
> O'Reilly-names (will get references to ORA and its products)
> www-toolmakers
Nice! I tried "www <and> systems <and> engineering" (self-serving
yet again) and was mildly surprised that documents with those words
in the title, and the body, didn't score higher than docs with those
words only in the body.
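
What I half expected was something like the following, sketched in
Python. The weights, the tokenising and the score() function are all my
own guesses for illustration; I have no idea how the real engine ranks.

```python
import re


def tokens(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query_terms, title, body, title_weight=3.0, body_weight=1.0):
    """Score a document for an <and> query.

    Every term must occur somewhere in the document, otherwise the
    score is 0. Occurrences in the title count title_weight each,
    occurrences in the body count body_weight each.
    """
    title_tokens = tokens(title)
    body_tokens = tokens(body)
    total = 0.0
    for term in query_terms:
        term = term.lower()
        if term not in title_tokens and term not in body_tokens:
            return 0.0  # <and>: a missing term rules the document out
        total += title_weight * title_tokens.count(term)
        total += body_weight * body_tokens.count(term)
    return total


query = ["www", "systems", "engineering"]
body = "notes on www systems engineering"
print(score(query, "WWW Systems Engineering", body))  # title and body
print(score(query, "Misc notes", body))               # body only
```

With any title weight above zero, the first document comes out ahead of
the second.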

It looks like this is heading towards being a really useful tool!


