RE: CD-ROM - WWW pages

Jonathon Tidswell (t-jont@microsoft.com)
Sun, 26 Mar 1995 01:58:25 +0500


| From: Jose Pina Miranda <pinj@di.uminho.pt>

| In the WWW National Conference (URL http://www.di.uminho.pt/cnw3.html )
| we want to offer a CD-ROM to all the participants.
| The CD-ROM will have, among other useful informations, some WWW pages from
| all around the World to show people, without Internet connection, how
| the Web looks like.
Sounds like a wonderful idea.

| 1- I read 2 or 3 months ago about a tool that let you get an hierarchy of
| URL's. Someone knows where I can get it ?
You want a web spider (also called a web crawler). Follow the links from
www.w3.org to their software products page. [ this is from an obscure bit
of grey matter :-]
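
For what it is worth, the core of such a spider is small. The following is
only a sketch, not any particular tool from the w3.org list: the names
LinkParser and crawl are made up for illustration, and fetch is a function
you supply yourself (url in, HTML text or None out), so the walk can be
tried without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first walk from start_url; returns {url: html} for pages reached.

    fetch is a caller-supplied function: url -> HTML text, or None on failure.
    Pages at max_depth are stored but their links are not followed.
    """
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth == max_depth:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            target = urldefrag(urljoin(url, href)).url
            if target.startswith("http://") and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages
```

A real fetch would wrap something like urllib.request.urlopen and should
pause between requests; the depth limit is what keeps the hierarchy small
enough to fit on a disc.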

| 2- Somebody tried before to make a CD-ROM with WWW pages ?
| We have a problem with the pages that have complete URL path. Let me
| explain it better :
| Suppose you're looking to a page, stored in the CD-ROM.
| You follow a link whose URL is http://xxx.org/zzz/aaa.html . You know
| that the page with the URL choosen is stored in the CD-ROM. But how
| does Mosaic (or Netscape, ...) knows that ? It doesn't !! It only
| tries to follow the link ...
|
| Any hints about possible solutions ? (Note that we want to access
| the pages in the CD-ROM from different environments - Windows, Linux,
| Unix, Windows NT and MacIntosh). We have thought in two possible
| solutions:
| a) change Mosaic code, in such a way that Mosaic knows where
| each URL is in the CD-ROM (it could be done with a MD5
| hash algorithm)
| b) pre-processing of all WWW pages that we put in the CD-ROM,
| in such a way that all the links have a "file" URL
| (URL file:... )
Option (b) is probably the best, because it will work with almost any viewer
on almost any platform.

| Other possible solutions ?
Include a caching web server on the CD-ROM, with the various files in the
correct place to spoof the caching algorithm.
This has the distinct advantage that everybody now has an http server, so
they can continue to play with the web in their own private play pen :-)
Perhaps with this in mind you should make sure the cached pages include good
introductory HTML and http pages as well as specifications, i.e. basically
deliver people enough to set up their own web.
The serious problems with this approach are finding caching http servers for
all platforms, and fitting all of them and everything else you want on the
CD-ROM.
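
To make the idea concrete, here is a stand-in rather than a real caching
server: a tiny proxy that answers every request from files on the disc. It
assumes a host/path directory layout under a hypothetical CD_ROOT, serves
everything as text/html, and the names cached_file, DiscProxy and serve are
mine. The browser would be told to use 127.0.0.1:8080 as its HTTP proxy.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

CD_ROOT = "/cdrom/pages"  # hypothetical mount point and directory


def cached_file(url, root=CD_ROOT):
    """Return the file under root that should hold the copy of url, or None."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        return None
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    root = os.path.normpath(root)
    candidate = os.path.normpath(
        os.path.join(root, parts.netloc.lower(), path.lstrip("/"))
    )
    # Refuse anything that escapes the root, e.g. through ".." segments.
    if not candidate.startswith(root + os.sep):
        return None
    return candidate


class DiscProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the full URL as the target.
        name = cached_file(self.path)
        if name is None or not os.path.isfile(name):
            self.send_error(404, "Not on the disc")
            return
        with open(name, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # simplification
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy on the local machine until interrupted."""
    HTTPServer(("127.0.0.1", port), DiscProxy).serve_forever()
```

The pages themselves need no rewriting with this approach, since the links
still say http://...; the cost is that a script like this needs an
interpreter on every platform, which is the same "finding one for all
platforms" problem in another form.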

ravings from the mind of
Jon Tidswell

Disclaimer:
I am a postgraduate student on a scholarship not an employee of Microsoft ...