CERN Common World-Wide Web Library Version 2.16pre2 Available

Henrik Frystyk Nielsen (frystyk@ptsun00.cern.ch)
Fri, 24 Jun 94 01:06:51 +0200


* * * * *

The CERN Common WWW Library is a general code base that can be used
to build clients and servers. It contains code for accessing HTTP, FTP,
Gopher, News, WAIS, Telnet servers, and the local file system.
Furthermore it provides modules for parsing, managing and presenting
hypertext objects to the user and a wide spectrum of generic programming
utilities.

* * * * *

This release contains some bug fixes and improvements since the last
release. However, it can still be compiled directly with the
CERN HTTP Server and Proxy Server without modifications, provided the patch
from Rainer Klute has been applied to version 3.0pre6 (this is
basically a change of a function name).

This release should remove the problems with POST, remote host names,
and direct WAIS access - also for the Proxy server.

CERN Common Code Library 2.16pre2 is available, source code:

ftp://info.cern.ch/pub/www/src/WWWLibrary_2.16.tar.Z

It is known to compile on Sun4, Solaris, HP, NeXT, NeXT-386, and
Decstation Ultrix.

Diffs and old versions are available at

ftp://info.cern.ch/pub/www/src/old
ftp://info.cern.ch/pub/www/src/diffs

Documentation is available at

http://info.cern.ch/hypertext/WWW/Library/Status.html and
http://info.cern.ch/hypertext/WWW/Library/User/Guide.html

Programmer's Guide is available at

http://info.cern.ch/hypertext/WWW/Library/Implementation/Overview.html

The current address to send email about CERN Library is:

libwww@info.cern.ch

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

CERN Library 2.16 Prerelease Notes

Version 2.16 prerelease 2

WAIS Client

The WAIS client has been improved and some bugs have been fixed:

* Bug in the parser of the search result from the WAIS module fixed
* The maximum number of lines presented from a search is now a
configuration variable. The default value is 100 (was 40)
* Introduced WAIS's own error messages as they are returned from the WAIS
library
* The presentation of WAIS on the screen made nicer (well - I think it is!)

HTTCP Module

* Bug in the host cache fixed

Access Authorisation

* Premature free of memory fixed
* Missing initialization fixed

Version 2.16 prerelease 1

New Features and Changed Interfaces

HTTP Client

HTTP module contains the code for the HTTP client. The module is now
reorganized and made more modular.

Automatic Redirection
Now supported by the HTTP module. The new URL is passed
to the client via the error_stack as an ERR_INFO message, see
HTError module. The maximum number of redirections is set by the
variable HTMaxRedirections.
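
The redirection limit can be pictured as a bounded loop. What follows is only a sketch, not the Library's code: follow_redirects(), the fetch_fn callback, demo_fetch() and the max_redirections parameter (standing in for HTMaxRedirections) are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define URL_MAX 256

/* Hypothetical fetch callback: returns an HTTP-like status code and,
 * for a redirect (301 or 302), writes the new URL into `next`. */
typedef int (*fetch_fn)(const char *url, char *next, size_t next_size);

/* Simulated server for illustration: /a -> /b -> /c, and /c answers 200. */
int demo_fetch(const char *url, char *next, size_t next_size)
{
    const char *target = NULL;

    if (strcmp(url, "/a") == 0) target = "/b";
    else if (strcmp(url, "/b") == 0) target = "/c";
    if (target == NULL) return 200;
    strncpy(next, target, next_size - 1);
    next[next_size - 1] = '\0';
    return 301;
}

/* Follow redirects until a non-redirect status arrives or the limit is
 * exceeded. Returns the final status, or -1 when the limit is exceeded.
 * On success `final_url` holds the URL that was finally loaded. */
int follow_redirects(fetch_fn fetch, const char *start, int max_redirections,
                     char *final_url, size_t final_size)
{
    char current[URL_MAX];
    char next[URL_MAX];
    int hops = 0;

    assert(fetch != NULL && start != NULL);
    assert(final_url != NULL && final_size > 0);
    strncpy(current, start, sizeof current - 1);
    current[sizeof current - 1] = '\0';

    for (;;) {
        int status;

        next[0] = '\0';
        status = fetch(current, next, sizeof next);
        if (status != 301 && status != 302) {
            strncpy(final_url, current, final_size - 1);
            final_url[final_size - 1] = '\0';
            return status;
        }
        if (++hops > max_redirections)
            return -1;                  /* too many redirections */
        strncpy(current, next, sizeof current - 1);
        current[sizeof current - 1] = '\0';
    }
}
```

A client would call follow_redirects() with its own fetch routine; the final URL is what it would use to update its reference.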

Referer Field in HTTP request
Clients can now send a Referer field in an HTTP request. This is
done by filling out the HTRequest->parentAnchor field.

From field in HTTP Request
Clients can now send the full email address of the current user in
the HTTP From field. The feature is turned off by default as it
might get a bit tricky through a Proxy.

204 Response
Support of return code `204 No Response'

FTP Client

HTFTP module contains the code for the FTP client. The FTP client has
changed a lot in this release. It is now a complete state machine
where the actual action executed is a function of the current state.

The client now follows the suggestions given in RFC 1123: "Requirements
for Internet Hosts -- Application and Support".

Establishment of the data connection now complies with RFC 1579:
"Firewall-friendly FTP" such that the procedure is

1. try PASV
2. if that fails, try PORT
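
As a rough illustration of this fallback order, the sketch below tries PASV and only then PORT. It is not the HTFTP state machine: choose_data_mode(), send_cmd_fn and demo_server() are hypothetical names, and the PORT argument is a placeholder. The reply codes 227, 200, 500 and 502 are the ones defined in RFC 959.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DATA_FAILED, DATA_PASSIVE, DATA_ACTIVE } data_mode;

/* Hypothetical control-connection hook: sends one command and returns
 * the three-digit FTP reply code. */
typedef int (*send_cmd_fn)(void *server, const char *command);

/* Try PASV first; fall back to PORT only when PASV is refused. */
data_mode choose_data_mode(send_cmd_fn send_cmd, void *server)
{
    assert(send_cmd != NULL);
    if (send_cmd(server, "PASV") == 227)        /* Entering Passive Mode */
        return DATA_PASSIVE;
    /* Placeholder address and port (127.0.0.1, port 4*256+1). */
    if (send_cmd(server, "PORT 127,0,0,1,4,1") == 200)  /* Command okay */
        return DATA_ACTIVE;
    return DATA_FAILED;
}

/* Simulated server for illustration: `server` points to an int bitmask,
 * bit 0 = PASV supported, bit 1 = PORT supported. */
int demo_server(void *server, const char *command)
{
    int caps = *(int *)server;

    if (strncmp(command, "PASV", 4) == 0)
        return (caps & 1) ? 227 : 502;          /* 502: not implemented */
    if (strncmp(command, "PORT", 4) == 0)
        return (caps & 2) ? 200 : 502;
    return 500;                                 /* unrecognized command */
}
```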

The URL is now parsed according to the (latest) specifications:

url : f t p: / / login / path [ ftptype ]
login : [ user [ : password ] @ ] hostport
hostport: host [ : port ]
ftptype : A formcode | E formcode | I | L digits
formcode: N | T | C
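
To make the login and hostport rules concrete, here is a small parser sketch for "[user[:password]@]hostport". It is an illustration only: parse_ftp_login() and the ftp_login structure are hypothetical names, the fixed field size is an arbitrary choice, and the default port 21 is the standard FTP control port rather than something stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_MAX 64

typedef struct {
    char user[FIELD_MAX];
    char password[FIELD_MAX];
    char host[FIELD_MAX];
    int  port;                  /* 21 when the URL gives no port */
} ftp_login;

/* Copy `len` characters into a field, truncating if necessary. */
static void copy_span(char *dst, const char *src, size_t len)
{
    if (len >= FIELD_MAX) len = FIELD_MAX - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse "[user[:password]@]host[:port]".
 * Returns 0 on success, -1 when the host part is empty. */
int parse_ftp_login(const char *login, ftp_login *out)
{
    const char *at, *hostport, *colon;

    assert(login != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->port = 21;

    at = strrchr(login, '@');   /* the last '@' ends the user part */
    if (at != NULL) {
        colon = memchr(login, ':', (size_t)(at - login));
        if (colon != NULL) {
            copy_span(out->user, login, (size_t)(colon - login));
            copy_span(out->password, colon + 1, (size_t)(at - colon - 1));
        } else {
            copy_span(out->user, login, (size_t)(at - login));
        }
        hostport = at + 1;
    } else {
        hostport = login;
    }

    colon = strchr(hostport, ':');
    if (colon != NULL) {
        copy_span(out->host, hostport, (size_t)(colon - hostport));
        out->port = atoi(colon + 1);
    } else {
        copy_span(out->host, hostport, strlen(hostport));
    }
    return out->host[0] != '\0' ? 0 : -1;
}
```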

Both directory listings and file retrieval use the same procedure:

1. First try to go to the location directly, as we are often
talking to a UNIX server or one that 'understands' UNIX syntax
2. If it fails, then go to the location step by step using CWD. In
that way we should not have any problems on any platform, and
thus it is not necessary to make special hacks for VMS, etc.
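
The two-step procedure might look roughly like the sketch below. It is a simplification under stated assumptions: change_directory(), cwd_fn and the demo server are hypothetical names, the path is treated as relative, and a real client would also have to undo a partially completed walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook that issues "CWD dir"; returns 1 on success. */
typedef int (*cwd_fn)(void *server, const char *dir);

/* Step 1: try the whole path at once. Step 2: if that fails, walk the
 * path one segment at a time. Returns 1 on success, 0 on failure. */
int change_directory(cwd_fn cwd, void *server, const char *path)
{
    char segment[128];
    const char *p = path;

    assert(cwd != NULL && path != NULL);
    if (cwd(server, path))
        return 1;

    while (*p != '\0') {
        size_t len;

        while (*p == '/') p++;          /* skip separators */
        len = strcspn(p, "/");
        if (len == 0) break;            /* trailing slash or empty path */
        if (len >= sizeof segment) return 0;
        memcpy(segment, p, len);
        segment[len] = '\0';
        if (!cwd(server, segment)) return 0;
        p += len;
    }
    return 1;
}

/* Simulated non-UNIX server for illustration: it rejects any argument
 * containing '/' and records every directory it accepts. */
typedef struct { char visited[256]; } demo_server_state;

int demo_cwd(void *server, const char *dir)
{
    demo_server_state *s = server;

    if (strchr(dir, '/') != NULL) return 0;
    if (strlen(s->visited) + strlen(dir) + 3 > sizeof s->visited) return 0;
    strcat(s->visited, "[");
    strcat(s->visited, dir);
    strcat(s->visited, "]");
    return 1;
}
```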

Long directory listings are supported for UNIX-like systems and VMS.
This includes NetWare and Windows NT. For more, see Future Plans and
Long Directory Listings.

Information from the FTP server is by default presented to the client
using the following rule:

1. If you are connecting to the root directory at a ftp site, we
show the 'login' message (might be a concatenation of several
messages) just like in a normal ftp session.
2. If you have a more specific URL, then you probably already know
the site and are less interested in the login message. Instead
we show any local message when making a CWD to the right location.

Gopher Client

The Gopher client has been revised and improved error handling has been implemented.

Information Messages
Some Gopher servers send back information messages in a line containing
"error.host". This information is treated like login information
from FTP servers so that it is represented as a message before or after
the actual listing.

Iconized Listings
Listings now contain icons in the same way as the other listings.

CSO Name Server
The CSO Name Server client outputs in HTML and not only <PRE> as before.

Content Type Recognition
The Gopher module uses its own content-type recognition, inherited
from HTTP, when handling Gopher text and Gopher binary files. This
means that, e.g., PostScript files are handled correctly.

Local File Access

The new version of HTFile module is a lot smaller as all Directory
listing stuff has moved to HTDirBrw module. New error handling has
been implemented.

Passive and Active Connection Establishment

Calls to connect() and accept() now go through the functions
HTDoConnect() and HTDoAccept() respectively.

Cache of Host Names and Addresses

HTParseInet(), which is called from within HTDoConnect(), now has an
internal cache of the names and (possibly multiple) IP addresses of
visited hosts. This minimizes access to the file /etc/hosts and
the Domain Name Server, although aliases are not recognized in the cache.

The default cache size is 500 entries, and a host stays in the cache as
long as connect() succeeds. That is, if a connection is refused for some
reason, the host is taken out of the cache.

The time to make a connection to a multihomed host is measured every
time and a mean access time is calculated so that HTDoConnect always
takes the fastest IP-address, see Future plans.
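
One way to keep such a mean and pick the fastest address is sketched below. This is not the Library's cache: host_entry, addr_stat, record_connect_time() and fastest_address() are hypothetical names, and the running-mean formula is just one reasonable choice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADDRS 4

typedef struct {
    char   address[16];         /* dotted-quad text form */
    double mean_ms;             /* mean connect time so far */
    int    samples;             /* number of measurements */
} addr_stat;

typedef struct {
    char      hostname[64];
    addr_stat addrs[MAX_ADDRS];
    int       naddrs;
} host_entry;

/* Fold one measured connect time into the running mean. */
void record_connect_time(addr_stat *a, double ms)
{
    assert(a != NULL);
    a->samples++;
    a->mean_ms += (ms - a->mean_ms) / a->samples;
}

/* Index of the address with the lowest mean connect time, or -1 when
 * the entry has no addresses. An address that has never been measured
 * has mean 0 and is therefore tried first. */
int fastest_address(const host_entry *h)
{
    int i, best = -1;

    assert(h != NULL);
    for (i = 0; i < h->naddrs; i++)
        if (best < 0 || h->addrs[i].mean_ms < h->addrs[best].mean_ms)
            best = i;
    return best;
}
```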

Improved Functionality of DNS requests

The Library now provides functionality for obtaining the full mail
address of the user, full domain name of the host and also the
possibility for setting both values. This means that the user can use
his official email address, e.g. in the HTTP request.

Long Directory Listings

Long directory listings for HTTP, FTP and files on the local file
system are supported. For the moment only a part of the functionality
(e.g., sorting, which columns to show, etc.) is exploited, see Future Plans.

Icon Management

Icons in directory listings are bound to MIME content-types and
encoding. They can be found in the HTIcons module. The default set of
icons is set up using HTStdIconInit() and new icons can be added
dynamically using HTAddIcon().
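
The idea of binding icons to content types can be sketched with a small lookup table. The code below does not reproduce the HTIcons interface: icon_table, icon_add() and icon_lookup() are hypothetical names, the "type/*" wildcard is an assumption, the icon paths are made up, and encodings are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ICON_MAX 16

typedef struct {
    const char *type_pattern;   /* e.g. "image/*" or "text/html" */
    const char *icon_url;
} icon_binding;

typedef struct {
    icon_binding bindings[ICON_MAX];
    int count;
} icon_table;

/* Add a binding at run time. Returns 0, or -1 when the table is full. */
int icon_add(icon_table *t, const char *pattern, const char *url)
{
    assert(t != NULL && pattern != NULL && url != NULL);
    if (t->count >= ICON_MAX) return -1;
    t->bindings[t->count].type_pattern = pattern;
    t->bindings[t->count].icon_url = url;
    t->count++;
    return 0;
}

/* "image/*" matches any type that starts with "image/". */
static int pattern_matches(const char *pattern, const char *type)
{
    size_t n = strlen(pattern);

    if (n >= 2 && strcmp(pattern + n - 2, "/*") == 0)
        return strncmp(pattern, type, n - 1) == 0;
    return strcmp(pattern, type) == 0;
}

/* Later additions take precedence over earlier ones. */
const char *icon_lookup(const icon_table *t, const char *type)
{
    int i;

    assert(t != NULL && type != NULL);
    for (i = t->count - 1; i >= 0; i--)
        if (pattern_matches(t->bindings[i].type_pattern, type))
            return t->bindings[i].icon_url;
    return NULL;
}
```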

File Descriptions in Directory Listings

File descriptions are supported for long HTTP directory listings. The
default is to peek at the title of HTML files.

Error and Information Message Management

A new error handling module is introduced in HTError. It uses the
error_stack entry in the HTRequest structure.
It handles nested error messages so that we can give a reason for the
error, e.g.

Error in ...
This error occurred because ...
This is caused by ...
etc.

It also makes it possible for the Library to pass information back to
the client so that the Library doesn't act like a `black hole'.
An example is HTTP redirection with status code `Moved 301'. Now the
new URL is passed back to the client via the error_stack so that the
client can update the reference when possible.

The function that generates and outputs the error messages to the user
is put into HTErrorMsg Module so that it can be overridden by a smart
client or server.
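
The nested messages described above can be pictured as a small stack that is printed from the most recent entry down to the root cause. This sketch does not reproduce the HTError interface: error_stack, error_push() and error_format() are hypothetical names, and the two-space indentation is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STACK_MAX 8

typedef struct {
    const char *messages[STACK_MAX];
    int depth;
} error_stack;

/* Push a message; the lowest-level cause is pushed first.
 * Returns 0, or -1 when the stack is full. */
int error_push(error_stack *s, const char *message)
{
    assert(s != NULL && message != NULL);
    if (s->depth >= STACK_MAX) return -1;
    s->messages[s->depth++] = message;
    return 0;
}

/* Write the messages into `buf`, most recent first, each deeper cause
 * indented by two more spaces. Returns 0, or -1 if `buf` is too small. */
int error_format(const error_stack *s, char *buf, size_t size)
{
    size_t used = 0;
    int i, level = 0;

    assert(s != NULL && buf != NULL);
    if (size == 0) return -1;
    buf[0] = '\0';
    for (i = s->depth - 1; i >= 0; i--, level++) {
        int n = snprintf(buf + used, size - used, "%*s%s\n",
                         level * 2, "", s->messages[i]);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

A client that wants its own presentation would replace only the formatting step and leave the stack untouched.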

Guessing the Content Type of a Stream

The HTGuess module reads part of a stream and determines the content
type with the highest probability from a statistical analysis.
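
A very crude version of such a guess is sketched below, only to show the general shape of the technique. It does not reproduce the rules used by HTGuess: guess_content_type() is a hypothetical name, the 10% threshold is an arbitrary assumption, and the only signature checked is the "%!" that starts a PostScript file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Guess a content type from the first `len` bytes of a stream by
 * counting bytes that are neither printable nor whitespace. */
const char *guess_content_type(const unsigned char *sample, size_t len)
{
    size_t i, suspicious = 0;

    assert(sample != NULL || len == 0);
    if (len == 0)
        return "application/octet-stream";      /* nothing to go on */
    if (len >= 2 && sample[0] == '%' && sample[1] == '!')
        return "application/postscript";
    for (i = 0; i < len; i++)
        if (!isprint(sample[i]) && !isspace(sample[i]))
            suspicious++;
    /* More than 10% non-text bytes: treat the stream as binary. */
    return (suspicious * 10 > len) ? "application/octet-stream"
                                   : "text/plain";
}
```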

Minor Stuff

tmpnam()
Because of problems on NeXT platforms the tmpnam() function is now
replaced by HTFWriter_filename() in HTFWriter.c. The function has
two modes: Give back a hash name or the last part of the URL (which
normally is more readable).
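
The two modes could look something like the sketch below. It is not the code of HTFWriter_filename(): make_filename() and hash_url() are hypothetical names, and the hash function (djb2) was picked only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simple string hash (djb2), kept to 32 bits. */
static unsigned long hash_url(const char *url)
{
    unsigned long h = 5381;

    while (*url != '\0')
        h = (h * 33 + (unsigned char)*url++) & 0xffffffffUL;
    return h;
}

/* Write a file name for `url` into `out`. With use_hash == 0 the last
 * part of the URL is used when there is one; otherwise an 8-digit hash
 * name is used. Returns 0, or -1 if `out` is too small. */
int make_filename(const char *url, int use_hash, char *out, size_t size)
{
    const char *slash;
    int n;

    assert(url != NULL && out != NULL);
    slash = strrchr(url, '/');
    if (!use_hash && slash != NULL && slash[1] != '\0')
        n = snprintf(out, size, "%s", slash + 1);
    else
        n = snprintf(out, size, "%08lx", hash_url(url));
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```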

HTMLPutImg()
New function to make it easier to put out an HTML <IMG> tag.

HTParseInet()
Added one more parameter to tell whether it is a multihomed host or
not. (This is used in the host cache).

HTInetStatus()
Should no longer be used directly but is called from HTErrorAdd so
that the message goes all the way back to the user

HTError
This typedef is now obsolete and will be removed in future releases

HTLoad()
Added new parameter to HTLoad: BOOL keep_error_stack. If YES then
the error_stack is not cleared. This is used in redirection etc.

HTLoadError()
Because of the new HTError module, this function in HTML.c is not
needed anymore.

Bug Fixes

This is a list of fixed bugs from earlier versions.

* Memory faults in HTSimplify() in HTParse.c have been fixed
* README files in directory listings now know how to handle '<', '>'
and '&' correctly. Though the file still has to be ASCII. See future
plans for handling this file.
* tmpnam is no longer used in the Library because of problems on NeXT
platform. Instead a new function called HTFWriter_filename() in
HTFWriter.c is written.
* HTInputSocket_getCharacter now returns an int and not a char so that
EOF is no longer a member of the char set.
* HTMLGen_start_element() is only allowed to put extra '\n' in <PRE>
mode if it is between parameters in a tag
* Changed type of <IMG> into SGML_EMPTY so that it doesn't expect end
tag </IMG>
* Nested <PRE> is no longer a problem in HTMLGen_start_element.
* Removed all #elif as not all compilers on HPUX like it.
* Changed HTChunk such that chunk->data is '\0' terminated at any
time. This actually makes HTChunkTerminate less needed but be aware
that HTChunk->size changes.
* Removed non-portable d_namlen field in HTMulti.
* Moved definition of NO_GROUPS to tcp.html
* Moved definition of HT_MAX_PATH to tcp.html
* Proxy server now closes connection in HTTP.c. This was only a problem
in non-forking servers (VMS).
* Definition of HT_NO_DATA moved to HTUtils.html where the other
return codes are placed.
* Functions from HTAlert Module that prompt the user don't get
confused about ctrl-D anymore.
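
The HTInputSocket_getCharacter fix in the list above reflects a general C point: a reader that returns char cannot tell a data byte such as 0xFF apart from EOF, whereas an int return value can. The sketch below shows the idea on an in-memory buffer; input_buffer, buffer_get_character() and count_bytes() are hypothetical names and not part of the Library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} input_buffer;

/* Return the next byte as a value in 0..255, or EOF at end of input.
 * EOF is negative, so it can never collide with a data byte. */
int buffer_get_character(input_buffer *in)
{
    assert(in != NULL);
    if (in->pos >= in->len)
        return EOF;
    return in->data[in->pos++];
}

/* Count the remaining bytes by reading until EOF. */
size_t count_bytes(input_buffer *in)
{
    size_t n = 0;
    int c;                      /* int, not char: it must hold EOF too */

    while ((c = buffer_get_character(in)) != EOF)
        n++;
    return n;
}
```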

On the Working List

This is what we are working on right now!

MIME-parser
A new MIME-parser that can be used as a general module. For the
moment there is a large number of individual MIME-parsers, and
there is a lot of redundant coding.

Multi-threaded HTTP Module.
The implementation is currently in its test phase, but as the module
has been turned completely upside down it still needs some heavy
testing. Look here for more information on the implementation.

Multihomed hosts
If a connect fails on a multihomed host, then automatically try
another IP address.

Whois++
A WhoIs++ module has actually been implemented in the Library (thanks
to Michael Mealling, ccoprmm@oit.gatech.edu), but it is not in this
release: I haven't found many WhoIs++ servers, and the port chosen
is 43, just like the old WhoIs protocol, which makes it a bit tricky.

Future Plans

This is what we are going to implement. If somebody should get the idea of
writing some of the modules mentioned, it will be appreciated a lot ;-).
Contact www-bug@info.cern.ch for further coordination.

README File in directory listings
Make it possible to have both ASCII files (using <PRE>...</PRE>)
and HTML files.

OS/2 listings
Implement long FTP directory listings for OS/2 platforms.

Multipart retrieval in HTTP
This will make the transmission time for documents containing
inlined images much faster. Some implementation ideas have been
discussed but a final design has not been chosen yet.

Ideas for New Features

This is what we have not started yet but what we would like to implement.

Virtual Documents
Pass virtual documents as objects instead of HTML files. Then the
client can choose the best way to represent the data and reorganize
it without consulting the server again.

Separation of Protocol Modules
The protocol modules should be separated completely from the HTML
machinery so that it is possible to, e.g., get raw FTP directory
listings through to the user.

--
 Henrik Frystyk		| Ari Luotonen		  | Mark Donszelmann
 frystyk@dxcern.cern.ch	| luotonen@dxcern.cern.ch | duns@vxdeop.cern.ch
 + 41 22 767 8265	| + 41 22 767 8583	  | + 41 22 767 3555

-------- World-Wide Web Project, CERN, CH-1211 Geneve 23, Switzerland --------