A short paper on creating Web Space

Paul (Paul.Wain@brunel.ac.uk)
Wed, 15 Jun 1994 13:14:25 +0100 (BST)


Hi,

I enclose the 1st draft (complete with all errors) of a short paper that
I knocked up last night/this morning, based upon the trials and
tribulations of creating a uniform web space in a multi-department
environment. It covers the possible areas of resistance that can be
encountered, and looks for a path through them.

Obviously it needs more work done on it, but at this point I'm looking
for suggestions as to whether people think that the points that are made
are indeed valid, what solutions people can recommend, etc. (Typo
and grammatical corrections are also appreciated *sigh*)

The URL of the paper is:

http://http2.brunel.ac.uk:8080/paul/papers/intro_html_3.html

But I have appended a text version. The HTML version has links to the
good and bad HTML examples.

Comments please :)

Paul

p.s. watch out for a paper coming soon commemorating the 1st anniversary
of HTTPD served web space at Brunel.

.-------------------------------------------------------------------------.
|_______Paul_S._Wain,_(X.500_Project_Engineer_and_WWW/HTTP_chappie),______|
| Computer Centre, Brunel University, Uxbridge, Middx., UB8 3PH, ENGLAND. |
|___VOICE:_+44_895_274000_extn_2391_______EMAIL: Paul.Wain@brunel.ac.uk __|
| http://http2.brunel.ac.uk:8080/paul/ |
`-------------------------------------------------------------------------'

ABSTRACT

With the formalization of the HTML standards at the May 1994 WWW
conference, a number of changes to the way that documents are marked
up using HTML came into effect. Some of these are minor, some major
and some just a reiteration of good practices that people are not
using.

This paper outlines what is wrong with the way HTML is being viewed at
Brunel at the moment, describes some of the ongoing situations that
need to be resolved, and suggests what could be done about them.

INTRODUCING HTML 2.0 AND HTML 3.0 COMPLIANT MARKUP INTO BRUNEL UNIVERSITY.

1 Introduction.

The computer centre at Brunel University has recently undergone the
HTML revolution and begun moving some of its user documentation on
line. In addition to this a large number of users have created their
own home pages. As a result there are a large number of HTML pages
within the brunel.ac.uk domain.

The Web structure at Brunel is probably fairly unusual. We currently
have 4 httpd servers, each serving a different task. The structure of
these looks something like:

                    The User
                        |
                      http3 <==> 150Mb Cache
                        |
        .-------+-------+-------+-------.
        |       |               |       |
      http1   http1           http2   world
port: 8080    4040            8080


Basically the local servers break down as follows:

http1.brunel.ac.uk:8080
The main Brunel service. University home page. Guide to Brunel.

http1.brunel.ac.uk:4040
The StudentSoft service. Holds information on the StudentSoft
project.

http2.brunel.ac.uk:8080
Solaris manual pages and user home pages. Also holds some
newsgroup home pages.

In addition to this (as the above textual representation indicates) we
also have a cache service running on http3 with a small cache of
approximately 150Mb. This is sufficient at the moment with most users
currently being away on industrial placements or on holiday.

Currently we recommend two Web browsers at Brunel: NCSA Mosaic for
X/Openwindows, and Lynx for text based systems. Both are only
available on SunOS 4.1.3 (although a Solaris 2.3 version of Mosaic is
under test), and only Mosaic is actively supported by the User Support
team.

From this it can be seen that Brunel is probably of average size for
a World Wide Web site, so many of the problems found at Brunel when
introducing new tools and requirements for HTML may apply elsewhere.

So what are the problems associated with the introduction of the new
standards?

2 But It Works!

A common misconception among users writing HTML is the "But it works
with the current version of Mosaic" attitude. This leads to some
interesting bad practices, which, when combined with the new
definition of items such as paragraphs and lists, may cause absolute
chaos.

For example, the following sort of markup is quite common:

<HTML>
<TITLE>A document</title>
<H1>A test document</H1>
<P>
This is a paragraph with a list:
<MENU>
<IMG SRC="dot.gif" ALT="*">Item one<BR>
<IMG SRC="dot.gif" ALT="*">Item two<BR>
</MENU>
</P>

The basic structure of this document isn't too hard to derive.
Basically, what it should read is:

<HTML>
<HEAD>
<TITLE>A document</TITLE>
</HEAD>
<BODY>
<H1>A test document</H1>
<P>
This is a paragraph with a list
</P>
<DL>
<DT><IMG SRC="dot.gif" ALT="*">Item one</DT>
<DT><IMG SRC="dot.gif" ALT="*">Item two</DT>
</DL>
</BODY>
</HTML>

There is a big difference between the two. One is correct and one
isn't (ignoring the fact that the <HEAD> and <BODY> tags can be
implied in the first example). Both will render in Mosaic 2.4, but
only the second passes a DTD compliance test.
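
As a rough illustration of the kind of check that separates the two
examples, the following Python sketch flags list containers that hold
no list items. It is not a DTD compliance test (that would need an SGML
parser and the HTML DTD itself). The names ListCheck and list_problems,
and the table of expected item tags, are assumptions made for this
example only.

```python
from html.parser import HTMLParser

# Which item tags each list container is expected to hold. This table is an
# assumption for the sketch, not something read from the DTD itself.
ITEM_TAGS = {
    "ul": {"li"}, "ol": {"li"}, "menu": {"li"}, "dir": {"li"},
    "dl": {"dt", "dd"},
}


class ListCheck(HTMLParser):
    """Records list containers that are closed without any items."""

    def __init__(self):
        super().__init__()
        self.open_lists = []   # [container name, item count] per open list
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <MENU> arrives as "menu".
        if tag in ITEM_TAGS:
            self.open_lists.append([tag, 0])
        elif self.open_lists and tag in ITEM_TAGS[self.open_lists[-1][0]]:
            self.open_lists[-1][1] += 1

    def handle_endtag(self, tag):
        if self.open_lists and self.open_lists[-1][0] == tag:
            name, items = self.open_lists.pop()
            if items == 0:
                self.problems.append(f"<{name.upper()}> has no list items")


def list_problems(html_text):
    """Return a list of complaints about empty list containers."""
    checker = ListCheck()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

Fed the <MENU> block from the first example, list_problems reports the
<MENU> as having no list items; fed the <DL> block from the second, it
reports nothing. A real tool would of course need to check far more
than this.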

With the moves to make documents compliant in the future this will
cause great problems. So much so that I am starting to tell people
that if they write their documents in the way of the first example
then I can only see their documents working correctly for about the
next 3 months.

The problem, however, is that because bad practices currently work
with existing browsers, people are unwilling to make the change. But
things get worse.

3 Using Editors - Correct vs. Easiest to use

This can probably be described as the root of the problem. With the
current HTML DTDs only just being set in stone, many HTML editors are
still a step or two behind the times. This creates a situation whereby
documents are being produced by editors that claim to be compliant
with current standards (again this means Mosaic!) but which are not
correct.

For example these pages were produced using HoTMetaL from SoftQuad.
As far as I can tell it uses a very rigid interpretation of the HTML
3.0 DTD, in that it will not let certain tags be nested although,
looking at the DTD, they can be. However, the (supposedly compliant)
output produced by other editors will not read into HoTMetaL. The
problem with explaining this to users is that other editors are
easier to use!

So which is the right path to take? Obviously the compliant path, since
this guarantees that a document will display in the future. But if the
user tools are not in place to help people do this, we are stuck at
square one. And the problem gets worse when we consider that many people
out there are editor illiterate and so need things as simple as they
can get.

(Aside: Today for the 1st time I am using HoTMetaL with "show tags"
off. It's taken me a week to learn how to use it correctly, and
understand its warnings. So what chance the normal user? On the other
hand, there are editors out there which our User Support people tell
me can be learnt in a few minutes but which produce dubious output.
What would a typical user choose?)

4 House/Corporate Styles.

Another situation we are currently trying to resolve at Brunel is
that of the introduction of a default style for markup in
departmental pages. That is, trying to define a common layout of
information for entry pages for departmental information.

This creates a situation whereby we need to be able to enforce both:
* Layout and content.
* HTML style (i.e. version 2.0 or 3.0 compliant)

While this can be considered a side issue of the two previous cases it
does draw the two together nicely. It is envisaged that templates will
be provided for users to use to create their own pages. But the
problem remains of ensuring that the finished document is still DTD
compliant, and of what to do if it isn't!
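
One way to keep templated pages well formed is to fill the template
mechanically rather than by hand. The Python sketch below assumes a
hypothetical departmental entry page; the layout, the PAGE template and
the render_page function are all invented for illustration and are not
Brunel's actual template.

```python
from html import escape
from string import Template

# Hypothetical departmental entry-page template; the layout is illustrative only.
PAGE = Template("""<HTML>
<HEAD>
<TITLE>$department</TITLE>
</HEAD>
<BODY>
<H1>$department</H1>
<P>
$summary
</P>
<ADDRESS>$contact</ADDRESS>
</BODY>
</HTML>
""")


def render_page(department, summary, contact):
    """Fill the template, escaping markup characters in the supplied text."""
    return PAGE.substitute(
        department=escape(department),
        summary=escape(summary),
        contact=escape(contact),
    )
```

Because the user only supplies text and the markup comes from the
template, a stray "<" or "&" in a department name cannot break the page
structure. The cost is that users cannot add markup of their own.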

There is still a market out there for correction tools.

(Another note: Again HoTMetaL will tell you what is wrong, but it won't
correct it. I don't know about other editors since I never go that
far!)

5 Conclusion

In writing this paper I deliberately chose not to offer solutions to
the problems being encountered, and not to discuss options that are
being considered until this point. However we do have some ideas under
review and these are basically as follows:

1. Decide upon your page style but remember what correct HTML can and
can't do.

2. Produce templates for your users to use if they want. These should
include as many examples as you can provide. Describe them within
the document if you can!

3. Consider the imposition of a default HTML authoring tool. If
people want to use such a tool ensure that everyone is using the
same one. Remember that it should be possible to use this tool on
more than one platform (e.g. MS Windows, UNIX, Mac). Remember that
not everyone can run eXceed for MS Windows.

4. Enforce your decisions. If someone produces HTML that is broken,
suggest that they should be using the default editor. If you have
the default editor set up correctly, they won't be able to write
bad HTML. However you will also need to include provision for
converting bad HTML to good HTML. Always tell people to read HTML
primers before starting.

5. Where possible don't allow bad HTML to be served by your HTTP
daemons. This of course may not always be possible (as in the case
of Brunel).

6. Pray to your relevant deity.
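
Point 5 could be approximated by checking pages before they are
published rather than inside the daemon. The Python sketch below walks a
directory tree and lists the .html files that fail a very crude test
(certain tags must be closed in the order they were opened). The set
MUST_CLOSE and the names is_publishable and rejected_pages are
assumptions for this example; a real gate would use a proper validator.

```python
import os
from html.parser import HTMLParser

# Elements whose end tag this sketch insists on. Chosen for illustration;
# consult the DTD for the real rules.
MUST_CLOSE = {"title", "h1", "h2", "h3", "h4", "h5", "h6",
              "a", "ul", "ol", "menu", "dir", "dl", "pre"}


class BalanceCheck(HTMLParser):
    """Tracks whether the MUST_CLOSE elements are properly nested."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag in MUST_CLOSE:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in MUST_CLOSE:
            if self.open_tags and self.open_tags[-1] == tag:
                self.open_tags.pop()
            else:
                self.ok = False   # end tag with no matching start tag


def is_publishable(html_text):
    """True if every MUST_CLOSE element was opened and closed in order."""
    checker = BalanceCheck()
    checker.feed(html_text)
    checker.close()
    return checker.ok and not checker.open_tags


def rejected_pages(root):
    """Return paths of .html files under root that fail the check."""
    failures = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".html"):
                path = os.path.join(folder, name)
                with open(path, encoding="latin-1") as handle:
                    if not is_publishable(handle.read()):
                        failures.append(path)
    return failures
```

The list returned by rejected_pages could be mailed to page owners, or
used to hold pages back from the served tree, depending on how strictly
the decisions in point 4 are to be enforced.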


6 Footnote

Finally I would just like to add a small plea to the world in general.
I feel that the following are really needed at this point in time:

1. True HTML 2.0 and HTML 3.0 browsers. (I know these are in the
pipeline.) I would especially like to see a version of Mosaic that
complains if the HTML is wrong, since this would remove the biggest
source of resistance to change: bad HTML that appears to work.

2. More HTML editors supporting WYSIWYG. I know these are starting to
appear but they need to start producing better HTML from the DTD
point of view. (Again, I know this is a new area, but it needs to
be stated.)

3. HTML fixers. Tools to take bad HTML and make it good. After all, if
Mosaic can display it, it should be able to write it back out as it
should be.
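
To give an idea of what the simplest kind of fixer might do, here is a
Python sketch that re-reads a document and writes it back out with
upper-case tags, quoted attribute values, and end tags supplied where
they were left off. The rules it applies (which elements are empty,
which implicitly close a sibling) are assumptions for this example, not
taken from the DTD, and it makes no attempt to repair wrongly nested
markup such as list text with no list items.

```python
from html import escape
from html.parser import HTMLParser

EMPTY = {"img", "br", "hr"}          # elements that never take an end tag
# A start tag on the left implicitly closes an open element on the right.
SIBLINGS = {"p": {"p"}, "li": {"li"},
            "dt": {"dt", "dd"}, "dd": {"dt", "dd"}}


class Fixer(HTMLParser):
    """Re-emits a document with consistent tags and explicit end tags."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []              # elements currently open

    def _close_top(self):
        self.out.append(f"</{self.stack.pop().upper()}>")

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] in SIBLINGS.get(tag, ()):
            self._close_top()
        parts = [tag.upper()]
        for name, value in attrs:
            parts.append(name.upper() if value is None
                         else f'{name.upper()}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return                   # stray end tag: drop it
        # Close anything still open inside the element being closed.
        while self.stack[-1] != tag:
            self._close_top()
        self._close_top()

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def close(self):
        super().close()
        while self.stack:            # supply end tags missing at end of file
            self._close_top()


def fix(html_text):
    """Return html_text re-serialised with tidy, explicitly closed tags."""
    fixer = Fixer()
    fixer.feed(html_text)
    fixer.close()
    return "".join(fixer.out)
```

This only tidies what is already structurally sound. Turning the first
example in section 2 into the second would need knowledge of what the
author meant, which is exactly why such tools are hard to write.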