Comments on HTML+ discussion document (long)

Bert Bos (bert@let.rug.nl)
Mon, 1 Nov 1993 15:54:05 +0100 (MET)


I've taken the weekend to delve into the new HTML+ specification. Here
are my comments.

The HTML+ draft is a good example of a balance between a vision of the
future and a realistic, implementable plan. Compared to HyTime -- to
which it has become more similar with this generation -- it is clear
that HyTime wants to cover everything that may become possible in
the future, while HTML+ goes no further than the technology of next
year (which is already impressive enough!)

When designing this year's HTML, one should nevertheless keep an open
eye for the things that may be added in the years after. Keeping the
door open for (future) HyTime compatibility seems a healthy approach
to me. A few of the comments below refer to HyTime in this manner.

3. Headers
-------

ad "nestable sections"

Keeping explicit, context-independent header-levels will make the
browser simpler. But we can express the structure of a document
with sections as well, by assuming an (omittable) element
enclosing every header and the subsequent text:

<!ELEMENT SECTION1 O O (H1, %bodytext ?, SECTION2*)>
<!ELEMENT SECTION2 O O (H2, %bodytext ?, SECTION3*)>
etc.

This will make it illegal to skip a level, which is essential if
some browser or printer driver wants to number the headings. It
also allows a link to be made to an entire SECTION, instead of to
the header only.
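
A minimal sketch in C of the check that this content model implies,
assuming the headers of a document have already been reduced to an
array of levels (1 for H1, 2 for H2, and so on). The function name and
the array representation are illustrative assumptions, not part of the
HTML+ draft.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if no header skips a level, 0 otherwise.
 * A header may go at most one level deeper than the previous one,
 * but may return to any shallower level. The first header must be
 * level 1, assuming the document starts with a SECTION1.
 */
int headings_nest_properly(const int *levels, size_t n)
{
    int prev = 0;   /* no section open yet */
    size_t i;

    for (i = 0; i < n; i++) {
        assert(levels[i] >= 1);
        if (levels[i] > prev + 1)
            return 0;   /* e.g. an H3 directly after an H1 */
        prev = levels[i];
    }
    return 1;
}
```

A sequence such as H1, H2, H3, H2 passes; H1 followed directly by H3
does not.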

4. Paragraphs
---------

ad "HTML+ formally doesn't require you to wrap text up as paragraphs"

This may be true conceptually, but to the software this is
less easy. It would be better to require all text to be wrapped up
as something. In other words, if untagged text follows a header, a
P tag is assumed:

<!ELEMENT P O O (L | %text)+>
<!ENTITY % bodytext "(%block | %lists | %paras)">
<!ELEMENT SECTION1 O O (H1, %bodytext ?, SECTION2*)>
etc.

When PCDATA is encountered after an H1, H2, etc., a P tag is
automatically inserted. NB. to make this acceptable to a
validating SGML parser, some trickery with SHORTREFs is needed, in
order to skip unwanted blank space, but that can be done (I tried)
and it doesn't affect the browser.

5.2. Hypertext links
---------------

ad "HREF"

Why not take one further step and make HTML+ HyTime compliant? It
involves adding one more element to the DTD and a number of
attributes that will not show up, since they all get default
values:

<!ATTLIST A
-- Anchor attributes --
id ID #IMPLIED
rel CDATA #IMPLIED
... etc.
-- Extra for HyTime --
ref ID #REQUIRED -- link to NOTLOC element --
-- Required for HyTime --
HyTime NAME #FIXED "clink"
HyNames CDATA #FIXED "target linkends"
>
<!ELEMENT notloc - - CDATA>
<!ATTLIST notloc
id IDREF #REQUIRED -- link to A element --
notation NOTATION #FIXED "WWW"
>

(To make this complete, there should also be a declaration
<!NOTATION WWW... somewhere in the DTD.) It doesn't matter where
in the document the NOTLOC element is inserted; it could be inside
the A element, at the top or end of the document, as long as there
is a NOTLOC for every A. With any authoring tool (e.g., the
html-mode for Emacs), generating the NOTLOC and the ID to bind
NOTLOC and A together should be automatic.

In fact HTML+ already works with this indirection partially, in
the LINK idref attribute.

ad "TYPE", "SIZE", and "METHODS"

It is already noted in the text, but it should also be stressed in
the user interface of any browser that uses these attributes:
don't trust these attributes!

5.6. Logical emphasis
---------------

ad "Q"

The browser should insert quotation marks, such as `to be' or "to
be", or whatever style of quoting is preferred.

ad "CITE"

The browser should display this as (Festinger...) or [Festinger],
or whatever style is preferred. CITE is meant for use in running
text, not in a bibliography.

ad "ACRONYM"

A browser might use small caps instead of the full-size caps.

5.7. Extending the set of logical roles
----------------------------------

ad "isn't meant to apply retroactively"

[Great idea, this RENDER element!] The best place for RENDER
elements is therefore at the top of the document. It is an empty
element, there is no </RENDER>.

The comma-separated list of styles is probably better changed to a
blank-separated list, as is customary in SGML, I believe.

<!ATTLIST RENDER
tag CDATA #REQUIRED -- Why was this #IMPLIED? --
style CDATA #IMPLIED
>

5.9. Images
------

ad "text flowing around the image"

While this may look nice for an image at the start of a
paragraph, it isn't so nice for images anywhere else. It is also
difficult to implement. Better not require this. Instead, require
that an image *never* overlaps with text.

ad "IMAGE"

The footnote that recommends the IMAGE element over IMG should be
promoted to a normal sentence. (And why not drop the ALT
attribute of IMG altogether?)

ad "SEETHRU"

This is a nice feature that can make displays much more
attractive, but it will always be dependent on the format of the
image. For XPM, no such attribute is needed; for 256-color images
it can be an RGB or HSV value in X format; for true-color images
it has to be a color range or approximate color.

ad "multipart/mixed"

How can the browser recognize which part of the multipart message
corresponds to a given URL? (But maybe this paragraph should be
moved to the HTTP definition anyway.)

5.11. Conditional text
----------------

The normal SGML method would be to use `marked sections':

... text before the marked section...
<![ %online [ ... text that only appears when on-line... ]]>
... more general text...
<![ %printer [ ... text that only appears on the printer... ]]>

%online and %printer are entities, that have the values:

<!ENTITY % online "INCLUDE">
<!ENTITY % printer "IGNORE">

for the browser, and the other way round for the printer.

6.1. Longer quotations
-----------------

ad footnote 1 "quote by name"

This is certainly useful. It allows one to automatically show the
latest version of something, without having to change the document
itself (cf. Windows DDE). It works for IMAGEs, so why not for
text? But it should not be a function of the QUOTE element. Better
to define TXT and TEXT (analogous to IMG and IMAGE).

Maybe we want to quote not a complete document, but only a certain
element, identified with an ID attribute. This might yield a P or
a TABLE, or L, etc.

6.4. Notes and admonishments
-----------------------

ad "ROLE attribute"

In the absence of a SRC attribute (and I strongly recommend
writers to omit it for all but the exceptional types of notes),
the ROLE attribute should determine the rendering of the note and
the note icon. The value of the ROLE attribute should therefore
not be printed (it is not a word, but a type). The following list
of predefined ROLEs should be recognized by browsers:

note - (no icon)
warning - exclamation mark, or triangular traffic sign
error - stop traffic sign
info - circled "i"
tip - index finger pointing up
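
A browser could keep this list as a small lookup table. The following
is only a sketch in C: the function name, the table layout and the
icon descriptions are assumptions, as is the choice to compare role
names without regard to case and to give unknown roles no icon.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct role_icon {
    const char *role;   /* value of the ROLE attribute */
    const char *icon;   /* description of the icon, NULL for none */
};

static const struct role_icon role_icons[] = {
    { "note",    NULL },
    { "warning", "exclamation mark" },
    { "error",   "stop sign" },
    { "info",    "circled i" },
    { "tip",     "pointing finger" }
};

/* Case-insensitive comparison of two names. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended */
}

/* Returns the icon for a ROLE, or NULL if the role has no icon
   or is not one of the predefined roles (an assumption). */
const char *icon_for_role(const char *role)
{
    size_t i;

    assert(role != NULL);
    for (i = 0; i < sizeof role_icons / sizeof role_icons[0]; i++)
        if (same_name(role, role_icons[i].role))
            return role_icons[i].icon;
    return NULL;
}
```

The ROLE value itself is only used as a key here and is never printed.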

7.3. Plain lists
-----------

Plain lists are sufficiently different from bulleted lists to
warrant an element of their own. I would suggest dropping the
PLAIN attribute and using only DIR instead.

9. Tables
------

The TB element has been omitted from the description. Also, it is
used but not defined in the DTD.

11. Literal text
------------

ad "TAB"

Instead of the width of the capital M, use the "em". When the font
has no em defined, the width of the M or something similar could
be used instead.

14.1. HTMLPLUS
--------

Official versions of HTML+ should be mentioned in the SGML
declaration, but the attributes of the HTMLPLUS tag could be used
to notify the browser of extra requirements or hints, that do not
affect the DTD. FORMS=off is such a requirement: a browser must
comply. An example of a hint could be LANG=NL, telling the browser
to apply Dutch formatting conventions as much as possible. (It
becomes the default for all other LANG attributes.)

14.4. ISINDEX
-------

ad "the search field always visible"

This is mentioned in 2.2, but it might be stressed here again.
This is what makes ISINDEX different from INPUT. A good example of
the use of ISINDEX is as a sort of command line. Maybe ISINDEX
should therefore be called something different, like ISINPUT or
HASCLI.

14.7. Links
-----

ad "UseIndex"

The UseIndex attribute implies that there is an index and gives
its URL, but does it also mean that the current document is
"searchable"? Maybe the browser should show a different prompt
from the one used for ISINDEX.

15. Large documents
---------------

ad "implicit links"

The table of contents concept is an instance of a more abstract
concept, that of independent links. What is described here is
essentially a method for adding hyperlinks to documents that don't
have them. So why not make it more general? Example:

<ILINK from="http://mach.ine/doc1#id1"
to="http://mach.ine/doc2#id2" -- hyperlink between elements -->
<ILINK from="http://mach.ine/doc1"
to="http://mach.ine/doc2"
role="next" -- hyperlink between documents -->

(Or better yet, use the indirection of HyTime.)

ad "WWW-link"

This is similar in concept to the REL=subdocument idea, but it
works completely differently. It should be in a numbered section
of its own.

Appendix I
----------

HTMLPLUS could be defined as just

<!ELEMENT HTMLPLUS O O (HEAD, BODY)>

So many elements have the three attributes ID, LANG and INDEX,
that it might be clearer to put them in an entity.

ad "OL"

Why isn't a list defined as:

<!ELEMENT (OL|UL) - - (LI*)>
...
<!ELEMENT LI - O (%block|%lists|%paras)*>

i.e., a list consists of nothing but list items, but a list item may
contain more than just text.

ad "A"

The INDEX attribute is missing? Why is SIZES specified as NAMES,
when only numbers are allowed?

ad "character entities"

The list of character entities should be referred to by name:

<!ENTITY % Latin1 PUBLIC ...>
%Latin1;

This allows an SGML application to substitute a different file,
e.g., one that maps entities to LaTeX macros.

Appendix III
------------

There should have been some documentation on how this code is
used.

ad "for (i = 0; pgon[i][X]..."

Typical of a C programmer! First use an index and only then check
if it is valid to do so. No Pascal programmer would do it like
this. Better to write:

pgon[MAXVERTS-1][X] = -1; /* Ensure termination */
for (numverts = 0; pgon[numverts][X] != -1; numverts++) ;

Not only is it safer, it is also slightly faster.
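
For illustration, the sentinel loop can be wrapped in a small
self-contained function. This is a sketch only: the value of MAXVERTS,
the definitions of X and Y and the function name are assumptions and
are not taken from the draft's code.

```c
#include <assert.h>

#define MAXVERTS 100    /* assumed capacity of the vertex array */
#define X 0
#define Y 1

/*
 * Counts the vertices that precede the first X coordinate of -1.
 * Writing a sentinel into the last slot guarantees that the loop
 * stops inside the array, so at most MAXVERTS-1 vertices are usable.
 */
int count_vertices(double pgon[MAXVERTS][2])
{
    int numverts;

    pgon[MAXVERTS - 1][X] = -1;     /* ensure termination */
    for (numverts = 0; pgon[numverts][X] != -1; numverts++)
        ;
    assert(numverts < MAXVERTS);
    return numverts;
}
```

Note that the last slot is overwritten, so a caller that fills all
MAXVERTS entries loses the final vertex.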

ad "p = (double*) pgon + 1"

Please don't use this style in example code! Replace this by p = 0
and replace every use of p by pgon[p][Y], etc.
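
A sketch of the index-based style, using a deliberately simple task
(finding the largest Y coordinate) rather than the draft's own
routine. The function name and the definitions of X and Y are
assumptions made for this example.

```c
#include <assert.h>

#define X 0
#define Y 1

/*
 * Largest Y coordinate among the first numverts vertices.
 * Every access goes through pgon[p][Y]; there is no cast to
 * double* and no pointer stepping by two.
 */
double max_y(double pgon[][2], int numverts)
{
    int p;
    double best;

    assert(numverts > 0);
    best = pgon[0][Y];
    for (p = 1; p < numverts; p++)
        if (pgon[p][Y] > best)
            best = pgon[p][Y];
    return best;
}
```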

Miscellaneous comments
-----------------------

At the moment, there is only one annotation server for the whole of
WWW. Clearly, this is not a long-term solution. The load should be
distributed. I can see two solutions:

1) an algorithm in the browser computes the annotation server to
contact given a URL (a hashing function or the `nearest domain').

2) every document specifies its own annotation server, in a LINK
element:

<LINK role=Annotations href="http://hoohoo.ncsa.uiuc.edu:8001/">
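
Solution 1 with a hashing function could look roughly like the sketch
below. The server list is entirely hypothetical, and the hash is
Bernstein's well-known multiply-by-33 string hash, chosen here only
because it is short; any hash that all browsers agree on would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of annotation servers known to every browser. */
static const char *const annotation_servers[] = {
    "http://annot1.example.org/",
    "http://annot2.example.org/",
    "http://annot3.example.org/"
};

#define NUM_SERVERS \
    (sizeof annotation_servers / sizeof annotation_servers[0])

/* Bernstein's string hash: h = h * 33 + c, starting from 5381. */
static unsigned long hash_url(const char *url)
{
    unsigned long h = 5381;

    while (*url)
        h = h * 33 + (unsigned char)*url++;
    return h;
}

/* The same URL always maps to the same server. */
const char *annotation_server_for(const char *url)
{
    assert(url != NULL);
    return annotation_servers[hash_url(url) % NUM_SERVERS];
}
```

The drawback of this scheme is that adding a server changes the
mapping for most URLs, which solution 2 avoids by naming the server in
the document itself.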

--
                     _________________________________
                    / _   Bert Bos <bert@let.rug.nl>  |
           ()       |/ \  Alfa-informatica,           |
            \       |\_/  Rijksuniversiteit Groningen |
             \_____/|     Postbus 716                 |
                    |     9700 AS GRONINGEN           |
                    |     Nederland                   |
                    \_________________________________|