Structure vs. appearance in HTML

Stavros Macrakis (macrakis@osf.org)
Thu, 21 Sep 1995 12:32:44 -0400


Brandon Plewe <PLEWE@plewe.cit.buffalo.edu> says in
<950721041622.57@plewe.cit.buffalo.edu>:

> ...people don't care about rational structure as much as they do
> about immediate results....

Most users have no idea what structure there is. What they know is
what they can see and do.

> I have spent years trying to explain to novices the value of
> structure-encoding as opposed to presentation-encoding, and the
> truth of the matter is: other than a few very advanced
> applications, most people out there just don't care.

I agree that structure is important, but only if you can do something
with it. So far, I am not aware of any Web tools that actually DO
anything useful/interesting/amusing with HTML structure. (Well, OK,
some not-very-well-known browsers do do holophrasting.)

For that matter, HTML doesn't really _have_ that much usable
structure. Here are some examples:

-- Only in 3.0 do we get hierarchical structure via DIV.

-- The math operators are defined in terms of rendering ("...close in
spirit to the representation used in LaTeX and TeX, and is being
designed with regard to the ability to render HTML Math to speech
as well as to graphical and textual displays") and not mathematical
semantics. This will make it clumsy to cut and paste formulae into
your favorite math software. For example, the differential "dx" is
apparently indistinguishable from the product of variable d and
variable x. (I say apparently because the spec is incomplete.) On
the other hand, it is true that there are things you might want to
display which don't make sense to math software (e.g. ellipses in
certain cases).

-- In fact, the math spec is very highly appearance-oriented: "HTML
math doesn't provide direct support for multi-line equations, as
this can be effectively handled by combining math with the TABLE
element." So how is my renderer supposed to resize formulae as a
function of screen width if they're split into multiple table
entries?!

-- The ADDRESS element, which might seem to have useful semantics, has
no internal structure. Wouldn't it be nice if a tool could extract
a name, e-mail address, and phone number from the address?
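
To make the ADDRESS point concrete, here is a rough Python sketch of
what a tool is reduced to when the element carries no internal
structure: pattern-matching on free text. The sample address, the
function name guess_contact_fields, and both regular expressions are
made up for illustration; nothing here comes from the HTML spec.

```python
import re

# Hypothetical sample: ADDRESS content is just free text.
ADDRESS_HTML = "<ADDRESS>Jane Doe, jane@example.org, +1 555 0100</ADDRESS>"

# Illustrative heuristics only; real addresses will defeat them.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def guess_contact_fields(address_html):
    """Guess name, e-mail and phone from unstructured ADDRESS text."""
    # Drop the tags; they tell us nothing about the fields inside.
    text = re.sub(r"<[^>]+>", " ", address_html)

    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)

    # Whatever is left after removing the e-mail and the phone number
    # is assumed (with no real justification) to be the name.
    rest = text
    for match in (email, phone):
        if match:
            rest = rest.replace(match.group(0), " ")
    name = " ".join(rest.replace(",", " ").split()) or None

    return {
        "name": name,
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


if __name__ == "__main__":
    print(guess_contact_fields(ADDRESS_HTML))
```

This works on the tidy sample and not much else: a postal code or a
street number can be mistaken for a phone number, and anything
that is not e-mail or phone gets lumped into the "name". With marked-up
fields inside ADDRESS, none of this guessing would be needed.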

In the final analysis, it is not clear how much useful structure you
can provide in a simple, general-purpose DTD like HTML. Something
like the TEI DTD has lots of useful structure, but it is a very big
DTD, and still doesn't cover a lot of important areas.

> If the HTML standard wants to win out in the end, there is only one
> answer: we have to ***show*** everybody the virtues of TrueHTML, not
> just explain them. There *must* be a top-notch browser that is truly
> committed to the HTML standard....

That's not enough. There have got to be tools that actually exploit
whatever structure there is in HTML. _That_ is the virtue of
"TrueHTML".
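
As one small illustration of a tool exploiting the structure that is
there, the following Python sketch pulls an outline out of the H1..H6
headings, roughly in the spirit of the holophrasting mentioned above.
It assumes Python's standard html.parser module; the class name
OutlineBuilder, the helper functions, and the sample document are
invented for the example.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Collect (level, text) pairs for every H1..H6 heading."""

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings, in document order
        self._level = None   # level of the heading currently open
        self._buf = []       # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            text = " ".join("".join(self._buf).split())
            self.outline.append((self._level, text))
            self._level = None


def extract_outline(html):
    """Return the heading outline of an HTML document as a list."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.outline


def render_outline(outline):
    """Indent each heading by two spaces per level below H1."""
    return "\n".join("  " * (level - 1) + text for level, text in outline)


if __name__ == "__main__":
    sample = ("<H1>Report</H1><P>intro"
              "<H2>Methods</H2><P>text"
              "<H2>Results</H2><P>more text")
    print(render_outline(extract_outline(sample)))
```

Of course this only helps when authors use headings to mark sections
rather than to get bigger type, which is exactly the problem under
discussion.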

On the other hand, there are many providers who DO NOT want to provide
structural information. Consider in particular a tool that could
strip out ads automatically....

-s