Various HTML questions

Daniel W. Connolly (connolly@hal.com)
Fri, 29 Apr 1994 11:55:42 -0500


>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Paul "S." Wain <Paul.Wain@brunel.ac.uk>
>Date: Thu, 28 Apr 1994 16:00:05 +0200
>Subject: Quick Question on <Hn> and <HR>
>
>Is the following valid HTML?
>
><HEAD>
> <TITLE>
> A title
> </TITLE>
></HEAD>
><BODY>
> <H1>
> some text
> <HR>
> </H1>
></BODY>

By my working DTD, sgmls says:

sgmls: SGML error at test.html, line 9 at ">":
H1 end-tag implied by HR start-tag; not minimizable
sgmls: SGML error at test.html, line 10 at ">":
H1 end-tag ignored: doesn't end any open element (current is BODY)

(by the way... just go grab the thing and build it... then you can
answer these questions yourself!!!)

>I'm thinking not, which is a shame because the specification says that
>there needs to be a complete clear line between a </H#> tag and the next
>line of whatever.

Where does it say this in the spec? I don't think any HTML spec says
_exactly_ how HTML is to be rendered. Seems reasonable to me that an HR
following an Hn might be rendered without intervening space if you
so desire -- you just need a smart enough renderer, I guess.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Paul "S." Wain <Paul.Wain@brunel.ac.uk>
>Date: Fri, 29 Apr 1994 09:51:56 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
[...]
>For example compare something like:
>
><BODY>
> <H1>
> This is an header
> <HR>
> </H1>
> Just a bit of text for example purposes...
></BODY>
>
>to:
>
><BODY>
> <H1>
> This is an header
> </H1>
> <HR>
> Just a bit of text for example purposes...
></BODY>
>
>Just from a presentation point of view I think that the 1st one looks a
>little cleaner. (No, it isn't just a Mosaic-ism *grin* Look at it in Lynx
>too)... In fact here is a sample:
[...]
>
>Does anyone else support this move? Can anyone see any problems?

Not in the near term... my take on the DTD I'm spinning is: if it works
today, fine. If not, it goes on the shelf...

As for the future... the question is: what's the structural significance
of an <HR>? Let's say it's to separate sections, like the old <P> used
to separate paragraphs. Then what's it doing inside a header? If you want
big rules under your header, I'd be more inclined to use
<H1 STYLE="RULE-BELOW">...</H1>

Or, better yet, let's say that HR has _no_ structural significance...
it's just decoration. Then I'd rather see a processing instruction, like:

<H1>...<? hrule></H1>

which can be written:

<H1>...&hr;</H1>

if we introduce:

<!ENTITY hr "<? hrule>">

into the DTD. Same goes for <BR>... let's define it as having _no_
structural significance, and henceforth write:

line1 &br;
line2 &br;

as a short form of

line 1 <? linebreak>
line 2 <? linebreak>
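
To make the short-form/long-form equivalence concrete, here is a minimal
sketch in Python of what entity expansion amounts to. The ENTITIES table
and expand_entities() are made up for illustration; they mirror the
proposed hr entity and an assumed br analogue, not anything in a real DTD
or parser.

```python
import re

# Hypothetical entity table mirroring the proposed <!ENTITY hr "<? hrule>">.
# The "br" entry is an assumed analogue; neither is in any published DTD.
ENTITIES = {
    "hr": "<? hrule>",
    "br": "<? linebreak>",
}

# An entity reference looks like &name; in the document text.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities=ENTITIES):
    """Replace each known &name; reference with its replacement text."""
    def substitute(match):
        name = match.group(1)
        # Unknown references are left untouched rather than guessed at.
        return entities.get(name, match.group(0))
    return ENTITY_REF.sub(substitute, text)


print(expand_entities("<H1>Title&hr;</H1>"))
print(expand_entities("line1 &br;"))
```

The point is only that the author types the short form while a
downstream tool sees the processing instruction; the element structure
of the document is untouched either way.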

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
>Date: Fri, 29 Apr 1994 11:32:45 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
>Paul.Wain@brunel.ac.uk writes:
>
[...]
>Correct HTML/HTML+, yes. From the point of view of the logical structure of the
>document - which the HTML claims to represent - the rule is clearly a top line,
>a part of the title; it is not a piece of body matter in this instance. So I
>would argue that the first case is correct; it is just that HTML does not
>support it yet.

This would be overloading the HR tag to have a new meaning in addition
to "big visible thing that separates sections... like \sectd in RTF."
If you want to be able to express this, we need to do one of:
(1) put more presentation info in the DTD
(2) introduce a set of processing instructions to represent
"decorations" like this
(3) develop a stylesheet mechanism

My preference is (1), as a short-term solution on the road to (3).

>
>And I remember that the spec says an Hn element includes all the space needed to
>set it off from the body text.

Could you give a specific citation? If it's really in there, I'll take it
out in the next revision. "Typical rendering" suggestions are part of the spec...
rendering edicts are not.

>Now that HTML is being re-DTD'ed based on current practice - an excellent idea -
>it would be timely to make this change too.

What a contradiction!!! "make this change" when describing "current practice"!!
Current practice includes current browser implementations.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Dave Raggett <dsr@hplb.hpl.hp.com>
>Date: Fri, 29 Apr 1994 12:00:20 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
>I do agree that authors should have more control over appearance

So do I...

>and consequently have added alignment parameters to headings and
>paragraphs in HTML+. Other ideas are under consideration e.g.
>the paragraph style, the font names (as URNs of course) and text
>and background colours (and texture), and margins.

Blech... why put all this in the DTD? Are we abandoning the idea
of representing some sort of abstract structure in favor of developing
yet another way to encode a picture of a document?

Processing instructions and style sheets, dude. It's the only way to go.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Paul "S." Wain <Paul.Wain@brunel.ac.uk>
>Date: Fri, 29 Apr 1994 12:42:23 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
>@ I am afraid that I don't agree with you, even on the aesthetics
>@ of your example. Look at printed books and how they give chapter
>@ headings. White space is used liberally to set off the heading.
>
>Okay, just a quick counter. I have just pulled out my copy of the HTML
>(not HTML+) specification by TBL. This has a page of the form:
>
>
>Low_Level header - e.g. document name
>----------------------------------------------------------------------------

This is just a way to render headers! I doubt there are any <HR>'s
in the document.
>
>*hrm* Im wondering now how many other little "tricks" are invalid that
>most of the clients allow :)
>

Yes... it's a scary thought. That's why I'm developing a test suite.
This one will definitely go in there.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Martijn Koster <m.koster@nexor.co.uk>
>Date: Fri, 29 Apr 1994 13:23:03 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
>What I said was that the dislike of a browsers display of a certain
>HTML+ construct is not sufficient a reason to change the use of that
>construct. I agree, if we can give clients hints to improve rendering
>of our documents that is a good thing.

[...]

>What I want to prevent is the HR being used in another way than as a
>block separator. Something like "<H1>Hi<Hr>there</H1>" strikes me as
>abuse, and your proposed change to the DTD would allow that.

Well said.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Paul "S." Wain <Paul.Wain@brunel.ac.uk>
>Date: Fri, 29 Apr 1994 13:37:09 +0200
>Subject: Re: Quick Question on <Hn> and <HR>
>
>Martijn Koster wrote:
>@ and I can see that. Maybe an <H1 rule=bottom> would do this?
>
>Yes, thinking about this, this could be a better way to go.
[...]
>... Yeah, a lot more flexible at the same time as being more "correct".
[...]
>So what would the additional rules be then?
>
><H# rule=bottom> miss out whitespace under a header?
><H# rule=top> miss out whitespace above a header?
>
>That would accomplish things quite nicely I guess. And again as usual,
>backwards compatibility wont be affected. :)

But it does mean changes to the DTD... If you want to stick any old
presentation info at all in your docs, you can do processing instructions
all over the place without changing the DTD. So for example, if I'm
writing an HTML->xxx converter, and the page breaks aren't coming out
right, I can add:

<? pagebreak>

to my document and hack up my converter to look for "pagebreak"
processing instructions. It will still parse by the same DTD
(adding arbitrary attributes requires a change to the DTD). A
proliferation of slightly incompatible processing instructions is not
nearly such a mess as a proliferation of slightly incompatible DTDs.
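
Here is roughly what "hack up my converter" could look like: a minimal
sketch, assuming Python's stock html.parser module as a stand-in for a
real SGML parser. TextConverter and convert() are hypothetical names, and
emitting a form feed for a page break is my assumption about what the
output format wants.

```python
from html.parser import HTMLParser


class TextConverter(HTMLParser):
    """Toy HTML->text converter that honours <? pagebreak> instructions."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_data(self, data):
        # Character data is copied through unchanged; tags are dropped.
        self.out.append(data)

    def handle_pi(self, data):
        # data is whatever sits between "<?" and ">", e.g. " pagebreak".
        if data.strip() == "pagebreak":
            self.out.append("\f")  # assumed: form feed stands in for a page break
        # Any other processing instruction is silently ignored.


def convert(document):
    """Run the toy converter over a document and return the text output."""
    converter = TextConverter()
    converter.feed(document)
    converter.close()
    return "".join(converter.out)


print(repr(convert("<p>page one<? pagebreak><p>page two<? unknown>")))
```

A converter that has never heard of "pagebreak" just skips the
instruction, which is the whole attraction: nothing about the element
structure changed, so nothing else has to.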

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Brian Kelly <ECL6BK@lucs-01.novell.leeds.ac.uk>
>Date: Fri, 29 Apr 1994 16:13:24 +0200
>Subject: More questions on tags
>
>How do the line break tags (<BR> and <L>) fit in with the paragraph
>container? I can see authors using these tags to control the appearance
>of their document using their favourite browser, but losing the
>structure of the document.

Well said... yet another reason to express linebreaks as...
<? linebreak>
or
<? br>
or
&br; <!-- with a new br entity in the DTD-->
rather than
<br>

>To give a specific example consider a document containing a list of
>people in an organisation
>
><HTML><HEAD><TITLE>Staff List</TITLE></HEAD>
><BODY>
><H2>Prime Minister</H2>
><P>John Smith</P>
><H2>Ministers</H2>
><P>Douglas Hurd</P>
><P>Kenneth Baker</P>
>..
>
>The author on seeing this rendered ...
>may be tempted to write:
>
><HTML><HEAD><TITLE>Staff List</TITLE></HEAD>
><BODY>
><H2>Prime Minister</H2>
><P>John Smith<BR>
><H2>Ministers</H2>
><P>Douglas Hurd<BR>
><P>Kenneth Baker<BR>
>..

The </P> tags are in there either way... they're just implicit
in this version...

>or
>
><HTML><HEAD><TITLE>Staff List</TITLE></HEAD>
><BODY>
><H2>Prime Minister</H2>
>John Smith<BR>
><H2>Ministers</H2>
>Douglas Hurd<BR>
>Kenneth Baker<BR>
>..

This last one is actually what I'd prefer to see, except that the
two paragraphs should have <p> start tags, and the BR element
should be a processing instruction:

<H2>Prime Minister</H2>
<p>John Smith<?BR>
<H2>Ministers</H2>
<p>Douglas Hurd<?BR>
<p>Kenneth Baker<?BR>

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Rick Troth <troth@rice.edu>
>Date: Fri, 29 Apr 1994 16:27:28 +0200
>Subject: Re: More questions on tags
>
>
> I must be missing something.

No kidding... about 100 messages on P as a separator vs. a
container is what you've been missing.

> I don't see the reasoning
>behind making <P> be a "container".
>
>> <P>John Smith</P>
>
> Do we *really* have to do this?

No... </p> can be left out just like </li> and </dt>.
But the consensus I hear is that P is a container.

>To: Multiple recipients of list <www-talk@www0.cern.ch>
>From: Dave Raggett <dsr@hplb.hpl.hp.com>
>Date: Fri, 29 Apr 1994 16:35:21 +0200
>Subject: Re: More questions on tags
>
>HTML+ browsers will automatically infer missing paragraph elements so:
>
> <H2>Prime Minister</H2>
> John Smith<BR>
> <H2>Ministers</H2>
>
>will be interpreted as if the author had given:
>
> <H2>Prime Minister</H2>
> <P>John Smith<BR></P>
> <H2>Ministers</H2>

Note Well: This makes an HTML+ browser a NON-CONFORMING SGML parser.
My current compromise is
(1) p is a container, but...
(2) if you forget the <p> start tag, the text is just floating
around in the <body> element... it's not in any particular P element.
e.g.:

<h1>head</h1>
text
<p>more text
<h2>head</h2>
<p>more text

is parsed as:

<body>
<h1>
head
</h1>
text
<p>
more text
</p>
<h2>
head
</h2>
<p>
more text
</p>
</body>
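
The compromise above can be sketched in a few lines of Python. This is
a toy built on the stock html.parser module, not a conforming SGML
parser; the Outline class and outline() function are hypothetical names,
and it only knows about P and the Hn elements. A <p> is closed
implicitly by the next block-level start tag, and text with no <p> start
tag is left loose in the body rather than wrapped in an invented
paragraph.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Outline(HTMLParser):
    """Toy normalizer: makes implied </p> tags explicit, one item per line."""

    def __init__(self):
        super().__init__()
        self.lines = ["<body>"]
        self.open_p = False

    def close_p(self):
        # Emit the implied </p> if a paragraph is currently open.
        if self.open_p:
            self.lines.append("</p>")
            self.open_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" or tag in HEADINGS:
            self.close_p()  # a new block implies the end of any open <p>
            self.lines.append(f"<{tag}>")
            self.open_p = (tag == "p")

    def handle_endtag(self, tag):
        if tag == "p":
            self.close_p()
        elif tag in HEADINGS:
            self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Loose text stays where it is; no <p> is inferred for it.
            self.lines.append(text)


def outline(document):
    """Return the normalized structure of a document as a list of lines."""
    parser = Outline()
    parser.feed(document)
    parser.close()
    parser.close_p()  # end of document also ends an open paragraph
    parser.lines.append("</body>")
    return parser.lines


SAMPLE = "<h1>head</h1>\ntext\n<p>more text\n<h2>head</h2>\n<p>more text\n"
print("\n".join(outline(SAMPLE)))
```

Fed the five-line example above, this prints the same structure as the
parse shown: "text" sits directly in the body, and each "more text" ends
up inside its own closed P element.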