Re: Processing instructions for style tweaks?

Murray Maloney (murray@sco.com)
Thu, 1 Dec 1994 16:32:15 -0500 (EST)


>
> > Subject: Re: Processing instructions for style tweaks?
> > Date: Wed, 30 Nov 1994 10:39:36 -0500 (EST)
> > From: Murray Maloney <murray@sco.com>
> >
> > I am dead set against PIs. Sure we could develop conventions,
> > but they could never be verified as conforming by an SGML parser.
> > No, PIs are bad! PIs are worse even than format-specific
> > SGML elements like <I> and <B> which can readily be mapped
> > to any formatting desired at the reader's end.
> >
> > . . .
> >
>
>
> I don't want to come out as if I'm championing PIs. I believe in
> "clean SGML" [Sharon Adler used to talk of "polluting" the SGML
> with format information] as much as anyone.
>
> But, as Murray elegantly pointed out in the rest of his post (that
> I elided), we must allow for other people with other viewpoints.
> In particular, there are (at least sometimes for some people) good
> reasons for wanting more control over style than can be achieved
> via, say, DSSSL Lite location/query mechanisms.

Thanks for the compliment.

I was going to let this go, but the more that I thought
about it, the more I felt that I had to pursue it.
I don't mean to denigrate Dan's idea, or suggest
that Paul is wrong for supporting it. However,
I have to argue against PIs as our solution to
the often expressed need to have local control
over formatting.

So, please forgive me for what I am about to say.
I really think that it needs to be said.

>
> However, I do disagree with "PIs are worse even than format-specific
> SGML elements." I think you're wrong, here, Murray. Having formatting
> markup *indistinguishable* from structural markup (i.e., having it all
> be DTD elements--some with "good semantics" and some with "bad semantics")
> is the worst way to go.

Perhaps I spoke too strongly here. But I remain convinced that
PIs are not a happy solution, and that we would all regret
them in the end. Read on...
>
> The advantage of using PIs for formatting-specific markup is that it's
> easy to strip/ignore them when one wants to slough off the "pollution"
> of embedded format-specific information.

But that is also true of attributes. The advantage of attributes
is that they can be parsed and verified as conforming by
an SGML parser and by an application, but the application
can choose to ignore them.

In the HTML 2.0 spec, we have been very careful to only make
formatting suggestions for HTML elements, by using the
wording "typical rendering" as opposed to specifying
the rendering with a hard and fast rule.

>
> For example, a PI might be used to force a page break or twiddle a line
> break for certain esthetic reasons during final production (this
> example may be more relevant to hardcopy, high-quality composition),
> but as soon as the publication has gone to press and it's time to
> database the information for reuse or subsequent revision, you want to
> strip such markup that is not part of the base information per se but
> only an artifact of a particular presentation situation that is now a
> thing of the past. If I had a <newpage> element in there instead of a
> <?DL newpage> processing instruction, I would need to have a more
> sophisticated filter--that I would need to change with every new
> format-specific element I added--to strip them all.

Perhaps I have been misunderstood -- ya, that must be it -- so
please allow me to explain my position.

I am not in favor of a proliferation of formatting tags.
I am in favor of using attributes to associate formatting
with an HTML/SGML element. While I am quite content to
leave the <BR> and <HR> tags alone, I am not proposing
a <SPACE size=20pts> element. Neither am I writing
in support of a variety of elements proposed by Netscape
that could have been handled with entities.

So, what I was hoping is that we could define a few sets
of attributes (attribute architectural forms) that could
be attached to elements to provide for formatting control
at the element level. I imagine that there might be several
classes of elements (I haven't thought this all the way through)
including INLINES, HEADINGS, and BLOCKS.

INLINES would have attributes that could affect the presentation
in terms of typeface, type size, and perhaps other characteristics
like kerning, character spacing, word spacing, reverse, etc.
I am not advocating anything specifically (except typeface and size),
but rather suggesting some potential characteristics that could be
adjusted by an author and possibly respected by a browser.

BLOCKS (paragraphs, address blocks, etc.) would have attributes
that could affect the presentation in terms of line filling,
hyphenation, justification, line length, left/right/centre
adjustment of lines, line spacing, etc. Again, I am not
advocating anything, only offering potential candidates.
(Possibly, the attributes associated with INLINES would also
be available to BLOCKS.)

HEADINGS would have attributes similar to BLOCKS, but might
have other attributes.

And so on.
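
To make the idea of attribute classes a little more concrete,
here is a minimal sketch in Python. Every attribute name in it
(typeface, typesize, justify, and so on) and the assignment of
elements to classes are assumptions made up for illustration;
none of them comes from the HTML 2.0 draft. The check it does
is roughly what an SGML parser would do for us if the attributes
were declared in the DTD.

```python
# Hypothetical attribute sets for the three classes sketched above.
# None of these attribute names is taken from any HTML specification.
INLINE_ATTRS = {"typeface", "typesize"}

# BLOCKS get their own attributes plus (possibly) those of INLINES.
BLOCK_ATTRS = INLINE_ATTRS | {
    "fill", "hyphenate", "justify", "linelength", "align", "linespacing",
}

# HEADINGS start out the same as BLOCKS; more could be added later.
HEADING_ATTRS = set(BLOCK_ATTRS)

# Which class each element belongs to (an illustrative assignment).
ELEMENT_CLASS = {
    "EM": INLINE_ATTRS,
    "STRONG": INLINE_ATTRS,
    "B": INLINE_ATTRS,
    "I": INLINE_ATTRS,
    "P": BLOCK_ATTRS,
    "ADDRESS": BLOCK_ATTRS,
    "H1": HEADING_ATTRS,
    "H2": HEADING_ATTRS,
}


def unknown_attributes(element, attrs):
    """Return the sorted attribute names that the element's class
    does not allow. An element missing from ELEMENT_CLASS allows none."""
    allowed = ELEMENT_CLASS.get(element.upper(), set())
    return sorted(name for name in attrs if name.lower() not in allowed)
```

With these made-up sets, unknown_attributes("EM", {"justify":
"full"}) reports "justify" as not allowed on an inline, while
the same attribute on a P passes.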
>
> With PIs, I can just strip everything of the form <?DL...>, or if my
> software handles it, just say "write -nopi" and get a depolluted
> version of the SGML. And, if I send the SGML--PIs and all--to another
> conforming SGML system that hasn't been programmed to do anything
> special with <?DL...> PIs, 'no harm, no foul,' it just works and the
> PIs are ignored.

Right. And with attributes you can simply ignore them.
Or you can ignore them selectively according to the
user's wishes -- as specified via a dialog. No harm, no foul.
The big difference is that you don't have to use a special
filter or parser to ignore attributes, and you do have a
syntax that is verifiable by an SGML parser.
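
As a rough illustration of selective ignoring, the sketch below
uses Python's standard html.parser module. That module is a
lenient tag scanner, not a validating SGML parser, so it shows
only the "ignore" half of the argument, not the "verify" half.
The attribute names (typeface, justify) and the idea of a
user-chosen "honored" set are assumptions for the example.

```python
from html.parser import HTMLParser


class HintCollector(HTMLParser):
    """Collect only the formatting attributes the user chose to honor."""

    def __init__(self, honored):
        super().__init__()
        self.honored = set(honored)  # attribute names to respect
        self.applied = []            # (tag, attribute, value) triples

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name in self.honored:
                self.applied.append((tag, name, value))
            # Anything else is skipped; no separate filter pass needed.


def collect_hints(markup, honored=()):
    """Return the formatting hints in markup that the user honors."""
    parser = HintCollector(honored)
    parser.feed(markup)
    parser.close()
    return parser.applied
```

Calling collect_hints(markup) with no honored set yields an
empty list for any input, which is the "ignore everything" case;
passing honored={"typeface"} picks up typeface hints only.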
>
> Finally, using formatting elements doesn't solve many of the problems
> because they either can't be used everywhere one might want, or their
> content models have to be so lax as to destroy the structure of the
> original DTD. PIs don't have to drastically change the ESIS tree of
> the document.

Here is where I may have been misunderstood -- and it is my own
fault for saying that I prefer formatting elements over PIs.

I am not in favor of a proliferation of formatting tags.
I am in favor of using attributes to associate formatting
with an HTML/SGML element.

Having said that, I am still more willing to accept some
tags that are intended strictly for formatting than PIs.
As examples, I point you to <I> and <B>. Yes, I have heard
all of the arguments. But I fail to see how <STRONG> is
intrinsically better than <B>. Perhaps that is because
I do not necessarily believe that something that is coded
as a <B> needs to be represented in a bold typeface.
My position is that the "typical rendering" of <B> is bold,
but <B> is simply a container. Given two phrases coded as
<B> and <STRONG>, I defy anyone to tell me that they are
at once able to describe the semantics of one and not the other.

>
> I do think there are better and worse ways of using PIs to implement
> the kind of format-override control that's being discussed. My earlier
> posting described in more detail how I would use PIs to allow for
> instance-specific location mechanisms whose specific formatting effects
> would still be specified in the style sheet.

Finally, I have some practical issues with PIs.

-- We cannot define a DTD for PIs in an HTML document.
So, we'll have the same mess we had with HTML before
Dan started his effort to write a spec. Nobody will
know for sure and there won't be any way to verify
it except for the "Mosaic test". Heaven forfend!

-- Placing PIs before and/or after elements will force
applications to save formatting instructions until
the next element is encountered or to look ahead
in case there are formatting instructions coming.

I am not a browser implementor, but I don't think
that that is clean. In fact, I think that it is
unnecessarily complex and will discourage browser
developers from implementing this functionality.

-- Forcing authors of HTML documents to learn another
language syntax to include their formatting hints
will discourage them from doing so.
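
The second point above, about saving instructions until the
next element, can be sketched as follows. This again uses
Python's html.parser purely as a convenient tokenizer. The
convention that a PI applies to the next start tag, and the
"<?DL typeface times>" syntax, are assumptions for the example;
only "<?DL newpage>" appears in Paul's posting.

```python
from html.parser import HTMLParser


class PendingPIBinder(HTMLParser):
    """Attach each PI to the next start tag (a hypothetical convention)."""

    def __init__(self):
        super().__init__()
        self.pending = []  # PIs seen but not yet attached to an element
        self.bound = []    # (tag, [pi, ...]) pairs in document order

    def handle_pi(self, data):
        # The PI cannot be acted on yet; it has to be saved.
        self.pending.append(data)

    def handle_starttag(self, tag, attrs):
        # Only now do we learn what the saved PIs were meant for.
        self.bound.append((tag, self.pending))
        self.pending = []


def bind_pis(markup):
    """Return (bound, leftover): PIs attached to tags, and PIs that
    were never followed by any element."""
    parser = PendingPIBinder()
    parser.feed(markup)
    parser.close()
    return parser.bound, parser.pending
```

Note the extra state (pending) that exists only because the
instruction and its target arrive separately, and the leftover
case where a PI at the end of the document has nothing to attach
to. An attribute arrives together with its element, so neither
problem comes up.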

All in all, the way that I would read this if I were a cynic is:

OK, they agreed that authors' formatting hints
were a feature that the WWW community was demanding,
so they set out to design something that nobody
would want to use in the hope that demand would
taper off and the greater wisdom of pure SGML and
DSSSL style sheets would win the day.

Fortunately, I am not a cynic. But judging from the articles
that I see posted in the comp.infosystems.www.* newsgroups,
there are plenty of them out there waiting.

>
> paul
>
>
> Paul Grosso
> VP Research Chief Technical Officer
> ArborText, Inc. SGML Open
>
> Email: paul@arbortext.com
> or pbg@texcel.no

===========================================================================
---------------------------------------------------------------------------
Murray C. Maloney Internet: murray@sco.com
Technical Publications Writer/Architect Uucp: ...uunet!sco!murray
SCO Canada, Inc. My Phone: (416) 960-4031
130 Bloor Street West, 10th Floor Fax: (416) 922-2704
Toronto, Ontario, Canada M5S 1N5 SCO Phone: (416) 922-1937
===========================================================================
---------------------------------------------------------------------------
Sponsor member of Davenport Group (ftp://ftp.ora.com/pub/davenport/)
Member of IETF HTML Working Group (http://www.hal.com/%7Econnolly/html-spec/)
Member of SGML Open Internet and WWW Technical Committee
===========================================================================