Re: Page breaks when you print formatted html docs

Daniel W. Connolly (connolly@hal.com)
Fri, 03 Jun 1994 11:00:42 -0500


In message <9406020928.AA04245@dragget.hpl.hp.com>, Dave Raggett writes:
>> Is there any code we can imbed in the html doc which will force page
>> breaks at appropriate spots? Like the tops of sections beginning
>> with <h1>, etc.?
>
>This kind of thing shouldn't go into the HTML. Instead you need a better
>utility for converting to Postscript. In the longer term, style sheets
>will carry this info for specified page sizes etc.

If you need to hack today, i.e. you have some application where it's
easier to hack the documents to tell your converter where to put page
breaks than it is to fix the converter to "do it right," I'd suggest
writing:

<? pagebreak>

or
&pagebreak;

The <? pagebreak> thingy is a "processing instruction." It's a kind of
SGML escape mechanism. It's almost like a comment, except that it _is_
part of the ESIS, that is, the parsed representation of a document.
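
To make the difference concrete, here is a minimal sketch in Python. It uses the standard library's html.parser as a stand-in for a real SGML parser, which is an assumption of mine: it does not produce ESIS, and unlike ESIS it reports comments too. The EventLogger class and parse_events helper are hypothetical names. The point is only that a processing instruction reaches the application as its own event, so a converter can act on it.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record the events the parser hands to the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_pi(self, data):
        # Called for <? ... >; data is the text between "<?" and ">".
        self.events.append(("pi", data.strip()))

    def handle_comment(self, data):
        # Called for <!-- ... -->; a converter would normally skip these.
        self.events.append(("comment", data.strip()))

    def handle_data(self, data):
        # Ordinary character data; whitespace-only runs are dropped.
        if data.strip():
            self.events.append(("data", data.strip()))


def parse_events(text):
    """Parse text and return the list of (kind, value) events seen."""
    parser = EventLogger()
    parser.feed(text)
    parser.close()
    return parser.events


events = parse_events("<p>one</p><? pagebreak><!-- a note --><p>two</p>")
```

A converter built this way would start a new page whenever it sees a ("pi", "pagebreak") event and ignore everything it does not recognize.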

The &pagebreak; thingy is an entity reference. It presupposes that
somewhere in the prologue (DTD), somebody has written:

<!ENTITY pagebreak SDATA "[pagebreak]">
or
<!ENTITY pagebreak "<? pagebreak>">
or
<!ENTITY pagebreak "<pagebreak>">

The handy thing about the &pagebreak; form is that it works like a
macro that you can later redefine without editing your documents. For
example, suppose folks had been writing &br; all along instead of
<br>. And suppose the first version of the HTML DTD that defined br
defined it thusly:

<!ENTITY br "<? linebreak>">

because we believed linebreaks had no structural significance. Then
suppose we decided after a while that in fact they do have structural
significance -- the parts of a postal address, for example, are split
into separate lines to distinguish street address from city and state,
etc. We could then revise the DTD to read:

<!ENTITY br "<br>">

and presto! Without any change to the documents, they all pick up the
new definition. (Some documents could become invalid, since
<? linebreak> is allowed anywhere, and <br> might only be allowed in
certain places.)
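
The macro idea can be sketched in a few lines of Python. This is an illustration only, not how an SGML parser resolves entities: expand_entities is a hypothetical helper, the two dictionaries stand in for two versions of the DTD, and the address is made up.

```python
import re

# Matches a general entity reference such as &br; or &pagebreak;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities):
    """Replace each &name; with its definition; leave unknown names alone."""
    def substitute(match):
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)


# The document is written once and never edited (made-up address).
document = "12 Main St.&br;Austin, TX"

# Two versions of the "DTD": only the definition of br changes.
old_dtd = {"br": "<? linebreak>"}
new_dtd = {"br": "<br>"}

old_result = expand_entities(document, old_dtd)
new_result = expand_entities(document, new_dtd)
```

The same document text yields a processing instruction under the old definition and a <br> element under the new one.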

On the practical side, I think you can choose either form of markup
and get the results you're after -- namely, that it doesn't show up on
the screen when you preview it with Mosaic. Quick check... Crud.
Mosaic displays both &pagebreak; and <? pagebreak> as literal text.

So much for that idea!

Oh well... there's always the old comment hack:
<!-- pagebreak -->
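
With the comment hack, the converter has to look for the marker itself, since comments are not handed to it as markup. A rough sketch in Python, assuming the converter reads the raw text before parsing; split_pages is a hypothetical helper, not part of any existing tool:

```python
import re

# The marker from above, with any amount of whitespace inside the comment.
PAGEBREAK = re.compile(r"<!--\s*pagebreak\s*-->")


def split_pages(html):
    """Return the chunks of html that fall between pagebreak comments."""
    return [chunk.strip() for chunk in PAGEBREAK.split(html)]


pages = split_pages("<h1>One</h1>\n<!-- pagebreak -->\n<h1>Two</h1>")
```

Each chunk would then go to the PostScript converter as a separate page.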

Dan