HTML Feature Test Entities, P as a container vs. separator

Daniel W. Connolly
Mon, 11 Apr 1994 14:06:05 -0500

On the Fate of P:

I gather that the general opinion is that HTML document
structure should look like:

t ...


p with emphasis in it

Unfortunately, the common way that this is coded is:

p with <em>emphasis</em> in it
<li>item 1
<li>item 2

The unfortunate part is that there's no DTD (well, none that I can find)
that will enable a conforming SGML parser to infer that structure from
that document. However, if folks are willing to put <P> tags at the
_beginning_ of every paragraph, it can be done.

My current solution is:
(1) Docs lacking <P> start tags are supported in a
backwards-compatible mode of the DTD, ala:

<!ENTITY % HTML.pSeparator "INCLUDE">
<!ENTITY % html PUBLIC "-//connolly WWW HTML 1.8//EN">
<title>backwards compatible mode</title>
para 1
para 2

In this mode, the text of the paragraphs is content of the BODY element,
and the P elements are empty, ala:

back.. ...




(2) In the standard usage of the DTD, paragraphs are containers
and require explicit start tags, ala:

<!DOCTYPE HTML PUBLIC "-//connolly WWW HTML 1.8//EN">
<title>backwards compatible mode</title>
<p>para 1
<p>para 2

The parser infers:

back.. ...
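
To make the two readings concrete, here is a minimal Python sketch of how
the same <P> tags are interpreted in each mode. It is not an SGML parser and
is not part of the DTD; the function names and the event tuples are made up
for illustration, and the separator-mode input assumes a <P> tag sits
between the two paragraphs.

```python
import re

P_TAG = re.compile(r"<p>", re.IGNORECASE)


def parse_container(body):
    """Container mode: every paragraph begins with a <P> start tag and
    runs until the next <P>; the end tag is inferred."""
    parts = P_TAG.split(body)
    if parts[0].strip():
        # Text before the first <P> has no paragraph to live in.
        raise ValueError("text before the first <P> start tag")
    return [part.strip() for part in parts[1:]]


def parse_separator(body):
    """Separator mode: P is an empty element and the paragraph text is
    direct content of BODY."""
    events = []
    pos = 0
    for match in P_TAG.finditer(body):
        text = body[pos:match.start()].strip()
        if text:
            events.append(("text", text))
        events.append(("empty", "P"))
        pos = match.end()
    tail = body[pos:].strip()
    if tail:
        events.append(("text", tail))
    return events


print(parse_container("<p>para 1\n<p>para 2"))
# ['para 1', 'para 2']
print(parse_separator("para 1\n<p>\npara 2"))
# [('text', 'para 1'), ('empty', 'P'), ('text', 'para 2')]
```

In this sketch, container mode rejects a body whose first paragraph lacks
its <P> start tag, which is the case the backwards-compatible mode exists
to accept.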




Here are the current feature test macros:

<![ %HTML.Minimal [
<!ENTITY % HTML.linkRelationships "IGNORE">
<!ENTITY % HTML.linkMethods "IGNORE">
<!ENTITY % HTML.linkRedundantInfo "IGNORE">
<!-- @@ nested lists -->
<!-- @@ phrases -->
]]>

<![ %HTML.Obsolete [
<!ENTITY % HTML.font-phrase "INCLUDE">
<!ENTITY % HTML.pSeparator "INCLUDE">
]]>

<!ENTITY % HTML.pSeparator "IGNORE"
-- use P element as paragraph separator, rather than container.
This means not all paragraphs need to start with a <P> tag. -->

<!ENTITY % HTML.linkRelationships "INCLUDE"
-- Adding markup to links to show the relationship between
ends of a link -->

<!ENTITY % HTML.linkMethods "INCLUDE"
-- Adding markup to links to show the methods supported
by the referent object -->

<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
-- Adding markup to links to give redundant information
like URN, content type, title... -->

-- Anchor names should be distinct. An SGML parser can validate
this if the NAME attribute of the A element is declared as ID.
But that restricts the syntax of an anchor name to an SGML name,
i.e. a letter followed by letters, numbers, periods and dashes,
up to NAMELEN (34) characters long.
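
The restriction described in this comment can be sketched as a quick check
in Python. This is only an illustration of the name syntax and the
distinctness rule as stated above, not what an SGML parser actually runs;
the function names are hypothetical, NAMELEN is taken from the figure quoted
in the text, and case folding of names is ignored for simplicity.

```python
import re

NAMELEN = 34  # limit quoted in the text above

# A letter followed by letters, digits, periods and dashes.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_sgml_name(value, namelen=NAMELEN):
    """True if value fits the name syntax and the length limit."""
    return len(value) <= namelen and _NAME_RE.fullmatch(value) is not None


def duplicate_anchor_names(names):
    """Return the names that occur more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


print(is_sgml_name("section-1.2"))   # True
print(is_sgml_name("2nd-section"))   # False: starts with a digit
print(is_sgml_name("my anchor"))     # False: contains a space
print(duplicate_anchor_names(["top", "intro", "top"]))  # ['top']
```

Anchor names such as "2nd-section" or "my anchor", which browsers of the
day would accept, are the ones that declaring NAME as ID would rule out.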

-- Support for the <PLAINTEXT> tag as a sign of the
end of the HTML data stream and the beginning of a stream
of text/plain data
-- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
On Mosaic, it's #PCDATA, but in the linemode browser,
it's more like CDATA, but not quite.

-- Used by the NeXT implementation to keep track of the
next anchor id to use

<!ENTITY % HTML.font-phrase "IGNORE"
-- allow B, I, TT, U outside PRE,
CITE, VAR, etc. inside PRE -->

-- treat XMP, LISTING as CDATA, as per linemodeWWW

-- Support for forms as per
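
The feature test entities above rely on the SGML rule that the first
declaration of an entity is the one that counts, and that a document's own
declarations are read before the DTD. The Python sketch below models just
that rule, plus marked sections that are skipped unless their switch is
INCLUDE. The data layout and function names are invented for illustration,
and the default "IGNORE" declaration for HTML.Obsolete is an assumption,
since the excerpt does not show it.

```python
def _process(items, entities):
    """Walk DTD items in order; the first declaration of a name wins."""
    for item in items:
        if item[0] == "entity":
            _, name, value = item
            entities.setdefault(name, value)  # later declarations ignored
        elif item[0] == "section":
            _, switch, body = item
            if entities[switch] == "INCLUDE":
                _process(body, entities)
    return entities


def resolve(dtd, document_subset):
    """Resolve switches; the document's declarations are seen first."""
    return _process(dtd, dict(document_subset))


# Simplified model of the excerpt above.
DTD = [
    ("entity", "HTML.Obsolete", "IGNORE"),  # assumed default, not in excerpt
    ("section", "HTML.Obsolete", [
        ("entity", "HTML.font-phrase", "INCLUDE"),
        ("entity", "HTML.pSeparator", "INCLUDE"),
    ]),
    ("entity", "HTML.pSeparator", "IGNORE"),
    ("entity", "HTML.font-phrase", "IGNORE"),
]

print(resolve(DTD, {})["HTML.pSeparator"])
# IGNORE
print(resolve(DTD, {"HTML.pSeparator": "INCLUDE"})["HTML.pSeparator"])
# INCLUDE
print(resolve(DTD, {"HTML.Obsolete": "INCLUDE"})["HTML.font-phrase"])
# INCLUDE
```

Under these assumptions, a document opts into separator mode either by
declaring HTML.pSeparator itself or by turning on HTML.Obsolete, which sets
it along with HTML.font-phrase.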

If you're interested, see
for background etc., and
for the DTD itself.

Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010