Re: using NOTATIONs inline

Dan Connolly (connolly@pixel.convex.com)
Mon, 8 Jun 92 00:17:48 -0500


In article <23177A@erik.naggum.no> you write:
>Dan Connolly <connolly@convex.com> writes:
>|
>| The WWW group is attempting to define a multimedia interchange
>| format called HTML. . . .
>
>Why not use HyTime?
>
Erik:
Partly because of ignorance (we've heard of HyTime, but we don't
know the details). I'd expect a HYTIME engine to be quite a bit
of work to implement. And partly because, as I understand it, HYTIME
doesn't go as far as to prescribe a DTD. The WWW project needs
one particular language, not a whole architecture.

I'd certainly like to know more about HYTIME's techniques for addressing
documents, esp. elements of documents.

Now for the WWW gang:
>:
>| That is, is it possible to put an arbitrary 8 bit binary stream
>| _inside_ an SGML document? My guess is: no. But if we use
>| CDATA, can we include anything that doesn't contain the closing
>| tag in full?
>
>If you by "the closing tag in full" mean the entire end-tag, complete
>with etago, generic identifier, and tagc, as in "</image>", this is not
>the way SGML does it. CDATA and SDATA are terminated by an etago
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
>followed by a name start character, or a grpo (group open, "(")
>delimiter if concurrent document types are allowed. In the reference
>concrete syntax, this means that the regular expression "</[(a-z]"
>matches the end of CDATA and SDATA elements.
>
>You can also use marked sections, with a CDATA status keyword, in which
>case the CDATA is terminated by the mse delimiter (marked section end,
>"]]>").
>
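
To make the two termination rules above concrete, here is a minimal
Python sketch. It is only an illustration of the regular expression
quoted above, not a parser. The function names are made up for this
example, and the case-insensitive match is an assumption (the quoted
expression lists only lower-case letters).

```python
import re

# End of a CDATA/SDATA element as described above: the etago "</"
# followed by a name start character, or by "(" when concurrent
# document types are allowed.
# Assumption: upper-case letters count as name start characters too.
ETAGO_IN_CONTEXT = re.compile(r"</[(a-z]", re.IGNORECASE)

# End of a CDATA marked section: the mse delimiter.
MSE = "]]>"


def cdata_element_end(text):
    """Return the offset where CDATA element content would end, or None."""
    match = ETAGO_IN_CONTEXT.search(text)
    return match.start() if match else None


def cdata_marked_section_end(text):
    """Return the offset of the first mse delimiter, or None."""
    pos = text.find(MSE)
    return pos if pos >= 0 else None
```

Note that "</b" ends the data even when "b" is not the element that is
open, while "</" followed by a space does not end it.
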
>:
>| Someone made the point that an SGML document is only allowed to
>| include SGML characters as specified by the SGML declaration, and if
>| we're going to use the default SGML declaration, we have to stick to
>| the characters blessed by it.
>
>Blessed and blessed. The SGML declaration is supposed to reflect the
>reality of the document, not enforce arbitrary limits on them. So you
>write an SGML declaration which fits the document.
>
>| That's not my understanding. I thought that inside CDATA (or SDATA,
>| I think) you could put _anything_ but the closing tag in full.
>
>As said above, the etago delimiter-in-context terminates the data,
>regardless of whether it's a legal end-tag in that context.
>
>You should be aware that the SGML parser will parse the "binary"
>content, ignore record starts, and treat record ends differently
>from other characters. In addition, it's an error for an SGML
>entity to contain characters with any of the numbers listed in the
>SHUNCHAR part of the SYNTAX declaration. This is _not_ what you want
>with binary data.
>
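
A quick way to see why raw binary data runs into the SHUNCHAR rule is
to scan it for shunned character numbers. The sketch below assumes the
shunned set is 0 through 31 plus 127 and 255, which is my recollection
of the reference concrete syntax; substitute whatever your own SGML
declaration lists. The function name is hypothetical.

```python
# Assumption: shunned character numbers taken to be the controls 0-31
# plus 127 and 255. A real SGML declaration may list a different set.
SHUNNED = frozenset(range(32)) | {127, 255}


def shunned_offsets(data, shunned=SHUNNED):
    """Return the offsets in a bytes object that hold shunned numbers."""
    return [i for i, byte in enumerate(data) if byte in shunned]
```

Almost any image or audio file will produce a long list here, whereas
printable ASCII text produces none.
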
>| What's the scoop? Do we have to use external entities for raw data?
>
>Yes. An external entity that is not an SGML text entity requires a
>notation identifier, so you only need to list the entities in the DTD,
>with notation, and refer to them by name in the document instance.
>
>If this is not satisfactory, you should declare the objects to be CDATA,
>and use a binary to text-only transformation scheme. There are several
>such schemes. Among them, base64 is the preferred encoding in my view,
>since it's available as part of the new Multipurpose Internet Mail
>Extensions (MIME) RFC-to-be. (The latest draft is available for
>anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
>two weeks from today. Section 5.2 which concerns the base64 encoding is
>also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.) Transformation
>back to the binary form from the text-only form may be done on the fly
>by the application before sending the data to the notation interpreter.
>
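
As a rough illustration of the text-only transformation described
above, the following Python sketch encodes binary data as base64 and
decodes it again. One useful property: the base64 alphabet contains
"/" but neither "<" nor "]", so the encoded text cannot contain the
etago or mse delimiters. The function names are invented for this
example.

```python
import base64
import re

# Delimiters that would end CDATA content (see the rules quoted above).
# Assumption: name start characters are matched case-insensitively.
ETAGO_IN_CONTEXT = re.compile(r"</[(a-z]", re.IGNORECASE)
MSE = "]]>"


def to_cdata_text(raw):
    """Encode bytes as base64 text, broken into lines of at most 76 chars."""
    return base64.encodebytes(raw).decode("ascii")


def from_cdata_text(text):
    """Decode base64 text back to the original bytes."""
    return base64.decodebytes(text.encode("ascii"))


def is_safe_cdata(text):
    """True if the text contains neither CDATA-terminating delimiter."""
    return ETAGO_IN_CONTEXT.search(text) is None and MSE not in text
```

The application would call from_cdata_text() on the element content
before handing the bytes to the notation interpreter.
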
My idea is to use MIME encodings, but put these attachments _outside_
the SGML text, in an attached (or external) body part.
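
Roughly what I have in mind, sketched with Python's standard email
package. This is only an illustration under assumptions: the function
name, the subject line, the "text/sgml" and "image/gif" types, and the
sample markup in the usage note are all made up for the example.

```python
from email.message import EmailMessage


def build_message(sgml_text, image_bytes, filename):
    """Build a multipart message: SGML text first, binary data attached."""
    msg = EmailMessage()
    msg["Subject"] = "SGML document with an external image"
    # The SGML text travels as an ordinary text body part.
    msg.set_content(sgml_text, subtype="sgml")
    # The raw data goes in a separate body part; the library applies
    # base64 as the content transfer encoding for bytes by default.
    msg.add_attachment(
        image_bytes, maintype="image", subtype="gif", filename=filename
    )
    return msg
```

For example, build_message('<doc><image name="fig1"></doc>\n',
b"GIF89a\x00\x01\xff", "fig1.gif") yields a multipart/mixed message
whose second part decodes back to the original bytes, so the SGML text
itself never has to carry the binary stream.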

>In addition to being much easier to deal with in SGML, this also makes
>SGML documents containing such content robust with respect to file
>transfer, etc.
>
>Hope this helps,
></Erik>

Thanks. Mostly it confirms my suspicions, but it should also provide
a somewhat authoritative answer (no references to ISO 8879 here :-)
to the WWW project.

>--
>Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
>Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
>Boks 1570, Vika | <erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento,
>0118 OSLO, NORWAY | <enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis.