Re: Re Dan on implementation

lenst@lysator.liu.se
Fri, 25 Feb 1994 11:32:43 +0100


In message <9402170147.AA06698@ulua.hal.com>, "Daniel W. Connolly"
<connolly@hal.com> uses C as an example of a context free language and
then goes on to write:

>If we keep HTML down to a context-free language composed of regular
>tokens, then folks can write little 20-line ditties in perl, elisp,
>lex, yacc, etc. and get real work done.

Can you write a little 20-line perl program that lists the variables
of a C program?

>If we require real-time processing of all legal SGML documents,
>we buy nothing in terms of functionality, and we render almost
>all current implementations broken.

I don't think anyone has suggested that browsers need to be able to
process *all* legal SGML documents. HTML is, after all, a specific DTD
with a specific SGML declaration.

>>| <!-- this: <A HREF="abc"> looks like a link too! -->
>>
>>How so? It's in a comment, and so will be ignored by a parser.
>
>Yes, by an SGML compliant parser, but not by any parser built
>out of standard parsing tools like regular expressions, lex, and yacc.
>(well, actually, you could do it with lex, but it's a pain...)

Recognising a comment can be done with regular expressions. If you
have trouble making lex and yacc handle this, I don't think it is
because of any limitation of lex and yacc.
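To make that concrete, here is a rough sketch in Python (my choice for
the example; the same regular expressions carry over to perl or a lex
specification). It assumes a simplified comment syntax, "<!--" up to
the next "-->", and ignores the rarer SGML form with several "--...--"
pairs inside one declaration. The names `COMMENT`, `START_TAG` and
`start_tags` are made up for illustration.

```python
import re

# Simplified assumption: a comment is "<!--", then anything (non-greedy),
# then "-->". Full SGML comment declarations allow several "--...--" pairs
# inside one "<!...>"; this sketch does not handle that.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# A start tag: "<" immediately followed by a letter, then up to ">".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def start_tags(text):
    """Return the names of start tags that appear outside comments."""
    without_comments = COMMENT.sub(" ", text)
    return [m.group(1).upper() for m in START_TAG.finditer(without_comments)]


doc = '<!-- this: <A HREF="abc"> looks like a link too! --> <A HREF="xyz">real</A>'
print(start_tags(doc))  # ['A']  (only the link outside the comment)
```

Stripping comments first and then matching tags is two passes, but each
pass is a plain regular expression.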

>>| And this: a < b > c has no markup at all, even though it
>>| uses the "magic" < and > chars.
>>
>>But not in the magic combinations <[A-Za-z] etc.
>
>Right. The famous "delimiter in context". Contrast this with the
>vast majority of "context free" languages in use.
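The "delimiter in context" rule still fits a single lex-style pattern.
The sketch below is in Python, with hypothetical names, and makes the
simplifying assumption that only start and end tags count as markup. It
treats "<" as markup only when a letter (or "/" and a letter) follows
it; every other "<" or ">" is plain data.

```python
import re

# Simplified assumption: only start and end tags are markup. "<" opens a tag
# only when followed by a letter (or by "/" and a letter); any other "<" or
# ">" is ordinary character data.
TOKEN = re.compile(r"""
    (?P<tag></?[A-Za-z][^>]*>)   # delimiter in context: "<" + letter, or "</" + letter
  | (?P<data>[^<]+|<)            # a run of data, or a lone "<" that opens nothing
""", re.VERBOSE)


def tokenize(text):
    """Split text into ("tag", ...) and ("data", ...) tokens."""
    tokens = []
    for m in TOKEN.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "data" and tokens and tokens[-1][0] == "data":
            # merge adjacent data pieces, e.g. "a " + "<" + " b > c"
            tokens[-1] = ("data", tokens[-1][1] + value)
        else:
            tokens.append((kind, value))
    return tokens


print(tokenize("a < b > c"))      # [('data', 'a < b > c')]
print(tokenize("x <B>bold</B>"))  # [('data', 'x '), ('tag', '<B>'), ('data', 'bold'), ('tag', '</B>')]
```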

I will compare this with C. In C, "/" is the token for the division
operator and "*" is the token for the multiplication operator, but
when "/" is followed by "*" the pair starts a comment. That is
consistent with a "context free" language, and so is recognising "<"
as a start tag opener when it is followed by a letter.
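In both cases a single character of lookahead settles the question. A
deliberately tiny Python sketch of the C side (the function `c_tokens`
is hypothetical; it ignores string literals, "//" comments and
everything else a real C lexer must handle):

```python
def c_tokens(src):
    """Split a C fragment into operator, comment and word tokens.

    Simplified sketch: only "/", "*" and "/* ... */" are recognised;
    whitespace is skipped and any other run of characters is a "word".
    """
    tokens, i = [], 0
    while i < len(src):
        if src.startswith("/*", i):          # "/" followed by "*": comment start
            end = src.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            tokens.append(("comment", src[i:end + 2]))
            i = end + 2
        elif src[i] in "/*":                 # lone "/" or "*": an operator
            tokens.append(("op", src[i]))
            i += 1
        elif src[i].isspace():
            i += 1
        else:                                # anything else up to a space or operator
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in "/*":
                j += 1
            tokens.append(("word", src[i:j]))
            i = j
    return tokens


print(c_tokens("a / b * c /* a / b */"))
# [('word', 'a'), ('op', '/'), ('word', 'b'), ('op', '*'), ('word', 'c'), ('comment', '/* a / b */')]
```

Nobody calls C's lexical structure context-sensitive because "/"
means something different when "*" follows it.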

>You say "crippled", I say "expedient". Remember: the documents are
>still conforming. It's just the WWW client parser that's non-standard.

It is harder to make SGML tools produce correct HTML if HTML has a lot
of arbitrary restrictions.

--
Lennart Staflin  <lenst@lysator.liu.se>