Re: HTML draft - clarification of quoted string processing

Marc Andreessen (marca@ncsa.uiuc.edu)
Sun, 5 Dec 93 02:31:47 -0800


Dave_Raggett writes:
> Bryan Cheung writes:
>
> > I may just be blind, but I don't see a place in the HTML spec which
> > describes what is supposed to happen to special characters inside quoted
> > strings. Consider an HTML statement such as:
>
> > <form method=post action="/htbin-post/banner hello > foobar">

Heavily illegal. The special characters should be encoded, as usual,
as %xx. (This is because ACTION specifies a URL.)
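
To make the %xx point concrete, here is a minimal sketch of encoding
the ACTION value from the example above. It assumes Python's standard
urllib.parse.quote and assumes "/" should stay unescaped as a path
separator; neither choice comes from the HTML spec, and the variable
names are only illustrative.

```python
from urllib.parse import quote, unquote

# ACTION value copied from the quoted example; the variable names are
# illustrative only.
raw_action = "/htbin-post/banner hello > foobar"

# Assumption: keep "/" as a path separator and escape everything else
# that is not an unreserved URL character (space -> %20, ">" -> %3E).
encoded_action = quote(raw_action, safe="/")

print('<form method=post action="%s">' % encoded_action)

# Decoding on the server side recovers the original string.
decoded_action = unquote(encoded_action)
```

With the URL encoded this way, no ">" or whitespace appears inside the
quoted attribute value, so the tag parses the same under either
reading of the quoting rules.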

> > ...form goes here
> > </form>
>
> > My question relates to how the i/o redirection character (or any special
> > character) is to be treated when used within quotes inside of a standard
> > HTML directive. Should special characters be completely protected when
> > quoted inside of a directive?? Does it make sense to specify that escapes
> > such as &gt; be used within quoted strings? Where should this go in the
> > spec? (I looked for it, and can't find it - please point me there if I
> > missed it.)
>
> Page 331 of Goldfarb's SGML Handbook says that parsers derive the attribute
> value from the attribute value literal (the stuff between the quote marks)
> by replacing any entity references or character references within the literal
> and then normalising by replacing any contiguous whitespace by a single
> space character. Note you can use " or ' as quote marks for attribute value
> literals.
>
> Thanks for pointing out this topic - it is rather obscure and clearly needs
> to be included in the HTML+ spec. I can guarantee that most browsers are
> currently doing the wrong thing for attributes!
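
For reference, the two steps described in the quoted paragraph
(replace references, then collapse whitespace) can be sketched roughly
as below. This is only an approximation: it assumes Python's
html.unescape, which uses the HTML entity table rather than whatever
entity set a given SGML DTD declares, and the function name
attribute_value is made up for illustration.

```python
import html
import re

def attribute_value(literal: str) -> str:
    """Derive an attribute value from the text between the quote marks.

    Rough sketch of the two steps described above, not a full SGML
    implementation.
    """
    # Step 1: replace entity and character references (&gt;, &#62;, ...).
    expanded = html.unescape(literal)
    # Step 2: replace each run of contiguous whitespace with one space.
    return re.sub(r"\s+", " ", expanded)

# Either quote mark may delimit the literal in the markup; by the time
# the literal reaches this function the quote marks are already gone.
print(attribute_value("banner   hello &gt;\n foobar"))
```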

Bleah. Let's be more restrictive. An encoding method exists for URLs
anyway; other attributes/values can be limited to reasonable characters.
This keeps things simple.

Marc