Re: Performance analysis questions

Rick Troth (troth@rice.edu)
Fri, 27 May 1994 15:57:35 -0500 (CDT)


I'm surprised and crushed by Dan's response.

> > Argh! This is bad. Not picking on you, George,
> >but you've presumed on something that a *lot* of people seem to
> >presume on. It should be
> >
> > CR LF [any amount of white space] CR LF
> >
> > There are contemporary systems that CAN NOT generate a
> >completely empty line in places. This is a problem for certain
> >mail user agents which don't see the header termination because
> >the blank line isn't an "empty line" (cr/lf/cr/lf). ...

> >
> > Thoughts?
>
> Yes... let's nip this sort of thing in the bud, shall we?
>
> HTTP is not Internet Mail.

Right. And Internet Mail is broken. Let's not see HTTP
break because someone misinterpreted the spec. We need to clarify this.
I say that we should clarify it in the looser direction w/r/t plain
text and trailing whitespace in particular. I see no reason to
penalize clients and servers that have platform limitations ...
unless it's just out of spite. What's the deal, Dan?

> HTTP is not for the human eye: it's for a piece of software that groks
> TCP (or perhaps some other reliable transport eventually...).

If by this statement you're pointing out a misimplication
in my note, I accept the correction. I didn't mean to suggest
that HTTP is for human consumption. What I *did* (still do)
mean to suggest is that, to the greatest extent possible,
HTTP be clearly defined as a PLAIN TEXT protocol.

I think we all agree that "plain text" protocols are a
Good Thing. But we've agreed to that without bothering to define
what on earth "plain text" is. I don't think HTTP should be the
protocol to bear the plain text torch, but I think it'd be foolish
to plow blindly forward without thinking carefully about it. There's
so much in HTTP that's wonderful, things like the URL being a single
blank-delimited token; see it all the way through! (you'd better be
discarding any trailing white space from your GETs; are you???)

That's why I said ...

> > Try this:
> >
> > o a line of text is NUL terminated
> > (assuming you're coding C on UNIX) [YMMV]

Better: "a line of text is end-of-record terminated",
where end-of-record is defined by local O/S considerations.

> > o when sending "on the wire" append CR & LF
> > o when receiving, accept either NL (LF)
> > or CR LF for end-of-line
> > o when processing, ignore trailing white space

And add one more: TAB and SPACE process the same.

This isn't arbitrary, it fits the "be conservative about
what you generate and liberal about what you accept" rule.
(or do we disagree about that too?)
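
The rules above can be sketched roughly as follows, in Python for
brevity. The function names to_wire and from_wire are made up for
illustration, and ASCII-only header text is an assumption.

```python
def to_wire(line: str) -> bytes:
    """Sending: append CR LF to a line held in local form."""
    return line.encode("ascii") + b"\r\n"


def from_wire(raw: bytes) -> str:
    """Receiving: accept LF or CR LF as end-of-line, treat TAB like
    SPACE, and ignore any trailing white space."""
    text = raw.decode("ascii")
    if text.endswith("\n"):
        text = text[:-1]          # drop LF
    if text.endswith("\r"):
        text = text[:-1]          # drop the CR of a CR LF pair
    text = text.replace("\t", " ")  # TAB and SPACE process the same
    return text.rstrip(" ")       # ignore trailing white space
```

A sender always emits CR LF, while a receiver gets the same result
from "GET /index.html \r\n" as from "GET /index.html\n".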

> Let us keep the HTTP protocol clear and free of such kludgery.

This is not kludgery! This is robust design.

> In the HTTP headers, a line is terminated by CRLF. That's octet 13,
> octet 10. Anything else is broken. One should not expect to use
> idioms such as:
> printf("HeaderName: stuff\n")
> or
> echo "HeaderName: stuff"
> successfully. Care must be taken to terminate lines with CRLF.

Certainly. Any HTTPD will have to map local conventions to
on-the-wire streams. Any HTTPD will have to map end-of-line to 0x0D 0x0A.
We can make certain demands of the various HTTP server implementations.
I say we *not* demand that "plain text" (including HTTP headers)
be anything more than outlined above.

> Similarly for the blank line that ends the headers: I'm not sure if
> RFC822 specifies that the line shall be empty or not, but I'd support
> a clarification in HTTP that says it shall.

That's the problem. It doesn't specify!

I'd support a clarification that it NEED NOT be empty.
If you specify that it MUST BE EMPTY (e.g. CR/LF/CR/LF) then
at least you've specified, but you'll have tightened the spec
in the direction that is hardest to implement.
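
To show how little the looser reading costs a receiver, here is a
rough sketch in Python. The name split_message is hypothetical, and
the sketch assumes ASCII headers and ignores folded continuation
lines. The headers end at the first line that is empty OR contains
only SPACE and TAB.

```python
def split_message(raw: bytes):
    """Return (header_lines, body). The header block ends at the first
    line that is empty or holds nothing but white space."""
    headers = []
    pos = 0
    while pos < len(raw):
        end = raw.find(b"\n", pos)
        if end == -1:
            end = len(raw)        # final line without an end-of-line
        # Strip CR plus any trailing SPACE/TAB before looking at the line.
        line = raw[pos:end].rstrip(b" \t\r")
        pos = end + 1
        if line == b"":
            return headers, raw[pos:]   # blank line: body starts here
        headers.append(line.decode("ascii"))
    return headers, b""           # no blank line seen: no body
```

The strict CR LF CR LF form is still accepted; it is just one case
among several.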

> The data stream is something different altogether.

And that's beyond the scope of this argument. But ...

> This means, for example, that you shouldn't expect html lines to
> be terminated in any particular way. Of course it doesn't matter how
> they're terminated except inside PRE elements. There, I'd say that
> a newline is (CR|LF|CRLF).

Which can safely become LF, LF, LF on a UNIX client host.
You wouldn't want the "save to disk" option to leave those CRs in
there, would you? Still, here too, any trailing white space should
be considered fair game.
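
As a rough illustration of that "save to disk" step on a UNIX host,
here is a sketch in Python. The name to_local_text is hypothetical,
and the local end-of-line is assumed to be LF.

```python
import re


def to_local_text(data: str) -> str:
    """Map CR, LF, or CR LF to LF and drop trailing SPACE/TAB
    from every line."""
    # CR LF is listed first so it is consumed as a single newline.
    lines = re.split(r"\r\n|\r|\n", data)
    return "\n".join(line.rstrip(" \t") for line in lines)
```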

> Daniel W. Connolly "We believe in the interconnectedness of all things"

Exactly!

> Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
> <connolly@hal.com> http://www.hal.com/%7Econnolly/index.html

-- 
Rick Troth <troth@rice.edu>, Rice University, Information Systems