Re: Performance analysis questions

Daniel W. Connolly (connolly@hal.com)
Sat, 14 May 1994 00:41:58 -0500


In message <Pine.3.89.9405130741.A12231-0100000@brazos.is.rice.edu>, Rick Troth
writes:
>
>> If it's the header boundary that's a problem, you could
>> quadruple the speed with read(f, &s[i], 4) since you have at least
>> "GET " for HTTP/0.9 requests and HTTP/1.0 headers will terminate
>> with CR LF CR LF (well, they better!).
>>
>> -- George
>
> Argh! This is bad. Not picking on you, George,
>but you've presumed on something that a *lot* of people seem to
>presume on. It should be
>
> CR LF [any amount of white space] CR LF
>
> There are contemporary systems that CAN NOT generate a
>completely empty line in places. This is a problem for certain
>mail user agents which don't see the header termination because
>the blank line isn't an "empty line" (cr/lf/cr/lf). Let the blank
>line be just blank to the eyes, not necessarily "empty".
>
> Try this:
>
> o a line of text is NUL terminated
> (assuming you're coding C on UNIX) [YMMV]
> o when sending "on the wire" append CR & LF
> o when receiving, accept either NL (LF)
> or CR LF for end-of-line
> o when processing, ignore trailing white space
>
> Thoughts?

Yes... let's nip this sort of thing in the bud, shall we?

HTTP is not Internet Mail. HTTP is a protocol based on a reliable byte
stream, such as TCP. A reliable byte stream does not munge
whitespace. It doesn't lose characters because it translated to EBCDIC
and back.

HTTP is not for the human eye: it's for a piece of software that groks
TCP (or perhaps some other reliable transport eventually...).

It is not the case that there are 1000s of broken HTTP implementations
out there that we need to support. There are perhaps 10 or 20, with 2
or 3 representing 99% of the traffic.

Let us keep the HTTP protocol clear and free of such kludgery.

In the HTTP headers, a line is terminated by CRLF. That's octet 13,
octet 10. Anything else is broken. One should not expect to use
idioms such as:
    printf("HeaderName: stuff\n")
or
    echo "HeaderName: stuff"
successfully. Care must be taken to terminate lines with CRLF.
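
To make the point concrete, here is a minimal sketch of building a
header line with an explicit CRLF. The helper name format_header is
made up for illustration; the idea is to format into a buffer and hand
that buffer to write() or send(), so nothing in between gets a chance
to translate the line ending.

```c
#include <stdio.h>

/* Hypothetical helper (not from any existing server): format one header
 * line into buf, ending it with the two octets 13 and 10 (CR LF).
 * Returns the number of bytes written, not counting the NUL, or -1 if
 * the line does not fit in buf. */
int format_header(char *buf, size_t size, const char *name, const char *value)
{
    /* "\r\n" is spelled out; a bare "\n" would put only octet 10 on the wire. */
    int n = snprintf(buf, size, "%s: %s\r\n", name, value);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Calling format_header(buf, sizeof buf, "HeaderName", "stuff") should
leave "HeaderName: stuff" followed by CR LF in buf.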

Similarly for the blank line that ends the headers: I'm not sure if
RFC822 specifies that the line shall be empty or not, but I'd support
a clarification in HTTP that says it shall.
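
Under that strict reading, finding the end of the headers is a plain
search for four octets. A rough sketch follows, assuming the request or
response has already been read into a buffer; header_end is a
hypothetical name, and returning 0 for "not found yet" is my own
convention.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first byte after the
 * empty line (CR LF CR LF) that ends the headers, or 0 if the buffer
 * does not contain one yet. Assumes the strict rule: no whitespace is
 * tolerated between the two CRLF pairs, and a bare LF does not count. */
size_t header_end(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}
```

A line that is merely blank to the eye (CR LF, space, CR LF) is
deliberately not accepted here.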

The data stream is something different altogether. The possible
content-transfer-encodings are:
    7bit   -- 7bit text, lines terminated by CRLF (no reason to use this)
    8bit   -- 8bit text, lines terminated by CRLF
    binary -- 8bit data, not necessarily any linebreaks anywhere.

I believe binary is the default Content-Transfer-Encoding in HTTP
(though I believe I saw 8bit documented as the default somewhere...).

This means, for example, that you shouldn't expect HTML lines to
be terminated in any particular way. Of course it doesn't matter how
they're terminated except inside PRE elements. There, I'd say that
a newline is (CR|LF|CRLF).
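
For the body, the (CR|LF|CRLF) rule might be sketched like this. It is
only an illustration of the rule, with a made-up name count_newlines,
and it counts line breaks rather than rendering anything.

```c
#include <stddef.h>

/* Hypothetical helper: count line breaks in a buffer, treating CR, LF,
 * and CR LF each as a single newline. */
size_t count_newlines(const char *buf, size_t len)
{
    size_t i = 0, n = 0;

    while (i < len) {
        if (buf[i] == '\r') {
            n++;
            /* A CR immediately followed by LF is one newline, not two. */
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;
        } else if (buf[i] == '\n') {
            n++;
        }
        i++;
    }
    return n;
}
```

Note that LF followed by CR counts as two newlines under this rule,
since only the CR LF order forms a pair.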

Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<connolly@hal.com> http://www.hal.com/%7Econnolly/index.html