Re: Cache woes.

Daniel W. Connolly (connolly@hal.com)
Fri, 20 May 1994 08:37:42 -0500


In message <3803.9405201240@molnir.brunel.ac.uk>, Paul "S." Wain writes:
>
>It's just come to my notice that the CERN HTTPD/cache decodes a URI, and then
>re-encodes it before getting the document from the server in question.
>
>Is this "safe", so to speak? It is causing a small, but not insignificant,
>change to a URI.

It should be safe. In my interpretation of the current URI spec, if the
client asks:

GET http://www.hal.com/~connolly/index.html HTTP/1.0

then, since ~ is not a valid character in a URI, it is safe for the proxy
to turn around and ask the parent server:

GET http://www.hal.com/%7Econnolly/index.html HTTP/1.0
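A minimal Python sketch of that re-encoding step, assuming the proxy passes letters, digits, a handful of punctuation marks, the reserved characters, and an existing % through unchanged and escapes everything else. The SAFE set and the name escape_unsafe are hypothetical, chosen to match the reading above in which ~ is not a valid URI character:

```python
# Characters this sketch passes through unchanged. The set is an assumption
# modeled on the reading above, in which "~" is not a valid URI character.
SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "$-_.+!*'(),"  # punctuation assumed safe
    "/?;:@=&"      # reserved characters: never escaped here
    "%"            # existing %XX escapes are left alone
)


def escape_unsafe(path: str) -> str:
    """Replace each character outside SAFE with %XX (uppercase hex)."""
    out = []
    for ch in path:
        if ch in SAFE:
            out.append(ch)
        else:
            # Non-ASCII characters are escaped byte by byte as UTF-8
            # (an assumption; the 1994 drafts predate this convention).
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_unsafe("/~connolly/index.html"))  # /%7Econnolly/index.html
```

Because / and % are in SAFE, the sketch never touches an escaped %2F or introduces a new path separator.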

Ah! Note: this is not "safe" for arbitrary URLs -- only for
URL schemes which are known to use the %XX syntax (all currently
defined URL schemes do: ftp:, http:, hmmmm... not sure about
gopher: -- the gopher folks seem to want to use their own syntax
for some things.) It is the consensus of the URI working group
(and NOT my opinion!) that new URL schemes are bound neither to
the %XX syntax nor to the significance of the /, ?, ;, and =
characters.
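A rough sketch of that restriction: the proxy rewrites only for schemes it knows use %XX escaping and forwards anything else untouched. PERCENT_SCHEMES, proxy_rewrite, and escape_tilde are hypothetical names, and leaving gopher: out of the set is an assumption based on the uncertainty above:

```python
from typing import Callable
from urllib.parse import urlsplit

# Schemes assumed to use %XX escaping. gopher: is deliberately left out
# because its syntax is uncertain; this set is an assumption.
PERCENT_SCHEMES = {"http", "ftp"}


def proxy_rewrite(url: str, rewrite: Callable[[str], str]) -> str:
    """Apply `rewrite` only when the URL's scheme is known to use %XX escapes."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in PERCENT_SCHEMES:
        return rewrite(url)
    return url  # unknown scheme: forward the URL exactly as received


def escape_tilde(url: str) -> str:
    """Hypothetical stand-in for a real re-encoder: escape only '~'."""
    return url.replace("~", "%7E")


print(proxy_rewrite("http://www.hal.com/~connolly/", escape_tilde))
# http://www.hal.com/%7Econnolly/
print(proxy_rewrite("gopher://gopher.example.org/1~stuff", escape_tilde))
# gopher://gopher.example.org/1~stuff
```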

It is also safe to change:

GET http://www.hal.com/%7E%63onnolly/index.html HTTP/1.0
to
GET http://www.hal.com/%7Econnolly/index.html HTTP/1.0
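One way to sketch this normalization, assuming only escapes of letters, digits, and -_. are decoded, while every other escape (including %7E and any escaped reserved character) is kept and its hex digits are upper-cased. UNRESERVED and normalize_escapes are hypothetical names:

```python
import re

# Escapes of these characters are assumed harmless to decode. Reserved
# characters such as / ? ; = are deliberately absent, and so is "~",
# which is treated here as invalid in a URI.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_."
)

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(path: str) -> str:
    """Decode needless escapes; keep all others, with uppercase hex."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        if ch in UNRESERVED:
            return ch                    # e.g. %63 -> "c"
        return "%" + m.group(1).upper()  # stays escaped, e.g. %2f -> %2F
    return ESCAPE.sub(fix, path)


print(normalize_escapes("/%7E%63onnolly/index.html"))  # /%7Econnolly/index.html
```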

though it is NOT safe to mess with the /, ?, ;, or = characters. For
example, it's not cool to change:

GET http://machost/volume/file%20with%2Fin%20it

to

GET http://machost/volume/file%20with/in%20it

since this changes the meaning of the URL.
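The change in meaning shows up once a server splits the path on a literal / before decoding each piece. That splitting order is an assumption about how such a server (say, one mapping URLs onto Mac file names) would work, and path_segments is a hypothetical helper:

```python
from typing import List
from urllib.parse import unquote


def path_segments(path: str) -> List[str]:
    """Split on literal "/" first, then decode each segment."""
    return [unquote(seg) for seg in path.split("/")]


before = "/volume/file%20with%2Fin%20it"
after = "/volume/file%20with/in%20it"
print(path_segments(before))  # ['', 'volume', 'file with/in it']
print(path_segments(after))   # ['', 'volume', 'file with', 'in it']
```

The first path names a single file in volume; the second names a file called "in it" inside a directory called "file with".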

See the spec for details (sorry... I'm at a vt100 terminal, or
I'd give a machine-readable reference)

Dan