Re: Session-ID proposal

Dave Kristol (dmk@allegra.att.com)
Tue, 8 Aug 95 11:35:57 EDT


Bob Wyman <bobwyman@medio.com> wrote:
>
> re: Dave Kristol's Session-ID proposal
>
> WHEN DOES A SESSION END?
>
> Your draft says that one of the "key points in the session paradigm" is that
> "The session has a beginning and an end." However, in your draft, it isn't
> apparent in all cases that both parties (or all parties if proxies are
> involved) can determine when a session ends.
Agreed. In fact, either the client or the server can end a session.
The client does so by failing to return a Session-ID. Because the
session information is opaque, the client can't really determine when
the server ends (or changes) a session, except perhaps if the server
returns an empty Session-ID or none. In fact, the client is a
relatively passive collaborator in this mechanism.
>
> You insist that "when the client terminates execution, it discards all
> Session-ID information." This, I assume, "ends" the session from the point
> of view of the client. However, the server will never discover that this has
> happened.
I'm also assuming that it's not important that the server know this.
(See below.)
>
> Although you don't specify it clearly, the requirement that a client "must
> return [the Session-ID] to the server for the next transaction by any method
> ..." might be read to imply that the client can end a session by making
> another request without sending the Session-ID of the last response (if non-
> conformance is an implicit "end"). This would require the server to
It isn't non-conformance, really. It grants control to the client to
choose how willing it is to identify itself with a session. There has
been much discussion about the tradeoff between the privacy lost to
"clicktrail" following and the usefulness of stateful sessions. So,
yes, a client can end a session as you say.
> maintain a list of Session-ID's and the client hosts to which they were
> assigned and then do a lookup in this list on *every* request to
> determine if the most recent request should have used a Session-ID. But,
> that won't work for clients that are hidden behind proxies or that come from
> multi-tasking machines since many clients may appear to be using the same
> host name.
I disagree. The whole point of the proposal is that all interesting
state should be embedded in the opaque session information, so that the
server does not have to keep any kind of database to track sessions.
The state bounces back and forth in Session-ID headers.

Firewalls and proxies pose no particular problem, because if your request
initiates a new session, your client gets a new set of state information
that is unique to the request. The server augments it (perhaps) as it
services your request, then returns a revised Session-ID header, which
your client saves and returns the next time you make a request to the
same server.
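
As a rough illustration of state bouncing back and forth, here is a
minimal sketch in Python. The proposal leaves the opaque information
entirely up to the server, so the JSON-plus-base64 encoding, the "cart"
field, and the function names are illustrative assumptions, not part of
the draft.

```python
import base64
import json


def encode_state(state):
    """Pack server state into an opaque Session-ID value (assumed encoding)."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_state(session_id):
    """Recover the state; a missing or empty header starts a new session."""
    if not session_id:
        return {"cart": []}
    raw = base64.urlsafe_b64decode(session_id.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


def handle_request(session_id, item=None):
    """Read the incoming state, augment it, return the revised header value."""
    state = decode_state(session_id)
    if item is not None:
        state["cart"].append(item)
    return encode_state(state)


# Two requests from the same client; the server keeps nothing in between.
first = handle_request(None, "book")
second = handle_request(first, "pen")
```

The value in "second" carries the whole cart, so the server needs no
per-client table to service the next request.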
>
> On the server side, you say that the server may send "...a different, or no
> Session-ID response header" in response to a request which includes a
> Session-ID. Although your draft doesn't explicitly say this, I assume you
> intend that when the client receives a Session-ID response which is
> different from that transmitted in the corresponding request, that the
> client should assume that the old session has ended and a new session may be
> starting. However, it is a method whose utility is limited to only those
> times when the session's end coincides with an opportunity to send a
> response.
Another point of my proposal is that the client does not have to be very
smart. In particular, its notion of session is limited to the requirement
that it return any Session-ID it gets to the server from which it got it.
So the client doesn't know or care whether a session has ended and a new
one has started. Such information would be embodied in the content that
the server returns to the client.
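
To show how little the client has to do, here is a minimal sketch in
Python. The class and method names are hypothetical, and dropping the
stored value when a response carries an empty or absent Session-ID is an
assumption rather than something the draft spells out.

```python
class SimpleClient:
    """A deliberately simple client: it stores and returns Session-IDs."""

    def __init__(self):
        # Opaque values keyed by the server (host, port) they came from.
        self._session_ids = {}

    def request_headers(self, host, port):
        """Headers to send with the next request to this server."""
        value = self._session_ids.get((host, port))
        return {"Session-ID": value} if value else {}

    def handle_response(self, host, port, response_headers):
        """Remember whatever the server returned, without interpreting it."""
        value = response_headers.get("Session-ID")
        if value:
            self._session_ids[(host, port)] = value
        else:
            # Assumption: empty or absent header means nothing to return.
            self._session_ids.pop((host, port), None)


client = SimpleClient()
client.handle_response("shop.example", 80, {"Session-ID": "opaque-value"})
```

The client never looks inside the value; it only remembers which server
it came from.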
>
> The fear, of course, is that since there are a number of scenarios in which
> the server can't discover that a session has ended, that the server will be
> forced to build an ever-growing list of active sessions. This is not good.
> One solution to this problem would be to provide for a session expiration
> date (either absolute or relative to last response) which would give the
> server a mechanism for purging its Session-ID tables. (Note: Netscape's
> "cookie" proposal does something similar although expiring a "cookie" is
> somewhat different than expiring a Session-ID since one would assume that a
> Session-ID has some level of uniqueness -- not addressed in your draft --,
> while there is no such requirement for a cookie.)
Not necessary. As I said, the state is in the opaque session
information and passes back and forth between client and server. If a
given server wants to put an expiration in that information and act on
such an expiration, it is free to do so. Or not.
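
For a server that does want an expiration, a minimal sketch in Python
might look like the following. The "expires" field, the encoding, and
the function names are assumptions made for illustration; the draft
defines none of them.

```python
import base64
import json
import time


def make_session_id(state, lifetime_seconds, now=None):
    """Embed the state and a server-chosen expiry in the opaque value."""
    now = time.time() if now is None else now
    payload = {"state": state, "expires": now + lifetime_seconds}
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def read_session_id(session_id, now=None):
    """Return the embedded state, or None once the expiry has passed."""
    now = time.time() if now is None else now
    raw = base64.urlsafe_b64decode(session_id.encode("ascii"))
    payload = json.loads(raw.decode("utf-8"))
    if now >= payload["expires"]:
        return None
    return payload["state"]
```

The client sees none of this. It returns the value as usual, and the
server alone decides whether the session is still alive.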
>
> WHAT IS A CLIENT?
>
> As far as I know, there is no formal reference model for the Web, thus, it
> is necessary from time to time to ask what people mean when talking about
> specific architectural elements. Seeing your requirement that client
> "discards all Session-ID information" when it terminates, and that the
> "client" must send the Session-ID on the next request I'm worried about
> imprecision in defining the client. For instance, if I have a Web browser
> that allows me to have two open windows (i.e. Netscape), if I get a Session-
> ID as the result of activity in window "A", am I required to send that
> Session-ID with requests generated in window "B"? If I close window "A", do
> I keep the Session-ID or delete it?
Excellent question. I hadn't considered that case. If the two windows
can act wholly independently, I would consider them to behave like two
separate client programs and act accordingly with respect to Session-ID.
>
> WHAT IS A SERVER?
>
> Many of the demands for Session-ids or session-state have been intended to
> allow CGI scripts to distinguish between clients and to maintain client-
> specific state on the server side. This leads to the question (another
> reference model problem): What is the server? Is the session with the HTTP
> server or is it with the CGI script? Your draft indicates that you think the
> server is identified by "server name (IP address) and port combination." It
> seems to me that this means we're going to have to implement some
> potentially complex method for letting CGI scripts know the session ids for
> the clients they speak to. Choices seem to be: 1) a configuration parameter
> that tells the server to always generate Session-ID's when a particular CGI
> script is run. The Session-ID would then be given to the CGI in an
> environment variable or some similar process. 2) An API by which the CGI
> can ask the server to provide a Session-ID and tell the CGI script what it
> is. Additionally, given that you suggest that Session-ID's can be changed or
> eliminated by the server, we'll need a mechanism for CGI scripts and servers
> to negotiate and/or inform each other of these changes.
Given my earlier remarks about how state resides in the opaque
information, it should be clear that the current CGI mechanism is
adequate. A CGI can examine the incoming Session-ID, if any, and can
generate a new one or repeat an old one. Clearly the content of the
opaque information is application-dependent. I won't dispute that
things could get messy. I'm trying to propose enabling technology,
not solve all the problems.
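
A minimal sketch of such a CGI, in Python. CGI hands request headers to
the script as HTTP_* environment variables, but whether a particular
server passes Session-ID through as HTTP_SESSION_ID is an assumption
here, as are the function name and the "visits=1" starting state.

```python
import os


def cgi_session_response(environ, body):
    """Build CGI output that repeats the incoming Session-ID or starts one."""
    # Assumption: the server exposes the request header as HTTP_SESSION_ID.
    incoming = environ.get("HTTP_SESSION_ID", "")
    # Hypothetical application state for a brand-new session.
    outgoing = incoming if incoming else "visits=1"
    headers = [
        "Content-Type: text/plain",
        "Session-ID: " + outgoing,
    ]
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    print(cgi_session_response(os.environ, "hello\n"), end="")
```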

One possible implementation is for the opaque information to refer to a
file or database on the server side. The server should be able to collect
old (database) information and identify stale (opaque) information.
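
A minimal sketch of that variant in Python, with the opaque information
acting as a key into a server-side store. The class, the idle-time rule
for staleness, and the use of random keys are all illustrative
assumptions.

```python
import time
import uuid


class SessionStore:
    """Server-side records, looked up by the key sent as Session-ID."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self._records = {}  # key -> (last_used, data)

    def create(self, data, now=None):
        """Store data and return the key to hand out as the Session-ID."""
        now = time.time() if now is None else now
        key = uuid.uuid4().hex
        self._records[key] = (now, data)
        return key

    def lookup(self, key, now=None):
        """Return the data for a key, or None if unknown or stale."""
        now = time.time() if now is None else now
        record = self._records.get(key)
        if record is None or now - record[0] > self.max_idle_seconds:
            self._records.pop(key, None)
            return None
        self._records[key] = (now, record[1])  # refresh last-used time
        return record[1]

    def collect_stale(self, now=None):
        """Delete records idle for too long; return how many were removed."""
        now = time.time() if now is None else now
        stale = [key for key, (last_used, _) in self._records.items()
                 if now - last_used > self.max_idle_seconds]
        for key in stale:
            del self._records[key]
        return len(stale)
```

A key that arrives after its record has been collected simply looks like
a request with no session.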
>
> Whatever the method of telling the CGI what the Session-ID is, unless the
> specification states that Session-ID's have some sort of uniqueness to them,
> they won't be useful for many of the purposes that CGI scripts would want to
> use them.
>
> My personal preference would be for the CGI to be able to generate its own
> Session-ID which addresses whatever it thinks its operational requirements
> are. Thus, I would argue that the CGI script is the "server", not the HTTP
> daemon.
I'm not sure how to respond to these two paragraphs. I'll hope my
earlier remarks have done so. From a user's perspective, an HTTP
daemon program and a CGI script that it starts are indistinguishable.
>
> ARE WE DEFINING PROTOCOL OR USER INTERFACE?
>
> Your requirement that the client discard information when it "terminates
> execution" might be a good recommendation, however, it is inappropriate as a
> *requirement* of the HTTP Protocol. The protocol should place no constraints
> on program execution models -- only on what data flows over the wire.
I think there are other, similar, examples (If-Modified-Since) where
the protocol specification has to spill over into execution models to
be useful. And again, the discarding of Session-IDs is partly a
privacy matter, as well as a protocol matter.
>
> PROBLEMS WITH CACHING
>
> You state that when a caching proxy gets a Session-ID response header, "it
> must not cache that header as part of its cache state." However, you don't
> prohibit the caching proxy from caching the body of the response. Thus, it
> would appear that a second client could make a request and have that request
> satisfied from cache without ever discovering that Session-ID's were
> available for the document. This would be a particular problem when a
> response came back with both a Session-ID and an Expires: header since the
> cache might decide not to do a HEAD, GET, or conditional GET until after or
> close to the Expire time. It would seem that the cache should remember that
> the Session-ID response header was there (whether or not it caches the
> actual Session-ID) and then always do a conditional GET for the document
> even if the Expires: time hasn't passed. (NOTE: Of course, you could insist
> that the semantics of Expires be changed if found coincident with a Session-
> ID in a response. -- the question is whether the "Expires" header is
> globally meaningful or meaningful only within the context of the session)
Other proposals similar to this one have run into caching problems
because the state is somehow embedded in the document or URL. I wanted
to avoid that. I'm assuming that the resource content is the same,
independent of Session-ID (subject to the same constraints that would
apply if there were no Session-ID). Hence the Session-ID gets passed
through the proxy in both directions, but the resource content itself
can be cached. Given this treatment of Session-ID, it's expected and
desirable for a caching proxy to serve out the same content when
possible.
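
One reading of this treatment, as a minimal sketch in Python. The proxy
class and the origin callable are hypothetical, and the sketch ignores
Expires and every other cache-control detail.

```python
class CachingProxy:
    """Caches response bodies but never the Session-ID header."""

    def __init__(self, origin):
        # origin(url, request_headers) -> (response_headers, body);
        # a stand-in for the real server.
        self._origin = origin
        self._cache = {}  # url -> body only

    def get(self, url, request_headers):
        if url in self._cache:
            # Served from cache: same content, no Session-ID attached.
            return {}, self._cache[url]
        # Cache miss: the client's Session-ID goes to the server unchanged.
        response_headers, body = self._origin(url, request_headers)
        self._cache[url] = body
        # The server's Session-ID goes back to this client only.
        return response_headers, body
```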
>
> WHAT PROBLEMS ARE BEING SOLVED HERE?
>
> The draft doesn't really give any information about the specific uses that
> are expected for the protocol features defined. This makes it hard to
> evaluate. For instance, I can see that what you have defined would be useful
> in tracking "clickstreams." However, the problems mentioned above and others
> not mentioned make it hard to see how this proposal will help with "shopping
> carts" and a variety of other applications identified in the recent www-talk
> discussions on this subject.

Actually, I had "clickstreams" in mind less than "shopping carts" and
library system navigation.

Thanks for your extensive comments. I'll try to tweak the proposal to
improve it.

Dave Kristol