BGI-spec 1.4

Simon E Spero (ses%tipper@tipper.oit.unc.edu)
Fri, 01 Jul 94 23:18:35 -0400


DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT
DRAFT $Id: BGI-spec,v 1.4 1994/07/02 02:47:01 ses Exp ses $ DRAFT
DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT

Binary Gateway Interface -
An API for dynamically extensible HTTP servers

July 1st 1994

Simon Spero
University of North Carolina at Chapel Hill
ses@unc.edu

Abstract:

Many HTTP servers currently support an interface protocol allowing
them to pass requests on to external scripts. This protocol is known as
CGI. This mechanism is extremely flexible, but is unsuited to
high performance applications. In this paper we discuss an alternative
approach to server extensibility and propose an alternative interface
protocol based on dynamically linked functions. We compare the two
approaches and indicate some of the advantages and disadvantages of
each.

Introduction.
-------------
The Common Gateway Interface (CGI)[McCool 93] is a standard way of
allowing the manager of an information server to add extra functionality to a
server without needing to modify the HTTP server itself. This functionality
is achieved by starting an external gateway process, and passing messages to
and from that process. CGI is not specific to the HTTP protocol.

CGI communicates with the gateway process through a number of different
mechanisms. Information about the request is passed through about 20
environment variables. Information about queries is also passed via the
command line. For requests that contain information in addition to the HTTP
header, the additional data will be made available on standard input.

The gateway script responds by sending the result to standard output.
Normally the output is parsed by the server and then passed on to the
client. For efficiency, if a script name begins with the magic string
"nph-", the output is not parsed, and may be sent directly to the client.
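
As an illustration of the mechanisms described above, a minimal CGI gateway
written in C might look like the sketch below. REQUEST_METHOD and
QUERY_STRING are standard CGI variables; the function name cgi_respond is
hypothetical, and a real script would call it from main() with stdout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a CGI response describing the request.
 * A real gateway would call cgi_respond(stdout) from main(). */
int cgi_respond(FILE *out)
{
    /* The server passes request details in environment variables. */
    const char *method = getenv("REQUEST_METHOD");
    const char *query  = getenv("QUERY_STRING");

    /* A parsed (non-"nph-") script emits a header block, then a blank
     * line, then the body; the server parses this before replying. */
    fputs("Content-type: text/plain\n\n", out);
    fprintf(out, "method=%s\n", method ? method : "(unset)");
    fprintf(out, "query=%s\n", query ? query : "(unset)");
    return ferror(out) ? -1 : 0;
}
```

Every request handled this way costs a new process, which is the overhead
discussed next.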

This system is extremely flexible; however the design is not suitable for
use in high performance servers. There are several reasons for this. The
first problem is the processing overhead caused by the creation of an
extra process to handle each request.

Secondly, the server is required to process any and all HTTP headers,
and to generate an environment variable for each of them before
passing the request on to the gateway. Most of these headers will not
be needed by the gateway module.

Thirdly, unless the "nph-" escape hatch is used, the server must read and
parse the results of the gatewayed operation before sending them on to the
client.


A Binary Gateway Interface
--------------------------
An alternative way of extending the functionality of a server is to make
use of the dynamic linking facilities available under most modern operating
systems. If a standard set of function calls for handling requests is
defined, then extended operations can be handled as cheaply as standard ones.

Design Goals
------------
The design presented in the following section is intended to meet several
design goals.

1) Fast. Extensions should be able to run as fast as
built in functions.

2) Lazy. Headers should not be parsed or evaluated unless
absolutely necessary.

3) Portable. Gateways developed for one operating system should
be usable on another system without requiring
extensive modifications.

4) Simple. The gateway author should not spend more time
working on the interface code than she does on
the actual gateway.

BGI design
-----------
The design is somewhat inspired by the Plan 9 file system, and to a lesser
extent, the extension system used for the System V.4 name resolution library.

The BGI model is based on the model of a hierarchical name space. Specialised
handlers can be mounted at any point in the name space; these handlers will
be responsible for handling any requests that lie beneath their mount points,
unless a more specific handler is mounted below them.

Servers do not need to use this model internally; however BGI handlers do
need to be told where they are mounted so that they can determine how much
prefix to remove from a URL.

Example: Suppose we have a namespace with the following handlers
mounted at the indicated points.

Mount point                 Handler
--------------------------------------------------
/                           file_handler
/image-maps                 map_handler
/pictures                   picture_handler
/pictures/office-scene      videopix_handler
/cgibin                     cgi_handler
/search-me                  wais_handler

A request for "/pictures/simon.gif" would be handled by picture_handler, as
would a request for "/pictures/simon.jpeg". However, a request for
"/pictures/office-scene" would invoke the videopix_handler.
Asking for "/picture", which lies beneath no mount point other than "/",
would invoke the file_handler.
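
The lookup in this example amounts to choosing the longest mount point that
covers the URL. The sketch below is one possible way to do that, not part of
the interface: struct mount, covers() and resolve() are hypothetical names,
and matching on path-component boundaries is an assumption drawn from the
"/picture" case above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mount table entry: a mount point and its handler's name. */
struct mount {
    const char *point;
    const char *handler;
};

/* Return 1 if `point` is a prefix of `url` ending on a path-component
 * boundary, so "/pictures" covers "/pictures/a.gif" but not "/picture"
 * or "/picturesque" (the boundary rule is an assumption). */
static int covers(const char *point, const char *url)
{
    size_t n = strlen(point);
    if (strncmp(point, url, n) != 0)
        return 0;
    if (n > 0 && point[n - 1] == '/')   /* e.g. the root mount "/" */
        return 1;
    return url[n] == '\0' || url[n] == '/';
}

/* Pick the most specific (longest) mount point covering `url`.
 * Returns NULL if nothing covers it. */
const char *resolve(const struct mount *table, size_t count, const char *url)
{
    const char *best = NULL;
    size_t best_len = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        size_t n = strlen(table[i].point);
        if (covers(table[i].point, url) && (best == NULL || n > best_len)) {
            best = table[i].handler;
            best_len = n;
        }
    }
    return best;
}
```

A linear scan is enough for a handful of mount points; a server with many
mounts might prefer a tree keyed on path components.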

BGI handlers are compiled object code modules containing three functions
which are used to mount and unmount handlers, and to handle incoming requests.

Handler Methods
---------------

Init

void* <module>_init(char* mount_point,char* args)

This function is used to initialise a handler for attachment to a point in
the namespace. The value returned should either be 0, indicating that a problem
occurred, or a cookie which will be passed to the handler function.

Unmount

int <module>_umount(char* mount_point, void* cookie)

This function should remove the handler from the indicated mount point,
and free up any memory allocated for the cookie.
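
As a sketch of how the two entry points pair up, consider a hypothetical
module named "hello". The state structure, its fields, and the return value
of hello_umount are assumptions; the draft only requires that init return 0
or a cookie and that umount free whatever the cookie owns.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-mount state for a module named "hello". */
struct hello_state {
    char  *mount_point;   /* private copy of the mount point */
    size_t prefix_len;    /* bytes of URL prefix the handler may strip */
};

/* Returns a cookie on success, or 0 if the state cannot be set up. */
void *hello_init(char *mount_point, char *args)
{
    struct hello_state *st;
    (void)args;                      /* this sketch takes no arguments */
    if (mount_point == NULL)
        return 0;
    st = malloc(sizeof *st);
    if (st == NULL)
        return 0;
    st->prefix_len = strlen(mount_point);
    st->mount_point = malloc(st->prefix_len + 1);
    if (st->mount_point == NULL) {
        free(st);
        return 0;
    }
    memcpy(st->mount_point, mount_point, st->prefix_len + 1);
    return st;
}

/* Frees everything hello_init allocated. The return convention is not
 * defined by the draft; 0 for success is an assumption. */
int hello_umount(char *mount_point, void *cookie)
{
    struct hello_state *st = cookie;
    (void)mount_point;
    if (st == NULL)
        return -1;
    free(st->mount_point);
    free(st);
    return 0;
}
```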

Handler

int <module>_handler(void* cookie, char* method, char* url, char* version,
sock_buf* buf)

This function handles all requests on this mount point.

Arguments:

cookie: This is the token that was returned by the initialisation
routine.

method: The method that was used to invoke this handler.

url: The URL passed for this request. All hex escapes will be replaced
by the corresponding characters before this routine is called.

version: The version string passed in the request. If no version was passed,
this string will be set to null.

buf: This argument is a container for the socket to use for this request
together with a buffer containing information already read from the
client.

typedef struct _sockbuf {
    char* buffer;       // pointer to start of I/O buffer
    int buf_size;       // total size of this buffer
    char* end_of_data;  // pointer to character after end of valid data
                        // in this buffer.
    char* current_ptr;  // pointer to first available character in buffer
    int sock;           // the socket
} sock_buf;

Result code:

If no errors occur, the handler function should return 0 or 200. If an error
occurs, the handler should return either 0, or a valid HTTP error code. If
a status code other than 200 is returned, the server will generate an
appropriate error message.

Notes:

All handler functions must be re-entrant.
Handler functions should not close the connection themselves.
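
Putting the pieces together, a handler for a hypothetical "hello" module
might look like the sketch below. The write_all helper, the choice of 501 and
500 as error codes, and the treatment of a null version as a request that
expects a bare body are all assumptions rather than requirements of this
draft.

```c
#include <string.h>
#include <unistd.h>

/* sock_buf as described above (typedef name taken from the handler
 * signature). */
typedef struct _sockbuf {
    char *buffer;
    int   buf_size;
    char *end_of_data;
    char *current_ptr;
    int   sock;
} sock_buf;

/* Write all of `s` to `fd`, retrying on short writes. */
static int write_all(int fd, const char *s)
{
    size_t left = strlen(s);
    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n <= 0)
            return -1;
        s += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Hypothetical handler: answers GET with a fixed text body. It keeps no
 * static state, so it is re-entrant, and it leaves the socket open. */
int hello_handler(void *cookie, char *method, char *url, char *version,
                  sock_buf *buf)
{
    (void)cookie;
    (void)url;
    if (strcmp(method, "GET") != 0)
        return 501;                 /* let the server report the error */
    /* A null version is taken to mean a request without an HTTP version
     * (assumption: such clients expect a bare body, no headers). */
    if (version != NULL &&
        write_all(buf->sock, "HTTP/1.0 200 OK\r\n"
                             "Content-Type: text/plain\r\n\r\n") != 0)
        return 500;
    if (write_all(buf->sock, "hello, world\n") != 0)
        return 500;
    return 200;
}
```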


Library functions
-----------------

Server implementors should make the following functions available to gateway
implementors.

---
int handle_url(char* method, char* url, char* version, sock_buf* buf)

Used to handle redirections, so that a handler can simply compute an
alternate url and then have that resolved.
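
A sketch of such a redirection follows. The alias_handler function and the
"/old" and "/new" prefixes are hypothetical, and the handle_url body shown
here is only a stand-in so that the example compiles on its own; in a real
server the function is supplied by the server.

```c
#include <stdio.h>
#include <string.h>

typedef struct _sockbuf sock_buf;   /* opaque here; defined by the server */

/* Stand-in for the server-provided handle_url(), included only so the
 * sketch compiles on its own; it records the URL it was asked to resolve. */
static char last_url[256];
int handle_url(char *method, char *url, char *version, sock_buf *buf)
{
    (void)method; (void)version; (void)buf;
    snprintf(last_url, sizeof last_url, "%s", url);
    return 200;
}

/* Hypothetical handler mounted at "/old": rewrites "/old<rest>" to
 * "/new<rest>" and lets the server resolve the result. */
int alias_handler(void *cookie, char *method, char *url, char *version,
                  sock_buf *buf)
{
    static const char from[] = "/old";
    static const char to[]   = "/new";
    char target[256];
    size_t skip = sizeof from - 1;
    int n;
    (void)cookie;

    if (strncmp(url, from, skip) != 0)
        return 404;                       /* not under our mount point */
    n = snprintf(target, sizeof target, "%s%s", to, url + skip);
    if (n < 0 || (size_t)n >= sizeof target)
        return 500;                       /* rewritten URL would not fit */
    return handle_url(method, target, version, buf);
}
```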

---
int http_error(int socket, int code, char* version)
Generate an error message corresponding to error 'code'

---

STAY TUNED - more functions need documenting.

Comparisons
-----------

BGI offers a much faster alternative to CGI for extending servers; however
there are several disadvantages. The most obvious problem is that BGI itself
uses compiled modules, whereas CGI programs can be written in interpreted
languages. Since a CGI emulation module can be implemented under BGI, this
problem can be circumvented.

Also, since BGI doesn't automatically handle all header processing, if
extensive header processing is needed, this must be handled by the
application. Adding functions to support header manipulation to the support
library would certainly help here.
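
One such support function might look up a single header in the data already
buffered in a sock_buf, leaving everything else unparsed. The sketch below is
an assumption about what such a helper could look like; bgi_find_header is a
hypothetical name and is not part of this draft.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

typedef struct _sockbuf {
    char *buffer;
    int   buf_size;
    char *end_of_data;
    char *current_ptr;
    int   sock;
} sock_buf;

/* Hypothetical support-library function: copy the value of header `name`
 * into `out`. Scans only the bytes already buffered, stops at the blank
 * line that ends the headers, and does not consume any data.
 * Returns 1 if found, 0 otherwise. */
int bgi_find_header(const sock_buf *buf, const char *name,
                    char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = buf->current_ptr;
    const char *end = buf->end_of_data;

    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        const char *line_end;
        if (eol == NULL)
            break;                          /* incomplete line: give up */
        line_end = (eol > p && eol[-1] == '\r') ? eol - 1 : eol;
        if (line_end == p)
            break;                          /* blank line: end of headers */
        if ((size_t)(line_end - p) > name_len && p[name_len] == ':' &&
            strncasecmp(p, name, name_len) == 0) {
            const char *v = p + name_len + 1;
            size_t len;
            while (v < line_end && (*v == ' ' || *v == '\t'))
                v++;                        /* skip leading whitespace */
            len = (size_t)(line_end - v);
            if (out_size == 0)
                return 0;
            if (len >= out_size)
                len = out_size - 1;         /* truncate to fit */
            memcpy(out, v, len);
            out[len] = '\0';
            return 1;
        }
        p = eol + 1;
    }
    return 0;
}
```

Because the scan happens only when a handler asks for a header, it stays
consistent with the "Lazy" design goal.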

Open Issues
------------

1) It might be better to have separate handlers for each method, rather than
having the single handler with its method argument. This would allow
different handlers to manage GET and POST requests. However, this would
complicate the interface, since most handlers would only support a single
method.

Currently, my favoured solution is to go with a single function per
mountpoint, but to then implement a BGI module that dispatches to other BGI
modules based on the method.

2) Adding more functions to the support library will make implementing
gateways easier. I'm open to suggestions.
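
The dispatching module suggested under issue 1 could be sketched roughly as
follows. All of the names here (bgi_handler_fn, method_route, dispatch_state,
dispatch_handler and the two sample sub-handlers) are hypothetical, as is the
choice of 501 when no sub-handler matches.

```c
#include <stddef.h>
#include <string.h>

typedef struct _sockbuf sock_buf;   /* opaque; only passed through */

typedef int (*bgi_handler_fn)(void *cookie, char *method, char *url,
                              char *version, sock_buf *buf);

/* One route: a method name, the sub-handler for it, and that
 * sub-handler's own cookie. */
struct method_route {
    const char    *method;
    bgi_handler_fn handler;
    void          *cookie;
};

/* Cookie for the dispatcher: a table of routes, which a matching init
 * function (not shown) would build from its argument string. */
struct dispatch_state {
    const struct method_route *routes;
    size_t                     count;
};

/* Forward the request to the sub-handler registered for `method`. */
int dispatch_handler(void *cookie, char *method, char *url, char *version,
                     sock_buf *buf)
{
    const struct dispatch_state *st = cookie;
    size_t i;
    for (i = 0; i < st->count; i++) {
        if (strcmp(st->routes[i].method, method) == 0)
            return st->routes[i].handler(st->routes[i].cookie, method,
                                         url, version, buf);
    }
    return 501;   /* no sub-handler for this method */
}

/* Trivial sub-handlers used only to illustrate the dispatch. */
int sample_get(void *c, char *m, char *u, char *v, sock_buf *b)
{
    (void)c; (void)m; (void)u; (void)v; (void)b;
    return 200;
}

int sample_post(void *c, char *m, char *u, char *v, sock_buf *b)
{
    (void)c; (void)m; (void)u; (void)v; (void)b;
    return 403;
}
```

With this arrangement the core interface keeps a single handler function per
mount point, and only sites that need per-method behaviour pay for the extra
indirection.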

References:

[McCool 93] Introduction to CGI, http://hoohoo.ncsa.uiuc.edu/cgi/

#
# $Log: BGI-spec,v $
# Revision 1.4 1994/07/02 02:47:01 ses
# Emphasised that umount should dispose of the cookie
#
# Changed interface handler spec, replacing the integer operation with the
# method string, adding a parameter for version, and bundling all I/O
# parameters into a single structure.
#
# Clarified (hah) semantics of result code from handler function.
#
# Changed interface to handle_url to match changes in handler
#
# Added version to http_error
#
# Revision 1.3 1994/06/28 23:01:33 ses
# cookie is now a void*
#
# Revision 1.2 1994/06/23 21:16:13 ses
# Added argument string to mount entry point
#
# Added buf_valid to handler entry point
#
# Revision 1.1 1994/06/22 16:00:00 ses
# Initial Release