Get the right middleware

By Patrick Collins
Herald webmaster

December 2, 1997

When The Sydney Morning Herald first went online in April 1995, creating an edition each night was a laborious procedure of copying and pasting, plus automatic formatting with Microsoft Word macros. I hear you chuckle, but this was the first step toward creating our own Web publishing system. The process taught us the foundations for building a template system (the Story Processor), which allows us to create entire news sections without having to labour over HTML page by page.

Although our software is custom-built, many large Web sites on the Internet are built by a similar process. They store their content as raw text in a database of some description. The database is then used to "publish" entire sections of the site by pouring that content into HTML templates. This allows the Web masters to change the look and feel of many pages by simply changing one HTML template and then re-publishing. The result is a professional-looking Web site.
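To make the "database publish" idea concrete, here is a minimal sketch in Python. It is not the Story Processor: the template, the story fields (slug, headline, body) and the function names are all assumptions made for illustration, and a plain list of dictionaries stands in for the database.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; the $-placeholders are filled in per story.
PAGE_TEMPLATE = Template(
    "<html><head><title>$headline</title></head>\n"
    "<body><h1>$headline</h1>\n<p>$body</p></body></html>\n"
)


def render_story(story, template=PAGE_TEMPLATE):
    """Pour one raw-text story into the HTML template."""
    return template.substitute(
        headline=html.escape(story["headline"]),
        body=html.escape(story["body"]),
    )


def publish_section(stories, out_dir, template=PAGE_TEMPLATE):
    """Write one static HTML file per story and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for story in stories:
        path = out_dir / f"{story['slug']}.html"
        path.write_text(render_story(story, template), encoding="utf-8")
        written.append(path)
    return written
```

Changing the look of every page in the section is then a matter of passing a different template to publish_section and running it again.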

The problem with this "database publish" approach to Web publishing is that the content ends up as static HTML files on your disk, identical for every visitor. Many Web masters want a site which can modify itself based on user preferences or browser type.

The next school of thought, which really started to catch on in early 1996, was "dynamic pages" or "data-driven content". With this approach, when a user selects a page, software builds the HTML page on the fly from content and templates stored in a database. The pages never actually exist on disk and can be heavily customised based on user preferences or browser type.
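A rough sketch of the dynamic approach follows, again in Python and again under stated assumptions: an in-memory SQLite database stands in for the content store, and the table layout, the two templates and the crude browser test on the User-Agent string are invented for the example rather than taken from any product.

```python
import html
import sqlite3
from string import Template

# Hypothetical templates, chosen per request according to browser type.
TEMPLATES = {
    "rich": Template(
        '<html><body bgcolor="$colour"><h1>$headline</h1>'
        "<p>$body</p></body></html>"
    ),
    "plain": Template(
        "<html><body><h2>$headline</h2><p>$body</p></body></html>"
    ),
}


def open_content_db():
    """Create a throwaway in-memory content database with one story."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stories (slug TEXT PRIMARY KEY, headline TEXT, body TEXT)"
    )
    db.execute(
        "INSERT INTO stories VALUES (?, ?, ?)",
        ("olympics", "Sydney prepares", "Raw story text."),
    )
    db.commit()
    return db


def build_page(db, slug, user_agent, colour="white"):
    """Build the page for one request; nothing is written to disk."""
    row = db.execute(
        "SELECT headline, body FROM stories WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None  # the caller would turn this into a 404
    # Crude, illustrative browser check only.
    kind = "rich" if user_agent.startswith("Mozilla/") else "plain"
    return TEMPLATES[kind].substitute(
        headline=html.escape(row[0]),
        body=html.escape(row[1]),
        colour=html.escape(colour),
    )
```

Note that every call to build_page runs a database query, which is exactly the cost that matters once traffic grows.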

There are myriad tools available on the market, all of whose makers promise perfect dynamic Web publishing. Commonly called middleware, their chief responsibility is to act as the glue between the Web server and the database.

Middleware aims to facilitate HTML creation and database communication with the minimum of programming and fuss. Most commercial middleware will utilise your Web server's native API rather than relying on CGI. Some of the more common middleware products are Cold Fusion by Allaire Corp, ASP (Active Server Pages) from Microsoft, and Netscape's Server-Side JavaScript.

The fundamental problem with using middleware products to do your Web publishing is speed and scalability: every page request on your site means building a page from the database. It may work in a theoretical world where marketing guff reigns supreme and buzz words can power any Web site. In practice, however, the data-driven approach to Web publishing simply isn't feasible for growing sites.

You may hear a lot about the problem of scalability and choose to ignore the issues. But let me assure you, it is a very real problem that you should consider seriously when deciding what technology to use, particularly if you've decided to go down the NT path.

The usage trends for the Internet show growth at an average rate of 50 per cent every six months (based on Telstra bandwidth figures). If your server is to withstand the growth over the next year you will need a solution which can scale to handle massive numbers of requests (take into account the extra usage your site will get during the Olympics).

If the Herald Web site were to embrace a data-driven approach, we would need middleware that could query a database and build pages at a sustained rate of 40 pages per second. Most middleware vendors (including database vendors) gulp when we ask them if their solutions could power our Web site (any takers?).
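The arithmetic behind those two figures is worth spelling out. The sketch below assumes, purely for illustration, that page requests grow at the same 50 per cent every six months as the bandwidth figures and that the growth compounds; the function names are made up for the example.

```python
def projected_rate(current_rate, growth_per_period=0.5, periods=2):
    """Compound a request rate over a number of six-month periods."""
    return current_rate * (1 + growth_per_period) ** periods


def time_budget_ms(pages_per_second):
    """Milliseconds available per page if pages are built one at a time."""
    return 1000 / pages_per_second


# 40 pages per second today, two six-month periods ahead.
print(projected_rate(40))   # 90.0 pages per second after a year
print(time_budget_ms(40))   # 25.0 ms to query and build each page
```

In other words, 50 per cent twice over is not 100 per cent but 125 per cent: a site that needs 40 pages per second now would need about 90 a year on, on these assumptions.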

The optimal solution for Web publishing lies somewhere in the middle of the two approaches. The content still needs to be stored in a database with dynamic templating facilities. But the middleware needs to be able to cache the pages it builds to disk, so that repeat requests do not make redundant calls to the database.
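One way to picture this middle road is a small disk cache that sits in front of the page builder. The Python sketch below is an assumption-laden illustration, not any vendor's design: the class name, the choice of a SHA-1 hash of the request key as the file name, and the explicit invalidate call are all invented here.

```python
import hashlib
from pathlib import Path


class DiskPageCache:
    """Cache built pages as files so repeat requests skip the database."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any request string maps to a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get_or_build(self, key, build):
        """Return the cached page for key, calling build() only on a miss."""
        path = self._path(key)
        if path.exists():
            return path.read_text(encoding="utf-8")
        page = build()  # the expensive database-and-template step
        path.write_text(page, encoding="utf-8")
        return page

    def invalidate(self, key):
        """Drop a cached page, for example after the story is edited."""
        self._path(key).unlink(missing_ok=True)
```

The first request for a page pays for the database query; every later request is served from disk until the page is invalidated.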

Many software companies are turning their attention to database caching technologies, including Kiva's Enterprise Server [ http://www.kivasoft.com/ ] and Vignette's StoryServer [ http://www.vignette.com/ ]. StoryServer is a high-end middleware product targeted directly at large sites. Developed by C/net for http://www.cnet.com/, it has an elegant and efficient component-based approach to page building. Page components such as navigation bars, advertising banners and content sections are all cached to disk separately and assembled on demand for each user.
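The component idea can be sketched in a few lines of Python. This is not StoryServer's API, which is not described here; the class, the fragment file naming and the three builder functions are hypothetical, and the variant names are assumed to be safe to use in file names.

```python
from pathlib import Path


class ComponentCache:
    """Cache each page component as its own file on disk."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def fragment(self, name, variant, build):
        """Return one cached component, building it only the first time."""
        path = self.cache_dir / f"{name}-{variant}.frag"
        if not path.exists():
            path.write_text(build(), encoding="utf-8")
        return path.read_text(encoding="utf-8")


def assemble_page(cache, builders, section, ad_group):
    """Put a page together from separately cached components."""
    parts = [
        cache.fragment("nav", "site", builders["nav"]),
        cache.fragment("banner", ad_group, lambda: builders["banner"](ad_group)),
        cache.fragment("content", section, lambda: builders["content"](section)),
    ]
    return "\n".join(parts)
```

Two readers of the same section who are shown different advertising share the cached navigation bar and content; only the banner differs, so only the banner has to be built a second time.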

Eventually Web servers will be distributed with all of these features built in. Next week we'll discuss the Web publishing tools you can use today with a vision for the future and an eye on your wallet.

Copyright 1997