Using open source software to design, develop, and deploy a collaborative Web site

Part 1: Introduction and overview

Level: Intermediate

Alister Lewis-Bowen (alister@us.ibm.com), Senior Software Engineer, IBM
Stephen Evanchik (evanchik@us.ibm.com), Software Engineer, IBM
Louis Weitzman (louisw@us.ibm.com), Senior Software Engineer, IBM

11 Jul 2006

In this series, follow along as the IBM Internet Technology Group team designs, develops, and deploys a closed community Web site using a suite of software that is freely available. The open source community provides various tools that, when plugged together, begin to create a useful development and production environment for complex Web applications. Using these tools as a foundation, we provide a methodology and set of enhancements to help you simplify the production process. Although customization is still necessary, this series shows you the tools and techniques to get relatively complicated Web sites up and running quickly using open source tools, including Drupal, MySQL, PHP, Apache, and Eclipse technologies. In this first article, you'll compare our approach with other software tools available and explore the enhancements we made.

Introduction

Today, Web sites are a critical part of business, and the tools to create and deploy Web sites are becoming more flexible and easier to use. However, the production of complicated Web applications that require more than the standard methods of interaction (such as blogs) is not trivial. Often, each application within an organization can require customization.

In this series we use a fictitious organization, International Business Council (IBC), to show you how to get more from your Web site. IBC connects its employees with external business partners in a collaborative community; however, the existing Web site is not meeting IBC's current business needs, and the site must be redesigned. The new, customized Web site must support document storage, discussion groups, specialized workgroups, conference scheduling, descriptions of scheduled sessions, session expiration, and other features.

The users' role is to enhance the offerings of the company by improving strategic and tactical decisions. The user community is organized around core issues of workgroups. The community meets face-to-face several times a year in a conference setting. At the conference, issues are identified and then resolved between meetings. The Web site is used to provide information about the community's activities, such as conferences, and as a way to track progress about issues raised at the meetings.

The existing Web site is based on a document repository that provides a way to exchange documents and update the members about upcoming events. However, the Web site is inadequate for promoting community interaction, especially in the context of Web 2.0 sites with enhanced capabilities such as Weblogs, discussion groups, RSS feeds, and so on. The team's goal is to encourage the community's interaction and provide a publishing framework to support the community activities.

Issues

Since this is going to be a secure Web site, we need session management to support expiration and acknowledgement of a terms and conditions document before access can be granted. We also want to support a direct manipulation approach to content editing -- if there is an action to take on a piece of content, we want that interaction control to be placed next to the content it affects.
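
The access rule described above can be sketched in a few lines. This is a minimal illustration in Python, not Drupal code (Drupal itself is PHP); the Session class, the check_access function, and the 30-minute idle limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical idle limit; the article does not specify a value.
SESSION_TIMEOUT_SECONDS = 30 * 60


@dataclass
class Session:
    user: str
    last_activity: float          # time of the last request, in seconds
    accepted_terms: bool = False  # has the user acknowledged the terms?


def check_access(session: Session, now: float) -> str:
    """Return 'login', 'terms', or 'granted' for the current request."""
    if now - session.last_activity > SESSION_TIMEOUT_SECONDS:
        return "login"            # session expired: authenticate again
    if not session.accepted_terms:
        return "terms"            # show the terms and conditions first
    session.last_activity = now   # refresh the idle timer on success
    return "granted"
```

In this sketch, expiration is checked before the terms acknowledgement, so an expired session always returns to the login step even if the terms were accepted earlier.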

Other issues with the existing Web site are based on inconsistent visual language, navigation, and information architecture. Content is often hidden within the documents that are placed in the repository; there is no indication of what is in the document until you download and view it. As a result of our analysis, the team decided that the model of a document repository was the wrong model to enforce. We want a content management system to do much more than that.

Design process

Our design process was iterative, moving from analysis to prototyping to evaluation. We involved the end users as early as possible with questionnaires, design alternatives, and prototypes of varying fidelity. We wanted to focus our decisions on both the business goals and the end users' needs. We found that the earlier we involved users, before launching into functional requirements or development, the quicker we could understand what was useful and which decisions were good ones. With little effort we gathered valuable feedback while making users feel they were part of the design process. We also gained a lot of credibility with the user community.

Another component to understanding the design is to understand the information architecture. For example, through the user research, we learned that one of the most important requirements is access to three major areas:

It was also clear from the user feedback that a clean, uncluttered, and simple architecture was required.

Our analysis identified three classes of users (or personas) who come to the site: customers, workgroup leaders, and administrators. These personas informed our design and decision-making process. By learning what information is important and how it relates to the major personas, we were able to define a content architecture: its hierarchy, how it might be presented, and how each class of user might interact with that information. For example, conference information might include agenda or session items, the scheduling of those items, topics of interest that spanned conferences, and action items that were resolved between conferences.

As analysis proceeded, additional features started to emerge. For example, to support an active community we wanted to enable discussions and comments on the content, and support contextual feedback and online collaboration. Another important client requirement that emerged was the use of a unique, yet neutral, brand or visual identity. Because the Web site presents a neutral ground where users from different backgrounds come to collaborate, a strong visual connection to any one company or background could cause unnecessary distraction.

Development environment

To ease our own development process, we wanted to use an existing content management system to help generate a timely solution. Most content management systems could support the basic functions we needed, but there was an obvious need for detailed customization. An out-of-the-box implementation was not going to be sufficient. We wanted to base our development on the significant body of work that has been done in the open source community. The tools we eventually used are from that tradition.

We had several goals for our development environment, such as being able to write and test our code changes independently of the network. When the changes were sufficiently tested, we wanted to share that code with the development team. This iterative development cycle led us to use the remote versioning system CVS, which let us synchronize with our team members and maintain a code base we could all share.

We chose Eclipse to support the project and use all the technologies in an integrated development environment (IDE). Eclipse provides a number of extensions and makes the integration with Concurrent Versions System (CVS) very straightforward. Eclipse perspectives provide several views and editors that support the current activity. In our case, that activity was editing PHP modules and HTML fragments. Eclipse also tracks local changes to your code. So even if you don't check in the files to CVS, you can still recover earlier versions of a file from your local machine. This feature of Eclipse helps ensure that you never lose code again. We created a centralized development and test environment so we could optimize working with the code and other members of the team -- your time should be spent writing and testing code, not managing the files and other resources on your system.

The selection for our content management system had implications for the other tools we'd need to use. In the case of Drupal, this meant PHP, HTML, and Cascading Style Sheets (CSS) for the development of pages and MySQL for the back-end storage.

Requirements

We generated a set of requirements that helped guide us in selecting a content management system. If these features did not exist, we wanted the system to be easily extended to include these functions. The requirements include:

Figure 1 shows a typical page from the final design for this Web site. As we explore the different aspects of the design and implementation, we'll describe the pages in more detail.


Figure 1. Typical page from the IBC Web site

An open source content management solution

There are many ways to manage your Web site content, from the simple Web log (blog) engine system that allows limited content publishing to a full content management system framework to application frameworks, on which you can build your own custom content management system. With the wide array of choices in the open source space, it can be hard to choose the right solution for your needs.

Ultimately, we chose Drupal. However, it is useful to describe the rationale for our decision. The next section describes some of the candidate systems, including: Drupal, Mambo, Typo3, Ruby on Rails, Movable Type, WordPress, and TextPattern.

Comparison of content management solutions

Drupal

"Drupal is software that allows an individual or a community of users to easily publish, manage and organize a great variety of content on a Web site. Tens of thousands of people and organizations have used Drupal to set up scores of different kinds of Web sites, including: Drupal includes features to enable content management systems, blogs, collaborative authoring environments, forums, newsletters, picture galleries, file uploads and download, and much more. Drupal is open source software licensed under the GPL and is maintained and developed by a community of thousands of users and developers. Drupal is free to download and use." (Source: CMS Matrix)

Drupal is a relative youngster compared to other content management systems (CMSs). However, we got the impression the framework was well written, robust, very extensible, and seemed to have a thriving development community that was generating a lot of adoption and support.

As with other CMSs, the framework was very extensible. Many of the features we needed were provided as modules that could easily be snapped into the core functions of our Web site.
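
To make the idea of "snapping in" modules concrete, here is a minimal sketch of a hook-style plug-in registry. It is written in Python for illustration and is not Drupal's API; the names implements, invoke_all, forum_links, and calendar_links are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry: hook name -> functions contributed by modules.
_hooks: Dict[str, List[Callable[[dict], List[str]]]] = defaultdict(list)


def implements(hook: str):
    """Decorator a module uses to plug a function into a named hook."""
    def register(func):
        _hooks[hook].append(func)
        return func
    return register


def invoke_all(hook: str, context: dict) -> List[str]:
    """The core calls every registered implementation and merges results."""
    results: List[str] = []
    for func in _hooks[hook]:
        results.extend(func(context))
    return results


# Two made-up modules, each contributing links for a piece of content.
@implements("links")
def forum_links(context: dict) -> List[str]:
    return [f"Discuss '{context['title']}'"]


@implements("links")
def calendar_links(context: dict) -> List[str]:
    return ["Add to schedule"] if context.get("is_session") else []
```

The core never needs to know which modules exist; adding a feature means registering another function against an existing hook name.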

The framework and templating (theming) system are all written in PHP; there is no separate tag language to be learned. If you need to break out of the framework, it is very easy to do. (Of course, this isn't recommended, but it does offer ultimate flexibility.)

Session management is built into the core functions, which was more than other CMSs provided. This could help us pass some hurdles later on.

Drupal is known for scalability, or ease of growing a Web site from a small set of users to an enterprise level. The framework also has the ability to 'throttle' areas of the site that could cause potential problems during heavy traffic situations.
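
One way to picture throttling is as a rule that drops optional page elements once load passes a threshold. The Python sketch below illustrates that idea only; it assumes a simple count of active users, and the function name, block names, and threshold are hypothetical rather than taken from Drupal.

```python
def visible_blocks(blocks, active_users, threshold):
    """Return the names of blocks to render at the current load.

    Blocks flagged as throttleable are dropped once the number of
    active users reaches the threshold.
    """
    overloaded = active_users >= threshold
    return [b["name"] for b in blocks if not (overloaded and b["throttle"])]


# Made-up page layout: only the main content is always shown.
BLOCKS = [
    {"name": "main content", "throttle": False},
    {"name": "who's online", "throttle": True},
    {"name": "recent comments", "throttle": True},
]
```

With a threshold of 100, a call such as visible_blocks(BLOCKS, 150, 100) keeps only the main content, while the same call with 10 active users keeps all three blocks.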

There is still an apparent learning curve to the "Drupal Way" of creating sites, but significantly less compared with other CMSs. The ability to use PHP to move freely between the business logic layer and the presentation layer (using the PHP template engine) was also very appealing.

We'd heard that Drupal's access control could be more granular, but figured we could deal with that through the framework's extensibility. Drupal 4.7 has just been released with many enhancements over Version 4.6.

Mambo

"Mambo Open Source is one of the finest open source content management systems available today. The default installation of Mambo is easy to set up and easy to maintain. The setup utility uses a 4-step wizard interface that allows you to install the entire system without the need of advanced technical knowledge. Once installed, the system includes a variety of templates that you can choose from and a large number of functions that are ready to go. Content can be added, edited, and manipulated without having to know HTML, XML, or DHTML -- just enter your content using a friendly editor and click Publish. More advanced users are able to control the system to a level that suits their skills. The core files are written in PHP [Mambo is based on Linux™, Apache, MySQL, P for PHP, Perl and Python (LAMP)] and can be modified easily. The system is robust, proven, and backed by a large community of users and professional developers. (As of early 2006, Mambo is five years old!)" (Source: CMS Matrix)

At the time, Mambo was popular and seemed to offer a very easy install and an attractive, easy-to-use administration interface. Usually disregarded, the back end of a CMS becomes very important if you need it to work well for clients who need to administer the Web site after you hand it off.

The easy installation seemed to get us to a point where almost all the function we needed was available and ready to be themed. However, as with many CMSs, the templating is limited to a tag system that leaves you at the mercy of the quality of the markup that is substituted for the tags. This is fine if the markup is valid, semantically structured, and adequately sprinkled with CSS ID and class attributes to aid styling. If it isn't, then you can find yourself delving into the guts of the application to figure out how to correct the generated output.
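
The limitation of tag-based templating can be shown with a small Python sketch. It uses the standard library's string.Template as a stand-in for a CMS tag language; the page template, the render_menu function, and its table markup are hypothetical and are not Mambo's actual output.

```python
from string import Template

# The theme author controls only this outer template...
page = Template('<div id="sidebar">$menu</div>')


# ...while the markup behind each tag comes from the application.
def render_menu(items):
    # Hypothetical generated output: a table layout with no id or
    # class attributes for a stylesheet to target.
    cells = "".join(f"<td>{item}</td>" for item in items)
    return f"<table><tr>{cells}</tr></table>"


html = page.substitute(menu=render_menu(["Home", "Conferences"]))
```

Nothing in the template can change the table markup that replaces the menu tag; fixing it means editing render_menu, which in a real CMS lives inside the application.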

Mambo also offered limited session management, although it was still more than other CMSs offer.

The development path seemed confusing, and the future of this solution was not certain to us. Mambo's development track is divided into several solutions. Miro is a commercial product and Joomla seems to be a new CMS spawned from Mambo. Mambo still exists and its current development path seems to be more stable.

Typo3

"TYPO3 is an enterprise-level open source content management system released under the GPL. It runs on more than 122,000 servers worldwide. The application has been translated into 43 languages and is actively being developed in a community of over 27,000 users in 60 countries. Some of its users include BASF, DaimlerChrysler, EDS, Konika-Minolta, Volkswagen, UNESCO, as well as numerous universities, government agencies, and nonprofit organizations." (Source: CMS Matrix )

Typo3 is big. Big application. Big community. Big adoption. Big list of extended features and contributions. The learning curve is big, too. There is no doubt that Typo3 could do all we required, but there seemed to be other issues in addition to complexity.

The markup generated by a lot of the core and contributed modules used structure from the mid to late '90s, with lots of table layouts, not much use of effectively placed CSS ID and class attributes, and sometimes invalid structure. We wanted to use current best practices to keep our development iterations flexible, so this wasn't going to help timely development. The templating system also seemed very complex compared to other solutions. The time invested in understanding how to theme the Web content outweighed the power of the templating system.

The administrative interface felt awkward and old, especially compared to Mambo. This was an important consideration, because we needed to hand off the eventual administration of the site.

If we'd had more time, Typo3 might have been an option. But it seemed to need a rewrite to stay fresh and competitive with other emerging CMSs. The issue of standards-based XHTML and CSS appears to be addressed in the new version of Typo3.

Ruby on Rails

"Ruby is a pure object-oriented programming language with a super clean syntax that makes programming elegant and fun. Ruby successfully combines Smalltalk's conceptual elegance, Python's ease of use and learning, and Perl's pragmatism. Ruby originated in Japan in the early 1990s and has started to become popular worldwide in the past few years as more English language books and documentation have become available. Rails is an open source Ruby framework for developing database-backed Web applications. Rail's guiding principles: less software and convention over configuration. Less software means you write fewer lines of code to implement your application. Keeping your code small means faster development and fewer bugs, which makes your code easier to understand, maintain, and enhance. You will see how Rails cuts your code burden shortly.
Convention over configuration means an end to verbose XML configuration files -- there aren't any in Rails! Instead of configuration files, a Rails application uses a few simple programming conventions that allow it to figure out everything through reflection and discovery. Your application code and your running database already contain everything that Rails needs to know!" (Source: Rolling with Ruby on Rails)

Typical content management systems allow authenticated users to create content forming pages as part of a Web site. Ruby on Rails (ROR), however, does not provide this out of the box. It provides a Web application framework. Using ROR, you can build a custom CMS from scratch.

At the time of our project, there was a huge buzz around ROR, mainly promoted by 37signals and its impressive array of real online ROR applications such as Basecamp.

We liked the way the bindings to the database through the framework lived up to the hype. These did help speed up those repetitive parts of development that hook to your database table columns.

At the time of our investigation, the framework was somewhat blog centric. Stability was an issue, but we recognized its potential for use on future projects.

Creating a custom CMS has its appeal. For example, with a custom solution we could have created a themable administrative interface. But, given our time restrictions, we needed a CMS framework to build on.

Blog engines

We considered using an existing publishing system that principally supported the creation of blogs. We've previously used such systems to support content for non-blog Web sites by redefining the way categories and data are used. These types of solutions are not aligned to compete with the likes of Drupal, Mambo, and Typo3. For example, session management as provided by Drupal and Typo3 is not typically supported. However, they do provide a very simple and quick way of creating a simple CMS.

Given that these solutions are blog centric, here are some alternatives:

Movable Type

"Movable Type is a powerful and customizable publishing platform allowing users to create attractive, expressive Weblogs within a personal publishing system that is infinitely customizable and versatile. Running as server-based software, Movable Type has been adopted by individuals and corporations who are drawn to its depth of features, open architecture, and robust library of third-party plug-ins designed to extend the system's functionality. Building from the current base of hundreds of plug-ins created by dedicated developers around the world, an entirely new class of applications can be built on top of the familiar and tested Movable Type system." (Source: CMS Matrix)

One of the more popular blog publishing systems at the time, this Perl implementation has a large community of contributors and a good support structure. The immediate roadblock was the creation of a charging structure by the makers, Six Apart, to support the development of its product. Because we were trying to create an open source solution, this nipped the idea of using Movable Type in the bud.

WordPress

"WordPress is a state-of-the-art, semantic personal publishing platform with a focus on aesthetics, Web standards, and usability. What a mouthful. WordPress is both free and priceless at the same time. More simply, WordPress is what you use when you want to work with your blogging software, not fight it. WordPress' default capabilities can be increased many fold (and new functions can be easily added) through its easy-to-use, plug-in architecture." (Source: CMS Matrix)

WordPress was growing in stature at the time of our project. Similar to Drupal, the wiki-style documentation system supporting this solution is useful. The core code is clean and easy to extend, and the user interface is very easy to use.

The templating system is a typical tag-based system, which compared well with other blog publishing platforms. The generated output from the system supported current best practices, and made the development of content layout and accessibility much easier.

One shortfall of WordPress is its lack of caching capabilities, which we thought would limit scalability.

TextPattern

"A free, flexible, elegant, easy-to-use content management system for all kinds of Web sites, even Weblogs. When it comes to publishing on the Internet, beginners and experts alike are met with a bothersome paradox: word processors and graphics applications allow anyone to do a pretty good job of managing text and images on a personal computer, but to make these available to the worldwide Web -- a seemingly similar environment of documents and destinations -- ease of use vanishes behind sudden requirements for multilingual programming skills, proficiency in computer-based graphic design, and, ultimately, the patience of a saint. Those who soldier on anyway may find themselves further held back by the Web's purported inflexibility with written language, with its reluctance to cope with all but the plainest of text, or by the unpredictable results brought about by using WYSIWYG Web editors. TextPattern is a Web application designed to help overcome these and other hurdles to publishing online, and to simplify the production of well-structured, standards-compliant Web pages." (Source: CMS Matrix)

Like WordPress, TextPattern looked like another well-crafted blog publishing system. It has a clean administrative interface and seems easy to use. However, it lacks a lot of the features we were looking for, including session control and caching.

Figure 2 shows some of the software requirements for the products discussed above.


Figure 2. Software requirements for some content management frameworks
 

Decision to use Drupal

Because we needed to make this Web site design easy for ourselves and for anyone adopting the solution, the ease of installing the framework and the time it would take to learn it were key factors. While Ruby on Rails (ROR) was intriguing, we decided that too much time would be spent writing a CMS from scratch, so ROR was dropped from our consideration.

If we were to be able to effectively control access to the information for each persona, having robust and flexible session and user management would make our implementation easier. Of course, the speed of implementation would also be improved by having a robust pluggable infrastructure backed up with a vibrant community contributing quality extensions to the existing framework.
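
Controlling access per persona amounts to mapping each class of user to the actions it may perform. The Python sketch below uses the three personas from our analysis; the permission names and the can function are hypothetical and do not reflect Drupal's own access control.

```python
# Hypothetical permissions for the three personas identified earlier.
PERMISSIONS = {
    "customer": {"view content", "post comment"},
    "workgroup leader": {"view content", "post comment",
                         "edit workgroup content"},
    "administrator": {"view content", "post comment",
                      "edit workgroup content", "administer users"},
}


def can(persona: str, action: str) -> bool:
    """True if the persona is allowed to perform the action."""
    # Unknown personas get no permissions at all.
    return action in PERMISSIONS.get(persona, set())
```

A check such as can("customer", "edit workgroup content") is then a single lookup, and adding a persona or permission does not require touching the code that enforces it.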

Another key aspect was the potential to ramp up the scalability as the number of concurrent users started to increase.

The ease of adjusting the way the content was displayed was crucial; we needed to remain flexible during iterations of the design and any future adjustments. This so-called "themability" was also required for using the current best practices of Web design with respect to semantic XHTML, CSS, and accessible design.

Figure 3 shows a comparison of how the candidates met our requirements. It was obvious that we needed something more than the blog engines could provide.


Figure 3. Rating the candidates based on requirements
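
A comparison of this kind can be kept as a small weighted scoring matrix. The Python sketch below shows the mechanics only: the criteria echo those discussed in this section, but the weights, the two candidates, and their ratings are placeholders and are not the data behind Figure 3.

```python
# Hypothetical weights (importance from 1 to 3) for each criterion.
WEIGHTS = {"ease of install": 2, "session management": 3,
           "extensibility": 3, "scalability": 2, "themability": 3}

# Placeholder ratings (0 to 2) for two made-up candidates.
RATINGS = {
    "Candidate A": {"ease of install": 2, "session management": 0,
                    "extensibility": 1, "scalability": 1, "themability": 1},
    "Candidate B": {"ease of install": 1, "session management": 2,
                    "extensibility": 2, "scalability": 2, "themability": 2},
}


def score(ratings):
    """Weighted sum of a candidate's ratings; missing criteria count as 0."""
    return sum(WEIGHTS[c] * ratings.get(c, 0) for c in WEIGHTS)


# Candidates ordered from best to worst total score.
ranked = sorted(RATINGS, key=lambda name: score(RATINGS[name]), reverse=True)
```

Writing the weights down makes the trade-offs explicit: a candidate that is easy to install can still rank low if it scores poorly on a heavily weighted criterion such as session management.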

Mambo was very appealing from the ease of install and the UI, but the development track at the time was fractured and didn't give us any confidence of support.

Typo3 seemed to have a huge community and the maturity we were looking for. However, the learning curve for using Typo3 is daunting in comparison to Drupal.

We did have to invest some time to learn the Drupal way, but the framework just seemed to make sense. We also felt that Drupal provided the right combination of framework and flexibility to break out of the framework when needed to get the job done. With all things considered, we decided to use Drupal. The landscape of open source CMSs is continuously changing, and in the future we'll revisit these and any new entries in the field.

Drupal in detail

Drupal contains many built-in features and is easily extensible with a vibrant community supporting and adding to the portfolio of additional features. The basic features include:

Extending Drupal

We used Drupal's module framework to add the extended features we needed to support our Web site. The extended features include:

We developed our Web site in Drupal 4.6, but Drupal 4.7 is now available. This series of articles will base its discussion on the 4.7 implementation.

Other content management frameworks

There are many content management frameworks available. You should evaluate each in relation to your own requirements. In this article you learned about the advantages and disadvantages of:

Summary

This article introduced the IBM Internet Technology Group's series about the design, development, and deployment of a collaborative Web site using open source software. It gave you an overview of the project, our requirements, and a comparison of several content management systems we analyzed. We also explained our decision to use Drupal and how we could extend Drupal to meet our objectives.

Stay tuned for the next article, which will describe a flexible design methodology to address the process of designing applications. This process may be used to design a user experience for Web sites or applications. We then jump into the technical aspect of the development process with step-by-step guidelines you can use to install the development tool suite and all the supporting technologies. We'll follow up with discussions of other aspects of customizing the development environment and putting it to work. These topics include:

We intend to help get you up and running as quickly as possible with a robust set of content management tools -- enabling you to efficiently customize your Web applications.

About the authors

Alister Lewis-Bowen is a senior software engineer in IBM's Internet Technology Group. He has worked on Internet and Web technologies as an IBM UK employee since 1993. Alister was brought to the US to work on the Web sites for the IBM-sponsored sports events, then as senior Webmaster for ibm.com. He is currently helping create semantic Web prototypes. Contact Alister at alister@us.ibm.com.

Stephen Evanchik is a software engineer in IBM's Internet Technology Group. He has been a contributor to many open source software projects, the most notable being his IBM TrackPoint driver in the Linux kernel. Stephen is currently working with emerging semantic Web technologies. Contact Stephen at evanchik@us.ibm.com.

Louis Weitzman is a senior software engineer in IBM's Internet Technology Group. For 30 years he has worked at the intersection of design and computation. He helped develop an XML, fragment-based content management system in use by ibm.com, and currently is involved with bringing the design process to emerging projects. Contact Louis at louisw@us.ibm.com.

Copyright 2006