Kernel comparison: Improvements in kernel development from 2.4 to 2.6

The more things change, the more they stay the course

Paul Larson, Software Engineer
Linux Technology Center, IBM

17 Feb 2004

The long-awaited 2.6 kernel is finally here. The IBM Linux Technology Center's Paul Larson takes a look behind the scenes at the tools, tests, and techniques -- from revision control and regression testing to bugtracking and list keeping -- that helped make 2.6 a better kernel than any that have come before it.

In the three years of active development leading up to the recent release of the new 2.6 Linux kernel, some interesting changes took place in the way the Linux kernel is developed and tested. In many ways, the methods used to develop the Linux kernel are much the same today as they were three years ago. However, several key changes have improved overall stability as well as quality.

Source code management

Historically, there never was a formal source code management or revision control system for the Linux kernel. It's true that many developers did their own revision control, but there was no official Linux CVS archive that Linus Torvalds checked code into and others could pull from. This lack of revision control often left gaping holes between releases, where nobody really knew which changes were in, whether they were merged properly, or what new things to expect in the upcoming release. Often, things were broken in ways that could have been avoided had more developers been able to see changes as they were made.

The lack of formal revision control and source code management led many to suggest the use of a product called BitKeeper. BitKeeper is a source control management system that many kernel developers had already been using successfully for their own kernel development work. Shortly after the first 2.5 kernels were released, Linus Torvalds began using BitKeeper on a trial basis to see if it would fit his needs. Today, BitKeeper is used to manage the Linux kernel source code for both the main 2.4 and 2.5 kernels. To most users, who may have little or no concern for kernel development, this may seem insignificant. However, there are several ways that users can benefit from the changes that the use of BitKeeper has brought about in the methods used to develop the Linux kernel.

One of the key benefits that BitKeeper has provided is in merging patches. When multiple patches are applied to the same base of code, and some of those patches affect the same parts, merging problems are to be expected. A good source code management system can do some of the more tedious parts of this automatically, which makes merging patches faster and allows greater throughput for patches going into the kernel. As the community of Linux kernel developers expands, revision control is important for helping keep track of all the changes. Since a single person is responsible for integrating these changes into the main Linux kernel, tools such as BitKeeper are essential to ensure that patches aren't forgotten and are easily merged and managed.

Having a live, central repository for the latest changes to the Linux kernel is invaluable. Every change or patch that is accepted into the kernel is tracked as a changeset. End users and developers can keep their own copy of the source repository and update it at will with the latest changesets using a simple command. For developers, this means the ability to always be working with the latest copy of the code. Testers can use these logical changesets to determine which change caused a problem, shortening the time needed for debugging. Even end users who want to use the latest kernels can benefit from a live, central repository directly, since they now have the ability to update as soon as a feature or bugfix they need goes into the kernel. Any user can also provide immediate feedback and bug reports on code as it is being merged into the kernel.
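
In practice, keeping a copy of the repository current took only a couple of commands. The following is a minimal sketch, assuming the BitKeeper command-line client and the public bkbits.net mirror address that kernel developers commonly used at the time:

    bk clone bk://linux.bkbits.net/linux-2.5 linux-2.5   # one-time copy of the repository
    cd linux-2.5
    bk pull                                              # fetch and merge the newest changesets
    bk changes | less                                    # browse the changesets that just came in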

Parallel development

As the Linux kernel has grown, become more complex, and gained the attention of more developers who tend to specialize in particular aspects of the kernel, another interesting change has come about in the methods used to develop Linux. During the development of the 2.3 kernel version, there were a few other kernel trees besides the main one released by Linus Torvalds.

During the course of development of 2.5, there was an explosion of kernel trees. Some of this parallel development was made possible by source code management tools, which can keep parallel lines of development synchronized. Some of it was necessary so that others could test large changes before they were accepted. Kernel maintainers kept their own trees focused on specific components and goals, such as memory management, NUMA features, scalability improvements, and architecture-specific code, and some trees simply collected and tracked lots of small bug fixes.

Figure 1. The Linux 2.5 development tree

The advantage of this parallel development model is that it gives developers of large changes, or of large numbers of similar changes directed toward a particular goal, the freedom to develop in a controlled environment without affecting the stability of the kernel for everyone else. When developers are ready, they can release patches against the current version of the Linux kernel that implement all of the changes they have made so far. Testers in the community can then easily test those changes and provide feedback. As pieces are proven to be stable, those pieces can be merged into the main Linux kernel individually, or even all at once.

Testing in the Bazaar

Historically, the approach to testing the Linux kernel has centered around the open source development model. Since the code is open to review by other developers as soon as it is released, there was never a formal verification cycle performed as is common in other forms of software development. The philosophy behind this approach, called "Linus's Law" in "The Cathedral and the Bazaar" (please see Resources for a reference to that work), is "Given enough eyeballs, all bugs are shallow." In other words, heavy peer review should catch most of the really large problems.

In reality though, the kernel has many complex interactions. Even with abundant peer review, many serious bugs can slip through. Additionally, end users can, and often do, download and use the latest kernels as they are released. At the time 2.4.0 was released, many in the community were calling for a more organized testing effort to complement the strengths of ad-hoc testing and code review. Organized testing includes the use of test plans, repeatability in the testing process, and the like. The use of all three methods leads to better code quality than the original two methods alone.

Linux Test Project

One of the first efforts to bring organized testing to Linux was the Linux Test Project (LTP). This project is aimed at improving the quality of Linux through more organized testing methods. Part of this effort includes the development of automated test suites. The main test suite developed by the LTP is also called the Linux Test Project. At the time the 2.4.0 kernel was released, the LTP test suite had only around 100 tests. As Linux grew and matured through the 2.4 and 2.5 kernels, the LTP test suite grew and matured as well. Today, the Linux Test Project contains well over 2000 tests, and the number of tests is still growing!
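
Running the suite has always come down to a few commands. The sequence below is a rough sketch based on the LTP snapshots of that era; the tarball name and the runalltests.sh wrapper script are assumptions and may differ between releases:

    tar xzf ltp-full.tgz        # unpack an LTP snapshot (name is illustrative)
    cd ltp-full
    make                        # build the individual test cases
    ./runalltests.sh            # run the full suite and report pass/fail results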

Code coverage analysis

New tools are now being used that instrument the kernel in such a way that code coverage analysis can be performed. Coverage analysis tells us which lines of code in the kernel are executed while a given test is running. More importantly, coverage analysis exposes which areas of the kernel are not being tested at all. This data is important because it shows which new tests should be written to test those areas of the kernel, leading to a kernel that is more thoroughly tested.
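
The mechanics are the same ones gcov provides for ordinary user-space code, as the sketch below shows; coverage of the kernel itself in the 2.5 time frame relied on a separate kernel gcov patch, which is not shown here:

    gcc -fprofile-arcs -ftest-coverage -o demo demo.c   # compile with coverage instrumentation
    ./demo                                              # run the workload; execution counts are recorded
    gcov demo.c                                         # writes demo.c.gcov with per-line hit counts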

Nightly kernel regression testing

During the 2.5 development cycle, another project undertaken by the Linux Test Project involved using the LTP test suite to perform nightly regression testing of the Linux kernel. The use of BitKeeper created a live, central repository for pulling snapshots of the Linux kernel at any time. Before the use of BitKeeper and snapshots came about, testers had to wait for releases before testing could begin. Now, testers can test the changes as they are being made.

Another advantage of using automation tools to perform regression tests nightly is that fewer changes have been introduced since the last test. If a new regression bug is found, it is often easy to detect which change is likely to have caused it.

Also, since the change is very recent, it is still fresh in the minds of the developers -- hopefully making it easier for them to remember and fix the relevant code. Perhaps there should be a corollary to Linus's Law stating that some bugs are shallower than others, because those are exactly the ones that nightly kernel regression testing weeds out. The ability to do this daily, during the development cycle and before actual releases are made, enables the testers who only look at full releases to spend their eyeball time only on more serious and time-consuming bugs.
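
A nightly run of this kind reduces to a very small script. The sketch below is purely illustrative: the paths, the runalltests.sh wrapper, and the FAIL marker in the log are assumptions, and installing and booting the freshly built kernel on the test machine is left out entirely:

    #!/bin/sh
    # Hypothetical nightly regression wrapper -- not the actual LTP harness.
    STAMP=$(date +%Y%m%d)
    cd /usr/src/linux-2.5 || exit 1
    bk pull                                                # pick up the changesets merged since last night
    make -j4 bzImage modules > /tmp/build-$STAMP.log 2>&1 || exit 1
    # ...install the new kernel and reboot the test machine here...
    cd /usr/src/ltp || exit 1
    ./runalltests.sh > /tmp/ltp-$STAMP.log 2>&1
    grep -c FAIL /tmp/ltp-$STAMP.log >> /tmp/regression-history.txt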

Scalable Test Platform

Another group called the Open Source Development Labs (OSDL) has also made significant contributions to Linux testing. Some time after the 2.4 kernel had been released, the OSDL created a system called the Scalable Test Platform (STP). The STP is an automated test platform that allows developers and testers to run tests made available through the system on hardware at OSDL. Developers can even test their own patches against kernels using this system. The Scalable Test Platform simplifies the testing process, since STP takes care of building the kernel, setting up the test, running the test, and gathering results. Results are then archived for future comparisons. Another benefit of this system is that it provides access to large machines, such as SMP systems with 8 processors, that many people would otherwise never have. Through STP, anyone can run tests on large systems such as these.


Kernel version history

Many of us are familiar with the Linux kernel version numbering system by now, but Andries Brouwer reminds us of how atypical it really is [ http://www.win.tue.nl/~aeb/linux/lk/lk-2.html#ss2.1 ].

The first public release of Linux was version 0.02 [ http://linux-bangalore.org/articles/bday.php ] in October 1991. Two months later, in December 1991, Linus released version 0.11, the first stand-alone kernel capable of operating without Minix.

After the release of 0.12 one month later, the version number jumped to 0.95 in March as a reflection of the system's growing maturity. Nonetheless, the 1.0.0 milestone didn't come until two years later, in March 1994.

The chronology of the two "streams" of kernel development dates from about this time. Even-numbered kernels (such as 1.0, 2.2, 2.4, and now 2.6) are stable, "production" models. Meanwhile, the odd-numbered kernel versions (1.1, 2.3) are cutting-edge or "development" kernels. Until recently, work on a new development kernel followed the release of a stable kernel by only a few months. However, work on 2.5 started some ten months after 2.4 was finished.

So when can we expect kernel 2.7? It's hard to say, but there is already a thread to discuss it [ http://kerneltrap.org/forum/linux/kernel/2.7 ] at KernelTrap.

Until that happens, you may enjoy reading more about the History of Linux [ http://ragib.hypermart.net/linux/ ] in this article from Ragib Hasan.


Tracking bugs

One of the biggest improvements in organized testing of the Linux kernel that has happened since the release of 2.4 is bug tracking. Historically, bugs found in the Linux kernel were reported to the Linux kernel mailing list, to more component- or architecture-specific mailing lists, or directly to the individual that maintains the section of code where the bug was found. Deficiencies in this system were quickly revealed as the number of people developing and testing Linux increased. In the past, bugs were often missed, forgotten, or ignored unless the person reporting the bug was incredibly persistent.

Now, a bug tracking system has been installed at OSDL (see Resources for a link) for reporting and tracking bugs against the Linux kernel. The system is configured so that the maintainer of a component is notified when a bug against that component has been reported. The maintainer can then either accept and fix the bug, reassign the bug if it turns out to actually be a bug in another part of the kernel, or reject it if it turns out to be something such as a misconfigured system. Bugs reported to a mailing list run the risk of being lost as more and more e-mail pours onto the list. In a bug tracking system, however, there is always a record of every bug and the state it is in.


Volumes of information

In addition to these automated methods of information management, an amazing amount of information was gathered and tracked by various members of the open source community during the development of what would become the 2.6 Linux kernel.

For instance, a status list was created at the Kernel Newbies site to keep track of new kernel features that had been proposed. The list sorted items by status, noting which kernel an item had been included in if it was complete and how far along it was if it was still incomplete. Many of the items on the list contained links to the project's Web site in the case of larger projects, or to a copy of an e-mail message explaining the feature in the case of smaller items.

The "post-Halloween document," meanwhile, told users what to expect from the upcoming 2.6 kernel (see Resources for a link). The post-Halloween document mostly discussed major changes that users would notice and system utilities that would need to be updated in order to take advantage of them. Linux distributors and even end users wanting an early peek at what would be in the 2.6 kernels were the main audience for this information, which allowed them to determine whether there were programs they should upgrade in order to take advantage of new features.

The Kernel Janitors project kept (and in fact is still keeping) a list of smaller bugs and cleanups that needed to be fixed. Many of these bugs or cleanups are caused by a larger patch going into the kernel that requires changes to many parts of the code, such as something that would affect device drivers. Those who are new to kernel development can work on items from this list, allowing them a chance to benefit the community while learning how to write kernel code on smaller projects.

In yet another pre-release project, John Cherry tracked the number of errors and warnings found during the kernel compile for every version of the kernel that was released. These compile statistics consistently dropped over time, and releasing these results in a systematic way made it obvious how much progress was being made. In many cases, some of these warnings and error messages could be used in the same way the Kernel Janitors list is used, as compile warnings are often attributable to minor bugs that require little effort to fix.
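
Counting warnings in this way takes only a pipeline or two; the commands below illustrate the idea and are not John Cherry's actual scripts:

    make clean
    make -j4 bzImage modules 2>&1 | tee build.log   # capture everything the build prints
    grep -c 'warning:' build.log                    # number of compiler warnings
    grep -c 'error:' build.log                      # number of compiler errors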

Finally, there was Andrew Morton's "must-fix" list. Since he had been chosen as the maintainer of the 2.6 kernel after its release, he exercised his prerogative to outline those problems he believed to be the highest priority for resolution before the release of the final 2.6 kernel. The must-fix list contained references to bugs in the kernel Bugzilla system, features that needed to be finished, and other known issues that many felt should block the release of 2.6 until resolved. This information helped to set the roadmap for what steps needed to be taken before the new release was made; it also provided valuable information to those who were curious about how close the much-anticipated 2.6 release was to being made.

Some of these resources have obviously ceased to be maintained since the release of the 2.6 kernel late last year. Others have found that their work has not ended after that major release, and continue to post updates. It will be interesting to see which are picked up again, and what additional innovations are made, once we again approach a major release.

Conclusion

When most people think about a new stable version of the kernel, the first question is usually, "What's new in this release?" Below the surface of features and fixes, though, there is a process that continues to be refined over time.

Open source development is thriving in the Linux community. The looseness of the confederacy of coders who work on the kernel and other aspects of Linux allows the group to adapt successfully. In many ways, the way that Linux is developed and tested -- and specifically, the way this has evolved over time -- has had more impact on the reliability of the new kernel than many of the individual enhancements and bug fixes have had.

Resources

About the author

Paul Larson works on the Linux Test team in the Linux Technology Center at IBM. Some of the projects he has been working on over the past year include the Linux Test Project, 2.5/2.6 kernel stabilization, and kernel code coverage analysis. He can be reached at pl@us.ibm.com.

Copyright 2004