Making Linux a World-Class Enterprise Server OS

Chuck Lever, Netscape Communications Corp.
chuckl@netscape.com

$Id: enterprise.html,v 1.4 1999/11/12 20:12:54 cel Exp $

Abstract

We list unresolved issues whose resolution would help Linux become a world-class enterprise server OS. These items could form part of Netscape's technical agenda for Linux.

Introduction

Linux is an open-source, POSIX-compliant operating system that runs on commodity Intel PC hardware. Linux also happens to be very stable, crashing far less often than other operating systems in its class. Consequently, it is one of the most widely deployed operating systems for network services such as mail and web servers.

Despite its acclaim and its stability, Linux remains a work in progress. Several aspects of Linux can be improved to better position it for enterprise service. In this document, we list the unresolved issues whose resolution would help Linux become a world-class enterprise server OS; these items could form part of Netscape's technical agenda for Linux.

The Issues

The general areas in which we are interested are:

Reliability - improving system recovery mechanisms, backup/restore, and fault tolerance.
Performance - how closely applications running on Linux, especially those that demand high performance, can approach the hardware's peak speed; support for high-performance hardware such as RAID and high-speed networking such as ATM and Gigabit Ethernet.
Scalability - improving system throughput and behavior under overload, relieving architectural constraints, and easing administration of large installations.
Security - hardening the system against network and local attacks, reducing or eliminating the risk of buffer overflows, and continuously testing the security of all bundled applications and utilities.
Standards compliance - network protocol implementations should be well-behaved; useful and common APIs should remain standards-compliant (e.g. POSIX).
Quality Assurance - reducing defect rate and defect re-introduction.

Specific Technical Suggestions

Issue	Explanation
Support for large memory configurations	Proper support for 1 GB to 4 GB of physical RAM (Intel CPU limitations aside, the kernel should automatically do the correct thing within those limits); in the long term, full OS support for 36-bit physical addresses on hardware that supports them (e.g. Intel PAE). This is not an issue for hardware architectures that already support 64-bit addresses.
Improved SMP scalability	The scheduler should scale to large numbers of threads and processes, and system throughput should scale well as CPUs are added.
64-bit file lengths	Kernel-internal interfaces and application programming interfaces need to be ready for 64-bit file lengths, and file system administration and backup tools need to handle them.
SCSI I/O throughput	Improved reliability for tape drives and esoteric devices; support for a wide range of modern SCSI chipsets and devices. SCSI drivers need to support a greater level of concurrent I/O.
TCP throughput and standards compliance	TCP is a complex and ever-evolving protocol, and bugs and performance problems are easily introduced during development. The Linux TCP implementation therefore needs ongoing testing of both its throughput and its compliance with the protocol specifications.
High-performance asynchronous I/O APIs	The current real-time (RT) signal API used to support asynchronous I/O does not integrate well with threads. A queued I/O API, or a kernel-mediated event dispatching system that handles UI events as well as file descriptor wakeups, would be very cool.
Functional and performance validation testing	Regular functional testing and performance validation would help catch significant bugs early, and would also maintain a history of system performance, making gradual performance regressions ("performance creep") easy to catch.
High-performance, reliable file system	File systems for network servers that combine high data resiliency, for example through RAID, with high performance.
System performance monitoring	This includes support for tools like iostat, vtop, and sard.
Fault tolerance	Surviving the loss of a CPU in an SMP configuration; proper reconfiguration after card or memory failures.
Large site administration	Support for secure Java consoles, remote administration tools such as cfengine, and security tools such as Tripwire; coherent, integrated system documentation.
Constraint relief	Support for 32-bit UIDs, large numbers of file descriptors and large fdsets, large numbers of threads and processes, larger shared memory segments and swap areas, greater kernel I/O concurrency, and so on.
Quality assurance and piloting arenas	QA is currently informal; can we demonstrate that a more formal process has a positive effect? What, for example, is Red Hat doing in this regard?
Internationalization	Internationalization of kernel and utilities (I know, it's a difficult and not very sexy job, but somebody's got to do it).
Security, security, security	Eliminating buffer overflows and TCP vulnerabilities; including ssh in popular Linux distributions; providing good security documentation.
High-end networking and disk subsystems	Support for Gigabit Ethernet and ATM; exploring zero-copy I/O; looking at support for RAID and beyond.
Serviceability enhancements	Support for IPMI and other advanced hardware monitoring; improved Oops and dump tracing, and configuration snapshotting, for "first-time capture" of significant system problems.
Constraint relief for large numbers of users	Support for 32-bit UIDs and high-efficiency password lookup; investigate quota support, utmp, utmpx, and wtmp.
Complete support for serial console	Linux has much of this already, but work needs to be done to get server hardware ready to support this.
Support for name service caching	Domain Name Service performance and reliability are critical to network server performance. Solaris, for example, uses nscd to help ensure that DNS, NIS (yp), and NIS+ lookups perform well.
Support for "direct I/O"	For example, locking down user pages and then performing scatter/gather DMA I/O to or from them without an additional copy, or supporting PCI-to-PCI I/O through main memory without CPU intervention; in general, closer to zero-copy behavior across the kernel and device drivers. See IO-Lite or McVoy's splice() paper.
Prioritization of standards compliance	Standards compliance is important, but not to the exclusion of alternatives that can co-exist in the system API and provide better functionality and performance.
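For the "64-bit file lengths" item, here is a minimal sketch of what application readiness looks like, assuming a glibc-based Linux system. Defining _FILE_OFFSET_BITS as 64 before any header makes off_t a 64-bit type and routes lseek() and fstat() to their large-file variants, even on 32-bit Intel hosts. The helper name extend_past() is our own illustration.

```c
/* Must precede every system header (glibc large-file support). */
#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the open file `fd` to just past `target` bytes by
 * seeking there and writing one byte. On most file systems the skipped range
 * becomes a sparse hole, so no disk space is consumed for it.
 * Returns the resulting file size reported by fstat(), or -1 on error. */
off_t extend_past(int fd, off_t target)
{
    struct stat st;

    if (lseek(fd, target, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, "", 1) != 1)      /* writes a single NUL byte */
        return -1;
    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_size;              /* 64 bits wide thanks to the define above */
}
```

Calling extend_past(fd, (off_t)5 << 30) should report a size of 5 GiB plus one byte; with a 32-bit off_t, that offset cannot be represented at all.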
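The RT signal mechanism mentioned under "High-performance asynchronous I/O APIs" works roughly as follows on Linux. In this minimal, Linux-specific sketch (F_SETSIG requires _GNU_SOURCE), a descriptor is marked O_ASYNC, and fcntl(F_SETSIG) asks the kernel to queue a real-time signal that carries the ready descriptor in si_fd. Because the signal is directed at the process rather than at a particular thread, any thread that leaves it unblocked may receive it, which is likely one reason this API fits poorly with threaded servers. The helper names are our own.

```c
/* F_SETSIG is a Linux extension; this must precede every system header. */
#define _GNU_SOURCE

#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to report readiness on `fd` by queuing SIGRTMIN (with the
 * descriptor in si_fd) to this process. The caller must block SIGRTMIN first,
 * so queued notifications wait for wait_for_ready_fd() instead of being
 * handled asynchronously. Returns 0 on success, -1 on error. */
int enable_rt_signal_io(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) == -1)      /* process-directed signal */
        return -1;
    if (fcntl(fd, F_SETSIG, SIGRTMIN) == -1)      /* RT signal instead of SIGIO */
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Dequeue one readiness notification, waiting at most one second.
 * Returns the descriptor that became ready, or -1 on timeout or error. */
int wait_for_ready_fd(void)
{
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { 1, 0 };

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigtimedwait(&set, &info, &timeout) == -1)
        return -1;
    return info.si_fd;
}
```

If the RT signal queue overflows, Linux falls back to delivering a plain SIGIO, so a real server must then rescan all of its descriptors, for example with poll().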
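A simpler kernel-mediated alternative to per-descriptor signals is to ask the kernel, in one call, which descriptors are ready. The sketch below uses POSIX poll() as a stand-in for the kind of event dispatching system the "High-performance asynchronous I/O APIs" item describes; it is not that proposed API. The watch table, the MAX_WATCHES bound, the handler type, and the example handler are all hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: invoked when `fd` becomes readable. */
typedef void (*fd_handler)(int fd, void *ctx);

struct watch {
    int fd;
    fd_handler on_readable;
    void *ctx;
};

#define MAX_WATCHES 64   /* arbitrary bound for this sketch */

/* Wait up to timeout_ms for any watched descriptor to become readable (or hung
 * up), then call that descriptor's handler. Returns the number of handlers
 * called, 0 on timeout, or -1 on error. */
int dispatch_once(const struct watch *w, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHES];
    int ready, called = 0;

    if (n > MAX_WATCHES)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = w[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0)
        return ready;
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            w[i].on_readable(w[i].fd, w[i].ctx);
            called++;
        }
    }
    return called;
}

/* Example handler: drain pending bytes and count calls in *(int *)ctx. */
static void count_readable(int fd, void *ctx)
{
    char buf[256];

    if (read(fd, buf, sizeof buf) >= 0)
        ++*(int *)ctx;
}
```

Each call scans the whole table, so its cost grows with the number of watched descriptors rather than the number that are ready; a queued event API would presumably avoid that linear scan.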
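Under "Constraint relief", a server that needs many descriptors can at least raise its own soft limit up to the administrator-set hard limit with the standard getrlimit() and setrlimit() calls; raising the hard limit itself requires privilege. A minimal sketch, with a helper name of our own:

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE limit to the hard limit. On success stores the
 * new soft limit in *new_limit and returns 0; returns -1 on error. */
int raise_fd_limit(rlim_t *new_limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;       /* unprivileged processes may go up to here */
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *new_limit = rl.rlim_cur;
    return 0;
}
```

A higher limit does not help select(): glibc's fd_set only has room for descriptors below FD_SETSIZE (1024), and passing a larger descriptor to FD_SET() is undefined behavior, so servers that raise this limit need poll() or a similar interface. That is the "fdsets" half of the constraint.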
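For the "Security, security, security" item, many buffer overflows come from unbounded copies such as strcpy() into fixed-size buffers. One common remedy, shown here as a minimal sketch with a helper name of our own, is to copy with snprintf(), which never writes more than the given size, and to treat truncation as an error:

```c
#include <stddef.h>
#include <stdio.h>

/* Copy src into dst (capacity dstsize) without overflowing; dst is always
 * NUL-terminated when dstsize > 0. Returns 0 on success, or -1 if src did not
 * fit (or dstsize is 0), so callers can reject oversized input. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    int n;

    if (dstsize == 0)
        return -1;
    n = snprintf(dst, dstsize, "%s", src);
    return (n < 0 || (size_t)n >= dstsize) ? -1 : 0;
}
```

With an 8-byte buffer, copy_bounded(buf, sizeof buf, "much too long for buf") returns -1 and leaves buf holding the NUL-terminated prefix "much to", where strcpy() would have overrun the buffer.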
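The idea behind name service caching, as with Solaris nscd, is to remember recent answers so that repeated lookups do not each go out to DNS or NIS. The sketch below is a deliberately small direct-mapped cache with a fixed time-to-live; it is not meant to describe nscd's implementation, and the resolver_fn signature, slot count, size limits, and demo resolver are hypothetical. A real resolver callback would wrap something like getaddrinfo() or getpwnam().

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical resolver signature: write the answer for `name` into `out`
 * (at most `outsize` bytes, NUL-terminated) and return 0 on success. */
typedef int (*resolver_fn)(const char *name, char *out, size_t outsize);

#define CACHE_SLOTS 128
#define CACHE_KEY_LEN 64
#define CACHE_VAL_LEN 64

struct cache_entry {
    char name[CACHE_KEY_LEN];
    char value[CACHE_VAL_LEN];
    time_t expires;                /* 0 means the slot is empty */
};

struct name_cache {
    struct cache_entry slots[CACHE_SLOTS];
    time_t ttl;                    /* seconds an answer stays valid */
    resolver_fn resolve;           /* the slow, authoritative lookup */
};

/* djb2 string hash, used to pick a slot. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Answer from the cache when a fresh entry exists; otherwise call the
 * resolver and remember its answer for c->ttl seconds. Returns 0 on success. */
int cached_lookup(struct name_cache *c, const char *name,
                  char *out, size_t outsize)
{
    struct cache_entry *e = &c->slots[hash_name(name) % CACHE_SLOTS];
    time_t now = time(NULL);

    if (strlen(name) >= CACHE_KEY_LEN)          /* too long to cache */
        return c->resolve(name, out, outsize);
    if (e->expires > now && strcmp(e->name, name) == 0) {
        snprintf(out, outsize, "%s", e->value); /* cache hit */
        return 0;
    }
    if (c->resolve(name, e->value, sizeof e->value) != 0) {
        e->expires = 0;                         /* slot contents now invalid */
        return -1;
    }
    snprintf(e->name, sizeof e->name, "%s", name);
    e->expires = now + c->ttl;
    snprintf(out, outsize, "%s", e->value);
    return 0;
}

/* Stand-in resolver for demonstration: counts calls, fabricates an answer. */
static int demo_resolver_calls;

static int demo_resolver(const char *name, char *out, size_t outsize)
{
    demo_resolver_calls++;
    snprintf(out, outsize, "addr-of-%s", name);
    return 0;
}
```

A second lookup of the same name within the TTL is answered from memory without calling the resolver. Only successful answers are cached here; caching failures (negative caching) would be a further design choice.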

Someone Else's Agenda

For completeness, we list some examples of areas that don't affect the enterprise-worthiness of Linux. These are on someone else's agenda.

Small system performance
Support for embedded systems and esoteric processors
Interactive improvements (e.g. interactive responsiveness, GUI improvements like GNOME, KDE, or other alternative window management)
Sound support
Support for VFAT, HPFS, and other legacy workstation file systems
Support for low-bandwidth networking like packet radio, ISDN, PPP, and SLIP
Support for portable computers (e.g. IrDA, DHCP client, or roaming support)

This document was written as part of the Linux Scalability Project. For more information, see our home page.
If you have comments or suggestions, email linux-scalability@citi.umich.edu