
Making Linux a World-Class Enterprise Server OS

Chuck Lever, Netscape Communications Corp.
chuckl@netscape.com

$Id: enterprise.html,v 1.4 1999/11/12 20:12:54 cel Exp $

Abstract

We list unresolved issues whose resolution would help Linux become a world-class enterprise server OS. These items could form part of Netscape's technical agenda for Linux.

This document is Copyright © 1999 Netscape Communications Corp., all rights reserved. Trademarked material referenced in this document is the property of its respective owners.

Introduction

Linux is an open-source, largely POSIX-compliant operating system that runs on commodity Intel PC hardware as well as several other architectures. Linux is also very stable, crashing far less often than other operating systems in its class. For these reasons, it is one of the most widely deployed operating systems for network services such as mail and web servers.

Despite its acclaim and stability, Linux remains a work in progress. Several aspects of Linux can be improved to better position it for enterprise service. In this document, we list the issues whose resolution will help Linux become a world-class enterprise server OS; these items could form part of Netscape's technical agenda for Linux.

The Issues

The general areas in which we are interested are described below.

Specific Technical Suggestions

Issue: Explanation
Support for large memory configurations: Proper support for 1 GB to 4 GB of physical RAM (within the limits of Intel CPUs, the kernel should automatically do the *correct* limited thing); in the long term, full OS support for 36-bit physical addresses (Intel's PAE mode) on hardware that supports them. This is not an issue on hardware architectures that already support 64-bit addresses.
Improved SMP scalability: The scheduler should scale to large numbers of threads and processes, and the system as a whole should scale well as CPUs are added.
64-bit file lengths: Kernel-internal interfaces and application programming interfaces need to be ready for 64-bit file lengths, as do file system administration and backup tools.
SCSI I/O throughput: Improved reliability for tapes and other less common devices; support for a wide range of modern SCSI chipsets and devices; SCSI drivers that support a greater level of concurrent I/O.
TCP throughput and standards compliance: TCP is a complex and ever-evolving protocol, and bugs and performance problems are easily introduced during development, so both throughput and compliance need continual attention.
High-performance asynchronous I/O APIs: The current real-time (RT) signal API used to support asynchronous I/O does not integrate well with threads. A queued I/O API, or a kernel-mediated event dispatching system that supports UI events as well as file descriptor wakeups, would be very cool.
Functional and performance validation testing: Regular functional testing and performance validation would help catch significant bugs early, and would also build a history of system performance that makes performance creep easy to spot.
High-performance, high-reliability file system: RAID-enabled file systems for network servers that combine high data resiliency with high performance.
System performance monitoring: Support for tools such as iostat, vtop, and sard.
Fault tolerance: Survive the loss of a CPU in an SMP configuration; reconfigure properly after card or memory failures.
Large site administration: Support for secure Java consoles, remote administration tools such as cfengine, and security tools such as Tripwire; coherent, integrated system documentation.
Constraint relief: Support for 32-bit UIDs, large numbers of file descriptors and fd_sets, large numbers of threads and processes, larger shared memory segments and swap areas, greater kernel I/O concurrency, and so on.
Quality assurance and piloting arenas: These are informal now, but can we demonstrate that a more formal process would have a positive effect? What, for example, is Red Hat doing in this regard?
Internationalization: Internationalization of the kernel and utilities (I know, it's a difficult and not very sexy job, but somebody's got to do it).
Security, security, security: Eliminate buffer overflows and TCP vulnerabilities; include ssh in popular Linux distributions; provide good security documentation.
High-end networking and disk subsystems: Support Gigabit Ethernet and ATM; explore zero-copy I/O; look at support for RAID and beyond.
Serviceability enhancements: Support for IPMI and other advanced hardware monitoring; improved Oops and dump tracing; configuration snapshotting for "first-time capture" of significant system problems.
Constraint relief for large numbers of users: Support for 32-bit UIDs and high-efficiency password lookup; look into quota support, utmp, utmpx, and wtmp.
Complete support for serial console: Linux has much of this already, but work is needed to get server hardware ready to support it.
Support for name service caching: DNS performance and reliability are critical to network server performance. Solaris, for example, uses nscd to help ensure that DNS and YP/NIS+ perform well.
Support for "direct I/O": For example, locking down user pages and then performing scatter/gather DMA I/O without another copy, or supporting PCI-to-PCI I/O through main memory without CPU intervention; in general, closer to zero-copy behavior across the kernel and device drivers. See IO-Lite or McVoy's splice() paper.
Prioritization of standards compliance: Standards compliance is important, but not to the exclusion of alternatives that can coexist in the system API and provide better functionality and performance.
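
The 64-bit file lengths item is partly an application problem: on a 32-bit host, a program only sees 64-bit offsets if it is built for them. The following is a minimal sketch, assuming Linux with glibc, of the `_FILE_OFFSET_BITS` feature-test macro, which makes `off_t` 64 bits wide and routes `lseek()` and related calls to their 64-bit variants. The helper names `has_large_file_offsets` and `seek_to` are hypothetical, chosen for illustration only.

```c
/* Must appear before any system header: asks glibc for a 64-bit off_t and
 * routes lseek(), fstat(), etc. to their 64-bit variants on 32-bit hosts. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if this build can represent file offsets
 * beyond 4 GiB, i.e. off_t is at least 64 bits wide. */
static int has_large_file_offsets(void)
{
    return sizeof(off_t) >= 8;
}

/* Hypothetical helper: move fd's file position to `offset` and return the
 * position the kernel reports, or -1 on error. Seeking past end-of-file is
 * legal and allocates no disk space. */
static off_t seek_to(int fd, off_t offset)
{
    assert(fd >= 0);
    return lseek(fd, offset, SEEK_SET);
}
```

Without `_FILE_OFFSET_BITS=64`, a 32-bit build has a 32-bit signed `off_t` and cannot even express an offset past 2 GiB, which is why administration and backup tools need attention in addition to the kernel.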
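
The asynchronous I/O item refers to Linux's real-time signal interface. Below is a minimal sketch, assuming Linux with glibc: a descriptor is armed with `F_SETOWN`, `F_SETSIG`, and `O_ASYNC` so that readiness is reported by queuing a real-time signal whose `si_fd` field names the descriptor, and the server collects those events with `sigtimedwait()`. Because `F_SETOWN` directs the signal at the whole process, any thread that has not blocked it may receive it, which hints at why the interface fits poorly with threads. The helpers `arm_rt_signal_io` and `next_ready_fd` are hypothetical names.

```c
#define _GNU_SOURCE /* exposes F_SETSIG in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: report readiness on `fd` by queuing real-time
 * signal `sig` (carrying the fd in si_fd) to this process. The caller must
 * already have blocked `sig` so that it queues instead of running a handler.
 * Returns 0 on success, -1 on error. */
static int arm_rt_signal_io(int fd, int sig)
{
    assert(sig >= SIGRTMIN && sig <= SIGRTMAX);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)   /* deliver to this process */
        return -1;
    if (fcntl(fd, F_SETSIG, sig) < 0)        /* RT signal instead of SIGIO */
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Hypothetical helper: wait up to `timeout_ms` for the next queued `sig`
 * and return the descriptor it reports, or -1 on timeout or error. */
static int next_ready_fd(int sig, long timeout_ms)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigtimedwait(&set, &info, &ts) < 0)
        return -1;
    return info.si_fd;
}
```

The caller blocks the chosen signal (for example, `SIGRTMIN` via `sigprocmask()`) before arming any descriptors. If the real-time signal queue overflows, the kernel falls back to sending plain `SIGIO`, after which a server typically rescans its descriptors with `poll()`.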
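
The fd_set constraint comes from `select()`: an `fd_set` is a fixed-size bitmap, so it can only name descriptors numbered below `FD_SETSIZE` (1024 with glibc), however many descriptors the process is allowed to open. This sketch, assuming POSIX `poll()`, shows the contrast: `poll()` takes an array sized by the caller, so only the per-process descriptor limit applies. The helpers `usable_with_select` and `count_readable` are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* select() can only watch descriptors numbered below FD_SETSIZE;
 * FD_SET() on a larger descriptor is undefined behavior. */
static int usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: poll() takes a caller-sized array, so neither the
 * number nor the numeric value of the watched descriptors is capped by
 * FD_SETSIZE. Returns how many of fds[] are readable within timeout_ms,
 * or -1 on error. */
static int count_readable(const int *fds, nfds_t n, int timeout_ms)
{
    assert(fds != NULL || n == 0);
    struct pollfd *pfds = calloc(n ? n : 1, sizeof *pfds);
    int ready = 0;

    if (pfds == NULL)
        return -1;
    for (nfds_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }
    if (poll(pfds, n, timeout_ms) < 0) {
        free(pfds);
        return -1;
    }
    for (nfds_t i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            ready++;
    free(pfds);
    return ready;
}
```

A server holding thousands of connections would raise its descriptor limit and use `poll()` (or a later event interface) rather than `select()`; the cost of scanning a large array on every call is one reason the asynchronous I/O item asks for a queued event API.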
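
The zero-copy items can be illustrated with `sendfile()`, which Linux has provided since 2.2: the kernel moves file data to another descriptor without copying it through a user-space buffer. This is a minimal sketch, assuming a recent Linux kernel; file-to-file transfers need 2.6.33 or later, and earlier kernels require the destination to be a socket. `copy_with_sendfile` is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy all of in_fd to out_fd with sendfile(), which
 * moves the data inside the kernel instead of through a user buffer.
 * Reads in_fd from offset 0 without changing its file position.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_with_sendfile(int out_fd, int in_fd)
{
    struct stat st;
    off_t offset = 0;      /* sendfile() advances this for us */
    ssize_t total = 0;

    assert(out_fd >= 0 && in_fd >= 0);
    if (fstat(in_fd, &st) < 0)
        return -1;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0)
            return -1;
        if (n == 0)        /* source shrank underneath us */
            break;
        total += n;
    }
    return total;
}
```

For a web or mail server the typical destination is a connected socket, where the data can go from the page cache to the network device without an intermediate copy in the application.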

Someone Else's Agenda

For completeness, we list some examples of areas that don't affect the enterprise-worthiness of Linux. These are on someone else's agenda.

This document was written as part of the Linux Scalability Project. For more information, see our home page.
If you have comments or suggestions, email linux-scalability@citi.umich.edu
