Unix Text Processing

Dale Dougherty and Tim O'Reilly
and the staff of O'Reilly & Associates, Inc.

Consulting Editors:
Stephen G. Kochan and Patrick H. Wood

Hayden Books
A Division of Howard W. Sams & Company
4300 West 62nd Street
Indianapolis, Indiana 46268 USA

FIRST EDITION
SECOND PRINTING — 1988

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.

International Standard Book Number: 0-672-46291-5
Library of Congress Catalog Card Number: 87-60537

Acquisitions Editor: Therese Zak
Editor: Susan Pink Bussiere
Cover: Visual Graphic Services, Indianapolis - Design by Jerry Bates - Illustration by Patrick Sarles
Typesetting: O'Reilly & Associates, Inc.

Printed in the United States of America

Trademark Acknowledgements

All terms mentioned in this book that are known to be trademarks or service marks are listed below. Howard W. Sams & Co. cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Apple is a registered trademark and Apple LaserWriter is a trademark of Apple Computer, Inc.
devps is a trademark of Pipeline Associates, Inc.
Merge/286 and Merge/386 are trademarks of Locus Computing Corp.
DDL is a trademark of Imagen Corp.
Helvetica and Times Roman are registered trademarks of Allied Corp.
IBM is a registered trademark of International Business Machines Corp.
Interpress is a trademark of Xerox Corp.
LaserJet is a trademark of Hewlett-Packard Corp.
LaserWriter is a trademark of Apple Computer, Inc.
Linotronic is a trademark of Allied Corp.
Macintosh is a trademark licensed to Apple Computer, Inc.
Microsoft is a registered trademark of Microsoft Corp.
MKS Toolkit is a trademark of Mortice Kern Systems, Inc.
Multimate is a trademark of Multimate International Corp.
Nutshell Handbook is a trademark of O'Reilly & Associates, Inc.
PC-Interface is a trademark of Locus Computing Corp.
PostScript is a trademark of Adobe Systems, Incorporated.
PageMaker is a registered trademark of Aldus Corporation.
SoftQuad Publishing Software and SQtroff are trademarks of SoftQuad Inc.
WordStar is a registered trademark of MicroPro International Corp.
UNIX is a registered trademark of AT&T.
VP/ix is a trademark of Interactive Systems Corp. and Phoenix Technologies, Ltd.

Preface

Many people think of computers primarily as "number crunchers" and think of word processors as generating form letters and boilerplate proposals. That computers can be used productively by writers, not just research scientists, accountants, and secretaries, is not so widely recognized. Today, writers not only work with words, they work with computers and the software programs, printers, and terminals that are part of a computer system.

The computer has not simply replaced a typewriter; it has become a system for integrating many other technologies. As these technologies are made available at a reasonable cost, writers may begin to find themselves in new roles as computer programmers, systems integrators, data base managers, graphic designers, typesetters, printers, and archivists.

The writer functioning in these new roles is faced with additional responsibilities. Obviously, it is one thing to have a tool available and another thing to use it skillfully. Like a craftsman, the writer must develop a number of specialized skills, gaining control over the method of production as well as the product. The writer must look for ways to improve the process by integrating new technologies and designing new tools in software.

In this book, we want to show how computers can be used effectively in the preparation of written documents, especially in the process of producing book-length documents. Surely it is important to learn the tools of the trade, and we will demonstrate the tools available in the UNIX environment. However, it is also valuable to examine text processing in terms of problems and solutions: the problems faced by a writer undertaking a large writing project and the solutions offered by using the resources and power of a computer system.

In Chapter 1, we begin by outlining the general capabilities of word-processing systems. We describe in brief the kinds of things that a computer must be able to do for a writer, regardless of whether that writer is working on a UNIX system or on an IBM PC with a word-processing package such as WordStar or MultiMate. Then, having defined basic word-processing capabilities, we look at how a text-processing system includes and extends these capabilities and benefits. Last, we introduce the set of text processing tools in the UNIX environment. These tools, used individually or in combination, provide the basic framework for a text-processing system, one that can be custom-tailored to supply additional capabilities.

Chapter 2 gives a brief review of UNIX fundamentals. We assume you are already somewhat acquainted with UNIX, but we included this information to make sure that you are familiar with basic concepts that we will be relying on later in the book.

Chapter 3 introduces the vi editor, a basic tool for entering and editing text. Although many other editors and word-processing programs are available with UNIX, vi has the advantage that it works, without modification, on almost every UNIX system and with almost every type of terminal. If you learn vi, you can be confident that your text editing skills will be completely transferable when you sit down at someone else's terminal or use someone else's system.

Chapter 4 introduces the nroff and troff formatting programs. Because vi is a text editor, not a word-processing program, it does only rudimentary formatting of the text you enter. You can enter special formatting codes to specify how you want the document to look, then format the text using either nroff or troff. (The nroff formatter is used for formatting documents to the screen or to typewriter-like printers; troff uses much the same formatting language, but has additional constructs that allow it to produce more elaborate effects on typesetters and laser printers.)

In this chapter, we also describe the different types of output devices for printing your finished documents. With the wider availability of laser printers, you need to become familiar with many typesetting terms and concepts to get the most out of troff's capabilities.

The formatting markup language required by nroff and troff is quite complex, because it allows detailed control over the placement of every character on the page, as well as a large number of programming constructs that you can use to define custom formatting requests or macros. A number of macro packages have been developed to make the markup language easier to use. These macro packages define commonly used formatting requests for different types of documents, set up default values for page layout, and so on.

Although someone working with the macro packages does not need to know about the underlying requests in the formatting language used by nroff and troff, we believe that the reader wants to go beyond the basics. As a result, Chapter 4 introduces additional basic requests that the casual user might not need. However, your understanding of what is going on should be considerably enhanced.

There are two principal macro packages in use today, ms and mm (named for the command-line options to nroff and troff used to invoke them). Both macro packages were available with most UNIX systems; now, however, ms is chiefly available on UNIX systems derived from Berkeley 4.x BSD, and mm is chiefly available on UNIX systems derived from AT&T System V. If you are lucky enough to have both macro packages on your system, you can choose which one you want to learn. Otherwise, you should read either Chapter 5, The ms Macros, or Chapter 6, The mm Macros, depending on which version you have available.

Chapter 7 returns to vi to consider its more advanced features. In addition, it takes a look at how some of these features can support easy entry of formatting codes used by nroff and troff.

Tables and mathematical equations provide special formatting problems. The low-level nroff and troff commands for typesetting a complex table or equation are extraordinarily complex. However, no one needs to learn or type these commands, because two preprocessors, tbl and eqn, take a high-level specification of the table or equation and do the dirty work for you. They produce a "script" of nroff or troff commands that can be piped to the formatter to lay out the table or equations. The tbl and eqn preprocessors are described in Chapters 8 and 9, respectively.

More recent versions of UNIX (those that include AT&T's separate Documenter's Workbench software) also support a preprocessor called pic that makes it easier to create simple line drawings with troff and include them in your text. We talk about pic in Chapter 10.

Chapter 11 introduces a range of other UNIX text-processing tools—programs for sorting, comparing, and in various ways examining the contents of text files. This chapter includes a discussion of the standard UNIX spell program and the Writer's Workbench programs style and diction.

This concludes the first part of the book, which covers the tools that the writer finds at hand in the UNIX environment. This material is not elementary. In places, it grows quite complex. However, we believe there is a fundamental difference between learning how to use an existing tool and developing skills that extend a tool's capabilities to achieve your own goals.

That is the real beauty of the UNIX environment. Nearly all the tools it provides are extensible, either because they have built-in constructs for self-extension, like nroff and troff's macro capability, or because of the wonderful programming powers of the UNIX command interpreter, the shell.

The second part of the book begins with Chapter 12, on editing scripts. There are several editors in UNIX that allow you to write and save what essentially amount to programs for manipulating text. The ex editor can be used from within vi to make global changes or complex edits. The next step is to use ex on its own; and after you do that, it is a small step to the even more powerful global editor sed. After you have mastered these tools, you can build a library of special-purpose editing scripts that vastly extend your power over the recalcitrant words you have put down on paper and now wish to change.

Chapter 13 discusses another program, awk, that extends the concept of a text editor even further than the programs discussed in Chapter 12. The awk program is really a database programming language that is appropriate for performing certain kinds of text-processing tasks. In particular, we use it in this book to process output from troff for indexing.

The next five chapters turn to the details of writing troff macros, and show how to customize the formatting language to simplify formatting tasks. We start in Chapter 14 by looking at the basic requests used to build macros, then go on in Chapter 15 to the requests for achieving various types of special effects. In Chapters 16 and 17, we'll take a look at the basic structure of a macro package and focus on how to define the appearance of large documents such as manuals. We'll show you how to define different styles of section headings, page headers, footers, and so on. We'll also talk about how to generate an automatic table of contents and index—two tasks that take you beyond troff into the world of shell programming and various UNIX text-processing utilities.

To complete these tasks, we need to return to the UNIX shell in Chapter 18 and examine in more detail the ways that it allows you to incorporate the many tools provided by UNIX into an integrated text-processing environment.

Numerous appendices summarize information that is spread throughout the text, or that couldn't be crammed into it.

***

Before we turn to the subject at hand, a few acknowledgements are in order. Though only two names appear on the cover of this book, it is in fact the work of many hands. In particular, Grace Todino wrote the chapters on tbl and eqn in their entirety, and the chapters on vi and ex are based on the O'Reilly & Associates' Nutshell Handbook, Learning the Vi Editor, written by Linda Lamb. Other members of the O'Reilly & Associates staff—Linda Mui, Valerie Quercia, and Donna Woonteiler—helped tirelessly with copyediting, proofreading, illustrations, typesetting, and indexing.

Donna was new to our staff when she took on responsibility for the job of copyfitting, the final stage in page layout, made especially arduous by the many figures and examples in this book. She and Linda especially spent many long hours getting this book ready for the printer. Linda had the special job of doing the final consistency check on examples, making sure that copyediting changes or typesetting errors had not compromised the accuracy of the examples.

Special thanks go to Steve Talbott of Masscomp, who first introduced us to the power of troff and who wrote the first version of the extended ms macros, format shell script, and indexing mechanism described in the second half of this book. Steve's help and patience were invaluable during the long road to mastery of the UNIX text-processing environment.

We'd also like to thank Teri Zak, the acquisitions editor at Hayden Books, for her vision of the Hayden UNIX series, and this book's place in it.

In the course of this book's development, Hayden was acquired by Howard Sams, where Teri's role was taken over by Jim Hill. Thanks also to the excellent production editors at Sams, Wendy Ford, Lou Keglovitz, and especially Susan Pink Bussiere, whose copyediting was outstanding.

Through it all, we have had the help of Steve Kochan and Pat Wood of Pipeline Associates, Inc., consulting editors to the Hayden UNIX Series. We are grateful for their thoughtful and thorough review of this book for technical accuracy. (We must, of course, make the usual disclaimer: any errors that remain are our own.)

Steve and Pat also provided the macros to typeset the book. Our working drafts were printed on an HP LaserJet printer, using ditroff and TextWare International's tplus postprocessor. Final typeset output was prepared with Pipeline Associates' devps, which was used to convert ditroff output to PostScript, which was used in turn to drive a Linotronic L100 typesetter.

Chapter 1

From Typewriters to Word Processors

Before we consider the special tools that the UNIX environment provides for text processing, we need to think about the underlying changes in the process of writing that are inevitable when you begin to use a computer.

The most important features of a computer program for writers are the ability to remember what is typed and the ability to allow incremental changes—no more retyping from scratch each time a draft is revised. For a writer first encountering word-processing software, no other features even begin to compare. The crudest command structure, the most elementary formatting capabilities, will be forgiven because of the immense labor savings that take place.

Writing is basically an iterative process. It is a rare writer who dashes out a finished piece; most of us work in circles, returning again and again to the same piece of prose, adding or deleting words, phrases, and sentences, changing the order of thoughts, and elaborating a single sentence into pages of text.

A writer working on paper periodically needs to clear the deck and type a clean copy, free of elaboration. As the writer reads the new copy, the process of revision continues, a word here, a sentence there, until the new draft is as obscured by changes as the first. As Joyce Carol Oates is said to have remarked: "No book is ever finished. It is abandoned."

Word processing first took hold in the office as a tool to help secretaries prepare perfect letters, memos, and reports. As dedicated word processors were replaced with low-cost personal computers, writers were quick to see the value of this new tool. In a civilization obsessed with the written word, it is no accident that WordStar, a word-processing program, was one of the first best sellers of the personal computer revolution.

As you learn to write with a word processor, your working style changes. Because it is so easy to make revisions, it is much more forgivable to think with your fingers when you write, rather than to carefully outline your thoughts beforehand and polish each sentence as you create it.

If you do work from an outline, you can enter it first, then write your first draft by filling in the outline, section by section. If you are writing a structured document such as a technical manual, your outline points become the headings in your document; if you are writing a free-flowing work, they can be subsumed gradually in the text as you flesh them out. In either case, it is easy to write in small segments that can be moved as you reorganize your ideas.

Watching a writer at work on a word processor is very different from watching a writer at work on a typewriter. A typewriter tends to enforce a linear flow—you must write a passage and then go back later to revise it. On a word processor, revisions are constant—you type a sentence, then go back to change the sentence above. Perhaps you write a few words, change your mind, and back up to take a different tack; or you decide the paragraph you just wrote would make more sense if you put it ahead of the one you wrote before, and move it on the spot.

This is not to say that a written work is created on a word processor in a single smooth flow; in fact, the writer using a word processor tends to create many more drafts than a compatriot who still uses a pen or typewriter. Instead of three or four drafts, the writer may produce ten or twenty. There is still a certain editorial distance that comes only when you read a printed copy. This is especially true when that printed copy is nicely formatted and letter perfect.

This brings us to the second major benefit of word-processing programs: they help the writer with simple formatting of a document. For example, a word processor may automatically insert carriage returns at the end of each line and adjust the space between words so that all the lines are the same length. Even more importantly, the text is automatically readjusted when you make changes. There are probably commands for centering, underlining, and boldfacing text.

The rough formatting of a document can cover a multitude of sins. As you read through your scrawled markup of a preliminary typewritten draft, it is easy to lose track of the overall flow of the document. Not so when you have a clean copy—the flaws of organization and content stand out vividly against the crisp new sheets of paper.

However, the added capability to print a clean draft after each revision also puts an added burden on the writer. Where once you had only to worry about content, you may now find yourself fussing with consistency of margins, headings, boldface, italics, and all the other formerly superfluous impedimenta that have now become integral to your task.

As the writer gets increasingly involved in the formatting of a document, it becomes essential that the tools help revise the document's appearance as easily as its content. Given these changes imposed by the evolution from typewriters to word processors, let's take a look at what a word-processing system needs to offer to the writer.

A Workspace

One of the most important capabilities of a word processor is that it provides a space in which you can create documents. In one sense, the video display screen on your terminal, which echoes the characters you type, is analogous to a sheet of paper. But the workspace of a word processor is not so unambiguous as a sheet of paper wound into a typewriter, which may be added neatly to the stack of completed work when finished, or torn out and crumpled as a false start. From the computer's point of view, your workspace is a block of memory, called a buffer, that is allocated when you begin a word-processing session. This buffer is a temporary holding area for storing your work and is emptied at the end of each session.

To save your work, you have to write the contents of the buffer to a file. A file is a permanent storage area on a disk (a hard disk or a floppy disk). After you have saved your work in a file, you can retrieve it for use in another session.

When you begin a session editing a document that exists on file, a copy of the file is made and its contents are read into the buffer. You actually work on the copy, making changes to it, not the original. The file is not changed until you save your changes during or at the end of your work session. You can also discard changes made to the buffered copy, keeping the original file intact, or save multiple versions of a document in separate files.

Particularly when working with larger documents, the management of disk files can become a major effort. If, like most writers, you save multiple drafts, it is easy to lose track of which version of a file is the latest.

An ideal text-processing environment for serious writers should provide tools for saving and managing multiple drafts on disk, not just on paper. It should allow the writer to

work on documents of any length;
save multiple versions of a file;
save part of the buffer into a file for later use;
switch easily between multiple files;
insert the contents of an existing file into the buffer;
summarize the differences between two versions of a document.
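
The last capability in this list can be sketched with the standard UNIX diff program. The following is only an illustration under stated assumptions: the file names draft1.txt and draft2.txt and their contents are invented for the example, and the files are created in a scratch directory so that nothing of yours is touched.

```shell
# Work in a scratch directory (hypothetical setup for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First draft, saved in its own file.
cat > draft1.txt <<'EOF'
Writing is an iterative process.
Most of us work in circles.
No book is ever finished.
EOF

# Second draft: the first line is revised and a line is added at the end.
cat > draft2.txt <<'EOF'
Writing is basically an iterative process.
Most of us work in circles.
No book is ever finished.
It is abandoned.
EOF

# diff exits with a nonzero status when the files differ, so the
# "|| true" keeps that status from being treated as a failure.
changes=$(diff draft1.txt draft2.txt || true)
printf '%s\n' "$changes"
```

In the report, lines beginning with < come from the first file and lines beginning with > come from the second; a heading such as 1c1 says that line 1 was changed, and 3a4 says that a line was added after line 3.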

Most word-processing programs for personal computers seem to work best for short documents such as the letters and memos that offices churn out by the millions each day. Although it is possible to create longer documents, many features that would help organize a large document such as a book or manual are missing from these programs.

However, long before word processors became popular, programmers were using another class of programs called text editors. Text editors were designed chiefly for entering computer programs, not text. Furthermore, they were designed for use by computer professionals, not computer novices. As a result, a text editor can be more difficult to learn, lacking many on-screen formatting features available with most word processors.

Nonetheless, the text editors used in program development environments can provide much better facilities for managing large writing projects than their office word-processing counterparts. Large programs, like large documents, are often contained in many separate files; furthermore, it is essential to track the differences between versions of a program.

UNIX is a pre-eminent program development environment and, as such, it is also a superb document development environment. Although its text editing tools at first may appear limited in contrast to sophisticated office word processors, they are in fact considerably more powerful.

Tools for Editing

For many, the ability to retrieve a document from a file and make multiple revisions painlessly makes it impossible to write at a typewriter again. However, before you can get the benefits of word processing, there is a lot to learn.

Editing operations are performed by issuing commands. Each word-processing system has its own unique set of commands. At a minimum, there are commands to

move to a particular position in the document;
insert new text;
change or replace text;
delete text;
copy or move text.

To make changes to a document, you must be able to move to that place in the text where you want to make your edits. Most documents are too large to be displayed in their entirety on a single terminal screen, which generally displays 24 lines of text. Usually only a portion of a document is displayed. This partial view of your document is sometimes referred to as a window.* If you are entering new text and reach the bottom line in the window, the text on the screen automatically scrolls (rolls up) to reveal an additional line at the bottom. A cursor (an underline or block) marks your current position in the window.

There are basically two kinds of movement:

scrolling new text into the window
positioning the cursor within the window

When you begin a session, the first line of text is the first line in the window, and the cursor is positioned on the first character. Scrolling commands change which lines are displayed in the window by moving forward or backward through the document. Cursor-positioning commands allow you to move up and down to individual lines, and along lines to particular characters.

After you position the cursor, you must issue a command to make the desired edit. The command you choose indicates how much text will be affected: a character, a word, a line, or a sentence.

Because the same keyboard is used to enter both text and commands, there must be some way to distinguish between the two. Some word-processing programs assume that you are entering text unless you specify otherwise; newly entered text either replaces existing text or pushes it over to make room for the new text. Commands are entered by pressing special keys on the keyboard, or by combining a standard key with a special key, such as the control key (CTRL).

Other programs assume that you are issuing commands; you must enter a command before you can type any text at all. There are advantages and disadvantages to each approach. Starting out in text mode is more intuitive to those coming from a typewriter, but may be slower for experienced writers, because all commands must be entered by special key combinations that are often hard to reach and slow down typing. (We'll return to this topic when we discuss vi, a UNIX text editor.)

Far more significant than the style of command entry is the range and speed of commands. For example, though it is heaven for someone used to a typewriter to be able to delete a word and type in a replacement, it is even better to be able to issue a command that will replace every occurrence of that word in an entire document. And, after you start making such global changes, it is essential to have some way to undo them if you make a mistake.

A word processor that substitutes ease of learning for ease of use by having fewer commands will ultimately fail the serious writer, because the investment of time spent learning complex commands can easily be repaid when they simplify complex tasks.

And when you do issue a complex command, it is important that it works as quickly as possible, so that you aren't left waiting while the computer grinds away. The extra seconds add up when you spend hours or days at the keyboard, and, once having been given a taste of freedom from drudgery, writers want as much freedom as they can get.

Text editors were developed before word processors (in the rapid evolution of computers). Many of them were originally designed for printing terminals, rather than for the CRT-based terminals used by word processors. These programs tend to have commands that work with text on a line-by-line basis. These commands are often more obscure than the equivalent office word-processing commands.

However, though the commands used by text editors are sometimes more difficult to learn, they are usually very effective. (The commands designed for use with slow paper terminals were often extraordinarily powerful, to make up for the limited capabilities of the input and output device.)

There are two basic kinds of text editors, line editors and screen editors, and both are available in UNIX. The difference is simple: line editors display one line at a time, and screen editors can display approximately 24 lines or a full screen.

The line editors in UNIX include ed, sed, and ex. Although these line editors are obsolete for general-purpose use by writers, there are applications at which they excel, as we will see in Chapters 7 and 12.

The most common screen editor in UNIX is vi. Learning vi or some other suitable editor is the first step in mastering the UNIX text-processing environment. Most of your time will be spent using the editor.

UNIX screen editors such as vi and emacs (another editor available on many UNIX systems) lack ease-of-learning features common in many word processors—there are no menus and only primitive on-line help screens, and the commands are often complex and nonintuitive—but they are powerful and fast. What's more, UNIX line editors such as ex and sed give additional capabilities not found in word processors—the ability to write a script of editing commands that can be applied to multiple files. Such editing scripts open new ranges of capability to the writer.
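
As a rough sketch of what such an editing script might look like, the following shell commands apply one small sed script to two files. Everything named here is an assumption made for illustration: the files ch01 and ch02, their contents, and the script name fixes.sed are hypothetical, and the work is done in a scratch directory.

```shell
# Work in a scratch directory (hypothetical setup for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small sample files standing in for chapters of a document.
printf '%s\n' 'The Unix editor is fast.' 'Unix tools can be combined.' > ch01
printf '%s\n' 'Editing scripts run on Unix.' > ch02

# The editing script: one sed command per line.
cat > fixes.sed <<'EOF'
s/Unix/UNIX/g
s/editor/text editor/
EOF

# Apply the same script to each file. sed writes to standard output,
# so the result goes to a temporary file that then replaces the original.
for file in ch01 ch02
do
    sed -f fixes.sed "$file" > "$file.new" && mv "$file.new" "$file"
done

cat ch01 ch02
```

The same two-line script could be run unchanged over any number of files, which is the point: the edits are written down once and can be reused.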

*Some editors, such as emacs, can split the terminal screen into multiple windows. In addition, many high-powered UNIX workstations with large bit-mapped screens have their own windowing software that allows multiple programs to be run simultaneously in separate windows. For purposes of this book, we assume you are using the vi editor and an alphanumeric terminal with only a single window.

Document Formatting

Text editing is wonderful, but the object of the writing process is to produce a printed document for others to read. And a printed document is more than words on paper; it is an arrangement of text on a page. For instance, the elements of a business letter are arranged in a consistent format, which helps the person reading the letter identify those elements. Reports and more complex documents, such as technical manuals or books, require even greater attention to formatting. The format of a document conveys how information is organized, assisting in the presentation of ideas to a reader.

Most word-processing programs have built-in formatting capabilities. Formatting commands are intermixed with editing commands, so that you can shape your document on the screen. Such formatting commands are simple extensions of those available to someone working with a typewriter. For example, an automatic centering command saves the trouble of manually counting characters to center a title or other text. There may also be such features as automatic pagination and printing of headers or footers.

Text editors, by contrast, usually have few formatting capabilities. Because they were designed for entering programs, their formatting capabilities tend to be oriented toward the formats required by one or more programming languages.

Even programmers write reports, however. Especially at AT&T (where UNIX was developed), there was a great emphasis on document preparation tools to help the programmers and scientists of Bell Labs produce research reports, manuals, and other documents associated with their development work.

Word processing, with its emphasis on easy-to-use programs with simple on-screen formatting, was in its infancy. Computerized phototypesetting, on the other hand, was already a developed art. Until quite recently, it was not possible to represent on a video screen the variable type styles and sizes used in typeset documents. As a result, phototypesetting has long used a markup system that indicates formatting instructions with special codes. These formatting instructions to the computerized typesetter are often direct descendants of the instructions that were formerly given to a human typesetter: center the next line, indent five spaces, boldface this heading.

The text formatter most commonly used with the UNIX system is called nroff. To use it, you must intersperse formatting instructions (usually one- or two-letter codes preceded by a period) within your text, then pass the file through the formatter. The nroff program interprets the formatting codes and reformats the document "on the fly" while passing it on to the printer. The nroff formatter prepares documents for printing on line printers, dot-matrix printers, and letter-quality printers. Another program called troff uses an extended version of the same markup language used by nroff, but prepares documents for printing on laser printers and typesetters. We'll talk more about printing in a moment.

Although formatting with a markup language may seem to be a far inferior system to the "what you see is what you get" (wysiwyg) approach of most office word-processing programs, it actually has many advantages.

First, unless you are using a very sophisticated computer, with very sophisticated software (what has come to be called an electronic publishing system, rather than a mere word processor), it is not possible to display everything on the screen just as it will appear on the printed page. For example, the screen may not be able to represent boldfacing or underlining except with special formatting codes. WordStar, one of the grandfathers of word-processing programs for personal computers, represents underlining by surrounding the word or words to be underlined with the special control character ^S (the character generated by holding down the control key while typing the letter S). For example, the following title line would be underlined when the document is printed:

^SWord Processing with WordStar^S

Is this really superior to the following nroff construct?

.ul
Text Processing with vi and nroff

It is perhaps unfair to pick on WordStar, an older word-processing program, but very few word-processing programs can complete the illusion that what you see on the screen is what you will get on paper. There is usually some mix of control codes with on-screen formatting. More to the point, though, is the fact that most word processors are oriented toward the production of short documents. When you get beyond a letter, memo, or report, you start to understand that there is more to formatting than meets the eye.

Although "what you see is what you get" is fine for laying out a single page, it is much harder to enforce consistency across a large document. The design of a large document is often determined before writing is begun, just as a set of plans for a house is drawn up before anyone starts construction. The design is a plan for organizing a document, arranging various parts so that the same types of material are handled in the same way.

The parts of a document might be chapters, sections, or subsections. For instance, a technical manual is often organized into chapters and appendices. Within each chapter, there might be numbered sections that are further divided into three or four levels of subsections.

Document design seeks to accomplish across the entire document what is accomplished by the table of contents of a book. It presents the structure of a document and helps the reader locate information.

Each of the parts must be clearly identified. The design specifies how they will look, trying to achieve consistency throughout the document. The strategy might specify that major section headings will be all uppercase, underlined, with three blank lines above and two below, and secondary headings will be in uppercase and lowercase, underlined, with two blank lines above and one below.

If you have ever tried to format a large document using a word processor, you have probably found it difficult to enforce consistency in such formatting details as these. By contrast, a markup language—especially one like nroff that allows you to define repeated command sequences, or macros—makes it easy: the style of a heading is defined once, and a code used to reference it. For example, a top-level heading might be specified by the code .H1, and a secondary heading by .H2.
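As a rough illustration of defining a heading style once, the following sketch writes a small nroff source file that defines a hypothetical .H1 macro and then uses it twice. The macro body (three blank lines, an underlined title, two blank lines) and the file name report.nr are our assumptions for the example, not the definitions used by any standard macro package. The formatter is run only if nroff is installed.

```shell
# Write an nroff source file. The quoted 'EOF' keeps the shell from
# interpreting the backslashes and dollar signs in the macro body.
cat > report.nr <<'EOF'
.\" Hypothetical top-level heading macro: defined once, used everywhere.
.de H1
.sp 3
.ul
\\$1
.sp 2
..
.H1 "INTRODUCTION"
Text of the first section.
.H1 "INSTALLATION"
Text of the second section.
EOF

# Format the file if nroff is available on this system.
if command -v nroff >/dev/null 2>&1; then
    nroff report.nr
fi
```

Changing the design later would mean editing only the lines between .de H1 and the closing two periods; every heading that uses the code picks up the change.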

Even more significantly, if you later decide to change the design, you simply change the definition of the relevant design elements. If you have used a word processor to format the document as it was written, it is usually a painful task to go back and change the format.

Some word-processing programs, such as Microsoft WORD, include features for defining global document formats, but these features are not as widespread as they are in markup systems.

Printing

The formatting capabilities of a word-processing system are limited by what can be output on a printer. For example, some printers cannot backspace and therefore cannot underline. For this discussion, we are considering four different classes of printers: dot matrix, letter quality, phototypesetter, and laser.

A dot-matrix printer composes characters as a series of dots. It is usually suitable for preparing interoffice memos and obtaining fast printouts of large files.

A letter-quality printer is more expensive and slower. Its printing mechanism operates like a typewriter and achieves a similar result.

A letter-quality printer produces clearer, easier-to-read copy than a dot-matrix printer. Letter-quality printers are generally used in offices for formal correspondence as well as for the final drafts of proposals and reports.

Until very recently, documents that needed a higher quality of printing than that available with letter-quality printers were sent out for typesetting. Even if draft copy was word-processed, the material was often re-entered by the typesetter, although many typesetting companies can read the files created by popular word-processing programs and use them as a starting point for typesetting.

There are several major advantages to typesetting. The high resolution allows for the design of aesthetically pleasing type, and the shape of the characters is much finer. Where dot-matrix and letter-quality type is usually constant width (narrow letters like i take up the same amount of space as wide ones like m), typesetters use variable-width type, in which narrow letters take up less space than wide ones. It is also possible to mix styles (for example, bold and italic) and sizes of type on the same page.

Most typesetting equipment uses a markup language rather than a wysiwyg approach to specify point sizes, type styles, leading, and so on. Until recently, the technology didn't even exist to represent on a screen the variable-width typefaces that appear in published books and magazines.

AT&T, a company with its own extensive internal publishing operation, developed its own typesetting markup language and typesetting program—a sister to nroff called troff (typesetter-roff). Although troff extends the capabilities of nroff in significant ways, it is almost totally compatible with it.

Until recently, unless you had access to a typesetter, you didn't have much use for troff. The development of low-cost laser printers that can produce near typeset-quality output at a fraction of the cost has changed all that.

Word-processing software (particularly that developed for the Apple Macintosh, which has a high-resolution graphics screen capable of representing variable type fonts) is beginning to tap the capabilities of laser printers. However, most of the microcomputer-based packages still have many limitations, and a markup language such as that provided by troff remains the easiest and lowest-cost route into the world of electronic publishing for many types of documents.

The point made previously, that markup languages are preferable to wysiwyg systems for large documents, is especially true when you begin to use variable size fonts, leading, and other advanced formatting features. It is easy to lose track of the overall format of your document and difficult to make overall changes after your formatted text is in place. Only the most expensive electronic publishing systems (most of them based on advanced UNIX workstations) give you both the capability to see what you will get on the screen and the ability to define and easily change overall document formats.

Other UNIX Text-Processing Tools

Document editing and formatting are the most important parts of text processing, but they are not the whole story. For instance, in writing many types of documents, such as technical manuals, the writer rarely starts from scratch. Something is already written, whether it be a first draft written by someone else, a product specification, or an out-dated version of a manual. It would be useful to get a copy of that material to work with. If that material was produced with a word processor or has been entered on another system, UNIX's communications facilities can transfer the file from the remote system to your own.

Then you can use a number of custom-made programs to search through and extract useful information. Word-processing programs often store text in files with different internal formats. UNIX provides a number of useful analysis and translation tools that can help decipher files with nonstandard formats. Other tools allow you to "cut and paste" portions of a document into the one you are writing.

As the document is being written, there are programs to check spelling, style, and diction. The reports produced by those programs can help you see if there is any detectable pattern in syntax or structure that might make a document more difficult for the user than it needs to be.

Although many documents are written once and published or filed, there is also a large class of documents (manuals in particular) that are revised again and again. Documents such as these require special tools for managing revisions. UNIX program development tools such as SCCS (Source Code Control System) and diff can be used by writers to compare past versions with the current draft and print out reports of the differences, or generate printed copies with change bars in the margin marking the differences.
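A minimal sketch of comparing two versions of a draft with diff follows. The file names and their contents are invented for the example.

```shell
# Create two versions of a short draft (hypothetical contents).
printf '%s\n' 'Install the software.' 'Reboot the system.' > draft.old
printf '%s\n' 'Install the software.' 'Restart the system.' 'Log in again.' > draft.new

# Report the lines that differ. Lines from the old version are marked
# with <, lines from the new version with >. diff exits with status 1
# when the files differ, so "|| true" keeps a strict script from stopping.
diff draft.old draft.new || true
```

The report names the line numbers involved, so a reviewer can go straight to the changed passages.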

In addition to all of the individual tools it provides, UNIX is a particularly fertile environment for writers who aren't afraid of computers, because it is easy to write command files, or shell scripts, that combine individual programs into more complex tools to meet your specific needs. For example, automatic index generation is a complex task that is not handled by any of the standard UNIX text-processing tools. We will show you ways to perform this and other tasks by applying the tools available in the UNIX environment and a little ingenuity.
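As a small illustration of combining programs, the sketch below strings together standard utilities to list the most frequent words in a draft. The sample file and the choice of utilities are our own for this example; it is not one of the tools developed later in the book.

```shell
# A sample draft to work on (hypothetical contents).
printf '%s\n' 'the shell reads the command' 'the shell runs the program' > sample.txt

# 1. tr puts each word on its own line (every run of non-letters
#    becomes a single newline).
# 2. sort groups identical words together.
# 3. uniq -c counts each group.
# 4. sort -rn lists the most frequent words first.
# 5. head -3 keeps the top three lines.
tr -cs 'A-Za-z' '\n' < sample.txt | sort | uniq -c | sort -rn | head -3
```

Saved in a file, a pipeline like this becomes a new command that can be run on any document.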

We have two different objectives in this book. The first objective is that you learn to use many of the tools available on most UNIX systems. The second objective is that you develop an understanding of how these different tools can work together in a document preparation system. We're not just presenting a UNIX user's manual, but suggesting applications for which the various programs can be used.

To take full advantage of the UNIX text-processing environment, you must do more than just learn a few programs. For the writer, the job includes establishing standards and conventions about how documents will be stored, in what format they should appear in print, and what kinds of programs are needed to help this process take place efficiently with the use of a computer. Another way of looking at it is that you have to make certain choices prior to beginning a project. We want to encourage you to make your own choices, set your own standards, and realize the many possibilities that are open to a diligent and creative person.

In the past, many of the steps in creating a finished book were out of the hands of the writer. Proofreaders and copyeditors went over the text for spelling and grammatical errors. It was generally the printer who did the typesetting (a service usually paid for by the publisher). At the print shop, a typesetter (a person) retyped the text and specified the font sizes and styles. A graphic artist, performing layout and pasteup, made many of the decisions about the appearance of the printed page.

Although producing a high-quality book can still involve many people, UNIX provides the tools that allow a writer to control the process from start to finish. An analogy is the difference between an assembly worker on a production line who views only one step in the process and a craftsman who guides the product from beginning to end. The craftsman has his own system of putting together a product, whereas the assembly worker has the system imposed upon him.

After you are acquainted with the basic tools available in UNIX and have spent some time using them, you can design additional tools to perform work that you think is necessary and helpful. To create these tools, you will write shell scripts that use the resources of UNIX in special ways. We think there is a certain satisfaction that comes with accomplishing such tasks by computer. It seems to us to reward careful thought.

What programming means to us is that when we confront a problem that normally submits only to tedium or brute force, we think of a way to get the computer to solve the problem. Doing this often means looking at the problem in a more general way and solving it in a way that can be applied again and again.

One of the most important books on UNIX is The UNIX Programming Environment by Brian W. Kernighan and Rob Pike. They write that what makes UNIX effective "is an approach to programming, a philosophy of using the computer." At the heart of this philosophy "is the idea that the power of a system comes more from the relationships among programs than from the programs themselves."

When we talk about building a document preparation system, it is this philosophy that we are trying to apply. As a consequence, this is a system that has great flexibility and gives the builders a feeling of breaking new ground. The UNIX text-processing environment is a system that can be tailored to the specific tasks you want to accomplish. In many instances, it can let you do just what a word processor does. In many more instances, it lets you use more of the computer to do things that a word processor either can't do or can't do very well.

Chapter 2

UNIX Fundamentals

The UNIX operating system is a collection of programs that controls and organizes the resources and activities of a computer system. These resources consist of hardware such as the computer's memory, various peripherals such as terminals, printers, and disk drives, and software utilities that perform specific tasks on the computer system. UNIX is a multiuser, multitasking operating system that allows the computer to perform a variety of functions for many users. It also provides users with an environment in which they can access the computer's resources and utilities. This environment is characterized by its command interpreter, the shell.

In this chapter, we review a set of basic concepts for users working in the UNIX environment. As we mentioned in the preface, this book does not replace a general introduction to UNIX. A complete overview is essential to anyone not familiar with the file system, input and output redirection, pipes and filters, and many basic utilities. In addition, there are different versions of UNIX, and not all commands are identical in each version. In writing this book, we've used System V Release 2 on a Convergent Technologies' Miniframe.

These disclaimers aside, if it has been a while since you tackled a general introduction, this chapter should help refresh your memory. If you are already familiar with UNIX, you can skip or skim this chapter.

As we explain these basic concepts, using a tutorial approach, we demonstrate the broad capabilities of UNIX as an applications environment for text-processing. What you learn about UNIX in general can be applied to performing specific tasks related to text-processing.

The UNIX Shell

As an interactive computer system, UNIX provides a command interpreter called a shell. The shell accepts commands typed at your terminal, invokes a program to perform specific tasks on the computer, and handles the output or result of this program, normally directing it to the terminal's video display screen.

UNIX commands can be simple one-word entries like the date command:

$ date
Tue Apr 8 13:23:41 EST 1987

Or their usage can be more complex, requiring that you specify options and arguments, such as filenames. Although some commands have a peculiar syntax, many UNIX commands follow this general form:

command option(s) argument(s)

A command identifies a software program or utility. Commands are entered in lowercase letters. One typical command, ls, lists the files that are available in your immediate storage area, or directory.

An option modifies the way in which a command works. Usually options are indicated by a minus sign followed by a single letter. For example, ls -l modifies what information is displayed about a file. The set of possible options is particular to the command and generally only a few of them are regularly used. However, if you want to modify a command to perform in a special manner, be sure to consult a UNIX reference guide and examine the available options.

An argument can specify an expression or the name of a file on which the command is to act. Arguments may also be required when you specify certain options. In addition, if more than one filename is being specified, special metacharacters (such as * and ?) can be used to represent the filenames. For instance, ls -l ch* will display information about all files that have names beginning with ch.

The UNIX shell is itself a program that is invoked as part of the login process. When you have properly identified yourself by logging in, the UNIX system prompt appears on your terminal screen.

The prompt that appears on your screen may be different from the one shown in the examples in this book. There are two widely used shells: the Bourne shell and the C shell. Traditionally, the Bourne shell uses a dollar sign ($) as a system prompt, and the C shell uses a percent sign (%). The two shells differ in the features they provide and in the syntax of their programming constructs. However, they are fundamentally very similar. In this book, we use the Bourne shell.

Your prompt may be different from either of these traditional prompts. This is because the UNIX environment can be customized and the prompt may have been changed by your system administrator. Whatever the prompt looks like, when it appears, the system is ready for you to enter a command.

When you type a command from the keyboard, the characters are echoed on the screen. The shell does not interpret the command until you press the RETURN key. This means that you can use the erase character (usually the DEL or BACKSPACE key) to correct typing mistakes. After you have entered a command line, the shell tries to identify and locate the program specified on the command line. If the command line that you entered is not valid, then an error message is returned.

When a program is invoked and processing begun, the output it produces is sent to your screen, unless otherwise directed. To interrupt and cancel a program before it has completed, you can press the interrupt character (usually CTRL-C or the DEL key). If the output of a command scrolls by the screen too fast, you can suspend the output by pressing the suspend character (usually CTRL-S) and resume it by pressing the resume character (usually CTRL-Q).

Some commands invoke utilities that offer their own environment—with a command interpreter and a set of special "internal" commands. A text editor is one such utility, the mail facility another. In both instances, you enter commands while you are "inside" the program. In these kinds of programs, you must use a command to exit and return to the system prompt.

The return of the system prompt signals that a command is finished and that you can enter another command. Familiarity with the power and flexibility of the UNIX shell is essential to working productively in the UNIX environment.

Output Redirection

Some programs do their work in silence, but most produce some kind of result, or output. There are generally two types of output: the expected result—referred to as standard output—and error messages—referred to as standard error. Both types of output are normally sent to the screen and appear to be indistinguishable. However, they can be manipulated separately—a feature we will later put to good use.

Let's look at some examples. The echo command is a simple command that displays a string of text on the screen.

$ echo my name
my name

In this case, the input echo my name is processed and its output is my name. The name of the command—echo—refers to a program that interprets the command-line arguments as a literal expression that is sent to standard output. Let's replace echo with a different command called cat:

$ cat my name
cat: Cannot open my
cat: Cannot open name

The cat program takes its arguments to be the names of files. If these files existed, their contents would be displayed on the screen. Because the arguments were not filenames in this example, an error message was printed instead.

The output from a command can be sent to a file instead of the screen by using the output redirection operator (>). In the next example, we redirect the output of the echo command to a file named reminders.

$ echo Call home at 3:00 > reminders
$

No output is sent to the screen, and the UNIX prompt returns when the program is finished. Now the cat command should work because we have created a file.

$ cat reminders
Call home at 3:00

The cat command displays the contents of the file named reminders on the screen. If we redirect again to the same filename, we overwrite its previous contents:

$ echo Pick up expense voucher > reminders
$ cat reminders
Pick up expense voucher

We can send another line to the file, but we have to use a different redirect operator to append (>>) the new line at the end of the file:

$ echo Call home at 3:00 > reminders
$ echo Pick up expense voucher >> reminders
$ cat reminders
Call home at 3:00
Pick up expense voucher

The cat command is useful not only for printing a file on the screen, but for concatenating existing files (printing them one after the other). For example:

$ cat reminders todolist
Call home at 3:00
Pick up expense voucher
Proofread Chapter 2
Discuss output redirection

The combined output can also be redirected:

$ cat reminders todolist > do_now

The contents of both reminders and todolist are combined into do_now. The original files remain intact.

If one of the files does not exist, an error message is printed, even though standard output is redirected:

$ rm todolist
$ cat reminders todolist > do_now
cat: todolist: not found
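
Because standard output and standard error are separate streams, each can be redirected on its own. In the Bourne shell, 2> redirects standard error. The following sketch repeats the situation above; the extra file, problems, is our own addition for the example.

```shell
# Recreate the situation above: reminders exists, todolist does not.
echo Call home at 3:00 > reminders
rm -f todolist

# Standard output goes to do_now; the error message goes to problems
# instead of the screen. cat returns a nonzero status because one file
# is missing, so "|| true" lets a strict script continue.
cat reminders todolist > do_now 2> problems || true

cat do_now       # the contents of reminders
cat problems     # the error message; its exact wording varies by system
```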

The files we've created are stored in our current working directory.

Files and Directories

The UNIX file system consists of files and directories. Because the file system can contain thousands of files, directories perform the same function as file drawers in a paper file system. They organize files into more manageable groupings. The file system is hierarchical. It can be represented as an inverted tree structure with the root directory at the top. The root directory contains other directories that in turn contain other directories.*

On many UNIX systems, users store their files in the /usr file system. (As disk storage has become cheaper and larger, the placement of user directories is no longer standard. For example, on our system, /usr contains only UNIX software; user accounts are in a separate file system called /work.)

Fred's home directory is /usr/fred. It is the location of Fred's account on the system. When he logs in, his home directory is his current working directory. Your working directory is where you are currently located and changes as you move up and down the file system.

A pathname specifies the location of a directory or file on the UNIX file system. An absolute pathname specifies where a file or directory is located starting from the root directory (/). A relative pathname specifies the location of a file or directory in relation to the current working directory.

To find out the pathname of our current directory, enter pwd.

$ pwd
/usr/fred

The absolute pathname of the current working directory is /usr/fred. The ls command lists the contents of the current directory. Let's list the files and subdirectories in /usr/fred by entering the ls command with the -F option. This option prints a slash (/) following the names of subdirectories. In the following example, oldstuff is a directory, and notes and reminders are files.

$ ls -F
reminders
notes
oldstuff/

When you specify a filename with the ls command, it simply prints the name of the file, if the file exists. When you specify the name of a directory, it prints the names of the files and subdirectories in that directory.

$ ls reminders
reminders
$ ls oldstuff
ch01_draft
letter.212
memo

In this example, a relative pathname is used to specify oldstuff. That is, its location is specified in relation to the current directory, /usr/fred. You could also enter an absolute pathname, as in the following example:

$ ls /usr/fred/oldstuff
ch01_draft
letter.212
memo

Similarly, you can use an absolute or relative pathname to change directories using the cd command. To move from /usr/fred to /usr/fred/oldstuff, you can enter a relative pathname:

$ cd oldstuff
$ pwd
/usr/fred/oldstuff

The directory /usr/fred/oldstuff becomes the current working directory.

The cd command without an argument returns you to your home directory.

$ cd

When you log in, you are positioned in your home directory, which is thus your current working directory. The name of your home directory is stored in a shell variable that is accessible by prefacing the name of the variable (HOME) with a dollar sign ($). Thus:

$ echo $HOME
/usr/fred

You could also use this variable in pathnames to specify a file or directory in your home directory.

$ ls $HOME/oldstuff/memo
/usr/fred/oldstuff/memo

In this tutorial, /usr/fred is our home directory.

The command to create a directory is mkdir. An absolute or relative pathname can be specified.

$ mkdir /usr/fred/reports
$ mkdir reports/monthly

Setting up directories is a convenient method of organizing your work on the system. For instance, in writing this book, we set up a directory /work/textp and, under that, subdirectories for each chapter in the book (/work/textp/ch01, /work/textp/ch02, etc.). In each of those subdirectories, there are files that divide the chapter into sections (sect1, sect2, etc.). There is also a subdirectory set up to hold old versions or drafts of these sections.
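
A layout like the one just described can be created with a short loop. This is only a sketch: the top-level name textp_demo, the subdirectory name old, and the three chapters are placeholders, and mkdir -p (which creates any missing parent directories) is assumed to be available, as it is on current systems.

```shell
# Create one subdirectory per chapter, each with a place for old drafts.
for ch in ch01 ch02 ch03
do
    mkdir -p textp_demo/$ch/old
done

# Start an empty section file in the first chapter, then list the
# chapter directory; -F marks the subdirectory with a slash.
touch textp_demo/ch01/sect1
ls -F textp_demo/ch01
```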

*In addition to subdirectories, the root directory can contain other file systems. A file system is the skeletal structure of a directory tree, which is built on a magnetic disk before any files or directories are stored on it. On a system containing more than one disk, or on a disk divided into several partitions, there are multiple file systems. However, this is generally invisible to the user, because the secondary file systems are mounted on the root directory, creating the illusion of a single file system.

Copying and Moving Files

You can copy, move, and rename files within your current working directory or (by specifying the full pathname) within other directories on the file system. The cp command makes a copy of a file, and the mv command can be used to move a file to a new directory or simply rename it. If you give the name of a new or existing file as the last argument to cp or mv, the file named in the first argument is copied (or, with mv, renamed) and given the new name. (If the target file already exists, it will be overwritten.) If you give the name of a directory as the last argument to cp or mv, the file or files named first will be copied or moved to that directory, and will keep their original names.

Look at the following sequence of commands:

$ pwd                      Print working directory
/usr/fred
$ ls -F                    List contents of current directory
meeting
oldstuff/
notes
reports/
$ mv notes oldstuff        Move notes to oldstuff directory
$ ls                       List contents of current directory
meeting
oldstuff
reports
$ mv meeting meet.306      Rename meeting
$ ls oldstuff              List contents of oldstuff subdirectory
ch01_draft
letter.212
memo
notes

In this example, the mv command was used to rename the file meeting and to move the file notes from /usr/fred to /usr/fred/oldstuff. You can also use the mv command to rename a directory itself.
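
The sequence above uses only mv. A comparable sketch for cp follows, ending with mv used to rename a directory. The file and directory names are invented for the example, and the first line clears out any earlier run of it.

```shell
# Set up a file and a directory to work with (hypothetical names).
rm -rf cp_demo
mkdir -p cp_demo/archive
echo 'Agenda for the March meeting' > cp_demo/meeting

# Copy to a new name in the same directory; the original remains.
cp cp_demo/meeting cp_demo/meeting.bak

# Copy into a directory; the copy keeps the name "meeting".
cp cp_demo/meeting cp_demo/archive

# Rename the directory itself with mv.
mv cp_demo/archive cp_demo/oldstuff

ls -F cp_demo
ls cp_demo/oldstuff
```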

Permissions

Access to UNIX files is governed by ownership and permissions. If you create a file, you are the owner of the file and can set the permissions for that file to give or deny access to other users of the system. There are three different levels of permission:

r    Read permission allows users to read a file or make a copy of it.
w    Write permission allows users to make changes to that file.
x    Execute permission signifies a program file and allows other users to execute this program.

File permissions can be set for three different levels of ownership:

owner    The user who created the file is its owner.
group    A group to which you are assigned, usually made up of those users engaged in similar activities and who need to share files among themselves.
other    All other users on the system, the public.

Thus, you can set read, write, and execute permissions for the three levels of ownership. This can be represented as:

rwxrwxrwx
/   |  \
owner group other

When you enter the command ls -l, information about the status of the file is displayed on the screen. You can determine what the file permissions are, who the owner of the file is, and with what group the file is associated.

$ ls -l meet.306
-rw-rw-r-- 1 fred techpubs 126 March 6 10:32 meet.306

This file has read and write permissions set for the user fred and the group techpubs. All others can read the file, but they cannot modify it. Because fred is the owner of the file, he can change the permissions, making it available to others or denying them access to it. The chmod command is used to set permissions. For instance, if he wanted to make the file writeable by everyone, he would enter:

$ chmod o+w meet.306
$ ls -l meet.306
-rw-rw-rw- 1 fred techpubs 126 March 6 10:32 meet.306

This translates to "add write permission (+w) to others (o)." If he wanted to remove write permission from a file, keeping anyone but himself from accidentally modifying a finished document, he might enter:

$ chmod go-w meet.306
$ ls -l meet.306
-rw-r--r-- 1 fred techpubs 126 March 6 10:32 meet.306

This command removes write permission (-w) from group (g) and other (o).
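
The letters for who and for which permission can be combined in other ways. As a sketch, the following makes a file private to its owner and shows the permission string after each step. The file name private.txt is made up, and the u=rw,go=r form (which sets permissions exactly rather than adding or removing them) is a standard chmod feature not otherwise covered here.

```shell
echo 'salary figures' > private.txt

# Start from a known state: owner may read and write, everyone else read.
chmod u=rw,go=r private.txt
ls -l private.txt | cut -c1-10     # -rw-r--r--

# Remove read permission from group and other.
chmod go-r private.txt
ls -l private.txt | cut -c1-10     # -rw-------
```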

File permissions are important in UNIX, especially when you start using a text editor to create and modify files. They can be used to protect information you have on the system.

Special Characters

As part of the shell environment, there are a few special characters (metacharacters) that make working in UNIX much easier. We won't review all the special characters, but enough of them to make sure you see how useful they are.

The asterisk (*) and the question mark (?) are filename generation metacharacters. The asterisk matches any string of characters, including none. By itself, the asterisk expands to all the names in the specified directory.

$ echo *
meet.306 oldstuff reports

In this example, the echo command displays in a row the names of all the files and directories in the current directory. The asterisk can also be used as a shorthand notation for specifying one or more files.

$ ls meet*
meet.306
$ ls /work/textp/ch*
/work/textp/ch01
/work/textp/ch02
/work/textp/ch03
/work/textp/chapter_make

The question mark matches any single character.

$ ls /work/textp/ch01/sect?
/work/textp/ch01/sect1
/work/textp/ch01/sect2
/work/textp/ch01/sect3
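The difference between the two metacharacters is easiest to see when a longer name is present. The following sketch assumes a POSIX shell and the mktemp command; the file named section is an invented addition to the directory shown above.

```shell
# Build a small directory to match against.
scratch=$(mktemp -d)
mkdir "$scratch/ch01"
touch "$scratch/ch01/sect1" "$scratch/ch01/sect2" \
      "$scratch/ch01/sect3" "$scratch/ch01/section"

# The asterisk matches any run of characters, so "section" is included.
star=$(cd "$scratch/ch01" && echo sect*)

# The question mark matches exactly one character, so "section" is left out.
qmark=$(cd "$scratch/ch01" && echo sect?)

echo "$star"
echo "$qmark"
```

The shell, not the echo command, expands the pattern; echo simply prints the names it is handed.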

Besides filename metacharacters, there are other characters that have special meaning when placed in a command line. The semicolon (;) separates multiple commands on the same command line. Each command is executed in sequence from left to right, one after the other.

$ cd oldstuff;pwd;ls
/usr/fred/oldstuff
ch01_draft
letter.212
memo
notes

Another special character is the ampersand (&). The ampersand signifies that a command should be processed in the background, meaning that the shell does not wait for the program to finish before returning a system prompt. When a program takes a significant amount of processing time, it is best to have it run in the background so that you can do other work at your terminal in the meantime. We will demonstrate background processing in Chapter 4 when we look at the nroff/troff text formatter.
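Until then, a short sketch may help show what "not waiting" means. It assumes a POSIX shell and the mktemp command; the sleep command stands in for a long-running program, and the wait command and the $! variable are standard shell features that the text above does not cover.

```shell
scratch=$(mktemp -d)

# Start a slow job in the background; the shell continues immediately.
( sleep 1; echo "slow job finished" > "$scratch/slow.out" ) &
job=$!                     # process ID of the background job

# This command runs while the background job is still sleeping.
echo "prompt is back" > "$scratch/fast.out"

wait "$job"                # block until the background job completes
result=$(cat "$scratch/slow.out")
echo "$result"
```

At a terminal you would rarely need wait; it is used here only so that the script does not read the output file before the background job has written it.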

Environment Variables

The shell stores useful information about who you are and what you are doing in environment variables. Entering the set command will display a list of the environment variables that are currently defined in your account.

$ set
PATH .:bin:/usr/bin:/usr/local/bin:/etc
argv ()
cwd /work/textp/ch03
home /usr/fred
shell /bin/sh
status 0
TERM wy50

These variables can be accessed from the command line by prefacing their name with a dollar sign:

$ echo $TERM
wy50

The TERM variable identifies what type of terminal you are using. It is important that you correctly define the TERM environment variable, especially because the vi text editor relies upon it. Shell variables can be reassigned from the command line. Some variables, such as TERM, need to be exported if they are reassigned, so that they are available to all shell processes.

$ TERM=tvi925; export TERM      Tell UNIX I'm using a Televideo 925

You can also define your own environment variables for use in commands.

$ friends="alice ed ralph"
$ echo $friends
alice ed ralph

You could use this variable when sending mail.

$ mail $friends
A message to friends
<CTRL-D>

This command sends the mail message to three people whose names are defined in the friends environment variable. Pathnames can also be assigned to environment variables, shortening the amount of typing:

$ pwd
/usr/fred
$ book="/work/textp"
$ cd $book
$ pwd
/work/textp
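The effect of export can be checked by asking a child shell what it sees. This is a minimal sketch, assuming a POSIX shell; the ${book:-unset} form, which substitutes the word unset when the variable has no value, is used here only to make the result visible.

```shell
unset book                 # make sure nothing is inherited from the caller
book="/work/textp"         # a plain shell variable, as in the text

# A child process does not see a variable that has not been exported.
before=$(sh -c 'echo "${book:-unset}"')

export book

# After export, the variable is passed to every child process.
after=$(sh -c 'echo "${book:-unset}"')

echo "$before"
echo "$after"
```

The single quotes keep the current shell from expanding the variable, so the value printed is the one the child shell finds in its own environment.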

Pipes and Filters

Earlier we demonstrated how you can redirect the output of a command to a file. Normally, command input is taken from the keyboard and command output is displayed on the terminal screen. A program can be thought of as processing a stream of input and producing a stream of output. As we have seen, this stream can be redirected to a file. In addition, it can originate from or be passed to another command.

A pipe is formed when the output of one command is sent as input to the next command. For example:

$ ls | wc

might produce:

10 10 72

The ls command produces a list of filenames, which is provided as input to wc. The wc command counts the number of lines, words, and characters.
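The three counts can also be requested one at a time with the -l, -w, and -c options. In the sketch below, which assumes a POSIX shell, a made-up three-name listing stands in for the output of a directory listing so that the result is the same on any system; tr strips the padding that some versions of wc add.

```shell
# Three short lines of made-up input stand in for a directory listing.
listing() {
    printf 'ch01\nch02\nnotes\n'
}

lines=$(listing | wc -l | tr -d ' ')   # newline-terminated lines
words=$(listing | wc -w | tr -d ' ')   # whitespace-separated words
chars=$(listing | wc -c | tr -d ' ')   # characters, counting each newline

echo "$lines $words $chars"
```

Because each name is a single word on its own line, the line and word counts agree; the character count includes one newline per line.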

Any program that takes its input from another program, performs some operation on that input, and writes the result to the standard output is referred to as a filter. Most UNIX programs are designed to work as filters. This is one reason why UNIX programs do not print "friendly" prompts or other extraneous information to the user. Because all programs expect—and produce—only a data stream, that data stream can easily be processed by multiple programs in sequence.

One of the most common uses of filters is to process output from a command. Usually, the processing modifies it by rearranging it or reducing the amount of information it displays. For example:

$ who            List who is on the system, and at which terminal
peter	tty001	Mar 6 17:12
walter	tty003	Mar 6 13:51
chris	tty004	Mar 6 15:53
val	tty020	Mar 6 15:48
tim	tty005	Mar 4 17:23
ruth	tty006	Mar 6 17:02
fred	tty000	Mar 6 10:34
dale	tty008	Mar 6 15:26
$ who | sort      List the same information in alphabetic order
chris	tty004	Mar 6 15:53
dale	tty008	Mar 6 15:26
fred	tty000	Mar 6 10:34
peter	tty001	Mar 6 17:12
ruth	tty006	Mar 6 17:02
tim	tty005	Mar 4 17:23
val	tty020	Mar 6 15:48
walter	tty003	Mar 6 13:51
$

The sort program arranges lines of input in alphabetic or numeric order. It sorts lines alphabetically by default. Another frequently used filter, especially in text-processing environments, is grep, perhaps UNIX's most renowned program. The grep program selects lines containing a pattern:

$ who | grep tty001       Find out who is on terminal 1
peter tty001 Mar 6 17:12

One of the beauties of UNIX is that almost any program can be used to filter the output of any other. The pipe is the master key to building command sequences that go beyond the capabilities provided by a single program and allow users to create custom "programs" of their own to meet specific needs.
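As a small illustration of chaining more than two programs, the sketch below runs several filters over a fixed listing. It assumes a POSIX shell; sample_who is a hypothetical stand-in for who, used so that the output does not depend on who happens to be logged in, and head and cut are standard utilities not introduced in the text.

```shell
# Hypothetical stand-in for who, so the result is the same on every system.
sample_who() {
    cat <<'EOF'
peter tty001 Mar 6 17:12
walter tty003 Mar 6 13:51
chris tty004 Mar 6 15:53
val tty020 Mar 6 15:48
EOF
}

# Sort the listing, then keep only its first line.
first=$(sample_who | sort | head -n 1)

# Select the line for terminal 1, then keep only the login name.
on_tty=$(sample_who | grep tty001 | cut -d' ' -f1)

echo "$first"
echo "$on_tty"
```

Each program in the pipeline sees only the lines passed on by the one before it, which is why the order of the filters matters.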

If a command line gets too long to fit on a single screen line, simply type a backslash followed by a carriage return, or (if a pipe symbol comes at the appropriate place) a pipe symbol followed by a carriage return. Instead of executing the command, the shell will give you a secondary prompt (usually >) so you can continue the line:

$ echo This is a long line shown here as demonstration |
> wc
	1 9 48

This feature works in the Bourne shell only.
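Both forms of continuation work the same way inside a script file. The following is a minimal sketch, assuming a POSIX shell; the sample text is invented, and tr is used only to strip the padding that some versions of wc add.

```shell
# A trailing pipe symbol lets the pipeline continue on the next line.
words=$(echo one two three |
    wc -w | tr -d ' ')

# A backslash before the newline joins two lines into one command.
joined=$(echo first half \
    second half)

echo "$words"
echo "$joined"
```

In a script no secondary prompt appears; the shell simply reads the next line as the rest of the command.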

Shell Scripts

A shell script is a file that contains a sequence of UNIX commands. Part of the flexibility of UNIX is that anything you enter from the terminal can be put in a file and executed. To give a simple example, we'll assume that the last command example (grep) has been stored in a file called whoison:

$ cat whoison
who | grep tty001

The permissions on this file must be changed to make it executable. After a file is made executable, its name can be entered as a command.

$ chmod +x whoison
$ ls -l whoison
-rwxrwxr-x 1 fred	doc	123 Mar 6 17:34 whoison
$ whoison
peter	tty001	Mar 6 17:12

Shell scripts can do more than simply function as a batch command facility. The basic constructs of a programming language are available for use in a shell script, allowing users to perform a variety of complicated tasks with relatively simple programs.

The simple shell script shown above is not very useful because it is too specific. However, instead of specifying the name of a single terminal line in the file, we can read the name as an argument on the command line. In a shell script, $1 represents the first argument on the command line.

$ cat whoison
who | grep $1

Now we can find who is logged on to any terminal:

$ whoison tty004
chris	tty004	Mar 6 15:53
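One possible refinement is to have the script complain when the argument is missing, and to put $1 in double quotes so that the argument is passed to grep as a single word. The sketch below assumes a POSIX shell and the mktemp command. The usage message and the scratch directory are our own inventions, and a two-line stand-in for who is placed first in the search path so that the result is predictable.

```shell
scratch=$(mktemp -d)

# Hypothetical stand-in for who, so the result is the same on every system.
cat > "$scratch/who" <<'EOF'
#!/bin/sh
echo "peter tty001 Mar 6 17:12"
echo "chris tty004 Mar 6 15:53"
EOF

# The script from the text, with a usage check added and $1 quoted.
cat > "$scratch/whoison" <<'EOF'
#!/bin/sh
if [ $# -ne 1 ]; then
    echo "usage: whoison terminal" >&2
    exit 1
fi
who | grep "$1"
EOF

# Make both files executable and put the scratch directory first in PATH.
chmod +x "$scratch/who" "$scratch/whoison"
PATH="$scratch:$PATH"
export PATH

found=$(whoison tty004)
echo "$found"
```

The variable $# holds the number of arguments given on the command line, so the test catches both a missing argument and too many of them.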

Later in this book, we will look at shell scripts in detail. They are an important part of the writer's toolbox, because they provide the "glue" for users of the UNIX system—the mechanism by which all the other tools can be made to work together.