
1.4.2 Text processing and word processing

Almost every computer user has a need for some kind of document preparation system. (How many computer enthusiasts do you know who still use pen and paper? Not many, we'll wager.) In the PC world, word processing is the norm: it involves editing and manipulating text (often in a ``What-You-See-Is-What-You-Get'' environment) and producing printed copies of the text, complete with figures, tables, and other garnishes.  

In the UNIX world, text processing is much more common, and it is quite different from the classical concept of word processing. With a text processing system, the author enters the text in a ``typesetting language'', which describes how the text should be formatted. Instead of entering the text within a special word processing environment, the author may modify the source with any text editor, such as vi or Emacs. Once the source text (in the typesetting language) is complete, the user formats it with a separate program, which converts the source to a format suitable for printing. This is somewhat analogous to programming in a language such as C and ``compiling'' the document into a printable form.

There are many text processing systems available for Linux. One is groff, the GNU version of the classic nroff text formatter originally developed by Bell Labs and still used on many UNIX systems worldwide. Another modern text processing system is TeX, developed by Donald Knuth of computer science fame. Dialects of TeX, such as LaTeX, are also available.

Text processors such as TeX and groff differ mostly in the syntax of their formatting languages. The choice of one formatting system over another also depends on which utilities are available to satisfy your needs, as well as on personal taste.

For example, some people consider the groff formatting language to be a bit obscure, so they use TeX, which is more readable by humans. On the other hand, groff is capable of producing plain ASCII output, viewable on a terminal, while TeX is intended primarily for output to a printing device. Various programs exist, however, to produce plain ASCII from TeX-formatted documents, or to convert TeX to groff, for example.

Another text processing system is texinfo, an extension to TeX used for software documentation by the Free Software Foundation. texinfo is capable of producing either a printed document or an online-browsable hypertext ``Info'' document from a single source file. Info files are the main format of documentation used by GNU software such as Emacs.

Text processors are used widely in the computing community for producing papers, theses, magazine articles, and books (in fact, this book was produced using LaTeX). The ability to process the source language as a plain text file opens the door to many extensions to the text processor itself. Because source documents are not stored in an obscure format, readable only by a particular word processor, programmers are able to write parsers and translators for the formatting language, extending the system.

What does such a formatting language look like? In general, the formatting language source consists mostly of the text itself, along with ``control codes'' to produce a particular effect, such as changing fonts, setting margins, creating lists, and so on.

As an example, take the following text:

Mr. Torvalds:

We are very upset with your current plans to implement post-hypnotic suggestion in the Linux terminal driver code. We feel this way for three reasons:

  1. Planting subliminal messages in the terminal driver is not only immoral, it is a waste of time;
  2. It has been proven that ``post-hypnotic suggestions'' are ineffective when used upon unsuspecting UNIX hackers;
  3. We have already implemented high-voltage electric shocks, as a security measure, in the code for login.

We hope you will reconsider.

This text would appear in the LaTeX formatting language as the following:

\begin{quote}
Mr. Torvalds: 

We are very upset with your current plans to implement {\em post-hypnotic
suggestion\/} in the {\bf Linux} terminal driver code. We feel this
way for three reasons:
\begin{enumerate}
\item Planting subliminal messages in the terminal driver is not only
      immoral, it is a waste of time;
\item It has been proven that ``post-hypnotic suggestions'' are ineffective
      when used upon unsuspecting UNIX hackers;
\item We have already implemented high-voltage electric shocks, as a 
      security measure, in the code for {\tt login}.
\end{enumerate}
We hope you will reconsider. 
\end{quote}

The author enters the above ``source'' text using any text editor, and generates the formatted output by processing the source with LaTeX. At first glance, the typesetting language may appear to be obscure, but it's actually quite easy to learn. Using a text processing system enforces typographical standards when writing. For example, all enumerated lists within a document will look the same, unless the author modifies the definition of the enumerated list ``environment''. The primary goal is to allow the author to concentrate on writing the actual text, instead of worrying about typesetting conventions.
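To illustrate what modifying the definition of the enumerated list environment might look like, here is a minimal sketch of a complete LaTeX file. It assumes a current LaTeX (LaTeX2e) installation; the choice of lettered labels and the list items themselves are ours, purely for illustration.

```latex
\documentclass{article}

% Assumption for this sketch: we want every top-level enumerated
% list labelled (a), (b), (c) instead of the default 1., 2., 3.
% \theenumi controls how the item counter is printed;
% \labelenumi controls the label placed in front of each item.
\renewcommand{\theenumi}{\alph{enumi}}
\renewcommand{\labelenumi}{(\theenumi)}

\begin{document}

We feel this way for three reasons:
\begin{enumerate}
\item The first reason;
\item The second reason;
\item The third reason.
\end{enumerate}

\end{document}
```

Because the two redefinitions sit in the preamble, they apply to every top-level enumerated list in the file, and the text of the lists themselves never has to change.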

WYSIWYG word processors are attractive for many reasons; they provide a powerful (and sometimes complex) visual interface for editing the document. However, this interface is inherently limited to those aspects of text layout which are accessible to the user. For example, many word processors provide a special ``format language'' for producing complicated expressions such as mathematical formulae. This is identical to text processing, albeit on a much smaller scale.
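For comparison, here is a minimal sketch of how a mathematical formula is written directly in LaTeX source. It assumes a current LaTeX installation; the quadratic formula is only our choice of illustration.

```latex
\documentclass{article}
\begin{document}

% Inline math is enclosed in $...$; a displayed formula goes
% between \[ and \].
The roots of $ax^2 + bx + c = 0$ are
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\]

\end{document}
```

The formula is described by its structure (a fraction, a square root, superscripts), and the formatter decides the spacing and sizes.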

The subtle benefit of text processing is that the system allows you to specify exactly what you mean. Also, text processing systems allow you to edit the source text with any text editor, and the source is easily converted to other formats. The tradeoff for this flexibility and power is the lack of a WYSIWYG interface.

Many users of word processors are used to seeing the formatted text as they edit it. On the other hand, when writing with a text processor, one generally does not worry about how the text will appear when formatted. The writer learns to expect how the text should look from the formatting commands used in the source.

There are programs which allow you to view the formatted document on a graphics display before printing. For example, the xdvi program runs under the X Window System and displays a ``device independent'' file generated by the TeX system. Other software applications, such as xfig, provide a WYSIWYG graphics interface for drawing figures and diagrams, which are subsequently converted to the text processing language for inclusion in your document.

Admittedly, text processors such as nroff were around long before word processing was available. However, many people still prefer to use text processing, because it is more versatile and independent of a graphics environment. In any case, the idoc word processor is also available for Linux, and before long we expect to see commercial word processors becoming available as well. If you absolutely don't want to give up word processing for text processing, you can always run MS-DOS, or some other operating system, in addition to Linux.

There are many other text-processing-related utilities available. The powerful METAFONT system, used for designing fonts for TeX, is included with the Linux port of TeX. Other programs include ispell, an interactive spell checker and corrector; makeindex, used for generating indices in LaTeX documents; as well as many groff- and TeX-based macro packages for formatting many types of documents and mathematical texts. Conversion programs to translate TeX or groff source to a myriad of other formats are also available.
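As a rough sketch of how makeindex fits into the process, the following LaTeX file marks two terms for the index. It assumes a current LaTeX installation with the standard makeidx package; the file name paper.tex mentioned below and the index entries are hypothetical.

```latex
\documentclass{article}
\usepackage{makeidx}  % provides \printindex
\makeindex            % write raw index entries to an .idx file

\begin{document}

Text processors\index{text processor} such as
groff\index{groff} format a document from its source.

% The sorted index produced by makeindex is typeset here.
\printindex

\end{document}
```

If this file were saved as paper.tex, one would typically run latex on it, then run makeindex paper to sort the collected entries, and finally run latex again so that the finished index appears in the output.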

 






Matt Welsh
mdw@sunsite.unc.edu