The Art of Unix Programming

Eric Steven Raymond

This book and its on-line version are distributed under the terms of the Creative Commons Attribution-NoDerivs 1.0 license, with the additional proviso that the right to publish it on paper for sale or other for-profit use is reserved to Pearson Education, Inc. A reference copy of this license may be found at http://creativecommons.org/licenses/by-nd/1.0/legalcode.

AIX, AS/400, DB/2, OS/2, System/360, MVS, VM/CMS, and IBM PC are trademarks of IBM. Alpha, DEC, VAX, HP-UX, PDP, TOPS-10, TOPS-20, VMS, and VT-100 are trademarks of Compaq. Amiga and AmigaOS are trademarks of Amiga, Inc. Apple, Macintosh, MacOS, Newton, OpenDoc, and OpenStep are trademarks of Apple Computers, Inc. ClearCase is a trademark of Rational Software, Inc. Ethernet is a trademark of 3COM, Inc. Excel, MS-DOS, Microsoft Windows, and PowerPoint are trademarks of Microsoft, Inc. Java, J2EE, JavaScript, NeWS, and Solaris are trademarks of Sun Microsystems. SPARC is a trademark of SPARC International. Informix is a trademark of Informix Software. Itanium is a trademark of Intel. Linux is a trademark of Linus Torvalds. Netscape is a trademark of AOL. PDF and PostScript are trademarks of Adobe, Inc. UNIX is a trademark of The Open Group.

The photograph of Ken and Dennis in Chapter 2 appears courtesy of Bell Labs/Lucent Technologies.

The epigraph on the Portability chapter is from the Bell System Technical Journal, v57 #6 part 2 (July-Aug. 1978) pp. 2021-2048 and is reproduced with the permission of Bell Labs/Lucent Technologies.

Revision History
Revision 1.0 19 September 2003 esr
This is the content that went to Addison-Wesley’s printers.
Revision 0.4 5 February 2003 esr
Release for public review.
Revision 0.3 22 January 2003 esr
First eighteen-chapter draft. Manuscript walkthrough at Chapter 12. Limited release for early reviewers.
Revision 0.2 2 January 2003 esr
First manuscript walkthrough at Chapter 7. Released to Dmitry Kirsanov at AW production.
Revision 0.1 16 November 2002 esr
First DocBook draft, fifteen chapters. Languages rewritten to incorporate lots of feedback. Transparency, Modularity, Multiprogramming, Configuration, Interfaces, Documentation, and Open Source chapters released. Shipped to Mark Taub at AW.
Revision 0.0 1999 esr
Public HTML draft, first four chapters only.
Dedication

To Ken Thompson and Dennis Ritchie, because you inspired me.

Table of Contents

Preface
Who Should Read This Book
How to Use This Book
Related References
Conventions Used in This Book
Our Case Studies
Author’s Acknowledgements
I. Context

  1. Philosophy
    Culture? What Culture?
    The Durability of Unix
    The Case against Learning Unix Culture
    What Unix Gets Wrong
    What Unix Gets Right
    Open-Source Software
    Cross-Platform Portability and Open Standards
    The Internet and the World Wide Web
    The Open-Source Community
    Flexibility All the Way Down
    Unix Is Fun to Hack
    The Lessons of Unix Can Be Applied Elsewhere
    Basics of the Unix Philosophy
    Rule of Modularity: Write simple parts connected by clean interfaces.
    Rule of Clarity: Clarity is better than cleverness.
    Rule of Composition: Design programs to be connected with other programs.
    Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
    Rule of Simplicity: Design for simplicity; add complexity only where you must.
    Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
    Rule of Transparency: Design for visibility to make inspection and debugging easier.
    Rule of Robustness: Robustness is the child of transparency and simplicity.
    Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
    Rule of Least Surprise: In interface design, always do the least surprising thing.
    Rule of Silence: When a program has nothing surprising to say, it should say nothing.
    Rule of Repair: Repair what you can — but when you must fail, fail noisily and as soon as possible.
    Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
    Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
    Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
    Rule of Diversity: Distrust all claims for one true way.
    Rule of Extensibility: Design for the future, because it will be here sooner than you think.
    The Unix Philosophy in One Lesson
    Applying the Unix Philosophy
    Attitude Matters Too
  2. History
    Origins and History of Unix, 1969–1995
    Genesis: 1969–1971
    Exodus: 1971–1980
    TCP/IP and the Unix Wars: 1980–1990
    Blows against the Empire: 1991–1995
    Origins and History of the Hackers, 1961–1995
    At Play in the Groves of Academe: 1961–1980
    Internet Fusion and the Free Software Movement: 1981–1991
    Linux and the Pragmatist Reaction: 1991–1998
    The Open-Source Movement: 1998 and Onward
    The Lessons of Unix History
  3. Contrasts
    The Elements of Operating-System Style
    What Is the Operating System’s Unifying Idea?
    Multitasking Capability
    Cooperating Processes
    Internal Boundaries
    File Attributes and Record Structures
    Binary File Formats
    Preferred User Interface Style
    Intended Audience
    Entry Barriers to Development
    Operating-System Comparisons
    VMS
    MacOS
    OS/2
    Windows NT
    BeOS
    MVS
    VM/CMS
    Linux
    What Goes Around, Comes Around
II. Design
  4. Modularity
    Encapsulation and Optimal Module Size
    Compactness and Orthogonality
    Compactness
    Orthogonality
    The SPOT Rule
    Compactness and the Strong Single Center
    The Value of Detachment
    Software Is a Many-Layered Thing
    Top-Down versus Bottom-Up
    Glue Layers
    Case Study: C Considered as Thin Glue
    Libraries
    Case Study: GIMP Plugins
    Unix and Object-Oriented Languages
    Coding for Modularity
  5. Textuality
    The Importance of Being Textual
    Case Study: Unix Password File Format
    Case Study: .newsrc Format
    Case Study: The PNG Graphics File Format
    Data File Metaformats
    DSV Style
    RFC 822 Format
    Cookie-Jar Format
    Record-Jar Format
    XML
    Windows INI Format
    Unix Textual File Format Conventions
    The Pros and Cons of File Compression
    Application Protocol Design
    Case Study: SMTP, the Simple Mail Transfer Protocol
    Case Study: POP3, the Post Office Protocol
    Case Study: IMAP, the Internet Message Access Protocol
    Application Protocol Metaformats
    The Classical Internet Application Metaprotocol
    HTTP as a Universal Application Protocol
    BEEP: Blocks Extensible Exchange Protocol
    XML-RPC, SOAP, and Jabber
  6. Transparency
    Studying Cases
    Case Study: audacity
    Case Study: fetchmail’s -v option
    Case Study: GCC
    Case Study: kmail
    Case Study: SNG
    Case Study: The Terminfo Database
    Case Study: Freeciv Data Files
    Designing for Transparency and Discoverability
    The Zen of Transparency
    Coding for Transparency and Discoverability
    Transparency and Avoiding Overprotectiveness
    Transparency and Editable Representations
    Transparency, Fault Diagnosis, and Fault Recovery
    Designing for Maintainability
  7. Multiprogramming
    Separating Complexity Control from Performance Tuning
    Taxonomy of Unix IPC Methods
    Handing off Tasks to Specialist Programs
    Pipes, Redirection, and Filters
    Wrappers
    Security Wrappers and Bernstein Chaining
    Slave Processes
    Peer-to-Peer Inter-Process Communication
    Problems and Methods to Avoid
    Obsolescent Unix IPC Methods
    Remote Procedure Calls
    Threads — Threat or Menace?
    Process Partitioning at the Design Level
  8. Minilanguages
    Understanding the Taxonomy of Languages
    Applying Minilanguages
    Case Study: sng
    Case Study: Regular Expressions
    Case Study: Glade
    Case Study: m4
    Case Study: XSLT
    Case Study: The Documenter’s Workbench Tools
    Case Study: fetchmail Run-Control Syntax
    Case Study: awk
    Case Study: PostScript
    Case Study: bc and dc
    Case Study: Emacs Lisp
    Case Study: JavaScript
    Designing Minilanguages
    Choosing the Right Complexity Level
    Extending and Embedding Languages
    Writing a Custom Grammar
    Macros — Beware!
    Language or Application Protocol?
  9. Generation
    Data-Driven Programming
    Case Study: ascii
    Case Study: Statistical Spam Filtering
    Case Study: Metaclass Hacking in fetchmailconf
    Ad-hoc Code Generation
    Case Study: Generating Code for the ascii Displays
    Case Study: Generating HTML Code for a Tabular List
  10. Configuration
    What Should Be Configurable?
    Where Configurations Live
    Run-Control Files
    Case Study: The .netrc File
    Portability to Other Operating Systems
    Environment Variables
    System Environment Variables
    User Environment Variables
    When to Use Environment Variables
    Portability to Other Operating Systems
    Command-Line Options
    The -a to -z of Command-Line Options
    Portability to Other Operating Systems
    How to Choose among the Methods
    Case Study: fetchmail
    Case Study: The XFree86 Server
    On Breaking These Rules
  11. Interfaces
    Applying the Rule of Least Surprise
    History of Interface Design on Unix
    Evaluating Interface Designs
    Tradeoffs between CLI and Visual Interfaces
    Case Study: Two Ways to Write a Calculator Program
    Transparency, Expressiveness, and Configurability
    Unix Interface Design Patterns
    The Filter Pattern
    The Cantrip Pattern
    The Source Pattern
    The Sink Pattern
    The Compiler Pattern
    The ed Pattern
    The Roguelike Pattern
    The ‘Separated Engine and Interface’ Pattern
    The CLI Server Pattern
    Language-Based Interface Patterns
    Applying Unix Interface-Design Patterns
    The Polyvalent-Program Pattern
    The Web Browser as a Universal Front End
    Silence Is Golden
  12. Optimization
    Don’t Just Do Something, Stand There!
    Measure before Optimizing
    Nonlocality Considered Harmful
    Throughput vs. Latency
    Batching Operations
    Overlapping Operations
    Caching Operation Results
  13. Complexity
    Speaking of Complexity
    The Three Sources of Complexity
    Tradeoffs between Interface and Implementation Complexity
    Essential, Optional, and Accidental Complexity
    Mapping Complexity
    When Simplicity Is Not Enough
    A Tale of Five Editors
    ed
    vi
    Sam
    Emacs
    Wily
    The Right Size for an Editor
    Identifying the Complexity Problems
    Compromise Doesn’t Work
    Is Emacs an Argument against the Unix Tradition?
    The Right Size of Software
III. Implementation
  14. Languages
    Unix’s Cornucopia of Languages
    Why Not C?
    Interpreted Languages and Mixed Strategies
    Language Evaluations
    C
    C++
    Shell
    Perl
    Tcl
    Python
    Java
    Emacs Lisp
    Trends for the Future
    Choosing an X Toolkit
  15. Tools
    A Developer-Friendly Operating System
    Choosing an Editor
    Useful Things to Know about vi
    Useful Things to Know about Emacs
    The Antireligious Choice: Using Both
    Special-Purpose Code Generators
    yacc and lex
    Case Study: Glade
    make: Automating Your Recipes
    Basic Theory of make
    make in Non-C/C++ Development
    Utility Productions
    Generating Makefiles
    Version-Control Systems
    Why Version Control?
    Version Control by Hand
    Automated Version Control
    Unix Tools for Version Control
    Runtime Debugging
    Profiling
    Combining Tools with Emacs
    Emacs and make
    Emacs and Runtime Debugging
    Emacs and Version Control
    Emacs and Profiling
    Like an IDE, Only Better
  16. Reuse
    The Tale of J. Random Newbie
    Transparency as the Key to Reuse
    From Reuse to Open Source
    The Best Things in Life Are Open
    Where to Look?
    Issues in Using Open-Source Software
    Licensing Issues
    What Qualifies as Open Source
    Standard Open-Source Licenses
    When You Need a Lawyer
IV. Community
  17. Portability
    Evolution of C
    Early History of C
    C Standards
    Unix Standards
    Standards and the Unix Wars
    The Ghost at the Victory Banquet
    Unix Standards in the Open-Source World
    IETF and the RFC Standards Process
    Specifications as DNA, Code as RNA
    Programming for Portability
    Portability and Choice of Language
    Avoiding System Dependencies
    Tools for Portability
    Internationalization
    Portability, Open Standards, and Open Source
  18. Documentation
    Documentation Concepts
    The Unix Style
    The Large-Document Bias
    Cultural Style
    The Zoo of Unix Documentation Formats
    troff and the Documenter’s Workbench Tools
    TeX
    Texinfo
    POD
    HTML
    DocBook
    The Present Chaos and a Possible Way Out
    DocBook
    Document Type Definitions
    Other DTDs
    The DocBook Toolchain
    Migration Tools
    Editing Tools
    Related Standards and Practices
    SGML
    XML-DocBook References
    Best Practices for Writing Unix Documentation
  19. Open Source
    Unix and Open Source
    Best Practices for Working with Open-Source Developers
    Good Patching Practice
    Good Project- and Archive-Naming Practice
    Good Development Practice
    Good Distribution-Making Practice
    Good Communication Practice
    The Logic of Licenses: How to Pick One
    Why You Should Use a Standard License
    Varieties of Open-Source Licensing
    MIT or X Consortium License
    BSD Classic License
    Artistic License
    General Public License
    Mozilla Public License
  20. Futures
    Essence and Accident in Unix Tradition
    Plan 9: The Way the Future Was
    Problems in the Design of Unix
    A Unix File Is Just a Big Bag of Bytes
    Unix Support for GUIs Is Weak
    File Deletion Is Forever
    Unix Assumes a Static File System
    The Design of Job Control Was Badly Botched
    The Unix API Doesn’t Use Exceptions
    ioctl(2) and fcntl(2) Are an Embarrassment
    The Unix Security Model May Be Too Primitive
    Unix Has Too Many Different Kinds of Names
    File Systems Might Be Considered Harmful
    Towards a Global Internet Address Space
    Problems in the Environment of Unix
    Problems in the Culture of Unix
    Reasons to Believe
A. Glossary of Abbreviations
B. References
C. Contributors
D. Rootless Root
    Editor’s Introduction
    Master Foo and the Ten Thousand Lines
    Master Foo and the Script Kiddie
    Master Foo Discourses on the Two Paths
    Master Foo and the Methodologist
    Master Foo Discourses on the Graphical User Interface
    Master Foo and the Unix Zealot
    Master Foo Discourses on the Unix-Nature
    Master Foo and the End User
    List of Figures

2.1. The PDP-7.
3.1. Schematic history of timesharing.
4.1. Qualitative plot of defect count and density vs. module size.
4.2. Caller/callee relationships in GIMP with a plugin loaded.
6.1. Screen shot of audacity.
6.2. Screen shot of kmail.
6.3. Main window of a Freeciv game.
8.1. Taxonomy of languages.
11.1. The xcalc GUI.
11.2. Screen shot of the original Rogue game.
11.3. The Xcdroast GUI.
11.4. Caller/callee relationships in a polyvalent program.
13.1. Sources and kinds of complexity.
18.1. Processing structural documents.
18.2. Present-day XML-DocBook toolchain.
18.3. Future XML-DocBook toolchain with FOP.
List of Tables

8.1. Regular-expression examples.
8.2. Introduction to regular-expression operations.
14.1. Language choices.
14.2. Summary of X Toolkits.
List of Examples

5.1. Password file example.
5.2. A .newsrc example.
5.3. A fortune file example.
5.4. Basic data for three planets in a record-jar format.
5.5. An XML example.
5.6. A .INI file example.
5.7. An SMTP session example.
5.8. A POP3 example session.
5.9. An IMAP session example.
6.1. An example fetchmail -v transcript.
6.2. An SNG Example.
7.1. The pic2graph pipeline.
8.1. Glade Hello, World.
8.2. A sample m4 macro.
8.3. A sample XSLT program.
8.4. Taxonomy of languages — the pic source.
8.5. Synthetic example of a fetchmailrc.
8.6. RSA implementation using dc.
9.1. Example of fetchmailrc syntax.
9.2. Python structure dump of a fetchmail configuration.
9.3. copy_instance metaclass code.
9.4. Calling context for copy_instance.
9.5. ascii usage screen.
9.6. Desired output format for the star table.
9.7. Master form of the star table.
10.1. A .netrc example.
10.2. X configuration example.
18.1. groff(1) markup example.
18.2. man markup example.
19.1. tar archive maker production.