by Eric S. Raymond esr@thyrsus.com
One fine day in January 2017 I was reminded of something I had half-noticed a few times over the previous decade. That is, younger hackers don’t know the bit structure of ASCII and the meaning of the odder control characters in it.
This is knowledge every fledgling hacker used to absorb through their pores. It’s nobody’s fault this changed; the obsolescence of hardware terminals and the near-obsolescence of the RS-232 protocol are what did it. Tools generate culture; sometimes, when a tool becomes obsolete, a bit of cultural commonality quietly evaporates. It can be difficult to notice that this has happened.
This document is a collection of facts about ASCII and related technologies, notably hardware serial terminals and RS-232 and modems. This is lore that was at one time near-universal and is no longer. It’s not likely to be directly useful today - until you trip over some piece of still-functioning technology where it’s relevant (like a GPS puck), or it makes sense of some old-fart war story. Even so, it’s good to know anyway, for cultural-literacy reasons.
One thing this collection has that tends to be indefinite in the minds of older hackers is calendar dates. Those of us who lived through all this tend to remember the order of events and their dependencies, but not the exact timing; here, I did the research to pin a lot of that down. I’ve noticed that people have a tendency to retrospectively back-date the technologies that interest them, so even if you did live through the era it describes you might get a few surprises from reading this.
There are lots of references to Unix in here because I am mainly attempting to educate younger open-source hackers working on Unix-derived systems such as Linux and the BSDs. If those terms mean nothing to you, the rest of this document probably won’t either.
Hardware context
Nowadays, when two computers talk to each other, it’s usually via TCP/IP over some physical layer you seldom need to care much about. And a “terminal” is actually a “terminal emulator”, a piece of software that manages part of your display and itself speaks TCP/IP.
Before ubiquitous TCP/IP and bit-mapped displays things were very different. For most hackers that transition took place within a few years of 1992 - perhaps somewhat earlier if you had access to then-expensive workstation hardware.
Before then there were video display terminals - VDTs for short. In the mid-1970s these had displaced an earlier generation of printing terminals derived from really old technology called a “teletype”, which had evolved around 1900 from Victorian telegraph networks. The very earliest versions of Unix in the late 1960s were written for these printing terminals, in particular for the Teletype Model 33 (aka ASR-33); the “tty” that shows up in Unix device names was a then-common abbreviation for “teletype”.
In those pre-Internet days computers didn’t talk to each other much, and the way teletypes and terminals talked to computers was a hardware protocol called “RS-232” (or, if you’re being pedantic, “EIA RS-232C”) [1]. Before USB, when people spoke of a “serial” link, they meant RS-232, and sometimes referred to the equipment that spoke it as “serial terminals”.
RS-232 had a very long service life; it was developed in the early 1960s, not originally for computer use but as a way for teletypewriters to talk to modems. Though it has passed out of general use and is no longer common knowledge, it’s not quite dead even today.
I’ve been simplifying a bit here. There were other things besides RS-232 and serial terminals going on, notably on IBM mainframes. But they’ve left many fewer traces in current technology and its folklore. This is because the lineage of modern Unix passes back through a now-forgotten hardware category called a “minicomputer”, especially minicomputers made by the Digital Equipment Corporation. ASCII, RS-232 and serial terminals were part of the technology cluster around minicomputers - as was, for that matter, Unix itself.
Minicomputers were wiped out by workstations and workstations by descendants of the IBM PC, but hackers old enough to remember the minicomputer era (mid-1960s to mid-1980s) tend still to get a bit misty-eyed about the DEC hardware they cut their teeth on.
Often, however, nostalgia obscures how very underpowered those machines were. For example: a DEC VAX-11/780 minicomputer in the mid-1980s, used for timesharing and often supporting a dozen simultaneous users, had less than 1/1000 the processing power and less than 1/5000 as much storage as a low-end smartphone does in 2017.
In fact, until well into the 1980s microcomputers ran slowly enough (and had poor enough RF shielding) that this was common knowledge: you could put an AM radio next to one and get a clue when it was doing something unusual, because either fundamentals or major subharmonics of the clock frequencies were in the range of human audibility. Nothing has run that slowly since the turn of the 21st century.
The strange afterlife of the Hayes smartmodem
About those modems: the word is a portmanteau for “modulator/demodulator”. Modems allowed digital signals to pass over copper phone wires - ridiculously slowly by today’s standards, but that’s how we did our primitive wide-area networking in pre-Internet times. It was not generally known back then that modems had first been invented in the late 1950s for use in military communications, notably the SAGE air-defense network; we just took them for granted.
Today modems that speak over copper or optical fiber are embedded invisibly in the Internet access point in your basement; other varieties perform over-the-air signal handling for smartphones and tablets. A variety every hacker used to know about (and most of us owned) was the “outboard” modem, a separate box wired to your computer and your telephone line.
Inboard modems (expansion cards for your computer) were also known, but never sold as well because being located inside the case made them vulnerable to RF noise, and the blinkenlights on an outboard were useful for diagnosing problems. Also, most hackers learned to interpret (at least to some extent) modem song - the beeping and whooshing noises the outboards made while attempting to establish a connection. The happy song of a successful connect was identifiably different from various sad songs of synchronization failure.
These old-fashioned modems were, by today’s standards, unbelievably slow. Modem speeds increased from 110 bits per second back at the beginning of interactive computing to 56 kilobits per second just before the technology was effectively wiped out by wide-area Internet around the end of the 1990s, which brought in speeds of a megabit per second and more (20 times faster). For the longest stable period of modem technology after 1970, about 1984 to 1991, typical speed was 9600bps. This has left some traces; it’s why surviving serial-protocol equipment tends to default to a speed of 9600bps.
There was a line of modems called “Hayes Smartmodems” that could be told to dial a number, or set parameters such as line speed, with command codes sent to the modem over its serial link from the machine. Every hacker used to know the “AT” prefix used for commands and that, for example, ATDT followed by a phone number would dial the number. Other modem manufacturers copied the Hayes command set and variants of it became near universal after 1981.
What was not commonly known then is that the “AT” prefix had a helpful special property. That bit sequence (1+0 1000 0010 1+0 0010 1010 1+, where the plus suffix indicates one or more repetitions of the preceding bit) has a shape that makes it as easy as possible for a receiver to recognize it even if the receiver doesn’t know the transmit-line speed; this, in turn, makes it possible to automatically synchronize to that speed [2].
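If you want to see that pattern emerge for yourself, here is a minimal illustrative sketch in C that prints the on-wire bit sequence of a string under 8N1 framing (a 0 start bit, eight data bits least-significant-bit first, a 1 stop bit, with the line idling at 1 between characters):

    #include <stdio.h>

    /* Print the async-serial (8N1) wire bits for each byte of a string:
     * a 0 start bit, the eight data bits least-significant-bit first,
     * then a 1 stop bit.  The line idles at 1 between characters. */
    static void wire_bits(const char *s)
    {
        for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
            putchar('0');                                  /* start bit */
            for (int i = 0; i < 8; i++)
                putchar(((*p >> i) & 1) ? '1' : '0');      /* data bits, LSB first */
            putchar('1');                                  /* stop bit */
            putchar(' ');
        }
        putchar('\n');
    }

    int main(void)
    {
        wire_bits("AT");    /* prints: 0100000101 0001010101 */
        return 0;
    }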
That property is still useful, and thus in 2017 the AT convention has survived in some interesting places. AT commands have been found to perform control functions on 3G and 4G cellular modems used in smartphones. On one widely deployed variety, “AT+QLINUXCMD=” is a prefix that passes commands to an instance of Linux running in firmware on the chip itself (separately from whatever OS might be running visibly on the phone).
Preserving core values
From about 1955 to 1975 - before semiconductor memory - the dominant technology in computer memory used tiny magnetic doughnuts strung on copper wires. The doughnuts were known as “ferrite cores” and main memory thus known as “core memory” or “core”.
Unix terminology was formed in the early 1970s, and compounds like “in core” and “core dump” survived into the semiconductor era. Until as late as around 1990 it could still be assumed that every hacker knew from where these terms derived; even microcomputer hackers for whom memory had always been semiconductor RAM tended to pick up this folklore rapidly on contact with Unix.
After 2000, however, as multiprocessor systems became increasingly common even on desktops, “core” increasingly took on a conflicting meaning as shorthand for “processor core”. In 2017 “core” can still mean either thing, but the reason for the older usage is no longer generally understood and idioms like “in core” may be fading.
36-bit machines and the persistence of octal
There’s a power-of-two size hierarchy in memory units that we now think of as normal - 8-bit bytes, and 16-, 32-, or 64-bit words. But this did not become effectively universal until after 1983. There was an earlier tradition of designing computer architectures with 36-bit words.
There was a time when 36-bit machines loomed large in hacker folklore and some of the basics about them were ubiquitous common knowledge, though cultural memory of this era began to fade in the early 1990s. Two of the best-known 36-bitters were the DEC PDP-10 and the Symbolics 3600 Lisp machine. The cancellation of the PDP-10 in ‘83 proved to be the death knell for this class of machine, though the 3600 fought a rear-guard action for a decade afterwards.
Hexadecimal is a natural way to represent raw memory contents on machines with the power-of-two size hierarchy. But octal (base-8) representations of machine words were common on 36-bit machines, related to the fact that a 36-bit word divides evenly into 12 3-bit fields, each naturally represented as one octal digit. In fact, back then we generally assumed you could tell which of the 32- or 36-bit phyla a machine belonged in by whether you could see digits greater than 7 in a memory dump.
Here are a few things every hacker used to know that related to these machines:
36 bits was long enough to represent positive and negative integers to an accuracy of ten decimal digits, as was expected on mechanical calculators of the era. Standardization on 32 bits was unsuccessfully resisted by numerical analysts and people in scientific computing, who really missed those last four bits of accuracy.
A “character” might be 9 bits on these machines, with 4 packed to a word. Consequently, keyboards designed for them might have both a meta key to assert bit 8 and a now-extinct extra modifier key (usually but not always called “Super”) that asserted bit 9. Sometimes this selected a tier of non-ASCII characters including Greek letters and mathematical symbols.
It used also to be generally known that 36-bit architectures explained some unfortunate features of the C language. The original Unix machine, the PDP-7, featured 18-bit words corresponding to half-words on larger 36-bit computers. These were more naturally represented as six octal (3-bit) digits.
The immediate ancestor of C was an interpreted language written on the PDP-7 and named B. In it, a numeric literal beginning with 0 was interpreted as octal.
The PDP-7’s successor, and the first workhorse Unix machine, was the PDP-11 (first shipped in 1970). It had 16-bit words - but, due to some peculiarities of its instruction set, octal made more sense for its machine code as well. C, first implemented on the PDP-11, thus inherited the B octal syntax. And extended it: a backslash followed by a digit inside a string literal was expected to begin an octal escape.
The Interdata 32, VAX, and other later Unix platforms didn’t have those peculiarities; their opcodes were expressed more naturally in hex. But C was never adjusted to prefer hex, and the surprising interpretation of leading 0 wasn’t removed.
Because many later languages (Java, Python, etc) copied C’s low-level lexical rules for compatibility reasons, the relatively useless and sometimes dangerous octal syntax besets computing platforms for which three-bit opcode fields are wildly inappropriate, and may never be entirely eradicated [3].
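For illustration, here is what that inherited syntax looks like in modern C; the leading-0 integer form and the backslash-digit string escape are both octal:

    #include <stdio.h>

    int main(void)
    {
        int  n = 010;       /* leading 0 means octal: this is 8, not 10 */
        char c = '\101';    /* backslash-digit escape is octal: 0101 = 65 = 'A' */
        printf("%d %c\n", n, c);    /* prints "8 A" */
        return 0;
    }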
The PDP-11 was so successful that architectures strongly influenced by it (notably, including Intel [4] and ARM microprocessors) eventually took over the world, killing off 36-bit machines.
RS-232 and its discontents
A TCP/IP link generally behaves like a clean stream of 8-bit bytes (formally, octets). You get your data as fast as the network can run, and error detection/correction is done somewhere down below the layer you can see.
RS-232 was not like that. Two devices speaking it had to agree on a common line speed - and also on how the byte framing worked (the latter is why you’ll see references to “stop bits” in related documentation). Finally, error detection and correction was done in-stream, sort of. RS-232 devices almost always spoke ASCII, and used the fact that ASCII only filled 7 bits. The top bit might be, but was not always, used as a parity bit for error detection. If not used, the top bit could carry data.
You had to set your equipment at both ends for a specific combination of all of these. After about 1984 anything other than “8N1” - eight bits, no parity, one stop bit - became increasingly rare. Before that, all kinds of weird combinations were in use. Even parity (“E”) was more common than odd (“O”) and 1 stop bit more common than 2, but you could see anything come down a wire. And if you weren’t properly set up for it, all you got was “baud barf” - random 8-bit garbage rather than the character data you were expecting.
This, in particular, is one reason the API for the POSIX/Unix terminal interface, termios(3), has a lot of complicated options with no obvious modern-day function. It had to be able to manipulate all these settings, and more.
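As a minimal sketch of what wrangling those options looks like today (assuming a Linux-style device name such as /dev/ttyUSB0, which is only a placeholder here), this is roughly how you would set a port to 9600 8N1 with termios:

    #include <fcntl.h>
    #include <termios.h>
    #include <unistd.h>

    /* Configure an already-opened serial port for 9600bps, 8N1,
     * with XON/XOFF software flow control turned off. */
    static int set_9600_8n1(int fd)
    {
        struct termios tio;

        if (tcgetattr(fd, &tio) != 0)
            return -1;
        cfsetispeed(&tio, B9600);
        cfsetospeed(&tio, B9600);
        tio.c_cflag &= ~(PARENB | CSTOPB | CSIZE);   /* no parity, 1 stop bit, clear size bits */
        tio.c_cflag |= CS8 | CREAD | CLOCAL;         /* 8 data bits, ignore modem control lines */
        tio.c_iflag &= ~(IXON | IXOFF);              /* no Ctrl-S/Ctrl-Q flow control */
        return tcsetattr(fd, TCSANOW, &tio);
    }

    int main(void)
    {
        int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);   /* hypothetical device */
        if (fd >= 0) {
            set_9600_8n1(fd);
            close(fd);
        }
        return 0;
    }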
Another consequence was that passing binary data over an RS-232 link wouldn’t work if parity was enabled - the high bits would get clobbered. Other now-forgotten wide-area network protocols reacted even worse, treating in-band characters with the 0x80 bit set as control codes, with results ranging from amusing to dire. We had a term, “8-bit clean”, for networks and software that didn’t clobber the 0x80 bit. And we needed that term…
Even the RS-232 physical connector varied. Standard RS-232 as defined in 1962 used a roughly D-shaped shell with 25 physical pins (DB-25), way more than the physical protocol actually required (you can support a minimal version with just three wires, and this was actually common). Twenty years later, after the IBM PC-AT introduced it in 1984, most manufacturers switched to using a smaller DB-9 connector (which is technically a DE-9 but almost nobody ever called it that). If you look at a PC with a serial port it is most likely to be a DB-9; confusingly, DB-25 came to be used for printer parallel ports (which originally had a very different connector) before those too were obsolesced by USB and Ethernet.
Anybody who worked with this stuff had to keep around a bunch of specialized hardware - gender changers, DB-25-to-DB-9 adapters (and the reverse), breakout boxes, null modems, and other stuff I won’t describe in detail because it’s left no traces in today’s tech. Hackers of a certain age still tend to have these things cluttering their toolboxes or gathering dust in a closet somewhere.
The main reason to still care about any of this (other than understanding greybeard war stories) is that some kinds of sensor and control equipment, routers, and IoT devices still speak RS-232, often wrapped inside a USB emulation. The most common devices that do this are probably GPS sensors designed to talk to computers (as opposed to handheld GPSes or car-navigation systems).
Because of devices like GPSes, you may still occasionally need to know what an RS-232 “handshake line” is. These were originally used to communicate with modems; a terminal, for example, could change the state of the DTR (Data Terminal Ready) line to indicate that it was ready to receive, initiate, or continue a modem session.
Later, handshake lines were used for other equipment-specific kinds of out-of-band signals. The most commonly re-used lines were DCD (data carrier detect) and RI (Ring Indicator).
Three-wire versions of RS-232 omitted these handshake lines entirely. A chronic source of frustration was equipment at one end of your link that failed to supply an out-of-band signal that the equipment at the other end needed. The modern version of this is GPSes that fail to supply their 1PPS (a high-precision top-of-second pulse) over one of the handshake lines.
Another significant problem was that an RS-232 device not actually sending data was undetectable without analog-level monitoring equipment. You couldn’t tell a working but silent device from one that had come unplugged or suffered a connection fault in its wiring. This caused no end of complications when troubleshooting and is a major reason USB was able to displace RS-232 after 1994.
A trap for the unwary that opened up after about the year 2000 is that peripheral connectors labeled RS232 could have one of two different sets of voltage levels. If they’re pins or sockets in a DB9 or DB25 shell, the voltage swing between 1 and 0 bits can be as much as 50 volts, and is usually about 26. Bare connectors on a circuit board, or chip pins, increasingly came to use what’s called “TTL serial” - same signalling with a swing of 3.3 or (less often) 5 volts. You can’t wire standard RS232 to TTL serial directly; the link needs a device called a “level shifter”. If you connect without one, components on the TTL side will get fried.
RS-232 passed out of common knowledge in the mid- to late 1990s, but didn’t finally disappear from general-purpose computers until around 2010. Standard RS-232 is still widely used on industrial controllers and in some niche applications, including point-of-sale systems and diagnostic consoles on commercial-grade routers. The TTL serial variant is often used on maker devices.
UUCP and BBSes, the forgotten pre-Internets
Every hacker over a certain age remembers either UUCP or the BBS scene. Many participated in both. In those days access to the “real” net (ARPANET, which became Internet) was difficult if you weren’t affiliated with one of a select group of federal agencies, military contractors, or university research labs. So we made do with what we had, which was modems and the telephone network.
UUCP stands for Unix to Unix Copy Program. Between its escape from Bell Labs in 1979 and the mass-market Internet explosion of the mid-1990s, it provided slow but very low-cost networking among Unix sites using modems and the phone network.
UUCP was a store-and-forward system originally intended for propagating software updates, but its major uses rapidly became email and a thing called USENET (launched 1981) that was the ur-ancestor of Stack Overflow and other modern web fora. It supported topic groups for messages which, propagated from their point of origin through UUCP, would eventually flood to the whole network.
In part, UUCP and USENET were a hack around the two-tier rate structure that then existed for phone calls, with “local” being flat-rate monthly and “long-distance” being expensively metered by the minute. UUCP traffic could be relayed across long distances by local hops.
A direct descendant of USENET still exists, as Google Groups [5], but was much more central to the hacker culture before cheap Internet. Open source as we now know it germinated in USENET groups dedicated to sharing source code. Several conventions still in use today, like having project metadata files named README and NEWS and INSTALL, originated there in the early 1980s.
Two key dates in USENET history were universally known. One was the Great Renaming in 1987, when the name hierarchy of USENET topic groups was reorganized. The other was the “September that never ended” in 1993, when the AOL commercial timesharing service gave its users access to USENET. The resulting vast flood of newbies proved difficult to acculturate.
UUCP explains a quirk you may run across in old mailing-list archives: the bang-path address. UUCP links were point-to-point and you had to actually specify the route of your mail through the UUCP network; this led to people publishing addresses of the form “…!bigsite!foovax!barbox!user”, presuming that people who wanted to reach them would know how to reach bigsite.
UUCP was notoriously difficult to configure, enough so that people who knew how often put that skill on their CVs in the justified expectation that it could land them a job.
Meanwhile, in the microcomputer world, a different kind of store-and-forward evolved - the BBS (Bulletin-Board System). This was software running on a computer (after 1991 usually an MS-DOS machine) with one (or, rarely, more) attached modems that could accept incoming phone calls. Users (typically, just one user at a time!) would access the BBS using their own modem and a terminal program; the BBS software would allow them to leave messages for each other, upload and download files, and sometimes play games.
The first BBS, patterned after the community notice-board in a supermarket, was fielded in Chicago in 1978. Over the next eighteen years over a hundred thousand BBSes flashed in and out of existence, typically operated out of the sysop’s bedroom or garage with a spare computer.
From 1984 the BBS culture evolved a primitive form of internetworking called “FidoNet” that supported cross-site email and a forum system broadly resembling USENET.
During a very brief period after 1990, just before mass-market Internet, software with BBS-like capabilities but supporting multiple simultaneous modem users (and often offering USENET access) got written for low-cost Unix systems. The end-stage BBSes, when they survived, moved to the Web and dropped modem access. The history of cellar.org chronicles this period.
A handful of BBSes are still run by nostalgists, and some artifacts from the culture are still preserved. But, like the UUCP network, the BBS culture as a whole collapsed when inexpensive Internet became widely available.
Almost the only cultural memory of BBSes that survives is of a family of file-transfer protocols - XMODEM, YMODEM, and ZMODEM - developed and used on BBSes. For hackers of that day who did not cut their teeth on minicomputers with native TCP/IP, these were a first introduction to concepts like packetization, error detection, and retransmission. To this day, hardware from at least one commercial router vendor accepts software patches by XMODEM upload through a serial port.
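To give a flavor of how simple those protocols were, here is an illustrative reconstruction (a sketch, not canonical code) of the classic checksum-variant XMODEM data block: an SOH byte, a block number and its complement, 128 bytes of payload padded with SUB (0x1A), and a one-byte additive checksum.

    #include <stdio.h>
    #include <string.h>

    #define SOH 0x01    /* ASCII Start Of Heading marks an XMODEM data block */

    /* Build a classic (checksum-variant) XMODEM block into out[132]:
     * SOH, block number, its complement, 128 data bytes, additive checksum.
     * Short payloads are padded with 0x1A (SUB), the CP/M-style EOF byte. */
    static void xmodem_block(unsigned char out[132], unsigned char blknum,
                             const unsigned char *data, size_t len)
    {
        unsigned char sum = 0;

        out[0] = SOH;
        out[1] = blknum;
        out[2] = (unsigned char)(255 - blknum);
        memset(out + 3, 0x1A, 128);
        memcpy(out + 3, data, len > 128 ? 128 : len);
        for (int i = 0; i < 128; i++)
            sum = (unsigned char)(sum + out[3 + i]);
        out[131] = sum;
    }

    int main(void)
    {
        unsigned char blk[132];
        const unsigned char payload[] = "hello, world";

        xmodem_block(blk, 1, payload, sizeof(payload) - 1);
        printf("block 1 built, checksum byte = 0x%02X\n", blk[131]);
        return 0;
    }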
Also roughly contemporaneous with USENET and the BBS culture, and also destroyed or absorbed by cheap Internet, were some commercial timesharing services supporting dialup access by modem, of which the best known were AOL (America Online), CompuServe, and GEnie. These provided BBS-like facilities. Every hacker knew of these, though few used them. They have left no traces at all in today’s hacker culture.
Terminal confusion
The software terminal emulators on modern Unix systems are the near-end - and probably final - manifestations of a long and rather confused history. It began with early displays sometimes called “glass TTYs” because they emulated teletypes - but less expensively, because they didn’t require consumables like paper. The phrase “dumb terminal” is equivalent. The first of these was shipped in 1969. The best-remembered of them is probably still the ADM-3 from 1975.
The very earliest terminals, like the ASR-33 before the VDT era, could form only upper-case letters. An interesting hangover from these devices was that, even though most VDTs made after 1975 could form lower-case letters, for many years afterwards Unix and Linux responded to an all-upper-case login by switching to a mode which upcased all input. If you created an account with an all-upper-case login name and a mixed-case password, hilarity ensued. If the password was also upper-case the hilarity was less desperate but still confusing for the user.
The classic “smart terminal” VDT designs that have left a mark on later computing appeared during a relatively short period beginning in 1975. Devices like the Lear-Siegler ADM-3A (1976) and the DEC VT-100 (1978) inherited the 80-character line width of punched cards (longer than the 72-character line length of teletypes) and supported as many lines as could fit on an approximately 4:3 screen; they are the reason your software terminal emulator has a 24x80 or 25x80 default size.
These terminals were called “smart” because they could interpret control codes to do things like addressing the cursor to any point on the screen in order to produce truly 2-dimensional displays [6]. The ability to do bold, underline or reverse-video highlighting also rapidly became common. Colored text and backgrounds, however, only became available a few years before VDTs were obsolesced; before that displays were monochromatic. Some had crude, low-resolution dot graphics; a few types supported black-and-white vector graphics.
Early VDTs used a crazy variety of control codes. One of the principal relics of this era is the Unix terminfo database, which tracked these codes so terminal-using applications could do abstracted operations like “move the cursor” without being restricted to working with just one terminal type. The curses(3) library still used with software terminal emulators was originally intended to make this sort of thing easier.
After 1979 there was an ANSI standard for terminal control codes, based on the DEC VT-100 (being supported in the IBM PC’s original screen driver gave it a boost) [7]. By the early 1990s ANSI conformance was close to universal in VDTs, which is why that’s what your software terminal emulator does.
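For the curious, here is a tiny example of what those ANSI (ECMA-48) control sequences look like on the wire; any modern terminal emulator should honor them:

    #include <stdio.h>

    int main(void)
    {
        printf("\033[2J");                  /* ESC [ 2 J  : clear the whole screen           */
        printf("\033[5;10H");               /* ESC [ r;c H: move cursor to row 5, column 10  */
        printf("\033[1mbold\033[0m ");      /* ESC [ 1 m  : bold on; ESC [ 0 m: attributes off */
        printf("\033[7mreverse\033[0m\n");  /* ESC [ 7 m  : reverse video */
        return 0;
    }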
This whole technology category was rapidly wiped out in general-purpose computing, like dinosaurs after the Alvarez strike, when bit-mapped color displays on personal computers that could match the dot pitch of a monochrome VDT became relatively inexpensive, around 1992. The legacy VDT hardware lingered longest in dedicated point-of-sale systems, remaining not uncommon until as late as 2010 or so.
It’s not true, as is sometimes suggested, that heritage from the VDT era explains the Unix command line - that actually predated VDTs, going back to the last generation of printing terminals in the late 1960s and early 1970s. Every hacker once knew that this is why we often speak of “printing” output when we mean sending it to standard output that is normally connected to a terminal emulator.
What the VDT era does explain is some of our heritage games (see next section) and a few surviving utility programs like vi(1), top(1) and mutt(1). These are what advanced visual interfaces looked like in the VDT era, before bitmapped displays.
Games before GUIs
Before bit-mapped color displays became common and made graphics-intensive games the norm, there was a vigorous tradition of games that required only textual interfaces or the character-cell graphics on a VDT.
Every hacker once knew what the phrase “You are in a maze of twisty little passages, all alike” meant, and often used variants about confusing situations in real life (For example, “You are in a maze of twisty little technical standards, all different”). It was from the very first dungeon-crawling adventure game, Colossal Cave Adventure (1976). People who knew this game from its beginnings often thought of it as ADVENT, after its 6-character filename on the PDP-10 where it first ran.
ADVENT had a direct successor that was even more popular - Zork, first released in 1979 by hackers at MIT and later successfully commercialized. This game is why every hacker once knew that a zorkmid was the currency of the Great Underground Empire, and that if you wander around in dark places without your lantern lit you might be eaten by a grue.
There was another family of games that took a different, more visual approach to dungeon-crawling. They are generally called “roguelikes”, after one of the earliest widely-distributed games in this group, Rogue from 1980. They featured top-down, maplike views of dungeon levels through which the player would wander battling monsters and seeking treasure.
The most widely played games in this group were Hack (1982) and Nethack (1987). Nethack is notable for having been one of the earliest programs in which the development group was consciously organized as a distributed collaboration over the Internet; at the time, this was a sufficiently novel idea to be advertised in the project’s name.
These games gradually passed out of universal common knowledge after the mid-1990s, but they retain devoted minority followings today. Their fans accurately point out that the primitive state of interface design encouraged concentration on plot and story values, leading to a surprisingly rich imaginative experience.
ASCII
ASCII, the American Standard Code for Information Interchange, evolved in the early 1960s out of a family of character codes used on teletypes.
ASCII, unlike a lot of other early character encodings, is likely to live forever - because by design the low 128 code points of Unicode are ASCII. If you know what UTF-8 is (and you should), every ASCII file is correct UTF-8 as well.
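A minimal sketch of why that works: in UTF-8, code points up to U+007F encode as a single byte equal to their ASCII value, and every byte of a multibyte sequence has its high bit set, so plain ASCII bytes can never be misread. The toy encoder below (illustrative only, handling just code points below U+0800) shows the rule:

    #include <stdio.h>

    /* Toy UTF-8 encoder for code points below U+0800 - just enough to
     * show the design: anything below 0x80 is emitted as itself, so an
     * ASCII file is already valid UTF-8 byte-for-byte. */
    static int utf8_encode(unsigned int cp, unsigned char *out)
    {
        if (cp < 0x80) {                     /* ASCII range: one byte, unchanged */
            out[0] = (unsigned char)cp;
            return 1;
        }
        out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* two-byte sequence; */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* both bytes have the high bit set */
        return 2;
    }

    int main(void)
    {
        unsigned char buf[2];
        int n = utf8_encode('A', buf);
        printf("U+0041 -> %d byte(s), first byte 0x%02X\n", n, buf[0]);  /* 1 byte, 0x41 */
        return 0;
    }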
The following table describes ASCII-1967, the version in use today.
Dec Hex Char Dec Hex Char Dec Hex Char Dec Hex Char Dec Hex Char Dec Hex Char Dec Hex Char Dec Hex Char
0 00 NUL 16 10 DLE 32 20 SP 48 30 0 64 40 @ 80 50 P 96 60 ` 112 70 p
1 01 SOH 17 11 DC1 33 21 ! 49 31 1 65 41 A 81 51 Q 97 61 a 113 71 q
2 02 STX 18 12 DC2 34 22 " 50 32 2 66 42 B 82 52 R 98 62 b 114 72 r
3 03 ETX 19 13 DC3 35 23 # 51 33 3 67 43 C 83 53 S 99 63 c 115 73 s
4 04 EOT 20 14 DC4 36 24 $ 52 34 4 68 44 D 84 54 T 100 64 d 116 74 t
5 05 ENQ 21 15 NAK 37 25 % 53 35 5 69 45 E 85 55 U 101 65 e 117 75 u
6 06 ACK 22 16 SYN 38 26 & 54 36 6 70 46 F 86 56 V 102 66 f 118 76 v
7 07 BEL 23 17 ETB 39 27 ' 55 37 7 71 47 G 87 57 W 103 67 g 119 77 w
8 08 BS 24 18 CAN 40 28 ( 56 38 8 72 48 H 88 58 X 104 68 h 120 78 x
9 09 HT 25 19 EM 41 29 ) 57 39 9 73 49 I 89 59 Y 105 69 i 121 79 y
10 0A LF 26 1A SUB 42 2A * 58 3A : 74 4A J 90 5A Z 106 6A j 122 7A z
11 0B VT 27 1B ESC 43 2B + 59 3B ; 75 4B K 91 5B [ 107 6B k 123 7B {
12 0C FF 28 1C FS 44 2C , 60 3C < 76 4C L 92 5C \ 108 6C l 124 7C |
13 0D CR 29 1D GS 45 2D - 61 3D = 77 4D M 93 5D ] 109 6D m 125 7D }
14 0E SO 30 1E RS 46 2E . 62 3E > 78 4E N 94 5E ^ 110 6E n 126 7E ~
15 0F SI 31 1F US 47 2F / 63 3F ? 79 4F O 95 5F _ 111 6F o 127 7F DEL
It used to be common knowledge that the original 1963 ASCII had been slightly different. It lacked tilde and vertical bar; 5E was an up-arrow rather than a caret, and 5F was a left arrow rather than underscore. Some early adopters (notably DEC) held to the 1963 version.
If you learned your chops after 1990 or so, the mysterious part of this is likely the control characters, code points 0-31. You probably know that C uses NUL as a string terminator. Others, notably LF = Line Feed and HT = Horizontal Tab, show up in plain text. But what about the rest?
Many of these are remnants from teletype protocols that have either been dead for a very long time or, if still live, are completely unknown in computing circles. A few had conventional meanings that were half-forgotten even before Internet times. A very few are still used in binary data protocols today.
Here’s a tour of the meanings these had in older computing, or retain today. If you feel an urge to send me more, remember that the emphasis here is on what was common knowledge back in the day. If I don’t know it now, we probably didn’t generally know it then.
NUL (Null)
Survives as the string terminator in C.
SOH (Start of Heading)
Rarely used (as Ctrl-A) as a section divider in otherwise textual formats. Some versions of Unix mailbox format used it as a message divider. One very old version-control system (SCCS) did something similar.
STX (Start of Text), ETX (End of Text)
Very rarely used as packet or control-sequence delimiters. You will probably never see this, and the only place I’ve ever seen it was on a non-Unix OS in the early 1980s. ETX is Ctrl-C, which is a SIGINT interrupt character on Unix systems, but that has nothing to do with its ASCII meaning per se and probably derives from abbreviating the word “Cancel”.
EOT (End of Transmission)
As Ctrl-D, the way you type “End of file” to a Unix terminal.
ENQ (Enquiry)
In the days of hardware serial terminals, there was a convention that if a computer sent ENQ to a terminal, the terminal should reply with a type identification. While this was not universal, it at least gave the computer a fighting chance of autoconfiguring the capabilities it could assume the terminal to have.
ACK (Acknowledge)
It used to be common for wire protocols written in ASCII to use ENQ/ACK as a handshake, sometimes with NAK as a failure indication. Hackers used to use ACK in speech as “I hear you” and were a bit put out when this convention was disrupted in the 1980s by Bill The Cat’s “Ack! Thppt!”
BEL (Bell)
Make the bell ring on the teletype - an attention signal. This often worked on VDTs as well, but is no longer reliably the default on software terminal emulators. Some map it to a visual indication like flashing the title bar.
BS (Backspace)
Still does what it says on the tin, though there has been some historical confusion over whether the backspace key on a keyboard should behave like BS (nondestructive cursor move) or DEL (backspace and delete). Never used in textual data protocols.
HT (Horizontal tab)
Still does what it says on the tin. Sometimes used as a field separator in Unix textual file formats, but this is now old-fashioned and declining in usage.
LF (Line Feed)
The Unix textual end-of-line. Printing terminals interpreted it as “scroll down one line”; the Unix tty driver would normally wedge in a CR just before it on output.
VT (Vertical Tab)
In the days of printing terminals this often caused them to scroll down a configurable number of lines. VDTs had any number of possible behaviors; at least some pre-ANSI ones interpreted VT as “scroll up one line”. The only reason anybody remembers this one at all is that it persisted in Unix definitions of what a whitespace character is, even though it’s now extinct in the wild.
FF (Form Feed)
Eject the current page from your printing terminal. Many VDTs interpreted this as a “clear screen” instruction. Software terminal emulators sometimes still do.
CR (Carriage Return)
It is now possible that the reader has never seen a typewriter, so this needs explanation: “carriage return” is the operation of moving your print head or cursor to the left margin. Windows, other non-Unix operating systems, and some Internet protocols (such as SMTP) tend to use CR-LF as a line terminator, rather than bare LF. Pre-Unix MacOS used a bare CR.
SO (Shift Out), SI (Shift In)
Escapes to and from an alternate character set. Unix software used to emit them to drive pre-ANSI VDTs that interpreted them that way, but native Unix usage is rare to nonexistent.
DLE (Data Link Escape)
Sometimes used as a packet-framing character in binary protocols. That is, a packet starts with a DLE, ends with a DLE, and if one of the interior data bytes matches DLE it is doubled (there’s a small sketch of this framing after this list).
DC[1234] (Device Control [1234])
Never to my knowledge used specially after teletypes. However: there was a common software flow-control protocol, used over ASCII but separate from it, in which XOFF (DC3) was used as a request to pause transmission and XON (DC1) was used as a request to resume transmission. As Ctrl-S and Ctrl-Q these were implemented in the Unix terminal driver and long outlived their origin in the Model 33 Teletype. And not just Unix; this was implemented in CP/M and DOS, too.
NAK (Negative Acknowledge)
Never to my knowledge used specially after teletypes, but it would have been unsurprising to anybody if a device-control protocol used it as a negative response to ENQ.
SYN (Synchronous Idle)
Never to my knowledge used specially after teletypes. Be careful not to confuse this with the SYN (synchronization) packet used in TCP/IP’s SYN SYN-ACK initialization sequence.
ETB (End of Transmission Block)
Never to my knowledge used specially after teletypes.
CAN (Cancel), EM (End of Medium)
Never to my knowledge used specially after teletypes.
SUB (Substitute)
DOS and Windows use Ctrl-Z (SUB) as an end-of-file character; this is unrelated to its ASCII meaning. It was common knowledge then that this use of ^Z had been inherited from a now largely forgotten earlier OS called CP/M (1974), and into CP/M from earlier DEC minicomputer OSes such as RSX-11 (1972).
ESC (Escape)
Still commonly used as a control-sequence introducer. This usage is especially associated with the control sequences recognized by VT100 and ANSI-standard VDTs, and today by essentially all software terminal emulators.
[FGRU]S ({Field|Group|Record|Unit} Separator)
Never to my knowledge used specially after teletypes. FS, as Ctrl-\, sends SIGQUIT under some Unixes, but this has nothing to do with ASCII. Ctrl-] (GS) is the exit character from telnet, but this also has nothing to do with its ASCII meaning.
DEL (Delete)
Usually an input character meaning “backspace and delete”. Under older Unix variants, sometimes a SIGINT interrupt character.
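To make the DLE byte-stuffing convention mentioned above concrete, here is an illustrative sketch of how a sender might frame a packet, following the description given for DLE (a generic example, not any particular protocol):

    #include <stdio.h>
    #include <stddef.h>

    #define DLE 0x10    /* ASCII Data Link Escape */

    /* Frame a packet DLE-stuffing style: open with DLE, double any DLE
     * byte occurring in the payload, close with DLE.  Returns the number
     * of bytes written; out must have room for 2*len + 2 bytes. */
    static size_t dle_frame(const unsigned char *data, size_t len, unsigned char *out)
    {
        size_t n = 0;

        out[n++] = DLE;                      /* opening delimiter */
        for (size_t i = 0; i < len; i++) {
            out[n++] = data[i];
            if (data[i] == DLE)
                out[n++] = DLE;              /* escape a literal DLE by doubling it */
        }
        out[n++] = DLE;                      /* closing delimiter */
        return n;
    }

    int main(void)
    {
        const unsigned char payload[] = { 0x01, 0x10, 0x7F };   /* note the embedded DLE */
        unsigned char framed[2 * sizeof(payload) + 2];
        size_t n = dle_frame(payload, sizeof(payload), framed);

        for (size_t i = 0; i < n; i++)
            printf("%02X ", framed[i]);      /* prints: 10 01 10 10 7F 10 */
        putchar('\n');
        return 0;
    }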
Not all of these were so well known that any hacker could instantly map from mnemonic to binary, or vice-versa. The well-known set was roughly NUL, BEL, BS, HT, LF, FF, CR, ESC, and DEL.
There are a few other bits of ASCII lore worth keeping in mind…
The Control modifier on your keyboard basically clears the top three bits of whatever character you type, leaving the bottom five and mapping it to the 0..31 range. So, for example, Ctrl-SPACE, Ctrl-@, and Ctrl-` all mean the same thing: NUL.
A Meta or Alt key on a VDT added 128 to the ASCII keycode for whatever it’s modifying (probably - on a few machines with peculiar word lengths they did different things). Software terminal emulators have more variable behavior; many of them now simply insert an ESC before the modified key, which Emacs treats as equivalent to adding 128.
Very old keyboards used to do Shift just by toggling the 32 or 16 bit, depending on the key; this is why the relationship between small and capital letters in ASCII is so regular, and the relationship between numbers and symbols, and some pairs of symbols, is sort of regular if you squint at it. The ASR-33, which was an all-uppercase terminal, even let you generate some punctuation characters it didn’t have keys for by shifting the 16 bit; thus, for example, Shift-K (0x4B) became a [ (0x5B).
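Here is a tiny worked example of that bit arithmetic, using C’s character constants:

    #include <stdio.h>

    int main(void)
    {
        /* Control clears everything but the low five bits. */
        printf("Ctrl-@ = 0x%02X, Ctrl-` = 0x%02X\n", '@' & 0x1F, '`' & 0x1F);  /* both 0x00, NUL */
        printf("Ctrl-C = 0x%02X\n", 'C' & 0x1F);        /* 0x03, ETX */

        /* Shifting a letter toggles the 32 (0x20) bit... */
        printf("'a' ^ 0x20 = '%c'\n", 'a' ^ 0x20);      /* 'A' */
        /* ...and on the ASR-33, shifting K toggled the 16 (0x10) bit. */
        printf("'K' ^ 0x10 = '%c'\n", 'K' ^ 0x10);      /* '[' */
        return 0;
    }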
Key dates
These are dates that every hacker knew were important at the time, or shortly afterwards. I’ve tried to concentrate on milestones for which the date - or the milestone itself - seems to have later passed out of folk memory.
1961
MIT takes delivery of a PDP-1. The first recognizable ancestor of the hacker culture of today rapidly coalesces around it.
1969
Ken Thompson begins work on what will become Unix. First commercial VDT ships; it’s a glass TTY. First packets exchanged on the ARPANET, the direct ancestor of today’s Internet.
1970
DEC PDP-11 first ships; architectural descendants of this machine, including later Intel microprocessors, will come to dominate computing.
1973
Interdata 32 ships; the long 32-bit era begins [8]. Unix Edition 5 (not yet on the Interdata) escapes Bell Labs to take root at a number of educational institutions. The XEROX Alto pioneers the “workstation” - a networked personal computer with a high-resolution display and a mouse.
1975
First Altair 8800 ships. Beginning of heroic age of microcomputers. First 24x80 and 25x80 “smart” (addressable-cursor) VDTs. ARPANET declared “operational”, begins to spread to major universities.
1976
“Lions’ Commentary on UNIX 6th Edition, with Source Code” released. It was most hackers’ first look into the Unix kernel source, and a huge deal in those pre-open-source days.
1977
Unix ported to the Interdata. First version with a kernel written largely in C rather than machine-dependent assembler.
1978
First BBS launched - CBBS, in Chicago.
1981
First IBM PC ships. End of the heroic age of micros. TCP/IP is implemented on a VAX-11/780 under 4.1BSD Unix; ARPANET and Unix cultures begin to merge.
1982
Sun Microsystems founded. Era of commercial Unix workstations begins.
1983
PDP-10 canceled. This is effectively the end of 36-bit architectures anywhere outside of deep mainframe country, though Symbolics Lisp machines hold out a while longer. ARPANET, undergoing some significant technical changes, becomes Internet.
1984
AT&T begins a largely botched attempt to commercialize Unix, clamping down on access to source code. In the BBS world, FidoNet is invented.
1985
RMS publishes the GNU Manifesto. This is also roughly the year the C language became the dominant lingua franca of both systems and applications programming, eventually displacing earlier compiled languages so completely that they are almost forgotten.
1986
Intel 386 ships; end of the line for 8- and 16-bit PCs. Consumer-grade hardware in this class wouldn’t be generally available until around 1989, but after that it would rapidly surpass earlier 32-bit minicomputers and workstations in capability.
1991
Linux and the World Wide Web are (separately) launched.
1992
Bit-mapped color displays with a dot pitch matching that of a monochrome VDT (and a matching ability to display crisp text at 80x25) ship on consumer-grade PCs. Bottom falls out of the VDT market.
1993
Linux gets TCP/IP capability, moves from hobbyist’s toy to serious OS. America Online offers USENET access to its users; “September That Never Ended” begins.
1994
Mass-market Internet takes off in the U.S. USB promulgated.
1995-1996
Peak years of UUCP/USENET and the BBS culture, then collapse under pressure from mass-market Internet.
1997
I first give the “Cathedral and Bazaar” talk.
1999
Peak year of the dot-com bubble. End of workstation era: Market for Suns and other proprietary Unix workstations collapses under pressure from Linux running on PCs.
2005
Major manufacturers cease production of cathode-ray tubes in favor of flat-panel displays. Flat-panels have been ubiquitous on new hardware since about 2003. There is a brief window until about 2007 during which high-end CRTs no longer in production still exceed the resolution of flat-panel displays and are still sought after. Also in 2005, AOL drops USENET support and Endless September ends.
2007-2008
64-bit transition in mass market PCs. The 32-bit era ends. The first smartphones ship.
Request to contributors
A lot of people reading this have been seized by the urge to send me some bit of lore or trivia for inclusion. Thank you, but bear in mind that the most important choice is what to leave out. Here are some guidelines:
I’m trying to describe common knowledge at the time. That means not every bit of fascinating but obscure trivia belongs here.
Anything from a tech generation before early minis - in particular the era of mainframes, punched cards, and paper tape - is out of scope. I gotta draw the line somewhere, and it’s there.
Stories about isolated survivals of old tech today are not interesting if the tech wasn’t once common knowledge.
Please do not send me timeline entries for dates which you think are important unless you think the date has generally been forgotten, or is in serious danger of same.
Supporting this work
If you enjoyed this, please contribute at my Patreon page so I won’t be forced to get a $DAYJOB and no longer have time to think up or write things like this document. I work on a lot of more serious projects too, including critical network infrastructure. So give generously; the civilization you save could be your own.
Related Reading
How To Become A Hacker
The Lost Art of C Structure Packing
Change history
1.0: 2017-01-26
Initial version.
1.1: 2017-01-27
Pin down the date DB-9 came in. Added a minor section on the persistence of octal. More on the afterlife of RS-232.
1.2: 2017-01-29
More about the persistence of octal. Mention current-loop ASR-33s. 36-bit machines and their lingering influence. Explain ASCII shift. A bit more about ASCII-1963. Some error correction.
1.3: 2017-01-30
Added “Key dates” and “Request to contributors”.
1.4: 2017-02-03
The curious survival of the Hayes AT command set.
1.5: 2017-02-04
TTL in serial and maker devices. The AT Hayes prefix explained. UUCP and long distance rates. Reference to space-cadet keyboard removed, as it turned out to ship a 32-bit word. Improved description of ASCII shift.
1.6: 2017-02-08
How VDTs explain some heritage programs, and how bitmapped displays eventually obsolesced them. Explain why the ADM-3 was called “dumb” even though it was smart.
1.7: 2017-02-09
The BBS subculture. XMODEM/YMODEM/ZMODEM. Commercial timesharing. Two dates in USENET history.
1.8: 2017-02-14
Heritage games. The legacy of all-uppercase terminals. Where README came from. What “core” is. The ARPANET. Monitoring your computer with a radio.
- Actually, there was an even older style of tty interface called “current loop” that the ASR-33 originally used; in the 1970s dual-mode ASR-33s that could also speak RS-232 began to ship, and RS-232 eventually replaced current loop entirely.
- A full explanation of the magic of the AT prefix can be found at http://esr.ibiblio.org/?p=7333&cpage=1#comment-1802568
- Python 3 and Perl 6 have at least gotten rid of the dangerous leading-0-for-octal syntax, but Go kept it
- Early Intel microprocessors weren’t much like the 11, but the 80286 and later converged with it in important ways.
- The old free-floating USENET still exists too, but Google Groups is where you can find what has been preserved of the historical USENET archives.
- Confusingly, the ADM-3A (which could address any screen cell) was described in marketing copy as a “dumb” terminal, not a “smart” one. This is because there was a rival definition of “smart” as capable of doing local editing of the screen without involving the remote computer, like an IBM 3270. But since minicomputers never used that capability this definition was never live in the Unix world, and has mostly faded out of use.
- It was not commonly known that the VT100 was designed to fit a 1976 standard called ECMA-48; ANSI simply adopted it.
- There were a few 32-bit minis before the Interdata, but they seem to have been designed for real-time or other non-timesharing uses.