blog.sonpike.net

LARGE BYTE

the polyunsaturation pill

MIX: the big 1009: the world’s first polyunsaturated computer. People know about it (right?), but I have to imagine the average taocp reader just kind of skips over a lot of those denser assembly sections, unless they’re really trying to scope out an algorithm. All the “official” MIX work has been moved over to the MMIX Masters project, and it feels like hobbyist development has started to slow down.1 And all of that is natural, of course; imagine you’re working through your copy of Vol. 1, 3rd edition for the first time: we swerve past exercise “[HM30]”

Show that

$$Q(n) + 1 = \sum_{k=1}^{m-1} k c_k \Gamma\left(\frac{k}{2}\right) \left(\frac{n}{2}\right)^{1-\frac{k}{2}} + O\left(n^{1-m/2}\right)$$

for all $m \geq 1$.

… and its accompanying quotation from Helen Keller,

I feel as if I should succeed in doing something in mathematics, although I cannot see why it is so very important.

— HELEN KELLER (1898)

… before coming unceremoniously across the “mythical computer called ‘MIX,’” followed directly by its deprecation:

“MIX is now quite obsolete … [though it] is still worth knowing, because it helps to provide a context for subsequent developments”3

Granted, I never got to read that deprecation on my first pass.2 My first exposure to taocp was an old, cloth-bound second edition of the first volume, published in 1973 (5 years after the original publication in 1968). Though, also granted, having worked through the book for the first time in 2024, I feel as though the justification is entirely unnecessary. On the contrary, I can’t see how one could possibly learn more effectively from a hypothetical machine that is more “similar to machines that have become dominant during the 1990s” than from good old MIX, with all of its “obsolete” particularities.

image from a 1968 publisher’s brochure

We talk a lot – we worry a lot – about things being up-to-date: in phase with the State-of-the-Art and all that. The word worth starts getting thrown around a lot here: that’s enough to make anyone anxious. “Is it worth it to learn this language?” “Is this old book really worth a read?”4 Clearly even Knuth wants to make sure that the information is “relevant,” even if MIX itself was never intended to directly mimic any specific pattern or architecture of the era (though, as we shall see, it does). However, I have to say, I feel that Knuth himself doesn’t really have a grasp on how much more interesting and educational MIX can be now, in our age of memory safety and “AI” “generated” code, than it was even at the time of its original inception. To learn a little bit about MIX today is to give yourself a crash course in almost every stupid mistake and weird decision we’ve ever made in the history of computing. And that may just be the most useful thing you could possibly learn in our field, at this exact, particular moment in its history, with all its own stupid mistakes and weird decisions.5

there are exactly eight bits in a byte

Computers are difficult, and their difficulty is, for a relative novice like myself, two-fold.

Fold one: Computers are difficult6 to reason about because every individual action a computer takes is exceedingly, devastatingly simple, and yet the sheer number of operations a computer can perform, and the speed at which it performs them, creates an aggregate result so large and complex that it’s practically impossible to really “take it all in.” A child with no interest in computers or mathematics could understand a simple and well-written assembly program, if it were well explained,7 better than they could ever understand a program written in a higher-level language.

Fold two: what we do as programmers is often built atop so many layers of abstraction, and the details get so lost in the noise, that hearsay and correlation take over and often triumph over reality. I’ve read many an anecdote of “guy who, around 2003–2005, had to argue ceaselessly with colleagues that 64-bit computing was, in fact, not necessarily slower than 32-bit computing after some unrelated bug slowed down their program.”8

The 64- vs 32-bit thing is very particular, but I think it illustrates well this question of “why do we think these things?” or, more specifically, “where do these ideas and preconceptions come from?” Computer science – being a field that is so vast, so heterogeneous, and so profitable – has had a lot of resources and time and man-power to attempt, invest heavily in, and fail tremendously in many more directions than most fields of study ever get the chance or courage to venture. Certain directions are so divergent, so strange, that we forget about them entirely and turn them retroactively into little blips on the historical radar; and some directions are so consequential that we forget that things weren’t always like this.

there are five bytes in a word…

Bytes – sizes of things, physical storage – are a great example.

“Rollie Arndt inside the computer [univac 1]”9

Most programmers, regardless of genus, are probably at least passingly familiar with Binary-coded Decimal conversions and operations. It’s “a weird thing that old CPUs used to do,” so if you’ve ever peeked under the hood of or – god forbid – made an emulator before, you’ve probably at least stumbled through an implementation of a BCD conversion algorithm. That’s always how I conceptualized it, at least: it was an option that certain CPUs allowed the user to use – if it were convenient in some small case for the UI or something – with binary being self-evidently the star of the show. So, reading that “MIX has a peculiar property in that it is both binary and decimal at the same time,” I wasn’t really sure what to think about that. “Programmers who are accustomed to a binary machine can think of MIX as binary; those accustomed to decimal may regard MIX as decimal.” What on Earth does it mean to be a programmer accustomed to decimal?
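If you’ve somehow dodged it: packed BCD just stores one decimal digit per 4-bit nibble. A minimal sketch of the conversion in C (a toy of my own, not any particular CPU’s instruction semantics):

#include <stdint.h>

/* Pack a value 0–99 into one packed-BCD byte: tens digit in the
   high nibble, ones digit in the low nibble (e.g. 42 -> 0x42). */
uint8_t bcd_pack(uint8_t n) {
    return (uint8_t)(((n / 10) << 4) | (n % 10));
}

/* Recover the plain binary value from a packed-BCD byte. */
uint8_t bcd_unpack(uint8_t bcd) {
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}

The charm (and the curse) is that the hex dump of a BCD value reads like decimal: bcd_pack(42) is literally 0x42.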

Well, looking back, maybe an IBM hacker in the 1950s would have asked me the inverse. Of the “16 actual computers very similar to MIX and on which MIX could easily be simulated” that Knuth lists in MIX’s introductory preamble, the first 4 entries are IBM machines (the 360, 650, 709, and 7070), and 3 of those 4 (all but the 709) are decimal machines. Knuth dedicates the first volume of taocp to “the Type 650 computer once installed at Case Institute of Technology.” Perhaps he himself “natively” thought of programs in decimal, rather than binary, if only by way of muscle memory.

Interestingly, despite the 709 being a 36-bit binary computer, this picture from Columbia University describes the machine as holding “327,000 binary digits.” This isn’t so outlandish, given that the machine – as most of the era did – did have support for decimal arithmetic, but it also goes to show that the way these machines were conceptualized at the time always seemed to fall back to counting digits, regardless of the actual architecture. In many ways, it was a marketing thing, a choice of convenience – after all, what else would you say on the ad? Nobody knew what a kilobyte was.10

but, like, what is a kilobyte? In 1999, Knuth himself (rhetorically) asked this question while writing the MMIX fascicle,11 and he bit into the IUPAC-IDCNS decision to use the now-standard kibi-/mebi-/gibi- convention pretty hard: “who would voluntarily want to use MiB for a maybe-byte?!” While I think Knuth was right when he said that it was “unlikely that people will automatically warm to ‘mebibytes’” (I think the term, while certainly within the realm of “common technical parlance,” is undoubtedly on the rarer side, and most people just leave it semi-ambiguous or subject to context), I don’t think I agree with Knuth (et al.)’s suggestion of “large kilobyte” either. All this digression to say: I’m still not convinced we know what a kilobyte is.

… plus a sign

So “byte” was pretty nebulous. A lot of vendors at the time stuck to words or good old-fashioned decimal digits to ship product. Fine. But for the programmer, there had to be something – with word sizes often much bigger than what we’re dealing with even today,12 we needed some kind of way to break them up. MIX, for example, works like this:

The basic unit of MIX data is a byte. Each byte contains an unspecified amount of information, but it must be capable of holding at least 64 distinct values. That is, we know that any number between 0 and 63, inclusive, can be contained in one byte. Furthermore, each byte contains at most 100 distinct values. On a binary computer a byte must therefore be composed of six bits; on a decimal computer we have two digits per byte. […] A computer word consists of five bytes and a sign. The sign portion has only two possible values, + and -.

Well that’s awful: unspecified byte sizes?13 Plus we have to deal with this terrible sign bit.
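Awful, but also freeing: a MIX program simply can’t assume a byte size, only the 64-to-100 window, and taocp expects your algorithms to work correctly for every legal size. A sketch of what that means for a simulator, in C (BYTE_SIZE and the struct are my own stand-ins, not anything from taocp):

/* Any BYTE_SIZE from 64 to 100 is a legal MIX byte. A correct MIX
   program must behave identically for all of them. */
#define BYTE_SIZE 64

typedef struct MixWord {
    int sign;            /* +1 or -1; the sign is not a sixth byte */
    unsigned bytes[5];   /* each holds a value in [0, BYTE_SIZE)   */
} MixWord;

/* The numeric value of a word: base-BYTE_SIZE positional notation.
   (64 bits needed: five decimal bytes can reach 10^10 - 1.) */
long long mix_word_value(MixWord w) {
    long long v = 0;
    for (int i = 0; i < 5; i++)
        v = v * BYTE_SIZE + w.bytes[i];
    return w.sign * v;
}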

0   1      2      3      4      5
±   Byte   Byte   Byte   Byte   Byte

One word in MIX

For all of the strange, real-world constraints that Knuth put into the design of this fictional machine (accounting for how the machine would signal an error with its little bell, the “GO” button, how to connect the printer, …), you have to wonder what the ideal way of implementing such a shape would be. The natural, naive solution that a modern programmer might think of (if one wanted to stay as close as possible to the architecture as described) would be to simply give the thing a horrifying 31-bit form which makes me – as it should anyone in our civilized, well-aligned era – feel quite ill.14 Though, perhaps this thing was designed from a “decimal first” point of view – maybe Knuth’s dear 650 provided him with some inspiration that we lack in our less enlightened era?

closeup of the 650’s indicators15

I think it’s fair to say that yes, this clearly did provide him with quite a lot of inspiration for the architecture of MIX. Mind you, it’s not exactly the same idea – that “sign bit” at the top is not a ±, but rather a way of counting to 10 without ever having to use more than 1 bit at a time, sort of like how you would count on your fingers in base-6.16 Wikipedia refers to this as a “Bi-quinary coded decimal” scheme, and apparently it’s what abacuses use. Here’s a modified version of the table on Wikipedia, with the values for each bit.

Value   bi bits (0–5)   quinary bits (0–1–2–3–4)
0       10              10000
1       10              01000
2       10              00100
3       10              00010
4       10              00001
5       01              10000
6       01              01000
7       01              00100
8       01              00010
9       01              00001

Like the Univac-1, this design comes, at least in part, from needing checksums on everything. Since you only ever needed to have exactly 1 “bi-bit” and exactly 1 “quin-bit” active for any given number, the computer could tell instantly that something was busted if this wasn’t the case.
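That validity check is cheap to sketch in C (the bit layout below is my own, following the printed order of the table; the real machines, of course, did this in circuitry):

#include <stdbool.h>

/* One bi-quinary digit in the low 7 bits of a char: the two "bi"
   bits on top, the five "quinary" bits below, as in the table. */
bool biquinary_valid(unsigned char d) {
    unsigned bi   = (d >> 5) & 0x03; /* the two bi bits       */
    unsigned quin = d & 0x1F;        /* the five quinary bits */
    /* valid iff each group has exactly one bit set, i.e.
       each group is a nonzero power of two */
    return (bi   && !(bi   & (bi   - 1)))
        && (quin && !(quin & (quin - 1)));
}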

Looking ever so slightly deeper into the structure of MIX’s words, and ever so slightly into the future, we can see that the IBM 7070 even more closely resembles the structure of a MIX word, complete with specific “bytes” for field control, addresses, data, and the like.

0   1         2         3       4       5
±   Address   Address   Index   Field   Opcode

One word in MIX, but more precisely…

0   1        2       3       4         5
±   Opcode   Index   Field   Address   Address

… and one word for the 7070 – how familiar!17

The 7070 represented a single digit with 5 bits, giving it a “two-out-of-five” checksum that would halt operation immediately on failure. This may well be a little closer to what Knuth was expecting when considering the physical implementation of the MIX. Perhaps you could have an eight-bit “byte”, using 4 bits per digit and – to simulate a machine of the era – you could use the leftover space as a sort of checksum.
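The two-out-of-five rule itself reduces to a population count. A hedged sketch (checking only the bit count, and ignoring whichever weight assignment the 7070 actually used for its digits):

#include <stdbool.h>

/* A two-out-of-five code word is valid iff exactly two of its
   five bits are set; any other count means hardware trouble. */
bool two_of_five_valid(unsigned char code) {
    int set = 0;
    for (unsigned char c = code & 0x1F; c != 0; c >>= 1)
        set += c & 1;
    return set == 2;
}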

oh, god, there’s something weird about the sign, isn’t there

Yeah, if you clicked on those above links you might have seen that the sign in the 7070 is a “trit” – meaning that it can express 3 distinct values, rather than two, those being “positive, negative, and alphabetic.” While this is, again, quite alien to the modern programmer, it wasn’t too terribly uncommon for the era (how could it be? we’ve already seen “bi-quinary”; a little ternary is a walk in the park). The MIX itself uses a similar system for its comparison flags: a single trit representing Less than, Greater than, and Equal to. In that context, it’s not so strange, given that we still use a similar system to compare numbers, such as with C++’s spaceship operator (<=>) or Rust’s std::cmp::Ordering:

#[repr(i8)]
pub enum Ordering {
    Less = -1,
    Equal = 0,
    Greater = 1,
}

So while having a trit in every single one of your computer’s words is a bit more egregious (more opportunities for checksums, I guess) than a single trit in your status word, the concept is perhaps not as alien as it first seems.

7070 programming, for ants

so, then how many bits are in a byte ?

While every passionate programmer’s favorite answer to every simple computing question is something along the lines of “well, that depends on your perspective,” there is certainly no one – no group of people on earth – more pedantic, subjective, and obsessive about the myriad wrinkles and folds of computing history18 than the members of the C++ standards committee. In October of 2024, JF Bastien – who I believe may well be gunning for some sort of “most controversial standards proposals” any% record – published “There are exactly 8 bits in a byte”. This and its subsequent revisions are all rather interesting papers, as are all of the rebuttals. Its very existence and need to insist upon itself is historically interesting as well, as I imagine the majority of programmers – even those who work somewhat closely to the hardware – already believe this assumption of __CHAR_BIT__ == 8 to be airtight, given that it is essentially tautological for any modern compiler. Knuth himself, in his introduction to MIX, noted that

Since 1975 or so, the word “byte” has come to mean a sequence of precisely eight binary digits, capable of representing the numbers 0 to 255. […] When we speak of bytes in connection with MIX we shall confine ourselves to the former sense of the word, harking back to the days when bytes were not yet standardized.

It’s been about thirty-odd years since he wrote that, and his “not yet” gets riper every moment. And in any case, history is unyielding in its desire to fend off standardization, for as of revision 5 of Bastien’s proposal, published February 2025, in his own words:

No consensus. The paper’s title is therefore erroneous: as far as WG21 is concerned, there are at least 8 bits per bytes. Maybe 9, 24, 16, 32, or maybe 2048. The author therefore expects that library and compiler implementations of C++ will finally support non-8-bit architectures—which they have failed to do for over a decade—now that WG21 (of which implementors are members) has clearly expressed the desire to do so.

With a little typedefing, maybe we could even get GCC running in MIX.
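Half-joking aside, the shim is fun to imagine. A hypothetical sketch – the names are mine, not from any real MIX toolchain – though note that ISO C requires CHAR_BIT >= 8, so a native 6-bit char is off the table before we even start:

#include <limits.h>
#include <stdint.h>

/* ISO C requires CHAR_BIT >= 8, so MIX's 6-bit byte can never be a
   native char; we'd have to simulate it inside a wider type. */
typedef uint8_t mix_byte;            /* holds a value in [0, 63]  */
#define MIX_BYTE_MASK 0x3FU

typedef uint32_t mix_word;           /* 5 six-bit bytes = 30 bits */
#define MIX_WORD_MASK 0x3FFFFFFFU

/* Extract byte i (1..5, as taocp numbers them) from a word. */
static inline mix_byte mix_get_byte(mix_word w, int i) {
    return (mix_byte)((w >> (6 * (5 - i))) & MIX_BYTE_MASK);
}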

mason pike
january 3rd, 2026

  1. We got that super cool FPGA from Michael Schröder in like 2022, of course, but that almost feels like the end of an era. The victory lap.↩︎

  2. not that I would have listened to it anyway.↩︎

  3. taocp page @ stanford.edu↩︎

  4. One’s mind goes immediately to Dr. Alexandrescu’s 2019 keynote @ cppcon: “Throw away Cormen, throw away Knuth… Oh no, you shouldn’t throw away Knuth… [but] the [algorithm] books are obsolete.” He’s being hyperbolic, obviously, but it’s a strong phrase that I think a lot of people more or less agree with. Throw ’em away.↩︎

  5. Knuth, D. E. (1997). The Art of Computer Programming, Volume 1: Fundamental Algorithms (3rd ed.). Reading, Massachusetts: Addison-Wesley. 650pp. ISBN 0-201-89683-4. If I don’t cite a quote, or ambiguously refer to something as taocp, you can reasonably assume it’s from here.↩︎

  6. perhaps “frustrating” is the better word↩︎

  7. “We take the number from this box, then we put it in this box. Then we add 2 to all the numbers in this box. …” No variables, no weird abstractions to worry about.↩︎

  8. I can’t find the exact thread I was thinking of when I wrote this, but this 2004 osnews article and the arstechnica article that it references are really interesting. The comments are maybe even more fascinating, though, as a time capsule of people flaming each other, nit-picking, and arguing over this topic. It’s all very evocative to me as an example of this “reality lost in the noise and abstraction” idea.↩︎

  9. One of several of Ed Thelen’s absolutely unbelievable websites on the Univac 1. Some of the best documentation of old tech on the internet. A genuine hero for the preservation of this stuff.↩︎

  10. needed to factcheck myself here. it seems kind of fuzzy, but from what I could find (much of this is just on, or linked by, wikipedia, mind you), it seems that the earliest patent using the term “kilobyte” was from around 1969, so 10 years after the 709’s first installation. Wikipedia seems to suggest that “kilobit,” however, was in something approximating common usage from around 1961. I can’t find a convincing “first time” that kilobit was used as a word in advertising, but a quick scan of Electronics by David Morton tells me that the 1-kilobit RAM memory chip was commercialized in 1972, so I’ll take that to mean that it was probably around this time that we began to see a shift from measuring in digits and words to measuring in denominations of bytes and bits.↩︎

  11. I have no better place to put this digression, but I must say that good technical writers are rare these days. Geniuses who can cogently explain their ideas to working, interested professionals are a dime a dozen, but it’s rare that they are particularly good writers who clearly care deeply about their writing style. I would posit that Christopher J. Date fits the bill here, and I always quite enjoy reading what he puts out, with all his strange asides and philosophical musings. Knuth, it should go without saying, is a timelessly good, effortlessly funny writer, and he is sure to remind me of this each time I open one of his books. Such as when, in the intro to the referenced fascicle, he muses that “People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.”↩︎

  12. The Univac-1, which we have already seen a little bit of, had a massive 84-bit word: 72 data bits (twelve 6-bit characters – for a number, a sign and 11 decimal digits) and 12 parity bits, one per character. Obviously huge, but in context, pretty sensible. Computers of the era were prone to mechanical failures causing unreliable results: hence all the checksums.

    From Unisys History Newsletter Volume 5, Number 1, January 2001
    "UNIVAC I: The First Mass-Produced Computer" by George Gray
    
    The computer had a high-degree of self-checking: all processing was done in duplicate by two sets of circuitry, and the results were compared to be sure they were identical. Donald Marquardt of DuPont recalled: "One of the big advantages of the UNIVAC was in fact the ability to rely on the accuracy of the numbers when they came out.... Now there were some other computers that I used during that same period where I would make two or three runs on the machine and come up with two or three different sets of numbers...."

    In addition, some of those data-bits were “dead,” as the decimal digits were represented in Excess-3 “because it simplified the complementing (making negative) of numbers and made the carries come out right for digit-by-digit decimal addition.” Neat. (A quick check of that claim below.)
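    A tiny C check of that complementing claim (my own demo, not UNIVAC circuitry):

        #include <assert.h>

        int main(void) {
            /* A digit d is stored as d + 3 in four bits; its nines'
               complement 9 - d is stored as 12 - d, which is exactly
               the bitwise NOT of d + 3 (15 - (d + 3) = 12 - d). */
            for (unsigned d = 0; d <= 9; d++)
                assert((~(d + 3) & 0xFu) == (9 - d) + 3);
            return 0;
        }

    Which is to say: in excess-3, flipping a digit’s four bits is the same as taking its nines’ complement.↩︎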

  13. you already know where i’m going with this.↩︎

  14. I’m shamelessly leaving this as an open exercise for the reader, since – honestly – I don’t have a great solution either. I started with the bitfield solution, looking a little something like:

    #include <stdbool.h>   /* bool     */
    #include <stdint.h>    /* uint16_t */

    #define WORD_MAX 0b111111111111111111111111111111U /* 30 ones, 0x3FFFFFFF */
    #define HALF_WORD_MAX 0b111111111111U              /* 12 ones, 0xFFF      */

    typedef struct LargeRegister {
        bool sign;                 /* MIX's ± lives outside the value */
        unsigned int value : 30;   /* five 6-bit bytes                */
    } __attribute__((aligned(8))) LargeRegister;

    typedef LargeRegister Word;

    typedef struct SmallRegister {
        bool sign;
        uint16_t value : 12;       /* two 6-bit bytes                 */
    } __attribute__((aligned(4))) SmallRegister;

    … But this obviously is just me begging my CPU to never cache anything and give me the most cursed alignments possible, particularly for small registers. Going forward, I think I might just retire the bitfield thing and represent the Words as plain unsigned integers. The bitfields are kind of swag, and it’s cool that the rotations and truncations Just Work like this, but on the other hand, their implementation-defined layout makes the interop between my C implementation and any other program that wants to talk to it so tedious. Something like the sketch below, maybe.
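    A minimal sketch of that plain-integer version (the names and sign convention are my own guesses, with overflow handling glossed over, and WORD_MAX reused from above):

        #include <stdint.h>

        /* Bit 30 is the sign, bits 29..0 are the five 6-bit bytes,
           all in one ordinary uint32_t. */
        typedef uint32_t Word;

        #define WORD_SIGN_BIT (1u << 30)

        static inline Word     word_negate(Word w)    { return w ^ WORD_SIGN_BIT; }
        static inline uint32_t word_magnitude(Word w) { return w & WORD_MAX; }

    Sign-and-magnitude in one flat integer: trivially serializable, at the cost of a mask or two everywhere.↩︎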

  15. wikimedia link↩︎

  16. … but, like, you only have 1 finger on your left hand and 4 fingers on your right…↩︎

  17. for simplicity, I’m splitting the 7070’s word up into 2-digit “bytes” – this isn’t too dissimilar to how it’s represented in the manual (same link as above), and helps to demonstrate the similarities between it and the MIX.↩︎

  18. if it isn’t clear, this is a compliment. it’s honestly aspirational.↩︎