blog.sonpike.net

JONG ARCHAEOLOGY

one of my favorite things about programming is that you don’t actually have to do anything “right” or “by the books” in order to make something that works exactly as intended and has the necessary robustness to withstand a production environment. before i got into computers, one of the things that turned me off of it is that it seemed like everything was so perfect; you type your password in the box, it gets sent along some magic wires, and wow I can see all my banking information or whatever. sure, there’s bugs, but – one would think – that stuff is surely inevitable on such a large scale.

it’s only when you go into those offices, start peeking in the codebases, talking to the teams, that you realize that – yeah, it all works – but there’s no magic here at all. it’s just people, making people decisions to the best of their people ability. they have opinions and preferences and sometimes they miss something obvious and other times they catch on to the most minute and subtle of errors. i love that, it’s why i like my job and it’s why i do it in my free-time. making things is cool, specifically because even if they won’t be perfect, they will be yours.

well, at least until recently, right? i’ll spare the reader the same conversation i’m sure they’ve had countless times before, but a lot of the strangeness and humanity that comes part and parcel with the art of computer programming gets lost in the noise of LLMs and AI reviews and “let me run that by claude real quick,” and so on and so forth. i don’t use the stuff. even apart from my own political, environmental, and moral qualms with the technology1 – even if all that stuff got cleaned up tomorrow – it’s just not fun to program by arguing with a weird chatbot, and even if the code it makes works, i don’t really care because the code it spits out isn’t fun to read. i love getting to review merge requests at work because i get to have fun conversations with my colleagues about why they did this or that using so and so pattern, why not do it like this, etc etc. it reminds me of when i used to play in a lot of jazz band concerts – i love to be up there with people just throwing ideas at the wall, sometimes being amazed at a sudden spark of brilliance i myself could have never thought up, other times being shocked at the utter strangeness or alienness of an idea. both are fun, and both are human.

point being: i’m a fast inverse square root enjoyer2 ; strange constraints produce strange solutions, and strange solutions are really fun to stumble across. in a world where constraints are increasingly homogenized and abstracted away (vis a vis “cloud engineering” and so on) and people are just literally writing less code – preferring instead to have it written for them – occult and obscure (that is to say, creative) solutions to software problems become increasingly rare and precious as everything drifts toward the mean. probably for the best in terms of “making a good product” but i don’t make products in my free time, and when i come across a weird, oddball solution to a problem in a production environment (and i’m not getting paid to fix it) i always take a moment to sit back and appreciate it. let’s appreciate this one together.

jonging

3-dan derrell

while ai is not fun, riichi mahjong is very fun. i’ve been playing online on and off for a few years and i often play a game while i’m on the exercise bike or on a long bus ride. so a natural intersection of interests for me is mahjong engines and mahjong programming in general. recently, i’ve been trying to read up on what little literature is openly available on the topic in english and while there’s not a ton of stuff out there, there are some lovely articles by Sanjiang Li, Xueqing Yan, and Yongming Li available on arxiv which i’ve been using to try and put together my own engine.

as being able to read back and score game logs is, of course, a top priority for any engine, i’ve been putting together a little rust crate that grabs my recent tenhou logs and parses them into something my engine can play with. that, on its face, seemed like an in-and-out ‘saturday afternoon’ sort of project – write a little http requester, parse the response into some enums, and done.

well, i find the right url, i send the request, and i get something that looks like this:

    <!-- hundreds of lines .... -->
    <INIT seed="0,0,0,3,5,87" ten="250,250,250,250" oya="0" hai0="75,107,63,7,13,76,2,122,41,95,110,3,34" hai1="68,79,31,74,52,11,133,106,25,71,15,112,49" hai2="21,82,48,128,86,9,83,81,115,36,19,51,24" hai3="100,108,130,73,78,97,91,37,8,42,55,30,104"/>
    <T132/>
    <D122/>
    <U57/>
    <E106/>
    <V44/>
    <F115/>
    <W64/>
    <!-- .... -->
    <G30/>
    <T127/>
    <D127/>
    <U40/>
    <E40/>
    <V113/>
    <F113/>
    <N who="1" m="43625" />
    <E49/>
    <V101/>
    <F101/>
    <W103/>
    <G124/>
    <T23/>
    <D41/>
    <U1/>
    <E1/>
    <V35/>
    <F35/>
    <W32/>
    <G32/>
    <T29/>
    <D110/>
    <U46/>
    <E88/>
    <N who="2" m="51279" />
    <F26/>
    <!-- .... -->
    <G55/>
    <AGARI ba="0,0" hai="16,21,24,44,47,48,51,54,55,82,83" m="51279" machi="55" ten="30,7700,0" yaku="8,1,52,1,54,2" doraHai="87" who="2" fromWho="3" sc="250,0,250,0,250,77,250,-77" />
    <!-- hundreds of lines .... -->

sure, that’s not too bad. you’d always prefer to see some json or something, but tenhou came out in like 2007 so you can’t blame the boomer format. unfortunately, it’s more or less impossible to just guess the actions that are being represented here by their xml representation, so after a bit of guess-and-checking while running the logs back through the official client I came across mthrok’s wonderful tenhou-log-utils in which they have written their own parser for these raw logs. very neat!

the majority of these tags, as you can imagine if you know anything about riichi mahjong, are draws and discards. they’re certainly obfuscated, and i ended up capturing them like this:

if tag_name.starts_with(['T', 'U', 'V', 'W']) { } // draw for player index tag_name[0] - 84
if tag_name.starts_with(['D', 'E', 'F', 'G']) { } // discard from player index tag_name[0] - 68
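to make that a bit more concrete, here’s a minimal sketch of how those two checks could be folded into one small function. `Event` and `parse_turn_tag` are names i made up for illustration (they aren’t from tenhou or from tenhou-log-utils), and the sketch assumes the tag name has already been pulled out of the xml, with no angle brackets or attributes attached.

```rust
/// A draw or discard decoded from a tag name such as "T132" or "E106".
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
enum Event {
    Draw { player: u8, tile: u8 },
    Discard { player: u8, tile: u8 },
}

/// Decode a bare tag name: one letter for (action, player), then a tile id in 0..=135.
/// Anything that does not fit that shape (e.g. "N" or "AGARI") yields None.
fn parse_turn_tag(tag_name: &str) -> Option<Event> {
    let mut chars = tag_name.chars();
    let letter = chars.next()?;
    // Everything after the first letter has to parse as a tile id.
    let tile: u8 = chars.as_str().parse().ok()?;
    if tile > 135 {
        return None;
    }
    match letter {
        // 'T' is 84 in ASCII, so T, U, V, W become players 0, 1, 2, 3.
        'T'..='W' => Some(Event::Draw { player: letter as u8 - b'T', tile }),
        // 'D' is 68 in ASCII, so D, E, F, G become players 0, 1, 2, 3.
        'D'..='G' => Some(Event::Discard { player: letter as u8 - b'D', tile }),
        _ => None,
    }
}
```

requiring the rest of the name to be a number is a cheap way to avoid mistaking some other tag that happens to start with one of these letters for a draw or a discard.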

why’d they do it like that ? no idea. but it’s probably the least strange design decision of the whole project. the next thing to parse would be the calls, and a safe bet would be that those are represented by the Ns. my first thought was that a tag like <N who="2" m="51279" /> would represent the WEST player (since EAST is probably 0 and mahjong moves counter-clockwise) and that the m represents the meld. correct again, self, but now the hard part: what do those characters actually represent? my first thought was “tiles that could be used in the meld,” like maybe your 5th, 1st, 2nd, 7th, and 9th tile were all possible candidates? but obviously that doesn’t make sense, so i figured that there was some kind of internal encoding or obfuscation going on, so i take a look through mthrok’s parser.py script for how they handled them and… well…

def _parse_shuntsu(meld):
    # Adopted from http://tenhou.net/img/tehai.js
    t = (meld & 0xfc00) >> 10
    r = t % 3
    t = t // 3
    t = 9 * (t // 7) + (t % 7)
    t *= 4
    h = [
        t + 4*0 + ((meld & 0x0018)>>3),
        t + 4*1 + ((meld & 0x0060)>>5),
        t + 4*2 + ((meld & 0x0180)>>7),
    ]
    if r == 1:
        h = [h[1], h[0], h[2]]
    elif r == 2:
        h = [h[2], h[0], h[1]]
    return h

that’s not really what i meant by encoding. i figured like, “9 represented the 9 of bamboo” or something. it was shocking to see such violent bit manipulation to handle an XML attribute for a riichi mahjong game, but thankfully mthrok put a reference to the original source code hosted on the tenhou domain (posted by the developers specifically to help people who want to build these kinds of parsers. you can find them talking a bit about it in their man page). having a look at the relevant code in that file3, we find:

var t = (m & 0xfc00) >> 10;
var r = t % 3;
t = parseInt(t / 3);
t = parseInt(t / 7) * 9 + (t % 7);
t *= 4;
var h = [
  t + 4 * 0 + ((m & 0x0018) >> 3),
  t + 4 * 1 + ((m & 0x0060) >> 5),
  t + 4 * 2 + ((m & 0x0180) >> 7),
];
switch (r) {
  case 1:
    h.unshift(h.splice(1, 1)[0]);
    break;
  case 2:
    h.unshift(h.splice(2, 1)[0]);
    break;
}
ret += sprintHai136(h[0], kui == 3 ? 1 : 0);
ret += sprintHai136(h[1], kui == 2 ? 1 : 0);
ret += sprintHai136(h[2], kui == 1 ? 3 : 0);

somehow i didn’t expect it to really be like that in the original script. if anything, mthrok made the logic surrounding those scary unshifts a lot friendlier. to my surprise there’s another “script” (really just a code snippet, presumably from the backend) linked in the header of the tenhou version as well: mentsu136.txt.4 it sports the label “副露面子のビットフィールド” (exposed meld bitfield) and i present it here, in its entirety:

union{
    struct{ // 順子
        WORD kui:2;
        WORD syuntsu:1; // 1
        WORD hai0:2;
        WORD hai1:2;
        WORD hai2:2;
        WORD __padding__:1;
        WORD type6:6; // 21*3=63(0x3F)
        /*
        (0) 01 23①
        (1) 02 2①3
        (2) 03 ①23
        (3) 11 13②
        (4) 12 1②3
        (5) 13 ②13
        (6) 21 12③
        (7) 22 1③2
        (8) 23 ③12
        */
    };
    struct{ // 刻子
        WORD kui:2;
        WORD syuntsu:1; // 0
        WORD koutsu:1; // 1
        WORD chakan:1; // 0
        WORD hai_unused:2;
        WORD __padding__:2;
        WORD type7:7; // 34*3=102(0x66)
        /*
        (0) 01 23①
        (1) 02 2①3
        (2) 03 ①23
        (3) 11 13②
        (4) 12 1②3
        (5) 13 ②13
        (6) 21 12③
        (7) 22 1③2
        (8) 23 ③12
        */
    };
    struct{ // 槓子
        WORD kui:2;
        WORD syuntsu:1; // 0
        WORD koutsu:1; // 0
        WORD chakan:1; // 0
        WORD nuki:1; // 0
        WORD __padding__:2;
        WORD type8:8; // 136(0x88)
        /*
        (0) 01 234①
        (1) 02 23①4
        (2) 03 ①234
        (3) 11 134②
        (4) 12 13②4
        (5) 13 ②134
        (6) 21 124③
        (7) 22 12③4
        (8) 23 ③124
        (9) 31 123④
        (A) 32 12④3
        (B) 33 ④123
        */
    };
    struct{ // 加カン -> 刻子から
        WORD kui:2;
        WORD syuntsu:1; // 0
        WORD koutsu:1; // 0
        WORD chakan:1; // 1
        WORD hai_added:2;
        WORD __padding__:2;
        WORD type7:7; // 34*3=102(0x66)
        /*
        (0) 01 23①
        (1) 02 2①3
        (2) 03 ①23
        (3) 11 13②
        (4) 12 1②3
        (5) 13 ②13
        (6) 21 12③
        (7) 22 1③2
        (8) 23 ③12
        */
    };
    struct{ // 抜き
        WORD kui:2; // 0
        WORD syuntsu:1; // 0
        WORD koutsu:1; // 0
        WORD chakan:1; // 0
        WORD nuki:1; // 1
        WORD __padding__:2;
        WORD type8:8; // 136(0x88)
    };
    WORD all;
};

let’s not mince words here: i love this code. this is my favorite piece of code i’ve seen so far in 2026 – a perfect mixture of a mindbogglingly strange design with a really genius solution to the problem being solved. when i saw this, i knew that i had to write a little something about it, even if it’s just to help myself understand it better. despite the fact that the code is publicly available, you don’t see many people talking about it in english.

ビットフィールド barry

i think that the basic problem being solved by this code is relatively clear on its face, but let’s start with the basics before we get into the cool stuff.

first off, if you’re anything like me,5 you were already clapping your hands and jumping up and down from the moment we saw that beautiful WORD syuntsu:1; // 1. that’s right folks, we have a homebrewed tagged union implementation with each union member holding a bitfield representing information about the meld.

tagged union ?

seems like everybody that writes one of these things has a different word for its implementation. wikipedia lists “variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct,” to name a few. in rust we just call them “enums,” which is kind of hilarious in context. Casey Muratori, an exemplary game developer and computer scientist, discusses in his wonderful “big OOPs” talk that this pattern was first invented by Charles Anthony “Tony” Hoare in his 1966 work “Record Handling”, so despite it being heavily associated with newer design patterns, the idea predates even the proliferation of the more common form of switch statements.

this is probably the most powerful programming idiom in a language’s toolbelt and is – in my opinion – no small part of the reason why rust was able to capture the imagination of basically every developer who has ever used it. it’s one of those things that, whenever i use a language that doesn’t natively support it as a first-class feature, i can really feel its absence.
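for anyone who hasn’t run into the idiom before, here’s a small sketch of what a “native” tagged union looks like in rust. the `Meld` type and its field names are entirely hypothetical and only meant to show the shape of the thing ; this is not how tenhou (or my parser) represents melds.

```rust
/// A hypothetical Rust take on "a meld is exactly one of several shapes".
/// Tiles are ids in 0..=135; the field names are illustrative only.
#[derive(Debug, PartialEq)]
enum Meld {
    Chii { tiles: [u8; 3] },
    Pon { tiles: [u8; 3] },
    Kan { tiles: [u8; 4], open: bool },
    Nuki,
}

/// The compiler refuses to build this match if a variant is left unhandled,
/// which is the guarantee a hand-rolled tag cannot give you.
fn tile_count(meld: &Meld) -> usize {
    match meld {
        Meld::Chii { tiles } | Meld::Pon { tiles } => tiles.len(),
        Meld::Kan { tiles, .. } => tiles.len(),
        Meld::Nuki => 1,
    }
}
```

the tag lives wherever the compiler decides to put it, and you never get to touch a payload without first proving which variant you’re holding.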

the interesting part of this implementation (one of a few) is that we don’t have “one” tag, we have a sort of bitfield that we have to check in a specific order (since, for example, if you have a truthy value for m & 1 << 2, the bitfield “short-circuits” and ends early) in order to narrow down which kind of meld we’re considering. the basic usecase is that you receive this 16-bit integer and then you can run a simple if-else statement on it:

if meld & (1 << 2) != 0 {
    // chii
} else if meld & (1 << 3) != 0 {
    // pon
} else if meld & (1 << 4) != 0 {
    // kakan
} else if meld & (1 << 5) != 0 {
    // nuki
} else {
    // minkan / ankan
}

… and as long as you’re familiar with the organization of the data, you’ll end up with the right discriminant. below is a tabular representation of the union that will hopefully elucidate the shape of this “walking bitfield”:

type              0     1    2  3   4  5   6        7        8        9        10    11    12    13    14    15
chii (順子)       kui6  kui  1  t7  t  t'  t'       t''      t''      padding  data  data  data  data  data  data
pon (刻子)        kui   kui  0  1   0  t   t        padding  padding  data     data  data  data  data  data  data
ankan / minkan    kui   kui  0  0   0  0   padding  padding  data     data     data  data  data  data  data  data
kakan (加槓)      kui   kui  0  0   1  t   t        padding  padding  data     data  data  data  data  data  data
nuki (抜き)       0     0    0  0   0  1   padding  padding  data     data     data  data  data  data  data  data
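the if-else walk can be wrapped up into a tiny classifier. this is only a sketch: `MeldKind`, `classify`, and `kui` are names i picked, while the bit positions come straight from mentsu136.txt. the two values in the usage note are the `m` attributes from the log excerpt near the top of this post.

```rust
/// Which member of the union a 16-bit meld value selects.
/// (The names are mine; the bit positions follow mentsu136.txt.)
#[derive(Debug, PartialEq)]
enum MeldKind {
    Chii,
    Pon,
    Kakan,
    Nuki,
    /// Daiminkan when kui != 0, ankan when kui == 0.
    Kan,
}

/// Walk the tag bits from least to most significant; the first set bit wins.
fn classify(meld: u16) -> MeldKind {
    if meld & (1 << 2) != 0 {
        MeldKind::Chii
    } else if meld & (1 << 3) != 0 {
        MeldKind::Pon
    } else if meld & (1 << 4) != 0 {
        MeldKind::Kakan
    } else if meld & (1 << 5) != 0 {
        MeldKind::Nuki
    } else {
        MeldKind::Kan
    }
}

/// Bits 0 and 1: the relative seat of the player the tile was taken from.
fn kui(meld: u16) -> u16 {
    meld & 0b11
}
```

running the two calls from the sample log through it, `classify(51279)` comes back as a chii with `kui` 3 and `classify(43625)` as a pon with `kui` 1.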

this is really interesting, and actually does (probably?) save you a bit of space if you absolutely need to have this union represented in memory as a 16-bit int. like, if you asked me to write a tagged union for this, i’d probably come back to you with something on the order of:

typedef enum {
    TYPE_CHII,
    TYPE_PON,
    TYPE_ANKAN,
    TYPE_MINKAN,
    TYPE_NUKI,
} meld_type_t;

typedef struct {
    meld_type_t type;

    union {
        Chii   chii;
        Pon    pon;
        Ankan  ankan;
        Minkan minkan;
        Nuki   nuki;
    } meld;
} meld_t;

… but not only is that boring, we end up with a real problem. since we have 5 discriminants, we’re going to need at least 8 bits of memory to hold that enum tag8 in order to maintain good alignment, meaning that functionally every instance of this thing is going to hold at least 4 bits of completely dead information. it’s a little cleaner to look at, sure, but that original implementation is screaming at me that clearly we have 16 bits to work with – no more, no less ; maybe it’s a database thing – and so I wouldn’t be surprised if the original developers landed in a similar situation before realizing that even if their “walking tag” implementation is a little wily, you’re clearly saving space over the naive “traditional” implementation. being able to correctly identify a chii off of only one bit of information rather than four (or in my example, eight) gives us plenty of space to hold the important information pertaining to “what tile came from who”. and while it isn’t “safe” in the modern sense of “type safety”, assuming that the developer always knows to check from least to most significant bit, this implementation works quite well, and most importantly: looks cool.
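a rough way to see the space argument in numbers, sketched in rust rather than c. `TaggedMeld` is a made-up stand-in for the “traditional” layout, assuming a one-byte tag and assuming the payload still wants all 16 bits ; `sizes` is just a helper for the comparison.

```rust
/// The "traditional" layout: a one-byte tag sitting next to a 16-bit payload.
/// repr(C) keeps the field order and padding predictable.
#[allow(dead_code)]
#[repr(C)]
struct TaggedMeld {
    tag: u8,
    payload: u16,
}

/// Returns (bytes for the tagged layout, bytes for the packed 16-bit word).
fn sizes() -> (usize, usize) {
    (
        std::mem::size_of::<TaggedMeld>(),
        std::mem::size_of::<u16>(),
    )
}
```

the tag only costs one byte on paper, but the 16-bit payload has to sit on a 2-byte boundary, so a padding byte sneaks in and the struct comes out at twice the size of the packed word.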

chii charlie

vocabulary corner

a chii is a call that opens your hand and robs a discard from your opponent to obtain the missing tile from a sequence of 3 consecutive tiles in the same suit. for example, having the 4 and 5 of bamboo would allow you to chii for the 3 or the 6 of the same suit. chiis do not “wrap around,” so an 8, 9, and 1 would not form a valid chii. chiis can only be called on the player to your left (so, the player that goes right before you).

chiis – aside from being the first member of the union, and thus logically “going first” – possess the most elegant and efficient implementation of all the melds, so let’s start with that. first of all, thinking about the data structure, since – for all member structs9 – bits 0 and 1 are reserved for information on the “targeted” player and bit 2 – for chii – is used to identify the discriminant, we have 13 bits of information with which to describe three things: which tile is the lowest of the sequence, which of the four copies each of the three tiles is, and which of the three tiles was taken from the opponent.

this is actually pretty tricky, but we manage to do it with one bit to spare with two rather interesting tricks. the simpler trick of the two is to hold the “tile version” (whether the tile represents copy 0, 1, 2, or 3) in bits 3-8, each pair of bits representing the version of one tile. we’ll use that later to index back into the “master array” of all the tiles once we figure out what values these versions actually represent. the trickier part is the clever little algorithm and data structure used for figuring out which of those three versions actually aligns with which tile while also correctly identifying the lowest tile in the grouping.
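since the “version” idea carries the rest of this section, here’s a small sketch of how a tile id in the 136-tile space splits into a kind and a copy. `split_tile` and `tile_name` are hypothetical helpers, and the suit ordering in `tile_name` (manzu, pinzu, souzu, then honors) is an assumption on my part about how tenhou lays the tiles out, not something stated in mentsu136.txt.

```rust
/// Split a tile id (0..=135) into (kind, copy): kind is 0..=33, copy is 0..=3.
fn split_tile(id: u8) -> (u8, u8) {
    debug_assert!(id < 136, "tile ids only go up to 135");
    (id / 4, id % 4)
}

/// Short name for a tile, ASSUMING kinds are ordered
/// 0..=8 manzu, 9..=17 pinzu, 18..=26 souzu, 27..=33 honors.
fn tile_name(id: u8) -> String {
    let (kind, _copy) = split_tile(id);
    let suit = ['m', 'p', 's', 'z'][(kind / 9) as usize];
    format!("{}{}", kind % 9 + 1, suit)
}
```

under that assumption tile 88 is copy 0 of the 5 of bamboo, and 89, 90, and 91 are the other three copies of the same tile.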

looking back at the union itself, notice the comment left by the developers:

WORD type6:6; // 21*3=63(0x3F)
/*
(0) 01 23①
(1) 02 2①3
(2) 03 ①23
(3) 11 13②
(4) 12 1②3
(5) 13 ②13
(6) 21 12③
(7) 22 1③2
(8) 23 ③12
*/

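not the clearest comment i’ve ever read, but from what i understand of it, we can see that we have allocated 6 bits of information to this field of the union struct, meaning that we can store up to 64 unique values (0 through 63). the devs made a really clever observation here that while we have 136 possible tiles to choose from, we can narrow down that possibility space a lot: the four copies of a tile are already handled by the version bits, which takes us from 136 down to 34 ; honor tiles can’t form sequences, which leaves 27 ; and a sequence can’t start on an 8 or a 9, so only 7 of the 9 values in each suit can be the lowest tile, which leaves 21.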

very neat! the real sleight of hand here is equally as neat: the encoding stores each of the 21 tiles three times in order to represent which tile was given by the opponent, hence the *3 in the comment, bringing us to the immensely satisfying result of 63 values, which fits in 6 bits of information with exactly one value to spare. the usecase here is simple, as we saw before, but it was difficult to reason about without all the ancillary information.

// Take top 6 bits &'ed with the meld to get tile info.
let mut tile = (meld & 0xFC00) >> 10;

// divide by three, removing the "duplicates" representing which tile was given by the other player
tile /= 3;

// build the "step ladder": 7 possible lowest tiles per suit, 9 tile values per suit
tile = 9 * (tile / 7) + (tile % 7);

// scale by 4 (the original's `t *= 4`) to land on copy 0 of that tile in the 136-tile space
tile *= 4;

you can see how this creates a sort of “step ladder”, where every three steps we move up by 4 (representing the next tile, since the 4 copies of each tile are logically grouped together), until we reach 7 at which point we skip 8 and 9 and cycle back to the “1” of the next suit. and each of these three steps along the horizontal line represents which of the three tiles was given up by the opponent to complete the meld.
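the ladder is easier to trust once you’ve walked all 63 rungs of it. here’s a sketch that maps the raw 6-bit field to a (lowest tile kind, called index) pair, where “kind” is the tile value before multiplying by 4. `chii_ladder` is a name i made up ; the arithmetic is the same as in the snippets above.

```rust
/// Map the top 6 bits of a chii (0..=62) to (lowest tile kind, called index).
/// Kind is 0..=26 (three suits of nine values); called index is 0, 1, or 2.
fn chii_ladder(type6: u16) -> (u16, u16) {
    // Which of the three tiles came from the other player.
    let called = type6 % 3;
    // 0..=20: one slot per legal lowest tile of a sequence.
    let start = type6 / 3;
    // 7 legal starts per suit, but 9 kinds per suit, so skip two per suit.
    let kind = 9 * (start / 7) + start % 7;
    (kind, called)
}
```

values 0 through 20 cover the first suit (starts of 1 through 7), 21 through 41 the second, and 42 through 62 the third ; no input ever lands on an 8 or a 9.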

with that, we select the proper version of each tile, using the flags mentioned earlier:

// note that i'm representing these bits backwards from the table representation,
// so the rightmost bit is the least significant.
let mut opened_set = [
        tile +     ((meld & 0x0018) >> 3), // 0000 0000 0001 1000 (t)
        tile + 4 + ((meld & 0x0060) >> 5), // 0000 0000 0110 0000 (t')
        tile + 8 + ((meld & 0x0180) >> 7), // 0000 0001 1000 0000 (t'')
    ];

… and we simply take the mod 3 of those top 6 bits – before any transformation – to figure out which tile was given up by the other player

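// read the rotation from the raw top 6 bits, not from the already-transformed `tile`
let r = ((meld & 0xFC00) >> 10) % 3;
if r == 1 {
    opened_set = [opened_set[1], opened_set[0], opened_set[2]];
} else if r == 2 {
    opened_set = [opened_set[2], opened_set[0], opened_set[1]];
}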

and just like that we’ve taken a 16-bit integer and turned it into a properly ordered 3-tuple representing a chii.
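putting the pieces together, here’s a self-contained sketch of the whole chii decode, following the order of operations in tehai.js. `decode_chii` is my own name for it, and it assumes the caller has already checked that bit 2 is set.

```rust
/// Decode a chii into three tile ids (0..=135), called tile first.
/// Assumes bit 2 of `meld` is set; this sketch does not re-check it.
fn decode_chii(meld: u16) -> [u16; 3] {
    let type6 = (meld & 0xFC00) >> 10;
    // The rotation has to be read before the division below throws it away.
    let called = type6 % 3;
    let start = type6 / 3;
    // Id of copy 0 of the lowest tile in the sequence.
    let base = 4 * (9 * (start / 7) + start % 7);
    let tiles = [
        base + ((meld & 0x0018) >> 3),     // lowest tile + its copy
        base + 4 + ((meld & 0x0060) >> 5), // middle tile + its copy
        base + 8 + ((meld & 0x0180) >> 7), // highest tile + its copy
    ];
    match called {
        1 => [tiles[1], tiles[0], tiles[2]],
        2 => [tiles[2], tiles[0], tiles[1]],
        _ => tiles,
    }
}
```

feeding it the `m="51279"` from the sample log gives `[88, 81, 86]`, which lines up nicely with the log itself: the tag right before that call is `<E88/>`, and both 81 and 86 are sitting in the caller’s starting hand (`hai2`).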

pon perry

vocabulary corner

a pon is a triplet of tiles of the same suit and same value. unlike chii, a pon can be called on any other player.

chiis might be the most elegant of the melds, but that’s not to say that the pon implementation doesn’t have some neat tricks up its sleeve as well. our struct is obviously quite similar:

struct{ // 刻子
  WORD kui:2;
  WORD syuntsu:1; // 0
  WORD koutsu:1;  // 1
  WORD chakan:1;  // 0
  WORD hai_unused:2;
  WORD __padding__:2;
  WORD type7:7;   // 34*3=102(0x66)
};

the first thing of note is that we don’t cut off the bit field directly, like we do in the chii. we save one bit after the koutsu tag to have an “always 0” chakan tag. i’m not really sure why that is, but in practice it basically just serves as padding. next is the two bits of hai_unused, which is rather clever. it functions inversely to the chii’s hai0, hai1, and hai2 in that it represents the unused tile of the triple. neat!

this time, the “data” part of the struct is 7 bits, representing 128 possible values. however, unlike with the chii, we only need 10210 of those values (01100110), but that’s fine since we aren’t as starved for space with the pon because of the “unused hai” trick.

the original code here is extremely clever, but admittedly rather heinous.

var unused = (m & 0x0060) >> 5;
var t = (m & 0xfe00) >> 9;
var r = t % 3;
t = parseInt(t / 3);
t *= 4;
var h = [t, t, t];
switch (unused) {
  case 0:
    h[0] += 1;
    h[1] += 2;
    h[2] += 3;
    break;
  case 1:
    h[0] += 0;
    h[1] += 2;
    h[2] += 3;
    break;
  case 2:
    h[0] += 0;
    h[1] += 1;
    h[2] += 3;
    break;
  case 3:
    h[0] += 0;
    h[1] += 1;
    h[2] += 2;
    break;
}
switch (r) {
  case 1:
    h.unshift(h.splice(1, 1)[0]);
    break;
  case 2:
    h.unshift(h.splice(2, 1)[0]);
    break;
}
if (kui < 3) h.unshift(h.splice(2, 1)[0]);
if (kui < 2) h.unshift(h.splice(2, 1)[0]);

let’s pick this apart and make it a little more readable. as we can see in the comment attached to the struct, the “which player gave up this tile” rotation remains the same as with the chii, and the code for that remains the same. the first hurdle comes with the switch statement: we use the two bits of unused in order to find which tile isn’t in the triple, then create an array with the 3 remaining tiles in “version order” (copy0, then copy1, …). we could probably replace this with a little lookup table like this, to make the operation a little clearer:

let increments: [[u16; 3]; 4] = [
    [1, 2, 3],
    [0, 2, 3],
    [0, 1, 3],
    [0, 1, 2],
];

let mut h = [t, t, t];
let inc = increments[unused as usize];
h[0] += inc[0];
h[1] += inc[1];
h[2] += inc[2];

or, alternatively, we could get a little goofy with it and use some core api stuff i never get a chance to use. tis the season, after all.

let inc = increments[unused as usize];
let h: [u16; 3] = core::array::from_fn(|i| t + inc[i]);

the unshifts at the end, however, i find very unclear. since a chii can only possibly target the player to your left, we only need to figure out which tile came from them. a pon, however, can target any other player, and so after the initial tile rotation (copy/pasted from the chii function) we have to do some more complex rotations depending on which player gave up the tile. if it came from player 3 (relative to us, so if we are WEST (player 2), then player 3 would be SOUTH, despite SOUTH being the “absolute” player 1), we don’t rotate at all, whereas if it came from player 2, we rotate once, and from player 1 we rotate twice. the unshift – while being kind of tricky to reason about – is a very reasonable solution to this, since there isn’t a real “rotate” function for arrays in javascript. in rust, however, we can write this much more concisely:

// maybe an assert!(kui <= 3) to catch any bad parsing
h.rotate_right(3 - kui);
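and here’s the whole pon decode in one place, as a sketch. `decode_pon` and `COPIES` are my own names, the masks are the ones from the javascript above, and the function assumes bit 3 has already been checked and that `kui` is 1, 2, or 3.

```rust
/// Decode a pon into three tile ids (0..=135) in display order.
/// Assumes bit 3 of `meld` is set and kui is in 1..=3.
fn decode_pon(meld: u16) -> [u16; 3] {
    // Which copies are left once the unused one is removed.
    const COPIES: [[u16; 3]; 4] = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]];

    let kui = (meld & 0x0003) as usize;
    let unused = ((meld & 0x0060) >> 5) as usize;
    let type7 = (meld & 0xFE00) >> 9;
    let called = type7 % 3;
    // Id of copy 0 of the tile being pon'd.
    let base = (type7 / 3) * 4;

    let mut h: [u16; 3] = core::array::from_fn(|i| base + COPIES[unused][i]);
    // Same "called tile to the front" shuffle as the chii.
    match called {
        1 => h = [h[1], h[0], h[2]],
        2 => h = [h[2], h[0], h[1]],
        _ => {}
    }
    // Then slide the called tile toward the side the discarder sits on.
    h.rotate_right(3 - kui);
    h
}
```

with `m="43625"` from the sample log this returns `[112, 114, 113]`. the tag right before that call in the log is `<F113/>`, and with `kui` equal to 1 the called tile ends up in the last slot, which is the one that tehai.js draws sideways for the chii (the `kui == 1 ? 3 : 0` line).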

kans

kan kyle

kans are cool. apart from the fact that they’re the funniest thing you can do in riichi mahjong, the tenhou implementation is also very interesting. each of the three kan types has its own branch, and each one behaves a little differently. there’s not a lot of unique logic for them, since they’re functionally very similar to a pon, so we’ll just hit the parts that make them unique.

vocabulary corner

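a kan is when you have control of all four copies of a given tile and you choose to reveal this information. there are three kinds of kan, stemming from the three ways you can collect the tiles: an ankan (closed kan), where you drew all four copies yourself ; a daiminkan (open kan), where you hold three copies and call the fourth from another player’s discard ; and a shouminkan (added kan, or kakan), where you add the fourth copy to a pon you’ve already called.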

kans are counted as triplets when scoring a hand, so in calling one you get to draw an extra tile (to make up for the fact that you formed a triplet with 4 tiles) from the extra special “dead” wall which is otherwise untouched during play. kanning also results in a new dora indicator being revealed which – at least in my experience – means calling kan usually just results in giving your opponent 3 free han.

that’s part of what makes calling kan so funny: while it’s extremely dangerous, it can result in the most rare and exciting yakus such as the coveted Rinshan kaihou 「嶺上開花」(calling kan, drawing the tile from the dead wall, then winning with that tile).

shouminkan sheryl

shouminkans are logically identical to pons, and kind of exist as their logical dual in terms of data representation (instead of holding the “unused” tile, now we hold the “added tile”). besides some stinky, grungy html manip in the javascript – which i can’t resist pasting below – the actual parsing and logic is exactly the same.

if (kui == 3) {
  ret += sprintHai136(
    t + added,
    1,
    'style="position:relative;top:12px;z-index:1;"',
  );
  ret += "<br>";
  ret += sprintHai136(h[0], 1, 'style="position:relative;z-index:1;"');
  ret += '</td><td valign="bottom">';
  ret += sprintHai136(h[1], 0);
  ret += sprintHai136(h[2], 0);
} else if (kui == 2) {
  ret += sprintHai136(h[1], 0);
  ret += "</td><td>";
  ret += sprintHai136(
    t + added,
    1,
    'style="position:relative;top:12px;z-index:1;"',
  );
  ret += "<br>";
  ret += sprintHai136(h[0], 1, 'style="position:relative;z-index:1;"');
  ret += '</td><td valign="bottom">';
  ret += sprintHai136(h[2], 0);
} else if (kui == 1) {
  ret += sprintHai136(h[2], 0);
  ret += sprintHai136(h[1], 0);
  ret += "</td><td>";
  ret += sprintHai136(
    t + added,
    3,
    'style="position:relative;top:12px;z-index:1;"',
  );
  ret += "<br>";
  ret += sprintHai136(h[0], 3, 'style="position:relative;z-index:1;"');
}
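stripped of the html, the data side of the added kan can be sketched like this. `decode_kakan` is a hypothetical helper: the bit positions come from the 加カン struct in mentsu136.txt, and to keep it short the sketch returns the original pon in plain copy order, skipping the called-tile shuffle.

```rust
/// Decode an added kan into (added tile, the three tiles of the original pon).
/// Assumes bit 4 of `meld` is set. The pon comes back in copy order,
/// without the called-tile shuffle, to keep the sketch short.
fn decode_kakan(meld: u16) -> (u16, [u16; 3]) {
    // Which copies are left once the added one is set aside.
    const COPIES: [[u16; 3]; 4] = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]];

    let added = (meld & 0x0060) >> 5;
    // Id of copy 0 of the tile in question.
    let base = ((meld & 0xFE00) >> 9) / 3 * 4;
    let pon: [u16; 3] = core::array::from_fn(|i| base + COPIES[added as usize][i]);
    (base + added, pon)
}
```

i don’t have an added kan in my sample log, so as a made-up example: a meld built by hand as `1 | (1 << 4) | (2 << 5) | (85 << 9)` decodes to the added tile 114 on top of a pon of 112, 113, and 115.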

daiminkan damien && ankan andy

first thing of note is that – for the first time – we have an all new rotation shape for these kans:

WORD type8:8; // 136(0x88)
/*
(0) 01 234①
(1) 02 23①4
(2) 03 ①234
(3) 11 134②
(4) 12 13②4
(5) 13 ②134
(6) 21 124③
(7) 22 12③4
(8) 23 ③124
(9) 31 123④
(A) 32 12④3
(B) 33 ④123
*/

similar to the chiis, the question we’re asking is clear: “which tile is coming from the other player?” this time we use the kui value to identify the source of the fourth tile (since it could be anyone, including the current player for the ankan). the top 8 bits represent our “tile 0”, kind of like with the chii, with the bottom two bits of “tile 0” deciding its position in the rotation table.11

var hai0 = (m & 0xff00) >> 8;
if (!kui) hai0 = (hai0 & ~3) + 3; // ANNKAN

the rotation logic is functionally identical to that of pons, but with the positioning at the end replaced with some more precise swaps, rather than unshifts.

switch (hai0 % 4) {
  case 0:
    h[0] += 1;
    h[1] += 2;
    h[2] += 3;
    break;
  case 1:
    h[0] += 0;
    h[1] += 2;
    h[2] += 3;
    break;
  case 2:
    h[0] += 0;
    h[1] += 1;
    h[2] += 3;
    break;
  case 3:
    h[0] += 0;
    h[1] += 1;
    h[2] += 2;
    break;
}
if (kui == 1) {
  var a = hai0;
  hai0 = h[2];
  h[2] = a;
}
if (kui == 2) {
  var a = hai0;
  hai0 = h[0];
  h[0] = a;
}

the html manip makes it relatively clear what’s happening: closed kans are rendered with the first and last tiles face down (hence the undefined) with the middle two tiles always being the first two of the array.

ret += sprintHai136(kui ? hai0 : undefined, kui == 3 ? 1 : 0);
ret += sprintHai136(h[0], kui == 2 ? 1 : 0);
ret += sprintHai136(h[1], 0);
ret += sprintHai136(kui ? h[2] : undefined, kui == 1 ? 3 : 0);
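here’s a sketch of the same decode in rust, returning the four tiles in the order the javascript renders them. `decode_kan` is my own name, it assumes bits 2 through 5 are all clear, and it assumes the three remaining tiles start from `(hai0 / 4) * 4` the way the pon’s do.

```rust
/// Decode a daiminkan or ankan into four tile ids in display order.
/// Assumes bits 2..=5 of `meld` are clear. Mirrors the swaps in tehai.js.
fn decode_kan(meld: u16) -> [u16; 4] {
    // Which copies are left once hai0's own copy is removed.
    const COPIES: [[u16; 3]; 4] = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]];

    let kui = meld & 0x0003;
    let mut hai0 = (meld & 0xFF00) >> 8;
    if kui == 0 {
        // Closed kan: force the last copy. Same result as (hai0 & !3) + 3.
        hai0 |= 3;
    }
    // Id of copy 0 of the tile in question.
    let base = hai0 / 4 * 4;
    let mut h: [u16; 3] = core::array::from_fn(|i| base + COPIES[(hai0 % 4) as usize][i]);
    match kui {
        1 => core::mem::swap(&mut hai0, &mut h[2]),
        2 => core::mem::swap(&mut hai0, &mut h[0]),
        _ => {}
    }
    [hai0, h[0], h[1], h[2]]
}
```

as a made-up example, a kan on tile 53 with `kui` of 2 comes out as `[52, 53, 54, 55]` with the called tile in the second slot, and the same tile with `kui` of 0 comes out as `[55, 52, 53, 54]`. for what it’s worth, `(x & !3) + 3` and `x | 3` agree for every 8-bit value, since clearing the bottom two bits and then adding 3 can never carry.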

tenhou terry

no nuki ?

no nuki. in the source code, the parser literally skips the case.

}else if (m&(1<<5)){ // NUKI
// nop
}else{ // MINNKANN, ANNKANN

this is natural, since nuki is a really particular call. it doesn’t exist in the standard 4-player ruleset, so for tenhou’s purposes it’s a 3-player only rule.

simply put, if you draw a NORTH wind tile, you can choose to put it down, gain one dora, and then draw from the dead wall, like a kan. it doesn’t involve any other players and it doesn’t meld with other tiles, so there are no calculations or bitfield accesses necessary.

we’re very lucky that this source code exists and is archived, both for practical and historical purposes. tenhou is an incredibly important piece of mahjong history, but even if it wasn’t, video game source code preservation and documentation is a pretty underappreciated thing. the number that i tend to see getting thrown around is that “90% of pre-2000 video game source code” is gone. the joy of being able to pick through and have that odd sort of intellectual asynchronous conversation through studying the source code of a program that inspired you is already impossible for the overwhelming majority of the medium’s works. and a joy it is to be able to pick apart and take a look into the minds of the people that made such a historically important piece of software.

thank you to mthrok for having published such a well put together parser. without it, i don’t think i would have been able to make sense of the weird xml representation tenhou uses without an insane amount of trial and error, and i don’t think i would have ever found the original source code on tenhou.net. hopefully this article finds other people who are equally interested in tenhou and mahjong programming and can help others make even more sense of their format, and even a little bit about the history of the platform and who made it.

if you’re interested in further information, please have a look at hatenablog’s post12 documenting the log format. it goes into more detail on the XML format and the other tags that I didn’t talk about here. i’ll be using it as my main reference to write my own parser.

happy jonging

mason pike
march 5th, 2026

  1. not to reduce these facts nor belittle my own personal beliefs. if i have to say it clearly: this technology is harmful. i just don’t want this to become “another blog post talking about it.”↩︎

  2. required reading for the uninitiated ; optional watching for the lazy↩︎

  3. internet archive link for that, just in case. as stated, i’m fairly certain that these are up for reference purposes, but it is a little strange that they’re in an image directory and i’d be devastated if the script ever got truly “lost”.↩︎

  4. another ia link. this one has waaaaaayyy less crawls by the wayback machine. glad i caught it↩︎

  5. first of all, my condolences.↩︎

  6. the “callee”. the player whose tile was taken.↩︎

  7. following from the works of Sanjiang Li, Xueqing Yan, and Yongming Li mentioned above – but with a bit of variation – i’m using t to represent a given tile, t' another tile with the same value and suit. likewise t'' represents another tile of the same suit and value which is itself distinct from t and t'.↩︎

  8. yes, yes, this is assuming we don’t do any fancy compiler directives to manipulate the padding, no “bitflag enum” magic or anything equally clever. what i mean to say is: i don’t know if there is a method which is less complicated that allows us to do the same thing in the same space with the same guarantees↩︎

  9. you might be raising your eyebrows at this, and i think you’d be right. this also strikes me as a weird implementation detail. why hold this information for ALL union members when the only valid value is 0 for chii and nuki? why not use these two bits – for chii – to hold which tile came from the opponent ? i don’t really have a good answer for that, and i don’t want to speculate one way or the other. i’m going to continue here with the assumption that it is necessary that this information (kui) be encoded since it’s used in the sprintHai136() function in the original code:

    ret += sprintHai136(h[0], kui == 3 ? 1 : 0);
    ret += sprintHai136(h[1], kui == 2 ? 1 : 0);
    ret += sprintHai136(h[2], kui == 1 ? 3 : 0);

    given that kui is the position of the chii’d player relative to you, shouldn’t this always return the same value? I think so, but my intuition tells me that there’s something missing here that i’m not seeing, so i – as all good teachers do – leave this question open as an exercise for the reader.↩︎

  10. because with a pon we can have honor tiles, 8s, and 9s, so our possibility space is a bit wider.↩︎

  11. i admit i don’t really understand this (hai0&~3)+3; operation. of course we’re setting the rotation to 0 if it’s a closed kan, but wouldn’t this just set the bottom two bits to 00 only to set them back to 11 ? isn’t this strictly equivalent to hai0 | 3 for all unsigned integers? i’m not sure, and i’m given to assuming this is a chesterton’s fence type thing to guard against mistyped values or something (the (hai0/4)*4 that comes later gives me the same vibe), so i won’t speculate too much.↩︎

  12. obligatory ia link↩︎