krabby: making an AST


by arya dradjica on

In my previous post about krabby, I made a "scope tree" data structure with which I performed local name resolution. It was incredibly efficient compared to syn's AST, but I wanted a more realistic comparison of this memory layout style to syn. So I spent most of this week building a whole AST from scratch.

As a quick teaser: the results were excellent! The work-in-progress AST is currently about 60 times smaller than syn's, but I expect it to be about 25 times smaller once it's finished (I just haven't gotten around to encoding types yet). I'm still a bit surprised that there was so much space to optimize away. To be clear: rustc only uses the AST temporarily, and has more efficient data structures for later stages, so krabby as a whole isn't going to use 25 times less memory than rustc. But this gives us some measure of the benefits of krabby's memory layout techniques, and it only makes me more excited to get to the later stages.

The AST is incredibly memory-efficient, but it's also designed to be really fast to traverse. It's separated into expressions, paths, patterns, items, and types. These categories have very different semantics and store very different data; storing them separately allows them to use more specialized representations. Most traversals focus on one category, rather than the whole AST; they can iterate through the nodes for that category without loading the memory of the rest of the AST.

the simple nodes

Expressions, patterns, and types are represented relatively similarly to syn or rustc. The same node types exist—binary operations, slice patterns, reference types—and have the same meanings. What's changed is everything around them.

Each category is stored as three parallel arrays, each holding one field per node. A node is identified by its (32-bit) index into these arrays. The first array is the tag; it tracks the type of the node, and may embed a few additional bits of information where convenient. The second array is the parent; it stores the index of the node's parent (if any), instead of pointing to the node's children. The last array is the associated data; it provides auxiliary information each node needs, e.g. the name of the method in a method call.
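The layout above can be sketched roughly as follows. This is a minimal, hypothetical version — the names (`ExprTag`, `Exprs`, `push`) and the exact field encodings are my own illustration, not krabby's actual code:

```rust
// Illustrative sketch of the struct-of-arrays node storage.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ExprTag {
    Binary,
    MethodCall,
    Block,
    // ... one variant per expression kind
}

#[derive(Default)]
struct Exprs {
    /// The kind of each node (plus any embedded flag bits).
    tags: Vec<ExprTag>,
    /// The index of each node's parent; `u32::MAX` marks a root here.
    parents: Vec<u32>,
    /// Per-node auxiliary data, e.g. an interned method name.
    data: Vec<u32>,
}

impl Exprs {
    /// Append a node and return its 32-bit index.
    fn push(&mut self, tag: ExprTag, parent: u32, data: u32) -> u32 {
        let id = self.tags.len() as u32;
        self.tags.push(tag);
        self.parents.push(parent);
        self.data.push(data);
        id
    }
}
```

Because each field lives in its own array, a traversal that only needs tags (say, counting method calls) touches one densely packed array and nothing else.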

In a few cases, complex nodes are broken down into smaller pieces. For example, if-else expressions are represented by a top-level IfElse node, which has zero or more ElseIf nodes and exactly one Else node as children. In a bottom-up traversal, the explicit node types make it easier to distinguish sibling expressions (e.g. the alternate branches for the if-else) from each other. rustc represents else-if cases using nested if expressions, but the flattened representation used here should help reduce the depth of the AST.
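To make the flattening concrete, here is a toy sketch (with hypothetical tag names) of how an if-else chain becomes one `IfElse` node with its alternate branches as direct children, instead of a nest of `if` expressions:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum CondTag {
    IfElse, // the top-level node, owning the first condition and block
    ElseIf, // zero or more, one per `else if` branch
    Else,   // exactly one, for the trailing `else` block
}

/// Flatten `if a {..} else if b {..} ... else {..}` into (tag, parent)
/// pairs: every alternate branch is a direct child of node 0.
fn flatten_if_chain(else_ifs: usize) -> Vec<(CondTag, Option<u32>)> {
    let mut nodes = vec![(CondTag::IfElse, None)];
    for _ in 0..else_ifs {
        nodes.push((CondTag::ElseIf, Some(0)));
    }
    nodes.push((CondTag::Else, Some(0)));
    nodes
}
```

The depth of the chain stays constant no matter how many `else if` branches appear, which is the property the flattened representation is after.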

syn and rustc dedicate a lot of memory to storing span information. This is useful for locating an AST node in the user's source code for diagnostics, but it isn't as relevant to the success path of compilation (at best, those spans will be carried forward for diagnostics at later compilation stages). We won't exclude them, but I'm borrowing a page from the Zig compiler here: we'll store a 32-bit token index for each AST node, locating a relevant token from the original token stream. This is a compact representation that is stored away from the rest of the AST, and it should avoid polluting the cache during traversals.
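A sketch of what that side-table might look like — again with invented names, assuming one token index per node and a token stream that records byte offsets:

```rust
struct Tokens {
    /// Byte offset of each token in the original source.
    starts: Vec<u32>,
}

struct NodeSpans {
    /// One 32-bit token index per AST node, kept in a side array
    /// that ordinary traversals never load.
    token_of_node: Vec<u32>,
}

impl NodeSpans {
    /// Recover a source location only when a diagnostic needs one.
    fn byte_offset(&self, node: u32, tokens: &Tokens) -> u32 {
        tokens.starts[self.token_of_node[node as usize] as usize]
    }
}
```

The double indirection (node → token → byte offset) costs a little on the diagnostic path, but diagnostics are rare; the win is that the hot traversal loops never pull span data into cache.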

paths

Paths have the most creative representation of the entire AST. Paths are stored using two arrays (not counting the auxiliary array for tokens). These arrays have elements corresponding to path segments, not entire paths. The first array stores the tag of the segment, distinguishing e.g. an identifier segment (foo in foo::bar) from the generic arguments in a segment (e.g. <u32> in Vec<u32>). The tag specifies whether the segment is the first or last in the containing path, and (to some extent) what segments are before or after it. The second array stores associated data for the segment, such as the identifier for a simple path segment.
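One plausible encoding of this scheme, sketched with hypothetical names — the real tags carry more positional information than the two flags shown here:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum SegTag {
    /// An identifier segment, recording its position in the path.
    Ident { first: bool, last: bool },
    /// Generic arguments attached to the preceding segment.
    GenericArgs { last: bool },
}

#[derive(Default)]
struct Paths {
    tags: Vec<SegTag>,
    /// Associated data, e.g. an interned identifier for `Ident` segments.
    data: Vec<u32>,
}

impl Paths {
    /// Append one segment and return its index.
    fn push_segment(&mut self, tag: SegTag, data: u32) -> u32 {
        self.tags.push(tag);
        self.data.push(data);
        (self.tags.len() - 1) as u32
    }
}
```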

The path category also stores method references, external crate declarations, and import paths. These are assigned unique tag values, allowing them to be distinguished from normal paths. In any case, paths are referenced by storing the 32-bit index of the last segment of the path; a path is considered resolved when its last segment is resolved, so pointing to it (rather than the first segment) makes it easier to retrieve the results of name resolution.
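Since a path is referenced by its last segment, recovering the full path means walking backwards until a segment marked as first. A minimal sketch of that walk, reducing the tag to a single "starts a path" flag for illustration:

```rust
/// Given per-segment "starts a path" flags and the index of a path's
/// last segment (how paths are referenced in this design), walk
/// backwards to recover every segment index of that path, in order.
fn collect_path(is_first: &[bool], last: u32) -> Vec<u32> {
    let mut segs = Vec::new();
    let mut i = last as usize;
    loop {
        segs.push(i as u32);
        if is_first[i] {
            break;
        }
        i -= 1;
    }
    segs.reverse();
    segs
}
```

This works because segments of one path are contiguous in the arrays, so no per-path linked structure is needed at all.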

items

Items are stored much like the other "normal" categories, with one important exception: the item category stores a copy of every block expression in the AST, as blocks can contain items. Explicitly tracking block expressions (rather than pointing back into the expression category) makes the item category independently traversable, which simplifies its use during global name resolution (it provides all the possible declarations to resolve against).
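A rough sketch of why copying block expressions into the item category pays off — names are invented, and the scan below is deliberately naive:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ItemTag {
    Fn,
    Struct,
    /// A copy of a block expression: blocks can declare items, so they
    /// appear here to keep the category independently traversable.
    BlockExpr,
}

#[derive(Default)]
struct Items {
    tags: Vec<ItemTag>,
    parents: Vec<u32>,
}

impl Items {
    fn push(&mut self, tag: ItemTag, parent: u32) -> u32 {
        let id = self.tags.len() as u32;
        self.tags.push(tag);
        self.parents.push(parent);
        id
    }

    /// Every declaration in a scope is found by scanning this category
    /// alone: no expression memory needs to be touched.
    fn declarations_in(&self, scope: u32) -> Vec<u32> {
        (0..self.tags.len() as u32)
            .filter(|&i| self.parents[i as usize] == scope)
            .collect()
    }
}
```

Without the `BlockExpr` copies, finding the items declared inside a function body would require chasing into the expression arrays mid-scan.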

the results

You can find the code on SourceHut. At the moment, I haven't implemented literals or types. Even so, the results are undeniable:

--- Parsing with syn
    ... done (23.812s)
        memory usage: +11048.055M
--- Building the syntax tree
    ... done (1.175s)
        memory usage: +177.641M
Syntax {
    idents: Idents { .. },
    paths: Paths {
        len: 2395012,
        cap: 4194304,
        num_segs: 1576443,
        num_type_segs: 16005,
        num_method_segs: 563318,
        num_extern_crate_segs: 692,
        num_use_segs: 238554,
    },
    pats: Pats {
        len: 2652254,
        cap: 4194304,
    },
    exprs: Exprs {
        len: 5614874,
        cap: 8388608,
    },
    items: Items {
        len: 3046565,
        cap: 4194304,
        num_block_exprs: 602798,
    },
}

My representation is currently about 60 times smaller than syn's AST. Even once I implement literals and types and store token / span information, I expect my AST to remain at least 10 times smaller than syn's. I'm really happy with these results! They only measure memory efficiency; I can't wait to implement actual traversals over this AST to demonstrate its access efficiency as well.

I've included some debug output from the AST. Given the large expanse of code I'm benchmarking against, I'm going to treat it like "the average Rust code", which is exactly what I'm optimizing for. The ratios between the various node types indicate the relative frequency of these parts of the AST, and they can help judge whether changes to the representation (especially those with subtle tradeoffs) will improve performance. This will guide any optimization of the AST.

what's next

Since I'm taking so much inspiration from Co-dfns, I took a moment to consider writing krabby in APL. It would be a lot more concise (I'm already at 6000 lines of code) and could be more performant, since Dyalog APL has good SIMD routines. While I had some trouble setting up Dyalog (due to keyboard configuration issues), I realised it would be much harder to document all the little details of my data structures in APL, and I wouldn't be able to control the low-level details of inter-thread communication for maximizing parallelism. I might reconsider (possibly with a sister language like J) once I've written my own parser and lexer.

During this week, I also talked to some more knowledgeable friends about name resolution. Global name res is significantly more complicated than local name res, and I needed some advice on how to go about doing it. It's interesting because it requires data from different files to be brought together, raising lots of questions about parallelism and efficient thread-safe communication. On the other hand, it cannot be disentangled from macro expansion, which I don't want to waste time implementing right now.

Although I was advised against it, I think I'm going to have to implement global name res soon. It'll also force me to set up a lot of architecture for krabby, such as parsing Cargo.toml, recognizing crates, loading the right source files, and launching worker threads. While macro expansion is going to be bad, I'll try to copy as much code from rustc as I can. Once that is all set up, I can begin benchmarking krabby against rustc directly, since they'll be doing all the same work.