krabby: the architecture


by arya dradjica

I felt a bit stuck this week. While I have developed an interesting AST to experiment on, going any further with it is difficult. The next major step is name resolution, and that will require the ASTs of different files to interact with each other. My current testing / benchmarking setup, while good for executing things per file, can't deal with that. Now that I know these memory layout techniques will be effective, I decided to start over with a focus on krabby's high-level architecture.

the task architecture

At a high level, compilation is composed of a large number of small tasks, such as the parsing of a source file or the type-checking of a specific function. Every compiler defines some way for tasks to be generated and distributed throughout the system, whether it is implicit in the control flow or explicitly implemented as an "engine" of some kind. For example, rustc uses a query-based architecture.

While rustc's architecture works pretty well for incremental builds, I think non-incremental builds can be made faster. I'm intrigued by the idea of batched execution: it might be faster to execute similar tasks (e.g. the type-checking of some function bodies) in a group rather than independently. I'm also optimistic about an "eager" architecture, where some tasks (e.g. the parsing of a source file) are generated upfront even if we don't know whether they'll be used. In contrast to a "lazy" architecture like rustc's, this adds more opportunities for early parallelism, and is a nice optimisation if these "guessed" tasks are likely to be used.

My vision for krabby is an eager, push-based task architecture. For every kind of task (e.g. parsing, name resolution, type-checking), there is a queue of pending tasks and a set of runners consuming from that queue. Compilation starts from (for example) a "parse lib.rs" task, which will lead to new tasks being pushed onto the various queues. Runners will wait on these queues and execute tasks as they arrive.
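As a minimal sketch of this push-based design (the task kinds and field names here are illustrative, not krabby's actual ones), each kind of task gets its own queue, and executing one task pushes follow-up tasks onto the appropriate queues:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical task kinds; a real compiler would have many more.
enum Task {
    Parse { path: String },
    Resolve { file_id: u32 },
    TypeCheck { fn_id: u32 },
}

// One queue per task kind, so runners can consume a single kind in bulk.
struct Queues {
    parse: Mutex<VecDeque<Task>>,
    resolve: Mutex<VecDeque<Task>>,
    type_check: Mutex<VecDeque<Task>>,
}

impl Queues {
    fn new() -> Self {
        Queues {
            parse: Mutex::new(VecDeque::new()),
            resolve: Mutex::new(VecDeque::new()),
            type_check: Mutex::new(VecDeque::new()),
        }
    }

    // Route a task onto the queue matching its kind.
    fn push(&self, task: Task) {
        let queue = match task {
            Task::Parse { .. } => &self.parse,
            Task::Resolve { .. } => &self.resolve,
            Task::TypeCheck { .. } => &self.type_check,
        };
        queue.lock().unwrap().push_back(task);
    }
}

fn main() {
    let queues = Queues::new();
    // Compilation starts from a single root task...
    queues.push(Task::Parse { path: "src/lib.rs".into() });
    // ...and running it would push follow-up tasks, e.g.:
    queues.push(Task::Resolve { file_id: 0 });
    assert_eq!(queues.parse.lock().unwrap().len(), 1);
    assert_eq!(queues.resolve.lock().unwrap().len(), 1);
}
```

In a real implementation the queues would be concurrent (e.g. work-stealing deques) rather than mutex-guarded, but the routing logic is the same.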

There is a fixed number of worker threads, and each is capable of executing any kind of task (i.e. every thread has a runner for every kind of task). A work-stealing architecture is used: tasks default to being added to a thread-local queue, but an idle thread can steal tasks from its siblings' queues. This way, threads prioritize working on data that is already in their local cache. Runners representing earlier stages (e.g. lexing) are prioritized over later ones, because they have longer critical paths: their outputs feed more downstream work, so they need to be executed early for compilation to finish quickly.
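One way to combine the two priorities is to scan stages from earliest to latest, and within each stage prefer the local queue before stealing. This is only a sketch of that selection logic under those assumptions (tasks reduced to IDs, mutex-guarded queues standing in for proper work-stealing deques, made-up stage names):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Stage indices; lower = earlier in the pipeline, so higher priority.
// (The stage list is illustrative, not krabby's actual one.)
const STAGES: usize = 3; // 0 = lex/parse, 1 = resolve, 2 = type-check

type Queue = Mutex<VecDeque<u32>>; // tasks are just IDs in this sketch

struct WorkerThread {
    local: [Queue; STAGES],
}

// Pick the next task for thread `me`: scan stages from earliest to
// latest; within a stage, prefer the local queue (its data is likely
// in this core's cache), then try to steal from sibling threads.
fn next_task(me: usize, threads: &[WorkerThread]) -> Option<(usize, u32)> {
    for stage in 0..STAGES {
        if let Some(task) = threads[me].local[stage].lock().unwrap().pop_back() {
            return Some((stage, task));
        }
        // Steal from the oldest end of a sibling's queue.
        for (i, other) in threads.iter().enumerate() {
            if i == me { continue; }
            if let Some(task) = other.local[stage].lock().unwrap().pop_front() {
                return Some((stage, task));
            }
        }
    }
    None
}

fn main() {
    let threads: Vec<WorkerThread> = (0..2)
        .map(|_| WorkerThread {
            local: std::array::from_fn(|_| Mutex::new(VecDeque::new())),
        })
        .collect();
    threads[0].local[2].lock().unwrap().push_back(7); // local type-check task
    threads[1].local[0].lock().unwrap().push_back(3); // sibling's parse task
    // The parse task wins: earlier stages come first, even via stealing.
    assert_eq!(next_task(0, &threads), Some((0, 3)));
    assert_eq!(next_task(0, &threads), Some((2, 7)));
    assert_eq!(next_task(0, &threads), None);
}
```

Note the trade-off this encodes: stage priority beats cache locality, so a thread will steal an early-stage task rather than run its own late-stage one. The opposite ordering is also defensible; which wins in practice is a question for benchmarks.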

Runners attempt to load tasks in batches. This way, the CPU is executing the same code repeatedly, which means that it will stay in (and benefit from) the code cache. Combined with the memory layout techniques from previous posts, this opens up opportunities for vectorization and batching across tasks. I'm particularly excited for tasks that perform lots of "database" lookups (e.g. trait resolution).
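A batched runner might look roughly like this sketch (the batch size and helper name are my own, not krabby's): grab up to N tasks under one lock acquisition, release the lock, then run them back-to-back so the task's code stays hot in the instruction cache.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Pull up to `batch` tasks in one lock acquisition, then run them
// back-to-back. Returns how many tasks were executed.
fn run_batch<T>(queue: &Mutex<VecDeque<T>>, batch: usize, run: impl Fn(T)) -> usize {
    let mut taken = Vec::with_capacity(batch);
    {
        let mut q = queue.lock().unwrap();
        for _ in 0..batch {
            match q.pop_front() {
                Some(task) => taken.push(task),
                None => break,
            }
        }
    } // lock released before any task runs
    let n = taken.len();
    for task in taken {
        run(task);
    }
    n
}

fn main() {
    let queue = Mutex::new((1..=5).collect::<VecDeque<u32>>());
    let sum = std::cell::Cell::new(0);
    assert_eq!(run_batch(&queue, 3, |t| sum.set(sum.get() + t)), 3);
    assert_eq!(run_batch(&queue, 3, |t| sum.set(sum.get() + t)), 2);
    assert_eq!(run_batch(&queue, 3, |_| {}), 0);
    assert_eq!(sum.get(), 15);
}
```

The further step hinted at above — handing the whole batch to one function so it can share lookups or vectorize across tasks — would change the signature to take `Vec<T>` instead of running tasks one at a time.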

Tasks can be added to the system speculatively. For example, the compilation of a crate might start with a "parse all Rust files within the src directory" task. Even though some of these files might not be included in the crate, it is highly likely that they will be, so parsing them upfront (instead of waiting for mod statements to be discovered in parsed files) is a good idea.
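Seeding the speculative parse tasks could be as simple as a directory walk — here is a sketch (the function name is hypothetical) that collects every `.rs` file under `src/` without waiting for `mod` declarations:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Speculatively collect every `.rs` file under `src` as a parse task,
// rather than waiting for `mod` statements to reveal which are needed.
fn speculative_parse_tasks(src: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut tasks = Vec::new();
    let mut dirs = vec![src.to_path_buf()];
    while let Some(dir) = dirs.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                dirs.push(path);
            } else if path.extension().map_or(false, |e| e == "rs") {
                tasks.push(path);
            }
        }
    }
    Ok(tasks)
}
```

If a file turns out not to belong to the crate, its parse result is simply discarded; the bet is that wasted parses are rare and the early parallelism pays for them.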

daemonization

While rustc performs some parallelization, it was not originally designed for it. Much like build systems drive C compilers, Cargo would (and still does) execute rustc in parallel, once per crate in the dependency graph, so parallelism within a single crate mattered less. But each invocation of rustc incurs startup overhead, tasks can't be batched across crates, and a crate can only start compiling once its dependencies have finished.

rustc is also incompatible with the needs of a language server, and so rust-analyzer has to re-implement a lot of compilation logic. At a high level, a language server basically re-compiles your code repeatedly with very small incremental changes; in theory, this could be supported as part of the compiler's standard incremental compilation architecture.

I want krabby to operate like a daemon. It can compile multiple crates, and a language server can be implemented in the same codebase (reusing most tasks, with slightly different implementations of some). By using a long-lived daemon, I can cache data in memory (as a faster alternative to the disk), execute tasks in bigger batches, and share resources across different compilations. By providing a wrapper that looks like rustc but submits jobs to the daemon, I can support any existing build system (though I will build in dedicated support for Cargo for better performance).

As a fallback, for conditions where a daemon is unsuitable (e.g. due to security restrictions or containerization), a one-shot runner executable will also be provided. This will offer the same functionality as the daemon, but will only be capable of executing an initially-provided set of tasks. It will be faster to start up but more costly if used repeatedly.

implementation

Here's the plan, in order: