This week has been incredibly productive. I was one of the volunteers for RustWeek, which was a massive event with roughly 800 attendees and the first All-Hands (a gathering of all Rust contributors) in over six years. I met so many amazing people in the Rust community, and I had really interesting conversations with them about compiler performance and Krabby. I'm gratified to have received confirmation that I'm on the right track with this work, and I'm excited to see where I can take Krabby in the future.
In more concrete terms, I put together the task architecture I discussed in the previous post, and my estimate of four weeks turned out to be surprisingly accurate. It took a while to wrap my head around the architecture well enough to turn it into a concrete implementation, but it's here now. I've published the updated source code on SourceHut (now with a basic README and license).
The missing piece of the task architecture was the concept of a context. Every task is associated with a context, which is essentially just a set of hooks for reacting to the completion of that task. For example, the context for a lexing task may route the resulting token stream to the parser. Because the context is responsible for gluing different kinds of tasks together, the implementation of each task is independent of the rest of the system; this provides a degree of isolation that makes the codebase easier to reason about.
Here's a snippet of code demonstrating this architecture. This piece establishes the public-facing API of the lexer.
```rust
use std::sync::Arc;

// Assuming proc-macro2's `TokenStream`, since that's what is used for lexing
// for now.
use proc_macro2::TokenStream;

/// Lex a source file.
pub struct Lex<C: LexContext> {
    /// The contents of the file.
    pub contents: Arc<[u8]>,
    /// The task context.
    pub context: C,
}

/// Context for lexing.
pub trait LexContext {
    /// React to the completion of lexing.
    fn on_lexed(self, output: Arc<TokenStream>);
}
```
The `Lex` type represents a single lexing task; an instance of it would be appended to a (possibly thread-local) work queue for the actual lexer runner. The context is expressed as a generic type parameter. In the real compiler, this would be a small `enum` describing the possible contexts in which a file may be lexed; its implementation of `LexContext` would route the token stream to the parser (with a separate `ParseContext` type for parsing context). In my current codebase, all the contexts gluing together the tasks for the compiler can be found in a single module.
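To make the gluing a little more concrete, here's a rough sketch of what such a context enum might look like. Only `Lex` and `LexContext` come from the snippet above; the enum, its variant, and the routing shown here are hypothetical rather than taken from the actual codebase.

```rust
use std::sync::Arc;

use proc_macro2::TokenStream;

// `LexContext` is the trait from the snippet above. The enum below and the
// routing in `on_lexed` are illustrative only.

/// The situations in which a file might be lexed (hypothetical).
pub enum FileLexContext {
    /// The file belongs to a crate being compiled, so its tokens should
    /// flow into a parsing task next.
    CrateModule,
}

impl LexContext for FileLexContext {
    fn on_lexed(self, output: Arc<TokenStream>) {
        match self {
            FileLexContext::CrateModule => {
                // Glue code would construct a parsing task here, e.g.
                // `Parse { tokens: output, context: ... }`, and push it
                // onto a work queue for a parser runner to pick up.
                let _ = output;
            }
        }
    }
}
```

The key point is that `Lex` itself knows nothing about parsing; the routing decision lives entirely in the context.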
Since the last blog post, I implemented basic Cargo functionality (the loading and parsing of Cargo manifests, and the discovery of workspaces and workspace members). I was then able to route all discovered Cargo packages to sourcing (i.e. loading source files), after which I implemented lexing (simply using `proc-macro2` for now). I'm using the `tracing` library to see what the code is doing. All put together, I have a working proof-of-concept of the Krabby compiler!
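As a sketch of how a lexing task can lean on proc-macro2, something along these lines works (the function name, error handling, and trace message here are illustrative rather than the actual implementation):

```rust
use std::{str, sync::Arc};

use proc_macro2::TokenStream;

// `Lex` and `LexContext` are the types from the earlier snippet.

/// A minimal lexing task body: decode the file contents and let
/// proc-macro2 do the tokenization, then hand the result to the context.
fn run_lex_task<C: LexContext>(task: Lex<C>) {
    let source = str::from_utf8(&task.contents).expect("source file is not valid UTF-8");
    let tokens: TokenStream = source.parse().expect("failed to lex source file");
    tracing::trace!("lexed a source file");
    task.context.on_lexed(Arc::new(tokens));
}
```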
If you're interested, you can invoke `krabby build <packages>`, where `<packages>` is a list of directories containing Cargo packages to be sourced and lexed. Krabby will load the Cargo manifest of each package and load and tokenize every source file, using 4 threads. It will output a trace-level log of all the steps it is executing to standard output via `tracing`. Watching it actually distribute tasks effectively and execute quite fast (even without any optimization effort) has been really amazing.
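As a rough illustration of the shape of this setup (not the actual scheduler), a naive version might use a shared queue, four threads from std, and `tracing_subscriber` (an assumption here; only `tracing` itself is a given) to print trace-level events to standard output:

```rust
use std::{
    collections::VecDeque,
    sync::{Arc, Mutex},
    thread,
};

/// A deliberately naive sketch of running boxed tasks on 4 worker threads.
/// Real work queues (possibly thread-local ones) are more involved.
fn run_workers(tasks: Vec<Box<dyn FnOnce() + Send>>) {
    // Emit trace-level events to standard output.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::TRACE)
        .init();

    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || loop {
                // Take the lock only long enough to pop a task, then run it
                // without holding the lock.
                let task = queue.lock().unwrap().pop_front();
                match task {
                    Some(task) => task(),
                    None => break,
                }
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
}
```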
My short-term goal is to implement name resolution (and perhaps macro expansion), as I will then be able to benchmark and test Krabby against `rustc` comprehensively. While I could implement a simple parsing task using `syn` very quickly, I don't want to implement name resolution on top of inefficient data structures. I'm going to take a few weeks to implement efficient memory layouts for token streams and the AST (while still using `proc-macro2` and `syn` for lexing/parsing). I will then dive headfirst into my highly parallel name resolution scheme, which I'll cover in a future blog post.
RustWeek has left me really energized, and with Krabby development going so well, I'm excited to see what the future holds.