omitting spans


by arya dradjica

I have a large backlog of posts to write, but I quickly wanted to share a fun idea I explored on Thursday. I’ve been thinking about how incremental compilation should work in Krabby, and a major sticking point there is spans. But I think I have a plan.

A span refers to some portion of source code, e.g. 21:43 -- 22:17 in "foo.rs"; it is used for referring to source code when rustc prints diagnostics. You can find spans in most compilers. But rustc encodes more information into them: in particular, the edition and macro expansion they originate from. Macro expansion data is useful for diagnostics in a very similar way to line and column information; rustc diagnostics can refer to code originating from a macro invocation and print information about the macro invocation and the macro definition. But the most interesting part here is editions.

Rust’s edition mechanism is very interesting. It’s not a language standard, like C11 or C23; a Rust project can choose a certain edition and remain fully compatible with other Rust libraries. The choice of edition affects how the code within the project is processed, but remains a project-local choice. Conceptually, all the crates in your dependency graph are processed into an (effectively) edition-independent representation before being compiled together. Editions can change syntactic choices (e.g. whether async is a keyword) but (usually) not cross-crate interactions like type-checking behavior. I’m hedging my words a bit because rustc sometimes abuses editions to make bigger changes.

The association of edition data with individual tokens causes … complications. In most compilers, span data would only be relevant for printing diagnostics. But rustc depends on the edition data encoded by spans throughout compilation; the appendix lists the use sites I found.

The end result? rustc has to pass around spans everywhere, all the time. Spans are embedded in every intermediate representation within rustc – every token, every AST node, every HIR expression, every MIR statement. Spans are 8 bytes in size, but they reference more data stored in thread-local storage. rustc doesn’t use struct-of-arrays layouts for its data structures (due to the added complexity), so spans live next to important data and take up valuable space in cache. To worsen the problem, spans affect incremental compilation: innocuous changes to spans (e.g. because you re-order the functions in a file) can cause unnecessary rebuilds. rustc does the best it can in difficult circumstances, but I think a bigger change is necessary.
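To make the incremental-compilation problem concrete, here is a toy model (not how rustc actually fingerprints items). The `Item`, `Span`, and `fingerprint` names are hypothetical. It hashes an item with and without its span, to show that moving an item within a file changes the first hash but not the second.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical span: a byte range within a single file.
#[derive(Clone, Copy, Hash)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
}

/// Hypothetical item: a name, a body, and where it sits in the file.
pub struct Item {
    pub name: &'static str,
    pub body: &'static str,
    pub span: Span,
}

/// Hash the item's contents, optionally mixing in its span.
/// A cache keyed on this value is invalidated whenever the value changes.
pub fn fingerprint(item: &Item, include_span: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.name.hash(&mut hasher);
    item.body.hash(&mut hasher);
    if include_span {
        item.span.hash(&mut hasher);
    }
    hasher.finish()
}
```

If the same function is moved further down the file, only its span changes. A cache keyed on the span-including fingerprint would treat it as a new item; one keyed on the span-excluding fingerprint would not.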

I think it is possible for Krabby to treat spans differently from rustc.

Step one: track edition data separately from spans. Based on the use sites I found in rustc (see below), edition data is usually extracted from the spans of particular keywords (e.g. dyn, async, if). Like all identifiers, keywords are interned into 32-bit IDs; I can embed edition data in there for these specific keywords.
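A minimal sketch of what step one could look like. The bit layout is an assumption made for illustration: the top 2 bits of the 32-bit ID hold the edition, and the low 30 bits index the interner. `Ident`, `ASYNC_INDEX`, and `is_reserved_async` are hypothetical names, and for simplicity the sketch tags every ID rather than only the edition-sensitive keywords.

```rust
/// The Rust editions released so far.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Edition {
    E2015 = 0,
    E2018 = 1,
    E2021 = 2,
    E2024 = 3,
}

/// A 32-bit interned identifier.
/// Assumed layout: top 2 bits = edition, low 30 bits = interner index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Ident(u32);

const EDITION_SHIFT: u32 = 30;
const INDEX_MASK: u32 = (1 << EDITION_SHIFT) - 1;

/// Hypothetical interner index pre-assigned to the string "async".
pub const ASYNC_INDEX: u32 = 0;

impl Ident {
    pub fn new(index: u32, edition: Edition) -> Ident {
        assert!(index <= INDEX_MASK, "interner index out of range");
        Ident(((edition as u32) << EDITION_SHIFT) | index)
    }

    /// The interner index, with the edition bits stripped.
    pub fn index(self) -> u32 {
        self.0 & INDEX_MASK
    }

    /// The edition of the crate this identifier was lexed in.
    pub fn edition(self) -> Edition {
        match self.0 >> EDITION_SHIFT {
            0 => Edition::E2015,
            1 => Edition::E2018,
            2 => Edition::E2021,
            _ => Edition::E2024,
        }
    }
}

/// `async` is a reserved keyword from the 2018 edition onwards.
pub fn is_reserved_async(id: Ident) -> bool {
    id.index() == ASYNC_INDEX && id.edition() != Edition::E2015
}
```

With a layout like this, a parser can answer "is this `async` a keyword here?" from the identifier alone, with no span lookup. The ID stays 4 bytes wide.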

Step two: compute spans on demand. Compiler passes that transform IRs will come in two versions, one that excludes spans and one that includes them. By default, span-excluding passes will be used; but if a span within a particular item turns out to be needed, all previous passes for that item will be repeated with spans. This will require a lot of effort.
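One way the two pass versions could share code is to make each pass generic over a "span mode". The sketch below is only an illustration of that idea, not a settled design; `SpanMode`, `NoSpans`, `WithSpans`, `Const`, `lower`, and `first_negative_span` are all hypothetical names. It assumes passes are deterministic, so a node found in the span-free output sits at the same position in the span-carrying output.

```rust
/// Hypothetical span: a byte range within a single file.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
}

/// Chooses what (if anything) each IR node stores as its location.
pub trait SpanMode {
    type Loc: Copy;
    fn loc(span: Span) -> Self::Loc;
}

/// Span-excluding mode: locations are zero-sized.
pub struct NoSpans;
/// Span-including mode: locations are real spans.
pub struct WithSpans;

impl SpanMode for NoSpans {
    type Loc = ();
    fn loc(_span: Span) -> Self::Loc {}
}

impl SpanMode for WithSpans {
    type Loc = Span;
    fn loc(span: Span) -> Self::Loc {
        span
    }
}

/// Input to the pass; the front end always knows where tokens came from.
pub struct Token {
    pub value: i64,
    pub span: Span,
}

/// Output IR node; its location field depends on the chosen mode.
pub struct Const<M: SpanMode> {
    pub value: i64,
    pub loc: M::Loc,
}

/// A single pass, written once and instantiated in either mode.
pub fn lower<M: SpanMode>(tokens: &[Token]) -> Vec<Const<M>> {
    tokens
        .iter()
        .map(|t| Const { value: t.value, loc: M::loc(t.span) })
        .collect()
}

/// Run without spans first; repeat the pass with spans only if a
/// negative constant is found and its location is actually needed.
pub fn first_negative_span(tokens: &[Token]) -> Option<Span> {
    let fast = lower::<NoSpans>(tokens);
    let index = fast.iter().position(|c| c.value < 0)?;
    let slow = lower::<WithSpans>(tokens);
    Some(slow[index].loc)
}
```

In the span-excluding instantiation, `loc` has type `()` and takes no space, so the IR node is smaller; the cost is that the pass runs twice whenever a span is needed.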

I’m curious to explore this; it could improve performance in ways that are very hard to measure (caching effects and unnecessary recompiles).

appendix: use sites

Here are all the use sites I found in rustc for Span::edition() and some related convenience functions, excluding cases where a diagnostic (warning/error) is being generated. It’s probably not exhaustive – there are many ways to access information from Span and I didn’t check them all – but I found it quite informative.

N.B. I tried to understand the context for these uses, but I may have gotten them wrong! If you are more familiar with the codebase and can correct me, please reach out.