identifier interning in 2025


by arya dradjica

Name resolution requires storing lots of identifiers and comparing them to each other very rapidly. While you could store identifiers as Strings and compare them with plain string equality, this is quite inefficient. A nearly universal approach is interning: assigning every identifier a unique numeric ID, and representing and comparing identifiers by ID. Interning uses a hash table to deduplicate identifiers, so the same identifier always maps to the same ID and distinct identifiers get distinct IDs.
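
In sketch form (the names here are illustrative, not Krabby's actual types), the core of an interner looks something like this:

```rust
use std::collections::HashMap;

/// A toy interner: a hash map deduplicates strings and hands out dense numeric IDs.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>, // identifier -> ID
    strings: Vec<String>,      // ID -> identifier, for reverse lookups
}

impl Interner {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id; // already interned: the same string always gets the same ID
        }
        let id = self.strings.len() as u32;
        self.strings.push(name.to_owned());
        self.ids.insert(name.to_owned(), id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}
```

Comparing two interned identifiers is then a single integer comparison, and each identifier is stored only once no matter how often it appears.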

Krabby uses interning; I implemented support for it while writing the lexer. Krabby’s token stream representation stores interned IDs for identifiers, so they can be used as-is in later steps. Interestingly, this happens to optimize parsing: keywords can be identified by assigning them special IDs and checking for those, instead of checking identifiers against a list of known keywords. This makes it easy to rapidly search for keywords in token streams (even using SIMD!) and could help speed up parsing.
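
One way to get that property, sketched against the toy interner above (again illustrative, not Krabby's actual token representation), is to intern every keyword before anything else, so keywords occupy a known range of low IDs:

```rust
// Illustrative only: interning the keywords first gives them the lowest IDs,
// so "is this token a keyword?" becomes a single integer comparison.
const KEYWORDS: &[&str] = &["fn", "let", "if", "else", "while", "return"];

fn intern_keywords(interner: &mut Interner) {
    for kw in KEYWORDS {
        interner.intern(kw);
    }
}

fn is_keyword(id: u32) -> bool {
    (id as usize) < KEYWORDS.len()
}
```

The token stream only has to carry the ID, and the keyword check never touches string data.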

However, while I was benchmarking Krabby to optimize task queueing, I noticed that identifier interning was surprisingly slow. I was using the symbol_table crate for interning; it supports concurrent interning, which is important as I aim for parallelized parsing. But profiling revealed that much of the time was being lost to mutex contention. To address the problem quickly, I switched to a custom single-threaded interner. This improved performance drastically and let me focus my benchmarking on task queueing, but it left a problem I would have to come back to later.

Once takeaway was complete, in September, I returned to this issue. I looked into symbol_table more — it supported concurrent interning by splitting the underlying hash table into shards and wrapping each one in a mutex. I guessed that this design would not scale well for my performance goals, so I started considering writing my own concurrent hash table. This was a much deeper rabbit hole than I expected.
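
The sharded approach looks roughly like this; this is my sketch of the general technique, not symbol_table's actual code:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

const SHARDS: usize = 16;

/// Sketch of a sharded interner: the identifier's hash picks a shard,
/// so two threads only block each other when they hash into the same shard.
struct ShardedInterner {
    shards: [Mutex<HashMap<String, u32>>; SHARDS],
    next_id: AtomicU32, // IDs come from a shared counter
}

impl ShardedInterner {
    fn new() -> Self {
        Self {
            shards: std::array::from_fn(|_| Mutex::new(HashMap::new())),
            next_id: AtomicU32::new(0),
        }
    }

    fn intern(&self, name: &str) -> u32 {
        let mut hasher = DefaultHasher::new();
        name.hash(&mut hasher);
        let shard = &self.shards[(hasher.finish() as usize) % SHARDS];

        let mut map = shard.lock().unwrap();
        if let Some(&id) = map.get(name) {
            return id;
        }
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        map.insert(name.to_owned(), id);
        id
    }
}
```

Every intern call still takes a lock, which is where the contention I measured came from.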

I had several points of reference for designing a concurrent string interner / hash table: symbol_table, string-interner (a fast single-threaded interner), hashbrown (a popular single-threaded hash table library), and papaya (an existing concurrent hash table library). At first glance, I thought I could use papaya as-is, but it heavily relies on heap allocations and would need to be rewritten to meet my performance requirements.

First, I played around with single-threaded hash table design, looking for something that would be easy to make concurrent. I wrote a number of implementations, benchmarked and profiled them, and turned up many interesting micro-optimizations. I eventually ended up with a solid foundation: a single-threaded interner that was significantly faster than string-interner! It used linear probing with a low load factor (around 50%), and heavily optimized how candidate entries from the hash table were compared against the identifier being interned.
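
The shape of that foundation, heavily simplified (no hashing, no growth, and none of the comparison micro-optimizations):

```rust
/// Sketch of the single-threaded design: open addressing with linear probing
/// over a power-of-two table. Growth (triggered when the table passes ~50%
/// full) and the hash function are left out.
const EMPTY: u32 = u32::MAX;

struct ProbingInterner {
    slots: Vec<u32>,      // power-of-two length; each slot is EMPTY or an ID
    strings: Vec<String>, // ID -> string contents
}

impl ProbingInterner {
    fn intern(&mut self, name: &str, hash: u64) -> u32 {
        let mask = self.slots.len() - 1;
        let mut i = (hash as usize) & mask;
        loop {
            match self.slots[i] {
                EMPTY => {
                    // Empty slot: the identifier is new, so assign the next ID.
                    let id = self.strings.len() as u32;
                    self.strings.push(name.to_owned());
                    self.slots[i] = id;
                    return id;
                }
                id if self.strings[id as usize] == name => return id,
                _ => i = (i + 1) & mask, // collision: probe the next slot
            }
        }
    }
}
```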

My plan for making this concurrent was simple: every slot in the hash table would be an atomic variable. I didn't need to support removals, which greatly simplified things. For the most part, this design would just work: lookups would load from the atomic variables, and insertions would use compare-exchanges to fill empty slots and detect races. The actual string contents would be written to a secondary data structure that is much simpler to manage.
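
A sketch of the insertion path under that scheme, assuming an EMPTY sentinel value and leaving out growth and the string side table:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const EMPTY: u32 = u32::MAX;

/// Sketch of the insert-only concurrent probing scheme. Lookups are plain
/// atomic loads; insertions claim empty slots with compare-exchange, and a
/// failed exchange just means another thread got there first.
fn insert_or_find(
    slots: &[AtomicU32],
    hash: u64,
    matches: impl Fn(u32) -> bool, // does this existing ID refer to our string?
    new_id: u32,                   // ID reserved for our string if we win a slot
) -> u32 {
    let mask = slots.len() - 1;
    let mut i = (hash as usize) & mask;
    loop {
        let current = slots[i].load(Ordering::Acquire);
        if current == EMPTY {
            match slots[i].compare_exchange(EMPTY, new_id, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return new_id,                       // we claimed the slot
                Err(other) if matches(other) => return other, // lost the race to the same string
                Err(_) => {}                                  // lost to a different string; keep probing
            }
        } else if matches(current) {
            return current; // already interned by some thread
        }
        i = (i + 1) & mask; // collision: probe the next slot
    }
}
```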

But the hardest part of making this concurrent is hash table growth. Growing a hash table in a concurrent environment has two issues: 1) it is hard to know when the previous hash table allocation can be freed; 2) a growth operation can race with an insertion into the previous allocation, and synchronizing between them can be viciously difficult. Jana Dönszelmann made the excellent suggestion of using mmap to make an almost infinitely large allocation, so I would not have to free a hash table allocation in a concurrent context. But the latter issue was much harder.
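
On Linux, that trick looks roughly like this (a sketch using the libc crate directly; a wrapper like memmap2 would be the safer choice in practice):

```rust
// Sketch: reserve a huge anonymous mapping up front so the table never has to
// move or be freed while other threads might still be reading it. With
// MAP_NORESERVE, physical pages are only committed as the table touches them.
fn reserve_table(bytes: usize) -> *mut u8 {
    unsafe {
        let ptr = libc::mmap(
            std::ptr::null_mut(),
            bytes,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
            -1, // no backing file
            0,
        );
        assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");
        ptr.cast::<u8>()
    }
}
```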

Several weeks in, I had a design I thought might work. In short, it would use per-thread counters that tracked the position of the last insertion initiated on each thread; hash table growth would load those positions and try inserting a “reallocation has begun” marker into those slots. Either the pending insertion would land first, in which case the thread growing the hash table had already observed that slot and was guaranteed to account for the entry; or the reallocation marker would land first, and the inserting thread, on seeing it, would pause to help with reallocation. I was incredibly uncomfortable with it, and wasn’t motivated to try implementing it.
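
For concreteness, the race that design hinges on would look something like the following; this is only a sketch of what I had in mind, with made-up sentinel values, and I never actually wrote it:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const EMPTY: u32 = u32::MAX;
const GROW_MARKER: u32 = u32::MAX - 1; // hypothetical "reallocation has begun" sentinel

enum Claim {
    Inserted,  // our insertion landed first; the grower is guaranteed to account for it
    HelpGrow,  // the marker landed first; pause and help with reallocation
    SlotTaken, // a different insertion landed here; keep probing
}

/// The inserting thread races its ID against the grower's marker for the slot
/// it advertised via its per-thread counter.
fn claim_slot(slot: &AtomicU32, new_id: u32) -> Claim {
    match slot.compare_exchange(EMPTY, new_id, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => Claim::Inserted,
        Err(GROW_MARKER) => Claim::HelpGrow,
        Err(_) => Claim::SlotTaken,
    }
}
```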

A week of anxious procrastination later, I realized that two months had passed by; it was November. My goal to implement a basic subset of name resolution by the end of the year was effectively unreachable. I decided to park my optimization efforts and make some headway on name resolution. I switched Krabby back to using symbol_table, and unhappily picked up the next item on my to-do list: parsing.

While I still haven’t returned to concurrent string interning, my time spent optimizing was not wasted: it became the basis for my RustWeek 2026 talk. I’ll write about the exact design I ended up with, and the specific micro-optimizations I explored, as I prepare for the talk and build the associated library, phonebook.