keeping secrets

My online identity comprises a number of accounts spread across dozens of websites and systems. They have different means of identity (e.g. usernames and e-mail addresses) and different modes of authentication (2FA, PIN codes, SSH keys, and traditional passwords). I've been using the pass tool to store the passwords for these. It's a simple tool that follows the UNIX philosophy, storing every password in a separate GPG-encrypted file. It's easy to understand and work with, and it comes with a number of useful integrations. However, I've been concerned about the fact that it doesn't protect usernames or e-mail addresses—if an attacker can observe the password store, they can locate my accounts.

I decided to make an unnecessarily secure secret storage mechanism with an especially paranoid threat model. I assume that malicious programs are present on the user's system and that they are using Meltdown and Spectre based attacks to observe victim programs. The secrets are managed by a daemon program that answers queries over a UNIX socket.

A quick aside: Linux does not have a comprehensive permission management system, which means that any process running with your user ID has access to pretty much whatever information it wants. A secure system would assign every application a different UID and manage permissions to user data through groups. This would be quite difficult to achieve today, partly due to UNIX's "everything is a file" philosophy.

Secrets are organized in a hierarchial structure, like 'pass', but are stored in a single encrypted file. The most important question is how those secrets are laid out within the file. They need to be easy to access but their position in the file should not reveal information about them (as attackers may be able to observe memory patterns). The depth of a secret in the hierarchy should also be obscured.

I chose to use a tree structure based on a keyed hash table. The file contains a randomly selected hash key. Each inner node is represented by a hash table. Its children are stored based on a cryptographic hash of their identifiers and the hash key. Leaf nodes point to secret data, while inner nodes point to other hash tables. To hide the depth of the tree, nodes containing a limited number of descendant leaves are collapsed into their parents.

A major issue is data sizes. I think every identifier should be exactly 32 bytes (shorter identifiers being padded with zero bytes). The value of a leaf node, however, may have various sizes: 32/64 bytes for passwords and secrets, 256 bytes for longer strings (e.g. 2FA backup codes), and any multiple of 4096 bytes for truly big stuff (e.g. SSH keys). Hash tables should have power-of-two sizes beginning with 32 elements, and should maintain a load factor between 0.25 and 0.75. Garbage values within the hash table should be treated exactly like normal entries (for the purpose of memory access patterns).