Phase 4: Performance Tuning
2026-02-15
A P2P network that can traverse NATs but chokes on its own I/O is not much use. Phase 4 continues with performance tuning: centralizing database configuration, caching fragment blobs in memory, managing QUIC connection lifecycles, and eliminating unnecessary disk reads from the attestation hot path.
The guiding principle was the same as the rest of Tesseras: do the simplest
thing that actually works. No custom allocators, no lock-free data structures,
no premature complexity. A centralized StorageConfig, an LRU cache, a
connection reaper, and a targeted fix to avoid re-reading blobs that were
already checksummed.
What was built
Centralized SQLite configuration (tesseras-storage/src/database.rs) — A
new StorageConfig struct and open_database() / open_in_memory() functions
that apply all SQLite pragmas in one place: WAL journal mode, foreign keys,
synchronous mode (NORMAL by default, FULL for unstable hardware like RPi + SD
card), busy timeout, page cache size, and WAL autocheckpoint interval.
Previously, each call site opened a connection and applied pragmas ad hoc. Now
the daemon, CLI, and tests all go through the same path. Seven tests cover
foreign keys, busy timeout, journal mode, migrations, synchronous modes, and
on-disk WAL file creation.
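As a sketch of what the centralized configuration might look like (field and method names here are illustrative assumptions, not the exact tesseras-storage API; the real open_database() applies these via rusqlite):

```rust
/// Illustrative sketch of a centralized SQLite configuration. The real
/// StorageConfig lives in tesseras-storage and applies pragmas through
/// rusqlite; these field names are assumptions, not the actual API.
pub struct StorageConfig {
    /// FULL for unstable hardware (RPi + SD card), NORMAL otherwise.
    pub synchronous_full: bool,
    pub busy_timeout_ms: u32,
    /// Negative values are interpreted by SQLite as KiB rather than pages.
    pub cache_size: i32,
    pub wal_autocheckpoint: u32,
}

impl StorageConfig {
    /// Render every PRAGMA this config would apply on open, in one place.
    pub fn pragmas(&self) -> Vec<String> {
        let sync = if self.synchronous_full { "FULL" } else { "NORMAL" };
        vec![
            "PRAGMA journal_mode = WAL".to_string(),
            "PRAGMA foreign_keys = ON".to_string(),
            format!("PRAGMA synchronous = {sync}"),
            format!("PRAGMA busy_timeout = {}", self.busy_timeout_ms),
            format!("PRAGMA cache_size = {}", self.cache_size),
            format!("PRAGMA wal_autocheckpoint = {}", self.wal_autocheckpoint),
        ]
    }
}
```

The point is that every connection, whether opened by the daemon, the CLI, or a test, gets the same pragma set from one struct.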
LRU fragment cache (tesseras-storage/src/cache.rs) — A
CachedFragmentStore that wraps any FragmentStore with a byte-aware LRU
cache. Fragment blobs are cached on read and invalidated on write or delete.
When the cache exceeds its configured byte limit, the least recently used
entries are evicted. The cache is transparent: it implements FragmentStore
itself, so the rest of the stack doesn't know it's there. Optional Prometheus
metrics track hits, misses, and current byte usage. Three tests verify that a
cache hit avoids the inner read, that a store invalidates the cache, and that
entries are evicted when the byte limit is exceeded.
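The byte-aware eviction policy can be sketched with std collections alone. This toy ByteLru shows only the accounting; the real CachedFragmentStore also implements FragmentStore and invalidates on write and delete, and all names here are illustrative:

```rust
use std::collections::HashMap;

/// Toy byte-aware LRU: tracks total cached bytes, not entry count.
pub struct ByteLru {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
    order: Vec<String>, // front = least recently used
}

impl ByteLru {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new(), order: Vec::new() }
    }

    /// Read a blob, marking it most recently used.
    pub fn get(&mut self, id: &str) -> Option<&[u8]> {
        if self.entries.contains_key(id) {
            self.order.retain(|k| k.as_str() != id);
            self.order.push(id.to_string());
            self.entries.get(id).map(|v| v.as_slice())
        } else {
            None
        }
    }

    /// Cache a blob, evicting least recently used entries while the
    /// total cached bytes exceed the configured limit.
    pub fn put(&mut self, id: String, blob: Vec<u8>) {
        if let Some(old) = self.entries.remove(&id) {
            self.used_bytes -= old.len();
            self.order.retain(|k| *k != id);
        }
        self.used_bytes += blob.len();
        self.entries.insert(id.clone(), blob);
        self.order.push(id);
        while self.used_bytes > self.max_bytes && !self.order.is_empty() {
            let victim = self.order.remove(0);
            if let Some(b) = self.entries.remove(&victim) {
                self.used_bytes -= b.len();
            }
        }
    }

    pub fn contains(&self, id: &str) -> bool {
        self.entries.contains_key(id)
    }
}
```

With an 8-byte budget, caching a 4-byte blob and then a 6-byte blob evicts the first; a get() refreshes recency, so a recently read entry survives eviction ahead of a stale one.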
Prometheus storage metrics (tesseras-storage/src/metrics.rs) — A
StorageMetrics struct with three counters/gauges: fragment_cache_hits,
fragment_cache_misses, and fragment_cache_bytes. Registered with the
Prometheus registry and wired into the fragment cache via with_metrics().
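Sketched with plain atomics standing in for the Prometheus counter/gauge types (the real struct registers with the Prometheus registry; the metric names match the ones above, but the recording methods are assumptions):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch of StorageMetrics with atomics in place of Prometheus types.
#[derive(Default)]
pub struct StorageMetrics {
    pub fragment_cache_hits: AtomicU64,   // counter
    pub fragment_cache_misses: AtomicU64, // counter
    pub fragment_cache_bytes: AtomicU64,  // gauge: bytes currently cached
}

impl StorageMetrics {
    pub fn record_hit(&self) {
        self.fragment_cache_hits.fetch_add(1, Ordering::Relaxed);
    }
    pub fn record_miss(&self) {
        self.fragment_cache_misses.fetch_add(1, Ordering::Relaxed);
    }
    pub fn set_bytes(&self, n: u64) {
        self.fragment_cache_bytes.store(n, Ordering::Relaxed);
    }
}
```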
Attestation hot path fix (tesseras-replication/src/service.rs) — The
attestation flow previously read every fragment blob from disk and recomputed
its BLAKE3 checksum. Since list_fragments() already returns FragmentId with
a stored checksum, the fix is trivial: use frag.checksum instead of
blake3::hash(&data). This eliminates one disk read per fragment during
attestation — for a tessera with 100 fragments, that's 100 fewer reads. A test
with expect_read_fragment().never() verifies no blob reads happen during
attestation.
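The shape of the fix can be illustrated with a toy store that counts blob reads. The types below are stand-ins mirroring the expect_read_fragment().never() test, not the real tesseras-replication API:

```rust
use std::cell::Cell;

// Stand-in types; the real code works with FragmentId and BLAKE3 hashes.
struct Fragment {
    checksum: [u8; 32], // stored at write time by store_fragment()
}

struct CountingStore {
    fragments: Vec<Fragment>,
    reads: Cell<usize>, // counts read_fragment() calls
}

impl CountingStore {
    fn list_fragments(&self) -> &[Fragment] {
        &self.fragments
    }

    // The per-fragment disk read the old attestation path performed.
    #[allow(dead_code)]
    fn read_fragment(&self, _idx: usize) -> Vec<u8> {
        self.reads.set(self.reads.get() + 1);
        Vec::new()
    }
}

/// Attestation over stored checksums: zero read_fragment() calls,
/// where the old path issued one per fragment.
fn attest(store: &CountingStore) -> Vec<[u8; 32]> {
    store.list_fragments().iter().map(|f| f.checksum).collect()
}
```

For the 100-fragment case above, attest() returns 100 checksums while the read counter stays at zero.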
QUIC connection pool lifecycle (tesseras-net/src/quinn_transport.rs) — A
PoolConfig struct controlling max connections, idle timeout, and reaper
interval. PooledConnection wraps each quinn::Connection with a last_used
timestamp. When the pool reaches capacity, the oldest idle connection is evicted
before opening a new one. A background reaper task (Tokio spawn) periodically
closes connections that have been idle beyond the timeout. Four new pool metrics:
tesseras_conn_pool_size, pool_hits_total, pool_misses_total,
pool_evictions_total.
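The lifecycle policy (capacity eviction plus idle reaping) can be sketched without quinn or DashMap; a timestamp-only struct stands in for the pooled connection, and string keys for SocketAddr:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sketch of the pool lifecycle policy. In the real pool the map is a
/// DashMap<SocketAddr, PooledConnection> holding quinn::Connections and
/// the reaper runs as a spawned Tokio task; names here are illustrative.
struct PoolConfig {
    max_connections: usize,
    idle_timeout: Duration,
}

struct PooledConnection {
    last_used: Instant, // refreshed on every use in the real pool
}

struct Pool {
    cfg: PoolConfig,
    conns: HashMap<String, PooledConnection>,
}

impl Pool {
    /// Admit a new connection, evicting the oldest idle one at capacity.
    fn insert(&mut self, addr: String) {
        if self.conns.len() >= self.cfg.max_connections {
            let oldest = self
                .conns
                .iter()
                .min_by_key(|(_, c)| c.last_used)
                .map(|(a, _)| a.clone());
            if let Some(victim) = oldest {
                self.conns.remove(&victim); // the real pool also closes it
            }
        }
        self.conns.insert(addr, PooledConnection { last_used: Instant::now() });
    }

    /// What the background reaper does on each tick: drop connections
    /// idle beyond the configured timeout.
    fn reap_idle(&mut self, now: Instant) {
        let timeout = self.cfg.idle_timeout;
        self.conns
            .retain(|_, c| now.duration_since(c.last_used) < timeout);
    }
}
```

Because QUIC streams are multiplexed over a single connection, insert() and reap_idle() are the whole policy; there is no per-request borrow/return as in a classical pool.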
Daemon integration (tesd/src/config.rs, main.rs) — A new [performance]
section in the TOML config with fields for SQLite cache size, synchronous mode,
busy timeout, fragment cache size, max connections, idle timeout, and reaper
interval. The daemon's main() now calls open_database() with the configured
StorageConfig, wraps FsFragmentStore with CachedFragmentStore, and binds
QUIC with the configured PoolConfig. The direct rusqlite dependency was
removed from the daemon crate.
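A hypothetical shape for that section (the key names are assumptions; the shipped tesd config may spell them differently):

```toml
[performance]
sqlite_cache_size_kib = 8192        # page cache handed to PRAGMA cache_size
sqlite_synchronous_full = false     # true for RPi + SD card deployments
sqlite_busy_timeout_ms = 5000
fragment_cache_max_bytes = 67108864 # 64 MiB LRU budget
max_connections = 256
idle_timeout_secs = 120
reaper_interval_secs = 30
```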
CLI migration (tesseras-cli/src/commands/init.rs, create.rs) — Both
init and create commands now use tesseras_storage::open_database() with
the default StorageConfig instead of opening raw rusqlite connections. The
rusqlite dependency was removed from the CLI crate.
Architecture decisions
- Decorator pattern for caching: CachedFragmentStore wraps Box<dyn FragmentStore> and implements FragmentStore itself. This means caching is opt-in, composable, and invisible to consumers. The daemon enables it; tests can skip it.
- Byte-aware eviction: the LRU cache tracks total bytes, not entry count. Fragment blobs vary wildly in size (a 4KB text fragment vs a 2MB photo shard), so counting entries would give a misleading picture of memory usage.
- No connection pool crate: instead of pulling in a generic pool library, the connection pool is a thin wrapper around DashMap<SocketAddr, PooledConnection> with a Tokio reaper. QUIC connections are multiplexed, so the "pool" is really about lifecycle management (idle cleanup, max connections) rather than borrowing/returning.
- Stored checksums over re-reads: the attestation fix is intentionally minimal, changing one line and removing one disk read per fragment. The checksums were already stored in SQLite by store_fragment(); they just weren't being used.
- Centralized pragma configuration: a single StorageConfig struct replaces scattered PRAGMA calls. The sqlite_synchronous_full flag exists specifically for Raspberry Pi deployments where the kernel can crash and lose un-checkpointed WAL transactions.
What comes next
- Phase 4 continued — Shamir's Secret Sharing for heirs, sealed tesseras (time-lock encryption), security audits, institutional node onboarding, storage deduplication, OS packaging
- Phase 5: Exploration and Culture — public tessera browser by era/location/theme/language, institutional curation, genealogy integration, physical media export (M-DISC, microfilm, acid-free paper with QR)
With performance tuning in place, Tesseras handles the common case efficiently: fragment reads hit the LRU cache, attestation skips disk I/O, idle QUIC connections are reaped automatically, and SQLite is configured consistently across the entire stack. The next steps focus on cryptographic features (Shamir, time-lock) and hardening for production deployment.