Phase 4: Performance Tuning
2026-02-15
A P2P network that can traverse NATs but chokes on its own I/O is not much use. Phase 4 continues with performance tuning: centralizing database configuration, caching fragment blobs in memory, managing QUIC connection lifecycles, and eliminating unnecessary disk reads from the attestation hot path.
The guiding principle was the same as the rest of Tesseras: do the simplest
thing that actually works. No custom allocators, no lock-free data structures,
no premature complexity. A centralized StorageConfig, an LRU cache, a
connection reaper, and a targeted fix to avoid re-reading blobs that were
already checksummed.
What was built
Centralized SQLite configuration (tesseras-storage/src/database.rs) — A
new StorageConfig struct and open_database() / open_in_memory() functions
that apply all SQLite pragmas in one place: WAL journal mode, foreign keys,
synchronous mode (NORMAL by default, FULL for unstable hardware like RPi + SD
card), busy timeout, page cache size, and WAL autocheckpoint interval.
Previously, each call site opened a connection and applied pragmas ad hoc. Now
the daemon, CLI, and tests all go through the same path. Seven tests cover
foreign keys, busy timeout, journal mode, migrations, synchronous modes, and
on-disk WAL file creation.
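As a sketch of what the centralized configuration might look like (field and method names here are illustrative assumptions, not the exact tesseras-storage API; the real open_database() applies these via rusqlite):

```rust
/// Illustrative sketch of a centralized SQLite configuration. The real
/// StorageConfig lives in tesseras-storage and applies pragmas through
/// rusqlite; these field names are assumptions, not the actual API.
pub struct StorageConfig {
    /// FULL for unstable hardware (RPi + SD card), NORMAL otherwise.
    pub synchronous_full: bool,
    pub busy_timeout_ms: u32,
    /// Negative values are interpreted by SQLite as KiB rather than pages.
    pub cache_size: i32,
    pub wal_autocheckpoint: u32,
}

impl StorageConfig {
    /// Render every PRAGMA this config would apply on open, in one place.
    pub fn pragmas(&self) -> Vec<String> {
        let sync = if self.synchronous_full { "FULL" } else { "NORMAL" };
        vec![
            "PRAGMA journal_mode = WAL".to_string(),
            "PRAGMA foreign_keys = ON".to_string(),
            format!("PRAGMA synchronous = {sync}"),
            format!("PRAGMA busy_timeout = {}", self.busy_timeout_ms),
            format!("PRAGMA cache_size = {}", self.cache_size),
            format!("PRAGMA wal_autocheckpoint = {}", self.wal_autocheckpoint),
        ]
    }
}
```

The point is that every connection, whether opened by the daemon, the CLI, or a test, gets the same pragma set from one struct.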
LRU fragment cache (tesseras-storage/src/cache.rs) — A
CachedFragmentStore that wraps any FragmentStore with a byte-aware LRU
cache. Fragment blobs are cached on read and invalidated on write or delete.
When the cache exceeds its configured byte limit, the least recently used
entries are evicted. The cache is transparent: it implements FragmentStore
itself, so the rest of the stack doesn't know it's there. Optional Prometheus
metrics track hits, misses, and current byte usage. Three tests verify that a
cache hit avoids the inner read, that a store invalidates the cache, and that
entries are evicted when the byte limit is exceeded.
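The byte-aware eviction policy can be sketched with std collections alone. This toy ByteLru shows only the accounting; the real CachedFragmentStore also implements FragmentStore and invalidates on write and delete, and all names here are illustrative:

```rust
use std::collections::HashMap;

/// Toy byte-aware LRU: tracks total cached bytes, not entry count.
pub struct ByteLru {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
    order: Vec<String>, // front = least recently used
}

impl ByteLru {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new(), order: Vec::new() }
    }

    /// Read a blob, marking it most recently used.
    pub fn get(&mut self, id: &str) -> Option<&[u8]> {
        if self.entries.contains_key(id) {
            self.order.retain(|k| k.as_str() != id);
            self.order.push(id.to_string());
            self.entries.get(id).map(|v| v.as_slice())
        } else {
            None
        }
    }

    /// Cache a blob, evicting least recently used entries while the
    /// total cached bytes exceed the configured limit.
    pub fn put(&mut self, id: String, blob: Vec<u8>) {
        if let Some(old) = self.entries.remove(&id) {
            self.used_bytes -= old.len();
            self.order.retain(|k| *k != id);
        }
        self.used_bytes += blob.len();
        self.entries.insert(id.clone(), blob);
        self.order.push(id);
        while self.used_bytes > self.max_bytes && !self.order.is_empty() {
            let victim = self.order.remove(0);
            if let Some(b) = self.entries.remove(&victim) {
                self.used_bytes -= b.len();
            }
        }
    }

    pub fn contains(&self, id: &str) -> bool {
        self.entries.contains_key(id)
    }
}
```

With an 8-byte budget, caching a 4-byte blob and then a 6-byte blob evicts the first; a get() refreshes recency, so a recently read entry survives eviction ahead of a stale one.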
Prometheus storage metrics (tesseras-storage/src/metrics.rs) — A
StorageMetrics struct with three counters/gauges: fragment_cache_hits,
fragment_cache_misses, and fragment_cache_bytes. Registered with the
Prometheus registry and wired into the fragment cache via with_metrics().
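Sketched with plain atomics standing in for the Prometheus counter/gauge types (the real struct registers with the Prometheus registry; the metric names match the ones above, but the recording methods are assumptions):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch of StorageMetrics with atomics in place of Prometheus types.
#[derive(Default)]
pub struct StorageMetrics {
    pub fragment_cache_hits: AtomicU64,   // counter
    pub fragment_cache_misses: AtomicU64, // counter
    pub fragment_cache_bytes: AtomicU64,  // gauge: bytes currently cached
}

impl StorageMetrics {
    pub fn record_hit(&self) {
        self.fragment_cache_hits.fetch_add(1, Ordering::Relaxed);
    }
    pub fn record_miss(&self) {
        self.fragment_cache_misses.fetch_add(1, Ordering::Relaxed);
    }
    pub fn set_bytes(&self, n: u64) {
        self.fragment_cache_bytes.store(n, Ordering::Relaxed);
    }
}
```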
Attestation hot path fix (tesseras-replication/src/service.rs) — The
attestation flow previously read every fragment blob from disk and recomputed
its BLAKE3 checksum. Since list_fragments() already returns FragmentId with
a stored checksum, the fix is trivial: use frag.checksum instead of
blake3::hash(&data). This eliminates one disk read per fragment during
attestation — for a tessera with 100 fragments, that's 100 fewer reads. A test
with expect_read_fragment().never() verifies no blob reads happen during
attestation.
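The shape of the fix can be illustrated with a toy store that counts blob reads. The types below are stand-ins mirroring the expect_read_fragment().never() test, not the real tesseras-replication API:

```rust
use std::cell::Cell;

// Stand-in types; the real code works with FragmentId and BLAKE3 hashes.
struct Fragment {
    checksum: [u8; 32], // stored at write time by store_fragment()
}

struct CountingStore {
    fragments: Vec<Fragment>,
    reads: Cell<usize>, // counts read_fragment() calls
}

impl CountingStore {
    fn list_fragments(&self) -> &[Fragment] {
        &self.fragments
    }

    // The per-fragment disk read the old attestation path performed.
    #[allow(dead_code)]
    fn read_fragment(&self, _idx: usize) -> Vec<u8> {
        self.reads.set(self.reads.get() + 1);
        Vec::new()
    }
}

/// Attestation over stored checksums: zero read_fragment() calls,
/// where the old path issued one per fragment.
fn attest(store: &CountingStore) -> Vec<[u8; 32]> {
    store.list_fragments().iter().map(|f| f.checksum).collect()
}
```

For the 100-fragment case above, attest() returns 100 checksums while the read counter stays at zero.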
QUIC connection pool lifecycle (tesseras-net/src/quinn_transport.rs) — A
PoolConfig struct controlling max connections, idle timeout, and reaper
interval. PooledConnection wraps each quinn::Connection with a last_used
timestamp. When the pool reaches capacity, the oldest idle connection is evicted
before opening a new one. A background reaper task (Tokio spawn) periodically
closes connections that have been idle beyond the timeout. Four new pool metrics:
tesseras_conn_pool_size, pool_hits_total, pool_misses_total,
pool_evictions_total.
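The lifecycle policy (capacity eviction plus idle reaping) can be sketched without quinn or DashMap; a timestamp-only struct stands in for the pooled connection, and string keys for SocketAddr:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sketch of the pool lifecycle policy. In the real pool the map is a
/// DashMap<SocketAddr, PooledConnection> holding quinn::Connections and
/// the reaper runs as a spawned Tokio task; names here are illustrative.
struct PoolConfig {
    max_connections: usize,
    idle_timeout: Duration,
}

struct PooledConnection {
    last_used: Instant, // refreshed on every use in the real pool
}

struct Pool {
    cfg: PoolConfig,
    conns: HashMap<String, PooledConnection>,
}

impl Pool {
    /// Admit a new connection, evicting the oldest idle one at capacity.
    fn insert(&mut self, addr: String) {
        if self.conns.len() >= self.cfg.max_connections {
            let oldest = self
                .conns
                .iter()
                .min_by_key(|(_, c)| c.last_used)
                .map(|(a, _)| a.clone());
            if let Some(victim) = oldest {
                self.conns.remove(&victim); // the real pool also closes it
            }
        }
        self.conns.insert(addr, PooledConnection { last_used: Instant::now() });
    }

    /// What the background reaper does on each tick: drop connections
    /// idle beyond the configured timeout.
    fn reap_idle(&mut self, now: Instant) {
        let timeout = self.cfg.idle_timeout;
        self.conns
            .retain(|_, c| now.duration_since(c.last_used) < timeout);
    }
}
```

Because QUIC streams are multiplexed over a single connection, insert() and reap_idle() are the whole policy; there is no per-request borrow/return as in a classical pool.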
Daemon integration (tesd/src/config.rs, main.rs) — A new [performance]
section in the TOML config with fields for SQLite cache size, synchronous mode,
busy timeout, fragment cache size, max connections, idle timeout, and reaper
interval. The daemon's main() now calls open_database() with the configured
StorageConfig, wraps FsFragmentStore with CachedFragmentStore, and binds
QUIC with the configured PoolConfig. The direct rusqlite dependency was
removed from the daemon crate.
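A hypothetical shape for that section (the key names are assumptions; the shipped tesd config may spell them differently):

```toml
[performance]
sqlite_cache_size_kib = 8192        # page cache handed to PRAGMA cache_size
sqlite_synchronous_full = false     # true for RPi + SD card deployments
sqlite_busy_timeout_ms = 5000
fragment_cache_max_bytes = 67108864 # 64 MiB LRU budget
max_connections = 256
idle_timeout_secs = 120
reaper_interval_secs = 30
```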
CLI migration (tesseras-cli/src/commands/init.rs, create.rs) — Both
init and create commands now use tesseras_storage::open_database() with
the default StorageConfig instead of opening raw rusqlite connections. The
rusqlite dependency was removed from the CLI crate.
Architecture decisions
- Decorator pattern for caching: CachedFragmentStore wraps Box<dyn FragmentStore> and implements FragmentStore itself. This means caching is opt-in, composable, and invisible to consumers. The daemon enables it; tests can skip it.
- Byte-aware eviction: the LRU cache tracks total bytes, not entry count. Fragment blobs vary wildly in size (a 4KB text fragment vs a 2MB photo shard), so counting entries would give a misleading picture of memory usage.
- No connection pool crate: instead of pulling in a generic pool library, the connection pool is a thin wrapper around DashMap<SocketAddr, PooledConnection> with a Tokio reaper. QUIC connections are multiplexed, so the "pool" is really about lifecycle management (idle cleanup, max connections) rather than borrowing/returning.
- Stored checksums over re-reads: the attestation fix is intentionally minimal, changing one line and removing one disk read per fragment. The checksums were already stored in SQLite by store_fragment(); they just weren't being used.
- Centralized pragma configuration: a single StorageConfig struct replaces scattered PRAGMA calls. The sqlite_synchronous_full flag exists specifically for Raspberry Pi deployments where the kernel can crash and lose un-checkpointed WAL transactions.
What comes next
- Phase 4 continued — Shamir's Secret Sharing for heirs, sealed tesseras (time-lock encryption), security audits, institutional node onboarding, storage deduplication, OS packaging
- Phase 5: Exploration and Culture — public tessera browser by era/location/theme/language, institutional curation, genealogy integration, physical media export (M-DISC, microfilm, acid-free paper with QR)
With performance tuning in place, Tesseras handles the common case efficiently: fragment reads hit the LRU cache, attestation skips disk I/O, idle QUIC connections are reaped automatically, and SQLite is configured consistently across the entire stack. The next steps focus on cryptographic features (Shamir, time-lock) and hardening for production deployment.