Tesseras

Phase 4: Punching Through NATs

2026-02-15

Most people's devices sit behind a NAT — a network address translator that lets them reach the internet but prevents incoming connections. For a P2P network, this is an existential problem: if two nodes behind NATs can't talk to each other, the network fragments. Phase 4 continues with a full NAT traversal stack: STUN-based discovery, coordinated hole punching, and relay fallback.

The approach follows the same pattern as most battle-tested P2P systems (WebRTC, BitTorrent, IPFS): try the cheapest option first, escalate only when necessary. Direct connectivity costs nothing. Hole punching costs a few coordinated packets. Relaying costs sustained bandwidth from a third party. Tesseras tries them in that order.

What was built

NatType classification (tesseras-core/src/network.rs) — A new NatType enum (Public, Cone, Symmetric, Unknown) added to the core domain layer. This type is shared across the entire stack: the STUN client writes it, the DHT advertises it in Pong messages, and the punch coordinator reads it to decide whether hole punching is even worth attempting (Cone-to-Cone works ~80% of the time; Symmetric-to-Symmetric almost never works).
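As a rough illustration of how the coordinator might read this type, the sketch below maps a pair of NAT types to the cheapest strategy worth trying. The Strategy enum and the choose_strategy function are hypothetical names, not taken from the Tesseras code. Only the Cone-to-Cone and Symmetric-to-Symmetric outcomes come from the description above; sending every other NATed pairing to the relay is an assumption.

```rust
/// The four variants named above; the derives are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType {
    Public,
    Cone,
    Symmetric,
    Unknown,
}

/// Hypothetical strategy type, listed cheapest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Direct,
    HolePunch,
    Relay,
}

/// Pick the cheapest strategy that has a reasonable chance of working.
pub fn choose_strategy(local: NatType, remote: NatType) -> Strategy {
    match (local, remote) {
        // If either side is publicly reachable, the other side can dial it.
        (NatType::Public, _) | (_, NatType::Public) => Strategy::Direct,
        // Cone-to-Cone punching usually succeeds, so it is worth the packets.
        (NatType::Cone, NatType::Cone) => Strategy::HolePunch,
        // Assumption: any pairing involving Symmetric or Unknown goes to a relay.
        _ => Strategy::Relay,
    }
}
```

Calling choose_strategy(NatType::Cone, NatType::Cone) yields Strategy::HolePunch, while two Symmetric peers go straight to Strategy::Relay without spending packets on a punch attempt.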

STUN client (tesseras-net/src/stun.rs) — A minimal STUN implementation (RFC 5389 Binding Request/Response) that discovers a node's external address. The codec encodes 20-byte binding requests with a random transaction ID and decodes XOR-MAPPED-ADDRESS responses. The discover_nat() function queries multiple STUN servers in parallel (Google, Cloudflare by default), compares the mapped addresses, and classifies the NAT type:

Retries with exponential backoff and configurable timeouts. 12 tests covering codec roundtrips, all classification paths, and async loopback queries.
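To make the wire format and the comparison step concrete, here is a minimal sketch of the codec and a classifier. The constants follow RFC 5389. The function names, the IPv4-only decoding, the caller-supplied transaction ID, and the classification rules (all mapped addresses equal means Cone, any difference means Symmetric) are assumptions for illustration, not the actual stun.rs code.

```rust
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4};

const MAGIC_COOKIE: u32 = 0x2112_A442;
const BINDING_REQUEST: u16 = 0x0001;
const BINDING_SUCCESS: u16 = 0x0101;
const ATTR_XOR_MAPPED_ADDRESS: u16 = 0x0020;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType { Public, Cone, Symmetric, Unknown }

/// Encode a 20-byte Binding Request; the caller supplies the random ID.
pub fn encode_binding_request(txn_id: [u8; 12]) -> [u8; 20] {
    let mut buf = [0u8; 20];
    buf[0..2].copy_from_slice(&BINDING_REQUEST.to_be_bytes());
    // Bytes 2..4 hold the body length, which stays zero (no attributes).
    buf[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    buf[8..20].copy_from_slice(&txn_id);
    buf
}

/// Extract the IPv4 XOR-MAPPED-ADDRESS from a Binding Success response.
pub fn decode_xor_mapped_v4(msg: &[u8], txn_id: &[u8; 12]) -> Option<SocketAddrV4> {
    if msg.len() < 20 {
        return None;
    }
    let msg_type = u16::from_be_bytes([msg[0], msg[1]]);
    let body_len = u16::from_be_bytes([msg[2], msg[3]]) as usize;
    let cookie = u32::from_be_bytes([msg[4], msg[5], msg[6], msg[7]]);
    if msg_type != BINDING_SUCCESS || cookie != MAGIC_COOKIE || msg[8..20] != txn_id[..] {
        return None;
    }
    let body = msg.get(20..20 + body_len)?;
    let mut pos = 0;
    while pos + 4 <= body.len() {
        let attr_type = u16::from_be_bytes([body[pos], body[pos + 1]]);
        let attr_len = u16::from_be_bytes([body[pos + 2], body[pos + 3]]) as usize;
        let value = body.get(pos + 4..pos + 4 + attr_len)?;
        // Family 0x01 is IPv4; port and address are XORed with the cookie.
        if attr_type == ATTR_XOR_MAPPED_ADDRESS && attr_len == 8 && value[1] == 0x01 {
            let port = u16::from_be_bytes([value[2], value[3]]) ^ (MAGIC_COOKIE >> 16) as u16;
            let ip = u32::from_be_bytes([value[4], value[5], value[6], value[7]]) ^ MAGIC_COOKIE;
            return Some(SocketAddrV4::new(Ipv4Addr::from(ip), port));
        }
        // Attribute values are padded to a 4-byte boundary.
        pos += 4 + ((attr_len + 3) & !3);
    }
    None
}

/// Assumed classification rules based on comparing mapped addresses.
pub fn classify(local: SocketAddr, mapped: &[SocketAddr]) -> NatType {
    let first = match mapped.first() {
        Some(addr) => *addr,
        None => return NatType::Unknown, // no server answered
    };
    if mapped.iter().any(|addr| *addr != first) {
        return NatType::Symmetric; // mapping changes per destination
    }
    if first == local {
        NatType::Public // no translation happened
    } else if mapped.len() >= 2 {
        NatType::Cone // same mapping seen by every server
    } else {
        NatType::Unknown // one answer cannot tell Cone from Symmetric
    }
}
```

A caller would send the encoded request over UDP to each server, decode each response with the same transaction ID, and pass the collected addresses to classify.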

Signed punch coordination (tesseras-net/src/punch.rs) — Ed25519 signing and verification for PunchIntro, RelayRequest, and RelayMigrate messages. Every introduction is signed by the initiator with a 30-second timestamp window, preventing reflection attacks (where an attacker replays an old introduction to redirect traffic). The payload format is target || external_addr || timestamp — changing any field invalidates the signature. 6 unit tests plus 3 property-based tests with proptest (arbitrary node IDs, ports, and session tokens).
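The payload layout and the freshness check can be sketched without any signing library. In the sketch below, the 32-byte node ID, the address encoding (a family tag, the IP octets, then the port), the big-endian integers, and the symmetric treatment of clock skew are all assumptions. The resulting bytes are what would be handed to the Ed25519 signer and verifier, which are deliberately left out here.

```rust
use std::net::SocketAddr;

/// Width of the acceptance window in seconds, per the description above.
const MAX_SKEW_SECS: u64 = 30;

/// Build the bytes to sign: target || external_addr || timestamp.
pub fn punch_intro_payload(
    target: &[u8; 32],
    external_addr: SocketAddr,
    timestamp_secs: u64,
) -> Vec<u8> {
    let mut buf = Vec::with_capacity(32 + 1 + 16 + 2 + 8);
    buf.extend_from_slice(target);
    // Hypothetical address encoding: family tag, IP octets, port.
    match external_addr {
        SocketAddr::V4(a) => {
            buf.push(4);
            buf.extend_from_slice(&a.ip().octets());
        }
        SocketAddr::V6(a) => {
            buf.push(6);
            buf.extend_from_slice(&a.ip().octets());
        }
    }
    buf.extend_from_slice(&external_addr.port().to_be_bytes());
    buf.extend_from_slice(&timestamp_secs.to_be_bytes());
    buf
}

/// Reject introductions whose timestamp is outside the window.
/// Assumption: clock skew is tolerated in both directions.
pub fn timestamp_is_fresh(timestamp_secs: u64, now_secs: u64) -> bool {
    now_secs.abs_diff(timestamp_secs) <= MAX_SKEW_SECS
}
```

Because every field feeds into the signed bytes, a replayed introduction with a swapped address or a refreshed timestamp produces a different payload and fails verification, while an unmodified replay fails the freshness check once the window has passed.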

Relay session manager (tesseras-net/src/relay.rs) — Manages transparent UDP relay sessions between NATed peers. Each session has a random 16-byte token; peers prefix their packets with the token, the relay strips it and forwards. Features:
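The token-prefix forwarding described above can be sketched as follows. The RelaySessions type and its methods are hypothetical names. The sketch assumes a session maps one token to exactly two peer addresses and that the caller generates the random token; the real manager presumably also tracks limits and timeouts, which are omitted.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

const TOKEN_LEN: usize = 16;
type Token = [u8; TOKEN_LEN];

/// Hypothetical session table: token -> the two peers of the session.
pub struct RelaySessions {
    sessions: HashMap<Token, (SocketAddr, SocketAddr)>,
}

impl RelaySessions {
    pub fn new() -> Self {
        Self { sessions: HashMap::new() }
    }

    /// Register a session; the token is assumed to be random.
    pub fn open(&mut self, token: Token, a: SocketAddr, b: SocketAddr) {
        self.sessions.insert(token, (a, b));
    }

    /// Strip the token and return where the payload should be forwarded.
    /// Returns None for short packets, unknown tokens, or unknown senders.
    pub fn route<'a>(
        &self,
        from: SocketAddr,
        packet: &'a [u8],
    ) -> Option<(SocketAddr, &'a [u8])> {
        if packet.len() < TOKEN_LEN {
            return None; // too short to carry a token
        }
        let (prefix, payload) = packet.split_at(TOKEN_LEN);
        let mut token = [0u8; TOKEN_LEN];
        token.copy_from_slice(prefix);
        let &(a, b) = self.sessions.get(&token)?;
        if from == a {
            Some((b, payload))
        } else if from == b {
            Some((a, payload))
        } else {
            None // sender is not part of this session
        }
    }
}
```

The relay never inspects the payload: it only matches the 16-byte prefix against its table and forwards the remainder to the other peer, which is what keeps the relay transparent to the traffic it carries.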

DHT message extensions (tesseras-dht/src/message.rs) — Seven new message variants added to the DHT protocol:

| Message | Purpose |
| --- | --- |
| PunchIntro | "I want to connect to node X, here's my signed external address" |
| PunchRequest | Introducer forwards the request to the target |
| PunchReady | Target confirms readiness, sends its external address |
| RelayRequest | "Create a relay session to node X" |
| RelayOffer | Relay responds with its address and session token |
| RelayClose | Tear down a relay session |
| RelayMigrate | Update session after network change |

The Pong message was extended with NAT metadata: nat_type, relay_slots_available, and relay_bandwidth_used_kbps. All new fields use #[serde(default)] for backward compatibility — old nodes ignore what they don't recognize, new nodes fall back to defaults. 9 new serialization roundtrip tests.

NatHandler trait and dispatch (tesseras-dht/src/engine.rs) — A new NatHandler async trait (5 methods) injected into the DHT engine, following the same dependency injection pattern as the existing ReplicationHandler. The engine's message dispatch loop now routes all punch/relay messages to the handler. This keeps the DHT engine protocol-agnostic while allowing the NAT traversal logic to live in tesseras-net.

Mobile reconnection types (tesseras-embedded/src/reconnect.rs) — A three-phase reconnection state machine for mobile devices:

  1. QuicMigration (0-2s) — try QUIC connection migration for all active peers
  2. ReStun (2-5s) — re-discover external address via STUN
  3. ReEstablish (5-10s) — reconnect peers that migration couldn't save

Peers are reconnected in priority order: bootstrap nodes first, then nodes holding our fragments, then nodes whose fragments we hold, then general DHT neighbors. A new NetworkChanged event variant was added to the FFI event stream so the Flutter app can show reconnection progress.
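The phases and the priority order above can be modeled with two small enums. This is a sketch under assumptions: the phase names come from the list above, but driving the phase from elapsed time with half-open windows, the PeerPriority enum, and the helper names are illustrative rather than the actual reconnect.rs types.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReconnectPhase {
    QuicMigration,
    ReStun,
    ReEstablish,
}

/// Map time since the network change to a phase.
/// Assumption: windows are half-open, [0, 2), [2, 5), [5, 10) seconds.
pub fn phase_at(elapsed: Duration) -> Option<ReconnectPhase> {
    match elapsed.as_secs() {
        0..=1 => Some(ReconnectPhase::QuicMigration),
        2..=4 => Some(ReconnectPhase::ReStun),
        5..=9 => Some(ReconnectPhase::ReEstablish),
        _ => None, // past the reconnection budget
    }
}

/// Hypothetical priority classes; declaration order defines sort order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PeerPriority {
    Bootstrap,
    HoldsOurFragments,
    WeHoldTheirFragments,
    DhtNeighbor,
}

/// Sort peers so the most important ones are reconnected first.
/// The sort is stable, so peers of equal priority keep their order.
pub fn reconnect_order<T>(mut peers: Vec<(PeerPriority, T)>) -> Vec<(PeerPriority, T)> {
    peers.sort_by_key(|(priority, _)| *priority);
    peers
}
```

Deriving Ord on the enum keeps the priority policy in one place: reordering the variants changes the reconnection order everywhere it is used.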

Daemon NAT configuration (tesd/src/config.rs) — A new [nat] section in the TOML config with STUN server list, relay toggle, max relay sessions, bandwidth limits (reciprocal vs bootstrap), and idle timeout. All fields have sensible defaults; relay is disabled by default.

Prometheus metrics (tesseras-net/src/metrics.rs) — 16 metrics across four subsystems:

6 tests verifying registration, increment, label cardinality, and double-registration detection.

Integration tests — Two end-to-end tests using MemTransport (in-memory simulated network):

Property tests — 7 proptest-based tests covering: signature round-trips for all three signed message types (arbitrary node IDs, ports, tokens), NAT classification determinism (same inputs always produce same output), STUN binding request validity, session token uniqueness, and relay rejection of too-short packets.

Justfile targets: just test-nat runs all NAT traversal tests across tesseras-net and tesseras-dht. just test-chaos is a placeholder for future Docker Compose chaos tests with tc netem.

Architecture decisions

What comes next

With NAT traversal, Tesseras can connect nodes regardless of their network topology. Public nodes talk directly. Cone-NATed nodes punch through with an introducer's help. Symmetric-NATed or firewalled nodes relay through willing peers. The network adapts to the real world, where most devices are behind a NAT and network conditions change constantly.