Aevrion is a paradigm for distributed data: every mutation is an immutable event on a temporal axis. No coordination. No leader. No consensus. Convergence is inevitable.
Distributed data diverges. Two nodes write the same key. Network partitions. Clocks drift. The fundamental challenge: how do independent nodes arrive at the same state without coordination?
Traditional approaches demand a leader, a quorum, a consensus protocol. They trade availability for consistency, add latency, create single points of failure. What if convergence didn't require any of that?
Time is the only coordinate you need.
If every mutation is an immutable event with a monotonic timestamp, and events are organized in a tree indexed by time, then any two nodes can deterministically compute the same state from the same set of events. No negotiation. No voting. The math guarantees it.
Aevrion is built on one primitive: the Event. Three fields determine identity. One hash makes it content-addressable. Everything else — storage, indexing, sync, conflict resolution — is derived.
The smallest unit of truth. Immutable. Content-addressable. Verifiable.
Three fields define identity: key, value, timestamp. The ID is blake3(key ‖ value ‖ timestamp) — a hash of the content. Same content on different nodes = same ID. No coordination needed.
Content-addressable means deduplication is free. Event already exists? Skip it. Verification is instant — recompute the hash, compare with the ID. Tampered? Hash won't match.
Deletion is an event, not a special case. An event with value: None means "this key is deleted as of this timestamp". The event persists — the delete is a fact in history, not an erasure.
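The primitive above can be sketched in a few lines. A minimal Python model, with SHA-256 from the standard library standing in for BLAKE3, and with field names and payload encoding as illustrative assumptions, not the wire format:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    key: str
    value: Optional[str]  # None encodes a delete: a tombstone that is itself an event
    timestamp: int        # monotonic timestamp, part of identity

    @property
    def id(self) -> str:
        # ID = hash(key + value + timestamp): content-addressable.
        # Origin (node_id) is deliberately excluded: same content,
        # same ID, on any node. (SHA-256 here; the design uses BLAKE3.)
        payload = f"{self.key}\x00{self.value}\x00{self.timestamp}"
        return hashlib.sha256(payload.encode()).hexdigest()

# Two nodes independently writing the same content produce the same ID:
a = Event("user:123", '{"name": "Alice"}', 1_700_000_000)
b = Event("user:123", '{"name": "Alice"}', 1_700_000_000)
assert a.id == b.id  # deduplication is free

# Deletion is just an event with value=None, a fact in history:
tombstone = Event("user:123", None, 1_700_000_100)
assert tombstone.id != a.id
```

Verification is the same hash: recompute the ID from the content and compare; any tampering changes the hash.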
No node_id in the event. Two nodes that independently write the same key, value, and timestamp produce the same event, the same ID. Origin is metadata, not identity.
Every put() creates an event that ripples through the entire system.
put("user:123", { name: "Alice" })
The event is written once. Everything else is a reference by ID.
Flat content-addressable storage. id → Event. The only place where event data lives. Put, get, has. That's it.
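A minimal sketch of such a store, assuming a plain in-memory dict; the put/get/has surface follows the description above:

```python
class EventStore:
    """Flat content-addressable storage: id -> event. Put, get, has. That's it."""

    def __init__(self):
        self._by_id = {}

    def put(self, event_id: str, event) -> bool:
        # Content-addressable: same content, same ID.
        # If the ID is already present, the event already exists. Skip it.
        if event_id in self._by_id:
            return False
        self._by_id[event_id] = event
        return True

    def get(self, event_id):
        return self._by_id.get(event_id)

    def has(self, event_id) -> bool:
        return event_id in self._by_id

store = EventStore()
assert store.put("e1", {"key": "user:123"}) is True   # stored once
assert store.put("e1", {"key": "user:123"}) is False  # duplicate skipped for free
```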
Organizes events by time. Leaves store lists of IDs, not events. Hash of a leaf = blake3(sorted(event_ids)). Enables O(log N) diff between any two nodes.
The only component clients see. Maps key → latest event. Full history per key. Point-in-time queries. Scan, prefix search.
Channel + handler + triggers per neighbor. One function: handle(msg). Seven message types. Repair emerges as a chain reaction.
Tree is organized by time. Client asks by key. Merge Index connects both worlds.
get(key)
Latest version of any key. O(log N).
get_history(key)
Full version history. Every mutation, in order.
point_in_time(key, T)
Value of key at any moment in the past.
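The three queries above can be sketched over a per-key history kept sorted by (timestamp, event_id), the same LWW order the design uses for conflicts. A Python sketch under assumed types (string keys, integer timestamps, opaque event IDs):

```python
from bisect import insort, bisect_right
from collections import defaultdict

class MergeIndex:
    """key -> sorted history of (timestamp, event_id, value)."""

    def __init__(self):
        self._history = defaultdict(list)

    def apply(self, key, value, timestamp, event_id):
        # Keep per-key history sorted by (timestamp, event_id):
        # the deterministic LWW order every node agrees on.
        insort(self._history[key], (timestamp, event_id, value))

    def get(self, key):
        """Latest version of the key; None if absent or deleted (tombstone)."""
        h = self._history.get(key)
        return h[-1][2] if h else None

    def get_history(self, key):
        """Full version history: every mutation, in order."""
        return [(ts, value) for ts, _, value in self._history.get(key, [])]

    def point_in_time(self, key, t):
        """Value of the key at moment t: latest event with timestamp <= t."""
        h = self._history.get(key, [])
        i = bisect_right(h, (t, "\uffff"))  # sentinel sorts after any event ID
        return h[i - 1][2] if i else None

idx = MergeIndex()
idx.apply("user:123", '{"name": "Alice"}', 100, "e1")
idx.apply("user:123", '{"name": "Alicia"}', 200, "e2")
assert idx.get("user:123") == '{"name": "Alicia"}'
assert idx.point_in_time("user:123", 150) == '{"name": "Alice"}'
```

Because events are immutable and the order is total, this index is a pure function of the event set and can always be rebuilt from the log.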
Granularity G sets the leaf size. Branching factor B sets how many children per node. Everything else is derived.
leaf_id = timestamp / G
level_N = leaf_id / Bᴺ
One snapshot covers all history. Recent data with minute precision. Old data in month-wide blocks.
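A worked example of this arithmetic. G = 60 s and B = 32 are illustrative values, chosen here only because they reproduce the minute-precision leaves and the ~22-day level-3 blocks mentioned in the text (32³ minutes ≈ 22.75 days); actual parameters are deployment choices:

```python
G = 60   # leaf granularity in seconds (assumed: one leaf per minute)
B = 32   # branching factor (assumed)

def leaf_id(timestamp: int) -> int:
    # One integer division places an event in its leaf.
    return timestamp // G

def block_id(timestamp: int, level: int) -> int:
    # One more division per level gives the path toward the root.
    return leaf_id(timestamp) // (B ** level)

# Seconds covered by one level-3 block: 60 * 32^3 = 1,966,080 s,
# which is about 22.75 days.
span_level3 = G * B ** 3
assert span_level3 == 1_966_080
assert block_id(0, 3) == block_id(span_level3 - 1, 3)  # same 22-day block
assert block_id(span_level3, 3) == 1                   # next block begins
```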
Streets nearby, districts further away, cities on the horizon. One snapshot gives a complete divergence map — recent data with minute-level precision, historical data in broad strokes. O(B × levels) hashes.
Matching blocks are skipped entirely. A matching level-3 block means 22 days verified with a single hash comparison. Only diverged branches are explored deeper. Convergence cost is proportional to actual drift, not total history.
Snapshot → diff → repair. Chain reaction. No coordinator.
Local write → event is immediately sent to every neighbor. Best-effort, instant delivery. If it arrives — great. If not — verify will catch it.
Every G, each node sends a tree snapshot to neighbors — hashes at every level, from fresh leaves to root. The closer to now, the more detail. Like a map: streets nearby, districts further, cities on the horizon.
≤160 hashes · ~6.5 KB · one round-trip
The receiver compares each hash with its own. Match = entire subtree verified. Mismatch at leaf = divergence pinpointed to the minute. Mismatch at level 3 = divergence somewhere in 22 days — drill down.
Mismatch on leaf → EventList → NeedEvents → events flow. Not a mode — a chain reaction of handle() calls. Both sides repair simultaneously. Matching subtrees are never touched.
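The chain reaction can be sketched as diff-and-merge over per-leaf hashes. A toy two-node model in Python, flattened to a single tree level, with SHA-256 standing in for BLAKE3; real snapshots compare hashes level by level before ever reaching leaves:

```python
import hashlib

def leaf_hash(event_ids) -> str:
    # Hash of a leaf = hash of its sorted event IDs (SHA-256 as stand-in).
    return hashlib.sha256("".join(sorted(event_ids)).encode()).hexdigest()

def diff_leaves(mine: dict, theirs: dict):
    """Return leaf IDs whose hashes diverge.
    Matching leaves are skipped entirely: cost tracks drift, not history."""
    diverged = []
    for leaf in set(mine) | set(theirs):
        if leaf_hash(mine.get(leaf, [])) != leaf_hash(theirs.get(leaf, [])):
            diverged.append(leaf)
    return sorted(diverged)

def repair(mine: dict, theirs: dict):
    """Both sides converge on diverged leaves by exchanging missing IDs."""
    for leaf in diff_leaves(mine, theirs):
        union = sorted(set(mine.get(leaf, [])) | set(theirs.get(leaf, [])))
        mine[leaf] = list(union)
        theirs[leaf] = list(union)

node_a = {0: ["e1", "e2"], 1: ["e3"]}
node_b = {0: ["e1", "e2"], 1: ["e3", "e4"]}  # drifted during a partition
assert diff_leaves(node_a, node_b) == [1]    # leaf 0 verified by one hash
repair(node_a, node_b)
assert node_a == node_b                      # convergence
```

In the real protocol each mismatch triggers the next message (EventList, then NeedEvents, then the events themselves), all through the same handle() function.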
All hashes match. Same events, same tree, same merge index state. Every node arrived here independently, following the same deterministic rules. No vote. No leader. Just time and math.
LWW: higher timestamp wins · tie: higher event_id wins
Distributed systems don't fail. They diverge. The question is how you detect and repair it.
Every distributed system faces the same fundamental challenge: nodes drift apart. Network partitions, delayed writes, clock skew, crashed replicas — entropy is not a bug, it's physics. The difference between systems is how they manage it.
Most systems treat anti-entropy as an operational burden — a maintenance task you schedule, monitor, and pray completes before the next one starts. Aevrion treats it as a built-in primitive — continuous, automatic, and proportional to actual drift.
Prevent divergence entirely by coordinating every write through an elected leader. Used in etcd, CockroachDB, TiKV, ZooKeeper.
Every write requires quorum acknowledgment before it's considered committed. Divergence is structurally impossible — but at the cost of availability and latency.
All writes are serialized through a single leader. The leader replicates to a majority before confirming. Leader failure triggers election — writes stall for seconds.
CAP theorem in action: the minority partition is read-only (or unavailable). Network split = degraded service. Cross-datacenter deployments pay the latency tax on every write.
No anti-entropy repairs needed (consistency is built-in), but leader elections, log compaction, and snapshot management require monitoring and tuning.
Allow divergence during normal operation, fix it later with scheduled Merkle tree comparisons. The most widely deployed anti-entropy mechanism.
Merkle trees are built on-demand during repair. Not maintained continuously — rebuilt from scratch on each nodetool repair invocation. Expensive.
Full repair scans every token range regardless of actual drift. Subrange repair exists but requires manual partitioning. Incremental repair adds flags, state, and its own failure modes.
Miss a repair cycle → tombstones resurrect deleted data. gc_grace_seconds is a ticking clock. Repair timeouts, vnodes, compaction interaction — each a source of incidents.
Repair is I/O and network intensive. Running it during peak hours degrades read/write latency. Not running it risks silent inconsistency. Lose-lose scheduling problem.
Encode merge semantics into the data type. Concurrent writes are automatically resolved by mathematical properties of the type (commutativity, associativity, idempotence).
Instead of detecting and repairing divergence, CRDTs ensure that any merge order produces the same result. Elegant in theory — constrained in practice.
G-Counters, PN-Counters, OR-Sets, LWW-Registers. Arbitrary KV with full history, point-in-time queries, and prefix scan doesn't map naturally to CRDT semantics.
Version vectors, causal dots, tombstone sets grow with cluster size. Each key carries per-node metadata. At scale, metadata can exceed payload size.
Custom merge functions require formal proofs of convergence. Edge cases in concurrent delete+add, move operations, and nested types are a research-level problem.
Entropy is not prevented or tolerated — it's continuously measured and repaired. The temporal Merkle tree is always maintained, the progressive digest is always exchanged, convergence is always in progress.
Every G, nodes exchange a progressive digest — ~160 hashes covering all history. Recent data at minute precision, old data in month-wide blocks. Divergence is detected within one cycle.
Matching subtrees are skipped entirely. One hash comparison verifies 22 days of data. Repair touches only diverged branches. Week-long partition? Only the week's events are synced.
No nodetool repair. No gc_grace_seconds. No scheduled maintenance windows. Anti-entropy is a heartbeat, not a job. If the node is running, it's converging.
Last-Write-Wins by timestamp, tiebreaker by event ID. No version vectors, no merge functions, no causal context. Events are pure data. Any node resolves the same conflict the same way.
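The rule fits in one pure function. A sketch, assuming events are represented as (timestamp, event_id, value) tuples:

```python
def resolve(a, b):
    """Deterministic LWW: higher timestamp wins; on a tie, higher event ID.
    Any node running this on the same two events picks the same winner,
    with no causal context and no merge function."""
    return a if (a[0], a[1]) > (b[0], b[1]) else b

assert resolve((200, "aa", "new"), (100, "zz", "old"))[2] == "new"
assert resolve((100, "ab", "x"), (100, "aa", "y"))[2] == "x"  # ID tiebreak
```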
Cassandra asks: "did you remember to run repair?"
Raft asks: "who is the leader right now?"
Aevrion doesn't ask. Repair is always running.
Events are never modified or deleted. History is a fact. Every derived structure can be rebuilt from the event log.
Time is the single organizing dimension. One integer division gives the path from root to leaf. Zero coordination.
ID = hash of content. Same event on any node has the same ID. Deduplication, verification, and routing — all free.
Every node works independently. Reads and writes never block on the network. Sync is a background process.
Merge Index keeps every version of every key. Point-in-time queries, audit trail, and rollback — by design, not as an afterthought.
Push is optimistic. Verify is pessimistic. Together they guarantee that all connected nodes reach the same state. Always.
aevum + -ion — a particle of eternity.
Aevum (Latin) — eternity, the continuous flow of time. The root behind aeon, ever, age. An append-only log means data is written for eternity. A Temporal Merkle Tree means time is the organizing axis. Point-in-time queries mean access to any moment that ever was.
-ion (Greek suffix) — a particle, the smallest indivisible unit. Like photon, electron, graviton — fundamental quanta that carry a force. An event is the quantum of change: the smallest, indivisible unit of mutation that carries truth across the system.
Aevrion is a quantum of eternity — an immutable particle of data that travels through time and inevitably reaches every node.
Explore the concepts, read the architecture, or start building.