A deep dive into the data structures, algorithms, and performance characteristics of the Udanax Gold hypertext engine, reimplemented in Rust.
Xudanu is a Rust implementation of the Udanax Gold hypertext document store — a system originally designed by Ted Nelson's team in the 1980s and mechanically translated from Smalltalk to C++ in 1989. The original ~200,000-line C++ codebase was a faithful but disconnected port: every component existed, but the paths between them were dead code. Transclusion queries never executed. H-tree canopy flags were allocated but never checked. The BackfollowEngine had no callers.
Xudanu reconnects these components into a working system and optimizes the critical paths. The result is a server that stores documents as content-addressed trees, tracks which documents share which text (transclusion), and answers queries like “find every document that contains this paragraph” in O(log n) instead of O(n).
This document is written for the engineering team. Whether you're a junior developer onboarding to the codebase or a senior architect evaluating the system's design, you'll find:
Everything in Xudanu comes back to one principle: content identity. When two documents contain the same text, that text gets the same BLAKE3 fingerprint. This fingerprint becomes a universal key that enables:
Deduplication — store shared text once in the GrandMap
Inverted indexes — fingerprint → list of documents that contain it
Fast diffing — intersect fingerprint sets to find shared regions
Tree pruning — skip entire H-tree subtrees that can't match your query
If you understand BLAKE3 fingerprints, you understand 80% of Xudanu's performance story.
All source is in the xudanu crate. The most important files:
| File | What It Implements |
|---|---|
src/edition/orgl.rs | O-tree (Loaf: Leaf / Split / Dsp) |
src/edition/edition.rs | Edition wrapper, element listing, caching |
src/edition/grandmap.rs | GrandMap content-addressed store |
src/edition/content_address.rs | BLAKE3 fingerprinting, ContentAddressIndex |
src/edition/canopy.rs | Canopy flag-bit pruning (BertCanopy / SensorCanopy) |
src/ent/htree.rs | H-tree version ancestry, HUpperCrumData |
src/edition/backfollow.rs | BackfollowEngine, EditionMeta, orchestration |
src/edition/transclusion.rs | TransclusionIndex, TrailBlazer, query types |
src/edition/shared_mapping.rs | SharedMapping, content_shared_region() |
src/edition/range_transclusion.rs | Range-scoped transclusion queries |
src/ent/dagwood.rs | DagWood partial ordering, concurrent edit resolution |
src/server/federation.rs | Federation CRDTs (OrSet, LwwRegister, PBFT) |
src/server/transport/snapshot.rs | Versioned snapshots, migration, checksums |
Xudanu organizes its data structures into three layers, each building on the one below. The layers are connected by a single mechanism: BLAKE3 content fingerprints. Every piece of content gets a 256-bit hash, and that hash flows downward through the stack to enable indexing, deduplication, and query pruning.
A document lives in the O-tree layer as an Edition wrapping a Loaf tree. Each text element gets a BLAKE3 fingerprint. Those fingerprints flow into two places:
BeIds. Same text always maps to the same ID. This is the deduplication layer.
The H-tree tracks version ancestry. Each node (HUpperCrumData) carries a CanopyCrumData with flag bits summarizing what types of content exist in its subtree. The BackfollowEngine orchestrates the query flow: hash the query text, look up the TransclusionIndex, walk the H-tree with canopy pruning, and collect results via TrailBlazer.
Two cross-cutting concerns span all three layers:
Both are covered in detail in Section 11 and Section 12.
Every document in Xudanu is an O-tree — an ordered tree of RangeElements. The tree is represented by the Loaf enum, which has three variants:
pub enum Loaf {
Leaf {
region: XnRegion,
entries: Vec<(i64, Arc<Carrier>)>,
default: Option<Arc<Carrier>>,
},
Split {
split: XnRegion,
in_child: Box<Loaf>,
out_child: Box<Loaf>,
},
Dsp {
offset: i64,
child: Box<Loaf>,
},
}
| Variant | Purpose | Analogy |
|---|---|---|
Leaf |
Stores the actual content: a sorted list of (position, element) pairs within a region |
A page in a book |
Split |
Divides content at a position. in_child holds everything below, out_child holds everything above |
A book split into two chapters |
Dsp |
Applies a positional offset without touching children. O(1) lazy transform | Renumbering pages without rewriting them |
When a Leaf grows beyond MAX_LEAF_SIZE (16,384 entries), it splits in half. The median position becomes the split point. Everything below goes into in_child, everything above into out_child. This keeps the tree balanced — depth grows as O(log n).
The Dsp variant is the most interesting optimization. Instead of rewriting every position when content is inserted or deleted, the O-tree wraps the affected subtree in a Dsp node that just adds an offset:
Before: Leaf [0:"Hello", 1:"World", 50:"Old", 51:"Text"]
After insertion at position 50, the existing elements from 50 onward need to shift. Instead of rewriting:
Split {
split: below(50),
in_child: Leaf [0:"Hello", 1:"World"], // unchanged
out_child: Dsp {
offset: +3, // shift everything by 3
child: Leaf [50:"Old", 51:"Text"] // original positions preserved!
}
}
The inner Leaf still stores positions 50 and 51. The Dsp { offset: +3 } adds 3 whenever those positions are read. No element was touched. O(1) instead of O(n).
Let's trace what happens when a user creates a document with the text "The quick brown fox" and saves it:
work_save_and_release with 4 text elementsEdition with a Loaf::Leaf containing:
entries: [
(0, Carrier { element: Text("The "), fp: 0x7a2f... }),
(1, Carrier { element: Text("quick "), fp: 0xb4c1... }),
(2, Carrier { element: Text("brown "), fp: 0xe8d3... }),
(3, Carrier { element: Text("fox"), fp: 0x1f9a... }),
]
region: XnRegion [0, 4)
4 entries is well under MAX_LEAF_SIZE (16,384), so it stays as a single Leaf. When the user adds more text and the leaf grows past the threshold, it will split. The tree grows deeper only as needed.
An O-tree isn't limited to text. The RangeElement enum supports 9 types, and any Edition can mix them freely:
| Type | Content | Content-Addressable? |
|---|---|---|
Text | UTF-8 string | Yes — BLAKE3 |
Data | Raw bytes | Yes — BLAKE3 |
Blob | Reference to large binary object | Yes |
Edition | Nested O-tree (composition) | Yes |
Label | Named anchor / identifier | No — structural |
PlaceHolder | Template slot | No — structural |
IDHolder | Identity mapping | No — structural |
Work | Reference to another document | No |
Overlay | Compositional layer | Depends on content |
When you edit a document, the O-tree doesn't copy the entire tree. It creates a new version that shares unchanged subtrees with the old one. This is the same technique used by immutable data structures in Clojure and Haskell.
In this example, editing "quick fox" to "slow dog" allocates exactly one new Arc<Carrier>. The other two elements are shared between revisions via Arc reference counting. For a document with 1,000 elements where you change 3, you allocate 3 new elements and share 997. This is why O-tree edits are memory-efficient.
Loaf enum is defined in src/edition/orgl.rs:9. build_bulk() at line 36 shows how leaves are split. The Edition wrapper with its OnceLock cache is in src/edition/edition.rs.
The GrandMap is the server's global content store. Every element from every document passes through it. Its job is simple but powerful: ensure that identical content always maps to the same backend ID.
When a document is saved, each RangeElement is passed to ContentAddressIndex::intern(). This function:
fingerprint_to_be_idBeId (deduplication hit)BeId, stores the mapping
pub struct ContentAddressIndex {
fingerprint_to_be_id: HashMap<[u8; 32], BeId>,
be_id_to_fingerprint: HashMap<BeId, [u8; 32]>,
next_be_id: BeId,
}
1. Save Document A: "The quick brown fox jumped"
intern(Text("The ")) → BLAKE3 → 0x7a2f → new BeId(1001)
intern(Text("quick brown")) → BLAKE3 → 0xa3f7 → new BeId(1002)
intern(Text(" fox jumped")) → BLAKE3 → 0x1b4e → new BeId(1003)
2. Save Document B: "I saw a quick brown yesterday"
intern(Text("I saw a")) → BLAKE3 → 0x5c92 → new BeId(1004)
intern(Text("quick brown")) → BLAKE3 → 0xa3f7 → existing BeId(1002) ← dedup hit!
intern(Text(" yesterday")) → BLAKE3 → 0xd861 → new BeId(1005)
Result: Document B's "quick brown" maps to the same BeId(1002) as Document A's. One copy in the GrandMap, two documents referencing it.
Beyond the ContentAddressIndex, the GrandMap also manages:
| Component | Purpose |
|---|---|
global_space | Global IdSpace for allocating unique IDs across all documents |
storage | InMemoryBeStorage — the actual element data store |
id_to_element | Maps BeId → the stored element for retrieval |
element_to_ids | Reverse index: element → list of Ids that reference it |
id_holders | Tracks IDHolder elements for identity mapping |
identity_map | Label-to-identity resolution for path navigation |
The GrandMap's deduplication is the foundation for everything else:
BeIds as keys — so deduplicated content automatically appears in the right query resultsHashMaps and serializes to a single JSON file (server.json). For large deployments (millions of documents), this file can grow very large. The v0.2 roadmap includes a chunked storage backend to address this.
GrandMap is defined in src/edition/grandmap.rs:68. ContentAddressIndex and the intern() method are in src/edition/content_address.rs:7 and line 65 respectively.
The Canopy is the mechanism that makes H-tree traversal O(log n) instead of O(n). Without it, finding transclusion matches would require visiting every node in the version history. With it, entire subtrees are skipped in a single comparison.
An H-tree node represents a version of a document. To answer “which editions contain this text?”, you need to walk the tree checking each node. With 10,000 editions, that's 10,000 checks. The canopy avoids this by annotating each node with a bitmap of what types of content exist in its subtree.
Each endorsement type (Text, Link, Blob, etc.) gets a bit position in a u32. The current allocation:
| Flag | Bit Position | Hex Value |
|---|---|---|
| PUBLIC_CLUB_FLAG | 0 | 0x00000001 |
| OTHER_CLUBS_FLAG | 1 | 0x00000002 |
| OTHER_ENDORSEMENTS_FLAG | 2 | 0x00000004 |
| FIRST_ENDORSEMENT_FLAG | 3+ | 0x00000008 ← |
| IS_SENSOR_WAITING_FLAG | 26 | 0x04000000 |
| IS_NOT_PARTIALIZABLE_FLAG | 27 | 0x08000000 |
Endorsement flags are allocated dynamically via use_endorsement_flag() (up to 23 bits, from position 3 to 25). Each distinct endorsement type gets its own bit. The flags are then merged upward through the H-tree using bitwise OR.
Xudanu implements two canopy variants that serve different purposes:
Static flag bits computed when an edition is registered. Used to prune H-tree traversal during transclusion queries. Named after the original Udanax Gold concept of a "Bert" (backwards edge tracker).
Updated: when editions are created or revised
Dynamic flag bits that track which queries are currently "waiting" for matches. When new content is added that matches a waiting query, the sensor fires a notification. This enables reactive transclusion alerts.
Updated: when queries are registered or content changes
For a balanced H-tree with depth d:
src/edition/canopy.rs. BertCanopy and SensorCanopy are structs. Flag allocation is in src/edition/props.rs (use_endorsement_flag() at line 30, endorsements_flags() at line 73). The compute_join() function merges two canopy crums.
While the O-tree stores what a document contains, the H-tree stores how versions relate. It's a directed acyclic graph (DAG) where each node — called a crum — represents a version snapshot, and edges point from child to parent (newer to older).
pub struct HUpperCrumData {
hcut: TracePosition, // where this node was split
o_parents: Vec<Arc<Mutex<dyn HPart>>>, // parent references (DAG edges)
bert_crum: Arc<Mutex<CanopyCrumData>>, // canopy flags for this subtree
base: HistoryCrumBase, // unique hash for identity
_bert_canopy: BertCanopy, // reference to owning canopy
}
Each crum knows its parents (multiple parents = merged branches). Each carries a CanopyCrumData that summarizes the endorsement types present in its entire subtree. This is the data the canopy pruning algorithm uses.
When a new edition is created:
HUpperCrumData is allocatedCanopyCrumData is created with this edition's endorsement flagspropagate_flags() function — each ancestor's flags become the bitwise OR of all descendants
Rev 0 (initial) → crum hash=0x001, flags=[T]
↓ parent
Rev 1 (typo fix) → crum hash=0x002, flags=[T]
↓ parent
Rev 2 (add links) → crum hash=0x003, flags=[T, L]
↓ parent
Rev 3 (add image) → crum hash=0x004, flags=[T, L, I]
After propagation:
Rev 3 canopy: [T, L, I] (own + all ancestors)
Rev 2 canopy: [T, L] (own + Rev 0, Rev 1)
Rev 1 canopy: [T] (own + Rev 0)
Rev 0 canopy: [T] (own only)
Query: "find Link transclusions"
→ Rev 3 has L → ✓ descend
→ Rev 2 has L → ✓ descend
→ Rev 1 no L → ✗ skip Rev 1 and Rev 0 (pruned 50%)
The H-tree is a DAG, not a simple chain. If two users edit the same document concurrently, the H-tree branches. When those branches are reconciled (via DagWood, Section 8), a merge node gets two parents.
Rev 0
↓
Rev 1
↙ ↘ ← concurrent edits
Rev 2a Rev 2b
↘ ↙ ← merge
Rev 3 → o_parents: [Rev 2a, Rev 2b]
The canopy correctly handles merge nodes: the merge crum's flags are the bitwise OR of both branches, so no matching content is lost.
The H-tree supports a concept called delayed backfollow — the ability to lazily discover transclusion relationships by walking the version chain backward. Instead of maintaining a complete index of every version's content, the system can traverse parent links on demand, using canopy flags to skip irrelevant branches.
This is particularly useful for:
HUpperCrumData is defined in src/ent/htree.rs:39. from_two() at line 64 handles merge nodes with two parents. Flag propagation is in src/edition/canopy.rs (propagate_flags()). The HPart trait at line 18 provides the polymorphic parent interface.
This is where everything comes together. A transclusion query asks: "which documents contain this text?" The naive approach scans every document. Xudanu answers it in O(log n) by combining content fingerprints, inverted indexes, canopy pruning, and H-tree traversal.
Step 1 — Hash:
BLAKE3("quick brown fox") → [0xa3, 0xf7, 0x1b, ..., 0xc1] (32 bytes)
Step 2 — Index lookup:
TransclusionIndex.get(0xa3f7...) → {work_42, work_107, work_891}
Three candidate documents. We didn't scan any document text — just a HashMap lookup.
Step 3 — Canopy-filtered H-tree walk:
work_42 H-tree: Root flags [T,L,I] & TextFlag → match → descend Left child [T] & TextFlag → match → descend → found at Rev 3 Right child [L,I] & TextFlag → no match → pruned work_107 H-tree: Root flags [T] & TextFlag → match → descend → found at Rev 1 work_891 H-tree: Root flags [L,B] & TextFlag → no match → entire tree pruned
Step 4 — Verify and collect:
TrailBlazer.record(Text("quick brown fox"), work_42_rev3) → true
TrailBlazer.record(Text("quick brown fox"), work_107_rev1) → true
Step 5 — Results:
[
TransclusionResult { element: Text("quick brown fox"), is_direct: true, work: 42 },
TransclusionResult { element: Text("quick brown fox"), is_direct: true, work: 107 },
]
The TransclusionIndex is the core inverted index. It maps each BLAKE3 fingerprint to the set of work IDs that contain an element with that fingerprint:
pub struct TransclusionIndex {
// fingerprint → set of work IDs containing this content
index: HashMap<[u8; 32], HashSet<u64>>,
}
It's populated during BackfollowEngine::register_edition(): when a new edition is saved, each of its content-addressable elements is indexed. The index is also used by content_shared_region() to find overlapping content between two specific documents.
The RangeTransclusionQuery in src/edition/range_transclusion.rs extends basic transclusion queries with position constraints:
TransclusionIndex and TrailBlazer are in src/edition/transclusion.rs. RangeTransclusionQuery is in src/edition/range_transclusion.rs. The BackfollowEngine orchestrates the full pipeline in src/edition/backfollow.rs. content_shared_region() is in src/edition/shared_mapping.rs:58.
When two users edit the same document at the same time, both save successfully, and neither sees the other's changes until later. The DagWood is the data structure that handles this. It defines a partial ordering of edit positions and provides operations to create new positions, detect concurrency, and reconcile conflicts.
A DagWood is a tree of TracePositions. Each position represents a point in the edit timeline. The tree structure encodes ordering: positions on the same branch are ordered (sequential edits), while positions on different branches are concurrent.
pub struct DagWood {
branches: BranchStore, // all branches
root: TracePosition, // the root position
trunk: HashMap<TracePosition, BranchId>, // main line positions
cached_position: Option<TracePosition>, // last known position
nav_cache: HashMap<BranchId, u32>, // navigation cache
}
The DagWood provides two key operations:
| Operation | Behavior | When Used |
|---|---|---|
new_position() |
Creates a position on the root branch, after the last known position | Single-user sequential edits |
new_position_after(p) |
Creates a successor to p. First successor is on the same branch; further successors branch off into a binary tree |
Concurrent edits after the same position |
// User A saves after pos 2 let pos_a = dagwood.new_position_after(pos_2); // → pos_2a on same branch (first successor) // User B also saves after pos 2 (concurrent!) let pos_b = dagwood.new_position_after(pos_2); // → pos_2b on a NEW branch (second successor) // → pos_a and pos_b are now concurrent, not ordered
The DagWood's nav_cache enables fast ordering comparisons. Given two positions, the system can determine:
This is used by the TraceView to present a coherent ordering to clients even when the underlying history is a DAG.
When concurrent edits need to be reconciled, Xudanu's current strategy is last-writer-wins at the element level. Both edits are preserved in the history; the H-tree stores both branches. The "current" version shown to users is determined by the reconciliation logic:
DagWood is defined in src/ent/dagwood.rs:20. new_position() at line 69, new_position_after() at line 79. BranchStore and TracePosition are in src/ent/branch.rs and src/ent/trace.rs. The original C++ comment at line 8 notes: "Each dagwood defines a partial ordering of TracePositions."
This section consolidates the performance characteristics of every major operation. The "naive" column shows what the C++ Udanax Gold code would have done if the disconnected components had been wired together with simple loops. The "Xudanu" column shows what the optimized Rust implementation achieves.
| Operation | Naive | Xudanu | How |
|---|---|---|---|
| Find works containing text | O(W × L) | O(U × A) | Fingerprint HashMap lookup. U = unique fingerprints in query, A = avg works per fingerprint. No document scanning. |
| Find shared regions (diff) | O(n × m) | O(La + Lb) | BTreeSet of fingerprints for O(1) membership test per element. Linear in document sizes, not their product. |
| Edition element listing | O(n log n) every call | O(1) after first | OnceLock cache on the Edition struct, shared via Arc. First call traverses the O-tree; subsequent calls return the cached Vec. |
| Transclusion index lookup | O(n) flat scan | O(log n) | Canopy-filtered H-tree traversal. Flag bits prune entire subtrees in a single comparison. |
| Position transform (shift) | O(n) rewrite | O(1) | Dsp node wraps the O-tree with an offset. No elements are touched; positions are adjusted on read. |
| Shared region highlighting | O(r × t) | O(r + t) | Sorted regions, single pass through text. Regions are merged by position, not compared pairwise. |
| Content deduplication | O(n) compare each pair | O(1) per element | BLAKE3 fingerprint + HashMap lookup. Same content = same hash = same BeId. Instant. |
| O-tree split | N/A (unsplit) | O(n) | One-time amortized cost when leaf exceeds MAX_LEAF_SIZE. Splits at median; depth grows as O(log n). |
| Snapshot checkpoint | N/A (no persistence) | O(state size) | Full serialize via serde_json. Atomic write to temp file + rename. Versioned with SHA-256 checksum. |
W = total works, L = text length, U = unique fingerprints, A = avg works per fingerprint, n/m = document lengths, r = shared regions, t = text
Xudanu doesn't just store flat text. It supports a rich addressing model that lets you name, link to, and navigate into specific positions within documents — even across documents via transclusion.
A Label is a named anchor attached to a position in an O-tree. Labels are stored as RangeElement::Label entries, so they participate in the same structural sharing and versioning as text. When you create a new revision, labels from the previous version are automatically inherited.
Edition for "Getting Started Guide":
pos 0: Text("# Getting Started")
pos 1: Label("intro") ← anchor named "intro"
pos 2: Text("Welcome to Xudanu...")
pos 3: Text("First, install the server.")
pos 4: Label("install") ← anchor named "install"
pos 5: Text("Run `xudanu-server`...")
The Udanax Gold addressing model uses tumblers — hierarchical numeric paths that identify a position within a document, within a work, within the global namespace. Xudanu implements this via:
Id — a pair of (IdSpaceId, i64) identifying any entityIdSpace — a namespace that allocates sequential IDsIdentityMap — resolves labels to IDs and backA full path to a label looks like:
work:42 / edition:7 / label:"intro"
This resolves through three lookups: work ID → current edition → label position. Each step is O(1) via HashMap.
Because labels are stored as O-tree elements, they support the same operations as text:
Xudanu implements bidirectional hyperlinks via the HyperLink and HyperRef types:
HyperLink — stores source position, destination path, and link typeHyperRef — a reference from a link to a target edition/positionIdentityMap are in src/edition/label.rs. Id and IdSpace are in src/edition/grandmap.rs:17. HyperLink and HyperRef are in src/edition/links.rs.
Xudanu persists the entire server state as a single JSON snapshot file (server.json). The persistence layer handles versioning, checksums, atomic writes, migration, and backup management.
pub struct VersionedSnapshot {
format_version: u32, // current: 1
created_at: String, // ISO 8601 timestamp
server_version: String, // e.g. "0.1.1"
checksum: String, // SHA-256 of the data field
data: serde_json::Value, // the actual server state
}
When the server checkpoints (triggered by each save operation), the following steps occur:
Server struct is serialized to a ServerSnapshot via to_snapshot()VersionedSnapshotserver.tmpserver.tmp is renamed to server.json
1. to_snapshot() → ServerSnapshot { works: [...], clubs: [...], links: [...], ... }
2. SHA-256(data) → "a4f7c2e8..."
3. VersionedSnapshot → { format_version: 1, checksum: "a4f7c2e8...", data: {...} }
4. write("server.tmp") → 847 KB written
5. rename("server.tmp", "server.json") → atomic on POSIX, nearly atomic on Windows
On startup, the server reads the snapshot with full validation:
format_version field (v0 = raw JSON, v1 = versioned wrapper)ChecksumMismatch error if notgrand_map_id_counter, works, etc.)Server struct via from_snapshot()| Component | What's Saved |
|---|---|
| Works | All documents: current edition, revision count, edit/read clubs, sponsors, full history |
| Clubs | Permission groups: public, admin, server clubs, and user-created clubs |
| Links | All hyperlinks with source/destination references |
| Standalone editions | Editions not yet attached to a work (templates, drafts) |
| Content address index | The full fingerprint ↔ BeId mapping |
| Federation state | Peers, membership, governance, royalty ledger |
| Admin state | Connection acceptance, shutdown flag, access grants |
| Key history | Server keypair history for TLS |
| Blob metadata | References to large binary objects stored on disk |
When a migration occurs (e.g., v0 → v1), the system automatically:
server.json.v0.bakcleanup_old_backups()The snapshot layer defines a comprehensive error enum:
pub enum SnapshotError {
Io(std::io::Error),
Json(serde_json::Error),
Migration { from_version: u32, to_version: u32, reason: String },
Validation(ValidationReport),
InsufficientDiskSpace { required_bytes: u64, available_bytes: u64 },
ChecksumMismatch { expected: String, actual: String },
}
The most critical errors:
server.tmp file left behind. The server.json file is always either the previous complete state or the new complete state — never a partial write. The rename operation is atomic on POSIX filesystems.
src/server/transport/snapshot.rs. VersionedSnapshot at line 7, write_versioned_snapshot() at line 278, full_restore() at line 326. The ServerSnapshot type and to_snapshot()/from_snapshot() are in src/server/server.rs (module persist_snapshot at line 3476).
Xudanu supports multi-server content replication. A federated deployment lets documents and their transclusion relationships span multiple Xudanu servers, with eventual consistency guaranteed by CRDTs (Conflict-free Replicated Data Types) and coordinated trust decisions handled by PBFT (Practical Byzantine Fault Tolerance).
Each Xudanu server is identified by a server_id string. Servers know about their peers via a FederationConfig:
pub struct FederationConfig {
pub enabled: bool,
pub peers: Vec<PeerAddress>, // host:port of known peers
pub mode: FederationMode, // Closed, Open, or Community
pub min_endorsements: u32, // trust threshold (default: 2)
}
pub enum FederationMode {
Closed, // only listed peers may connect
Open, // any server may join
Community, // peers vouch for new members (endorsement-based)
}
Federation uses three CRDT primitives to handle different aspects of distributed state:
Tracks sets where items can be added and removed concurrently across servers. Each add/remove is tagged with a unique event ID. Resolution: adds win over removes only if the remove has seen the add.
// Two servers add and remove concurrently
Server A: add("doc_42") → events: {("doc_42", a1)}
Server B: remove("doc_42") → events: {("doc_42", b1)}
// Merge: "doc_42" is present because
// Server B's remove hasn't seen
// Server A's add (concurrent)
Used for: membership sets, known-peers, content origin tracking
A single-value register where the value with the latest timestamp wins. Ties are broken by server_id comparison for determinism.
pub struct LwwRegister<T> {
value: T,
timestamp: u64,
server_id: String,
}
// Server A sets value at t=100
// Server B sets value at t=105
// Merge → Server B's value wins (higher t)
Used for: document content merges, configuration state
When content crosses server boundaries, it carries its origin:
pub struct FederatedId {
pub server_id: String, // which server created this
pub local_id: u64, // the BeId on that server
}
This allows the system to:
The FederationState manages three aspects of distributed coordination:
| Component | CRDT | Purpose |
|---|---|---|
MembershipState | OrSet | Track which servers are members. Adds and removes are concurrent-safe. |
known_peers | HashMap | Address book of all known servers, including their public keys. |
GovernanceState (PBFT) | Consensus | Coordinate trust decisions: endorsing new peers, revoking memberships, changing configuration. Requires min_endorsements votes. |
royalty_ledger | HashMap | Track cross-server content usage for micropayment settlement. |
Not all federation operations can use CRDTs. Some decisions — like accepting a new peer into a closed federation — require explicit agreement from existing members. Xudanu uses a simplified PBFT protocol for these:
min_endorsements votes are collected, the decision is committedsrc/server/federation.rs. FederationConfig at line 60, OrSet and LwwRegister implemented in the same file. FederatedId at line 8. The file is 3,656 lines including comprehensive property tests for CRDT merge semantics.
Xudanu's design draws on ideas from several systems. Here's how it compares:
| Feature | Xudanu | Git | IPFS | Web |
|---|---|---|---|---|
| Content addressing | BLAKE3 (per element) | SHA-1/SHA-256 (per blob) | SHA-256 (CID) | URL (location-based) |
| Transclusion | Native — element-level | No | File-level only | No (copies) |
| Bidirectional links | Yes — tracked server-side | No | No | No (one-way only) |
| Version ancestry | H-tree DAG | Commit DAG | No (immutable) | No |
| Query pruning | Canopy flags | No tree pruning | DHT routing | N/A |
| Deduplication | Automatic (GrandMap) | Pack files | Block-level | No |
| Micropayments | Per-element accounting | No | Filecoin (per deal) | No |
| Federation | CRDTs + PBFT | Fork + merge | DHT + bitswap | CORS (same-origin) |
| Concurrent edits | DagWood + reconcile | Merge conflicts | N/A (immutable) | Last write wins |
| Structural sharing | O-tree Arc sharing | Tree objects | No | No |
| Feature | Description | Status |
|---|---|---|
crates.io publication |
Publish the xudanu crate to crates.io for easy installation via cargo install xudanu |
Ready to ship |
| Atomic chunked checkpoints | Split server.json into per-work chunks. Checkpoint only changed works, not the entire state. |
Designed, not implemented |
| WASM build | Compile the core engine to WebAssembly for browser-based single-user mode | Engine is no_std compatible |
| Feature | Description |
|---|---|
| Access control | Per-document read/write permissions via Club-based ACLs. Already implemented server-side; needs client integration. |
| Scalable storage backend | Replace single-file JSON with a chunked backend (SQLite or custom). Target: millions of documents without multi-GB snapshots. |
| Structured editing | Expose Edition.combine/replace/copy to the UI for tables, outlines, nested documents. The engine already supports all composition operations. |
| Age heatmap view | Walk revision history, diff consecutive versions, color characters by when they were added. Uses H-tree ancestry + shared region detection. |
| Reactive recorders | Live transclusion notifications using SensorCanopy. When new content matches your registered queries, push a notification to the client. |
| Federation transport | Server-to-server gRPC or HTTP/2 protocol for content replication. CRDTs are implemented; the wire protocol is not yet connected. |
StorageCost, CostMethod) already exists.original-code/xanadugold/ in the workspace repository.docs/xanadu-17-rules.md.README.md — Quick Start guide, CLI reference, security notesdocs/diagram.html — Content + Performance Architecture diagram (companion to this document)docs/slides.html — Presentation slides for Show & Telldocs/tls-setup.md — TLS configuration guidedocs/manual.md — Full user manualdocs/protocol-and-audit.md — WebSocket protocol specificationTODO.md — Development roadmaptests/integration.rs — 192 integration tests covering all documented behaviorstests/tls.rs — 7 TLS integration tests