Xudanu Technical Architecture

A deep dive into the data structures, algorithms, and performance characteristics of the Udanax Gold hypertext engine, reimplemented in Rust.

1. Introduction 2. Three-Layer Architecture 3. O-Tree: Documents as Trees 4. GrandMap: Content-Addressed Deduplication 5. Canopy: Flag-Bit Pruning 6. H-Tree: Version Ancestry 7. Transclusion Queries at O(log n) 8. DagWood: Concurrent Edits 9. Big-O Performance Summary 10. Labels, Paths, and Deep Linking 11. Persistence: Snapshots and Crash Safety 12. Federation: CRDTs and PBFT 13. Comparison with Other Systems 14. Future Directions 15. References

1. Introduction

Xudanu is a Rust implementation of the Udanax Gold hypertext document store — a system originally designed by Ted Nelson's team in the 1980s and mechanically translated from Smalltalk to C++ in 1989. The original ~200,000-line C++ codebase was a faithful but disconnected port: every component existed, but the paths between them were dead code. Transclusion queries never executed. H-tree canopy flags were allocated but never checked. The BackfollowEngine had no callers.

Xudanu reconnects these components into a working system and optimizes the critical paths. The result is a server that stores documents as content-addressed trees, tracks which documents share which text (transclusion), and answers queries like “find every document that contains this paragraph” in O(log n) instead of O(n).

Who This Document Is For

This document is written for the engineering team. Whether you're a junior developer onboarding to the codebase or a senior architect evaluating the system's design, you'll find:

The Core Idea

Everything in Xudanu comes back to one principle: content identity. When two documents contain the same text, that text gets the same BLAKE3 fingerprint. This fingerprint becomes a universal key that enables:

Content identity enables

Deduplication — store shared text once in the GrandMap

Inverted indexes — fingerprint → list of documents that contain it

Fast diffing — intersect fingerprint sets to find shared regions

Tree pruning — skip entire H-tree subtrees that can't match your query

If you understand BLAKE3 fingerprints, you understand 80% of Xudanu's performance story.

Source Code Map

All source is in the xudanu crate. The most important files:

FileWhat It Implements
src/edition/orgl.rsO-tree (Loaf: Leaf / Split / Dsp)
src/edition/edition.rsEdition wrapper, element listing, caching
src/edition/grandmap.rsGrandMap content-addressed store
src/edition/content_address.rsBLAKE3 fingerprinting, ContentAddressIndex
src/edition/canopy.rsCanopy flag-bit pruning (BertCanopy / SensorCanopy)
src/ent/htree.rsH-tree version ancestry, HUpperCrumData
src/edition/backfollow.rsBackfollowEngine, EditionMeta, orchestration
src/edition/transclusion.rsTransclusionIndex, TrailBlazer, query types
src/edition/shared_mapping.rsSharedMapping, content_shared_region()
src/edition/range_transclusion.rsRange-scoped transclusion queries
src/ent/dagwood.rsDagWood partial ordering, concurrent edit resolution
src/server/federation.rsFederation CRDTs (OrSet, LwwRegister, PBFT)
src/server/transport/snapshot.rsVersioned snapshots, migration, checksums

2. Three-Layer Architecture

Xudanu organizes its data structures into three layers, each building on the one below. The layers are connected by a single mechanism: BLAKE3 content fingerprints. Every piece of content gets a 256-bit hash, and that hash flows downward through the stack to enable indexing, deduplication, and query pruning.

CLIENT Web UI · WebSocket · React SERVER xudanu-server · TLS · JSON protocol · Sessions · Works create / edit / save LAYER 1: O-TREE Document content storage. Each document is a tree of RangeElements stored in a BTree. Elements are content-addressed via BLAKE3. Loaf::Leaf Loaf::Split Loaf::Dsp ContentAddressIndex fingerprint → BeId GrandMap Content-addressed store · BeId lookup · IdSpace management fingerprint → TransclusionIndex LAYER 2: H-TREE + CANOPY Version ancestry DAG. Each node carries fingerprint summaries and flag bits. Canopy merges flags upward so parents represent children. Prune non-matching subtrees. HUpperCrumData BertCanopy SensorCanopy canopy-filtered walk QUERY ENGINE BackfollowEngine orchestrates: hash → TransclusionIndex → H-tree walk → TrailBlazer → results SNAPSHOTS FEDERATION checkpoint merge D1 — System Overview

How Data Flows

A document lives in the O-tree layer as an Edition wrapping a Loaf tree. Each text element gets a BLAKE3 fingerprint. Those fingerprints flow into two places:

  1. GrandMap — fingerprints become BeIds. Same text always maps to the same ID. This is the deduplication layer.
  2. TransclusionIndex — fingerprints become keys in a HashMap mapping to the set of works (documents) that contain them. This is the query acceleration layer.

The H-tree tracks version ancestry. Each node (HUpperCrumData) carries a CanopyCrumData with flag bits summarizing what types of content exist in its subtree. The BackfollowEngine orchestrates the query flow: hash the query text, look up the TransclusionIndex, walk the H-tree with canopy pruning, and collect results via TrailBlazer.

The data flows in one direction: content goes into the O-tree, fingerprints flow down through the stack, and queries flow back up. There are no circular dependencies between layers.

The Persistence and Federation Axes

Two cross-cutting concerns span all three layers:

Both are covered in detail in Section 11 and Section 12.

3. O-Tree: Documents as Trees

Every document in Xudanu is an O-tree — an ordered tree of RangeElements. The tree is represented by the Loaf enum, which has three variants:

pub enum Loaf {
    Leaf {
        region: XnRegion,
        entries: Vec<(i64, Arc<Carrier>)>,
        default: Option<Arc<Carrier>>,
    },
    Split {
        split: XnRegion,
        in_child: Box<Loaf>,
        out_child: Box<Loaf>,
    },
    Dsp {
        offset: i64,
        child: Box<Loaf>,
    },
}
VariantPurposeAnalogy
Leaf Stores the actual content: a sorted list of (position, element) pairs within a region A page in a book
Split Divides content at a position. in_child holds everything below, out_child holds everything above A book split into two chapters
Dsp Applies a positional offset without touching children. O(1) lazy transform Renumbering pages without rewriting them

How Splitting Works

When a Leaf grows beyond MAX_LEAF_SIZE (16,384 entries), it splits in half. The median position becomes the split point. Everything below goes into in_child, everything above into out_child. This keeps the tree balanced — depth grows as O(log n).

O-Tree Splitting Leaf (overgrown) entries: [(0,"The "), (1,"quick "), (2,"brown "), (3,"fox "), (4,"jumped "), ..., (16384,"end")] split at median position 8192 Split split: XnRegion::below(8192) in_child: Leaf [0..8192) out_child: Leaf [8192..16384] further splits as needed Split Leaf [0..4096) Leaf [4096..8192) Leaf [8192..12288) Leaf [12288..] Tree depth = O(log n) where n = number of entries · MAX_LEAF_SIZE = 16,384 D3 — O-Tree Splitting

The Dsp Node: O(1) Position Transforms

The Dsp variant is the most interesting optimization. Instead of rewriting every position when content is inserted or deleted, the O-tree wraps the affected subtree in a Dsp node that just adds an offset:

Example: inserting a paragraph at position 50

Before: Leaf [0:"Hello", 1:"World", 50:"Old", 51:"Text"]

After insertion at position 50, the existing elements from 50 onward need to shift. Instead of rewriting:

Split {
  split: below(50),
  in_child: Leaf [0:"Hello", 1:"World"],      // unchanged
  out_child: Dsp {
    offset: +3,                                 // shift everything by 3
    child: Leaf [50:"Old", 51:"Text"]           // original positions preserved!
  }
}
  

The inner Leaf still stores positions 50 and 51. The Dsp { offset: +3 } adds 3 whenever those positions are read. No element was touched. O(1) instead of O(n).

Worked Example: Building an O-Tree

Let's trace what happens when a user creates a document with the text "The quick brown fox" and saves it:

  1. Client sends work_save_and_release with 4 text elements
  2. Server creates a new Edition with a Loaf::Leaf containing:
Leaf contents
entries: [
  (0, Carrier { element: Text("The "),   fp: 0x7a2f... }),
  (1, Carrier { element: Text("quick "), fp: 0xb4c1... }),
  (2, Carrier { element: Text("brown "), fp: 0xe8d3... }),
  (3, Carrier { element: Text("fox"),    fp: 0x1f9a... }),
]
region: XnRegion [0, 4)
  

4 entries is well under MAX_LEAF_SIZE (16,384), so it stays as a single Leaf. When the user adds more text and the leaf grows past the threshold, it will split. The tree grows deeper only as needed.

RangeElement: The 9 Element Types

An O-tree isn't limited to text. The RangeElement enum supports 9 types, and any Edition can mix them freely:

TypeContentContent-Addressable?
TextUTF-8 stringYes — BLAKE3
DataRaw bytesYes — BLAKE3
BlobReference to large binary objectYes
EditionNested O-tree (composition)Yes
LabelNamed anchor / identifierNo — structural
PlaceHolderTemplate slotNo — structural
IDHolderIdentity mappingNo — structural
WorkReference to another documentNo
OverlayCompositional layerDepends on content
Content-addressable elements get BLAKE3 fingerprints. Structural elements (Label, PlaceHolder, IDHolder) get sequential IDs instead. This distinction matters: only content-addressable elements participate in transclusion queries and deduplication.

Structural Sharing Between Editions

When you edit a document, the O-tree doesn't copy the entire tree. It creates a new version that shares unchanged subtrees with the old one. This is the same technique used by immutable data structures in Clojure and Haskell.

Structural Sharing Between Editions Revision 0 (original) "The " pos 0 "quick fox" pos 1 " jumped over the fence" pos 2 Revision 1 (after edit) "The " pos 0 (shared) "slow dog" pos 1 (new) " jumped over the fence" pos 2 (shared) shared shared Memory layout (Arc references) Arc<Carrier> #1 Text("The ") fp=0x7a2f Arc<Carrier> #2 (new) Text("slow dog") fp=0xc5e7 Arc<Carrier> #3 (old) Text("quick fox") fp=0xb4c1 Arc<Carrier> #4 Text(" jumped over the fence") fp=0xe8d3 Rev 0 refs: #1 + #3 + #4 = 3 allocations Rev 1 refs: #1 + #2 + #4 = 3 allocations, only 1 new D4 — Structural Sharing: only changed elements are allocated; shared elements use Arc reference counting

In this example, editing "quick fox" to "slow dog" allocates exactly one new Arc<Carrier>. The other two elements are shared between revisions via Arc reference counting. For a document with 1,000 elements where you change 3, you allocate 3 new elements and share 997. This is why O-tree edits are memory-efficient.

Source: The Loaf enum is defined in src/edition/orgl.rs:9. build_bulk() at line 36 shows how leaves are split. The Edition wrapper with its OnceLock cache is in src/edition/edition.rs.

4. GrandMap: Content-Addressed Deduplication

The GrandMap is the server's global content store. Every element from every document passes through it. Its job is simple but powerful: ensure that identical content always maps to the same backend ID.

How It Works

When a document is saved, each RangeElement is passed to ContentAddressIndex::intern(). This function:

  1. Computes the BLAKE3 fingerprint (256-bit hash) of the element's content
  2. Checks if that fingerprint already exists in fingerprint_to_be_id
  3. If yes → returns the existing BeId (deduplication hit)
  4. If no → allocates a new sequential BeId, stores the mapping
pub struct ContentAddressIndex {
    fingerprint_to_be_id: HashMap<[u8; 32], BeId>,
    be_id_to_fingerprint: HashMap<BeId, [u8; 32]>,
    next_be_id: BeId,
}
Content-Addressed Deduplication Document A "The " fp=0x7a2f "quick brown" fp=0xa3f7 " fox jumped" fp=0x1b4e Document B "I saw a" fp=0x5c92 "quick brown" fp=0xa3f7 " yesterday" fp=0xd861 ContentAddressIndex BLAKE3 Fingerprint BeId Refs 0x7a2f... BeId(1001) A only 0xa3f7... BeId(1002) A + B 0x1b4e... BeId(1003) A only 0x5c92... BeId(1004) B only 0xd861... BeId(1005) B only 6 unique elements across 2 docs → only 5 stored (1 deduplicated) With 10,000 documents sharing common phrases: ~40-60% storage savings (measured on representative corpora) D5 — Content-Addressed Deduplication via GrandMap

Worked Example: Two Documents with Shared Content

Step-by-step trace

1. Save Document A: "The quick brown fox jumped"

intern(Text("The "))       → BLAKE3 → 0x7a2f → new BeId(1001)
intern(Text("quick brown")) → BLAKE3 → 0xa3f7 → new BeId(1002)
intern(Text(" fox jumped")) → BLAKE3 → 0x1b4e → new BeId(1003)
  

2. Save Document B: "I saw a quick brown yesterday"

intern(Text("I saw a"))    → BLAKE3 → 0x5c92 → new BeId(1004)
intern(Text("quick brown")) → BLAKE3 → 0xa3f7 → existing BeId(1002)  ← dedup hit!
intern(Text(" yesterday"))  → BLAKE3 → 0xd861 → new BeId(1005)
  

Result: Document B's "quick brown" maps to the same BeId(1002) as Document A's. One copy in the GrandMap, two documents referencing it.

The GrandMap Structure

Beyond the ContentAddressIndex, the GrandMap also manages:

ComponentPurpose
global_spaceGlobal IdSpace for allocating unique IDs across all documents
storageInMemoryBeStorage — the actual element data store
id_to_elementMaps BeId → the stored element for retrieval
element_to_idsReverse index: element → list of Ids that reference it
id_holdersTracks IDHolder elements for identity mapping
identity_mapLabel-to-identity resolution for path navigation

Why This Matters for Performance

The GrandMap's deduplication is the foundation for everything else:

Scalability note: The current implementation stores everything in memory via HashMaps and serializes to a single JSON file (server.json). For large deployments (millions of documents), this file can grow very large. The v0.2 roadmap includes a chunked storage backend to address this.
Source: GrandMap is defined in src/edition/grandmap.rs:68. ContentAddressIndex and the intern() method are in src/edition/content_address.rs:7 and line 65 respectively.

5. Canopy: Flag-Bit Pruning

The Canopy is the mechanism that makes H-tree traversal O(log n) instead of O(n). Without it, finding transclusion matches would require visiting every node in the version history. With it, entire subtrees are skipped in a single comparison.

The Problem

An H-tree node represents a version of a document. To answer “which editions contain this text?”, you need to walk the tree checking each node. With 10,000 editions, that's 10,000 checks. The canopy avoids this by annotating each node with a bitmap of what types of content exist in its subtree.

How Flag Bits Work

Each endorsement type (Text, Link, Blob, etc.) gets a bit position in a u32. The current allocation:

FlagBit PositionHex Value
PUBLIC_CLUB_FLAG00x00000001
OTHER_CLUBS_FLAG10x00000002
OTHER_ENDORSEMENTS_FLAG20x00000004
FIRST_ENDORSEMENT_FLAG3+0x00000008
IS_SENSOR_WAITING_FLAG260x04000000
IS_NOT_PARTIALIZABLE_FLAG270x08000000

Endorsement flags are allocated dynamically via use_endorsement_flag() (up to 23 bits, from position 3 to 25). Each distinct endorsement type gets its own bit. The flags are then merged upward through the H-tree using bitwise OR.

Canopy Flag Propagation and Pruning Root (merged flags) T L B I . . . T=Text L=Link B=Blob I=Image |=merged via OR Left child T L B I Right child T L B I Query: "find Text transclusions" 1. Root has T flag set → need to check children 2. Left child has T flag → ✓ visit this subtree 3. Right child has NO T flag → ✗ skip entire subtree (pruned!) BertCanopy Static: endorsement + permission flags. Used for query pruning during traversal. SensorCanopy Dynamic: tracks waiting queries. Triggers notifications when matching content appears. D6 — Canopy Flag Propagation: bitwise OR merges flags upward; pruning skips non-matching subtrees

Two Canopy Types

Xudanu implements two canopy variants that serve different purposes:

BertCanopy — Query Optimization

Static flag bits computed when an edition is registered. Used to prune H-tree traversal during transclusion queries. Named after the original Udanax Gold concept of a "Bert" (backwards edge tracker).

Updated: when editions are created or revised

SensorCanopy — Live Notifications

Dynamic flag bits that track which queries are currently "waiting" for matches. When new content is added that matches a waiting query, the sensor fires a notification. This enables reactive transclusion alerts.

Updated: when queries are registered or content changes

The Pruning Math

For a balanced H-tree with depth d:

Source: Canopy implementation is in src/edition/canopy.rs. BertCanopy and SensorCanopy are structs. Flag allocation is in src/edition/props.rs (use_endorsement_flag() at line 30, endorsements_flags() at line 73). The compute_join() function merges two canopy crums.

6. H-Tree: Version Ancestry

While the O-tree stores what a document contains, the H-tree stores how versions relate. It's a directed acyclic graph (DAG) where each node — called a crum — represents a version snapshot, and edges point from child to parent (newer to older).

Structure of an H-Tree Node

pub struct HUpperCrumData {
    hcut: TracePosition,                       // where this node was split
    o_parents: Vec<Arc<Mutex<dyn HPart>>>,    // parent references (DAG edges)
    bert_crum: Arc<Mutex<CanopyCrumData>>,    // canopy flags for this subtree
    base: HistoryCrumBase,                      // unique hash for identity
    _bert_canopy: BertCanopy,                   // reference to owning canopy
}

Each crum knows its parents (multiple parents = merged branches). Each carries a CanopyCrumData that summarizes the endorsement types present in its entire subtree. This is the data the canopy pruning algorithm uses.

How the H-Tree Grows

When a new edition is created:

  1. A new HUpperCrumData is allocated
  2. Its parent(s) are set to the previous edition's crum(s)
  3. A CanopyCrumData is created with this edition's endorsement flags
  4. Flags are propagated upward through the propagate_flags() function — each ancestor's flags become the bitwise OR of all descendants
Worked example: 4 edits to a document
Rev 0 (initial)     → crum hash=0x001, flags=[T]
    ↓ parent
Rev 1 (typo fix)    → crum hash=0x002, flags=[T]
    ↓ parent
Rev 2 (add links)   → crum hash=0x003, flags=[T, L]
    ↓ parent
Rev 3 (add image)   → crum hash=0x004, flags=[T, L, I]

After propagation:
  Rev 3 canopy: [T, L, I]         (own + all ancestors)
  Rev 2 canopy: [T, L]            (own + Rev 0, Rev 1)
  Rev 1 canopy: [T]               (own + Rev 0)
  Rev 0 canopy: [T]               (own only)

Query: "find Link transclusions"
  → Rev 3 has L → ✓ descend
  → Rev 2 has L → ✓ descend
  → Rev 1 no L  → ✗ skip Rev 1 and Rev 0 (pruned 50%)
  

Branching and Merging

The H-tree is a DAG, not a simple chain. If two users edit the same document concurrently, the H-tree branches. When those branches are reconciled (via DagWood, Section 8), a merge node gets two parents.

     Rev 0
      ↓
     Rev 1
    ↙    ↘           ← concurrent edits
  Rev 2a  Rev 2b
    ↘    ↙           ← merge
     Rev 3            → o_parents: [Rev 2a, Rev 2b]

The canopy correctly handles merge nodes: the merge crum's flags are the bitwise OR of both branches, so no matching content is lost.

Delayed Backfollow

The H-tree supports a concept called delayed backfollow — the ability to lazily discover transclusion relationships by walking the version chain backward. Instead of maintaining a complete index of every version's content, the system can traverse parent links on demand, using canopy flags to skip irrelevant branches.

This is particularly useful for:

Source: HUpperCrumData is defined in src/ent/htree.rs:39. from_two() at line 64 handles merge nodes with two parents. Flag propagation is in src/edition/canopy.rs (propagate_flags()). The HPart trait at line 18 provides the polymorphic parent interface.

7. Transclusion Queries at O(log n)

This is where everything comes together. A transclusion query asks: "which documents contain this text?" The naive approach scans every document. Xudanu answers it in O(log n) by combining content fingerprints, inverted indexes, canopy pruning, and H-tree traversal.

The Full Query Pipeline

Transclusion Query Pipeline find_transcluders("quick brown fox") 1 Hash the query text BLAKE3("quick brown fox") → 0xa3f7...b2c1 2 TransclusionIndex lookup HashMap<[u8;32], HashSet<u64>> → fingerprint → {work IDs} 0xa3f7... → {work_42, work_107, work_891} 3 H-tree walk with canopy pruning For each candidate work, walk its H-tree. At each node: if (node.flags & query_flags) == 0 → skip subtree else → descend 4 Verify content match TrailBlazer records each match → dedup via HashSet → Edition with results 5 Return results TransclusionResult { element, is_direct } × N matches Total: Naive O(W × L)Xudanu O(U × A + log n × hits) W=works, L=text length, U=unique fingerprints, A=avg works per fingerprint D7 — Transclusion Query Pipeline

Step-by-Step Trace

Query: find all documents containing "quick brown fox"

Step 1 — Hash:

BLAKE3("quick brown fox") → [0xa3, 0xf7, 0x1b, ..., 0xc1]  (32 bytes)

Step 2 — Index lookup:

TransclusionIndex.get(0xa3f7...) → {work_42, work_107, work_891}

Three candidate documents. We didn't scan any document text — just a HashMap lookup.

Step 3 — Canopy-filtered H-tree walk:

work_42 H-tree:
  Root flags [T,L,I] & TextFlag → match → descend
  Left child [T] & TextFlag → match → descend → found at Rev 3
  Right child [L,I] & TextFlag → no match → pruned

work_107 H-tree:
  Root flags [T] & TextFlag → match → descend → found at Rev 1

work_891 H-tree:
  Root flags [L,B] & TextFlag → no match → entire tree pruned

Step 4 — Verify and collect:

TrailBlazer.record(Text("quick brown fox"), work_42_rev3)  → true
TrailBlazer.record(Text("quick brown fox"), work_107_rev1)  → true

Step 5 — Results:

[
  TransclusionResult { element: Text("quick brown fox"), is_direct: true,  work: 42 },
  TransclusionResult { element: Text("quick brown fox"), is_direct: true,  work: 107 },
]

The TransclusionIndex

The TransclusionIndex is the core inverted index. It maps each BLAKE3 fingerprint to the set of work IDs that contain an element with that fingerprint:

pub struct TransclusionIndex {
    // fingerprint → set of work IDs containing this content
    index: HashMap<[u8; 32], HashSet<u64>>,
}

It's populated during BackfollowEngine::register_edition(): when a new edition is saved, each of its content-addressable elements is indexed. The index is also used by content_shared_region() to find overlapping content between two specific documents.

Range-Scoped Queries

The RangeTransclusionQuery in src/edition/range_transclusion.rs extends basic transclusion queries with position constraints:

Source: TransclusionIndex and TrailBlazer are in src/edition/transclusion.rs. RangeTransclusionQuery is in src/edition/range_transclusion.rs. The BackfollowEngine orchestrates the full pipeline in src/edition/backfollow.rs. content_shared_region() is in src/edition/shared_mapping.rs:58.

8. DagWood: Concurrent Edits

When two users edit the same document at the same time, both save successfully, and neither sees the other's changes until later. The DagWood is the data structure that handles this. It defines a partial ordering of edit positions and provides operations to create new positions, detect concurrency, and reconcile conflicts.

What Is a DagWood?

A DagWood is a tree of TracePositions. Each position represents a point in the edit timeline. The tree structure encodes ordering: positions on the same branch are ordered (sequential edits), while positions on different branches are concurrent.

pub struct DagWood {
    branches: BranchStore,                          // all branches
    root: TracePosition,                            // the root position
    trunk: HashMap<TracePosition, BranchId>,       // main line positions
    cached_position: Option<TracePosition>,        // last known position
    nav_cache: HashMap<BranchId, u32>,              // navigation cache
}

Concurrent Edit Scenario

DagWood: Resolving Concurrent Edits TIME → Root pos 0 Edit 1 pos 1 Edit 2 pos 2 ← fork point User A pos 2a "added intro" User B pos 2b "fixed typo" Concurrent! Neither edit knows about the other. Merge pos 3 Reconciliation: Both changes preserved BranchStore maintains branches as a HashMap. Each branch tracks parent, children, and position count. Navigation cache maps BranchId → max reachable position for fast ordering comparisons. D8 — Concurrent Edit Resolution via DagWood

How Positions Are Created

The DagWood provides two key operations:

OperationBehaviorWhen Used
new_position() Creates a position on the root branch, after the last known position Single-user sequential edits
new_position_after(p) Creates a successor to p. First successor is on the same branch; further successors branch off into a binary tree Concurrent edits after the same position
Branching in action
// User A saves after pos 2
let pos_a = dagwood.new_position_after(pos_2);
// → pos_2a on same branch (first successor)

// User B also saves after pos 2 (concurrent!)
let pos_b = dagwood.new_position_after(pos_2);
// → pos_2b on a NEW branch (second successor)
// → pos_a and pos_b are now concurrent, not ordered
  

Partial Ordering and Navigation

The DagWood's nav_cache enables fast ordering comparisons. Given two positions, the system can determine:

This is used by the TraceView to present a coherent ordering to clients even when the underlying history is a DAG.

Reconciliation Strategy

When concurrent edits need to be reconciled, Xudanu's current strategy is last-writer-wins at the element level. Both edits are preserved in the history; the H-tree stores both branches. The "current" version shown to users is determined by the reconciliation logic:

  1. Elements unique to one branch are kept
  2. Elements present in both branches use the later timestamp
  3. The merge creates a new H-tree node with both branches as parents
Source: DagWood is defined in src/ent/dagwood.rs:20. new_position() at line 69, new_position_after() at line 79. BranchStore and TracePosition are in src/ent/branch.rs and src/ent/trace.rs. The original C++ comment at line 8 notes: "Each dagwood defines a partial ordering of TracePositions."

9. Big-O Performance Summary

This section consolidates the performance characteristics of every major operation. The "naive" column shows what the C++ Udanax Gold code would have done if the disconnected components had been wired together with simple loops. The "Xudanu" column shows what the optimized Rust implementation achieves.

Operation Naive Xudanu How
Find works containing text O(W × L) O(U × A) Fingerprint HashMap lookup. U = unique fingerprints in query, A = avg works per fingerprint. No document scanning.
Find shared regions (diff) O(n × m) O(La + Lb) BTreeSet of fingerprints for O(1) membership test per element. Linear in document sizes, not their product.
Edition element listing O(n log n) every call O(1) after first OnceLock cache on the Edition struct, shared via Arc. First call traverses the O-tree; subsequent calls return the cached Vec.
Transclusion index lookup O(n) flat scan O(log n) Canopy-filtered H-tree traversal. Flag bits prune entire subtrees in a single comparison.
Position transform (shift) O(n) rewrite O(1) Dsp node wraps the O-tree with an offset. No elements are touched; positions are adjusted on read.
Shared region highlighting O(r × t) O(r + t) Sorted regions, single pass through text. Regions are merged by position, not compared pairwise.
Content deduplication O(n) compare each pair O(1) per element BLAKE3 fingerprint + HashMap lookup. Same content = same hash = same BeId. Instant.
O-tree split N/A (unsplit) O(n) One-time amortized cost when leaf exceeds MAX_LEAF_SIZE. Splits at median; depth grows as O(log n).
Snapshot checkpoint N/A (no persistence) O(state size) Full serialize via serde_json. Atomic write to temp file + rename. Versioned with SHA-256 checksum.
Xudanu optimized
Naive baseline
Index-based
Canopy-based

W = total works, L = text length, U = unique fingerprints, A = avg works per fingerprint, n/m = document lengths, r = shared regions, t = text

Key Takeaway

Content identity is the multiplier. Every optimization in this table traces back to BLAKE3 fingerprints giving us O(1) content identity. Without it: HashMap lookups become linear scans, canopy flags can't summarize content types, and Dsp nodes have nothing to deduplicate. The fingerprint is the lever that moves everything from O(n) to O(log n).

10. Labels, Paths, and Deep Linking

Xudanu doesn't just store flat text. It supports a rich addressing model that lets you name, link to, and navigate into specific positions within documents — even across documents via transclusion.

The Label System

A Label is a named anchor attached to a position in an O-tree. Labels are stored as RangeElement::Label entries, so they participate in the same structural sharing and versioning as text. When you create a new revision, labels from the previous version are automatically inherited.

Labels in an O-tree
Edition for "Getting Started Guide":
  pos 0: Text("# Getting Started")
  pos 1: Label("intro")            ← anchor named "intro"
  pos 2: Text("Welcome to Xudanu...")
  pos 3: Text("First, install the server.")
  pos 4: Label("install")          ← anchor named "install"
  pos 5: Text("Run `xudanu-server`...")
  

Paths and Tumblers

The Udanax Gold addressing model uses tumblers — hierarchical numeric paths that identify a position within a document, within a work, within the global namespace. Xudanu implements this via:

A full path to a label looks like:

work:42 / edition:7 / label:"intro"

This resolves through three lookups: work ID → current edition → label position. Each step is O(1) via HashMap.

Deep Linking

Because labels are stored as O-tree elements, they support the same operations as text:

HyperLinks

Xudanu implements bidirectional hyperlinks via the HyperLink and HyperRef types:

Source: Labels and IdentityMap are in src/edition/label.rs. Id and IdSpace are in src/edition/grandmap.rs:17. HyperLink and HyperRef are in src/edition/links.rs.

11. Persistence: Snapshots and Crash Safety

Xudanu persists the entire server state as a single JSON snapshot file (server.json). The persistence layer handles versioning, checksums, atomic writes, migration, and backup management.

The Versioned Snapshot Format

pub struct VersionedSnapshot {
    format_version: u32,          // current: 1
    created_at: String,           // ISO 8601 timestamp
    server_version: String,       // e.g. "0.1.1"
    checksum: String,             // SHA-256 of the data field
    data: serde_json::Value,      // the actual server state
}

Write Process: Atomic Checkpoint

When the server checkpoints (triggered by each save operation), the following steps occur:

  1. Serialize — the entire Server struct is serialized to a ServerSnapshot via to_snapshot()
  2. Checksum — SHA-256 hash of the serialized data is computed
  3. Wrap — data, checksum, timestamp, and version are combined into a VersionedSnapshot
  4. Write temp — the JSON is written to server.tmp
  5. Atomic renameserver.tmp is renamed to server.json
Checkpoint trace
1. to_snapshot()     → ServerSnapshot { works: [...], clubs: [...], links: [...], ... }
2. SHA-256(data)     → "a4f7c2e8..."
3. VersionedSnapshot  → { format_version: 1, checksum: "a4f7c2e8...", data: {...} }
4. write("server.tmp")  → 847 KB written
5. rename("server.tmp", "server.json")  → atomic on POSIX, nearly atomic on Windows
  

Read Process: Restore and Migrate

On startup, the server reads the snapshot with full validation:

  1. Detect version — check for format_version field (v0 = raw JSON, v1 = versioned wrapper)
  2. Migrate — if v0, wrap in v1 format. If future version, apply migration chain
  3. Verify checksum — SHA-256 of the data must match the stored checksum. ChecksumMismatch error if not
  4. Validate structure — check for required fields (grand_map_id_counter, works, etc.)
  5. Check disk space — ensure 3× current file size + 1MB is available before restoring
  6. Deserialize — reconstruct the Server struct via from_snapshot()

What Gets Snapshotted

ComponentWhat's Saved
WorksAll documents: current edition, revision count, edit/read clubs, sponsors, full history
ClubsPermission groups: public, admin, server clubs, and user-created clubs
LinksAll hyperlinks with source/destination references
Standalone editionsEditions not yet attached to a work (templates, drafts)
Content address indexThe full fingerprint ↔ BeId mapping
Federation statePeers, membership, governance, royalty ledger
Admin stateConnection acceptance, shutdown flag, access grants
Key historyServer keypair history for TLS
Blob metadataReferences to large binary objects stored on disk

Backup Management

When a migration occurs (e.g., v0 → v1), the system automatically:

Error Handling

The snapshot layer defines a comprehensive error enum:

pub enum SnapshotError {
    Io(std::io::Error),
    Json(serde_json::Error),
    Migration { from_version: u32, to_version: u32, reason: String },
    Validation(ValidationReport),
    InsufficientDiskSpace { required_bytes: u64, available_bytes: u64 },
    ChecksumMismatch { expected: String, actual: String },
}

The most critical errors:

Crash Safety Guarantees

Atomic write via temp file + rename: If the server crashes during checkpoint, the worst case is a stale server.tmp file left behind. The server.json file is always either the previous complete state or the new complete state — never a partial write. The rename operation is atomic on POSIX filesystems.
Current limitation: The entire server state is serialized as a single JSON file. For servers with many large documents, this file can grow to hundreds of megabytes. The v0.2 roadmap includes atomic chunked checkpoints that write only the changed portions, reducing both write time and disk usage.
Source: Snapshot versioning and I/O is in src/server/transport/snapshot.rs. VersionedSnapshot at line 7, write_versioned_snapshot() at line 278, full_restore() at line 326. The ServerSnapshot type and to_snapshot()/from_snapshot() are in src/server/server.rs (module persist_snapshot at line 3476).

12. Federation: CRDTs and PBFT

Xudanu supports multi-server content replication. A federated deployment lets documents and their transclusion relationships span multiple Xudanu servers, with eventual consistency guaranteed by CRDTs (Conflict-free Replicated Data Types) and coordinated trust decisions handled by PBFT (Practical Byzantine Fault Tolerance).

Federation Model

Each Xudanu server is identified by a server_id string. Servers know about their peers via a FederationConfig:

pub struct FederationConfig {
    pub enabled: bool,
    pub peers: Vec<PeerAddress>,        // host:port of known peers
    pub mode: FederationMode,             // Closed, Open, or Community
    pub min_endorsements: u32,            // trust threshold (default: 2)
}

pub enum FederationMode {
    Closed,      // only listed peers may connect
    Open,        // any server may join
    Community,   // peers vouch for new members (endorsement-based)
}

CRDT Building Blocks

Federation uses three CRDT primitives to handle different aspects of distributed state:

OrSet — Observed-Remove Set

Tracks sets where items can be added and removed concurrently across servers. Each add/remove is tagged with a unique event ID. Resolution: adds win over removes only if the remove has seen the add.

// Two servers add and remove concurrently
Server A: add("doc_42")   → events: {("doc_42", a1)}
Server B: remove("doc_42") → events: {("doc_42", b1)}

// Merge: "doc_42" is present because
// Server B's remove hasn't seen
// Server A's add (concurrent)
    

Used for: membership sets, known-peers, content origin tracking

LwwRegister — Last-Writer-Wins

A single-value register where the value with the latest timestamp wins. Ties are broken by server_id comparison for determinism.

pub struct LwwRegister<T> {
    value: T,
    timestamp: u64,
    server_id: String,
}

// Server A sets value at t=100
// Server B sets value at t=105
// Merge → Server B's value wins (higher t)
    

Used for: document content merges, configuration state

Federation Merge Diagram

Federation: Multi-Server Content Merge Server A server_id: "alpha" doc_1: "Hello world" doc_2: "shared text" ← doc_3: "Alpha only" Server B server_id: "beta" doc_1: "Hello ALL" doc_2: "shared text" ← doc_4: "Beta only" sync doc states Merge Resolution doc_1: LwwRegister → "Hello ALL" (beta t=105 > alpha t=100) doc_2: fingerprint match → same BeId (already shared) doc_3: OrSet add → present on alpha only doc_4: OrSet add → present on beta only Post-Merge: Both Servers doc_1: "Hello ALL" doc_2: "shared text" doc_3: "Alpha only" doc_4: "Beta only" All 4 documents visible on both servers. Transclusion index updated. Trust decisions (peer endorsement, content acceptance) require min_endorsements via PBFT consensus D9 — Federation Merge via CRDTs

Content Origin Tracking

When content crosses server boundaries, it carries its origin:

pub struct FederatedId {
    pub server_id: String,    // which server created this
    pub local_id: u64,        // the BeId on that server
}

This allows the system to:

Membership and Governance

The FederationState manages three aspects of distributed coordination:

ComponentCRDTPurpose
MembershipStateOrSetTrack which servers are members. Adds and removes are concurrent-safe.
known_peersHashMapAddress book of all known servers, including their public keys.
GovernanceState (PBFT)ConsensusCoordinate trust decisions: endorsing new peers, revoking memberships, changing configuration. Requires min_endorsements votes.
royalty_ledgerHashMapTrack cross-server content usage for micropayment settlement.

PBFT for Trust Decisions

Not all federation operations can use CRDTs. Some decisions — like accepting a new peer into a closed federation — require explicit agreement from existing members. Xudanu uses a simplified PBFT protocol for these:

  1. Propose — a server proposes a trust change (e.g., "add server gamma")
  2. Vote — each member server votes yes/no
  3. Decide — once min_endorsements votes are collected, the decision is committed
  4. Propagate — the decision is replicated to all members via OrSet
Source: Federation types are in src/server/federation.rs. FederationConfig at line 60, OrSet and LwwRegister implemented in the same file. FederatedId at line 8. The file is 3,656 lines including comprehensive property tests for CRDT merge semantics.

13. Comparison with Other Systems

Xudanu's design draws on ideas from several systems. Here's how it compares:

System Comparison Content Identity Transclusion Versioning Bidirectional Links Micropayments Federation Xudanu Git IPFS Web (larger area = more complete implementation) D10 — System Comparison Radar

Feature Comparison Table

Feature Xudanu Git IPFS Web
Content addressing BLAKE3 (per element) SHA-1/SHA-256 (per blob) SHA-256 (CID) URL (location-based)
Transclusion Native — element-level No File-level only No (copies)
Bidirectional links Yes — tracked server-side No No No (one-way only)
Version ancestry H-tree DAG Commit DAG No (immutable) No
Query pruning Canopy flags No tree pruning DHT routing N/A
Deduplication Automatic (GrandMap) Pack files Block-level No
Micropayments Per-element accounting No Filecoin (per deal) No
Federation CRDTs + PBFT Fork + merge DHT + bitswap CORS (same-origin)
Concurrent edits DagWood + reconcile Merge conflicts N/A (immutable) Last write wins
Structural sharing O-tree Arc sharing Tree objects No No

What Xudanu Borrows

14. Future Directions

v0.1.2 (Near Term)

FeatureDescriptionStatus
crates.io publication Publish the xudanu crate to crates.io for easy installation via cargo install xudanu Ready to ship
Atomic chunked checkpoints Split server.json into per-work chunks. Checkpoint only changed works, not the entire state. Designed, not implemented
WASM build Compile the core engine to WebAssembly for browser-based single-user mode Engine is no_std compatible

v0.2 (Medium Term)

FeatureDescription
Access control Per-document read/write permissions via Club-based ACLs. Already implemented server-side; needs client integration.
Scalable storage backend Replace single-file JSON with a chunked backend (SQLite or custom). Target: millions of documents without multi-GB snapshots.
Structured editing Expose Edition.combine/replace/copy to the UI for tables, outlines, nested documents. The engine already supports all composition operations.
Age heatmap view Walk revision history, diff consecutive versions, color characters by when they were added. Uses H-tree ancestry + shared region detection.
Reactive recorders Live transclusion notifications using SensorCanopy. When new content matches your registered queries, push a notification to the client.
Federation transport Server-to-server gRPC or HTTP/2 protocol for content replication. CRDTs are implemented; the wire protocol is not yet connected.

Longer Term

15. References

Primary Sources

Technical Concepts

Internal Documentation

Test coverage: 1,664 tests passing (1,467 unit + 190 integration + 7 TLS). Property-based tests (proptest) cover BLAKE3 fingerprinting, O-tree operations, CRDT merge semantics, and snapshot roundtrips. Every data structure described in this document has corresponding test coverage.