Xudanu Content + Performance Architecture

How O-tree, H-tree, and Canopy work together for O(log n) transclusion queries

O-Tree (OrglRoot) — Document Content Storage

Each document is an ordered sequence of RangeElements stored in a BTree. Elements are content-addressed via BLAKE3 fingerprints.

Doc A:
"The "
"quick brown fox"
" sat down"
Doc B:
"I saw a "
"quick brown fox"
" yesterday"
RangeElement Text("quick brown fox") fp: blake3 → 0xa3f7... BeId: 1002 (shared)
Same content = same fingerprint = same BeId. Deduplication is automatic.
fingerprint → TransclusionIndex
H-Tree — Version Ancestry + Transclusion Navigation

Editions are linked into a version DAG. Each node (crum) carries metadata about its subtree. Traversal walks parent↔child edges to find transclusion relationships.

Rev 0
Rev 1
Rev 2
Rev 3 (current)
Each crum: { prop flags, fingerprint summary, parent ref }
H-Tree Query
"Find all editions containing fingerprint 0xa3f7..."
→ Walk from root, skip subtrees whose fingerprint summary doesn't match
canopy flags prune branches
Canopy — Flag-Bit Pruning for O(log n) Traversal

Each H-tree node carries a bitmap of endorsement types (Text=1, Link=2, Blob=4...). The canopy merges these bitmaps upward so a parent's flags are the union of all children's flags. During traversal, if a parent's flags don't include the type you're looking for, skip the entire subtree.

Endorsement flags at each level:
Root (merged):
T
L
B
I
.
.
.
.
Left child:
T
L
B
I
.
.
.
.
Right child:
T
L
B
I
.
.
.
.
Query: "find Text transclusions" → skip right child (no T flag) → half the tree pruned
Pruning Result
Without canopy: visit every node
With canopy: visit only matching subtrees
→ O(log n) not O(n)
BackfollowEngine orchestrates
Query Flow — find_transcluders("quick brown fox")
1. BLAKE3 hash query text
2. Fingerprint → TransclusionIndex
3. H-tree walk (canopy-filtered)
4. Collect matching works
5. Results!

Big-O Performance

OperationNaiveXudanu (optimized)How
Find works containing text O(W × L) O(U × A) Fingerprint intersection: U = unique chars, A = avg works per char. HashMap lookup, no scan.
Find shared regions (diff) O(n × m) O(La + Lb) Fingerprint HashMap for seed discovery, then greedy longest-first claiming.
Edition element listing O(n log n) every call O(1) after first OnceLock cache on Edition struct. Shared via Arc on clone.
Transclusion index lookup O(n) flat scan O(log n) H-tree canopy-filtered traversal. Flag bits prune entire subtrees.
Shared region highlighting O(r × t) O(r + t) Sorted regions, single pass through text. No nested loops.
W = total works, L = text length, U = unique fingerprints, A = avg works per fingerprint, n/m = document lengths, r = regions, t = text

The Key Insight for Developers

Content identity enables all the optimizations. Because BLAKE3 fingerprints give us content-addressed identity for free, we can build inverted indexes (fingerprint → works), intersect them (shared content), cache them (OnceLock), and prune tree traversals (canopy flags). Without content-addressing, none of these O(log n) shortcuts work — you're back to O(n) string comparison.

This is the same insight behind Git (content-addressed objects), IPFS (content-addressed storage), and modern databases (hash indexes). Udanax Gold had it in the 1990s — just without BLAKE3.