You leave the Vault holding one fact: most artifacts are not stored
whole. A blob row usually holds a delta — a
tiny program for rebuilding the artifact out of another one. Useful
storage, but useless on its own: sooner or later something needs the
real bytes back. The function that hands them over is
content_get, and every part of Fossil that reads an
artifact — a diff, a checkout, a web page — goes through it.
This page takes that one function apart, slowly. By the last act you
will be able to read its C unaided.
Act 1 · The shape of the thing
Before any code, the picture. We will follow one real artifact the
whole way through this page: rid 177, a stored
version of www/pop.html from Fossil's own repository.
Ask the Vault for its bytes and it cannot simply unzip a row —
rid 177 is a delta, and a delta describes a change against
another artifact, not the file itself.
That other artifact is rid 176. But 176 is a delta too, against
175; and 175 against 214; and 214 against 432. Only rid 432
breaks the pattern — it carries no delta row, so its
content is the whole file, compressed. In the tour's
terms it is the anchor: the firm ground the chain finally
stands on. Five artifacts, four delta hops, all of them versions of
the same www/pop.html.
So reconstruction has a shape, and it goes both ways. First Fossil walks up the chain — 177 to 176 to 175 to 214 to 432 — collecting rids until it reaches an artifact that is whole. It rebuilds the anchor's 3,268 bytes from that compressed row. Then it turns around and works back down: apply 214's delta to the anchor, apply 175's delta to that result, then 176's, then 177's. Each hop runs one little program and produces the next artifact, until the 3,676 bytes of rid 177 — the artifact actually asked for — finally exist.
That up-then-down walk is the whole idea. The diagram below runs it on the real chain: press Reconstruct and watch 177 resolve, hop by hop — up the chain to the anchor, then rebuilt back down.
Press Reconstruct to watch the chain resolve.
Act 2 · Rebuilding it, draft by draft
Act 1 showed the shape: walk up to an anchor, rebuild back down.
Now we earn the function. Rather than read Fossil's
content_get cold, we build our own from the simplest
thing that compiles, then hit a wall, fix it, hit the next wall, fix
that — five drafts, each one forced by a concrete failure of
the last. By the end the draft has the same skeleton as the real
function, and Act 3 will be a short step.
One promise up front, so nothing surprises you: these stage
drafts skip error handling so the shape stays clear; the real
function in Act 3 checks every return. Each draft below is
the happy path only — it assumes every helper
succeeds and never inspects what they hand back. That is not how you
would ship C, and the panels say so: each is badged
teaching scaffold, meaning real, valid C, simplified from
content_get but never copied from it.
Three helpers do the heavy lifting and recur across every draft.
They are the real Fossil helpers, used here as black boxes
— we will not open them, only state each one's contract the
first time it appears: content_of_blob,
delta_source_rid, and blob_delta_apply.
Stage 1The simplest thing that compiles
Forget deltas for a moment. The simplest possible
content_get just fetches the artifact's row and
unzips it — one helper does exactly that, so the whole
function is a single line that hands its work straight off.
{ } Stage 1 — fetch and unzip, nothing else teaching scaffold teaching scaffold — not Fossil source
rid, the artifact's number, and pBlob, an out-parameter — the caller's empty Blob that this function fills with the answer.content_of_blob(rid, pBlob): fetches blob row rid, unzips its content into pBlob, returns 1 if found. Its return value is our return value — a return code: 1 for found, 0 for not.▸ Click any underlined term for the C explained.
Hand this version a delta artifact — rid 177 from
Act 1 — and content_of_blob dutifully unzips
its row. But that row does not hold the file. It holds the
delta program: the tiny set of copy-and-insert
instructions for rebuilding 177 out of 176.
So the caller asked for www/pop.html and got the
raw bytes of a delta program instead. Gibberish.
Stage 1 only works for whole, non-delta artifacts — and in
a real repository those are the rare minority.
- signature
- int
- return code
- out-parameter
- function call
- return
▸ Click a term above for the C explained.
Stage 2Recurse on the source
To handle a delta we need its source. delta_source_rid
tells us whether an artifact is a delta and, if so, which
artifact it builds against. If it is a delta, fetch its source,
fetch its delta program, apply one to the other. And since the
source might itself be a delta, content_get calls
itself on the source — that single self-call,
recursion, quietly handles a chain of any depth.
This draft is correct in shape, but read it as
happy-path-only: it never checks whether the helpers succeeded and
returns 1 unconditionally, purely to keep the
recursion legible. The real function checks every call — you
will see that in Act 3.
{ } Stage 2 — a delta is source plus delta, applied teaching scaffold teaching scaffold — not Fossil source
delta_source_rid(rid): returns the source rid if rid is a delta, or 0 if rid is a whole base.else branch — a block scope. They exist only where they are needed and vanish at the closing brace.& is address-of: it passes where delta lives, so the helper can fill it. The same trick the caller used to give us pBlob.blob_delta_apply(pSrc, pDelta, pTarget): runs the delta program pDelta against source pSrc, writing the result into pTarget; returns the output size, or a negative number if the delta is malformed.▸ Click any underlined term for the C explained.
Nothing here is wrong. Hand Stage 2 rid 177 and it returns the right bytes: it recurses to 176, to 175, to 214, to the anchor 432, then applies the deltas back down. Correct.
And yet Fossil does not ship this version. The reason is cost, not correctness — and it is exactly what Stage 3 is about to explain.
- declaration
- if / else
- address-of
- recursion
- comment
- block scope
▸ Click a term above for the C explained.
Stage 3Why Fossil doesn't recurse like that
Stage 2 is correct, but Fossil does not use it. Recursion's depth
grows with the chain: a 900-deep delta chain becomes 900 nested
calls, each holding its own src and delta
on the stack at once. Worse, the recursive shape cannot easily
cache the artifacts it rebuilds along the way —
something later stages will badly want.
So Fossil flattens the recursion into a loop. Walk up the chain collecting rids into a list; then iterate back down the list applying deltas. Same up-then-down shape as Act 1's diagram — now as plain iteration.
{ } Stage 3 — collect the chain, then walk it down teaching scaffold teaching scaffold — not Fossil source
a[0], a[1], and so on. Room for a chain up to 100 artifacts long.r is not 0. The walk up the chain — it ends once delta_source_rid returns 0 for the current rid, meaning that rid has no source. That rid, now stored in a[n-1], is the anchor.n++ is increment: shorthand for “add 1 to n”. After storing a rid, move to the next slot.a[n-1] is the anchor — the last rid collected. Unzip it whole; that is the firm ground the deltas apply onto.n-2 (the slot just below the anchor), keep going while i>=0, step i-- each pass. It walks the list downward.*pBlob is a dereference assignment: write next into the Blob pBlob points at. Each pass replaces the running result with the artifact one step further down.▸ Click any underlined term for the C explained.
int a[100] is a fixed promise: room for exactly 100
rids, decided when the function is written. Hand Stage 3 a delta
chain longer than 100 and the loop writes to a[100],
a[101], and beyond — slots that were never
part of the array.
C does not check array bounds. There is no error, no crash at the moment of the write — the program simply scribbles over whatever memory sat after the array. The corruption is silent, and surfaces later as something inexplicable. The list must be able to grow.
- array
- while loop
- for loop
- operators
- increment / decrement
- dereference assignment
▸ Click a term above for the C explained.
Step through the code below to watch each line move the chain in the diagram — up to the anchor, then rebuilt back down.
Press Next to step through the walk.
- int a[100];
- int n = 0;
- int r = rid;
- while( r!=0 ){
- a[n] = r;
- n++;
- r = delta_source_rid(r);
- }
- content_of_blob(a[n-1], pBlob);
- for(int i=n-2; i>=0; i--){
- Blob delta, next;
- content_of_blob(a[i], &delta);
- blob_delta_apply(pBlob, &delta, &next);
- *pBlob = next;
- }
- return 1;
Stage 4A chain of any length, safely
The fixed a[100] has to become a list that grows. C
gives you raw memory for that: fossil_malloc asks the
system for a block, fossil_realloc grows the block
when it fills, and free hands it back at the end.
Fossil also adds a guard — if the chain ever grows longer
than the number of artifacts that exist, the delta
table must be looping on itself, so it calls
fossil_panic rather than spin forever.
Below is the growth machinery, shown against Stage 3's walk: how the list is born, how it doubles when full, and how it is released.
{ } Stage 4 — a list that grows on demand teaching scaffold teaching scaffold — not Fossil source
a[0] and a[1]) before this loop, so n starts at 1 — this fragment picks up there.int *a is a pointer — it does not hold the list, it holds the address of one. Set to 0, the null pointer: a deliberate “points at nothing yet”.fossil_malloc asks the system for a block of memory. sizeof(a[0]) is the byte-size of one slot — times nAlloc gives room for 10. Now a points at a real, usable list.db_int(0, "SELECT ..."): runs the SQL query and returns the single integer result. Here: the largest rid that exists. If the chain is already longer than that, it must be looping.fossil_panic aborts the program with a message. A self-referential delta chain can never be resolved — far better to fail loudly than loop forever.fossil_realloc grows the block, keeping what is already in it. Doubling each time means a chain of any length costs only a handful of grow steps.fossil_malloc is borrowed, not given. free returns it. Skip this and the program leaks a little memory on every read.▸ Click any underlined term for the C explained.
This stage does not have a failing example — it is the
robustness fix. The “problem” it answers is the one
Stage 3 left open: the unbounded chain. With a
growing list there is no fixed ceiling, and the
fossil_panic guard turns a corrupt,
self-referential delta table into a loud, immediate
failure instead of an endless loop.
Stage 4 is correct and safe for a chain of any length. What it still does, though, is rebuild the entire chain from the anchor up on every single read — and that is what Stage 5 attacks.
- pointer
- null pointer
- malloc
- realloc
- sizeof
- panic
- string literal
▸ Click a term above for the C explained.
Stage 5Don't redo work — the cache, and recursion's return
Rebuilding the same long chain on every read is wasteful: read
rid 177 ten times and Stage 4 walks all five hops ten times
over. So Fossil keeps a contentCache — a
process-wide store of recently rebuilt artifacts. It plays three
parts. A missing set is an early-out: if an artifact is
known to be absent, return 0 at once. An in-cache lookup:
if the artifact is already cached, copy it out and return. And
during the down-walk, an insert of every 8th rebuilt
artifact, so a future read of this chain starts partway down
instead of at the anchor.
Here is the climax. With a cache in play, the up-walk no longer
has to reach a base — it can stop at the first artifact that
is either a base or already cached. And the
anchor it stopped at is resolved by a single recursive
content_get(a[n], pBlob): one self-call that
transparently handles both “anchor is a base” and
“anchor is a cache hit” with the same line. Recursion
returns — but now in a precise, minimal role, not as the
whole engine.
{ } Stage 5 — stop early at a cache hit, recurse once teaching scaffold teaching scaffold — not Fossil source
bag_find(&set, rid): returns true if rid is in the set. contentCache: a process-wide cache struct holding recently rebuilt artifacts. The .inCache picks one field out of that struct — a member access.(nextRid = delta_source_rid(nextRid))>0 is an assignment used as an expression. In one move it stores the next source and tests it — the loop both advances and checks its stop condition. With ! and &&: keep walking while the artifact is not cached and still has a source.a[n] is whatever the walk stopped at: a true base, or a cached artifact. One self-call handles both, because content_get itself checks the cache first thing.▸ Click any underlined term for the C explained.
Stage 4 left one cost on the table: every read re-walks the whole chain, anchor and all, even for an artifact read seconds ago. The cache is the answer — the missing set skips known-absent artifacts instantly, the in-cache lookup ends the walk early, and the every-8th insert seeds the cache so the next walk is shorter still.
That is the real content_get in outline: a chain
walk, a growing list, a panic guard, a cache, and one disciplined
recursive call. Act 3 opens the actual function
— the same shape, every return value now checked.
- member access
- assignment as expression
- recursion, revisited
- operators (&&, !)
▸ Click a term above for the C explained.
Act 3 · The whole building
Five drafts brought us to the doorstep. Here is the real
content_get — verbatim from Fossil 2.28, not a
scaffold, not simplified. Every shape you built in Act 2 is in here:
the cache early-outs, the chain walk up to an anchor, the one
recursive call, the down-walk applying deltas, the every-8th cache
insert. The difference is that the real function checks every return
value, so a handful of extra lines carry error handling we skipped.
Read it with the annotations first. Then press the button: cold mode hides every note, leaving nothing but the source. If you can follow it unaided, the page has done its job.
{ } The whole function src/content.c · lines 237–317
g: Fossil's process-wide global-state struct; the field g.repositoryOpen is true when a repository database is open. The assert states a precondition: content_get assumes a repository is already open, and fails loudly in a debug build if it is not.blob_zero(pBlob): initialises a Blob to empty. The caller's pBlob is cleared first thing, so every early return below hands back an empty blob rather than stale bytes.contentCache.missing is the set of rids already proven absent. If this rid is in it, there is nothing to fetch — return 0 (failure) at once, no database touched.a[] for its slot. Black box — blob_copy(pTo, pFrom): copies one Blob's contents into another — the cached bytes are handed straight to the caller. age = nextAge++ marks this slot as just-used (post-increment, so the slot gets the old value and the counter steps on). Return 1 (success): no chain walk needed.delta_source_rid answers the one question: 0 means this artifact is a whole base — fetch and unzip it directly. Anything else means it is a delta, and the chain walk begins.n was declared as int n = 1 on line 266, but that initialiser is immediately overwritten here. This is the operative assignment: after a[0] = rid and a[1] = nextRid have been stored, setting n = 1 establishes the loop invariant — n is the index of the last filled slot in a[].a[] of rids: this artifact, its source, its source's source… The loop both advances (nextRid = delta_source_rid(nextRid), an assignment used as an expression) and stops — when an artifact is already in cache, or has no source (>0 fails). Either way, a[n] ends up at a resolvable anchor.delta table must contain a cycle — fossil_panic aborts the program rather than spin forever or grow the list without bound.a[n] is the anchor the walk stopped at — a true base, or a cached artifact. content_get calls itself to resolve it, and that one self-call handles both cases, because content_get checks the cache first thing. This is the only recursion left; everything below is a plain loop.pBlob, step back down the list applying each delta in turn. content_of_blob fetches the raw delta bytes into the local delta blob; then blob_delta_apply (line 294) applies that delta program to pBlob, writing the rebuilt artifact into next. rc guards each step, so a failure anywhere stops the loop.blob_delta_apply returned a negative number, meaning the delta was malformed and could not be applied. Three things happen here that are worth pausing on. First, rc = 1 is a no-op: the code only enters this branch from inside if( rc ){, so rc is already 1. The assignment changes nothing. Second, delta is not freed here — only the else branch calls blob_reset(&delta), so on a failed delta apply the delta blob leaks. Third, *pBlob = next is skipped, so pBlob retains the previous step’s artifact; yet the loop continues (rc stays truthy, n--), carrying that stale state into the next step. This is the genuine Fossil 2.28 source (src/content.c line 295); the tour reproduces it as written. No commit message or comment explains the intent, so it is presented honestly rather than silently “corrected”.blob_reset(pBlob): frees a Blob's contents, leaving it empty. The raw delta has been consumed by blob_delta_apply, so its memory is released here.(mx-n) is how far down the chain we have walked. Every eighth step, black box — content_cache_insert(rid, pBlob): stores the freshly rebuilt artifact in the shared content cache (it mutates the global cache). Caching every artifact would bloat memory; caching one in eight seeds enough waypoints that future walks stop early. The other seven steps just blob_reset(pBlob) — free the intermediate result before overwriting it.*pBlob = next writes the just-rebuilt artifact back through the pointer, making it the source for the next step down. This line runs only on a successful delta apply — the surprise branch above skips it.bag_insert(&set, rid): adds rid to a set. A failed reconstruction joins contentCache.missing so the first early-out skips it next time; a success joins contentCache.available. The function teaches the cache as a side effect of every call.▸ Click any underlined term for the C explained.
Leaving content_get, you now know
- The function has one shape: walk up the delta chain to an anchor, then apply deltas back down — cache early-outs and error checks wrap that core, nothing more.
- It iterates rather than recurses for the walk — an explicit list
a[]instead of nested calls. The single remaining recursive call (content_get(a[n], pBlob)) resolves the anchor, and works for both a true base and a cache hit. - It caches every 8th rebuilt artifact: caching all of them would bloat memory, caching none would re-walk the full chain each time — one in eight seeds enough waypoints to keep future walks short.
- A delta chain that loops back on itself triggers a
fossil_panic— detected by the chain growing longer than the repository has blobs, it fails loudly instead of spinning forever.