The Vault
{ }

An interior · The Vault · src/content.c

Inside content_get

The function that turns a delta back into a file.

You leave the Vault holding one fact: most artifacts are not stored whole. A blob row usually holds a delta — a tiny program for rebuilding the artifact out of another one. Useful storage, but useless on its own: sooner or later something needs the real bytes back. The function that hands them over is content_get, and every part of Fossil that reads an artifact — a diff, a checkout, a web page — goes through it. This page takes that one function apart, slowly. By the last act you will be able to read its C unaided.

Act 1 · The shape of the thing

Before any code, the picture. We will follow one real artifact the whole way through this page: rid 177, a stored version of www/pop.html from Fossil's own repository. Ask the Vault for its bytes and it cannot simply unzip a row — rid 177 is a delta, and a delta describes a change against another artifact, not the file itself.

That other artifact is rid 176. But 176 is a delta too, against 175; and 175 against 214; and 214 against 432. Only rid 432 breaks the pattern — it carries no delta row, so its content is the whole file, compressed. In the tour's terms it is the anchor: the firm ground the chain finally stands on. Five artifacts, four delta hops, all of them versions of the same www/pop.html.

So reconstruction has a shape, and it goes both ways. First Fossil walks up the chain — 177 to 176 to 175 to 214 to 432 — collecting rids until it reaches an artifact that is whole. It rebuilds the anchor's 3,268 bytes from that compressed row. Then it turns around and works back down: apply 214's delta to the anchor, apply 175's delta to that result, then 176's, then 177's. Each hop runs one little program and produces the next artifact, until the 3,676 bytes of rid 177 — the artifact actually asked for — finally exist.

That up-then-down walk is the whole idea. The diagram below runs it on the real chain: press Reconstruct and watch 177 resolve, hop by hop — up the chain to the anchor, then rebuilt back down.

rid 177 delta
rid 176 delta
rid 175 delta
rid 214 delta
rid 432 base

Press Reconstruct to watch the chain resolve.

Act 2 · Rebuilding it, draft by draft

Act 1 showed the shape: walk up to an anchor, rebuild back down. Now we earn the function. Rather than read Fossil's content_get cold, we build our own from the simplest thing that compiles, then hit a wall, fix it, hit the next wall, fix that — five drafts, each one forced by a concrete failure of the last. By the end the draft has the same skeleton as the real function, and Act 3 will be a short step.

One promise up front, so nothing surprises you: these stage drafts skip error handling so the shape stays clear; the real function in Act 3 checks every return. Each draft below is the happy path only — it assumes every helper succeeds and never inspects what they hand back. That is not how you would ship C, and the panels say so: each is badged teaching scaffold, meaning real, valid C, simplified from content_get but never copied from it.

Three helpers do the heavy lifting and recur across every draft. They are the real Fossil helpers, used here as black boxes — we will not open them, only state each one's contract the first time it appears: content_of_blob, delta_source_rid, and blob_delta_apply.

Stage 1The simplest thing that compiles

Forget deltas for a moment. The simplest possible content_get just fetches the artifact's row and unzips it — one helper does exactly that, so the whole function is a single line that hands its work straight off.

{ } Stage 1 — fetch and unzip, nothing else teaching scaffold teaching scaffold — not Fossil source
1int content_get(int rid, Blob *pBlob){
Two things in: rid, the artifact's number, and pBlob, an out-parameter — the caller's empty Blob that this function fills with the answer.
2 return content_of_blob(rid, pBlob);
Black box — content_of_blob(rid, pBlob): fetches blob row rid, unzips its content into pBlob, returns 1 if found. Its return value is our return value — a return code: 1 for found, 0 for not.
3}

▸ Click any underlined term for the C explained.

Where it fails

Hand this version a delta artifact — rid 177 from Act 1 — and content_of_blob dutifully unzips its row. But that row does not hold the file. It holds the delta program: the tiny set of copy-and-insert instructions for rebuilding 177 out of 176.

So the caller asked for www/pop.html and got the raw bytes of a delta program instead. Gibberish. Stage 1 only works for whole, non-delta artifacts — and in a real repository those are the rare minority.

C you just met
  • signature
  • int
  • return code
  • out-parameter
  • function call
  • return

▸ Click a term above for the C explained.

Stage 2Recurse on the source

To handle a delta we need its source. delta_source_rid tells us whether an artifact is a delta and, if so, which artifact it builds against. If it is a delta, fetch its source, fetch its delta program, apply one to the other. And since the source might itself be a delta, content_get calls itself on the source — that single self-call, recursion, quietly handles a chain of any depth.

This draft is correct in shape, but read it as happy-path-only: it never checks whether the helpers succeeded and returns 1 unconditionally, purely to keep the recursion legible. The real function checks every call — you will see that in Act 3.

{ } Stage 2 — a delta is source plus delta, applied teaching scaffold teaching scaffold — not Fossil source
1int content_get(int rid, Blob *pBlob){
2 int srcRid = delta_source_rid(rid);
Black box — delta_source_rid(rid): returns the source rid if rid is a delta, or 0 if rid is a whole base.
3 if( srcRid==0 ){
4 return content_of_blob(rid, pBlob); /* a whole base — done */
5 }else{
6 Blob src, delta;
These two locals are declared inside the else branch — a block scope. They exist only where they are needed and vanish at the closing brace.
7 content_get(srcRid, &src); /* recurse: rebuild the source */
A function calling itself is recursion. The source might be a delta too — this one call resolves the whole rest of the chain before returning.
8 content_of_blob(rid, &delta); /* fetch this delta program */
The & is address-of: it passes where delta lives, so the helper can fill it. The same trick the caller used to give us pBlob.
9 blob_delta_apply(&src, &delta, pBlob); /* src + delta -> pBlob */
Black box — blob_delta_apply(pSrc, pDelta, pTarget): runs the delta program pDelta against source pSrc, writing the result into pTarget; returns the output size, or a negative number if the delta is malformed.
10 return 1; /* happy path only */
11 }
12}

▸ Click any underlined term for the C explained.

This works, but —

Nothing here is wrong. Hand Stage 2 rid 177 and it returns the right bytes: it recurses to 176, to 175, to 214, to the anchor 432, then applies the deltas back down. Correct.

And yet Fossil does not ship this version. The reason is cost, not correctness — and it is exactly what Stage 3 is about to explain.

C you just met
  • declaration
  • if / else
  • address-of
  • recursion
  • comment
  • block scope

▸ Click a term above for the C explained.

Stage 3Why Fossil doesn't recurse like that

Stage 2 is correct, but Fossil does not use it. Recursion's depth grows with the chain: a 900-deep delta chain becomes 900 nested calls, each holding its own src and delta on the stack at once. Worse, the recursive shape cannot easily cache the artifacts it rebuilds along the way — something later stages will badly want.

So Fossil flattens the recursion into a loop. Walk up the chain collecting rids into a list; then iterate back down the list applying deltas. Same up-then-down shape as Act 1's diagram — now as plain iteration.

{ } Stage 3 — collect the chain, then walk it down teaching scaffold teaching scaffold — not Fossil source
1int content_get(int rid, Blob *pBlob){
2 int a[100]; /* the chain: this rid, its source, ... */
An array: 100 integer slots in a row, reached as a[0], a[1], and so on. Room for a chain up to 100 artifacts long.
3 int n = 0;
4 int r = rid;
5 while( r!=0 ){
A while loop: repeat the body as long as r is not 0. The walk up the chain — it ends once delta_source_rid returns 0 for the current rid, meaning that rid has no source. That rid, now stored in a[n-1], is the anchor.
6 a[n] = r;
7 n++;
n++ is increment: shorthand for “add 1 to n”. After storing a rid, move to the next slot.
8 r = delta_source_rid(r);
9 }
10 /* a[0] is the target, a[n-1] is the anchor. Resolve bottom-up. */
11 content_of_blob(a[n-1], pBlob);
Slot a[n-1] is the anchor — the last rid collected. Unzip it whole; that is the firm ground the deltas apply onto.
12 for(int i=n-2; i>=0; i--){
A for loop bundles the count into one line: start at n-2 (the slot just below the anchor), keep going while i>=0, step i-- each pass. It walks the list downward.
13 Blob delta, next;
14 content_of_blob(a[i], &delta);
15 blob_delta_apply(pBlob, &delta, &next);
16 *pBlob = next;
*pBlob is a dereference assignment: write next into the Blob pBlob points at. Each pass replaces the running result with the artifact one step further down.
17 }
18 return 1; /* happy path only */
19}

▸ Click any underlined term for the C explained.

Where it fails

int a[100] is a fixed promise: room for exactly 100 rids, decided when the function is written. Hand Stage 3 a delta chain longer than 100 and the loop writes to a[100], a[101], and beyond — slots that were never part of the array.

C does not check array bounds. There is no error, no crash at the moment of the write — the program simply scribbles over whatever memory sat after the array. The corruption is silent, and surfaces later as something inexplicable. The list must be able to grow.

C you just met
  • array
  • while loop
  • for loop
  • operators
  • increment / decrement
  • dereference assignment

▸ Click a term above for the C explained.

Step through the code below to watch each line move the chain in the diagram — up to the anchor, then rebuilt back down.

rid 177 delta
rid 176 delta
rid 175 delta
rid 214 delta
rid 432 base

Press Next to step through the walk.

  1. int a[100];
  2. int n = 0;
  3. int r = rid;
  4. while( r!=0 ){
  5. a[n] = r;
  6. n++;
  7. r = delta_source_rid(r);
  8. }
  9. content_of_blob(a[n-1], pBlob);
  10. for(int i=n-2; i>=0; i--){
  11. Blob delta, next;
  12. content_of_blob(a[i], &delta);
  13. blob_delta_apply(pBlob, &delta, &next);
  14. *pBlob = next;
  15. }
  16. return 1;

Stage 4A chain of any length, safely

The fixed a[100] has to become a list that grows. C gives you raw memory for that: fossil_malloc asks the system for a block, fossil_realloc grows the block when it fills, and free hands it back at the end. Fossil also adds a guard — if the chain ever grows longer than the number of artifacts that exist, the delta table must be looping on itself, so it calls fossil_panic rather than spin forever.

Below is the growth machinery, shown against Stage 3's walk: how the list is born, how it doubles when full, and how it is released.

{ } Stage 4 — a list that grows on demand teaching scaffold teaching scaffold — not Fossil source
1 int n = 1;
The real function has already stored the first two rids (a[0] and a[1]) before this loop, so n starts at 1 — this fragment picks up there.
2 int nAlloc = 10;
3 int *a = 0; /* not pointing anywhere yet */
int *a is a pointer — it does not hold the list, it holds the address of one. Set to 0, the null pointer: a deliberate “points at nothing yet”.
4 a = fossil_malloc( sizeof(a[0])*nAlloc ); /* now a owns real memory */
fossil_malloc asks the system for a block of memory. sizeof(a[0]) is the byte-size of one slot — times nAlloc gives room for 10. Now a points at a real, usable list.
5 /* ... walk the chain, and when the list is full: ... */
6 if( n>=nAlloc ){
7 if( n>db_int(0, "SELECT max(rid) FROM blob") ){
Black box — db_int(0, "SELECT ..."): runs the SQL query and returns the single integer result. Here: the largest rid that exists. If the chain is already longer than that, it must be looping.
8 fossil_panic("infinite loop in DELTA table");
fossil_panic aborts the program with a message. A self-referential delta chain can never be resolved — far better to fail loudly than loop forever.
9 }
10 nAlloc = nAlloc*2 + 10;
11 a = fossil_realloc(a, nAlloc*sizeof(a[0]));
fossil_realloc grows the block, keeping what is already in it. Doubling each time means a chain of any length costs only a handful of grow steps.
12 }
13 /* ... after the down-walk: ... */
14 free(a);
Memory from fossil_malloc is borrowed, not given. free returns it. Skip this and the program leaks a little memory on every read.

▸ Click any underlined term for the C explained.

The wall this clears

This stage does not have a failing example — it is the robustness fix. The “problem” it answers is the one Stage 3 left open: the unbounded chain. With a growing list there is no fixed ceiling, and the fossil_panic guard turns a corrupt, self-referential delta table into a loud, immediate failure instead of an endless loop.

Stage 4 is correct and safe for a chain of any length. What it still does, though, is rebuild the entire chain from the anchor up on every single read — and that is what Stage 5 attacks.

C you just met
  • pointer
  • null pointer
  • malloc
  • realloc
  • sizeof
  • panic
  • string literal

▸ Click a term above for the C explained.

Stage 5Don't redo work — the cache, and recursion's return

Rebuilding the same long chain on every read is wasteful: read rid 177 ten times and Stage 4 walks all five hops ten times over. So Fossil keeps a contentCache — a process-wide store of recently rebuilt artifacts. It plays three parts. A missing set is an early-out: if an artifact is known to be absent, return 0 at once. An in-cache lookup: if the artifact is already cached, copy it out and return. And during the down-walk, an insert of every 8th rebuilt artifact, so a future read of this chain starts partway down instead of at the anchor.

Here is the climax. With a cache in play, the up-walk no longer has to reach a base — it can stop at the first artifact that is either a base or already cached. And the anchor it stopped at is resolved by a single recursive content_get(a[n], pBlob): one self-call that transparently handles both “anchor is a base” and “anchor is a cache hit” with the same line. Recursion returns — but now in a precise, minimal role, not as the whole engine.

{ } Stage 5 — stop early at a cache hit, recurse once teaching scaffold teaching scaffold — not Fossil source
1 while( !bag_find(&contentCache.inCache, nextRid)
Black box — bag_find(&set, rid): returns true if rid is in the set. contentCache: a process-wide cache struct holding recently rebuilt artifacts. The .inCache picks one field out of that struct — a member access.
2 && (nextRid = delta_source_rid(nextRid))>0 ){
The teaching example: (nextRid = delta_source_rid(nextRid))>0 is an assignment used as an expression. In one move it stores the next source and tests it — the loop both advances and checks its stop condition. With ! and &&: keep walking while the artifact is not cached and still has a source.
3 /* ... grow the list ... */
4 }
5 rc = content_get(a[n], pBlob); /* resolve the anchor: base OR cache hit */
Recursion, returning — but minimal now. a[n] is whatever the walk stopped at: a true base, or a cached artifact. One self-call handles both, because content_get itself checks the cache first thing.

▸ Click any underlined term for the C explained.

The cost this clears

Stage 4 left one cost on the table: every read re-walks the whole chain, anchor and all, even for an artifact read seconds ago. The cache is the answer — the missing set skips known-absent artifacts instantly, the in-cache lookup ends the walk early, and the every-8th insert seeds the cache so the next walk is shorter still.

That is the real content_get in outline: a chain walk, a growing list, a panic guard, a cache, and one disciplined recursive call. Act 3 opens the actual function — the same shape, every return value now checked.

C you just met
  • member access
  • assignment as expression
  • recursion, revisited
  • operators (&&, !)

▸ Click a term above for the C explained.

Act 3 · The whole building

Five drafts brought us to the doorstep. Here is the real content_get — verbatim from Fossil 2.28, not a scaffold, not simplified. Every shape you built in Act 2 is in here: the cache early-outs, the chain walk up to an anchor, the one recursive call, the down-walk applying deltas, the every-8th cache insert. The difference is that the real function checks every return value, so a handful of extra lines carry error handling we skipped.

Read it with the annotations first. Then press the button: cold mode hides every note, leaving nothing but the source. If you can follow it unaided, the page has done its job.

{ } The whole function src/content.c · lines 237–317
237int content_get(int rid, Blob *pBlob){
238 int rc;
239 int i;
240 int nextRid;
241
242 assert( g.repositoryOpen );
g: Fossil's process-wide global-state struct; the field g.repositoryOpen is true when a repository database is open. The assert states a precondition: content_get assumes a repository is already open, and fails loudly in a debug build if it is not.
243 blob_zero(pBlob);
Black box — blob_zero(pBlob): initialises a Blob to empty. The caller's pBlob is cleared first thing, so every early return below hands back an empty blob rather than stale bytes.
244 if( rid==0 ) return 0;
245
246 /* Early out if we know the content is not available */
247 if( bag_find(&contentCache.missing, rid) ){
248 return 0;
249 }
First early-out. contentCache.missing is the set of rids already proven absent. If this rid is in it, there is nothing to fetch — return 0 (failure) at once, no database touched.
250
251 /* Look for the artifact in the cache first */
252 if( bag_find(&contentCache.inCache, rid) ){
253 for(i=0; i<contentCache.n; i++){
254 if( contentCache.a[i].rid==rid ){
255 blob_copy(pBlob, &contentCache.a[i].content);
256 contentCache.a[i].age = contentCache.nextAge++;
257 return 1;
258 }
259 }
260 }
Second early-out. If the rid is in the cache, scan the cache array a[] for its slot. Black box — blob_copy(pTo, pFrom): copies one Blob's contents into another — the cached bytes are handed straight to the caller. age = nextAge++ marks this slot as just-used (post-increment, so the slot gets the old value and the counter steps on). Return 1 (success): no chain walk needed.
261
262 nextRid = delta_source_rid(rid);
263 if( nextRid==0 ){
264 rc = content_of_blob(rid, pBlob);
265 }else{
No cache shortcut taken. delta_source_rid answers the one question: 0 means this artifact is a whole base — fetch and unzip it directly. Anything else means it is a delta, and the chain walk begins.
266 int n = 1;
267 int nAlloc = 10;
268 int *a = 0;
269 int mx;
270 Blob delta, next;
271
272 a = fossil_malloc( sizeof(a[0])*nAlloc );
273 a[0] = rid;
274 a[1] = nextRid;
275 n = 1;
n was declared as int n = 1 on line 266, but that initialiser is immediately overwritten here. This is the operative assignment: after a[0] = rid and a[1] = nextRid have been stored, setting n = 1 establishes the loop invariant — n is the index of the last filled slot in a[].
276 while( !bag_find(&contentCache.inCache, nextRid)
277 && (nextRid = delta_source_rid(nextRid))>0 ){
The chain walk. Build a list a[] of rids: this artifact, its source, its source's source… The loop both advances (nextRid = delta_source_rid(nextRid), an assignment used as an expression) and stops — when an artifact is already in cache, or has no source (>0 fails). Either way, a[n] ends up at a resolvable anchor.
278 n++;
279 if( n>=nAlloc ){
280 if( n>db_int(0, "SELECT max(rid) FROM blob") ){
281 fossil_panic("infinite loop in DELTA table");
282 }
The chain-loop guard. A healthy chain can never be longer than the number of blobs in the repository. If the walk has visited more rids than that, the delta table must contain a cycle — fossil_panic aborts the program rather than spin forever or grow the list without bound.
283 nAlloc = nAlloc*2 + 10;
284 a = fossil_realloc(a, nAlloc*sizeof(a[0]));
285 }
286 a[n] = nextRid;
287 }
288 mx = n;
289 rc = content_get(a[n], pBlob);
The one recursive call. a[n] is the anchor the walk stopped at — a true base, or a cached artifact. content_get calls itself to resolve it, and that one self-call handles both cases, because content_get checks the cache first thing. This is the only recursion left; everything below is a plain loop.
290 n--;
291 while( rc && n>=0 ){
292 rc = content_of_blob(a[n], &delta);
293 if( rc ){
The down-walk. Starting from the anchor in pBlob, step back down the list applying each delta in turn. content_of_blob fetches the raw delta bytes into the local delta blob; then blob_delta_apply (line 294) applies that delta program to pBlob, writing the rebuilt artifact into next. rc guards each step, so a failure anywhere stops the loop.
294 if( blob_delta_apply(pBlob, &delta, &next)<0 ){
295 rc = 1;
296 }else{
An unexplained oddity — flagged, not smoothed over. This branch is reached because blob_delta_apply returned a negative number, meaning the delta was malformed and could not be applied. Three things happen here that are worth pausing on. First, rc = 1 is a no-op: the code only enters this branch from inside if( rc ){, so rc is already 1. The assignment changes nothing. Second, delta is not freed here — only the else branch calls blob_reset(&delta), so on a failed delta apply the delta blob leaks. Third, *pBlob = next is skipped, so pBlob retains the previous step’s artifact; yet the loop continues (rc stays truthy, n--), carrying that stale state into the next step. This is the genuine Fossil 2.28 source (src/content.c line 295); the tour reproduces it as written. No commit message or comment explains the intent, so it is presented honestly rather than silently “corrected”.
297 blob_reset(&delta);
Black box — blob_reset(pBlob): frees a Blob's contents, leaving it empty. The raw delta has been consumed by blob_delta_apply, so its memory is released here.
298 if( (mx-n)%8==0 ){
299 content_cache_insert(a[n+1], pBlob);
300 }else{
301 blob_reset(pBlob);
302 }
The every-8th cache insert. (mx-n) is how far down the chain we have walked. Every eighth step, black box — content_cache_insert(rid, pBlob): stores the freshly rebuilt artifact in the shared content cache (it mutates the global cache). Caching every artifact would bloat memory; caching one in eight seeds enough waypoints that future walks stop early. The other seven steps just blob_reset(pBlob) — free the intermediate result before overwriting it.
303 *pBlob = next;
*pBlob = next writes the just-rebuilt artifact back through the pointer, making it the source for the next step down. This line runs only on a successful delta apply — the surprise branch above skips it.
304 }
305 }
306 n--;
307 }
308 free(a);
309 if( !rc ) blob_reset(pBlob);
310 }
311 if( rc==0 ){
312 bag_insert(&contentCache.missing, rid);
313 }else{
314 bag_insert(&contentCache.available, rid);
315 }
Record the outcome. Black box — bag_insert(&set, rid): adds rid to a set. A failed reconstruction joins contentCache.missing so the first early-out skips it next time; a success joins contentCache.available. The function teaches the cache as a side effect of every call.
316 return rc;
317}

▸ Click any underlined term for the C explained.

Leaving content_get, you now know

  • The function has one shape: walk up the delta chain to an anchor, then apply deltas back down — cache early-outs and error checks wrap that core, nothing more.
  • It iterates rather than recurses for the walk — an explicit list a[] instead of nested calls. The single remaining recursive call (content_get(a[n], pBlob)) resolves the anchor, and works for both a true base and a cache hit.
  • It caches every 8th rebuilt artifact: caching all of them would bloat memory, caching none would re-walk the full chain each time — one in eight seeds enough waypoints to keep future walks short.
  • A delta chain that loops back on itself triggers a fossil_panic — detected by the chain growing longer than the repository has blobs, it fails loudly instead of spinning forever.
Back to the district
The Vault