
District 2 of 7 · The content store

The Vault

66,578 artifacts — without storing 66,578 whole files.

You arrive from Bedrock knowing the shape of things: every artifact is one row in the blob table, and each row has a content column. It is natural to picture the file's bytes sitting in that column, waiting. They are not. Open a row and you find something stranger, and far cleverer — the reason a 24-year project fits in a 68-megabyte file.

1. What the content column really holds.

In Bedrock, the blob schema annotated one column as “Compressed content of this record.” Two separate truths hide inside that short phrase, and the Vault is the district that unpacks both.

The first: the bytes are zlib-compressed. A plain SELECT content FROM blob hands you a compressed lump, not readable text — something must inflate it first.
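One detail matters if you try the inflation yourself. The sketch below assumes the stored form is a 4-byte big-endian uncompressed size followed by an ordinary zlib stream; that layout is a reading of Fossil's blob.c, not something stated in this district, and stored_size is a hypothetical helper name.

```c
#include <stddef.h>

/* Assumption: a stored blob starts with the uncompressed size as a
** 4-byte big-endian integer, and the zlib stream begins at byte 4.
** Returns 1 and sets *pSize on success, 0 if the buffer is too short. */
static int stored_size(const unsigned char *raw, size_t nRaw,
                       unsigned long *pSize){
  if( nRaw<4 ) return 0;
  *pSize = ((unsigned long)raw[0]<<24) | ((unsigned long)raw[1]<<16)
         | ((unsigned long)raw[2]<<8)  |  (unsigned long)raw[3];
  return 1;
}
```

Under that assumption, the bytes from raw+4 onward are what a zlib call such as uncompress() would inflate, and the decoded size tells you how large the output buffer must be.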

The second is the surprising one: the column often does not hold the artifact at all. It holds a delta — a description of how to build the artifact out of a different artifact. Fossil's own schema comment, the one we read in Bedrock, said exactly this and we walked straight past it: “it might hold the full text of the record or it might hold a delta.” Time to take it seriously.

2. Most artifacts are deltas.

Bedrock's schema blueprint showed a second table sitting right next to blob: the delta table, just two columns — rid and srcid. Its job is to record a single fact: “blob rid is stored as a delta against blob srcid.” If a blob has a row in delta, its content is a delta; if it does not, its content is the whole (compressed) artifact.

Back in the sqlite3 shell from Bedrock, the scale of it is one query away:

sqlite3 fossil-scm.fossil

Read the two numbers below together; they are the whole point of the Vault:

8.5 GB of artifact content (every version of everything)
38 MB actually on disk
≈222× smaller

Eight and a half gigabytes of logical content — the full text of every file at every revision across 24 years — held in 38 megabytes. Two mechanisms stack to do this: zlib squeezes each stored lump, and delta-encoding means most lumps are tiny descriptions of change rather than whole files. 88% of all artifacts are stored as deltas.

One more wrinkle worth holding onto: a delta's source can itself be a delta. Sources chain. Artifact 66 is a delta against 294; 294 might be a delta against something else again. That chain is the puzzle Street 4 has to solve.
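Chaining is easier to see with a toy model. The sketch below stands in for the delta table with a small array and counts how many hops separate an artifact from its whole base. The row for 294 is hypothetical (the text only says 294 might be a delta), and source_of and chain_length are illustrative names, not Fossil functions.

```c
#include <stddef.h>

/* Toy stand-in for the delta table: "rid is stored as a delta against srcid". */
typedef struct { int rid; int srcid; } DeltaRow;

static const DeltaRow kDeltaTable[] = {
  { 66,  294  },  /* from the text: artifact 66 is a delta against 294 */
  { 294, 1201 },  /* hypothetical row, invented for this illustration */
};
static const size_t kDeltaRows = sizeof(kDeltaTable)/sizeof(kDeltaTable[0]);

/* Return the source rid, or 0 if rid has no row (it is stored whole). */
static int source_of(int rid){
  for(size_t i=0; i<kDeltaRows; i++){
    if( kDeltaTable[i].rid==rid ) return kDeltaTable[i].srcid;
  }
  return 0;
}

/* Count the hops from rid to its whole base; -1 if the table loops. */
static int chain_length(int rid){
  int hops = 0;
  while( (rid = source_of(rid))!=0 ){
    hops++;
    if( hops>(int)kDeltaRows ) return -1;  /* more hops than rows: a cycle */
  }
  return hops;
}
```

With this toy table, chain_length(66) is 2 and chain_length(1201) is 0, because 1201 has no row and is therefore a whole base.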

3. A delta is a little program.

“Delta” can sound vague — some fuzzy notion of a difference. It is not vague at all. A Fossil delta is a precise, tiny program: a sequence of instructions for building the target artifact out of a source artifact. Per Fossil's own delta_format.wiki, it has three parts — a header (the size of the target), a segment-list (the instructions), and a trailer (a checksum to verify the result).

The segment-list uses just two kinds of instruction:

  • Copy — “copy N bytes starting at offset in the source.” This is where the saving comes from: a long shared run becomes a few-byte instruction.
  • Insert — “insert these N brand-new literal bytes.” Only the genuinely new material is spelled out in full.

Here is a delta in miniature. The source is one short string; the delta is four instructions; the target is what running them produces.

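Source (an artifact Fossil already has):
the quick brown fox

The delta (a four-instruction program):
  1. Copy 10 bytes from offset 0 of the source: "the quick "
  2. Insert 3 new bytes: "red"
  3. Copy 4 bytes from offset 15 of the source: " fox"
  4. Insert 6 new bytes: " jumps"

Target (the artifact the delta reconstructs):
the quick red fox jumps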
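To make "program" concrete, here is a minimal sketch of an interpreter for the two instruction kinds, run against the miniature example. It assumes one particular split of the target into four instructions (copy 10 bytes at offset 0, insert "red", copy 4 bytes at offset 15, insert " jumps"); DeltaOp, apply_ops and kFoxDelta are hypothetical names, and this in-memory struct is not Fossil's on-disk encoding.

```c
#include <stddef.h>
#include <string.h>

/* One delta instruction: copy a range of the source, or insert literal bytes. */
typedef enum { OP_COPY, OP_INSERT } OpKind;

typedef struct {
  OpKind kind;
  size_t len;        /* number of bytes this instruction contributes */
  size_t offset;     /* OP_COPY only: where to start reading in the source */
  const char *text;  /* OP_INSERT only: the literal bytes to add */
} DeltaOp;

/* Run the instructions in order, appending to out. Returns the number of
** bytes written, or -1 if an instruction would read past the source or
** overflow the output buffer. */
static long apply_ops(const char *src, size_t nSrc,
                      const DeltaOp *ops, size_t nOps,
                      char *out, size_t nOut){
  size_t used = 0;
  for(size_t i=0; i<nOps; i++){
    const DeltaOp *op = &ops[i];
    if( op->len > nOut - used ) return -1;
    if( op->kind==OP_COPY ){
      if( op->offset > nSrc || op->len > nSrc - op->offset ) return -1;
      memcpy(out+used, src+op->offset, op->len);
    }else{
      memcpy(out+used, op->text, op->len);
    }
    used += op->len;
  }
  return (long)used;
}

/* The four instructions for "the quick brown fox" -> "the quick red fox jumps". */
static const DeltaOp kFoxDelta[] = {
  { OP_COPY,   10, 0,  NULL     },  /* "the quick " */
  { OP_INSERT, 3,  0,  "red"    },
  { OP_COPY,   4,  15, NULL     },  /* " fox" */
  { OP_INSERT, 6,  0,  " jumps" },
};
```

Only 9 of the 23 target bytes appear literally in kFoxDelta; the other 14 are borrowed from the source by the two copies.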

Notice what the delta never contains: the words “the quick” and “fox.” They already exist in the source, so the delta just points at them. For a one-line edit to a large file, the delta is two copy instructions (the unchanged bulk, before and after) and one small insert (the new line) — a few dozen bytes to encode a change in a hundred-kilobyte file. Multiply across 58,394 deltas and you get the 222× from Street 2.

On disk the instructions are written compactly: a copy as length@offset, (the trailing comma is part of the syntax), an insert as length:bytes, the whole thing topped with the target size and tailed with the checksum. The shape, though, is exactly the four rows above.
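The compact form can be decoded with very little code. The sketch below reads the header and segment-list of a hand-written delta for the fox example. It assumes, following delta_format.wiki, that integers are written in Fossil's base-64 digit alphabet and that the header size is followed by a newline. It deliberately stops before the trailer, so the checksum is neither present nor verified, and it treats inserted text as a C string, which real binary deltas would not allow. apply_delta_text and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Value of one digit in the base-64 alphabet, or -1 if c is not a digit. */
static int digit_value(char c){
  static const char zDigits[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~";
  const char *p = c ? strchr(zDigits, c) : NULL;
  return p ? (int)(p - zDigits) : -1;
}

/* Read a base-64 integer at *pz and advance *pz past it. */
static size_t read_int(const char **pz){
  size_t v = 0;
  int d;
  while( (d = digit_value(**pz))>=0 ){
    v = v*64 + (size_t)d;
    (*pz)++;
  }
  return v;
}

/* Apply the header and segment-list of a delta to src.
** Returns the number of bytes written to out, or -1 on malformed input. */
static long apply_delta_text(const char *src, size_t nSrc,
                             const char *delta, char *out, size_t nOut){
  const char *z = delta;
  size_t target = read_int(&z);           /* header: size of the target */
  size_t used = 0;
  if( *z!='\n' || target>nOut ) return -1;
  z++;
  while( used<target ){
    size_t len = read_int(&z);
    if( len > target - used ) return -1;
    if( *z=='@' ){                        /* copy: length@offset, */
      size_t off;
      z++;
      off = read_int(&z);
      if( *z!=',' || off>nSrc || len>nSrc-off ) return -1;
      z++;
      memcpy(out+used, src+off, len);
    }else if( *z==':' ){                  /* insert: length:bytes */
      z++;
      if( strlen(z)<len ) return -1;
      memcpy(out+used, z, len);
      z += len;
    }else{
      return -1;
    }
    used += len;
  }
  return (long)used;
}
```

For the fox example the delta text is "N\nA@0,3:red4@F,6: jumps", where N is 23 (the target size), A is 10 and F is 15 in that alphabet.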

4. Reconstruction.

A delta is only useful if something can run it. When any part of Fossil asks for the real content of an artifact, the function that answers is content_get in content.c. Conceptually it faces one question and handles two cases.

Is this artifact a delta? If not, the job is easy — fetch the row, unzip it, done. The small helper for that step is worth meeting first:

Fetching and uncompressing one blob (src/content.c, lines 218–230)

    static int content_of_blob(int rid, Blob *pBlob){
      static Stmt q;
      int rc = 0;
      db_static_prepare(&q, "SELECT content FROM blob WHERE rid=:rid AND size>=0");
      db_bind_int(&q, ":rid", rid);
      if( db_step(&q)==SQLITE_ROW ){
        db_ephemeral_blob(&q, 0, pBlob);
        blob_uncompress(pBlob, pBlob);
        rc = 1;
      }
      db_reset(&q);
      return rc;
    }

The heart of it is the blob_uncompress call: the row's content is zlib-compressed, so it is inflated before anyone sees it. Note the SQL filter size>=0, which quietly skips phantoms (Bedrock's size = -1 rows).


But if the artifact is a delta? Then its source might be a delta too, and its source as well. So content_get first walks up the chain — delta → source → source — until it reaches a real, whole base artifact. It reconstructs that base, then applies the deltas back down the chain, each one rebuilding the next, until the artifact you actually asked for emerges.

Walking the delta chain (src/content.c, lines 262–310, inside content_get())

    nextRid = delta_source_rid(rid);
    if( nextRid==0 ){
      rc = content_of_blob(rid, pBlob);
    }else{

delta_source_rid answers the one question: it returns the source rid, or 0 if this artifact is a whole base. Zero → the easy path above. Otherwise, the chain walk begins.

      int n = 1;
      int nAlloc = 10;
      int *a = 0;
      int mx;
      Blob delta, next;

      a = fossil_malloc( sizeof(a[0])*nAlloc );
      a[0] = rid;
      a[1] = nextRid;
      n = 1;

Build a list a[] of rids: this artifact, then its source, then its source's source, and so on. The list is grown by hand as it fills.

      while( !bag_find(&contentCache.inCache, nextRid)
          && (nextRid = delta_source_rid(nextRid))>0 ){
        n++;
        if( n>=nAlloc ){
          if( n>db_int(0, "SELECT max(rid) FROM blob") ){
            fossil_panic("infinite loop in DELTA table");
          }
          nAlloc = nAlloc*2 + 10;
          a = fossil_realloc(a, nAlloc*sizeof(a[0]));
        }
        a[n] = nextRid;
      }

Follow each source upward until one is a whole base (or is already cached). The fossil_panic is a guard: if the chain somehow loops on itself, fail loudly rather than spin forever.

      mx = n;
      rc = content_get(a[n], pBlob);

a[n] is the base. content_get calls itself to reconstruct it; a function calling itself is called recursion.

      n--;
      while( rc && n>=0 ){
        rc = content_of_blob(a[n], &delta);
        if( rc ){
          if( blob_delta_apply(pBlob, &delta, &next)<0 ){
            rc = 1;
          }else{
            blob_reset(&delta);
            if( (mx-n)%8==0 ){
              content_cache_insert(a[n+1], pBlob);
            }else{
              blob_reset(pBlob);
            }
            *pBlob = next;
          }
        }
        n--;
      }

Now apply deltas back down the chain: blob_delta_apply runs each delta-program against what we have so far, producing the next artifact. Every 8th step is cached so future lookups start closer.

      free(a);
      if( !rc ) blob_reset(pBlob);
    }
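The caching cadence in that last loop is easy to misread, so here is a small sketch that replays only its index arithmetic. It assumes every delta applies cleanly and ignores anything already in the cache; cached_indices is a hypothetical helper, not part of Fossil.

```c
#include <stddef.h>

/* Given a chain whose base sits at index mx of a[], collect the indices
** of a[] whose content the replay loop would hand to the cache.
** Returns how many indices were written to out (at most nOut). */
static size_t cached_indices(int mx, int *out, size_t nOut){
  size_t count = 0;
  for(int n = mx-1; n>=0; n--){
    /* Same test as the excerpt: when delta a[n] has just been applied,
    ** the content in hand belongs to a[n+1], and it is cached when
    ** (mx-n) is a multiple of 8. */
    if( (mx-n)%8==0 && count<nOut ){
      out[count++] = n+1;
    }
  }
  return count;
}
```

For a chain with its base at index 20, this yields indices 13 and 5, that is, the artifacts 7 and 15 hops away from the base. A chain shorter than 8 deltas caches nothing along the way.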


That is the Vault's machinery end to end: store almost everything as a delta against something else; to read an artifact back, walk its chain to a base and replay the deltas. The cost is a little work on every read; the prize is the 222×.

Leaving the Vault, you now know

  • A blob row's content is always zlib-compressed, and usually a delta rather than the whole artifact — 88% of them.
  • The delta table records which blob is a delta and against which source; sources chain.
  • A delta is a small program — copy ranges from a source, insert literal bytes — and that is where the 222× compression comes from.
  • content_get reconstructs any artifact: walk the delta chain up to a base, then apply deltas back down.

Next district: The Chronicle

The Vault holds artifacts as a flat pool of immutable, hash-named content. District 3 asks the next question: how do those artifacts point at each other to form a history — check-ins, parents, branches?