Bedrock — Fossil, A Guided Tour

Welcome to bedrock — the lowest, oldest layer of the city, the one every other district is built on. There is only one idea down here, but it carries the entire weight of Fossil above it: a Fossil repository is not kept in a database. It is a database. Get this single fact solid and the other six districts stop being mysterious.

1. It is just a database. Watch.

Claims are cheap, so let us not make one — let us look. Below is a real terminal session against a real repository: fossil-scm.fossil, the 68-megabyte clone of Fossil's own source code made for this tour. Step through it one command at a time.

[Terminal session in ~/dev/my/fossil-scm: file and sqlite3 run against fossil-scm.fossil]

Look at what was actually typed. file and sqlite3 are generic Unix tools — they know nothing whatsoever about version control. They worked because the file genuinely is an SQLite database. Fossil even stamps every database it creates with an SQLite application id (the integer 252006673), a small marker in the file header — which is why file can name it so confidently as a “Fossil repository” rather than just “some SQLite database.”
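To make that stamp concrete, here is a minimal Python sketch; it is not taken from Fossil's source. It relies on the documented SQLite file layout: a 100-byte header in which the application id is a 4-byte big-endian integer at offset 68. Since we cannot assume fossil-scm.fossil is on your disk, the sketch stamps a throwaway database with the same integer. The file name stand-in.fossil and all function names are invented for illustration.

```python
import os
import sqlite3
import struct
import tempfile

FOSSIL_APPLICATION_ID = 252006673  # the integer quoted in the text above


def read_application_id(path):
    """Return the application id stored in an SQLite file's header."""
    with open(path, "rb") as f:
        header = f.read(100)  # the SQLite header is the first 100 bytes
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database")
    # Bytes 68..71 hold the application id as a big-endian 32-bit integer.
    (app_id,) = struct.unpack(">I", header[68:72])
    return app_id


def make_stamped_database(path, app_id):
    """Create a tiny database and stamp it, the way a tool would mark its own files."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA application_id = {int(app_id)}")
    conn.execute("CREATE TABLE demo(x)")  # hypothetical table, only to force a write
    conn.commit()
    conn.close()


def demo():
    """Stamp a stand-in file, then read the stamp back with plain file I/O."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stand-in.fossil")  # illustrative name
        make_stamped_database(path, FOSSIL_APPLICATION_ID)
        return read_application_id(path)
```

Pointing read_application_id at a real repository file should, if the text above is right, return the same integer. The function never opens the file through SQLite at all, which is the point: the marker is just four bytes in a header.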

That .tables command listed 28 names: 27 ordinary tables, plus one view named artifact (a view is a saved query that behaves like a table). No magic file format, no proprietary container. If you can read SQLite — and after this tour you will — you can read a Fossil repository with nothing but the standard sqlite3 shell.
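What .tables does can be approximated from any language that speaks SQLite. The sketch below is a rough Python equivalent: it asks SQLite's own catalog table, sqlite_master, for every table and view. The three-object toy schema is an assumption made so the example runs anywhere; in particular, the toy artifact view is far simpler than whatever the real one selects.

```python
import sqlite3


def list_schema_objects(conn):
    """Return (tables, views) by querying SQLite's built-in catalog."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    tables = [name for kind, name in rows if kind == "table"]
    views = [name for kind, name in rows if kind == "view"]
    return tables, views


def build_toy_repository():
    """A stand-in with two tables and one view; NOT Fossil's real schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL);
        CREATE TABLE delta(rid INTEGER PRIMARY KEY, srcid INTEGER NOT NULL);
        -- A view is a saved query that can be read like a table.
        CREATE VIEW artifact AS SELECT rid, uuid AS hash FROM blob;
        """
    )
    return conn
```

Run against sqlite3.connect("fossil-scm.fossil") instead of the toy, the same function should report the 27 tables and the one view counted above.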

C aside · source files

Throughout this tour we will name C source files — db.c, schema.c, and the rest. A .c file is simply a unit of C source text, the way a .swift file is a unit of Swift. Fossil's engine is about 150 of them in one flat directory. The one that handles all database access is src/db.c — 5,732 lines, and a place we will return to often.

2. Actually, three databases.

One precise correction before we go deeper, because it saves endless confusion later. When someone says “the Fossil database” they could mean one of three different SQLite files. db.c opens by saying exactly this, in its own words:

The three databases, in Fossil's own words · src/db.c · lines 18–31

The tail of db.c's opening comment block — plain English by Fossil's author, and the authoritative source for the “three databases” fact.

18  ** Code for interfacing to the various databases.
19  **
20  ** There are three separate database files that fossil interacts
21  ** with:
22  **
23  **    (1)  The "configdb" database in ~/.fossil or ~/.config/fossil.db
24  **         or in %LOCALAPPDATA%/_fossil
25  **
26  **    (2)  The "repository" database
27  **
28  **    (3)  A local check-out database named "_FOSSIL_" or ".fslckout"
29  **         and located at the root of the local copy of the source tree.
30  **
31  */


configdb

~/.fossil

Scope: You. One per user account.
Holds: Global preferences and the list of repositories you have used — your settings, not any project's data.
One table: global_config

repository

anything.fossil

Scope: One project. The star of this district.
Holds: Every artifact the project has ever contained — all history, all branches, the wiki, the tickets. The whole project, in one file.
Tables: 27 of them. We start mapping them below.

checkout

_FOSSIL_ or .fslckout

Scope: One working tree — one folder of files you actually edit.
Holds: The state of this checkout: which check-in it is on, which files changed. It even stores the path back to its repository.
Tables: vvar, vfile, vmerge

The distinction that matters most: the repository is the project; the checkout is one working copy of it. One repository can have many checkouts (or none). The checkout database is small — it is just a notebook saying “I am a working copy of that repository, currently at this check-in.” The repository database is the real thing. We will live in the repository for most of this tour and visit the checkout properly in the Workshop, District 6.
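The "notebook" idea can be sketched in a few lines of Python. Treat every detail here as an assumption: the text above only tells us that a table called vvar exists and that the checkout records its repository path and current check-in. The name/value layout and the key names repository and checkout are guesses made for illustration, and the path is invented.

```python
import sqlite3


def build_toy_checkout(repo_path, checkin_rid):
    """Create an in-memory stand-in for a checkout database (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    # Assumed shape: a simple name/value notebook.
    conn.execute("CREATE TABLE vvar(name TEXT PRIMARY KEY NOT NULL, value TEXT)")
    conn.executemany(
        "INSERT INTO vvar(name, value) VALUES (?, ?)",
        [
            ("repository", repo_path),       # path back to the repository file
            ("checkout", str(checkin_rid)),  # which check-in this tree is on
        ],
    )
    return conn


def checkout_summary(conn):
    """Read the notebook: which repository, and which check-in?"""
    settings = dict(conn.execute("SELECT name, value FROM vvar"))
    return settings["repository"], int(settings["checkout"])
```

Two such notebooks can name the same repository path while sitting on different check-ins, which is all "one repository, many checkouts" means at the storage level.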

3. The foundation stone: the `blob` table.

Of the repository's 27 tables, one is the foundation that the rest merely annotate. It is called blob, and the rule is absolute: every artifact in a Fossil project is one row in the blob table. Every version of every file, every check-in, every wiki page, every forum post — a row here. Nothing else.

Here is its real schema, straight from the repository.

CREATE TABLE blob(
  rid      INTEGER PRIMARY KEY,
  rcvid    INTEGER,
  size     INTEGER,
  uuid     TEXT UNIQUE NOT NULL,
  content  BLOB,
  CHECK( length(uuid)>=40 AND rid>0 )
);

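The last line of that schema, the CHECK, is easy to skim past, so here is a small Python sketch that exercises it. The CREATE TABLE text is copied from above; everything else (the helper name, the placeholder rcvid of 1, storing the content uncompressed) is a simplification for illustration and not how Fossil itself inserts rows.

```python
import hashlib
import sqlite3

BLOB_SCHEMA = """
CREATE TABLE blob(
  rid      INTEGER PRIMARY KEY,
  rcvid    INTEGER,
  size     INTEGER,
  uuid     TEXT UNIQUE NOT NULL,
  content  BLOB,
  CHECK( length(uuid)>=40 AND rid>0 )
);
"""


def try_insert(conn, uuid, content=b""):
    """Insert one row; return the new rid, or None if a constraint refuses it."""
    try:
        cur = conn.execute(
            "INSERT INTO blob(rcvid, size, uuid, content) VALUES (1, ?, ?, ?)",
            (len(content), uuid, content),
        )
        return cur.lastrowid  # SQLite assigned this rid
    except sqlite3.IntegrityError:
        return None  # CHECK or UNIQUE said no


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(BLOB_SCHEMA)
    content = b"hello, bedrock\n"
    sha1_name = hashlib.sha1(content).hexdigest()  # 40 hex characters
    return {
        "valid hash": try_insert(conn, sha1_name, content),
        "too short": try_insert(conn, "abc123", content),
        "duplicate": try_insert(conn, sha1_name, content),
    }
```

Only the first insert should succeed: the short name fails the CHECK, and the repeated hash fails UNIQUE, so the same content can never occupy two rows.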

The schema above shows that table as the running database reports it. Here is where the table is declared, and the first surprise is that it does not live in a .sql file at all.

Where the blob table is declared · src/schema.c · lines 67–87

67  const char zRepositorySchema1[] =

One C variable, zRepositorySchema1, holds the repository's base schema as text. Every line below — each tagged @ — is part of its contents.

68  @ -- The BLOB and DELTA tables contain all records held in the repository.
69  @ --
70  @ -- The BLOB.CONTENT column is always compressed using zlib.  This
71  @ -- column might hold the full text of the record or it might hold
72  @ -- a delta that is able to reconstruct the record from some other
73  @ -- record.  If BLOB.CONTENT holds a delta, then a DELTA table entry
74  @ -- will exist for the record and that entry will point to another
75  @ -- entry that holds the source of the delta.  Deltas can be chained.
76  @ --
77  @ -- The blob and delta tables collectively hold the "global state" of
78  @ -- a Fossil repository.
79  @ --

The comment ends; real SQL begins. Lines 80–87 are the exact statement that creates the blob table you explored just above.

80  @ CREATE TABLE blob(
81  @   rid INTEGER PRIMARY KEY,        -- Record ID
82  @   rcvid INTEGER,                  -- Origin of this record
83  @   size INTEGER,                   -- Size of content. -1 for a phantom.
84  @   uuid TEXT UNIQUE NOT NULL,      -- hash of the content
85  @   content BLOB,                   -- Compressed content of this record
86  @   CHECK( length(uuid)>=40 AND rid>0 )
87  @ );


Two names for one thing: `rid` and `uuid`

Notice that a blob row carries two quite different identifiers. This pairing is not an accident, and it recurs in nearly every table upstream — so it is worth meeting head-on now.

local · fast

rid

A plain integer — the table's PRIMARY KEY.
Assigned by this repository, in the order content arrived.
Fast to index and join on, so Fossil's own SQL uses it everywhere.
Meaningless anywhere else: another clone numbers the same artifact differently.

global · stable

uuid

A cryptographic hash of the content itself.
40 hex characters for SHA-1, 64 for SHA-3-256 — the CHECK allows either.
Identical in every clone of the project, forever.
The name a human pastes into a URL, a command, an email.

Two names for the same artifact, each good at what the other is bad at. The integer rid is for the machine; the hash uuid is for the world. Translating between them is one of the most common things Fossil's code does — watch for it.
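Here is that translation in miniature, as a Python sketch under stated assumptions: a stripped-down blob table, content stored uncompressed, and SHA-3-256 standing in as the hash. The function names are invented; Fossil's own C code does this differently. The two toy "clones" receive the same content in opposite order, which is enough to show why rid is local and uuid is global.

```python
import hashlib
import sqlite3


def open_toy_repository():
    """A stripped-down blob table; a stand-in, not Fossil's full schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blob(rid INTEGER PRIMARY KEY, "
        "uuid TEXT UNIQUE NOT NULL, content BLOB)"
    )
    return conn


def uuid_to_rid(conn, uuid):
    """World name -> machine name (None if this clone lacks the artifact)."""
    row = conn.execute("SELECT rid FROM blob WHERE uuid = ?", (uuid,)).fetchone()
    return row[0] if row else None


def rid_to_uuid(conn, rid):
    """Machine name -> world name."""
    row = conn.execute("SELECT uuid FROM blob WHERE rid = ?", (rid,)).fetchone()
    return row[0] if row else None


def store(conn, content):
    """Store content once; return its rid in THIS clone."""
    uuid = hashlib.sha3_256(content).hexdigest()  # 64 hex characters
    # OR IGNORE: content already present keeps its existing row and rid.
    conn.execute(
        "INSERT OR IGNORE INTO blob(uuid, content) VALUES (?, ?)", (uuid, content)
    )
    return uuid_to_rid(conn, uuid)


def demo():
    clone_a, clone_b = open_toy_repository(), open_toy_repository()
    a_rid = store(clone_a, b"first\n")
    store(clone_a, b"second\n")
    store(clone_b, b"second\n")          # arrives in the opposite order
    b_rid = store(clone_b, b"first\n")
    same_hash = rid_to_uuid(clone_a, a_rid) == rid_to_uuid(clone_b, b_rid)
    return a_rid, b_rid, same_hash
```

In this toy, the same artifact ends up with a different rid in each clone while its uuid matches, which is why anything that leaves the repository (a URL, an email, a sync message) has to speak in hashes.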

4. Source of truth, and derived cache.

The 27 tables are not a flat pile of equals. They fall into layers, and one distinction outranks all the others. schema.c — the file that defines the schema — states it plainly: blob and delta together hold “the global state of a Fossil repository.” Everything else is, in principle, derived.

The source of truth · global state

Two tables hold every artifact the project has ever contained. This is the part that travels when you sync. Lose everything else and you have lost nothing — it can all be recomputed from here.

blob delta

Local & administrative state · this clone

Users, settings, the log of what content was received, shunned content — the administrative state of this one copy of the repository, distinct from the project's shared history.

rcvfrom user config shun private reportfmt concealed unversioned

The derived index · rebuildable

The history graph's parent/child links, the timeline, file names, tags, tickets, forum posts. None of it is original information: every row can be recomputed from the artifacts in blob. schema.c even keeps these in a separate string — Schema2 — precisely because the fossil rebuild command throws them away and regenerates them.

filename mlink plink leaf event phantom orphan unclustered unsent tag tagxref backlink attachment ticket ticketchng cherrypick forumpost

This is one of the most important ideas in the whole city, so hold onto it: only blob and delta are irreplaceable. Every other table is a cache — an index built for speed, reconstructible at any time. It is why fossil rebuild exists and why it is safe to run. We will see the derived tables being built when we reach the Chronicle, District 3.
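The principle is easier to trust once you have watched it work, so here is a toy model in Python. It is emphatically not the algorithm behind fossil rebuild: the artifact format (a first line reading "name ...") and the name_index table are invented for this sketch. What it does share with the real thing is the shape of the idea: the index holds nothing that cannot be recomputed from blob.

```python
import sqlite3


def open_toy_repository(artifacts):
    """The 'source of truth': one row per artifact, nothing else."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO blob(content) VALUES (?)", [(a,) for a in artifacts]
    )
    return conn


def rebuild_index(conn):
    """Throw the derived table away and recompute it from blob alone."""
    conn.execute("DROP TABLE IF EXISTS name_index")
    conn.execute("CREATE TABLE name_index(name TEXT, rid INTEGER)")
    for rid, content in conn.execute("SELECT rid, content FROM blob").fetchall():
        first_line = content.split("\n", 1)[0]
        if first_line.startswith("name "):  # hypothetical toy artifact format
            conn.execute(
                "INSERT INTO name_index(name, rid) VALUES (?, ?)",
                (first_line[len("name "):], rid),
            )


def snapshot(conn):
    """The index contents in a stable order, for comparison."""
    return conn.execute(
        "SELECT name, rid FROM name_index ORDER BY name, rid"
    ).fetchall()


def demo():
    conn = open_toy_repository(
        ["name README\nhello", "name main.c\nint main(void)", "no name here"]
    )
    rebuild_index(conn)
    before = snapshot(conn)
    conn.execute("DROP TABLE name_index")  # lose the cache entirely
    rebuild_index(conn)                    # and get it all back
    return before, snapshot(conn)
```

Both snapshots come out identical. Dropping blob instead would be unrecoverable, and that asymmetry is the whole distinction between the two layers.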

C aside · the schema is baked in

Where does the schema actually live? Not in a separate .sql file shipped alongside Fossil — it lives inside the program. src/schema.c declares the schema as C string constants: ordinary text, compiled straight into the fossil executable as data. In Swift you might write let schema = """ … """; C does the same with const char zRepositorySchema1[] = "…". At runtime Fossil simply hands that text to SQLite to create the tables. The blueprint for the database travels inside the tool that builds it.
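The same pattern, sketched in Python for readers who want to try it: the schema is an ordinary string constant in the program, handed to SQLite at runtime. The constant and function names are invented, and the string holds only the blob table quoted earlier, not the full contents of zRepositorySchema1.

```python
import sqlite3

# The blueprint travels inside the program as plain text.
REPOSITORY_SCHEMA_1 = """
CREATE TABLE blob(
  rid      INTEGER PRIMARY KEY,
  rcvid    INTEGER,
  size     INTEGER,
  uuid     TEXT UNIQUE NOT NULL,
  content  BLOB,
  CHECK( length(uuid)>=40 AND rid>0 )
);
"""


def create_repository(path=":memory:"):
    """Hand the embedded schema text to SQLite to create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(REPOSITORY_SCHEMA_1)
    return conn


def stored_sql(conn, table):
    """Ask SQLite for the CREATE statement it recorded for a table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row[0] if row else None
```

SQLite keeps the CREATE text it was given, so stored_sql(create_repository(), "blob") returns essentially the same statement back. That is why the schema a running database reports matches what the source file declares.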

Standing on bedrock, you now know

A repository is literally an SQLite database: openable with the plain sqlite3 shell, and stamped with an application id so that even file recognises it.
There are three databases, not one: configdb (your settings), repository (the project), checkout (one working copy).
Every artifact is one row in the blob table, carrying two identities: the local integer rid and the global hash uuid.
Only blob + delta are the source of truth. Every other table is a derived index that fossil rebuild can regenerate.

Next district: The Vault

In the Vault we open a blob row and find… not the file. We find a delta. District 2 is about how Fossil stores 66,578 artifacts without storing 66,578 whole files.