Welcome to bedrock — the lowest, oldest layer of the city, the one every other district is built on. There is only one idea down here, but it carries the entire weight of Fossil above it: a Fossil repository is not kept in a database. It is a database. Get this single fact solid and the other six districts stop being mysterious.
1. It is just a database. Watch.
Claims are cheap, so let us not make one — let us look. Below
is a real terminal session against a real repository:
fossil-scm.fossil, the 68-megabyte clone of Fossil's
own source code made for this tour.
Look at what was actually typed. file and
sqlite3 are generic Unix tools — they know
nothing whatsoever about version control. They worked because the
file genuinely is an SQLite database. Fossil even stamps
every database it creates with an SQLite application id
(the integer 252006673), a small marker in the file
header — which is why file can name it so
confidently as a “Fossil repository” rather than just
“some SQLite database.”
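The header check described above can be approximated in a few lines of C. This is a minimal sketch that relies only on the documented SQLite file-header layout: a 16-byte magic string ("SQLite format 3" plus a NUL) at offset 0, and the application id stored as a 4-byte big-endian integer at offset 68. The helper names are hypothetical, not Fossil's or `file`'s, and the sketch compares only against the repository id quoted above (252006673, which is 0x0F055111 in hex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Application id quoted in the text: 252006673 == 0x0F055111. */
#define FOSSIL_REPO_APPLICATION_ID 252006673u

/* The fixed SQLite database header is 100 bytes long. */
#define SQLITE_HEADER_SIZE 100

/* Hypothetical helper: does this buffer begin with the SQLite magic string?
   "SQLite format 3" is 15 characters; the array adds the terminating NUL. */
static int is_sqlite_header(const unsigned char *hdr, size_t len) {
    static const char magic[16] = "SQLite format 3";
    return len >= SQLITE_HEADER_SIZE && memcmp(hdr, magic, sizeof magic) == 0;
}

/* Hypothetical helper: decode the big-endian application id at offset 68. */
static uint32_t header_application_id(const unsigned char *hdr) {
    assert(hdr != NULL);
    return ((uint32_t)hdr[68] << 24) | ((uint32_t)hdr[69] << 16) |
           ((uint32_t)hdr[70] << 8)  |  (uint32_t)hdr[71];
}

/* True when the header is SQLite AND carries the repository id above. */
static int has_fossil_repo_id(const unsigned char *hdr, size_t len) {
    return is_sqlite_header(hdr, len) &&
           header_application_id(hdr) == FOSSIL_REPO_APPLICATION_ID;
}
```

In practice you would read the first 100 bytes of a file such as fossil-scm.fossil into a buffer and pass it in; nothing beyond the header is needed.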
That .tables command listed 28 names: 27 ordinary
tables, plus one view named artifact (a view is
a saved query that behaves like a table). No magic file format, no
proprietary container. If you can read SQLite — and after this
tour you will — you can read a Fossil repository with nothing
but the standard sqlite3 shell.
Throughout this tour we will name C source files —
db.c, schema.c, and the rest. A
.c file is simply a unit of C source text, the way a
.swift file is a unit of Swift. Fossil's engine is made up of
about 150 of them, all in one flat directory. The one that handles all
database access is src/db.c — 5,732 lines, and
a place we will return to often.
2. Actually, three databases.
One precise correction before we go deeper, because it saves
endless confusion later. When someone says “the Fossil
database” they could mean one of three different
SQLite files. db.c opens by saying exactly this, in its
own words:
The three databases, in Fossil's own words (src/db.c, lines 18–31). db.c's opening comment block: plain English by Fossil's author, and the authoritative source for the “three databases” fact.
configdb (~/.fossil)
- Scope: You. One per user account.
- Holds: Global preferences and the list of repositories you have used; your settings, not any project's data.
- One table: global_config

repository (anything.fossil)
- Scope: One project. The star of this district.
- Holds: Every artifact the project has ever contained: all history, all branches, the wiki, the tickets. The whole project, in one file.
- Tables: 27 of them. We start mapping them below.

checkout (_FOSSIL_ or .fslckout)
- Scope: One working tree, the one folder of files you actually edit.
- Holds: The state of this checkout: which check-in it is on, which files changed. It even stores the path back to its repository.
- Tables: vvar, vfile, vmerge
The distinction that matters most: the repository is the project; the checkout is one working copy of it. One repository can have many checkouts (or none). The checkout database is small — it is just a notebook saying “I am a working copy of that repository, currently at this check-in.” The repository database is the real thing. We will live in the repository for most of this tour and visit the checkout properly in the Workshop, District 6.
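The conventional file names in the list above can be captured in a small C sketch. It is purely illustrative: `FossilDbKind` and `classify_by_name` are hypothetical names, and the sketch encodes nothing except the names listed above. Fossil does not identify its databases by guessing from file names like this; a repository in particular can be called anything.

```c
#include <assert.h>
#include <string.h>

/* The three kinds of Fossil database described above. */
typedef enum { DB_UNKNOWN, DB_CONFIG, DB_REPOSITORY, DB_CHECKOUT } FossilDbKind;

/* Return the last path component: "a/b/c.fossil" -> "c.fossil". */
static const char *base_name(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical classifier based only on the naming conventions above. */
static FossilDbKind classify_by_name(const char *path) {
    const char *base;
    size_t n;
    assert(path != NULL);
    base = base_name(path);
    n = strlen(base);
    if (strcmp(base, "_FOSSIL_") == 0 || strcmp(base, ".fslckout") == 0)
        return DB_CHECKOUT;                 /* one working tree */
    if (strcmp(base, ".fossil") == 0)
        return DB_CONFIG;                   /* ~/.fossil: per-user settings */
    if (n > 7 && strcmp(base + n - 7, ".fossil") == 0)
        return DB_REPOSITORY;               /* anything.fossil: one project */
    return DB_UNKNOWN;
}
```

Note the ordering: the exact name ".fossil" must be tested before the ".fossil" suffix, or the per-user configdb would be misread as a repository with an empty name.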
3. The foundation stone: the blob table.
Of the repository's 27 tables, one is the foundation that the rest
merely annotate. It is called blob, and the rule is
absolute: every artifact in a Fossil project is one row in
the blob table. Every version of every file,
every check-in, every wiki page, every forum post — a row
here. Nothing else.
Here is its real schema, straight from the repository:
CREATE TABLE blob(
  rid INTEGER PRIMARY KEY,
  rcvid INTEGER,
  size INTEGER,
  uuid TEXT UNIQUE NOT NULL,
  content BLOB,
  CHECK( length(uuid)>=40 AND rid>0 )
);
The schema above is that table as the running database
reports it. Here is where the table is declared, and
the first surprise is that it does not live in a .sql
file at all.
Where the blob table is declared (src/schema.c, lines 67–87). A C string constant, zRepositorySchema1, holds the repository's base schema as text. Every line tagged @ is part of its contents, including the declaration of the blob table explored above.
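To make the table's constraints concrete, here is a sketch of one blob row as a C struct, with a function that applies the same rules as the schema's NOT NULL and CHECK clauses. `BlobRow` and `blob_row_passes_check` are hypothetical names and the C types are this sketch's own choice; in Fossil, SQLite enforces these rules itself. SQLite's length() counts characters, which matches strlen() for ASCII hex hashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory mirror of one row of the blob table. */
typedef struct {
    int64_t rid;                  /* INTEGER PRIMARY KEY: local row id       */
    int64_t rcvid;                /* INTEGER: receipt this row arrived in    */
    int64_t size;                 /* INTEGER                                 */
    const char *uuid;             /* TEXT UNIQUE NOT NULL: the content hash  */
    const unsigned char *content; /* BLOB: may be NULL; no NOT NULL on it    */
    size_t content_len;           /* byte length, since a BLOB is binary     */
} BlobRow;

/* Mirrors: uuid NOT NULL, CHECK( length(uuid)>=40 AND rid>0 ).
   (UNIQUE needs the whole table, so it is not checked per row here.) */
static bool blob_row_passes_check(const BlobRow *row) {
    assert(row != NULL);
    if (row->uuid == NULL) return false;              /* NOT NULL */
    return strlen(row->uuid) >= 40 && row->rid > 0;   /* CHECK    */
}
```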
Two names for one thing: rid and uuid
Notice that a blob row carries two quite different identifiers. This pairing is not an accident, and it recurs in nearly every table upstream — so it is worth meeting head-on now.
rid
- A plain integer: the table's PRIMARY KEY.
- Assigned by this repository, in the order content arrived.
- Fast to index and join on, so Fossil's own SQL uses it everywhere.
- Meaningless anywhere else: another clone numbers the same artifact differently.

uuid
- A cryptographic hash of the content itself.
- 40 hex characters for SHA-1, 64 for SHA-3-256; the CHECK constraint allows either.
- Identical in every clone of the project, forever.
- The name a human pastes into a URL, a command, an email.
Two names for the same artifact, each good at what the other is bad
at. The integer rid is for the machine; the hash
uuid is for the world. Translating between them is one
of the most common things Fossil's code does — watch for it.
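The translation between the two names can be sketched in C. This is an illustration under stated assumptions: `IdPair`, `hash_kind`, `uuid_to_rid` and `rid_to_uuid` are hypothetical, and a linear scan over an array stands in for what Fossil actually does, which is a SQL lookup against the blob table (where the UNIQUE constraint on uuid gives SQLite an index to use). Returning 0 for "not found" is safe because the CHECK constraint guarantees every real rid is positive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical index entry pairing the two identities of one artifact. */
typedef struct {
    int rid;          /* local: assigned by this repository       */
    const char *uuid; /* global: content hash, same in every clone */
} IdPair;

typedef enum { HASH_INVALID, HASH_SHA1, HASH_SHA3_256 } HashKind;

/* Classify by length: 40 hex digits for SHA-1, 64 for SHA-3-256. */
static HashKind hash_kind(const char *uuid) {
    size_t n = strlen(uuid);
    for (size_t i = 0; i < n; i++)
        if (!isxdigit((unsigned char)uuid[i])) return HASH_INVALID;
    if (n == 40) return HASH_SHA1;
    if (n == 64) return HASH_SHA3_256;
    return HASH_INVALID;
}

/* uuid -> rid: the direction for a hash a human pasted in.
   Returns 0 (never a valid rid) when the hash is unknown. */
static int uuid_to_rid(const IdPair *tab, size_t n, const char *uuid) {
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].uuid, uuid) == 0) return tab[i].rid;
    return 0;
}

/* rid -> uuid: the direction for showing an artifact to the world. */
static const char *rid_to_uuid(const IdPair *tab, size_t n, int rid) {
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].rid == rid) return tab[i].uuid;
    return NULL;
}
```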
4. Source of truth, and derived cache.
The 27 tables are not a flat pile of equals. They fall into layers,
and one distinction outranks all the others. schema.c
— the file that defines the schema — states it plainly:
blob and delta together hold
“the global state of a Fossil repository.” Everything
else is, in principle, derived.
Source of truth (blob, delta). Two tables hold every artifact the project has ever contained. This is the part that travels when you sync. Lose everything else and you have lost nothing: it can all be recomputed from here.
Repository administration. Users, settings, the log of what content was received, shunned content: the administrative state of this one copy of the repository, distinct from the project's shared history.
Derived cache. The history graph's parent/child links, the timeline, file
names, tags, tickets, forum posts. None of it is original
information: every row can be recomputed from the artifacts in
blob. schema.c even keeps these in a
separate string, zRepositorySchema2, precisely
because the fossil rebuild command throws them
away and regenerates them.
This is one of the most important ideas in the whole city, so hold
onto it: only blob and delta are
irreplaceable. Every other table is a cache — an index built
for speed, reconstructible at any time. It is why
fossil rebuild exists and why it is safe to run.
We will see the derived tables being built when we reach
the Chronicle, District 3.
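The "throw it away and recompute it" idea can be shown with a deliberately tiny C sketch. Everything here is hypothetical: the artifact format (a check-in whose text starts with "P <parent-hash>") is a toy, far simpler than Fossil's real artifacts, and `rebuild_links` fills only one parent/child table, whereas fossil rebuild regenerates every derived table. The point it illustrates is that the cache holds no information of its own, so a damaged cache is repaired simply by rebuilding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Source of truth (toy): artifact rows with both identities and content. */
typedef struct { int rid; const char *uuid; const char *text; } Artifact;

/* Derived cache (toy): parent/child links of the history graph. */
typedef struct { int parent_rid; int child_rid; } Link;

/* Find the rid whose uuid equals the first len chars of hash; 0 if none. */
static int rid_of(const Artifact *a, size_t n, const char *hash, size_t len) {
    for (size_t i = 0; i < n; i++)
        if (strlen(a[i].uuid) == len && strncmp(a[i].uuid, hash, len) == 0)
            return a[i].rid;
    return 0;
}

/* "Rebuild": discard the cache, then recompute every row from artifacts.
   Returns the number of links written (at most cap). */
static size_t rebuild_links(const Artifact *a, size_t n, Link *out, size_t cap) {
    size_t count = 0;
    assert(a != NULL || n == 0);
    if (cap > 0) memset(out, 0, cap * sizeof *out);  /* throw the cache away */
    for (size_t i = 0; i < n && count < cap; i++) {
        if (strncmp(a[i].text, "P ", 2) != 0) continue; /* no parent line */
        const char *p = a[i].text + 2;                  /* parent hash... */
        size_t len = strcspn(p, "\n");                  /* ...to newline  */
        int parent = rid_of(a, n, p, len);
        if (parent > 0) out[count++] = (Link){ parent, a[i].rid };
    }
    return count;
}
```

Corrupting or deleting the Link array loses nothing: calling rebuild_links again over the same artifacts produces the same rows.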
Where does the schema actually live? Not in a separate
.sql file shipped alongside Fossil — it lives
inside the program. src/schema.c declares the
schema as C string constants: ordinary text, compiled
straight into the fossil executable as data. In Swift
you might write let schema = """ … """; C does
the same with const char zRepositorySchema1[] = "…".
At runtime Fossil simply hands that text to SQLite to create the
tables. The blueprint for the database travels inside the tool
that builds it.
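Here is what "a schema compiled into the executable as data" looks like in plain C. This is a sketch: `zToySchema` and `emit_schema` are hypothetical names, and the text is the blob declaration from the schema above. Adjacent string literals are joined by the compiler into a single array. Fossil's own schema.c writes each line with an @ prefix instead, which its build turns into string text, so this is the same idea without that step. Where this sketch merely writes the text to a stream, Fossil hands it to SQLite, for example through a call such as sqlite3_exec().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The schema lives inside the program as an ordinary string constant.
   Adjacent literals below are concatenated into one char array. */
static const char zToySchema[] =
    "CREATE TABLE blob(\n"
    "  rid INTEGER PRIMARY KEY,\n"
    "  rcvid INTEGER,\n"
    "  size INTEGER,\n"
    "  uuid TEXT UNIQUE NOT NULL,\n"
    "  content BLOB,\n"
    "  CHECK( length(uuid)>=40 AND rid>0 )\n"
    ");\n";

/* Stand-in for "hand the text to SQLite": write it to a stream instead.
   Returns 0 on success, -1 if the write failed. */
static int emit_schema(FILE *out) {
    assert(out != NULL);
    return fputs(zToySchema, out) >= 0 ? 0 : -1;
}
```

The design choice pays off at install time: there is no separate .sql file to lose or mismatch, because the blueprint and the tool ship as one binary.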
Standing on bedrock, you now know:
- A repository is literally an SQLite database: openable with the plain sqlite3 shell, and stamped with an application id so even file recognises it.
- There are three databases, not one: configdb (your settings), repository (the project), checkout (one working copy).
- Every artifact is one row in the blob table, carrying two identities: the local integer rid and the global hash uuid.
- Only blob + delta are the source of truth. Every other table is a derived index that fossil rebuild can regenerate.
In the Vault we open a blob row and find… not the
file. We find a delta. District 2 is about how Fossil
stores 66,578 artifacts without storing 66,578 whole files.