This directory contains helper scripts for downloading, post-processing, and comparing CBDB SQLite releases.
`setup_cbdb.ipynb` is a Google Colab notebook that runs the full setup pipeline in one click: it downloads the latest database, adds foreign keys, creates views, and builds the `ADDRESSES` table. Open the notebook in Colab, then click Runtime → Run all. Each step can be toggled on or off via boolean flags in the Configuration cell.
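As a rough illustration of how boolean flags can gate pipeline steps, here is a minimal sketch. The flag names and step names below are hypothetical; the notebook's actual Configuration cell may use different ones.

```python
# Hypothetical flag names; check the notebook's Configuration cell for the real ones.
DOWNLOAD_DATABASE = True
ADD_FOREIGN_KEYS = True
CREATE_VIEWS = True
BUILD_ADDRESSES_TABLE = False  # set to False to skip this step


def planned_steps(flags):
    """Return the names of the enabled steps, preserving pipeline order."""
    return [name for name, enabled in flags if enabled]


steps = planned_steps([
    ("download", DOWNLOAD_DATABASE),
    ("add_foreign_keys", ADD_FOREIGN_KEYS),
    ("create_views", CREATE_VIEWS),
    ("build_addresses", BUILD_ADDRESSES_TABLE),
])
print(steps)  # prints ['download', 'add_foreign_keys', 'create_views']
```

Keeping the flags in one cell means a partial rerun only requires flipping a value and running all cells again.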

| Script | Description |
|---|---|
| `add_foreign_keys.py` | Fetches `foreign_keys_regen.csv` from GitHub and recreates SQLite tables with proper FOREIGN KEY constraints. Skips tables that already have FK constraints (idempotent). |
| `create_views.sh` | Creates 18 convenience SQL views (e.g. `View_PeopleData`, `View_EntryData`, `View_PostingOfficeData`). |
| `create_addresses_table.py` | Builds the `ADDRESSES` table by resolving the full administrative hierarchy for each address across time, preserving gaps in the data. |
| `compare_db_tables.py` | Compares two SQLite databases table-by-table, emitting row-count and schema discrepancies. |
| `process_cbdb_dbs.sh` | End-to-end workflow: downloads the latest and a historical SQLite dump, unpacks them, vacuums both, and runs `compare_db_tables.py`. |

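To give a feel for what a table-by-table comparison involves, the following sketch reports row-count and schema differences between two SQLite files using only Python's standard `sqlite3` module. It is an assumption-laden illustration, not the code in `compare_db_tables.py`; the function names and message formats are invented here.

```python
import sqlite3


def table_summary(db_path):
    """Map each user table name to (row_count, CREATE TABLE statement)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        summary = {}
        for name, sql in rows:
            if name.startswith("sqlite_"):
                continue  # skip SQLite's internal bookkeeping tables
            quoted = '"' + name.replace('"', '""') + '"'
            count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            summary[name] = (count, sql)
        return summary
    finally:
        conn.close()


def compare_databases(old_path, new_path):
    """Return a list of human-readable discrepancies (hypothetical format)."""
    old, new = table_summary(old_path), table_summary(new_path)
    problems = []
    for name in sorted(set(old) | set(new)):
        if name not in new:
            problems.append(f"{name}: missing from new database")
        elif name not in old:
            problems.append(f"{name}: missing from old database")
        else:
            (old_count, old_sql), (new_count, new_sql) = old[name], new[name]
            if old_count != new_count:
                problems.append(f"{name}: row count {old_count} -> {new_count}")
            if old_sql != new_sql:
                problems.append(f"{name}: schema differs")
    return problems
```

Comparing the stored `CREATE TABLE` text is a deliberately blunt schema check: it flags whitespace-only changes too, which may or may not be what the real script does.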
The notebook needs no local installation: just upload it to Google Colab. To run the scripts locally, the following tools are required.

| Tool | Required by |
|---|---|
| `python3` | `add_foreign_keys.py`, `create_addresses_table.py`, `compare_db_tables.py` |
| `sqlite3` CLI | `create_views.sh` |
| `bash` | `create_views.sh`, `process_cbdb_dbs.sh` |
| `wget`, `7z` | `process_cbdb_dbs.sh` |

process_cbdb_dbs.sh checks for missing tools at startup and exits early if any are absent.
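The same fail-fast check can be sketched in Python with `shutil.which`, which returns `None` when a command is not on `PATH`. This is only an illustration of the idea; the shell script's own check may be written differently, and the helper names here are hypothetical.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the subset of command names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools):
    """Exit with an error message if any required command is absent."""
    missing = missing_tools(tools)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Missing required tools: " + ", ".join(missing))
```

A caller would run something like `require_tools(["wget", "7z"])` before starting any download, so that a missing dependency is reported before work begins rather than halfway through.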
Add foreign keys:

    python scripts/add_foreign_keys.py --db latest.db

Pass `--csv-url URL` to use a different branch of `foreign_keys_regen.csv`.
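The idempotent behaviour (skipping tables that already have FK constraints) can be pictured with SQLite's `PRAGMA foreign_key_list`, which returns one row per foreign key declared on a table. The sketch below is an assumption about how such a check could look, not the script's actual code; the helper names and demo tables are hypothetical.

```python
import sqlite3


def has_foreign_keys(conn, table):
    """True if the table already declares at least one FOREIGN KEY."""
    quoted = '"' + table.replace('"', '""') + '"'
    return bool(conn.execute(f"PRAGMA foreign_key_list({quoted})").fetchall())


def tables_needing_fks(conn, candidates):
    """Filter out tables that already carry constraints, so reruns are no-ops."""
    return [t for t in candidates if not has_foreign_keys(conn, t)]


# Small in-memory demonstration with made-up tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES parent(id))"
)
conn.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, parent_id INTEGER)")
print(tables_needing_fks(conn, ["child", "plain"]))  # prints ['plain']
```

SQLite cannot add a foreign key to an existing table with `ALTER TABLE`, which is why the table has to be recreated rather than altered in place.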
Create the views:

    bash scripts/create_views.sh latest.db

Build the `ADDRESSES` table:

    python scripts/create_addresses_table.py --db latest.db

Compare two databases:

    python scripts/compare_db_tables.py old.db new.db

Run the end-to-end workflow:

    bash scripts/process_cbdb_dbs.sh

Intermediate downloads are written to a temporary directory and cleaned up automatically.
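The temporary-directory pattern can be illustrated in Python with `tempfile.TemporaryDirectory`, which deletes the directory and its contents when the `with` block exits. The shell script presumably uses a shell equivalent; this sketch only shows the behaviour, and the function name, prefix, and file name are hypothetical.

```python
import tempfile
from pathlib import Path


def run_in_scratch_dir(work):
    """Call work(scratch_path) inside a directory that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="cbdb_") as tmp:
        scratch = Path(tmp)
        result = work(scratch)
    # The directory no longer exists here, even if work() left files behind.
    return result, scratch


def fake_download(scratch):
    """Stand-in for a real download: writes a placeholder archive."""
    archive = scratch / "latest.7z"
    archive.write_bytes(b"placeholder")
    return archive.exists()


existed_during_run, scratch_path = run_in_scratch_dir(fake_download)
print(existed_during_run, scratch_path.exists())  # prints True False
```

Because cleanup is tied to leaving the block, it also runs when the work function raises an exception.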