---
title: Database v0→v1 Migration Record
tags: [database, migration, history]
created: 2025-05-07
updated: 2026-05-27
status: archived
related:
---


Database v0→v1 Migration Record

The initial database migration (v0→v1) ran on the production database, transforming 97,387 documents into 91,600.

Source Analysis

| job_type | count | status |
|---|---|---|
| boltz | 55,318 | ID mismatch (pre-hash) |
| carbonara-binders | 22,451 | unregistered type (same schema as carbonara) |
| carbonara | 14,378 | field rename needed |
| None | 3,349 | colabfold remnants, no current schema |
| colabfold | 1,306 | removed endpoint |
| rfdiffusion | 281 | removed endpoint |
| cpmp | 173 | ID mismatch (pre-hash) |
| boltzgen | 113 | 4 docs old RFdiffusion format |
| openmm | 9 | OK |
| pesto-screen-embed | 3 | OK |
| pesto | 3 | OK |
| test | 2 | test data |
| pesto-screen-interact | 1 | OK |

| job status | count | note |
|---|---|---|
| running | 24,168 | stale, dead workers |
| queued | 5,629 | stale |
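
A tally like the one above could be produced by grouping documents on `job_type`, and separately on job status. The sketch below assumes documents are plain dicts held in memory and that the status field is called `status`; both are assumptions for illustration, not the actual collection layout.

```python
from collections import Counter

def tally(docs, key):
    """Count documents per value of `key`; a missing key is counted as None."""
    return Counter(doc.get(key) for doc in docs)

# Toy documents; the field names are assumed, not taken from the real schema.
docs = [
    {"job_type": "boltz", "status": "completed"},
    {"job_type": "boltz", "status": "running"},
    {"job_type": "carbonara", "status": "completed"},
    {"status": "queued"},  # no job_type, so it is counted under None
]

by_type = tally(docs, "job_type")
by_status = tally(docs, "status")

# Print the per-type counts, largest first.
for job_type, count in by_type.most_common():
    print(job_type, count)
```

Counting a missing `job_type` as `None` mirrors the `None` row in the table, where documents without a type are reported as their own group.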

Transformations Applied

  1. Dropped removed/unregistered types (~4,936 docs estimated; final count 4,938, see Results)
  2. Marked stale jobs as failed (~29,797 docs)
  3. Merged legacy fields (project_id + job_id → project_name, job_name)
  4. Normalized pid field (None/missing → [])
  5. Renamed carbonara field (ignore_heteroatoms → ignore_unknown_atoms)
  6. Registered carbonara-binders as JobCarbonara schema
  7. Rehashed all document IDs via hash_payload(input_data)
  8. Stripped BinaryFile content, validated storage_key
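
Steps 2, 4, and 5 above are per-document rewrites and can be sketched as a single function. This is a minimal illustration, not the migration script itself: the `status` and `input_data` key names, the location of `ignore_heteroatoms` inside `input_data`, and applying the rename to `carbonara-binders` as well as `carbonara` are all assumptions.

```python
import copy

STALE_STATUSES = {"running", "queued"}
CARBONARA_TYPES = {"carbonara", "carbonara-binders"}  # assumed to share the rename

def migrate_doc(doc):
    """Return a migrated copy of one v0 document (hypothetical layout)."""
    out = copy.deepcopy(doc)

    # Step 2: jobs still marked running or queued are stale, so mark them failed.
    if out.get("status") in STALE_STATUSES:
        out["status"] = "failed"

    # Step 4: a missing or None pid becomes an empty list.
    if out.get("pid") is None:
        out["pid"] = []

    # Step 5: rename the carbonara field (assumed to live under input_data).
    inputs = out.get("input_data") or {}
    if out.get("job_type") in CARBONARA_TYPES and "ignore_heteroatoms" in inputs:
        inputs["ignore_unknown_atoms"] = inputs.pop("ignore_heteroatoms")

    return out

old = {
    "job_type": "carbonara",
    "status": "running",
    "pid": None,
    "input_data": {"ignore_heteroatoms": True},
}
new = migrate_doc(old)
```

Working on a deep copy keeps the original document untouched, which makes it possible to compare before and after states when validating a migration run.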

Results

| Metric | Before | After |
|---|---|---|
| Total docs | 97,387 | 91,600 |
| Dropped (removed types) | - | 4,938 |
| Dropped (old boltzgen) | - | 4 |
| Dropped (validation) | - | 10 (boltz input) |
| Dropped (duplicate ID) | - | 731 (identical inputs) |
| Dropped (connection error) | - | 4 |
| Status: running | 24,168 | 0 |
| Status: queued | 5,629 | 0 |
| Status: failed | 1,290 | 30,238 |
| Status: completed | 66,300 | 61,362 |
| ID hash match | ~7% | 100% |
| Carbonara outputs recovered | - | 29,075 (pssm→CSV→BinaryFile) |
| BinaryFile: content present | some | 0 (stripped) |
| Legacy fields present | 85,804 | 0 |
| PID normalized | - | 84,212 |
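
The figures above can be cross-checked with a few sums. The sketch below uses only numbers copied from the table. The status counts add up to the stated totals on both sides, while the itemized drops sum to 5,687 against a net reduction of 5,787, so 100 removed documents are not covered by the listed drop categories. This record does not say where they went.

```python
# Status counts before and after the migration, copied from the table.
before = {"running": 24_168, "queued": 5_629, "failed": 1_290, "completed": 66_300}
after = {"running": 0, "queued": 0, "failed": 30_238, "completed": 61_362}

# Itemized drop counts, copied from the table.
drops = {
    "removed types": 4_938,
    "old boltzgen": 4,
    "validation": 10,
    "duplicate ID": 731,
    "connection error": 4,
}

total_before = sum(before.values())    # 97,387
total_after = sum(after.values())      # 91,600
removed = total_before - total_after   # 5,787
itemized = sum(drops.values())         # 5,687
unaccounted = removed - itemized       # 100

print(f"before={total_before} after={total_after}")
print(f"removed={removed} itemized={itemized} unaccounted={unaccounted}")
```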

Post-migration Steps

  • PSSM recovery: ~28k carbonara/carbonara-binders pssm BinaryFiles uploaded to R2 (~6-10 hours)
  • 4 connection error drops: Accepted as negligible data loss
  • 2 input BinaryFiles missing both content and key: Incomplete boltzgen jobs, accepted as data loss
  • 731 duplicate IDs: Correct deduplication, since identical inputs produce the same hash
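
The duplicate-ID drops follow directly from content-derived IDs: two documents with identical inputs get the same hash and collapse into one. The sketch below illustrates this under stated assumptions. `hash_payload_sketch` is a stand-in (SHA-256 over canonical JSON), not the real `hash_payload`, whose implementation is not shown in this record. The `_id` key and the keep-the-first-document policy are also hypothetical.

```python
import hashlib
import json

def hash_payload_sketch(input_data):
    """Stand-in for hash_payload: SHA-256 over key-sorted JSON (assumed scheme)."""
    canonical = json.dumps(input_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def rehash_and_dedupe(docs):
    """Assign content-derived IDs; keep the first doc per ID, collect the rest."""
    kept = {}
    duplicates = []
    for doc in docs:
        new_id = hash_payload_sketch(doc["input_data"])
        if new_id in kept:
            duplicates.append(doc)  # identical inputs: same hash, so dropped
        else:
            kept[new_id] = {**doc, "_id": new_id}
    return list(kept.values()), duplicates

docs = [
    {"input_data": {"seq": "MKT", "n": 1}},
    {"input_data": {"n": 1, "seq": "MKT"}},  # same inputs, different key order
    {"input_data": {"seq": "MKV", "n": 1}},
]
kept, duplicates = rehash_and_dedupe(docs)
print(len(kept), len(duplicates))
```

Sorting keys before hashing makes the ID independent of key order, so the second toy document is treated as a duplicate of the first.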