---
title: Database v0→v1 Migration Record
tags: [database, migration, history]
created: 2025-05-07
updated: 2026-05-27
status: archived
related:
---
# Database v0→v1 Migration Record
The initial database migration (v0→v1) ran on the production database, reducing it from 97,387 documents to 91,600.
## Source Analysis
| job_type | count | notes |
|---|---|---|
| boltz | 55,318 | ID mismatch (pre-hash) |
| carbonara-binders | 22,451 | unregistered type (same schema as carbonara) |
| carbonara | 14,378 | field rename needed |
| None | 3,349 | colabfold remnants, no current schema |
| colabfold | 1,306 | removed endpoint |
| rfdiffusion | 281 | removed endpoint |
| cpmp | 173 | ID mismatch (pre-hash) |
| boltzgen | 113 | 4 docs in old RFdiffusion format |
| openmm | 9 | OK |
| pesto-screen-embed | 3 | OK |
| pesto | 3 | OK |
| test | 2 | test data |
| pesto-screen-interact | 1 | OK |

Stale job statuses (counted across all job types):

| status | count | notes |
|---|---|---|
| running | 24,168 | stale, dead workers |
| queued | 5,629 | stale |
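
Job type and job status are two separate breakdowns of the same 97,387 documents. The quick Python check below uses counts copied from this page; the before-migration failed and completed figures come from the Results table. It confirms that both breakdowns sum to the total and that running plus queued gives the 29,797 stale jobs.

```python
# Counts copied from the Source Analysis tables on this page.
JOB_TYPE_COUNTS = {
    "boltz": 55_318,
    "carbonara-binders": 22_451,
    "carbonara": 14_378,
    None: 3_349,  # job_type recorded as None
    "colabfold": 1_306,
    "rfdiffusion": 281,
    "cpmp": 173,
    "boltzgen": 113,
    "openmm": 9,
    "pesto-screen-embed": 3,
    "pesto": 3,
    "test": 2,
    "pesto-screen-interact": 1,
}
# Before-migration status counts, copied from the Results table.
STATUS_COUNTS = {"running": 24_168, "queued": 5_629, "failed": 1_290, "completed": 66_300}
TOTAL_DOCS = 97_387

# Job type and status each partition the same collection.
assert sum(JOB_TYPE_COUNTS.values()) == TOTAL_DOCS
assert sum(STATUS_COUNTS.values()) == TOTAL_DOCS

# Jobs stuck in running/queued are the ones later marked failed.
stale = STATUS_COUNTS["running"] + STATUS_COUNTS["queued"]
print(stale)  # 29797
```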
## Transformations Applied
- Dropped removed/unregistered types (~4,936 docs)
- Marked stale jobs as failed (~29,797 docs)
- Merged legacy fields (project_id + job_id → project_name, job_name)
- Normalized pid field (None/missing → [])
- Renamed carbonara field (ignore_heteroatoms → ignore_unknown_atoms)
- Registered carbonara-binders as JobCarbonara schema
- Rehashed all document IDs via hash_payload(input_data)
- Stripped BinaryFile content, validated storage_key
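
The per-document part of these transformations can be sketched in Python as below. The sketch rests on several assumptions:

- v0 documents are plain dicts with `job_type`, `status`, `pid` and `input_data` keys.
- The document ID lives under `_id`.
- `ignore_heteroatoms` sits inside `input_data`.
- `hash_payload` here is a hypothetical stand-in (SHA-256 over sorted-key JSON), not the project's actual function.
- `REMOVED_TYPES` is chosen because None, colabfold and rfdiffusion sum to exactly 4,936 documents; the real drop list is not spelled out above.

The legacy-field merge and BinaryFile stripping are left out because their exact mapping is not recorded here.

```python
import copy
import hashlib
import json
from typing import Optional

# Assumed drop list: 3,349 + 1,306 + 281 = 4,936 documents.
REMOVED_TYPES = {None, "colabfold", "rfdiffusion"}
STALE_STATUSES = {"running", "queued"}
CARBONARA_TYPES = {"carbonara", "carbonara-binders"}  # share the JobCarbonara schema


def hash_payload(input_data: dict) -> str:
    """Hypothetical stand-in: SHA-256 over sorted-key JSON of the input."""
    canonical = json.dumps(input_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def migrate_doc(doc: dict) -> Optional[dict]:
    """Return the v1 form of a v0 document, or None if it is dropped."""
    if doc.get("job_type") in REMOVED_TYPES:
        return None
    new = copy.deepcopy(doc)  # never mutate the source document

    # Jobs left running/queued by dead workers will never finish.
    if new.get("status") in STALE_STATUSES:
        new["status"] = "failed"

    # pid: None or missing -> empty list.
    if new.get("pid") is None:
        new["pid"] = []

    # Assumption: the renamed flag lives inside input_data.
    params = new.setdefault("input_data", {})
    if new.get("job_type") in CARBONARA_TYPES and "ignore_heteroatoms" in params:
        params["ignore_unknown_atoms"] = params.pop("ignore_heteroatoms")

    # Content-addressed ID: identical inputs always map to the same ID.
    new["_id"] = hash_payload(params)
    return new


example = {"job_type": "carbonara", "status": "queued", "pid": None,
           "input_data": {"seq": "MKV", "ignore_heteroatoms": True}}
print(migrate_doc(example)["status"])  # failed
```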
## Results
| Metric | Before | After |
|---|---|---|
| Total docs | 97,387 | 91,600 |
| Dropped (removed types) | - | 4,938 |
| Dropped (old boltzgen) | - | 4 |
| Dropped (validation) | - | 10 (boltz input) |
| Dropped (duplicate ID) | - | 731 (identical inputs) |
| Dropped (connection error) | - | 4 |
| Status: running | 24,168 | 0 |
| Status: queued | 5,629 | 0 |
| Status: failed | 1,290 | 30,238 |
| Status: completed | 66,300 | 61,362 |
| ID hash match | ~7% | 100% |
| Carbonara outputs recovered | - | 29,075 (pssm→CSV→BinaryFile) |
| BinaryFile: content present | some | 0 (stripped) |
| Legacy fields present | 85,804 | 0 |
| PID normalized | - | 84,212 |
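
The Results figures can be cross-checked with a few lines of arithmetic, using values copied from the table above.

- The after-migration status counts add up exactly to 91,600.
- The itemized drop rows total 5,687, but the collection shrinks by 5,787, so 100 of the removed documents are not attributed to any drop row.

```python
# Values copied from the Results table.
BEFORE_TOTAL = 97_387
AFTER_TOTAL = 91_600
DROPS = {
    "removed types": 4_938,
    "old boltzgen": 4,
    "validation": 10,
    "duplicate ID": 731,
    "connection error": 4,
}
AFTER_STATUS = {"running": 0, "queued": 0, "failed": 30_238, "completed": 61_362}

itemized = sum(DROPS.values())
net_reduction = BEFORE_TOTAL - AFTER_TOTAL
unattributed = net_reduction - itemized

print(f"itemized drops: {itemized}")       # 5687
print(f"net reduction:  {net_reduction}")  # 5787
print(f"unattributed:   {unattributed}")   # 100

# Every remaining document carries exactly one status.
assert sum(AFTER_STATUS.values()) == AFTER_TOTAL
```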
## Post-migration Steps
- PSSM recovery: ~28k carbonara/carbonara-binders pssm BinaryFiles uploaded to R2 (~6-10 hours)
- 4 connection error drops: Accepted as negligible data loss
- 2 input BinaryFiles missing both content and key: Incomplete boltzgen jobs, accepted as data loss
- 731 duplicate IDs: Correct deduplication; identical inputs produce the same hash, so only one document per distinct input can remain
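
The 731 duplicates follow directly from content-addressed IDs. Once the ID is a hash of `input_data`, documents with the same input collide, and only one can be stored. The small Python sketch below shows the effect. It uses a hypothetical `hash_payload` stand-in (SHA-256 over sorted-key JSON) and assumes a keep-first policy, since this record does not say which duplicate the migration actually kept. Sorting keys before hashing makes payloads that differ only in key order collide as well.

```python
import hashlib
import json


def hash_payload(input_data: dict) -> str:
    # Hypothetical stand-in: SHA-256 over canonical (sorted-key) JSON.
    canonical = json.dumps(input_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe_by_input(docs: list[dict]) -> tuple[dict[str, dict], int]:
    """Key documents by the hash of input_data; return (kept, number dropped)."""
    kept: dict[str, dict] = {}
    dropped = 0
    for doc in docs:
        doc_id = hash_payload(doc["input_data"])
        if doc_id in kept:
            dropped += 1  # same input already stored under this ID
        else:
            kept[doc_id] = doc  # assumed policy: first document seen wins
    return kept, dropped


docs = [
    {"input_data": {"seq": "MKV", "n": 1}},
    {"input_data": {"n": 1, "seq": "MKV"}},  # same payload, different key order
    {"input_data": {"seq": "MKL", "n": 1}},
]
kept, dropped = dedupe_by_input(docs)
print(len(kept), dropped)  # 2 1
```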