---
title: Pomodoro v0.7.x Ablation Results
tags: [journal, pomodoro, training, results, ablation, v0.7]
created: 2026-05-12
updated: 2026-05-12
status: active
related:
---
# Pomodoro v0.7.x Ablation Results
## Version Comparison Table
| Version | S | L | max_size | Loss | soft_normalize | fragment_length | Key Change |
|---|---|---|---|---|---|---|---|
| v0.6.0 | 32 | 8 | 128 | full | yes | — | Baseline (small S) |
| v0.6.1 | 32 | 8 | 512 | full | yes | — | max_size=512 |
| v0.6.2 | 32 | 32 | 128 | full | yes | — | L=32 (4x deeper) |
| v0.6.3 | 64 | 8 | 128 | full | yes | — | S=64 (wider) |
| v0.7.0 | 64 | 8 | 64 | full | yes | — | max_size=64 |
| v0.7.1 | 64 | 8 | 128 | RMSD only | yes | — | Simplified loss (la+lr+lc) |
| v0.7.2 | 64 | 8 | 64 | RMSD only | yes | — | max_size=64 + simplified loss |
| v0.7.3 | 64 | 8 | 128 | full | no | — | soft_normalize removed from GT |
| v0.8.0 | 64 | 8 | 8192 | full | yes | 8 | Fragment cropping + full-size data |
Loss definitions:
- Full loss: `(la + lb + lnb + lr + lc + lda + ldr + ldc) * lw`
- RMSD only: `(la + lr + lc) * lw`
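A minimal sketch of how the two loss variants relate, assuming every term has already been computed as a scalar and `lw` is a single scalar weight. The term names come from the definitions above; `combine_loss`, `FULL_TERMS`, and `RMSD_TERMS` are hypothetical names, not the actual training code.

```python
"""Hypothetical sketch: build the full and RMSD-only losses from named terms."""

from typing import Mapping

# Term names as written in the loss definitions; their exact meaning is not
# specified in this note.
FULL_TERMS = ("la", "lb", "lnb", "lr", "lc", "lda", "ldr", "ldc")
RMSD_TERMS = ("la", "lr", "lc")


def combine_loss(terms: Mapping[str, float], lw: float, variant: str = "full") -> float:
    """Sum the terms selected by `variant`, then scale the sum by the weight `lw`."""
    if variant == "full":
        names = FULL_TERMS
    elif variant == "rmsd_only":
        names = RMSD_TERMS
    else:
        raise ValueError(f"unknown loss variant: {variant!r}")
    missing = [name for name in names if name not in terms]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    return sum(terms[name] for name in names) * lw


# With every term equal to 1.0 and lw = 0.5, the full loss is 8 * 0.5 = 4.0
# and the RMSD-only loss is 3 * 0.5 = 1.5.
unit_terms = {name: 1.0 for name in FULL_TERMS}
print(combine_loss(unit_terms, 0.5))               # 4.0
print(combine_loss(unit_terms, 0.5, "rmsd_only"))  # 1.5
```

The RMSD-only variant is a strict subset of the full loss, so switching variants only drops terms; it does not reweight the remaining ones.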
## Observations
- RMSD-only loss is sufficient: v0.7.2 performs as well as or slightly better than v0.7.0, the matching full-loss run (S=64, L=8, max_size=64).
- Model width (S) has limited impact: increasing S from 32 (v0.6.0) to 64 (v0.6.3) did not yield dramatic gains.
- Model depth (L) may improve performance: v0.6.2 (L=32) is the deepest model tested, so depth deserves further exploration.
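Each observation rests on comparing runs that differ in a single setting. The hypothetical helper below finds such pairs from the table. `RUNS` and `single_factor_pairs` are illustrative names, `None` stands for the blank fragment_length entries, and soft_normalize is encoded as a boolean.

```python
"""Hypothetical sketch: list version pairs that differ in exactly one setting."""

from itertools import combinations

# Configurations transcribed from the comparison table.
# fragment_length=None stands for the blank ("—") entries.
RUNS = {
    "v0.6.0": dict(S=32, L=8, max_size=128, loss="full", soft_normalize=True, fragment_length=None),
    "v0.6.1": dict(S=32, L=8, max_size=512, loss="full", soft_normalize=True, fragment_length=None),
    "v0.6.2": dict(S=32, L=32, max_size=128, loss="full", soft_normalize=True, fragment_length=None),
    "v0.6.3": dict(S=64, L=8, max_size=128, loss="full", soft_normalize=True, fragment_length=None),
    "v0.7.0": dict(S=64, L=8, max_size=64, loss="full", soft_normalize=True, fragment_length=None),
    "v0.7.1": dict(S=64, L=8, max_size=128, loss="rmsd_only", soft_normalize=True, fragment_length=None),
    "v0.7.2": dict(S=64, L=8, max_size=64, loss="rmsd_only", soft_normalize=True, fragment_length=None),
    "v0.7.3": dict(S=64, L=8, max_size=128, loss="full", soft_normalize=False, fragment_length=None),
    "v0.8.0": dict(S=64, L=8, max_size=8192, loss="full", soft_normalize=True, fragment_length=8),
}


def single_factor_pairs(runs):
    """Return (version_a, version_b, setting) for every pair differing in one setting."""
    pairs = []
    for (a, cfg_a), (b, cfg_b) in combinations(runs.items(), 2):
        # Settings whose values differ between the two runs.
        diff = [key for key in cfg_a if cfg_a[key] != cfg_b[key]]
        if len(diff) == 1:
            pairs.append((a, b, diff[0]))
    return pairs


for a, b, setting in single_factor_pairs(RUNS):
    print(f"{a} vs {b}: {setting}")
```

On the table as transcribed this yields eight pairs, including v0.7.0 vs v0.7.2 and v0.6.3 vs v0.7.1 for the loss comparison. v0.8.0 has no single-factor partner because it changes both max_size and fragment_length relative to v0.6.3, so its results cannot be attributed to fragment cropping alone.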
## Next Steps
- Train v0.7.2 on more GPUs for a longer run to confirm that the simplified loss still holds at scale.