title: Pomodoro v0.6.x Training Regime Observations
tags: [journal, pomodoro, training, observations, v0.6]
created: 2026-05-04
updated: 2026-05-04
status: active
related:


Pomodoro v0.6.x Training Regime Observations

Version Config Recap

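| Version | Changed Param   | Value | Baseline (v0.6.0) | Purpose                     |
|---------|-----------------|-------|-------------------|-----------------------------|
| v0.6.0  | none (baseline) |       |                   | Reference                   |
| v0.6.1  | max_size        | 512   | 128               | Larger structures           |
| v0.6.2  | L               | 32    | 8                 | Deeper model (4x layers)    |
| v0.6.3  | S               | 64    | 32                | Wider model (2x state size) |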

All versions use r: float = 0.0 (changed from 1e-3 after initial setup).
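
The recap can also be written down as code, which makes the "one parameter changed per version" design explicit. This is only a sketch: the class name RegimeConfig and the helper changed_params are hypothetical and not taken from the Pomodoro codebase; only the parameter names and values come from the table above.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RegimeConfig:
    """Hypothetical container for the parameters in the recap table."""
    max_size: int = 128  # maximum structure size in residues (baseline)
    L: int = 8           # number of layers (baseline)
    S: int = 32          # state size (baseline)
    r: float = 0.0       # 0.0 for all versions (was 1e-3 initially)


BASELINE = RegimeConfig()

# Each variant changes exactly one parameter relative to v0.6.0.
VERSIONS = {
    "v0.6.0": BASELINE,
    "v0.6.1": replace(BASELINE, max_size=512),
    "v0.6.2": replace(BASELINE, L=32),
    "v0.6.3": replace(BASELINE, S=64),
}


def changed_params(version: str) -> dict:
    """Return the fields where `version` differs from the baseline."""
    cfg = VERSIONS[version]
    return {
        f.name: getattr(cfg, f.name)
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(BASELINE, f.name)
    }
```

For example, changed_params("v0.6.2") returns {"L": 32}, and the baseline returns an empty dict.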

Training Observations

v0.6.0 (baseline)

  • Stagnates after ~3 days of training; any improvement beyond that point is very slow.
  • Serves as the reference; other versions are compared against this.
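
To make "stagnates" checkable rather than eyeballed, one option is to compare the best validation score in the most recent evaluations against the best score before them. A minimal sketch, assuming a higher-is-better metric logged once per evaluation; the function name, window, and tolerance are illustrative assumptions, not values from these runs.

```python
def has_stagnated(history, window=5, min_improvement=1e-3):
    """Return True if the last `window` evaluations did not beat the
    earlier best (higher is better) by at least `min_improvement`."""
    if len(history) <= window:
        return False  # not enough evaluations to judge
    best_before = max(history[:-window])
    best_recent = max(history[-window:])
    return best_recent - best_before < min_improvement
```

Running the same check on every version's curve would give a comparable stagnation time instead of a rough "~3 days".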

v0.6.1 (large structures, max_size=512)

  • Will take much longer to train: larger structures mean more compute per sample.
  • Decision: Stay in the low structure size regime for now to pretrain the model effectively.
  • Consider reducing max_size from 128 down to 64 residues to speed up pretraining.
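
A back-of-envelope estimate of what the size change buys. The scaling exponent is an assumption: any pairwise term (such as a distance matrix over residues) grows quadratically with structure size, but the actual per-sample cost of the model has not been measured here.

```python
def relative_cost(max_size, baseline=128, exponent=2):
    """Per-sample cost relative to the baseline size, assuming cost
    grows as max_size ** exponent (exponent=2 for pairwise terms)."""
    return (max_size / baseline) ** exponent
```

Under the quadratic assumption, max_size=512 costs about 16x the baseline per sample, while max_size=64 costs about a quarter of it.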

v0.6.2 (deep model, L=32)

  • Very slow at training; not fully trained yet.
  • May outperform v0.6.0 and v0.6.3 once fully trained, but it is too early to tell.
  • 4x the layers (L: 8 to 32) is a significant capacity increase and needs correspondingly more training time.

v0.6.3 (wide model, S=64)

  • Stagnates after ~3 days, similar to v0.6.0.
  • Scores better than v0.6.0 on lDDT, which likely indicates that extra model capacity helps get local structure right.
  • Global structure quality still limited by the same stagnation behavior.
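
For reference, lDDT needs no superposition and only scores residue pairs that are close in the reference structure, which is why it tracks local accuracy and can improve while global quality stays flat. The sketch below is a simplified version using one point per residue (e.g. C-alpha) with the usual 15 Å inclusion radius and 0.5/1/2/4 Å tolerances. The full metric is all-atom and includes extra checks, and this is not necessarily how the numbers in these runs were computed.

```python
import math


def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT with one point per residue.

    pred, ref: equal-length lists of (x, y, z) coordinates in angstroms.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    preserved = 0   # threshold checks passed, summed over pairs
    considered = 0  # residue pairs closer than `cutoff` in the reference
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only local pairs are scored
            considered += 1
            diff = abs(math.dist(pred[i], pred[j]) - d_ref)
            preserved += sum(diff < t for t in thresholds)
    if considered == 0:
        return 0.0
    return preserved / (considered * len(thresholds))
```

Because pairs farther apart than the cutoff are skipped, two well-formed domains placed in the wrong relative orientation can still score highly, which fits the local-versus-global pattern seen here.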

Conclusions & Next Steps

  1. Reduce structure size: Move from 128 → 64 residues to speed up pretraining iterations.
  2. Try simplified loss: Run a v0.6.3-like model (S=64) without the distance matrix constraint. Fewer loss terms may ease optimization and break the stagnation pattern.
  3. v0.6.1 deprioritized: Large structures are premature; pretrain on smaller structures first.
  4. v0.6.2 needs more time: Deeper model is slower but may be worth the wait; monitor for improvement.
  5. Local vs global: v0.6.3’s lDDT advantage over v0.6.0 suggests capacity helps local accuracy, but global convergence remains a bottleneck across versions.
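
For the simplified-loss experiment in item 2, one low-effort way to drop a term is to weight it to zero rather than remove it from the code, so the term can still be logged for comparison. A sketch only: the term names "coordinate" and "distance_matrix" and the function total_loss are hypothetical placeholders, not the actual Pomodoro loss.

```python
def total_loss(terms, weights):
    """Weighted sum of named loss terms.

    terms:   dict mapping term name to its scalar value.
    weights: dict mapping term name to a weight; missing names default
             to 1.0, and a weight of 0.0 disables the term.
    """
    unknown = set(weights) - set(terms)
    if unknown:
        raise KeyError(f"weights given for missing terms: {sorted(unknown)}")
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Hypothetical term values for illustration only.
terms = {"coordinate": 0.5, "distance_matrix": 0.25}
full = total_loss(terms, {})                              # all terms active
simplified = total_loss(terms, {"distance_matrix": 0.0})  # constraint off
```

Keeping the disabled term in the logs would show whether the distance matrix error still falls on its own once it is no longer optimized directly.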