title: Pomodoro v0.6.x Training Regime Observations
tags: [journal, pomodoro, training, observations, v0.6]
created: 2026-05-04
updated: 2026-05-04
status: active
related:

Pomodoro v0.6.x Training Regime Observations

Version Config Recap

Version	Changed Param	Value	Baseline (v0.6.0)	Purpose
v0.6.0	— (baseline)	—	—	Reference
v0.6.1	`max_size`	512	128	Larger structures
v0.6.2	`L`	32	8	Deeper model (4x layers)
v0.6.3	`S`	64	32	Wider model (2x state size)

All versions: `r: float = 0.0` (changed from 1e-3 after initial setup).
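
The recap above can be written down as a baseline config plus one override per version. This is only a sketch: the `TrainConfig` class, the `VERSIONS` mapping, and the `changed_params` helper are hypothetical names for illustration, and the real Pomodoro config likely has more fields than the four mentioned in this note.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container; only these four parameters appear in the note."""
    max_size: int = 128  # maximum structure size in residues (v0.6.0 value)
    L: int = 8           # number of layers (v0.6.0 value)
    S: int = 32          # state size (v0.6.0 value)
    r: float = 0.0       # 0.0 for all versions (was 1e-3 during initial setup)


BASELINE = TrainConfig()

# Each version changes exactly one parameter relative to v0.6.0.
VERSIONS = {
    "v0.6.0": BASELINE,
    "v0.6.1": replace(BASELINE, max_size=512),
    "v0.6.2": replace(BASELINE, L=32),
    "v0.6.3": replace(BASELINE, S=64),
}


def changed_params(cfg, base=BASELINE):
    """Return {name: (baseline_value, new_value)} for every field that differs."""
    return {
        f.name: (getattr(base, f.name), getattr(cfg, f.name))
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(base, f.name)
    }


if __name__ == "__main__":
    for name, cfg in VERSIONS.items():
        print(name, changed_params(cfg))
```

Keeping each version as a single-field override makes it easy to check that a run really differs from the baseline in only one parameter.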

Training Observations

v0.6.0 (baseline)

Stagnates after ~3 days of training, or at best improves very slowly beyond that point.
Serves as the reference; other versions are compared against this.
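
To make "stagnates" comparable across versions, one option is a simple plateau check on the logged loss curve. The sketch below is an assumption, not how the runs were actually judged: `first_plateau`, its `window` and `min_rel_improvement` parameters, and the example values are all hypothetical, and it assumes a metric where lower is better.

```python
def first_plateau(values, window=5, min_rel_improvement=0.01):
    """Return the first index at which the metric improved by less than
    `min_rel_improvement` (relative) over the preceding `window` points.

    Assumes lower is better (e.g. a loss). Returns None if no plateau is found.
    """
    for i in range(window, len(values)):
        past, now = values[i - window], values[i]
        if past == 0:
            continue  # relative improvement is undefined at zero
        if (past - now) / abs(past) < min_rel_improvement:
            return i
    return None


if __name__ == "__main__":
    # Made-up loss values for illustration only, one point per checkpoint.
    losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.449, 0.4489, 0.4488, 0.4487]
    print(first_plateau(losses, window=3))
```

Applying the same window and threshold to every version would give a consistent definition of when a run has stopped improving.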

v0.6.1 (large structures, max_size=512)

Will take much longer to train: larger structures mean more compute per sample.
Decision: Stay in the low structure size regime for now to pretrain the model effectively.
Consider reducing max_size from 128 down to 64 residues to speed up pretraining.
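
A rough way to reason about the size trade-off is a power-law estimate of per-sample cost. The exponent is an assumption: this note does not record how Pomodoro's compute scales with structure size, so `exponent=2.0` (pairwise-style cost) is only a placeholder, and `relative_cost` is a hypothetical helper.

```python
def relative_cost(max_size, baseline_size=128, exponent=2.0):
    """Per-sample cost relative to the baseline size, assuming cost ~ size**exponent.

    The exponent is an assumption, not a measured property of the model.
    """
    return (max_size / baseline_size) ** exponent


if __name__ == "__main__":
    for size in (64, 128, 512):
        print(size, relative_cost(size, exponent=1.0), relative_cost(size, exponent=2.0))
```

Under these assumptions, 512 residues costs between 4x (linear) and 16x (quadratic) the baseline per sample, while 64 residues costs between 0.5x and 0.25x. Measured step times would replace this estimate.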

v0.6.2 (deep model, L=32)

Trains very slowly; not fully trained yet.
May outperform v0.6.0 and v0.6.3 once fully trained, but it is too early to tell.
4x layers is a significant capacity increase; needs more training time.

v0.6.3 (wide model, S=64)

Stagnates after ~3 days, similar to v0.6.0.
Better than v0.6.0 on lDDT, which likely indicates that more model capacity helps with local structure accuracy.
Global structure quality still limited by the same stagnation behavior.
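
For reference, lDDT scores how well local inter-residue distances are preserved, without superimposing the structures, which is why it can improve while global quality stalls. Below is a simplified sketch using one point per residue (e.g. C-alpha) with the standard 15 Å inclusion radius and 0.5/1/2/4 Å thresholds. It is not the evaluation code used for these runs: `lddt_ca` is a hypothetical name, and the full metric works on all atoms and includes extra checks omitted here.

```python
import math


def lddt_ca(pred, ref, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over one point per residue.

    pred, ref: equal-length sequences of (x, y, z) coordinates in angstroms.
    Returns the fraction of reference distances shorter than `radius` that are
    preserved in `pred`, averaged over the tolerance thresholds.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of residues")
    preserved = 0
    considered = 0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local pairs in the reference are scored
            diff = abs(math.dist(pred[i], pred[j]) - d_ref)
            considered += len(thresholds)
            preserved += sum(diff < t for t in thresholds)
    return preserved / considered if considered else 0.0


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    shifted = [(x + 5.0, y, z) for x, y, z in ref]  # rigid translation
    print(lddt_ca(ref, ref), lddt_ca(shifted, ref))
```

Because only pairs within the inclusion radius count, a model can score well here while still misplacing distant parts of the structure relative to each other.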

Conclusions & Next Steps

Reduce structure size: Move from 128 to 64 residues to speed up pretraining iterations.
Try simplified loss: Run a v0.6.3-like model (S=64) without the distance matrix constraint. Fewer loss terms may ease optimization and break the stagnation pattern.
v0.6.1 deprioritized: Large structures are premature; pretrain on smaller structures first.
v0.6.2 needs more time: Deeper model is slower but may be worth the wait; monitor for improvement.
Local vs global: v0.6.3’s lDDT advantage over v0.6.0 suggests capacity helps local accuracy, but global convergence remains a bottleneck across versions.
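
The simplified-loss experiment amounts to setting the weight of one loss term to zero. The sketch below shows that pattern only. The actual Pomodoro loss terms are not listed in this note, so `coordinate_loss` is a stand-in for the remaining terms, `distance_matrix_loss` is one plausible form of the distance matrix constraint, and all names and weights are hypothetical.

```python
import math


def coordinate_loss(pred, ref):
    """Mean squared coordinate error; a stand-in for the model's other loss terms."""
    if not ref:
        return 0.0
    return sum(math.dist(p, q) ** 2 for p, q in zip(pred, ref)) / len(ref)


def distance_matrix_loss(pred, ref):
    """Mean squared error between pairwise distances (assumed form of the constraint)."""
    n = len(ref)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = sum(
        (math.dist(pred[i], pred[j]) - math.dist(ref[i], ref[j])) ** 2
        for i, j in pairs
    )
    return total / len(pairs)


LOSS_TERMS = {
    "coord": coordinate_loss,
    "distance_matrix": distance_matrix_loss,
}


def total_loss(pred, ref, weights):
    """Weighted sum of the named terms; a weight of 0.0 skips that term entirely."""
    return sum(
        w * LOSS_TERMS[name](pred, ref) for name, w in weights.items() if w != 0.0
    )


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(total_loss(pred, ref, {"coord": 1.0, "distance_matrix": 1.0}))
    print(total_loss(pred, ref, {"coord": 1.0, "distance_matrix": 0.0}))
```

Driving the ablation through a weights mapping keeps the two runs otherwise identical, so any change in the stagnation pattern can be attributed to the removed term.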
