title: Pomodoro v0.7.x Training Plan — Ablation from v0.6.3 Baseline
tags: [journal, pomodoro, training, plan, ablation, v0.7]
created: 2026-05-04
updated: 2026-05-04
status: active
related:

Pomodoro v0.7.x Training Plan — Ablation from v0.6.3 Baseline

Baseline: v0.6.3 Config

S=64, L=8, r=0.0, max_size=128
Loss: (la + lb + lnb + lr + lc + lda + ldr + ldc) * lw

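v0.6.3 is the best so far on lDDT (local structure) but stagnates after ~3 days of training. Goal: isolate what helps by changing one thing at a time relative to v0.6.3, plus one combined run (v0.7.2) to check how the two main changes interact.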

New Versions

| Version | Changed Param(s) | Value | Rationale |
|---|---|---|---|
| v0.7.0 | `max_size` | 64 (was 128) | Smaller structures = faster iterations, more pretraining signal |
| v0.7.1 | loss | RMSD only (`la + lr + lc`) | Remove all distance losses (`lda, ldr, ldc`) and push-pull (`lb, lnb`): simpler gradient landscape, test if distance terms cause stagnation |
| v0.7.2 | `max_size` + loss | 64 + RMSD only | Combine both changes: smallest/fastest regime, best for rapid iteration during pretraining |
| v0.7.3 | `soft_normalize` | disabled | Test if `soft_normalize` at end of GT operations constrains gradient flow and contributes to stagnation |
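
The four variants can also be written as overrides on a shared baseline, which keeps each ablation to a small, reviewable diff. The following is a minimal sketch, assuming `ConfigData` and `ConfigRuntime` behave like dataclasses. Only `max_size` and `version` come from this note; `loss_terms`, `use_soft_normalize`, `OVERRIDES`, and `build_config` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-ins for the real classes in config.py; only
# max_size and version are named in this note.
@dataclass(frozen=True)
class ConfigData:
    max_size: int = 128

@dataclass(frozen=True)
class ConfigRuntime:
    version: str = "0.6.3"
    # Assumed fields, not confirmed by the source:
    loss_terms: tuple = ("la", "lb", "lnb", "lr", "lc", "lda", "ldr", "ldc")
    use_soft_normalize: bool = True

RMSD_ONLY = ("la", "lr", "lc")

# One entry per ablation: (ConfigData overrides, ConfigRuntime overrides).
OVERRIDES = {
    "0.7.0": ({"max_size": 64}, {}),
    "0.7.1": ({}, {"loss_terms": RMSD_ONLY}),
    "0.7.2": ({"max_size": 64}, {"loss_terms": RMSD_ONLY}),
    "0.7.3": ({}, {"use_soft_normalize": False}),
}

def build_config(version):
    """Return (ConfigData, ConfigRuntime) for one ablation version."""
    data_over, runtime_over = OVERRIDES[version]
    data = replace(ConfigData(), **data_over)
    runtime = replace(ConfigRuntime(), version=version, **runtime_over)
    return data, runtime
```

Keeping the overrides in one table makes it easy to confirm that v0.7.2 is exactly the union of v0.7.0 and v0.7.1, and that v0.7.3 touches nothing else.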

Per-Version Config Changes

v0.7.0 — Smaller Structures

ConfigData.max_size: 128 → 64
ConfigRuntime.version: "0.7.0"
Everything else same as v0.6.3

v0.7.1 — Simplified Loss (RMSD Only)

ConfigRuntime.version: "0.7.1"
In main.py:209, change: loss = (la + lb + lnb + lr + lc + lda + ldr + ldc) * lw → loss = (la + lr + lc) * lw
Still compute and log all loss components for monitoring; only exclude the distance and push-pull terms from the backpropagated loss
Everything else same as v0.6.3

v0.7.2 — Smaller Structures + Simplified Loss

ConfigData.max_size: 128 → 64
ConfigRuntime.version: "0.7.2"
Same loss simplification as v0.7.1

v0.7.3 — No soft_normalize

ConfigRuntime.version: "0.7.3"
Remove soft_normalize calls at the end of GT operations in model.py:
- VectorTrack (L60): v = soft_normalize(v, dim=1) → v (identity)
- ScalarTrack (L140-141): Q = soft_normalize(Q, dim=2), K = soft_normalize(K, dim=2) → remove
- VectorTrack (L214-215, L220): same pattern → remove
- BootstrapVectorState (L340): pz = soft_normalize(pz, dim=1) → pz
- GeometryDecoderModel (L428-429): Q, K normalize → remove
- GeometryDecoderModel (L488): u = soft_normalize(u, dim=1) → u
Alternative: make soft_normalize a no-op via a config flag rather than deleting the calls, so it is easy to re-enable.
Everything else same as v0.6.3
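
The config-flag alternative could look like the sketch below. It assumes `soft_normalize` is called as `soft_normalize(x, dim=...)`, as in the call sites listed above; `make_normalizer` and the `enabled` flag are hypothetical names, not part of the current code.

```python
def make_normalizer(normalize_fn, enabled=True):
    """Return normalize_fn when enabled, else an identity with the same call shape."""
    if enabled:
        return normalize_fn

    def identity(x, dim=None):
        # No-op replacement: ignores dim and returns the input unchanged.
        return x

    return identity

# Hypothetical use inside model.py (the flag name is an assumption):
#   norm = make_normalizer(soft_normalize, enabled=cfg.use_soft_normalize)
#   v = norm(v, dim=1)
```

With this shape, every call site keeps a single line and v0.7.3 differs from v0.6.3 only in the flag value, which makes the diff against the baseline easier to audit than six separate deletions.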

Implementation Steps

Create v0.7.0 worktree from v0.6.3 branch:
```
cd models/pomodoro/pomodoro
git branch v0.7.0 v0.6.3
git worktree add ../v0.7.0 v0.7.0
```
Edit config.py: max_size=64, version="0.7.0". Commit + push.
Create v0.7.1 worktree from v0.6.3 branch:
```
git branch v0.7.1 v0.6.3
git worktree add ../v0.7.1 v0.7.1
```
Edit config.py: version="0.7.1". Edit main.py:209: simplify loss to (la + lr + lc) * lw. Still log all components. Commit + push.
Create v0.7.2 worktree from v0.7.1 branch (inherits loss simplification):
```
git branch v0.7.2 v0.7.1
git worktree add ../v0.7.2 v0.7.2
```
Edit config.py: max_size=64, version="0.7.2". Commit + push.
Create v0.7.3 worktree from v0.6.3 branch:
```
git branch v0.7.3 v0.6.3
git worktree add ../v0.7.3 v0.7.3
```
Edit config.py: version="0.7.3". Edit model.py: remove or disable soft_normalize calls at end of GT operations. Commit + push.
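
The branch parentage in the steps above is the part most easily gotten wrong (v0.7.2 branches from v0.7.1, everything else from v0.6.3). A small dry-run sketch that prints the same git commands from one mapping; `PARENTS`, `worktree_commands`, and `plan` are illustrative names, and nothing is executed:

```python
# Parent branch for each new version, as listed in the steps above.
PARENTS = {
    "v0.7.0": "v0.6.3",
    "v0.7.1": "v0.6.3",
    "v0.7.2": "v0.7.1",  # inherits the loss simplification
    "v0.7.3": "v0.6.3",
}

def worktree_commands(version, parent):
    """Return the two git commands that create a branch and its worktree."""
    return [
        ["git", "branch", version, parent],
        ["git", "worktree", "add", f"../{version}", version],
    ]

def plan():
    """Flatten the commands for all versions, parents before children."""
    cmds = []
    for version, parent in PARENTS.items():
        cmds.extend(worktree_commands(version, parent))
    return cmds

if __name__ == "__main__":
    for cmd in plan():
        print(" ".join(cmd))  # dry run: print only, do not execute
```

The config and code edits for each version still have to be made by hand after the worktree exists.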

Loss Simplification Detail

Current loss in main.py:209:

loss = (la + lb + lnb + lr + lc + lda + ldr + ldc) * lw

Simplified (v0.7.1, v0.7.2):

loss = (la + lr + lc) * lw

Removed:

lda — atom distance matrix loss (O(N²) memory)
ldr — residue distance matrix loss
ldc — chain distance matrix loss
lb — bonded push-pull loss
lnb — non-bonded push-pull loss

All components still computed and logged for observability — only the backward gradient path changes.
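
One way to keep logging complete while changing only the gradient path is to separate "terms that are summed" from "terms that are reported". This is a minimal sketch, not the code in main.py: `combine_losses`, `ALL_TERMS`, and `RMSD_TERMS` are hypothetical names, and each component is assumed to be a scalar (a Python float or a 0-dim tensor).

```python
ALL_TERMS = ("la", "lb", "lnb", "lr", "lc", "lda", "ldr", "ldc")
RMSD_TERMS = ("la", "lr", "lc")

def combine_losses(components, active_terms, lw):
    """Sum only the active terms for backprop; report every term for logging.

    components: dict mapping term name -> scalar (float or 0-dim tensor).
    Returns (loss, log), where log holds plain floats for all terms.
    """
    missing = [k for k in ALL_TERMS if k not in components]
    if missing:
        raise KeyError(f"missing loss components: {missing}")
    loss = sum(components[k] for k in active_terms) * lw
    # float() yields plain Python numbers, so the log carries no gradient path.
    log = {k: float(components[k]) for k in ALL_TERMS}
    return loss, log
```

Passing `ALL_TERMS` as `active_terms` reproduces the v0.6.3 loss, and passing `RMSD_TERMS` gives the v0.7.1 and v0.7.2 loss, so the same function can serve every run in the ablation.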

What We Learn

| Comparison | Tests |
|---|---|
| v0.7.0 vs v0.6.3 | Does smaller structure size break stagnation? |
| v0.7.1 vs v0.6.3 | Does removing distance/push-pull losses break stagnation? |
| v0.7.2 vs v0.7.0 & v0.7.1 | Are the two effects independent or synergistic? |
| v0.7.3 vs v0.6.3 | Does removing soft_normalize break stagnation? (tests if GT output normalization constrains gradient flow) |
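
The "independent or synergistic" question is a 2x2 comparison that also needs the v0.6.3 baseline: the effects are additive when the gain of v0.7.2 over the baseline equals the sum of the individual gains. A minimal sketch of that arithmetic, assuming one scalar metric per run (for example lDDT at a matched training budget); `interaction` is an illustrative helper, not existing code:

```python
def interaction(base, only_a, only_b, both):
    """Interaction term of a 2x2 ablation on any scalar metric.

    base   -> v0.6.3 (neither change)
    only_a -> v0.7.0 (max_size=64 only)
    only_b -> v0.7.1 (RMSD-only loss)
    both   -> v0.7.2 (both changes)
    """
    effect_a = only_a - base
    effect_b = only_b - base
    combined = both - base
    # Zero means the effects add up; for a higher-is-better metric, a
    # positive value means the combination gains more than the sum of its parts.
    return combined - (effect_a + effect_b)
```

With a single run per configuration, a small nonzero value cannot be distinguished from run-to-run noise, so the result is best read as a direction rather than a measurement.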
