title: EDM Noise Exploration — Session Handoff
tags: [journal, edm, pomodoro, noise, experiment, multiscale]
created: 2024-04-23
updated: 2024-04-23
status: active
related:
- “edm-pomodoro-algorithm-review”
- “edm-algorithm-review”
- “diffusionv2-algorithm”
- “edm-pomodoro-session”
EDM Noise Exploration — Session Handoff
What This Session Did
Built a noise exploration notebook to empirically measure how much noise lands at each structural level (atom, residue, chain) under different noise weighting strategies. The goal: understand why residues barely move during sampling, and find better noise weights.
Notebook: models/pomodoro/workspace/pomodoro/edm/explore_noise.py
Output PDBs: models/pomodoro/workspace/pomodoro/edm/data/noise_*.pdb (open in PyMOL/ChimeraX for visual inspection)
Key Empirical Results (32-residue single-chain fragment, 253 atoms)
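| z | sigma | flat_r | ms_a | ms_r | ms_r/a | rb_a | rb_r | rb_r/a |
|---|---|---|---|---|---|---|---|---|
| -3.0 | 0.05 | 0.02 | 0.05 | 0.04 | 0.93 | 0.05 | 0.04 | 0.76 |
| -2.0 | 0.24 | 0.09 | 0.20 | 0.19 | 0.93 | 0.21 | 0.16 | 0.76 |
| -1.0 | 1.08 | 0.41 | 0.91 | 0.84 | 0.93 | 0.96 | 0.73 | 0.76 |
| 0.0 | 4.82 | 1.83 | 4.07 | 3.77 | 0.93 | 4.30 | 3.28 | 0.76 |
| 1.0 | 21.60 | 8.21 | 18.22 | 16.88 | 0.93 | 19.28 | 14.70 | 0.76 |
| 2.0 | 96.79 | 36.78 | 81.67 | 75.67 | 0.93 | 86.41 | 65.89 | 0.76 |
| 3.0 | 433.80 | 164.82 | 366.03 | 339.13 | 0.93 | 387.24 | 295.32 | 0.76 |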
- flat = isotropic noise (baseline), flat_r = residue std under flat noise
- ms = multiscale noise (current weights: w_c=8.2, w_r=1.6, w_a=1.3)
- rb = rebalanced (w_c=1.0, w_r=4.0, w_a=4.0)
- ms_r/a, rb_r/a = ratio of residue displacement to atom displacement
Interpretation
- Ratios are constant across all z — purely determined by weight decomposition, not sigma
- ms_r/a = 0.93: 93% of atom-level displacement is shared at the residue level — residues move almost entirely as rigid bodies with the chain. The model barely trains on residue-relative rearrangement.
- rb_r/a = 0.76: better, but still means 76% of displacement is correlated. Chain noise still dominates.
- Flat noise gives flat_r/a ≈ 0.38, close to the 1/sqrt(n) expected from averaging independent noise over the n ≈ 8 atoms in a residue (253/32 ≈ 7.9 gives ≈ 0.36); this is just the geometry of averaging over atoms.
- Typical inter-residue distance: ~3.8A. At z=0, ms_r=3.77A, just enough to scramble residue positions, BUT it’s mostly chain-correlated, so residues move together, not independently.
Physical reference
- Atom-atom distance: ~1.4A
- Residue-residue (Cα-Cα): ~3.8A
- For residue positions to be fully noised (independent rearrangement), residue-level relative displacement should be comparable to ~3.8A
What Needs To Happen Next
1. Try noise weights that give lower residue-to-atom ratio
Goal: a residue-to-atom ratio (r/a) of ≈ 0.5 or less, so residue-relative displacement is meaningfully independent of atom displacement.
Try e.g.:
- w_c=3.0, w_r=8.0, w_a=1.0: big chain shift + big residue-relative, small atom jitter
- Or w_c=2.0, w_r=6.0, w_a=2.0
Action: Add these to add_noise calls in the notebook, run, check the rb_r/a ratio drops.
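Before running the notebook, the candidate weights can be sanity-checked on paper. The sketch below is not the notebook's add_noise; it assumes the chain, residue, and atom components are independent with variance fractions w_c/w_sum, w_r/w_sum, and w_a/w_sum (the same decomposition item 2 uses for σ_r). The function name predicted_ratios and the optional atoms_per_residue argument are hypothetical.

```python
import math

def predicted_ratios(w_c, w_r, w_a, atoms_per_residue=None):
    """Return (residue/atom std ratio, residue-relative share).

    Assumes independent chain, residue, and atom noise components whose
    variances are w_c/w_sum, w_r/w_sum, and w_a/w_sum of sigma^2.
    """
    w_sum = w_c + w_r + w_a
    # A residue centroid keeps 1/n of the atom-level variance when it
    # averages n independent atom draws; ignored if n is not given.
    leftover = w_a / atoms_per_residue if atoms_per_residue else 0.0
    r_over_a = math.sqrt((w_c + w_r + leftover) / w_sum)
    # Share of the combined chain+residue std that comes from the
    # residue component alone (i.e. is not shared with the chain).
    residue_relative = math.sqrt(w_r / (w_c + w_r))
    return r_over_a, residue_relative

candidates = {
    "ms (current)": (8.2, 1.6, 1.3),
    "rb": (1.0, 4.0, 4.0),
    "candidate A": (3.0, 8.0, 1.0),
    "candidate B": (2.0, 6.0, 2.0),
}
for name, weights in candidates.items():
    r_over_a, share = predicted_ratios(*weights)
    print(f"{name:14s} r/a={r_over_a:.2f}  residue-relative share={share:.2f}")
```

Under this assumed model the ms and rb weights come out at about 0.94 and 0.75, near the measured 0.93 and 0.76. If the model holds, r/a is sqrt((w_c + w_r)/w_sum), so it only falls when w_a grows relative to w_c + w_r, and both candidates above would land higher than 0.76. The second number (residue-relative share) may be the better thing to track for "residues move independently of the chain". If the notebook disagrees with these predictions, the assumption about add_noise is the first thing to revisit.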
2. Consider whether preconditioning should use per-level sigma
Currently c_skip(σ), c_out(σ) etc. use the global σ for all levels. But effective noise at residue level is σ_r = sqrt(w_r/w_sum) * σ, not σ. This means:
- c_skip(σ) * Xr_noised is too small (thinks there’s more noise than there is)
- c_out(σ) * model_output is too large (over-weights network)
Action: In edm_denoise, try c_skip(σ_r) for residue level, c_skip(σ_c) for chain level.
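A minimal sketch of how far the coefficients move when σ_r replaces σ. It uses the standard EDM preconditioning formulas, not the actual edm_denoise code. SIGMA_DATA = 16 is an assumption (the notes do not state it), though it does reproduce the table's sigma column as 16·exp(-1.2 + 1.5·z). Using the same sigma_data at every level is a further assumption, and the helper names are hypothetical.

```python
import numpy as np

SIGMA_DATA = 16.0  # assumption: not stated in the notes

def edm_coeffs(sigma, sigma_data=SIGMA_DATA):
    """Standard EDM preconditioning coefficients (c_skip, c_out, c_in)."""
    s2 = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / s2
    c_out = sigma * sigma_data / np.sqrt(s2)
    c_in = 1.0 / np.sqrt(s2)
    return c_skip, c_out, c_in

def level_sigma(sigma, w_level, w_sum):
    """Effective noise std at one level: sqrt(w_level / w_sum) * sigma."""
    return np.sqrt(w_level / w_sum) * sigma

# Current multiscale weights from the table legend.
w_c, w_r, w_a = 8.2, 1.6, 1.3
w_sum = w_c + w_r + w_a

sigma = 4.82  # z = 0 row of the table
sigma_r = level_sigma(sigma, w_r, w_sum)
sigma_c = level_sigma(sigma, w_c, w_sum)

for label, s in [("global", sigma), ("chain", sigma_c), ("residue", sigma_r)]:
    c_skip, c_out, _ = edm_coeffs(s)
    print(f"{label:8s} sigma={s:6.2f}  c_skip={c_skip:.3f}  c_out={c_out:.3f}")
```

Since c_skip decreases and c_out increases with σ, plugging in the smaller σ_r raises c_skip and lowers c_out, which is the direction of the mismatch described above.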
3. Use multi-scale denoised output in sampling
edm_sample currently discards Xr_hat and Xc_hat. If the model learns to denoise residue positions, those outputs should inform the sampling trajectory.
Action: Broadcast residue/chain corrections onto atoms in the Euler step.
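One possible shape for this, as a sketch only. It assumes Mr is a one-hot (N_atoms, N_res) membership matrix and that Xr_hat holds predicted residue centroids. Neither is confirmed by these notes, and apply_residue_correction, alpha, and the toy data are hypothetical. The Euler update is the plain first-order EDM step.

```python
import numpy as np

def euler_step(x, x_denoised, sigma, sigma_next):
    """First-order EDM step from noise level sigma to sigma_next."""
    d = (x - x_denoised) / sigma
    return x + (sigma_next - sigma) * d

def apply_residue_correction(X_hat, Xr_hat, Mr, alpha=1.0):
    """Shift each residue's atoms so its centroid moves toward Xr_hat.

    X_hat:  (N_atoms, 3) denoised atom positions
    Xr_hat: (N_res, 3)   denoised residue centroids
    Mr:     (N_atoms, N_res) one-hot atom-to-residue membership (assumed)
    alpha:  0 ignores Xr_hat, 1 matches the centroids exactly
    """
    counts = Mr.sum(axis=0)                        # atoms per residue
    centroids = (Mr.T @ X_hat) / counts[:, None]   # (N_res, 3)
    delta_r = Xr_hat - centroids                   # per-residue correction
    return X_hat + alpha * (Mr @ delta_r)          # broadcast onto atoms

# Toy data: 9 atoms in 3 residues.
rng = np.random.default_rng(0)
res_index = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
Mr = np.eye(3)[res_index]
X = rng.normal(size=(9, 3)) * 5.0      # current noisy sample
X_hat = rng.normal(size=(9, 3))        # stand-in for the atom-level output
Xr_hat = rng.normal(size=(3, 3))       # stand-in for the residue-level output

X_hat_corr = apply_residue_correction(X_hat, Xr_hat, Mr, alpha=1.0)
X_next = euler_step(X, X_hat_corr, sigma=4.82, sigma_next=3.0)
```

The same pattern would apply to Xc_hat with a chain membership matrix. An alpha below 1 blends the two predictions instead of letting the residue output override the atom output; which works better is an open question here.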
4. Add random augmentation + centering in sampling
Boltz does center + random rotation at every sampling step. Pomodoro doesn’t. Small rotational biases accumulate over 50+ steps.
Action: Add centering + random SO(3) rotation between Euler steps.
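A NumPy-only sketch of the augmentation step, assuming coordinates are an (N_atoms, 3) array with no padding or mask. It is not taken from Boltz or from edm_sample; random_rotation and center_and_rotate are hypothetical names. The rotation is drawn by QR-decomposing a Gaussian matrix and fixing signs so the result is a proper rotation.

```python
import numpy as np

def random_rotation(rng):
    """Draw a uniformly random 3x3 rotation matrix (det = +1)."""
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    # Fix column signs so the draw is uniform over orthogonal matrices.
    Q = Q * np.sign(np.diag(R))
    # Flip one axis if needed to land in SO(3) rather than a reflection.
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def center_and_rotate(X, rng):
    """Remove the centroid, then apply a random rotation about the origin."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc @ random_rotation(rng).T

rng = np.random.default_rng(0)
X = rng.normal(size=(253, 3)) * 10.0 + np.array([5.0, -3.0, 2.0])
X_aug = center_and_rotate(X, rng)
```

In a sampling loop this would be called on the current sample before each denoiser call, so the network never sees the same global frame twice.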
Existing Algorithm Reviews (for full detail)
- edm-pomodoro-algorithm-review — 8 pitfalls with severity ranking, includes schedule/training σ mismatch
- edm-algorithm-review — 9 pitfalls, older version
- edm-pomodoro-session — implementation notes from initial EDM integration session
Notebook Design Notes
- add_noise(X0, Mr, Mc, C, sigma, w_c=None): pass w_c=None for flat/isotropic noise, otherwise pass weights
- level_sigmas: measures actual displacement std at atom/residue/chain levels from a noised structure
- superpose: SVD alignment with <3-point guard (returns mean-shift for degenerate cases)
- save_pdb / save_pdb_residue: write multi-MODEL PDBs (MODEL 0 = GT, rest = noised frames)
- Single-chain fragments: Mrc is (N_res, 1), so chain-level std is nan (only 1 point)
- Marimo constraint: unique variable names across all cells
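For reference, a sketch of what an SVD alignment with a <3-point guard typically looks like (the Kabsch algorithm). This is not the notebook's superpose; the name superpose_sketch and its argument order are hypothetical.

```python
import numpy as np

def superpose_sketch(P, Q):
    """Rigidly align point set P (N, 3) onto Q (N, 3); returns moved P."""
    P_mean, Q_mean = P.mean(axis=0), Q.mean(axis=0)
    if len(P) < 3:
        # Rotation is under-determined: fall back to matching centroids.
        return P - P_mean + Q_mean
    Pc, Qc = P - P_mean, Q - Q_mean
    H = Pc.T @ Qc                       # 3x3 covariance between the sets
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + Q_mean

# Toy check: rotate and shift a point cloud, then align it back.
rng = np.random.default_rng(0)
Q_ref = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
P_moved = Q_ref @ Rz.T + np.array([3.0, -1.0, 2.0])
P_aligned = superpose_sketch(P_moved, Q_ref)
rmsd = np.sqrt(((P_aligned - Q_ref) ** 2).sum(axis=1).mean())
```

Whether level_sigmas should measure displacement before or after such an alignment changes what the chain-level numbers mean, so it is worth checking which one the notebook does.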