title: EDM Noise Distribution Fix — Gradient Bias Analysis
tags: [journal, edm, pomodoro, noise, training, critical]
created: 2026-05-15
updated: 2026-05-15
status: active
related:


EDM Noise Distribution Fix — Gradient Bias Analysis

Problem: Exponential Decay of Effective Training Signal

The compute_noise.mo.py notebook plotted effective_gradient = pdf(z) * lw(sigma), with z = ln(σ / sigma_data). The plot showed a strong exponential bias toward low-σ training:

  • Gradient at z = -5 (σ ≈ 0.1) is ~100× stronger than at z = -1.2 (median σ)
  • Gradient at z = 2 (σ ≈ 120) is ~10,000× weaker

This means the model gets virtually all its training gradient from fine local geometry (bond lengths, angles) and almost none from global structure (fold topology, residue rearrangements).
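The notebook quantity can be reproduced with a minimal sketch. It assumes z = ln(σ / sigma_data) and sigma_data = 16 Å. Neither is stated in this note, but both are consistent with the σ values quoted above. The function names are illustrative, not the notebook's own.

```python
import math

SIGMA_DATA = 16.0          # assumption: inferred from the quoted sigma values, not stated in the note
P_MEAN, P_STD = -1.2, 1.5  # v0.9.0 log-normal parameters


def sigma_from_z(z):
    """Map z = ln(sigma / sigma_data) back to sigma (assumed convention)."""
    return SIGMA_DATA * math.exp(z)


def pdf(z):
    """Normal density of z under the v0.9.0 sampling distribution."""
    return math.exp(-0.5 * ((z - P_MEAN) / P_STD) ** 2) / (P_STD * math.sqrt(2 * math.pi))


def lw(sigma):
    """EDM loss weight lambda(sigma)."""
    return (sigma**2 + SIGMA_DATA**2) / (sigma * SIGMA_DATA) ** 2


def effective_gradient(z):
    """The plotted quantity: pdf(z) * lw(sigma)."""
    return pdf(z) * lw(sigma_from_z(z))


# Compare the low-sigma end against the median and against the high-sigma end.
ratio_low = effective_gradient(-5.0) / effective_gradient(P_MEAN)
ratio_high = effective_gradient(-5.0) / effective_gradient(2.0)
print(f"z=-5 vs median: {ratio_low:.0f}x, z=-5 vs z=2: {ratio_high:.0f}x")
```

Under these assumptions the two ratios come out on the order of 10² and 10⁴, the same magnitudes as the bullets above.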

Key Insight: λ·c_out² = 1 Cancellation

The EDM loss weight λ(σ) is the exact reciprocal of the squared output scaling c_out², so the two cancel in the gradient:

λ(σ) · c_out² = [(σ² + σ_data²) / (σ·σ_data)²] · [(σ·σ_data)² / (σ² + σ_data²)] = 1

This means:

  • λ(σ) is NOT optional — it belongs to the denoising score matching objective and cancels c_out² to give uniform gradient norm per sample
  • The pdf * lw chart was measuring loss-value density, not gradient scale. With λ·c_out² = 1, the gradient density on F_θ over z is pdf(z) alone
  • All bias comes from the log-normal p(z) distribution, NOT from lw(σ)
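A quick numeric check of the cancellation, using the standard EDM definitions of λ(σ) and c_out. The value sigma_data = 16 is an assumption; the identity holds for any positive sigma_data.

```python
import math

SIGMA_DATA = 16.0  # assumption: any positive value gives the same result


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2"""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def c_out(sigma, sigma_data=SIGMA_DATA):
    """c_out(sigma) = sigma * sigma_data / sqrt(sigma^2 + sigma_data^2)"""
    return sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)


# Sample noise levels spanning the range discussed in this note.
sigmas = [1e-3, 0.1, 2.5, 4.8, 80.0]
products = [loss_weight(s) * c_out(s) ** 2 for s in sigmas]

for s, p in zip(sigmas, products):
    print(f"sigma={s:g}  lambda*c_out^2={p:.12f}")
```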

Fix Applied: Wider Log-Normal (v0.9.1)

| Parameter          | v0.9.0    | v0.9.1     |
|--------------------|-----------|------------|
| P_mean             | -1.2      | -1.84      |
| P_std              | 1.5       | 2.8        |
| Effective median σ | 4.8 Å     | 2.5 Å      |
| 95% range σ        | 0.25–95 Å | 0.06–110 Å |

P_mean formula: P_mean = ln(sigma_min / sigma_data) + P_std². This puts ln(sigma_min / sigma_data) exactly P_std standard deviations below P_mean, so the low-σ tail of the distribution still reaches sigma_min.
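The formula can be checked against the v0.9.1 values with a short sketch. It assumes sigma_data = 16 Å, which is not stated in this note but reproduces both P_mean and the median σ in the table.

```python
import math

SIGMA_DATA = 16.0  # assumption: not stated in the note
SIGMA_MIN = 1e-3   # from the unchanged settings
P_STD = 2.8        # v0.9.1

# P_mean = ln(sigma_min / sigma_data) + P_std^2
z_min = math.log(SIGMA_MIN / SIGMA_DATA)
p_mean = z_min + P_STD**2

# Median of a log-normal in sigma is sigma_data * exp(P_mean).
median_sigma = SIGMA_DATA * math.exp(p_mean)

# How many standard deviations below the mean sigma_min sits.
z_score_min = (z_min - p_mean) / P_STD

print(f"P_mean={p_mean:.2f}  median sigma={median_sigma:.2f}  z-score of sigma_min={z_score_min:.2f}")
```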

Why not log-uniform?

Log-uniform with λ(σ) intact would create a cubic singularity (1/σ³) at low σ, making the bias worse. Log-uniform only works if λ(σ) is dropped or redesigned. Wider log-normal is simpler and preserves the EDM objective.
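The 1/σ³ scaling can be seen by multiplying a log-uniform density per unit σ (proportional to 1/σ) by λ(σ), which behaves like 1/σ² at low σ. The sketch below measures the log-log slope of that product. It assumes sigma_data = 16 Å and uses sigma_min and sigma_max as the bounds of a hypothetical log-uniform sampler.

```python
import math

SIGMA_DATA = 16.0                  # assumption: not stated in the note
SIGMA_MIN, SIGMA_MAX = 1e-3, 80.0  # bounds for the hypothetical log-uniform sampler


def loss_weight(sigma):
    """EDM loss weight lambda(sigma)."""
    return (sigma**2 + SIGMA_DATA**2) / (sigma * SIGMA_DATA) ** 2


def log_uniform_density(sigma):
    """Density per unit sigma when ln(sigma) is uniform on [ln sigma_min, ln sigma_max]."""
    return 1.0 / (sigma * math.log(SIGMA_MAX / SIGMA_MIN))


def weighted_density(sigma):
    """Loss-value density per unit sigma under log-uniform sampling."""
    return log_uniform_density(sigma) * loss_weight(sigma)


def loglog_slope(f, s1, s2):
    """Slope of log f against log sigma between two noise levels."""
    return (math.log(f(s2)) - math.log(f(s1))) / (math.log(s2) - math.log(s1))


slope_low = loglog_slope(weighted_density, 1e-3, 1e-2)
print(f"log-log slope at low sigma: {slope_low:.4f}")
```

A slope close to -3 corresponds to the cubic singularity described above.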

What’s NOT changed

  • lambda(σ) stays exactly as-is — it cancels c_out² perfectly
  • sigma_min = 1e-3, sigma_max = 80.0, rho = 7.0 unchanged
  • No code changes in context.py, objectives.py, or any other module
  • Only config.py values changed
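For orientation, here is a rough sketch of what the changed values look like in use. The NoiseConfig class, its field names, and sample_training_sigma are hypothetical and are not copied from config.py. The value sigma_data = 16 Å and the absence of clipping to [sigma_min, sigma_max] are also assumptions.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseConfig:
    # Hypothetical container: only the values below come from this note.
    P_mean: float = -1.84     # changed in v0.9.1
    P_std: float = 2.8        # changed in v0.9.1
    sigma_min: float = 1e-3   # unchanged
    sigma_max: float = 80.0   # unchanged
    rho: float = 7.0          # unchanged
    sigma_data: float = 16.0  # assumption: not stated in the note


def sample_training_sigma(cfg, rng):
    """Draw sigma = sigma_data * exp(z) with z ~ N(P_mean, P_std^2), no clipping."""
    z = rng.gauss(cfg.P_mean, cfg.P_std)
    return cfg.sigma_data * math.exp(z)


cfg = NoiseConfig()
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sigmas = [sample_training_sigma(cfg, rng) for _ in range(20001)]
median_sigma = statistics.median(sigmas)
print(f"empirical median sigma: {median_sigma:.2f}")
```

The empirical median should land near the 2.5 Å listed for v0.9.1.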

Implementation

  • Branch: v0.9.1 (from v0.9.0 at commit b627bce)
  • Worktree: models/pomodoro/workspace/v0.9.1/
  • Commit: v0.9.1: wider noise sampling (P_mean=-1.84, P_std=2.8)
  • Remote: pushed to origin/v0.9.1
  • Main branch also updated with same config values