title: Platform Spec
tags: [inventory, spec, platform]
created: 2026-05-22
updated: 2026-05-22
status: active
related:


Platform Spec

End-to-end contracts and cross-component workflows. Per-component specs live in inventory/specs/{component}.md.

Mission

Protein-based molecule design for therapeutic applications. Data flow: datasets -> models -> endpoints -> workflows -> website.

Three Pillars

Pillar | Domain | Models
Pesto | Interface & Interaction Analysis | pesto, pesto-screen
Carbonara | Molecule Design | carbonara, carbonara-binders, carbonara-clip
Pomodoro | Structure Generation | pomodoro, oracle, backmap

Must Do

  • Provide deployable Docker endpoints for each production model (handler + runtime + schema)
  • Support job lifecycle: QUEUED -> RUNNING -> COMPLETED/FAILED via SDK client
  • Separate code (git) from data (R2/S3): never commit weights, datasets, or large binaries
  • Track data artifacts via pantry (.datapattern + artifacts.json + pty add/push/pull/resolve)
  • Follow handler pattern: validate -> hydrate -> RUNNING -> process -> dehydrate -> COMPLETED
  • Register every endpoint’s job type in job_type_registry
  • Include pid field in every job (required by API)
  • Support backend strategy pattern: RunpodBackend (prod), LocalBackend (Docker test), NullBackend (unit test)
  • Provide SDK client (LemnaJobClient) as the single interface for all job operations
  • Validate endpoints via test_endpoint.py with MODE toggle (local/runpod)
  • Build Docker images with SSH agent forwarding for private git deps
  • Use cache-then-copy pattern for large model downloads in Dockerfiles
  • Keep workspace/ directories gitignored for experimentation; migrate mature code out
  • Maintain master justfile at repo root; per-repo justfiles for local commands
  • Use ConfigRuntime instance (not class) for endpoint model config
  • Use pantry.resolve() for reading data paths in endpoints (not kw.find_root())
  • Use Path("data")/… for writing data in endpoints
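
The handler pattern and job lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the platform's handler: the Job dataclass, the handle function, and the history field are hypothetical names, and only the status names and the required pid field come from this spec. Marking a job FAILED when validation fails is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class Job:
    """Hypothetical job record; only `pid` and the status names come from the spec."""

    pid: str
    input_data: dict[str, Any]
    status: JobStatus = JobStatus.QUEUED
    output_data: dict[str, Any] = field(default_factory=dict)
    history: list[JobStatus] = field(default_factory=list)

    def set_status(self, status: JobStatus) -> None:
        # Record every transition so the lifecycle can be inspected later.
        self.status = status
        self.history.append(status)


def handle(job: Job, process: Callable[[dict[str, Any]], dict[str, Any]]) -> Job:
    """validate -> hydrate -> RUNNING -> process -> dehydrate -> COMPLETED."""
    try:
        # validate: reject jobs that lack the required pid field
        if not job.pid:
            raise ValueError("pid is required")
        # hydrate: a real handler would fetch input files here (placeholder copy)
        inputs = dict(job.input_data)
        job.set_status(JobStatus.RUNNING)
        outputs = process(inputs)
        # dehydrate: a real handler would upload results here (placeholder copy)
        job.output_data = dict(outputs)
        job.set_status(JobStatus.COMPLETED)
    except Exception as exc:
        # Assumption: any failure, including validation, ends in FAILED.
        job.output_data = {"error": str(exc)}
        job.set_status(JobStatus.FAILED)
    return job
```

Calling handle(Job(pid="p1", input_data={"x": 2}), some_function) runs the stages in order and leaves the transitions in job.history, which makes the lifecycle easy to assert on in unit tests.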

Must Not Do

  • Run git submodule update --init --recursive
  • Modify .gitmodules
  • Store model weights, datasets, or large binaries in git
  • Import from legacy kitchenware paths (core, utils.dev)
  • Use .ipynb notebooks (use .mo.py compute-visualize split instead)
  • Hardcode data paths (use pantry.resolve or ConfigRuntime)
  • Double-wrap RunPod input (the {"input": …} wrapping in RunpodBackend.submit is intentional; callers pass the unwrapped payload)
  • Use Optional[BaseModel] on server-side FastAPI models (use Optional[Any] for JobBaseAnonymous)
  • Build Docker images without ssh-keyscan github.com in known_hosts
  • Use --no-cache-dir with uv when cache mounts are active
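
To illustrate the double-wrapping rule, here is a small sketch of the backend strategy pattern in which the backend alone adds the {"input": …} envelope. FakeRunpodBackend, this NullBackend, and submit_job are hypothetical stand-ins written for illustration; they make no network calls and do not reflect the real RunpodBackend signature. The guard in submit_job is a simple heuristic, not a platform rule.

```python
from typing import Any, Protocol


class Backend(Protocol):
    def submit(self, payload: dict[str, Any]) -> dict[str, Any]: ...


class NullBackend:
    """Unit-test stand-in: echoes the payload without running anything."""

    def submit(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"status": "COMPLETED", "request": payload}


class FakeRunpodBackend:
    """Hypothetical stand-in for RunpodBackend; performs no network call."""

    def submit(self, payload: dict[str, Any]) -> dict[str, Any]:
        # The backend wraps exactly once; callers must not wrap beforehand.
        request = {"input": payload}
        return {"status": "QUEUED", "request": request}


def submit_job(backend: Backend, payload: dict[str, Any]) -> dict[str, Any]:
    # Heuristic guard (an assumption): a payload whose only key is "input"
    # was most likely wrapped by the caller already.
    if set(payload) == {"input"}:
        raise ValueError("payload is already wrapped; pass the inner dict")
    return backend.submit(payload)
```

Because callers depend only on the submit method, swapping the backend changes where the job runs without touching the calling code.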

End-to-End Workflows

Binder Design Pipeline

pesto -> rfdiffusion -> carbonara-binders -> boltz -> openmm

  1. pesto: Analyze protein interface, identify binding site
  2. rfdiffusion: Generate backbone scaffold for target site
  3. carbonara-binders: Design binder sequences on scaffold
  4. boltz: Validate the design via structure prediction
  5. openmm: Molecular dynamics refinement

Workflow Contracts

  • Each step receives output from previous step as BinaryFile input
  • Each step uses its own SDK bucket namespace
  • Steps are chained via job output_data -> next job input_data
  • Pipeline status tracked via job status (QUEUED/RUNNING/COMPLETED/FAILED)
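
The chaining contract above (each job's output_data becomes the next job's input_data) can be sketched as below. This is an in-process illustration only: run_pipeline, make_stub, and the record layout are hypothetical, and the stub steps stand in for real endpoint calls made through the SDK.

```python
from typing import Any, Callable

Step = Callable[[dict[str, Any]], dict[str, Any]]

# Step order taken from the binder design pipeline above.
PIPELINE = ["pesto", "rfdiffusion", "carbonara-binders", "boltz", "openmm"]


def make_stub(name: str) -> Step:
    """Build a placeholder step that records its name in a trace list."""

    def stub(data: dict[str, Any]) -> dict[str, Any]:
        return {"trace": data.get("trace", []) + [name]}

    return stub


def run_pipeline(
    steps: list[tuple[str, Step]], initial_input: dict[str, Any]
) -> list[dict[str, Any]]:
    """Run steps in order, feeding each output_data into the next input_data."""
    records: list[dict[str, Any]] = []
    input_data = initial_input
    for name, step in steps:
        record: dict[str, Any] = {
            "step": name,
            "status": "RUNNING",
            "input_data": input_data,
        }
        try:
            record["output_data"] = step(input_data)
            record["status"] = "COMPLETED"
        except Exception as exc:
            # Stop at the first failure; later steps never start.
            record["status"] = "FAILED"
            record["error"] = str(exc)
            records.append(record)
            break
        records.append(record)
        input_data = record["output_data"]
    return records
```

For example, run_pipeline([(n, make_stub(n)) for n in PIPELINE], {"trace": []}) returns one record per step, each carrying its own status.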

Data Contracts

Layer | Storage | Tracking
Code | git repos | version control
Model weights | R2/S3 (outputs/vX.Y.Z/) | pantry (.datapattern + artifacts.json)
Training datasets | R2/S3 (datasets/) | pantry
Model data (per-repo) | R2/S3 (data/) | pantry
Results | R2/S3 | pantry

Dependency Chain

kitchenware -> models -> endpoints -> workflows -> website

Changes to kitchenware cascade to everything. Test the full chain when modifying it.
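
As a rough illustration of the cascade, the helper below lists what needs retesting when one link in the chain changes. The function name affected_by is hypothetical; the chain order is taken from the line above.

```python
CHAIN = ("kitchenware", "models", "endpoints", "workflows", "website")


def affected_by(component: str, chain: tuple[str, ...] = CHAIN) -> list[str]:
    """Return the changed component plus everything downstream of it."""
    # tuple.index raises ValueError for an unknown component name.
    return list(chain[chain.index(component):])
```

A change to kitchenware returns the whole chain, while a change to workflows returns only workflows and website.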

Quality Gates (Universal)

  • just format && just check passes on every repo
  • No unused imports
  • Type hints on function signatures
  • pathlib.Path for all file paths
  • Tests exist for components with side effects
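
A small sketch of code that would plausibly pass these gates: typed signature, pathlib.Path for paths, and a side effect (a file write) that is straightforward to test. The function write_result and its arguments are hypothetical and not part of any repo.

```python
from pathlib import Path


def write_result(output_dir: Path, name: str, text: str) -> Path:
    """Write text to output_dir/name, creating the directory if needed."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / name
    target.write_text(text, encoding="utf-8")
    return target
```

A test for it can point output_dir at a temporary directory and assert on the returned path's contents.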