title: Cyclic Peptide–Protein Binding Affinity Dataset Curation Plan
tags: [plan, data-scraping, dataset, cyclic-peptide, binding-affinity]
created: 2026-05-10
updated: 2026-05-27
status: active
related:

Cyclic Peptide–Protein Binding Affinity Dataset Curation Plan

Goal

Curate a dataset of cyclic peptides with measured binding affinity (or binary bind/non-bind) against protein targets, suitable for training/evaluating ML models at Lemna.

1. Background & Keyword Landscape

Core terms

cyclic peptide, macrocyclic peptide, bicyclic peptide
stapled peptide, stapled helix, hydrocarbon-stapled peptide
thioether-macrocyclic peptide, disulfide-cyclized peptide, head-to-tail cyclic peptide
peptide macrocycle, constrained peptide, conformationally constrained peptide

Binding affinity measurement terms

Kd, KD (dissociation constant)
Ki (inhibition constant)
IC50 (half-maximal inhibitory concentration)
EC50 (half-maximal effective concentration)
Ka (equilibrium association constant); kon, koff (kinetic association/dissociation rate constants)
ΔG, ΔH, ΔS (thermodynamic parameters, often from ITC)
binding / no binding (binary classification)
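
Papers often report only one of these quantities, so it helps to be able to convert between them. The sketch below uses the standard identities Kd = koff / kon and ΔG° = RT ln(Kd / 1 M). It assumes Kd is in molar units and defaults to 298.15 K; the function names are illustrative and not part of any existing pipeline.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def kd_from_rates(kon_per_M_s: float, koff_per_s: float) -> float:
    """Dissociation constant (M) from kinetic rate constants: Kd = koff / kon."""
    return koff_per_s / kon_per_M_s


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol, relative to a 1 M standard state."""
    return R_KJ * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kj: float, temp_k: float = 298.15) -> float:
    """Inverse of delta_g_from_kd: Kd (M) from a binding free energy in kJ/mol."""
    return math.exp(delta_g_kj / (R_KJ * temp_k))
```

ITC papers frequently give ΔG in kcal/mol, so a unit conversion (1 kcal = 4.184 kJ) would be needed before calling `kd_from_delta_g`.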

Assay methods

SPR (surface plasmon resonance)
ITC (isothermal titration calorimetry)
FP (fluorescence polarization)
MST (microscale thermophoresis)
BLI (bio-layer interferometry)
ELISA, AlphaScreen, TR-FRET
affinity selection-mass spectrometry (AS-MS)
phage display / mRNA display / RaPID system

Target context terms

protein-protein interaction (PPI) inhibitor/modulator
protein-targeting, receptor binding
drug target, therapeutic target

2. Existing Databases & Resources (to check and potentially integrate)

Database	Content	Relevance	URL
CPSea	2.7M cyclic peptide-receptor complexes (AFDB-derived) + CPBind subset with affinity scores (Rosetta ddG < -25, Vina < -6) + CPBind_affinity.csv	HIGH — largest resource, has computed affinity	https://github.com/YZY010418/CPSea, Zenodo
CycPeptMPDB	7,991 cyclic peptides with membrane permeability data	MEDIUM — permeability not binding, but has structures	http://cycpeptmpdb.com
PPIKB	Peptide-protein interaction structures + quantitative affinity measurements	HIGH — affinity + structure + targets	https://ppikb.duanlab.ac
BindingDB	Protein-ligand binding affinities (mostly small molecules, some peptides)	MEDIUM — filter for cyclic peptide entries	https://www.bindingdb.org
PEPBI	329 peptide-protein complexes with ΔG, ΔH, ΔS (ITC-derived)	MEDIUM — small but high-quality thermodynamic data	Nature Sci Data 2025
PepBDB / PepSet	Peptide-protein complexes from PDB	LOW-MEDIUM — structural, affinity not always present
PepBenchmark	29 canonical + 6 non-canonical peptide ML datasets	MEDIUM — includes some affinity datasets	https://github.com/ZGCI
CyclicPepedia	8,751 cyclic peptides, 59 targets	MEDIUM — broad coverage, check for affinity data	https://www.biosino.org/iMAC/cyclicpepedia
CREMP	~36,000 macrocyclic peptides with conformer ensembles (structural, not affinity)	LOW — structural only	Zenodo

3. Search Strategy with Paperclip

Phase 3A: Broad discovery searches

Use multiple keyword combinations to capture the full literature:

cyclic peptide protein binding affinity Kd
macrocyclic peptide protein target IC50 Ki
stapled peptide protein binding affinity measured
bicyclic peptide protein target SPR ITC
thioether macrocyclic peptide protein binding
disulfide cyclized peptide protein target affinity
head-to-tail cyclic peptide receptor binding
peptide-protein interaction inhibitor cyclic macrocyclic
RaPID system cyclic peptide binding affinity
mRNA display cyclic peptide protein target Kd

Phase 3B: Targeted affinity-measurement searches

Focus on specific assay types and quantitative measurements:

"cyclic peptide" protein "isothermal titration calorimetry" binding
"cyclic peptide" protein SPR "dissociation constant"
"cyclic peptide" "fluorescence polarization" protein affinity
cyclic peptide protein binding "no binding" OR "non-binder"
macrocyclic peptide selectivity "protein-protein interaction" affinity

Phase 3C: Methods & screening searches

phage display cyclic peptide binder protein target
mRNA display macrocyclic peptide protein binder affinity
OBOC cyclic peptide protein target screening
affinity selection mass spectrometry cyclic peptide protein
in vitro selection cyclic peptide de novo protein target

Phase 3D: Dataset / database papers

cyclic peptide protein interaction database dataset
peptide protein binding affinity benchmark machine learning
cyclic peptide binder design computational dataset
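
The query lists in Phases 3A to 3D are largely combinations of a peptide-class term with a measurement or assay term, so they can be generated rather than typed by hand. The sketch below only builds query strings. It makes no calls to Paperclip, whose interface is not described in this plan. The term lists are a subset of Section 1, and `build_queries` and its `context` parameter are hypothetical names.

```python
from itertools import product

PEPTIDE_TERMS = ["cyclic peptide", "macrocyclic peptide", "stapled peptide", "bicyclic peptide"]
MEASUREMENT_TERMS = ["Kd", "Ki", "IC50"]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_queries(peptide_terms, measurement_terms, context="protein binding affinity"):
    """Return one query per (peptide term, measurement term) pair, without duplicates."""
    seen = set()
    queries = []
    for pep, meas in product(peptide_terms, measurement_terms):
        query = f"{quote(pep)} {context} {meas}"
        key = query.lower()  # treat queries differing only in case as duplicates
        if key not in seen:
            seen.add(key)
            queries.append(query)
    return queries
```

Logging each generated query alongside its hit count would make it easier to see which combinations are worth keeping in later rounds.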

4. Data Extraction Pipeline

For each relevant paper found:

Fields to extract

Peptide info: sequence (one-letter if canonical, HELM notation if modified), cyclization type (disulfide, head-to-tail, thioether, stapled, lactam, other), ring size, molecular weight
Protein target info: name, UniProt ID, PDB ID (if available), protein family/class
Binding data: Kd/Ki/IC50/EC50 value, unit, assay method, assay conditions (pH, temp, buffer if available), binary bind/no-bind label if available
Structure: PDB ID of complex if available
Source: DOI, year, authors
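
One way to keep extraction consistent across papers is to fix the record schema up front. The sketch below maps the fields above onto a Python dataclass and writes records to CSV. The field names, which fields are required, and the `records_to_csv` helper are all assumptions made for illustration, not a settled schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class BindingRecord:
    # Required core fields (assumed minimum for a usable entry)
    sequence: str                       # one-letter code, or HELM if modified
    cyclization_type: str               # disulfide, head-to-tail, thioether, stapled, lactam, other
    target_name: str
    doi: str
    # Peptide info
    notation: str = "one-letter"        # or "HELM"
    ring_size: Optional[int] = None
    molecular_weight_da: Optional[float] = None
    # Protein target info
    uniprot_id: Optional[str] = None
    target_pdb_id: Optional[str] = None
    protein_family: Optional[str] = None
    # Binding data
    affinity_type: Optional[str] = None     # Kd, Ki, IC50, EC50
    affinity_value: Optional[float] = None
    affinity_unit: Optional[str] = None
    assay_method: Optional[str] = None
    assay_conditions: Optional[str] = None  # pH, temperature, buffer as free text
    binds: Optional[bool] = None            # binary label when reported
    # Structure and source
    complex_pdb_id: Optional[str] = None
    year: Optional[int] = None
    authors: Optional[str] = None


def records_to_csv(records) -> str:
    """Serialize records to CSV text; missing (None) values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BindingRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Keeping the reported value and unit as separate fields, rather than converting during extraction, preserves what the paper actually stated and leaves unit normalization to a later step.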

Classification criteria

Binder: reported Kd < 10 µM OR Ki < 10 µM OR IC50 < 10 µM (or explicitly stated as binder)
Non-binder: explicitly reported as non-binding OR Kd > 100 µM (or a similar cutoff, to be refined)
Unlabeled: values between 10 µM and 100 µM fall in neither class until the cutoff is settled
Strong binder: Kd < 100 nM
Moderate binder: 100 nM ≤ Kd < 1 µM
Weak binder: 1 µM ≤ Kd < 10 µM
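
The thresholds above can be applied mechanically once values share a unit. The sketch below converts to nM and assigns a label. Two choices in it are assumptions, not decisions recorded in this plan: boundary values (exactly 100 nM, 1 µM, or 10 µM) go to the weaker class, and values from 10 µM up to and including 100 µM get a hypothetical "ambiguous" label. The tiers are written for Kd; using the same function for Ki or IC50 is left to the caller.

```python
# Conversion factors from common concentration units to nM.
UNIT_TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "µM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration to nM; raises KeyError for unknown units."""
    return value * UNIT_TO_NM[unit]


def classify(value: float, unit: str) -> str:
    """Label a Kd-like value using the plan's draft thresholds."""
    nm = to_nanomolar(value, unit)
    if nm < 100:            # below 100 nM
        return "strong"
    if nm < 1_000:          # 100 nM up to 1 µM
        return "moderate"
    if nm < 10_000:         # 1 µM up to 10 µM
        return "weak"
    if nm <= 100_000:       # 10 µM to 100 µM: neither binder nor non-binder
        return "ambiguous"
    return "non-binder"     # above 100 µM
```

For example, `classify(5, "µM")` gives `"weak"` and `classify(1, "mM")` gives `"non-binder"`. Entries explicitly reported as binders or non-binders without a number would need a separate code path.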

5. Priority Actions & Deliverables

Step	Action	Status
1	Create `data-scraping/` folder	DONE
2	Broad keyword research (web)	DONE
3	Draft plan for review	DONE
4	Execute Paperclip searches (Phase 3A-D)	TODO
5	Check existing databases (CPSea, PPIKB, BindingDB, CyclicPepedia) for downloadable datasets	TODO
6	Extract data from top papers	TODO
7	Normalize & deduplicate entries	TODO
8	Save curated dataset as CSV/JSON in `data-scraping/`	TODO
9	Summary statistics & quality report	TODO
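
Step 7 (normalize and deduplicate) could look roughly like the sketch below. It treats two entries as duplicates when they share a peptide sequence, a target UniProt ID, and a measurement type. It keeps the first entry and records the DOIs of later ones. Both the key and the keep-first policy are assumptions to be revisited, and the dictionary keys are hypothetical field names.

```python
def normalize_key(entry: dict) -> tuple:
    """Build a comparison key for one entry.

    The sequence is only stripped of whitespace, not upper-cased, because
    HELM is case-sensitive and lowercase one-letter codes can denote D-residues.
    """
    return (
        "".join(entry["sequence"].split()),
        entry["uniprot_id"].strip().upper(),
        entry["affinity_type"].strip().lower(),
    )


def deduplicate(entries: list) -> list:
    """Keep the first entry per key and collect the DOIs of all its duplicates."""
    merged = {}
    for entry in entries:
        key = normalize_key(entry)
        if key not in merged:
            merged[key] = {**entry, "dois": [entry["doi"]]}
        elif entry["doi"] not in merged[key]["dois"]:
            merged[key]["dois"].append(entry["doi"])
    return list(merged.values())
```

When duplicates report different affinity values, for instance from different assays, keeping only the first one discards information. A later version might retain all values and flag the disagreement in the quality report.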

6. Open Questions for User

Binding affinity threshold: Use 10 µM for binder/non-binder split? Or different cutoff?
Cyclization types: Include all (disulfide, head-to-tail, thioether, stapled, lactam, etc.) or focus on specific classes?
Non-canonical amino acids: Include peptides with N-methylated, D-amino acids, other modifications?
Binary vs. regression: Do you want both continuous (Kd) and binary (bind/no-bind) labels, or prioritize one?
Synthetic vs. natural: Include natural cyclic peptide products (cyclotides, etc.) or only de novo designed ones?
Dataset size target: Are you aiming for hundreds, thousands, or tens of thousands of entries?
Primary use case: Model training (need large dataset)? Benchmarking (need high-quality, diverse)? Both?
