Boltz-2 and the New Frontier of Structure-Grounded Binding Affinity Prediction
Artificial intelligence has rapidly transformed biomolecular modeling, yet one capability has remained persistently out of reach: accurate, scalable prediction of protein–ligand binding affinities. Boltz-2, introduced in a recent preprint, represents the most technically complete attempt to bridge this gap. It combines high-fidelity structure prediction with affinity estimation that approaches free-energy-perturbation (FEP) accuracy at a computational cost orders of magnitude lower.
Why Binding Affinity Prediction Has Remained a Hard Problem
Deep learning–based structure prediction models (e.g., AlphaFold2/3, Boltz-1, Chai-1) have rewritten expectations of structural accuracy, yet they still do not reliably predict binding free energies, the key determinant of small-molecule potency.
Docking and ML scoring functions are fast but imprecise.
FEP, the most accurate computational method available, is far too expensive for routine use.
Boltz-2 is the first AI model to meaningfully narrow this accuracy gap with FEP while retaining screening-scale efficiency. The model’s improvements stem from both architectural extensions and unusually extensive biochemical data curation.
A Structure Model Built on Ensembles, Not Static Snapshots
A defining innovation in Boltz-2 is the shift from static PDB structures to structural ensembles encompassing:
- NMR conformational frames
- Molecular dynamics (MISATO, ATLAS, mdCATH)
- Distillation from AlphaFold2 and Boltz-1 across multiple molecular modalities
By exposing the model to local fluctuations, side-chain dynamics, and subtle conformational heterogeneity, the training process allows Boltz-2 to learn more realistic interaction geometries.
Benchmarks in the paper show MD-conditioned Boltz-2 matching or outperforming recent dynamics-focused models such as AlphaFlow and BioEmu on RMSF (root-mean-square fluctuation) correlation metrics.
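RMSF summarizes how far each atom or residue moves around its average position across an ensemble, and the comparison above scores a model by how well its RMSF profile correlates with a reference ensemble. The NumPy sketch below shows the basic computation under stated assumptions: frames are already superposed, one coordinate per residue (e.g. Cα) is used, and a plain Pearson correlation is the metric. The function names are illustrative, and the paper's exact protocol (alignment, atom selection, averaging over targets) may differ.

```python
import numpy as np

def rmsf(ensemble: np.ndarray) -> np.ndarray:
    """Per-atom RMSF for an ensemble of shape (n_frames, n_atoms, 3).

    Assumes frames are already superposed on a common reference;
    the alignment step is omitted to keep the sketch short.
    """
    mean_structure = ensemble.mean(axis=0)                      # (n_atoms, 3)
    sq_dev = ((ensemble - mean_structure) ** 2).sum(axis=-1)    # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))                         # (n_atoms,)

def rmsf_pearson(pred: np.ndarray, ref: np.ndarray) -> float:
    """Pearson correlation between predicted and reference RMSF profiles."""
    return float(np.corrcoef(rmsf(pred), rmsf(ref))[0, 1])
```

A profile that tracks which residues are rigid and which are floppy scores near 1, even if the absolute fluctuation amplitudes are off, which is why correlation is a natural choice for this comparison.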
Controllability: Bringing AI Structure Prediction Closer to Experimental Practice
Boltz-2 introduces an advanced suite of controllability features rarely seen in previous models:
- Method conditioning (X-ray/NMR/MD) to bias toward specific experimental characteristics
- Pocket and distance constraints, enabling hypothesis-driven exploration
- Soft and hard template conditioning, including support for multimer templates
- Boltz-Steering, a physics-inspired potential for reducing steric clashes and improving physical plausibility
These features allow experimentalists and computational chemists to integrate prior knowledge—an essential step toward mechanistic modeling rather than blind prediction.
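The article describes Boltz-Steering only as a physics-inspired potential that reduces clashes; its actual functional form is not given here. The sketch below is a hypothetical illustration of the general idea: a quadratic penalty on atom pairs that sit closer than a minimum distance, plus a flat-bottom penalty on violated upper-bound distance constraints (a stand-in for pocket or distance conditioning), both minimized by plain gradient descent on fixed coordinates. In a diffusion model such potentials would instead nudge the sampler during generation. The names `clash_energy_and_grad`, `distance_restraint`, and `steer`, and every threshold, are assumptions rather than Boltz-2's API.

```python
import numpy as np

def clash_energy_and_grad(x: np.ndarray, min_dist: float = 2.0):
    """Quadratic penalty on atom pairs closer than min_dist (hypothetical form).

    x has shape (n_atoms, 3). Returns (energy, gradient with respect to x).
    Uses O(n^2) memory, which is fine for a toy example.
    """
    diff = x[:, None, :] - x[None, :, :]           # (n, n, 3): x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)           # (n, n) pairwise distances
    iu = np.triu_indices(len(x), k=1)              # each unordered pair once
    d = dist[iu]
    overlap = np.clip(min_dist - d, 0.0, None)     # > 0 only for clashing pairs
    energy = float((overlap ** 2).sum())
    # dE/dx_i = -2 * overlap * (x_i - x_j) / d, opposite sign for x_j
    coef = -2.0 * overlap / np.maximum(d, 1e-12)
    pair_grad = coef[:, None] * diff[iu]           # (n_pairs, 3)
    grad = np.zeros_like(x)
    np.add.at(grad, iu[0], pair_grad)
    np.add.at(grad, iu[1], -pair_grad)
    return energy, grad

def distance_restraint(x: np.ndarray, i: int, j: int, max_dist: float):
    """Flat-bottom upper bound: zero until |x_i - x_j| exceeds max_dist."""
    diff = x[i] - x[j]
    d = float(np.linalg.norm(diff))
    excess = max(d - max_dist, 0.0)
    grad = np.zeros_like(x)
    if excess > 0.0:
        g = 2.0 * excess * diff / d                # dE/dx_i for E = excess^2
        grad[i] += g
        grad[j] -= g
    return excess ** 2, grad

def steer(x, restraints, steps=200, lr=0.05, min_dist=2.0):
    """Gradient descent on clash + restraint energies.

    restraints: list of (i, j, max_dist) upper-bound distance constraints.
    """
    x = np.array(x, dtype=float)                   # work on a float copy
    for _ in range(steps):
        _, grad = clash_energy_and_grad(x, min_dist)
        for i, j, max_dist in restraints:
            grad += distance_restraint(x, i, j, max_dist)[1]
        x -= lr * grad
    return x
```

For example, two atoms placed 1 Å apart are pushed out to roughly `min_dist`, while two atoms 10 Å apart with a `(0, 1, 5.0)` restraint are pulled in to about 5 Å.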
A Dedicated Affinity Module Grounded in Structural Representations
Boltz-2 separates structure generation from affinity estimation via a specialised PairFormer module operating on protein–ligand and intraligand representations.
The module outputs two key quantities:
- A binding likelihood, i.e. a classifier's probability that the ligand is a binder
- A continuous affinity value that combines Ki, Kd, IC₅₀, EC₅₀, XC₅₀ and related measurements on a common log10(µM) scale
The latter is not a literal Ki or IC₅₀ but an “assay-agnostic binding strength” proxy designed for ranking compounds within and across chemotypes.
This separation of structure and affinity heads prevents destructive interference during training while still enabling tight coupling through shared structural embeddings.
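Because affinities are expressed on a log10(µM) scale, converting raw measurements is a one-line calculation, and a Kd-like value can be turned into an approximate binding free energy with ΔG = RT ln(Kd / 1 M). The sketch below assumes a small hypothetical unit table and, for the free-energy step, treats the value as if it were a dissociation constant. As noted above, the model's output is not strictly a Kd, so the ΔG conversion is only a rough interpretive aid. On this scale lower values mean tighter binding, and pK = 6 - log10(µM).

```python
import math

# Hypothetical unit table: multiply a value in this unit by the factor to get µM.
TO_MICROMOLAR = {"M": 1e6, "mM": 1e3, "uM": 1.0, "nM": 1e-3, "pM": 1e-6}
GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def to_log_um(value: float, unit: str) -> float:
    """Convert a Ki/Kd/IC50-style concentration to log10(µM)."""
    return math.log10(value * TO_MICROMOLAR[unit])

def log_um_to_pk(log_um: float) -> float:
    """pK (e.g. pKd or pIC50) = -log10(M) = 6 - log10(µM)."""
    return 6.0 - log_um

def log_um_to_dg(log_um: float, temperature_k: float = 298.15) -> float:
    """Approximate dG in kcal/mol, *assuming* the value behaves like a Kd:
    dG = RT ln(Kd / 1 M). This is only a rough estimate for IC50-type data."""
    kd_molar = 10.0 ** (log_um - 6.0)              # µM -> M
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)
```

For example, 10 nM maps to -2 on the log10(µM) scale (pK = 8), which corresponds to roughly -10.9 kcal/mol at 298 K under the Kd assumption.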
Data Curation at Unprecedented Scale and Cleanliness
Affinity prediction is fundamentally limited by data quality, and naïvely combining PubChem, ChEMBL, or BindingDB introduces assay noise, protocol heterogeneity, and structural inconsistencies.
Boltz-2 addresses this with a multi-stage curation workflow that includes:
- filtering for reliable, single-protein, biochemically robust assays
- standardizing all continuous values to log-µM
- generating synthetic decoys to reduce HTS bias
- PAINS removal, size filtering, and ipTM-based structural quality gating
The final dataset contains millions of high-quality binders, non-binders, and quantitative measurements across thousands of targets.
This is arguably one of the largest and most rigorously curated biochemical datasets ever used for training a unified affinity model.
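The curation steps can be pictured as a chain of filters applied to each measurement before its value is standardized. The sketch below is hypothetical: the `Measurement` fields, the heavy-atom limit, and the ipTM cutoff are placeholder assumptions rather than the authors' settings. PAINS flags are assumed to be precomputed by a separate substructure filter, and synthetic decoy generation is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class Measurement:
    # Hypothetical fields; ChEMBL, BindingDB and PubChem each use their own schemas.
    target_count: int   # number of distinct proteins the assay measures
    value: float        # reported concentration (Ki, Kd, IC50, ...)
    unit: str           # e.g. "nM"
    heavy_atoms: int    # ligand size
    is_pains: bool      # precomputed PAINS substructure flag
    iptm: float         # interface confidence of the predicted complex

TO_MICROMOLAR = {"M": 1e6, "mM": 1e3, "uM": 1.0, "nM": 1e-3, "pM": 1e-6}

def curate(records, max_heavy_atoms=50, min_iptm=0.75):
    """Yield (record, log10 µM) for records that pass every filter.

    Thresholds are placeholders chosen for illustration only.
    """
    for r in records:
        if r.target_count != 1:                    # single-protein assays only
            continue
        if r.unit not in TO_MICROMOLAR or r.value <= 0:
            continue                               # unparseable or non-physical
        if r.is_pains or r.heavy_atoms > max_heavy_atoms:
            continue                               # PAINS removal, size filter
        if r.iptm < min_iptm:
            continue                               # structural quality gate
        yield r, math.log10(r.value * TO_MICROMOLAR[r.unit])
```

Writing the pipeline as a generator keeps each filter independent, so a stage can be tightened or dropped without touching the others.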
Performance: Approaching FEP Accuracy at Three Orders of Magnitude Less Compute
Across public benchmarks, Boltz-2 consistently outperforms all non-FEP baselines, including physics-based scoring functions (MM/PBSA, FMO), classical ML methods, and docking.
Highlights from the FEP+ benchmark suite (the OpenFE subset and a 4-target subset):
- Pearson R = 0.66, approaching relative FEP methods
- More than 1000× faster than FEP
- Outperforms all ML and fast-physics baselines across assays
On CASP16—a blind international competition—Boltz-2 surpassed every top-scoring entry without any fine-tuning, an exceptional demonstration of generalisation.
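Benchmark correlations such as the Pearson R above are often computed per target and then averaged, because absolute offsets between predicted and measured affinities can differ from one protein system to another. The sketch below assumes that convention (the exact aggregation behind the reported figure is not stated here) and implements Pearson's r directly in plain Python. `per_target_pearson` and its input format are illustrative.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def per_target_pearson(rows, min_points=3):
    """rows: iterable of (target, predicted, experimental) tuples.

    Returns (mean R over targets, {target: R}); targets with fewer than
    min_points ligands are skipped.
    """
    by_target = defaultdict(lambda: ([], []))
    for target, pred, exp in rows:
        by_target[target][0].append(pred)
        by_target[target][1].append(exp)
    per_target = {t: pearson(p, e) for t, (p, e) in by_target.items()
                  if len(p) >= min_points}
    return sum(per_target.values()) / len(per_target), per_target
```

Averaging per target rewards correct ranking within each congeneric series, which is the task FEP is typically used for, rather than correct absolute values across unrelated proteins.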
Large-Scale and Prospective Virtual Screening
Retrospective Screens (MF-PCBA)
Boltz-2 nearly doubles the average precision of prior ML baselines and achieves enrichment factors above 18× in the top 0.5% of ranked compounds.
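Enrichment factor and average precision both measure how well a ranking concentrates true actives near the top of a screened library. A plain-Python sketch of the standard definitions follows. The function names are illustrative, scores are assumed to be higher-is-better (a log10(µM) affinity, where lower means tighter binding, would be negated before ranking), and ties are broken by sort order.

```python
def enrichment_factor(scores, labels, fraction=0.005):
    """EF at the top `fraction` of the ranking: hit rate among the top-ranked
    compounds divided by the hit rate of the whole library."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    hit_rate_all = sum(labels) / n
    return (hits_top / n_top) / hit_rate_all

def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each active."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)
```

An EF of 18 means the top-ranked slice contains actives at 18 times the rate of the library as a whole; average precision, by contrast, summarizes the whole ranking rather than a single cutoff.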
Prospective Screens (TYK2)
Using Enamine’s Hit Locator Library (HLL) and Kinase library:
- 8/10 (HLL) and 10/10 (kinase) top-ranked molecules were predicted to bind in Boltz-ABFE (absolute binding free energy) validation.
- Random controls had 0/10 predicted binders.
- Screen score correlates strongly with ABFE (|R| = 0.74).
Coupled to a GFlowNet generator sampling Enamine REAL Space (~76B molecules), Boltz-2 enabled:
- fully de novo design
- diversity-aware selection
- ABFE validation for top candidates
All selected generative candidates were predicted to bind TYK2, despite minimal similarity to known ligands.
This creates a truly closed-loop AI design pipeline:
structural prediction → affinity → generative design → physics validation.
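The diversity-aware selection step can be illustrated with a greedy, sphere-exclusion-style filter: rank candidates by predicted score and accept one only if its Tanimoto similarity to every already-accepted molecule stays at or below a cutoff. This is a hypothetical stand-in, not the procedure used in the paper. Fingerprints are modeled as sets of on-bit indices, the 0.4 cutoff is an arbitrary placeholder, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def select_diverse(candidates, k, max_similarity=0.4):
    """Greedy selection of up to k candidates, best score first, skipping any
    candidate too similar to one already selected.

    candidates: list of (name, score, fingerprint), higher score = better.
    """
    chosen = []
    for name, score, fp in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(chosen) >= k:
            break
        if all(tanimoto(fp, other) <= max_similarity for _, _, other in chosen):
            chosen.append((name, score, fp))
    return [name for name, _, _ in chosen]
```

Because the walk is score-ordered, the top candidate is always kept, and lower-scoring molecules are admitted only when they add chemical novelty to the set sent on for physics validation.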
Limitations and Open Challenges
The authors note several remaining weaknesses:
- Difficulty modeling large conformational changes or allosteric transitions
- Reduced accuracy when the predicted pocket is incorrect or cofactors are missing
- Variable performance across protein classes and assay families
- Limited sensitivity to long-range interactions outside the affinity crop
These limitations highlight the continued need for integrated structural biology, biochemical context, and physics-driven refinement.
Conclusion: A Pivotal Step Toward Unified AI-Driven Drug Discovery
Boltz-2 represents a substantial advance in AI-native biomolecular modeling. By tightly coupling structure prediction, local dynamics, controllability, and affinity estimation, it provides a unified framework that spans hit discovery, hit-to-lead, and lead optimization.
Crucially, Boltz-2 approaches FEP-level accuracy at a fraction of the cost and is released under a permissive license—positioning it as a foundation for future open innovation in structural biology and drug design.
Boltz-2 is not the final word on AI-guided affinity prediction, but it establishes a new baseline for what is computationally achievable and sets a clear trajectory for the next generation of foundation models in molecular science.


