Boltz-2 and the New Frontier of Structure-Grounded Binding Affinity Prediction

Artificial intelligence has rapidly transformed biomolecular modeling, yet one capability has remained persistently out of reach: accurate, scalable prediction of protein–ligand binding affinities. Boltz-2, introduced in a recent preprint, is one of the most complete attempts so far to bridge this gap, pairing high-fidelity structure prediction with affinity estimates that approach free-energy-perturbation (FEP) accuracy at a computational cost orders of magnitude lower.

Why Binding Affinity Prediction Has Remained a Hard Problem

Deep learning–based structure prediction models (e.g., AlphaFold2/3, Boltz-1, Chai-1) have rewritten expectations of structural accuracy, yet they still do not reliably predict binding free energies—the key determinant of small-molecule potency.

Docking and ML scoring functions are fast but imprecise.
FEP, the most accurate computational method available, is far too expensive for routine use.

Boltz-2 is the first AI model to meaningfully narrow this accuracy gap with FEP while retaining screening-scale efficiency. The model’s improvements stem from both architectural extensions and unusually extensive biochemical data curation.

A Structure Model Built on Ensembles, Not Static Snapshots

A defining innovation in Boltz-2 is the shift from training on static PDB structures alone to training on structural ensembles and distilled predictions, including:

NMR conformational frames
Molecular dynamics trajectories (MISATO, ATLAS, mdCATH)
Distillation sets built from AlphaFold2 and Boltz-1 predictions across multiple molecular modalities

By exposing the model to local fluctuations, side-chain dynamics, and subtle conformational heterogeneity, the training process allows Boltz-2 to learn more realistic interaction geometries.

Benchmarks in the paper show MD-conditioned Boltz-2 matching or outperforming recent dynamics-focused models such as AlphaFlow and BioEmu in RMSF correlation metrics.
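
To make the RMSF correlation metric concrete, the sketch below computes per-atom root-mean-square fluctuation (RMSF) from an ensemble of coordinates and compares it with a reference profile using Pearson correlation. It is a minimal illustration, not the paper's evaluation code: the function names, the toy coordinates, and the assumption that all frames are already superimposed on a common reference are ours.

```python
import math

def rmsf(ensemble):
    """Per-atom RMSF for ensemble[frame][atom] = (x, y, z).

    Assumes the frames are already aligned to a common reference.
    """
    n_frames = len(ensemble)
    n_atoms = len(ensemble[0])
    values = []
    for a in range(n_atoms):
        # Mean position of atom a over all frames.
        mean = [sum(f[a][k] for f in ensemble) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in ensemble
        ) / n_frames
        values.append(math.sqrt(msd))
    return values

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy ensemble: 2 frames, 3 atoms (hypothetical coordinates in angstroms).
predicted_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
]
predicted_rmsf = rmsf(predicted_ensemble)
reference_rmsf = [0.0, 0.5, 1.0]  # hypothetical MD-derived profile
correlation = pearson(predicted_rmsf, reference_rmsf)
```

A high correlation means the predicted ensemble is flexible in the same places as the reference simulation, even if the absolute fluctuation amplitudes differ.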

Controllability: Bringing AI Structure Prediction Closer to Experimental Practice

Boltz-2 introduces an advanced suite of controllability features rarely seen in previous models:

Method conditioning (X-ray/NMR/MD) to bias toward specific experimental characteristics
Pocket and distance constraints, enabling hypothesis-driven exploration
Soft and hard template conditioning, including support for multimer templates
Boltz-Steering, a physics-inspired potential for reducing steric clashes and improving physical plausibility

These features allow experimentalists and computational chemists to integrate prior knowledge—an essential step toward mechanistic modeling rather than blind prediction.
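
As a rough illustration of what a physics-inspired steering term can look like, the sketch below scores steric clashes with a flat-bottom penalty: atom pairs farther apart than a cutoff contribute nothing, and closer pairs are penalized by the square of their overlap. This is a generic textbook-style penalty under our own assumptions (the function name, the 2.0 Å cutoff, and the toy coordinates are hypothetical); it is not the actual potential used by Boltz-Steering.

```python
import math
from itertools import combinations

def clash_penalty(coords, min_dist=2.0):
    """Sum of squared overlaps over all atom pairs closer than min_dist.

    coords: list of (x, y, z) tuples in angstroms.
    min_dist: hypothetical clash cutoff; real potentials usually depend
    on atom types and bonding, which this sketch ignores.
    """
    total = 0.0
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)
        if d < min_dist:
            total += (min_dist - d) ** 2
    return total

# Two atoms 1 angstrom apart clash; the third is far from both.
clashing = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
relaxed = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
penalty_before = clash_penalty(clashing)
penalty_after = clash_penalty(relaxed)
```

A sampler can use the gradient of such a term to nudge coordinates away from overlaps, which is the general idea behind guiding a generative model toward physically plausible poses.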

A Dedicated Affinity Module Grounded in Structural Representations

Boltz-2 separates structure generation from affinity estimation via a specialised PairFormer module operating on protein–ligand and intraligand representations.

The module outputs two key quantities:

A binding likelihood classifier
A continuous affinity value trained on Ki, Kd, IC₅₀, EC₅₀, XC₅₀, and related measurements, all expressed on a common log₁₀ scale in µM units

The latter is not a literal Ki or IC₅₀ but an “assay-agnostic binding strength” proxy designed for ranking compounds within and across chemotypes.
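
For intuition about the log-µM scale, the sketch below converts a value y = log10(concentration in µM) into a pK-style number and an approximate free energy. It rests on a loud assumption: the value is treated as if it were a dissociation constant, which the model output strictly is not, so the free energy is only an order-of-magnitude guide. The function names and the 298.15 K temperature are our choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def log_um_to_pk(y):
    """Convert log10(value in uM) to -log10(value in M).

    1 uM = 1e-6 M, so -log10(10**y * 1e-6) = 6 - y.
    """
    return 6.0 - y

def approx_delta_g(y, temperature=298.15):
    """Approximate binding free energy in kcal/mol.

    Assumes the value behaves like a dissociation constant Kd, so that
    delta_G = RT * ln(Kd in M) = -RT * ln(10) * pK.
    """
    return -R_KCAL * temperature * math.log(10.0) * log_um_to_pk(y)

# y = 0 corresponds to 1 uM; y = -3 corresponds to 1 nM.
pk_micromolar = log_um_to_pk(0.0)
dg_micromolar = approx_delta_g(0.0)
dg_nanomolar = approx_delta_g(-3.0)
```

At room temperature each log unit is worth roughly 1.36 kcal/mol, so a predicted difference of one log unit between two compounds is comparable to the error bars usually quoted for FEP.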

This separation of structure and affinity heads prevents destructive interference during training while still enabling tight coupling through shared structural embeddings.

Data Curation at Unprecedented Scale and Cleanliness

Affinity prediction is fundamentally limited by data quality, and naïvely combining PubChem, ChEMBL, or BindingDB introduces assay noise, protocol heterogeneity, and structural inconsistencies.

Boltz-2 addresses this with a multi-stage curation workflow that includes:

filtering for reliable, single-protein, biochemically robust assays
standardizing all continuous values to log-µM
generating synthetic decoys to reduce HTS bias
PAINS removal, size filtering, and ipTM-based structural quality gating
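
Two of the steps above, assay filtering and unit standardization, can be sketched in a few lines. The record fields, the unit table, and the 50-heavy-atom size limit are hypothetical choices made for illustration; the actual Boltz-2 pipeline applies more filters (PAINS removal, decoy generation, ipTM gating) that are not reproduced here.

```python
import math
from dataclasses import dataclass

# Multipliers that convert a concentration into micromolar.
UNIT_TO_MICROMOLAR = {"M": 1e6, "mM": 1e3, "uM": 1.0, "nM": 1e-3, "pM": 1e-6}

@dataclass
class Measurement:
    target_count: int  # number of proteins in the assay (hypothetical field)
    value: float       # measured Ki, Kd, IC50, ...
    unit: str          # concentration unit of `value`
    heavy_atoms: int   # ligand size

def to_log_um(value, unit):
    """Standardize a concentration to log10(micromolar)."""
    return math.log10(value * UNIT_TO_MICROMOLAR[unit])

def curate(records, max_heavy_atoms=50):
    """Keep single-protein assays with valid values and moderately sized ligands."""
    kept = []
    for r in records:
        if r.target_count != 1:
            continue  # drop multi-protein or cell-based style assays
        if r.value <= 0 or r.unit not in UNIT_TO_MICROMOLAR:
            continue  # drop unparseable or non-physical values
        if r.heavy_atoms > max_heavy_atoms:
            continue  # size filter (threshold is an arbitrary example)
        kept.append(to_log_um(r.value, r.unit))
    return kept

records = [
    Measurement(target_count=1, value=10.0, unit="nM", heavy_atoms=30),
    Measurement(target_count=2, value=5.0, unit="uM", heavy_atoms=25),
    Measurement(target_count=1, value=1.0, unit="mM", heavy_atoms=80),
    Measurement(target_count=1, value=-1.0, unit="uM", heavy_atoms=20),
]
curated = curate(records)
```

Only the first record survives: the others fail the single-protein, size, and value checks respectively.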

The final dataset contains millions of high-quality binders, non-binders, and quantitative measurements across thousands of targets.

This is arguably one of the largest and most rigorously curated biochemical datasets ever used for training a unified affinity model.

Performance: Approaching FEP Accuracy at Three Orders of Magnitude Less Compute

Across public benchmarks, Boltz-2 consistently outperforms all non-FEP baselines, including physics-based scoring functions (MM/PBSA, FMO), classical ML methods, and docking.

Highlights from the FEP+ benchmark (OpenFE subset and 4-target subset):

Pearson R = 0.66, approaching relative FEP methods
More than 1000× faster than FEP
Outperforms all ML and fast-physics baselines across assays

On the CASP16 affinity challenge, a blind international competition, Boltz-2 outperformed every submitted entry without any fine-tuning, a strong demonstration of generalisation.

Large-Scale and Prospective Virtual Screening

Retrospective Screens (MF-PCBA)

Boltz-2 nearly doubles the average precision of prior ML baselines and achieves enrichment factors above 18× in the top 0.5% of ranked compounds.
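
Both metrics quoted here have simple definitions. The sketch below computes the enrichment factor at a given top fraction (the hit rate among the top-ranked compounds divided by the hit rate in the whole library) and the average precision of a ranking. The function names and the toy screen are hypothetical, and higher scores are assumed to mean stronger predicted binding.

```python
def enrichment_factor(scores, labels, top_fraction=0.005):
    """Hit rate in the top fraction divided by the overall hit rate.

    scores: predicted scores, higher = more likely active.
    labels: 1 for active, 0 for inactive.
    """
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("enrichment factor is undefined without actives")
    return (hits_top / n_top) / (total_hits / n)

def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each active."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without actives")
    return precision_sum / hits

# Toy screen: 1000 compounds, 10 actives, 4 of them ranked in the top 5.
toy_scores = [float(1000 - i) for i in range(1000)]
toy_labels = [0] * 1000
for idx in [0, 1, 2, 3, 100, 101, 102, 103, 104, 105]:
    toy_labels[idx] = 1
ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.005)
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In the toy screen the top 0.5% is five compounds, four of which are active, against a base rate of 1%, so the enrichment factor is about 80. An enrichment factor of 18 in a real screen means the top-ranked slice contains 18 times more actives than random picking would.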

Prospective Screens (TYK2)

Using Enamine’s Hit Locator Library (HLL) and Kinase library:

8/10 (HLL) and 10/10 (kinase) top-ranked molecules were predicted to bind in Boltz-ABFE validation.
Random controls had 0/10 predicted binders.
Screen score correlates strongly with ABFE (|R| = 0.74).

Generative Design (GFlowNet + SynFlowNet)

Coupled to a GFlowNet generator sampling Enamine REAL Space (~76B molecules), Boltz-2 enabled:

fully de novo design
diversity-aware selection
ABFE validation for top candidates

All selected generative candidates were predicted to bind TYK2, despite minimal similarity to known ligands.
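
One common way to implement diversity-aware selection is a greedy pass over candidates sorted by score that skips any molecule too similar to one already chosen. The sketch below does this with Tanimoto similarity on fingerprints represented as sets of integer bit indices. It is a generic illustration rather than the selection procedure used in the paper; the function names, the 0.5 similarity cutoff, and the toy fingerprints are hypothetical, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_diverse(candidates, k, max_similarity=0.5):
    """Greedily pick up to k high-scoring, mutually dissimilar candidates.

    candidates: list of (name, score, fingerprint_set), higher score = better.
    """
    chosen = []
    for name, score, fp in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Accept only if dissimilar to everything selected so far.
        if all(tanimoto(fp, other) <= max_similarity for _, _, other in chosen):
            chosen.append((name, score, fp))
        if len(chosen) == k:
            break
    return [name for name, _, _ in chosen]

candidates = [
    ("mol_a", 0.9, {1, 2, 3, 4}),
    ("mol_b", 0.8, {1, 2, 3, 5}),  # Tanimoto 0.6 to mol_a, so it is skipped
    ("mol_c", 0.7, {7, 8, 9}),
]
picked = select_diverse(candidates, k=2)
```

The cutoff trades potency for coverage: a lower value forces the selection to span more chemotypes at the cost of passing over some high-scoring near-duplicates.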

This creates a closed-loop AI design pipeline:
structural prediction → affinity → generative design → physics validation.

Limitations and Open Challenges

The authors note several remaining weaknesses:

Difficulty modeling large conformational changes or allosteric transitions
Reduced accuracy when the predicted pocket is incorrect or cofactors are missing
Variable performance across protein classes and assay families
Limited sensitivity to long-range interactions outside the affinity crop

These limitations highlight the continued need for integrated structural biology, biochemical context, and physics-driven refinement.

Conclusion: A Pivotal Step Toward Unified AI-Driven Drug Discovery

Boltz-2 represents a substantial advance in AI-native biomolecular modeling. By tightly coupling structure prediction, local dynamics, controllability, and affinity estimation, it provides a unified framework that spans hit discovery, hit-to-lead, and lead optimization.

Crucially, Boltz-2 approaches FEP-level accuracy at a fraction of the cost and is released under the permissive MIT license, positioning it as a foundation for future open innovation in structural biology and drug design.

Boltz-2 is not the final word on AI-guided affinity prediction, but it establishes a new baseline for what is computationally achievable and sets a clear trajectory for the next generation of foundation models in molecular science.
