Spotlight Presentation: DNA foundation models transforming genomics research for target and lead discovery
Bridging the gap between measurable genetic information and observable traits is a longstanding challenge in genomics. Predicting molecular phenotypes from DNA sequence alone is often inaccurate, because annotated data are scarce and learning is difficult to transfer across prediction tasks.
We present InstaDeep’s Nucleotide Transformer (NT) collection, a suite of foundation models pre-trained on DNA sequences that can address a range of genomic applications. The models can be fine-tuned at low cost to yield transferable, context-specific representations of nucleotide sequences, enabling accurate molecular phenotype prediction even in low-data settings. Additionally, NT models learn to focus on key genomic elements, including those regulating gene expression, and improve the prioritization of functional genetic variants.
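As a rough illustration of how sequence representations could support variant prioritization, the sketch below scores a single-nucleotide variant by how far it moves a sequence's embedding and ranks candidates by that distance. It is not the NT method: `embed` is a toy stand-in (a normalized 3-mer frequency vector) for the embedding a pre-trained model would provide, and the names `embed`, `apply_snv`, `variant_score`, and `rank_variants` are hypothetical, chosen only for this example.

```python
from itertools import product
from math import sqrt

K = 3  # toy k-mer size, an assumption made for this illustration only
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed(seq):
    """Toy stand-in for a model embedding: normalized k-mer frequencies."""
    counts = [0.0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:  # skip k-mers containing non-ACGT symbols such as N
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_score(ref_seq, pos, alt):
    """Euclidean distance between reference and variant embeddings."""
    ref_vec = embed(ref_seq)
    alt_vec = embed(apply_snv(ref_seq, pos, alt))
    return sqrt(sum((a - b) ** 2 for a, b in zip(ref_vec, alt_vec)))


def rank_variants(ref_seq, variants):
    """Sort (pos, alt) pairs from largest to smallest embedding shift."""
    return sorted(variants, key=lambda v: variant_score(ref_seq, *v), reverse=True)


if __name__ == "__main__":
    reference = "ACGTACGTACGTTTGACA"
    candidates = [(4, "T"), (9, "G"), (12, "C")]
    for pos, alt in rank_variants(reference, candidates):
        print(pos, alt, round(variant_score(reference, pos, alt), 4))
```

In practice the toy `embed` would be replaced by embeddings from a pre-trained or fine-tuned model, and the distance would serve as one signal among several rather than a direct measure of functional impact.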