Spotlight Presentation: Foundational model building with single-cell RNA-Seq data

26 Sept 2024

15:00 - 15:30

Data Quality | Target Identification | Lead Generation & Optimization | Drug Response Prediction

Strand will present progress on the following subproblems in the use of single-cell RNA-Seq data for drug discovery:

An AWS data lake capable of ingesting and processing single-cell RNA-Seq data with associated metadata at scale
Semi-automated, LLM-based ingestion of single-cell RNA-Seq data and metadata from GEO into a schema of ≈35 fields, covering datasets for 3 diseases: ulcerative colitis (UC), Alzheimer's disease (AD) and frontotemporal dementia (FTD). We show a ≈3x-5x improvement in turnaround time.
A standardized single-cell pipeline that generates normalized counts from FASTQ files for the ingested data
Embeddings of the single-cell data for pretraining an LLM [see e.g. scBERT]. We show how such embeddings might be used to remove batch effects and hence integrate data.
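The LLM-based ingestion item does not describe how extracted metadata is checked against the ≈35-field schema. The following is a minimal sketch of one way it might work, assuming the LLM returns one JSON object per sample and that records with any schema violation are routed to a human curator (one plausible reading of "semi-automated"). The four field names, the allowed disease codes and the function `validate_record` are hypothetical and are not the actual schema.

```python
import json

# Hypothetical 4-field subset of the ~35-field schema (names are illustrative only).
SCHEMA = {
    "geo_accession": str,
    "disease": str,
    "tissue": str,
    "n_cells": int,
}
ALLOWED_DISEASES = {"UC", "AD", "FTD"}


def validate_record(raw_json: str) -> tuple[dict, list[str]]:
    """Parse one LLM-extracted JSON record and list its schema violations.

    A non-empty error list would send the record to a human curator
    (an assumed review step, not taken from the talk).
    """
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return {}, [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return {}, ["expected a JSON object"]

    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # Exact type check, so that e.g. JSON `true` is not accepted as an int.
        elif type(record[field]) is not expected_type:
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    disease = record.get("disease")
    if isinstance(disease, str) and disease not in ALLOWED_DISEASES:
        errors.append(f"unexpected disease value: {disease}")
    # Fields the schema does not know about are also flagged for review.
    for field in sorted(set(record) - set(SCHEMA)):
        errors.append(f"unexpected field: {field}")
    return record, errors


# Usage: a well-formed record passes; an incomplete one is flagged.
ok_record, ok_errors = validate_record(
    '{"geo_accession": "GSE_PLACEHOLDER", "disease": "UC", "tissue": "colon", "n_cells": 1200}'
)
bad_record, bad_errors = validate_record('{"geo_accession": "GSE_PLACEHOLDER", "disease": "MS"}')
print(bad_errors)
# → ['missing field: tissue', 'missing field: n_cells', 'unexpected disease value: MS']
```

Keeping validation deterministic and separate from the LLM call means the model can be swapped without changing what counts as a valid record.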
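The pipeline item does not say which normalization is used. As an illustration only, the sketch below applies one common choice: scale each cell to a fixed total count (10,000 here, an assumed value) and then apply log(1 + x). It starts from a cells-by-genes count matrix, assuming the FASTQ-to-counts quantification has already been done upstream (not shown). `normalize_counts` is a hypothetical name.

```python
import numpy as np


def normalize_counts(counts, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) of a cells-by-genes count matrix to `target_sum`
    total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Per-cell scale factor; cells with zero total counts get factor 0 and stay all-zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)


raw = np.array([[1, 3, 0], [0, 0, 0], [5, 5, 10]])
norm = normalize_counts(raw)
# Undoing the log shows that every non-empty cell now sums to the target.
print(np.expm1(norm).sum(axis=1))
```

Library-size scaling removes differences in sequencing depth between cells. The log transform then compresses the dynamic range before any embedding step.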
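The embedding item claims batch effects can be removed in embedding space. A simple linear baseline shifts each batch's embeddings so that all batch centroids coincide. This is an assumed illustration, not necessarily the method presented. `center_by_batch` is hypothetical, and the embeddings could come from any encoder, for example a scBERT-style model.

```python
import numpy as np


def center_by_batch(embeddings, batches) -> np.ndarray:
    """Remove per-batch mean shifts from cell embeddings (simple linear batch correction)."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    batches = np.asarray(batches)
    corrected = embeddings.copy()
    global_mean = embeddings.mean(axis=0)
    for batch in np.unique(batches):
        mask = batches == batch
        # Shift this batch so its centroid lands on the global centroid.
        corrected[mask] += global_mean - embeddings[mask].mean(axis=0)
    return corrected


# Usage on synthetic data: two batches separated by an artificial offset.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))
batch = np.array(["run1"] * 100 + ["run2"] * 100)
emb[batch == "run2"] += 2.0  # simulated batch offset
corrected = center_by_batch(emb, batch)
gap_before = np.linalg.norm(emb[batch == "run1"].mean(axis=0) - emb[batch == "run2"].mean(axis=0))
gap_after = np.linalg.norm(corrected[batch == "run1"].mean(axis=0) - corrected[batch == "run2"].mean(axis=0))
```

Mean-centering is only safe when batches are not confounded with biology. If each disease were profiled in its own batch, this step would also erase the disease signal. Integration methods that condition on cell type or use nonlinear corrections are generally used for that reason.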

Industry Expert

AI in Drug Discovery Xchange - San Francisco - 2024