Genomics initiative could boost AI-designed therapeutics

A new initiative is set to generate and model biological data at the trillion-gene scale, paving the way to scale AI-designed therapeutics.
The Trillion Gene Atlas, developed by Basecamp Research in collaboration with Anthropic, Ultima Genomics, PacBio, and NVIDIA AI infrastructure, aims to expand known evolutionary genetic diversity 100-fold by collecting genomic data from more than 100 million species across thousands of sites worldwide.
The initiative, unveiled during the Health Track at SXSW and the NVIDIA GTC conference in San Jose, could help drive AI drug development and therapeutic design.
“Today’s biological AI models are trained on a narrow slice of life on Earth,” said Glen Gowers, Co-Founder and CEO of Basecamp Research, speaking at SXSW in Austin.
“The Trillion Gene Atlas expands the known genetic universe by orders of magnitude beyond what is in public databases. Training models at this scale establishes a new paradigm for programmable therapeutic design.”
Trillion Gene Atlas to expand genomics data
As model size and computing power grow rapidly, diverse data has become a critical enabler of progress in AI drug development and on real-world benchmarks.
All current sequence-based foundation models rely on variants of the same public repositories, and 80% of them are trained on a public database containing fewer than 250 million sequences.
Basecamp Research’s EDEN foundation models, released in January, bypass the industry’s evolutionary “data wall” by training entirely on BaseData, a proprietary genomic database that is currently more than 10 times larger than all public resources combined. By learning from an unprecedented 10 billion new-to-science genes across one million newly discovered species, EDEN unlocked critical new scaling laws for AI in biology.
The Trillion Gene Atlas builds on this approach by greatly expanding the breadth and contextual depth of genomic data in the known “internet of biology” suitable for AI training.
The initiative is enabled by advances in ultra-high-throughput short- and long-read sequencing and in accelerated computing. Basecamp has partnered with Ultima Genomics and PacBio to deliver industrial-scale sequencing, including data-rich, high-accuracy long reads.
“PacBio HiFi sequencing delivers highly accurate long reads that preserve full genomic context and enables subspecies and even strain-level resolution in complex samples,” said Christian Henry, President and CEO of PacBio.
“HiFi data provides the reliable, information-rich foundation biological AI models need to learn from nature at scale and power initiatives like the Trillion Gene Atlas.”
“Biology has been fundamentally data-starved when compared to other fields like language or computer vision as researchers have lacked the tools required to generate data at scale,” added Gilad Almogy, Founder and CEO of Ultima Genomics.
“We strongly believe that AI will have an immense impact on our understanding of biology and human health, and the UG200 Series was designed from the ground up to enable the massive datasets required for BioAI to deliver on this promise. We are excited our technology can enable Basecamp in their vision and advance innovative initiatives like the Trillion Gene Atlas.”
The post Genomics initiative could boost AI-designed therapeutics appeared first on Drug Discovery World (DDW).