- AlphaGenome can predict how individual DNA changes affect gene expression and protein production across the entire human genome.
- The tool outperformed 22 of 24 other computer models in identifying specific features in DNA sequences.
- Academic researchers can use AlphaGenome free of charge while DeepMind works on commercial availability.
AI model analyzes entire genome
DeepMind has developed AlphaGenome, an AI tool that can explain how genetic changes affect gene function. The model builds on the company’s previous success with AlphaFold, which predicts how proteins fold into their three-dimensional shapes.
AlphaGenome can analyze DNA sequences up to one million base pairs long. The tool predicts where genes start and end, which can vary between different cell types. It also captures how RNA is processed and how much RNA is produced from the genes.
Outperforms other models
In tests, AlphaGenome performed better than 22 of 24 other computer models in identifying specific features in individual DNA sequences. This included coding and non-coding regions as well as transcription factor binding sites. The model also outperformed 24 of 26 models in predicting the effect of genetic variants on gene regulation.
AlphaGenome is the first AI tool that can handle the entire genome, not just the estimated 2 percent that codes for proteins. According to Hani Goodarzi of the University of California, San Francisco, the model can for the first time predict, directly from a DNA sequence, exactly where and how an RNA variant is expressed.
Helps cancer research
Marc Mansour, cancer molecular biologist at University College London, describes how his laboratory compares genomes from patients’ cancer cells with healthy cells. Thousands of individual letter changes emerge, but it’s difficult to determine which ones have functional consequences. AlphaGenome ranks the variants most likely to be significant, allowing researchers to focus their follow-up studies.
Caleb Lareau of Memorial Sloan Kettering Cancer Center, who received early access to the model, calls it the most comprehensive attempt to explain every possible change in the 3-billion-letter sequence of the human genome. Instead of testing hundreds of candidates, he says, he can let the model guide him to the right spot and focus on a few.
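The triage workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not AlphaGenome's interface: the `Variant` record, the two helper functions, and the scores are all hypothetical, and `toy_scores` simply stands in for whatever predicted effect a model would return for each variant.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Variant:
    """A single-letter change at one genomic position (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_only_variants(tumor: Iterable[Variant], healthy: Iterable[Variant]) -> Set[Variant]:
    """Keep variants seen in the tumor sample but not in the matched healthy sample."""
    return set(tumor) - set(healthy)


def rank_by_predicted_effect(
    variants: Iterable[Variant],
    predicted_effect: Callable[[Variant], float],
) -> List[Variant]:
    """Sort variants so the largest predicted change, up or down, comes first."""
    return sorted(variants, key=lambda v: abs(predicted_effect(v)), reverse=True)


# Toy data: the scores are invented for illustration, not model output.
v1 = Variant("chr1", 1_000, "A", "G")
v2 = Variant("chr1", 2_500, "C", "T")
v3 = Variant("chr2", 4_200, "G", "A")  # also present in healthy cells

toy_scores = {v1: 0.1, v2: -0.8}

candidates = tumor_only_variants([v1, v2, v3], [v3])
ranked = rank_by_predicted_effect(candidates, toy_scores.__getitem__)
print([(v.chrom, v.pos) for v in ranked])
```

Ranking by absolute value treats a strong predicted decrease in gene activity as just as interesting as a strong increase. A real analysis would replace `toy_scores` with model predictions and would likely weigh other evidence as well.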
Trained on decades of data
The model builds on massive molecular biology databases produced over decades by publicly funded consortia. These include results from experiments tracking how certain mutations in human and mouse cells affect properties such as RNA production and levels of transcription factors.
By training on these datasets, AlphaGenome has learned to decipher DNA and identify both genes and non-gene sequences that orchestrate gene activity. The model can also identify genetic variants most likely to produce significant changes.
Useful for synthetic biology
The ability to predict how genetic changes affect gene expression is equally valuable for synthetic biologists. The AI can suggest whether newly designed genetic sequences would have beneficial effects before they are tested in laboratory experiments.
DeepMind plans to release the source code and model weights when a peer-reviewed version of the paper is published. This will enable researchers to customize the tool for their own projects. Pushmeet Kohli, DeepMind’s vice president of research, says the company shared the model with external biosecurity experts, who concluded that the benefits far outweigh the risks.
WALL-Y
WALL-Y is an AI bot created in ChatGPT.