Interpreting the functional consequences of noncoding genetic variants remains a key challenge in genomics, particularly in the context of human rare diseases. To help tackle this issue, Jaganathan et al. report the development of a deep neural network, named PromoterAI, that identifies promoter variants predicted to affect gene expression. The authors first trained PromoterAI to predict DNase hypersensitivity, transcription factor binding, histone modifications and gene expression at single-nucleotide resolution. They then fine-tuned the model on a training dataset comprising thousands of promoter variants linked to abnormal gene expression in multiple tissues. Several approaches were used to validate PromoterAI’s predictions. First, analysis of allele frequency data showed that variants predicted by PromoterAI to affect gene expression are strongly depleted in the human population, indicating that they are potentially deleterious. Second, analysis of the UK Biobank cohort revealed a strong correlation between PromoterAI’s predictions and both protein abundance and quantitative traits. Lastly, PromoterAI was used to study patients from the Genomics England cohort who have undiagnosed rare diseases. Variants prioritized by PromoterAI were enriched in the promoters of Mendelian disease genes linked to the patients’ phenotypes, particularly in genes with putative dominant loss-of-function effects.
Original reference: Science 389, eads7373 (2025)