A powerful new AI predicts how over 1,000 diseases may unfold across a person’s life, opening doors for precision prevention, policy planning, and bias-aware healthcare innovation.
In a recent study published in the journal Nature, researchers developed a machine learning model that utilized large-scale health data to predict the progression of 1,256 distinct ICD-10 level 3 diseases based on patients’ past medical histories.
The model demonstrated predictive accuracy comparable to existing tools that analyze individual diseases. It showed potential for simulating future health trajectories over a period of up to two decades and provided insights into personalized health risks and comorbidities.
A need for complex disease models
Human disease progression involves periods of health, acute illness, and chronic conditions, often appearing as clusters of comorbidities influenced by genetics, lifestyle, and socioeconomic factors.
Understanding these patterns is crucial for delivering personalized healthcare, providing lifestyle guidance, and implementing effective early screening programs. However, traditional algorithms are primarily designed for single diseases and cannot capture the complexity of over 1,000 recognized health conditions.
This limitation becomes especially important in the context of aging populations, where the burden of illnesses such as cancer, diabetes, cardiovascular disease, and dementia is projected to rise significantly over the coming decades. Accurately modeling disease trajectories is therefore vital for both healthcare planning and economic policy.
Artificial intelligence, particularly large language models (LLMs), provides a promising solution. These models excel at learning dependencies across sequences of data, much like predicting disease based on prior health events.
Inspired by this analogy, researchers have developed transformer-based models for predicting specific conditions, with encouraging early results. Yet, despite these advances, a truly comprehensive and generative model capable of simulating the full spectrum of multimorbidity across time has not been systematically evaluated.
Developing a large-scale data model
The researchers created Delphi-2M, a transformer-based model, to predict lifetime disease trajectories. Unlike language models that process words, Delphi-2M worked with diagnostic codes from the tenth revision of the International Classification of Diseases (ICD-10), as well as death, sex, BMI, and lifestyle factors such as smoking and alcohol use.
To represent stretches of time without any recorded diagnosis, the team inserted artificial “no-event” tokens. Together with tokens for sex and lifestyle levels, the full vocabulary spanned roughly 1,270 tokens covering disease codes, lifestyle levels, sex, no-event, and padding.
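For readers who want a concrete picture of what such an input looks like, the toy sketch below encodes a patient history as (age, token) pairs drawn from a shared vocabulary. The token names, vocabulary contents, and helper function are illustrative assumptions, not the study’s actual tokenizer; only the overall idea (one shared vocabulary of disease, lifestyle, sex, no-event, and padding tokens, each paired with an age) comes from the paper.

```python
# Illustrative sketch only: a toy vocabulary and encoder for (age, token) pairs.
# Token names and sizes are hypothetical; the real Delphi-2M vocabulary spans
# ~1,270 tokens (ICD-10 level-3 codes, lifestyle levels, sex, no-event, padding).

SPECIAL_TOKENS = ["<pad>", "<no-event>", "<male>", "<female>",
                  "<smoker>", "<non-smoker>", "<alcohol-high>", "<alcohol-low>"]
ICD10_CODES = ["C25", "E11", "I21"]  # toy subset of level-3 ICD-10 codes

vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + ICD10_CODES)}

def encode_history(events):
    """Convert a list of (age_in_years, token_string) events into
    parallel lists of ages (in days) and integer token ids."""
    ages, token_ids = [], []
    for age_years, token in sorted(events):
        ages.append(age_years * 365.25)   # continuous age, not a position index
        token_ids.append(vocab[token])
    return ages, token_ids

# Example: a male smoker diagnosed with type 2 diabetes at 52 and an MI at 60.
history = [(0.0, "<male>"), (40.0, "<smoker>"), (52.0, "E11"), (60.0, "I21")]
print(encode_history(history))
```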
Training was based on large-scale health records from the UK Biobank, comprising 402,799 participants for training, 100,639 for validation, and 471,057 for longitudinal testing. To test generalizability, the model was also validated on data from 1.93 million Danish individuals.
Several modifications tailored the base model to health data: replacing positional encoding with continuous age encoding, adding an output head to predict time-to-next event, and altering attention masks to prevent tokens at the same time point from influencing one another.
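The paper describes these changes at the architectural level only. The PyTorch sketch below shows one plausible way to implement a continuous, sinusoidal age encoding and a dual output head that predicts both the next token and the time to the next event; the dimensions, frequency scaling, and softplus choice are assumptions, and this is not the authors’ published implementation.

```python
import math
import torch
import torch.nn as nn

class ContinuousAgeEncoding(nn.Module):
    """Sinusoidal encoding of age in days, standing in for GPT-2's discrete
    positional embedding. A sketch; frequencies and scaling are assumptions."""
    def __init__(self, d_model: int, max_age_days: float = 120 * 365.25):
        super().__init__()
        self.d_model = d_model
        self.max_age_days = max_age_days

    def forward(self, age_days: torch.Tensor) -> torch.Tensor:
        # age_days: (batch, seq_len) -> (batch, seq_len, d_model)
        half = self.d_model // 2
        freqs = torch.exp(
            -math.log(self.max_age_days)
            * torch.arange(half, device=age_days.device) / half
        )
        angles = age_days.unsqueeze(-1) * freqs
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

class DualHead(nn.Module):
    """Two output heads on the shared transformer state: next-token logits
    and a positive time-to-next-event estimate."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.token_head = nn.Linear(d_model, vocab_size)
        self.time_head = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor):
        logits = self.token_head(hidden)
        # softplus keeps the predicted waiting time positive
        time_to_event = nn.functional.softplus(self.time_head(hidden)).squeeze(-1)
        return logits, time_to_event

# Toy usage with assumed sizes (the real model has ~2.2M parameters in total).
enc, head = ContinuousAgeEncoding(d_model=64), DualHead(d_model=64, vocab_size=1270)
ages = torch.tensor([[0.0, 40 * 365.25, 52 * 365.25]])
hidden = torch.randn(1, 3, 64) + enc(ages)      # stand-in for transformer output
logits, dt = head(hidden)
print(logits.shape, dt.shape)
```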
Delphi-2M could estimate risks for more than 1,000 diseases, forecast the timing of diagnoses, and simulate complete health trajectories. Performance was optimized through hyperparameter tuning, resulting in a 2.2M parameter model that combined predictive accuracy with generative capacity, providing a novel approach to modeling multimorbidity and long-term health progression.
Figure caption from the study: a, Schematic of health trajectories based on ICD-10 diagnoses, lifestyle, and healthy padding tokens, each recorded at a distinct age. b, Training, validation, and testing data derived from the UK Biobank (left) and Danish disease registries (right). c, The Delphi model architecture. The red elements indicate changes compared with the underlying GPT-2 model. ‘N ×’ denotes applying the transformer block sequentially N times. d, Example model input (prompt) and output (samples) comprising (age: token) pairs. e, Scaling laws of Delphi, showing the optimal validation loss as a function of model parameters for different training data sizes. f, Ablation results measured by the cross-entropy differences relative to an age- and sex-based baseline (y axis) for different ages (x axis). g, The accuracy of predicted time to event. The observed (y-axis) and expected (x-axis) times to events are shown for each next-token prediction (grey dots). The blue line shows the average across consecutive bins of the x-axis.
Evaluating the model’s performance
Delphi-2M’s performance was evaluated using health data up to age 60 from 63,622 participants in the UK Biobank. The model generated simulated health trajectories, which were then compared with the outcomes actually observed in the data.
Predictions of disease rates at ages 70 and 75 closely matched observed patterns, confirming its ability to capture population-level incidence trends. While predictive accuracy declined over longer time horizons, from an average AUC of approximately 0.76 to about 0.70 at 10 years, Delphi-2M still outperformed models based only on age and sex.
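The reported figures are discrimination scores (area under the ROC curve) computed per disease and then averaged. As a minimal, hedged sketch of that kind of evaluation, the snippet below uses random stand-in predictions and outcomes with scikit-learn; it illustrates the metric only, not the study’s actual evaluation pipeline.

```python
# Sketch of per-disease AUC evaluation, averaged across diseases.
# Scores and labels here are random stand-ins, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_patients, n_diseases = 1000, 5

# predicted risk of each disease within the horizon, and observed outcomes
risk = rng.random((n_patients, n_diseases))
observed = (rng.random((n_patients, n_diseases)) < 0.1).astype(int)

aucs = [
    roc_auc_score(observed[:, d], risk[:, d])
    for d in range(n_diseases)
    if observed[:, d].sum() > 0          # AUC needs at least one case
]
print(f"mean AUC over {len(aucs)} diseases: {np.mean(aucs):.2f}")
```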
The model effectively distinguished risks across subgroups defined by lifestyle or previous illnesses, supporting its value for personalized risk profiling.
Importantly, Delphi-2M could also generate synthetic health trajectories that mirrored real-world disease patterns without duplicating individual records. A model trained solely on this synthetic data retained much of the original’s performance, showing only a three-point drop in AUC. This highlights potential applications for privacy-preserving research.
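Generating a synthetic trajectory amounts to repeatedly sampling the next (age, token) pair from the model’s predicted distribution until a death token or a maximum age is reached. The loop below is a sketch under assumed interfaces: the model object, its return values, and the dummy stand-in are hypothetical, not the published sampling code.

```python
import torch

def sample_trajectory(model, ages, tokens, max_age_days=85 * 365.25,
                      death_token_id=1255, max_steps=200):
    """Autoregressively extend a partial history until death or a maximum age.
    `model` is assumed to return next-token logits and a time-to-next-event
    estimate for every position; this interface is an assumption."""
    ages, tokens = list(ages), list(tokens)
    for _ in range(max_steps):
        logits, dt = model(torch.tensor([ages]), torch.tensor([tokens]))
        probs = torch.softmax(logits[0, -1], dim=-1)
        next_token = torch.multinomial(probs, 1).item()
        next_age = ages[-1] + float(dt[0, -1])
        ages.append(next_age)
        tokens.append(next_token)
        if next_token == death_token_id or next_age > max_age_days:
            break
    return ages, tokens

# Dummy stand-in model so the sketch runs end to end: uniform next-token
# probabilities and a fixed one-year gap. Replace with a trained model.
class DummyModel:
    def __call__(self, ages, tokens):
        batch, seq = tokens.shape
        return torch.zeros(batch, seq, 1270), torch.full((batch, seq), 365.25)

ages, tokens = sample_trajectory(DummyModel(), [0.0, 60 * 365.25], [2, 10])
print(len(tokens), "events sampled")
```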
To interpret predictions, researchers examined the embedding space, which revealed disease clusters consistent with ICD-10 chapters and showed how specific diagnoses shaped outcomes, such as the strong impact of pancreatic cancer on mortality.
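Conceptually, this kind of inspection takes each disease token’s embedding vector, projects it to two dimensions, and groups the points by ICD-10 chapter. The sketch below uses random stand-in embeddings, a hypothetical chapter assignment, and PCA in place of whatever projection the authors used; it shows the idea, not their analysis.

```python
# Sketch: project disease-token embeddings to 2D and group by ICD-10 chapter.
# The embedding matrix here is random; in practice it comes from the trained model.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_disease_tokens, d_model = 1256, 64
embeddings = rng.normal(size=(n_disease_tokens, d_model))

coords = PCA(n_components=2).fit_transform(embeddings)

# Hypothetical mapping of each token to an ICD-10 chapter letter.
chapters = rng.choice(list("ABCDEFGHIJKLMN"), size=n_disease_tokens)
for chapter in sorted(set(chapters)):
    centroid = coords[chapters == chapter].mean(axis=0)
    print(chapter, centroid.round(2))
```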
External validation on Danish registry data supported generalizability, with an average AUC of about 0.67, a modest drop relative to the UK Biobank results. Finally, the study acknowledged its limitations, including biases in the UK Biobank recruitment process and patterns of missing data.
Conclusions
The study introduced Delphi-2M, a GPT-based model capable of predicting and simulating the progression of multiple diseases over time. Compared with single-disease or biomarker-based models, Delphi-2M showed strong accuracy in forecasting health risks across more than 1,000 conditions.
For diabetes risk, however, it performed somewhat below the single-biomarker HbA1c approach, and its overall performance declined only modestly when tested on Danish data.
Its ability to sample synthetic future trajectories allows estimation of long-term disease burdens and the creation of privacy-preserving datasets. The model also highlighted patterns of comorbidities and temporal influences of illnesses, such as persistent mortality risks from cancer, and achieved an AUC of about 0.97 for predicting death.
However, several limitations were noted. Predictions reflected biases in UK Biobank data, including healthy volunteer effects, recruitment bias, and missingness patterns. Differences were also seen across ancestry and socioeconomic groups. Importantly, the model captures statistical associations but not causal relationships, which limits its direct clinical use.
Overall, Delphi-2M demonstrates the promise of transformer-based models for personalized risk prediction, healthcare planning, and biomedical research. Future improvements may integrate multimodal data, support clinical decision-making, and aid policy development in ageing populations.
Journal reference:
- Learning the natural history of human disease with generative transformers. Shmatko, A., Jung, A.W., Gaurav, K., Brunak, S., Mortensen, L.H., Birney, E., Fitzgerald, T., Gerstung, M. Nature (2025). DOI: 10.1038/s41586-025-09529-3, https://www.nature.com/articles/s41586-025-09529-3