Remember the last time you visited the doctor? They likely asked you about your medical history.
For many conditions, this information isn’t just relevant for diagnosis and treatment; it’s also valuable for prevention. Thanks to AI, a range of algorithms can now predict the risk of single medical conditions, such as cardiovascular disease and cancer, based on medical records.
But diseases don’t exist in a vacuum. Some conditions may increase the risk of others. A full picture of a person’s health trajectory would predict risk across a range of diseases. This could not only inform early treatment, but also surface vulnerable groups of people for screening and other preventative measures. And it could identify people at risk for a condition, say, high blood pressure or breast cancer, who don’t necessarily fit the usual criteria.
Recently, a team from the German Cancer Research Center and collaborators released an AI “oracle” that predicts a person’s risk of developing more than 1,000 common diseases decades in the future. Dubbed Delphi-2M, the AI is a type of large language model, like the algorithms powering popular chatbots.
Rather than training the AI on text, however, the team fed it over 400,000 medical records from the UK Biobank, a massive study tracking participants’ health as they age. After adding lifestyle information, such as body mass, smoking, and drinking habits, Delphi could predict any participant’s chance of multiple diseases for at least two decades.
Though it only trained on the Biobank cohort, the AI mapped the health trajectories of nearly two million people in Denmark without any changes to its setup, suggesting it had captured the crux of disease risk and interaction. Delphi is also explainable, in that it lays out the rationale for its assessment.
The tool is “an achievement” that sets “a new standard for both predictive accuracy and interpretability” for healthcare, said Justin Stebbing at Anglia Ruskin University, who was not involved in the study.
Looking Glass
Healthcare is shifting from treatment to prevention. But individual guidance can be confusing. Take mammograms. Recommendations on what age to start testing have shifted from 40 to 50 and back to 40. More broadly, as the world’s population ages, modeling the burden of cancer, dementia, and other diseases could better prepare healthcare systems for the so-called “silver tsunami.”
Here’s where medical AI comes in. Early tools were crafted to diagnose conditions based on medical images. But large language models have opened a whole new avenue for prediction.
These algorithms and classic disease modeling share a common logic. A large language model treats language as a sequence of word fragments known as tokens. It then generates responses token by token, based on patterns it has learned from text scraped from online sources. With enough training data, the AI learns how tokens relate to one another statistically and can generate human-like responses.
Predicting the progression of diseases is somewhat similar. If every step in the progression of a disease is a token, then predicting what’s next means statistically establishing how the tokens connect. Scientists have already used large language model-like algorithms trained on electronic health records to predict single diseases including cancer, stroke, and self-harm.
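To make the token analogy concrete, here is a minimal Python sketch of next-event prediction over disease sequences. It is a toy illustration, not Delphi’s method: it uses simple pair counts (a bigram model) rather than a large language model, and the trajectories, disease labels, and function names are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy trajectories: each list is one person's diagnoses in
# chronological order. The labels and data are made up for illustration.
trajectories = [
    ["hypertension", "type_2_diabetes", "retinopathy"],
    ["hypertension", "heart_attack"],
    ["hypertension", "type_2_diabetes", "neuropathy"],
    ["asthma", "hypertension", "type_2_diabetes"],
]

def fit_bigram(sequences):
    """Count how often each token directly follows another token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Convert the follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # `prev` was never followed by anything in the data
    return {tok: n / total for tok, n in counts[prev].items()}

model = fit_bigram(trajectories)
probs = next_token_probs(model, "hypertension")
print(probs)
```

In this toy data, three of the four people with hypertension are next diagnosed with type 2 diabetes, so the sketch assigns that outcome the highest probability. A large language model does something loosely comparable, but it conditions on the entire history rather than only the most recent event.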
But tackling multiple diseases at once is another beast altogether.
Earlier this year, an AI called Foresight took medical prediction a step further. Trained on 57 million anonymized health records from England’s National Health Service, Foresight learned to predict hospitalizations, heart attacks, and hundreds of other conditions, but the algorithm was limited to Covid-19 research due to privacy concerns.
Seeing Eye
The German team designed Delphi to recognize the diagnostic code for each illness as a token. These codes are standardized globally. The team then modified the large language model to incorporate new information—for example, blood test results—to re-evaluate its predictions.
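As a rough illustration of treating diagnostic codes as tokens, the Python sketch below turns a health record into a time-ordered sequence of token IDs. It assumes ICD-10-style codes and an age attached to each diagnosis; the `Event` class, the helper names, and the sample records are hypothetical and are not taken from the team’s implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One diagnosis: the person's age at diagnosis and a diagnostic code."""
    age_years: float
    code: str  # assumed ICD-10-style code, e.g. "I10" for hypertension

def build_vocab(records):
    """Assign each distinct diagnostic code an integer token ID."""
    codes = sorted({event.code for record in records for event in record})
    return {code: token_id for token_id, code in enumerate(codes)}

def encode(record, vocab):
    """Order one record by age and map each code to its token ID."""
    ordered = sorted(record, key=lambda event: event.age_years)
    return [(event.age_years, vocab[event.code]) for event in ordered]

# Hypothetical records; events may arrive out of chronological order.
records = [
    [Event(52.0, "E11"), Event(45.5, "I10")],
    [Event(3.0, "B01"), Event(30.0, "J45"), Event(61.0, "I10")],
]

vocab = build_vocab(records)
encoded = encode(records[0], vocab)
print(vocab)
print(encoded)
```

Keeping the age next to each token is one simple way to preserve timing, which matters if a model is meant to predict when a disease might appear and not just which one.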
Delphi trained on over 400,000 comprehensive health records for 1,258 diseases, alongside factors like sex, body mass index, and other self-reported lifestyle indicators, including smoking and alcohol habits. The AI immediately found trends on the population level based on age and other demographic patterns. For example, the incidence of chickenpox peaked in infancy, whereas asthma tended to stick around. A person’s biological sex also had pronounced effects for risk of diabetes, depression, and heart attack.
For most diseases, Delphi matched or outperformed clinical risk scores and medical AI predictors built for individual diseases. It also beat other algorithms that analyze biomarkers (often specific proteins or other molecules in the blood) at predicting the risk of some diseases up to two decades in advance.
Delphi offers “the great advantage of enabling the simultaneous assessment of more than 1,000 diseases and their timing at any given time,” wrote the team.
The AI was especially helpful for analyzing cardiovascular disease and dementia, both of which follow a relatively stable pattern of progression. However, it struggled with Type 2 diabetes, which has a more variable trajectory depending on lifestyle changes.
Next, they challenged Delphi with nearly two million Danish health records without tweaking the algorithm. The database, the Danish National Patient Registry, contains medical records spanning nearly half a century. Delphi’s prediction accuracy barely dropped, suggesting the AI is generalizable to health record datasets beyond those it trained on.
Delphi has other perks. For one, it can generate and learn from synthetic medical records data to reduce the chance it violates participants’ privacy. The AI can also “explain” itself. Some diseases, such as diabetes, are tied to additional health challenges, like issues with a patient’s eyesight or peripheral nerve problems. Delphi clusters these symptoms, making it useful for scientists exploring the genes or cellular drivers behind these connections.
The team stresses Delphi only reveals association, not causation. But they built the AI so it can easily incorporate other data—such as genomes, diagnostic images, biomarkers, or even data from wearables—to further improve its predictions. They’re now testing the tool in other countries and populations.
Like other AI algorithms, Delphi learns to make predictions from its training data, and that includes the biases therein. UK Biobank health records generally skew white, middle-aged, and educated. For cancer patients, only those who survive are included in the database, which could also influence the AI’s predictions. Very little data is available for people aged 80 and older, so Delphi can’t reliably model their health trajectory into the twilight years.
Even so, the AI could help find people who would benefit from diagnostic tests or screening programs, such as for breast cancer, even if they don’t meet the conventional criteria.
“This research looks to be a significant step towards scalable, interpretable, and—most importantly—ethically responsible form of predictive modeling in medicine,” said Gustavo Sudre at King’s College London, who was not involved in the study.