Predictive models for lung cancer often fall short when applied beyond the clinical settings in which they were developed, especially when evaluating biopsied lung nodules.
A recent study in Radiology: Artificial Intelligence highlights this challenge and offers guidance for improving models’ generalizability across institutions and clinical settings, pointing to strategies such as image harmonization and fine-tuning models on local patient populations.
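The study itself does not publish code, but as a rough sketch of what “fine-tuning on a local patient population” can mean in practice, the PyTorch snippet below freezes a pretrained backbone and retrains only a new classification head on institution-specific data. The dataset, architecture and hyperparameters here are placeholder assumptions for illustration, not the authors’ method.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Stand-in for a local cohort: random tensors in place of nodule patches
# with benign/malignant labels. Real use would load institution-specific
# CT-derived images instead.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 2, (64,))
local_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# Pretrained network as a stand-in for a model developed elsewhere,
# e.g. on a screening cohort.
model = resnet18(weights="IMAGENET1K_V1")

# Freeze the backbone; only the new two-class head will be updated.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new head is trainable by default

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for x, y in local_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Freezing the backbone is one conservative choice for small local cohorts; with more local data, unfreezing deeper layers is a common alternative.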
Every year, more than 1.5 million Americans have at least one pulmonary nodule detected, either incidentally on routine chest CT or during lung cancer screening. Nodule biopsy carries risks, costs and anxiety for patients. Because 95% of indeterminate pulmonary nodules are ultimately found to be benign, clinical guidelines recommend risk-stratifying nodules before resorting to invasive percutaneous or surgical interventions.
“We want to diagnose these pulmonary nodules earlier and noninvasively, and we want to avoid performing a biopsy on benign nodules,” said study lead author Thomas Z. Li, PhD, from the Medical-image Analysis and Statistical Interpretation (MASI) Lab at Vanderbilt University in Nashville, TN. “Better noninvasive diagnostic tools can help us do that.”
Statistical models for predicting lung cancer have the potential to improve risk stratification, aiding in earlier diagnosis of malignancy as well as reducing the risk of morbidity, costs and unnecessary anxiety associated with the workup of benign disease. Several models have been validated, but a systematic analysis of their performance is lacking.
To learn more, Dr. Li and colleagues evaluated eight validated predictive models developed to stratify pulmonary nodules. The set included clinical prediction models, cross-sectional and longitudinal AI models, and multimodal approaches. The researchers tested the models on nine patient cohorts spanning three clinical settings: nodules detected during screening, incidentally detected nodules and nodules deemed suspicious enough to warrant a biopsy.
“We wanted to know, in these three clinical settings, how do the models that have been developed so far perform?” Dr. Li said.
Analysis revealed that the eight lung cancer prediction models failed to generalize well across clinical settings and sites outside of their training distributions.
The findings show that a single external validation set is not enough to guarantee generalization performance, Dr. Li noted.
“You’re training the model on one group, which is a healthy screening population, and then you’re trying to apply it to a different group, and what we see is that it doesn’t work,” he said. “We need the model to be evaluated across multiple different institutions, and we need it to be evaluated in different clinical settings.”
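The study’s actual evaluation pipeline is not reproduced here, but as a schematic of why per-cohort evaluation matters, the sketch below scores a hypothetical model separately on each cohort rather than on one pooled set, which surfaces degradation under distribution shift that a single external validation set would hide. All cohorts, labels and shift values are synthetic, fabricated purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def fake_cohort(n, shift):
    """Synthetic cohort: model scores whose separation from the true
    labels degrades as the distribution shift increases."""
    y = rng.integers(0, 2, n)
    scores = y * (1.0 - shift) + rng.normal(0, 0.5, n)
    return y, scores

# One evaluation per cohort and clinical setting, not a single pooled set.
cohorts = {
    "screening_site_A": fake_cohort(500, shift=0.0),   # matches training distribution
    "incidental_site_B": fake_cohort(500, shift=0.4),  # moderate shift
    "biopsied_site_C": fake_cohort(500, shift=0.7),    # strong shift
}

for name, (y_true, y_score) in cohorts.items():
    print(f"{name}: AUC = {roc_auc_score(y_true, y_score):.2f}")
```

Reporting discrimination per cohort, as in this toy example, is one way to operationalize Dr. Li’s point that performance on a single external set does not guarantee generalization.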