Artificial intelligence (AI)–assisted mammogram reading reduced radiologists’ workload by approximately 40% while maintaining screening performance, according to a new study published in Radiology.1
Prior evidence suggests that using AI for decision support in breast cancer screening can improve radiologists’ reading performance. Numerous studies have assessed how well AI-assisted screening performs; this study instead examined how much AI could reduce radiologists’ workload if it were allowed to interpret examinations on its own in cases where the model performs as well as or better than radiologists.
The study evaluated an AI model for screening mammography that outputs a probability of malignancy (POM) along with a measure of its own uncertainty. The researchers proposed a hybrid reading strategy: when the model is confident in its prediction, its recall decision (whether the woman is called back for further assessment) is used on its own, and when it is uncertain, the examination goes to standard double reading by radiologists.
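The routing rule of the hybrid strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `hybrid_recall`, the 0.5 threshold values, and the `double_read` callback standing in for the radiologists' decision are all hypothetical.

```python
from typing import Callable, Tuple

def hybrid_recall(
    ai_pom: float,
    ai_uncertainty: float,
    double_read: Callable[[], bool],
    uncertainty_threshold: float = 0.5,  # hypothetical cut-off for "confident"
    recall_threshold: float = 0.5,       # hypothetical POM operating point
) -> Tuple[bool, str]:
    """Return (recall decision, reader) for one screening examination."""
    if ai_uncertainty < uncertainty_threshold:
        # Confident: the AI's recall decision stands without a radiologist read.
        return ai_pom >= recall_threshold, "AI"
    # Uncertain: fall back to standard double reading by radiologists.
    return double_read(), "radiologists"
```

For example, `hybrid_recall(0.9, 0.1, lambda: False)` returns `(True, "AI")`: the model is confident, so its recall decision stands and the radiologists' callback is never consulted.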
Researchers compiled digital mammographic screening examinations performed between July 2003 and August 2018 at the prevention screening unit in Utrecht, the Netherlands. Unlike previous models based on a single neural network, which are standard for examination-level predictions, the AI interpretation model assessed examinations in 3 steps:
- A high-sensitivity region detection algorithm that proposes regions of interest
- A region classification network
- The generation of an examination-level conclusion
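The 3-step structure can be outlined as a small pipeline. In this hypothetical sketch, `detect` and `classify` stand in for the detection algorithm and the region classification network, and the examination-level conclusion simply takes the highest region POM; that aggregation rule is an assumption for illustration, not a detail reported in the study.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Region:
    """A candidate region of interest (hypothetical fields)."""
    x: int
    y: int
    size: int

def read_examination(
    images: Sequence[object],
    detect: Callable[[object], List[Region]],
    classify: Callable[[object, Region], float],
) -> float:
    """Return an examination-level POM from per-region POMs."""
    region_poms: List[float] = []
    for image in images:  # e.g., the views in one screening examination
        for region in detect(image):  # step 1: high-sensitivity region proposals
            region_poms.append(classify(image, region))  # step 2: per-region POM
    # Step 3 (assumed rule): the most suspicious region decides the examination.
    return max(region_poms, default=0.0)
```

Keeping detection and classification as separate, swappable callables mirrors the multistage design and makes it possible to attach an uncertainty estimate to the classification stage alone.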
The certainty of the model’s predictions was based on the region classification stage and its effect on the uncertainty of the entire model. The region detection network was considered but ultimately omitted from the uncertainty estimate: because it operates at high sensitivity, its uncertainty was judged unlikely to cause false-negative errors at the examination level. Performance was evaluated at the examination level using the area under the receiver operating characteristic curve (AUC).
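The AUC can be read as the probability that a randomly chosen malignant examination receives a higher score than a randomly chosen nonmalignant one. The sketch below computes it by direct pairwise comparison (equivalent to the Mann-Whitney U statistic); the function name and toy inputs are illustrative only, and real evaluations typically rely on a library routine such as scikit-learn's `roc_auc_score`.

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of malignant/nonmalignant pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one nonmalignant examination")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # malignant examination correctly scored higher
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))
```

With scores `[0.9, 0.2, 0.5, 0.1]` and labels `[1, 1, 0, 0]`, 3 of the 4 malignant-versus-nonmalignant pairs are ranked correctly, giving an AUC of 0.75.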
AI-Assisted Mammogram Screening Proficiency
The data set, composed of 41,471 examinations from 15,524 women with a median age of 59 years, included 332 screen-detected cancers and 34 interval cancers. The AI model achieved an AUC of 0.92 (95% CI, 0.89-0.94) for detecting malignancies, indicating strong discrimination between malignant and nonmalignant examinations. Single reading (1 radiologist) and double reading (2 radiologists) had sensitivities of 69.2% and 72.3%, respectively, meaning they detected about 69 and about 72 of every 100 cancers. Their specificities of 98.2% (95% CI, 98.0-98.3) and 98.3% (95% CI, 98.2-98.4), respectively, show that radiologists rarely recalled women without cancer, although they missed some cancers, which lowered their sensitivity. When the AI model was set to match those specificities, its sensitivity was lower than that of both single reading (62.1%; 95% CI, 55.0-68.9; P = .01) and double reading (61.6%; 95% CI, 54.3-68.6; P < .001). These data show that although the AI detected a substantial share of cancers, it was less sensitive than radiologists when forced to match their high specificity.
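Comparing the AI with radiologists at matched specificity means choosing the model's recall threshold so that its specificity reaches the radiologists' level and then reading off its sensitivity. A minimal sketch of that procedure on toy data follows; the function name and the rule of picking the lowest qualifying threshold are assumptions for illustration, not the study's exact method.

```python
from typing import Sequence, Tuple

def sensitivity_at_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_specificity: float,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity), recalling when score > threshold.

    Picks the lowest threshold whose specificity reaches the target, which
    keeps as much sensitivity as possible at that specificity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both malignant and nonmalignant examinations")
    for t in sorted(set(scores)):
        specificity = sum(s <= t for s in neg) / len(neg)  # true negatives / all negatives
        if specificity >= target_specificity:
            sensitivity = sum(s > t for s in pos) / len(pos)  # true positives / all positives
            return t, sensitivity, specificity
    raise ValueError("target specificity cannot be reached")
```

On the toy scores `[0.95, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]` with labels `[1, 1, 1, 0, 0, 0, 0, 0]`, raising the target specificity from 0.8 to 1.0 lifts the threshold from 0.3 to 0.6 and drops sensitivity from 3/3 to 2/3, the same trade-off described when the model is held to radiologists' specificity.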
The uncertainty metric that produced the best results was the entropy of the mean POM of the most suspicious region. Under this metric, the AI assigns each region of the breast a POM score and then measures how confident it is in that score; high entropy means the model is uncertain. With this metric, the model was uncertain in 61.9% of examinations, which would be referred to radiologists for double reading, and confident in the remaining 38.1%.
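Entropy here is the binary (Shannon) entropy of a probability: it peaks at 1 bit when the averaged POM is 0.5 and falls toward 0 as the POM approaches 0 or 1. The sketch below assumes the model produces several POM estimates for a region (for example, from an ensemble or repeated stochastic passes) that are averaged before the entropy is taken. How the study generated those estimates and the cut-off it used to label an examination uncertain are not given here, so `is_uncertain` and its 0.5 cut-off are hypothetical.

```python
import math
from typing import Sequence

def entropy_of_mean_pom(pom_samples: Sequence[float]) -> float:
    """Binary entropy, in bits, of the mean probability of malignancy."""
    p = sum(pom_samples) / len(pom_samples)
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction has zero entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def is_uncertain(pom_samples: Sequence[float], cutoff: float = 0.5) -> bool:
    """Hypothetical cut-off: examinations above it would go to double reading."""
    return entropy_of_mean_pom(pom_samples) > cutoff
```

Note that `[0.9, 0.1]` averages to 0.5 and receives maximal entropy even though each individual estimate looks confident, so disagreement between estimates is itself treated as uncertainty, whereas `[0.02, 0.04]` yields an entropy below 0.2 bits.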
The cancer detection and recall rates of the hybrid strategy were similar to those of standard double reading (recall rate, 23.7 vs 23.9 per 1000 examinations), yet 19% of recalls were triggered by the AI alone. The study authors considered this unfavorable because most women prefer that their mammograms be read by at least 1 radiologist,2 and they therefore suggested that radiologists also review examinations the AI recalls, even when the model is confident in its interpretation.
Overall, the hybrid reading strategy reduced radiologists’ workload to 61.9% of examinations, a 38.1% reduction, with a cancer detection rate of 6.6 per 1000 examinations (95% CI, 5.5-7.7) and a recall rate of 23.6 per 1000 (95% CI, 21.6-25.5).1
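The workload and rate figures follow from simple arithmetic: a rate per 1000 is the event count divided by the number of examinations, multiplied by 1000, and reading 61.9% of examinations corresponds to a 38.1% reduction. The helper names and the counts in the example below are hypothetical and are not taken from the study.

```python
def per_1000(events: int, examinations: int) -> float:
    """Express a count (cancers detected, recalls) per 1000 examinations."""
    return 1000 * events / examinations

def workload_reduction_pct(share_read_by_radiologists: float) -> float:
    """Percentage of examinations radiologists no longer need to read."""
    return 100 * (1 - share_read_by_radiologists)

# Hypothetical counts, for illustration only.
print(per_1000(66, 10_000))                     # detection rate per 1000 examinations
print(round(workload_reduction_pct(0.619), 1))  # workload reduction in percent
```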
“Even with this lower performance, leveraging the information gained by estimating the examinations where the model is certain, it is still possible to reduce the workload while maintaining the performance of standard double reading,” the study authors explained. “Applying the proposed strategy to a higher-performing model would likely improve the reduction in workload or improve the performance further.”
The authors noted several limitations. Uncertainty metrics derived from the POM may not reliably indicate how certain the model is, because deep neural networks tend to be overconfident in their predictions. Additionally, radiologists’ reading behavior was not considered, even though the prevalence of cancers and cancer subtypes may vary within the set of examinations they read, which could influence their reading strategy.
“Therefore, further research, ideally a prospective trial, is needed to determine how workload reduction in the number of examinations obtained using this method would translate to a reduction in reading time,” the study authors concluded. “The best uncertainty metric could guide a reading strategy to reduce workload by approximately 40% without decreasing performance even with a model that has lower performance than that of a single radiologist.”
References
1. Verboom SD, Kroes J, Pires S, Broeders MJM, Sechopoulos I. AI should read mammograms only when confident: a hybrid breast cancer screening reading strategy. Radiology. Published online August 19, 2025. doi:10.1148/radiol.242594
2. Ongena YP, Yakar D, Haan M, Kwee TC. Artificial intelligence in screening mammography: a population survey of women’s preferences. J Am Coll Radiol. 2021;18(1 Pt A):79-86. doi:10.1016/j.jacr.2020.09.042