Algorithms submitted for the AI Challenge hosted by RSNA have shown excellent performance for detecting breast cancers on mammography images, increasing screening sensitivity while maintaining low recall rates, according to a study published in Radiology.
The RSNA Screening Mammography Breast Cancer Detection AI Challenge was a crowdsourced competition that took place in 2023, with more than 1,500 teams participating. The Radiology article details an analysis of the algorithms’ performance, led by Yan Chen, PhD, a professor in cancer screening at the University of Nottingham in the United Kingdom.
“We were overwhelmed by the volume of contestants and the number of AI algorithms that were submitted as part of the Challenge,” Prof. Chen said. “It’s one of the most participated-in RSNA AI Challenges. We were also impressed by the performance of the algorithms given the relatively short window allowed for algorithm development and the requirement to source training data from open-sourced locations.”
The goal of the Challenge was to source AI models that improve the automation of cancer detection in screening mammograms, helping radiologists work more efficiently, improving the quality and safety of patient care, and potentially reducing costs and unnecessary medical procedures.
RSNA invited participation from teams across the globe. Emory University in Atlanta, Georgia, and BreastScreen Victoria in Australia provided a training dataset of around 11,000 breast screening images, and Challenge participants could also source publicly available training data for their algorithms.
Prof. Chen’s research team evaluated 1,537 working algorithms submitted to the Challenge, testing them on a set of 10,830 single-breast exams—completely separate from the training dataset—that were confirmed by pathology results as positive or negative for cancer.
Altogether, the algorithms yielded median rates of 98.7% specificity for confirming no cancer was present on mammography images, 27.6% sensitivity for positively identifying cancer, and a recall rate (the percentage of cases that the AI judged positive) of 1.7%. Combining the top 3 and top 10 performing algorithms boosted sensitivity to 60.7% and 67.8%, respectively.
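For readers less familiar with these screening metrics, the short Python sketch below shows how the three rates quoted above are commonly computed from per-exam results. It is only an illustration: the function name `screening_metrics` and the five-exam toy data are hypothetical and are not taken from the study.

```python
import math


def screening_metrics(truth, flagged):
    """Compute sensitivity, specificity and recall rate.

    truth:   one boolean per exam, True if cancer was confirmed.
    flagged: one boolean per exam, True if the algorithm judged it positive.
    Both names are illustrative assumptions, not the study's code.
    """
    if len(truth) != len(flagged) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = [(bool(t), bool(f)) for t, f in zip(truth, flagged)]
    tp = sum(t and f for t, f in pairs)          # cancers that were flagged
    fn = sum(t and not f for t, f in pairs)      # cancers that were missed
    fp = sum(f and not t for t, f in pairs)      # healthy exams that were flagged
    tn = sum(not t and not f for t, f in pairs)  # healthy exams left unflagged

    def ratio(num, den):
        # Avoid a ZeroDivisionError when a class is absent from the data.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),        # share of cancers detected
        "specificity": ratio(tn, tn + fp),        # share of healthy exams cleared
        "recall_rate": (tp + fp) / len(pairs),    # share of all exams flagged
    }


# Toy data (hypothetical): five exams, two of them with confirmed cancer.
toy_truth = [True, True, False, False, False]
toy_flagged = [True, False, False, False, True]
toy_result = screening_metrics(toy_truth, toy_flagged)
print(toy_result)
```

In this toy case one of the two cancers is caught and one of the three healthy exams is flagged, which illustrates how a model can combine high specificity with modest sensitivity.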
“When ensembling the top performing entries, we were surprised that different AI algorithms were so complementary, identifying different cancers,” Prof. Chen said. “The algorithms had thresholds that were optimized for positive predictive value and high specificity, so different cancer features on different images were triggering high scores differently for different algorithms.”
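The article does not describe how the top entries were combined, so the following Python sketch is only one assumed possibility: an "any model flags it" rule, in which an exam is judged positive if at least one algorithm scores it above its own threshold. The function names and the toy data are hypothetical. The sketch shows why complementary models, each catching a different cancer, can raise sensitivity when pooled.

```python
def ensemble_any(flags_per_model):
    """Flag an exam if any model flagged it (assumed rule, not the study's)."""
    return [any(exam_flags) for exam_flags in zip(*flags_per_model)]


def sensitivity(truth, flagged):
    """Share of confirmed cancers that were flagged."""
    hits = sum(1 for t, f in zip(truth, flagged) if t and f)
    cancers = sum(1 for t in truth if t)
    return hits / cancers if cancers else float("nan")


# Toy data (hypothetical): four exams, the first three with confirmed cancer.
truth = [True, True, True, False]
model_a = [True, False, False, False]   # catches only the first cancer
model_b = [False, True, False, False]   # catches only the second cancer
model_c = [False, False, False, True]   # misses all cancers, one false alarm

combined = ensemble_any([model_a, model_b, model_c])

print(sensitivity(truth, model_a))   # single model
print(sensitivity(truth, combined))  # pooled models
```

Under this rule the pooled result detects two of the three cancers where each single model detects at most one, but it also inherits every model's false alarms, so the recall rate can rise along with sensitivity.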
According to the researchers, an ensemble of the 10 best-performing algorithms performed close to the level of an average screening radiologist in Europe or Australia.