AI-Powered Ovarian Cancer Detection Through DNA Methylation


Early diagnosis of ovarian cancer is critical because it significantly improves prognosis and survival. Currently, about 75% of patients are diagnosed at an advanced stage (stage III or IV), where survival rates drop to 40% to 50%. In contrast, a stage I diagnosis carries an 80% to 90% survival rate. As with pancreatic cancer, the main challenge is that early detection is difficult: conventional methods such as imaging and CA125 testing lack sufficient sensitivity and specificity for early-stage disease.

Researchers, including Jesús Gonzalez Bosquet, MD, PhD, are exploring the use of DNA methylation patterns as markers for early diagnosis of ovarian cancer, detectable through a liquid biopsy from blood. This method aims to identify specific methylation changes in cell-free DNA (cfDNA) circulating in the blood, which could indicate the presence of cancer at an earlier stage.

“We did preliminary analysis in methylation in ovarian cancer. It turns out that there’s DNA floating in the blood that can be analyzed, and some of them is methylated. So potentially, methylation patterns in DNA could be a good marker for diagnosis of ovarian cancer,” explained Gonzalez Bosquet, who is an associate professor of obstetrics and gynecology and gynecologic oncology at the University of Iowa.

The researchers used a methylation chip, the Illumina Infinium MethylationEPIC BeadChip Array, which probes over 850,000 methylation sites across the genome. They analyzed DNA collected from ovarian tumor samples and normal controls. To manage the vast number of variables, they employed machine learning and deep learning methods: MethylNet for the initial reduction, followed by univariate ANOVA analyses and multivariate lasso regression. This stepwise process narrowed the most informative probes from over 850,000 to a highly predictive set of 9.
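To make the univariate ANOVA step concrete, here is a minimal sketch of ranking probes by a one-way F statistic and keeping the top few. It is an illustration only, not the study's code: the data are synthetic, the 50-probe matrix, the three "informative" probe indices, and the helper names `anova_f` and `top_probes` are all assumptions made for the example.

```python
import random

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of floats)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def top_probes(beta, labels, keep):
    """Score every probe (column of beta) and return the indices of the best `keep`."""
    scores = []
    for j in range(len(beta[0])):
        tumor = [row[j] for row, y in zip(beta, labels) if y == 1]
        normal = [row[j] for row, y in zip(beta, labels) if y == 0]
        scores.append((anova_f([tumor, normal]), j))
    scores.sort(reverse=True)  # largest F statistic first
    return sorted(j for _, j in scores[:keep])

# Synthetic methylation beta values (0 to 1): 99 tumors, 12 controls, 50 probes.
# Probes 3, 17 and 42 are hypothetical "informative" probes, shifted in tumors.
random.seed(0)
labels = [1] * 99 + [0] * 12
informative = {3, 17, 42}
beta = []
for y in labels:
    row = []
    for j in range(50):
        shift = 0.3 if (y == 1 and j in informative) else 0.0
        row.append(min(1.0, max(0.0, random.gauss(0.4 + shift, 0.05))))
    beta.append(row)

selected = top_probes(beta, labels, keep=3)
print(selected)
```

A filter like this looks at one probe at a time, which is why a multivariate step such as lasso regression is still needed afterward to handle probes that carry overlapping information.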

“The difficult part is trying to reduce the numbers of variables,” said Gonzalez Bosquet. “So, we used some machine learning and deep learning methodology.”

This was a pilot case-control study that included 99 high-grade serous cancer (HGSC) tissue samples and 12 normal fallopian tube samples as controls from the University of Iowa’s Gynecologic Oncology Bank. The initial prediction models using MethylNet were highly accurate, achieving an area under the curve (AUC) of 100%. After optimization through variable reduction using ANOVA and lasso regression, a model with only 9 methylated probes also achieved an AUC of 100% in predicting HGSC.
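For readers less familiar with AUC, the sketch below computes it in its pairwise form: the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as half. The scores are made up for illustration and the function name `auc` is an assumption; in a set of 99 cases and 12 controls there are 99 × 12 = 1,188 such pairs, and an AUC of 100% means every one of them is ordered correctly.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0      # case ranked above control
            elif c == n:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))

# Made-up model scores: two cases (label 1) and two controls (label 0).
perfect = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
imperfect = auc([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0])
print(perfect, imperfect)  # 1.0 0.75
```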

Using a small number of probes, such as the 9 identified in this study, is advantageous for developing a practical diagnostic test. A model with fewer variables is less complex, more easily validated, and more feasible for clinical implementation. The initial methylation chip provided over 850,000 data points, which is impractical for a diagnostic test. By reducing this to 9 highly informative probes while maintaining high accuracy (AUC of 100% in the initial dataset), the researchers made significant progress towards a clinically usable test.
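The way lasso regression shrinks a model to a handful of variables can be sketched with a small coordinate-descent solver. This is a simplified linear lasso on synthetic data, not the study's implementation (which may well have used a logistic variant and a dedicated library); the penalty value, the data, and the names `soft_threshold` and `lasso` are assumptions for illustration.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso(X, y, penalty, sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + penalty*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    # Center columns and response so no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual, adding back j's own contribution.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * weights[j]
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                weights[j] = new_w
    return weights

# Synthetic data: 8 candidate "probes", only columns 0 and 3 drive the outcome.
random.seed(1)
X = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

weights = lasso(X, y, penalty=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected)
```

The L1 penalty sets the coefficients of uninformative columns to exactly zero, which is the property that lets a lasso model end up with a short list of probes rather than thousands of small weights.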

The predictive model was validated in two ways. First, it was tested in an independent dataset (GSE65820) from a different geographical location (Australia) with similar patient ancestry. A larger 11,167-probe model showed an excellent AUC of 98%, and the simplified 9-probe model achieved a very good AUC of 84% in this external validation. Second, the models were re-trained, validated, and tested using an independent machine learning platform, TensorFlow, which also showed excellent performance.

Strengths of the study include the use of a well-annotated single-institution biobank, focusing on a homogeneous phenotype of high-grade serous cancer, and validating the model in an independent dataset from a different geographical location. The use of advanced AI and machine learning techniques for probe selection is also a notable strength.

Limitations include the relatively small and imbalanced sample size (99 cases vs 12 controls in the initial set, and even more imbalanced in the validation set), and the fact that the initial study used tissue samples rather than blood.
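A short worked example shows why a 99-to-12 imbalance matters. The sketch below scores a deliberately useless classifier that labels every sample as cancer; the function name `rates` is an assumption, and the classifier is hypothetical rather than anything from the study.

```python
def rates(y_true, y_pred):
    """Accuracy, sensitivity, specificity and balanced accuracy for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Same class sizes as the pilot set: 99 cases (1) and 12 controls (0).
y_true = [1] * 99 + [0] * 12
always_cancer = [1] * len(y_true)  # hypothetical classifier that always predicts cancer

result = rates(y_true, always_cancer)
print(result)
```

This classifier reaches about 89% accuracy (99 of 111) while its specificity is 0 and its balanced accuracy is 0.5, meaning every control is a false positive. With so few controls, metrics that weigh both classes, such as AUC, sensitivity, and specificity, are more informative than raw accuracy.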

“One of the problems with ovarian cancer always have seen the false positives,” said Gonzalez Bosquet. “We need to now go to population study, try to identify this in blood first, then if we can then put it in a prospective trial to see if it’s valid in a prospective way.”

The “black box” effect of some deep-learning tools like MethylNet in variable selection was also noted by researchers, though mitigated by downstream analyses. The generalizability of the model might also be limited as it was tested in populations with similar backgrounds, suggesting a need for testing in more diverse populations.

The ultimate goal of this research is to develop a method that can be used for early detection of ovarian cancer using cfDNA from blood samples, ideally for population-level screening.

The next steps are to move from tissue analysis to population studies that identify these methylation markers in blood samples, to conduct a prospective trial that validates the findings in a real-world setting, and to optimize the model’s performance in blood across a diverse population of patients at all stages.

“It needs validation in real life to have really impact. As we know, with this environment, it is difficult to get funding. [But] we still insist on trying to get this further, because I think it’s a potential good mechanism for detection,” Gonzalez Bosquet said.

REFERENCE:
Bosquet JG, Wagner VM, Russo D, et al. Identifying ovarian cancer with machine learning DNA methylation pattern analysis. Sci Rep. 2025 Jul 1;15(1):20910. doi: 10.1038/s41598-025-05460-9.
