Introduction
Obstructive sleep apnea (OSA) represents one of the most prevalent sleep-disordered breathing conditions globally. Recent global burden analysis estimates that nearly 1 billion adults worldwide are affected by varying degrees of OSA.1 Among elderly populations, OSA prevalence increases dramatically with age, affecting up to 90% of men and 78% of women in certain cohorts, making it a critical public health concern.2 OSA is characterized by recurrent episodes of complete or partial upper airway obstruction during sleep, leading to intermittent hypoxia, sleep fragmentation, and sympathetic nervous system activation.3
The association between OSA and cardiovascular disease involves multiple interconnected pathophysiological pathways that extend far beyond simple mechanical airway obstruction. The American Heart Association’s 2021 Scientific Statement identified OSA as an independent risk factor for cardiovascular disease.4 Intermittent hypoxia, the hallmark of OSA, triggers a cascade of molecular and cellular responses including oxidative stress, systemic inflammation, endothelial dysfunction, and metabolic dysregulation.5 These repetitive hypoxia-reoxygenation cycles activate hypoxia-inducible factor-1α (HIF-1α), leading to increased levels of pro-inflammatory mediators such as tumor necrosis factor-α (TNF-α), interleukin-6 (IL-6), and the acute-phase reactant C-reactive protein.6
Hemodynamic alterations during apneic episodes create significant cardiovascular stress through acute blood pressure surges, increased cardiac afterload, and enhanced sympathetic nervous system activity.7 The resulting chronic hypertension, combined with increased platelet aggregation and coagulation cascade activation, creates a prothrombotic state that substantially increases atherothrombotic risk.8,9 Furthermore, metabolic consequences of OSA, including insulin resistance, dyslipidemia, and adipokine dysregulation, contribute synergistically to accelerated atherogenesis.10
Carotid atherosclerosis (CAS) represents a particularly concerning manifestation of OSA-related cardiovascular disease, serving as both a marker of systemic atherosclerotic burden and a direct predictor of cerebrovascular events.11 The carotid arteries are particularly susceptible to OSA-related damage due to their anatomical location, exposure to fluctuating pressures during sleep, and high shear stress patterns.12 Studies have consistently demonstrated that OSA patients exhibit significantly increased carotid intima-media thickness, plaque prevalence, and plaque instability compared to matched controls.13
The elderly OSA population faces unique and compounded risks for CAS development. Age-related vascular changes, including arterial stiffening, endothelial senescence, and reduced nitric oxide bioavailability, create a substrate particularly vulnerable to OSA-induced damage.14 The prevalence of comorbidities such as hypertension, diabetes mellitus, and dyslipidemia in elderly OSA patients creates complex interaction networks that amplify cardiovascular risk beyond simple additive effects.15,16
Despite the established association between OSA and CAS, current risk stratification approaches remain inadequate for this population. Traditional cardiovascular risk calculators, such as the Framingham Risk Score and ASCVD Risk Calculator, were developed primarily for general populations and fail to capture OSA-specific pathophysiological mechanisms.17 These tools typically rely on conventional risk factors (age, sex, smoking, blood pressure, lipids) while overlooking critical OSA-related parameters such as hypoxemic burden, sleep fragmentation indices, and inflammatory markers.18
Diagnostic approaches for CAS detection also present significant limitations. While carotid ultrasonography remains the gold standard for CAS assessment, it is operator-dependent, costly for population screening, and may miss early-stage atherosclerotic changes.19 More advanced imaging modalities, including computed tomography angiography and magnetic resonance imaging, offer superior accuracy but are impractical for routine screening due to cost, radiation exposure, and accessibility constraints.20
Recent advances in machine learning and artificial intelligence have opened new paradigms for cardiovascular risk assessment, offering the potential to capture complex, non-linear relationships that traditional statistical approaches cannot adequately model. Several pioneering studies have demonstrated the potential of machine learning approaches for CAS prediction in general populations. Chen et al developed random forest models achieving area under the curve (AUC) values of 0.78–0.85 for carotid plaque prediction.21 Wu et al reported an XGBoost ensemble model with AUC of 0.864 for asymptomatic CAS detection.22 However, these studies predominantly focused on general populations and did not specifically address the unique pathophysiological profile of OSA patients.
The development of CAS prediction models specifically for OSA populations requires consideration of sleep-specific parameters that traditional cardiovascular models ignore. Polysomnographic metrics such as the apnea-hypopnea index (AHI) and percentage of time spent with oxygen saturation below 90% (T90) provide crucial insights into the severity and pattern of nocturnal hypoxemia.23
Despite the growing recognition of OSA as a major cardiovascular risk factor, several critical knowledge gaps persist in CAS risk assessment for elderly OSA patients: 1) lack of OSA-specific prediction models; 2) underutilization of OSA-specific parameters; 3) limited focus on elderly populations; 4) insufficient exploration of complex interactions. These limitations hinder clinicians’ ability to identify high-risk patients for targeted interventions.
This study addresses these limitations by developing what is, to our knowledge, the first machine learning-based CAS prediction model specifically designed for elderly OSA patients. Our approach integrates traditional cardiovascular risk factors with OSA-specific polysomnographic parameters, utilizing advanced ensemble learning algorithms capable of capturing complex feature interactions and non-linear relationships. Early identification of high-risk elderly OSA patients could enable targeted preventive interventions, optimize resource allocation for diagnostic imaging, and guide personalized treatment strategies.
Methods
The overall machine learning methodology pipeline is illustrated in Figure 1, which outlines the systematic approach from data collection through model development and validation.
Figure 1 Methodology Flowchart.
Study Design and Population
This was a retrospective, multicenter observational study conducted from January 2015 to October 2017. Patients diagnosed with OSA were consecutively enrolled from six tertiary hospitals in China, including Chinese PLA General Hospital, Peking University International Hospital, and four other affiliated medical centers.
Inclusion criteria were: (1) age ≥60 years and (2) confirmed diagnosis of OSA by polysomnography (PSG). Exclusion criteria included: (1) chronic obstructive pulmonary disease and (2) malignant tumors.
OSA diagnosis and severity classification followed the American College of Physicians Clinical Practice Guidelines for the Diagnosis of Adult Obstructive Sleep Apnea. OSA was defined as AHI ≥5 events/hour with typical nocturnal symptoms (apnea episodes with snoring, daytime sleepiness) or AHI ≥15 events/hour with or without symptoms. OSA severity was classified based on AHI values: mild (5 ≤ AHI < 15 events/hour), moderate (15 ≤ AHI < 30 events/hour), and severe (AHI ≥ 30 events/hour).
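The severity bands above can be expressed as a small lookup. The following Python sketch is illustrative only; the function name and the label for AHI below 5 are assumptions, and it does not encode the symptom requirement that applies when AHI is between 5 and 15.

```python
def classify_osa_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events/hour) to the severity bands in the text.

    Hypothetical helper: it bands AHI only and ignores the symptom criterion.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "below diagnostic threshold"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, an AHI of 30.04 events/hour (the CAS-group mean reported in the Results) falls in the severe band.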
PSG monitoring employed standardized protocols across all centers. All PSG equipment met American Academy of Sleep Medicine technical standards and simultaneously recorded AHI, oxygen saturation, and other key parameters. Data analysis utilized the Australian Compumedics sleep analysis system with standardized interpretation following the latest AASM scoring manual guidelines.
CAS diagnosis was determined by experienced ultrasonographers at each tertiary hospital based on standardized carotid ultrasound examinations. Diagnostic criteria for atherosclerotic plaque included: (1) carotid intima-media thickness (IMT) ≥1.5 mm, or (2) focal IMT increase ≥50% compared to surrounding normal vessel wall thickness. CAS was diagnosed when atherosclerotic plaques were present in any of the common carotid artery, internal carotid artery, or external carotid artery. This study was approved by the Ethics Committee of the Chinese People’s Liberation Army General Hospital (S2019-352-01), and all participants provided written informed consent prior to their study enrolment. The study was conducted in accordance with the Declaration of Helsinki.
Data Collection and Variables
Clinical data were collected retrospectively from electronic medical records at each participating center using standardized data extraction protocols. A total of 19 potential predictive variables were systematically collected, including demographic characteristics, anthropometric measurements, lifestyle factors, OSA-related parameters from PSG studies, cardiovascular parameters, and laboratory findings.
Demographic and lifestyle variables included age, sex, body mass index (BMI), smoking history, and alcohol consumption status. OSA-related parameters derived from PSG studies included apnea-hypopnea index (AHI), mean oxygen saturation (MSpO2), and percentage of total sleep time with oxygen saturation below 90% (T90). Cardiovascular parameters comprised systolic blood pressure (SBP), diastolic blood pressure (DBP), and heart rate (HR) measured during routine clinical assessment. Laboratory parameters included fasting glucose (Glu), total cholesterol (TC), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), platelet count (Plt), white blood cell count (WBC), and red blood cell count (RBC).
Data quality assurance was maintained through standardized collection protocols across all participating centers and systematic verification of data completeness and consistency. All collected data were complete with no missing values for the final analytical dataset.
Data Preprocessing
Data preprocessing followed a systematic pipeline to ensure optimal model performance and reproducibility (Figure 1). All analyses were conducted using R statistical software with reproducibility ensured through fixed random seed settings (seed = 123).
Data standardization was performed using Z-score normalization to eliminate scale differences between variables. All continuous variables were standardized by subtracting the mean and dividing by the standard deviation (Z = (x – μ)/σ), where μ and σ were calculated from the training dataset. Categorical variables were maintained as binary encodings (0/1) without transformation.
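As an illustration of the Z-score step, the sketch below estimates μ and σ on training values only and reuses them for test values. It is written in Python rather than the R used in the study; the function names and the example blood pressure values are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumed choice.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Estimate the mean and sample standard deviation from training data only."""
    return mean(train_values), stdev(train_values)


def transform(values, mu, sigma):
    """Apply Z = (x - mu) / sigma using previously fitted parameters."""
    return [(x - mu) / sigma for x in values]


# Hypothetical systolic blood pressure values (mmHg), for illustration only.
train_sbp = [120.0, 130.0, 140.0, 150.0]
test_sbp = [135.0, 160.0]

mu, sigma = fit_scaler(train_sbp)
train_z = transform(train_sbp, mu, sigma)
test_z = transform(test_sbp, mu, sigma)  # test data reuse the training parameters
```

Fitting the parameters on the training data alone keeps information from the test set out of the scaling step.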
Class imbalance was addressed using the ROSE (Random Over-Sampling Examples) technique to generate a balanced dataset for model training. ROSE is a smoothed bootstrap: it draws synthetic examples from a kernel density estimate centered on existing observations of each class, rather than duplicating minority class records, with the aim of improving model performance on imbalanced data while approximately preserving the class-conditional feature distributions.
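The idea behind ROSE can be sketched as a smoothed bootstrap: choose a class with equal probability, choose an observed example of that class, then draw a new point from a Gaussian kernel centered on it. The Python sketch below is a simplified illustration under stated assumptions, not the ROSE package itself; the function name is hypothetical, and the single fixed `bandwidth` replaces the data-driven, per-feature bandwidths that a full implementation would use.

```python
import random


def rose_like_sample(rows, labels, n_new, bandwidth=0.5, seed=123):
    """Generate n_new synthetic rows by a simplified smoothed bootstrap.

    rows   : list of lists of continuous (ideally standardized) features
    labels : list of 0/1 class labels, both classes present
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    if not by_class[0] or not by_class[1]:
        raise ValueError("both classes must be present")

    new_rows, new_labels = [], []
    for _ in range(n_new):
        y = rng.choice([0, 1])             # each class is drawn with probability 0.5
        center = rng.choice(by_class[y])   # pick one observed example of that class
        # Draw each feature from a Gaussian kernel centered on the example.
        new_rows.append([rng.gauss(v, bandwidth) for v in center])
        new_labels.append(y)
    return new_rows, new_labels
```

Because classes are sampled with equal probability, the synthetic dataset is approximately balanced regardless of the original class ratio.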
The dataset was randomly partitioned into training (80%) and testing (20%) sets using stratified sampling to maintain proportional representation of CAS cases in both subsets. The training set was used for model development, hyperparameter tuning, and cross-validation, while the testing set remained completely independent and was reserved solely for final model evaluation to assess generalizability.
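A stratified 80/20 partition can be sketched by splitting each outcome class separately. The Python example below is illustrative and is not the authors' R procedure; the function name is hypothetical, and rounding the per-class training count upward is an assumed convention. The class counts (273 CAS, 923 non-CAS) are those reported in the Results.

```python
import math
import random


def stratified_split(labels, train_fraction=0.8, seed=123):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = math.ceil(train_fraction * len(idx))  # assumed rounding rule
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


# 1 = CAS, 0 = non-CAS, using the cohort counts reported in the Results.
labels = [1] * 273 + [0] * 923
train_idx, test_idx = stratified_split(labels)
train_cas = sum(labels[i] for i in train_idx)
test_cas = sum(labels[i] for i in test_idx)
```

Under this rounding rule the CAS proportion is nearly identical in both subsets (about 22.9% in training and 22.7% in testing).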
All preprocessing steps were applied consistently across training and testing datasets, with normalization parameters derived exclusively from the training set and subsequently applied to the testing set to prevent data leakage and ensure unbiased performance evaluation.
Feature Selection
Feature selection was performed using LASSO (Least Absolute Shrinkage and Selection Operator) regression to identify the most predictive variables from the 19 potential features. LASSO regression adds an L1 penalty, proportional to the sum of the absolute values of the regression coefficients, to the model's loss function; this shrinks coefficients toward zero and sets the less important ones exactly to zero, thereby achieving both feature selection and model sparsity.
The regularization parameter (λ) was selected by cross-validation over a range of candidate values. The final λ was chosen to balance cross-validated error against model complexity, retaining predictive accuracy while achieving the desired level of feature sparsity.
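One way to see why an L1 penalty produces exact zeros is the soft-thresholding operator applied in coordinate-descent LASSO solvers. The Python sketch below shows only that operator, not a full LASSO fit; the function name and the example values are illustrative assumptions.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficient estimates and a hypothetical penalty.
raw_coefficients = [1.25, -1.25, 0.3, -0.1]
shrunk = [soft_threshold(z, 0.5) for z in raw_coefficients]
```

Coefficients whose magnitude does not exceed the penalty are removed from the model, while larger ones are retained in shrunken form; a larger λ therefore retains fewer features.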
Selected features were required to have non-zero coefficients in the final LASSO model, ensuring that only variables with meaningful predictive contributions were retained. This systematic approach reduced the feature space to a focused subset of clinically relevant variables for subsequent machine learning model development.
Machine Learning Model Development
Four machine learning algorithms were implemented to develop CAS prediction models using the selected feature subset: logistic regression, random forest, XGBoost (eXtreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). These algorithms were selected to represent different modeling approaches, including traditional statistical methods and advanced ensemble techniques capable of capturing complex nonlinear relationships.
Logistic regression served as the baseline model using generalized linear modeling with class weights to address potential imbalance issues. Random forest employed ensemble learning with multiple decision trees to improve prediction stability and reduce overfitting. XGBoost and LightGBM utilized gradient boosting frameworks with optimized implementations for enhanced performance and computational efficiency.
Hyperparameter optimization was performed for each algorithm using grid search and early stopping strategies to prevent overfitting. Key parameters were systematically tuned, including learning rates, tree depths, regularization parameters, and ensemble sizes. All models were trained on the preprocessed training dataset with consistent evaluation metrics to ensure fair comparison across algorithms.
Model training incorporated cross-validation procedures to assess performance stability and select optimal configurations. The training process utilized standardized preprocessing pipelines to ensure consistency in data handling and feature scaling across all modeling approaches.
Model Validation and Performance Assessment
Model validation employed 5-fold cross-validation to assess performance stability and generalizability across different data subsets. Each fold maintained proportional representation of outcome classes, with models trained on four folds and validated on the remaining fold. Cross-validation results were aggregated to provide robust estimates of model performance and variance.
Performance evaluation utilized multiple metrics including area under the ROC curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. Optimal decision thresholds were determined through F1-score maximization, which balances sensitivity against positive predictive value, for clinical application. ROC curve analysis provided comprehensive assessment of discriminative ability across all possible threshold values.
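Threshold selection by F1 maximization can be sketched as a grid search over candidate cut-offs. The Python example below is a minimal illustration; the function names, the 0.01-step grid, and the four-patient toy data are assumptions and do not reflect the study data.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when predictions with probability >= threshold are called positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob, grid=None):
    """Return the first grid threshold that attains the maximum F1-score."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data for illustration only.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]
best = best_f1_threshold(y_true, y_prob)
best_f1 = f1_at_threshold(y_true, y_prob, best)
```

Because F1 is the harmonic mean of PPV and sensitivity, the threshold it selects does not take true negatives into account.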
Model calibration was evaluated using Hosmer-Lemeshow goodness-of-fit tests, calibration plots, and Brier scores to assess agreement between predicted probabilities and observed outcomes. Calibration improvement techniques including temperature scaling, Platt scaling, and isotonic regression were applied when necessary to enhance probability estimation accuracy.
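Two of the calibration tools named above are simple to state. The Brier score is the mean squared difference between predicted probability and observed outcome, and temperature scaling divides the log-odds of each prediction by a constant T before mapping back to a probability. The Python sketch below is illustrative; the function names are hypothetical, and in practice T would be fitted on validation data rather than fixed by hand.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def temperature_scale(p, temperature):
    """Rescale a probability (0 < p < 1) by dividing its log-odds by the temperature."""
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit / temperature))


# With T = 2, an overconfident prediction of 0.90 is pulled toward 0.5.
softened = temperature_scale(0.9, 2.0)
```

A temperature above 1 moves probabilities toward 0.5, a temperature below 1 moves them toward 0 or 1, and the ranking of patients (and therefore the AUC) is unchanged.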
Clinical utility was assessed through decision curve analysis (DCA) to evaluate net benefit across different threshold probabilities compared to treat-all and treat-none strategies. Final model evaluation was performed on the independent test set to provide unbiased estimates of real-world performance, with results reported using the optimal threshold determined from cross-validation.
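Decision curve analysis rests on the net benefit at a threshold probability pt, commonly defined as TP/N − (FP/N) × pt/(1 − pt). The Python sketch below computes this quantity for a model and for the treat-all strategy (treat-none has a net benefit of zero by definition). The function names and the toy counts are illustrative assumptions and are not taken from the study.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is classified as positive."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: 100 patients, 25 with disease, threshold probability 0.20.
model_nb = net_benefit(tp=20, fp=10, n=100, threshold=0.20)
treat_all_nb = net_benefit_treat_all(prevalence=0.25, threshold=0.20)
```

The weight pt/(1 − pt) expresses how many true positives one false positive is judged to cost, so false positives are penalized more heavily as the threshold rises.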
Model Interpretability Analysis
Model interpretability was achieved through SHAP (SHapley Additive exPlanations) analysis to provide comprehensive understanding of feature contributions and model decision-making processes. SHAP values quantify the marginal contribution of each feature to individual predictions using game-theoretic principles, enabling both global and local model explanations.
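The game-theoretic definition can be made concrete with an exact Shapley computation on a toy model. The Python sketch below enumerates all feature coalitions and replaces absent features with a single baseline value; this is a simplification of what SHAP libraries do for tree ensembles, and the function names, the toy risk function, and the feature values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, with absent features set to baseline."""
    n = len(x)

    def value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical score with an SBP x T90 interaction; not the study model."""
    sbp, t90, age = features
    return 0.02 * sbp + 0.01 * t90 + 0.0005 * sbp * t90 + 0.005 * age


patient = [150.0, 20.0, 70.0]   # hypothetical SBP, T90, age
baseline = [130.0, 10.0, 65.0]  # hypothetical reference patient
phi = shapley_values(toy_risk, patient, baseline)
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that waterfall and force plots display.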
Global interpretability analysis included feature importance ranking based on mean absolute SHAP values across all predictions, and summary plots showing the distribution of feature effects on model outputs. Dependence plots were generated to visualize the relationship between individual features and their SHAP values, with color coding by secondary features to reveal interaction effects.
Local interpretability focused on individual patient explanations through waterfall plots and force plots, demonstrating how specific feature values contributed to individual risk predictions. These visualizations showed the cumulative effect of positive and negative contributions from baseline probability to final prediction.
SHAP analysis encompassed multiple visualization types including beeswarm plots for feature effect distributions, bar plots for feature importance rankings, and interaction analysis to identify synergistic relationships between predictors. All SHAP calculations were performed on the optimal model to ensure clinical relevance and interpretability of the final prediction system.
Statistical Analysis
Baseline characteristics were compared between CAS and non-CAS groups using appropriate statistical tests based on variable types. Continuous variables were compared using independent sample t-tests and reported as mean ± standard deviation. Categorical variables were analyzed using chi-square tests and presented as frequencies and percentages.
Statistical significance was set at P < 0.05 for all analyses without adjustment for multiple comparisons, given the exploratory nature of the baseline characteristic analysis. Normality assumptions were assessed prior to parametric testing, with appropriate non-parametric alternatives considered when necessary.
All statistical analyses were performed using R statistical software (version 4.3) with relevant packages including dplyr for data manipulation, caret for machine learning implementation, glmnet for LASSO regression, and pROC for ROC analysis. Machine learning analyses utilized specialized packages including xgboost, randomForest, lightgbm, and shapviz for model interpretability.
Reproducibility was ensured through systematic documentation of all analytical procedures and consistent use of random seed settings across all computational steps. All analyses followed established guidelines for machine learning in medical research to ensure methodological rigor and clinical applicability.
Results
Study Population and Patient Characteristics
The study flow and patient selection process are illustrated in Figure 2. A total of 1196 elderly patients with OSA were included in this multicenter study after applying inclusion and exclusion criteria. Among these patients, 273 (22.8%) were diagnosed with carotid atherosclerosis (CAS) and 923 (77.2%) were classified as non-CAS. The study population was randomly divided into a training set (n=958, 80%) and a test set (n=238, 20%), maintaining proportional representation of CAS cases in both datasets (training set: 219 CAS cases, 739 non-CAS; test set: 54 CAS cases, 184 non-CAS).
Figure 2 Study Population Selection and Flowchart.
Baseline characteristics revealed significant differences between CAS and non-CAS groups across multiple clinical domains (Table 1). Cardiovascular parameters demonstrated the most pronounced disparities: patients with CAS had significantly elevated systolic blood pressure (143.40±23.20 vs 133.77±16.50 mmHg, P<0.001) and diastolic blood pressure (77.92±11.29 vs 75.98±9.13 mmHg, P=0.010). OSA-related parameters showed that CAS patients experienced more severe nocturnal hypoxemia, characterized by significantly prolonged time with oxygen saturation below 90% (T90: 19.35±23.41% vs 10.34±16.69%, P<0.001) and reduced mean oxygen saturation (MSpO2: 91.53±4.04% vs 93.26±2.96%, P<0.001). Notably, the apnea-hypopnea index (AHI) was slightly but significantly lower in the CAS group (30.04±20.25 vs 32.90±20.67 events/hour, P=0.041).
Table 1 Patient Baseline Characteristics by CAS Status
Demographic and lifestyle factors also differed significantly between groups. CAS patients were older (69.52±6.97 vs 66.59±6.26 years, P<0.001), had higher smoking prevalence (30.8% vs 18.5%, P<0.001), and increased alcohol consumption (18.7% vs 8.7%, P<0.001). Laboratory parameters revealed that CAS patients had significantly lower platelet counts (193.67±60.39 vs 211.98±51.79 ×10^9/L, P<0.001), reduced white blood cell counts (6.26±1.87 vs 6.52±1.91 ×10^9/L, P=0.042), and decreased HDL cholesterol levels (1.10±0.33 vs 1.18±0.40 mmol/L, P=0.004).
Feature Selection Results
LASSO (Least Absolute Shrinkage and Selection Operator) regression was employed to identify the most predictive features from the 19 potential variables through a systematic feature selection process (Figure 3). The cross-validation procedure determined the optimal lambda parameter that balanced model complexity and predictive performance while achieving sparsity in feature coefficients.
Figure 3 Feature Selection Result. LASSO regression feature selection results for CAS prediction model. (A) Horizontal bar chart showing selected features and their regression coefficients, with blue bars indicating positive associations and orange bars indicating negative associations with CAS risk. Features are ordered by absolute coefficient magnitude. (B) Cross-validation error curve plotting binomial deviance (y-axis) against log-transformed lambda values (x-axis), with gray error bars representing standard deviations across folds. Numbers above the plot indicate the number of features retained at each lambda value. Vertical dashed lines mark key lambda thresholds: lambda.min (blue), lambda.1se (green), and the selected lambda for 8 features (red).
The feature selection process successfully identified 8 optimal predictive features that demonstrated the strongest associations with CAS risk (Figure 3A). These features, ranked by their absolute regression coefficients, included: systolic blood pressure (SBP) as the most influential predictor, followed by time spent with oxygen saturation below 90% (T90), alcohol consumption, mean oxygen saturation (MSpO2), body mass index (BMI), age, platelet count (Plt), and smoking status.
The cross-validation error curve (Figure 3B) demonstrated the relationship between lambda values and model performance, with the optimal lambda selected to achieve the target number of features while minimizing binomial deviance. The selected features represented diverse clinical domains: cardiovascular parameters (SBP), OSA-specific metrics (T90, MSpO2), demographic factors (age, BMI), lifestyle factors (alcohol, smoking), and hematological markers (platelet count).
These 8 features showed both positive and negative associations with CAS risk. SBP, T90, alcohol consumption, age, and smoking demonstrated positive coefficients, indicating increased CAS risk with higher values, while MSpO2, BMI, and platelet count showed negative coefficients, suggesting protective or inverse associations. This feature subset effectively captured the complex multifactorial nature of CAS development in elderly OSA patients, incorporating traditional cardiovascular risk factors alongside OSA-specific pathophysiological markers.
Model Performance Comparison
Four machine learning algorithms were systematically evaluated and compared for their ability to predict CAS in elderly OSA patients using the selected 8-feature set (Table 2 and Figure 4). The models demonstrated varying levels of performance across multiple evaluation metrics, with clear distinctions in their predictive capabilities and generalization performance.
Table 2 Machine Learning Model Performance Comparison
Figure 4 Model Performance Comprehensive Comparison. Comprehensive performance comparison of four machine learning models for CAS prediction. (A) ROC curves for test set performance, with each colored line representing one model and AUC values displayed in the legend. (B) ROC curves for training set performance of the same four models. (C) Calibration curves comparing predicted probabilities (x-axis) with observed probabilities (y-axis) for all models, with Brier scores shown in the legend and the diagonal dashed line representing perfect calibration. (D) Confusion matrix for XGBoost model on test set, displaying true/false positives and negatives with both absolute counts and percentages. Color intensity indicates the magnitude of each cell value.
XGBoost emerged as the best-performing model with superior performance across most evaluation metrics. On the test set, XGBoost achieved an AUC of 0.854, accuracy of 0.798, sensitivity of 0.812, specificity of 0.785, positive predictive value (PPV) of 0.785, negative predictive value (NPV) of 0.812, and F1-score of 0.798. Random Forest demonstrated the second-best performance with a test set AUC of 0.851, though with slightly lower balanced performance metrics. LightGBM showed comparable performance to XGBoost with a test set AUC of 0.846, while Logistic Regression exhibited the most modest performance with a test set AUC of 0.722.
ROC curve analysis (Figure 4A and B) visually confirmed the superior discriminative ability of tree-based ensemble methods compared to traditional logistic regression. The test set ROC curves (Figure 4A) demonstrated that XGBoost, Random Forest, and LightGBM achieved similar and substantially higher AUCs than logistic regression, with XGBoost showing the most favorable sensitivity-specificity trade-off across different threshold values.
Overfitting assessment revealed acceptable generalization performance across all models. XGBoost demonstrated an AUC gap of 0.044 between training (0.897) and test (0.854) performance, indicating well-controlled overfitting despite its complexity. Random Forest showed a larger AUC gap of 0.054, while LightGBM and Logistic Regression demonstrated gaps of 0.050 and 0.020, respectively. The relatively modest overfitting in XGBoost, combined with its superior test set performance, supported its selection as the optimal model.
Model calibration analysis (Figure 4C) showed that XGBoost achieved a Brier score of 0.162, where lower values indicate closer agreement between predicted probabilities and observed outcomes. The confusion matrix for XGBoost (Figure 4D) on the test set using the optimal threshold of 0.55 showed 44 true positives, 40 false positives, 144 true negatives, and 10 false negatives, corresponding to sensitivity of 81.5% and specificity of 78.3%.
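The sensitivity and specificity quoted for this confusion matrix follow directly from the four counts. The Python sketch below applies the standard definitions; the function name is a hypothetical helper, and the counts are those reported for the test set.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard classification metrics derived from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts reported for XGBoost on the test set at a threshold of 0.55.
metrics = confusion_metrics(tp=44, fp=40, tn=144, fn=10)
sensitivity_pct = round(100 * metrics["sensitivity"], 1)
specificity_pct = round(100 * metrics["specificity"], 1)
```

Sensitivity and specificity are conditioned on true disease status, whereas PPV, NPV, accuracy, and F1 also depend on the proportion of CAS cases in the sample being evaluated.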
Optimal Model Validation and Clinical Utility
Comprehensive validation of the XGBoost model was conducted through rigorous cross-validation and multiple analytical approaches to ensure robustness and clinical applicability (Figure 5). The validation process encompassed performance stability assessment, learning curve analysis, calibration improvement, and clinical decision utility evaluation.
Figure 5 XGBoost Model Validation and Clinical Utility. Comprehensive validation of XGBoost model performance across multiple evaluation metrics. (A) ROC curves for training sets across 5-fold cross-validation, with each colored line representing one fold and AUC values displayed in the legend. (B) ROC curves for validation sets across 5-fold cross-validation, showing model performance on held-out data for each fold. (C) ROC curve for the final model evaluated on the independent test set. (D) Learning curves plotting AUC performance against training sample size, with red line showing training performance and blue line showing validation performance. (E) Calibration curve comparing predicted probabilities (x-axis) with observed probabilities (y-axis), with the diagonal dashed line representing perfect calibration. Brier Score and Hosmer-Lemeshow p-value are displayed. (F) Decision curve analysis showing net benefit (y-axis) across different threshold probabilities (x-axis) for three strategies: treat all patients (blue), treat none (green), and model-guided decisions (red).
Five-fold cross-validation demonstrated consistent and stable performance across different data subsets (Figure 5A and B). The training set ROC curves (Figure 5A) showed high performance with AUCs ranging from 0.885 to 0.921 across the five folds, while validation set performance (Figure 5B) maintained good consistency with AUCs between 0.743 and 0.824. The final model achieved a test set AUC of 0.854 (Figure 5C), confirming excellent discriminative ability on completely independent data. The average validation AUC of 0.784 ± 0.033 across folds demonstrated acceptable variance, indicating robust model stability.
Learning curve analysis (Figure 5D) revealed that model performance improved steadily with increasing sample size and reached a plateau, suggesting that the current dataset size was adequate for model training. The convergence of training and validation curves indicated well-controlled overfitting, with the final gap between training and validation performance remaining within acceptable limits. Because the curves had plateaued, additional data would be expected to yield at most marginal gains in performance.
Model calibration was improved through temperature scaling, addressing the initial calibration concerns (Figure 5E). The post-calibration analysis showed good agreement between predicted probabilities and observed outcomes, with a Hosmer-Lemeshow p-value of 0.089 (no statistically significant lack of fit at the 0.05 level) and an improved Brier score. The calibration curve lay close to the ideal diagonal line, indicating that predicted probabilities tracked observed CAS risk.
Decision curve analysis revealed clinical utility across a wide range of threshold probabilities (Figure 5F). The XGBoost model demonstrated superior net benefit compared to “treat all” and “treat none” strategies for threshold probabilities below 65%. At the optimal threshold of 0.55, the model provided a net benefit of 0.162, supporting its value for risk stratification. The analysis suggested that model-guided screening would yield higher net benefit than universal screening or no screening, particularly for patients with moderate risk profiles.
The comprehensive validation results confirmed that the XGBoost model achieved optimal balance between sensitivity and specificity while maintaining excellent calibration and demonstrating clear clinical utility for CAS risk assessment in elderly OSA patients.
Model Interpretability and Clinical Insights
SHAP (SHapley Additive exPlanations) analysis was employed to provide comprehensive model interpretability and elucidate the clinical mechanisms underlying CAS prediction in elderly OSA patients (Table 3 and Figure 6). This analysis revealed not only the relative importance of individual features but also their complex interactions and biological pathways contributing to CAS risk.
Table 3 XGBoost Feature Importance and Clinical Interpretation
Figure 6 SHAP Model Interpretability Analysis. SHAP interpretability analysis of the XGBoost model. (A) Bar plot showing feature importance ranked by mean absolute SHAP values. (B) Beeswarm plot where each point represents one patient, with horizontal position indicating SHAP value and color representing feature value magnitude (purple=low, orange=high). (C) Dependence plot showing the relationship between SBP values (x-axis) and corresponding SHAP values (y-axis), with points colored by T90 values. (D) Horizontal bar chart displaying individual SHAP contributions for one representative patient, with blue bars indicating positive contributions and red bars indicating negative contributions. (E) Force plot showing how individual feature values push the prediction above or below the baseline. (F) Waterfall plot illustrating the cumulative effect of SHAP values from baseline expectation to final prediction probability.
|
Feature importance analysis identified systolic blood pressure (SBP) as the most influential predictor with a SHAP importance of 0.3148, confirming its central role in CAS pathogenesis (Figure 6A and Table 3). T90 emerged as the second most important feature (SHAP importance: 0.2660), highlighting the critical role of nocturnal hypoxemia severity over traditional OSA metrics like AHI. Alcohol consumption ranked third (SHAP importance: 0.2187), suggesting multifactorial mechanisms involving vascular inflammation and metabolic dysfunction. The complete ranking included mean oxygen saturation (MSpO2: 0.1725), BMI (0.1616), age (0.1553), platelet count (0.0872), and smoking status (0.0286).
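The global importance values reported above are mean absolute SHAP values, that is, the average magnitude of each feature's contribution across patients. The sketch below shows how such a ranking can be derived from a matrix of per-patient SHAP values; the helper name `rank_by_mean_abs_shap` and the three-patient toy matrix are hypothetical and do not reproduce the study's values.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across patients.

    shap_rows: one list per patient, one SHAP value per feature.
    Returns (feature, importance) pairs sorted from most to least important.
    """
    n_patients = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_patients
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Toy SHAP matrix (hypothetical): rows are patients, columns are features.
features = ["SBP", "T90", "Alcohol"]
toy_shap = [
    [0.40, -0.20, 0.10],
    [-0.30, 0.35, -0.05],
    [0.20, 0.15, 0.30],
]

ranking = rank_by_mean_abs_shap(toy_shap, features)
for name, value in ranking:
    print(f"{name}: {value:.4f}")
```

Because the absolute value is taken before averaging, a feature that pushes risk up in some patients and down in others still ranks as important, which is why the bar plot is read alongside the beeswarm plot to recover direction.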
SHAP dependence plots revealed complex nonlinear relationships between key predictors and CAS risk (Figure 6C). The SBP dependence analysis demonstrated that CAS risk increased exponentially above 140 mmHg, with significant interactions with T90 severity. Beeswarm plots (Figure 6B) illustrated the distribution of feature effects across the patient population, showing that higher SBP values (orange points) consistently drove predictions toward higher CAS risk, while lower values (purple points) were protective.
Individual patient explanations (Figure 6D–F) demonstrated the model’s ability to provide personalized risk assessments. Waterfall plots (Figure 6F) showed how each feature contributed to moving individual predictions away from the baseline risk, while force plots (Figure 6E) visualized the balance between risk-increasing and risk-reducing factors for specific patients. These visualizations confirmed that the model integrates multiple clinical domains—cardiovascular (SBP), OSA-specific (T90, MSpO2), demographic (age, BMI), lifestyle (alcohol, smoking), and hematological (platelet count)—to generate individualized risk estimates.
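Waterfall and force plots both rely on the additivity property of SHAP: the baseline expectation plus the sum of a patient's feature contributions equals the model output for that patient. The sketch below illustrates this with made-up numbers, assuming the contributions are expressed in log-odds units (a common convention for tree-based binary classifiers, not confirmed for this study) and converted to a probability with the logistic function. All values and the function name `waterfall_steps` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def waterfall_steps(base_value, contributions):
    """Accumulate SHAP contributions from the baseline, largest magnitude first.

    Returns a list of (feature, contribution, running_total) tuples.
    """
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    running = base_value
    steps = []
    for name, value in ordered:
        running += value
        steps.append((name, value, running))
    return steps


# Hypothetical patient: baseline and contributions in log-odds units.
base_value = -1.2
contributions = {"SBP": 0.9, "T90": 0.6, "Age": 0.2, "MSpO2": -0.3}

steps = waterfall_steps(base_value, contributions)
final_log_odds = steps[-1][2]
predicted_probability = sigmoid(final_log_odds)

for name, value, total in steps:
    print(f"{name:>6}: {value:+.2f} -> running log-odds {total:+.2f}")
print(f"predicted probability: {predicted_probability:.3f}")
```

In this toy case the risk-increasing contributions of SBP and T90 outweigh the protective contribution of MSpO2, moving the patient from a baseline below 50% to a prediction slightly above it.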
The SHAP analysis revealed distinct pathophysiological pathways underlying CAS development in elderly OSA patients (Table 3). Cardiovascular mechanisms were primarily driven by hypertension-induced endothelial dysfunction, while OSA-related pathways involved hypoxia-mediated oxidative stress and vascular inflammation. Risk factor interactions suggested that intermittent hypoxia (T90) amplified the harmful effects of elevated blood pressure, creating synergistic risk profiles in the most vulnerable patients.
Clinical insights from the interpretability analysis indicated that CAS risk prediction in elderly OSA patients requires consideration of both traditional cardiovascular risk factors and OSA-specific pathophysiological markers. The prominence of T90 over AHI suggests that hypoxemia severity may be more clinically relevant than apnea frequency for cardiovascular risk stratification. Furthermore, the model’s ability to capture complex feature interactions provides a more nuanced understanding of CAS risk than traditional linear approaches, supporting its potential utility for personalized cardiovascular risk assessment in sleep medicine practice.
Discussion
This study presents the first machine learning-based prediction model specifically developed for carotid atherosclerosis (CAS) risk assessment in elderly patients with obstructive sleep apnea (OSA). Our XGBoost model achieved an AUC of 0.854, which is comparable to recent machine learning applications in CAS prediction. The prevalence of CAS in our elderly OSA cohort (22.8%) underscores the clinical importance of systematic risk stratification in this vulnerable population, particularly given that many patients remain asymptomatic until severe cardiovascular events occur.
Sample representativeness in our study presents both strengths and limitations. Our multicenter design, encompassing six tertiary hospitals across different Chinese regions, including Beijing and Gansu, provides reasonable geographic coverage, with patients referred from across the country. The demographic profile of our elderly OSA cohort (mean age 66.59–69.52 years, ~60% male) aligns with reported characteristics of Chinese elderly OSA populations. However, the tertiary care setting may introduce referral bias toward more severe cases with higher comorbidity burden, potentially explaining the elevated CAS prevalence (22.8%) compared to community-based elderly OSA patients. Additionally, our sample predominantly consists of Han Chinese individuals, which may limit generalizability to other ethnic populations with different genetic and lifestyle risk profiles.
Recent machine learning applications for CAS prediction have demonstrated varying but generally promising results across different populations. Machine learning models for CAS early screening using health check-up data achieved AUCs of 0.860 in internal validation and 0.851 in external validation with GBDT algorithms.24 Good discrimination has been reported when applying machine learning approaches to identify carotid subclinical atherosclerosis endotypes, with C-statistics ranging from 0.75 to 0.85 depending on the endpoint.21 An XGBoost ensemble model for carotid plaque prediction in asymptomatic populations achieved an AUC of 0.864 with sensitivity of 86.8% and specificity of 85.9%.22 A risk prediction model for carotid atherosclerosis progression in early middle-age adults obtained AUCs of 0.755–0.759 across development and validation datasets.25 Our model’s performance falls within this range of published studies, though direct comparison is limited by differences in study populations, particularly our focus on elderly OSA patients rather than general populations.
The prominence of systolic blood pressure (SBP) as our strongest predictor (SHAP importance: 0.3148) confirms its central role in CAS pathogenesis and aligns with established cardiovascular epidemiology. A systematic assessment of polysomnographic indices with blood pressure in the Multi-Ethnic Study of Atherosclerosis revealed that long-term hypertension leads to vascular wall damage and endothelial dysfunction, promoting atherosclerotic plaque formation.26 The importance of discriminating between systolic/diastolic hypertension and isolated systolic hypertension in elderly populations has been emphasized, with particular significance noted for elevated systolic pressure in older adults.27 The interaction between blood pressure and nocturnal hypoxemia revealed in our SHAP dependence plots suggests that patients with both elevated blood pressure and severe oxygen desaturation face exponentially increased CAS risk, supporting a synergistic rather than additive relationship.
The emergence of T90 as the second most important predictor (SHAP importance: 0.2660) represents a significant finding that challenges the traditional reliance on AHI for cardiovascular risk assessment in OSA patients. Direct evidence has shown that the severity of oxygen desaturation is predictive of carotid wall thickening and plaque occurrence, with T90 serving as a more comprehensive indicator of tissue hypoxia severity and duration compared to traditional AHI.13 Recent studies have further supported this finding, with nocturnal hypoxemic burden, particularly T90, being associated with incident major adverse cardiovascular events in patients with type 2 diabetes, while AHI showed no significant association.28 Independent associations have been found between nocturnal hypoxemia parameters, including minimum oxygen saturation ≤90% and T90 ≥5%, and coronary microvascular dysfunction, whereas no significant association was observed with AHI.29 These findings suggest that the pathophysiological impact of OSA on cardiovascular disease may be better captured by hypoxemia metrics than by respiratory event frequency.
The inclusion of alcohol consumption as the third most important predictor (SHAP importance: 0.2187) reflects the complex multifactorial mechanisms underlying CAS development in elderly OSA patients. While specific literature on alcohol-CAS relationships in OSA populations is limited, alcohol’s effects encompass vascular endothelial function modification, inflammatory response activation, and alterations in sleep architecture and upper airway muscle tone. These mechanisms may synergistically interact with OSA-related pathophysiology to accelerate atherosclerotic progression in elderly patients.
Age emerged as a significant predictor despite our study population being exclusively elderly (≥60 years), indicating that even within this age-restricted cohort, advancing age continues to confer incremental CAS risk. According to a national study of Chinese adults, age represents an independent risk factor for carotid atherosclerosis across all age groups.30 The mechanisms of dysfunction in the aging vasculature have been well-characterized, showing that aging leads to reduced vascular elasticity and endothelial dysfunction, which may synergistically interact with OSA-related vascular damage to accelerate atherosclerotic progression.14
The significance of hematological markers, particularly platelet count (SHAP importance: 0.0872), reflects OSA’s systemic impact on hemostasis and inflammation. Research has established that OSA-related intermittent hypoxia stimulates sympathetic nervous system activation, promoting platelet activity and aggregation, while chronic hypoxia activates circulating leukocytes that contribute to vascular inflammation through cytokine release and endothelial adhesion.31 These findings provide biological plausibility for the observed associations and suggest that OSA’s cardiovascular risk extends beyond traditional risk factors to encompass prothrombotic and proinflammatory pathways.
Several important limitations must be acknowledged. Our study’s cross-sectional design precludes assessment of temporal relationships between OSA parameters and CAS development, limiting causal inference. The sample derives primarily from Chinese tertiary medical centers, which may limit generalizability to other ethnic groups, healthcare systems, and community-based populations. Elderly OSA patients in tertiary care settings may represent more severe cases with higher comorbidity burden compared to community populations, potentially affecting the observed associations. The age restriction to patients ≥60 years limits applicability to younger OSA populations who may exhibit different risk factor profiles and atherosclerotic progression patterns.
Our findings have several clinical implications. The three-tier risk stratification approach suggested by our decision curve analysis could facilitate clinical decision-making: high-risk patients (predicted probability >55%) warrant immediate carotid ultrasound and aggressive cardiovascular risk factor modification; intermediate-risk patients (25–55%) should receive carotid ultrasound within six months and standard cardiovascular management; and low-risk patients (<25%) can be managed with routine follow-up focused on OSA therapy adherence. The prominence of T90 over AHI supports optimizing OSA therapy based on oxygenation improvement rather than solely reducing apnea frequency. Until external validation is completed, we recommend implementing this model as a decision-support tool alongside clinical judgment, with periodic monitoring of prediction accuracy and local calibration adjustments as needed.
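The three-tier rule described above can be written as a small lookup, sketched here under the thresholds stated in the text (high: predicted probability above 0.55; intermediate: 0.25 to 0.55 inclusive; low: below 0.25). The function name, the handling of the exact boundary values, and the wording of the returned recommendations are illustrative assumptions, not a validated clinical tool.

```python
def risk_tier(probability):
    """Map a predicted CAS probability to a (tier, suggested action) pair."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability > 0.55:
        return ("high",
                "immediate carotid ultrasound; aggressive risk factor modification")
    if probability >= 0.25:
        return ("intermediate",
                "carotid ultrasound within six months; standard cardiovascular management")
    return ("low",
            "routine follow-up focused on OSA therapy adherence")


# Hypothetical predicted probabilities for three patients.
for p in (0.72, 0.40, 0.10):
    tier, action = risk_tier(p)
    print(f"p = {p:.2f}: {tier} risk -> {action}")
```

Keeping the cut-offs in one function makes them easy to revise if local recalibration or external validation shifts the preferred thresholds.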
Future research priorities include external validation in international cohorts representing different ethnic backgrounds and healthcare systems. To address the current lack of external validation, we plan to collect data from additional elderly OSA patients across multiple centers. Longitudinal studies to establish temporal relationships and validate prognostic value are also essential next steps. The integration of additional biomarkers, genetic risk scores, or novel cardiovascular markers may further improve prediction accuracy. Given the aging global population and increasing OSA prevalence, developing specialized approaches for elderly OSA patients represents an important area for continued investigation, as highlighted in recent comprehensive reviews.32
In conclusion, this study establishes a foundation for precision medicine approaches in sleep-related cardiovascular risk assessment for elderly OSA patients. The integration of OSA-specific parameters with traditional cardiovascular risk factors offers a more comprehensive understanding of atherosclerotic risk and supports the development of personalized treatment strategies. While external validation and real-world implementation studies are essential next steps, our work provides evidence for the feasibility and potential impact of machine learning-enhanced cardiovascular risk assessment in this high-risk population.
Data Sharing Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Author Contributions
Conceptualization, W.C., L.L., and D.R.; Methodology, W.C., D.R., and L.L.; Software, W.C. and X.X.; Validation, W.C., L.L., and L.Z.; Formal analysis, W.C., D.R., and L.Z.; Investigation, W.C., T.L., Z.Z., T.N., and Y.M.; Data curation, W.C., Y.M., T.L., and T.N.; Writing – original draft preparation, W.C.; Writing – review and editing, W.C., D.R., L.Z., X.X., T.L., Y.M., Z.Z., T.N., Y.G., and L.L.; Visualization, W.C. and X.X.; Resources, Y.G. and L.L.
All authors have read and agreed to the published version of the paper.
All authors agreed on the journal to which the article will be submitted and agree to take responsibility and be accountable for the contents of the article.
Funding
No funding was provided for this study.
Disclosure
The authors have no competing or conflicting interests to declare.
References
1. Benjafield AV, Ayas NT, Eastwood PR, et al. Estimation of the global prevalence and burden of obstructive sleep apnoea: a literature-based analysis. Lancet Respir Med. 2019;7(8):687–698. doi:10.1016/s2213-2600(19)30198-5
2. Senaratna CV, Perret JL, Lodge CJ, et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev. 2017;34:70–81. doi:10.1016/j.smrv.2016.07.002
3. Lévy P, Kohler M, McNicholas WT, et al. Obstructive sleep apnoea syndrome. Nat Rev Dis Primers. 2015;1:15015. doi:10.1038/nrdp.2015.15
4. Yeghiazarians Y, Jneid H, Tietjens JR, et al. Obstructive sleep apnea and cardiovascular disease: a scientific statement from the American Heart Association. Circulation. 2021;144(3):e56–e67. doi:10.1161/CIR.0000000000000988
5. Lavie L. Oxidative stress in obstructive sleep apnea and intermittent hypoxia–revisited–the bad ugly and good: implications to the heart and brain. Sleep Med Rev. 2015;20:27–45. doi:10.1016/j.smrv.2014.07.003
6. Ryan S, Taylor CT, McNicholas WT. Selective activation of inflammatory pathways by intermittent hypoxia in obstructive sleep apnea syndrome. Circulation. 2005;112(17):2660–2667. doi:10.1161/circulationaha.105.556746
7. Somers VK, Dyken ME, Clary MP, Abboud FM. Sympathetic neural mechanisms in obstructive sleep apnea. J Clin Invest. 1995;96(4):1897–1904. doi:10.1172/jci118235
8. Eisensehr I, Ehrenberg BL, Noachtar S, et al. Platelet activation, epinephrine, and blood pressure in obstructive sleep apnea syndrome. Neurology. 1998;51(1):188–195. doi:10.1212/wnl.51.1.188
9. Kovbasyuk Z, Ramos-Cejudo J, Parekh A, et al. Obstructive sleep apnea, platelet aggregation, and cardiovascular risk. J Am Heart Assoc. 2024;13(15):e034079. doi:10.1161/jaha.123.034079
10. Drager LF, Togeiro SM, Polotsky VY, Lorenzi-Filho G. Obstructive sleep apnea: a cardiometabolic risk in obesity and the metabolic syndrome. J Am Coll Cardiol. 2013;62(7):569–576. doi:10.1016/j.jacc.2013.05.045
11. Ji P, Kou Q, Zhang J. Study on relationship between carotid intima-media thickness and inflammatory factors in obstructive sleep apnea. Nat Sci Sleep. 2022;14:2179–2187. doi:10.2147/nss.S389253
12. Somuncu MU, Karakurt ST, Karakurt H, Serbest NG, Cetin MS, Bulut U. The additive effects of OSA and nondipping status on early markers of subclinical atherosclerosis in normotensive patients: a cross-sectional study. Hypertens Res. 2019;42(2):195–203. doi:10.1038/s41440-018-0143-0
13. Baguet JP, Hammer L, Lévy P, et al. The severity of oxygen desaturation is predictive of carotid wall thickening and plaque occurrence. Chest. 2005;128(5):3407–3412. doi:10.1378/chest.128.5.3407
14. Donato AJ, Machin DR, Lesniewski LA. Mechanisms of dysfunction in the aging vasculature and role in age-related disease. Circ Res. 2018;123(7):825–848. doi:10.1161/circresaha.118.312563
15. Kwon Y, Tzeng WS, Seo J, et al. Obstructive sleep apnea and hypertension; critical overview. Clin Hypertension. 2024;30(1):19. doi:10.1186/s40885-024-00276-7
16. Adderley NJ, Subramanian A, Toulis K, et al. Obstructive sleep apnea, a risk factor for cardiovascular and microvascular disease in patients with type 2 diabetes: findings from a population-based cohort study. Diabetes Care. 2020;43(8):1868–1877. doi:10.2337/dc19-2116
17. D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117(6):743–753. doi:10.1161/circulationaha.107.699579
18. Goff DC, Lloyd-Jones DM, Bennett G, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2014;63(25 Pt B):2935–2959. doi:10.1016/j.jacc.2013.11.005
19. Nicolaides AN, Panayiotou AG, Griffin M, et al. Arterial ultrasound testing to predict atherosclerotic cardiovascular events. J Am Coll Cardiol. 2022;79(20):1969–1982. doi:10.1016/j.jacc.2022.03.352
20. Lee JH, Kang EJ, Bae WY, et al. Carotid arterial calcium scoring using upper airway computed tomography in patients with obstructive sleep apnea: efficacy as a clinical predictor of cerebrocardiovascular disease. Korean J Radiol. 2019;20(4):631–640. doi:10.3348/kjr.2018.0550
21. Chen QS, Bergman O, Ziegler L, et al. A machine learning based approach to identify carotid subclinical atherosclerosis endotypes. Cardiovasc Res. 2023;119(16):2594–2606. doi:10.1093/cvr/cvad106
22. Wu D, Cui G, Huang X, et al. An accurate and explainable ensemble learning method for carotid plaque prediction in an asymptomatic population. Comput Methods Programs Biomed. 2022;221:106842. doi:10.1016/j.cmpb.2022.106842
23. Berry RB, Budhiraja R, Gottlieb DJ, et al. Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events. Deliberations of the Sleep Apnea Definitions Task Force of the American Academy of Sleep Medicine. J Clin Sleep Med. 2012;8(5):597–619. doi:10.5664/jcsm.2172
24. Yun K, He T, Zhen S, et al. Development and validation of explainable machine-learning models for carotid atherosclerosis early screening. J Transl Med. 2023;21(1):353. doi:10.1186/s12967-023-04093-8
25. Sun X, Wu C, Kang J, Lv H, Liu X. Development and validation of a risk prediction model for short-term progression of carotid atherosclerosis among early middle age adults. Atherosclerosis. 2024;397:118557. doi:10.1016/j.atherosclerosis.2024.118557
26. Dean DA, Wang R, Jacobs DR, et al. A systematic assessment of the association of polysomnographic indices with blood pressure: the Multi-Ethnic Study of Atherosclerosis (MESA). Sleep. 2015;38(4):587–596. doi:10.5665/sleep.4576
27. Haas DC, Foster GL, Nieto FJ, et al. Age-dependent associations between sleep-disordered breathing and hypertension: importance of discriminating between systolic/diastolic hypertension and isolated systolic hypertension in the sleep heart health study. Circulation. 2005;111(5):614–621. doi:10.1161/01.Cir.0000154540.62381.Cf
28. Driendl S, Baumert M, Arzt M, et al. Nocturnal hypoxemic burden is associated with incident major adverse cardiovascular events in patients with type 2 diabetes. Eur J Prev Cardiol. 2025. doi:10.1093/eurjpc/zwaf259
29. Feng L, Zhao X, Song J, et al. Association between nocturnal hypoxemia parameters and coronary microvascular dysfunction: a cross-sectional study. Nat Sci Sleep. 2024;16:2279–2288. doi:10.2147/nss.S494018
30. Fu J, Deng Y, Ma Y, et al. National and provincial-level prevalence and risk factors of carotid atherosclerosis in Chinese adults. JAMA Network Open. 2024;7(1):e2351225. doi:10.1001/jamanetworkopen.2023.51225
31. Kim K, Li J, Tseng A, Andrews RK, Cho J. NOX2 is critical for heterotypic neutrophil-platelet interactions during vascular inflammation. Blood. 2015;126(16):1952–1964. doi:10.1182/blood-2014-10-605261
32. Osorio RS, Martínez-García M, Rapoport DM. Sleep apnoea in the elderly: a great challenge for the future. Eur Respir J. 2022;59(4):2101649. doi:10.1183/13993003.01649-2021