Patient characteristics
The baseline data of all 204 patients and their tumors are summarized in Table 1. The median follow-up times for DFS were 899.5 days in the training cohort (interquartile range [IQR]: 385.0–1257.0 days) and 950.4 days in the validation cohort (IQR: 499.0–1304.5 days).
Sixty-five of the 204 patients experienced disease recurrence: 33 (50.8%) had systemic recurrence (8 in the lung, 20 in the liver, 2 in the bone, and 3 in both the liver and lung), 21 (32.3%) had locoregional recurrence, and 11 (16.9%) had mixed systemic and locoregional recurrence. Recurrence was confirmed surgically in 20 patients and diagnosed on the basis of radiological characteristics in the remaining 45. A total of 137 (67.2%) patients received postoperative adjuvant fluorouracil-based chemotherapy.
Across all lesions, the mean CD34-based MVD was 40.19 ± 6.89 and the mean CD105-based MVD was 28.25 ± 5.50.
2D- vs. 3D-ROI interobserver agreement
Of the two ROI methods, the 3D-ROI method showed better interobserver agreement (ICC, 0.826–0.960) (Table 2). Bland-Altman analysis showed that the between-reader differences for all imaging features were more tightly clustered with the 3D-ROI method than with the 2D-ROI method, indicating narrower limits of agreement and better reproducibility between readers (Fig. 2). Therefore, the mean values of the 3D quantitative imaging features measured by the two radiologists were used in subsequent analyses.
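The two agreement measures above can be illustrated with a minimal Python sketch. It assumes NumPy and a two-reader design, and it uses the two-way random-effects, absolute-agreement, single-measure form ICC(2,1); the text does not state which ICC form was used, so that choice, the function names, and the reader values are hypothetical. Limits of agreement are taken as the mean between-reader difference ± 1.96 SD.

```python
import numpy as np


def bland_altman_limits(reader_a, reader_b):
    """Return (bias, lower limit, upper limit) of agreement for paired readings."""
    a = np.asarray(reader_a, dtype=float)
    b = np.asarray(reader_b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the between-reader differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, k_raters) array.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)
    col_means = y.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subject
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))           # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical NICVP values measured by two readers on five lesions.
reader1 = [0.31, 0.42, 0.28, 0.51, 0.36]
reader2 = [0.33, 0.40, 0.29, 0.49, 0.38]
print(bland_altman_limits(reader1, reader2))
print(icc_2_1(np.column_stack([reader1, reader2])))
```

Under this reading, a method with narrower limits of agreement and a higher ICC is the more reproducible one.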
Predictive factors for DFS
In the univariate analyses of DFS, clinicopathological parameters (histologic grade, pT stage, pN stage, CEA, HIF-1α, LVI, and PNI), SDCT features (NICVP3D and NICDP3D values), and angiogenesis parameters (CD34, CD105, and VEGF) were associated with DFS. In the multivariate analysis, PNI and histologic grade (clinicopathological), NICVP3D values (SDCT), and CD105 (angiogenesis) were independent predictors of DFS in the training cohort (P < 0.05; Table 3).
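The univariate-then-multivariate selection can be sketched as follows. The text does not name the regression model; this sketch assumes a Cox proportional hazards model (Breslow handling of ties, Wald P values) fitted by Newton-Raphson in NumPy, and a univariate screening threshold of P < 0.05 for entry into the multivariable model. The function names are hypothetical, and in practice a dedicated survival package would normally be used.

```python
import math

import numpy as np


def fit_cox(X, time, event, n_iter=50, tol=1e-9):
    """Newton-Raphson Cox fit (Breslow ties); returns coefficients, SEs, Wald P values."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = np.zeros(X.shape[1])

    def grad_hess(b):
        w = np.exp(X @ b)
        g = np.zeros_like(b)
        h = np.zeros((b.size, b.size))
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]              # risk set at this event time
            wr, Xr = w[at_risk], X[at_risk]
            s0 = wr.sum()
            mean = (wr @ Xr) / s0                  # risk-weighted covariate mean
            second = (Xr * wr[:, None]).T @ Xr / s0
            g += X[i] - mean                       # score contribution
            h -= second - np.outer(mean, mean)     # Hessian contribution
        return g, h

    for _ in range(n_iter):
        g, h = grad_hess(beta)
        step = np.linalg.solve(h, g)
        beta = beta - step                         # Newton update (h is negative definite)
        if np.max(np.abs(step)) < tol:
            break
    _, h = grad_hess(beta)
    se = np.sqrt(np.diag(np.linalg.inv(-h)))
    p = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    return beta, se, p


def screen_then_fit(covariates, time, event, alpha=0.05):
    """Keep covariates with univariate P < alpha, then fit them jointly."""
    kept = [name for name, x in covariates.items()
            if fit_cox(x, time, event)[2][0] < alpha]
    if not kept:
        return kept, None
    X = np.column_stack([covariates[name] for name in kept])
    return kept, fit_cox(X, time, event)
```

Each covariate (for example, PNI or NICVP3D) would be passed in `covariates` as one array per patient, with DFS time and recurrence status as `time` and `event`.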
Bland-Altman plots of interobserver agreement for the two ROI methods. (a) ICVP3D. (b) ICDP3D. (c) NICVP3D. (d) NICDP3D. (e) ICVP2D. (f) ICDP2D. (g) NICVP2D. (h) NICDP2D.
Model construction and comparison
A multidimensional radiological-angiogenesis-clinicopathological integrated model (RACIM) was constructed from the independent predictors identified above (PNI, CD105, histologic grade, and NICVP3D values) to estimate the probability of disease recurrence for each patient. Multivariate analysis was also used to construct three single-domain models for comparison: a clinical model (histologic grade, HIF-1α, LVI, and PNI), an angiogenesis model (CD105), and a radiological model (NICVP3D values). The receiver operating characteristic (ROC) curves of the models in the training and validation cohorts are shown in Fig. 3. In the training cohort, the radiological model (NICVP3D) had an AUC of 0.85 (95% CI, 0.78–0.91), a sensitivity of 78.4%, and a specificity of 79.3%. Using X-tile, the optimal cut-off value for NICVP3D was 0.345. The combined model (RACIM) achieved excellent predictive performance, with AUCs of 0.95 (95% CI, 0.92–0.98) and 0.93 (95% CI, 0.85–1.00) in the training and validation cohorts, respectively (Table 4). Its AUC was significantly greater than those of the radiological (P = 0.0004 and P = 0.0393), angiogenesis (P < 0.0001 and P = 0.0091), and clinical models (P = 0.0471 and P = 0.0088) in the training and validation cohorts, respectively.
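Pairwise AUC comparisons such as those above are often made with DeLong's test for correlated ROC curves; the text does not state which test produced its P values, so the choice of DeLong's method here is an assumption. The minimal NumPy sketch below computes each AUC as the Mann-Whitney statistic and a two-sided P value for the difference between two models scored on the same patients. The function names and example scores are hypothetical.

```python
import math

import numpy as np


def _structural_components(pos, neg):
    """Mann-Whitney kernel: 1 if a positive outranks a negative, 0.5 for ties."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0), psi.mean()


def delong_paired(y, score_a, score_b):
    """AUCs of two models on the same patients and DeLong's two-sided P value."""
    y = np.asarray(y, dtype=bool)
    comps = [
        _structural_components(np.asarray(s, float)[y], np.asarray(s, float)[~y])
        for s in (score_a, score_b)
    ]
    v10 = np.vstack([c[0] for c in comps])   # per-positive components, shape (2, m)
    v01 = np.vstack([c[1] for c in comps])   # per-negative components, shape (2, n)
    auc = np.array([c[2] for c in comps])
    cov = np.cov(v10) / v10.shape[1] + np.cov(v01) / v01.shape[1]
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (auc[0] - auc[1]) / math.sqrt(var_diff)
    return auc[0], auc[1], math.erfc(abs(z) / math.sqrt(2))


# Hypothetical recurrence labels and predicted probabilities from two models.
recurred = [0, 0, 0, 1, 1, 1]
radiological = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3]
combined = [0.2, 0.1, 0.3, 0.4, 0.9, 0.6]
print(delong_paired(recurred, radiological, combined))
```

Because both models are scored on the same patients, the test uses the covariance between their AUC components rather than treating the two AUCs as independent.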

Receiver operating characteristic (ROC) curve analysis of the prediction models in the training (a) and validation (b) cohorts.
Additionally, a VN model incorporating pathological stage, surgical procedure, and adjuvant chemotherapy status was built for comparison (AUC: 0.77, 95% CI: 0.70–0.85; AUC: 0.76, 95% CI: 0.59–0.92; AUC: 0.72, 95% CI: 0.59–0.85). In the training cohort, both the radiological model (AUC: 0.85, 95% CI: 0.78–0.91; P = 0.0160) and the RACIM (AUC: 0.95, 95% CI: 0.92–0.98; P < 0.0001) outperformed the VN model. In the validation cohort, the RACIM remained significantly superior (AUC: 0.93, 95% CI: 0.85–1.00; P = 0.0428), whereas the difference for the radiological model was not significant (AUC: 0.83, 95% CI: 0.73–0.93; P = 0.4217) (Table 4). Moreover, the calibration plots of the RACIM showed good agreement between predicted and observed outcomes (Fig. 4a,b). Decision curve analysis showed that the RACIM provided a moderately greater net benefit than the other models across the relevant threshold range in both cohorts (Fig. 4c,d).
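Decision curve analysis rests on a simple net-benefit formula: at threshold probability p_t, net benefit = TP/n - FP/n × p_t/(1 - p_t), where TP and FP count patients at or above the threshold who did and did not experience recurrence. The sketch below is a minimal NumPy illustration of that formula with "treat all" as the usual reference strategy; the function names and threshold grid are hypothetical, and it is not the authors' code.

```python
import numpy as np


def net_benefit(y, prob, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold."""
    y = np.asarray(y, dtype=bool)
    treat = np.asarray(prob, dtype=float) >= threshold
    n = y.size
    tp = np.sum(treat & y)    # treated patients who recurred
    fp = np.sum(treat & ~y)   # treated patients who did not recur
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = np.mean(np.asarray(y, dtype=bool))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# A decision curve is the net benefit evaluated over a grid of thresholds.
thresholds = np.arange(0.05, 0.95, 0.05)
```

Plotting `[net_benefit(y, prob, t) for t in thresholds]` for each model, together with the treat-all and treat-none (zero) lines, gives curves of the kind shown in Fig. 4c,d.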

Calibration curves and decision curves of the different models. (a) Calibration curves in the training cohort. (b) Calibration curves in the validation cohort. (c) Decision curves in the training cohort. (d) Decision curves in the validation cohort.
Patient risk stratification
We divided patients into high- and low-risk groups using the optimal RACIM cut-off value (0.389) generated by X-tile; the two groups differed significantly in DFS in the training cohort (log-rank test, P < 0.001) (Fig. 5a). We then performed the same analysis in the validation cohort to confirm the prognostic value of the RACIM. Consistent with the training cohort, DFS differed significantly between the two groups in the validation cohort (log-rank test, P = 0.001) (Fig. 5b). Table 5 shows the selected prediction parameters in the RACIM-classified high- and low-risk groups.
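The risk stratification and log-rank comparison can be sketched in NumPy as below. The sketch assumes that a RACIM score at or above the cut-off of 0.389 defines the high-risk group (the text does not say whether the boundary is inclusive) and implements the standard two-group log-rank test with a 1-degree-of-freedom chi-square P value. The function names are hypothetical.

```python
import math

import numpy as np


def stratify(risk_score, cutoff=0.389):
    """High risk (True) if the score is at or above the cut-off (assumed inclusive)."""
    return np.asarray(risk_score, dtype=float) >= cutoff


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        failed = (time == t) & event
        d = failed.sum()
        d1 = (failed & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

Applying `stratify` to the RACIM scores and passing the result as `group`, with DFS time and recurrence status, reproduces the kind of comparison reported for Fig. 5.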
To test whether the RACIM can identify patients who may benefit from postoperative adjuvant chemotherapy, we performed subgroup analyses comparing patients who did and did not receive adjuvant chemotherapy. Notably, in the RACIM-classified high-risk group, postoperative adjuvant chemotherapy was associated with a significant treatment benefit (P = 0.036), whereas adjuvant chemotherapy did not improve survival in the overall cohort of 204 patients (P = 0.400) or in patients with any high-risk clinicopathological feature (P = 0.400) (Fig. 6).

Kaplan-Meier survival curves of patients stratified by the RACIM-based classifier. (a) Training cohort. (b) Validation cohort.

Effect of postoperative adjuvant chemotherapy in different subgroups, with patients stratified by receipt of chemotherapy. (a) All patients. (b) RACIM-classified high-risk group. (c) Patients with any high-risk clinicopathological feature.
Model interpretability with SHAP
We used the SHAP (SHapley Additive exPlanations) algorithm to provide both global and local interpretability for the RACIM. The feature importance and summary plots show that the SDCT imaging indicator NICVP3D contributed most to the model's predictions, followed by CD105, PNI, and histologic grade (Fig. 7a,b).
Figure 7c,d shows SHAP force plots for two male patients with TNM stage IIIB disease, illustrating how NICVP3D, CD105, and clinicopathological characteristics drive the model's predicted recurrence risk at the individual level.
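The additive decomposition behind the force plots can be shown by computing exact Shapley values by enumeration. This sketch simplifies in two labeled ways: features outside a coalition are replaced by a single reference vector (for example, the cohort mean) rather than averaged over a background dataset as the shap library does, and the four-feature logistic model, its coefficients, and the patient values are hypothetical, not the fitted RACIM. Global importance is taken as the mean absolute Shapley value across patients, a common convention for importance plots.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, reference):
    """Exact Shapley values of f at x; features outside a coalition take reference values."""
    x = np.asarray(x, dtype=float)
    reference = np.asarray(reference, dtype=float)
    p = x.size
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                z_without = reference.copy()
                z_without[idx] = x[idx]      # coalition members take the patient's values
                z_with = z_without.copy()
                z_with[j] = x[j]             # add feature j to the coalition
                phi[j] += weight * (f(z_with) - f(z_without))
    return phi


def global_importance(f, X, reference):
    """Mean absolute Shapley value of each feature across patients."""
    return np.mean([np.abs(exact_shapley(f, row, reference)) for row in X], axis=0)


# Hypothetical logistic risk model; feature order: NICVP3D, CD105, PNI, histologic grade.
COEF = np.array([2.0, 1.2, 0.8, 0.6])
INTERCEPT = -1.5


def risk(z):
    return 1.0 / (1.0 + np.exp(-(INTERCEPT + COEF @ z)))


patients = np.array([[1.2, 0.9, 1.0, 1.0],
                     [-0.8, -0.5, 0.0, 0.0]])   # hypothetical standardized values
reference = np.zeros(4)
print(exact_shapley(risk, patients[0], reference))
print(global_importance(risk, patients, reference))
```

The Shapley values of one patient sum to that patient's predicted risk minus the risk at the reference point, which is the additivity a force plot displays. Exact enumeration grows exponentially with the number of features, which is acceptable for four predictors.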

Model interpretability of the RACIM for predicting disease-free survival (DFS) with SHAP in the training cohort. (a) Feature importance plot listing the variables in descending order of importance. (b) Summary plot showing the impact of each feature on model decisions and the interactions between features. SHAP force plots of two participants at high (c) and low (d) risk of recurrence. Yellow dots represent higher feature values and purple dots represent lower feature values.