Development and validation of a predictive model for tuberculous pleural effusion with high adenosine deaminase

Participants

During the study period, 397 patients with PE and ADA levels > 25 IU/L were identified. Of these, 81 were excluded because of multiple PE samplings and 79 because they met the exclusion criteria (26 with PE of unknown etiology and 53 with incomplete clinical or laboratory data), leaving 237 patients for the final analysis. Based on clinical features, the included patients were categorized into TPE (n = 133) and non-TPE (n = 104) groups. In the TPE group, 86 patients had fever, 26 had night sweats, and 6 had weight loss; the median duration of symptoms was 7 days.

According to the Technical Specifications for Tuberculosis Prevention and Control Work in China (2020 Edition), the effectiveness of anti-tuberculosis treatment is assessed 2 months after treatment initiation, based on sputum smear for acid-fast bacilli, sputum culture for Mycobacterium tuberculosis, chest CT, and blood tests. Of the 133 patients with TPE, 119 were followed up, with a median follow-up of 12 months (25th and 75th percentiles, 8 and 24.5 months). No patients experienced relapse during the follow-up period. The study flowchart is shown in Fig. 1.
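The participant counts above can be cross-checked with simple arithmetic. The following is a minimal Python sketch; the variable names are illustrative labels introduced here, and the numbers are taken directly from the text.

```python
# Participant-flow counts as reported in the text (variable names are illustrative).
identified = 397
excluded_multiple_sampling = 81
excluded_unknown_etiology = 26
excluded_incomplete_data = 53

# The 79 exclusions for meeting exclusion criteria are the sum of the two reasons.
excluded_criteria = excluded_unknown_etiology + excluded_incomplete_data
included = identified - excluded_multiple_sampling - excluded_criteria

tpe, non_tpe = 133, 104
followed_up = 119

# The included cohort should equal the sum of the two diagnostic groups.
assert included == tpe + non_tpe
print(f"included = {included}, TPE follow-up rate = {followed_up / tpe:.1%}")
```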

Fig. 1

Flowchart of participant selection.

Comparison of clinical features

Patients in the TPE group were significantly younger than those in the non-TPE group (P < 0.001). Compared with the non-TPE group, the TPE group had higher effusion TP, GLU, and lymphocyte percentage; serum TP, ALB, TC, and CA125; peripheral blood platelet count and PNR; and plasma D-dimer and APTT (all P < 0.05). Conversely, the TPE group had significantly lower effusion LDH, LDH/ADA ratio, and neutrophil percentage; peripheral blood WBC count, neutrophil count, and NLR; and serum TB, LDH, and CEA levels (all P < 0.001). There were no significant differences between the two groups in sex, effusion ADA level, nucleated cell count, macrophage percentage, peripheral blood lymphocyte count, monocyte count, PLR, LMR, FBG, or serum GLU. The demographic and laboratory characteristics of the included patients are shown in Table 1.
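This section does not state which statistical tests produced the P values. A common choice for skewed laboratory variables is the Mann–Whitney U test, with the chi-square test for categorical variables such as sex. The sketch below assumes SciPy and uses synthetic placeholder values, not study data; the 2 × 2 sex table is hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic placeholder ages for the two groups (NOT study data).
age_tpe = rng.normal(40, 15, size=133)
age_non_tpe = rng.normal(60, 15, size=104)

# Two-sided Mann-Whitney U test for a continuous variable.
u_stat, p_age = mannwhitneyu(age_tpe, age_non_tpe, alternative="two-sided")

# Hypothetical 2x2 table for a categorical variable: rows = group, columns = male/female.
sex_table = np.array([[80, 53], [65, 39]])
chi2, p_sex, dof, expected = chi2_contingency(sex_table)

print(f"age P = {p_age:.3g}, sex P = {p_sex:.3g}")
```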

Table 1 Comparison of pleural effusions with ADA levels ≥ 25 IU/L.

Correlation analysis with PE

Correlations among the clinical indicators were analyzed using Spearman's rank correlation. Significant positive correlations were found between effusion LDH and the effusion LDH/ADA ratio, as well as between peripheral blood NLR and the peripheral blood neutrophil and lymphocyte counts, with correlation coefficients of r = 0.942, r = 0.954, and r = 0.996, respectively (Fig. 2).
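A Spearman correlation matrix of the kind shown in the heatmap can be computed with pandas, and strongly correlated pairs can be listed from its upper triangle. This is a minimal sketch with synthetic placeholder data; the column names are illustrative, and the LDH/ADA column is built from LDH so that the pair is strongly correlated, mimicking the pattern described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 237

# Synthetic placeholder indicators (NOT study data).
ldh = rng.lognormal(6.0, 0.8, n)
ada = rng.lognormal(3.8, 0.2, n)
df = pd.DataFrame({
    "effusion_LDH": ldh,
    "effusion_ADA": ada,
    "effusion_LDH_ADA": ldh / ada,  # derived ratio, strongly rank-correlated with LDH
    "age": rng.normal(50, 18, n),
})

# Spearman (rank-based) correlation matrix, the input to a heatmap.
rho = df.corr(method="spearman")

# List each pair once (upper triangle) whose |rho| exceeds 0.9.
high_pairs = [
    (a, b, round(float(rho.loc[a, b]), 3))
    for i, a in enumerate(rho.columns)
    for b in rho.columns[i + 1:]
    if abs(rho.loc[a, b]) > 0.9
]
print(high_pairs)
```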

Fig. 2

Correlation heatmap of indicators.

Prediction model and performance

First, we identified 21 variables that differed significantly between the TPE and non-TPE groups. Feature correlation analysis was then performed on these variables, redundant features with a correlation coefficient greater than 0.9 were removed, and the remaining features were used for model training. We compared six common machine learning models (logistic regression, XGBoost, LightGBM, random forest, AdaBoost, and decision tree) using the AUC. LightGBM performed best, with an AUC of 0.926, whereas logistic regression, XGBoost, random forest, AdaBoost, and decision tree achieved AUCs of 0.915, 0.916, 0.904, 0.905, and 0.816, respectively (Fig. 3).

Fig. 3

Receiver operating characteristic (ROC) curves showing the predictions of the six models.

Features were then ranked by importance weight, and the five highest-weighted features were used to build the final LightGBM model: effusion Lym.%, effusion LDH/ADA, age, effusion TP, and peripheral blood PLT. Effusion Lym.% had the greatest weight (Fig. 4).
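Ranking features by importance and refitting on the top five can be sketched as below. The sketch uses scikit-learn's `GradientBoostingClassifier` as a stand-in for LightGBM (whose `LGBMClassifier` also exposes a `feature_importances_` attribute); the data are synthetic and the column names are illustrative, chosen so that only the first three features carry signal.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 237

# Synthetic placeholder features (NOT study data); names are illustrative.
X = pd.DataFrame(rng.normal(size=(n, 8)), columns=[
    "effusion_lym_pct", "effusion_ldh_ada", "age", "effusion_tp",
    "blood_plt", "serum_alb", "serum_cea", "blood_wbc",
])
# Outcome driven only by the first three features, so they should rank highly.
logit = 2.0 * X["effusion_lym_pct"] - 1.5 * X["effusion_ldh_ada"] - 1.0 * X["age"]
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Fit on all candidate features and rank them by importance weight.
full = GradientBoostingClassifier(random_state=0).fit(X, y)
importance = pd.Series(full.feature_importances_, index=X.columns).sort_values(ascending=False)
top5 = importance.index[:5].tolist()

# Refit a reduced model on the five highest-weighted features only.
reduced = GradientBoostingClassifier(random_state=0).fit(X[top5], y)
print(importance.round(3))
print("Selected:", top5)
```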

Fig. 4

Top five feature importance weights.

Fifteen percent of the cases were randomly selected as a test set, and the remaining cases (the training set) were used for five-fold cross-validation. The model achieved AUCs of 1.000, 0.911, and 0.970 on the training, validation, and test sets, respectively. The performance of the model is presented in Table 2 and Fig. 5a–c.
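The evaluation scheme above (a 15% random hold-out test set, with five-fold cross-validation on the remaining cases) can be sketched with scikit-learn. Stratified splitting, the gradient-boosting stand-in model, and the synthetic data are assumptions made for illustration, not details stated in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic placeholder data with five features (NOT study data).
X, y = make_classification(n_samples=237, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)

# Hold out 15% of cases as the test set (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

# Five-fold cross-validation on the training set: record training and validation AUCs.
train_aucs, val_aucs = [], []
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr_idx, va_idx in folds.split(X_train, y_train):
    model = GradientBoostingClassifier(random_state=0).fit(X_train[tr_idx], y_train[tr_idx])
    train_aucs.append(roc_auc_score(y_train[tr_idx], model.predict_proba(X_train[tr_idx])[:, 1]))
    val_aucs.append(roc_auc_score(y_train[va_idx], model.predict_proba(X_train[va_idx])[:, 1]))

# Refit on the full training set and score the untouched test set once.
final = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
test_auc = roc_auc_score(y_test, final.predict_proba(X_test)[:, 1])
print(f"train {np.mean(train_aucs):.3f}, validation {np.mean(val_aucs):.3f}, test {test_auc:.3f}")
```

Tree ensembles scored on the data they were fitted to often reach a training AUC close to 1, which is why the validation and test AUCs are computed on cases the model did not see.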

Table 2 Results of the confusion matrix for the training, validation, and test sets.
Fig. 5

AUC for the training set (a), validation set (b), test set (c), and external validation set (d).

External validation in a cohort of 61 TPE and 39 non-TPE patients showed that the model retained good discriminative ability, with an AUC of 0.869 on ROC curve analysis (Fig. 5d).
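In external validation, the already-trained model is applied to the new cohort without refitting, and its scores are compared with the true labels. The AUC has a direct interpretation: it is the probability that a randomly chosen TPE case receives a higher score than a randomly chosen non-TPE case. The sketch below computes it from ranks (the Mann–Whitney formulation) and checks it against scikit-learn. Only the cohort size (61 and 39) comes from the text; the scores and the helper `auc_mann_whitney` are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score


def auc_mann_whitney(y_true, scores):
    """Hypothetical helper: AUC as P(random positive scores higher than random negative)."""
    y_true = np.asarray(y_true)
    ranks = rankdata(scores)  # tied scores receive average ranks
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    rank_sum_pos = ranks[y_true == 1].sum()
    # Mann-Whitney U for the positive class, normalized by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = np.random.default_rng(4)
# External cohort sized like the reported one (61 TPE, 39 non-TPE); scores are placeholders.
y_ext = np.r_[np.ones(61, dtype=int), np.zeros(39, dtype=int)]
scores = np.r_[rng.normal(1.5, 1.0, 61), rng.normal(0.0, 1.0, 39)]

auc_rank = auc_mann_whitney(y_ext, scores)
auc_sklearn = roc_auc_score(y_ext, scores)
print(f"rank-based AUC = {auc_rank:.3f}, sklearn AUC = {auc_sklearn:.3f}")
```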

We assessed the potential clinical benefit and applicability of the prediction model using decision curve analysis (DCA), which indicated a notable net benefit (Fig. 6a, b). In addition, the calibration curves (Fig. 7a, b) showed good agreement between predicted and observed probabilities. An online tool based on the current algorithm is available on the Extreme Smart Beckman Coulter DxAI platform (http://www.xsmartanalysis.com/model/list/predict/model/html?mid=19630&symbol=2173pKXm268155JgfL33); entering a patient's parameters returns the estimated risk of TPE, as illustrated in Fig. 8.
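Decision curve analysis compares the model's net benefit, NB(p_t) = TP/n − (FP/n) × p_t/(1 − p_t), against "treat all" and "treat none" (NB = 0) strategies across threshold probabilities p_t. A calibration curve compares mean predicted risk with the observed event rate within bins. The sketch below implements the standard net-benefit formula and uses scikit-learn's `calibration_curve`. The threshold grid, bin count, helper names, and predictions are assumptions for illustration; the text does not report them.

```python
import numpy as np
from sklearn.calibration import calibration_curve


def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating patients whose predicted risk is >= each threshold pt."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    out = []
    for pt in thresholds:
        pred_pos = y_prob >= pt
        tp = np.sum(pred_pos & (y_true == 1))
        fp = np.sum(pred_pos & (y_true == 0))
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)


def net_benefit_treat_all(y_true, thresholds):
    """Net benefit of treating every patient, regardless of predicted risk."""
    prevalence = np.mean(y_true)
    thresholds = np.asarray(thresholds)
    return prevalence - (1 - prevalence) * thresholds / (1 - thresholds)


rng = np.random.default_rng(3)
# Synthetic placeholder labels and predicted risks (NOT study output).
y = rng.integers(0, 2, size=237)
p = np.clip(0.5 * y + 0.25 + rng.normal(scale=0.15, size=237), 0.01, 0.99)

pts = np.linspace(0.05, 0.95, 19)
nb_model = net_benefit(y, p, pts)
nb_all = net_benefit_treat_all(y, pts)

# Observed event rate vs. mean predicted risk in 5 equal-width bins (empty bins are dropped).
prob_true, prob_pred = calibration_curve(y, p, n_bins=5)
```

A model curve that lies above both the treat-all curve and the zero line over a range of thresholds indicates a net benefit in that range; a calibration curve close to the diagonal indicates well-calibrated risks.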

Fig. 6

Decision curve analysis of the training set (a) and validation set (b).

Fig. 7

Calibration curves of the training set (a) and validation set (b).

Fig. 8

Visualization of the prediction model on the Extreme Smart Beckman Coulter DxAI platform.
