We identified 560 papers in MEDLINE and 1087 papers in EMBASE, and one paper was identified outside the formal search strategy. We removed 569 duplicates and excluded 994 papers by title and abstract screening. 84 papers were selected for full-text screening. 25 papers7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31, comprising 41 prognostic models, were eligible for inclusion (see Fig. 1).
PRISMA flowchart of included studies.
Study populations and designs
15 studies (60%) were published since 20157,9,11,14,15,17,18,19,20,23,25,27,28,29,30,31, and one before 20108 (Table 1, Fig. 2). Most studies included European (40%)8,12,13,15,17,19,20,24,25,28, North American (12%)10,23,29, or Australian (12%)11,16,22 populations, or a combination of these (16%)14,18,26,27. 20 studies (80%) were prospective observational cohort studies7,8,9,10,11,13,14,15,16,17,19,20,21,22,23,24,27,28,29,31 and 7 studies (28%) were inception cohort studies14,15,19,20,24,27,28. Models from 7 studies (28%)14,15,19,20,27,28,29 had a defined time-point at which they could be used (i.e. at diagnosis or in early PD) (Table 1). 18 studies (72%)7,8,9,10,11,12,13,16,17,18,21,22,23,24,25,26,30,31 recruited PwP at various disease stages or did not define which PwP were recruited, so we were unable to identify the time-points in the disease course at which the models were designed to be used. However, one model23 was developed in PwP with disease durations ranging from 0 to 30 years and included disease duration as a predictor variable, so it could potentially be used throughout the disease course if adequately validated.

Number of studies and models by years.
Outcomes of study
The most common prognostic outcome was falls/recurrent falls, which was predicted in 11 studies (44%)7,8,9,10,12,13,16,17,19,21,22. 7 studies (28%)12,18,19,23,27,28,31 predicted cognitive impairment/dementia, 4 studies (16%)12,15,25,26 predicted motor complications, 3 studies (12%)11,12,19 predicted freezing of gait, 3 studies (12%) predicted imbalance12,19,30, 2 studies (8%)18,20 predicted functional disability, 2 studies (8%)20,28 predicted a composite poor outcome, and single studies predicted depression14, mortality20, fracture risk24, difficulty doing hobbies19, and several other symptoms and signs12,29. The follow-up duration over which predictions were made ranged from 3 months8 to 12 years20; most models (60%) made predictions over <2 years, and 4 studies18,20,25,28 had 5 or more years' follow-up (Table 1).
Predictors in study
The number of predictors per model ranged from 3 to 998 (Table 1). 17 studies comprising 24 prognostic models (59%) used variables that are simple to collect in clinical practice, but 7 studies comprising 11 prognostic models (27%) included predictors that are not always routinely available in clinical practice, such as DAT imaging measurements, CSF biomarkers, or genetic polymorphism data (supplementary Table 1)13,14,18,23,25,27,31. In one study, 6 models (15%) were based on smartphone features, and the corresponding app/analysis pipelines are not available for routine use in clinical practice19. 8 studies dichotomised or categorised continuous/discrete predictors7,10,12,13,17,22,24,31. Across 24 studies with 35 final models that specified the predictors, the most common predictors were age/age at onset (n = 25), sex (n = 15), and the original or Movement Disorder Society Revision of the UPDRS (n = 12) (supplementary Table 2). In Fig. 3 we show the percentages of predictors included in the models for the two most common outcomes (falls/recurrent falls [13 models] and cognitive impairment/dementia [7 models]). We question the usefulness of previous falls as a predictor of future falls, as was the case in 11 models7,8,9,10,13,17,21,22, because once PwP have started to fall, the fracture risk is already present and physiotherapy interventions for falls and balance are already indicated.
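Dichotomising a continuous predictor discards the variation within each category, which is one reason the practice is generally discouraged in prognostic modelling. A minimal illustration with made-up ages (not taken from any reviewed study):

```python
# Illustrative only: dichotomising age at a 65-year cut-point.
ages = [40.0, 64.9, 65.1, 90.0]
dichotomised = [int(a >= 65) for a in ages]
print(dichotomised)  # [0, 0, 1, 1]
# 40.0 and 64.9 become indistinguishable, while 64.9 and 65.1,
# only 0.2 years apart, fall into different categories.
```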

Proportion of models including the most commonly used predictors (data shown for the two most frequent model outcomes; variables appearing in less than a third of the models are not shown).
Study sample sizes
5 studies (20%) had fewer than 100 participants8,9,13,15,30 (Table 1). Only 4 studies (16%) had an events per variable (EPV) ratio of at least 1010,17,18,20 (Table 1), the usual rule of thumb for the minimum EPV required for Cox or logistic regression modelling32, and many of the other studies had EPVs much lower than 107,8,9,11,13,14,16,19,25,27,28,30,31. 4 studies (16%) did not give information about the number of events18,24,26,29 (Table 1).
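The EPV referred to above is simply the number of outcome events divided by the number of candidate predictors. A minimal sketch with illustrative numbers (not drawn from any of the reviewed studies):

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Events per variable: outcome events divided by candidate predictors."""
    return n_events / n_candidate_predictors

# Illustrative numbers only: 60 fallers in a cohort, 12 candidate predictors.
epv = events_per_variable(60, 12)
print(epv)  # 5.0 — below the conventional minimum of 10
```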
Model development
12 studies (48%) did not provide information on the number of participants lost to follow-up9,10,11,12,15,18,20,22,24,25,26,29,31 and 11 studies (44%) did not report the number of participants with missing data9,11,12,15,16,17,21,22,24,26,31 (supplementary Tables 3 and 4). 10 studies (40%) gave full information on missing data (number and imputation method)7,10,13,14,18,23,25,27,28,29. The most common method of handling missing data was complete case analysis (28%)7,10,13,15,18,25,29. 2 studies (8%) handled missing data with multiple imputation14,28 (Table 2). 8 studies (32%) transformed continuous predictors into dichotomous or categorical variables7,10,12,13,17,22,24,31 and 10 studies (40%) selected predictors by univariable analysis7,9,13,14,16,20,22,25,27,31 (supplementary Tables 1 and 5).
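To contrast the two missing-data strategies mentioned above, here is a toy sketch of complete-case analysis versus single mean imputation (a deliberate simplification: multiple imputation instead draws several plausible values per gap and pools estimates across the imputed datasets). All values are invented:

```python
import statistics

# Toy cohort: a continuous predictor with missing values (None).
scores = [22, 35, None, 41, 18, None, 30]

# Complete-case analysis: drop every participant with a missing value,
# shrinking the sample from 7 to 5.
complete = [v for v in scores if v is not None]
mean_cc = statistics.mean(complete)

# Single mean imputation: fill each gap with the observed mean
# (keeps the sample size but artificially shrinks the variance).
imputed = [v if v is not None else mean_cc for v in scores]
mean_imp = statistics.mean(imputed)

print(len(complete), round(mean_cc, 1), round(mean_imp, 1))
```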
12 studies (48%) used logistic regression8,9,10,11,13,14,16,17,21,22,27,28 and 3 studies (12%) used machine learning (decision trees, XGBoost, and random forests) to build the prognostic models12,14,19. None of the machine learning models reported key predictor importance (e.g., SHAP values) or provided sufficient details for independent validation. 8 studies (32%) did not account for censoring and simply excluded censored participants from the analysis8,13,14,16,17,21,27,28. 10 studies (40%) used time-to-event survival analysis to build the prognostic models: 6 studies used Cox regression7,15,24,25,26,31, and other studies used a frailty Cox model18,23, a Weibull parametric survival model20, and a dynamic prediction model29 (Table 2). Three studies reported checking the proportional hazards assumption in survival analysis7,18,20 (Table 2 and supplementary Table 5).
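Excluding censored participants, as 8 of the reviewed studies did, discards their event-free follow-up time, whereas time-to-event methods retain it. A toy sketch of the Kaplan-Meier product-limit estimate (illustrative data only, not from any reviewed study):

```python
# Toy time-to-event data: (years of follow-up, event observed?).
# Participants with event=False are censored: they left follow-up event-free,
# and excluding them entirely would overstate the event rate.
cohort = [(1, True), (2, False), (3, True), (4, False), (5, True), (6, False)]

# Kaplan-Meier product-limit estimate of event-free survival: at each event
# time, multiply by the fraction of those still at risk who do NOT have the
# event; censored participants simply leave the risk set without an event.
at_risk = len(cohort)
surv = 1.0
for t, event in sorted(cohort):
    if event:
        surv *= (at_risk - 1) / at_risk
    at_risk -= 1

print(round(surv, 4))  # estimated probability of remaining event-free at 6 years
```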
Model evaluation and performance
Two studies10,17 that aimed to externally validate previously published models did not use the original model equation to make predictions for PwP in their validation dataset3. Therefore, these 2 studies10,17 were not truly external validation studies. We classed these studies as model development in the PROBAST assessment (Tables 1 and 3).
Internal validation and model equation assessment only apply to model development studies (n = 24) (Table 1). 7 studies (28%) did not perform internal validation8,9,10,11,17,21,26, 7 studies (28%) did not clearly state whether internal validation had been applied in all model development procedures13,14,16,23,24,29,31, and 3 studies (12%) used split-data methods14,27,30 (supplementary Table 6). 15 studies (60%) used cross-validation or bootstrap resampling to assess optimism in model performance7,12,13,16,18,19,20,22,23,24,25,27,28,29,31 (supplementary Table 6). Only 3 studies (12%) performed both internal and external validation after model development18,20,28 (supplementary Table 6). One study18 did not give the number of events in the development and validation datasets (Table 1).
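Bootstrap assessment of optimism, as used by several of the studies above, refits the model on resampled data and measures how much apparent performance exceeds performance on the original data. A self-contained toy sketch (synthetic data, and a deliberately crude threshold "model" standing in for a real regression fit):

```python
import random

random.seed(42)

# Synthetic cohort: 50 fallers and 50 non-fallers, one continuous predictor
# shifted upwards in fallers (all numbers invented for illustration).
y = [i % 2 for i in range(100)]
x = [random.gauss(1.0 if yi else 0.0, 1.0) for yi in y]
data = list(zip(x, y))

def fit(sample):
    """Crude stand-in for a regression fit: the cut-point is the midpoint
    between the two group means of the predictor."""
    n1 = sum(yi for _, yi in sample)
    m1 = sum(xi for xi, yi in sample if yi) / n1
    m0 = sum(xi for xi, yi in sample if not yi) / (len(sample) - n1)
    return (m0 + m1) / 2

def accuracy(cut, sample):
    return sum((xi > cut) == bool(yi) for xi, yi in sample) / len(sample)

apparent = accuracy(fit(data), data)

# Optimism = average over bootstrap resamples of (performance on the resample
# the model was fitted to) minus (performance of that model on original data).
B = 200
optimism = 0.0
for _ in range(B):
    boot = [random.choice(data) for _ in data]
    cut = fit(boot)
    optimism += accuracy(cut, boot) - accuracy(cut, data)
optimism /= B

corrected = apparent - optimism
print(round(apparent, 3), round(corrected, 3))
```

The optimism-corrected estimate is the honest one to report; the apparent estimate rewards the model for fitting noise in its own development data.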
3 studies (12%) did not evaluate model performance8,12,21 (supplementary Table 7). 12 studies (48%) reported internal discrimination performance but did not report calibration performance7,13,16,18,19,23,24,25,26,29,30,31, and one external validation study15 reported discrimination performance without reporting calibration (Table 2). 6 studies (24%) used the Hosmer-Lemeshow goodness-of-fit test to assess internal calibration performance9,10,11,17,22,27 (supplementary Table 7). One study (4%) used both calibration plots and calibration slopes to present internal and external calibration performance28, one study (4%) used calibration plots to present internal and external calibration performance20, and one study (4%) used a calibration plot to present internal calibration performance only14 (supplementary Table 7).
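The Hosmer-Lemeshow test groups participants by predicted risk and compares observed with expected event counts in each group. A minimal sketch with made-up predictions and outcomes (illustrative only); the resulting statistic is compared against a chi-square distribution with g − 2 degrees of freedom:

```python
# Toy predicted risks and observed binary outcomes, illustrative only.
preds = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.65, 0.80, 0.90]
obs   = [0,    0,    0,    1,    0,    1,    0,    1,    1,    1]

# Sort by predicted risk, split into g equal-sized groups, then sum
# (O - E)^2 / (E * (1 - E/n)) over the groups, where E is the expected
# event count (sum of predicted risks) and O the observed event count.
g = 5
pairs = sorted(zip(preds, obs))
size = len(pairs) // g  # 2 participants per group here
hl = 0.0
for i in range(g):
    group = pairs[i * size:(i + 1) * size]
    n = len(group)
    e = sum(p for p, _ in group)  # expected events
    o = sum(y for _, y in group)  # observed events
    hl += (o - e) ** 2 / (e * (1 - e / n))

print(round(hl, 2))  # compare to chi-square with g - 2 degrees of freedom
```

A small statistic (large p-value) indicates no detectable miscalibration, though the test is known to be sensitive to the choice of g and to sample size, which is why calibration plots and slopes are usually preferred.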
Model reporting
9 studies (36%) including 13 models (32%) gave sufficient information for the models to be used in clinical practice11,14,18,20,26,27,28,29,30 (Table 2). 10 studies (40%) did not report the intercept or baseline hazard7,8,9,10,13,17,18,22,25,31. 5 studies (20%) did not provide the model equation or sufficient details to replicate the model12,19,21,23,24 and one study provided a plot of estimated coefficients instead of giving specific values16.
Risk of bias/applicability
We found 8 studies (32%) with inclusion and exclusion criteria that would be broadly generalisable to unselected populations with PD14,15,18,19,20,24,27,28 (supplementary Table 8); these had low concern of applicability (supplementary Table 9). 16 studies (64%) lacked details of important aspects of study design (e.g. recruitment methods/dates, diagnostic criteria)7,8,10,11,12,13,16,17,21,22,23,25,26,29,30,31, and 7 studies (28%) had selection criteria that could bias the studies towards healthier participants (e.g., excluding on the basis of comorbidities or older age), raising concerns about generalisability or risk of bias7,8,9,16,17,30,31 (supplementary Tables 8, 9 and 10).
Supplementary Table 11 contains the risk of bias results relating to the predictors studied. One study (4%) had risk of bias in the predictors as it used a retrospective cohort without stating how subjective predictors (e.g., depression, olfactory dysfunction) were measured25. 7 studies (28%) included predictors that may not be routinely available in clinical practice, such as CSF biomarkers or imaging data13,14,18,23,25,27,31, so these models may not be feasible in clinical practice, especially in resource-poor settings.
For the risk of bias relating to the outcomes in studies, one study (4%) had unclear risk of bias as it did not state the outcome definition12 (supplementary Tables 12 and 13). Outcome definitions in 2 studies (8%) may have been biased by determination with knowledge of predictor information, as the outcome definitions were subjective19,25 (supplementary Tables 12 and 13).