Leveraging automated machine learning to benchmark, deconstruct, and compare frailty indices for predicting adverse spinal surgery outcomes

  • Harrop, J. S. et al. Congress of Neurological Surgeons systematic review and evidence-based guidelines for perioperative spine: preoperative surgical risk assessment. Neurosurgery 89, S9–S18 (2021).

  • Ton, A. et al. The evolution of risk assessment in spine surgery: a narrative review. World Neurosurg. 188, 1–14 (2024).

  • Bakhsheshian, J. et al. The performance of frailty in predictive modeling of short-term outcomes in the surgical management of metastatic tumors to the spine. Spine J. 22, 605–615 (2022).

  • Ton, A. et al. The impact of frailty on postoperative complications in geriatric patients undergoing multi-level lumbar fusion surgery. Eur. Spine J. 31, 1745–1753 (2022).

  • Shahrestani, S. et al. Inclusion of frailty improves predictive modeling for postoperative outcomes in surgical management of primary and secondary lumbar spine tumors. World Neurosurg. 153, e454–e463 (2021).

  • Thommen, R., Bowers, C. A., Segura, A. C., Roy, J. M. & Schmidt, M. H. Baseline frailty measured by the risk analysis index and 30-day mortality after surgery for spinal malignancy: analysis of a prospective registry (2011–2020). Neurospine 21, 404–413 (2024).

  • Chan, V. et al. Frailty adversely affects outcomes of patients undergoing spine surgery: a systematic review. Spine J. 21, 988–1000 (2021).

  • Shahrestani, S. et al. Integration of chronological age does not improve the performance of a mixed-effect model using comorbidity burden and frailty to predict 90-day readmission after surgery for degenerative scoliosis. World Neurosurg. 187, e560–e567 (2024).

  • Miller, E. K. et al. External validation of the adult spinal deformity (ASD) frailty index (ASD-FI). Eur. Spine J. 27, 2331–2338 (2018).

  • Miller, E. K. et al. External validation of the Adult Spinal Deformity (ASD) Frailty Index (ASD-FI) in the Scoli-RISK-1 patient database. Spine (Phila Pa 1976) 43, 1426–1431 (2018).

  • Adida, S. et al. Machine learning in spine surgery: a narrative review. Neurosurgery 94, 53–64 (2024).

  • Karandikar, P. et al. Machine learning applications of surgical imaging for the diagnosis and treatment of spine disorders: current state of the art. Neurosurgery 90, 372–382 (2022).

  • Ghosh, A. et al. Preoperative anemia is an unsuspecting driver of machine learning prediction of adverse outcomes after lumbar spinal fusion. Spine J. https://doi.org/10.1016/j.spinee.2025.01.031 (2025).

  • Yahanda, A. T. et al. Current applications and future implications of artificial intelligence in spine surgery and research: a narrative review and commentary. Global Spine J. 15, 1445–1454 (2025).

  • Shahrestani, S. et al. Optimizing predictive model performance in adult spinal deformity surgery: a comparative head-to-head analysis of learning models for perioperative complications. Neurosurg. Focus 58, E12 (2025).

  • van Buuren, S. & Groothuis-Oudshoorn, K. mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–67 (2011).

  • Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  • Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011).

  • Hancock, J. T. & Khoshgoftaar, T. M. Survey on categorical data for neural networks. J. Big Data 7, 28 (2020).

  • Poslavskaya, E. & Korolev, A. Encoding categorical data: is there yet anything ‘hotter’ than one-hot encoding? Preprint at https://doi.org/10.48550/arXiv.2312.16930 (2023).

  • Gilbert, T. et al. Development and validation of a hospital frailty risk score focusing on older people in acute care settings using electronic hospital records: an observational study. The Lancet 391, 1775–1782 (2018).

  • Clegg, A. et al. Development and validation of an electronic frailty index using routine primary care electronic health record data. Age Ageing 45, 353–360 (2016).

  • Chin, M. et al. Comparing the hospital frailty risk score and the clinical frailty scale among older adults with chronic obstructive pulmonary disease exacerbation. JAMA Netw. Open 6, e2253692 (2023).

  • Pajewski, N. M., Lenoir, K., Wells, B. J., Williamson, J. D. & Callahan, K. E. Frailty screening using the electronic health record within a Medicare accountable care organization. J. Gerontol. A Biol. Sci. Med. Sci. 74, 1771–1777 (2019).

  • Krell, R. W., Girotti, M. E. & Dimick, J. B. Extended length of stay after surgery: complications, inefficient practice, or sick patients? JAMA Surg. 149, 815–820 (2014).

  • Olson, R. S. & Moore, J. H. TPOT: a tree-based pipeline optimization tool for automating machine learning. In Proc. of the Workshop on Automatic Machine Learning 66–74 (PMLR, 2016).

  • Olson, R. S. et al. Automating biomedical data science through tree-based pipeline optimization. In Applications of Evolutionary Computation (eds. Squillero, G. & Burelli, P.) 123–137 (Springer International Publishing, Cham). https://doi.org/10.1007/978-3-319-31204-0_9 (2016).

  • Koza, J. R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4, 87–112 (1994).

  • Shanthi, D. L. & Chethan, N. Genetic algorithm based hyper-parameter tuning to improve the performance of machine learning models. SN Comput. Sci. 4, 119 (2022).

  • Alibrahim, H. & Ludwig, S. A. Hyperparameter optimization: comparing genetic algorithm against grid search and Bayesian optimization. In 2021 IEEE Congress on Evolutionary Computation (CEC) 1551–1559 https://doi.org/10.1109/CEC45853.2021.9504761 (2021).

  • Vincent, A. M. & Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Sci. Rep. 13, 4737 (2023).

  • Salehi, F., Abbasi, E. & Hassibi, B. The impact of regularization on high-dimensional logistic regression. In Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019).

  • Winter, E. Chapter 53 The Shapley value. In Handbook of Game Theory with Economic Applications Vol. 3, 2025–2054 (Elsevier, 2002).

  • Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).

  • Band, S. S. et al. Application of explainable artificial intelligence in medical health: A systematic review of interpretability methods. Inform. Med. Unlocked 40, 101286 (2023).

  • Yang, C. C. Explainable artificial intelligence for predictive modeling in healthcare. J. Healthc. Inform. Res. 6, 228–239 (2022).

  • Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).

  • Urbanowicz, R. J., Olson, R. S., Schmitt, P., Meeker, M. & Moore, J. H. Benchmarking relief-based feature selection methods for bioinformatics data mining. J. Biomed. Inform. 85, 168–188 (2018).

  • Ritchie, M. D. et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet. 69, 138–147 (2001).

  • Granizo-Mackenzie, D. & Moore, J. H. Multiple threshold spatially uniform ReliefF for the genetic analysis of complex human diseases. In Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (eds. Vanneschi, L., Bush, W. S. & Giacobini, M.) 1–10 (Springer, Berlin, Heidelberg). https://doi.org/10.1007/978-3-642-37189-9_1 (2013).

  • Freda, P. J., Ye, S., Zhang, R., Moore, J. H. & Urbanowicz, R. J. Assessing the limitations of relief-based algorithms in detecting higher-order interactions. BioData Mining 17, 37 (2024).

  • World Health Organization. Haemoglobin concentrations for the diagnosis of anaemia and assessment of severity. https://www.who.int/publications/i/item/WHO-NMH-NHD-MNM-11.1.

  • Shahrestani, S., Brown, N. J., Yue, J. K. & Tan, L. A. Developing mixed-effects models to optimize prediction of postoperative outcomes in a modern sample of over 450,000 patients undergoing elective cervical spine fusion surgery. Clin. Spine Surg. 36, E536–E544 (2023).

  • Kim, D. U. et al. Central sarcopenia, frailty and comorbidity as predictor of surgical outcome in elderly patients with degenerative spine disease. J. Korean Neurosurg. Soc. 64, 995–1003 (2021).

  • Han, D. et al. Comparison of three frailty evaluation tools in predicting postoperative adverse events in older patients undergoing lumbar fusion surgery: a prospective cohort study of 240 patients. Eur. Spine J. 34, 1741–1749 (2025).

  • Hersh, A. M. et al. Comparison of frailty metrics and the Charlson Comorbidity Index for predicting adverse outcomes in patients undergoing surgery for spine metastases. J. Neurosurg. Spine 36, 849–857 (2022).

  • Song, X., Mitnitski, A., Cox, J. & Rockwood, K. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Stud. Health Technol. Inform. 107, 736–740 (2004).

  • Huang, Y., Li, W., Macheret, F., Gabriel, R. A. & Ohno-Machado, L. A tutorial on calibration measurements and calibration models for clinical prediction models. J. Am. Med. Inform. Assoc. 27, 621–633 (2020).

  • Stevens, R. J. & Poppe, K. K. Validation of clinical prediction models: what does the “calibration slope” really measure? J. Clin. Epidemiol. 118, 93–99 (2020).

  • Van Calster, B. et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 17, 230 (2019).

  • Fluss, R., Faraggi, D. & Reiser, B. Estimation of the Youden index and its associated cutoff point. Biom. J. 47, 458–472 (2005).

  • Raj, R., Kannath, S. K., Mathew, J. & Sylaja, P. N. AutoML accurately predicts endovascular mechanical thrombectomy in acute large vessel ischemic stroke. Front. Neurol. 14, 1259958 (2023).

  • Wang, J., Xue, Q., Zhang, C. W. J., Wong, K. K. L. & Liu, Z. Explainable coronary artery disease prediction model based on AutoGluon from AutoML framework. Front. Cardiovasc. Med. 11, 1360548 (2024).

  • Zhu, Z. et al. Integrating machine learning and the SHapley Additive exPlanations (SHAP) framework to predict lymph node metastasis in gastric cancer patients based on inflammation indices and peripheral lymphocyte subpopulations. J. Inflamm. Res. 17, 9551–9566 (2024).

  • Wang, L. et al. Identification of testicular cancer with T2-weighted MRI-based radiomics and automatic machine learning. BMC Cancer 25, 563 (2025).
