Machine learning and statistical inference in microbial population genomics | Genome Biology

  • Blackwell GA, Hunt M, Malone KM, Lima L, Horesh G, Alako BTF, et al. Exploring bacterial diversity via a curated and searchable snapshot of archived DNA sequences. PLoS Biol. 2021;19:e3001421 (Hanage WP, editor.).

  • Wong ZSY, Zhou J, Zhang Q. Artificial intelligence for infectious disease big data analytics. Infect Dis Health. 2019;24:44–8.

  • Ow GS, Tang Z, Kuznetsov VA. Big data and computational biology strategy for personalized prognosis. Oncotarget. 2016;7:40200–20.

  • Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, et al. On the Opportunities and Risks of Foundation Models. arXiv; 2021. Available from: https://arxiv.org/abs/2108.07258. [cited 2025 Sept 2].

  • Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500.

  • Pagès-Gallego M, De Ridder J. Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling. Genome Biol. 2023;24:71.

  • Torres MDT, Brooks EF, Cesaro A, Sberro H, Gill MO, Nicolaou C, et al. Mining human microbiomes reveals an untapped source of peptide antibiotics. Cell. 2024;187:5453-5467.e15.

  • Wan F, Torres MDT, Peng J, De La Fuente-Nunez C. Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nat Biomed Eng. 2024;8:854–71.

  • Iwashyna TJ, Liu V. What’s So Different about Big Data? A Primer for Clinicians Trained to Think Epidemiologically. Annals ATS. 2014;11:1130–5.

  • Murphy KP. Probabilistic machine learning: an introduction. Cambridge, Massachusetts: The MIT Press; 2022.

  • Murphy KP. Probabilistic machine learning: advanced topics. Cambridge, Massachusetts: The MIT Press; 2023.

  • Breiman L. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist Sci. 2001;16. Available from: https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling–The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full. [cited 2025 Sept 2].

  • Bzdok D, Altman N, Krzywinski M. Statistics versus machine learning. Nat Methods. 2018;15:233–4.

  • Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32. Curran Associates, Inc; 2019;8024–35.

  • TensorFlow Developers. TensorFlow. Zenodo; 2024. Available from: https://zenodo.org/doi/10.5281/zenodo.12726004. [cited 2025 Sept 2].

  • Greene AC, Giffin KA, Greene CS, Moore JH. Adapting bioinformatics curricula for big data. Brief Bioinform. 2016;17:43–50.

  • Wiemken TL, Kelley RR. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health. 2020;41:21–36.

  • Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020;33:1877–901.

  • Falush D, Wirth T, Linz B, Pritchard JK, Stephens M, Kidd M, et al. Traces of human migrations in Helicobacter pylori populations. Science. 2003;299:1582–5.

  • Corander J, Marttinen P. Bayesian identification of admixture events using multilocus molecular markers. Mol Ecol. 2006;15:2833–43.

  • Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res. 2019;47:5539–49.

  • Lees JA, Tonkin-Hill G, Yang Z, Corander J. Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation. Phil Trans R Soc B. 2022;377:20210237.

  • Jaillard M, Lima L, Tournoud M, Mahé P, Van Belkum A, Lacroix V, et al. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. Didelot X, editor. PLoS Genet. 2018;14:e1007758.

  • Hoffman S, Podgurski A. Big bad data: law, public health, and biomedical databases. J Law Med Ethics. 2013;41:56–60.

  • Wang Q, Ma Y, Zhao K, Tian Y. A comprehensive survey of loss functions in machine learning. Ann Data Sci. 2022;9:187–212.

  • Stone M. Cross-Validatory Choice and Assessment of Statistical Predictions. J Royal Statistic Soc Series B (Methodological). 1974;36:111–47.

  • Bzdok D, Krzywinski M, Altman N. Machine learning: a primer. Nat Methods. 2017;14:1119–20.

  • Bashir D, Montañez GD, Sehra S, Segura PS, Lauw J. An Information-T. Cham: Springer International Publishing; 2020; 347–58. Available from: https://link.springer.com/10.1007/978-3-030-64984-5_27. [cited 2025 Sept 2].

  • Fix E, Hodges JL. Discriminatory analysis: Nonparametric discrimination: Consistency properties: (471672008–001). 1951. Available from: https://doi.apa.org/doi/10.1037/e471672008-001. [cited 2025 Sept 2].

  • Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inform Theory. 1967;13:21–7.

  • Yao Z, Ruzzo WL. A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinformatics. 2006;7:S11.

  • Mihelčić M, Šmuc T, Supek F. Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep. 2019;9:19537.

  • Xu S. Bayesian naïve Bayes classifiers to text classification. J Inf Sci. 2018;44:48–59.

  • John GH, Langley P. Estimating Continuous Distributions in Bayesian Classifiers. arXiv; 2013. Available from: https://arxiv.org/abs/1302.4964. [cited 2025 Sept 2].

  • Webb GI. Naïve Bayes. In: Sammut C, Webb GI, editors. Encyclopedia of Machine Learning. Boston, MA: Springer US; 2011;713–4. Available from: https://link.springer.com/10.1007/978-0-387-30164-8_576. [cited 2025 Sept 2].

  • Li F, Shen Y, Lv D, Lin J, Liu B, He F, et al. A bayesian classification model for discriminating common infectious diseases in Zhejiang province, China. Medicine. 2020;99:e19218.

  • Zhao Z, Cristian A, Rosen G. Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life. BMC Bioinformatics. 2020;21:412.

  • Sandberg R, Winberg G, Bränden C-I, Kaske A, Ernberg I, Cöster J. Capturing whole-genome characteristics in short sequences using a naïve Bayesian classifier. Genome Res. 2001;11:1404–9.

  • Ben-Hur A, Ong CS, Sonnenburg S, Schölkopf B, Rätsch G. Support vector machines and kernels for computational biology. PLoS Comput Biol. 2008;4:e1000173 (Lewitter F, editor.).

  • McIntyre ABR, Ounit R, Afshinnekoo E, Prill RJ, Hénaff E, Alexander N, et al. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 2017;18:182.

  • Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.

  • Tsirigos A. A sensitive, support-vector-machine method for the detection of horizontal gene transfers in viral, archaeal and bacterial genomes. Nucleic Acids Res. 2005;33:3699–707.

  • Weimann A, Mooren K, Frank J, Pope PB, Bremges A, McHardy AC. From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer. Segata N, editor. mSystems. 2016;1:e00101–16.

  • Belman S, Pesonen H, Croucher NJ, Bentley SD, Corander J. Estimating Between Country Migration in Pneumococcal Populations. Epidemiology; 2023. Available from: http://medrxiv.org/lookup/doi/10.1101/2023.11.15.23298520. [cited 2025 Sept 2].

  • Lupolova N, Dallman TJ, Holden NJ, Gally DL. Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli. Microbial Genomics. 2017;3. Available from: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000135. [cited 2025 Sept 2].

  • Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.

  • Li M, Xu H, Deng Y. Evidential decision tree based on belief entropy. Entropy. 2019;21:897.

  • Schrider DR, Kern AD. Supervised machine learning for population genetics: a new paradigm. Trends Genet. 2018;34:301–12.

  • Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  • Statnikov A, Henaff M, Narendra V, Konganti K, Li Z, Yang L, et al. A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome. 2013;1:11.

  • Deneke C, Rentzsch R, Renard BY. Paprbag: a machine learning approach for the detection of novel pathogens from NGS data. Sci Rep. 2017;7:39194.

  • Méric G, Mageiros L, Pensar J, Laabei M, Yahara K, Pascoe B, et al. Disease-associated genotypes of the commensal skin bacterium Staphylococcus epidermidis. Nat Commun. 2018;9:5034.

  • Mageiros L, Méric G, Bayliss SC, Pensar J, Pascoe B, Mourkas E, et al. Genome evolution and the emergence of pathogenicity in avian Escherichia coli. Nat Commun. 2021;12:765.

  • Chen ML, Doddi A, Royer J, Freschi L, Schito M, Ezewudo M, et al. Beyond multidrug resistance: leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction. EBioMedicine. 2019;43:356–69.

  • Li Y, Metcalf BJ, Chochua S, Li Z, Gertz RE, Walker H, et al. Validation of β-lactam minimum inhibitory concentration predictions for pneumococcal isolates with newly encountered penicillin binding protein (PBP) sequences. BMC Genomics. 2017;18:621.

  • Arning N, Sheppard SK, Bayliss S, Clifton DA, Wilson DJ. Machine learning to predict the source of campylobacteriosis using whole genome data. PLoS Genet. 2021;17:e1009436 (Hughes D, editor.).

  • Pascoe B, Futcher G, Pensar J, Bayliss SC, Mourkas E, Calland JK, et al. Machine learning to attribute the source of Campylobacter infections in the United States: a retrospective analysis of national surveillance data. J Infect. 2024;89:106265.

  • Wheeler NE, Gardner PP, Barquist L. Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica. PLoS Genet. 2018;14:e1007333 (Didelot X, editor.).

  • Zhang S, Li S, Gu W, Den Bakker H, Boxrud D, Taylor A, et al. Zoonotic Source Attribution of Salmonella enterica Serotype Typhimurium Using Genomic Surveillance Data, United States. Emerg Infect Dis. 2019;25. Available from: http://wwwnc.cdc.gov/eid/article/25/1/18-0835_article.htm. [cited 2025 Sept 2].

  • Beavan AJS, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci USA. 2024;121:e2304934120.

  • Mason L, Baxter J, Bartlett P, Frean M. Boosting Algorithms as Gradient Descent. Advances in Neural Information Processing Systems. MIT Press; 1999. Available from: https://proceedings.neurips.cc/paper/1999/hash/96a93ba89a5b5c6c226e49b88973f46e-Abstract.html.

  • Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001;29. Available from: https://projecteuclid.org/journals/annals-of-statistics/volume-29/issue-5/Greedy-function-approximation-A-gradient-boosting-machine/10.1214/aos/1013203451.full. [cited 2025 Sept 2].

  • Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc; 2017;3149–57.

  • Anahtar MN, Yang JH, Kanjilal S. Applications of Machine Learning to the Problem of Antimicrobial Resistance: an Emerging Model for Translational Research. McAdam AJ, editor. J Clin Microbiol. 2021;59:e01260–20.

  • Ramoneda J, Stallard-Olivera E, Hoffert M, Winfrey CC, Stadler M, Niño-García JP, et al. Building a genome-based understanding of bacterial pH preferences. Sci Adv. 2023;9:eadf8998.

  • Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A. 1982;79:2554–8.

  • Sheehan S, Song YS. Deep Learning for Population Genetic Inference. Chen K, editor. PLoS Comput Biol. 2016;12:e1004845.

  • Li Y, Huang C, Ding L, Li Z, Pan Y, Gao X. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods. 2019;166:4–21.

  • Sejnowski TJ. The Deep Learning Revolution. The MIT Press; 2018. Available from: https://direct.mit.edu/books/book/4111/The-Deep-Learning-Revolution. [cited 2025 Sept 2].

  • Lugo L, Hernández EB. A recurrent neural network approach for whole genome bacteria identification. Appl Artif Intell. 2021;35:642–56.

  • Hasan MA, Lonardi S. Deeplyessential: a deep neural network for predicting essential genes in microbes. BMC Bioinformatics. 2020;21:367.

  • Assaf R, Xia F, Stevens R. Detecting operons in bacterial genomes via visual representation learning. Sci Rep. 2021;11:2124.

  • Wiatrak M, Weimann A, Dinan A, Brbić M, Floto RA. Sequence-based modelling of bacterial genomes enables accurate antibiotic resistance prediction. Microbiology; 2024. Available from: http://biorxiv.org/lookup/doi/10.1101/2024.01.03.574022. [cited 2025 Sept 2].

  • Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2:359–66.

  • Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv; 2016. Available from: https://arxiv.org/abs/1611.03530. [cited 2025 Sept 2].

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.

  • Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, et al. Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst. 2022;35:27730–44.

  • Holz HJ, Loew MH. Relative feature importance: A classifier-independent approach to feature selection. Machine Intelligence and Pattern Recognition. Elsevier; 1994;473–87. Available from: https://linkinghub.elsevier.com/retrieve/pii/B9780444818928500468. [cited 2025 Sept 2].

  • Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci USA. 2019;116:22071–80.

  • House of Commons Science, Innovation and Technology Committee. 2023. The governance of artificial intelligence: interim report. Ninth Report of Session 2022–23. HC1769. https://committees.parliament.uk/publications/41130/documents/205611/default/

  • Nielsen EM, Fussing V, Engberg J, Nielsen NL, Neimann J. Most Campylobacter subtypes from sporadic infections can be found in retail poultry products and food animals. Epidemiol Infect. 2006;134:758–67.

  • Garrett N, Devane ML, Hudson JA, Nicol C, Ball A, Klena JD, et al. Statistical comparison of Campylobacter jejuni subtypes from human cases and environmental sources: comparison of Campylobacter subtypes. J Appl Microbiol. 2007;103:2113–21.

  • Wilson DJ, Gabriel E, Leatherbarrow AJH, Cheesbrough J, Gee S, Bolton E, et al. Tracing the Source of Campylobacteriosis. Guttman DS, editor. PLoS Genet. 2008;4:e1000203.

  • Sheppard SK, Dallas JF, Strachan NJC, MacRae M, McCarthy ND, Wilson DJ, et al. Campylobacter genotyping to determine the source of human infection. Clin Infect Dis. 2009;48:1072–8.

  • Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco California USA: ACM; 2016;785–94. Available from: https://dl.acm.org/doi/10.1145/2939672.2939785. [cited 2025 Sept 2].

  • Mackay TFC. The genetic architecture of quantitative traits. Annu Rev Genet. 2001;35:303–39.

  • Peacock SJ, Moore CE, Justice A, Kantzanou M, Story L, Mackie K, et al. Virulent combinations of adhesin and toxin genes in natural populations of Staphylococcus aureus. Infect Immun. 2002;70:4987–96.

  • Astle W, Balding DJ. Population Structure and Cryptic Relatedness in Genetic Association Studies. Statist Sci. 2009;24. Available from: https://projecteuclid.org/journals/statistical-science/volume-24/issue-4/Population-Structure-and-Cryptic-Relatedness-in-Genetic-Association-Studies/10.1214/09-STS307.full. [cited 2025 Sept 2].

  • Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11:459–63.

  • Sheppard SK. Strain wars and the evolution of opportunistic pathogens. Curr Opin Microbiol. 2022;67:102138.

  • Pearl J. Causal inference in statistics: An overview. Statist Surv. 2009;3. Available from: https://projecteuclid.org/journals/statistics-surveys/volume-3/issue-none/Causal-inference-in-statistics-An-overview/10.1214/09-SS057.full. [cited 2025 Sept 2].

  • Zhu Z, Zheng Z, Zhang F, Wu Y, Trzaskowski M, Maier R, et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun. 2018;9:224.

  • Sheppard SK, Didelot X, Meric G, Torralbo A, Jolley KA, Kelly DJ, et al. Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc Natl Acad Sci USA. 2013;110:11923–7.

  • Earle SG, Wu C-H, Charlesworth J, Stoesser N, Gordon NC, Walker TM, et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol. 2016;1:16041.

  • Lees JA, Galardini M, Bentley SD, Weiser JN, Corander J. pyseer: a comprehensive tool for microbial pangenome-wide association studies. Stegle O, editor. Bioinformatics. 2018;34:4310–2.

  • Young BC, Earle SG, Soeng S, Sar P, Kumar V, Hor S, et al. Panton-valentine leucocidin is the key determinant of Staphylococcus aureus pyomyositis in a bacterial GWAS. Elife. 2019;8:e42486.

  • Earle SG, Lobanovska M, Lavender H, Tang C, Exley RM, Ramos-Sevillano E, et al. Genome-wide association studies reveal the role of polymorphisms affecting factor H binding protein expression in host invasion by Neisseria meningitidis. Nassif X, editor. PLoS Pathog. 2021;17:e1009992.

  • Green AG, Yoon CH, Chen ML, Ektefaie Y, Fina M, Freschi L, et al. A convolutional neural network highlights mutations relevant to antimicrobial resistance in Mycobacterium tuberculosis. Nat Commun. 2022;13:3817.

  • The CRyPTIC Consortium. Genome-wide association studies of global Mycobacterium tuberculosis resistance to 13 antimicrobials in 10,228 genomes identify new resistance mechanisms. Ladner J, editor. PLoS Biol. 2022;20:e3001755.

  • Mosquera-Rendón J, Moreno-Herrera CX, Robledo J, Hurtado-Páez U. Genome-wide association studies (GWAS) approaches for the detection of genetic variants associated with antibiotic resistance: a systematic review. Microorganisms. 2023;11:2866.

  • Didelot X, Bowden R, Wilson DJ, Peto TEA, Crook DW. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet. 2012;13:601–12.

  • Walker TM, Cruz ALG, Peto TE, Smith EG, Esmail H, Crook DW. Tuberculosis is changing. Lancet Infect Dis. 2017;17:359–61.

  • Satta G, Lipman M, Smith GP, Arnold C, Kon OM, McHugh TD. Mycobacterium tuberculosis and whole-genome sequencing: how close are we to unleashing its full potential? Clin Microbiol Infect. 2018;24:604–9.

  • Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. Abecasis GR, editor. PLoS Genet. 2009;5:e1000337.

  • Yang Y, Niehaus KE, Walker TM, Iqbal Z, Walker AS, Wilson DJ, et al. Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data. Birol I, editor. Bioinformatics. 2018;34:1666–71.

  • Kouchaki S, Yang Y, Walker TM, Sarah Walker A, Wilson DJ, Peto TEA, et al. Application of machine learning techniques to tuberculosis drug resistance analysis. Wren J, editor. Bioinformatics. 2019;35:2276–82.

  • Yang Y, Walker TM, Walker AS, Wilson DJ, Peto TEA, Crook DW, et al. DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis. Hancock J, editor. Bioinformatics. 2019;35:3240–9.

  • Gröschel MI, Owens M, Freschi L, Vargas R, Marin MG, Phelan J, et al. Gentb: A user-friendly genome-based predictor for tuberculosis resistance powered by machine learning. Genome Med. 2021;13:138.

  • The CRyPTIC Consortium and the 100,000 Genomes Project. Prediction of Susceptibility to First-Line Tuberculosis Drugs by DNA Sequencing. N Engl J Med. 2018;379:1403–15.

  • He G, Zheng Q, Shi J, Wu L, Huang B, Yang Y. Evaluation of WHO catalog of mutations and five WGS analysis tools for drug resistance prediction of Mycobacterium tuberculosis isolates from China. Georghiou SB, editor. Microbiol Spectr. 2024;12:e03341–23.

  • Ferrari E, Retico A, Bacciu D. Measuring the effects of confounders in medical supervised classification problems: the confounding index (CI). Artif Intell Med. 2020;103:101804.

  • Ribeiro MT, Singh S, Guestrin C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco California USA: ACM; 2016;1135–44. Available from: https://dl.acm.org/doi/10.1145/2939672.2939778. [cited 2025 Sept 2].

  • Lundberg S, Lee S-I. A Unified Approach to Interpreting Model Predictions. arXiv; 2017. Available from: https://arxiv.org/abs/1705.07874. [cited 2025 Sept 2].

  • Meyes R, Lu M, Waubert de Puiseau C, Meisen T. Ablation studies to uncover structure of learned representations in artificial neural networks. Proceedings of the International Conference on Artificial Intelligence (ICAI). Athens, Greece: CSREA Press; 2019. Available from: https://www.researchgate.net/publication/334871296_Ablation_Studies_to_Uncover_Structure_of_Learned_Representations_in_Artificial_Neural_Networks. [cited 2025 Sept 2].

  • Callaway E. How generative AI is building better antibodies. Nature. 2023;d41586–023–01516-w.

  • Callaway E. ‘ChatGPT for CRISPR’ creates new gene-editing tools. Nature. 2024;629:272.

  • Tang X, Dai H, Knight E, Wu F, Li Y, Li T, et al. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Briefings in Bioinformatics. 2024;25:bbae338.

  • Winnifrith A, Outeiral C, Hie BL. Generative artificial intelligence for de novo protein design. Current Opinion in Structural Biology. 2024;86:102794.
