Biased sampling driven by bacterial population structure confounds machine learning prediction of antimicrobial resistance

Abstract

To comprehensively evaluate the impact of population structure in predicting AMR, we collected between 3,204 and 7,188 genomes for each of three Gram-negative species and two Gram-positive species representative of current WHO priority pathogens [15], including the gastrointestinal and urinary tract pathogen Escherichia coli, the opportunistic pathogen Klebsiella pneumoniae, the gastrointestinal pathogen Salmonella enterica, the skin commensal and opportunistic pathogen Staphylococcus aureus, and the major agent of community-acquired pneumonia Streptococcus pneumoniae.

The hyperparameter search space included:

  • 'max_bin': 50 to 500, with a step of 50
  • 'bagging_fraction': 0.01 to 1
  • 'bagging_freq': 0 to 10
  • 'feature_fraction': 0.01 to 1
  • 'subsample_for_bin': 30 to 0.8 * sample size
  • 'max_depth': 1 to 16, with a step of 1
  • 'learning_rate': 0.01 to 1
  • 'lambda_l2': 0 to 100
  • 'min_data_in_leaf': 1 to 300
  • 'min_gain_to_split': 0 to 15
  • 'num_leaves': 2 to 100, with a step of 1

In scheme B, it denotes the number of susceptible genomes in the other clades where resistant or susceptible genomes were not excluded.

  • Rn: in scheme A, the number of resistant genomes in the paired clades; in scheme B, the number of resistant genomes in the clade from which resistant or susceptible genomes were excluded.
  • Sn: in scheme A, the number of susceptible genomes in the paired clades.
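The hyperparameter search space listed above can be written down as a small sampler. The following is a minimal sketch, not the authors' tuning code: the parameter names match those used by LightGBM, but this excerpt does not name the library or the search strategy, so uniform random sampling, the integer versus float treatment of each range, and the helper name `sample_params` are all assumptions made for illustration.

```python
import random


def sample_params(n_samples, rng):
    """Draw one configuration from the search space described in the text.

    Assumptions (not stated in the source): every range is sampled
    uniformly, and ranges without an explicit step are treated as
    continuous unless the parameter is a count.
    """
    # Upper bound for 'subsample_for_bin' is 0.8 * sample size; guard small inputs.
    bin_upper = max(30, int(0.8 * n_samples))
    return {
        "max_bin": rng.randrange(50, 501, 50),        # 50..500, step 50
        "bagging_fraction": rng.uniform(0.01, 1.0),
        "bagging_freq": rng.randint(0, 10),
        "feature_fraction": rng.uniform(0.01, 1.0),
        "subsample_for_bin": rng.randint(30, bin_upper),
        "max_depth": rng.randint(1, 16),              # step 1
        "learning_rate": rng.uniform(0.01, 1.0),
        "lambda_l2": rng.uniform(0.0, 100.0),
        "min_data_in_leaf": rng.randint(1, 300),
        "min_gain_to_split": rng.uniform(0.0, 15.0),
        "num_leaves": rng.randint(2, 100),            # step 1
    }


# Example: one draw for the smallest dataset size mentioned (3,204 genomes).
params = sample_params(n_samples=3204, rng=random.Random(0))
for name, value in params.items():
    print(f"{name}: {value}")
```

Passing an explicit `random.Random` instance keeps each draw reproducible; a real tuning run would evaluate many such draws, or hand the same ranges to a dedicated optimizer.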
Key Data

  • Publication Date
    16 December 2025
  • Primary Author
    Yanying Yu
  • Source
    PLoS Medicine
  • Language
    English