Biased sampling driven by bacterial population structure confounds machine learning prediction of antimicrobial resistance

News

Abstract

To comprehensively evaluate the impact of population structure on the prediction of antimicrobial resistance (AMR), we collected between 3,204 and 7,188 genomes for each of five species representative of current WHO priority pathogens [15]. These were three Gram-negative species: the gastrointestinal and urinary tract pathogen Escherichia coli, the opportunistic pathogen Klebsiella pneumoniae and the gastrointestinal pathogen Salmonella enterica. The other two were Gram-positive species: the skin commensal and opportunistic pathogen Staphylococcus aureus, and Streptococcus pneumoniae, the major agent of community-acquired pneumonia.

The hyperparameter search space included: 'max_bin' from 50 to 500 in steps of 50; 'bagging_fraction' from 0.01 to 1; 'bagging_freq' from 0 to 10; 'feature_fraction' from 0.01 to 1; 'subsample_for_bin' from 30 to 0.8 times the sample size; 'max_depth' from 1 to 16 in steps of 1; 'learning_rate' from 0.01 to 1; 'lambda_l2' from 0 to 100; 'min_data_in_leaf' from 1 to 300; 'min_gain_to_split' from 0 to 15; and 'num_leaves' from 2 to 100 in steps of 1.

In scheme B, it denotes the number of susceptible genomes in the other clades, where resistant or susceptible genomes were not excluded. Rn: in scheme A, this refers to the number of resistant genomes in the paired clades; in scheme B, it refers to the number of resistant genomes in the clade from which resistant or susceptible genomes were excluded. Sn: in scheme A, this refers to the number of susceptible genomes in the paired clades.
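The hyperparameter search space in the abstract can be written down as a small sampler, which makes the ranges and step sizes easy to check. This is a minimal sketch, not the authors' code. The parameter names appear to match LightGBM's, but the search strategy (uniform random sampling here), the treatment of ranges given without a step as continuous, and the helper names INT_PARAMS, FLOAT_PARAMS and sample_config are all hypothetical, illustrative assumptions.

```python
import random

# Hypothetical encoding of the search space described in the text.
# Names match LightGBM parameters; the sampling strategy is an assumption.

# Integer parameters: (low, high, step), both ends inclusive.
INT_PARAMS = {
    "max_bin": (50, 500, 50),
    "bagging_freq": (0, 10, 1),
    "max_depth": (1, 16, 1),
    "min_data_in_leaf": (1, 300, 1),
    "num_leaves": (2, 100, 1),
}

# Continuous parameters: (low, high). Treating these as continuous is an
# assumption; the text gives no step for them.
FLOAT_PARAMS = {
    "bagging_fraction": (0.01, 1.0),
    "feature_fraction": (0.01, 1.0),
    "learning_rate": (0.01, 1.0),
    "lambda_l2": (0.0, 100.0),
    "min_gain_to_split": (0.0, 15.0),
}


def sample_config(rng: random.Random, n_samples: int) -> dict:
    """Draw one configuration uniformly at random from the search space."""
    config = {}
    for name, (low, high, step) in INT_PARAMS.items():
        # randrange excludes its stop value, so add 1 to include `high`
        config[name] = rng.randrange(low, high + 1, step)
    for name, (low, high) in FLOAT_PARAMS.items():
        config[name] = rng.uniform(low, high)
    # subsample_for_bin depends on training-set size: 30 .. 0.8 * n_samples.
    # The max() floor (an added assumption) keeps the range valid for small n.
    upper = max(30, int(0.8 * n_samples))
    config["subsample_for_bin"] = rng.randint(30, upper)
    return config


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible draws
    # n_samples=5000 is an illustrative training-set size, not from the paper
    for trial in range(3):
        print(sample_config(rng, n_samples=5000))
```

Tying subsample_for_bin to n_samples mirrors the "0.8 times the sample size" bound in the text. The floor of 30 only guards against training sets so small that the upper bound would fall below the lower one.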

Key Data

Publication Date: 16 December 2025
Primary Author: Yanying Yu
Source: PLoS Medicine
Language: English
