Enabling Phenotypic AMR Prediction from NGS Data with Expert-Augmented Machine Learning and Curated Databases

Detecting and understanding antimicrobial resistance (AMR) is a critical challenge in modern microbiology, with far-reaching implications for public health and clinical decision-making. In a previous blog post, BugSeq’s Approach to AMR, we outlined the importance of combining genomic insights with machine learning (ML) to enhance AMR detection. However, not all computational approaches to AMR prediction enable the same accuracy, insight, and power. In this post, we discuss the advantages and limitations of different approaches and demonstrate how the approach BugSeq takes—expert-augmented machine learning combined with a curated AMR database—offers a more robust, interpretable, and powerful path for phenotypic AMR prediction (genomic AST).

The Challenge

It is unsurprising that popular, publicly available tools and databases perform poorly for predicting phenotypic AMR across a broad range of species and drugs: often they were not designed for this task!

For example, the wiki of AMRFinderPlus states:

“AMRFinderPlus does not predict phenotypic resistance.”

Yet many researchers (including the authors of the AMRFinder paper!) use this tool for exactly that purpose. Whether or not a tool was designed to predict phenotype (ResFinder, for example, was), two common approaches have emerged for predicting phenotype from genomes: gene presence-absence and k-mer-based methods. We highlight some of the limitations of these approaches and demonstrate how leveraging their strengths, while mitigating their weaknesses, results in the best AMR performance and power.

K-mer Methods are Largely Uninterpretable with High Risk of Confounding

K-mer-based approaches, which rely on short nucleotide sequences to infer resistance, have gained traction due to their computational efficiency and scalability; however, these methods face significant obstacles that limit their utility in AMR prediction:

- Failure to utilize more than 100 years of AMR research: The scientific literature contains detailed mechanisms of AMR for many drugs, yet k-mer-based methods ignore this knowledge entirely.
- Confounding variables: K-mer models may capture spurious correlations rather than true causal mechanisms of resistance. This issue arises when k-mers associated with resistance are merely linked to the genetic background of a particular strain (phylogenetic signal) rather than being direct AMR determinants.
- Non-balanced datasets: Many AMR datasets suffer from imbalanced representations of resistant and susceptible strains. K-mer approaches often amplify these biases, leading to overfitting on well-represented resistance markers and poor generalization to novel or rare strains. This can result in a model that performs well in testing but fails in real-world clinical scenarios.
- Lack of interpretability: Unlike rule-based or curated ML approaches, most k-mer models generate predictions that are difficult to interpret biologically. They can tell you that a sample is likely resistant, but not why. This “black box” nature hinders validation and regulatory acceptance, particularly in clinical settings where transparency and understanding the mechanism of resistance are paramount.
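
The confounding risk can be made concrete with a toy example. The sketch below is illustrative only: the sequences, the lineage labels, and the helper names (`kmers`, `perfectly_associated`) are hypothetical and do not come from any BugSeq or published pipeline. It shows that k-mers from a shared genetic background can associate with resistance just as strongly as k-mers from the resistance determinant itself, so association alone cannot separate cause from lineage.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def perfectly_associated(resistant: list[str], susceptible: list[str], k: int) -> set[str]:
    """K-mers present in every resistant genome and in no susceptible genome."""
    in_all_resistant = set.intersection(*(kmers(g, k) for g in resistant))
    in_any_susceptible = set().union(*(kmers(g, k) for g in susceptible))
    return in_all_resistant - in_any_susceptible


# Hypothetical toy data: every resistant isolate happens to belong to lineage A.
LINEAGE_A = "ACGTACGA"    # background sequence, unrelated to resistance
LINEAGE_B = "TTGCATGC"
DETERMINANT = "GGGGCCCC"  # stands in for a true resistance determinant

resistant = [LINEAGE_A + "T" + DETERMINANT, LINEAGE_A + "A" + DETERMINANT]
susceptible = [LINEAGE_B + "T", LINEAGE_B + "A"]

hits = perfectly_associated(resistant, susceptible, k=4)
causal = hits & kmers(DETERMINANT, 4)      # k-mers from the determinant
background = hits & kmers(LINEAGE_A, 4)    # k-mers from the lineage background

print(sorted(causal))
print(sorted(background))
```

Both sets are non-empty, and nothing in the training data distinguishes them. A model that leaned on the background k-mers would miss a lineage B isolate that acquired the same determinant.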

Gene Presence-Absence Does Not Capture Phenotype

The most straightforward computational approach to AMR prediction is based on a simple premise: if a known antimicrobial resistance gene (ARG) is detected in a bacterium’s genome, the bacterium is predicted to be resistant. While intuitively appealing, this gene presence-absence model is fundamentally flawed because it ignores a crucial biological reality: the mere presence of a gene does not guarantee its expression, function or elevation of MIC above the resistance threshold. It also fails to account for interactions and additive effects between genes. These oversights lead to significant predictive errors and a dangerously incomplete understanding of an organism’s resistance potential.¹
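
As a minimal sketch of the presence-absence premise, the rule can be written in a few lines. Everything named here (`Detection`, `presence_absence_call`, `gene_a`, `drug_x`, and the annotation table) is a placeholder invented for illustration, not the logic of any particular tool. The point is what the rule cannot see: whether the gene is functional, and whether resistance arises by a route other than a listed gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    gene: str
    intact: bool  # False when the coding sequence is disrupted


def presence_absence_call(detections: list[Detection], drug: str,
                          gene_to_drugs: dict[str, set[str]]) -> str:
    """Call 'R' if any detected gene is annotated for the drug, otherwise 'S'."""
    hit = any(drug in gene_to_drugs.get(d.gene, set()) for d in detections)
    return "R" if hit else "S"


# Hypothetical annotation table; gene and drug names are placeholders.
GENE_TO_DRUGS = {"gene_a": {"drug_x"}}

functional = [Detection("gene_a", intact=True)]
disrupted = [Detection("gene_a", intact=False)]  # present but non-functional
no_known_gene: list[Detection] = []              # resistance could still arise by mutation

calls = [presence_absence_call(d, "drug_x", GENE_TO_DRUGS)
         for d in (functional, disrupted, no_known_gene)]
print(calls)  # ['R', 'R', 'S']
```

The functional and disrupted isolates receive the same call because the `intact` field is never consulted, and the third isolate is called susceptible no matter what else its genome contains.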

Why Public Databases for Gene Presence-Absence Aren’t Enough, and How BugSeq Raises the Bar

Publicly available AMR gene databases have been instrumental in advancing the field—but they come with serious limitations when applied to phenotypic resistance prediction. These gaps can undermine the reliability of downstream analysis, especially in clinical or diagnostic contexts where accuracy is paramount.

- Incomplete Gene Coverage: Public databases are often incomplete. Consider ResFinder: at the time of writing, it includes only 566 blaOXA genes, while the NCBI database lists over 1,300. A disparity of this size risks missing clinically relevant variants, and can lead to false negatives, inexact hits, and an underestimation of resistance potential.
- Non-Specific Annotations: Annotation quality is another weak point. In the AMRFinderPlus database, 1,218 genes are broadly labeled as causing “beta-lactam” resistance, without distinguishing between sub-classes like penicillins, cephalosporins, or carbapenems. Such generalization is insufficient for making actionable predictions or understanding specific mechanisms of resistance required for clinical decision-making.
- Poor Phenotype-Genotype Correlation: CTX-Ms are among the most common and important extended-spectrum beta-lactamases, conferring resistance to third-generation cephalosporins. Yet, of the 273 CTX-M genes in CARD, only three carry an annotation linking them to ceftriaxone resistance. This disconnect severely limits predictive accuracy and clinical relevance.

Database limitations extend beyond AMR gene databases. Publicly available datasets that link whole genomes to phenotypes—often used for benchmarking and machine learning—are also deeply flawed:

- Low-Quality Genomes: Many of the genomes in these datasets fail to meet modern sequencing and assembly quality standards.
- Incorrect Taxonomic Labels: Species identification is frequently inaccurate, leading to misclassification and erroneous conclusions. We have previously written about this in a peer-reviewed publication on database curation, which found that approximately 3.6% of submitter-annotated genomes in public databases like GenBank have incorrect taxonomic identification.
- Inconsistent Processing Pipelines: Genomes have been assembled and annotated using different tools and parameters, introducing batch and quality effects that confound downstream analyses.
- Non-Standard Phenotype Interpretation: Phenotypic resistance labels often do not follow clinical standards such as CLSI or EUCAST breakpoints.
- Biologically Implausible Phenotypes: Some datasets contain obvious errors, such as penicillin-resistant Group A Streptococcus, a phenotype that has never been observed.

These issues make such datasets unsuitable for rigorous model training or clinical-grade validation.

At BugSeq, we took matters into our own hands

To build a foundation we could trust, we curated the BugPheno database, a database of over 80,000 genomes from public and proprietary sources paired with antimicrobial susceptibility data. Here’s how we did it differently:

- High-Quality Genomes: All genomes passed strict quality control, underwent standardized assembly, and were subject to accurate taxonomic identification leveraging our published BugSplit algorithm and BugRef database. Uniform processing avoided the batch effects that affect public databases like BV-BRC and NDARO.
- Clinically-Guided Phenotype Curation: Resistance phenotypes were interpreted according to recognized clinical guidelines (e.g. CLSI).
- Expert Review: Implausible or mislabeled data were removed, drawing on domain expertise to curate a dataset optimized for machine learning and clinical prediction.
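
To illustrate what guideline-based phenotype curation can look like in code, here is a minimal sketch. It is not BugSeq's pipeline: the `Breakpoint` class, the `curate` function, the species and drug names, and especially the breakpoint values are placeholders chosen for the example and are not real CLSI or EUCAST numbers. The sketch re-derives S/I/R categories from raw MICs and drops rows that cannot be interpreted or that an expert has flagged as implausible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    susceptible_max: float  # MIC <= this value is interpreted as S
    resistant_min: float    # MIC >= this value is interpreted as R


def interpret_mic(mic: float, bp: Breakpoint) -> str:
    """Map an MIC (mg/L) to S, I, or R using a single breakpoint pair."""
    if mic <= bp.susceptible_max:
        return "S"
    if mic >= bp.resistant_min:
        return "R"
    return "I"


# Placeholder values for illustration only; NOT real CLSI or EUCAST breakpoints.
BREAKPOINTS = {("species_1", "drug_x"): Breakpoint(susceptible_max=1.0, resistant_min=4.0)}

# Hypothetical expert-maintained list of (species, drug, category) combinations
# considered biologically implausible and therefore excluded.
IMPLAUSIBLE = {("species_1", "drug_x", "R")}


def curate(records: list[dict]) -> list[dict]:
    """Re-derive categories from raw MICs; drop uninterpretable or implausible rows."""
    kept = []
    for rec in records:
        bp = BREAKPOINTS.get((rec["species"], rec["drug"]))
        if bp is None:
            continue  # no breakpoint available, so the label cannot be standardized
        category = interpret_mic(rec["mic"], bp)
        if (rec["species"], rec["drug"], category) in IMPLAUSIBLE:
            continue  # excluded by the expert-review list
        kept.append({**rec, "category": category})
    return kept


records = [
    {"species": "species_1", "drug": "drug_x", "mic": 0.5},
    {"species": "species_1", "drug": "drug_x", "mic": 2.0},
    {"species": "species_1", "drug": "drug_x", "mic": 8.0},  # implausible R, dropped
    {"species": "species_2", "drug": "drug_x", "mic": 0.5},  # no breakpoint, dropped
]
curated = curate(records)
print([r["category"] for r in curated])  # ['S', 'I']
```

Working from raw MICs rather than submitter-supplied labels is the design choice that matters here: it lets every record be interpreted against the same standard.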

We then turned our attention to the genomic determinant database and curated the BugAMR database. BugAMR incorporates high quality genes, mutations, functional knockouts, insertion sequences and other biological changes involved in AMR: it includes more genotypic determinants and species than any publicly available database. To highlight the power of our team’s curation efforts, we applied similar methods to the AMRFinderPlus database. Below, we show how expert curation without any machine learning can dramatically improve phenotypic prediction.

Figure 1. Categorical agreement improvement across BugPheno with curation of AMRFinderPlus database alone.
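
For readers unfamiliar with the metric, categorical agreement is conventionally the fraction of isolates whose predicted S/I/R category matches the reference AST category. The sketch below assumes that conventional definition; the function name and the toy labels are ours, and the numbers are not BugPheno results.

```python
def categorical_agreement(predicted: list[str], reference: list[str]) -> float:
    """Fraction of isolates whose predicted category equals the reference category."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one isolate is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy labels for illustration only.
reference = ["S", "S", "R", "R", "I", "S", "R", "S"]
predicted = ["S", "S", "R", "S", "I", "S", "R", "R"]

ca = categorical_agreement(predicted, reference)
print(f"{ca:.1%}")  # 75.0%
```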

Machine Learning: Capturing the Complexity

While expert curation significantly improves public gene databases, this approach remains constrained by the shortcomings of gene presence-absence described above. This is where machine learning, when properly guided, offers a significant leap forward. While basic models can falter, a sophisticated ML approach can learn the complex, non-linear relationships between genomic features and the final phenotypic outcome. By training on large, high-quality datasets of paired genomes and antibiotic susceptibility testing (AST) results like BugPheno, ML models can:

- Model the cumulative effect of multiple genes and mutations (e.g. stepwise quinolone resistance).
- Identify novel AMR markers that have not yet been characterized.
- Understand the impact of regulatory elements on gene expression and phenotype.
- Recognize how the genetic background can influence the function of a known AMR gene. For example, many blaOXA genes behave differently in Pseudomonas aeruginosa compared to Escherichia coli due to differences in intracellular antibiotic concentrations.²
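
The first point, cumulative effects, is easy to see in a toy model. The sketch below is a deliberately simple additive model on the log2(MIC) scale, not the model BugSeq uses; the feature names, effect sizes, baseline MIC, and breakpoint are all hypothetical values picked so the arithmetic is easy to follow.

```python
# Hypothetical per-feature effects in log2(MIC) units; not fitted to real data.
LOG2_EFFECTS = {"target_mutation_1": 3.0, "target_mutation_2": 3.0}

BASELINE_MIC = 0.03125  # hypothetical wild-type MIC in mg/L (2 ** -5)
RESISTANT_MIN = 1.0     # placeholder resistance breakpoint in mg/L


def predicted_mic(features: set[str]) -> float:
    """Each feature multiplies the baseline MIC by 2 ** effect."""
    log2_shift = sum(LOG2_EFFECTS.get(f, 0.0) for f in features)
    return BASELINE_MIC * 2 ** log2_shift


def is_resistant(features: set[str]) -> bool:
    return predicted_mic(features) >= RESISTANT_MIN


single = {"target_mutation_1"}
double = {"target_mutation_1", "target_mutation_2"}

print(predicted_mic(single), is_resistant(single))  # 0.25 False
print(predicted_mic(double), is_resistant(double))  # 2.0 True
```

Under these assumed numbers, neither mutation alone crosses the breakpoint but the pair does. A presence-absence rule has to call both genotypes the same way, whichever way it is configured.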

However, machine learning is not a magic bullet. Without expert guidance and a strong biological foundation, an ML model is still susceptible to the “garbage in, garbage out” principle and the confounding issues described earlier. One of our favorite examples is from Van Camp, Haslam and Porollo (Figure 2), where the XGBoost model (a popular algorithm in the field!) identified aminoglycoside-modifying enzymes as the most important predictor of resistance to cefepime, cefotaxime and ceftriaxone. Expert curation would quickly identify these predictors as confounders: aminoglycoside-modifying enzymes do not act on cephalosporins, and are likely carried on the same mobile genetic element as the genes that do.

Figure 2. Confounding with machine learning: Table 4 from Van Camp, Haslam, Porollo (2021), “Top 5 most important features for each antibiotic model.”

Putting it all Together: The Power of Curated AMR Databases and Expert-Augmented Machine Learning

To address these limitations, BugSeq integrates expert curation with machine learning, leveraging the BugAMR and BugPheno databases. Our expert-augmented approach ensures that the machine learning models are trained on biologically relevant, validated features, and that machine learning mechanisms can be inspected and curated. This synergy offers several key advantages:

- Improved Interpretability: By anchoring our models in an understanding of how AMR develops (gene acquisition, mutations, functional knockouts, overexpression, etc.), the predictions remain biologically meaningful and explainable. This transparency is crucial for clinicians who need to understand the why behind a prediction.
- Reduction in Confounding: Our curated database acts as a filter, ensuring that only mechanistically relevant features (such as specific gene variants, mutations, or functional knockouts) contribute to model training. This dramatically reduces the impact of irrelevant, strain-specific variations that can confound purely data-driven models.
- Incorporation of Genomic Interactions: Our approach moves beyond simple presence/absence checks to model the true complexity of resistance. By integrating genomic context, known regulatory interactions, and epistatic effects into our feature engineering, the model can learn how different genetic elements work together to produce a resistant phenotype.
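
The "curated database as a filter" idea can be sketched as a per-drug allow-list applied before training. This is an assumption-laden illustration rather than BugSeq's implementation: `filter_features` and the `CURATED` map are hypothetical, and its entries are examples rather than an excerpt of BugAMR.

```python
def filter_features(detected: set[str], drug: str,
                    curated: dict[str, set[str]]) -> set[str]:
    """Keep only detected features curated as mechanistically relevant to the drug."""
    return detected & curated.get(drug, set())


# Hypothetical curated map from drug to relevant determinants (illustrative entries).
CURATED = {
    "ceftriaxone": {"blaCTX-M-15"},
    "gentamicin": {"aac(3)-IIa"},
}

# Features detected in one isolate; "lineage_marker_17" is a made-up
# strain-specific variant with no role in resistance.
detected = {"blaCTX-M-15", "aac(3)-IIa", "lineage_marker_17"}

ceftriaxone_features = filter_features(detected, "ceftriaxone", CURATED)
gentamicin_features = filter_features(detected, "gentamicin", CURATED)

print(ceftriaxone_features)  # {'blaCTX-M-15'}
print(gentamicin_features)   # {'aac(3)-IIa'}
```

With this filter in place, the aminoglycoside-modifying enzyme never reaches the ceftriaxone model, so it cannot be learned as a predictor even when it co-occurs with the beta-lactamase.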

Figure 3. Performance improvement by drug: improvement in categorical agreement across drugs with expert curation, simple ML examining each feature independently, and BugSeq’s ML integrating interactions across features.

Figure 4. Performance improvement by organism: improvement in categorical agreement across organisms with expert curation, simple ML examining each feature independently, and BugSeq’s ML integrating interactions across features.

As demonstrated in Figures 3 and 4, BugSeq’s accuracy has now surpassed commonly used regulatory thresholds for many organisms and drugs.

BugSeq Reports for Clinical, Public Health, and Research

Building AMR prediction with high accuracy is only the first step. For clinical, public health, and research use, reports require additional, crucial information to be truly actionable. We’ve built BugSeq to deliver this depth of insight, tailored to the needs of different users.

Example BugSeq AMR Report
AMR Report Highlights
- gAST stratified for each organism in the sample
- Genotype linked with predicted phenotype
- Diverse mechanisms of resistance, including gene acquisition, mutations and insertion sequences
- Confidence scores to inform the accuracy of prediction
Example BugSeq Plasmid Report
Plasmid Report Highlights
- Localization of AMR genes to specific plasmids
- Plasmid host range prediction, enabling attribution of AMR genes and corresponding phenotype to bacteria within the sample
- Detailed information on plasmids, including mobility and typing information to understand the spread of AMR via plasmids

Case Studies

Case Study 1: BugSeq Accurately Predicts Multidrug-Resistant Shigella Resistance

Leung et al. found perfect correlation between BugSeq’s gAST prediction and phenotype for 64 Shigella flexneri isolates. These isolates were studied in the context of an outbreak of multidrug-resistant Shigella flexneri among people experiencing homelessness. Using BugSeq, they were able to go a step further and correlate genomic features with resistance to each drug:

Figure 6. Table 2 from Leung et al. (2025) demonstrating the power of correlating genotypic determinants of resistance with phenotype.

Case Study 2: Protein Knockouts in cysB and Their Impact on Mecillinam Activity

Stefanovic et al.:

“We demonstrated relatively low mecillinam MICs among MDR Shigella isolates and the absence of known genetic resistance markers, suggesting that pivmecillinam is worth considering for the treatment of infections due to MDR Shigella.”

Another compelling example of the advantages of expert curation is the detection of protein knockouts that influence antibiotic resistance. To our knowledge, none of the major AMR databases incorporate features beyond gene acquisition and mutations, whereas BugAMR also incorporates protein knockouts, insertion sequences, and other features. Stefanovic et al. could therefore use BugSeq to assess for cysB knockouts, the predominant mechanism of mecillinam resistance in E. coli. Similar to Leung et al., they saw perfect correlation across 95 isolates. Furthermore, BugSeq’s curated database gave confidence that genes detected in this study, such as blaOXA-1, blaTEM-1, and blaCTX-M-15, were not involved in mecillinam resistance.

Conclusion: The Future of Phenotypic AMR Prediction is Integrated and Intelligent

As the global crisis of antimicrobial resistance continues to escalate, the need for rapid, accurate, and reliable diagnostic tools has never been more urgent. While whole genome sequencing holds the key, we must move beyond simplistic computational approaches and embrace models that reflect the true complexity of resistance biology. The path forward lies in integrated and intelligent systems that synergize deep, expert-driven biological knowledge with the power of transparent and sophisticated machine learning. At BugSeq, we are committed to pioneering this approach. By building our platform on a foundation of meticulous expert curation, we provide our machine learning models with the high-quality, mechanistically relevant features they need to avoid confounding and learn the complex, quantitative, and interactive nature of resistance. The result is a system that delivers predictions that are not only accurate but also interpretable and actionable.