
BugID: A machine learning enabled tool to implement evidence-based reporting thresholds specific to your NGS assay

Pathogen-agnostic diagnostic assays, driven by metagenomic next-generation sequencing (mNGS), enable diagnosis and surveillance of pathogens unlike any other technology. However, a lack of bioinformatic frameworks and of evidence to guide reporting thresholds has made the technology too complex for most labs to adopt. BugSeq’s latest innovation, BugID, brings intuitive, evidence-based thresholds within reach for your metagenomic assay.

Existing molecular diagnostic devices target specific pathogens determined by their primer design, and can therefore detect only the organisms they were designed to look for. Clinical microbiology laboratories and industry have begun developing and implementing mNGS-based diagnostic assays for certain use cases, including diagnosis of infection from lower respiratory specimens in ICU patients (Charalampous et al. 2024) and from sterile fluid specimens (Benoit et al. 2024; Simner et al. 2018). However, the criteria used to report clinically significant organisms are not standardized across the sites performing these high-complexity tests, and a lack of bioinformatic expertise at smaller institutions has prevented them from building and validating the analysis pipelines and reporting thresholds required to implement mNGS testing as laboratory developed tests (LDTs). These gaps have contributed in part to the slow adoption of this technology.

In a previous blog post, we explored how BugSeq helps overcome the unique challenges of translating NGS to clinical settings, including version control, cybersecurity, quality assurance, and reference database curation (the last of which we have also recently published on; Chorlton 2024). However, one critical piece remains unsolved for routine clinical use of bioinformatic workflows: interpretable reporting that separates the signal from the noise in mNGS data. Multiple guidelines have suggested that robust reporting thresholds are required for effective mNGS interpretation (de Vries et al. 2021; Miller et al. 2019).

Figure: Challenges that must be overcome when developing clinically oriented bioinformatics workflows

Harnessing machine learning to establish robust reporting criteria

While several studies have validated clinical metagenomics workflows and proposed assay-specific reporting criteria (e.g. unique read counts, relative abundance), these rules have limited potential to generalize across the breadth of sequencing platforms and specimen types. Individual assays can vary significantly in their data characteristics (e.g. host abundance, read length, sequence quality, sequencing depth), and these characteristics influence which reporting criteria are appropriate. Furthermore, there has been limited work to integrate reporting thresholds into interpretable reporting schemes that help clinical laboratory staff determine which organisms in an mNGS dataset are clinically significant and avoid human error in test interpretation.

BugSeq was recently funded under a BARDA DRIVe EZ-BAA¹ to solve this problem by developing a tool that uses machine learning to establish robust reporting criteria across a range of specimen types and sequencing platforms commonly used for mNGS in clinical microbiology. Through this program, successfully completed in June 2025, BugSeq developed BugID, a novel tool to implement reporting thresholds for common mNGS applications. BugID can also be customized with user-specific organism detection thresholds and reporting, trained to be specific to any assay and mNGS application.

BugID models are trained using a wide range of sequencing metrics (e.g. read counts and abundance, normalized read counts, host abundance, genome coverage, parent taxon abundance), paired with sample metadata with known ground truth (i.e. patient diagnosis by conventional test result). For the initial BugID models, BugSeq validated the reporting criteria against both an independently withheld portion of each dataset and in silico simulated metagenomes designed to reflect the range of host, pathogen, and commensal flora abundance seen in each specimen type for which a BugID model was trained (cerebrospinal fluid, lower respiratory tract specimens, and upper respiratory tract specimens).
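To illustrate the general idea of learning a reporting threshold from labelled data and checking it on a withheld portion, here is a deliberately simplified sketch. It is not BugID's method: BugID combines many metrics with machine learning, whereas this sketch assumes a single hypothetical metric (e.g. a normalized read count) and a single cut-off chosen to maximize accuracy. The function names, the 30% holdout fraction, and the data layout are all assumptions for illustration.

```python
import random


def accuracy(rows, threshold):
    """Fraction of (metric, is_true_pathogen) rows where 'metric >= threshold' matches ground truth."""
    correct = sum((value >= threshold) == label for value, label in rows)
    return correct / len(rows)


def learn_threshold(rows):
    """Pick the observed metric value that, used as a cut-off, maximizes training accuracy.

    Ties are broken in favour of the lowest candidate cut-off.
    """
    candidates = sorted({value for value, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


def train_and_validate(rows, holdout_fraction=0.3, seed=0):
    """Withhold part of the data, learn a cut-off on the rest, and score it on the withheld part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    holdout, training = shuffled[:n_holdout], shuffled[n_holdout:]
    threshold = learn_threshold(training)
    return threshold, accuracy(holdout, threshold)
```

For example, with hypothetical rows such as `[(0.1, False), (0.2, False), (0.5, False), (5.0, True), (8.0, True), (10.0, True)]`, `learn_threshold` selects 5.0, the lowest cut-off that separates the two classes. Scoring on the withheld rows rather than the training rows is what guards against a threshold that merely memorizes the training data.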

BugID: +23.7% average accuracy increase over a naive baseline, +13.4% over literature-guided decision rules

The clinical specimen validation results showed that BugID outperformed both a naive model (≥1 pathogen read detected) and a set of literature-guided decision rules specific to each specimen type. In every case, BugID model accuracy matched or exceeded that of both the naive model (average accuracy increase = +23.7% [range: 0% to 56.6%]) and the literature-guided decision rules (average accuracy increase = +13.4% [range: 1.1% to 27.0%]), highlighting:

a) the need for establishing reporting thresholds to ensure adequate mNGS assay specificity,

b) the advantages of training reporting thresholds that are specific to the intended use of the NGS assay, and

c) the risks of relying on literature-guided reporting thresholds.
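The comparison against the naive model can be made concrete with a small sketch. It scores the naive rule (report any organism with at least one read) against a hypothetical fixed read-count cut-off on made-up data. Reporting the gain as a difference in percentage points is an assumption here; the figures above do not specify whether they are absolute or relative differences.

```python
def naive_call(pathogen_reads):
    """Naive rule: report the organism if at least one read is detected."""
    return pathogen_reads >= 1


def threshold_call(pathogen_reads, min_reads):
    """Hypothetical thresholded rule: report only at or above min_reads."""
    return pathogen_reads >= min_reads


def accuracy(calls, truth):
    """Fraction of calls that agree with the ground-truth labels."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


def accuracy_gain(read_counts, truth, min_reads):
    """Accuracy of the thresholded rule minus the naive rule, in percentage points."""
    naive = accuracy([naive_call(r) for r in read_counts], truth)
    tuned = accuracy([threshold_call(r, min_reads) for r in read_counts], truth)
    return 100 * (tuned - naive)


# Made-up example: two low-level background detections that are not true infections.
reads = [0, 2, 3, 40, 150]
truth = [False, False, False, True, True]
gain = accuracy_gain(reads, truth, min_reads=10)  # ≈ 40 percentage points
```

In this toy data the naive rule calls the two low-read background detections positive (3 of 5 correct), while the cut-off rejects them (5 of 5 correct). This is the kind of specificity loss point (a) refers to.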

In silico validation using contrived specimens with known ground truth revealed >98.4% accuracy across all BugID models (n = 56,000 unique measurements).

Translating reporting thresholds to interpretable reports

Through engagement with stakeholders and regulatory bodies, the BugSeq team designed a regulatory-oriented, qualitative reporting framework that summarizes the detection of clinically significant organisms by BugID models (akin to a syndromic molecular panel). The framework also includes qualitative QC criteria that can be customized on a lab-by-lab basis to suit a range of NGS assays and applications, including the following:

  • Detection of internal process controls
  • Presence of a negative control (no-template control, NTC) submitted for analysis
  • Sample read count threshold (i.e. whether a minimum read count was achieved)
  • Report status (i.e. whether the sample passed all of the above QC metrics)
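The QC criteria listed above combine into a single report status. A minimal sketch of that logic follows, assuming that the report passes only when every individual check passes. The field names and the configurable `min_reads` parameter are hypothetical and do not reflect BugSeq's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class QCInputs:
    """Hypothetical per-sample QC inputs; names are illustrative only."""
    internal_control_detected: bool  # internal process control found in the data
    ntc_submitted: bool              # a negative (no-template) control was submitted
    total_reads: int                 # reads obtained for this sample
    min_reads: int                   # lab-configurable minimum read count


def qc_report(inputs: QCInputs) -> dict:
    """Evaluate each QC check, then derive the overall report status from all of them."""
    checks = {
        "internal_process_control": inputs.internal_control_detected,
        "negative_control_submitted": inputs.ntc_submitted,
        "read_count": inputs.total_reads >= inputs.min_reads,
    }
    # The right-hand side is evaluated before the new key is added,
    # so the status reflects only the three checks above.
    checks["report_status"] = "PASS" if all(checks.values()) else "FAIL"
    return checks
```

Because each lab can tune inputs such as the minimum read count, the same pass/fail logic can serve assays with very different sequencing depths.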

Recently, BugSeq has also integrated the BugID framework into our Per-Sample Reports (RUO). A new column in the taxonomic classification table flags clinically significant organisms and surfaces them near the top of the report, enabling BugSeq users to more accurately determine which organisms in a given sample are likely to be clinically significant. In this framework, BugID thresholds² are applied to each organism detected in the sample: organisms above threshold receive a higher significance score, whereas organisms below threshold have their significance score downgraded. Through stakeholder engagement, this framework was designed to apply to both sterile (e.g. cerebrospinal fluid, bronchoalveolar lavage) and non-sterile (e.g. nasopharyngeal swabs) specimen types.

Organism classification      | Threshold | Significance score
-----------------------------|-----------|---------------------------------------------
Clinically significant       | Above     | Probable Pathogen (above threshold)
Clinically significant       | Below     | Probable Pathogen (below threshold)
Opportunistic or undefined   | Above     | Possible Pathogen (above threshold)
Opportunistic or undefined   | Below     | Possible Pathogen (below threshold)
Contaminant or normal flora  | Above     | Contaminant or normal flora (above threshold)
Contaminant or normal flora  | Below     | Contaminant or normal flora (below threshold)
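The table above amounts to a lookup from an organism's classification and its position relative to the threshold. Here is a sketch of that mapping, assuming abundance is compared to a threshold expressed as a fraction and that reaching the threshold exactly counts as "above". Both are assumptions, as is the function name.

```python
# Base significance label for each organism classification, as given in the table.
SCORES = {
    "clinically significant": "Probable Pathogen",
    "opportunistic or undefined": "Possible Pathogen",
    "contaminant or normal flora": "Contaminant or normal flora",
}


def significance_score(classification: str, abundance: float, threshold: float) -> str:
    """Combine the classification's base label with its position relative to the threshold.

    Assumes abundance == threshold counts as above threshold (hypothetical choice).
    """
    base = SCORES[classification.lower()]
    position = "above threshold" if abundance >= threshold else "below threshold"
    return f"{base} ({position})"
```

For example, a clinically significant organism at 5% abundance against a 1% threshold yields "Probable Pathogen (above threshold)". The same organism at 0.5% keeps its classification but is reported as below threshold.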

Integrating BugID is such a major advancement for the BugSeq platform that we have released a new version today: version 6.0.

Custom thresholds and reporting?

After we implemented BugID within the BugSeq platform, a common question we were asked was: “Can this be customized to my specific assay?” The answer is yes: BugSeq can train and enable thresholds specific to your workflow, so that your lab can have confidence in the accuracy of your NGS assay.

How does it work?

  1. Your lab assembles a training dataset containing NGS data with known ground truth
  2. You sequence your specimens and submit the data to BugSeq for analysis
  3. The BugSeq team works to aggregate the necessary sequencing metrics and harnesses machine learning and expert curation to train reporting thresholds specific to your input data
  4. Custom thresholds and reporting are then implemented specifically for your BugSeq billing account
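Steps 1 and 3 come down to pairing each detected organism's sequencing metrics with the ground truth for its sample. The sketch below shows one way to produce labelled training rows. The input layout (metrics keyed by sample and organism, ground truth as the set of organisms confirmed by conventional testing) is a hypothetical assumption and is not BugSeq's submission format.

```python
from typing import Dict, List, Set, Tuple


def build_training_rows(
    metrics: Dict[Tuple[str, str], Dict[str, float]],
    ground_truth: Dict[str, Set[str]],
) -> List[dict]:
    """Join per-organism sequencing metrics with conventional-test results.

    metrics: (sample_id, organism) -> {metric name: value}  (hypothetical layout)
    ground_truth: sample_id -> organisms confirmed by the conventional test
    Samples without recorded ground truth are skipped.
    """
    rows = []
    for key in sorted(metrics):
        sample_id, organism = key
        if sample_id not in ground_truth:
            continue
        row = {"sample_id": sample_id, "organism": organism, **metrics[key]}
        # Positive label only if the conventional test confirmed this organism.
        row["label"] = organism in ground_truth[sample_id]
        rows.append(row)
    return rows
```

Skipping samples without ground truth, instead of treating them as negatives, keeps unlabelled data from silently biasing the learned thresholds toward under-reporting.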

Already validated your thresholds?

You can enable custom thresholds for your NGS assay by configuring them in the “Labs” settings of your BugSeq account. Multiple thresholds can be configured for individual NGS assays by creating a new “Lab” and customizing the reporting thresholds.
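Since each “Lab” can carry its own thresholds, a report needs to decide which threshold applies to a given organism. This sketch assumes a precedence order that the post does not state: a lab's custom threshold first, then a built-in BugID model threshold for the specimen type if one exists, then the 1% abundance default from footnote 2. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, Optional

DEFAULT_ABUNDANCE_THRESHOLD = 0.01  # the 1% default abundance threshold (footnote 2)


def resolve_threshold(
    lab_config: Dict[str, Dict[str, float]],
    lab_name: str,
    organism: str,
    builtin_threshold: Optional[float] = None,
) -> float:
    """Return the abundance threshold to apply (assumed precedence order).

    lab_config: lab name -> {organism: custom threshold}  (hypothetical layout)
    builtin_threshold: threshold from a trained BugID model for this specimen type, if any
    """
    custom = lab_config.get(lab_name, {}).get(organism)
    if custom is not None:
        return custom
    if builtin_threshold is not None:
        return builtin_threshold
    return DEFAULT_ABUNDANCE_THRESHOLD
```

Keying thresholds by lab name mirrors the idea of creating a separate “Lab” per assay, so two assays in the same account never share a threshold by accident.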

Please reach out to us at contact@bugseq.com to find out more details and to discuss pricing for training customized BugID thresholds and reports.

Conclusion

Clinical interpretation of metagenomic sequence data is a critical translational barrier to the widespread adoption of mNGS for pathogen detection in clinical practice. BugID, a tool developed by BugSeq, addresses this barrier by establishing evidence-based reporting criteria and report structures that enable laboratories and industry partners to more easily add NGS-based diagnostics to their testing menus. BugID can be customized to generate assay-specific reporting criteria for your laboratory's needs, overcoming the limitations of reporting thresholds cited in the literature, which may not be optimized for your specific laboratory test.



  1. This project has been supported in whole or in part with federal funds from the Department of Health and Human Services; Administration for Strategic Preparedness and Response; Biomedical Advanced Research and Development Authority (BARDA), under contract number 75A50124C00024. 

  2. For less common sample types without a trained BugID model, or if custom thresholds are not configured by the user, a 1% abundance threshold is applied by default.