Outbreak Analysis


Results for multilocus sequence typing (MLST) of each organism in each sample are available in the per-sample reports. MLST schemes are mirrored from PubMLST.

Plasmid Typing

Outbreaks of antimicrobial resistance (e.g. carbapenemases) are frequently mediated by plasmids and can span multiple bacterial species. BugSeq performs plasmid detection and typing on each sample. Details on the detected plasmids are available in the per-sample and summary reports.

BugSeq uses MOB-Cluster IDs for plasmid identification and naming. MOB-Clusters are analogous to taxonomic identifiers (e.g. species) for plasmids and are stable over time. Plasmids that are not in our plasmid database are treated as novel and given the identifier novel_{MD5 hash}. Further detail on MOB-Clusters is available in the respective publication.

Fine-Grained Outbreak Analysis

Outbreak analysis is an incubating feature

Validation of results is suggested.

Outbreak analysis is not currently available for nanopore R9.4.1 or R10.3 sequencing platforms.

BugSeq identifies all bacterial genomes in the submitted sequencing data and performs refMLST, a publication-pending method to calculate allele distances between samples without a core genome multilocus sequence typing (cgMLST) scheme. Preliminary validation demonstrates that refMLST generates reliable allele count distances that are highly concordant with traditional cgMLST/SNP approaches, yet can be applied to any bacterial species. Further validation and preparation for publication are ongoing.

BugSeq first collects all genomes in a submission (i.e. across samples) and divides them into bins for each species. If your lab is on a BugSeq subscription, each species from each sample will be saved across analyses and included in the most recent and any future analyses.
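As a rough illustration of the binning step described above, the sketch below groups genomes from several samples by species. The record layout and the names `Genome` and `bin_by_species` are hypothetical and are not part of BugSeq; the example only shows the idea of one bin per species across samples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Genome(NamedTuple):
    """Hypothetical record for one genome detected in one sample."""
    sample: str
    species: str


def bin_by_species(genomes: Iterable[Genome]) -> Dict[str, List[str]]:
    """Group sample names by species across all samples in a submission."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for genome in genomes:
        bins[genome.species].append(genome.sample)
    return dict(bins)


# Illustrative submission: Sample 2 contains two species.
submission = [
    Genome("Sample 1", "Salmonella enterica"),
    Genome("Sample 2", "Salmonella enterica"),
    Genome("Sample 2", "Escherichia coli"),
    Genome("Sample 3", "Salmonella enterica"),
]
species_bins = bin_by_species(submission)
```

Allele distances would then only be compared between genomes that fall in the same bin.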

Can BugSeq include our background genomic sequences for epidemiologic comparison?

Yes. For labs on a subscription, please get in touch with us to have your custom genomes added to the background database.

The BugSeq Outbreak Investigation module outputs:

Distance Matrix of Inter-Sample Allele Distances

For example, if there were Salmonella enterica in the input samples, a file named Salmonella_enterica-Distance_Matrix.xlsx would be produced.

                 GCF_000006945.2  Sample 1  Sample 2  Sample 3
GCF_000006945.2  0                3405      3652      3503
Sample 1         3405             0         4         12
Sample 2         3652             4         0         25
Sample 3         3503             12        25        0

In this example, Sample 1 is 3405 alleles different from the reference profile, 4 alleles different from Sample 2, and 12 alleles different from Sample 3. If a subsequent BugSeq analysis included Sample 4, this sample would be added to the table in the output of that analysis.
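To make the matrix easier to work with, here is a minimal Python sketch that lists every pair of genomes within a chosen allele distance. The matrix values are copied from the example above and hard-coded; in practice the spreadsheet could be loaded with a library such as pandas. The function name `pairs_within` and the cutoff of 20 alleles are illustrative assumptions, not BugSeq recommendations.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Values copied from the example distance matrix.
NAMES = ["GCF_000006945.2", "Sample 1", "Sample 2", "Sample 3"]
MATRIX = [
    [0, 3405, 3652, 3503],
    [3405, 0, 4, 12],
    [3652, 4, 0, 25],
    [3503, 12, 25, 0],
]
DISTANCES: Dict[str, Dict[str, int]] = {
    name: dict(zip(NAMES, row)) for name, row in zip(NAMES, MATRIX)
}


def pairs_within(
    dist: Dict[str, Dict[str, int]], max_alleles: int
) -> List[Tuple[int, str, str]]:
    """Return (distance, genome_a, genome_b) for pairs at or below max_alleles."""
    pairs = [
        (dist[a][b], a, b)
        for a, b in combinations(dist, 2)  # each unordered pair once
        if dist[a][b] <= max_alleles
    ]
    return sorted(pairs)  # closest pairs first


# Hypothetical cutoff chosen only for illustration.
close_pairs = pairs_within(DISTANCES, 20)
```

With the example values, only Sample 1 / Sample 2 (4 alleles) and Sample 1 / Sample 3 (12 alleles) fall within 20 alleles; Sample 2 and Sample 3 are 25 alleles apart.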

Cluster Addresses

Similar to SnapperDB and chewieSnake, cluster addresses are generated for each genome within each species. Cluster addresses follow a nomenclature reflecting allele differences between isolates and contain seven digits; each digit corresponds to a different allelic distance threshold between clusters. If two isolates are identical or within 5 allele differences of each other, they will share the same cluster address, i.e. all seven digits. If two isolates are between 6 and 10 alleles of each other, they will share the first six digits, and the seventh digit will differ.

For example, cluster addresses for the above three samples and reference genome, located in the Salmonella_enterica-Cluster_Addresses.xlsx file, should look like:

                 Date         Cluster Address  Cluster - 1000 alleles  Cluster - 200 alleles  Cluster - 100 alleles  Cluster - 50 alleles  Cluster - 20 alleles  Cluster - 10 alleles  Cluster - 5 alleles
GCF_000006945.2  Feb 3, 2022                   1                       1                      1                      1                     1                     1                     1
Sample 1         Feb 3, 2022                   2                       1                      1                      1                     1                     1                     1
Sample 2         Feb 3, 2022                   2                       1                      1                      1                     1                     1                     1
Sample 3         Feb 3, 2022                   2                       1                      1                      1                     1                     2                     1

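In this scenario, Sample 3 was 12 alleles different from its nearest isolate (Sample 1), so it shares a cluster with Samples 1 and 2 at the 20-allele threshold but was placed in a separate cluster at the 10-allele threshold. From the cluster addresses alone, we can say that these two clusters are within 11-20 alleles of each other.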
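The following Python sketch shows one way such addresses could be derived from a distance matrix. It assumes single-linkage clustering at each threshold and numbers each child cluster within its parent cluster in order of first appearance. Both choices are assumptions made so that the sketch reproduces the digits in the example table; they may not match BugSeq's actual implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Thresholds taken from the column headers of the example table.
THRESHOLDS = [1000, 200, 100, 50, 20, 10, 5]

NAMES = ["GCF_000006945.2", "Sample 1", "Sample 2", "Sample 3"]
MATRIX = [
    [0, 3405, 3652, 3503],
    [3405, 0, 4, 12],
    [3652, 4, 0, 25],
    [3503, 12, 25, 0],
]
DIST = {name: dict(zip(NAMES, row)) for name, row in zip(NAMES, MATRIX)}


def single_linkage(names: List[str], dist, threshold: int) -> Dict[str, str]:
    """Map each genome to a cluster representative (union-find, assumed method)."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] <= threshold:  # thresholds treated as inclusive
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[root_b] = root_a
    return {n: find(n) for n in names}


def cluster_addresses(names: List[str], dist) -> Dict[str, Tuple[int, ...]]:
    """Build one digit per threshold, numbering clusters within their parent."""
    addresses: Dict[str, List[int]] = {n: [] for n in names}
    for threshold in THRESHOLDS:
        roots = single_linkage(names, dist, threshold)
        next_digit: Dict[tuple, int] = {}  # parent prefix -> last digit used
        assigned: Dict[tuple, int] = {}    # (parent prefix, root) -> digit
        for n in names:
            prefix = tuple(addresses[n])
            key = (prefix, roots[n])
            if key not in assigned:
                next_digit[prefix] = next_digit.get(prefix, 0) + 1
                assigned[key] = next_digit[prefix]
            addresses[n].append(assigned[key])
    return {n: tuple(digits) for n, digits in addresses.items()}


addresses = cluster_addresses(NAMES, DIST)
```

Under these assumptions, Sample 1 and Sample 2 receive identical digits, while Sample 3 differs from them only at the 10-allele position, matching the table above.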

Last update: March 1, 2022