Per-sample reports

Per-sample reports are generated for each sample submitted as part of a BugSeq analysis. These reports provide a more granular look at the composition of each sample. At the top of each per-sample report, you will find key information including the analysis name, analysis ID, pipeline version, sample type, and the reference database used to generate the results. For samples with complete bacterial or haploid fungal genomes, you will also see a plain-text summary at the top of the report outlining the cluster address and the QC status of the assembled genome (see the Labs for more details on how to configure QC thresholds for assembled genomes).

General stats table

As in the Interactive Summary, you will also find a General Statistics table in each Per-Sample report. Depending on whether the submitted sample was a WGS isolate, 16S sample or a metagenomic sample, the General Statistics table will contain different information.

For metagenomic samples, you will see the complete taxonomic composition of the sample. The columns of these reports can be configured to add or remove additional metrics (see below for additional details).

Details on each column may be found by hovering the mouse over the column name and are summarized below for important columns:

Pathogenicity Prediction: For each sample type, BugSeq maintains a comprehensive database containing pathogens associated with infection based on the sample type selected when the analysis was submitted. By default, organisms that are “Very Likely” or “Likely” pathogens for a given body site are flagged to the top of the table.

Interpreting pathogen detection results

BugSeq’s pathogenicity prediction doesn’t contain all possible pathogens for a given specimen type. Clinical adjudication is necessary to review the complete list of detected organisms in a given sample. Individual laboratories should validate their own thresholds for sequence data characteristics necessary to call a pathogen “Detected”.

Read Count: The number of reads assigned to a particular taxon.
Abundance: The number of reads assigned to a particular taxon divided by the total number of reads in a given sample.
Unique Read Alignments: The number of de-duplicated reads that align to the reference genome for a particular taxon. The reference genome is selected based on RefSeq’s designation of a reference for the taxon. Unique Read Alignments is calculated only for species-ranked taxa, as no reference genomes are designated for taxa ranked at genus or above.
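As a minimal sketch of how Read Count relates to Abundance, the snippet below divides each taxon's read count by the total reads in the sample. The function name, the taxon counts, and the "unclassified" entry (used so the denominator covers all reads in the sample) are assumptions made for illustration, not BugSeq's implementation.

```python
def relative_abundance(read_counts):
    """Return each taxon's share of all reads in the sample.

    read_counts maps a taxon name to the number of reads assigned to it.
    An "unclassified" entry is assumed here so the denominator is the
    total number of reads in the sample.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in read_counts}
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical counts, for illustration only.
counts = {
    "Escherichia coli": 1500,
    "Klebsiella pneumoniae": 500,
    "unclassified": 8000,
}
abundance = relative_abundance(counts)
```

Here Escherichia coli accounts for 1,500 of 10,000 reads, so its abundance is 0.15 (15%).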

Accounting for reference bias

Although read alignment is performed against the reference genome, BugSeq uses amino acid alignment to overcome reference bias.

Negative Control Multiplicity (Summed): Reads per million of the row’s taxon and its children in the sample, divided by reads per million of the row’s taxon and its children in the negative control. This calculation accounts for the varying number of reads in the sample and negative control, as well as classification uncertainty and lower read counts at deeper taxonomic ranks. An additional hidden column contains the negative control multiplicity without summing all taxa; see the “Reformatting the General Statistics table” tip below. This column is populated only when negative controls are specified as a metadata field during data submission; see the Metadata Docs for details.

Interpreting negative control multiplicity

A negative control multiplicity greater than one indicates the taxon was found more abundantly in the sample compared with the negative control. Conversely, a negative control multiplicity less than one indicates the taxon was found more abundantly in the negative control compared with the sample.

Examples

Negative control multiplicity (summed) of 20

Enterovirus (genus) is detected at 20% relative abundance in the sample. No classifications are made to the species rank in the sample. Enterovirus A is detected at 0.7% and Enterovirus B at 0.3% relative abundance in the negative control.

\[ \text{Negative Control Multiplicity} \, (\text{Enterovirus}) = \frac{20}{0.7 + 0.3} = 20 \]

Negative control multiplicity (summed) of 0.5

Escherichia coli was detected in the sample at 1% relative abundance and in the negative control at 2% relative abundance.

\[ \text{Negative Control Multiplicity} \, (\text{E. coli}) = \frac{1}{2} = 0.5 \]

Others in the literature (Simner et al., Miller et al.) have suggested requiring a negative control multiplicity of 10 or greater before reporting a pathogen.
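The summed calculation can be sketched as follows. This is an illustrative reconstruction from the column description, not BugSeq's code: the function names, the dictionary-based taxonomy, the read totals, and the choice to return None when the taxon is absent from the negative control are all assumptions.

```python
def reads_per_million(count, total_reads):
    """Normalize a read count by sequencing depth."""
    return count / total_reads * 1_000_000


def summed_count(taxon, counts, children):
    """Reads assigned to a taxon plus all of its descendants."""
    total = counts.get(taxon, 0)
    for child in children.get(taxon, []):
        total += summed_count(child, counts, children)
    return total


def negative_control_multiplicity(taxon, sample, control, children,
                                  sample_total, control_total):
    """Summed RPM in the sample divided by summed RPM in the control."""
    sample_rpm = reads_per_million(
        summed_count(taxon, sample, children), sample_total)
    control_rpm = reads_per_million(
        summed_count(taxon, control, children), control_total)
    if control_rpm == 0:
        # Undefined ratio; how the report displays this case is not
        # covered here, so None is an assumption of this sketch.
        return None
    return sample_rpm / control_rpm


# Hypothetical taxonomy and read counts mirroring the two examples.
children = {"Enterovirus": ["Enterovirus A", "Enterovirus B"]}
sample = {"Enterovirus": 400_000, "Escherichia coli": 20_000}
control = {"Enterovirus A": 3_500, "Enterovirus B": 1_500,
           "Escherichia coli": 10_000}

enterovirus_ncm = negative_control_multiplicity(
    "Enterovirus", sample, control, children,
    sample_total=2_000_000, control_total=500_000)
ecoli_ncm = negative_control_multiplicity(
    "Escherichia coli", sample, control, children,
    sample_total=2_000_000, control_total=500_000)
```

With these counts, Enterovirus is at 20% in the sample and 1% (summed over its children) in the control, giving a multiplicity of 20; Escherichia coli is at 1% versus 2%, giving 0.5. Normalizing to reads per million is what lets a 2,000,000-read sample be compared with a 500,000-read control.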

Internal Control Multiplicity: The number of reads assigned to a particular taxon divided by the number of reads assigned to the internal control associated with that sample (normalized reads relative to the internal control). BugSeq maintains a database of common internal process controls that are automatically detected. Please contact support if an internal control was expected but not detected in your samples.
Assembly Length/N50: These columns are populated when BugSeq was able to generate an assembly for a given taxon. Assembly length and quality may be used when interpreting the likelihood of detection for a given organism and should be interpreted in the context of the sample preparation workflow on a laboratory-by-laboratory basis.
Pathogenicity Prediction: Depending on the submitted sample type, each organism is assigned a pathogenicity prediction. The prediction is based on a curated list of known pathogens for each body site, together with default or custom thresholds that may raise or lower the pathogenicity score. See our recent blog post for more information on how BugSeq has developed this framework.
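The Internal Control Multiplicity described above is a simple ratio, sketched here under stated assumptions: the function name and the read counts are hypothetical, and returning None when no internal control reads are found is a choice made for this example rather than documented behavior.

```python
def internal_control_multiplicity(taxon_reads, internal_control_reads):
    """Reads assigned to a taxon relative to internal control reads."""
    if internal_control_reads == 0:
        # No internal control detected; the ratio is undefined.
        return None
    return taxon_reads / internal_control_reads


# Hypothetical counts: 300 reads for the taxon, 150 for the control.
icm = internal_control_multiplicity(300, 150)
```

In this example the taxon has twice as many reads as the internal control, so the multiplicity is 2.0.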

Reformatting the General Statistics table

Certain columns are displayed in each per-sample General Statistics table by default; however, selecting “Configure Columns” at the top of the table enables additional columns to be displayed or hidden depending on the intended use. Users can hold Shift while clicking column titles to sort the General Statistics table based on two separate columns. A CSV file of the General Statistics table can be generated by clicking the button on the top right of the table.

How can I investigate individual classifications?

We understand users want the flexibility to investigate metagenomic classification results, especially for low-abundance detections, which can in rare cases be due to false positive classification. For taxa with an assembly, you can BLAST the assembly directly by locating its FASTA file under the “Assembly” dropdown. For taxa where no reads were assembled, you can use our “Filter FASTQ Files by Taxonomy ID” tool: upload the metagenomic classification CSV file and the raw FASTQ file, then download a new FASTA or FASTQ file containing only the reads of interest.
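For readers who want to see what this kind of filtering involves, the sketch below selects FASTQ records by taxonomy ID locally. It is not the BugSeq tool, and it assumes a per-read classification CSV with columns named read_id and tax_id; both column names and both function names are hypothetical, so adjust them to match the file you actually download.

```python
import csv
import io


def read_ids_for_taxon(csv_handle, tax_id,
                       id_column="read_id", tax_column="tax_id"):
    """Collect IDs of reads classified to tax_id (column names assumed)."""
    reader = csv.DictReader(csv_handle)
    return {row[id_column] for row in reader
            if row[tax_column] == str(tax_id)}


def filter_fastq(fastq_handle, wanted_ids):
    """Yield four-line FASTQ records whose read ID is in wanted_ids."""
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        # The read ID is the first token after the leading "@".
        read_id = header[1:].split()[0]
        if read_id in wanted_ids:
            yield header + seq + plus + qual


# Tiny in-memory example; real use would open files on disk instead.
csv_text = "read_id,tax_id\nr1,562\nr2,573\n"
fastq_text = "@r1 desc\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n"

ids = read_ids_for_taxon(io.StringIO(csv_text), 562)
kept = list(filter_fastq(io.StringIO(fastq_text), ids))
```

The resulting records can be written to a new FASTQ file and submitted to BLAST after conversion to FASTA.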

Antimicrobial resistance & plasmid analysis

In the Per-Sample Reports, you will find a more detailed AMR and Plasmid Analysis. As in the Interactive Summary Report, these include phenotype prediction. In addition, the Plasmid Table indicates whether each plasmid could be circularized, and the AMR table reports the confidence of each result. See the AMR section for more details on how BugSeq assigns a confidence level to a given result.

Pathogen analyses & quality control

As with the Interactive Summary report, you will see detailed pathogen-specific analyses, MLST, and quality control statistics for each individual sample in the Per-Sample Reports. See the Interactive Summary section for more details on Pathogen Analyses and Quality Control.