Metagenomic classification
Interactive (Krona) report
Clicking the “Interactive” button under “Metagenomic Classification” produces an interactive Krona plot, which enables visualization of the taxonomic composition for all samples in your analysis. Single-clicking a taxon displays the number of reads assigned to that taxon in the top right corner of the plot. Double-clicking different rings of the Krona plot enables you to stratify the classification results based on different taxonomic ranks. You can navigate between different samples in your analysis in the navigation pane on the left side of the report.
Text format (.TSV): One report per analysis
Under the folder Metagenomic Classification Summary, there’s a file named metagenomic_classification-RUN_ID.tsv. This file contains the raw data visualized in the Krona report for all samples submitted within a given analysis. For example, opening this file in Excel will reveal:
Taxon Rank | Taxon NCBI ID | Taxon Name | Sample 1 - Read count at this taxon and below | Sample 1 - Read count directly assigned to this taxon |
---|---|---|---|---|
U | 0 | unclassified | 28601 | 28601 |
R | 1 | root | 67461 | 0 |
R1 | 131567 | cellular organisms | 67458 | 2 |
D | 2 | Bacteria | 67449 | 25 |
P | 1224 | Proteobacteria | 67248 | 14 |
C | 1236 | Gammaproteobacteria | 67229 | 4 |
O | 91347 | Enterobacterales | 67214 | 131 |
F | 543 | Enterobacteriaceae | 59408 | 642 |
F1 | 2890311 | Klebsiella/Raoultella group | 57411 | 34 |
G | 570 | Klebsiella | 57351 | 864 |
S | 573 | Klebsiella pneumoniae | 52521 | 49766 |
The first three columns are row labels that identify each taxonomic node. BugSeq follows the NCBI taxonomy. Taxon rank codes denote (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. A code followed by a number marks an intermediate rank: for example, F1 is one level below family and R1 is one level below root.
Each sample is then included as two columns:
- Read count at this taxon and below: The total number of reads assigned to this taxon or to any taxon beneath it in the tree. In this example, 67449 reads were assigned to the domain Bacteria or to a taxon below Bacteria.
- Read count directly assigned to this taxon: The number of reads assigned to this taxon itself. These reads couldn’t be assigned to a lower taxonomic node because of mapping ambiguity and/or the nature of the taxonomic tree (e.g. if the reads are assigned to the lowest rank in the tree). In this example, 25 reads were identified as bacterial in origin, but couldn’t be assigned to lower nodes such as Klebsiella pneumoniae.
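The relationship between the two per-sample columns can be worked through in a few lines of Python. This is an illustrative sketch, not part of BugSeq: it assumes the file is tab-delimited with the column headers shown in the table above, it embeds a few rows of that table as sample input instead of reading a real file, and the helper name `load_counts` is hypothetical. It also assumes every read is either unclassified or counted somewhere under root.

```python
import csv
import io

# Rows copied from the example table above. For a real report, pass
# open("metagenomic_classification-RUN_ID.tsv", newline="") instead.
SAMPLE_TSV = (
    "Taxon Rank\tTaxon NCBI ID\tTaxon Name\t"
    "Sample 1 - Read count at this taxon and below\t"
    "Sample 1 - Read count directly assigned to this taxon\n"
    "U\t0\tunclassified\t28601\t28601\n"
    "R\t1\troot\t67461\t0\n"
    "D\t2\tBacteria\t67449\t25\n"
    "S\t573\tKlebsiella pneumoniae\t52521\t49766\n"
)

# Assumed header suffixes, taken from the example table.
CLADE_SUFFIX = " - Read count at this taxon and below"
DIRECT_SUFFIX = " - Read count directly assigned to this taxon"


def load_counts(handle, sample):
    """Return {ncbi_tax_id: (clade_reads, direct_reads)} for one sample."""
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        clade = int(row[sample + CLADE_SUFFIX])
        direct = int(row[sample + DIRECT_SUFFIX])
        counts[int(row["Taxon NCBI ID"])] = (clade, direct)
    return counts


counts = load_counts(io.StringIO(SAMPLE_TSV), "Sample 1")

# Assumption: total reads = unclassified (ID 0) + everything under root (ID 1).
total_reads = counts[0][0] + counts[1][0]

# Share of all reads at or below Klebsiella pneumoniae (ID 573).
kp_fraction = counts[573][0] / total_reads

# Bacterial reads (ID 2) that were resolved to some taxon below Bacteria:
# the clade total minus the reads left at Bacteria itself.
bacteria_clade, bacteria_direct = counts[2]
resolved_below_bacteria = bacteria_clade - bacteria_direct

print(f"total reads: {total_reads}")
print(f"K. pneumoniae share: {kp_fraction:.1%}")
print(f"resolved below Bacteria: {resolved_below_bacteria}")
```

Keying the dictionary by NCBI ID rather than by name avoids collisions if two nodes happen to share a name.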
Text format (.CSV): One report per sample
BugSeq outputs a CSV file for each sample in an analysis. The file summarizes, read by read, how each read was classified (organism name and NCBI Taxonomy ID), whether the classification came from the read-based classifier or from the de novo assembly, the read length, the read coverage, and the percent identity (for non-assembled reads). For example, opening this file in Excel will reveal:
read_id | read_length | tax_id | percent_identity | source | tax_name | tax_rank | read_coverage |
---|---|---|---|---|---|---|---|
A00111:702:H3THHHDO9:3:1107:2419:8343 | 150 | 573 | | assembly | Klebsiella pneumoniae | species | 100 |
A00111:671:H3THHHDO9:4:1631:19262:5228 | 150 | 573 | | assembly | Klebsiella pneumoniae | species | 100 |
A00111:671:H3THHHDO9:4:2620:23818:26334 | 150 | 29466 | | assembly | Veillonella parvula | species | 100 |
A00111:702:H3THHHDO9:3:2307:29586:18161 | 150 | 29466 | | assembly | Veillonella parvula | species | 100 |
A00111:671:H3THHHDO9:4:1513:8865:14246 | 150 | 29466 | | assembly | Veillonella parvula | species | 100 |
A00111:702:H3THHHDO9:3:2313:14073:32847 | 150 | 573 | | assembly | Klebsiella pneumoniae | species | 100 |
A00111:671:H3THHHDO9:4:2534:30201:26584 | 150 | 543 | | assembly | Enterobacteriaceae | family | 100 |
A00111:702:H3THHHDO9:3:1561:24822:17738 | 150 | 543 | | assembly | Enterobacteriaceae | family | 100 |
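
A per-read file like this is often easier to use once it is tallied. The following is a minimal sketch under stated assumptions, not a BugSeq utility: it assumes a comma-delimited file whose header matches the column names shown above, with `percent_identity` left empty for reads classified through the assembly. The function name `summarize_reads` is hypothetical, and a few rows from the example are embedded in place of a real file.

```python
import csv
import io
from collections import Counter

# A few rows from the example above, written as CSV. The empty field after
# tax_id is percent_identity, assumed blank for assembly-classified reads.
SAMPLE_CSV = """\
read_id,read_length,tax_id,percent_identity,source,tax_name,tax_rank,read_coverage
A00111:702:H3THHHDO9:3:1107:2419:8343,150,573,,assembly,Klebsiella pneumoniae,species,100
A00111:671:H3THHHDO9:4:1631:19262:5228,150,573,,assembly,Klebsiella pneumoniae,species,100
A00111:671:H3THHHDO9:4:2620:23818:26334,150,29466,,assembly,Veillonella parvula,species,100
A00111:671:H3THHHDO9:4:2534:30201:26584,150,543,,assembly,Enterobacteriaceae,family,100
"""


def summarize_reads(handle):
    """Count reads per (tax_id, tax_name, tax_rank) and per source."""
    reads_per_taxon = Counter()
    reads_per_source = Counter()
    for row in csv.DictReader(handle):
        taxon = (int(row["tax_id"]), row["tax_name"], row["tax_rank"])
        reads_per_taxon[taxon] += 1
        reads_per_source[row["source"]] += 1
    return reads_per_taxon, reads_per_source


reads_per_taxon, reads_per_source = summarize_reads(io.StringIO(SAMPLE_CSV))

# List taxa from most to fewest reads.
for (tax_id, name, rank), n_reads in reads_per_taxon.most_common():
    print(f"{name} ({rank}, NCBI ID {tax_id}): {n_reads} reads")
```

For a real sample, replace `io.StringIO(SAMPLE_CSV)` with a file opened using `newline=""`, as the `csv` module recommends.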
Kraken-formatted reports
BugSeq outputs a Kraken-formatted report for each sample, summarizing how many reads were classified at each taxonomic node. These reports have a structure similar to the text-formatted (.TSV) files described above, and they can be passed to downstream analytical tools that accept Kraken reports.
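As a starting point for downstream integration, a Kraken-style report can be parsed with the standard library alone. This sketch assumes the standard six-column, tab-delimited Kraken report layout (percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI ID, and a name indented two spaces per tree level); whether BugSeq's files match this layout exactly should be confirmed against a real report. The sample lines are illustrative: the counts are taken from the .TSV example above, the percentages are computed from those counts, and the names `KrakenRow` and `parse_kraken_report` are hypothetical.

```python
from typing import NamedTuple


class KrakenRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads at this taxon and below
    direct_reads: int   # reads assigned directly to this taxon
    rank_code: str      # e.g. U, R, D, P, C, O, F, G, S (optionally numbered)
    tax_id: int         # NCBI Taxonomy ID
    name: str           # scientific name without indentation
    depth: int          # tree depth inferred from the name's indentation


# Illustrative lines only; not copied from a real BugSeq report.
SAMPLE_REPORT = (
    " 29.77\t28601\t28601\tU\t0\tunclassified\n"
    " 70.23\t67461\t0\tR\t1\troot\n"
    " 70.22\t67458\t2\tR1\t131567\t  cellular organisms\n"
    " 70.21\t67449\t25\tD\t2\t    Bacteria\n"
)


def parse_kraken_report(text):
    """Parse six-column Kraken report text into a list of KrakenRow."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        percent, clade, direct, rank, tax_id, name = line.split("\t")
        stripped = name.lstrip(" ")
        depth = (len(name) - len(stripped)) // 2  # two spaces per level
        rows.append(
            KrakenRow(
                float(percent), int(clade), int(direct),
                rank.strip(), int(tax_id), stripped.rstrip(), depth,
            )
        )
    return rows


rows = parse_kraken_report(SAMPLE_REPORT)
for row in rows:
    print(f"{'  ' * row.depth}{row.name}: {row.clade_reads} reads in clade")
```

Reports written with additional columns (for example, minimizer data) would need the unpacking line adjusted.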