
Metagenomic classification

Interactive (Krona) report

Clicking the “Interactive” button under “Metagenomic Classification” produces an interactive Krona plot, which enables visualization of the taxonomic composition for all samples in your analysis. Single-clicking a taxon displays the number of reads assigned to that taxon in the top right corner of the plot. Double-clicking different rings of the Krona plot enables you to stratify the classification results based on different taxonomic ranks. You can navigate between different samples in your analysis in the navigation pane on the left side of the report.

Interactive visualization with Krona

Text format (.TSV): One report per analysis

The Metagenomic Classification Summary folder contains a file named metagenomic_classification-RUN_ID.tsv. This file contains the raw data visualized in the Krona report for all samples submitted within a given analysis. For example, opening this file in Excel will reveal:

Taxon Rank | Taxon NCBI ID | Taxon Name                  | Sample 1 - Read count at this taxon and below | Sample 1 - Read count directly assigned to this taxon
U          | 0             | unclassified                | 28601                                         | 28601
R          | 1             | root                        | 67461                                         | 0
R1         | 131567        | cellular organisms          | 67458                                         | 2
D          | 2             | Bacteria                    | 67449                                         | 25
P          | 1224          | Proteobacteria              | 67248                                         | 14
C          | 1236          | Gammaproteobacteria         | 67229                                         | 4
O          | 91347         | Enterobacterales            | 67214                                         | 131
F          | 543           | Enterobacteriaceae          | 59408                                         | 642
F1         | 2890311       | Klebsiella/Raoultella group | 57411                                         | 34
G          | 570           | Klebsiella                  | 57351                                         | 864
S          | 573           | Klebsiella pneumoniae       | 52521                                         | 49766

The first three columns are row labels that identify taxonomic nodes. BugSeq follows the NCBI taxonomy. Taxon rank codes stand for (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. A number appended to a rank code marks an intermediate rank; e.g., F1 is a node one level below family.

Each sample is then included as two columns:

  • Read count at this taxon and below: The total number of reads assigned to this taxon or to any taxon beneath it. In this example, 67449 reads were assigned to the superkingdom Bacteria or to a taxon below Bacteria.
  • Read count directly assigned to this taxon: The number of reads assigned to this taxon itself. These reads couldn’t be assigned to a lower taxonomic node because of mapping ambiguity and/or the nature of the taxonomic tree (e.g. the taxon is already the lowest rank in the tree). In this example, 25 reads were identified as bacterial in origin but couldn’t be assigned to lower nodes such as Klebsiella pneumoniae.
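
The two per-sample columns can also be read programmatically. The following is a minimal Python sketch, not part of BugSeq itself: it assumes the file is tab-delimited with the header shown above and that the "Sample 1" prefix is replaced by your own sample name. The helper name counts_at_rank is hypothetical, and the embedded text reuses three rows from the example table.

```python
import csv
import io

# Illustrative excerpt of the summary TSV, reusing rows from the table above.
# Assumption: tab-delimited, header as shown, sample name as column prefix.
SUMMARY_TSV = (
    "Taxon Rank\tTaxon NCBI ID\tTaxon Name\t"
    "Sample 1 - Read count at this taxon and below\t"
    "Sample 1 - Read count directly assigned to this taxon\n"
    "D\t2\tBacteria\t67449\t25\n"
    "G\t570\tKlebsiella\t57351\t864\n"
    "S\t573\tKlebsiella pneumoniae\t52521\t49766\n"
)

CLADE_SUFFIX = " - Read count at this taxon and below"
DIRECT_SUFFIX = " - Read count directly assigned to this taxon"


def counts_at_rank(tsv_text, sample, rank):
    """Return {taxon name: (clade_reads, direct_reads)} for one rank code."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        if row["Taxon Rank"] == rank:
            result[row["Taxon Name"]] = (
                int(row[sample + CLADE_SUFFIX]),
                int(row[sample + DIRECT_SUFFIX]),
            )
    return result


species = counts_at_rank(SUMMARY_TSV, "Sample 1", "S")
print(species)
```

To use this on a real report, read the file contents into a string (or pass an open file to csv.DictReader directly) and substitute your sample name.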

Text format (.CSV): One report per sample

BugSeq outputs a CSV file for each sample in an analysis, containing a read-by-read summary of the classification. For each read, it lists the assigned organism (name, rank, and NCBI Taxonomy ID), whether the classification came from the read-based classifier or the de novo assembly, the read length, the read coverage, and the percentage identity (for non-assembled reads only). For example, opening this file in Excel will reveal:

read_id                                  | read_length | tax_id | percent_identity | source   | tax_name              | tax_rank | read_coverage
A00111:702:H3THHHDO9:3:1107:2419:8343    | 150         | 573    |                  | assembly | Klebsiella pneumoniae | species  | 100
A00111:671:H3THHHDO9:4:1631:19262:5228   | 150         | 573    |                  | assembly | Klebsiella pneumoniae | species  | 100
A00111:671:H3THHHDO9:4:2620:23818:26334  | 150         | 29466  |                  | assembly | Veillonella parvula   | species  | 100
A00111:702:H3THHHDO9:3:2307:29586:18161  | 150         | 29466  |                  | assembly | Veillonella parvula   | species  | 100
A00111:671:H3THHHDO9:4:1513:8865:14246   | 150         | 29466  |                  | assembly | Veillonella parvula   | species  | 100
A00111:702:H3THHHDO9:3:2313:14073:32847  | 150         | 573    |                  | assembly | Klebsiella pneumoniae | species  | 100
A00111:671:H3THHHDO9:4:2534:30201:26584  | 150         | 543    |                  | assembly | Enterobacteriaceae    | family   | 100
A00111:702:H3THHHDO9:3:1561:24822:17738  | 150         | 543    |                  | assembly | Enterobacteriaceae    | family   | 100
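
As a rough illustration of how the per-sample file might be summarized outside of Excel, the Python sketch below tallies reads per taxon. It assumes the file is comma-delimited with the header row shown above and that percent_identity is left empty for reads classified via the assembly; the function name reads_per_taxon is hypothetical, and the embedded rows are a subset of the example.

```python
import csv
import io
from collections import Counter

# Illustrative excerpt based on the example rows above.
# Assumption: comma-delimited, with an empty percent_identity field
# for reads whose classification came from the assembly.
PER_SAMPLE_CSV = """\
read_id,read_length,tax_id,percent_identity,source,tax_name,tax_rank,read_coverage
A00111:702:H3THHHDO9:3:1107:2419:8343,150,573,,assembly,Klebsiella pneumoniae,species,100
A00111:671:H3THHHDO9:4:2620:23818:26334,150,29466,,assembly,Veillonella parvula,species,100
A00111:702:H3THHHDO9:3:2313:14073:32847,150,573,,assembly,Klebsiella pneumoniae,species,100
A00111:671:H3THHHDO9:4:2534:30201:26584,150,543,,assembly,Enterobacteriaceae,family,100
"""


def reads_per_taxon(csv_text):
    """Count reads for each (tax_name, tax_rank) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[(row["tax_name"], row["tax_rank"])] += 1
    return counts


tally = reads_per_taxon(PER_SAMPLE_CSV)
for (name, rank), n in tally.most_common():
    print(f"{name} ({rank}): {n} reads")
```

Because percent_identity may be empty, any code that converts it to a number should check for an empty string first.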

Kraken-formatted reports

BugSeq outputs a Kraken-formatted report for each sample, which shows how many reads were classified at each taxonomic rank. These reports follow a structure similar to the text-formatted (.TSV) files described above, and they can be passed to downstream analytical tools that accept Kraken reports.
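
For readers who want to load such a report themselves, here is a minimal Python sketch. It assumes the standard six-column, tab-delimited Kraken report layout (percentage of reads, reads at this taxon and below, reads directly assigned, rank code, NCBI taxonomy ID, and a name indented by two spaces per tree level); whether BugSeq's files match this layout exactly is an assumption. The sample text is illustrative: the counts are borrowed from the TSV example above, and the percentages are approximate values computed from them.

```python
from typing import NamedTuple

# Illustrative report text in the standard Kraken layout (assumed).
KRAKEN_REPORT = (
    " 29.77\t28601\t28601\tU\t0\tunclassified\n"
    " 70.23\t67461\t0\tR\t1\troot\n"
    " 70.22\t67458\t2\tR1\t131567\t  cellular organisms\n"
    " 70.21\t67449\t25\tD\t2\t    Bacteria\n"
)


class KrakenRow(NamedTuple):
    percent: float
    clade_reads: int    # reads at this taxon and below
    direct_reads: int   # reads assigned directly to this taxon
    rank: str
    tax_id: int
    name: str
    depth: int          # tree depth inferred from name indentation


def parse_kraken_report(text):
    """Parse six-column Kraken report text into a list of KrakenRow."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        percent, clade, direct, rank, tax_id, raw_name = line.split("\t")[:6]
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2
        rows.append(
            KrakenRow(float(percent), int(clade), int(direct),
                      rank, int(tax_id), name, depth)
        )
    return rows


rows = parse_kraken_report(KRAKEN_REPORT)
for row in rows:
    print(row.rank, row.tax_id, row.name, row.clade_reads, row.direct_reads)
```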