Raw Data
BugSeq outputs the following raw data files, depending on analysis:
Isolate/Metagenomic Analyses
Text Format
Under the folder Metagenomic Classification Summary, there’s a file named metagenomic_classification-RUN_ID.tsv. This file contains the raw data visualized in the above file. For example, opening this file in Excel will reveal:
Taxon Rank | Taxon NCBI ID | Taxon Name | Sample 1 - Read count at this taxon and below | Sample 1 - Read count directly assigned to this taxon |
---|---|---|---|---|
U | 0 | unclassified | 28601 | 28601 |
R | 1 | root | 67461 | 0 |
R1 | 131567 | cellular organisms | 67458 | 2 |
D | 2 | Bacteria | 67449 | 25 |
P | 1224 | Proteobacteria | 67248 | 14 |
C | 1236 | Gammaproteobacteria | 67229 | 4 |
O | 91347 | Enterobacterales | 67214 | 131 |
F | 543 | Enterobacteriaceae | 59408 | 642 |
F1 | 2890311 | Klebsiella/Raoultella group | 57411 | 34 |
G | 570 | Klebsiella | 57351 | 864 |
S | 573 | Klebsiella pneumoniae | 52521 | 49766 |
The first three columns are row labels and identify taxonomic nodes. BugSeq follows the NCBI taxonomic scheme. Taxon rank codes denote (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. A numeric suffix marks an intermediate rank: for example, F1 is one level below family.
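As an illustration of the rank codes described above, the following Python sketch splits a code such as F1 into a rank name and the number of levels below that rank. The function name `decode_rank` and the `RANK_NAMES` mapping are hypothetical helpers written for this example and are not part of BugSeq.

```python
import re

# Rank letters as listed in the text; "R" (root) appears in the example table.
RANK_NAMES = {
    "U": "unclassified",
    "R": "root",
    "D": "domain",
    "K": "kingdom",
    "P": "phylum",
    "C": "class",
    "O": "order",
    "F": "family",
    "G": "genus",
    "S": "species",
}


def decode_rank(code):
    """Split a rank code such as 'F1' into (rank name, levels below that rank)."""
    match = re.fullmatch(r"([A-Z])(\d*)", code.strip())
    if match is None or match.group(1) not in RANK_NAMES:
        raise ValueError(f"unrecognized rank code: {code!r}")
    letter, depth = match.groups()
    # An empty suffix means the node sits exactly at the named rank.
    return (RANK_NAMES[letter], int(depth) if depth else 0)


print(decode_rank("F1"))  # one level below family
print(decode_rank("S"))   # exactly at species
```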
Each sample is then included as two columns:
- Read count at this taxon and below: This field contains the total number of reads assigned to this taxon or to any taxon beneath it. In this example, 67449 reads were assigned to the superkingdom Bacteria or a rank below Bacteria.
- Read count directly assigned to this taxon: These reads could not be assigned to a lower taxonomic node, either because of mapping ambiguity or because of the structure of the taxonomic tree (e.g., the taxon is already the lowest rank in the tree). In this example, 25 reads were identified as bacterial in origin but could not be assigned to lower nodes such as Klebsiella pneumoniae.
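The relationship between the two per-sample columns can be sketched in Python with the standard csv module. This is a minimal example, assuming the file is tab-separated and that its column headers match the table shown above exactly; the sample name "Sample 1", the helper `read_counts`, and the inline `EXAMPLE_TSV` string (three rows copied from the table) are illustrative assumptions, and a real file may label its columns differently.

```python
import csv
import io

# Three rows copied from the example table, tab-separated as in a .tsv file.
EXAMPLE_TSV = "\n".join([
    "\t".join([
        "Taxon Rank", "Taxon NCBI ID", "Taxon Name",
        "Sample 1 - Read count at this taxon and below",
        "Sample 1 - Read count directly assigned to this taxon",
    ]),
    "D\t2\tBacteria\t67449\t25",
    "G\t570\tKlebsiella\t57351\t864",
    "S\t573\tKlebsiella pneumoniae\t52521\t49766",
])


def read_counts(handle, sample="Sample 1"):
    """Parse per-taxon read counts for one sample (column names are assumed)."""
    cumulative_col = f"{sample} - Read count at this taxon and below"
    direct_col = f"{sample} - Read count directly assigned to this taxon"
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        cumulative = int(record[cumulative_col])
        direct = int(record[direct_col])
        rows.append({
            "rank": record["Taxon Rank"],
            "taxid": int(record["Taxon NCBI ID"]),
            "name": record["Taxon Name"],
            "cumulative": cumulative,
            "direct": direct,
            # Reads assigned to descendants only, excluding this node itself.
            "below": cumulative - direct,
        })
    return rows


for row in read_counts(io.StringIO(EXAMPLE_TSV)):
    print(row["name"], row["cumulative"], row["direct"], row["below"])
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` wrapper.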
Assembly Bins (FASTAs)
Assembled contigs are found in the Assembly folder under Metagenomic Bins as .fna (FASTA) files. These files contain all organisms with sufficient depth in the submitted sequencing data to be assembled. Details on each bin, such as its completeness (e.g., BUSCO count), antimicrobial resistance profile, and more, are found in the summary and per-sample reports.
Tip
Output FASTA files are sorted by contig length, from longest to shortest contig. Output FASTAs also have contig length and plasmid information in their sequence header.
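The ordering described in the tip can be checked with a short Python sketch that reads FASTA records and computes each contig's length from its sequence. The exact header layout is not specified here, so the example does not parse the length or plasmid fields from the header; the functions `contig_lengths` and `is_sorted_longest_first` and the toy records are hypothetical and written only for illustration.

```python
def contig_lengths(lines):
    """Return [(header, length)] for each FASTA record, in file order."""
    records = []
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, length))
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        records.append((header, length))
    return records


def is_sorted_longest_first(records):
    """True if contig lengths never increase from one record to the next."""
    lengths = [length for _, length in records]
    return all(a >= b for a, b in zip(lengths, lengths[1:]))


# Toy records with made-up headers; real BugSeq headers carry more fields.
TOY_FASTA = """>contig_1
ACGTACGTAC
GT
>contig_2
ACGTAC
""".splitlines()

records = contig_lengths(TOY_FASTA)
print(records)
print(is_sorted_longest_first(records))
```

To run this against a real bin, the lines of an .fna file opened with `open(path)` could be passed in place of `TOY_FASTA`.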