Raw Data

Depending on the analysis, BugSeq outputs the following raw data files:

Isolate/Metagenomic Analyses

Text Format

The Metagenomic Classification Summary folder contains a file named metagenomic_classification-RUN_ID.tsv, which holds the raw data visualized in the file described above. For example, opening this file in Excel shows:

Taxon Rank	Taxon NCBI ID	Taxon Name	Sample 1 - Read count at this taxon and below	Sample 1 - Read count directly assigned to this taxon
U	0	unclassified	28601	28601
R	1	root	67461	0
R1	131567	cellular organisms	67458	2
D	2	Bacteria	67449	25
P	1224	Proteobacteria	67248	14
C	1236	Gammaproteobacteria	67229	4
O	91347	Enterobacterales	67214	131
F	543	Enterobacteriaceae	59408	642
F1	2890311	Klebsiella/Raoultella group	57411	34
G	570	Klebsiella	57351	864
S	573	Klebsiella pneumoniae	52521	49766

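The first three columns are row labels that identify each taxonomic node. BugSeq follows the NCBI taxonomic scheme. Taxon rank codes denote (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Intermediate ranks add a number to the code of the nearest standard rank above them, giving how many levels below that rank they sit: e.g. F1 is one level below family, and R1 is one level below root.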
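The rank codes can be decoded mechanically: one letter names the base rank and an optional number counts the levels below it. A minimal Python sketch, assuming every code is a single letter optionally followed by digits, as in the example above; `parse_rank_code` and `BASE_RANKS` are hypothetical names, not part of BugSeq.

```python
# Base rank letters as listed above; "R" (root) also appears in the example table.
BASE_RANKS = {
    "U": "Unclassified",
    "R": "Root",
    "D": "Domain",
    "K": "Kingdom",
    "P": "Phylum",
    "C": "Class",
    "O": "Order",
    "F": "Family",
    "G": "Genus",
    "S": "Species",
}


def parse_rank_code(code: str) -> tuple[str, int]:
    """Split a rank code such as 'F1' into ('Family', 1).

    The integer is the number of levels below the named base rank;
    0 means the taxon sits exactly at that rank.
    """
    letter, suffix = code[:1], code[1:]
    if letter not in BASE_RANKS:
        raise ValueError(f"unknown rank code: {code!r}")
    depth = int(suffix) if suffix else 0
    return BASE_RANKS[letter], depth


for code in ["S", "F1", "R1"]:
    print(code, parse_rank_code(code))
```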

Each sample then contributes two columns:

Read count at this taxon and below: the total number of reads assigned to this taxon or to any taxon beneath it. In this example, 67449 reads were assigned to the superkingdom Bacteria or to a rank below Bacteria.
Read count directly assigned to this taxon: the number of reads assigned to this taxon and not to any lower node, either because mapping ambiguity prevents a more specific assignment or because the taxon has no lower nodes in the tree. In this example, 25 reads were identified as bacterial in origin but could not be assigned to a lower node such as Klebsiella pneumoniae.
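To work with these counts programmatically, the file can be read as ordinary tab-separated text. A minimal Python sketch using only the standard library, assuming the per-sample column headers follow the `<sample name> - Read count ...` pattern shown in the example; the `read_counts` helper is hypothetical, rows are keyed by NCBI ID rather than by name, and the embedded rows are copied from the example table so the snippet runs on its own.

```python
import csv
import io

# Rows copied from the example table above so the sketch runs on its own;
# with real data, open metagenomic_classification-RUN_ID.tsv instead.
EXAMPLE_TSV = (
    "Taxon Rank\tTaxon NCBI ID\tTaxon Name\t"
    "Sample 1 - Read count at this taxon and below\t"
    "Sample 1 - Read count directly assigned to this taxon\n"
    "U\t0\tunclassified\t28601\t28601\n"
    "R\t1\troot\t67461\t0\n"
    "D\t2\tBacteria\t67449\t25\n"
    "S\t573\tKlebsiella pneumoniae\t52521\t49766\n"
)


def read_counts(handle, sample: str) -> dict[int, tuple[int, int]]:
    """Map NCBI taxon ID -> (count at taxon and below, count directly assigned)."""
    below_col = f"{sample} - Read count at this taxon and below"
    direct_col = f"{sample} - Read count directly assigned to this taxon"
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[int(row["Taxon NCBI ID"])] = (int(row[below_col]), int(row[direct_col]))
    return counts


counts = read_counts(io.StringIO(EXAMPLE_TSV), "Sample 1")

# Reads passed down from Bacteria (taxid 2) to lower nodes: 67449 - 25.
below, direct = counts[2]
print("Bacteria, assigned to lower nodes:", below - direct)

# Assumption: every read is either unclassified (taxid 0) or under root (taxid 1).
total = counts[0][0] + counts[1][0]
kp_fraction = counts[573][0] / total
print(f"Klebsiella pneumoniae (taxid 573): {kp_fraction:.1%} of all reads")
```

With a real file, pass `open(path, newline="")` as the handle. The relative-abundance line rests on the labelled assumption that unclassified plus root covers every read; check that it holds for your own data before relying on it.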

Assembly Bins (FASTAs)

Assembled contigs are found in the Assembly folder, under Metagenomic Bins, as .fna (FASTA) files. Together, these files contain every organism in the submitted sequencing data with sufficient depth to be assembled. Details on each bin, such as its completeness (e.g. BUSCO count), antimicrobial resistance profile and more, are found in the summary and per-sample reports.
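The bins can also be summarized locally. A minimal Python sketch, assuming the .fna files have been downloaded into a single directory: it reports each bin's contig count, total length and N50. N50 is a common assembly metric added here for illustration and is not claimed to appear in BugSeq's reports; `fasta_lengths`, `n50` and `summarize_bins` are hypothetical helpers.

```python
import sys
from collections.abc import Iterable
from pathlib import Path


def fasta_lengths(lines: Iterable[str]) -> list[int]:
    """Return the length of each FASTA record, in file order."""
    lengths: list[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length of the shortest contig among the longest contigs covering half the bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize_bins(bin_dir: Path) -> dict[str, tuple[int, int, int]]:
    """Map bin file name -> (contig count, total bases, N50)."""
    summary = {}
    for path in sorted(bin_dir.glob("*.fna")):
        with open(path) as handle:
            lengths = fasta_lengths(handle)
        summary[path.name] = (len(lengths), sum(lengths), n50(lengths))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_bins.py <directory of .fna bins>
    for name, (count, total, contig_n50) in summarize_bins(Path(sys.argv[1])).items():
        print(f"{name}\t{count} contigs\t{total} bp\tN50={contig_n50}")
```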

Tip

Contigs in each output FASTA file are sorted by length, from longest to shortest. Each sequence header also carries the contig's length and plasmid information.
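Because contigs are written longest first, the largest contigs of a bin can be taken from the top of the file without parsing the rest. A minimal Python sketch; `fasta_records`, `is_longest_first` and `top_contigs` are hypothetical helpers, and headers are printed as-is because their exact layout is not specified here.

```python
import sys
from collections.abc import Iterable, Iterator
from itertools import islice


def fasta_records(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text, one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_longest_first(records: Iterable[tuple[str, str]]) -> bool:
    """Check that sequence lengths never increase from one record to the next."""
    lengths = [len(seq) for _, seq in records]
    return all(a >= b for a, b in zip(lengths, lengths[1:]))


def top_contigs(lines: Iterable[str], n: int) -> list[tuple[str, int]]:
    """Return (header, length) for the first n records.

    Since the files are sorted longest first, these are the n longest
    contigs; reading stops just after the n-th record.
    """
    return [(h, len(s)) for h, s in islice(fasta_records(lines), n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python top_contigs.py <bin.fna>
    with open(sys.argv[1]) as handle:
        print("sorted longest first:", is_longest_first(fasta_records(handle)))
    with open(sys.argv[1]) as handle:
        for header, length in top_contigs(handle, 5):
            print(length, header)
```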