Raw Data

BugSeq outputs the following raw data files, depending on analysis:

Isolate/Metagenomic Analyses

Text Format

Under the folder Metagenomic Classification Summary, there’s a file named metagenomic_classification-RUN_ID.tsv. This file contains the raw data visualized in the above file. For example, opening this file in Excel will reveal:

Taxon Rank | Taxon NCBI ID | Taxon Name | Sample 1 - Read count at this taxon and below | Sample 1 - Read count directly assigned to this taxon
U | 0 | unclassified | 28601 | 28601
R | 1 | root | 67461 | 0
R1 | 131567 | cellular organisms | 67458 | 2
D | 2 | Bacteria | 67449 | 25
P | 1224 | Proteobacteria | 67248 | 14
C | 1236 | Gammaproteobacteria | 67229 | 4
O | 91347 | Enterobacterales | 67214 | 131
F | 543 | Enterobacteriaceae | 59408 | 642
F1 | 2890311 | Klebsiella/Raoultella group | 57411 | 34
G | 570 | Klebsiella | 57351 | 864
S | 573 | Klebsiella pneumoniae | 52521 | 49766

The first three columns are row labels that identify taxonomic nodes. BugSeq follows the NCBI taxonomic scheme. Taxon rank codes are (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, and (S)pecies. A numeric suffix marks an intermediate rank: for example, F1 is one level below family.
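When processing this file programmatically, it can help to split a rank code into its base rank and its intermediate level. The following is a minimal sketch, not part of BugSeq itself: the function name `parse_rank_code` is hypothetical, the `R` entry is inferred from the example table, and reading a suffix greater than 1 as "that many levels below the base rank" is an assumption generalised from the F1 example.

```python
import re

# Rank letters described above; "R" (root) is inferred from the example table.
RANK_NAMES = {
    "U": "unclassified", "R": "root", "D": "domain", "K": "kingdom",
    "P": "phylum", "C": "class", "O": "order", "F": "family",
    "G": "genus", "S": "species",
}

# One capital letter, optionally followed by digits (e.g. "F" or "F1").
_RANK_CODE = re.compile(r"^([A-Z])(\d*)$")


def parse_rank_code(code):
    """Return (base rank name, levels below that rank) for a rank code."""
    match = _RANK_CODE.match(code.strip())
    if match is None or match.group(1) not in RANK_NAMES:
        raise ValueError(f"unrecognised rank code: {code!r}")
    letter, suffix = match.groups()
    # Assumption: no suffix means the rank itself, i.e. zero levels below.
    return RANK_NAMES[letter], int(suffix) if suffix else 0


print(parse_rank_code("F1"))  # ('family', 1)
print(parse_rank_code("S"))   # ('species', 0)
```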

Each sample is then included as two columns:

  • Read count at this taxon and below: The total number of reads assigned to this taxon or to any taxon beneath it. In this example, 67449 reads were assigned to the domain Bacteria or to a rank below Bacteria.
  • Read count directly assigned to this taxon: Reads that could not be assigned to a lower taxonomic node, either because of mapping ambiguity or because of the structure of the taxonomic tree (e.g., the taxon is already the lowest rank in the tree). In this example, 25 reads were identified as bacterial in origin but could not be assigned to a lower node such as Klebsiella pneumoniae.
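Because the first count includes the second, subtracting one from the other gives the reads assigned to a taxon's descendants only. The sketch below illustrates this with Python's standard `csv` module on three rows copied from the example table. It assumes the column headers in the real file match those shown above, including the sample name "Sample 1"; the helper name `reads_below` is hypothetical.

```python
import csv
import io

# Column names as shown in the example; "Sample 1" varies with the sample name.
CUMULATIVE = "Sample 1 - Read count at this taxon and below"
DIRECT = "Sample 1 - Read count directly assigned to this taxon"

# Three rows from the example table, tab-separated as in the .tsv file.
EXAMPLE_TSV = (
    f"Taxon Rank\tTaxon NCBI ID\tTaxon Name\t{CUMULATIVE}\t{DIRECT}\n"
    "D\t2\tBacteria\t67449\t25\n"
    "G\t570\tKlebsiella\t57351\t864\n"
    "S\t573\tKlebsiella pneumoniae\t52521\t49766\n"
)


def reads_below(handle):
    """Map each taxon name to the reads assigned to its descendants only."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Taxon Name"]: int(row[CUMULATIVE]) - int(row[DIRECT])
        for row in reader
    }


below = reads_below(io.StringIO(EXAMPLE_TSV))
print(below["Bacteria"])  # 67449 - 25 = 67424
```

To run this against a real output file, replace the `io.StringIO` object with a file opened in text mode with `newline=""`, as the `csv` module recommends.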

Assembly Bins (FASTAs)

Assembled contigs are found in the Assembly folder under Metagenomic Bins as .fna (FASTA) files. These files cover every organism sequenced to sufficient depth in the submitted data to be assembled. Details on each bin, such as its completeness (e.g., BUSCO count) and antimicrobial resistance profile, are found in the summary and per-sample reports.

Tip

Output FASTA files are sorted by contig length, from longest to shortest contig. Output FASTAs also have contig length and plasmid information in their sequence header.
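The ordering can be confirmed without relying on the header contents by measuring each contig directly. This is a rough sketch using only the Python standard library. The exact header layout is not specified here, so the code counts sequence characters instead of parsing the header. The contig names in `EXAMPLE_FASTA` and the helper names are made up for illustration.

```python
import io


def contig_lengths(handle):
    """Return (header, sequence length) pairs in file order."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: keep the header text without the leading ">".
            records.append([line[1:], 0])
        elif records:
            # Sequence line: add its length to the current record.
            records[-1][1] += len(line)
    return [(header, length) for header, length in records]


def is_sorted_longest_first(lengths):
    """True if contig lengths never increase from one record to the next."""
    sizes = [length for _, length in lengths]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


# Hypothetical two-contig file; real headers carry more information.
EXAMPLE_FASTA = ">contig_1\nACGTACGTAC\nGT\n>contig_2\nACGT\n"

lengths = contig_lengths(io.StringIO(EXAMPLE_FASTA))
print(lengths)                           # [('contig_1', 12), ('contig_2', 4)]
print(is_sorted_longest_first(lengths))  # True
```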