
Optimizing Data Submission & BugSeq Results

Data Compression

Uploading gzip-compressed files reduces the transfer time of large files to BugSeq.
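
Most command-line environments can compress a file with the standard `gzip` tool (for example, `gzip sample_R1.fastq` writes `sample_R1.fastq.gz`). The sketch below does the same from Python using only the standard library. It streams the file in chunks, so large FASTQ files are never loaded into memory at once. The function name `gzip_fastq` is hypothetical and not part of any BugSeq tool.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq(src: Path) -> Path:
    """Compress a FASTQ file to <name>.gz next to the original and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # copyfileobj streams fixed-size chunks from the plain file into the gzip writer.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Usage: `gzip_fastq(Path("sample_R1.fastq"))` returns `Path("sample_R1.fastq.gz")` and leaves the original file in place.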

Optimizing data for upload

We are piloting a tool that validates and optimizes input files for submission to BugSeq. It processes data locally, so no data need to be uploaded. Try it at https://tools.bugseq.bio.

Sequencing Depth

For all sequencing platforms, BugSeq recommends a minimum median sequencing depth of 40X to enable accurate genome assembly, strain characterization and antimicrobial resistance prediction. This recommendation is based on the following:

  • According to the FDA, “Currently, we believe thresholds at…20X depth [at every position across the entire assembled genome] are sufficient to apply these genomes for diagnostic purposes within bounded use cases.”[1] Because depth varies along a genome, 20X at every position often corresponds to a median depth of 40X, depending on how unevenly sequencing depth is distributed.
  • For short-read sequencing technologies, assembly quality plateaus at 40X coverage.[2]
  • Inter-laboratory studies demonstrate strong reproducibility at 30X coverage and above.[3]
  • For ONT, “near-finished microbial reference genomes can be obtained from R10.4 data alone at a coverage of approximately 40-fold.”[4]
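
The two thresholds above measure different things: one is a floor at every position (20X), the other is a median (40X). A minimal sketch, assuming you already have a list of per-position depths from your own pipeline, shows how a sample can meet the median target while still failing the every-position target. The function name and the toy depths are hypothetical, illustrative values, not BugSeq output.

```python
from statistics import median


def depth_summary(depths: list[int], min_position_depth: int = 20,
                  min_median_depth: int = 40) -> dict:
    """Compare per-position depths against a per-position floor and a median target."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_position_depth)
    med = median(depths)
    return {
        "median_depth": med,
        "min_depth": min(depths),
        "fraction_at_or_above_min": covered / len(depths),
        "meets_median_target": med >= min_median_depth,
        "meets_every_position_target": covered == len(depths),
    }


# Toy depths for a 10-position "genome" (illustrative numbers only).
example = [38, 41, 45, 52, 40, 19, 44, 47, 39, 43]
print(depth_summary(example))
```

Here the median is 42X, but one position sits at 19X, so the per-position floor is missed even though the median target is met.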

Talk to our team of experts to find the right depth of sequencing for your experimental goals.


Differences in reported coverage from BugSeq compared with alternative approaches

BugSeq users have reported that the median sequencing depth in BugSeq reports can differ from the depth calculated by alternative approaches (e.g. samtools depth) when Illumina paired-end sequencing is performed. These differences often arise because BugSeq calculates the depth of paired-end reads correctly: when the two reads of a pair overlap, they sequence the same DNA fragment, yet many tools ignore paired-end overlaps and therefore double-count these regions. The bioinformatics community and peer-reviewed literature agree that these overlaps should be counted only once. BugSeq has filed requests to change the default behavior of SAMtools and CDC pipelines, but these requests have not been accepted or acknowledged.
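
The toy example below shows how counting overlapping mates twice inflates depth. It is a simplified sketch, not BugSeq's implementation: read pairs are given as hypothetical 0-based, half-open intervals on a 20 bp reference, and each mate covers 7 bp of a 10 bp fragment, so the mates overlap by 4 bp. If you use samtools, recent manuals list an `-s` option for `samtools depth` that counts overlapping mate bases once; confirm it in your installed version's documentation.

```python
from statistics import median

# A read pair as two 0-based, half-open intervals: ((start1, end1), (start2, end2)).
Pair = tuple[tuple[int, int], tuple[int, int]]


def depth_per_position(genome_length: int, pairs: list[Pair],
                       count_overlap_once: bool) -> list[int]:
    depth = [0] * genome_length
    for (s1, e1), (s2, e2) in pairs:
        mate1 = set(range(s1, e1))
        mate2 = set(range(s2, e2))
        if count_overlap_once:
            # Both mates come from one DNA fragment: each position counts once per pair.
            positions = list(mate1 | mate2)
        else:
            # Naive counting: each mate contributes independently, so overlaps count twice.
            positions = list(mate1) + list(mate2)
        for p in positions:
            depth[p] += 1
    return depth


# Hypothetical toy data: 3 pairs, 10 bp fragments, 7 bp mates overlapping by 4 bp.
pairs = [((0, 7), (3, 10)), ((5, 12), (8, 15)), ((10, 17), (13, 20))]
naive = depth_per_position(20, pairs, count_overlap_once=False)
collapsed = depth_per_position(20, pairs, count_overlap_once=True)
print("naive median:", median(naive), "collapsed median:", median(collapsed))
```

In this toy case the naive median depth is 2 while the overlap-aware median is 1.5, even though both are computed from the same three fragments.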

Optimizing Nanopore Outputs for BugSeq

Basecalling

BugSeq recommends using ONT's latest super-accuracy (SUP) basecalling model whenever possible. SUP basecalling gives the most reproducible and accurate BugSeq results; for applications that require real-time or faster basecalling, high-accuracy (HAC) models may be an acceptable alternative. FAST basecalling can lead to high levels of gene fragmentation and should generally be avoided.

FASTQ File Size

To achieve maximal upload and runtime performance, prefer larger per-barcode files over thousands of small files.

The number of reads per FASTQ file can be specified in the MinKNOW GUI when configuring the sequencing run. At the final step, where the output format is specified, users can change the number of reads (records) per FASTQ file. For the fastest analysis, we recommend a value of at least 100,000, and possibly as high as 500,000 (much larger than the default of 4,000 reads per FASTQ file).
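
The effect of this setting is simple arithmetic. The sketch below counts how many FASTQ files one barcode produces, assuming (as a simplification) that a new file is started only when the reads-per-file limit is reached, and using a hypothetical barcode with 2,000,000 reads.

```python
def fastq_file_count(total_reads: int, reads_per_file: int) -> int:
    """Number of FASTQ files for one barcode at a given reads-per-file setting."""
    if reads_per_file <= 0:
        raise ValueError("reads_per_file must be positive")
    # Ceiling division: a final partial batch still produces one more file.
    return (total_reads + reads_per_file - 1) // reads_per_file


# Hypothetical barcode with 2,000,000 reads.
for batch in (4_000, 100_000, 500_000):
    print(f"{batch:>7,} reads/file -> {fastq_file_count(2_000_000, batch):,} files")
```

Under these assumptions, the default of 4,000 reads per file yields 500 files for this barcode, compared with 20 files at 100,000 and 4 files at 500,000.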

Modifying MinKNOW settings is the preferred approach, because the resulting files still conform to ONT’s file naming conventions, which BugSeq can parse. Alternatively, you can manually concatenate Oxford Nanopore FASTQ files into fewer files per sample. If you do, make sure the resulting files adhere to BugSeq’s input filename requirements.
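
If you do concatenate manually, the sketch below merges every `*.fastq.gz` file in each barcode folder into one file per barcode. It assumes MinKNOW's usual `fastq_pass/barcodeNN/` layout and relies on the fact that gzip files can be joined byte-for-byte into a valid multi-member gzip file. The function names and the output name `barcodeNN.fastq.gz` are hypothetical; check the output names against BugSeq's input filename requirements before uploading, and keep the output folder outside `fastq_pass`.

```python
import shutil
from pathlib import Path


def concatenate_barcode(barcode_dir: Path, out_path: Path) -> int:
    """Join every *.fastq.gz in barcode_dir into out_path; return how many files were joined."""
    parts = sorted(barcode_dir.glob("*.fastq.gz"))
    if not parts:
        return 0  # nothing to merge; do not create an empty output file
    with open(out_path, "wb") as out:
        for part in parts:
            # Appending raw gzip bytes yields a multi-member gzip file,
            # which gzip tools and Python's gzip module read as one stream.
            with open(part, "rb") as fh:
                shutil.copyfileobj(fh, out)
    return len(parts)


def concatenate_run(fastq_pass: Path, out_dir: Path) -> dict[str, int]:
    """Write one merged file per barcode subfolder of fastq_pass into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        # Hypothetical output name (e.g. barcode01.fastq.gz); verify against
        # BugSeq's filename requirements.
        out_path = out_dir / f"{barcode_dir.name}.fastq.gz"
        counts[barcode_dir.name] = concatenate_barcode(barcode_dir, out_path)
    return counts
```

Usage: `concatenate_run(Path("fastq_pass"), Path("merged"))` returns the number of input files merged for each barcode folder.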

Multiplexing

BugSeq automatically performs demultiplexing and adapter trimming on nanopore sequencing data.

Strict (dual-barcode) nanopore demultiplexing

BugSeq parses FASTQ headers for barcode information. Users often want BugSeq to perform strict demultiplexing of nanopore data, which requires barcodes on both ends of a read. Strict demultiplexing reduces barcode crosstalk and leads to more accurate results. Users should therefore either perform strict demultiplexing before submitting to BugSeq, or perform no demultiplexing before submitting to BugSeq. Files that have already been demultiplexed with default (single-ended barcode) demultiplexing will not be further demultiplexed by BugSeq.
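
To illustrate the difference between strict and single-ended demultiplexing, here is a toy classifier. It is not BugSeq's or ONT's algorithm: real demultiplexers align barcode and flanking sequences while tolerating sequencing errors, whereas this sketch uses exact substring matches, made-up 8 bp barcodes (ONT's barcodes are longer), and a hypothetical 100-base search window at each end. A barcode at the far end of a read appears as its reverse complement. If you demultiplex before submission, ONT's demultiplexing tools offer a both-ends option; check their documentation for the exact setting.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def assign_barcode(read: str, barcodes: dict[str, str],
                   window: int = 100, strict: bool = True) -> str:
    """Toy exact-match demultiplexer.

    A barcode hits the start if it occurs in the first `window` bases, and the end if
    its reverse complement occurs in the last `window` bases. Strict mode requires the
    same barcode at both ends; single-ended mode accepts a hit at either end.
    """
    head, tail = read[:window], read[-window:]
    start_hits = {name for name, bc in barcodes.items() if bc in head}
    end_hits = {name for name, bc in barcodes.items() if revcomp(bc) in tail}
    candidates = (start_hits & end_hits) if strict else (start_hits | end_hits)
    # Missing or ambiguous (more than one barcode) hits are left unclassified.
    return next(iter(candidates)) if len(candidates) == 1 else "unclassified"


# Made-up barcodes and insert, for illustration only.
barcodes = {"barcode01": "AAGGTTCC", "barcode02": "CTCTGAGA"}
insert = "ACGTTGCA" * 20
both_ends = barcodes["barcode01"] + insert + revcomp(barcodes["barcode01"])
one_end = barcodes["barcode01"] + insert

print(assign_barcode(both_ends, barcodes, strict=True))
print(assign_barcode(one_end, barcodes, strict=True))
print(assign_barcode(one_end, barcodes, strict=False))
```

The read with a barcode on only one end is assigned to barcode01 in single-ended mode but left unclassified in strict mode. This trade-off is what lowers crosstalk: fewer reads are kept, but the assignments that remain are better supported.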