The BaseCaller and Barcode Classification


Overview of BaseCaller and Barcode Classification

This page discusses BaseCaller operations in general and issues around BaseCaller parameters, barcode classification, and filtering and trimming.

The settings of BaseCaller parameters control barcode classification as well as filtering and trimming.

Barcode classification changed in the 4.2 release. If you are used to barcode classification in the 4.0 release, see "Changes in the 4.2 release" on the BaseCaller Parameters page.

About barcodes

Barcodes are short base sequences that are placed between the library key and the read during library preparation. The barcode sequences provide a mechanism to distinguish and identify reads from different samples during data analysis.

Barcodes allow multiple samples to be sequenced together on one chip in a single sequencing run, while the read data for each sample can still be analyzed separately afterward.

This diagram shows the placement of the barcode sequence, as well as the library key and adapters, with the read sequence (which is labeled "Template Bases"). The key is on the 5' end.

This example shows the location of the barcode sequence in both base space and flow space, using barcode IonXpress_001 as an example:
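As a rough illustration of how a base-space sequence maps to flow space, the sketch below converts a string of bases into per-flow incorporation counts. The cyclic flow order "TACG" and the input sequences are assumptions chosen for illustration; they are not the instrument's actual flow order or a real barcode sequence.

```python
def bases_to_flows(bases, flow_order="TACG"):
    """Return a list of (nucleotide, count) pairs, one per flow.

    flow_order is an assumed, simple cyclic order used only for illustration.
    Each flow consumes the homopolymer run of its nucleotide at the current
    position, or records 0 if the next base is a different nucleotide.
    """
    unknown = set(bases) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not present in flow order: {sorted(unknown)}")

    flows = []
    pos = 0
    flow = 0
    while pos < len(bases):
        nuc = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(bases) and bases[pos] == nuc:
            count += 1
            pos += 1
        flows.append((nuc, count))
        flow += 1
    return flows


# Hypothetical input, not a real barcode sequence.
print(bases_to_flows("TTAC"))
```

A homopolymer such as "TT" occupies a single flow with a count of 2, while a base that does not match the current flow produces a zero-count flow. This is why the same sequence has different lengths in base space and flow space.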

Analysis pipeline overview

The beginning steps of the Torrent Suite Software analysis pipeline are shown below:

Steps:

  1. The sequencing instrument generates DAT files containing the raw traces of the electrical signals.
  2. The signal processing step converts the raw traces into a single number per flow per well, in the 1.wells file.
  3. The BaseCaller converts the 1.wells file information into a sequence of bases and writes the sequence into an unaligned BAM file.
  4. The BAM file is passed to TMAP for alignment.

The signal processing step also marks several types of low-quality reads:

  • Polyclonal reads (reads with two template beads instead of one)
  • Reads with high signal processing residual (indicating an ambiguous signal value)
  • Reads that do not contain a valid library key

The signal processing step marks these problematic reads but does not remove them.

Overview of BaseCaller functionality

In addition to creating a sequence of bases from the 1.wells file information, the BaseCaller module also performs read filtering and read trimming.

Notes on read filtering:

  • Filters out low-quality reads that were marked during signal processing.
  • Filters out reads that fail basecalling filters.
  • Filtered-out reads do not appear in the BAM file. The BaseCaller keeps counts of these reads, but there is no record of which specific reads were filtered out.
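The filtering behavior described above (passing reads are kept, filtered reads are only counted) can be sketched as follows. The read representation and the reason labels ("polyclonal", "high_residual", "keypass") are hypothetical names chosen for this illustration, not the BaseCaller's internal data structures.

```python
from collections import Counter


def filter_reads(reads):
    """Split reads into kept reads and per-reason counts of filtered reads.

    Each read is a dict; a "filter_reason" of None (or a missing key) means
    the read passed. Filtered reads are counted by reason but not stored,
    mirroring the note that no record of specific filtered reads is kept.
    """
    kept = []
    counts = Counter()
    for read in reads:
        reason = read.get("filter_reason")
        if reason is None:
            kept.append(read)
        else:
            counts[reason] += 1
    return kept, counts


# Hypothetical reads for illustration only.
reads = [
    {"name": "r1", "filter_reason": None},
    {"name": "r2", "filter_reason": "polyclonal"},
    {"name": "r3", "filter_reason": "high_residual"},
    {"name": "r4", "filter_reason": "polyclonal"},
]
kept, counts = filter_reads(reads)
print([r["name"] for r in kept], dict(counts))
```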

Notes on read trimming:

  • Removes certain bases from the read for quality reasons.
  • The trimmed read still appears in the BAM file.
  • The removed bases do not appear in the BAM file.



These are the steps performed in the BaseCaller:

  1. Remove low-quality reads that were marked during the signal processing step.
  2. Do base calling:
    1. From the signal values, create the sequence of bases.
    2. Estimate the base quality value for each base.
  3. Do barcode classification:
    1. Assign each read to a barcode.
    2. Trim the barcode sequence away.
  4. Trim extra bases at the 5' end. Controlled by --extra-trim-left (default is 0, meaning no extra trimming).
  5. Filter out reads that are too short. Controlled by --min-read-length and --trim-min-read-len.
  6. Filter out reads that do not have the correct library key. Can be turned off with --keypass-filter.
  7. Trim the P1 adapter (at the 3' end).
  8. Perform quality trimming. Affected by --trim-qual-window-size and --trim-qual-cutoff.
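To make the ordering of steps 4 and 5 concrete, here is a minimal sketch of how extra 5' trimming interacts with the minimum-length filter. The function and its arguments are hypothetical stand-ins for --extra-trim-left and --min-read-length; the value 25 used for the minimum length is a placeholder, not a documented default.

```python
def trim_left_and_length_filter(bases, extra_trim_left=0, min_read_length=25):
    """Apply extra 5' trimming, then drop the read if it is too short.

    Returns the trimmed sequence, or None if the read is filtered out.
    min_read_length=25 is a placeholder value chosen for this sketch.
    """
    trimmed = bases[extra_trim_left:]  # step 4: trim extra bases at the 5' end
    if len(trimmed) < min_read_length:  # step 5: filter reads that are too short
        return None
    return trimmed


# Hypothetical reads: the second one becomes too short after trimming.
print(trim_left_and_length_filter("ACGTACGT", extra_trim_left=2, min_read_length=4))
print(trim_left_and_length_filter("ACGT", extra_trim_left=2, min_read_length=4))
```

Because trimming runs before the length check in this sketch, a read that is long enough before trimming can still be filtered out afterward.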



Notes about quality trimming:

  • Quality trimming identifies where quality issues begin toward the end of a read. The algorithm finds the point where bases fall below a quality threshold and trims those bases along with a few bases before them.
  • The parameter --trim-qual-window-size sets the window size for quality trimming. The algorithm slides through the sequence of bases and, each time the window shifts, computes the mean Base QV value for all bases in the window.
  • If the mean Base QV value for all bases in the window falls below a threshold (set by the parameter --trim-qual-cutoff, default 16), then we trim all bases from the center of the window at that time to the 3' end.
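The sliding-window procedure can be sketched as below. This is an illustration under stated assumptions, not the BaseCaller's implementation: the window size of 30 is a placeholder, reads shorter than the window are left untrimmed, and the read is cut at the center of the first failing window so that bases on the 5' side are kept and the low-quality tail is removed, consistent with the stated goal of trimming quality problems at the end of a read.

```python
def quality_trim_length(quals, window_size=30, cutoff=16):
    """Return how many bases to keep, counted from the 5' end.

    quals is a list of per-base quality values in read order.
    window_size=30 is an assumed placeholder; cutoff=16 follows the stated
    default of --trim-qual-cutoff.
    """
    n = len(quals)
    if n < window_size:
        return n  # assumption: reads shorter than one window are not trimmed

    window_sum = sum(quals[:window_size])
    for start in range(n - window_size + 1):
        if start > 0:
            # Slide the window by one base: add the new base, drop the old one.
            window_sum += quals[start + window_size - 1] - quals[start - 1]
        if window_sum / window_size < cutoff:
            # Cut at the center of the first window whose mean is below cutoff.
            return start + window_size // 2
    return n


# Hypothetical qualities: 10 good bases followed by 10 poor ones.
quals = [30] * 10 + [5] * 10
keep = quality_trim_length(quals, window_size=4, cutoff=16)
print(keep, quals[:keep])
```

With a larger window the mean reacts more slowly to a drop in quality, so the cut point shifts; the window size and cutoff together determine how aggressively the tail is trimmed.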



Notes about barcode classification and barcode filtering

Barcode classification determines which barcode group a read is assigned to. Barcode classification is done for each read immediately after base calling.

Barcode filtering determines if a specific barcode is included in the run report or is filtered out. Barcode filtering works on the barcode groups as a whole.
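The difference between the two operations can be illustrated with a simplified sketch: classification makes a decision per read, while filtering makes a decision per barcode group. Everything here is an assumption for illustration, including the mismatch-counting rule, the max_mismatches and min_fraction thresholds, and the barcode names and sequences. The actual classifier and filter are controlled by BaseCaller parameters and may work differently.

```python
from collections import Counter

# Hypothetical barcode set, not real barcode sequences.
BARCODES = {"bc_A": "ACGTAC", "bc_B": "TGCATG"}


def classify(read, barcodes, max_mismatches=1):
    """Per-read decision: return the best-matching barcode name, or None."""
    best_name, best_dist = None, max_mismatches + 1
    for name, seq in barcodes.items():
        prefix = read[:len(seq)]
        if len(prefix) < len(seq):
            continue  # read is shorter than this barcode
        dist = sum(a != b for a, b in zip(prefix, seq))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def filter_barcodes(assignments, min_fraction=0.01):
    """Per-group decision: keep barcode groups holding enough of the reads."""
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    return {name: n for name, n in counts.items()
            if total and n / total >= min_fraction}


reads = ["ACGTACGGGG", "ACGAACGGGG", "TGCATGAAAA", "GGGGGGGGGG"]
assignments = [classify(r, BARCODES) for r in reads]
print(assignments)
print(filter_barcodes(assignments, min_fraction=0.5))
```

In this sketch a read that matches no barcode is left unassigned, and a barcode group that collects too small a share of the assigned reads is dropped as a whole, regardless of how well its individual reads matched.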



Other BaseCaller and barcode classification pages

Other pages related to barcode classification: