Technical Note

TMAP: The Torrent Mapping Alignment Program for Ion Torrent Data

A mapping software program, called Torrent Mapping Alignment Program (TMAP), has been developed to meet Ion Torrent data mapping challenges.

Background

Sequence alignment is a critical component of any project that uses next-generation sequencing technologies. There are many options for alignment software, each optimized for specific sequencing platforms and downstream applications. Ion Torrent data have particular qualities that require special consideration during the alignment process:

Reads generated by the Ion PGM and Proton Sequencers are of variable length, and read lengths are expected to increase over time.
The principal error mode associated with Ion Torrent data relates to miscalling homopolymer lengths and results in insertion or deletion errors during alignment and post-processing.
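
To illustrate why a miscalled homopolymer length appears as an insertion or deletion rather than a substitution, the following minimal sketch collapses a read into (base, run length) pairs. The function name and the example reads are hypothetical and are not part of TMAP.

```python
from itertools import groupby


def homopolymer_runs(seq):
    """Return (base, run_length) pairs for each run of identical bases."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


# Hypothetical example: the true fragment has a run of four T's,
# but the called read reports only three (an under-call).
true_read = "ACGTTTTAG"
called_read = "ACGTTTAG"

print(homopolymer_runs(true_read))
print(homopolymer_runs(called_read))
```

Both reads contain the same sequence of runs and differ only in the length of one run, so the called read is one base shorter and an aligner must open a gap to reconcile it with the reference.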

As part of Torrent Suite Software on Torrent Server, TMAP performs alignment in the primary analysis pipeline. The TMAP source code is available on the Torrent Dev community Web site under the GPLv2 license, and can be compiled on any standard *nix system for use outside the Torrent Suite Software primary analysis pipeline.

Implementation

TMAP provides a fast and accurate aligner through the integration of a novel alignment algorithm and three popular algorithms:

BWA-short (Li and Durbin, 2009)
BWA-long (Li and Durbin, 2010)
SSAHA (Ning et al., 2001)
Super-maximal Exact Matching (Li, 2012)

The overall alignment strategy identifies a list of Candidate Mapping Locations (CMLs) using a subset of these algorithms. The CMLs are then aligned using the Smith-Waterman algorithm (Smith and Waterman, 1981). The resulting alignments are aggregated to find the best mapping(s), and a user-defined parameter determines whether all alignments, a subset of alignments, or a random best alignment is reported.
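
The strategy above can be sketched as follows: each candidate location is scored with a Smith-Waterman local alignment against a window of the reference, and the top-scoring candidates are kept. This is a simplified illustration, not TMAP's implementation. The function names, the linear gap penalty, the scoring values, and the window padding are all assumptions chosen for the example.

```python
def smith_waterman_score(read, ref, match=1, mismatch=-3, gap=-5):
    """Best local alignment score of read vs. ref (linear gap penalty)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_candidates(read, reference, candidates, window_pad=5):
    """Score the read at each candidate start; return (top score, best starts)."""
    if not candidates:
        return 0, []
    scored = []
    for start in candidates:
        lo = max(0, start - window_pad)
        hi = min(len(reference), start + len(read) + window_pad)
        scored.append((smith_waterman_score(read, reference[lo:hi]), start))
    top = max(score for score, _ in scored)
    return top, [start for score, start in scored if score == top]


# Hypothetical reference with an exact copy of the read at 4
# and a copy carrying one mismatch at 23.
reference = "TTTTACGTACGGATTTTTTTTTTACGTTCGGATTTT"
read = "ACGTACGGA"
print(best_candidates(read, reference, [4, 23]))
```

When several candidates tie for the top score, this sketch returns all of them, which leaves the caller free to report every alignment, a subset, or a random best one.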

TMAP creates an efficient index using a compressed suffix array to quickly and compactly index and query the genome reference. The index is created for both the forward and reverse references, which benefits some alignment algorithms. The final index for a human hg19 reference is 4.7 GB in size and takes less than four hours to create.
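
As a rough illustration of how a suffix array supports fast queries, the sketch below builds a plain, uncompressed suffix array and finds every occurrence of a pattern by binary search. It is not TMAP's index: the compression that keeps a genome-scale index compact is omitted, the naive construction is only practical for short strings, and all names are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of pattern in text via binary search."""
    n = len(pattern)

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + n]

    # Lower bound: first suffix whose n-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


text = "GATTACAGATTACA"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "TTA"))
```

Because all suffixes that begin with the pattern are adjacent in the sorted order, a query needs only two binary searches rather than a scan of the whole reference.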

For these evaluations, a Mac Pro running Mac OS X (10.6.6) with dual 6-core Intel Xeon processors (2.66 GHz) and 32 GB of 1066 MHz DDR RAM was used. Up to 24 threads could be used because of the hyper-threading capability of these processors.

TMAP implements a two-stage mapping approach to maintain sensitivity and specificity while significantly reducing runtime. In two-stage mapping, reads that do not align during the first stage are passed to the second stage with a new set of algorithms and/or parameters.
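
The two-stage flow can be sketched as below, assuming each stage is simply a function that returns a mapping position or None. The two toy mappers (exact matching, then matching with a bounded number of mismatches) are hypothetical stand-ins for TMAP's algorithms and parameter sets, chosen only to show a fast, strict first stage backed by a slower, more permissive second stage.

```python
def exact_mapper(reference):
    """Stage-one stand-in: exact substring match only."""
    def mapper(read):
        pos = reference.find(read)
        return pos if pos >= 0 else None
    return mapper


def mismatch_mapper(reference, max_mismatches):
    """Stage-two stand-in: first position within max_mismatches substitutions."""
    def mapper(read):
        for pos in range(len(reference) - len(read) + 1):
            window = reference[pos:pos + len(read)]
            if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
                return pos
        return None
    return mapper


def two_stage_map(reads, stage_one, stage_two):
    """Try stage_one on every read; pass only the failures to stage_two."""
    mapped, unmapped = {}, []
    for name, seq in reads.items():
        hit = stage_one(seq)
        if hit is None:
            hit = stage_two(seq)
        if hit is None:
            unmapped.append(name)
        else:
            mapped[name] = hit
    return mapped, unmapped


reference = "GATTACAGATCCGTA"
reads = {"r1": "TTACAG", "r2": "GATCAGTA", "r3": "CCCCCCCC"}
mapped, unmapped = two_stage_map(
    reads, exact_mapper(reference), mismatch_mapper(reference, 1)
)
print(mapped, unmapped)
```

Only reads that fail the inexpensive first stage incur the cost of the second, which is how a two-stage design can reduce runtime without giving up sensitivity.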

The implementation also supports easy integration of other mapping algorithms that can utilize the TMAP index.

Advantages

TMAP has key advantages over other alignment tools:

The re-implemented versions of BWA-short, BWA-long, and SSAHA are significantly optimized to deal with the varied read lengths and error profiles that are specific to Ion Torrent Systems. TMAP is therefore expected to perform significantly better on Ion Torrent data than the original algorithms. When you run TMAP, you can use all the algorithms, a subset, or only one algorithm, for greater flexibility, efficiency, and accuracy.
Because TMAP can combine the CMLs from all three methods and identify the best alignment, the final accuracy and specificity benefit from the advantages of each individual algorithm. Nonetheless, each algorithm is optimized and can be used on its own.
TMAP has been optimized for performance and, because the TMAP index is shared between all algorithms, the overall performance (both CPU load and RAM utilization) is comparable to other popular alignment software. The performance when combining multiple algorithms is equal to or better than the total performance of running each algorithm separately.
With a framework that quickly integrates other algorithms, TMAP is a versatile tool for fast and accurate mapping of Ion Torrent data.

Best Practices

The currently recommended algorithm is map4. This algorithm runs quickly while maintaining a high degree of sensitivity and specificity. For maximum sensitivity and specificity at the cost of running time, use all algorithms or a subset. For maximum speed, use the map4 algorithm with the stage option --stage-seed-freq-cutoff=0.1.

References

Li, H, Durbin, R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 14:1754-60.
Li, H, Durbin, R (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 5:589-95.
Ning, Z, Cox, AJ, Mullikin, JC (2001). SSAHA: a fast search method for large DNA databases. Genome Res., 11, 10:1725-9.
Smith, TF, Waterman, MS (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147, 1:195-197.
Li, H (2012). Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics, 28, 14:1838-1844.