Torrent Suite User Documentation: Mapall

The mapall command quickly maps short sequences to a reference genome. It combines the available mapping algorithms for fast and sensitive alignment. The command follows a multi-stage approach, with a set of algorithms and associated settings for each stage. If applying the algorithms in the i-th stage produces no mappings for a read that pass the filters, the algorithms in the (i+1)-th stage are applied. For example, a set of algorithms that quickly align near-perfect reads may be used in the first stage, while a set of sensitive algorithms may be used to map difficult reads in the second stage.
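The stage-by-stage fallback can be sketched in Python as below. This is a minimal illustration, not TMAP's implementation: the names `map_read_multistage`, `fast`, `sensitive`, and `good`, and the representation of a mapping as a dict with a `score` field, are all hypothetical. Following the option descriptions under Usage, stage filters are only applied while a later stage remains.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a mapping is a dict of attributes, and an "algorithm"
# is any function that returns candidate mappings for a read.
Mapping = Dict[str, float]
Algorithm = Callable[[str], List[Mapping]]


def map_read_multistage(read: str,
                        stages: Sequence[Sequence[Algorithm]],
                        passes_filters: Callable[[Mapping], bool]) -> List[Mapping]:
    """Try each stage in order; stop at the first stage that yields mappings."""
    for i, algorithms in enumerate(stages):
        # Every algorithm in the stage contributes candidate mappings.
        hits = [m for algorithm in algorithms for m in algorithm(read)]
        # Filters only discard mappings when a later stage can still be tried.
        if i < len(stages) - 1:
            hits = [m for m in hits if passes_filters(m)]
        if hits:
            return hits
    return []


# Toy algorithms: a "fast" one that gives up on reads containing N,
# and a "sensitive" one that always returns a (low-scoring) mapping.
def fast(read: str) -> List[Mapping]:
    return [{"score": 30.0}] if "N" not in read else []


def sensitive(read: str) -> List[Mapping]:
    return [{"score": 12.0}]


def good(m: Mapping) -> bool:
    return m["score"] >= 20.0


stages = [[fast], [sensitive]]
print(map_read_multistage("ACGT", stages, good))  # [{'score': 30.0}]
print(map_read_multistage("ACNT", stages, good))  # [{'score': 12.0}]
```

The second read falls through to the second stage because the first stage finds nothing for it.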

Using mapvsw for large-scale mapping projects is not recommended.

Overview

First, a set of global options is specified; these are used by the algorithms applied in all stages. Global options should not be repeated in the options for a specific stage or algorithm. Next, a stage is specified together with its associated options, which are applied across all algorithms in that stage. Stage-specific options should not be repeated in the options for any algorithm in that stage. The stage name is given as "stage%d", where "%d" is the one-based stage number. Finally, the algorithms and their associated options are specified for the given stage.
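The ordering of global options, stage names, and algorithms can be illustrated with a small argument splitter. This is a hypothetical sketch, not TMAP's own parser: `parse_mapall_args` is an invented name, and it assumes that any token matching "stage%d" starts a new stage and that the algorithm names mentioned in this document (map1, map2, map3, mapvsw) start a new algorithm within the current stage.

```python
import re

# Assumption: only the algorithm names that appear in this document are recognized.
ALGORITHMS = {"map1", "map2", "map3", "mapvsw"}


def parse_mapall_args(args):
    """Split mapall arguments into global options and a list of stages (hypothetical)."""
    global_opts = []
    stages = []
    current = global_opts  # tokens go here until the first "stage%d" token
    for tok in args:
        if re.fullmatch(r"stage\d+", tok):
            stages.append({"stage": int(tok[len("stage"):]), "options": [], "algorithms": []})
            current = stages[-1]["options"]  # stage options follow the stage name
        elif tok in ALGORITHMS and stages:
            stages[-1]["algorithms"].append({"name": tok, "options": []})
            current = stages[-1]["algorithms"][-1]["options"]  # algorithm options follow its name
        else:
            current.append(tok)
    return global_opts, stages


global_opts, stages = parse_mapall_args(
    "-f ref.fasta -r reads.fastq stage1 map1 stage2 map2 map3".split())
print(global_opts)                                             # ['-f', 'ref.fasta', '-r', 'reads.fastq']
print([[a["name"] for a in s["algorithms"]] for s in stages])  # [['map1'], ['map2', 'map3']]
```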

Example:

       tmap mapall -f ref.fasta -r reads.fastq -g 1 -M 3 \
           stage1 --stage-keep-all map1 --seed-length 12 --seed-max-diff 4 \
           stage2 map2 --z-best 5 map3 --max-seed-hits 10

In this case, the global options -f ref.fasta -r reads.fastq -g 1 -M 3 are applied to all stages and algorithms. The map1 algorithm is therefore applied in the first stage with the options -g 1 -M 3 --seed-length 12 --seed-max-diff 4. If no mapping is found for a read, the second stage applies the map2 algorithm with the options -g 1 -M 3 --z-best 5 and the map3 algorithm with the options -g 1 -M 3 --max-seed-hits 10. Because stage1 is given --stage-keep-all, its mappings are also carried into the second stage as candidates. Notice how the global options -g 1 -M 3 are applied to all the algorithms where applicable. The same algorithm can be run in two different stages.
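The option composition in this example can be reproduced with a short sketch that prepends the global options to each algorithm's own options, while stage options such as --stage-keep-all stay with the stage. The data layout and the name `effective_options` are hypothetical, and the sketch ignores the "where applicable" caveat (it does not check whether an algorithm accepts a given option). The per-algorithm lists in the text show only -g 1 -M 3, so the shared input options -f and -r are left out here as well.

```python
# Hypothetical representation of the example command.
# -f ref.fasta -r reads.fastq are also global, but omitted to mirror the text's lists.
GLOBAL = ["-g", "1", "-M", "3"]
STAGES = {
    1: {"stage_options": ["--stage-keep-all"],
        "algorithms": {"map1": ["--seed-length", "12", "--seed-max-diff", "4"]}},
    2: {"stage_options": [],
        "algorithms": {"map2": ["--z-best", "5"], "map3": ["--max-seed-hits", "10"]}},
}


def effective_options(global_opts, stages):
    """Return {(stage, algorithm): options}, with global options prepended.

    Stage options are not copied into the algorithm options.
    """
    return {(n, name): " ".join(global_opts + opts)
            for n, stage in sorted(stages.items())
            for name, opts in stage["algorithms"].items()}


for (n, name), opts in effective_options(GLOBAL, STAGES).items():
    print(f"stage{n} {name}: {opts}")
# stage1 map1: -g 1 -M 3 --seed-length 12 --seed-max-diff 4
# stage2 map2: -g 1 -M 3 --z-best 5
# stage2 map3: -g 1 -M 3 --max-seed-hits 10
```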

The recommended option values are set by default, so the following command line is recommended:

       tmap mapall -f <in.fasta> -r <in.fastq> -n <num.threads> -v \
           stage1 map1 map2 map3 > <out.sam>
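When mapall is driven from a Python pipeline, the recommended command line can be built as an argument list and the SAM output redirected from standard output. This is a sketch under assumptions: `build_mapall_command` is a hypothetical helper, and it uses only the options shown in the recommended command above. The commented-out run step requires tmap on the PATH.

```python
import shlex


def build_mapall_command(ref_fasta, reads_fastq, threads, verbose=True):
    """Return argv for the recommended mapall invocation (SAM is written to stdout)."""
    cmd = ["tmap", "mapall", "-f", ref_fasta, "-r", reads_fastq, "-n", str(threads)]
    if verbose:
        cmd.append("-v")
    cmd += ["stage1", "map1", "map2", "map3"]
    return cmd


cmd = build_mapall_command("ref.fasta", "reads.fastq", 8)
print(shlex.join(cmd))
# tmap mapall -f ref.fasta -r reads.fastq -n 8 -v stage1 map1 map2 map3

# To run it and write the SAM output to a file (requires tmap on PATH):
# import subprocess
# with open("out.sam", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.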

Usage

The following options apply to a single stage and control how mappings are handled between stages.

       --stage-score-thres INT

Specifies the minimum alignment score for the current stage, as a multiple of the match score (-A). An alignment is filtered in the current stage if its alignment score is less than this threshold and there are more stages to process.
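As a worked example with hypothetical values -A 2 and --stage-score-thres 10, the minimum score for the stage is 10 × 2 = 20, so an alignment scoring 15 is filtered unless this is the last stage. The sketch below applies that rule; `stage_score_filter` is an invented name, and alignments are represented only by their scores.

```python
def stage_score_filter(scores, score_thres, match_score, more_stages):
    """Drop alignments scoring below score_thres * match_score, unless this is the last stage."""
    if not more_stages:
        return list(scores)  # the final stage does not filter by score
    min_score = score_thres * match_score
    return [s for s in scores if s >= min_score]


print(stage_score_filter([15, 20, 42], score_thres=10, match_score=2, more_stages=True))   # [20, 42]
print(stage_score_filter([15, 20, 42], score_thres=10, match_score=2, more_stages=False))  # [15, 20, 42]
```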

       --stage-mapq-thres INT

Specifies the mapping quality threshold for the current stage. An alignment is filtered in the current stage if its mapping quality is less than this threshold and there are more stages to process.
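The mapping-quality rule can be sketched the same way on alignment records. The `Alignment` type and its fields, and the function name `stage_mapq_filter`, are hypothetical; only the comparison against the threshold and the "more stages to process" condition come from the description above.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    # Hypothetical fields for illustration only.
    contig: str
    pos: int
    mapq: int


def stage_mapq_filter(alignments: List[Alignment], mapq_thres: int,
                      more_stages: bool) -> List[Alignment]:
    """Keep alignments with mapq >= mapq_thres; the last stage keeps everything."""
    if not more_stages:
        return list(alignments)
    return [a for a in alignments if a.mapq >= mapq_thres]


hits = [Alignment("chr1", 100, 3), Alignment("chr2", 500, 37)]
print(stage_mapq_filter(hits, mapq_thres=10, more_stages=True))
# [Alignment(contig='chr2', pos=500, mapq=37)]
```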

       --stage-keep-all

Specifies that mappings from the current stage are kept for the next stage. If this option is given, the mappings from the previous stage are added as candidates to any mappings from the subsequent stage.
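The effect of --stage-keep-all on the stage loop can be sketched as follows. This is an illustration under assumptions, not TMAP's code: the tuple layout `(algorithms, keep_all, passes_filters)` and the names `run_stages`, `weak1`, `weak2`, and `strict` are hypothetical. The sketch also assumes that carried candidates accumulate when several consecutive stages keep all mappings.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a mapping is a dict of attributes and an algorithm
# returns candidate mappings for a read.
Mapping = Dict[str, float]
Algorithm = Callable[[str], List[Mapping]]
Stage = Tuple[Sequence[Algorithm], bool, Callable[[Mapping], bool]]


def run_stages(read: str, stages: Sequence[Stage]) -> List[Mapping]:
    """Each stage is (algorithms, keep_all, passes_filters)."""
    carried: List[Mapping] = []
    for i, (algorithms, keep_all, passes) in enumerate(stages):
        # Candidates: mappings carried from the previous stage plus this stage's mappings.
        candidates = carried + [m for alg in algorithms for m in alg(read)]
        if i == len(stages) - 1:
            return candidates  # last stage: nothing is filtered
        kept = [m for m in candidates if passes(m)]
        if kept:
            return kept
        # --stage-keep-all: pass this stage's candidates on to the next stage.
        carried = candidates if keep_all else []
    return []


def weak1(read: str) -> List[Mapping]:
    return [{"score": 10.0}]


def weak2(read: str) -> List[Mapping]:
    return [{"score": 12.0}]


def strict(m: Mapping) -> bool:
    return m["score"] >= 20.0


print(run_stages("ACGT", [([weak1], True, strict), ([weak2], False, strict)]))
# [{'score': 10.0}, {'score': 12.0}]
print(run_stages("ACGT", [([weak1], False, strict), ([weak2], False, strict)]))
# [{'score': 12.0}]
```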

       --stage-seed-freq-cutoff FLOAT

Specifies the minimum frequency with which a seed must occur in order to be considered for mapping.

       --stage-seed-freq-cutoff-group-frac FLOAT

Specifies a fraction of groups: if more than this fraction of groups are filtered, representative hits are kept.

       --stage-seed-freq-cutoff-rand-repr INT

Specifies the number of representative hits to keep. In addition, if this option is greater than zero, the group with the most seeds is always kept.

       --stage-seed-freq-cutoff-min-groups INT

Specifies the minimum number of groups required after the filter has been applied; if fewer groups remain, the filter is iteratively reduced.
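The four seed-frequency options can be read as one filtering step, sketched below. Several details are assumptions rather than documented behavior: a group's frequency is taken as the fraction of the read's seeds that fall in it, the filter is "reduced" by halving the cutoff, the representative hits are sampled at random from the filtered groups, and the min-groups relaxation runs before representatives are added. The function name `filter_seed_groups` and the group representation are hypothetical.

```python
import random
from typing import Dict, Optional, Set, Tuple


def filter_seed_groups(groups: Dict[str, int], n_seeds: int, freq_cutoff: float,
                       group_frac: float, rand_repr: int, min_groups: int,
                       rng: Optional[random.Random] = None) -> Tuple[Set[str], float]:
    """groups maps a (hypothetical) group id to the number of read seeds in that group."""
    rng = rng or random.Random(0)
    cutoff = freq_cutoff
    while True:
        # Assumed frequency: seeds in the group divided by all seeds of the read.
        kept = {g for g, n in groups.items() if n / n_seeds >= cutoff}
        # --stage-seed-freq-cutoff-min-groups: relax the filter until enough groups survive.
        if len(kept) >= min_groups or cutoff == 0.0:
            break
        cutoff = cutoff / 2 if cutoff > 1e-6 else 0.0  # assumed halving schedule
    filtered = sorted(set(groups) - kept)
    # --stage-seed-freq-cutoff-group-frac / -rand-repr: too many groups filtered,
    # so keep some randomly chosen representatives plus the group with the most seeds.
    if rand_repr > 0 and filtered and len(filtered) > group_frac * len(groups):
        kept.update(rng.sample(filtered, min(rand_repr, len(filtered))))
        kept.add(max(groups, key=groups.get))
    return kept, cutoff


groups = {"a": 8, "b": 1, "c": 1, "d": 1}
kept, cutoff = filter_seed_groups(groups, 11, 0.2, group_frac=0.5, rand_repr=1, min_groups=1)
print(len(kept), "a" in kept, cutoff)  # 2 True 0.2
kept, cutoff = filter_seed_groups(groups, 11, 0.2, group_frac=0.5, rand_repr=0, min_groups=4)
print(sorted(kept), cutoff)  # ['a', 'b', 'c', 'd'] 0.05
```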

       --stage-seed-max-length INT

Specifies the length of the read prefix to consider during seeding.
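To illustrate the prefix restriction, the sketch below extracts seeds only from the first bases of a read. Overlapping k-mers stand in for TMAP's actual seeding scheme, and treating a non-positive length as "use the whole read" is an assumption; `seeds_from_prefix` and `k` are hypothetical names.

```python
from typing import List


def seeds_from_prefix(read: str, seed_max_length: int, k: int) -> List[str]:
    """Return overlapping k-mer seeds taken only from the first seed_max_length bases."""
    # Assumption: a non-positive length means the whole read is used for seeding.
    prefix = read if seed_max_length <= 0 else read[:seed_max_length]
    return [prefix[i:i + k] for i in range(len(prefix) - k + 1)]


print(seeds_from_prefix("ACGTACGTAA", seed_max_length=6, k=4))        # ['ACGT', 'CGTA', 'GTAC']
print(len(seeds_from_prefix("ACGTACGTAA", seed_max_length=-1, k=4)))  # 7
```

Limiting seeding to a prefix keeps the work per read bounded, which matters for long reads whose ends are more error-prone.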