The mapall command quickly maps short sequences to a reference genome. It combines the
available mapping algorithms for fast and sensitive alignment. Mapping follows a multi-stage
approach, with a set of algorithms and associated settings for each stage. If the algorithms
in the i-th stage produce no mappings for a read that pass the filters, then the algorithms
in the (i+1)-th stage are applied. For example, a set of algorithms to quickly align
near-perfect reads may be used in the first stage, while a set of sensitive algorithms may
be used to map difficult reads in the second stage.
It is recommended that mapvsw not be used for any large-scale mapping project.
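The stage fall-through described above can be sketched in a few lines of Python. This is an
illustrative model only, not TMAP's implementation: the names map_read_in_stages, Mapping,
Algorithm and passes_filters are hypothetical, and a mapping is represented as a plain
dictionary.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types for illustration; they are not part of TMAP.
Mapping = Dict[str, int]
Algorithm = Callable[[str], List[Mapping]]


def map_read_in_stages(
    read: str,
    stages: Sequence[Sequence[Algorithm]],
    passes_filters: Callable[[Mapping], bool],
) -> List[Mapping]:
    """Return the mappings from the first stage that yields any passing mapping."""
    for algorithms in stages:
        # Run every algorithm configured for this stage on the read.
        candidates = [m for algorithm in algorithms for m in algorithm(read)]
        kept = [m for m in candidates if passes_filters(m)]
        if kept:
            return kept  # later stages are skipped for this read
    return []  # no stage produced a mapping that passed the filters
```

In this model a read that is mapped acceptably by the fast first-stage algorithms never
reaches the slower, more sensitive algorithms of a later stage.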
Overview
First, a set of global options is specified that will be used for the algorithms applied at
all stages. Global options should not be given in the options for a specific stage or algorithm.
Next, a stage is specified together with its associated options, which are applied across the
algorithms in that stage. Stage-specific options should not be given in the options for any
algorithm in that stage. The stage name should be specified as "stage%d", where "%d" is the
stage number (one-based). Finally, the algorithms and their associated options are specified
for the given stage.
Example:
tmap mapall -f ref.fasta -r reads.fastq -g 1 -M 3 stage1
--stage-keep-all map1 --seed-length 12 --seed-max-diff 4
stage2 map2 --z-best 5 map3 --max-seed-hits 10
In this case, the global options -f ref.fasta -r reads.fastq -g 1 -M 3 will be
applied to all stages and algorithms. Therefore, the map1 algorithm will be applied with the
options -g 1 -M 3 --seed-length 12 --seed-max-diff 4 in the first stage. If no
mapping is found for a read, the map2 algorithm with the options -g 1 -M 3 --z-best
5 and the map3 algorithm with the options -g 1 -M 3 --max-seed-hits 10 will be
applied in the second stage. Notice how the global options -g 1 -M 3 are applied to all the
algorithms where applicable. It is possible to have the same algorithm run in two different
stages.
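To make the grouping of options concrete, the following Python sketch splits the arguments
that follow "tmap mapall" into global, per-stage and per-algorithm options. It is a simplified
illustration under stated assumptions, not TMAP's actual argument parser: the function name
split_mapall_args is hypothetical, only the algorithm names mentioned in this section are
recognised, and an option value that happens to look like a stage or algorithm name would be
misclassified.

```python
import re

# Algorithm names mentioned in this section; assumed sufficient for the sketch.
ALGORITHMS = {"map1", "map2", "map3", "mapvsw"}
STAGE_RE = re.compile(r"^stage(\d+)$")


def split_mapall_args(tokens):
    """Split the tokens after 'tmap mapall' into global options and stages."""
    global_opts = []
    stages = []  # each: {"number": int, "options": [...], "algorithms": [(name, [opts])]}
    for tok in tokens:
        match = STAGE_RE.match(tok)
        if match:
            # A "stage%d" token opens a new stage.
            stages.append({"number": int(match.group(1)), "options": [], "algorithms": []})
        elif stages and tok in ALGORITHMS:
            # An algorithm name opens a new algorithm within the current stage.
            stages[-1]["algorithms"].append((tok, []))
        elif not stages:
            global_opts.append(tok)  # before the first stage: global option
        elif stages[-1]["algorithms"]:
            stages[-1]["algorithms"][-1][1].append(tok)  # belongs to the last algorithm
        else:
            stages[-1]["options"].append(tok)  # stage-level option
    return global_opts, stages


example = (
    "-f ref.fasta -r reads.fastq -g 1 -M 3 stage1 --stage-keep-all "
    "map1 --seed-length 12 --seed-max-diff 4 "
    "stage2 map2 --z-best 5 map3 --max-seed-hits 10"
).split()
global_opts, stages = split_mapall_args(example)
```

For the example command, global_opts holds the options given before stage1, the first stage
carries --stage-keep-all with map1 and its two seed options, and the second stage carries
map2 and map3 with one option each.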
The recommended values are pre-set, and the following command line is recommended:
tmap mapall -f <in.fasta> -r <in.fastq> -n <num.threads> -v
stage1 map1 map2 map3 > <out.sam>
Usage
The following options apply to a single stage, and control how mappings are handled between
stages.
--stage-score-thres INT
Specifies the minimum scoring threshold for the current stage as a multiple of the match
score (-A). An alignment is filtered in the given stage if its alignment score is less than
this threshold and there are more stages to process.
--stage-mapq-thres INT
Specifies the mapping quality threshold for the current stage. An alignment is filtered in the
current stage if it has a mapping quality less than this threshold, and there are more stages
to process.
--stage-keep-all
Specifies that mappings from the current stage are kept for the next stage. If this option is
given, the mappings from the previous stage are added to any mappings from the subsequent
stage as candidates.
--stage-seed-freq-cutoff FLOAT
Specifies the minimum frequency at which a seed must occur in order to be considered for mapping.
--stage-seed-freq-cutoff-group-frac FLOAT
Specifies a fraction of groups: if more than this fraction of groups is filtered,
representative hits are kept.
--stage-seed-freq-cutoff-rand-repr INT
Specifies the number of representative hits to keep. In addition, the group with the most
seeds will always be kept if this option has a value greater than zero.
--stage-seed-freq-cutoff-min-groups INT
Specifies the minimum number of groups required after the filter has been applied; otherwise
the filter is iteratively reduced.
--stage-seed-max-length INT
Specifies the length of the prefix of the read to consider during seeding.
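As an illustration of the two threshold options at the top of this list (--stage-score-thres
and --stage-mapq-thres), the Python sketch below models the per-stage filter as described:
the score threshold is the given multiple of the match score, and neither threshold filters
anything once there are no more stages to process. The Alignment class, the function names
and the numbers used are hypothetical; they are not TMAP internals or TMAP defaults.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical alignment record holding only the two values the filter needs."""
    score: int
    mapq: int


def min_stage_score(stage_score_thres: int, match_score: int) -> int:
    """Minimum alignment score: a multiple of the match score (-A)."""
    return stage_score_thres * match_score


def passes_stage_filters(
    aln: Alignment,
    match_score: int,
    stage_score_thres: int,
    stage_mapq_thres: int,
    is_last_stage: bool,
) -> bool:
    """Return True if the alignment is kept in the current stage."""
    if is_last_stage:
        # With no more stages to process, the stage thresholds do not filter.
        return True
    if aln.score < min_stage_score(stage_score_thres, match_score):
        return False  # filtered by the score threshold
    if aln.mapq < stage_mapq_thres:
        return False  # filtered by the mapping quality threshold
    return True
```

With an assumed match score of 5 and a score threshold multiple of 8, an alignment must score
at least 40 to survive any stage other than the last.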