Technical Note
The Per-Base Quality Score System
Overview
The Ion Torrent per-base quality score system uses a Phred-like method to predict the probability that a base call is correct. The prediction is based on the quality of the base incorporation signal used to generate the base calls. The quality score system of the Personal Genome Machine (PGM) and Ion Proton sequencers uses a set of six predictors whose values are correlated with the probability of a base miscall.
A Phred lookup table converts the predictor values to error probabilities. The lookup table is generated by training on a representative data set in a customer configuration. It is retrained for each software release and ships as part of the software package.
Quality scores are published in the BAM file.
Quality Score Predictors
Torrent software uses the following six predictors that are correlated with empirical base call quality:
P1 | Penalty Residual: A penalty based on the difference between predicted and actual flow values. Computed by the base caller.
P2 | Local Noise: Noise (defined as the maximum absolute difference between the flow value and the nearest integer) in the immediate neighborhood (plus/minus 1 base) of the given base.
P3 | High-Residual Events: Number of high-residual flows in the 20-flow window around the flow containing the base. A flow has a high residual when the normalized difference between the observed and model-predicted signal exceeds 0.4 or falls below -0.4. The more high-residual flows in the window, the lower the quality of the base call.
P4 | Multiple Incorporations: Number of incorporated bases in this flow, that is, the length of the homopolymer. For multiple incorporations of the same nucleotide in one flow, the last base in the incorporation order is assigned a value equal to the total number of incorporations. All other bases in the sequence of multiple incorporations are assigned the value 1.
P5 | Environment Noise: The average signal noise (defined as the absolute difference between the flow value and the nearest integer) in the neighborhood (plus/minus 5 bases) of the given base.
P6 | State Inphase: Live polymerase in phase.
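The noise, residual, and incorporation predictors described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Torrent base caller's code: the function names, the input layout (one flow value per base, in base order), the symmetric treatment of the 0.4 residual threshold, and the way the 20-flow window is centered are all assumptions made for this example.

```python
from typing import List, Sequence


def flow_noise(value: float) -> float:
    """Absolute difference between a flow value and the nearest integer."""
    return abs(value - round(value))


def local_noise(values: Sequence[float], i: int, radius: int = 1) -> float:
    """P2-style predictor: maximum noise within +/- radius bases of base i."""
    lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
    return max(flow_noise(v) for v in values[lo:hi])


def environment_noise(values: Sequence[float], i: int, radius: int = 5) -> float:
    """P5-style predictor: average noise within +/- radius bases of base i."""
    lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
    window = values[lo:hi]
    return sum(flow_noise(v) for v in window) / len(window)


def high_residual_events(residuals: Sequence[float], flow: int,
                         window: int = 20, threshold: float = 0.4) -> int:
    """P3-style predictor: count flows near `flow` with a large residual.

    Assumption: the window covers flow - window/2 up to (not including)
    flow + window/2, clipped to the ends of the read.
    """
    half = window // 2
    lo, hi = max(0, flow - half), min(len(residuals), flow + half)
    return sum(1 for r in residuals[lo:hi] if abs(r) > threshold)


def multiple_incorporations(homopolymer_length: int) -> List[int]:
    """P4-style predictor: 1 for every base except the last in the
    homopolymer, which receives the total number of incorporations."""
    return [1] * (homopolymer_length - 1) + [homopolymer_length]
```

For example, a three-base homopolymer yields the P4 values `[1, 1, 3]`, so only the last base of the run carries the penalty for its length.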
Lookup Tables
The six quality predictors are calculated for each base. Other predictors (not described here) are computed from the corrected flow values generated by the base caller.
The corresponding per-base quality value is located by finding the first line in the lookup table for which all six calculated predictors are less than or equal to the predictor values in the table. This process occurs automatically as part of the standard analysis.
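The first-match rule can be illustrated with a short sketch. The in-memory table layout, the threshold values, and the `lookup_quality` name are hypothetical and chosen only for this example; the actual on-disk format of the Phred tables is not described here, and returning `None` when no row matches is likewise an assumption.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: each row pairs six predictor thresholds with a quality value.
PhredRow = Tuple[Sequence[float], int]


def lookup_quality(predictors: Sequence[float],
                   table: List[PhredRow]) -> Optional[int]:
    """Return the quality of the first row whose thresholds are all >= the
    calculated predictors, or None if no row matches."""
    for thresholds, quality in table:
        if all(p <= t for p, t in zip(predictors, thresholds)):
            return quality
    return None


# Made-up thresholds, ordered from strictest (highest quality) to loosest.
example_table: List[PhredRow] = [
    ((0.1, 0.1, 0, 1, 0.1, 0.1), 30),
    ((0.5, 0.5, 2, 3, 0.5, 0.5), 20),
    ((9.0, 9.0, 9, 9, 9.0, 9.0), 5),
]

quality = lookup_quality((0.2, 0.05, 1, 1, 0.1, 0.1), example_table)
```

Because the scan stops at the first matching row, the order of rows matters: in this sketch stricter rows with higher quality values come first, so a base receives the best quality for which all six predictors qualify.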
The Phred lookup tables are stored in the /opt/ion/config directory on Torrent Server. Torrent Server supports separate Phred tables for each type of chip (Ion 314 Chip, Ion 316 Chip, Ion 318 Chip, Ion 900 Chip, Ion PI Chip, and Ion PII Chip), named phredTable.314, phredTable.316, phredTable.318, phredTable.900, phredTable.p1.1.17, and phredTable.p2.1, respectively.
Results
The per-base quality along with all other read information is written to the unmapped BAM file.
The per-base quality scores are reported in the QUAL field.
The quality scores are on a Phred scale: -10*log_10(error rate).
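As a worked illustration of the Phred scale, the sketch below converts between quality scores and error rates. The function names are hypothetical. The `decode_qual_string` helper assumes the text (SAM) representation, where each QUAL character is the Phred score plus 33; in a binary BAM record the scores are stored as raw byte values, so no offset applies there.

```python
import math
from typing import List


def phred_to_error_rate(q: float) -> float:
    """Invert Q = -10 * log10(error rate)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(error_rate: float) -> float:
    """Q = -10 * log10(error rate)."""
    return -10 * math.log10(error_rate)


def decode_qual_string(qual: str, offset: int = 33) -> List[int]:
    """Decode a SAM-style QUAL string (ASCII, Phred+33) into integer scores."""
    return [ord(c) - offset for c in qual]


scores = decode_qual_string("5?I")
error_rates = [phred_to_error_rate(q) for q in scores]
```

On this scale a score of 20 corresponds to an error rate of 1 in 100, and a score of 30 to 1 in 1,000.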
References
- Brockman et al. (2008): "Quality scores and SNP detection in sequencing-by-synthesis systems." Genome Res. 18:763-770.
- Ewing B, Hillier L, Wendl MC, Green P. (1998): "Base-calling of automated sequencer traces using phred. I. Accuracy assessment." Genome Res. 8(3):175-185.
- Ewing B, Green P. (1998): "Base-calling of automated sequencer traces using phred. II. Error probabilities." Genome Res. 8(3):186-194.