Use Cases -- Barcode Classification


Torrent Suite Software space on Ion Community


Troubleshooting Barcode Classification Issues

Barcode classification metrics are available in the file basecaller_results/datasets_basecaller.json in the Torrent Suite Software analysis directory.

This file contains information about all barcodes, whether they appear in the run report or are filtered out. It also shows how many reads would be kept or discarded for each barcode if you reanalyze with changed BaseCaller settings.

The following sample entry from this file is used in the examples later on this page:

"IEXL3.IonXpress_033": {
    "Q20_bases": 98859279,
    "barcode_adapter": "GAT",
    "barcode_bias": [ 0.026, -0.028, -0.034, 0.011, -0.019, -0.001, 0.072,
                     -0.061, 0.103, -0.008, -0.062, 0.110, -0.021, 0.001],
    "barcode_distance_hist": [ 907546, 50122, 10793, 4498, 5342 ],
    "barcode_errors_hist": [ 949782, 24584, 3935 ],
    "barcode_match_filtered": 162,
    "barcode_name": "IonXpress_033",
    "barcode_sequence": "TTCTCATTGAAC",
    "description": "1T 058a0112 Lib6457 0bp lr2 lr226b04",
    "filtered": false,
    "index": 33,
    "library": "hg19/IonXpress_033",
    "platform_unit": "PGM/318/IonXpress_033",
    "read_count": 978301,
    "recalibrate": true,
    "sample": "None",
    "total_bases": 109292583
},
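To see every barcode at once, including those filtered out of the run report, you can load the file with a short script. The following Python sketch assumes that per-barcode entries such as "IEXL3.IonXpress_033" sit under a top-level "read_groups" key; that key name and the helper names load_read_groups and summarize are assumptions for illustration, so check them against your own file.

```python
import json
from typing import Any


def load_read_groups(path: str) -> dict[str, dict[str, Any]]:
    """Return the per-barcode entries from datasets_basecaller.json.

    Assumption: entries like "IEXL3.IonXpress_033" are stored under a
    top-level "read_groups" key; adjust this if your file differs.
    """
    with open(path) as fh:
        data = json.load(fh)
    return data["read_groups"]


def summarize(read_groups: dict[str, dict[str, Any]]) -> list[tuple[str, int, bool]]:
    """List (barcode_name, read_count, filtered) for every barcode,
    including barcodes that are filtered out of the run report."""
    rows = []
    for key, entry in read_groups.items():
        # Fall back to the read-group key if an entry has no barcode_name field.
        name = entry.get("barcode_name", key)
        rows.append((name, entry.get("read_count", 0), entry.get("filtered", False)))
    # Largest barcodes first, so filtered barcodes with many reads stand out.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For example, summarize(load_read_groups("basecaller_results/datasets_basecaller.json")) returns one row per barcode, sorted from the largest read count down.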
            

Explanation of fields in the BaseCaller JSON file

read_count

The read_count field shows how many reads were assigned to this barcode.

             "read_count": 978301,
            

filtered

The filtered field is true if this barcode is filtered out and false if the barcode appears on the run report.

             "filtered": false,
            

barcode_errors_hist

The barcode errors histogram shows the number of reads assigned to this barcode with each number of basecalling errors:

  • First field: The number of reads with 0 basecalling errors (949782 in this example). These reads match this barcode perfectly in base space.
  • Second field: The number of reads with one basecalling error (24584 in this example).
  • Third field: The number of reads with two basecalling errors (3935 in this example).

Because 3935 reads have two basecalling errors, reanalyzing with the number of allowed errors set to 1 instead of 2 would assign 3935 fewer reads to this barcode.



             "barcode_errors_hist": [ 949782, 24584, 3935 ],
            
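The arithmetic above is a running sum over the histogram. Here is a minimal Python sketch (the function name reads_assigned is hypothetical), assuming that reads with more errors than the allowed maximum are simply not assigned to this barcode:

```python
def reads_assigned(errors_hist: list[int], max_errors: int) -> int:
    """Reads this barcode would keep if at most max_errors basecalling
    errors were allowed. errors_hist[k] = reads with exactly k errors."""
    return sum(errors_hist[: max_errors + 1])


hist = [949782, 24584, 3935]  # barcode_errors_hist from the sample entry
# Reads lost by lowering the allowed errors from 2 to 1.
lost = reads_assigned(hist, 2) - reads_assigned(hist, 1)
```

In this sample the full sum, 978301, equals read_count, and lowering the limit from 2 to 1 reduces the total to 974366, a difference of 3935 reads.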

This histogram is typical of a real barcode. A large majority of reads are perfect matches, a few have one error, and a smaller number have two errors.

If the pattern is reversed (very few perfect matches, some reads with one error, and many reads with two errors), this is probably a fake barcode.
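The visual check above could be automated roughly as follows. This is a heuristic sketch, not a Torrent Suite criterion: the labels and the rule of comparing neighboring fields are assumptions.

```python
def classify_errors_pattern(errors_hist: list[int]) -> str:
    """Heuristic label for a barcode_errors_hist shape (illustrative only)."""
    pairs = list(zip(errors_hist, errors_hist[1:]))
    if all(a >= b for a, b in pairs):
        # Counts fall as the number of errors rises: typical of a real barcode.
        return "typical"
    if all(a < b for a, b in pairs):
        # Counts rise with the number of errors: the reversed pattern.
        return "suspect"
    return "unclear"
```

For the sample histogram [949782, 24584, 3935] this returns "typical"; a histogram such as [12, 80, 640] would be labeled "suspect".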

barcode_distance_hist

The barcode distance histogram shows, in signal space, the number of reads at various squared residual distances between the predicted signal and the observed signal.

The distance fields are given in 0.2 increments:

  • The first field gives the number of reads with a squared residual distance of between 0 and 0.2.
  • The second field gives the number of reads with a squared residual distance of between 0.2 and 0.4.

  • The third field gives the number of reads with a squared residual distance of between 0.4 and 0.6, etc.

Smaller distances reflect better matches of the read to the barcode; larger distances reflect poorer matches.

This example reflects the pattern that is typical of a real barcode:

  • Most reads have small residual distances.
  • Fewer reads have larger residual distances.
  • The entry 5342 in the fifth field (distances between 0.8 and 1.0) tells us that reducing --barcode-cutoff to 0.8 would leave those 5342 reads unassigned to any barcode.

             "barcode_distance_hist": [ 907546, 50122, 10793, 4498, 5342 ],
            
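The cutoff estimate amounts to summing the bins at or above the new cutoff. This Python sketch assumes the bins are 0.2 wide starting at 0, as described above, and that the new cutoff falls on a bin edge; the function name is hypothetical.

```python
BIN_WIDTH = 0.2  # barcode_distance_hist bins are 0.2 wide


def reads_lost_at_cutoff(distance_hist: list[int], new_cutoff: float) -> int:
    """Reads in bins whose lower edge is at or above new_cutoff, i.e. reads
    that would no longer be assigned if --barcode-cutoff were lowered to it.
    Assumes new_cutoff is a multiple of BIN_WIDTH."""
    # round() absorbs floating-point error, e.g. 0.6 / 0.2 = 2.9999999999999996.
    first_lost_bin = round(new_cutoff / BIN_WIDTH)
    return sum(distance_hist[first_lost_bin:])
```

In this sample the five bins sum to 978301, the same as read_count, which is consistent with the histogram covering every read assigned to this barcode. Lowering the cutoff to 0.6 instead of 0.8 would drop 4498 + 5342 = 9840 reads.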

barcode_match_filtered

The barcode_match_filtered field gives the number of reads that perfectly match the barcode in base space but are filtered out because they do not meet the separation criteria in signal space. The signals for these reads fall between two barcodes and are not close enough to either one to be assigned.

             "barcode_match_filtered": 162,
            
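To judge whether 162 is a large number, one option is to compare it with the number of perfect base-space matches. The sketch below assumes that barcode_errors_hist counts only assigned reads (in this sample its fields sum to read_count), so the filtered perfect matches are added back to get the total; the function name is hypothetical.

```python
def match_filtered_fraction(entry: dict) -> float:
    """Fraction of perfect base-space matches lost to the signal-space
    separation filter for one barcode entry."""
    filtered = entry["barcode_match_filtered"]
    # Assumption: errors_hist[0] counts only perfect matches that were assigned.
    perfect_assigned = entry["barcode_errors_hist"][0]
    total_perfect = perfect_assigned + filtered
    return filtered / total_perfect if total_perfect else 0.0
```

For this sample the result is 162 / 949944, roughly 0.017%, so very few perfect matches were lost to the separation filter.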

barcode_bias

The barcode_bias values show the mean signal deviation by flow, that is, how far the observed signal is from the expected signal. Low bias values, such as those shown here, indicate a good signal.

Bias values around 0.33 indicate a signal that is about a third of a base off, and values near 0.5 indicate a signal that is half a base off. Values of this size indicate a problem with the sequencing run or with the barcode classification.

             "barcode_bias": [ 0.026, -0.028, -0.034, 0.011, -0.019, -0.001, 0.072,
                     -0.061, 0.103, -0.008, -0.062, 0.110, -0.021, 0.001],
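To find flows with problematic bias, you can scan the array for large absolute values. In this Python sketch the 0.33 default threshold is taken from the "third of a base" guideline above and is an assumption, not a BaseCaller setting; the function name is hypothetical.

```python
def flows_with_large_bias(bias: list[float], threshold: float = 0.33) -> list[int]:
    """Positions in the barcode_bias array whose absolute bias is at least
    threshold. Negative and positive deviations are treated alike."""
    return [i for i, b in enumerate(bias) if abs(b) >= threshold]
```

For the sample values the largest magnitude is 0.110, so the function returns an empty list.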
            
