References Management Guide
BED File Formats and Examples
The Browser Extensible Display (BED) format is used for both target regions files and hotspot files. The Torrent Browser also accepts the Variant Call Format (VCF) for hotspot files. (See Target Regions Files and Hotspot Files for usage information.)
BED files are text files with tab-separated fields.
Target Regions File Formats
Target regions BED files use 3-column, 4-column, 6-column, and 8-column formats.
3-column Target Regions BED File Format
The 3-column BED file format is used when amplicon IDs and gene names are not known.
The track line is optional. If present, it includes these tab-separated fields:
Field | Type | Description |
---|---|---|
Name | string | A unique design identifier. Optional. |
Description | string | Description of the design. Optional. |
The following is an example track line:
track name="ASD270245" description="AmpliSeq Pool ASD270245"
In a 3-column target regions BED file, the coordinates lines require the following tab-separated fields:
Field | Type | Description |
---|---|---|
chrom | string (chars >= 0x20, other than tab) | Name of the chromosome. This name must be an exact match with a chromosome in the reference. |
chromStart | unsigned int64 | Starting position of the feature (zero-based). |
chromEnd | unsigned int64 | Ending position of the feature (not inclusive). Must be greater than chromStart. |
Partial example of a 3-column target regions BED file:
chr9 133738312 133738379
chr9 133747484 133747542
chr9 133748242 133748296
chr9 133748388 133748452
chr9 133750331 133750405
chr9 133738312 133738379
chr9 133747484 133747542
chr9 133748242 133748296
chr9 133748388 133748452
chr9 133750331 133750405
chr14 105246407 105246502
chr14 105246407 105246502
chr14 105246407 105246502
chr2 29432658 29432711
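The field rules for 3-column coordinates lines can be checked mechanically. The following Python sketch is illustrative only and is not part of Torrent Suite Software; the names `parse_bed3_line` and `_parse_uint64` are hypothetical. It assumes one record per line with real tab separators, and it does not check that the chromosome name exists in the reference.

```python
UINT64_MAX = 2**64 - 1  # largest value an unsigned int64 can hold


def _parse_uint64(text, field_name):
    """Convert a decimal string to int, rejecting signs, blanks, and overflow."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{field_name} must be an unsigned integer, got {text!r}")
    value = int(text)
    if value > UINT64_MAX:
        raise ValueError(f"{field_name} does not fit in an unsigned int64")
    return value


def parse_bed3_line(line):
    """Parse one 3-column coordinates line into (chrom, chromStart, chromEnd)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    chrom, start_text, end_text = fields
    # Tabs cannot survive the split, so only control characters need checking.
    if not chrom or any(ord(ch) < 0x20 for ch in chrom):
        raise ValueError("chrom must be non-empty with all characters >= 0x20")
    start = _parse_uint64(start_text, "chromStart")
    end = _parse_uint64(end_text, "chromEnd")
    if end <= start:
        raise ValueError("chromEnd must be greater than chromStart")
    return chrom, start, end


chrom, start, end = parse_bed3_line("chr9\t133738312\t133738379")
# Zero-based start and exclusive end mean the region length is end - start.
print(chrom, end - start)
```

Because the start is zero-based and the end is not inclusive, the length of a region is simply chromEnd minus chromStart.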
4-column Target Regions BED File Format
The 4-column BED file format is used when gene names are not known and some or all amplicon IDs are known.
The track line is optional. If present, it includes these tab-separated fields:
Field | Type | Description |
---|---|---|
Name | string | A unique design identifier. Optional. |
Description | string | Description of the design. Optional. |
The following is an example track line:
track name="ASD270245" description="AmpliSeq Pool ASD270245"
In a 4-column target regions BED file, the coordinates lines require the following tab-separated fields:
Field | Type | Description |
---|---|---|
chrom | string (chars >= 0x20, other than tab) | Name of the chromosome. This name must be an exact match with a chromosome in the reference. |
chromStart | unsigned int64 | Starting position of the feature (zero-based). |
chromEnd | unsigned int64 | Ending position of the feature (not inclusive). Must be greater than chromStart. |
AmpliconID | string | Amplicon ID. If missing, the following string is used: chrom + ":" + chromStart + "-" + chromEnd (for example, chr9:133738312-133738379). |
Partial example of a 4-column target regions BED file:
chr9 133738312 133738379 amplID73150
chr9 133747484 133747542 amplID73075
chr9 133748242 133748296 amplID73104
chr9 133748388 133748452 491413
chr9 133750331 133750405 74743
chr9 133738312 133738379 73150
chr9 133747484 133747542 73075
chr9 133748242 133748296 73104
chr9 133748388 133748452 491413
chr9 133750331 133750405 74743
chr14 105246407 105246502 329410
chr2 29432658 29432711 34014
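The fallback rule for a missing amplicon ID can be sketched as a small Python helper. This is a hedged illustration, not the Torrent Suite Software implementation; the function name `amplicon_id_or_default` is hypothetical, and treating an absent or empty fourth column as "missing" is an assumption.

```python
def amplicon_id_or_default(fields):
    """Return column 4 if present and non-empty, else 'chrom:chromStart-chromEnd'.

    `fields` is one coordinates line already split on tabs.
    """
    if len(fields) < 3:
        raise ValueError("need at least chrom, chromStart, and chromEnd")
    chrom, chrom_start, chrom_end = fields[:3]
    if len(fields) > 3 and fields[3] != "":
        return fields[3]
    # Assumption: the default is built from the column text exactly as written.
    return f"{chrom}:{chrom_start}-{chrom_end}"


print(amplicon_id_or_default(["chr9", "133748388", "133748452", "491413"]))
print(amplicon_id_or_default(["chr9", "133738312", "133738379"]))
```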
6-column Target Regions BED File Format
The 6-column BED file format is used when some or all of the gene names are known. BED files that are generated by AmpliSeq.com use this 6-column format.
The track line is required in a 6-column target regions BED file. The following is an example track line:
track name="ASD270245" description="AmpliSeq Pool ASD270245" type=bedDetail
The track line includes these tab-separated fields:
Field | Type | Description |
---|---|---|
Name | string | A unique design identifier. Optional. |
Description | string | Description of the design. Optional. |
Type | string | Must be "bedDetail" (without quotes). Required. |
ionVersion | string | Introduced in the Torrent Suite Software 4.0 release (AmpliSeq.com 3.0 and higher fixed panels). When set to "4.0", indicates that the BED file supports the Extended BED Detail format. Optional. |
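A track line can be split into its key=value attributes with Python's standard `shlex` module, which keeps quoted values such as the description together. This is a minimal sketch under stated assumptions: the function names are hypothetical, the attributes are assumed to be separated by whitespace as in the example track line, and quoted values are assumed not to contain unbalanced quote characters.

```python
import shlex


def parse_track_line(line):
    """Return the attributes of a track line as a dict of strings."""
    tokens = shlex.split(line.strip())
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    attrs = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed track attribute: {token!r}")
        attrs[key] = value
    return attrs


def is_bed_detail(attrs):
    """True when the track line declares type=bedDetail."""
    return attrs.get("type") == "bedDetail"


def is_extended_bed_detail(attrs):
    """True when the track line also declares ionVersion=4.0."""
    return is_bed_detail(attrs) and attrs.get("ionVersion") == "4.0"


attrs = parse_track_line(
    'track name="ASD270245" description="AmpliSeq Pool ASD270245" type=bedDetail'
)
print(attrs["description"], is_bed_detail(attrs), is_extended_bed_detail(attrs))
```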
In a 6-column target regions BED file, the coordinates lines require the following tab-separated fields:
Field | Type | Description |
---|---|---|
chrom | string (chars >= 0x20, other than tab) | Name of the chromosome. This name must be an exact match with a chromosome in the reference. |
chromStart | unsigned int64 | Starting position of the feature (zero-based). |
chromEnd | unsigned int64 | Ending position of the feature (not inclusive). Must be greater than chromStart. |
AmpliconID | string | Amplicon ID. If missing, the following string is used: chrom + ":" + chromStart + "-" + chromEnd (for example, chr9:133738312-133738379). |
ID | string | Customer-specified ID. If missing, set to '.'. This field is not used currently. |
GeneSymbol | string | Gene name. If missing, set to '.'. |
Partial example of a 6-column target regions BED file:
track name="ASD270249_v1" description="AmpliSeq Pool ASD270249" type=bedDetail
chr9 133738312 133738379 AM73150 NM_005157 ABL1
chr9 133747484 133747542 AM73075 NM_005157 ABL1
chr9 133748242 133748296 AM73104 NM_005157 ABL1
chr9 133748388 133748452 AM491413 NM_005157 ABL1
chr9 133750331 133750405 74743 NM_005157 ABL1
chr9 133738312 133738379 73150 NM_007313 ABL1
chr9 133747484 133747542 73075 NM_007313 ABL1
chr9 133748242 133748296 73104 NM_007313 ABL1
chr9 133748388 133748452 491413 NM_007313 ABL1
chr9 133750331 133750405 74743 NM_007313 ABL1
chr14 105246407 105246502 329410 NM_001014431 AKT1
chr14 105246407 105246502 329410 NM_001014432 AKT1
chr14 105246407 105246502 329410 NM_005163 AKT1
chr2 29432658 29432711 34014 NM_004304 ALK
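When writing 6-column lines by hand from shorter records, the missing-value conventions from the field table combine as in the following Python sketch. It is an illustration only: the function name `to_six_columns` is hypothetical, and the source does not state that Torrent Suite Software performs this padding itself.

```python
def to_six_columns(fields):
    """Pad a 3- to 6-column record to 6 columns using the documented defaults."""
    if not 3 <= len(fields) <= 6:
        raise ValueError(f"expected 3 to 6 fields, got {len(fields)}")
    padded = list(fields) + [""] * (6 - len(fields))
    chrom, start, end, amplicon_id, custom_id, gene = padded
    return [
        chrom,
        start,
        end,
        amplicon_id or f"{chrom}:{start}-{end}",  # default AmpliconID
        custom_id or ".",                         # default ID
        gene or ".",                              # default GeneSymbol
    ]


# A 3-column record gains a generated AmpliconID and two '.' placeholders.
print("\t".join(to_six_columns(["chr2", "29432658", "29432711"])))
```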
8-column Target Regions BED File Format
An 8-column format is used for Fusion panels. The additional columns are:
Field | Type | Description |
---|---|---|
Score | unsigned int64 | Score. If missing, set to "." |
Strand | string (+ or -) | Strand. If unknown, set to '+'. |
BED files generated by AmpliSeq.com custom designs
The following table lists additional fields that BED files generated by an AmpliSeq.com custom design include in the track line. These fields are present but are not used by Torrent Suite Software. (AmpliSeq.com custom designs cannot be imported directly into Torrent Suite Software.)
Field | Type | Description |
---|---|---|
db | string | Gives the reference database (hg19 or mm10). Optional, but present in BED files from custom designs. |
color | string | Code for the color track in the UCSC Genome Browser (when uploaded from AmpliSeq.com). Optional, but present in BED files from custom designs. |
priority | string | Sets the order for the color track in the UCSC Genome Browser (when uploaded from AmpliSeq.com). Optional, but present in BED files from custom designs. |
HotSpots File Format
The track line is required in a HotSpots BED file. The following is an example track line:
track name="ASD270245" description="HotSpots locations for AmpliSeq ASD270245" type=bedDetail
The track line includes these tab-separated fields:
Field | Type | Description |
---|---|---|
Name | string | A unique design identifier. Optional. |
Description | string | Description of the design. Optional. |
Type | string | Must be "bedDetail" (without quotes). Required. |
In HotSpots BED files, the coordinates lines require the following tab-separated fields:
Field | Type | Description |
---|---|---|
chrom | string (chars >= 0x20, other than tab) | Name of the chromosome. This name must be an exact match with a chromosome in the reference. |
chromStart | unsigned int64 | Starting position of the feature (zero-based). |
chromEnd | unsigned int64 | Ending position of the feature (not inclusive). Must be greater than chromStart, except for insertion alleles, which have the same start and end position (see the notes on the HotSpotAlleles field). |
HotSpotName | string | This ID is either the COSMIC ID, the dbSNP ID, or user-defined. If missing, the following string is used: chrom + ":" + chromStart + "-" + chromEnd |
HotSpotAlleles | string | This required field describes the variant, using the format REF=reference_allele;OBS=observed_allele (see examples below). |
AmpliconID | string | Amplicon ID. If missing, the following string is used: chrom + ":" + chromStart + "-" + chromEnd |
The HotSpot Alleles field
This field specifies the alleles involved in variant calls, using this format:
REF=reference_allele;OBS=observed_allele
Examples:
- A TT insertion with 1-base prior at reference C: REF=;OBS=TT
- A TT deletion with 1-base prior at reference G: REF=TT;OBS=
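The REF/OBS format above can be split into its two alleles with a few lines of Python. This is a sketch, not the Torrent Suite Software parser: the function names are hypothetical, elements other than REF and OBS are ignored, and the label "substitution" for lines where both alleles are non-empty is this example's own naming.

```python
def parse_hotspot_alleles(text):
    """Return (ref, obs) from a 'REF=...;OBS=...' string; either may be empty."""
    elements = {}
    for part in text.strip().split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed element: {part!r}")
        elements[key] = value
    if "REF" not in elements or "OBS" not in elements:
        raise ValueError("both REF and OBS are required")
    return elements["REF"], elements["OBS"]


def allele_kind(ref, obs):
    """Label the event by which allele is empty."""
    if not ref and not obs:
        raise ValueError("REF and OBS cannot both be empty")
    if not ref:
        return "insertion"
    if not obs:
        return "deletion"
    return "substitution"  # hypothetical label for all other events


for text in ("REF=;OBS=TT", "REF=TT;OBS=", "REF=TG;OBS=AA"):
    ref, obs = parse_hotspot_alleles(text)
    print(text, allele_kind(ref, obs))
```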
Notes:
- 6-column format
  - The two elements (REF and OBS) are required on each line.
  - The elements can be empty: "REF=;" or "OBS=;". An empty REF denotes an insertion; an empty OBS denotes a deletion.
  - An additional element, ANCHOR=base_before_allele, can be provided for backward compatibility but is optional. It is recommended that the ANCHOR key not be provided for TS 4.2 and later.
  - Insertion alleles should have the same start and end position; that position corresponds to the point between two bases. SNV, MNV, deletion, and complex variants should correspond to the reference bases that are spanned by the event.
  - The REF and OBS alleles should be on the forward genomic strand. There should be one alternative allele per line.
- 8-column format
  - The +/- strand notation in the hotspot file refers to the orientation of the Ion AmpliSeq design input sequence, not to the reference sequence. REF and OBS alleles must always be reported on the forward strand of the reference sequence.
  - HotSpotAlleles are always reported from the positive strand of the reference sequence. Even if the allele strand is negative, the REF and OBS bases still report the alleles on the positive strand.
For example, whether a hotspot at a given genomic coordinate is on the positive strand or on the negative strand, the strand information makes no difference to what is reported in the HotSpotAlleles column, which always reports the alleles on the positive strand. In the following example, the strands are different, but the reported alleles are from the positive strand in both lines:
chr1 43815007 43815009 ID1 0 - REF=TG;OBS=AA AMPL1
chr1 43815007 43815009 ID2 0 + REF=TG;OBS=AA AMPL2
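The coordinate rules in the notes above can be turned into a simple consistency check between a hotspot's interval and its REF allele. The following Python sketch rests on an assumption that the source does not state explicitly: for non-insertion events, "the reference bases that are spanned by the event" is read as chromEnd minus chromStart being equal to the length of REF. The function name `span_matches_ref` is hypothetical.

```python
def span_matches_ref(chrom_start, chrom_end, ref):
    """Check a hotspot interval against its REF allele (forward strand)."""
    if ref == "":
        # Insertion: a zero-width position between two bases.
        return chrom_start == chrom_end
    # Assumption: the interval covers exactly the REF bases.
    return chrom_end - chrom_start == len(ref)


# The strand column does not enter the check, because REF and OBS are
# always given on the forward strand of the reference.
print(span_matches_ref(43815007, 43815009, "TG"))
print(span_matches_ref(43815007, 43815009, "T"))
```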
Partial example of a HotSpots BED file
track name="HSMv12.1" description="AmpliSeq Pool HSMv12.1" type=bedDetail
chr1 43815007 43815009 COSM19193 REF=TG;OBS=AA AMPL495041
chr1 43815008 43815009 COSM18918 REF=G;OBS=T AMPL495041
chr1 115256527 115256528 COSM585 REF=T;OBS=A AMPL30014
chr1 115256527 115256528 COSM586 REF=T;OBS=G AMPL30014
chr1 115256527 115256529 COSM33693 REF=TT;OBS=CC AMPL30014
chr1 115256527 115256529 COSM30646 REF=TT;OBS=CA AMPL30014
chr1 115256527 115256530 COSM53223 REF=TTG;OBS=CTT AMPL30014
chr1 115256528 115256529 COSM583 REF=T;OBS=A AMPL30014
chr1 115256528 115256529 COSM584 REF=T;OBS=C AMPL30014
chr1 115256528 115256529 COSM582 REF=T;OBS=G AMPL30014
chr1 115256528 115256530 COSM12725 REF=TG;OBS=AA AMPL30014
chr1 115256528 115256530 COSM579 REF=TG;OBS=CT AMPL30014
Extended BED Detail format
Beginning with the 3.0 release, AmpliSeq.com uses this format for the following fixed panels:
- CCP
- CFTR
- CHP v2
- Ion AmpliSeq Exome
New fixed panels introduced after the AmpliSeq.com 3.0 release also follow this format. Other panels, and all panels from previous releases, do not use this format.
The Extended BED Detail format contains two additional fields (at the end of each line):
Name | Values | Description |
---|---|---|
Id | Any string, if supplied by the user, or '.' | User-supplied name or id for the region. |
Description | Key-value pairs separated by semicolons, or '.' if empty | Contains a '.' or one or more of the key-value pairs described in the next table. |
This table describes the key-value pairs that are supported in the Description column:
Key | Description |
---|---|
GENE_ID | A gene symbol or comma-separated list of gene symbols. If no gene symbol is available, this key is absent. Examples: GENE_ID=brca1 and GENE_ID=brca1,ret |
Pool | The AmpliSeq.com pool or pools containing this amplicon. Example: Pool=2. If an amplicon is present in multiple pools, the pools are delimited with a comma (","), with the primary pool listed first. For example, if an amplicon is present in pools 1 and 3, and 1 is the primary pool, the entry is Pool=1,3. Single-pool designs do not include the Pool= key-value pair. |
SUBMITTED_REGION | The region name provided by the user during the AmpliSeq.com design process. If a region name is not provided, this key is absent. Example: SUBMITTED_REGION=Q1 |
CNV_ID | A gene symbol used to specify a copy number region for the CNV PCA algorithm. CNV_ID takes precedence over GENE_ID, and one CNV_ID can span multiple GENE_IDs. |
CNV_HS | A CNV region hotspot flag, either 0 or 1. A value of 1 is reported as a hotspot (HS) in the output VCF file from the CNV PCA algorithm; a value of 0 is not reported as a hotspot. |
The Extended BED Detail format requires a track line with both type=bedDetail and ionVersion=4.0. The Torrent Suite Software BED validator treats these fields (Id and Description) as optional.
Examples from BED files in the Extended BED Detail format
This example shows the GENE_ID= and Pool= keys:
track name="4477685_CCP" description="Amplicon_Insert_4477685_CCP" type=bedDetail ionVersion=4.0
chr1 2488068 2488201 242431688 . GENE_ID=TNFRSF14;Pool=2
chr1 2489144 2489273 262048751 . GENE_ID=TNFRSF14;Pool=4
chr1 2489772 2489907 241330530 . GENE_ID=TNFRSF14;Pool=1
chr1 2491241 2491331 242158034 . GENE_ID=TNFRSF14;Pool=3
This example is from the CFTR designed.bed file:
track type=bedDetail ionVersion=4.0 name="CFTRexon0313_Designed" description="Amplicon_Insert_CFTRexon0313"
chr7 117119916 117120070 CFTR_1.91108 . GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,31
chr7 117120062 117120193 CFTR_1.38466 . GENE_ID=CFTR;Pool=2;SUBMITTED_REGION=1
chr7 117120186 117120304 AMPL244371551 . GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,32
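The Description column of an Extended BED Detail record can be unpacked into its key-value pairs as sketched below in Python. This is an illustration under assumptions, not the Torrent Suite Software parser: the function names are hypothetical, values are kept as raw strings, and whitespace around keys and values is stripped for convenience.

```python
def parse_description(text):
    """Return the key-value pairs of a Description column as a dict."""
    text = text.strip()
    if text in ("", "."):
        return {}  # '.' marks an empty Description
    pairs = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed key-value pair: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def pool_numbers(pairs):
    """Return the pools as integers, primary pool first; [] when Pool is absent."""
    raw = pairs.get("Pool")
    if raw is None:
        return []  # single-pool designs omit the Pool key
    return [int(pool) for pool in raw.split(",")]


pairs = parse_description("GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,31")
print(pairs["GENE_ID"], pool_numbers(pairs), pairs["SUBMITTED_REGION"])
```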
Merged Extended BED Detail format files
In the case of two overlapping records, those records are merged during upload into Torrent Suite Software. An ampersand ( & ) is the delimiter between multiple values in merged files.
Example 1
When these two GENE_ID fields appear in overlapping records:
GENE_ID = raf
GENE_ID = brca1
The merged GENE_ID field is:
GENE_ID=raf&brca1
Example 2
When these two GENE_ID fields appear in overlapping records:
GENE_ID = raf
GENE_ID = brca1,ret
The merged GENE_ID field is:
GENE_ID=raf&brca1,ret
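The ampersand rule in the two examples above can be sketched in Python as follows. The function name `merge_descriptions` is hypothetical, and the handling of keys that appear in only one record (carried over unchanged) and of identical values (still joined) is an assumption; the source describes only the case of two differing values for the same key.

```python
def merge_descriptions(first, second):
    """Merge the key-value pairs of two overlapping records, in record order."""
    merged = dict(first)
    for key, value in second.items():
        if key in merged:
            # Values from both records are kept, delimited by '&'.
            merged[key] = f"{merged[key]}&{value}"
        else:
            merged[key] = value  # assumption: carried over unchanged
    return merged


print(merge_descriptions({"GENE_ID": "raf"}, {"GENE_ID": "brca1"}))
print(merge_descriptions({"GENE_ID": "raf"}, {"GENE_ID": "brca1,ret"}))
```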
The score and strand fields in uploaded BED files
Uploaded BED files are converted to add score and strand columns, with the default values 0 and +. You see these values in BED files that you download from Torrent Suite Software:
track type=bedDetail name="BRCA1.BRCA2_HotSpots" description="BRCA_HOTSPOT_ALLELES" allowBlockSubstitutions=true
chr13 32890649 32890650 COSM35423 0 + REF=G;OBS=A AMPL223487194
chr13 32893206 32893207 COSM23930 0 + REF=T;OBS= AMPL223519297
chr13 32893221 32893221 COSM23939 0 + REF=;OBS=CCAATGA AMPL223519297
chr13 32893290 32893291 COSM172578 0 + REF=G;OBS=T AMPL223521074
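The effect of this conversion on a single record can be sketched in Python. This is an illustration inferred from the downloaded example above, not the actual conversion code: the function name `add_score_and_strand` is hypothetical, and placing the two new columns directly after the fourth column is an assumption based on where 0 and + appear in that example.

```python
def add_score_and_strand(fields, score="0", strand="+"):
    """Insert default score and strand columns after the name column."""
    if len(fields) < 4:
        raise ValueError("expected at least chrom, chromStart, chromEnd, and name")
    return fields[:4] + [score, strand] + fields[4:]


uploaded = ["chr13", "32890649", "32890650", "COSM35423", "REF=G;OBS=A", "AMPL223487194"]
print("\t".join(add_score_and_strand(uploaded)))
```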