References Management Guide


BED File Formats and Examples

The Browser Extensible Display (BED) format is used for both target regions files and hotspot files. The Torrent Browser also accepts the Variant Call Format (VCF) for hotspot files. (See Target Regions Files and Hotspot Files for usage information.)

BED files are text files with tab-separated fields.

Target Regions File Formats

Target regions BED files use 3-column, 4-column, 6-column, and 8-column formats.

3-column Target Regions BED File Format

The 3-column BED file format is used when amplicon IDs and gene names are not known.

The track line is optional. If present, it includes these tab-separated fields:

Field        Type    Description
Name         string  A unique design identifier. Optional.
Description  string  Description of the design. Optional.

The following is an example track line:

track name="ASD270245" description="AmpliSeq  Pool ASD270245"

In a 3-column target regions BED file, the coordinates lines require the following tab-separated fields:

Field       Type                                    Description
chrom       string (chars >= 0x20, other than tab)  Name of the chromosome. This name must exactly match a chromosome name in the reference.
chromStart  unsigned int64                          Starting position of the feature (zero-based).
chromEnd    unsigned int64                          Ending position of the feature (not inclusive). Must be greater than chromStart.



Partial example of a 3-column target regions BED file:

chr9 133738312 133738379
chr9 133747484 133747542
chr9 133748242 133748296
chr9 133748388 133748452
chr9 133750331 133750405
chr9 133738312 133738379
chr9 133747484 133747542
chr9 133748242 133748296
chr9 133748388 133748452
chr9 133750331 133750405
chr14 105246407 105246502
chr14 105246407 105246502
chr14 105246407 105246502
chr2 29432658 29432711
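
The coordinate rules above (zero-based start, non-inclusive end, end greater than start) can be checked with a few lines of Python. This is a minimal sketch, not part of Torrent Suite Software; the function names `parse_region_line` and `region_length` are hypothetical, and the sketch assumes the fields are separated by tab characters as described above.

```python
def parse_region_line(line):
    """Parse one coordinates line into (chrom, chromStart, chromEnd)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if not chrom:
        raise ValueError("chrom must not be empty")
    if start < 0:
        raise ValueError("chromStart must be unsigned")
    if end <= start:
        raise ValueError("chromEnd must be greater than chromStart")
    return chrom, start, end


def region_length(start, end):
    # With a zero-based start and a non-inclusive end,
    # the number of bases covered is a plain difference.
    return end - start


chrom, start, end = parse_region_line("chr9\t133738312\t133738379")
print(chrom, region_length(start, end))  # chr9 67
```

Because the end position is not inclusive, the first region in the example covers 67 bases, from zero-based position 133738312 through 133738378.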
            

4-column Target Regions BED File Format

The 4-column BED file format is used when gene names are not known and some or all amplicon IDs are known.

The track line is optional. If present, it includes these tab-separated fields:

Field        Type    Description
Name         string  A unique design identifier. Optional.
Description  string  Description of the design. Optional.

The following is an example track line:

track name="ASD270245" description="AmpliSeq  Pool ASD270245"

In a 4-column target regions BED file, the coordinates lines require the following tab-separated fields:

Field       Type                                    Description
chrom       string (chars >= 0x20, other than tab)  Name of the chromosome. This name must exactly match a chromosome name in the reference.
chromStart  unsigned int64                          Starting position of the feature (zero-based).
chromEnd    unsigned int64                          Ending position of the feature (not inclusive). Must be greater than chromStart.
AmpliconID  string                                  Amplicon ID. If missing, the following string is used: "chrom" + ":" + "chromStart" + "-" + "chromEnd".

Partial example of a 4-column target regions BED file:

chr9 133738312 133738379 amplID73150
chr9 133747484 133747542 amplID73075
chr9 133748242 133748296 amplID73104
chr9 133748388 133748452 491413
chr9 133750331 133750405 74743
chr9 133738312 133738379 73150
chr9 133747484 133747542 73075
chr9 133748242 133748296 73104
chr9 133748388 133748452 491413
chr9 133750331 133750405 74743
chr14 105246407 105246502 329410
chr2 29432658 29432711 34014
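
The fallback for a missing amplicon ID described above can be sketched as follows. This is an illustration only; the helper name `amplicon_id` is hypothetical, and treating an empty fourth field as "missing" is an assumption.

```python
def amplicon_id(fields):
    """Return the amplicon ID for a split coordinates line.

    `fields` is the list of tab-separated values from one line.
    If the fourth field is absent or empty (an assumption about what
    "missing" means), build the ID as chrom:chromStart-chromEnd.
    """
    chrom, start, end = fields[0], fields[1], fields[2]
    if len(fields) > 3 and fields[3].strip():
        return fields[3]
    return f"{chrom}:{start}-{end}"


print(amplicon_id(["chr9", "133738312", "133738379", "amplID73150"]))  # amplID73150
print(amplicon_id(["chr9", "133738312", "133738379"]))  # chr9:133738312-133738379
```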
            

6-column Target Regions BED File Format

The 6-column BED file format is used when some or all of the gene names are known. BED files that are generated by AmpliSeq.com use this 6-column format.

The track line is required in a 6-column target regions BED file. The following is an example track line:

track name="ASD270245" description="AmpliSeq  Pool ASD270245" type=bedDetail

The track line includes these tab-separated fields:

Field        Type    Description
Name         string  A unique design identifier. Optional.
Description  string  Description of the design. Optional.
Type         string  Must be "bedDetail" (without quotes). Required.
ionVersion   string  Introduced in the Torrent Suite Software 4.0 release (AmpliSeq.com 3.0 and higher fixed panels). When set to "4.0", indicates that the BED file supports the Extended BED Detail format. Optional.
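
A track line of this shape can be read into a dictionary with the Python standard library. The sketch below is an illustration under stated assumptions, not the Torrent Suite Software validator: the function name `parse_track_line` is hypothetical, and the sketch splits on any whitespace (tabs or spaces) while keeping quoted values together.

```python
import shlex


def parse_track_line(line):
    """Split a track line into a dict of its key=value attributes."""
    # shlex.split honors double quotes, so a quoted description
    # that contains spaces stays in one token.
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    attrs = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:
            attrs[key] = value
    return attrs


attrs = parse_track_line(
    'track name="ASD270245" description="AmpliSeq Pool ASD270245" type=bedDetail'
)
print(attrs["type"])  # bedDetail
print(attrs.get("ionVersion"))  # None
```

A 6-column file would then be accepted only if `attrs.get("type") == "bedDetail"`; the optional `ionVersion` key is simply absent when it is not set.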

In a 6-column target regions BED file, the coordinates lines require the following tab-separated fields:

Field       Type                                    Description
chrom       string (chars >= 0x20, other than tab)  Name of the chromosome. This name must exactly match a chromosome name in the reference.
chromStart  unsigned int64                          Starting position of the feature (zero-based).
chromEnd    unsigned int64                          Ending position of the feature (not inclusive). Must be greater than chromStart.
AmpliconID  string                                  Amplicon ID. If missing, the following string is used: "chrom" + ":" + "chromStart" + "-" + "chromEnd".
ID          string                                  Customer-specified ID. If missing, set to '.'. This field is not used currently.
GeneSymbol  string                                  Gene name. If missing, set to '.'.

Partial example of a 6-column target regions BED file:

track name="ASD270249_v1" description="AmpliSeq Pool ASD270249" type=bedDetail
chr9 133738312 133738379 AM73150 NM_005157 ABL1
chr9 133747484 133747542 AM73075 NM_005157 ABL1
chr9 133748242 133748296 AM73104 NM_005157 ABL1
chr9 133748388 133748452 AM491413 NM_005157 ABL1
chr9 133750331 133750405 74743 NM_005157 ABL1
chr9 133738312 133738379 73150 NM_007313 ABL1
chr9 133747484 133747542 73075 NM_007313 ABL1
chr9 133748242 133748296 73104 NM_007313 ABL1
chr9 133748388 133748452 491413 NM_007313 ABL1
chr9 133750331 133750405 74743 NM_007313 ABL1
chr14 105246407 105246502 329410 NM_001014431 AKT1
chr14 105246407 105246502 329410 NM_001014432 AKT1
chr14 105246407 105246502 329410 NM_005163 AKT1
chr2 29432658 29432711 34014 NM_004304 ALK
            

8-column Target Regions BED File Format

An 8-column format is used for Fusion panels. The additional columns are:

Field   Type             Description
Score   unsigned int64   Score. If missing, set to ".".
Strand  string (+ or -)  Strand. If unknown, set to '+'.

BED files generated by AmpliSeq.com custom designs

This table lists fields that BED files generated by an AmpliSeq.com custom design also include in the track line. These fields are present but are not used by Torrent Suite Software. (AmpliSeq.com custom designs cannot be imported directly into Torrent Suite Software.)

Field     Type    Description
db        string  Gives the reference database (hg19 or mm10). Optional, but present in BED files from custom designs.
color     string  Code for color track in UCSC Genome Browser (when uploaded from AmpliSeq.com). Optional, but present in BED files from custom designs.
priority  string  Sets the order for color track in UCSC Genome Browser (when uploaded from AmpliSeq.com). Optional, but present in BED files from custom designs.

HotSpots File Format

The track line is required in a HotSpots BED file. The following is an example track line:

track name="ASD270245" description="HotSpots locations for AmpliSeq ASD270245" type=bedDetail

The track line includes these tab-separated fields:

Field        Type    Description
Name         string  A unique design identifier. Optional.
Description  string  Description of the design. Optional.
Type         string  Must be "bedDetail" (without quotes). Required.

In HotSpots BED files, the coordinates lines require the following tab-separated fields:

Field           Type                                    Description
chrom           string (chars >= 0x20, other than tab)  Name of the chromosome. This name must exactly match a chromosome name in the reference.
chromStart      unsigned int64                          Starting position of the feature (zero-based).
chromEnd        unsigned int64                          Ending position of the feature (not inclusive). Must be greater than chromStart.
HotSpotName     string                                  This ID is either the COSMIC ID, the dbSNP ID, or user-defined. If missing, the following string is used: "chrom" + ":" + "chromStart" + "-" + "chromEnd".
HotSpotAlleles  string                                  This required field describes the variant, using this format (see examples below): REF=reference_allele;OBS=observed_allele;ANCHOR=base_before_allele
AmpliconID      string                                  Amplicon ID. If missing, the following string is used: "chrom" + ":" + "chromStart" + "-" + "chromEnd".

The HotSpot Alleles field

This field specifies the alleles involved in variant calls, using this format:

REF=reference_allele;OBS=observed_allele

Examples:

  • A TT insertion with 1-base prior at reference C: REF=;OBS=TT
  • A TT deletion with 1-base prior at reference G: REF=TT;OBS=

Notes:

  • 6-column format
    • The two elements (REF and OBS) are required on each line.
    • The elements can be empty: "REF=;" or "OBS=;". An empty REF indicates an insertion, and an empty OBS indicates a deletion.
    • An additional element ANCHOR=base_before_allele can be provided for backward compatibility, but is completely optional. It is recommended that the ANCHOR key NOT be provided for TS >= 4.2.
    • Insertion alleles should have the same start and end position, and that position corresponds to a region between two bases. SNV, MNV, deletion, and complex variants should correspond to the reference bases that are spanned by the event.
    • The REF and OBS should be on the forward genomic strand. There should be one alternative allele per line.
  • 8-column format
    • The +/- strand notation in the hotspot file refers to the orientation of the Ion AmpliSeq design input sequence, not to the reference sequence. REF and OBS alleles must always be reported on the forward strand of the reference sequence.
    • HotSpotAlleles are always reported based on the allele information from the positive strand of the reference sequence. Even if the allele strand is negative, the REF and OBS bases still report the alleles on the positive strand.

For example, if there is a hotspot on either the positive strand or the negative strand at a genomic coordinate, the strand information makes no difference to what is reported in the HotSpotAlleles column. The HotSpotAlleles column always reports the alleles on the positive strand. In the following example, the strands are different, but the reported alleles are always from the positive strand:

chr1 43815007 43815009 ID1 0 - REF=TG;OBS=AA AMPL1

chr1 43815007 43815009 ID2 0 + REF=TG;OBS=AA AMPL2
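
The HotSpotAlleles rules above can be sketched in Python as follows. This is a minimal illustration, not the Torrent Suite Software parser. The function names `parse_hotspot_alleles` and `classify_alleles` are hypothetical, and the labels returned by `classify_alleles` are an assumption based on the variant types named in the notes (insertion, deletion, SNV, MNV, complex).

```python
def parse_hotspot_alleles(text):
    """Parse 'REF=...;OBS=...[;ANCHOR=...]' into a dict."""
    alleles = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        alleles[key.strip()] = value.strip()
    # REF and OBS are required on each line; ANCHOR is optional.
    if "REF" not in alleles or "OBS" not in alleles:
        raise ValueError("both REF and OBS are required")
    return alleles


def classify_alleles(alleles):
    """Label the variant from the lengths of REF and OBS."""
    ref, obs = alleles["REF"], alleles["OBS"]
    if not ref and not obs:
        raise ValueError("REF and OBS cannot both be empty")
    if not ref:
        return "insertion"
    if not obs:
        return "deletion"
    if len(ref) == len(obs):
        return "SNV" if len(ref) == 1 else "MNV"
    return "complex"


print(classify_alleles(parse_hotspot_alleles("REF=;OBS=TT")))    # insertion
print(classify_alleles(parse_hotspot_alleles("REF=TT;OBS=")))    # deletion
print(classify_alleles(parse_hotspot_alleles("REF=TG;OBS=AA")))  # MNV
```

The sketch never reads the strand column, which matches the rule that REF and OBS are always given on the forward strand of the reference.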

Partial example of a HotSpots BED file:

track name="HSMv12.1" description="AmpliSeq Pool HSMv12.1" type=bedDetail
chr1	43815007	43815009	COSM19193	REF=TG;OBS=AA	AMPL495041
chr1	43815008	43815009	COSM18918	REF=G;OBS=T	AMPL495041
chr1	115256527	115256528	COSM585	REF=T;OBS=A	AMPL30014
chr1	115256527	115256528	COSM586	REF=T;OBS=G	AMPL30014
chr1	115256527	115256529	COSM33693	REF=TT;OBS=CC	AMPL30014
chr1	115256527	115256529	COSM30646	REF=TT;OBS=CA	AMPL30014
chr1	115256527	115256530	COSM53223	REF=TTG;OBS=CTT	AMPL30014
chr1	115256528	115256529	COSM583	REF=T;OBS=A	AMPL30014
chr1	115256528	115256529	COSM584	REF=T;OBS=C	AMPL30014
chr1	115256528	115256529	COSM582	REF=T;OBS=G	AMPL30014
chr1	115256528	115256530	COSM12725	REF=TG;OBS=AA	AMPL30014
chr1	115256528	115256530	COSM579	REF=TG;OBS=CT	AMPL30014

The REF=;OBS= field is required, as is the track line.



Extended BED Detail format

Beginning with the 3.0 release, AmpliSeq.com uses this format for the following fixed panels:

  • CCP
  • CFTR
  • CHP v2
  • Ion AmpliSeq Exome

New fixed panels introduced after the AmpliSeq.com 3.0 release also follow this format. Other panels, and all panels from previous releases, do not use this format.

The Extended BED Detail format contains two additional fields (at the end of each line):

Name         Values                                                    Description
Id           Any string, if supplied by the user, or '.'               User-supplied name or ID for the region.
Description  Key-value pairs separated by semicolons, or '.' if empty  Contains a '.' or one or more of the following keys: GENE_ID=, SUBMITTED_REGION=, Pool=.

This table describes the key-value pairs that are supported in the Description column:

Key Description
GENE_ID

A gene symbol or comma-separated list of gene symbols. If no gene symbol is available, this key is absent.

Example: GENE_ID=brca1

Example: GENE_ID=brca1,ret

Pool

The AmpliSeq.com pool or pools containing this amplicon.

Example: Pool=2

If an amplicon is present in multiple pools, the pools are delimited with a comma (","), with the primary pool listed first. For example, if an amplicon is present in pools 1 and 3, and 1 is the primary pool, the entry is: Pool=1,3.

Single-pool designs do not include the Pool= key-value pair.

SUBMITTED_REGION

The region name provided by the user during the AmpliSeq.com design process. If a region name is not provided, this key is absent.

Example: SUBMITTED_REGION=Q1

CNV_ID

A gene symbol used to specify a copy number region for the CNV PCA algorithm. This key takes precedence over GENE_ID, and one CNV_ID can span multiple GENE_IDs.

CNV_HS

A CNV region hotspot flag. The value is either 0 or 1. A value of 1 is reported as a hotspot (HS) in the output VCF file from the CNV PCA algorithm. A value of 0 is not reported as a hotspot.

The Extended BED Detail format requires a track line with both type=bedDetail and ionVersion=4.0. The Torrent Suite Software BED validator treats these fields (Id and Description) as optional.
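
The Description column can be split into its key-value pairs with a short helper. The following is a sketch only; the function names `parse_description` and `pool_list` are hypothetical, and returning an empty dictionary for '.' is a design choice of this example rather than documented behavior.

```python
def parse_description(field):
    """Split a Description field such as 'GENE_ID=CFTR;Pool=1' into a dict."""
    if field == ".":
        return {}  # '.' means the field is empty
    pairs = {}
    for item in field.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def pool_list(pairs):
    """Return the pools as strings, primary pool first; empty if Pool is absent."""
    return [p for p in pairs.get("Pool", "").split(",") if p]


pairs = parse_description("GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,31")
print(pairs["GENE_ID"])  # CFTR
print(pool_list(pairs))  # ['1']
print(pool_list(parse_description("GENE_ID=TNFRSF14")))  # []
```

The empty list in the last line corresponds to a single-pool design, where the Pool= key is not present.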

Examples from BED files in the Extended BED Detail format

This example shows the GENE_ID= and Pool= keys:

track name="4477685_CCP" description="Amplicon_Insert_4477685_CCP" type=bedDetail  ionVersion=4.0
chr1   2488068    2488201    242431688  .    GENE_ID=TNFRSF14;Pool=2
chr1   2489144    2489273    262048751  .    GENE_ID=TNFRSF14;Pool=4
chr1   2489772    2489907    241330530  .    GENE_ID=TNFRSF14;Pool=1
chr1   2491241    2491331    242158034  .    GENE_ID=TNFRSF14;Pool=3
            
          

This example is from the CFTR designed.bed file:

          
track type=bedDetail ionVersion=4.0 name="CFTRexon0313_Designed" description="Amplicon_Insert_CFTRexon0313"
chr7   117119916     117120070    CFTR_1.91108    .     GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,31
chr7   117120062     117120193    CFTR_1.38466    .     GENE_ID=CFTR;Pool=2;SUBMITTED_REGION=1
chr7   117120186     117120304    AMPL244371551   .     GENE_ID=CFTR;Pool=1;SUBMITTED_REGION=1,32
            



Merged Extended BED Detail format files

When two records overlap, those records are merged during upload into Torrent Suite Software. An ampersand (&) is the delimiter between multiple values in merged files.

Example 1

When these two GENE_ID fields appear in overlapping records:

GENE_ID=raf

GENE_ID=brca1

The merged GENE_ID field is:

GENE_ID=raf&brca1

Example 2

When these two GENE_ID fields appear in overlapping records:

GENE_ID=raf

GENE_ID=brca1,ret

The merged GENE_ID field is:

GENE_ID=raf&brca1,ret
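
The two examples above can be reproduced with a small sketch. It only illustrates the delimiter convention ("&" between records, "," within a record) and does not show how Torrent Suite Software decides which records overlap; the function names `merge_values` and `split_merged` are hypothetical.

```python
def merge_values(first, second):
    """Join the values of one key from two overlapping records with '&'."""
    return f"{first}&{second}"


def split_merged(value):
    """Split a merged value into one list of comma-separated items per record."""
    return [record.split(",") for record in value.split("&")]


print(merge_values("raf", "brca1"))      # raf&brca1
print(merge_values("raf", "brca1,ret"))  # raf&brca1,ret
print(split_merged("raf&brca1,ret"))     # [['raf'], ['brca1', 'ret']]
```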

The score and strand fields in uploaded BED files

Uploaded BED files are converted to add score and strand columns, with the default values 0 and +. You see these values in BED files that you download from Torrent Suite Software:

track type=bedDetail name="BRCA1.BRCA2_HotSpots" description="BRCA_HOTSPOT_ALLELES" allowBlockSubstitutions=true
chr13 32890649    32890650    COSM35423   0     +     REF=G;OBS=A       AMPL223487194
chr13 32893206    32893207    COSM23930   0     +     REF=T;OBS=        AMPL223519297
chr13 32893221    32893221    COSM23939   0     +     REF=;OBS=CCAATGA  AMPL223519297
chr13 32893290    32893291    COSM172578  0     +     REF=G;OBS=T       AMPL223521074
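
The column layout of the converted file can be illustrated with a short sketch. It is not the conversion code used by Torrent Suite Software; it only shows, for a 6-column HotSpots line, where the score and strand columns appear in the downloaded example above (after the fourth column). The function name `add_score_and_strand` is hypothetical.

```python
def add_score_and_strand(line, score="0", strand="+"):
    """Insert default score and strand columns after the name column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    return "\t".join(fields[:4] + [score, strand] + fields[4:])


original = "chr13\t32890649\t32890650\tCOSM35423\tREF=G;OBS=A\tAMPL223487194"
print(add_score_and_strand(original).split("\t"))
```

The printed list has eight entries, with "0" and "+" in the fifth and sixth positions.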