bigChain Track Format
The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously,
just as chain files do; however, bigChain files are compressed and indexed
as bigBeds. Chain files are converted to bigChain files using the program bedToBigBed, run with
the -as option to pull in a special autoSql (.as) file that defines the fields of the bigChain.
The bigChain files are in an indexed binary format. The main advantage of this format is that only
those portions of the file needed to display a particular region are transferred to the Genome
Browser server. Because of this, bigChain files have considerably faster display performance than
regular chain files when working with large data sets. The bigChain file remains on your local
web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for
the currently displayed chromosomal position is locally cached as a "sparse file".
bigChain format definition
The following autoSql definition is used to specify bigChain pairwise alignment files. This
definition, contained in the file bigChain.as, will be pulled in when the bedToBigBed utility is
run with the -as=bigChain.as option.
table bigChain
"bigChain pairwise alignment"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name or ID of item, ideally both human readable and unique"
uint score; "Score (0-1000)"
char[1] strand; "+ or - for strand"
uint tSize; "size of target sequence"
string qName; "name of query sequence"
uint qSize; "size of query sequence"
uint qStart; "start of alignment on query sequence"
uint qEnd; "end of alignment on query sequence"
uint chainScore; "score from chain"
)
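As a rough illustration of the definition above, the following sketch checks that a tab-separated row has the 12 fields in the listed order and that a few of them are plausible. The sample row and its values are hypothetical, and the checks are a minimal sanity test, not a replacement for the validation that bedToBigBed performs.

```shell
# Hypothetical sample row in bigChain column order (values are illustrative only).
row=$(printf 'chr22\t100\t200\t7\t1000\t+\t5000\tchr15\t6000\t300\t400\t12345')

# Check: exactly 12 tab-separated fields, chromStart < chromEnd <= tSize,
# score within 0-1000, and strand is + or -.
result=$(printf '%s\n' "$row" | awk -F'\t' '
    NF != 12                 { print "bad field count"; next }
    !($2 < $3 && $3 <= $7)   { print "bad target coordinates"; next }
    $5 < 0 || $5 > 1000      { print "bad score"; next }
    $6 != "+" && $6 != "-"   { print "bad strand"; next }
                             { print "ok" }')
echo "$result"
```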
Note that the bedToBigBed utility uses a substantial amount of memory: approximately
25% more RAM than the uncompressed BED input file.
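To get a feel for that figure before a conversion, the input size can be scaled by 1.25. The sketch below does this for a small stand-in file; the file content is hypothetical, and the result is only the rough estimate implied by the note above, not a measured requirement.

```shell
# Create a small stand-in input file (hypothetical content).
input=$(mktemp)
printf 'chr22\t100\t200\t7\t1000\t+\t5000\tchr15\t6000\t300\t400\t12345\n' > "$input"

# Size of the uncompressed input in bytes, then add 25% (integer arithmetic).
bytes=$(wc -c < "$input" | tr -d ' ')
estimate=$(( bytes * 125 / 100 ))
echo "input: ${bytes} bytes; estimated RAM needed: ~${estimate} bytes"
rm -f "$input"
```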
Creating a bigChain track
To create a bigChain track, follow these steps:
Step 1.
If you already have a chain file you would like to convert to a bigChain, skip to Step 2.
Otherwise download this example
chain file for the human GRCh38 (hg38) assembly.
Step 2.
Download these autoSql files needed by bedToBigBed: bigChain.as and bigLink.as.
Step 3.
Download the bedToBigBed and hgLoadChain programs from the UCSC binary utilities directory.
Step 4.
Use the fetchChromSizes script from the same directory to create a
chrom.sizes file for the UCSC database with which you are working (e.g., hg38).
Alternatively, you can download the
chrom.sizes file for any assembly hosted at UCSC from our
downloads page (click on "Full
data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38
database is located at
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
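Whichever way the chrom.sizes file is obtained, it can be worth confirming that it has the expected shape before passing it to bedToBigBed. The sketch below assumes the usual two-column layout (sequence name, then length, separated by a tab). The sizes in it are made-up placeholders, not real hg38 values.

```shell
# A chrom.sizes file has two tab-separated columns: sequence name and length.
# The sizes below are made-up placeholders, not real hg38 values.
sizes=$(mktemp)
printf 'chr22\t5000\nchr22_KI270731v1_random\t3000\n' > "$sizes"

# Count lines that do not have a name plus a positive integer length.
bad=$(awk -F'\t' 'NF != 2 || $2 !~ /^[0-9]+$/ || $2 <= 0 { n++ }
                  END { print n + 0 }' "$sizes")
echo "malformed lines: $bad"
rm -f "$sizes"
```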
Step 5.
Use the hgLoadChain utility to generate the chain.tab and link.tab
files needed to create the bigChain file:
hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain
Step 6.
Create the bigChain file from your input chain file using a combination of sed, awk
and the bedToBigBed utility:
sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb
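To see what the sed and awk stage does to one row, the sketch below runs it on a single hypothetical chain.tab line. It assumes the chain.tab columns are score, tName, tSize, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd and id, which is the order implied by mapping the awk fields onto bigChain.as. All values are illustrative.

```shell
# Hypothetical chain.tab row (assumed column order: score, tName, tSize, tStart,
# tEnd, qName, qSize, qStrand, qStart, qEnd, id).
chain_row=$(printf '12345.000000\tchr22_KI270731v1_random\t5000\t100\t200\tchr15\t6000\t+\t300\t400\t7')

# Strip the fractional part of the score, then reorder into bigChain column order
# with a fixed BED score of 1000.
big_row=$(printf '%s\n' "$chain_row" \
  | sed 's/\.000000//' \
  | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}')
printf '%s\n' "$big_row"
```

The chain id becomes the BED name field, and the original chain score moves to the last column (chainScore).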
Step 7.
To display your data in the Genome Browser, you must also create a binary indexed link file to
accompany your bigChain file:
awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink
bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb
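The awk and sort stage of this step can be tried on a couple of rows without the UCSC utilities. The sketch below assumes link.tab holds tName, tStart, tEnd, qStart and chain id in that order (an assumption inferred from the awk field mapping, since bigLink.as is not reproduced here). The rows are hypothetical and deliberately out of order to show the effect of the sort.

```shell
# Two hypothetical link.tab rows (assumed order: tName, tStart, tEnd, qStart, chainId),
# given in reverse coordinate order on purpose.
link_rows=$(printf 'chr22\t150\t200\t350\t7\nchr22\t100\t140\t300\t7')

# Swap the last two columns so the chain id becomes the BED name field,
# then sort by chromosome and numeric start position.
big_links=$(printf '%s\n' "$link_rows" \
  | awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' \
  | sort -k1,1 -k2,2n)
printf '%s\n' "$big_links"
```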
Step 8.
Move the newly created bigChain (bigChain.bb) and bigLink (bigChain.link.bb)
files to a web-accessible http, https or ftp location.
Step 9.
Construct a custom track using a single
track line. Note that any of the track attributes listed
here are applicable to tracks of type bigBed. The most basic
version of the track line will look something like this:
track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb
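Because the two URLs normally share a location, the track line can be assembled from a base URL. This is only a convenience sketch: the variable names are hypothetical, and the base URL is the placeholder from the example above.

```shell
# Placeholder location from the example above; replace with your own server path.
base_url="http://myorg.edu/mylab"
track_name="My Big Chain"

# Both the bigChain file and its companion link file must be referenced.
track_line="track type=bigChain name=\"${track_name}\" bigDataUrl=${base_url}/bigChain.bb linkDataUrl=${base_url}/bigChain.link.bb"
echo "$track_line"
```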
Step 10.
Paste the custom track line into the text box on the
custom track management page.
The bedToBigBed program can be run with several additional options. For a full
list of the available options, type bedToBigBed (with no arguments) on the command line
to display the usage message.
Examples
Example #1
In this example, you will create a bigChain custom track using an existing bigChain file,
bigChain.bb, located on the UCSC Genome Browser http server. This file contains data for
the hg38 assembly.
To create a custom track using this bigChain file:
- Construct a track line that references the file:
track type=bigChain name="bigChain Example One" description="A bigChain file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb
- Paste the track line into the custom track management
page for the human assembly hg38 (Dec. 2013).
- Click the "submit" button.
Custom tracks can also be loaded via one URL line.
This link loads the same bigChain.bb track and sets additional display parameters in the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack
After this example bigChain is loaded in the Genome Browser, click into a chain on the browser's
track display. Note that the details page displays information about the individual chains, similar
to that which is available for a standard chain track.
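The one-line URL form is the track line with each space written as %20 and appended to the hgct_customText parameter. A minimal sketch of that substitution follows. It only handles spaces, which is all the example above needs, and is not a general URL encoder.

```shell
# The track line from the URL example above, written with ordinary spaces.
track_line='track type=bigChain name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb visibility=pack'

# Replace each space with %20 and append to the hgTracks URL.
encoded=$(printf '%s' "$track_line" | sed 's/ /%20/g')
url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=${encoded}"
echo "$url"
```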
Example #2
In this example, you will create your own bigChain file from an existing chain input file.
- Save this chain file to your
computer (Step 1 in Creating a bigChain track, above).
- Save the autoSql files bigChain.as and
bigLink.as to your computer (Step 2,
above).
- Download the bedToBigBed and hgLoadChain
utilities (Step 3, above).
- Save the hg38.chrom.sizes text file to your computer. This
file contains the chrom.sizes for the human hg38 assembly (Step 4, above).
- Run the utilities in Steps 5-7, above, to create the bigChain and bigLink output
files.
- Place the newly created bigChain (bigChain.bb) and bigLink
(bigChain.link.bb) files on a web-accessible server (Step 8).
- Construct a track line that points to the bigChain file (Step 9, above).
- Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser
(Step 10, above).
Sharing your data with others
If you would like to share your bigChain data track with a colleague, learn how to create a URL by
looking at Example 11 on this page.
Extracting data from the bigChain format
Because the bigChain files are an extension of bigBed files, which are indexed binary files, it can
be difficult to extract data from them. UCSC has developed the following programs to assist
in working with bigBed formats, available from the
binary utilities directory.
- bigBedToBed: converts a bigBed file to ASCII BED format.
- bigBedSummary: extracts summary information from a bigBed file.
- bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the
command line to view the usage statement.
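Once a bigChain file has been converted back to text, ordinary tools can summarize it. The sketch below assumes the rows come from a command of the form bigBedToBed bigChain.bb bigChain.bed and keep the 12-column bigChain layout. The two rows used here are hypothetical stand-ins for that output.

```shell
# With the UCSC utility installed, the text rows would come from:
#   bigBedToBed bigChain.bb bigChain.bed
# Here two hypothetical rows stand in for that output.
bed=$(printf 'chr22\t100\t200\t7\t1000\t+\t5000\tchr15\t6000\t300\t400\t12345\nchr22\t500\t900\t8\t1000\t-\t5000\tchr15\t6000\t1000\t1400\t999')

# Per query sequence (column 8): number of chains and total target span covered.
summary=$(printf '%s\n' "$bed" | awk -F'\t' '
    { n[$8]++; span[$8] += $3 - $2 }
    END { for (q in n) printf "%s\t%d\t%d\n", q, n[q], span[q] }')
printf '%s\n' "$summary"
```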
Troubleshooting
If you encounter an error when you run the bedToBigBed program, check your input
file for data coordinates that extend past the end of the chromosome. If these are present, run
the bedClip program (available here) to remove the problematic
row(s) in your input file before running the bedToBigBed program.
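Before running bedClip, it can help to see which rows are out of bounds. The following sketch compares each row's end coordinate with the matching entry in a chrom.sizes file. The file contents are hypothetical, and the check only reports rows; it does not modify the input the way bedClip does.

```shell
# Hypothetical chrom.sizes and BED-like input; the second row runs past the end.
sizes=$(mktemp); bed=$(mktemp)
printf 'chr22\t5000\n' > "$sizes"
printf 'chr22\t100\t200\t7\nchr22\t4900\t5100\t8\n' > "$bed"

# First file: load sizes. Second file: print the name of any row whose end
# exceeds the chromosome size or whose chromosome is missing from chrom.sizes.
offending=$(awk -F'\t' 'NR == FNR { size[$1] = $2; next }
    !($1 in size) || $3 + 0 > size[$1] + 0 { print $4 }' "$sizes" "$bed")
echo "rows past chromosome end: $offending"
rm -f "$sizes" "$bed"
```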