bigGenePred Track Format
The bigGenePred format stores annotation items that are a linked collection of exons, much as
BED files indexed as bigBeds do. However, the
bigGenePred format includes 8 additional fields that contain details about coding frames and other
gene-specific information.
The bigGenePred files are created using the program bedToBigBed, run with the -as option to pull
in a special autoSql (.as) file that defines the extra fields of the bigGenePred.
The bigGenePred files are in an indexed binary format. The main advantage of this format is that
only those portions of the file needed to display a particular region are transferred to the Genome
Browser server. Because of this, indexed binary files have considerably faster display performance
than regular BED format files when working with large data sets. The bigGenePred file remains on
your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion
needed for the currently displayed chromosomal position is locally cached as a "sparse
file".
bigGenePred file definition
The following autoSql definition is used to specify bigGenePred gene prediction files. This
definition, contained in the file bigGenePred.as, is pulled in when the bedToBigBed utility is run
with the -as=bigGenePred.as option.
table bigGenePred
"bigGenePred gene models"
    (
    string chrom;                 "Reference sequence chromosome or scaffold"
    uint chromStart;              "Start position in chromosome"
    uint chromEnd;                "End position in chromosome"
    string name;                  "Name or ID of item, ideally both human-readable and unique"
    uint score;                   "Score (0-1000)"
    char[1] strand;               "+ or - for strand"
    uint thickStart;              "Start of where display should be thick (start codon)"
    uint thickEnd;                "End of where display should be thick (stop codon)"
    uint reserved;                "RGB value (use R,G,B string in input file)"
    int blockCount;               "Number of blocks"
    int[blockCount] blockSizes;   "Comma separated list of block sizes"
    int[blockCount] chromStarts;  "Start positions relative to chromStart"
    string name2;                 "Alternative/human readable name"
    string cdsStartStat;          "enum('none','unk','incmpl','cmpl')"
    string cdsEndStat;            "enum('none','unk','incmpl','cmpl')"
    int[blockCount] exonFrames;   "Exon frame {0,1,2}, or -1 if no frame for exon"
    string type;                  "Transcript type"
    string geneName;              "Primary identifier for gene"
    string geneName2;             "Alternative/human-readable gene name"
    string geneType;              "Gene type"
    )
UCSC provides an example bigGenePred (bed12+8) input file, bigGenePred.txt. In
alternative-splicing situations, each transcript has its own row.
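The field list above can be checked mechanically before running bedToBigBed. The following Python sketch is one way to do that, assuming a tab-separated input line with exactly the 20 fields of the autoSql definition. The function names and the sample row are hypothetical and made up for illustration; they are not part of any UCSC tool or data set.

```python
# Field names in the order given by the bigGenePred autoSql definition.
FIELD_NAMES = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "type", "geneName", "geneName2", "geneType",
]
INT_FIELDS = ("chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount")
LIST_FIELDS = ("blockSizes", "chromStarts", "exonFrames")
CDS_STATS = {"none", "unk", "incmpl", "cmpl"}


def parse_int_list(text):
    """Parse a comma-separated list such as '300,400,' (a trailing comma is tolerated)."""
    return [int(item) for item in text.split(",") if item != ""]


def parse_row(line):
    """Split one tab-separated bed12+8 line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    row = dict(zip(FIELD_NAMES, values))
    for key in INT_FIELDS:
        row[key] = int(row[key])
    for key in LIST_FIELDS:
        row[key] = parse_int_list(row[key])
    return row  # 'reserved' stays a string because the input may hold an R,G,B value


def check_row(row):
    """Return a list of problems found in a parsed row (empty if none were found)."""
    problems = []
    for key in LIST_FIELDS:
        if len(row[key]) != row["blockCount"]:
            problems.append(f"{key} has {len(row[key])} entries, blockCount is {row['blockCount']}")
    for key in ("cdsStartStat", "cdsEndStat"):
        if row[key] not in CDS_STATS:
            problems.append(f"{key} has unexpected value {row[key]!r}")
    if any(frame not in (-1, 0, 1, 2) for frame in row["exonFrames"]):
        problems.append("exonFrames values must be -1, 0, 1 or 2")
    if row["strand"] not in ("+", "-"):
        problems.append(f"strand has unexpected value {row['strand']!r}")
    return problems


# A made-up two-exon transcript, for illustration only.
EXAMPLE_ROW = "\t".join([
    "chr1", "1000", "2000", "tx1", "0", "+", "1099", "1900", "0", "2",
    "300,400,", "0,600,", "altTx1", "cmpl", "cmpl", "0,0,",
    "protein_coding", "GENE1", "gene one", "protein_coding",
])

print(check_row(parse_row(EXAMPLE_ROW)))
```

The checks cover only what the autoSql comments state (list lengths, the cdsStartStat/cdsEndStat values, the allowed frames, and the strand); bedToBigBed may apply further checks of its own.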
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM
than the size of the uncompressed BED input file.
Creating a bigGenePred track
To create a bigGenePred track, follow these steps:
Step 1.
Create a bigGenePred file. The first 12 fields of the bigGenePred bed12+8 format are the 12
fields of the basic BED file format. (The genePred format is documented separately.)
Your bigGenePred file must also contain the 8 extra fields described in the autoSql file definition
shown above: name2, cdsStartStat, cdsEndStat, exonFrames, type, geneName, geneName2, geneType.
Your bigGenePred file must be sorted first on the chrom field, and secondarily on the chromStart
field. You can use the UNIX sort command to do this:
sort -k1,1 -k2,2n unsorted.bed > input.bed
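Where the UNIX sort command is not available, the same ordering can be approximated in a few lines of Python. This is a minimal sketch, not a replacement for sort: it assumes whitespace-separated fields and compares chromosome names by code point, which corresponds to running sort in the C locale. The function name is hypothetical.

```python
def sort_bed_lines(lines):
    """Order BED lines by chrom (code-point order), then by chromStart (numeric)."""
    def sort_key(line):
        # Only the first two fields matter for the ordering.
        chrom, chrom_start = line.split(None, 2)[:2]
        return (chrom, int(chrom_start))

    data_lines = [line for line in lines if line.strip()]  # skip blank lines
    return sorted(data_lines, key=sort_key)


unsorted = [
    "chr2\t5\t10\n",
    "chr10\t7\t9\n",
    "chr1\t100\t200\n",
    "chr1\t20\t30\n",
]
for line in sort_bed_lines(unsorted):
    print(line, end="")
```

Note that chromosome names are compared as text, so chr10 sorts before chr2, just as it does with sort -k1,1.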
Step 2.
Download the bedToBigBed program from the binary utilities directory.
Step 3.
Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC
database with which you are working (e.g., hg38).
Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from
our downloads page (click on "Full
data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38
database is located at
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
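A chrom.sizes file is easy to load for later checks. The sketch below assumes the usual layout of one sequence per line, with the sequence name and its length in bases separated by whitespace; the function name is hypothetical and the two sample lines are illustrative.

```python
def read_chrom_sizes(lines):
    """Build a {sequence name: length} dict from the lines of a chrom.sizes file."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        sizes[fields[0]] = int(fields[1])
    return sizes


# Illustrative lines in the assumed two-column layout.
sample = ["chr1\t248956422\n", "chrM\t16569\n"]
sizes = read_chrom_sizes(sample)
print(sizes["chrM"])

# With a downloaded file, the same function can read it directly:
# with open("hg38.chrom.sizes") as handle:
#     sizes = read_chrom_sizes(handle)
```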
Step 4.
Create the bigGenePred file from your sorted input file using the bedToBigBed utility:
bedToBigBed -as=bigGenePred.as -type=bed12+8 bigGenePred.txt chrom.sizes myBigGenePred.bb
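When the conversion is part of a script, the command above can be assembled as an argument list and handed to Python's subprocess module. This sketch uses only the options that appear on this page (-as, -type, and the -tab option shown in Example #3); the function name and its defaults are hypothetical.

```python
import shlex


def bed_to_big_bed_args(bed_path, chrom_sizes_path, out_path,
                        as_path="bigGenePred.as", tab=False):
    """Return the bedToBigBed argument list for a bed12+8 bigGenePred input."""
    args = ["bedToBigBed", f"-as={as_path}", "-type=bed12+8"]
    if tab:
        args.append("-tab")  # treat the input as tab-separated, as in Example #3
    args += [bed_path, chrom_sizes_path, out_path]
    return args


args = bed_to_big_bed_args("bigGenePred.txt", "chrom.sizes", "myBigGenePred.bb")
print(shlex.join(args))

# To run the command (requires bedToBigBed on the PATH):
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.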
Step 5.
Move the newly created bigGenePred file (myBigGenePred.bb) to a web-accessible http, https,
or ftp location.
Step 6.
Construct a custom track using a single track line. Note that any of the standard custom track
line attributes are applicable to tracks of type bigBed. The basic version of the track line will
look something like this:
track type=bigGenePred name="My Big GenePred" description="A Gene Set Built from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigGenePred.bb
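If track lines are generated for many files, a small helper keeps the quoting consistent. The sketch below is one possible approach, assuming that values containing spaces are wrapped in double quotes as in the line above; the function name is hypothetical.

```python
def make_track_line(**attributes):
    """Join key=value pairs into a track line, quoting values that contain spaces."""
    parts = ["track"]
    for key, value in attributes.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


track_line = make_track_line(
    type="bigGenePred",
    name="My Big GenePred",
    description="A Gene Set Built from Data from My Lab",
    bigDataUrl="http://myorg.edu/mylab/myBigGenePred.bb",
)
print(track_line)
```

Keyword arguments keep their order, so the attributes appear in the track line in the order they are passed.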
Step 7.
Paste this custom track line into the text box on the custom
track management page.
The bedToBigBed program can be run with several additional options. For a full list of the
available options, type bedToBigBed (with no arguments) on the command line to display the usage
message.
Examples
Example #1
In this example, you will create a bigGenePred custom track using a bigGenePred file,
bigGenePred.bb, located on the UCSC Genome Browser http server. This file contains data for
the hg38 assembly.
To create a custom track using this bigGenePred file:
- Construct a track line that references the file:
track type=bigGenePred name="bigGenePred Example One" description="A bigGenePred file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
- Paste the track line into the custom track management page for the human assembly hg38
  (Dec. 2013).
- Click the "submit" button.
Custom tracks can also be loaded via one URL line. The link below loads the same bigGenePred track
and sets additional parameters in the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigGenePred%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
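A URL of this form can be built by percent-encoding the track line and appending it to the hgct_customText parameter. The following sketch uses Python's standard urllib.parse.quote; the function name is hypothetical, and the choice to leave "=", ":" and "/" unencoded simply mirrors the URL shown above.

```python
from urllib.parse import quote


def custom_track_url(db, track_line, base="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Return an hgTracks URL that loads the given track line as a custom track."""
    # Spaces become %20; '=', ':' and '/' are kept as-is, matching the example URL.
    encoded = quote(track_line, safe="=:/")
    return f"{base}?db={db}&hgct_customText={encoded}"


url = custom_track_url(
    "hg38",
    "track type=bigGenePred name=Example "
    "bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb",
)
print(url)
```

Characters such as "&" or "#" inside the track line are percent-encoded by quote, so they cannot be mistaken for URL delimiters.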
After this example bigGenePred track is loaded in the Genome Browser, click on a gene in the
browser's track display to view the details page for that gene. Note that the page offers links to
several sequence types, including translated protein, predicted mRNA, and genomic
sequence.
Example #2
In this example, you will configure the bigGenePred track loaded in Example #1 to display codons and
amino acid numbering:
- On the track's gene details page, click the "Go to ... track controls" link.
- Change the "Color track by codons:" option from "OFF" to "genomic codons" and check that the
  display mode is set to "full".
- Click "submit".
- On the Genome Browser tracks display, zoom to a track region where amino acids display, such as
  chr9:133,255,650-133,255,700, and note that the track now displays codons.
- Return to the track controls page and click the box next to "Show codon numbering", then click
  "submit".
- The browser tracks display will now show amino acid numbering.
You can also add the parameter baseColorDefault=genomicCodons to the custom track line to display
codons by default:
browser position chr10:67,884,600-67,884,900
track type=bigGenePred baseColorDefault=genomicCodons name="bigGenePred Example Two" description="A bigGenePred file" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
Paste the above into the hg38 custom track management
page to view an example of bigGenePred amino acid display at the beginning of the SIRT1 gene on
chromosome 10.
Example #3
In this example, you will create your own bigGenePred file from an existing bigGenePred input
file.
- Save the example bed12+8 input file, bigGenePred.txt, to your computer (Step 1 in Creating a
  bigGenePred track, above).
- Download the bedToBigBed utility (Step 2, above).
- Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the
  human hg38 assembly (Step 3, above).
- Save the autoSql file bigGenePred.as to your computer.
- Run the bedToBigBed utility to create the bigGenePred output file (Step 4, above):
bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePred.txt hg38.chrom.sizes bigGenePred.bb
- Place the newly created bigGenePred file (bigGenePred.bb) on a web-accessible server (Step 5,
  above).
- Construct a track line that points to the bigGenePred file (Step 6, above).
- Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser
  (Step 7, above).
Sharing your data with others
If you would like to share your bigGenePred data track with a colleague, learn how to create a URL
by looking at Example #11 on this page.
Extracting data from bigBed format
Because the bigGenePred files are an extension of bigBed files, which are indexed binary files, it
can be difficult to extract data from them. UCSC has developed the following programs to
assist in working with bigBed formats, available from the
binary utilities directory.
- bigBedToBed: converts a bigBed file to ASCII BED format.
- bigBedSummary: extracts summary information from a bigBed file.
- bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the
command line to view the usage statement.
Troubleshooting
If you encounter an error when you run the bedToBigBed program, check your input file for data
coordinates that extend past the end of the chromosome. If these are present, run the bedClip
program (available from the binary utilities directory) to remove the problematic row(s) before
running the bedToBigBed program.
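To find out whether an input file has this problem before running bedClip, the rows can be compared against the chrom.sizes values. The sketch below is a hypothetical helper, not a description of how bedClip works: it assumes tab-separated BED lines and a dict of sequence lengths, and it only reports rows whose chromEnd is larger than the sequence length.

```python
def rows_past_chrom_end(bed_lines, chrom_sizes):
    """Return (line number, line) pairs whose chromEnd exceeds the chromosome length."""
    flagged = []
    for number, line in enumerate(bed_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a data row
        chrom, chrom_end = fields[0], int(fields[2])
        size = chrom_sizes.get(chrom)
        # Sequences missing from chrom_sizes are skipped here, not flagged.
        # BED ends are exclusive, so chromEnd equal to the length is still valid.
        if size is not None and chrom_end > size:
            flagged.append((number, "\t".join(fields)))
    return flagged


# Made-up sizes and rows, for illustration only.
sizes_example = {"chr1": 1000}
rows_example = [
    "chr1\t0\t1000\ta\n",
    "chr1\t900\t1001\tb\n",
    "chrX\t0\t10\tc\n",
]
print(rows_past_chrom_end(rows_example, sizes_example))
```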