bigMaf Track Format
The bigMaf format stores multiple alignments in a format compatible with MAF files, which is then
compressed and indexed as a bigBed.
The bigMaf files are created using the program bedToBigBed, run with the -as option to pull in a
special autoSql (.as) file that defines the fields of the bigMaf.
The bigMaf files are in an indexed binary format. The main advantage of this format is that only
those portions of the file needed to display a particular region are transferred to the Genome
Browser server. Because of this, bigMaf files have considerably faster display performance than
regular MAF files when working with large data sets. The bigMaf file remains on your local
web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for
the currently displayed chromosomal position is locally cached as a "sparse file".
bigMaf file definition
The following autoSql definition is used to specify bigMaf multiple alignment files. This
definition, contained in the file bigMaf.as, is pulled in when the bedToBigBed utility is run with
the -as=bigMaf.as option.
table bedMaf
"Bed3 with MAF block"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
lstring mafBlock; "MAF block"
)
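To make the four-field layout concrete, the sketch below writes one made-up input row and checks it with awk. The coordinates, species names, and sequence text are hypothetical, and joining the lines of the MAF block with ';' so that the block fits in a single field is an assumption about how the converted input is laid out, not something stated in the definition above.

```shell
#!/bin/sh
# Hypothetical example row: chrom, chromStart, chromEnd, mafBlock (tab-separated).
# Assumption: the lines of the MAF block are joined with ';' into one field.
row_file=$(mktemp)
printf 'chr22_KI270731v1_random\t100\t110\ta score=0.0;s hg38.chr22_KI270731v1_random 100 10 + 150754 ACGTACGTAC;s panTro4.chr22 500 10 + 50000000 ACGTACGTAC;\n' > "$row_file"

# Check: exactly 4 tab-separated fields, numeric coordinates, start < end.
row_check=$(awk -F '\t' '
    NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { bad++ }
    END { print (bad ? "invalid" : "ok") }
' "$row_file")
echo "$row_check"
```

The same awk check can be pointed at a real input file in place of the temporary one to catch rows with missing fields before running bedToBigBed.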
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM
than the uncompressed BED input file.
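As a rough planning aid, the 25% figure can be turned into a byte estimate from the size of the input file. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the 1000-byte stand-in file exists purely to demonstrate the arithmetic.

```shell
#!/bin/sh
# Rough RAM estimate for bedToBigBed: input file size plus about 25%.
estimate_bed_to_bigbed_ram() {
    # $1: path to the uncompressed BED-style input file
    size_bytes=$(wc -c < "$1")
    echo $(( size_bytes * 125 / 100 ))
}

# Demonstration on a stand-in file of exactly 1000 bytes (1000 spaces).
demo_input=$(mktemp)
printf '%1000s' '' > "$demo_input"
est_bytes=$(estimate_bed_to_bigbed_ram "$demo_input")
echo "estimated RAM: $est_bytes bytes"
```

Comparing the estimate against the free memory on the machine before starting a large conversion can save a failed run.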
Creating a bigMaf track
To create a bigMaf track, follow these steps:
Step 1.
If you already have a MAF file you would like to convert to a bigMaf, skip to Step 3.
Otherwise, download the example MAF file (chr22_KI270731v1_random.maf, used in the commands below)
for the human GRCh38 (hg38) assembly.
Step 2.
If you would like to include optional reading frame and block summary information, download the
chr22_KI270731v1_random.gp genePred file.
Step 3.
Download the autoSql file bigMaf.as needed by bedToBigBed. If you have opted to include the
optional frame and summary information with your bigMaf file, you must also download the autoSql
files mafSummary.as and mafFrames.as.
Step 4.
Download the bedToBigBed and mafToBigMaf programs from the UCSC binary utilities directory. If you
have opted to generate the optional frame and summary files for your multiple alignment, you must
also download the hgLoadMafSummary, genePredSingleCover, and genePredToMafFrames programs from the
same directory.
Step 5.
Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC
database with which you are working (e.g., hg38).
Alternatively, you can download the
chrom.sizes file for any assembly hosted at UCSC from our
downloads page (click on "Full
data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38
database is located at
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
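Once you have the chrom.sizes file, convert the MAF file to bigMaf input with mafToBigMaf and build
the bigMaf file with bedToBigBed:
mafToBigMaf hg38 chr22_KI270731v1_random.maf stdout | sort -k1,1 -k2,2n > bigMaf.txt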
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb
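Before running bedToBigBed on a large input, it can be worth confirming that the file is sorted and that every chromosome name appears in the chrom.sizes file. The sketch below does both with standard tools on small stand-in files. The file contents are made up, and forcing LC_ALL=C for a byte-wise sort order is an assumption about what is appropriate on your system.

```shell
#!/bin/sh
# Stand-in inputs (hypothetical values) so the checks run without any UCSC tools.
work=$(mktemp -d)
printf 'chr22_KI270731v1_random\t150754\n' > "$work/hg38.chrom.sizes"
printf 'chr22_KI270731v1_random\t100\t110\tblockA\nchr22_KI270731v1_random\t110\t130\tblockB\n' > "$work/bigMaf.txt"

# 1. Is the file sorted by chrom, then numerically by start?
if LC_ALL=C sort -c -k1,1 -k2,2n "$work/bigMaf.txt" 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi

# 2. Does every chrom in the input appear in chrom.sizes?
unknown_chroms=$(awk -F '\t' '
    NR == FNR { known[$1]; next }
    !($1 in known) { print $1 }
' "$work/hg38.chrom.sizes" "$work/bigMaf.txt" | sort -u)

echo "sorted=$sorted unknown_chroms=${unknown_chroms:-none}"
```

If the first check reports "no", re-running the sort step is usually enough. An entry in the second list points to a mismatch between the MAF reference names and the assembly's chrom.sizes file.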
Step 6.
If you opted to include the frame and summary information, run the following commands to create the
binary indexed mafFrames and mafSummary files that accompany your bigMaf file:
genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb
hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f 2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
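The cut step in the summary commands exists to drop the leading column of bigMafSummary.tab, so that each row starts with chrom, chromStart, and chromEnd. The sketch below illustrates this on two made-up rows. The column layout (a leading bin index followed by seven fields) is an assumption based on how UCSC database table dumps are typically laid out, and all values are hypothetical.

```shell
#!/bin/sh
# Made-up stand-in for bigMafSummary.tab.
# Assumed columns: bin, chrom, chromStart, chromEnd, src, score, leftStatus, rightStatus.
tab_file=$(mktemp)
printf '585\tchr22_KI270731v1_random\t2000\t3000\tpanTro4\t0.95\tC\tC\n'  > "$tab_file"
printf '585\tchr22_KI270731v1_random\t100\t1000\tpanTro4\t0.90\tC\tC\n' >> "$tab_file"

# Keep field 2 through the last field (drop the bin column), then sort by chrom and start.
summary_bed=$(cut -f 2- "$tab_file" | sort -k1,1 -k2,2n)
printf '%s\n' "$summary_bed"

# Every row should now have 7 fields: 3 BED fields plus 4 extra, matching -type=bed3+4.
field_counts=$(printf '%s\n' "$summary_bed" | awk -F '\t' '{ print NF }' | sort -u)
first_start=$(printf '%s\n' "$summary_bed" | head -n 1 | cut -f 2)
echo "fields per row: $field_counts, first start: $first_start"
```

Note that selecting a single field (field 2 only) would leave just the chromosome name on each line, which would not satisfy a bed3+4 layout.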
Step 7.
Move the newly created bigMaf file (bigMaf.bb) to a web-accessible http, https or ftp
location. If you generated the bigMafSummary.bb and/or bigMafFrames.bb files,
move those to a web-accessible location as well, typically the same location as the bigMaf.bb file.
Step 8.
Construct a custom track using a single track line. Note that any of the track attributes listed
here are applicable to tracks of type bigBed. The most basic version of the track line will look
something like this:
track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb
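If you host several files, it can be convenient to assemble the track line from variables rather than typing it by hand. A minimal sketch, assuming the hypothetical hosting location http://myorg.edu/mylab from the line above; the variable names and the output file are illustrative, and the frames= setting is the one that appears in Example #1 of this document.

```shell
#!/bin/sh
# Hypothetical hosting location; replace with where your files actually live.
base_url="http://myorg.edu/mylab"

# Basic track line, identical in form to the example above.
track_line="track type=bigMaf name=\"My Big MAF\" description=\"A Multiple Alignment\" bigDataUrl=$base_url/bigMaf.bb"

# Optional: point at the companion frames file, as done with frames= in Example #1.
track_line_with_frames="$track_line frames=$base_url/bigMafFrames.bb"

# Save the line to a scratch file so it can be pasted into the custom track text box.
track_file=$(mktemp)
printf '%s\n' "$track_line_with_frames" > "$track_file"
cat "$track_file"
```

Keeping the base URL in one variable means that moving the files to a new server requires changing a single line.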
Step 9.
Paste the custom track line into the text box on the custom track
management page.
The bedToBigBed program can be run with several additional options. For a full list of the
available options, type bedToBigBed (with no arguments) on the command line to display the usage
message.
Examples
Example #1
In this example, you will create a bigMaf custom track using an existing bigMaf file,
bigMaf.bb, located on the UCSC Genome Browser http server. This file contains data for
the hg38 assembly.
To create a custom track using this bigMaf file:
- Construct a track line that references the file:
track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb
- Paste the track line into the custom track management page for the human assembly hg38
(Dec. 2013).
- Click the "submit" button.
Note that additional track line options exist that are specific to the MAF format. For instance,
adding the parameter setting speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5" to the above
example will specify the order of sequences by species.
Custom tracks can also be loaded via one URL line.
This link loads the same bigMaf.bb track and sets additional display
parameters in the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack
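A URL of this form can be assembled from a plain track line by replacing each space with %20 and appending the result to the hgct_customText parameter. The sketch below reproduces the URL shown above. It encodes only spaces, which is all this particular example needs; a track line containing other reserved characters would need fuller URL encoding, which is not attempted here.

```shell
#!/bin/sh
# Track line with ordinary spaces, as it would appear in the custom track text box.
track_text='track type=bigMaf name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb visibility=pack'

# Encode only spaces as %20 (not a general-purpose URL encoder).
encoded=$(printf '%s' "$track_text" | sed 's/ /%20/g')

# Assemble the Genome Browser URL from the database, position, and encoded track line.
url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=$encoded"
echo "$url"
```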
After this example bigMaf is loaded in the Genome Browser, click into an alignment on the browser's
track display. Note that the details page displays information about the individual alignments,
similar to that which is available for a standard MAF track.
Example #2
In this example, you will create a bigMaf file from an existing bigMaf input file,
bigMaf.txt, located on the UCSC Genome Browser http server.
- Save the bed3+1 example file, bigMaf.txt, to your computer (Step 5, above).
- Save the autoSql file bigMaf.as to your computer (Step 3, above).
- Download the bedToBigBed utility (Step 4, above).
- Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the
human (hg38) assembly (Step 5, above).
- Run the bedToBigBed utility to create a binary indexed MAF file (Step 5, above):
bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb
- Move the newly created bigMaf file (bigMaf.bb) to a web-accessible location (Step 7, above).
- Construct a track line that points to the bigMaf file (Step 8, above).
- Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser
(Step 9, above).
Sharing your data with others
If you would like to share your bigMaf data track with a colleague, learn how to create a URL by
looking at Example 11 on this page.
Extracting data from the bigMaf format
Because bigMaf files are an extension of bigBed files, which are indexed binary files, it can
be difficult to extract data from them. UCSC has developed the following programs to assist
in working with bigBed formats, available from the
binary utilities directory.
- bigBedToBed: converts a bigBed file to ASCII BED format.
- bigBedSummary: extracts summary information from a bigBed file.
- bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the
command line to view the usage statement.
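After converting a bigMaf file back to text with bigBedToBed (for example, bigBedToBed bigMaf.bb bigMaf.bed), the alignment text sits in the fourth column of each row. The sketch below shows one way to turn that column back into MAF-style lines. It runs on a made-up row rather than real bigBedToBed output, and it assumes that the lines of each MAF block were joined with ';' when the file was built.

```shell
#!/bin/sh
# Stand-in for one row of bigBedToBed output (hypothetical values).
bed_file=$(mktemp)
printf 'chr22_KI270731v1_random\t100\t110\ta score=0.0;s hg38.chr22_KI270731v1_random 100 10 + 150754 ACGTACGTAC;s panTro4.chr22 500 10 + 50000000 ACGTACGTAC;\n' > "$bed_file"

# Take the fourth field and turn each ';' back into a newline (assumed block encoding).
maf_text=$(cut -f 4 "$bed_file" | tr ';' '\n')
printf '%s\n' "$maf_text"

# Count the recovered "a" (alignment header) and "s" (sequence) lines.
maf_line_count=$(printf '%s\n' "$maf_text" | grep -c '^[as] ')
echo "recovered MAF lines: $maf_line_count"
```

The recovered text lacks the "##maf" header line that a complete MAF file starts with, so it is a fragment for inspection rather than a finished MAF file.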
Troubleshooting
If you encounter an error when you run the bedToBigBed program, check your input file for data
coordinates that extend past the end of the chromosome. If these are present, run the bedClip
program (available from the binary utilities directory) to remove the problematic row(s) in your
input file before running the bedToBigBed program.
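To see which rows are affected before clipping, the input can be compared against the chrom.sizes file with awk. This is a diagnostic sketch on made-up data, not a replacement for bedClip: it only reports rows whose end coordinate exceeds the chromosome size or whose chromosome name is missing from chrom.sizes.

```shell
#!/bin/sh
# Stand-in inputs (hypothetical values); the second row runs past the chromosome end.
work=$(mktemp -d)
printf 'chr22_KI270731v1_random\t150754\n' > "$work/hg38.chrom.sizes"
printf 'chr22_KI270731v1_random\t100\t110\tblockA\nchr22_KI270731v1_random\t150700\t150800\tblockB\n' > "$work/bigMaf.txt"

# First file: remember each chromosome's size.
# Second file: print the line number and range of any row that is out of bounds.
bad_rows=$(awk -F '\t' '
    NR == FNR { size[$1] = $2; next }
    !($1 in size) || $3 + 0 > size[$1] + 0 { print FNR ": " $1 ":" $2 "-" $3 }
' "$work/hg38.chrom.sizes" "$work/bigMaf.txt")

echo "${bad_rows:-no out-of-range rows}"
```

An empty result suggests the bedToBigBed error has a different cause, such as an unsorted input file or a field-count mismatch with the autoSql definition.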