Genome Graphs User's Guide
Questions and feedback on this User's Guide are welcome.
User questions and answers on Genome Graphs and other topics are available in the
Genome Browser mailing list.
Introduction
Genome Graphs is a tool for displaying genome-wide data sets such as the results of genome-wide SNP
association studies, linkage studies and homozygosity mapping.
Using the Genome Graphs tool, you can:
-
upload several sets of genome-wide data and display them simultaneously
-
click on an area of interest and go directly to the genome browser at that position
-
set a significance threshold for your data and view only regions that meet that threshold
-
view the genes that exist in areas where your data meet your significance threshold
To return to Genome Graphs from any other location on the Genome Browser website, use your browser's
Back button, or click Home on the blue navigation bar, then click the Genome
Graphs link.
Note that only the "standard" chromosomes are displayed in the Genome Graphs display;
haplotype and mitochondrial chromosomes are not displayed.
This User's Guide is aimed at both novice and advanced Genome Graphs users. If
you are new to the Genome Graphs tool, read the Quick Start section to learn
the basics using some sample data. Advanced users may want to proceed directly to the section
that addresses a particular area of functionality in detail.
Formatting, uploading and importing data
Formatting data
Genome Graphs allows you to upload data from files that reside on your computer. Several file
formats are accepted by the program. For all formats there is a single line for each marker. Each
line starts with information on the marker, and ends with the numerical values associated with that
marker. The markers can be of one of the following types:
-
chromosome base: e.g.,
chr1 130000
(Note that the first base in a
chromosome is considered position 0)
-
STS Marker: e.g.,
RH75228
-
dbSNP rsID: e.g.,
rs12345
-
Affymetrix 500k Gene Chip: e.g.,
SNP_A-1780270
-
Affymetrix Genome-Wide SNP Array 6: e.g.,
SNP_A-8575125
-
Affymetrix SNP Array 6 Structural-Variation: e.g.,
CN_47396
-
Illumina HumanHap300 Bead Chip: e.g.,
rs3934834
-
Illumina HumanHap550 Bead Chip: e.g.,
rs3094315
-
Illumina HumanHap650 Bead Chip: e.g.,
rs3094315
-
Agilent CGH 244A: e.g.,
A_14_P112718
The marker-value pairs in each line of the file can be separated with a single space, a tab, or a
comma. The file can contain multiple values for each marker. In that case, a separate graph will be
created for each value column in the input file.
For example, chromosome base markers with only one value associated with the marker
would be entered like this:
chrX 100000 1.23
dbSNP rsID markers with two values associated with the marker would be entered like
this:
rs10218492 0.384 0.882
The Genome Graphs program will map the marker IDs to the genome. In cases where the marker maps to
more than one location in the genome, the value(s) in your input file will be associated with each
location.
If the value associated with your marker is positive, do not include a sign (e.g., "+").
Include a sign ("-") only if the value is negative.
Note that markers can only be mapped to assemblies for which there already exists a track of the
type that contains your marker type. You cannot, for example, use dbSNP rsID markers for the cow
genome, as it does not have a SNP track.
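The formatting rules above can be sketched in Python. This is a minimal illustration, not part of the Genome Graphs tool: the helper name format_marker_line is hypothetical, and the sketch only assembles lines in the layout described in this section (marker fields first, then one or more values, joined by a tab, comma, or single space).

```python
def format_marker_line(marker_fields, values, sep="\t"):
    """Build one input line: marker field(s) first, then the value column(s).

    marker_fields is ["chrX", 100000] for a chromosome/base marker or
    ["rs10218492"] for an ID marker. sep may be a tab, a comma, or a space.
    """
    # str() never adds a leading "+" to a positive number, while negative
    # numbers keep their "-", matching the sign rule described above.
    fields = [str(f) for f in marker_fields] + [str(v) for v in values]
    return sep.join(fields)


lines = [
    format_marker_line(["chrX", 100000], [1.23]),
    format_marker_line(["rs10218492"], [0.384, 0.882]),
    format_marker_line(["chr2", 100101000], [-1.2], sep=","),
]

for line in lines:
    print(line)
```

Each value column in the resulting file would be drawn as its own graph, so the second line above contributes to two graphs.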
Uploading data
Once you have created your input file, you must upload it to Genome Graphs. From the main Genome
Graphs page, choose the clade, genome, and assembly to which your data pertains. If you are unsure
of the UCSC assembly name, check this page. Then,
click the upload button to go to the upload page.
To upload a file in any of the supported formats, locate the file on your computer using the
controls next to "file name", and then submit. The other controls on this form are
optional, but can be used to enhance the display. In general, the controls that default to
"best guess" will not need modification, since the default guess is almost always
correct.
The controls for display min and max values and connecting lines may be adjusted later via the
configuration page. Here is a description of each control:
-
name of data set: Displayed in graph drop-down in Genome Graphs and as the track
name in Genome Browser. Only the first 16 characters are visible in some contexts. For data sets
with multiple graphs, this is the first part of the name, shared with all members of the data
set.
-
description: A short sentence describing the data set. Displayed in the Genome
Graphs and Genome Browser configuration pages, and as the center label in the Genome Browser.
-
file format: Controls whether the upload file is a tab-separated,
comma-separated, or space-separated table.
-
markers are: Describes how to map the data to chromosomes. The choices are that
either the first column of the file is an ID of some sort, or the first column is a chromosome and
the next a base. The IDs can be SNP rs numbers, STS marker names or IDs from any of the supported
genotyping platforms.
-
column labels: Controls whether the first row of the upload file is interpreted
as labels or data. If the first row contains text in the numerical fields, or if the mapping
fields are empty, it is interpreted by "best guess" as labels. This is generally
correct, but you can override this interpretation by explicitly setting the control.
-
display min value/max value: Set the range of the data set that will be plotted.
If left blank, the range will be taken from the min/max values in the data set itself. For all
data sets to share the same scale, you will usually need to set this.
-
label values: A comma-separated list of numbers for the vertical axis. If left
blank, the axis will be labeled at the 1/3 and 2/3 points of your data range.
-
draw connecting lines: Lines are drawn connecting data points that are separated
by this number of bases or fewer.
-
file name, or Paste URLs or data: Specify the data to upload: choose a
file on your local computer, enter a URL at which the data file can be found, or simply paste in the
data. If entries are made in both fields, the file name will take precedence.
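The "best guess" rule for the column labels control can be approximated as follows. This is only a sketch of the rule as stated in the description above; the function name looks_like_label_row and the n_marker_fields parameter are hypothetical, and the actual heuristic used by Genome Graphs may differ.

```python
def looks_like_label_row(fields, n_marker_fields=1):
    """Guess whether the first row of an upload file holds labels.

    fields is the first row split on the file's separator.
    n_marker_fields is 1 for ID markers and 2 for chromosome/base markers.
    """
    marker_fields = fields[:n_marker_fields]
    value_fields = fields[n_marker_fields:]

    # Empty mapping fields suggest a label row.
    if any(f.strip() == "" for f in marker_fields):
        return True

    # Text in a numerical field also suggests a label row.
    for f in value_fields:
        try:
            float(f)
        except ValueError:
            return True
    return False


print(looks_like_label_row(["marker", "pValue"]))
print(looks_like_label_row(["rs12345", "0.5"]))
```

If the guess is wrong for a particular file, the column labels control can be set explicitly on the upload page.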
Importing data
In addition to supplying your own genome-wide data files, you can also import existing database
tables from an assembly into the Genome Graphs tool. Any table containing positional information can
be imported. This includes tables of the following types: BED, PSL, wiggle, MAF, and bedGraph.
Custom track tables can be imported as well. The tables made by Genome Graphs (chromGraph) cannot
be imported; they are already in the format used by the tool, so no conversion is necessary.
All tables imported into Genome Graphs will be converted into a custom track of type chromGraph
using a window-size of 10,000 bases.
To import a table or custom track, choose the group, track, and table from the lists, then click the
submit button. The other controls are optional, though completing them will enhance the display.
The controls for display min and max values and connecting lines can be set later via the
configuration page as well. Here is a description of each control.
-
name of data set: This will be displayed in the graph list in the Genome Graphs
tool and as the track name in the Genome Browser. Only the first 16 characters are visible in some
contexts. For data sets with multiple graphs, this is the first part of the name, shared with all members of the data set.
-
description: Enter a short sentence describing the data set. It will be displayed
in the Genome Graphs tool and in the Genome Browser.
-
display min value/max value: Set the range of the data set to be plotted. If left
blank, the range will be taken from the min and max values in the data set itself. If you would
like all of your data sets to share the same scale, you will need to set this.
-
label values: A comma-separated list of numbers for the vertical axis. If left
blank, the axis will be labeled at the 1/3 and 2/3 points of your data range.
-
draw connecting lines: Lines connecting data points separated by no more than
this number of bases are drawn.
-
depth or coverage: When importing positional tables, you can choose to convert
those tables to the chromGraph format by using either the depth or coverage
conversion method. Both conversion methods use a non-overlapping window size of 10,000 bases when
converting to the chromGraph format. In the depth method, the weighted average for each
10,000 base window is assigned to a single point in the center of this window. The
coverage method, in contrast, is binary: if there is even one point in the input table in that
10,000 base window, the resulting graph will have a value of 1 for that range.
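The difference between the two conversion methods can be illustrated with a short Python sketch. It is not the tool's implementation: the function name to_windows is hypothetical, the input is assumed to be (start, end, score) items on one chromosome with 0-based half-open coordinates, and the "weighted average" is assumed to be weighted by the number of bases each item overlaps in the window, which this guide does not specify.

```python
WINDOW = 10_000  # non-overlapping window size described above


def to_windows(items, method="depth", window=WINDOW):
    """Convert (start, end, score) items into {window_center: value}."""
    overlap = {}   # window index -> total overlapping bases
    weighted = {}  # window index -> sum of score * overlapping bases
    for start, end, score in items:
        first, last = start // window, (end - 1) // window
        for w in range(first, last + 1):
            w_start, w_end = w * window, (w + 1) * window
            bases = min(end, w_end) - max(start, w_start)
            overlap[w] = overlap.get(w, 0) + bases
            weighted[w] = weighted.get(w, 0.0) + score * bases

    result = {}
    for w in sorted(overlap):
        center = w * window + window // 2
        if method == "coverage":
            result[center] = 1  # binary: something falls in this window
        else:
            result[center] = weighted[w] / overlap[w]  # assumed weighting
    return result


items = [(1000, 2000, 2.0), (3000, 4000, 4.0), (25000, 26000, 5.0)]
print(to_windows(items, "depth"))
print(to_windows(items, "coverage"))
```

Windows with no input items are simply absent from the result in this sketch; how the tool treats empty windows is not covered here.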
Quick start
Use the examples in this section of the User's Guide to get a feel for how the tool works. Refer to
other sections in this User's Guide for details and instructions for more advanced features.
The Genome Graphs tool comes pre-loaded with sample data. These sample data sets are from real-world
genome-wide studies. Use these data sets to quickly see what the tool looks like when data is
displayed. To view the sample data, choose a data set from the graph drop-down
list, then choose your desired display color from the in drop-down list. The tool
will display the data set directly above the chromosomes in Genome Graphs. Read on to learn how to
customize the display.
Example #1 — SNPs on chr22
Follow these steps to display in Genome Graphs all of the highest quality SNPs on chromosome 22 for
the hg18 assembly whose predicted functional role is "coding non-synonymous" (where there
is a change in the peptide for the allele with respect to the reference assembly). Note that there
are no SNPs on the p-arm of chromosome 22.
This data set is formatted in the "marker value" style. The markers are dbSNP rsIDs. The associated
value is +1 if the SNP is on the positive strand, and -1 if the SNP is on
the negative strand. Here are the first ten rows of the data file:
rs1007298 +1
rs1007863 +1
rs10154509 +1
rs10154678 +1
rs10154785 +1
rs1018448 +1
rs10212022 +1
rs1022478 +1
rs1042311 +1
rs1042435 +1
Step 1. Upload the data into the Genome Graphs tool
Copy the entire sample data set into a text editor and save the file to your computer. This data set is associated with the human assembly hg18 (Mar.
2006). Be sure to configure the Genome Graphs tool to use the hg18 assembly as follows:
clade: Vertebrate
genome: Human
assembly: Mar. 2006
Upload the file into the Genome Graphs tool. You can configure each
control on the upload page, or just leave them set to their default values.
The upload process may take some time, as the program is actually mapping each rsID in the input
file to its location(s) in the genome.
Step 2. Display the graph in Genome Graphs
Now that your input file has been uploaded to the server, you will want to display it in the Genome
Graphs tool. To display your uploaded data, simply choose the graph name from the
graph drop-down list, then choose your desired display color from the
in drop-down list. Your graph will be displayed directly above the chromosomes in
Genome Graphs. You should see the data plotted directly above chromosome 22.
Step 3. View the graph in the Genome Browser
From the Genome Graphs display, click anywhere on the graph or on chromosome 22 to open the Genome
Browser for hg18 centered at that location on chr22. The graph will be drawn as a track near the top
of the Genome Browser display.
Displaying data in Genome Graphs
Once you have uploaded your data, you will want to display it in the Genome Graphs tool. To display
your uploaded data, simply choose the graph name from the graph drop-down list,
then choose the color in which you would like it to be displayed from the in
drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. Read
on to learn how to customize the display.
Configuring the display
Configuring the graphs display
To go to the configuration page, click the configure button on the main Genome Graphs page.
This is the page from which you can configure many overall aspects of the Genome Graphs display.
Individual graphs can also be configured (see the next section for help on that).
On this page you will find the following controls:
-
image width - controls the overall width of the graphs display on the main Genome
Graphs page. The default is 620 pixels.
-
graph height - controls the height of the graph(s) in the space above each
chromosome. The default is 27 pixels.
-
graphs per line - controls how many graphs are displayed on each line in the
space above each chromosome. For example, if you set this value to two, the display will
superimpose two graphs on top of each other on one line. The axis label for the first graph will
appear on the left side of the display and the axis for the second graph on the right side.
-
lines of graphs - controls how many sets of graphs will appear above each
chromosome. For example, if you set this value to 2, the display will make room for two lines of
graphs (each at the graph height above) in the space above each chromosome.
-
chromosome layout - controls how the chromosomes are laid out in the Genome
Graphs display. You can choose to view one or two chromosomes on each horizontal line in the
display. Alternatively, you can set up the display such that all of the chromosomes appear in one
long line. If you choose this layout, you may want to adjust the width of the image
(image width above).
-
numerical labels - check this box if you would like to see axis labels to the
right/left of the display. If you did not specify label values when you uploaded
your file, the numerical labels will default to the 1/3 and 2/3 points of the range between the
min and max values in your data input file.
-
highlight missing - check this box if you would like to see the areas in your
graph where there is no data. Note that if you are displaying more than one graph, this attribute
only pertains to the first graph.
-
region padding - controls the size of the data regions. The data points in your
graphs which exceed the significance threshold are padded by this number of bases on either side.
The default places 25,000 bases on each side.
When you have completed configuring the display, click the submit button to return
to the Genome Graphs display.
Configuring individual graphs
Near the bottom of the Configuration page, you will see a list of the graphs that you have uploaded.
Click on the hyperlinked graph name to configure that graph. This configuration pertains to the
Genome Graphs view.
You can set the range of the display by editing the display min/max value values.
This will restrict the Genome Graphs display for this graph to that data range. The axis will be
labeled at 1/3 and 2/3 of the data range that you set.
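As a worked illustration of the default labels, the following Python sketch computes the 1/3 and 2/3 points of a display range. The function name default_axis_labels is hypothetical, and the sketch assumes the labels are measured from the display min value toward the display max value.

```python
def default_axis_labels(display_min, display_max):
    """Return the values at the 1/3 and 2/3 points of the display range."""
    span = display_max - display_min
    return (display_min + span / 3, display_min + 2 * span / 3)


print(default_axis_labels(0, 9))
print(default_axis_labels(-3, 3))
```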
If your data is sparse, you may want to draw lines between your data points. You can configure that
by editing the draw connecting lines between markers separated by up to ... bases
value. The default value is 25,000,000 bases.
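The connecting-lines rule can be sketched in Python as follows. The function name connecting_segments is hypothetical; the sketch assumes the positions are on a single chromosome and that only neighboring data points are considered for connection.

```python
def connecting_segments(positions, max_gap=25_000_000):
    """Return (left, right) pairs of neighboring points to join with a line.

    Two neighboring points are joined when they are separated by max_gap
    bases or fewer; 25,000,000 is the default stated above.
    """
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]


print(connecting_segments([100, 30_000_000, 40_000_000, 90_000_000]))
```

In this example only the middle pair of points is close enough to be joined; lowering the value leaves more points unconnected, while raising it joins points across larger gaps.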
When you have completed configuring the display, click the submit button twice to
return to the Genome Graphs display.
Setting a significance threshold
Most genome-wide data has some amount of noise and is only interesting when the data values are
above a certain value. You can set this value using the significance threshold input box. Enter a
decimal number in this input box and press Enter. The display will now have a light gray line across
the graph at this data value. If you have more than one graph displayed, the significance threshold
applies only to the graphs whose displayed data range contains the threshold value.
The significance threshold works in concert with the browse regions and
sort genes buttons; it will affect the regions that are displayed once you click
either of these two buttons.
To open the Genome Browser with a view of all of the regions in your graph that include data points
that pass the significance threshold, click the browse regions button. This will
open the Genome Browser with a navigation pane on the left side of the screen. This pane will
contain links to all regions which pass your significance threshold. Note that if you are displaying
more than one graph, the significant regions are based only on the first graph in the display
list.
To view a list of genes which are in regions that pass the significance threshold, click the
sort genes button. This will open the Gene Sorter with only the genes that are in
significant locations with respect to your data.
If you would rather view all of your regions without restricting the output to only those regions
that pass the significance threshold, simply delete any values from the significance threshold input
box and press Enter before clicking browse regions.
Setting a data region
The data region is the span of bases that will be added to either side of the data points in your
graphs which exceed the significance threshold. Set the data region by editing the region
padding value on the configuration page. The combination of setting the data region and the
significance threshold will affect two things:
-
the regions displayed in the Genome Browser after you click the
browse regions button,
-
the genes displayed in the Gene Sorter after you click the sort
genes button.
For example, take a data set that contains the following data:
chr2 100100000 2.3
chr2 100100500 4.5
chr2 100101000 1.2
If you set the significance threshold at 4.0, one data point in the data set passes that threshold.
If you then set the data region to 200, then the one significant data point will be padded on each
side by 200 base pairs. In that case, the only resulting significant data region will be
chr2:100,100,300-100,100,700.
If instead you set the data region to 2,000, then the one significant data point will be padded on
each side by 2,000 base pairs. In that case, the resulting significant data region will be
chr2:100,098,500-100,102,500.
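The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the tool's code: the function name significant_regions is hypothetical, a point is treated as significant when its value is at or above the threshold, region starts are clamped at 0, and overlapping regions are left unmerged.

```python
def significant_regions(points, threshold, padding):
    """Pad each significant (chrom, position, value) point on both sides."""
    regions = []
    for chrom, pos, value in points:
        if value >= threshold:  # assumption: "passes" means at or above
            start = max(0, pos - padding)  # assumption: never below 0
            regions.append((chrom, start, pos + padding))
    return regions


points = [
    ("chr2", 100100000, 2.3),
    ("chr2", 100100500, 4.5),
    ("chr2", 100101000, 1.2),
]

print(significant_regions(points, threshold=4.0, padding=200))
print(significant_regions(points, threshold=4.0, padding=2000))
```

With a padding of 200 the sketch yields chr2 100100300 to 100100700, and with 2,000 it yields chr2 100098500 to 100102500, matching the regions given above.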
Viewing data in the Genome Browser
To view your graphs in the Genome Browser, click the browse regions button. This
will open the Genome Browser with your graph(s) displayed as track(s). You can configure and
edit your track as you can any other track in the Genome
Browser. In addition to the Genome Browser, you will also see a pane on the left-hand side, which
contains links to all of the significant regions in your data. Please note
that if you are displaying more than one graph in Genome Graphs, the significant regions are based
only on the first graph in the display list.
You can also navigate to the Genome Browser by clicking directly on a graph or chromosome in Genome
Graphs. The Genome Browser will open with a 1,000,000 bp window centered on the location on which
you clicked.
Viewing data in the Gene Sorter
To view the set of genes that are in significant regions in your data,
click the sort genes button. This will open the Gene Sorter with a filter to
include only genes that are located in regions in your input data that are above the significance
threshold. Please note that if you are displaying more than one graph in Genome Graphs, the
significant genes are based only on the first graph in the display list.
If the graph was uploaded using markers, then a custom Gene Sorter column with the same name as the
graph will be created. This column will list all markers for each gene that contain values above the
significance threshold.
Deleting data
There are several ways to delete your data once it has been uploaded. If you are viewing your data
as a track in the Genome Browser, you can click on the mini-button or track control for the track
and delete the track using the Remove custom track button. You can also choose to
reset your cart which will reset the browser interface settings to their defaults, as well as delete
all custom tracks and data. Do this by visiting the gateway page and clicking the hyperlink
"Click here to reset".
Your data will be saved on our server for at least 48 hours from the time you last access it, unless
it is saved in a Session.
Correlating data sets
To calculate how well your data sets correlate with one another, click the
correlate button. This will calculate and display the correlation coefficient
(R) for each pair of your data sets. R, also known as Pearson's
correlation coefficient, is a measure of the extent to which two graphs move together. The value of
R ranges between -1 and 1. A positive R indicates that the graphs
tend to move in the same direction, while a negative R indicates that they tend to
move in opposite directions. R-Squared (which is simply R*R)
measures how much of the variation in one graph can be explained by a linear dependence on the other
graph. R-Squared ranges from 0, when the two graphs have no linear relationship, to 1, when one
graph is a perfect linear function of the other.
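For readers who want to check a correlation value by hand, Pearson's R can be computed with the short Python sketch below. The function name pearson_r is hypothetical, and the sketch assumes the two graphs have already been reduced to paired values at the same positions; how Genome Graphs pairs values across data sets is not described in this guide.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, r * r)  # R and R-Squared
```

Two graphs that rise and fall together give an R near 1, graphs that move in opposite directions give an R near -1, and R-Squared is obtained by squaring the result.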
To return to Genome Graphs, click the return to graphs button.