Table Browser User's Guide
See also the Open
Helix tutorial and training materials.
Questions and feedback are welcome.
Introduction
The Table Browser provides a powerful and flexible graphical interface for querying and manipulating
the Genome Browser annotation tables. Because the Table Browser uses the same database as the Genome
Browser, the two views are always consistent.
Using the Table Browser, you can:
- retrieve the DNA sequence data or annotation data underlying Genome Browser tracks for the entire
genome, a specified coordinate range, or a set of accessions
- apply a filter to set constraints on field values included in the output
- generate a custom track and
automatically add it to your session so that it can be graphically displayed in the Genome
Browser
- conduct both structured and free-form SQL queries on the data
- combine queries on multiple tables or custom tracks through an intersection or union and
generate a single set of output data
- display basic statistics calculated over a selected data set
- display the schema for a table and list all other tables in the database connected to the table
- organize the output data into several different formats for use in other applications,
spreadsheets, or databases
This User's Guide is aimed at both the novice Table Browser user and the advanced user. If you
are new to the Table Browser, read the Getting started section to
learn about browser basics and try some simple queries. Advanced users may want to proceed directly
to the section that addresses a particular area of functionality in detail.
Although the Table Browser provides sufficient flexibility to satisfy the needs of most users, some
advanced users may require the ability to run MySQL directly on the Genome Browser database. UCSC
provides a public MySQL server at genome-mysql.soe.ucsc.edu. Alternatively, the database may be
downloaded to a local computer for MySQL access. See the mirror
site documentation for information on setting up a local copy of the database.
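For readers who prefer a scripted query, the sketch below builds a command line for the standard mysql client pointed at the public server named above. The host name comes from this guide; the genome user name, the hg17 database in the usage lines, and the helper build_mysql_command are assumptions made for illustration, so check the public MySQL server documentation before relying on them.

```python
import shlex

# Host name taken from this guide.
PUBLIC_HOST = "genome-mysql.soe.ucsc.edu"


def build_mysql_command(database, sql, user="genome", host=PUBLIC_HOST):
    """Return the argument list for a one-off query with the mysql client.

    The default user name is an assumption, not something this guide states.
    """
    return [
        "mysql",
        f"--user={user}",
        f"--host={host}",
        "-A",            # skip automatic rehashing of table names
        "-D", database,  # database (one per genome assembly)
        "-e", sql,       # statement to execute, then exit
    ]


if __name__ == "__main__":
    argv = build_mysql_command(
        "hg17", "SELECT name, chrom, txStart, txEnd FROM refGene LIMIT 5"
    )
    # Print a shell-safe command line instead of running it.
    print(shlex.join(argv))
```

The command is printed rather than executed so that the sketch runs without network access or an installed client.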
About the Table Browser databases and tables
The Table Browser is built on top of the Genome Browser database, which actually consists of
several separate databases, one for each genome assembly.
Tables within the databases may be differentiated by whether the data are based on genome start-stop
coordinates (positional tables) or are independent of position (non-positional tables). Some output
formats and query options are applicable only to positional tables, hence the distinction.
Non-positional tables
Non-positional tables contain data not tied to genomic location, for example a table that correlates
a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA
IDs to extended information such as author, tissue, or keyword. Some "meta" tables in
this category contain information about the structure of the database itself or describe external
files containing sequence data.
Positional tables
Positional tables contain data associated with specific locations in the genome, such as mRNA
alignments, gene predictions, cross-species alignments, and other annotations. Each of the
annotation tracks displayed in the Genome Browser is based on a positional table. In some instances,
data from other positional and non-positional tables may also be incorporated into the track. Data
associated with custom annotation tracks active within the user's Table Browser session are also
available as positional tables.
Positional tables can be further subdivided into several categories based on the type of data they
describe. Alignment data can be best described by using a block structure to represent each element.
Other tables require only start and end coordinate data for each element. Some tables specify a
translation start and end in addition to the transcription start and end. Some tables contain strand
information, others don't. Most tables, but not all, specify a name for each element. Based on the
format of the data described by a table, different query and output formatting options may be
offered.
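The distinctions above can be sketched as a small classifier that inspects a table's field names. This is only a heuristic written for illustration: the function describe_table is hypothetical, and the field names it checks (chrom, chromStart, chromEnd, txStart, txEnd, cdsStart, cdsEnd, strand, name, blockSizes, exonStarts) are assumed conventions rather than a rule stated by this guide.

```python
def describe_table(field_names):
    """Guess which query options a table could support from its field names."""
    fields = set(field_names)
    # A table counts as positional here if it names a sequence and a range.
    has_range = ({"chromStart", "chromEnd"} <= fields
                 or {"txStart", "txEnd"} <= fields)
    positional = "chrom" in fields and has_range
    return {
        "positional": positional,
        # Block structure: alignment blocks or exon lists.
        "has_blocks": positional and bool(fields & {"blockSizes", "exonStarts"}),
        # Translation start and end in addition to transcription start and end.
        "has_cds": positional and {"cdsStart", "cdsEnd"} <= fields,
        "has_strand": positional and "strand" in fields,
        "has_name": "name" in fields,
    }


if __name__ == "__main__":
    gene_table = ["name", "chrom", "strand", "txStart", "txEnd",
                  "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds"]
    link_table = ["kgID", "refseq"]  # invented non-positional example
    print(describe_table(gene_table))
    print(describe_table(link_table))
```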
Getting started - simple queries
In its most basic form, the Table Browser can be used to retrieve a specific subset of records from
a track or positional table in a selected genome assembly. The query may be based on a specific
position or a set of one or more identifiers.
This section describes the steps required to conduct basic data queries using the Table
Browser. Once you have mastered the basic Table Browser functionality, refer to subsequent sections
for information about generating more complex queries that use filters, intersections, and
alternative data output formats.
Simple position-based query
Follow these steps to display a list of records that lie within a specific position in a table:
Step 1. Pick a genome assembly
Specify the genome assembly from which you'd like to retrieve the data by choosing the appropriate
organism in the genome
list, then selecting the assembly version from the
assembly
list. Note that the assembly
list refreshes each time a different
option is selected in the genome
list. Assemblies are typically named after the first
three characters of an organism's genus and species names.
Step 2. Pick an annotation track
The group
list shows all the annotation track groups available in the selected genome
assembly. The names correspond to the groupings displayed at the bottom of the Genome Browser
annotation tracks page. When a group is selected from the list, the track
list
automatically updates to show all the annotation tracks available within that group.
-
If you already know the name of the annotation track in which you're interested, select the
All Tracks option in the
group
list, then select the track from the
track
list. Similarly, you can directly select a table by choosing the
All Tables option in the group
list, selecting a database from the
database
list, then selecting the table from the table
list.
-
To examine all the tracks available within a certain group (e.g., all gene prediction tracks),
select the group name from the
group
list, then browse the entries in the
track
list.
-
Custom annotation tracks created during the current session are listed under the Custom
Tracks group.
-
If no selections are made from the
group
or track
lists, the track
selection defaults to the Known Genes track in the Genes and Gene Prediction
Tracks group.
Step 3. Pick a table
The table
list shows all tables (both positional and non-positional) associated with
the currently-selected track. By default, it displays the primary table for the track, i.e. the
table containing the data shown in the Genome Browser annotation track. Other tables in the list are
linked to the primary table by a common field and may provide supporting data used in constructing
the annotation.
-
If the
group
list is set to the All Tables option, the table list will show
all tables present in the database currently selected in the database
list, rather
than those associated with a particular track.
Step 4. Pick a genomic region (positional tables only)
By default, the Table Browser region is set to genome, which will display all the data
records in the selected table.
-
To restrict the data to a specific position range, type the position into the
position
box. Some examples of specific positions include a chromosome name
(chrX), a coordinate range within a chromosome (chrX:100000-400000), or a
scaffold name.
-
You can select multiple genomic regions by clicking the "define regions" button and
entering up to 1,000 regions in a 3- or 4-field
BED file format.
-
To look up the position range of a genomic element -- such as a gene name, an accession ID, an
STS marker, etc. -- or keywords from the GenBank description of an mRNA, type the string into the
position
box, then click the Lookup
button.
-
The data in non-positional tables are not tied to genomic coordinates; therefore, the
region
option is unavailable when a non-positional table is selected. A basic query
on a non-positional table will show all the data in the table.
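The position and region inputs described above can be checked before they are pasted into the browser. The following is a minimal sketch, not the Table Browser's own parser: the helpers parse_position and parse_regions are hypothetical, comma-grouped digits are accepted as a convenience, and any string without a coordinate range is treated as a sequence name (so a gene name would also pass this check).

```python
import re

# name, optionally followed by :start-end (digits may contain commas)
_POSITION = re.compile(r"^([^\s:]+)(?::([\d,]+)-([\d,]+))?$")


def parse_position(text):
    """Split 'chrX:100000-400000' into (name, start, end); range is optional."""
    match = _POSITION.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized position: {text!r}")
    name, start, end = match.groups()
    if start is None:
        return name, None, None
    return name, int(start.replace(",", "")), int(end.replace(",", ""))


def parse_regions(text, max_regions=1000):
    """Parse 3- or 4-field BED lines into (chrom, start, end, name) tuples."""
    regions = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_number}: expected 3 or 4 fields")
        name = fields[3] if len(fields) == 4 else None
        regions.append((fields[0], int(fields[1]), int(fields[2]), name))
    if len(regions) > max_regions:
        raise ValueError(f"at most {max_regions} regions are allowed")
    return regions


if __name__ == "__main__":
    print(parse_position("chrX:100000-400000"))
    print(parse_position("chrX"))
    print(parse_regions("chr7 26906938 26940301 demo\nchrX 100000 400000\n"))
```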
Step 5. Display the output
Click the Get Output
button to display the results of the query. By default, the Table
Browser outputs the data from all fields in the selected table as tab-separated text on the screen.
See the Output formats section for information on configuring the query
output.
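Tab-separated output saved from the browser is easy to load into a script. The sketch below assumes that the first line holds the field names and may begin with a "#" character; that header convention, the helper parse_table_output, and the sample rows are assumptions for illustration, not details given in this guide.

```python
import csv


def parse_table_output(text):
    """Turn tab-separated output into a list of dicts keyed by field name."""
    lines = text.splitlines()
    if not lines:
        return []
    # Assumed convention: header line, optionally prefixed with '#'.
    header = lines[0].lstrip("#").split("\t")
    reader = csv.reader(lines[1:], delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(header, row)) for row in reader if row]


if __name__ == "__main__":
    # Invented rows, not real query results.
    sample = (
        "#name\tchrom\ttxStart\ttxEnd\n"
        "geneA\tchr7\t100\t900\n"
        "geneB\tchr7\t1500\t2500\n"
    )
    for record in parse_table_output(sample):
        print(record["name"], record["chrom"], record["txStart"], record["txEnd"])
```

All values are returned as strings; convert coordinate fields with int() where arithmetic is needed.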
Example:
Here is an example of a simple query that retrieves all the RefSeq Genes records in the position
range chr7:26906938-26940301 on the May 2004 human genome assembly.
-
Select the Human option in the
genome
list.
-
Select the May 2004 option in the
assembly
list.
-
Select the Genes and Gene Prediction Tracks option in the
group
list.
-
Select the RefSeq Genes option in the
track
list.
-
Type chr7:26906938-26940301 in the
position
box (the Table Browser will
automatically select the position
option button).
-
Click the
Get Output
button.
The Table Browser will display the records for the RefSeq accessions NM_005522, NM_153620,
NM_006735, NM_153632, NM_030661, and NM_153631.
Batch query using identifiers
In many cases, you may want to retrieve data based on a list of one or more accessions or names,
rather than querying by genomic position. Many tracks in the Table Browser, such as those in the
Genes and Gene Prediction track group, support identifier queries. The identifier type used
in the query must match the kind of identifiers present in the track data, e.g., mRNA accession IDs
must be used to query the mRNA table.
Follow these steps to display a list of records that correspond to a set of accessions or names
entered as query input.
Step 1. Pick the genome assembly, track, and table
Step 2. Select the genome region
setting
Step 3. Load the identifiers into the browser
Click the Paste List
button to type or paste in the identifiers or the Upload
List
button to load the data from a file existing on your local computer.
-
If you are loading multiple identifiers, entries must be separated by a space, tab, or line.
-
Wildcards may not be used in the list (see the Filter section for
information about conducting queries that include wildcards).
-
The Table Browser will retain the identifier list until you delete the information by clicking the
Clear List
button.
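The identifier rules above (entries separated by a space, tab, or line; no wildcards) can be sketched as a small pre-check to run before pasting or uploading a list. The helper parse_identifier_list is hypothetical and only mirrors the rules stated here.

```python
def parse_identifier_list(text):
    """Split pasted text into identifiers and reject wildcard characters."""
    identifiers = text.split()  # splits on spaces, tabs, and line breaks
    for identifier in identifiers:
        if "*" in identifier or "?" in identifier:
            raise ValueError(
                f"wildcards are not allowed in identifier lists: {identifier!r}"
            )
    return identifiers


if __name__ == "__main__":
    pasted = "NM_005522 NM_153620\tNM_006735\nNM_153632"
    print(parse_identifier_list(pasted))
```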
Step 4. Click the Get Output
button
See the Output formats section for information about configuring the
query output.
Filtering output by constraining field values
The Table Browser filter
option can be used to:
-
apply constraints on table field values to restrict which records should appear in the query
output
-
conduct batch queries using wildcards
-
include fields from multiple tables in the query output
Filtering on fields from a single table
Follow these steps to create a filter on one or more fields in a single table:
Step 1. Select the assembly, track, and region
Step 2. Click the Create
button on the filter
line
Step 3. Add the filter constraints
One or more of the fields in the currently selected table may be filtered by typing constraints into
the corresponding text boxes.
-
By default, the initial values set up in the filter match all records in the table.
-
Constraints must match the data type of the field to be applied successfully. For example, the
geneName field in the hg17 refFlat table is a string; therefore, constraining
values must also be strings. See the Filter constraints section
for more information on valid filter values.
-
Multiple filter values may be applied against one field by separating the values with spaces.
-
Individual field constraints are combined with AND, i.e. a record must meet the
constraints on all fields to be retrieved.
Step 4. Click the Submit button to apply the filter
Once a filter has been created on a table, it will persist for the duration of the Table Browser
session or until it has been cleared. Only one filter can exist for a table at a time, but multiple
filters may exist in one session if they are applied on different tables. To modify an existing
filter, click the Edit
button on the filter
line. To remove a filter,
click the Clear
button.
Filtering on fields from multiple tables
A Table Browser filter may include constraints on fields from tables related to the primary table.
To create a filter composed of fields from multiple tables:
Step 1. Select the assembly, track, and region
Step 2. Click the Create
button on the filter
line
Note: If a filter already exists on the table, click the Edit
button
to modify it or the Clear
button to remove it.
Step 3. Select the tables to include in the filter
Scroll down to the Linked Tables section of the page. The tables listed in this section are
linked to the selected table by one or more common fields (typically a name, accession, or ID
field). Click the boxes in front of the table(s) whose fields you wish to include in the filter,
then click the Allow Filtering Using Field in Checked Tables
button. The fields of the
selected tables will be displayed in the top portion of the page.
Step 4. Add the filter constraints
Step 5. Click the Submit button to apply the filter
Note: In the current implementation of the Table Browser, the selected fields
from primary and related tables output format option must be used when including fields from
multiple tables in a filter. Check the boxes for all tables in the Linked Tables
list
on which filter constraints have been applied, then click the Allow Selection From Checked
Tables
button to include them in the output.
Filter constraints
Strings
Text fields are compared to words or patterns containing wildcard characters. Valid wildcards are
"*" (matches 0 or more characters) and "?" (matches a single character). Each
space-separated word or pattern in a text field box is matched against the value of that field in
each record. If any word or pattern matches the value, then the record meets the constraint on that
field.
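The matching rule for text fields can be sketched as follows. This is an illustration rather than the Table Browser's implementation: the helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' (0 or more characters) and '?' (one character) to a regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches_text_constraint(value, constraint):
    """True if any space-separated word or pattern matches the whole value."""
    return any(
        wildcard_to_regex(pattern).fullmatch(value) is not None
        for pattern in constraint.split()
    )


if __name__ == "__main__":
    print(matches_text_constraint("HOXA1", "HOXA* TP53"))
    print(matches_text_constraint("HOXA10", "HOXA?"))
```

An empty constraint matches nothing in this sketch, so a caller would use "*" to accept every value.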
Numbers
Numeric fields are compared to table data using an operator such as <, >, != (not equals) followed
by a number. To specify a range, enter two numbers (start and end) separated by white space and/or a
comma.
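A numeric constraint of either kind can be sketched as a function that turns the typed text into a test. The helper make_numeric_constraint is hypothetical; treating a range as inclusive at both ends, and accepting <=, >=, and = alongside the operators listed above, are assumptions.

```python
import operator
import re

_OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}
_NUMBER = r"-?\d+(?:\.\d+)?"


def make_numeric_constraint(text):
    """Return a predicate for 'OP number' or for a 'start end' range."""
    text = text.strip()
    match = re.fullmatch(rf"(<=|>=|!=|<|>|=)\s*({_NUMBER})", text)
    if match:
        compare = _OPERATORS[match.group(1)]
        bound = float(match.group(2))
        return lambda value: compare(value, bound)
    # Range: two numbers separated by white space and/or a comma.
    parts = [part for part in re.split(r"[\s,]+", text) if part]
    if len(parts) == 2 and all(re.fullmatch(_NUMBER, part) for part in parts):
        low, high = float(parts[0]), float(parts[1])
        return lambda value: low <= value <= high  # inclusive: an assumption
    raise ValueError(f"unrecognized numeric constraint: {text!r}")


if __name__ == "__main__":
    under_100 = make_numeric_constraint("< 100")
    in_range = make_numeric_constraint("100, 200")
    print(under_100(50), under_100(100), in_range(150), in_range(201))
```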
Free-form queries
When the filters on individual fields aren't sufficiently flexible, the free-form query
text box allows the application of more complex constraints that typically relate two or more field
names of the selected table. Valid free-form queries use the syntax of the SQL
where clause
(using wildcards as defined above).
Free-form queries combine simple constraints with AND, OR, and NOT using
parentheses as needed for clarity. A simple constraint consists of a table field name, a comparison
operator (see below), and a value: a number, string, wildcard value (see below), or another field
name. In place of a field name, you may use an arithmetic expression of numeric field names.
-
String or wildcard values for text comparisons must be quoted. Single or double quotes may be
used. If comparing to a literal string value, use the "=" or "!=" operator.
If comparing to a wildcard value, use the "LIKE" or "NOT LIKE" operator.
-
Numeric comparison operators include <, <=, =, != (not equals), >=, and >.
-
Arithmetic operators include +, -, *, and /.
-
Other SQL comparison keywords may also be used.
Example:
The following examples show free-form queries applied to the human refGene table.
-
txStart = cdsStart
- searches for gene models missing expected 5' UTR upstream
sequence (if strand is "+"; 3' UTR downstream if strand is "-")
-
chrom NOT LIKE "chr??"
- restricts search to chromosomes 1 - 9, X and
Y
-
(cdsEnd - cdsStart) > 10000
- selects genes with coding regions spanning more than 10 kbp
-
(txStart != cdsStart) AND (txEnd != cdsEnd) AND exonCount = 1
- finds single exon
genes with both 3' and 5' flanking UTR
-
((cdsEnd - cdsStart) > 30000) AND (exonCount=2 OR exonCount=3)
- finds genes with long
spans but only 2 - 3 exons
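To experiment with constraints of this kind offline, the sketch below runs several where clauses against a tiny in-memory table. It uses SQLite from the Python standard library as a stand-in for the MySQL database behind the Table Browser, and the three rows are invented for illustration. Wildcard comparisons are left out because the wildcard characters described in this guide differ from SQLite's own LIKE syntax.

```python
import sqlite3

# Invented rows: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount
DEMO_ROWS = [
    ("geneA", "chr1", "+", 1000, 50000, 1000, 45000, 2),
    ("geneB", "chr2", "-", 2000, 9000, 2500, 8500, 1),
    ("geneC", "chr3", "+", 100, 20000, 300, 15000, 8),
]


def build_demo_table():
    """Create an in-memory table with refGene-like field names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refGene (name TEXT, chrom TEXT, strand TEXT, txStart INT,"
        " txEnd INT, cdsStart INT, cdsEnd INT, exonCount INT)"
    )
    conn.executemany("INSERT INTO refGene VALUES (?,?,?,?,?,?,?,?)", DEMO_ROWS)
    return conn


def select_names(conn, where_clause):
    """Return names of rows meeting the clause (demo only: no input checking)."""
    query = f"SELECT name FROM refGene WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = build_demo_table()
    for clause in (
        "txStart = cdsStart",
        "(cdsEnd - cdsStart) > 10000",
        "(txStart != cdsStart) AND (txEnd != cdsEnd) AND exonCount = 1",
        "((cdsEnd - cdsStart) > 30000) AND (exonCount=2 OR exonCount=3)",
    ):
        print(clause, "->", select_names(conn, clause))
```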
Intersecting data from multiple tables
It is often interesting to compare the positions of features in different annotation tracks to
identify points of overlap. The Table Browser intersection
utility can be used to
generate various position-based comparisons of track features. Using the intersection
utility, you can:
-
examine all genomic positions where the feature data from the two tracks overlap
-
identify genomic locations where there is no overlap between track features
-
establish thresholds for the amount of overlap that must exist between the two feature sets
-
conduct feature-by-feature comparisons as well as base-by-base comparisons of tracks
-
complement (invert) a position set before comparing the tracks
An intersection may be expanded to include additional tables by using the Table Browser custom track
feature.
Note: The intersection
utility can be used only on
positional tables. To generate intersections incorporating data in non-positional tables, use the
Table Browser filter
utility. See the Filtering on fields
from multiple tables section for more information.
Intersecting data from two tables
Follow these steps to configure and generate an intersection between two positional tables:
Step 1. Select the assembly, track, table, and region for the primary table
Note: Only positional tables may be used in an intersection.
Step 2. Click the Create
button on the intersection
line
Note: If an intersection already exists on the table, click the Edit
button to modify it or the Clear
button to remove it.
Step 3. Select the secondary track to include in the intersection
Select a group in the group
list, then select a track from the
track
list. To view all the tracks available, regardless of group, select the
All Tracks option in the group
list.
Step 4. Select a combination method
The Table Browser provides two major types of comparisons:
-
feature-by-feature comparisons preserve the structure of the primary table. For example,
if the primary table describes exon structure and the features are compared with a second table,
the results will describe exon structure (unless you choose an output format in which the
structure is lost).
-
base-by-base comparisons examine the primary table and the table underlying the secondary
track one base at a time. The structure of the primary table is not preserved in this comparison.
For example, even if the primary table describes exon structure, the intersection results will
contain only position ranges; no information about exon/block structure, strand, or translation
region will be retained.
Click the circle in front of a combination method to select it. Only one method may be selected from
the two sets of methods. For more information about the individual combination options, see the
Intersection Options section.
Step 5. (optional) Select the complement options
Check the box in front of one or both tables to complement the feature data in that table. The
complement options allow you to invert the set of positions covered by one or both tables. For
example, if you choose to complement the primary track, any position covered by that track's
features will be
considered not covered, and vice versa. This option provides more flexibility in comparing
track positions.
Step 6. Click the Submit
button to apply the intersection
Once an intersection has been created on a table, it will persist for the duration of the Table
Browser session or until it has been cleared. Only one intersection may exist at a time. To modify
an existing intersection, click the Edit
button on the intersection
line. To remove an intersection, click the Clear
button.
Intersecting data from more than two tables
The Table Browser intersection
utility limits combinations to only two
tables. An existing intersection may be expanded to include additional tables by using the Table
Browser custom track utility. To create an intersection on multiple tables:
Step 1. Set up an intersection between two tables
See the Intersecting data from two tables section for more
information.
Step 2. Save the intersection data in a custom track
See the
Saving data as a custom track section for information on generating a
custom track. Note: In the current implementation of the Table Browser, you must
use the Get Custom Track
button on the custom track page to add the custom track to the
Table Browser track
list.
Step 3. Select the newly-generated custom track
Select the Custom Tracks option in the group
list, then select the newly
created custom track from the track
list.
Step 4. Create an intersection with another track
Follow the steps in the Intersecting data from two tables section
to intersect the custom track with another track.
Intersection options
Feature-by-feature comparisons
Some comparisons preserve the primary table's gene and alignment structure, if it exists. For
example, if the refGene table (human RefSeq Genes track) is combined with another
table using one of these comparisons, the resulting output data will describe exon structure
(unless you choose an output format in which the structure is lost). Primary table features are kept
or discarded based on the amount of positional overlap with the features in the table underlying the
secondary track. The Table Browser offers the following options in this category:
-
Any overlap: A primary table record will appear in the output if any of its base
positions are covered by any feature in the secondary table.
-
No overlap: A primary table record will appear in the output only if none of its
base positions are covered by any feature in the secondary table.
-
Overlap greater than a specified threshold: A primary table record will appear in
the output if the percentage of its base positions covered by secondary table features is greater
than the user-specified threshold.
-
Overlap less than a specified threshold: A primary table record will appear in the
output if the percentage of its base positions covered by secondary table features is less than
the user-specified threshold.
Note: If the primary table has an exon/block structure, only those bases located in
exons and/or blocks will be counted.
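The four feature-by-feature options reduce to one number per primary record: the percentage of its exon or block bases covered by secondary features. The following is a minimal sketch of that calculation, assuming half-open (start, end) intervals and non-overlapping blocks within a record; the function names and mode labels are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def percent_covered(blocks, secondary):
    """Percentage of the record's block bases covered by secondary features."""
    secondary = merge(secondary)  # avoid counting a base twice
    total = sum(end - start for start, end in blocks)
    covered = 0
    for block_start, block_end in blocks:
        for sec_start, sec_end in secondary:
            covered += max(0, min(block_end, sec_end) - max(block_start, sec_start))
    return 100.0 * covered / total if total else 0.0


def keep_record(blocks, secondary, mode, threshold=0.0):
    """Apply one of the four options to a single primary record."""
    percent = percent_covered(blocks, secondary)
    if mode == "any":
        return percent > 0
    if mode == "none":
        return percent == 0
    if mode == "more_than":
        return percent > threshold
    if mode == "less_than":
        return percent < threshold
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    exons = [(100, 200), (300, 400)]
    features = [(150, 250), (180, 320)]
    print(percent_covered(exons, features))
    print(keep_record(exons, features, "more_than", 30))
```

Only bases inside the blocks are counted, which mirrors the note above about exon/block structure.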
Base-by-base comparisons
In these combination options, the positions of the primary and secondary table features are compared
one base position at a time. When applying base-by-base comparisons, the structure of the primary
table is not preserved. For example, if the refGene table (from the human RefSeq
Genes track) is compared with a secondary table using these comparisons, the resulting output
data will not describe exon structure. Instead, only position ranges will be returned; the
exon/block structure, strand, and translation region information will be discarded. The Table
Browser provides the following base-by-base combination options:
-
Base-by-base intersection (AND): A nucleotide position is included in the output
if it is covered by at least one feature of both the primary table and the secondary
table.
-
Base-by-base union (OR): A nucleotide position is included in the output if it is
covered by at least one feature of either the primary table or the secondary table.
Note: If the primary table has an exon/block structure, only base positions located
in exons and/or blocks will be counted.
Base-by-base complement (NOT)
Before the Table Browser applies a feature-by-feature or base-by-base comparison to the table data,
the set of positions covered by one or both tables can be inverted (complemented). When the data set
of a table is complemented, any position covered by the table's features in the original data will
be considered not covered in the inverted data, and vice versa. This option gives the user more
flexibility in comparing table positions.
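The base-by-base operations can be sketched with plain interval arithmetic. This is an illustration under stated assumptions, not the Table Browser's code: intervals are half-open (start, end) pairs on a single chromosome, the complement is taken within an explicit region, and all function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def union(a, b):
    """Positions covered by either set (OR)."""
    return merge(list(a) + list(b))


def intersection(a, b):
    """Positions covered by both sets (AND)."""
    a, b = merge(a), merge(b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def complement(a, region_start, region_end):
    """Positions in the region that are not covered by the set (NOT)."""
    result = []
    position = region_start
    for start, end in merge(a):
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue  # interval lies outside the region
        if start > position:
            result.append((position, start))
        position = max(position, end)
    if position < region_end:
        result.append((position, region_end))
    return result


if __name__ == "__main__":
    primary = [(100, 200), (300, 400)]
    secondary = [(150, 350)]
    print(intersection(primary, secondary))
    print(union(primary, secondary))
    print(complement(primary, 0, 500))
```

As the text notes, the results are bare position ranges: strand, exon structure, and names from the primary table do not survive these operations.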
Correlating data from two tables
The Table Browser Correlation function creates a scatter plot of the data points of two tables and
provides individual histograms of the data points from each table. It also shows a plot of
Residuals vs. Fitted, which can be used to detect non-linearity, unequal error variances, and
outliers.
The correlation function uses Pearson's correlation, which is optimized to work with continuous data
such as wiggle tracks. For tracks that do not have data
values, such as gene-structured tracks, the data value used in the calculation is 1.0 for bases
covered by exons and 0.0 at all other positions in the region.
Due to memory and processing limitations, the number of data points that can be plotted is limited
to 300,000,000. The "Window data to" function allows you to smooth out your plot by taking
the average of the number of data points specified (defaults to 1). The total number of bases
analyzed is independent of the data window. There is currently no way to output the results of the
Correlation function.
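Because the Correlation function has no output option, a rough offline equivalent may be useful. The sketch below builds the 1.0/0.0 per-base signal described above for a gene-structured track, averages values in fixed-size windows, and computes Pearson's correlation. The function names, the half-open intervals, and the handling of a short final window (averaged over its actual length) are assumptions; the Table Browser's own windowing may differ.

```python
from math import sqrt


def exon_signal(region_start, region_end, exons):
    """Per-base values: 1.0 inside any exon, 0.0 elsewhere in the region."""
    signal = [0.0] * (region_end - region_start)
    for start, end in exons:
        for position in range(max(start, region_start), min(end, region_end)):
            signal[position - region_start] = 1.0
    return signal


def window_average(values, window=1):
    """Average consecutive groups of `window` data points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for index in range(0, len(values), window):
        chunk = values[index:index + window]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / sqrt(variance_x * variance_y)


if __name__ == "__main__":
    genes = exon_signal(0, 12, [(2, 6)])
    scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.9, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1]  # invented
    print(pearson(window_average(genes, 2), window_average(scores, 2)))
```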
Output formats
The data resulting from a Table Browser query may be configured in a number of different ways:
-
The output can be displayed on the screen, saved to a file, or saved to an annotation track table
that can be displayed in the Genome Browser or used in a subsequent Table Browser query.
-
The data can include all fields from the primary or selected table, or can be restricted to
selected fields from the primary table and related tables.
-
The data can be organized in one of several formats: tab-separated, sequence (FASTA), Browser
Extensible Data format (BED), Gene Transfer Format (GTF), or a statistical summary of the data in
the query.
The output options available for a specific query may vary depending on the table(s) selected. For
example, non-positional table data cannot be organized in a position-based format, but instead may
be displayed only in tab-separated format. The Table Browser will automatically update the options
on the output format
list to show only those available for the current query.
Displaying all fields in a table
To display all the fields of the records in the query output in tab-separated format, select the
all fields from primary table option.
Displaying selected fields from one or more tables
To restrict the query output to a subset of the fields in a table, choose the selected fields
from primary and related tables option. You will be prompted to pick the table fields to
display. Click the box in front of the fields you would like to see in the query output (or click
the Check All
button to select all the fields), then click the Get Fields
button.
To include data fields from other tables linked to the selected table, choose the selected
fields from primary and related tables option, then scroll down to the Linked Tables
section of the page. The tables listed in this section are linked to the selected table by one or
more common fields (typically a name, accession, or ID field). Click the boxes in front of the
table(s) whose fields you wish to include in the query output, then click the Allow Selection
From Checked Tables
button. The fields of the selected tables will be displayed in the top portion
of the page. Click the boxes in front of the fields that you wish to include in the query output,
then click the Get Fields
button underneath any of the field lists to generate
tab-separated output that includes data from all the selected fields. Note that the Get
Fields
and Cancel
buttons apply globally to all the selected tables, but the
Check All
and Clear All
buttons apply only to the fields listed directly
above the buttons.
Displaying sequence (FASTA) data (positional tables only)
To display the genomic sequence underlying the query results, select the sequence option in
the output format
list. The Table Browser will present you with several options to
configure the output display. When you have completed the configuration, click the Get
Sequence
button. When displaying sequence data for gene prediction tracks, you will also be
offered the option to view the protein and mRNA sequence as extracted from the data source in
addition to the genomic sequence.
Displaying CDS FASTA alignments (genePred tables only)
The CDS FASTA alignments are created from a Multiple Alignment File
(MAF) in combination with a
genePred table. The UCSC MAF format stores multiple
alignments at the DNA level between entire genomes. You can use the Table Browser to return FASTA
alignments of coding regions in nucleotide-space or translated into amino acid-space. However, it is
worth noting that the initial MAF files are all created by aligning genomes at the DNA level.
Genome-wide CDS FASTA alignments
Note that when using the Table Browser to fetch CDS FASTA output, it is best to restrict your query
to a reasonable-sized position range rather than requesting output from the entire genome. A
genome-wide query will take a substantial amount of compute time, and it is likely that your
Internet browser will time out and disconnect. If you would like to download genome-wide CDS FASTA
output for any of several model organisms, you can do so from the
download server.
Creating CDS FASTA alignments using the Table Browser
To display FASTA multiple alignments for the CDS regions of genes, select the CDS FASTA
alignment from multiple alignment option in the output format
list. In order to
see this output format option, you must have a genePred table selected. If you limit your search to
a certain position range within the genome (rather than searching the entire genome), the tool will
return FASTA alignments for all genes that overlap the position for which you are searching. The
Table Browser will present you with a configuration page. On this page, you can select options for
your output.
First, select your MAF table. This is the table from which the multiple alignments will be
extracted for the CDS regions of your gene track. If you do not know the name of the MAF table that
corresponds to the Conservation track, you can find it in the Genome Browser by following these
instructions.
Then select any of the following choices:
-
Separate into exons - The default behavior is for the coding exons of each gene
to be concatenated into one sequence in the output FASTA multiple alignment. In this case each
output row header has the format listed below under "Whole gene format". If the separate
into exons option is chosen then each exon will be listed with a separate header in the format
listed below under "Exon format".
-
Show nucleotides - The default behavior is for the nucleotides in the alignment
to be translated into amino acids according to the strand and exon frames defined in the selected
genePred table. If this option is chosen, then the nucleotides in the alignment will not be
translated into amino acids.
-
Output lines with just dashes - The default behavior is for the alignment rows
that contain only dashes to not be printed. If this option is chosen, then these dashes-only
rows are printed.
-
Format output as table - If this option is chosen, the header and sequence for
each organism will appear on the same line.
-
Truncate headers at __ characters (enter zero for no headers) - This option
works in conjunction with the "Format output as table" option. If you want to see only
a portion of the headers, choose this option, and enter the number of characters at which you
would like the headers truncated.
Finally, from the list of species, select those that you would like included in the FASTA multiple
alignment output. Press the "get output" button to view the output.
Explanation of CDS FASTA header format
Whole gene format: geneName_assemblyName peptideLength location
Exon format: geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame
location
Here are the descriptions for each field name:
-
geneName - the name field from the genePred table.
-
assemblyName - the UCSC assembly name for the species.
-
peptideLength - the length of the entire coding region. If the "Show
nucleotides" option is chosen, this is given in nucleotides; otherwise it is the number
of amino acids in the peptide.
-
location - the chromosome position within the assembly that is aligned in
the multiple alignment. The format of this string is chrom:start-end followed by the strand on
which the alignment occurs. If more than one region is aligned, all the regions are listed,
separated by semicolons (;). The position is in Genome Browser coordinates (i.e., the start
address is one-based).
-
exonNum - the ordinal of the exon. Exons are numbered from one, beginning at
the transcription start site and progressing along the strand of transcription.
-
totalExons - the number of coding exons in the gene.
-
exonLength - the length of the current exon. If the "Show nucleotides"
option is chosen, this is the number of nucleotides in the exon; otherwise it is the
number of amino acids in the exon (with amino acids translated from split codons placed in the
exon where two of the three nucleotides lie).
-
inFrame - the frame number of the first nucleotide in the exon. Frame numbers can
be 0, 1, or 2, depending on the position that nucleotide takes in the codon that contains
it.
-
outFrame - the frame number of the nucleotide after the last nucleotide in this
exon. Frame numbers can be 0, 1, or 2, depending on the position that nucleotide takes in the
codon that contains it.
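The exon-format header described above can be split into its fields with a short script. The following is a minimal Python sketch, not part of the Table Browser itself. The names ExonHeader, Region, parse_location and parse_exon_header are hypothetical, and the sketch assumes that the assembly name contains no underscore (the gene name may) and that the location field may be absent when nothing is aligned.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # one-based start, as described for the location field
    end: int
    strand: str


class ExonHeader(NamedTuple):
    gene_name: str
    assembly: str
    exon_num: int
    total_exons: int
    exon_length: int
    in_frame: int
    out_frame: int
    regions: List[Region]


# chrom:start-end immediately followed by the strand character
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[+-])$")


def parse_location(text: str) -> List[Region]:
    """Parse one or more semicolon-separated regions; empty text gives []."""
    regions = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _REGION.match(part)
        if match is None:
            raise ValueError(f"unrecognized location: {part!r}")
        regions.append(Region(match["chrom"], int(match["start"]),
                              int(match["end"]), match["strand"]))
    return regions


def parse_exon_header(line: str) -> ExonHeader:
    """Parse 'geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame location'."""
    fields = line.lstrip(">").split()
    if len(fields) < 4:
        raise ValueError(f"too few fields in header: {line!r}")
    # Assumption: only the gene name may contain underscores, so split from the right.
    gene_name, assembly, exon_num, total_exons = fields[0].rsplit("_", 3)
    return ExonHeader(gene_name, assembly, int(exon_num), int(total_exons),
                      int(fields[1]), int(fields[2]), int(fields[3]),
                      parse_location("".join(fields[4:])))


header = parse_exon_header(">uc001ett.1_hg18_1_6 30 0 0 chr1:148389072-148389101+")
print(header.gene_name, header.assembly, header.exon_num, header.total_exons)
```

The header string used here is the human (hg18) row from this guide's nucleotide example. Whole gene format headers have a different field layout and would need a separate parser.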
Explanation of CDS FASTA sequence format
As noted above, the CDS FASTA output files can be in either DNA-space or protein-space.
In some instances, there is a dash ("-") in the sequence portion of the CDS FASTA
file. Dashes are used in several circumstances. They indicate missing sequence for the aligning
genome, as well as deletions in the aligning genome or insertions in the base genome.
Because the CDS FASTA alignments are based on one reference genome, any amino acids or nucleotides
that are not in the reference genome are not displayed. Consequently the peptides shown for aligning
genomes are not necessarily the peptide that the gene of the other organism would generate. Any
sequence inserted in an aligning genome or deleted in the base genome will not be present in the
alignment. We represent this condition with an orange bar in the Genome Browser display, but the
CDS FASTA alignments silently ignore this issue.
Nucleotide CDS FASTA sequence:
Consider the example below that shows the FASTA sequence for four species aligned with the first
exon of the human gene PLEKHO1 (UCSC Gene: uc001ett.1). Note that the rat (rn4) row is missing the
first three nucleotides. This could be due to an insertion specific to the human lineage or a
deletion specific to the rat lineage. Note also that the zebrafish (danRer4) row contains only
dashes. This could be due to excessive evolutionary distance between zebrafish and human, missing
data in the zebrafish assembly, or independent indels in the region in both species. It is sometimes
helpful to view the Conservation track in the Genome Browser in this area to clarify the exact
meaning of the dashes.
>uc001ett.1_hg18_1_6 30 0 0 chr1:148389072-148389101+
ATGATGAAGAAGAACAA...
>uc001ett.1_panTro2_1_6 30 0 0 chr1:129156502-129156531+
ATGATGAAGAAGAACAA...
>uc001ett.1_rn4_1_6 30 0 0 chr2:190795892-190795918-
---ATGAAGAAGAGCGGCTCCGGCAAGCGG
>uc001ett.1_danRer4_1_6 30 0 0
------------------------------
>uc001ett.1_oryLat2_1_6 30 0 0 chr11:3404940-3404969-
AGGATGAAGAAAAGCAACCAGAGCAGGCGG
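When post-processing this kind of output, it can be useful to flag dashes-only rows and leading gaps programmatically. The following is a minimal Python sketch under stated assumptions: the helper names read_fasta, is_dashes_only and leading_gap are hypothetical, and the input is the rat, zebrafish and medaka rows copied from the example above.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_dashes_only(sequence):
    """True for a non-empty row made up entirely of dashes."""
    return bool(sequence) and set(sequence) == {"-"}


def leading_gap(sequence):
    """Number of dashes at the start of the row."""
    return len(sequence) - len(sequence.lstrip("-"))


EXAMPLE = """\
>uc001ett.1_rn4_1_6 30 0 0 chr2:190795892-190795918-
---ATGAAGAAGAGCGGCTCCGGCAAGCGG
>uc001ett.1_danRer4_1_6 30 0 0
------------------------------
>uc001ett.1_oryLat2_1_6 30 0 0 chr11:3404940-3404969-
AGGATGAAGAAAAGCAACCAGAGCAGGCGG
"""

records = read_fasta(EXAMPLE)
for header, sequence in records:
    print(header.split()[0], is_dashes_only(sequence), leading_gap(sequence))

# Rows that would remain if dashes-only rows were dropped.
informative = [(h, s) for h, s in records if not is_dashes_only(s)]
```

Dropping the dashes-only rows in this way mirrors the default behavior described for the "Output lines with just dashes" option, but the filtering here happens on the downloaded text, not in the Table Browser.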
Amino Acid CDS FASTA sequence:
-
Codons that have a dash in any of the three nucleotides are represented by a dash in the amino
acid.
-
Codons with an N in any position are represented with an X.
-
Stop codons are represented with a Z.
-
All other amino acids follow the IUPAC amino acid codes.
-
In exon format, when the codon triplet is split between two exons, the amino acid will be
displayed as part of the exon containing two of the three nucleotides like so:
|exon1| |exon2|
nucleotide: AAACCCT ...
protein: K P F G K
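The translation rules listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Table Browser's own code. It assumes the standard genetic code, that a dash takes precedence over an N when both occur in one codon, and that any other unrecognized symbol is reported as X. The function names and the exon sequences are hypothetical; the sequences were chosen only to be consistent with the K P F G K protein row shown above.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon using the dash, N and stop-codon rules listed above."""
    codon = codon.upper()
    if "-" in codon:
        return "-"  # assumption: dash wins over N in the same codon
    if "N" in codon:
        return "X"
    amino_acid = CODON_TABLE.get(codon, "X")  # assumption: other symbols become X
    return "Z" if amino_acid == "*" else amino_acid


def translate_exons(exons):
    """Translate consecutive coding exons, giving each split codon to the
    exon that holds two of its three nucleotides."""
    owner = [index for index, exon in enumerate(exons) for _ in exon]
    sequence = "".join(exons)
    peptides = [""] * len(exons)
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        owners = owner[start:start + 3]
        target = max(owners, key=owners.count)  # exon with most of the codon
        peptides[target] += translate_codon(sequence[start:start + 3])
    return peptides


# Hypothetical exons: the codon TTT is split as T | TT across the boundary.
print(translate_exons(["AAACCCT", "TTGGGAAA"]))
```

In this hypothetical input the split codon contributes one nucleotide to the first exon and two to the second, so its amino acid is reported with the second exon.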
Saving query results in GTF or BED format (positional tables only)
To format the query results using GTF or BED conventions, select the corresponding option in the
output format list. When you select GTF, the Table Browser translates the output into this format.
For tables that lack feature designations, all records are arbitrarily assigned the feature
"exon" to conform to GTF specifications. If you select BED format, you will be presented with
the option to include and configure a custom track header and options for organizing the data.
When you have finished the configuration, or to accept the default options, click the Get BED
button at the bottom of the window.
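To illustrate what assigning the "exon" feature means in practice, here is a minimal Python sketch that builds one GTF line from a BED-style record. It is not the Table Browser's conversion code. The function name, the "example" source column and the gene_id/transcript_id attribute values are assumptions made for illustration. The coordinate shift reflects the general conventions of the two formats: BED starts are zero-based, while GTF starts are one-based.

```python
def bed_to_gtf_line(chrom, start, end, name, strand=".",
                    source="example", feature="exon"):
    """Build a single GTF line from BED-style fields.

    BED uses a zero-based start and an exclusive end; GTF uses a one-based
    start and an inclusive end, so only the start needs to be shifted.
    """
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    fields = [chrom, source, feature, str(start + 1), str(end),
              ".", strand, ".", attributes]
    return "\t".join(fields)


print(bed_to_gtf_line("chr1", 99, 200, "item1", "+"))
```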
Saving data to a file
By default, the Table Browser displays query results directly in your internet browser window. To
redirect the data to a file, type a file name into the output file box before starting the query.
The Table Browser will prompt you for the location of this file on your local disk while
processing the query.
Saving data as a custom track (positional tables only)
Query output may be saved in a format that can be displayed as a custom annotation track in the
Genome Browser. Custom tracks created during a Table Browser session may also be used for
subsequent queries and intersections in the same session. For more information on custom tracks, see
the Genome Browser User's Guide.
To save query data in custom track format, select the custom track option in the output format
list. When the query is executed, the Table Browser will prompt you to customize the track header
and configure the record layout of the data. The configuration is optional; the Table Browser
automatically sets up a default track configuration. Click the Custom track link for more
information on custom track syntax and format.
When you have finished configuring the custom track -- or to accept the default configuration --
click one of the buttons at the bottom of the window to create the custom annotation track.
-
To display the query results as text on the screen, click the Get Custom Track File button.
-
To save the query results to a file on your local disk for future use, specify a file name in the
output file box before executing the query, then click the Get Custom Track File button.
-
To load the query results into a table accessible from the Table Browser table list, click the
Get Custom Track in Table Browser button.
-
To view the query results as a custom track in the Genome Browser, click the Get Custom Track in
Genome Browser button. Your browser display will be redirected automatically to the Genome
Browser, with your custom track positioned near the top of the annotation tracks window.
-
To access your custom track data in a subsequent query in the same Table Browser session, select
the Custom Tracks option from the group list to display the custom tracks available.
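For readers who want to see roughly what a custom track file contains, the following Python sketch assembles a track header line followed by BED records. It is a simplified illustration, not the file the Table Browser writes. The function name and the sample record are hypothetical, and only the name, description and visibility attributes of the track line are used.

```python
def custom_track_text(name, description, bed_rows, visibility=2):
    """Return custom track text: one track header line, then BED4 records."""
    header = f'track name="{name}" description="{description}" visibility={visibility}'
    lines = [header]
    for chrom, start, end, item in bed_rows:
        # BED records: chromosome, zero-based start, end, item name.
        lines.append(f"{chrom}\t{start}\t{end}\t{item}")
    return "\n".join(lines) + "\n"


# Hypothetical record covering the hg18 exon from the CDS FASTA example.
rows = [("chr1", 148389071, 148389101, "uc001ett.1_exon1")]
print(custom_track_text("myQuery", "Table Browser query", rows), end="")
```

The start value in the sample record is one less than the start shown in the FASTA header because BED starts are zero-based, while the header location is one-based.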
Displaying query results as Genome Browser hyperlinks (positional tables only)
To examine the records in the query output individually in the Genome Browser, select the
hyperlinks to Genome Browser output option. The Table Browser will display a list of one or
more hyperlinks corresponding to the individual records in the output data. Click a link to open up
the Genome Browser display to the item and position shown on the hyperlink.
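A link of this kind can also be built by hand for a record of interest. The Python sketch below constructs a Genome Browser URL from an assembly name and a position. The function name is hypothetical, and the use of the hgTracks address with db and position parameters is an assumption about the link form; the links generated by the Table Browser may carry additional parameters.

```python
from urllib.parse import urlencode


def genome_browser_url(db, chrom, start, end,
                       base="https://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a Genome Browser link for one position in the given assembly."""
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{base}?{query}"


print(genome_browser_url("hg18", "chr1", 148389072, 148389101))
```

Note that urlencode percent-encodes the colon in the position value.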
Displaying a statistical summary of query data (positional tables only)
To generate a statistical summary of the query output data, the region covered by the query, and the
CPU time required to process the query, click the Summary/Statistics button.