UCSC Genome Browser: Genome Graphs User's Guide


Introduction
Formatting, Uploading & Importing Data

Formatting Data
Uploading Data
Importing Data

Quick Start
Displaying Data in Genome Graphs

Configuring the Display
Setting a Significance Threshold
Setting a Data Region

Viewing Data in the Genome Browser
Viewing Data in the Gene Sorter
Deleting Data

Correlating Data Sets

Questions and feedback on this User's Guide are welcome.

User questions and answers on Genome Graphs and other topics are available in the Genome mailing list archives.

Introduction

Genome Graphs is a tool for displaying genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies and homozygosity mapping.

Using the Genome Graphs tool, you can:

upload several sets of genome-wide data and display them simultaneously

click on an area of interest and go directly to the genome browser at that position

set a significance threshold for your data and view only regions that meet that threshold

view the genes that exist in areas where your data meet your significance threshold

To return to Genome Graphs from any other location on the Genome Browser website, use your browser's Back button, or press Home on the blue navigation bar, then press the Genome Graphs link.

Note that only the "standard" chromosomes are displayed in the Genome Graphs display; haplotype and mitochondrial chromosomes are not displayed.

This User's Guide is aimed at both novice and advanced Genome Graphs users. If you are new to the Genome Graphs tool, read the Quick Start section to learn the basics using some sample data. Advanced users may want to proceed directly to the section that addresses a particular area of functionality in detail.

Formatting, Uploading & Importing Data

Formatting Data
Genome Graphs allows you to upload data from files that reside on your computer. Several file formats are accepted by the program. For all formats there is a single line for each marker. Each line starts with information on the marker, and ends with the numerical values associated with that marker. The markers can be of one of the following types:

— chromosome base: e.g. chr1 130000 (Note that the first base in a chromosome is considered position 0.)

— STS Marker: e.g. RH75228

— dbSNP rsID: e.g. rs12345

— Affymetrix 500k Gene Chip: e.g. SNP_A-1780270

— Affymetrix Genome-Wide SNP Array 6: e.g. SNP_A-8575125

— Affymetrix SNP Array 6 Structural-Variation: e.g. CN_47396

— Illumina HumanHap300 Bead Chip: e.g. rs3934834

— Illumina HumanHap550 Bead Chip: e.g. rs3094315

— Illumina HumanHap650 Bead Chip: e.g. rs3094315

— Agilent CGH 244A: e.g. A_14_P112718

The marker-value pairs in each line of the file can be separated with a single space, a tab, or a comma. The file can contain multiple values for each marker. In that case, a separate graph will be created for each value column in the input file.

For example, chromosome base markers with only one value associated with the marker would be entered like this:
chrX 100000 1.23
dbSNP rsID markers with two values associated with the marker would be entered like this:
rs10218492 0.384 0.882
The Genome Graphs program will map the marker IDs to the genome. If a marker maps to more than one location in the genome, the value(s) in your input file will be associated with each location.

If the value associated with your marker is positive, do not include a sign (e.g. '+'). Include a sign ('-') only if the value is negative.
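The formatting rules above can be sketched in a few lines of Python. This is only an illustration: the helper name format_marker_line is hypothetical and is not part of Genome Graphs. It builds one chromosome-base line per marker, accepts only the three separators described above, and relies on the fact that Python's str() does not write a leading '+' for positive numbers.

```python
def format_marker_line(chrom, position, values, sep=" "):
    """Build one input line: chromosome, 0-based base position, then the value(s).

    Hypothetical helper for illustration; not part of Genome Graphs.
    """
    if position < 0:
        raise ValueError("positions are 0-based, so they cannot be negative")
    if sep not in (" ", "\t", ","):
        raise ValueError("separator must be a single space, a tab, or a comma")
    # str() writes no leading '+' for positive numbers and keeps '-' for
    # negative ones, which matches the sign rule described in the text.
    fields = [chrom, str(position)] + [str(v) for v in values]
    return sep.join(fields)


lines = [
    format_marker_line("chrX", 100000, [1.23]),
    format_marker_line("chr2", 100100500, [4.5, -0.25]),
]
print("\n".join(lines))
```

Each extra entry in values becomes an additional value column, and therefore an additional graph after upload.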

Note that markers can only be mapped to assemblies that already have a track containing your marker type. You cannot, for example, use dbSNP rsID markers for the cow genome, as it does not have a SNP track.

Uploading Data
Once you have created your input file, you must upload it to Genome Graphs. From the main Genome Graphs page, choose the clade, genome, and assembly to which your data pertain. If you are unsure of the UCSC assembly name, you can check this page. Then press the upload button to go to the upload page.

To upload a file in any of the supported formats, locate the file on your computer using the controls next to file name, and then submit. The other controls on this form are optional, though filling them out will sometimes enhance the display. In general the controls that default to "best guess" can be left alone, since the guess is almost always correct.

The controls for display min and max values and connecting lines can be set later via the configuration page as well. Here is a description of each control.

name of data set: Displayed in graph drop-down in Genome Graphs and as the track name in Genome Browser. Only the first 16 characters are visible in some contexts. For data sets with multiple graphs, this is the first part of the name, shared with all members of the data set.

description: A short sentence describing the data set. Displayed in the Genome Graphs and Genome Browser configuration pages, and as the center label in the Genome Browser.

file format: Controls whether the upload file is a tab-separated, comma-separated, or space-separated table.

markers are: Describes how to map the data to chromosomes. Either the first column of the file is an ID of some sort, or the first column is a chromosome and the next column is a base position. The IDs can be SNP rs numbers, STS marker names, or IDs from any of the supported genotyping platforms.

column labels: Controls whether the first row of the upload file is interpreted as labels or data. If the first row contains text in the numerical fields, or if the mapping fields are empty, it is interpreted by "best guess" as labels. This is generally correct, but you can override this interpretation by explicitly setting the control.

display min value/max value: Set the range of the data set that will be plotted. If left blank, the range will be taken from the min/max values in the data set itself. For all data sets to share the same scale, you will usually need to set this.

label values: A comma-separated list of numbers for the vertical axis. If left blank, the axis will be labeled at the 1/3 and 2/3 points of your data range.

draw connecting lines: Lines are drawn connecting data points that are separated by this number of bases or fewer.

file name, or Paste URLs or data: Specify the data to upload. Enter a file on your local computer, enter a URL at which the data file can be found, or simply paste in the data. If entries are made in both fields, the file name will take precedence.

Importing Data
In addition to supplying your own genome-wide data files, you can also import existing database tables from an assembly into the Genome Graphs tool. Any table containing positional information can be imported. This includes tables of the following types: BED, PSL, wiggle, MAF, and bedGraph. Custom track tables can be imported as well. The tables made by Genome Graphs (chromGraph) cannot be imported because they are already in the format used by the tool, so no conversion is necessary. All tables imported into Genome Graphs will be converted into a custom track of type chromGraph using a window size of 10,000 bases.

To import a table or custom track, choose the group, track, and table from the lists, then press the submit button. The other controls are optional, though completing them will enhance the display. The controls for display min and max values and connecting lines can be set later via the configuration page as well. Here is a description of each control.

name of data set: This will be displayed in the graph list in the Genome Graphs tool and as the track name in the Genome Browser. Only the first 16 characters are visible in some contexts. For data sets with multiple graphs, this is the first part of the name, shared with all members of the data set.

description: Enter a short sentence describing the data set. It will be displayed in the Genome Graphs tool and in the Genome Browser.

display min value/max value: Set the range of the data set to be plotted. If left blank, the range will be taken from the min and max values in the data set itself. If you would like all of your data sets to share the same scale, you will need to set this.

label values: A comma-separated list of numbers for the vertical axis. If left blank, the axis will be labeled at the 1/3 and 2/3 points of your data range.

draw connecting lines: Lines connecting data points separated by no more than this number of bases are drawn.

depth or coverage: When importing positional tables, you can convert them to the chromGraph format using either the depth or the coverage conversion method. Both methods use non-overlapping windows of 10,000 bases. In the depth method, the weighted average for each 10,000-base window is assigned to a single point at the center of that window. The coverage method, by contrast, is binary: if there is even one point in the input table in a 10,000-base window, the resulting graph will have a value of 1 for that window.
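The difference between the two conversion methods can be sketched as follows. This is not the tool's actual implementation. It assumes the input items are half-open (start, end) intervals on a single chromosome, and it reads "weighted average" as the number of overlapping bases divided by the window size, which is only one plausible interpretation. The function name to_windows is hypothetical.

```python
WINDOW = 10_000  # window size stated in the text


def to_windows(intervals, method="depth", window=WINDOW):
    """Summarize (start, end) intervals into {window_center: value}.

    Illustrative sketch only; the depth formula is an assumption.
    """
    if method not in ("depth", "coverage"):
        raise ValueError("method must be 'depth' or 'coverage'")
    overlap = {}  # window index -> total bases of input overlapping that window
    for start, end in intervals:
        if end <= start:
            continue  # skip empty or malformed intervals
        for idx in range(start // window, (end - 1) // window + 1):
            w_start, w_end = idx * window, (idx + 1) * window
            bases = min(end, w_end) - max(start, w_start)
            overlap[idx] = overlap.get(idx, 0) + bases
    result = {}
    for idx in sorted(overlap):
        center = idx * window + window // 2
        if method == "coverage":
            result[center] = 1  # binary: any input in the window counts
        else:
            result[center] = overlap[idx] / window  # assumed depth formula
    return result


example = [(2_000, 4_000), (3_000, 12_000)]
print(to_windows(example, "depth"))
print(to_windows(example, "coverage"))
```

Windows with no input are simply absent from the result in this sketch; how the tool treats empty windows is not stated here.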

Quick Start

Use the examples in this section of the User's Guide to get a feel for how the tool works. Refer to other sections in this User's Guide for details and instructions for more advanced features.

The Genome Graphs tool comes pre-loaded with sample data. These sample data sets are from real-world genome-wide studies. Use these data sets to quickly see what the tool looks like when data is displayed. To view the sample data, choose a data set from the graph drop-down list, then choose your desired display color from the in drop-down list. The tool will display the data set directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display.

Example #1 — SNPs on chr22
Follow these steps to display in Genome Graphs all of the highest quality SNPs on chromosome 22 for the hg18 assembly whose predicted functional role is "coding non-synonymous" (where there is a change in the peptide for the allele with respect to the reference assembly). Note that there are no SNPs on the p-arm of chromosome 22.

This data set is formatted in the "marker value" style. The markers are dbSNP rsIDs. The associated value is +1 if the SNP is on the positive strand, and -1 if the SNP is on the negative strand. Here are the first ten rows of the data file:

rs1007298 +1
rs1007863 +1
rs10154509 +1
rs10154678 +1
rs10154785 +1
rs1018448 +1
rs10212022 +1
rs1022478 +1
rs1042311 +1
rs1042435 +1

Step 1. Upload the data into the Genome Graphs tool
Copy the entire sample data set into a text editor and save the file to your computer. This data set is associated with the human assembly: hg18 (Mar. 2006). Be sure to configure the Genome Graphs tool to use the hg18 assembly like so:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
Upload the file into the Genome Graphs tool. You can configure each control on the upload page, or just leave them set to their default values.
The upload process may take some time, as the program is actually mapping each rsID in the input file to its location(s) in the genome.

Step 2. Display the graph in Genome Graphs
Now that your input file has been uploaded to the server, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the graph drop-down list, then choose your desired display color from the in drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. You should see the data plotted directly above chromosome 22.

Step 3. View the graph in the Genome Browser
From the Genome Graphs display, press anywhere on the graph or on chromosome 22 to open the Genome Browser for hg18 centered at that location on chr22. The graph will be drawn as a track near the top of the Genome Browser display.

Displaying Data in Genome Graphs

Once you have uploaded your data, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the graph drop-down list, then choose the color in which you would like it to be displayed from the in drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display.

Configuring the Display
Configuring the graphs display
To go to the configuration page, press the configure button on the main Genome Graphs page. This is the page from which you can configure many overall aspects of the Genome Graphs display. Individual graphs can also be configured (see the next section for help on that).

On this page you will find the following controls:

image width - controls the overall width of the graphs display on the main Genome Graphs page. The default is 620 pixels.

graph height - controls the height of the graph(s) in the space above each chromosome. The default is 27 pixels.

graphs per line - controls how many graphs are displayed on each line in the space above each chromosome. For example, if you set this value to two, the display will superimpose two graphs on top of each other on one line. The axis label for the first graph will appear on the left side of the display and the axis for the second graph on the right side.

lines of graphs - controls how many sets of graphs will appear above each chromosome. For example, if you set this value to 2, the display will make room for two lines of graphs (each at the graph height above) in the space above each chromosome.

chromosome layout - controls how the chromosomes are laid out in the Genome Graphs display. You can choose to view one or two chromosomes on each horizontal line in the display. Alternatively, you can set up the display such that all of the chromosomes appear in one long line. If you choose this layout, you may want to adjust the width of the image (image width above).

numerical labels - check this box if you would like to see axis labels to the right/left of the display. If you did not specify label values when you uploaded your file, the numerical labels will default to the 1/3 and 2/3 points of the range between the min and max values in your input file.

highlight missing - check this box if you would like to see the areas in your graph where there is no data. Note that if you are displaying more than one graph, this attribute only pertains to the first graph.

region padding - controls the size of the data regions. The data points in your graphs which exceed the significance threshold are padded by this number of bases on either side. The default places 25,000 bases on each side.

When you have completed configuring the display, press the submit button to return to the Genome Graphs display.

Configuring individual graphs

Near the bottom of the Configuration page, you will see a list of the graphs that you have uploaded. Click on the hyperlinked graph name to configure that graph. This configuration pertains to the Genome Graphs view.

You can set the range of the display by editing the display min value and max value settings. This will restrict the Genome Graphs display for this graph to that data range. The axis will be labeled at 1/3 and 2/3 of the data range that you set.
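As a small worked example of the default axis labels, the sketch below assumes that "1/3 and 2/3 of the data range" means the points one third and two thirds of the way from the display min to the display max. The function name default_axis_labels is hypothetical.

```python
def default_axis_labels(display_min, display_max):
    """Return the two default label positions for a display range.

    Assumes labels sit 1/3 and 2/3 of the way from min to max.
    """
    if display_max <= display_min:
        raise ValueError("display_max must be greater than display_min")
    span = display_max - display_min
    return (display_min + span / 3, display_min + 2 * span / 3)


print(default_axis_labels(0, 9))
print(default_axis_labels(-3, 3))
```

For a range of 0 to 9 this gives labels at 3 and 6; for a range of -3 to 3 it gives -1 and 1.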

If your data is sparse, you may want to draw lines between your data points. You can configure that by editing the draw connecting lines between markers separated by up to ... bases value. The default value is 25,000,000 bases.
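The connecting-lines rule can be sketched as a filter over neighboring data points. The sketch assumes that all positions lie on the same chromosome and that only consecutive points, in sorted order, are candidates for a line. The function name connecting_segments is hypothetical.

```python
def connecting_segments(positions, max_gap=25_000_000):
    """Return (left, right) position pairs that would be joined by a line.

    Two neighboring points are joined when they are separated by
    max_gap bases or fewer. Illustrative sketch only.
    """
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]


points = [100, 50_000_000, 60_000_000]
print(connecting_segments(points))
print(connecting_segments(points, max_gap=50_000_000))
```

With the default of 25,000,000 bases only the second and third points are joined; raising the value to 50,000,000 joins all three.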

When you have completed configuring the display, press the submit button twice to return to the Genome Graphs display.

Setting a Significance Threshold
Most genome-wide data has some amount of noise and is only interesting when the data values are above a certain value. You can set this value using the significance threshold input box. Enter a decimal number in this input box and press Enter. The display will now have a light gray line across the graph at this data value. If you have more than one graph displayed, the significance threshold only pertains to the graphs that contain the significance threshold in the displayed data range.

The significance threshold works in concert with the browse regions and sort genes buttons; it will affect the regions that are displayed once you press either of these two buttons.

To open the Genome Browser with a view of all of the regions in your graph that include data points that pass the significance threshold, press the browse regions button. This will open the Genome Browser with a navigation pane on the left side of the screen. This pane will contain links to all regions which pass your significance threshold. Note that if you are displaying more than one graph, the significant regions are based only on the first graph in the display list.

To view a list of genes which are in regions that pass the significance threshold, press the sort genes button. This will open the Gene Sorter with only the genes that are in significant locations with respect to your data.

If you would rather view all of your regions without restricting the output to only those regions that pass the significance threshold, simply delete any values from the significance threshold input box and press Enter before pressing browse regions.

Setting a Data Region
The data region is the span of bases that will be added to either side of the data points in your graphs which exceed the significance threshold. Set the data region by editing the region padding value on the configuration page. The combination of setting the data region and the significance threshold will affect two things:

the regions displayed in the Genome Browser after you press the browse regions button,

the genes displayed in the Gene Sorter after you press the sort genes button.

For example, take a data set that contains the following data:
chr2 100100000 2.3
chr2 100100500 4.5
chr2 100101000 1.2
If you set the significance threshold at 4.0, one data point in the data set (chr2 100100500, with a value of 4.5) passes that threshold. If you then set the region padding to 200, the one significant data point will be padded on each side by 200 base pairs. In that case, the only resulting significant data region will be chr2:100,100,300-100,100,700.

If instead you set the region padding to 2,000, the one significant data point will be padded on each side by 2,000 base pairs. In that case, the resulting significant data region will be chr2:100,098,500-100,102,500.
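The arithmetic in this example can be reproduced with a short sketch. It assumes that a point is significant when its value is strictly greater than the threshold, and it leaves overlapping regions unmerged, since the text does not say how the tool handles either case. The function name significant_regions is hypothetical.

```python
def significant_regions(points, threshold, padding):
    """Pad each significant point by `padding` bases on both sides.

    points is an iterable of (chrom, position, value) tuples.
    Illustrative sketch only; overlapping regions are not merged.
    """
    regions = []
    for chrom, position, value in points:
        if value > threshold:  # assumed: strictly above the threshold
            start = max(0, position - padding)  # never go below base 0
            regions.append((chrom, start, position + padding))
    return regions


data = [
    ("chr2", 100100000, 2.3),
    ("chr2", 100100500, 4.5),
    ("chr2", 100101000, 1.2),
]
print(significant_regions(data, 4.0, 200))   # [('chr2', 100100300, 100100700)]
print(significant_regions(data, 4.0, 2000))  # [('chr2', 100098500, 100102500)]
```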

Viewing Data in the Genome Browser

To view your graphs in the Genome Browser, press the browse regions button. This will open the Genome Browser with your graph(s) displayed as track(s). You can configure and edit your track as you can any other track in the Genome Browser. In addition to the Genome Browser, you will also see a pane on the left-hand side, which contains links to all of the significant regions in your data. Please note that if you are displaying more than one graph in Genome Graphs, the significant regions are based only on the first graph in the display list.

You can also navigate to the Genome Browser by clicking directly on a graph or chromosome in Genome Graphs. The Genome Browser will open with a 1,000,000 bp window centered on the location on which you clicked.

Viewing Data in the Gene Sorter

To view the set of genes that are in significant regions in your data, press the sort genes button. This will open the Gene Sorter with a filter to include only genes that are located in regions in your input data that are above the significance threshold. Please note that if you are displaying more than one graph in Genome Graphs, the significant genes are based only on the first graph in the display list.

If the graph was uploaded using markers, then a custom Gene Sorter column with the same name as the graph will be created. This column will list all markers for each gene that contain values above the significance threshold.

Deleting Data

There are several ways to delete your data once it has been uploaded. If you are viewing your data as a track in the Genome Browser, you can click on the mini-button or track control for the track and delete the track using the Remove custom track button. You can also reset your cart, which will reset the browser interface settings to their defaults and delete all custom tracks and data. Do this by visiting the gateway page and pressing the hyperlink "Click here to reset".

Your data will be saved on our server for at least 48 hours from the time you last access it, unless it is saved in a Session.

Correlating Data Sets

To calculate how well correlated your data sets are with one another, press the correlate button. This will calculate and display the correlation coefficient (R) for each pair of your data sets. R, also known as Pearson's correlation coefficient, is a measure of the extent to which two graphs move together. The value of R ranges between -1 and 1. A positive R indicates that the graphs tend to move in the same direction, while a negative R indicates that they tend to move in opposite directions. R-Squared (which is simply R*R) measures how much of the variation in one graph can be explained by a linear dependence on the other graph. R-Squared ranges from 0, when the two graphs have no linear relationship, to 1, when one graph is an exact linear function of the other.
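For reference, Pearson's R for two equal-length lists of values can be computed as in the sketch below. How Genome Graphs pairs up the values of two graphs before correlating them is not described here, so the sketch simply assumes the two lists are already paired. The function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two paired value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


graph_a = [1, 2, 3, 4]
graph_b = [2, 4, 6, 8]   # moves with graph_a
graph_c = [8, 6, 4, 2]   # moves against graph_a

r_ab = pearson_r(graph_a, graph_b)
r_ac = pearson_r(graph_a, graph_c)
print(r_ab, r_ab * r_ab)
print(r_ac, r_ac * r_ac)
```

The second pair shows why R-Squared is reported alongside R: the two graphs move in opposite directions, so R is negative, yet R-Squared is as high as for the first pair because the linear dependence is just as strong.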

To return to Genome Graphs, press the return to graphs button.
