

Cold Spring Harbor Laboratory, DNA Learning Center, 1 Bungtown Road, Cold Spring Harbor, NY 11724

Creating Custom TopHat Alignment Data Tracks in the UCSC Genome Browser

Dr. Ray Enke
Bio 480 Advanced Molecular Bio Lab
James Madison University

How to cite this work:

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License.

Recommended citation: Enke, R. (2016) Creating Custom TopHat Alignment Data Tracks in the UCSC Genome Browser. CSHL DNALC RNA-Seq for the Next Generation Working Group. http://www.rnaseqforthenextgeneration.org/profiles/raymond-enke.html#teaching

Objectives:

• Create your own custom data tracks in the UCSC Genome Browser
• Visualize RNA-Seq TopHat alignment data as custom tracks in the UCSC Genome Browser
• Integrate RNA-Seq alignment data with other genome-wide data sets

I. Creating custom data tracks in the UCSC Genome Browser

Last week you viewed and collected some statistics from the DNA Subway Green Line on how many reads were sequenced, mapped and paired for each sample and replicate in the chicken E8 retina, E18 retina and E18 cornea RNA-Seq experiment after the TopHat software package was run. You saw that ~30-60 million individual 300 nt paired-end sequencing reads per sample were aligned to the reference chicken genome. This week you will create custom tracks on the chicken genome assembly in the UCSC Genome Browser to visualize these large sequencing data sets.

First, you will complete a brief exercise that goes over the basic steps to create, label and name your own custom annotation data tracks in the UCSC Genome Browser using the human genome assembly.

• Navigate to the Human 2009 (hg19) genome assembly
• Navigate to the RHO gene > hide all tracks > add back UCSC Genes in pack view
• You should see the RHO gene with its 5 exons and 4 introns coded on the top strand of the genome
• We will add a custom data track to label new features on RHO as an example

The Custom Tracks feature in the browser allows you to display your own or previously published data as 1 or more annotation tracks on top of a specific genome assembly.

• Hover over the “My Data” option in the browser toolbar and select “Custom Tracks”
• This will take you to an “Add Custom Tracks” page


This page allows you to input custom track data in 3 different ways:
1. Copy/paste properly formatted tab-separated data
2. Browse & upload a tab-separated data (.tsv) file containing formatted data
3. Paste a link to a host URL containing formatted data

We will start with option #1, simply copy/pasting data into the window to create a custom data track. First, let's define:

1. Tab-separated value data (.tsv file): a text file that can be created and viewed by most spreadsheet programs and text editors. Each entry takes up a single line, with fields separated by tab characters and the first line serving as the header line that labels each field. TSV files can be used for any type of data (there are no required fields). As an example, here are some stats for several of the 2015 Baltimore Orioles in .tsv format:

Pos  Name             G    PA   AB   R    H    HR
C    Caleb_Joseph     100  355  320  38   75   11
1B   *Chris_Davis     160  670  573  100  150  47
2B   Jonathan_Schoop  86   321  305  34   85   15
SS   J.J._Hardy       114  437  411  45   90   8
3B   Manny_Machado    162  713  633  102  181  35
LF   Steve_Pearce     92   325  294  42   64   15
CF   Adam_Jones       137  581  546  74   147  27

(G = games, PA = plate appearances, AB = at bats, R = runs, H = hits, HR = home runs)

*led team in HRs
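If you ever want to work with a .tsv file outside a spreadsheet, a few lines of code will do it. The sketch below is only an illustration (Python and the names ORIOLES_TSV and read_tsv are our own choices, not part of the browser workflow): it reads the Orioles table above, uses the header line to name each field, and finds the team home run leader.

```python
import csv
import io

# The 2015 Orioles table from above, written with real tab characters.
ORIOLES_TSV = (
    "Pos\tName\tG\tPA\tAB\tR\tH\tHR\n"
    "C\tCaleb_Joseph\t100\t355\t320\t38\t75\t11\n"
    "1B\t*Chris_Davis\t160\t670\t573\t100\t150\t47\n"
    "2B\tJonathan_Schoop\t86\t321\t305\t34\t85\t15\n"
    "SS\tJ.J._Hardy\t114\t437\t411\t45\t90\t8\n"
    "3B\tManny_Machado\t162\t713\t633\t102\t181\t35\n"
    "LF\tSteve_Pearce\t92\t325\t294\t42\t64\t15\n"
    "CF\tAdam_Jones\t137\t581\t546\t74\t147\t27\n"
)


def read_tsv(text):
    """Parse TSV text; the first line is used as the header naming each field."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_tsv(ORIOLES_TSV)
# Every value is read as a string, so convert HR to int before comparing.
leader = max(rows, key=lambda row: int(row["HR"]))
print(leader["Name"], leader["HR"])  # *Chris_Davis 47
```

Note that TSV itself has no required fields; the header line is the only thing that gives the columns their meaning.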

2. Browser Extensible Data (BED) format: a derivative of TSV data with a specific format for genome browser data. Like .tsv data, each entry takes up a single line. BED lines have 3 required fields: 1) chromosome name (chr), 2) starting genomic coordinate (start), and 3) ending genomic coordinate (stop). An optional 4th field can be used to label each line entry with any text that contains no special characters or spaces. As an example, here are the chromosomal coordinates for each of the 5 RHO exons in the human 2009 hg19 genome assembly:

chr   start      stop       exon
chr3  129247482  129247937  exon1
chr3  129249719  129249887  exon2
chr3  129251094  129251259  exon3
chr3  129251376  129251615  exon4
chr3  129252451  129254187  exon5
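One detail worth knowing before you paste: in the UCSC BED format the start coordinate is 0-based and the end coordinate is not included in the feature, so a feature's length is simply stop minus start, and the browser displays the first base as start + 1. The hedged Python sketch below (BedRecord and parse_bed are hypothetical helpers, not a UCSC tool) parses the RHO exon lines, skips the header line that the browser would otherwise reject, and prints each exon's length and browser-style position.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based first base
    end: int    # first base NOT included in the feature
    name: str


def parse_bed(text):
    """Parse whitespace-separated BED lines, skipping blank and header lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete line
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line such as "chr start stop exon"
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records


RHO_EXONS = """chr start stop exon
chr3 129247482 129247937 exon1
chr3 129249719 129249887 exon2
chr3 129251094 129251259 exon3
chr3 129251376 129251615 exon4
chr3 129252451 129254187 exon5
"""

for rec in parse_bed(RHO_EXONS):
    # Length is end - start; the browser shows the first base as start + 1.
    print(f"{rec.name}\t{rec.end - rec.start} bp\t{rec.chrom}:{rec.start + 1}-{rec.end}")
```

For these five exons the lengths come out to 455, 168, 165, 239 and 1,736 bp, or 2,763 bp of exonic sequence in total.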

Page 3: Custom TopHat tracks in UCSC Browser - RNA-Seq · Creating Custom TopHat Alignment Data Tracks in the UCSC Genome Browser Dr. Ray Enke Bio 480 Advanced Molecular Bio Lab James Madison

Cold Spring Harbor Laboratory, DNA Learning Center, 1 Bungtown Road, Cold Spring Harbor, NY 11724 3

• Copy/paste these coordinates w/o the header line into the “Add Custom Tracks” data window and hit submit. If you get an error you probably included the header line.

• This takes you to a “Manage Custom Tracks” page that gives you some information about your data track

• Select the “Pos” link to view the genomic position listed in the 1st line of your custom track
• Zoom out to view the entire RHO gene and set UCSC Genes to dense view
• You should also see your new custom track called “User Track”; set it to pack view
• Your viewer window should look like this:

Hopefully, you’ve created a custom track that individually labels each of the RHO exons. Next, we will edit some of your custom track parameters, such as the title of the track, the color of the track and the optional data label field (column #4).

• Go back to the Manage Custom Tracks page (My Data > Custom Tracks)
• Edit your custom track by selecting “User Track”

This window allows you to edit your existing custom track or replace it with new data. Keep the same coordinates but change the label field by copy/pasting the BED data below into the replacement window (do not include the header line) and hit submit:

chr   start      stop       label
chr3  129247482  129247937  BIO480
chr3  129249719  129249887  with
chr3  129251094  129251259  Dr_Enke
chr3  129251376  129251615  is_the
chr3  129252451  129254187  bomb
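Because only the 4th column changes, relabeling a track is just a matter of pairing the same coordinates with new labels. A minimal Python sketch, assuming you keep the coordinates as (chr, start, stop) tuples; relabel_bed and RHO_COORDS are hypothetical names used only for illustration:

```python
def relabel_bed(records, labels):
    """Return BED text with the same coordinates but new 4th-column labels.

    records: list of (chrom, start, end) tuples; labels: one label per record.
    """
    if len(records) != len(labels):
        raise ValueError("need exactly one label per BED record")
    lines = []
    for (chrom, start, end), label in zip(records, labels):
        # Labels must be a single token: no spaces or tabs.
        if any(ch.isspace() for ch in label):
            raise ValueError(f"label {label!r} contains whitespace")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines)


RHO_COORDS = [
    ("chr3", 129247482, 129247937),
    ("chr3", 129249719, 129249887),
    ("chr3", 129251094, 129251259),
    ("chr3", 129251376, 129251615),
    ("chr3", 129252451, 129254187),
]

print(relabel_bed(RHO_COORDS, ["BIO480", "with", "Dr_Enke", "is_the", "bomb"]))
```

The same pattern works for any label you want in column 4, as long as each label stays a single word.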


• View your custom track in pack view and UCSC Genes in full view
• This illustrates that you can use the 4th column in a BED-formatted file to indicate any parameter you like (e.g. statistical rank, up-regulation vs. down-regulation, increased vs. decreased DNA methylation, a simple text label, etc.)

• Your viewer window should look like this:

Next, edit the “track name” with a short text tag and the track “description” with a more detailed text tag.

• Go back to the custom track editing page (My Data > Custom Tracks > User Track)
• In the “Edit configuration” window, change track name= to "exonic seq" (keep the quotation marks, because the name contains a space)
• Change description= to "protein coding exonic sequence"
• Navigate back to view the entire RHO gene
• Notice that your custom annotation track now shows the track name that you entered
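Track settings that contain spaces, like the name and description above, need to be wrapped in quotation marks; otherwise the browser cannot tell where the value ends. The Python sketch below builds a track line that way. It is only an illustration: track_line and format_setting are hypothetical helpers, and they handle simple key=value settings only (a value that itself contains a double quote is not handled).

```python
def format_setting(key, value):
    """Format one key=value setting, quoting values that contain whitespace."""
    value = str(value)
    if any(ch.isspace() for ch in value):
        return f'{key}="{value}"'
    return f"{key}={value}"


def track_line(**settings):
    """Build a track line from keyword settings, kept in the order given."""
    return " ".join(["track"] + [format_setting(k, v) for k, v in settings.items()])


print(track_line(name="exonic seq", description="protein coding exonic sequence"))
# track name="exonic seq" description="protein coding exonic sequence"
```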

Lastly, let's edit the color of the custom track. Track color is defined digitally by Red, Green, Blue (RGB) values, each between 0 and 255. The default color is black (RGB value of 0,0,0). Conversely, pure white has an RGB value of 255,255,255. Pure red = 255,0,0, pure green = 0,255,0 and pure blue = 0,0,255. Other colors are combinations of R, G, and B (e.g. 206,39,212 = pink). The R,G,B value can be entered in the Edit configuration window after the track description as “color=R,G,B” (for black, “color=0,0,0”).

• Navigate to the RGB Color Code website (tinyurl.com/8pa5kvm)
• Get the RGB color combination for a handsome teal color (bluish green) or another color of your liking using the RGB color codes chart
• Go back to the custom track editing page (My Data > Custom Tracks > track name hyperlink)
• Directly after the track description, type color=R,G,B, replacing R, G and B with the values for your color of choice
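RGB color charts often list a hex code (e.g. #008080, the standard web color named teal) alongside the R,G,B values. If your chart only gives the hex code, each pair of hex digits is one channel on the 0-255 scale. A short Python sketch, where hex_to_rgb and color_setting are hypothetical helpers, converts it into the color= setting:

```python
def hex_to_rgb(hex_code):
    """Convert a hex color such as '#008080' to an (R, G, B) tuple of 0-255 ints."""
    code = hex_code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    # Each pair of hex digits (00-FF) is one channel (0-255).
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def color_setting(rgb):
    """Format an (R, G, B) tuple as a track-line color setting."""
    if len(rgb) != 3 or not all(0 <= c <= 255 for c in rgb):
        raise ValueError("each of R, G, B must be between 0 and 255")
    return "color=" + ",".join(str(c) for c in rgb)


teal = hex_to_rgb("#008080")
print(teal)                 # (0, 128, 128)
print(color_setting(teal))  # color=0,128,128
```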


• Hit submit, navigate back to the full RHO gene and view the custom track in pack view
• Copy your BED data and a hi res image of your browser viewer window into your notebook
• Your viewer window should now look like this, but with your color:

*Make up assignment: include a hi res image of this custom track in your make up assignment in addition to the posted assignment

II. Visualization of RNA-Seq TopHat Alignment Data as custom UCSC tracks

Nice job, that was fun! Next, you will use these same steps to plot some actual genome-wide data (our chicken RNA-Seq data).

• Navigate to the Chicken 2011 galGal4 genome assembly
• Select the “add custom tracks” option below the search window

Instead of pasting BED-formatted data into the custom track window, we will simply copy/paste the URLs of the sites where the TopHat RNA-Seq alignment data are stored. Remember that TopHat output is a collection of ~30-60 million individual 150-300 bp sequencing reads (originally stored in FASTQ files) that have been aligned to a reference genome and indexed. These alignments are stored as Binary Alignment/Map (BAM) files, a compressed binary version of the text-based SAM format, and even compressed they are enormous (~1-6 GB of data). BAM files are too big to open or move around easily, so they are typically hosted on a server and accessed using a URL. The BAM files for our 6 RNA-Seq samples are hosted in the Discovery Environment of CyVerse, an NSF-funded virtual organization. Your free DNA Subway account also gives you access to Discovery Environment data storage and bioinformatics tools. Today, though, we will only need the URLs for 1 replicate of each sample, which I’ve stored on a class Google Sheet (tinyurl.com/z6bhr6u).
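For reference, a BAM custom track can also be written as a single track line that points at the hosted file with bigDataUrl=. The browser also needs the BAM index (.bai) file, which by default it looks for at the same URL with .bai appended. The Python sketch below builds such a line and the expected index URL. It is illustrative only: bam_track_line is a hypothetical helper, the example URL is made up (use the real URLs from the class Google Sheet), and it assumes a plain URL with no query string.

```python
from urllib.parse import urlparse


def bam_track_line(bam_url, name, description):
    """Return (track_line, index_url) for a BAM file hosted at bam_url.

    Assumes a plain URL without a query string, so the default index
    location is simply the BAM URL with '.bai' appended.
    """
    if not urlparse(bam_url).path.endswith(".bam"):
        raise ValueError(f"URL does not end in .bam: {bam_url}")
    line = (
        f'track type=bam name="{name}" description="{description}" '
        f"bigDataUrl={bam_url}"
    )
    return line, bam_url + ".bai"


# Hypothetical URL for illustration only.
url = "https://data.example.org/class/E8_retina_rep1.bam"
line, index_url = bam_track_line(url, "E8_retina", "E8_retina_RNA-Seq_rep1")
print(line)
print(index_url)
```

If the browser reports that it cannot find the index, checking that the .bai file really exists at the printed index URL is a good first step.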

Visualizing BAM files from RNA-Seq transcriptome experiments in a genome browser can be a useful way to qualitatively assess differential expression at a particular locus. Mapped reads can be compared between samples to determine whether the accumulated reads are equal between samples (no change in expression), higher in sample #1 than in sample #2 (upregulated in #1), or lower (downregulated in #1). Different isoforms can also be distinguished between samples using BAM data. To build custom BAM alignment data tracks in the UCSC Genome Browser, we will simply paste in the URL of the site where each BAM file is hosted.
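One caution when comparing reads by eye: samples are rarely sequenced to exactly the same depth, so a locus can look "higher" just because that library is bigger. A common first correction is counts per million (CPM): reads at the locus divided by the sample's total mapped reads, times 1,000,000. The Python sketch below illustrates that idea; it is not a differential expression test. cpm and compare_locus are hypothetical helpers, the 2-fold cutoff is arbitrary, and the read counts in the example are made-up numbers, not our chicken data.

```python
import math


def cpm(locus_reads, total_mapped_reads):
    """Counts per million: reads at a locus scaled by the sample's library size."""
    return locus_reads / total_mapped_reads * 1_000_000


def compare_locus(reads_1, total_1, reads_2, total_2, min_fold=2.0):
    """Return (log2 fold change of sample 1 over sample 2, rough call).

    min_fold is an arbitrary illustrative cutoff, not a statistical test.
    Both locus counts must be nonzero to take the log ratio.
    """
    if reads_1 <= 0 or reads_2 <= 0:
        raise ValueError("locus read counts must be positive")
    log2_fc = math.log2(cpm(reads_1, total_1) / cpm(reads_2, total_2))
    cutoff = math.log2(min_fold)
    if log2_fc >= cutoff:
        return log2_fc, "higher in sample 1"
    if log2_fc <= -cutoff:
        return log2_fc, "higher in sample 2"
    return log2_fc, "similar"


# Made-up numbers: 4,000 of 50 million reads vs. 500 of 25 million reads.
fc, call = compare_locus(4_000, 50_000_000, 500, 25_000_000)
print(round(fc, 2), call)
```

In this example the raw counts differ 8-fold, but sample 1 was sequenced twice as deeply, so after scaling the difference is about 4-fold (a log2 fold change of about 2).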

• Copy/paste the BAM URL for E8 retina replicate #1 into the custom track window
• Click on the custom track “Name” to change the track name and description
• Do this by replacing the information in the Edit Configuration window with the following:

track name="E8_retina" description="E8_retina_RNA-Seq_rep1"

• Hit submit. Your track name and description should reflect this change. If not, try again.
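If your name and description do not update, the usual culprit is an unbalanced or "smart" (curly) quotation mark. The Python sketch below (parse_track_line is a hypothetical helper) uses the standard shlex module to split a track line the way a shell would. This only approximates the browser's own parsing and assumes, as the sketch does, that only straight quotes count as quote characters. It accepts a line with balanced straight quotes and rejects one with a stray quote.

```python
import shlex

# Curly "smart" quotes that word processors substitute for straight quotes.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def parse_track_line(line):
    """Split a track line into a dict of settings, honoring straight quotes.

    Raises ValueError for an unclosed quote (raised by shlex itself),
    a missing 'track' keyword, a setting without '=', or a curly quote.
    """
    words = shlex.split(line)
    if not words or words[0] != "track":
        raise ValueError("a track line must start with the word 'track'")
    settings = {}
    for word in words[1:]:
        key, sep, value = word.partition("=")
        if not sep:
            raise ValueError(f"setting without '=': {word!r}")
        if any(q in value for q in SMART_QUOTES):
            raise ValueError(f"curly quote in setting {key!r}")
        settings[key] = value
    return settings


print(parse_track_line('track name="E8_retina" description="E8_retina_RNA-Seq_rep1"'))

# A stray straight quote leaves one quotation unclosed.
bad = "track name='E8_retina' description='E8_retina'_RNA-Seq_rep1'"
try:
    parse_track_line(bad)
except ValueError as err:
    print("rejected:", err)
```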


• Hit go, navigate to the Rho gene, and view only RefSeq Genes in full view and your custom track in dense view to get a basic visualization of RNA-Seq reads

• Click the custom track hyperlink or the grey bar to the left of the custom track to reconfigure the data into a BAM density plot

• Change the display mode to full, select “display data as a density graph” and hit submit


Repeat these steps for the E18 retina and E18 cornea BAM data. If you like, you can change the colors of the custom tracks. If all goes well, your view of the rhodopsin (Rho) gene should look like this (but with cornea data included):

• Save this session as “Gg [your initials] BAM Density” and copy the session URL to your notebook
• Copy a hi res image of your browser window showing Rho plus ~5 kb on either side into your notebook
• Provide a brief summary and interpretation of the data shown in your window
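To type the ±5 kb window straight into the position box, pad the gene's span on both sides, remembering that BED starts are 0-based while the position box is 1-based. The Python sketch below is illustrative only: padded_position is a hypothetical helper, it does not clip the end at the chromosome length, and because the chicken Rho coordinates are not listed in this handout it uses the human RHO span from Part I (exon1 start to exon5 end) as stand-in input.

```python
def padded_position(chrom, bed_start, bed_end, pad=5_000):
    """Return a position string covering a BED interval plus `pad` bp per side.

    Converts the 0-based BED start to the 1-based start the position box uses.
    Does not clip the end coordinate at the chromosome length.
    """
    start_1based = max(1, bed_start + 1 - pad)
    end = bed_end + pad
    return f"{chrom}:{start_1based:,}-{end:,}"


# Stand-in input: human RHO span from Part I (hg19).
print(padded_position("chr3", 129247482, 129254187))
# chr3:129,242,483-129,259,187
```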