Bioinformatics for biologists
Dr. Habil Zare, PhD
PI of Oncinfo Lab
Assistant Professor, Department of Computer Science, Texas State University
Presented at University of Texas, Health Science Center – San Antonio
20 November 2015
Part 1
- BioLinux
- Mapping RNAseq data to transcriptome (Salmon)
Bioinformatics for biologists workshop, Dr. Habil Zare, Oncinfo Lab UTHSC San Antonio, 20 Nov 2015
Bioinformatics: Computational and statistical analysis of biological data
Data
Biologists
Results
Genotypes / Phenotypes
In this workshop: A compact demo of bioinformatics analysis starting from raw data to produce useful plots and meaningful interpretation of the data
RNAseq
Biologists
Pathway and Network Analysis
Goals of the workshop
- A practical introduction to some basic bioinformatics tools for biologists.
- Having hands-on experience with simple, toy-example data.
Bio-Linux
Bio-Linux is a free workstation platform that facilitates running hundreds of bioinformatics tools without the corresponding installation hassles.
An easy way to install it on Mac OS X and Windows computers is described here: http://oncinfo.org/file/view/BioLinux_VM.pptx/564155065/BioLinux_VM.pptx
Browsing files and folders
A file whose name ends in tar.gz is a compressed archive, a common format in Linux. Let’s practice decompressing such a file with an example. Follow the next steps in BioLinux.
Double-click on Bio-Linux Documentation to open it.
Double-click on Introductory Tutorial
Click on File > New Tab.
Select the second tab and click on Home.
Drag and drop this file from intro_course tab to Home tab.
Right-click on the file and then Extract Here…
This folder will appear. Open it and have a look inside.
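The same kind of extraction can also be done from a terminal (terminals are introduced later in this workshop). The sketch below is self-contained: it builds a small practice archive in a temporary folder and then extracts it. The names demo_course and demo_course.tar.gz are hypothetical and are not the file from the tutorial.

```shell
# Work in a throwaway folder so nothing in Home is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny archive to practice on (hypothetical names).
mkdir demo_course
echo "hello" > demo_course/readme.txt
tar -czf demo_course.tar.gz demo_course
rm -r demo_course

# Extract it: x = extract, z = decompress with gzip, f = archive file name.
tar -xzf demo_course.tar.gz

# Have a look inside the extracted folder.
ls demo_course
```

This should give the same result as the right-click "Extract Here" step: a folder with the archive's contents next to the archive.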
Downloading and installing
Most useful bioinformatics tools are publicly available. You can download, install, and use them easily.
Let’s practice with an example. Follow the next steps in BioLinux.
This is the “Dash”. Use it to launch and organize applications.
For example, use “Firefox” to browse the web.
Type oncinfo.org in the address bar and press Enter.
From the right menu, click on the workshop link.
Click on “zipped” to download the folder.
Choose to save the file.
1- Click on the Files icon.
2- Click on Downloads.
The file that you just downloaded was saved in the Downloads folder.
This is the file you just downloaded.
Extract (decompress) the file that you just downloaded.
Right-click on the file and then Extract Here…
The file that you just downloaded is saved in the Downloads folder.
Salmon
Salmon, a successor of Sailfish, is a useful tool for mapping RNAseq data. It is faster and easier to run than alternatives such as TopHat.
Installing Salmon software
We will run a script provided in the zipped file using a terminal.
A terminal is an interface that uses only text to communicate between the user and the computer.
How to open a terminal?
Click on the black rectangle to open a terminal.
Try a few simple Linux commands, e.g., echo, date, cal, …
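As a small illustration of these commands, the sketch below can be typed line by line or saved as a script. The greeting text and the variable names are made up for this example. Because cal is not installed on every system, the sketch only runs it if it is available.

```shell
# echo prints its arguments back to the screen.
greeting=$(echo "Hello, BioLinux")
echo "$greeting"

# date prints the current date and time; +%Y asks for the year only.
year=$(date +%Y)
echo "The year is $year"

# cal prints a calendar of the current month, if it is installed.
if command -v cal >/dev/null 2>&1; then
    cal
fi
```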
Type “cd” in the terminal to “change directory”.
Drag the folder to the terminal.
Now press Enter.
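Dragging the folder simply saves you from typing its path after cd; typing the path by hand has the same effect. Here is a minimal sketch, assuming a made-up temporary folder in place of the workshop folder.

```shell
# Create a hypothetical folder to move into (a stand-in for the workshop folder).
target=$(mktemp -d)

# cd followed by a path changes the current directory to that path.
cd "$target"

# Quoting the path keeps cd working even if a folder name contains spaces.
mkdir "my folder"
cd "my folder"
here=$(basename "$PWD")
echo "Now inside: $here"
```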
What is in the folder?
Double-click on the folder to open it.
Equivalently, “ls” shows you the list of files in this folder.
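A short sketch of ls in action follows. The folder and the two file names are hypothetical, created only so that the example runs anywhere.

```shell
# Hypothetical folder with two files, so the example runs anywhere.
folder=$(mktemp -d)
touch "$folder/install.sh" "$folder/notes.txt"

# ls lists the names of the files in a folder, sorted by name.
listing=$(ls "$folder")
echo "$listing"

# ls -l adds details such as size and modification time.
ls -l "$folder"
```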
This script will install Salmon for you.
How to run the script?
Type the name of the script and then press Enter.
Type your password, which is “manager” by default.
How to make sure Salmon is installed?
The script should download and install Salmon.
Type “salmon -v” to test whether it is installed. If it prints a version number, the installation was OK.
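A generic way to check whether any program can be found by the shell is sketched below. This check is not part of the workshop script; it is only an illustration, and the variable name status is made up.

```shell
# command -v prints the path of a program if the shell can find it.
if command -v salmon >/dev/null 2>&1; then
    status="installed"
else
    status="not found"
fi
echo "salmon is $status"
```

If the message says "not found", rerun the installation script and open a new terminal before trying again.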
Input for Salmon
1- A FASTA file, which has the sequence information of the transcriptome of the species of interest.
2- One or more FASTQ files, which are provided by the sequencer instrument and contain the reads information from the samples.
Toy examples of FASTA and FASTQ files
Open the sample_data folder
Next generation sequencing
A sequencer produces millions of short reads (50-200 bp).
Biological sample → Sequencer → Short reads
Toy examples of a FASTQ file
Double-click on the reads_1.fastq file.
This is a read of length 50 with nucleotide and (Phred) quality information.
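A FASTQ record takes four lines: an identifier starting with @, the nucleotide sequence, a separator line starting with +, and a quality string with one character per base. The sketch below uses a made-up 10-base read (not taken from reads_1.fastq) and assumes the Phred+33 encoding, where the quality score is the character's ASCII code minus 33. The workshop files may use a different encoding.

```shell
# A made-up record; the reads in the workshop are 50 bases long.
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@toy_read_1
ACGTACGTAC
+
IIIII#####
EOF

# Line 2 of each record is the sequence; print its length.
read_length=$(awk 'NR % 4 == 2 { print length($0) }' "$fastq")
echo "Read length: $read_length"

# Line 4 holds the qualities. Under Phred+33, score = ASCII code - 33.
scores=$(awk '
    BEGIN { for (i = 33; i < 127; i++) code[sprintf("%c", i)] = i }
    NR % 4 == 0 {
        out = ""
        for (j = 1; j <= length($0); j++)
            out = out (j > 1 ? " " : "") (code[substr($0, j, 1)] - 33)
        print out
    }' "$fastq")
echo "Phred scores: $scores"
```

In this toy read, the first five bases have a high score (40) and the last five a very low one (2), so the second half of the read would be much less trustworthy.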
Toy examples of a FASTA file
Double-click on the transcripts.fasta file.
This is a transcript.
It is an mRNA with RefSeq ID NM_001168316
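In a FASTA file, each record starts with a header line beginning with ">", followed by one or more lines of sequence. The sketch below shows how to list the transcripts in such a file from the terminal. The sequences are made up, the header layout is an assumption (real headers in transcripts.fasta may carry more fields), and toy_transcript_2 is a placeholder name.

```shell
# A made-up FASTA file with two records.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>NM_001168316
ACGTACGTACGT
ACGTAC
>toy_transcript_2
GGGCCCAAATTT
EOF

# Header lines start with ">"; counting them counts the transcripts.
n_transcripts=$(grep -c '^>' "$fasta")
echo "Transcripts: $n_transcripts"

# Print each transcript name with its total sequence length.
lengths=$(awk '
    /^>/ { if (name != "") print name, len; name = substr($0, 2); len = 0; next }
    { len += length($0) }
    END { if (name != "") print name, len }' "$fasta")
echo "$lengths"
```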
More information on the transcript
Search in the NCBI database: http://www.ncbi.nlm.nih.gov/nuccore/
Type the RefSeq ID, e.g., NM_001168316
Visualize the transcript on the genome
Search in the UCSC Genome Browser: https://genome.ucsc.edu/cgi-bin/hgGateway
Type the RefSeq ID, e.g., NM_001168316
This is the transcript
More information on this region is available.
Quantify the level of expression
The level of expression of each transcript can be quantified by counting the number of reads that are aligned to it.
Only exons are present in mRNA
exon 1, exon 2, exon 3, exon 4
Alignment
Gene 1 Gene 2
Short reads in a FASTQ file
Alignment determines which transcript (i.e., where on the genome) each read originated from.
Count the number of reads aligned (mapped) to each region.
High expression Low expression
Compare the level of expression between genes.
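The counting idea can be sketched with a made-up table of read-to-gene assignments. This is a simplification for illustration only: the read and gene names are hypothetical, and a plain tally like this ignores reads that match more than one transcript, which real tools have to handle.

```shell
# Made-up alignment result: one line per read, read name then gene name.
assignments=$(mktemp)
cat > "$assignments" <<'EOF'
read1 Gene1
read2 Gene1
read3 Gene2
read4 Gene1
read5 Gene1
EOF

# Tally the reads assigned to each gene, then sort by gene name.
counts=$(awk '{ n[$2]++ } END { for (g in n) print g, n[g] }' "$assignments" | sort)
echo "$counts"
```

Here Gene1 collects four reads and Gene2 only one, so in this toy data Gene1 would be the more highly expressed gene.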
Quantifying expression from RNAseq data
Salmon processes raw data and quantifies expression levels in 2 steps. http://salmon.readthedocs.org/en/latest/salmon.html#using-salmon
Step 1- Building an index for the transcriptome.
Step 2- Aligning the reads to the transcriptome.
Are you in the right directory?
Before you start, make sure you are in the correct directory. The pwd command in Linux shows the current directory.
Typing “pwd” and then “Enter” will “print the working directory”, i.e., your current path.
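A short sketch of this check is given below. It moves into a made-up temporary folder, prints the working directory, and then reports whether the input files used in this workshop (transcripts.fasta, reads_1.fastq, reads_2.fastq) are present there. In the temporary folder they will be reported as missing; in the sample_data folder they should be found.

```shell
# Move into a hypothetical folder, then ask where we are.
folder=$(mktemp -d)
cd "$folder"
current=$(pwd)
echo "Current directory: $current"

# Report whether each expected input file is in the current directory.
for f in transcripts.fasta reads_1.fastq reads_2.fastq; do
    if [ -e "$f" ]; then
        echo "found $f"
    else
        echo "missing $f"
    fi
done
```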
Always make sure that the files are stored where you expect them to be.
Step 1- Building an index for the transcriptome.
Run the following command in the terminal in BioLinux:
salmon index -t transcripts.fasta -i transcripts_index --type fmd
Type the command here.
For now, ignore this warning.
The index is built.
Salmon created a new folder.
Step 2- Aligning the reads to the transcriptome.
Run the following command in the terminal in BioLinux:
salmon quant -i transcripts_index -l IU -1 reads_1.fastq -2 reads_2.fastq -o transcripts_quanton
![Page 68: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/68.jpg)
The parts of the Step 2 command:
salmon quant -i transcripts_index -l IU -1 reads_1.fastq -2 reads_2.fastq -o transcripts_quanton
• salmon quant: the command
• -i transcripts_index: the index built in step 1
• -1 reads_1.fastq: the first input file
• -2 reads_2.fastq: the second input file
• -o transcripts_quanton: the output folder
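Before running the command, it can help to confirm that Salmon and the inputs are actually in place. The following is a minimal sketch, not part of the original slides: the variable names are made up for illustration, and the file and folder names are copied from the command above. It also notes the meaning of the -l IU option, which describes a paired-end library whose reads face inward (I) and are unstranded (U).

```shell
# Names taken from the command on the slide; the variable names are illustrative.
index=transcripts_index
reads1=reads_1.fastq
reads2=reads_2.fastq
outdir=transcripts_quanton

# Collect any inputs that are not in the current folder.
missing=""
[ -d "$index" ] || missing="$missing $index"
[ -f "$reads1" ] || missing="$missing $reads1"
[ -f "$reads2" ] || missing="$missing $reads2"

if ! command -v salmon >/dev/null 2>&1; then
    echo "salmon is not installed or not on the PATH"
elif [ -n "$missing" ]; then
    echo "Missing inputs:$missing"
else
    # -l IU: paired-end library, reads face inward (I), unstranded (U).
    salmon quant -i "$index" -l IU -1 "$reads1" -2 "$reads2" -o "$outdir"
fi
```

If a message about missing inputs appears, change to the folder that holds the index from step 1 and the two FASTQ files, then run it again.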
![Page 69: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/69.jpg)
Salmon created a new folder and stored the results there.
![Page 70: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/70.jpg)
quant.sf is the main output file. It reports the number of reads and the expression of each transcript. Double-click it to open it.
![Page 71: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/71.jpg)
The names of the transcripts (RefSeq IDs) and their lengths are in the first two columns.
![Page 72: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/72.jpg)
The number of mapped reads is reported in the last column.
![Page 73: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/73.jpg)
Transcripts per million (TPM) is the estimated expression.
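The same columns can also be read in the terminal instead of a text editor. This is a minimal sketch under stated assumptions: the sample file below is made up, and its header names (Name, Length, TPM, NumReads) are an assumption, since the exact header and column order depend on the Salmon version. For that reason the sketch looks the columns up by name rather than by position.

```shell
# Hypothetical stand-in for quant.sf; real files have one row per transcript.
tmpdir=$(mktemp -d)
printf 'Name\tLength\tTPM\tNumReads\n' > "$tmpdir/quant.sf"
printf 'NM_000001\t1500\t12.5\t30\n' >> "$tmpdir/quant.sf"
printf 'NM_000002\t800\t0\t0\n' >> "$tmpdir/quant.sf"

# Read the column positions from the header line,
# then print the name, TPM and number of reads for every transcript.
summary=$(awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    { print $1 "\t" $(col["TPM"]) "\t" $(col["NumReads"]) }
' "$tmpdir/quant.sf")

echo "$summary"
```

To try it on real output, point awk at the quant.sf file inside the output folder and check that the header names match.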
![Page 74: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/74.jpg)
TPM values are read counts normalized by the length of each transcript and by the depth of sequencing. Other normalization methods include RPKM and FPKM.
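To make the normalization concrete, here is a small worked example that computes TPM with awk. The three transcripts and their counts are made-up values. The sketch divides by the full transcript length for simplicity; Salmon works with an effective length, so its reported TPM values will differ somewhat.

```shell
# Hypothetical toy data: transcript name, length in bases, number of reads.
tmpdir=$(mktemp -d)
printf 'tx_A\t1000\t100\ntx_B\t2000\t100\ntx_C\t500\t50\n' > "$tmpdir/counts.tsv"

# The file is read twice.
# Pass 1: add up reads/length over all transcripts.
# Pass 2: TPM = (reads/length) / (sum of reads/length) * 1,000,000.
tpm_table=$(LC_ALL=C awk -F'\t' '
    NR == FNR { total += $3 / $2; next }
    { printf "%s\t%.1f\n", $1, ($3 / $2) / total * 1e6 }
' "$tmpdir/counts.tsv" "$tmpdir/counts.tsv")

echo "$tpm_table"
```

tx_A and tx_B have the same number of reads, but tx_B is twice as long, so its TPM is half that of tx_A. The TPM values of all transcripts add up to one million, which is what makes them comparable between samples sequenced at different depths.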
![Page 75: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/75.jpg)
This transcript is highly expressed.
![Page 76: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/76.jpg)
These transcripts have low expression.
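Rather than scrolling through the file, the transcripts can be ranked by expression in the terminal. A minimal sketch, assuming a made-up sample file in which TPM is the third column; in a real quant.sf the position of the TPM column depends on the Salmon version, so check the header first.

```shell
# Hypothetical stand-in for quant.sf with three transcripts.
tmpdir=$(mktemp -d)
printf 'Name\tLength\tTPM\tNumReads\n' > "$tmpdir/quant.sf"
printf 'NM_000001\t1500\t12.5\t30\n' >> "$tmpdir/quant.sf"
printf 'NM_000002\t800\t0.3\t1\n' >> "$tmpdir/quant.sf"
printf 'NM_000003\t2200\t950.7\t4100\n' >> "$tmpdir/quant.sf"

# Skip the header, then sort on column 3 (TPM) numerically, highest first.
tab=$(printf '\t')
ranked=$(tail -n +2 "$tmpdir/quant.sf" | LC_ALL=C sort -t "$tab" -k3,3nr)

top=$(printf '%s\n' "$ranked" | head -n 1 | cut -f1)
bottom=$(printf '%s\n' "$ranked" | tail -n 1 | cut -f1)
echo "Highest expression: $top"
echo "Lowest expression: $bottom"
```

Piping the sorted rows into head -n 20 would list the twenty most highly expressed transcripts.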
![Page 77: Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697bff21a28abf838cbbac3/html5/thumbnails/77.jpg)
Installing BioLinux using VM, Dr. Habil Zare, 27 Oct 2015
References:
• Some of the slides are based on Introduction to Biolinux: http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf
• Salmon is a useful tool for mapping and analyzing RNAseq data: https://combine-lab.github.io/salmon/
• I prepared these guidelines to facilitate the “Bioinformatics for biologists workshop”, 20 Nov 2015, UTHSC – San Antonio: http://oncinfo.org/Bioinformatics+for+biologist+workshop