2013 july 25 systems biology rna seq v2
![Page 1: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/1.jpg)
Cancer Systems Biology: RNA-Seq and Differential Expression Analysis
Taking advantage of a Measurement Revolution
July 25, 2013 Anne Deslattes Mays
Wellstein/Riegel Laboratory Mentor: Anton Wellstein, MD, PhD
7/25/13 Wellstein/Riegel Laboratory 1
![Page 2: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/2.jpg)
Talk Outline
• On the Shoulders of Giants
• Sequencing Timeline
• RNASeq for Everyone
• RNA-Sequencing Details
• Differential Expression Analysis
• Causality
• Cancer Therapeutics Example
• Ask Bigger Questions – Sequencing Everything
![Page 3: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/3.jpg)
Rosalind Franklin “pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals”
http://www.pbs.org/wgbh/aso/databank/entries/bofran.html
“Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”
![Page 4: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/4.jpg)
Sequencing Timeline
![Page 5: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/5.jpg)
Human Sequencing Timeline
Key Technical Advances: Celera Human Sequence done in one location on the largest supercomputer in private hands at that time
![Page 6: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/6.jpg)
![Page 7: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/7.jpg)
![Page 8: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/8.jpg)
![Page 9: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/9.jpg)
![Page 10: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/10.jpg)
![Page 11: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/11.jpg)
![Page 12: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/12.jpg)
![Page 13: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/13.jpg)
Cancer Systems Biology
Taking advantage of the measurement revolution
Declining sequencing costs, decreasing computing costs
How do you leverage all this data?
GEO May 25, 2012
GEO June 25, 2013
![Page 14: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/14.jpg)
Here is an example RNA-Seq Workflow
Experimental Design
Sample Collection
Quality Control / Read Trimming
Differential Analysis
Transcript Identification
Pathway Analysis
Feature Discovery
Sequencing
![Page 15: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/15.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 16: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/16.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 17: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/17.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 18: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/18.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 19: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/19.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 20: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/20.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 21: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/21.jpg)
http://rnaseq.uoregon.edu/index.html
![Page 22: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/22.jpg)
Replicates: Type I and Type II errors
![Page 23: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/23.jpg)
Detecting Signal vs. Noise
![Page 24: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/24.jpg)
![Page 25: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/25.jpg)
What is the goal of the sequencing experiment?
![Page 26: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/26.jpg)
![Page 27: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/27.jpg)
![Page 28: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/28.jpg)
Before library construction, RNA quality must be assessed
Before Library Construction
1. Most vendors and cores will assess the quality of the RNA before sequencing
2. Important to determine before sequencing begins
Garbage in == Garbage out
![Page 29: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/29.jpg)
RNA-seq
![Page 30: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/30.jpg)
Three steps to get to a fresh sequence with the Illumina Genome Sequence Analyzer
• Library generation
• Cluster generation
• Sequencing
![Page 31: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/31.jpg)
Library Construction: Messenger RNAs are poly-A selected from total RNA, fragmented, and cDNA is synthesized
Before Library Construction
1. Poly-A selection (total RNA -> mRNA)
2. mRNA fragmentation
3. First strand synthesis (here we stop if we want to maintain strand specificity)
4. Second strand synthesis
Other techniques
1. Ribo-Zero
2. RiboMinus
![Page 32: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/32.jpg)
Library Construction: End repair and adenylation result in adapter-ligation-ready constructs
cDNA (single or double stranded)
1. cDNA is blunt end-repaired and phosphorylated (B.)
2. A-base added to prepare for indexed adapter ligation (C.)
![Page 33: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/33.jpg)
Library Construction: Adapter ligation results in cluster-generation-ready constructs
Index adapter ligation and product ready for amplification on cBot or the cluster station
1. Strand specific tags are added to the A base – ligate index adapter (D)
2. Denature and amplify for final product (E)
![Page 34: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/34.jpg)
Cluster Generation: In the Illumina cBot system, single molecules are isothermally amplified in a flow cell to prepare them for sequencing
Single DNA molecules hybridize to the lawn of oligos grafted to the surface of the flow cell
1. Oligo lawn
2. Oligos hybridize to the adapters that had been ligated to the library fragments which flow through the cell
![Page 35: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/35.jpg)
Cluster generation: Bound fragments are extended to make copies, and reverse strands are cleaved and washed away
Bridge amplifications resulting in 100s of millions of unique clusters
1. Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications
2. Reverse strands cleaved and washed away
3. Ends are blocked
4. Sequencing primer hybridized to the DNA template
5. Libraries are ready for sequencing
![Page 36: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/36.jpg)
Sequencing: 100s of millions of clusters sequenced simultaneously
4 fluorescently labeled reversibly terminated nucleotides
1. Each base competes for addition
2. Natural competition ensures highest accuracy
3. After each round of synthesis, clusters are excited by a laser, emitting a color that identifies the newly added base
4. Fluorescent label and blocking group are removed, allowing for addition of the next nucleotide
5. Proprietary (Illumina) chemistry reads a base in each cycle
6. Allows for accurate sequencing through difficult regions such as homopolymers and repetitive sequence
![Page 37: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/37.jpg)
There are other ways to Inquire about the Transcriptome
• Array Based Technologies
– Affymetrix
– Agilent
– Known genes and hybridization protocols
• Microarray
– 20,000+ array experiments on a single platform
– Edge effects
– False positives / false negatives
• Bead-based arrays
• Tiling arrays
• SAGE
![Page 38: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/38.jpg)
What is unique about RNA-Seq?
• Allows you to discover and profile the entire transcriptome of any organism
• No probes or primers to design
• Novel transcripts
• Novel isoforms
• Alternative splice sites
• Rare transcripts
• cSNPs – all of this in one experiment
![Page 39: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/39.jpg)
After sequencing, we must then perform RNA-Seq Data Analysis
After sequencing…
1. Quality control – trim your reads
2. Count reads
• Align to genome
• Align to transcriptome
3. Interpret data
• Statistical tests (differential expression analysis)
• Visualization (mapped reads)
• Pathway analysis
Not so simple – big data, big compute requirements
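The "trim your reads" step can be illustrated with a small sketch. It assumes Phred+33 encoded FASTQ qualities and a simple rule that cuts the 3' end while base quality is below a threshold. The function name and the threshold of 20 are hypothetical choices for illustration; dedicated trimmers such as Trimmomatic apply more elaborate rules.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    seq    -- base calls, e.g. "ACGT"
    qual   -- FASTQ quality string of the same length (Phred + offset)
    min_q  -- hypothetical threshold; trailing bases below it are removed
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Under Phred+33, 'I' encodes quality 40 and '#' encodes quality 2.
read, quals = trim_3prime("ACGTACGT", "IIIIII##")
print(read, quals)
```

Only the trailing low-quality bases are removed; a low-quality base in the middle of a read is left in place by this simple rule.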
![Page 40: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/40.jpg)
![Page 41: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/41.jpg)
![Page 42: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/42.jpg)
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html Step 1: align-reads:
FASTQ PE* reads
Reference Genome Assembly WGS
Existing gene models (gtf files w/ tss ids)*
Gene models mapped to reference
gsnap
trimmomatic FASTQC
trimmed PE* reads
Quality control consensus
per read length graphs
• Tss ids = transcription start site ids, in a gtf file format • PE – paired end • The gene models that are built with the pasa pipeline can be input to tophat
Shadeless rectangle An unshaded rectangle represents code to be run – a process
Shaded rectangle
A shaded rectangle is a file or a graphic which may be an input and/or an output
Legend
Gsnap aligned Bam files
Dark rectangle Dark rectangle represents a file that can be displayed as a track in crop-pedia
Align-reads: Gsnap is used to align reads to the genome sequence.
samtools Gsnap.CoordSorted.bam
![Page 43: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/43.jpg)
RNA Alternative Splicing: Why you need gapped aligners
![Page 44: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/44.jpg)
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html Step 2: assemble-reads:
prep_rnaseq_alignments_for_genome_assisted_assembly.pl
• Tss ids = transcription start site ids, in a gtf file format • PE – paired end • The gene models that are built with the pasa pipeline can be input to tophat
Shadeless rectangle An unshaded rectangle represents code to be run – a process
Shaded rectangle
A shaded rectangle is a file or a graphic which may be an input and/or an output
Legend
Dark rectangle Dark rectangle represents a file that can be displayed as a track in crop-pedia
assemble-reads: Trinity is used to assemble the RNA-Seq reads in each partition. This can be done in a massively parallel manner, typically requiring little RAM compared to whole de novo RNA-Seq assemblies, and can be executed using standard hardware. The first step (prep_rnaseq_alignments_for_genome_assisted_assembly.pl) partitions the reads according to covered regions.
Gsnap.CoordSorted.bam
find Dir_* -name "*reads" > read_files.list
Read_files.list
GG_write_trinity_cmds.pl
ParaFly
Trinity_GG.cmds
find Dir_* -name "*inity.fasta" -exec cat {} + | Inchworm_accession_incrementer.pl > Trinity_GG.fasta
Trinity_GG.fasta
![Page 45: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/45.jpg)
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html Steps 3 and 4: align-transcripts and assemble-transcript alignments
Launch_PASA_pipeline.pl
• Tss ids = transcription start site ids, in a gtf file format • PE – paired end • The gene models that are built with the pasa pipeline can be input to tophat
Shadeless rectangle An unshaded rectangle represents code to be run – a process
Shaded rectangle
A shaded rectangle is a file or a graphic which may be an input and/or an output
Legend
Dark rectangle Dark rectangle represents a file that can be displayed as a track in crop-pedia
Trinity_GG.fasta
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gtf
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.bed
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gff3
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.fasta
![Page 46: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/46.jpg)
RNASeq flow chart – Step 5 – Tuxedo Suite – using the output of the trinity-genome-guided assembly and the pasa and keygene annotation pipelines → call the Tuxedo suite (in parallel with calling the abundance estimator RSEM)
• Tss ids = transcription start site ids, in a gtf file format • PE – paired end • The gene models that are built with the pasa pipeline can be input to tophat
Shadeless rectangle An unshaded rectangle represents code to be run – a process
Shaded rectangle
A shaded rectangle is a file or a graphic which may be an input and/or an output
Legend
Dark rectangle Dark rectangle represents a file that can be displayed as a track in crop-pedia
Gff3 (gene model)
Gff3togtf (convert to gtf format)
Gtf (gene model)
tophat calls Bowtie2
Junctions.bed
Accepted.hits.sam
![Page 47: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/47.jpg)
RNASeq Quantitation and Differential Analysis
• Tss ids = transcription start site ids, in a gtf file format • PE – paired end • The gene models that are built with the pasa pipeline can be input to tophat
Shadeless rectangle An unshaded rectangle represents code to be run – a process
Shaded rectangle
A shaded rectangle is a file or a graphic which may be an input and/or an output
Legend
Quantitation (matrix file with counts per isoform) Model building/Differential analysis
Trinity.fasta
Dark rectangle Dark rectangle represents a file that can be displayed as a track in crop-pedia
Tuxedo suite
Trinity genome guided assembly Abundance estimation RSEM
Transcripts .gtf/.gff*
trimmed PE* reads
RSEM.isoform.results
Limma Model Design/contrast matrix building
randomForest pcAlg
Genie3.R DREAM4
Accepted.hits.sam
cuffdiff2
• Transcript annotation file produced by cufflinks, cuffcompare or other source
• Counts and read group tracking files also created
Isoforms.fpkm_tracking
Genes.fpkm_tracking
Cds.fpkm_tracking
Tss_groups.fpkm_tracking
Isoform_exp.diff
Gene_exp.diff
Tss_group_exp.diff
Cds_exp.diff
![Page 48: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/48.jpg)
How much RNA-sequencing data, how much computation power and where do you go to compute?
How much RNA-sequencing data?
1. 20 million paired end reads ~ 2 GB of data
2. 100 million paired end reads ~ 10 GB of data
How much computation power?
1. More memory, more processors, less time it takes to compute
2. Outsource the analysis, still will need to store the results somewhere
Amazon Web Services
S3 storage
EC2 elastic cloud on demand computational facility
Georgetown University High Performance Computer Core matrix.georgetown.edu
UPENN Galaxy services
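The two data-size figures quoted on this slide imply roughly 100 bytes per paired-end read (2 GB / 20 million). A minimal sketch of that arithmetic follows, assuming the ratio scales linearly and using decimal gigabytes. The constant is derived from the slide rather than measured, and the function name is hypothetical; real file sizes depend on read length and compression.

```python
# About 100 bytes per read, derived from "20 million paired end reads ~ 2 GB".
BYTES_PER_READ = 2e9 / 20e6


def estimated_size_gb(n_reads):
    """Estimate storage in GB by linear scaling of the slide's rule of thumb."""
    return n_reads * BYTES_PER_READ / 1e9


for n_reads in (20e6, 100e6, 500e6):
    print(f"{n_reads / 1e6:.0f} million reads ~ {estimated_size_gb(n_reads):.0f} GB")
```

Intermediate files (alignments, assemblies, count matrices) add to this, so the estimate is a lower bound on what needs to be stored.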
![Page 49: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/49.jpg)
A growing number of tools enable RNA-Seq analysis
![Page 50: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/50.jpg)
What percentage of reads are covered? What percentage of reads are mapped?
3’ bias on transcript reads
1. 60-80% of reads are mapped
2. Highest percentage of 3’ end of reads are mapped
3. Reads need to be quality trimmed
Mapping tools bias exons to known genes
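The "percentage of reads mapped" question can be answered directly from an alignment file. The sketch below assumes uncompressed SAM text and relies on the SAM FLAG field, where bit 0x4 marks an unmapped read and bits 0x100 and 0x800 mark secondary and supplementary records. The function name and the example records are made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary SAM records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip secondary and supplementary alignments so each read counts once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Three made-up records: two mapped (flags 0 and 16) and one unmapped (flag 4).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr21\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr21\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(f"{mapped_fraction(example):.1%} of reads mapped")
```

For BAM files the same flags apply, but the records must first be decoded, for example with samtools.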
![Page 51: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/51.jpg)
Galaxy is a web-based tool committed to enabling researchers (more than just for RNA-Seq)
![Page 52: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/52.jpg)
![Page 53: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/53.jpg)
How to visualize mapped results?
• UCSC Genome Browser (Gbrowse)
• Integrated Genome Browser (IGB)
• Integrative Genomics Viewer (IGV)
Many shared formats, reading many of the outputs generated by the programs, ability to generate one's own tracks
![Page 54: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/54.jpg)
[Figure: UCSC Genome Browser view of chr21 (hg19), around 23,600,000 to 23,650,000, showing ENCODE tracks (transcription factor ChIP-seq, layered H3K27Ac, DNaseI hypersensitivity clusters), RefSeq and UCSC genes, human mRNAs and ESTs, conservation, and aligned reads for C7 Random and C7 Targeted]
![Page 55: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/55.jpg)
![Page 56: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/56.jpg)
![Page 57: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/57.jpg)
What do RNA-Seq reads look like for GAPDH?
Repeat masked allowing 1/2 mismatched bases blat’d reads viewed in IGB 6.7.2
![Page 58: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/58.jpg)
RNA-Seq Differential Expression analysis
![Page 59: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/59.jpg)
What does GAPDH look like in terms of quantitation?
TOTAL BM: RPKM, 3SEQ Counts, BLAT Reads; HPP: RPKM, 3SEQ Counts, BLAT Reads
CD34 0.7 340 230 8 8 14
BST1 19.7 5374 31 31
CD133 0.2 173 176 16 16 33
THY1 0 7 4 4
A12 1 0
A5 0 0
ALK 0 9 24 0 0 3
B9 0 0
C1 0 0
C2 0 0
C7 0 0
E7 0 0
E9 2 0
F6 0 0
G12 0 0
GAPDH 3013.2 727831 356289 120.8 5559 2670
H3 0 0
Blat read raw counts ratio == 3Seq counts ratio ~= 130 to 1
RPKM ratio ~= 24.3
![Page 60: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/60.jpg)
RNA-Seq Quantification Challenge: A problem that exists with RNA-Seq data that doesn’t exist with array data: Longer transcripts produce more reads than shorter transcripts
One solution to account for this is RPKM (FPKM used by Cufflinks)
RPKM = 10^9 x C / (N x L), which is simply C/N scaled per kilobase of exon and per million mapped reads
C(gene) = the number of mappable reads that fall onto a gene's exons
N = total number of mappable reads in the experiment
L(gene) = the sum of the exons in base pairs
Wold (2008)
RPKM – reads per kilobase per million
CPM – counts per million
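A short worked example of the RPKM and CPM definitions, using the symbols C, N and L from this slide. The gene names and read numbers are hypothetical and chosen only to show the effect of length normalization: two genes with the same read count get the same CPM but different RPKM.

```python
def rpkm(count, total_mapped, length_bp):
    """Reads per kilobase per million: 10^9 * C / (N * L)."""
    return 1e9 * count / (total_mapped * length_bp)


def cpm(count, total_mapped):
    """Counts per million: 10^6 * C / N (no length correction)."""
    return 1e6 * count / total_mapped


# Hypothetical experiment: 20 million mapped reads, two genes with equal counts.
N = 20_000_000
for name, C, L in [("short_gene", 1000, 1000), ("long_gene", 1000, 4000)]:
    print(name, f"CPM={cpm(C, N):.1f}", f"RPKM={rpkm(C, N, L):.1f}")
```

The gene that is four times longer ends up with a quarter of the RPKM, which is the correction for "longer transcripts produce more reads".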
![Page 61: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/61.jpg)
RNA-Seq Quantification Challenge: DESeq Method uses the geometric mean of counts in all samples
DESeq Method: Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples. To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the counts in your sample divided by the counts of the reference sample. Now you have, for each gene, an estimate of the depth ratio. Simply take the median of all the quotients to get the relative depth of the library. The 'estimateSizeFactors' function of the DESeq package does this calculation.
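The median-of-quotients calculation described here can be sketched in plain Python. This is an illustration of the described procedure, not the DESeq source code: the function name and the count matrix are hypothetical, and skipping genes that have a zero count in any sample (where the geometric mean is zero) is an assumption of this sketch.

```python
import math
from statistics import median


def size_factors(counts):
    """Relative library depth per sample by the median-of-quotients method.

    counts -- list of genes, each a list of raw counts (one per sample)
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue  # geometric mean would be zero; gene is skipped here
        # Log of the geometric mean = mean of the logs ("reference sample").
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median quotient per sample, converted back from log scale.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [50, 100], [0, 3]]
print(size_factors(counts))
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale before testing for differential expression.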
![Page 62: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/62.jpg)
DESeq: an R package that works with raw counts to determine genes differentially expressed across samples
• Simon Anders
![Page 63: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/63.jpg)
![Page 64: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/64.jpg)
![Page 65: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/65.jpg)
![Page 66: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/66.jpg)
Given a list of differentially expressed genes, enrichment analysis should now be performed
• Enrichment analysis lets the researcher draw on documented experiments that provide evidence for genes' roles in pathways and functions, which helps the researcher interpret the results and significance of their own experiments
• DAVID – Gene ontology – Functional ontology
• REVIGO – Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
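The statistical idea behind enrichment analysis can be sketched with a one-sided hypergeometric test: given how many genes carry an annotation overall, how surprising is the overlap with the differentially expressed list? This is a generic over-representation test written for illustration, not a reimplementation of DAVID (which reports a modified Fisher exact score) or REVIGO. The function name and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `population`
    genes, of which `annotated` carry the term of interest."""
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes measured, 200 annotated to a pathway,
# 100 differentially expressed, 8 of those in the pathway.
p = enrichment_p_value(population=20_000, annotated=200, selected=100, overlap=8)
print(p)
```

With one gene in a hundred annotated, about one pathway gene would be expected in a list of 100 by chance, so an overlap of 8 gives a small p-value. When many terms are tested, the p-values also need a multiple-testing correction.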
![Page 67: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/67.jpg)
Using differentially expressed genes, biological pathways should be explored
• Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity
• Shortest path programs and canonical pathway analysis
• Enables a researcher to reverse engineer the pathways expressed in the progression from a healthy response to a diseased response
• Ideally a pathway reveals the observed phenotype – connecting the gene expression program with the phenotype – genotype – gene expression program to phenotype
![Page 68: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/68.jpg)
RNA-Sequencing: What is it good for?
• Transcript Annotation
– Mutation identification
– Isoform determination
– Alternative Splice Variation
• Differential Gene Expression
– Phenotypically segregating experiments
– Allows us to get at the How in looking at the response of an organism within a particular cell population to events
– Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one's chances of surviving
![Page 69: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/69.jpg)
![Page 70: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/70.jpg)
http://bayes.cs.ucla.edu/home.htm
![Page 71: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/71.jpg)
![Page 72: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/72.jpg)
![Page 73: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/73.jpg)
![Page 74: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/74.jpg)
Acknowledgements
Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Marcel Schmidt
Dr. Elena Tassi
The entire Wellstein/Riegel laboratory: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric
My Committee: Dr. Yuri Gusev, Dr. Anatoly Dritschilo, Dr. Michael Johnson, Dr. Christopher Loffredo, Dr. Habtom Ressom, Dr. Terry Ryan (external committee member)
High Performance Core Group, Steve Moore, especially Woonki Chung
Amazon Cloud Services
Dr. Ann Loraine, UNC, IGB Developer
Brian Haas, Author Trinity Suite
![Page 75: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/75.jpg)
Some Resources
• http://rnaseq.uoregon.edu/index.html
• http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
• http://galaxy.psu.edu/
• http://seqanswers.com/
• http://www.broadinstitute.org/igv/
• http://bioviz.org/igb/index.html
• http://www.illumina.com
• http://www.otogenetics.com
• http://www.dnanexus.com
• http://bioconductor.org/packages/2.12/bioc/html/limma.html
• http://trinityrnaseq.sourceforge.net/
• http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
• http://cufflinks.cbcb.umd.edu/
• http://brb.nci.nih.gov/BRB-ArrayTools.html
• http://www.modernatx.com/
![Page 76: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/76.jpg)
Systems Biology History (wikipedia)
• Systems biology roots found in
– Quantitative modeling of enzyme kinetics
– Mathematical modeling of population growth
– Simulations to study neurophysiology
– Control theory and cybernetics
• Theorists
– Ludwig von Bertalanffy: General Systems Theory
– Alan Lloyd Hodgkin and Andrew Fielding Huxley: constructed a mathematical model that explained the action potential propagating along the axon of a neuron
– Denis Noble: first computer model of the heart pacemaker
![Page 77: 2013 july 25 systems biology rna seq v2](https://reader034.vdocuments.us/reader034/viewer/2022052619/55504409b4c905b2788b4bad/html5/thumbnails/77.jpg)
Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement
• Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement”
• Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”