2/3/2015
HMGP 7620: Advanced Genome Analysis
Quantitative RNA Sequencing (RNA-seq) and Exome Analysis
Richard A. Radcliffe, Ph.D.
Professor of Pharmacology
School of Pharmacy, Department of Pharmaceutical Sciences
Room V20-3124
(303) 724-3362
richard.radcliffe@ucdenver.edu
Why RNA-seq?
Crick (1970) Nature 227:561-563
Phenotype
Genetic architecture
Developmental stage
Environmental influences
Tissue type
Disease state
“Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease.”
• Catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs
• Determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications
• Quantify the changing expression levels of each transcript during development and under different conditions.
• Pathway/network/ontology analysis.
Why RNA-seq?
Massively parallel expression analysis
Wang et al. (2009) Nat Rev Genet 10:57-63
RNA-seq Overview
[Figure: RNA-seq workflow schematic]
Adapted from: Pepke et al. (2009) Nat Methods 6:S22-S32
• Select fraction of interest
• Library prep
• Sequence and map to reference genome
• Analysis (QC, quantitation, transcript annotation)
Library Prep
Corney (2013) Mater Methods 3:203
Library Prep: Some Considerations
• RNA fraction
– Many different RNA species
– Poly(A)
– Size (<200 nt vs. >200 nt)
• Strandedness
• Read length
• Single- vs. paired-end
• Multiplexing
RNA Fraction
Mattick & Makunin (2006) Hum Mol Genet 15(Spec No 1):R17-29
Genomes, 2nd Edition, Oxford: Wiley-Liss, 2002
[Figure panels: "Genomic Distribution" and "Total RNA Distribution"; labels: "Transcribed", "Both strands transcribed", ~80%, ~15%]
Library Prep: Some Considerations
• RNA fraction
– Many different RNA species
– Poly(A)
– Size (<200 nt vs. >200 nt)
• Strandedness
– Overlapping transcripts
– Annotation of novel transcripts
• Read length
• Single- vs. paired-end
• Multiplexing
Comment RR34 (slide 7; Richard Radcliffe, 1/26/2015): The area of the box represents the genome. The area of the large green circle is equivalent to the documented extent of transcription, with the darker green area corresponding to transcription on both strands. CDSs are protein-coding sequences, and UTRs are 5′- and 3′-untranslated sequences in mRNAs. The dots indicate (and in fact overstate) the proportion of the genome occupied by known snoRNAs and miRNAs.
Strandedness
[Figure: overlapping genes Ncstn (-) and Copa (+); steps shown: Transcription, library prep, Alignment]
• DS library prep: Which strand (gene) did the fragment come from?
• SS library prep: No question about which strand (gene) the fragment came from.
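The strand ambiguity for overlapping genes can be illustrated with a minimal sketch. The gene coordinates below are hypothetical (they are not the real positions of Ncstn or Copa), and the function name `candidate_genes` is invented for illustration. It assumes the strand of the originating transcript has already been inferred from the library protocol when strand information is available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based inclusive start (hypothetical coordinate)
    end: int     # 1-based inclusive end (hypothetical coordinate)
    strand: str  # "+" or "-"


def candidate_genes(read_start: int, read_end: int,
                    transcript_strand: Optional[str],
                    genes: List[Gene]) -> List[str]:
    """Return the names of genes a read could have come from.

    transcript_strand is None when the library carries no strand
    information; otherwise it is "+" or "-".
    """
    hits = []
    for gene in genes:
        overlaps = read_start <= gene.end and read_end >= gene.start
        strand_ok = transcript_strand is None or transcript_strand == gene.strand
        if overlaps and strand_ok:
            hits.append(gene.name)
    return hits


# Two overlapping genes on opposite strands (made-up coordinates).
GENES = [Gene("Ncstn", 1000, 5000, "-"), Gene("Copa", 4500, 9000, "+")]

# A read that falls inside the overlap region.
unstranded = candidate_genes(4600, 4650, None, GENES)
stranded = candidate_genes(4600, 4650, "-", GENES)
print(unstranded)  # both genes remain candidates
print(stranded)    # only the gene on the matching strand remains
```

Without strand information the read is compatible with both genes; with it, only one candidate is left.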
Read Length
• Read length is related to:
– Sequencing accuracy: quality declines as a function of the length of a read
– Mapping accuracy: the longer the read, the more accurately it maps
Single vs. Paired-end
Zhernakova et al. (2013) PLoS Genet e1003594
Mapping to the Reference Genome
Read (FASTQ):
@HWUSI-EA541_0032:1:2:0:325#0
CCATCTTTTTGATGTCCGCAATGATTT
+
WTORTSOQXTVVYXRXXXVPTXXXWUUL
Alignment:
@HWUSI-EA541_0032:1:2:0:325#0 - chr7 13619194 CCATCTTT…
• Bowtie, BWA
• Computational considerations
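As a small illustration of what the quality line of the read encodes, the sketch below converts the quality characters to Phred scores and error probabilities. It assumes the older Illumina Phred+64 encoding, which is a guess based on the character range; the slide does not state the encoding. The function names are invented for illustration.

```python
from typing import List


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Convert a FASTQ quality string to Phred scores.

    offset=64 is an assumption (older Illumina encoding); current
    Sanger/Illumina files use offset=33.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


quality = "WTORTSOQXTVVYXRXXXVPTXXXWUUL"  # quality line from the slide
scores = phred_scores(quality)
print(scores)
print(min(scores), max(scores))
print(error_probability(20))
```

Under this assumption the scores fall between 12 and 25, which is a plausible range for reads of that era; with an offset of 33 the same characters would give implausibly high scores.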
Mapping to the Genome: Some Considerations
• Non-unique reads
– Gene families
– Repeat sequences (simple repeats, transposons)
• Depth
– Probability of representation & limits of detection
– Transcript isoform quantification
– Variant calling (SNPs, small indels)
• Reference genome effects
Non-unique Reads
[Figure: fraction of reads suppressed (%, axis 0 to 20) and number of alignments (×10^6, axis 0 to 250) plotted against the number of multiple alignment reads allowed (bowtie option -m, 10^0 to 10^5)]
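The effect of an alignment-count cutoff like bowtie's `-m` can be mimicked on a toy list of alignments. This is a sketch of the idea only (suppress every alignment of a read that has more than `m` reportable alignments), not bowtie's implementation; the read IDs, positions, and the function name `suppress_multimappers` are made up.

```python
from collections import Counter
from typing import List, Tuple

Alignment = Tuple[str, str, int]  # (read_id, chromosome, position)


def suppress_multimappers(alignments: List[Alignment],
                          m: int) -> Tuple[List[Alignment], float]:
    """Drop all alignments of reads that align to more than m places.

    Returns the kept alignments and the percentage of reads suppressed.
    """
    counts = Counter(read_id for read_id, _, _ in alignments)
    kept = [a for a in alignments if counts[a[0]] <= m]
    n_suppressed = sum(1 for c in counts.values() if c > m)
    pct = 100.0 * n_suppressed / len(counts) if counts else 0.0
    return kept, pct


# Toy data: r1 and r4 map uniquely, r2 maps twice, r3 maps five times.
ALIGNMENTS = (
    [("r1", "chr1", 100)]
    + [("r2", "chr2", p) for p in (10, 900)]
    + [("r3", "chr5", p) for p in (1, 2, 3, 4, 5)]
    + [("r4", "chr7", 42)]
)

for m in (1, 2, 10):
    kept, pct = suppress_multimappers(ALIGNMENTS, m)
    print(m, len(kept), pct)
```

Raising `m` lowers the fraction of suppressed reads but increases the number of alignments carried forward, which is the trade-off the plot describes.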
Non-unique Reads: Gene Families
Non-unique Reads: Repeats
Mapping to the Genome: Some Considerations
• Non-unique reads
– Gene families
– Repeat sequences (simple, SINEs, LINEs, etc.)
• Depth
– Probability of representation & limits of detection
– Transcript isoform quantification
– Variant calling (SNPs, small indels)
• Reference genome effects
Depth: Transcript Quantification
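The link between depth, probability of representation, and limits of detection can be made concrete with a rough sampling model. The sketch below assumes that the number of reads from a transcript follows a Poisson distribution with mean equal to (its fraction of the library) × (total reads); that model, the function name, and the example numbers are assumptions for illustration, not from the slides.

```python
import math


def prob_detected(fraction: float, total_reads: int, min_reads: int = 1) -> float:
    """Probability of observing at least min_reads reads from a transcript.

    fraction: share of all sequenced fragments that come from the transcript.
    Uses a Poisson approximation with mean fraction * total_reads.
    """
    lam = fraction * total_reads
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below


# A rare transcript contributing 1 in 10 million fragments (made-up value).
for depth in (10_000_000, 30_000_000, 100_000_000):
    print(depth, round(prob_detected(1e-7, depth), 3))
```

Under these assumptions, tripling the depth from 10 to 30 million reads raises the chance of seeing the rare transcript at least once from roughly 0.63 to roughly 0.95; requiring more supporting reads (`min_reads`) pushes the needed depth higher.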
Depth: Variant Calling
Reference Genome Effects
[Figure tracks: RNA-seq, ISS (ISS genome); RNA-seq, ISS (mm10 genome); ILS DNA sequencing; ISS DNA sequencing; gene annotations]
Analysis
• QC
• Assembly/Quantification
– Reads Per Kilobase Exon per Million Mapped Reads (RPKM)
• Differential expression
• Pathway/network functional analysis
• Annotation
– Novel exons
– Novel splice junctions
– Novel genes
Quality Control
• Pre-library construction:
– RNA quality
• Pre-alignment:
– Per base quality
– Per read quality
– Nucleotide distribution per position
– GC content
– Sequence over-representation
• Post-alignment:
– Mean coverage, 5′-3′ and 3′-5′
– Ribosomal RNA contamination
– Percent mapped reads
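A few of the metrics listed above are simple to compute directly. The sketch below shows GC content, nucleotide distribution per position, and percent mapped reads on toy data; the reads, counts, and function names are made up, and real pipelines would normally rely on dedicated QC tools instead.

```python
from collections import Counter
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def nucleotide_distribution(reads: List[str]) -> List[Dict[str, float]]:
    """For each read position, the fraction of reads showing each base."""
    longest = max((len(r) for r in reads), default=0)
    dist = []
    for pos in range(longest):
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts.values())
        dist.append({base: n / total for base, n in counts.items()})
    return dist


def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that aligned to the reference."""
    return 100.0 * mapped_reads / total_reads if total_reads else 0.0


READS = ["ACGT", "ACGA", "TCGT", "ACCT"]  # toy reads
print([gc_content(r) for r in READS])
print(nucleotide_distribution(READS))
print(percent_mapped(9_000_000, 10_000_000))
```

A strongly skewed base composition at particular positions, or a low mapped percentage, would be a cue to look more closely at the library before quantification.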
Quality Control: RNA Degradation
[Figure: 18S and 28S rRNA peaks labeled]
Quality Control
[Figure panels: quality per position; quality per read; nucleotide distribution]
Assembly/Quantification: RPKM
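RPKM = C / (L × N)
where C = number of reads mapped to the gene's exons, L = total exon length in kilobases, and N = total number of mapped reads in millions.
[Value shown on slide: 3.18]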
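As a worked instance of the RPKM definition (reads per kilobase of exon per million mapped reads), the sketch below takes exon length in base pairs and the total mapped read count as a raw number, so the kilobase and million scalings combine into a factor of 10^9. The function name and the example counts are made up for illustration.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads.

    Equivalent to C / (L * N) with L in kilobases and N in millions.
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a gene with 2,000 bp of exon, in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 500 / (2 kb * 10 M)

# Same read count on a gene twice as long gives half the RPKM.
print(rpkm(500, 4_000, 10_000_000))
```

Dividing by exon length and library size is what makes values comparable between genes of different lengths and between samples sequenced to different depths.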
Differential Expression
[Figure: example gene, Hddc3]
Pathway/Network Functional Analysis
Darlington et al. (2013) Genes Brain Behav 12:263-274
Bennett et al. (2015) Alcohol Clin Exp Res NIHMS658870
Weighted Gene Co-expression Network Analysis (WGCNA)
Gene Ontology (GO) Cluster Analysis
Annotation
Exome Sequencing
• Why
– Identification of variants (SNPs, CNVs, small indels)
– Linkage/association/pedigree studies
– Clinical diagnostics
• How
– Isolate, fragment DNA
– Build library
– Exome enrichment
– Sequence
– Align to reference genome
– Variant calling
– Higher order genetic analysis
Exome Enrichment
www.genomics.agilent.com
Variant Calling
Altmann et al. (2012) Hum Genet 131:1541-1554
Comment RR1 (slide 40; Richard Radcliffe, 2/1/2015): Examples of intragenic deletion and duplication detected by WES and confirmed by exome aCGH. Each bar in the graphs (a)–(c) and (e)–(g) represents an exon. (a–c) WES data from a family trio in which the (a) proband has inherited a whole-gene duplication of KRT34 from the (b) father, whereas the (c) mother shows normal copy number at that gene. (e–g) WES data from a family trio in which the (e) proband has inherited a partial-gene heterozygous deletion in the SYCP2L gene from the (g) mother, whereas the (f) father shows normal copy number at those exons. Each dot in panels d and h represents an oligonucleotide probe in the gene of interest on the exome array, with a duplication shown by probes deviating to a positive log2 ratio (marked in red) and a deletion shown by probes deviating to a negative log2 ratio (marked in green). Panels d and h show confirmation of the KRT34 duplication and the SYCP2L deletion, respectively, by exome aCGH. aCGH, array comparative genomic hybridization; WES, whole-exome sequencing.
Variant Calling: CNVs/Indels
Retterer et al. (2014) Genetics Med doi:10.1038/gim.2014
[Figure panels: Child, Father, Mother]
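One common way to look for exon-level copy-number changes in exome data is to compare per-exon read depth in a sample against a reference and inspect the log2 ratio. The sketch below is a simplified illustration of that idea, not the method used in the cited study; the depths, thresholds, pseudocount, and function names are all hypothetical, and the depths are assumed to be already normalized for library size.

```python
import math
from typing import List


def log2_ratios(sample_depth: List[float], reference_depth: List[float],
                pseudocount: float = 0.5) -> List[float]:
    """Per-exon log2(sample / reference); pseudocount avoids division by zero."""
    return [math.log2((s + pseudocount) / (r + pseudocount))
            for s, r in zip(sample_depth, reference_depth)]


def call_exons(ratios: List[float], gain: float = 0.4,
               loss: float = -0.6) -> List[str]:
    """Label each exon using illustrative (not validated) thresholds."""
    calls = []
    for ratio in ratios:
        if ratio >= gain:
            calls.append("duplication")
        elif ratio <= loss:
            calls.append("deletion")
        else:
            calls.append("normal")
    return calls


# Toy per-exon depths: exons 2-3 at about half depth, exon 4 at about 1.5x.
child = [100, 50, 52, 150]
reference = [100, 100, 100, 100]

ratios = log2_ratios(child, reference)
print([round(r, 2) for r in ratios])
print(call_exons(ratios))
```

A heterozygous deletion is expected near log2(0.5) = -1 and a single-copy duplication near log2(1.5) ≈ 0.58, which is why the toy thresholds sit between those values and zero.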
Genetic Analysis: Mendelian Inheritance
Assumptions:
• Only consider small indels and SNPs
• Causal variants are coding
• Causal variants alter protein sequence
• Near complete penetrance
Rabbani et al. (2012) J Hum Genetics 57:621-632
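The assumptions above translate naturally into a variant filter. The sketch below applies them to a toy trio (affected child, unaffected parents) and labels each surviving variant with a simple inheritance pattern. The gene names, consequence labels, genotype coding, and the two inheritance patterns checked are hypothetical choices for illustration, not the workflow of the cited paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Consequence labels treated as protein-altering (hypothetical vocabulary).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    kind: str         # "SNP", "indel", or "CNV"
    consequence: str  # e.g. "missense", "synonymous", "intronic"
    child: int        # number of alternate alleles: 0, 1, or 2
    father: int
    mother: int


def meets_assumptions(v: Variant) -> bool:
    """Keep only small variants that are coding and alter the protein."""
    return v.kind in {"SNP", "indel"} and v.consequence in PROTEIN_ALTERING


def inheritance_pattern(v: Variant) -> str:
    """Patterns consistent with near-complete penetrance and unaffected parents."""
    if v.child == 1 and v.father == 0 and v.mother == 0:
        return "de novo"
    if v.child == 2 and v.father == 1 and v.mother == 1:
        return "homozygous recessive"
    return "none"


def candidates(variants: List[Variant]) -> List[Tuple[str, str]]:
    return [(v.gene, inheritance_pattern(v)) for v in variants
            if meets_assumptions(v) and inheritance_pattern(v) != "none"]


VARIANTS = [
    Variant("GENE_A", "SNP", "missense", 1, 0, 0),      # kept: de novo
    Variant("GENE_B", "indel", "frameshift", 2, 1, 1),  # kept: recessive
    Variant("GENE_C", "SNP", "synonymous", 1, 0, 0),    # dropped: protein unchanged
    Variant("GENE_D", "SNP", "missense", 1, 1, 0),      # dropped: carried by unaffected parent
    Variant("GENE_E", "CNV", "frameshift", 1, 0, 0),    # dropped: not a SNP or small indel
]
print(candidates(VARIANTS))
```

Each assumption removes a class of variants, so relaxing any of them (for example allowing non-coding variants or incomplete penetrance) enlarges the candidate list.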
Genetic Analysis
Ku et al. (2012) Ann Neurol 71:5-14
A Few References
RNA-seq:
• Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJ, Tai IT, Marra MA (2010) Alternative expression analysis by RNA sequencing. Nat Methods 7:843-847.
• Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621-628.
• Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, Svenson KL, Keller MP, Attie AD, Hibbs MA, Graber JH, Chesler EJ, Churchill GA (2014) RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 198:59-73.
• Oshlack A, Robinson MD, Young MD (2010) From RNA-seq reads to differential expression results. Genome Biol 11:220.
• Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57-63.
Exome sequencing:
• Altmann A, Weber P, Bader D, Preuß M, Binder E, Müller-Myhsok B (2012) A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet 131:1541-1554.
• Biesecker LG, Green RC (2014) Diagnostic clinical genome and exome sequencing. N Engl J Med 370:2418-2425.
• Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, Quinlan AR, Nickerson DA, Eichler EE (2012) Copy number variation detection and genotyping from exome sequence data. Genome Res 22:1525-1532.
• Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N (2011) What can exome sequencing do for you? J Med Genet 48:580-589.
• Singleton AB (2011) Exome sequencing: a transformative technology. Lancet Neurol 10:942-946.