rnaseq short intro - göteborgs universitetbio.lundberg.gu.se/courses/vt13/rnaseq_intro.pdf ·...
TRANSCRIPT
RNA-Seq practical
Basic processing: UNIX tools and IGV
Erik Larsson
RNA-Seq practical
• TopHat
  – Alignment
• IGV
  – Visualization
• Cufflinks
  – Gene discovery
  – Find differentially expressed genes
<3% coding sequence
~40% coding genes
[Figure: a long stretch of raw genomic DNA sequence, omitted here]
~60% transcribed
The human transcriptome (according to GENCODE v11)
• 19,999 Protein-coding
• 12,534 Pseudogene
• 10,419 LncRNA
• 1,944 SnRNA
• 1,756 MicroRNA
• 1,521 SnoRNA
• 1,190 Misc. RNA
Shahrouki, Larsson, Frontiers in Genetics 2012
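To put these counts in proportion, a small Python sketch can compute each class's share of the annotated genes. The numbers are copied from the slide; the variable and function names are only illustrative.

```python
# Gene counts per class, as listed on the slide (GENCODE v11).
gencode_v11_counts = {
    "Protein-coding": 19999,
    "Pseudogene": 12534,
    "LncRNA": 10419,
    "SnRNA": 1944,
    "MicroRNA": 1756,
    "SnoRNA": 1521,
    "Misc. RNA": 1190,
}

def class_share(counts):
    """Return each class's fraction of the total count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

total_genes = sum(gencode_v11_counts.values())
shares = class_share(gencode_v11_counts)

# Print the classes from largest to smallest share.
for name, frac in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {gencode_v11_counts[name]:6d}  {frac:6.1%}")
print(f"{'Total':15s} {total_genes:6d}")
```

With these numbers, protein-coding genes make up roughly 40% of the listed entries, so more than half of the annotated genes are non-coding or pseudogenes.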
RNA-seq, RNA sequencing, transcriptome sequencing, total RNA-seq, mRNA-seq, miRNA-seq…
• Many names, sometimes meaning the same thing
• All about characterizing RNA with next-generation sequencing (NGS) in one way or another
Microarrays vs. RNA-seq
Microarrays:
• Simultaneously quantify most known genes
RNA-seq:
• Simultaneously quantify all known genes at high accuracy
• Identify new genes
• Study splicing patterns
• Discover mutations
• Fusion transcripts
• Find viruses
• Allele-specific expression
• …
New toys
• Applied Biosystems 3730 (2002): 50,000-100,000 bp per run
• Illumina HiSeq 2000 (2010): ~200,000,000,000 bp per run
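The size of that jump is easy to work out from the two per-run figures. A minimal sketch, using only the numbers quoted on the slide (the names are illustrative):

```python
# Per-run yields quoted on the slide, in base pairs.
abi_3730_bp = (50_000, 100_000)      # Applied Biosystems 3730 (2002), low and high end
hiseq_2000_bp = 200_000_000_000      # Illumina HiSeq 2000 (2010), approximate

def fold_increase(new_bp, old_bp):
    """How many times more bases the newer instrument yields per run."""
    return new_bp / old_bp

low = fold_increase(hiseq_2000_bp, abi_3730_bp[1])   # compared with a 100,000 bp run
high = fold_increase(hiseq_2000_bp, abi_3730_bp[0])  # compared with a 50,000 bp run
print(f"{low:,.0f}x to {high:,.0f}x more bases per run")
```

That is a two- to four-million-fold increase in bases per run in eight years.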
NGS principle (Illumina/Solexa)
• Add labeled nucleotides, primers, polymerase
• Take picture to figure out first base in each cluster
• Remove terminators and repeat everything many times
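The cycle of adding one labeled base, imaging, and unblocking can be mimicked with a toy Python model. This is only a sketch of the principle: the template strings are hypothetical, and strand orientation, chemistry and base-calling errors are all ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(templates, n_cycles):
    """Toy model: exactly one base is read from every cluster per cycle."""
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # this cluster's template is exhausted
            # Add labeled nucleotides: only the complement of the template base pairs,
            # and its terminator blocks any further extension in this cycle.
            incorporated = COMPLEMENT[template[cycle]]
            # "Take picture": the label reveals which base was added in this cluster.
            reads[i] += incorporated
            # Removing the terminator is what allows the next cycle to add one more base.
    return reads

clusters = ["ACGTAC", "TTGCA", "GGATCC"]  # hypothetical template strands
print(sequence_by_synthesis(clusters, n_cycles=4))
```

The point of the model is that all clusters advance in lockstep, so the read length equals the number of cycles.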
Standard RNA-seq workflow (polyA+)
• Isolate polyA+
• Fragmentation
• Add random primers
• cDNA synthesis (first and second strand)
• Ligate adapters
• Sequencing
Source: Illumina
Directional/strand-specific RNA-seq: dUTP method
Levin et al, Nature Methods 2010
[Figure labels: RNA; dsDNA with one strand marked by incorporated U; Adapters; UNG treatment]
RNA-seq data analysis
• Alignment
• Gene discovery
• Expression quantification
• Testing for differential expression
• Variant discovery
Pairwise alignment
• Figure out where one sequence belongs within another sequence
• Would be trivial if not for substitutions, insertions and deletions
Genome: TGCGTACGCTCGATAGCTCGCATCGCTAGCCTCGCATAGCTAGCGATCGT
                         ||||||||||||||||||| |||||||
RNA:                     TCGCATCGCTAGCCTCGCAGAGCTAGC
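The placement shown on the slide can be reproduced with a brute-force search: slide the RNA sequence along the genome and keep the offset with the fewest mismatches. The sketch below handles substitutions only (no insertions or deletions), and the function name is illustrative; real aligners use indexed data structures rather than scanning every offset.

```python
def best_ungapped_hit(genome, read):
    """Slide the read along the genome; return (offset, mismatches) of the best placement."""
    best = None
    for offset in range(len(genome) - len(read) + 1):
        window = genome[offset:offset + len(read)]
        mismatches = sum(1 for g, r in zip(window, read) if g != r)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best

# Sequences taken from the slide.
genome = "TGCGTACGCTCGATAGCTCGCATCGCTAGCCTCGCATAGCTAGCGATCGT"
rna = "TCGCATCGCTAGCCTCGCAGAGCTAGC"

offset, mismatches = best_ungapped_hit(genome, rna)

# Draw the alignment: "|" marks a match, a space marks a mismatch.
match_line = "".join("|" if g == r else " " for g, r in zip(genome[offset:], rna))
print(genome)
print(" " * offset + match_line)
print(" " * offset + rna)
print(f"offset={offset}, mismatches={mismatches}")
```

For these two sequences the best placement is at offset 17 with a single substitution (T in the genome, G in the RNA).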
Aligning RNA-seq reads
• Why? Figure out where they were transcribed from
• Required prior to most analyses
Two main options:
• Align to transcriptome
  – Fast, simple
  – Avoids problems with “spliced”/junction-spanning reads
• Align to genome
  – Requires specialized RNA-seq aligner (can handle junction-spanning reads)
Gapped alignments
• Aligners for RNA-seq will need to handle gapped alignments
• Junction-spanning reads will otherwise be lost
[Figure: three tracks labeled Genome, Spliced mRNA (with poly-A tail) and NGS reads]
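Why junction-spanning reads get lost can be shown with a toy example. In the Python sketch below, all sequences are made up, and the split-point search is a deliberately naive stand-in for a gapped aligner: it requires exact matches and does not check splice-site motifs, so it should not be read as how any particular tool works.

```python
def ungapped_find(reference, read):
    """Exact, ungapped placement of the read; -1 if there is none."""
    return reference.find(read)

def spliced_find(genome, read, min_anchor=4):
    """Try every split point and place both parts exactly, with a gap between them.

    Returns (left_start, intron_start, intron_end) or None. intron_end is exclusive.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        while left_start != -1:
            intron_start = left_start + len(left)
            # The right part must start at least one base after the left part ends.
            right_start = genome.find(right, intron_start + 1)
            if right_start != -1:
                return left_start, intron_start, right_start
            left_start = genome.find(left, left_start + 1)
    return None

# Hypothetical gene: two exons separated by one intron.
exon1 = "ATGGCCTTAGCA"
intron = "GTAAGTCCCTTTCTCTCAG"
exon2 = "CGATCCTTGACA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2

# A read covering the last 6 bases of exon 1 and the first 6 bases of exon 2.
read = exon1[-6:] + exon2[:6]

print("ungapped vs genome:", ungapped_find(genome, read))  # no placement
print("ungapped vs mRNA:  ", ungapped_find(mrna, read))
print("gapped vs genome:  ", spliced_find(genome, read))
```

The read has no ungapped placement in the genome but aligns cleanly once a gap the size of the intron is allowed, which is the case a splice-aware aligner has to handle.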
Splice-junction aware aligners
• TopHat
  – Popular option, big online user community
  – Finds new junctions but can be guided by known annotation
  – Cuts up reads into smaller pieces and calls the Bowtie short-read aligner
• SOAPsplice
• SpliceMap
• …
TopHat output visualized using IGV (human ACTB locus)
RNA-seq data analysis
• Alignment
• Gene discovery
• Expression quantification
• Testing for differential expression
• Variant discovery
Transcriptome assembly/gene discovery
• Task:
  – Use aligned reads to discover genes and figure out transcript structures
• Tools:
  – Cufflinks
    • Most popular choice
    • Lots of online support, actively developed
  – Scripture
  – Trans-ABySS
Cufflinks discovers new transcripts/genes from aligned reads
• Aligned reads
• Discovered transcript isoforms
• Abundance estimates
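Cufflinks reports abundance as FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows only that normalization, with made-up counts and lengths; Cufflinks itself assigns fragments to isoforms with a statistical model, which is not reproduced here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Hypothetical example: two isoforms in a library of 20 million mapped fragments.
library_size = 20_000_000
isoforms = {
    "isoform_A": {"fragments": 4_000, "length_bp": 2_000},
    "isoform_B": {"fragments": 4_000, "length_bp": 500},
}

for name, d in isoforms.items():
    value = fpkm(d["fragments"], d["length_bp"], library_size)
    print(f"{name}: {value:.1f} FPKM")
```

Both isoforms collect the same number of fragments, but the shorter one gets a four times higher FPKM (400 versus 100), because the same count comes from a quarter of the sequence length.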
RNA-seq data analysis
• Alignment
• Gene discovery
• Expression quantification
• Testing for differential expression
• Variant discovery
Testing for differential expression
• Normal t-test not optimal
  – RNA-seq is “digital” rather than continuous
• Negative binomial distribution is better
  – edgeR, DESeq
    • Run in the R environment
  – Cuffdiff (Cufflinks package)
    • + Easy: use alignments without prior quantification
    • + Can test for differential splicing
    • - Very conservative
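One way to see why the negative binomial suits count data is to compare the variance of replicate counts with their mean. Under a Poisson model the two would be equal; the negative binomial adds a dispersion term, so that variance = mean + dispersion × mean². The Python sketch below uses hypothetical counts and a simple method-of-moments estimate. It is an illustration only: edgeR and DESeq estimate dispersion by sharing information across genes, which is not shown here.

```python
from statistics import mean, variance

def nb_variance(mu, dispersion):
    """Negative binomial variance: Poisson noise plus an overdispersion term."""
    return mu + dispersion * mu ** 2

def moment_dispersion(counts):
    """Method-of-moments dispersion estimate; 0 means no sign of overdispersion."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 in the denominator)
    return max(0.0, (var - mu) / mu ** 2)

replicate_counts = [80, 120, 95, 150, 55]  # hypothetical read counts for one gene
mu = mean(replicate_counts)
sample_var = variance(replicate_counts)
alpha = moment_dispersion(replicate_counts)

print(f"mean = {mu}, sample variance = {sample_var}")
print(f"Poisson would predict variance = {mu}")
print(f"estimated dispersion = {alpha:.4f}")
print(f"negative binomial variance = {nb_variance(mu, alpha):.1f}")
```

Here the sample variance is more than ten times the mean, so a Poisson model would badly understate the noise between replicates.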
http://bio.lundberg.gu.se/courses/vt13/rnaseq.html
Read the intro carefully!
Good luck!