Transcriptomics 101
Expression Genomics Laboratory, http://www.expressiongenomics.org
Nicole
Winter School, 7th July 2009
I want to know the maths
I want the best software package
1. Align tags to the genome
2. Measure gene expression
3. Find mutations
4. Find novel expression
5. Assemble transcripts
6. Win Nobel Prize
[Figure: bar chart, June 2009]
Presentation Outline
Introduction
• What is a transcriptome?
• What can we learn from studying it?
Transcriptomics
• Genomic tools for transcriptomics.
• Deriving biological insight from transcriptomics.
Sequencing the transcriptome
• What's old is new again.
• Double stranded protocols.
• Strand specific protocols.
Working with RNA data
• Mapping and quantitation.
• Genomic context of gene expression.
• SNPs, exon-junctions, novel genes.
Working with miRNA data
• The problem of limited information content.
• Known and novel expression.
• IsomiRs.
[Diagram: gene structure. Legend: TSS, transcription start site; pA, polyadenylation signal; AUG, translation start site; AAA, polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron.]
Single transcript gene: one gene, one mRNA, one protein. All exons are used to make the full-length protein.
Alternative splicing: one gene, many mRNAs, many proteins.
[Diagram: alternatively spliced transcripts of one gene]
• Intron retention: new STOP codon, truncated protein, altered function.
• Exon skipping: changed domain content, altered function.
• Exon skipping: new STOP codon, truncated protein, altered function.
Alternative promoters: expand coding output and gene control.
[Diagram: transcripts from alternative transcription start sites]
• Alt TSS: differential control of the gene (tissue specific or temporally specific), altered 5' UTR content.
• Alt TSS: altered 5' UTR content, new ATG codon, expanded protein, altered function.
Alternative 3' exons: can change ORF and 3' UTR content.
[Diagram: transcripts with alternative 3' ends]
• Alternative 3' exon: different 3' UTR content, can change the ORF.
• Alternative pA: different 3' UTR content.
Transcriptional complexity
[Diagram: one locus with multiple TSSs, alternative ATGs and polyadenylation signals, overlapping transcripts, and small RNAs: PASR, TASR, miRNA and tiRNA]
Microarrays
[Diagram: sample to study → extract RNA → label RNA → hybridize to a prepared microarray (short probes) → scan]
Microarray based profiling
[Figure: 13.5 dpc male vs female gonad; axes: male gene expression, female gene expression; marker genes Wnt4, Sox9 and Amh highlighted]
Gene expression predates morphology.
Microarray based profiling
Gene expression patterns correlate strongly with prognosis.
Molecular profiling of human cancer. Nature Reviews Genetics 1: 48-56 (2000).
Limitations of microarrays
• Limited sensitivity
• Limited dynamic range
• Cross-hybridization
• Detection limited by probe design
Using arrays to survey transcriptional complexity
[Diagram: the transcriptional complexity locus as surveyed by a standard microarray, exon arrays and exon-junction arrays]
[Diagram: the same locus as surveyed by tag-based methods: 3' SAGE, MPSS, di-tag/mate-pair and 5' SAGE]
RNA sequencing
[Diagram: the same locus as surveyed by shotgun sequencing]
SQRL protocol
Step 1: pre-process RNA.
Step 2: 1st strand cDNA synthesis.
Step 3: template switch.
Step 4: PCR amplification.
[Diagram: random (NNNNNN) primer carrying the FDV adaptor; CCC at the end of the cDNA pairing with an rGGG oligo carrying the RDV adaptor; PCR with RDV and FDV]
The SQRL protocol generates antisense short-tags.
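Because SQRL tags are antisense, a tag that aligns to the minus strand of the genome reports a transcript on the plus strand. A minimal sketch, assuming plain DNA tags (not colour space); `to_transcript_sense` is a hypothetical helper name, not part of any pipeline described here:

```python
# Complement table for DNA bases; N (unknown) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_transcript_sense(tag: str, protocol_is_antisense: bool) -> str:
    """Hypothetical helper: orient a short-tag in the sense of its transcript.

    Antisense protocols (such as SQRL) yield tags complementary to the RNA,
    so they are reverse-complemented; sense protocols (such as LEGenD) are
    returned unchanged.
    """
    return reverse_complement(tag) if protocol_is_antisense else tag


if __name__ == "__main__":
    print(to_transcript_sense("AAAGCTT", protocol_is_antisense=True))
```

For an unstranded protocol neither orientation can be assumed, which is why the strand of origin is lost.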
LEGenD protocol
Step 1: pre-process RNA.
Step 2: adaptor ligation.
Step 3: 1st strand cDNA synthesis.
Step 4: PCR amplification.
[Diagram: RDV and FDV adaptors (with NN ends) carried through ligation, 1st strand cDNA synthesis and PCR]
The LEGenD protocol generates sense short-tags.
Most common RNAseq protocols
Step 1: pre-process RNA.
Step 2: 1st and 2nd strand cDNA synthesis.
Step 3: adaptor ligation.
Step 4: PCR amplification.
[Diagram: double-stranded cDNA from random (NNNNNN) priming, ligated to RDV and FDV adaptors in both orientations]
The RNAseq protocol generates unstranded short-tags.
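One way to check which behaviour a library actually shows is to count how many tags land on the same strand as their annotated gene: close to 100% for a sense library (LEGenD-like), close to 0% for an antisense one (SQRL-like), and around 50% for an unstranded one. A minimal sketch, assuming tags have already been assigned to genes and are given as hypothetical (tag_strand, gene_strand) pairs; the 0.9 and 0.6 cut-offs are illustrative, not values from the talk:

```python
from typing import Iterable, Tuple


def fraction_same_strand(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of tags whose mapped strand matches the strand of their gene."""
    total = 0
    same = 0
    for tag_strand, gene_strand in pairs:
        total += 1
        if tag_strand == gene_strand:
            same += 1
    if total == 0:
        raise ValueError("no tags supplied")
    return same / total


def classify_library(frac_same: float, stranded_cutoff: float = 0.9,
                     unstranded_band: float = 0.6) -> str:
    """Label a library from its same-strand fraction (illustrative cut-offs)."""
    if frac_same >= stranded_cutoff:
        return "sense"        # e.g. expected for a LEGenD-style library
    if frac_same <= 1 - stranded_cutoff:
        return "antisense"    # e.g. expected for a SQRL-style library
    if 1 - unstranded_band <= frac_same <= unstranded_band:
        return "unstranded"   # e.g. double-stranded cDNA protocols
    return "ambiguous"
```

Intermediate values ("ambiguous") usually point to a partially failed strand-specific library or to genuine antisense transcription, and deserve a closer look.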
Aligning tags to a reference genome
[Diagram: gene structure]
The fastest alignment methods are ungapped… but what about junctions?
Random fragmentation of RNA libraries
[Figure, built up over three slides: histograms of frequency against length of captured RNA (1-86 nt), with the short-tag length marked, and a schematic of the captured RNA and adaptor]
Allowing for errors when mapping
[Diagram: sources of differences between a tag and the reference DNA]
• Base changes in the RNA sample: polymorphisms, allele specific expression, RNA editing
• Amplification errors, measurement errors and mapping errors
What is the minimum alignment length I should use for my genome?
How many errors should I allow at the mapping length used?
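A back-of-the-envelope way to approach both questions is to estimate how often a random tag of length L would match a genome of size G by chance with up to k mismatches. Assuming independent, equiprobable bases (real genomes are repetitive, so this is a lower bound on spurious hits) and searching s strands, the expected count is E = s · G · Σ_{i=0..k} C(L, i) · 3^i / 4^L. A sketch; the 3.1 Gb genome size in the usage example is an assumed round figure for human:

```python
from math import comb


def expected_random_hits(tag_length: int, mismatches: int, genome_size: float,
                         both_strands: bool = True) -> float:
    """Expected chance matches of a random tag to a random genome.

    E = strands * G * sum_{i=0..k} C(L, i) * 3**i / 4**L
    Assumes independent, equiprobable bases, so real (repetitive) genomes
    will give more spurious hits than this estimate.
    """
    neighbourhood = sum(comb(tag_length, i) * 3 ** i
                        for i in range(mismatches + 1))
    strands = 2 if both_strands else 1
    return strands * genome_size * neighbourhood / 4 ** tag_length


if __name__ == "__main__":
    human = 3.1e9  # assumed approximate haploid human genome size
    for length in (17, 20, 25, 30, 35):
        for k in (0, 1, 2, 3):
            hits = expected_random_hits(length, k, human)
            print(f"L={length:2d} k={k}: {hits:.3g}")
```

For exact matches each extra base cuts the estimate fourfold, while each extra allowed mismatch multiplies the neighbourhood size, so shorter tags can tolerate fewer mismatches before chance hits dominate.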
Unique- vs multi-mapping tags
Unique ≠ accurate
[Figure: percentage of tags (0-100%) for each mapping setting, labelled 35.0.0, 35.1.0, 35.1.1, 35.2.0, 35.2.1, 35.3.0, 30.0.0, 30.1.0, 30.1.1, 30.2.0, 30.2.1, 30.3.0, 25.0.0, 25.1.0, 25.1.1, 25.2.0, 25.2.1 and 25.3.0; series: % sim, % mum 5, % mum 10, % sims in known exons]
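Unique ≠ accurate
Chr A: tagcgggatctctcgagagctcgcgat
Chr B: tagcgggatctctcgacagctcgcgat
Tag tctctcgacagct: 1 mismatch (MM) against Chr A, 0 MM against Chr B.
Tag tctctcgagagct: 0 MM against Chr A, 1 MM against Chr B.
The two chromosomes differ at a single base, so one sequencing error in a tag from Chr A can produce a perfect, unique, and wrong match to Chr B.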
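The two-chromosome example can be checked with a brute-force ungapped mismatch count. This sketch reuses the sequences from the slide; `min_mismatches` is a hypothetical helper, not part of any mapping tool:

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(tag: str, reference: str) -> int:
    """Fewest mismatches of an ungapped alignment of tag anywhere in reference."""
    n = len(tag)
    return min(hamming(tag, reference[i:i + n])
               for i in range(len(reference) - n + 1))


CHR_A = "tagcgggatctctcgagagctcgcgat"
CHR_B = "tagcgggatctctcgacagctcgcgat"

if __name__ == "__main__":
    # A tag truly from Chr A that picked up one sequencing error (g -> c) ...
    tag = "tctctcgacagct"
    # ... has its only perfect hit on Chr B: unique, but wrong.
    print(min_mismatches(tag, CHR_A), min_mismatches(tag, CHR_B))
```

A mapper that keeps only the best-scoring hit would report this tag as uniquely mapped to Chr B.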
Unique ≠ accurate
[Figure: the same mapping-settings chart]
Ideal: match at the longest possible length.
RNA-MATE v1.1 (http://www.expressiongenomics.org/RNA-MATE/)
[Flowchart: Start → 1. Read configuration file → check quality? (Yes: 2. Quality check) → 3. Genome/junction alignment → tag aligned? (No: trim tag and realign) → select single mapping tags → rescue multimappers? (Yes: multimapping tag rescue) → 4. Create wiggle plot files and junction BED files → End]
RNA-MATE v1.1
• perl/python coded
• unix command line (trialling web interface)
• currently set up for a PBS managed cluster
• GNU General Public License v3.0
• junction libraries available
RNA-MATE v1.1: Configuration File
RNA-MATE v1.1
tag_length=35,30
num_mismatch=3
mask=11111111111111111111111111111111111
max_multimatch=10
expect_strand=+
rescue_window=10
exp_name=tag_20000_F3
chromosomes=chrM,chr2
chr_path=/data/matching/hg18_fasta/
junction=/data/libraries/hg18_junctions.fasta.cat
junction_index=/data/libraries/hg18_junctions.fasta.index
output_root=/data/cxu/
output_dir=/data/cxu/tag_20000_F3/
raw_qual=/data/raw/tag20000.qual
raw_csfasta=/data/raw/tag20000.csfasta
quality_check=true
script_chr_start=/data/matching/chr_start.pl
script_chr_wig=/data/matching/chr_wig.pl
f2m=/data/matching/f2m.pl
mapreads=/data/matching/mapreads
master_script=/data/matching/rna-mate-v1.0.pl
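The configuration file is a flat list of key=value lines, with comma-separated values for keys such as tag_length and chromosomes. A minimal Python sketch of reading it; this is an illustration, not RNA-MATE's own parser, and which keys are treated as lists, integers or booleans is an assumption based on the example above:

```python
from typing import Dict, List, Union

Value = Union[str, int, bool, List[str], List[int]]

# Assumed value types for a few keys seen in the example configuration.
LIST_KEYS = {"tag_length", "chromosomes"}
INT_KEYS = {"num_mismatch", "max_multimatch", "rescue_window"}
BOOL_KEYS = {"quality_check"}


def parse_config(text: str) -> Dict[str, Value]:
    """Parse key=value lines, skipping blank lines and lines starting with '#'."""
    config: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, raw = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        key, raw = key.strip(), raw.strip()
        if key in LIST_KEYS:
            items = [item.strip() for item in raw.split(",") if item.strip()]
            config[key] = [int(i) for i in items] if key == "tag_length" else items
        elif key in INT_KEYS:
            config[key] = int(raw)
        elif key in BOOL_KEYS:
            config[key] = raw.lower() == "true"
        else:
            config[key] = raw  # paths, names and strand symbols stay as text
    return config


if __name__ == "__main__":
    example = "tag_length=35,30\nnum_mismatch=3\nexpect_strand=+\nquality_check=true"
    print(parse_config(example))
```

Note that tag_length lists the lengths tried in turn (35, then 30), which is what drives the recursive trimming described next.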
Quality Check (optional)
RNA-MATE v1.1
[Figure: QV by basecall for 25mers, 30mers and 35mers]
A tag passes if it has < 5 basecalls where QV < 10; otherwise it fails.
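The pass/fail rule (fewer than 5 basecalls with QV < 10) is easy to express directly. A sketch, assuming quality values are already parsed into a list of integers per tag (for example from the .qual file named in the configuration) and, as an assumption, that the rule is applied over the tag length being mapped:

```python
from typing import Optional, Sequence


def passes_quality(qvs: Sequence[int], min_qv: int = 10, max_low: int = 5,
                   length: Optional[int] = None) -> bool:
    """True if fewer than `max_low` basecalls have a QV below `min_qv`.

    When `length` is given (e.g. 25, 30 or 35), only the first `length`
    basecalls are checked, matching the tag length being mapped.
    """
    window = qvs if length is None else qvs[:length]
    low = sum(1 for qv in window if qv < min_qv)
    return low < max_low


if __name__ == "__main__":
    tag_qvs = [28, 30, 7, 25, 9, 31, 4] + [30] * 28  # three low-quality calls
    print(passes_quality(tag_qvs))
```

Because low-quality calls tend to accumulate towards the 3' end, a tag can fail at 35 bases yet pass when only its first 25 are considered.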
Genome Alignment
RNA-MATE v1.1: recursive mapping strategy
[Diagram: tags are aligned to the genome and the junction library; unmatched tags are trimmed to a smaller size and realigned; matched tags are collected as matched data, and tags that never match go to a discovery bin]
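The recursive strategy can be sketched as a loop over decreasing tag lengths: try the full-length tag against the genome and junction library, and if it does not match, trim it and try again; tags that never match go to the discovery bin. In this sketch the genome and junction sequences share one dictionary, the index is a plain exact-match k-mer table (a real pipeline uses a mismatch-tolerant aligner), trimming removes bases from the 3' end, and all function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[str, int]  # (reference name, 0-based position)


def build_index(references: Dict[str, str], k: int) -> Dict[str, List[Hit]]:
    """Exact-match index of every k-mer in every reference sequence."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for name, seq in references.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def recursive_map(tags: Sequence[str], references: Dict[str, str],
                  lengths: Sequence[int] = (35, 30, 25)
                  ) -> Tuple[Dict[str, Tuple[int, List[Hit]]], List[str]]:
    """Map each tag at the longest length that matches.

    Returns (matched, discovery_bin), where matched maps a tag to the length
    it matched at and its hits.
    """
    indexes = {k: build_index(references, k) for k in lengths}
    matched: Dict[str, Tuple[int, List[Hit]]] = {}
    discovery_bin: List[str] = []
    for tag in tags:
        for k in sorted(lengths, reverse=True):
            if len(tag) < k:
                continue
            hits = indexes[k].get(tag[:k], [])  # keep 5' k bases: trim the 3' end
            if hits:
                matched[tag] = (k, hits)
                break
        else:  # no length matched
            discovery_bin.append(tag)
    return matched, discovery_bin
```

Trying the longest length first follows the "match at the longest possible length" principle: shorter matches are only accepted when longer ones fail.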
Exon-junction libraries
[Diagram: the transcriptional complexity locus with its alternative TSSs, ATGs and polyadenylation signals]
Multimapping Rescue (optional)
RNA-MATE v1.1
• Advantages:
  • can add 5-20% more data
  • can interrogate genomic regions previously hidden (genomic “black holes”)
• Disadvantages:
  • memory hungry
  • can slow down analysis
Multimapping Rescue (optional)
[Diagram: a multi-mapping region shared by Locus A (predicted), Locus B and Locus C, shown with exons, genomic DNA, and positive- and negative-strand expression, and a user defined window width around each locus]
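One common way to rescue a multi-mapping tag (assumed here; the slide does not spell out the rule) is to share it between its candidate loci in proportion to the unique-tag coverage found within a user defined window around each locus, so that a locus with no unique expression receives nothing. The window presumably corresponds to rescue_window in the configuration. A minimal sketch with hypothetical names, where `unique_coverage` maps a (chromosome, strand) pair to per-base unique-tag counts:

```python
from typing import Dict, List, Sequence, Tuple

Locus = Tuple[str, str, int]  # (chromosome, strand, 0-based start)


def rescue_weights(loci: Sequence[Locus],
                   unique_coverage: Dict[Tuple[str, str], Sequence[int]],
                   tag_length: int, window: int) -> List[float]:
    """Fraction of one multi-mapping tag to assign to each candidate locus."""
    support = []
    for chrom, strand, start in loci:
        cov = unique_coverage.get((chrom, strand), [])
        lo = max(0, start - window)
        hi = min(len(cov), start + tag_length + window)
        support.append(sum(cov[lo:hi]))  # unique evidence near this locus
    total = sum(support)
    if total == 0:
        return [0.0] * len(loci)  # no unique evidence: leave the tag unassigned
    return [s / total for s in support]
```

Holding per-base coverage for every chromosome and strand is what makes this step memory hungry.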
BED and bedGraphs
RNA-MATE v1.1 outputs:
• strand specific bedGraphs (wiggle plots)
• strand specific start site bedGraphs (for tag counting applications)
• “expected strand” junction BED file (for visualization)
• “unexpected strand” junction BED file (for assessing library directionality)
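A bedGraph file is a plain-text track of `chrom start end value` lines with 0-based, half-open coordinates, so per-base coverage can be written compactly by merging runs of equal value. A sketch that writes one strand of coverage; producing separate plus- and minus-strand files is left to the caller, and the function name is hypothetical:

```python
from typing import Iterator, Sequence


def coverage_to_bedgraph(chrom: str, coverage: Sequence[int],
                         skip_zero: bool = True) -> Iterator[str]:
    """Yield bedGraph lines (0-based, half-open) for runs of equal coverage."""
    start = 0
    n = len(coverage)
    while start < n:
        value = coverage[start]
        end = start + 1
        while end < n and coverage[end] == value:
            end += 1
        if value != 0 or not skip_zero:
            yield f"{chrom}\t{start}\t{end}\t{value}"
        start = end


if __name__ == "__main__":
    for line in coverage_to_bedgraph("chr2", [0, 0, 2, 2, 2, 5, 0, 1]):
        print(line)
```

For display in the UCSC Genome Browser, a `track type=bedGraph` line normally precedes the data lines.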
Genomic context of expression
[Figure: the GRB7 locus showing a single nucleotide resolution coverage plot, exon-exon junction usage and the known gene structure (exons and introns), with examples of alternative splicing and of novel exons or novel transcripts]
Future Versions
RNA-MATE v1.1
• Web browser interface
• Integration of a SNP analysis pipeline for the transcriptome
• Allow the integration of other mapping algorithms
• Allow the integration of other exon-junction identification strategies
Novel exon-junction discovery (systematic)
[Diagram: gene structure with alternative TSSs, ATGs and polyadenylation signals]
Pros: Computationally easy
Cons: Does not find all novel splicing
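A systematic junction library can be built by joining the end of each known exon to the start of every downstream exon of the same gene, keeping just enough flanking sequence that a tag must cross the junction. A sketch under those assumptions (exons as 0-based, half-open intervals on the plus strand, flank = tag length minus one, hypothetical names). Because only known exon boundaries are combined, junctions using novel splice sites are missed, which is the limitation listed above:

```python
from itertools import combinations
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end) on the plus strand


def junction_library(genome: str, exons: List[Exon], tag_length: int,
                     gene: str = "gene") -> Dict[str, str]:
    """Every ordered exon-exon junction sequence for one gene.

    Each entry joins the last (tag_length - 1) bases of an upstream exon to
    the first (tag_length - 1) bases of a downstream exon, so a full-length
    tag taken from the entry must cross the junction. Exons shorter than the
    flank are used whole (a simplification).
    """
    flank = tag_length - 1
    library: Dict[str, str] = {}
    for (s1, e1), (s2, e2) in combinations(sorted(exons), 2):
        left = genome[max(s1, e1 - flank):e1]
        right = genome[s2:min(e2, s2 + flank)]
        library[f"{gene}:{e1}-{s2}"] = left + right
    return library
```

The number of entries grows quadratically with exon count, which stays tractable per gene and keeps the approach computationally easy.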
Novel exon-junction discovery (de novo)
[Figure: aligned reads stacked to build the consensus read ACGATATTACACGTACAGTCAAGTCGTTCGGAACCT]
Steps: non-matching tags; create consensus read; remove adaptor sequence; BLAT against genome.
Pros: De novo
Cons: Requires high coverage
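The consensus step can be sketched as a per-column majority vote over reads already placed at offsets relative to each other. How the non-matching tags are grouped and positioned is not described on the slide, so here the offsets are simply given, and the tie-breaking rule is an assumption:

```python
from collections import Counter
from typing import List, Tuple


def consensus(placed_reads: List[Tuple[int, str]]) -> str:
    """Majority base per column for reads given as (offset, sequence).

    Columns with no coverage are filled with 'N'; ties go to the base seen
    first at that column (Counter.most_common keeps insertion order for ties).
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)
```

A single error in one read is outvoted wherever coverage is deep, which is why this approach needs high coverage.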
Novel exon-junction discovery (TopHat)
[Diagram: gene structure with alternative TSSs, ATGs and polyadenylation signals]
http://tophat.cbcb.umd.edu
Pros: Very sensitive
Cons: Relies on reference
Substitutions and micro-indels
[Figure: examples at Dnttip3 and Arid4b; aligned reads against the reference, with a SNP call of T]
Map tags to genome → Align tags to identify SNPs → Annotate SNPs (e.g. SNP is non-synonymous in an ORF) → Rank SNPs (e.g. PolyPhen, CanPredict) → Validate SNPs (e.g. Sanger sequencing)
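The "align tags to identify SNPs" step can be sketched as a pileup: count the bases covering each reference position and report positions where a non-reference base is well supported. The depth and allele-fraction thresholds are illustrative assumptions, and a real caller would also weigh base qualities, strand and indels:

```python
from collections import Counter
from typing import List, Tuple


def call_snps(reference: str, placed_reads: List[Tuple[int, str]],
              min_depth: int = 3, min_fraction: float = 0.8
              ) -> List[Tuple[int, str, str, int]]:
    """Return (position, ref_base, alt_base, depth) for supported variants.

    Reads are (0-based offset on the reference, sequence), ungapped.
    """
    pileup = [Counter() for _ in reference]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        alt, alt_count = counts.most_common(1)[0]
        if alt != reference[pos] and alt_count / depth >= min_fraction:
            calls.append((pos, reference[pos], alt, depth))
    return calls
```

Lowering min_fraction towards 0.5 would also report heterozygous sites, at the cost of more false calls from errors and allele specific expression.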
“Diagnostic” features
[Diagram: transcripts A, B, C and D of one locus; legend: protein coding regions, AAA polyadenylation, non-coding regions, spliced intron]
Transcripts defined by AceView (September 2007 release).
“Diagnostic” features
217127 diagnostic features cover 160156 individual transcripts from 65254 loci: 92.6% of known transcripts have diagnostic features (covering 99.8% of loci).
Accuracy relies on the quality of the gene models used.
Different gene models will give different results from the same data.
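Assuming a “diagnostic” feature means an exon or junction found in exactly one transcript of a locus (so that tags hitting it can be assigned to that transcript unambiguously), the features each transcript owns can be listed as below. The feature labels in the test are hypothetical placeholders:

```python
from collections import defaultdict
from typing import Dict, Set


def diagnostic_features(transcripts: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each transcript to the features that it alone contains in the locus."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for transcript, features in transcripts.items():
        for feature in features:
            owners[feature].add(transcript)
    result: Dict[str, Set[str]] = {t: set() for t in transcripts}
    for feature, holders in owners.items():
        if len(holders) == 1:
            result[next(iter(holders))].add(feature)
    return result
```

Two transcripts that share all their features get none, which is one way a known transcript can end up without diagnostic features, and why the result depends on the gene models used.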
Differential Gene Expression
[Figure: microarray vs sequencing]
edgeR: http://www.bioconductor.org/packages/2.3/bioc/html/edgeR.html
Caution on Shotgun RNAseq analysis
Oshlack and Wakefield, Biol Direct. 2009; 4: 14.
Categories of genes that are enriched for short sequences:
• innate immunity
• cell-cell communication
• signal transduction
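The length bias follows from how shotgun libraries sample RNA: at the same concentration, a longer transcript yields proportionally more fragments and therefore more tags. Dividing by length (RPKM: reads per kilobase of transcript per million mapped reads) corrects the expression estimate, but not the extra statistical power that more tags give to longer transcripts. A sketch with made-up counts, using a Poisson model for the precision argument:

```python
import math


def rpkm(count: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return count / (length_bp / 1e3) / (total_mapped / 1e6)


def poisson_relative_se(count: int) -> float:
    """Relative standard error of a Poisson count (1 / sqrt(count))."""
    return 1 / math.sqrt(count)


if __name__ == "__main__":
    total = 10_000_000  # hypothetical number of mapped tags
    # Two transcripts at the same concentration: the 2 kb one collects four
    # times as many tags as the 500 bp one, yet both have the same RPKM ...
    print(rpkm(100, 500, total), rpkm(400, 2000, total))
    # ... but the longer transcript's estimate is twice as precise.
    print(poisson_relative_se(100), poisson_relative_se(400))
```

The same fold change is therefore easier to call significant for long transcripts, so categories rich in short genes can appear under-represented among differentially expressed genes.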
MicroRNAs can inhibit translation of mRNAs
[Diagram: in the nucleus, the pri-miRNA undergoes Drosha processing to give the pre-miRNA; in the cytoplasm, Dicer processing gives the miRNA duplex, and asymmetrical unwinding loads one strand into the RNA-Induced Silencing Complex (RISC). RISC-mRNA interactions with the target mRNA (5' … AAAA 3') lead to translational inhibition, mRNA sequestration or mRNA degradation.]
microRNAs are small
[Figure: proportion of small RNAs (0-60%) against length of small RNAs in the databases (15-35 nt), for miRNAs and piRNAs]
Matches to the Genome
[Figure: two panels showing the percentage (0-100%) of 17mer to 27mer tags that match the genome; annotation: 1 colourspace mismatch]
IsomiRs are common
[Figure: number of identical reads along the reference sequence (5' … 3'), showing the miRNA and its extension. Red = reads that start from a different location than the Sanger reference; blue = reads that start at the same location as the Sanger reference.]
31 miRNAs show the most abundant version starting from a different location than the Sanger reference.
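Summarizing isomiRs amounts to recording, for each read mapped to a miRNA precursor, where it starts relative to the annotated (miRBase/Sanger) mature start, then checking whether the most abundant start differs from the reference. A sketch assuming reads have already been mapped to precursor coordinates; the names and numbers are hypothetical:

```python
from collections import Counter
from typing import Iterable, Tuple


def start_offsets(read_starts: Iterable[int], reference_start: int) -> Counter:
    """Count reads by 5' offset from the reference mature start (0 = same)."""
    return Counter(start - reference_start for start in read_starts)


def dominant_isomir_shifted(read_starts: Iterable[int],
                            reference_start: int) -> Tuple[bool, int]:
    """Return (does the dominant start differ from the reference?, its offset)."""
    counts = start_offsets(read_starts, reference_start)
    if not counts:
        raise ValueError("no reads")
    offset, _ = counts.most_common(1)[0]
    return offset != 0, offset
```

Applied to every miRNA, the count of shifted cases gives a figure like the 31 miRNAs quoted above.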
Optimizing small RNA mapping
• Refining the reference set
• Optimizing the mismatches
miR-17-5p : CAAAGUGCUUACAGUGCAGGUAGU
miR-20    : UAAAGUGCUUAUAGUGCAGGUAG-
miR-106a  : AAAAGUGCUUACAGUGCAGGUAGC
miR-106b  : UAAAGUGCUGACAGUGCAGAU---
miR-93    : -AAAGUGCUGUUCGUGCAGGUAG-
miR-18    : UAAGGUGCAUCUAGUGCAGAUA--

miR-19a   : UGUGCAAAUCUAUGCAAAACUGA-
miR-19b-1 : UGUGCAAAUCCAUGCAAAACUGA-
miR-19b-2 : UGUGCAAAUCCAUGCAAAACUGA-
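Why the mismatch allowance matters for these families can be seen by counting position-by-position differences between the aligned sequences: once the allowed mismatches reach the distance between two members, a tag from one can match the other, and identical mature sequences (miR-19b-1 and miR-19b-2) can never be separated. A sketch using the alignments as shown, with gap characters compared like bases:

```python
from itertools import combinations

# Aligned mature sequences copied from the slide ('-' marks alignment gaps).
FAMILY = {
    "miR-17-5p": "CAAAGUGCUUACAGUGCAGGUAGU",
    "miR-20": "UAAAGUGCUUAUAGUGCAGGUAG-",
    "miR-106a": "AAAAGUGCUUACAGUGCAGGUAGC",
    "miR-19a": "UGUGCAAAUCUAUGCAAAACUGA-",
    "miR-19b-1": "UGUGCAAAUCCAUGCAAAACUGA-",
    "miR-19b-2": "UGUGCAAAUCCAUGCAAAACUGA-",
}


def distance(a: str, b: str) -> int:
    """Positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for (n1, s1), (n2, s2) in combinations(FAMILY.items(), 2):
        print(f"{n1:10s} {n2:10s} {distance(s1, s2)}")
```

With even one mismatch allowed, miR-19a and miR-19b tags become ambiguous, which argues for refining the reference set and keeping mismatches low for small RNAs.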
Optimizing small RNA mapping
• Refining the reference set
• Optimizing the matching strategy
• Optimizing the matching lengths
• Optimizing the mismatches
[Figure: number of matches (0-120) against length of tag when matching (15-30)]
Optimizing small RNA mapping
• Refining the reference set
• Optimizing the matching strategy
• Optimizing the matching lengths
• Optimizing the mismatches
• Filter spurious mappings
Comparisons with other platforms
[Figure: two correlation plots, r = 0.81 and r = 0.80]
Recursive or “vector stripping”
miRNA-MATE v1.0
[Flowchart: Start → 1. Read configuration file → 2. Decode barcodes → custom library alignment → tag aligned? (No: trim tag and realign; Yes: 3. Count miRNAs) → End. Alternative route: 4. Identify adaptor → 5. Custom library alignment → tag aligned? (No: discard tag; Yes: 6. Translate to base-space → 7. Summarize isomiR usage → 8. Create sequence logos) → End]
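Step 6 (translate to base-space) relies on SOLiD's two-base colour encoding: each colour 0-3 encodes a pair of adjacent bases, and with A, C, G, T numbered 0-3 the colour equals the bitwise XOR of the two base codes. A read is stored as a known leading base followed by colours (the csfasta format named in the RNA-MATE configuration). A sketch of both directions, assuming reads contain no missing-call characters:

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def encode(primer: str, bases: str) -> str:
    """Encode a base-space sequence as a colour-space read (primer + colours)."""
    codes = [BASES.index(b) for b in primer + bases]
    colours = (str(a ^ b) for a, b in zip(codes, codes[1:]))
    return primer + "".join(colours)


def decode(read: str) -> str:
    """Translate a colour-space read (leading base + colours) to bases."""
    current = BASES.index(read[0])
    out = []
    for colour in read[1:]:
        current ^= int(colour)  # next base = previous base XOR colour
        out.append(BASES[current])
    return "".join(out)


if __name__ == "__main__":
    read = encode("T", "ACGT")
    print(read, decode(read))
```

A single wrong colour corrupts every downstream base when decoded naively (decoding T3231 instead of T3131 turns ACGT into AGCA), which is why colour-space reads are aligned in colour space and only translated afterwards.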
miRNA-MATE v1.0 (recursive output)
Reference sequences
• miR and miR* sequences
• miRBase (http://microrna.sanger.ac.uk/)
• The “dominant” miRNA appearing in the databases is determined to be the “functional” miRNA, and the other strand is “a non-functional by-product”. (Junk RNA – sound familiar?)
• The miR and miR* sequences can change
Adaptor Identification
T010202100202312312333020XXXXXXXXXXXXXX
[Diagram: the colour-space read annotated with the “adaptor sequence”, the transition base (cleaved), the SREK captured small RNA and the transition base; the segment 33020XXXXXXXXXXXXXX is shown aligned]
Tags are matched against a reference set of miRNAs that are not ambiguous.
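Adaptor identification can be sketched as finding where the adaptor begins in a read: either a full copy of the adaptor, or, at the 3' end, a prefix of it that runs off the end of the read. Matching here is exact and works on any alphabet, including colour strings; the adaptor string and minimum overlap are hypothetical, and real trimmers also tolerate mismatches:

```python
def find_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> int:
    """Index where the adaptor starts in read, or len(read) if not found.

    A full adaptor occurrence anywhere wins; otherwise the longest suffix of
    the read equal to a prefix of the adaptor (at least min_overlap long).
    """
    full = read.find(adaptor)
    if full != -1:
        return full
    for overlap in range(min(len(adaptor) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return len(read) - overlap
    return len(read)


def strip_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Return the insert (captured small RNA) part of the read."""
    return read[:find_adaptor(read, adaptor, min_overlap)]
```

Unlike recursive trimming, this keeps the full insert length for each tag, which is what makes the isomiR summaries that follow possible.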
Correlation with recursive mapping: r = 0.94
miRNA-MATE v1.0 (isomiR output)
Tissue specific isomiRism
[Figure: hsa-miR-181 isomiRs in brain and ovary]
Could be important to know about this for qRT-PCR validation.
Changes in the start site could change the “seed” region.
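The seed is commonly taken as nucleotides 2-8 from the 5' end of the mature miRNA, so a 5' isomiR that starts one base later shifts the whole seed and may change predicted targets. A sketch using the miR-17-5p sequence from the alignment slide; the 2-8 definition is the usual convention, assumed here rather than stated in the talk:

```python
def seed(mature: str, start: int = 2, end: int = 8) -> str:
    """Seed region: 1-based inclusive positions start..end of the mature miRNA."""
    return mature[start - 1:end]


MIR_17_5P = "CAAAGUGCUUACAGUGCAGGUAGU"  # as on the alignment slide

if __name__ == "__main__":
    canonical = seed(MIR_17_5P)
    shifted = seed(MIR_17_5P[1:])  # isomiR starting one nucleotide downstream
    print(canonical, shifted)
```

A 3' isomiR, by contrast, leaves the seed unchanged, so 5' heterogeneity is the case to watch for target prediction and for qRT-PCR assay design.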
Conclusions
• The field is in its infancy; not all challenges have been solved. We need more mathematical and statistical input!
• RNAseq is a powerful way to increase the sensitivity and usefulness of global gene expression surveys.
• Be cautious with your analysis. Think and plan your analysis before you get into the lab.
I want a Nature paper
[Figure: cartoon equation of the form _ + _ ≠ _]
Rubbish in, rubbish out.
Medical Genomics
The End
Expression Genomics Laboratory, http://grimmond.imb.uq.edu.au