
Post on 30-Dec-2015


Gene expression analysis and transcriptomics

Daniel Hurley

What are we going to talk about?
- Understanding the core principles and root hypothesis of transcriptomics
- Choosing between different technologies
- How to design an experiment
- How to make sense of the data

Core principles: Transcriptomics
- Transcriptomics is the study of the nature and abundance of transcribed elements in a population of cells or a tissue
- Transcribed elements are:
  - mRNA
  - tRNA
  - rRNA
- But also ncRNAs (non-coding RNAs):
  - miRNA
  - siRNA
  - piRNA
  - snoRNA
  - And many more being discovered

Core principles: Root hypothesis
Summarising in one statement:

The central dogma suggests that the abundance of transcribed elements affects cell behaviour and tissue function. Therefore, we hypothesise that comparing the abundance of transcribed elements between different conditions can tell us something about cell behaviour and tissue function in those conditions.

The central dogma suggests that the abundance of mRNA affects protein activity. Therefore, we hypothesise that comparing the abundance of mRNA between different conditions can tell us something about protein activity in those conditions.

Core principles: the 'omics' part
But this isn't very 'omic' yet:

The central dogma suggests that the abundance of mRNA affects protein activity. Therefore, we hypothesise that comparing the abundance of mRNA between different conditions can tell us something about protein activity in those conditions.

So the 'omics' part is about large-scale measurements, and exploratory hypotheses.

Core principles: what can you do with it?
Some answers:
- Ask questions about relationships between specific genes
- Learn the transcriptomic 'signature' of a condition
- Classify conditions according to their signature (e.g. disease)
- Make functional hypotheses about an uncharacterised gene
- Identify potential drug targets

Technology: different types of data
Transcriptomic data can be gathered using a number of methods:
- Realtime RT-PCR: still regarded as the gold standard by many. Ubiquitous, but labour-intensive, and not really transcriptome-scale
- Microarrays: revolutionary in the 1990s, driving an explosion in bioinformatics. Use has plateaued but still common
- RNAseq: generating count data from high-throughput sequencing of the transcriptome. Perceived as the method of the future (next slides)

Pretty much everything here also applies to quantitative proteomics and to some extent metabolomics, although we will not discuss them in depth.

Technology: Microarrays vs. RNAseq (1)
It's easy to think that:
- Microarrays = old
- RNAseq = the new hotness

BUT it's not that simple.

Technology: Microarrays vs. RNAseq (2)

Microarrays | RNAseq
Only detect transcripts for which there are probes on the array | Measure every assembled transcript
Need to be made specific to a species or condition (e.g. human, mouse, tobacco) | Similar experimental protocol for every sample type
Generally detect only one type of RNA (e.g. mRNA OR miRNA) | Measure mRNA and ncRNA and everything else
Dynamic range said to be less than RNAseq | Dynamic range said to be greater than microarrays
Mature technology: known, reliable ways to analyse data. No arguments | New technology: no-one is really sure how to analyse the data. Lots of arguments
Generally do not detect alternative splicing | Detect alternative splicing
Costs a lot | Also costs a lot (but getting rapidly cheaper)

Technology: Microarrays vs. RNAseq (3)
So when should you use one or the other for a gene expression experiment?
- Availability: If you have a well-characterised and popular organism (human, E. coli, mouse, rat, fruit fly, various plant species) for which a commercial microarray exists, it's an option. Otherwise it's RNAseq
- Breadth: If you think alternative splicing or ncRNA is important in your biological process, then RNAseq might be a better choice
- Cost: Per-sample, microarrays are lower-cost than RNAseq for human work. For organisms with a smaller transcriptome, the difference is less clear
- Complexity: If you don't want to spend a lot of time (= money) on difficult normalisation and bioinformatics decisions, microarrays may be a better choice. RNAseq bioinformatics is still very new (~20 competing R packages doing the same job!)
- Futureproofing: If you really want to compare this data with future data, RNAseq is likely to be around longer. On the other hand, there is a huge amount of published microarray data to which you may be able to compare results

Design: how do you do it?
- Most important, you need a clear and coherent design document. This is important because the cost of repeating experiments is high, and because the data can be bewildering
- Ask yourself 'what is the experimental question I am asking?' Good examples:
  - Which transcripts could be differentially expressed between control and treated samples across all replicates, correcting for variance between replicates?
  - Which transcripts could be differentially expressed between each of the 3 combinations of two tissues across all patients, correcting for inter-patient variance?
  - Are there transcripts which represent the tissue type? That is, transcripts which are more similar across patients than we see between different samples from the same patient?
- A good experimental design document:
  - Sets out the experimental question
  - Defines the conditions that will be compared
  - Defines the types of comparisons that will be done between conditions (e.g. pairwise comparisons looking for differences, or a before-and-after paired analysis)
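As a small illustration of writing down the comparisons in advance, the sketch below enumerates the pairwise comparisons for the "3 combinations of two tissues" example. The tissue names are hypothetical placeholders, not taken from the slides; the point is only that three conditions give three pairwise comparisons.

```python
from itertools import combinations

# Hypothetical tissue names, used only for illustration.
tissues = ["tissue_1", "tissue_2", "tissue_3"]

# Every unordered pair of tissues is one pairwise comparison.
comparisons = list(combinations(tissues, 2))

for first, second in comparisons:
    print(f"{first} vs {second}")
```

Listing comparisons this way before the experiment makes it obvious how quickly their number grows as conditions are added.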

Design: variance and replication
How many replicates is enough?
The short answer is 'it depends':
- On your estimate of effect strength
- On the signal-to-noise ratio of the detector
- On the amount of variance within conditions
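A minimal sketch of why these factors matter, assuming a simple two-group comparison with Welch's t statistic (my choice for illustration, not a method prescribed in the slides). The expression values are made up. Both comparisons have the same difference in means, but different within-condition variance, and the last case reuses the noisy values twice purely to mimic a larger replicate count with similar spread.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


# Made-up log2 expression values; the difference in means is 2.0 in every case.
tight_control, tight_treated = [5.0, 5.1, 4.9], [7.0, 7.1, 6.9]
noisy_control, noisy_treated = [5.0, 6.5, 3.5], [7.0, 8.5, 5.5]

t_tight = welch_t(tight_control, tight_treated)
t_noisy = welch_t(noisy_control, noisy_treated)

# Six values per group with a similar spread (illustration only, not real replication).
t_noisy_more_reps = welch_t(noisy_control * 2, noisy_treated * 2)

print(t_tight, t_noisy, t_noisy_more_reps)
```

The same effect gives a much larger statistic when within-condition variance is low, and more replicates partly recover the signal when variance is high.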

- Three observations is the minimum to define a distribution
- Choose your replication strategy to capture the variance that interests you, and correct for the variance that doesn't

What's a condition?
What do I mean when I say 'a condition' in an experimental sense?
I mean any state of interest in which we can observe a cell population, tissue or organism.
Examples:
- MTX sensitive cancer cell line
- MTX resistant cancer cell line
- HeLa cells 24h post-transfection with siRNA against BRCA1
- HeLa cells 48h post-transfection with siRNA against BRCA1
- Patient A with melanoma
- Patient B with melanoma
- Knockout mouse without Gene X
- Wild-type mouse with Gene X

Design: variance and p-values
- Precise interpretation of a p-value is complex

- But it's uncontroversial (I think!) to say that it's a proxy measure of the weight of evidence against a null hypothesis
- Multiple hypothesis testing problem: when we run many tests, we are more likely to see what looks like an interesting result due to chance alone
- Can correct for this using false discovery rate assessment and control
- The more variance within a condition, the less convincing a result (= less evidence against the null hypothesis)
- P-values capture this intuition in a numeric and rankable form

Data: handling the output
You've done an experiment, and you get a big bunch of data files. Then what?
Analysis approaches:
- Simple fold-change; not recommended (why?)
- LIMMA (R package) is the benchmark for microarrays
- A raft of packages for RNAseq: edgeR and DESeq are the most common

Data: what do the results look like? (1)
Output from a typical differential expression transcriptomic experiment might look something like this:

Note the sorting, colour-coding and annotation. The output columns cover:
- Transcript information
- Model parameters
- Hypothesis strength data

Data: when do you believe the results?
- Skeptical Hippo says 'multiple hypothesis testing is very important'
- T-tests work fine for realtime RT-PCR, or a chi-square test, or Fisher's Exact Test
- The LIMMA package for microarrays incorporates more sophisticated approaches for modelling difference and adjusting for multiple hypothesis testing
- The Bonferroni correction is often too conservative
- Benjamini-Hochberg FDR is a pragmatic approach for exploratory bioinformatics
- Again, various ways of doing this in RNAseq data, but no one clear approach or piece of software
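To see why Bonferroni is often called too conservative, here is a minimal sketch comparing it with the Benjamini-Hochberg adjustment on a handful of made-up p-values. The function names and the 0.05 threshold are assumptions for illustration; in practice packages such as LIMMA do this adjustment for you.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for five transcripts.
p = [0.001, 0.012, 0.02, 0.03, 0.60]

bh = benjamini_hochberg(p)
bonf = bonferroni(p)

# Count transcripts passing an (assumed) 0.05 threshold under each method.
n_bh = sum(q <= 0.05 for q in bh)
n_bonf = sum(q <= 0.05 for q in bonf)
print(n_bh, n_bonf)
```

On these values Benjamini-Hochberg keeps more transcripts than Bonferroni, which is the trade-off that makes it attractive for exploratory work.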

Data: what do the results look like? (2)
If we zoom in on an individual transcript, it might look like this:

Data: what do the results look like? (3)
But not everything is differentially expressed!

We can get high-altitude views of the data by using a range of visualisation tools. Each tool represents the data in a different way, and all tell us something important.

Data: ranking heatmaps


Core principles: what can you do with it?
Some answers:
- Ask questions about relationships between specific genes
- Learn the transcriptomic 'signature' of a condition
- Classify conditions according to their signature (e.g. disease)
- Make functional hypotheses about an uncharacterised gene
- Identify potential drug targets

Summary: what to do
- Spend time clearly defining your experimental question in transcriptomic terms
- Get advice on technology, experimental design, and research outputs
- Choose replication and conditions which capture the variance that interests you, and correct for the variance which doesn't
- Be conservative about the number of different questions you ask at once; consider pilot and follow-up experiments
- Keep returning to your data. Actively look for as many ways as possible to visualise similarity and difference within the data

Fin
Any questions?

[Figures: repeated plots of Transcript A and Transcript B in Control versus Experiment samples; one pair of Control/Experiment panels is labelled "Convincing" and another "Not as convincing"]