design of microarray gene expression profiling experiments

Design of microarray gene expression profiling experiments Peter-Bram ’t Hoen


Post on 12-Jan-2016

Page 1: Design of microarray gene expression profiling experiments

Design of microarray gene expression profiling experiments

Peter-Bram ’t Hoen


Lay-out

• Practical considerations

• Pooling

• Randomization

• One-color vs Two-colors

• Two-color hybridization designs

• Ratio-based vs Intensity-based analysis


Think before you start

• research question

• choice of technology

• controls and replicates

Ref: Churchill. 2002. Nature Genetics Supplement 32: 490-495


Research question

• Limit your (initial) number of questions / conditions

• choose best timepoint for mRNA regulation

• can be different from protein/activity

• pilots using RT-qPCR

• experimental follow-up

• what will you do with the data?

• verification of differential gene expression

• in vitro experiments to study mechanism

• "in vivo" verification in tissue sections


Choice of technology

• What is affordable?

• Do a pilot to estimate the variance for your samples, experimental set-up and platform

• Calculate your power: what is the smallest effect size that you can detect?
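A rough sketch of such a power calculation, assuming a two-group comparison of log2 expression values with equal numbers of arrays per group and a normal approximation in place of the t-distribution. The function name, the default significance level and power, and the pilot standard deviation of 0.3 are illustrative assumptions, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def minimal_detectable_log2_fc(sd, n_per_group, alpha=0.001, power=0.8):
    """Smallest log2 fold change detectable in a two-sided two-group test.

    Normal approximation: delta = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)

# sd = 0.3 is a hypothetical per-gene standard deviation from a pilot.
for n in (3, 5, 10):
    delta = minimal_detectable_log2_fc(sd=0.3, n_per_group=n)
    print(f"n={n}: log2 FC >= {delta:.2f} (fold change >= {2 ** delta:.2f})")
```

A strict alpha is used here because thousands of genes are tested at once; with a different multiple-testing strategy the threshold would change.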


Controls

• positive: genes whose regulation is known

• check on biological experiment & data analysis

• positive: spikes in mRNA and/or hyb mix

• check labeling procedure and hybridization

• detection range (sensitivity) and dynamic range

• "landing lights" for gridding software

• negative controls: non-specific binding

• check cross-hybridization: buffer, non-homologous DNA


Spikes

Spiked 2-fold change (copies/cell)
Spikes: RCA, Cab, rbcL, LTP4, LTP6
Amounts: 2 / 1, 10 / 5, 60 / 30, 100 / 50, 300 / 150

Spiked 3-fold change (copies/cell)
Spikes: XCP2, RPC1, NAC1, TIM, PRK
Amounts: 3 / 1, 15 / 5, 60 / 20, 150 / 50, 300 / 100

[Diagram: spikes are added to the test RNA and the reference RNA; cDNA probes are synthesized and hybridized to an array containing DNA controls]


Spikes

Van de Peppel et al. EMBO Reports 4, 387 (2003)



Replicates

• Include sufficient replicates, based on pilot experiment

• Biological replicates are preferred over technical replicates

• Control experimental variables with possible unintended effects

• genetic background

• gender

• age


Randomization

• Randomize samples with respect to experimental influences

• experimenter

• day of hybridization

• batch of arrays

• dye

• etc
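A minimal sketch of such a randomization, assuming samples are shuffled and then dealt round-robin over hybridization days and experimenters so that no group is confounded with either. The function name, the sample labels and the day and experimenter names are hypothetical.

```python
import random

def randomize_runs(samples, days, experimenters, seed=None):
    """Shuffle the samples, then deal them round-robin over days and experimenters."""
    rng = random.Random(seed)
    order = samples[:]
    rng.shuffle(order)
    plan = []
    for k, sample in enumerate(order):
        plan.append({
            "sample": sample,
            "day": days[k % len(days)],
            "experimenter": experimenters[(k // len(days)) % len(experimenters)],
            "dye": rng.choice(["Cy3", "Cy5"]),  # random dye; a dye swap would balance it
        })
    return plan

# Hypothetical experiment: 4 control and 4 treated animals.
samples = [f"{group}{i}" for group in ("control", "treated") for i in range(1, 5)]
plan = randomize_runs(samples, days=["day1", "day2"], experimenters=["X", "Y"], seed=42)
for run in plan:
    print(run)
```

Fixing the seed makes the plan reproducible, which helps when the assignment has to be documented.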


Pooling

• Often done because of lack of sufficient amounts of RNA, but good amplification protocols are available

• Advantages:

• dampening of individual variation, may increase statistical power

• Generally not recommended:

• outliers in the population may result in large and significant effects

• information on the differences in the population is lost, although it is probably biologically relevant

• in fact, it is an artificial way to increase the significance of your findings
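A small worked example of the outlier problem, assuming that pooling equal amounts of RNA averages the individuals on the raw (not log) scale. All expression values are made up for illustration.

```python
from math import log2
from statistics import median

control = [100, 110, 90, 105, 95]     # hypothetical expression levels, arbitrary units
treated = [100, 95, 105, 110, 1600]   # one outlier animal

# Pooling equal RNA amounts averages the individuals on the raw scale.
pool_ratio = (sum(treated) / len(treated)) / (sum(control) / len(control))
# Individual arrays allow a robust summary instead.
median_ratio = median(treated) / median(control)

print(f"pooled log2 ratio:            {log2(pool_ratio):.2f}")
print(f"median-individual log2 ratio: {log2(median_ratio):.2f}")
```

In this example the pool suggests a roughly 4-fold induction, although four of the five treated animals are unchanged.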


Hybridization design

• One color: not many difficulties expected

• Two color: what to hybridize with what in which color?

• Reference design

• Paired design

• Loop design

• Mixed design

Read: Yang & Speed (2002). Design issues for cDNA microarray experiments. Nature Reviews Genetics 3, 579-588


Hybridization design: general issues

• Comparisons on the same array are more precise than comparisons on different arrays

• Identify most important comparisons

• Hybridize those on the same slide

• Dye swap

• A dye-effect is always there

• Balance designs with respect to dye (exception: some common reference designs)


Common reference vs direct hybridizations

• Direct: A and B hybridized together on two slides

Variance[ log(A/B) ] for one slide = s^2

then the variance of the average of the two measurements is s^2 / 2

• Common reference: A and B each hybridized against reference R

log(A/B) = log(A/R) – log(B/R)

and variance of log(A/B) is

variance[ log(A/R) ] + variance[ log(B/R) ] = s^2 + s^2 = 2 s^2
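The two variances can be checked with a quick simulation. This is a sketch under the assumption that every slide yields a log ratio with independent normal error of standard deviation s; the true log ratio, the seed and the number of simulations are arbitrary choices.

```python
import random
from statistics import variance

rng = random.Random(1)
s = 1.0          # per-slide standard deviation of a log ratio (assumed)
true_ab = 1.0    # true log(A/B), arbitrary
n_sim = 20000

direct, via_ref = [], []
for _ in range(n_sim):
    # Direct: two slides each measure log(A/B); average them.
    direct.append((rng.gauss(true_ab, s) + rng.gauss(true_ab, s)) / 2)
    # Common reference: one slide gives log(A/R), another gives log(B/R).
    log_a_r = rng.gauss(true_ab, s)   # taking the true log(B/R) as 0
    log_b_r = rng.gauss(0.0, s)
    via_ref.append(log_a_r - log_b_r)

var_direct = variance(direct)
var_ref = variance(via_ref)
print(f"direct:    {var_direct:.2f}  (theory s^2 / 2 = {s ** 2 / 2:.2f})")
print(f"reference: {var_ref:.2f}  (theory 2 s^2 = {2 * s ** 2:.2f})")
```

With the same two slides, the direct design is therefore four times as precise for the A versus B comparison.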


More samples

• Loop design, 6 arrays (A, B and C compared pairwise)

Log (A/B) estimate = 2/3 log (A/B) + 1/3 {log (A/C) – log (B/C)}

Assuming that all variances are equal:

Variance [ log(A/B) ] = 4/9 (s^2 / 2) + 1/9 (s^2) = 1/3 s^2

• Reference design, 6 arrays (A, B and C each compared with R)

Variance [ log(A/B) ] = Variance [ log(A/C) ] = Variance [ log(B/C) ] = 0.5 s^2 + 0.5 s^2 = s^2
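The weights 2/3 and 1/3 follow from inverse-variance weighting of the direct and the indirect estimate. A short sketch, assuming 6 arrays with each pair hybridized twice and independent errors of variance s^2 per slide; the helper name `combine` is illustrative.

```python
from fractions import Fraction

def combine(variances):
    """Inverse-variance weights and the variance of the combined estimate."""
    precisions = [1 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    return weights, 1 / total

# All variances are in units of s^2.
var_direct = Fraction(1, 2)                      # mean of the two A-vs-B slides
var_indirect = Fraction(1, 2) + Fraction(1, 2)   # log(A/C) - log(B/C), two slides each

weights, var_loop = combine([var_direct, var_indirect])
var_reference = Fraction(1, 2) + Fraction(1, 2)  # log(A/R) - log(B/R), two slides each

print("weights:", weights)                # 2/3 for the direct, 1/3 for the indirect path
print("loop variance:", var_loop)         # in units of s^2
print("reference variance:", var_reference)
```

With the same 6 arrays, the loop design gives a three times smaller variance than the reference design for each pairwise comparison.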


Common reference vs direct hybridizations

Theoretical Considerations

• A design is optimal when it minimizes the variance of the effect of interest

• Look for designs leading to small variance of log(A/B)

Practical considerations

• Common reference may be desired when experiment is extended in the future or when a lot of different conditions have to be compared

• Choose a biologically relevant common reference (say: your control sample). In that case, the ratios themselves are of interest and easier to interpret


Time-course designs

Take 4 time points

T1 T2 T3 T4

The best choice of design depends on the comparisons of interest and on the number of slides available


Time-course designs

Using 3 slides:

T1 T2 T3 T4

which is the best to estimate changes relative to the initial time point: T2 / T1, T3 / T1, T4 / T1


Time-course designs

• Using 3 slides:

T1 T2 T3 T4

which is the best to estimate relative changes between successive time points: T2 / T1, T3 / T2, T4 / T3


Time course designs

• Using 4 slides:

T1 T2 T3 T4

R

which is the reference design;

All comparisons have equal precision


Time course design

• Using 4 slides:

T1 T2 T3 T4

which is the loop design, balanced wrt dye

Distant comparisons have lower precision


Time course designs

• Using 4 slides:

T1 T2 T3 T4

also uses exactly 2 hybridizations per treatment, balanced wrt dye.

Most precise estimates: T1 vs T2, T1 vs T3, T2 vs T4, T3 vs T4
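The precision claims for the time-course designs can be reproduced with a small least-squares calculation. This is a sketch under several assumptions: each slide measures one log ratio with independent error of variance s^2, dye effects are ignored, and the slide lists below are my reading of the design diagrams, which did not survive in this transcript. The function names are illustrative.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse with exact fractions (fails if the design is disconnected)."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def contrast_variance(nodes, arrays, a, b):
    """Variance of the least-squares estimate of log(a/b), in units of s^2."""
    base, rest = nodes[0], nodes[1:]           # first sample is the baseline
    idx = {name: k for k, name in enumerate(rest)}
    m = len(rest)
    # X'X of the log-ratio design matrix, with the baseline column dropped.
    xtx = [[Fraction(0)] * m for _ in range(m)]
    for u, v in arrays:
        for p, q in ((u, v), (v, u)):
            if p != base:
                xtx[idx[p]][idx[p]] += 1
                if q != base:
                    xtx[idx[p]][idx[q]] -= 1
    inv = invert(xtx)

    def cov(p, q):
        if p == base or q == base:
            return Fraction(0)
        return inv[idx[p]][idx[q]]

    return cov(a, a) + cov(b, b) - 2 * cov(a, b)

T = ["T1", "T2", "T3", "T4"]
designs = {
    "vs T1, 3 slides": (T, [("T2", "T1"), ("T3", "T1"), ("T4", "T1")]),
    "successive, 3 slides": (T, [("T2", "T1"), ("T3", "T2"), ("T4", "T3")]),
    "reference, 4 slides": (["R"] + T, [(t, "R") for t in T]),
    "loop, 4 slides": (T, [("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("T4", "T1")]),
    "alternative, 4 slides": (T, [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4")]),
}
for name, (nodes, arrays) in designs.items():
    row = ", ".join(f"{t}/T1: {contrast_variance(nodes, arrays, t, 'T1')}"
                    for t in ("T2", "T3", "T4"))
    print(f"{name}: {row}")
```

Under these assumptions the reference design gives 2 s^2 for every comparison, while the loop gives 3/4 s^2 for neighbouring time points and s^2 for the distant ones.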


Factorial designs

• Designs for studies which involve factors as explanatory variables

• Age group

• gender

• Cell line

• Tumor types


Factorial designs

Glonek & Solomon (2004)

• Admissible design: using the same number of arrays, there are no other designs yielding smaller variances of all parameters

Glonek et al. Biostatistics 5, 89-111 (2004)


Factorial design; example

• Time

• 0h

• 24h

• Cell lines

• I (non-leukaemic)

• II (leukaemic)

• Find genes differentially expressed at 24h but not at 0h: interaction between time and cell line


Factorial design; possible samples

• All combinations of factor levels. In this case, 4 are possible:

               Time 0   Time 24
cell line I    I,0      I,24
cell line II   II,0     II,24


Factorial design: analysis model

• (log-)linear model is used

• experimental conditions correspond to parameter combinations as in:

I,0 II,0 I,24 II,24
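The parameter combinations shown on the slide did not survive in this transcript. One common parametrization, assumed here and not necessarily the one on the slide, takes I,0 as the baseline and adds a cell-line effect, a time effect and their interaction. The parameter names and the log2 values below are made up for illustration.

```python
# Assumed parametrization on the log scale (hypothetical parameter names):
#   I,0   = mu
#   II,0  = mu + cell
#   I,24  = mu + time
#   II,24 = mu + cell + time + interaction
def factorial_parameters(log_expr):
    """Solve the four-condition log-linear model for one gene."""
    mu = log_expr["I,0"]
    cell = log_expr["II,0"] - mu
    time = log_expr["I,24"] - mu
    interaction = log_expr["II,24"] - mu - cell - time
    return {"mu": mu, "cell": cell, "time": time, "interaction": interaction}

# Made-up log2 expression values for one gene.
gene = {"I,0": 8.0, "II,0": 8.0, "I,24": 8.5, "II,24": 10.5}
params = factorial_parameters(gene)
print(params)
```

In this example the gene does not differ between the cell lines at 0h but does at 24h, which shows up as a non-zero interaction term.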


Factorial design; possible arrays

[Diagram: the four samples I,0, I,24, II,0 and II,24, with the six possible pairwise hybridizations between them numbered (1) to (6)]


Optimal admissible design

• Designs that are not worse than others, and for which the variance of the parameter of interest is (one of the) smallest

• In the example: wish to find admissible designs for which the interaction term has one of the smallest variances

Glonek et al. Biostatistics 5, 89-111 (2004)


Optimal admissible design

Glonek et al. Biostatistics 5, 89-111 (2004)


Factorial designs: conclusions

• Design with all pairwise comparisons is not the best in this case

• Best design can only be found with respect to a model

• if model does not fit the data well, design choice may not be the best

• make sure model chosen is adequate


How to compare efficiently many different conditions?

• Common reference: not efficient

• Loop and mixed designs: not all comparisons have equal precision

GA Churchill, Nat Genet. 2002 Dec;32 Suppl:490-5


Possible solution

• Randomized design

• Intensity-based rather than ratio-based calculations

Requires:

• Hybridization of the two samples is independent; no competition for binding sites

• Absence of large spot and array effects

To be tested for each platform


Our favourite platform

• Spotted collection of 65-mer oligonucleotides (Sigma-Compugen collection)

• 22K


Design used to demonstrate independent hyb

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)


Distribution of signal intensities is similar

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)


Correlation of intensities is high

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

Legend: R > 0.95; 0.90 < R < 0.95; R < 0.90


Effect of addition of unlabelled target

[Scatter plot; x-axis: single target on microarray; y-axis: two targets on microarray]

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)


Correlation of ratios calculated from different hyb designs

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)


Intensity-based analysis

• Hybridizations of two targets on the array are independent

• No saturation and no competition

• Intensity readings show high inter-array correlation

• Comparisons on the same array have highest precision and all other comparisons have equal precision

’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)
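A minimal sketch of what an intensity-based calculation looks like for one gene, assuming normalized log2 intensities and treating each channel as a separate measurement regardless of which sample shared the array. The array names, group labels and intensity values are made up.

```python
from collections import defaultdict
from statistics import mean

# Each record: (array, dye, group, log2 intensity). Values are made up.
readings = [
    ("array1", "Cy3", "mdx", 9.1), ("array1", "Cy5", "wt", 8.0),
    ("array2", "Cy3", "wt", 8.2), ("array2", "Cy5", "mdx", 9.3),
    ("array3", "Cy3", "mdx", 9.0), ("array3", "Cy5", "wt", 7.9),
]

def group_means(records):
    """Average log2 intensities per group, ignoring which array they came from."""
    by_group = defaultdict(list)
    for _array, _dye, group, value in records:
        by_group[group].append(value)
    return {group: mean(values) for group, values in by_group.items()}

means = group_means(readings)
log2_ratio = means["mdx"] - means["wt"]
print(f"log2(mdx / wt) = {log2_ratio:.2f}")
```

A ratio-based analysis would instead form one log ratio per array, which ties every comparison to the co-hybridized sample.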


Example of randomized design

Turk et al. FASEB J 20, 127-129 (2006)

• Mouse models for muscular dystrophy

Histopathological parameters at 8 weeks

Disease | Model | Affected gene | Age of onset | Skeletal dystrophy | Inflammation | Central nuclei | DGC loss | SGC loss | Reference
DMD | mdx | Dystrophin | 2-3 wks | Severe | + | + | y | y | [43]
DMD | mdx3cv | Dystrophin | 2-3 wks | Severe | + | + | y | y | [25]
LGMD2D | Sgca-null | alpha-Sarcoglycan | 1 wk | Severe | + | + | n | highly reduced | [26]
LGMD2E | Sgcb-null | beta-Sarcoglycan | at least 4 wks | Severe | + | + | n | y | [27,28]
LGMD2C | Sgcg-null | gamma-Sarcoglycan | 2 wks | Severe | + | + | n | highly reduced | [29]
LGMD2F | Sgcd-null | delta-Sarcoglycan | 2 wks | Severe | + | + | n | highly reduced | [30]
LGMD2B | Dysf-null | Dysferlin | 8 wks | Mild/moderate | - | +/- | n | n | [11]
LGMD2B | SjlDysf | Dysferlin | 3 wks | Mild/moderate | n/a | +/- | n/a | n/a | [24]
not known | Sspn-null | Sarcospan | None | None | - | - | n | n | [31]


Our design

• Randomly assign samples to the arrays, avoiding co-hybridization of samples from the same group

• 2 biological replicates

• 4 technical replicates (dye-swap + replicate spotting)

HYB ID | Cy3 model | Cy3 individual | Cy5 model | Cy5 individual
MD1 | 3cv | B | Bl10 | B
MD2 | Sgcb2 | B | Mdx | B
MD3 | Bl10 | A | Sgcb2 | A
MD4 | Sspn | B | 3cv | A
MD5 | Bl6 | B | Bl10 | A
MD6 | Sgcb | A | dysf | A
MD7 | Sgcd | B | Sgcb | A
MD8 | Sgcg | A | Sspn | A
MD9 | Mdx | A | 3cv | B
MD10 | Bl6 | A | Mdx | A
MD11 | Sjl | B | Bl6 | A
MD12 | Sgca | A | Bl6 | B
MD13 | 3cv | A | hDMD | B
MD14 | Mdx | B | Sgca | A
MD15 | Bl10 | B | Sgcg | B
MD16 | Sspn | A | dysf | B
MD17 | Sjl | A | Sgca | B
MD18 | Sgcb2 | A | Sgcd | B
MD19 | dysf | A | Sgcg | A
MD20 | Sgcd | A | Sgcb2 | B
MD21 | Sgca | B | Sgcd | A
MD22 | dysf | B | hDMD | A
MD23 | Sgcg | B | Sjl | A
MD24 | Sgcb | B | Sspn | B
MD25 | hDMD | A | Sgcb | B
MD26 | hDMD | B | Sjl | B

Turk et al. FASEB J 20, 127-129 (2006)
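A design of this kind can be generated by simple rejection sampling. The sketch below assumes, as in the table above, that every sample is labeled once with each dye and that two samples from the same group never share an array. The function name and the retry limit are illustrative, and this is not necessarily how the published design was produced.

```python
import random

def randomized_two_color_design(groups, individuals=("A", "B"), seed=0, max_tries=10_000):
    """Pair every sample once in Cy3 and once in Cy5, never within its own group."""
    rng = random.Random(seed)
    samples = [(group, ind) for group in groups for ind in individuals]
    cy3 = samples[:]
    rng.shuffle(cy3)
    for _ in range(max_tries):
        cy5 = samples[:]
        rng.shuffle(cy5)
        # Reject any shuffle that puts two samples of one group on the same array.
        if all(a[0] != b[0] for a, b in zip(cy3, cy5)):
            return list(zip(cy3, cy5))
    raise RuntimeError("no valid design found; increase max_tries")

# Group names taken from the hybridization table.
groups = ["3cv", "Bl10", "Sgcb2", "Mdx", "Sspn", "Bl6", "Sgcb",
          "dysf", "Sgcd", "Sgcg", "Sjl", "Sgca", "hDMD"]
design = randomized_two_color_design(groups)
for k, (cy3, cy5) in enumerate(design, start=1):
    print(f"MD{k}: Cy3 = {cy3[0]} {cy3[1]}, Cy5 = {cy5[0]} {cy5[1]}")
```

Rejection sampling is adequate here because a valid pairing turns up after a handful of shuffles; a constrained search would only be needed for much tighter restrictions.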


Intensity-based analysis can go wrong

Vinciotti et al. Bioinformatics 21:492-501 (2005)



Some guidelines

• First determine the main question, pointing out the effect of interest

• log[A/B]

• Then choose analysis model, so that effect variance can be computed

• VAR { log[A/B] }

• Practical constraints: amount of RNA available, number of hybridizations, number of slides

• A good design measures the effect of interest as accurately as possible

• small VAR { log[A/B] }


Some useful links

• http://dial.liacs.nl/Courses/CMSB%20Courses.html

• http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm

• http://exgen.ma.umist.ac.uk/course/notes/WitDesignLecture.pdf

• http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp


Acknowledgements

Human and Clinical Genetics, LUMC
Judith Boer
Renée de Menezes
Rolf Turk
Ellen Sterrenburg
Johan den Dunnen
Gertjan van Ommen

Microarray facility: Leiden Genome Technology Center


Case study

• Two genetically-modified zebrafish strains and one wild-type

• Defects mainly in muscle development

• Apparent at 12-48 hours of development; early death

• Question: which biological pathways are affected and responsible for defective myogenesis?


Possible platforms and budget

• Affymetrix (1-color): 500 euro per chip

• variance for ratio of two samples on two chips: s^2

• Homespotted arrays (2-color): 100 euro per chip

• variance for ratio of two samples on one chip: 2s^2

• Budget: 12,000 euro
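As a starting point for the discussion, the sketch below works out how many chips the budget buys and the variance of an averaged ratio if the whole budget went into replicate comparisons of just two conditions on one platform. It ignores biological variance, the third strain and sample preparation costs, so it is a simplification rather than a solution to the case study.

```python
from fractions import Fraction

budget = 12_000  # euro

platforms = {
    # name: (price per chip, chips per A-vs-B ratio, variance of one ratio in units of s^2)
    "Affymetrix (1-color)": (500, 2, Fraction(1)),
    "Homespotted (2-color)": (100, 1, Fraction(2)),
}

results = {}
for name, (price, chips_per_ratio, var_ratio) in platforms.items():
    chips = budget // price
    n_ratios = chips // chips_per_ratio
    var_mean = var_ratio / n_ratios   # variance of the mean of independent ratios
    results[name] = (chips, n_ratios, var_mean)
    print(f"{name}: {chips} chips, {n_ratios} ratios, variance of mean = {var_mean} s^2")
```

Under these simplifying assumptions the cheaper platform gives the smaller variance, despite its higher variance per chip.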


Questions

• Isolation of specific compartments / whole animal lysates?

• Pooling?

• How many replicates?

• Which hybridization design?

• What is the variance of the most important comparisons?