Hybridization Design for 2-Channel Microarray Experiments, Naomi S. Altman, Pennsylvania State...

Post on 06-Jan-2018


Hybridization Design for 2-Channel Microarray Experiments

Naomi S. Altman, Pennsylvania State University, naomi@stat.psu.edu

NSF_RCN Meetings 04

Expt Design and Microarrays

Microarrays are:
- Expensive
- Noisy

A perfect situation for optimal design

Outline

- Designing a Microarray Study
- Reference Design
- Loop Designs
- Replication
- Optimal Design/Analysis
- Incorporating Multiple Factors and Blocks

Designing a Microarray Experiment

- Define objectives
- Determine factors and treatments
- Determine appropriate analysis method
- Determine sample design (biological and technical replication)
- Determine platform
- Design spots for custom arrays
- Determine hybridization pairs ← (the step this talk addresses)
- Perform experiment

Arrow Notation

Introduced by Kerr and Churchill (2001).

Each array is represented by an arrow.

[Figure: an arrow with one end labeled Red and the other Green, for the two dye channels.]

Reference Design

[Figure: a Reference node joined by one arrow (array) to each of the treatments A, B, C, and D.]

- 4 arrays
- 1 sample/treatment
- 4 reference samples

Loop Design (Kerr and Churchill 2001)

[Figure: treatments A, B, C, and D at the nodes of a loop, with each arrow (array) joining two treatments.]

- 4 arrays
- 2 samples/treatment

Replication

Often there is confusion among:
- Biological replicates
- Technical replicates
  - repeated samples
  - split sample and relabel
  - spot replication

In this presentation we consider only:
- one spot/gene/array
- any technical replicates are averaged
- each sample is an independent biological replicate

Linear Mixed Model for Microarray Data

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$

- $Y_{ijk}$ is the response of the gene in one channel
- $\mu$ is the mean response of the gene over all treatments, channels, arrays
- $\alpha_i$ is the effect of treatment i
- $\beta_j$ is the effect of dye j
- $\gamma_k$ is the effect of the array k (or spot on the array)
- $\epsilon_{ijk}$ is the random deviation from the other effects and includes biological variation, technical variation and random error

Linear Mixed Model for Microarray Data

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$

The 2 channels on a single spot are correlated → array should be treated as a random effect.
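The correlation between the two channels can be written out in one step. This is a sketch under assumptions the slide does not state explicitly: the array effect $\gamma_k$ and the residual $\epsilon_{ijk}$ are taken to be independent random terms with variances $\sigma_\gamma^2$ and $\sigma^2$ (both symbols are notation chosen here), and $j \neq j'$ are the two channels of array k.

```latex
\begin{align*}
\operatorname{Var}(Y_{ijk}) &= \sigma_\gamma^2 + \sigma^2, \\
\operatorname{Cov}(Y_{ijk},\, Y_{i'j'k}) &= \sigma_\gamma^2, \\
\operatorname{Corr}(Y_{ijk},\, Y_{i'j'k}) &= \frac{\sigma_\gamma^2}{\sigma_\gamma^2 + \sigma^2}.
\end{align*}
```

The larger the array (spot) variance is relative to the residual variance, the more strongly the two channels move together.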

Differencing Channels on an Array

Often the difference between samples on a single array is the unit of analysis:

$M_{(ir).(kt)} = Y_{iRk} - Y_{rGk}$

Normalization is almost always done on this quantity.

In a reference design, the difference between treatments A and B can be estimated from 2 arrays by

$\hat{A} - \hat{B} = M_{(Ar).(kt)} - M_{(Br).(lu)}$

But there can be a large loss of information.

Var()=0.126 Var(M)=0.453

[Figure labeled $M_{(Ar).(kt)}$: Drosophila arrays courtesy of Bryce MacIver, PSU]

Reference Design

The reference sample is the same biological material on every array.

T treatments, k replicates, kT arrays

If there are technical dye-swaps, these are averaged to form 1 replicate.

If all comparisons are between treatments, there is no need to dye-swap. If there are dye-swaps, these should be balanced by treatment.

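Reference Design – Usual Analysis

Usually the analysis is done on M. E.g.

$\hat{A} - \hat{B} = M_{(Ar).(\cdot\cdot)} - M_{(Br).(\cdot\cdot)}$

With $\sigma^2$ the variance of the random deviation $\epsilon_{ijk}$, the variance of this estimate for a single replicate is $4\sigma^2$, and with k replicates, the variance of the estimated difference is $4\sigma^2/k$.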
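The $4\sigma^2$ figure follows from the linear mixed model in two steps. The sketch below uses $\alpha$, $\beta$, $\gamma$ for the treatment, dye and array effects and $\sigma^2$ for the residual variance (symbol choices made here), writes $M_A$ and $M_B$ for the differences on the two arrays, and assumes the residuals on different channels and arrays are independent.

```latex
\begin{align*}
M &= Y_{iRk} - Y_{rGk}
   = (\alpha_i - \alpha_r) + (\beta_R - \beta_G) + (\epsilon_{iRk} - \epsilon_{rGk}),
  & \operatorname{Var}(M) &= 2\sigma^2, \\
\hat{A} - \hat{B} &= M_A - M_B,
  & \operatorname{Var}(\hat{A} - \hat{B}) &= 4\sigma^2 .
\end{align*}
```

The array effect $\gamma_k$ cancels inside each difference, and the reference and dye terms cancel between the two arrays. Averaging k independent replicates divides the variance by k.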

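Reference Design – Optimal Weights

Consider using

$M^{w}_{(ir).(kt)} = Y_{iRk} - w\,Y_{rGk}$

Then

$\hat{A} - \hat{B} = M^{w}_{(Ar).(kt)} - M^{w}_{(Br).(lu)}$

Using the linear mixed model, with $\sigma^2$ the variance of the random deviation and $\sigma_\gamma^2$ the variance of the array effect, we see that the variance of one pair is

$2(1+w^2)\sigma^2 + 2(1-w)^2\sigma_\gamma^2$

The optimal w is

$w_{opt} = \sigma_\gamma^2 / (\sigma^2 + \sigma_\gamma^2)$

The resulting variance for a single replicate is

$\min \mathrm{Var} = 4\sigma^2 - 2\sigma^4/(\sigma^2 + \sigma_\gamma^2) = 2\sigma^2 + 2\sigma^2\sigma_\gamma^2/(\sigma^2 + \sigma_\gamma^2)$

a reduction of $2\sigma^4/(\sigma^2 + \sigma_\gamma^2)$ relative to the usual analysis,

and with k replicates, the variance of the estimated difference is

$4\sigma^2/k - 2\sigma^4/(k(\sigma^2 + \sigma_\gamma^2))$

a reduction of $2\sigma^4/(k(\sigma^2 + \sigma_\gamma^2))$.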
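The effect of the weight can be checked numerically. The sketch below is not from the talk: the function names and the variances `sigma2 = 1`, `sigma2_array = 3` are illustrative assumptions. It treats each weighted difference as treatment channel minus w times reference channel, so the array effect enters with coefficient (1 - w).

```python
def pair_variance(w, sigma2, sigma2_array):
    """Variance of the weighted estimate of A - B from one pair of arrays.

    On each array the weighted difference is Y_treatment - w * Y_reference,
    so the array effect has coefficient (1 - w) and the two residuals have
    coefficients 1 and -w. Two independent arrays are differenced.
    """
    one_array = (1 - w) ** 2 * sigma2_array + (1 + w ** 2) * sigma2
    return 2 * one_array


def optimal_weight(sigma2, sigma2_array):
    """Weight that minimizes pair_variance."""
    return sigma2_array / (sigma2 + sigma2_array)


if __name__ == "__main__":
    sigma2, sigma2_array = 1.0, 3.0  # illustrative values, not from the talk
    w_opt = optimal_weight(sigma2, sigma2_array)
    print("optimal weight:", w_opt)
    print("usual analysis (w = 1):", pair_variance(1.0, sigma2, sigma2_array))
    print("optimal weighting:", pair_variance(w_opt, sigma2, sigma2_array))
```

With w = 1 the function reduces to the usual analysis, so the gain from weighting is the gap between the last two printed values.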

Reference Design – Optimal Weights

We do not know the optimal weights, but if we use mixed model ANOVA such as those available in SAS, Splus or R, the weights are approximated from the data, leading to more efficient estimates.

Loop Designs

[Figure: treatments A, B, C, and D at the nodes of a loop of arrows.]

A loop is balanced for dye effects and has two replicates at each node.

T treatments, 2k replicates, Tk arrays

Recall: for a reference design we get only k replicates on Tk arrays
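The replicate counts can be checked by writing each design out as a list of hybridization pairs. A minimal sketch, assuming each array is recorded as a (red, green) tuple and that the reference sample always takes the green channel; the function names and the label "Ref" are illustrative, not from the talk.

```python
from collections import Counter


def loop_design(treatments):
    """One loop: each treatment is red on one array and green on the next."""
    n = len(treatments)
    return [(treatments[i], treatments[(i + 1) % n]) for i in range(n)]


def reference_design(treatments, reference="Ref"):
    """One array per treatment, always against the same reference sample."""
    return [(t, reference) for t in treatments]


def samples_per_treatment(arrays):
    """Count how many samples of each label the design uses."""
    return Counter(label for pair in arrays for label in pair)


if __name__ == "__main__":
    treatments = ["A", "B", "C", "D"]
    loop = loop_design(treatments)
    ref = reference_design(treatments)
    print("loop arrays:", loop)
    print("loop samples:", dict(samples_per_treatment(loop)))
    print("reference samples:", dict(samples_per_treatment(ref)))
    # Dye balance: every treatment is red exactly as often as it is green.
    reds = Counter(red for red, _ in loop)
    greens = Counter(green for _, green in loop)
    print("loop is dye balanced:", reds == greens)
```

Both designs use 4 arrays here, but the loop spends all 8 channels on treatments while the reference design spends half of them on the reference sample.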

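Loop Designs T=4, 4 arrays

[Figure: loop on A, B, C, D in which A is adjacent to B and D and opposite C.]

Using optimal weighting, with $\sigma^2$ the variance of the random deviation and $\sigma_\gamma^2$ the variance of the array effect:

Var(A-B) = Var(A-D) = $\sigma^2 + \sigma^2\sigma_\gamma^2 / (2(\sigma^2 + \sigma_\gamma^2))$

Var(A-C) = $\sigma^2 + \sigma^2\sigma_\gamma^2 / (\sigma^2 + \sigma_\gamma^2)$

Both are smaller than the variance of the reference design with 4 arrays, $2\sigma^2 + 2\sigma^2\sigma_\gamma^2 / (\sigma^2 + \sigma_\gamma^2)$.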
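These variances can be reproduced numerically with generalized least squares on the individual channels. The sketch below is an illustration, not the author's code: it assumes a model with one mean per treatment, a single red-minus-green dye effect, a random array effect with variance `sigma2_array`, and independent residuals with variance `sigma2`. The function name and the example variances are made up for the demonstration, and NumPy is assumed to be available.

```python
import numpy as np


def contrast_variance(arrays, treatments, contrast, sigma2, sigma2_array):
    """Generalized least squares variance of a treatment contrast.

    arrays       : list of (red_treatment, green_treatment), one per array
    treatments   : list of all treatment labels
    contrast     : dict mapping treatment label -> contrast coefficient
    sigma2       : residual variance of one channel
    sigma2_array : variance of the random array effect
    """
    index = {t: i for i, t in enumerate(treatments)}
    n_obs = 2 * len(arrays)
    # Fixed effects: one mean per treatment plus a red-minus-green dye effect.
    X = np.zeros((n_obs, len(treatments) + 1))
    V = np.zeros((n_obs, n_obs))
    for k, (red, green) in enumerate(arrays):
        r, g = 2 * k, 2 * k + 1
        X[r, index[red]] = 1.0
        X[g, index[green]] = 1.0
        X[r, -1], X[g, -1] = 0.5, -0.5
        # The two channels of one array share the array effect.
        V[r:g + 1, r:g + 1] = sigma2_array
        V[r, r] += sigma2
        V[g, g] += sigma2
    information = X.T @ np.linalg.solve(V, X)
    c = np.zeros(len(treatments) + 1)
    for t, coef in contrast.items():
        c[index[t]] = coef
    # A pseudo-inverse copes with designs where the dye effect is confounded
    # (for example a reference design without dye swaps).
    return float(c @ np.linalg.pinv(information, rcond=1e-10) @ c)


if __name__ == "__main__":
    s2, s2a = 1.0, 3.0  # illustrative variances only
    loop = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    reference = [(t, "Ref") for t in "ABCD"]
    print("loop Var(A-B):", contrast_variance(loop, list("ABCD"), {"A": 1, "B": -1}, s2, s2a))
    print("loop Var(A-C):", contrast_variance(loop, list("ABCD"), {"A": 1, "C": -1}, s2, s2a))
    print("reference Var(A-B):", contrast_variance(
        reference, list("ABCD") + ["Ref"], {"A": 1, "B": -1}, s2, s2a))
```

Because the calculation works from the list of arrays, the same function can be pointed at other candidate designs, such as several loops or a loop with a missing array.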

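Loop Designs T=4

[Figure: three loops on treatments A, B, C, D, labeled Design L4C, Design L4B, and Design L4D. The node labels are listed as A, C, B, D for L4C; A, B, C, D for L4B; and A, D, B, C for L4D.]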

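T=4, 12 arrays

Loop Design – 3 loops = 6 replicates/treatment

3 × L4C:

Var(A-B) = $\sigma^2/3 + \sigma^2\sigma_\gamma^2 / (6(\sigma^2 + \sigma_\gamma^2))$

Var(A-C) = $\sigma^2/3 + \sigma^2\sigma_\gamma^2 / (3(\sigma^2 + \sigma_\gamma^2))$

L4B + L4C + L4D:

Var(difference) = $\sigma^2/3 + 2\sigma^2\sigma_\gamma^2 / (3(3\sigma^2 + 4\sigma_\gamma^2))$

Reference Design – 3 replicates/treatment

Var(difference) = $2\sigma^2/3 + 2\sigma^2\sigma_\gamma^2 / (3(\sigma^2 + \sigma_\gamma^2))$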

T=4, 12 arrays

Assuming $\sigma_\gamma^2 / \sigma^2 = 3$ (variances in units of $\sigma^2$):

Loop Design – 3 loops = 6 replicates/treatment

3 × L4C:

Var(A-B) = 0.46

Var(A-C) = 0.58

L4B + L4C + L4D:

Var(difference) = 0.47

Reference Design – 3 replicates/treatment

Var(difference) = 0.83
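The numbers for the 12-array designs can be recomputed from closed-form variances. The sketch below rests on assumptions rather than on anything stated in the transcript: the formulas are the optimal-weight (generalized least squares) variances under the linear mixed model, with `sigma2` the residual variance and `sigma2_array` the array variance, and the ratio `sigma2_array / sigma2 = 3` is inferred because it reproduces the three loop values on the slide. All function names are illustrative.

```python
def loop_adjacent(sigma2, sigma2_array, n_loops=1):
    """Var of a difference between neighbors in a 4-treatment loop."""
    one_loop = sigma2 + sigma2 * sigma2_array / (2 * (sigma2 + sigma2_array))
    return one_loop / n_loops


def loop_opposite(sigma2, sigma2_array, n_loops=1):
    """Var of a difference between opposite nodes in a 4-treatment loop."""
    one_loop = sigma2 + sigma2 * sigma2_array / (sigma2 + sigma2_array)
    return one_loop / n_loops


def three_distinct_loops(sigma2, sigma2_array):
    """Var of any difference when the three distinct loops are combined."""
    return sigma2 / 3 + 2 * sigma2 * sigma2_array / (3 * (3 * sigma2 + 4 * sigma2_array))


def reference_design(sigma2, sigma2_array, replicates):
    """Var of a treatment difference in a reference design."""
    one_pair = 2 * sigma2 + 2 * sigma2 * sigma2_array / (sigma2 + sigma2_array)
    return one_pair / replicates


if __name__ == "__main__":
    s2, s2a = 1.0, 3.0  # assumed ratio sigma2_array / sigma2 = 3
    print("3 x L4C, Var(A-B):", round(loop_adjacent(s2, s2a, 3), 2))    # 0.46
    print("3 x L4C, Var(A-C):", round(loop_opposite(s2, s2a, 3), 2))    # 0.58
    print("L4B + L4C + L4D:  ", round(three_distinct_loops(s2, s2a), 2))  # 0.47
    print("reference, k = 3: ", round(reference_design(s2, s2a, 3), 2))
```

The three loop values match the slide. For the reference design this sketch gives about 1.17 rather than the 0.83 shown on the slide; the transcript does not explain the difference, so the slide may rest on an assumption about the reference sample that is not captured here. Either way the loop designs come out ahead.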

An 8 Treatment Example

[Figure: hybridization design on the eight treatments A, B, C, D, E, F, G, and H, arranged as 2 complete blocks.]

Replication:
- Yellow loop?
- Red “loop”?

Incorporating 2x2 Factorial in a Loop

[Figure: two loop arrangements of the four factorial treatments GT, Gt, gT, and gt. The node labels are listed as GT, gt, gT, Gt for the first loop and GT, gT, gt, Gt for the second.]

Which Arrangement is Better?

Incorporating 2x2 Factorial in a Loop

The contrasts of interest can be written (in terms of the means, not the observations):
- ½(A+B) - ½(C+D)
- ½(A+D) - ½(B+C)
- ½(A+C) - ½(B+D)

[Figure: loop on A, B, C, D.]

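Incorporating 2x2 Factorial in a Loop

The optimal variances are given for two groups of contrasts:
- ½(A+B) - ½(C+D) and ½(A+D) - ½(B+C) share one expression
- ½(A+C) - ½(B+D) has another

[The variance expressions themselves are not recoverable from the slide text.]

[Figure: loop on A, B, C, D.]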
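One way to see why the three contrasts fall into two groups is to split each array into a within-array difference and a within-array sum. The sketch below is a derivation under stated assumptions, not the expressions from the slide: the loop is taken to run A, B, C, D (so A is opposite C), arrays 1 to 4 carry the pairs (A,B), (B,C), (C,D), (D,A), $d_k$ and $s_k$ are the difference and sum of the two channels on array k, and $\sigma^2$ and $\sigma_\gamma^2$ are the residual and array variances.

```latex
\begin{align*}
\operatorname{Var}(d_k) &= 2\sigma^2, \qquad
\operatorname{Var}(s_k) = 2\sigma^2 + 4\sigma_\gamma^2, \\
\tfrac12(A+C) - \tfrac12(B+D):\quad
  & \tfrac14(d_1 - d_2 + d_3 - d_4), \qquad
  \operatorname{Var} = \tfrac12\sigma^2, \\
\tfrac12(A+B) - \tfrac12(C+D):\quad
  & \tfrac12(d_2 - d_4) \ \text{and}\ \tfrac12(s_1 - s_3), \qquad
  \operatorname{Var} = \sigma^2 \ \text{and}\ \sigma^2 + 2\sigma_\gamma^2, \\
\text{combined:}\quad
  & \left(\frac{1}{\sigma^2} + \frac{1}{\sigma^2 + 2\sigma_\gamma^2}\right)^{-1}
  = \frac{\sigma^2}{2} + \frac{\sigma^2\sigma_\gamma^2}{2(\sigma^2 + \sigma_\gamma^2)} .
\end{align*}
```

By symmetry ½(A+D) - ½(B+C) has the same combined variance, using ½(d_1 - d_3) and ½(s_4 - s_2). The contrast that compares opposite nodes with each other, ½(A+C) - ½(B+D), is estimated entirely within arrays and so has the smallest variance, which is why the placement of the factorial treatments around the loop decides which effect is estimated best.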

Incorporating 2x2 Factorial in a Loop

[Figure: the two loop arrangements of GT, Gt, gT, and gt again, with the captions below.]

- Best arrangement for estimating interaction
- Best arrangement for estimating time main effect

And now for the rest of the story

- Missing arrays: not fatal but reduce efficiency
- Added treatments

[Figure: a loop on A, B, C, D, and the same loop with an added treatment E.]

Optimal Design?

- The loop design has not been shown to be optimal
- There are lots of other BIBDs for 2 samples/block
- General BIBDs can be adapted as more channels become available
- Loop designs are particularly appealing due to the dye balance and graphical representation

The Moral of the Story

- Loop designs are very efficient
- Can incorporate factorial arrangements
- Can incorporate blocks
- Can be replicated in various ways to improve efficiency
- Optimal design ideas can help determine which BIBD to use
- ANOVA-type analyses on the individual channels, not differencing, should be used for analysis.

References

Kerr and Churchill (2001), Experimental design for gene expression microarrays, Biostatistics, 2:183-201.

Kerr (2003), Design considerations for efficient and effective microarray studies, Biometrics, 59:822-828.

Yang and Speed (2002), Design issues for cDNA microarray experiments, Nature Reviews Genetics, 3:579-588.
