
Page 1: Analysing time-course data KathyRuggiero pres2bioinformatics.org.au/ws/wp-content/uploads/sites/10/2016/07/2015... · Kathy Ruggiero k.ruggiero@auckland.ac.nz Department of Statistics

Kathy Ruggiero k.ruggiero@auckland.ac.nz

Department of Statistics

Analysing correlated time-course data

9 July 2015

Kathy Ruggiero and Kevin Chang

Time-course experiments

• Used to investigate temporal changes in a measured response

[Figure: example time course; x-axis: Time (minutes)]

Time-course experiments

• Important to know how measurements across time are obtained
– Each sample from a different experimental unit (EU) → Independent observations
– Multiple samples from each EU → Correlated observations
– Repeated measures experiments: Each EU contributes samples at all of the same time points
– Longitudinal studies: Each EU contributes samples through time, although not necessarily at the same time points

Main focus today

Repeated measures experiments

• Example: Rat lymph metabolomics experiment
– Description of the statistical design
– Statistical model and analysis for:
1. One time-point
2. Time-course, ignoring that the data are correlated
3. Time-course, accounting for correlated repeated measures

Phase 1 part of experiment

• Three surgical conditionings of lymph
– Sham
– IRI (Ischaemia Reperfusion Injury)
– IRI + IPC (Ischaemic Pre-Conditioning)

• Lymph collected at time (minutes)
– 0 (baseline), 10, 20, 80 and 320

Phase 1 part of experiment

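[Figure: timeline of the three surgical conditionings (Sham, IRI, IPC) from baseline through 10, 20, 80 and 320 minutes, marking the ischaemia (I) and reperfusion (R) periods]

I = Ischaemia; R = Reperfusion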

n = 18 rats per surgical conditioning (disease group)

Phase 2 part of experiment

• GC-MS metabolomics analysis
• 16 samples analysed per day
• Statistical design took this “batch effect” into account
• We will ignore this today, but…

It does matter!

One time-point: 320 minutes

Statistical linear model

response = systematic component + error component

• response = one numerical outcome

• systematic component = describes relationship between experimental conditions (e.g. surgeries) and response
– May be described by one, or more, explanatory variables

• error component = variation in response that is not explained by systematic component

One time-point: 320 minutes

Linear model for hexanoic acid (C6_0)

log(hexacid) = systematic component + error component

• response = log-abundance of hexanoic acid

• systematic component = Disease (surgical conditioning)

• error component = biological variation + measurement error

One time-point: 320 minutes

What does the raw data look like?

One time-point: 320 minutes

What about the log-transformed data?

One time-point: 320 minutes

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
Disease      2    0.63   0.3151        1   0.375
Residuals   51   16.07   0.3150

Analysis of Variance (ANOVA)

One time-point: 320 minutes

ANOVA

• Think of it as a “generalized” two-sample t-test
• It assumes:
– Observations are independent of one another
– Data are normally distributed
– Disease groups have equal variances

• Conclusion: No evidence of a difference in mean log(hexacid) abundance between surgeries (p = 0.375)

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
Disease      2    0.63   0.3151        1   0.375
Residuals   51   16.07   0.3150

One time-point: 320 minutes

ANOVA

• Disease Mean Sq (MS)?
– Variation between Disease group means + (Biological variation + Measurement error)

• Residuals MS?
– (Biological variation + Measurement error)

• F value = Ratio of Disease MS to Residuals MS

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
Disease      2    0.63   0.3151        1   0.375
Residuals   51   16.07   0.3150
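As a quick arithmetic check of the table above, the sketch below recomputes the mean squares, the F value and the p-value from the reported degrees of freedom and sums of squares. It is a minimal illustration in Python, not the software used for the slides. The closed-form tail probability is valid only because the numerator has exactly 2 degrees of freedom, and small discrepancies from the printed values come from rounding in the reported sums of squares.

```python
# Values copied from the ANOVA table for log(hexacid) at 320 minutes.
df_disease, ss_disease = 2, 0.63
df_resid, ss_resid = 51, 16.07

# Mean square = sum of squares / degrees of freedom.
ms_disease = ss_disease / df_disease
ms_resid = ss_resid / df_resid

# F value = ratio of the Disease MS to the Residuals MS.
f_value = ms_disease / ms_resid

# Upper-tail probability of an F(2, df_resid) distribution.
# This closed form holds only when the numerator df is exactly 2.
p_value = (1 + 2 * f_value / df_resid) ** (-df_resid / 2)

print(f"MS Disease = {ms_disease:.4f}, MS Residuals = {ms_resid:.4f}")
print(f"F = {f_value:.2f}, p = {p_value:.3f}")
```

An F value close to 1 says the variation between disease group means is no larger than would be expected from biological variation and measurement error alone.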

Time-course, ignoring correlation

Factorial experiment
• Two or more sets of experimental conditions
• A factor is a set of experimental conditions
• The values of a factor are called its levels
• Our time-course experiment:
– Disease status (factor): Sham, IRI, IPC (levels)
– Time (factor): 0, 10, 20, 80, 320 minutes (levels)

Time-course, ignoring correlation

Factorial experiment
• Study all factor-level (treatment) combinations

Disease   Time (minutes)
          0       10       20       80       320
Sham      S.0     S.10     S.20     S.80     S.320
IRI       IRI.0   IRI.10   IRI.20   IRI.80   IRI.320
IPC       IPC.0   IPC.10   IPC.20   IPC.80   IPC.320
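The crossing of the two factors can be sketched in a few lines of Python. This is only an illustration of how the 3 × 5 = 15 treatment combinations arise; the labels use the full word "Sham" where the slide abbreviates it to "S".

```python
from itertools import product

diseases = ["Sham", "IRI", "IPC"]   # levels of the Disease factor
times = [0, 10, 20, 80, 320]        # levels of the Time factor (minutes)

# Every Disease level is crossed with every Time level.
treatments = [f"{d}.{t}" for d, t in product(diseases, times)]

print(len(treatments))
print(treatments)
```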

Time-course, ignoring correlation

[Figure: Interaction between Disease and Time. Mean log-abundance, estimated from data, plotted at 0, 80 and 320 min for the Sham and IRI disease groups]

Interaction means the magnitude of the difference between IRI and Sham depends on the time at which sample was taken
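One way to make this concrete is to write the interaction as a "difference of differences". The sketch below uses made-up means (they are not values from the experiment) and a hypothetical helper, `group_difference`, purely to show the arithmetic.

```python
# Hypothetical mean log-abundances (NOT values from the experiment).
means = {
    ("Sham", 0): 9.0, ("Sham", 320): 9.5,
    ("IRI", 0): 9.0, ("IRI", 320): 10.5,
}

def group_difference(time):
    """IRI minus Sham mean log-abundance at one time point."""
    return means[("IRI", time)] - means[("Sham", time)]

diff_start = group_difference(0)
diff_end = group_difference(320)

# The interaction contrast is the difference between the two differences.
# Zero would mean parallel profiles, i.e. no interaction.
interaction = diff_end - diff_start

print(diff_start, diff_end, interaction)
```

With these illustrative numbers the IRI and Sham groups start equal and end one log-unit apart, so the contrast is non-zero and the profiles are not parallel.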

Time-course, ignoring correlation

Statistical linear model

response = systematic component + error component

• response = one numerical outcome

• systematic component = describes relationship between experimental conditions (e.g. surgeries and times) and response
– May be described by one, or more, explanatory variables

• error component = variation in response that is not explained by systematic component

Time-course, ignoring correlation

Linear model for hexanoic acid (C6_0)

log(hexacid) = systematic component + error component

• response = log-abundance of hexanoic acid

• systematic component = Disease + Time + Disease:Time

• error component = biological variation + measurement error

Actual hexanoic acid time-course data

Log-transformed abundances

Each line represents one rat

Time-course, ignoring correlation

Two-way ANOVA

• Conclusion: Strong time effect (p < 0.0001) but little statistical evidence of an interaction between Disease and Time (p = 0.089).

But observations are not independent!

Time-course, accounting for correlation

Linear model for hexanoic acid (C6_0)

log(hexacid) = systematic component + error component

• response = log-abundance of hexanoic acid

• systematic component can be partitioned further, i.e.

syst. comp = explanatory component + structural component
– explanatory component = Disease + Time + Disease:Time
– structural component = Rat (biological variation)

• error component = variation in response that is not explained by systematic component (other, not biological, variation)

Time-course, accounting for correlation

Systematic component
• Explanatory component: FIXED effects
– Main effects
  Disease (average over levels of Time)
  Time (average over levels of Disease)
– Disease:Time interaction effect

• Structural component
– Two RANDOM effects: Rat (between-rats variation) in addition to Error component (within-rats variation)

• Called a Linear MIXED Model (LMM)

Time-course, accounting for correlation

Linear MIXED model for hexanoic acid

log(hexacid) = Disease + Time + Disease:Time + Rat + error component

• Also known as:

– Linear mixed effects models

– Multilevel (or hierarchical) mixed models

• Provide a very flexible class of statistical models
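Written out symbolically, the linear mixed model for hexanoic acid might look as follows. The notation is an assumption introduced here for illustration and does not appear on the slides: y_{ijk} is the log-abundance for rat k in disease group i at time j, and the normal distributions are the usual default assumptions for this kind of model.

```latex
\begin{aligned}
y_{ijk} &= \mu + D_i + T_j + (DT)_{ij} + r_{k(i)} + e_{ijk}, \\
r_{k(i)} &\sim N(0, \sigma^2_{\mathrm{rat}}), \qquad
e_{ijk} \sim N(0, \sigma^2_{e}), \\
\operatorname{Cov}(y_{ijk},\, y_{ij'k}) &= \sigma^2_{\mathrm{rat}} \quad (j \ne j').
\end{aligned}
```

The shared rat term r_{k(i)} is what makes two measurements on the same rat correlated; measurements on different rats share no random term and so remain independent.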

Time-course, accounting for correlation

Data table

Disease  Time  Rat  Abundance
Sham        0    1       9.40
Sham       10    1       8.83
Sham       20    1      10.33
Sham       80    1      10.49
Sham      320    1       9.74
IRI         0    2      10.98
IRI        10    2       7.92
IRI        20    2       9.37
IRI        80    2       8.69
IRI       320    2      11.31
IPC         0    3       7.01
IPC        10    3       9.29

Not actual data

Time-course, accounting for correlation

Multilevel (or multi-stratum) ANOVA
(Can think of it as a generalized paired t-test)

• Conclusion: There is statistical evidence of an interaction between Disease and Time (p = 0.009).

Time-course, accounting for correlation

Profiles of mean log-Abundance

Each line represents a disease group mean
Error bars are 95% confidence intervals

Let’s compare the two analyses

Bottom half of multi-stratum ANOVA table

Complete two-way ANOVA table

Why the difference?

• The linear mixed model enables us to partition the variation into variation between rats and variation within rats (error component)
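The idea of partitioning variation can be illustrated with a small sum-of-squares calculation. The sketch below uses the first two rats from the example data table shown earlier, which the slides label "Not actual data", and it ignores the Disease and Time effects entirely. It only demonstrates the identity total = between-rats + within-rats and is not the analysis behind the reported p-values.

```python
from statistics import mean

# Illustrative values only: the first two rats from the slide's example
# data table, which is labelled "Not actual data".
abundance = {
    "rat1": [9.40, 8.83, 10.33, 10.49, 9.74],
    "rat2": [10.98, 7.92, 9.37, 8.69, 11.31],
}

all_values = [y for ys in abundance.values() for y in ys]
grand_mean = mean(all_values)

# Total variation: every observation about the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Between-rats variation: each rat's mean about the grand mean.
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2
                 for ys in abundance.values())

# Within-rats variation: each observation about its own rat's mean.
ss_within = sum((y - mean(ys)) ** 2
                for ys in abundance.values() for y in ys)

print(f"total            = {ss_total:.4f}")
print(f"between + within = {ss_between + ss_within:.4f}")
```

The two-way ANOVA that ignores correlation lumps both pieces into a single residual, whereas the mixed model tests Time and Disease:Time against the within-rats piece only.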

Experimental units of different sizes

The Rat (main plot)
• Largest experimental unit to which an individual treatment (Disease) is independently applied
– Samples taken across time-points within the same rat are subjected to homogeneous conditions

The samples across time-points (subplot)
• Smaller experimental unit, derived by subdivision of the main plot, to which a different individual treatment (Time) is independently applied

Sometimes called a split-plot in time

Pairwise comparisons of means

Lower and Upper are 95% confidence limits: "(log)" on the log-abundance scale, "(bt)" back-transformed.

Comparison    Difference  SED     LSD     Lower(log)  Upper(log)  Ratio    Lower(bt)  Upper(bt)  p-value
iri:1-sha:1   -0.2462     0.2006  0.3955  -0.6417      0.1493     -1.2791  -1.8996    -0.8613    0.2212
iri:2-sha:2    0.0618     0.2006  0.3955  -0.3337      0.4573      1.0638   0.7163     1.5798    0.7583
iri:3-sha:3    0.0595     0.2006  0.3955  -0.3360      0.4550      1.0613   0.7146     1.5762    0.7670
iri:4-sha:4    0.3408     0.2006  0.3955  -0.0547      0.7363      1.4061   0.9468     2.0882    0.0908
iri:5-sha:5   -0.1915     0.2006  0.3955  -0.5870      0.2040     -1.2110  -1.7985    -0.8154    0.3409
iri:1-ipc:1   -0.0852     0.2006  0.3955  -0.4807      0.3103     -1.0889  -1.6172    -0.7332    0.6715
iri:2-ipc:2   -0.2961     0.2006  0.3955  -0.6915      0.0994     -1.3445  -1.9968    -0.9053    0.1415
iri:3-ipc:3    0.2244     0.2006  0.3955  -0.1711      0.6199      1.2515   0.8427     1.8587    0.2646
iri:4-ipc:4   -0.1397     0.2006  0.3955  -0.5352      0.2558     -1.1500  -1.7078    -0.7743    0.4868
iri:5-ipc:5    0.0625     0.2006  0.3955  -0.3330      0.4580      1.0644   0.7167     1.5808    0.7558
ipc:1-sha:1   -0.1610     0.2006  0.3955  -0.5564      0.2345     -1.1746  -1.7445    -0.7909    0.4233
ipc:2-sha:2    0.3579     0.2006  0.3955  -0.0376      0.7534      1.4303   0.9631     2.1241    0.0759
ipc:3-sha:3   -0.1649     0.2006  0.3955  -0.5604      0.2306     -1.1792  -1.7513    -0.7940    0.4121
ipc:4-sha:4    0.4805     0.2006  0.3955   0.0850      0.8760      1.6169   1.0888     2.4014    0.0175
ipc:5-sha:5   -0.2539     0.2006  0.3955  -0.6494      0.1416     -1.2891  -1.9144    -0.8680    0.2070
iri:1-iri:2   -0.5877     0.1636  0.3225  -0.9102     -0.2651     -1.7998  -2.4849    -1.3036    0.0004
iri:2-iri:3    0.1536     0.1636  0.3225  -0.1690      0.4761      1.1660   0.8445     1.6098    0.3489
iri:3-iri:4   -0.8436     0.1636  0.3225  -1.1662     -0.5211     -2.3247  -3.2096    -1.6838    0.0000
iri:4-iri:5    0.1955     0.1636  0.3225  -0.1271      0.5180      1.2159   0.8807     1.6787    0.2335
ipc:1-ipc:2   -0.7985     0.1636  0.3225  -1.1211     -0.4760     -2.2223  -3.0682    -1.6096    0.0000
ipc:2-ipc:3    0.6740     0.1636  0.3225   0.3515      0.9966      1.9621   1.4212     2.7090    0.0001
ipc:3-ipc:4   -1.2077     0.1636  0.3225  -1.5303     -0.8852     -3.3459  -4.6194    -2.4234    0.0000
ipc:4-ipc:5    0.3977     0.1636  0.3225   0.0751      0.7202      1.4884   1.0780     2.0549    0.0159
sha:1-sha:2   -0.2797     0.1636  0.3225  -0.6023      0.0428     -1.3228  -1.8263    -0.9581    0.0888
sha:2-sha:3    0.1513     0.1636  0.3225  -0.1713      0.4738      1.1633   0.8426     1.6061    0.3562
sha:3-sha:4   -0.5623     0.1636  0.3225  -0.8849     -0.2398     -1.7547  -2.4227    -1.2710    0.0007
sha:4-sha:5   -0.3368     0.1636  0.3225  -0.6593     -0.0142     -1.4004  -1.9335    -1.0143    0.0408

Pairwise comparisons of means

• Back-transform to original scale
– iri:4 – sha:4: Ratio = exp(0.3408) = 1.4061 (Differences ≥ 0 ⇒ Ratios ≥ 1)
– iri:1 – sha:1: Ratio = exp(-0.2462) = 0.7818 (Differences < 0 ⇒ 0 < Ratios < 1)

Too small to see on graphs, so we calculate: -1/Ratio = -1/exp(-0.2462) = -1.279, whose magnitude is sha:1 / iri:1 (Ratios < -1)

Comparison    Difference  SED     LSD     Lower    Upper
iri:1-sha:1   -0.2462     0.2006  0.3955  -0.6417  0.1493
iri:2-sha:2    0.0618     0.2006  0.3955  -0.3337  0.4573
iri:3-sha:3    0.0595     0.2006  0.3955  -0.3360  0.4550
iri:4-sha:4    0.3408     0.2006  0.3955  -0.0547  0.7363
iri:5-sha:5   -0.1915     0.2006  0.3955  -0.5870  0.2040

Lower and Upper are 95% confidence limits on the log-abundance scale
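The back-transformation rule described above can be sketched as a small function. The name `signed_ratio` is hypothetical, and natural logarithms are assumed because the slide uses exp(). The confidence limits on the log scale are taken to be Difference ± LSD, which agrees with the Lower and Upper columns of the table.

```python
import math

def signed_ratio(log_difference):
    """Back-transform a difference in log-abundance to a signed fold change.

    Non-negative differences give exp(d) >= 1.  Negative differences would
    give a ratio between 0 and 1, so they are reported as -1/exp(d) < -1.
    """
    ratio = math.exp(log_difference)
    return ratio if log_difference >= 0 else -1.0 / ratio

# Differences and LSD taken from the table of pairwise comparisons.
lsd = 0.3955
for label, diff in [("iri:4-sha:4", 0.3408), ("iri:1-sha:1", -0.2462)]:
    lower, upper = diff - lsd, diff + lsd   # 95% limits on the log scale
    print(label, round(signed_ratio(diff), 4), round(lower, 4), round(upper, 4))
```

Reporting -1/Ratio keeps increases and decreases on a comparable scale, so a fold change of 1.28 in either direction appears as 1.28 or -1.28 instead of 1.28 or 0.78.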

Pairwise comparisons of means

Comparison    Ratio    Lower    Upper    p-value
iri:1-sha:1   -1.2791  -1.8996  -0.8613  0.2212
iri:2-sha:2    1.0638   0.7163   1.5798  0.7583
iri:3-sha:3    1.0613   0.7146   1.5762  0.7670
iri:4-sha:4    1.4061   0.9468   2.0882  0.0908
iri:5-sha:5   -1.2110  -1.7985  -0.8154  0.3409

Lower and Upper are 95% confidence limits, back-transformed

Why is repeated measures ANOVA different…

…to two-way ANOVA ignoring correlated observations?
• ANOVA assumes independent errors
• Split-plot in time ANOVA assumes
– Observations on different experimental units are uncorrelated (i.e. independent of one another)
– All pairs of observations on the same experimental unit have the same covariance
• In repeated measures designs
– Adjacent observations are likely to be more highly correlated than more distant observations
– Problem involves the covariance structure, i.e. the error variance-covariance matrix
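The contrast between the two covariance structures can be seen by writing out the matrices for the five time points of one rat. In the sketch below the variance parameters are hypothetical, and AR(1) is used only as one common example of a structure in which correlation decays with separation; the slides do not name a specific alternative. Note also that this simple AR(1) form indexes time points by position and so ignores the unequal spacing of 0, 10, 20, 80 and 320 minutes.

```python
def compound_symmetry(n_times, var_rat, var_error):
    """Equal covariance for every pair of time points on the same rat."""
    return [[var_rat + (var_error if i == j else 0.0)
             for j in range(n_times)] for i in range(n_times)]

def ar1(n_times, variance, rho):
    """First-order autoregressive: correlation decays as rho**|i - j|."""
    return [[variance * rho ** abs(i - j)
             for j in range(n_times)] for i in range(n_times)]

# Hypothetical variance parameters, chosen only for illustration.
cs = compound_symmetry(5, var_rat=0.12, var_error=0.24)
ar = ar1(5, variance=0.36, rho=0.6)

for name, matrix in [("compound symmetry", cs), ("AR(1)", ar)]:
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{v:.3f}" for v in row))
```

In the compound-symmetry matrix every off-diagonal entry is the same, which is the split-plot in time assumption; in the AR(1) matrix the entries shrink as the two time points move further apart.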

What else could we have done?
• Adjust for each rat’s baseline levels of hexanoic acid
• Allow for non-constant correlations between pairs of time-points
– Variance models
• Model the shapes of the profiles
– Random coefficient regression
– Smoothing splines

What else could we have done?
• Multivariate linear mixed model
– Simultaneous analysis of data from all metabolites
– Each metabolite draws on information from other metabolites
– Each metabolite has its own variance
• Beyond the scope of this talk

The End

Thank you!

Principles of design: Structure in the experimental units

Experimental units of different sizes

Disease status main effect
• Comparisons between levels of Disease status, averaging over levels of Organ
• Which source(s) of non-systematic variation contribute to protein abundance for each Disease status?

Organ main effect
• Comparisons between levels of Organ, averaging over levels of Disease status
• Which source(s) of non-systematic variation contribute to protein abundance for each Organ?

Principles of design: Structure in the experimental units

Experimental units of different sizes

Disease:Organ interaction
• There are four (factorial) treatment groups:
– D:I, D:O, H:I and H:O
• Which sources of non-systematic variation contribute to these means?
• And which sources of non-systematic variation contribute to these two pairs of comparisons between means?
1. D:I-D:O and H:I-H:O
2. D:O-H:O and D:I-H:I

Principles of design: Structure in the experimental units

Experimental units of different sizes

Disease:Organ interaction
• Sources of variation contributing to these comparisons?
1. D:I-D:O and H:I-H:O
2. D:O-H:O and D:I-H:I

1. Between subplots within the same main plot → Variation within Animals + measurement error

2. Between subplots in different main plots → Variation between- and within-Animals + measurement error

Time-course, accounting for correlation

Linear MIXED model for hexanoic acid

log(hexacid) = Disease + Time + Disease:Time + Rat + error component

• Comparisons between Diseases are between rats

• Comparisons between Times are within rats

• Comparisons between Times for the same Disease are within rats

• Comparisons between Diseases for the same time-point involve a combination of between and within rats
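These statements are consistent with the two standard errors of difference (SED) in the pairwise comparisons table: 0.1636 for two times within the same disease group, and the larger 0.2006 for two disease groups at the same time. The sketch below is a back-of-envelope reading of those reported values, not output from the authors' analysis. It assumes the textbook balanced split-plot formulas and n = 18 rats per group at every time point, and backs out the variance components those SEDs would imply.

```python
import math

n_rats = 18           # rats per disease group, from the design slide
sed_within = 0.1636   # reported SED: two times, same disease
sed_between = 0.2006  # reported SED: two diseases, same time

# Assumed balanced split-plot formulas (not stated on the slides):
#   sed_within  = sqrt(2 * var_error / n)
#   sed_between = sqrt(2 * (var_rat + var_error) / n)
var_error = n_rats * sed_within ** 2 / 2
var_rat = n_rats * sed_between ** 2 / 2 - var_error

print(f"within-rat (error) variance ~ {var_error:.3f}")
print(f"between-rat variance        ~ {var_rat:.3f}")

# Round trip: the implied variances reproduce the reported SEDs.
assert math.isclose(math.sqrt(2 * var_error / n_rats), sed_within)
assert math.isclose(math.sqrt(2 * (var_rat + var_error) / n_rats), sed_between)
```

Under these assumptions, within-rat comparisons are more precise because the rat-to-rat variation cancels when both means come from the same animals.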