Kathy Ruggiero [email protected]
Department of Statistics
A web-based personal assistant for designing statistically sound
experiments
A web-based personal assistant for designing statistically sound
experiments
9 July 2015
Kathy Ruggiero and David Banks
Kathy Ruggiero [email protected]
All too often…
Kathy Ruggiero [email protected]
Inevitably…
• Wrong data is collected– Cannot answer the questions for which they
were intended
OR
• Data is inefficiently collected– Comparisons between treatment means made
with lower precision than may otherwise have been possible
Kathy Ruggiero [email protected]
The solution“Use statistically designed experiments to ensure that the “right answers” are found with minimum
effort, subjects and other resources.”
• Three fundamental principles (Fisher, 1926):– Replication, to enable separation of signal and noise– Randomisation, to eliminate bias and induce
independence– Blocking, to control for systematic nuisance sources of
variation
Kathy Ruggiero [email protected]
The problem• Many software packages with experimental
design capabilities but…– User must know a priori the “layout” of the design, e.g. Completely randomised design Randomized complete block design Balanced incomplete block design Row-column design Split-plot design Etc., etc., etc.!
• Large body of literature, largely inaccessible to novices
Kathy Ruggiero [email protected]
Our idea“An expert system with the embedded requisite
knowledge which interactively guides usersthrough the thinking needed to develop an efficient
statistically designed experiment.”
Kathy Ruggiero [email protected]
ConceptionC2D: Concept 2 Design
Objectives
Measureable responses
Experimental factors
Experimental material
Nuisance variables
Generalisability
Finish
PDF Summary
Design of Experiment
Start
Research questions
Kathy Ruggiero [email protected]
Example: Simulating a high CO2 world
Experiment: Simulate a high CO2 world
Identify/quantify metabolite, protein and gene transcript abundances
Goal: Elucidate molecular mechanisms involved in calcification
Kathy Ruggiero [email protected]
How were samples obtained?
http://uncw.edu/aquaculture/images/Larval-rearing-tanks.jpg
Divide into 3 cultures
Mid = 540 ppmHigh = 1000 ppm
Control = 380ppm (current level) Until 4-arm stage
http://www.ceoe.udel.edu/Antarctica/pluticon.jpg
Male:Female pair
Kathy Ruggiero [email protected]
How were samples obtained?
8 Male:Female pairs
We will focus on proteomics and assume one sample is taken from each tank.
Kathy Ruggiero [email protected]
Front matter
Kathy Ruggiero [email protected]
The most important question?
Why am I collecting this data?
How will each variable you measure enable you to answer your research question(s)?
Kathy Ruggiero [email protected]
Why this experiment?Why is this information being collected?
To direct you towards writing focused statements about the investigative questions you want your experiment to answer.
Kathy Ruggiero [email protected]
Why is this information being collected?To direct you towards thinking about the variable(s) you will measure
on the subjects of your experiment.
What will you measure?
Kathy Ruggiero [email protected]
Factors you will deliberately vary?Why is this information being collected?
To direct you towards thinking about the experimental factors you will study in your experiment for the purpose of learning how they may help to explain observed changes in your measureable responses.
Kathy Ruggiero [email protected]
Factors you will deliberately vary?
Kathy Ruggiero [email protected]
Factors you will deliberately vary?
Kathy Ruggiero [email protected]
The subjects of your experiment?Why is this information being collected?
To direct you towards thinking about:1. Who or what are the subjects of your experiment?2. How will you apply the experimental treatments to them?3. How will they be managed for making measurements on them?
Kathy Ruggiero [email protected]
The subjects of your experiment?
Kathy Ruggiero [email protected]
Other experimental variables?Why is this information being collected?
To direct you towards thinking about:1. The source of each portion of experimental material. I.e. is each
derived from:a. an independent source?b. a common source?c. a combination of both independent and common sources?
2. How the experimental material will be:a. processed and/orb. arranged in time and/or spacein order to make measurements on them.
3. How 1 and 2 may contribute to systematic differences in the observed values on your measureable variables.
Kathy Ruggiero [email protected]
Other experimental variables?
Group of subjects = culture or tank3 per parent-pair
Male:Female parent-pair
Common source:A male:female pair
Kathy Ruggiero [email protected]
Other experimental variables?How will the experimental material will be:
a. processed and/orb. arranged in time and/or space
in order to make measurements on them?
8 samples can be simultaneously analysed in a single mass spec run (ignoring isobaric tags used to label samples)
Yes
Can each run accommodate a sample from each male:female pair?
Kathy Ruggiero [email protected]
Other experimental variables?How will the experimental material will be:
a. processed and/orb. arranged in time and/or space
in order to make measurements on them?
Male:Female Pair1 2 3 4 5 6 7 8
Run
1
2
3
Kathy Ruggiero [email protected]
Other experimental variables?Cells with same fill pattern = Samples analysed in a runCells with same fill colour = Samples from common source
Male:Female Pair1 2 3 4 5 6 7 8
Run
1
2
3
Every cell colour (M:F pair) occurs with every pattern (run)
Row-Column design
Kathy Ruggiero [email protected]
Other experimental variables?
Male:Female Pair1 2 3 4 5 6 7 8
Run
1 380 540 380 1000 540 380 540
2 540 380 1000 540 380 1000 540 1000
3 1000 540 380 1000 540 380 1000 380
Allocation of CO2 levels to row-column arrangementFull complement of CO2 levels in columns
CO2 levels balanced within runsBiological variation orthogonal to run-to-run variation
Kathy Ruggiero [email protected]
Our design versus theirs?
• 3 runs• 12 residual d.f.• 98.4% efficiency• CO2 levels estimated
independently of tags
• 4 runs (extra $4500 )• 14 residual d.f.• 100% efficiency• CO2 levels partially
confounded with tags
Design from ConceptionC2D
Their design
Kathy Ruggiero [email protected]
Output
• Statistical design (CSV)
Run Pair CO21 1 3801 2 5401 3 10001 4 3801 5 540
Kathy Ruggiero [email protected]
What’s next
• Beta test it– Is it accessible and easy-to-use?Want to try it out? Email me: [email protected]
• Make it “smart”– An adaptive expert system which learns from
its interactions with end-users• Make it available on smart devices