introduction to the design and analysis of...
TRANSCRIPT
![Page 1: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/1.jpg)
Introduction to the Design and Analysis ofExperiments
Prof. Moacir Pontiwww.icmc.usp.br/~moacir
Instituto de Ciências Matemáticas e de Computação – USP
2018/1
![Page 2: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/2.jpg)
Agenda
SamplingSampling strategiesExplanatory and response variables
Experiments
Examining data and variables
Hypothesis test
![Page 3: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/3.jpg)
Sampling and Variables
Census vs Sampling
I It is rare to need a censusI Sampling is often sufficient if it is representative, but implies
to accept some errors
Variables
I Numeric: discrete/continuousI Categoric: ordinal/non-ordinal
![Page 4: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/4.jpg)
Sampling and Variables
Exemple: check salt in a pan
I Exploratory analysis: sampling (why not a census?)
I To conclude if needs more salt: inferenceI We need a representative sampling, which requires
randomness
![Page 5: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/5.jpg)
Sampling and Variables
Exemple: check salt in a pan
I Exploratory analysis: sampling (why not a census?)I To conclude if needs more salt: inference
I We need a representative sampling, which requiresrandomness
![Page 6: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/6.jpg)
Sampling and Variables
Exemple: check salt in a pan
I Exploratory analysis: sampling (why not a census?)I To conclude if needs more salt: inferenceI We need a representative sampling, which requires
randomness
![Page 7: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/7.jpg)
Anecdotal evidence
I I met someone who was cured from asthma by homeopathy, soIt must work.
I Testimonials on the Internet sais that garlic supplement helpedsome people to lose weight, so garlic should be an effective forweight-loss.
I My grandfather smoked and drank his whole life and lived untilhe was 95, so it is no unhealthy to drink and smoke.
I Today is 6oC, so global warming is a hoax.
![Page 8: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/8.jpg)
Anecdotal evidence
It is based on data, however there are some issues
I Data only represent few casesI It is not clear if those are representativeI Not necessarily the evidence is valid to falsify some claim!
![Page 9: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/9.jpg)
Sampling bias
Convenience sample
Easily accesible sample
Non-response
Only a fraction of a random sample responds or has interest onparticipating
![Page 11: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/11.jpg)
Sampling bias and i.i.d.
![Page 12: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/12.jpg)
Sampling strategies
Simple random sampling
![Page 13: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/13.jpg)
Sampling strategies
Estratified sampling
![Page 14: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/14.jpg)
Sampling strategies
Clustering sampling
![Page 15: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/15.jpg)
Explanatory and response variables
Question
I Is the classification accuracy of plants in images lower fornatural images with higher levels of image compression?
I Explanatory variablemight affect−−−−−−−→ Response Variable
I Independent variablemight affect−−−−−−−→ Dependent variable
![Page 16: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/16.jpg)
Explanatory and response variables
Question
I Is the classification accuracy of plants in images lower fornatural images with higher levels of image compression?
I Explanatory variablemight affect−−−−−−−→ Response Variable
I Independent variablemight affect−−−−−−−→ Dependent variable
![Page 17: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/17.jpg)
Agenda
SamplingSampling strategiesExplanatory and response variables
Experiments
Examining data and variables
Hypothesis test
![Page 18: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/18.jpg)
Experiments
Try to establish causal relations, correlations, or comparisons
1. Control: compare intervention with some control group,2. Randomization: remove bias by experimenting over a
randomized set of exemples (e.g. used for training/test, usedto tune parameters and validate),
3. Replication: the more cases are observed, the more accurateare the estimates (e.g. cross-validation, repeatedsubsampling,etc.)
4. Blocking: evaluate some method in differentblocks/scenarios, in a separate way.
Also common in experiments, but less common in computerscience: placebo, placebo effect, blind / double-blind.
![Page 20: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/20.jpg)
Agenda
SamplingSampling strategiesExplanatory and response variables
Experiments
Examining data and variables
Hypothesis test
![Page 21: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/21.jpg)
Measures and transformations
Measures of center and dispersion
I Mean and standard deviationI Median and Interquartile Range (IQR)
Transformation
I Logarithm, Exponential, Squared-RootI Normalization
Examples:
library(MASS)data(cars)data(cats)data(pressure)
![Page 22: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/22.jpg)
Measures and transformations
cars_o <- data.frame(speed=c(19,19,20,20,20), dist=c(190, 206, 210, 220, 238))cars2 <- rbind(cars, cars_o)
# statisticsmean(cars2$dist)sd(cars2$dist)
# robust statisticmedian(cars2$dist)IQR(cars2$dist)
plot(cars2) # original dataplot(log(cars2)) # logplot(sqrt(cars2)) # sqrt
plot(pressure) # original dataplot(log(pressure)) # log transformation
![Page 23: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/23.jpg)
Result analysis
Plots
I ScatterplotsI Boxplots
Example:
boxplot(cats$Bwt ~ cats$Sex)
cats_o <- data.frame(Sex=c(’M’,’M’,’F’), Bwt=c(1.1,1.5,4.2), Hwt=c(10.0, 24.2, 20.1))cats2 <- rbind(cats, cats_o)
boxplot(cats2$Bwt ~ cats2$Sex)
![Page 24: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/24.jpg)
Linear regressionFits a line on datapoints coming from two variables: onedependent, and one ore more independent.If correlation is 6= 0, then: 1) A causes B , 2) B causes A, 3) somevariable C causes A and B , 4) A causes C that causes B , or 5)correlation between A and B is coincidental.
model1 <- lm(cars$dist ~ cars$speed)summary(model1)plot(cars)abline(model1)
model2 <- lm(cars2$dist ~ cars2$speed)summary(model2)plot(cars2)abline(model2)
![Page 26: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/26.jpg)
Correlation and Linear Regression
ComputersciencedoctoratesAr
cade
rev
enue
Totalrevenuegeneratedbyarcadescorrelateswith
ComputersciencedoctoratesawardedintheUS
Computersciencedoctorates Arcaderevenue
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
500degrees
1000degrees
1500degrees
2000degrees
$1billion
$1.25billion
$1.5billion
$1.75billion
$2billion
tylervigen.com
Thanks http://tylervigen.com/
![Page 27: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/27.jpg)
Correlation and Linear Regression
OBS: for r = 0.25, squared correlation is R2 = 0.06McNally et al (2012) http://www.i-jmr.org/2012/1/e1/
![Page 28: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/28.jpg)
Agenda
SamplingSampling strategiesExplanatory and response variables
Experiments
Examining data and variables
Hypothesis test
![Page 29: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/29.jpg)
Hypothesis test
1. Specifies null hypothesis and alternative hypothesis2. Assumes null hypothesis is true and compute test statistic3. Computes p-value: if null hypothesis is true, what is the
probability of observing some as extreme as those we have?I if p is below some threshold α (which is the probability of error
type I), the null hypothesis is rejected;I otherwise, do not reject null hypothesis.
![Page 30: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/30.jpg)
Hypothesis test
I t-Student: for data under normal distribution;I Wilcoxon: non-parametrical, uses rankings
I ANOVA: analyzes multiple sets via F statistics.I Kruskal-Wallis: non-parametric version
![Page 31: Introduction to the Design and Analysis of Experimentswiki.icmc.usp.br/images/1/19/Mpc.03-Experiments_EN.pdf · Sampling and Variables Census vs Sampling I Itisraretoneedacensus I](https://reader034.vdocuments.us/reader034/viewer/2022042311/5ed9a52e8765f51e2e27913c/html5/thumbnails/31.jpg)
Hypothesis test
Statisticians issue warning over misuse of P values
“Misuse of the P value — a common test for judging the strengthof scientific evidence — is contributing to the number of researchfindings that cannot be reproduced...”
http://www.nature.com/news/
statisticians-issue-warning-over-misuse-of-p-values-1.19503