![Page 1: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/1.jpg)
Concepts of Statistical Inference: A Randomization-Based Curriculum
Allan Rossman, Beth Chance, John Holcomb
Cal Poly – San Luis Obispo, Cleveland State University
![Page 2: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/2.jpg)
CAUSE Webinar April 2009
Outline
- Overview, motivation
- Three examples
- Merits, advantages
- Five questions
- Assessment issues
- Conclusions, lessons learned
- Q&A
![Page 3: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/3.jpg)
Ptolemaic Curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.”
– George Cobb (TISE, 2007)
![Page 4: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/4.jpg)
Is randomization-based approach feasible?
- Experience at post-calculus level
  - Developed spiral curriculum with logic of inference (Fisher’s Exact Test) in chapter 1
  - ISCAM: Investigating Statistical Concepts, Applications, and Methods
- New project
  - Rethinking for lower mathematical level
  - More complete shift, including focus on entire statistical process as a whole
![Page 5: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/5.jpg)
Example 1: Helper/hinderer?
- Sixteen infants were shown two videotapes with a toy trying to climb a hill
  - One where a “helper” toy pushes the original toy up
  - One where a “hinderer” toy pushes the toy back down
- Infants were then presented with the two toys as wooden blocks
  - Researchers noted which toy infants chose
http://www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html
![Page 6: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/6.jpg)
Example 1: Helper/hinderer?
- Data: 14 of the 16 infants chose the “helper” toy
- Core question of inference:
  - Is such an extreme result unlikely to occur by chance (random selection) alone, if there were no genuine preference (null model)?
![Page 7: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/7.jpg)
Analysis options
- Could use a binomial probability calculation
- We prefer a simulation approach
  - To emphasize issue of “how often would this happen in long run?”
  - Starting with tactile simulation
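For comparison with the simulation approach, the binomial calculation mentioned on this slide can be sketched in a few lines. This is only an illustration, assuming the null model is 16 independent choices, each with probability 0.5 of picking the helper toy; the function name `binomial_upper_tail` is ours and does not come from the talk.

```python
from math import comb

def binomial_upper_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# 14 or more "helper" choices out of 16 infants, assuming no genuine preference
p_value = binomial_upper_tail(16, 14)
print(f"P(14 or more of 16 | no preference) = {p_value:.4f}")
```

The exact tail probability is 137/65536, or roughly 0.002.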
![Page 8: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/8.jpg)
Strategy
- Students flip a fair coin 16 times
  - Count number of heads, representing choices of “helper” toy
  - Fair coin represents null model of no genuine preference
- Repeat several times, combine results
  - See how surprising to get 14 or more heads even with “such a small sample size”
  - Approximate (empirical) P-value
- Turn to applet for large number of repetitions: http://statweb.calpoly.edu/bchance/applets/BinomDist3/BinomDist.html
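The coin-flipping strategy can also be mimicked in code. The sketch below is not the applet linked above; it is a minimal stand-in that assumes 10,000 repetitions and a fixed seed, both chosen here only for illustration.

```python
import random

def simulate_heads(n_flips=16, n_reps=10_000, seed=2009):
    """Flip a fair coin n_flips times, n_reps times over; return the heads counts."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_reps)]

counts = simulate_heads()
observed = 14  # infants who chose the helper toy

# Empirical P-value: fraction of repetitions at least as extreme as the data
empirical_p = sum(c >= observed for c in counts) / len(counts)
print(f"Approximate P-value: {empirical_p:.4f}")
```

Each repetition plays the role of one student's 16 tosses, and the combined results give the approximate (empirical) P-value.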
![Page 9: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/9.jpg)
Results
Pretty unlikely to obtain 14 or more heads in 16 tosses of a fair coin, so …
Pretty strong evidence that infants do have genuine preference for helper toy and were not just picking at random
![Page 10: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/10.jpg)
Example 2: Dolphin therapy?
- Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment
- Is dolphin therapy more effective than control?
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

| | Dolphin therapy | Control group | Total |
|---|---|---|---|
| Subject improved | 10 | 3 | 13 |
| Subject did not | 5 | 12 | 17 |
| Total | 15 | 15 | 30 |
| Proportion | 0.667 | 0.200 | |
![Page 11: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/11.jpg)
Some approaches
- Could calculate test statistic, P-value from approximate sampling distribution (z, chi-square)
  - But it’s approximate
  - But conditions might not hold
  - But how does this relate to what “significance” means?
- Could conduct Fisher’s Exact Test
  - But there’s a lot of mathematical start-up required
  - But that’s still not closely tied to what “significance” means
    - Even though this is a randomization test
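To make the Fisher's Exact Test option concrete, here is a hedged sketch of the one-sided hypergeometric calculation for the dolphin table (13 improvers among 30 subjects, 15 assigned to dolphin therapy, 10 improvers observed in that group). The function name `fisher_upper_tail` is an assumption made for this illustration.

```python
from math import comb

def fisher_upper_tail(improvers, non_improvers, group_size, observed):
    """P(at least `observed` improvers land in a group of `group_size`)
    when all subjects are assigned to groups completely at random."""
    total = improvers + non_improvers
    max_in_group = min(improvers, group_size)
    favorable = sum(
        comb(improvers, k) * comb(non_improvers, group_size - k)
        for k in range(observed, max_in_group + 1)
    )
    return favorable / comb(total, group_size)

p_value = fisher_upper_tail(improvers=13, non_improvers=17, group_size=15, observed=10)
print(f"One-sided exact P-value: {p_value:.4f}")
```

The counting argument behind `comb` is the "mathematical start-up" the slide refers to; the result is a one-sided P-value of about 0.013.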
![Page 12: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/12.jpg)
Alternative approach
- Simulate random assignment process many times, see how often such an extreme result occurs
  - Assume no treatment effect (null model)
  - Re-randomize 30 subjects to two groups (using cards)
    - Assuming 13 improvers, 17 non-improvers regardless
  - Determine number of improvers in dolphin group
    - Or, equivalently, difference in improvement proportions
  - Repeat large number of times (turn to computer)
  - Ask whether observed result is in tail of distribution
    - Indicating saw a surprising result under null model
    - Providing evidence that dolphin therapy is more effective
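The card-shuffling procedure described above can be sketched as follows. This is an illustrative stand-in for the cards and the applet, assuming 10,000 repetitions and a fixed seed; the names `rerandomize` and `dolphin_counts` are ours.

```python
import random

def rerandomize(n_improvers=13, n_non_improvers=17, group_size=15,
                n_reps=10_000, seed=2009):
    """Shuffle 30 'cards' and deal 15 to the dolphin group, many times over.
    Returns the number of improvers in the dolphin group for each shuffle."""
    rng = random.Random(seed)
    cards = [1] * n_improvers + [0] * n_non_improvers  # 1 = improver
    results = []
    for _ in range(n_reps):
        rng.shuffle(cards)                      # null model: no treatment effect
        results.append(sum(cards[:group_size]))  # improvers dealt to dolphin group
    return results

dolphin_counts = rerandomize()
observed = 10  # improvers actually seen in the dolphin group

approx_p = sum(c >= observed for c in dolphin_counts) / len(dolphin_counts)
print(f"Approximate P-value: {approx_p:.4f}")
```

Because both group sizes and the number of improvers are held fixed, counting improvers in the dolphin group is equivalent to tracking the difference in improvement proportions.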
![Page 13: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/13.jpg)
Analysis
http://www.rossmanchance.com/applets/Dolphins/Dolphins.html
![Page 14: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/14.jpg)
Conclusion
- Experimental result is statistically significant
- And what is the logic behind that?
  - Observed result very unlikely to occur by chance (random assignment) alone (if dolphin therapy was not effective)
![Page 15: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/15.jpg)
Example 3: Lingering sleep deprivation?
- Does sleep deprivation have harmful effects on cognitive functioning three days later?
  - 21 subjects; random assignment
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
[Figure: dotplots of improvement by sleep condition (deprived, unrestricted)]
![Page 16: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/16.jpg)
One approach
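Calculate test statistic, p-value from approximate sampling distribution

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
  = \frac{19.82 - 3.90}{\sqrt{\dfrac{12.17^2}{11} + \dfrac{14.73^2}{10}}}
  = \frac{15.92}{5.93} = 2.68
$$

$$
p\text{-value} = \Pr(t \ge 2.68) \approx .008
$$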
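The arithmetic behind the test statistic can be checked with a short script. It uses the summary values that appear on the slide (means 19.82 and 3.90, standard deviations 12.17 and 14.73, group sizes 11 and 10); how each standard deviation pairs with each group is our reading of the transcript and should be treated as an assumption. The helper name `two_sample_t` is ours.

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and its standard error."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se, se

# Summary values as read from the slide (pairing is an assumption)
t_stat, se = two_sample_t(19.82, 12.17, 11, 3.90, 14.73, 10)
print(f"standard error = {se:.2f}, t = {t_stat:.2f}")
```

The script reproduces the slide's standard error of 5.93 and t of 2.68; converting t to a p-value would additionally require a t-distribution routine, which is omitted here.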
![Page 17: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/17.jpg)
Another approach
Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
[Figure: histogram of difference in group means by random assignment; vertical axis: number of randomizations]
Approximate p-value = 13/1000
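A randomization test for a difference in group means might look like the sketch below. The raw scores from the study are not given in these slides, so the two lists are HYPOTHETICAL placeholder values used only to make the code runnable; the function name `randomization_test` and its parameters are also ours.

```python
import random
from statistics import mean, median

def randomization_test(group_a, group_b, stat=mean, n_reps=1000, seed=2009):
    """Re-randomize pooled responses into two groups of the original sizes and
    report how often the difference in `stat` is at least the observed one."""
    rng = random.Random(seed)
    observed = stat(group_a) - stat(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # null model: group labels are interchangeable
        diff = stat(pooled[:n_a]) - stat(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_reps

# HYPOTHETICAL improvement scores, NOT the study's data
unrestricted = [25, 14, 31, 9, 22, 18, 36, 12, 20, 27, 5]
deprived = [4, -8, 11, 2, 15, -3, 7, 0, 9, 1]

observed_diff, approx_p = randomization_test(unrestricted, deprived)
print(f"difference in means = {observed_diff:.2f}, approx p-value = {approx_p:.3f}")

# The same procedure works with another statistic, such as the median
median_diff, median_p = randomization_test(unrestricted, deprived, stat=median)
print(f"difference in medians = {median_diff:.2f}, approx p-value = {median_p:.3f}")
```

Passing the statistic as a parameter is a design choice that shows how the same simulation extends to other summaries without new theory.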
![Page 18: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/18.jpg)
Advantages
- You can do this at beginning of course
  - Then repeat for new scenarios with more richness
  - Spiraling could lead to deeper conceptual understanding
- Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
- Makes clear that “inference” goes beyond data in hand
- Very powerful, easily generalized
  - Flexibility in choice of test statistic (e.g. medians, odds ratio)
  - Generalize to more than two groups
- Takes advantage of modern computing power
![Page 19: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/19.jpg)
Question #1
- Should we match type of randomness in simulation to role of randomness in data collection?
  - Major goal: Recognize distinction between random assignment and random sampling, and the conclusions that each permits
- Or should we stick to “one crank” (always re-randomize) in the analysis, for simplicity’s sake?
- For example, with 2×2 table, always fix both margins, or only fix one margin (random samples from two independent groups), or fix neither margin (random sampling from one group, then cross-classifying)
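The three options for a 2×2 table can be contrasted in code. The sketch below is one possible reading of each scheme, using the dolphin table's totals as defaults; using the pooled success proportion 13/30 as the null probability, the 50/50 group probability, and all function names are assumptions made for illustration.

```python
import random

def fix_both_margins(rng, successes=13, total=30, size1=15):
    """Random assignment: fixed outcomes are shuffled into groups of fixed size."""
    outcomes = [1] * successes + [0] * (total - successes)
    rng.shuffle(outcomes)
    s1 = sum(outcomes[:size1])
    return s1, size1, successes - s1, total - size1

def fix_one_margin(rng, p=13 / 30, size1=15, size2=15):
    """Independent random samples: group sizes fixed, success counts free."""
    s1 = sum(rng.random() < p for _ in range(size1))
    s2 = sum(rng.random() < p for _ in range(size2))
    return s1, size1, s2, size2

def fix_neither_margin(rng, p=13 / 30, p_group1=0.5, total=30):
    """One random sample, cross-classified: only the grand total is fixed."""
    s1 = size1 = s2 = 0
    for _ in range(total):
        success = rng.random() < p  # outcome is independent of group under the null
        if rng.random() < p_group1:
            size1 += 1
            s1 += success
        else:
            s2 += success
    return s1, size1, s2, total - size1

def diff_in_proportions(s1, size1, s2, size2):
    """Difference in success proportions, or None if a group is empty."""
    if size1 == 0 or size2 == 0:
        return None
    return s1 / size1 - s2 / size2

rng = random.Random(2009)
for scheme in (fix_both_margins, fix_one_margin, fix_neither_margin):
    diffs = [diff_in_proportions(*scheme(rng)) for _ in range(5)]
    print(scheme.__name__, diffs)
```

All three simulate a null model of no association; they differ only in which table totals are held fixed, which is the crux of the "one crank" question.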
![Page 20: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/20.jpg)
Question #2
- What about interval estimation?
  - Estimating effect size at least as important as assessing significance
- How to introduce this?
  - Invert test
    - Test “all” possible values of parameter, see which do not put observed result in tail
    - Easy enough with binomial, but not as obvious how to introduce this (or if it’s possible) with 2×2 tables
  - Alternative: Estimate +/- margin-of-error
    - Could estimate margin-of-error with empirical randomization distribution or bootstrap distribution
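For the binomial case, inverting the test can be sketched as a grid search over candidate parameter values. The example reuses the helper/hinderer data (14 of 16); the 0.05 level split equally between the two tails, the grid step of 0.001, and the function names are assumptions for this illustration.

```python
from math import comb

def tail_probs(n, k, p):
    """Return (P(X >= k), P(X <= k)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    return sum(pmf[k:]), sum(pmf[: k + 1])

def plausible_values(n, k, alpha=0.05, step=0.001):
    """Keep every candidate p that does not put the observed count in
    either tail (each tail compared with alpha / 2)."""
    n_steps = int(round(1 / step))
    keep = []
    for i in range(1, n_steps):
        p = i * step
        upper, lower = tail_probs(n, k, p)
        if upper > alpha / 2 and lower > alpha / 2:
            keep.append(p)
    return keep[0], keep[-1]

low, high = plausible_values(16, 14)
print(f"Plausible values for the probability of choosing the helper: {low:.3f} to {high:.3f}")
```

The exact tail probabilities could be swapped for simulated ones to stay closer to the randomization-based spirit, at the cost of some noise at the interval endpoints.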
![Page 21: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/21.jpg)
Question #3
- How much bootstrapping to introduce, and at what level of complexity?
  - Use to approximate SE only?
  - Use percentile intervals?
  - Use bias-correction?
- Too difficult for Stat 101 students?
- Provide any helpful insights?
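To show what the first two options involve, here is a minimal bootstrap sketch using the helper/hinderer data (14 successes in 16). It computes a bootstrap standard error and a simple percentile interval; bias-correction is not attempted. The number of resamples, the seed, and the names are choices made for this illustration.

```python
import random
from statistics import stdev

def bootstrap_proportions(sample, n_reps=5000, seed=2009):
    """Resample the data with replacement and return the resampled proportions."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_reps)]

sample = [1] * 14 + [0] * 2  # 1 = chose the helper toy
boot = sorted(bootstrap_proportions(sample))

# Option 1: use the bootstrap distribution only to approximate the SE
se = stdev(boot)

# Option 2: percentile interval from the middle 95% of the bootstrap distribution
lo = boot[int(0.025 * len(boot))]
hi = boot[int(0.975 * len(boot)) - 1]

print(f"bootstrap SE = {se:.3f}, 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

With a sample this small and a proportion this close to 1, the percentile interval's upper end is likely to sit at or near 1.0, which hints at why the question of complexity arises.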
![Page 22: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/22.jpg)
Question #4
- What computing tools can help students to focus on understanding ideas?
  - While providing powerful, generalizable tool?
- Some possibilities
  - Java applets, Flash
    - Very visual, contextual, conceptual; less generalizable
  - Minitab
    - Provide students with macros? Or ask them to edit? Or ask them to write their own?
  - R
    - Need simpler interface?
  - Other packages?
    - StatCrunch, JMP have been adding resampling capabilities
![Page 23: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/23.jpg)
Question #5
- What about normal-based methods?
- Do not ignore them!
  - Introduce after students have gained experience with randomization-based methods
  - Students will see t-tests in other courses, research literature
  - Process of standardization has inherent value
  - A common shape often arises for empirical randomization/sampling distributions
    - Duh!
![Page 24: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/24.jpg)
Assessment: Developing instruments that assess …
- Conceptual understanding of core logic of inference
  - Jargon-free multiple choice questions on interpretation, effect size, etc.
  - “Interpret this p-value in context”: probability of observed data, or more extreme, under randomness, if null model is true
- Ability to apply to new studies, scenarios
  - Define null model, design simulation, draw conclusion
  - More complicated scenarios (e.g., compare 3 groups)
![Page 25: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/25.jpg)
- Understanding of components of activity/simulation
  - Designed for use after an in-class activity using simulation.
- Example Questions
  - What did the cards represent?
  - What did shuffling and dealing the cards represent?
  - What implicit assumption about the two groups did the shuffling of cards represent?
  - What observational units were represented by the dots on the dotplot?
  - Why did we count the number of repetitions with 10 or more “successes” (that is, why 10)?
![Page 26: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/26.jpg)
Conducting small classroom experiments
- Research Questions:
  - Start with a study that has a significant result or a non-significant one?
  - Start with binomial setting or 2×2 table?
  - Do tactile simulations add value beyond computer ones?
  - Do demonstrations of simulations provide less value than student-conducted simulations?
![Page 27: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/27.jpg)
Conclusions/Lessons Learned
- Put core logic of inference at center
  - Normal-based methods obscure this logic
  - Develop students’ understanding with randomization-based inference
- Emphasize connections among
  - Randomness in design of study
  - Inference procedure
  - Scope of conclusions
- But more difficult than initially anticipated
  - “Devil is in the details”
![Page 28: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/28.jpg)
Conclusions/Lessons Learned
- Don’t overlook null model in the simulation
- Simulation vs. Real study
- Plausible vs. Possible
- How much worry about being a tail probability
- How much worry about p-value = probability that null hypothesis is true
![Page 29: Concepts of Statistical Inference: A Randomization-Based Curriculum](https://reader035.vdocuments.us/reader035/viewer/2022070415/56814efd550346895dbc8bd0/html5/thumbnails/29.jpg)
Thanks very much!
- Thanks to NSF (DUE-CCLI #0633349)
- Thanks to George Cobb, advisory group
- More information: http://statweb.calpoly.edu/csi
  - Draft modules, assessment instruments
- Questions/comments: