Randomization and Bootstrap Methods in the Introductory Statistics Course
Kari Lock Morgan, Duke University
Robin Lock, St. Lawrence University
[email protected] [email protected]
Panel at the 2013 Joint Mathematics Meetings, San Diego, CA
How might the Intro Stat curriculum change to accommodate/take advantage of bootstrap/randomization methods?
Intro Stat – Traditional Topics
• Descriptive Statistics – one and two samples
• Normal distributions
• Data production (samples/experiments)
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests
Intro Stat – Revise the Topics
• Descriptive Statistics – one and two samples
• Normal distributions
• Data production (samples/experiments)
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests
• Data production (samples/experiments)
• Bootstrap confidence intervals
• Randomization-based hypothesis tests
• Normal distributions
• Descriptive Statistics – one and two samples
Why start with Bootstrap CIs?
• Minimal prerequisites:
    Population parameter vs. sample statistic
    Random sampling
    Dotplot (or histogram)
    Standard deviation and/or percentiles
• Same method of randomization in most cases
    Sample with replacement from original sample
• Natural progression
    Sample estimate ==> How accurate is the estimate?
• Intervals are more useful?
    A good debate for another session…
Example: Mustang Prices
Data: Sample of 25 Mustangs listed on Autotrader.com
Find a confidence interval for the slope of a regression line to predict prices of used Mustangs based on their mileage.
“Bootstrap” Samples
Key idea:
• Sample with replacement from the original sample using the same n.
• Compute the sample statistic for each bootstrap sample.
• Collect lots of such bootstrap statistics.
Imagine the “population” is many, many copies of the original sample.
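The three steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the slides do not list the 25 Mustangs, so the mileage and price arrays below are synthetic stand-ins, and `bootstrap_slopes` is a hypothetical helper name rather than anything used in the talk (which relied on Fathom and applets).

```python
import numpy as np

def bootstrap_slopes(x, y, n_boot=3000, seed=0):
    """Return n_boot least-squares slopes, one per bootstrap sample."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        # Sample n cases with replacement, keeping each (x, y) pair together.
        idx = rng.integers(0, n, size=n)
        # np.polyfit with degree 1 returns [slope, intercept].
        slopes[i] = np.polyfit(x[idx], y[idx], 1)[0]
    return slopes

# Synthetic stand-in data (NOT the Autotrader sample): 25 cars.
rng = np.random.default_rng(1)
miles = rng.uniform(5, 150, size=25)                  # thousands of miles
price = 30 - 0.2 * miles + rng.normal(0, 6, size=25)  # thousands of dollars

slopes = bootstrap_slopes(miles, price)
print(len(slopes), slopes.mean(), slopes.std(ddof=1))
```

Resampling whole cases (rows) rather than x and y separately keeps each car's mileage attached to its own price, which is what "sample with replacement from the original sample" means for paired data.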
Distribution of 3000 Bootstrap Slopes
Using the Bootstrap Distribution to Get a Confidence Interval – Version #1
The standard deviation of the bootstrap statistics estimates the standard error of the sample statistic.
Quick interval estimate:
$$\text{Original Statistic} \pm 2 \cdot SE$$
For the Mustang slope:
$$-0.22 \pm 2 \cdot 0.029 = -0.22 \pm 0.058 = (-0.278,\ -0.162)$$
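As a check on the arithmetic, the quick interval can be computed directly. The sketch below uses the two numbers from this slide (slope of -0.22, bootstrap SE of 0.029); `se_interval` is a hypothetical helper name introduced only for illustration.

```python
def se_interval(statistic, se, multiplier=2):
    """Quick interval estimate: statistic +/- multiplier * SE."""
    margin = multiplier * se
    return statistic - margin, statistic + margin

# Values from the slide: slope of -0.22 and bootstrap SE of 0.029.
low, high = se_interval(-0.22, 0.029)
print(round(low, 3), round(high, 3))   # -0.278 -0.162
```

In practice the SE would be the standard deviation of the bootstrap slopes rather than a typed-in constant.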
Using the Bootstrap Distribution to Get a Confidence Interval – Version #2
Keep 95% in middle
Chop 2.5% in each tail
95% CI for slope: (-0.279, -0.163)
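The "chop the tails" idea amounts to taking percentiles of the bootstrap statistics. A minimal sketch with NumPy follows; the slopes here are normal draws used purely as a stand-in for a real bootstrap distribution (an assumption, not the talk's data), and `percentile_interval` is a hypothetical helper name.

```python
import numpy as np

def percentile_interval(boot_stats, level=0.95):
    """Keep the middle `level` of the bootstrap statistics."""
    tail = (1 - level) / 2 * 100           # percent chopped from each tail
    low, high = np.percentile(boot_stats, [tail, 100 - tail])
    return low, high

# Stand-in for a bootstrap distribution: 3000 draws centered near the slide's slope.
rng = np.random.default_rng(0)
fake_slopes = rng.normal(-0.22, 0.029, size=3000)
print(percentile_interval(fake_slopes))
```

Unlike Version #1, this interval does not assume the bootstrap distribution is symmetric, so the two versions agree closely only when the distribution is roughly bell-shaped.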
3. Simulation Technology?
Fall 2010: Fathom
Fall 2011: Fathom & Applets
Tactile simulations first?
Bootstrap – No (with replacement is tough)
Test for an experiment – Yes (1 or 2)
Desirable Technology Features?
Three Distributions
One to Many Samples
4. One Crank or Two?
Confidence Intervals – Bootstrap – one crank
Significance Tests – Two (or more) cranks
Rules for selecting randomization samples for a test. Be consistent with:
1. the null hypothesis
2. the sample data
3. the way data were collected
Randomization Test for Slope
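One common way to build a randomization distribution for a slope is to shuffle the response values, which is consistent with a null hypothesis of no association and reuses the observed data. The sketch below assumes that approach; the slides do not say which scheme the talk used, the data are synthetic stand-ins for the Mustang sample, and both function names are hypothetical.

```python
import numpy as np

def randomization_slopes(x, y, n_rand=3000, seed=0):
    """Slopes from samples generated under H0: no association (slope = 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_rand)
    for i in range(n_rand):
        # Shuffling y breaks any link to x (consistent with H0) while
        # reusing exactly the observed values (consistent with the sample).
        slopes[i] = np.polyfit(x, rng.permutation(y), 1)[0]
    return slopes

def two_sided_p_value(rand_stats, observed):
    """Proportion of randomization statistics at least as extreme as observed."""
    return np.mean(np.abs(rand_stats) >= abs(observed))

# Synthetic stand-in data (NOT the Autotrader sample).
rng = np.random.default_rng(1)
miles = rng.uniform(5, 150, size=25)
price = 30 - 0.2 * miles + rng.normal(0, 6, size=25)

observed = np.polyfit(miles, price, 1)[0]
p_value = two_sided_p_value(randomization_slopes(miles, price), observed)
print(observed, p_value)
```

Note the contrast with the bootstrap: here the resampling is centered on the null value (slope of 0), not on the observed statistic.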
5. Test for a 2x2 Table
First example: A randomized experiment
Test statistic: Count in one cell
Randomize: Treatment groups
Margins: Fix both
Later examples vary, e.g. use difference in proportions or randomize as independent samples with common p.
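The first example (randomize treatment groups, fix both margins, count one cell) can be sketched as follows. The table counts are hypothetical, chosen only to make the code runnable, and `cell_count_randomization` is an illustrative name.

```python
import numpy as np

def cell_count_randomization(n_treat, n_control, n_success, n_rand=5000, seed=0):
    """Randomization distribution of the success count in the treatment group."""
    rng = np.random.default_rng(seed)
    # One entry per subject: 1 = success, 0 = failure (column margin is fixed).
    outcomes = np.array([1] * n_success + [0] * (n_treat + n_control - n_success))
    counts = np.empty(n_rand, dtype=int)
    for i in range(n_rand):
        # Re-deal subjects to treatment; group sizes (row margin) stay fixed.
        shuffled = rng.permutation(outcomes)
        counts[i] = shuffled[:n_treat].sum()
    return counts

# Hypothetical table: 10 treated (8 successes), 10 controls (3 successes).
observed = 8
counts = cell_count_randomization(n_treat=10, n_control=10, n_success=11)
p_value = np.mean(counts >= observed)   # one-sided, upper tail
print(p_value)
```

Shuffling the outcome labels mirrors the random assignment in the experiment, which is why this scheme matches "the way data were collected" for a randomized design.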
6. What about “traditional” methods?
AFTER students have seen lots of bootstrap and randomization distributions (and hopefully begun to understand the logic of inference)…
• Introduce the normal distribution (and later t)
• Introduce “shortcuts” for estimating SE for proportions, means, differences, …
Back to Mustang Prices
The regression equation is
Price = 30.5 - 0.219 Miles

Predictor      Coef   SE Coef       T       P
Constant     30.495     2.441   12.49   0.000
Miles      -0.21880   0.03130   -6.99   0.000

S = 6.42211   R-Sq = 68.0%   R-Sq(adj) = 66.6%
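The regression output can be tied back to the simulation results with a little arithmetic. The sketch below only reuses numbers already shown in the talk: the coefficients and standard errors from the output, and the bootstrap SE of 0.029 from the Version #1 interval.

```python
# Values copied from the regression output above.
coef_miles, se_miles = -0.21880, 0.03130
coef_const, se_const = 30.495, 2.441

# The T column is the coefficient divided by its standard error.
t_miles = coef_miles / se_miles
t_const = coef_const / se_const
print(round(t_miles, 2), round(t_const, 2))   # -6.99 12.49

# Formula-based SE versus the bootstrap SE quoted earlier in the talk (0.029).
bootstrap_se = 0.029
print(round(se_miles - bootstrap_se, 4))      # 0.0023
```

The formula-based "shortcut" SE and the bootstrap SE land close together here, which is the sense in which the traditional output can be read as a shortcut for what the simulation estimates.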
7. Assessment?
New learning goals
• Understand how to generate bootstrap samples and distribution.
• Understand how to create randomization samples and distribution.
• Be able to use a bootstrap/randomization distribution to find an interval/p-value.
8. How did it go?
• Students enjoyed and were engaged with the new approach.
• Instructor enjoyed and was engaged with the new approach.
• Better understanding of p-value reflecting “if H0 is true”.
• Better interpretations of intervals.
• Challenge: Few “experienced” students to serve as resources.
Going forward
Continue with randomization approach?
ABSOLUTELY (3 sections in Fall 2011)