7/21/12
The Urn Model, Simulation, and Resampling Methods
Deborah Nolan, University of California, Berkeley
Overview
• Introductory example
• Basic elements of the urn model
• Simulation studies
• Hypothesis testing
• Confidence intervals and survey sampling
Salk Field Trials
Polio left many children crippled, and some could survive only with respirators.
Field Trial
• Jonas Salk had developed a vaccine that showed promising results in the laboratory
• Ready to test on a larger scale
• 1954: a year-long field trial conducted
• 1.8M children from 217 areas in the US, Canada, and Finland
• Cost: $17.5M
Random assignment
• A carefully applied chance process that gives each volunteer an equal probability of getting the vaccine or the salt solution.
• The randomization turns potential biases into chance error, i.e., the two groups will be similar with respect to factors that might bias the result, even if these factors are unobserved.
Placebo controls
• Children in the control group were inoculated with a simple salt solution.
• Placebo effect: bias that comes from the reassurance of taking an otherwise worthless drug substitute.
Double-blind evaluation
• Neither the children nor the physicians who evaluated their subsequent health status knew who had been given the vaccine and who the salt solution.
• Behavior of parents and children wouldn't be affected by knowledge that they received the vaccine.
• Physicians wouldn't be biased when facing a borderline case that was difficult to diagnose.
Randomized placebo control method
• Randomly assign treatment to children.
• Maximum efforts to eliminate “observer bias” by use of placebo (injection with salt solution) and “blinding”.
Outcome
• 56 of the children receiving the vaccine contracted polio.
• 142 of those who did not receive the vaccine contracted the disease.
• It seems like the vaccine works, but could this result have easily happened by chance?
• The randomized design lets us answer that question.
Ineffective Vaccine Scenario
• The 198 children who contracted polio would have become sick whether or not they received the vaccine.
• It was only the randomization that led to 142 of the 198 sick children being assigned to the salt solution. Nothing else was going on.
How Likely Is That?
• 400,000 children: 198 are sick and 399,802 are healthy.
• 200,000 of the 400,000 children are picked at random to receive the vaccine.
• What's the chance that 56 or fewer of the sick children are given the vaccine?
The Urn Model
• Marbles: 400,000, one for each child
– 0 for healthy and 1 for contracting polio
– 399,802 0s and 198 1s
• Draws:
– 200,000 draws without replacement from the urn (the treated children)
• Summary:
– Sum the values drawn
Box Model
[Diagram: box of 400,000 children holding tickets marked 0 and 1; 200,000 draws without replacement; sum of draws = number of polio cases in the treatment group]
> vals = c(0, 1)
> counts = c(399802, 198)
> urn = rep(vals, times = counts)
> results = replicate(50000,
    sum(sample(urn, size = 200000, replace = FALSE)))
> mean(results)
[1] 98.97672
> sd(results)
[1] 7.029245
> sum(results <= 56)
[1] 0
> hist(results, breaks = 50)   # OR plot(table….)
[Histogram: "198 of 400,000 children are sick. Draw 200,000 at random and count the number of sick." x-axis: number of sick children in sample of 200,000 (about 72 to 128); y-axis: counts out of 50,000 trials (0 to 3,000)]
• In 50,000 trials it never happened: we did not get a single sample of 200,000 with 56 or fewer cases
• The Salk field trial was a turning point in medical research
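The simulated tail can be cross-checked against an exact calculation. Under the urn model, the number of 1s among draws made without replacement follows a hypergeometric distribution, so base R's `phyper` gives the chance directly. This is a minimal sketch; the variable names are chosen here for illustration and do not come from the slides.

```r
# Exact check of the Salk urn: 198 ones and 399,802 zeros,
# 200,000 draws without replacement.
n_sick    <- 198      # marbles marked 1
n_healthy <- 399802   # marbles marked 0
n_draws   <- 200000   # children assigned to the vaccine

# P(number of sick children in the vaccine group <= 56)
p_exact <- phyper(56, m = n_sick, n = n_healthy, k = n_draws)

# Expected number of sick children in the vaccine group
expected <- n_draws * n_sick / (n_sick + n_healthy)

print(p_exact)
print(expected)
```

The exact chance is far smaller than 1 in 50,000, which is consistent with the simulation never producing 56 or fewer cases.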
Another example
Calcium and blood pressure
• Experiment: study the effect of calcium supplements on blood pressure for male subjects
• Randomized into 2 groups:
– One received calcium supplements for 12 weeks
– One received a placebo for 12 weeks
• Double-blind experiment
• Response: reduction in blood pressure (initial bp - bp after 12 weeks)
Outcome
• Treatment group: 7, -4, 18, 17, -3, -5, 1, 10, 11, -2
• So the first person's blood pressure decreased by 7 from the start to the end of the program.
• The second person's increased by 4, the third decreased by 18, and so on.
• For the group that received the placebo: -1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3
Informal Analysis
Those in the calcium group reduced their blood pressure by 5 on average, but...
[Plot: change in blood pressure for the Calcium and Placebo groups; x-axis: Blood Pressure, from -10 to 15]
Informal Analysis
• The distributions have roughly the same spread
• The mean of the calcium group looks to be about 5 mm higher
• The mean of the control group looks to be about 0 mm
Scenario
• What if the calcium makes no difference, and it's just by chance that the random assignment gave the calcium group more people who had a lower blood pressure after 12 weeks?
• We can use an urn model to examine this chance process.
Box Model
[Diagram: box of 21 patients holding tickets 7, -4, 18, …; 10 draws without replacement; average of draws from box]
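The Urn Model
• Marbles: 21 (10 calcium subjects and 11 placebo subjects)
– Values: blood pressure change 7, -4, 18, 17, -3, -5, 1, 10, 11, -2, -1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3
– One for each subject
• Draws:
– 10 draws without replacement from the urn (the calcium treatment)
• Summary:
– Average change in blood pressure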
Using the urn in R
If the calcium supplement has no effect, then all subjects would have the same response whether they received the supplement or not.
Let's find the approximate distribution of the sample average under the Urn Model.
> avgs = replicate(100000, mean(sample(bp, 10, replace = FALSE)))
> plot(table(avgs))
> sum(avgs >= 5)/100000
[1] 0.0616
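The slide code uses a vector `bp` that is not defined in the transcript. A self-contained version might look like the sketch below, which assumes `bp` is simply the 21 blood pressure changes listed earlier. The seed and the smaller number of repetitions are arbitrary choices made here to keep the example quick.

```r
# Blood pressure changes as listed on the Outcome slide
calcium <- c(7, -4, 18, 17, -3, -5, 1, 10, 11, -2)
placebo <- c(-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3)
bp <- c(calcium, placebo)      # assumed urn: one marble per subject

set.seed(1)                    # arbitrary seed, for reproducibility
observed <- mean(calcium)      # observed average for the calcium group

# Draw 10 marbles without replacement and average them, many times
avgs <- replicate(20000, mean(sample(bp, length(calcium), replace = FALSE)))

# Proportion of simulated averages at least as large as the observed one
p_value <- mean(avgs >= observed)
print(observed)
print(p_value)
```

The proportion should land near the 0.0616 reported on the slide, with some variation from run to run.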
The Basic Urn Model
Information about the urn:
• Marbles:
– What values do we write on the marbles?
– How many marbles of each value do we have?
• Draws:
– How many draws do we take from the urn?
– Do we replace the marble between draws?
• Summary:
– How do we summarize the values drawn?
Using the Urn in R: Simulation
• Set up an urn with marbles
• Sample from the urn (specify number of draws and with or without replacement)
• Do something with the results, e.g. take sum
• Repeat many, many, many times
• Use empirical distribution to approximate true distribution
Setting up the urn
> vals = c(…)    # values on marbles
> counts = c(…)  # counts for each val
# Create the urn
> urn = rep(vals, counts)
Sampling from the urn in R
# Draw from the urn n times with replacement
> sample(urn, n, replace = TRUE)
# sum the draws from the urn
> sum(sample(urn, n, replace = TRUE))
# repeat this process m times
> replicate(m, sum(sample(urn, n, replace = TRUE)))
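The three steps above (set up, sample, summarize, repeat) can be bundled into one function. The helper below is hypothetical: `simulate_urn` and its argument names are invented here for illustration and are not part of base R or of the original slides.

```r
# simulate_urn is a hypothetical helper, not a base R function.
# vals/counts describe the marbles, n is the number of draws,
# m is the number of repetitions, summary_fn summarizes each set of draws.
simulate_urn <- function(vals, counts, n, m, replace = TRUE, summary_fn = sum) {
  urn <- rep(vals, times = counts)   # set up the urn
  replicate(m, summary_fn(sample(urn, n, replace = replace)))
}

set.seed(2)  # arbitrary seed
# Example: 5 draws with replacement from an urn with three 0s and one 1
sums <- simulate_urn(vals = c(0, 1), counts = c(3, 1), n = 5, m = 1000)
print(table(sums) / length(sums))    # empirical distribution of the sum
```

One caveat: `sample` treats a single number `x` as the range `1:x`, so this sketch assumes the urn holds at least two marbles.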
Use urn to answer questions
• Roll a die 9 times and take the sum of the rolls.
• What's the chance the sum is 25 or less?
Simulation
• Draw 9 marbles with replacement from the urn, record the sum.
• Repeat 10,000 times to get 10,000 sums
• Find the proportion of those 10,000 sums that are 25 or less
• This empirical proportion should be close to the chance that the sum will be 25 or less
Steps in R
• Set up the urn with the “marbles”
urn = 1:6
• Sample from the urn
sample(urn, 9, replace = TRUE)
• Do something with the results, e.g. take sum
sum(sample(urn, 9, replace = TRUE))
Steps in R
• Repeat many, many, many times
observations = replicate(10000,
    sum(sample(urn, 9, replace = TRUE)))
• Make a table of the proportion of times each value was observed
approxChance = table(observations)/10000
> cumsum(approxChance)["25"]
    25
0.1222
[Plot of approxChance against observations: sums from 14 to 48 on the x-axis, proportions from 0.00 to 0.08 on the y-axis]
The PMF has a normal shape
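For a problem this small, the simulated proportion can be compared with the exact chance. The sketch below builds the exact PMF of the sum of 9 rolls by adding one die at a time (a convolution). This is a different technique from the simulation on the slides and is included only as a check.

```r
# Exact PMF of the sum of 9 fair dice, built one roll at a time.
# Position i of pmf holds P(sum = i - 1).
pmf <- 1                       # after zero rolls: P(sum = 0) = 1
for (roll in 1:9) {
  new_pmf <- numeric(length(pmf) + 6)
  for (face in 1:6) {
    idx <- seq_along(pmf) + face          # shift every sum up by `face`
    new_pmf[idx] <- new_pmf[idx] + pmf / 6
  }
  pmf <- new_pmf
}
names(pmf) <- 0:54             # possible sums (0 to 8 have probability 0)

# Exact chance that the sum is 25 or less
exact <- sum(pmf[as.character(0:25)])
print(exact)
```

The result should be close to the simulated value of 0.1222.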
Connection to Probability
• The Urn model is a simple chance process
• Consider the first draw:
– Value drawn is random
• Avg of urn = Expected value for one draw
• SD of urn = SD for one draw
Connection to Probability
• Consider the sum of n draws:
• Expected Value of sum = n × Avg of urn
• Standard Error for sum = root(n) × SD of urn
• When drawing without replacement
– Expected value remains the same
– Correction factor for SE
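These rules can be checked against the Salk urn from earlier. The sketch below assumes the correction factor is the standard finite population correction, sqrt((N - n)/(N - 1)); variable names are illustrative.

```r
# Salk urn: N marbles, 198 marked 1, n draws without replacement
N <- 400000
n <- 200000
urn_avg <- 198 / N                        # average of the 0/1 urn
urn_sd  <- sqrt(urn_avg * (1 - urn_avg))  # SD of a 0/1 urn

ev_sum  <- n * urn_avg                    # expected value of the sum
se_with <- sqrt(n) * urn_sd               # SE if drawing with replacement
fpc     <- sqrt((N - n) / (N - 1))        # assumed correction factor
se_without <- se_with * fpc               # SE when drawing without replacement

print(c(ev_sum = ev_sum, se_with = se_with, se_without = se_without))
```

The expected value and the corrected SE are close to the simulated mean (98.98) and SD (7.03) reported for the Salk urn.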
Our simple example
• We can find the expected value for the sum: $n \times \text{Urn AVG} = 9 \times 3.5 = 31.5$
• We can determine the standard deviation for the sum: $\sqrt{9} \times \sqrt{\frac{6^2 - 1}{12}} = 5.12$
• But how do we compute the chance the sum is 25 or less?
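One way to answer the last question without simulation is a normal approximation built from the expected value and SE. The sketch below reproduces the two numbers on the slide and then applies `pnorm`; the continuity correction (using 25.5 rather than 25) is an assumption made here, not something stated on the slide.

```r
urn <- 1:6
n <- 9

urn_avg <- mean(urn)                       # 3.5
urn_sd  <- sqrt(mean((urn - urn_avg)^2))   # SD of the urn (divides by 6, not 5)

ev_sum <- n * urn_avg                      # expected value of the sum
se_sum <- sqrt(n) * urn_sd                 # standard error of the sum

# Normal approximation to P(sum <= 25), with an assumed continuity correction
approx <- pnorm(25.5, mean = ev_sum, sd = se_sum)
print(c(ev_sum = ev_sum, se_sum = se_sum, approx = approx))
```

Note that `sd(urn)` would divide by 5 and give a slightly larger value than the SD of the urn used here.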
Why use the Urn Model?
• Connects statistical techniques,
– Hypothesis Tests
– Confidence Interval
to the underlying chance process
• If clear on the chance process, then clear on the statistical method to apply
• Simulation study is a powerful tool for data analysis
Why use the Urn Model?
• Simulation study is a powerful tool for understanding a chance process
• Discover important properties: law of large numbers, CLT, etc.
• Offer an alternative approach to learning concepts
• Help students discover features of a chance process, e.g. root(n) behavior
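As an illustration of the root(n) behavior mentioned above, the following sketch simulates the sum of n die rolls for three values of n, each four times the previous one. If the SE of the sum grows like root(n), each SD should be roughly double the one before. The function name and the choice of n values are assumptions made for this example.

```r
set.seed(3)  # arbitrary seed
urn <- 1:6

# SD of the simulated sums of n draws with replacement
sd_of_sum <- function(n, reps = 5000) {
  sd(replicate(reps, sum(sample(urn, n, replace = TRUE))))
}

n_values <- c(9, 36, 144)                 # each n is 4 times the previous
sds <- sapply(n_values, sd_of_sum)
ratios <- sds[-1] / sds[-length(sds)]     # ratio of successive SDs

print(data.frame(n = n_values, sd_of_sum = sds))
print(ratios)                             # each should be near sqrt(4) = 2
```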
Urn model for distributions
• Discrete Uniform(1, 2, …, m)
– m marbles
– each marble has a unique value: 1 to m
– Draw a marble from the urn, note the value, replace it
• Binomial(n, p)
– m marbles in the urn
– marbles are marked with a 1 (for success) or a 0 (for failure)
– proportion p of the marbles are marked 1
– Draw n marbles with replacement and sum the values
• Hypergeometric(m, n1, n2)
– n1 + n2 marbles in the urn
– n1 marbles are marked with a 1 (for success)
– n2 marbles are marked 0 (for failure)
– Draw m marbles without replacement and sum the values
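The binomial and hypergeometric urns above can be compared with R's built-in `dbinom` and `dhyper`. The sketch uses a small made-up urn (3 marbles marked 1, 7 marked 0) and 5 draws. Note that `dhyper` names its arguments differently from the slide: its `m` is the number of 1s (n1 on the slide), its `n` is the number of 0s (n2), and its `k` is the number of draws (m on the slide).

```r
set.seed(4)  # arbitrary seed
reps <- 20000
urn <- rep(c(1, 0), times = c(3, 7))   # hypothetical urn: p = 0.3

# Binomial(5, 0.3): draw 5 with replacement, sum the values
binom_sums  <- replicate(reps, sum(sample(urn, 5, replace = TRUE)))
binom_sim   <- mean(binom_sums == 2)
binom_exact <- dbinom(2, size = 5, prob = 0.3)

# Hypergeometric: same urn, draw 5 without replacement, sum the values
hyper_sums  <- replicate(reps, sum(sample(urn, 5, replace = FALSE)))
hyper_sim   <- mean(hyper_sums == 2)
hyper_exact <- dhyper(2, m = 3, n = 7, k = 5)

print(c(binom_sim = binom_sim, binom_exact = binom_exact))
print(c(hyper_sim = hyper_sim, hyper_exact = hyper_exact))
```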
Hypothesis Testing
Testing
• Permutation tests
• Fisher's exact test
• Chi-square tests
• Can also use for z and t tests, where we place assumptions on the contents of the urn (and use large sample theory)
Chi-square test
• Grade expected in class and Gender
> table(grades, sex)
      sex
grades  F  M
     A  9 22
     B 21 31
     C  8  0
Is gender independent of expected grade?
Urn
• Values on marbles: vals = c("A", "B", "C")
• Counts of each type: counts = c(31, 52, 8)
• Urn: urn = rep(vals, counts)
Simulate
• Sample from urn:
table(sample(urn, 53, replace = FALSE))
 A  B  C
15 33  5
• Repeat many times and calculate the test statistic for each table
results = replicate(50000, sum((table(sample(urn, 53)) - expect)^2/expect))
> sum(results >= 5.538192)
[1] 45
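The slide code relies on a vector `expect` that the transcript never defines. The sketch below assumes `expect` holds the expected grade counts for a group of 53 (the number of male students) drawn from the urn, which reproduces the observed statistic 5.538192. It also stores the urn as a factor, an adjustment made here so that every simulated table keeps all three grades even when a grade is not drawn.

```r
set.seed(5)  # arbitrary seed
vals   <- c("A", "B", "C")
counts <- c(31, 52, 8)

# Factor urn: table() then always reports A, B and C, including zero counts
urn <- factor(rep(vals, counts), levels = vals)

n_male <- 53
expect <- counts * n_male / sum(counts)   # assumed definition of `expect`

# Observed statistic for the male column of the table (22, 31, 0)
observed <- c(A = 22, B = 31, C = 0)
obs_stat <- sum((observed - expect)^2 / expect)

# Simulate the statistic under the urn model (fewer repetitions than the slide)
results <- replicate(20000,
                     sum((table(sample(urn, n_male)) - expect)^2 / expect))
p_value <- mean(results >= obs_stat)

print(obs_stat)
print(p_value)
```

With a plain character urn, a sample containing no C grades would yield a two-entry table and a misaligned subtraction, which is why the factor is used.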
Survey Sample
Simple Random Sampling
• The random selection process gives us samples that tend to be representative of the population
• The random selection process means we can use probability to determine the chance someone is included in the sample AND the probability distribution of the sample average
Urn Model for Simple Random Sampling
• One marble for each unit in the population
• Value on the marble is the response to the question (we consider only one question)
• Draw n marbles without replacement to get a sample
• Use a summary of the values drawn to make inference about the population
3 Views: A Triptych
• Population: describes the Urn
• Your Sample: what you found from the execution of the chance process, what you drew from the urn
• Sampling Distribution: understand the behavior of the chance process and use it along with Your Sample to make statements about the Population. We find this behavior by simulating with the urn or by applying large sample theory
Population View
Population = Urn
• N marbles
• Unknown distribution of values
• Unknown Population Average
• Unknown Population SD
Sample View
Sample
• n draws without replacement
• Observe
– Sample distribution
– Sample mean
– Sample SD
Simple Random Sample should resemble the population, due to the chance process
Sampling Distribution View
Probability Model for how the sample might turn out
• EV(Sample AVG) = Population AVG
• SE(Sample AVG) = $\frac{\text{SD(Pop)}}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
The Pop AVG and SD are unknown, and we don't know the sampling distribution of our statistic
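The SE formula above can be checked by simulation when the urn is known. The sketch below uses a made-up population of 1,000 marbles (400 marked 1) and samples of size 100; in practice the population is unknown, so this only illustrates that the formula matches the chance process.

```r
set.seed(6)  # arbitrary seed
population <- rep(c(0, 1), times = c(600, 400))   # hypothetical urn
N <- length(population)
n <- 100

pop_avg <- mean(population)
pop_sd  <- sqrt(mean((population - pop_avg)^2))   # SD of the urn

# SE of the sample average, with the correction for drawing without replacement
se_formula <- pop_sd / sqrt(n) * sqrt((N - n) / (N - 1))

# Simulated sampling distribution of the sample average
avgs <- replicate(20000, mean(sample(population, n, replace = FALSE)))

print(c(ev = pop_avg, simulated_mean = mean(avgs)))
print(c(se_formula = se_formula, simulated_sd = sd(avgs)))
```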
Inference about the Population
• SRS tells us the sample should be representative of the population
• Create a bootstrap population from the sample
• Find the approximate sampling distribution of the statistic for the bootstrap population and use it to make inferences about the true population
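One possible reading of these steps is sketched below: copy each sampled value N/n times to form a bootstrap population, then draw samples of size n from it without replacement. The data, the population size, and the use of a percentile interval are all assumptions made for illustration. The slides do not specify how the bootstrap population is built or how the interval is formed.

```r
set.seed(7)  # arbitrary seed
N <- 1000                                       # hypothetical population size
my_sample <- c(3, 5, 2, 8, 6, 4, 7, 5, 9, 1)    # hypothetical sample values
n <- length(my_sample)

# Bootstrap population: each sampled value repeated N / n times
# (assumes N is a multiple of n)
boot_pop <- rep(my_sample, times = N / n)

# Approximate sampling distribution of the sample average
boot_avgs <- replicate(10000, mean(sample(boot_pop, n, replace = FALSE)))

se_boot  <- sd(boot_avgs)                       # bootstrap SE of the average
interval <- quantile(boot_avgs, c(0.025, 0.975))  # assumed percentile interval

print(se_boot)
print(interval)
```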
Cluster Sampling
Urn Model for Cluster Sampling
• Urn: packets of marbles
• N = number of packets in the urn
• n = number of draws/packets from the urn
• Draw packets without replacement
• Open up the packet and take all of the marbles from the packet
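The cluster urn can be represented in R as a list of packets. The sketch below uses made-up packets and values; only the draw-packets-then-open-them logic comes from the slide.

```r
set.seed(8)  # arbitrary seed
# Hypothetical urn: 6 packets, each a vector of marble values
packets <- list(c(1, 2), c(3, 4, 5), c(6), c(7, 8), c(9, 10, 11), c(12))
N <- length(packets)    # number of packets in the urn
n <- 2                  # number of packets to draw

# Draw packet indices without replacement
chosen <- sample(N, n, replace = FALSE)

# Open the chosen packets and take all of their marbles
cluster_sample <- unlist(packets[chosen])

print(chosen)
print(cluster_sample)
```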
Stratified Sampling
Urn Model for Stratified Sampling
• M urns: one for each stratum
• Urn i has N_i marbles in it
• Draw n_i marbles without replacement from Urn i
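A stratified draw can be sketched with one vector per urn. The strata, their sizes, and the sample sizes below are invented for illustration. The weighted average at the end is the usual stratified estimate of the population average, which is an addition here and not something shown on the slide.

```r
set.seed(9)  # arbitrary seed
# Hypothetical strata: M = 3 urns with N_i = 50, 30 and 20 marbles
urns <- list(a = rnorm(50, mean = 10),
             b = rnorm(30, mean = 20),
             c = rnorm(20, mean = 30))
n_i <- c(a = 5, b = 3, c = 2)      # draws from each urn

# Draw n_i marbles without replacement from urn i
strat_sample <- Map(function(urn, k) sample(urn, k, replace = FALSE), urns, n_i)

# Weight each stratum's sample average by its share of the population
N_i <- lengths(urns)
estimate <- sum(N_i / sum(N_i) * sapply(strat_sample, mean))

print(lengths(strat_sample))
print(estimate)
```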
Terence's Stuff: Simulation
IMS Bulletin, May 2011
It now seems to me that we are heading into an era when all statistical analysis can be done by simulation. We don't need likelihood functions; we just need to know how to simulate from the models we entertain for our data. That's been a sine qua non of statistical analysis for some time now. … We don't need theory to tell us our method works; we just need to simulate and see.
Resources
• Box model in Statistics by Freedman, Pisani, Purves
• R Video #13: http://www.stat.berkeley.edu/share/rvideos/R_Videos/R_Videos.html