Social Data Science
Data and Big Data
David Dreyer Lassen
UCPH ECON, August 12, 2016

"In God we trust, all others must bring data"
W. Edwards Deming
Different types of data

Today:
1. Empirical design
2. Data generating process
3. Modes of collection: standard vs. big data; examples
4. Strategic data provision

Roadmap
• Different data for different questions
• Theory and empirics, forecasting and hypothesis testing
• Effects of causes vs. causes of effects
• Data generating process
• Modes of data collection – pros and cons
• Strategic data management and data production

Different data for different questions, or different questions for different data
Sometimes it is possible to separate the data collection process from the underlying data generating process, and sometimes not.
Fundamental difference between what people do and what they say they do: 'cheap talk' / 'put your money where your mouth is' / honest / costly signaling

What is your question, again?
1. Research question from theory
2. Ideal empirical design
3. Feasible empirical design/collection
4. Results
5. Adjustment of theory/question/design
6. New results
7. …

A. What data do we have
B. What question can they answer
C. Research question
D. Results

"All models are wrong – but some are useful" (George Box)
Two key goals
1. Forecasting: individual behavior, policy consequences, voting, Champions League, grades… Data science/machine learning (but also macroeconomics)
2. Hypothesis testing, derived from theory: 'traditional' social science

1. Forecasting
• Example: Bank wants to forecast non-payment on loans (P_d: probability of default)
• Couldn't care less about theory
• Rough "Data Science": try to predict from all available data
• Suppose we find that birth weight predicts default
– Bank is happy, better fit (defer ethics etc.)
– Policy: does investing in pre-natal care reduce defaults?
• In practice: set of predictors typically taken from (some) theory, even if casual
• Complications: if customers know that P_d depends on birth weight, would/should they disclose it? What if loans go only to disclosers? Would they tell the truth?
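
The bank's forecasting problem can be sketched as a one-predictor logistic regression. This is only an illustration under stated assumptions: the data are simulated, the link between birth weight and default is made up, and the names `simulate_borrowers` and `fit_logit` are hypothetical helpers, not anything from the lecture. The fit predicts well without saying anything about why birth weight matters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_borrowers(n, seed=0):
    """Return (birth_weight_z, defaulted) pairs from an assumed, made-up process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        weight_z = rng.gauss(0.0, 1.0)              # standardized birth weight
        p_default = sigmoid(-2.0 - 0.8 * weight_z)  # assumption: lower weight, higher risk
        rows.append((weight_z, 1 if rng.random() < p_default else 0))
    return rows


def fit_logit(rows, steps=500, learning_rate=0.5):
    """Fit P_d = sigmoid(a + b * weight_z) by plain gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in rows:
            error = sigmoid(a + b * x) - y
            grad_a += error
            grad_b += error * x
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


borrowers = simulate_borrowers(2000)
intercept, slope = fit_logit(borrowers)
p_low_weight = sigmoid(intercept + slope * -1.0)   # one s.d. below mean birth weight
p_high_weight = sigmoid(intercept + slope * 1.0)   # one s.d. above mean birth weight
print(f"slope={slope:.2f}  P_d(low)={p_low_weight:.2f}  P_d(high)={p_high_weight:.2f}")
```

The fitted slope is a forecasting device only: it cannot tell whether raising birth weight would lower defaults, which is the policy question.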

2. Hypothesis testing
• Theory (rational choice, sociology, biology, common sense, …) posits effect of X on Y
A. Selection/type theory: People who are impatient cannot defer immediate pleasures -> smoke and drink while pregnant -> give birth sooner. If impatient parents -> impatient children (whether by nature or nurture), we have an explanation.
B. Biological theory: low birth weight affects brain development and neurological wiring for patience.
• If (A), little role for policy; also, both can be true at the same time
• How to distinguish: exogenous shock to birth weight, but ethically tricky...

Goodhart's law
• Most popular: "When a measure becomes a target, it ceases to be a good measure."
• What he wrote: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."

Targets and Measures
• You cannot be told how your bank constructs your P_d. Why?
– Goodhart's law: people will attempt to outmaneuver the measure
– (Thought) example: spending on shoes is a good indicator of account overdraft -> shoe lovers will have others buy for them, and it ceases to be a good measure
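
The shoe thought example can be simulated. In this minimal sketch, every number is an assumption chosen for illustration: a latent taste for shoes drives both recorded spending and overdrafts, and once the rule is known, shoe lovers have others buy for them, so their recorded spending looks like that of a non-lover. Their overdraft behavior does not change; only the measure does.

```python
import random


def correlation(xs, ys):
    """Pearson correlation, written out to keep the sketch stdlib-only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
n = 5000
shoe_taste = [rng.gauss(0.0, 1.0) for _ in range(n)]   # latent, unobserved by the bank
overdraft = [1 if t + rng.gauss(0.0, 1.0) > 1.0 else 0 for t in shoe_taste]

# Before the rule is known: recorded shoe spending tracks taste.
spend_before = [100 + 40 * t + rng.gauss(0.0, 10.0) for t in shoe_taste]

# After: shoe lovers (taste > 0) route purchases through others, so their
# recorded spending is drawn like a non-lover's and no longer reflects taste.
spend_after = [
    100 - 40 * abs(rng.gauss(0.0, 1.0)) + rng.gauss(0.0, 10.0) if t > 0 else s
    for t, s in zip(shoe_taste, spend_before)
]

corr_before = correlation(spend_before, overdraft)
corr_after = correlation(spend_after, overdraft)
print(f"correlation with overdraft: before={corr_before:.2f}, after={corr_after:.2f}")
```

The statistical regularity the bank relied on largely collapses once people respond to it, which is why the scoring rule is kept secret.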

Case of Google Flu
• Google Flu: web searches for flu symptoms predicted actual flu cases
• By-product of Google's main service
• But from 2010, not so well: overestimated actual flu cases, partly as a result of the autosuggest feature, partly because the model was overfitted (we'll return to that)
• Best predictor: number of cases in the past week
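
The last point suggests a simple discipline: benchmark any forecasting model against the naive forecast "this week equals last week". The sketch below does this on a synthetic seasonal series (not flu data; the seasonal shape and noise level are assumptions) and compares the naive forecast with an even cruder constant forecast, using mean absolute error.

```python
import math
import random

rng = random.Random(7)
weeks = 156  # three synthetic "years" of weekly counts

# Assumed process: a yearly cycle plus noise. Purely illustrative.
cases = [
    1000 + 800 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 50)
    for t in range(weeks)
]


def mean_abs_error(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


actual = cases[1:]
persistence = cases[:-1]                    # forecast: same as last week
overall_mean = sum(cases) / len(cases)      # uses the full sample, so slightly flattered
constant = [overall_mean] * len(actual)

mae_persistence = mean_abs_error(persistence, actual)
mae_constant = mean_abs_error(constant, actual)
print(f"MAE last-week forecast: {mae_persistence:.0f}")
print(f"MAE constant forecast:  {mae_constant:.0f}")
```

A model built on search data is only interesting if it beats the last-week baseline out of sample.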

Effects of causes vs. causes of effects
Different questions
• Effects of causes: intervention, what is the effect of policy X on outcome Y
• Causes of effects: Why does Z occur?

Effects of causes (forward causal questions)
• Narrow questions, sometimes (but not always) policy interventions
– Effect of tax change on behavior
– Effect of regulation on risk taking
– Effect of schooling on earnings
– Effect of smoking on lung cancer propensity
– Effect of public health on schooling in Africa
– …
• Often, but not always, amenable to treatments/randomization/experimentation

Causes of effects (reverse causal inference)
• Much harder, but often more interesting
– Why do some people smoke?
– What are the causes of democratization?
– Why do some people pursue a PhD while others drop out after primary school?
– Why did Greece (almost) go bankrupt?
• Tensions with "effects of causes" – search for causes sometimes derided as 'party chatter'

What is the data generating process?
Observational: endogenous decisions, researcher passive collector of data
Randomization: treatment-control
(Some) exogeneity: policy interventions, sometimes with comparisons, researchers sometimes involved
Important: more data does not give a better result if the estimator is biased (the estimate only becomes more precise around the wrong value)
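
Why more data does not rescue a biased estimator can be seen in a small simulation. This is a sketch under assumed parameters: an unobserved trait (here called `motivation`, a hypothetical name) drives both the endogenous choice of treatment and the outcome, and the true effect is fixed at 1.0. The naive comparison of group means settles down as n grows, but on the wrong number.

```python
import random

TRUE_EFFECT = 1.0  # assumed causal effect of treatment on the outcome


def naive_difference(n, seed):
    """Difference in mean outcomes between those who chose treatment and those who did not."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)                          # unobserved confounder
        chooses_treatment = motivation + rng.gauss(0.0, 1.0) > 0  # endogenous decision
        outcome = (
            (TRUE_EFFECT if chooses_treatment else 0.0)
            + 2.0 * motivation
            + rng.gauss(0.0, 1.0)
        )
        (treated if chooses_treatment else untreated).append(outcome)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


estimates = {n: naive_difference(n, seed=n) for n in (100, 10_000, 100_000)}
for n, estimate in estimates.items():
    print(f"n={n:>7}: naive estimate={estimate:.2f} (true effect={TRUE_EFFECT})")
```

With these assumptions the large-sample estimate lands far above the true effect: the extra observations buy precision, not correctness.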

Randomized experiments
• Distinguish
– Lab experiments: traditionally computer-based in econ, but also eye tracking/brain images (fMRI)/physiological
– Survey experiments: assign survey respondents to different frames/treatments/primings, e.g. have SocDems and Liberals say the same thing and look at support
– Field experiments: experimental control in the real world (e.g. banks charging different rates to learn about mobility of customers; interventions against teacher absenteeism in India; …)

Randomized experiments
• Distinguish
– Natural experiments (weather induced: effects of poverty on violence, randomization of names on election ballots, …)
– Quasi-experiments (effects of change in policy; effect of tax reform on tax planning; effect of immigrant allocation on crime)
• Throughout: exogenous (outside of the individual) change

Randomized experiments
• Large, important current debate in (development) economics
• Effects of causes (EofC): what are the effects of penalties on teachers' absence in Indian village schools – evidence from randomized experiments
• Randomly selected teachers get harsh penalty for no-shows -> difference in absenteeism = causal effect of penalty
• (Broader causes-of-effects (CofE) question: why is the education sector in rural India so inefficient?)
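
The logic of the teacher experiment is a difference in means with a standard error. The sketch below simulates it under made-up numbers: a 40% baseline absence rate, a true penalty effect of minus 10 percentage points, and a coin-flip assignment. None of these are results from the actual studies.

```python
import math
import random
import statistics


def simulate_trial(n_teachers=2000, true_effect=-0.10, school_days=100, seed=42):
    """Randomly assign a penalty and record each teacher's share of days absent."""
    rng = random.Random(seed)
    penalty_group, control_group = [], []
    for _ in range(n_teachers):
        baseline = min(max(rng.gauss(0.40, 0.10), 0.0), 1.0)  # teacher-specific absence rate
        gets_penalty = rng.random() < 0.5                     # coin flip, unrelated to baseline
        absence_prob = min(max(baseline + (true_effect if gets_penalty else 0.0), 0.0), 1.0)
        days_absent = sum(rng.random() < absence_prob for _ in range(school_days))
        (penalty_group if gets_penalty else control_group).append(days_absent / school_days)
    return penalty_group, control_group


penalty_group, control_group = simulate_trial()
effect = statistics.mean(penalty_group) - statistics.mean(control_group)
std_error = math.sqrt(
    statistics.variance(penalty_group) / len(penalty_group)
    + statistics.variance(control_group) / len(control_group)
)
print(f"estimated effect on absence share: {effect:.3f} (s.e. {std_error:.3f})")
```

Because assignment is independent of each teacher's baseline, the simple difference in means recovers the assumed effect up to sampling noise.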

Randomized experiments
• Strong on internal validity: from randomization, any effect on absenteeism is from harsher penalties; good for testing theory
• Weak(er) on external validity – would the effect be similar in Africa? Would an effect from the lab work outside the lab? Why, why not?
• (compare: medicine works in similar ways across locations)

Randomized experiments
• Challenges
– Limits to what can be studied by experimentation (ethics; law; feasibility)
– Funding (field experiments expensive, survey experiments less so)
– Often participation constraint – voluntary participants' gain >= 0 or no incentive
– Subjects leave for various (systematic) reasons
– Large-scale randomization can be hard in field experiments

Observational data
• Generated without experimental or exogenous intervention
• Typically reveals correlations or descriptive patterns that can be interesting in themselves

Example: Inequality
Source: Piketty and Saez, Science 2014, tax return data

Observational data
• Generated without experimental or exogenous intervention
• Typically reveals correlations or descriptive patterns that can be interesting in themselves
– Are in themselves silent about causality
– Theory may provide structure to learn about causal mechanism under strong assumptions
– May conflate correlation and causality

Observational data
• Example: Does being in private school affect grades?
– Classic: Catholic schools and grades in US
– Collect attendance and grades -> run regression
• But: suppose some parents are more focused on schooling than others
– Send kids to private school more
– More involved in school + homework
• What do higher grades measure?
– Effect of private school OR effect of involved parents?

Observational data
• What to do?
– Assign kids/parents randomly to private schools?
• More complicated
– Waiting-list experiment design: people who sign up reveal themselves as school interested, compare grades between those in program and on waiting list -> much narrower design
– Modeling (US case): use fact that Catholics are much more likely to choose Catholic schools
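
The modeling idea in the last bullet is an instrumental-variable argument, and a simple Wald estimator shows how it would work. The simulation below is a sketch, not the method of any particular paper: parental involvement is an unobserved confounder, being Catholic shifts school choice, and by construction it affects grades only through school choice. That exclusion restriction is a strong assumption which may well fail in real data. All parameter values are made up.

```python
import random

TRUE_EFFECT = 0.3  # assumed effect of Catholic school on grades


def simulate_students(n, seed=3):
    """Return (catholic, catholic_school, grade) for n simulated students."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        involvement = rng.gauss(0.0, 1.0)   # unobserved parental involvement
        catholic = rng.random() < 0.3       # instrument, independent of involvement here
        latent = 0.8 * involvement + (1.5 if catholic else 0.0) + rng.gauss(0.0, 1.0)
        catholic_school = latent > 1.0
        grade = (TRUE_EFFECT if catholic_school else 0.0) + involvement + rng.gauss(0.0, 1.0)
        rows.append((catholic, catholic_school, grade))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate_students(200_000)

# Naive comparison: confounded by parental involvement.
naive = mean(g for _, s, g in rows if s) - mean(g for _, s, g in rows if not s)

# Wald estimator: effect of religion on grades, scaled by its effect on school choice.
reduced_form = mean(g for z, _, g in rows if z) - mean(g for z, _, g in rows if not z)
first_stage = mean(s for z, s, _ in rows if z) - mean(s for z, s, _ in rows if not z)
wald = reduced_form / first_stage

print(f"naive={naive:.2f}  wald={wald:.2f}  true={TRUE_EFFECT}")
```

The naive difference mixes the school effect with involved parents, while the Wald ratio recovers the assumed effect only because the simulation enforces the exclusion restriction.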

Modes of data collection
• (Ethnographic/participant observer)
• Survey
– Interview survey (in person), phone survey, internet survey, …
• Administrative data
– Used for administrative purposes
– Some countries: census, tax return
– DK: CPR-registry based
• (Primary collection: texts, counting)
• "Big data": in social sciences typically a by-product of digital information

Modes of data collection
• Note: survey, admin data, big data can all have randomized/exogenous elements or be purely observational
• Often in lab/field experiments: ask about income, education etc. – but may be biased
• Sometimes: combine experimental data with admin or big data (but rare)

Ethnographic
• Pros
– Attempt to understand situations from participants' perspective
– Very detailed observations (e.g. dynamics at a meeting: who speaks when, who listens, who nods off and flirts etc.)
• Cons
– Very difficult to generalize (if even the goal)
– Typically very small n, not for stats
– Hard to reproduce/replicate

Surveys
• Pros
– Can be cheap
– Elicit info on attitudes, beliefs, expectations
– Necessary when no other means exist
– Combine with open-ended info
– Easily anonymized (firms; China)
• Cons
– Can be expensive
– Non-random samples, sometimes very much so (paid surveys)
– Cheap talk
– Diverse interpretations (e.g. 1-10 scales, Maasai example)
– Very different quality: interview vs. internet
– Not full researcher control: interviewer completions

Administrative data
• Denmark, Norway, Sweden
– Population-wide
– Ex: Know population 'by pressing Enter'
• Most other countries: census (counting people), surveys, rough approximations
– In DK, built on Central Person Registry number
– System constructed for source taxation in 1960s, now used as ubiquitous identifier
• Why do some countries have CPR-like systems and some not?
Big Data in Economics

Administrative data
• Pros
– Often full population
– In DK: third party reported -> no reporting bias, no survey bias
– Very detailed, no survey fatigue
– Often very precise, since used for admin purposes
• Cons
– No soft data (attitudes, expectations); can be linked to surveys
– Privacy concerns
– Restricted to what is collected for admin reasons, both type and frequency (e.g. annual)

Administrative data
• Lots of work in Danish econ utilizes register data
– Taxation
– Education
– Health
– Financial decisions
– Labor market
• Combined with
– Personality measures
– Attitudes/political prefs from surveys
– Expectations from surveys
– Biological data (neuro-measures, genetics)
– Data from experiments

Viva la revolución? Harnessing the Data Revolution for Good
Human Development Report Office

Big data

No agreed-upon definition of what Big Data is
• Large N?
• High frequency/much detail?
• Many different measurements?
• Based on what people do ('honest signals'), in contrast to surveys
– Not always honest
• Different to different people/traditions
• To Americans, Danish admin/register data is big data

'Big data'
• Pros
– Often based on real decisions (as admin data), but more detail, e.g. auctions
– High frequency (e.g. wifi), high granularity -> almost 'large N ethnographic data'
– Sometimes cheap/free
• Cons
– No established protocol for collection
– Sometimes dubious quality, selection issues (both known/unknown)
– Start-up costs
– Even more privacy concerns
– Corporate gatekeepers -> bias in access (Facebook, Google)

Characteristics of 'big data'
• Structured (row/column-style) vs. unstructured (images/sound)
• Temporally referenced (date, time, frequency)
• Geographically referenced (wifi, bluetooth, Google)
• Person identifiable (identify vs. distinguish individuals vs. not distinguish individuals)
– Separate medium (e.g. phone) from owner

Example: Social Fabric
• Large-scale (N=1000) big data project
• Handed out smartphones to DTU freshmen
• Collected phone, SMS/text/email (not content), GPS, wifi, bluetooth data
• -> Where, when, with whom
• -> social networks

Why phone data
• Phones as sociometers
• Many/most people carry phone with them all the time
• Would be IMPOSSIBLE to have people report in detail for every 10 min every day for a year
• For this project: tailored software, but realized that many apps collect detailed wifi data without telling
• Concern: take-up of phones

Example: Social Fabric
Phone locations 0500h Monday morning -> can predict where people are at a given time with 85% accuracy

Example: Social Fabric
(Figure labels: 10 min, GPS, wifi)

Example: peer effects in education economics
• Students allocated to study and social groups, called vector groups (randomly)
• Are there peer effects, i.e. are students' grades/health behavior/study behavior affected by the group?
• Literature: sometimes yes, sometimes no; very heterogeneous
• Why? Perhaps being allocated to a group is not the same as actually meeting/using the group

Example: peer effects
• Think of allocation to group as intention to treat (similar to offering treatment)
• Interesting example: Carrell et al., ECMA 2013. Small groups: yes, peer effects; large groups: no/negative peer effects – WHY?
• Use phone to measure frequency of group members being together physically, measured by bluetooth
• Three parts: (i) yes, they are more together; (ii) more together => work better together; (iii) peer effects?

Broader issue: Who meets, and how close are they?
• Again: use bluetooth signals to measure meetings (duration, participants)
• Analyzes 3.1 mio. meetings over two months
• Some results:
– Women/women pairs -> closer
– Facebook friends -> closer
– Same study -> closer
– Difference in beauty -> further apart
– One overweight, one not -> further apart
• People who stand very (too) close to others have fewer friends (!?)
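
Turning raw Bluetooth sightings into meetings with a duration could look roughly like this. The record layout `(minute, person_a, person_b)`, the 5-minute scan interval and the 10-minute gap rule are all assumptions for illustration; the project's actual pipeline is not described in the slides. The sketch handles pairs only, so a three-person meeting shows up as three pairwise meetings.

```python
from collections import defaultdict

SCAN_INTERVAL = 5  # minutes between Bluetooth scans (assumed)


def extract_meetings(sightings, max_gap=10):
    """Merge consecutive sightings of the same pair into (pair, start, end) meetings."""
    by_pair = defaultdict(list)
    for minute, person_a, person_b in sightings:
        by_pair[tuple(sorted((person_a, person_b)))].append(minute)

    meetings = []
    for pair, minutes in by_pair.items():
        minutes.sort()
        start = previous = minutes[0]
        for minute in minutes[1:]:
            if minute - previous > max_gap:   # gap too long: close the current meeting
                meetings.append((pair, start, previous + SCAN_INTERVAL))
                start = minute
            previous = minute
        meetings.append((pair, start, previous + SCAN_INTERVAL))
    return sorted(meetings, key=lambda m: (m[1], m[0]))


# Hypothetical sightings: (minute of day, device owner, device seen)
sightings = [
    (0, "anna", "bo"), (5, "bo", "anna"), (10, "anna", "bo"),
    (60, "anna", "bo"),
    (5, "bo", "cy"), (10, "bo", "cy"),
]
meetings = extract_meetings(sightings)
for pair, start, end in meetings:
    print(pair, f"start={start}", f"duration={end - start} min")
```

Signal strength, which the slides use as a proxy for physical distance, is left out here; it would enter as an extra field per sighting.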

Prediction vs causality
• Measure class attendance from phone data (wifi/GPS/bluetooth)
– Either: construct clusters at slots known as teaching time; or: use admin info on class locations and construct GPS overlays
• Facebook activity
• Predict grades
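
The second approach, overlaying GPS pings on known class locations, can be sketched as follows. The timetable format, the 75-meter radius, the rule that at least half the pings in a slot must be nearby, and the coordinates are all hypothetical choices made for this example.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    radius = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def attended(pings, session, radius_m=75, min_share=0.5):
    """True if at least min_share of the pings during the session are near the classroom."""
    start, end, lat, lon = session
    in_slot = [p for p in pings if start <= p[0] < end]
    if not in_slot:
        return False  # no data in the slot: treated as absent in this sketch
    nearby = sum(distance_m(p[1], p[2], lat, lon) <= radius_m for p in in_slot)
    return nearby / len(in_slot) >= min_share


# Hypothetical timetable: (start minute, end minute, classroom lat, classroom lon)
timetable = [(600, 690, 55.6870, 12.5700), (780, 870, 55.6870, 12.5700)]

# Hypothetical pings for one student: (minute of day, lat, lon)
pings = [
    (600, 55.6871, 12.5701), (610, 55.6870, 12.5700), (620, 55.6871, 12.5699),
    (790, 55.6600, 12.5900), (800, 55.6601, 12.5901),
]

attended_flags = [attended(pings, session) for session in timetable]
attendance_rate = sum(attended_flags) / len(attended_flags)
print(attended_flags, attendance_rate)
```

An attendance rate built this way is a predictor of grades; whether attendance causes grades is a separate question.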

Prediction vs causality
Attendance -> grades/comprehension
– People who attend more learn more
– People who spend less time on Facebook have more time for studying
AND/OR
Grades/comprehension -> attendance
– Find courses hard -> stay at home, more tempted by Facebook

Example: CSS
Heatmap of people with mobile devices on CSS (anonymous)
![Page 57: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/57.jpg)
Example: David on Saturday
![Page 58: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/58.jpg)
Example: David some Saturday
Flea market
![Page 59: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/59.jpg)
Example: how to measure consumer spending
• Economically important:
  – Indicator of the health of the economy
  – Important for understanding individual responses to policy
  – Ditto for responses to economic shocks
  – Important for consumer prices -> inflation -> adjustments of wages and transfers
  – In developing countries: important for estimates of poverty and inequality
![Page 60: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/60.jpg)
Example: consumer spending
• Traditional methods:
  – Consumer expenditure surveys (DK: forbrugsundersøgelsen)
  – Diary or scanner
  – Errors, selection
• Economists wanted access to individual spending data from Dankort for a long time
  – No luck
• Recently, Statistics Denmark got access to COOP card data to measure inflation
  – To be made public soon; pretty good fit with existing measures (and much faster)
  – Nice idea, incentive compatible
  – Independent of payment type
  – But selection?
![Page 61: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/61.jpg)
Example: consumer spending
• Attempts in developing economies
  – Use smartphones as scanner or means of payment
  – What can we infer about individuals from smartphone use (dedicated users)?
  – Selection into who has smartphones
  – But should be seen against other ways of collecting data
• Qs:
  – How can we use smartphones to infer spending better?
  – What kinds of economically interesting data can we collect via smartphones?
![Page 62: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/62.jpg)
Statistical analysis of Big Data
• Many observations: what does statistical significance mean?
  – And what is practical relevance? Size of effects
• Multiple testing problems? If big data generates many variables, why not run through them all to see what is significant?
  – Correct standard errors
• In some cases, 'eyeball econometrics' can be difficult
  – Need systematic approach
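The two concerns on this slide can be illustrated with a small simulation. This is a sketch under stated assumptions: the effect size of 0.02 standard deviations, the sample sizes and the number of noise variables are arbitrary, the test is a plain large-sample z-test, and the Bonferroni threshold is swapped in as one simple multiple-testing adjustment (the slide itself only says to correct the standard errors).

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def mean_test(sample):
    """Large-sample z-test of H0: mean = 0. Returns (sample mean, p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = mean / math.sqrt(var / n)
    return mean, two_sided_p(z)

rng = random.Random(1)

# 1) Many observations: a tiny true effect (assumed 0.02 standard deviations)
#    becomes statistically significant, yet the effect itself stays small.
big_sample = [rng.gauss(0.02, 1) for _ in range(200_000)]
effect, p_big = mean_test(big_sample)

# 2) Many variables: 200 pure-noise variables, each tested at the 5% level.
n_vars, n_obs, alpha = 200, 500, 0.05
p_values = [
    mean_test([rng.gauss(0, 1) for _ in range(n_obs)])[1]
    for _ in range(n_vars)
]
naive_hits = sum(p < alpha for p in p_values)
bonferroni_hits = sum(p < alpha / n_vars for p in p_values)

print(f"estimated effect {effect:.3f}, p-value {p_big:.2g}")
print(f"'significant' noise variables: {naive_hits} naive, "
      f"{bonferroni_hits} after Bonferroni")
```

With 200 unrelated variables, roughly 5% are expected to pass the naive threshold by chance alone, which is why running through all variables and reporting what is significant needs an explicit correction.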
![Page 63: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/63.jpg)
Statistical/machine learning
• Suppose you have no or very little theory to guide you
• OLS is not only linear, but also presumes some idea of what actually goes in there and how
• Varian's Titanic example: who survived the Titanic?
  – Two variables: class and age
  – Researcher decides/guesses vs. data analysis yields the most likely (decision tree, but lots more complicated -> Sebastian, later)
  – Einav, Levin: Econ should consider machine learning
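To make the contrast between "researcher decides" and "data analysis yields" concrete, here is a sketch of the first step of a decision tree: searching all candidate splits on class and age and keeping the one with the lowest Gini impurity. The twelve rows are invented for illustration and are not the actual Titanic records, and the function names are hypothetical. Real tree learners repeat this search recursively and add pruning.

```python
# Toy rows invented for illustration only; NOT the actual Titanic data.
# Each row: (passenger_class, age, survived)
rows = [
    (1, 35, 1), (1, 52, 1), (1, 8, 1), (2, 6, 1),
    (2, 30, 1), (2, 45, 0), (3, 4, 1), (3, 22, 0),
    (3, 28, 0), (3, 40, 0), (3, 19, 0), (1, 60, 0),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """Try every (feature, threshold) pair and return the split with the
    lowest weighted Gini impurity as (score, feature_name, threshold)."""
    features = {"class": 0, "age": 1}
    best = None
    for name, idx in features.items():
        for threshold in sorted({r[idx] for r in rows}):
            left = [r[2] for r in rows if r[idx] <= threshold]
            right = [r[2] for r in rows if r[idx] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best

score, feature, threshold = best_split(rows)
print(f"first split: {feature} <= {threshold} (weighted Gini {score:.3f})")
```

The point of the sketch is that the variable and the cut-off are chosen by the data rather than specified in advance, in contrast to an OLS specification written down by the researcher.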
![Page 64: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/64.jpg)
Statistical analysis of Big Data
• But what if you have theory (or think you have)?
  – E.g. combine econometrics and machine learning
• Goes back to an old debate in economics
  – Milton Friedman (1953): judge a model by its predictions, not its assumptions
  – Machine learning is made for prediction, not for hypothesis testing and theory (in)validation
![Page 65: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/65.jpg)
Roadmap
• Different data for different questions
• Theory and empirics, forecasting and hypothesis testing
• Effects of causes vs. causes of effects
• Data generating process
• Modes of data collection – pros and cons
• Strategic data management and data production
![Page 66: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/66.jpg)
Strategic data management and production
• People/firms/governments do not always provide truthful and/or complete data
• Example: no penalty for lying in surveys, but no reason not to either
• Political reasons for obscuring or inventing data: Greece in the EU, the Chinese economy
• Firms: proprietary info, competition reasons, fooling customers and regulators (VW)
![Page 67: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/67.jpg)
Strategic data management and production
• Individual demand for privacy (we return to this)
  – Could be instrumental:
    • Lack of privacy decreases consumer surplus through a better estimate of the reservation price (e.g. steering: Mac vs PC when ordering online)
    • Concerns about political issues
  – Or an objective in itself: privacy as a political goal
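The instrumental argument for privacy can be worked through with a few numbers. This is a stylized sketch: the five reservation prices, the posted price of 60 and the "quote 10 below true value" rule are all hypothetical, chosen only to show how consumer surplus shrinks as the seller's estimate of each reservation price improves.

```python
# Hypothetical reservation prices (maximum willingness to pay) of five consumers.
reservation_prices = [40, 60, 80, 100, 120]

def surplus_uniform(prices, posted_price):
    """Consumer surplus when everyone faces the same posted price."""
    return sum(r - posted_price for r in prices if r >= posted_price)

def surplus_personalized(prices, quotes):
    """Consumer surplus when each consumer gets an individual quote and
    buys only if the quote does not exceed her reservation price."""
    return sum(r - q for r, q in zip(prices, quotes) if r >= q)

uniform = surplus_uniform(reservation_prices, posted_price=60)
# Seller with a rough estimate: quotes 10 below each true reservation price.
noisy = surplus_personalized(
    reservation_prices, [r - 10 for r in reservation_prices]
)
# Seller with a perfect estimate: quotes exactly the reservation price.
perfect = surplus_personalized(reservation_prices, reservation_prices)
print(uniform, noisy, perfect)  # prints: 120 50 0
```

Under these assumed numbers, surplus falls from 120 with a single posted price to 50 with rough personalized quotes and to 0 when the seller knows each reservation price exactly.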
![Page 68: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/68.jpg)
Social desirability bias I
• Key concern in surveys, but a more general problem: what if people answer so as to conform with general notions of what's desirable?
  – Examples: won't admit to not voting or to having sexually transmitted diseases; exaggerate income
  – Report buying healthy food vs unhealthy food
  – Important for asking/assessing sensitive questions
![Page 69: Social Data Science Data and Big data · (paid surveys) – Cheap talk – Diverse interpretations (e.g. 1-10 scales, Maasai example) – Very different quality: interview vs. internet](https://reader033.vdocuments.us/reader033/viewer/2022050517/5fa10555bb5858468e1e06c7/html5/thumbnails/69.jpg)
Social desirability bias II
• Why?
• Distinguish
  a) self-deception
  b) impression management
• Example: What do you value most in a potential mate?
  – People say: "kind and understanding"
  – From dating data: physical attractiveness, status
  – Bias could be both (a) and (b)