week04 learning from data handout(1)


Upload: rhye999

Post on 30-Sep-2015

  • 1

    Week04: Learning from Data

    What are/is Statistics?
    1. Familiar (plural) meaning = facts and figures

  • 2

    What are/is Statistics?

    Source: Silver, Nate. "Post-Midterm Ratings Don't Predict Re-election Chances," New York Times (1/5/11)

    What are/is Statistics?
    http://sda.berkeley.edu/GSS

    Source: 1972-2008 General Social Survey (GSS)

    What are/is Statistics?
    2. Academic (singular) meaning = collecting, organizing, interpreting and reporting

    Source: OECD Health Data (2005)

  • 3

    What are/is Statistics?

    1657: Christiaan Huygens publishes first printed work on games of chance (probability = first main historical line of modern statistics)

    1662: John Graunt's Natural and Political Observations Made Upon the Bills of Mortality (analysis of social data = second main historical line of modern statistics)

    1665: Sir William Petty (father of econometrics) publishes first known national income estimates

    1693: Edmund Halley produces first correct life table (showing link between age & death)

    1790: First decennial U.S. Census (start of oldest periodic continuous census)

    1828: Adolphe Quetelet publishes first general statistics handbook in Belgium

    1854: John Snow shows link between contaminated water and London cholera deaths

    1837-1880: William Farr develops the field of vital statistics

    1880s-1940s: Work by Galton, Pearson, Wright, Spearman, Hotelling, Wilks & Neyman on the mathematics of evolution, heredity and psychology enriches statistics

    What are/is Statistics?

    Collecting sample data to answer questions of interest (Part I of course)

    Describing or summarizing sample data (Part II of course)

    Inferring (making decisions or predictions for a population) from sample data (Part III of course)

    Why Study Statistics?

    1. Objectivity, thought to be captured best by random samples of statistical (quantitative) data, is highly and widely valued (Porter 1995, p. 3)

    When do you find it difficult to be objective?

    Why do lawyers ask if jury members can be impartial?

    Are scientists always objective?

  • 4

    Why Study Statistics?

    2. Real world relevance: Results generated by statistical analysis reflecting a population average (rather than one or few personal stories) are embraced by businesses for industrial quality control, employed by quantitatively-oriented researchers, and enshrined by public policy experts as most representative of population behavior and health (Porter 1986, p. 3).

    Sociologists Using Econometrics

    Source: Cohen (01/18/10 New York Times, C1, C8)

  • 5

    Learning about a Population from a Sample

    1. We typically collect data from individuals in a household, or telephone, sample survey to obtain information about a population (which we cannot observe due to cost and other constraints).

    2. 2001 Los Angeles County Mexican Immigrant Legal Status Survey (LAC-MILSS), 2007 Boston Metropolitan Immigrant Health & Legal Status Survey (BM-IHLSS), and 2012 L.A. County Mexican Immigrant Health & Legal Status Survey

    30 Years of Legal Status Sample Surveys

    1980-1981 Los Angeles County Parents Survey (LACPS)
    1988 National Agricultural Workers Survey (NAWS)
    1994/2001 L.A. County Mexican Immigrant Legal Status Survey (LAC-MILSS)
    1996 Survey of Income and Program Participation (SIPP)
    1996-1997 Hispanic Immigrant Health Care Access Survey (HIHCAS)
    1999-2000 L.A.-NYC Immigrant Survey (LANYCIS)
    1999 California Health Interview Survey (CHIS)
    2000-2001 L.A. Family and Neighborhood Survey (LAFANS)
    2004 Mexican Immigrant Migration and Mobility Statusu (MIMMS)
    2005 Chicago Metro Mexican-origin Population Study
    2007 Boston Metro Immigrant Health & Legal Status Survey (BM-IHLSS)
    2012 L.A. County Mexican Immigrant Legal Status Survey (LAC-MIHLSS)

    2007 Boston Metropolitan Immigrant Health & Legal Status Survey (BM-IHLSS)

    Harvard University & UMASS Boston
    Enrico Marcelli, Ph.D., Principal Investigator
    Gary Bennett, Ph.D., Co-Principal Investigator
    Howard Koh, Ph.D., Co-Principal Investigator
    Phillip Granberry, Ph.D., Project Manager (BM-IHLSS)
    Louisa Holmes, Project Manager (BM-IHLSS)
    Orfeu Buxton, Ph.D., Consultant
    Anthony Roman, MA, Consultant
    Jonathan Winickoff, Ph.D., Consultant

    Community Partners
    Fausto de Rocha, Executive Director, Brazilian Immigrant Center
    Magalis Troncoso, Executive Director, Dominican Development Center

    Robert Wood Johnson Foundation, NCI, UMASS Boston, & Blue Cross Blue Shield Foundation of Massachusetts

  • 6

    2007 BM-IHLSS Data

    Two systematic block-level probability household samples of 307 foreign-born Brazilian adults (and 120 of their children) and 299 Dominican adults (and 74 of their children) residing in the BCQ-MSA

    Data collected between June and September 2007 by 50 student and other foreign-born interviewers trained at UMASS Boston

    Instrument included household roster, adult questionnaire, child questionnaire, and biological data collection checklist

    Five sections of adult questionnaire: (1) Migration experience, (2) SES, (3) Social Capital, (4) Health, and (5) Socio-political identity
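
    The idea behind a systematic probability sample can be sketched in a few lines. This is a minimal illustration only, not the BM-IHLSS field procedure: the household list, the sample size of 50, and the function name `systematic_sample` are all hypothetical, and a real block-level design would add stages (selecting blocks first, then households) that are omitted here.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick n units from an ordered frame: random start, then every k-th unit."""
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start inside the first interval
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(2007)          # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 1,000 household IDs listed in block order.
frame = [f"household-{i:04d}" for i in range(1000)]

sample = systematic_sample(frame, 50, rng)
print(len(sample), sample[:3])
```

    Because the start is chosen at random, every household on the list has the same chance (here 1 in 20) of being selected, which is what makes this a probability sample.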

    2007 BM-IHLSS Sampling Frame

    427 Brazilian Subjects from 73 Neighborhoods in Middlesex County

  • 7

    373 Dominican Subjects from 84 Neighborhoods Located in Essex County

    Sociogeographic Model of Insufficient Sleep

    [Figure: model diagram, set within the METROPOLITAN AREA, linking sociogeographic factors and individual-level factors to the outcome through an Individual-Sociogeographic Interaction]

    SOCIOGEOGRAPHIC FACTORS
    1. Home: Income, Tenure; Sleep partner, Children; Meals, Noise
    2. Work/School: Travel and work time; Exposure to smoke, etc.; Co-worker trust
    3. Neighborhood: Population density; Homeownership; Disorder, Noise
    4. Civic Groups: Church, PTA, CBO; Sports, Music, etc.; Internet-based

    INDIVIDUAL-LEVEL FACTORS
    5. Socioeconomic Status: Age, sex, skin pigmentation; Time in U.S.A., migration experience; Migrant legal status; Education, Earnings
    6. Health: Biomarkers, BMI, Diabetes, etc.; Diet, Physical activity, Sleep Meds; Cigarette smoking, Alcohol

    OUTCOME: Healthy Diet; Sleep Behavior

    Our two BM-IHLSS samples included 299 Dominican and 307 Brazilian adults (or 606 subjects), of whom 599 provided responses for the questions (variables) included in Marcelli & Buxton's (2011) study on how several sociogeographic factors influenced whether migrants slept 7-9 hours on workdays.

    Descriptive vs. Inferential Statistics

    Foreign-born (FB) Brazilian & Dominican adults in our sample were 36 years old on average, 48% were male, 9% had a college degree, 39% were unauthorized to reside in the USA, and about two-thirds were sleeping a healthy number of hours each workday (sample descriptive statistics).

    We are 68% confident that the mean age of all foreign-born Brazilian and Dominican adults residing in the Boston metropolitan area fell between 24 and 48 years, and that mean skin color on a scale of 1-10 fell between XX and YY (interval population parameter estimates) . . .
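
    The difference between describing a sample and estimating a population parameter can be sketched as follows. The ages are simulated, not taken from the BM-IHLSS file; the mean of 36 and standard deviation of 12 are illustrative values chosen for the simulation, and the sketch assumes a simple random sample with roughly normal ages.

```python
import math
import random
import statistics

rng = random.Random(36)            # fixed seed so the sketch is repeatable

# Simulated ages for 599 adults (NOT the BM-IHLSS data).
ages = [rng.gauss(36, 12) for _ in range(599)]

# Descriptive statistics: they summarize the sample itself.
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)    # sample standard deviation (n - 1 denominator)

# For roughly normal data, about 68% of individuals fall within one SD of the mean.
spread = (mean_age - sd_age, mean_age + sd_age)

# Inferential statistics: an approximate 68% confidence interval for the
# population mean uses the standard error, SD / sqrt(n), not the SD itself.
se_mean = sd_age / math.sqrt(len(ages))
ci_68 = (mean_age - se_mean, mean_age + se_mean)

print("one-SD range of individual ages:", spread)
print("approx. 68% CI for the mean age:", ci_68)
```

    With several hundred respondents the standard error is much smaller than the standard deviation, so the interval for the population mean is far narrower than the range that covers most individual ages.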

  • 8

    Sleep Duration among Foreign-born Brazilian and Dominican Migrant Adults in the Boston Metropolitan Area, 2007 BM-IHLSS

    Use of Computer Technology and Data Files (Databases) to Perform Statistical Analysis

    Various competing statistical software packages are available on calculators and computers (e.g., STATA, SPSS, SAS), but YOU, not the software, must decide which statistical tools should be used to answer a specific question. This is why it is important to understand how to compute means, standard deviations, etc., as well as under what conditions different software commands are to be used.
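
    To make the point about knowing the computation concrete, here is a minimal sketch of a mean and a sample standard deviation worked out step by step, then checked against Python's built-in `statistics` module. The sleep hours are hypothetical values invented for the example.

```python
import math
import statistics

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation: root of squared deviations over n - 1."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

# Hypothetical workday sleep hours for seven respondents.
hours = [6.5, 7.0, 8.0, 5.5, 7.5, 9.0, 6.0]

print("by hand:", mean(hours), sample_sd(hours))
print("library:", statistics.mean(hours), statistics.stdev(hours))
```

    The two lines of output should agree, which is the kind of check that tells you a software command is computing what you think it is.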

    Statistical analysis requires that data be organized (structured) electronically and sequentially in a data file (e.g., Excel spreadsheet) to be analyzed . . .
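
    A small sketch of what such a data file looks like and how software reads it: one row per subject, one column per variable. The variable names and the four records below are invented for illustration and are not from any survey.

```python
import csv
import io

# Hypothetical data file contents: one row per subject, one column per variable.
raw = """id,age,male,unauthorized,sleep_hours
1,34,1,0,7.5
2,41,0,1,6.0
3,29,0,0,8.0
4,52,1,1,5.5
"""

# Read the rows into dictionaries keyed by the column (variable) names.
rows = list(csv.DictReader(io.StringIO(raw)))

ages = [int(r["age"]) for r in rows]
# Flag subjects who sleep 7 to 9 hours on workdays.
healthy = [7 <= float(r["sleep_hours"]) <= 9 for r in rows]

mean_age = sum(ages) / len(ages)
share_healthy = sum(healthy) / len(healthy)
print(len(rows), "subjects; mean age", mean_age, "; share sleeping 7-9 hours", share_healthy)
```

    In practice the text would come from a file on disk (for example, a spreadsheet exported as CSV) rather than from a string inside the program.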

    STATA Computer Code Used for Analysis

  • 9

    Sample Randomness and Size (Percent Unauthorized Migrant)

    Sample Variability: Unauthorized Migrants, 2007 BM-IHLSS, Percent
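
    How sample size affects sample-to-sample variability can be explored with a short simulation. This is a sketch under stated assumptions: the population share of 39% is borrowed from the sample figure quoted earlier and treated as if it were the true population value, the samples are simple random draws from a very large population, and the function name `sample_percents` is made up for the example.

```python
import random
import statistics

def sample_percents(pop_share, n, draws, rng):
    """Percent 'unauthorized' in each of `draws` simple random samples of size n."""
    return [
        100 * sum(rng.random() < pop_share for _ in range(n)) / n
        for _ in range(draws)
    ]

rng = random.Random(4)             # fixed seed so the sketch is repeatable

for n in (25, 100, 600):
    pcts = sample_percents(0.39, n, 500, rng)
    print(
        f"n={n:>3}: average of sample percents = {statistics.mean(pcts):.1f}, "
        f"spread (SD) across samples = {statistics.stdev(pcts):.1f}"
    )
```

    Each random sample gives a somewhat different percent, but the sample percents cluster around the population value, and the spread across samples shrinks as the sample size grows.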