sta 304 h1f/ 1003 h1f summer 2015 · lecture 1 may 12, 2015 25. types of probability sampling...


STA 304 H1F/ 1003 H1F Summer 2015

Course Title: Surveys, Sampling and Observational Data

Lectures: Tuesdays and Thursdays 6-9pm in SS 2117

Course website: Available through https://portal.utoronto.ca (UT Blackboard)

Instructor: Dr. Shivon Sue-Chee (E-mail: [email protected])
Office hours: T and R 4-5pm in SS 6026

Teaching Assistants: Reihaneh, David, and Jinyoung
Office hours: T, W and R 5-6pm in SS 1091

Lecture 1 May 12, 2015 1


Course info

Content:

- Mathematical and statistical reasoning behind sampling
- Aspects of inference from surveys
- Observational studies
- Statistics and Society: ‘In the News’

Resources:

- Textbooks
- Other: journals, newspapers, radio

Prerequisite: ECO220Y1/ECO227Y1/EEB225H1/GGR270Y1/PSY201H1/SOC300Y1/STA220H1/STA248H1/STA255H1/STA261H1

Exclusion: STA322H1


Textbooks

Main text (ESS)
Additional reference (Lohr)


Main ESS Textbook Chapters

Topic                                          Chapter
Basic Sampling Concepts and Definitions        1, 2
Types of Samples                               2
Questionnaire Design, Good and Bad Surveys     2
Statistics Review, Probability samples         3
Simple Random Sampling                         4
Stratified Random Sampling                     5
------------------------------------------------------
Ratio and Regression Estimation                6
Systematic Sampling                            7
One-Stage Cluster Sampling                     8
Two-Stage Cluster Sampling                     9
Estimating the Population Size                10


Evaluation

- Assignments (20%): TBA
  - One before and one after the midterm
  - Mostly practical
- Midterm Test (30%): Tuesday, June 2, 2015
- Final Exam (50%): between June 22 and 26, 2015; date TBA
- Regular homework:
  - given during class
  - mostly from the textbook(s)
  - not graded
  - discussed in class or in office hours (OH)
  - a subset will appear on the assignments, test, or exam


More course info

- No make-up test; same-day remark policy
- Computing: some statistical computing is required
- Course website
- Communication
- Students in 1003
- Summer sessions are intensive!
- UofT is committed to accessibility and academic integrity.
- Class survey experience


Short in-class questionnaire

1. Which prerequisite(s) did you take? (Circle all that apply): ECO220 ECO227 GGR270 PSY201 SOC300 STA220 STA221 STA248 STA255 STA261

2. Which UofT campus are you from?

3. What passing mark (50-100) do you hope to achieve?

4. How many hours of TV did you watch last night?

5. What is your height, in inches?

6. Do you currently have a full- or part-time job?

7. What is your handspan, in cm? (Use the ruler overleaf.)


Outline

- Planning investigations: to reduce bias and variability
- Types of investigations: experiment vs survey
  - Examples:
    - Census
    - Meta-analysis
    - A sample survey
- Who conducts surveys?
- Technical terms and concepts
- Sources of errors
- Probability sampling
- Computing


Investigations

- Why? To learn about a research question in the presence of uncertainty
- How?
  - carefully planned, to lead to a clearer interpretation of results
  - choose the material to study and the features to be measured
  - avoid, as much as possible, systematic errors, which cause bias
  - control, as much as possible, the effect of random errors on the conclusions, which causes variability
  - subject to constraints on resources and methods
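The two goals named above, avoiding systematic error (bias) and controlling random error (variability), can be illustrated with a small simulation. This is a minimal Python sketch on a made-up population; the function names `one_survey` and `bias_and_sd`, the seed, and all numbers are hypothetical choices for illustration. A constant shift added to every measurement stands in for systematic error, and extra Gaussian noise stands in for random error.

```python
import random
import statistics

random.seed(304)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 measurements (made-up numbers).
population = [random.gauss(70, 10) for _ in range(10_000)]
TRUE_MEAN = statistics.mean(population)


def one_survey(n, systematic_error=0.0, noise_sd=0.0):
    """Mean of an SRS of size n in which every measurement is shifted by
    `systematic_error` (always in the same direction) and perturbed by
    random noise with standard deviation `noise_sd`."""
    sample = random.sample(population, n)
    measured = [y + systematic_error + random.gauss(0, noise_sd) for y in sample]
    return statistics.mean(measured)


def bias_and_sd(n, systematic_error=0.0, noise_sd=0.0, reps=1000):
    """Repeat the survey `reps` times; return (bias, SD) of the estimates."""
    estimates = [one_survey(n, systematic_error, noise_sd) for _ in range(reps)]
    return statistics.mean(estimates) - TRUE_MEAN, statistics.stdev(estimates)


results = {
    "careful, n=50": bias_and_sd(50),
    "systematic error, n=50": bias_and_sd(50, systematic_error=5),
    "systematic error, n=500": bias_and_sd(500, systematic_error=5),
    "extra random error, n=50": bias_and_sd(50, noise_sd=20),
}
for design, (bias, sd) in results.items():
    print(f"{design:25s} bias = {bias:6.2f}   SD = {sd:5.2f}")
```

In this setup the systematic shift of 5 shows up as a bias of about 5 whether n is 50 or 500, while the extra random error mainly widens the spread of the estimates. A larger sample reduces variability, not bias.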


Types of investigations

- Designed experiment vs observational study
  - Observational study:
    - at a single point in time (“cross-sectional”)
    - at several time points (“repeated measures/time series/longitudinal”)
- Data can be collected retrospectively or prospectively
- Census versus a sample
- Secondary analysis of data collected for another purpose
- Meta-analysis: statistical assessment of a collection of studies on the same topic


Who conducts surveys?

- Government (see Ch. 1), e.g., Statistics Canada
- Polling organizations: Gallup, Harris, Ipsos-Reid, Nanos, ...
  - Cite the source when using their data (e.g., “CTV/Globe/CP24/Nanos Poll”)
- University research centres, e.g., the Institute for Social Research at York U, NORC at U Chicago
- Businesses and industry associations: banks, hospitals, BBM, credit companies, auditors, market researchers
- Social scientists, natural resource scientists, ...
- Philosophers
- ...


Some Terminology and Concepts (§2.2)

- Element, or observation unit: an object to be measured
- Variable: a characteristic measured on each element
- Population: all elements about which inference is desired
- Sampling unit: one or more elements from the population
- Sampling frame: a list of sampling units
- Sample: the selected sampling units, i.e., a subset of the population


Some Terminology and Concepts (cont’d)

- Is the target population ≡ the sampled population? Example?
- What is the purpose of most sample surveys?
  - Very generally, to estimate a population mean or total of a quantity
  - Often, to relate this quantity to other measurements
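As a rough illustration of the two most common targets, a population mean and a population total, here is a minimal Python sketch using the natural estimators ȳ (the sample mean) and Nȳ computed from a simple random sample; their properties are developed with SRS in Chapter 4. The TV-hours population (echoing question 4 of the class questionnaire) is made up, and `estimate_mean_and_total` is a hypothetical helper name.

```python
import random
import statistics


def estimate_mean_and_total(sample_values, population_size):
    """Return the sample mean ybar and the estimated population total N * ybar."""
    ybar = statistics.mean(sample_values)
    return ybar, population_size * ybar


random.seed(2015)
# Hypothetical population: hours of TV watched last night by N = 200 students.
N = 200
tv_hours = [random.choice([0, 0.5, 1, 1.5, 2, 3, 4]) for _ in range(N)]

sample = random.sample(tv_hours, 20)  # SRS of n = 20 students, without replacement
ybar, total_hat = estimate_mean_and_total(sample, N)
print(f"estimated mean: {ybar:.2f} h per student")
print(f"estimated total: {total_hat:.0f} h; true total: {sum(tv_hours)} h")
```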


Example 1: Target vs sampled population

Suppose we want to estimate the average power output of cars owned by residents of Toronto. Additionally, suppose we are interested in comparing neighbourhoods.

- Element: A resident
- Variable: Power output of the resident’s car, in horsepower
- Population: All residents of Toronto
- Sampling frame: Could be a list of all households in Toronto
- Sampling unit: A household


Example 2: Definitions

A survey is conducted to find the average weight of cows in a region. A list of all farms in the region is available, and 50 farms are selected at random. Then the weight of each cow on the 50 selected farms is recorded.

- Element: A cow
- Variable: Weight
- Population: All cows in the region
- Sampling frame: A list of all farms
- Sampling unit: A farm
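In Example 2 each sampling unit (a farm) contains many elements (cows), so the design is in effect a one-stage cluster sample (Ch. 8). The Python sketch below mimics it on an invented frame of 400 farms; the number of farms, the herd sizes, and the weight distribution are assumptions for illustration only. The average weight per cow is taken as the total weight divided by the number of cows on the selected farms.

```python
import random

random.seed(50)

# Hypothetical sampling frame: a list of 400 farms; each farm is represented by
# the (made-up) weights, in kg, of the cows on it.
frame = [
    [random.gauss(600, 60) for _ in range(random.randint(5, 80))]
    for _ in range(400)
]

# Sampling units are farms: select 50 farms at random, without replacement.
selected_farms = random.sample(frame, 50)

# Elements are cows: record the weight of every cow on every selected farm.
weights = [w for farm in selected_farms for w in farm]

# Average weight per cow = total weight / number of cows on the selected farms.
avg_weight = sum(weights) / len(weights)
print(f"{len(selected_farms)} farms, {len(weights)} cows, average {avg_weight:.1f} kg")
```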


Sources of Errors

Our estimate can be affected by two types of error:

1. Sampling errors: errors in estimation due to randomness. These errors can be described in probabilistic terms, e.g., estimate ± margin of error, together with the probability associated with that margin.

2. Non-sampling errors: errors in estimation that are not due to randomness, but to the way in which the sample was selected or the data were collected.

   (i) Errors of nonobservation: selection bias
       - Sample of convenience, judgement sample, undercoverage, overcoverage, nonresponse

   (ii) Errors of observation: measurement error
       - Misinterpretation of the questionnaire, telescoping, respondent inaccuracies, wording, processing errors, etc.
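The phrase "estimate ± margin of error, with an associated probability" can be made concrete with a simulation. The Python sketch below assumes simple random sampling without replacement and uses a common rule-of-thumb margin of two estimated standard errors, 2·sqrt((1 - n/N)·s²/n), which should cover the true mean in roughly 95% of repeated surveys. The height population, the helper name `mean_with_margin`, and the sample sizes are hypothetical.

```python
import math
import random
import statistics


def mean_with_margin(sample, N):
    """Sample mean and an approximate 95% margin of error (two standard errors)
    for an SRS of size n drawn without replacement from N units;
    (1 - n/N) is the finite population correction."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    margin = 2 * math.sqrt((1 - n / N) * s2 / n)
    return ybar, margin


random.seed(304)
N = 5000
population = [random.gauss(67, 4) for _ in range(N)]  # hypothetical heights, inches
mu = statistics.mean(population)

# Repeat the survey many times and count how often ybar +/- margin covers mu.
reps, n, hits = 1000, 40, 0
for _ in range(reps):
    ybar, margin = mean_with_margin(random.sample(population, n), N)
    if ybar - margin <= mu <= ybar + margin:
        hits += 1
coverage = hits / reps
print(f"true mean {mu:.2f} in; the interval covered it in {coverage:.1%} of surveys")
```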


Errors of Non-observation - Selection Bias

Selection Bias: occurs when parts of the target population are not included in the sampled population, or when some units are sampled at a different rate than intended.


Selection Bias Examples

Sample of Convenience: Units in the sample are selected because they are available and easy to access.

- not representative of the “harder to select” or non-responding units
- not representative of the population


Selection Bias Examples

Judgement Sample: Units are selected for the sample on the basis of the investigator’s judgement. Example: investigators deliberately, or purposively, selecting a “representative” sample.

Undercoverage: Not including all of the target population in the sampling frame. Example: surveys that use telephone directories as a sampling frame are inadequate because of unlisted numbers.

Overcoverage: Including units in the sampling frame that are not in the target population. Example: can occur because of a lack of screening or the data collector’s failure to check sample eligibility.


Selection Bias Examples

Nonresponse: Failing to obtain responses from all units in the chosen sample. Example: individuals who refuse to respond because of sensitive questions (“Have you ever smoked marijuana?”), who cannot respond, or who cannot be reached.
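Nonresponse is a selection problem even when the original sample is a proper SRS: if willingness to respond is related to the answer, the respondents are not representative. A minimal Python sketch, assuming an invented true rate of 40% for the sensitive question and assumed response probabilities of 0.5 for people whose answer is "yes" and 0.8 for people whose answer is "no":

```python
import random

random.seed(20)

# Hypothetical population of 100,000 people; 1 = "has smoked marijuana" (made-up 40% rate).
N = 100_000
population = [1 if random.random() < 0.40 else 0 for _ in range(N)]
true_p = sum(population) / N

# Assumed response behaviour: people whose answer is "yes" are less willing to respond.
RESPONSE_PROB = {1: 0.5, 0: 0.8}

sample = random.sample(population, 2000)  # a proper SRS...
respondents = [y for y in sample if random.random() < RESPONSE_PROB[y]]  # ...but only some answer
p_hat = sum(respondents) / len(respondents)

print(f"true proportion {true_p:.3f}; estimate from respondents {p_hat:.3f}")
print(f"response rate {len(respondents) / len(sample):.0%}")
```

Under these assumptions the respondents' proportion is close to 0.2/0.68 ≈ 0.29 rather than 0.40, and a larger sample would not close that gap.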


More examples of Selection bias

- Misspecifying the target population. E.g., election predictions may be wrong if registered voters from previous years are used as the target population: the survey would miss new voters, as well as responses from undecided voters.
- Multiple listing in the sampling frame (without adjusting for it in the analysis). Duplicates in the sampling frame will bias results if the same units are sampled more than once. E.g., a telephone survey to estimate household size or income. How?
- Samples made up almost entirely of volunteers, such as online surveys. People with strong opinions tend to volunteer responses.
- Substituting a convenient member of the population for another who is not available
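One way duplicates in a frame can bias a telephone survey of household size: if bigger households tend to have more listed lines (an assumption made here purely for illustration), they appear more often in the directory and are over-represented in an SRS of phone numbers. The Python sketch below simulates this with a made-up population and also shows one possible adjustment, weighting each sampled number by 1/(number of listings of its household).

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 20,000 households: (household size, listed phone lines).
# Illustrative assumption: bigger households list more lines.
households = []
for _ in range(20_000):
    size = random.randint(1, 6)
    lines = 1 if size <= 2 else (2 if size <= 4 else 3)
    households.append((size, lines))
true_mean_size = statistics.mean(size for size, _ in households)

# The sampling frame is a directory of phone numbers: one entry per line,
# so a household with 3 lines is listed 3 times.
directory = [hh for hh in households for _ in range(hh[1])]
sample = random.sample(directory, 1500)  # SRS of phone numbers

# Unadjusted: treat each sampled number as one household.
naive = statistics.mean(size for size, _ in sample)

# Adjusted: weight each sampled number by 1 / (number of listings of its household).
adjusted = (sum(size / lines for size, lines in sample)
            / sum(1 / lines for _, lines in sample))

print(f"true {true_mean_size:.2f}, naive {naive:.2f}, adjusted {adjusted:.2f}")
```

With these made-up numbers the unadjusted mean is around 4.2 persons against a true value near 3.5, while the weighted estimate lands close to the truth.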


Errors of Observation - Measurement Bias

Measurement Bias: occurs when the measurements tend to differ from the true value in one direction. Measurement errors can be caused by inaccurate responses from respondents, or by wrong measurements and/or poor survey design on the part of investigators.

Example 1. A scale that adds 5 kilograms to the weight of every person. No amount of statistical analysis will disclose this issue.

Example 2. Count all birds seen during a 3-minute period within a quarter-mile radius. Great idea... but some birds will be missed and others counted more than once.
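Example 1 can be checked numerically. In the Python sketch below (hypothetical true weights; the +5 kg fault is the one described above), the recorded data have exactly the same spread as the true data, and the only trace of the fault is a shift in location that cannot be detected without knowing the true weights.

```python
import random
import statistics

random.seed(5)
true_weights = [random.gauss(75, 12) for _ in range(1000)]  # hypothetical true weights, kg
recorded = [w + 5 for w in true_weights]                     # a faulty scale adds 5 kg

# The recorded data look perfectly ordinary: the spread is unchanged...
print(f"SD true {statistics.stdev(true_weights):.2f}, SD recorded {statistics.stdev(recorded):.2f}")

# ...yet the mean is off by exactly the scale's error. Only the comparison with
# the true weights, which an analyst never has, reveals the shift.
shift = statistics.mean(recorded) - statistics.mean(true_weights)
print(f"shift in the mean: {shift:.2f} kg")
```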


Measurement Bias - Examples with people

- People sometimes lie, e.g., about sensitive matters or to obtain certain outcomes.
- People do not always understand the question (because of the wording or the respondent’s understanding).
- People forget. Telescoping: respondents misplace incidents in time. People may not remember exactly when asked about past experiences: durations, dates, frequency of events.
- People may give different answers to different interviewers.
- People may answer to please or impress the interviewer, or to avoid embarrassment.


Measurement Bias - Examples with people

- People tend to agree: studies have shown that the same people agreed to two contradictory questions.
- People are reluctant to answer honestly about taboos, sensitive questions, or controversial issues.
- Interviewers may misread questions, record answers incorrectly, or antagonize the respondent.
- Words have different meanings to different people.
  - “Do you own a car?”: “you” (singular or plural?), “own” (does leasing count?) and “car” can all be interpreted differently.
- Question wording and order affect answers, e.g., double negatives, confusing sentence structure.


Reducing Errors

To reduce each of the two types of error in our estimate:

1. Sampling errors: errors in estimation due to randomness. They cannot be eliminated, but an appropriate sampling technique (or a larger sample) can reduce them, giving, for instance, shorter confidence intervals.

2. Non-sampling errors: errors in estimation that are not due to randomness, but to the way in which the sample was selected or the data were collected.

   (i) Errors of nonobservation: selection bias
       - Use probability sampling methods to deal with coverage problems, samples of convenience and judgement samples. Use callbacks, rewards and incentives for nonresponse cases.

   (ii) Errors of observation: measurement bias
       - Careful questionnaire design. Testing survey equipment. Training interviewers. Pretesting surveys. Checking respondents’ data for accuracy.
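To illustrate how the choice of sampling technique can reduce sampling error, the Python sketch below compares the spread of the estimated mean under SRS and under stratified random sampling with proportional allocation (Ch. 5), on a made-up population with two strata whose means differ. The strata, their sizes, and the allocation rule are assumptions for illustration; the stratified estimator weights each stratum mean by N_h/N.

```python
import random
import statistics

random.seed(11)

# Hypothetical population of heights (inches) in two strata of different sizes.
strata = {
    "stratum A": [random.gauss(64, 2.5) for _ in range(3000)],
    "stratum B": [random.gauss(70, 2.5) for _ in range(2000)],
}
population = [y for ys in strata.values() for y in ys]
N = len(population)


def srs_mean(n):
    """Mean of an SRS of size n from the whole population."""
    return statistics.mean(random.sample(population, n))


def stratified_mean(n):
    """Proportional allocation: an SRS of n * N_h / N units within each stratum,
    then each stratum mean is weighted by N_h / N."""
    total = 0.0
    for ys in strata.values():
        n_h = round(n * len(ys) / N)
        total += len(ys) / N * statistics.mean(random.sample(ys, n_h))
    return total


reps, n = 2000, 100
sd_srs = statistics.stdev([srs_mean(n) for _ in range(reps)])
sd_strat = statistics.stdev([stratified_mean(n) for _ in range(reps)])
print(f"SD of estimate: SRS {sd_srs:.3f}, stratified {sd_strat:.3f}")
```

In this setup the stratified estimate varies noticeably less from sample to sample, which would translate into a shorter confidence interval for the same sample size.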


Types of probability sampling (§ 2.3)

1. simple random sampling (SRS):
   - each unit in the target population has the same probability of being sampled
   - every possible sample of the same size has the same probability of being selected, so no particular combination of units is favoured
2. stratified random sampling:
   - target population is divided into groups (“strata”); use SRS within each stratum
   - e.g., an SRS of males and an SRS of females in the target population
3. cluster sampling:
   - target population is divided into clusters, e.g., neighbourhoods, schools, blocks, classrooms
   - clusters are chosen using SRS
   - all units in a chosen cluster are surveyed
4. capture-recapture sampling (not in Ch. 2):
   - used for estimating population size (no sampling frame)
   - select an SRS, and tag the selected items
   - select a second SRS and count how many of the selected items are tagged
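The four designs can be sketched as small Python functions built on `random.sample` from the standard library. The function names and the toy data (students, classrooms, fish) are hypothetical. For capture-recapture the sketch uses one common estimator, the Lincoln-Petersen estimate N̂ = n1·n2/m, where m is the number of tagged items in the second sample; the textbook's chapter on estimating population size (Ch. 10) may present it with different notation.

```python
import random


def simple_random_sample(frame, n):
    """SRS without replacement: every subset of n units is equally likely."""
    return random.sample(frame, n)


def stratified_random_sample(strata, sizes):
    """`strata` maps stratum name -> list of units; `sizes` maps stratum name -> n_h.
    Draws an independent SRS within each stratum."""
    return {name: random.sample(units, sizes[name]) for name, units in strata.items()}


def cluster_sample(clusters, m):
    """`clusters` maps cluster name -> list of units. Chooses m clusters by SRS
    and keeps every unit in each chosen cluster."""
    chosen = random.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}


def lincoln_petersen(n1, n2, m):
    """Capture-recapture estimate of population size: n1 units tagged in the
    first sample, n2 units in the second sample, m of which carry tags."""
    if m == 0:
        raise ValueError("no tagged units recaptured; estimate undefined")
    return n1 * n2 / m


# Usage with hypothetical data.
random.seed(26)
students = [f"student{i}" for i in range(100)]
print(simple_random_sample(students, 5))

by_sex = {"female": students[:60], "male": students[60:]}
print(stratified_random_sample(by_sex, {"female": 3, "male": 2}))

classrooms = {f"room{r}": students[10 * r: 10 * r + 10] for r in range(10)}
print(cluster_sample(classrooms, 2))

# Capture-recapture on 1,000 fish whose number we pretend not to know.
fish = list(range(1000))
tagged = set(random.sample(fish, 100))  # first SRS: tag 100 fish
second = random.sample(fish, 150)       # second SRS of 150 fish
m = sum(f in tagged for f in second)    # how many of them carry tags
print(f"estimated population size: {lincoln_petersen(100, 150, m):.0f}")
```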


What about computing?

- choose a number at random
- choose a simple random sample (with replacement)
- construct a histogram of a list of numbers
- compute the mean and variance of a list of numbers
- Homework: figure out how to do the above using your favourite calculator or computer program
- Try out R (or RStudio, or R Commander)
- R links will be on the course web page
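The slides suggest R for these tasks (for example `sample()`, `hist()`, `mean()` and `var()`). For readers who prefer Python, here is a rough equivalent using only the standard `random` and `statistics` modules; the list of numbers is made up and the histogram is a plain-text stand-in for a plotted one.

```python
import random
import statistics

random.seed(2015)
numbers = [random.randint(1, 100) for _ in range(30)]  # a hypothetical list of numbers

# 1. Choose a number at random (an integer from 1 to 100).
pick = random.randint(1, 100)

# 2. Choose a simple random sample of size 5 from the list.
with_replacement = random.choices(numbers, k=5)  # the same position may be drawn twice
without_replacement = random.sample(numbers, 5)  # five distinct positions

# 3. A text histogram with bins 1-20, 21-40, ..., 81-100.
counts = [sum(lo <= x <= lo + 19 for x in numbers) for lo in range(1, 101, 20)]
for lo, count in zip(range(1, 101, 20), counts):
    print(f"{lo:3d}-{lo + 19:3d} | {'*' * count}")

# 4. Mean and sample variance (divisor n - 1).
print(pick, with_replacement, without_replacement)
print(statistics.mean(numbers), statistics.variance(numbers))
```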


Next

- HW: Read Chapters 1 and 2 of ESS
- What issues in the news interest you?
- Consider a survey in the local media. Identify the sampling frame, sampling unit, sample, target population, and variable(s) of interest.
- Review of some statistical concepts:
  - probability density functions
  - expected value and variance
  - i.i.d. sampling
  - sample mean and sample variance
  - confidence interval
- Questionnaire Design (§2.5); Probability samples (§3)
