Week 2: An overview


Upload: mya

Post on 26-Jan-2016


Page 1: Week 2 An overview

Week 2: An overview

• Exposure and outcome (dependent and independent variables)
• Reliability and validity
• What is “statistical significance”?
• Relationships between variables
  - continuous variables (t-tests and z-tests)
  - continuous variables (correlations)
  - the normal (Gaussian) distribution
  - categorical variables (chi-square tests)
• Two by two tables and confidence intervals
• Review of the articles
• Example 1: Children crossing streets
• Measures of association between variables
• For next week

Page 2: Week 2 An overview

A somewhat advanced society has figured out how to package basic knowledge in pill form. A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available. The pharmacist says "Here's a pill for English literature." The student takes the pill and swallows it and has new knowledge about English literature!

"What else do you have?" asks the student. "Well, I have pills for art history, biology, and world history," replies the pharmacist. The student asks for these, and swallows them and has new knowledge about those subjects!

Then the student asks, "Do you have a pill for statistics?" The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter. "I have to take that huge pill for statistics?" inquires the student.

The pharmacist understandingly nods his head and replies "Well, you know statistics always was a little hard to swallow."

Page 3: Week 2 An overview

Epidemiologic study designs

1. Randomized controlled trial
• Considered the ‘gold standard’
• Exposure is assigned randomly
• Participants followed over time to assess outcome
• Analytic comparison of risk or benefit in exposed vs. not exposed
• Can be applied to program evaluation

Page 4: Week 2 An overview

Epidemiologic study designs 2

2. Cohort study
• One group exposed
• Other group unexposed
• Participants followed over time to assess outcome
• Analytic comparison of risk in exposed vs. not exposed
• Can be applied to program evaluation

Page 5: Week 2 An overview

Epidemiologic study designs 3

3. Case-control study
• Based on outcome
• Exposure is compared in those with and without outcome
• Analytic comparison uses the odds ratio (risk in exposed vs. not exposed cannot be calculated directly)

4. Descriptive study
• Provides descriptive statistics of problem under study
• No analytic comparison of risk / benefit
• Often precedes analytic studies

Page 6: Week 2 An overview

Dependent vs independent variables

• Remember the exposure/outcome relationship
• Another way to describe it is in terms of dependent and independent variables
  - the outcome (dependent variable) depends on the exposure (independent) variables
• It is the association between these variables that leads us to statistical tests
• The test we use depends on the type of variable

Page 7: Week 2 An overview

Statistical significance

• What is statistical significance?
• The probability that a relationship at least as strong as the one observed could have happened by chance alone
• The p-value and confidence interval are the usual measures of significance
• Threshold set by tradition at p < 0.05 (equivalently, a 95% confidence interval)
• The higher the p-value, the more likely the result could have happened by chance
• The wider the confidence interval, the less precise the estimate and the more likely the result could have happened by chance
• Both driven by variability in the data and sample size

Page 8: Week 2 An overview

Types of variables

• Continuous variables
  – Variables for which there is a range of responses
  – e.g., age, blood pressure, weight
• Categorical variables
  – Variables that fall into categories
  – e.g., gender, smoking status

Page 9: Week 2 An overview

Hypothesis testing for continuous variables

• Mean (the average number)
  - calculated by summing all the numbers and dividing by n
  - hypothesis testing usually done using a t-test to compare the 2 means
  - significance of t-test based on sample size and variability within the data
• Median (the number in the middle)
  - not usually tested
• Mode (the most frequent response)
  - not usually tested
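The three summary measures above can be sketched in a few lines of Python using the standard library. The data values here are hypothetical and chosen only to show that the mean, median and mode of the same sample can all differ.

```python
import statistics

# Hypothetical data: streets crossed in one day by seven children.
streets_crossed = [2, 3, 3, 4, 5, 6, 12]

mean_value = statistics.mean(streets_crossed)      # sum of the values divided by n
median_value = statistics.median(streets_crossed)  # middle value once the data are sorted
mode_value = statistics.mode(streets_crossed)      # most frequent value

print(mean_value, median_value, mode_value)
```

The single large value (12) pulls the mean above the median, which is one reason the median is often reported for skewed data.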

Page 10: Week 2 An overview

Hypothesis testing for categorical variables

• Counts (how many fall within each category)
  - Compare using 2X2 table
• Proportions (what percentage fall within each category)
  - Compare 2 proportions
• Frequency distributions (comparing counts and percentages between categories)
  - Compare using chi-square test
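As an illustration of the chi-square comparison, here is a minimal sketch for a 2X2 table, assuming the Pearson statistic without continuity correction. The cell labels follow the usual convention (a, b = exposed with and without the outcome; c, d = not exposed with and without the outcome); the function name and the counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table, 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 is the square of a standard normal z,
    # so the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
chi2, p_value = chi_square_2x2(30, 70, 10, 90)
print(round(chi2, 2), p_value)
```

In practice a statistical package would report this directly; the sketch only shows where the number comes from.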

Page 11: Week 2 An overview

2X2 tables: the foundation

               Disease or other outcome    No disease or other outcome
Exposed                   a                            b
Not exposed               c                            d

Page 12: Week 2 An overview

2X2 tables: estimating associations

               Disease or other outcome    No disease or other outcome    Total
Exposed                   a                            b                  a+b
Not exposed               c                            d                  c+d
Total                    a+c                          b+d                 a+b+c+d

Page 13: Week 2 An overview

Odds ratios and relative risks

• Odds ratio (ad/bc) compares the odds of an outcome in the exposed group with the odds in the non-exposed group
• Relative risk [a/(a+b)] / [c/(c+d)] compares the risk of an outcome in the exposed group with the risk in the non-exposed group
• Statistical packages calculate confidence intervals
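To make the two formulas concrete, the sketch below computes the odds ratio and relative risk from the four cells of the 2X2 table, together with approximate 95% confidence intervals. The intervals assume the common log-based (Wald) method, which may differ from what a given statistical package reports; the function name and the counts are hypothetical.

```python
import math

def odds_ratio_and_relative_risk(a, b, c, d, z=1.96):
    """Return OR, RR and approximate 95% CIs for a 2x2 table with cells a, b, c, d."""
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))

    # Standard errors on the log scale (assumed Wald / log method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))
    rr_ci = (math.exp(math.log(relative_risk) - z * se_log_rr),
             math.exp(math.log(relative_risk) + z * se_log_rr))
    return odds_ratio, or_ci, relative_risk, rr_ci

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
odds_ratio, or_ci, relative_risk, rr_ci = odds_ratio_and_relative_risk(30, 70, 10, 90)
print(odds_ratio, or_ci)
print(relative_risk, rr_ci)
```

With these counts both intervals lie entirely above 1, so the association would be called statistically significant at the conventional level.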

Page 14: Week 2 An overview

Confidence intervals

• Confidence intervals are used for hypothesis testing in 2X2 tables (and others)
• The width of a confidence interval is based on the variability within the data and the sample size
• An OR or RR of 1 = no association
• A confidence interval for an OR or RR that includes 1 is NOT statistically significant

Page 15: Week 2 An overview

Regression lines and correlation

• Correlation is the measure of the way one variable is associated with another
• Can be done with 2 continuous variables
• The correlation coefficient ranges from -1 to 1
• The regression line is the line of best fit between 2 variables
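A minimal sketch of both ideas, assuming the Pearson correlation coefficient and an ordinary least-squares regression line (the slides do not say which methods were used). The function name and data are hypothetical.

```python
import math

def pearson_r_and_line(xs, ys):
    """Return Pearson's r plus the slope and intercept of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)          # always between -1 and 1
    slope = sxy / sxx                       # change in y per unit change in x
    intercept = mean_y - slope * mean_x     # the line passes through the two means
    return r, slope, intercept

# Hypothetical, perfectly linear data: y is exactly twice x.
r, slope, intercept = pearson_r_and_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r, slope, intercept)
```

Squaring r gives R², the proportion of variation in one variable accounted for by the line.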

Page 16: Week 2 An overview

Article review

• Questions to consider:
• What is the research question?
• What is their study design?
• What is the exposure variable(s)?
• What is the outcome variable?
• What are the strengths and limitations?
• Who funded the study?
• How compelling are the findings?

Page 17: Week 2 An overview

Example # 1

Statistical associations of the number of streets crossed by children and:
- socio-economic indicators
- child pedestrian injury rate

Page 18: Week 2 An overview

Background

• Child pedestrian injury rate has been declining in many countries, including Canada

• Concern has been expressed that the decline is due to a reduction in exposure to traffic (i.e., children are driven or bussed rather than walking)

Page 19: Week 2 An overview

Objective

• The objective of this study was to measure the number of streets children cross on one day

• To see if the number of streets crossed varies by socio-economic status

• To see if the child pedestrian injury rate is associated with the number of streets crossed

Page 20: Week 2 An overview

Variables

• Number of streets crossed as reported by parents from a random sample of schools in Montreal

• Socio-economic status measured by:
  - car ownership
  - parental education
  - home ownership

• Injury rate in police district as reported by the police

Page 21: Week 2 An overview

Methods

• Frequency distribution of average # of streets crossed presented by age and SES

• Statistical testing for the differences between means for categorical variables

• Scatterplot generated and regression line calculated

Page 22: Week 2 An overview

Table 1: Number of Streets Crossed by Age and Socio-economic Indicators*

                     N       Mean    SD
Age
  5 & 6              487     3.8     4.2
  7                  730     4.2     5.0
  8 & 9              519     4.8     5.3
  10                 657     5.5     5.8
  11 & 12            108     6.6     6.3
Number of cars
  0                  467     5.9     5.8
  1                  1191    4.8     5.3
  2+                 815     3.8     4.8
Home ownership
  Rent home          1213    5.5     5.6
  Own home           1210    3.8     4.7

Page 23: Week 2 An overview

Comparing average streets crossed by car ownership

                                   No car    1 car
Average streets crossed (mean)     5.9       4.8
Standard deviation                 5.8       5.3
Sample size                        467       1171

Z test for difference between means: 13.8, p<0.001
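For readers who want to see how such a test is built, here is a minimal sketch of a two-sample z-test from summary statistics, assuming independent groups and unpooled variances. The function name is illustrative. Applied to the rounded figures on this slide it gives a z of roughly 3.5 (still p < 0.001) rather than the 13.8 shown, so the reported statistic was presumably computed from different inputs; that is an inference, not something the slides state.

```python
import math

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided z-test for the difference between two independent means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Values as printed on the slide: no car vs. 1 car.
z, p_value = two_sample_z_test(5.9, 5.8, 467, 4.8, 5.3, 1171)
print(round(z, 2), p_value)
```

The larger the samples and the smaller the standard deviations, the smaller the standard error and the larger z becomes for the same difference in means.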

Page 24: Week 2 An overview

Figure 2: Ecologic analysis. Average number of main streets crossed (x-axis, 0 to 8) and injury rate per 1,000 (y-axis, 0 to 5) by police district. Scatterplot with linear regression line and 95% confidence interval (minimum and maximum); R2 = 0.62.

Page 25: Week 2 An overview

Measures of association between variables

• Tied in to the concept of reliability and validity
• Sometimes we need to test a new variable in relation to an old one
• For example, a new questionnaire, faster blood test, etc.
• Several ways to measure association:
• Cronbach’s alpha, kappa, sensitivity, specificity, positive predictive value, negative predictive value

Page 26: Week 2 An overview

Cronbach’s alpha

• Measures the reliability of a psychometric instrument

• Assesses the extent to which a set of test items can be treated as measuring a single latent variable

• Mean correlation between a set of items with the mean of all the other items

• Looks at variation between individuals compared to variation due to items

• Can range from negative infinity to 1 (although usually only between 0 and 1)

• Usually considered ‘good’ if > 0.8
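One common way to compute Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) × (1 − Σ item variances / variance of totals), where k is the number of items. The sketch below assumes that formula; the function name and the scores are hypothetical.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    item_variances = [statistics.variance(item) for item in item_scores]
    # Total score for each respondent, summed across items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical questionnaire: 3 items answered by 4 respondents.
# The items agree perfectly here, so alpha reaches its maximum of 1.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(alpha)
```

Real instruments never agree this perfectly; items that track each other less closely lower the variance of the totals relative to the item variances and pull alpha down.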

Page 27: Week 2 An overview

Kappa

• Measures the extent to which ratings given by 2 raters agree

• Often used when experts are assigning scores based on opinions (e.g., medication errors)

• Gives credit when scores match exactly and corrects for the agreement expected by chance
• Ranges from -1 to 1 (usually between 0 and 1; 0 means agreement no better than chance)
• Usually considered ‘good’ if > 0.7
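A minimal sketch of the calculation, assuming Cohen's kappa for two raters: kappa = (observed agreement − agreement expected by chance) / (1 − agreement expected by chance). The function name and ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same cases."""
    n = len(ratings_a)
    # Proportion of cases where the two raters give exactly the same score.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's own category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings (1 = error, 0 = no error) for four cases.
perfect = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])   # raters always agree
chance = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])    # agreement no better than chance
print(perfect, chance)
```

The sketch does not handle the edge case where expected agreement is exactly 1 (both raters always give the same single category), which would divide by zero.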

Page 28: Week 2 An overview

Sensitivity and specificity

Sensitivity
• Measures the extent to which a test agrees with a ‘gold standard’
• Often used when trying out a new diagnostic test
• Reports how often the new test agrees with the old when positive
• Captures the false negatives
• Calculated using a 2 X 2 table
• Acceptability of score depends on test qualities

Page 29: Week 2 An overview

Sensitivity and specificity

Specificity
• Measures the extent to which a test agrees with a ‘gold standard’
• Often used when trying out a new diagnostic test
• Captures the ‘false positives’
• Reports how often the new test agrees with the old when negative (e.g., accurately reports the absence of the condition)
• Calculated using a 2 X 2 table
• Acceptability of score depends on test qualities

Page 30: Week 2 An overview

2X2 tables revisited

               Gold standard +      Gold standard –
               (has condition)      (does not have condition)
New test +            a                    b
New test -            c                    d

Page 31: Week 2 An overview

Calculating sensitivity and specificity

Sensitivity = number who are both disease positive and test positive / number who are disease positive

a/(a+c)

Specificity = number who are both disease negative and test negative / number who are disease negative

d/(b+d)
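The two formulas translate directly into code. This is a minimal sketch using the cell labels from the 2X2 table (a = true positives, b = false positives, c = false negatives, d = true negatives); the function names and counts are hypothetical.

```python
def sensitivity(a, c):
    """Proportion of people with the condition whom the new test correctly flags."""
    return a / (a + c)

def specificity(b, d):
    """Proportion of people without the condition whom the new test correctly clears."""
    return d / (b + d)

# Hypothetical counts: 100 people with the condition, 100 without.
a, b, c, d = 90, 20, 10, 80
sens = sensitivity(a, c)
spec = specificity(b, d)
print(sens, spec)
```

Here the new test misses 10 of 100 true cases (false negatives) and wrongly flags 20 of 100 healthy people (false positives).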

Page 32: Week 2 An overview

Understanding sensitivity and specificity

Sensitivity is high when the test picks up a lot of the true disease (has few false negatives). High sensitivity is important for infectious diseases (e.g., HIV).

Specificity is high when the test has few false positives. This is important when the consequences of treating the disease are significant (e.g., cancer).

Page 33: Week 2 An overview

Positive and negative predictive value

• Tell you how good a test is at predicting whether a patient actually has the disease
• Positive predictive value is the probability that the patient has the disease given a positive test
• Negative predictive value is the probability that the patient does not have the disease given a negative test
• Both depend on sensitivity, specificity and the prevalence of the disease
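The dependence on prevalence is easiest to see with numbers. The sketch below derives both predictive values from sensitivity, specificity and prevalence using Bayes' rule; the function name and the input values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test (90% sensitive, 90% specific) in two populations.
ppv_rare, npv_rare = predictive_values(0.9, 0.9, 0.01)     # 1% prevalence
ppv_common, npv_common = predictive_values(0.9, 0.9, 0.5)  # 50% prevalence
print(ppv_rare, ppv_common)
```

When the condition is rare, most positive results are false positives even for a fairly accurate test, so the positive predictive value is low.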

Page 34: Week 2 An overview

Overview

• Different types of variables are measured and presented differently

• P values and confidence intervals are the measures of statistical significance

• They tell us the probability that these results could have happened by chance

• Cronbach’s alpha, kappa, sensitivity and specificity tell us about relationships between measurements

Page 35: Week 2 An overview

For next week 1

• Read Chapter 3 in the text

• Read the ICES privacy document (www.ices.on.ca)

• Think about privacy and confidentiality

• What issues are relevant to you in your current research?

Page 36: Week 2 An overview
Page 37: Week 2 An overview

For next week 2

• Identify your data set
• Where did it come from?
• How was it collected?
• What type of variables does it include?
• What is your research question?
• What are your exposure variables?
• What is your outcome variable?
• If you are not familiar with SPSS it is STRONGLY recommended that you complete the tutorial