![Page 1: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/1.jpg)
Testing hypotheses
Continuous variables
![Page 2: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/2.jpg)
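Hypothesis: Lower income → Higher murder rate

[Figure: ten cities, each coded H (high) or L (low) on median income and on murder rate; two cities shown in detail (L L and H H)]

Frequencies

| | High Murder | Low Murder |
|---|---|---|
| Low Income | 3 | 1 |
| High Income | 2 | 4 |

Percentages

| | High Murder | Low Murder |
|---|---|---|
| Low Income | 75% | 25% |
| High Income | 33% | 67% |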
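The percentage table can be reproduced from the frequency table by percentaging each category of the independent variable across the dependent variable. The sketch below is illustrative only; the function name `row_percentages` and the dictionary layout are assumptions, not part of the slides.

```python
# Frequencies from the slide: rows are the independent variable (income),
# columns are the dependent variable (murder rate).
frequencies = {
    "Low Income": {"High Murder": 3, "Low Murder": 1},
    "High Income": {"High Murder": 2, "Low Murder": 4},
}


def row_percentages(table):
    """Percentage each row across the categories of the dependent variable.

    Hypothetical helper written for this example.
    """
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {
            col_label: round(100 * count / row_total)
            for col_label, count in cells.items()
        }
    return result


percentages = row_percentages(frequencies)
for row_label, cells in percentages.items():
    print(row_label, cells)
```

Low-income cities fall mostly in the high-murder column and high-income cities mostly in the low-murder column, which is the pattern the hypothesis predicts.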
![Page 3: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/3.jpg)
Correlation and Regression
• Correlation: measure of the strength of an association (relationship) between continuous variables
• Regression: predicting the value of a continuous dependent variable (y) based on the value of a continuous independent variable (x)
![Page 4: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/4.jpg)
Hypothesis: Lower income → Higher murder rate
Plot IV and DV for each case (city) on a “scattergram” (two cities detailed)
[Figure: scattergram of cities by median income (IV) and murder rate (DV), shown with the distribution of cities by median income and the distribution of cities by murder rate]
Analysis later…
![Page 5: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/5.jpg)
Correlation statistic - r
• Values of r range from –1 to +1
• -1 is a perfect negative association (correlation), meaning that as the scores of one variable increase, the scores of the other variable decrease at exactly the same rate
• +1 is a perfect positive association, meaning that both variables go up or down together, in perfect harmony
• Intermediate values of r (close to zero) indicate weak or no relationship
• Zero r (never in real life) means no relationship – that the variables do not change or “vary” together, except as what might happen through chance alone
[Figure: scale of r running from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship)]
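To make the range of r concrete, here is a minimal sketch that computes the correlation by hand for three small, made-up data sets. The function `pearson_r` and all of the numbers are illustrative assumptions, not data from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much X and Y vary together around their means
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


x = [1, 2, 3, 4, 5]                      # hypothetical X scores
r_positive = pearson_r(x, [1, 2, 3, 4, 5])  # Y rises exactly with X
r_negative = pearson_r(x, [5, 4, 3, 2, 1])  # Y falls exactly as X rises
r_none = pearson_r(x, [3, 1, 4, 1, 3])      # Y shows no linear trend with X

print(r_positive, r_negative, r_none)
```

The first two pairs land at the ends of the scale and the third sits at its center; real data almost always fall somewhere in between.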
![Page 6: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/6.jpg)
[Figure: two scattergrams, each with X (1–5) on the horizontal axis and Y (1–6) on the vertical axis; one shows r = +1, the other r = -1]
Can changes in one variable be predicted by changes in the other?
Two “scattergrams” – each with a “cloud” of dots
NOTE: Independent variable (X) is always placed on the horizontal axis
NOTE: Dependent variable (Y) is always placed on the vertical axis
![Page 7: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/7.jpg)
[Figure: scattergram with X (1–5) on the horizontal axis and Y (1–5) on the vertical axis]
Can changes in one variable be predicted by changes in the other?
As X changes in value, does Y move correspondingly, either in the same or opposite direction?
Here there seems to be no connection between X and Y. One cannot predict values of Y from values of X.
r = 0
![Page 8: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/8.jpg)
[Figure: scattergram with X (1–5) on the horizontal axis and Y (1–5) on the vertical axis]
Can changes in one variable be predicted by changes in the other?
Here as X changes in value by one unit Y also changes in value by one unit.
Knowing the value of X one can predict the value of Y. X and Y go up and down together, meaning a positive relationship.
r = +1
![Page 9: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/9.jpg)
[Figure: scattergram with X (1–5) on the horizontal axis and Y (1–5) on the vertical axis]
Can changes in one variable be predicted by changes in the other?
Here as X changes in value by one unit Y also changes in value by one unit.
Knowing the value of X one can predict the value of Y. X and Y go up and down in an opposite direction, meaning a negative relationship.
r = -1
![Page 10: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/10.jpg)
Computing r using the “Line of best fit”
• To arrive at a value of “r” a straight line is placed through the cloud of dots (the actual, “observed” data)
• This line is placed so that the cumulative distance between itself and the dots is minimized (in the usual least-squares method, the sum of the squared vertical distances)
• The smaller this distance, the higher the absolute value of r
• r’s are normally calculated with computers. Paired scores (each X/Y combination) and the means of X and Y are used to compute:
  – a, where the line crosses the Y axis
  – b, the slope of the line
• When relationships are very strong or very weak, one can estimate the r value by simply examining the graph
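[Figure: scattergram, X (1–5) by Y (1–6), with a line of best fit; a marks where the line crosses the Y axis and b marks the slope]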
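As a rough illustration of how a computer arrives at a and b, the sketch below uses the ordinary least-squares formulas, which work from the paired scores and the means of X and Y. The slides do not say which method their software uses, so treat the method, the function name `line_of_best_fit` and the five data points as assumptions.

```python
def line_of_best_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the line always passes through the point (mean of X, mean of Y)
    a = mean_y - b * mean_x
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 3, 5, 4, 6]  # hypothetical Y scores
a, b = line_of_best_fit(xs, ys)
print(f"a (Y intercept) = {a:.2f}, b (slope) = {b:.2f}")
```

For these made-up points the line crosses the Y axis at about 1.3 and rises about 0.9 units of Y for each unit of X.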
![Page 11: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/11.jpg)
• The line of best fit predicts a value for one variable given the value of the other variable
• There will be a difference between these estimated values and the actual, known (“observed”) values. This difference is called a “residual” or an “error of the estimate.”
• As the error between the known and predicted values decreases – as the dots cluster more tightly around the line – the absolute value of r (whether + or –) increases
[Figure: scattergram, X (1–5) by Y (1–6), with the “Line of best fit” drawn in; example predictions read from the line: if x = .5, y = 2.3; if y = 5, x = 3.4]
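The link between residuals and r can be sketched with two made-up clouds of dots, one hugging its line and one more scattered. Everything here (the least-squares fit, the helper names and both data sets) is an assumption chosen for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


def residuals(xs, ys):
    """Observed Y minus the Y predicted by the line of best fit."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def pearson_r(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )


xs = [1, 2, 3, 4, 5]
tight_ys = [1.1, 1.9, 3.2, 3.9, 5.0]  # hypothetical: dots close to the line
loose_ys = [2, 3, 5, 4, 6]            # hypothetical: dots more scattered

# Total squared error of the estimate for each cloud
sse_tight = sum(e ** 2 for e in residuals(xs, tight_ys))
sse_loose = sum(e ** 2 for e in residuals(xs, loose_ys))
r_tight = pearson_r(xs, tight_ys)
r_loose = pearson_r(xs, loose_ys)

print(f"tight cloud: squared error {sse_tight:.3f}, r = {r_tight:.2f}")
print(f"loose cloud: squared error {sse_loose:.3f}, r = {r_loose:.2f}")
```

The tighter cloud has the smaller total error and the larger absolute r, which is the pattern the bullets describe.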
![Page 12: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/12.jpg)
A perfect fit: Line of best fit goes “through” each dot
[Figure: two scattergrams, X (1–5) by Y (1–6); in one the line rises through every dot (r = +1.0, a perfect fit), in the other it falls through every dot (r = -1.0, a perfect fit)]
![Page 13: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/13.jpg)
An intermediate fit yields an intermediate value of r
[Figure: scattergram, X (1–5) by Y (1–6), with line of best fit; r = +.65]
Moderate cumulative distance between line of best fit and “cloud” of dots
![Page 14: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/14.jpg)
A poor fit yields a low value of r
[Figure: scattergram, X (1–5) by Y (1–6), with line of best fit; r = -.19]
Large cumulative distance between line of best fit and “cloud” of dots
![Page 15: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/15.jpg)
R-squared (R²), the coefficient of determination
• Proportion of the change (variance) in the dependent variable (also known as the “effect” variable) that is accounted for by change in the independent variable (also known as the “predictor” variable)
• Obtained by squaring the correlation coefficient (r)
• “Little” r squared (r²) depicts the explanatory power of a single independent/predictor variable
• “Big” R squared (R²) combines the effects of multiple independent/predictor variables. It’s the more commonly used.
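For the single-predictor case, one standard way to see why the squared correlation is a proportion is to compare the error left over after using the line of best fit with the total variation in the dependent variable. The symbols below are notation assumed for this sketch: a and b are the intercept and slope of the line of best fit, the hatted y is the predicted value, and the barred y is the mean of Y.

```latex
r^2 \;=\; 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad \hat{y}_i = a + b\,x_i
```

When every dot sits on the line the numerator is zero and r² = 1; when the line predicts no better than the mean of Y, the fraction equals 1 and r² = 0.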
![Page 16: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/16.jpg)
Hypothesis: Lower income → Higher murder rate
r = -.6, r² = .36
Change in the IV accounts for thirty-six percent of the change in the DV.
A moderate-to-strong relationship, in the hypothesized direction – hypothesis confirmed!
How to “read” a scattergram
• Move along the IV. Do the values of the DV change in a consistent direction?
• Look across the IV. Does knowing the value of the IV help you predict the value of the DV?
• Place a straight line through the cloud of dots, trying to minimize the overall distance between the line and the dots. Is the line at a pronounced angle?
To the extent that you can answer “yes” to each of these, there is a relationship
![Page 17: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/17.jpg)
Class exercise
Hypothesis: Height → Weight
• Use this data to build a scattergram
• Be sure that the independent variable is on the X axis, smallest value on the left, largest on the right, just like when graphing any distribution
• Be sure that the dependent variable is on the Y axis, smallest value on the bottom, largest on top
• Place a dot representing a case at the intersection of its values on X and Y
• Place a STRAIGHT line where it minimizes the overall distance between itself and the cloud of dots
• Use this overall distance to estimate a possible value of r, from -1 (perfect negative relationship), to 0 (no relationship), to +1 (perfect positive relationship)
Height (inches) Weight Age
62 130 23
62 167 26
64 145 30
64 150 28
68 145 28
60 122 26
63 125 31
66 125 20
69 236 40
62 115 20
69 150 21
64 115 23
64 175 22
65 150 29
68 208 40
66 190 26
63 150 28
74 230 25
67 150 34
64 117 27
71 195 21
71 230 24
65 175 26
69 180 27
69 220 28
70 150 20
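After estimating r by eye, the estimate can be checked against a computed value. This sketch applies the standard Pearson formula to the height and weight columns of the class data; the helper `pearson_r` and the variable names are assumptions made for the example.

```python
from math import sqrt

# (height in inches, weight) for the 26 cases in the class exercise table
cases = [
    (62, 130), (62, 167), (64, 145), (64, 150), (68, 145), (60, 122),
    (63, 125), (66, 125), (69, 236), (62, 115), (69, 150), (64, 115),
    (64, 175), (65, 150), (68, 208), (66, 190), (63, 150), (74, 230),
    (67, 150), (64, 117), (71, 195), (71, 230), (65, 175), (69, 180),
    (69, 220), (70, 150),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )


heights = [height for height, _ in cases]  # independent variable (X)
weights = [weight for _, weight in cases]  # dependent variable (Y)

r = pearson_r(heights, weights)
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")
```

A positive value well above zero here would support the hypothesized direction: taller people in this group tend to weigh more.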
![Page 18: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/18.jpg)
Height → Weight
r = .72, r² = .52
A strong relationship
![Page 19: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/19.jpg)
r = .35, r² = .12
Age → Weight
A weak-to-moderate relationship
![Page 20: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/20.jpg)
Age → Weight (less extreme cases)
r = -.17, r² = .03
A very weak negative relationship
![Page 21: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/21.jpg)
r = .04, r² = .00
Age → Height
No relationship
![Page 22: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/22.jpg)
Changing the level of measurement from continuous to categorical
[Figure: scattergram of Height (58–76, horizontal axis) by Weight (100–240, vertical axis) divided into four cells, SHORT/TALL by LIGHT/HEAVY, with cell counts of 3, 4, 12 and 7]
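Converting continuous scores into categories amounts to choosing a cut point for each variable and counting the cases in each cell. The slide's cut points are not given in the text, so the sketch below assumes a median split, meaning its cell counts need not match the counts on the slide. The function `categorize` and the variable names are also assumptions.

```python
from collections import Counter
from statistics import median

# (height in inches, weight) for the 26 cases in the class exercise
cases = [
    (62, 130), (62, 167), (64, 145), (64, 150), (68, 145), (60, 122),
    (63, 125), (66, 125), (69, 236), (62, 115), (69, 150), (64, 115),
    (64, 175), (65, 150), (68, 208), (66, 190), (63, 150), (74, 230),
    (67, 150), (64, 117), (71, 195), (71, 230), (65, 175), (69, 180),
    (69, 220), (70, 150),
]

# ASSUMPTION: cut each variable at its median; the slide may use other cut points
height_cut = median(height for height, _ in cases)
weight_cut = median(weight for _, weight in cases)


def categorize(height, weight):
    """Turn one case's continuous scores into a pair of category labels."""
    height_label = "TALL" if height > height_cut else "SHORT"
    weight_label = "HEAVY" if weight > weight_cut else "LIGHT"
    return height_label, weight_label


counts = Counter(categorize(height, weight) for height, weight in cases)

for height_label in ("SHORT", "TALL"):
    for weight_label in ("LIGHT", "HEAVY"):
        print(height_label, weight_label, counts[(height_label, weight_label)])
```

The result is a frequency table like the income and murder rate table at the start of the deck, at the cost of discarding the detail in the original scores.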
![Page 23: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/23.jpg)
A weak negative relationship
r = -.26, r² = .07
Spring ’15 p.s.
![Page 24: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/24.jpg)
Exploring data with r
Why are we so polarized? Could part of the reason be a poor economy?
The r statistic can be used to explore such questions, not just for a small group but for the whole country!
But unless we go in with a hypothesis, backed by a literature review, it’s basically a fishing expedition.
Remember that there are lots of variables changing all the time, so finding substantial correlations isn’t unusual. Theorizing after the fact is always hazardous. Remember the story about lunar cycles and homicide?
![Page 25: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/25.jpg)
Other correlation techniques
• “Spearman’s r”
  – Assess correlation between two ordinal categorical variables
• Partial correlation
  – Using a control variable to assess its potential influence on a bivariate (two-variable) relationship when all variables are continuous
  – Analogous to using first-order partial tables for categorical variables
  – Instead of height → weight, is it possible that a variable related to height – age – is the real cause of changes in weight?

Zero-order correlations

| | HEIGHT | WEIGHT | AGE |
|---|---|---|---|
| HEIGHT | 1.00 | .72 | .04 |
| WEIGHT | .72 | 1.00 | .34 |
| AGE | .04 | .34 | 1.00 |

[Table: first-order partial correlations]
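A first-order partial correlation can be computed directly from the three zero-order correlations. The sketch below uses the standard textbook formula, which the slides do not spell out, and plugs in the zero-order values for height, weight and age shown above. The function name `partial_r` is an assumption.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y, controlling for Z (illustrative helper)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations from the slide
r_height_weight = 0.72
r_height_age = 0.04
r_weight_age = 0.34

# Height and weight, with age as the control variable
partial = partial_r(r_height_weight, r_height_age, r_weight_age)
print(f"height-weight r controlling for age = {partial:.2f}")
```

With these inputs the partial correlation works out to roughly .75, close to the zero-order .72, which suggests that age does not account for the height and weight relationship in this group.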
![Page 26: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/26.jpg)
Some parting thoughts
• If we did not use probability sampling
  – Our results apply only to the cases we coded
  – Accounting for the influence of other variables can be tricky
  – R and related statistics are often unimpressive; describing what they mean can be tricky
• If we used probability sampling
  – Our results can be extended to the population
  – But, since samples are just that – samples – we cannot assume that the statistics a sample yields (e.g., r, R²) hold true for the population
  – Techniques we’ll discuss later allow us to estimate the magnitude of the difference between sample statistics and the corresponding population parameters
  – This process will also let us interpret our results with far greater clarity and precision than is possible without probability sampling
![Page 27: Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder](https://reader030.vdocuments.us/reader030/viewer/2022020219/56649f2c5503460f94c47d88/html5/thumbnails/27.jpg)
Exam preview
1. You will be given a hypothesis and data from a sample. There will be two variables – the dependent variable, and the independent variable. Both will be categorical, and each will have two levels (e.g., low/high, etc.)
   A. You will build a table containing the frequencies (number of cases), just like we did in class and in this slide show. For consistency, place the categories of the independent variable in rows, just like in the slide shows.
   B. You will build another table with the percentages. Remember to go to one category of the independent variable and percentage it across the dependent variable. Then go to the other category of the independent variable and do the same.
   C. You will analyze the results. Are they consistent with the hypothesis?
2. You will be given the same data as above, broken down by a control variable. It will also be categorical, with two levels.
   A. You will build first-order partial tables, one with frequencies (number of cases), the other with percentages, for each level of the control variable. Remember that these tables will look exactly like the zero-order table. The hypothesis, the independent and dependent variables and their categories stay the same.
   B. You will be asked whether introducing the control variable affects your assessment of the hypothesized zero-order relationship. This requires that you separately compare the results for each level of the control variable to the zero-order table. Does introducing the control variable tell us anything new?
3. You will be given another hypothesis and data. There will be two variables – the dependent variable and the independent variable. Both are continuous variables.
   A. You will build a scattergram and draw in a line of best fit. Remember that the independent variable must go on the X (horizontal) axis, and the dependent variable must go on the Y (vertical) axis. Also remember that the line of best fit must be a straight line, placed so as to minimize its overall distance from the dots, which represent the cases.
   B. You will estimate the r (correlation coefficient) and state whether the scattergram supports the hypothesis. Be careful! First, is there a relationship between variables? Second, is it in the same direction (positive or negative) as the hypothesized relationship?