Post on 19-Dec-2015
Theory of Regression
The Course
• 16 (or so) lessons
  – Some flexibility
    • Depends how we feel
    • What we get through
Part 1: Theory of Regression
1. Models in statistics
2. Models with more than one parameter: regression
3. Why regression?
4. Samples to populations
5. Introducing multiple regression
6. More on multiple regression
Part 2: Application of regression
7. Categorical predictor variables
8. Assumptions in regression analysis
9. Issues in regression analysis
10. Non-linear regression
11. Moderators (interactions) in regression
12. Mediation and path analysis
Part 3: Advanced Types of Regression
13. Logistic Regression
14. Poisson Regression
15. Introducing SEM
16. Introducing longitudinal multilevel models
House Rules
• Jeremy must remember
  – Not to talk too fast
• If you don’t understand
  – Ask
  – Any time
• If you think I’m wrong
  – Ask. (I’m not always right)
Learning New Techniques
• Best kind of data to learn a new technique
  – Data that you know well, and understand
• Your own data
  – In computer labs (esp later on)
  – Use your own data if you like
• My data
  – I’ll provide you with
  – Simple examples, small sample sizes
    • Conceptually simple (even silly)
Computer Programs
• SPSS
  – Mostly
• Excel
  – For calculations
• GPower
• Stata (if you like)
• R (because it’s flexible and free)
• Mplus (SEM, ML?)
• AMOS (if you like)
Lesson 1: Models in statistics
Models, parsimony, error, mean, OLS estimators
What is a Model?
• Representation
  – Of reality
  – Not reality
• Model aeroplane represents a real aeroplane
  – If model aeroplane = real aeroplane, it isn’t a model
• Statistics is about modelling
  – Representing and simplifying
• Sifting
  – What is important from what is not important
• Parsimony
  – In statistical models we seek parsimony
  – Parsimony ≈ simplicity
Parsimony in Science
• A model should be:
  – 1: able to explain a lot
  – 2: use as few concepts as possible
• The more it explains
  – The more you get
• Fewer concepts
  – The lower the price
• Is it worth paying a higher price for a better model?
A Simple Model
• Height of five individuals
  – 1.40m
  – 1.55m
  – 1.80m
  – 1.62m
  – 1.63m
• These are our DATA
A Little Notation
$Y$: The (vector of) data that we are modelling
$Y_i$: The $i$th observation in our data.
Example: $Y = (4, 5, 6, 7, 8)$, so $Y_2 = 5$
Greek letters represent the true value in the population.
$\beta$ (Beta): Parameters in our model (population value)
$\beta_0$: The value of the first parameter of our model in the population.
$\beta_j$: The value of the $j$th parameter of our model, in the population.
$\varepsilon$ (Epsilon): The error in the population model.
Normal letters represent the values in our sample. These are sample statistics, which are used to estimate population parameters.
$b$: A parameter in our model (sample statistic)
$e$: The error in our sample.
$Y$: The data in our sample which we are trying to model.
Symbols on top change the meaning.
$Y$: The data in our sample which we are trying to model (repeated).
$\hat{Y}_i$: The estimated value of $Y$, for the $i$th case.
$\bar{Y}$: The mean of $Y$.
So $b_1 = \hat{\beta}_1$
I will use $b_1$ (because it is easier to type)
• Not always that simple
  – Some texts and computer programs use
    • $b$ = the parameter estimate (as we have used)
    • $\beta$ (beta) = the standardised parameter estimate
  – SPSS does this.
A capital letter is the set (vector) of parameters/statistics
$B$: Set of all parameters ($b_0, b_1, b_2, b_3, \ldots, b_p$)
Rules are not used very consistently (even by me).
Don’t assume you know what someone means, without checking.
• We want a model
  – To represent those data
• Model 1:
  – 1.40m, 1.55m, 1.80m, 1.62m, 1.63m
  – Not a model
    • A copy
  – VERY unparsimonious
    • Data: 5 statistics
    • Model: 5 statistics
  – No improvement
• Model 2:
  – The mean (arithmetic mean)
  – A one parameter model
$$\hat{Y} = b_0 = \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
• Which, because we are lazy, can be written as
$$\bar{Y} = \frac{\sum Y}{n}$$
The Mean as a Model
The (Arithmetic) Mean
• We all know the mean
  – The ‘average’
  – Learned about it at school
  – Forget (didn’t know) about how clever the mean is
• The mean is:
  – An Ordinary Least Squares (OLS) estimator
  – Best Linear Unbiased Estimator (BLUE)
Mean as OLS Estimator
• Going back a step or two
• MODEL was a representation of DATA
  – We said we want a model that explains a lot
  – How much does a model explain?
    DATA = MODEL + ERROR
    ERROR = DATA - MODEL
  – We want a model with as little ERROR as possible
• What is error?

| Data (Y) | Model (b0) mean | Error (e) |
|---|---|---|
| 1.40 | 1.60 | -0.20 |
| 1.55 | 1.60 | -0.05 |
| 1.80 | 1.60 | 0.20 |
| 1.62 | 1.60 | 0.02 |
| 1.63 | 1.60 | 0.03 |
• How can we calculate the ‘amount’ of error?
• Sum of errors
$$\text{ERROR} = \sum e_i = \sum (Y_i - \hat{Y}) = \sum (Y_i - b_0) = -0.20 - 0.05 + 0.20 + 0.02 + 0.03 = 0$$
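The arithmetic above can be checked with a minimal sketch. Python is used here only for illustration (the course itself works in SPSS and Excel), and the variable names are made up for this example. It uses the mean of the five heights as b0 and shows that the signed errors cancel.

```python
# Heights of the five individuals, in metres (from the slides).
heights = [1.40, 1.55, 1.80, 1.62, 1.63]

# One-parameter model: every case is predicted by the mean.
b0 = sum(heights) / len(heights)

# ERROR = DATA - MODEL, one error per case.
errors = [y - b0 for y in heights]

# Positive and negative errors cancel, so the plain sum is
# (up to floating-point rounding) zero.
total_error = sum(errors)

print("b0 =", round(b0, 2))
print("errors =", [round(e, 2) for e in errors])
print("sum of errors =", round(total_error, 10))
```

Because the plain sum is always zero when b0 is the mean, it cannot tell a good model from a bad one.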
  – A sum of 0 implies no ERROR
    • Not the case
  – Knowledge about ERROR is useful
    • As we shall see later
• Sum of absolute errors
  – Ignore signs
$$\text{ERROR} = \sum |e_i| = \sum |Y_i - \hat{Y}| = \sum |Y_i - b_0| = 0.20 + 0.05 + 0.20 + 0.02 + 0.03 = 0.50$$
• Are small and large errors equivalent?
  – One error of 4
  – Four errors of 1
  – The same?
  – What happens with different data?
• Y = (2, 2, 5)
  – b0 = 2
  – Not very representative
• Y = (2, 2, 4, 4)
  – b0 = any value from 2 to 4
  – Indeterminate
    • There are an infinite number of solutions which would satisfy our criteria for minimum error
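The two small data sets above can be tried out directly. The sketch below is illustrative only: the function name `sum_abs_error` and the candidate values of b0 are assumptions made for this example, not part of the course materials.

```python
def sum_abs_error(data, b0):
    """Sum of absolute errors when every case is predicted by b0."""
    return sum(abs(y - b0) for y in data)

# Y = (2, 2, 5): the smallest absolute error is at b0 = 2,
# which ignores the case with the value 5.
skewed = [2, 2, 5]
candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed_errors = {b0: sum_abs_error(skewed, b0) for b0 in candidates}
best_skewed = min(skewed_errors, key=skewed_errors.get)

# Y = (2, 2, 4, 4): every b0 between 2 and 4 gives the same error,
# so the criterion cannot pick a single answer.
tied = [2, 2, 4, 4]
tied_errors = [sum_abs_error(tied, b0) for b0 in (2.0, 2.5, 3.0, 3.5, 4.0)]

print("best b0 for (2, 2, 5):", best_skewed)
print("errors for (2, 2, 4, 4):", tied_errors)
```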
• Sum of squared errors (SSE)
$$\text{ERROR} = \sum e_i^2 = \sum (Y_i - \hat{Y})^2 = \sum (Y_i - b_0)^2 = (-0.20)^2 + (-0.05)^2 + 0.20^2 + 0.02^2 + 0.03^2 \approx 0.08$$
• Determinate
  – Always gives one answer
• If we minimise SSE
  – Get the mean
• Shown in graph
  – SSE plotted against b0
  – Min value of SSE occurs when b0 = mean
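The same curve can be traced numerically. This is a rough sketch rather than the way the original graph was produced: it evaluates SSE on an assumed grid of b0 values from 1.00 to 2.00 in steps of 0.01 and picks the lowest point. The names `sse` and `grid` are illustrative.

```python
# Heights of the five individuals, in metres (from the slides).
heights = [1.40, 1.55, 1.80, 1.62, 1.63]

def sse(data, b0):
    """Sum of squared errors when every case is predicted by b0."""
    return sum((y - b0) ** 2 for y in data)

mean = sum(heights) / len(heights)

# Candidate values of b0: 1.00, 1.01, ..., 2.00 (an assumed grid).
grid = [round(1.0 + 0.01 * k, 2) for k in range(101)]

# The grid value with the smallest SSE.
best_b0 = min(grid, key=lambda b0: sse(heights, b0))

print("mean =", round(mean, 2))
print("b0 with smallest SSE =", best_b0)
print("SSE at that b0 =", round(sse(heights, best_b0), 4))
```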
[Figure: SSE (y-axis, 0 to 2) plotted against b0 (x-axis, 1 to 2)]
The Mean as an OLS Estimate
Mean as OLS Estimate
• The mean is an Ordinary Least Squares (OLS) estimate
  – As are lots of other things
• This is exciting because
  – OLS estimators are BLUE
  – Best Linear Unbiased Estimators
  – Proven with Gauss-Markov Theorem
    • Which we won’t worry about
BLUE Estimators
• Best
  – Minimum variance (of all possible unbiased estimators)
  – Narrower distribution than other estimators
    • e.g. median, mode
• Linear
  – Linear predictions
  – For the mean
  – Linear (straight, flat) line
$$\hat{Y} = \bar{Y}$$
• Unbiased
  – Centred around true (population) values
  – Expected value = population value
  – Minimum is biased.
    • Minimum in samples > minimum in population
• Estimators
  – Errrmm… they are estimators
• Also consistent
  – As sample size approaches infinity, estimates get closer to population values
  – Variance shrinks
SSE and the Standard Deviation
• Tying up a loose end
$$SSE = \sum (Y_i - \hat{Y})^2$$
$$s = \sqrt{\frac{\sum (Y_i - \hat{Y})^2}{n}}$$
$$\sigma = \sqrt{\frac{\sum (Y_i - \hat{Y})^2}{n - 1}}$$
• SSE closely related to SD
• Sample standard deviation – s
  – Biased estimator of population SD
• Population standard deviation – σ
  – Need to know the mean to calculate SD
    • Reduces N by 1
    • Hence divide by N-1, not N
  – Like losing one df
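The difference between dividing by N and by N-1 is easy to see on the five heights. The sketch below is an illustration with made-up variable names; it assumes only Python's standard library, whose `statistics.pstdev` divides by N and `statistics.stdev` divides by N-1.

```python
import math
import statistics

# Heights of the five individuals, in metres (from the slides).
heights = [1.40, 1.55, 1.80, 1.62, 1.63]
n = len(heights)

mean = sum(heights) / n
sse = sum((y - mean) ** 2 for y in heights)

# Divide SSE by N: describes this sample, biased for the population.
sd_divide_by_n = math.sqrt(sse / n)

# Divide SSE by N-1: one df was used up by estimating the mean.
sd_divide_by_n_minus_1 = math.sqrt(sse / (n - 1))

print("SD dividing by N   =", round(sd_divide_by_n, 4))
print("SD dividing by N-1 =", round(sd_divide_by_n_minus_1, 4))

# The standard library gives the same two numbers.
print(round(statistics.pstdev(heights), 4), round(statistics.stdev(heights), 4))
```

With only five cases the two values differ noticeably; the gap shrinks as N grows.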
Proof
• That the mean minimises SSE
  – Not that difficult
  – As statistical proofs go
• Available in
  – Maxwell and Delaney – Designing experiments and analysing data
  – Judd and McClelland – Data Analysis (out of print?)
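For orientation, a short calculus version of the argument is sketched below. It is not taken from either book; it simply treats SSE as a function of b0 and finds where its slope is zero.

```latex
\begin{aligned}
SSE(b_0) &= \sum_{i=1}^{n} (Y_i - b_0)^2 \\
\frac{d\,SSE}{d\,b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0) = 0 \\
\sum_{i=1}^{n} Y_i &= n\,b_0
\quad\Longrightarrow\quad
b_0 = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y} \\
\frac{d^2 SSE}{d\,b_0^2} &= 2n > 0
\end{aligned}
```

The positive second derivative shows the turning point is a minimum, so the mean is the single value of b0 with the smallest SSE.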
What’s a df?
• The number of parameters free to vary
  – When one is fixed
• Term comes from engineering
  – Movement available to structures
0 df: No variation available
1 df: Fix 1 corner, the shape is fixed
Back to the Data
• Mean has 5 (N) df
  – 1st moment
• σ has N – 1 df
  – Mean has been fixed
  – 2nd moment
  – Can think of as amount cases vary away from the mean
While we are at it …
• Skewness has N – 2 df
  – 3rd moment
• Kurtosis has N – 3 df
  – 4th moment
  – Amount cases vary from σ
Parsimony and df
• Number of df remaining
  – Measure of parsimony
• Model which contained all the data
  – Has 0 df
  – Not a parsimonious model
• Normal distribution
  – Can be described in terms of mean and σ
    • 2 parameters
  – (z with 0 parameters)
Summary of Lesson 1
• Statistics is about modelling DATA
  – Models have parameters
  – Fewer parameters, more parsimony, better
• Models need to minimise ERROR
  – Best model, least ERROR
  – Depends on how we define ERROR
  – If we define error as sum of squared deviations from predicted value
  – Mean is best MODEL
Lesson 2: Models with one more parameter - regression
In Lesson 1 we said …
• Use a model to predict and describe data
  – Mean is a simple, one parameter model
More Models
Slopes and Intercepts
More Models
• The mean is OK
  – As far as it goes
  – It just doesn’t go very far
  – Very simple prediction, uses very little information
• We often have more information than that
  – We want to use more information than that
House Prices
• In the UK, two of the largest lenders (Halifax and Nationwide) compile house price indices
  – Predict the price of a house
  – Examine effect of different circumstances
• Look at change in prices
  – Guides legislation
    • E.g. interest rates, town planning
Predicting House Prices

| Beds | £ (000s) |
|---|---|
| 1 | 77 |
| 2 | 74 |
| 1 | 88 |
| 3 | 62 |
| 5 | 90 |
| 5 | 136 |
| 2 | 35 |
| 5 | 134 |
| 4 | 138 |
| 1 | 55 |
One Parameter Model
• The mean
$$\hat{Y} = b_0 = \bar{Y} = 88.9$$
$$SSE = 11806.9$$
“How much is that house worth?”
“£88,900”
Use 1 df to say that
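The two numbers on this slide can be reproduced from the house price data. This is a minimal sketch for illustration, with assumed variable names; prices are in thousands of pounds as in the table.

```python
# House price data from the slides: bedrooms and price in £000s.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

# One-parameter model: predict every house at the mean price.
b0 = sum(price) / len(price)

# Sum of squared errors around that single prediction.
sse_mean_only = sum((y - b0) ** 2 for y in price)

print("b0 (mean price, £000s) =", round(b0, 1))
print("SSE =", round(sse_mean_only, 1))
```

Note that `beds` is not used yet: the mean-only model ignores it, which is the motivation for adding a second parameter.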
Adding More Parameters
• We have more information than this
  – We might as well use it
  – Add a linear function of number of bedrooms ($x_1$)
$$\hat{Y} = b_0 + b_1 x_1$$
Alternative Expression
• Estimate of Y (expected value of Y)
$$\hat{Y} = b_0 + b_1 x_1$$
• Value of Y
$$Y_i = b_0 + b_1 x_{1i} + e_i$$
Estimating the Model
• We can estimate this model in four different, equivalent ways
  – Provides more than one way of thinking about it
1. Estimating the slope which minimises SSE
2. Examining the proportional reduction in SSE
3. Calculating the covariance
4. Looking at the efficiency of the predictions
Estimate the Slope to Minimise SSE
Estimate the Slope
• Stage 1
  – Draw a scatterplot
  – x-axis at mean
    • Not at zero
• Mark errors on it
  – Called ‘residuals’
  – Square and sum these to find SSE
[Figure: scatterplot; y-axis 0 to 160, x-axis 1.5 to 5.5]
[Figure: scatterplot; y-axis 0 to 160, x-axis 1.5 to 5.5]
• Add another slope to the chart
  – Redraw residuals
  – Recalculate SSE
  – Move the line around to find slope which minimises SSE
• Find the slope
• First attempt:
• Any straight line can be defined with two parameters
  – The location (height) of the slope
    • b0
    • Sometimes called α
  – The gradient of the slope
    • b1
• Gradient
[Figure: gradient of the line, labelled “1 unit” and “b1 units”]
• Height
[Figure: height of the line, labelled “b0 units”]
• Height
• If we fix slope to zero
  – Height becomes mean
  – Hence mean is b0
• Height is defined as the point that the slope hits the y-axis
  – The constant
  – The y-intercept
• Why the constant?
  – b0x0
  – Where x0 is 1.00 for every case
    • i.e. x0 is constant
• Implicit in SPSS
  – Some packages force you to make it explicit
  – (Later on we’ll need to make it explicit)

| beds (x1) | x0 | £ (000s) |
|---|---|---|
| 1 | 1 | 77 |
| 2 | 1 | 74 |
| 1 | 1 | 88 |
| 3 | 1 | 62 |
| 5 | 1 | 90 |
| 5 | 1 | 136 |
| 2 | 1 | 35 |
| 5 | 1 | 134 |
| 4 | 1 | 138 |
| 1 | 1 | 55 |
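To see what making the constant explicit looks like, the sketch below builds the two-column table above (x0 = 1 for every case, x1 = bedrooms) and solves for b0 and b1 with the normal equations. The normal equations are a standard closed-form shortcut brought in for this example, not the search used elsewhere in these slides, and all variable names are illustrative.

```python
# House price data from the slides: bedrooms and price in £000s.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

# Explicit constant: each row is (x0, x1) with x0 = 1.0 for every case.
rows = [(1.0, float(b)) for b in beds]

# Sums needed for the 2x2 normal equations (X'X) b = X'y.
s00 = sum(x0 * x0 for x0, _ in rows)          # equals N
s01 = sum(x0 * x1 for x0, x1 in rows)         # sum of x1
s11 = sum(x1 * x1 for _, x1 in rows)          # sum of x1 squared
t0 = sum(x0 * y for (x0, _), y in zip(rows, price))
t1 = sum(x1 * y for (_, x1), y in zip(rows, price))

# Solve the two equations by hand (Cramer's rule).
det = s00 * s11 - s01 * s01
b0 = (s11 * t0 - s01 * t1) / det
b1 = (s00 * t1 - s01 * t0) / det

# b0 multiplies the constant column, b1 multiplies bedrooms.
predicted = [b0 * x0 + b1 * x1 for x0, x1 in rows]

print("b0 =", round(b0, 2), " b1 =", round(b1, 2))
```

Because x0 is always 1, `b0 * x0` is just b0, which is why many packages leave the column implicit.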
• Why the intercept?
  – Where the regression line intercepts the y-axis
  – Sometimes called y-intercept
Finding the Slope
• How do we find the values of b0 and b1?
  – To start with, we jiggle the values, to find the best estimates which minimise SSE
  – Iterative approach
    • Computer intensive – used to matter, doesn’t really any more
    • (With fast computers and sensible search algorithms – more on that later)
• Start with
  – b0=88.9 (mean)
  – b1=10 (nice round number)
    • SSE = 14948 – worse than it was
  – b0=86.9, b1=10, SSE=13828
  – b0=66.9, b1=10, SSE=7029
  – b0=56.9, b1=10, SSE=6628
  – b0=46.9, b1=10, SSE=8228
  – b0=51.9, b1=10, SSE=7178
  – b0=51.9, b1=12, SSE=6179
  – b0=46.9, b1=14, SSE=5957
  – ……..
• Quite a long time later
  – b0 = 46.000372
  – b1 = 14.79182
  – SSE = 5921
• Gives the position of the
  – Regression line (or)
  – Line of best fit
• Better than guessing
• Not necessarily the only method
  – But it is OLS, so it is the best (it is BLUE)
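The jiggling described above can be automated. The sketch below is one possible search, not the procedure used to produce the numbers on the slides: it nudges b0 and b1 up and down by a step, keeps any move that lowers SSE, and halves the step when nothing helps. The function names, starting step, and stopping tolerance are all assumptions for illustration.

```python
# House price data from the slides: bedrooms and price in £000s.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

def sse(b0, b1):
    """Sum of squared errors for the line b0 + b1 * beds."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(beds, price))

def jiggle(b0, b1, step=8.0, tol=1e-5):
    """Simple search: try moving one parameter at a time by +/- step,
    accept any move that lowers SSE, halve the step when none does."""
    best = sse(b0, b1)
    while step > tol:
        improved = False
        for d0, d1 in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = sse(b0 + d0, b1 + d1)
            if trial < best:
                b0, b1, best = b0 + d0, b1 + d1, trial
                improved = True
        if not improved:
            step /= 2.0
    return b0, b1, best

# Same starting point as the slides: b0 = mean, b1 = 10.
start_sse = sse(88.9, 10.0)
b0, b1, final_sse = jiggle(88.9, 10.0)

print("starting SSE =", round(start_sse))
print("b0 =", round(b0, 2), " b1 =", round(b1, 2), " SSE =", round(final_sse))
```

The search should settle close to the values quoted above; small differences in the later decimal places depend on the step size and stopping rule.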
[Figure: Actual Price and Predicted Price (y-axis: Price, 0 to 160) plotted against Number of Bedrooms (x-axis, 0.5 to 5.5)]
• We now know
  – A house with no bedrooms is worth £46,000 (??!)
  – Adding a bedroom adds £15,000
• Told us two things
  – Don’t extrapolate to meaningless values of x-axis
  – Constant is not necessarily useful
    • It is necessary to estimate the equation
Standardised Regression Line
• One big but:
  – Scale dependent
• Values change
  – £ to €, inflation
• Scales change
  – £, £000, £00?
• Need to deal with this
• Don’t express in ‘raw’ units
  – Express in SD units
  – $\sigma_{x_1} = 1.72$
  – $\sigma_y = 36.21$
• $b_1 = 14.79$
• We increase $x_1$ by 1, and $\hat{Y}$ increases by 14.79
$$14.79 = (14.79 / 36.21)\ \text{SDs} = 0.408\ \text{SDs}$$
• Similarly, 1 unit of $x_1$ = 1/1.72 SDs
  – Increase $x_1$ by 1 SD
  – $\hat{Y}$ increases by 14.79 × (1.72/1) = 25.44
• Put them both together
$$\frac{b_1 \times \sigma_{x_1}}{\sigma_y}$$
• The standardised regression line
  – Change (in SDs) in Ŷ associated with a change of 1 SD in x1
$$\frac{14.79 \times 1.72}{36.21} = 0.706 \quad \text{(computed from unrounded values)}$$
• A different route to the same answer
  – Standardise both variables (divide by SD)
  – Find line of best fit
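Both routes can be compared on the house price data. The sketch below is illustrative only: it gets b1 from the usual covariance-over-variance shortcut (an assumption of this example, in place of the search), rescales it by the two SDs, and then refits the line on standardised scores. Subtracting the means when standardising is an extra step taken here; it does not change the slope.

```python
import math

# House price data from the slides: bedrooms and price in £000s.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
n = len(beds)

mean_x = sum(beds) / n
mean_y = sum(price) / n
sxx = sum((x - mean_x) ** 2 for x in beds)
syy = sum((y - mean_y) ** 2 for y in price)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(beds, price))

# Raw slope and the two SDs (dividing by N-1).
b1 = sxy / sxx
sd_x = math.sqrt(sxx / (n - 1))
sd_y = math.sqrt(syy / (n - 1))

# Route 1: rescale the raw slope into SD units.
standardised_slope = b1 * sd_x / sd_y

# Route 2: standardise both variables first, then fit the slope.
zx = [(x - mean_x) / sd_x for x in beds]
zy = [(y - mean_y) / sd_y for y in price]
slope_of_z = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print("SD of x1 =", round(sd_x, 2), " SD of y =", round(sd_y, 2))
print("route 1 =", round(standardised_slope, 3))
print("route 2 =", round(slope_of_z, 3))
```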
• The standardised regression line has a special name
The Correlation Coefficient (r)
(r stands for ‘regression’, but more on that later)
• Correlation coefficient is a standardised regression slope
  – Relative change, in terms of SDs
![Page 84: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/84.jpg)
84
Proportional Reduction in Error
![Page 85: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/85.jpg)
85
Proportional Reduction in Error
• We might be interested in the level of improvement of the model
  – How much less error (as a proportion) do we have?
  – Proportional Reduction in Error (PRE)
• Mean only
  – Error(model 0) = 11806
• Mean + slope
  – Error(model 1) = 5921
![Page 86: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/86.jpg)
86
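$$\text{PRE} = \frac{\text{ERROR}(0) - \text{ERROR}(1)}{\text{ERROR}(0)}$$

$$\text{PRE} = 1 - \frac{\text{ERROR}(1)}{\text{ERROR}(0)}$$

$$\text{PRE} = 1 - \frac{5921}{11806}$$

$$\text{PRE} = 0.4984$$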
![Page 87: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/87.jpg)
87
• But we squared all the errors in the first place
  – So we could take the square root
  – (It’s a shoddy excuse, but it makes the point)

$$\sqrt{0.4984} = 0.706$$

• This is the correlation coefficient
• Correlation coefficient is the square root of the proportion of variance explained
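As a quick check of the PRE route, the two error totals quoted on the slides can be plugged in directly. This is a minimal sketch; the variable names are illustrative.

```python
import math

error_model_0 = 11806  # squared error using the mean only
error_model_1 = 5921   # squared error using mean + slope

# Proportional reduction in error.
pre = 1 - error_model_1 / error_model_0

# Square root of the proportion of error removed.
r = math.sqrt(pre)

print(round(pre, 3), round(r, 3))
```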
![Page 88: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/88.jpg)
88
Standardised Covariance
![Page 89: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/89.jpg)
89
Standardised Covariance
• We are still iterating
  – Need a ‘closed-form’
  – Equation to solve to get the parameter estimates
• Answer is a standardised covariance
  – A variable has variance
  – Amount of ‘differentness’
• We have used SSE so far
![Page 90: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/90.jpg)
90
• SSE varies with N
  – Higher N, higher SSE
• Divide by N
  – Gives SSE per person
  – (Actually N – 1, we have lost a df to the mean)
• The variance
• Same as SD²
  – We thought of SSE as a scattergram
• Y plotted against X
  – (repeated image follows)
![Page 91: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/91.jpg)
91
[Figure: scattergram of Y plotted against X]
![Page 92: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/92.jpg)
92
• Or we could plot Y against Y
  – Axes meet at the mean (88.9)
  – Draw a square for each point
  – Calculate an area for each square
  – Sum the areas
• Sum of areas
  – SSE
• Sum of areas divided by N
  – Variance
![Page 93: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/93.jpg)
93
Plot of Y against Y
[Figure: Y plotted against Y, both axes running from 0 to 180]
![Page 94: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/94.jpg)
94
[Figure: Y plotted against Y, with a square drawn between each point and the mean]
Draw Squares
35 – 88.9 = -53.9 (on both axes)
Area = -53.9 x -53.9 = 2905.21
138 – 88.9 = 49.1 (on both axes)
Area = 49.1 x 49.1 = 2410.81
![Page 95: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/95.jpg)
95
• What if we do the same procedure
  – Instead of Y against Y
  – Y against X
• Draw rectangles (not squares)
• Sum the area
• Divide by N - 1
• This gives us the variance of x with y
  – The Covariance
  – Shortened to Cov(x, y)
![Page 96: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/96.jpg)
96
![Page 97: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/97.jpg)
97
55 – 88.9 = -33.9
1 - 3 = -2
Area = (-33.9) x (-2) = 67.8
138 - 88.9 = 49.1
4 - 3 = 1
Area = 49.1 x 1 = 49.1
![Page 98: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/98.jpg)
98
• More formally (and easily)
• We can state what we are doing as an equation
  – Where Cov(x, y) is the covariance

$$\text{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}$$

• Cov(x,y)=44.2
• What do points in different sectors do to the covariance?
![Page 99: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/99.jpg)
99
• Problem with the covariance
  – Tells us about two things
  – The variance of X and Y
  – The covariance
• Need to standardise it
  – Like the slope
• Two ways to standardise the covariance
  – Standardise the variables first
• Subtract the mean and divide by SD
  – Standardise the covariance afterwards
![Page 100: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/100.jpg)
100
• First approach
  – Much more computationally expensive
• Too much like hard work to do by hand
  – Need to standardise every value
• Second approach
  – Much easier
  – Standardise the final value only
• Need the combined variance
  – Multiply two variances
  – Find square root (were multiplied in first place)
![Page 101: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/101.jpg)
101
• Standardised covariance
$$\frac{\text{Cov}(x, y)}{\sqrt{\text{Var}(x) \times \text{Var}(y)}} = \frac{44.2}{\sqrt{2.99 \times 1311.9}} = 0.706$$
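The rectangle-area view of the covariance can be sketched in Python. The data are the beds and actual-price columns from the house-price table used later in these slides; the helper names `mean` and `covariance` are illustrative, not part of the course material.

```python
import math

# Beds (x) and actual price (y) for the ten houses in the slides' table.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sum of rectangle areas around the two means, divided by N - 1."""
    x_bar, y_bar = mean(x), mean(y)
    areas = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    return sum(areas) / (len(x) - 1)


cov_xy = covariance(beds, price)
var_x = covariance(beds, beds)    # a variance is a variable's covariance with itself
var_y = covariance(price, price)

# Standardise the final value only (the "second approach").
r = cov_xy / math.sqrt(var_x * var_y)

print(round(cov_xy, 1), round(var_x, 2), round(var_y, 1), round(r, 3))
```

Using unrounded variances is what makes the result agree with the 0.706 reached by the other routes.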
![Page 102: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/102.jpg)
102
• The correlation coefficient
  – A standardised covariance is a correlation coefficient

$$r = \frac{\text{Covariance}}{\sqrt{\text{variance} \times \text{variance}}}$$
![Page 103: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/103.jpg)
103
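• Expanded …

$$r = \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{N - 1} \times \dfrac{\sum (y - \bar{y})^2}{N - 1}}}$$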
![Page 104: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/104.jpg)
104
• This means …
  – We now have a closed form equation to calculate the correlation
  – Which is the standardised slope
  – Which we can use to calculate the unstandardised slope
![Page 105: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/105.jpg)
105
We know that:

$$r = \frac{b_1 \times \sigma_{x_1}}{\sigma_y}$$

We know that:

$$b_1 = \frac{r \times \sigma_y}{\sigma_{x_1}}$$
![Page 106: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/106.jpg)
106
• So value of b1 is the same as the iterative approach

$$b_1 = \frac{r \times \sigma_y}{\sigma_{x_1}}$$

$$b_1 = \frac{0.706 \times 36.21}{1.72}$$

$$b_1 = 14.79$$
![Page 107: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/107.jpg)
107
• The intercept
  – Just while we are at it
• The variables are centred at zero
  – We subtracted the mean from both variables
  – Intercept is zero, because the axes cross at the mean
![Page 108: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/108.jpg)
108
• Add mean of y to the constant
  – Adjusts for centring y
• Subtract mean of x
  – But not the whole mean of x
  – Need to correct it for the slope

$$c = \bar{y} - b_1 \bar{x}_1$$

$$c = 88.9 - 14.79 \times 2.9$$

$$c = 46.00$$

• Naturally, the same
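The closed-form route to the slope and intercept can be sketched as follows. It uses the beds and price columns from the slides' house-price table and Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Beds (x) and actual price (y) for the ten houses in the slides' table.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

n = len(beds)
x_bar = statistics.mean(beds)
y_bar = statistics.mean(price)
sd_x = statistics.stdev(beds)    # sample SD (divides by N - 1)
sd_y = statistics.stdev(price)

# Correlation as a standardised covariance.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, price)) / (n - 1)
r = cov_xy / (sd_x * sd_y)

# Unstandardise the slope, then correct the mean of y for the slope.
b1 = r * sd_y / sd_x
c = y_bar - b1 * x_bar

print(round(b1, 2), round(c, 1))
```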
![Page 109: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/109.jpg)
109
Accuracy of Prediction
![Page 110: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/110.jpg)
110
One More (Last One)
• We have one more way to calculate the correlation
  – Looking at the accuracy of the prediction
• Use the parameters
  – b0 and b1
  – To calculate a predicted value for each case
![Page 111: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/111.jpg)
111
• Plot actual price against predicted price
  – From the model

Beds   Actual Price   Predicted Price
1      77             60.80
2      74             75.59
1      88             60.80
3      62             90.38
5      90             119.96
5      136            119.96
2      35             75.59
5      134            119.96
4      138            105.17
1      55             60.80
![Page 112: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/112.jpg)
112
[Figure: Predicted Value (y-axis) plotted against Actual Value (x-axis)]
![Page 113: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/113.jpg)
113
• r = 0.706
  – The correlation
• Seems a futile thing to do
  – And at this stage, it is
  – But later on, we will see why
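The accuracy-of-prediction route can be checked by correlating the actual and predicted prices from the table. This is a sketch; `pearson_r` is an illustrative helper, not something defined in the course.

```python
import math

# Actual and predicted prices copied from the slides' table.
actual = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
predicted = [60.80, 75.59, 60.80, 90.38, 119.96,
             119.96, 75.59, 119.96, 105.17, 60.80]


def pearson_r(x, y):
    """Correlation from sums of squares and cross-products of deviations."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_actual_predicted = pearson_r(actual, predicted)
print(round(r_actual_predicted, 3))
```

With a single predictor the predicted values are just a rescaling of x, which is why this matches the correlation between beds and price.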
![Page 114: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/114.jpg)
114
Some More Formulae
• For hand calculation

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

• Point biserial

$$r = \frac{(M_{y1} - M_{y0}) \sqrt{PQ}}{sd_y}$$
![Page 115: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/115.jpg)
115
• Phi (φ)
  – Used for 2 dichotomous variables

                 Vote P    Vote Q
Homeowner        A: 19     B: 54
Not homeowner    C: 60     D: 53

$$r = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}$$
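Applying the phi formula to the homeowner table can be sketched as below. The cell labels follow the slide, and the numerator uses the BC − AD ordering; the opposite ordering only flips the sign.

```python
import math

# Cell counts from the 2 x 2 table on the slide.
a, b = 19, 54  # homeowner: vote P, vote Q
c, d = 60, 53  # not homeowner: vote P, vote Q

numerator = b * c - a * d
denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
phi = numerator / denominator

print(round(phi, 3))
```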
![Page 116: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/116.jpg)
116
• Problem with the phi correlation
  – Unless Px = Py (or Px = 1 – Py)
• Maximum (absolute) value is < 1.00
• Tetrachoric can be used
• Rank (Spearman) correlation
  – Used where data are ranked

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
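A small illustration of the rank formula, where d is the difference between a case's two ranks. The five pairs of ranks below are hypothetical, made up purely for the example, and the formula in this form assumes no tied ranks.

```python
# Hypothetical ranks for five cases on two variables (no ties).
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

n = len(rank_x)

# d is the difference between the two ranks for each case.
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

spearman_r = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))
print(round(spearman_r, 2))
```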
![Page 117: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/117.jpg)
117
Summary
• Mean is an OLS estimate
  – OLS estimates are BLUE
• Regression line
  – Best prediction of DV from IV
  – OLS estimate (like mean)
• Standardised regression line
  – A correlation
![Page 118: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/118.jpg)
118
• Four ways to think about a correlation
  – 1. Standardised regression line
  – 2. Proportional Reduction in Error (PRE)
  – 3. Standardised covariance
  – 4. Accuracy of prediction
![Page 119: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/119.jpg)
119
![Page 120: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/120.jpg)
120
![Page 121: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/121.jpg)
121
Lesson 3: Why Regression?
A little aside, where we look at why regression has such a
curious name.
![Page 122: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/122.jpg)
122
Regression
The or an act of regression; reversion; return towards the mean; return to an earlier stage of development, as in an adult’s or an adolescent’s behaving like a child
(From Latin gradi, to go)
• So why use this name for a statistical technique which is about prediction and explanation?
![Page 123: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/123.jpg)
123
• Francis Galton
  – Charles Darwin’s cousin
  – Studying heritability
• Tall fathers have shorter sons
• Short fathers have taller sons
  – ‘Filial regression toward mediocrity’
  – Regression to the mean
![Page 124: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/124.jpg)
124
• Galton thought this was biological fact
  – Evolutionary basis?
• Then did the analysis backward
  – Tall sons have shorter fathers
  – Short sons have taller fathers
• Regression to the mean
  – Not biological fact, statistical artefact
![Page 125: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/125.jpg)
125
Other Examples
• Secrist (1933): The Triumph of Mediocrity in Business
• Second albums often tend to not be as good as first
• Sequel to a film is not as good as the first one
• ‘Curse of Athletics Weekly’
• Parents think that punishing bad behaviour works, but rewarding good behaviour doesn’t
![Page 126: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/126.jpg)
126
Pair Link Diagram
• An alternative to a scatterplot
x y
![Page 127: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/127.jpg)
127
[Figure: pair link diagram, r=1.00]
![Page 128: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/128.jpg)
128
[Figure: pair link diagram, r=0.00]
![Page 129: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/129.jpg)
129
From Regression to Correlation
• Where do we predict an individual’s score on y will be, based on their score on x?
  – Depends on the correlation
• r = 1.00 – we know exactly where they will be
• r = 0.00 – we have no idea
• r = 0.50 – we have some idea
![Page 130: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/130.jpg)
130
x y
r=1.00
Starts here
Will end up here
![Page 131: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/131.jpg)
131
x y
r=0.00
Could end anywhere here
Starts here
![Page 132: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/132.jpg)
132
r=0.50
x y
Starts here
Probably end
somewhere here
![Page 133: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/133.jpg)
133
Galton Squeeze Diagram
• Don’t show individuals
  – Show groups of individuals, from the same (or similar) starting point
  – Shows regression to the mean
![Page 134: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/134.jpg)
134
x y
r=0.00
Group starts here
Ends here
Group starts here
![Page 135: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/135.jpg)
135
x y
r=0.50
![Page 136: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/136.jpg)
136
x y
r=1.00
![Page 137: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/137.jpg)
137
x y
1 unit r units
• Correlation is amount of regression that doesn’t occur
![Page 138: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/138.jpg)
138
x y
• No regression
• r=1.00
![Page 139: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/139.jpg)
139
• Some regression
• r=0.50
x y
![Page 140: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/140.jpg)
140
r=0.00
x y
• Lots (maximum) regression
• r=0.00
![Page 141: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/141.jpg)
141
Formula
$$\hat{z}_y = r_{xy} z_x$$
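The formula says that a case starting z_x SDs from the mean on x is predicted to end up r times as far from the mean on y. A minimal sketch, with `predicted_z_y` as an illustrative helper name and a starting point of 2 SDs chosen only for the example:

```python
def predicted_z_y(r_xy, z_x):
    """Predicted standardised y score for a given standardised x score."""
    return r_xy * z_x


# A group starting 2 SDs above the mean on x, under three correlations.
start = 2.0
for r in (1.00, 0.50, 0.00):
    end = predicted_z_y(r, start)
    regression = start - end  # how far the prediction moves back towards the mean
    print(f"r={r:.2f}: predicted z_y={end:.2f}, regression={regression:.2f}")
```

The loop mirrors the three squeeze diagrams: no regression at r=1.00, some at r=0.50, and maximum regression at r=0.00.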
![Page 142: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/142.jpg)
142
Conclusion
• Regression towards mean is statistical necessity

regression = perfection – correlation

• Very non-intuitive
• Interest in regression and correlation
  – From examining the extent of regression towards mean
  – By Pearson – worked with Galton
  – Stuck with curious name
• See also Paper B3
![Page 143: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/143.jpg)
143
![Page 144: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/144.jpg)
144
![Page 145: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/145.jpg)
145
Lesson 4: Samples to Populations – Standard Errors and Statistical Significance
![Page 146: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/146.jpg)
146
The Problem
• In Social Sciences
  – We investigate samples
• Theoretically
  – Randomly taken from a specified population
  – Every member has an equal chance of being sampled
  – Sampling one member does not alter the chances of sampling another
• Not the case in (say) physics, biology, etc.
![Page 147: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/147.jpg)
147
Population
• But it’s the population that we are interested in
  – Not the sample
  – Population statistic represented with Greek letter
  – Hat means ‘estimate’

$$\hat{\beta} = b$$

$$\hat{\mu}_x = \bar{x}$$
![Page 148: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/148.jpg)
148
• Sample statistics (e.g. mean) estimate population parameters
• Want to know
  – Likely size of the parameter
  – If it is > 0
![Page 149: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/149.jpg)
149
Sampling Distribution
• We need to know the sampling distribution of a parameter estimate
  – How much does it vary from sample to sample
• If we make some assumptions
  – We can know the sampling distribution of many statistics
  – Start with the mean
![Page 150: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/150.jpg)
150
Sampling Distribution of the Mean
• Given
  – Normal distribution
  – Random sample
  – Continuous data
• Mean has a known sampling distribution
  – Repeatedly sampling will give a known distribution of means
  – Centred around the true (population) mean (μ)
![Page 151: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/151.jpg)
151
Analysis Example: Memory
• Difference in memory for different words
  – 10 participants given a list of 30 words to learn, and then tested
  – Two types of word
• Abstract: e.g. love, justice
• Concrete: e.g. carrot, table
![Page 152: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/152.jpg)
152
Concrete   Abstract   Diff (x)
12         4          8
11         7          4
4          6          -2
9          12         -3
8          6          2
12         10         2
9          8          1
8          5          3
12         10         2
8          4          4

$$\bar{x} = 2.1, \quad \sigma_x = 3.11, \quad N = 10$$
![Page 153: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/153.jpg)
153
Confidence Intervals
• This means
  – If we know the mean in our sample
  – We can estimate where the mean in the population (μ) is likely to be
• Using
  – The standard error (se) of the mean
  – Represents the standard deviation of the sampling distribution of the mean
![Page 154: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/154.jpg)
154
[Figure: normal distribution curve. 1 SD contains 68%; almost 2 SDs contain 95%]
![Page 155: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/155.jpg)
155
• We know the sampling distribution of the mean
  – t distributed
  – Normal with large N (>30)
• Know the range within which means from other samples will fall
  – Therefore the likely range of μ

$$se(\bar{x}) = \frac{\sigma_x}{\sqrt{n}}$$
![Page 156: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/156.jpg)
156
• Two implications of equation
  – Increasing N decreases SE
• But only a bit
  – Decreasing SD decreases SE
• Calculate Confidence Intervals
  – From standard errors
• 95% is a standard level of CI
  – In 95% of samples the true mean will lie within the 95% CIs
  – In large samples: 95% CI = 1.96 × SE
  – In smaller samples: depends on t distribution (df=N-1=9)
![Page 157: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/157.jpg)
157
$$\bar{x} = 2.1, \quad \sigma_x = 3.11, \quad N = 10$$

$$se(\bar{x}) = \frac{\sigma_x}{\sqrt{n}} = \frac{3.11}{\sqrt{10}} = 0.98$$
![Page 158: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/158.jpg)
158
$$95\%\ \text{CI} = 2.26 \times 0.98 = 2.22$$

$$\bar{x} - \text{CI} \le \mu \le \bar{x} + \text{CI}$$

$$-0.12 \le \mu \le 4.32$$
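The standard error and confidence interval for the memory example can be reproduced from the difference scores. This sketch assumes a two-tailed 5% critical t of 2.262 for df = 9 (the slides round it to 2.26); the variable names are illustrative.

```python
import math
import statistics

# Difference scores (concrete minus abstract) for the ten participants.
diffs = [8, 4, -2, -3, 2, 2, 1, 3, 2, 4]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)   # sample SD (divides by N - 1)
se = sd_diff / math.sqrt(n)

# Assumed critical value: two-tailed 5% point of t with df = N - 1 = 9.
t_crit = 2.262
half_width = t_crit * se
lower = mean_diff - half_width
upper = mean_diff + half_width

print(round(se, 2), round(lower, 2), round(upper, 2))
```

The interval just includes zero, which is consistent with a non-significant result at the 5% level.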
![Page 159: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/159.jpg)
159
What is a CI?
• (For 95% CI):
• 95% chance that the true (population) value lies within the confidence interval?
• In 95% of samples, true mean will land within the confidence interval?
![Page 160: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/160.jpg)
160
Significance Test
• Probability that μ is a certain value
  – Almost always 0
• Doesn’t have to be though
• We want to test the hypothesis that the difference is equal to 0
  – i.e. find the probability of this difference occurring in our sample IF μ=0
  – (Not the same as the probability that μ=0)
![Page 161: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/161.jpg)
161
• Calculate SE, and then t
  – t has a known sampling distribution
  – Can test probability that a certain value is included

$$t = \frac{\bar{x}}{se(\bar{x})}$$

$$t = \frac{2.1}{0.98} = 2.14$$

$$p = 0.061$$
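The t statistic for the memory example follows directly from the difference scores. This is a sketch that tests against a hypothesised mean of 0; turning t into a p-value needs the tail area of the t distribution with 9 df, which is left out here.

```python
import math
import statistics

# Difference scores (concrete minus abstract) for the ten participants.
diffs = [8, 4, -2, -3, 2, 2, 1, 3, 2, 4]

hypothesised_mean = 0
se = statistics.stdev(diffs) / math.sqrt(len(diffs))

# How many standard errors the sample mean lies from the hypothesised value.
t = (statistics.mean(diffs) - hypothesised_mean) / se
df = len(diffs) - 1

print(round(t, 2), df)
```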
![Page 162: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/162.jpg)
162
Other Parameter Estimates
• Same approach
  – Prediction, slope, intercept, predicted values
  – At this point, prediction and slope are the same
• Won’t be later on
• We will look at one predictor only
  – More complicated with > 1
![Page 163: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/163.jpg)
163
Testing the Degree of Prediction
• Prediction is correlation of Y with Ŷ
  – The correlation – when we have one IV
• Use F, rather than t
• Started with SSE for the mean only
  – This is SStotal
  – Divide this into SSresidual
  – SSregression
• SStot = SSreg + SSres
![Page 164: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/164.jpg)
164
$$F = \frac{SS_{reg} / df_1}{SS_{res} / df_2}$$

$$df_1 = k, \quad df_2 = N - k - 1$$
![Page 165: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/165.jpg)
165
• Back to the house prices
  – Original SSE (SStotal) = 11806
  – SSresidual = 5921
• What is left after our model
  – SSregression = 11806 – 5921 = 5885
• What our model explains
• Slope = 14.79
• Intercept = 46.0
• r = 0.706
![Page 166: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/166.jpg)
166
$$F = \frac{SS_{reg} / df_1}{SS_{res} / df_2}$$

$$F = \frac{5885 / 1}{5921 / (10 - 1 - 1)} = 7.95$$

$$df_1 = k = 1, \quad df_2 = N - k - 1 = 8$$
![Page 167: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/167.jpg)
167
• F = 7.95, df = 1, 8, p = 0.02
  – Can reject H0
• H0: Prediction is not better than chance
  – A significant effect
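The F ratio for the house-price model can be rebuilt from the sums of squares quoted on the slides. A minimal sketch, with illustrative variable names:

```python
ss_total = 11806     # squared error around the mean only
ss_residual = 5921   # squared error left after fitting the slope
n = 10               # number of houses
k = 1                # number of predictors

ss_regression = ss_total - ss_residual  # what the model explains

df1 = k
df2 = n - k - 1

# Explained variation per df, relative to unexplained variation per df.
f_ratio = (ss_regression / df1) / (ss_residual / df2)

print(ss_regression, df1, df2, round(f_ratio, 2))
```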
![Page 168: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/168.jpg)
168
Statistical Significance: What does a p-value (really) mean?
![Page 169: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/169.jpg)
169
A Quiz
• Six questions, each true or false
• Write down your answers (if you like)
• An experiment has been done. Carried out perfectly. All assumptions perfectly satisfied. Absolutely no problems.
• p = 0.01
– Which of the following can we say?
![Page 170: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/170.jpg)
170
1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).
![Page 171: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/171.jpg)
171
2. You have found the probability of the null hypothesis being true.
![Page 172: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/172.jpg)
172
3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
![Page 173: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/173.jpg)
173
4. You can deduce the probability of the experimental hypothesis being true.
![Page 174: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/174.jpg)
174
5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
![Page 175: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/175.jpg)
175
6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
![Page 176: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/176.jpg)
176
OK, What is a p-value
• Cohen (1994): “[a p-value] does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe it does” (p. 997).
![Page 177: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/177.jpg)
177
OK, What is a p-value
• Sorry, didn’t answer the question
• It’s the probability of obtaining a result as extreme as, or more extreme than, the result we have in the study, given that the null hypothesis is true
• Not the probability that the null hypothesis is true
![Page 178: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/178.jpg)
178
A Bit of Notation
• Not because we like notation
– But we have to say a lot less
• Probability – P
• Null hypothesis is true – H
• Result (data) – D
• Given – |
![Page 179: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/179.jpg)
179
What’s a P Value
• P(D|H)
– Probability of the data occurring if the null hypothesis is true
• Not
• P(H|D)
– Probability that the null hypothesis is true, given that we have the data = p(H)
• P(H|D) ≠ P(D|H)
![Page 180: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/180.jpg)
180
• What is the probability you are prime minister
– Given that you are British
– P(M|B)
– Very low
• What is the probability you are British
– Given you are prime minister
– P(B|M)
– Very high
• P(M|B) ≠ P(B|M)
![Page 181: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/181.jpg)
181
• There’s been a murder
– Someone bumped off a statto for talking too much
• The police have DNA
• The police have your DNA
– They match(!)
• DNA matches 1 in 1,000,000 people
• What’s the probability you didn’t do the murder, given the DNA match: P(H|D)
![Page 182: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/182.jpg)
182
• Police say:
– P(D|H) = 1/1,000,000
• Luckily, you have Jeremy on your defence team
• We say:
– P(D|H) ≠ P(H|D)
• Probability that someone who matches the DNA didn’t do the murder
– Incredibly high
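
The gap between P(D|H) and P(H|D) in the DNA example can be made concrete with a little arithmetic. This is a sketch only: the population of 60 million possible suspects, the single guilty person, and the absence of any other evidence are assumptions made for illustration and are not given in the slides.

```python
# Assumptions (not from the slides): 60 million possible suspects,
# exactly one murderer, and no evidence other than the DNA match.
population = 60_000_000
match_rate = 1 / 1_000_000           # P(D|H): an innocent person matches by chance

innocent = population - 1
expected_innocent_matches = innocent * match_rate    # roughly 60 people
guilty_matches = 1                                    # the murderer always matches

# P(H|D): probability that a person who matches is innocent
p_innocent_given_match = expected_innocent_matches / (
    expected_innocent_matches + guilty_matches
)

print(f"P(D|H) = {match_rate:.6f}")
print(f"P(H|D) = {p_innocent_given_match:.3f}")
```

Under these assumptions about 60 innocent people match alongside the one guilty person, so a match on its own leaves the probability of innocence very high even though P(D|H) is tiny.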
![Page 183: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/183.jpg)
183
Back to the Questions
• Haller and Krauss (2002)
– Asked those questions of groups in Germany
– Psychology students
– Psychology lecturers and professors (who didn’t teach stats)
– Psychology lecturers and professors (who did teach stats)
![Page 184: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/184.jpg)
184
1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).
• True
  • 34% of students
  • 15% of professors/lecturers
  • 10% of professors/lecturers teaching statistics
• False
  • We have found evidence against the null hypothesis
![Page 185: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/185.jpg)
185
2. You have found the probability of the null hypothesis being true.
– 32% of students
– 26% of professors/lecturers
– 17% of professors/lecturers teaching statistics
• False
  • We don’t know
![Page 186: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/186.jpg)
186
3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
– 20% of students
– 13% of professors/lecturers
– 10% of professors/lecturers teaching statistics
• False
![Page 187: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/187.jpg)
187
4. You can deduce the probability of the experimental hypothesis being true.
– 59% of students
– 33% of professors/lecturers
– 33% of professors/lecturers teaching statistics
• False
![Page 188: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/188.jpg)
188
5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
• 68% of students
• 67% of professors/lecturers
• 73% of professors/lecturers teaching statistics
• False
  • Can be worked out
  – P(replication)
![Page 189: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/189.jpg)
189
6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
– 41% of students
– 49% of professors/lecturers
– 37% of professors/lecturers teaching statistics
• False
  • Another tricky one
  – It can be worked out
![Page 190: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/190.jpg)
190
One Last Quiz
• I carry out a study
– All assumptions perfectly satisfied
– Random sample from population
– I find p = 0.05
• You replicate the study exactly
– What is the probability you find p < 0.05?
![Page 191: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/191.jpg)
191
• I carry out a study
– All assumptions perfectly satisfied
– Random sample from population
– I find p = 0.01
• You replicate the study exactly
– What is the probability you find p < 0.05?
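
One way to put a number on the replication question is sketched below. It rests on strong assumptions that the slides do not state: the true effect is taken to be exactly the effect observed in the first study, the test statistic is treated as normal with unit standard error, and only a significant result in the same direction counts. The function name `replication_probability` is made up for this illustration.

```python
from statistics import NormalDist


def replication_probability(p_original, alpha=0.05):
    """Probability that an exact replication gives two-sided p < alpha in the
    same direction, assuming the true effect equals the observed effect."""
    z = NormalDist()
    z_observed = z.inv_cdf(1 - p_original / 2)   # z score of the first study
    z_critical = z.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    # Under the assumption, the replication z score is N(z_observed, 1)
    return 1 - z.cdf(z_critical - z_observed)


for p in (0.05, 0.01):
    print(f"original p = {p}: P(replication p < 0.05) = "
          f"{replication_probability(p):.2f}")
```

Under these assumptions an original result sitting exactly on the threshold replicates about half the time, and a smaller original p-value only raises that probability moderately.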
![Page 192: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/192.jpg)
192
• Significance testing creates boundaries and gaps where none exist.
• Significance testing means that we find it hard to build upon knowledge
– We don’t get an accumulation of knowledge
![Page 193: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/193.jpg)
193
• Yates (1951): "the emphasis given to formal tests of significance ... has resulted in ... an undue concentration of effort by mathematical statisticians on investigations of tests of significance applicable to problems which are of little or no practical importance ... and ... it has caused scientific research workers to pay undue attention to the results of the tests of significance ... and too little to the estimates of the magnitude of the effects they are investigating"
![Page 194: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/194.jpg)
194
Testing the Slope
• Same idea as with the mean
– Estimate 95% CI of slope
– Estimate significance of difference from a value (usually 0)
• Need to know the SD of the slope
– Similar to SD of the mean
![Page 195: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/195.jpg)
195
$$s_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - k - 1}} = \sqrt{\frac{SS_{res}}{N - k - 1}}$$

$$s_{y.x} = \sqrt{\frac{5921}{8}} = 27.2$$
![Page 196: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/196.jpg)
196
• Similar to equation for SD of mean
• Then we need standard error
– Similar (ish)
• When we have standard error
– Can go on to 95% CI
– Significance of difference
![Page 197: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/197.jpg)
197
$$se(b_{y.x}) = \frac{s_{y.x}}{\sqrt{\sum (x - \bar{x})^2}}$$

$$se(b_{y.x}) = \frac{27.2}{\sqrt{26.9}} = 5.24$$
![Page 198: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/198.jpg)
198
• Confidence Limits
• 95% CI
– t dist with N – k – 1 df is 2.31
– CI = 5.24 × 2.31 = 12.1
• 95% confidence limits

$$14.8 \pm 12.1: \qquad 14.8 - 12.1 = 2.7, \qquad 14.8 + 12.1 = 26.9$$
![Page 199: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/199.jpg)
199
• Significance of difference from zero
– i.e. probability of getting result if β = 0
• Not probability that β = 0

$$t = \frac{b}{se(b)} = \frac{14.79}{5.24} = 2.82, \qquad df = N - k - 1 = 8, \qquad p = 0.02$$
• This probability is (of course) the same as the value for the prediction
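
The standard error, confidence limits and t statistic for the slope can be reproduced in a few lines. This is a sketch under the assumption that the ten (bedrooms, price) pairs in the residuals table later in these slides are the house-price data; the critical t value of 2.31 is the one quoted on the slide rather than computed.

```python
import math

# Bedrooms (x) and price in £000s (y); assumed to be the ten cases
# shown in the residuals table of this deck.
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

n, k = len(beds), 1
mean_x = sum(beds) / n
mean_y = sum(price) / n

sxx = sum((x - mean_x) ** 2 for x in beds)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(beds, price))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(beds, price))

s_yx = math.sqrt(ss_res / (n - k - 1))    # standard error of the estimate
se_b = s_yx / math.sqrt(sxx)              # standard error of the slope

t_crit = 2.31                             # critical t for 8 df, from the slide
lower = slope - t_crit * se_b
upper = slope + t_crit * se_b
t_stat = slope / se_b                     # test of slope against zero

print(f"s_yx = {s_yx:.1f}, se(b) = {se_b:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
print(f"t = {t_stat:.2f}, t squared = {t_stat ** 2:.2f}")
```

With a single predictor the squared t statistic equals the F ratio for the whole model, which is why the two tests give the same probability.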
![Page 200: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/200.jpg)
200
Testing the Standardised Slope (Correlation)
• Correlation is bounded between –1 and +1
– Does not have symmetrical distribution, except around 0
• Need to transform it
– Fisher z’ transformation – approximately normal

$$z' = 0.5[\ln(1 + r) - \ln(1 - r)]$$

$$SE_{z'} = \frac{1}{\sqrt{n - 3}}$$
![Page 201: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/201.jpg)
201
$$z' = 0.5[\ln(1 + 0.706) - \ln(1 - 0.706)] = 0.879$$

$$SE_{z'} = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{10 - 3}} = 0.38$$

• 95% CIs
– 0.879 – 1.96 × 0.38 = 0.13
– 0.879 + 1.96 × 0.38 = 1.62
![Page 202: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/202.jpg)
202
• Transform back to correlation

$$r = \frac{e^{2z'} - 1}{e^{2z'} + 1}$$

• 95% CIs = 0.13 to 0.92
• Very wide
– Small sample size
– Maybe that’s why CIs are not reported?
![Page 203: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/203.jpg)
203
Using Excel
• Functions in Excel
– FISHER() – to carry out Fisher transformation
– FISHERINV() – to transform back to correlation
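
Outside Excel the same two steps are available as the inverse hyperbolic tangent and the hyperbolic tangent. A minimal sketch for the house-price correlation (r = 0.706, n = 10) using Python's `math` module; the 1.96 multiplier is the usual normal value for a 95% interval.

```python
import math

r, n = 0.706, 10

# Fisher transformation: atanh(r) equals 0.5 * (ln(1 + r) - ln(1 - r))
z = math.atanh(r)
se_z = 1 / math.sqrt(n - 3)

# 95% limits on the z scale, then back-transformed to the correlation scale
z_lower = z - 1.96 * se_z
z_upper = z + 1.96 * se_z
r_lower = math.tanh(z_lower)
r_upper = math.tanh(z_upper)

print(f"z = {z:.3f}, SE = {se_z:.2f}")
print(f"95% CI for r: {r_lower:.2f} to {r_upper:.2f}")
```

Because the standard error is not rounded to 0.38 before use, the lower limit can differ from the slide value in the second decimal place.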
![Page 204: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/204.jpg)
204
The Others
• Same ideas for calculation of CIs and SEs for
– Predicted score
– Gives expected range of values given X
• Same for intercept
– But we have probably had enough
![Page 205: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/205.jpg)
205
Lesson 5: Introducing Multiple Regression
![Page 206: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/206.jpg)
206
Residuals
• We said

$$Y = b_0 + b_1 x_1$$

• We could have said

$$Y_i = b_0 + b_1 x_{i1} + e_i$$

• We ignored the i on the Y
• And we ignored the e_i
– It’s called error, after all
• But it isn’t just error
– Trying to tell us something
![Page 207: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/207.jpg)
207
What Error Tells Us
• Error tells us that a case has a different score for Y than we predict
– There is something about that case
• Called the residual
– What is left over, after the model
• Contains information
– Something is making the residual ≠ 0
– But what?
![Page 208: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/208.jpg)
208
[Figure: Price against Number of Bedrooms, showing Actual Price and Predicted Price; two cases are annotated "swimming pool" and "Unpleasant neighbours"]
![Page 209: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/209.jpg)
209
• The residual (+ the mean) is the value of Y if all cases were equal on X
• It is the value of Y, controlling for X
• Other words:
– Holding constant
– Partialling
– Residualising
– Conditioned on
![Page 210: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/210.jpg)
210
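| Beds | £ (000s) | Pred | Res | Adj. Value |
|------|----------|------|-----|------------|
| 1 | 77 | 61 | -16 | 105 |
| 2 | 74 | 76 | 2 | 90 |
| 1 | 88 | 61 | -27 | 62 |
| 3 | 62 | 90 | 28 | 117 |
| 5 | 90 | 120 | 30 | 119 |
| 5 | 136 | 120 | -16 | 73 |
| 2 | 35 | 76 | 41 | 129 |
| 5 | 134 | 120 | -14 | 75 |
| 4 | 138 | 105 | -33 | 56 |
| 1 | 55 | 61 | 6 | 95 |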
![Page 211: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/211.jpg)
211
• Sometimes adjustment is enough on its own
– Measure performance against criteria
• Teenage pregnancy rate
– Measure pregnancy and abortion rate in areas
– Control for socio-economic deprivation, and anything else important
– See which areas have lower teenage pregnancy and abortion rate, given same level of deprivation
• Value added education tables
– Measure school performance
– Control for initial intake
![Page 212: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/212.jpg)
212
Control?
• In experimental research
– Use experimental control
– e.g. same conditions, materials, time of day, accurate measures, random assignment to conditions
• In non-experimental research
– Can’t use experimental control
– Use statistical control instead
![Page 213: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/213.jpg)
213
Analysis of Residuals
• What predicts differences in crime rate
– After controlling for socio-economic deprivation
– Number of police?
– Crime prevention schemes?
– Rural/Urban proportions?
– Something else
• This is what regression is about
![Page 214: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/214.jpg)
214
• Exam performance
– Consider number of books a student read (books)
– Number of lectures (max 20) a student attended (attend)
• Books and attend as IV, grade as DV
![Page 215: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/215.jpg)
215
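First 10 cases

| Books | Attend | Grade |
|-------|--------|-------|
| 0 | 9 | 45 |
| 1 | 15 | 57 |
| 0 | 10 | 45 |
| 2 | 16 | 51 |
| 4 | 10 | 65 |
| 4 | 20 | 88 |
| 1 | 11 | 44 |
| 4 | 20 | 87 |
| 3 | 15 | 89 |
| 0 | 15 | 59 |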
![Page 216: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/216.jpg)
216
• Use books as IV
– R = 0.492, F = 12.1, df = 1, 38, p = 0.001
– b0 = 52.1, b1 = 5.7
– (Intercept makes sense)
• Use attend as IV
– R = 0.482, F = 11.5, df = 1, 38, p = 0.002
– b0 = 37.0, b1 = 1.9
– (Intercept makes less sense)
![Page 217: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/217.jpg)
217
[Figure: scatterplot of Grade (out of 100) against Books]
![Page 218: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/218.jpg)
218
[Figure: scatterplot of Grade against Attend]
![Page 219: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/219.jpg)
219
Problem
• Use R² to give proportion of shared variance
– Books = 24%
– Attend = 23%
• So we have explained 24% + 23% = 47% of the variance
– NO!!!!!
![Page 220: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/220.jpg)
220
• Correlation of books and attend is (unsurprisingly) not zero
– Some of the variance that books shares with grade is also shared by attend
• Look at the correlation matrix

|        | BOOKS | ATTEND | GRADE |
|--------|-------|--------|-------|
| BOOKS  | 1     |        |       |
| ATTEND | 0.44  | 1      |       |
| GRADE  | 0.49  | 0.48   | 1     |
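
The size of the overlap can be estimated directly from the three correlations. The sketch below uses the standard formula for the squared multiple correlation with two predictors; because it works from the rounded correlations in the matrix, the result is approximate, and the variable names are illustrative.

```python
# Pairwise correlations from the matrix above (rounded to 2 d.p.)
r_books_grade = 0.49
r_attend_grade = 0.48
r_books_attend = 0.44

# Naive total: simply adding the two squared correlations
naive_sum = r_books_grade ** 2 + r_attend_grade ** 2

# Squared multiple correlation for two correlated predictors
r_squared = (
    r_books_grade ** 2
    + r_attend_grade ** 2
    - 2 * r_books_grade * r_attend_grade * r_books_attend
) / (1 - r_books_attend ** 2)

print(f"naive sum of R squared = {naive_sum:.2f}")
print(f"multiple R squared     = {r_squared:.2f}")
```

The two predictors together explain noticeably less than the naive sum, because part of what each shares with grade is the same variance.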
![Page 221: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/221.jpg)
221
• I have access to 2 cars
• My wife has access to 2 cars
– We have access to four cars?
– No. We need to know how many of my 2 cars are the same cars as her 2 cars
• Similarly with regression
– But we can do this with the residuals
– Residuals are what is left after (say) books
– See if residual variance is explained by attend
– Can use this new residual variance to calculate SSres, SStotal and SSreg
![Page 222: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/222.jpg)
222
• Well. Almost.
– This would give us correct values for SS
– Would not be correct for slopes, etc
• Assumes that the variables have a causal priority
– Why should attend have to take what is left from books?
– Why should books have to take what is left by attend?
• Use OLS again
![Page 223: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/223.jpg)
223
• Simultaneously estimate 2 parameters
– b1 and b2
– Y = b0 + b1x1 + b2x2
– x1 and x2 are IVs
• Not trying to fit a line any more
– Trying to fit a plane
• Can solve iteratively
– Closed form equations better
– But they are unwieldy
![Page 224: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/224.jpg)
224
[Figure: 3D scatterplot (2 points only), with axes x1, x2 and y]
![Page 225: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/225.jpg)
225
[Figure: regression plane in the space of x1, x2 and y, labelled with b0, b1 and b2]
![Page 226: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/226.jpg)
226
(Really) Ridiculous Equations
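$$b_1 = \frac{\sum (y - \bar{y})(x_1 - \bar{x}_1) \sum (x_2 - \bar{x}_2)^2 - \sum (y - \bar{y})(x_2 - \bar{x}_2) \sum (x_1 - \bar{x}_1)(x_2 - \bar{x}_2)}{\sum (x_1 - \bar{x}_1)^2 \sum (x_2 - \bar{x}_2)^2 - \left[\sum (x_1 - \bar{x}_1)(x_2 - \bar{x}_2)\right]^2}$$

$$b_2 = \frac{\sum (y - \bar{y})(x_2 - \bar{x}_2) \sum (x_1 - \bar{x}_1)^2 - \sum (y - \bar{y})(x_1 - \bar{x}_1) \sum (x_2 - \bar{x}_2)(x_1 - \bar{x}_1)}{\sum (x_2 - \bar{x}_2)^2 \sum (x_1 - \bar{x}_1)^2 - \left[\sum (x_2 - \bar{x}_2)(x_1 - \bar{x}_1)\right]^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$$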
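
Unwieldy as they are, the closed-form equations translate directly into code. The sketch below is illustrative only: the function name `fit_two_predictors` and the small data set are made up, with y constructed as exactly 1 + 2·x1 + 3·x2 so that the recovered coefficients are easy to check.

```python
def fit_two_predictors(x1, x2, y):
    """Closed-form least-squares estimates (b0, b1, b2) for two predictors."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Deviations from the means
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]

    # Sums of squares and cross-products
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))

    denom = s11 * s22 - s12 ** 2        # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up data lying exactly on the plane y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The shared denominator is the quantity that goes to zero when the two predictors are perfectly correlated, which is the same problem the determinant detects in the matrix approach.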
![Page 227: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/227.jpg)
227
• The good news
– There is an easier way
• The bad news
– It involves matrix algebra
• The good news
– We don’t really need to know how to do it
• The bad news
– We need to know it exists
![Page 228: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/228.jpg)
228
A Quick Guide to Matrix Algebra
(I will never make you do it again)
![Page 229: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/229.jpg)
229
Very Quick Guide to Matrix Algebra
• Why?
– Matrices make life much easier in multivariate statistics
– Some things simply cannot be done without them
– Some things are much easier with them
• If you can manipulate matrices
– you can specify calculations v. easily
– e.g. A’A = sum of squares of a column vector A
  • Doesn’t matter how long the column
![Page 230: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/230.jpg)
230
• A scalar is a number
  A scalar: 4
• A vector is a row or column of numbers
  A row vector:

$$\begin{bmatrix} 2 & 4 & 8 & 7 \end{bmatrix}$$

  A column vector:

$$\begin{bmatrix} 5 \\ 11 \end{bmatrix}$$
![Page 231: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/231.jpg)
231
• A vector is described as rows x columns

$$\begin{bmatrix} 2 & 4 & 8 & 7 \end{bmatrix}$$

– Is a 1 × 4 vector

$$\begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

– Is a 2 × 1 vector
– A number (scalar) is a 1 × 1 vector
![Page 232: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/232.jpg)
232
• A matrix is a rectangle, described as rows x columns

$$\begin{bmatrix} 2 & 6 & 5 & 7 & 8 \\ 4 & 5 & 7 & 5 & 3 \\ 1 & 5 & 2 & 7 & 8 \end{bmatrix}$$

• Is a 3 x 5 matrix
• Matrices are referred to with bold capitals
– A is a matrix
![Page 233: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/233.jpg)
233
• Correlation matrices and covariance matrices are special
– They are square and symmetrical
– Correlation matrix of books, attend and grade

$$\begin{bmatrix} 1.00 & 0.44 & 0.49 \\ 0.44 & 1.00 & 0.48 \\ 0.49 & 0.48 & 1.00 \end{bmatrix}$$
![Page 234: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/234.jpg)
234
• Another special matrix is the identity matrix I
– A square matrix, with 1 in the diagonal and 0 in the off-diagonal

$$I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

– Note that this is a correlation matrix, with correlations all = 0
![Page 235: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/235.jpg)
235
Matrix Operations
• Transposition
– A matrix is transposed by putting it on its side
– Transpose of A is A’

$$A = \begin{bmatrix} 7 & 5 & 6 \end{bmatrix}, \qquad A' = \begin{bmatrix} 7 \\ 5 \\ 6 \end{bmatrix}$$
![Page 236: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/236.jpg)
236
• Matrix multiplication
– A matrix can be multiplied by a scalar, a vector or a matrix
– Not commutative
– AB ≠ BA
– To multiply AB
  • Number of columns in A must equal number of rows in B
![Page 237: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/237.jpg)
237
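• Matrix by vector

$$\begin{bmatrix} 2 & 3 & 5 \\ 7 & 11 & 13 \\ 17 & 19 & 23 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 33 \\ 99 \\ 183 \end{bmatrix}$$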
![Page 238: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/238.jpg)
238
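• Matrix by matrix

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}$$

$$\begin{bmatrix} 2 & 3 \\ 5 & 7 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix} = \begin{bmatrix} 4 + 12 & 6 + 15 \\ 10 + 28 & 15 + 35 \end{bmatrix} = \begin{bmatrix} 16 & 21 \\ 38 & 50 \end{bmatrix}$$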
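
The row-by-column rule is short enough to write out by hand. The sketch below represents matrices as plain lists of rows; the helper names `matmul` and `transpose` are illustrative, and in practice a library such as NumPy would normally be used instead.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


A = [[2, 3], [5, 7]]
B = [[2, 3], [4, 5]]
print(matmul(A, B))        # matrix by matrix
print(matmul(B, A))        # a different result: multiplication is not commutative

M = [[2, 3, 5], [7, 11, 13], [17, 19, 23]]
v = [[2], [3], [4]]        # a 3 x 1 column vector
print(matmul(M, v))        # matrix by vector

col = [[5], [11]]          # a 2 x 1 column vector
print(matmul(transpose(col), col))   # 1 x 1 result: the sum of squares
```

Multiplying a transposed column vector by itself gives a single number, the sum of squares, however long the column is.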
![Page 239: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/239.jpg)
239
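• Multiplying by the identity matrix
– Has no effect
– Like multiplying by 1

$$AI = A: \qquad \begin{bmatrix} 2 & 3 \\ 5 & 7 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 7 \end{bmatrix}$$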
![Page 240: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/240.jpg)
240
• The inverse of J is: 1/J
• J x 1/J = 1
• Same with matrices
– Matrices have an inverse
– Inverse of A is A⁻¹
– AA⁻¹ = I
• Inverting matrices is dull
– We will do it once
– But first, we must calculate the determinant
![Page 241: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/241.jpg)
241
• The determinant of A is |A|
• Determinants are important in statistics
– (more so than the other matrix algebra)
• We will do a 2x2
– Much more difficult for larger matrices
![Page 242: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/242.jpg)
242
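$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad |A| = ad - cb$$

$$A = \begin{bmatrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{bmatrix}, \qquad |A| = 1 \times 1 - 0.3 \times 0.3 = 0.91$$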
![Page 243: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/243.jpg)
243
• Determinants are important because
– Needs to be above zero for regression to work
– Zero or negative determinant of a correlation/covariance matrix means something wrong with the data
  • Linear redundancy
• Described as:
– Not positive definite
– Singular (if determinant is zero)
• In different error messages
![Page 244: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/244.jpg)
244
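• Next, the adjoint

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \text{adj } A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

• Now

$$A^{-1} = \frac{1}{|A|} \, \text{adj } A$$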
![Page 245: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/245.jpg)
245
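• Find $A^{-1}$
$$A = \begin{pmatrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{pmatrix}, \qquad |A| = 0.91$$
$$A^{-1} = \frac{1}{0.91} \begin{pmatrix} 1.0 & -0.3 \\ -0.3 & 1.0 \end{pmatrix} = \begin{pmatrix} 1.10 & -0.33 \\ -0.33 & 1.10 \end{pmatrix}$$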
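The 2x2 inversion can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `det2`, `adj2` and `inv2` are made up for illustration, and it only handles 2x2 matrices.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - cb."""
    (a, b), (c, d) = m
    return a * d - c * b

def adj2(m):
    """Adjoint of a 2x2 matrix: swap the diagonal, negate the off-diagonal."""
    (a, b), (c, d) = m
    return [[d, -b], [-c, a]]

def inv2(m):
    """Inverse = adjoint divided by the determinant."""
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[x / det for x in row] for row in adj2(m)]

A = [[1.0, 0.3], [0.3, 1.0]]
A_inv = inv2(A)

# Multiplying A by its inverse should give the identity matrix I.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

print(round(det2(A), 2))
print([[round(x, 2) for x in row] for row in A_inv])
print([[round(x, 2) for x in row] for row in product])
```

Rounded to two decimals, the determinant and inverse should agree with the worked values (0.91, 1.10 and -0.33).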
![Page 246: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/246.jpg)
246
Matrix Algebra with Correlation Matrices
![Page 247: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/247.jpg)
247
Determinants
• Determinant of a correlation matrix
– The volume of ‘space’ taken up by the (hyper) sphere that contains all of the points
$$A = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}, \qquad |A| = 1.0$$
![Page 248: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/248.jpg)
248
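$$A = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}, \qquad |A| = 1.0$$
[Figure: scatterplot, points (X) spread out across the space]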
![Page 249: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/249.jpg)
249
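$$A = \begin{pmatrix} 1.0 & 1.0 \\ 1.0 & 1.0 \end{pmatrix}, \qquad |A| = 0.0$$
[Figure: scatterplot, points (X) all on a single line]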
![Page 250: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/250.jpg)
250
Negative Determinant
• Points take up less than no space
– Correlation matrix cannot exist
– Non-positive definite matrix
![Page 251: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/251.jpg)
251
Sometimes Obvious
$$A = \begin{pmatrix} 1.0 & 1.2 \\ 1.2 & 1.0 \end{pmatrix}, \qquad |A| = -0.44$$
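The three 2x2 cases (determinant of 1, 0 and below 0) can be reproduced with a few lines of Python. This is only a sketch: `det` and `describe` are hypothetical helper names, and cofactor expansion is used because it is easy to read, not because it is efficient. Note that for matrices larger than 2x2 a positive determinant on its own does not prove that the matrix is positive definite, so the sketch reports only what the determinant shows.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def describe(m, tol=1e-12):
    """Label a correlation matrix by the sign of its determinant."""
    d = det(m)
    if abs(d) < tol:
        return "singular"
    return "determinant > 0" if d > 0 else "not positive definite"

uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
redundant = [[1.0, 1.0], [1.0, 1.0]]
impossible = [[1.0, 1.2], [1.2, 1.0]]

for name, m in [("uncorrelated", uncorrelated),
                ("redundant", redundant),
                ("impossible", impossible)]:
    print(name, round(det(m), 2), describe(m))
```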
![Page 252: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/252.jpg)
252
Sometimes Obvious (If You Think)
$$A = \begin{pmatrix} 1 & -0.9 & -0.9 \\ -0.9 & 1 & -0.9 \\ -0.9 & -0.9 & 1 \end{pmatrix}, \qquad |A| = -2.88$$
![Page 253: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/253.jpg)
253
Sometimes No Idea
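$$A = \begin{pmatrix} 1.00 & -0.76 & -0.40 \\ -0.76 & 1 & -0.30 \\ -0.40 & -0.30 & 1 \end{pmatrix}, \qquad |A| = -0.01$$
$$A = \begin{pmatrix} 1.00 & -0.75 & -0.40 \\ -0.75 & 1 & -0.30 \\ -0.40 & -0.30 & 1 \end{pmatrix}, \qquad |A| = 0.0075$$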
![Page 254: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/254.jpg)
254
Multiple R for Each Variable
• Diagonal of inverse of correlation matrix
– Used to calculate multiple R
– Call elements $a_{ij}$
$$R_{i.123 \ldots k} = \sqrt{1 - \frac{1}{a_{ii}}}$$
![Page 255: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/255.jpg)
255
Regression Weights
• Where i is DV
• j is IV
$$b_{i.j} = \frac{-a_{ij}}{a_{ii}}$$
![Page 256: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/256.jpg)
256
Back to the Good News
• We can calculate the standardised parameters as $B = R_{xx}^{-1} \times R_{xy}$
• Where
– B is the vector of regression weights
– $R_{xx}^{-1}$ is the inverse of the correlation matrix of the independent (x) variables
– $R_{xy}$ is the vector of correlations of the x variables with the y variable
– Now do exercise 3.2
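As a rough illustration, the two routes to the standardised weights can be compared in Python: solving with the predictor correlations only (B = Rxx^-1 x Rxy), and reading the weights and multiple R off the inverse of the full correlation matrix (b = -a_ij / a_ii and R = sqrt(1 - 1/a_ii), with i the DV). The correlations below are hypothetical, chosen only for the example, and `solve` and `inverse` are illustrative helpers rather than anything from the course materials.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def inverse(a):
    """Invert a matrix by solving for each column of the identity."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# Hypothetical correlations (assumption, not course data).
r_x1x2, r_x1y, r_x2y = 0.3, 0.5, 0.4

# Route 1: B = Rxx^-1 x Rxy, computed by solving Rxx B = Rxy.
Rxx = [[1.0, r_x1x2], [r_x1x2, 1.0]]
Rxy = [r_x1y, r_x2y]
B = solve(Rxx, Rxy)

# Route 2: inverse of the full matrix, variables ordered x1, x2, y.
R = [[1.0, r_x1x2, r_x1y],
     [r_x1x2, 1.0, r_x2y],
     [r_x1y, r_x2y, 1.0]]
a = inverse(R)
y = 2  # index of the DV
weights = [-a[y][j] / a[y][y] for j in (0, 1)]
multiple_R = math.sqrt(1 - 1 / a[y][y])

print([round(b, 3) for b in B], [round(w, 3) for w in weights], round(multiple_R, 3))
```

Both routes should give the same weights; solving the system directly avoids forming the inverse explicitly, which is the usual choice in software.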
![Page 257: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/257.jpg)
257
One More Thing
• The whole regression equation can be described with matrices
– very simply
$$Y = XB + E$$
![Page 258: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/258.jpg)
258
• Where
– Y = vector of DV
– X = matrix of IVs
– B = vector of coefficients
• Go all the way back to our example
![Page 259: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/259.jpg)
259
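$$
\begin{pmatrix} 45 \\ 57 \\ 45 \\ 51 \\ 65 \\ 88 \\ 44 \\ 87 \\ 89 \\ 59 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 9 \\ 1 & 1 & 5 \\ 1 & 0 & 10 \\ 1 & 2 & 16 \\ 1 & 4 & 10 \\ 1 & 4 & 20 \\ 1 & 1 & 11 \\ 1 & 4 & 20 \\ 1 & 3 & 15 \\ 1 & 0 & 15 \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{pmatrix}
$$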
![Page 260: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/260.jpg)
260
[Matrix equation for the 10 cases, repeated from slide 259]
First column of X (all 1s): the constant – literally a constant. Could be any number, but it is most convenient to make it 1. Used to ‘capture’ the intercept.
![Page 261: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/261.jpg)
261
[Matrix equation for the 10 cases, repeated from slide 259]
Remaining columns of X: the matrix of values for IVs (books and attend).
![Page 262: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/262.jpg)
262
[Matrix equation for the 10 cases, repeated from slide 259]
Vector of b0, b1, b2: the parameter estimates. We are trying to find the best values of these.
![Page 263: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/263.jpg)
263
[Matrix equation for the 10 cases, repeated from slide 259]
Vector of e1 to e10: error. We are trying to minimise this.
![Page 264: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/264.jpg)
264
[Matrix equation for the 10 cases, repeated from slide 259]
Vector of grades (45, 57, … 59): the DV, grade.
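To make the pieces concrete, here is a small Python sketch that builds X (constant, books, attend) and Y (grade) from the ten cases in the example and finds the B that minimises the sum of squared errors. It uses the standard least-squares normal equations, (X'X) B = X'Y, which is an assumption about method and not something taken from the slides; `solve` is an illustrative helper.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

books = [0, 1, 0, 2, 4, 4, 1, 4, 3, 0]
attend = [9, 5, 10, 16, 10, 20, 11, 20, 15, 15]
grade = [45, 57, 45, 51, 65, 88, 44, 87, 89, 59]

# X: a column of 1s (the constant) followed by the two IVs.
X = [[1.0, float(b), float(a)] for b, a in zip(books, attend)]
Y = [float(g) for g in grade]

# Normal equations: (X'X) B = X'Y.
k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
B = solve(XtX, XtY)

# E: what is left over after prediction.
predicted = [sum(b * x for b, x in zip(B, row)) for row in X]
E = [y - p for y, p in zip(Y, predicted)]

print("b0, b1, b2 =", [round(b, 2) for b in B])
print("sum of squared errors =", round(sum(e * e for e in E), 2))
```

Because of the constant column, the errors from a least-squares fit sum to zero, which is a quick sanity check on the result.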
![Page 265: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/265.jpg)
265
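• Y = XB + E
• Simple way of representing as many IVs as you like
$$Y = b_0x_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_4x_4 + b_5x_5 + e$$
$$
\begin{pmatrix} x_{01} & x_{11} & x_{21} & x_{31} & x_{41} & x_{51} \\ x_{02} & x_{12} & x_{22} & x_{32} & x_{42} & x_{52} \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}
$$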
![Page 266: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/266.jpg)
266
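$$
\begin{pmatrix} x_{01} & x_{11} & x_{21} & x_{31} & x_{41} & x_{51} \\ x_{02} & x_{12} & x_{22} & x_{32} & x_{42} & x_{52} \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}
$$
$$= b_0x_0 + b_1x_1 + \ldots + b_kx_k + e$$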
![Page 267: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/267.jpg)
267
Generalises to Multivariate Case
• Y = XB + E
• Y, B and E
– Matrices, not vectors
• Goes beyond this course
– (Do Jacques Tacq’s course for more)
– (Or read his book)
![Page 268: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/268.jpg)
268
![Page 269: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/269.jpg)
269
![Page 270: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/270.jpg)
270
![Page 271: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/271.jpg)
271
Lesson 6: More on Multiple Regression
![Page 272: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/272.jpg)
272
Parameter Estimates
• Parameter estimates (b1, b2 … bk) were standardised
– Because we analysed a correlation matrix
• Represent the correlation of each IV with the DV
– When all other IVs are held constant
![Page 273: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/273.jpg)
273
• Can also be unstandardised
• Unstandardised represent the unit change in the DV associated with a 1 unit change in the IV
– When all the other variables are held constant
• Parameters have standard errors associated with them
– As with one IV
– Hence t-test, and associated probability can be calculated
• Trickier than with one IV
![Page 274: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/274.jpg)
274
Standard Error of Regression Coefficient
• Standardised is easier
$$SE_{b_i} = \sqrt{\frac{1 - R_Y^2}{n - k - 1}} \sqrt{\frac{1}{1 - R_i^2}}$$
– $R_i^2$ is the value of $R^2$ when all other predictors are used as predictors of that variable
• Note that if $R_i^2 = 0$, the equation is the same as the previous (one IV) equation
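A short Python sketch of the standard error of a standardised coefficient, SE = sqrt((1 - R2_Y) / (n - k - 1)) x sqrt(1 / (1 - R2_i)). The function name and the numbers plugged in are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math

def se_standardised(r2_y, r2_i, n, k):
    """Standard error of a standardised regression coefficient.

    r2_y: R squared for the whole model (DV predicted from all k IVs)
    r2_i: R squared when IV i is predicted from the other IVs
    n:    number of cases, k: number of IVs
    """
    return math.sqrt((1 - r2_y) / (n - k - 1)) * math.sqrt(1 / (1 - r2_i))

# Hypothetical values: R2_Y = 0.64, n = 40, k = 3.
uncorrelated_iv = se_standardised(0.64, 0.0, 40, 3)   # IV unrelated to the other IVs
correlated_iv = se_standardised(0.64, 0.19, 40, 3)    # IV shares variance with the others

print(round(uncorrelated_iv, 4), round(correlated_iv, 4))
```

The second value is larger: the more an IV can be predicted from the other IVs, the less precisely its own coefficient is estimated.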
![Page 275: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/275.jpg)
275
Multiple R
• The degree of prediction
– R (or Multiple R)
– No longer equal to b
• R2 might be equal to the sum of squares of B
– Only if all x’s are uncorrelated
![Page 276: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/276.jpg)
276
In Terms of Variance
• Can also think of this in terms of variance explained.
– Each IV explains some variance in the DV
– The IVs share some of their variance
• Can’t share the same variance twice
![Page 277: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/277.jpg)
277
[Figure: the total variance of Y = 1. Variance in Y accounted for by x1: $r_{x_1y}^2 = 0.36$. Variance in Y accounted for by x2: $r_{x_2y}^2 = 0.36$.]
![Page 278: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/278.jpg)
278
• In this model
– $R^2 = r_{yx_1}^2 + r_{yx_2}^2$
– $R^2 = 0.36 + 0.36 = 0.72$
– $R = \sqrt{0.72} = 0.85$
• But
– If x1 and x2 are correlated
– No longer the case
![Page 279: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/279.jpg)
279
[Figure: the total variance of Y = 1. Variance in Y accounted for by x1: $r_{x_1y}^2 = 0.36$. Variance in Y accounted for by x2: $r_{x_2y}^2 = 0.36$. The two areas overlap: variance shared between x1 and x2 (not equal to $r_{x_1x_2}$).]
![Page 280: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/280.jpg)
280
• So
– We can no longer sum the r2
– Need to sum them, and subtract the shared variance – i.e. the correlation
• But
– It’s not the correlation between them
– It’s the correlation between them as a proportion of the variance of Y
• Two different ways
![Page 281: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/281.jpg)
281
• Based on estimates
$$R^2 = b_1 r_{yx_1} + b_2 r_{yx_2}$$
• If $r_{x_1x_2} = 0$
– $r_{x_1y} = b_{x_1}$
– Equivalent to $r_{yx_1}^2 + r_{yx_2}^2$
![Page 282: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/282.jpg)
282
• Based on correlations
$$R^2 = \frac{r_{yx_1}^2 + r_{yx_2}^2 - 2 r_{yx_1} r_{yx_2} r_{x_1x_2}}{1 - r_{x_1x_2}^2}$$
• If $r_{x_1x_2} = 0$
– Equivalent to $r_{yx_1}^2 + r_{yx_2}^2$
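The two routes to R2 with two IVs (from the estimates, and from the correlations alone) can be compared in a few lines of Python. This is a sketch under stated assumptions: the weights use the standard two-predictor formula, which is equivalent to Rxx^-1 x Rxy for a 2x2 Rxx, and the correlation of 0.5 between x1 and x2 is hypothetical. The value 0.6 corresponds to the r2 of 0.36 used in the variance diagrams.

```python
def weights(r_yx1, r_yx2, r_x1x2):
    """Standardised weights for two IVs."""
    denom = 1 - r_x1x2 ** 2
    b1 = (r_yx1 - r_yx2 * r_x1x2) / denom
    b2 = (r_yx2 - r_yx1 * r_x1x2) / denom
    return b1, b2

def r2_from_estimates(r_yx1, r_yx2, r_x1x2):
    """R2 = b1 * r_yx1 + b2 * r_yx2."""
    b1, b2 = weights(r_yx1, r_yx2, r_x1x2)
    return b1 * r_yx1 + b2 * r_yx2

def r2_from_correlations(r_yx1, r_yx2, r_x1x2):
    """R2 from the three correlations only."""
    numerator = r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    return numerator / (1 - r_x1x2 ** 2)

# Uncorrelated IVs: R2 is just the sum of the squared correlations.
print(round(r2_from_estimates(0.6, 0.6, 0.0), 2),
      round(r2_from_correlations(0.6, 0.6, 0.0), 2))

# Correlated IVs (hypothetical r = 0.5): shared variance is not counted twice.
print(round(r2_from_estimates(0.6, 0.6, 0.5), 2),
      round(r2_from_correlations(0.6, 0.6, 0.5), 2))
```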
![Page 283: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/283.jpg)
283
• Can also be calculated using methods we have seen
– Based on PRE
– Based on correlation with prediction
• Same procedure with >2 IVs
![Page 284: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/284.jpg)
284
Adjusted R2
• R2 is an overestimate of population value of R2
– Any x will not correlate 0 with Y
– Any variation away from 0 increases R
– Variation from 0 more pronounced with lower N
• Need to correct R2
– Adjusted R2
![Page 285: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/285.jpg)
285
• 1 – R2
– Proportion of unexplained variance
– We multiply this by an adjustment
• More variables – greater adjustment
• More people – less adjustment
• Calculation of Adj. R2
$$\text{Adj. } R^2 = 1 - (1 - R^2) \frac{N - 1}{N - k - 1}$$
![Page 286: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/286.jpg)
286
Shrunken R2
• Some authors treat shrunken and adjusted R2 as the same thing
– Others don’t
![Page 287: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/287.jpg)
287
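$$\frac{N - 1}{N - k - 1}$$
$$N = 20,\ k = 3: \quad \frac{20 - 1}{20 - 3 - 1} = \frac{19}{16} = 1.1875$$
$$N = 10,\ k = 8: \quad \frac{10 - 1}{10 - 8 - 1} = \frac{9}{1} = 9$$
$$N = 10,\ k = 3: \quad \frac{10 - 1}{10 - 3 - 1} = \frac{9}{6} = 1.5$$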
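The adjustment is simple enough to write as two small Python functions. This is an illustrative sketch; the function names are made up, and the R2 of 0.5 in the final lines is a hypothetical value used only to show the effect of N and k.

```python
def adjustment(n, k):
    """The multiplier (N - 1) / (N - k - 1) applied to the unexplained variance."""
    return (n - 1) / (n - k - 1)

def adjusted_r2(r2, n, k):
    """Adj. R2 = 1 - (1 - R2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * adjustment(n, k)

# The three worked cases: (N, k) = (20, 3), (10, 8), (10, 3).
for n, k in [(20, 3), (10, 8), (10, 3)]:
    print(n, k, adjustment(n, k))

# Same hypothetical R2 of 0.5, very different adjusted values.
print(adjusted_r2(0.5, 20, 3))
print(adjusted_r2(0.5, 10, 8))
```

With few people and many variables the adjusted value can fall below zero, which is usually read as no evidence of any predictive value in the population.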
![Page 288: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/288.jpg)
288
Extra Bits
• Some stranger things that can happen
– Counter-intuitive
![Page 289: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/289.jpg)
289
Suppressor variables
• Can be hard to understand
– Very counter-intuitive
• Definition
– An independent variable which increases the size of the parameters associated with other independent variables above the size of their correlations
![Page 290: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/290.jpg)
290
• An example (based on Horst, 1941)
– Success of trainee pilots
– Mechanical ability (x1), verbal ability (x2), success (y)
• Correlation matrix
         Mech  Verb  Success
Mech     1     0.5   0.3
Verb     0.5   1     0
Success  0.3   0     1
![Page 291: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/291.jpg)
291
– Mechanical ability correlates 0.3 with success
– Verbal ability correlates 0.0 with success
– What will the parameter estimates be?
– (Don’t look ahead until you have had a guess)
![Page 292: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/292.jpg)
292
• Mechanical ability
– b = 0.4
– Larger than r!
• Verbal ability
– b = -0.2
– Smaller than r!!
• So what is happening?
– You need verbal ability to do the test
– Not related to mechanical ability
• Measure of mechanical ability is contaminated by verbal ability
![Page 293: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/293.jpg)
293
• High mech, low verbal
– High mech
• This is positive
– Low verbal
• Negative, because we are talking about standardised scores
• Your mech is really high – you did well on the mechanical test, without being good at the words
• High mech, high verbal
– Well, you had a head start on mech, because of verbal, and need to be brought down a bit
![Page 294: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/294.jpg)
294
Another suppressor?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     0.2
y    0.3   0.2   1
b1 =      b2 =
![Page 295: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/295.jpg)
295
Another suppressor?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     0.2
y    0.3   0.2   1
b1 = 0.27   b2 = 0.07
![Page 296: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/296.jpg)
296
And another?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     -0.2
y    0.3   -0.2  1
b1 =      b2 =
![Page 297: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/297.jpg)
297
And another?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     -0.2
y    0.3   -0.2  1
b1 = 0.53   b2 = -0.47
![Page 298: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/298.jpg)
298
One more?
     x1    x2    y
x1   1     -0.5  0.3
x2   -0.5  1     0.2
y    0.3   0.2   1
b1 =      b2 =
![Page 299: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/299.jpg)
299
One more?
     x1    x2    y
x1   1     -0.5  0.3
x2   -0.5  1     0.2
y    0.3   0.2   1
b1 = 0.53   b2 = 0.47
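The suppressor examples can be worked through with the two-predictor form of B = Rxx^-1 x Rxy. The Python below is a sketch only: `standardised_weights` is an illustrative name, and each tuple holds the three correlations (x1 with y, x2 with y, x1 with x2) from one of the correlation matrices in this section.

```python
def standardised_weights(r_x1y, r_x2y, r_x1x2):
    """Standardised weights for two IVs (2x2 case of Rxx^-1 x Rxy)."""
    denom = 1 - r_x1x2 ** 2
    b1 = (r_x1y - r_x2y * r_x1x2) / denom
    b2 = (r_x2y - r_x1y * r_x1x2) / denom
    return b1, b2

examples = {
    "Horst (mech, verbal)": (0.3, 0.0, 0.5),
    "x2 correlates 0.2 with y": (0.3, 0.2, 0.5),
    "x2 correlates -0.2 with y": (0.3, -0.2, 0.5),
    "x1 and x2 correlate -0.5": (0.3, 0.2, -0.5),
}

results = {name: standardised_weights(*r) for name, r in examples.items()}

for name, (b1, b2) in results.items():
    r_x1y, r_x2y, _ = examples[name]
    # Suppression shows up as a weight larger in size than the raw correlation.
    suppressed = abs(b1) > abs(r_x1y) or abs(b2) > abs(r_x2y)
    print(f"{name}: b1 = {b1:.2f}, b2 = {b2:.2f}, suppression: {suppressed}")
```

Comparing each weight with the matching correlation, as the last column does, is a quick way to spot suppression in a two-predictor model.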
![Page 300: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/300.jpg)
300
• Suppression happens when two opposing forces are happening together
– And have opposite effects
• Don’t throw away your IVs,
– Just because they are uncorrelated with the DV
• Be careful in interpretation of regression estimates
– Really need the correlations too, to interpret what is going on
– Cannot compare between studies with different IVs
![Page 301: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/301.jpg)
301
Standardised Estimates > 1
• Correlations are bounded: -1.00 ≤ r ≤ +1.00
– We think of standardised regression estimates as being similarly bounded
• But they are not
– Can go >1.00, <-1.00
– R cannot, because R2 is a proportion of variance
![Page 302: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/302.jpg)
302
• Three measures of ability
– Mechanical ability, verbal ability 1, verbal ability 2
– Score on science exam
         Mech  Verbal1  Verbal2  Scores
Mech     1     0.1      0.1      0.6
Verbal1  0.1   1        0.9      0.6
Verbal2  0.1   0.9      1        0.3
Scores   0.6   0.6      0.3      1
– Before reading on, what are the parameter estimates?
![Page 303: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/303.jpg)
303
• Mechanical
– About where we expect
• Verbal 1
– Very high
• Verbal 2
– Very low
Mech     0.56
Verbal1  1.71
Verbal2  -1.29
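These estimates follow from B = Rxx^-1 x Rxy applied to the ability correlation matrix. A minimal Python sketch, assuming a small hand-written `solve` helper (illustrative, standard library only) in place of a matrix package:

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Correlations among the IVs: Mech, Verbal1, Verbal2.
Rxx = [[1.0, 0.1, 0.1],
       [0.1, 1.0, 0.9],
       [0.1, 0.9, 1.0]]
# Correlations of each IV with the exam score.
Rxy = [0.6, 0.6, 0.3]

B = solve(Rxx, Rxy)
for name, b in zip(["Mech", "Verbal1", "Verbal2"], B):
    print(name, round(b, 2))
```

The two verbal measures correlate 0.9 with each other, so the solution leans on the small difference between them, which is what pushes their weights beyond ±1.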
![Page 304: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/304.jpg)
304
• What is going on
– It’s a suppressor again
– An independent variable which increases the size of the parameters associated with other independent variables above the size of their correlations
• Verbal 1 and verbal 2 are correlated so highly
– They need to cancel each other out
![Page 305: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/305.jpg)
305
Variable Selection
• What are the appropriate independent variables to use in a model?
– Depends what you are trying to do
• Multiple regression has two separate uses
– Prediction
– Explanation
![Page 306: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/306.jpg)
306
• Prediction
– What will happen in the future?
– Emphasis on practical application
– Variables selected (more) empirically
– Value free
• Explanation
– Why did something happen?
– Emphasis on understanding phenomena
– Variables selected theoretically
– Not value free
![Page 307: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/307.jpg)
307
• Visiting the doctor
– Precedes suicide attempts
– Predicts suicide
• Does not explain suicide
• More on causality later on …
• Which are appropriate variables
– To collect data on?
– To include in analysis?
– Decision needs to be based on theoretical knowledge of the behaviour of those variables
– Statistical analysis of those variables (later)
• Unless you didn’t collect the data
– Common sense (not a useful thing to say)
![Page 308: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/308.jpg)
308
Variable Entry Techniques
• Entrywise
– All variables entered simultaneously
• Hierarchical
– Variables entered in a predetermined order
• Stepwise
– Variables entered according to change in R2
– Actually a family of techniques
![Page 309: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/309.jpg)
309
• Entrywise
– All variables entered simultaneously
– All treated equally
• Hierarchical
– Entered in a theoretically determined order
– Change in R2 is assessed, and tested for significance
– e.g. sex and age
• Should not be treated equally with other variables
• Sex and age MUST be first
– Confused with hierarchical linear modelling
![Page 310: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/310.jpg)
310
• Stepwise
– Variables entered empirically
– Variable which increases R2 the most goes first
• Then the next …
– Variables which have no effect can be removed from the equation
• Example
– IVs: Sex, age, extroversion
– DV: Car – how long someone spends looking after their car
![Page 311: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/311.jpg)
311
• Correlation Matrix
       SEX    AGE    EXTRO  CAR
SEX    1.00   -0.05  0.40   0.66
AGE    -0.05  1.00   0.40   0.23
EXTRO  0.40   0.40   1.00   0.67
CAR    0.66   0.23   0.67   1.00
![Page 312: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/312.jpg)
312
• Entrywise analysis
– R2 = 0.64
       b     p
SEX    0.49  <0.01
AGE    0.08  0.46
EXTRO  0.44  <0.01
![Page 313: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/313.jpg)
313
• Stepwise Analysis
– Data determines the order
– Model 1: Extroversion, R2 = 0.450
– Model 2: Extroversion + Sex, R2 = 0.633
       b     p
EXTRO  0.48  <0.01
SEX    0.47  <0.01
![Page 314: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/314.jpg)
314
• Hierarchical analysis
– Theory determines the order
– Model 1: Sex + Age, R2 = 0.510
– Model 2: S, A + E, R2 = 0.638
– Change in R2 = 0.128, p = 0.001
       b     p
SEX    0.49  <0.01
AGE    0.08  0.46
EXTRO  0.44  <0.01
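The R2 values for the different entry methods can be approximated directly from the correlation matrix of SEX, AGE, EXTRO and CAR. The Python below is a sketch under two assumptions: R2 is computed as the sum of each standardised weight times its correlation with the DV, and the correlations are the two-decimal values shown in the matrix, so results may differ in the second or third decimal place from figures computed on raw data. `solve` and `r_squared` are illustrative helpers.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

names = ["SEX", "AGE", "EXTRO", "CAR"]
R = [[1.00, -0.05, 0.40, 0.66],
     [-0.05, 1.00, 0.40, 0.23],
     [0.40, 0.40, 1.00, 0.67],
     [0.66, 0.23, 0.67, 1.00]]

def r_squared(predictors, dv="CAR"):
    """R2 for a model with the given predictors, from correlations only."""
    idx = [names.index(p) for p in predictors]
    y = names.index(dv)
    rxx = [[R[i][j] for j in idx] for i in idx]
    rxy = [R[i][y] for i in idx]
    b = solve(rxx, rxy)
    return sum(bi * ri for bi, ri in zip(b, rxy))

# Hierarchical: theory puts sex and age first, then extroversion.
model_1 = r_squared(["SEX", "AGE"])
model_2 = r_squared(["SEX", "AGE", "EXTRO"])
change = model_2 - model_1

# Stepwise (first step): the data pick whichever single IV gives the largest R2.
first = max(["SEX", "AGE", "EXTRO"], key=lambda p: r_squared([p]))

print(round(model_1, 3), round(model_2, 3), round(change, 3), first)
```

The same `r_squared` helper can be called with any subset of IVs, which makes it easy to see how much the answer to "how important is extroversion?" depends on what else is already in the model.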
![Page 315: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/315.jpg)
315
• Which is the best model?
– Entrywise – OK
– Stepwise – excluded age
• Did have a (small) effect
– Hierarchical
• The change in R2 gives the best estimate of the importance of extroversion
• Other problems with stepwise
– F and df are wrong (cheats with df)
– Unstable results
• Small changes (sampling variance) – large differences in models
![Page 316: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/316.jpg)
316
– Uses a lot of paper
– Don’t use a stepwise procedure to pack your suitcase
![Page 317: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/317.jpg)
317
Is Stepwise Always Evil?
• Yes
• All right, no
• Research goal is predictive (technological)
– Not explanatory (scientific)
– What happens, not why
• N is large
– 40 people per predictor, Cohen, Cohen, Aiken, West (2003)
• Cross validation takes place
![Page 318: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/318.jpg)
318
A quick note on R2
• R2 is sometimes regarded as the ‘fit’ of a regression model
– Bad idea
• If good fit is required – maximise R2
– Leads to entering variables which do not make theoretical sense
![Page 319: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/319.jpg)
319
Critique of Multiple Regression
• Goertzel (2002)
– “Myths of murder and multiple regression”
– Skeptical Inquirer (Paper B1)
• Econometrics and regression are ‘junk science’
– Multiple regression models (in US)
– Used to guide social policy
![Page 320: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/320.jpg)
320
More Guns, Less Crime
– (controlling for other factors)
• Lott and Mustard: A 1% increase in gun ownership
– 3.3% decrease in murder rates
• But:
– More guns in rural Southern US
– More crime in urban North (crack cocaine epidemic at time of data)
![Page 321: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/321.jpg)
321
Executions Cut Crime
• No difference between crimes in states in US with or without death penalty
• Ehrlich (1975) controlled all variables that affect crime rates
– Death penalty had effect in reducing crime rate
• No statistical way to decide who’s right
![Page 322: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/322.jpg)
322
Legalised Abortion
• Donohue and Levitt (1999)
– Legalised abortion in 1970s cut crime in 1990s
• Lott and Whitley (2001)
– “Legalising abortion decreased murder rates by … 0.5 to 7 per cent.”
• It’s impossible to model these data
– Controlling for other historical events
– Crack cocaine (again)
![Page 323: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/323.jpg)
323
Another Critique
• Berk (2003)
– Regression analysis: a constructive critique (Sage)
• Three cheers for regression
– As a descriptive technique
• Two cheers for regression
– As an inferential technique
• One cheer for regression
– As a causal analysis
![Page 324: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/324.jpg)
324
Is Regression Useless?
• Do regression carefully
  – Don’t go beyond data which you have a strong theoretical understanding of
• Validate models
  – Where possible, validate predictive power of models in other areas, times, groups
    • Particularly important with stepwise
![Page 325: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/325.jpg)
325
Lesson 7: Categorical Independent Variables
![Page 326: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/326.jpg)
326
Introduction
![Page 327: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/327.jpg)
327
Introduction
• So far, just looked at continuous independent variables
• Also possible to use categorical (nominal, qualitative) independent variables
  – e.g. Sex; Job; Religion; Region; Type (of anything)
• Usually analysed with t-test/ANOVA
![Page 328: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/328.jpg)
328
Historical Note
• But these (t-test/ANOVA) are special cases of regression analysis
  – Aspects of General Linear Models (GLMs)
• So why treat them differently?
  – Fisher’s fault
  – Computers’ fault
• Regression, as we have seen, is computationally difficult
  – Matrix inversion and multiplication
  – Unfeasible, without a computer
![Page 329: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/329.jpg)
329
• In the special cases where:
    • You have one categorical IV
    • Your IVs are uncorrelated
  – It is much easier to do it by partitioning of sums of squares
• These cases
  – Very rare in ‘applied’ research
  – Very common in ‘experimental’ research
    • Fisher worked at Rothamsted agricultural research station
    • Never have problems manipulating wheat, pigs, cabbages, etc
![Page 330: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/330.jpg)
330
• In psychology
  – Led to a split between ‘experimental’ psychologists and ‘correlational’ psychologists
  – Experimental psychologists (until recently) would not think in terms of continuous variables
• Still (too) common to dichotomise a variable
  – Too difficult to analyse it properly
  – Equivalent to discarding 1/3 of your data
![Page 331: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/331.jpg)
331
The Approach
![Page 332: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/332.jpg)
332
The Approach
• Recode the nominal variable
  – Into one, or more, variables to represent that variable
• Names are slightly confusing
  – Some texts talk of ‘dummy coding’ to refer to all of these techniques
  – Some (most) refer to ‘dummy coding’ to refer to one of them
  – Most have more than one name
![Page 333: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/333.jpg)
333
• If a variable has g possible categories it is represented by g-1 variables
• Simplest case:
  – Smokes: Yes or No
  – Variable 1 represents ‘Yes’
  – Variable 2 is redundant
    • If it isn’t yes, it’s no
![Page 334: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/334.jpg)
334
The Techniques
![Page 335: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/335.jpg)
335
• We will examine two coding schemes
  – Dummy coding
    • For two groups
    • For >2 groups
  – Effect coding
    • For >2 groups
• Look at analysis of change
  – Equivalent to ANCOVA
  – Pretest-posttest designs
![Page 336: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/336.jpg)
336
Dummy Coding – 2 Groups
• Also called simple coding by SPSS
• A categorical variable with two groups
• One group chosen as a reference group
  – The other group is represented in a variable
• e.g. 2 groups: Experimental (Group 1) and Control (Group 0)
  – Control is the reference group
  – Dummy variable represents experimental group
    • Call this variable ‘group1’
![Page 337: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/337.jpg)
337
• For variable ‘group1’
  – 1 = ‘Yes’, 0 = ‘No’

| Original Category | New Variable |
|---|---|
| Exp | 1 |
| Con | 0 |
![Page 338: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/338.jpg)
338
• Some data
• Group is x, score is y

| | Control Group | Experimental Group |
|---|---|---|
| Experiment 1 | 10 | 10 |
| Experiment 2 | 10 | 20 |
| Experiment 3 | 10 | 30 |
![Page 339: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/339.jpg)
339
• Control Group = 0
  – Intercept = Score on Y when x = 0
  – Intercept = mean of control group
• Experimental Group = 1
  – b = change in Y when x increases 1 unit
  – b = difference between experimental group and control group
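A minimal sketch of the two-group case, assuming NumPy is available. The scores are made up for illustration (they are not the course data); the point is only that the fitted intercept equals the control mean and b equals the difference between the group means.

```python
import numpy as np

# Hypothetical scores: control (group1 = 0) and experimental (group1 = 1).
control = np.array([8.0, 10.0, 12.0])
experimental = np.array([18.0, 20.0, 22.0])

y = np.concatenate([control, experimental])
group1 = np.concatenate([np.zeros(len(control)), np.ones(len(experimental))])

# Design matrix: a column of 1s for the intercept, then the dummy variable.
X = np.column_stack([np.ones_like(group1), group1])
intercept, b = np.linalg.lstsq(X, y, rcond=None)[0]

mean_difference = experimental.mean() - control.mean()
print(f"intercept = {intercept:.2f}, control mean = {control.mean():.2f}")
print(f"b = {b:.2f}, difference between means = {mean_difference:.2f}")
```

The same comparison is what an independent-samples t-test makes; the regression simply expresses it as a slope.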
![Page 340: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/340.jpg)
340
[Figure: line chart of score (y-axis, 0 to 35) for Control Group and Experimental Group, with one line each for Experiment 1, Experiment 2 and Experiment 3. Gradient of slope represents difference between means.]
![Page 341: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/341.jpg)
341
Dummy Coding – 3+ Groups
• With three groups the approach is similar
• g = 3, therefore g-1 = 2 variables needed
• 3 Groups
  – Control
  – Experimental Group 1
  – Experimental Group 2
![Page 342: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/342.jpg)
342
• Recoded into two variables
  – Note – do not need a 3rd variable
    • If we are not in group 1 or group 2 MUST be in control group
    • 3rd variable would add no information
    • (What would happen to determinant?)

| Original Category | Gp1 | Gp2 |
|---|---|---|
| Con | 0 | 0 |
| Gp1 | 1 | 0 |
| Gp2 | 0 | 1 |
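One way to answer the determinant question is to try it. The sketch below assumes NumPy and uses a made-up set of six cases (two per group); adding a third dummy for the control group makes the columns linearly dependent on the constant, so the determinant of X'X drops to zero and the matrix cannot be inverted.

```python
import numpy as np

# Hypothetical group labels: 0 = control, 1 = Gp1, 2 = Gp2 (two cases each).
group = np.array([0, 0, 1, 1, 2, 2])

const = np.ones(len(group))
gp1 = (group == 1).astype(float)
gp2 = (group == 2).astype(float)
con = (group == 0).astype(float)  # the redundant third dummy

X_ok = np.column_stack([const, gp1, gp2])        # g - 1 = 2 dummies
X_bad = np.column_stack([const, gp1, gp2, con])  # g = 3 dummies

det_ok = np.linalg.det(X_ok.T @ X_ok)
det_bad = np.linalg.det(X_bad.T @ X_bad)
rank_bad = np.linalg.matrix_rank(X_bad)

print(f"det(X'X) with 2 dummies: {det_ok:.3f}")
print(f"det(X'X) with 3 dummies: {det_bad:.3f}")
print(f"rank with 3 dummies: {rank_bad} (out of {X_bad.shape[1]} columns)")
```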
![Page 343: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/343.jpg)
343
• F and associated p
  – Tests H0 that $\bar{g}_1 = \bar{g}_2 = \bar{g}_3$
• b1 and b2 and associated p-values
  – Test difference between each experimental group and the control group
• To test difference between experimental groups
  – Need to rerun analysis
![Page 344: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/344.jpg)
344
• One more complication
  – Have now run multiple comparisons
  – Increases α – i.e. probability of type I error
• Need to correct for this
  – Bonferroni correction
  – Multiply given p-values by two/three (depending how many comparisons were made)
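The correction is a one-liner. This is a small sketch with hypothetical p-values; the helper name `bonferroni` is made up for illustration, and capping the result at 1 is a common convention rather than something stated on the slide.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values from three pairwise comparisons.
raw = [0.010, 0.040, 0.400]
adjusted = bonferroni(raw)

for p, p_adj in zip(raw, adjusted):
    print(f"raw p = {p:.3f}, Bonferroni p = {p_adj:.3f}")
```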
![Page 345: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/345.jpg)
345
Effect Coding
• Usually used for 3+ groups
• Compares each group (except the reference group) to the mean of all groups
  – Dummy coding compares each group to the reference group.
• Example with 5 groups
  – 1 group selected as reference group
    • Group 5
![Page 346: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/346.jpg)
346
• Each group (except reference) has a variable
  – 1 if the individual is in that group
  – 0 if not
  – -1 if in reference group

| group | group_1 | group_2 | group_3 | group_4 |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 |
| 5 | -1 | -1 | -1 | -1 |
![Page 347: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/347.jpg)
347
Examples
• Dummy coding and Effect Coding
• Group 1 chosen as reference group each time
• Data

| Group | Mean | SD |
|---|---|---|
| 1 | 52.40 | 4.60 |
| 2 | 56.30 | 5.70 |
| 3 | 60.10 | 5.00 |
| Total | 56.27 | 5.88 |
![Page 348: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/348.jpg)
348
• Dummy

| Group | dummy2 | dummy3 |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 1 | 0 |
| 3 | 0 | 1 |

• Effect

| Group | effect2 | effect3 |
|---|---|---|
| 1 | -1 | -1 |
| 2 | 1 | 0 |
| 3 | 0 | 1 |
![Page 349: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/349.jpg)
349
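Dummy
R = 0.543, F = 5.7, df = 2, 27, p = 0.009
b0 = 52.4
b1 = 3.9, p = 0.100
b2 = 7.7, p = 0.002

$b_0 = \bar{g}_1$
$b_1 = \bar{g}_2 - \bar{g}_1$
$b_2 = \bar{g}_3 - \bar{g}_1$

Effect
R = 0.543, F = 5.7, df = 2, 27, p = 0.009
b0 = 56.27
b1 = 0.03, p = 0.980
b2 = 3.8, p = 0.007

$b_0 = \bar{G}$
$b_1 = \bar{g}_2 - \bar{G}$
$b_2 = \bar{g}_3 - \bar{G}$

($\bar{g}_j$ is the mean of group j, $\bar{G}$ is the mean of all groups)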
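The interpretation of the coefficients under the two schemes can be checked numerically. The sketch below assumes NumPy and uses made-up scores for three equal-sized groups (not the data summarised above), with group 1 as the reference. Dummy coding should recover differences from the reference group mean, effect coding differences from the mean of all groups.

```python
import numpy as np

# Hypothetical scores for three equal-sized groups (not the course data).
scores = {1: [50.0, 52.0, 55.0], 2: [54.0, 57.0, 58.0], 3: [59.0, 60.0, 62.0]}

y = np.array([v for g in (1, 2, 3) for v in scores[g]])
group = np.array([g for g in (1, 2, 3) for _ in scores[g]])

def fit(columns):
    """OLS coefficients for y on a constant plus the given coded columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

in1 = (group == 1).astype(float)
in2 = (group == 2).astype(float)
in3 = (group == 3).astype(float)

# Dummy coding: reference group scores 0 on both variables.
b_dummy = fit([in2, in3])
# Effect coding: reference group scores -1 on both variables.
b_effect = fit([in2 - in1, in3 - in1])

means = {g: float(np.mean(scores[g])) for g in (1, 2, 3)}
grand_mean = float(np.mean(list(means.values())))

print("dummy :", np.round(b_dummy, 3))
print("effect:", np.round(b_effect, 3))
```

With unequal group sizes the effect-coded intercept is the unweighted mean of the group means, which then differs from the mean of all cases.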
![Page 350: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/350.jpg)
350
In SPSS
• SPSS provides two equivalent procedures for regression
  – Regression (which we have been using)
  – GLM (which we haven’t)
• GLM will:
  – Automatically code categorical variables
  – Automatically calculate interaction terms
• GLM won’t:
  – Give standardised effects
  – Give hierarchical R² p-values
  – Allow you to not understand
![Page 351: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/351.jpg)
351
ANCOVA and Regression
![Page 352: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/352.jpg)
352
• Test
  – (Which is a trick; but it’s designed to make you think about it)
• Use employee data.sav
  – Compare the pay rise (difference between salbegin and salary)
  – For ethnic minority and non-minority staff
• What do you find?
![Page 353: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/353.jpg)
353
ANCOVA and Regression
• Dummy coding approach has one special use
  – In ANCOVA, for the analysis of change
• Pre-test post-test experimental design
  – Control group and (one or more) experimental groups
  – Tempting to use difference score + t-test / mixed design ANOVA
  – Inappropriate
![Page 354: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/354.jpg)
354
• Salivary cortisol levels
  – Used as a measure of stress
  – Not absolute level, but change in level over day may be interesting
• Test at: 9.00am, 9.00pm
• Two groups
  – High stress group (cancer biopsy)
    • Group 1
  – Low stress group (no biopsy)
    • Group 0
![Page 355: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/355.jpg)
355
• Correlation of AM and PM = 0.493 (p=0.008)
• Has there been a significant difference in the rate of change of salivary cortisol?
  – 3 different approaches

| | AM | PM | Diff |
|---|---|---|---|
| High Stress | 20.1 | 6.8 | 13.3 |
| Low Stress | 22.3 | 11.8 | 10.5 |
![Page 356: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/356.jpg)
356
• Approach 1 – find the differences, do a t-test
  – t = 1.31, df = 26, p = 0.203
• Approach 2 – mixed ANOVA, look for interaction effect
  – F = 1.71, df = 1, 26, p = 0.203
  – F = t²
• Approach 3 – regression (ANCOVA) based approach
![Page 357: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/357.jpg)
357
  – IVs: AM and group
  – DV: PM
  – b1 (group) = 3.59, standardised b1 = 0.432, p = 0.01
• Why is the regression approach better?
  – The other two approaches took the difference
  – Assumes that r = 1.00
  – Any difference from r = 1.00 and you add error variance
    • Subtracting error is the same as adding error
![Page 358: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/358.jpg)
358
• Using regression
  – Ensures that all the variance that is subtracted is true
  – Reduces the error variance
• Two effects
  – Adjusts the means
    • Compensates for differences between groups
  – Removes error variance
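A simulated illustration of why the regression approach leaves less error variance, assuming NumPy. The data are generated, not the cortisol data; the group sizes, means and the true group effect of 3 are arbitrary choices. Analysing a difference score is the same as fitting the ANCOVA model with the pre-test slope fixed at 1, so the ANCOVA residual sum of squares can never be larger.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # cases per group (hypothetical)

group = np.repeat([0.0, 1.0], n)
pre = rng.normal(20.0, 4.0, size=2 * n)
# Post-test depends on pre-test with slope 0.5 (so r is well below 1),
# plus a true group effect of 3 and random error.
post = 5.0 + 0.5 * pre + 3.0 * group + rng.normal(0.0, 3.0, size=2 * n)

def ols(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return b, float(resid @ resid)

ones = np.ones(2 * n)

# Approaches 1 and 2: analyse the difference score
# (equivalent to forcing the pre-test slope to be exactly 1).
b_diff, rss_diff = ols(np.column_stack([ones, group]), post - pre)

# Approach 3: ANCOVA as regression, pre-test entered as a covariate.
b_ancova, rss_ancova = ols(np.column_stack([ones, group, pre]), post)

print(f"difference score: group effect = {b_diff[1]:.2f}, residual SS = {rss_diff:.1f}")
print(f"ANCOVA          : group effect = {b_ancova[1]:.2f}, residual SS = {rss_ancova:.1f}")
```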
![Page 359: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/359.jpg)
359
In SPSS
• SPSS automates all of this
  – But you have to understand it, to know what it is doing
• Use Analyse, GLM, Univariate ANOVA
![Page 360: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/360.jpg)
360
Outcome here
Categorical predictors here
Continuous predictors here
Click options
![Page 361: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/361.jpg)
361
Select parameter estimates
![Page 362: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/362.jpg)
362
More on Change
• If difference score is correlated with either pre-test or post-test
  – Subtraction fails to remove the difference between the scores
  – If two scores are uncorrelated
    • Difference will be correlated with both
    • Failure to control
  – Equal SDs, r = 0
    • Correlation of change and pre-score = 0.707
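The 0.707 figure follows directly from the variance rules. As a short worked derivation, write X for the pre-score, Y for the post-score and D for the change, taking D = X − Y (an assumption about the direction of subtraction; with D = Y − X the correlation has the same size but a negative sign):

```latex
\begin{align*}
\operatorname{Var}(X) &= \operatorname{Var}(Y) = \sigma^2, \qquad \operatorname{Cov}(X, Y) = 0 \\
D &= X - Y \\
\operatorname{Cov}(D, X) &= \operatorname{Var}(X) - \operatorname{Cov}(X, Y) = \sigma^2 \\
\operatorname{Var}(D) &= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y) = 2\sigma^2 \\
\operatorname{Corr}(D, X) &= \frac{\sigma^2}{\sigma \cdot \sqrt{2}\,\sigma} = \frac{1}{\sqrt{2}} \approx 0.707
\end{align*}
```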
![Page 363: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/363.jpg)
363
Even More on Change
• A topic of surprising complexity
  – What I said about difference scores isn’t always true
    • Lord’s paradox – it depends on the precise question you want to answer
  – Collins and Horn (1993). Best methods for the analysis of change
  – Collins and Sayer (2001). New methods for the analysis of change.
![Page 364: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/364.jpg)
364
Lesson 8: Assumptions in Regression Analysis
![Page 365: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/365.jpg)
365
The Assumptions
1. The distribution of residuals is normal (at each value of the dependent variable).
2. The variance of the residuals for every set of values for the independent variable is equal.
  • violation is called heteroscedasticity.
3. The error term is additive
  • no interactions.
4. At every value of the dependent variable the expected (mean) value of the residuals is zero
  • No non-linear relationships
![Page 366: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/366.jpg)
366
5. The expected correlation between residuals, for any two cases, is 0.
• The independence assumption (lack of autocorrelation)
6. All independent variables are uncorrelated with the error term.
7. No independent variables are a perfect linear function of other independent variables (no perfect multicollinearity)
8. The mean of the error term is zero.
![Page 367: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/367.jpg)
367
What are we going to do …
• Deal with some of these assumptions in some detail
• Deal with others in passing only– look at them again later on
![Page 368: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/368.jpg)
368
Assumption 1: The Distribution of Residuals is Normal at Every Value of the Dependent Variable
![Page 369: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/369.jpg)
369
Look at Normal Distributions
• A normal distribution
  – symmetrical, bell-shaped (so they say)
![Page 370: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/370.jpg)
370
What can go wrong?
• Skew
  – non-symmetricality
  – one tail longer than the other
• Kurtosis
  – too flat or too peaked
  – kurtosed
• Outliers
  – Individual cases which are far from the distribution
![Page 371: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/371.jpg)
371
Effects on the Mean
• Skew
  – biases the mean, in direction of skew
• Kurtosis
  – mean not biased
  – standard deviation is
  – and hence standard errors, and significance tests
![Page 372: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/372.jpg)
372
Examining Univariate Distributions
• Histograms
• Boxplots
• P-P plots
• Calculation based methods
![Page 373: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/373.jpg)
373
Histograms
[Figure: histograms of distributions A and B]
![Page 374: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/374.jpg)
374
[Figure: histograms of distributions C and D]
![Page 375: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/375.jpg)
375
[Figure: histograms of distributions E and F]
![Page 376: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/376.jpg)
376
[Figure: six small histograms, frequency axes running from 0 to 6 or 7]
Histograms can be tricky …
![Page 377: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/377.jpg)
377
Boxplots
![Page 378: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/378.jpg)
378
P-P Plots
[Figure: P-P plots for distributions A and B, both axes 0.00 to 1.00]
![Page 379: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/379.jpg)
379
[Figure: P-P plots for distributions C and D, both axes 0.00 to 1.00]
![Page 380: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/380.jpg)
380
[Figure: P-P plots for distributions E and F, both axes 0.00 to 1.00]
![Page 381: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/381.jpg)
381
Calculation Based
• Skew and Kurtosis statistics
• Outlier detection statistics
![Page 382: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/382.jpg)
382
Skew and Kurtosis Statistics
• Normal distribution
  – skew = 0
  – kurtosis = 0
• Two methods for calculation
  – Fisher’s and Pearson’s
  – Very similar answers
• Associated standard error
  – can be used for significance of departure from normality
  – not actually very useful
    • Never normal above N = 400
![Page 383: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/383.jpg)
383
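| | Skewness | SE Skew | Kurtosis | SE Kurt |
|---|---|---|---|---|
| A | -0.12 | 0.172 | -0.084 | 0.342 |
| B | 0.271 | 0.172 | 0.265 | 0.342 |
| C | 0.454 | 0.172 | 1.885 | 0.342 |
| D | 0.117 | 0.172 | -1.081 | 0.342 |
| E | 2.106 | 0.172 | 5.75 | 0.342 |
| F | 0.171 | 0.172 | -0.21 | 0.342 |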
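The standard errors in the table depend only on the sample size. The sketch below uses the usual small-sample formulas for the standard errors of skewness and (excess) kurtosis; N = 200 is an assumption, inferred because it reproduces the 0.172 and 0.342 shown, not something stated in the slides. Dividing each statistic by its standard error gives a rough z value for departure from normality.

```python
import math

def se_skew(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurt(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * se_skew(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 200  # assumption: inferred from the standard errors in the table

# (skewness, kurtosis) pairs copied from the table above.
table = {
    "A": (-0.12, -0.084), "B": (0.271, 0.265), "C": (0.454, 1.885),
    "D": (0.117, -1.081), "E": (2.106, 5.75), "F": (0.171, -0.21),
}

print(f"SE skew = {se_skew(n):.3f}, SE kurt = {se_kurt(n):.3f}")
for name, (skew, kurt) in table.items():
    z_skew = skew / se_skew(n)
    z_kurt = kurt / se_kurt(n)
    print(f"{name}: z(skew) = {z_skew:6.2f}, z(kurtosis) = {z_kurt:6.2f}")
```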
![Page 384: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/384.jpg)
384
Outlier Detection
• Calculate distance from mean
  – z-score (number of standard deviations)
  – deleted z-score
    • that case biased the mean, so remove it
  – Look up expected distance from mean
    • In a normal distribution about 1% of cases lie more than 2.58 SDs from the mean, and about 0.3% lie 3+ SDs away
• Calculate influence
  – how much effect did that case have on the mean?
![Page 385: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/385.jpg)
385
Non-Normality in Regression
![Page 386: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/386.jpg)
386
Effects on OLS Estimates
• The mean is an OLS estimate
• The regression line is an OLS estimate
• Lack of normality
  – biases the position of the regression slope
  – makes the standard errors wrong
    • probability values attached to statistical significance wrong
![Page 387: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/387.jpg)
387
Checks on Normality
• Check residuals are normally distributed
  – SPSS will draw histogram and p-p plot of residuals
• Use regression diagnostics
  – Lots of them
  – Most aren’t very interesting
![Page 388: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/388.jpg)
388
Regression Diagnostics
• Residuals
  – standardised, unstandardised, studentised, deleted, studentised-deleted
  – look for cases > |3| (?)
• Influence statistics
  – Look for the effect a case has
  – If we remove that case, do we get a different answer?
  – DFBeta, Standardised DFBeta
    • changes in b
![Page 389: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/389.jpg)
389
  – DfFit, Standardised DfFit
    • change in predicted value
  – Covariance ratio
    • Ratio of the determinants of the covariance matrices, with and without the case
• Distances
  – measures of ‘distance’ from the centroid
  – some include IV, some don’t
![Page 390: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/390.jpg)
390
More on Residuals
• Residuals are trickier than you might have imagined
• Raw residuals
  – OK
• Standardised residuals
  – Residuals divided by SD

$s_e = \sqrt{\dfrac{\sum e^2}{n - k - 1}}$
![Page 391: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/391.jpg)
391
Leverage
• But
  – That SD is wrong
  – Variance of the residuals is not equal
    • Those further from the centroid on the predictors have higher variance
    • Need a measure of this
• Distance from the centroid is leverage, or h (or sometimes hii)
• One predictor
  – Easy
![Page 392: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/392.jpg)
392
$h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum (x - \bar{x})^2}$

• Minimum hi is 1/n, the maximum is 1
• Except
  – SPSS uses standardised leverage - h*
    • It doesn’t tell you this, it just uses it
i
![Page 393: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/393.jpg)
393
$h_i^* = h_i - \dfrac{1}{n}$

$h_i^* = \dfrac{(x_i - \bar{x})^2}{\sum (x - \bar{x})^2}$

• Minimum 0, maximum (N − 1)/N
![Page 394: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/394.jpg)
394
• Multiple predictors
  – Calculate the hat matrix (H)
  – Leverage values are the diagonals of this matrix

$H = X(X'X)^{-1}X'$

  – Where X is the augmented matrix of predictors (i.e. matrix that includes the constant)
  – Hence leverage hii – element ii of H
![Page 395: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/395.jpg)
395
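• Example of calculation of hat matrix

$$H = X(X'X)^{-1}X', \qquad X = \begin{pmatrix} 1 & 15 \\ 1 & 20 \\ \vdots & \vdots \\ 1 & 65 \end{pmatrix}$$

$$H = \begin{pmatrix} 0.318 & 0.273 & \cdots & \\ 0.273 & 0.236 & \cdots & \\ \vdots & \vdots & \ddots & \\ & & & 0.318 \end{pmatrix}$$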
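The hat matrix example can be reproduced in a few lines, assuming NumPy. The predictor values between 20 and 65 are elided on the slide; the sketch assumes they rise in steps of 5 (15, 20, ..., 65), which is a guess, although it does reproduce the 0.318, 0.273 and 0.236 entries. It also checks that the diagonal of H matches the one-predictor leverage formula.

```python
import numpy as np

# Assumption: the elided x values go up in steps of 5 (15, 20, ..., 65).
x = np.arange(15.0, 70.0, 5.0)

# Augmented predictor matrix: a column of 1s for the constant, then x.
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^-1 X'; its diagonal holds the leverage values.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# One-predictor formula: h_i = 1/n + (x_i - mean)^2 / sum((x - mean)^2).
h_formula = 1.0 / len(x) + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

# SPSS-style standardised (centred) leverage: h* = h - 1/n.
h_star = leverage - 1.0 / len(x)

print(np.round(H[:2, :2], 3))
print("leverage:", np.round(leverage, 3))
print("centred leverage:", np.round(h_star, 3))
```

The leverage values sum to the number of estimated parameters (here 2, the constant and one slope), which is a handy sanity check.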
![Page 396: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/396.jpg)
396
Standardised / Studentised
• Now we can calculate the standardised residuals
  – SPSS calls them studentised residuals
  – Also called internally studentised residuals

$e_i' = \dfrac{e_i}{s_e \sqrt{1 - h_i}}$
![Page 397: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/397.jpg)
397
Deleted Studentised Residuals
• Studentised residuals do not have a known distribution
  – Cannot use them for inference
• Deleted studentised residuals
  – Externally studentised residuals
  – Jackknifed residuals
• Distributed as t
• With df = N – k – 1
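A sketch of both kinds of studentised residual, assuming NumPy and simulated data (one predictor, 30 cases, arbitrary coefficients). The deleted version re-estimates the residual SD with case i left out; the shortcut formula used here is checked against actually refitting the model without each case. The helper `deleted_by_refit` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)  # simulated data

X = np.column_stack([np.ones(n), x])
p = X.shape[1]  # estimated parameters: k predictors plus the constant

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                    # leverage
e = y - H @ y                     # raw residuals
s_e = np.sqrt(e @ e / (n - p))    # residual SD from the full model

# Internally studentised residuals (SPSS: studentised).
r = e / (s_e * np.sqrt(1.0 - h))

# Deleted (externally studentised) residuals: the residual SD is
# re-estimated without case i, using a shortcut that avoids refitting.
s_deleted = np.sqrt((e @ e - e ** 2 / (1.0 - h)) / (n - p - 1))
t = e / (s_deleted * np.sqrt(1.0 - h))

def deleted_by_refit(i):
    """Same quantity the slow way: refit the model without case i."""
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid = y[keep] - X[keep] @ b
    s_i = np.sqrt(resid @ resid / (n - 1 - p))
    return e[i] / (s_i * np.sqrt(1.0 - h[i]))

t_refit = np.array([deleted_by_refit(i) for i in range(n)])
print("largest |studentised| residual:", round(float(np.abs(r).max()), 3))
print("largest |deleted| residual    :", round(float(np.abs(t).max()), 3))
```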
![Page 398: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/398.jpg)
398
Testing Significance
• We can calculate the probability of a residual
  – Is it sampled from the same population
• BUT
  – Massive type I error rate
  – Bonferroni correct it
    • Multiply p value by N
![Page 399: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/399.jpg)
399
Bivariate Normality
• We didn’t just say “residuals normally distributed”
• We said “at every value of the dependent variables”
• Two variables can be normally distributed
  – univariate,
  – but not bivariate
![Page 400: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/400.jpg)
400
• Couple’s IQs
  – male and female
[Figure: two histograms, FEMALE and MALE IQ (x-axis 60 to 140, y-axis Frequency)]
  – Seem reasonably normal
![Page 401: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/401.jpg)
401
• But wait!!
[Figure: scatterplot of MALE against FEMALE IQ (both axes 40 to 160)]
![Page 402: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/402.jpg)
402
• When we look at bivariate normality
  – not normal – there is an outlier
• So plot X against Y
• OK for bivariate
  – but – may be a multivariate outlier
  – Need to draw graph in 3+ dimensions
  – can’t draw a graph in 3 dimensions
• But we can look at the residuals instead …
![Page 403: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/403.jpg)
403
• IQ histogram of residuals
[Figure: histogram of residuals (y-axis 0 to 12)]
![Page 404: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/404.jpg)
404
Multivariate Outliers …
• Will be explored later in the exercises
• So we move on …
![Page 405: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/405.jpg)
405
What to do about Non-Normality
• Skew and Kurtosis
  – Skew – much easier to deal with
  – Kurtosis – less serious anyway
• Transform data
  – removes skew
  – positive skew – log transform
  – negative skew – square
![Page 406: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/406.jpg)
406
Transformation
• May need to transform IV and/or DV
  – More often DV
    • time, income, symptoms (e.g. depression) all positively skewed
  – can cause non-linear effects (more later) if only one is transformed
  – alters interpretation of unstandardised parameter
  – May alter meaning of variable
  – May add / remove non-linear and moderator effects
![Page 407: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/407.jpg)
407
• Change measures
  – increase sensitivity at ranges
    • avoiding floor and ceiling effects
• Outliers
  – Can be tricky
  – Why did the outlier occur?
    • Error? Delete them.
    • Weird person? Probably delete them
    • Normal person? Tricky.
![Page 408: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/408.jpg)
408
  – You are trying to model a process
    • is the data point ‘outside’ the process?
    • e.g. lottery winners, when looking at salary
    • yawn, when looking at reaction time
  – Which is better?
    • A good model, which explains 99% of your data?
    • A poor model, which explains all of it?
• Pedhazur and Schmelkin (1991)
  – analyse the data twice
![Page 409: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/409.jpg)
409
• We will spend much less time on the other 6 assumptions
• Can do exercise 8.1.
![Page 410: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/410.jpg)
410
Assumption 2: The variance of the residuals for every set of values for the independent variable is equal.
![Page 411: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/411.jpg)
411
Heteroscedasticity
• This assumption is about heteroscedasticity of the residuals
  – Hetero = different
  – Scedastic = scattered
• We don’t want heteroscedasticity
  – we want our data to be homoscedastic
• Draw a scatterplot to investigate
![Page 412: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/412.jpg)
412
[Figure: scatterplot of MALE against FEMALE (both axes 40 to 160)]
![Page 413: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/413.jpg)
413
• Only works with one IV
  – need every combination of IVs
• Easy to get – use predicted values
  – use residuals there
• Plot predicted values against residuals
  – or standardised residuals
  – or deleted residuals
  – or standardised deleted residuals
  – or studentised residuals
• A bit like turning the scatterplot on its side
![Page 414: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/414.jpg)
414
Good – no heteroscedasticity
[Figure: scatterplot of Residual against Predicted Value]
![Page 415: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/415.jpg)
415
Bad – heteroscedasticity
[Figure: scatterplot of Residual against Predicted Value]
![Page 416: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/416.jpg)
416
Testing Heteroscedasticity
• White’s test
  – Not automatic in SPSS (is in SAS)
  – Luckily, not hard to do
1. Do regression, save residuals.
2. Square residuals
3. Square IVs
4. Calculate interactions of IVs
  – e.g. x1•x2, x1•x3, x2•x3
![Page 417: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/417.jpg)
417
5. Run regression using
  – squared residuals as DV
  – IVs, squared IVs, and interactions as IVs
6. Test statistic = N × R²
  – Distributed as χ²
  – df = k (for second regression)
• Use education and salbegin to predict salary (employee data.sav)
  – R² = 0.113, N = 474, χ² = 53.5, df = 5, p < 0.0001
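The six steps of White's test can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS or SAS procedure: the helper names (`ols_r2`, `white_test`) and the data are hypothetical, and the data are built so that the spread of the errors grows with x.

```python
def ols_r2(X, y):
    """Regress y on the columns of X (a constant is added); return (R^2, fitted values)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst, fitted

def white_test(X, y):
    """Return (N * R^2, df) where df is the number of IVs in the second regression."""
    _, fitted = ols_r2(X, y)                                 # step 1
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]       # step 2
    k = len(X[0])
    aux = [list(r) + [v * v for v in r]                      # step 3
           + [r[i] * r[j] for i in range(k) for j in range(i + 1, k)]  # step 4
           for r in X]
    r2, _ = ols_r2(aux, e2)                                  # step 5
    return len(y) * r2, len(aux[0])                          # step 6

# Hypothetical data: one IV, errors alternate in sign and grow with x
x = list(range(1, 21))
y = [xi + (1 if i % 2 == 0 else -1) * 0.5 * xi for i, xi in enumerate(x)]
X = [[xi] for xi in x]
stat, df = white_test(X, y)
print(round(stat, 1), df)
```

With one IV the second regression has two predictors (x and x squared), so df = 2; with the two IVs in the salary example it has five, matching df = 5 above. The statistic is then compared with a chi-square distribution on df degrees of freedom.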
![Page 418: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/418.jpg)
418
Plot of Pred and Res
[Figure: scatterplot of Regression Standardized Residual (-4 to 8) against Regression Standardized Predicted Value (-2 to 8)]
![Page 419: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/419.jpg)
419
Magnitude of Heteroscedasticity
• Chop data into “slices”
  – 5 slices, based on X (or predicted score)
    • Done in SPSS
  – Calculate variance of each slice
  – Check ratio of largest to smallest
  – Less than 10:1
    • OK
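A minimal sketch of the slicing check, assuming equal-sized slices ordered by predicted value. The function name and the data are hypothetical; the residuals are constructed so that their spread grows across slices.

```python
from statistics import variance

def variance_ratio(predicted, residuals, n_slices=5):
    """Largest / smallest residual variance across slices ordered by predicted value."""
    ordered = [r for _, r in sorted(zip(predicted, residuals))]
    size = len(ordered) // n_slices
    slices = [ordered[i * size:(i + 1) * size] for i in range(n_slices - 1)]
    slices.append(ordered[(n_slices - 1) * size:])  # last slice takes any remainder
    variances = [variance(s) for s in slices]
    return max(variances) / min(variances)

# Hypothetical data: 20 cases, residual size grows from slice to slice
predicted = list(range(20))
residuals = [(1 if i % 2 == 0 else -1) * (i // 4 + 1) for i in range(20)]
ratio = variance_ratio(predicted, residuals)
print(ratio)  # well above 10, so these data would be a problem
```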
![Page 420: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/420.jpg)
420
The Visual Bander
• New in SPSS 12
![Page 421: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/421.jpg)
421
• Variances of the 5 groups
• We have a problem– 3 / 0.2 ~= 15
![Page 422: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/422.jpg)
422
Dealing with Heteroscedasticity
• Use Huber-White estimates
  – Very easy in Stata
  – Fiddly in SPSS – bit of a hack
• Use Complex samples
1. Create a new variable where all cases are equal to 1, call it const
2. Use Complex Samples, Prepare for Analysis
3. Create a plan file
![Page 423: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/423.jpg)
423
4. Sample weight is const
5. Finish
6. Use Complex Samples, GLM
7. Use plan file created, and set up model as in GLM
(More on complex samples later)
In Stata, do regression as normal, and click “robust”.
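To see what Huber-White estimates change, here is a minimal sketch for one predictor. It uses the simplest (HC0) form of the robust variance, sum((x - mean)² e²) / (sum((x - mean)²))²; statistical packages typically add a small-sample correction, so their figures would differ slightly. The function name and data are hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, classical SE, Huber-White HC0 SE) for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    d = [xi - mx for xi in x]
    sxx = sum(di ** 2 for di in d)
    b1 = sum(di * (yi - my) for di, yi in zip(d, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Classical SE assumes one common residual variance
    classical = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2) / sxx)
    # Robust SE lets each case contribute its own squared residual
    robust = math.sqrt(sum((di * ei) ** 2 for di, ei in zip(d, e))) / sxx
    return b1, classical, robust

# Hypothetical data: errors are largest at the extremes of x
x = list(range(1, 21))
y = [xi + (1 if i % 2 == 0 else -1) * abs(xi - 10.5) for i, xi in enumerate(x)]
b1, se_classical, se_robust = slope_standard_errors(x, y)
print(round(b1, 3), round(se_classical, 3), round(se_robust, 3))
```

The slope is the same either way; only the standard error (and hence the p-value) changes.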
![Page 424: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/424.jpg)
424
Heteroscedasticity – Implications and Meanings
Implications
• What happens as a result of heteroscedasticity?
  – Parameter estimates are correct
    • not biased
  – Standard errors (hence p-values) are incorrect
![Page 425: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/425.jpg)
425
However …
• If there is no skew in predicted scores
  – P-values a tiny bit wrong
• If skewed,
  – P-values very wrong
• Can do exercise
![Page 426: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/426.jpg)
426
Meaning
• What is heteroscedasticity trying to tell us?
  – Our model is wrong – it is misspecified
  – Something important is happening that we have not accounted for
• e.g. amount of money given to charity (given)
  – depends on:
    • earnings
    • degree of importance person assigns to the charity (import)
![Page 427: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/427.jpg)
427
• Do the regression analysis
  – R² = 0.60, F = 31.4, df = 2, 37, p < 0.001
    • seems quite good
  – b0 = 0.24, p = 0.97
  – b1 = 0.71, p < 0.001
  – b2 = 0.23, p = 0.031
• White’s test
  – χ² = 18.6, df = 5, p = 0.002
• The plot of predicted values against residuals …
![Page 428: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/428.jpg)
428
• Plot shows heteroscedastic relationship
![Page 429: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/429.jpg)
429
• Which means …
  – the effects of the variables are not additive
  – If you think that what a charity does is important
    • you might give more money
    • how much more depends on how much money you have
![Page 430: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/430.jpg)
430
[Figure: scatterplot of GIVEN (10 to 70) against IMPORT (4 to 16), with separate markers for High and Low Earnings]
![Page 431: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/431.jpg)
431
• One more thing about heteroscedasticity
– it is the equivalent of homogeneity of variance in ANOVA/t-tests
![Page 432: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/432.jpg)
432
Assumption 3: The Error Term is Additive
![Page 433: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/433.jpg)
433
Additivity
• What heteroscedasticity shows you
  – effects of variables need to be additive
• Heteroscedasticity doesn’t always show it to you
  – can test for it, but hard work
  – (same as homogeneity of covariance assumption in ANCOVA)
• Have to know it from your theory
• A specification error
![Page 434: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/434.jpg)
434
Additivity and Theory
• Two IVs
  – Alcohol has sedative effect
    • A bit makes you a bit tired
    • A lot makes you very tired
  – Some painkillers have sedative effect
    • A bit makes you a bit tired
    • A lot makes you very tired
  – A bit of alcohol and a bit of painkiller doesn’t make you very tired
  – Effects multiply together, don’t add together
![Page 435: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/435.jpg)
435
• If you don’t test for it
  – It’s very hard to know that it will happen
• So many possible non-additive effects
  – Cannot test for all of them
  – Can test for obvious
• In medicine
  – Choose to test for salient non-additive effects
  – e.g. sex, race
![Page 436: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/436.jpg)
436
Assumption 4: At every value of the dependent variable the expected (mean) value of the residuals is zero
![Page 437: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/437.jpg)
437
Linearity
• Relationships between variables should be linear
  – best represented by a straight line
• Not a very common problem in social sciences
  – except economics
  – measures are not sufficiently accurate to make a difference
    • R² too low
    • unlike, say, physics
![Page 438: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/438.jpg)
438
• Relationship between speed of travel and fuel used
[Figure: plot of Fuel against Speed]
![Page 439: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/439.jpg)
439
• R² = 0.938
  – looks pretty good
  – know speed, make a good prediction of fuel
• BUT
  – look at the chart
  – if we know speed we can make a perfect prediction of fuel used
  – R² should be 1.00
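The same thing can be reproduced with made-up numbers. This sketch assumes an exactly quadratic relationship (fuel = speed squared), which is not the slide's data, and fits a straight line to it. The function name is illustrative.

```python
def linear_fit(x, y):
    """Fit y = b0 + b1*x; return (R^2, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sum(e ** 2 for e in residuals) / sst
    return r2, residuals

speed = list(range(1, 11))
fuel = [s ** 2 for s in speed]  # assumed: fuel is fully determined by speed
r2, residuals = linear_fit(speed, fuel)
print(round(r2, 3))                   # 0.95
print([round(e) for e in residuals])  # [12, 4, -2, -6, -8, -8, -6, -2, 4, 12]
```

R² is high but not 1, and the residuals are positive at both ends and negative in the middle: the curved pattern a residual plot would show.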
![Page 440: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/440.jpg)
440
Detecting Non-Linearity
• Residual plot
  – just like heteroscedasticity
• Using this example
  – very, very obvious
  – usually pretty obvious
![Page 441: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/441.jpg)
441
Residual plot
![Page 442: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/442.jpg)
442
Linearity: A Case of Additivity
• Linearity = additivity along the range of the IV
• Jeremy rides his bicycle harder
  – Increase in speed depends on current speed
  – Not additive, multiplicative
  – MacCallum and Mar (1995). Distinguishing between moderator and quadratic effects in multiple regression. Psychological Bulletin.
![Page 443: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/443.jpg)
443
Assumption 5: The expected correlation between residuals, for any two cases, is 0.
The independence assumption (lack of autocorrelation)
![Page 444: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/444.jpg)
444
Independence Assumption
• Also: lack of autocorrelation
• Tricky one
  – often ignored
  – exists for almost all tests
• All cases should be independent of one another
  – knowing the value of one case should not tell you anything about the value of other cases
![Page 445: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/445.jpg)
445
How is it Detected?
• Can be difficult
  – need some clever statistics (multilevel models)
• Better off avoiding situations where it arises
• Residual Plots
• Durbin-Watson Test
![Page 446: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/446.jpg)
446
Residual Plots
• Were data collected in time order?
  – If so plot ID number against the residuals
  – Look for any pattern
    • Test for linear relationship
    • Non-linear relationship
    • Heteroscedasticity
![Page 447: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/447.jpg)
447
[Figure: scatterplot of Residual (-2 to 2) against Participant Number (0 to 40)]
![Page 448: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/448.jpg)
448
How does it arise?
Two main ways
• time-series analyses
  – When cases are time periods
    • weather on Tuesday and weather on Wednesday correlated
    • inflation 1972, inflation 1973 are correlated
• clusters of cases
  – patients treated by three doctors
  – children from different classes
  – people assessed in groups
![Page 449: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/449.jpg)
449
Why does it matter?
• Standard errors can be wrong
  – therefore significance tests can be wrong
• Parameter estimates can be wrong
  – really, really wrong
  – from positive to negative
• An example
  – students do an exam (on statistics)
  – choose one of three questions
    • IV: time
    • DV: grade
![Page 450: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/450.jpg)
450
[Figure: scatterplot of Grade (10 to 90) against Time (10 to 70)]
• Result, with line of best fit
![Page 451: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/451.jpg)
451
• Result shows that
  – people who spent longer in the exam, achieve better grades
• BUT …
  – we haven’t considered which question people answered
  – we might have violated the independence assumption
    • DV will be autocorrelated
• Look again
  – with questions marked
![Page 452: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/452.jpg)
452
• Now somewhat different
[Figure: scatterplot of Grade (10 to 90) against Time (10 to 70), with points marked by Question (1, 2, 3)]
![Page 453: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/453.jpg)
453
• Now, people that spent longer got lower grades
  – questions differed in difficulty
  – do a hard one, get better grade
  – if you can do it, you can do it quickly
• Very difficult to analyse well
  – need multilevel models
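The sign reversal in the exam example can be reproduced with a few made-up numbers. The (time, grade) pairs below are hypothetical, chosen so that the slope is negative within each question but positive when the questions are pooled.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical data: (time, grade) pairs for each of the three questions
questions = {
    1: [(10, 30), (20, 25), (30, 20)],
    2: [(30, 50), (40, 45), (50, 40)],
    3: [(50, 70), (60, 65), (70, 60)],
}
pooled = [pair for pairs in questions.values() for pair in pairs]
pooled_slope = slope(*zip(*pooled))
within_slopes = {q: slope(*zip(*pairs)) for q, pairs in questions.items()}
print(round(pooled_slope, 2))                              # 0.7
print({q: round(s, 2) for q, s in within_slopes.items()})  # {1: -0.5, 2: -0.5, 3: -0.5}
```

Ignoring the clustering by question gives a slope of the wrong sign, which is why a model that respects the grouping is needed.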
![Page 454: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/454.jpg)
454
Durbin Watson Test
• Not well implemented in SPSS
• Depends on the order of the data
  – Reorder the data, get a different result
• Doesn’t give statistical significance of the test
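The order dependence is easy to see from the usual definition of the statistic: the sum of squared differences between successive residuals, divided by the sum of squared residuals. A minimal sketch with hypothetical residuals:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(e ** 2 for e in residuals)

# The same six hypothetical residuals in two different orders
clustered = [1, 1, 1, -1, -1, -1]
alternating = [1, -1, 1, -1, 1, -1]
print(round(durbin_watson(clustered), 3))    # 0.667
print(round(durbin_watson(alternating), 3))  # 3.333
```

Values near 2 suggest no first-order autocorrelation; the same residuals give very different answers depending only on how the cases are sorted.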
![Page 455: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/455.jpg)
455
Assumption 6: All independent variables are uncorrelated with the error term.
![Page 456: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/456.jpg)
456
Uncorrelated with the Error Term
• A curious assumption
  – by definition, the residuals are uncorrelated with the independent variables (try it and see, if you like)
• It is about the DV– must have no effect (when the IVs
have been removed)– on the DV
![Page 457: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/457.jpg)
457
• Problem in economics
  – Demand increases supply
  – Supply increases wages
  – Higher wages increase demand
• OLS estimates will be (badly) biased in this case
  – need a different estimation procedure
  – two-stage least squares
    • simultaneous equation modelling
![Page 458: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/458.jpg)
458
Assumption 7: No independent variables are a perfect linear function of other independent variables
no perfect multicollinearity
![Page 459: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/459.jpg)
459
No Perfect Multicollinearity
• IVs must not be linear functions of one another
  – matrix of correlations of IVs is not positive definite
  – cannot be inverted
  – analysis cannot proceed
• Have seen this with
  – age, age start, time working
  – also occurs with subscale and total
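A minimal sketch of why the analysis cannot proceed, using hypothetical values for age start and time working. It uses the raw cross-product matrix of the three IVs rather than their correlation matrix, because integer arithmetic then makes the zero determinant exact; the same singularity appears in the correlation matrix.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical data
age_start = [20, 25, 22, 30, 27]
time_working = [5, 10, 3, 8, 12]
age = [s + t for s, t in zip(age_start, time_working)]  # exact linear function of the other two

cols = [age, age_start, time_working]
cross_products = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
print(det3(cross_products))  # 0, so the matrix has no inverse
```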
![Page 460: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/460.jpg)
460
• Large amounts of collinearity
  – a problem (as we shall see) sometimes
  – not an assumption
![Page 461: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/461.jpg)
461
Assumption 8: The mean of the error term is zero.
You will like this one.
![Page 462: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/462.jpg)
462
Mean of the Error Term = 0
• Mean of the residuals = 0
• That is what the constant is for
  – if the mean of the error term deviates from zero, the constant soaks it up

$$
Y = \beta_0 + \beta_1 x_1 + \varepsilon
$$
$$
Y = (\beta_0 + 3) + \beta_1 x_1 + (\varepsilon - 3)
$$

  – note, Greek letters because we are talking about population values
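A small numerical check of the constant soaking up the error mean. The data are hypothetical: the same errors are used twice, once with mean zero and once shifted so their mean is 3.

```python
def fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

x = [1, 2, 3, 4, 5]
errors = [0.5, -1.0, 0.0, 1.0, -0.5]             # hypothetical, mean zero
y = [2 + 1.5 * xi + e for xi, e in zip(x, errors)]
y_shifted = [yi + 3 for yi in y]                 # same errors, now with mean 3

b0, b1 = fit(x, y)
b0_s, b1_s = fit(x, y_shifted)
print(round(b0_s - b0, 6), round(b1_s - b1, 6))  # 3.0 0.0
```

The slope is untouched; the whole shift ends up in the constant.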
![Page 463: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/463.jpg)
463
• Can do regression without the constant
  – Usually a bad idea
  – E.g. R² = 0.995, p < 0.001
    • Looks good
![Page 464: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/464.jpg)
464
[Figure: scatterplot of y (7 to 13) against x1 (6 to 13)]
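Why R² can look so good without a constant: when the line is forced through the origin, R² is usually computed against zero rather than against the mean of y. A minimal sketch with hypothetical data in which x and y are almost unrelated:

```python
# Hypothetical data: y hovers around 10 with no real dependence on x
x = [6, 7, 8, 9, 10, 11, 12, 13]
y = [10, 9, 12, 8, 11, 9, 10, 11]
n = len(x)

# Regression through the origin: y = b * x
b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
sse_origin = sum((c - b * a) ** 2 for a, c in zip(x, y))
r2_origin = 1 - sse_origin / sum(c * c for c in y)  # variation measured about zero

# Ordinary regression with a constant
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
sse = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y))
r2_const = 1 - sse / sum((c - my) ** 2 for c in y)  # variation measured about the mean

print(round(r2_origin, 3), round(r2_const, 3))
```

The first figure is high only because y is far from zero; the second shows how little x actually explains.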
![Page 465: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/465.jpg)
465
![Page 466: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/466.jpg)
466
Lesson 9: Issues in Regression Analysis
Things that alter the interpretation of the regression equation
![Page 467: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/467.jpg)
467
The Four Issues
• Causality
• Sample sizes
• Collinearity
• Measurement error
![Page 468: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/468.jpg)
468
Causality
![Page 469: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/469.jpg)
469
What is a Cause?
• Debate about definition of cause
  – some statistics (and philosophy) books try to avoid it completely
  – We are not going into depth
    • just going to show why it is hard
• Two dimensions of cause
  – Ultimate versus proximal cause
  – Determinate versus probabilistic
![Page 470: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/470.jpg)
470
Proximal versus Ultimate
• Why am I here?
  – I walked here because
  – This is the location of the class because
  – Eric Tanenbaum asked me because
  – (I don’t know)
  – because I was in my office when he rang because
  – I am a lecturer at York because
  – I saw an advert in the paper because
![Page 471: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/471.jpg)
471
  – I exist because
  – My parents met because
  – My father had a job …
• Proximal cause
  – the direct and immediate cause of something
• Ultimate cause
  – the thing that started the process off
  – I fell off my bicycle because of the bump
  – I fell off because I was going too fast
![Page 472: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/472.jpg)
472
Determinate versus Probabilistic Cause
• Why did I fall off my bicycle?
  – I was going too fast
  – But I don’t fall off every time I ride too fast
  – Probabilistic cause
• Why did my tyre go flat?
  – A nail was stuck in my tyre
  – Every time a nail sticks in my tyre, the tyre goes flat
  – Deterministic cause
![Page 473: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/473.jpg)
473
• Can get into trouble by mixing them together
  – Eating deep fried Mars Bars and doing no exercise are causes of heart disease
– “My Grandad ate three deep fried Mars Bars every day, and the most exercise he ever got was when he walked to the shop next door to buy one”
– (Deliberately?) confusing deterministic and probabilistic causes
![Page 474: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/474.jpg)
474
Criteria for Causation
• Association
• Direction of Influence
• Isolation
![Page 475: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/475.jpg)
475
Association
• Correlation does not mean causation
  – we all know
• But
  – Causation does mean correlation
• Need to show that two things are related
  – may be correlation
  – may be regression when controlling for third (or more) factor
![Page 476: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/476.jpg)
476
• Relationship between price and sales
  – suppliers may be cunning
  – when people want it more
    • stick the price up

         Price  Demand  Sales
Price    1      0.6     0
Demand   0.6    1       0.6
Sales    0      0.6     1

  – So – no relationship between price and sales
![Page 477: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/477.jpg)
477
– Until (of course) we control for demand
– b1 (Price) = -0.56
– b2 (Demand) = 0.94
• But which variables do we enter?
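The two slopes can be checked by hand from the correlations in the price/demand/sales example. The sketch below assumes the reported b1 and b2 are standardised slopes from a two-predictor regression of sales on price and demand; the function name is made up for illustration.

```python
def standardised_betas(r_y1, r_y2, r_12):
    """Standardised slopes for a regression with two predictors.

    r_y1, r_y2: correlations of the outcome with predictor 1 and predictor 2
    r_12:       correlation between the two predictors
    """
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2


# Correlations from the example: sales-price 0, sales-demand 0.6, price-demand 0.6
b_price, b_demand = standardised_betas(r_y1=0.0, r_y2=0.6, r_12=0.6)
print(round(b_price, 2), round(b_demand, 2))
```

Price has a zero correlation with sales, yet a clearly negative slope once demand is in the model.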
![Page 478: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/478.jpg)
478
Direction of Influence
• Relationship between A and B
  – three possible processes
    • A → B: A causes B
    • B → A: B causes A
    • C → A and C → B: C causes A & B
![Page 479: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/479.jpg)
479
• How do we establish the direction of influence?
  – Longitudinally?
    • [Diagram: Barometer Drops → Storm]
  – Now if we could just get that barometer needle to stay where it is …
• Where the role of theory comes in (more on this later)
![Page 480: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/480.jpg)
480
Isolation
• Isolate the dependent variable from all other influences
  – as experimenters try to do
• Cannot do this
  – can statistically isolate the effect
  – using multiple regression
![Page 481: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/481.jpg)
481
Role of Theory
• Strong theory is crucial to making causal statements
• Fisher said: to make causal statements “make your theories elaborate.”
  – don’t rely purely on statistical analysis
• Need strong theory to guide analyses
  – what critics of non-experimental research don’t understand
![Page 482: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/482.jpg)
482
• S.J. Gould – a critic
  – says correlate price of petrol and his age, for the last 10 years
  – find a correlation
  – Ha! (He says) that doesn’t mean there is a causal link
  – Of course not! (We say).
• No social scientist would do that analysis without first thinking (very hard) about the possible causal relations between the variables of interest
• Would control for time, prices, etc …
![Page 483: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/483.jpg)
483
• Atkinson, et al. (1996)
  – relationship between college grades and number of hours worked
  – negative correlation
  – Need to control for other variables – ability, intelligence
• Gould says “Most correlations are non-causal” (1982, p243)
  – Of course!!!!
![Page 484: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/484.jpg)
484
[Figure: “I drink a lot of beer” at the centre, with 16 causal relations to: karaoke, jokes (about statistics), vomit, toilet, headache, sleeping, equations (beermat), laugh, thirsty, fried breakfast, no beer, curry, chips, falling over, lose keys, curtains closed. Among these there are 120 non-causal correlations.]
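The count of 120 is consistent with every pair among the 16 consequences being correlated with each other through their common cause. That reading is an assumption about the figure, not something the slide states; a quick check:

```python
from math import comb

n_consequences = 16  # the 16 things that drinking beer causes in the figure

# Each pair of consequences shares a common cause, so each pair is correlated
# without either member of the pair causing the other.
non_causal_pairs = comb(n_consequences, 2)
print(non_causal_pairs)
```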
![Page 485: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/485.jpg)
485
• Abelson (1995) elaborates on this
  – ‘method of signatures’
• A collection of correlations relating to the process
  – the ‘signature’ of the process
• e.g. tobacco smoking and lung cancer
  – can we account for all of these findings with any other theory?
![Page 486: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/486.jpg)
486
1. The longer a person has smoked cigarettes, the greater the risk of cancer.
2. The more cigarettes a person smokes over a given time period, the greater the risk of cancer.
3. People who stop smoking have lower cancer rates than do those who keep smoking.
4. Smoker’s cancers tend to occur in the lungs, and be of a particular type.
5. Smokers have elevated rates of other diseases.
6. People who smoke cigars or pipes, and do not usually inhale, have abnormally high rates of lip cancer.
7. Smokers of filter-tipped cigarettes have lower cancer rates than other cigarette smokers.
8. Non-smokers who live with smokers have elevated cancer rates.
(Abelson, 1995: 183-184)
![Page 487: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/487.jpg)
487
– In addition, there should be no anomalous correlations
  • If smokers had more fallen arches than non-smokers, that would not be consistent with the theory
• Failure to use theory to select appropriate variables
  – specification error
  – e.g. in previous example
  – Predict wealth from price and sales
    • increase price, wealth increases
    • increase sales, wealth increases
![Page 488: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/488.jpg)
488
• Sometimes these are indicators of the process
  – e.g. barometer – stopping the needle won’t help
  – e.g. inflation? Indicator or cause?
![Page 489: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/489.jpg)
489
No Causation without Experimentation
• Blatantly untrue
  – I don’t doubt that the sun shining makes us warm
• Why the aversion?
  – Pearl (2000) says problem is no mathematical operator
  – No one realised that you needed one
  – Until you build a robot
![Page 490: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/490.jpg)
490
AI and Causality
• A robot needs to make judgements about causality
• Needs to have a mathematical representation of causality
  – Suddenly, a problem!
  – Doesn’t exist
    • Most operators are non-directional
    • Causality is directional
![Page 491: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/491.jpg)
491
Sample Sizes
“How many subjects does it take to run a regression analysis?”
![Page 492: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/492.jpg)
492
Introduction
• Social scientists don’t worry enough about the sample size required
  – “Why didn’t you get a significant result?”
  – “I didn’t have a large enough sample”
    • Not a common answer
• More recently awareness of sample size is increasing
  – use too few – no point doing the research
  – use too many – waste their time
![Page 493: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/493.jpg)
493
• Research funding bodies
• Ethical review panels
  – both become more interested in sample size calculations
• We will look at two approaches
  – Rules of thumb (quite quickly)
  – Power Analysis (more slowly)
![Page 494: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/494.jpg)
494
Rules of Thumb
• Lots of simple rules of thumb exist
  – 10 cases per IV
  – >100 cases
  – Green (1991) more sophisticated (k = number of IVs)
    • To test significance of R²: N = 50 + 8k
    • To test significance of slopes: N = 104 + k
• Rules of thumb don’t take into account all the information that we have
  – Power analysis does
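Green's two rules are simple enough to wrap in a helper. A minimal sketch, where the function name and the returned keys are made up for illustration and k is the number of IVs:

```python
def green_minimum_n(k):
    """Green (1991) rule-of-thumb sample sizes for a regression with k IVs."""
    return {
        "r_squared": 50 + 8 * k,  # testing the significance of R-squared
        "slopes": 104 + k,        # testing the significance of individual slopes
    }


# Example: a model with 5 predictors
print(green_minimum_n(5))
```

The rules use only k; they take no account of the expected effect size or the desired power.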
![Page 495: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/495.jpg)
495
Power Analysis
Introducing Power Analysis
• Hypothesis test
  – tells us the probability of a result of that magnitude occurring, if the null hypothesis is correct (i.e. there is no effect in the population)
• Doesn’t tell us
  – the probability of that result, if the null hypothesis is false
![Page 496: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/496.jpg)
496
• According to Cohen (1982) all null hypotheses are false
  – everything that might have an effect, does have an effect
    • it is just that the effect is often very tiny
![Page 497: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/497.jpg)
497
Type I Errors
• Type I error is false rejection of H0
• Probability of making a type I error
  – α – the significance value cut-off
    • usually 0.05 (by convention)
• Always this value
• Not affected by
  – sample size
  – type of test
![Page 498: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/498.jpg)
498
Type II errors
• Type II error is false acceptance of the null hypothesis
  – Much, much trickier
• We think we have some idea
  – we almost certainly don’t
• Example
  – I do an experiment (random sampling, all assumptions perfectly satisfied)
  – I find p = 0.05
![Page 499: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/499.jpg)
499
  – You repeat the experiment exactly
    • different random sample from same population
  – What is probability you will find p < 0.05?
  – ………………
  – Another experiment, I find p = 0.01
  – Probability you find p < 0.05?
  – ………………
• Very hard to work out
  – not intuitive
  – need to understand non-central sampling distributions (more in a minute)
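One way to get a feel for the replication question is a rough calculation under strong assumptions: a two-sided z test, and a true effect exactly equal to the effect observed in the first experiment. Neither assumption comes from the slides, and the function name is made up for illustration.

```python
from statistics import NormalDist


def replication_probability(p_observed, alpha=0.05):
    """Chance that an exact replication reaches p < alpha.

    Assumes a two-sided z test and that the true effect equals the
    effect observed in the first experiment (a strong assumption).
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # The replication's z statistic is Normal(z_observed, 1):
    # a sampling distribution centred away from zero.
    upper = 1 - std_normal.cdf(z_critical - z_observed)
    lower = std_normal.cdf(-z_critical - z_observed)
    return upper + lower


for p in (0.05, 0.01):
    print(p, round(replication_probability(p), 2))
```

Under these assumptions the chance of replicating a p = 0.05 result is close to a coin toss, which is why intuition is a poor guide here.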
![Page 500: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/500.jpg)
500
• Probability of type II error = beta (β)
  – same as population regression parameter (to be confusing)
• Power = 1 – β
  – Probability of getting a significant result (when there is an effect to be found)
![Page 501: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/501.jpg)
501
| Research Findings | State of the World: H0 true (no effect to be found) | State of the World: H0 false (effect to be found) |
|---|---|---|
| H0 true (we find no effect – p > 0.05) | | Type II error: p = β, power = 1 – β |
| H0 false (we find an effect – p < 0.05) | Type I error: p = α | |
![Page 502: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/502.jpg)
502
• Four parameters in power analysis
  – α – prob. of Type I error
  – β – prob. of Type II error (power = 1 – β)
  – Effect size – size of effect in population
  – N
• Know any three, can calculate the fourth
  – Look at them one at a time
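To illustrate "know any three, calculate the fourth", here is a sketch for the simplest case, a single correlation, using the Fisher z normal approximation. This is a swapped-in approximation chosen because it needs only the standard library; it is not the non-central F calculation that power programs use for regression, and the function names are made up.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r (r != 0)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided cut-off
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


def power_for_correlation(r, n, alpha=0.05):
    """Approximate power to detect a population correlation r with n cases."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(atanh(r)) * (n - 3) ** 0.5 - z_alpha)


n_needed = n_for_correlation(0.3)  # alpha, power and effect size known: solve for N
achieved = power_for_correlation(0.3, n_needed)  # alpha, effect size and N known: solve for power
print(n_needed, round(achieved, 3))
```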
![Page 503: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/503.jpg)
503
• α – Probability of Type I error
  – Usually set to 0.05
  – Somewhat arbitrary
    • sometimes adjusted because of circumstances
    • rarely because of power analysis
  – May want to adjust it, based on power analysis
![Page 504: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/504.jpg)
504
• β – Probability of type II error
  – Power (probability of finding a result) = 1 – β
  – Standard is 80%
    • Some argue for 90%
  – Implication (β = 0.20 against α = 0.05) that Type I error is 4 times more serious than type II error
    • adjust ratio with compromise power analysis
![Page 505: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/505.jpg)
505
• Effect size in the population
  – Most problematic to determine
  – Three ways
    1. What effect size would be useful to find?
       • R² = 0.01 – no use (probably)
    2. Base it on previous research
       – what have other people found?
    3. Use Cohen’s conventions
       – small R² = 0.02
       – medium R² = 0.13
       – large R² = 0.26
![Page 506: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/506.jpg)
506
  – Effect size usually measured as f²
  – For R²

$$f^2 = \frac{R^2}{1 - R^2}$$
![Page 507: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/507.jpg)
507
  – For (standardised) slopes

$$f^2 = \frac{sr_i^2}{1 - R^2}$$

  – Where sr_i² is the contribution to the variance accounted for by the variable of interest
  – i.e. sr_i² = R² (with variable) – R² (without)
    • change in R² in hierarchical regression
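Both versions of f² are one-line calculations. A minimal sketch, with made-up function names, applied to the R² conventions quoted earlier (0.02, 0.13, 0.26):

```python
def f2_from_r2(r2):
    """Effect size f-squared for the whole model."""
    return r2 / (1 - r2)


def f2_for_slope(sr2, r2_full):
    """Effect size f-squared for one predictor.

    sr2:     change in R-squared when the predictor is added
    r2_full: R-squared of the model that includes the predictor
    """
    return sr2 / (1 - r2_full)


cohen_r2 = {"small": 0.02, "medium": 0.13, "large": 0.26}
f2_values = {label: f2_from_r2(r2) for label, r2 in cohen_r2.items()}
for label, f2 in f2_values.items():
    print(label, round(f2, 2))
```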
![Page 508: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/508.jpg)
508
• N – the sample size
  – usually use other three parameters to determine this
  – sometimes adjust other parameters (α) based on this
  – e.g. You can have 50 participants. No more.
![Page 509: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/509.jpg)
509
Doing power analysis
• With power analysis program
  – SamplePower, GPower, Nquery
• With SPSS MANOVA
  – using non-central distribution functions
  – Uses MANOVA syntax
    • Relies on the fact you can do anything with MANOVA
    • Paper B4
![Page 510: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/510.jpg)
510
Underpowered Studies
• Research in the social sciences is often underpowered
  – Why?
  – See Paper B11 – “the persistence of underpowered studies”
![Page 511: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/511.jpg)
511
Extra Reading
• Power traditionally focuses on p values
  – What about CIs?
  – Paper B8 – “Obtaining regression coefficients that are accurate, not simply significant”
![Page 512: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/512.jpg)
512
Collinearity
![Page 513: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/513.jpg)
513
Collinearity as Issue and Assumption
• Collinearity (multicollinearity)
  – the extent to which the independent variables are (multiply) correlated
• If R² for any IV, using other IVs = 1.00
  – perfect collinearity
  – variable is linear sum of other variables
  – regression will not proceed
  – (SPSS will arbitrarily throw out a variable)
![Page 514: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/514.jpg)
514
• R² < 1.00, but high
  – other problems may arise
• Four things to look at in collinearity
  – meaning
  – implications
  – detection
  – actions
![Page 515: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/515.jpg)
515
Meaning of Collinearity
• Literally ‘co-linearity’
  – lying along the same line
• Perfect collinearity
  – when some IVs predict another
  – Total = S1 + S2 + S3 + S4
  – S1 = Total – (S2 + S3 + S4)
  – rare
![Page 516: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/516.jpg)
516
• Less than perfect
  – when some IVs are close to predicting another IV
  – correlations between IVs are high (usually, but not always)
![Page 517: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/517.jpg)
517
Implications
• Affects the stability of the parameter estimates
  – and so the standard errors of the parameter estimates
  – and so the significance
• Because
  – shared variance, which the regression procedure doesn’t know where to put
![Page 518: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/518.jpg)
518
• Red cars have more accidents than other coloured cars
  – because of the effect of being in a red car?
  – because of the kind of person that drives a red car?
    • we don’t know
  – No way to distinguish between these three:
    • Accidents = 1 × colour + 0 × person
    • Accidents = 0 × colour + 1 × person
    • Accidents = 0.5 × colour + 0.5 × person
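The point can be made concrete with a toy data set in which colour and person are perfectly collinear. The values below are made up for illustration; all three weightings then give identical predictions, so no data could favour one over the others.

```python
# Hypothetical data: 1 = red car / "red-car kind of person", 0 = otherwise.
# The two predictors are identical, i.e. perfectly collinear.
colour = [1, 1, 0, 0, 1, 0]
person = list(colour)

# The three candidate equations from the slide: (weight on colour, weight on person)
weightings = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

predictions = [
    [b_colour * c + b_person * p for c, p in zip(colour, person)]
    for b_colour, b_person in weightings
]

for weights, predicted in zip(weightings, predictions):
    print(weights, predicted)
```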
![Page 519: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/519.jpg)
519
• Sex differences
  – due to genetics?
  – due to upbringing?
  – (almost) perfect collinearity
    • statistically impossible to tell
![Page 520: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/520.jpg)
520
• When collinearity is less than perfect
  – increases variability of estimates between samples
  – estimates are unstable
  – reflected in the variances, and hence standard errors
![Page 521: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/521.jpg)
521
Detecting Collinearity
• Look at the parameter estimates
  – large standardised parameter estimates (>0.3?), which are not significant
    • be suspicious
• Run a series of regressions
  – each IV as DV
  – all other IVs as IVs
    • for each IV
![Page 522: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/522.jpg)
522
• Sounds like hard work?
  – SPSS does it for us!
• Ask for collinearity diagnostics
  – Tolerance – calculated for every IV

$$\text{Tolerance} = 1 - R^2$$

  – Variance Inflation Factor

$$\text{VIF} = \frac{1}{\text{Tolerance}}$$

    • the square root of the VIF is the amount by which the s.e. has been increased
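Given the R² from regressing one IV on all the other IVs, the diagnostics follow directly. A minimal sketch; the function name and the example R² of 0.75 are made up for illustration.

```python
from math import sqrt


def collinearity_diagnostics(r2_with_other_ivs):
    """Tolerance, VIF and s.e. inflation for one IV.

    r2_with_other_ivs: R-squared from regressing this IV on all the other IVs
    (must be below 1; perfect collinearity has no finite VIF).
    """
    tolerance = 1 - r2_with_other_ivs
    vif = 1 / tolerance
    se_inflation = sqrt(vif)  # factor by which the standard error grows
    return tolerance, vif, se_inflation


# Hypothetical IV that is 75% predictable from the other IVs
tolerance, vif, se_inflation = collinearity_diagnostics(0.75)
print(tolerance, vif, se_inflation)
```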
![Page 523: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/523.jpg)
523
Actions
What you can do about collinearity
“no quick fix” (Fox, 1991)
1. Get new data
   • avoids the problem
   • address the question in a different way
   • e.g. find people who have been raised as the ‘wrong’ gender
     • exist, but rare
   • Not a very useful suggestion
![Page 524: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/524.jpg)
524
2. Collect more data
   • not different data, more data
   • collinearity increases standard error (se)
   • se decreases as N increases
     • get a bigger N
3. Remove / Combine variables
   • If an IV correlates highly with other IVs
     • Not telling us much new
   • If you have two (or more) IVs which are very similar
     • e.g. 2 measures of depression, socio-economic status, achievement, etc
![Page 525: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/525.jpg)
525
   • sum them, average them, remove one
   • Many measures
     • use principal components analysis to reduce them
4. Use stepwise regression (or some flavour of)
   • See previous comments
   • Can be useful in theoretical vacuum
5. Ridge regression
   • not very useful
   • behaves weirdly
![Page 526: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/526.jpg)
526
Measurement Error
![Page 527: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/527.jpg)
527
What is Measurement Error
• In social science, it is unlikely that we measure any variable perfectly
  – measurement error represents this imperfection
• We assume that we have a true score
  – T
• A measure of that score
  – x
![Page 528: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/528.jpg)
528
$$x = \beta_T T + e$$

• just like a regression equation
  – standardise the parameters
  – β_T is the reliability
    • the amount of variance in x which comes from T
• but, like a regression equation
  – assume that e is random and has mean of zero
  – more on that later
![Page 529: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/529.jpg)
529
Simple Effects of Measurement Error
• Lowers the measured correlation
– between two variables
• Real correlation
  – true scores (x* and y*)
• Measured correlation
  – measured scores (x and y)
![Page 530: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/530.jpg)
530
[Path diagram: true scores x* and y* with true correlation r_x*y*; measured scores x and y, each with its own error term e; reliability of x is r_xx, reliability of y is r_yy; measured correlation of x and y is r_xy]
![Page 531: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/531.jpg)
531
• Attenuation of correlation

$$r_{xy} = r_{x^*y^*}\sqrt{r_{xx}\,r_{yy}}$$

• Attenuation corrected correlation

$$r_{x^*y^*} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}}$$
![Page 532: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/532.jpg)
532
• Example

$$r_{xx} = 0.7, \quad r_{yy} = 0.8, \quad r_{xy} = 0.3$$

$$r_{x^*y^*} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}} = \frac{0.3}{\sqrt{0.7 \times 0.8}} = 0.40$$
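The worked example can be reproduced in a few lines. The function names are made up for illustration; the numbers are the ones from the example.

```python
from math import sqrt


def attenuate(r_true, r_xx, r_yy):
    """Measured correlation expected given the true correlation and reliabilities."""
    return r_true * sqrt(r_xx * r_yy)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate of the true-score correlation from the measured correlation."""
    return r_xy / sqrt(r_xx * r_yy)


r_true = correct_for_attenuation(r_xy=0.3, r_xx=0.7, r_yy=0.8)
print(round(r_true, 2))

# Round trip: attenuating the corrected value recovers the measured correlation
print(round(attenuate(r_true, 0.7, 0.8), 2))
```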
![Page 533: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/533.jpg)
533
Complex Effects of Measurement Error
• Really horribly complex
• Measurement error reduces correlations
  – reduces estimate of β
  – reducing one estimate
    • increases others
  – because of effects of control
  – combined with effects of suppressor variables
  – exercise to examine this
![Page 534: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/534.jpg)
534
Dealing with Measurement Error
• Attenuation correction
  – very dangerous
  – not recommended
• Avoid in the first place
  – use reliable measures
  – don’t discard information
    • don’t categorise
    • Age: 10-20, 21-30, 31-40 …
![Page 535: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/535.jpg)
535
Complications
• Assume measurement error is
  – additive
  – linear
• Additive
  – e.g. weight – people may under-report / over-report at the extremes
• Linear
  – particularly the case when using proxy variables
![Page 536: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/536.jpg)
536
• e.g. proxy measures
  – Want to know effort on childcare, count number of children
    • 1st child is more effort than last
  – Want to know financial status, count income
    • 1st £10 much greater effect on financial status than the 1000th.
![Page 537: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/537.jpg)
537
Lesson 10: Non-Linear Analysis in Regression
![Page 538: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/538.jpg)
538
Introduction
• Non-linear effect occurs
  – when the effect of one independent variable
  – is not consistent across the range of the IV
• Assumption is violated
  – expected value of residuals = 0
  – no longer the case
![Page 539: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/539.jpg)
539
Some Examples
![Page 540: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/540.jpg)
540
[Figure: A Learning Curve. x-axis: Experience; y-axis: Skill]
![Page 541: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/541.jpg)
541
[Figure: Yerkes-Dodson Law of Arousal. x-axis: Arousal; y-axis: Performance]
![Page 542: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/542.jpg)
542
[Figure: Enthusiasm Levels over a Lesson on Regression. x-axis: Time (0 to 3.5); y-axis: from Suicidal to Enthusiastic]
![Page 543: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/543.jpg)
543
• Learning
  – line changed direction once
• Yerkes-Dodson
  – line changed direction once
• Enthusiasm
  – line changed direction twice
![Page 544: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/544.jpg)
544
Everything is Non-Linear
• Every relationship we look at is non-linear, for two reasons
  – Exam results cannot keep increasing with reading more books
    • Linear in the range we examine
  – For small departures from linearity
    • Cannot detect the difference
    • Non-parsimonious solution
![Page 545: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/545.jpg)
545
Non-Linear Transformations
![Page 546: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/546.jpg)
546
Bending the Line
• Non-linear regression is hard
  – We cheat, and linearise the data
    • Do linear regression

Transformations
• We need to transform the data
  – rather than estimating a curved line
    • which would be very difficult
    • may not work with OLS
  – we can take a straight line, and bend it
  – or take a curved line, and straighten it
    • back to linear (OLS) regression
![Page 547: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/547.jpg)
547
• We still do linear regression
  – Linear in the parameters
  – $Y = b_1 x + b_2 x^2 + \dots$
• Can do non-linear regression
  – Non-linear in the parameters
  – $Y = b_1 x + b_2^{x^2} + \dots$
• Much trickier
  – Statistical theory either breaks down OR becomes harder
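A curve that is linear in the parameters can be fitted by ordinary least squares once x² is added as an extra column. The sketch below solves the normal equations in plain Python; the helper names are made up, and the data are noise-free points on the curve y = 0 + 0.1x + 1x², so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_quadratic(xs, ys):
    """OLS fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    design = [[1.0, x, x * x] for x in xs]  # x squared is just another column
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1 * x + 1.0 * x * x for x in xs]  # noise-free points on the curve
b0, b1, b2 = fit_quadratic(xs, ys)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The model is curved in x but the estimation problem is still linear: the unknowns b0, b1, b2 enter only as multipliers.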
![Page 548: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/548.jpg)
548
• Linear transformations
  – multiply by a constant
  – add a constant
  – change the slope and the intercept
![Page 549: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/549.jpg)
549
[Figure: the lines y = x, y = 2x and y = x + 3 plotted on x and y axes]
![Page 550: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/550.jpg)
550
• Linear transformations are no use
  – alter the slope and intercept
  – don’t alter the standardised parameter estimate
• Non-linear transformation
  – will bend the slope
  – quadratic transformation: $y = x^2$
  – one change of direction
![Page 551: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/551.jpg)
551
  – Cubic transformation: $y = x^2 + x^3$
  – two changes of direction
![Page 552: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/552.jpg)
552
Quadratic Transformation
$y = 0 + 0.1x + 1x^2$
![Page 553: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/553.jpg)
553
Square Root Transformation
$y = 20 - 3x + 5\sqrt{x}$
![Page 554: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/554.jpg)
554
Cubic Transformation
[Figure: $y = 3 - 4x + 2x^2 - 0.2x^3$, plotted for x and y from 0 to 6]
![Page 555: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/555.jpg)
555
Logarithmic Transformation
y = 1 + 0.1x + 10log(x)
![Page 556: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/556.jpg)
556
Inverse Transformation
y = 20 -10x + 8(1/x)
![Page 557: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/557.jpg)
557
• To estimate a non-linear regression
  – we don’t actually estimate anything non-linear
  – we transform the x-variable to a non-linear version
  – can estimate that straight line
  – represents the curve
  – we don’t bend the line, we stretch the space around the line, and make it flat
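A minimal sketch of this idea in Python. The data and the helper name `simple_ols` are made up for illustration and are not from the course: y is generated from a curve in x, x is transformed to z = x², and an ordinary straight-line fit of y on z recovers the parameters of the curve.

```python
def simple_ols(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data lying exactly on the curve y = 2 + 3x^2
x = [0, 1, 2, 3, 4, 5]
y = [2 + 3 * xi ** 2 for xi in x]

# Transform the predictor, then fit a straight line to (z, y)
z = [xi ** 2 for xi in x]
intercept, slope = simple_ols(z, y)
print(intercept, slope)
```

The fit itself is still linear (OLS); only the x-variable has been changed.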
![Page 558: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/558.jpg)
558
Detecting Non-linearity
![Page 559: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/559.jpg)
559
Draw a Scatterplot
• Draw a scatterplot of y plotted against x
  – see if it looks a bit non-linear
  – e.g. Anscombe’s data
  – e.g. Education and beginning salary
    • from bank data
    • drawn in SPSS
    • with line of best fit
![Page 560: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/560.jpg)
560
• Anscombe (1973)
  – constructed a set of datasets
  – show the importance of graphs in regression/correlation
• For each dataset

| Statistic | Value |
|---|---|
| N | 11 |
| Mean of x | 9 |
| Mean of y | 7.5 |
| Equation of regression line | y = 3 + 0.5x |
| Sum of squares (X - mean) | 110 |
| Correlation coefficient | 0.82 |
| $R^2$ | 0.67 |
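These summary statistics can be checked with a short Python sketch. The values below are the first two of Anscombe's published datasets as usually tabulated; they are not printed on the slides, and the helper name `summarise` is only illustrative.

```python
import math


def summarise(x, y):
    """Means, sum of squares of x, OLS line and correlation for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "ss_x": sxx,
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
        "r": sxy / math.sqrt(sxx * syy),
    }


# Anscombe (1973), datasets I and II (they share the same x values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

stats1 = summarise(x, y1)
stats2 = summarise(x, y2)
for name, stats in (("I", stats1), ("II", stats2)):
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Both datasets give almost the same numbers, although dataset II follows a clear curve when plotted.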
![Page 561: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/561.jpg)
561
![Page 562: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/562.jpg)
562
![Page 563: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/563.jpg)
563
![Page 564: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/564.jpg)
564
![Page 565: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/565.jpg)
565
A Real Example
• Starting salary and years of education– From employee data.sav
![Page 566: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/566.jpg)
566
[Figure: scatterplot of Beginning Salary against Educational Level (years) with line of best fit; annotated with the regions where the expected value of the error (residual) is > 0 and where it is < 0]
![Page 567: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/567.jpg)
567
Use Residual Plot
• Scatterplot is only good for one variable
  – use the residual plot (that we used for heteroscedasticity)
• Good for many variables
![Page 568: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/568.jpg)
568
• We want
  – points to lie in a nice straight sausage
![Page 569: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/569.jpg)
569
• We don’t want
  – a nasty bent sausage
![Page 570: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/570.jpg)
570
[Figure: residual plot for educational level and starting salary]
![Page 571: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/571.jpg)
571
Carrying Out Non-Linear Regression
![Page 572: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/572.jpg)
572
Linear Transformation
• Linear transformation doesn’t change
  – interpretation of slope
  – standardised slope
  – se, t, or p of slope
  – $R^2$
• Can change
  – effect of a transformation
![Page 573: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/573.jpg)
573
• Actually more complex
  – with some transformations can add a constant with no effect (e.g. quadratic)
• With others does have an effect
  – inverse, log
• Sometimes it is necessary to add a constant
  – negative numbers have no square root
  – 0 has no log
![Page 574: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/574.jpg)
574
Education and Salary
Linear Regression
• Saw previously that the assumption of expected errors = 0 was violated
• Anyway …
  – $R^2$ = 0.401, F = 315, df = 1, 472, p < 0.001
  – salbegin = -6290 + 1727 educ
  – Standardised
    • b1 (educ) = 0.633
  – Both parameters make sense
![Page 575: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/575.jpg)
575
Non-linear Effect
• Compute new variable
  – quadratic
  – educ2 = educ$^2$
• Add this variable to the equation
  – $R^2$ = 0.585, p < 0.001
  – salbegin = 46263 + -6542 educ + 310 educ2
    • slightly curious
  – Standardised
    • b1 (educ) = -2.4
    • b2 (educ2) = 3.1
  – What is going on?
![Page 576: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/576.jpg)
576
• Collinearity
  – is what is going on
  – Correlation of educ and educ2
    • r = 0.990
  – Regression equation becomes difficult (impossible?) to interpret
• Need hierarchical regression
  – what is the change in $R^2$
  – is that change significant?
  – $R^2$ (change) = 0.184, p < 0.001
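Both points can be sketched in a few lines of Python. The education values below are hypothetical (whole years from 8 to 21), so the correlation will not be exactly the 0.990 found in the real data. The F statistic for the change uses the standard hierarchical-regression formula with the $R^2$ values above, and N = 474 is inferred from the df of 1, 472 in the linear model.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical years of education: a variable and its square are almost collinear
educ = list(range(8, 22))
educ2 = [e ** 2 for e in educ]
r = pearson_r(educ, educ2)


def f_change(r2_full, r2_reduced, n, k_full, k_added=1):
    """F test for the change in R^2 when k_added predictors are added."""
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df2)
    return f, df2


f, df2 = f_change(r2_full=0.585, r2_reduced=0.401, n=474, k_full=2)
print(round(r, 3), round(f, 1), df2)
```

An F this large on 1 and 471 df is consistent with the p < 0.001 reported for the change.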
![Page 577: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/577.jpg)
577
Cubic Effect
• While we are at it, let’s look at the cubic effect
  – $R^2$ (change) = 0.004, p = 0.045
  – $19138 + 103e - 206e^2 + 12e^3$
  – Standardised:
    • b1 (e) = 0.04
    • b2 (e$^2$) = -2.04
    • b3 (e$^3$) = 2.71
![Page 578: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/578.jpg)
578
Fourth Power
• Keep going while we are ahead
  – won’t run
    • ???
• Collinearity is the culprit
  – Tolerance (educ4) = 0.000005
  – VIF = 215555
• Matrix of correlations of IVs is not positive definite
  – cannot be inverted
![Page 579: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/579.jpg)
579
Interpretation
• Tricky, given that parameter estimates are a bit nonsensical
• Two methods
• 1: Use $R^2$ change
  – Save predicted values
    • or calculate predicted values to plot line of best fit
  – Save them from equation
  – Plot against IV
![Page 580: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/580.jpg)
580
[Figure: Beginning Salary (0 to 50000) plotted against Education (Years, 8 to 22), showing the fitted Linear, Quadratic and Cubic curves]
![Page 581: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/581.jpg)
581
• Differentiate with respect to e
• We said: $s = 19138 + 103e - 206e^2 + 12e^3$
  – but first we will simplify it to the quadratic: $s = 46263 - 6542e + 310e^2$
• $ds/de = -6542 + 310 \times 2 \times e$
![Page 582: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/582.jpg)
582
| Education | Slope |
|---|---|
| 9 | -962 |
| 10 | -342 |
| 11 | 278 |
| 12 | 898 |
| 13 | 1518 |
| 14 | 2138 |
| 15 | 2758 |
| 16 | 3378 |
| 17 | 3998 |
| 18 | 4618 |
| 19 | 5238 |
| 20 | 5858 |

1 year of education at the higher end of the scale is better than 1 year at the lower end of the scale. MBA versus GCSE.
![Page 583: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/583.jpg)
583
• Differentiate Cubic
  $s = 19138 + 103e - 206e^2 + 12e^3$
  $ds/de = 103 - 206 \times 2 \times e + 12 \times 3 \times e^2$
• Can calculate slopes for quadratic and cubic at different values
![Page 584: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/584.jpg)
584
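| Education | Slope (Quad) | Slope (Cub) |
|---|---|---|
| 9 | -962 | -689 |
| 10 | -342 | -417 |
| 11 | 278 | -73 |
| 12 | 898 | 343 |
| 13 | 1518 | 831 |
| 14 | 2138 | 1391 |
| 15 | 2758 | 2023 |
| 16 | 3378 | 2727 |
| 17 | 3998 | 3503 |
| 18 | 4618 | 4351 |
| 19 | 5238 | 5271 |
| 20 | 5858 | 6263 |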
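The slopes can be reproduced directly from the two derivatives. A short Python sketch, using the rounded coefficients quoted on the slides (the function names are only illustrative):

```python
def quadratic_slope(e):
    """Derivative of s = 46263 - 6542e + 310e^2 with respect to e."""
    return -6542 + 2 * 310 * e


def cubic_slope(e):
    """Derivative of s = 19138 + 103e - 206e^2 + 12e^3 with respect to e."""
    return 103 - 2 * 206 * e + 3 * 12 * e ** 2


# Slope of predicted starting salary at each year of education
for e in range(9, 21):
    print(e, quadratic_slope(e), cubic_slope(e))
```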
![Page 585: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/585.jpg)
585
A Quick Note on Differentiation
• For $y = x^p$
  – $dy/dx = p x^{p-1}$
• For equations such as $y = b_1 x + b_2 x^p$
  – $dy/dx = b_1 + b_2 p x^{p-1}$
• $y = 3x + 4x^2$
  – $dy/dx = 3 + 4 \cdot 2x$
![Page 586: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/586.jpg)
586
• $y = b_1 x + b_2 x^2 + b_3 x^3$
  – $dy/dx = b_1 + b_2 \cdot 2x + b_3 \cdot 3 \cdot x^2$
• $y = 4x + 5x^2 + 6x^3$
  – $dy/dx = 4 + 5 \cdot 2 \cdot x + 6 \cdot 3 \cdot x^2$
• Many functions are simple to differentiate
  – Not all though
![Page 587: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/587.jpg)
587
Automatic Differentiation
• If you
  – Don’t know how to differentiate
  – Can’t be bothered to look up the function
• Can use automatic differentiation software
  – e.g. GRAD (freeware)
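To give a feel for what such software does, here is a minimal sketch of forward-mode automatic differentiation using dual numbers, in plain Python. It is not how GRAD works internally (that is not described here); the class `Dual` and the helper `derivative` are illustrative names, and only addition and multiplication are supported.

```python
class Dual:
    """A number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _wrap(other):
        # Treat plain numbers as constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._wrap(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._wrap(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).deriv


def y(x):
    # The worked example from the differentiation note: y = 4x + 5x^2 + 6x^3
    return 4 * x + 5 * x * x + 6 * x * x * x


slope_at_2 = derivative(y, 2.0)
print(slope_at_2)  # agrees with 4 + 5*2*x + 6*3*x^2 at x = 2
```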
![Page 588: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/588.jpg)
588
![Page 589: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/589.jpg)
589
Lesson 11: Logistic Regression
Dichotomous/Nominal Dependent Variables
![Page 590: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/590.jpg)
590
Introduction
• Often in social sciences, we have a dichotomous/nominal DV
  – we will look at dichotomous first, then a quick look at multinomial
• Dichotomous DV
• e.g.
  – guilty/not guilty
  – pass/fail
  – won/lost
  – Alive/dead (used in medicine)
![Page 591: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/591.jpg)
591
Why Won’t OLS Do?
![Page 592: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/592.jpg)
592
Example: Passing a Test
• Test for bus drivers
  – pass/fail
  – we might be interested in degrees of pass/fail
    • a company which trains them will not
    • fail means ‘pay for them to take it again’
• Develop a selection procedure
  – Two predictor variables
  – Score – Score on an aptitude test
  – Exp – Relevant prior experience (months)
![Page 593: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/593.jpg)
593
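• 1st ten cases

| Score | Exp | Pass |
|---|---|---|
| 5 | 6 | 0 |
| 1 | 15 | 0 |
| 1 | 12 | 0 |
| 4 | 6 | 0 |
| 1 | 15 | 1 |
| 1 | 6 | 0 |
| 4 | 16 | 1 |
| 1 | 10 | 1 |
| 3 | 12 | 0 |
| 4 | 26 | 1 |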
![Page 594: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/594.jpg)
594
• DV – pass (1 = Yes, 0 = No)
• Just consider score first
  – Carry out regression
  – Score as IV, Pass as DV
  – $R^2$ = 0.097, F = 4.1, df = 1, 48, p = 0.028
  – b0 = 0.190
  – b1 = 0.110, p = 0.028
• Seems OK
![Page 595: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/595.jpg)
595
• Or does it? …
• 1st Problem – pp plot of residuals
[Figure: P-P plot of residuals, Expected Cum Prob against Observed Cum Prob, both axes from 0.00 to 1.00]
![Page 596: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/596.jpg)
596
• 2nd problem - residual plot
![Page 597: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/597.jpg)
597
• Problems 1 and 2
  – strange distributions of residuals
  – parameter estimates may be wrong
  – standard errors will certainly be wrong
![Page 598: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/598.jpg)
598
• 3rd problem – interpretation
  – I score 2 on aptitude.
  – Pass = 0.190 + 0.110 × 2 = 0.41
  – I score 8 on the test
  – Pass = 0.190 + 0.110 × 8 = 1.07
• Seems OK, but
  – What does it mean?
  – Cannot score 0.41 or 1.07
    • can only score 0 or 1
• Cannot be interpreted
  – need a different approach
![Page 599: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/599.jpg)
599
A Different Approach
Logistic Regression
![Page 600: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/600.jpg)
600
Logit Transformation
• In lesson 10, transformed IVs
  – now transform the DV
• Need a transformation which gives us
  – graduated scores (between 0 and 1)
  – No upper limit
    • we can’t predict someone will pass twice
  – No lower limit
    • you can’t do worse than fail
![Page 601: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/601.jpg)
601
Step 1: Convert to Probability
• First, stop talking about values
  – talk about probability
  – for each value of score, calculate probability of pass
• Solves the problem of graduated scales
![Page 602: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/602.jpg)
602
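| Score | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Fail: N | 7 | 5 | 6 | 4 | 2 |
| Fail: P | 0.7 | 0.5 | 0.6 | 0.4 | 0.2 |
| Pass: N | 3 | 5 | 4 | 6 | 8 |
| Pass: P | 0.3 | 0.5 | 0.4 | 0.6 | 0.8 |

• probability of failure given a score of 1 is 0.7
• probability of passing given a score of 5 is 0.8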
![Page 603: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/603.jpg)
603
This is better
• Now a score of 0.41 has a meaning
  – a 0.41 probability of pass
• But a score of 1.07 has no meaning
  – cannot have a probability > 1 (or < 0)
  – Need another transformation
![Page 604: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/604.jpg)
604
Step 2: Convert to Odds-Ratio
Need to remove upper limit
• Convert to odds
• Odds, as used by betting shops
  – 5:1, 1:2
• Slightly different from odds in speech
  – a 1 in 2 chance
  – odds are 1:1 (evens)
  – 50%
![Page 605: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/605.jpg)
605
• Odds ratio = (number of times it happened) / (number of times it didn’t happen)

$$\text{odds ratio} = \frac{p(\text{event})}{p(\text{not event})} = \frac{p(\text{event})}{1 - p(\text{event})}$$
![Page 606: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/606.jpg)
606
• p = 0.8: odds = 0.8/0.2 = 4
  – equivalent to 4:1 (odds on)
  – 4 times out of five
• p = 0.2: odds = 0.2/0.8 = 0.25
  – equivalent to 1:4 (4:1 against)
  – 1 time out of five
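The conversion in both directions is a one-liner. A small Python sketch (the function names are illustrative):

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def probability(o):
    """Probability that corresponds to odds o: o / (1 + o)."""
    return o / (1 + o)


for p in (0.8, 0.5, 0.2):
    print(p, round(odds(p), 2))
```

Probabilities are bounded at 1, but the corresponding odds have no upper limit.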
![Page 607: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/607.jpg)
607
• Now we have solved the upper bound problem
  – we can interpret 1.07, 2.07, 1000000.07
• But we still have the zero problem
  – we cannot interpret predicted scores less than zero
![Page 608: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/608.jpg)
608
Step 3: The Log
• $\log_{10}$ of a number (x)
  $10^{\log_{10}(x)} = x$
• log(10) = 1
• log(100) = 2
• log(1000) = 3
![Page 609: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/609.jpg)
609
• log(1) = 0
• log(0.1) = -1
• log(0.00001) = -5
![Page 610: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/610.jpg)
610
Natural Logs and e
• Don’t use $\log_{10}$
  – Use $\log_e$
• Natural log, ln
• Has some desirable properties, that $\log_{10}$ doesn’t
  – For us
  – If y = ln(x) + c
  – dy/dx = 1/x
  – Not true for any other logarithm
![Page 611: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/611.jpg)
611
• Be careful – calculators and stats packages are not consistent when they use log
  – Sometimes $\log_{10}$, sometimes $\log_e$
  – Can prove embarrassing (a friend told me)
![Page 612: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/612.jpg)
612
Take the natural log of the odds ratio
• Goes from $-\infty$ to $+\infty$
  – can interpret any predicted value
![Page 613: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/613.jpg)
613
Putting them all together
• Logit transformation
  – log-odds ratio
  – not bounded at zero or one
![Page 614: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/614.jpg)
614
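| Score | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Fail: N | 7 | 5 | 6 | 4 | 2 |
| Fail: P | 0.7 | 0.5 | 0.6 | 0.4 | 0.2 |
| Pass: N | 3 | 5 | 4 | 6 | 8 |
| Pass: P | 0.3 | 0.5 | 0.4 | 0.6 | 0.8 |
| Odds (Fail) | 2.33 | 1.00 | 1.50 | 0.67 | 0.25 |
| log(odds) fail | 0.85 | 0.00 | 0.41 | -0.41 | -1.39 |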
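The last two rows of the table follow from the counts. A short Python sketch that recomputes them, using the natural log, and then maps each logit back to a probability (the variable and function names are illustrative):

```python
import math

scores = [1, 2, 3, 4, 5]
fail = [7, 5, 6, 4, 2]     # number failing at each score
passed = [3, 5, 4, 6, 8]   # number passing at each score


def inverse_logit(logit):
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-logit))


rows = []
for s, f, p in zip(scores, fail, passed):
    odds_fail = f / p                # odds = times it happened / times it didn't
    logit_fail = math.log(odds_fail)  # natural log of the odds
    rows.append((s, odds_fail, logit_fail, inverse_logit(logit_fail)))

for s, odds_fail, logit_fail, p_fail in rows:
    print(s, round(odds_fail, 2), round(logit_fail, 2), round(p_fail, 2))
```

The final column recovers the Fail: P row, which shows that the logit is just the probability on a different, unbounded scale.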
![Page 615: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/615.jpg)
615
[Figure: probability (0 to 1) plotted against logit (-3.5 to 3.5). Probability gets closer to zero, but never reaches it, as logit goes down.]
![Page 616: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/616.jpg)
616
• Hooray! Problem solved, lesson over
  – errrmmm… almost
• Because we are now using log-odds ratio, we can’t use OLS
  – we need a new technique, called Maximum Likelihood (ML) to estimate the parameters
![Page 617: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/617.jpg)
617
Parameter Estimation using ML
ML tries to find estimates of model parameters that are most likely to give rise to the pattern of observations in the sample data
• All gets a bit complicated
  – OLS is a special case of ML
  – the mean is an ML estimator
![Page 618: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/618.jpg)
618
• Don’t have closed form equations
  – must be solved iteratively
  – estimates parameters that are most likely to give rise to the patterns observed in the data
  – by maximising the likelihood function (LF)
• We aren’t going to worry about this
  – except to note that sometimes, the estimates do not converge
    • ML cannot find a solution
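For the curious, a minimal sketch of the iteration in plain Python, using Newton-Raphson on the log-likelihood of a one-predictor logistic model. It assumes that the grouped counts from the score table (7, 5, 6, 4 and 2 events out of 10 at scores 1 to 5) are the data being modelled; SPSS's own algorithm and stopping rules are not described in the slides, so this is only an illustration of the principle.

```python
import math

scores = [1, 2, 3, 4, 5]
events = [7, 5, 6, 4, 2]      # assumed outcome counts at each score
trials = [10, 10, 10, 10, 10]


def predicted_p(b0, b1, x):
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def neg2_log_likelihood(b0, b1):
    total = 0.0
    for x, y, n in zip(scores, events, trials):
        p = predicted_p(b0, b1, x)
        total += y * math.log(p) + (n - y) * math.log(1 - p)
    return -2 * total


def fit_logistic(max_iter=25, tol=1e-10):
    b0, b1 = 0.0, 0.0  # starting values
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(scores, events, trials):
            p = predicted_p(b0, b1, x)
            g0 += y - n * p            # gradient with respect to b0
            g1 += (y - n * p) * x      # gradient with respect to b1
            w = n * p * (1 - p)        # weights in the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic()
neg2ll = neg2_log_likelihood(b0, b1)
print(round(b0, 3), round(b1, 3), round(neg2ll, 3))
```

Each pass improves the estimates until the steps become negligible; when the steps never shrink, the estimates have failed to converge.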
![Page 619: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/619.jpg)
619
Interpreting Output
Using SPSS
• Overall fit for:
  – step (only used for stepwise)
  – block (for hierarchical)
  – model (always)
  – in our model, all are the same
  – $\chi^2$ = 4.99, df = 1, p = 0.025
• F test
![Page 620: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/620.jpg)
620
Omnibus Tests of Model Coefficients
| Step 1 | Chi-square | df | Sig. |
|---|---|---|---|
| Step | 4.990 | 1 | .025 |
| Block | 4.990 | 1 | .025 |
| Model | 4.990 | 1 | .025 |
![Page 621: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/621.jpg)
621
• Model summary
  – -2LL (= $\chi^2$/N)
  – Cox & Snell $R^2$
  – Nagelkerke $R^2$
  – Different versions of $R^2$
    • No real $R^2$ in logistic regression
    • should be considered ‘pseudo $R^2$’
![Page 622: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/622.jpg)
622
Model Summary
| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 64.245 | .095 | .127 |
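Both pseudo $R^2$ values can be recovered from the -2 log likelihood and the model chi-square using the standard Cox & Snell and Nagelkerke formulas. A Python sketch, assuming N = 50 (the total in the classification table, and consistent with the df of 1, 48 in the OLS run):

```python
import math

n = 50                 # assumed sample size
neg2ll_model = 64.245  # -2 log likelihood of the fitted model
chi_square = 4.990     # model chi-square: null -2LL minus model -2LL

neg2ll_null = neg2ll_model + chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/N)
cox_snell = 1 - math.exp(-chi_square / n)

# Nagelkerke rescales Cox & Snell by its maximum possible value
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```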
![Page 623: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/623.jpg)
623
• Classification Table
  – predictions of model
  – based on cut-off of 0.5 (by default)
  – predicted values x actual values
![Page 624: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/624.jpg)
624
Classification Table (a)

| Observed (Step 1) | Predicted PASS = 0 | Predicted PASS = 1 | Percentage Correct |
|---|---|---|---|
| PASS = 0 | 18 | 8 | 69.2 |
| PASS = 1 | 12 | 12 | 50.0 |
| Overall Percentage | | | 60.0 |

a. The cut value is .500
![Page 625: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/625.jpg)
625
Model parameters
• B
  – Change in the logged odds associated with a change of 1 unit in IV
  – just like OLS regression
  – difficult to interpret
• SE (B)
  – Standard error
  – Multiply by 1.96 to get 95% CIs
![Page 626: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/626.jpg)
626
Variables in the Equation

| Step 1 (a) | B | S.E. | Wald |
|---|---|---|---|
| SCORE | -.467 | .219 | 4.566 |
| Constant | 1.314 | .714 | 3.390 |

a. Variable(s) entered on step 1: SCORE.

Variables in the Equation

| Step 1 (a) | Sig. | Exp(B) | 95.0% C.I. for EXP(B): Lower | 95.0% C.I. for EXP(B): Upper |
|---|---|---|---|---|
| score | .386 | 1.263 | .744 | 2.143 |
| Constant | .199 | .323 | | |

a. Variable(s) entered on step 1: score.
![Page 627: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/627.jpg)
627
• Constant – i.e. score = 0
  – B = 1.314
  – Exp(B) = $e^B$ = $e^{1.314}$ = 3.720
  – OR = 3.720, p = 1 – (1 / (OR + 1)) = 1 – (1 / (3.720 + 1))
  – p = 0.788
![Page 628: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/628.jpg)
628
• Score 1
  – Constant b = 1.314
  – Score B = -0.467
  – Exp(1.314 – 0.467) = Exp(0.847) = 2.332
  – OR = 2.332
  – p = 1 – (1 / (2.332 + 1)) = 0.699
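The same steps, written as a small Python function so that the predicted probability can be worked out for any score. The coefficients are the B values from the SPSS output; the function name is illustrative.

```python
import math

B_CONSTANT = 1.314
B_SCORE = -0.467


def predicted_probability(score):
    """Logit -> odds -> probability for a given aptitude score."""
    logit = B_CONSTANT + B_SCORE * score
    odds = math.exp(logit)
    # odds / (1 + odds) is the same as 1 - (1 / (odds + 1))
    return odds / (1 + odds)


for score in range(0, 6):
    print(score, round(predicted_probability(score), 3))
```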
![Page 629: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/629.jpg)
629
Standard Errors and CIs
• SPSS gives
  – B, SE B, exp(B) by default
  – Can work out 95% CI from standard error
  – B ± 1.96 × SE(B)
  – Or ask for it in options
• Symmetrical in B
  – Non-symmetrical (sometimes very) in exp(B)
![Page 630: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/630.jpg)
630
Variables in the Equation
| | B | S.E. | Exp(B) | 95.0% C.I. for EXP(B): Lower | 95.0% C.I. for EXP(B): Upper |
|---|---|---|---|---|---|
| SCORE | -.467 | .219 | .627 | .408 | .962 |
| Constant | 1.314 | .714 | 3.720 | | |

a. Variable(s) entered on step 1: SCORE.
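The interval for Exp(B) can be reproduced by building the interval on the B scale and then exponentiating both ends. A Python sketch using the rounded B and S.E. for SCORE, so the last decimal place may differ slightly from the SPSS output:

```python
import math

b, se = -0.467, 0.219

# Symmetrical 95% interval on the log-odds (B) scale
lower_b = b - 1.96 * se
upper_b = b + 1.96 * se

# Exponentiate to get Exp(B) and its (non-symmetrical) interval
exp_b = math.exp(b)
lower = math.exp(lower_b)
upper = math.exp(upper_b)

print(round(exp_b, 3), round(lower, 3), round(upper, 3))
```

The distance from Exp(B) to the upper limit is larger than the distance to the lower limit, which is the asymmetry noted above.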
![Page 631: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/631.jpg)
631
• The odds of passing the test are multiplied by 0.63 (95% CI = 0.408, 0.962; p = 0.033) for every additional point on the aptitude test.
![Page 632: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/632.jpg)
632
More on Standard Errors
• In OLS regression
  – If a variable is added in a hierarchical fashion
  – The p-value associated with the change in R² is the same as the p-value of the variable
  – Not the case in logistic regression
    • In our data 0.025 and 0.033
• Wald standard errors
  – Make the p-value in the estimates wrong – too high
  – (CIs still correct)
![Page 633: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/633.jpg)
633
• Two estimates use slightly different information
  – P-value says “what if no effect”
  – CI says “what if this effect”
    • Variance depends on the hypothesised ratio of the number of people in the two groups
• Can calculate likelihood ratio based p-values
  – If you can be bothered
  – Some packages provide them automatically
![Page 634: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/634.jpg)
634
Probit Regression
• Very similar to logistic
  – Much more complex initial transformation (to normal distribution)
  – Very similar results to logistic (multiplied by 1.7)
• In SPSS:
  – A bit weird
    • Probit regression available through menus
![Page 635: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/635.jpg)
635
– But requires data structured differently
• However
  – Ordinal logistic regression is equivalent to binary logistic
    • If outcome is binary
  – SPSS gives option of probit
![Page 636: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/636.jpg)
636
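Results

| Model | Term | Estimate | SE | P |
|---|---|---|---|---|
| Logistic (binary) | Score | 0.288 | 0.301 | 0.339 |
| | Exp | 0.147 | 0.073 | 0.043 |
| Logistic (ordinal) | Score | 0.288 | 0.301 | 0.339 |
| | Exp | 0.147 | 0.073 | 0.043 |
| Logistic (probit) | Score | 0.191 | 0.178 | 0.282 |
| | Exp | 0.090 | 0.042 | 0.033 |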
![Page 637: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/637.jpg)
637
Differentiating Between Probit and Logistic
• Depends on shape of the error term
  – Normal or logistic
  – Graphs are very similar to each other
    • Could distinguish quality of fit
      – Given enormous sample size
• Logistic = probit x 1.7
  – Actually 1.6998
• Probit advantage
  – Understand the distribution
• Logistic advantage
  – Much simpler to get back to the probability
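To see how close the two curves are, one can compare the normal CDF at z with the logistic CDF at 1.7 x z. The sketch below is only an illustration of the scaling rule; the grid from -3 to 3 and the variable names are choices made for the example.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution (the probit curve)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logistic_cdf(x):
    """Logistic cumulative distribution (the logit curve)."""
    return 1 / (1 + math.exp(-x))

SCALE = 1.7  # approximate ratio of logistic to probit coefficients
grid = [i / 10 for i in range(-30, 31)]  # -3.0, -2.9, ..., 3.0

# Largest vertical gap between the two curves once the logistic is rescaled
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(max_gap)
```

A small maximum gap is why the two models are so hard to tell apart without a very large sample.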
![Page 638: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/638.jpg)
638
[Figure: cumulative probability (y-axis, 0 to 1.2) against values from -3 to 3 (x-axis), with curves for Normal (Probit) and Logistic]
![Page 639: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/639.jpg)
639
Infinite Parameters
• Non-convergence can happen because of infinite parameters
  – Insoluble model
• Three kinds:
• Complete separation
  – The groups are completely distinct
    • Pass group all score more than 10
    • Fail group all score less than 10
![Page 640: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/640.jpg)
640
• Quasi-complete separation
  – Separation with some overlap
    • Pass group all score 10 or more
    • Fail group all score 10 or less
• Both cases:
  – No convergence
• Close to this
  – Curious estimates
  – Curious standard errors
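A quick pre-check for separation on a single predictor might look like the sketch below. It is not something SPSS does for you; the function name and the example scores are invented, and it assumes both the pass and the fail group contain at least one case.

```python
def separation_type(scores, passed):
    """Classify separation of a binary outcome by one numeric predictor."""
    pass_scores = [s for s, p in zip(scores, passed) if p]
    fail_scores = [s for s, p in zip(scores, passed) if not p]
    # Complete: no overlap at all between the two groups
    if min(pass_scores) > max(fail_scores) or max(pass_scores) < min(fail_scores):
        return "complete"
    # Quasi-complete: the groups touch only at a shared boundary value
    if min(pass_scores) >= max(fail_scores) or max(pass_scores) <= min(fail_scores):
        return "quasi-complete"
    return "none"

outcome = [True, True, True, False, False, False]
complete_case = separation_type([12, 11, 15, 8, 9, 7], outcome)
quasi_case = separation_type([12, 10, 15, 8, 10, 7], outcome)
overlap_case = separation_type([12, 9, 15, 8, 11, 7], outcome)
print(complete_case, quasi_case, overlap_case)
```

With several predictors separation can occur on a combination of them, so a check like this only catches the simplest situation.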
![Page 641: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/641.jpg)
641
• Categorical Predictors
  – Can cause separation
  – Esp. if correlated
    • Need people in every cell

| | Male: White | Male: Non-White | Female: White | Female: Non-White |
|---|---|---|---|---|
| Below Poverty Line | | | | |
| Above Poverty Line | | | | |
![Page 642: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/642.jpg)
642
Logistic Regression and Diagnosis
• Logistic regression can be used for diagnostic tests
  – For every score
    • Calculate probability that result is positive
    • Calculate proportion of people with that score (or lower) who have a positive result
• Calculate c statistic
  – Measure of discriminative power
  – Percentage of all possible pairs of cases (one with the outcome, one without) where the model gives a higher probability to the correct case than to the incorrect case
![Page 643: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/643.jpg)
643
  – Perfect c-statistic = 1.0
  – Random c-statistic = 0.5
• SPSS doesn’t do it automatically
  – But easy to do
• Save probabilities
  – Use Graphs, ROC Curve
  – Test variable: predicted probability
  – State variable: outcome
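Outside SPSS, the c-statistic can be computed directly from saved probabilities by counting pairs. The sketch below assumes ties count as half a concordant pair (a common convention, not something stated on the slides); the probabilities and outcomes are made-up illustration data.

```python
def c_statistic(probabilities, outcomes):
    """Proportion of (positive, negative) pairs ranked correctly by the model."""
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                concordant += 1      # model ranks the positive case higher
            elif p_pos == p_neg:
                concordant += 0.5    # tie: assumed to count as half
    return concordant / (len(positives) * len(negatives))

# Hypothetical saved probabilities and observed outcomes
example_c = c_statistic([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])
print(example_c)
```

This pairwise count is the same quantity as the area under the ROC curve.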
![Page 644: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/644.jpg)
644
Sensitivity and Specificity
• Sensitivity:
  – Probability of saying someone has a positive result
    • If they do: p(pos | pos)
• Specificity
  – Probability of saying someone has a negative result
    • If they do: p(neg | neg)
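In code, both quantities are simple proportions from the four cells of a classification table. The sketch below is illustrative only; the function name and the eight example cases are invented.

```python
def sensitivity_specificity(predicted_positive, actually_positive):
    """Return (sensitivity, specificity) from paired 0/1 predictions and outcomes."""
    pairs = list(zip(predicted_positive, actually_positive))
    true_pos = sum(1 for pred, act in pairs if pred and act)
    false_neg = sum(1 for pred, act in pairs if not pred and act)
    true_neg = sum(1 for pred, act in pairs if not pred and not act)
    false_pos = sum(1 for pred, act in pairs if pred and not act)
    sensitivity = true_pos / (true_pos + false_neg)  # p(pos | pos)
    specificity = true_neg / (true_neg + false_pos)  # p(neg | neg)
    return sensitivity, specificity

# Hypothetical predictions (1 = test says positive) and true status
sens, spec = sensitivity_specificity([1, 1, 0, 1, 0, 0, 1, 0],
                                     [1, 1, 1, 0, 0, 0, 1, 0])
print(sens, spec)
```

Repeating this at every possible cut-off gives the points that make up a ROC curve.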
![Page 645: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/645.jpg)
645
Calculating Sens and Spec
• For each value
  – Calculate
    • proportion of minority earning less – p(m)
    • proportion of non-minority earning less – p(w)
  – Sensitivity (value)
    • P(m)
![Page 646: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/646.jpg)
646
Salary P(minority)
10 .39
20 .31
30 .23
40 .17
50 .12
60 .09
70 .06
80 .04
90 .03
![Page 647: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/647.jpg)
647
Using Bank Data
• Predict minority group, using salary (000s)
  – Logit(minority) = -0.044 + salary x (-0.039)
• Find actual proportions
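The Salary / P(minority) table can be reproduced, to within rounding, from the logit equation. This is a small sketch using the two coefficients quoted above; because they are rounded to three decimals, the second decimal of a predicted probability may differ slightly from the table.

```python
import math

INTERCEPT = -0.044  # from the slide
SLOPE = -0.039      # change in logit per 1,000 of salary, from the slide

def p_minority(salary_thousands):
    """Predicted probability of minority group membership at a given salary."""
    logit = INTERCEPT + SLOPE * salary_thousands
    return 1 / (1 + math.exp(-logit))

# Predicted probabilities at salaries of 10, 20, ..., 90 (thousands)
predicted = {salary: p_minority(salary) for salary in range(10, 100, 10)}
for salary, p in predicted.items():
    print(salary, round(p, 2))
```

These model-based probabilities are what get compared with the actual proportions.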
![Page 648: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/648.jpg)
648
[Figure: ROC Curve. Sensitivity (y-axis, 0.0 to 1.0) plotted against 1 - Specificity (x-axis, 0.0 to 1.0). Diagonal segments are produced by ties.]
Area under curve is c-statistic
![Page 649: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/649.jpg)
649
More Advanced Techniques
• Multinomial Logistic Regression: more than two categories in DV
  – same procedure
  – one category chosen as reference group
    • odds of being in category other than reference
• Polytomous Logit Universal Models (PLUM)
  – Ordinal multinomial logistic regression
  – For ordinal outcome variables
![Page 650: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/650.jpg)
650
Final Thoughts
• Logistic Regression can be extended
  – dummy variables
  – non-linear effects
  – interactions (even though we don’t cover them until the next lesson)
• Same issues as OLS
  – collinearity
  – outliers
![Page 651: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/651.jpg)
651
![Page 652: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/652.jpg)
652
![Page 653: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/653.jpg)
653
Lesson 12: Mediation and Path Analysis
![Page 654: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/654.jpg)
654
Introduction
• Moderator
  – Level of one variable influences effect of another variable
• Mediator
  – One variable influences another via a third variable
• All relationships are really mediated
  – are we interested in the mediators?
  – can we make the process more explicit?
![Page 655: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/655.jpg)
655
• In examples with bank
  education → beginning salary
• Why?
  – What is the process?
  – Are we making assumptions about the process?
  – Should we test those assumptions?
![Page 656: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/656.jpg)
656
[Diagram: education → beginning salary, with possible mediators: job skills, expectations, negotiating skills, kudos for bank]
![Page 657: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/657.jpg)
657
Direct and Indirect Influences
X may affect Y in two ways
• Directly – X has a direct (causal) influence on Y
  – (or maybe mediated by other variables)
• Indirectly – X affects Y via a mediating variable - M
![Page 658: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/658.jpg)
658
• e.g. how does going to the pub affect comprehension on a Summer school course
  – on, say, regression
[Diagram: Having fun in pub in evening → not reading books on regression → less knowledge. Label on diagram: "Anything here?"]
![Page 659: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/659.jpg)
659
[Diagram: Having fun in pub in evening → not reading books on regression → less knowledge, plus a second route via fatigue. Label on diagram: "Still needed?"]
![Page 660: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/660.jpg)
660
• Mediators needed
  – to cope with more sophisticated theory in social sciences
  – make explicit assumptions made about processes
  – examine direct and indirect influences
![Page 661: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/661.jpg)
661
Detecting Mediation
![Page 662: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/662.jpg)
662
4 Steps
From Baron and Kenny (1986)
• To establish that the effect of X on Y is mediated by M
  1. Show that X predicts Y
  2. Show that X predicts M
  3. Show that M predicts Y, controlling for X
  4. If effect of X controlling for M is zero, M is complete mediator of the relationship
    • (3 and 4 in same analysis)
![Page 663: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/663.jpg)
663
Example: Book habits
Enjoy Books
Buy books
Read Books
![Page 664: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/664.jpg)
664
Three Variables
• Enjoy– How much an individual enjoys books
• Buy– How many books an individual buys
(in a year)• Read
– How many books an individual reads (in a year)
![Page 665: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/665.jpg)
665
| | ENJOY | BUY | READ |
|---|---|---|---|
| ENJOY | 1.00 | 0.64 | 0.73 |
| BUY | 0.64 | 1.00 | 0.75 |
| READ | 0.73 | 0.75 | 1.00 |
![Page 666: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/666.jpg)
666
• The Theory
enjoy → buy → read
![Page 667: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/667.jpg)
667
• Step 1
1. Show that X (enjoy) predicts Y (read)
– b1 = 0.487, p < 0.001
– standardised b1 = 0.732
– OK
![Page 668: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/668.jpg)
668
2. Show that X (enjoy) predicts M (buy)
– b1 = 0.974, p < 0.001
– standardised b1 = 0.643
– OK
![Page 669: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/669.jpg)
669
3. Show that M (buy) predicts Y (read), controlling for X (enjoy)
– b1 = 0.206, p < 0.001
– standardised b1 = 0.469
– OK
![Page 670: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/670.jpg)
670
4. If effect of X controlling for M is zero, M is complete mediator of the relationship
– (Same as analysis for step 3.)
– b2 = 0.287, p = 0.001
– standardised b2 = 0.431
– Hmmmm…
  • Significant, therefore not a complete mediator
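The standardised coefficients for the four steps can be recovered approximately from the correlation matrix alone. The sketch below uses the standard two-predictor formula for standardised regression weights; because the correlations are rounded to two decimals, the results only approximate the standardised values reported on the slides.

```python
# Correlations from the ENJOY / BUY / READ matrix
r_xm = 0.64  # enjoy (X) with buy (M)
r_xy = 0.73  # enjoy (X) with read (Y)
r_my = 0.75  # buy (M) with read (Y)

beta_step1 = r_xy  # step 1: X predicts Y (single predictor, so beta = r)
beta_step2 = r_xm  # step 2: X predicts M

# Steps 3 and 4: Y regressed on both M and X
denominator = 1 - r_xm ** 2
beta_m = (r_my - r_xy * r_xm) / denominator  # step 3: M controlling for X
beta_x = (r_xy - r_my * r_xm) / denominator  # step 4: X controlling for M

# Standardised indirect effect, computed two equivalent ways
indirect_product = beta_step2 * beta_m
indirect_difference = beta_step1 - beta_x
print(round(beta_m, 3), round(beta_x, 3), round(indirect_product, 3))
```

The product of the two paths and the drop in the X coefficient agree, which is the same identity used for the unstandardised mediation coefficient.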
![Page 671: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/671.jpg)
671
[Diagram: enjoy → buy = 0.974 (from step 2); buy → read = 0.206 (from step 3); enjoy → read = 0.287 (step 4)]
![Page 672: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/672.jpg)
672
The Mediation Coefficient
• Amount of mediation = Step 1 – Step 4
  = 0.487 – 0.287
  = 0.200
• OR
  Step 2 x Step 3
  = 0.974 x 0.206
  = 0.200
![Page 673: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/673.jpg)
673
SE of Mediator
• sa = se(a)
• sb = se(b)
[Diagram: enjoy → buy → read, with a on the enjoy → buy path (from step 2) and b on the buy → read path (from step 3)]
![Page 674: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/674.jpg)
674
• Sobel test
  – Standard error of mediation coefficient can be calculated

a = 0.974, sa = 0.189
b = 0.206, sb = 0.054

$$se_{ab} = \sqrt{b^2 s_a^2 + a^2 s_b^2 - s_a^2 s_b^2}$$
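A hedged sketch of the calculation, taking a as the enjoy → buy path and b as the buy → read path. Two variants are shown: the first-order version without the product term, and the version that subtracts the product of the squared standard errors. With the rounded inputs shown here both come out near 0.065, somewhat above the 0.056 quoted for this example, so treat this as an illustration of the formula rather than a reproduction of that figure.

```python
import math

a, s_a = 0.974, 0.189  # enjoy -> buy path and its standard error
b, s_b = 0.206, 0.054  # buy -> read path (controlling for enjoy) and its SE

indirect = a * b  # mediation coefficient

# First-order standard error (no product term)
se_first_order = math.sqrt(b**2 * s_a**2 + a**2 * s_b**2)

# Variant that subtracts the product of the squared standard errors
se_minus_product = math.sqrt(b**2 * s_a**2 + a**2 * s_b**2 - s_a**2 * s_b**2)

z_first_order = indirect / se_first_order
z_minus_product = indirect / se_minus_product
print(round(indirect, 3), round(se_first_order, 4), round(se_minus_product, 4))
```

Either ratio is referred to a normal distribution to get the p-value for the indirect effect.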
![Page 675: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/675.jpg)
675
• Indirect effect = 0.200
  – se = 0.056
  – t = 3.52, p = 0.001
• Online Sobel test: http://www.unc.edu/~preacher/sobel/sobel.htm
  – (Won’t be there for long; probably will be somewhere else)
![Page 676: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/676.jpg)
676
A Note on Power
• Recently
  – Move in methodological literature away from this conventional approach
  – Problems of power:
  – Several tests, all of which must be significant
    • Type I error rate = 0.05 * 0.05 = 0.0025
    • Must affect power
  – Bootstrapping suggested as alternative
    • See Paper B7, A4, B9
    • B21 for SPSS syntax
![Page 677: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/677.jpg)
677
![Page 678: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/678.jpg)
678
![Page 679: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/679.jpg)
679
Lesson 13: Moderators in Regression
“different slopes for different folks”
![Page 680: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/680.jpg)
680
Introduction
• Moderator relationships have many different names
  – interactions (from ANOVA)
  – multiplicative
  – non-linear (just confusing)
  – non-additive
• All talking about the same thing
![Page 681: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/681.jpg)
681
A moderated relationship occurs
• when the effect of one variable depends upon the level of another variable
![Page 682: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/682.jpg)
682
• Hang on …
  – That seems very like a nonlinear relationship
  – Moderator
    • Effect of one variable depends on level of another
  – Non-linear
    • Effect of one variable depends on level of itself
• Where there is collinearity
  – Can be hard to distinguish between them
  – Paper in handbook (B5)
  – Should (usually) compare effect sizes
![Page 683: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/683.jpg)
683
• e.g. How much it hurts when I drop a computer on my foot depends on
– x1: how much alcohol I have drunk
– x2: how high the computer was dropped from
– but if x1 is high enough
– x2 will have no effect
![Page 684: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/684.jpg)
684
• e.g. Likelihood of injury in a car accident
– depends on
– x1: speed of car
– x2: if I was wearing a seatbelt
– but if x1 is low enough
– x2 will have no effect
![Page 685: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/685.jpg)
685
[Figure: Injury (y-axis, 0 to 30) against Speed (mph) (x-axis, 5 to 45), with separate lines for Seatbelt and No Seatbelt]
![Page 686: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/686.jpg)
686
• e.g. number of words (from a list) I can remember
– depends on
– x1: type of words (abstract, e.g. ‘justice’, or concrete, e.g. ‘carrot’)
– x2: Method of testing (recognition – i.e. multiple choice, or free recall)
– but if using recognition
– x1: will not make a difference
![Page 687: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/687.jpg)
687
• We looked at three kinds of moderator
• alcohol x height = pain
  – continuous x continuous
• speed x seatbelt = injury
  – continuous x categorical
• word type x test type
  – categorical x categorical
• We will look at them in reverse order
![Page 688: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/688.jpg)
688
How do we know to look for moderators?
Theoretical rationale
• Often the most powerful
• Many theories predict additive/linear effects
  – Fewer predict moderator effects
Presence of heteroscedasticity
• Clue there may be a moderated relationship missing
![Page 689: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/689.jpg)
689
Two Categorical Predictors
![Page 690: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/690.jpg)
690
Data
• 2 IVs
  – word type (concrete [1], abstract [2])
  – test method (recog [1], recall [2])
• 20 Participants in one of four groups
  – 1, 1
  – 1, 2
  – 2, 1
  – 2, 2
• 5 per group
• lesson12.1.sav
![Page 691: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/691.jpg)
691
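| | | Concrete | Abstract | Total |
|---|---|---|---|---|
| Recog | Mean | 15.40 | 15.20 | 15.30 |
| | SD | 2.19 | 2.59 | 2.26 |
| Recall | Mean | 15.60 | 6.60 | 11.10 |
| | Std. Deviation | 1.67 | 7.44 | 6.95 |
| Total | Mean | 15.50 | 10.90 | 13.20 |
| | Std. Deviation | 1.84 | 6.94 | 5.47 |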
![Page 692: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/692.jpg)
692
• Graph of means
TEST
2.001.00
18
16
14
12
10
8
6
WORDS
1.00
2.00
![Page 693: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/693.jpg)
693
ANOVA Results
• Standard way to analyse these data would be to use ANOVA
  – Words: F=6.1, df=1, 16, p=0.025
  – Test: F=5.1, df=1, 16, p=0.039
  – Words x Test: F=5.6, df=1, 16, p=0.031
![Page 694: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/694.jpg)
694
Procedure for Testing
1: Convert to effect coding
  • can use dummy coding instead, but with effect coding collinearity is less of an issue
  • doesn’t make any difference to substantive interpretation
2: Calculate interaction term
  • In ANOVA interaction is automatic
  • In regression we create an interaction variable
![Page 695: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/695.jpg)
695
• Interaction term (wxt)
  – multiply effect coded variables together

| word | test | wxt |
|---|---|---|
| -1 | -1 | 1 |
| 1 | -1 | -1 |
| -1 | 1 | -1 |
| 1 | 1 | 1 |
![Page 696: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/696.jpg)
696
3: Carry out regression
  • Hierarchical
    – linear effects first
    – interaction effect in next block
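For a balanced design, the full effect-coded model can be sketched on the four cell means alone. This is a shortcut that works only because the effect-coded columns are mutually orthogonal when group sizes are equal, so it is not the hierarchical procedure itself. It assumes concrete = -1, abstract = 1, recog = -1, recall = 1, and uses the cell means from the table of means.

```python
# (word code, test code, cell mean) for the four groups
cells = [
    (-1, -1, 15.4),  # concrete, recog
    (-1,  1, 15.6),  # concrete, recall
    ( 1, -1, 15.2),  # abstract, recog
    ( 1,  1,  6.6),  # abstract, recall
]
n_cells = len(cells)

# With orthogonal +/-1 columns, each coefficient is the mean of code x outcome
b0 = sum(y for _, _, y in cells) / n_cells          # intercept (grand mean)
b1 = sum(w * y for w, _, y in cells) / n_cells      # word type
b2 = sum(t * y for _, t, y in cells) / n_cells      # test method
b3 = sum(w * t * y for w, t, y in cells) / n_cells  # word x test interaction
print(round(b0, 2), round(b1, 2), round(b2, 2), round(b3, 2))
```

With unequal group sizes the columns are no longer orthogonal and an ordinary least squares fit on the raw data is needed instead.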
![Page 697: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/697.jpg)
697
• b0 = 13.2
• b1 (words) = -2.3, p = 0.025
• b2 (test) = -2.1, p = 0.039
• b3 (words x test) = -2.2, p = 0.031
• Might need to use change in R² to test sig of interaction, because of collinearity

What do these mean?
• b0 (intercept) = predicted value of Y (score) when all X = 0
![Page 698: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/698.jpg)
698
• b0 = 13.2
  – grand mean
• b1 = -2.3
  – distance from grand mean to the means for the two word types
  – 13.2 – (-2.3) = 15.5
  – 13.2 + (-2.3) = 10.9

| | Concrete | Abstract | Total |
|---|---|---|---|
| Recog | 15.40 | 15.20 | 15.30 |
| Recall | 15.60 | 6.60 | 11.10 |
| Total | 15.50 | 10.90 | 13.20 |
![Page 699: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/699.jpg)
699
• b2 = -2.1
  – distance from grand mean to recog and recall means
• b3 = -2.2
– to understand b3 we need to look at predictions from the equation without this term
Score = 13.2 + (-2.3) w + (-2.1) t
![Page 700: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/700.jpg)
700
Score = 13.2 + (-2.3) w + (-2.1) t
• So for each group we can calculate an expected value
![Page 701: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/701.jpg)
701
| W | T | Word | Test | Expected Value |
|---|---|---|---|---|
| C | Cog | -1 | -1 | 13.2 + (-2.3) x (-1) + (-2.1) x (-1) |
| C | Call | -1 | 1 | 13.2 + (-2.3) x (-1) + (-2.1) x 1 |
| A | Cog | 1 | -1 | 13.2 + (-2.3) x 1 + (-2.1) x (-1) |
| A | Call | 1 | 1 | 13.2 + (-2.3) x 1 + (-2.1) x 1 |

b1 = -2.3, b2 = -2.1
![Page 702: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/702.jpg)
702
• The exciting part comes when we look at the differences between the actual value and the value in the 2 IV model
| W | T | Word | Test | Exp | Actual Value |
|---|---|---|---|---|---|
| C | Cog | -1 | -1 | 17.6 | 15.4 |
| C | Call | -1 | 1 | 13.4 | 15.6 |
| A | Cog | 1 | -1 | 13.0 | 15.2 |
| A | Call | 1 | 1 | 8.8 | 6.6 |
![Page 703: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/703.jpg)
703
• Each difference = 2.2 (or –2.2)
• The value of b3 was –2.2
  – the interaction term is the correction required to the slope when the second IV is included
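The gap between each cell mean and the two-IV (additive) prediction can be checked in a few lines. The sketch assumes the coding concrete = -1, abstract = 1, recog = -1, recall = 1, and takes the actual cell means from the table of means; in every cell the gap equals b3 multiplied by the interaction code w x t.

```python
B0, B1, B2, B3 = 13.2, -2.3, -2.1, -2.2  # coefficients from the slides

# Actual cell means keyed by (word code, test code)
actual = {
    (-1, -1): 15.4,  # concrete, recog
    (-1,  1): 15.6,  # concrete, recall
    ( 1, -1): 15.2,  # abstract, recog
    ( 1,  1):  6.6,  # abstract, recall
}

gaps = {}
for (w, t), mean in actual.items():
    additive = B0 + B1 * w + B2 * t  # prediction without the interaction term
    gaps[(w, t)] = mean - additive   # what the interaction has to correct
    print(w, t, round(additive, 1), mean, round(gaps[(w, t)], 1))
```

Each gap is 2.2 in size, with the sign given by the product of the two codes.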
![Page 704: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/704.jpg)
704
• Examine the slope for test type
[Figure: mean score (y-axis, 0 to 18) against Test Type (x-axis, Recog (-1) to Recall (1))]
Gradient = (11.1 - 15.3) / 2 = -2.1
![Page 705: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/705.jpg)
705
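• Add the slopes for two word groups
[Figure: mean score (y-axis, 0 to 18) against Test Type (x-axis, Recog (-1) to Recall (1)), with one line for both word groups combined and separate lines for abstract and concrete words]
Both word groups: -2.1
Abstract: (6.6 - 15.2) / 2 = -4.3
Concrete: (15.6 - 15.4) / 2 = 0.1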
![Page 706: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/706.jpg)
706
b associated with interaction
• the change in slope, away from the average, associated with a 1 unit change in the moderating variable
OR
• Half the difference in the slopes
![Page 707: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/707.jpg)
707
• Another way to look at it
Y = 13.2 + (-2.3) w + (-2.1) t + (-2.2) wt
• Examine concrete words group (w = -1)
  – substitute values into the equation
Y(concrete) = 13.2 + (-2.3) x (-1) + (-2.1) t + (-2.2) x (-1) t
Y(concrete) = 13.2 + 2.3 + (-2.1) t + 2.2 t
Y(concrete) = 15.5 + 0.1 t
• The effect of changing test type for concrete words (the slope, which is half the actual difference)
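The same substitution works for either word group: fixing w turns the full equation into a simple line in t. A minimal sketch, with a made-up function name and the coefficients from the equation above:

```python
B0, B1, B2, B3 = 13.2, -2.3, -2.1, -2.2  # coefficients from the full model

def simple_line(w):
    """Intercept and slope of score on test type (t) for a fixed word code w."""
    intercept = B0 + B1 * w  # terms that do not involve t
    slope = B2 + B3 * w      # terms that multiply t
    return intercept, slope

concrete_intercept, concrete_slope = simple_line(-1)  # concrete words
abstract_intercept, abstract_slope = simple_line(1)   # abstract words
print(round(concrete_intercept, 1), round(concrete_slope, 1))
print(round(abstract_intercept, 1), round(abstract_slope, 1))
```

Half the difference between the two slopes gives back the interaction coefficient.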
![Page 708: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/708.jpg)
708
Why go to all that effort? Why not do ANOVA in the first place?
1. That is what ANOVA actually does
  • if it can handle an unbalanced design (i.e. different numbers of people in each group)
  • Helps to understand what can be done with ANOVA
  • SPSS uses regression to do ANOVA
2. Helps to clarify more complex cases
  • as we shall see
![Page 709: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/709.jpg)
709
Categorical x Continuous
![Page 710: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/710.jpg)
710
Note on Dichotomisation
• Very common to see people dichotomise a variable
– Makes the analysis easier
– Very bad idea
• Paper B6
711
Data
A chain of 60 supermarkets
• Examining the relationship between profitability, shop size, and local competition
• 2 IVs
– shop size
– comp (local competition, 0=no, 1=yes)
• DV
– profit
712
• Data, ‘lesson 12.2.sav’
Shopsize  Comp  Profit
4         1     23
10        1     25
7         0     19
10        0     9
10        1     18
29        1     33
12        0     17
6         1     20
14        0     21
62        0     8
713
1st Analysis
Two IVs
• R2=0.367, df=2, 57, p < 0.001
• Unstandardised estimates
– b1 (shopsize) = 0.083 (p=0.001)
– b2 (comp) = 5.883 (p<0.001)
• Standardised estimates
– b1 (shopsize) = 0.356
– b2 (comp) = 0.448
714
• Suspicions
– Presence of competition is likely to have an effect
– Residual plot shows a little heteroscedasticity
[Figure: residual plot; horizontal axis from -2.0 to 2.0, vertical axis from -3 to 3]
715
Procedure for Testing
• Very similar to last time
– Convert ‘comp’ to effect coding
– -1 = no competition
– 1 = competition
– Compute interaction term
• comp (effect coded) x size
– Hierarchical regression
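The recoding and the interaction term can be sketched in a few lines of Python. This is only an illustration: the ten rows are the example rows from the data listing as read from this transcript, and the names `effect_code` and `size_x_comp` are invented here rather than taken from the SPSS file.

```python
# Ten example rows, as read from the data listing (not the full 60 shops).
shopsize = [4, 10, 7, 10, 10, 29, 12, 6, 14, 62]
comp = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]  # dummy coded: 0 = no, 1 = yes


def effect_code(dummy):
    """Recode 0/1 dummy values to -1/+1 effect codes."""
    return [1 if d == 1 else -1 for d in dummy]


comp_effect = effect_code(comp)

# Interaction term: effect-coded competition multiplied by shop size.
size_x_comp = [c * s for c, s in zip(comp_effect, shopsize)]

for row in zip(shopsize, comp, comp_effect, size_x_comp):
    print(row)
```

The hierarchical regression then enters shop size and the effect-coded comp in the first block and the product term in the second.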
716
Result
• Unstandardised estimates
– b1 (shopsize) = 0.071 (p=0.006)
– b2 (comp) = -1.67 (p = 0.506)
– b3 (sxc) = -0.050 (p=0.050)
• Standardised estimates
– b1 (shopsize) = 0.306
– b2 (comp) = -0.127
– b3 (sxc) = -0.389
717
• comp now non-significant
– shows the importance of hierarchical regression
– comp obviously is important
718
Interpretation
• Draw graph with lines of best fit
– drawn automatically by SPSS
• Interpret equation by substitution of values
– evaluate effects of
• size
• competition
719
[Figure: Profit (0 to 40) against Shopsize (0 to 100), with lines of best fit for Competition, No competition, and All Shops]
720
• Effects of size
– in presence and absence of competition
– (can ignore the constant)
Y = x1(0.071) + x2(-1.67) + x1x2(-0.050)
– Competition present (x2 = 1)
Y = x1(0.071) + 1(-1.67) + x1(1)(-0.050)
Y = x1(0.071) + (-1.67) + x1(-0.050)
Y = x1(0.021) + (-1.67)
721
Y = x1(0.071) + x2(-1.67) + x1x2(-0.050)
– Competition absent (x2 = -1)
Y = x1(0.071) + (-1)(-1.67) + x1(-1)(-0.050)
Y = x1(0.071) + x1(-1)(-0.050) + (-1)(-1.67)
Y = x1(0.121) + 1.67
722
Two Continuous Variables
723
Data
• Bank Employees
– only using clerical staff
– 363 cases
– predicting starting salary
– previous experience
– age
– age x experience
724
• Correlation matrix
– only one significant
           LOGSB   AGESTART   PREVEXP
LOGSB       1.00      -0.09      0.08
AGESTART   -0.09       1.00      0.77
PREVEXP     0.08       0.77      1.00
725
Initial Estimates (no moderator)
• (standardised)
– R2 = 0.061, p<0.001
– Age at start = -0.37, p<0.001
– Previous experience = 0.36, p<0.001
• Suppressing each other
– Age and experience compensate for one another
– Older, with no experience, bad
– Younger, with experience, good
726
The Procedure
• Very similar to previous
– create multiplicative interaction term
– BUT
• Need to eliminate effects of means
– cause massive collinearity
• and SDs
– cause one variable to dominate the interaction term
• By standardising
727
• To standardise x,
– subtract mean, and divide by SD
– re-expresses x in terms of distance from the mean, in SDs
– i.e. z-scores
• Hint: automatic in SPSS in Descriptives
• Create interaction term of age and exp
– axe = z(age) × z(exp)
728
• Hierarchical regression
– two linear effects first
– moderator effect in second
– hint: it is often easier to interpret if standardised versions of all variables are used
729
• Change in R2
– 0.085, p<0.001
• Estimates (standardised)
– b1 (exp) = 0.104
– b2 (agestart) = -0.54
– b3 (age x exp) = -0.54
730
Interpretation 1: Pick-a-Point
• Graph is tricky
– can’t have two continuous variables
– Choose specific points (pick-a-point)
• Graph the line of best fit of one variable at others
– Two ways to pick a point
• 1: Choose high (z = +1), medium (z = 0) and low (z = -1)
• 2: Choose ‘sensible’ values – age 20, 50, 80?
731
• We know:
– Y = e(0.10) + a(-0.54) + ae(-0.54)
– Where a = agestart, and e = experience
• We can rewrite this as:
– Y = (e × 0.10) + (a × -0.54) + (a × e × -0.54)
– Take a out of the brackets
– Y = (e × 0.10) + (-0.54 + e × -0.54)a
• Bracketed terms are the simple intercept and the simple slope
– simple intercept = (e × 0.10)
– simple slope = (-0.54 + e × -0.54)
– Y = simple intercept + simple slope × a
732
• Pick any value of e, and we know the slope for a
– Standardised, so it’s easy
• e = -1: simple intercept = (-1 × 0.10) = -0.10; slope term = (-0.54 + -1 × -0.54)a = 0.00a
• e = 0: simple intercept = (0 × 0.10) = 0; slope term = (-0.54 + 0 × -0.54)a = -0.54a
• e = 1: simple intercept = (1 × 0.10) = 0.10; slope term = (-0.54 + 1 × -0.54)a = -1.08a
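The three pick-a-point calculations can be reproduced with a short Python sketch. It uses the standardised estimates quoted on these slides; the function name `simple_line` is invented for the illustration.

```python
# Standardised estimates from the slides: experience, age at start, age x experience.
B_EXP, B_AGE, B_AXE = 0.10, -0.54, -0.54


def simple_line(e):
    """Return (simple intercept, simple slope) of Y on age at experience z-score e."""
    intercept = B_EXP * e
    slope = B_AGE + B_AXE * e
    return intercept, slope


for e in (-1, 0, 1):
    intercept, slope = simple_line(e)
    print(f"e = {e:+d}: Y = {intercept:+.2f} + ({slope:+.2f})a")
```

Any other value of e, such as a 'sensible' value converted to a z-score, can be passed to the same function.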
733
Graph the Three Lines
[Figure: Log(salary) (-1.5 to 1.5) against Age (-1 to 1), with one line each for e = -1, e = 0 and e = 1]
734
Interpretation 2: P-Values and CIs
• Second way
– Newer, rarely done
• Calculate CIs of the slope
– At any point
• Calculate p-value
– At any point
• Give ranges of significance
735
What do you need?
• The variance and covariance of the estimates
– SPSS doesn’t provide estimates for intercept
– Need to do it manually
• In options, exclude intercept
– Create intercept – c = 1
– Use it in the regression
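To show why the variances and covariances are needed, here is a minimal Python sketch of the standard error of a simple slope, b1 + b3 × z, using the usual variance rule for a linear combination of two estimates. The slopes are the standardised estimates from the earlier slide, but the variances and the covariance are made-up placeholders, and the 1.96 critical value assumes a large-sample normal approximation. The sketch covers the slope only, not the simple intercept.

```python
import math


def simple_slope_ci(b1, b3, var_b1, var_b3, cov_b1_b3, z, crit=1.96):
    """Simple slope b1 + b3*z, its standard error, and an approximate 95% CI."""
    slope = b1 + b3 * z
    # Var(b1 + z*b3) = Var(b1) + 2*z*Cov(b1, b3) + z^2 * Var(b3)
    variance = var_b1 + 2 * z * cov_b1_b3 + z ** 2 * var_b3
    se = math.sqrt(variance)
    return slope, se, (slope - crit * se, slope + crit * se)


# var_b1, var_b3 and cov_b1_b3 are hypothetical values, not SPSS output.
slope, se, (lower, upper) = simple_slope_ci(
    b1=-0.54, b3=-0.54, var_b1=0.010, var_b3=0.004, cov_b1_b3=0.001, z=1.0
)
print(f"slope = {slope:.2f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Repeating the calculation over a range of z values gives the confidence band and the regions where the slope differs significantly from zero.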
736
• Enter information into web page:
– www.unc.edu/~preacher/interact/acov.htm
– (Again, may not be around for long)
• Get results
• Calculations in Bauer and Curran (in press: Multivariate Behavioral Research)
– Paper B13
737
[Figure: MLR 2-Way Interaction Plot; Y (4.0 to 4.5) against X (-1.0 to 1.0), with lines for CVz1(1), CVz1(2) and CVz1(3)]
738
Areas of Significance
[Figure: Confidence Bands; Simple Slope (-0.6 to 0.4) against Experience (-4 to 4)]
739
• 2 complications
– 1: Constant differed
– 2: DV was logged, hence non-linear
• effect of 1 unit depends on where the unit is
– Can use SPSS to do graphs showing lines of best fit for different groups
– See paper A2
740
Finally …
741
Unlimited Moderators
• Moderator effects are not limited to
– 2 variables
– linear effects
742
Three Interacting Variables
• Age, Sex, Exp
• Block 1
– Age, Sex, Exp
• Block 2
– Age x Sex, Age x Exp, Sex x Exp
• Block 3
– Age x Sex x Exp
743
• Results
– All two way interactions significant
– Three way not significant
– Effect of age depends on sex
– Effect of experience depends on sex
– Size of the age x experience interaction does not depend on sex (phew!)
744
Moderated Non-Linear Relationships
• Enter non-linear effect
• Enter non-linear effect x moderator
– if significant, indicates that the degree of non-linearity differs by moderator
746
Modelling Counts: Poisson Regression
Lesson 14
747
Counts and the Poisson Distribution
• Von Bortkiewicz (1898)
– Numbers of Prussian soldiers kicked to death by horses
Deaths  Frequency
0       109
1       65
2       22
3       3
4       1
5       0
[Figure: bar chart of frequency (0 to 120) against number of deaths (0 to 5)]
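One way to see how closely these counts follow a Poisson distribution is to compare the observed frequencies (109, 65, 22, 3, 1, 0 for 0 to 5 deaths) with the frequencies a Poisson distribution with the same mean would predict. The Python sketch below does this; the variable and function names are chosen for the illustration.

```python
import math

# Observed frequencies from the slide: number of deaths -> how often it occurred.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1, 5: 0}

n = sum(observed.values())
mean = sum(k * f for k, f in observed.items()) / n


def poisson_pmf(k, mu):
    """Probability of exactly k events under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Expected frequency for each count if the data were Poisson with this mean.
expected = {k: n * poisson_pmf(k, mean) for k in observed}

for k in observed:
    print(f"{k} deaths: observed {observed[k]:3d}, expected {expected[k]:6.1f}")
```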
748
• The data fitted a Poisson probability distribution
– When counts of events occur, the Poisson distribution is common
– E.g. papers published by researchers, police arrests, number of murders, ship accidents
• Common approach
– Log transform and treat as normal
• Problems
– Censored at 0
– Integers only allowed
– Heteroscedasticity
749
The Poisson Distribution
[Figure: Probability (0 to 0.7) against Count (0 to 17), for Poisson distributions with means of 0.5, 1, 4 and 8]
750
$$p(y \mid x) = \frac{\exp(-\mu)\,\mu^{y}}{y!}$$
751
• Where:
– y is the count
– μ is the mean of the Poisson distribution
• In a Poisson distribution
– The mean = the variance (hence heteroscedasticity issue)
– μ = σ²
$$p(y \mid x) = \frac{\exp(-\mu)\,\mu^{y}}{y!}$$
752
Poisson Regression in SPSS
• Not directly available
– SPSS can be tweaked to do it in three ways:
– General loglinear model (genlog)
– Non-linear regression (CNLR)
• Bootstrapped p-values only
– Both are quite tricky
• SPSS 15,
753
Example Using Genlog
• Number of shark bites on different colour surfboards
– 100 surfboards, 50 red, 50 blue
• Weight cases by bites
• Analyse, Loglinear, General
– Colour is factor
[Figure: bar chart of Frequency (0 to 25) against Number of bites (0 to 4), for Blue and Red surfboards]
754
Results
Correspondence Between Parameters and Terms of the Design
Parameter  Aliased  Term
1                   Constant
2                   [COLOUR = 1]
3          x        [COLOUR = 2]
Note: 'x' indicates an aliased (or a redundant) parameter. These parameters are set to zero.
755
                                    Asymptotic 95% CI
Param   Est.     SE      Z-value    Lower   Upper
1       4.1190   .1275   32.30      3.87    4.37
2       -.5495   .2108   -2.61      -.96    -.14
3       .0000    .       .          .       .
• Note: Intercept (param 1) is curious
• Param 2 is the difference in the log means
756
SPSS: Continuous Predictors
• Bleedin’ nightmare
• http://www.spss.com/tech/answer/details.cfm?tech_tan_id=100006204
757
Poisson Regression in Stata
• SPSS will save a Stata file
• Open it in Stata
• Statistics, Count outcomes, Poisson regression
758
Poisson Regression in R
• R is a freeware program
– Similar to SPlus
– www.r-project.org
• Steep learning curve to start with
• Much nicer to do Poisson (and other) regression analysis
http://www.stat.lsa.umich.edu/~faraway/book/
http://www.jeremymiles.co.uk/regressionbook/extras/appendix2/R/
759
• Commands in R
• Stage 1: enter data
– colour <- c(1, 0, 1, 0, 1, 0 … 1)
– bites <- c(3, 1, 0, 0, … )
• Run analysis
– p1 <- glm(bites ~ colour, family = poisson)
• Get results
– summary(p1)
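With a single 0/1 predictor, the maximum-likelihood Poisson estimates have a closed form: the fitted means equal the two group means, so the intercept is the log of the mean count in the group coded 0 and the slope is the log of the ratio of the group means. The Python sketch below illustrates this with made-up data (not the surfboard data set); the function name is invented for the example, and it does not produce the standard errors that glm() reports.

```python
import math


def poisson_fit_binary(x, y):
    """Maximum-likelihood Poisson estimates for a single 0/1 predictor.

    Returns (intercept, slope) on the log scale:
    intercept = log(mean of y when x = 0), slope = log(mean1 / mean0).
    """
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    mean0 = sum(y0) / len(y0)
    mean1 = sum(y1) / len(y1)
    return math.log(mean0), math.log(mean1 / mean0)


# Made-up data for illustration: 0 = one colour, 1 = the other.
colour = [0, 0, 0, 0, 1, 1, 1, 1]
bites = [0, 1, 0, 1, 1, 2, 0, 1]

intercept, slope = poisson_fit_binary(colour, bites)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```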
760
R Results
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3567 0.1686 -2.115 0.03441 *
colour 0.5555 0.2116 2.625 0.00866 **
• Results for colour
– Same as SPSS
– For intercept different (weird SPSS)
761
Predicted Values
• Need to get exponential of parameter estimates
– Like logistic regression
• Exp(0.555) = 1.74
– You are likely to be bitten by a shark 1.74 times more often with a red surfboard
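The back-transformation can be checked with a few lines of Python. The two coefficients are the ones shown in the R output; the variable names are chosen for this illustration, and colour = 1 is taken to be the red boards, as the slide's interpretation implies.

```python
import math

# Estimates from the R output.
intercept = -0.3567  # log of the expected number of bites when colour = 0
b_colour = 0.5555    # change in the log of the expected bites when colour = 1

rate_ratio = math.exp(b_colour)
mean_colour0 = math.exp(intercept)
mean_colour1 = math.exp(intercept + b_colour)

print(f"rate ratio = {rate_ratio:.2f}")
print(f"expected bites, colour = 0: {mean_colour0:.2f}")
print(f"expected bites, colour = 1: {mean_colour1:.2f}")
```

Exponentiating a coefficient gives a multiplicative effect on the expected count, whereas exponentiating the sum of the intercept and the coefficient gives the expected count itself.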
762
Checking Assumptions
• Was it really Poisson distributed?
– For Poisson, μ = σ²
• As mean increases, variance should also increase
– Residuals should be random
• Overdispersion is common problem
• Too many zeroes
• For blue: σ² = exp(-0.3567) = 0.70
• For red: σ² = exp(-0.3567 + 0.555) = 1.22
763
• Strictly:
$$p(y \mid x) = \frac{\exp(-\mu)\,\mu^{y}}{y!}$$
$$p(y \mid x) = \frac{\exp(-\hat{\mu}_i)\,\hat{\mu}_i^{\,y}}{y!}$$
764
Compare Predicted with Actual Distributions
[Figure, Blue panel: Expected and Actual Probability (0 to 0.7) by Frequency (0 to 4)]
[Figure, Red panel: Expected and Actual Probability (0 to 0.4) by Frequency (1 to 4)]
765
Overdispersion
• Problem in Poisson regression
– Too many zeroes
• Causes
– χ² inflation
– Standard error deflation
• Hence p-values too low
– Higher type I error rate
• Solution
– Negative binomial regression
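A crude first check for overdispersion is to compare the variance of the counts with their mean, since the two should be roughly equal under a Poisson distribution. The Python sketch below uses made-up counts with many zeroes; it ignores any predictors, so it is only a rough screen and not a substitute for examining the fitted model.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)


# Made-up counts with many zeroes and a few large values.
zero_heavy = [0, 0, 0, 0, 0, 0, 0, 0, 5, 5]

ratio = dispersion_ratio(zero_heavy)
print(f"variance / mean = {ratio:.2f}")
```

A ratio well above 1, as here, is the pattern that would point towards negative binomial regression.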
766
Using R
• R can read an SPSS file
– But you have to ask it nicely
• Click Packages menu, Load package, choose “foreign”
• Click File, Change Dir
– Change to the folder that contains your data
767
More on R
• R uses objects
– To place something into an object use <-
– X <- Y
• Puts Y into X
• Function is read.spss()
– Mydata <- read.spss("spssfilename.sav")
• Variables are then referred to as Mydata$VAR1
– Note 1: R is case sensitive
– Note 2: SPSS variable name in capitals
768
GLM in R
• Command
– glm(outcome ~ pred1 + pred2 + … + predk [, family = familyname])
– If no familyname, default is gaussian, i.e. OLS
• Use binomial for logistic, poisson for Poisson
• Output is a GLM object
– You need to give this a name
– my1stglm <- glm(outcome ~ pred1 + pred2 + … + predk [, family = familyname])
769
• Then need to explore the result
– summary(my1stglm)
• To explore what it means
– Need to plot regressions
• Easiest is to use Excel
771
Introducing Structural Equation Modelling
Lesson 15
772
Introduction
• Related to regression analysis
– All (OLS) regression can be considered as a special case of SEM
• Power comes from adding restrictions to the model
• SEM is a system of equations
– Estimate those equations
773
Regression as SEM
• Grades example
– Grade = constant + books + attend + error
• Looks like a regression equation
– Also
– Books correlated with attend
– Explicit modelling of error
774
Path Diagram
• Systems of equations are usefully represented in a path diagram
[Legend: x = measured variable; e = unmeasured variable; arrow symbols for regression and for correlation]
775
Path Diagram for Regression
[Path diagram: Books and Attend (correlated) predict Grade; error also points to Grade]
Must usually explicitly model error
Must explicitly model correlation
776
Results
• Unstandardised
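[Path diagram: BOOKS → GRADE = 4.04; ATTEND → GRADE = 1.28; e → GRADE = 13.52; BOOKS variance = 2.00; ATTEND variance = 17.84; BOOKS–ATTEND covariance = 2.65; e variance = 1.00]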
777
Standardised
[Path diagram: BOOKS → GRADE = .35; ATTEND → GRADE = .33; BOOKS–ATTEND correlation = .44; e → GRADE = .82]
778
Table
                    Estimate   S.E.   C.R.   P      St. Est.
GRADE <-- BOOKS     4.04       1.71   2.36   0.02   0.35
GRADE <-- ATTEND    1.28       0.57   2.25   0.03   0.33
GRADE <-- e         13.52      1.53   8.83   0.00   0.82
GRADE               37.38      7.54   4.96   0.00

Coefficients (a)
Model 1       Unstandardized Coefficients   Standardized Coefficients   Sig.
              B        Std. Error           Beta
(Constant)    37.38    7.74                                             .00
BOOKS         4.04     1.75                 .35                         .03
ATTEND        1.28     .59                  .33                         .04
a. Dependent Variable: GRADE
779
So What Was the Point?
• Regression is a special case
• Lots of other cases
• Power of SEM
– Power to add restrictions to the model
• Restrict parameters
– To zero
– To the value of other parameters
– To 1
780
Restrictions
• Questions
– Is a parameter really necessary?
– Are a set of parameters necessary?
– Are parameters equal?
• Each restriction adds 1 df
– Test of model with χ²
781
The χ² Test
• Can the model proposed have generated the data?
– Test of significance of difference of model and data
– Statistically significant result
• Bad
– Theoretically driven
• Start with model
• Don’t start with data
782
Regression Again
• Both estimates restricted to zero
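[Path diagram: BOOKS, ATTEND and GRADE, with the paths from BOOKS and ATTEND to GRADE restricted to zero; error term e labelled 0, 1]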
783
• Two restrictions
– 2 df for χ² test
– χ² = 15.9, p = 0.0003
• This test is (asymptotically) equivalent to the F test in regression
– We still haven’t got any further
784
Multivariate Regression
[Path diagram: predictors x1 and x2; outcomes y1, y2 and y3]
785
[Path diagram: predictors x1 and x2; outcomes y1, y2 and y3]
Test of all x’s on all y’s (6 restrictions = 6 df)
786
Test of x1 on all y’s (3 restrictions)
[Path diagram: predictors x1 and x2; outcomes y1, y2 and y3]
787
[Path diagram: predictors x1 and x2; outcomes y1, y2 and y3]
Test of all x1 on all y1 (3 restrictions)
788
[Path diagram: predictors x1 and x2; outcomes y1, y2 and y3]
Test of all 3 partial correlations between y’s, controlling for x’s (3 restrictions)
![Page 789: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/789.jpg)
789
Path Analysis and SEM
• More complex models – can add more restrictions
  – E.g. mediator model
• 1 restriction
  – No path from enjoy -> read
[Path diagram: ENJOY -> BUY -> READ, with residuals e_buy and e_read]
![Page 790: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/790.jpg)
790
Result
• χ² = 10.9, 1 df, p = 0.001
• Not a complete mediator
  – Additional path is required
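The single restriction can be pictured with correlations. Under complete mediation (ENJOY -> BUY -> READ, no direct path) the model implies that the correlation between ENJOY and READ equals the product of the other two correlations, so the partial correlation of ENJOY and READ given BUY is zero. The sketch below uses made-up correlations, not the course data, and the function names are hypothetical.

```python
from math import sqrt


def implied_r_full_mediation(r_xm, r_my):
    """Correlation between x and y implied by x -> m -> y with no direct path."""
    return r_xm * r_my


def partial_r(r_xy, r_xm, r_my):
    """Partial correlation of x and y, controlling for the mediator m."""
    return (r_xy - r_xm * r_my) / sqrt((1 - r_xm**2) * (1 - r_my**2))


# Hypothetical correlations, for illustration only (not the course data)
r_enjoy_buy, r_buy_read, r_enjoy_read = 0.6, 0.5, 0.45

# What the restricted (complete mediation) model implies for ENJOY with READ
print(implied_r_full_mediation(r_enjoy_buy, r_buy_read))

# A clearly non-zero partial correlation suggests a direct path is needed
print(partial_r(r_enjoy_read, r_enjoy_buy, r_buy_read))
```

The chi-square test on the slide asks the same question formally: is the gap between the observed and the implied correlation larger than sampling error would allow?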
![Page 791: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/791.jpg)
791
Multiple Groups
• Same model
  – Different people
• Equality constraints between groups
  – Means, correlations, variances, regression estimates
  – E.g. males and females
![Page 792: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/792.jpg)
792
Multiple Groups Example
• Age
• Severity of psoriasis
  – SEVE – in emotional areas
    • Hands, face, forearm
  – SEVNONE – in non-emotional areas
  – Anxiety
  – Depression
![Page 793: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/793.jpg)
793
Correlations (a)
                               AGE     SEVE    SEVNONE  GHQ_A   GHQ_D
AGE      Pearson Correlation   1       -.270   -.248    .017    .035
         Sig. (2-tailed)       .       .004    .009     .859    .717
         N                     110     110     110      110     110
SEVE     Pearson Correlation   -.270   1       .665     .045    .075
         Sig. (2-tailed)       .004    .       .000     .639    .436
         N                     110     110     110      110     110
SEVNONE  Pearson Correlation   -.248   .665    1        .109    .096
         Sig. (2-tailed)       .009    .000    .        .255    .316
         N                     110     110     110      110     110
GHQ_A    Pearson Correlation   .017    .045    .109     1       .782
         Sig. (2-tailed)       .859    .639    .255     .       .000
         N                     110     110     110      110     110
GHQ_D    Pearson Correlation   .035    .075    .096     .782    1
         Sig. (2-tailed)       .717    .436    .316     .000    .
         N                     110     110     110      110     110
a. SEX = f
![Page 794: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/794.jpg)
794
Correlations (a)
                               AGE     SEVE    SEVNONE  GHQ_A   GHQ_D
AGE      Pearson Correlation   1       -.243   -.116    -.195   -.190
         Sig. (2-tailed)       .       .031    .310     .085    .094
         N                     79      79      79       79      79
SEVE     Pearson Correlation   -.243   1       .671     .456    .453
         Sig. (2-tailed)       .031    .       .000     .000    .000
         N                     79      79      79       79      79
SEVNONE  Pearson Correlation   -.116   .671    1        .210    .232
         Sig. (2-tailed)       .310    .000    .        .063    .040
         N                     79      79      79       79      79
GHQ_A    Pearson Correlation   -.195   .456    .210     1       .800
         Sig. (2-tailed)       .085    .000    .063     .       .000
         N                     79      79      79       79      79
GHQ_D    Pearson Correlation   -.190   .453    .232     .800    1
         Sig. (2-tailed)       .094    .000    .040     .000    .
         N                     79      79      79       79      79
a. SEX = m
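Before fitting a multiple-group model, a single pair of correlations can be compared across the two groups with a Fisher z test. This is a simpler stand-in for, not the same thing as, the equality constraints fitted in an SEM program. The sketch below is a minimal Python illustration with a made-up function name, using the SEVE with GHQ_A correlations from the two tables.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p = erfc(abs(z) / sqrt(2.0))  # two-tailed p from the standard normal
    return z, p


# SEVE with GHQ_A: r = .045 in the first table (N = 110), r = .456 in the second (N = 79)
z, p = compare_correlations(0.045, 110, 0.456, 79)
print(round(z, 2), round(p, 4))
```

A z of roughly -3 suggests the two groups differ on this correlation, which is the kind of difference the equality constraints on the following slides are designed to test.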
![Page 795: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/795.jpg)
795
Model
[Path diagram: AGE, SEVE, SEVNONE, Dep and Anx, with residuals e_s, e_sn, e_d and e_a]
![Page 796: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/796.jpg)
796
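Females
[Path diagram: AGE, SEVE, SEVNONE, Dep and Anx, with residuals e_s, e_sn, e_d and e_a. Estimates as extracted, in reading order: -.27, -.25, .03, .15, .09, -.04, .96, .99, .99, .97, .78, .07, .04, .64]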
![Page 797: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/797.jpg)
797
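Males
[Path diagram: AGE, SEVE, SEVNONE, Dep and Anx, with residuals e_s, e_sn, e_d and e_a. Estimates as extracted, in reading order: -.24, -.12, .52, -.17, -.12, .55, .97, .88, .88, .99, .74, -.08, -.08, .67]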
![Page 798: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/798.jpg)
798
Constraint
• sevnone -> dep
  – Constrained to be equal for males and females
• 1 restriction, 1 df
  – χ² = 1.3 – not significant
• 4 restrictions
  – 2 severity -> anx & dep
![Page 799: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/799.jpg)
799
• 4 restrictions, 4 df
  – χ² = 1.3, p = 0.014
• Parameters are not equal
![Page 800: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/800.jpg)
800
Missing Data: The big advantage
• SEM programs tend to deal with missing data
  – Multiple imputation
  – Full Information (Direct) Maximum Likelihood
    • Asymptotically equivalent
• Data can be MAR, not just MCAR
![Page 801: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/801.jpg)
801
Power: A Smaller Advantage
• Power for regression gets tricky with large models
• With SEM power is (relatively) easy
  – It’s all based on chi-square
  – Paper B14
![Page 802: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/802.jpg)
802
Lesson 16: Dealing with clustered data & longitudinal models
![Page 803: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/803.jpg)
803
The Independence Assumption
• In Lesson 8 we talked about independence
  – The residual of any one case should not tell you about the residual of any other case
• Particularly problematic when:
  – Data are clustered on the predictor variable
    • E.g. predictor is household size, cases are members of family
    • E.g. predictor is doctor training, outcome is patients of doctor
  – Data are longitudinal
    • Have people measured over time
      – It’s the same person!
![Page 804: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/804.jpg)
804
Clusters of Cases
• Problem with cluster (group) randomised studies
  – Or group effects
• Use Huber-White sandwich estimator
  – Tell it about the groups
  – Correction is made
  – Use complex samples in SPSS
![Page 805: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/805.jpg)
805
Complex Samples
• As with Huber-White for heteroscedasticity
  – Add a variable that tells it about the clusters
  – Put it into clusters
• Run GLM
  – As before
• Warning:
  – Need about 20 clusters for solutions to be stable
![Page 806: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/806.jpg)
806
Example
• People randomised by week to one of two forms of triage
  – Compare the total cost of treating each
• Ignore clustering
  – Difference is £2.40 per person, with 95% confidence intervals £0.58 to £4.22, p = 0.010
• Include clustering
  – Difference is still £2.40, with 95% CIs -£0.85 to £5.65, and p = 0.141
• Ignoring clustering led to type I error
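The pattern in this example (same estimate, wider interval) can be reproduced on a toy data set. The sketch below is a minimal Python illustration of a cluster-robust sandwich standard error for a two-group comparison. The data are made up, the function name is hypothetical, and no small-sample correction is applied, so the figures would not match what SPSS Complex Samples reports.

```python
from math import sqrt


def ols_with_cluster_se(y, x, cluster):
    """Fit y = b0 + b1*x; return b1 with its naive and cluster-robust SE."""
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # "Bread": inverse of X'X for the design matrix with columns [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]

    # Naive SE: assumes every residual is independent
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_naive = sqrt(sigma2 * inv[1][1])

    # "Meat": sum the score vectors within each cluster before squaring
    scores = {}
    for u, xi, g in zip(resid, x, cluster):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += u
        s[1] += xi * u
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]

    # Sandwich: bread * meat * bread
    bm = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    v = [[sum(bm[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return b1, se_naive, sqrt(v[1][1])


# Made-up data: four weeks (clusters), two patients per week, x = form of triage
y = [1, 1, 3, 3, 4, 4, 6, 6]
x = [0, 0, 0, 0, 1, 1, 1, 1]
week = [1, 1, 2, 2, 3, 3, 4, 4]

b1, se_naive, se_cluster = ols_with_cluster_se(y, x, week)
print(b1, round(se_naive, 3), round(se_cluster, 3))
```

Because patients in the same week are identical in this toy data, the cluster-robust standard error is larger than the naive one while the estimated difference is unchanged.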
![Page 807: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/807.jpg)
807
Longitudinal Research
• For comparing repeated measures
  – Clusters are people
  – Can model the repeated measures over time
• Data are usually short and fat
ID V1 V2 V3 V4
1 2 3 4 7
2 3 6 8 4
3 2 5 7 5
![Page 808: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/808.jpg)
808
Converting Data
• Change data to tall and thin
• Use Data, Restructure in SPSS
• Clusters are ID
ID V X
1 1 2
1 2 3
1 3 4
1 4 7
2 1 3
2 2 6
2 3 8
2 4 4
3 1 2
3 2 5
3 3 7
3 4 5
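The same restructuring can be written out in a few lines outside SPSS. The sketch below is a minimal Python illustration (the function name and the dictionary layout are choices made here, not part of the course) that turns the short-and-fat table from the previous slide into the tall-and-thin one above.

```python
def wide_to_long(rows, id_key="ID", prefix="V"):
    """Turn one row per person into one row per person per occasion."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                occasion = int(key[len(prefix):])  # "V3" becomes occasion 3
                long_rows.append((row[id_key], occasion, value))
    return long_rows


# The short-and-fat data: one row per person, one column per occasion
wide = [
    {"ID": 1, "V1": 2, "V2": 3, "V3": 4, "V4": 7},
    {"ID": 2, "V1": 3, "V2": 6, "V3": 8, "V4": 4},
    {"ID": 3, "V1": 2, "V2": 5, "V3": 7, "V4": 5},
]

for person, occasion, score in wide_to_long(wide):
    print(person, occasion, score)
```

Each output row carries the ID, so the ID column can then be used as the cluster variable.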
![Page 809: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/809.jpg)
809
(Simple) Example
• Use employee data.sav
  – Compare beginning salary and salary
  – Would normally use paired samples t-test
• Difference = $17,403, 95% CIs $16,427.407, $18,379.555
![Page 810: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/810.jpg)
810
Restructure the Data
• Do it again
  – With data tall and thin
• Complex GLM with Time as factor
  – ID as cluster
• Difference = $17,403, 95% CIs = $16,427.407, $18,379.555

ID  Time  Cash
1   1     $18,750
1   2     $21,450
2   1     $12,000
2   2     $21,900
3   1     $13,200
3   2     $45,000
![Page 811: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/811.jpg)
811
Interesting …
• That wasn’t very interesting
  – What is more interesting is when we have multiple measurements of the same people
• Can plot and assess trajectories over time
![Page 812: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/812.jpg)
812
Single Person Trajectory
[Plot: scores for a single person (+) against Time]
![Page 813: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/813.jpg)
813
Multiple Trajectories: What’s the Mean and SD?
[Plot: several individual trajectories against Time]
![Page 814: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/814.jpg)
814
Complex Trajectories
• An event occurs
  – Can have two effects:
  – A jump in the value
  – A change in the slope
• Event doesn’t have to happen at the same time for each person
  – Doesn’t have to happen at all
![Page 815: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/815.jpg)
815
[Plot: outcome against time, with Slope 1 before the event, a Jump where the Event Occurs, and Slope 2 afterwards]
![Page 816: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/816.jpg)
816
Parameterising
Time Event Time2 Outcome
1 0 0 12
2 0 0 13
3 0 0 14
4 0 0 15
5 0 0 16
6 1 0 10
7 1 1 9
8 1 2 8
9 1 3 7
![Page 817: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/817.jpg)
817
Draw the Line
What are the parameter estimates?
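One way to check an answer to this question is to code the Event and Time2 columns from the Parameterising table and see whether a candidate set of estimates reproduces the Outcome column. The sketch below is a minimal Python illustration: the function names are made up, and the event time of 6 is read off the table (the first row where Event is 1).

```python
def code_time(time, event_time):
    """Return (Event, Time2): an after-event dummy and time elapsed since the event."""
    event = 1 if time >= event_time else 0
    time2 = max(0, time - event_time)
    return event, time2


def predict(time, event_time, b0, b_time, b_event, b_time2):
    """Outcome = b0 + b_time*Time + b_event*Event + b_time2*Time2."""
    event, time2 = code_time(time, event_time)
    return b0 + b_time * time + b_event * event + b_time2 * time2


outcome = [12, 13, 14, 15, 16, 10, 9, 8, 7]  # Outcome column for Time = 1..9
candidate = dict(b0=11, b_time=1, b_event=-7, b_time2=-2)  # one candidate answer

fitted = [predict(t, 6, **candidate) for t in range(1, 10)]
print(fitted == outcome)
```

With this coding the Event coefficient is the jump relative to where the first line would have been at the event time, and the Time2 coefficient is the change in slope, so the slope after the event is the sum of the Time and Time2 coefficients.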
![Page 818: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/818.jpg)
818
Main Effects and Interactions
• Main effects
  – Intercept differences
• Moderator effects
  – Slope differences
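The distinction can be made concrete with a small prediction function. The sketch below is a minimal Python illustration with hypothetical coefficients: the group coefficient shifts the intercept, and the time-by-group coefficient shifts the slope.

```python
def predict(time, group, b0, b_time, b_group, b_inter):
    """Outcome = b0 + b_time*time + b_group*group + b_inter*time*group."""
    return b0 + b_time * time + b_group * group + b_inter * time * group


# Hypothetical coefficients, for illustration only
coefs = dict(b0=20.0, b_time=-0.5, b_group=3.0, b_inter=1.0)

# Main effect: difference between the groups' intercepts (at time = 0)
intercept_gap = predict(0, 1, **coefs) - predict(0, 0, **coefs)

# Moderator effect: difference between the groups' slopes over time
slope_group0 = predict(1, 0, **coefs) - predict(0, 0, **coefs)
slope_group1 = predict(1, 1, **coefs) - predict(0, 1, **coefs)
slope_gap = slope_group1 - slope_group0

print(intercept_gap, slope_gap)
```

The intercept gap recovers the group coefficient and the slope gap recovers the interaction coefficient.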
![Page 819: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/819.jpg)
819
Multilevel Models
• Fixed versus random effects
  – Fixed effects are fixed across individuals (or clusters)
  – Random effects have variance
• Levels
  – Level 1 – individual measurement occasions
  – Level 2 – higher order clusters
![Page 820: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/820.jpg)
820
More on Levels
• NHS direct study
  – Level 1 units: …………….
  – Level 2 units: ……………
• Widowhood food study
  – Level 1 units ……………
  – Level 2 units ……………
![Page 821: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/821.jpg)
821
More Flexibility
• Three levels:
  – Level 1: measurements
  – Level 2: people
  – Level 3: schools
![Page 822: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/822.jpg)
822
More Effects
• Variances and covariances of effects
• Level 1 and level 2 residuals
  – Makes R² difficult to talk about
• Outcome variable
  – Y_ij
    • The score of the ith person in the jth group
![Page 823: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/823.jpg)
823
Y i j
2.3 1 1
3.2 2 1
4.5 3 1
4.8 1 2
7.2 2 2
3.1 3 2
1.6 4 2
![Page 824: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/824.jpg)
824
Notation
• Notation gets a bit horrid
  – Varies a lot between books and programs
• We used to have b0 and b1
  – If fixed, that’s fine
  – If random, each person has their own intercept and slope
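Because notation varies, the following is only one common way of writing a model with a random intercept and a random slope for time. The symbols u, e, σ² and τ are choices made here, not the course's notation. Here i indexes measurement occasions (level 1) and j indexes people (level 2).

```latex
\begin{aligned}
Y_{ij} &= b_{0j} + b_{1j}\,\mathrm{time}_{ij} + e_{ij}, & e_{ij} &\sim N(0, \sigma^2) \\
b_{0j} &= b_0 + u_{0j}, \qquad b_{1j} = b_1 + u_{1j} \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} &\sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right)
\end{aligned}
```

If the intercept and slope are fixed, every u is zero and the model reduces to the familiar b0 and b1. If they are random, τ00 and τ11 are the variances of the intercepts and slopes across people, and τ01 is their covariance.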
![Page 825: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/825.jpg)
825
Standard Errors
• Intercept has standard errors
• Slopes have standard errors
• Random effects have variances
  – Those variances have standard errors
    • Is there statistically significant variation between higher level units (people)?
    • OR
    • Is everyone the same?
![Page 826: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/826.jpg)
826
Programs
• Since version 12
  – Can do this in SPSS
  – Can’t do anything really clever
• Menus
  – Completely unusable
  – Have to use syntax
![Page 827: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/827.jpg)
827
SPSS Syntax
MIXED relfd with time
  /fixed = time
  /random = intercept time | subject(id) covtype(un)
  /print = solution.
![Page 828: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/828.jpg)
828
SPSS Syntax
MIXED relfd with time
(relfd: outcome; time: continuous predictor)
![Page 829: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/829.jpg)
829
SPSS Syntax
MIXED relfd with time
  /fixed = time
Must specify effect as fixed first
![Page 830: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/830.jpg)
830
SPSS Syntax
MIXED relfd with time
  /fixed = time
  /random = intercept time | subject(id) covtype(un)
Specify random effects: intercept and time are random
SPSS assumes that your level 2 units are subjects, and needs to know the id variable
![Page 831: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/831.jpg)
831
SPSS Syntax
MIXED relfd with time
  /fixed = time
  /random = intercept time | subject(id) covtype(un)
Covariance matrix of random effects is unstructured. (Alternative is id – identity or vc – variance components.)
![Page 832: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/832.jpg)
832
SPSS Syntax
MIXED relfd with time
  /fixed = time
  /random = intercept time | subject(id) covtype(un)
  /print = solution.
Print the answer
![Page 833: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/833.jpg)
833
The Output
• Information criteria
  – We’ll come back

Information Criteria (a)
-2 Restricted Log Likelihood            64899.758
Akaike's Information Criterion (AIC)    64907.758
Hurvich and Tsai's Criterion (AICC)     64907.763
Bozdogan's Criterion (CAIC)             64940.134
Schwarz's Bayesian Criterion (BIC)      64936.134
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: relfd.
![Page 834: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/834.jpg)
834
Fixed Effects
• Not useful here, useful for interactions
Type III Tests of Fixed Effects (a)
Source      Numerator df   Denominator df   F          Sig.
Intercept   1              741              3251.877   .000
time        1              741.000          2.550      .111
a. Dependent Variable: relfd.
![Page 835: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/835.jpg)
835
Estimates of Fixed Effects
• Interpreted as regression equation
Estimates of Fixed Effects (a)
                                                                95% Confidence Interval
Parameter   Estimate   Std. Error   df        t        Sig.    Lower Bound   Upper Bound
Intercept   21.90      .38          741       57.025   .000    21.15         22.66
time        -.06       .04          741.000   -1.597   .111    -.14          .01
a. Dependent Variable: relfd.
![Page 836: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/836.jpg)
836
Covariance Parameters
Estimates of Covariance Parameters (a)
Parameter                                     Estimate    Std. Error
Residual                                      64.11577    1.0526353
Intercept + time [subject = id]   UN (1,1)    85.16791    5.7003732
                                  UN (2,1)    -4.53179    .5067146
                                  UN (2,2)    .7678319    .0636116
a. Dependent Variable: relfd.
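The covariance parameters are easier to read after a little arithmetic. The sketch below is a minimal Python illustration, with variable names chosen here. It computes a Wald z (estimate divided by standard error) for the intercept-slope covariance, the correlation between intercepts and slopes that the covariance implies, and the share of variance at time = 0 that lies between people.

```python
from math import sqrt

# Estimates and standard errors copied from the covariance parameters table
residual = (64.11577, 1.0526353)
var_intercept = (85.16791, 5.7003732)   # UN (1,1)
cov_int_slope = (-4.53179, 0.5067146)   # UN (2,1)
var_slope = (0.7678319, 0.0636116)      # UN (2,2)


def wald_z(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error


# Correlation between each person's intercept and slope
r_int_slope = cov_int_slope[0] / sqrt(var_intercept[0] * var_slope[0])

# Share of the variance at time = 0 that lies between people
icc_at_time0 = var_intercept[0] / (var_intercept[0] + residual[0])

z_cov = wald_z(*cov_int_slope)
print(round(z_cov, 2), round(r_int_slope, 2), round(icc_at_time0, 2))
```

The covariance is many standard errors from zero, and the negative correlation means that people who start higher tend to have more negative slopes.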
![Page 837: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/837.jpg)
837
Change Covtype to VC
• We know that this is wrong
  – The covariance of the effects was statistically significant
  – Can also see if it was wrong by comparing information criteria
• We have removed a parameter from the model
  – Model is worse
  – Model is more parsimonious
• Is it much worse, given the increase in parsimony?
![Page 838: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/838.jpg)
838
Information Criteria (lower is better)
                                        UN Model     VC Model
-2 Restricted Log Likelihood            64899.758    65041.891
Akaike's Information Criterion (AIC)    64907.758    65047.891
Hurvich and Tsai's Criterion (AICC)     64907.763    65047.894
Bozdogan's Criterion (CAIC)             64940.134    65072.173
Schwarz's Bayesian Criterion (BIC)      64936.134    65069.173
The information criteria are displayed in smaller-is-better forms.
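The comparison between the UN and VC models can be reproduced from the -2 restricted log likelihoods alone. The sketch below is a minimal Python illustration. The parameter counts (4 and 3) are inferred from the gap between AIC and -2LL in the tables rather than stated in the slides, and comparing restricted likelihoods is reasonable here because the two models have the same fixed effects.

```python
from math import erfc, sqrt

# -2 restricted log likelihoods copied from the UN and VC tables
neg2ll_un, neg2ll_vc = 64899.758, 65041.891
k_un, k_vc = 4, 3  # covariance parameters: residual + 3 (UN) or residual + 2 (VC)


def aic(neg2ll, k):
    """AIC in smaller-is-better form."""
    return neg2ll + 2 * k


aic_un = aic(neg2ll_un, k_un)
aic_vc = aic(neg2ll_vc, k_vc)

# Dropping the covariance removes one parameter, so the change in -2LL has 1 df
delta = neg2ll_vc - neg2ll_un
p_value = erfc(sqrt(delta / 2.0))  # upper tail of chi-square with 1 df

print(round(aic_un, 3), round(aic_vc, 3), round(delta, 3))
```

The change in -2LL is far larger than the 3.84 needed at the 5% level with 1 df, so the VC model is much worse given the single parameter saved.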
![Page 839: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/839.jpg)
839
Adding Bits
• So far, all a bit dull
• We want some more predictors, to make it more exciting
  – E.g. female
  – Add:
    relfd with time female
    /fixed = time female time * female
• What does the interaction term represent?
![Page 840: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/840.jpg)
840
Extending Models
• Models can be extended
  – Any kind of regression can be used
    • Logistic, multinomial, Poisson, etc.
  – More levels
    • Children within classes within schools
    • Measures within people within classes within prisons
  – Multiple membership / cross classified models
    • Children within households and classes, but households not nested within class
• Need a different program
  – E.g. MLwiN
![Page 841: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/841.jpg)
841
MLwiN Example (very quickly)
![Page 842: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/842.jpg)
842
Books
Singer, JD and Willett, JB (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, Oxford University Press.
Examples at: http://www.ats.ucla.edu/stat/SPSS/examples/alda/default.htm
![Page 843: 1 Theory of Regression. 2 The Course 16 (or so) lessons –Some flexibility Depends how we feel What we get through](https://reader031.vdocuments.us/reader031/viewer/2022012918/56649d295503460f949fe0b7/html5/thumbnails/843.jpg)
843
The End