
1

Theory of Regression

2

The Course

• 16 (or so) lessons– Some flexibility

• Depends how we feel• What we get through

3

Part I: Theory of Regression1. Models in statistics2. Models with more than one parameter:

regression3. Why regression?4. Samples to populations5. Introducing multiple regression6. More on multiple regression

4

Part 2: Application of regression7. Categorical predictor variables8. Assumptions in regression analysis9. Issues in regression analysis10.Non-linear regression11.Moderators (interactions) in regression12.Mediation and path analysisPart 3: Advanced Types of Regression13.Logistic Regression14.Poisson Regression15. Introducing SEM16. Introducing longitudinal multilevel models

5

House Rules

• Jeremy must remember– Not to talk too fast

• If you don’t understand– Ask – Any time

• If you think I’m wrong– Ask. (I’m not always right)

6

Learning New Techniques

• Best kind of data to learn a new technique– Data that you know well, and understand

• Your own data– In computer labs (esp later on)– Use your own data if you like

• My data – I’ll provide you with– Simple examples, small sample sizes

• Conceptually simple (even silly)

7

Computer Programs

• SPSS– Mostly

• Excel – For calculations

• GPower• Stata (if you like)• R (because it’s flexible and free)• Mplus (SEM, ML?)• AMOS (if you like)

8

9

10

Lesson 1: Models in statistics

Models, parsimony, error, mean, OLS estimators

11

What is a Model?

12

What is a model?

• Representation– Of reality– Not reality

• Model aeroplane represents a real aeroplane– If model aeroplane = real

aeroplane, it isn’t a model

13

• Statistics is about modelling– Representing and simplifying

• Sifting – What is important from what is not

important

• Parsimony – In statistical models we seek parsimony
– Parsimony = simplicity

14

Parsimony in Science

• A model should be:– 1: able to explain a lot– 2: use as few concepts as possible

• More it explains– The more you get

• Fewer concepts– The lower the price

• Is it worth paying a higher price for a better model?

15

A Simple Model

• Height of five individuals– 1.40m– 1.55m– 1.80m– 1.62m– 1.63m

• These are our DATA

16

A Little Notation

Y    The (vector of) data that we are modelling
Yi   The ith observation in our data.

Example: Y = (4, 5, 6, 7, 8), so Y2 = 5.

17

Greek letters represent the true value in the population.

β (beta)     Parameters in our model (population value)
β0           The value of the first parameter of our model in the population.
βj           The value of the jth parameter of our model, in the population.
ε (epsilon)  The error in the population model.

18

Normal letters represent the values in our sample. These are sample statistics, which are used to estimate population parameters.

b   A parameter in our model (a sample statistic)

e The error in our sample.

Y The data in our sample which we are trying to model.

19

Symbols on top change the meaning.

Y    The data in our sample which we are trying to model (repeated).
Ŷi   The estimated (predicted) value of Y for the ith case.
Ȳ    The mean of Y.

20

So β̂1 = b1.

I will use b1 (because it is easier to type)

21

• Not always that simple– some texts and computer programs

use

b = the parameter estimate (as we have used)

β (beta) = the standardised parameter estimate

SPSS does this.

22

A capital letter is the set (vector) of parameters/statistics

B Set of all parameters (b0, b1, b2, b3 … bp)

Rules are not used very consistently (even by me). Don't assume you know what someone means, without checking.

23

• We want a model – To represent those data

• Model 1:– 1.40m, 1.55m, 1.80m, 1.62m, 1.63m– Not a model

• A copy

– VERY unparsimonious

• Data: 5 statistics• Model: 5 statistics

– No improvement

24

• Model 2:– The mean (arithmetic mean)– A one parameter model

Ŷ = b0 = Ȳ = ΣYi / n   (summing over the i = 1 … n cases)

25

• Which, because we are lazy, can be written as

Ȳ = ΣY / n

26

The Mean as a Model

27

The (Arithmetic) Mean

• We all know the mean– The ‘average’– Learned about it at school– Forget (didn’t know) about how clever the

mean is

• The mean is:– An Ordinary Least Squares (OLS) estimator– Best Linear Unbiased Estimator (BLUE)

28

Mean as OLS Estimator

• Going back a step or two• MODEL was a representation of DATA

– We said we want a model that explains a lot– How much does a model explain?

DATA = MODEL + ERRORERROR = DATA - MODEL

– We want a model with as little ERROR as possible

29

• What is error?

Data (Y)   Model (b0 = mean)   Error (e)
1.63       1.60                 0.03
1.62       1.60                 0.02
1.80       1.60                 0.20
1.55       1.60                -0.05
1.40       1.60                -0.20

30

• How can we calculate the ‘amount’ of error?

• Sum of errors

ERROR = Σei = Σ(Yi − Ŷ) = Σ(Yi − b0)
      = 0.20 − 0.05 − 0.20 + 0.02 + 0.03 = 0

31

– 0 implies no ERROR

• Not the case

– Knowledge about ERROR is useful

• As we shall see later

32

• Sum of absolute errors– Ignore signs

ERROR = Σ|ei| = Σ|Yi − Ŷ| = Σ|Yi − b0|
      = 0.20 + 0.05 + 0.20 + 0.02 + 0.03 = 0.50

33

• Are small and large errors equivalent?– One error of 4– Four errors of 1

– The same?

– What happens with different data?• Y = (2, 2, 5)

– b0 = 2

– Not very representative

• Y = (2, 2, 4, 4)– b0 = any value from 2 - 4

– Indeterminate• There are an infinite number of solutions which would

satisfy our criteria for minimum error

34

• Sum of squared errors (SSE)

ERROR = Σei² = Σ(Yi − Ŷ)² = Σ(Yi − b0)²
      = 0.20² + 0.05² + 0.20² + 0.02² + 0.03² = 0.08

35

• Determinate– Always gives one answer

• If we minimise SSE– Get the mean

• Shown in graph– SSE plotted against b0

– Min value of SSE occurs when

– b0 = mean

36

[Figure: SSE plotted against candidate values of b0 (1.0 to 2.0). The curve reaches its minimum where b0 equals the mean.]
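A minimal sketch of that point (mine, not from the slides; plain Python): compute the SSE of the five heights for any candidate b0, and compare the mean with another value.

```python
# Heights used as the DATA example in Lesson 1.
heights = [1.40, 1.55, 1.80, 1.62, 1.63]

def sse(b0, data):
    """Sum of squared errors when the model is a single constant b0."""
    return sum((y - b0) ** 2 for y in data)

mean = sum(heights) / len(heights)        # 1.60
print(round(sse(mean, heights), 3))       # about 0.084 -- the minimum
print(round(sse(1.55, heights), 3))       # any other b0 gives a larger SSE
```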

37

The Mean as an OLS Estimate

38

Mean as OLS Estimate

• The mean is an Ordinary Least Squares (OLS) estimate– As are lots of other things

• This is exciting because– OLS estimators are BLUE– Best Linear Unbiased Estimators– Proven with Gauss-Markov Theorem

• Which we won’t worry about

39

BLUE Estimators• Best
– Minimum variance (of all possible unbiased estimators)
– Narrower distribution than other estimators• e.g. median, mode
• Linear– Linear predictions– For the mean– Linear (straight, flat) line
Ŷ = Ȳ

40

• Unbiased– Centred around true (population) values– Expected value = population value– Minimum is biased.

• Minimum in samples > minimum in population

• Estimators– Errrmm… they are estimators

• Also consistent– Sample approaches infinity, get closer

to population values– Variance shrinks

41

SSE and the Standard Deviation

• Tying up a loose end

SSE = Σ(Yi − Ŷ)²

σ = √( Σ(Yi − Ŷ)² / N )

s = √( Σ(Yi − Ŷ)² / (N − 1) )

42

• SSE closely related to SD• Sample standard deviation – s

– Biased estimator of population SD

• Population standard deviation - – Need to know the mean to calculate SD

• Reduces N by 1• Hence divide by N-1, not N

– Like losing one df

43

Proof

• That the mean minimises SSE– Not that difficult– As statistical proofs go

• Available in– Maxwell and Delaney – Designing

experiments and analysing data– Judd and McClelland – Data Analysis

(out of print?)

44

What’s a df?

• The number of parameters free to vary– When one is fixed

• Term comes from engineering– Movement available to structures

45

0 df: no variation available.
1 df: fix one corner, and the shape is fixed.

46

Back to the Data

• Mean has 5 (N) df– 1st moment
• Variance (SD) has N – 1 df– Mean has been fixed– 2nd moment– Can think of as the amount cases vary away from the mean

47

While we are at it …

• Skewness has N – 2 df– 3rd moment
• Kurtosis has N – 3 df– 4th moment

48

Parsimony and df

• Number of df remaining– Measure of parsimony

• Model which contained all the data– Has 0 df

– Not a parsimonious model

• Normal distribution– Can be described in terms of mean and standard deviation
• 2 parameters
– (z with 0 parameters)

49

Summary of Lesson 1

• Statistics is about modelling DATA– Models have parameters– Fewer parameters, more parsimony, better

• Models need to minimise ERROR– Best model, least ERROR – Depends on how we define ERROR – If we define error as sum of squared

deviations from predicted value– Mean is best MODEL

50

51

52

Lesson 2: Models with one more parameter -

regression

53

In Lesson 1 we said …

• Use a model to predict and describe data– Mean is a simple, one parameter model

54

More Models

Slopes and Intercepts

55

More Models• The mean is OK

– As far as it goes– It just doesn’t go very far– Very simple prediction, uses very little

information• We often have more information

than that– We want to use more information

than that

56

House Prices• In the UK, two of the largest

lenders (Halifax and Nationwide) compile house price indices– Predict the price of a house– Examine effect of different

circumstances

• Look at change in prices– Guides legislation

• E.g. interest rates, town planning

57

Predicting House Prices

Beds   £ (000s)
1      77
2      74
1      88
3      62
5      90
5      136
2      35
5      134
4      138
1      55

58

One Parameter Model• The mean

Ŷ = b0 = Ȳ = 88.9,   SSE = 11806.9

“How much is that house worth?” “£88,900” – we use 1 df to say that.

59

Adding More Parameters• We have more information than

this– We might as well use it– Add a linear function of number of

bedrooms (x1)

Ŷ = b0 + b1x1

60

Alternative Expression

• Estimate of Y (expected value of Y):  Ŷ = b0 + b1x1
• Value of Y:  Yi = b0 + b1xi1 + ei

61

Estimating the Model• We can estimate this model in four different,

equivalent ways– Provides more than one way of thinking about it

1. Estimating the slope which minimises SSE2. Examining the proportional reduction in

SSE3. Calculating the covariance 4. Looking at the efficiency of the predictions

62

Estimate the Slope to Minimise SSE

63

Estimate the Slope

• Stage 1– Draw a scatterplot– x-axis at mean

• Not at zero

• Mark errors on it– Called ‘residuals’– Sum and square these to find SSE

64

[Figure: scatterplot of price (£000s) against number of bedrooms, with a horizontal line drawn at the mean.]

65

[Figure: the same scatterplot, with the residuals from the mean marked on it.]

66

• Add another slope to the chart– Redraw residuals– Recalculate SSE– Move the line around to find slope

which minimises SSE

• Find the slope

67

• First attempt:

68

• Any straight line can be defined with two parameters– The location (height) of the slope

• b0

– Sometimes called

– The gradient of the slope • b1

69

• Gradient

1 unit

b1 units

70

• Height

b0 units

71

• Height• If we fix slope to zero

– Height becomes mean

– Hence mean is b0

• Height is defined as the point that the slope hits the y-axis– The constant– The y-intercept

72

• Why the constant?– b0x0

– Where x0 is 1.00 for every case• i.e. x0 is constant

• Implicit in SPSS– Some packages

force you to make it explicit

– (Later on we’ll need to make it explicit)

beds (x1)   x0   £ (000s)
1           1    77
2           1    74
1           1    88
3           1    62
5           1    90
5           1    136
2           1    35
5           1    134
4           1    138
1           1    55

73

• Why the intercept?– Where the regression line intercepts

the y-axis– Sometimes called y-intercept

74

Finding the Slope

• How do we find the values of b0 and b1?– To start with, we jiggle the values to find the best estimates which minimise SSE– Iterative approach

• Computer intensive – used to matter, doesn’t really any more

• (With fast computers and sensible search algorithms – more on that later)

75

• Start with– b0=88.9 (mean)– b1=10 (nice round number)

• SSE = 14948 – worse than it was

– b0=86.9, b1=10, SSE=13828– b0=66.9, b1=10, SSE=7029– b0=56.9, b1=10, SSE=6628– b0=46.9, b1=10, SSE=8228– b0=51.9, b1=10, SSE=7178– b0=51.9, b1=12, SSE=6179– b0=46.9, b1=14, SSE=5957– ……..

76

• Quite a long time later– b0 = 46.000372

– b1 = 14.79182

– SSE = 5921

• Gives the position of the – Regression line (or)– Line of best fit

• Better than guessing

• Not necessarily the only method– But it is OLS, so it is the best (it is

BLUE)
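A short sketch (mine, not from the slides; plain Python): the closed-form OLS solution lands on the same values the iterative search was crawling towards.

```python
# House-price data from the example (price in £000s).
beds  = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]

n = len(beds)
mx = sum(beds) / n                    # 2.9
my = sum(price) / n                   # 88.9
sxy = sum((x - mx) * (y - my) for x, y in zip(beds, price))
sxx = sum((x - mx) ** 2 for x in beds)

b1 = sxy / sxx                        # slope
b0 = my - b1 * mx                     # intercept
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(beds, price))
print(round(b0, 2), round(b1, 2), round(sse))   # 46.0 14.79 5921
```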

77

[Figure: price (£000s) plotted against number of bedrooms, showing the actual prices and the predicted prices from the regression line.]

78

• We now know – A house with no bedrooms is worth

£46,000 (??!)– Adding a bedroom adds £15,000

• Told us two things– Don’t extrapolate to meaningless

values of x-axis– Constant is not necessarily useful

• It is necessary to estimate the equation

79

Standardised Regression Line

• One big but:– Scale dependent

• Values change – £ to €, inflation

• Scales change– £, £000, £00?

• Need to deal with this

80

• Don’t express in ‘raw’ units– Express in SD units

– x1=1.72

– y=36.21

• b1 = 14.79

• We increase x1 by 1, and Ŷ increases by 14.79

SDsSDs 408.0)21.36/79.14(79.14

81

• Similarly, 1 unit of x1 = 1/1.72 SDs of x1
– Increase x1 by 1 SD (1.72 units)
– Ŷ increases by 14.79 × 1.72 units
• Put them both together:

β1 = b1 × σx1 / σy

82

• The standardised regression line– Change (in SDs) in Ŷ associated with a

change of 1 SD in x1

• A different route to the same answer– Standardise both variables (divide by

SD)– Find line of best fit

β1 = (14.79 × 1.72) / 36.21 = 0.706

83

• The standardised regression line has a special name

The Correlation Coefficient(r)

(r stands for ‘regression’, but more on that later)

• Correlation coefficient is a standardised regression slope– Relative change, in terms of SDs

84

Proportional Reduction in Error

85

Proportional Reduction in Error

• We might be interested in the level of improvement of the model– How much less error (as proportion)

do we have– Proportional Reduction in Error (PRE)

• Mean only– Error(model 0) = 11806

• Mean + slope– Error(model 1) = 5921

86

PRE = (ERROR(0) − ERROR(1)) / ERROR(0) = 1 − ERROR(1) / ERROR(0)

PRE = 1 − 5921 / 11806 = 0.4984

87

• But we squared all the errors in the first place – So we could take the square root– (It’s a shoddy excuse, but it makes the

point)

√0.4984 = 0.706

• This is the correlation coefficient• Correlation coefficient is the square

root of the proportion of variance explained

88

Standardised Covariance

89

Standardised Covariance• We are still iterating

– Need a ‘closed-form’– Equation to solve to get the parameter

estimates

• Answer is a standardised covariance– A variable has variance– Amount of ‘differentness’

• We have used SSE so far

90

• SSE varies with N– Higher N, higher SSE

• Divide by N– Gives SSE per person– (Actually N – 1, we have lost a df to

the mean)• The variance• Same as SD2

– We thought of SSE as a scattergram • Y plotted against X

– (repeated image follows)

91

[Figure: the price-against-bedrooms scatterplot, repeated.]

92

• Or we could plot Y against Y– Axes meet at the mean (88.9)– Draw a square for each point– Calculate an area for each square– Sum the areas

• Sum of areas – SSE

• Sum of areas divided by N– Variance

93

Plot of Y against Y

[Figure: Y (price) plotted against Y, axes running 0 to 180 and meeting at the mean (88.9).]

94

[Figure: the same Y-against-Y plot, with a square drawn for each point.]

Draw Squares

35 – 88.9 = -53.9

35 – 88.9 = -53.9

138 – 88.9 = 40.1

138 – 88.9 = 40.1

Area = -53.9 x -53.9

= 2905.21

Area = 40.1 x 40.1= 1608.1

95

• What if we do the same procedure– Instead of Y against Y– Y against X

• Draw rectangles (not squares)• Sum the area• Divide by N - 1• This gives us the variance of x with

y– The Covariance – Shortened to Cov(x, y)

96

97

55 − 88.9 = −33.9;  1 − 3 = −2;  Area = (−33.9) × (−2) = 67.8

138 − 88.9 = 49.1;  4 − 3 = 1;  Area = 49.1 × 1 = 49.1

98

• More formally (and easily)• We can state what we are doing as

an equation – Where Cov(x, y) is the covariance

• Cov(x,y)=44.2• What do points in different sectors

do to the covariance?

Cov(x, y) = Σ(x − x̄)(y − ȳ) / (N − 1)

99

• Problem with the covariance– Tells us about two things– The variance of X and Y– The covariance

• Need to standardise it– Like the slope

• Two ways to standardise the covariance– Standardise the variables first

• Subtract from mean and divide by SD

– Standardise the covariance afterwards

100

• First approach– Much more computationally

expensive• Too much like hard work to do by hand

– Need to standardise every value• Second approach

– Much easier– Standardise the final value only

• Need the combined variance– Multiply two variances– Find square root (were multiplied in

first place)

101

• Standardised covariance

r = Cov(x, y) / √( Var(x) × Var(y) ) = 44.2 / √(2.99 × 1311) = 0.706

102

• The correlation coefficient– A standardised covariance is a

correlation coefficient

r = Covariance / √( variance × variance )

103

• Expanded …

r = [ Σ(x − x̄)(y − ȳ) / (N − 1) ] / √( [Σ(x − x̄)² / (N − 1)] × [Σ(y − ȳ)² / (N − 1)] )

104

• This means …– We now have a closed form equation

to calculate the correlation– Which is the standardised slope– Which we can use to calculate the

unstandardised slope

105

We know that:  r = b1 × σx1 / σy

We know that:  b1 = r × σy / σx1

106

• So value of b1 is the same as the iterative approach

b1 = r × σy / σx1

b1 = 0.706 × 36.21 / 1.72 = 14.79

107

• The intercept– Just while we are at it

• The variables are centred at zero– We subtracted the mean from both

variables– Intercept is zero, because the axes

cross at the mean

108

• Add mean of y to the constant– Adjusts for centring y

• Subtract mean of x– But not the whole mean of x– Need to correct it for the slope

c = ȳ − b1x̄1

c = 88.9 − 14.79 × 2.9 = 46.00

• Naturally, the same

109

Accuracy of Prediction

110

One More (Last One)• We have one more way to

calculate the correlation– Looking at the accuracy of the

prediction

• Use the parameters– b0 and b1

– To calculate a predicted value for each case

111

• Plot actual price against predicted price– From the

model

Beds   Actual Price   Predicted Price
1      77             60.80
2      74             75.59
1      88             60.80
3      62             90.38
5      90             119.96
5      136            119.96
2      35             75.59
5      134            119.96
4      138            105.17
1      55             60.80

112

[Figure: predicted value plotted against actual value.]

113

• r = 0.706– The correlation

• Seems a futile thing to do– And at this stage, it is– But later on, we will see why

114

Some More Formulae• For hand calculation

• Point biserial

r = Σxy / √( Σx² × Σy² )   (x and y as deviations from their means)

rpb = (M1 − M0)√(PQ) / sdy

115

• Phi (φ)– Used for 2 dichotomous variables

Vote P Vote Q

Homeowner A: 19 B: 54

Not homeowner C: 60 D:53

rφ = (BC − AD) / √( (A+B)(C+D)(A+C)(B+D) )

116

• Problem with the phi correlation– Unless Px= Py (or Px = 1 – Py)

• Maximum (absolute) value is < 1.00• Tetrachoric can be used

• Rank (Spearman) correlation– Used where data are ranked

rs = 1 − 6Σd² / ( n(n² − 1) )
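A quick sketch (my own helper functions, plain Python) of the two hand-calculation formulas above: phi from a 2×2 table and Spearman's rank correlation.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient for a 2x2 table with cells A, B, C, D."""
    return (b * c - a * d) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def spearman(rank_x, rank_y):
    """Spearman correlation from two lists of ranks (no ties)."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(round(phi(19, 54, 60, 53), 3))         # the voting/homeowner table above
print(spearman([1, 2, 3, 4], [1, 3, 2, 4]))  # toy ranks: 0.8
```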

117

Summary• Mean is an OLS estimate

– OLS estimates are BLUE

• Regression line– Best prediction of DV from IV– OLS estimate (like mean)

• Standardised regression line– A correlation

118

• Four ways to think about a correlation– 1. Standardised regression line– 2. Proportional Reduction in Error (PRE)– 3. Standardised covariance– 4. Accuracy of prediction– (a worked sketch of all four follows)
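A sketch (plain Python, my variable names) computing r for the house-price data in the four ways just listed; all four come out at about 0.706.

```python
import math

beds  = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]
price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
n = len(beds)
mx, my = sum(beds) / n, sum(price) / n
sd = lambda v, m: math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
sdx, sdy = sd(beds, mx), sd(price, my)

b1 = sum((x - mx) * (y - my) for x, y in zip(beds, price)) / \
     sum((x - mx) ** 2 for x in beds)
b0 = my - b1 * mx

# 1. standardised regression slope
r1 = b1 * sdx / sdy
# 2. square root of the proportional reduction in error
sse0 = sum((y - my) ** 2 for y in price)
sse1 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(beds, price))
r2 = math.sqrt(1 - sse1 / sse0)
# 3. standardised covariance
cov = sum((x - mx) * (y - my) for x, y in zip(beds, price)) / (n - 1)
r3 = cov / (sdx * sdy)
# 4. accuracy of prediction: correlate Y with the predicted values
pred = [b0 + b1 * x for x in beds]
mp = sum(pred) / n
r4 = (sum((p - mp) * (y - my) for p, y in zip(pred, price)) / (n - 1)) / \
     (sd(pred, mp) * sdy)

print([round(r, 3) for r in (r1, r2, r3, r4)])   # all about 0.706
```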

119

120

121

Lesson 3: Why Regression?

A little aside, where we look at why regression has such a

curious name.

122

Regression

The or an act of regression; reversion; return towards the

mean; return to an earlier stage of development, as in an adult’s or an adolescent’s behaving like a child(From Latin gradi, to go)

• So why name a statistical technique which is about prediction and explanation?

123

• Francis Galton – Charles Darwin’s cousin– Studying heritability

• Tall fathers have shorter sons• Short fathers have taller sons

– ‘Filial regression toward mediocrity’ – Regression to the mean

124

• Galton thought this was biological fact– Evolutionary basis?

• Then did the analysis backward– Tall sons have shorter fathers– Short sons have taller fathers

• Regression to the mean– Not biological fact, statistical artefact

125

Other Examples

• Secrist (1933): The Triumph of Mediocrity in Business

• Second albums often tend to not be as good as first

• Sequel to a film is not as good as the first one

• ‘Curse of Athletics Weekly’• Parents think that punishing bad

behaviour works, but rewarding good behaviour doesn’t

126

Pair Link Diagram

• An alternative to a scatterplot

x y

127

r=1.00

[Pair link diagram: each case's x is joined by a line to its y.]

128


r=0.00

129

From Regression to Correlation

• Where do we predict an individual’s score on y will be, based on their score on x?– Depends on the correlation

• r = 1.00 – we know exactly where they will be

• r = 0.00 – we have no idea• r = 0.50 – we have some idea

130

x y

r=1.00

Starts here

Will end up here

131

x y

r=0.00

Could end anywhere here

Starts here

132

r=0.50

x y

Starts here

Probably end

somewhere here

133

Galton Squeeze Diagram

• Don’t show individuals– Show groups of individuals, from the

same (or similar) starting point– Shows regression to the mean

134

x y

r=0.00

Group starts here

Ends here

Group starts here

135

x y

r=0.50

136

x y

r=1.00

137

x y

1 unit r units

• Correlation is amount of regression that doesn’t occur

138

x y

• No regression• r=1.00

139

• Some regression• r=0.50

x y

140

r=0.00

x y

• Lots (maximum) regression• r=0.00

141

Formula

ẑy = r × zx
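A small simulation sketch (assumptions mine, plain Python): with standardised scores and a true correlation r, the predicted ẑy is r × zx, so a group that is extreme on x is expected to finish nearer the mean on y – regression to the mean.

```python
import math
import random

random.seed(1)
r, n = 0.5, 100_000
pairs = []
for _ in range(n):
    zx = random.gauss(0, 1)
    zy = r * zx + math.sqrt(1 - r ** 2) * random.gauss(0, 1)
    pairs.append((zx, zy))

# Take the group that starts out extreme on x (more than 1.5 SDs above the mean)
tall = [(zx, zy) for zx, zy in pairs if zx > 1.5]
mean_zx = sum(zx for zx, _ in tall) / len(tall)
mean_zy = sum(zy for _, zy in tall) / len(tall)
print(round(mean_zx, 2), round(mean_zy, 2))   # mean zy is about r times mean zx
```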

142

Conclusion

• Regression towards mean is statistical necessity

regression = perfection – correlation• Very non-intuitive• Interest in regression and correlation

– From examining the extent of regression towards mean

– By Pearson – worked with Galton– Stuck with curious name

• See also Paper B3

143

144

145

Lesson 4: Samples to Populations – Standard Errors and Statistical

Significance

146

The Problem• In Social Sciences

– We investigate samples• Theoretically

– Randomly taken from a specified population

– Every member has an equal chance of being sampled

– Sampling one member does not alter the chances of sampling another

• Not the case in (say) physics, biology, etc.

147

Population

• But it’s the population that we are interested in– Not the sample– Population statistic represented with

Greek letter– Hat means ‘estimate’

μ̂x = x̄        β̂ = b

148

• Sample statistics (e.g. mean) estimate population parameters

• Want to know– Likely size of the parameter– If it is > 0

149

Sampling Distribution

• We need to know the sampling distribution of a parameter estimate– How much does it vary from sample to

sample

• If we make some assumptions– We can know the sampling distribution

of many statistics– Start with the mean

150

Sampling Distribution of the Mean

• Given– Normal distribution– Random sample– Continuous data

• Mean has a known sampling distribution– Repeatedly sampling will give a known

distribution of means– Centred around the true (population)

mean (μ)

151

Analysis Example: Memory• Difference in memory for different

words

– 10 participants given a list of 30 words to learn, and then tested

– Two types of word

• Abstract: e.g. love, justice

• Concrete: e.g. carrot, table

152

Concrete   Abstract   Diff (x)
12         4          8
11         7          4
4          6          -2
9          12         -3
8          6          2
12         10         2
9          8          1
8          5          3
12         10         2
8          4          4

N = 10,   x̄ = 2.1,   σx = 3.11

153

Confidence Intervals

• This means– If we know the mean in our sample– We can estimate where the mean in the population (μ) is likely to be

• Using– The standard error (se) of the mean– Represents the standard deviation of

the sampling distribution of the mean

154

[Normal curve: 1 SD either side of the mean contains 68%; almost 2 SDs contain 95%.]

155

• We know the sampling distribution of the mean– t distributed– Normal with large N (>30)
• We know the range within which means from other samples will fall– Therefore the likely range of μ

se(x̄) = σx / √n

156

• Two implications of equation– Increasing N decreases SE

• But only a bit

– Decreasing SD decreases SE • Calculate Confidence Intervals

– From standard errors• 95% is a standard level of CI

– 95% of samples the true mean will lie within the 95% CIs

– In large samples: 95% CI = 1.96 SE– In smaller samples: depends on t

distribution (df=N-1=9)

157

x̄ = 2.1,   σx = 3.11,   N = 10

se(x̄) = σx / √n = 3.11 / √10 = 0.98

158

95% CI = x̄ ± 2.26 × 0.98 = 2.1 ± 2.22

CI lower = −0.12,   CI upper = 4.32
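A sketch (scipy assumed available; not part of the slides) reproducing the memory example: standard error, 95% CI, and the t test of the mean difference against 0.

```python
import math
from scipy import stats

diff = [8, 4, -2, -3, 2, 2, 1, 3, 2, 4]       # concrete minus abstract
n = len(diff)
mean = sum(diff) / n                           # 2.1
sd = math.sqrt(sum((d - mean) ** 2 for d in diff) / (n - 1))   # 3.11
se = sd / math.sqrt(n)                         # 0.98

tcrit = stats.t.ppf(0.975, df=n - 1)           # 2.26
print(mean - tcrit * se, mean + tcrit * se)    # about -0.12 to 4.32

t = mean / se                                  # 2.14
p = 2 * stats.t.sf(abs(t), df=n - 1)           # about 0.06
print(round(t, 2), round(p, 3))
```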

159

What is a CI?

• (For 95% CI): • 95% chance that the true

(population) value lies within the confidence interval?

• 95% of samples, true mean will land within the confidence interval?

160

Significance Test

• Probability that μ is a certain value– Almost always 0
• Doesn't have to be though
• We want to test the hypothesis that the difference is equal to 0– i.e. find the probability of this difference occurring in our sample IF μ = 0– (Not the same as the probability that μ = 0)

161

• Calculate SE, and then t– t has a known sampling distribution– Can test probability that a certain

value is included

t = x̄ / se(x̄) = 2.1 / 0.98 = 2.14,   p = 0.061

162

Other Parameter Estimates

• Same approach– Prediction, slope, intercept, predicted

values– At this point, prediction and slope are

the same• Won’t be later on

• We will look at one predictor only– More complicated with > 1

163

Testing the Degree of Prediction

• Prediction is correlation of Y with Ŷ– The correlation – when we have one IV

• Use F, rather than t• Started with SSE for the mean only

– This is SStotal

– Divide this into SSresidual

– SSregression

• SStot = SSreg + SSres

164

F = (SSreg / df1) / (SSres / df2),   df1 = k,   df2 = N − k − 1

165

• Back to the house prices– Original SSE (SStotal) = 11806

– SSresidual = 5921• What is left after our model

– SSregression = 11806 – 5921 = 5885• What our model explains

• Slope = 14.79• Intercept = 46.0• r = 0.706

166

F = (SSreg / df1) / (SSres / df2)
  = (5885 / 1) / (5921 / (10 − 1 − 1)) = 7.95

df1 = k = 1,   df2 = N − k − 1 = 8

167

• F = 7.95, df = 1, 8, p = 0.02– Can reject H0

• H0: Prediction is not better than chance

– A significant effect
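A quick sketch (scipy assumed; not part of the slides) of how that F and p come out of the sums of squares:

```python
from scipy import stats

ss_total, ss_res = 11806.0, 5921.0
ss_reg = ss_total - ss_res           # 5885
k, n = 1, 10
df1, df2 = k, n - k - 1              # 1 and 8

F = (ss_reg / df1) / (ss_res / df2)  # about 7.95
p = stats.f.sf(F, df1, df2)          # about 0.02
print(round(F, 2), round(p, 3))
```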

168

Statistical Significance:What does a p-value (really)

mean?

169

A Quiz

• Six questions, each true or false• Write down your answers (if you like)

• An experiment has been done. Carried out perfectly. All assumptions perfectly satisfied. Absolutely no problems.

• P = 0.01– Which of the following can we say?

170

1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).

171

2. You have found the probability of the null hypothesis being true.

172

3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).

173

4. You can deduce the probability of the experimental hypothesis being true.

174

5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.

175

6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.

176

OK, What is a p-value

• Cohen (1994)“[a p-value] does not tell us what we

want to know, and we so much want to know what we want to

know that, out of desperation, we nevertheless believe it does” (p

997).

177

OK, What is a p-value

• Sorry, didn’t answer the question• It’s The probability of obtaining a

result as or more extreme than the result we have in the study, given that the null hypothesis is true

• Not probability the null hypothesis is true

178

A Bit of Notation

• Not because we like notation– But we have to say a lot less

• Probability – P• Null hypothesis is true – H• Result (data) – D• Given - |

179

What’s a P Value

• P(D|H)– Probability of the data occurring if the

null hypothesis is true• Not• P(H|D)

– Probability that the null hypothesis is true, given that we have the data = p(H)

• P(H|D) ≠ P(D|H)

180

• What is probability you are prime minister– Given that you are british– P(M|B)– Very low

• What is probability you are British– Given you are prime minister– P(B|M)– Very high

• P(M|B) ≠ P(B|M)

181

• There’s been a murder– Someone bumped off a statto for talking

too much

• The police have DNA• The police have your DNA

– They match(!)

• DNA matches 1 in 1,000,000 people• What’s the probability you didn’t do

the murder, given the DNA match (H|D)

182

• Police say:– P(D|H) = 1/1,000,000

• Luckily, you have Jeremy on your defence team

• We say:– P(D|H) ≠ P(H|D)

• Probability that someone matches the DNA, who didn’t do the murder– Incredibly high

183

Back to the Questions

• Haller and Kraus (2002)– Asked those questions of groups in

Germany– Psychology Students– Psychology lecturers and professors

(who didn’t teach stats)– Psychology lecturers and professors

(who did teach stats)

184

1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).

• True• 34% of students • 15% of professors/lecturers, • 10% of professors/lecturers teaching

statistics

• False• We have found evidence against

the null hypothesis

185

2. You have found the probability of the null hypothesis being true.

– 32% of students – 26% of professors/lecturers– 17% of professors/lecturers teaching

statistics

• False• We don’t know

186

3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).

– 20% of students – 13% of professors/lecturers– 10% of professors/lecturers teaching

statistics

• False

187

4. You can deduce the probability of the experimental hypothesis being true.

– 59% of students– 33% of professors/lecturers– 33% of professors/lecturers teaching

statistics

• False

188

5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.

• 68% of students• 67% of professors/lecturers• 73% of professors/lecturers teaching statistics

• False• Can be worked out

– P(replication)

189

6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.

– 41% of students– 49% of professors/lecturers– 37% of professors/lecturers teaching statistics • False• Another tricky one

– It can be worked out

190

One Last Quiz

• I carry out a study– All assumptions perfectly satisfied– Random sample from population– I find p = 0.05

• You replicate the study exactly– What is probability you find p < 0.05?

191

• I carry out a study– All assumptions perfectly satisfied– Random sample from population– I find p = 0.01

• You replicate the study exactly– What is probability you find p < 0.05?

192

• Significance testing creates boundaries and gaps where none exist.

• Significance testing means that we find it hard to build upon knowledge– we don’t get an accumulation of

knowledge

193

• Yates (1951) "the emphasis given to formal tests of

significance ... has resulted in ... an undue concentration of effort by mathematical statisticians on investigations of tests of

significance applicable to problems which are of little or no practical importance ... and ... it has caused scientific research workers to pay undue attention to the

results of the tests of significance ... and too little to the estimates of the magnitude

of the effects they are investigating

194

Testing the Slope

• Same idea as with the mean– Estimate 95% CI of slope– Estimate significance of difference

from a value (usually 0)

• Need to know the sd of the slope– Similar to SD of the mean

195

sy.x = √( Σ(Y − Ŷ)² / (N − k − 1) ) = √( SSres / (N − k − 1) )

sy.x = √(5921 / 8) = 27.2

196

• Similar to equation for SD of mean• Then we need standard error

- Similar (ish)• When we have standard error

– Can go on to 95% CI– Significance of difference

197

se(by.x) = sy.x / √( Σ(x − x̄)² ) = 27.2 / √26.9 = 5.24

198

• Confidence Limits• 95% CI– t dist with N − k − 1 = 8 df gives 2.31– CI = 14.8 ± 2.31 × 5.24 = 14.8 ± 12.1
• 95% confidence limits: 14.8 − 12.1 = 2.7 and 14.8 + 12.1 = 26.9

199

• Significance of difference from zero– i.e. probability of getting this result if β = 0
• Not the probability that β = 0

t = b / se(b) = 2.81,   df = N − k − 1 = 8,   p = 0.02

• This probability is (of course) the same as the value for the prediction
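A sketch (plain Python plus scipy, assumptions mine) of the slope's standard error, 95% CI and t test, using the quantities from the house-price model:

```python
import math
from scipy import stats

ss_res, n, k = 5921.0, 10, 1
s_yx = math.sqrt(ss_res / (n - k - 1))        # 27.2
sxx = 26.9                                    # sum of (x - mean(x))^2 for beds
se_b = s_yx / math.sqrt(sxx)                  # 5.24

b1 = 14.79
tcrit = stats.t.ppf(0.975, df=n - k - 1)      # 2.31
print(b1 - tcrit * se_b, b1 + tcrit * se_b)   # about 2.7 to 26.9

t = b1 / se_b                                 # about 2.8
p = 2 * stats.t.sf(t, df=n - k - 1)           # about 0.02
print(round(t, 2), round(p, 2))
```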

200

Testing the Standardised Slope (Correlation)

• Correlation is bounded between –1 and +1– Does not have symmetrical distribution,

except around 0• Need to transform it

– Fisher z’ transformation – approximately normal

z′ = 0.5 [ ln(1 + r) − ln(1 − r) ]

SEz′ = 1 / √(n − 3)

201

• 95% CIs – 0.879 – 1.96 * 0.38 = 0.13– 0.879 + 1.96 * 0.38 = 1.62

z′ = 0.5 [ ln(1 + 0.706) − ln(1 − 0.706) ] = 0.879

SEz′ = 1 / √(10 − 3) = 0.38

202

• Transform back to correlation

• 95% CIs = 0.13 to 0.92• Very wide

– Small sample size– Maybe that’s why CIs are not

reported?

r = ( e^(2z′) − 1 ) / ( e^(2z′) + 1 )

203

Using Excel

• Functions in excel– Fisher() – to carry out Fisher

transformation– Fisherinv() – to transform back to

correlation
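The same transformation in a short Python sketch (mine; Excel's FISHER() and FISHERINV() do the equivalent job):

```python
import math

r, n = 0.706, 10
z = 0.5 * (math.log(1 + r) - math.log(1 - r))   # 0.879
se = 1 / math.sqrt(n - 3)                       # about 0.38

lo_z, hi_z = z - 1.96 * se, z + 1.96 * se       # about 0.13 and 1.62
back = lambda z: (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)
print(round(back(lo_z), 2), round(back(hi_z), 2))   # close to the 0.13 and 0.92 quoted above
```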

204

The Others

• Same ideas for calculation of CIs and SEs for – Predicted score– Gives expected range of values given

X

• Same for intercept– But we have probably had enough

205

Lesson 5: Introducing Multiple Regression

206

Residuals• We said

Y = b0 + b1x1

• We could have saidYi = b0 + b1xi1 + ei

• We ignored the i on the Y• And we ignored the ei

– It’s called error, after all• But it isn’t just error

– Trying to tell us something

207

What Error Tells Us

• Error tells us that a case has a different score for Y than we predict– There is something about that case

• Called the residual– What is left over, after the model

• Contains information– Something is making the residual 0– But what?

208

[Figure: price against number of bedrooms with actual and predicted prices; one house sits well above the line (swimming pool?) and one well below (unpleasant neighbours?).]

209

• The residual (+ the mean) is the value of Y

If all cases were equal on X• It is the value of Y, controlling for

X

• Other words:– Holding constant– Partialling– Residualising– Conditioned on

210

Pred61766190

12012076

12010561

Adj. Value1059062

11711973

129755695

Res-16

2-272830

-1641

-14-33

6

Beds £ (000s)1 772 741 883 625 905 1362 355 1344 1381 55

211

• Sometimes adjustment is enough on its own– Measure performance against criteria

• Teenage pregnancy rate– Measure pregnancy and abortion rate in areas– Control for socio-economic deprivation, and

anything else important– See which areas have lower teenage

pregnancy and abortion rate, given same level of deprivation

• Value added education tables– Measure school performance– Control for initial intake

212

Control?• In experimental research

– Use experimental control– e.g. same conditions, materials, time

of day, accurate measures, random assignment to conditions

• In non-experimental research– Can’t use experimental control– Use statistical control instead

213

Analysis of Residuals

• What predicts differences in crime rate – After controlling for socio-economic

deprivation– Number of police?– Crime prevention schemes?– Rural/Urban proportions?– Something else

• This is what regression is about

214

• Exam performance– Consider number of books a student

read (books)– Number of lectures (max 20) a

student attended (attend)

• Books and attend as IV, grade as DV

215

First 10 cases:

Books   Attend   Grade
0       9        45
1       15       57
0       10       45
2       16       51
4       10       65
4       20       88
1       11       44
4       20       87
3       15       89
0       15       59

216

• Use books as IV– R=0.492, F=12.1, df=1, 28, p=0.001

– b0=52.1, b1=5.7

– (Intercept makes sense)

• Use attend as IV– R=0.482, F=11.5, df=1, 28, p=0.002

– b0=37.0, b1=1.9

– (Intercept makes less sense)

217

[Figure: grade (30–100) plotted against books read (0–5).]

218

[Figure: grade (30–100) plotted against lectures attended (5–21).]

219

Problem• Use R2 to give proportion of shared

variance– Books = 24%– Attend = 23%

• So we have explained 24% + 23% = 47% of the variance– NO!!!!!

220

• Correlation of books and attend is (unsurprisingly) not zero– Some of the variance that books shares

with grade, is also shared by attend

• Look at the correlation matrix

          BOOKS   ATTEND   GRADE
BOOKS     1
ATTEND    0.44    1
GRADE     0.49    0.48     1

221

• I have access to 2 cars• My wife has access to 2 cars

– We have access to four cars?– No. We need to know how many of my

2 cars are the same cars as her 2 cars• Similarly with regression

– But we can do this with the residuals– Residuals are what is left after (say)

books– See of residual variance is explained

by attend– Can use this new residual variance to

calculate SSres, SStotal and SSreg

222

• Well. Almost.– This would give us correct values for

SS– Would not be correct for slopes, etc

• Assumes that the variables have a causal priority– Why should attend have to take what

is left from books?– Why should books have to take what

is left by attend?

• Use OLS again

223

• Simultaneously estimate 2 parameters– b1 and b2

– Y = b0 + b1x1 + b2x2

– x1 and x2 are IVs

• Not trying to fit a line any more– Trying to fit a plane

• Can solve iteratively– Closed form equations better– But they are unwieldy

224

x1

x2

y

3D scatterplot(2points only)

225

x1

x2

y

b0

b1

b2

226

(Really) Ridiculous Equations

b1 = [ Σ(x2 − x̄2)² Σ(x1 − x̄1)(y − ȳ) − Σ(x1 − x̄1)(x2 − x̄2) Σ(x2 − x̄2)(y − ȳ) ]
     / [ Σ(x1 − x̄1)² Σ(x2 − x̄2)² − ( Σ(x1 − x̄1)(x2 − x̄2) )² ]

b2 = [ Σ(x1 − x̄1)² Σ(x2 − x̄2)(y − ȳ) − Σ(x1 − x̄1)(x2 − x̄2) Σ(x1 − x̄1)(y − ȳ) ]
     / [ Σ(x1 − x̄1)² Σ(x2 − x̄2)² − ( Σ(x1 − x̄1)(x2 − x̄2) )² ]

b0 = ȳ − b1x̄1 − b2x̄2

227

• The good news– There is an easier way

• The bad news– It involves matrix algebra

• The good news– We don’t really need to know how to

do it

• The bad news – We need to know it exists

228

A Quick Guide to Matrix Algebra

(I will never make you do it again)

229

Very Quick Guide to Matrix Algebra

• Why?– Matrices make life much easier in

multivariate statistics– Some things simply cannot be done

without them– Some things are much easier with them

• If you can manipulate matrices– you can specify calculations v. easily– e.g. AA’ = sum of squares of a column

• Doesn’t matter how long the column

230

• A scalar is a numberA scalar: 4

• A vector is a row or column of numbers

11

5

A row vector:

A column vector:

7842

231

• A vector is described as rows x columns

– Is a 1 4 vector

– Is a 2 1 vector– A number (scalar) is a 1 1 vector

7842

11

5

232

• A matrix is a rectangle, described as rows x columns

87251

35754

87562

• Is a 3 x 5 matrix

• Matrices are referred to with bold capitals

- A is a matrix

233

• Correlation matrices and covariance matrices are special – They are square and symmetrical– Correlation matrix of books, attend

and grade

( 1.00   0.44   0.49 )
( 0.44   1.00   0.48 )
( 0.49   0.48   1.00 )

234

• Another special matrix is the identity matrix I– A square matrix, with 1 in the

diagonal and 0 in the off-diagonal

I = ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )

– Note that this is a correlation matrix, with correlations all = 0

235

Matrix Operations

• Transposition– A matrix is transposed by putting it

on its side – Transpose of A is A’

6

5

7

'

657

A

A

236

• Matrix multiplication– A matrix can be multiplied by a scalar,

a vector or a matrix– Not commutative– AB BA– To multiply AB

• Number of rows in A must equal number of columns in B

237

• Matrix by vector

141

99

33

4

3

2

231917

13117

532

238

• Matrix by matrix

( a b ) ( e f )  =  ( ae + bg   af + bh )
( c d ) ( g h )     ( ce + dg   cf + dh )

( 2 3 ) ( 2 3 )  =  ( 4 + 12    6 + 15 )  =  ( 16  21 )
( 5 7 ) ( 4 5 )     ( 10 + 28   15 + 35 )    ( 38  50 )

239

• Multiplying by the identity matrix– Has no effect – Like multiplying by 1

( 2 3 ) ( 1 0 )  =  ( 2 3 )
( 5 7 ) ( 0 1 )     ( 5 7 )

AI = A

240

• The inverse of J is: 1/J• J x 1/J = 1• Same with matrices

– Matrices have an inverse– Inverse of A is A-1

– AA-1=I

• Inverting matrices is dull– We will do it once– But first, we must calculate the

determinant

241

• The determinant of A is |A|• Determinants are important in

statistics– (more so than the other matrix

algebra)

• We will do a 2x2 – Much more difficult for larger

matrices

242

A = ( a b )        |A| = ad − cb
    ( c d )

A = ( 1.0  0.3 )   |A| = 1 × 1 − 0.3 × 0.3 = 0.91
    ( 0.3  1.0 )

243

• Determinants are important because– Needs to be above zero for regression

to work– Zero or negative determinant of a

correlation/covariance matrix means something wrong with the data• Linear redundancy

• Described as:– Not positive definite– Singular (if determinant is zero)

• In different error messages

244

• Next, the adjoint

A = ( a b )        adj A = (  d  −b )
    ( c d )                ( −c   a )

• Now:  A⁻¹ = (1 / |A|) × adj A

245

• Find A-1

A = ( 1.0  0.3 )      |A| = 0.91
    ( 0.3  1.0 )

A⁻¹ = (1 / 0.91) ( 1.0  −0.3 )  =  (  1.10  −0.33 )
                 ( −0.3  1.0 )     ( −0.33   1.10 )
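A numpy sketch (mine) of the same 2×2 example: determinant, inverse, and the check that A times its inverse gives the identity matrix.

```python
import numpy as np

A = np.array([[1.0, 0.3],
              [0.3, 1.0]])

print(np.linalg.det(A))             # about 0.91
print(np.linalg.inv(A).round(2))    # [[ 1.1  -0.33] [-0.33  1.1 ]]
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # True
```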

246

Matrix Algebra with Correlation Matrices

247

Determinants

• Determinant of a correlation matrix– The volume of ‘space’ taken up by

the (hyper) sphere that contains all of the points

1.0 0.0

0.0 1.0

1.0

A

A

248

1.0 0.0

0.0 1.0

1.0

A

A

X X

X

X X

249

1.0 1.0

1.0 1.0

0.0

A

A

X

X

X

250

Negative Determinant

• Points take up less than no space– Correlation matrix cannot exist – Non-positive definite matrix

251

Sometimes Obvious

A = ( 1.0  1.2 )      |A| = −0.44
    ( 1.2  1.0 )

252

Sometimes Obvious (If You Think)

A = ( 1     0.9   0.9 )
    ( 0.9   1    −0.9 )      |A| = −2.88
    ( 0.9  −0.9   1   )

253

Sometimes No Idea

1.00 0.76 0.40

0.76 1 0.30

0.40 0.30 1

A

0.01A 1.00 0.75 0.40

0.75 1 0.30

0.40 0.30 1

A

0.0075A

254

Multiple R for Each Variable

• Diagonal of inverse of correlation matrix– Used to calculate multiple R

– Call elements aij

R²i.123…k = 1 − 1 / aii

255

Regression Weights

• Where i is DV• j is IV

bij = −aij / aii

256

Back to the Good News• We can calculate the standardised parameters as

B = Rxx⁻¹ Rxy

• Where – B is the vector of regression weights– Rxx⁻¹ is the inverse of the correlation matrix of the independent (x) variables– Rxy is the vector of correlations of the x variables with y– Now do exercise 3.2 (a numpy sketch follows)
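A numpy sketch (mine, not the exercise answer sheet) of B = Rxx⁻¹ Rxy, using the books/attend/grade correlations quoted earlier (0.44, 0.49, 0.48):

```python
import numpy as np

Rxx = np.array([[1.00, 0.44],
                [0.44, 1.00]])       # correlations among the IVs
Rxy = np.array([0.49, 0.48])         # correlations of each IV with grade

B = np.linalg.inv(Rxx) @ Rxy         # standardised regression weights
print(B.round(3))                    # roughly [0.35, 0.33]
```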

257

One More Thing

• The whole regression equation can be described with matrices– very simply

Y = XB + E

258

• Where– Y = vector of DV– X = matrix of IVs– B = vector of coefficients

• Go all the way back to our example

259

Written out for the ten cases (columns of X: constant x0, books, attend):

Grade (Y)   x0   Books   Attend        b        e
45          1    0       9             b0       e1
57          1    1       15            b1       e2
45          1    0       10            b2       e3
51          1    2       16                     e4
65          1    4       10                     e5
88          1    4       20                     e6
44          1    1       11                     e7
87          1    4       20                     e8
89          1    3       15                     e9
59          1    0       15                     e10

Y = Xb + e

260

59

89

87

44

88

65

51

45

57

45

1501

1531

2041

1111

2041

1041

1621

1001

511

901

10

9

8

7

6

5

4

3

2

1

2

1

0

e

e

e

e

e

e

e

e

e

e

b

b

b

The constant – literally a constant. Could be any number, but it is most

convenient to make it 1. Used to ‘capture’ the

intercept.

261

59

89

87

44

88

65

51

45

57

45

1501

1531

2041

1111

2041

1041

1621

1001

511

901

10

9

8

7

6

5

4

3

2

1

2

1

0

e

e

e

e

e

e

e

e

e

e

b

b

bThe matrix of values for IVs (books and

attend)

262

59

89

87

44

88

65

51

45

57

45

1501

1531

2041

1111

2041

1041

1621

1001

511

901

10

9

8

7

6

5

4

3

2

1

2

1

0

e

e

e

e

e

e

e

e

e

e

b

b

bThe parameter

estimates. We are trying to find the

best values of these.

263

59

89

87

44

88

65

51

45

57

45

1501

1531

2041

1111

2041

1041

1621

1001

511

901

10

9

8

7

6

5

4

3

2

1

2

1

0

e

e

e

e

e

e

e

e

e

e

b

b

b

Error. We are trying to minimise this

264

59

89

87

44

88

65

51

45

57

45

1501

1531

2041

1111

2041

1041

1621

1001

511

901

10

9

8

7

6

5

4

3

2

1

2

1

0

e

e

e

e

e

e

e

e

e

e

b

b

b

The DV - grade

265

• Y=BX+E• Simple way of representing as many IVs

as you likeY = b0x0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + e

2

1

5

4

3

2

1

0

524232221202

514131211101

e

e

b

b

b

b

b

b

xxxxxx

xxxxxx

266

exbxbxb

e

e

b

b

b

b

b

b

xxxxxx

xxxxxx

kk

...1100

2

1

5

4

3

2

1

0

524232221202

514131211101

267

Generalises to Multivariate Case

• Y=BX+E• Y, B and E

– Matrices, not vectors

• Goes beyond this course– (Do Jacques Tacq’s course for more)– (Or read his book)

268

269

270

271

Lesson 6: More on Multiple Regression

272

Parameter Estimates

• Parameter estimates (b1, b2 … bk) were standardised – Because we analysed a correlation

matrix

• Represent the correlation of each IV with the DV– When all other IVs are held constant

273

• Can also be unstandardised• Unstandardised represent the unit

change in the DV associated with a 1 unit change in the IV– When all the other variables are held

constant• Parameters have standard errors

associated with them– As with one IV– Hence t-test, and associated

probability can be calculated• Trickier than with one IV

274

Standard Error of Regression Coefficient

• Standardised is easier

– R2i is the value of R2 when all other

predictors are used as predictors of that variable

• Note that if R2i = 0, the equation is the same as

for previous

iRkn

RSE Y

i 2

2

1

1

1

1

275

Multiple R

• The degree of prediction– R (or Multiple R) – No longer equal to b

• R2 Might be equal to the sum of squares of B– Only if all x’s are uncorrelated

276

In Terms of Variance• Can also think of this in terms of

variance explained.– Each IV explains some variance in the

DV– The IVs share some of their variance

• Can’t share the same variance twice

277

The total variance of Y

= 1

Variance in Y accounted for by

x1

rx1y2 = 0.36

Variance in Y accounted for by

x2

rx2y2 = 0.36

278

• In this model– R2 = ryx1

2 + ryx22

– R2 = 0.36 + 0.36 = 0.72– R = 0.72 = 0.85

• But– If x1 and x2 are correlated

– No longer the case

279

The total variance of Y

= 1

Variance in Y accounted for by

x2

rx2y2 = 0.36

Variance in Y accounted for by

x1

rx1y2 = 0.36

Variance shared between x1 and x2

(not equal to rx1x2)

280

• So– We can no longer sum the r2

– Need to sum them, and subtract the shared variance – i.e. the correlation

• But– It’s not the correlation between them– It’s the correlation between them as a

proportion of the variance of Y

• Two different ways

281

• If rx1x2 = 0

– rxy = bx1

– Equivalent to ryx12 + ryx2

2

21 212

yxyx rbrbR

• Based on estimates

282

• rx1x2 = 0

– Equivalent to ryx12 + ryx2

2

2

222

21

212121

1

2

xx

xxyxyxyxyx

r

rrrrrR

• Based on correlations

283

• Can also be calculated using methods we have seen– Based on PRE– Based on correlation with prediction

• Same procedure with >2 IVs

284

Adjusted R2

• R2 is an overestimate of population value of R2

– Any x will not correlate 0 with Y– Any variation away from 0 increases R– Variation from 0 more pronounced

with lower N• Need to correct R2

– Adjusted R2

285

• 1 – R2

– Proportion of unexplained variance– We multiple this by an adjustment

• More variables – greater adjustment• More people – less adjustment

Adj. R² = 1 − (1 − R²) × (N − 1) / (N − k − 1)

• Calculation of Adj. R² (a sketch follows)
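A sketch (mine, plain Python) of the adjustment: more predictors (k) shrink R² more, more cases (N) shrink it less.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2)(N - 1)/(N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.50, 20, 3), 3))   # 0.406 -- modest shrinkage
print(round(adjusted_r2(0.50, 10, 8), 3))   # -3.5  -- drastic (it can even go negative)
```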

286

Shrunken R2

• Some authors treat shrunken and adjusted R2 as the same thing– Others don’t

287

The adjustment factor (N − 1) / (N − k − 1):

N = 20, k = 3:   19 / 16 = 1.1875
N = 10, k = 8:   9 / 1 = 9
N = 10, k = 3:   9 / 6 = 1.5

288

Extra Bits

• Some stranger things that can happen

– Counter-intuitive

289

• Can be hard to understand– Very counter-intuitive

• Definition– An independent variable which

increases the size of the parameters associated with other independent variables above the size of their correlations

Suppressor variables

290

• An example (based on Horst, 1941)– Success of trainee pilots

– Mechanical ability (x1), verbal ability (x2), success (y)

• Correlation matrixMech Verb Success

Mech 1 0.5 0.3Verb 0.5 1 0

Success 0.3 0 1

291

– Mechanical ability correlates 0.3 with success

– Verbal ability correlates 0.0 with success

– What will the parameter estimates be?

– (Don’t look ahead until you have had a guess)

292

• Mechanical ability– b = 0.4– Larger than r!

• Verbal ability – b = -0.2– Smaller than r!!

• So what is happening?– You need verbal ability to do the test– Not related to mechanical ability

• Measure of mechanical ability is contaminated by verbal ability
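A numpy sketch (mine) that reproduces those numbers straight from the correlation matrix above, using B = Rxx⁻¹ Rxy:

```python
import numpy as np

Rxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])     # mechanical with verbal
Rxy = np.array([0.3, 0.0])       # each IV with success

print(np.linalg.inv(Rxx) @ Rxy)  # [ 0.4 -0.2]: mech rises above its r, verbal goes negative
```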

293

• High mech, low verbal– High mech

• This is positive

– Low verbal • Negative, because we are talking about

standardised scores• Your mech is really high – you did well on

the mechanical test, without being good at the words

• High mech, high verbal– Well, you had a head start on mech,

because of verbal, and need to be brought down a bit

294

Another suppressor?x1 x2 y

x1 1 0.5 0.3x2 0.5 1 0.2y 0.3 0.2 1

b1 = b2 =

295

Another suppressor?

x1 x2 yx1 1 0.5 0.3x2 0.5 1 0.2y 0.3 0.2 1

b1 =0.26 b2 = -0.06

296

And another?x1 x2 y

x1 1 0.5 0.3x2 0.5 1 -0.2y 0.3 -0.2 1

b1 = b2 =

297

And another?

x1 x2 yx1 1 0.5 0.3x2 0.5 1 -0.2y 0.3 -0.2 1

b1 = 0.53 b2 = -0.47

298

One more?x1 x2 y

x1 1 -0.5 0.3x2 -0.5 1 0.2y 0.3 0.2 1

b1 = b2 =

299

One more?

x1 x2 yx1 1 -0.5 0.3x2 -0.5 1 0.2y 0.3 0.2 1

b1 = 0.53 b2 = 0.47

300

• Suppression happens when two opposing forces are happening together– And have opposite effects

• Don’t throw away your IVs,– Just because they are uncorrelated with the

DV

• Be careful in interpretation of regression estimates– Really need the correlations too, to interpret

what is going on– Cannot compare between studies with

different IVs

301

Standardised Estimates > 1

• Correlations are bounded -1.00 ≤ r ≤ +1.00

– We think of standardised regression estimates as being similarly bounded• But they are not

– Can go >1.00, <-1.00– R cannot, because that is a proportion

of variance

302

• Three measures of ability– Mechanical ability, verbal ability 1,

verbal ability 2– Score on science exam

Mech Verbal1 Verbal2 ScoresMech 1 0.1 0.1 0.6

Verbal1 0.1 1 0.9 0.6Verbal2 0.1 0.9 1 0.3Scores 0.6 0.6 0.3 1

–Before reading on, what are the parameter estimates?

303

• Mechanical– About where we expect

• Verbal 1– Very high

• Verbal 2– Very low

Mech 0.56Verbal1 1.71Verbal2 -1.29

304

• What is going on– It’s a suppressor again– An independent variable which

increases the size of the parameters associated with other independent variables above the size of their correlations

• Verbal 1 and verbal 2 are correlated so highly– They need to cancel each other out

305

Variable Selection

• What are the appropriate independent variables to use in a model?– Depends what you are trying to do

• Multiple regression has two separate uses– Prediction– Explanation

306

• Prediction – What will happen

in the future?– Emphasis on

practical application

– Variables selected (more) empirically

– Value free

• Explanation– Why did

something happen?

– Emphasis on understanding phenomena

– Variables selected theoretically

– Not value free

307

• Visiting the doctor– Precedes suicide attempts– Predicts suicide

• Does not explain suicide

• More on causality later on …• Which are appropriate variables

– To collect data on?– To include in analysis?– Decision needs to be based on theoretical

knowledge of the behaviour of those variables– Statistical analysis of those variables (later)

• Unless you didn’t collect the data

– Common sense (not a useful thing to say)

308

Variable Entry Techniques

• Entry-wise– All variables entered simultaneously

• Hierarchical– Variables entered in a predetermined

order• Stepwise

– Variables entered according to change in R2

– Actually a family of techniques

309

• Entrywise– All variables entered simultaneously– All treated equally

• Hierarchical– Entered in a theoretically determined

order– Change in R2 is assessed, and tested for

significance– e.g. sex and age

• Should not be treated equally with other variables

• Sex and age MUST be first

– Confused with hierarchical linear modelling

310

• Stepwise– Variables entered empirically– Variable which increases R2 the most

goes first• Then the next …

– Variables which have no effect can be removed from the equation

• Example– IVs: Sex, age, extroversion, – DV: Car – how long someone spends

looking after their car

311

SEX AGE EXTRO CARSEX 1.00 -0.05 0.40 0.66AGE -0.05 1.00 0.40 0.23EXTRO 0.40 0.40 1.00 0.67CAR 0.66 0.23 0.67 1.00

• Correlation Matrix

312

• Entrywise analysis– r2 = 0.64

b pSEX 0.49 <0.01AGE 0.08 0.46EXTRO 0.44 <0.01

313

• Stepwise Analysis– Data determines the order– Model 1: Extroversion, R2 = 0.450– Model 2: Extroversion + Sex, R2 =

0.633

b pEXTRO 0.48 <0.01

SEX 0.47 <0.01

314

• Hierarchical analysis– Theory determines the order– Model 1: Sex + Age, R2 = 0.510– Model 2: S, A + E, R2 = 0.638– Change in R2 = 0.128, p = 0.001

SEX    0.49   <0.01
AGE    0.08   0.46
EXTRO  0.44   <0.01

315

• Which is the best model?– Entrywise – OK– Stepwise – excluded age

• Did have a (small) effect

– Hierarchical• The change in R2 gives the best estimate

of the importance of extroversion

• Other problems with stepwise– F and df are wrong (cheats with df)– Unstable results

• Small changes (sampling variance) – large differences in models

316

– Uses a lot of paper– Don’t use a stepwise procedure to

pack your suitcase

317

Is Stepwise Always Evil?• Yes• All right, no• Research goal is predictive

(technological)– Not explanatory (scientific)– What happens, not why

• N is large – 40 people per predictor, Cohen, Cohen,

Aiken, West (2003)• Cross validation takes place

318

A quick note on R2

R2 is sometimes regarded as the ‘fit’ of a regression model– Bad idea

• If good fit is required – maximise R2

– Leads to entering variables which do not make theoretical sense

319

Critique of Multiple Regression

• Goertzel (2002)– “Myths of murder and multiple

regression”– Skeptical Inquirer (Paper B1)

• Econometrics and regression are ‘junk science’– Multiple regression models (in US)– Used to guide social policy

320

More Guns, Less Crime

– (controlling for other factors)• Lott and Mustard: A 1% increase in

gun ownership– 3.3% decrease in murder rates

• But: – More guns in rural Southern US– More crime in urban North (crack

cocaine epidemic at time of data)

321

Executions Cut Crime

• No difference between crimes in states in US with or without death penalty

• Ehrlich (1975) controlled all variables that affect crime rates– Death penalty had an effect in reducing crime rate

• No statistical way to decide who’s right

322

Legalised Abortion

• Donohue and Levitt (1999)– Legalised abortion in 1970’s cut crime in

1990’s

• Lott and Whitley (2001)– “Legalising abortion decreased murder

rates by … 0.5 to 7 per cent.”

• It’s impossible to model these data– Controlling for other historical events– Crack cocaine (again)

323

Another Critique

• Berk (2003)– Regression analysis: a constructive critique

(Sage)

• Three cheers for regression– As a descriptive technique

• Two cheers for regression– As an inferential technique

• One cheer for regression– As a causal analysis

324

Is Regression Useless?

• Do regression carefully– Don’t go beyond data which you have

a strong theoretical understanding of

• Validate models– Where possible, validate predictive

power of models in other areas, times, groups• Particularly important with stepwise

325

Lesson 7: Categorical Independent Variables

326

Introduction

327

Introduction

• So far, just looked at continuous independent variables

• Also possible to use categorical (nominal, qualitative) independent variables– e.g. Sex; Job; Religion; Region; Type

(of anything)• Usually analysed with

t-test/ANOVA

328

Historical Note• But these (t-test/ANOVA) are

special cases of regression analysis– Aspects of General Linear Models

(GLMs)• So why treat them differently?

– Fisher’s fault– Computers’ fault

• Regression, as we have seen, is computationally difficult– Matrix inversion and multiplication– Unfeasible, without a computer

329

• In the special cases where:• You have one categorical IV• Your IVs are uncorrelated

– It is much easier to do it by partitioning of sums of squares

• These cases– Very rare in ‘applied’ research– Very common in ‘experimental’

research• Fisher worked at Rothamsted agricultural

research station• Never have problems manipulating

wheat, pigs, cabbages, etc

330

• In psychology– Led to a split between ‘experimental’

psychologists and ‘correlational’ psychologists

– Experimental psychologists (until recently) would not think in terms of continuous variables

• Still (too) common to dichotomise a variable– Too difficult to analyse it properly– Equivalent to discarding 1/3 of your

data

331

The Approach

332

The Approach

• Recode the nominal variable – Into one, or more, variables to represent

that variable

• Names are slightly confusing– Some texts talk of ‘dummy coding’ to refer

to all of these techniques– Some (most) refer to ‘dummy coding’ to

refer to one of them– Most have more than one name

333

• If a variable has g possible categories it is represented by g-1 variables

• Simplest case: – Smokes: Yes or No– Variable 1 represents ‘Yes’– Variable 2 is redundant

• If it isn’t yes, it’s no

334

The Techniques

335

• We will examine two coding schemes– Dummy coding

• For two groups• For >2 groups

– Effect coding• For >2 groups

• Look at analysis of change– Equivalent to ANCOVA– Pretest-posttest designs

336

Dummy Coding – 2 Groups• Also called simple coding by SPSS• A categorical variable with two groups• One group chosen as a reference group

– The other group is represented in a variable

• e.g. 2 groups: Experimental (Group 1) and Control (Group 0)– Control is the reference group– Dummy variable represents experimental

group• Call this variable ‘group1’

337

• For variable 'group1' – 1 = 'Yes', 0 = 'No'

Original Category

New Variable

Exp 1Con 0

338

• Some data• Group is x, score is y

Control Group

Experimental Group

Experiment 1 10 10Experiment 2 10 20Experiment 3 10 30

339

• Control Group = 0– Intercept = Score on Y when x = 0– Intercept = mean of control group

• Experimental Group = 1– b = change in Y when x increases 1

unit– b = difference between experimental

group and control group
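A minimal sketch (Python with numpy; the individual scores are invented so that the group means match Experiment 3 in the table) showing that with 0/1 dummy coding the intercept recovers the control-group mean and the slope recovers the group difference:

import numpy as np

y = np.array([8.0, 12.0, 28.0, 32.0])        # control scores (mean 10), experimental scores (mean 30)
group1 = np.array([0, 0, 1, 1])              # dummy variable: 0 = control, 1 = experimental

slope, intercept = np.polyfit(group1, y, 1)  # OLS fit of y on the dummy variable
print(intercept)    # 10.0 -> mean of the control group
print(slope)        # 20.0 -> experimental mean minus control mean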

340

0

5

10

15

20

25

30

35

Control Group Experimental Group

Experiment 1 Experiment 2 Experiment 3

Gradient of slope represents

difference between means

341

Dummy Coding – 3+ Groups

• With three groups the approach is the similar

• g = 3, therefore g-1 = 2 variables needed

• 3 Groups– Control – Experimental Group 1– Experimental Group 2

342

• Recoded into two variables– Note – do not need a 3rd variable

• If we are not in group 1 or group 2 MUST be in control group

• 3rd variable would add no information• (What would happen to determinant?)

Original category   Gp1   Gp2
  Con                0     0
  Gp1                1     0
  Gp2                0     1

343

• F and associated p
  – Tests H0 that mean(g1) = mean(g2) = mean(g3)
• b1 and b2 and associated p-values
  – Test the difference between each experimental group and the control group
• To test the difference between the experimental groups
  – Need to rerun the analysis

344

• One more complication– Have now run multiple comparisons– Increases α – i.e. the probability of a type I error

• Need to correct for this– Bonferroni correction– Multiply given p-values by two/three

(depending how many comparisons were made)

345

Effect Coding

• Usually used for 3+ groups• Compares each group (except the

reference group) to the mean of all groups– Dummy coding compares each group to the

reference group.

• Example with 5 groups– 1 group selected as reference group

• Group 5

346

• Each group (except reference) has a variable– 1 if the individual is in that group– 0 if not– -1 if in reference group

group   group_1   group_2   group_3   group_4
  1        1         0         0         0
  2        0         1         0         0
  3        0         0         1         0
  4        0         0         0         1
  5       -1        -1        -1        -1

347

Examples

• Dummy coding and Effect Coding• Group 1 chosen as reference group

each time
• Data:
  Group   Mean    SD
  1       52.40   4.60
  2       56.30   5.70
  3       60.10   5.00
  Total   56.27   5.88

348

• Dummy
  Group   dummy2   dummy3
  1         0        0
  2         1        0
  3         0        1

• Effect
  Group   effect2   effect3
  1        -1        -1
  2         1         0
  3         0         1

349

• Dummy: R = 0.543, F = 5.7, df = 2, 27, p = 0.009
  b0 = 52.4
  b1 = 3.9, p = 0.100
  b2 = 7.7, p = 0.002

• Effect: R = 0.543, F = 5.7, df = 2, 27, p = 0.009
  b0 = 56.27
  b1 = 0.03, p = 0.980
  b2 = 3.8, p = 0.007

Dummy coding:
  b0 = mean(g1)
  b1 = mean(g2) – mean(g1)
  b2 = mean(g3) – mean(g1)

Effect coding:
  b0 = G (the grand mean)
  b1 = mean(g2) – G
  b2 = mean(g3) – G
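A minimal sketch (Python with numpy; the scores are simulated, so the estimates will only approximately match the slide) showing that the two coding schemes reproduce these interpretations:

import numpy as np

rng = np.random.default_rng(1)
g = np.repeat([1, 2, 3], 10)                         # group membership
y = np.concatenate([rng.normal(52.4, 4.6, 10),       # group means near 52.4, 56.3, 60.1
                    rng.normal(56.3, 5.7, 10),
                    rng.normal(60.1, 5.0, 10)])

# dummy coding: group 1 is the reference group
X_dummy = np.column_stack([np.ones_like(y), g == 2, g == 3]).astype(float)
b_dummy, *_ = np.linalg.lstsq(X_dummy, y, rcond=None)
# b_dummy[0] ~ mean(g1); b_dummy[1] ~ mean(g2) - mean(g1); b_dummy[2] ~ mean(g3) - mean(g1)

# effect coding: group 1 scores -1 on both coded variables
e2 = np.where(g == 2, 1.0, np.where(g == 1, -1.0, 0.0))
e3 = np.where(g == 3, 1.0, np.where(g == 1, -1.0, 0.0))
b_effect, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(y), e2, e3]), y, rcond=None)
# b_effect[0] ~ grand mean; b_effect[1] ~ mean(g2) - grand mean; b_effect[2] ~ mean(g3) - grand mean

print(np.round(b_dummy, 2), np.round(b_effect, 2))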

350

In SPSS• SPSS provides two equivalent

procedures for regression– Regression (which we have been using)– GLM (which we haven’t)

• GLM will:– Automatically code categorical variables– Automatically calculate interaction terms

• GLM won’t:– Give standardised effects– Give hierarchical R2 p-values– Allow you to not understand

351

ANCOVA and Regression

352

• Test– (Which is a trick; but it’s designed to

make you think about it)

• Use employee data.sav– Compare the pay rise (difference

between salbegin and salary)– For ethnic minority and non-minority

staff• What do you find?

353

ANCOVA and Regression

• Dummy coding approach has one special use– In ANCOVA, for the analysis of change

• Pre-test post-test experimental design– Control group and (one or more)

experimental groups– Tempting to use difference score + t-test /

mixed design ANOVA– Inappropriate

354

• Salivary cortisol levels– Used as a measure of stress– Not absolute level, but change in level

over day may be interesting

• Test at: 9.00am, 9.00pm• Two groups

– High stress group (cancer biopsy) • Group 1

– Low stress group (no biopsy)• Group 0

355

• Correlation of AM and PM = 0.493 (p=0.008)

• Has there been a significant difference in the rate of change of salivary cortisol?– 3 different approaches

              AM     PM     Diff
High Stress   20.1    6.8   13.3
Low Stress    22.3   11.8   10.5

356

• Approach 1 – find the differences, do a t-test– t = 1.31, df=26, p=0.203

• Approach 2 – mixed ANOVA, look for interaction effect– F = 1.71, df = 1, 26, p = 0.203– F = t2

• Approach 3 – regression (ANCOVA) based approach

357

– IVs: AM and group– DV: PM

– b1 (group) = 3.59, standardised b1=0.432, p = 0.01

• Why is the regression approach better?– The other two approaches took the

difference– Assumes that r = 1.00– Any difference from r = 1.00 and you add

error variance• Subtracting error is the same as adding error
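A hedged sketch (Python; the cortisol values are simulated, so the numbers are illustrative only) of approach 1 (difference score plus t-test) against approach 3 (ANCOVA as regression):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 14                                                 # cases per group
group = np.repeat([0, 1], n)                           # 0 = low stress, 1 = high stress
am = rng.normal(21, 4, 2 * n)                          # 9.00am cortisol
pm = 0.5 * am - 4 * group + rng.normal(0, 3, 2 * n)    # 9.00pm cortisol; the group lowers PM level

# Approach 1: take the difference, then a t-test
diff = am - pm
t, p_t = stats.ttest_ind(diff[group == 1], diff[group == 0])

# Approach 3: ANCOVA as regression -- DV is PM, IVs are AM and group
X = np.column_stack([np.ones(2 * n), am, group])
b, *_ = np.linalg.lstsq(X, pm, rcond=None)
print(t, p_t)     # difference-score test
print(b[2])       # adjusted group difference in PM, controlling for AM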

358

• Using regression– Ensures that all the variance that is

subtracted is true– Reduces the error variance

• Two effects– Adjusts the means

• Compensates for differences between groups

– Removes error variance

359

In SPSS• SPSS automates all of this

– But you have to understand it, to know what it is doing

• Use Analyse, GLM, Univariate ANOVA

360

Outcome here

Categorical predictors here

Continuous predictors here

Click options

361

Select parameter estimates

362

More on Change

• If difference score is correlated with either pre-test or post-test– Subtraction fails to remove the

difference between the scores– If two scores are uncorrelated

• Difference will be correlated with both• Failure to control

– Equal SDs, r = 0• Correlation of change and pre-score

=0.707

363

Even More on Change

• A topic of surprising complexity– What I said about difference scores

isn’t always true• Lord’s paradox – it depends on the

precise question you want to answer

– Collins and Horn (1993). Best methods for the analysis of change

– Collins and Sayer (2001). New methods for the analysis of change.

364

Lesson 8: Assumptions in Regression Analysis

365

The Assumptions1. The distribution of residuals is normal (at

each value of the dependent variable).2. The variance of the residuals for every

set of values for the independent variable is equal.

• violation is called heteroscedasticity.3. The error term is additive

• no interactions.

4. At every value of the dependent variable the expected (mean) value of the residuals is zero

• No non-linear relationships

366

5. The expected correlation between residuals, for any two cases, is 0.

• The independence assumption (lack of autocorrelation)

6. All independent variables are uncorrelated with the error term.

7. No independent variables are a perfect linear function of other independent variables (no perfect multicollinearity)

8. The mean of the error term is zero.

367

What are we going to do …

• Deal with some of these assumptions in some detail

• Deal with others in passing only– look at them again later on

368

Assumption 1: The Distribution of Residuals is Normal at Every Value of the Dependent Variable

369

Look at Normal Distributions

• A normal distribution– symmetrical, bell-shaped (so they

say)

370

What can go wrong?

• Skew– non-symmetricality– one tail longer than the other

• Kurtosis– too flat or too peaked– kurtosed

• Outliers– Individual cases which are far from the

distribution

371

Effects on the Mean

• Skew– biases the mean, in direction of skew

• Kurtosis– mean not biased– standard deviation is– and hence standard errors, and

significance tests

372

Examining Univariate Distributions

• Histograms• Boxplots• P-P plots• Calculation based methods

373

Histograms

• A and B [histograms]

374

• C and D [histograms]

375

• E & F [histograms]

376


Histograms can be tricky ….

377

Boxplots

378

P-P Plots

• A & B [P-P plots]

379

• C & D [P-P plots]

380

• E & F [P-P plots]

381

• Skew and Kurtosis statistics• Outlier detection statistics

Calculation Based

382

Skew and Kurtosis Statistics

• Normal distribution– skew = 0– kurtosis = 0

• Two methods for calculation– Fisher’s and Pearson’s– Very similar answers

• Associated standard error– can be used for significance of departure from

normality– not actually very useful

• With N above about 400 the tests are almost never non-significant – the data never count as "normal"

383

     Skewness   SE Skew   Kurtosis   SE Kurt
A    -0.12      0.172     -0.084     0.342
B     0.271     0.172      0.265     0.342
C     0.454     0.172      1.885     0.342
D     0.117     0.172     -1.081     0.342
E     2.106     0.172      5.75      0.342
F     0.171     0.172     -0.21      0.342
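A small sketch (Python with scipy; the data are simulated and N = 200 is an assumption, so the numbers will not match the table) of how these statistics and their conventional standard errors can be computed:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.gamma(shape=2.0, scale=1.0, size=200)      # a positively skewed example variable

skew = stats.skew(x)                               # skewness (0 for a normal distribution)
kurt = stats.kurtosis(x)                           # excess kurtosis (0 for a normal distribution)

n = len(x)
se_skew = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))     # usual SE of skewness
se_kurt = 2 * se_skew * np.sqrt((n**2 - 1) / ((n - 3) * (n + 5)))      # usual SE of kurtosis

print(round(skew, 3), round(se_skew, 3), round(kurt, 3), round(se_kurt, 3))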

384

Outlier Detection

• Calculate distance from mean– z-score (number of standard deviations)– deleted z-score

• that case biased the mean, so remove it

– Look up expected distance from mean• 1% 3+ SDs

• Calculate influence – how much effect did that case have on the

mean?

385

Non-Normality in Regression

386

Effects on OLS Estimates

• The mean is an OLS estimate• The regression line is an OLS

estimate• Lack of normality

– biases the position of the regression slope

– makes the standard errors wrong• probability values attached to statistical

significance wrong

387

Checks on Normality

• Check residuals are normally distributed– SPSS will draw histogram and p-p plot

of residuals

• Use regression diagnostics– Lots of them– Most aren’t very interesting

388

Regression Diagnostics

• Residuals– standardised, unstandardised, studentised,

deleted, studentised-deleted– look for cases > |3| (?)

• Influence statistics– Look for the effect a case has– If we remove that case, do we get a

different answer?– DFBeta, Standardised DFBeta

• changes in b

389

– DfFit, Standardised DfFit• change in predicted value

– Covariance ratio• Ratio of the determinants of the

covariance matrices, with and without the case

• Distances– measures of ‘distance’ from the

centroid– some include IV, some don’t

390

More on Residuals

• Residuals are trickier than you might have imagined

• Raw residuals– OK

• Standardised residuals – Residuals divided by SD

se = √( Σe² / (n – k – 1) )

391

Leverage

• But– That SD is wrong– Variance of the residuals is not equal

• Those further from the centroid on the predictors have higher variance

• Need a measure of this

• Distance from the centroid is leverage, or h (or sometimes hii)

• One predictor– Easy

392

• Minimum hi is 1/n, the maximum is 1

• Except– SPSS uses standardised leverage - h*

• It doesn’t tell you this, it just uses it

hi = 1/n + (xi – x̄)² / Σ(x – x̄)²

393

• Minimum 0, maximum (N – 1)/N

h*i = hi – 1/n = (xi – x̄)² / Σ(x – x̄)²

394

• Multiple predictors– Calculate the hat matrix (H)– Leverage values are the diagonals of

this matrix

– Where X is the augmented matrix of predictors (i.e. matrix that includes the constant)

– Hence leverage hii – element ii of H

H = X (X'X)^-1 X'

395

• Example of calculation of hat matrix

[Worked example: X is the augmented matrix (a column of 1s plus the predictor values); H = X (X'X)^-1 X' is then a square matrix, and its diagonal elements (e.g. 0.318, 0.273, 0.236, …) are the leverages.]
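A minimal numpy sketch (the x values are invented for illustration) of the hat-matrix calculation, checked against the one-predictor leverage formula given above:

import numpy as np

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65], dtype=float)   # illustrative predictor
X = np.column_stack([np.ones_like(x), x])       # augmented matrix: constant plus predictor

H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix
h = np.diag(H)                                  # leverages h_ii

# the one-predictor formula gives the same values
h_check = 1 / len(x) + (x - x.mean())**2 / np.sum((x - x.mean())**2)
print(np.allclose(h, h_check))                  # True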

396

Standardised / Studentised

• Now we can calculate the standardised residuals – SPSS calls them studentised residuals– Also called internally studentised

residuals

e'i = ei / ( se × √(1 – hi) )

397

Deleted Studentised Residuals

• Studentised residuals do not have a known distribution– Cannot use them for inference

• Deleted studentised residuals– Externally studentised residuals– Jackknifed residuals

• Distributed as t• With df = N – k – 1
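A small follow-on sketch (numpy; simulated y on an illustrative x, and the leave-one-out variance formula is the usual one) computing internally and externally studentised residuals:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(15, 65, 11)                       # illustrative predictor
X = np.column_stack([np.ones_like(x), x])         # constant plus predictor
y = 2 + 0.5 * x + rng.normal(0, 1, len(x))        # illustrative outcome

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)     # leverages
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                                     # raw residuals
n, k = len(y), X.shape[1] - 1                     # k = number of predictors

s_e = np.sqrt(np.sum(e**2) / (n - k - 1))
r_internal = e / (s_e * np.sqrt(1 - h))           # internally studentised (SPSS's 'studentised')

# externally studentised (deleted): the residual variance is re-estimated without each case
s2_deleted = (np.sum(e**2) - e**2 / (1 - h)) / (n - k - 2)
r_external = e / np.sqrt(s2_deleted * (1 - h))    # follows a t distribution
print(np.round(r_internal, 2), np.round(r_external, 2))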

398

Testing Significance

• We can calculate the probability of a residual – Is it sampled from the same

population

• BUT– Massive type I error rate– Bonferroni correct it

• Multiply p value by N

399

Bivariate Normality

• We didn’t just say “residuals normally distributed”

• We said “at every value of the dependent variables”

• Two variables can be normally distributed – univariate,– but not bivariate

400

• Couple’s IQs– male and female

[Histograms of FEMALE and MALE IQ scores]

–Seem reasonably normal

401

• But wait!!

[Scatterplot of MALE IQ against FEMALE IQ]

402

• When we look at bivariate normality– not normal – there is an outlier

• So plot X against Y• OK for bivariate

– but – may be a multivariate outlier– Need to draw graph in 3+ dimensions– can’t draw a graph in 3 dimensions

• But we can look at the residuals instead …

403

• IQ: histogram of the residuals [figure]

404

Multivariate Outliers …

• Will be explored later in the exercises

• So we move on …

405

What to do about Non-Normality

• Skew and Kurtosis– Skew – much easier to deal with– Kurtosis – less serious anyway

• Transform data– removes skew– positive skew – log transform– negative skew - square

406

Transformation

• May need to transform IV and/or DV– More often DV

• time, income, symptoms (e.g. depression) all positively skewed

– can cause non-linear effects (more later) if only one is transformed

– alters interpretation of unstandardised parameter

– May alter meaning of variable– May add / remove non-linear and moderator

effects

407

• Change measures– increase sensitivity at ranges

• avoiding floor and ceiling effects

• Outliers– Can be tricky– Why did the outlier occur?

• Error? Delete them.• Weird person? Probably delete them• Normal person? Tricky.

408

– You are trying to model a process• is the data point ‘outside’ the process• e.g. lottery winners, when looking at

salary• yawn, when looking at reaction time

– Which is better?• A good model, which explains 99% of

your data?• A poor model, which explains all of it

• Pedhazur and Schmelkin (1991)– analyse the data twice

409

• We will spend much less time on the other 6 assumptions

• Can do exercise 8.1.

410

Assumption 2: The variance of the residuals for every

set of values for the independent variable is

equal.

411

Heteroscedasticity

• This assumption is about heteroscedasticity of the residuals– Hetero = different– Scedastic = scattered

• We don’t want heteroscedasticity– we want our data to be

homoscedastic

• Draw a scatterplot to investigate

412

[Scatterplot of MALE IQ against FEMALE IQ]

413

• Only works with one IV– need every combination of IVs

• Easy to get – use predicted values– use residuals there

• Plot predicted values against residuals– or standardised residuals– or deleted residuals– or standardised deleted residuals– or studentised residuals

• A bit like turning the scatterplot on its side

414

Good – no heteroscedasticity

[Plot of residuals against predicted values]

415

Bad – heteroscedasticity

[Plot of residuals against predicted values]

416

Testing Heteroscedasticity

• White’s test– Not automatic in SPSS (is in SAS)– Luckily, not hard to do1. Do regression, save residuals.2. Square residuals3. Square IVs4. Calculate interactions of IVs

– e.g. x1•x2, x1•x3, x2 • x3

417

5. Run regression using – squared residuals as DV– IVs, squared IVs, and interactions as IVs

6. Test statistic = N x R2

– Distributed as χ2

– Df = k (for second regression)

• Use education and salbegin to predict salary (employee data.sav)

– R2 = 0.113, N = 474, χ2 = 53.5, df = 5, p < 0.0001
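A hand-rolled version of these steps (Python/numpy, with scipy for the chi-square p-value; the data are simulated with deliberately heteroscedastic errors, so the statistics will not match the employee-data example):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 474
x1 = rng.normal(13, 3, n)                                   # e.g. years of education
x2 = rng.normal(17000, 8000, n)                             # e.g. beginning salary
y = 1000 + 1500 * x1 + 1.5 * x2 + rng.normal(0, 1, n) * (500 + 0.2 * np.abs(x2))

def fit_r2(X, y):
    """OLS fit; returns R-squared and the residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1 - e.var() / y.var(), e

# steps 1-2: original regression, then square the residuals
_, e = fit_r2(np.column_stack([x1, x2]), y)
e2 = e**2

# steps 3-5: regress the squared residuals on the IVs, squared IVs and their interaction
Z = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])
r2_aux, _ = fit_r2(Z, e2)

white = n * r2_aux                                          # step 6: N x R2
print(white, stats.chi2.sf(white, df=Z.shape[1]))           # chi-square test, df = k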

418

Plot of Pred and Res

[Plot of regression standardized residuals against regression standardized predicted values]

419

Magnitude of Heteroscedasticity

• Chop data into “slices”– 5 slices, based on X (or predicted

score)• Done in SPSS

– Calculate variance of each slice– Check ratio of smallest to largest– Less than 10:1

• OK

420

The Visual Bander• New in SPSS 12

421

• Variances of the 5 groups

• We have a problem– 3 / 0.2 ~= 15

422

Dealing with Heteroscedasticity

• Use Huber-White estimates– Very easy in Stata– Fiddly in SPSS – bit of a hack

• Use Complex samples1. Create a new variable where all

cases are equal to 1, call it const2. Use Complex Samples, Prepare for

Analysis3. Create a plan file

423

4. Sample weight is const5. Finish6. Use Complex Samples, GLM7. Use plan file created, and set up

model as in GLM(More on complex samples later)

In Stata, do regression as normal, and click “robust”.

424

Heteroscedasticity – Implications and Meanings

Implications• What happens as a result of

heteroscedasticity?– Parameter estimates are correct

• not biased

– Standard errors (hence p-values) are incorrect

425

However …

• If there is no skew in predicted scores– P-values a tiny bit wrong

• If skewed,– P-values very wrong

• Can do exercise

426

Meaning• What is heteroscedasticity trying to

tell us?– Our model is wrong – it is misspecified– Something important is happening

that we have not accounted for

• e.g. amount of money given to charity (given)– depends on:

• earnings • degree of importance person assigns to

the charity (import)

427

• Do the regression analysis– R2 = 0.60, F=31.4, df=2, 37, p < 0.001

• seems quite good

– b0 = 0.24, p=0.97

– b1 = 0.71, p < 0.001

– b2 = 0.23, p = 0.031

• White’s test– 2 = 18.6, df=5, p=0.002

• The plot of predicted values against residuals …

428

• Plot shows heteroscedastic relationship

429

• Which means …– the effects of the variables are not

additive – If you think that what a charity does

is important• you might give more money• how much more depends on how much

money you have

430

[Scatterplot of GIVEN against IMPORT, with separate points for High and Low Earnings]

431

• One more thing about heteroscedasticity

– it is the equivalent of homogeneity of variance in ANOVA/t-tests

432

Assumption 3: The Error Term is Additive

433

Additivity

• What heteroscedasticity shows you– effects of variables need to be additive

• Heteroscedasticity doesn’t always show it to you– can test for it, but hard work– (same as homogeneity of covariance

assumption in ANCOVA)

• Have to know it from your theory• A specification error

434

Additivity and Theory• Two IVs

– Alcohol has sedative effect• A bit makes you a bit tired• A lot makes you very tired

– Some painkillers have sedative effect• A bit makes you a bit tired• A lot makes you very tired

– A bit of alcohol and a bit of painkiller doesn’t make you very tired

– Effects multiply together, don’t add together

435

• If you don’t test for it– It’s very hard to know that it will

happen

• So many possible non-additive effects– Cannot test for all of them– Can test for obvious

• In medicine– Choose to test for salient non-additive

effects– e.g. sex, race

436

Assumption 4: At every value of the dependent variable the expected (mean) value of the

residuals is zero

437

Linearity

• Relationships between variables should be linear – best represented by a straight line

• Not a very common problem in social sciences– except economics– measures are not sufficiently accurate to

make a difference • R2 too low• unlike, say, physics

438

• Relationship between speed of travel and fuel used

[Plot of Fuel used against Speed]

439

• R2 = 0.938– looks pretty good– know speed, make a good prediction

of fuel

• BUT– look at the chart– if we know speed we can make a

perfect prediction of fuel used– R2 should be 1.00

440

Detecting Non-Linearity

• Residual plot– just like heteroscedasticity

• Using this example– very, very obvious– usually pretty obvious

441

Residual plot

442

Linearity: A Case of Additivity

• Linearity = additivity along the range of the IV

• Jeremy rides his bicycle harder– Increase in speed depends on current speed– Not additive, multiplicative– MacCallum and Mar (1995). Distinguishing

between moderator and quadratic effects in multiple regression. Psychological Bulletin.

443

Assumption 5: The expected correlation between

residuals, for any two cases, is 0.

The independence assumption (lack of autocorrelation)

444

Independence Assumption

• Also: lack of autocorrelation• Tricky one

– often ignored– exists for almost all tests

• All cases should be independent of one another– knowing the value of one case should not

tell you anything about the value of other cases

445

How is it Detected?

• Can be difficult– need some clever statistics

(multilevel models)

• Better off avoiding situations where it arises

• Residual Plots• Durbin-Watson Test

446

Residual Plots

• Were data collected in time order?– If so plot ID number against the

residuals– Look for any pattern

• Test for linear relationship• Non-linear relationship• Heteroscedasticity

447

[Plot of residuals against Participant Number]

448

How does it arise?

Two main ways• time-series analyses

– When cases are time periods• weather on Tuesday and weather on Wednesday

correlated• inflation 1972, inflation 1973 are correlated

• clusters of cases– patients treated by three doctors– children from different classes– people assessed in groups

449

Why does it matter?• Standard errors can be wrong

– therefore significance tests can be wrong

• Parameter estimates can be wrong– really, really wrong– from positive to negative

• An example– students do an exam (on statistics)– choose one of three questions

• IV: time• DV: grade

450

[Scatterplot of Grade against Time]

•Result, with line of best fit

451

• Result shows that– people who spent longer in the exam,

achieve better grades

• BUT …– we haven’t considered which question

people answered– we might have violated the

independence assumption• DV will be autocorrelated

• Look again– with questions marked

452

• Now somewhat different

[Scatterplot of Grade against Time, with the points marked by Question (1, 2, 3)]

453

• Now, people that spent longer got lower grades– questions differed in difficulty– do a hard one, get better grade– if you can do it, you can do it quickly

• Very difficult to analyse well– need multilevel models

454

Durbin Watson Test

• Not well implemented in SPSS• Depends on the order of the data

– Reorder the data, get a different result

• Doesn’t give statistical significance of the test

455

Assumption 6: All independent variables are

uncorrelated with the error term.

456

Uncorrelated with the Error Term

• A curious assumption– by definition, the residuals are

uncorrelated with the independent variables (try it and see, if you like)

• It is really an assumption about the DV– the DV (once the IVs' effects have been removed) must have no influence back on the IVs

457

• Problem in economics– Demand increases supply– Supply increases wages– Higher wages increase demand

• OLS estimates will be (badly) biased in this case– need a different estimation procedure– two-stage least squares

• simultaneous equation modelling

458

Assumption 7: No independent variables are a perfect linear function

of other independent variables

no perfect multicollinearity

459

No Perfect Multicollinearity

• IVs must not be linear functions of one another– matrix of correlations of IVs is not positive

definite– cannot be inverted– analysis cannot proceed

• Have seen this with – age, age start, time working– also occurs with subscale and total

460

• Large amounts of collinearity– a problem (as we shall see)

sometimes– not an assumption

461

Assumption 8: The mean of the error term is zero.

You will like this one.

462

Mean of the Error Term = 0

• Mean of the residuals = 0• That is what the constant is for

– if the mean of the error term deviates from zero, the constant soaks it up

Y = β0 + β1x1 + ε

Y = (β0 + 3) + β1x1 + (ε – 3)

- note, Greek letters because we are talking about population values

463

• Can do regression without the constant– Usually a bad idea– E.g R2 = 0.995, p < 0.001

• Looks good

464

[Scatterplot of y against x1]

465

466

Lesson 9: Issues in Regression Analysis

Things that alter the interpretation of the regression equation

467

The Four Issues

• Causality• Sample sizes• Collinearity• Measurement error

468

Causality

469

What is a Cause?

• Debate about definition of cause– some statistics (and philosophy)

books try to avoid it completely– We are not going into depth

• just going to show why it is hard

• Two dimensions of cause– Ultimate versus proximal cause– Determinate versus probabilistic

470

Proximal versus Ultimate• Why am I here?

– I walked here because – This is the location of the class

because – Eric Tanenbaum asked me because – (I don’t know)– because I was in my office when he

rang because – I am a lecturer at York because – I saw an advert in the paper because

471

– I exist because– My parents met because – My father had a job …

• Proximal cause– the direct and immediate cause of

something• Ultimate cause

– the thing that started the process off– I fell off my bicycle because of the

bump– I fell off because I was going too fast

472

Determinate versus Probabilistic Cause

• Why did I fall off my bicycle?– I was going too fast– But every time I ride too fast, I don’t

fall off– Probabilistic cause

• Why did my tyre go flat?– A nail was stuck in my tyre– Every time a nail sticks in my tyre,

the tyre goes flat– Deterministic cause

473

• Can get into trouble by mixing them together– Eating deep fried Mars Bars and doing

no exercise are causes of heart disease

– “My Grandad ate three deep fried Mars Bars every day, and the most exercise he ever got was when he walked to the shop next door to buy one”

– (Deliberately?) confusing deterministic and probabilistic causes

474

Criteria for Causation

• Association• Direction of Influence• Isolation

475

Association

• Correlation does not mean causation– we all know

• But– Causation does mean correlation

• Need to show that two things are related– may be correlation– my be regression when controlling for third

(or more) factor

476

• Relationship between price and sales– suppliers may be cunning– when people want it more

• stick the price up

           Price   Demand   Sales
  Price    1       0.6      0
  Demand   0.6     1        0.6
  Sales    0       0.6      1

– So – no relationship between price and sales

477

– Until (or course) we control for demand

– b1 (Price) = -0.56

– b2 (Demand) = 0.94

• But which variables do we enter?

478

Direction of Influence• Relationship between A and B

– three possible processes

A → B   (A causes B)
B → A   (B causes A)
A ← C → B   (C causes A and B)

479

• How do we establish the direction of influence?– Longitudinally?

Barometer drops → Storm

– Now if we could just get that barometer needle to stay where it is …

• Where the role of theory comes in (more on this later)

480

Isolation

• Isolate the dependent variable from all other influences– as experimenters try to do

• Cannot do this– can statistically isolate the effect– using multiple regression

481

Role of Theory

• Strong theory is crucial to making causal statements

• Fisher said: to make causal statements “make your theories elaborate.”– don’t rely purely on statistical analysis

• Need strong theory to guide analyses– what critics of non-experimental

research don’t understand

482

• S.J. Gould – a critic– says correlate price of petrol and his

age, for the last 10 years– find a correlation– Ha! (He says) that doesn’t mean

there is a causal link– Of course not! (We say).

• No social scientist would do that analysis without first thinking (very hard) about the possible causal relations between the variables of interest

• Would control for time, prices, etc …

483

• Atkinson, et al. (1996)– relationship between college grades

and number of hours worked– negative correlation– Need to control for other variables –

ability, intelligence

• Gould says “Most correlations are non-causal” (1982, p243)– Of course!!!!

484

I drink a lot of beer

120 non-causal correlations

16 causal relations

karaoke

jokes (about statistics)

vomit

toilet

headache

sleeping

equations (beermat)

laugh

thirsty

fried breakfast

no beer

curry

chips

falling over

lose keys

curtains closed

485

• Abelson (1995) elaborates on this– ‘method of signatures’

• A collection of correlations relating to the process– the ‘signature’ of the process

• e.g. tobacco smoking and lung cancer– can we account for all of these

findings with any other theory?

486

1. The longer a person has smoked cigarettes, the greater the risk of cancer.

2. The more cigarettes a person smokes over a given time period, the greater the risk of cancer.

3. People who stop smoking have lower cancer rates than do those who keep smoking.

4. Smoker’s cancers tend to occur in the lungs, and be of a particular type.

5. Smokers have elevated rates of other diseases.6. People who smoke cigars or pipes, and do not

usually inhale, have abnormally high rates of lip cancer.

7. Smokers of filter-tipped cigarettes have lower cancer rates than other cigarette smokers.

8. Non-smokers who live with smokers have elevated cancer rates.

(Abelson, 1995: 183-184)

487

– In addition, should be no anomalous correlations• If smokers had more fallen arches than

non-smokers, not consistent with theory

• Failure to use theory to select appropriate variables– specification error– e.g. in previous example– Predict wealth from price and sales

• increase price, price increases• Increase sales, price increases

488

• Sometimes these are indicators of the process– e.g. barometer – stopping the needle

won’t help– e.g. inflation? Indicator or cause?

489

No Causation without Experimentation

• Blatantly untrue– I don’t doubt that the sun shining

makes us warm• Why the aversion?

– Pearl (2000) says problem is no mathematical operator

– No one realised that you needed one– Until you build a robot

490

AI and Causality

• A robot needs to make judgements about causality

• Needs to have a mathematical representation of causality– Suddenly, a problem!– Doesn’t exist

• Most operators are non-directional• Causality is directional

491

Sample Sizes

“How many subjects does it take to run a regression

analysis?”

492

Introduction

• Social scientists don’t worry enough about the sample size required– “Why didn’t you get a significant result?”– “I didn’t have a large enough sample”

• Not a common answer

• More recently awareness of sample size is increasing– use too few – no point doing the research– use too many – waste their time

493

• Research funding bodies• Ethical review panels

– both become more interested in sample size calculations

• We will look at two approaches– Rules of thumb (quite quickly)– Power Analysis (more slowly)

494

Rules of Thumb

• Lots of simple rules of thumb exist– 10 cases per IV– >100 cases– Green (1991) more sophisticated

• To test significance of R2 – N = 50 + 8k• To test sig of slopes, N = 104 + k

• Rules of thumb don’t take into account all the information that we have– Power analysis does

495

Power Analysis

Introducing Power Analysis• Hypothesis test

– tells us the probability of a result of that magnitude occurring, if the null hypothesis is correct (i.e. there is no effect in the population)

• Doesn’t tell us– the probability of that result, if the

null hypothesis is false

496

• According to Cohen (1982) all null hypotheses are false– everything that might have an effect,

does have an effect• it is just that the effect is often very tiny

497

Type I Errors

• Type I error is false rejection of H0

• Probability of making a type I error

– – the significance value cut-off • usually 0.05 (by convention)

• Always this value• Not affected by

– sample size– type of test

498

Type II errors• Type II error is false acceptance of

the null hypothesis– Much, much trickier

• We think we have some idea– we almost certainly don’t

• Example– I do an experiment (random

sampling, all assumptions perfectly satisfied)

– I find p = 0.05

499

– You repeat the experiment exactly• different random sample from same

population

– What is probability you will find p < 0.05?– ………………– Another experiment, I find p = 0.01– Probability you find p < 0.05?– ………………

• Very hard to work out– not intuitive – need to understand non-central sampling

distributions (more in a minute)

500

• Probability of type II error = beta () – same as population regression

parameter (to be confusing)

• Power = 1 – Beta– Probability of getting a significant

result

501

                                        State of the World
                                        H0 True                     H0 false
Research Findings                       (no effect to be found)     (effect to be found)
H0 true (we find no effect – p > 0.05)                              Type II error, p = β
H0 false (we find an effect – p < 0.05) Type I error, p = α         power = 1 – β

502

• Four parameters in power analysis– α – prob. of Type I error– β – prob. of Type II error (power = 1 – β

)– Effect size – size of effect in

population– N

• Know any three, can calculate the fourth– Look at them one at a time

503

• Probability of Type I error– Usually set to 0.05– Somewhat arbitrary

• sometimes adjusted because of circumstances

– rarely because of power analysis

– May want to adjust it, based on power analysis

504

• β – Probability of type II error– Power (probability of finding a result) = 1 – β– Standard is 80%

• Some argue for 90%

– Implication that Type I error is 4 times more serious than type II error• adjust ratio with compromise power

analysis

505

• Effect size in the population– Most problematic to determine– Three ways1. What effect size would be useful to

find? • R2 = 0.01 - no use (probably)

2. Base it on previous research– what have other people found?

3. Use Cohen’s conventions– small R2 = 0.02– medium R2 = 0.13– large R2 = 0.26

506

– Effect size usually measured as f2

– For R2

f2 = R2 / (1 – R2)

507

– For (standardised) slopes

– Where sr2 is the contribution to the variance accounted for by the variable of interest

– i.e. sr2 = R2 (with variable) – R2 (without)• change in R2 in hierarchical regression

f2 = sr2 / (1 – R2)
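A sketch of the calculation in Python/scipy. The noncentrality convention here is an assumption – it uses Cohen's λ = f2 × (u + v + 1), whereas some programs use λ = f2 × N – so treat the numbers as illustrative:

from scipy import stats

def power_r2_test(n, k, f2, alpha=0.05):
    """Power of the overall R2 (F) test with k predictors and sample size n."""
    u, v = k, n - k - 1                       # numerator and denominator df
    lam = f2 * (u + v + 1)                    # noncentrality parameter (Cohen's convention)
    f_crit = stats.f.isf(alpha, u, v)         # critical F under the null
    return stats.ncf.sf(f_crit, u, v, lam)    # probability the noncentral F exceeds it

f2 = 0.13 / (1 - 0.13)                        # 'medium' effect (R2 = 0.13)
for n in (50, 80, 110):
    print(n, round(power_r2_test(n, k=3, f2=f2), 2))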

508

• N – the sample size– usually use other three parameters to

determine this– sometimes adjust other parameters

() based on this– e.g. You can have 50 participants. No

more.

509

Doing power analysis• With power analysis program

– SamplePower, GPower, Nquery

• With SPSS MANOVA– using non-central distribution

functions– Uses MANOVA syntax

• Relies on the fact you can do anything with MANOVA

• Paper B4

510

Underpowered Studies

• Research in the social sciences is often underpowered– Why?– See Paper B11 – “the persistence of

underpowered studies”

511

Extra Reading

• Power traditionally focuses on p values– What about CIs?– Paper B8 – “Obtaining regression

coefficients that are accurate, not simply significant”

512

Collinearity

513

Collinearity as Issue and Assumption

• Collinearity (multicollinearity) – the extent to which the independent

variables are (multiply) correlated• If R2 for any IV, using other IVs = 1.00

– perfect collinearity– variable is linear sum of other variables– regression will not proceed – (SPSS will arbitrarily throw out a variable)

514

• R2 < 1.00, but high– other problems may arise

• Four things to look at in collinearity– meaning– implications– detection– actions

515

Meaning of Collinearity

• Literally ‘co-linearity’– lying along the same line

• Perfect collinearity– when some IVs predict another– Total = S1 + S2 + S3 + S4– S1 = Total – (S2 + S3 + S4)– rare

516

• Less than perfect– when some IVs are close to predicting – correlations between IVs are high

(usually, but not always)

517

Implications

• Effects the stability of the parameter estimates– and so the standard errors of the

parameter estimates– and so the significance

• Because– shared variance, which the regression

procedure doesn’t know where to put

518

• Red cars have more accidents than other coloured cars– because of the effect of being in a red

car?– because of the kind of person that drives

a red car?• we don’t know

– No way to distinguish between these three:

Accidents = 1 × colour + 0 × person
Accidents = 0 × colour + 1 × person

Accidents = 0.5 x colour + 0.5 x person

519

• Sex differences– due to genetics?– due to upbringing?– (almost) perfect collinearity

• statistically impossible to tell

520

• When collinearity is less than perfect– increases variability of estimates

between samples– estimates are unstable– reflected in the variances, and hence

standard errors

521

Detecting Collinearity

• Look at the parameter estimates– large standardised parameter

estimates (>0.3?), which are not significant • be suspicious

• Run a series of regressions– each IV as DV– all other IVs as IVs

• for each IV

522

• Sounds like hard work?– SPSS does it for us!

• Ask for collinearity diagnostics– Tolerance – calculated for every IV

Tolerance = 1 – R2

VIF = 1 / Tolerance

– Variance Inflation Factor• its square root is the factor by which the s.e. has been inflated
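A sketch of the same diagnostics computed directly (Python/numpy; the three predictors are simulated, with x1 and x2 deliberately correlated):

import numpy as np

def tolerance_and_vif(X):
    """Tolerance and VIF for each column of X (predictors only, no constant):
    each predictor is regressed on all the others."""
    n, k = X.shape
    out = []
    for j in range(k):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        e = X[:, j] - A @ b
        r2 = 1 - e.var() / X[:, j].var()
        tol = 1 - r2
        out.append((tol, 1 / tol))
    return out

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=200)     # highly correlated with x1
x3 = rng.normal(size=200)
print(tolerance_and_vif(np.column_stack([x1, x2, x3])))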

523

Actions

What you can do about collinearity“no quick fix” (Fox, 1991)

1. Get new data• avoids the problem• address the question in a different

way• e.g. find people who have been

raised as the ‘wrong’ gender• exist, but rare

• Not a very useful suggestion

524

2. Collect more data• not different data, more data• collinearity increases standard error

(se)• se decreases as N increases

• get a bigger N

3. Remove / Combine variables• If an IV correlates highly with other IVs• Not telling us much new• If you have two (or more) IVs which are

very similar• e.g. 2 measures of depression, socio-

economic status, achievement, etc

525

• sum them, average them, remove one

• Many measures• use principal components analysis to

reduce them

4. Use stepwise regression (or some flavour of it)
• See previous comments• Can be useful in a theoretical vacuum
5. Ridge regression• not very useful• behaves weirdly

526

Measurement Error

527

What is Measurement Error

• In social science, it is unlikely that we measure any variable perfectly– measurement error represents this

imperfection• We assume that we have a true

score – T

• A measure of that score– x

528

• just like a regression equation– standardise the parameters– T is the reliability

• the amount of variance in x which comes from T

• but, like a regression equation– assume that e is random and has mean of

zero– more on that later

x = T + e

529

Simple Effects of Measurement Error

• Lowers the measured correlation

– between two variables

• Real correlation– true scores (x* and y*)

• Measured correlation– measured scores (x and y)

530

[Path diagram: the true scores x* and y* are correlated (rx*y*); each is measured with error (e) as x and y, with reliabilities rxx and ryy; the observed correlation between x and y is rxy]

531

• Attenuation of correlation
  rxy = rx*y* × √(rxx × ryy)
• Attenuation-corrected correlation
  rx*y* = rxy / √(rxx × ryy)

532

• Example
  rxx = 0.7, ryy = 0.8, rxy = 0.3
  rx*y* = 0.3 / √(0.7 × 0.8) = 0.40
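The same calculation as a tiny Python function (the 0.40 simply reproduces the example above):

import math

def disattenuate(r_xy, r_xx, r_yy):
    """Attenuation-corrected (true-score) correlation."""
    return r_xy / math.sqrt(r_xx * r_yy)

print(round(disattenuate(0.3, 0.7, 0.8), 2))   # 0.40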

533

Complex Effects of Measurement Error

• Really horribly complex• Measurement error reduces

correlations– reduces estimate of – reducing one estimate

• increases others

– because of effects of control– combined with effects of suppressor

variables– exercise to examine this

534

Dealing with Measurement Error

• Attenuation correction– very dangerous– not recommended

• Avoid in the first place– use reliable measures– don’t discard information

• don’t categorise• Age: 10-20, 21-30, 31-40 …

535

Complications

• Assume measurement error is – additive– linear

• Additive– e.g. weight – people may under-report /

over-report at the extremes

• Linear– particularly the case when using proxy

variables

536

• e.g. proxy measures– Want to know effort on childcare,

count number of children• 1st child is more effort than last

– Want to know financial status, count income• 1st £10 much greater effect on financial

status than the 1000th.

537

Lesson 10: Non-Linear Analysis in Regression

538

Introduction

• Non-linear effect occurs – when the effect of one independent

variable– is not consistent across the range of

the IV

• Assumption is violated– expected value of residuals = 0– no longer the case

539

Some Examples

540

[Plot: Skill against Experience – a learning curve]

541

[Plot: Performance against Arousal – the Yerkes-Dodson law of arousal]

542

[Plot: Enthusiasm levels over a lesson on regression – from enthusiastic at 0 hours to suicidal at 3.5 hours]

543

• Learning – line changed direction once

• Yerkes-Dodson– line changed direction once

• Enthusiasm– line changed direction twice

544

Everything is Non-Linear

• Every relationship we look at is non-linear, for two reasons– Exam results cannot keep increasing

with reading more books• Linear in the range we examine

– For small departures from linearity• Cannot detect the difference• Non-parsimonious solution

545

Non-Linear Transformations

546

Bending the Line

• Non-linear regression is hard– We cheat, and linearise the data

• Do linear regression

Transformations• We need to transform the data

– rather than estimating a curved line• which would be very difficult• may not work with OLS

– we can take a straight line, and bend it– or take a curved line, and straighten it

• back to linear (OLS) regression

547

• We still do linear regression– Linear in the parameters

– Y = b1x + b2x2 + …

• Can do non-linear regression– Non-linear in the parameters

– e.g. Y = b1x + b2^x2 + … (a parameter appears in an exponent)

• Much trickier– Statistical theory either breaks down

OR becomes harder

548

• Linear transformations– multiply by a constant– add a constant– change the slope and the intercept

549

[Plot of y = x, y = 2x and y = x + 3]

550

• Linear transformations are no use– alter the slope and intercept– don’t alter the standardised

parameter estimate

• Non-linear transformation– will bend the slope– quadratic transformation

y = x2

– one change of direction

551

– Cubic transformationy = x2 + x3

– two changes of direction

552

y=0 + 0.1x + 1x2

Quadratic Transformation

553

Square Root Transformation

y = 20 + -3x + 5√x

554

Cubic Transformation


y = 3 - 4x + 2x2 - 0.2x3

555

Logarithmic Transformation

y = 1 + 0.1x + 10log(x)

556

Inverse Transformation

y = 20 -10x + 8(1/x)

557

• To estimate a non-linear regression– we don’t actually estimate anything

non-linear– we transform the x-variable to a non-

linear version– can estimate that straight line– represents the curve– we don’t bend the line, we stretch the

space around the line, and make it flat

558

Detecting Non-linearity

559

Draw a Scatterplot

• Draw a scatterplot of y plotted against x– see if it looks a bit non-linear– e.g. Anscombe’s data– e.g. Education and beginning salary

• from bank data• drawn in SPSS• with line of best fit

560

• Anscombe (1973)– constructed a set of datasets– show the importance of graphs in

regression/correlation

• For each dataset

  N                              11
  Mean of x                      9
  Mean of y                      7.5
  Equation of regression line    y = 3 + 0.5x
  Sum of squares (X – mean)      110
  Correlation coefficient        0.82
  R2                             0.67

561

562

563

564

565

A Real Example

• Starting salary and years of education– From employee data.sav

566

[Scatterplot of Beginning Salary against Educational Level (years), with the regression line: in some ranges the expected value of the error (residual) is > 0, in others it is < 0]

567

Use Residual Plot

• Scatterplot is only good for one variable– use the residual plot (that we used for

heteroscedasticity)

• Good for many variables

568

• We want– points to lie in a nice straight sausage

569

• We don’t want– a nasty bent sausage

570


• Educational level and starting salary

571

Carrying Out Non-Linear Regression

572

Linear Transformation

• Linear transformation doesn’t change– interpretation of slope– standardised slope– se, t, or p of slope– R2

• Can change– effect of a transformation

573

• Actually more complex– with some transformations can add a

constant with no effect (e.g. quadratic)

• With others does have an effect– inverse, log

• Sometimes it is necessary to add a constant– negative numbers have no square

root– 0 has no log

574

Education and Salary

Linear Regression• Saw previously that the assumption of

expected errors = 0 was violated• Anyway …

– R2 = 0.401, F=315, df = 1, 472, p < 0.001

– salbegin = -6290 + 1727 educ– Standardised

• b1 (educ) = 0.633

– Both parameters make sense

575

Non-linear Effect• Compute new variable

– quadratic– educ2 = educ2

• Add this variable to the equation– R2 = 0.585, p < 0.001– salbegin = 46263 + -6542 educ + 310

educ2

• slightly curious

– Standardised• b1 (educ) = -2.4• b2 (educ2) = 3.1

– What is going on?

576

• Collinearity– is what is going on– Correlation of educ and educ2

• r = 0.990

– Regression equation becomes difficult (impossible?) to interpret

• Need hierarchical regression– what is the change in R2

– is that change significant?– R2 (change) = 0.184, p < 0.001
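A small numpy sketch of the hierarchical step (the education and salary values are simulated, so the R2 figures will not match the employee data):

import numpy as np

rng = np.random.default_rng(1)
educ = rng.integers(8, 21, 474).astype(float)                     # years of education
salbegin = 20000 - 3000 * educ + 200 * educ**2 + rng.normal(0, 3000, 474)

def r2(X, y):
    X = np.column_stack([np.ones(len(y)), X])                     # add the constant
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ b).var() / y.var()

r2_linear = r2(educ, salbegin)
r2_quadratic = r2(np.column_stack([educ, educ**2]), salbegin)
print(r2_quadratic - r2_linear)                                    # change in R2 for the quadratic term
print(np.corrcoef(educ, educ**2)[0, 1])                            # the collinearity: r close to 0.99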

577

Cubic Effect• While we are at it, let’s look at the

cubic effect– R2 (change) = 0.004, p = 0.045– 19138 + 103 e + -206 e2 + 12

e3

– Standardised:b1(e) = 0.04

b2(e2) = -2.04

b3(e3) = 2.71

578

Fourth Power• Keep going while we are ahead

– won’t run• ???

• Collinearity is the culprit– Tolerance (educ4) = 0.000005– VIF = 215555

• Matrix of correlations of IVs is not positive definite– cannot be inverted

579

Interpretation• Tricky, given that parameter

estimates are a bit nonsensical• Two methods• 1: Use R2 change

– Save predicted values• or calculate predicted values to plot line

of best fit

– Save them from equation– Plot against IV

580

[Plot of predicted Beginning Salary against Education (Years) for the linear, quadratic and cubic fits]

581

• Differentiate with respect to e• We said: s = 19138 + 103 e + -206 e2 + 12

e3

– but first we will simplify it to quadratic s = 46263 + -6542 e + 310 e2

• ds/de = -6542 + 310 × 2 × e

582

Education   Slope
  9          -962
  10         -342
  11          278
  12          898
  13         1518
  14         2138
  15         2758
  16         3378
  17         3998
  18         4618
  19         5238
  20         5858

One year of education at the higher end of the scale is worth more than one year at the lower end of the scale (MBA versus GCSE).

583

• Differentiate the cubic: s = 19138 + 103 e + -206 e2 + 12 e3
  ds/de = 103 – 2 × 206 × e + 3 × 12 × e2

• Can calculate slopes for quadratic and cubic at different values

584

Education   Slope (Quad)   Slope (Cub)
  9           -962           -689
  10          -342           -417
  11           278            -73
  12           898            343
  13          1518            831
  14          2138           1391
  15          2758           2023
  16          3378           2727
  17          3998           3503
  18          4618           4351
  19          5238           5271
  20          5858           6263

585

A Quick Note on Differentiation

• For y = xp

– dy/dx = pxp-1

• For equations such as y =b1x + b2xP

dy/dx = b1 + b2pxp-1

• y = 3x + 4x2

– dy/dx = 3 + 4 • 2x

586

• y = b1x + b2x2 + b3x3

– dy/dx = b1 + b2 • 2x + b3 • 3 • x2

• y = 4x + 5x2 + 6x3

• dy/dx = 4 + 5 • 2 • x + 6 • 3 • x2

• Many functions are simple to differentiate– Not all though

587

Automatic Differentiation

• If you – Don’t know how to differentiate– Can’t be bothered to look up the

function

• Can use automatic differentiation software– e.g. GRAD (freeware)
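If you would rather not differentiate by hand, a symbolic algebra package gets you to the same derivatives. A sketch using Python's sympy (one option among several; GRAD itself works differently):

import sympy as sp

e = sp.symbols('e')
s = 19138 + 103*e - 206*e**2 + 12*e**3            # the cubic equation from the earlier slides
ds_de = sp.diff(s, e)                             # 36*e**2 - 412*e + 103
print(ds_de)
print([ds_de.subs(e, v) for v in range(9, 21)])   # slopes at 9 to 20 years of education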

588

589

Lesson 11: Logistic Regression

Dichotomous/Nominal Dependent Variables

590

Introduction

• Often in social sciences, we have a dichotomous/nominal DV– we will look at dichotomous first, then a

quick look at multinomial

• Dichotomous DV• e.g.

– guilty/not guilty– pass/fail– won/lost– Alive/dead (used in medicine)

591

Why Won’t OLS Do?

592

Example: Passing a Test

• Test for bus drivers– pass/fail– we might be interested in degrees of pass fail

• a company which trains them will not• fail means ‘pay for them to take it again’

• Develop a selection procedure– Two predictor variables– Score – Score on an aptitude test– Exp – Relevant prior experience (months)

593

• 1st ten cases
  Score   Exp   Pass
  5        6     0
  1       15     0
  1       12     0
  4        6     0
  1       15     1
  1        6     0
  4       16     1
  1       10     1
  3       12     0
  4       26     1

594

• DV – pass (1 = Yes, 0 = No)

• Just consider score first– Carry out regression– Score as IV, Pass as DV– R2 = 0.097, F = 4.1, df = 1, 48, p =

0.028.

– b0 = 0.190

– b1 = 0.110, p=0.028• Seems OK

595

• Or does it? …• 1st Problem – pp plot of residuals

[P-P plot of expected against observed cumulative probability for the residuals]

596

• 2nd problem - residual plot

597

• Problems 1 and 2– strange distributions of residuals– parameter estimates may be wrong– standard errors will certainly be

wrong

598

• 3rd problem – interpretation– I score 2 on aptitude. – Pass = 0.190 + 0.110 × 2 = 0.41– I score 8 on the test– Pass = 0.190 + 0.110 × 8 = 1.07

• Seems OK, but– What does it mean?– Cannot score 0.41 or 1.07

• can only score 0 or 1

• Cannot be interpreted– need a different approach

599

A Different ApproachLogistic Regression

600

Logit Transformation

• In lesson 10, transformed IVs– now transform the DV

• Need a transformation which gives us– graduated scores (between 0 and 1)– No upper limit

• we can’t predict someone will pass twice

– No lower limit• you can’t do worse than fail

601

Step 1: Convert to Probability

• First, stop talking about values– talk about probability– for each value of score, calculate

probability of pass

• Solves the problem of graduated scales

602

Score            1     2     3     4     5
Fail    N        7     5     6     4     2
        P        0.7   0.5   0.6   0.4   0.2
Pass    N        3     5     4     6     8
        P        0.3   0.5   0.4   0.6   0.8

The probability of failure given a score of 1 is 0.7; the probability of passing given a score of 5 is 0.8.

603

This is better• Now a score of 0.41 has a meaning

– a 0.41 probability of pass

• But a score of 1.07 has no meaning– cannot have a probability > 1 (or < 0)– Need another transformation

604

Step 2: Convert to Odds-Ratio

Need to remove upper limit• Convert to odds• Odds, as used by betting shops

– 5:1, 1:2• Slightly different from odds in speech

– a 1 in 2 chance– odds are 1:1 (evens)– 50%

605

• Odds ratio = (number of times it happened) / (number of times it didn’t happen)

odds ratio = p(event) / p(not event) = p(event) / (1 – p(event))

• p = 0.8: odds = 0.8 / 0.2 = 4

– equivalent to 4:1 (odds on)

– 4 times out of five

• p = 0.2: odds = 0.2 / 0.8 = 0.25

– equivalent to 1:4 (4:1 against)

– 1 time out of five

607

• Now we have solved the upper bound problem– we can interpret 1.07, 2.07,

1000000.07

• But we still have the zero problem

– we cannot interpret predicted scores less than zero

608

Step 3: The Log

• Log10 of a number (x) is the power that 10 must be raised to, to give x: 10^log10(x) = x
• log(10) = 1

• log(100) = 2

• log(1000) = 3

609

• log(1) = 0• log(0.1) = -1• log(0.00001) = -5

610

Natural Logs and e

• Don’t use log10

– Use loge

• Natural log, ln• Has some desirable properties, that

log10 doesn’t– For us– If y = ln(x) + c– dy/dx = 1/x– Not true for any other logarithm

611

• Be careful – calculators and stats packages are not consistent when they use log– Sometimes log10, sometimes loge

– Can prove embarrassing (a friend told me)

612

Take the natural log of the odds ratio

• Goes from –∞ to +∞– can interpret any predicted value

613

Putting them all together

• Logit transformation– log-odds ratio– not bounded at zero or one

614

Score              1      2      3      4      5
Fail     N         7      5      6      4      2
         P         0.7    0.5    0.6    0.4    0.2
Pass     N         3      5      4      6      8
         P         0.3    0.5    0.4    0.6    0.8
Odds (fail)        2.33   1.00   1.50   0.67   0.25
log(odds fail)     0.85   0.00   0.41   -0.41  -1.39
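The three steps as a few lines of numpy (the probabilities are taken straight from the table above):

import numpy as np

p_fail = np.array([0.7, 0.5, 0.6, 0.4, 0.2])     # P(fail) for scores 1 to 5

odds = p_fail / (1 - p_fail)                      # step 2: odds
logit = np.log(odds)                              # step 3: natural log of the odds

print(np.round(odds, 2))      # [2.33 1.   1.5  0.67 0.25]
print(np.round(logit, 2))     # [ 0.85  0.    0.41 -0.41 -1.39]

# and back again: p = odds / (1 + odds) = 1 / (1 + exp(-logit))
print(1 / (1 + np.exp(-logit)))                   # recovers p_fail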

615

[Plot of probability against logit – an S-shaped curve. Probability gets closer to zero, but never reaches it, as the logit goes down.]

616

• Hooray! Problem solved, lesson over– errrmmm… almost

• Because we are now using log-odds ratio, we can’t use OLS– we need a new technique, called

Maximum Likelihood (ML) to estimate the parameters

617

Parameter Estimation using ML

ML tries to find estimates of model parameters that are most likely to give rise to the pattern of observations in the sample data

• All gets a bit complicated– OLS is a special case of ML– the mean is an ML estimator

618

• Don’t have closed form equations– must be solved iteratively– estimates parameters that are most

likely to give rise to the patterns observed in the data

– by maximising the likelihood function (LF)

• We aren’t going to worry about this– except to note that sometimes, the

estimates do not converge• ML cannot find a solution

619

Interpreting Output

Using SPSS• Overall fit for:

– step (only used for stepwise)– block (for hierarchical)– model (always)– in our model, all are the same– 2=4.9, df=1, p=0.025

• F test

620

Omnibus Tests of Model Coefficients

4.990 1 .025

4.990 1 .025

4.990 1 .025

Step

Block

Model

Step 1

Chi-square df Sig.

621

• Model summary– -2LL (=2/N)– Cox & Snell R2

– Nagelkerke R2

– Different versions of R2

• No real R2 in logistic regression• should be considered ‘pseudo R2’

622

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      64.245              .095                   .127

623

• Classification Table– predictions of model– based on cut-off of 0.5 (by default)– predicted values x actual values

624

Classification Table(a)
                                 Predicted PASS        Percentage
Observed                         0         1           Correct
Step 1   PASS   0                18        8           69.2
                1                12        12          50.0
         Overall Percentage                            60.0
a. The cut value is .500

625

Model parameters• B

– Change in the logged odds associated with a change of 1 unit in IV

– just like OLS regression– difficult to interpret

• SE (B)– Standard error– Multiply by 1.96 to get 95% CIs

626

Variables in the Equation
Step 1(a)          B        S.E.    Wald
  SCORE            -.467    .219    4.566
  Constant         1.314    .714    3.390
a. Variable(s) entered on step 1: SCORE.

Variables in the Equation
Step 1(a)          Sig.     Exp(B)   95.0% C.I. for EXP(B)
                                     Lower     Upper
  score            .386     1.263    .744      2.143
  Constant         .199     .323
a. Variable(s) entered on step 1: score.

627

• Constant – i.e. score = 0– B = 1.314– Exp(B) = eB = e1.314 = 3.720– OR = 3.720, p = 1 – (1 / (OR + 1))

= 1 – (1 / (3.720 + 1))– p = 0.788

628

• Score 1– Constant b = 1.314– Score B = -0.467– Exp(1.314 – 0.467) = Exp(0.847)

= 2.332– OR = 2.332– p = 1 – (1 / (2.332 + 1))

= 0.699
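The same arithmetic as a couple of lines of Python (b0 and b1 are simply the two coefficients quoted above):

import math

b0, b1 = 1.314, -0.467              # constant and SCORE coefficient from the output

def p_pass(score):
    """Predicted probability of passing for a given aptitude score."""
    odds = math.exp(b0 + b1 * score)
    return odds / (1 + odds)

print(round(p_pass(0), 3))          # 0.788
print(round(p_pass(1), 3))          # 0.7 (i.e. 0.699...)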

629

Standard Errors and CIs

• SPSS gives– B, SE B, exp(B) by default– Can work out 95% CI from standard

error– B ± 1.96 x SE(B)– Or ask for it in options

• Symmetrical in B– Non-symmetrical (sometimes very) in

exp(B)

630

Variables in the Equation
                 B        S.E.    Exp(B)   95.0% C.I. for EXP(B)
                                           Lower     Upper
  SCORE          -.467    .219    .627     .408      .962
  Constant       1.314    .714    3.720
a. Variable(s) entered on step 1: SCORE.

631

• The odds of passing the test are multiplied by 0.63 (95% CI = 0.408 to 0.962; p = 0.033) for every additional point on the aptitude test.

632

More on Standard Errors

• In OLS regression– If a variable is added in a hierarchical fashion– The p-value associated with the change in R2

is the same as the p-value of the variable– Not the case in logistic regression

• In our data 0.025 and 0.033

• Wald standard errors– mean the p-value for the estimate is wrong – too high– (CIs still correct)

633

• Two estimates use slightly different information– P-value says “what if no effect”– CI says “what if this effect”

• Variance depends on the hypothesised ratio of the number of people in the two groups

• Can calculate likelihood ratio based p-values– If you can be bothered– Some packages provide them

automatically

634

Probit Regression

• Very similar to logistic– much more complex initial

transformation (to normal distribution)– Very similar results to logistic

(multiplied by 1.7)

• In SPSS:– A bit weird

• Probit regression available through menus

635

– But requires data structured differently

• However– Ordinal logistic regression is

equivalent to binary logistic• If outcome is binary

– SPSS gives option of probit

636

Results
                        Estimate   SE      P
Logistic (binary)
  Score                 0.288      0.301   0.339
  Exp                   0.147      0.073   0.043
Logistic (ordinal)
  Score                 0.288      0.301   0.339
  Exp                   0.147      0.073   0.043
Probit (via ordinal)
  Score                 0.191      0.178   0.282
  Exp                   0.090      0.042   0.033

637

Differentiating Between Probit and Logistic

• Depends on shape of the error term– Normal or logistic– Graphs are very similar to each other

• Could distinguish quality of fit– Given enormous sample size

• Logistic = probit x 1.7– Actually 1.6998

• Probit advantage– Understand the distribution

• Logistic advantage– Much simpler to get back to the probability

638

[Plot of the cumulative Normal (probit) and Logistic curves – they are very similar]

639

Infinite Parameters

• Non-convergence can happen because of infinite parameters– Insoluble model

• Three kinds:• Complete separation

– The groups are completely distinct• Pass group all score more than 10• Fail group all score less than 10

640

• Quasi-complete separation– Separation with some overlap

• Pass group all score 10 or more• Fail group all score 10 or less

• Both cases:– No convergence

• Close to this– Curious estimates– Curious standard errors

641

• Categorical Predictors– Can cause separation– Esp. if correlated

• Need people in every cell

                      Male                     Female
                      White     Non-White      White     Non-White
Below Poverty Line
Above Poverty Line

642

Logistic Regression and Diagnosis

• Logistic regression can be used for diagnostic tests– For every score

• Calculate probability that result is positive• Calculate proportion of people with that score (or

lower) who have a positive result

• Calculate c statistic– Measure of discriminative power– the %age of all possible pairs of cases (one with the outcome, one without) in which the model gives the higher probability to the case that actually had the outcome

643

– Perfect c-statistic = 1.0– Random c-statistic = 0.5

• SPSS doesn’t do it automatically– But easy to do

• Save probabilities– Use Graphs, ROC Curve– Test variable: predicted probability– State variable: outcome
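A sketch of the c statistic computed directly (Python/numpy; the outcome and predicted probabilities here are invented for illustration):

import numpy as np

def c_statistic(y, p):
    """c statistic (area under the ROC curve): the proportion of all
    outcome/non-outcome pairs in which the case with the outcome gets the
    higher predicted probability (ties count as a half)."""
    y, p = np.asarray(y), np.asarray(p)
    diff = p[y == 1][:, None] - p[y == 0][None, :]      # every pairwise comparison
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = np.array([0, 0, 1, 0, 1, 1, 0, 1])                  # observed outcome
p = np.array([0.2, 0.4, 0.3, 0.1, 0.8, 0.7, 0.5, 0.9])  # model's predicted probabilities
print(c_statistic(y, p))                                # 1.0 = perfect, 0.5 = random; here 0.875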

644

Sensitivity and Specificity

• Sensitivity:– Probability of saying someone has a positive result, if they do: p(pos | pos)
• Specificity:– Probability of saying someone has a negative result, if they do: p(neg | neg)

645

Calculating Sens and Spec

• For each value– Calculate

• proportion of minority earning less – p(m)• proportion of non-minority earning less –

p(w)

– Sensitivity (value)• P(m)

646

Salary P(minority)

10 .39

20 .31

30 .23

40 .17

50 .12

60 .09

70 .06

80 .04

90 .03

647

Using Bank Data

• Predict minority group, using salary (000s)– Logit(minority) = -0.044 + (-0.039) × salary

• Find actual proportions

648

[ROC curve: sensitivity plotted against 1 - specificity]

Diagonal segments are produced by ties.

ROC Curve

Area under curve is c-statistic

649

More Advanced Techniques

• Multinomial Logistic Regression – more than two categories in DV– same procedure– one category chosen as reference group

• odds of being in category other than reference

• Polytomous Logit Universal Models (PLUM)– Ordinal multinomial logistic regression– For ordinal outcome variables

650

Final Thoughts

• Logistic Regression can be extended– dummy variables– non-linear effects– interactions (even though we don’t

cover them until the next lesson)• Same issues as OLS

– collinearity– outliers

651

652

653

Lesson 12: Mediation and Path Analysis

654

Introduction

• Moderator– Level of one variable influences effect of

another variable

• Mediator– One variable influences another via a third

variable

• All relationships are really mediated– are we interested in the mediators?– can we make the process more explicit

655

• In examples with bank:
education → beginning salary

• Why?– What is the process?– Are we making assumptions about

the process?– Should we test those assumptions?

656

[Diagram: education → beginning salary, via job skills, expectations, negotiating skills, and kudos for the bank]

657

Direct and Indirect Influences

X may affect Y in two ways• Directly – X has a direct (causal)

influence on Y– (or maybe mediated by other

variables)

• Indirectly – X affects Y via a mediating variable - M

658

• e.g. how does going to the pub affect comprehension on a Summer school course– on, say, regression

[Diagram: Having fun in pub in evening → not reading books on regression → less knowledge. Anything here?]

659

[Diagram: Having fun in pub in evening → not reading books on regression → less knowledge, with fatigue added as a second path. Still needed?]

660

• Mediators needed– to cope with more sophisticated

theory in social sciences– make explicit assumptions made

about processes– examine direct and indirect influences

661

Detecting Mediation

662

4 Steps
From Baron and Kenny (1986)
• To establish that the effect of X on Y is mediated by M:
1. Show that X predicts Y
2. Show that X predicts M
3. Show that M predicts Y, controlling for X
4. If the effect of X controlling for M is zero, M is a complete mediator of the relationship
• (3 and 4 in same analysis)
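A minimal sketch of the four steps in R (simulated data, using the variable names of the example that follows; the coefficients will not reproduce the slides' values):

  set.seed(3)
  enjoy <- rnorm(200)
  buy   <- 0.9 * enjoy + rnorm(200)
  read  <- 0.3 * enjoy + 0.4 * buy + rnorm(200)

  step1  <- lm(read ~ enjoy)          # 1: X predicts Y
  step2  <- lm(buy ~ enjoy)           # 2: X predicts M
  step34 <- lm(read ~ enjoy + buy)    # 3: M predicts Y, controlling for X
  summary(step34)                     # 4: is the enjoy coefficient now zero?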

663

Example: Book habits

Enjoy Books → Buy books → Read Books

664

Three Variables

• Enjoy– How much an individual enjoys books

• Buy– How many books an individual buys

(in a year)• Read

– How many books an individual reads (in a year)

665

        ENJOY   BUY     READ
ENJOY   1.00    0.64    0.73
BUY     0.64    1.00    0.75
READ    0.73    0.75    1.00

666

• The Theory

enjoy → buy → read

667

• Step 1
1. Show that X (enjoy) predicts Y (read)
– b1 = 0.487, p < 0.001
– standardised b1 = 0.732
– OK

668

2. Show that X (enjoy) predicts M (buy)

– b1 = 0.974, p < 0.001

– standardised b1 = 0.643

– OK

669

3. Show that M (buy) predicts Y (read), controlling for X (enjoy)
– b = 0.206, p < 0.001
– standardised b = 0.469
– OK

670

4. If effect of X controlling for M is zero, M is complete mediator of the relationship

– (Same as analysis for step 3.)

– b2 = 0.287, p = 0.001

– standardised b2 = 0.431

– Hmmmm…• Significant, therefore not a complete

mediator

671

[Path diagram: enjoy → buy = 0.974 (from step 2); buy → read = 0.206 (from step 3); enjoy → read = 0.287 (step 4)]

672

The Mediation Coefficient

• Amount of mediation = Step 1 – Step 4 = 0.487 – 0.287 = 0.200
• OR
• Step 2 × Step 3 = 0.974 × 0.206 = 0.200

673

SE of Mediator

• sa = se(a)
• sb = se(b)

[Path diagram: enjoy → buy, path a (from step 2); buy → read, path b (from step 3)]

674

• Sobel test– Standard error of mediation

coefficient can be calculated

a = 0.974, sa = 0.189
b = 0.206, sb = 0.054

se(ab) = √(b²sa² + a²sb² − sa²sb²)

675

• Indirect effect = 0.200– se = 0.056– t =3.52, p = 0.001

• Online Sobel test:http://www.unc.edu/~preacher/sobel/sobel.htm– (Won’t be there for long; probably will be

somewhere else)
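A minimal R sketch of the calculation (implementing the standard error formula as reconstructed above, and plugging in the a, b, sa and sb from the previous slide):

  sobel <- function(a, b, sa, sb) {
    ab <- a * b                                        # indirect effect
    se <- sqrt(b^2 * sa^2 + a^2 * sb^2 - sa^2 * sb^2)  # SE of the indirect effect
    z  <- ab / se
    c(indirect = ab, se = se, z = z, p = 2 * pnorm(-abs(z)))
  }
  sobel(a = 0.974, b = 0.206, sa = 0.189, sb = 0.054)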

676

A Note on Power

• Recently– Move in methodological literature away from

this conventional approach– Problems of power:– Several tests, all of which must be significant

• Type I error rate = 0.05 * 0.05 = 0.0025• Must affect power

– Bootstrapping suggested as alternative• See Paper B7, A4, B9• B21 for SPSS syntax

677

678

679

Lesson 13: Moderators in Regression

“different slopes for different folks”

680

Introduction

• Moderator relationships have many different names– interactions (from ANOVA)– multiplicative– non-linear (just confusing)– non-additive

• All talking about the same thing

681

A moderated relationship occurs • when the effect of one variable

depends upon the level of another variable

682

• Hang on …– That seems very like a nonlinear

relationship– Moderator

• Effect of one variable depends on level of another

– Non-linear• Effect of one variable depends on level of itself

• Where there is collinearity– Can be hard to distinguish between them– Paper in handbook (B5)– Should (usually) compare effect sizes

683

• e.g. How much it hurts when I drop a computer on my foot depends on– x1: how much alcohol I have drunk

– x2: how high the computer was dropped from

– but if x1 is high enough

– x2 will have no effect

684

• e.g. Likelihood of injury in a car accident– depends on

– x1: speed of car

– x2: if I was wearing a seatbelt

– but if x1 is low enough

– x2 will have no effect

685

[Graph: Injury against Speed (mph), with separate lines for Seatbelt and No Seatbelt]

686

• e.g. number of words (from a list) I can remember– depends on

– x1: type of words (abstract, e.g. ‘justice’, or concrete, e.g. ‘carrot’)

– x2: Method of testing (recognition – i.e. multiple choice, or free recall)

– but if using recognition

– x1: will not make a difference

687

• We looked at three kinds of moderator

• alcohol x height = pain– continuous x continuous

• speed x seatbelt = injury– continuous x categorical

• word type x test type– categorical x categorical

• We will look at them in reverse order

688

How do we know to look for moderators?

Theoretical rationale• Often the most powerful• Many theories predict additive/linear

effects– Fewer predict moderator effects

Presence of heteroscedasticity• Clue there may be a moderated

relationship missing

689

Two Categorical Predictors

690

Data• 2 IVs– word type (concrete [1], abstract [2])– test method (recog [1], recall [2])

• 20 Participants in one of four groups– 1, 1– 1, 2– 2, 1– 2, 2

• 5 per group• lesson12.1.sav

691

                Concrete   Abstract   Total
Recog   Mean    15.40      15.20      15.30
        SD      2.19       2.59       2.26
Recall  Mean    15.60      6.60       11.10
        SD      1.67       7.44       6.95
Total   Mean    15.50      10.90      13.20
        SD      1.84       6.94       5.47

692

• Graph of means
[Graph: mean score for the two levels of TEST (1.00, 2.00), with separate lines for WORDS (1.00, 2.00)]

693

ANOVA Results

• Standard way to analyse these data would be to use ANOVA– Words: F=6.1, df=1, 16, p=0.025– Test: F=5.1, df=1, 16, p=0.039– Words x Test: F=5.6, df=1, 16,

p=0.031

694

Procedure for Testing

1: Convert to effect coding
• dummy coding can be used instead, though collinearity is less of an issue with effect coding
• doesn’t make any difference to substantive interpretation
2: Calculate interaction term
• In ANOVA the interaction is automatic
• In regression we create an interaction variable

695

• Interaction term (wxt) – multiply the effect coded variables together

word   test   wxt
-1     -1     1
1      -1     -1
-1     1      -1
1      1      1

696

3: Carry out regression• Hierarchical

– linear effects first– interaction effect in next block
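A minimal R sketch of steps 1–3 (assuming a data frame d holding the lesson12.1.sav variables word and test, coded 1 and 2, and the outcome score):

  d$w   <- ifelse(d$word == 1, -1, 1)        # effect-coded word type
  d$t   <- ifelse(d$test == 1, -1, 1)        # effect-coded test method
  d$wxt <- d$w * d$t                         # interaction term

  block1 <- lm(score ~ w + t, data = d)      # linear effects first
  block2 <- lm(score ~ w + t + wxt, data = d)
  anova(block1, block2)                      # test of the change in R2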

697

• b0=13.2• b1 (words) = -2.3, p=0.025• b2 (test) = -2.1, p=0.039• b3 (words x test) = -2.2, p=0.031• Might need to use change in R2 to

test sig of interaction, because of collinearity

What do these mean?• b0 (intercept) = predicted value of Y

(score) when all X = 0– i.e. the central point

698

• b0 = 13.2– grand mean

• b1 = -2.3– distance from grand to mean for two

word types– 13.2 – (-2.3) = 15.5– 13.2 + (-2.3) = 10.9

        Concrete   Abstract   Total
Recog   15.40      15.20      15.30
Recall  15.60      6.60       11.10
Total   15.50      10.90      13.20

699

• b2 = -2.1– distance from grand mean to recog

and recall means

• b3 = -2.2

– to understand b3 we need to look at predictions from the equation without this term

Score = 13.2 + (-2.3) w + (-2.1) t

700

Score = 13.2 + (-2.3) w + (-2.1) t• So for each group we can calculate

an expected value

701

W   T        Word   Test   Expected Value
C   Recog    -1     -1     13.2 + (-2.3)(-1) + (-2.1)(-1)
C   Recall   -1     1      13.2 + (-2.3)(-1) + (-2.1)(1)
A   Recog    1      -1     13.2 + (-2.3)(1) + (-2.1)(-1)
A   Recall   1      1      13.2 + (-2.3)(1) + (-2.1)(1)

b1 = -2.3, b2 = -2.1

702

• The exciting part comes when we look at the differences between the actual value and the value in the 2 IV model

W   T        Word   Test   Exp    Actual Value
C   Recog    -1     -1     17.6   15.4
C   Recall   -1     1      13.4   15.6
A   Recog    1      -1     13.0   15.2
A   Recall   1      1      8.8    6.6

703

• Each difference = 2.2 (or –2.2)

• The value of b3 was –2.2– the interaction term is the correction

required to the slope when the second IV is included

704

• Examine the slope for word type

[Graph: mean score against test type, Recog (-1) and Recall (1)]
Gradient = (11.1 - 15.3) / 2 = -2.1

705

• Add the slopes for two test groups

[Graph: mean score against test type, Recog (-1) and Recall (1), with three lines]
Both word groups: gradient = -2.1
Abstract: (6.6 - 15.2)/2 = -4.3
Concrete: (15.6 - 15.4)/2 = 0.1

706

b associated with interaction • the change in slope, away from the

average, associated with a 1 unit change in the moderating variable

OR• Half the difference in the slopes

707

• Another way to look at it
Y = 13.2 + (-2.3)w + (-2.1)t + (-2.2)wt
• Examine concrete words group (w = -1)– substitute values into the equation
Y(concrete) = 13.2 + (-2.3)(-1) + (-2.1)t + (-2.2)(-1)t
Y(concrete) = 13.2 + 2.3 + (-2.1)t + 2.2t
Y(concrete) = 15.5 + 0.1t
• The effect of changing test type for concrete words (the slope, which is half the actual difference)

708

Why go to all that effort? Why not do ANOVA in the first place?

1. That is what ANOVA actually does• if it can handle an unbalanced design

(i.e. different numbers of people in each group)

• Helps to understand what can be done with ANOVA

• SPSS uses regression to do ANOVA

2. Helps to clarify more complex cases

• as we shall see

709

Categorical x Continuous

710

Note on Dichotomisation

• Very common to see people dichotomise a variable– Makes the analysis easier– Very bad idea

• Paper B6

711

Data

A chain of 60 supermarkets • examining the relationship between

profitability, shop size, and local competition

• 2 IVs– shop size– comp (local competition, 0=no, 1=yes)

• DV– profit

712

• Data, ‘lesson 12.2.sav’

Shopsize   Comp   Profit
4          1      23
10         1      25
7          0      19
10         0      9
10         1      18
29         1      33
12         0      17
6          1      20
14         0      21
62         0      8

713

1st Analysis

Two IVs• R2=0.367, df=2, 57, p < 0.001• Unstandardised estimates

– b1 (shopsize) = 0.083 (p=0.001)

– b2 (comp) = 5.883 (p<0.001)

• Standardised estimates– b1 (shopsize) = 0.356

– b2 (comp) = 0.448

714

• Suspicions– Presence of competition is likely to

have an effect– Residual plot shows a little

heteroscedasticity

[Residual plot]

715

Procedure for Testing

• Very similar to last time– convert ‘comp’ to effect coding– -1 = No competition– 1 = competition– Compute interaction term

• comp (effect coded) x size

– Hierarchical regression

716

Result

• Unstandardised estimates– b1 (shopsize) = 0.071 (p=0.006)

– b2 (comp) = -1.67 (p = 0.506)

– b3 (sxc) = -0.050 (p=0.050)

• Standardised estimates– b1 (shopsize) = 0.306

– b2 (comp) = -0.127

– b3 (sxc) = -0.389

717

• comp now non-significant– shows importance of hierarchical– it obviously is important

718

Interpretation

• Draw graph with lines of best fit– drawn automatically by SPSS

• Interpret equation by substitution of values– evaluate effects of

• size• competition

719

[Graph: Profit against Shopsize, with lines of best fit for Competition, No competition, and All Shops]

720

• Effects of size– in presence and absence of competition– (can ignore the constant)

Y = 0.071x1 + (-1.67)x2 + (-0.050)x1x2

– Competition present (x2 = 1)
Y = 0.071x1 + (-1.67)(1) + (-0.050)x1(1)
Y = 0.071x1 + (-1.67) + (-0.050)x1
Y = 0.021x1 + (-1.67)

721

Y = 0.071x1 + (-1.67)x2 + (-0.050)x1x2

– Competition absent (x2 = -1)
Y = 0.071x1 + (-1.67)(-1) + (-0.050)x1(-1)
Y = 0.071x1 + 0.050x1 + 1.67
Y = 0.121x1 (+ 1.67)

722

Two Continuous Variables

723

Data

• Bank Employees– only using clerical staff– 363 cases– predicting starting salary– previous experience– age– age x experience

724

• Correlation matrix – only one significant

            LOGSB   AGESTART   PREVEXP
LOGSB       1.00    -0.09      0.08
AGESTART    -0.09   1.00       0.77
PREVEXP     0.08    0.77       1.00

725

Initial Estimates (no moderator)• (standardised)

– R2 = 0.061, p<0.001– Age at start = -0.37, p<0.001– Previous experience = 0.36, p<0.001

• Suppressing each other– Age and experience compensate for

one another– Older, with no experience, bad– Younger, with experience, good

726

The Procedure

• Very similar to previous– create multiplicative interaction term– BUT

• Need to eliminate effects of means – cause massive collinearity

• and SDs– cause one variable to dominate the

interaction term• By standardising

727

• To standardise x, – subtract mean, and divide by SD– re-expresses x in terms of distance

from the mean, in SDs– ie z-scores

• Hint: automatic in SPSS in Descriptives

• Create interaction term of age and exp– axe = z(age) z(exp)

728

• Hierarchical regression– two linear effects first– moderator effect in second– hint: it is often easier to interpret if

standardised versions of all variables are used
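A minimal R sketch of the whole procedure (assuming a data frame bank with agestart, prevexp and logsb, as in this example):

  bank$z_age <- as.numeric(scale(bank$agestart))   # z-scores
  bank$z_exp <- as.numeric(scale(bank$prevexp))
  bank$axe   <- bank$z_age * bank$z_exp            # interaction term

  block1 <- lm(logsb ~ z_age + z_exp, data = bank)
  block2 <- lm(logsb ~ z_age + z_exp + axe, data = bank)
  anova(block1, block2)                            # change in R2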

729

• Change in R2

– 0.085, p<0.001

• Estimates (standardised)– b1 (exp) = 0.104

– b2 (agestart) = -0.54

– b3 (age x exp) = -0.54

730

Interpretation 1: Pick-a-Point

• Graph is tricky– can’t have two continuous variables– Choose specific points (pick-a-point)

• Graph the line of best fit of one variable at others

– Two ways to pick a point• 1: Choose high (z = +1), medium (z = 0)

and low (z = -1)• Choose ‘sensible’ values – age 20, 50,

80?

731

• We know:
– Y = 0.10e + (-0.54)a + (-0.54)ae
– Where a = agestart, and e = experience
• We can rewrite this as:
– Y = (0.10e) + (-0.54)a + (-0.54)ea
– Take a out of the brackets
– Y = (0.10e) + (-0.54 + (-0.54)e)a
• The bracketed terms are the simple intercept and simple slope
– β0 = (0.10e)
– β1 = (-0.54 + (-0.54)e)
– Y = β0 + β1a

732

• Pick any value of e, and we know the slope for a– Standardised, so it’s easy
• e = -1:  β0 = (-1)(0.10) = -0.10;  β1a = (-0.54 + (-1)(-0.54))a = 0.00a
• e = 0:   β0 = (0)(0.10) = 0;  β1a = (-0.54 + (0)(-0.54))a = -0.54a
• e = 1:   β0 = (1)(0.10) = 0.10;  β1a = (-0.54 + (1)(-0.54))a = -1.08a
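The same pick-a-point arithmetic as a minimal R sketch (using the standardised estimates quoted above):

  b_e <- 0.10; b_a <- -0.54; b_ae <- -0.54
  for (e in c(-1, 0, 1)) {
    cat("e =", e,
        " simple intercept =", b_e * e,
        " simple slope for a =", b_a + b_ae * e, "\n")
  }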

733

Graph the Three Lines

[Graph: log(salary) against age, with lines for e = -1, e = 0 and e = 1]

734

Interpretation 2: P-Values and CIs

• Second way – Newer, rarely done

• Calculate CIs of the slope – At any point

• Calculate p-value– At any point

• Give ranges of significance

735

What do you need?

• The variance and covariance of the estimates– SPSS doesn’t provide estimates for

intercept– Need to do it manually

• In options, exclude intercept– Create intercept – c = 1– Use it in the regression

736

• Enter information into web page:– www.unc.edu/~preacher/interact/acov.htm

– (Again, may not be around for long)

• Get results• Calculations in Bauer and Curran

(in press: Multivariate Behavioral Research)– Paper B13

737

[Graph: MLR 2-way interaction plot – Y against X, with lines at three values of the moderator, CVz1(1), CVz1(2), CVz1(3)]

738

Areas of Significance

[Graph: simple slope against experience, with confidence bands showing the areas of significance]

739

• 2 complications– 1: Constant differed– 2: DV was logged, hence non-linear

• effect of 1 unit depends on where the unit is

– Can use SPSS to do graphs showing lines of best fit for different groups

– See paper A2

740

Finally …

741

Unlimited Moderators

• Moderator effects are not limited to – 2 variables– linear effects

742

Three Interacting Variables

• Age, Sex, Exp• Block 1

– Age, Sex, Exp

• Block 2– Age x Sex, Age x Exp, Sex x Exp

• Block 3– Age x Sex x Exp

743

• Results– All two way interactions significant– Three way not significant– Effect of Age depends on sex– Effect of experience depends on sex– Size of the age x experience

interaction does not depend on sex (phew!)

744

Moderated Non-Linear Relationships

• Enter non-linear effect• Enter non-linear effect x moderator

– if significant indicates degree of non-linearity differs by moderator

745

746

Modelling Counts: Poisson Regression

Lesson 14

747

Counts and the Poisson Distribution

• Von Bortkiewicz (1898)– Numbers of Prussian

soldiers kicked to death by horses

Deaths   Frequency
0        109
1        65
2        22
3        3
4        1
5        0

748

• The data fitted a Poisson probability distribution– When counts of events occur, poisson

distribution is common– E.g. papers published by researchers, police

arrests, number of murders, ship accidents

• Common approach– Log transform and treat as normal

• Problems– Censored at 0– Integers only allowed– Heteroscedasticity

749

The Poisson Distribution

[Graph: Poisson probability distributions for counts 0 to 17, with means 0.5, 8 and 14]

750

p(y | x) = exp(-μ) μ^y / y!

751

• Where:– y is the count– μ is the mean of the Poisson distribution
• In a Poisson distribution– The mean = the variance (hence the heteroscedasticity issue)

p(y | x) = exp(-μ) μ^y / y!
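A quick check of the formula against R's built-in Poisson density (the values here are chosen just for illustration):

  mu <- 1.4; y <- 2
  exp(-mu) * mu^y / factorial(y)   # by hand: about 0.24
  dpois(y, lambda = mu)            # same value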

752

Poisson Regression in SPSS

• Not directly available– SPSS can be tweaked to do it in three ways:– General loglinear model (genlog)– Non-linear regression (CNLR)

• Bootstrapped p-values only

– Both are quite tricky

• SPSS 15 adds generalized linear models, a third and more direct option

753

Example Using Genlog

• Number of shark bites on different colour surfboards– 100 surfboards, 50

red, 50 blue

• Weight cases by bites

• Analyse, Loglinear, General– Colour is factor

[Bar chart: frequency of 0 to 4 bites, for blue and red surfboards]

754

Results

Correspondence Between Parameters and Terms of the Design

Parameter   Aliased   Term
1                     Constant
2                     [COLOUR = 1]
3           x         [COLOUR = 2]

Note: 'x' indicates an aliased (or a redundant) parameter. These parameters are set to zero.

755

                                      Asymptotic 95% CI
Param   Est.     SE      Z-value      Lower    Upper
1       4.1190   .1275   32.30        3.87     4.37
2       -.5495   .2108   -2.61        -.96     -.14
3       .0000    .       .            .        .

• Note: Intercept (param 1) is curious

• Param 2 is the difference in the means

756

SPSS: Continuous Predictors

• Bleedin’ nightmare• http://www.spss.com/tech/

answer/details.cfm?tech_tan_id=100006204

757

Poisson Regression in Stata

• SPSS will save a Stata file• Open it in Stata• Statistics, Count outcomes, Poisson

regression

758

Poisson Regression in R

• R is a freeware program– Similar to SPlus– www.r-project.org

• Steep learning curve to start with• Much nicer to do Poisson (and other)

regression analysishttp://www.stat.lsa.umich.edu/~faraway/book/

http://www.jeremymiles.co.uk/regressionbook/extras/appendix2/R/

759

• Commands in R• Stage 1: enter data

– colour <- c(1, 0, 1, 0, 1, 0 … 1)– bites <- c(3, 1, 0, 0, … )

• Run analysis– p1 <- glm(bites ~ colour, family = poisson)

• Get results– summary.glm(p1)

760

R Results

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -0.3567 0.1686 -2.115 0.03441 *

colour 0.5555 0.2116 2.625 0.00866 **

• Results for colour– Same as SPSS– For intercept different (weird SPSS)

761

Predicted Values

• Need to get exponential of parameter estimates – Like logistic regression

• Exp(0.555) = 1.74– You are likely to be bitten by a shark

1.74 times more often with a red surfboard

762

Checking Assumptions

• Was it really Poisson distributed?– For Poisson, μ = σ²
• As mean increases, variance should also increase
– Residuals should be random• Overdispersion is a common problem• Too many zeroes
• For blue: σ² = exp(-0.3567) = 1.42
• For red: σ² = exp(-0.3567 + 0.555) = 2.48

763

• Strictly:

p(y | x) = exp(-μ) μ^y / y!

– for each case i, with estimated mean μ̂i:
p(yi | xi) = exp(-μ̂i) μ̂i^yi / yi!

764

Compare Predicted with Actual Distributions

[Bar charts: expected and actual probabilities of 0 to 4 bites, for blue and for red surfboards]

765

Overdispersion
• Problem in Poisson regression
– Too many zeroes
• Causes– χ2 inflation– Standard error deflation
• Hence p-values too low
– Higher type I error rate
• Solution– Negative binomial regression (a sketch follows below)
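A minimal sketch of that solution in R, reusing the bites and colour vectors from the R example above (glm.nb comes from the MASS package, which ships with R):

  library(MASS)
  nb1 <- glm.nb(bites ~ colour)    # negative binomial regression
  summary(nb1)                     # the theta parameter captures the overdispersion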

766

Using R

• R can read an SPSS file – But you have to ask it nicely

• Click Packages menu, Load package, choose “Foreign”

• Click File, Change Dir– Change to the folder that contains

your data

767

More on R• R uses objects

– To place something into an object use <- – X <- Y

• Puts Y into X

• Function is read.spss()– Mydata <- read.spss(“spssfilename.sav”)

• Variables are then referred to as Mydata$VAR1– Note 1: R is case sensitive– Note 2: SPSS variable name in capitals

768

GLM in R

• Command– glm(outcome ~ pred1 + pred2 + … +

predk [,family = familyname])– If no familyname, default is OLS

• Use binomial for logistic, poisson for poisson

• Output is a GLM object– You need to give this a name– my1stglm <- glm(outcome ~ pred1 +

pred2 + … + predk [,family = familyname])
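Putting the pieces together, a minimal sketch (the file name and variable names are made up for illustration):

  library(foreign)
  mydata <- read.spss("myfile.sav", to.data.frame = TRUE)

  my1stglm <- glm(BITES ~ COLOUR, family = poisson, data = mydata)
  summary(my1stglm)
  exp(coef(my1stglm))              # back to the original (count) scale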

769

• Then need to explore the result– summary(my1stglm)

• To explore what it means– Need to plot regressions

• Easiest is to use Excel

770

771

Introducing Structural Equation Modelling

Lesson 15

772

Introduction

• Related to regression analysis– All (OLS) regression can be

considered as a special case of SEM

• Power comes from adding restrictions to the model

• SEM is a system of equations– Estimate those equations

773

Regression as SEM

• Grades example– Grade = constant + books + attend +

error• Looks like a regression equation

– Also– Books correlated with attend– Explicit modelling of error
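For comparison, the same model written as SEM in R's lavaan package (a minimal sketch – lavaan is not one of the programs used in this course, and a data frame grades with GRADE, BOOKS and ATTEND is assumed):

  library(lavaan)
  model <- 'GRADE ~ BOOKS + ATTEND'                   # error term added automatically
  fit <- sem(model, data = grades, fixed.x = FALSE)   # also estimate the BOOKS-ATTEND covariance
  summary(fit, standardized = TRUE)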

774

Path Diagram

• System of equations are usefully represented in a path diagram

x Measured variable

e unmeasured variable

regression

correlation

775

Path Diagram for Regression

[Path diagram: Books and Attend (correlated) with arrows to Grade, and error with an arrow to Grade]
Must usually explicitly model error
Must explicitly model correlation

776

Results

• Unstandardised
[Path diagram, unstandardised estimates: BOOKS (variance 2.00) → GRADE 4.04; ATTEND (variance 17.84) → GRADE 1.28; BOOKS ↔ ATTEND covariance 2.65; e (variance 1.00) → GRADE 13.52]

777

Standardised
[Path diagram, standardised estimates: BOOKS → GRADE .35; ATTEND → GRADE .33; BOOKS ↔ ATTEND .44; e → GRADE .82]

778

Table
                    Estimate   S.E.   C.R.   P      St. Est.
GRADE <-- BOOKS     4.04       1.71   2.36   0.02   0.35
GRADE <-- ATTEND    1.28       0.57   2.25   0.03   0.33
GRADE <-- e         13.52      1.53   8.83   0.00   0.82
GRADE               37.38      7.54   4.96   0.00

Coefficients (dependent variable: GRADE)
              B       Std. Error   Beta   Sig.
(Constant)    37.38   7.74                .00
BOOKS         4.04    1.75         .35    .03
ATTEND        1.28    .59          .33    .04

779

So What Was the Point?

• Regression is a special case• Lots of other cases• Power of SEM

– Power to add restrictions to the model• Restrict parameters

– To zero– To the value of other parameters– To 1

780

Restrictions

• Questions– Is a parameter really necessary?– Are a set of parameters necessary?– Are parameters equal

• Each restriction adds 1 df– Test of model with χ2

781

The χ2 Test

• Can the model proposed have generated the data?– Test of significance of difference of

model and data– Statistically significant result

• Bad

– Theoretically driven• Start with model• Don’t start with data

782

Regression Again

• Both estimates restricted to zero
[Path diagram: BOOKS and ATTEND with their paths to GRADE fixed to zero; error e (0, 1) → GRADE]

783

• Two restrictions– 2 df for χ2 test– χ2 = 15.9, p = 0.0003

• This test is (asymptotically) equivalent to the F test in regression– We still haven’t got any further

784

Multivariate Regression

[Path diagram: x1 and x2 predicting y1, y2 and y3]

785

[Path diagram: x1 and x2, with y1, y2 and y3]
Test of all x’s on all y’s (6 restrictions = 6 df)

786

Test of all x1 on all y’s (3 restrictions)
[Path diagram: the paths from x1 to y1, y2 and y3 removed]

787

[Path diagram: x1 and x2, with y1, y2 and y3]
Test of all x1 on all y1 (3 restrictions)

788

[Path diagram: x1 and x2, with y1, y2 and y3]
Test of all 3 partial correlations between y’s, controlling for x’s (3 restrictions)

789

Path Analysis and SEM

• More complex models – can add more restrictions– E.g. mediator

model

• 1 restriction– No path from

enjoy -> read

[Path diagram: ENJOY → BUY → READ, with error terms e_buy and e_read, and no direct ENJOY → READ path]

790

Result

• χ2 = 10.9, 1 df, p = 0.001• Not a complete mediator

– Additional path is required

791

Multiple Groups

• Same model– Different people

• Equality constraints between groups– Means, correlations, variances,

regression estimates– E.g. males and females

792

Multiple Groups Example

• Age• Severity of psoriasis

– SEVE – in emotional areas• Hands, face, forearm

– SEVNONE – in non-emotional areas– Anxiety– Depression

793

Correlations, SEX = female (n = 110); Sig. (2-tailed) in brackets

            AGE           SEVE          SEVNONE       GHQ_A         GHQ_D
AGE         1             -.270 (.004)  -.248 (.009)  .017 (.859)   .035 (.717)
SEVE        -.270 (.004)  1             .665 (.000)   .045 (.639)   .075 (.436)
SEVNONE     -.248 (.009)  .665 (.000)   1             .109 (.255)   .096 (.316)
GHQ_A       .017 (.859)   .045 (.639)   .109 (.255)   1             .782 (.000)
GHQ_D       .035 (.717)   .075 (.436)   .096 (.316)   .782 (.000)   1

794

Correlations, SEX = male (n = 79); Sig. (2-tailed) in brackets

            AGE           SEVE          SEVNONE       GHQ_A         GHQ_D
AGE         1             -.243 (.031)  -.116 (.310)  -.195 (.085)  -.190 (.094)
SEVE        -.243 (.031)  1             .671 (.000)   .456 (.000)   .453 (.000)
SEVNONE     -.116 (.310)  .671 (.000)   1             .210 (.063)   .232 (.040)
GHQ_A       -.195 (.085)  .456 (.000)   .210 (.063)   1             .800 (.000)
GHQ_D       -.190 (.094)  .453 (.000)   .232 (.040)   .800 (.000)   1

795

Model

[Path diagram: AGE → SEVE and SEVNONE (with errors e_s and e_sn); SEVE and SEVNONE → Dep and Anx (with errors E_d and e_a)]

796

Females
[Path diagram: standardised estimates for females]

797

Males
[Path diagram: standardised estimates for males]

798

Constraint

• sevnone -> dep– Constrained to be equal for males and females
• 1 restriction, 1 df– χ2 = 1.3 – not significant
• 4 restrictions– the 2 severity measures -> anx & dep

799

• 4 restrictions, 4 df– χ2 test, p = 0.014

• Parameters are not equal

800

Missing Data: The big advantage

• SEM programs tend to deal with missing data – Multiple imputation– Full Information (Direct) Maximum

Likelihood• Asymptotically equivalent

• Data can be MAR, not just MCAR

801

Power: A Smaller Advantage

• Power for regression gets tricky with large models

• With SEM power is (relatively) easy– It’s all based on chi-square– Paper B14

802

Lesson 16: Dealing with clustered data & longitudinal

models

803

The Independence Assumption

• In Lesson 8 we talked about independence – The residual of any one case should not tell

you about the residual of any other case

• Particularly problematic when:– Data are clustered on the predictor variable

• E.g. predictor is household size, cases are members of family

• E.g. Predictor is doctor training, outcome is patients of doctor

– Data are longitudinal• Have people measured over time

– It’s the same person!

804

Clusters of Cases

• Problem with cluster (group) randomised studies– Or group effects

• Use Huber-White sandwich estimator– Tell it about the groups– Correction is made– Use complex samples in SPSS

805

Complex Samples

• As with Huber-White for heteroscedasticity– Add a variable that tells it about the clusters– Put it into clusters

• Run GLM– As before

• Warning:– Need about 20 clusters for solutions to be

stable
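One way to make the same correction outside SPSS, as a minimal R sketch (using the sandwich and lmtest packages; a data frame d with cost, arm and the cluster variable week is assumed):

  library(sandwich)
  library(lmtest)
  fit <- lm(cost ~ arm, data = d)
  coeftest(fit, vcov = vcovCL(fit, cluster = ~ week))   # cluster-robust (sandwich) SEs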

806

Example

• People randomised by week to one of two forms of triage– Compare the total cost of treating each

• Ignore clustering– Difference is £2.40 per person, with 95%

confidence intervals £0.58 to £4.22, p =0.010

• Include clustering– Difference is still £2.40, with 95% CIs -£0.85 to £5.65, and p = 0.141.
• Ignoring clustering led to a type I error

807

Longitudinal Research

• For comparing repeated measures– Clusters are people– Can model the

repeated measures over time

• Data are usually short and fat

ID   V1   V2   V3   V4
1    2    3    4    7
2    3    6    8    4
3    2    5    7    5

808

Converting Data

• Change data to tall and thin

• Use Data, Restructure in SPSS

• Clusters are ID

ID   V   X
1    1   2
1    2   3
1    3   4
1    4   7
2    1   3
2    2   6
2    3   8
2    4   4
3    1   2
3    2   5
3    3   7
3    4   5

809

(Simple) Example

• Use employee data.sav– Compare beginning salary and salary– Would normally use paired samples t-

test

• Difference = $17,403, 95% CIs $16,427.407, $18,379.555

810

Restructure the Data

• Do it again– With data tall and thin

• Complex GLM with Time as factor– ID as cluster

• Difference = $17,403, 95% CIs = $16,427.407, $18,379.555

ID   Time   Cash
1    1      $18,750
1    2      $21,450
2    1      $12,000
2    2      $21,900
3    1      $13,200
3    2      $45,000

811

Interesting …

• That wasn’t very interesting– What is more interesting is when we

have multiple measurements of the same people

• Can plot and assess trajectories over time

812

Single Person Trajectory

[Graph: one person’s measurements plotted against time]

813

Multiple Trajectories: What’s the Mean and SD?

[Graph: many individual trajectories plotted against time]

814

Complex Trajectories

• An event occurs– Can have two effects:– A jump in the value– A change in the slope

• Event doesn’t have to happen at the same time for each person– Doesn’t have to happen at all

815

[Diagram: trajectory with slope 1; when the event occurs there is a jump, then slope 2]

816

Parameterising

Time   Event   Time2   Outcome
1      0       0       12
2      0       0       13
3      0       0       14
4      0       0       15
5      0       0       16
6      1       0       10
7      1       1       9
8      1       2       8
9      1       3       7

817

Draw the Line

What are the parameter estimates?
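One way to answer that is a minimal R sketch, fitting the parameterisation in the table above with an ordinary regression:

  time    <- 1:9
  event   <- c(0, 0, 0, 0, 0, 1, 1, 1, 1)
  time2   <- c(0, 0, 0, 0, 0, 0, 1, 2, 3)
  outcome <- c(12, 13, 14, 15, 16, 10, 9, 8, 7)
  coef(lm(outcome ~ time + event + time2))
  # intercept 11, slope 1 before the event, a jump of -7 at the event,
  # and a change of -2 in the slope afterwards (so slope 2 is -1)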

818

Main Effects and Interactions

• Main effects – Intercept differences

• Moderator effects– Slope differences

819

Multilevel Models

• Fixed versus random effects– Fixed effects are fixed across

individuals (or clusters)– Random effects have variance

• Levels– Level 1 – individual measurement

occasions– Level 2 – higher order clusters

820

More on Levels• NHS direct study

– Level 1 units: …………….– Level 2 units: ……………

• Widowhood food study– Level 1 units ……………– Level 2 units ……………

821

More Flexibility

• Three levels:– Level 1: measurements– Level 2: people– Level 3: schools

822

More Effects

• Variances and covariances of effects

• Level 1 and level 2 residuals– Makes R2 difficult to talk about

• Outcome variable– Yij

• The score of the ith person in the jth group

823

Y      i    j
2.3    1    1
3.2    2    1
4.5    3    1
4.8    1    2
7.2    2    2
3.1    3    2
1.6    4    2

824

Notation• Notation gets a bit horrid

– Varies a lot between books and programs

• We used to have b0 and b1

– If fixed, that’s fine– If random, each person has their own

intercept and slope

825

Standard Errors

• Intercept has standard errors• Slopes have standard errors• Random effects have variances

– Those variances have standard errors• Is there statistically significant variation

between higher level units (people)?• OR• Is everyone the same?

826

Programs

• Since version 12– Can do this in SPSS– Can’t do anything really clever

• Menus– Completely unusable– Have to use syntax

827

SPSS Syntax

MIXED relfd with time
  /fixed = time
  /random = intercept time | subject(id) covtype(un)
  /print = solution.
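The same model in R's lme4 package, as a minimal sketch (an alternative to MIXED, not part of the course; a data frame d in tall-and-thin form with relfd, time and id is assumed):

  library(lme4)
  m1 <- lmer(relfd ~ time + (1 + time | id), data = d)   # random intercept and slope,
  summary(m1)                                            # with an unstructured covariance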

828

SPSS Syntax

• MIXED• relfd with time

OutcomeContinuous

predictor

829

SPSS Syntax

• MIXED• relfd with time• /fixed = time

Must specify effect as fixed first

830

SPSS Syntax

• MIXED• relfd with time• /fixed = time• /random = intercept time |

subject (id) covtype(un)Specify random

effects

Intercept and time are random

SPSS assumes that your level 2 units are subjects, and needs to know the id

variable

831

SPSS Syntax

• MIXED
• relfd with time
• /fixed = time
• /random = intercept time | subject(id) covtype(un)

Covariance matrix of random effects is unstructured. (Alternatives: id – identity, or vc – variance components.)

832

SPSS Syntax

• MIXED• relfd with time• fixed = time• /random = intercept time |

subject (id) covtype(un)• /print = solution.

Print the answer

833

The Output
• Information criteria– We’ll come back

Information Criteria (dependent variable: relfd)
-2 Restricted Log Likelihood             64899.758
Akaike's Information Criterion (AIC)     64907.758
Hurvich and Tsai's Criterion (AICC)      64907.763
Bozdogan's Criterion (CAIC)              64940.134
Schwarz's Bayesian Criterion (BIC)       64936.134

The information criteria are displayed in smaller-is-better forms.

834

Fixed Effects

• Not useful here, useful for interactions

Type III Tests of Fixed Effects (dependent variable: relfd)

Source      Numerator df   Denominator df   F          Sig.
Intercept   1              741              3251.877   .000
time        1              741.000          2.550      .111

835

Estimates of Fixed Effects

• Interpreted as regression equation

Estimates of Fixed Effects (dependent variable: relfd)

Parameter   Estimate   Std. Error   t        Sig.   95% CI Lower   95% CI Upper
Intercept   21.90      .38          57.025   .000   21.15          22.66
time        -.06       .04          -1.597   .111   -.14           .01

836

Covariance Parameters

Estimates of Covariance Parameters (dependent variable: relfd)

Parameter                            Estimate   Std. Error
Residual                             64.11577   1.0526353
Intercept + time [subject = id]
  UN (1,1)                           85.16791   5.7003732
  UN (2,1)                           -4.53179   .5067146
  UN (2,2)                           .7678319   .0636116

837

Change Covtype to VC

• We know that this is wrong– The covariance of the effects was

statistically significant– Can also see if it was wrong by

comparing information criteria• We have removed a parameter from

the model– Model is worse– Model is more parsimonious

• Is it much worse, given the increase in parsimony?

838

Information Criteria (smaller-is-better forms)

                                         UN Model      VC Model
-2 Restricted Log Likelihood             64899.758     65041.891
Akaike's Information Criterion (AIC)     64907.758     65047.891
Hurvich and Tsai's Criterion (AICC)      64907.763     65047.894
Bozdogan's Criterion (CAIC)              64940.134     65072.173
Schwarz's Bayesian Criterion (BIC)       64936.134     65069.173

Lower is better.

839

Adding Bits

• So far, all a bit dull• We want some more predictors, to

make it more exciting– E.g. female– Add: relfd with time female /fixed = time female time*female

• What does the interaction term represent?

840

Extending Models

• Models can be extended– Any kind of regression can be used

• Logistic, multinomial, Poisson, etc

– More levels• Children within classes within schools• Measures within people within classes within prisons

– Multiple membership / cross classified models• Children within households and classes, but

households not nested within class

• Need a different program– E.g. MlwiN

841

MlwiN Example (very quickly)

842

Books

Singer, JD and Willett, JB (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, Oxford University Press.

Examples at:http://www.ats.ucla.edu/stat/SPSS/examples/alda/default.htm

843

The End
