
Page 1: Linear Regression

1 Oct 2010, CPSY501
Dr. Sean Ho, Trinity Western University

Please download from “Example Datasets”:
Record2.sav, ExamAnxiety.sav

Page 2: Outline for today

Correlation and Partial Correlation
Linear Regression (Ordinary Least Squares)
Using Regression in Data Analysis
Requirements: Variables
Requirements: Sample Size
Assignments & Projects

Page 3: Causation from correlation?

Ancient proverb: “correlation ≠ causation”
But sometimes correlation can suggest causality, in the right situations!
We need more than 2 variables, and what we find is not the “ultimate” (only, direct) cause, but the degree of influence one variable has on another (“causal relationship”, “causality”).
Is a significant fraction of the variability in one variable explained by the other variable(s)?

Page 4: How to get causality?

Use theory and prior empirical evidence, e.g., temporal sequencing: T2 can't cause T1.
Order of causation? (e.g., gender & career aspirations)
Best is a controlled experiment:
Randomly assign participants to groups
Change the level of the IV for each group
Observe the impact on the DV
But this is not always feasible/ethical!
Still need to “rule out” other potential factors.

Page 5: Potential hidden factors

What other variables might explain these correlations?
Height and hair length: e.g., both are related to gender
Increased time in bed and suicidal thoughts: e.g., depression can cause both
# of pastors and # of prostitutes (!): e.g., city size
To find causality, we must rule out hidden factors, e.g., cigarettes and lung cancer.

Page 6: Example: immigration study

[Path diagram with two time points: at Immigration, the variables Level of Acculturation, Linguistic Ability, and Psychological Well-being; at 1-Year Follow-up, Linguistic Ability and Psychological Well-being.]

Page 7: Partial Correlation

Purpose: to measure the unique relationship between two variables, after the effects of other variables are “controlled for”.
Two variables may have a large correlation, but it may be largely due to the effects of other moderating variables.
So we must factor out their influence, and see what correlation remains (the partial correlation).
SPSS's algorithm assumes parametricity; non-parametric methods do exist, though.
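To make “factoring out” concrete, here is a minimal Python sketch on simulated data (the variable names x, y, z are invented for illustration). It computes a partial correlation by correlating the residuals that remain after regressing each variable on the control:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
z = rng.normal(size=200)           # moderating variable to control for
x = 2 * z + rng.normal(size=200)   # x is partly driven by z
y = 3 * z + rng.normal(size=200)   # y is partly driven by z

def residuals(a, b):
    """Residuals of a after removing the linear effect of b."""
    fit = stats.linregress(b, a)
    return a - (fit.intercept + fit.slope * b)

r_zero, _ = stats.pearsonr(x, y)                               # 0-order r
r_partial, p = stats.pearsonr(residuals(x, z), residuals(y, z))
print(f"zero-order r = {r_zero:.2f}, partial r = {r_partial:.2f}, p = {p:.3f}")
```

The zero-order correlation is large only because z drives both variables; once z is held constant, the partial correlation drops toward zero.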

Page 8: Visualizing Partial Correlation

[Diagram: Variable 1 (e.g., pastors) and Variable 2 (e.g., prostitutes) are linked by the partial correlation; a moderating variable (e.g., city size) influences both, and the direction of causation is marked with a “?”.]

Page 9: Practise: Partial Correlation

Dataset: Record2.sav

Analyze → Correlate → Partial
Variables: adverts, sales
Controlling for: airplay, attract

Or: Analyze → Regression → Linear:
Dependent: sales; Independent: all other variables
Statistics → “Part and partial correlations”; uncheck everything else
This gives all partial r values, plus the regular (0-order) r.
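If you want to cross-check the SPSS output, the third-party pingouin library offers a partial correlation function; a sketch, assuming Record2.sav is readable and uses the column names shown on this slide:

```python
import pandas as pd
import pingouin as pg  # third-party: pip install pingouin

df = pd.read_spss("Record2.sav")  # reading .sav requires pyreadstat

# Partial correlation of adverts and sales,
# controlling for airplay and attract (as on the slide).
result = pg.partial_corr(data=df, x="adverts", y="sales",
                         covar=["airplay", "attract"])
print(result)  # n, partial r, 95% CI, p-value
```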

Page 10: Linear Regression

When we use correlation to try to infer a direction of causation, we call it regression: combining the influence of a number of variables (predictors) to determine their total effect on another variable (outcome).

Page 11: Simple Regression: 1 predictor

Simple regression is predicting scores on an outcome variable from a single predictor variable.
It is mathematically equivalent to bivariate correlation.
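A quick numerical check of that equivalence, as a sketch on simulated data: in simple regression the standardized slope equals Pearson's r, so the model's R² is just r².

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

fit = stats.linregress(x, y)
r, _ = stats.pearsonr(x, y)

# Standardized slope: b1 * sd(x) / sd(y), which equals Pearson's r
beta = fit.slope * x.std(ddof=1) / y.std(ddof=1)
print(f"standardized slope = {beta:.4f}, Pearson r = {r:.4f}")
print(f"model R^2 = {fit.rvalue**2:.4f}, r^2 = {r**2:.4f}")
```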

Page 12: OLS Regression Model

In Ordinary Least-Squares (OLS) regression, the “best” model is defined as the line that minimizes the error between model and data (the sum of squared differences).

Conceptual description of the regression line (General Linear Model):

Y = b0 + b1X1 + (b2X2 + … + bnXn) + ε

where Y is the outcome, b0 the intercept, b1 the gradient (slope), X1 the predictor, and ε the error.
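To make the least-squares criterion concrete, a minimal sketch on simulated data: np.linalg.lstsq finds the b0 and b1 that minimize the sum of squared differences between model and data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1000, size=50)
y = 134.0 + 0.1 * x + rng.normal(scale=20.0, size=50)  # true line plus error

# Design matrix: a column of ones (for the intercept b0) next to the predictor.
X = np.column_stack([np.ones_like(x), x])

# OLS: choose b to minimize sum((y - X @ b)**2).
b, sse, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept b0 = {b[0]:.1f}, slope b1 = {b[1]:.3f}")
```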

Page 13: Assessing Fit of a Model

R²: proportion of variance in the outcome accounted for by the predictors: SSmodel / SStotal.
This is a generalization of r² from correlation.
F ratio: variance attributable to the model, divided by variance unexplained by the model.
The F ratio converts to a p-value, which shows whether the “fit” is good.
If fit is poor, the predictors may not be linearly related to the outcome variable.
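A sketch of both fit statistics computed from first principles (y is the observed outcome, yhat the model's predictions, k the number of predictors; the names are illustrative):

```python
import numpy as np
from scipy import stats

def fit_statistics(y, yhat, k):
    """R^2 and F ratio for a regression model with k predictors."""
    n = len(y)
    ss_total = np.sum((y - np.mean(y)) ** 2)   # total variability
    ss_resid = np.sum((y - yhat) ** 2)         # unexplained by the model
    ss_model = ss_total - ss_resid             # explained by the model

    r2 = ss_model / ss_total
    # F: explained variance per model df over unexplained per residual df
    f = (ss_model / k) / (ss_resid / (n - k - 1))
    p = stats.f.sf(f, k, n - k - 1)            # upper-tail p-value
    return r2, f, p
```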

Page 14: Example: Record Sales

Dataset: Record2.sav
Outcome variable: Record sales
Predictor: Advertising Budget

Analyze → Regression → Linear
Dependent: sales; Independent: adverts
Statistics → Estimates, Model fit, Part and partial correlations
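The same analysis outside SPSS, as a statsmodels sketch (the column names sales and adverts are taken from this slide):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("Record2.sav")  # reading .sav requires pyreadstat

# Simple OLS regression of record sales on advertising budget.
model = smf.ols("sales ~ adverts", data=df).fit()
print(model.summary())  # R-squared, F, coefficients, significance
```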

Page 15: Record2: Interpreting Results

Model Summary: R Square = .335; adverts explains 33.5% of the variability in sales.
ANOVA: F = 99.587 and Sig. = .000, so there is a significant effect.
Report: R² = .335, F(1, 198) = 99.587, p < .001.
Coefficients: (Constant) B = 134.140 (the Y-intercept).
Unstandardized adverts B = .096 (slope).
Standardized adverts Beta = .578 (uses variables converted to z-scores).
Linear model: Ŷ = 134.14 + .096 × adverts, or in z-scores: ẑsales = .578 × zadverts.


Page 17: General Linear Model

Actually, nearly all parametric methods can be expressed as generalizations of regression!

Multiple Regression: several scale predictors

Logistic Regression: categorical outcome

ANOVA: categorical IV
t-test: dichotomous IV
ANCOVA: mix of categorical + scale IVs
Factorial ANOVA: several categorical IVs

Log-linear: categorical IVs and DV

We'll talk about each of these in detail later

Page 18: Regression Modelling Process

1. Develop RQ: IVs/DVs, metrics; calculate required sample size & collect data
2. Clean: data entry errors, missing data, outliers
3. Explore: assess requirements, transform if needed
4. Build model: try several models, see what fits
5. Test model, “diagnostic” issues: multivariate outliers, overly influential cases
6. Test model, “generalizability” issues: regression assumptions; rebuild model as needed

Page 19: Selecting Variables

According to your model or theory, what variables might relate to your outcomes?
Does the literature suggest important variables?
Do the variables meet all the requirements for an OLS multiple regression? (more on this next week)

Record sales example:
DV: what is a possible outcome variable?
IV: what are possible predictors, and why?

Page 20: Choosing Good Predictors

It's tempting to just throw in hundreds of predictors and see which ones contribute most.
Don't do this! There are requirements on how the predictors interact with each other!
Also, more predictors → less power.
Have a theoretical/prior justification for them.

Example: what's a DV you are interested in?
Come up with as many possible good IVs as you can, and have a justification for each:
background, internal personal, and current external environment.

Page 21: Using Derived Variables

You may want to use derived variables in regression (see the sketch below), for example:

Transformed variables (to satisfy assumptions)
Interaction terms (“moderating” variables), e.g., Airplay × Advertising Budget
Dummy variables, e.g., coding for categorical predictors
Curvilinear variables (non-linear regression), e.g., looking at X² instead of X
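A minimal pandas sketch of constructing the first, second, and fourth kinds (dummy coding is sketched under Categorical Predictors below); the adverts and airplay column names follow the Record2 example:

```python
import numpy as np
import pandas as pd

df = pd.read_spss("Record2.sav")  # reading .sav requires pyreadstat

# Transformed variable, e.g., a log-transform to satisfy assumptions
df["log_adverts"] = np.log(df["adverts"] + 1)

# Interaction term: airplay moderating the effect of advertising budget
df["airplay_x_adverts"] = df["airplay"] * df["adverts"]

# Curvilinear term: X squared instead of X
df["adverts_sq"] = df["adverts"] ** 2
```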

Page 22: Required Sample Size

Depends on effect size and # of predictors.
Use G*Power to find the exact sample size; rough estimates are on pp. 172-174 of Field.

Consequences of insufficient sample size:
The regression model may be overly influenced by individual participants (not generalizable).
“Real” effects of moderate size can't be detected.

Solutions:
Collect more data from more participants!
Reduce the number of predictors in the model.

Page 23: Requirements on DV

Must be interval/continuous.
If not: the mathematics simply will not work.
Solutions:
Categorical DV: use Logistic Regression
Ordinal DV: use Ordinal Regression, or possibly convert into categorical form

Independence of scores (research design).
If not: invalid conclusions.
Solutions: redesign the data set, or use multi-level modelling instead of regression.

Page 24: Requirements on DV, cont.

Normal (use normality tests).
If not: significance tests may be misleading.
Solutions: check for outliers, transform data, use caution in interpreting significance.

Unbounded distribution (obtained range of responses versus possible range of responses).
If not: artificially deflated R².
Solutions:
Get more data in the missing range
Use a more sensitive instrument

[Chart: distribution of responses on a 0-10 scale with anchors Strongly Disagree, Disagree, None, Agree, Strongly Agree, illustrating a restricted response range.]

Page 25: Requirements on IV

Scale-level.
Can be categorical, too (see next page).
If ordinal: either threshold into categorical form, or treat as if scale (only if it has a sufficient number of ranks).

Full range of values.
If an IV only covers 1-3 on a scale of 1-10, then the regression model will predict poorly for values beyond 3.

More requirements on the behaviour of IVs with each other and the DV (covered next week).

Page 26: Categorical Predictors

Regression can work for categorical predictors:

If dichotomous: code as 0 and 1.
e.g., 1 dichotomous predictor, 1 scale DV: regression is equivalent to a t-test!
And the slope B1 is the difference of means.

If n categories: use “dummy” variables.
Choose a base category.
Create n-1 dichotomous variables.
e.g., {BC, AB, SK}: the dummies are isAB and isSK (BC is the base category).
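A pandas sketch of this dummy coding, using the slide's {BC, AB, SK} example (the data values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "province": ["BC", "AB", "SK", "BC", "SK", "AB"],
    "outcome":  [10.0, 12.5, 9.0, 11.0, 8.5, 13.0],
})

# Base category: BC.  n = 3 categories -> n - 1 = 2 dummy variables.
df["isAB"] = (df["province"] == "AB").astype(int)
df["isSK"] = (df["province"] == "SK").astype(int)

# A case with isAB = 0 and isSK = 0 belongs to the base category, BC.
print(df)
```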