a note on regression analysis

8/2/2019 A Note on Regression Analysis


A NOTE ON REGRESSION ANALYSIS

This note is a step-by-step guide to running a regression analysis in SPSS. It is intended to help with your projects; it does not explain the fundamentals of regression analysis.

A) How to do Multivariate Regression Analysis with only quantitative variables?

A multivariate regression analysis (more precisely, a multiple regression) is one where we have more than one independent variable and one dependent variable. The general form of the model can be written as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$

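where $X_1, X_2, \dots, X_n$ are the independent variables, $Y$ is the dependent variable, $\beta_0, \beta_1, \dots, \beta_n$ are the coefficients to be estimated and $\varepsilon$ is the error term.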

Step 1: Whenever you get a data set for regression analysis, the first step is to draw a scatter plot of the dependent variable against each independent variable. In other words, put the dependent variable on the Y-axis and $X_1, X_2, X_3, \dots, X_n$ (one at a time) on the X-axis. The scatter plot tells you the type of relationship to expect. If the relationship is linear, you are more likely, though not certain, to get significant results. If the relationship is not linear, consider adding transformed or higher-order terms of the independent variable (for example $X^2$, $X^3$ or $\ln X$).
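Outside SPSS, the effect of a transformation can be checked numerically as well as visually. The sketch below is only an illustration in Python with made-up data: `pearson_r` is a helper written for this example, and the data are constructed so that Y depends on the square of X. It compares how closely Y follows a straight line in X and in $X^2$.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y grows with the square of x, so the raw scatter is curved.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 * v ** 2 + 1 for v in x]

r_raw = pearson_r(x, y)                             # straight line through a curved pattern
r_transformed = pearson_r([v ** 2 for v in x], y)   # after replacing X with X^2

print(f"r(Y, X)   = {r_raw:.3f}")
print(f"r(Y, X^2) = {r_transformed:.3f}")
```

A correlation closer to 1 after the transformation suggests that the transformed term describes the relationship better than the raw variable. It does not replace looking at the scatter plot.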

Step 2: Next, run the descriptive statistics available in SPSS. Find the mean, standard deviation, variance and sample size of each variable. Note which independent variable has the highest standard deviation; this may help when you look for outliers later. Check the sample size of each variable to see whether any data are missing. If data are missing, either remove the affected observations entirely or do a missing value analysis (for example, replacing each missing value with the variable's mean).
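The same checks can be sketched in Python for readers who want to see what the SPSS output summarises. The variable names and numbers below are hypothetical, and `None` is used here to mark a missing observation. The sketch shows both ways of handling missing data mentioned above.

```python
import statistics

# Hypothetical sample; None marks a missing observation.
data = {
    "price": [10.0, 12.0, None, 11.0, 13.0],
    "advertising": [100.0, 150.0, 120.0, 400.0, 130.0],
}

summary = {}
for name, values in data.items():
    present = [v for v in values if v is not None]
    summary[name] = {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),      # sample standard deviation
        "variance": statistics.variance(present),  # sample variance
    }

# Option 1: remove every observation (row) that has a missing value.
rows = list(zip(*data.values()))
complete_rows = [row for row in rows if None not in row]

# Option 2: replace each missing value with the mean of the observed values.
price_mean = summary["price"]["mean"]
price_filled = [price_mean if v is None else v for v in data["price"]]

for name, stats in summary.items():
    print(name, stats)
```

Removing rows keeps only observed data but shrinks the sample, while mean replacement keeps the sample size at the cost of understating the variable's spread.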

Step 3: Run the correlation matrix, with significance values, for the independent variables in SPSS. If the correlations are not significant, you are lucky! Real-life data, however, usually show significant correlations. If the correlations among the independent variables are significant, you may have a multicollinearity problem. As a business manager, you need to decide what level of correlation you are comfortable with. As a rule of thumb, correlations below 0.40 are acceptable, but you are the best judge. Another way to assess the correlations is the Variance Inflation Factor (VIF), which is covered in Step 6.
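As a rough illustration of the 0.40 rule of thumb, the Python sketch below computes the pairwise correlations among three hypothetical independent variables and flags the pairs at or above the cutoff. The data and the helper `pearson_r` are assumptions made for this example. The sketch does not compute significance values, which SPSS reports alongside each correlation.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical independent variables.
predictors = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 4, 5, 8, 10, 13],   # moves closely with X1
    "X3": [5, 1, 4, 2, 6, 3],
}

THRESHOLD = 0.40  # rule-of-thumb cutoff from the text

flagged = []
for a, b in combinations(predictors, 2):
    r = pearson_r(predictors[a], predictors[b])
    if abs(r) >= THRESHOLD:
        flagged.append((a, b))
    print(f"r({a}, {b}) = {r:.3f}")

print("Pairs to review for multicollinearity:", flagged)
```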

Step 4: Run the regression analysis in SPSS. The first statistic to look at is the F-statistic. It is significant if its significance value is less than 0.05. If the F-statistic is not significant, the model as specified cannot be used, and you have to either modify the model or look for errors in your raw data. If the F-statistic is significant, look at the adjusted $R^2$. This value tells you the proportion of the variation in your dependent variable that is explained by your independent variables, adjusted for the number of independent variables. Normally we expect a high adjusted $R^2$, but what counts as high depends on the type of data. If your data are cross-sectional, i.e. time is not one of the independent variables, an adjusted $R^2$ of 0.30 is acceptable. If your data are time series, i.e. time is one of the independent variables or one of them is related to time, the adjusted $R^2$ should be more than 0.70. Also look at the standard error of the estimate, which should be low.
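To make these quantities concrete, here is a minimal Python sketch of how the F-statistic, adjusted $R^2$ and standard error of the estimate follow from the observed and fitted values. The function name `fit_summary` and the data are hypothetical, and the fitted values are assumed to come from an ordinary least squares model with an intercept and k independent variables. The significance value of F is not computed here; SPSS reports it directly.

```python
import math

def fit_summary(y, y_hat, k):
    """Fit statistics for an OLS model with an intercept and k independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    ss_resid = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_model = ss_total - ss_resid   # holds for OLS fits that include an intercept
    df_resid = n - k - 1
    r2 = 1 - ss_resid / ss_total
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df_resid,
        "f_stat": (ss_model / k) / (ss_resid / df_resid),
        "std_error_estimate": math.sqrt(ss_resid / df_resid),
    }

# Hypothetical data with one independent variable; the OLS line is y_hat = 2.2 + 0.6 * x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * v for v in x]

stats = fit_summary(y, y_hat, k=1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Note how the adjusted $R^2$ is lower than the plain $R^2$: the adjustment penalises each additional independent variable, which matters most in small samples.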

Step 5: Now look at the table of estimated coefficients. The first thing to check is the significance of each t-statistic. The significance value should be less than or equal to 0.05 for a coefficient to be significant and kept in the model. The estimated value of each coefficient is given in the column titled Unstandardized Coefficients.
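For intuition about where the t-statistic comes from, the sketch below works through the simplest case of a single independent variable in Python, using hypothetical data. The t-statistic is the estimated coefficient divided by its standard error. Instead of a significance value, the sketch compares it with the two-sided 5% critical value from a t-table, which is an equivalent check. With several independent variables the standard errors come from the full model, and SPSS computes them for you.

```python
import math

# Hypothetical data with one independent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                       # estimated coefficient of x
intercept = mean_y - slope * mean_x     # estimated constant term

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
resid_variance = sum(e ** 2 for e in residuals) / (n - 2)

se_slope = math.sqrt(resid_variance / sxx)   # standard error of the slope
t_stat = slope / se_slope

# Two-sided 5% critical value of t with n - 2 = 3 degrees of freedom (from a t-table).
T_CRITICAL = 3.182
significant = abs(t_stat) > T_CRITICAL

print(f"coefficient = {slope:.3f}, t = {t_stat:.3f}, significant at 5%: {significant}")
```

In this small made-up sample the coefficient is not significant, so by the rule above it would not be kept in the model.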

Step 6: Remember that we looked at the correlation coefficients of the independent variables and their significance in Step 3. To check whether the level of collinearity is acceptable, we run the VIF statistic. In SPSS it is available under Analyze, Regression, Linear. Once the regression dialog box is open, click Statistics. A window opens with options such as regression coefficients. Tick Collinearity diagnostics and click Continue. The output will show the VIF in the table of estimated coefficients. A VIF below 10 is considered acceptable, with no multicollinearity problem.
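The VIF of an independent variable is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that variable on all the other independent variables. The Python sketch below covers only the special case of two independent variables, where $R_j^2$ is simply the squared correlation between them. The data and the helper `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two hypothetical independent variables that move closely together.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 5, 8, 10, 13]

r = pearson_r(x1, x2)

# With exactly two independent variables, both share the same VIF.
vif = 1 / (1 - r ** 2)

VIF_LIMIT = 10  # cutoff used in the text
print(f"r = {r:.3f}, VIF = {vif:.1f}, acceptable: {vif < VIF_LIMIT}")
```

Here the VIF is far above 10, so one of the two variables would normally be dropped or the two combined before rerunning the regression.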

Step 7: If your F-statistic is not significant, you can try some quick fixes. Go back to the scatter plots you drew in Step 1 and check for outliers. A good starting point is the independent variable with the highest standard deviation. Remove the outliers if your sample is large and you do not mind losing a few observations, then rerun the regression and see whether the F-statistic is significant. If your sample is small, look at the scatter plot to see the type of relationship. If it is not linear, transform the independent variable and rerun the regression to see whether you get significant results.
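The note identifies outliers by eye from the scatter plot. If you prefer a numerical screen, one common convention is to flag values that lie more than a chosen number of standard deviations from the mean. The Python sketch below uses a cutoff of 2, which is an assumption made for this example and not a rule from this note; the function name and the data are hypothetical too.

```python
import statistics

def flag_outliers(values, z_cutoff=2.0):
    """Return the values lying more than z_cutoff sample standard deviations from the mean."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return [v for v in values if abs(v - centre) / spread > z_cutoff]

# Hypothetical independent variable with one extreme observation.
advertising = [10, 12, 11, 13, 12, 11, 95]

outliers = flag_outliers(advertising)
trimmed = [v for v in advertising if v not in outliers]

print("Flagged:", outliers)
print("Remaining observations:", trimmed)
```

When removing an outlier, remember to drop the whole observation (the matching values of Y and of every other independent variable) before rerunning the regression.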

EVEN AFTER ALL THIS, YOU MIGHT STILL NOT GET SIGNIFICANT RESULTS, WHICH MEANS YOUR INDEPENDENT VARIABLES DO NOT EXPLAIN THE VARIATION IN THE DEPENDENT VARIABLE.

B) How to do regression analysis with qualitative variables?

We can also include categorical (qualitative) variables in a regression analysis. Examples are the sex of an individual (male/female) or the geographical region of a country (east, west, north, south). To include such variables in the model, we introduce a new type of variable called a dummy variable.

Dummy variables are most commonly coded as 0 and 1. Any two distinct numbers could be used, but 0 and 1 make the coefficients easy to interpret. The purpose of a dummy variable is to tell SPSS that the observations coded 0 belong to one category and differ from those coded 1. For example, the sex of an individual can be either male or female, so if we represent males with 0, then females are represented with 1. Similarly, if there are four categories, say east, west, south and north, we can represent them with three dummy variables $X_1$, $X_2$ and $X_3$, as shown in the table of dummy variables at the end of this note.

In that table, East is represented as 100 while South is represented as 000. The category that takes the value 0 on all the dummy variables is called the reference dummy. So for n categories, we would have (n-1) dummy variables.
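The coding scheme can be sketched in a few lines of Python. The function `dummy_code` and the sample data are hypothetical; the sketch creates one 0/1 column for every category except the reference, so four regions give three dummy variables.

```python
def dummy_code(values, reference):
    """Return (column_names, rows) with one 0/1 column per non-reference category."""
    # dict.fromkeys keeps the categories in order of first appearance.
    categories = [c for c in dict.fromkeys(values) if c != reference]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical observations of a region variable; South is the reference category.
regions = ["East", "West", "North", "South", "East"]
columns, coded = dummy_code(regions, reference="South")

print(columns)
for region, row in zip(regions, coded):
    print(f"{region:<6}", row)
```

Every South observation is coded 0 on all three columns, which is what makes it the reference against which the other regions are compared.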

Run the regression analysis in the same way as discussed for quantitative variables. However, the coefficients in a dummy variable regression mean something different from those of quantitative variables. The mean value of the dependent variable for the category represented by the reference dummy, $\bar{x}_0$, is given by $\beta_0$.

$$\bar{x}_0 = \beta_0$$

Similarly, the mean value of the category represented by dummy variable $X_1$ is given by $\beta_0 + \beta_1$.

$$\bar{x}_1 = \beta_0 + \beta_1$$

$$\beta_1 = \bar{x}_1 - \beta_0$$

$$\beta_1 = \bar{x}_1 - \bar{x}_0$$

From the regression coefficient table, if we find that $\beta_1$ is significant and positive, we can deduce that the mean value of the category represented by dummy variable $X_1$ is higher than that of the reference category $X_0$. Similarly, we can find the mean values of the other categories and compare them.
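A small numerical check may help here. The Python sketch below uses made-up sales figures for the four regions and assumes the dummy variables are the only independent variables in the model. In that case the least squares coefficients are exactly the group means and their differences, which the sketch confirms by checking that the residuals sum to zero within every category.

```python
from statistics import mean

# Hypothetical sales figures by region; South is the reference category.
sales = {
    "East": [12.0, 14.0, 13.0],
    "West": [9.0, 11.0],
    "North": [15.0, 17.0, 16.0],
    "South": [10.0, 10.0, 13.0],
}
reference = "South"

# Constant term: mean of the reference category.
beta_0 = mean(sales[reference])

# Each dummy coefficient: that category's mean minus the reference mean.
betas = {
    region: mean(values) - beta_0
    for region, values in sales.items()
    if region != reference
}

# Least squares check: residuals must sum to zero within every category.
fitted = {region: beta_0 + betas.get(region, 0.0) for region in sales}
residual_sums = {
    region: sum(v - fitted[region] for v in values)
    for region, values in sales.items()
}

print("beta_0 =", beta_0)
print("dummy coefficients:", betas)
```

A positive coefficient (East, North) means that region's mean is above the South mean, and a negative one (West) means it is below. Whether the difference is significant still has to be read from the t-statistics in the SPSS output.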

Dummy variables   X1   X2   X3
East               1    0    0
West               0    1    0
North              0    0    1
South              0    0    0   (reference dummy)
