a note on regression analysis

Upload: alok-sinha

Post on 05-Apr-2018

  • 8/2/2019 A Note on Regression Analysis

    A NOTE ON REGRESSION ANALYSIS

    This note gives a step-by-step approach to running a regression analysis in SPSS. It is meant to
    help with your project work; it does not attempt to explain the fundamentals of regression
    analysis.

    A) How to do Multivariate Regression Analysis with only quantitative variables?

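    A multivariate regression analysis (more precisely, a multiple regression) is one with more than
    one independent variable and a single dependent variable. Its general form can be written as

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$

    where $X_1, X_2, \dots, X_n$ are the independent variables, $Y$ is the dependent variable,
    $\beta_0, \beta_1, \dots, \beta_n$ are the coefficients to be estimated and $\varepsilon$ is the
    error term.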
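
    As an illustration of what SPSS estimates for this equation, the sketch below fits the
    coefficients by ordinary least squares using only the Python standard library. The function
    names (solve_linear_system, fit_ols) and the data are hypothetical, made up for this example;
    this is not how SPSS itself is implemented.

```python
# Sketch: estimate b0, b1, ..., bn in Y = b0 + b1*X1 + ... + bn*Xn + e
# by ordinary least squares (normal equations), standard library only.

def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bn] for predictor rows and responses y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_ols(rows, y)
print([round(b, 6) for b in coefficients])
```

    Because the data contain no error term, the estimates recover the intercept and the two slopes
    used to generate them (up to rounding).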

    Step 1: Whenever you get a data set for regression analysis, the first step is to draw a scatter
    plot of the dependent variable against each independent variable. In other words, put the
    dependent variable on the Y-axis and X1, X2, X3, ..., Xn (one at a time) on the X-axis. The scatter
    plot shows the type of relationship you can expect. If the relationship is linear, you are more
    likely, though not certain, to get significant results. If it is not linear, consider adding
    transformed or higher-order terms of the independent variable (for example $X^2$, $X^3$ or
    $\ln X$).
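
    A numerical companion to the scatter plot is to compare how linear the relationship looks
    before and after a transformation. The sketch below does this with the Pearson correlation; the
    helper pearson_r and the data are hypothetical, and the check supplements the plot rather than
    replacing it.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data in which Y is an exact linear function of ln(X).
x = [1, 2, 4, 8, 16, 32, 64]
y = [3 + 2 * math.log(v) for v in x]

r_raw = pearson_r(x, y)                         # Y against X
r_log = pearson_r([math.log(v) for v in x], y)  # Y against ln(X)
print(round(r_raw, 3), round(r_log, 3))
```

    Here the correlation with ln(X) is stronger than with X itself, which suggests entering ln(X)
    rather than X in the model.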

    Step 2: Next, run the descriptive statistics available in SPSS. Find the mean, standard
    deviation, variance and sample size of each variable. Note which independent variable has the
    highest standard deviation; this may help with outlier detection later. Check the sample size of
    each variable to see whether any data are missing. If data are missing, either remove that
    observation entirely or handle it through a missing value analysis (for example, replacing the
    missing value with the mean).
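
    The same checks can be sketched outside SPSS. The example below computes the descriptives for
    one variable and applies mean replacement; the names describe and impute_mean, the use of None
    to mark a missing value, and the data are all assumptions made for illustration.

```python
import statistics


def describe(values):
    """Descriptives for one variable; None marks a missing observation."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "variance": statistics.variance(present),  # sample variance (n - 1)
        "std_dev": statistics.stdev(present),      # sample standard deviation
    }


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    return [mean if v is None else v for v in values]


# Hypothetical independent variable with one missing observation.
x1 = [10, 12, None, 14, 16]
summary = describe(x1)
filled = impute_mean(x1)
print(summary)
print(filled)
```

    Mean replacement keeps the sample size but understates the variability of the variable, so
    dropping the observation may be preferable when the sample is large.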

    Step 3: Run the correlation matrix, with significance values, among the independent variables
    in SPSS. If the correlations are not significant, you are lucky! Real-life data, however, usually
    show significant correlations, and significant correlations among the independent variables
    indicate a problem of multicollinearity. As a business manager, you need to decide what level of
    correlation you are comfortable with. As a rule of thumb, correlations below 0.40 are acceptable,
    but you are the best judge. Another way to assess the problem is the Variance Inflation Factor
    (VIF), which is covered in Step 6.
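
    The 0.40 rule of thumb can be applied mechanically to every pair of independent variables, as
    in the sketch below. The function names and data are hypothetical, and comparing the absolute
    value of the correlation with the threshold is an assumption (the note does not say how
    negative correlations should be treated).

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_pairs(columns, threshold=0.40):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical independent variables; X2 rises almost in step with X1.
columns = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 4, 5, 8, 10, 13],
    "X3": [5, 1, 4, 2, 6, 3],
}
flagged = flag_correlated_pairs(columns)
print(flagged)
```

    Only the pair X1 and X2 is flagged here; one of the two would be a candidate for removal or for
    closer inspection with the VIF.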

    Step 4: Run the regression analysis in SPSS. The first statistic to look at is the F-statistic.
    For the model to be usable, the F-statistic must be significant, i.e. its significance value
    should be less than 0.05. If it is not, your data set cannot be used for regression analysis in
    its current form, and you have to either modify the model or look for errors in your raw data.
    If the F-statistic is significant, look at the adjusted $R^2$. This value tells you the
    proportion of the variation in the dependent variable that is explained by the independent
    variables. Normally we expect a high adjusted $R^2$, but what counts as high depends on the type
    of data. If your data are cross-sectional, i.e. time is not one of the independent variables, an
    adjusted $R^2$ of 0.30 is acceptable. If your data are a time series, i.e. time is one of the
    independent variables or one of them is related to time, the adjusted $R^2$ should be more than
    0.70. Also look at the standard error of the estimate, which should be low.
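
    To make the quantities in this step concrete, the sketch below computes $R^2$, adjusted $R^2$,
    the F-statistic and the standard error of the estimate for a one-predictor model fitted by
    least squares. The function names and data are hypothetical. The significance value of F comes
    from the F distribution, which the Python standard library does not provide, so it is left to
    SPSS or an F-table.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_summary(y, fitted, k):
    """Goodness-of-fit statistics for a model with k independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)               # total variation
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))    # unexplained variation
    r2 = 1 - sse / sst
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "f_stat": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "std_error_estimate": (sse / (n - k - 1)) ** 0.5,
    }


# Hypothetical data: Y rises by roughly 2 for each unit of X.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
intercept, slope = fit_line(x, y)
fitted = [intercept + slope * v for v in x]
summary = fit_summary(y, fitted, k=1)
print({name: round(value, 4) for name, value in summary.items()})
```

    Adjusted $R^2$ is always a little below $R^2$ because it penalises each additional independent
    variable, which is why the note asks you to read the adjusted figure.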

    Step 5: Now look at the table of estimated coefficients. The first thing to check is the
    significance of each t-statistic. Its significance value should be less than or equal to 0.05 for
    the coefficient to be significant and to be kept in the model. The estimated coefficients are
    given in the column titled Unstandardized Coefficients.
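
    For intuition about where the t-statistic comes from, the sketch below computes it for the
    slope of a one-predictor model: the estimated coefficient divided by its standard error. SPSS
    reports a significance value directly; here, as an assumption for illustration, the statistic is
    instead compared with the two-sided 5% critical value for 6 degrees of freedom taken from a
    t-table, which leads to the same decision at the 0.05 level. Names and data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope, standard error, t-statistic) for a one-predictor fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    std_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_error, slope / std_error


# Hypothetical data with 8 observations, so 8 - 2 = 6 degrees of freedom.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
slope, std_error, t_stat = slope_t_statistic(x, y)

CRITICAL_T = 2.447  # two-sided 5% critical value for 6 d.f., from a t-table
is_significant = abs(t_stat) > CRITICAL_T
print(round(slope, 4), round(std_error, 4), round(t_stat, 2), is_significant)
```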

    Step 6: Recall that in Step 3 we looked at the correlation coefficients among the independent
    variables and their significance. To check whether the level of collinearity is acceptable, we
    compute the VIF statistic. In SPSS it is available under Analyze, Regression, Linear. Once the
    regression dialog box is open, click Statistics. A window opens with options such as the
    regression coefficient estimates; tick Collinearity diagnostics and click Continue. The output
    then shows the VIF in the table of estimated coefficients. A VIF of less than 10 is considered
    acceptable, with no problem of multicollinearity.
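
    The VIF of an independent variable is 1 / (1 - R^2), where R^2 comes from regressing that
    variable on the other independent variables. The sketch below covers only the special case of
    two independent variables, where that R^2 equals their squared correlation; the general case
    needs a full regression per variable. Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x_a, x_b):
    """VIF shared by both variables when the model has exactly two predictors."""
    r = pearson_r(x_a, x_b)
    return 1.0 / (1.0 - r * r)


# Hypothetical independent variables.
advertising = [1, 2, 3, 4, 5, 6]
sales_staff = [2, 4, 5, 8, 10, 13]   # moves almost in step with advertising
store_age = [5, 1, 4, 2, 6, 3]       # nearly unrelated to advertising

vif_high = vif_two_predictors(advertising, sales_staff)
vif_low = vif_two_predictors(advertising, store_age)
print(round(vif_high, 1), round(vif_low, 3))
```

    With the threshold of 10 used in this note, the first pair signals a multicollinearity problem
    and the second does not.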

    Step 7: If your F-statistic is not significant, there are some quick fixes to try. Go back to
    the scatter plots you drew in Step 1 and check for outliers. A good starting point is the
    independent variable with the highest standard deviation. Remove the outliers if your sample is
    large and you do not mind losing a few observations, then rerun the regression and see whether
    the F-statistic is significant. If your sample is small, look at the scatter plot to see the type
    of relationship. If it is not linear, transform the independent variable, rerun the regression
    and see whether you get significant results.
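
    The note identifies outliers by eye from the scatter plot. As one possible numerical aid, the
    sketch below flags observations that lie far from the mean in standard-deviation units. The
    z-score rule, the threshold of 2.0, the function name and the data are all assumptions made for
    this example, not part of the procedure described above.

```python
import statistics


def flag_outliers(values, z_threshold=2.0):
    """Return the row indexes whose standardized distance from the mean exceeds the threshold."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / std_dev > z_threshold]


# Hypothetical independent variable with one suspicious observation.
x1 = [10, 11, 9, 10, 12, 11, 10, 50]
outlier_rows = flag_outliers(x1)
kept = [v for i, v in enumerate(x1) if i not in outlier_rows]
print(outlier_rows, kept)
```

    When an observation is dropped, the same row has to be dropped from the dependent variable and
    from every other independent variable before the regression is rerun.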

    EVEN AFTER ALL THIS, YOU MIGHT STILL NOT GET SIGNIFICANT RESULTS, WHICH MEANS YOUR
    INDEPENDENT VARIABLES DO NOT EXPLAIN THE VARIATION IN THE DEPENDENT VARIABLE.

    B) How to do regression analysis with qualitative variables?

    We can also include categorical (qualitative) variables in a regression analysis. Examples are
    the sex of an individual (male/female) or the geographical region of a country (east, west,
    north, south). To include such variables in the model, we introduce a new type of variable
    called a dummy variable.

    Dummy variables are most commonly coded as 0 and 1. Any two distinct numbers would work
    mathematically, but 0/1 coding makes the coefficients easy to interpret. The purpose of a dummy
    variable is to tell SPSS that observations coded 0 belong to one category and differ from those
    coded 1. For example, the sex of an individual can be either male or female, so if we represent
    males with 0, then females are represented with 1. Similarly, if a variable has four categories,
    say east, west, south and north, we can represent them with three dummy variables, as shown in
    the table of dummy variables at the end of this note.

    It is evident from the table that East is coded as (1, 0, 0) while South is coded as (0, 0, 0).
    The category whose dummy variables are all 0 is called the reference dummy. So for n categories,
    we need (n-1) dummy variables.
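
    The coding scheme can be sketched as a small function that turns each category label into its
    (n-1) dummy values. The function name dummy_code and the sample observations are hypothetical;
    the column order East, West, North follows the table in this note, with South as the reference.

```python
def dummy_code(values, categories, reference):
    """Return one row of (n-1) 0/1 dummies per value; the reference gets all zeros."""
    dummy_categories = [c for c in categories if c != reference]
    return [[1 if v == c else 0 for c in dummy_categories] for v in values]


regions = ["East", "West", "North", "South"]
observations = ["East", "South", "North", "West"]  # hypothetical data column

coded = dummy_code(observations, regions, reference="South")
for label, row in zip(observations, coded):
    print(label, row)
```

    Using a dummy for all four regions instead of three would make the dummies sum to 1 in every
    row, which duplicates the intercept; this is why one category is left out as the reference.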

    Run the regression analysis the same way as discussed for quantitative variables. However, the
    coefficients of dummy variables mean something different from those of quantitative variables.
    When the dummies are the only independent variables, the mean value of the dependent variable
    for the category represented by the reference dummy is given by $\beta_0$:

    $\bar{x}_0 = \beta_0$

    Similarly, the mean value for the category represented by dummy variable X1 is given by
    $\beta_0 + \beta_1$:

    $\bar{x}_1 = \beta_0 + \beta_1$

    $\beta_1 = \bar{x}_1 - \beta_0$

    $\beta_1 = \bar{x}_1 - \bar{x}_0$

    From the table of regression coefficients, if $\beta_1$ is significant and positive, we can
    deduce that the mean value for the category represented by dummy variable X1 is higher than that
    of the reference category. Similarly, we can find the mean values of the other categories and
    compare them.
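
    This interpretation can be checked numerically in the simplest case of one dummy variable (two
    categories). The sketch below fits the regression by least squares and compares the
    coefficients with the two group means; the function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept (b0) and slope (b1) for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: dummy = 0 for the reference category, 1 for the other.
dummy = [0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 20, 22, 24]

b0, b1 = fit_line(dummy, y)

reference_values = [v for d, v in zip(dummy, y) if d == 0]
other_values = [v for d, v in zip(dummy, y) if d == 1]
mean_reference = sum(reference_values) / len(reference_values)
mean_other = sum(other_values) / len(other_values)

# b0 matches the reference mean; b1 matches the difference in means.
print(b0, mean_reference)
print(b1, mean_other - mean_reference)
```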

    Dummy variables   X1   X2   X3
    East               1    0    0
    West               0    1    0
    North              0    0    1
    South              0    0    0    (reference dummy)