assignment no 1


Upload: ankitghildiyal

Post on 12-Dec-2015


Page 1: Assignment No 1

Marketing Analytics

Assignment No 1

Narender Gupta

Roll No. 002(Sec.A)

Q1. The file logitsubscibedata.xls gives the number of people in each age group who subscribe and do not subscribe to a magazine. How does age influence the chance of subscribing to the magazine?

Data Summary:

Age Group   No Sub   Sub   Gender   Mean Age   Total
20-24         52      31     0        22         83
25-29         61      30     0        27         91
30-34         57      18     0        32         75
35-39         73      14     0        37         87
40-44         56      17     0        42         73
45-49         84       8     0        47         92
50-54         57       8     0        52         65
55-60         87       9     0        57         96
20-24         44      46     1        22         90
25-29         53      37     1        27         90
30-34         57      30     1        32         87
35-39         54      12     1        37         66
40-44         56      12     1        42         68
45-49         83      19     1        47        102
50-54         77      17     1        52         94
55-60         74      12     1        57         86
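Before fitting any model, it can help to look at the raw subscription rate in each row of the data summary. The following is a minimal sketch; the names DATA and subscription_rates are illustrative and not part of the assignment, and the rate is simply Sub divided by Total.

```python
# Rows copied from the Data Summary: (age_group, no_sub, sub, gender, mean_age).
DATA = [
    ("20-24", 52, 31, 0, 22), ("25-29", 61, 30, 0, 27),
    ("30-34", 57, 18, 0, 32), ("35-39", 73, 14, 0, 37),
    ("40-44", 56, 17, 0, 42), ("45-49", 84, 8, 0, 47),
    ("50-54", 57, 8, 0, 52), ("55-60", 87, 9, 0, 57),
    ("20-24", 44, 46, 1, 22), ("25-29", 53, 37, 1, 27),
    ("30-34", 57, 30, 1, 32), ("35-39", 54, 12, 1, 37),
    ("40-44", 56, 12, 1, 42), ("45-49", 83, 19, 1, 47),
    ("50-54", 77, 17, 1, 52), ("55-60", 74, 12, 1, 57),
]


def subscription_rates(rows):
    """Return (age_group, gender, rate) where rate = sub / (sub + no_sub)."""
    return [
        (group, gender, sub / (sub + no_sub))
        for group, no_sub, sub, gender, _mean_age in rows
    ]


for group, gender, rate in subscription_rates(DATA):
    print(f"{group}  gender={gender}  observed rate={rate:.3f}")
```

In both gender groups the observed rate is highest in the youngest age band and much lower in the oldest, which is the pattern the logit model is meant to quantify.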

Solution:

Logit regression can be used to determine the impact of the independent variables, age and gender, on whether a given individual subscribes to the magazine.
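For readers without SPSS, the fit can be approximated with a small Newton-Raphson routine on the grouped counts. This is a sketch under stated assumptions, not the SPSS procedure itself: it assumes Age enters the model as the group's mean age and Gender as the 0/1 code from the data summary, and the helper names solve and fit_logit are hypothetical.

```python
import math

# (no_sub, sub, gender, mean_age) per group, copied from the Data Summary.
GROUPS = [
    (52, 31, 0, 22), (61, 30, 0, 27), (57, 18, 0, 32), (73, 14, 0, 37),
    (56, 17, 0, 42), (84, 8, 0, 47), (57, 8, 0, 52), (87, 9, 0, 57),
    (44, 46, 1, 22), (53, 37, 1, 27), (57, 30, 1, 32), (54, 12, 1, 37),
    (56, 12, 1, 42), (83, 19, 1, 47), (77, 17, 1, 52), (74, 12, 1, 57),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logit(groups, iterations=25):
    """Maximum-likelihood fit of logit(p) = b0 + b_age*age + b_gender*gender."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3                      # score vector X'(y - n*p)
        info = [[0.0] * 3 for _ in range(3)]  # information matrix X'WX
        for no_sub, sub, gender, age in groups:
            x = (1.0, float(age), float(gender))
            n = no_sub + sub
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(3):
                grad[i] += (sub - n * p) * x[i]
                for j in range(3):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


b0, b_age, b_gender = fit_logit(GROUPS)
print(f"constant={b0:.3f}  age={b_age:.3f}  gender={b_gender:.3f}")
```

If the assumptions hold, the estimates should land close to the coefficients in the SPSS "Variables in the Equation" table.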


SPSS Output:

Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
Step 1   Step       94.806      2   .000
         Block      94.806      2   .000
         Model      94.806      2   .000
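The model chi-square in this table is the drop in -2 log likelihood from an intercept-only model to the fitted model. As a rough check, the sketch below rebuilds the intercept-only value from the sample totals (320 subscribers, 1025 non-subscribers) and subtracts the -2 log likelihood of 1381.112 reported in the Model Summary. Variable names are illustrative.

```python
import math

n_sub, n_no_sub = 320, 1025   # observed subscribers / non-subscribers
model_neg2ll = 1381.112       # -2 log likelihood from the Model Summary

n = n_sub + n_no_sub
p0 = n_sub / n                # an intercept-only model predicts the overall rate
null_neg2ll = -2.0 * (n_sub * math.log(p0) + n_no_sub * math.log(1.0 - p0))

chi_square = null_neg2ll - model_neg2ll
# Upper-tail chi-square probability; exp(-x/2) is exact for df = 2.
p_value = math.exp(-chi_square / 2.0)

print(f"null -2LL        = {null_neg2ll:.3f}")
print(f"model chi-square = {chi_square:.3f} (df = 2), p = {p_value:.2e}")
```

The difference agrees with the reported 94.806 up to rounding, and the p-value is far below .001, which is why SPSS prints .000.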

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1        13.370      8   .100

The p-value of .100 is greater than .05, so the Hosmer and Lemeshow test shows no significant difference between the model's predictions and the observed values, i.e., the model fits the data adequately.
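The significance value can be reproduced from the chi-square statistic and its degrees of freedom. The sketch below uses the closed-form upper-tail probability of the chi-square distribution, which is valid only for even degrees of freedom; the function name chi2_sf_even_df is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(13.370, 8)
print(f"Hosmer-Lemeshow p = {p:.3f}")
print("reject fit at .05" if p < 0.05 else "no evidence of poor fit at .05")
```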

Variance explained by the model: the Model Summary table is used to measure how much of the variation in the dependent variable is explained by the model.

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      1381.112(a)         .068                   .102

a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

Nagelkerke R Square is a modification of Cox & Snell R Square. Based on our model, the explained variation in the dependent variable ranges from 6.8% (Cox & Snell) to 10.2% (Nagelkerke). Cox & Snell R Square cannot reach a value of 1, so we use the Nagelkerke R Square value.
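Both pseudo R Square values can be recomputed from figures already in the SPSS output, which also shows why Cox & Snell cannot reach 1. This is a sketch using the standard formulas; the sample size of 1345 comes from the totals in the data summary, and the variable names are illustrative.

```python
import math

n = 1345                    # total respondents (1025 + 320)
model_neg2ll = 1381.112     # -2 log likelihood from the Model Summary
model_chi_square = 94.806   # from the Omnibus Tests table
null_neg2ll = model_neg2ll + model_chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n)
cox_snell = 1.0 - math.exp(-model_chi_square / n)
# Its largest attainable value, reached only by a perfect model.
cox_snell_ceiling = 1.0 - math.exp(-null_neg2ll / n)
# Nagelkerke rescales Cox & Snell so that the ceiling becomes 1.
nagelkerke = cox_snell / cox_snell_ceiling

print(f"Cox & Snell = {cox_snell:.3f}")
print(f"ceiling     = {cox_snell_ceiling:.3f}")
print(f"Nagelkerke  = {nagelkerke:.3f}")
```

With this data the Cox & Snell ceiling is about .67, so dividing by it lifts .068 to the .102 that SPSS reports.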

Classification Table: If the estimated probability of the event occurring is greater than or equal to 0.5 (better than even chance), SPSS classifies the event as occurring (e.g., subscription to the magazine). If the probability is less than 0.5, SPSS classifies the event as not occurring (e.g., no subscription to the magazine). It is very common to use binomial logistic regression to predict whether cases can be correctly classified (i.e., predicted) from the independent variables.

Classification Table(a)

                                    Predicted
                               Subscribe?      Percentage
Observed                       0        1      Correct
Step 1   Subscribe?    0       1025     0      100.0
                       1       320      0      .0
         Overall Percentage                    76.2

a. The cut value is .500

The classification table gives the following measures:

The percentage accuracy in classification (PAC), which is the percentage of all cases that are correctly classified with the independent variables added. In our case it is 76.2% (1025 of 1345).

Sensitivity, which is the percentage of cases that had the observed characteristic (e.g., "yes" for subscription to the magazine) and were correctly predicted by the model (i.e., true positives). In our case it is 0% (0 of 320).

Specificity, which is the percentage of cases that did not have the observed characteristic (e.g., "no" for subscription to the magazine) and were also correctly predicted as not having the observed characteristic (i.e., true negatives). In our case it is 100% (1025 of 1025).

The positive predictive value, which is the percentage of correctly predicted cases "with" the observed characteristic compared to the total number of cases predicted as having the characteristic. In our case it is undefined, because the model predicts no case as a subscriber.

The negative predictive value, which is the percentage of correctly predicted cases "without" the observed characteristic compared to the total number of cases predicted as not having the characteristic. In our case it is 76.2% (1025 of 1345).

Because the model assigns every case to "no" subscription, the 76.2% accuracy simply equals the share of non-subscribers in the sample.
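The measures defined above can be computed directly from the four cells of the classification table. The following is a minimal sketch; the function name classification_metrics is hypothetical, and a measure is returned as None when its denominator is zero.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return accuracy, sensitivity, specificity, PPV and NPV in percent.

    A measure is None when its denominator is zero.
    """
    def pct(num, den):
        return 100.0 * num / den if den else None

    return {
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
    }


# Cells of the classification table: observed 0 -> (1025, 0), observed 1 -> (320, 0).
metrics = classification_metrics(tn=1025, fp=0, fn=320, tp=0)
for name, value in metrics.items():
    print(name, "undefined" if value is None else f"{value:.1f}%")
```

Since no case is predicted as a subscriber, sensitivity is 0%, specificity is 100%, and the positive predictive value has no denominator.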

Variables in the Equation

                        B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)   Age         -.052   .006   78.998   1    .000   .949
            Gender(1)   .407    .134   9.264    1    .002   1.502
            Constant    .598    .231   6.701    1    .010   1.818

a. Variable(s) entered on step 1: Age, Gender.
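The Exp(B) column is the exponential of B and is read as an odds ratio. The sketch below recomputes it from the rounded B values in the table and also shows the implied odds ratio for a ten-year age difference; the ten-year figure is an illustration added here, not part of the SPSS output. A Wald statistic recomputed as (B / S.E.) squared from these rounded inputs only approximates the printed Wald column.

```python
import math

# (B, S.E.) pairs copied from the "Variables in the Equation" table.
COEFFICIENTS = {
    "Age": (-0.052, 0.006),
    "Gender(1)": (0.407, 0.134),
    "Constant": (0.598, 0.231),
}


def odds_ratio(b, units=1.0):
    """Multiplicative change in the odds for a change of `units` in the predictor."""
    return math.exp(b * units)


for name, (b, se) in COEFFICIENTS.items():
    approx_wald = (b / se) ** 2  # approximate: inputs are rounded to 3 decimals
    print(f"{name:10s} Exp(B)={odds_ratio(b):.3f}  Wald~{approx_wald:.1f}")

ten_year = odds_ratio(COEFFICIENTS["Age"][0], units=10)
print(f"odds ratio for +10 years of age: {ten_year:.3f}")
```

Read this way, each extra year of age multiplies the odds of subscribing by about 0.95, and ten extra years by about 0.59.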

From these results you can see that age (p < .001) and gender (p = .002) are both statistically significant predictors of subscription. We can use the information in the "Variables in the Equation" table to predict the probability of an event occurring and to see how the odds change with a one-unit change in an independent variable when all other independent variables are kept constant.

$$\text{Probability of subscription} = \frac{e^{0.598 + 0.407 \cdot \text{Gender} - 0.052 \cdot \text{Age}}}{1 + e^{0.598 + 0.407 \cdot \text{Gender} - 0.052 \cdot \text{Age}}}$$
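To make the equation concrete, the sketch below evaluates it at the mean age of every age group for both gender codes. The function name subscription_probability is hypothetical; the coefficients are the rounded values from the "Variables in the Equation" table.

```python
import math

MEAN_AGES = (22, 27, 32, 37, 42, 47, 52, 57)


def subscription_probability(age, gender):
    """Predicted probability of subscribing; gender is the 0/1 code from the data."""
    logit = 0.598 + 0.407 * gender - 0.052 * age
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-logit))


predictions = {
    (gender, age): subscription_probability(age, gender)
    for gender in (0, 1)
    for age in MEAN_AGES
}

for (gender, age), p in predictions.items():
    print(f"gender={gender}  age={age}  p={p:.3f}")

print(f"largest predicted probability: {max(predictions.values()):.3f}")
```

The largest predicted probability (gender code 1, age 22) stays below the .500 cut value, which is consistent with the classification table assigning every case to "no" subscription.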

Conclusion

A logistic regression was performed to ascertain the effects of age and gender on the likelihood that participants subscribe to the magazine. The logistic regression model was statistically significant, χ2(2) = 94.806, p < .0005. The model explained 10.2% (Nagelkerke R2) of the variance in subscription and correctly classified 76.2% of cases. The odds of subscribing were 1.50 times higher for females than for males. Increasing age was associated with a lower likelihood of subscribing to the magazine: each additional year of age multiplied the odds of subscribing by 0.949.