MDA Output Interpretation

Upload: chitra-belwal

Post on 06-Mar-2016


Multiple Discriminant Analysis: SPSS Output

After converting the data to SPSS format, click Analyze, then Classify, then Discriminant.

A box titled Discriminant Analysis will come up. Click on Grouping Variable; this is the non-metric dependent variable. Define its range, then enter the independent variables. There are two methods for MDA: the Enter method and the Stepwise method. We will start with the Enter method.

Next click on Statistics. There are 3 headings here: Descriptives (click all boxes), Function Coefficients (click none) and Matrices (click none). Under Descriptives we have Means, Univariate ANOVAs, and Box's M.

Means: We use the group means for interpretation as in the HATB example.

Univariate ANOVAs: These tests suggest which variables might be useful discriminators.

Box's M: A test of equality of the group variance-covariance matrices. For sufficiently large samples, a high p-value signifies that there is insufficient evidence that the matrices differ.

H0: the variance/covariance matrices of independent variables across groups are the same;

H1: the variance/covariance matrices across groups are different.

Click continue and go to Classify.

Prior Probabilities: compute from group sizes: This incorporates the sizes of the groups as defined by the dependent variable into the classification of the cases using the discriminant functions.

Display: Casewise results: This will give you the classification details for each case in the output.

Display: Summary table: This will include summary tables comparing actual and predicted classification.

Display: Leave-one-out classification: This is to ask SPSS to include a cross-validation classification in the output. This option produces a less biased estimate of classification accuracy by sequentially holding each case out of the calculations for the discriminant functions, and using the derived functions to classify the case held out.

Use Covariance Matrix: Within-groups: The covariance matrices measure the dispersion in the groups defined by the dependent variable. If we fail the homogeneity of group variances test (Box's M), our option is to use the Separate-groups covariance matrices in classification. Hence, it is good if the null hypothesis is not rejected.

Plots: Combined-groups: This will help to obtain a visual plot of the relationship between the functions and the groups defined by the dependent variable.
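As a rough illustration of the "compute from group sizes" option described above, the prior probability of each group is simply its share of the valid cases. The sketch below assumes the group sizes of 111 (NO) and 43 (YES) that appear in the Group Statistics table later in this document; the function name is hypothetical and is not part of SPSS.

```python
def priors_from_group_sizes(group_sizes):
    """Return each group's share of the valid cases (assumed prior probability)."""
    total = sum(group_sizes.values())
    return {group: n / total for group, n in group_sizes.items()}


# Group sizes taken from the Group Statistics table of the thriller-movie example.
priors = priors_from_group_sizes({"NO": 111, "YES": 43})

for group, prior in priors.items():
    print(f"{group}: {prior:.3f}")
```

With unequal priors, a case that lies roughly midway between the two centroids tends to be assigned to the larger group.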

Introduction: In a discriminant analysis using the simultaneous (Enter) method for including variables, age [age], highest year of school completed [educ], gender [gender], and total family income [incom98] were found to be useful in distinguishing between the groups defined by the dependent variable, seen thriller movie in last year [tmovie].

A discriminant function differentiated survey respondents who had not seen a thriller movie in the last year from survey respondents who had seen a thriller movie in the last year.

(I) Analysis Case Processing Summary

Unweighted Cases                                                        N      Percent
Valid                                                                   154    57.0
Excluded   Missing or out-of-range group codes                          74     27.4
           At least one missing discriminating variable                 31     11.5
           Both missing or out-of-range group codes and
           at least one missing discriminating variable                 11     4.1
           Total                                                        116    43.0
Total                                                                   270    100.0

Interpretation: The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 154 valid cases and 4 independent variables. The ratio of cases to independent variables is 38.5 to 1, which satisfies both the minimum requirement and the preferred ratio of 20 to 1. Now, let us go to the Group Statistics table.

Group Statistics

SEEN THRILLER MOVIE IN LAST YEAR                                      Valid N (listwise)
                                             Mean     Std. Deviation  Unweighted  Weighted
NO      AGE OF RESPONDENT                    46.36    15.67           111         111.000
        HIGHEST YEAR OF SCHOOL COMPLETED     13.42    2.93            111         111.000
        RESPONDENTS GENDER                   1.67     .47             111         111.000
        TOTAL FAMILY INCOME                  15.73    5.42            111         111.000
YES     AGE OF RESPONDENT                    39.19    14.86           43          43.000
        HIGHEST YEAR OF SCHOOL COMPLETED     13.58    2.84            43          43.000
        RESPONDENTS GENDER                   1.33     .47             43          43.000
        TOTAL FAMILY INCOME                  16.60    4.44            43          43.000
Total   AGE OF RESPONDENT                    44.36    15.73           154         154.000
        HIGHEST YEAR OF SCHOOL COMPLETED     13.47    2.90            154         154.000
        RESPONDENTS GENDER                   1.57     .50             154         154.000
        TOTAL FAMILY INCOME                  15.97    5.16            154         154.000


Interpretation: (i) The average "age" for survey respondents who had not seen a thriller movie in the last year (mean=46.36) was higher than the average "age" for survey respondents who had seen a thriller movie in the last year (mean=39.19).

So, survey respondents who had not seen a thriller movie in the last year were older than survey respondents who had seen a thriller movie in the last year.

(ii) Since "gender" is a dichotomous variable, its mean is not directly interpretable. The interpretation must take into account the coding, by which 1 corresponds to male and 2 corresponds to female. The higher mean for survey respondents who had not seen a thriller movie in the last year (mean=1.67), compared with the overall mean (1.57) and with the mean for survey respondents who had seen a thriller movie in the last year (mean=1.33), implies that this group contained fewer male respondents and more female respondents.
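The coding argument above can be made concrete with a small sketch. For a variable coded 1 and 2, the mean minus 1 equals the proportion of cases coded 2 (here, female). The helper name below is hypothetical; the means are the ones reported in the Group Statistics table.

```python
def share_coded_two(mean_value):
    """For a variable coded 1/2, the mean minus 1 is the proportion coded 2."""
    return mean_value - 1.0


# Gender means from the Group Statistics table (1 = male, 2 = female).
gender_means = {"NO": 1.67, "YES": 1.33, "Total": 1.57}

for group, mean in gender_means.items():
    print(f"{group}: about {share_coded_two(mean):.0%} female")
```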

Survey respondents who had not seen a thriller movie in the last year were more likely to be female than survey respondents who had seen a thriller movie in the last year. Let us now go to the next table, Tests of Equality of Group Means.

Tests of Equality of Group Means

                                     Wilks' Lambda   F        df1   df2   Sig.
AGE OF RESPONDENT                    .958            6.684    1     152   .011
HIGHEST YEAR OF SCHOOL COMPLETED     .999            .091     1     152   .763
RESPONDENTS GENDER                   .904            16.069   1     152   .000
TOTAL FAMILY INCOME                  .994            .889     1     152   .347
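The F column in the table above is tied to Wilks' Lambda. For a single variable, Lambda is the within-groups share of the total sum of squares, so F = ((1 - Lambda) / Lambda) x (df2 / df1). The sketch below applies this identity to the printed values; because Lambda is printed to only three decimals, the recomputed F values agree with the table only approximately. The function name is an illustrative assumption.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """Recover the univariate F statistic from Wilks' Lambda (= WSS / TSS)."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df2 / df1)


# (Wilks' Lambda, reported F) as printed in the table; df1 = 1, df2 = 152.
table = {
    "AGE OF RESPONDENT": (0.958, 6.684),
    "RESPONDENTS GENDER": (0.904, 16.069),
}

for name, (lam, reported_f) in table.items():
    recomputed = f_from_wilks(lam, 1, 152)
    print(f"{name}: recomputed F = {recomputed:.2f}, reported F = {reported_f}")
```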

Interpretation: As we know, Wilks' Lambda tests the equality of group means for each independent variable, together with its statistical significance. In this case, we notice that gender and age have better (lower) values of the Wilks' Lambda statistic, with a probability of p [...]

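Casewise Statistics

Columns: Case Number; Actual Group; Highest Group (Predicted Group; P(D>d | G=g): p, df; P(G=g | D=d); Squared Mahalanobis Distance to Centroid); Second Highest Group (Group; P(G=g | D=d); Squared Mahalanobis Distance to Centroid); Discriminant Scores (Function 1, Function 2).

Original
Case   Actual      Pred.   p      df   P(G|D)   Sq. Dist.   Group   P(G|D)   Sq. Dist.   Function 1   Function 2
1      2           2       .662   2    .484     .826        1       .393     1.473       .867         .774
2      2           2       .233   2    .691     2.917       1       .232     5.331       2.076        .479
5      1           2       .738   2    .535     .607        1       .336     1.764       1.098        .395
6      1           1       .280   2    .636     2.549       3       .224     3.514       -1.645       .956
8      2           2       .950   2    .445     .103        1       .387     .606        .558         .269
12     3           1       .953   2    .490     .097        2       .273     1.036       -.531        .254
13     1           1       .680   2    .559     .772        3       .235     1.388       -1.032       .570
14     1           1       .953   2    .490     .097        2       .273     1.036       -.531        .254
42     1           1       .941   2    .437     .122        2       .371     .223        .120         .311
43     3           1       .272   2    .660     2.603       3       .187     4.007       -1.438       1.293
44     2           1       .970   2    .472     .061        2       .333     .533        -.102        .451
45     1           1       .447   2    .602     1.611       3       .230     2.418       -1.366       .781
47     2           1       .999   2    .465     .003        2       .320     .523        -.205        .283
48     2           1       .987   2    .449     .025        2       .316     .500        -.253        .079
49     1           1       .738   2    .565     .607        2       .239     2.100       -.715        .837
50     ungrouped   3       .000   2    .818     16.603      1       .127     21.444      -2.531       -3.778
51     3           2       .165   2    .482     3.598       3       .315     3.556       .782         -1.898
53     1           2       .535   2    .539     1.253       1       .355     2.319       1.193        .802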

For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated data, squared Mahalanobis distance is based on observations.

** Misclassified case

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

Classification Results

                                          Predicted Group Membership
                       WELFARE            TOO LITTLE   ABOUT RIGHT   TOO MUCH   Total
Original          Count  TOO LITTLE       43           15            6          64
                         ABOUT RIGHT      26           30            6          62
                         TOO MUCH         17           10            9          36
                         Ungrouped cases  3            3             2          8
                  %      TOO LITTLE       67.2         23.4          9.4        100.0
                         ABOUT RIGHT      41.9         48.4          9.7        100.0
                         TOO MUCH         47.2         27.8          25.0       100.0
                         Ungrouped cases  37.5         37.5          25.0       100.0
Cross-validated   Count  TOO LITTLE       43           15            6          64
                         ABOUT RIGHT      26           30            6          62
                         TOO MUCH         17           11            8          36
                  %      TOO LITTLE       67.2         23.4          9.4        100.0
                         ABOUT RIGHT      41.9         48.4          9.7        100.0
                         TOO MUCH         47.2         30.6          22.2       100.0

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

b 50.6% of original grouped cases correctly classified.

c 50.0% of cross-validated grouped cases correctly classified.

Interpretation: The cross-validated accuracy rate computed by SPSS was 50.0%, which was greater than the proportional by-chance accuracy criterion of 43.7% (1.25 x 35.0% = 43.7%). The criterion for classification accuracy is satisfied.
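The by-chance criterion used above can be sketched as follows. The proportional by-chance accuracy is the sum of the squared group proportions, and the criterion is 1.25 times that value. The sketch assumes the group sizes shown in the Classification Results table (64, 62 and 36); with those sizes the by-chance rate comes out close to the 35.0% quoted above, and any small difference presumably reflects rounding or the priors used in the original computation. Function names are hypothetical.

```python
def proportional_chance_accuracy(group_sizes):
    """Sum of squared group proportions (accuracy expected by chance alone)."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


def chance_criterion(group_sizes, margin=1.25):
    """By-chance accuracy inflated by the conventional 25% margin."""
    return margin * proportional_chance_accuracy(group_sizes)


welfare_group_sizes = [64, 62, 36]   # TOO LITTLE, ABOUT RIGHT, TOO MUCH
cross_validated_accuracy = 0.500     # footnote c of the Classification Results table

by_chance = proportional_chance_accuracy(welfare_group_sizes)
criterion = chance_criterion(welfare_group_sizes)

print(f"by-chance accuracy: {by_chance:.1%}")
print(f"criterion (1.25 x by-chance): {criterion:.1%}")
print("criterion met:", cross_validated_accuracy >= criterion)
```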

Conclusion: Hours worked, self-employment, and education were the three independent variables we identified as strong contributors to distinguishing between the groups defined by the dependent variable.

The model was characterized as useful because its cross-validated accuracy met the by-chance accuracy criterion.

The summary correctly states the specific relationships between the dependent variable groups and the independent variables we interpreted. Survey respondents who thought we spend about the right amount of money on welfare worked fewer hours in the past week than survey respondents who thought we spend too little or too much money on welfare. Survey respondents who thought we spend too little money on welfare were less likely to be self-employed than survey respondents who thought we spend too much money on welfare. Survey respondents who thought we spend about the right amount of money on welfare had completed more years of school than survey respondents who thought we spend too little or too much money on welfare.

Question: Variables included in the analysis satisfy the level of measurement requirements?

Question: Number of variables and cases satisfy sample size requirements?

Sufficient statistically significant functions to differentiate among groups?

Groups defined by dependent variable differentiated by discriminant functions?

Interpretation of relationship between independent variable and dependent variable groups?

Classification accuracy sufficient to be characterized as a useful model?

Summary of findings correctly stated, including cautions?


Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants

SEEN THRILLER MOVIE IN LAST YEAR   Rank   Log Determinant
NO                                 4      9.176
YES                                4      8.573
Pooled within-groups               4      9.077

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M             10.220
F      Approx.      .983
       df1          10
       df2          30348.624
       Sig.         .455

Tests null hypothesis of equal population covariance matrices.
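The Box's M value in the Test Results table can be approximately recovered from the Log Determinants table. A minimal sketch, assuming the usual definition M = (N - g) ln|S_pooled| - sum over groups of (n_i - 1) ln|S_i|, with group sizes 111 and 43 taken from the Group Statistics table. Because the log determinants are printed to three decimals, the result matches the reported 10.220 only roughly. The function name is hypothetical.

```python
def box_m(pooled_log_det, group_log_dets, group_sizes):
    """Box's M from log determinants of the pooled and group covariance matrices."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    m = (n_total - n_groups) * pooled_log_det
    for log_det, n in zip(group_log_dets, group_sizes):
        m -= (n - 1) * log_det
    return m


# Log determinants as printed: NO = 9.176, YES = 8.573, pooled = 9.077.
m_statistic = box_m(9.077, [9.176, 8.573], [111, 43])
print(f"Box's M recomputed from rounded log determinants: {m_statistic:.2f}")
```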

Interpretation: As mentioned above, Box's M tests the equality of the group variance-covariance matrices. It is an overall judgement on all the independent variables taken together.

Box's M = 10.220

Approx. F = 0.983
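The tabulated F value that the approximate F is compared against can be checked without a statistics package. The sketch below rests on a stated approximation: because df2 (about 30349) is very large, the F(10, df2) critical value is taken to be close to the chi-square critical value with 10 degrees of freedom divided by 10. The chi-square upper tail is computed with the closed form that holds for even degrees of freedom, and the critical value is found by bisection. All function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha, df, upper=200.0):
    """Bisection for the value whose upper-tail probability equals alpha."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Large-df2 approximation: F(df1, very large df2) is close to chi-square(df1) / df1.
f_critical_approx = chi2_critical_even_df(0.05, 10) / 10
approx_f = 0.983

print(f"approximate critical F: {f_critical_approx:.2f}")
print("reject equality of covariance matrices:", approx_f > f_critical_approx)
```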

Conclusion: F-tab = FINV(0.05,10,30349) = 1.83. As the calculated F (0.983) is much lower than the tabulated F, the null hypothesis is not rejected. This is confirmed by the p-value (Sig. = .455, well above 0.05). Therefore, we can conclude that the group covariance matrices are homogeneous.

Summary of Canonical Discriminant Functions

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .169         100.0           100.0          .380

a First 1 canonical discriminant functions were used in the analysis.

Interpretation: There is one Eigenvalue for each discriminant function. It depicts the relative discriminatory power of the discriminant functions. With two groups there is only one function, so the comparison hardly makes sense, but when there are more than two groups it shows which function discriminates better.

What Is an Eigenvalue in MDA?

In the present example there are two groups, i.e., seen thriller movie last year = 1 and not seen thriller movie last year = 0. As we know, when there are two groups only one discriminant function can be extracted from the data, and its Eigenvalue (\lambda) is interpreted as follows.

In simple language, in the two-group case, we can define the Between Sum of Squares (i.e., the Sum of Squares Across Groups) of the discriminant scores Z as follows:

BSS = n_1 (\bar{Z}_1 - \bar{Z})^2 + n_2 (\bar{Z}_2 - \bar{Z})^2

Within Sum of Squares (WSS) can be given as follows:

WSS = \sum_{i \in G_1} (Z_i - \bar{Z}_1)^2 + \sum_{i \in G_2} (Z_i - \bar{Z}_2)^2

TSS (Total Sum of Squares) = BSS + WSS

The Eigenvalue is defined as \lambda = BSS / WSS. So, if \lambda = 0.00, the model has no discriminatory power (as BSS = 0). The larger the value of \lambda, the greater the discriminatory power of the model.
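A small numerical sketch may help fix these quantities. The discriminant scores below are hypothetical values chosen only for illustration; they are not taken from the SPSS output. The sketch computes BSS and WSS on the scores, forms the eigenvalue as BSS / WSS, and checks that TSS = BSS + WSS.

```python
def mean(values):
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (BSS, WSS, TSS) for discriminant scores grouped by class."""
    all_scores = [z for scores in groups for z in scores]
    grand_mean = mean(all_scores)
    bss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups)
    wss = sum((z - mean(s)) ** 2 for s in groups for z in s)
    tss = sum((z - grand_mean) ** 2 for z in all_scores)
    return bss, wss, tss


# Hypothetical discriminant scores for two groups (illustration only).
scores_group_0 = [1.0, 2.0, 3.0]
scores_group_1 = [-1.0, 0.0, 1.0]

bss, wss, tss = sums_of_squares([scores_group_0, scores_group_1])
eigenvalue = bss / wss

print(f"BSS = {bss}, WSS = {wss}, TSS = {tss}")
print(f"eigenvalue = BSS / WSS = {eigenvalue}")
print(f"WSS / TSS = {wss / tss}")
```

In this toy case WSS / TSS equals 1 / (1 + eigenvalue), which is how the eigenvalue connects to Wilks' Lambda when there is a single function.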

Two groups can be separated by one discriminant function. Three groups require two discriminant functions. The number of functions is usually one less than the number of groups.

With 4 independent variables and 2 groups defined by the dependent variable, the maximum possible number of discriminant functions was 1, which accounts for 100% of the variation by itself. (Cross-check this with the 3-group case.)

The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function. Now let us go to Functions at Group Centroids.

(III) Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .855            23.440       4    .000
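The values in the Wilks' Lambda table follow from the eigenvalue of .169 reported in the Eigenvalues table. A minimal sketch, assuming the standard relations: canonical correlation = sqrt(lambda / (1 + lambda)), Wilks' Lambda = product of 1 / (1 + lambda) over the functions tested, and Bartlett's chi-square approximation -(N - 1 - (p + g) / 2) ln(Lambda) with p(g - 1) degrees of freedom. N = 154, p = 4 and g = 2 come from this example; the function names are hypothetical, and small differences from the printed values are due to rounding of the eigenvalue.

```python
import math


def canonical_correlation(eigenvalue):
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalues):
    product = 1.0
    for ev in eigenvalues:
        product *= 1.0 / (1.0 + ev)
    return product


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)


eigenvalue = 0.169
rc = canonical_correlation(eigenvalue)
lam = wilks_lambda([eigenvalue])
chi_square = bartlett_chi_square(lam, n_cases=154, n_predictors=4, n_groups=2)
degrees_of_freedom = 4 * (2 - 1)

print(f"canonical correlation: {rc:.3f}")
print(f"Wilks' Lambda: {lam:.3f}")
print(f"chi-square: {chi_square:.2f} on {degrees_of_freedom} df")
```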

Interpretation: The overall relationship in discriminant analysis is based on the existence of sufficient statistically significant discriminant functions to separate all of the groups defined by the dependent variable. As we see, the observed Chi-square (23.440) falls within the critical region, since it lies outside the interval from CHIINV(0.975,4) = 0.484 to CHIINV(0.025,4) = 11.1433. The null hypothesis that the function has no discriminating power is therefore rejected. The probability of p [...]
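The chi-square figures quoted above can be checked by hand. For 4 degrees of freedom the upper-tail probability has the closed form exp(-x/2) (1 + x/2), so no statistical library is needed. The sketch below evaluates it at the observed statistic and at the two CHIINV cut-offs; the function name is hypothetical.

```python
import math


def chi2_upper_tail_df4(x):
    """Upper-tail probability of a chi-square variate with exactly 4 df."""
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)


p_value = chi2_upper_tail_df4(23.440)
tail_at_upper_cutoff = chi2_upper_tail_df4(11.1433)
tail_at_lower_cutoff = chi2_upper_tail_df4(0.484)

print(f"p-value for chi-square = 23.440: {p_value:.6f}")
print(f"upper tail at 11.1433: {tail_at_upper_cutoff:.4f}")
print(f"upper tail at 0.484:   {tail_at_lower_cutoff:.4f}")
```

A p-value this small is what SPSS prints as Sig. = .000 in the Wilks' Lambda table.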

Casewise Statistics

Columns: Case Number; Actual Group; Highest Group (Predicted Group; P(D>d | G=g): p, df; P(G=g | D=d); Squared Mahalanobis Distance to Centroid); Second Highest Group (Group; P(G=g | D=d); Squared Mahalanobis Distance to Centroid); Discriminant Scores (Function 1).

Original
Case   Actual      Pred.   p      df   P(G|D)   Sq. Dist.   Group   P(G|D)   Sq. Dist.   Function 1
1      1           0       .206   1    .552     1.602       1       .448     .126        -1.011
2      0           0       .393   1    .642     .730        1       .358     .003        -.600
3      0           0       .601   1    .863     .273        1       .137     2.054       .777
4      0           0       .090   1    .948     2.877       1       .052     6.796       1.950
6      0           1       .388   1    .563     .745        0       .437     3.147       -1.520
9      0           0       .012   1    .975     6.319       1       .025     11.727      2.768
10     ungrouped   0       .056   1    .957     3.647       1       .043     7.955       2.164
11     1           1       .435   1    .544     .610        0       .456     2.863       -1.438
13     0           0       .374   1    .898     .791        1       .102     3.240       1.143
14     ungrouped   0       .836   1    .825     .043        1       .175     1.248       .461
15     0           0       .069   1    .953     3.298       1       .047     7.436       2.070
16     ungrouped   0       .058   1    .957     3.601       1       .043     7.887       2.152
17     0           0       .938   1    .807     .006        1       .193     .976        .332
18     ungrouped   1       .383   1    .565     .761        0       .435     3.179       -1.529
19     0           0       .537   1    .690     .381        1       .310     .086        -.363
20     ungrouped   1       .302   1    .600     1.066       0       .400     3.776       -1.689
21     1           1       .451   1    .538     .568        0       .462     2.771       -1.410
22     ungrouped   1       .426   1    .548     .635        0       .452     2.916       -1.453
23     0           0       .312   1    .908     1.024       1       .092     3.696       1.266
29     0           0       .964   1    .789     .002        1       .211     .749        .209
30     ungrouped   0       .456   1    .885     .556        1       .115     2.745       1.000
31     1           0       .217   1    .559     1.525       1       .441     .105        -.980
32     0           0       .732   1    .741     .118        1       .259     .322        -.089
33     1           0       .898   1    .815     .016        1       .185     1.079       .382
34     1           0       .783   1    .834     .076        1       .166     1.407       .530
35     1           0       .359   1    .629     .842        1       .371     .000        -.663
36     0           0       .744   1    .840     .107        1       .160     1.531       .581
39     0           0       .989   1    .798     .000        1       .202     .855        .268
40     0           0       .915   1    .780     .011        1       .220     .646        .148
41     0           0       .498   1    .879     .459        1       .121     2.523       .932
42     0           0       .228   1    .566     1.452       1       .434     .086        -.950
43     0           1       .459   1    .535     .549        0       .465     2.729       -1.398
44     ungrouped   1       .550   1    .503     .357        0       .497     2.274       -1.254
45     0           1       .277   1    .612     1.179       0       .388     3.987       -1.742

For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated data, squared Mahalanobis distance is based on observations.

** Misclassified case

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

Classification Results

                                                             Predicted Group Membership
                       SEEN THRILLER MOVIE IN LAST YEAR      NO       YES      Total
Original          Count  NO                                  99       12       111
                         YES                                 29       14       43
                         Ungrouped cases                     60       14       74
                  %      NO                                  89.2     10.8     100.0
                         YES                                 67.4     32.6     100.0
                         Ungrouped cases                     81.1     18.9     100.0
Cross-validated   Count  NO                                  99       12       111
                         YES                                 30       13       43
                  %      NO                                  89.2     10.8     100.0
                         YES                                 69.8     30.2     100.0

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

b 73.4% of original grouped cases correctly classified.

c 72.7% of cross-validated grouped cases correctly classified.
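The two accuracy figures in footnotes b and c can be reproduced from the counts in the Classification Results table: the correctly classified cases are those on the diagonal of the actual-by-predicted table, and the ungrouped cases are left out. A minimal sketch, with a hypothetical function name:

```python
def hit_rate(confusion):
    """Share of cases on the diagonal of an actual-by-predicted count table."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows are actual groups (NO, YES); columns are predicted groups (NO, YES).
original = [[99, 12], [29, 14]]
cross_validated = [[99, 12], [30, 13]]

print(f"original: {hit_rate(original):.1%}")
print(f"cross-validated: {hit_rate(cross_validated):.1%}")
```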

Question: Variables included in the analysis satisfy the level of measurement requirements?

Question: Number of variables and cases satisfy sample size requirements?

Sufficient statistically significant functions to differentiate among groups?

Groups defined by dependent variable differentiated by discriminant functions?

Interpretation of relationship between independent variable and dependent variable groups?

Classification accuracy sufficient to be characterized as a useful model?

Summary of findings correctly stated, including cautions?

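Ratio of valid cases to independent variables: 20:1 (ideal), 5:1 (acceptable). In our case: 154:4, i.e., 38.5:1.

The number of cases in the smallest group (at least 20) must exceed the number of independent variables. In our case: 43 cases in the smallest group and 4 independent variables.

Wilks' Lambda (relative measure) = WSS/TSS. Values close to 0 indicate that the group means differ; values close to 1 indicate that they do not.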
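The sample size rules summarised above can be collected into one small check. This is a sketch under the thresholds stated in this document (5:1 minimum, 20:1 preferred, at least 20 cases in the smallest group, and more cases in the smallest group than independent variables); the function name and the dictionary keys are hypothetical.

```python
def sample_size_checks(valid_cases, n_predictors, smallest_group,
                       min_ratio=5.0, preferred_ratio=20.0, min_group=20):
    """Evaluate the sample size rules of thumb used in this walkthrough."""
    ratio = valid_cases / n_predictors
    return {
        "ratio": ratio,
        "meets_minimum_ratio": ratio >= min_ratio,
        "meets_preferred_ratio": ratio >= preferred_ratio,
        "smallest_group_large_enough": smallest_group >= min_group,
        "smallest_group_exceeds_predictors": smallest_group > n_predictors,
    }


# Thriller-movie example: 154 valid cases, 4 predictors, smallest group of 43.
checks = sample_size_checks(valid_cases=154, n_predictors=4, smallest_group=43)

for name, value in checks.items():
    print(f"{name}: {value}")
```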