mda output interpretation
Discriminant
Multiple Discriminant Analysis: SPSS Output
After converting the data to SPSS format, click Analyze, then Classify, then Discriminant.
A box titled Discriminant Analysis will come up. Click on Grouping Variable; this is the non-metric dependent variable. Define its range. Then enter the independent variables. There are two methods for MDA: the Enter method and the Stepwise method. We will start with the Enter method.
Next click on Statistics. There are three headings here: Descriptives (click all boxes), Function Coefficients (click none) and Matrices (click none). Under Descriptives we have Means, Univariate ANOVAs, and Box's M.
Means: We use the group means for interpretation as in the HATB example.
Univariate ANOVAs: These tests suggest which variables might be useful discriminators.
Box's M: A test of equality of the group variance-covariance matrices. For sufficiently large samples, a high p-value signifies that there is insufficient evidence that the matrices differ.
H0: the variance/covariance matrices of independent variables across groups are the same;
H1: the variance/covariance matrices across groups are different.
Click continue and go to Classify.
Prior Probabilities: compute from group sizes: This incorporates the sizes of the groups as defined by the dependent variable into the classification of the cases using the discriminant functions.
Display: Casewise results: This will give you the classification details for each case in the output.
Display: Summary table: This will include summary tables comparing actual and predicted classification.
Display: Leave-one-out classification: This is to ask SPSS to include a cross-validation classification in the output. This option produces a less biased estimate of classification accuracy by sequentially holding each case out of the calculations for the discriminant functions, and using the derived functions to classify the case held out.
Use Covariance Matrix: Within-groups: The covariance matrices measure the dispersion in the groups defined by the dependent variable. If we fail the homogeneity of group variances test (Box's M), our option is to use the Separate-groups covariance matrices in classification. Hence, it is preferable that the null hypothesis is not rejected.
Plots: Combined-groups: This gives a visual plot of the relationship between the functions and the groups defined by the dependent variable.
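The leave-one-out option described above can be illustrated with a minimal sketch. It is not what SPSS runs internally: a nearest-centroid rule on a single variable stands in for the discriminant functions, priors are ignored, and the eight cases are hypothetical values chosen only to show how holding a case out can change its classification.

```python
from statistics import mean

def nearest_centroid(train, x):
    """Assign x to the group whose mean score in `train` is closest."""
    groups = {g for _, g in train}
    centroids = {g: mean(s for s, gg in train if gg == g) for g in groups}
    return min(centroids, key=lambda g: abs(x - centroids[g]))

def leave_one_out_accuracy(cases):
    """Classify each case with a rule built from all the other cases."""
    hits = 0
    for i, (x, actual) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]   # hold case i out
        hits += nearest_centroid(others, x) == actual
    return hits / len(cases)

# Hypothetical (score, group) pairs, for illustration only.
cases = [(1.0, "NO"), (1.5, "NO"), (2.0, "NO"), (2.4, "NO"),
         (2.6, "YES"), (3.0, "YES"), (3.5, "YES"), (4.0, "YES")]

# Accuracy when every case also helps build the rule that classifies it.
resubstitution = sum(nearest_centroid(cases, x) == g for x, g in cases) / len(cases)
cross_validated = leave_one_out_accuracy(cases)

print(f"original: {resubstitution:.1%}, cross-validated: {cross_validated:.1%}")
```

In this toy data the two borderline cases are classified correctly when they help define their own group centroid and incorrectly when they are held out, which is why the cross-validated rate is the less optimistic of the two.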
Introduction: Based on a discriminant analysis using the simultaneous method for including variables, age [age], highest year of school completed [educ], gender [gender], and total family income [incom98] were found to be useful in distinguishing between the groups defined by the dependent variable, seen thriller movie in last year [tmovie].
A discriminant function differentiated survey respondents who had not seen a thriller movie in the last year from survey respondents who had seen a thriller movie in the last year.
(I) Analysis Case Processing Summary
Unweighted Cases                                               N   Percent
Valid                                                        154      57.0
Excluded  Missing or out-of-range group codes                 74      27.4
          At least one missing discriminating variable        31      11.5
          Both missing or out-of-range group codes and
          at least one missing discriminating variable        11       4.1
          Total                                              116      43.0
Total                                                        270     100.0
Interpretation: The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 154 valid cases and 4 independent variables. The ratio of cases to independent variables is 38.5 to 1, which satisfies both the minimum requirement and the preferred ratio of 20 to 1. Now, let us go to Prior Probabilities for Groups.
Group Statistics (groups: SEEN THRILLER MOVIE IN LAST YEAR)
                                                                Valid N (listwise)
Group  Variable                            Mean  Std. Deviation    Unweighted    Weighted
NO     AGE OF RESPONDENT                   46.36           15.67           111     111.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.42            2.93           111     111.000
       RESPONDENTS GENDER                   1.67             .47           111     111.000
       TOTAL FAMILY INCOME                 15.73            5.42           111     111.000
YES    AGE OF RESPONDENT                   39.19           14.86            43      43.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.58            2.84            43      43.000
       RESPONDENTS GENDER                   1.33             .47            43      43.000
       TOTAL FAMILY INCOME                 16.60            4.44            43      43.000
Total  AGE OF RESPONDENT                   44.36           15.73           154     154.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.47            2.90           154     154.000
       RESPONDENTS GENDER                   1.57             .50           154     154.000
       TOTAL FAMILY INCOME                 15.97            5.16           154     154.000
Interpretation: (i) The average "age" for survey respondents who had not seen a thriller movie in the last year (mean=46.36) was higher than the average "age" for survey respondents who had seen a thriller movie in the last year (mean=39.19).
So, survey respondents who had not seen a thriller movie in the last year were older than survey respondents who had seen a thriller movie in the last year.
(ii) Since "gender" is a dichotomous variable, the mean is not directly interpretable. Its interpretation must take into account the coding, by which 1 corresponds to male and 2 corresponds to female. The higher mean for survey respondents who had not seen a thriller movie in the last year (mean=1.67, against an overall mean of 1.57), when compared with the mean for survey respondents who had seen a thriller movie in the last year (mean=1.33), implies that this group contained fewer survey respondents who were male and more survey respondents who were female.
Survey respondents who had not seen a thriller movie in the last year were more likely to be female than survey respondents who had seen a thriller movie in the last year. Let us now go to the next table, Tests of Equality of Group Means.
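The reading of the gender means can be made concrete with a short sketch. With a 1/2 coding (1 = male, 2 = female), the group mean minus 1 is the share of cases coded 2. The function name is an illustrative choice, and the means are the rounded values from the Group Statistics table, so the percentages are approximate.

```python
def share_coded_high(mean_value, low=1, high=2):
    """Share of cases with the higher code, given the mean of a two-valued variable."""
    return (mean_value - low) / (high - low)

# Rounded gender means from the Group Statistics table (1 = male, 2 = female).
gender_means = {"NO": 1.67, "YES": 1.33, "Total": 1.57}

for group, m in gender_means.items():
    female = share_coded_high(m)
    print(f"{group}: about {female:.0%} female, {1 - female:.0%} male")
```

On these figures roughly two thirds of the respondents who had not seen a thriller movie were female, against roughly one third of those who had.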
Tests of Equality of Group Means
                                   Wilks' Lambda        F   df1   df2   Sig.
AGE OF RESPONDENT                           .958    6.684     1   152   .011
HIGHEST YEAR OF SCHOOL COMPLETED            .999     .091     1   152   .763
RESPONDENTS GENDER                          .904   16.069     1   152   .000
TOTAL FAMILY INCOME                         .994     .889     1   152   .347
Interpretation: As we know, Wilks' Lambda tests the equality of the group means on each independent variable, together with its statistical significance. In this case, we notice that gender and age have the better (smaller) values of the Wilks' lambda statistic, .904 and .958, with significance values of .000 and .011 respectively.
Casewise Statistics (Original)
Highest group: Pred. = predicted group; p and df = P(D>d | G=g); P(G|D) = P(G=g | D=d); Sq. dist = squared Mahalanobis distance to centroid. Second highest group: 2nd = group, followed by its P(G=g | D=d) and squared Mahalanobis distance to centroid. Func 1 and Func 2 = discriminant scores on Function 1 and Function 2.
Case  Actual     Pred.      p  df  P(G|D) Sq. dist  2nd  P(G|D) Sq. dist   Func 1   Func 2
   1  2              2   .662   2    .484     .826    1    .393    1.473     .867     .774
   2  2              2   .233   2    .691    2.917    1    .232    5.331    2.076     .479
   5  1              2   .738   2    .535     .607    1    .336    1.764    1.098     .395
   6  1              1   .280   2    .636    2.549    3    .224    3.514   -1.645     .956
   8  2              2   .950   2    .445     .103    1    .387     .606     .558     .269
  12  3              1   .953   2    .490     .097    2    .273    1.036    -.531     .254
  13  1              1   .680   2    .559     .772    3    .235    1.388   -1.032     .570
  14  1              1   .953   2    .490     .097    2    .273    1.036    -.531     .254
  42  1              1   .941   2    .437     .122    2    .371     .223     .120     .311
  43  3              1   .272   2    .660    2.603    3    .187    4.007   -1.438    1.293
  44  2              1   .970   2    .472     .061    2    .333     .533    -.102     .451
  45  1              1   .447   2    .602    1.611    3    .230    2.418   -1.366     .781
  47  2              1   .999   2    .465     .003    2    .320     .523    -.205     .283
  48  2              1   .987   2    .449     .025    2    .316     .500    -.253     .079
  49  1              1   .738   2    .565     .607    2    .239    2.100    -.715     .837
  50  ungrouped      3   .000   2    .818   16.603    1    .127   21.444   -2.531   -3.778
  51  3              2   .165   2    .482    3.598    3    .315    3.556     .782   -1.898
  53  1              2   .535   2    .539    1.253    1    .355    2.319    1.193     .802
For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated data, squared Mahalanobis distance is based on observations.
** Misclassified case
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
Classification Results
                                         Predicted Group Membership
                        WELFARE            TOO LITTLE  ABOUT RIGHT  TOO MUCH  Total
Original         Count  TOO LITTLE                 43           15         6     64
                        ABOUT RIGHT                26           30         6     62
                        TOO MUCH                   17           10         9     36
                        Ungrouped cases             3            3         2      8
                 %      TOO LITTLE               67.2         23.4       9.4  100.0
                        ABOUT RIGHT              41.9         48.4       9.7  100.0
                        TOO MUCH                 47.2         27.8      25.0  100.0
                        Ungrouped cases          37.5         37.5      25.0  100.0
Cross-validated  Count  TOO LITTLE                 43           15         6     64
                        ABOUT RIGHT                26           30         6     62
                        TOO MUCH                   17           11         8     36
                 %      TOO LITTLE               67.2         23.4       9.4  100.0
                        ABOUT RIGHT              41.9         48.4       9.7  100.0
                        TOO MUCH                 47.2         30.6      22.2  100.0
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b 50.6% of original grouped cases correctly classified.
c 50.0% of cross-validated grouped cases correctly classified.
Interpretation: The cross-validated accuracy rate computed by SPSS was 50.0%, which was greater than the proportional by-chance accuracy criterion of 43.7% (1.25 x 35.0% = 43.7%). The criterion for classification accuracy is satisfied.
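The by-chance comparison above can be sketched in a few lines. The sketch assumes the usual definition of the proportional by-chance rate as the sum of squared group proportions, and takes the group sizes 64, 62 and 36 from the Total column of the Classification Results table; the function names are illustrative. Computed from those counts the rate comes to about 35.2%, marginally above the 35.0% quoted in the text, and the conclusion is the same either way.

```python
def proportional_by_chance(group_sizes):
    """Sum of squared group proportions (assumed definition of the by-chance rate)."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)

def meets_accuracy_criterion(accuracy, by_chance, margin=1.25):
    """True when accuracy is at least 25% better than the by-chance rate."""
    return accuracy >= margin * by_chance

# Figures quoted in the interpretation above.
quoted_ok = meets_accuracy_criterion(accuracy=0.500, by_chance=0.350)

# Same check with the by-chance rate recomputed from the table's group sizes.
welfare_sizes = [64, 62, 36]          # TOO LITTLE, ABOUT RIGHT, TOO MUCH
by_chance = proportional_by_chance(welfare_sizes)
recomputed_ok = meets_accuracy_criterion(accuracy=0.500, by_chance=by_chance)

print(f"by chance: {by_chance:.1%}, criterion: {1.25 * by_chance:.1%}")
print(f"criterion met (quoted): {quoted_ok}, (recomputed): {recomputed_ok}")
```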
Conclusion: Hours worked, self-employment, and education were the three independent variables we identified as strong contributors to distinguishing between the groups defined by the dependent variable.
The model was characterized as useful because its cross-validated accuracy exceeded the by-chance accuracy criterion.
The summary correctly states the specific relationships between the dependent variable groups and the independent variables we interpreted.
Survey respondents who thought we spend about the right amount of money on welfare worked fewer hours in the past week than survey respondents who thought we spend too little or too much money on welfare. Survey respondents who thought we spend too little money on welfare were less likely to be self-employed than survey respondents who thought we spend too much money on welfare. Survey respondents who thought we spend about the right amount of money on welfare had completed more years of school than survey respondents who thought we spend too little or too much money on welfare.
Question: Variables included in the analysis satisfy the level of measurement requirements?
Question: Number of variables and cases satisfy sample size requirements?
Sufficient statistically significant functions to differentiate among groups?
Groups defined by dependent variable differentiated by discriminant functions?
Interpretation of relationship between independent variable and dependent variable groups?
Classification accuracy sufficient to be characterized as a useful model?
Summary of findings correctly stated, including cautions?
Analysis 1
Box's Test of Equality of Covariance Matrices
Log Determinants
SEEN THRILLER MOVIE IN LAST YEAR   Rank   Log Determinant
NO                                    4             9.176
YES                                   4             8.573
Pooled within-groups                  4             9.077
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box's M               10.220
F    Approx.            .983
     df1                  10
     df2           30348.624
     Sig.               .455
Tests null hypothesis of equal population covariance matrices.
Interpretation: This is a test of the equality of the variance-covariance matrices, as mentioned above. It is an overall judgement on all the independent variables taken together.
Box's M = 10.220
Approx. F = 0.983
Conclusion: F-tab = FINV(0.05,10,30349) = 1.83. As the calculated F is much lower than the tabular F, the null hypothesis is not rejected. This is also confirmed by the p-value (Sig. = .455). Therefore, we find no evidence against homogeneity of the group covariance matrices.
Summary of Canonical Discriminant Functions
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1                .169           100.0          100.0                    .380
a First 1 canonical discriminant functions were used in the analysis.
Interpretation: There is one eigenvalue for each discriminant function. It depicts the relative discriminatory power of the discriminant functions. With two groups there is only one function, so the comparison is of little use; with more than two groups, it shows which function discriminates better.
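What Is an Eigenvalue in MDA?
In the present example there are two groups, i.e., seen thriller movie last year = 1 and not seen thriller movie last year = 0. As we know, when there are two groups only one discriminant function can be extracted from the data, and its eigenvalue (λ) is interpreted as follows.
In simple language, in the two-group case, we can define the Between Sum of Squares (i.e., the Sum of Squares Across Groups) of the discriminant scores D as follows:
\[ BSS = n_0 (\bar{D}_0 - \bar{D})^2 + n_1 (\bar{D}_1 - \bar{D})^2 \]
Within Sum of Squares (WSS) can be given as follows:
\[ WSS = \sum_{i \in G_0} (D_i - \bar{D}_0)^2 + \sum_{i \in G_1} (D_i - \bar{D}_1)^2 \]
Here n_g is the number of cases in group g, \bar{D}_g is the mean discriminant score of group g, and \bar{D} is the overall mean score.
TSS (Total Sum of Squares) = BSS + WSS
The eigenvalue is defined as λ = BSS / WSS. So, if λ = 0.00, the model has no discriminatory power (as BSS = 0). The larger the value of λ, the greater the discriminatory power of the model.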
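A small sketch can tie the eigenvalue to two other figures in this output. It takes the eigenvalue as BSS/WSS and assumes the standard single-function relations: canonical correlation = sqrt(BSS/TSS) and Wilks' Lambda = WSS/TSS. The function names are illustrative; the input .169 is the eigenvalue reported in the Eigenvalues table.

```python
import math

def canonical_correlation(eigenvalue):
    """sqrt(BSS / TSS), which equals sqrt(lambda / (1 + lambda))."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))

def wilks_lambda(eigenvalue):
    """WSS / TSS for a single function, which equals 1 / (1 + lambda)."""
    return 1 / (1 + eigenvalue)

eigenvalue = 0.169   # from the Eigenvalues table

print(f"canonical correlation: {canonical_correlation(eigenvalue):.3f}")  # 0.380
print(f"Wilks' Lambda:         {wilks_lambda(eigenvalue):.3f}")           # 0.855
```

Both results agree with the reported canonical correlation (.380) and Wilks' Lambda (.855), so the three statistics are different views of the same BSS and WSS split.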
Two groups can be separated by one discriminant function. Three groups require two discriminant functions. The required number of functions is usually one less than the number of groups.
With 4 independent variables and 2 groups defined by the dependent variable, the maximum possible number of discriminant functions was 1, which accounts for 100% of the variation by itself. (Cross-check this with the 3-group case.)
The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function. Now let us go to Functions at Group Centroids.
(III) Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                              .855       23.440    4   .000
Interpretation: The overall relationship in discriminant analysis is based on the existence of sufficient statistically significant discriminant functions to separate all of the groups defined by the dependent variable. As we see, the observed chi-square (23.440) falls within the critical region: it exceeds the upper critical value =CHIINV(0.025,4) = 11.1433 (the lower critical value is =CHIINV(0.975,4) = 0.484). The significance value reported in the table (Sig. = .000) confirms that the discriminant function is statistically significant.
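The chi-square in the Wilks' Lambda table can be reproduced approximately with a short sketch. It assumes the usual Bartlett approximation, chi-square = -(n - 1 - (p + g)/2) ln(Lambda), with n = 154 valid cases, p = 4 independent variables and g = 2 groups, and uses the closed-form upper-tail probability of a chi-square with 4 degrees of freedom. Because Lambda is entered as the rounded .855, the result differs slightly from the reported 23.440.

```python
import math

def bartlett_chi_square(wilks, n_cases, n_vars, n_groups):
    """Bartlett's approximation (assumed here) for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_vars + n_groups) / 2) * math.log(wilks)

def chi2_upper_tail_df4(x):
    """P(chi-square with 4 df > x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

chi_square = bartlett_chi_square(wilks=0.855, n_cases=154, n_vars=4, n_groups=2)
p_value = chi2_upper_tail_df4(23.440)      # chi-square value reported by SPSS
tail_at_cutoff = chi2_upper_tail_df4(11.1433)

print(f"chi-square from rounded Lambda: {chi_square:.2f}")
print(f"p-value for 23.440 with 4 df:   {p_value:.6f}")
print(f"upper tail at 11.1433:          {tail_at_cutoff:.4f}")
```

The p-value is far below .001, which SPSS displays as .000, and the upper-tail area at 11.1433 is about .025, matching the CHIINV(0.025,4) figure quoted above.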
Casewise Statistics (Original)
Highest group: Pred. = predicted group; p and df = P(D>d | G=g); P(G|D) = P(G=g | D=d); Sq. dist = squared Mahalanobis distance to centroid. Second highest group: 2nd = group, followed by its P(G=g | D=d) and squared Mahalanobis distance to centroid. Func 1 = discriminant score on Function 1.
Case  Actual     Pred.      p  df  P(G|D) Sq. dist  2nd  P(G|D) Sq. dist   Func 1
   1  1              0   .206   1    .552    1.602    1    .448     .126   -1.011
   2  0              0   .393   1    .642     .730    1    .358     .003    -.600
   3  0              0   .601   1    .863     .273    1    .137    2.054     .777
   4  0              0   .090   1    .948    2.877    1    .052    6.796    1.950
   6  0              1   .388   1    .563     .745    0    .437    3.147   -1.520
   9  0              0   .012   1    .975    6.319    1    .025   11.727    2.768
  10  ungrouped      0   .056   1    .957    3.647    1    .043    7.955    2.164
  11  1              1   .435   1    .544     .610    0    .456    2.863   -1.438
  13  0              0   .374   1    .898     .791    1    .102    3.240    1.143
  14  ungrouped      0   .836   1    .825     .043    1    .175    1.248     .461
  15  0              0   .069   1    .953    3.298    1    .047    7.436    2.070
  16  ungrouped      0   .058   1    .957    3.601    1    .043    7.887    2.152
  17  0              0   .938   1    .807     .006    1    .193     .976     .332
  18  ungrouped      1   .383   1    .565     .761    0    .435    3.179   -1.529
  19  0              0   .537   1    .690     .381    1    .310     .086    -.363
  20  ungrouped      1   .302   1    .600    1.066    0    .400    3.776   -1.689
  21  1              1   .451   1    .538     .568    0    .462    2.771   -1.410
  22  ungrouped      1   .426   1    .548     .635    0    .452    2.916   -1.453
  23  0              0   .312   1    .908    1.024    1    .092    3.696    1.266
  29  0              0   .964   1    .789     .002    1    .211     .749     .209
  30  ungrouped      0   .456   1    .885     .556    1    .115    2.745    1.000
  31  1              0   .217   1    .559    1.525    1    .441     .105    -.980
  32  0              0   .732   1    .741     .118    1    .259     .322    -.089
  33  1              0   .898   1    .815     .016    1    .185    1.079     .382
  34  1              0   .783   1    .834     .076    1    .166    1.407     .530
  35  1              0   .359   1    .629     .842    1    .371     .000    -.663
  36  0              0   .744   1    .840     .107    1    .160    1.531     .581
  39  0              0   .989   1    .798     .000    1    .202     .855     .268
  40  0              0   .915   1    .780     .011    1    .220     .646     .148
  41  0              0   .498   1    .879     .459    1    .121    2.523     .932
  42  0              0   .228   1    .566    1.452    1    .434     .086    -.950
  43  0              1   .459   1    .535     .549    0    .465    2.729   -1.398
  44  ungrouped      1   .550   1    .503     .357    0    .497    2.274   -1.254
  45  0              1   .277   1    .612    1.179    0    .388    3.987   -1.742
For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated data, squared Mahalanobis distance is based on observations.
** Misclassified case
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
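The casewise columns can be related to one another with a brief sketch. It assumes a pooled within-groups covariance matrix and priors computed from group sizes (111 and 43 of the 154 valid cases), so that the posterior P(G=g | D=d) is proportional to prior x exp(-D²/2), and it treats P(D>d | G=g) as the upper tail of a chi-square with 1 degree of freedom. The function names are illustrative; the distances are those of case 1 in the thriller movie analysis.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2))

def posterior_probabilities(squared_distances, priors):
    """Posterior for each group, proportional to prior * exp(-D^2 / 2)."""
    weights = {g: priors[g] * math.exp(-squared_distances[g] / 2) for g in priors}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Priors from group sizes: 0 = not seen (111 cases), 1 = seen (43 cases).
priors = {0: 111 / 154, 1: 43 / 154}

# Case 1: squared Mahalanobis distances to the two group centroids.
case_1 = {0: 1.602, 1: 0.126}

posterior = posterior_probabilities(case_1, priors)
tail = chi2_upper_tail_df1(case_1[0])

print(f"P(G=0 | D=d) = {posterior[0]:.3f}, P(G=1 | D=d) = {posterior[1]:.3f}")
print(f"P(D>d | G=0) = {tail:.3f}")
```

Under these assumptions case 1 is assigned to group 0 even though it lies closer to the centroid of group 1, because the larger prior for group 0 outweighs the difference in distance.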
Classification Results
                                                      Predicted Group Membership
                        SEEN THRILLER MOVIE IN LAST YEAR      NO     YES   Total
Original         Count  NO                                    99      12     111
                        YES                                   29      14      43
                        Ungrouped cases                       60      14      74
                 %      NO                                  89.2    10.8   100.0
                        YES                                 67.4    32.6   100.0
                        Ungrouped cases                     81.1    18.9   100.0
Cross-validated  Count  NO                                    99      12     111
                        YES                                   30      13      43
                 %      NO                                  89.2    10.8   100.0
                        YES                                 69.8    30.2   100.0
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b 73.4% of original grouped cases correctly classified.
c 72.7% of cross-validated grouped cases correctly classified.
Question: Variables included in the analysis satisfy the level of measurement requirements?
Question: Number of variables and cases satisfy sample size requirements?
Sufficient statistically significant functions to differentiate among groups?
Groups defined by dependent variable differentiated by discriminant functions?
Interpretation of relationship between independent variable and dependent variable groups?
Classification accuracy sufficient to be characterized as a useful model?
Summary of findings correctly stated, including cautions?
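Ratio of valid cases to independent variables: 20:1 (ideal) and 5:1 (acceptable). In our case: 154:4, i.e., 38.5:1.
Minimum number of cases in the smallest group (at least 20) must exceed the number of independent variables. In our case: 43 cases and 4 variables.
Wilks' Lambda (relative measure) = WSS/TSS; the more the group means differ, the closer it is to 0.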
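The sample-size rules of thumb in these notes can be checked with a minimal sketch. The thresholds (5:1 minimum, 20:1 preferred, at least 20 cases in the smallest group) are the ones quoted in this document; the function and key names are illustrative, and the smallest group size of 43 is taken from the Group Statistics table.

```python
def sample_size_checks(valid_cases, n_predictors, smallest_group,
                       minimum_ratio=5, preferred_ratio=20, min_group_size=20):
    """Apply the cases-to-variables and smallest-group rules of thumb."""
    ratio = valid_cases / n_predictors
    return {
        "ratio": ratio,
        "meets_minimum_ratio": ratio >= minimum_ratio,
        "meets_preferred_ratio": ratio >= preferred_ratio,
        "smallest_group_ok": (smallest_group >= min_group_size
                              and smallest_group > n_predictors),
    }

# Thriller movie analysis: 154 valid cases, 4 predictors, smallest group = 43.
checks = sample_size_checks(valid_cases=154, n_predictors=4, smallest_group=43)

for name, value in checks.items():
    print(f"{name}: {value}")
```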