Logistic Regression and Discriminant Function Analysis
Logistic Regression vs. Discriminant Function Analysis
• Similarities
– Both predict group membership for each observation (classification)
– Dichotomous DV
– Both require an estimation and a validation sample to assess predictive accuracy
– If the split between groups is not more extreme than 80/20, the two yield similar results in practice
Logistic Reg vs. Discrim: Differences
• Discriminant Analysis
– Assumes MV normality
– Assumes equality of VCV matrices
– A large number of predictors violates MV normality and can't be accommodated
– Predictors must be continuous, interval level
– Categorical IVs create problems
– More powerful when assumptions are met
– Many assumptions, rarely met in practice
• Logistic Regression
– No assumption of MV normality
– No assumption of equality of VCV matrices
– Can accommodate large numbers of predictors more easily
– Categorical predictors OK; categorical IVs can be dummy coded
– Less powerful when assumptions are met
– Few assumptions, typically met in practice
Logistic Regression
• Outline:
– Categorical Outcomes: Why not OLS Regression?
– General Logistic Regression Model
– Maximum Likelihood Estimation
– Model Fit
– Simple Logistic Regression
Categorical Outcomes: Why not OLS Regression?
• Dichotomous outcomes:
– Passed / Failed
– CHD / No CHD
– Selected / Not Selected
– Quit / Did Not Quit
– Graduated / Did Not Graduate
• Example: Relationship b/w performance and turnover
Categorical Outcomes: Why not OLS Regression?
[Scatterplot: Turnover (0/1) on the y-axis vs. Performance (1.5 to 5.0) on the x-axis, with an OLS line of best fit]
• Line of best fit?!
• Errors (Y − Y') across values of performance (X)?
Problems with Dichotomous Outcomes/DVs
• The regression surface is intrinsically non-linear
• Errors assume one of only two possible values, violating the assumption of normally distributed errors
• Violates the assumption of homoscedasticity
• Predicted values of Y greater than 1 and smaller than 0 can be obtained
• The true magnitude of the effects of IVs may be greatly underestimated
• Solution: Model the data using Logistic Regression, NOT OLS Regression
• Logistic regression predicts the probability that an event will occur
– Range of possible responses is between 0 and 1
– Must use an s-shaped curve to fit the data
• OLS regression assumes linear relationships and can't fit an s-shaped curve
– Violates the normality assumption
– Creates heteroscedasticity
Logistic Regression vs. Regression
Example: Relationship b/w Age and CHD (1 = Has CHD)
General Logistic Regression Model
• Y' (outcome variable) is the probability of having one outcome or the other, based on a nonlinear function of the best linear combination of predictors:

Y' = \frac{e^{a + b_1 X_1}}{1 + e^{a + b_1 X_1}}

Where:
• Y' = probability of an event
• The linear portion of the equation (a + b_1 X_1) is used to predict the probability of the event (0, 1); it is not an end in itself
The logistic (logit) transformation
• The DV is dichotomous, so the purpose is to estimate the probability of occurrence (0, 1)
– Thus, the DV is transformed into a likelihood
• The logit/logistic transformation accomplishes this (the linear regression equation predicts the log of the odds):

odds = \frac{P(Y=1)}{P(Y=0)} = \frac{P(Y=1)}{1 - P(Y=1)}

\log(odds) = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = \ln\left(\frac{Y'}{1-Y'}\right) = A + \sum_j B_j X_{ij}
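As a quick illustration (values chosen for clarity, not from the slides): if P(Y = 1) = .80, then odds = .80/.20 = 4.0 and logit(P) = ln(4.0) ≈ 1.39; if P(Y = 1) = .50, then odds = 1.0 and logit(P) = 0.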
Probability Calculation

P(Y)' = \frac{e^{a + bX}}{1 + e^{a + bX}}

Where:
• The relation b/w logit(P) and X is intrinsically linear
• b = expected change in logit(P) given a one-unit change in X
• a = intercept
• e = the exponential constant
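To see the two transformations in action, here is a minimal Python sketch (not from the original slides; the intercept and slope values are hypothetical):

import math

def logit_to_prob(a, b, x):
    """Probability of the event: P = e^(a+bX) / (1 + e^(a+bX))."""
    z = a + b * x                      # the linear portion (the logit)
    return math.exp(z) / (1 + math.exp(z))

def prob_to_logit(p):
    """Inverse transformation: logit(P) = ln(P / (1 - P))."""
    return math.log(p / (1 - p))

p = logit_to_prob(a=-3.0, b=0.5, x=4.0)          # P ≈ .269
print(round(p, 3), round(prob_to_logit(p), 3))   # recovers the logit a + bX = -1.0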
Ordinary Least Squares (OLS) Estimation
• Purpose is to obtain the estimates that best minimize the sum of squared errors, Σ(Y − Y')²
• The estimates chosen best describe the relationships among the observed variables (IVs and DV)
• The estimates chosen maximize the probability of obtaining the observed data (i.e., these are the population values most likely to have produced the data at hand)
• OLS can't be used in logistic regression because of the non-linear nature of the relationships
Maximum Likelihood (ML) Estimation
• In ML, the purpose is to obtain the parameter estimates most likely to have produced the data
– ML estimators are those with the greatest joint likelihood of reproducing the data
• In logistic regression, each model yields an ML joint probability (likelihood) value
• Because this value tends to be very small (e.g., .00000015), it is transformed by taking −2 times its natural log (the −2 log likelihood)
• The −2 log transformation also yields a statistic with a known distribution (the chi-square distribution)
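Using the slide's example likelihood value: −2LL = −2 × ln(.00000015) ≈ −2 × (−15.71) ≈ 31.4, a much more workable number.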
Model Fit
• In Logistic Regression, R & R² don't make sense
• Evaluate model fit using the −2 log likelihood (−2LL) value obtained for each model (through ML estimation)
– The −2LL value reflects the fit of the model; used to compare the fit of nested models
– The −2LL measures lack of fit: the extent to which the model fits the data poorly
– When the model fits the data perfectly, −2LL = 0
• Ideally, the −2LL value for the null model (i.e., the model with no predictors, or "intercept-only" model) will be larger than that of the model with predictors
Comparing Model Fit
The fit of the null model can be tested against the fit of the model with predictors using a chi-square test:

\chi^2 = (-2LL_{Null}) - (-2LL_{Model})

Where:
• χ² = chi-square for the improvement in model fit, with df = k_Model − k_Null (the difference in the number of predictors)
• −2LL_Null = −2 log likelihood value for the null (intercept-only) model
• −2LL_Model = −2 log likelihood value for the hypothesized model
• The same test can be used to compare a nested model with k predictors to a model with k+1 predictors, etc.
• Same logic as OLS regression, but the models are compared using a different fit index (−2LL)
Pseudo R²
• Assessment of overall model fit
• Calculation:

R^2 = \frac{(-2LL_{Null}) - (-2LL_{Model})}{-2LL_{Null}}

• Two primary Pseudo R² stats:
– Nagelkerke: less conservative
• preferred by some because max = 1
– Cox & Snell: more conservative
• Interpret like R² in OLS regression
Unique Prediction
• In OLS regression, the significance tests for the beta weights indicate whether each IV is a unique predictor
• In Logistic regression, the Wald test is used for the same purpose

Similarities to Regression
• You can use all of the following procedures you learned about in OLS regression in logistic regression:
– Dummy coding for categorical IVs
– Hierarchical entry of variables (compare changes in % classification; significance of the Wald test)
– Stepwise (but don't use it; it's atheoretical)
– Moderation tests
Simple Logistic Regression Example
• Data collected from 50 employees
• Y = success in training program (1 = pass; 0 = fail)
• X1 = Job aptitude score (5 = very high; 1 = very low)
• X2 = Work-related experience (months)
Syntax in SPSS
LOGISTIC REGRESSION PASS                    (PASS = DV)
 /METHOD = ENTER APT EXPER                  (APT, EXPER = IVs)
 /SAVE = PRED PGROUP
 /CLASSPLOT
 /PRINT = GOODFIT
 /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
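For readers working outside SPSS, a rough Python analogue of this analysis using statsmodels is sketched below; the data are simulated stand-ins, since the original file is not available:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
# Synthetic stand-ins for the example's variables (values are made up)
apt = rng.integers(1, 6, n)                  # job aptitude, 1-5
exper = rng.integers(1, 37, n)               # months of experience
p = 1 / (1 + np.exp(-(-3.0 + 0.5 * apt + 0.1 * exper)))
passed = rng.binomial(1, p)                  # 1 = pass, 0 = fail

X = sm.add_constant(pd.DataFrame({"apt": apt, "exper": exper}))
model = sm.Logit(passed, X).fit()            # ML estimation

print(model.summary())                       # B, SE, z (z squared = Wald), p
print(np.exp(model.params))                  # Exp(B), the odds ratios
print(-2 * model.llf)                        # the -2LL value SPSS reports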
Results
• Block 0: The Null Model results
– Can't do any worse than this
• Block 1: Method = Enter
– Tests of the model of interest
– Interpret data from here
Omnibus Tests of Model Coefficients
Step 1        Chi-square   df   Sig.
  Step           10.169     2   .006
  Block          10.169     2   .006
  Model          10.169     2   .006

Tests whether the model is significantly better than the null model. A significant chi-square means yes!
Step, Block & Model yield the same results because all IVs were entered in the same block.
Results Continued

Model Summary
Step   -2 Log likelihood   Cox & Snell R²   Nagelkerke R²
1          59.066a             .184             .245
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

-2 Log Likelihood: an index of fit; a smaller number means better fit (perfect fit = 0)
Pseudo R²: interpret like R² in regression
Nagelkerke is preferred by some because its max = 1; Cox & Snell is uniformly the more conservative estimate
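Connecting the two outputs: the omnibus chi-square equals the drop in −2LL from the null model, so −2LL_Null = 59.066 + 10.169 = 69.235, and χ² = 69.235 − 59.066 = 10.169 (df = 2, p = .006). Plugging these into the −2LL-based pseudo R² formula shown earlier gives 10.169 / 69.235 ≈ .15; Cox & Snell (.184) and Nagelkerke (.245) are computed from different formulas, which is why the three values differ.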
Classification: Null Model vs. Model Tested

Null Model (Step 0)
Observed      Predicted fail   Predicted pass   % Correct
fail                0                24              .0
pass                0                26            100.0
Overall %                                            52.0
a. Constant is included in the model.  b. The cut value is .500

Model Tested (Step 1)
Observed      Predicted fail   Predicted pass   % Correct
fail               16                 8             66.7
pass                6                20             76.9
Overall %                                            72.0
a. The cut value is .500

Null Model: 52% correct classification
Model Tested: 72% correct classification
Variables in the Equation
             B       S.E.    Wald    df   Sig.   Exp(B)
APT         .549     .235    5.473    1   .019    1.731
EXPER       .111     .052    4.577    1   .032    1.118
Constant  -3.050    1.146    7.086    1   .008     .047
a. Variable(s) entered on step 1: APT, EXPER.

B: the effect of a one-unit change in the IV on the log odds (hard to interpret)
Odds Ratio (OR), Exp(B) in SPSS: more interpretable; a one-unit change in aptitude multiplies the odds of passing by about 1.73
Wald: like a t test, but evaluated against the chi-square distribution
Sig.: determines whether the Wald test is significant
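The fitted equation can also be used to compute a predicted probability. For a hypothetical employee (values chosen for illustration) with APT = 4 and EXPER = 12 months:

logit = −3.050 + .549(4) + .111(12) = .478
P(pass) = e^{.478} / (1 + e^{.478}) ≈ 1.613 / 2.613 ≈ .62

Since .62 exceeds the .50 cut value, this employee would be classified as a "pass."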
Histogram of Predicted Probabilities

To Flag Misclassified Cases
SPSS syntax:
COMPUTE PRED_ERR=0.
IF (PASS NE PGR_1) PRED_ERR=1.

You can use this variable in additional analyses to explore causes of misclassification (PGR_1 is the predicted group saved by /SAVE = PGROUP).
Results Continued

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1         6.608      8   .579

An index of model fit. The chi-square compares the observed events in the data with the events predicted by the model. The nonsignificant result means that the observed and expected values are similar, and this is good!
Hierarchical Logistic Regression
• Question: Which of the following variables predict whether a woman is hired to be a Hooters girl?
– Age
– IQ
– Weight
Simultaneous v. Hierarchical

Hierarchical entry:

Block 1. IQ
Omnibus Tests of Model Coefficients
Step 1        Chi-square   df   Sig.
  Step            .289      1   .591
  Block           .289      1   .591
  Model           .289      1   .591
Cox & Snell R² = .002; Nagelkerke R² = .003

Block 2. Age
Omnibus Tests of Model Coefficients
Step 1        Chi-square   df   Sig.
  Step          42.044      1   .000
  Block         42.044      1   .000
  Model         42.333      2   .000
Cox & Snell R² = .264; Nagelkerke R² = .353

Block 3. Weight
Omnibus Tests of Model Coefficients
Step 1        Chi-square   df   Sig.
  Step           6.129      1   .013
  Block          6.129      1   .013
  Model         48.462      3   .000
Cox & Snell R² = .296; Nagelkerke R² = .395

Simultaneous entry (Block 1. IQ, Age, Weight)
Omnibus Tests of Model Coefficients
Step 1        Chi-square   df   Sig.
  Step          48.462      3   .000
  Block         48.462      3   .000
  Model         48.462      3   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R²   Nagelkerke R²
1         142.383a             .296             .395
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
Simultaneous v. Hierarchical: Classification (cut value = .500 throughout)

Block 1. IQ
Observed      Predicted not hired   Predicted hired   % Correct
not hired              8                  57              12.3
hired                  6                  67              91.8
Overall %                                                 54.3

Block 2. Age
Observed      Predicted not hired   Predicted hired   % Correct
not hired             55                  10              84.6
hired                 28                  45              61.6
Overall %                                                 72.5

Block 3. Weight
Observed      Predicted not hired   Predicted hired   % Correct
not hired             53                  12              81.5
hired                 26                  47              64.4
Overall %                                                 72.5

Block 1. IQ, Age, Weight (simultaneous)
Observed      Predicted not hired   Predicted hired   % Correct
not hired             53                  12              81.5
hired                 26                  47              64.4
Overall %                                                 72.5
Simultaneous v. Hierarchical: Variables in the Equation

Block 1. IQ
             B       S.E.    Wald     df   Sig.    Exp(B)
IQ          .006     .012     .289     1   .591     1.006
Constant   -.185     .585     .100     1   .752      .831
a. Variable(s) entered on step 1: IQ.

Block 2. Age
             B       S.E.    Wald     df   Sig.    Exp(B)
IQ         -.003     .014     .032     1   .858      .997
age        -.591     .120   24.220     1   .000      .554
Constant   6.484    1.533   17.899     1   .000    654.298
a. Variable(s) entered on step 1: age.

Block 3. Weight
             B       S.E.    Wald     df   Sig.    Exp(B)
IQ         -.009     .015     .372     1   .542      .991
age        -.591     .125   22.224     1   .000      .554
weight     -.277     .117    5.630     1   .018      .758
Constant   8.264    1.821   20.602     1   .000   3881.775
a. Variable(s) entered on step 1: IQ, age, weight.

Block 1. IQ, Age, Weight (simultaneous; identical estimates)
             B       S.E.    Wald     df   Sig.    Exp(B)
IQ         -.009     .015     .372     1   .542      .991
age        -.591     .125   22.224     1   .000      .554
weight     -.277     .117    5.630     1   .018      .758
Constant   8.264    1.821   20.602     1   .000   3881.775
a. Variable(s) entered on step 1: IQ, age, weight.
Multinomial Logistic Regression
• A form of logistic regression that allows prediction of the probability of membership in more than 2 groups
– Based on a multinomial distribution
• Sometimes called polytomous logistic regression
• Conducts an omnibus test first for each predictor across 3+ groups (like ANOVA)
– Then conducts pairwise comparisons (like post hoc tests in ANOVA)
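A minimal sketch of fitting such a model in Python with statsmodels' MNLogit, using simulated stand-in data (the group labels and predictor names are hypothetical):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 250
# Synthetic stand-ins for three predictors (values are made up)
gre = rng.normal(1300, 100, n)
pubs = rng.poisson(4, n)
years = rng.normal(6, 2, n)
group = rng.integers(0, 3, n)        # 3-category outcome, e.g., the job-search groups

X = sm.add_constant(np.column_stack([gre, pubs, years]))
model = sm.MNLogit(group, X).fit()   # one coefficient set per non-reference category

print(model.summary())               # each group is compared to the reference group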
Objectives of Discriminant Analysis
• Determining whether significant differences exist between average scores on a set of variables for 2+ a priori defined groups
• Determining which IVs account for most of the differences in average score profiles for 2+ groups
• Establishing procedures for classifying objects into groups based on scores on a set of IVs
• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of IVs
• Discriminant analysis develops a linear combination that can best separate groups.
• It is, in a sense, MANOVA in reverse:
– In MANOVA, groups are usually constructed by the researcher and have clear structure (e.g., a 2 × 2 factorial design). Groups = IVs
– In discriminant analysis, the groups usually have no particular structure and their formation is not under experimental control. Groups = DVs
Discriminant Analysis: How Discrim Works
• Linear combinations (discriminant functions) are formed that maximize the ratio of between-groups variance to within-groups variance for a linear combination of predictors.
• Total # of discriminant functions = # groups − 1 OR # of predictors (whichever is smaller)
• If more than one discriminant function is formed, subsequent discriminant functions are independent of prior combinations and account for as much remaining group variation as possible.
Assumptions in Discrim
• Multivariate normality of IVs
– Violation is more problematic if there is overlap between groups
• Homogeneity of VCV matrices
• Linear relationships
• IVs continuous (interval scale)
– Can accommodate nominal IVs, but they violate MV normality
• Single categorical DV
Results influenced by:
• Outliers (classification may be wrong)
• Multicollinearity (interpretation of coefficients is difficult)
Sample Size Considerations
• Observations per predictor:
– Suggested: 20 observations per predictor
– Minimum required: 5 observations per predictor
• Observations per group (in DV):
– Minimum: smallest group size exceeds the # of IVs
– Practical guide: each group should have 20+ observations
– Wide variation in group size affects the results (classification can be distorted)
Example
In this hypothetical example, data from 500 graduate students seeking jobs were examined. Available for each student were three predictors: GRE (V+Q), Years to Finish the Degree, and Number of Publications. The outcome measure was categorical: "Got a job" versus "Did not get a job." Half of the sample was used to determine the best linear combination for discriminating the job categories. The second half of the sample was used for cross-validation.

DISCRIMINANT
 /GROUPS=job(1 2)
 /VARIABLES=gre pubs years
 /SELECT=sample(1)
 /ANALYSIS ALL
 /SAVE=CLASS SCORES PROBS
 /PRIORS SIZE
 /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
 /PLOT=COMBINED SEPARATE MAP
 /PLOT=CASES
 /CLASSIFY=NONMISSING POOLED .
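For comparison, a Python sketch of a two-group discriminant analysis with scikit-learn, again on simulated stand-in data (scikit-learn's LDA uses the pooled within-groups covariance, matching the SPSS pooled default):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for the three predictors (values are made up)
X = np.column_stack([
    rng.normal(1300, 100, n),   # GRE (V+Q)
    rng.poisson(4, n),          # number of publications
    rng.normal(6, 2, n),        # years to finish degree
])
job = (X[:, 1] + rng.normal(0, 2, n) > 5).astype(int)  # toy group labels

# Half the sample for estimation, half for cross-validation (as in the slides)
X_est, X_val, y_est, y_val = train_test_split(X, job, test_size=0.5, random_state=0)

lda = LinearDiscriminantAnalysis()   # pooled within-groups covariance by default
lda.fit(X_est, y_est)

print(lda.transform(X_est)[:3])      # discriminant function scores
print(lda.score(X_val, y_val))       # classification accuracy in the holdout half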
Interpreting Output
• Box's M
• Eigenvalues
• Wilks' Lambda
• Discriminant Weights
• Discriminant Loadings
Group Statistics
JOB        Variable                   Mean     Std. Deviation     N
Oops!      GRE (V+Q)                 1296.20       96.913        179
           Number of Publications       3.50        2.029        179
           Years to Finish Degree       6.47        2.094        179
Got One!   GRE (V+Q)                 1305.87      101.824         71
           Number of Publications       6.55        1.593         71
           Years to Finish Degree       4.85        1.179         71
Total      GRE (V+Q)                 1298.94       98.224        250
           Number of Publications       4.36        2.357        250
           Years to Finish Degree       6.01        2.016        250
Tests of Equality of Group Means
                          Wilks' Lambda       F       df1   df2   Sig.
GRE (V+Q)                     .998           .492      1    248   .483
Number of Publications        .658        129.009      1    248   .000
Years to Finish Degree        .867         37.885      1    248   .000
Test Results
Box's M      49.679
Approx. F     8.137
df1           6
df2          114277.8
Sig.          .000
Tests the null hypothesis of equal population covariance matrices.

This violates the assumption of homogeneity of VCV matrices. But this test is sensitive in general, and sensitive to violations of multivariate normality too. Tests of significance in discriminant analysis are robust to moderate violations of the homogeneity assumption.
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            .693a          100.0           100.0              .640
a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                         .590          129.854     3   .000
Standardized Canonical Discriminant Function Coefficients (Discriminant Weights)
                          Function 1
GRE (V+Q)                    -.308
Number of Publications        .944
Years to Finish Degree       -.423

Structure Matrix (Discriminant Loadings)
                          Function 1
Number of Publications        .866
Years to Finish Degree       -.469
GRE (V+Q)                     .054
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

Data from both these outputs indicate that one of the predictors best discriminates who did/did not get a job. Which one is it?
Canonical Discriminant Function Coefficients
                          Function 1
GRE (V+Q)                    -.003
Number of Publications        .493
Years to Finish Degree       -.225
(Constant)                   3.268
Unstandardized coefficients

Functions at Group Centroids
JOB          Function 1
Oops!           -.522
Got One!        1.317
Unstandardized canonical discriminant functions evaluated at group means

This is the raw canonical discriminant function. The means for the groups on the raw canonical discriminant function can be used to establish cut-off points for classification.
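For instance, a hypothetical applicant (values chosen for illustration) with GRE = 1300, 5 publications, and 6 years to finish would score −.003(1300) + .493(5) − .225(6) + 3.268 ≈ .48 on the raw function, which lies closer to the Got One! centroid (1.317) than to the Oops! centroid (−.522).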
Prior Probabilities for Groups
JOB         Prior   Cases Used in Analysis
Oops!        .716          179
Got One!     .284           71
Total       1.000          250

Classification can be based on distance from the group centroids, taking into account information about the prior probability of group membership.
Classification Results (b, c, d)
                                         Predicted: Oops!   Predicted: Got One!   Total
Cases Selected
  Original           Count   Oops!             170                   9             179
                             Got One!           23                  48              71
                     %       Oops!             95.0                 5.0           100.0
                             Got One!          32.4                67.6           100.0
  Cross-validated a  Count   Oops!             169                  10             179
                             Got One!           24                  47              71
                     %       Oops!             94.4                 5.6           100.0
                             Got One!          33.8                66.2           100.0
Cases Not Selected
  Original           Count   Oops!             175                  10             185
                             Got One!           17                  48              65
                     %       Oops!             94.6                 5.4           100.0
                             Got One!          26.2                73.8           100.0

a. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
b. 87.2% of selected original grouped cases correctly classified.
c. 89.2% of unselected original grouped cases correctly classified.
d. 86.4% of selected cross-validated grouped cases correctly classified.
[Histogram of Canonical Discriminant Function 1 scores for JOB = Oops!: Mean = -.55, Std. Dev. = 1.10, N = 364]
Two modes?
[Histogram of Canonical Discriminant Function 1 scores for JOB = Got One!: Mean = 1.30, Std. Dev. = .62, N = 136]
Violation of the homogeneity assumption can affect the classification. To check, the analysis can be conducted using separate group covariance matrices.
Classification Results (a, b) — separate group covariance matrices
                                         Predicted: Oops!   Predicted: Got One!   Total
Cases Selected
  Original           Count   Oops!             165                  14             179
                             Got One!           21                  50              71
                     %       Oops!             92.2                 7.8           100.0
                             Got One!          29.6                70.4           100.0
Cases Not Selected
  Original           Count   Oops!             168                  17             185
                             Got One!           11                  54              65
                     %       Oops!             90.8                 9.2           100.0
                             Got One!          16.9                83.1           100.0

a. 86.0% of selected original grouped cases correctly classified.
b. 88.8% of unselected original grouped cases correctly classified.
No noticeable change in the accuracy of classification.
The group that did not get a job was actually composed of two subgroups: those that got interviews but did not land a job, and those that were never interviewed. This accounts for the bimodality in the discriminant function scores. The discriminant analysis of the three groups allows for the derivation of one more discriminant function, perhaps indicating the characteristics that separate those who get interviews from those who don't, or those whose successful interviews produce a job offer from those whose interviews do not.
Discriminant Analysis: Three Groups
Remember this?
[Histogram of Canonical Discriminant Function 1 scores for JOB = Oops!: Mean = -.55, Std. Dev. = 1.10, N = 364]
Two modes?
DISCRIMINANT
 /GROUPS=group(1 3)
 /VARIABLES=gre pubs years
 /SELECT=sample(1)
 /ANALYSIS ALL
 /SAVE=CLASS SCORES PROBS
 /PRIORS SIZE
 /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
 /PLOT=COMBINED SEPARATE MAP
 /PLOT=CASES
 /CLASSIFY=NONMISSING POOLED .
Group Statistics
GROUP            Variable                   Mean     Std. Deviation     N
Unemployed       GRE (V+Q)                 1307.54       85.491         54
                 Number of Publications       1.59        1.434         54
                 Years to Finish Degree       8.57        1.797         54
Got a Job        GRE (V+Q)                 1305.87      101.824         71
                 Number of Publications       6.55        1.593         71
                 Years to Finish Degree       4.85        1.179         71
Interview Only   GRE (V+Q)                 1291.30      101.382        125
                 Number of Publications       4.32        1.664        125
                 Years to Finish Degree       5.56        1.467        125
Total            GRE (V+Q)                 1298.94       98.224        250
                 Number of Publications       4.36        2.357        250
                 Years to Finish Degree       6.01        2.016        250
Tests of Equality of Group Means
                          Wilks' Lambda       F       df1   df2   Sig.
GRE (V+Q)                     .994           .761      2    247   .468
Number of Publications        .455        147.864      2    247   .000
Years to Finish Degree        .529        109.977      2    247   .000
Test Results
Box's M      21.796
Approx. F     1.780
df1          12
df2          137372.4
Sig.          .045
Tests the null hypothesis of equal population covariance matrices.

Separating the three groups produces better homogeneity of VCV matrices.
Still significant, but just barely. Not enough to worry about.
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            5.353a         99.1             99.1              .918
2             .047a           .9            100.0              .211
a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2               .150          466.074     6   .000
2                         .955           11.246     2   .004

Two significant linear combinations can be derived, but they are not of equal importance.
Standardized Canonical Discriminant Function Coefficients (Weights)
                          Function 1   Function 2
GRE (V+Q)                    .734         .194
Number of Publications     -1.246         .521
Years to Finish Degree      1.032         .602

Structure Matrix (Loadings)
                          Function 1   Function 2
Number of Publications      -.466         .867*
Years to Finish Degree       .401         .796*
GRE (V+Q)                    .008         .354*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function

What do the linear combinations mean now?
Canonical Discriminant Function Coefficients
                          Function 1   Function 2
GRE (V+Q)                    .007         .002
Number of Publications      -.781         .326
Years to Finish Degree       .701         .409
(Constant)                -10.496       -6.445
Unstandardized coefficients

Functions at Group Centroids
GROUP             Function 1   Function 2
Unemployed           4.026        .162
Got a Job           -2.469        .251
Interview Only       -.337       -.213
Unstandardized canonical discriminant functions evaluated at group means
[Plot of the group centroids on DF1 (x-axis, -4 to +4) and DF2 (y-axis, -4 to +4): unemployed, interview only, and got a job]

Loadings
                 DF1      DF2
No. Pubs        -.466     .867
Yrs to finish    .401     .796
GRE              .008     .354

Weights
                 DF1      DF2
No. Pubs       -1.246     .521
Yrs to finish   1.032     .602
GRE              .734     .194
[The same DF1 × DF2 centroid plot, repeated]

This figure shows that discriminant function #1, which is made up of number of publications and years to finish, reliably differentiates between those who got jobs, had interviews only, and had no job or interview. Specifically, a high value on DF1 was associated with not getting a job, suggesting that having few publications (loading = -.466) and taking a long time to finish (loading = .401) was associated with not getting a job.
Prior Probabilities for Groups
GROUP             Prior   Cases Used in Analysis
Unemployed         .216           54
Got a Job          .284           71
Interview Only     .500          125
Total             1.000          250
Classification Function Coefficients
                          Unemployed   Got a Job   Interview Only
GRE (V+Q)                     .238        .190          .205
Number of Publications     -10.539      -5.440        -7.256
Years to Finish Degree      11.018       6.503         7.808
(Constant)                -196.112    -123.212      -139.036
Fisher's linear discriminant functions
Territorial Map
[SPSS territorial map of Canonical Discriminant Function 1 (x-axis) vs. Function 2 (y-axis), each from -6.0 to 6.0, showing the classification boundary between groups 2 and 3 and between groups 3 and 1; * indicates a group centroid]

Symbols used in territorial map:
1 = Unemployed, 2 = Got a Job, 3 = Interview Only; * indicates a group centroid
Canonical Discriminant Functions
[Scatterplot of cases on Function 1 (x-axis, -6 to 8) and Function 2 (y-axis, -3 to 4), with group centroids marked for Unemployed, Got a Job, and Interview Only]
A classification function is derived for each group. The original data are used to estimate a classification score for each person, for each group. The person is then assigned to the group that produces the largest classification score.
Classification Function Coefficients
                          Unemployed   Got a Job   Interview Only
GRE (V+Q)                     .238        .190          .205
Number of Publications     -10.539      -5.440        -7.256
Years to Finish Degree      11.018       6.503         7.808
(Constant)                -196.112    -123.212      -139.036
Fisher's linear discriminant functions
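For example, for a hypothetical student (values chosen for illustration) with GRE = 1300, 2 publications, and 7 years to finish:

Unemployed:      .238(1300) − 10.539(2) + 11.018(7) − 196.112 ≈ 169.34
Got a Job:       .190(1300) −  5.440(2) +  6.503(7) − 123.212 ≈ 158.43
Interview Only:  .205(1300) −  7.256(2) +  7.808(7) − 139.036 ≈ 167.61

The largest score is for Unemployed, so this student (few publications, slow to finish) would be assigned to the Unemployed group.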
Classification

Classification Results (b, c, d)
                                             Predicted: Unemployed   Got a Job   Interview Only   Total
Cases Selected
  Original           Count   Unemployed             51                   0              3           54
                             Got a Job               0                  51             20           71
                             Interview Only          0                  13            112          125
                     %       Unemployed            94.4                  .0            5.6        100.0
                             Got a Job               .0                71.8           28.2        100.0
                             Interview Only          .0                10.4           89.6        100.0
  Cross-validated a  Count   Unemployed             51                   0              3           54
                             Got a Job               0                  51             20           71
                             Interview Only          0                  13            112          125
                     %       Unemployed            94.4                  .0            5.6        100.0
                             Got a Job               .0                71.8           28.2        100.0
                             Interview Only          .0                10.4           89.6        100.0
Cases Not Selected
  Original           Count   Unemployed             62                   0              4           66
                             Got a Job               0                  47             18           65
                             Interview Only          4                  11            104          119
                     %       Unemployed            93.9                  .0            6.1        100.0
                             Got a Job               .0                72.3           27.7        100.0
                             Interview Only         3.4                 9.2           87.4        100.0

a. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
b. 85.6% of selected original grouped cases correctly classified.
c. 85.2% of unselected original grouped cases correctly classified.
d. 85.6% of selected cross-validated grouped cases correctly classified.
Is the classification better than would be expected by chance? Observed values:

                   Predicted (Expected)
Actual           Unemployed   Got a Job   Interview Only   All
Unemployed           51            0             3           54
Got a Job             0           51            20           71
Interview Only        0           13           112          125
All                  51           64           135          250
Expected classification by chance: E = (Row total × Column total) / Total N

                   Predicted (Expected)
Actual           Unemployed       Got a Job        Interview Only     All
Unemployed      (51×54)/250      (64×54)/250      (135×54)/250         54
Got a Job       (51×71)/250      (64×71)/250      (135×71)/250         71
Interview Only  (51×125)/250     (64×125)/250     (135×125)/250       125
All                  51               64               135            250
Correct classification that would occur by chance:

                   Predicted (Expected)
Actual           Unemployed   Got a Job   Interview Only   All
Unemployed         11.016       13.824        29.16          54
Got a Job          14.484       18.176        38.34          71
Interview Only     25.5         32            67.5          125
All                51           64           135            250
The difference between chance-expected and actual classification can be tested with a chi-square as well:

\chi^2 = \sum \frac{(f_{observed} - f_{expected})^2}{f_{expected}}

where degrees of freedom = (# groups − 1)² = 4

\chi^2 = 145.13 + 13.82 + 23.47 + 14.48 + 59.25 + 8.77 + 25.5 + 11.28 + 29.34 = 331.04
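With df = 4, the critical chi-square value at α = .05 is 9.49, so χ² = 331.04 indicates classification far better than chance.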