multiple discriminant analysis. learning objectives upon completing this chapter, you should be able...

Multiple Discriminant Analysis

Upload: ira-lloyd

Post on 04-Jan-2016


Page 1: Multiple Discriminant Analysis. LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under

Multiple Discriminant Analysis

LEARNING OBJECTIVES

Upon completing this chapter, you should be able to do the following:

• State the circumstances under which a linear discriminant analysis should be used instead of multiple regression.

• Identify the major issues relating to types of variables used and sample size required in the application of discriminant analysis.

• Understand the assumptions underlying discriminant analysis in assessing its appropriateness for a particular problem.

LEARNING OBJECTIVES continued . . .

Upon completing this chapter, you should be able to do the following:

• Describe the two computation approaches for discriminant analysis and the method for assessing overall model fit.

• Explain what a classification matrix is and how to develop one, and describe the ways to evaluate the predictive accuracy of the discriminant function.

• Tell how to identify independent variables with discriminatory power.

• Justify the use of a split-sample approach for validation.

Discriminant Analysis Defined

Multiple discriminant analysis . . . is an appropriate technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric. The single dependent variable can have two, three or more categories.

Examples:

• Gender – Male vs. Female

• Heavy Users vs. Light Users

• Purchasers vs. Non-purchasers

• Good Credit Risk vs. Poor Credit Risk

• Member vs. Non-Member

• Attorney, Physician or Professor

KitchenAid Survey Results for the Evaluation* of a New Consumer Product

Purchase Intention               Subject Number   X1 Durability   X2 Performance   X3 Style
Group 1: Would purchase                 1               8               9              6
                                        2               6               7              5
                                        3              10               6              3
                                        4               9               4              4
                                        5               4               8              2
Group Mean                                             7.4             6.8            4.0
Group 2: Would not purchase             6               5               4              7
                                        7               3               7              2
                                        8               4               5              5
                                        9               2               4              3
                                       10               2               2              2
Group Mean                                             3.2             4.4            3.8
Difference between group means                         4.2             2.4            0.2

*Evaluations made on a 0 (very poor) to 10 (excellent) rating scale.
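As a quick check on the survey table, the group means and their differences can be recomputed from the ten sets of ratings. This is a minimal sketch in plain Python; the names `group1`, `group2`, and `column_means` are illustrative and do not come from the slides.

```python
# Ratings from the survey table: (X1 Durability, X2 Performance, X3 Style)
group1 = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would purchase
group2 = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not purchase


def column_means(rows):
    """Return the mean of each variable (column) across the given subjects."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


means1 = column_means(group1)
means2 = column_means(group2)
# Round to one decimal to match the precision used in the table.
differences = [round(m1 - m2, 1) for m1, m2 in zip(means1, means2)]

print("Group 1 means:", means1)
print("Group 2 means:", means2)
print("Differences:  ", differences)
```

The large gap on X1 and the near-zero gap on X3 are what suggest, before any formal analysis, that durability separates the two groups far better than style.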

Discriminant Analysis Decision Process

Stage 1: Objectives of Discriminant Analysis

Stage 2: Research Design for Discriminant Analysis

Stage 3: Assumptions of Discriminant Analysis

Stage 4: Estimation of the Discriminant Model and Assessing Overall Fit

Stage 5: Interpretation of the Results

Stage 6: Validation of the Results

Stage 1: Objectives of Discriminant Analysis

1. Determine if statistically significant differences exist between the two (or more) a priori defined groups.

2. Identify the relative importance of each of the independent variables in predicting group membership.

3. Establish the number and composition of the dimensions of discrimination between groups formed from the set of independent variables. That is, when there are more than two groups, you should examine and "name" each significant discriminant function. The significant discriminant functions are the "dimensions" of discrimination, and naming them makes clear what each represents in distinguishing the groups.

4. Develop procedures for classifying objects (individuals, firms, products, etc.) into groups, and then examine the predictive accuracy (hit ratio) of the discriminant function to see if it is acceptable (more than a 25% increase over chance).

Stage 2: Research Design for Discriminant Analysis

• Selection of dependent and independent variables.

• Sample size (total & per variable).

• Sample division for validation.

Converting Metric Variables to Nonmetric

• Most common approach = to use the metric scale responses to develop nonmetric categories. For example, use a question asking the typical number of soft drinks consumed per day and develop a three-category variable of 0 drinks for non-users, 1 – 5 for light users, and 5 or more for heavy users.

• Polar extremes approach = compares only the extreme two groups and excludes the middle group(s).
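The two approaches above can be sketched in a few lines of Python. The function names are hypothetical, and because the slide's ranges overlap at exactly 5 drinks, the sketch assumes that 5 counts as a light user; that boundary is an assumption, not something the slide settles.

```python
def usage_category(drinks_per_day):
    """Map a metric count of soft drinks per day to a nonmetric usage group."""
    if drinks_per_day < 0:
        raise ValueError("drinks_per_day must be non-negative")
    if drinks_per_day == 0:
        return "non-user"
    if drinks_per_day <= 5:  # assumption: exactly 5 is treated as light use
        return "light user"
    return "heavy user"


def polar_extremes(drink_counts):
    """Keep only the two extreme groups, dropping the middle (light) group."""
    labelled = [(x, usage_category(x)) for x in drink_counts]
    return [(x, group) for x, group in labelled if group != "light user"]


print([usage_category(x) for x in [0, 3, 8]])
print(polar_extremes([0, 2, 7]))
```

Dropping the middle group shrinks the sample, so the polar extremes approach trades sample size for sharper group differences.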

Rules of Thumb 5–1

Discriminant Analysis Design

• The dependent variable must be nonmetric, representing groups of objects that are expected to differ on the independent variables.

• Choose a dependent variable that:

o best represents group differences of interest,

o defines groups that are substantially different, and

o minimizes the number of categories while still meeting the research objectives.

• In converting metric variables to a nonmetric scale for use as the dependent variable, consider using extreme groups to maximize the group differences.

• Independent variables must identify differences between at least two groups to be of any use in discriminant analysis.

Rules of Thumb 5–1 continued . . .

• The sample size must be large enough to:

o have at least one more observation per group than the number of independent variables, but striving for at least 20 cases per group.

o have 20 cases per independent variable, with a minimum recommended level of 5 observations per variable.

o have a large enough sample to divide it into an estimation and holdout sample, each meeting the above requirements.

• Assess the equality of covariance matrices with the Box’s M test, but apply a conservative significance level of .01.

• Examine the independent variables for univariate normality.

• Multicollinearity among the independent variables can markedly reduce the estimated impact of independent variables in the derived discriminant function(s), particularly if a stepwise estimation process is used.
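The sample-size guidelines in these rules of thumb lend themselves to a simple checklist. The sketch below is illustrative only: `check_sample_size` and its dictionary keys are hypothetical names, and the function merely compares the counts with the thresholds stated above.

```python
def check_sample_size(group_sizes, n_independent_vars):
    """Compare group sizes with the rule-of-thumb sample-size guidelines."""
    total = sum(group_sizes)
    smallest = min(group_sizes)
    cases_per_variable = total / n_independent_vars
    return {
        # At least one more observation per group than independent variables.
        "minimum_per_group": smallest >= n_independent_vars + 1,
        # Preferred: at least 20 cases in every group.
        "twenty_per_group": smallest >= 20,
        # Minimum recommended level: 5 observations per independent variable.
        "five_per_variable": cases_per_variable >= 5,
        # Preferred: 20 cases per independent variable.
        "twenty_per_variable": cases_per_variable >= 20,
    }


print(check_sample_size([5, 5], 3))
print(check_sample_size([60, 60], 5))
```

If the sample is split for validation, the same check would be applied separately to the estimation and holdout samples.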

Stage 3: Assumptions of Discriminant Analysis

Key Assumptions

• Multivariate normality of the independent variables.

• Equal variance and covariance for the groups.

Other Assumptions

• Minimal multicollinearity among independent variables.

• Group sample sizes relatively equal.

• Linear relationships.

• Elimination of outliers.

Stage 4: Estimation of the Discriminant Model and Assessing Overall Fit

Selecting An Estimation Method . . .

1. Simultaneous Estimation – all independent variables are considered concurrently.

2. Stepwise Estimation – independent variables are entered into the discriminant function one at a time.

Estimating the Discriminant Function

The stepwise procedure begins with all independent variables not in the model, and selects variables for inclusion based on:

• Statistically significant differences across the groups (.05 or less required for entry), and

• The largest Mahalanobis distance (D²) between the groups.
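To make the D² criterion concrete, the sketch below computes the Mahalanobis distance for each variable on its own, using the survey ratings from the new consumer product table earlier in the deck. With one variable and two groups, D² reduces to the squared difference in group means divided by the pooled within-group variance. This only illustrates the first stepwise step; it omits the significance test, and the function names are hypothetical.

```python
def pooled_variance(a, b):
    """Pooled within-group variance of two samples."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)


def mahalanobis_d2(a, b):
    """Single-variable Mahalanobis D^2 between the two group means."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    return diff ** 2 / pooled_variance(a, b)


# (would purchase, would not purchase) ratings for each variable
ratings = {
    "X1": ([8, 6, 10, 9, 4], [5, 3, 4, 2, 2]),
    "X2": ([9, 7, 6, 4, 8], [4, 7, 5, 4, 2]),
    "X3": ([6, 5, 3, 4, 2], [7, 2, 5, 3, 2]),
}

d2 = {name: mahalanobis_d2(g1, g2) for name, (g1, g2) in ratings.items()}
first_candidate = max(d2, key=d2.get)  # variable with the largest D^2

print(d2)
print("First candidate for entry:", first_candidate)
```

In later steps, statistical software would recompute D² for each remaining variable together with the variables already entered, using the pooled covariance matrix rather than a single variance.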

Assessing Overall Model Fit

• Calculating discriminant Z scores for each observation,

• Evaluating group differences on the discriminant Z scores, and

• Assessing group membership prediction accuracy.

Assessing Group Membership Prediction Accuracy

Major Considerations:

• The statistical and practical rationale for developing classification matrices,

• The cutting score determination,

• Construction of the classification matrices, and

• Standards for assessing classification accuracy.

Rules of Thumb 5–2

Model Estimation and Model Fit

• Although stepwise estimation may seem “optimal” by selecting the most parsimonious set of maximally discriminating variables, beware of the impact of multicollinearity on the assessment of each variable’s discriminatory power.

• Overall model fit assesses the statistical significance between groups on the discriminant Z score(s), but does not assess predictive accuracy.

• With more than two groups, do not confine your analysis to only the statistically significant discriminant function(s), but consider if nonsignificant functions (with significance levels of up to .3) add explanatory power.

Calculating the Optimum Cutting Score

Issues . . .

• Define the prior probabilities based either on the relative sample sizes of the observed groups or specified by the researcher (either assumed to be equal or with values set by the researcher), and

• Calculate the optimum cutting score value as a weighted average based on the assumed sizes of the groups (derived from the sample sizes).
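The weighted average can be sketched as follows, assuming the two-group form commonly given in textbooks, Z_CS = (N_A * Z_B + N_B * Z_A) / (N_A + N_B), where Z_A and Z_B are the group centroids on the discriminant function. The function name and the numbers in the usage lines are hypothetical.

```python
def optimal_cutting_score(n_a, n_b, z_a, z_b):
    """Weighted cutting score between two group centroids.

    n_a, n_b: assumed group sizes (or prior weights) for groups A and B.
    z_a, z_b: centroids (mean discriminant Z scores) of groups A and B.
    Each centroid is weighted by the size of the OTHER group, which moves
    the cut toward the smaller group's centroid.
    """
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)


# Equal group sizes: the cutting score is the midpoint of the centroids.
print(optimal_cutting_score(25, 25, 2.0, -2.0))
# Unequal group sizes: the cut moves toward the smaller group's centroid.
print(optimal_cutting_score(10, 30, 1.5, -0.5))
```

Moving the cut toward the smaller group's centroid means more observations are assigned to the larger group, which is what unequal prior probabilities imply.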

Establishing Standards of Comparison for the Hit Ratio

Group sizes determine standards based on:

• Equal Group Sizes

• Unequal Group Sizes – two criteria:

o Maximum Chance Criterion

o Proportional Chance Criterion
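Both criteria can be computed directly from the group sizes. The sketch below assumes the usual definitions: the maximum chance criterion is the proportion of cases in the largest group, and the proportional chance criterion is the sum of the squared group proportions. The function names and the example group sizes are illustrative.

```python
def maximum_chance(group_sizes):
    """Share of cases in the largest group (classify everyone into it)."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """Sum of squared group proportions (random assignment by group share)."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


for sizes in ([25, 25], [40, 10]):
    print(sizes, maximum_chance(sizes), round(proportional_chance(sizes), 4))
```

With equal groups the two criteria coincide (50% for two groups); with unequal groups the maximum chance criterion gives the higher baseline.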

Classification Matrix: HBAT’s New Consumer Product

                           Predicted Group
Actual Group        Would Purchase   Would Not Purchase   Actual Total   Percent Correct Classification
(1)                       22                  3                25                    88%
(2)                        5                 20                25                    80%
Predicted Total           27                 23                50

Percent Correctly Classified (hit ratio) = 100 x [(22 + 20)/50] = 84%
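The figures in this classification matrix can be reproduced with a short script. This is a minimal sketch in which rows are actual groups and columns are predicted groups; the function names are illustrative.

```python
# Rows: actual group (1), (2); columns: predicted "would purchase",
# predicted "would not purchase".
matrix = [
    [22, 3],
    [5, 20],
]


def hit_ratio(m):
    """Share of all cases on the diagonal (correctly classified)."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(sum(row) for row in m)


def group_accuracy(m):
    """Share of each actual group that was correctly classified."""
    return [row[i] / sum(row) for i, row in enumerate(m)]


def predicted_totals(m):
    """Column totals: how many cases were assigned to each group."""
    return [sum(col) for col in zip(*m)]


print("Hit ratio:", hit_ratio(matrix))
print("By group: ", group_accuracy(matrix))
print("Predicted totals:", predicted_totals(matrix))
```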

Rules of Thumb 5–3

Assessing Predictive Accuracy

• The classification matrix and hit ratio replace R² as the measure of model fit:

o Assess the hit ratio both overall and by group.

o If the estimation and analysis samples both exceed 100 cases and each group exceeds 20 cases, derive separate standards for each sample. If not, derive a single standard from the overall sample.

• Analyze the misclassified observations both graphically (territorial map) and empirically (Mahalanobis D²).

Rules of Thumb 5–3 Continued . . .

Assessing Predictive Accuracy

• There are multiple criteria for comparison to the hit ratio:

o The maximum chance criterion for evaluating the hit ratio is the most conservative, giving the highest baseline value to exceed.

o Be cautious in using the maximum chance criterion in situations with overall samples less than 100 and/or group sizes under 20.

o The proportional chance criterion considers all groups in establishing the comparison standard and is the most popular.

o The actual predictive accuracy (hit ratio) should exceed any criterion value by at least 25%.
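The 25% standard is usually read as a relative improvement: the hit ratio should be at least 1.25 times the chance criterion. A minimal sketch under that reading follows; the function names and the `improvement` parameter are hypothetical.

```python
def required_hit_ratio(chance_criterion, improvement=0.25):
    """Hit ratio needed to beat the chance criterion by the given margin."""
    return chance_criterion * (1 + improvement)


def is_acceptable(hit_ratio, chance_criterion, improvement=0.25):
    """True when the hit ratio meets or exceeds the required level."""
    return hit_ratio >= required_hit_ratio(chance_criterion, improvement)


# Two equal groups give a chance criterion of 50%, so 62.5% is required.
print(required_hit_ratio(0.5))
print(is_acceptable(0.84, 0.5))
```

An 84% hit ratio with two equal groups clears the 62.5% threshold comfortably.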

Stage 5: Interpretation of the Results

Three Methods . . .

1. Standardized discriminant weights,

2. Discriminant loadings (structure correlations), and

3. Partial F values.

Interpretation of the Results

Two or More Functions . . .

1. Rotation of discriminant functions

2. Potency index
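The potency index can be sketched as below. The sketch assumes the usual definition: each variable's squared loading on a function is weighted by that function's relative eigenvalue (its eigenvalue divided by the sum of eigenvalues of the significant functions), and the results are summed across functions. The loadings and eigenvalues shown are made-up numbers for illustration only.

```python
def potency_index(loadings, eigenvalues):
    """Composite potency of one variable across several discriminant functions.

    loadings:    the variable's (preferably rotated) loading on each function.
    eigenvalues: eigenvalue of each significant function, in the same order.
    """
    total = sum(eigenvalues)
    relative = [e / total for e in eigenvalues]
    return sum(l ** 2 * r for l, r in zip(loadings, relative))


# Hypothetical example: two functions with eigenvalues 3.0 and 1.0,
# and a variable loading 0.8 on the first and 0.2 on the second.
print(round(potency_index([0.8, 0.2], [3.0, 1.0]), 4))
```

The index is only meaningful for comparing variables within the same analysis; it has no absolute interpretation.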

Graphical Display of Discriminant Scores and Loadings

• Territorial Map = most common method.

• Vector Plot of Discriminant Loadings, preferably the rotated loadings = simplest approach.

Plotting Procedure for Vectors

Three Steps . . .

1. Selecting variables,

2. Stretching the vectors, and

3. Plotting the group centroids.

Figure 5.9 Territorial Map for Three Group Discriminant Analysis

[Figure: territorial map with Function 1 on the horizontal axis and Function 2 on the vertical axis, showing the territories and group centroids for X1 - Customer Type: Less than 1 year, 1 to 5 years, Over 5 years.]

Rules of Thumb 5–4

Interpreting and Validating Discriminant Functions

• Discriminant loadings are the preferred method to assess the contribution of each variable to a discriminant function because they are:

o a standardized measure of importance (ranging from 0 to 1).

o available for all independent variables whether used in the estimation process or not.

o unaffected by multicollinearity.

• Loadings exceeding ±.40 are considered substantive for interpretation purposes.

Rules of Thumb 5–4 continued . . .

Interpreting and Validating Discriminant Functions

• If there is more than one discriminant function, be sure to:

o use rotated loadings.

o assess each variable’s contribution across all the functions with the potency index.

• The discriminant function must be validated either with a holdout sample or one of the “Leave-one-out” procedures.

Stage 6: Validation of the Results

• Utilizing a Holdout Sample

• Cross-Validation
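A leave-one-out procedure, one form of cross-validation, can be illustrated with a deliberately simplified classifier. The sketch below uses only the X1 (durability) ratings from the new consumer product table earlier in the deck and assigns each held-out subject to the group whose mean is nearer, which is equivalent to a midpoint cutting score. A real analysis would re-estimate the full discriminant function each time; the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def nearest_group(x, mean_1, mean_2):
    """Return 1 or 2 for the group whose mean is closer to x (ties go to 1)."""
    return 1 if abs(x - mean_1) <= abs(x - mean_2) else 2


def leave_one_out_hit_ratio(group1, group2):
    """Hold out each case in turn, re-estimate the means, classify the case."""
    hits = 0
    for i, x in enumerate(group1):
        rest = group1[:i] + group1[i + 1:]  # group 1 without the held-out case
        hits += nearest_group(x, mean(rest), mean(group2)) == 1
    for i, x in enumerate(group2):
        rest = group2[:i] + group2[i + 1:]  # group 2 without the held-out case
        hits += nearest_group(x, mean(group1), mean(rest)) == 2
    return hits / (len(group1) + len(group2))


x1_would_purchase = [8, 6, 10, 9, 4]
x1_would_not_purchase = [5, 3, 4, 2, 2]
print(leave_one_out_hit_ratio(x1_would_purchase, x1_would_not_purchase))
```

Because each case is classified by a rule estimated without it, the resulting hit ratio is a less optimistic estimate than one computed on the same cases used for estimation.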

Discriminant Analysis Learning Checkpoint

1. When should multiple discriminant analysis be used?

2. What are the major considerations in the application of discriminant analysis?

3. Which measures are used to assess the validity of the discriminant function?

4. How should you identify variables that predict group membership well?