csrde discriminant analysis final

Using Discriminant Analysisto Identify Students for a Corequisite

College Algebra Course

Presented by: Dr. David G. Underwood

and Dr. Susan J. Underwood

• Arkansas set ACT Math at <19 for remediation

• 38.4% for all public colleges (two and four-year)

• 25.5% for the public, four-year institutions

• 38.3% for the public, four-year institutions **

*Reported by the Arkansas Department of Higher Education (ADHE)

**Adjusted with “selective” schools (University of Arkansas and Arkansas State University) removed

Arkansas Math Remediation Rate - Fall 2012 *

Context – Arkansas Tech University

First Generation College Students – Almost 60%

Some Type Financial Aid – 95.4%

Pell Grant Eligible – 61%

Graduation Rate – 40.2%

Math Remediation Rate Fall 2012 - 40%

ATU’s Approach to Math RemediationTwo-step process• MATH 0802 (Beginning Algebra) for students

scoring 16 or below on Math ACT • MATH 0903 (Intermediate Algebra) for students

earning 17 or 18 on Math ACTVery Traditional Approach• Students attend a mathematics class• Teachers provide lecture and homework• Students take traditional exams throughout

semester

Identify students scheduled for remediation

who could potentially be successful in college

algebra during their first semester if necessary

skills are provided in a corequisite.

Complete College America Challenge

Can a statistical model be developed, using variables that most public 4-year institutions in Arkansas will have in their database, that will identify students scoring less than 19 on the ACT Math section who are most likely to be successful in College Algebra if additional assistance is provided, at better than chance selection accuracy?

Primary Question

Stipulations:

Data must be “ambient” – data that are likely to be readily available to any state institution in Arkansas.

The statistical methodology should be something within the ability of most campuses to perform.

It should be relatively easy to interpret.

Discriminant Analysis

DA is used to classify cases into groups and to decide how to assign new cases to groups.

The interpretation is similar to multiple regression. The Canonical Correlation can be squared and interpreted similarly to R2 such that squaring it indicates the amount of variance accounted for by the model. The Canonical Discriminant Function Coefficients may be interpreted similarly to beta weights in multiple regression and so forth.

Data used were from the fall 2012 student body and included all students who were taking remedial mathematics for the first time during the fall 2012 semester. Success is defined as completing enough modules to enter college algebra with a grade equivalent to an “A”, “B”, or “C”.

Those classified as unsuccessful received a grade lower than “C”, or a “W”.

The grade of “W” was included for two reasons 1) those students did not successfully complete the class, and 2) although Analysis of Variance showed four significant differences between students who received a grade of “W” and those who received a failing grade on the variables in the analysis, in all cases the mean was lower for students receiving a “W” than those with an “F”.

GradeW N Mean SigHS_GPA 1.00 163 2.5847 .000

.00 609 2.8682HS_CLASS_RANK_PERCENTILE

1.00 156 35.93 .001.00

563 47.92ACT_COMP 1.00 150 18.22

.00542 18.56

ACT_MATH 1.00 150 17.05.00 542 17.08

Remedial Math 1.00=W .00=F

ACT_READGradeW N Mean Sig1.00.00

150542

19.7820.08

ACT_SCI 1.00 150 19.14.00 542 19.53

HS_CLASS_SIZE 1.00 156 162.01 .045

.00 562 179.98

HS_CLASS_RANK 1.00157 123.24 .036

.00 564 97.54

The total number used in the analysis was 640.

The groups were almost evenly split with 318 in the “unsuccessful” group and 322 in the “successful” group.

Variables Included In Analysis

ACT Composite ScoreACT Math ScoreACT Science ScoreACT Reading ScoreHigh School Grade Point AverageHigh School Class RankHigh School Class SizeHigh School Class Rank as a Percentile Score

Variables Found to Be Significant Predictors

ACT Comp ScoreACT Math ScoreACT Science ScoreHigh School Grade Point AverageHigh School Class RankHigh School Class Rank as a Percentile

Decision to Use Stepwise

1) The original analysis using all variables was found to violate the assumption of equality of covariance matrices, although large group sizes decrease the importance of the assumption being met.

2) several of the variables included in the full model, i.e., Class Rank and Class Rank as a Percentile, and ACT Math Score, ACT Science Score and ACT Comp Score, etc., could be highly correlated and therefore responsible for the violation of the assumption of equality of covariance matrices due to multicollinearity.

3) the stepwise function is designed to find the best set of predictors from among a larger number and use only those contributing a significant amount of unique variance to the model.

The stepwise procedure was used with an F to enter of .05 and an F to remove of .1 to identify only those variables adding a significant amount of explained variance to the model.

The stepwise method identified three significant predictors accounting for 22.8% of the explained variance. Box’s M was found to be insignificant, indicating the assumption of equality of covariance matrices was met.

The significant predictors identified from the Structure Matrix were High School Grade Point Average (.964), ACT Math Score (.239) and ACT Reading Score (.109).

Based on the Canonical Discriminant Function Coefficients, the discriminant function, used to compute a discriminant score, can be stated as:

D = (.223*ACT_Math) +(.-.056*ACT_Reading)+(2.534*HSGPA) -9.856

The discriminant score is important because, although the algorithm for computing the score was developed using this group of students, it is also the “best guess” for classifying students who might take this course in the future. The higher the discriminant score, in a positive direction, the more likely the student is to be successful. Conversely, the lower the discriminant score, in a negative direction, the less likely a student is to be successful. By knowing the likelihood of success or failure in advance, and the numbers of students in each category, one could decide which students are most likely to benefit from a corequisite, or, whether to suggest additional services such as tutoring, study groups, etc. to help with successful completion.

The model exceeds the commonly accepted level of providing at least a 25% improvement over chance assignment. Summing the squared prior probabilities provides a prior chance probability of 50%. Multiplying 50% by 1.25 provides a figure of 62.5%. An acceptable model should be equal to or greater than 62.5%. The cross validated classification model of 71.6% is above the commonly accepted threshold.

In this instance the Discriminant scores distribute themselves as an approximately standard normal distribution with a mean of 0 and a standard deviation of 1.

With this data, if a score of +1.5 or greater is selected, the model identifies 76 students. Of those, 70 were actually successful for a classification accuracy of 92.1%.

A similar analysis was conducted for students scoring 19 or above and entering directly into college algebra.

The total number used in the analysis was 1,874. The groups were unevenly split with 608 in the “unsuccessful” group and 1266 in the “successful” group.

The same 8 variables were allowed to enter the model and 7 were found to be significant predictors when the full model was used.

The stepwise method identified only 2 significant predictors accounting for 26.6% of the explained variance. Box’s M was found to be insignificant, indicating the assumption of equality of covariance matrices was met.

The significant predictors identified from the Structure Matrix were High School Grade Point Average (.986), and High School Class Size (.034).

Based on the Canonical Discriminant Function Coefficients, the discriminant function, used to compute a discriminant score, can be stated as:

D = (2.85*HSGPA) + (.001*HS_Class Size) – 8.042

The model exceeds the commonly accepted level of providing at least a 25% improvement over chance assignment. Summing the squared prior probabilities provides a prior chance probability of 56.2%. Multiplying 56.2% by 1.25 provides a figure of 70.25%. An acceptable model should be equal to or greater than 70.25%. The cross validated classification model of 76.5% is above the commonly accepted threshold.

In this case, the rationale would be to select those students with negative Discriminant Scores…Those least likely to be successful if some type of intervention is not applied.

The same method (using the discriminant score) can be used to determine the number of students to be selected as in the case with the remedial students.

Conclusions• Discriminant Analysis can be used to identify

students who are most likely to be successful or unsuccessful depending on which students one needs to identify.

• The classification is better than chance accuracy.

• The Discriminant Score can be used to determine how many students will be selected.

• Predictive variables of students with “W” grades may be worse than with “F” grades.

Reference• Burns, R., & Burns, R. (2008). Business

Research Methods and Statistics using SPSS. London: Sage Publications Ltd.

• Burns, R. & Burns, R. (2008). Chapter 25: Discriminant Analysis (WWW page). URL

http://www.uk.sagepub.com/burns/website%20material/Chapter%2025%20-%20Discriminant%20Analysis.pdf

csrde discriminant analysis final

Education

underwood arkansas

act math

remedial math redesign

discriminant analysis