mark hamner texas woman’s university department of mathematics and computer science
Post on 18-Jan-2016
Mark Hamner
Texas Woman’s University
Department of Mathematics and Computer Science
Preet Ahluwalia
Credit Risk Analyst, AmeriCredit
Predicting Real-Time Percent Enrollment Increase
Texas Woman’s University
Denton . Dallas . Houston
Year 2005 Facts
Total Enrollment – 11,344
Undergrad – 6,266
Graduate (Masters) – 4,369
Doctoral – 709
Campus Enrollment
Denton – 9,157
Dallas – 921
Houston – 1,266
59 academic programs (19 doctoral)
Female – 10,368
Male – 976
Outline
Problem Definition
Predicting Student Enrollment at Time ‘t’ Using Historical Data
1. Enrollment Process - For Newly Enrolled
2. The Predictive Problem
3. Logistic Prediction Model
a. Data Issues and Programming Solutions
4. Quadratic Prediction Model
a. Exploratory Analysis to Identify Patterns
5. Combine for Overall Prediction: Results
Enrollment
• Enrollment predictions can be broken into two fundamental pieces:
  1. Newly Enrolled Students
  2. Re-Enrolling/Continuing Students
• The focus of this paper is the prediction of Newly Enrolled students.
New Students: Enrollment Process
[Diagram: All Prospective Students → Applicants (FTIC, Transfer, Graduate, Others) → Admitted to TWU → New 12th Day Enrolled]
Idea Behind Enrollment Prediction at Time = t
Enrollment Prediction at Time ‘t’
Let Time = t denote the prediction date
For applicants before t, we will have data
For applicants after time t (denoted by t’), we will not have data
Total Enrollment = Enroll_t + Enroll_t’
[Timeline: Begin Prediction, Time t, Fall 12th Day; each of the two segments is labeled "Predict"]
Weekly Partition of Prediction Interval
The prediction interval will be broken up into weekly intervals
The diagram below illustrates prediction at Week = 5
[Timeline: Week 0, Week 5, Week 17; each of the two segments is labeled "Predict"]
At Week = 5 we have 35 more days of applicant data than at Week = 0
Total Enroll = Enroll_t + Enroll_t’
Enroll_t
$P_t = \{1, 2, \ldots, N_t\}$: finite set of applicants at week = t, with $k \in P_t$
Enrollment is a dichotomous response variable, $y_k$
$y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll)
Enrollment of all applicants at week = t:
$$\text{Enroll}_t = \sum_{k=1}^{N_t} y_k$$
Model Dichotomous Variable
For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$
There exists applicant information for each individual:
$x_k = (x_{1k}, x_{2k}, \ldots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \ldots, \text{Major\_Ratio}_k)$
Use Logistic Regression to model $\theta_k$
Logistic Regression Model
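• The probability of student k enrolling is
$$\theta_k(x_k) = \frac{e^{L_k}}{1 + e^{L_k}}$$
where
$$L_k = \beta_0 + \beta_1\,\text{Distance}_k + \beta_2\,\text{SAT}_k + \cdots + \beta_p\,\text{Major\_Ratio}_k$$
and Distance, SAT, …, Major_Ratio are the predictor variables.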
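As a minimal sketch of the logistic formula above, the function below computes the enrollment probability for one applicant. The coefficient and predictor values are made up for illustration; they are not the fitted values from the presentation, and the function name is an assumption.

```python
import math

def enroll_probability(x, beta):
    """Return theta_k = e^L / (1 + e^L), where L = beta[0] + sum(beta[j] * x[j-1])."""
    linear = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Use the algebraically equivalent form that avoids overflow for large |L|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)

# Hypothetical coefficients (intercept, Distance, SAT) and one applicant's values.
beta = [-0.5, -0.01, 0.002]
x_k = [40.0, 500.0]
theta_k = enroll_probability(x_k, beta)  # L_k is about 0.1, so theta_k is about 0.525
```

With all coefficients equal to zero the linear term is zero and the probability is exactly 0.5, which is a quick sanity check on the formula.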
Predict Enroll_t
Let Y be the random vector of responses:
$$Y = (y_1, y_2, \ldots, y_{N_t})'$$
Thus,
$$\text{Enroll}_t = \mathbf{1}'Y = \sum_{k=1}^{N_t} y_k$$
Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones
Estimated Enroll_t is
$$\hat{T}_t = E(\text{Enroll}_t) = \mathbf{1}'E(Y) = \mathbf{1}'\,\big(E(y_1), E(y_2), \ldots, E(y_{N_t})\big)' = \sum_{k=1}^{N_t} \theta_k(x_k)$$
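The estimate of Enroll_t is the sum of the individual enrollment probabilities, because the expected value of each 0/1 response is its probability of being 1. The sketch below illustrates this with four hypothetical probabilities; the names and numbers are assumptions, not data from the study.

```python
def expected_enrollment(probabilities):
    """Return the sum of theta_k over all applicants, i.e. 1'E(Y)."""
    return sum(probabilities)

# Hypothetical fitted probabilities for four applicants (illustration only).
thetas = [0.9, 0.5, 0.25, 0.1]
t_hat = expected_enrollment(thetas)  # 1.75 expected enrollees

# For comparison only: counting applicants with theta_k above 0.5 is a
# different quantity and is not what the formula above computes.
count_above_half = sum(1 for p in thetas if p > 0.5)  # 1
```

Summing probabilities keeps the contribution of applicants who are individually unlikely to enroll, which a hard classification threshold would discard.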
Logistic Model
• Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.
• What variables will get picked for model building?
[Diagram: Year Prior Applicant Data → Current Year Prediction]
Programming and Variable Selection
SAS Programming: Exploratory and Variable Creation
Use SAS to create possibly significant variables and dummy code categorical variables
Example: Major_Ratio, Ethnic, etc.
Backward Selection
Slightly different variables are selected for FTIC, Transfer, and Graduate.
[Flowchart: Start (Saturated Model) → Drop Predictor? (Yes / No) → Stop (Fitted Model)]
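The backward selection loop in the flowchart can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS program: the stopping rule (drop the predictor with the largest p-value until all p-values are at or below `alpha`) and the 0.05 threshold are assumptions, and `p_values_for` is a hypothetical callback standing in for refitting the logistic model.

```python
def backward_select(predictors, p_values_for, alpha=0.05):
    """Start from the saturated model and drop one predictor at a time.

    p_values_for(names) is a hypothetical callback that refits the model on
    `names` and returns a dict mapping each name to its p-value.
    """
    current = list(predictors)
    while current:
        p_values = p_values_for(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # no predictor left to drop: this is the fitted model
        current.remove(worst)  # drop the least significant predictor and refit
    return current

def toy_p_values(names):
    """Stand-in for a model refit; these p-values are invented for illustration."""
    fixed = {"Distance": 0.001, "SAT_M": 0.30, "Gender": 0.02, "DOB": 0.60}
    return {name: fixed[name] for name in names}

selected = backward_select(["Distance", "SAT_M", "Gender", "DOB"], toy_p_values)
# selected == ["Distance", "Gender"]
```

In a real fit the p-values change after each refit, which is why the callback is invoked again on every pass through the loop.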
FTIC Variable Selection
| Variable Name | Variable Type | Variable Description |
|---|---|---|
| Twelve | Response | 1 if enrolled; 0 otherwise |
| Distance♦ | Explanatory | Continuous variable |
| SAT_M, SAT_V, ACT | Explanatory | Continuous variables; SAT Math score, SAT Verbal score, ACT score |
| Give ACT♦ | Explanatory | 1 if score provided; 0 otherwise |
| Program Ratio♦ | Explanatory | Continuous variable |
| Major Ratio♦ | Explanatory | Continuous variable |
| Date of Birth | Explanatory | Continuous variable |
| Gender♦ | Explanatory | 1 if female; 0 if male |
| Apply Early♦ | Explanatory | 1 if applied before January 1; 0 otherwise |
| E1, E2, E3, E4, E5, E6, E7 | Explanatory | Dummy variables for Ethnicity |
| Personal♦ | Explanatory | Discrete variable; number of key information items available for a student |
Case Study: Logistic Model Prediction
Applicant data for 2003 is used to predict 2004 FTIC enrollment by weekly time intervals
• The Logistic Model does not predict after week = t
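FTIC 2004 Enrollment

| Week | Total Apply | Predict Enroll_t | Actual Enroll | Off | % Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 578 | 30 | 1.6% |
| 1 | 1,896 | 615 | 584 | 31 | 1.6% |
| 2 | 1,930 | 623 | 594 | 29 | 1.5% |
| 3 | 1,951 | 632 | 606 | 26 | 1.3% |
| 4 | 1,975 | 638 | 613 | 25 | 1.3% |
| 5 | 1,994 | 644 | 620 | 24 | 1.2% |
| 6 | 2,005 | 647 | 623 | 24 | 1.2% |
| 7 | 2,026 | 655 | 634 | 21 | 1.0% |
| 8 | 2,039 | 659 | 638 | 21 | 1.0% |
| 9 | 2,058 | 665 | 647 | 18 | 0.9% |
| 10 | 2,065 | 667 | 650 | 17 | 0.8% |
| 11 | 2,081 | 669 | 653 | 16 | 0.8% |
| 12 | 2,097 | 673 | 661 | 12 | 0.6% |
| 13 | 2,111 | 678 | 668 | 10 | 0.5% |
| 14 | 2,118 | 680 | 671 | 9 | 0.4% |
| 15 | 2,122 | 681 | 673 | 8 | 0.4% |
| 16 | 2,123 | 681 | 674 | 7 | 0.3% |
| 17 | 2,146 | 690 | 687 | 3 | 0.1% |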
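The Off and % Off columns can be reproduced with a short calculation. Judging from the numbers in the table, Off is predicted minus actual enrollment and % Off appears to be Off divided by Total Apply; that reading is an inference from the figures, not something the slides state. The sketch below checks three rows taken from the table.

```python
def off_stats(total_apply, predicted, actual):
    """Return (Off, % Off), with % Off taken relative to Total Apply (assumed)."""
    off = predicted - actual
    return off, round(100.0 * off / total_apply, 1)

# (week, Total Apply, Predict Enroll_t, Actual Enroll) copied from the table.
rows = [(0, 1877, 608, 578), (5, 1994, 644, 620), (17, 2146, 690, 687)]
results = {week: off_stats(apply, pred, actual) for week, apply, pred, actual in rows}
# results == {0: (30, 1.6), 5: (24, 1.2), 17: (3, 0.1)}
```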
Enrollment after Week = t
• Total Enrollment = Enroll_t + Enroll_t’
• At any week = t, we need to predict Enroll_t’
• Identify historical relationships that may be helpful
Applicant Versus Enrolled by Year
• Both applications and enrollment have been increasing
• Notice enrollment yield is decreasing
[Figure: Apply and Enroll totals (left axis, 0 to 9,000) and %Enroll (right axis, 30% to 80%) by Year, 1999 to 2005. %Enroll data labels in order of appearance: 66.4%, 48.1%, 46.7%, 43.5%, 41.7%, 39.4%, 34.6%]
Does the % increase in enrollment match the % increase in applications?
Applicant Yield By Strata
[Figure: % Applicants Enroll (20% to 80%) by Year, 1999 to 2005, with one line each for Graduate, FTIC, and Transfer]
Enrollment yield from applicant data is decreasing for each stratum
How does this affect yearly increase in enrollment?
Percent Increase Applicant Vs. Enrolled
• Applicant increase is not a viable indicator of enrollment increase
[Figure: % Increase Apply and % Increase Enroll by Year, 2000 to 2005 (vertical axis: Percent, -20% to 120%). Data labels in order of appearance: 102.7%, 6.9%, 31.7%, 16.4%, 9.2%, 13.4%; then 46.8%, 3.9%, 22.7%, 11.5%, 3.3%, -0.4%]
• What patterns are reliable to model?
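One way to see why applicant growth does not carry over to enrollment growth: enrollment equals applications times yield, so enrollment growth is applicant growth scaled by the ratio of the new yield to the old yield. The sketch below checks this against the chart labels, assuming the labels map to years in the order they appear (102.7% and 13.4% as the 2000 and 2005 applicant increases; 66.4%, 48.1%, 39.4%, 34.6% as the 1999, 2000, 2004, and 2005 yields). That mapping is an inference from the extracted labels, not something the slides state.

```python
def enroll_growth(apply_growth, yield_prev, yield_curr):
    """Enrollment growth implied by applicant growth and a change in yield."""
    return (1.0 + apply_growth) * (yield_curr / yield_prev) - 1.0

# Year 2000: applications up 102.7%, yield falls from 66.4% to 48.1%.
g_2000 = enroll_growth(1.027, 0.664, 0.481)  # about 0.468, i.e. 46.8%

# Year 2005: applications up 13.4%, yield falls from 39.4% to 34.6%.
g_2005 = enroll_growth(0.134, 0.394, 0.346)  # about -0.004, i.e. -0.4%
```

Both results agree with the % Increase Enroll labels on the chart (46.8% and -0.4%), which supports the assumed mapping.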
Cumulative FTIC Enrollment by Week
• Notice the parallel lines, which imply equal slopes!
• At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17)
• Thus, (Total Enroll – Enroll_t) should be very similar from year to year
[Figure: Cumulative Enroll (0 to 900) by Week (0 to 18), one line per year, 1999 to 2005]
Relationship Between Enrollment & Total Enrollment
[Figure: Enroll_t’ (0 to 140) by Week (0 to 17)]
• By definition, (Total Enroll – Enroll_t) = Enroll_t’
• Model Enroll_t’ and smooth out the consistent patterns by week
Enroll_t’ Model
[Figure: 2003 FTIC Enroll_t’ (0 to 140) by Week (0 to 18)]
• Use 2003 Enroll_t’ Model to predict Enroll_t’ for 2004
Estimate of Enroll_t’ ($R^2 = 0.9857$):
$$\hat{T}_{t'} = 0.1961\,\text{week}^2 - 10.514\,\text{week} + 130.7$$
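The quadratic estimate can be evaluated directly for each week. The sketch below uses the coefficients given above (0.1961, -10.514, 130.7); the function name is an assumption. Rounded to whole students, the values agree with the predicted Enroll_t’ column in the 2004 table that follows.

```python
def predict_enroll_after(week):
    """Quadratic estimate of Enroll_t' (enrollment from applicants after week t)."""
    return 0.1961 * week ** 2 - 10.514 * week + 130.7

# Predictions for weeks 0 through 17, rounded to whole students.
predictions = [round(predict_enroll_after(week)) for week in range(18)]
# [131, 120, 110, 101, 92, 83, 75, 67, 59, 52, 45, 39, 33, 27, 22, 17, 13, 9]
```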
Predict 2004 Enroll_t’
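FTIC 2004 Enrollment Predict

| Week | Total Apply | Actual Enroll | Actual Enroll_t’ | Predicted Enroll_t’ | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 578 | 109 | 131 | 22 |
| 1 | 1,896 | 584 | 103 | 120 | 17 |
| 2 | 1,930 | 594 | 93 | 110 | 17 |
| 3 | 1,951 | 606 | 81 | 101 | 20 |
| 4 | 1,975 | 613 | 74 | 92 | 18 |
| 5 | 1,994 | 620 | 67 | 83 | 16 |
| 6 | 2,005 | 623 | 64 | 75 | 11 |
| 7 | 2,026 | 634 | 53 | 67 | 14 |
| 8 | 2,039 | 638 | 49 | 59 | 10 |
| 9 | 2,058 | 647 | 40 | 52 | 12 |
| 10 | 2,065 | 650 | 37 | 45 | 8 |
| 11 | 2,081 | 653 | 34 | 39 | 5 |
| 12 | 2,097 | 661 | 26 | 33 | 7 |
| 13 | 2,111 | 668 | 19 | 27 | 8 |
| 14 | 2,118 | 671 | 16 | 22 | 6 |
| 15 | 2,122 | 673 | 14 | 17 | 3 |
| 16 | 2,123 | 674 | 13 | 13 | 0 |
| 17 | 2,146 | 687 | 0 | 9 | 9 |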
Predict 2004 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’ Note: 2004 FTIC Actual Total is 687
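2004 FTIC Predict

| Week | Total Apply | Enroll_t | Enroll_t’ | Total | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 131 | 739 | 52 |
| 1 | 1,896 | 615 | 120 | 736 | 49 |
| 2 | 1,930 | 623 | 110 | 733 | 46 |
| 3 | 1,951 | 632 | 101 | 733 | 46 |
| 4 | 1,975 | 638 | 92 | 729 | 42 |
| 5 | 1,994 | 644 | 83 | 727 | 40 |
| 6 | 2,005 | 647 | 75 | 721 | 34 |
| 7 | 2,026 | 655 | 67 | 722 | 35 |
| 8 | 2,039 | 659 | 59 | 718 | 31 |
| 9 | 2,058 | 665 | 52 | 717 | 30 |
| 10 | 2,065 | 667 | 45 | 712 | 25 |
| 11 | 2,081 | 669 | 39 | 708 | 21 |
| 12 | 2,097 | 673 | 33 | 706 | 19 |
| 13 | 2,111 | 678 | 27 | 705 | 18 |
| 14 | 2,118 | 680 | 22 | 702 | 15 |
| 15 | 2,122 | 681 | 17 | 698 | 11 |
| 16 | 2,123 | 681 | 13 | 694 | 7 |
| 17 | 2,146 | 690 | 9 | 698 | 11 |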
Predict 2005 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’ Note: 2005 FTIC Actual Total is 765
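2005 FTIC Predict

| Week | Total Apply | Enroll_t | Enroll_t’ | Total | Off |
|---|---|---|---|---|---|
| 0 | 2,328 | 668 | 109 | 777 | 12 |
| 1 | 2,357 | 675 | 100 | 775 | 10 |
| 2 | 2,390 | 687 | 92 | 779 | 14 |
| 3 | 2,409 | 690 | 84 | 774 | 9 |
| 4 | 2,432 | 696 | 76 | 772 | 7 |
| 5 | 2,444 | 697 | 69 | 766 | 1 |
| 6 | 2,480 | 707 | 61 | 768 | 3 |
| 7 | 2,497 | 712 | 55 | 767 | 2 |
| 8 | 2,521 | 716 | 48 | 764 | -1 |
| 9 | 2,534 | 719 | 42 | 761 | -4 |
| 10 | 2,549 | 722 | 36 | 758 | -7 |
| 11 | 2,564 | 727 | 31 | 758 | -7 |
| 12 | 2,583 | 732 | 26 | 757 | -8 |
| 13 | 2,595 | 736 | 21 | 756 | -9 |
| 14 | 2,606 | 739 | 16 | 755 | -10 |
| 15 | 2,611 | 740 | 12 | 752 | -13 |
| 16 | 2,617 | 742 | 8 | 750 | -15 |
| 17 | 2,652 | 755 | 5 | 760 | -5 |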
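The combined prediction adds the two pieces and compares the sum with the actual 12th-day total. The sketch below reproduces the Week 0 rows of the 2004 and 2005 tables; the function name is an assumption. A few table rows differ by one from the sum of the displayed components (for example Week 1 of 2004: 615 + 120 versus 736), presumably because the slide summed unrounded values; that explanation is a guess, not something the slides state.

```python
def combine(enroll_t, enroll_after, actual_total):
    """Return (predicted total, predicted total minus actual total)."""
    total = enroll_t + enroll_after
    return total, total - actual_total

# Week 0 of each year: (Enroll_t, Enroll_t', actual FTIC total) from the slides.
total_2004, off_2004 = combine(608, 131, 687)  # (739, 52)
total_2005, off_2005 = combine(668, 109, 765)  # (777, 12)
```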
- END -
Thank you!
Any Questions?