2016 Democratic Primary: Prediction of Results for New York Counties

Lavneet Sidhu | Nikita Bali | Sanjita Jain | Subhasree Chatterjee

Uploaded by sanjita-jain on 11 February 2017


OBJECTIVE

Predicting how many New York counties each Democratic candidate (Hillary Clinton or Bernie Sanders) would win in the 2016 Democratic presidential primary, based on the demographic data of counties in other US states

Finding out whether there is a pattern in how people vote based on their demographic information

DATA

Datasets: Primary Results and County Facts

29 states, 1928 counties, 33 demographic variables

Training set: 1542 counties (80 percent)
Testing set: 386 counties (20 percent)

Variables include population %, female %, percentages of different ethnicities, educational background, income and number of votes
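The slides give the 80/20 split but not how counties were assigned to each set. A minimal sketch, assuming a simple seeded random split over county identifiers (the function name, the seed and the `county_i` identifiers are all hypothetical), shows how 1928 counties divide into 1542 and 386:

```python
import random


def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` with a fixed seed and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # truncates 1928 * 0.8 = 1542.4 to 1542
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for the 1928 counties in the dataset.
counties = [f"county_{i}" for i in range(1928)]
train, test = train_test_split(counties)
print(len(train), len(test))  # 1542 386
```

A stratified split, which keeps the Clinton/Sanders proportion equal in both sets, would be a reasonable alternative; the slides do not say which approach was used.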

Exploratory Data Analysis

Figure: distribution of votes between the Democratic candidates by demographic information

The exploratory data analysis was done in Python.

Correlation matrix:
1. The highest correlation is between the percentage of the population that speaks a language other than English at home and the percentage that is Hispanic or Latino.
2. The percentage of the population born outside the US is also very highly correlated with both of these variables: the percentage that speaks a language other than English at home and the percentage that is Hispanic or Latino.
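Each entry of the correlation matrix is a correlation coefficient between two demographic columns. A minimal stdlib sketch, assuming Pearson correlation (the slides do not name the coefficient) and using made-up percentages for five hypothetical counties rather than the real data; `pearson` and `correlation_matrix` are hypothetical helper names:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up percentages for five hypothetical counties (not the real data).
columns = {
    "pct_hispanic": [5.0, 12.0, 30.0, 45.0, 60.0],
    "pct_other_language": [8.0, 15.0, 33.0, 50.0, 64.0],
    "pct_female": [50.1, 49.8, 50.6, 51.0, 50.3],
}
corr = correlation_matrix(columns)
print(round(corr["pct_hispanic"]["pct_other_language"], 3))
```

In pandas, `DataFrame.corr()` (Pearson by default) computes the whole matrix in one call.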

Exploratory Data Analysis

Figures: fraction of votes plotted against % African American, % non-English speaking, % females and % over 65

Logistic Regression Model

Response variable: Winner (Clinton = 0, Sanders = 1)

Variable selection: age, % females, % whites, % African American, % Native American, % Hispanic or Latino, % foreign born, % education, % veterans, home ownership rate, median value of house, persons per household, per capita income, % of Asian-owned firms, etc.

Model selection:

                  Full Model   Step AIC   Step BIC
AIC                  1086.87    1068.88    1076.96
BIC                  1268.46    1181.05    1157.08
AUC (training)         0.918      0.917      0.913
AUC (testing)          0.794      0.760      0.771
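The AIC and BIC rows can be cross-checked with a little arithmetic. Assuming the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, their difference is k (ln(n) - 2), so the number of estimated parameters k can be recovered from each column. The sketch assumes the models were fit on the n = 1542 training counties; `implied_parameters` is a hypothetical helper name.

```python
import math


def implied_parameters(aic, bic, n):
    """Parameter count k implied by AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L,
    using BIC - AIC = k (ln(n) - 2)."""
    return (bic - aic) / (math.log(n) - 2)


# (AIC, BIC) values from the model-selection table; n = 1542 is an assumption
# (the training-set size), not something the slide states for these fits.
MODELS = {
    "Full Model": (1086.87, 1268.46),
    "Step AIC": (1068.88, 1181.05),
    "Step BIC": (1076.96, 1157.08),
}

for name, (aic, bic) in MODELS.items():
    k = implied_parameters(aic, bic, n=1542)
    neg2_loglik = aic - 2 * round(k)  # -2 ln L recovered from AIC
    print(f"{name}: k = {k:.2f}, -2 ln L = {neg2_loglik:.2f}")
```

Under these assumptions the three columns come out at about 34, 21 and 15 parameters. 34 would match 33 demographic variables plus an intercept, and the smaller BIC-selected model is what BIC's larger per-parameter penalty (ln 1542 is about 7.34, versus 2 for AIC) would lead one to expect.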

ROC and Misclassification Rate

Figures: training ROC curve; testing ROC curve

Training confusion matrix (misclassification rate: 0.165):

             Clinton   Sanders
Clinton          920       131
Sanders          124       367

Testing confusion matrix (misclassification rate: 0.184):

             Clinton   Sanders
Clinton          226        36
Sanders           35        89
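The misclassification rate is the off-diagonal share of a confusion matrix, and the AUC can be computed without drawing the ROC curve, as the probability that a randomly chosen Sanders county receives a higher predicted score than a randomly chosen Clinton county. A minimal sketch; the function names are hypothetical, and the four scores used for the AUC example are invented:

```python
def misclassification_rate(matrix):
    """Share of off-diagonal cells in a square confusion matrix given as rows."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def auc(scores, labels):
    """ROC AUC via pairwise comparison: probability that a positive (label 1,
    Sanders) outscores a negative (label 0, Clinton); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Confusion matrices from the slide (the rate does not depend on row/column order).
train_cm = [[920, 131], [124, 367]]
test_cm = [[226, 36], [35, 89]]
print(round(misclassification_rate(train_cm), 3))  # 0.165
print(round(misclassification_rate(test_cm), 3))   # 0.184

# Invented predicted probabilities for four hypothetical counties.
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The pairwise AUC is quadratic in the number of counties, which is fine at this scale; a rank-based formula gives the same value faster.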

Random Forest

Type: classification
Number of trees: 1000
Variables tried at each split: 5
OOB estimate of error rate: 17.3%

Confusion matrix (rows: actual class; columns: predicted class):

             Clinton   Sanders   Class error
Clinton         1169       136         0.104
Sanders          197       426         0.316

Figure: variable importance
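The printed settings appear to follow the output format of R's randomForest package, whose default number of variables tried at each split for classification is floor(sqrt(p)); with p = 33 demographic variables that gives 5. The sketch below is not the authors' code. It illustrates the two sources of randomness (a random subset of variables per split and a bootstrap sample per tree, whose left-out counties are "out of bag" and are later predicted only by trees that did not see them) and recomputes the 17.3% OOB error from the confusion matrix. Function names and the seed are hypothetical.

```python
import math
import random


def default_mtry(n_features):
    """floor(sqrt(p)): a common default number of variables tried per split in classification."""
    return max(1, math.floor(math.sqrt(n_features)))


def bootstrap_with_oob(n_rows, rng):
    """Sample row indices with replacement; rows never drawn are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    oob = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, oob


rng = random.Random(0)
mtry = default_mtry(33)                       # 33 demographic variables -> 5
split_features = rng.sample(range(33), mtry)  # candidate variables for one split
in_bag, oob = bootstrap_with_oob(1928, rng)   # one tree's training rows
print(mtry, round(len(oob) / 1928, 2))        # OOB share is close to 1/e (about 0.37)

# OOB error from the slide's confusion matrix: off-diagonal counts / all counties.
oob_error = (136 + 197) / (1169 + 136 + 197 + 426)
print(round(oob_error, 3))  # 0.173
```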

Principal Component Analysis Regression

The demographic data are standardized before performing principal component analysis, which yields 33 uncorrelated components. We keep 8 of the 33 components for further analysis because together they explain 80% of the variance in the data.

Testing confusion matrix:

             Clinton   Sanders
Clinton         1129       176
Sanders          256       367

Testing AUC = 86%
Testing error = 22%

Figures: importance of components; testing ROC curve
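Choosing how many components to keep reduces to a cumulative-variance rule: after each variable is standardized, the eigenvalues of the correlation matrix give each component's variance, and one keeps the smallest number of leading components whose combined share reaches the target (80% here). A minimal sketch with hypothetical helper names and invented eigenvalues, since the slide does not list the actual ones:

```python
import statistics


def standardize(column):
    """Rescale a column to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]


def components_for_variance(eigenvalues, target=0.80):
    """Smallest number of leading components whose cumulative share of
    total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Invented eigenvalues for a toy 5-variable problem (not the slide's 33 components).
print(components_for_variance([4.0, 3.0, 1.5, 1.0, 0.5]))  # 3
print(standardize([10.0, 20.0, 30.0]))
```

Standardizing first matters because the demographic variables are on very different scales (percentages, dollar incomes, house values); without it, the largest-scale variables would dominate the leading components.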

Model Validation

Counties won by Clinton (C) and Sanders (S), actual vs. predicted:

State (counties)   Actual C/S   Logistic Model C/S   Random Forest C/S   PCA Regression C/S
Washington (39)       0/39            5/34                 4/35                2/37
Hawaii (5)            0/5             2/3                  0/5                 0/5
Alaska (29)           0/29            2/27                 1/28                0/29
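Because the slide reports only per-state totals, per-county accuracy cannot be read off directly in general. Here, though, Sanders won every county in all three states, so each county a model assigns to Clinton is an error. A minimal sketch using the counts as read from the slide (the dictionary layout and function names are hypothetical):

```python
# (Clinton, Sanders) county counts as read from the validation slide.
ACTUAL = {"Washington": (0, 39), "Hawaii": (0, 5), "Alaska": (0, 29)}
PREDICTED = {
    "Logistic Model": {"Washington": (5, 34), "Hawaii": (2, 3), "Alaska": (2, 27)},
    "Random Forest": {"Washington": (4, 35), "Hawaii": (0, 5), "Alaska": (1, 28)},
    "PCA Regression": {"Washington": (2, 37), "Hawaii": (0, 5), "Alaska": (0, 29)},
}


def min_misclassified(pred, actual):
    """Lower bound on misclassified counties from totals alone; exact when
    one candidate actually won every county in the state."""
    return abs(pred[0] - actual[0])


def validation_accuracy(model):
    """Share of the 73 validation counties classified correctly by `model`."""
    total = sum(sum(counts) for counts in ACTUAL.values())
    errors = sum(min_misclassified(PREDICTED[model][s], ACTUAL[s]) for s in ACTUAL)
    return (total - errors) / total


for model in PREDICTED:
    print(model, round(validation_accuracy(model), 3))
```

For a state where both candidates won counties, the same difference between predicted and actual totals is only a lower bound on the number of misclassified counties.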

Factor Analysis

The purpose of factor analysis is to find a smaller number of unobserved, uncorrelated variables (factors) that explain the observed variables.

Using these factors, we should be able to distinguish the voting patterns for the Democratic candidates based on the demographic data of each county.

We tried factor analysis at the following levels:
1. County demographic data
2. State demographic data
3. Demographic data grouped by winner
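For reference, the standard orthogonal factor model formalizes this idea: a vector x of p observed (standardized) demographic variables with mean μ is written as a linear combination of m < p uncorrelated latent factors f through a p × m loading matrix L, plus variable-specific noise ε. This is one common formulation; the slides do not state which estimation method or rotation was used.

```latex
\mathbf{x} - \boldsymbol{\mu} = \mathbf{L}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{f}) = \mathbf{I}_m,\quad
\operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}\ \text{(diagonal)},\quad
\operatorname{Cov}(\mathbf{f}, \boldsymbol{\varepsilon}) = \mathbf{0}
\;\;\Longrightarrow\;\;
\operatorname{Cov}(\mathbf{x}) = \mathbf{L}\mathbf{L}^{\top} + \boldsymbol{\Psi}
```

Because the shared correlations among observed variables are carried entirely by the LLᵀ term, groups of strongly correlated variables (such as the ethnicity and language variables in the correlation matrix) tend to load on the same factor.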

Factor Analysis (Cont’d)

We obtained 2 factors for both the state and the county demographic data. The 1st factor describes ethnicity; the 2nd factor is based on population and industrial exposure.

State: all states seem to exhibit similar behavior except Hawaii, Alaska and the District of Columbia.

County: all counties seem to exhibit similar behavior.

Factor Analysis (Cont’d)

We obtained 3 factors for the winner-based demographic data:
• Factor 1 concentrates on the population and the median income of the county.
• Factor 2 can be interpreted as the Hispanic and foreign-born (non-native) population.
• Factor 3 can be interpreted as economic prosperity and the white/black population of the county.

Clinton receives the majority of votes in counties where median income is higher and where the foreign-born and Hispanic populations are larger.

NEW YORK RESULTS

New York: 62 counties. Counties won by Clinton (C) and Sanders (S), actual vs. predicted:

Actual C/S   Logistic Model C/S   Random Forest C/S   PCA Regression C/S
  13/49            25/37                27/35               6/56

CONCLUSION

Hillary Clinton seems to be favored in counties where:
• Median income is higher
• The percentage of Hispanic and African American population is higher

Sanders tends to win counties whose population is predominantly white.

Different modeling techniques gave similar results.

Thank You