homepages.cae.wisc.eduhomepages.cae.wisc.edu/~ece539/project/s16/xu_rpt.docx · web viewgeorge bush...

National Election Prediction

Lei Xu

University of Wisconsin—Madison

ECE/CS 539 Introduction to Artificial Neural Networks and Fuzzy Systems

ABSTRACT

Presidential elections are always a hot topic in this country. With a new election cycle approaching, I am curious about how individuals choose their president. Is there any regularity that lets us predict a person's choice from his or her characteristics?

For my project, I construct a multi-layer perceptron (MLP) artificial neural network to predict an individual's vote based on four main features: race, age, education level, and income. To make the prediction, we first need to treat the problem as a classification task and train the artificial neural network. The configuration of the network also plays an essential role, so finding the best-performing ANN structure is a key part of the project. Finally, if I can get access to sufficiently detailed election data, I will be able to predict the final result of the election with the trained artificial neural network.

PROBLEM STATEMENT

It is extremely difficult, if not impossible, for a politician to estimate whether he or she can win the coming election. A politician may be able to predict the vote in each individual region. However, the final outcome is always hard to foresee, since it may depend largely on the campaigns of the other competitors.

In my project, however, I simplify the problem into a classification problem: four features of an individual voter are used to judge that voter's choice. An artificial neural network processes the voter data and generates the results.

BACKGROUND

1. About the election

The United States presidential election of 2016, scheduled for Tuesday, November 8, 2016, will be the 58th quadrennial U.S. presidential election. Voters will select presidential electors who in turn will elect a new president and vice president through the Electoral College.

The series of presidential primary elections and caucuses is taking place between February 1 and June 14, 2016, staggered among the 50 states, the District of Columbia and U.S. territories. This nominating process is also an indirect election, where voters cast ballots for a slate of delegates to a political party's nominating convention, who then in turn elect their party's presidential nominee. The 2016 Republican National Convention will take place from July 18 to July 21, 2016 in Cleveland, Ohio. The 2016 Democratic National Convention will take place from July 25 to July 28, 2016 in Philadelphia, Pennsylvania.

Businessman and reality television personality Donald Trump became the presumptive nominee of the Republican Party on May 3, 2016, after his win in the Indiana primary and the suspension of Ted Cruz's and John Kasich's campaigns. He is expected to face the as yet undetermined nominee of the Democratic Party in the general election, presumably either Hillary Clinton or Bernie Sanders.

2. Multilayer perceptron

A multi-layer perceptron is a type of feed-forward neural network of threshold units. Multi-layer perceptrons are composed of an input layer of neurons, successive layers of intermediate (hidden) units, and a layer of output neurons. The output of each layer is connected to the input of the next layer. A synaptic weight is associated with each unique connection between neurons in neighboring layers. Each neuron is associated with a hyperplane and classifies its input according to which side of the hyperplane the input falls on; this classification is then passed on to the neurons in the next layer. To be used for classification, the weights and activation functions of each neuron must be calibrated so that when a feature vector is presented to the input layer, the correct classification vector is produced by the output neurons.
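The layer-by-layer computation described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB code used in the project; the layer sizes `[4, 10, 10, 2]`, the random weights, and the input values are assumptions chosen only to make the example run.

```python
import numpy as np

def sigmoid(z):
    """Smooth threshold unit: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate one feature vector through every layer.

    Returns the activations of all layers, input included.
    """
    a = np.asarray(x, dtype=float)
    activations = [a]
    for W, b in zip(weights, biases):
        # Weighted sum over the previous layer, then the activation function
        a = sigmoid(a @ W + b)
        activations.append(a)
    return activations

# Assumed architecture: 4 inputs, two hidden layers of 10 units, 2 outputs
rng = np.random.default_rng(0)
layer_sizes = [4, 10, 10, 2]
weights = [rng.uniform(-1, 1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = np.array([1.0, 0.60, 0.14, 0.33])   # hypothetical, already-scaled features
activations = forward(x, weights, biases)
out = activations[-1]                    # output-layer activations
print(out)
```

With untrained random weights the output is arbitrary; training adjusts `weights` and `biases` so that the output matches the desired class vector.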

3. Back-propagation

Backpropagation is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function.

Backpropagation requires a known, desired output (here, an individual's vote) for each input value (the four-dimensional feature vector) in order to calculate the gradient of the loss function. It is therefore usually considered to be a supervised learning method. It is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer.
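To make the chain-rule step concrete, the sketch below computes the gradients for a network with one hidden layer and compares them with a finite-difference estimate. It is an illustration in Python with NumPy under stated assumptions (squared-error loss, sigmoid units, no bias terms, a 4-5-2 architecture, learning rate 0.15), not the project's MATLAB implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(params, x, t):
    """Squared-error loss of a 1-hidden-layer sigmoid network (assumed form)."""
    W1, W2 = params
    y = sigmoid(sigmoid(x @ W1) @ W2)
    return 0.5 * np.sum((y - t) ** 2)

def backprop(params, x, t):
    """Gradients of the loss w.r.t. W1 and W2 via the chain rule."""
    W1, W2 = params
    h = sigmoid(x @ W1)                      # hidden activations
    y = sigmoid(h @ W2)                      # output activations
    delta2 = (y - t) * y * (1 - y)           # error signal at the output layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)   # error propagated back one layer
    return [np.outer(x, delta1), np.outer(h, delta2)]

def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for W in params:
        g = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps
            up = loss(params, x, t)
            W[idx] = old - eps
            down = loss(params, x, t)
            W[idx] = old
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(1)
params = [rng.uniform(-1, 1, (4, 5)), rng.uniform(-1, 1, (5, 2))]
x = np.array([1.0, 0.60, 0.14, 0.33])   # hypothetical scaled feature vector
t = np.array([1.0, 0.0])                # desired output

analytic = backprop(params, x, t)
numeric = numerical_grad(params, x, t)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))

# One gradient-descent step with an assumed learning rate
eta = 0.15
loss_before = loss(params, x, t)
params = [W - eta * g for W, g in zip(params, analytic)]
loss_after = loss(params, x, t)
print(max_diff, loss_before, loss_after)
```

Agreement between the two gradient estimates is a common sanity check before trusting a hand-written backward pass.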

4. Cross-validation

Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first-seen data) against which the model is tested (testing dataset).

It involves partitioning a sample of data into complementary subsets: a training set and a testing set. To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.

In summary, cross-validation combines (averages) measures of fit (prediction error) to correct for the optimistic nature of training error and derive a more accurate estimate of model prediction performance.
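The partition-and-average procedure can be sketched as below, in Python with NumPy. The function names `k_fold_indices` and `cross_validate` are hypothetical, and the majority-class "model" and synthetic labels are stand-ins used only so the example runs; a real experiment would train the network on each training partition instead.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k complementary partitions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_validate(n_samples, k, evaluate):
    """Average the error returned by evaluate(train_idx, test_idx) over k rounds."""
    errors = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return float(np.mean(errors))

# Synthetic labels, not real election data
labels = np.array([0, 1] * 1000)

def majority_baseline_error(train_idx, test_idx):
    """Stand-in for 'train a model, then measure its test error'."""
    majority = int(labels[train_idx].mean() >= 0.5)
    return float(np.mean(labels[test_idx] != majority))

splits = list(k_fold_indices(len(labels), 4))
avg_error = cross_validate(len(labels), 4, majority_baseline_error)
print(len(splits[0][0]), len(splits[0][1]), avg_error)
```

With 2000 samples and 4 folds, each round trains on 1500 samples and tests on the remaining 500, and every sample is used for testing exactly once.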

IMPLEMENTATION

Data

For now, I use the turnout data set from https://vincentarelbundock.github.io/Rdatasets/doc/Zelig/turnout.html, which contains individual-level turnout data and pools several American National Election Surveys conducted during the 1992 presidential election year.

Example:

race   age  educate  income  vote
white  60   14       3.3458  1
white  51   10       1.8561  0
white  24   12       0.6304  0

In the last column, 1 represents the choice "Bill Clinton" and 0 represents the choice "George Bush".

Feature vectors

The features analyzed in this project are race, age, education background, and income (thousand per month). The reason is simple: they are the data available in the data set, and they also contribute to one's final vote decision to some extent. However, we cannot deny that other, more essential factors, such as political stance or occupation, play a larger role in one's decision. At first I use this data set for the samples; it will be replaced if a more suitable data set can be found.

Each feature vector contains 4 features and 1 label, giving a total of 5 positions. The layout of a feature vector is as follows.

Race  Age  Educate  Income  Label
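As an illustration of this layout, the sketch below turns the three example rows from the data set into numeric feature vectors and labels, using Python with NumPy. The numeric coding of race (1.0 for "white", 0.0 otherwise) is an assumption made for this example; the report does not state which coding was used.

```python
import numpy as np

# The three example rows shown in the Data section
rows = [
    ("white", 60, 14, 3.3458, 1),
    ("white", 51, 10, 1.8561, 0),
    ("white", 24, 12, 0.6304, 0),
]

def encode(row):
    """Split one record into a 4-element feature vector and its label."""
    race, age, educate, income, vote = row
    race_code = 1.0 if race == "white" else 0.0   # assumed coding, not from the report
    return [race_code, float(age), float(educate), float(income)], int(vote)

encoded = [encode(r) for r in rows]
X = np.array([features for features, _ in encoded])   # shape: (samples, 4)
y = np.array([label for _, label in encoded])          # shape: (samples,)
print(X)
print(y)
```

Because age and income are on very different scales, rescaling the columns of `X` before training is a common extra step, although the report does not say whether it was done here.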

Model

For this project, the model I plan to use is a multi-layer perceptron (MLP) with 4 inputs, 1 output, and 2 hidden layers with 50 neurons each. Each neuron uses a sigmoidal activation function; the alpha value is initially set to 0.1 and the momentum to 0.9.

With the 2000 samples, I use n-way cross-validation to tune the network, for example with 1500 samples as training data and the remaining 500 samples as testing data. The MLP is applied to predict the vote choice of an individual (Clinton or Bush). The corresponding output labels are as follows:

[1 0]: Clinton
[0 1]: Bush
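The label coding above can be sketched in Python with NumPy as follows. The helper names `to_one_hot` and `decode` are hypothetical; picking the larger of the two outputs is one reasonable decoding rule and is assumed here for illustration.

```python
import numpy as np

CLASS_NAMES = ["Clinton", "Bush"]   # position 0 -> [1 0], position 1 -> [0 1]

def to_one_hot(votes):
    """Map vote labels (1 = Clinton, 0 = Bush) to two-element target vectors."""
    votes = np.asarray(votes)
    targets = np.zeros((len(votes), 2))
    targets[votes == 1, 0] = 1.0   # [1 0] for Clinton
    targets[votes == 0, 1] = 1.0   # [0 1] for Bush
    return targets

def decode(outputs):
    """Turn network outputs into class names by picking the larger output."""
    return [CLASS_NAMES[i] for i in np.argmax(outputs, axis=1)]

targets = to_one_hot([1, 0, 0])
predictions = decode(np.array([[0.8, 0.3], [0.2, 0.6]]))   # made-up outputs
print(targets)
print(predictions)
```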

RESULT

Hidden layers   η      MSE
[10 10]         0.15   0.020619
[10 10]         0.01   0
[10 10]         0.10   0
[10 10]         0.15   0.020619
[10 10 10]      0.15   0.1856
[5 5]           0.15   0

The results above show that the performance of the ANN, measured by mean square error (MSE), varies with the network configuration. In general, the simpler structures did a better job, possibly because a more complex ANN structure is prone to over-fitting. In statistics and machine learning, one of the most common tasks is to fit a "model" to a set of training data so as to be able to make reliable predictions on general, unseen data.

In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit has poor predictive performance, as it overreacts to minor fluctuations in the training data.
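The appendix code computes the MSE from predicted and target class labels (0 or 1) rather than from the raw network outputs. Under that definition the MSE equals the fraction of misclassified samples, as the short Python sketch below illustrates; the label vectors are made up for the example.

```python
import numpy as np

def label_mse(predicted, target):
    """Mean square error between predicted and target class labels."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((predicted - target) ** 2))

# Made-up labels: 10 samples, 2 of them misclassified
target = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0]

mse = label_mse(predicted, target)
error_rate = float(np.mean(np.array(predicted) != np.array(target)))
print(mse, error_rate)
```

Read this way, an MSE of 0 means every evaluated sample was classified correctly.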

DISCUSSION AND FUTURE

Because the data are so simple, it is hard for me to draw a general conclusion about the prediction problem. The variety of samples is limited: many of them are very similar but yield different vote results. My project is almost finished, but it is far from perfect. The current data set is rather simple, and in the future I would like to try a more challenging and complex data set and construct a more complex MLP.

REFERENCE

[1] King, Gary, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation," American Journal of Political Science, vol. 44, pp. 341–355.

[2] http://heraqi.blogspot.com.eg/2015/11/mlp-neural-network-with-backpropagation.html

[3] http://neuralnetworksanddeeplearning.com/chap3.html

[4] Professor Hu, lecture slides

[5] Michael Nielsen, "Neural Networks and Deep Learning," http://neuralnetworksanddeeplearning.com/about.html

Appendix

Matlab code

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Multilayer Perceptron (MLP) Neural Network Function using MATLAB:
%% An implementation for Multilayer Perceptron Feed Forward Fully
%% Connected Neural Network with a sigmoid activation function. The
%% training is done using the Backpropagation algorithm with options for
%% Resilient Gradient Descent, Momentum Backpropagation, and Learning
%% Rate Decrease. The training stops when the Mean Square Error (MSE)
%% reaches zero or a predefined maximum number of epochs is reached.
%%
%% Four example data for training and testing are included with the
%% project. They are generated by SharkTime Sharky Neural Network
%% (http://sharktime.com/us_SharkyNeuralNetwork.html)
%%
%% Copyright (C) 9-2015 Hesham M. Eraqi. All rights reserved.
%% [email protected]
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Clear Variables, Close Current Figures, and Create Results Directory
clc;
clear all;
close all;
mkdir('Results//'); %Directory for Storing Results

%% Configurations/Parameters
dataFileName = 'sharky.spirals.points'; %sharky.linear.points - sharky.circle.points - sharky.wave.points - sharky.spirals.points
nbrOfNeuronsInEachHiddenLayer = [10]; %linear:[4] - circle:[10] - wave,spirals:[10 10]
nbrOfOutUnits = 2;
unipolarBipolarSelector = 0; %0 for Unipolar, -1 for Bipolar

learningRate = 0.15;
nbrOfEpochs_max = 5000;

enable_resilient_gradient_descent = 1; %1 for enable, 0 for disable
learningRate_plus = 1.2;
learningRate_negative = 0.5;
deltas_start = 0.9;
deltas_min = 10^-6;
deltas_max = 50;

enable_decrease_learningRate = 0; %1 for enable decreasing, 0 for disable
learningRate_decreaseValue = 0.0001;
min_learningRate = 0.05;

enable_learningRate_momentum = 0; %1 for enable, 0 for disable
momentum_alpha = 0.05;

draw_each_nbrOfEpochs = 100;

%% Read Data
importedData = importdata(dataFileName, '\t', 6);
Samples = importedData.data(:, 1:length(importedData.data(1,:))-1);
TargetClasses = importedData.data(:, length(importedData.data(1,:)));
TargetClasses = TargetClasses - min(TargetClasses);
ActualClasses = -1*ones(size(TargetClasses));

%% Calculate Number of Input and Output NodesActivations
nbrOfInputNodes = length(Samples(1,:)); %=Dimension of Any Input Samples
% nbrOfOutUnits = ceil(log2(length(unique(TargetClasses)))) + !; %Ceil(Log2( Number of Classes ))

nbrOfLayers = 2 + length(nbrOfNeuronsInEachHiddenLayer);
nbrOfNodesPerLayer = [nbrOfInputNodes nbrOfNeuronsInEachHiddenLayer nbrOfOutUnits];

%% Adding the Bias as Nodes with a fixed Activation of 1
nbrOfNodesPerLayer(1:end-1) = nbrOfNodesPerLayer(1:end-1) + 1;
Samples = [ones(length(Samples(:,1)),1) Samples];

%% Calculate TargetOutputs %TODO needs to be general for any nbrOfOutUnits
TargetOutputs = zeros(length(TargetClasses), nbrOfOutUnits);
for i=1:length(TargetClasses)
    if (TargetClasses(i) == 1)
        TargetOutputs(i,:) = [1 unipolarBipolarSelector];
    else
        TargetOutputs(i,:) = [unipolarBipolarSelector 1];
    end
end

%% Initialize Random Weights Matrices
Weights = cell(1, nbrOfLayers); %Weights connecting bias nodes with previous layer are useless, but to make code simpler and faster
Delta_Weights = cell(1, nbrOfLayers);
ResilientDeltas = Delta_Weights; % Needed in case that Resilient Gradient Descent is used
for i = 1:length(Weights)-1
    Weights{i} = 2*rand(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1))-1; %RowIndex: From Node Number, ColumnIndex: To Node Number
    Weights{i}(:,1) = 0; %Bias nodes weights with previous layer (Redundant step)
    Delta_Weights{i} = zeros(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
    ResilientDeltas{i} = deltas_start*ones(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
end
Weights{end} = ones(nbrOfNodesPerLayer(end), 1); %Virtual Weights for Output Nodes
Old_Delta_Weights_for_Momentum = Delta_Weights;
Old_Delta_Weights_for_Resilient = Delta_Weights;

NodesActivations = cell(1, nbrOfLayers);
for i = 1:length(NodesActivations)
    NodesActivations{i} = zeros(1, nbrOfNodesPerLayer(i));
end
NodesBackPropagatedErrors = NodesActivations; %Needed for Backpropagation Training Backward Pass

zeroRMSReached = 0;
nbrOfEpochs_done = 0;

%% Iterating all the Data
MSE = -1 * ones(1,nbrOfEpochs_max);
for Epoch = 1:nbrOfEpochs_max

    for Sample = 1:length(Samples(:,1))
        %% Backpropagation Training
        %Forward Pass
        NodesActivations{1} = Samples(Sample,:);
        for Layer = 2:nbrOfLayers
            NodesActivations{Layer} = NodesActivations{Layer-1}*Weights{Layer-1};
            NodesActivations{Layer} = Activation_func(NodesActivations{Layer}, unipolarBipolarSelector);
            if (Layer ~= nbrOfLayers) %Because bias nodes don't have weights connected to previous layer
                NodesActivations{Layer}(1) = 1;
            end
        end

        % Backward Pass Errors Storage
        % (As gradient of the bias nodes are zeros, they won't contribute to previous layer errors nor delta_weights)
        NodesBackPropagatedErrors{nbrOfLayers} = TargetOutputs(Sample,:)-NodesActivations{nbrOfLayers};
        for Layer = nbrOfLayers-1:-1:1
            gradient = Activation_func_drev(NodesActivations{Layer+1}, unipolarBipolarSelector);
            for node=1:length(NodesBackPropagatedErrors{Layer}) % For all the Nodes in current Layer
                NodesBackPropagatedErrors{Layer}(node) = sum( NodesBackPropagatedErrors{Layer+1} .* gradient .* Weights{Layer}(node,:) );
            end
        end

        % Backward Pass Delta Weights Calculation (Before multiplying by learningRate)
        for Layer = nbrOfLayers:-1:2
            derivative = Activation_func_drev(NodesActivations{Layer}, unipolarBipolarSelector);
            Delta_Weights{Layer-1} = Delta_Weights{Layer-1} + NodesActivations{Layer-1}' * (NodesBackPropagatedErrors{Layer} .* derivative);
        end
    end

    %% Apply resilient gradient descent or/and momentum to the delta_weights
    if (enable_resilient_gradient_descent) % Handle Resilient Gradient Descent
        if (mod(Epoch,200)==0) %Reset Deltas
            for Layer = 1:nbrOfLayers
                ResilientDeltas{Layer} = learningRate*Delta_Weights{Layer};
            end
        end
        for Layer = 1:nbrOfLayers-1
            mult = Old_Delta_Weights_for_Resilient{Layer} .* Delta_Weights{Layer};
            ResilientDeltas{Layer}(mult > 0) = ResilientDeltas{Layer}(mult > 0) * learningRate_plus; % Sign didn't change
            ResilientDeltas{Layer}(mult < 0) = ResilientDeltas{Layer}(mult < 0) * learningRate_negative; % Sign changed
            ResilientDeltas{Layer} = max(deltas_min, ResilientDeltas{Layer});
            ResilientDeltas{Layer} = min(deltas_max, ResilientDeltas{Layer});

            Old_Delta_Weights_for_Resilient{Layer} = Delta_Weights{Layer};

            Delta_Weights{Layer} = sign(Delta_Weights{Layer}) .* ResilientDeltas{Layer};
        end
    end
    if (enable_learningRate_momentum) %Apply Momentum
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate*Delta_Weights{Layer} + momentum_alpha*Old_Delta_Weights_for_Momentum{Layer};
        end
        Old_Delta_Weights_for_Momentum = Delta_Weights;
    end
    if (~enable_learningRate_momentum && ~enable_resilient_gradient_descent)
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate * Delta_Weights{Layer};
        end
    end

    %% Backward Pass Weights Update
    for Layer = 1:nbrOfLayers-1
        Weights{Layer} = Weights{Layer} + Delta_Weights{Layer};
    end

    % Resetting Delta_Weights to Zeros
    for Layer = 1:length(Delta_Weights)
        Delta_Weights{Layer} = 0 * Delta_Weights{Layer};
    end

    %% Decrease Learning Rate
    if (enable_decrease_learningRate)
        new_learningRate = learningRate - learningRate_decreaseValue;
        learningRate = max(min_learningRate, new_learningRate);
    end

    %% Evaluation
    for Sample = 1:length(Samples(:,1))
        outputs = EvaluateNetwork(Samples(Sample,:), NodesActivations, Weights, unipolarBipolarSelector);
        bound = (1+unipolarBipolarSelector)/2;

        if (outputs(1) >= bound && outputs(2) < bound) %TODO: Not generic role for any number of output nodes
            ActualClasses(Sample) = 1;
        elseif (outputs(1) < bound && outputs(2) >= bound)
            ActualClasses(Sample) = 0;
        else
            if (outputs(1) >= outputs(2))
                ActualClasses(Sample) = 1;
            else
                ActualClasses(Sample) = 0;
            end
        end
    end

    MSE(Epoch) = sum((ActualClasses-TargetClasses).^2)/(length(Samples(:,1)));
    if (MSE(Epoch) == 0)
        zeroRMSReached = 1;
    end

    %% Visualization
    if (zeroRMSReached || mod(Epoch,draw_each_nbrOfEpochs)==0)
        % Draw Mean Square Error
        subplot(2,1,2);
        MSE(MSE==-1) = [];
        plot([MSE(1:Epoch)]);
        ylim([-0.1 0.6]);
        title('Mean Square Error');
        xlabel('Epochs');
        ylabel('MSE');
        grid on;

        saveas(gcf, sprintf('Results//fig%i.png', Epoch),'jpg');
        pause(0.05);
    end
    display([int2str(Epoch) ' Epochs done out of ' int2str(nbrOfEpochs_max) ' Epochs. MSE = ' num2str(MSE(Epoch)) ' Learning Rate = ' ...
        num2str(learningRate) '.']);

    nbrOfEpochs_done = Epoch;
    if (zeroRMSReached)
        saveas(gcf, sprintf('Results//Final Result for %s.png', dataFileName),'jpg');

        break;
    end

end
display(['Mean Square Error = ' num2str(MSE(nbrOfEpochs_done)) '.']);
