Lecture 3: Introduction to Neural Networks with TensorFlow (Classification)


Page 1:

Lecture 3: Introduction to Neural Networks with TensorFlow (Classification)

by: Dr. Oleksiy Khriyenko
IT Faculty, University of Jyväskylä

TIES4911 Deep Learning for Cognitive Computing for Developers, Spring 2020

Page 2:

Acknowledgement

I am grateful to all the creators/owners of the images that I found from Google and have used in this presentation.

Page 3:

Classification

Classification is the process of categorizing a group of objects…

Existing classifiers include:
o Logistic Regression
o Support Vector Machines (SVM)
o Decision Tree
o Naive Bayes
o Neural Networks
o …

In comparison to other classifiers, neural networks start to outperform the competition when the patterns get really complex. The main advantage of DNNs is automated feature extraction (features that are not specified/set explicitly).


Page 4:

Relevant links:
https://www.edureka.co/blog/perceptron-learning-algorithm/


Page 5:

Logistic Regression


Page 6:

scikit-learn - Machine Learning in Python: https://scikit-learn.org/stable/
Dataset: https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/datasets/Social_Network_Ads.csv

https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day%206%20Logistic%20Regression.md

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# from sklearn.cross_validation import train_test_split  ### deprecated in new sklearn versions
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Data Preprocessing
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
Y = dataset.iloc[:, 4].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/4, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Logistic Regression Model to the training set
classifier = LogisticRegression()
classifier.fit(X_train, Y_train)
# Predicting the Result
Y_pred = classifier.predict(X_test)

# Evaluation
cm = confusion_matrix(Y_test, Y_pred)

scikit-learn: Logistic Regression


Page 7:

Linear Classifier: Perceptron as a Linear Classifier (binary logistic regression)

https://www.edureka.co/blog/perceptron-learning-algorithm/

#import required library
import tensorflow as tf
#input1, input2 and bias
train_in = [[1., 1., 1], [1., 0, 1], [0, 1., 1], [0, 0, 1]]
#output
train_out = [[1.], [0], [0], [0]]
#weight variable initialized with random values using random_normal()
w = tf.Variable(tf.random_normal([3, 1], seed=12))
#Placeholder for input and output
x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])
#calculate output
output = tf.nn.relu(tf.matmul(x, w))
#Mean Squared Loss or Error
loss = tf.reduce_sum(tf.square(output - y))
#Minimize loss using GradientDescentOptimizer with a learning rate of 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
#Initialize all the global variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print("w (initial): ", w.eval(session=sess))
#Compute output and cost w.r.t to input vector
for i in range(1000):
    sess.run(train, {x: train_in, y: train_out})
    cost = sess.run(loss, feed_dict={x: train_in, y: train_out})
    print('Epoch--', i, '--loss--', cost)
print("w (trained): ", w.eval(session=sess))
output_pred = tf.nn.relu(tf.matmul(train_in, w))
print("output_pred: ", output_pred.eval(session=sess))

tf.sigmoid(x, name=None)
tf.nn.relu(x, name=None)
tf.tanh(x, name=None)

a*x1 + b*x2 = c


Page 8:

Classification on MNIST dataset

Each image is a 28x28 array, flattened out to be a 1-d tensor of size 784.
MNIST.train: 55,000 examples
MNIST.validation: 5,000 examples
MNIST.test: 10,000 examples

MNIST Dataset: https://en.wikipedia.org/wiki/MNIST_database; http://yann.lecun.com/exdb/mnist/
MNIST Dataset in csv format: https://pjreddie.com/projects/mnist-in-csv/

Model: take a weighted sum of the features and add a bias to get the logits; convert the logits to class probabilities via the softmax function (the multi-class generalization of the logistic sigmoid).

logits = X * W + b
Y_predicted = softmax(logits)
loss = cross_entropy(Y, Y_predicted), where Y is a one-hot vector


Page 9:

Softmax and Cross Entropy

https://www.ritchieng.com/machine-learning/deep-learning/neural-nets/
https://www.youtube.com/watch?v=vq2nnJ4g6N0
https://www.youtube.com/watch?v=tRsSi_sqXjI
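As a minimal numerical sketch of what softmax and cross-entropy compute (made-up logits and a one-hot label; my own illustration, not from the original slides):

import numpy as np

# made-up logits for one sample over 3 classes, and its one-hot label
logits = np.array([2.0, 1.0, 0.1])
y_true = np.array([1.0, 0.0, 0.0])

# softmax: exponentiate and normalize so the outputs sum to 1
probs = np.exp(logits - logits.max())   # subtract the max for numerical stability
probs = probs / probs.sum()             # roughly [0.66, 0.24, 0.10]

# cross-entropy: negative log-probability assigned to the true class
cross_entropy = -np.sum(y_true * np.log(probs))
print(probs, cross_entropy)             # the loss is small when the true class gets high probability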


Page 10:

A single-layer neural network based classification of handwritten digits from the MNIST dataset

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)

# Define loss and optimizer (the loss takes the unscaled logits, not the softmax output)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Train
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy: ', sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

MNIST Dataset: https://en.wikipedia.org/wiki/MNIST_database; http://yann.lecun.com/exdb/mnist/
MNIST Dataset in csv format: https://pjreddie.com/projects/mnist-in-csv/

Linear Classifier


Page 11:

Let's go Deeper!!!

Classification

A linear model cannot solve non-linear classification problems, as the sketch below illustrates…
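A tiny illustration (my own sketch): logistic regression, the linear classifier from the scikit-learn slide, cannot separate the XOR pattern, while scikit-learn's MLPClassifier (a small neural network, used here only for this example) can.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# XOR: no single straight line separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

linear = LogisticRegression().fit(X, y)
print('linear model accuracy:', linear.score(X, y))   # cannot reach 1.0 - no line separates XOR

mlp = MLPClassifier(hidden_layer_sizes=(8,), solver='lbfgs', max_iter=5000, random_state=1).fit(X, y)
print('small neural net accuracy:', mlp.score(X, y))  # typically reaches 1.0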


Page 12:

Neural Networks

High dimensionality is one of the bottlenecks for Machine Learning: handling and processing high-dimensional data (where input & output are quite large) becomes very complex and resource-exhaustive.

Relevant links:
https://www.edureka.co/blog/deep-learning-tutorial#deep-learning-use-case

Deep learning is capable of handling high-dimensional data, being efficient at focusing on the right features on its own (the feature extraction process).


Page 13:

Relevant links:
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
https://www.youtube.com/watch?v=BR9h47Jtqyw

Neural Networks

A neural network changes the original representation of the data into the representation most appropriate for solving the problem…

Play with the ConvNetJS demo: toy 2d classification with a 2-layer neural network
http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

TensorFlow Playground - a neural network right in your browser: http://playground.tensorflow.org/
https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground

ConvNetJS (by Andrej Karpathy) is a JavaScript library for training Deep Learning models (Neural Networks) entirely in your browser.
http://cs.stanford.edu/people/karpathy/convnetjs/index.html


Page 14:

Neural Networks

One of the most exciting features of deep learning is its performance in feature learning; the algorithms perform particularly well at detecting features from raw data.

Relevant links:
https://udarajay.com/applied-machine-learning-the-less-confusing-guide/
https://www.youtube.com/watch?v=aircAruvnKk


Page 15:

Neural Networks

Forward propagation is a neural net's way of classifying a set of inputs: the input data passes through the nodes and their activation functions and finally forms confidence levels for the nodes of the output layer.

To train the net, the output from forward prop is compared to the output that is known to be correct, and the cost is the difference of the two. The point of training is to make that cost as small as possible across millions of training examples. To do this, the net tweaks the weights and biases step by step until the prediction closely matches the correct output.

Relevant links:
https://www.edureka.co/blog/backpropagation/
https://en.wikipedia.org/wiki/Backpropagation
https://www.youtube.com/watch?v=IHZwWFHWa-w
https://www.youtube.com/watch?v=Ilg3gGewQ5U

One way to train the model is Backpropagation: an algorithm that looks for the minimum value of the error function in weight space using a technique called the delta rule, or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem.
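As a minimal numerical sketch of the forward-prop/backprop loop described above (a single sigmoid neuron trained with gradient descent on made-up data; my own illustration, not code from the slides):

import numpy as np

# toy data: two inputs and a known correct output (made up for illustration)
x = np.array([1.0, 0.5])
y_true = 1.0
w = np.array([0.1, -0.2])   # initial weights
b = 0.0                     # initial bias
lr = 0.5                    # learning rate

for step in range(100):
    # forward propagation: weighted sum + sigmoid activation
    z = np.dot(w, x) + b
    y_pred = 1.0 / (1.0 + np.exp(-z))
    cost = 0.5 * (y_pred - y_true) ** 2

    # backpropagation: the chain rule gives the gradient of the cost w.r.t. w and b
    dcost_dz = (y_pred - y_true) * y_pred * (1.0 - y_pred)
    w -= lr * dcost_dz * x   # gradient-descent (delta rule) update
    b -= lr * dcost_dz

print(y_pred, cost)          # the prediction approaches 1.0 and the cost shrinks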


Page 16:

Multi Layer Perceptron (MLP) Classification of handwritten digits from the MNIST dataset

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
print('Test shape:', mnist.test.images.shape)
print('Train shape:', mnist.train.images.shape)

# Parameters
learning_rate = 0.001
training_epochs = 5
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256  # 1st layer number of features
n_hidden_2 = 256  # 2nd layer number of features
n_input = 784     # MNIST data input (img shape: 28*28)
n_classes = 10    # MNIST total classes (0-9 digits)

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

Non Linear Classification with DNN


Page 17:

Multi Layer Perceptron (MLP) Classification of handwritten digits from the MNIST dataset

# Create model
def multilayer_perceptron(x, weights, biases):
    # Use tf.matmul instead of "*" because tf.matmul can change its dimensions on the fly (broadcast)
    print('x:', x.get_shape(), 'W1:', weights['h1'].get_shape(), 'b1:', biases['b1'].get_shape())
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])  # (x*weights['h1']) + biases['b1']
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    print('layer_1:', layer_1.get_shape(), 'W2:', weights['h2'].get_shape(), 'b2:', biases['b2'].get_shape())
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])  # (layer_1 * weights['h2']) + biases['b2']
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    print('layer_2:', layer_2.get_shape(), 'W3:', weights['out'].get_shape(), 'b3:', biases['out'].get_shape())
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']  # (layer_2 * weights['out']) + biases['out']
    print('out_layer:', out_layer.get_shape())
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),     # 784x256
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),  # 256x256
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))   # 256x10
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),   # 256x1
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),   # 256x1
    'out': tf.Variable(tf.random_normal([n_classes]))    # 10x1
}

Non Linear Classification with DNN


Page 18:

Multi Layer Perceptron (MLP) Classification of handwritten digits from the MNIST dataset

# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Cross entropy loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred))
# In this case we choose the AdamOptimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

Non Linear Classification with DNN


Page 19:

Multi Layer Perceptron (MLP) Classification of handwritten digits from the MNIST dataset

# Test model (this block runs inside the tf.Session() opened on the previous slide)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# To keep sizes compatible with model
print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

print("Test sample:")
# Get 28x28 image
sample_1 = mnist.test.images[47].reshape(28, 28)
# Get corresponding integer label from one-hot encoded data
sample_label_1 = np.where(mnist.test.labels[47] == 1)[0][0]
# Plot sample
plt.imshow(sample_1, cmap='Greys')
plt.title('label = {}'.format(sample_label_1))
plt.show()
x_test = mnist.test.images[47].reshape(1, 784)
predicted_value = tf.argmax(pred, 1)
print("is classified as: {}".format(predicted_value.eval(session=sess, feed_dict={x: x_test})))

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf
https://arxiv.org/abs/1708.07747
https://github.com/zalandoresearch/fashion-mnist
https://medium.com/tensorist/classifying-fashion-articles-using-tensorflow-fashion-mnist-f22e8a04728a

Non Linear Classification with DNN


Page 20:

Iris flower dataset

Iris flower dataset (Ronald Fisher's Iris data set): https://en.wikipedia.org/wiki/Iris_flower_data_set
The dataset contains 150 records under 5 attributes: Petal Length, Petal Width, Sepal Length, Sepal Width and Class. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor).
https://gist.github.com/curran/a08a1080b88344b0c8a7

Relevant links:
https://www.edureka.co/blog/deep-learning-tutorial#deep-learning-use-case


Page 21:

Classification

Neural Network Classifier with TensorFlow Estimator
https://www.tensorflow.org/api_docs/python/tf/estimator

import os
from six.moves.urllib.request import urlopen
import numpy as np
import tensorflow as tf

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"
IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

def main():
    # If the training and test sets aren't stored locally, download them.
    if not os.path.exists(IRIS_TRAINING):
        raw = urlopen(IRIS_TRAINING_URL).read()
        with open(IRIS_TRAINING, "wb") as f:
            f.write(raw)
    if not os.path.exists(IRIS_TEST):
        raw = urlopen(IRIS_TEST_URL).read()
        with open(IRIS_TEST, "wb") as f:
            f.write(raw)

    # Load datasets.
    training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TRAINING, target_dtype=np.int, features_dtype=np.float32)
    test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TEST, target_dtype=np.int, features_dtype=np.float32)

    # Specify that all features have real-value data
    feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]
    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10], n_classes=3, model_dir="")

    # Define the training inputs
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": np.array(training_set.data)},
        y=np.array(training_set.target), num_epochs=None, shuffle=True)
    # Train model.
    classifier.train(input_fn=train_input_fn, steps=2000)

    # Define the test inputs
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": np.array(test_set.data)},
        y=np.array(test_set.target), num_epochs=1, shuffle=False)
    # Evaluate accuracy.
    accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]
    print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

    # Classify two new flower samples.
    new_samples = np.array(
        [[6.4, 3.2, 4.5, 1.5],
         [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
    predict_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": new_samples}, num_epochs=1, shuffle=False)
    predictions = list(classifier.predict(input_fn=predict_input_fn))
    predicted_classes = [p["classes"][0] for p in predictions]
    print("New Samples, Class Predictions: {}\n".format(predicted_classes))

if __name__ == "__main__":
    main()


Page 22:

Overfitting

Overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

Relevant links:
https://en.wikipedia.org/wiki/Overfitting

The most popular regularization technique for deep neural networks is Dropout. It is used to prevent overfitting by temporarily "dropping"/disabling a neuron with probability p (a hyperparameter called the dropout rate, typically a number around 0.5) at each iteration during the training phase.

§ The dropped-out neurons are resampled with probability p at every training step (a neuron dropped out at one step can be active at the next one). Dropout is not applied at test time, after the network is trained.
§ Dropout can be applied to input or hidden layer nodes, but not to the output nodes.
§ When applying dropout regularization, it is recommended to increase the number of training rounds.

Reason: Dropout forces every neuron to be able to operate independently and prevents the network from becoming too dependent on a small number of neurons.

We may face a case where the training loss keeps going down but the validation loss starts increasing after a certain epoch. That is a sign of overfitting: it tells us that our model is memorizing the training data but failing to generalize to new instances.
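A minimal sketch of how this shows up in practice (my own illustration, assuming a compiled Keras model and the (X_train, Y_train)/(X_test, Y_test) arrays used on the following slides): model.fit() records both losses in its history, and the epoch where val_loss turns upward while loss keeps falling marks the onset of overfitting.

import matplotlib.pyplot as plt

# train while tracking the loss on held-out validation data
history = model.fit(X_train, Y_train, batch_size=128, epochs=20,
                    verbose=2, validation_data=(X_test, Y_test))

# plot both curves: overfitting shows up where 'val_loss' starts rising
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()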


Page 23:

Multi Layer Perceptron (MLP) Classification on MNIST dataset with Keras

# imports for array-handling
import numpy as np
# let's keep our keras backend tensorflow quiet
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
# for testing on CPU
#os.environ['CUDA_VISIBLE_DEVICES'] = ''

# keras imports for the dataset and building our neural network
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# let's print the shape before we reshape and normalize
print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)

# building the input vector from the 28x28 pixels
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalizing the data to help with the training
X_train /= 255
X_test /= 255

# print the final input shape ready for training
print("Train matrix shape", X_train.shape)
print("Test matrix shape", X_test.shape)
print(np.unique(y_train, return_counts=True))

# one-hot encoding using keras' numpy-related utilities
n_classes = 10
print("Shape before one-hot encoding: ", y_train.shape)
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
print("Shape after one-hot encoding: ", Y_train.shape)

# building a linear stack of layers with the sequential model
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

# compiling the sequential model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

Non Linear Classification with DNN


Page 24:

Relevant links:
https://nextjournal.com/a/17592186058848
https://cambridgespark.com/content/tutorials/deep-learning-for-complete-beginners-recognising-handwritten-digits/index.html

Multi Layer Perceptron (MLP) Classification on MNIST dataset with Keras

# training the model and saving metrics in history
history = model.fit(X_train, Y_train, batch_size=128, epochs=8, verbose=2, validation_data=(X_test, Y_test))

# saving the model
save_dir = "results/"
model_name = 'keras_mnist.h5'
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)

# evaluation and prediction
mnist_model = load_model('results/keras_mnist.h5')
loss_and_metrics = mnist_model.evaluate(X_test, Y_test, verbose=2)
print("Test Loss", loss_and_metrics[0])
print("Test Accuracy", loss_and_metrics[1])

# load the model and create predictions on the test set
mnist_model = load_model('results/keras_mnist.h5')
predicted_classes = mnist_model.predict_classes(X_test)

# see which we predicted correctly and which not
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
print()
print(len(correct_indices), " classified correctly")
print(len(incorrect_indices), " classified incorrectly")

Keras


Page 25:

Different model performance comparison with Fashion-MNIST dataset…
https://colab.research.google.com/github/rnditdev/PyData_London_2018_Computer_Vision/blob/master/1_History_of_neural_models.ipynb#scrollTo=3I-cTVJFp0KA

Keras

import keras
from keras.datasets import fashion_mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras.optimizers import SGD
import matplotlib.pyplot as plt
%matplotlib inline

batch_size = 128
num_classes = 10
epochs = 20
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

print(y_train[1000])
fig = plt.figure()
plt.imshow(x_train[1000], cmap='gray')
print(y_train[4000])
fig = plt.figure()
plt.imshow(x_train[4000], cmap='gray')

fig = plt.figure(figsize=(20, 10))
for i in range(1, 41):
    ax1 = fig.add_subplot(5, 8, i)
    plt.xticks([], [])
    plt.yticks([], [])
    ax1.imshow(x_train[i], cmap='gray')

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

def train_model(model):
    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        verbose=1,
                        validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

Label / Description
§ 0 T-shirt
§ 1 Trouser
§ 2 Pullover
§ 3 Dress
§ 4 Coat
§ 5 Sandal
§ 6 Shirt
§ 7 Sneaker
§ 8 Bag
§ 9 Ankle boot


Page 26:

Different model performance comparison with Fashion-MNIST dataset…

Keras

# Single layer with 1 unit
model0 = Sequential()
model0.add(Dense(1, activation='sigmoid', input_shape=(784,)))
model0.add(Dense(num_classes, activation='softmax'))
model0.summary()
model0.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
train_model(model0)

…
Epoch 20/20
60000/60000 [==============================] - 1s 16us/step - loss: 1.8718 - acc: 0.2270 - val_loss: 1.8672 - val_acc: 0.2349
Test loss: 1.8672341890335082
Test accuracy: 0.2349

# Single layer with 10 units
model1 = Sequential()
model1.add(Dense(10, activation='sigmoid', input_shape=(784,)))
model1.add(Dense(num_classes, activation='softmax'))
model1.summary()
model1.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
train_model(model1)

…
Epoch 20/20
60000/60000 [==============================] - 1s 21us/step - loss: 0.7420 - acc: 0.7645 - val_loss: 0.7515 - val_acc: 0.7540
Test loss: 0.7515270534515381
Test accuracy: 0.754


Page 27:

Different model performance comparison with Fashion-MNIST dataset…

Keras

# Single layer with more units
model2 = Sequential()
model2.add(Dense(512, activation='sigmoid', input_shape=(784,)))
model2.add(Dense(num_classes, activation='softmax'))
model2.summary()
model2.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
train_model(model2)

…
Epoch 20/20
60000/60000 [==============================] - 5s 89us/step - loss: 0.2059 - acc: 0.9236 - val_loss: 0.3467 - val_acc: 0.8755
Test loss: 0.3466758099913597
Test accuracy: 0.8755

# Multiple layers without dropout
model3 = Sequential()
model3.add(Dense(512, activation='relu', input_shape=(784,)))
model3.add(Dense(512, activation='relu'))
model3.add(Dense(num_classes, activation='softmax'))
model3.summary()
model3.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
train_model(model3)

…
Epoch 20/20
60000/60000 [==============================] - 8s 140us/step - loss: 0.2013 - acc: 0.9270 - val_loss: 0.4674 - val_acc: 0.8827
Test loss: 0.4673706584453583
Test accuracy: 0.8827


Page 28:

Different model performance comparison with Fashion-MNIST dataset…

Keras

# Multiple layers with dropout
model4 = Sequential()
model4.add(Dense(512, activation='relu', input_shape=(784,)))
model4.add(Dropout(0.2))
model4.add(Dense(512, activation='relu'))
model4.add(Dropout(0.2))
model4.add(Dense(num_classes, activation='softmax'))
model4.summary()
model4.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
train_model(model4)

…
Epoch 20/20
60000/60000 [==============================] - 9s 153us/step - loss: 0.2756 - acc: 0.9055 - val_loss: 0.4045 - val_acc: 0.8869
Test loss: 0.40449330792427063
Test accuracy: 0.8869


§ Since TensorFlow 2.x includes Keras, there is no need to install it separately. Simply access it as:
import tensorflow as tf
tf.keras. …
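For instance, an MLP like the ones above can be built entirely through tf.keras (a minimal sketch of the idea, not code from the slides):

import tensorflow as tf

# a small MLP built from the Keras API bundled with TensorFlow 2.x
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])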

§ Difference between 'categorical_crossentropy' and 'sparse_categorical_crossentropy' losses: if you use 'sparse_categorical_crossentropy', you do not need to transform the labels into categorical (one-hot) representation.
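A minimal sketch of the difference (my own illustration, reusing the tf.keras model from the sketch above and integer labels such as those returned by mnist.load_data()):

import numpy as np
from tensorflow import keras

y_int = np.array([5, 0, 4, 1])   # integer class labels

# with 'sparse_categorical_crossentropy' the integer labels are used directly
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_int, ...)

# with 'categorical_crossentropy' the labels must first be one-hot encoded
y_onehot = keras.utils.to_categorical(y_int, num_classes=10)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_onehot, ...)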