Introduction to Neural Networks in Tensorflow


Page 1: Introduction to Neural Networks in Tensorflow

@nfmcclure

Introduction to Neural Networks with Tensorflow

Nick McClure

July 27th, 2016

Seattle, WA

Page 2: Introduction to Neural Networks in Tensorflow


Who am I?


www.github.com/nfmcclure/tensorflow_cookbook

Page 3: Introduction to Neural Networks in Tensorflow


Why Does Any of This Matter?


Page 4: Introduction to Neural Networks in Tensorflow


Why Neural Networks?

• The human brain can be considered one of the best processors. (Estimated to contain ~10^11 neurons.)

• Studies show that our brain can process the equivalent of ~20 Mb/sec just through the optic nerves.

• If we can copy this design, maybe we can solve the "hard for a computer – easy for humans" problems:

• Speech recognition
• Facial identification
• Reading emotions
• Recognizing images
• Sentiment analysis
• Driving a vehicle
• Disease diagnosis

Page 5: Introduction to Neural Networks in Tensorflow


The Basic Unit: Operational Gate

• F(x,y) = x * y

• How can we change the inputs to decrease the output?

• The intuitive answer is to decrease 3 and/or decrease 4.

• Mathematically, the answer is to use the derivative. We know that if we increase the x input (3) by 1, the output goes up by a total of 4, because dF/dx = y = 4. (4 * 4 = 16, which is 4 more than 12.)

[Diagram: a multiply gate with inputs 3 and 4 producing the output 12.]
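
A quick numeric check of the claim above (a sketch, not from the slides; plain Python, no Tensorflow needed):

# Sketch: check the multiply-gate derivative numerically.
def f(x, y):
    return x * y

x, y = 3.0, 4.0
print(f(x, y))          # 12.0
print(f(x + 1, y))      # 16.0 -> output rose by 4, which equals dF/dx = y
print(f(x, y + 1))      # 15.0 -> output rose by 3, which equals dF/dy = x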

Page 6: Introduction to Neural Networks in Tensorflow


Multiple Gates

• F(x, y, z) = (x + y) * z

• If G(a, b) = a + b, then F(x, y, z) = G(x, y) * z

• Or (changing notation) F = G * z (we have solved this problem before).

• Mathematically, we will use the "chain rule" to get the derivatives:

[Diagram: an add gate with inputs 2 and 3 feeding a multiply gate with input 4, producing the output 20.]

dF/dx = (dF/dG) · (dG/dx)

dF/dx = z · 1 = z
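
A minimal sketch of the same chain-rule result, letting Tensorflow compute the derivatives (assuming the 2016-era tf.gradients/Session API used elsewhere in these slides):

import tensorflow as tf

x = tf.constant(2.0)
y = tf.constant(3.0)
z = tf.constant(4.0)
g = x + y                            # G(x, y) = x + y
f = g * z                            # F = G * z
grads = tf.gradients(f, [x, y, z])   # [dF/dx, dF/dy, dF/dz]

with tf.Session() as sess:
    print(sess.run(grads))           # [4.0, 4.0, 5.0] -> dF/dx = z, dF/dz = x + y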

Page 7: Introduction to Neural Networks in Tensorflow


Loss Functions

• You can't learn unless you know how bad/good your past results have been.

• This is designated by a 'loss function'.

• The idea is that we have a labeled data set and we look at each data point. We run the data point forward through the network and then see if we need to increase or decrease the output.

• We then change the parameters according to the derivatives in the network.

• We loop through this many times until we reach the lowest error in our training.

Page 8: Introduction to Neural Networks in Tensorflow


Learning Rate

• The learning rate controls how much of a change we make to our model parameters.
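
A tiny sketch (not from the slides) of the loop from the last two slides: run forward, compute the loss derivative, and scale the update by the learning rate. The data and variable names are illustrative.

# Sketch: fit w so that w * x matches y, using a squared-error loss.
# learning_rate scales how far we move along the derivative each step.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]     # true relationship: y = 2 * x

w = 0.0
learning_rate = 0.05
for step in range(100):
    # dLoss/dw for loss = sum((w*x - y)^2) is sum(2 * (w*x - y) * x)
    grad = sum(2.0 * (w * xi - yi) * xi for xi, yi in zip(x_data, y_data))
    w = w - learning_rate * grad
print(w)                      # approaches 2.0; too large a learning_rate diverges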

Page 9: Introduction to Neural Networks in Tensorflow


Logistic Regression as a Neural Network

• A neural network is very similar to what we have been doing, except that we bound the outputs between 0 and 1 with a sigmoid function (called an activation function).

• The sigmoid function is nice because it turns out that its derivative is related to itself:

• We can use this fact in our 'chain rule' for derivatives.

F(x, a, b) = σ(a·x + b)

σ(x) = 1 / (1 + e^(-x))

dσ/dx = σ(x) · (1 − σ(x))

[Diagram: x passing through a multiply gate (a) and an add gate (b) to the output; or, equivalently, a single node with inputs x and 1 weighted by a and b.]

https://github.com/nfmcclure/tensorflow_cookbook/tree/master/03_Linear_Regression/08_Implementing_Logistic_Regression
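
A minimal sketch of the unit above (assuming the 2016-era Tensorflow API; the variable names are illustrative and not taken from the linked recipe):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])   # one input feature
a = tf.Variable(tf.random_normal([1, 1]))          # slope
b = tf.Variable(tf.zeros([1]))                     # bias
output = tf.sigmoid(tf.matmul(x, a) + b)           # F(x, a, b) = sigmoid(a*x + b)

# The derivative identity d(sigmoid)/dx = sigmoid(x) * (1 - sigmoid(x)) is what
# makes back-propagating through tf.sigmoid cheap.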

Page 10: Introduction to Neural Networks in Tensorflow


Why Activation Functions?

• It turns out that no matter how many additions and multiplications we pile on, we will never be able to model non-linear outputs without non-linear transformations.

• The sigmoid function is just one example of a gate function: σ(x) = 1 / (1 + e^(-x))

• There is also the Rectified Linear Unit (ReLU): σ(x) = max(x, 0)

• Softplus: σ(x) = ln(1 + e^x)

• Leaky ReLU: σ(x) = max(x, a·x), where 0 < a << 1
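
For reference, a sketch of these activations with built-in Tensorflow ops; the leaky ReLU is written with tf.maximum, since this era of the API had no dedicated op for it, and alpha is an illustrative choice for a:

import tensorflow as tf

x = tf.placeholder(tf.float32)
sigmoid_out  = tf.sigmoid(x)             # 1 / (1 + e^-x)
relu_out     = tf.nn.relu(x)             # max(x, 0)
softplus_out = tf.nn.softplus(x)         # ln(1 + e^x)
alpha = 0.01                             # small slope for the leaky ReLU
leaky_out    = tf.maximum(x, alpha * x)  # max(x, a*x)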

Page 11: Introduction to Neural Networks in Tensorflow


Many Layers

• Neural networks can have many ‘hidden layers’.

• We can make them as deep as we want.

• Neural networks can also have as many inputs or outputs as we would like.

[Diagram: fully connected layers; the number of hidden layers is the depth of the neural network.]

Page 12: Introduction to Neural Networks in Tensorflow


What is Tensorflow?

• An open source framework for creating computational graphs.

• Computational graphs are actually extremely flexible:

• Linear Regression (regular, multiple, logistic, lasso, ridge, elasticnet, …)

• Support Vector Machines (any kernel)

• Nearest Neighbor methods

• Neural Networks

• ODE/PDE Solvers

Page 13: Introduction to Neural Networks in Tensorflow


Why Tensorflow?

• Compare to other open source deep learning frameworks.

[Chart: frameworks such as Caffe and others compared by popularity and by speed/size.]

Page 14: Introduction to Neural Networks in Tensorflow


Why Tensorflow?

• Tensorflow is also very portable and flexible!

• Virtually no code change to go from CPU -> GPU or …

Page 15: Introduction to Neural Networks in Tensorflow


How is the Tensorflow Community?

• Very active GitHub community.

• In approximately 6 months: >5k commits, >250 contributors.

• Over 2,000 questions tagged on Stack Overflow (>50% answered within 24 hours).

Page 16: Introduction to Neural Networks in Tensorflow


How do Tensorflow Algorithms Work?

[Diagram: a computational graph containing the model variables produces an output, which is fed to a loss function.]

Page 17: Introduction to Neural Networks in Tensorflow


A One Hidden Layer NN in Tensorflow

• Iris dataset: predict petal width from sepal length, sepal width, and petal length.

[Diagram: the three inputs (S.L., S.W., P.L.) plus a bias feed five hidden nodes (h1 to h5), which together with a bias feed the single output. The data enters through placeholders; the weights and biases live in the computational graph.]

• 3 inputs x 5 hidden nodes = 15 weights + 5 biases; 5 hidden nodes x 1 output = 5 weights + 1 bias; 26 variables in total.

https://github.com/nfmcclure/tensorflow_cookbook/tree/master/06_Neural_Networks/04_Single_Hidden_Layer_Network
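
A condensed sketch of this network (assuming the 2016-era Tensorflow API; layer sizes follow the slide, but names, initializers, and the activation are illustrative; see the linked cookbook recipe for the full version):

import tensorflow as tf

# Placeholders for the data: 3 input features, 1 target (petal width).
x_data = tf.placeholder(tf.float32, shape=[None, 3])
y_target = tf.placeholder(tf.float32, shape=[None, 1])

# Hidden layer: 3 inputs -> 5 hidden nodes (15 weights + 5 biases).
W1 = tf.Variable(tf.random_normal([3, 5]))
b1 = tf.Variable(tf.random_normal([5]))
hidden = tf.nn.relu(tf.add(tf.matmul(x_data, W1), b1))

# Output layer: 5 hidden nodes -> 1 output (5 weights + 1 bias), 26 variables total.
W2 = tf.Variable(tf.random_normal([5, 1]))
b2 = tf.Variable(tf.random_normal([1]))
output = tf.add(tf.matmul(hidden, W2), b2)

# L2 loss and a gradient descent step with a chosen learning rate.
loss = tf.reduce_mean(tf.square(y_target - output))
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(loss)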

Page 18: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Distributed

• Computational graphs can take a very long time to train.

• Training is quicker when spread across multiple machines.

• We just need a dictionary mapping jobs to network addresses:

job_spec = {"my_workers": ["worker1.location.com:22", "worker2.location.com:22"],
            "other_workers": ["gpu_worker.location.com:22"]}

cluster = tf.train.ClusterSpec(job_spec)

with tf.device("/job:my_workers/task:0"):
    pass  # do task

with tf.device("/job:other_workers/task:0"):
    pass  # do another task

• Scheduling and communication are done for you (in an efficient and greedy manner).

Page 19: Introduction to Neural Networks in Tensorflow


More in Tensorflow: GPU Capabilities

• Sometimes neural networks can have hundreds of millions of parameters to train.

• Graphics cards (GPUs) are built to handle matrix calculations very quickly and efficiently.

• It's quite common to get a 10x to 50x speedup when switching the same code to a GPU.

(1) Install the GPU version of T.F. (2) Done.

As a side note, you can also tell T.F. to use specific devices:

/cpu:0   - the CPU of the machine
/gpu:0   - the first GPU of the machine
/gpu:100 - the 101st GPU of the machine (device numbering starts at 0)
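
A small sketch of explicit device placement (hedged: whether '/gpu:0' exists depends on the machine and on having the GPU build installed; allow_soft_placement lets the graph fall back to the CPU):

import tensorflow as tf

# Pin this part of the graph to the first GPU.
with tf.device('/gpu:0'):
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(c)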

Page 20: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Tensorboard

• A built-in way to visualize the computational graph, accuracies, loss functions, gradients, …

• (1) Specify which summaries to write and store them in a log file.

• (2) Point Tensorboard at the logging directory and navigate to 0.0.0.0:6006. (You can view it during training.)

tf.histogram_summary('variable_histogram', variable_value)
tf.scalar_summary('loss_value', loss)
merged = tf.merge_all_summaries()
my_writer = tf.train.SummaryWriter('my_logging_directory', sess.graph)

$ tensorboard --logdir=my_logging_directory

Page 21: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Tensorboard

Collapsible/Expandable Named Scopes!!

with tf.name_scope('hidden1') as hidden_scope:
    pass  # Perform calculations.

Page 22: Introduction to Neural Networks in Tensorflow


More in Tensorflow: skflow

• An easier, scikit-learn-style implementation of common algorithms.

• A 3-layer NN:

import tensorflow.contrib.learn as skflow

my_model = skflow.TensorFlowDNNClassifier(hidden_units=[N1, N2, N3], n_classes=n_out)
my_model.fit(x_data, y_target)

• An RNN:

my_rnn = skflow.TensorFlowRNNClassifier(…)

• It is easy to add custom layers by adding in layer functions from vanilla Tensorflow.

Page 23: Introduction to Neural Networks in Tensorflow


What is a Convolutional Neural Network?

• A CNN is a large pattern-recognition neural network that uses various parameter-reducing techniques.

• E.g. AlexNet (2012 ImageNet winner).

Page 24: Introduction to Neural Networks in Tensorflow


Reduction of Parameters

• CNNs use tricks to reduce the number of parameters to fit.

• Convolution layers: assume the weights are the same on arrows in the same direction of a window we move across the data.

[Diagram: a sliding window over the input in which arrows in the same direction share a weight; there are only three weights in this layer, w1, w2, and w3.]

• Note: usually multiple convolutions are performed, creating 'layers'.
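
A sketch of a convolution layer in Tensorflow (shapes and names are illustrative; tf.nn.conv2d slides the same small set of weights across every window position):

import tensorflow as tf

# A batch of 28x28 grayscale images: [batch, height, width, channels].
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])

# One 3x3 filter: the same 9 weights are reused at every window position.
conv_filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))
conv_out = tf.nn.conv2d(images, conv_filter, strides=[1, 1, 1, 1], padding='SAME')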

Page 25: Introduction to Neural Networks in Tensorflow


Other Tricks - Pooling

• A layer that aggregates across a window from the prior layer.

• Types of pooling: max, min, mean, median, …

[Diagram: aggregation over three nodes in the prior layer, e.g. max().]
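
A sketch of a pooling layer (illustrative shapes; max pooling keeps the largest value in each window and has no weights to learn):

import tensorflow as tf

# Illustrative input: a batch of feature maps from a prior layer.
feature_maps = tf.placeholder(tf.float32, shape=[None, 28, 28, 8])

# Max pooling over non-overlapping 2x2 windows halves the height and width.
pooled = tf.nn.max_pool(feature_maps, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')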

Page 26: Introduction to Neural Networks in Tensorflow


Other Tricks - Dropout

• Dropout is a way to strengthen training and regularize the parameters.

• During training, we randomly select activation functions and set their output to zero.

• Compare this to 'zoneout', which randomly replaces activations with their value from the prior iteration.

• This helps the network find multiple pathways for prediction, so the network will not rely on just one connection or edge for prediction.

• This also helps the training of the other parameters in the network.
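
A sketch of dropout in Tensorflow (keep_prob is fed as a placeholder so it can be, for example, 0.5 during training and 1.0 when predicting):

import tensorflow as tf

hidden = tf.placeholder(tf.float32, shape=[None, 128])   # activations from some layer
keep_prob = tf.placeholder(tf.float32)                    # probability of keeping a unit

# Randomly zeroes activations and rescales the survivors by 1/keep_prob.
hidden_dropped = tf.nn.dropout(hidden, keep_prob)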

Page 27: Introduction to Neural Networks in Tensorflow


Larger Network Shorthand

• Since neural networks can get quite large, we do not want to write out all the connections or nodes.

• Here's how to convey the meaning in a shorter, less complicated way:

[Diagram: a shorthand node labeled "max of a set of features"; here it appears to be a max of 2 features.]

Page 28: Introduction to Neural Networks in Tensorflow


What is a Recurrent Neural Network?

• Recurrent Neural Networks (RNNs) are networks that can 'see' outputs from prior time steps. They are very helpful for sequences.

• The most common usage is to predict the next letter from the prior letters. We train this network on large text samples.

[Diagram legend: U, V, W = weight parameters; S = hidden layer state; X = input; O = output.]

Page 29: Introduction to Neural Networks in Tensorflow


Regional Convolutional Networks

• We can convolve a whole network over patches of the image and see which regions score highly for a category (object detection).

https://www.youtube.com/watch?v=lr1qVxhGUDw

Page 30: Introduction to Neural Networks in Tensorflow


Regional CNN + Recurrent NN = Image Captioning

• We take the object detection output and feed it through a recurrent neural network to generate text describing the image.

Microsoft Research 2015

Page 31: Introduction to Neural Networks in Tensorflow


Deep Dream (2015)

• We can select specific nodes in an image recognition pipeline and up-weight the output…

https://www.youtube.com/watch?v=DgPaCWJL7XI
http://www.fastcodesign.com/3057368/inside-googles-first-deepdream-art-show

Page 32: Introduction to Neural Networks in Tensorflow


Stylenet (2015)

• Train a network on a "style image". Then try to reconstruct another image from that same pretrained network.

https://vimeo.com/139123754

Recoloring of black and white photos

Page 33: Introduction to Neural Networks in Tensorflow


CNN for Text (2015)

• Zhang, X., et al. released a paper (http://arxiv.org/abs/1509.01626) that showed great results from treating strings as 1-D numerical vectors.

• Ways to get strings into numerical vectors:

• Word-level one-hot encoding.

• Word2Vec encoding or similar (GloVe, Doc2Vec, …).

• Character-level one-hot encoding.

• We can't resize vectors like pictures, so instead we pick a max length and pad shorter entries with zeros.

• Like many NN results, these depend heavily on the dataset and architecture.

• The paper shows good results for a low number of classes (ratings or sentiment).

Page 34: Introduction to Neural Networks in Tensorflow


Parsey McParseface

• Google released a model called ‘Syntaxnet’ in May 2016, based on Tensorflow.

• Syntaxnet is arguably one of the best part-of-speech parsers available.

• They trained an English version, called ‘Parsey McParseface’.

• This is a free and open source model to use.

Page 35: Introduction to Neural Networks in Tensorflow


The Possibilities are Endless

• Predict age from a facial photo - https://how-old.net

• Predict dog breed from a photo - https://www.what-dog.net

• Inside a self-driving car brain - https://www.youtube.com/watch?v=ZJMtDRbqH40

• Memory networks - https://arxiv.org/pdf/1410.3916.pdf

• Frodo journeyed to Mount-Doom. Frodo dropped the ring there. Sauron died.

• Frodo went back to the Shire. Bilbo travelled to the Grey-havens.

• Where is the ring? A: Mount-Doom

• Where is Bilbo now? A: Grey-havens

• "Synthia" - a virtual driving school for self-driving cars:

• http://www.gizmag.com/synthia-dataset-self-driving-cars/43895/

• Solving hand-written chalkboard problems in real time:

• http://www.willforfang.com/computer-vision/2016/4/9/artificial-intelligence-for-handwritten-mathematical-expression-evaluation

Page 36: Introduction to Neural Networks in Tensorflow


<Insert Name Here>-Neural Net

• Last Month (June-July 2016):

• YodaNN: http://arxiv.org/abs/1606.05487
• CMS-RCNN: http://arxiv.org/abs/1606.05413
• Zoneout RNN: https://arxiv.org/abs/1606.01305
• Pixel CNN: http://arxiv.org/abs/1606.05328
• Memory CNN: http://arxiv.org/abs/1606.05262
• DISCO Nets: http://arxiv.org/abs/1606.02556

• This Year (2016):

• Stochastic Depth Net: https://arxiv.org/abs/1603.09382
• Squeeze Net: https://arxiv.org/abs/1602.07360
• Binary Nets: http://arxiv.org/abs/1602.02830

• Last Year:

• PoseNet: http://arxiv.org/abs/1505.07427
• YOLO Nets: http://arxiv.org/abs/1506.02640
• Style Net: http://arxiv.org/abs/1508.06576
• Residual Net: https://arxiv.org/abs/1512.03385
• And so many more…

• What’s Your Architecture?

Page 37: Introduction to Neural Networks in Tensorflow


How to Keep ML Current

• Machine learning advances happen more and more frequently.

• We can't rely on journals that take multiple years to publish.

• Twitter

• http://arxiv.org/

• Others?

Page 38: Introduction to Neural Networks in Tensorflow


More Resources

• More on CNNs:

• https://www.reddit.com/r/MachineLearning/

• https://www.reddit.com/r/deepdream/

• http://karpathy.github.io/

• RNNs:

• http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Recent Research: (Jan 16-June 16)

• Residual Conv. Nets: https://www.youtube.com/watch?v=1PGLj-uKT1w

• Determine location of a random image: https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/

• Stochastic Depth Networks: http://arxiv.org/abs/1603.09382

Pre-trained models: http://www.vlfeat.org/matconvnet/pretrained/

Page 39: Introduction to Neural Networks in Tensorflow


Questions?

• Slides: http://www.slideshare.net/NicholasMcClure1/introduction-to-neural-networks-in-tensorflow

• Code and more: https://github.com/nfmcclure/tensorflow_cookbook