Introduction to Neural Networks in Tensorflow


Page 1: Introduction to Neural Networks in Tensorflow

@nfmcclure

Introduction to Neural Networks with Tensorflow

Nick McClure

July 27th, 2016

Seattle, WA

Page 2: Introduction to Neural Networks in Tensorflow


Who am I?


www.github.com/nfmcclure/tensorflow_cookbook

Page 3: Introduction to Neural Networks in Tensorflow


Why Does Any of This Matter?


Page 4: Introduction to Neural Networks in Tensorflow


Why Neural Networks?

• The human brain can be considered one of the best processors. (Estimated to contain ~10^11 neurons.)

• Studies show that our brain can process the equivalent of ~20 Mb/sec just through the optic nerves.

• If we can copy this design, maybe we can solve the "hard for a computer – easy for humans" problems:

• Speech recognition
• Facial identification
• Reading emotions
• Recognizing images
• Sentiment analysis
• Driving a vehicle
• Disease diagnosis

Page 5: Introduction to Neural Networks in Tensorflow


The Basic Unit: Operational Gate

• F(x,y) = x * y

• How can we change the inputs to decrease the output?

• The intuitive answer is to decrease 3 and/or decrease 4.

• Mathematically, the answer is to use the derivative. We know that if we increase the x input (3) by 1, the output goes up by a total of 4, because dF/dx = y = 4. (4 * 4 = 16, which is 4 more than 12.)

[Diagram: a multiply gate with inputs 3 and 4 producing the output 12.]
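
A quick numeric check of the claim above (a sketch, not from the slides; plain Python, no Tensorflow needed):

# Sketch: check the multiply-gate derivative numerically.
def f(x, y):
    return x * y

x, y = 3.0, 4.0
print(f(x, y))          # 12.0
print(f(x + 1, y))      # 16.0 -> output rose by 4, which equals dF/dx = y
print(f(x, y + 1))      # 15.0 -> output rose by 3, which equals dF/dy = x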

Page 6: Introduction to Neural Networks in Tensorflow


Multiple Gates

• F(x, y, z) = (x + y) * z

• If G(a, b) = a + b, then F(x, y, z) = G(x, y) * z

• Or (changing notation) F = G * z (we have solved this problem before).

• Mathematically, we will use the "chain rule" to get the derivatives:

[Diagram: an add gate with inputs 2 and 3 feeding a multiply gate with input 4, producing the output 20.]

dF/dx = (dF/dG) · (dG/dx)

dF/dx = z · 1 = z
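
A minimal sketch of the same chain-rule result, letting Tensorflow compute the derivatives (assuming the 2016-era tf.gradients/Session API used elsewhere in these slides):

import tensorflow as tf

x = tf.constant(2.0)
y = tf.constant(3.0)
z = tf.constant(4.0)
g = x + y                            # G(x, y) = x + y
f = g * z                            # F = G * z
grads = tf.gradients(f, [x, y, z])   # [dF/dx, dF/dy, dF/dz]

with tf.Session() as sess:
    print(sess.run(grads))           # [4.0, 4.0, 5.0] -> dF/dx = z, dF/dz = x + y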

Page 7: Introduction to Neural Networks in Tensorflow


Loss Functions

• You can't learn unless you know how bad/good your past results have been.

• This is designated by a 'loss function'.

• The idea is that we have a labeled data set and we look at each data point. We run the data point forward through the network and then see if we need to increase or decrease the output.

• We then change the parameters according to the derivatives in the network.

• We loop through this many times until we reach the lowest error in our training.

Page 8: Introduction to Neural Networks in Tensorflow


Learning Rate

• The learning rate controls how much of a change we make to our model parameters.
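
A tiny sketch (not from the slides) of the loop from the last two slides: run forward, compute the loss derivative, and scale the update by the learning rate. The data and variable names are illustrative.

# Sketch: fit w so that w * x matches y, using a squared-error loss.
# learning_rate scales how far we move along the derivative each step.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]     # true relationship: y = 2 * x

w = 0.0
learning_rate = 0.05
for step in range(100):
    # dLoss/dw for loss = sum((w*x - y)^2) is sum(2 * (w*x - y) * x)
    grad = sum(2.0 * (w * xi - yi) * xi for xi, yi in zip(x_data, y_data))
    w = w - learning_rate * grad
print(w)                      # approaches 2.0; too large a learning_rate diverges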

Page 9: Introduction to Neural Networks in Tensorflow


Logistic Regression as a Neural Network

• A neural network is very similar to what we have been doing, except that we bound the outputs between 0 and 1 with a sigmoid function (called an activation function).

• The sigmoid function is nice because it turns out that its derivative is related to itself:

• We can use this fact in our 'chain rule' for derivatives.

F(x, a, b) = σ(a·x + b)

σ(x) = 1 / (1 + e^(-x))

dσ/dx = σ(x) · (1 − σ(x))

[Diagram: x passing through a multiply gate (a) and an add gate (b) to the output; or, equivalently, a single node with inputs x and 1 weighted by a and b.]

https://github.com/nfmcclure/tensorflow_cookbook/tree/master/03_Linear_Regression/08_Implementing_Logistic_Regression
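
A minimal sketch of the unit above (assuming the 2016-era Tensorflow API; the variable names are illustrative and not taken from the linked recipe):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])   # one input feature
a = tf.Variable(tf.random_normal([1, 1]))          # slope
b = tf.Variable(tf.zeros([1]))                     # bias
output = tf.sigmoid(tf.matmul(x, a) + b)           # F(x, a, b) = sigmoid(a*x + b)

# The derivative identity d(sigmoid)/dx = sigmoid(x) * (1 - sigmoid(x)) is what
# makes back-propagating through tf.sigmoid cheap.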

Page 10: Introduction to Neural Networks in Tensorflow


Why Activation Functions?

• It turns out that no matter how many additions and multiplications we pile on, we will never be able to model non-linear outputs without non-linear transformations.

• The sigmoid function is just one example of a gate function: σ(x) = 1 / (1 + e^(-x))

• There is also the Rectified Linear Unit (ReLU): σ(x) = max(x, 0)

• Softplus: σ(x) = ln(1 + e^x)

• Leaky ReLU: σ(x) = max(x, a·x), where 0 < a << 1
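
For reference, a sketch of these activations with built-in Tensorflow ops; the leaky ReLU is written with tf.maximum, since this era of the API had no dedicated op for it, and alpha is an illustrative choice for a:

import tensorflow as tf

x = tf.placeholder(tf.float32)
sigmoid_out  = tf.sigmoid(x)             # 1 / (1 + e^-x)
relu_out     = tf.nn.relu(x)             # max(x, 0)
softplus_out = tf.nn.softplus(x)         # ln(1 + e^x)
alpha = 0.01                             # small slope for the leaky ReLU
leaky_out    = tf.maximum(x, alpha * x)  # max(x, a*x)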

Page 11: Introduction to Neural Networks in Tensorflow


Many Layers

• Neural networks can have many ‘hidden layers’.

• We can make them as deep as we want.

• Neural networks can also have as many inputs or outputs as we would like.

[Diagram: fully connected layers; the number of hidden layers is the depth of the neural network.]

Page 12: Introduction to Neural Networks in Tensorflow


What is Tensorflow?

• An open source framework for creating computational graphs.

• Computational graphs are actually extremely flexible:

• Linear Regression (regular, multiple, logistic, lasso, ridge, elasticnet, …)

• Support Vector Machines (any kernel)

• Nearest Neighbor methods

• Neural Networks

• ODE/PDE Solvers

Page 13: Introduction to Neural Networks in Tensorflow


Why Tensorflow?

• Compare to other open source deep learning frameworks.

[Chart: frameworks such as Caffe and others compared by popularity and by speed/size.]

Page 14: Introduction to Neural Networks in Tensorflow


Why Tensorflow?

• Tensorflow is also very portable and flexible!

• Virtually no code change to go from CPU -> GPU or …

Page 15: Introduction to Neural Networks in Tensorflow


How is the Tensorflow Community?

• Very active GitHub community.

• In approximately 6 months: >5k commits, >250 contributors.

• Over 2,000 questions tagged on Stack Overflow (>50% answered within 24 hours).

Page 16: Introduction to Neural Networks in Tensorflow


How do Tensorflow Algorithms Work?

[Diagram: a computational graph containing the model variables produces an output, which is fed to a loss function.]

Page 17: Introduction to Neural Networks in Tensorflow


A One Hidden Layer NN in Tensorflow

• Iris dataset: predict petal width from sepal length, sepal width, and petal length.

[Diagram: the three inputs (S.L., S.W., P.L.) plus a bias feed five hidden nodes (h1 to h5), which together with a bias feed the single output. The data enters through placeholders; the weights and biases live in the computational graph.]

• 3 inputs x 5 hidden nodes = 15 weights + 5 biases; 5 hidden nodes x 1 output = 5 weights + 1 bias; 26 variables in total.

https://github.com/nfmcclure/tensorflow_cookbook/tree/master/06_Neural_Networks/04_Single_Hidden_Layer_Network
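
A condensed sketch of this network (assuming the 2016-era Tensorflow API; layer sizes follow the slide, but names, initializers, and the activation are illustrative; see the linked cookbook recipe for the full version):

import tensorflow as tf

# Placeholders for the data: 3 input features, 1 target (petal width).
x_data = tf.placeholder(tf.float32, shape=[None, 3])
y_target = tf.placeholder(tf.float32, shape=[None, 1])

# Hidden layer: 3 inputs -> 5 hidden nodes (15 weights + 5 biases).
W1 = tf.Variable(tf.random_normal([3, 5]))
b1 = tf.Variable(tf.random_normal([5]))
hidden = tf.nn.relu(tf.add(tf.matmul(x_data, W1), b1))

# Output layer: 5 hidden nodes -> 1 output (5 weights + 1 bias), 26 variables total.
W2 = tf.Variable(tf.random_normal([5, 1]))
b2 = tf.Variable(tf.random_normal([1]))
output = tf.add(tf.matmul(hidden, W2), b2)

# L2 loss and a gradient descent step with a chosen learning rate.
loss = tf.reduce_mean(tf.square(y_target - output))
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(loss)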

Page 18: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Distributed

• Computational graphs can take a very long time to train.

• Training is quicker when spread across multiple machines.

• We just need a dictionary mapping jobs to network addresses:

job_spec = {"my_workers": ["worker1.location.com:22", "worker2.location.com:22"],
            "other_workers": ["gpu_worker.location.com:22"]}

cluster = tf.train.ClusterSpec(job_spec)

with tf.device("/job:my_workers/task:0"):
    pass  # do task

with tf.device("/job:other_workers/task:0"):
    pass  # do another task

• Scheduling and communication are done for you (in an efficient and greedy manner).

Page 19: Introduction to Neural Networks in Tensorflow


More in Tensorflow: GPU Capabilities

• Sometimes neural networks can have hundreds of millions of parameters to train.

• Graphics cards (GPUs) are built to handle matrix calculations very quickly and efficiently.

• It's quite common to get a 10x to 50x speedup when switching the same code to a GPU.

(1) Install the GPU version of T.F. (2) Done.

As a side note, you can also tell T.F. to use specific devices:

/cpu:0   - the CPU of the machine
/gpu:0   - the first GPU of the machine
/gpu:100 - the 101st GPU of the machine (device numbering starts at 0)
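
A small sketch of explicit device placement (hedged: whether '/gpu:0' exists depends on the machine and on having the GPU build installed; allow_soft_placement lets the graph fall back to the CPU):

import tensorflow as tf

# Pin this part of the graph to the first GPU.
with tf.device('/gpu:0'):
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(c)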

Page 20: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Tensorboard

• A built-in way to visualize the computational graph, accuracies, loss functions, gradients, …

• (1) Specify which summaries to write and store them in a log file.

• (2) Point Tensorboard at the logging directory and navigate to 0.0.0.0:6006. (You can view it during training.)

tf.histogram_summary('variable_histogram', variable_value)
tf.scalar_summary('loss_value', loss)
merged = tf.merge_all_summaries()
my_writer = tf.train.SummaryWriter('my_logging_directory', sess.graph)

$ tensorboard --logdir=my_logging_directory

Page 21: Introduction to Neural Networks in Tensorflow


More in Tensorflow: Tensorboard

Collapsible/Expandable Named Scopes!!

with tf.name_scope('hidden1') as hidden_scope:
    pass  # Perform calculations.

Page 22: Introduction to Neural Networks in Tensorflow


More in Tensorflow: skflow

• An easier, scikit-learn-style implementation of common algorithms.

• A 3-layer NN:

import tensorflow.contrib.learn as skflow

my_model = skflow.TensorFlowDNNClassifier(hidden_units=[N1, N2, N3], n_classes=n_out)
my_model.fit(x_data, y_target)

• An RNN:

my_rnn = skflow.TensorFlowRNNClassifier(…)

• It is easy to add custom layers by adding in layer functions from vanilla Tensorflow.

Page 23: Introduction to Neural Networks in Tensorflow


What is a Convolutional Neural Network?

• A CNN is a large pattern-recognition neural network that uses various parameter-reducing techniques.

• E.g. AlexNet (2012 ImageNet winner).

Page 24: Introduction to Neural Networks in Tensorflow


Reduction of Parameters

• CNNs use tricks to reduce the number of parameters to fit.

• Convolution layers: assume the weights are the same on arrows in the same direction of a window we move across the data.

[Diagram: a sliding window over the input in which arrows in the same direction share a weight; there are only three weights in this layer, w1, w2, and w3.]

• Note: usually multiple convolutions are performed, creating 'layers'.
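
A sketch of a convolution layer in Tensorflow (shapes and names are illustrative; tf.nn.conv2d slides the same small set of weights across every window position):

import tensorflow as tf

# A batch of 28x28 grayscale images: [batch, height, width, channels].
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])

# One 3x3 filter: the same 9 weights are reused at every window position.
conv_filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))
conv_out = tf.nn.conv2d(images, conv_filter, strides=[1, 1, 1, 1], padding='SAME')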

Page 25: Introduction to Neural Networks in Tensorflow


Other Tricks - Pooling

• A layer that aggregates across a window from the prior layer.

• Types of pooling: max, min, mean, median, …

[Diagram: aggregation over three nodes in the prior layer, e.g. max().]
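
A sketch of a pooling layer (illustrative shapes; max pooling keeps the largest value in each window and has no weights to learn):

import tensorflow as tf

# Illustrative input: a batch of feature maps from a prior layer.
feature_maps = tf.placeholder(tf.float32, shape=[None, 28, 28, 8])

# Max pooling over non-overlapping 2x2 windows halves the height and width.
pooled = tf.nn.max_pool(feature_maps, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')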

Page 26: Introduction to Neural Networks in Tensorflow


Other Tricks - Dropout

• Dropout is a way to strengthen training and regularize the parameters.

• During training, we randomly select activation functions and set their output to zero.

• Compare this to 'zoneout', which randomly replaces activations with their value from the prior iteration.

• This helps the network find multiple pathways for prediction, so the network will not rely on just one connection or edge for prediction.

• This also helps the training of the other parameters in the network.
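
A sketch of dropout in Tensorflow (keep_prob is fed as a placeholder so it can be, for example, 0.5 during training and 1.0 when predicting):

import tensorflow as tf

hidden = tf.placeholder(tf.float32, shape=[None, 128])   # activations from some layer
keep_prob = tf.placeholder(tf.float32)                    # probability of keeping a unit

# Randomly zeroes activations and rescales the survivors by 1/keep_prob.
hidden_dropped = tf.nn.dropout(hidden, keep_prob)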

Page 27: Introduction to Neural Networks in Tensorflow


Larger Network Shorthand

• Since neural networks can get quite large, we do not want to write out all the connections or nodes.

• Here's how to convey the meaning in a shorter, less complicated way:

[Diagram: a shorthand node labeled "max of a set of features"; here it appears to be a max of 2 features.]

Page 28: Introduction to Neural Networks in Tensorflow


What is a Recurrent Neural Network?

• Recurrent Neural Networks (RNNs) are networks that can 'see' outputs from prior time steps. They are very helpful for sequences.

• The most common usage is to predict the next letter from the prior letters. We train this network on large text samples.

[Diagram legend: U, V, W = weight parameters; S = hidden layer state; X = input; O = output.]

Page 29: Introduction to Neural Networks in Tensorflow


Regional Convolutional Networks

• We can convolve a whole network over patches of the image and see which regions score highly for a category (object detection).

https://www.youtube.com/watch?v=lr1qVxhGUDw

Page 30: Introduction to Neural Networks in Tensorflow


Regional CNN + Recurrent NN = Image Captioning

• We take the object detection output and feed it through a recurrent neural network to generate text describing the image.

Microsoft Research 2015

Page 31: Introduction to Neural Networks in Tensorflow


Deep Dream (2015)

• We can select specific nodes in an image recognition pipeline and up-weight the output…

https://www.youtube.com/watch?v=DgPaCWJL7XI
http://www.fastcodesign.com/3057368/inside-googles-first-deepdream-art-show

Page 32: Introduction to Neural Networks in Tensorflow


Stylenet (2015)

• Train a network on a "style image". Then try to reconstruct another image from that same pretrained network.

https://vimeo.com/139123754

Recoloring of black and white photos

Page 33: Introduction to Neural Networks in Tensorflow


CNN for Text (2015)

• Zhang, X., et al. released a paper (http://arxiv.org/abs/1509.01626) that showed great results from treating strings as 1-D numerical vectors.

• Ways to get strings into numerical vectors:

• Word-level one-hot encoding.

• Word2Vec encoding or similar (GloVe, Doc2Vec, …).

• Character-level one-hot encoding.

• We can't resize vectors like pictures, so instead we pick a max length and pad shorter entries with zeros.

• Like many NN results, these depend heavily on the dataset and architecture.

• The paper shows good results for a low number of classes (ratings or sentiment).

Page 34: Introduction to Neural Networks in Tensorflow


Parsey McParseface

• Google released a model called ‘Syntaxnet’ in May 2016, based on Tensorflow.

• Syntaxnet is arguably one of the best part-of-speech parsers available.

• They trained an English version, called ‘Parsey McParseface’.

• This is a free and open source model to use.

Page 35: Introduction to Neural Networks in Tensorflow


The Possibilities are Endless

• Predict age from a facial photo - https://how-old.net

• Predict dog breed from a photo - https://www.what-dog.net

• Inside a self-driving car brain - https://www.youtube.com/watch?v=ZJMtDRbqH40

• Memory networks - https://arxiv.org/pdf/1410.3916.pdf

• Frodo journeyed to Mount-Doom. Frodo dropped the ring there. Sauron died.

• Frodo went back to the Shire. Bilbo travelled to the Grey-havens.

• Where is the ring? A: Mount-Doom

• Where is Bilbo now? A: Grey-havens

• "Synthia" - a virtual driving school for self-driving cars:

• http://www.gizmag.com/synthia-dataset-self-driving-cars/43895/

• Solving hand-written chalkboard problems in real time:

• http://www.willforfang.com/computer-vision/2016/4/9/artificial-intelligence-for-handwritten-mathematical-expression-evaluation

Page 36: Introduction to Neural Networks in Tensorflow


<Insert Name Here>-Neural Net

• Last Month (June-July 2016):

• YodaNN: http://arxiv.org/abs/1606.05487
• CMS-RCNN: http://arxiv.org/abs/1606.05413
• Zoneout RNN: https://arxiv.org/abs/1606.01305
• Pixel CNN: http://arxiv.org/abs/1606.05328
• Memory CNN: http://arxiv.org/abs/1606.05262
• DISCO Nets: http://arxiv.org/abs/1606.02556

• This Year (2016):

• Stochastic Depth Net: https://arxiv.org/abs/1603.09382
• Squeeze Net: https://arxiv.org/abs/1602.07360
• Binary Nets: http://arxiv.org/abs/1602.02830

• Last Year:

• PoseNet: http://arxiv.org/abs/1505.07427
• YOLO Nets: http://arxiv.org/abs/1506.02640
• Style Net: http://arxiv.org/abs/1508.06576
• Residual Net: https://arxiv.org/abs/1512.03385
• And so many more…

• What’s Your Architecture?

Page 37: Introduction to Neural Networks in Tensorflow


How to Keep ML Current

• Machine learning advances happen more and more frequently.

• We can't rely on journals that take multiple years to publish.

• Twitter

• http://arxiv.org/

• Others?

Page 38: Introduction to Neural Networks in Tensorflow


More Resources

• More on CNNs:

• https://www.reddit.com/r/MachineLearning/

• https://www.reddit.com/r/deepdream/

• http://karpathy.github.io/

• RNNs:

• http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Recent Research: (Jan 16-June 16)

• Residual Conv. Nets: https://www.youtube.com/watch?v=1PGLj-uKT1w

• Determine location of a random image: https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/

• Stochastic Depth Networks: http://arxiv.org/abs/1603.09382

Pre-trained models: http://www.vlfeat.org/matconvnet/pretrained/

Page 39: Introduction to Neural Networks in Tensorflow


Questions?

• Slides: http://www.slideshare.net/NicholasMcClure1/introduction-to-neural-networks-in-tensorflow

• Code and more: https://github.com/nfmcclure/tensorflow_cookbook