Utah Code Camp 2014 - Learning from Data by Thomas Holloway
DESCRIPTION
A fast-paced guide to machine learning, a branch of artificial intelligence, given at Utah Code Camp 2014.
TRANSCRIPT
Learning from Data
A fast-paced guide to machine learning and artificial intelligence
by Thomas Holloway, Co-Founder/Software Engineer @ Nuvi (http://www.nuviapp.com)
Thanks to our Sponsors!
To connect to wireless:
1. Choose Uguest in the wireless list
2. Open a browser. This will open a U of U website
3. Choose Login

“Intelligence is the art of good guesswork” – H. B. Barlow
General Intelligence Goals
• Deduction, Reasoning, Problem Solving
• Knowledge Representation
• Planning
• Learning
• Natural Language Processing
• Motion and Manipulation
• Perception
• Social Intelligence
• Creativity

• Early AI research began in the study of logic itself, leading to algorithms that imitate the step-by-step reasoning used to solve puzzles and problems (heuristics).
• In contrast, methods drawn from economics and probability in the late ’80s/’90s led to very successful approaches for dealing with uncertainty or incompleteness.
• Statistical approaches, neural networks (the probabilistic nature of humans to guess)
Knowledge Representation
• Represent knowledge conceptually about objects, places, situations, events, times, language
• What they look like
• Categorical features
• Properties
• Relationships between each other
• Meta-knowledge (knowledge of what other people know)
• Causes, effects, and many less-known research fields
• “what exists” = Ontology
Knowledge Representation (cont.)
• Difficult problems
• Working assumptions, default reasoning, the qualification problem
• Commonsense knowledge
• A major goal is to acquire this automatically, largely through unsupervised learning
• Ontology engineering
• Subsymbolic form of commonsense knowledge
• Not all knowledge can be represented as facts or statements (e.g. the intuition to avoid a move because a position “feels too exposed” in a chess match)
Planning
• Set goals and achieve them
• (visualize the representation of the world, predict how actions will change it, make choices to maximize utility)
• Requires reasoning under uncertainty (checking whether the world/environment matches its predictions) -> error correction
• Move a chess piece here, the player responds to put me in a seemingly poor position, act accordingly
Learning
• Machine Learning is the study of algorithms that automatically improve through experience.
• It probably plays the most central role in Artificial Intelligence.
• Unsupervised Learning - finding patterns
• Supervised Learning - classifying what category something is/belongs to and producing a function mapping input -> output
• Reinforcement Learning - rewards
• Developmental Learning - self-exploration, active learning, imitation, guidance, entropy
Natural Language Processing
• Read and understand text
• Listen and understand speech
• Information Retrieval
• Machine Translation
• Sentiment Analysis
• Category Theory (quantum logic in information flow theory)
• Common techniques in semantic indexing, parse trees, syntactic and semantic analysis
• A major goal is to automatically build an ontology (for knowledge representation) by scanning books, Wikipedia, dictionaries, etc.
• Recently used Wiktionary and Wikipedia to automatically build a part-of-speech tagger and sentiment analysis engine for multiple languages. *http://www.nuviapp.com/* <— PLUG
Statistical Machine Learning is the art of taking lots of data and turning it into statistically known probabilities.
• Entropic Force (Alex Wissner-Gross’ argument for intelligence)
• Language Discovery
• Automated Trading Systems
• Machine Translation
• Spam Detection
• Self-Driving Cars
• Facial Recognition
• Gesture Recognition
• Speech Recognition
• Nest
• Shazam
• Spotify
• Netflix, Amazon Recommendations
• Duolingo
• Robot Movement
• Fraud Detection
• Intrusion Detection / State Anomaly
• DNA Sequence Alignment
• Siri, Google Voice, Google Now, Kinect
• Sentiment Analysis
• Text/Character Recognition (scanning books)
• Health Monitoring (Healthcare)
• Pandora, iTunes / iGenius
Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Recommendation Systems
• Reinforcement Learning
• (rewards for good responses, punishment for bad ones)
• Developmental Learning
• (self-exploration, entropic force, cumulative acquisition of novel skills typical of robot movement - autonomous interaction with environment and “teachers”, imitation, maturation)
Supervised Learning
• Two types that we will discuss within supervised learning:
• Regression analysis (single-valued real output)
• Classification
Linear Regression
Optimization Objectives
• Hypothesis: h_θ(x) = θ₀ + θ₁x
• Parameters: θ₀, θ₁
• Cost Function: J(θ₀, θ₁) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Goal: minimize J(θ₀, θ₁) over θ₀, θ₁

m = number of samples, x⁽ⁱ⁾ = x at sample i, y⁽ⁱ⁾ = y at sample i

Our cost function is effectively taking the squared error difference between all predictions from our hypothesis and the actual values y, and finally summing the error up to a total “cost” error. The goal is to minimize the error produced from the cost function by manipulating the parameters theta.
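The cost function above can be sketched in a few lines of Python; the sample data below is made up for illustration:

```python
# A minimal sketch of the squared-error cost J(θ0, θ1) described above.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x  # hypothesis h_theta(x)
        total += (h - y) ** 2    # squared error for this sample
    return total / (2 * m)       # average, with the conventional 1/2 factor

# A perfect fit (the data is exactly y = 1 + 2x) has zero cost:
print(cost(1.0, 2.0, [0, 1, 2], [1, 3, 5]))  # 0.0
```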
Gradient Descent
• First-order optimization algorithm
• Finds a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point.
• Popular for large-scale optimization problems
• easy to implement
• works on just about any black-box function
• each iteration is relatively cheap
Gradient Descent
repeat until convergence {
  θ_j := θ_j − α · ∂/∂θ_j J(θ₀, θ₁)   (for j = 0 and j = 1, updated simultaneously)
}

Expanding the partial derivative for our hypothesis and cost function gives:

repeat until convergence {
  θ₀ := θ₀ − α · (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θ₁ := θ₁ − α · (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
}

* note: sometimes referred to as batch gradient descent (given that we iterate over all training examples to perform a single update on our parameters)
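The batch update rule above, sketched in Python; the learning rate, iteration count, and training data are arbitrary choices for illustration:

```python
# A rough sketch of batch gradient descent for univariate linear regression.
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m = len(xs)
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        # gradients averaged over ALL training examples (hence "batch")
        g0 = sum((t0 + t1 * x - y) for x, y in zip(xs, ys)) / m
        g1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
        # simultaneous update of both parameters
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1

# data drawn from y = 1 + 2x, so the parameters converge toward (1, 2)
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
```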
Multivariate Linear Regression

| TV Budget | Online Ads | Billboards | Sales |
| --- | --- | --- | --- |
| 230.1 | 37.8 | 63.1 | 22.1 |
| 44.5 | 39.9 | 45.1 | 10.4 |
| 17.2 | 45.8 | 69.3 | 9.3 |
| 180.8 | 41.3 | 58.5 | 18.5 |
Multivariate Linear Regression
• Hypothesis: h_θ(x) = θ₀x₀ + θ₁x₁ + … + θₙxₙ = θᵀx
• Think of x as our example with its features in a vector of up to n features, with x₀ = 1.
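With NumPy, the hypothesis θᵀx for every sample at once is a single matrix product; the feature values and parameters below are illustrative only:

```python
import numpy as np

# Each row is one sample; the leading column of ones is the x0 = 1 bias term.
X = np.array([[1.0, 230.1, 37.8, 63.1],
              [1.0, 44.5, 39.9, 45.1]])
theta = np.array([0.5, 0.04, 0.1, 0.02])  # made-up parameters

h = X @ theta  # predictions h_theta(x) for all samples at once
```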
Optimization Objectives (Multivariate)
• Hypothesis: h_θ(x) = θᵀx
• Parameters: θ (a vector θ₀ … θₙ)
• Cost Function: J(θ) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Goal: minimize J(θ) over θ

m = number of samples, x⁽ⁱ⁾ = x at sample i, y⁽ⁱ⁾ = y at sample i

As before, the cost function takes the squared error difference between all predictions from our hypothesis and the actual values y, and sums the error up to a total “cost”; the goal is to minimize that cost by manipulating the parameters theta.
Gradient Descent (Multivariate)
repeat until convergence {
  θ_j := θ_j − α · (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾   (for j = 0…n, updated simultaneously)
}
Techniques in Managing Input
• Mean normalization (make sure all your inputs have similar ranges)
• FFT for audio
• Mean / average / range
• Graph your cost function over the number of iterations (make sure it is decreasing)
• Separate data sets (cross-validation set, test set)
• Train on a given set of data, manipulate regularization / extra features etc., and graph your cost function against the cross-validation set
• Finally test against unseen data with your test set
• Typically this is 60-30-10, or even 70-20-10, depending on how you wish to split things up.
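A quick sketch of mean normalization and a training/cross-validation/test split (60-30-10 here, per the slide); the helper names and data are made up:

```python
import numpy as np

def mean_normalize(X):
    # scale each feature to roughly [-0.5, 0.5] by centering on its mean
    # and dividing by its range
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng

def split(X, train=0.6, cv=0.3):
    m = len(X)
    a, b = round(m * train), round(m * (train + cv))
    return X[:a], X[a:b], X[b:]  # training, cross-validation, test sets

X = np.arange(20, dtype=float).reshape(10, 2)
Xn = mean_normalize(X)
tr, cv, te = split(Xn)  # 6 / 3 / 1 rows
```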
Normal Equation
• Analytically solves for the parameters: θ = (XᵀX)⁻¹Xᵀy
• Useful when n is relatively small (number of features < 5000 or so)
• Uses the entire matrix of input
• Each sample = a vector of features (a row of X)
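The normal equation can be sketched with NumPy; solving the linear system is generally preferred to forming the explicit inverse, and the data below is contrived so the answer is known:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # column of ones = bias
y = np.array([1.0, 3.0, 5.0])                        # exactly y = 1 + 2x

theta = np.linalg.solve(X.T @ X, X.T @ y)            # solves (XᵀX)θ = Xᵀy
print(theta)  # ≈ [1. 2.]
```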
Supervised Learning - Classification
• Spam/Not Spam
• Benign/Malignant
• Biometric Identification
• Speech Recognition
• Fraudulent Transactions
• Pattern Recognition
• 0 = Negative Class
• 1 = Positive Class
Logistic Regression / Classification
• What we want is a function that will produce a value between 0 and 1 for all weighted input we provide.
• Sigmoid activation unit: g(z) = 1 / (1 + e⁻ᶻ), with hypothesis h_θ(x) = g(θᵀx)
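A minimal sketch of the sigmoid unit:

```python
import math

def sigmoid(z):
    # squashes any weighted input to (0, 1), with g(0) = 0.5
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(6))   # ≈ 0.9975 — large positive input → near 1
print(sigmoid(-6))  # ≈ 0.0025 — large negative input → near 0
```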
Logistic Regression Cost Function
• Hypothesis: h_θ(x) = 1 / (1 + e^(−θᵀx))
• Cost Function: the linear regression cost function J(θ) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² is non-convex for this hypothesis, so we need a different cost function.
Logistic Regression Cost Function
• Hypothesis: h_θ(x) = 1 / (1 + e^(−θᵀx))
• Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1; −log(1 − h_θ(x)) if y = 0
Logistic Regression Cost Function Intuition
• For y = 1, Cost = −log(h_θ(x)): as h_θ(x) → 0 the cost → ∞. In other words, if we predicted 0 when we should have predicted 1, we are going to return a very large cost.
Logistic Regression Cost Function Intuition
• For y = 0, Cost = −log(1 − h_θ(x)): as h_θ(x) → 1 the cost → ∞. In other words, if we predicted 1 when we should have predicted 0, we are going to return a very large cost.
Logistic Regression Cost Function
• The “simplified” formula combines both cases: J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]
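The simplified cost can be sketched directly; the prediction values below are made up:

```python
import math

def logistic_cost(preds, ys):
    # average of -[y*log(h) + (1-y)*log(1-h)] over all samples
    m = len(preds)
    total = 0.0
    for h, y in zip(preds, ys):
        total += -(y * math.log(h) + (1 - y) * math.log(1 - h))
    return total / m

good = logistic_cost([0.9, 0.1], [1, 0])  # confident and correct: small cost
bad = logistic_cost([0.1, 0.9], [1, 0])   # confident and wrong: large cost
```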
Gradient Descent (Logistic Regression)
repeat until convergence {
  θ_j := θ_j − α · (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾   (for j = 0…n)
}
The update rule looks identical to linear regression; the difference is the hypothesis h_θ(x) = 1 / (1 + e^(−θᵀx)).
Logistic Regression Decision Boundaries
• The threshold or line at which input data favors one class or another. This is usually the same point where our sigmoid function crosses the 0.5 mark.
Multiclass Classification
• Multiclass classification deals with multiple categories of classification (Sunny, Rainy, Cloudy, etc.). It is typically done as one-vs-all classification, where each class is trained as (1 = positive for the given class, 0 for everything else).
• To predict, find the max probability among all classes tested against.
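A sketch of one-vs-all prediction, assuming one already-trained parameter vector per class (the theta values here are invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(thetas, x):
    # thetas: one row of parameters per class; x includes the bias term
    probs = sigmoid(thetas @ x)   # probability from each binary classifier
    return int(np.argmax(probs))  # index of the most confident class

thetas = np.array([[1.0, -2.0],   # class 0, e.g. "sunny"
                   [-1.0, 0.5],   # class 1, e.g. "rainy"
                   [0.0, 1.0]])   # class 2, e.g. "cloudy"
x = np.array([1.0, 3.0])          # bias term plus one feature
print(predict_one_vs_all(thetas, x))  # 2
```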
Overfitting and Regularization
• Regularized logistic regression cost: J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] + (λ/2m) Σⱼ θⱼ²  (j = 1…n)
• Regularized gradient descent: θ_j := θ_j − α [ (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ + (λ/m) θ_j ]  (for j ≥ 1; θ₀ is not regularized)
• λ is the regularization parameter.
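One regularized gradient-descent step can be sketched as follows; note the convention that θ₀ (the bias) is not regularized, and all the data here is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_step(theta, X, y, alpha, lam):
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m  # unregularized gradient
    reg = (lam / m) * theta
    reg[0] = 0.0                               # do not shrink the bias term
    return theta - alpha * (grad + reg)

X = np.array([[1.0, 0.0], [1.0, 1.0]])  # leading column of ones = bias
y = np.array([0.0, 1.0])
theta = regularized_step(np.array([0.5, 0.5]), X, y, alpha=0.1, lam=1.0)
```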
Neural Networks
Sophisticated Neural Networks can do some really amazing things
• Multi-layered (deep) neural networks can be built to identify extremely complex things, with potentially millions of features to train on.
• Neural networks can auto-encode (learn from the input itself / self-learn), classify into many categories at once, and be trained to output real values; they can even be built to retain memory or long-term state (as in the case of hidden Markov models or finite state automatons).
Types of Neural Networks
• Feedforward
• Recurrent
• Echo-State
• Long Short-Term Memory
• Stochastic
• Bidirectional (propagates in both directions)
Feed Forward Network
(diagram: input features / input layer with +1 bias units, hidden layer, output)
What is the value/output of each hidden and output unit in the network?

Answer: the sigmoid activation of the sum of its weighted inputs.

Feed Forward Propagation: apply that rule layer by layer, from the input layer forward to the output.
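Feed forward propagation can be sketched with NumPy; the layer sizes and weights below are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights):
    a = x
    for W in weights:
        a = np.insert(a, 0, 1.0)  # prepend the +1 bias unit
        a = sigmoid(W @ a)        # activation of the weighted inputs
    return a

# 2 inputs -> 2 hidden units -> 1 output
W1 = np.array([[0.1, 0.4, -0.2],
               [-0.3, 0.2, 0.5]])   # shape (2, 3): 2 units, bias + 2 inputs
W2 = np.array([[0.2, 0.7, -0.6]])  # shape (1, 3): 1 unit, bias + 2 hidden
out = feed_forward(np.array([1.0, 2.0]), [W1, W2])
```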
Backpropagation
• Gradient computation is done by computing the derivative of the error between our expected output and our actual output, and propagating that error backwards through the network.
• Calculate: δ⁽ᴸ⁾ = a⁽ᴸ⁾ − y for the output layer, then δ⁽ˡ⁾ = (Θ⁽ˡ⁾)ᵀ δ⁽ˡ⁺¹⁾ .* g′(z⁽ˡ⁾) for the hidden layers.
Backpropagation
(worked example on the network diagram above, letting y = 1 for this sample)
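A sketch of backpropagation for a small two-layer network like the one in the diagrams; the shapes and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, W1, W2):
    # forward pass, keeping intermediate activations
    a1 = np.insert(x, 0, 1.0)            # input layer with bias
    z2 = W1 @ a1
    a2 = np.insert(sigmoid(z2), 0, 1.0)  # hidden layer with bias
    a3 = sigmoid(W2 @ a2)                # output layer

    # backward pass
    d3 = a3 - y                          # output error (delta)
    # propagate through W2 (dropping the bias row), times sigmoid gradient
    d2 = (W2.T @ d3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))

    grad2 = np.outer(d3, a2)             # gradient for W2
    grad1 = np.outer(d2, a1)             # gradient for W1
    return grad1, grad2

W1 = np.random.randn(2, 3)
W2 = np.random.randn(1, 3)
g1, g2 = backprop(np.array([1.0, 2.0]), np.array([1.0]), W1, W2)
```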
Recurrent Neural Networks
• Connections between units form a directed cycle
• Allows the network to exhibit dynamic temporal behavior
• Useful for maintaining internal memory or state over time
• Ex: unsegmented handwriting recognition
• At any given time step, each non-input unit computes its current activation as a nonlinear function of the weighted sum of the activations of all units from which it receives connections.
• Training is done with backpropagation through time
• vanishing gradient problem (addressed with LSTM networks)
Recurrent Neural Networks
http://www.manoonpong.com/AMOSWD08.html
LSTM Recurrent Neural Networks
• Long Short-Term Memory
• Well suited for classifying, predicting, and processing time-series data with very long-range dependencies.
• Achieves the best known results in unsegmented handwriting recognition
• Traps error within a memory block (often referred to as an error carousel)
• Amazing applications in rhythm learning, grammar learning, music composition, robot control, etc.
Other Classification Techniques
• SVM (support vector machines)
• constructs a hyperplane in a high-/infinite-dimensional space used for training/classification, regression, etc.
• by defining a kernel function (or some function that tells us similarity), SVM allows us to perform simple dot products between high-dimensional features
• high margin (the decision boundary has good separation between training points), which benefits good generalization
• Naive Bayes
Unsupervised Learning
• Categorization
• Clustering (density estimation)
• Selecting k clusters (k-means): assign data points to the nearest cluster, update each cluster’s average centroid, and iterate
• Blind Signal Separation
• Feature Extraction for Dimensionality Reduction
• Hidden Markov Models
• Non-normal & normal distribution analysis (finding the distributions of data)
• Self-Organizing Maps
Autoencoders
Unsupervised learning from neural networks: train the network to reproduce its input at the output through a smaller hidden layer, forcing it to learn a compressed representation.
Knowing What To Do Next
• Build your algorithm quick and dirty; don’t spend a lot of time on it until you have something to use
• Split up your training, cross-validation, and test sets (don’t test on your training data!)
• Move on to PCA or unsupervised pre-training for your supervised algorithms to help improve performance
• Don’t just try to get a lot of data to train on; implement your algorithm quick and dirty, use smaller data sets initially, and determine bias/variance:
• High variance: get more training data
• High variance: try fewer features
• High variance: increase regularization
• High bias: add additional features
• High bias: add polynomial features
• High bias: decrease regularization
Follow me @nyxtom
Thank you!
Questions?
http://ml-class.org/
https://www.coursera.org/course/bluebrain
https://www.coursera.org/course/neuralnets