neural networks - umd department of computer … networks •today –what are neural networks?...
Neural Networks
CMSC 422
MARINE CARPUAT
XOR slides by Graham Neubig (CMU)
Neural Networks
• Today
– What are Neural Networks?
– How to make a prediction given an input?
– Why are neural networks powerful?
• Thursday
– How to train them?
A warm-up example
Sentiment analysis for movie reviews (+1 = positive, -1 = negative)
• the movie was horrible -1
• the actors are excellent +1
• the movie was not horrible +1
• he is usually an excellent actor, but not in this movie -1
Binary classification via hyperplanes
• At test time, we check on which side of the hyperplane each example falls:
$y = \mathrm{sign}(w^\top x + b)$
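The test-time rule above fits in a few lines. This is a minimal plain-Python sketch; the helper names and the example hyperplane are hypothetical, and it assumes the convention sign(0) = +1 (points exactly on the hyperplane go to the positive side).

```python
def sign(z):
    """Sign activation; sign(0) is taken to be +1 here (an assumed convention)."""
    return 1 if z >= 0 else -1

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x falls."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [2.0, -1.0], 0.5               # hypothetical hyperplane parameters
print(predict(w, b, [1.0, 1.0]))      # 2 - 1 + 0.5 = 1.5, so this prints 1
print(predict(w, b, [-1.0, 1.0]))     # -2 - 1 + 0.5 = -2.5, so this prints -1
```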
Function Approximation with Perceptron
Problem setting
• Set of possible instances $X$
– Each instance $x \in X$ is a feature vector $x = [x_1, \dots, x_D]$
• Unknown target function $f: X \to Y$
– $Y$ is binary valued: $\{-1, +1\}$
• Set of function hypotheses $H = \{h \mid h: X \to Y\}$
– Each hypothesis $h$ is a hyperplane in D-dimensional space
Input
• Training examples $\{(x^{(1)}, y^{(1)}), \dots, (x^{(N)}, y^{(N)})\}$ of unknown target function $f$
Output
• Hypothesis $h \in H$ that best approximates target function $f$
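To make "output the hypothesis that best approximates f" concrete, here is a minimal sketch of the classic perceptron learning rule, one standard way to search this space of hyperplanes. It is a plain-Python illustration under our own assumptions (sign(0) = +1, a fixed number of epochs, the hypothetical name `perceptron_train`), not a reproduction of the course's pseudocode.

```python
def sign(z):
    """Sign with the (assumed) convention sign(0) = +1."""
    return 1 if z >= 0 else -1

def perceptron_train(examples, num_epochs=10):
    """Classic perceptron rule: on each mistake, move the hyperplane toward the example."""
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(num_epochs):
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:          # misclassified, or exactly on the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Linearly separable toy data: y = +1 only when both features are +1 (logical AND).
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = perceptron_train(data)
print(w, b)
print([sign(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in data])
```

On this toy data the rule stops making mistakes after the first pass; on data that is not linearly separable (such as XOR, discussed later) it never settles.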
Aside: biological inspiration
Analogy: the perceptron as a neuron
Neural Networks
• We can think of neural networks as a combination of multiple perceptrons
– Multilayer perceptron
• Why would we want to do that?
– Discover more complex decision boundaries
– Learn combinations of features
What does a 2-layer perceptron look like?
(illustration on board)
• Key concepts:
– Input dimensionality
– Hidden units
– Hidden layer
– Output layer
– Activation functions
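A minimal sketch of the forward pass of a 2-layer perceptron, tying each key concept to a line of code. Assumptions: biases are left out (they can be folded in by appending a constant 1 to x), there is a single output unit with no output activation, and the name `two_layer_predict` and the example weights are hypothetical.

```python
import math

def two_layer_predict(W, v, x, activation=math.tanh):
    """Forward pass of a 2-layer perceptron.

    x: input vector of length D (the input dimensionality)
    W: hidden layer parameters, K rows (one per hidden unit), each of length D
    v: output layer parameters, one weight per hidden unit (length K)
    """
    # Hidden layer: each hidden unit applies the activation function to a weighted sum of the inputs.
    hidden = [activation(sum(w_kd * x_d for w_kd, x_d in zip(w_k, x))) for w_k in W]
    # Output layer: a weighted sum of the hidden unit values.
    return sum(v_k * h_k for v_k, h_k in zip(v, hidden))

W = [[1.0, 2.0], [0.5, -1.0]]   # K = 2 hidden units, D = 2 inputs (arbitrary values)
v = [1.0, -2.0]
score = two_layer_predict(W, v, [1.0, 0.5])
print(score, 1 if score >= 0 else -1)   # real-valued score, then its sign as a class label
```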
Activation functions
• Activation functions are non-linear functions
– sign function, as in the perceptron
– hyperbolic tangent (tanh) and other sigmoid functions, which approximate sign but are differentiable
• What happens if the hidden units use the identity function as an activation function?
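The identity-activation question can be checked numerically. With identity hidden units the hidden layer computes h = Wx, so the output v · (Wx) equals u · x with u = Wᵀv: the network collapses to a single linear function of x, no more expressive than one linear unit. The sketch below uses arbitrary made-up weights and hypothetical helper names to compare both computations.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def two_layer_identity(W, v, x):
    # Hidden units with the identity activation: h = W x, output = v . h
    return dot(v, matvec(W, x))

def collapsed_weights(W, v):
    # u_d = sum_k v_k * W[k][d], i.e. u = W^T v: a single linear model
    D = len(W[0])
    return [sum(v[k] * W[k][d] for k in range(len(W))) for d in range(D)]

W = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]   # K = 3 hidden units, D = 2 inputs (arbitrary)
v = [2.0, -1.0, 4.0]
u = collapsed_weights(W, v)
x = [0.3, -1.2]
print(two_layer_identity(W, v, x), dot(u, x))   # the two values agree up to rounding
```

Extra identity hidden units therefore add parameters but no new decision boundaries; a non-linear activation is what lets the hidden layer build new features.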
[Figure: prediction of a two-layer network, annotated with the matrix of hidden layer parameters and the vector of output layer parameters]
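The annotated equation itself did not survive extraction. Assuming it is the usual matrix-vector form of a two-layer network with tanh hidden units and a single linear output (our assumption, biases omitted), the prediction reads:

$$
\hat{y} \;=\; \sum_{k=1}^{K} v_k \,\tanh\!\big(w_k \cdot x\big) \;=\; v \cdot \tanh(W x),
\qquad W \in \mathbb{R}^{K \times D},\quad v \in \mathbb{R}^{K},
$$

where row $k$ of the matrix $W$ is the weight vector $w_k$ of hidden unit $k$, $\tanh$ is applied elementwise, $D$ is the input dimensionality and $K$ the number of hidden units.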
What functions can we approximate with a 2-layer perceptron?
Problem setting
• Set of possible instances $X$
– Each instance $x \in X$ is a feature vector $x = [x_1, \dots, x_D]$
• Unknown target function $f: X \to Y$
– $Y$ is binary valued: $\{-1, +1\}$
• Set of function hypotheses $H = \{h \mid h: X \to Y\}$
Input
• Training examples $\{(x^{(1)}, y^{(1)}), \dots, (x^{(N)}, y^{(N)})\}$ of unknown target function $f$
Output
• Hypothesis $h \in H$ that best approximates target function $f$
Two-Layer Networks are Universal Function Approximators
• Theorem (Th 9 in CIML): Let $F$ be a continuous function on a bounded subset of D-dimensional space. Then there exists a two-layer neural network $\hat{F}$ with a finite number of hidden units that approximates $F$ arbitrarily well. Namely, for any $\epsilon > 0$ there is such an $\hat{F}$ satisfying, for all $x$ in the domain of $F$:
$|F(x) - \hat{F}(x)| < \epsilon$
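The theorem is existential. A quick way to see its claim at work in one dimension (D = 1) is the sketch below; the construction is our own choice for illustration, not the proof in CIML. Each tanh hidden unit is made steep enough to act as a smooth step, and the output layer adds up steps whose heights follow the target function on a grid. The function names, the steepness factor of 10, and the choice of sin on [0, π] as the target are all assumptions.

```python
import math

def build_tanh_network(f, a, b, num_hidden):
    """Build a 2-layer tanh network (1-D input) approximating f on [a, b].

    Each hidden unit is a steep tanh acting like a smooth step placed between
    two neighboring grid points; its output weight is half the jump in f there.
    """
    h = (b - a) / num_hidden                                   # grid spacing
    grid = [a + k * h for k in range(num_hidden + 1)]
    thresholds = [(grid[k - 1] + grid[k]) / 2 for k in range(1, num_hidden + 1)]
    # step(z) ~ (1 + tanh(s * z)) / 2, so a jump of size d becomes d/2 * tanh(...) + d/2
    out_weights = [(f(grid[k]) - f(grid[k - 1])) / 2 for k in range(1, num_hidden + 1)]
    out_bias = (f(grid[0]) + f(grid[-1])) / 2                  # the d/2 terms telescope
    steepness = 10.0 / h                                       # assumed: switch well within one cell
    return out_bias, out_weights, thresholds, steepness

def predict(net, x):
    out_bias, out_weights, thresholds, s = net
    # Hidden unit k computes tanh(s * x - s * t_k): input weight s and bias -s * t_k.
    return out_bias + sum(v * math.tanh(s * (x - t)) for v, t in zip(out_weights, thresholds))

def max_error(f, a, b, num_hidden, num_points=1001):
    """Largest |f(x) - F_hat(x)| over evenly spaced test points in [a, b]."""
    net = build_tanh_network(f, a, b, num_hidden)
    xs = [a + i * (b - a) / (num_points - 1) for i in range(num_points)]
    return max(abs(f(x) - predict(net, x)) for x in xs)

for k in (5, 50, 500):
    print(k, "hidden units -> max error", max_error(math.sin, 0.0, math.pi, k))
```

Adding hidden units drives the maximum error down, which is the behavior the theorem says is achievable; the theorem itself says nothing about how many units a given accuracy needs, or how to find such weights by training.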
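Example: a neural network to solve the XOR problem
• Four examples in the original space $\phi_0$ (X and O are the two classes):
– $\phi_0(x_1) = \{-1, 1\}$: X
– $\phi_0(x_2) = \{1, 1\}$: O
– $\phi_0(x_3) = \{-1, -1\}$: O
– $\phi_0(x_4) = \{1, -1\}$: X
• Two hidden units map $\phi_0$ to a new space $\phi_1$:
– $\phi_1[0] = \mathrm{sign}(1 \cdot \phi_0[0] + 1 \cdot \phi_0[1] - 1)$
– $\phi_1[1] = \mathrm{sign}(-1 \cdot \phi_0[0] - 1 \cdot \phi_0[1] - 1)$
• Resulting hidden representations:
– $\phi_1(x_1) = \{-1, -1\}$: X
– $\phi_1(x_2) = \{1, -1\}$: O
– $\phi_1(x_3) = \{-1, 1\}$: O
– $\phi_1(x_4) = \{-1, -1\}$: X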
Example
● In the new space, the examples are linearly separable!
– X: $\phi_1(x_1) = \phi_1(x_4) = \{-1, -1\}$
– O: $\phi_1(x_2) = \{1, -1\}$ and $\phi_1(x_3) = \{-1, 1\}$
– A single output unit separates them: $y = \phi_2[0] = \mathrm{sign}(1 \cdot \phi_1[0] + 1 \cdot \phi_1[1] + 1)$, which gives $y = -1$ for X and $y = +1$ for O
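The XOR construction can be checked directly. The sketch below hard-codes the weights read off the figure (hidden units with weights {1, 1} / bias -1 and {-1, -1} / bias -1; output unit with weights {1, 1} / bias 1) and encodes O as +1 and X as -1. That label encoding and the helper names are our assumptions; no pre-activation here is ever exactly 0, so the sign(0) convention does not matter.

```python
def sign(z):
    return 1 if z >= 0 else -1

# Inputs phi0 with labels: O = +1, X = -1 (assumed encoding).
examples = {
    "x1": ([-1, 1], -1),   # X
    "x2": ([1, 1], 1),     # O
    "x3": ([-1, -1], 1),   # O
    "x4": ([1, -1], -1),   # X
}

def hidden_layer(phi0):
    """Two sign units: weights {1, 1} with bias -1, and weights {-1, -1} with bias -1."""
    return [sign(1 * phi0[0] + 1 * phi0[1] - 1),
            sign(-1 * phi0[0] - 1 * phi0[1] - 1)]

def output_layer(phi1):
    """One sign unit over the new space: weights {1, 1}, bias 1."""
    return sign(1 * phi1[0] + 1 * phi1[1] + 1)

for name, (phi0, label) in examples.items():
    phi1 = hidden_layer(phi0)
    print(name, "phi1 =", phi1, "prediction =", output_layer(phi1), "label =", label)
```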
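Example
● The final net (same weights, with sign replaced by tanh; each "1" input is a constant bias input)
– $\phi_1[0] = \tanh(1 \cdot \phi_0[0] + 1 \cdot \phi_0[1] - 1)$
– $\phi_1[1] = \tanh(-1 \cdot \phi_0[0] - 1 \cdot \phi_0[1] - 1)$
– $\phi_2[0] = \tanh(1 \cdot \phi_1[0] + 1 \cdot \phi_1[1] + 1)$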
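Swapping sign for tanh keeps the same weights but makes every unit differentiable. A minimal sketch (the function name `xor_net` is hypothetical) confirms that the tanh network still puts each example on the correct side of 0, with O positive and X negative under the same assumed encoding as before.

```python
import math

def xor_net(phi0):
    """The final net: tanh units with the same weights as the sign version."""
    h0 = math.tanh(1 * phi0[0] + 1 * phi0[1] - 1)
    h1 = math.tanh(-1 * phi0[0] - 1 * phi0[1] - 1)
    return math.tanh(1 * h0 + 1 * h1 + 1)

inputs = {"x1": [-1, 1], "x2": [1, 1], "x3": [-1, -1], "x4": [1, -1]}
for name, phi0 in inputs.items():
    print(name, round(xor_net(phi0), 3))
```

The outputs are no longer exactly ±1, but their signs match the sign network's predictions; the smoothness is what makes gradient-based training possible.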
Discussion
• The 2-layer perceptron lets us
– Discover more complex decision boundaries than the perceptron
– Learn combinations of features that are useful for classification
• Key design question
– How many hidden units?
– More hidden units yield more complex functions
– Fewer hidden units require fewer examples to train
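One concrete way to see the trade-off is to count parameters: each hidden unit adds a full weight vector over the inputs. A minimal sketch, assuming one output unit and one bias per unit (the function name is hypothetical); treating parameter count as a proxy for how much data training needs is a rough heuristic, not a rule.

```python
def num_parameters(input_dim, num_hidden, with_biases=True):
    """Count the parameters of a 2-layer network with a single output unit."""
    per_hidden_unit = input_dim + (1 if with_biases else 0)   # weights (+ bias) of one hidden unit
    output_unit = num_hidden + (1 if with_biases else 0)      # output weights (+ bias)
    return num_hidden * per_hidden_unit + output_unit

print(num_parameters(2, 2))      # XOR net: 2 * (2 + 1) + (2 + 1) = 9
print(num_parameters(100, 10))   # 10 * 101 + 11 = 1021
```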
Neural Networks
• Today
– What are Neural Networks?
• Multilayer perceptron
– How to make a prediction given an input?
• Simple matrix operations + non-linearities
– Why are neural networks powerful?
• Universal function approximators!
• Next
– How to train them?