CS 343: Artificial Intelligence
Deep Learning
Prof. Scott Niekum, The University of Texas at Austin
[These slides based on those of Dan Klein, Pieter Abbeel, and Anca Dragan for CS 188 Intro to AI at UC Berkeley. All CS 188 materials are available at http://ai.berkeley.edu.]
Please fill out course evals online!
Review: Linear Classifiers
Feature Vectors

Example (spam filtering): "Hello, do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just …"
# free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, …  →  SPAM or +

Example (digit recognition):
PIXEL-7,12 : 1, PIXEL-7,13 : 0, …, NUM_LOOPS : 1, …  →  "2"
Some (Simplified) Biology

▪ Very loose inspiration: human neurons
Linear Classifiers

▪ Inputs are feature values
▪ Each feature has a weight
▪ Sum is the activation

▪ If the activation is:
  ▪ Positive, output +1
  ▪ Negative, output -1

[Diagram: inputs f1, f2, f3 with weights w1, w2, w3 feeding a sum Σ and a threshold test >0?]
Non-Linearity
Non-Linear Separators

▪ Data that is linearly separable works out great for linear decision rules:
▪ But what are we going to do if the dataset is just too hard?
▪ How about… mapping data to a higher-dimensional space:

[Figure: 1-D data along x is not linearly separable; mapped to (x, x²) it becomes separable]

This and next slide adapted from Ray Mooney, UT
Non-Linear Separators

▪ General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)
Computer Vision
Object Detection
Manual Feature Design
Features and Generalization
[Dalal and Triggs, 2005]
Features and Generalization
Image HoG
Manual Feature Design → Deep Learning

▪ Manual feature design requires:
  ▪ Domain-specific expertise
  ▪ Domain-specific effort

▪ What if we could learn the features, too?

▪ Deep Learning
Perceptron

[Diagram: inputs f1, f2, f3 with weights w1, w2, w3 feeding a sum Σ and a threshold test >0?]
Two-Layer Perceptron Network

[Diagram: features f1, f2, f3 feed three hidden perceptrons (weights w11…w33, each with a sum Σ and a >0? threshold), whose outputs feed a final perceptron with weights w1, w2, w3, a sum Σ, and a >0? threshold]
N-Layer Perceptron Network

[Diagram: features f1, f2, f3 feed repeated layers of perceptron units (Σ, >0?) through N layers to a single output unit]
Performance

[Graph: image-classification benchmark results over time, with AlexNet marked as the turning point. Graph credit Matt Zeiler, Clarifai]
Speech Recognition
graph credit Matt Zeiler, Clarifai
N-Layer Perceptron Network

[Diagram repeated: features f1, f2, f3 feed repeated layers of perceptron units (Σ, >0?) through N layers to a single output unit]
Local Search

▪ Simple, general idea:
  ▪ Start wherever
  ▪ Repeat: move to the best neighboring state
  ▪ If no neighbors better than current, quit
  ▪ Neighbors = small perturbations of w

▪ Properties
  ▪ Plateaus and local optima

How to escape plateaus and find a good local optimum? How to deal with very large parameter vectors? E.g.,
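The loop in the bullets above can be sketched for a one-dimensional w; the toy objective, step size, and iteration cap below are hypothetical stand-ins.

```python
def local_search(objective, w, step=0.1, iters=1000):
    """Greedy local search: repeatedly move to the best neighboring
    state; quit when no neighbor beats the current one."""
    best = objective(w)
    for _ in range(iters):
        neighbors = [w + step, w - step]        # small perturbations of w
        scores = [objective(n) for n in neighbors]
        if max(scores) <= best:                 # no better neighbor: quit
            break
        best = max(scores)
        w = neighbors[scores.index(best)]
    return w, best

# Toy 1-D objective with its maximum at w = 2 (made-up example)
w, score = local_search(lambda w: -(w - 2.0) ** 2, w=0.0)
```

On a plateau every neighbor scores the same as the current state, so this loop stops immediately, which is exactly the problem the slide raises.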
Perceptron

[Diagram: inputs f1, f2, f3 with weights w1, w2, w3 feeding a sum Σ and a threshold test >0?]

▪ Objective: Classification Accuracy
▪ Issue: many plateaus! How to measure incremental progress toward a correct label?
Soft-Max

▪ Score for y = +1: s₊(x; w)   Score for y = −1: s₋(x; w)
▪ Probability of label: P(y = +1 | x; w) = e^{s₊} / (e^{s₊} + e^{s₋})
▪ Objective: maximize ∏ᵢ P(yᵢ | xᵢ; w)
▪ Log: maximize Σᵢ log P(yᵢ | xᵢ; w)
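A minimal sketch of the two-class soft-max and its log objective, assuming the per-label scores are computed elsewhere; `prob_plus` and the score names are made up for illustration.

```python
import math

def prob_plus(score_pos, score_neg):
    """Soft-max probability of the label y = +1 given the two scores:
    P(y = +1) = e^{s+} / (e^{s+} + e^{s-})."""
    e_pos, e_neg = math.exp(score_pos), math.exp(score_neg)
    return e_pos / (e_pos + e_neg)

def log_likelihood(examples):
    """Log objective: sum over examples of log P(correct label),
    i.e. the log of the product of per-example probabilities."""
    total = 0.0
    for score_pos, score_neg, y in examples:
        p = prob_plus(score_pos, score_neg)
        total += math.log(p if y == +1 else 1.0 - p)
    return total
```

Unlike raw classification accuracy, this objective changes smoothly as a score moves toward the correct label, even before the prediction flips.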
Two-Layer Neural Network

[Diagram: features f1, f2, f3 feed three hidden units (weights w11…w33, each with a sum Σ and a >0? threshold), whose outputs feed a final unit with weights w1, w2, w3 and a sum Σ but no hard threshold]
N-Layer Neural Network

[Diagram: features f1, f2, f3 feed repeated layers of units (Σ, >0?) through N layers; the final output unit has a sum Σ but no hard threshold]
Our Status

▪ Our objective
  ▪ Changes smoothly with changes in w
  ▪ Doesn't suffer from the same plateaus as the perceptron network

▪ Challenge: how to find a good w? (maximize the log-likelihood)
▪ Equivalently: minimize the negative log-likelihood
1-D Optimization

▪ Could evaluate g(w0 + h) and g(w0 − h)
  ▪ Then step in the best direction
▪ Or, evaluate the derivative: dg(w0)/dw = lim_{h→0} [g(w0 + h) − g(w0 − h)] / (2h)
  ▪ Tells which direction to step in
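The derivative-based option can be sketched with a central finite difference; `derivative` and the toy g below are hypothetical names.

```python
def derivative(g, w, h=1e-5):
    """Central finite-difference estimate of dg/dw at w:
    (g(w + h) - g(w - h)) / (2h)."""
    return (g(w + h) - g(w - h)) / (2 * h)

# For g(w) = w^2 the derivative at w = 3 is 6 (made-up toy g)
slope = derivative(lambda w: w * w, 3.0)
```

The sign of `slope` tells which direction to step in; its magnitude says how steep the function is there.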
2-D Optimization
Source: Thomas Jungblut’s Blog
Steepest Descent

▪ Idea:
  ▪ Start somewhere
  ▪ Repeat: take a step in the steepest descent direction

Figure source: Mathworks
What is the Steepest Descent Direction?
What is the Steepest Descent Direction?

▪ Steepest direction = direction of the gradient
Optimization Procedure 1: Gradient Descent

▪ Init: w randomly
▪ For i = 1, 2, …: w ← w − α ∇g(w)

▪ α: learning rate, a tweaking parameter that needs to be chosen carefully
▪ How? Try multiple choices
  ▪ Crude rule of thumb: each update should change w by about 0.1–1%
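A minimal gradient-descent loop, assuming the gradient of g is available as a function; the quadratic example and learning rate below are made up.

```python
def gradient_descent(grad, w, alpha=0.1, iters=100):
    """Repeat w <- w - alpha * grad(w); alpha is the learning rate."""
    for _ in range(iters):
        w = w - alpha * grad(w)
    return w

# Minimize g(w) = (w - 4)^2, whose gradient is 2 * (w - 4)
w_min = gradient_descent(lambda w: 2 * (w - 4.0), w=0.0)
```

Too small an alpha makes progress painfully slow; too large an alpha overshoots and can diverge, which is why the slide calls it a parameter that must be chosen carefully.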
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 6, 25 Jan 2016
Suppose the loss function is steep vertically but shallow horizontally:
Q: What is the trajectory along which we converge towards the minimum with gradient descent?
A: Very slow progress along the flat direction, jitter along the steep one.
Optimization Procedure 2: Momentum

▪ Gradient Descent
  ▪ Init: w randomly
  ▪ For i = 1, 2, …: w ← w − α ∇g(w)

▪ Momentum
  ▪ Init: w randomly, v ← 0
  ▪ For i = 1, 2, …: v ← μ v − α ∇g(w), then w ← w + v

- Physical interpretation as a ball rolling down the loss function, plus friction (mu coefficient).
- mu = usually ~0.5, 0.9, or 0.99 (sometimes annealed over time, e.g. from 0.5 -> 0.99)
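The momentum update can be sketched on the same kind of toy quadratic; the mu and alpha values below are illustrative choices, not tuned settings.

```python
def momentum(grad, w, alpha=0.1, mu=0.9, iters=300):
    """Momentum update: v accumulates a velocity (the ball rolling
    with friction coefficient mu); w then moves by v."""
    v = 0.0
    for _ in range(iters):
        v = mu * v - alpha * grad(w)
        w = w + v
    return w

# Toy quadratic: minimum of g(w) = (w - 4)^2 is at w = 4
w_min = momentum(lambda w: 2 * (w - 4.0), w=0.0)
```

Because v averages recent gradients, momentum damps the jitter along steep directions and builds up speed along flat ones.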
Suppose the loss function is steep vertically but shallow horizontally:
Q: What is the trajectory along which we converge towards the minimum with Momentum?
How do we actually compute the gradient w.r.t. the weights?

Backpropagation!
Backpropagation Learning

15-486/782: Artificial Neural Networks, David S. Touretzky, Fall 2006
LMS / Widrow-Hoff Rule

Works fine for a single layer of trainable weights. What about multi-layer networks?

[Diagram: inputs x_i with weights w_i feeding a sum Σ and output y]

Δw_i = −η (y − d) x_i
With Linear Units, Multiple Layers Don't Add Anything

U: 2×3 matrix, V: 3×4 matrix

y = U(Vx) = (UV)x, where UV is a single 2×4 matrix

Linear operators are closed under composition: equivalent to a single layer of weights W = UV.
But with non-linear units, extra layers add computational power.
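The collapse of stacked linear layers can be checked numerically; the particular U, V, and x below are made-up small examples.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# U: 2x3, V: 3x4, x: length-4 input (made-up values)
U = [[1, 0, 2], [0, 1, 1]]
V = [[1, 2, 0, 1], [0, 1, 1, 0], [2, 0, 1, 1]]
x = [1, 2, 3, 4]

two_layers = matvec(U, matvec(V, x))   # y = U (V x), two linear layers
W = matmul(U, V)                       # collapse: W = U V, a single 2x4 matrix
one_layer = matvec(W, x)               # y = W x gives the identical output
```

Inserting any non-linearity between the layers (e.g. tanh of each component of Vx) breaks this equivalence, which is why the extra layers then add power.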
What Can be Done with Non-Linear (e.g., Threshold) Units?

1 layer of trainable weights: separating hyperplane
2 layers of trainable weights: convex polygon region
3 layers of trainable weights: composition of polygons, non-convex regions
How Do We Train A Multi-Layer Network?

Error at the output = d − y
Error at the hidden layer = ???

Can't use the perceptron training algorithm because we don't know the 'correct' outputs for hidden units.
How Do We Train A Multi-Layer Network?

Define sum-squared error:

E = (1/2) Σ_p (d_p − y_p)²

Use gradient descent error minimization:

Δw_ij = −η ∂E/∂w_ij

Works if the nonlinear transfer function is differentiable.
Deriving the LMS or "Delta" Rule As Gradient Descent Learning

y = Σ_i w_i x_i
E = (1/2) Σ_p (d_p − y_p)²
dE/dy = y − d
∂E/∂w_i = (dE/dy) · (∂y/∂w_i) = (y − d) x_i
Δw_i = −η ∂E/∂w_i = −η (y − d) x_i

How do we extend this to two layers?
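The delta rule can be sketched as an online training loop on a linear unit; the toy dataset (constructed to be consistent with target weights (2, -1)) and the learning rate are hypothetical.

```python
def delta_rule(data, w, eta=0.1, epochs=200):
    """Online LMS / delta rule for a linear unit:
    after each example, w_i <- w_i - eta * (y - d) * x_i."""
    for _ in range(epochs):
        for x, d in data:
            y = sum(wi * xi for wi, xi in zip(w, x))   # linear unit output
            w = [wi - eta * (y - d) * xi for wi, xi in zip(w, x)]
    return w

# Made-up dataset generated from the target weights (2, -1)
data = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
w = delta_rule(data, w=[0.0, 0.0])
```

Each update is a small gradient step on that example's squared error, so on consistent data the weights settle onto an exact solution.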
Switch to Smooth Nonlinear Units

net_j = Σ_i w_ij y_i
y_j = g(net_j)

Common choices for g (g must be differentiable):
g(x) = 1 / (1 + e^{−x}),   g′(x) = g(x) · (1 − g(x))
g(x) = tanh(x),   g′(x) = 1 / cosh²(x)
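Both transfer functions and their stated derivatives can be checked against a central finite difference; the test point x = 0.7 is arbitrary.

```python
import math

def sigmoid(x):
    """Logistic sigmoid g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """g'(x) = g(x) * (1 - g(x)) for the logistic sigmoid."""
    g = sigmoid(x)
    return g * (1.0 - g)

def tanh_prime(x):
    """g'(x) = 1 / cosh(x)^2 for tanh."""
    return 1.0 / math.cosh(x) ** 2

# Compare each stated derivative to a central finite difference at x = 0.7
h = 1e-6
num_sig = (sigmoid(0.7 + h) - sigmoid(0.7 - h)) / (2 * h)
num_tanh = (math.tanh(0.7 + h) - math.tanh(0.7 - h)) / (2 * h)
```

These closed-form derivatives are what make the chain-rule computation in the following slides cheap: g′ reuses values already computed in the forward pass.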
Gradient Descent with Nonlinear Units

y = g(net) = tanh(Σ_i w_i x_i)

dE/dy = y − d,   dy/dnet = 1 / cosh²(net),   ∂net/∂w_i = x_i

∂E/∂w_i = (dE/dy) · (dy/dnet) · (∂net/∂w_i) = (y − d) / cosh²(Σ_i w_i x_i) · x_i

[Diagram: a single unit computing y = tanh(Σ_i w_i x_i)]
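The single-unit gradient just derived can be coded directly and verified numerically; the weights, inputs, and target below are made up.

```python
import math

def grad_tanh_unit(w, x, d):
    """dE/dw_i = (y - d) / cosh(net)^2 * x_i  for a single unit with
    y = tanh(net), net = sum_i w_i x_i, and E = (y - d)^2 / 2."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    y = math.tanh(net)
    return [(y - d) / math.cosh(net) ** 2 * xi for xi in x]

# Made-up weights, inputs, and target
w, x, d = [0.5, -0.3], [1.0, 2.0], 0.2
grad = grad_tanh_unit(w, x, d)
```

A weight update would then subtract eta times each component, exactly as in the delta rule for the linear unit.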
Now We Can Use The Chain Rule

[Network: units y_i, weights w_ij, hidden units y_j, weights w_jk, output units y_k]

∂E/∂y_k = y_k − d_k
δ_k = ∂E/∂net_k = (y_k − d_k) · g′(net_k)
∂E/∂w_jk = (∂E/∂net_k) · (∂net_k/∂w_jk) = (∂E/∂net_k) · y_j
∂E/∂y_j = Σ_k (∂E/∂net_k) · (∂net_k/∂y_j)
δ_j = ∂E/∂net_j = (∂E/∂y_j) · g′(net_j)
∂E/∂w_ij = (∂E/∂net_j) · y_i
Weight Updates

∂E/∂w_jk = (∂E/∂net_k) · (∂net_k/∂w_jk) = δ_k · y_j
∂E/∂w_ij = (∂E/∂net_j) · (∂net_j/∂w_ij) = δ_j · y_i

Δw_jk = −η ∂E/∂w_jk
Δw_ij = −η ∂E/∂w_ij
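The delta formulas can be sketched for a small two-layer network; the weights and data below are hypothetical, and the output unit is taken as linear (so g′ = 1 at the output) to keep the sketch short.

```python
import math

def forward(x, W1, W2):
    """Two trainable layers: tanh hidden units, linear output unit."""
    net_j = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    y_j = [math.tanh(n) for n in net_j]
    y_k = sum(w * yj for w, yj in zip(W2, y_j))
    return net_j, y_j, y_k

def gradients(x, d, W1, W2):
    """Backprop for E = (y_k - d)^2 / 2:
    delta_k = y_k - d                      (linear output, g' = 1)
    delta_j = delta_k * w_jk * g'(net_j),  g'(net) = 1 / cosh(net)^2
    dE/dw_jk = delta_k * y_j,  dE/dw_ij = delta_j * x_i."""
    net_j, y_j, y_k = forward(x, W1, W2)
    delta_k = y_k - d
    delta_j = [delta_k * w / math.cosh(n) ** 2 for w, n in zip(W2, net_j)]
    dW2 = [delta_k * yj for yj in y_j]
    dW1 = [[dj * xi for xi in x] for dj in delta_j]
    return dW1, dW2

# Made-up input, target, and weights (W1[j][i]: input i -> hidden j)
x, d = [0.5, -1.0], 0.3
W1 = [[0.2, -0.4], [0.7, 0.1]]
W2 = [0.5, -0.3]
dW1, dW2 = gradients(x, d, W1, W2)
```

A training step then subtracts eta times each gradient entry from the corresponding weight; note the deltas flow backward, each layer reusing the delta of the layer above it.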
Deep learning is everywhere
[Krizhevsky 2012]
Classification, Retrieval
Detection [Faster R-CNN: Ren, He, Girshick, Sun 2015]
Segmentation [Farabet et al., 2012]
Deep learning is everywhere
NVIDIA Tegra X1
self-driving cars
Deep learning is everywhere
[Toshev, Szegedy 2014]
[Mnih 2013]
Deep learning is everywhere
[Ciresan et al. 2013] [Sermanet et al. 2011] [Ciresan et al.]
Deep learning is everywhere
Image Captioning
[Vinyals et al., 2015]