web.mit.edu/9.s915/www/classes/class01_all.pdf
TRANSCRIPT
!"#$%&'#(&#$)(*$+"#$,(-'(##.'(-$/0$
1(+#22'-#(&#
Shimon Ullman + Tomaso Poggio(with Owen Ardron Lewis)
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA 02139 USA
9.S915 in Fall 2011
Friday, September 9, 2011
Overview
Context for this course: the Intelligence Initiative
The problem of intelligence is one of the great problems in science, probably the greatest.
Research on intelligence by neuroscience and computer science:
• a great intellectual mission
• will help cure mental diseases and develop more intelligent artifacts
• will improve the mechanisms for collective decisions
These advances will be critical to our society's future prosperity, education, health, and security.
The problem of intelligence: how it arises in the brain and how to replicate it in machines
The Marketplace for Intelligence: a glimpse into the marketplace for the science and the engineering of intelligence
The new golden age: trying (again)
to understand and replicate Intelligence
• The first and last attempt in history was ~ fifty years ago -- at the beginning of Artificial Intelligence (Dartmouth workshop, 1956)
• It is time to try again because
– we have seen in the last decade and we are going to see in the next 5 years several intelligent apps from the first wave of AI+ML research: search + vision + speech + language + Go...
– ...but these are not intelligent machines and this is not enough to understand intelligence...
– ...we need to integrate computation with neuroscience and cognition for the next jump in understanding
• Thus, the next wave of research on Integrated Intelligence...
Syllabus
Since the introduction of learning techniques in AI+vision, there have been significant (and not well known) advances in a few domains:
• Games (Checkers, Chess, Go, Poker, soccer???...)
• Search (Google,...)
• Speech recognition (Nuance,...)
• Natural Language/Knowledge retrieval (Watson and Jeopardy,..)
• Vision (Kinect and see later)
Real successes in machine learning (and AI)
Integration is fundamental
• Cognition, brain science, computation, biology
• Perception, action, language, reasoning, social
• Integration of innate structures with learning
Integration is early
Language and perception
Direction of gaze
Perception and planning
Action recognition
Rules of the game:
• active participation in class
• survey at http://www.surveymonkey.com/s/FVK6NGD about
• whiteboard presentations in I2 camps? or
• Wikipedia articles? or ....
Slides on the Web site http://web.mit.edu/9.s915/www/
Staff mailing list is [email protected]
Student list will be [email protected]; please fill the form!
Send email to us if you want to be added to the mailing list.
Class: http://web.mit.edu/9.s915/www/
!"#$%&'#(&#$)(*$+"#$,(-'(##.'(-$/0$
1(+#22'-#(&#
Shimon Ullman + Tomaso Poggio(with Owen Ardron Lewis)
Department of Brain & Cognitive SciencesMassachusetts Institute of Technology
Cambridge, MA 02139 USA
9.S915 in Fall 2011
Friday, September 9, 2011
Class 1: Learning
1. Statistical Learning Theory: an informal introduction
2. The framework of statistical learning
3. Conditions, principles and algorithms for learning (necessary and sufficient conditions, sparsity, smoothness, etc.)
4. Learning and representation in the Brain
• Beyond binary classification: taxonomy and structures
• Applications: image and video classification
LEARNING THEORY + ALGORITHMS
COMPUTATIONAL NEUROSCIENCE: models + experiments
How visual cortex works
Theorems on foundations of learning
Predictive algorithms
Sung & Poggio 1995, also Kanade& Baluja....
!"#$%&'()'"*&%&+(,-.-)/(012(3'"*4("+5
Friday, September 9, 2011
Face detection is now available in digital cameras (commercial systems)
!"#$%&'()'"*&%&+(,-.-)/(012(3'"*4("+5
Friday, September 9, 2011
LEARNING THEORY+
ALGORITHMS
COMPUTATIONAL NEUROSCIENCE:
models+experimentsHow visual cortex works
Theorems on foundations of learning
Predictive algorithms
Papageorgiou&Poggio, 1997, 2000 also Kanade&Scheiderman
!"#$%&'()'"*&%&+(,-.-)/(067(3'"*4("+5
Friday, September 9, 2011
LEARNING THEORY+
ALGORITHMS
COMPUTATIONAL NEUROSCIENCE:
models+experimentsHow visual cortex works
Theorems on foundations of learning
Predictive algorithms
Papageorgiou&Poggio, 1997, 2000 also Kanade&Scheiderman
!"#$%&'()'"*&%&+(,-.-)/(067(3'"*4("+5
Friday, September 9, 2011
LEARNING THEORY+
ALGORITHMS
COMPUTATIONAL NEUROSCIENCE:
models+experimentsHow visual cortex works
Theorems on foundations of learning
Predictive algorithms
Papageorgiou&Poggio, 1997, 2000 also Kanade&Scheiderman
!"#$%&'()'"*&%&+(,-.-)/(067(3'"*4("+5
Friday, September 9, 2011
Friday, September 9, 2011
Friday, September 9, 2011
Pedestrian and car detection are also “solved” (commercial systems, MobilEye)
!"#$%&'()'"*&%&+(
Friday, September 9, 2011
Friday, September 9, 2011
LEARNING THEORY+
ALGORITHMS
COMPUTATIONAL NEUROSCIENCE:
models+experimentsHow visual cortex works
Theorems on foundations of learning
Predictive algorithms
Pedestrian and car detection are also “solved” (commercial systems, MobilEye)
!"#$%&'()'"*&%&+(
Friday, September 9, 2011
Friday, September 9, 2011
LEARNING THEORY+
ALGORITHMS
COMPUTATIONAL NEUROSCIENCE:
models+experimentsHow visual cortex works
Theorems on foundations of learning
Predictive algorithms
Pedestrian and car detection are also “solved” (commercial systems)
!"#$%&'()'"*&%&+(
Friday, September 9, 2011
Friday, September 9, 2011
http://www.volvocars.com/us/all-cars/volvo-s60/pages/5-things.aspx?p=5
!"#"$%&'(##"''"'&)$&*+&,$-&./0{(%"4&5)')1$
Friday, September 9, 2011
Since the introduction of learning/statistics in the ‘90s, computer vision has made significant (and not well known) advances in a few problem areas:
• Face identification under controlled conditions is “solved” (commercial systems) at ~ human level performance
• Face detection is available in cheap digital cameras (commercial systems)
• Pedestrian and car detection are also “solved” (commercial systems)
!",6&'(##"''"'&)$&*+0{(%"4&5)')1$
Friday, September 9, 2011
INPUT → f → OUTPUT
Given a set of ℓ examples (data) (x_1, y_1), ..., (x_ℓ, y_ℓ):
Question: find a function f such that f(x) is a good predictor of y for a future input x (fitting the data is not enough!)
Statistical Learning Theory: supervised learning
[Figure: data from f, an approximation of f, and the function f itself, plotted as y vs x]
Generalization: estimating the value of the function where there are no data (good generalization means predicting the function well; what matters is that the empirical or validation error be a good proxy for the prediction error)
Statistical Learning Theory: prediction, not curve fitting
!"#"$%"$&#'()*#+,$,-(./*0+124#+"(09(:#$,%"+*#:(:#"/(,0"(;3%"(%"#"$%"$&%
<=#'$#,"7(=#4,$>7(!:#'*7(?*50+*@@@A
Friday, September 9, 2011
Mukherjee, Niyogi, Poggio, Rifkin, Nature, March 2004
An algorithm is stable if the removal of any one training sample from any large set of samples results almost always in a small change in the learned function.
For ERM the following theorem holds
ERM on H generalizes if and only if the hypothesis space H is uGC and if and only if ERM on H is CVloo stable
!"#$%&'"$()"*)+,&'('-&.)/0&1$2$3)450"167810%2-'92,6:);--&<)1&="1:)(,&>2.2,6
Friday, September 9, 2011
It is possible to show that under quite general assumptions
generalization and well-posedness are equivalent, eg one implies the other.
Stability implies generalization:
a stable solution is predictive and (for ERM) also viceversa.
ERM ERM
H is u GC
ERM
CVloo
stability
For generalsymmetric algorithms
GeneralizationGeneralization+ consistency
Eloo+EEloo stability
ERM
Mukherjee, S., P. Niyogi, T. Poggio and R. Rifkin. Learning Theory: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization, Advances in Computational Mathematics, 25, 161-193, 2006; see also Nature, 2005
Conditions for generalization in learning theory
have deep, almost philosophical, implications:
they can be regarded as equivalent conditions that guarantee a
theory to be predictive (that is scientific)
‣ theory must be chosen from a small set
‣ theory should not change much with new data...most of the time
!"#"$%"$&#'()*#+,$,-(./*0+12"/*0+*:%(*B"*,6$,-(903,6#"$0,%(09(
'*#+,$,-("/*0+1(
Friday, September 9, 2011
Equation includes splines, Radial Basis Functions and SVMs (depending on choice of V).
implies
For a review, see Poggio and Smale, 2003; see also Schoelkopf and Smola, 2002; Bousquet, O., S. Boucheron and G. Lugosi; Cucker and Smale; Zhou and Smale...
!"#"$%"$&#'()*#+,$,-(./*0+12C*+,*'(D#&/$,*%(*-(E*-3'#+$F#"$0,($,(
ECG!(
Friday, September 9, 2011
The regularization functional has a Bayesian interpretation: the data term is a model of the noise and the stabilizer is a prior on the hypothesis space of functions f. That is, Bayes rule leads to the same regularized minimization.
Statistical Learning Theory: classical algorithms: Regularization
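A minimal sketch of that Bayesian reading, assuming a Gaussian noise model with variance σ² and a Gaussian (RKHS-norm) prior on f; the exact constants are illustrative, not from the slides:

```latex
% MAP estimation with a Gaussian likelihood (data term = noise model)
% and an RKHS-norm prior (the stabilizer) recovers Tikhonov regularization.
P(f \mid S_n) \propto P(S_n \mid f)\, P(f), \quad
P(S_n \mid f) \propto \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - f(x_i))^2\Big), \quad
P(f) \propto \exp\!\big(-\|f\|_H^2\big).
% Taking -log and minimizing over f gives the MAP estimate:
f_{\mathrm{MAP}} = \arg\min_{f \in H}\; \frac{1}{n}\sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_H^2,
\qquad \lambda \propto \frac{\sigma^2}{n},
% i.e. the Tikhonov regularization functional discussed later in this lecture.
```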
The regularization functional implies a solution of the form f(x) = Σ_i c_i K(x, x_i) (see the representer theorem below).
Classical learning algorithms: Kernel Machines (e.g. Regularization in RKHS)
Remark (for later use): kernel machines correspond to shallow networks (the inputs x_1 ... x_ℓ feed a single layer of kernel units whose weighted sum gives f).
Statistical Learning Theory: Regularization
Two connected and overlapping strands in learning theory:
‣ Bayes, hierarchical models, graphical models…
‣ Statistical learning theory, regularization (closer to classical math: functional analysis + probability theory + empirical process theory…)
Statistical Learning Theory: note
FOUNDATIONS OF COMPUTATIONAL AND STATISTICAL LEARNING
Tomaso Poggio and Lorenzo Rosasco
tp, [email protected]
September 9, 2011
COMPUTATIONAL LEARNING
STATISTICAL LEARNING THEORY
Learning is viewed as an inference problem from possibly small samples of high dimensional, noisy data.
LEARNING TASKS AND MODELS
Supervised, Semisupervised, Unsupervised, Online, Transductive, Active, Variable Selection, Reinforcement .....
One can consider the data to be created in a deterministic, stochastic, or even adversarial way.
WHERE TO START?
STATISTICAL AND SUPERVISED LEARNING
Statistical models are essential to deal with noise, sampling and other sources of uncertainty. Supervised learning is by far the most understood class of problems.
REGULARIZATION
Regularization provides a fundamental framework to solve learning problems and design learning algorithms. We present a set of tools and techniques which are at the core of a multitude of different ideas and developments, beyond supervised learning.
PLAN
Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms
PROBLEM AT A GLANCE
Given a training set of input-output pairs
Sn = (x1, y1), . . . , (xn, yn)
find f_S such that f_S(x) ∼ y.
E.g. the x's are vectors and the y's are discrete labels in classification and real values in regression.
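A tiny, self-contained illustration of this setup (not from the slides; the data and the 1-nearest-neighbor rule are just placeholders): a training set S_n of input-output pairs and an estimator f_S with f_S(x) ≈ y on new inputs.

```python
import numpy as np

# Hypothetical toy data: the label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))                # inputs x_i (vectors)
y_train = np.sign(X_train[:, 0])                  # outputs y_i (labels in {-1, +1})

def f_S(x, X=X_train, y=y_train):
    """A simple learned predictor: 1-nearest-neighbor on the training set S_n."""
    i = np.argmin(np.linalg.norm(X - x, axis=1))  # index of the closest training input
    return y[i]

x_new = np.array([0.3, -1.2])
print(f_S(x_new))                                 # prediction for a future input x
```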
LEARNING IS INFERENCE
For the above problem to make sense we need to assume input and output to be related!
STATISTICAL AND SUPERVISED LEARNING
Each input-output pair is a sample from a fixed but unknown distribution p(x, y). Under general conditions we can write
p(x, y) = p(y|x) p(x).
NOISE ...
[Figure: the conditional distribution p(y|x) over Y for a given x]
The same x can generate different y (according to p(y|x)):
the underlying process is deterministic, but there is noise in the measurement of y;
the underlying process is not deterministic;
the underlying process is deterministic, but only incomplete information is available.
...AND SAMPLING
[Figure: samples x drawn from the marginal distribution p(x)]
EVEN IN A NOISE FREE CASE WE HAVE TO DEAL WITH SAMPLING
The marginal distribution p(x) might model:
errors in the location of the input points;
discretization error for a given grid;
presence or absence of certain input instances.
LEARNING, GENERALIZATION AND OVERFITTING
PREDICTIVITY OR GENERALIZATION
Given the data, the goal is to learn how to make decisions/predictions about future data, i.e. data not belonging to the training set.
The problem is: avoid overfitting!
OVER-FITTING ILLUSTRATED
Among many possible solutions, how can we choose one that correctly applies to previously unseen data?
LOSS FUNCTIONS
As we look for a deterministic estimator in a stochastic environment, we have to expect to incur errors.
LOSS FUNCTION
A loss function V on R × Y determines the price V(f(x), y) we pay when predicting f(x) and the true output is y.
LOSS FUNCTIONS FOR REGRESSION
The most common is the square loss or L2 loss: V(f(x), y) = (f(x) − y)²
Absolute value or L1 loss: V(f(x), y) = |f(x) − y|
Vapnik's ε-insensitive loss: V(f(x), y) = (|f(x) − y| − ε)₊
LOSS FUNCTIONS FOR (BINARY) CLASSIFICATION
The most intuitive one, the 0-1 loss: V(f(x), y) = θ(−y f(x))  (θ is the step function)
The more tractable hinge loss: V(f(x), y) = (1 − y f(x))₊
And again the square loss or L2 loss: V(f(x), y) = (1 − y f(x))²
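The regression and classification losses above, written out as code (a sketch; f_x stands for the prediction f(x)):

```python
import numpy as np

# Regression losses
def square_loss(f_x, y):          return (f_x - y) ** 2                          # L2
def absolute_loss(f_x, y):        return np.abs(f_x - y)                         # L1
def eps_insensitive(f_x, y, eps): return np.maximum(np.abs(f_x - y) - eps, 0.0)  # Vapnik

# Binary classification losses (labels y in {-1, +1})
def zero_one_loss(f_x, y):        return (y * f_x <= 0).astype(float)            # theta(-y f(x))
def hinge_loss(f_x, y):           return np.maximum(1.0 - y * f_x, 0.0)
def square_loss_cls(f_x, y):      return (1.0 - y * f_x) ** 2

print(hinge_loss(np.array([0.3, -2.0]), np.array([1, 1])))   # -> [0.7 3.0]
```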
EXPECTED RISK
A good function should incur only a few errors. We need a way to quantify this idea.
EXPECTED RISK
The quantity
I[f] = ∫_{X×Y} V(f(x), y) p(x, y) dx dy
is called the expected error (or expected risk) and measures the loss averaged over the unknown distribution.
A good function should have small expected risk.
TARGET FUNCTION
The expected risk is usually defined on some large space F, possibly dependent on p(x, y). The best possible error is
inf_{f∈F} I[f].
The infimum is often achieved at a minimizer f* that we call the target function.
LEARNING ALGORITHMS AND GENERALIZATION
A learning algorithm can be seen as a map
Sn → fn
from the training set to a set of candidate functions.
Ideally we would like I[f_n] ∼ I[f*].
REMINDER
CONVERGENCE IN PROBABILITY
Let {X_n} be a sequence of bounded random variables. Then lim_{n→∞} X_n = X in probability if
for all ε > 0, lim_{n→∞} P{|X_n − X| ≥ ε} = 0.
CONVERGENCE IN EXPECTATION
Let {X_n} be a sequence of bounded random variables. Then lim_{n→∞} X_n = X in expectation if
lim_{n→∞} E(|X_n − X|) = 0.
CONSISTENCY AND UNIVERSAL CONSISTENCY
A basic requirement is for the algorithm to get better as we get moredata...
CONSISTENCY
We say that an algorithm is consistent if
for all ε > 0, lim_{n→∞} P{I[f_n] − I[f*] ≥ ε} = 0.
UNIVERSAL CONSISTENCY
We say that an algorithm is universally consistent if the above holds for every probability distribution p.
SAMPLE COMPLEXITY AND LEARNING RATES
The above requirements are asymptotic.
ERROR RATES
A more practical question is: how fast does the error decay? This can be expressed as
P{I[f_n] − I[f*] ≤ ε(n, δ)} ≥ 1 − δ.
SAMPLE COMPLEXITY
Or equivalently: how many points do we need to achieve an error ε with a prescribed confidence δ? This can be expressed as
P{I[f_n] − I[f*] ≤ ε} ≥ 1 − δ,  for n = n(ε, δ).
EMPIRICAL RISK AND GENERALIZATION
How do we design learning algorithms? A first idea is to look at the data...
EMPIRICAL RISK
In practice we can try to replace the expected risk by the empirical risk
I_n[f] = (1/n) Σ_{i=1}^{n} V(f(x_i), y_i).
GENERALIZATION ERROR
The effectiveness of such an approximation is captured by the generalization error:
P{|I[f_n] − I_n[f_n]| ≤ ε} ≥ 1 − δ,  for n = n(ε, δ).
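A direct transcription of the empirical risk I_n[f], together with a held-out estimate used as a proxy for the expected risk (the toy data and the degree-3 polynomial fit are assumptions for illustration):

```python
import numpy as np

def empirical_risk(f, X, y, V):
    """I_n[f] = (1/n) * sum_i V(f(x_i), y_i)."""
    preds = np.array([f(x) for x in X])
    return np.mean(V(preds, y))

square = lambda p, t: (p - t) ** 2

# Toy 1-D regression: fit on a small sample, compare training risk to a held-out proxy.
rng = np.random.default_rng(1)
X_tr, X_te = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 1000)
y_tr = np.sin(3 * X_tr) + 0.1 * rng.normal(size=30)
y_te = np.sin(3 * X_te)

f_n = np.poly1d(np.polyfit(X_tr, y_tr, deg=3))   # a simple learned f_n
print(empirical_risk(f_n, X_tr, y_tr, square))   # I_n[f_n], the empirical risk
print(empirical_risk(f_n, X_te, y_te, square))   # held-out proxy for I[f_n]
```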
SOME (THEORETICAL AND PRACTICAL) QUESTIONS
How do we go from data to an actual algorithm?
Is minimizing error on the data a good idea?
Are there fundamental limitations in what we can and cannot learn?
PLAN
Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms
NO FREE LUNCH THEOREM
UNIVERSAL CONSISTENCY
Can we learn any problem consistently? Or, equivalently, do universally consistent algorithms exist?
YES! Nearest neighbors, histogram rules, SVMs with (so-called) universal kernels...
NO FREE LUNCH THEOREM
Given a number of points (and a confidence), can we always achieve a prescribed error?
NO! (Devroye et al.)
The last statement can be interpreted as follows: inference from finite samples can be effectively performed if and only if the problem satisfies some a priori condition.
HYPOTHESES SPACE
In statistical learning a first prior assumption amounts to choosing asuitable space of hypotheses H.
Examples: linear functions, polynomials, RBFs, Sobolev spaces...
A learning algorithm A is then a map from the data space to H,
A(S_n) = f_n ∈ H
EMPIRICAL RISK MINIMIZATION
How do we choose H? How do we design A?
ERM
A prototype algorithm in statistical learning theory is Empirical Risk Minimization:
min_{f∈H} I_n[f].
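A minimal ERM sketch (an assumption for illustration: H taken to be polynomials of a fixed degree, fit by least squares): the chosen f minimizes the empirical risk over H, and increasing the capacity of H shows the overfitting discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, 15)
y = np.sin(3 * X) + 0.2 * rng.normal(size=15)
X_test = np.linspace(-1, 1, 500)
y_test = np.sin(3 * X_test)

for degree in (1, 3, 12):                       # H = polynomials of the given degree
    f_n = np.poly1d(np.polyfit(X, y, degree))   # ERM: least squares minimizes I_n[f] over H
    train_err = np.mean((f_n(X) - y) ** 2)
    test_err = np.mean((f_n(X_test) - y_test) ** 2)
    print(f"degree {degree:2d}: empirical risk {train_err:.3f}, test risk {test_err:.3f}")
```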
KEY THEOREM(S) ILLUSTRATED
KEY THEOREM(S)
UNIFORM GLIVENKO-CANTELLI CLASSES
We say that H is a uniform Glivenko-Cantelli (uGC) class, if for all p,
for all ε > 0, lim_{n→∞} P{ sup_{f∈H} |I[f] − I_n[f]| > ε } = 0.
A necessary and sufficient condition for consistency of ERM is that H is uGC.
See: [Vapnik and Cervonenkis (71), Alon et al (97), Dudley, Gine, and Zinn (91)].
In turn, the uGC property is equivalent to requiring H to have finite capacity: the Vγ dimension in general and the VC dimension in classification.
STABILITY
z = (x, y),   S = z_1, ..., z_n,   S^i = z_1, ..., z_{i−1}, z_{i+1}, ..., z_n
CV STABILITY
A learning algorithm A is CVloo stable if for each n there exist β_CV^(n) and δ_CV^(n) such that, for all p,
P{ |V(f_{S^i}, z_i) − V(f_S, z_i)| ≤ β_CV^(n) } ≥ 1 − δ_CV^(n),
with β_CV^(n) and δ_CV^(n) going to zero for n → ∞.
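A hedged numerical sketch of the CVloo quantity: for a given algorithm A (here regularized linear least squares on synthetic data, an assumption), compare the loss at z_i of the functions trained with and without z_i; the maximum over i is an empirical stand-in for β_CV^(n).

```python
import numpy as np

def rls_fit(X, y, lam=0.1):
    """Regularized linear least squares; stands in for a generic algorithm A."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)
    return lambda x: x @ w

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)

f_S = rls_fit(X, y)
betas = []
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    f_Si = rls_fit(X[keep], y[keep])            # trained on S^i, i.e. with z_i removed
    betas.append(abs((f_Si(X[i]) - y[i]) ** 2 - (f_S(X[i]) - y[i]) ** 2))
print(max(betas))                               # empirical stand-in for beta_CV^(n)
```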
ERM AND ILL-POSEDNESS
Ill-posed problems often arise if one tries to infer general laws from few data:
the hypothesis space is too large;
there are not enough data.
In general ERM leads to ill-posed solutions because
the solution may be too complex;
it may be not unique;
it may change radically when leaving one sample out.
REMINDER
ILL POSED PROBLEM
A mathematical problem is well posed in the sense of Hadamard if
the solution exists;
the solution is unique;
the solution depends continuously on the data.
If a problem is not well posed it is called ill posed.
Regularization is the classical way to restore well posedness (andensure generalization).
BAYES INTERPRETATION
PLAN
Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms
HYPOTHESES SPACE
We are going to look at hypotheses spaces which are reproducing kernel Hilbert spaces (RKHS).
RKHS are Hilbert spaces of point-wise defined functions. They can be defined via a reproducing kernel, which is a symmetric positive definite function. Functions in the space are (the completion of) linear combinations
f(x) = Σ_{i=1}^{p} K(x, x_i) c_i.
The norm in the space is a natural measure of complexity:
‖f‖²_H = Σ_{j,i=1}^{p} K(x_j, x_i) c_i c_j.
EXAMPLES OF PD KERNELS
Very common examples of symmetric pd kernels are
• Linear kernel: K(x, x′) = x · x′
• Gaussian kernel: K(x, x′) = exp(−‖x − x′‖² / σ²),  σ > 0
• Polynomial kernel: K(x, x′) = (x · x′ + 1)^d,  d ∈ N
For specific applications, designing an effective kernel is a challenging problem.
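The three kernels above, plus the RKHS expansion f(x) = Σ_i c_i K(x, x_i) and its norm from the previous slide, written out (a sketch with arbitrary points and coefficients):

```python
import numpy as np

def linear_kernel(x, xp):              return x @ xp
def gaussian_kernel(x, xp, sigma=1.0): return np.exp(-np.sum((x - xp) ** 2) / sigma ** 2)
def poly_kernel(x, xp, d=2):           return (x @ xp + 1.0) ** d

def kernel_matrix(K, A, B):
    return np.array([[K(a, b) for b in B] for a in A])

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 2))            # the points x_1 ... x_p of the expansion
c = rng.normal(size=5)                 # the coefficients c_i

def f(x, K=gaussian_kernel):
    """f(x) = sum_i c_i K(x, x_i): an element of the RKHS defined by K."""
    return sum(ci * K(x, xi) for ci, xi in zip(c, X))

G = kernel_matrix(gaussian_kernel, X, X)
norm_sq = c @ G @ c                    # ||f||_H^2 = sum_{i,j} c_i c_j K(x_i, x_j)
print(f(np.zeros(2)), norm_sq)
```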
KERNEL AND FEATURES
Oftentimes kernels are defined through a dictionary of features
D = {φ_j, j = 1, ..., p | φ_j : X → R for all j}
by setting
K(x, x′) = Σ_{j=1}^{p} φ_j(x) φ_j(x′).
IVANOV REGULARIZATION
We can regularize by explicitly restricting the hypotheses space H, for example to a ball of radius A.
IVANOV REGULARIZATION
min_{f∈H} (1/n) Σ_{i=1}^{n} V(f(x_i), y_i)   subject to   ‖f‖²_H ≤ A.
The above algorithm corresponds to a constrained optimization problem.
TIKHONOV REGULARIZATION
Regularization can also be done implicitly via penalization.
TIKHONOV REGULARIZATION
argmin_{f∈H} (1/n) Σ_{i=1}^{n} V(f(x_i), y_i) + λ ‖f‖²_H.
The above algorithm can be seen as the Lagrangian formulation of a constrained optimization problem.
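A sketch of the Tikhonov objective evaluated for a candidate kernel expansion f = Σ_j c_j K(·, x_j) (the representer-theorem form stated on the next slide); the kernel matrix and coefficients here are illustrative assumptions:

```python
import numpy as np

def tikhonov_objective(c, K, y, lam, V=lambda p, t: (p - t) ** 2):
    """(1/n) sum_i V(f(x_i), y_i) + lam * ||f||_H^2,  for f = sum_j c_j K(., x_j)."""
    f_at_data = K @ c                  # f(x_i) = sum_j K(x_i, x_j) c_j
    return np.mean(V(f_at_data, y)) + lam * (c @ K @ c)

# Hypothetical tiny example with a Gaussian kernel matrix K on six points.
X = np.linspace(0.0, 1.0, 6)
K = np.exp(-(X[:, None] - X[None, :]) ** 2 / 0.1)
y = np.sin(2 * np.pi * X)
print(tikhonov_objective(np.zeros(6), K, y, lam=0.1))   # value of the objective at c = 0
```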
THE REPRESENTER THEOREM
AN IMPORTANT RESULT
The minimizer over the RKHS H, f_S, of the regularized empirical functional
I_S[f] + λ ‖f‖²_H
can be represented by the expression
f_n(x) = Σ_{i=1}^{n} c_i K(x_i, x),
for some (c_1, ..., c_n) ∈ R^n.
Hence, minimizing over the (possibly infinite dimensional) Hilbert space boils down to minimizing over R^n.
SVM AND RLS
The way the coefficients c = (c_1, ..., c_n) are computed depends on the choice of loss function.
RLS: Let y = (y_1, ..., y_n) and K_{i,j} = K(x_i, x_j); then c = (K + λnI)^{−1} y.
SVM: Let α_i = y_i c_i and Q_{i,j} = y_i K(x_i, x_j) y_j.
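The RLS formula above in a few lines of code (a sketch; the SVM coefficients require a quadratic-programming solver and are omitted):

```python
import numpy as np

def rls_train(K, y, lam):
    """Square loss + Tikhonov regularization: c = (K + lam * n * I)^{-1} y."""
    n = len(y)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def rls_predict(c, K_new_train):
    """f(x) = sum_i c_i K(x, x_i), evaluated for each new point."""
    return K_new_train @ c

# Toy 1-D regression with a Gaussian kernel.
gauss = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / 0.2)
X = np.linspace(-1, 1, 25)
y = np.sin(3 * X) + 0.1 * np.random.default_rng(5).normal(size=25)
c = rls_train(gauss(X, X), y, lam=1e-3)
print(rls_predict(c, gauss(np.array([0.0, 0.5]), X)))   # predictions at two new inputs
```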
REGULARIZATION APPROACH
More generally we can consider
I_n(f) + λ R(f),
where R(f) is a regularizing functional and λ is a regularization parameter trading off between the two terms.
Sparsity-based methods, manifold learning, multiclass, ...
SUMMARY
Statistical learning as a framework to deal with uncertainty in the data.
No free lunch and the key theorem: no prior, no learning.
Regularization as a fundamental tool for enforcing a prior and ensuring stability and generalization.
KERNEL AND DATA REPRESENTATION
In the above reasoning the kernel and the hypotheses space define a representation/parameterization of the problem and hence play a special role.
Where do they come from?
There are a few off-the-shelf choices (Gaussian, polynomial, etc.); often they are the product of problem-specific engineering.
Are there principles, applicable in a wide range of situations, to design effective data representations?
Learning: Math, Engineering (now)
More in general, since the introduction of supervised learning techniques, AI vision has made significant (and not well known) advances in a few domains:
• Vision (see above)
• Graphics and morphing (see above)
• Natural Language/Knowledge retrieval (Watson and Jeopardy)
• Speech recognition (Nuance)
• Games (Go, chess, ...)
However, the full problem of vision (and intelligence!) is still not solved. It is likely the problems are not solvable with “classical” learning techniques. A reasonable bet is on novel neuroscience/cognitive-inspired techniques:
• Recognition at human level of thousands of objects
• Scene understanding: the Turing test for Vision
The computational magic of the ventral stream
Tomaso Poggio, McGovern Institute, I2, CBCL, BCS, CSAIL, MIT
Very preliminary, provocative, possibly completely wrong, ...
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
David Marr
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis—in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing...
Vision: What is Where
Desimone & Ungerleider 1989
ventral stream
8$'(@'&<*">(4<*'"9
Movshon et al.
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the “complexity” of the preferred stimulus, and in “invariance” to position and scale changes
Kobatake & Tanaka, 1994
*Modified from (Gross, 1998)
[software available online with CNS (for GPUs)]
Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007
((D'#5+&%=5&(%&(<$'(A'&<*">(E<*'"9B(
#>"44%#">/(F4<"&G"*GH(95G'>(I5?(%99'G%"<'(*'#5+&%=5&J
Friday, September 9, 2011
[software available online] Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007
• It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual. Fukushima, 1980: quant; Oram & Perrett, 1993: qual; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing and Koerner, 2003; LeCun et al 1998: not-bio; Amit & Mascaro, 2003: not-bio; Hinton, LeCun, Bengio not-bio; Deco & Rolls 2006…)
• Hardwired invariance to translation and scale
• As a biological model of object recognition in the ventral stream – from V1 to PFC -- it is perhaps the most quantitatively faithful to known neuroscience data
((D'#5+&%=5&(%&(A%4;">(-5*<'KB(LM#>"44%#">(95G'>H/(
4'>'#=@'("&G(%&@"*%"&<
Friday, September 9, 2011
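Not code from the model itself, but a minimal sketch of the Hubel-Wiesel style S/C alternation these bullets describe: an S-layer that measures similarity to stored templates at every position and a C-layer that max-pools over position, giving hardwired tolerance to translation (template and image sizes are arbitrary):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def s_layer(image, templates):
    """S units: similarity (here a plain dot product) to each template at every position."""
    p = templates.shape[-1]
    patches = sliding_window_view(image, (p, p))           # all p x p patches of the image
    return np.array([np.tensordot(patches, t, axes=([-2, -1], [0, 1]))
                     for t in templates])                  # (n_templates, H-p+1, W-p+1)

def c_layer(s_maps):
    """C units: max over position -- the response becomes tolerant to translation."""
    return s_maps.max(axis=(1, 2))

rng = np.random.default_rng(6)
templates = rng.normal(size=(4, 5, 5))       # hypothetical stored/hardwired templates
image = rng.normal(size=(32, 32))
print(c_layer(s_layer(image, templates)))    # a small, translation-tolerant signature
```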
V1:
Simple and complex cells tuning (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
MAX-like operation in a subset of complex cells (Lampl et al 2004)
V2:
Subunits and their tuning (Anzai, Peng, Van Essen 2007)
V4:
Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999)
MAX-like operation (Gawne et al 2002)
Two-spot interaction (Freiwald et al 2005)
Tuning for boundary conformation (Pasupathy & Connor 2001, Cadieu, Kouh, Connor et al., 2007)
Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
IT:
Tuning and invariance properties (Logothetis et al 1995, paperclip objects)
Differential role of IT and PFC in categorization (Freedman et al 2001, 2002, 2003)
Read-out results (Hung Kreiman Poggio & DiCarlo 2005)
Pseudo-average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo 2007)
Human:
Rapid categorization (Serre Oliva Poggio 2007)
Face processing (fMRI + psychophysics) (Riesenhuber et al 2004; Jiang et al 2006)
!"#$%$&'"&%()*##+,-$.%$+)/-+#(01"0)&-20"03#23)."3')-$)4$#+"&3))2#5$%()+%3%
((!5G'>(FC5*N4HB
%<("##5;&<4(?5*(O%<4(5?(:$34%5>5+3(P(:43#$5:$34%#4
Friday, September 9, 2011
Models of the ventral stream in cortexperform well compared to
engineered computer vision systems (in 2006)on several databases
Bileschi, Wolf, Serre, Poggio, 2007
Model “works”: it performs well at the computational level
Models of cortex lead to better systems for action recognition in videos: automatic phenotyping of mice
Performance: proposed system 77%; human agreement 72%; commercial system 61%; chance 12%
Jhuang, Garrote, Yu, Khilnani, Poggio, Mutch, Steele, Serre, Nature Communications, 2010
Model “works”: it performs well at the computational level
((D'#5+&%=5&(%&(A%4;">(-5*<'KB(
#59:;<"=5&("&G(9"<$'9"=#">(<$'5*3
For 10years+...
I did not manage to understand how model works....
we need a theory -- not only a model!
Friday, September 9, 2011
Why do hierarchical architectures work?
• We have “factored out” viewpoint, scale, translation: is it now still “difficult” for a classifier to categorize dogs vs horses?
Leibo, 2011
)?5&,)2(),50)<&2$)%2@-#.,6)"*)">A0-,)10-"3$2'"$B
C0"<0,12-),1&$(*"1<&'"$()"1)2$,1&-.&(()9&12&>2.2,6B
Friday, September 9, 2011
D10),1&$(*"1<&'"$(),50)<&2$)%2@-#.,6*"1)>2"."32-&.)">A0-,)10-"3$2'"$B
Leibo, 2011 Friday, September 9, 2011
Theory: 6 main ideas, 4 key theorems
1. The computational goal of the ventral stream is to learn transformations (affine in R^2) during development and be invariant to them (McCulloch+Pitts 1945, Hoffman 1966, Jack Gallant, 1993,...)
2. A memory-based module, similar to simple-complex cells, can learn from a video of an image transforming (eg translating) to provide a signature to any single image of any object that is invariant under the transformation (Invariance Lemma)
3. Since the group Aff(2,R) can be factorized as a semidirect product of translations and GL(2,R), the storage requirements of the memory-based module can be reduced by orders of magnitude (Factorization Theorem). I argue that this is the evolutionary reason for the hierarchical architecture of the ventral stream.
4. Evolution selects translations by using small apertures in the first layer (the Stratification Theorem). The size of receptive fields in the sequence of ventral stream areas automatically selects specific transformations (translations, scaling, shear and rotations).
5. If a Hebb-like rule is active at synapses, then the storage requirements can be reduced further: the tuning of cells in each layer -- during development -- will mimic the spectrum (SVD) of the stored templates (Linking Theorem).
6. At the top layers, invariances are learned for “places” and for class-specific transformations such as changes of expression of a face, rotation in depth of a face, or different poses of a body ===> patches for places and class-specific patches (faces, bodies...)
6 main ideas...
Caveat: I am speaking about 2 separate stages:
1. learning (during development)
2. “run time” processing of images
Part I: Memory-based invariance module
E"'9&'"$7)%2(-12<2$&'"$)$",)10-"$(,1#-'"$7F"5$("$G/2$%0$(,1&#((
a few measurements are needed for selectivity -- not so important which ones -- but much fewer than pixels...
Friday, September 9, 2011
Example: a highly efficient signature for OCR
Hierarchical features for selectivity or invariance?
V1 provides very discriminative signatures....
Abstraction: learning by a memory-based invariance module
A templatebook (frames of a template transforming over time)
Templatebooks
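A hedged sketch of the memory-based module behind the Invariance Lemma: each templatebook stores the frames of one template under a transformation (here 1-D circular translation, an assumption); the signature of an image is, per templatebook, the maximum dot product over stored frames, and it does not change when the image itself is translated:

```python
import numpy as np

def make_templatebook(template, n):
    """One stored frame per circular shift of the template (1-D toy transformation)."""
    return np.stack([np.roll(template, s) for s in range(n)])

def signature(image, templatebooks):
    """For each templatebook: max over stored frames of <frame, image>."""
    return np.array([max(frame @ image for frame in tb) for tb in templatebooks])

rng = np.random.default_rng(7)
n = 32
templatebooks = [make_templatebook(rng.normal(size=n), n) for _ in range(5)]

image = rng.normal(size=n)
sig = signature(image, templatebooks)
sig_shifted = signature(np.roll(image, 4), templatebooks)   # the same image, translated
print(np.allclose(sig, sig_shifted))                        # True: the signature is invariant
```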
The invariance lemma
Part II: Why evolution uses hierarchies and how
This is about evolution (mostly before development)
Representation of the affine group
Factorization lemma (why hierarchies)
Stratification conjecture (why size of receptive fields increase)
Stratification Conjecture:For aperture size increasing from zero (relative to resolution of the filtered “image”) the first transformation which can be estimated above the noise of measurements is translation.
Layered architecture: looking at neural images in the layer below, through small apertures corresponding to receptive fields, on a 2D lattice
Layered architectures
Part III: Templatebooks and development of cells' tuning
Mostly development stage
Memory-based invariance module
+2<#.&'"$()HF2<)E#,-5I7+JK)H"*),0<8.&,0>""LI)*"1)H6I),1&$(.&'"$()*1"<)
$&,#1&.)2<&30()
Rust et al. 2005 CarandiniFriday, September 9, 2011
+2<#.&'"$()HF2<)E#,-5I7+JK)H"*),0<8.&,0>""LI)*"1)HMI),1&$(.&'"$()*1"<
$&,#1&.)2<&30()
Friday, September 9, 2011
+2<#.&'"$()HF2<)E#,-5I7+JK)H"*),0<8.&,0>""LI)*"1)H6I),1&$(.&'"$()
$"2(0
Friday, September 9, 2011
+2<#.&'"$()HF2<)E#,-5I7+JK)H"*),0<8.&,0>""LI)*"1)1",&'"$(
Friday, September 9, 2011
D)-&1,""$)"*),50)+7N)-0..
Friday, September 9, 2011
These are predicted features orthogonal to trajectories of Lie group:
for translation (x, y), expansion, rotation
V1
V2/V4?
This is from Hoffman, 1966, cited by Jack Gallant
Towards V2 and V4???
Why do we care about the spectrum of the templatebook, e.g. of the sequence of frames portraying transformations seen by S:C cells during development?
This means that at run time the max is likely to choose the “correct” simple unit (corresponding to the stored template) and provide the same value -- independently of the transformation T. Across complex units, this says that with high probability the signature is invariant to T from layer to layer (for uniform T).
Theory: Linking conjecture
Choose a learning rule * for simple cells synaptic plasticity (eg receptive field) such as Oja’s update rule
or Hebb’s rule with normalized weights
Theorem: the update rules above generate receptive fields reflecting the spectral properties of the templatebook.
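The slides only name the rule; here is a minimal sketch of Oja's update applied to the frames of a (synthetic) templatebook, so that the learned weight vector aligns with the top principal component, i.e. the leading term of the SVD the Linking Theorem refers to:

```python
import numpy as np

def oja(frames, eta=0.01, epochs=200, seed=0):
    """Oja's rule: w <- w + eta * y * (x - y * w), y = w.x (keeps ||w|| close to 1)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=frames.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in frames:
            y = w @ x
            w += eta * y * (x - y * w)
    return w

# Hypothetical templatebook: frames of a template drifting along one fixed direction.
rng = np.random.default_rng(8)
direction = rng.normal(size=20)
direction /= np.linalg.norm(direction)
frames = np.outer(np.linspace(-1, 1, 100), direction) + 0.05 * rng.normal(size=(100, 20))

w = oja(frames)
print(abs(w @ direction))    # close to 1: w aligns with the frames' principal component
```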
Top layer/patch example
((D'#5+&%Q%&+("(?"#'(?*59("(G%R'*'&<(@%'C:5%&<
R'#8P+5(#*$5('+6<$+5(#*$+/$0522P0)&#$+#4K2)+#6$0/.$*'S#.#(+$O'#8$)(-2#6
R'#8K/'(+$+/2#.)(+$5('+6$L&/4K2#G$5('+6N
9-*'(%01'!&-!%!&(%0+6-(7%2-0!7%:!,'!*'%(0'$!6(-7!';%7#*'+!-6!%!1*%++
Joel+Jim+tp, 2010
Friday, September 9, 2011
(()'"*&%&+(#>"44(4:'#%S#(<*"&4?5*9"=5&4B(@%'C:5%&<(
%&@"*%"&#'4(?5*(?"#'4
Friday, September 9, 2011
Some of the many implications
‣ Consistent with HMAX, extends it?
‣ Not inconsistent with Simon Thorpe's TTFS mechanism, nor inconsistent with hierarchies and compositionality (Yuille, Geman...)
‣ “Explains” the Heinrich/Wallis effect and more re Li/DiCarlo single-cell plasticity
‣ Deprive a subject of transformations during development (stroboscopic environment)... clear predictions!
‣ Image priors, shape features, natural image priors are “irrelevant”
Collaborators in recent work:
J. Mutch, J. Leibo, L. Isik, S. Ullman, S. Smale, L. Rosasco, H. Jhuang, C. Tan, N. Edelman, E. Meyers, R. Desimone, Lewis, S. Voinea
Also: M. Riesenhuber, T. Serre, S. Chikkerur, A. Wibisono, J. Bouvrie, M. Kouh, J. DiCarlo, E. Miller, C. Cadieu, A. Oliva, C. Koch, A. Caponnetto, D. Walther, U. Knoblich, T. Masquelier, S. Bileschi, L. Wolf, E. Connor, D. Ferster, I. Lampl, G. Kreiman, N. Logothetis
Jim Mutch