The Science and the Engineering of Intelligence (web.mit.edu/9.s915/www/classes/Class01_all.pdf)

Page 1:

!"#$%&'#(&#$)(*$+"#$,(-'(##.'(-$/0$

1(+#22'-#(&#

Shimon Ullman + Tomaso Poggio (with Owen Ardron Lewis)

Department of Brain & Cognitive Sciences

Massachusetts Institute of Technology

Cambridge, MA 02139 USA

9.S915 in Fall 2011

Page 2:

Overview

Context for this course: the Intelligence Initiative

Page 3:

!"#$%&'()#*$'+$,-.#)),/#-0#$,1$'-#$'+$."#$/&#2.$%&'()#*1$,-$10,#-0#3$%&'(2()4$."#$/&#2.#1.5

6#1#2&0"$'-$,-.#)),/#-0#$(4$-#7&'10,#-0#$2-8$0'*%7.#&$10,#-0#$9:;<=$

>$2$/&#2.$,-.#))#0.72)$*,11,'-

>$?,))$"#)%$07&#$*#-.2)$8,1#21#1$2-8$8#@#)'%$*'&#$,-.#)),/#-.$2&A+20.1$

>$?,))$,*%&'@#$."#$*#0"2-,1*1$+'&$0'))#0A@#$8#0,1,'-1

!"#1#$28@2-0#1$?,))$(#$0&,A02)$.'$'+$'7&$1'0,#.4B1

>$+7.7&#$%&'1%#&,.4>$#8702A'-3$$"#2)."3$$1#07&,.4$

!"#$%&'()#*$'+$,-.#)),/#-0#1$"'2$,.$3&,4#4$,-$."#$(&3,-$3-5$"'2$.'$&#%),03.#$,.$,-$*30",-#4

Page 5:

!"#$%&'(#)*+&,#$-.'$/0)#++12#0,#3

$2$/),*%1#$,-.'$."#$*2&C#.%)20#$

+'&$."#$10,#-0#$2-8$."#$#-/,-##&,-/$'+$,-.#)),/#-0#

Page 6:

The new golden age: trying (again)

to understand and replicate Intelligence

• The first and last attempt in history was ~fifty years ago, at the beginning of Artificial Intelligence (the Dartmouth workshop, 1956)

• It is time to try again because

– we have seen in the last decade and we are going to see in the next 5 years several intelligent apps from the first wave of AI+ML research: search + vision + speech + language + Go...

– ...but these are not intelligent machines and this is not enough to understand intelligence...

– ...we need to integrate computation with neuroscience and cognition for the next jump in understanding

• Thus, the next wave of research on Integrated Intelligence...

Page 7:

Syllabus

Page 8:

Since the introduction of learning techniques in AI+vision, there have been significant (and not well known) advances in a few domains:

• Games (Checkers, Chess, Go, Poker, soccer???...)
• Search (Google, ...)
• Speech recognition (Nuance, ...)
• Natural Language / Knowledge retrieval (Watson and Jeopardy, ...)
• Vision (Kinect; and see later)

!"#$%&'(("&&"&%)*%+#(,)*"%$"#-*)*.%/#*0%123

Page 9:

Integration is fundamental

• Cognition, brain science, computation, biology
• Perception, action, language, reasoning, social
• Integration of innate structures with learning

Page 10:

Integration is early

Language and perception

Direction of gaze

Perception and planning

Action recognition

Page 11:

Rules of the game:

• active participation in class
• survey at http://www.surveymonkey.com/s/FVK6NGD about:
  • whiteboard presentations in I2 camps? or
  • Wikipedia articles? or ....

Slides on the Web site: http://web.mit.edu/9.s915/www/
Staff mailing list is [email protected]
Student list will be [email protected]; please fill the form!
Send email to us if you want to be added to the mailing list.

Class: http://web.mit.edu/9.s915/www/

Page 13:

Class 1: Learning

1. Statistical Learning Theory: an informal introduction
2. The framework of statistical learning
3. Conditions, principles and algorithms for learning (necessary and sufficient conditions, sparsity, smoothness, etc.)
4. Learning and representation in the Brain
• Beyond binary classification: taxonomy and structures
• Applications: image and video classification

Page 14:

[Diagram: LEARNING THEORY + ALGORITHMS ↔ COMPUTATIONAL NEUROSCIENCE (models + experiments). Theorems on foundations of learning; predictive algorithms; how visual cortex works.]

Sung & Poggio 1995; also Kanade & Baluja....

!"#$%&'()'"*&%&+(,-.-)/(012(3'"*4("+5

Page 17:

[Same diagram as above]

Face detection is now available in digital cameras (commercial systems)

!"#$%&'()'"*&%&+(,-.-)/(012(3'"*4("+5

Friday, September 9, 2011

Page 18: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

[Same diagram as above]

Papageorgiou&Poggio, 1997, 2000 also Kanade&Scheiderman

!"#$%&'()'"*&%&+(,-.-)/(067(3'"*4("+5

Page 23:

[Same diagram as above]

Pedestrian and car detection are also “solved” (commercial systems, MobilEye)

!"#$%&'()'"*&%&+(

Friday, September 9, 2011

Page 24: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

Friday, September 9, 2011

Page 25: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

LEARNING THEORY+

ALGORITHMS

COMPUTATIONAL NEUROSCIENCE:

models+experimentsHow visual cortex works

Theorems on foundations of learning

Predictive algorithms

Pedestrian and car detection are also “solved” (commercial systems, MobilEye)

!"#$%&'()'"*&%&+(

Friday, September 9, 2011

Page 26: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

Friday, September 9, 2011

Page 27: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

LEARNING THEORY+

ALGORITHMS

COMPUTATIONAL NEUROSCIENCE:

models+experimentsHow visual cortex works

Theorems on foundations of learning

Predictive algorithms

Pedestrian and car detection are also “solved” (commercial systems)

!"#$%&'()'"*&%&+(

Friday, September 9, 2011

Page 28: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

Friday, September 9, 2011

Page 29: 0$ 1(+#22'-#(&#web.mit.edu/9.s915/www/classes/Class01_all.pdf · The new golden age: trying (again) to understand and replicate Intelligence • The first and last attempt in history

http://www.volvocars.com/us/all-cars/volvo-s60/pages/5-things.aspx?p=5

!"#"$%&'(##"''"'&)$&*+&,$-&./0&#123(%"4&5)')1$

Page 30:

Since the introduction of learning/statistics in the ‘90s, computer vision has made significant (and not well known) advances in a few problem areas:

• Face identification under controlled conditions is “solved” (commercial systems) at ~ human level performance

• Face detection is available in cheap digital cameras (commercial systems)

• Pedestrian and car detection are also “solved” (commercial systems)

!",6&'(##"''"'&)$&*+0&#123(%"4&5)')1$

Page 31:

[Diagram: INPUT → f → OUTPUT]

Given a set of ℓ examples (data) (x_1, y_1), . . . , (x_ℓ, y_ℓ)

Question: find a function f such that f(x) is a good predictor of y for a future input x (fitting the data is not enough!)

!"#"$%"$&#'()*#+,$,-(./*0+12%34*+5$%*6('*#+,$,-

Page 32:

[Figure: training data sampled from f, the true function f, and an approximation of f]

Generalization: estimating the value of the function where there are no data (good generalization means predicting the function well; what matters is that the empirical or validation error be a good proxy for the prediction error)

!"#"$%"$&#'()*#+,$,-(./*0+124+*6$&"$0,7(,0"(&3+5*(8$""$,-

Page 33:

!"#"$%"$&#'()*#+,$,-(./*0+124#+"(09(:#$,%"+*#:(:#"/(,0"(;3%"(%"#"$%"$&%

<=#'$#,"7(=#4,$>7(!:#'*7(?*50+*@@@A

Page 34:

Mukherjee, Niyogi, Poggio, Rifkin, Nature, March 2004

An algorithm is stable if the removal of any one training sample from any large set of samples results almost always in a small change in the learned function.

For ERM the following theorem holds:

ERM on H generalizes if and only if the hypothesis space H is uGC, and if and only if ERM on H is CVloo stable.

!"#$%&'"$()"*)+,&'('-&.)/0&1$2$3)450"167810%2-'92,6:);--&<)1&="1:)(,&>2.2,6

Page 35:

It is possible to show that, under quite general assumptions, generalization and well-posedness are equivalent, i.e., one implies the other.

Stability implies generalization: a stable solution is predictive, and (for ERM) also vice versa.

[Diagram: for ERM, "H is uGC" ⇔ generalization + consistency ⇔ CVloo stability; for general symmetric algorithms, Eloo + EEloo stability ⇒ generalization]

Mukherjee, S., P. Niyogi, T. Poggio and R. Rifkin. Learning Theory: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization. Advances in Computational Mathematics, 25, 161-193, 2006; see also Nature, 2004.

Page 37:

Conditions for generalization in learning theory have deep, almost philosophical, implications: they can be regarded as equivalent conditions that guarantee a theory to be predictive (that is, scientific):

‣ the theory must be chosen from a small set

‣ the theory should not change much with new data... most of the time

!"#"$%"$&#'()*#+,$,-(./*0+12"/*0+*:%(*B"*,6$,-(903,6#"$0,%(09(

'*#+,$,-("/*0+1(

Page 38:

The regularization scheme

min_{f∈H} (1/n) ∑_{i=1}^n V(f(x_i), y_i) + λ ‖f‖²_H

includes splines, Radial Basis Functions and SVMs (depending on the choice of V), and implies a solution of the form

f(x) = ∑_{i=1}^n c_i K(x, x_i).

For a review, see Poggio and Smale, 2003; see also Schoelkopf and Smola, 2002; Bousquet, O., S. Boucheron and G. Lugosi; Cucker and Smale; Zhou and Smale...

!"#"$%"$&#'()*#+,$,-(./*0+12C*+,*'(D#&/$,*%(*-(E*-3'#+$F#"$0,($,(

ECG!(

Page 39:

The regularization functional has a Bayesian interpretation: the data term is a model of the noise and the stabilizer is a prior on the hypothesis space of functions f. That is, Bayes rule leads to regularization as MAP estimation:
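A minimal sketch of the computation the slide alludes to, assuming (as is standard, though not spelled out here) Gaussian noise on the outputs and a Gaussian-type prior over f:

```latex
% Bayes rule with Gaussian noise and a Gaussian-type prior (assumed for illustration):
P(f \mid S) \propto P(S \mid f)\, P(f)
           \propto e^{-\sum_i (y_i - f(x_i))^2 / 2\sigma^2}\; e^{-\|f\|_H^2 / 2}
% Taking -\log, the MAP estimate minimizes
\sum_i \big(y_i - f(x_i)\big)^2 + \lambda \|f\|_H^2, \qquad \lambda \propto \sigma^2
```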

!"#"$%"$&#'()*#+,$,-(./*0+12&'#%%$&#'(#'-0+$"/:%2(E*-3'#+$F#"$0,((

Page 40:

The regularization functional implies a solution of the form f(x) = ∑_{i=1}^ℓ c_i K(x, x_i).

Classical learning algorithms: Kernel Machines (e.g. Regularization in RKHS)

Remark (for later use): kernel machines correspond to shallow networks

[Diagram: a one-hidden-layer network with kernel units K(x, x_1), . . . , K(x, x_ℓ) feeding the output f]

!"#"$%"$&#'()*#+,$,-(./*0+12E*-3'#+$F#"$0,

Page 41:

Two connected and overlapping strands in learning theory:

• Bayes, hierarchical models, graphical models…

• Statistical learning theory, regularization (closer to classical math: functional analysis + probability theory + empirical process theory…)

!"#"$%"$&#'()*#+,$,-(./*0+12,0"*

Page 43:

FOUNDATIONS OF COMPUTATIONAL AND STATISTICAL LEARNING

Tomaso Poggio and Lorenzo Rosasco (tp, [email protected])

September 9, 2011

Page 44:

COMPUTATIONAL LEARNING

STATISTICAL LEARNING THEORY

Learning is viewed as an inference problem from possibly small samples of high dimensional, noisy data.

Page 45:

LEARNING TASKS AND MODELS

Supervised
Semisupervised
Unsupervised
Online
Transductive
Active
Variable Selection
Reinforcement
.....

One can consider the data to be created in a deterministic, stochastic, or even adversarial way.

Page 46:

WHERE TO START?

STATISTICAL AND SUPERVISED LEARNING

Statistical models essentially deal with noise, sampling, and other sources of uncertainty. Supervised learning is by far the best understood class of problems.

REGULARIZATION

Regularization provides a fundamental framework to solve learning problems and design learning algorithms. We present a set of tools and techniques which are at the core of a multitude of different ideas and developments, beyond supervised learning.

Page 48:

PLAN

Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms

Page 49:

WARNING

Page 50:

PROBLEM AT A GLANCE

Given a training set of input-output pairs

S_n = (x_1, y_1), . . . , (x_n, y_n)

find f_S such that f_S(x) ∼ y.

E.g., the x's are vectors, and the y's are discrete labels in classification and real values in regression.

Page 51:

LEARNING IS INFERENCE

For the above problem to make sense we need to assume input andoutput to be related!

STATISTICAL AND SUPERVISED LEARNING

Each input-output pair is a sample from a fixed but unknown distribution p(x, y). Under general conditions we can write

p(x, y) = p(y|x) p(x).

Page 52:

NOISE ...

[Figure: conditional distribution p(y|x) over outputs Y for a given input x]

The same x can generate different y (according to p(y|x)):

the underlying process is deterministic, but there is noise in the measurement of y;
the underlying process is not deterministic;
the underlying process is deterministic, but only incomplete information is available.

Page 53:

...AND SAMPLING

[Figure: marginal distribution p(x) over inputs]

EVEN IN A NOISE-FREE CASE WE HAVE TO DEAL WITH SAMPLING

The marginal distribution p(x) might model:

errors in the location of the input points;
discretization error for a given grid;
presence or absence of certain input instances.

Page 57:

LEARNING, GENERALIZATION AND OVERFITTING

PREDICTIVITY OR GENERALIZATION

Given the data, the goal is to learn how to make decisions/predictions about future data, i.e., data not belonging to the training set.

The problem is: avoid overfitting!!

Page 58:

OVER-FITTING ILLUSTRATED

Among many possible solutions, how can we choose one that correctly applies to previously unseen data?
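The slides illustrate this with plots; a minimal numerical sketch makes the same point (the toy data and polynomial fits are our assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = x^2 plus noise. A degree-9 polynomial drives the training
# error toward zero but does worse than a degree-2 fit on unseen points.
x_tr = rng.uniform(-1, 1, 15)
y_tr = x_tr**2 + 0.1 * rng.normal(size=15)
x_te = rng.uniform(-1, 1, 1000)
y_te = x_te**2 + 0.1 * rng.normal(size=1000)

for deg in (2, 9):
    p = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    tr = np.mean((p(x_tr) - y_tr) ** 2)   # error on the data
    te = np.mean((p(x_te) - y_te) ** 2)   # error on unseen data
    print(f"degree {deg}: train {tr:.4f}, test {te:.4f}")
```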

Page 62:

LOSS FUNCTIONS

Since we look for a deterministic estimator in a stochastic environment, we have to expect to incur errors.

LOSS FUNCTION

A loss function V : R × Y → R determines the price V(f(x), y) we pay for predicting f(x) when in fact the true output is y.

Page 63:

LOSS FUNCTIONS FOR REGRESSION

The most common is the square loss or L2 loss:

V(f(x), y) = (f(x) − y)²

Absolute value or L1 loss:

V(f(x), y) = |f(x) − y|

Vapnik's ε-insensitive loss:

V(f(x), y) = (|f(x) − y| − ε)₊

Page 64:

LOSS FUNCTIONS FOR (BINARY) CLASSIFICATION

The most intuitive one: the 0−1 loss:

V(f(x), y) = θ(−y f(x))

(θ is the step function.) The more tractable hinge loss:

V(f(x), y) = (1 − y f(x))₊

And again the square loss or L2 loss:

V(f(x), y) = (1 − y f(x))²
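The slides contain no code; as a minimal sketch, the losses of the last two pages can be written down in Python (the function names are ours; everything is vectorized over numpy arrays):

```python
import numpy as np

# Regression losses: compare a prediction f(x) with the true output y.
def square_loss(fx, y):
    return (fx - y) ** 2

def abs_loss(fx, y):
    return np.abs(fx - y)

def eps_insensitive_loss(fx, y, eps=0.1):
    # Vapnik's loss: errors smaller than eps cost nothing.
    return np.maximum(np.abs(fx - y) - eps, 0.0)

# Classification losses: y in {-1, +1}, f(x) a real-valued score.
def zero_one_loss(fx, y):
    return np.where(y * fx <= 0, 1.0, 0.0)   # theta(-y f(x))

def hinge_loss(fx, y):
    return np.maximum(1.0 - y * fx, 0.0)
```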

Page 65:

LOSS FUNCTIONS

[Figure: plots of the loss functions]

Page 66:

EXPECTED RISK

A good function should incur only a few errors. We need a way to quantify this idea.

EXPECTED RISK

The quantity

I[f] = ∫_{X×Y} V(f(x), y) p(x, y) dx dy

is called the expected error and measures the loss averaged over the unknown distribution.

A good function should have small expected risk.
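I[f] cannot be computed in practice because p(x, y) is unknown; as a sanity check, though, the integral can be approximated by Monte Carlo when p is known. A toy sketch (the distribution and the predictor are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution p(x, y): x ~ N(0, 1), y = sin(x) + Gaussian noise.
n = 200_000
x = rng.normal(size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)

f = np.cos  # some candidate predictor f

# Monte Carlo estimate of I[f] = ∫ V(f(x), y) p(x, y) dx dy with the square loss.
I_f = np.mean((f(x) - y) ** 2)
print(f"estimated expected risk I[f] ≈ {I_f:.3f}")
```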

Page 67:

TARGET FUNCTION

The expected risk is usually defined on some large space F, possibly dependent on p(x, y). The best possible error is

inf_{f∈F} I[f]

The infimum is often achieved at a minimizer f∗ that we call the target function.

Page 68:

LEARNING ALGORITHMS AND GENERALIZATION

A learning algorithm can be seen as a map

S_n → f_n

from the training set to a set of candidate functions.

Ideally we would like I[f_n] ∼ I[f∗].

Page 69:

REMINDER

CONVERGENCE IN PROBABILITY

Let {X_n} be a sequence of bounded random variables. Then

lim_{n→∞} X_n = X in probability

if ∀ε > 0, lim_{n→∞} P{|X_n − X| ≥ ε} = 0.

CONVERGENCE IN EXPECTATION

Let {X_n} be a sequence of bounded random variables. Then

lim_{n→∞} X_n = X in expectation

if lim_{n→∞} E(|X_n − X|) = 0.

Page 70:

CONSISTENCY AND UNIVERSAL CONSISTENCY

A basic requirement is for the algorithm to get better as we get more data...

CONSISTENCY

We say that an algorithm is consistent if

∀ε > 0  lim_{n→∞} P{I[f_n] − I[f∗] ≥ ε} = 0

UNIVERSAL CONSISTENCY

We say that an algorithm is universally consistent if, for all probability distributions p,

∀ε > 0  lim_{n→∞} P{I[f_n] − I[f∗] ≥ ε} = 0

Page 71:

SAMPLE COMPLEXITY AND LEARNING RATES

The above requirements are asymptotic.

ERROR RATES

A more practical question is: how fast does the error decay? This can be expressed as

P{I[f_n] − I[f∗] ≤ ε(n, δ)} ≥ 1 − δ.

SAMPLE COMPLEXITY

Or, equivalently: 'how many points do we need to achieve an error ε with a prescribed probability δ?' This can be expressed as

P{I[f_n] − I[f∗] ≤ ε} ≥ 1 − δ,  for n ≥ n(ε, δ).

Page 72:

EMPIRICAL RISK AND GENERALIZATION

How do we design learning algorithms? A first idea is to look at the data...

EMPIRICAL RISK

In practice we can try to replace the expected risk by the empirical risk

I_n[f] = (1/n) ∑_{i=1}^n V(f(x_i), y_i).

GENERALIZATION ERROR

The effectiveness of such an approximation is captured by the generalization error:

P{|I[f_n] − I_n[f_n]| ≤ ε} ≥ 1 − δ,  for n ≥ n(ε, δ).
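Unlike I[f], the empirical risk is computable from the sample alone; a one-line sketch (names are ours):

```python
import numpy as np

def empirical_risk(f, X, y, loss):
    """I_n[f] = (1/n) * sum_i loss(f(x_i), y_i), computable from data alone."""
    return np.mean(loss(f(X), y))

# e.g., with the square loss defined earlier:
# empirical_risk(np.cos, X, y, square_loss)
```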

Page 75:

SOME (THEORETICAL AND PRACTICAL) QUESTIONS

How do we go from data to an actual algorithm?
Is minimizing error on the data a good idea?
Are there fundamental limitations in what we can and cannot learn?

Page 76:

PLAN

Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms

Page 77:

NO FREE LUNCH THEOREM

UNIVERSAL CONSISTENCY

Can we consistently learn any problem? Or, equivalently, do universally consistent algorithms exist?

YES! Nearest neighbors, histogram rules, SVMs with (so-called) universal kernels...

NO FREE LUNCH THEOREM

Given a number of points (and a confidence), can we always achieve a prescribed error?

NO! (Devroye et al.)

The last statement can be interpreted as follows: inference from finite samples can be effectively performed if and only if the problem satisfies some a priori condition.

Page 81:

HYPOTHESES SPACE

In statistical learning a first prior assumption amounts to choosing a suitable space of hypotheses H.

Examples: linear functions, polynomials, RBFs, Sobolev spaces...

A learning algorithm A is then a map from the data space to H,

A(S_n) = f_n ∈ H

Page 84:

EMPIRICAL RISK MINIMIZATION

How do we choose H? How do we design A?

ERM

A prototype algorithm in statistical learning theory is Empirical Risk Minimization:

min_{f∈H} I_n[f].
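A minimal ERM sketch, under the assumption (ours, for illustration) that H is the set of polynomials of a fixed degree and V is the square loss; least-squares fitting is then exactly empirical risk minimization over H:

```python
import numpy as np

def erm_polynomial(X, y, degree=3):
    # Least-squares polynomial fit == ERM over H with the square loss.
    coeffs = np.polyfit(X, y, deg=degree)
    return np.poly1d(coeffs)          # f_n, the empirical minimizer

# f_n = erm_polynomial(X, y); f_n(x_new) predicts y for a future x_new.
```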

Page 85:

KEY THEOREM(S) ILLUSTRATED

Page 86:

KEY THEOREM(S)

UNIFORM GLIVENKO-CANTELLI CLASSES

We say that H is a uniform Glivenko-Cantelli (uGC) class if, for all p,

∀ε > 0  lim_{n→∞} P{ sup_{f∈H} |I[f] − I_n[f]| > ε } = 0.

A necessary and sufficient condition for consistency of ERM is that H is uGC.
See: [Vapnik and Cervonenkis (71), Alon et al. (97), Dudley, Giné, and Zinn (91)].

In turn, the uGC property is equivalent to requiring H to have finite capacity: V_γ dimension in general and VC dimension in classification.

Page 88:

STABILITY

z = (x, y)
S = z_1, ..., z_n
S^i = z_1, ..., z_{i−1}, z_{i+1}, ..., z_n

CV STABILITY

A learning algorithm A is CVloo stable if for each n there exist a β_CV(n) and a δ_CV(n) such that, for all p,

P{ |V(f_{S^i}, z_i) − V(f_S, z_i)| ≤ β_CV(n) } ≥ 1 − δ_CV(n),

with β_CV(n) and δ_CV(n) going to zero as n → ∞.
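The definition can be probed numerically: retrain with one point z_i removed and compare the loss at z_i. A rough sketch (the learner and loss are placeholders passed in by the caller):

```python
import numpy as np

def cvloo_changes(X, y, fit, loss):
    """|V(f_{S^i}, z_i) - V(f_S, z_i)| for each left-out point z_i."""
    f_S = fit(X, y)
    deltas = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        f_Si = fit(X[keep], y[keep])          # trained on S^i
        deltas.append(abs(loss(f_Si(X[i]), y[i]) - loss(f_S(X[i]), y[i])))
    return np.array(deltas)  # mostly-small values suggest CVloo stability
```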

Page 90:

ERM AND ILL-POSEDNESS

Ill-posed problems often arise if one tries to infer general laws from few data:

the hypothesis space is too large;
there are not enough data.

In general, ERM leads to ill-posed solutions because

the solution may be too complex;
it may not be unique;
it may change radically when leaving one sample out.

Page 91:

REMINDER

ILL-POSED PROBLEM

A mathematical problem is well posed in the sense of Hadamard if

the solution exists;
the solution is unique;
the solution depends continuously on the data.

If a problem is not well posed, it is called ill posed.

Regularization is the classical way to restore well-posedness (and ensure generalization).

Page 92:

BAYES INTERPRETATION

Page 93:

PLAN

Part I: Basic Concepts and Notation
Part II: Foundational Results
Part III: Algorithms

Page 94:

HYPOTHESES SPACE

We are going to look at hypotheses spaces which are reproducing kernel Hilbert spaces (RKHS).

RKHS are Hilbert spaces of point-wise defined functions. They can be defined via a reproducing kernel, which is a symmetric positive definite function. Functions in the space are (the completion of) linear combinations

f(x) = ∑_{i=1}^p c_i K(x, x_i).

The norm in the space is a natural measure of complexity:

‖f‖²_H = ∑_{i,j=1}^p c_i c_j K(x_j, x_i).
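As a sketch, the expansion and its norm in code, for any symmetric pd kernel k supplied by the caller (function names are ours):

```python
import numpy as np

def f_eval(x, c, X, k):
    """Evaluate f = sum_i c_i K(., x_i) at a point x."""
    return sum(ci * k(x, xi) for ci, xi in zip(c, X))

def rkhs_norm_sq(c, X, k):
    """||f||_H^2 = sum_{i,j} c_i c_j K(x_j, x_i)."""
    K = np.array([[k(xj, xi) for xi in X] for xj in X])  # Gram matrix
    c = np.asarray(c)
    return float(c @ K @ c)
```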

Page 95:

EXAMPLES OF PD KERNELS

Very common examples of symmetric pd kernels are:

• Linear kernel: K(x, x′) = x · x′

• Gaussian kernel: K(x, x′) = e^{−‖x − x′‖² / σ²}, σ > 0

• Polynomial kernel: K(x, x′) = (x · x′ + 1)^d, d ∈ N

For specific applications, designing an effective kernel is achallenging problem.
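The three kernels above as plain functions, a minimal sketch:

```python
import numpy as np

def linear_kernel(x, xp):
    return float(np.dot(x, xp))

def gaussian_kernel(x, xp, sigma=1.0):
    d = np.linalg.norm(np.subtract(x, xp))
    return float(np.exp(-d**2 / sigma**2))

def polynomial_kernel(x, xp, d=2):
    return float((np.dot(x, xp) + 1) ** d)
```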

Page 96:

KERNEL AND FEATURES

Oftentimes kernels are defined through a dictionary of features

D = {φ_j, j = 1, . . . , p | φ_j : X → R, ∀j}

setting

K(x, x′) = ∑_{j=1}^p φ_j(x) φ_j(x′).
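A sketch with a hypothetical three-feature dictionary D = {1, x, x²} on scalar inputs; the induced kernel coincides with a polynomial-type kernel:

```python
# Hypothetical dictionary of features phi_j : X -> R on scalar inputs.
features = [lambda x: 1.0, lambda x: x, lambda x: x**2]

def feature_kernel(x, xp, phis=features):
    # K(x, x') = sum_j phi_j(x) phi_j(x')
    return sum(phi(x) * phi(xp) for phi in phis)

# Here feature_kernel(x, xp) = 1 + x*xp + (x*xp)**2.
```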

Page 97:

IVANOV REGULARIZATION

We can regularize by explicitly restricting the hypotheses space H, for example to a ball of radius A.

IVANOV REGULARIZATION

min_{f∈H} (1/n) ∑_{i=1}^n V(f(x_i), y_i)   subject to   ‖f‖²_H ≤ A.

The above algorithm corresponds to a constrained optimization problem.

Page 98:

TIKHONOV REGULARIZATION

Regularization can also be done implicitly via penalization.

TIKHONOV REGULARIZATION

argmin_{f∈H} (1/n) ∑_{i=1}^n V(f(x_i), y_i) + λ ‖f‖²_H.

The above algorithm can be seen as the Lagrangian formulation of a constrained optimization problem.
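With the representer theorem of the next page in mind, the Tikhonov functional becomes fully concrete for a kernel expansion f = ∑_i c_i K(·, x_i), since f(x_i) = (Kc)_i and ‖f‖²_H = cᵀKc; a sketch:

```python
import numpy as np

def tikhonov_objective(c, K, y, lam, loss):
    """(1/n) * sum_i V(f(x_i), y_i) + lam * ||f||_H^2 for f = sum_i c_i K(., x_i)."""
    data_term = np.mean(loss(K @ c, y))   # f(x_i) = (K c)_i
    penalty = lam * float(c @ K @ c)      # ||f||_H^2 = c' K c
    return data_term + penalty
```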

Page 99:

THE REPRESENTER THEOREM

AN IMPORTANT RESULT

The minimizer over the RKHS H, f_S, of the regularized empirical functional

I_S[f] + λ ‖f‖²_H

can be represented by the expression

f_n(x) = ∑_{i=1}^n c_i K(x_i, x),

for some (c_1, . . . , c_n) ∈ R^n.

Hence, minimizing over the (possibly infinite-dimensional) Hilbert space boils down to minimizing over R^n.

Page 100:

SVM AND RLS

The way the coefficients c = (c_1, . . . , c_n) are computed depends on the choice of the loss function.

RLS: let y = (y_1, . . . , y_n) and K_{i,j} = K(x_i, x_j); then c = (K + λnI)^{−1} y.

SVM: let α_i = y_i c_i and Q_{i,j} = y_i K(x_i, x_j) y_j
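The RLS case is fully explicit and fits in a few lines; a minimal sketch for 1-D inputs with the Gaussian kernel (our choice of kernel, for illustration):

```python
import numpy as np

def rls_fit(X, y, lam=1e-2, sigma=1.0):
    """Solve c = (K + lam*n*I)^{-1} y for regularized least squares."""
    n = len(X)
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / sigma**2)  # Gram matrix
    c = np.linalg.solve(K + lam * n * np.eye(n), y)
    return c

def rls_predict(x_new, X, c, sigma=1.0):
    """f(x_new) = sum_i c_i K(x_i, x_new)."""
    k = np.exp(-(x_new - X) ** 2 / sigma**2)
    return k @ c
```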

Page 103:

REGULARIZATION APPROACH

More generally we can consider

I_n(f) + λ R(f),

where R(f) is a regularizing functional and λ is a regularization parameter trading off between the two terms.

Sparsity-based methods
Manifold learning
Multiclass
...

Page 104:

SUMMARY

statistical learning as a framework to deal with uncertainty in data
no free lunch and key theorem: no prior, no learning
regularization as a fundamental tool for enforcing a prior and ensuring stability and generalization

Page 105:

KERNEL AND DATA REPRESENTATION

In the above reasoning the kernel and the hypotheses space define a representation/parameterization of the problem and hence play a special role.

Where do they come from?

There are a few off-the-shelf choices (Gaussian, polynomial, etc.). Often they are the product of problem-specific engineering.

Are there principles, applicable in a wide range of situations, to design effective data representations?

Page 106:

)*#+,$,-2(D#"/7(H,-$,**+$,-(<,0IA

More generally, since the introduction of supervised learning techniques, AI + vision have made significant (and not well known) advances in a few domains:

• Vision (see above)
• Graphics and morphing (see above)
• Natural Language / Knowledge retrieval (Watson and Jeopardy)
• Speech recognition (Nuance)
• Games (Go, chess, ...)

Page 107:

However, the full problem of vision (and intelligence!) is still not solved. It is likely that these problems are not solvable with “classical” learning techniques. A reasonable bet is on novel neuroscience/cognition-inspired techniques:

• Recognition of thousands of objects at human level

• Scene understanding: the Turing test for Vision

)*#+,$,-2(D#"/7(H,-$,**+$,-(<,0IA

Page 108:

8$'(#59:;<"=5&">(9"+%#(5?(<$'(

@'&<*">(4<*'"9tomaso poggioMcGovern InstituteI2, CBCL,BCS, CSAILMIT

Very preliminary, provocative, possibly completely wrong, ...

Page 109:

Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
David Marr
Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis—in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing...

Vision: what is where

Page 110

Desimone & Ungerleider 1989

The ventral stream

Movshon et al.

Page 111

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the “complexity” of the preferred stimulus, and in “invariance” to position and scale changes

Kobatake & Tanaka, 1994


Page 112

*Modified from (Gross, 1998)

[software available online with CNS (for GPUs)]

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Recognition in the ventral stream:
the classical, “standard” model (of immediate recognition)

Page 113

[software available online] Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

• It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual.; Fukushima, 1980: quant.; Oram & Perrett, 1993: qual.; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing & Koerner, 2003; LeCun et al., 1998: not bio; Amit & Mascaro, 2003: not bio; Hinton, LeCun, Bengio: not bio; Deco & Rolls, 2006, ...)

• Hardwired invariance to translation and scale

• As a biological model of object recognition in the ventral stream -- from V1 to PFC -- it is perhaps the most quantitatively faithful to known neuroscience data

Recognition in visual cortex:
the “classical model”, selective and invariant

Page 114

V1:
• Simple and complex cell tuning (Schiller et al. 1976; Hubel & Wiesel 1965; De Valois et al. 1982)
• MAX-like operation in a subset of complex cells (Lampl et al. 2004)

V2:
• Subunits and their tuning (Anzai, Peng & Van Essen 2007)

V4:
• Tuning for two-bar stimuli (Reynolds, Chelazzi & Desimone 1999)
• MAX-like operation (Gawne et al. 2002)
• Two-spot interaction (Freiwald et al. 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001; Cadieu, Kouh, Connor et al. 2007)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al. 1996)

IT:
• Tuning and invariance properties (Logothetis et al. 1995, paperclip objects)
• Differential role of IT and PFC in categorization (Freedman et al. 2001, 2002, 2003)
• Read-out results (Hung, Kreiman, Poggio & DiCarlo 2005)
• Pseudo-average effect in IT (Zoccolan, Cox & DiCarlo 2005; Zoccolan, Kouh, Poggio & DiCarlo 2007)

Human:
• Rapid categorization (Serre, Oliva & Poggio 2007)
• Face processing (fMRI + psychophysics) (Riesenhuber et al. 2004; Jiang et al. 2006)

Hierarchical feedforward models: consistent with (or predict) neural data

Model “works”:
it accounts for bits of physiology & psychophysics

Page 115

Models of the ventral stream in cortex perform well compared to engineered computer vision systems (in 2006) on several databases

Bileschi, Wolf, Serre, Poggio, 2007

Model “works”:
it performs well at the computational level


Page 118

Models of cortex lead to better systems for action recognition in videos: automatic phenotyping of mice

Performance:
• human agreement: 72%
• proposed system: 77%
• commercial system: 61%
• chance: 12%

Jhuang, Garrote, Yu, Khilnani, Poggio, Mutch, Steele, Serre, Nature Communications, 2010

Model “works”:
it performs well at the computational level


Page 120

Recognition in visual cortex:
computation and mathematical theory

For 10+ years I did not manage to understand how the model works...

we need a theory -- not only a model!

Page 121

Why do hierarchical architectures work?

Page 122

• We have “factored out” viewpoint, scale, translation: is it now still “difficult” for a classifier to categorize dogs vs horses?

Leibo, 2011

What is the main difficulty of object recognition?
Geometric transformations or intraclass variability?

Page 123

Are transformations the main difficulty for biological object recognition?

Leibo, 2011

Page 124

Theory: 6 main ideas, 4 key theorems

1. The computational goal of the ventral stream is to learn transformations (affine in R^2) during development and be invariant to them (McCulloch & Pitts 1945; Hoffman 1966; Jack Gallant 1993; ...)

2. A memory-based module, similar to simple-complex cells, can learn from a video of an image transforming (e.g., translating) to provide a signature for any single image of any object that is invariant under the transformation (Invariance Lemma)

3. Since the group Aff(2,R) can be factorized as a semidirect product of translations and GL(2,R), the storage requirements of the memory-based module can be reduced by orders of magnitude (Factorization Theorem). I argue that this is the evolutionary reason for the hierarchical architecture of the ventral stream.
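In symbols (my rendering of the standard factorization, not a formula from the slides):

\[
\mathrm{Aff}(2,\mathbb{R}) \;\cong\; \mathbb{R}^{2} \rtimes GL(2,\mathbb{R}),
\qquad (A, t)\,x = A x + t,
\]
\[
(A_1, t_1)\,(A_2, t_2) \;=\; \bigl(A_1 A_2,\; A_1 t_2 + t_1\bigr)
\]

so every affine transformation splits into a linear part and a translation that can be sampled, stored and undone separately -- the structural fact the Factorization Theorem exploits.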

Page 125

4. Evolution selects translations by using small apertures in the first layer (the Stratification Theorem). The size of receptive fields in the sequence of ventral stream areas automatically selects specific transformations (translations, scaling, shear and rotations).

5. If a Hebb-like rule is active at synapses, then the storage requirements can be reduced further: the tuning of cells in each layer -- during development -- will mimic the spectrum (SVD) of the stored templates (Linking Theorem).

6. At the top layers, invariances are learned for “places” and for class-specific transformations, such as changes of expression of a face, rotation in depth of a face, or different poses of a body ===> patches for places and class-specific patches (faces, bodies, ...)

6 main ideas...

Page 126

Caveat: I am speaking about 2 separate stages:

1. learning (during development)
2. “run time” processing of images

Page 127

Part I: Memory-based invariance module

Page 128

Motivation: discrimination, not reconstruction (Johnson-Lindenstrauss)

a few measurements are needed for selectivity -- not so important which ones -- but far fewer than the number of pixels...
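A minimal numpy sketch of the Johnson-Lindenstrauss point (mine, not from the lecture): k random measurements, with k far smaller than the number of pixels d, approximately preserve all pairwise distances, which is all that discrimination needs.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10000, 200                       # 100 "images", 10000 pixels, 200 measurements
X = rng.standard_normal((n, d))
P = rng.standard_normal((k, d)) / np.sqrt(k)    # k random linear measurements
Y = X @ P.T

def pairwise(Z):
    # Euclidean distances between all rows of Z
    sq = (Z ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.sqrt(np.maximum(d2, 0.0))

iu = np.triu_indices(n, 1)
ratio = pairwise(Y)[iu] / pairwise(X)[iu]
print(ratio.min(), ratio.max())                 # both close to 1: distances survive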

Page 129

Example: a highly efficient signature for OCR

Page 130

Hierarchical features for selectivity or invariance?

V1 provides very discriminative signatures...

Page 131

Abstraction: learning by a memory-based invariance module

Page 132

A templatebook

time

Templatebooks

Page 133

Page 134

The invariance lemma
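A one-dimensional toy version of the lemma (a sketch under my own simplifications: circular shifts stand in for the transformation, dot products plus a max stand in for the simple-complex circuit):

import numpy as np

rng = np.random.default_rng(0)
d = 64
t = rng.standard_normal(d)
# templatebook: the module has "watched" the template t translate (all shifts)
book = np.stack([np.roll(t, s) for s in range(d)])

def signature(x):
    # simple cells: dot products with the stored frames; complex cell: max
    return (book @ x).max()

x = rng.standard_normal(d)
print(np.isclose(signature(x), signature(np.roll(x, 17))))   # True: shift-invariant

The signature of the shifted image equals the signature of the original -- invariance for any new x, even though no frame of x itself was ever stored.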

Page 135

Part II: Why evolution uses hierarchies and how

This is about evolution (mostly before development)

Page 136

Representation of the affine group

Page 137

Factorization lemma (why hierarchies)
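One back-of-the-envelope way to see the storage saving (my numbers, not the slides'): with N_T sampled translations and N_L sampled linear maps,

\[
\underbrace{N_T \cdot N_L}_{\text{flat templatebook}}
\quad\longrightarrow\quad
\underbrace{N_T + N_L}_{\text{two-layer hierarchy}}
\]

e.g. N_T = N_L = 10^3 means 10^6 stored frames for the flat memory versus 2 x 10^3 for the hierarchy -- the promised orders of magnitude.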

Page 138

Page 139

Page 140

Page 141

Stratification conjecture (why the size of receptive fields increases)

Stratification Conjecture: for aperture size increasing from zero (relative to the resolution of the filtered “image”), the first transformation that can be estimated above the measurement noise is translation.


Page 144

Page 145

Page 146

Page 147

Layered architectures: looking at neural images in the layer below, through small apertures corresponding to receptive fields, on a 2D lattice

Page 148

Page 149

Part III: Templatebooks and the development of cell tuning

Mostly the development stage

Page 150

Memory-based invariance module

Page 151

Simulations (Jim Mutch): SVD of the templatebook for “y” translations from natural images

Rust et al. 2005; Carandini

Page 152

Simulations (Jim Mutch): SVD of the templatebook for “x” translations from natural images

Page 153

Simulations (Jim Mutch): SVD of the templatebook for “y” translations from noise

Page 154

Simulations (Jim Mutch): SVD of the templatebook for rotations

Page 155

A cartoon of the S:C cell

Page 156

These are the predicted features, orthogonal to the trajectories of the Lie group: for translation (x, y), expansion, rotation

V1

V2/V4?
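Stated as a formula (my paraphrase of the orthogonality claim, not the slide's notation): a predicted feature w is orthogonal to the trajectory that a one-parameter transformation group g_theta traces through a template t,

\[
\Bigl\langle\, w,\ \frac{d}{d\theta}\bigl(g_{\theta}\, t\bigr)\Big|_{\theta=0} \Bigr\rangle \;=\; 0,
\qquad
g_{\theta} \in \{\text{translations } (x,y),\ \text{expansion},\ \text{rotation}\}
\]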

This is from Hoffman, 1966, cited by Jack Gallant.

Page 157

Towards V2 and V4???

Page 158

Why do we care about the spectrum of the templatebook, e.g. of the sequence of frames portraying transformations seen by S:C cells during development?

Page 159

This means that at run time the max is likely to choose the “correct” simple cell (corresponding to t) and provide the same value -- independently of T. Across complex cells this says that, with high probability, the signature is invariant to T from layer to layer (for uniform T).

Theory: Linking conjecture

Choose a learning rule for simple-cell synaptic plasticity (e.g., of the receptive field), such as Oja's update rule or Hebb's rule with normalized weights.

Theorem: the update rules above generate receptive fields reflecting the spectral properties of the templatebook.
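A minimal simulation of this claim (my sketch, not the course's code): Oja's update rule dw = eta * y * (x - y * w), with y = w . x, driven by the rows of a synthetic templatebook, converges to the templatebook's top singular (principal) direction.

import numpy as np

rng = np.random.default_rng(0)
# synthetic templatebook: 2000 frames in 50 dimensions, deliberately anisotropic
T = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 50))
T = (T - T.mean(0)) / T.std()

w = rng.standard_normal(50)
w /= np.linalg.norm(w)
eta = 1e-3
for _ in range(50):                    # epochs over the developmental "video"
    for x in T:
        y = w @ x
        w += eta * y * (x - y * w)     # Oja's rule: Hebbian term + implicit normalization

v1 = np.linalg.svd(T, full_matrices=False)[2][0]   # top right-singular vector of T
print(abs(w @ v1) / np.linalg.norm(w))             # close to 1: aligned with top SVD component

This is the sense in which a Hebb-like rule during development can replace explicit storage: the synaptic weights end up encoding the spectral content of what was seen.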

Page 160

Top layer/patch example

Page 161

Recognizing a face from a different viewpoint

View-tuned units, tuned to full-face templates for different view angles

Viewpoint-tolerant units (complex units)

Tolerance to a transformation may be learned from examples of a class

Joel+Jim+tp, 2010

Page 162

Learning class-specific transformations: viewpoint invariances for faces

Page 163

Some of the many implications

• Consistent with most of HMAX; extends it
• Not inconsistent with Simon Thorpe's TTFS mechanism; not inconsistent with hierarchies and compositionality (Yuille, Geman, ...)
• “Explains” the Heinrich/Wallis effect and more recent Li-DiCarlo single-cell plasticity
• Deprive a subject of transformations during development (stroboscopic environment)... clear predictions!
• Image priors, shape features, natural-image priors are “irrelevant”

Page 164

Collaborators in recent work

J. Mutch, J. Leibo, L. Isik, S. Ullman, S. Smale, L. Rosasco, H. Jhuang, C. Tan, N. Edelman, E. Meyers, R. Desimone, O. Lewis, S. Voinea

Also: M. Riesenhuber, T. Serre, S. Chikkerur, A. Wibisono, J. Bouvrie, M. Kouh, J. DiCarlo, E. Miller, C. Cadieu, A. Oliva, C. Koch, A. Caponnetto, D. Walther, U. Knoblich, T. Masquelier, S. Bileschi, L. Wolf, E. Connor, D. Ferster, I. Lampl, G. Kreiman, N. Logothetis

Jim Mutch
