
Page 1: Class01

CBCL MIT

9.520, spring 2003

Statistical Learning Theory and Applications

Sayan Mukherjee and Ryan Rifkin + Tomaso Poggio + Alex Rakhlin

Page 2: Class01


Learning: Brains and Machines

Learning is the gateway to understanding the brain and to making intelligent machines.

Problem of learning: a focus for

o modern math
o computer algorithms
o neuroscience

Page 3: Class01


Learning theory + algorithms

Computational Neuroscience:

models + experiments

ENGINEERING APPLICATIONS

$\min_{f \in H} \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2$

• Information extraction (text, Web…)
• Computer vision and graphics
• Man-machine interfaces
• Bioinformatics (DNA arrays)
• Artificial Markets (society of learning agents)

Learning to recognize objects in visual cortex

Multidisciplinary Approach to Learning

Page 4: Class01


Class

Rules of the game:
• problem sets (2 + 1 optional)
• final project
• grading
• participation!

Web site: www.ai.mit.edu/projects/cbcl/courses/course9.520/index.html

Page 5: Class01


Overview of overview

o Supervised learning: the problem and how to frame it within classical math

o Examples of in-house applications

o Learning and the brain

Page 6: Class01


Learning from Examples

INPUT → f → OUTPUT

EXAMPLES
INPUT1 OUTPUT1
INPUT2 OUTPUT2
…
INPUTn OUTPUTn

Page 7: Class01


Learning from Examples: formal setting

Given a set of $\ell$ examples (past data)

$(x_1, y_1), (x_2, y_2), \ldots, (x_\ell, y_\ell)$

Question: find a function $f$ such that

$f(x) = \hat{y}$

is a good predictor of $y$ for a future input $x$

Page 8: Class01


Classical equivalent view: supervised learning as a problem of multivariate function approximation

[Figure: data points $(x, y)$ sampled from $f$, the approximation of $f$, and the function $f$ itself]

Generalization: estimating value of function where there are no data

Regression: function is real-valued
Classification: function is binary

Problem is ill-posed!

Page 9: Class01


J. S. Hadamard, 1865-1963

Well-posed problems

A problem is well-posed if its solution

• exists

• is unique

• is stable, i.e., depends continuously on the data

Inverse problems (tomography, radar, scattering…)

are typically ill-posed
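Hadamard's conditions can be made concrete in a small numerical experiment. The sketch below (an added illustration, not from the slides) interpolates 10 data points with a degree-9 polynomial: a solution exists and is unique, yet it is not stable — a 10⁻⁶ perturbation of the data changes the coefficients by orders of magnitude more. Adding a Tikhonov regularization term, the device the next slides build on, restores continuous dependence on the data.

```python
import numpy as np

# Ill-posedness in miniature: exact polynomial interpolation.
x = np.linspace(0.0, 1.0, 10)
A = np.vander(x, 10)                          # 10x10 Vandermonde system
y = np.sin(2.0 * np.pi * x)
rng = np.random.default_rng(0)
y_pert = y + 1e-6 * rng.standard_normal(10)   # tiny data perturbation

c = np.linalg.solve(A, y)
c_pert = np.linalg.solve(A, y_pert)
unstable = np.max(np.abs(c - c_pert))         # huge relative to the 1e-6 noise

# Tikhonov regularization: min ||Ac - y||^2 + lam ||c||^2
lam = 1e-6
reg = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
reg_pert = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y_pert)
stable = np.max(np.abs(reg - reg_pert))       # comparable to the noise level
```

The regularized solution trades a little fitting accuracy for stability — exactly the "well-posedness" the slides ask of a learning algorithm.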

Page 10: Class01


Two key requirements to solve the problem of learning from examples: well-posedness and consistency.

A standard way to learn from examples is empirical risk minimization over a hypothesis space H:

$\min_{f \in H} \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i))$

The problem is in general ill-posed and does not have a predictive (or consistent) solution. By choosing an appropriate hypothesis space H it can be made well-posed. Independently, it is also known that an appropriate hypothesis space can guarantee consistency.

We have recently proved a surprising theorem: consistency and well-posedness are equivalent, i.e., one implies the other.

Mukherjee, Niyogi, Poggio, 2002

Thus a stable solution is also predictive, and vice versa.

Page 11: Class01


A simple, “magic” algorithm ensures consistency and well-posedness

For a review, see Poggio and Smale, Notices of the AMS, 2003

Equation includes Regularization Networks (examples are Radial Basis Functions and Support Vector Machines)

$\min_{f \in H} \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2$

implies

$f(\mathbf{x}) = \sum_{i=1}^{\ell} c_i K(\mathbf{x}, \mathbf{x}_i)$
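For the square loss $V(y, f(x)) = (y - f(x))^2$ this "magic" algorithm becomes regularized least squares, the simplest Regularization Network: the representer theorem gives $f(x) = \sum_i c_i K(x, x_i)$, and the coefficients solve the linear system $(K + \lambda \ell I)c = y$. A minimal numerical sketch (the Gaussian kernel, test problem, and function names are mine, not from the course):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rls(X, y, lam, sigma):
    """Minimize (1/l) sum_i (y_i - f(x_i))^2 + lam ||f||_K^2 over the RKHS.
    Representer theorem: f(x) = sum_i c_i K(x, x_i), with (K + lam*l*I) c = y."""
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def predict(X_train, c, X_new, sigma):
    """Evaluate f(x) = sum_i c_i K(x, x_i) at new inputs."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

# noisy samples of a smooth target function
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 50)[:, None]
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
c = train_rls(X, y, lam=1e-3, sigma=0.3)
X_test = np.linspace(-0.9, 0.9, 25)[:, None]
max_err = np.max(np.abs(predict(X, c, X_test, sigma=0.3) - np.sin(3 * X_test[:, 0])))
```

Swapping in a different loss $V$ (e.g., the hinge loss) changes how the coefficients are computed but not the form of the solution — which is how SVMs fit in the same equation.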

Page 12: Class01


Classical framework but with a more general loss function

Girosi, Caprile, Poggio, 1990

The algorithm uses a quite general space of functions, or “hypotheses”: RKHSs. A generalization of the classical framework can provide a better measure of “loss” (for instance, for classification)…

$\min_{f \in H} \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2$

Page 13: Class01


Formally…

Page 14: Class01


…and can be “written” as the same type of network…

Equivalence to networks

[Network diagram: inputs $X_1 \ldots X_\ell$, one kernel unit $K$ per example, linear combination (+) giving $f$]

$f(\mathbf{x}) = \sum_{i=1}^{\ell} c_i K(\mathbf{x}, \mathbf{x}_i) + b$

Many different V lead to the same solution…

Page 15: Class01


Unified framework: RN, SVMR and SVMC

Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000

The equation includes Regularization Networks (e.g., Radial Basis Functions), Support Vector Machines (classification and regression), and some multilayer Perceptrons.

$\min_{f \in H} \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2$

Page 16: Class01


The theoretical foundations of statistical learning are becoming part of mainstream mathematics

Page 17: Class01


Theory summary

In the course we will introduce:
• Stability (well-posedness)
• Consistency (predictivity)
• RKHS hypothesis spaces
• Regularization techniques leading to RN and SVMs
• Generalization bounds based on stability
• Alternative bounds (VC and Vγ dimensions)
• Related topics, extensions beyond SVMs
• Applications
• A new key result

Page 18: Class01


Overview of overview

o Supervised learning: real math

o Examples of recent and ongoing in-house research on applications

o Learning and the brain

Page 19: Class01


Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT

Page 20: Class01


Learning from examples paradigm

[Diagram: Examples → Statistical Learning Algorithm; New sample → Prediction]

Bioinformatics application: predicting type of cancer from DNA chip signals

Page 21: Class01


Bioinformatics application: predicting type of cancer from DNA chips

New feature selection SVM:

Only 38 training examples, 7100 features

AML vs. ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects, of which 1 is an error.
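The scale of this problem — 38 examples against 7100 features — is why feature selection matters. As an illustrative sketch only (synthetic data and a simple per-gene signal-to-noise ranking; the actual CBCL feature-selection SVM is more sophisticated), one can rank genes by class separation and classify on the top 40:

```python
import numpy as np

# Synthetic stand-in for the DNA-chip setting: 38 examples, 7100
# features, only a few of which carry class signal.
rng = np.random.default_rng(0)
n, p, informative = 38, 7100, 40
y = np.array([1] * 19 + [-1] * 19)
X = rng.standard_normal((n, p))
X[:, :informative] += 0.8 * y[:, None]        # signal in a few genes

# rank genes by a signal-to-noise score between the two classes
mu_pos, mu_neg = X[y == 1].mean(0), X[y == -1].mean(0)
score = np.abs(mu_pos - mu_neg) / (X[y == 1].std(0) + X[y == -1].std(0))
top = np.argsort(score)[-40:]                 # keep the 40 top-ranked genes

# nearest-centroid linear rule on the selected genes (SVM stand-in)
Z = X[:, top]
w = Z[y == 1].mean(0) - Z[y == -1].mean(0)
b = w @ (Z[y == 1].mean(0) + Z[y == -1].mean(0)) / 2.0
acc = np.mean(np.sign(Z @ w - b) == y)
```

Even this crude ranking recovers a predictive subset, which is the point of the slide: with far more genes than patients, selection is what makes learning feasible.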

Page 22: Class01


Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT

Page 23: Class01


Face identification: example

An old view-based system: 15 views

Performance: 98% on a 68-person database. Beymer, 1995

Page 24: Class01


Face identification

[Diagram: Training Data → SVM Classifier; new face image → Feature extraction → Identification Result]

Bernd Heisele, Jennifer Huang, 2002

Page 25: Class01


Real-time detection and identification

Ho, Heisele, Poggio, 2000

Ported to Oxygen’s H21 (handheld device). Identification of rotated faces up to about 45º.

Robust against changes in illumination and background.

Frame rate of 15 Hz.

Weinstein, Ho, Heisele, Poggio, Steele, Agarwal, 2002

Page 26: Class01


Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT

Page 27: Class01


Learning Object Detection: Finding Frontal Faces…

Training Database: 1000+ Real, 3000+ VIRTUAL

50,000+ Non-Face Patterns

Sung, Poggio 1995

Page 28: Class01


Recent work on face detection

• Detection of faces in images

• Robustness against slight rotations in depth and image plane

• Full face vs. component-based classifier

Heisele, Pontil, Poggio, 2000

Page 29: Class01


The best existing system for face detection?

Test: CBCL set / 1,154 real faces / 14,579 non-faces / 132,365 shifted windows
Training: 2,457 synthetic faces (58x58) / 13,654 non-faces

[ROC plot: correct detection rate vs. false positives per number of windows, comparing the component classifier and the whole-face classifier]

Heisele, Serre, Poggio et al., 2000

Page 30: Class01


Trainable System for Object Detection: Pedestrian detection - Results

Papageorgiou and Poggio, 1998

Page 31: Class01


Trainable System for Object Detection: Pedestrian detection - Training

Papageorgiou and Poggio, 1998

Page 32: Class01


The system was tested in a test car (Mercedes)

Page 33: Class01


Page 34: Class01


System installed in experimental Mercedes

A fast version, integrated with a real-time obstacle detection system

MPEG

Constantine Papageorgiou


Page 36: Class01


People classification/detection: training the system

Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1300 features)

[Diagram: example image patterns (1848 patterns, 7189 patterns) feeding the pedestrian detection system]

Core learning algorithm: Support Vector Machine classifier

Page 37: Class01


An improved approach: combining component detectors

Mohan, Papageorgiou and Poggio, 1999

Page 38: Class01


Results

The system is capable of detecting partially occluded people

Page 39: Class01


System Performance

Combination systems (ACC) perform best.

All component-based systems perform better than full-body person detectors.

A. Mohan, C. Papageorgiou, T. Poggio

Page 40: Class01


Learning from Examples: Applications

Object identification, Object categorization, Image analysis, Graphics, Finance, Bioinformatics, …

INPUT → OUTPUT

Page 41: Class01


Image Analysis

IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE ESTIMATION

Bear (0° view)

Bear (45° view)

Page 42: Class01

NTT, Atsugi, January 2002

Computer vision: analysis of facial expressions

[Plot: estimated degree of mouth openness over video frames]

The main goal is to estimate basic facial parameters, e.g. degree of mouth openness, through learning. One of the main applications is video-speech fusion to improve speech recognition systems.

Kumar, Poggio, 2001

Page 43: Class01


Combining Top-Down Constraints and Bottom-Up Data

Learning Engine

SVM, RBF, etc

Morphable Model:

$\text{face} = a_1 \cdot \text{face}_1 + a_2 \cdot \text{face}_2 + a_3 \cdot \text{face}_3 + \cdots$

Kumar, Poggio, 2001

Page 44: Class01


The Three Stages

Face Detection → Localization of Facial Features → Analysis of Facial Parts

[Plot: estimated facial parameter over video frames]

For more details, see Appendix 2.

Page 45: Class01


Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT

Page 46: Class01


Image Synthesis

UNCONVENTIONAL GRAPHICS

= 0° view

= 45° view

Page 47: Class01


Supermodels

MPEG (Steve Lines)

Cc-cs.mpg

Page 48: Class01


A trainable system for TTVS

Input: Text

Output: photorealistic talking face uttering text

Tony Ezzat

Page 49: Class01


From 2D to a better 2D, and from 2D to 3D

Two extensions of our text-to-visual-speech (TTVS) system:

• morphing of 3D models of faces to output a 3D model of a speaking face (Blanz)

• learning facial appearance, dynamics and coarticulation (Ezzat)

Page 50: Class01


Towards 3D

3D face scans were collected. Neutral face shape was reconstructed from a single image, and animation transferred to the new face.

Blanz, Ezzat, Poggio

Page 51: Class01


Using the same basic learning techniques: Trainable Videorealistic Face Animation

(voice is real, video is synthetic)

Ezzat, Geiger, Poggio, SigGraph 2002

Page 52: Class01


Trainable Videorealistic Face Animation

[Diagram: phone stream (/SIL/ /B/ /AE/ /JH/ …) → trajectory synthesis → MMM, driven by phonetic models and image prototypes]

1. Learning: the system learns, from 4 minutes of video, the face appearance (Morphable Model) and the speech dynamics of the person.

2. Run time: for any speech input, the system provides as output a synthetic video stream.

Ezzat, Geiger, Poggio, SigGraph 2002

Page 53: Class01


Novel !!!: Trainable Videorealistic Face Animation

Ezzat, Poggio, 2002

Let us look at video!!!

Page 54: Class01


Blanz and Vetter, MPI, SigGraph ‘99

Reconstructed 3D Face Models from 1 image


Page 56: Class01


Learning from Examples: Applications

Object identification, Object categorization, Image analysis, Graphics, Finance, Bioinformatics, …

INPUT → OUTPUT

Page 57: Class01


Artificial Agents – learning algorithms – buy and sell stocks

[Diagram: all available market information (bid/ask prices, bid/ask sizes) → learning agent → public limit/market orders (buy/sell), with inventory control, user control/calibration, and a learning feedback loop]

Example of a subproject: The Electronic Market Maker

Nicholas Chang

Page 58: Class01


Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT

Page 59: Class01


Overview of overview

o Supervised learning: the problem and how to frame it within classical math

o Examples of in-house applications

o Learning and the brain

Page 60: Class01


The Ventral Visual Stream: From V1 to IT

modified from Ungerleider and Haxby, 1994

Hubel & Wiesel, 1959; Desimone, 1991

Page 61: Class01


Summary of “basic facts”

Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:

• Hierarchical build-up of invariances (first to translation and scale, then to viewpoint, etc.), of receptive-field size, and of the complexity of preferred stimuli

• Basic feed-forward processing of information (for “immediate” recognition tasks)

• Learning of an individual object generalizes to scale and position

Page 62: Class01


The standard model: following Hubel and Wiesel…

Learning (supervised)

Learning (unsupervised)

Riesenhuber & Poggio, Nature Neuroscience, 2000

Page 63: Class01


The standard model

Interprets or predicts many existing data in microcircuits and system physiology, and also in cognitive science:

• What some complex cells and V4 cells do and how: MAX…
• View-tuning of IT cells (Logothetis)
• Response to pseudo-mirror views
• Effect of scrambling
• Multiple objects
• Robustness to clutter
• Consistent with K. Tanaka’s simplification procedure
• Categorization tasks (cats vs. dogs)
• Invariance to translation, scale, etc.
• Gender classification
• Face inversion effect: experience, viewpoint, other-race, configural vs. featural representation
• Transfer of generalization
• No binding problem, no need for oscillations
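The MAX operation at the heart of the model can be sketched directly. The code below is a schematic of the idea, not the Riesenhuber–Poggio implementation: simple-cell-like units compute tuned responses of local image patches to stored templates, and complex-cell-like units take a MAX over position, so the output is unchanged when the preferred feature shifts in the image.

```python
import numpy as np

def simple_cells(image, templates):
    """S-type units: Gaussian-tuned response of every local patch to
    every stored template (tuning width fixed at 1 for simplicity)."""
    h = templates.shape[1]
    n = image.shape[0] - h + 1
    S = np.empty((len(templates), n, n))
    for t_idx, t in enumerate(templates):
        for i in range(n):
            for j in range(n):
                patch = image[i:i+h, j:j+h]
                S[t_idx, i, j] = np.exp(-np.sum((patch - t) ** 2))
    return S

def complex_cells(S):
    """C-type units: MAX over position for each template -- the pooling
    step that buys translation invariance in the model."""
    return S.max(axis=(1, 2))

rng = np.random.default_rng(0)
templates = rng.standard_normal((3, 4, 4))
img_a = np.zeros((12, 12)); img_a[3:7, 3:7] = templates[0]   # feature at one place
img_b = np.zeros((12, 12)); img_b[5:9, 5:9] = templates[0]   # same feature, shifted
c_a = complex_cells(simple_cells(img_a, templates))
c_b = complex_cells(simple_cells(img_b, templates))
```

The two C-layer response vectors are identical even though the feature moved, which is the sense in which MAX pooling supports invariant recognition.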

Page 64: Class01


Model’s early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995

Page 65: Class01


Recording Sites in Anterior IT

[Diagram: recording sites relative to sulci — LUN, LAT, IOS, STS, AMTS; Ho=0]

Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995

…neurons tuned to faces are close by…

Page 66: Class01


The Cortex: Neurons Tuned to Object Views, as predicted by the model

Logothetis, Pauls, Poggio, 1995

Page 67: Class01


View-tuned cells: scale invariance (one training view only)!

Scale Invariant Responses of an IT Neuron

[Figure: spike-rate histograms (spikes/sec, 0–76) vs. time (0–3000 msec) for one IT neuron, at stimulus sizes 1.0–6.25 deg (×0.4 to ×2.5 of the training size), showing scale-invariant responses]

Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995

Page 68: Class01


[Morph space with category boundary: 100% Cat prototypes — 80% Cat morphs — 60% Cat morphs | 60% Dog morphs — 80% Dog morphs — 100% Dog prototypes]

Joint Project (Max Riesenhuber) with Earl Miller & David Freedman (MIT): Neural Correlate of Categorization (NCC)

Define categories in morph space

Page 69: Class01


Categorization task

[Task diagram: Fixation → Sample → Delay → Test (Nonmatch) → Delay → Test (Match); epochs of 500, 600, and 1000 ms]

Train monkey on categorization task

After training, record from neurons in IT & PFC

Page 70: Class01


[Figure: firing rate (Hz) vs. time from sample stimulus onset (ms), through Fixation, Sample, Delay, and Choice epochs, for Dog 100/80/60% and Cat 100/80/60% morphs]

Single cell example: a PFC neuron that responds more strongly to DOGS than CATS

D. Freedman, E. Miller, M. Riesenhuber, T. Poggio (Science, 2001)

Page 71: Class01


Is this what’s going on in cortex?

HMAX

The model suggests the same computations for different recognition tasks (and objects!)

Task-specific units (e.g., PFC)

General representation (IT)