
Programmation : l'ère des spécifications, l'ère de l'apprentissage, l'ère du feedback
(Programming: the era of specifications, the era of learning, the era of feedback)

Michèle Sebag

TAO

AFIA - AFIHM, 2015


Revisiting the art of programming

- 1970s: Specifications, via languages & theorem proving
- 1990s: Programming by Examples, via pattern recognition & ML
- 2010s: Interactive Learning and Optimization
  - Optimizing coffee taste (Herdy, 96)
  - Visual rendering (Brochu et al., 10)
  - Choice query (Viappiani et al., 10)
  - Information retrieval (Joachims et al., 12)
  - Robotics (Akrour et al., 12; Wilson et al., 12; Knox et al., 13; Saxena et al., 13)

Programming with the Human in the Loop

Interaction, Learning, Optimization


Centennial + 3

Computing Machinery and Intelligence Turing 1950

... the problem is mainly one of programming.

brain estimates: 10^10 to 10^15 bits

Fruit fly: 10^5
Cockroach: 10^6
Cat: 10^9
Chimpanzee: 7 × 10^9
Elephant: 23 × 10^9
Human: 89 × 10^9

I can produce about a thousand digits of programme a day.

[Therefore] some more expeditious method seems desirable.

⇒ Machine Learning


Overview

Preamble

Machine Learning: All you need is...
- ...logic
- ...data
- ...optimization

All you need is the expert's feedback
- Reinforcement learning
- Programming by Feedback

Programming, An AI Frontier


ML: All you need is logic

Perception → Symbols → Reasoning → Symbols → Actions

Let’s forget about perception and actions for a while...

Symbols → Reasoning → Symbols

Requisites
- Strong representation
- Strong background knowledge
- Strong optimization tool


The Robot Scientist

King et al., 04, 11

Adam: generates hypotheses from background knowledge and experimental data, and designs experiments to confirm or infirm these hypotheses.
Eve: drug screening, hit confirmation, and cycles of QSAR hypothesis learning and testing.


ML: The logic era

So efficient
- Search: reuse constraint solving, graph pruning, ...

Requirements / Limitations
- Initial conditions: a critical mass of high-order knowledge
- ... and a unified search space
- Symbol grounding, noise

Of primary value: intelligibility
- A means: for debugging
- An end: to keep the expert involved.



All you needed was data

[Figure: map of modelling targets. Labels: astronomical series ("Pierre de Rosette" / Rosetta Stone), Maths., World, Data / Principles, natural phenomena, modelling, human-related phenomena, common sense, "You are here".]


All you need is big data

[Figure: the same map of modelling targets, with "Sc. data" (scientific data) in place of the astronomical series.]


Big data

IBM Watson defeats human champions at the quiz game Jeopardy

Orders of magnitude: 1000^i bytes for i = 1, ..., 8: kilo, mega, giga, tera, peta, exa, zetta, yotta.

- Google: 24 petabytes/day
- Facebook: 10 terabytes/day; Twitter: 7 terabytes/day
- Large Hadron Collider: 40 terabytes/second


The Higgs boson ML Challenge

Balázs Kégl, Cécile Germain et al.

https://www.kaggle.com/c/higgs-boson (September 15th, 2014)


ML: All you need is optimization

Old times

- Find the best hypothesis
- Find the best optimization criterion (see the example below):
  - statistically sound
  - a well-posed optimization problem
  - tractable
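As a worked illustration, here is a textbook criterion (regularized empirical risk minimization, not spelled out in the slides) that meets all three requirements: the averaged loss makes it statistically sound, the regularizer makes it well-posed, and with a convex loss it is tractable.

```latex
% Illustrative criterion (standard textbook form, not from the slides):
% given examples (x_i, y_i), a convex loss \ell and a trade-off weight \lambda > 0,
\hat{h} \;=\; \arg\min_{h \in \mathcal{H}}\;
   \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(h(x_i),\, y_i\bigr) \;+\; \lambda\,\|h\|^{2}
```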


SVMs and Deep Learning

Episode 1

- NNs are universal approximators, ...
- ... but their training yields non-convex optimization problems
- ... and some cannot reproduce the results of some others...


SVMs and Deep Learning

Episode 2

- At last, SVMs arrive! (Vapnik 92; Cortes & Vapnik 95)
- Principle (illustrated below):
  - minimize ||h||^2
  - subject to constraints on h(x), e.g.
    y_i · h(x_i) > 1,   |h(x_i) − y_i| < ε,   h(x_i) < h(x′_i),   h(x_i) > 1, ...
- Convex optimization! (well, except for the hyper-parameters)
- More sophisticated optimization (alternating schemes, upper bounds)... (Hastie 04; Bach 04; Srebro 11; ...)
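A minimal sketch of the classification instance of this principle, minimizing ||h||^2 subject to y_i · h(x_i) ≥ 1 up to slack variables. It assumes scikit-learn and synthetic data, neither of which is part of the slides.

```python
# Illustrative sketch only: a soft-margin linear SVM (scikit-learn is an assumed library).
# It minimizes ||w||^2 + C * sum(slack_i) subject to y_i * (w . x_i + b) >= 1 - slack_i.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),   # class -1 cloud
               rng.normal(+1.0, 0.5, (50, 2))])  # class +1 cloud
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1.0)    # C trades margin width against constraint violations
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margins = y * (X @ w + b)            # y_i * h(x_i); >= 1 for points outside the margin
print("smallest margin:", margins.min(), "#support vectors:", len(clf.support_))
```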


SVMs and Deep Learning

Episode 3
- Did you forget our AI goal? (learning ↔ learning representations)
- At last, deep learning arrives!

Principle
- We always knew that many-layered NNs offered compact representations (Håstad 87)
- But, so many local optima! (poor optima)
- Breakthrough: unsupervised layer-wise learning (Hinton 06; Bengio 06); see the sketch below
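A minimal sketch of the greedy layer-wise idea: each layer is first trained, without labels, to reconstruct its own input, and the pre-trained stack can then be fine-tuned with supervision. PyTorch, the sizes, and the hyper-parameters are assumptions for illustration, not the slides' setup.

```python
# Illustrative sketch of greedy unsupervised layer-wise pre-training (autoencoder style),
# in the spirit of Hinton 06 / Bengio 06. Library, sizes and hyper-parameters are made up.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 64)                 # unlabelled data: 256 samples, 64 features
layer_sizes = [64, 32, 16]              # stack of increasingly compact representations

encoders, inputs = [], X
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Linear(d_out, d_in)        # decoder used only while pre-training this layer
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(200):                # train the layer to reconstruct its own input
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = enc(inputs).detach()       # the learned codes feed the next layer

pretrained_stack = nn.Sequential(*encoders)   # ready for supervised fine-tuning
print(pretrained_stack(X).shape)              # torch.Size([256, 16])
```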


SVMs and Deep Learning

From prototypes to features
- n prototypes → n regions
- n features → 2^n regions (a small count is worked out below)
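A tiny worked count (not from the slides): n nearest-prototype cells versus the 2^n regions that n binary features can distinguish.

```python
# Illustrative count: n prototype cells vs. the 2^n regions indexed by n binary features.
from itertools import product

n = 4
prototype_regions = n                                    # one cell per prototype
feature_regions = len(set(product([0, 1], repeat=n)))    # all distinct n-bit codes
print(prototype_regions, feature_regions)                # 4 16
```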


SVMs and Deep Learning

Last Deep news
- Supervised training works, after all (Glorot & Bengio 10)
- Does not need to be deep, after all (Ciresan et al. 13; Caruana 13)
  - Ciresan et al.: use prior knowledge (non-linear invariance operators) to generate new examples
  - Caruana: use a deep NN to label hosts of examples; use them to train a shallow NN
- SVMers' view: the main thing is linear learning complexity

Take-home message
- It works
- But why?
- Intelligibility?


Interactive optimization

Optimizing the coffee taste (Herdy et al., 96)

Black-box optimization: F : Ω → IR, find arg max F.
The user in the loop replaces F.

Optimizing visual rendering (Brochu et al., 07)

Optimal recommendation sets (Viappiani & Boutilier, 10)

Information retrieval (Shivaswamy & Joachims, 12)


Interactive optimization

Features
- Search space X ⊂ IR^d (a recipe x: 33% arabica, 25% robusta, etc.)
- A non-computable objective
- The expert can (by tasting) emit preferences x ≺ x′

Scheme (a minimal sketch follows)
1. The algorithm generates candidates x, x′, x″, ...
2. The expert emits preferences
3. Go to 1.

Issues
- Ask as few questions as possible (≠ active ranking)
- Model the expert's taste (surrogate model)
- Enforce the exploration vs. exploitation trade-off
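A minimal sketch of this scheme under simplifying assumptions (a linear surrogate of the expert's taste, candidates generated around the current best, one simulated preference query per round); it is an illustration, not the cited methods.

```python
# Illustrative interactive-optimization loop: generate candidates, ask one preference,
# update a linear surrogate of the (hidden) taste, and explore around the incumbent.
import numpy as np

rng = np.random.default_rng(1)
d = 3                                    # recipe features, e.g. % arabica, % robusta, grind
w_hidden = np.array([1.0, -0.5, 0.2])    # the expert's taste, simulated for the demo

def expert_prefers_first(x, x2):
    return float(w_hidden @ x) > float(w_hidden @ x2)

w_hat, best = np.zeros(d), rng.random(d)  # surrogate model and incumbent recipe
for step in range(30):
    cand = np.clip(best + rng.normal(0.0, 0.2, d), 0.0, 1.0)   # explore near the incumbent
    if expert_prefers_first(cand, best):                       # one query per round
        w_hat += cand - best              # perceptron-style update toward the winner
        best = cand
    else:
        w_hat += best - cand

print("estimated taste direction:", np.round(w_hat / np.linalg.norm(w_hat), 2))
print("true taste direction:     ", np.round(w_hidden / np.linalg.norm(w_hidden), 2))
```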


Reinforcement Learning

Generalities
- An agent, spatially and temporally situated
- A stochastic and uncertain environment
- Goal: select an action at each time step,
- ... in order to maximize the expected cumulative reward over a time horizon

What is learned? A policy (= strategy): state → action


Reinforcement Learning, formal background

Notations
- State space S
- Action space A
- Transition probabilities p(s, a, s′) ∈ [0, 1]
- Reward r(s)
- Discount factor 0 < γ < 1

Goal: a policy π mapping states onto actions,

π : S → A

such that it maximizes

E[π | s_0] = expected discounted cumulative reward
           = r(s_0) + Σ_t γ^(t+1) p(s_t, a = π(s_t), s_{t+1}) r(s_{t+1})


Find the treasure

Single reward: on the treasure.


Wandering robot

Nothing happens...


The robot finds it


Robot updates its value function

V(s, a) = the "distance" to the treasure along this trajectory.


Reinforcement learning
- The robot most often selects a = arg max_a V(s, a)
- ... and sometimes explores (selects another action)
- Lucky exploration: it finds the treasure again

(A minimal tabular sketch of this value-update loop follows.)
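A minimal tabular sketch of such a value-learning loop with ε-greedy exploration on a toy corridor; the environment, constants, and update rule are illustrative assumptions, not the slides' exact setting.

```python
# Illustrative tabular value learning with epsilon-greedy exploration on a tiny corridor
# whose last cell holds the single reward (the "treasure"). Details are made up.
import numpy as np

rng = np.random.default_rng(0)
n_states, actions = 6, (-1, +1)          # move left or right along the corridor
V = np.zeros((n_states, len(actions)))   # V(s, a), the learned value table
gamma, alpha, eps = 0.9, 0.5, 0.1

for episode in range(200):
    s = 0
    while s != n_states - 1:             # the treasure sits in the last state
        if rng.random() < eps:
            a = int(rng.integers(len(actions)))                      # explore
        else:
            a = int(rng.choice(np.flatnonzero(V[s] == V[s].max())))  # greedy, ties at random
        s_next = int(np.clip(s + actions[a], 0, n_states - 1))
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Temporal-difference update toward reward + discounted value of the next state
        V[s, a] += alpha * (r + gamma * V[s_next].max() - V[s, a])
        s = s_next

print(np.round(V.max(axis=1), 2))   # values grow as the robot gets closer to the treasure
```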


Updates the value function

The value function tells how far you are from the treasure, given the trajectories known so far.


Finally

The value function tells how far you are from the treasure.


Finally

Let's be greedy: select the action maximizing the value function.


Reinforcement learning

Three tasks
- Learn values
- Learn the transition model
- Explore

Issues
- The exploration / exploitation dilemma
- Representation, approximation, scaling up
- Rewards: the designer's duty


Relaxing Expertise Requirements


Relaxing Expertise Requirements in RL

Expert (vertical axis: Expertise)
- Associates a reward with each state (RL)
- Demonstrates a (nearly) optimal behavior (Inverse RL)
- Compares and revises agent demonstrations (Co-active PL)
- Compares demonstrations (Preference PL, PF)

Agent (vertical axis: Autonomy)
- Computes the optimal policy based on rewards (RL)
- Imitates the expert's demonstration verbatim (IRL)
- Imitates and modifies (IRL)
- Learns the expert's utility (IRL, CPL)
- Learns, and selects the demonstrations (CPL, PPL, PF)
- Accounts for the expert's mistakes (PF)


Programming by feedback

Akrour et al., 14

Loop
1. The computer presents the expert with a pair of behaviors y_1, y_2
2. The expert emits a preference between y_1 and y_2
3. The computer learns the expert's utility function ⟨w, y⟩
4. The computer searches for behaviors with the best utility

Critical issues
- Ask few questions
- Be robust w.r.t. noise (the expert makes mistakes and changes his mind); a minimal sketch follows
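A minimal sketch of step 3, estimating a linear utility ⟨w, y⟩ from noisy pairwise preferences with a logistic (Bradley-Terry style) update. The simulated expert, the random candidate pairs, and the learning rate are illustrative assumptions; in particular, the active choice of which pair to show (asking few questions) is not modelled here.

```python
# Illustrative utility learning from noisy pairwise feedback (not the authors' algorithm).
import numpy as np

rng = np.random.default_rng(2)
d = 5
w_true = rng.normal(size=d)                 # the expert's hidden utility, simulated
w_true /= np.linalg.norm(w_true)

def noisy_expert_prefers_first(y1, y2, p_mistake=0.1):
    truth = w_true @ (y1 - y2) > 0
    return truth if rng.random() > p_mistake else not truth

w = np.zeros(d)                             # the learned utility vector
for query in range(100):
    y1, y2 = rng.normal(size=d), rng.normal(size=d)   # two candidate behavior descriptions
    z = w @ (y1 - y2)                                  # current preference margin
    target = 1.0 if noisy_expert_prefers_first(y1, y2) else 0.0
    prob_first = 1.0 / (1.0 + np.exp(-z))              # model's P(y1 preferred)
    w += 0.3 * (target - prob_first) * (y1 - y2)       # gradient step on the log-likelihood

cosine = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true) + 1e-12)
print("alignment with the true utility:", round(float(cosine), 2))
```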


Programming by Feedback

Ingredients

- Modelling the expert's competence
- Learning the expert's utility
- Selecting the next best behaviors: which optimization criterion, and how to optimize it


Modelling the expert’s competence

Noise model: δ ∼ U[0, M].
Given the preference margin z = ⟨w*, y − y′⟩,

P(y ≺ y′ | w*, δ) = 0 if z < −δ;  1 if z > δ;  (1 + z)/2 otherwise

[Figure: probability of error as a function of the preference margin z; it is 1/2 at z = 0 and vanishes outside [−δ, δ].]
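A direct encoding of this noise model as recovered above (the ramp (1 + z)/2 is a reconstruction from the garbled slide and may differ slightly from the original): deterministic answers outside [−δ, δ], noisy answers inside, with competence δ drawn uniformly in [0, M].

```python
# Illustrative encoding of the (reconstructed) noise model; constants are examples only.
import numpy as np

def preference_probability(z, delta):
    """P(y < y' | w*, delta) for preference margin z = <w*, y - y'>."""
    if z < -delta:
        return 0.0
    if z > delta:
        return 1.0
    return (1.0 + z) / 2.0          # noisy regime: a pure guess when the margin is zero

def simulated_expert(z, M, rng):
    """Draw the expert's competence delta ~ U[0, M], then a noisy preference."""
    delta = rng.uniform(0.0, M)
    return rng.random() < preference_probability(z, delta)

rng = np.random.default_rng(0)
print(preference_probability(0.0, 0.5))                                  # 0.5
print(np.mean([simulated_expert(0.3, 1.0, rng) for _ in range(10_000)])) # well above 0.5
```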


Experimental validation

- Sensitivity to expert competence: simulated expert, grid world
- Continuous case, no generative model: the cart-pole
- Continuous case, generative model: the bicycle
- Training in situ: the Nao robot


Sensitivity to (simulated) expert incompetence

Grid world: discrete case, no generative model. 25 states, 5 actions, horizon 300, 50% transition noise.

M_E: expert incompetence. M_A (≥ M_E): the computer's estimate of the expert's incompetence.

[Figure: the true weight vector w* on the grid world; cell values decrease from 1 through 1/2, 1/4, ..., down to 1/256.]


Sensitivity to simulated expert incompetence, 2

[Figure: two panels as a function of the number of queries (0 to 60). Left: true utility of x_t. Right: expert error rate. Curves for pairs (M_E, M_A) with M_A ≥ M_E and M_E, M_A ∈ {.25, .5, 1}.]

Two notions
- The true human competence
- The learner's confidence in the human competence

What is best: trusting a (mildly) competent human, or (mildly) distrusting a competent human?


Sensitivity to simulated expert incompetence, 3

[Figure: the same two panels (true utility of x_t, and expert error rate, vs. number of queries).]

A cumulative (dis)advantage phenomenon: the number of the expert's mistakes increases as the computer underestimates the expert's competence.

For low M_A, the computer learns faster and submits more relevant demonstrations to the expert, thus priming a virtuous educational process.


Conclusion

Feasibility of Programming by Feedback for simple tasks

Back on track:

One could carry through the organization of an intelligent machine with only two interfering inputs, one for pleasure or reward, and the other for pain or punishment. (Turing, 1948)


Programming by Feedback

About interaction: as a designer; as a user
- No need to debug if you can just say "No!" and the computer reacts (appropriately).
- I had a dream: a world where I don't need to read the fucking manual...


Future: Tackling the Under-Specified

- Knowledge-constrained
- Computation- and memory-constrained


Acknowledgments

Riad Akrour, Marc Schoenauer, Alexandre Constantin, Jean-Christophe Souplet


Related

- Percy Liang, Michael I. Jordan, and Dan Klein. Learning programs: A hierarchical Bayesian approach. ICML 2010.
- Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. ACM SIGPLAN Notices, 2011.
- Dianhuan Lin et al. Bias reformulation for one-shot function induction. ECAI 2014.
