TRANSCRIPT
Machine Learning 2D1431
Lecture 1: Introduction to Machine Learning
Question of the Day
• What is the next symbol in this series?
Machine Learning
• Lecturer: Frank Hoffmann, [email protected]
• Lab assistants:
  Mikael Huss, [email protected]
  Martin Rehn, [email protected]
Course Requirements
• Four mandatory labs
  • Location: Spelhallen, Sporthallen
  • Dates:
    • Lab 1: Thursday 14/11/02, 13-17
    • Lab 2: Thursday 21/11/02, 13-17
    • Lab 3: Thursday 28/11/02, 13-17
    • Lab 4: Thursday 5/12/02, 13-17
• Written exam
  • Location: L21-22
  • Date: 14/12/02, 8-13
Grading
• Exam grade:
  • U: 0-22p
  • 3: 23-28p
  • 4: 29-34p
  • 5: 35-40p
• Final grade:
  • To pass the course you need at least a 3 in the exam
  • For each lab presented in time you get 1.5 bonus points
  • Example: exam 25 points, 3 labs in time = 4.5 bonus points, total 29.5 points, final grade 4
Labs
• Preparation
  • Learn or refresh your knowledge of Matlab
  • Start at least 2 weeks before the lab
  • Read the lab instructions
  • Read the reference material
  • Complete the assignments, write the Matlab code, answer the questions
• Presentation
  • No more than two students per group
  • Both students need to understand the entire assignment and code
  • Book a time for presentation
  • Present results and code to the teaching assistant
Exam
• Exam
  • Theoretical questions
  • Small practical exercises
• Scope
  • It is not sufficient to just study the course book!
  • Attend lectures (lecture slides available)
  • Study the course book and read additional literature
  • Participate in the labs and complete the assignments
Course Information
• Course webpage: http://www.nada.kth.se/kurser/kth/2D1431/02/index.html
• Course newsgroup: news:nada.kurser.mi
• Course directory: /info/mi02
• Course module: course join mi02
• Course registration in RES: res checkin mi02
• NADA UNIX account: http://www.sgr.nada.kth.se/
Course Literature
Textbook (required):
• Machine Learning, Tom M. Mitchell, McGraw Hill, 1997. ISBN 0-07-115467-1 (paperback)
Additional literature:
• Reinforcement Learning – An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html
• Pattern Classification, 2nd edition, Richard O. Duda, Peter E. Hart, and David G. Stork
• Neural Networks – A Comprehensive Foundation, 2nd edition, Simon Haykin, Prentice-Hall, 1999
Matlab
• Labs in the course are based on Matlab
• Learn or refresh your knowledge of Matlab:
  • Matlab Primer, Kermit Sigmon
  • A Practical Introduction to Matlab, Mark S. Gockenbach
  • Matlab at Google: http://directory.google.com/Top/Science/Math/Software/MATLAB
Course Overview
• Introduction to machine learning
• Concept learning
• Decision trees
• Artificial neural networks
• Evolutionary algorithms
• Instance-based learning
• Reinforcement learning
• Bayesian learning
• Computational learning theory
• Fuzzy logic
• Machine learning in robotics
Software Packages & Datasets
• Machine Learning at Google: http://directory.google.com/Top/Computers/Artificial_Intelligence/Machine_Learning
• Matlab Toolbox for Pattern Recognition: http://www.ph.tn.tudelft.nl/~bob/PRTOOLS.html
• MIT GALIB in C++: http://lancet.mit.edu/ga
• Machine Learning Data Repository UC Irvine: http://www.ics.uci.edu/~mlearn/ML/Repository.html
Learning & Adaptation
• "Modification of a behavioral tendency by experience." (Webster 1984)
• "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
• "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
• "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)
Learning
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Disciplines relevant to ML
• Artificial intelligence
• Bayesian methods
• Control theory
• Information theory
• Computational complexity theory
• Philosophy
• Psychology and neurobiology
• Statistics
Applications of ML
• Learning to recognize spoken words
  • SPHINX (Lee 1989)
• Learning to drive an autonomous vehicle
  • ALVINN (Pomerleau 1989)
• Learning to classify celestial objects
  • (Fayyad et al. 1995)
• Learning to play world-class backgammon
  • TD-GAMMON (Tesauro 1992)
• Designing the morphology and control structure of electro-mechanical artefacts
  • GOLEM (Lipson, Pollack 2000)
ALVINN
• Automated driving at 70 mph on a public highway
[Figure: ALVINN network architecture — a 30x32 pixel camera image as input, 30x32 weights into each of the 4 hidden units, and 30 output units for steering.]
Artificial Life
• GOLEM Project (Nature: Lipson, Pollack 2000), http://demo.cs.brandeis.edu/golem
• Evolve simple electromechanical locomotion machines from basic building blocks (bars, actuators, artificial neurons) in a simulation of the physical world (gravity, friction).
• The individuals that demonstrate the best locomotion ability are fabricated through rapid prototyping technology.
[Movie: evolvable robot Golem]
Evolved Creatures
• Evolved creatures: Sims (1994), http://genarts.com/karl/evolved-virtual-creatures.html
• Darwinian evolution of virtual block creatures for swimming, jumping, following, and competing for a block
Learning
• Learning problems
  • Learning with a teacher
  • Learning with a critic
  • Unsupervised learning
• Learning tasks
  • Pattern association
  • Pattern recognition (classification)
  • Function approximation
  • Control
  • Filtering
Credit Assignment Problem
• The problem of assigning credit or blame for the overall outcomes to each of the internal decisions made by the learning machine that contributed to these outcomes
• Temporal credit assignment problem
  • Involves the instants of time when the actions that deserve credit were taken
• Structural credit assignment problem
  • Involves assigning credit to the internal structures of actions generated by the system
Learning with a Teacher
• Supervised learning
• Knowledge represented by a set of input-output examples (xi, yi)
• Minimize the error between the actual response of the learner and the desired response
[Figure: the environment presents state x to both the teacher and the learning system; a summing node forms the error signal as the teacher's desired response minus the learner's actual response.]
Learning with a Critic
• Learning through interaction with the environment
• Exploration of states and actions
• Feedback through a delayed primary reinforcement signal (temporal credit assignment problem)
• Goal: maximize accumulated future reinforcements
[Figure: the environment sends the state and a primary reinforcement signal to the critic, which converts it into a heuristic reinforcement signal for the learning system; the learning system acts back on the environment.]
Unsupervised Learning
• Self-organized learning
• No teacher or critic
• Task-independent quality measure
• Identify regularities in the data and discover classes automatically
• Competitive learning
[Figure: the learning system receives only the state from the environment.]
Pattern Recognition
• A pattern/signal is assigned to one of a prescribed number of classes/categories
[Figure: example images of grocery items — rice, raisins, soup, sugar, fanta, teabox.]
Object Recognition
• Goal: recognize objects in the image
• Input: cropped raw RGB image
• Decision: contains object, yes/no
• Training examples: images of the object in different poses and different backgrounds
• Possible features:
  • Raw image data
  • Color histograms (see the sketch below)
  • Spatial filters
  • Edge, corner detection
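Among the listed features, a color histogram is the easiest to make concrete. The Matlab sketch below is illustrative and not from the slides: the function name, the per-channel bin count, and the joint RGB binning are all assumptions.

    % Color histogram feature (illustrative sketch; save as colorHistogram.m).
    % img is an HxWx3 uint8 RGB image, nBins the number of levels per channel
    % (e.g. nBins = 8 gives an 8^3 = 512-dimensional feature vector).
    function h = colorHistogram(img, nBins)
    q = floor(double(img) / 256 * nBins);            % per-channel bin, 0..nBins-1
    idx = q(:,:,1)*nBins^2 + q(:,:,2)*nBins + q(:,:,3) + 1;  % joint bin, 1..nBins^3
    h = histc(idx(:), 1:nBins^3);                    % pixel count per joint bin
    h = h / sum(h);                                  % normalize to a distribution

Two views of the same object under similar lighting tend to give similar histograms, so h is a usable input vector for the classifiers discussed later.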
Function Approximation
• The goal is to approximate an unknown function d = f(x) such that the mapping F(x) realized by the learning system is close enough to f(x):
  |F(x) - f(x)| < ε for all x
• System identification and modeling: describe the input-output relationship of an unknown time-invariant multiple-input multiple-output system (a minimal least-squares sketch follows below)
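As a concrete, purely illustrative instance, the Matlab sketch below fits a polynomial F(x) by least squares to samples of an assumed target f(x) = sin(πx) and checks |F(x) - f(x)| on the samples; both the target function and the polynomial degree are assumptions, not from the slides.

    % Function approximation by least squares (illustrative sketch).
    x = linspace(-1, 1, 50)';     % sample points
    d = sin(pi*x);                % d = f(x), the "unknown" mapping
    p = polyfit(x, d, 5);         % learn F(x) as a degree-5 polynomial
    F = polyval(p, x);            % learner's response on the samples
    max(abs(F - d))               % check |F(x)-f(x)| < eps on the samples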
Pose Estimation from Images
• Goal: estimate the pose (orientation, position) of an object from its appearance
• Input: image data
• Output: 3-D pose (x, y, z, θ, ϕ, ψ)
• Training examples: pairs of images with known object pose
Control Learning
• Adjust the parameters of a controller such that the closed-loop control system demonstrates a desired behaviour.
[Figure: unity-feedback control loop — the error signal (reference signal minus plant output) drives the controller, whose output is the plant input.]
Control Learning
Learning to choose actions:
• Robot learns navigation and obstacle avoidance
• Learning to choose actions to optimize a factory output
• Learning to play Backgammon
Problem characteristics:
• Delayed reward instead of immediate reward for good or bad actions (temporal credit assignment problem)
• No supervised learning (no training examples in the form of correct state-action pairs)
• Learning with a critic
• Need for active exploration
Learning to play Backgammon
• State: board state
• Actions: possible moves
• Reward function:
  • +100 win
  • -100 lose
  • 0 for all other actions/states
• Trained by playing 1.5 million games against itself
• Now approximately equal to the best human player
• Link: http://www.research.ibm.com/massive/tdl.html
• Reading assignment: Tesauro (1995)
Reinforcement Learning
[Figure: agent-environment interaction loop — at each time step the agent observes state s_t and reward r_t and emits action a_t; the environment responds with s_{t+1} and r_{t+1}, generating the trajectory s0, a0, r1, s1, a1, r2, s2, a2, r3, s3, ...]
Goal: learn a policy a = π(s) which maximizes the accumulated future reward
R = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... = Σ_{i=0}^∞ γ^i r_{t+i}
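The return R is easy to check numerically. In the Matlab sketch below the reward sequence and the discount factor γ = 0.9 are illustrative assumptions:

    % Accumulated discounted reward R = sum_i gamma^i * r_(t+i) (sketch).
    r = [0 0 0 100];              % illustrative rewards r_t, r_(t+1), ...
    gamma = 0.9;                  % discount factor
    i = 0:numel(r)-1;
    R = sum(gamma.^i .* r)        % here R = 0.9^3 * 100 = 72.9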
Upswing of an Inverted Pendulum
• Reward r: +1000; penalty r: -1000
• State s:
  • angle ϕ
  • angular velocity ω
• Control action a:
  • left
  • right
  • brake
[Movie: upswing_1.MOV]
Learning Problem
Learning: improving with experience at some task
• Improve over task T
• With respect to performance measure P
• Based on experience E
Example: learn to play checkers
• T: play checkers
• P: percentage of games won in a tournament
• E: opportunity to play against itself
Learning to play checkers
• T: play checkers
• P: percentage of games won
• What experience?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?
Type of Training Experience
• Direct or indirect?
  • Direct: board state → correct move
  • Indirect: outcome of a complete game
  • Credit assignment problem
• Teacher or not?
  • Teacher selects board states
  • Learner can select board states
• Is the training experience representative of the performance goal?
  • Training: playing against itself
  • Performance: evaluated playing against the world champion
Choose Target Function
• ChooseMove: B → M (board state → move)
  • Maps a legal board state to a legal move
• Evaluate: B → V (board state → board value)
  • Assigns a numerical score to any given board state, such that better board states obtain a higher score
• Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score
Definition of Target Function
• If b is a final board state that is won, then V(b) = 100
• If b is a final board state that is lost, then V(b) = -100
• If b is a final board state that is drawn, then V(b) = 0
• If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game
• This gives correct values but is not operational
State Space Search
[Figure: our move — from board b, the moves m1: b→b1, m2: b→b2, m3: b→b3 lead to successor states, and V(b) = max_i V(b_i).]
[Figure: opponent's move — from board b1, the moves m4: b1→b4, m5: b1→b5, m6: b1→b6 lead to successor states, and V(b1) = min_i V(b_i).]
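Together the two slides describe a minimax search: maximize over our moves, minimize over the opponent's. A minimal recursive Matlab sketch follows; the helpers isFinal, finalValue, and successors are hypothetical placeholders for a concrete checkers implementation.

    % Minimax value of a board (sketch; save as minimax.m).
    % isFinal, finalValue and successors are hypothetical helper functions.
    function v = minimax(b, ourMove)
    if isFinal(b)
        v = finalValue(b);        % +/-100 for a win/loss, 0 for a draw
        return
    end
    succ = successors(b);         % cell array of boards reachable in one move
    vals = cellfun(@(bi) minimax(bi, ~ourMove), succ);
    if ourMove
        v = max(vals);            % our move: V(b) = max_i V(b_i)
    else
        v = min(vals);            % opponent's move: V(b) = min_i V(b_i)
    end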
Final Board States
• Black wins: V(b) = -100
• Red wins: V(b) = 100
• Draw: V(b) = 0
Number of Board States
Tic-Tac-Toe:
#board states < 9!/(5! 4! 0!) + 9!/(4! 4! 1!) + ... + 9!/(1! 1! 7!) + 9!/(1! 0! 8!) = 6045
(the sum of 9!/(x! o! e!) over the legal piece counts, with x crosses, o noughts, e = 9 - x - o empty squares, and x = o or x = o + 1)

4x4 checkers (no kings):
#board states = ?
#board states < (8 * 7 * 6 * 5 * 2^2) / (2! * 2!) = 1680

Regular checkers (8x8 board, 8 pieces each):
#board states < (32! * 2^16) / (8! * 8! * 16!) ≈ 5.07 * 10^17
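The regular-checkers bound can be verified directly in Matlab; writing 32!/(8! 8! 16!) as C(32,8)·C(24,8) avoids overflowing the factorials.

    % Verify the bound 32! * 2^16 / (8! * 8! * 16!) for regular checkers.
    n = nchoosek(32,8) * nchoosek(24,8) * 2^16;
    fprintf('#board states < %.3g\n', n)   % prints about 5.07e+17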
Representation of Target Function
• Table look-up
• Collection of rules
• Neural networks
• Polynomial function of board features
• Trade-off in choosing an expressive representation:
  • Approximation accuracy
  • Number of training examples required to learn the target function
Representation of Target Function
V(b) = ω0 + ω1 bp(b) + ω2 rp(b) + ω3 bk(b) + ω4 rk(b) + ω5 bt(b) + ω6 rt(b)
• bp(b): number of black pieces
• rp(b): number of red pieces
• bk(b): number of black kings
• rk(b): number of red kings
• bt(b): number of red pieces threatened by black
• rt(b): number of black pieces threatened by red
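Once the six features have been extracted from the board, the evaluation function is a single line of Matlab. In the sketch below the weight values and the feature vector are illustrative assumptions, and the feature extraction itself is not shown.

    % Linear evaluation V(b) = w0 + w1*bp + w2*rp + ... + w6*rt (sketch).
    w = [0 1 -1 3 -3 0.5 -0.5];   % [w0 w1 ... w6], illustrative weights
    f = [2 2 0 0 1 0];            % [bp rp bk rk bt rt] for some board b
    V = w(1) + w(2:end) * f'      % V(b) = w0 + sum_i wi*fi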
Obtaining Training Examples
• V(b): the true target function
• V'(b): the learned target function
• Vtrain(b): the training value
• Rule for estimating training values: Vtrain(b) ← V'(Successor(b))
Choose Weight Training Rule
LMS weight update rule:
Select a training example b at random.
1. Compute the error: error(b) = Vtrain(b) - V'(b)
2. For each board feature fi, update the weight: ωi ← ωi + η fi error(b)
(η: learning rate, approximately 0.1)
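One LMS step can be written out directly. The Matlab sketch below reproduces the second update of the 4x4 checkers example that follows (weights (-10, 75, -60), error -75), so all numbers are taken from the slides:

    % One LMS weight update for V(b) = w0 + w1*rp(b) + w2*bp(b).
    w    = [-10 75 -60];          % [w0 w1 w2]
    eta  = 0.1;                   % learning rate
    f_b3 = [1 2 2];               % features of b3: [1 rp(b3) bp(b3)]
    f_b4 = [1 1 2];               % features of b4: red has lost a piece
    err  = w*f_b4' - w*f_b3';     % Vtrain(b3) = V'(b4), so error(b3) = -75
    w    = w + eta * err * f_b3   % updated weights: [-17.5 60 -75]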
Example: 4x4 checkers
V(b) = ω0 + ω1 rp(b) + ω2 bp(b)
Initial weights: ω0 = -10, ω1 = 75, ω2 = -60
V(b0) = ω0 + ω1*2 + ω2*2 = 20
• m1: b0→b1, V(b1) = 20
• m2: b0→b2, V(b2) = 20
• m3: b0→b3, V(b3) = 20
Example: 4x4 checkers
V(b0) = 20, V(b1) = 20
1. Compute error(b0) = Vtrain(b0) - V(b0) = V(b1) - V(b0) = 0
2. For each board feature fi, update the weight ωi ← ωi + η fi error(b0):
ω0 ← ω0 + 0.1 * 1 * 0
ω1 ← ω1 + 0.1 * 2 * 0
ω2 ← ω2 + 0.1 * 2 * 0
Example: 4x4 checkers
[Figure: board diagrams of b0 and its successors, with V(b0) = 20, V(b1) = 20, V(b2) = 20, V(b3) = 20.]
Example: 4x4 checkers
[Figure: board diagrams — from b3 (V(b3) = 20), two possible successor states with V(b4a) = 20 and V(b4b) = -55.]
Example: 4x4 checkers
V(b3) = 20, V(b4) = -55
1. Compute error(b3) = Vtrain(b3) - V(b3) = V(b4) - V(b3) = -75
2. For each board feature fi, update the weight ωi ← ωi + η fi error(b3), starting from ω0 = -10, ω1 = 75, ω2 = -60:
ω0 ← ω0 - 0.1 * 1 * 75, giving ω0 = -17.5
ω1 ← ω1 - 0.1 * 2 * 75, giving ω1 = 60
ω2 ← ω2 - 0.1 * 2 * 75, giving ω2 = -75
Example: 4x4 checkers
[Figure: board diagrams with the updated weights ω0 = -17.5, ω1 = 60, ω2 = -75, giving V(b4) = -107.5 and V(b5) = -107.5.]
Example: 4x4 checkers
V(b5) = -107.5, V(b6) = -167.5
error(b5) = Vtrain(b5) - V(b5) = V(b6) - V(b5) = -60
With ω0 = -17.5, ω1 = 60, ω2 = -75 and ωi ← ωi + η fi error(b5):
ω0 ← ω0 - 0.1 * 1 * 60, giving ω0 = -23.5
ω1 ← ω1 - 0.1 * 1 * 60, giving ω1 = 54
ω2 ← ω2 - 0.1 * 2 * 60, giving ω2 = -87
Example: 4x4 checkers
Final board state: black won, so Vf(b6) = -100; the current estimate is V(b6) = -197.5
error(b6) = Vtrain(b6) - V(b6) = Vf(b6) - V(b6) = 97.5
With ω0 = -23.5, ω1 = 54, ω2 = -87 and ωi ← ωi + η fi error(b6):
ω0 ← ω0 + 0.1 * 1 * 97.5, giving ω0 = -13.75
ω1 ← ω1 + 0.1 * 0 * 97.5, giving ω1 = 54
ω2 ← ω2 + 0.1 * 2 * 97.5, giving ω2 = -67.5
Evolution of Value Function
[Figure: training data — the value function before and after training.]
Design Choices
• Determine type of training experience: games against experts / games against self / table of correct moves
• Determine target function: Board → Move / Board → Value
• Determine representation of learned function: polynomial / linear function of six features / artificial neural network
• Determine learning algorithm: gradient descent / linear programming
Learning Problem Examples
• Credit card applications
  • Task T: distinguish "good" applicants from "risky" applicants
  • Performance measure P: ?
  • Experience E: ? (direct/indirect)
  • Target function: ?
Performance Measure P:
• Error based: minimize the percentage of incorrectly classified customers: P = (Nfp + Nfn) / N
  • Nfp: number of false positives (rejected good customers)
  • Nfn: number of false negatives (accepted bad customers)
• Utility based: maximize the expected profit of the credit card business: P = Ncp Ucp + Nfn Ufn
  • Ucp: expected utility of an accepted good customer
  • Ufn: expected utility (loss) of an accepted bad customer
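With hypothetical counts the two measures can favour different classifiers, which is the point of the comparison. All numbers in the sketch below are assumptions:

    % Error-based vs. utility-based performance measure (hypothetical numbers).
    N = 1000; Ncp = 800;          % applicants; accepted good customers
    Nfp = 50; Nfn = 30;           % rejected good / accepted bad customers
    Ucp = 100; Ufn = -500;        % profit per good, loss per bad customer
    P_error   = (Nfp + Nfn) / N   % fraction misclassified: 0.08
    P_utility = Ncp*Ucp + Nfn*Ufn % expected profit: 65000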
Experience E:
• Direct: decisions on credit card applications made by a human financial expert
  Training data: <customer info, reject/accept>
• Direct: actual customer behavior of previously accepted customers
  Training data: <customer info, good/bad>
  Problem: the distribution of applicants, Papplicant, is not identical to the training distribution Ptrain
• Indirect: evaluate a decision policy based on the profit made over the past N years
Distribution of Applicants
[Figure: overlapping score distributions of good and bad customers; marked decision threshold Cw = 38.]
Assume we want to minimize the classification error: what is the optimal decision boundary?

Distribution of Accepted Customers
[Figure: score distributions of good and bad customers among previously accepted customers; marked decision threshold Cw = 43.]
What is the optimal decision boundary?
Target Function
Customer record:
income, owns house, credit history, age, employed, accept
$40000, yes, good, 38, full-time, yes
$25000, no, excellent, 25, part-time, no
$50000, no, poor, 55, unemployed, no
• T: customer data → accept/reject
• T: customer data → probability of being a good customer
• T: customer data → expected utility/profit
Learning Methods
• Decision rules:
  • If income < $30,000 then reject
• Bayesian network:
  • P(good | income, credit history, ...)
• Neural network
• Nearest neighbor (see the sketch below):
  • Take the same decision as for the customer in the database that is most similar to the applicant
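The nearest-neighbor rule can be sketched over the customer records from the previous slide, reduced to the numeric features income and age; the new applicant and the use of Euclidean distance are illustrative assumptions.

    % Nearest-neighbor decision over stored customer records (sketch).
    % In practice the features should be normalized so that income does
    % not dominate the distance.
    X = [40000 38; 25000 25; 50000 55];            % stored customers [income age]
    y = [1; 0; 0];                                 % 1 = accept, 0 = reject
    applicant = [42000 35];                        % illustrative new applicant
    d = sum((X - repmat(applicant, 3, 1)).^2, 2);  % squared Euclidean distances
    [dmin, nearest] = min(d);
    decision = y(nearest)                          % copy nearest customer's label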
Learning Problem Examples
• Obstacle avoidance behavior of a mobile robot
  • Task T: navigate the robot safely through an environment
  • Performance measure P: ?
  • Experience E: ?
  • Target function: ?
Performance Measure P:
• P: maximize time until collision with an obstacle
• P: maximize distance travelled until collision with an obstacle
• P: minimize rotational velocity, maximize translational velocity
• P: minimize the error between the control action of a human operator and of the robot controller in the same situation
Training Experience E:
• Direct: monitor a human operator and use her control actions as training data:
  E = {<perceptioni, actioni>}
• Indirect: operate the robot in the real world or in a simulation; reward desirable states, penalize undesirable states:
  • V(b) = +1 if v > 0.5 m/s
  • V(b) = +2 if ω < 10 deg/s
  • V(b) = -100 if bumper state = 1
• Question: internal or external reward?
Target Function
• Choose action:
  • A: perception → action
  • Sonar readings: s1(t) ... sn(t) → <v, ω>
• Evaluate perception/state:
  • V: s1(t) ... sn(t) → V(s1(t) ... sn(t))
  • Problem: states are only partially observable, therefore the world seems non-deterministic
  • Markov decision process: the successor state s(t+1) is a probabilistic function of the current state s(t) and action a(t)
• Evaluate state/action pairs:
  • V: s1(t) ... sn(t), a(t) → V(s1(t) ... sn(t), a(t))
Learning Methods
• Neural networks
  • Require direct training experience
• Reinforcement learning
  • Indirect training experience
• Evolutionary algorithms
  • Indirect training experience
Issues in Machine Learning
• What algorithms can approximate functions well, and when?
• How does the number of training examples influence accuracy?
• How does the complexity of the hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?