TRANSCRIPT
Machine Learning 2D1431
Lecture 1: Introduction to Machine Learning
Question of the Day
• What is the next symbol in this series?
Machine Learning
• Lecturer: Frank Hoffmann, [email protected]
• Lab assistants:
  Mikael Huss, [email protected]
  Martin Rehn, [email protected]
Course Requirements
• Four mandatory labs
  • Location: Spelhallen, Sporthallen
  • Dates:
    • Lab 1: Thursday 14/11/02, 13-17
    • Lab 2: Thursday 21/11/02, 13-17
    • Lab 3: Thursday 28/11/02, 13-17
    • Lab 4: Thursday 5/12/02, 13-17
• Written exam
  • Location: L21-22
  • Date: 14/12/02, 8-13
Grading
• Exam grade:
  • U: 0-22p
  • 3: 23-28p
  • 4: 29-34p
  • 5: 35-40p
• Final grade:
  • To pass the course you need at least a 3 in the exam
  • For each lab presented in time you get 1.5 bonus points
  • Example: exam 25 points, 3 labs in time = 4.5 bonus points, total 29.5 points, final grade 4
Labs
• Preparation
  • Learn or refresh your knowledge of Matlab
  • Start at least 2 weeks before the lab
  • Read the lab instructions
  • Read the reference material
  • Complete the assignments, write the Matlab code, answer the questions
• Presentation
  • No more than two students per group
  • Both students need to understand the entire assignment and code
  • Book a time for presentation
  • Present results and code to the teaching assistant
Exam
• Exam
  • Theoretical questions
  • Small practical exercises
• Scope
  • It is not sufficient to just study the course book!
  • Attend lectures (lecture slides available)
  • Study the course book and read additional literature
  • Participate in the labs and complete the assignments
Course Information
• Course webpage: http://www.nada.kth.se/kurser/kth/2D1431/02/index.html
• Course newsgroup: news:nada.kurser.mi
• Course directory: /info/mi02
• Course module: course join mi02
• Course registration in RES: res checkin mi02
• NADA UNIX account: http://www.sgr.nada.kth.se/
Course Literature
Textbook (required):
• Machine Learning, Tom M. Mitchell, McGraw Hill, 1997. ISBN 0-07-115467-1 (paperback)
Additional literature:
• Reinforcement Learning – An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html
• Pattern Classification, 2nd edition, Richard O. Duda, Peter E. Hart, and David G. Stork
• Neural Networks – A Comprehensive Foundation, 2nd edition, Simon Haykin, Prentice-Hall, 1999
Matlab
• Labs in the course are based on Matlab
• Learn or refresh your knowledge of Matlab:
  • Matlab Primer, Kermit Sigmon
  • A Practical Introduction to Matlab, Mark S. Gockenbach
  • Matlab at Google: http://directory.google.com/Top/Science/Math/Software/MATLAB
Course Overview
• Introduction to machine learning
• Concept learning
• Decision trees
• Artificial neural networks
• Evolutionary algorithms
• Instance-based learning
• Reinforcement learning
• Bayesian learning
• Computational learning theory
• Fuzzy logic
• Machine learning in robotics
Software Packages & Datasets
• Machine Learning at Google: http://directory.google.com/Top/Computers/Artificial_Intelligence/Machine_Learning
• Matlab Toolbox for Pattern Recognition: http://www.ph.tn.tudelft.nl/~bob/PRTOOLS.html
• MIT GALIB in C++: http://lancet.mit.edu/ga
• Machine Learning Data Repository UC Irvine: http://www.ics.uci.edu/~mlearn/ML/Repository.html
Learning & Adaptation
• "Modification of a behavioral tendency by experience." (Webster 1984)
• "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
• "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
• "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)
Learning
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Disciplines relevant to ML
• Artificial intelligence
• Bayesian methods
• Control theory
• Information theory
• Computational complexity theory
• Philosophy
• Psychology and neurobiology
• Statistics
Applications of ML
• Learning to recognize spoken words
  • SPHINX (Lee 1989)
• Learning to drive an autonomous vehicle
  • ALVINN (Pomerleau 1989)
• Learning to classify celestial objects
  • (Fayyad et al. 1995)
• Learning to play world-class backgammon
  • TD-GAMMON (Tesauro 1992)
• Designing the morphology and control structure of electro-mechanical artefacts
  • GOLEM (Lipson, Pollack 2000)
ALVINN
• Automated driving at 70 mph on a public highway
[Figure: ALVINN network architecture — a 30x32 pixel camera image as input, 30x32 weights into each of the 4 hidden units, and 30 output units for steering.]
Artificial Life
• GOLEM Project (Nature: Lipson, Pollack 2000), http://demo.cs.brandeis.edu/golem
• Evolve simple electromechanical locomotion machines from basic building blocks (bars, actuators, artificial neurons) in a simulation of the physical world (gravity, friction).
• The individuals that demonstrate the best locomotion ability are fabricated through rapid prototyping technology.
[Movie: evolvable robot Golem]
Evolved Creatures
• Evolved creatures: Sims (1994), http://genarts.com/karl/evolved-virtual-creatures.html
• Darwinian evolution of virtual block creatures for swimming, jumping, following, and competing for a block
Learning
• Learning problems
  • Learning with a teacher
  • Learning with a critic
  • Unsupervised learning
• Learning tasks
  • Pattern association
  • Pattern recognition (classification)
  • Function approximation
  • Control
  • Filtering
Credit Assignment Problem
• The problem of assigning credit or blame for the overall outcomes to each of the internal decisions made by the learning machine that contributed to these outcomes
• Temporal credit assignment problem
  • Involves the instants of time when the actions that deserve credit were taken
• Structural credit assignment problem
  • Involves assigning credit to the internal structures of actions generated by the system
Learning with a Teacher
• Supervised learning
• Knowledge represented by a set of input-output examples (xi, yi)
• Minimize the error between the actual response of the learner and the desired response
[Figure: the environment presents state x to both the teacher and the learning system; a summing node forms the error signal as the teacher's desired response minus the learner's actual response.]
Learning with a Critic
• Learning through interaction with the environment
• Exploration of states and actions
• Feedback through a delayed primary reinforcement signal (temporal credit assignment problem)
• Goal: maximize accumulated future reinforcements
[Figure: the environment sends the state and a primary reinforcement signal to the critic, which converts it into a heuristic reinforcement signal for the learning system; the learning system acts back on the environment.]
Unsupervised Learning
• Self-organized learning
• No teacher or critic
• Task-independent quality measure
• Identify regularities in the data and discover classes automatically
• Competitive learning
[Figure: the learning system receives only the state from the environment.]
Pattern Recognition
• A pattern/signal is assigned to one of a prescribed number of classes/categories
[Figure: example images of grocery items — rice, raisins, soup, sugar, fanta, teabox.]
Object Recognition
• Goal: recognize objects in the image
• Input: cropped raw RGB image
• Decision: contains object, yes/no
• Training examples: images of the object in different poses and different backgrounds
• Possible features:
  • Raw image data
  • Color histograms (see the sketch below)
  • Spatial filters
  • Edge, corner detection
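Among the listed features, a color histogram is the easiest to make concrete. The Matlab sketch below is illustrative and not from the slides: the function name, the per-channel bin count, and the joint RGB binning are all assumptions.

    % Color histogram feature (illustrative sketch; save as colorHistogram.m).
    % img is an HxWx3 uint8 RGB image, nBins the number of levels per channel
    % (e.g. nBins = 8 gives an 8^3 = 512-dimensional feature vector).
    function h = colorHistogram(img, nBins)
    q = floor(double(img) / 256 * nBins);            % per-channel bin, 0..nBins-1
    idx = q(:,:,1)*nBins^2 + q(:,:,2)*nBins + q(:,:,3) + 1;  % joint bin, 1..nBins^3
    h = histc(idx(:), 1:nBins^3);                    % pixel count per joint bin
    h = h / sum(h);                                  % normalize to a distribution

Two views of the same object under similar lighting tend to give similar histograms, so h is a usable input vector for the classifiers discussed later.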
Function Approximation
• The goal is to approximate an unknown function d = f(x) such that the mapping F(x) realized by the learning system is close enough to f(x):
  |F(x) - f(x)| < ε for all x
• System identification and modeling: describe the input-output relationship of an unknown time-invariant multiple-input multiple-output system (a minimal least-squares sketch follows below)
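As a concrete, purely illustrative instance, the Matlab sketch below fits a polynomial F(x) by least squares to samples of an assumed target f(x) = sin(πx) and checks |F(x) - f(x)| on the samples; both the target function and the polynomial degree are assumptions, not from the slides.

    % Function approximation by least squares (illustrative sketch).
    x = linspace(-1, 1, 50)';     % sample points
    d = sin(pi*x);                % d = f(x), the "unknown" mapping
    p = polyfit(x, d, 5);         % learn F(x) as a degree-5 polynomial
    F = polyval(p, x);            % learner's response on the samples
    max(abs(F - d))               % check |F(x)-f(x)| < eps on the samples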
Pose Estimation from Images
• Goal: estimate the pose (orientation, position) of an object from its appearance
• Input: image data
• Output: 3-D pose (x, y, z, θ, ϕ, ψ)
• Training examples: pairs of images with known object pose
Control Learning
• Adjust the parameters of a controller such that the closed-loop control system demonstrates a desired behaviour.
[Figure: unity-feedback control loop — the error signal (reference signal minus plant output) drives the controller, whose output is the plant input.]
Control Learning
Learning to choose actions:
• Robot learns navigation and obstacle avoidance
• Learning to choose actions to optimize a factory output
• Learning to play Backgammon
Problem characteristics:
• Delayed reward instead of immediate reward for good or bad actions (temporal credit assignment problem)
• No supervised learning (no training examples in the form of correct state-action pairs)
• Learning with a critic
• Need for active exploration
Learning to play Backgammon
• State: board state
• Actions: possible moves
• Reward function:
  • +100 win
  • -100 lose
  • 0 for all other actions/states
• Trained by playing 1.5 million games against itself
• Now approximately equal to the best human player
• Link: http://www.research.ibm.com/massive/tdl.html
• Reading assignment: Tesauro (1995)
Reinforcement Learning
[Figure: agent-environment interaction loop — at each time step the agent observes state s_t and reward r_t and emits action a_t; the environment responds with s_{t+1} and r_{t+1}, generating the trajectory s0, a0, r1, s1, a1, r2, s2, a2, r3, s3, ...]
Goal: learn a policy a = π(s) which maximizes the accumulated future reward
R = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... = Σ_{i=0}^∞ γ^i r_{t+i}
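The return R is easy to check numerically. In the Matlab sketch below the reward sequence and the discount factor γ = 0.9 are illustrative assumptions:

    % Accumulated discounted reward R = sum_i gamma^i * r_(t+i) (sketch).
    r = [0 0 0 100];              % illustrative rewards r_t, r_(t+1), ...
    gamma = 0.9;                  % discount factor
    i = 0:numel(r)-1;
    R = sum(gamma.^i .* r)        % here R = 0.9^3 * 100 = 72.9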
Upswing of an Inverted Pendulum
• Reward r: +1000; penalty r: -1000
• State s:
  • angle ϕ
  • angular velocity ω
• Control action a:
  • left
  • right
  • brake
[Movie: upswing_1.MOV]
Learning Problem
Learning: improving with experience at some task
• Improve over task T
• With respect to performance measure P
• Based on experience E
Example: learn to play checkers
• T: play checkers
• P: percentage of games won in a tournament
• E: opportunity to play against itself
Learning to play checkers
• T: play checkers
• P: percentage of games won
• What experience?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?
Type of Training Experience
• Direct or indirect?
  • Direct: board state → correct move
  • Indirect: outcome of a complete game
  • Credit assignment problem
• Teacher or not?
  • Teacher selects board states
  • Learner can select board states
• Is the training experience representative of the performance goal?
  • Training: playing against itself
  • Performance: evaluated playing against the world champion
Choose Target Function
• ChooseMove: B → M (board state → move)
  • Maps a legal board state to a legal move
• Evaluate: B → V (board state → board value)
  • Assigns a numerical score to any given board state, such that better board states obtain a higher score
• Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score
Definition of Target Function
• If b is a final board state that is won, then V(b) = 100
• If b is a final board state that is lost, then V(b) = -100
• If b is a final board state that is drawn, then V(b) = 0
• If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game
• This gives correct values but is not operational
State Space Search
[Figure: our move — from board b, the moves m1: b→b1, m2: b→b2, m3: b→b3 lead to successor states, and V(b) = max_i V(b_i).]
[Figure: opponent's move — from board b1, the moves m4: b1→b4, m5: b1→b5, m6: b1→b6 lead to successor states, and V(b1) = min_i V(b_i).]
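Together the two slides describe a minimax search: maximize over our moves, minimize over the opponent's. A minimal recursive Matlab sketch follows; the helpers isFinal, finalValue, and successors are hypothetical placeholders for a concrete checkers implementation.

    % Minimax value of a board (sketch; save as minimax.m).
    % isFinal, finalValue and successors are hypothetical helper functions.
    function v = minimax(b, ourMove)
    if isFinal(b)
        v = finalValue(b);        % +/-100 for a win/loss, 0 for a draw
        return
    end
    succ = successors(b);         % cell array of boards reachable in one move
    vals = cellfun(@(bi) minimax(bi, ~ourMove), succ);
    if ourMove
        v = max(vals);            % our move: V(b) = max_i V(b_i)
    else
        v = min(vals);            % opponent's move: V(b) = min_i V(b_i)
    end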
Final Board States
• Black wins: V(b) = -100
• Red wins: V(b) = 100
• Draw: V(b) = 0
Number of Board States
Tic-Tac-Toe:
#board states < 9!/(5! 4! 0!) + 9!/(4! 4! 1!) + ... + 9!/(1! 1! 7!) + 9!/(1! 0! 8!) = 6045
(the sum of 9!/(x! o! e!) over the legal piece counts, with x crosses, o noughts, e = 9 - x - o empty squares, and x = o or x = o + 1)

4x4 checkers (no kings):
#board states = ?
#board states < (8 * 7 * 6 * 5 * 2^2) / (2! * 2!) = 1680

Regular checkers (8x8 board, 8 pieces each):
#board states < (32! * 2^16) / (8! * 8! * 16!) ≈ 5.07 * 10^17
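The regular-checkers bound can be verified directly in Matlab; writing 32!/(8! 8! 16!) as C(32,8)·C(24,8) avoids overflowing the factorials.

    % Verify the bound 32! * 2^16 / (8! * 8! * 16!) for regular checkers.
    n = nchoosek(32,8) * nchoosek(24,8) * 2^16;
    fprintf('#board states < %.3g\n', n)   % prints about 5.07e+17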
Representation of Target Function
• Table look-up
• Collection of rules
• Neural networks
• Polynomial function of board features
• Trade-off in choosing an expressive representation:
  • Approximation accuracy
  • Number of training examples required to learn the target function
Representation of Target Function
V(b) = ω0 + ω1 bp(b) + ω2 rp(b) + ω3 bk(b) + ω4 rk(b) + ω5 bt(b) + ω6 rt(b)
• bp(b): number of black pieces
• rp(b): number of red pieces
• bk(b): number of black kings
• rk(b): number of red kings
• bt(b): number of red pieces threatened by black
• rt(b): number of black pieces threatened by red
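Once the six features have been extracted from the board, the evaluation function is a single line of Matlab. In the sketch below the weight values and the feature vector are illustrative assumptions, and the feature extraction itself is not shown.

    % Linear evaluation V(b) = w0 + w1*bp + w2*rp + ... + w6*rt (sketch).
    w = [0 1 -1 3 -3 0.5 -0.5];   % [w0 w1 ... w6], illustrative weights
    f = [2 2 0 0 1 0];            % [bp rp bk rk bt rt] for some board b
    V = w(1) + w(2:end) * f'      % V(b) = w0 + sum_i wi*fi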
Obtaining Training Examples
• V(b): the true target function
• V'(b): the learned target function
• Vtrain(b): the training value
• Rule for estimating training values: Vtrain(b) ← V'(Successor(b))
Choose Weight Training Rule
LMS weight update rule:
Select a training example b at random.
1. Compute the error: error(b) = Vtrain(b) - V'(b)
2. For each board feature fi, update the weight: ωi ← ωi + η fi error(b)
(η: learning rate, approximately 0.1)
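One LMS step can be written out directly. The Matlab sketch below reproduces the second update of the 4x4 checkers example that follows (weights (-10, 75, -60), error -75), so all numbers are taken from the slides:

    % One LMS weight update for V(b) = w0 + w1*rp(b) + w2*bp(b).
    w    = [-10 75 -60];          % [w0 w1 w2]
    eta  = 0.1;                   % learning rate
    f_b3 = [1 2 2];               % features of b3: [1 rp(b3) bp(b3)]
    f_b4 = [1 1 2];               % features of b4: red has lost a piece
    err  = w*f_b4' - w*f_b3';     % Vtrain(b3) = V'(b4), so error(b3) = -75
    w    = w + eta * err * f_b3   % updated weights: [-17.5 60 -75]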
Example: 4x4 checkers
V(b) = ω0 + ω1 rp(b) + ω2 bp(b)
Initial weights: ω0 = -10, ω1 = 75, ω2 = -60
V(b0) = ω0 + ω1*2 + ω2*2 = 20
• m1: b0→b1, V(b1) = 20
• m2: b0→b2, V(b2) = 20
• m3: b0→b3, V(b3) = 20
Example: 4x4 checkers
V(b0) = 20, V(b1) = 20
1. Compute error(b0) = Vtrain(b0) - V(b0) = V(b1) - V(b0) = 0
2. For each board feature fi, update the weight ωi ← ωi + η fi error(b0):
ω0 ← ω0 + 0.1 * 1 * 0
ω1 ← ω1 + 0.1 * 2 * 0
ω2 ← ω2 + 0.1 * 2 * 0
Example: 4x4 checkers
[Figure: board diagrams of b0 and its successors, with V(b0) = 20, V(b1) = 20, V(b2) = 20, V(b3) = 20.]
Example: 4x4 checkers
[Figure: board diagrams — from b3 (V(b3) = 20), two possible successor states with V(b4a) = 20 and V(b4b) = -55.]
Example: 4x4 checkers
V(b3) = 20, V(b4) = -55
1. Compute error(b3) = Vtrain(b3) - V(b3) = V(b4) - V(b3) = -75
2. For each board feature fi, update the weight ωi ← ωi + η fi error(b3), starting from ω0 = -10, ω1 = 75, ω2 = -60:
ω0 ← ω0 - 0.1 * 1 * 75, giving ω0 = -17.5
ω1 ← ω1 - 0.1 * 2 * 75, giving ω1 = 60
ω2 ← ω2 - 0.1 * 2 * 75, giving ω2 = -75
Example: 4x4 checkers
[Figure: board diagrams with the updated weights ω0 = -17.5, ω1 = 60, ω2 = -75, giving V(b4) = -107.5 and V(b5) = -107.5.]
Example: 4x4 checkers
V(b5) = -107.5, V(b6) = -167.5
error(b5) = Vtrain(b5) - V(b5) = V(b6) - V(b5) = -60
With ω0 = -17.5, ω1 = 60, ω2 = -75 and ωi ← ωi + η fi error(b5):
ω0 ← ω0 - 0.1 * 1 * 60, giving ω0 = -23.5
ω1 ← ω1 - 0.1 * 1 * 60, giving ω1 = 54
ω2 ← ω2 - 0.1 * 2 * 60, giving ω2 = -87
Example: 4x4 checkers
Final board state: black won, so Vf(b6) = -100; the current estimate is V(b6) = -197.5
error(b6) = Vtrain(b6) - V(b6) = Vf(b6) - V(b6) = 97.5
With ω0 = -23.5, ω1 = 54, ω2 = -87 and ωi ← ωi + η fi error(b6):
ω0 ← ω0 + 0.1 * 1 * 97.5, giving ω0 = -13.75
ω1 ← ω1 + 0.1 * 0 * 97.5, giving ω1 = 54
ω2 ← ω2 + 0.1 * 2 * 97.5, giving ω2 = -67.5
Evolution of Value Function
[Figure: training data — the value function before and after training.]
Design Choices
• Determine type of training experience: games against experts / games against self / table of correct moves
• Determine target function: Board → Move / Board → Value
• Determine representation of learned function: polynomial / linear function of six features / artificial neural network
• Determine learning algorithm: gradient descent / linear programming
Learning Problem Examples
• Credit card applications
  • Task T: distinguish "good" applicants from "risky" applicants
  • Performance measure P: ?
  • Experience E: ? (direct/indirect)
  • Target function: ?
Performance Measure P:
• Error based: minimize the percentage of incorrectly classified customers: P = (Nfp + Nfn) / N
  • Nfp: number of false positives (rejected good customers)
  • Nfn: number of false negatives (accepted bad customers)
• Utility based: maximize the expected profit of the credit card business: P = Ncp Ucp + Nfn Ufn
  • Ucp: expected utility of an accepted good customer
  • Ufn: expected utility (loss) of an accepted bad customer
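With hypothetical counts the two measures can favour different classifiers, which is the point of the comparison. All numbers in the sketch below are assumptions:

    % Error-based vs. utility-based performance measure (hypothetical numbers).
    N = 1000; Ncp = 800;          % applicants; accepted good customers
    Nfp = 50; Nfn = 30;           % rejected good / accepted bad customers
    Ucp = 100; Ufn = -500;        % profit per good, loss per bad customer
    P_error   = (Nfp + Nfn) / N   % fraction misclassified: 0.08
    P_utility = Ncp*Ucp + Nfn*Ufn % expected profit: 65000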
Experience E:
• Direct: decisions on credit card applications made by a human financial expert
  Training data: <customer info, reject/accept>
• Direct: actual customer behavior of previously accepted customers
  Training data: <customer info, good/bad>
  Problem: the distribution of applicants, Papplicant, is not identical to the training distribution Ptrain
• Indirect: evaluate a decision policy based on the profit made over the past N years
Distribution of Applicants
[Figure: overlapping score distributions of good and bad customers; marked decision threshold Cw = 38.]
Assume we want to minimize the classification error: what is the optimal decision boundary?

Distribution of Accepted Customers
[Figure: score distributions of good and bad customers among previously accepted customers; marked decision threshold Cw = 43.]
What is the optimal decision boundary?
Target Function
Customer record:
income, owns house, credit history, age, employed, accept
$40000, yes, good, 38, full-time, yes
$25000, no, excellent, 25, part-time, no
$50000, no, poor, 55, unemployed, no
• T: customer data → accept/reject
• T: customer data → probability of being a good customer
• T: customer data → expected utility/profit
Learning Methods
• Decision rules:
  • If income < $30,000 then reject
• Bayesian network:
  • P(good | income, credit history, ...)
• Neural network
• Nearest neighbor (see the sketch below):
  • Take the same decision as for the customer in the database that is most similar to the applicant
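The nearest-neighbor rule can be sketched over the customer records from the previous slide, reduced to the numeric features income and age; the new applicant and the use of Euclidean distance are illustrative assumptions.

    % Nearest-neighbor decision over stored customer records (sketch).
    % In practice the features should be normalized so that income does
    % not dominate the distance.
    X = [40000 38; 25000 25; 50000 55];            % stored customers [income age]
    y = [1; 0; 0];                                 % 1 = accept, 0 = reject
    applicant = [42000 35];                        % illustrative new applicant
    d = sum((X - repmat(applicant, 3, 1)).^2, 2);  % squared Euclidean distances
    [dmin, nearest] = min(d);
    decision = y(nearest)                          % copy nearest customer's label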
Learning Problem Examples
• Obstacle avoidance behavior of a mobile robot
  • Task T: navigate the robot safely through an environment
  • Performance measure P: ?
  • Experience E: ?
  • Target function: ?
Performance Measure P:
• P: maximize time until collision with an obstacle
• P: maximize distance travelled until collision with an obstacle
• P: minimize rotational velocity, maximize translational velocity
• P: minimize the error between the control action of a human operator and of the robot controller in the same situation
Training Experience E:
• Direct: monitor a human operator and use her control actions as training data:
  E = {<perceptioni, actioni>}
• Indirect: operate the robot in the real world or in a simulation; reward desirable states, penalize undesirable states:
  • V(b) = +1 if v > 0.5 m/s
  • V(b) = +2 if ω < 10 deg/s
  • V(b) = -100 if bumper state = 1
• Question: internal or external reward?
Target Function
• Choose action:
  • A: perception → action
  • Sonar readings: s1(t) ... sn(t) → <v, ω>
• Evaluate perception/state:
  • V: s1(t) ... sn(t) → V(s1(t) ... sn(t))
  • Problem: states are only partially observable, therefore the world seems non-deterministic
  • Markov decision process: the successor state s(t+1) is a probabilistic function of the current state s(t) and action a(t)
• Evaluate state/action pairs:
  • V: s1(t) ... sn(t), a(t) → V(s1(t) ... sn(t), a(t))
Learning Methods
• Neural networks
  • Require direct training experience
• Reinforcement learning
  • Indirect training experience
• Evolutionary algorithms
  • Indirect training experience
Issues in Machine Learning
• What algorithms can approximate functions well, and when?
• How does the number of training examples influence accuracy?
• How does the complexity of the hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?