8/3/2019 L16 Machine Learning
http://slidepdf.com/reader/full/l16-machine-learning 1/38
241-320 Design Architecture and Engineering for Intelligent System
Suntorn Witosurapot
Contact Address: Phone 074 287369 or Email [email protected]
January 2010
Lecture 16:
Machine Learning - Part 1-
(Learning from Observations)
Motivation
An AI agent operating in a complex world requires an awful lot of knowledge:
– state representations, constraints, action descriptions, heuristics, probabilities, ...
More and more, AI agents are designed to acquire knowledge through learning
Outline
What is Learning?
Learning Agents
Introduction to inductive learning
Logic-based inductive learning:
– Decision-tree induction
Function-based inductive learning
– Neural nets
What’s Learning?
Learning is essential for unknown environments
– i.e. when the designer lacks omniscience
Learning is useful as a system construction method
– i.e. expose the agent to reality rather than trying to write it all down
Learning modifies the agent's decision mechanisms to improve performance
Learning agents
Learning agents (cont.)
Main idea:
– agents should use their percepts not only for acting, but also for improving their future ability to act
Wide range of methods
The major design issue is the type of feedback that will be available to the agent
Types of learning from feedback
Supervised learning
– Given a set of example inputs and outputs
– Goal is to learn a function relating the two
– (e.g., the Decision Tree technique)
Unsupervised learning
– Given inputs, but no outputs
– Goal is to group inputs into different classes
– (separating the data into groups, e.g., the Nearest Neighbor technique)
Examples
Supervised learning
– Taxi learning to brake with instructor
– Spam filter
Unsupervised learning
– Market research
– Data mining
Other factors affecting learning
Representation of learned information
Availability of prior knowledge
Inductive Learning
Inductive learning is learning from events or feedback: the learner knows only part of the data or truth values, yet tries to find or estimate the true (or a close) value of the remaining data.
Ex: how a salesperson learns:
– study the customer's behavior, personality, and interests while demonstrating a product
– and thereby infer the customer's real needs, so as to close the sale successfully
Inductive learning
Simplest form: learn a function from examples
Let's call an example a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x
Pure inductive inference:
– Given a collection of examples (a.k.a. a training set) of f, return a function h that approximates f
– h is called a hypothesis
How can we tell if a hypothesis is good?
Example
Construct/adjust h to agree with f on training set
(h is consistent if it agrees with f on all examples)
E.g., curve fitting:
Q: How do we decide among these hypotheses that all agree with our data?
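The problem can be seen concretely with curve fitting. This is a sketch with made-up data, where NumPy's `polyfit` stands in for "construct/adjust h": every degree fits the training set progressively better, and the degree-4 polynomial agrees with all five points exactly (a consistent h), yet nothing here tells us which hypothesis will generalize.

```python
import numpy as np

# Toy training set of (x, f(x)) pairs (hypothetical data, for illustration only).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Fit hypotheses h of increasing complexity. With 5 points, a degree-4
# polynomial can pass through every point, i.e. h is consistent with f.
errors = {}
for degree in (1, 2, 4):
    h = np.poly1d(np.polyfit(xs, ys, degree))
    errors[degree] = float(np.sum((h(xs) - ys) ** 2))
    print(f"degree {degree}: sum-squared training error = {errors[degree]:.6f}")
```

All three hypotheses "agree with the data" to varying degrees; the next slides discuss how to choose among them.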
What we desire from a hypothesis
Since we will most often use the hypothesis h to predict the output of f(x) on examples we haven't seen yet, we want it to do well on these. We call this generalization.
Ideally, we would like to find an h such that h = f
Tradeoff: complexity vs data-fit
Generally, the larger and more complex the hypothesis is, the better we can fit our data
However, we need to take into account the computational complexity of learning
– Fitting straight lines = easy
– Fitting high-degree polynomials = harder
Also want to take into account how hard it is to use h
– Prefer fast computation time
Learning typically focuses on "simple" representations
Logic-Based Inductive Learning: Decision Tree Method
It is a supervised learning technique
– used to predict or forecast upcoming events under various situations, based on decisions made by following a tree-structured layout of the data
Widely used algorithm (even in our daily life)
Structure of a decision tree:
– Root & leaves connected by branches
– Which branch is searched depends on the situation
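The structure just described (root and leaves connected by branches, with the branch taken depending on the situation) can be sketched as nested dictionaries. This is a hypothetical mini-tree for the restaurant example, not the lecture's full tree:

```python
# A decision tree as nested dicts: internal nodes test an attribute and
# branch on its value; leaves hold the final decision.
tree = {
    "attribute": "Patrons",
    "branches": {
        "None": "No",                      # leaf: don't wait
        "Some": "Yes",                     # leaf: wait
        "Full": {                          # subtree: test another attribute
            "attribute": "Hungry",
            "branches": {"Yes": "Yes", "No": "No"},
        },
    },
}

def classify(node, example):
    """Follow the branch matching the example's attribute value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

print(classify(tree, {"Patrons": "Full", "Hungry": "Yes"}))  # -> Yes
```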
Ex: A More Complex Decision Tree
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Attribute-based representations
Examples described by attribute values (Boolean, discrete,
continuous), e.g. situations where I will/won't wait for a table:
Classification of examples is positive (T) or negative (F)
Decision trees
One possible representation for hypotheses
E.g., here is the “true” tree for deciding whether to wait:
Decision trees
Another possible representation for hypotheses
This decision tree looks less complex and more realistic than the one on the previous slide
– It considers hungriness rather than the estimated waiting time
Decision trees
Occam’s razor: prefer the simplest hypothesis consistent with data
By the criterion of "Occam's razor" above, the smallest decision tree should be the best one
But the process of building a decision tree becomes much more complex as the number of nodes grows (the hypothesis space expands)
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
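The count 2^(2^n) can be checked directly: each Boolean function assigns one output bit to each of the 2^n truth-table rows.

```python
# Number of distinct Boolean functions of n attributes: one output bit per
# truth-table row, 2**n rows, hence 2**(2**n) possible functions (= trees).
def num_boolean_functions(n: int) -> int:
    return 2 ** (2 ** n)

print(num_boolean_functions(6))  # 18446744073709551616, matching the slide
```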
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf:
Trivially, there is a consistent decision tree for any training set with one path to a leaf for each example, but it probably won't generalize to new examples
Prefer to find more compact decision trees
Decision tree learning
Aim: find a small tree consistent with the training examples
Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree
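The recursive idea can be sketched as follows. This is a minimal sketch, not the full DTL algorithm: `purity` is a crude stand-in for choosing the "most significant" attribute, rather than the information-gain measure the lecture develops with entropy.

```python
from collections import Counter

def dtl(examples, attributes):
    """examples: list of (attribute-dict, label) pairs."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:            # all examples agree: make a leaf
        return labels[0]
    if not attributes:                   # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: purity(examples, a))
    tree = {"attribute": best, "branches": {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        rest = [a for a in attributes if a != best]
        tree["branches"][value] = dtl(subset, rest)   # recurse on each subset
    return tree

def purity(examples, attr):
    # Number of single-class subsets the split produces (crude stand-in for IG).
    score = 0
    for value in {x[attr] for x, _ in examples}:
        subset_labels = {y for x, y in examples if x[attr] == value}
        score += (len(subset_labels) == 1)
    return score

print(dtl([({"Patrons": "Some"}, "T"), ({"Patrons": "None"}, "F")], ["Patrons"]))
```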
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
Patrons? is a better choice
Choosing an attribute via Information Theory
To implement Choose-Attribute in the DTL algorithm, we need the information content (entropy):
I(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)
– the average entropy of a dataset = the sum, over each value, of −P(vi) log2 P(vi)
This entropy value is used to assess whether the "information content" of different datasets is similar or different
– it helps decide whether similar branches can be pruned when converting a data table into a tree
– see the coin-toss example on the next slide
Ex: coin-toss data
Dataset (M) = {heads, tails}
Probabilities of heads and tails = P(heads), P(tails) respectively
Average entropy of this dataset = I(M)
Note that when every toss comes up heads (or every toss tails), the entropy is zero; it rises steadily and peaks when heads and tails are equally likely. Hence:
Low entropy indicates that the items in the dataset differ little (or may all belong to one class); conversely, high entropy indicates that they differ greatly
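The entropy formula can be written as a small function. This is a minimal sketch of the slide's I(...), using the usual convention that a zero-probability term contributes 0:

```python
from math import log2

def entropy(probs):
    """I(P(v1), ..., P(vn)) = sum of -P(vi) * log2 P(vi), in bits."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))   # all heads: 0.0 bits
print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit, the maximum for two outcomes
```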
Information gain
A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values
Information Gain (IG), or reduction in entropy, from the attribute test:
– computed as the total entropy, minus the entropy remaining after one attribute is chosen as the root
Choose the attribute with the largest IG
– to serve as the "root" for the next decision
remainder(A) = Σ_{i=1..v} (pi + ni)/(p + n) · I(pi/(pi + ni), ni/(pi + ni))
IG(A) = I(p/(p + n), n/(p + n)) − remainder(A)
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit
Consider the attributes Patrons and Type (and others too):
Since Patrons has the highest IG of all attributes, it is chosen by the DTL algorithm as the root
IG(Patrons) = 1 − [2/12·I(0, 1) + 4/12·I(1, 0) + 6/12·I(2/6, 4/6)] = 0.541 bits
IG(Type) = 1 − [2/12·I(1/2, 1/2) + 2/12·I(1/2, 1/2) + 4/12·I(1/2, 1/2) + 4/12·I(1/2, 1/2)] = 0 bits
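These two numbers can be reproduced in code. The per-value (positive, negative) counts below are read off the slide's own formulas for the 12-example restaurant training set; the function names `I` and `remainder` follow the lecture's notation.

```python
from math import log2

def I(p, n):
    """Entropy of a set with p positive and n negative examples (in bits)."""
    total = p + n
    return sum(-q * log2(q) for q in (p / total, n / total) if q > 0)

def remainder(splits, p, n):
    """splits: list of (p_i, n_i) counts, one pair per attribute value."""
    return sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in splits)

# Counts from the slide (p = n = 6):
# Patrons: None -> (0 yes, 2 no), Some -> (4, 0), Full -> (2, 4)
# Type: French (1, 1), Italian (1, 1), Thai (2, 2), Burger (2, 2)
ig_patrons = I(6, 6) - remainder([(0, 2), (4, 0), (2, 4)], 6, 6)
ig_type = I(6, 6) - remainder([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6)
print(f"IG(Patrons) = {ig_patrons:.3f} bits")  # ~0.541
print(f"IG(Type)    = {ig_type:.3f} bits")     # 0.000
```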
Example (cont.)
Decision tree learned from the 12 examples:
Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data
Performance measurement
Q: How do we know that h ≈ f ?
1. Use theorems of computational/statistical learning theory
2. Try h on a new test set of examples
(use the same distribution over the example space as for the training set)
Learning curve = % correct on the test set as a function of training set size
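Measuring a learning curve can be sketched as follows: train on growing prefixes of the training data and record accuracy on a held-out test set. This is a generic sketch; `learn` and `predict` here are hypothetical placeholders (a trivial majority-label learner) standing in for any learner such as DTL.

```python
import random

def learning_curve(train, test, learn, predict, sizes):
    """Return (training-set size, test accuracy) points."""
    points = []
    for m in sizes:
        random.shuffle(train)                 # fresh random subset of size m
        h = learn(train[:m])
        correct = sum(predict(h, x) == y for x, y in test)
        points.append((m, correct / len(test)))
    return points

# Trivial demo learner: always predict the majority label seen in training.
def learn(examples):
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def predict(h, x):
    return h

train = [({}, "T")] * 8 + [({}, "F")] * 4
test = [({}, "T")] * 3 + [({}, "F")] * 1
print(learning_curve(train, test, learn, predict, [2, 6, 12]))
```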
Summary
Learning is needed for unknown environments, lazy designers
Learning agent = feedback + learning element + performance element
For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples
Decision tree learning uses information gain
Learning performance = prediction accuracy measured on a test set