machine learning chapter 2. concept learning and the general-to-specific ordering gun ho lee...
TRANSCRIPT
Machine Learning
Chapter 2. Concept Learning and The General-to-specific Ordering
Gun Ho Lee
Soongsil University, Seoul
2
Outline
Learning from examples General-to-specific ordering over hypotheses Version spaces and candidate elimination
algorithm Picking new examples The need for inductive bias
Note: simple approach assuming no noise,
illustrates key concepts
3
A Concept
Examples of Concepts– “birds”, “car”, “situations” in which I should
study more in order to pass the exam”
Concept – Some subset of objects or events defined over
a larger set, or – A boolean-valued function defined over this
larger set.– Concept “birds” is the subset of animals that
constitute birds.
4Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
A Concept Let there be a set of objects, X.
X = { 백구 , 야옹이 , 도그 , 강세이 }
A concept C is…
– A subset of X
C = dogs = { 백구 , 도그 , 강세이 }
– A function that returns 1 only for elements in the concept
C( 백구 ) = 1, C( 야옹이 ) = 0
4
5
Instance Representation
Represent an object (or instance) as an n-tuple of attributes
Example: Days (6-tuples)
5
Instance Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same No 2 Sunny Warm High Strong Warm Same Yes 3 Rainy Cold High Strong Warm Change No 4 Sunny Warm High Strong Cool Change Yes
6
Concept Learning
Learning – Inducing general functions from specific training
examples
Concept learning– Acquiring the definition of a general category
given a sample of positive and negative training examples of the category
– Inferring a boolean-valued function from training examples of its input and output.
7
A Concept Learning Task
Target concept EnjoySport– “days on which 박지성 enjoys water sport”
Hypothesis– A vector of 6 constraints, specifying the values of the six
attributes
Sky, AirTemp, Humidity, Wind, Water, Forecast.
– <?, Cold, High, ?, ?, ?>
the hypothesis : 박지성은 일기상태 (cold, high humidity) 에서 수상 스포츠를 즐긴다 .
Sky, AirTemp, Humidity, Wind, Water, Forecast
8
Representing Hypotheses
Many possible representationsHere, h is conjunction of constraints on attributes
Each constraint can be a specific value (e.g., Water = Warm) don’t care (e.g., “Water =?”) no value allowed (e.g., “Water=Φ”)
For example, Sky AirTemp Humid Wind Water Forecst <Sunny ? ? Strong ? Same>
9
Task: Learn a hypothesis from a dataset
10
Example Concept Function
“Days on which my friend 박지성 enjoys his favorite water sport”
Sky Temp Humid Wind Water Forecast C(x)
sunny warm normal strong warm same 1
sunny warm high strong warm same 1
rainy cold high strong warm change 0
sunny warm high strong cool change 1
INPUTOUTPUT
10
11
The Learning Task
Given: – Hypotheses space H: conjunction of constraints on attributes. E.g. conjunction of literals: < Sunny ? ? Strong ? Same >
– Target concept c: E.g., EnjoySport X {0,1}
– Instances X: set of items over which the concept is defined. E.g., days decribed by attributes: Sky, Temp, Humidity, Wind, Water, Forecast
• Training examples (positive/negative): <x,c(x)>• Training set D: positive, negative examples of the target function: <x1,c(x1)>,…, <xn,c(xn)>
Determine:– A hypothesis h in H such that h(x) = c(x), for all x in X
12
Assumption 1
We will explore the space of all conjunctions. We assume the target concept falls within this space.
Target concept c
H, Hypotheses space
13
Assumption 2
A hypothesis close to target concept c obtained after
seeing many training examples will result in high
accuracy on the set of unobserved examples.
Training set DHypothesis h is good
Complement set D’Hypothesis h is good
Inductive learning hypothesis
14
Inductive Learning Hypothesis Learning task is to determine h identical to c over the
entire set of instances X.
But the only information about c is its value over D (training set).
Inductive learning algorithms can at best guarantee that the induced h fits c over D.
Inductive learning hypothesis– Any good hypothesis over a sufficiently large set of training
examples will also approximate the target function well over unseen examples.
15
Concept Learning as Search Search
– Find a hypothesis that best fits training examples– Efficient search in hypothesis space (finite/infinite)
Search space in EnjoySport• Sky has 3 (Sunny, Cloudy, and Rainy)• Temp has 2 (Warm and Cold)• Humidity has 2 (Normal and High)• Wind has 2 (Strong and Weak)• Water has 2 (Warm and Cool)• Forecast has 2 (Same and Change)
– 3x2x2x2x2x2 = 96 distinct instances – 5x4x4x4x4x4 = 5120 syntactically distinct hypotheses within H (considering Φ
and ? in addition)
<Sky AirTemp Humid Wind Water Forecst>
16Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
Mammal
Concept Generality
A concept P is more general than or equal to another concept Q iff the set of instances represented by P includes the set of instances represented by Q.
Canine
Wolf Pig Dog
White_fang Lassie
Scooby_dooWilbur
Charlotte
16
17Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
General to Specific Order
Consider two hypotheses:
– h1=< Sunny,?,?,Strong,?,?>
– h2=< Sunny,?,?, ?, ?,?>
Definition: hj is more general than or equal to hk iff:
This imposes a partial order on a hypothesis space.
1)(1)( xhxhxhh jkkj
17
18
Instance, Hypotheses, and More-General-Than
The Most General Hypothesis : < ?, ?, ?, ?, ?, ? >
The Most Specific Hypothesis : < Ø, Ø, Ø, Ø, Ø, Ø >
Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
x1=< Sunny,Warm,High,Strong,Cool,Same>
x2=< Sunny,Warm,High,Light,Warm,Same>
h1=< Sunny, ?, ?, Strong, ? ,?>
h2=< Sunny, ?, ?, ?, ?, ?>
h3=< Sunny, ?, ?, ?, Cool,?>
Instances
x2
x1
Hypotheses
h2
h3h1
h2 h1
h2 h3
specific
general
19
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x– For each attribute constraint ai in h
If the constraint ai in h is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint
satisfied by x
3. Output hypothesis h
Finding a Maximally Specific Hypothesis
20
Hypothesis Space Search by Find-S
Instances Hypotheses
specific
general
h0
h0=< Ø, Ø, Ø, Ø, Ø, Ø,>
h1
x1=<Sunny,Warm,Normal,Strong,Warm,Same> +
x1
h1=< Sunny,Warm,Normal,Strong,Warm,Same>
x3=<Rainy,Cold,High,Strong,Warm,Change> -
x3
h2,3
x2=<Sunny,Warm,High,Strong,Warm,Same> +
x2
h2,3=< Sunny,Warm, ?, Strong,Warm,Same>
h4
x4=<Sunny,Warm,High,Strong,Cool,Change> +
x4
h4=< Sunny,Warm, ?, Strong, ?, ?>
21
Properties of Find-SFind-S
Ignores every negative example (no revision to h required in response to negative examples).
Guaranteed to output the most specific hypothesis consistent with the positive training examples (for conjunctive hypothesis space).
Final h also consistent with negative examples provided the target c is in H and no error in D.
22
Weaknesses of Find-S
Has the learner converged to the correct target concept ? No way to know whether the solution is unique.
Why prefer the most specific hypothesis? How about the most general hypothesis?
Are the training examples consistent ? Training sets containing errors or noise can severely mislead the algorithm Find-S.
What if there are several maximally specific consistent hypotheses? No backtrack to explore a different branch of partial ordering.
23
Partial order of hypotheses I
24
Partial order of hypotheses II
25
Partial order of hypotheses III
26
The space of hypotheses
27
The space of hypotheses I
28
The space of hypotheses II
29
The space of hypotheses III
30
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D.
Consistent(h, D) ≡ ( <∀ x, c(x)> D) ∈ h(x) = c(x)
31
The version space, VSH,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.
VSH,D ≡ {h ∈ H | Consistent(h, D)}
Version Space
32
Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D.
Consistent(h, D) ≡ ( <∀ x, c(x)> D) ∈ h(x) = c(x)
The version space, V SH,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.
VSH,D ≡ {h ∈ H | Consistent(h, D)}
33
The List-Then-Eliminate Algorithm:
1. VersionSpace a list containing every hypothesis in H
2. For each training example, <x, c(x)>
remove from VersionSpace any hypothesis h for which
h(x) c(x)
3. Output the list of hypotheses in VersionSpace
34
Drawbacks of List-Then-Eliminate
The algorithm requires exhaustively enumerating all hypotheses in H– An unrealistic approach ! (full search)
If insufficient (training) data is available, the algorithm will output a huge set of hypotheses consistent with the observed data– 학습 data 가 불충분할때 지나치게 많은 hypotheses 를 양산
!!
35Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
Example Version Space
{<Sunny,Warm,?,Strong,?,?>}S:
{<Sunny,?,?,?,?,?>, <?,Warm,?,?,?>, }G:
<Sunny,?,?,Strong,?,?> <Sunny,Warm,?,?,?,?> <?,Warm,?,Strong,?,?>
x1 = <Sunny Warm Normal Strong Warm Same> +x2 = <Sunny Warm High Strong Warm Same> +x3 = <Rainy Cold High Strong Warm Change> -x4 = <Sunny Warm High Strong Cool Change> +
36
Representing Version Spaces
The General boundary, G, of version space VSH,D is the set of
its maximally general members
The Specific boundary, S, of version space VSH,D is the set of
its maximally specific members
Every member of the version space lies between these
boundaries
VSH,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥ h ≥ s)}
where x ≥ y means x is more general or equal to y
37
Relevant bounds
38
Basic Idea of Candidate Elimination Algorithm
1. Initialize G to the set of maximally general hypotheses in H
2. Initialize S to the set of maximally specific hypotheses in H
3. For each training example x, do If x is positive: generalize S if necessary If x is negative: specialize G if necessary
39
Candidate Elimination Algorithm (1/2)
G ← maximally general hypotheses in HS ← maximally specific hypotheses in H
For each training example d, do If d is a positive example
– G 로 부터 학습 data d 와 일관하지 않는 가설은 제거한다 . – 학습 data d 와 일관하지 않는 각 가설 s(s ∈ S) 에 대하여
• Remove s from S• 다음과 같이 최소로 일반화된 가설 h(all minimal generalizations h) 를 S 에
추가한다 .
1. 가설 h 가 학습 data d 와 일관하고 2. G 의 일부가 가설 h 보다 일반화 (general) 되어 있는 경우 만약 , S 의 다른 가설 보다 더 일반화 된 어떤 가설이 있다면 삭제한다 . (Specific boundary 유지 )
hG:
S:
inconsistent with d from G
Add minimal generalizations
40
Candidate Elimination Algorithm (2/2)
If d is a negative example– S 로 부터 학습 data d 와 일관하지 않는 가설은 삭제한다– 학습 data d 와 일관하지 않는 가설 g(g ∈ G)
• Remove g from G
• 다음과 같이 최소로 특수화된 모든 가설 h(all minimal specializations h) 를 G 에 추가 한다 .
1. 가설 h 가 학습 data d 에 일관하고 ,
2. S 의 일부가 가설 h 보다 더 특수화 되어 있는• 만약 , G 의 다른 가설 보다 덜 일반화 ( less general) 된 어떤 가설이
있다면 G 에서 삭제한다 . (General Boundary 유지 )
h
G:
S:
inconsistent with d
Add minimal specializations
41
Candidate-Elimination Algorithm
– When does this halt?– If S and G are both singleton sets, then:
• if they are identical, output value and halt. • if they are different, the training cases were inconsistent.
Output this and halt.
– Else continue accepting new training examples.
42Adapted by Doug Downey from: Bryan Pardo, EECS 349 Fall 2007
Example Candidate Elimination
• Instance space: integer points in the x,y plane with 0 x,y 10• hypothesis space : rectangles, that means hypotheses are of the form a x b , c y d , assume
a b
c
d
42
43
Example Candidate Elimination
• examples = {ø} • G= {a, b, c, d} = {0,10,0,10}• S={ø}
G
00
10
10
44
Example Candidate Elimination
• examples = {(3,4),+} • G={(0,10,0,10)}• S={(3,3,4,4)}
G
+
S
0
10
0 10
45
Example Trace
First initialize the S and G sets:
S0 : 0,0,0,0,0,0>
G0 : { < ?,?,?,?,?,?> }
46
Given 1st Example
The first example is positive:
< <Sunny, Warm, Normal, Strong, Warm, Same>, Yes>
h: <Sunny, Warm, Normal, Strong, Warm, Same>,
S0 : 0,0,0,0,0,0>
S1 : { <Sunny, Warm, Normal, Strong, Warm, Same> }
G0, G1 : { < ?,?,?,?,?,?> }
For each training example d, do If d is a positive example
– G 로 부터 학습 data d 와 일관하지 않는 가설은 제거한다 . – 학습 data d 와 일관하지 않는 각 가설 s(s ∈ S) 에 대하여
• Remove s from S• 다음과 같이 최소로 일반화된 가설 h(all minimal generalizations h) 를 S 에 추가한다 .
1. 가설 h 가 학습 data d 와 일관하고 2. G 의 일부가 가설 h 보다 일반화 (general) 되어 있는 경우 만약 , S 의 다른 가설 보다 더 일반화 된 어떤 가설이 있다면 삭제한다 .(Specific boundary 유지 )
(g ≥ h ≥ s) ?
Candidate Elimination Algorithm
47
Given 2nd Example
The second example is positive:
< <Sunny,Warm,High, Strong,Warm, Same>, Yes>
S1 : { <Sunny, Warm, Normal, Strong, Warm, Same> }
S2 : { <Sunny, Warm, ?, Strong, Warm, Same> }
G1, G2 : { < ?,?,?,?,?,?> }
For each training example d, do If d is a positive example
– G 로 부터 학습 data d 와 일관하지 않는 가설은 제거한다 . – 학습 data d 와 일관하지 않는 각 가설 s(s ∈ S) 에 대하여
• Remove s from S• 다음과 같이 최소로 일반화된 가설 h(all minimal generalizations h) 를 S 에 추가한다 .
1. 가설 h 가 학습 data d 와 일관하고 2. G 의 일부가 가설 h 보다 일반화 (general) 되어 있는 경우 만약 , S 의 다른 가설 보다 더 일반화 된 어떤 가설이 있다면 삭제한다 .(Specific boundary 유지 )
h: <Sunny, Warm, ?, Strong, Warm, Same>
(g ≥ h ≥ s) ?
Candidate Elimination Algorithm
48
Given 3rd Example
The third example is negative:
< <Rainy, Cold, High, Strong, Warm, Change>, No>
S2, S3 : {<Sunny, Warm, ?, Strong, Warm, Same> }
G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
G2: { < ?,?,?,?,?,?> }
If d is a negative example– S 로 부터 학습 data d 와 일관하지 않는 가설은 삭제한다– 학습 data d 와 일관하지 않는 가설 g(g ∈ G)
• Remove g from G
• 다음과 같이 최소로 특수화된 모든 가설 h(all minimal specializations h) 를 G 에 추가 한다 .
1. 가설 h 가 학습 data d 에 일관하고 ,
2. S 의 일부가 가설 h 보다 더 특수화 되어 있는• 만약 , G 의 다른 가설 보다 덜 일반화 ( less general) 된 어떤 가설이 있다면 G
에서 삭제한다 . (General Boundary 유지 )
(g ≥ h ≥ s) ?
Candidate Elimination Algorithm
Given 3rd Example
49
The third example is negative:
< <Rainy, Cold, High, Strong, Warm, Change>, No>
S2, S3 : {<Sunny, Warm, ?, Strong, Warm, Same> }
가능한 가설 h 들 {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}, <?,?, Normal,?,?,? >, <?, ?, ?, Strong, ?, ?>, <?, ?, ?, Warm, ?>}
G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}, 은 가능 ?
<?,?, Normal,?,?,? >, <?, ?, ?, Strong, ?, ?>, <?, ?, ?, Warm, ?> 은 불가능 ?
G2: { < ?,?,?,?,?,?> }
Candidate Elimination Algorithm
Given 3rd Example
50
따라서 S 로부터 삭제할 가설은 없다 !!
If d is a negative example– S 로 부터 학습 data d 와 일관하지 않는 가설은 삭제한다
Candidate Elimination Algorithm
Given 3rd Example
51
If d is a negative example– 학습 data d 와 일관하지 않는 가설 g(g ∈ G)
• Remove g from G
• 다음과 같이 최소로 특수화된 모든 가설 h(all minimal specializations h) 를 G 에 추가 한다 .
1. 가설 h 가 학습 data d 에 일관하고 ,
2. S 의 일부가 가설 h 보다 더 특수화 되어 있는
Candidate Elimination Algorithm
Why is <?,?, Normal,?,?,? > not included in G ?
Given 3rd Example
학습 data d : < <Rainy, Cold, High, Strong, Warm, Change>, No>
S2, S3 : {<Sunny, Warm, ?, Strong, Warm, Same> }
가설 h: <?,?, Normal,?,?,? >, yes 의 경우 h(x)=no, d(x) =no
<?,?, Normal,?,?,? > ≥ <Sunny, Warm, ?, Strong, Warm, Same> 인가 ?
VSH,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥ h ≥ s)}
where x ≥ y means x is more general or equal to y
Candidate Elimination Algorithm
53
Given 3rd Example
The third example is negative:
< <Rainy, Cold, High, Strong, Warm, Change>, No>
S2, S3 : {<Sunny, Warm, ?, Strong, Warm, Same> }
G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
G2: { < ?,?,?,?,?,?> }
If d is a negative example– S 로 부터 학습 data d 와 일관하지 않는 가설은 삭제한다– 학습 data d 와 일관하지 않는 가설 g(g ∈ G)
• Remove g from G
• 다음과 같이 최소로 특수화된 모든 가설 h(all minimal specializations h) 를 G 에 추가 한다 .
1. 가설 h 가 학습 data d 에 일관하고 ,
2. S 의 일부가 가설 h 보다 더 특수화 되어 있는• 만약 , G 의 다른 가설 보다 덜 일반화 ( less general) 된 어떤 가설이 있다면 G
에서 삭제한다 . (General Boundary 유지 )
Candidate Elimination Algorithm
54
Given 4th Example
The 4th example is negative:
< <Sunny,Warm,High, Strong,Cool,Change ,Yes>
S3 : {<Sunny, Warm, ?, Strong, Warm, Same> }
S4 : {<Sunny, Warm, ?, Strong, ?, ?> }
G4 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }
(g ≥ h ≥ s) ?
Candidate Elimination Algorithm
55
<Sunny, Warm, ?, Strong, Warm, Same>S2
= G2
Example Trace
<Ø, Ø, Ø, Ø, Ø, Ø>S0
<?, ?, ?, ?, ?, ?>G0
d1: <Sunny, Warm, Normal, Strong, Warm, Same, Yes>
d2: <Sunny, Warm, High, Strong, Warm, Same, Yes>
d3: <Rainy, Cold, High, Strong, Warm, Change, No>
d4: <Sunny, Warm, High, Strong, Cool, Change, Yes>= S3
<Sunny, ?, ?, ?, ?, ?> <?, Warm, ?, ?, ?, ?> <?, ?, ?, ?, ?, Same>G3
<Sunny, ?, ?, Strong, ?, ?> <Sunny, Warm, ?, ?, ?, ?> <?, Warm, ?, Strong, ?, ?>
<Sunny, Warm, ?, Strong, ?, ?>S4
G4 <Sunny, ?, ?, ?, ?, ?>
<?, Warm, ?, ?, ?, ?>
<Sunny, Warm, Normal, Strong, Warm, Same>S1
= G1
Candidate Elimination Algorithm
56
The hypothesis space after four cases
Candidate Elimination Algorithm
57
The hypothesis space after four cases
Candidate Elimination Algorithm
58
Remarks on Candidate Elimination
What training example should the learner request next?
Will the CE algorithm converge to the correct hypothesis ?
How can partially learned concepts be used?
Candidate Elimination Algorithm
59
What Next Training Example?
60
Who Provides Examples?
Two methods– Fully supervised learning: External teacher provides all
training examples (input + correct output)– Learning by query: The learner generates instances
(queries) by conducting experiments, then obtains the correct classification for this instance from an external oracle (nature or a teacher).
Negative training examples specializes G, positive ones generalize S.
61
When Does CE Converge?
Will the Candidate-Elimination algorithm converge to the correct hypothesis?
Prerequisites– 1. No error in training examples– 2. The target hypothesis exists which correctly describes c(x).
If S and G boundary sets converge to an empty set, this means there is no hypothesis in H consistent with observed examples.
(S 와 G 모두 empty set 에 이르게 된다면 학습된 예제들과는 부합하는 hypothesis 이 없다는 것을 의미한다 .)
62
Can a partially learned classifier be used?
63
How to Use Partially Learned Concepts?
Suppose the learner is asked to classify the four new instances
shown in the following table.
Instance Sky AirTemp Humidity Wind Water Forecast EnjoySport
A Sunny Warm Normal Strong Cool Change + 6/0 B Rainy Cold Normal Light Warm Same - 0/6 C Sunny Warm Normal Light Warm Same ? 3/3 D Sunny Cold Normal Strong Warm Same ? 2/4
{<Sunny,Warm,?,Strong,?,?>}S:
{<Sunny,?,?,?,?,?>, <?,Warm,?,?,?>, }G:
<Sunny,?,?,Strong,?,?> <Sunny,Warm,?,?,?,?> <?,Warm,?,Strong,?,?>
64
Can a partially learned classifier be used?
A Biased Hypothesis Space
65
x1 = <Sunny Warm Normal Strong Cool Change> +x2 = <Cloudy Warm Normal Strong Cool Change> +
x3 = <Rainy Warm Normal Strong Cool Change> -
S2 : { <?, Warm, Normal, Strong, Cool, Change> } overly gerenal !!, incorrectly covers x3
S3 : {} The third example x3 contradicts the already overly
general hypothesis space specific boundary S2.
We have Biased the learner to consider only conjunctive hypothesis !!
66
An UnBiased Learner
Idea: Choose H that expresses every teachable concept (i.e., H is the
power set of X) Consider H' = disjunctions, conjunctions, negations over previous H. E.g., <Sunny Warm Normal ? ? ?> ∨<? ? ? ? ? Change>
What are S, G in this case? S ← G ←
target concept 들이 H 내에 존재하기
위해서는 모든 teachable concept 을
표현이 가능하도록 해야 함 !!
67
Unbiased Learner
Assume positive examples (x1, x2, x3) and negative examples (x4, x5)
S : { (x1 v x2 v x3) } G : { (x4 v x5) }
How would we classify some new instance x6 ?
For any instance not in the training exampleshalf of the version space says +the other half says –
=> To learn the target concept, one would have to present every single instance in X as a training example (Rote learning)
68
What Justifies this Inductive Leap?
New examples
d1: <Sunny Warm Normal Strong Cool Change>, Yes
d2: <Sunny Warm Normal Light Warm Same>, No
S : {<Sunny Warm Normal ? ? ?> +}
Overly general hypothesis
space
69
Inductive bias
• Our hypothesis space is unable to represent a simple disjunctive target concept : (Sky=Sunny) v (Sky=Cloudy)
x1 = <Sunny Warm Normal Strong Cool Change> +S1 : { <Sunny, Warm, Normal, Strong, Cool, Change> }
x2 = <Cloudy Warm Normal Strong Cool Change> +S2 : { <?, Warm, Normal, Strong, Cool, Change> }
x3 = <Rainy Warm Normal Strong Cool Change> -S3 : {}
The third example x3 contradicts the already overly general hypothesis space specific boundary S2.
Overly general hypothesis
space
70
Why believe we can classify the unseen ?
S : {<Sunny Warm Normal ? ? ?> +}
Unseen example: <Sunny Warm Normal Strong Warm Same>, ….
Why Inductive learning hypothesis ?
71
Why believe we can classify the unseen ?
S : {<Sunny Warm Normal ? ? ?> +}
Unseen example: <Sunny Warm Normal Strong Warm Same>, ….
Why Inductive learning hypothesis ?
Inductive learning hypothesis: “If the hypothesis works for enough data then it will work on new examples.”
72
Inductive bias The inductive bias of a learning algorithm is the set
of assumptions that the learner uses to predict outputs given inputs that it has not encountered (Mitchell, 1980).
Example
– Occam’s Razor
– Target concept c ∈ H (hypothesis space) of candidate-elimination algorithm
73
Inductive Bias
Consider concept learning algorithm L instances X, target concept c training examples Dc = {<x, c(x)>} let L(xi, Dc) denote the classification assigned to the instance xi by
L after training on data Dc.
Definition:The inductive bias of L is any minimal set of assertions B suchthat for any target concept c and corresponding training
examples Dc
(∀xi ∈ X)[(B ∧ Dc ∧ xi) ├ L(xi, Dc)]where A├ B means A logically entails B
74
Inductive bias II
75
Inductive bias II
76
Inductive Systems and EquivalentDeductive Systems
Candidate EliminationAlgorithm
Using HypothesisSpace H
Inductive System
Theorem Prover
Equivalent Deductive System
Training Examples
New Instance
Training Examples
New Instance
Assertion { c H }
Inductive bias made explicit
Classification of New Instance(or “Don’t Know”)
Classification of New Instance(or “Don’t Know”)
77
Three Learners with Different Biases
Rote Learner– Weakest bias: anything seen before, i.e., no bias
– Store examples
– Classify x if and only if it matches previously observed example
Version Space Candidate Elimination Algorithm– Stronger bias: concepts belonging to conjunctive H
– Store extremal generalizations and specializations
– Classify x if and only if it “falls within” S and G boundaries (all members agree)
Find-S– Even stronger bias: most specific hypothesis
– Prior assumption: any instance not observed to be positive is negative
– Classify x based on S set
78
Summary Points
1. Concept learning as search through H
2. General-to-specific ordering over H
3. Version space candidate elimination algorithm
4. S and G boundaries characterize learner’s uncertainty
5. Learner can generate useful queries
6. Inductive leaps possible only if learner is biased
7. Inductive learners can be modelled by equivalent
deductive systems