CS 484 – Artificial Intelligence
Announcements
• List of 5 sources for research paper
• Homework 5 due Tuesday, October 30
• Book Review due Tuesday, October 30
Classification problems and Machine Learning
Lecture 10
EnjoySport concept learning task
• Given
  • Instances X: Possible days, each described by the attributes
    • Sky (with possible values Sunny, Cloudy, and Rainy)
    • AirTemp (with values Warm and Cold)
    • Humidity (with values Normal and High)
    • Wind (with values Strong and Weak)
    • Water (with values Warm and Cool), and
    • Forecast (with values Same and Change)
  • Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes. Each constraint may be "?" (any value), "Ø" (no value), or a specific value
  • Target concept c: EnjoySport : X → {0,1}
  • Training examples D: Positive and negative examples of the target function
• Determine
  • A hypothesis h in H such that h(x) = c(x) for all x in X
Find-S: Finding a Maximally Specific Hypothesis (review)
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   • For each attribute constraint ai in h
     • If the constraint ai is satisfied by x, then do nothing
     • Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
• Begin: h ← <Ø, Ø, Ø, Ø, Ø, Ø>
(A runnable sketch of Find-S follows the training table below.)
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
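To make the procedure concrete, here is a minimal Python sketch of Find-S for this task (the tuple encoding, with None standing in for Ø and '?' for the fully general constraint, and the function names are my own):

# Find-S sketch: hypotheses are 6-tuples of attribute constraints.
# None plays the role of the maximally specific constraint Ø; '?' matches any value.

def generalize(h, x):
    """Minimally generalize hypothesis h so that it covers instance x."""
    return tuple(xi if hi is None else (hi if hi == xi else '?')
                 for hi, xi in zip(h, x))

def find_s(examples):
    h = (None,) * 6                    # start with <Ø, Ø, Ø, Ø, Ø, Ø>
    for x, positive in examples:
        if positive:                   # Find-S ignores negative examples
            h = generalize(h, x)
    return h

# The EnjoySport training data from the table above.
data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]

print(find_s(data))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')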
Candidate Elimination
• Candidate Elimination finds every hypothesis consistent with the training data (including the negative examples), representing this version space by its boundary sets S (most specific) and G (most general).
The final version space for the EnjoySport examples, bounded below by S and above by G (the three hypotheses in between also belong to the version space):

S: {<Sunny, Warm, ?, Strong, ?, ?>}
     <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Candidate-Elimination Learning Algorithm
• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H
• For each training example d, do
  • If d is a positive example
    • Remove from G any hypothesis inconsistent with d
    • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S
  • If d is a negative example
    • Remove from S any hypothesis inconsistent with d
    • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G
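The same algorithm as a runnable Python sketch, specialized to conjunctive hypotheses like EnjoySport's (the tuple encoding and helper names are my own; this is an illustration rather than a general-purpose implementation):

# Hypotheses are tuples; '?' matches any value, None (standing in for Ø) matches none.

def matches(h, x):
    return all(a == '?' or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    """True if h1 covers every instance that h2 covers (h1 is at least as general)."""
    return all(a == '?' or a == b or b is None for a, b in zip(h1, h2))

def generalize(s, x):
    """The minimal generalization of s that covers x."""
    return tuple(b if a is None else (a if a == b else '?') for a, b in zip(s, x))

def specializations(g, x, domains):
    """The minimal specializations of g that exclude x."""
    for i, values in enumerate(domains):
        if g[i] == '?':
            for v in values:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples, domains):
    S = {(None,) * len(domains)}           # maximally specific boundary
    G = {('?',) * len(domains)}            # maximally general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
            S = {s for s in S if not any(s != t and more_general(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            G_kept = {g for g in G if not matches(g, x)}
            for g in G:
                if matches(g, x):
                    for h in specializations(g, x, domains):
                        if any(more_general(h, s) for s in S):
                            G_kept.add(h)
            G = {g for g in G_kept if not any(g != h and more_general(h, g) for h in G_kept)}
    return S, G

# Attribute domains for EnjoySport (Sky, AirTemp, Humidity, Wind, Water, Forecast).
domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(data, domains)
# S: {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G: {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}

Feeding the examples in one at a time reproduces the step-by-step trace on the following slides.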
Example
• G0 ← {<?, ?, ?, ?, ?, ?>}
• G1 ← {<?, ?, ?, ?, ?, ?>} (unchanged: the positive E1 is consistent with G0)
• G2 ← {<?, ?, ?, ?, ?, ?>} (unchanged)
• S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}
• S1 ← {<Sunny, Warm, Normal, Strong, Warm, Same>}
• S2 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
E1 = <Sunny, Warm, Normal, Strong, Warm, Same> positive
E2 = <Sunny, Warm, High, Strong, Warm, Same> positive
Example (cont. 2)
• G2 ← {<?, ?, ?, ?, ?, ?>}
• G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
• S2 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
• S3 ← {<Sunny, Warm, ?, Strong, Warm, Same>} (unchanged: S2 is already consistent with the negative example E3)
E3 = <Rainy, Cold, High, Strong, Warm, Change> negative
Example (cont. 3)
• G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
• G4 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>} (the hypothesis <?, ?, ?, ?, ?, Same> is dropped because it is inconsistent with the positive example E4)
• S3 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
• S4 ← {<Sunny, Warm, ?, Strong, ?, ?>}
E4 = <Sunny, Warm, High, Strong, Cool, Change> positive
Decision Tree Learning
• Has two major benefits over Find-S and Candidate Elimination
  • Can cope with noisy data
  • Is capable of learning disjunctive expressions
• Limitations
  • There may be many valid decision trees for a given set of training data
  • It prefers small trees over large trees
• Applies to a broad range of learning tasks
  • Classify medical patients by their disease
  • Classify equipment malfunctions by their cause
  • Classify loan applicants by their likelihood of defaulting on payments
Decision Tree Example
A decision tree for "days on which to play tennis":

Outlook
├─ Sunny → Humidity
│    ├─ High → No
│    └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
     ├─ Strong → No
     └─ Weak → Yes
Decision Tree Induction (1)
• Decision tree induction builds, from a set of training data, a decision tree that correctly classifies that data.
• ID3 is an example of a decision tree learning algorithm.
• ID3 builds the decision tree from the top down, selecting the features from the training data that provide the most information at each stage.
Decision Tree Induction (2)
• ID3 selects attributes based on information gain.
• Information gain is the reduction in entropy caused by a decision.
• Entropy is defined as:
  H(S) = - p1 log2 p1 - p0 log2 p0
  • p1 is the proportion of the training data which are positive examples
  • p0 is the proportion which are negative examples
• Intuition about H(S)
  • Zero (minimum value) when all the examples are the same (all positive or all negative)
  • One (maximum value) when half are positive and half are negative
Example – Training Data

Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Calculate Information Gain
• Initial entropy
  • Treat all 14 examples as one collection: 9 positive examples, 5 negative examples
  • H(init) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = -.643 log2 .643 - .357 log2 .357 = 0.940
• Calculate the entropy for each value of an attribute, then combine them as a weighted sum
• Entropy of "Outlook"
  • Sunny
    • 5 examples, 2 positives, 3 negatives
    • H(Sunny) = -(2/5) log2 (2/5) - (3/5) log2 (3/5) = 0.971
  • Overcast
    • 4 examples, 4 positives, 0 negatives
    • H(Overcast) = -1 log2 (1) - 0 log2 (0) = 0 (0 log2 0 is defined as 0)
  • Rain
    • 5 examples, 3 positives, 2 negatives
    • H(Rain) = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.971
• H(Outlook) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = .357(0.971) + .286(0) + .357(0.971) = 0.694
• Information Gain = H(init) - H(Outlook) = 0.940 - 0.694 = 0.246
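The same computation in Python (the per-value counts are read off the training table above):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

# (positives, negatives) within each value of Outlook
branches = {'Sunny': (2, 3), 'Overcast': (4, 0), 'Rain': (3, 2)}
total = 14

h_init = entropy(9, 5)                                     # 0.940
h_outlook = sum((p + n) / total * entropy(p, n)
                for p, n in branches.values())             # 0.694
print(h_init - h_outlook)   # 0.2467...; the slide's 0.246 comes from rounded intermediates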
Maximize Information Gain
• Gain of each attribute
  • Gain(Outlook) = 0.246
  • Gain(Humidity) = 0.151
  • Gain(Wind) = 0.048
  • Gain(Temperature) = 0.029
Outlook is chosen as the root, partitioning the full set {D1, D2, …, D14} [9+,5-]:

Outlook
├─ Sunny → ?        {D1, D2, D8, D9, D11} [2+,3-]
├─ Overcast → Yes   {D3, D7, D12, D13} [4+,0-]
└─ Rain → ?         {D4, D5, D6, D10, D14} [3+,2-]
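Putting the pieces together, here is a compact recursive ID3 in Python (a sketch under the usual textbook simplifications: categorical attributes, no missing values, no pruning; the nested-dictionary tree format is my own choice). Run on the PlayTennis table, it reproduces the tree shown earlier:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                  # pure node: all examples agree
        return labels[0]
    if not attrs:                              # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                               # information gain of splitting on a
        g = entropy(labels)
        for v in set(r[a] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[a] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g

    best = max(attrs, key=gain)                # ID3's greedy attribute choice
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [lab for r, lab in zip(rows, labels) if r[best] == v],
                          rest)
                   for v in set(r[best] for r in rows)}}

attrs = ['Outlook', 'Temperature', 'Humidity', 'Wind']
data = [  # the 14 PlayTennis examples: attribute values, then the label
    ('Sunny', 'Hot', 'High', 'Weak', 'No'),          ('Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('Overcast', 'Hot', 'High', 'Weak', 'Yes'),      ('Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('Rain', 'Cool', 'Normal', 'Weak', 'Yes'),       ('Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('Overcast', 'Cool', 'Normal', 'Strong', 'Yes'), ('Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),      ('Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),    ('Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),    ('Rain', 'Mild', 'High', 'Strong', 'No'),
]
rows = [dict(zip(attrs, d[:4])) for d in data]
labels = [d[4] for d in data]
print(id3(rows, labels, attrs))
# Structure of the printed tree:
# {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}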
Unbiased Learner
• Provide a hypothesis space capable of representing every teachable concept
  • Every possible subset of the instances X (the power set of X)
• How large is this space?
  • For EnjoySport, there are 3 * 2 * 2 * 2 * 2 * 2 = 96 instances in X
  • The power set has 2^|X| elements, so EnjoySport has 2^96, roughly 10^28, distinct target concepts
• This space allows disjunctions, conjunctions, and negations
• But the learner can no longer generalize beyond the observed examples
Inductive Bias
• All learning methods have an inductive bias.
• The inductive bias of a learning method is the set of assumptions (restrictions) the method uses to justify generalizing beyond the training data.
• Without an inductive bias, a learning method could not generalize:
  • A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
Bias in Learning Algorithms
• Rote-Learner: If the instance is found in memory, the stored classification is returned; otherwise the system refuses to classify the new instance. (It has no inductive bias: nothing beyond the stored examples is assumed.)
• Find-S: Finds the most specific hypothesis consistent with the training examples, then uses this hypothesis to classify all subsequent instances. (Its bias: the target concept is contained in H, and it is the most specific consistent hypothesis.)
Candidate-Elimination Bias
• Candidate-Elimination will converge to the true target concept provided it is given accurate training examples and its initial hypothesis space contains the true target concept
  • It considers only conjunctions of attribute values
  • It cannot represent "Sky = Sunny or Sky = Cloudy"
• What if the target concept is not contained in the hypothesis space? Consider the following examples, whose target concept is "Sky = Sunny or Sky = Cloudy":
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Cool Change Yes
2 Cloudy Warm Normal Strong Cool Change Yes
3 Rainy Warm Normal Strong Cool Change No
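Working through the algorithm shows the failure concretely: the most specific conjunction covering examples 1 and 2 is <?, Warm, Normal, Strong, Cool, Change>, and this hypothesis also covers example 3, which is negative. No conjunctive hypothesis is consistent with all three examples, so the version space collapses to the empty set.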
Bias of ID3
• ID3 chooses the first acceptable tree it encounters in its simple-to-complex, hill-climbing search
  • Favors shorter trees over longer ones
  • Selects trees that place the attributes with the highest information gain closest to the root
• The interaction between the attribute-selection heuristic and the training examples makes it difficult to characterize ID3's bias precisely
ID3 vs. Candidate Elimination
• The two differ in the type of inductive bias
• Hypothesis space
  • ID3 searches a complete hypothesis space, but searches it incompletely
    • Its inductive bias is a consequence of the ordering of hypotheses imposed by its search strategy
  • Candidate-Elimination searches an incomplete hypothesis space, but searches that space completely
    • Its inductive bias is a consequence of the expressive power of its hypothesis representation
Why Prefer Short Hypotheses?
• Occam's razor
  • Prefer the simplest hypothesis that fits the data
• Applying Occam's razor
  • There are fewer short hypotheses than long ones, so it is less likely that a short hypothesis will coincidentally fit the training data
  • A 5-node tree that fits the data is less likely to be a statistical coincidence than a 500-node tree that fits it, so we prefer the 5-node hypothesis
• Problems with this argument
  • By the same argument, you could single out trees with any other rare combination of qualifications and prefer those. Would that be better?
  • Size is determined by the particular representation used internally by the learner
• Don't reject Occam's razor altogether
  • Evolution will create internal representations that make the learning algorithm's inductive bias a self-fulfilling prophecy, simply because it can alter the representation more easily than it can alter the learning algorithm
The Problem of Overfitting
Black dots represent positive examples, white dots negative.
The two lines represent two different hypotheses.
In the first diagram, there are just a few items of training data, correctly classified by the hypothesis represented by the darker line.
In the second and third diagrams we see the complete set of data: the simpler hypothesis, which matched the training data less well, fits the remaining data better than the more complex hypothesis, which overfits the training data.
The Nearest Neighbor Algorithm (1)
• This is an example of instance-based learning.
• Instance-based learning involves storing training data and using it to attempt to classify new data as it arrives.
• The nearest neighbor algorithm works with data that consists of vectors of numeric attributes.
• Each vector represents a point in n-dimensional space.
The Nearest Neighbor Algorithm (2)
• When an unseen data item is to be classified, the Euclidean distance is calculated between this item and all of the training data.
  • The distance between <x1, y1> and <x2, y2> is:
    d = sqrt((x1 - x2)^2 + (y1 - y2)^2)
• The classification for the unseen data is usually the class that is most common among its few nearest neighbors.
• Shepard's method instead allows all of the training data to contribute to the classification, with each contribution weighted in inverse proportion to its distance from the data item being classified.
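A small Python sketch of both variants (the function names and toy data points are mine, purely illustrative):

import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, training, k=3):
    """Majority vote among the k training points nearest to the query.
    training is a list of (vector, label) pairs."""
    nearest = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def shepard_classify(query, training):
    """Distance-weighted variant: every training point votes,
    weighted by the inverse of its distance to the query."""
    votes = Counter()
    for vec, label in training:
        d = euclidean(vec, query)
        if d == 0:
            return label               # exact match: return its label outright
        votes[label] += 1.0 / d
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points with two classes.
train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((4.0, 4.2), 'B'), ((3.8, 4.0), 'B')]
print(knn_classify((1.1, 1.0), train))      # 'A'
print(shepard_classify((3.9, 4.1), train))  # 'B'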