Decision Tree Learning
Brought to you by Chris Creswell
Why learn about decision trees?
• A practical way to get AI to adapt to the player – a simple form of user modeling
– Enhances replayability
– Player's bot allies can be more effective
– Opponent bots can learn the player's tactics, so the player can't repeat the same strategy over and over
What we'll learn
• What is a decision tree
• How do we build a decision tree
• What has been done with decision trees in games
– What else can we do with them
What is a decision tree
• Decision Tree Learning (DTL) is an inductive learning task, meaning it has the following objective: use a training set of examples to create a hypothesis that makes general conclusions
What is a decision tree – terms/concepts
• Attribute: a variable that we take into account in making a decision
• Target attribute: the attribute that we want to take on a certain value; we'll decide based on it
What is a decision tree – an example

Example  Hour  Weather  Accident  Stall  Target – Commute
D1       8AM   Sunny    No        No     Long
D2       8AM   Cloudy   No        Yes    Long
D3       10AM  Sunny    No        No     Short
D4       9AM   Rainy    Yes       No     Long
D5       9AM   Sunny    Yes       Yes    Long
D6       10AM  Sunny    No        No     Short
D7       10AM  Cloudy   No        No     Short
D8       9AM   Rainy    No        No     Medium
D9       9AM   Sunny    Yes       No     Long
D10      10AM  Cloudy   Yes       Yes    Long
D11      10AM  Rainy    No        No     Short
D12      8AM   Cloudy   Yes       No     Long
D13      9AM   Sunny    No        No     Medium
What is a decision tree – an example

Hour
├─ 8AM  → Long
├─ 9AM  → Accident
│   ├─ No  → Medium
│   └─ Yes → Long
└─ 10AM → Stall
    ├─ No  → Short
    └─ Yes → Long
What is a decision tree – how to use it
• Given a set of circumstances (values of attributes), use them to traverse the tree from root to leaf
• The leaf node is a decision
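The traversal can be sketched in a few lines of Python. The nested-dictionary tree representation is chosen here for illustration; it is not something the slides prescribe.

```python
# Hypothetical representation: an internal node is a dict
# {"attribute": name, "branches": {value: subtree}}; a leaf is just a value.
def classify(tree, circumstances):
    """Walk the tree from root to leaf using the given attribute values."""
    while isinstance(tree, dict):
        value = circumstances[tree["attribute"]]
        tree = tree["branches"][value]
    return tree  # the leaf node is the decision

# The commute tree from the example slide, written out by hand:
commute_tree = {
    "attribute": "Hour",
    "branches": {
        "8AM": "Long",
        "9AM": {"attribute": "Accident",
                "branches": {"No": "Medium", "Yes": "Long"}},
        "10AM": {"attribute": "Stall",
                 "branches": {"No": "Short", "Yes": "Long"}},
    },
}

print(classify(commute_tree, {"Hour": "10AM", "Stall": "No"}))     # Short
print(classify(commute_tree, {"Hour": "9AM", "Accident": "Yes"}))  # Long
```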
Why is this useful
• The hypothesis formed from the training set can be used to draw conclusions about sets of circumstances not present in the training set – it will generalize
How do we construct a decision tree?
• Guiding principle of inductive learning:
– Occam's razor – choose the simplest possible hypothesis that is consistent with the provided examples
• General idea: recursively classify the examples based on one of the attributes until all examples have been used
• Here's the algorithm:
node LearnTree(examples, targetAttribute, attributes)
    examples is the training set
    targetAttribute is what to learn
    attributes is the set of available attributes
    returns a tree node
begin
    if all the examples have the same targetAttribute value
        return a leaf with that value
    else if the set of attributes is empty
        return a leaf with the most common targetAttribute value among examples
    else begin
        A = the "best" attribute among attributes, having a range of values v1, v2, …, vk
        Partition examples according to their value for A into sets S1, S2, …, Sk
        Create a decision node N with attribute A
        for i = 1 to k
        begin
            Attach a branch B to node N with test vi
            if Si has elements (is non-empty)
                Attach LearnTree(Si, targetAttribute, attributes – {A}) to branch B
            else
                Attach a leaf node with the most common targetAttribute value to branch B
        end
        return decision node N
    end
end
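The pseudo-code above maps almost line-for-line onto Python. This sketch makes one simplification: it only creates branches for attribute values actually present in the examples, so the empty-subset case never arises. The `choose_best` parameter stands in for the abstracted "best attribute" step (e.g. ID3's information gain); the names used here are mine, not from the slides.

```python
from collections import Counter

def learn_tree(examples, target, attributes, choose_best):
    """Sketch of LearnTree. `examples` is a list of dicts mapping attribute
    names to values; `attributes` is a set of attribute names; `choose_best`
    supplies the abstracted "best attribute" step."""
    target_values = [ex[target] for ex in examples]
    if len(set(target_values)) == 1:
        return target_values[0]                       # leaf: all examples agree
    if not attributes:
        return Counter(target_values).most_common(1)[0][0]  # leaf: guess
    a = choose_best(examples, target, attributes)
    node = {"attribute": a, "branches": {}}
    for v in {ex[a] for ex in examples}:              # only observed values,
        subset = [ex for ex in examples if ex[a] == v]  # so subsets are non-empty
        node["branches"][v] = learn_tree(subset, target,
                                         attributes - {a}, choose_best)
    return node
```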
This is how we construct a decision tree
• This very simple pseudo-code implements the construction of a decision tree, except for one key thing that is abstracted away, which is …
• Key step in the algorithm: choosing the "best" attribute to classify on
• One algorithm for doing this is ID3 (used in Black & White)
– We'll get to the algorithm in a bit
This is how we construct a decision tree – pseudo-code walkthrough
• First, LearnTree is called with all examples, the targetAttribute, and all attributes to classify on
• It chooses the "best" (we'll get to that) attribute to split on, creates a decision node for it, then recursively calls LearnTree for each partition of the examples
This is how we construct a decision tree – pseudo-code walkthrough
• Recursion stops when:
– All examples have the same value
– There are no more attributes
– There are no more examples
• The first two need some explanation; the third one is trivial – all examples have been classified
This is how we construct a decision tree – pseudo-code walkthrough
• Recursion stops when all examples have the same value – when does this happen?
– When ancestor attributes and corresponding branch values, as well as the target attribute and value, are the same across examples
This is how we construct a decision tree – pseudo-code walkthrough
• Recursion stops when there are no more attributes
– This happens when the training set is inconsistent, e.g. there are 2 or more examples having the same values for all but the target attribute
– The way our pseudo-code is written, it guesses when this happens: it picks the most popular target attribute value
– This is a decision left up to the implementer
– This is a weakness of the algorithm
• It doesn't handle "noise" in its training set well
This is how we construct a decision tree – pseudo-code walkthrough
• Let's watch the algorithm in action …
• http://www.cs.ualberta.ca/~aixplore/learning/DecisionTrees/InterArticle/2-DecisionTree.html
ID3 algorithm
• Picks the best attribute to classify on in a call of LearnTree; does so by quantifying how useful an attribute will be with respect to the remaining examples
• How? Using Shannon's information theory: pick the attribute that favors the best reduction in entropy
ID3 algorithm – Shannon's Information Theory
• Choose an attribute that favors the best reduction in entropy
• Entropy quantifies the variation in a set of examples with respect to the target attribute values
• A set of examples with mostly the same targetAttribute value has very low entropy (that's good)
• A set of examples with many varying targetAttribute values will have high entropy (bad)
• Ready? Here come some equations …
ID3: Shannon's Information Theory
• In the following, S is the set of examples, and Si is the subset of S with value vi for the target attribute:
$$\text{Entropy}(S) = -\sum_{i=1}^{k} \frac{|S_i|}{|S|} \log_2\!\left(\frac{|S_i|}{|S|}\right)$$
ID3: Shannon's Information Theory
• The expected entropy of candidate attribute A is the weighted sum of the subset entropies
• In the following, k is the size of the range of attribute A:
$$\sum_{i=1}^{k} \frac{|S_i|}{|S|}\,\text{Entropy}(S_i)$$
ID3: Shannon's Information Theory
• What we really want is to maximize information gain, defined:
$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{i=1}^{k} \frac{|S_i|}{|S|}\,\text{Entropy}(S_i)$$
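The three formulas translate directly into Python. A minimal sketch, assuming examples are dicts of attribute values; the function names are mine, not from the slides:

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy(S) = -sum_i (|Si|/|S|) * log2(|Si|/|S|)."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def expected_entropy(examples, target, attribute):
    """Weighted sum of subset entropies after splitting on `attribute`."""
    total = len(examples)
    result = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        result += (len(subset) / total) * entropy(subset, target)
    return result

def info_gain(examples, target, attribute):
    """Gain(S, A) = Entropy(S) minus expected entropy after splitting on A."""
    return entropy(examples, target) - expected_entropy(examples, target, attribute)
```

ID3's "best" attribute is then simply the one maximizing `info_gain` over the remaining attributes.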
ID3: Shannon's Information Theory
• Entropy of the commute time example:
$$\text{Entropy}(S) = -\frac{4}{13}\log_2\frac{4}{13} - \frac{2}{13}\log_2\frac{2}{13} - \frac{7}{13}\log_2\frac{7}{13} \approx 1.41956$$
The thirteens are because there are thirteen examples. The fours, twos, and sevens come from how many short, medium, and long commutes there are, respectively.
ID3: Shannon's Information Theory

Attribute  Expected Entropy  Info Gain
Hour       0.65110           0.768449
Weather    1.28884           0.130719
Accident   0.92307           0.496479
Stall      1.17071           0.248842
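As a sanity check, the table's figures can be recomputed from the thirteen commute examples:

```python
import math
from collections import Counter

rows = [  # (Hour, Weather, Accident, Stall, Commute) from the example table
    ("8AM", "Sunny", "No", "No", "Long"),   ("8AM", "Cloudy", "No", "Yes", "Long"),
    ("10AM", "Sunny", "No", "No", "Short"), ("9AM", "Rainy", "Yes", "No", "Long"),
    ("9AM", "Sunny", "Yes", "Yes", "Long"), ("10AM", "Sunny", "No", "No", "Short"),
    ("10AM", "Cloudy", "No", "No", "Short"),("9AM", "Rainy", "No", "No", "Medium"),
    ("9AM", "Sunny", "Yes", "No", "Long"),  ("10AM", "Cloudy", "Yes", "Yes", "Long"),
    ("10AM", "Rainy", "No", "No", "Short"), ("8AM", "Cloudy", "Yes", "No", "Long"),
    ("9AM", "Sunny", "No", "No", "Medium"),
]
COLS = {"Hour": 0, "Weather": 1, "Accident": 2, "Stall": 3}
TARGET = 4

def entropy(subset):
    counts = Counter(r[TARGET] for r in subset)
    n = len(subset)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def expected_entropy(attr):
    col, n = COLS[attr], len(rows)
    total = 0.0
    for v in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == v]
        total += len(subset) / n * entropy(subset)
    return total

base = entropy(rows)
print(round(base, 5))   # 1.41956
for attr in COLS:       # matches the table above
    print(attr, round(expected_entropy(attr), 5),
          round(base - expected_entropy(attr), 6))
```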
ID3: Drawbacks
• Does not guarantee the smallest possible decision tree
– Selects the classifying attribute based on best expected information gain, which is not always right
• Not very good with continuous values; best with symbolic data
– When given lots of distinct continuous values, ID3 will create very "bushy" trees – 1 or 2 levels deep, with lots and lots of leaves
– We can make this less serious, but it's still a drawback
Decision Trees in games
• The first successful use of a decision tree in a game was in "Black & White" (Lionhead Studios, 2001)
• http://www.gameai.com/blackandwhite.html
• "In Black & White you can be the god you want to be. Will you rule with a fair hand, making life better for your people? Or will you be evil and scare them into prayer and submission? No one can tell you which way to be. You, as a god, can play the game any way you choose."
Decision Trees in games
• "And as a god, you get to own a Creature. Chosen by you from magical, special animals, your Creature will copy you, you will teach him and he will learn by himself. He will grow, ultimately to 30 metres, and can do anything you can do in the game. Your Creature can help the people or can kill and eat them. He can cast Miracles to bring rain to their crops or he can drown them in the sea. Your Creature is your physical manifestation in the world of Eden. He is whatever you want him to be. … And the game also boasts a new level of artificial intelligence. Your Creature is almost a living, breathing thing. He learns, remembers and makes connections. His huge range of abilities and decisions is born of a ground-breakingly powerful and complex AI system."
Decision Trees in games
• So you teach your creature by giving it feedback – it learns to perform actions that get it the highest feedback
• Problem: feedback is a continuous variable
• We have to make it discrete
• We do so using K-means clustering
Decision Trees in games
• In K-means clustering, we decide how many clusters we want to create, then use an algorithm to successively associate or dissociate instances with clusters until associations stabilize around k clusters
• The author's reference for this is from a computer vision textbook
– I wasn't about to go buy it
• It's not important to know the clustering algorithm
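For completeness, here is a minimal sketch of 1-D K-means (Lloyd's algorithm) on scalar feedback values. The deterministic initialization is my own choice for reproducibility; the slides don't say how B&W seeds its clusters.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars (assumes k >= 2). Initialization here is
    k evenly spaced distinct values -- an assumption made for this sketch."""
    uniq = sorted(set(values))
    centers = [uniq[round(i * (len(uniq) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest center ...
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # ... then move each center to the mean of its members.
        new = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:   # stable: associations have stopped changing
            break
        centers = new
    return centers, clusters

# Feedback column of the B&W attack-a-town examples on the next slide:
feedback = [-1.0, 0.4, -1.0, -0.2, -1.0, 0.2, -0.4, 0.0, -1.0]
centers, clusters = kmeans_1d(feedback, 4)
print([round(c, 2) for c in centers])   # [-1.0, -0.3, 0.1, 0.4]
```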
Decision Trees in games
• Example from B&W: should your creature attack a town?
• Examples:
Example  Allegiance  Defense  Tribe   Feedback
D1       Friendly    Weak     Celtic  -1.0
D2       Enemy       Weak     Celtic   0.4
D3       Friendly    Strong   Norse   -1.0
D4       Enemy       Strong   Norse   -0.2
D5       Friendly    Weak     Greek   -1.0
D6       Enemy       Medium   Greek    0.2
D7       Enemy       Strong   Greek   -0.4
D8       Enemy       Medium   Aztec    0.0
D9       Friendly    Weak     Aztec   -1.0
Decision Trees in games
• If we ask for 4 clusters, K-means clustering will create clusters around -1, 0.4, 0.1, and -0.3. The memberships in these clusters will be {D1, D3, D5, D9}, {D2}, {D6, D8}, and {D4, D7}, respectively.
• The tree ID3 will create using these examples and clusters:
Decision Trees in games

Allegiance
├─ Friendly → -1.0
└─ Enemy → Defense
    ├─ Weak   → 0.4
    ├─ Medium → 0.1
    └─ Strong → -0.3
Decision Trees in games
• So in this case, the tree the creature learned can be reduced to a nice compact logical expression:
• ((Allegiance = Enemy) AND (Defense = Weak)) OR ((Allegiance = Enemy) AND (Defense = Medium))
• This happens sometimes
• It makes the tree easier and more efficient to apply
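One way to perform that reduction mechanically: collect every root-to-leaf path whose learned feedback is positive, and OR the paths together. The tree representation and function names below are illustrative, not from the slides or B&W.

```python
def leaf_paths(tree, path=()):
    """Collect (conditions, leaf) pairs for every root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(path, tree)]
    result = []
    for value, subtree in tree["branches"].items():
        result += leaf_paths(subtree, path + ((tree["attribute"], value),))
    return result

def attack_expression(tree, threshold=0.0):
    """OR together the paths whose leaf feedback exceeds `threshold`."""
    clauses = []
    for conditions, feedback in leaf_paths(tree):
        if feedback > threshold:
            clauses.append(" AND ".join(f"({a} = {v})" for a, v in conditions))
    return " OR ".join(f"({c})" for c in clauses)

bw_tree = {"attribute": "Allegiance",
           "branches": {"Friendly": -1.0,
                        "Enemy": {"attribute": "Defense",
                                  "branches": {"Weak": 0.4, "Medium": 0.1,
                                               "Strong": -0.3}}}}
print(attack_expression(bw_tree))
# ((Allegiance = Enemy) AND (Defense = Weak)) OR ((Allegiance = Enemy) AND (Defense = Medium))
```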
An Extension to ID3 to better handle continuous values
• Seems simple – use an inequality, right?
• Not that simple – we need to pick cut points
• Cut points are the boundaries we create for our inequalities – where do they go?
• Key insight: optimal cut points must always reside at boundary points
• Okay, so what are boundary points?
An Extension to ID3 to better handle continuous values
• If we sort the list of examples according to their values of the candidate attribute, a boundary point is a value in this list between 2 adjacent instances having different values of the target attribute
• In the worst case, the number of boundary points is about equal to the number of instances
– This happens if the target attribute oscillates back and forth between good and bad
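A sketch of extracting boundary points, taking midpoints between adjacent sorted values where the target changes (a common convention; the slides don't fix the exact cut position). The `speed`/`label` data is made up for illustration.

```python
def boundary_points(examples, attribute, target):
    """Candidate cut points for a continuous attribute: midpoints between
    adjacent sorted values of `attribute` where the target value changes."""
    ordered = sorted(examples, key=lambda ex: ex[attribute])
    cuts = []
    for a, b in zip(ordered, ordered[1:]):
        if a[target] != b[target] and a[attribute] != b[attribute]:
            cuts.append((a[attribute] + b[attribute]) / 2)
    return cuts

data = [{"speed": 10, "label": "slow"}, {"speed": 20, "label": "slow"},
        {"speed": 30, "label": "fast"}, {"speed": 50, "label": "fast"}]
print(boundary_points(data, "speed", "label"))   # [25.0]
```

ID3 would then evaluate an inequality test (e.g. speed < 25.0) at each candidate cut, rather than one branch per distinct value.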
Example software on CD
• Show an example made using the software on the CD
Conclusions
• Decision Trees are an elegant way of learning – it is easy to expose a tree's logic and understand what it has learned
• Decision Trees are not always the best way to learn – they have some weaknesses
• But they also have their own set of strengths
Conclusions
• Decision Trees work best for symbolic, discrete values
• Can be extended to work with continuous values
• B&W had to do some clustering of feedback values to use decision trees
Conclusions
• Up to now, the only use of Decision Trees in games has been in B&W
• What are they good for?
– User modeling – teaching the computer how to react to the player; enhances replayability
– Can be used to make bots that are the player's allies more effective, as in B&W
– Could also make enemies more intelligent – the player would be forced to come up with new strategies
• How else can they be used?
– This is relatively unexplored territory, people – if you think you have a great idea, go for it