SIKS-Advanced Course on Computational Intelligence, October 2001
Ordinal Classification
Rob Potharst
Erasmus University Rotterdam
What is ordinal classification?
Company: catering service Swift
• total liabilities / total assets 1
• net income / net worth 3
• … …
• managers’ work experience 5
• market niche-position 3
• … ...
bankruptcy risk + (acceptable)
2 2 2 2 1 3 5 3 5 4 2 4  +
4 5 2 3 3 3 5 4 5 5 4 5  +
3 5 1 1 2 2 5 3 5 5 3 5  +
2 3 2 1 2 4 5 2 5 4 3 4  +
3 4 3 2 2 2 5 3 5 5 3 5  +
3 5 3 3 3 2 5 3 4 4 3 4  +
3 5 2 3 4 4 5 4 4 5 3 5  +
1 1 4 1 2 3 5 2 4 4 1 4  +
3 4 3 3 2 4 4 2 4 3 1 3  +
3 4 2 1 2 2 4 2 4 4 1 4  +
2 5 1 1 3 4 4 3 4 4 3 4  +
3 3 4 4 3 4 4 2 4 4 1 3  +
1 1 2 1 1 3 4 2 4 4 1 4  +
2 1 1 1 4 3 4 2 4 4 3 3  +
2 3 2 1 1 2 4 4 4 4 2 5  +
2 3 4 3 1 5 4 2 4 3 2 3  +
2 2 2 1 1 4 4 4 4 4 2 4  +
2 1 3 1 1 3 5 2 4 2 1 3  +
2 1 2 1 1 3 4 2 4 4 2 4  +
2 1 2 1 1 5 4 2 4 4 2 4  +
2 1 1 1 1 3 2 2 4 4 2 3  ?
1 1 3 1 2 1 3 4 4 4 3 4  ?
2 1 2 1 1 2 4 3 3 2 1 2  ?
1 1 1 1 1 1 3 2 4 4 2 3  ?
2 2 2 1 1 3 3 2 4 4 2 3  ?
2 2 1 1 1 3 2 2 4 4 2 3  ?
2 1 2 1 1 3 2 2 4 4 2 4  ?
1 1 4 1 3 1 2 2 3 3 1 2  ?
3 4 4 3 2 3 3 4 4 4 3 4  ?
3 1 3 3 1 2 2 3 4 4 2 3  ?
1 1 2 1 1 1 3 3 4 4 2 3  -
3 5 2 1 1 1 3 2 3 4 1 3  -
2 2 1 1 1 1 3 3 3 4 3 4  -
2 1 1 1 1 1 2 2 3 4 3 4  -
1 1 2 1 1 1 3 1 4 3 1 2  -
1 1 3 1 2 1 2 1 3 3 2 3  -
1 1 1 1 1 1 2 2 4 4 2 3  -
1 1 3 1 1 1 1 1 4 3 1 3  -
2 1 1 1 1 1 2 1 2 1 1 2  -
Data set: 39 companies
20: + (acceptable)
9: - (unacceptable)
10: ? (uncertain)
from: Greco, Matarazzo, Slowinski (1996)
Possible classifier
if man.exp. > 4, then class = ‘+’
if man.exp. < 4 and net.inc/net.worth = 1, then class = ‘-’
all other cases: class = ‘?’
• when applied to dataset of 39: 3 mistakes
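The two rules can be written down directly as code. Below is a minimal sketch, assuming records are dicts keyed by hypothetical feature names (`man_exp`, `net_inc_net_worth`); the real dataset carries 12 ordinal attributes per company.

```python
# Sketch of the rule classifier above. The dict keys are hypothetical
# feature names chosen for illustration.

def classify(company):
    """Apply the two rules; every remaining case is labelled '?'."""
    if company["man_exp"] > 4:
        return "+"
    if company["man_exp"] < 4 and company["net_inc_net_worth"] == 1:
        return "-"
    return "?"

print(classify({"man_exp": 5, "net_inc_net_worth": 3}))  # +
print(classify({"man_exp": 2, "net_inc_net_worth": 1}))  # -
print(classify({"man_exp": 4, "net_inc_net_worth": 2}))  # ?
```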
What is classification?
The act of assigning objects to classes, using the values of relevant features of those objects
So we need:
• objects (individuals, cases), all belonging to some domain
• classes, number and kind prescribed
• features (attributes, variables)
• a classifier (classification function) that assigns a class to any object
Building classifiers
= induction from a training set of examples:
data without noise
data with noise
Induction methods (especially from the AI world)
• decision trees: C4.5, CART (from 1984 on)
• neural networks: backpropagation (from 1986, with a false start in 1974)
• rule induction algorithms: CN2 (1989)
• newer methods: rough sets, fuzzy methods, decision lists, pattern-based methods, etc.
Decision tree: example
[Figure: a decision tree with tests man.exp. < 3, gen.exp./sales = 1 and tot.liab/cashfl = 1, each with yes/no branches, and leaves labelled +, ?, ? and -]

It classifies 37 out of 39 examples correctly.
Ordinal classification
• features have ordinal scale
• classes have ordinal scale
• the ordering must be preserved!
Preservation of ordering

         F1  F2  F3  F4
comp A    1   2   2   3
comp B    2   4   3   3

A classifier is monotone iff: if A ≤ B (componentwise), then also class(A) ≤ class(B)
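This condition can be checked directly on a labelled dataset. A small sketch; the class labels 1 and 2 attached to companies A and B below are made up for illustration.

```python
# Checking the monotonicity requirement on labelled examples: if A is
# dominated by B componentwise, class(A) may not exceed class(B).
from itertools import combinations

def dominates(a, b):
    """a <= b in every feature (the ordering on feature vectors)."""
    return all(x <= y for x, y in zip(a, b))

def is_monotone(labelled):
    """labelled: list of (feature_vector, class) pairs."""
    for (xa, ca), (xb, cb) in combinations(labelled, 2):
        if dominates(xa, xb) and ca > cb:
            return False
        if dominates(xb, xa) and cb > ca:
            return False
    return True

# comp A and comp B from the table, with hypothetical class labels:
A, B = (1, 2, 2, 3), (2, 4, 3, 3)
print(dominates(A, B))                # True: A <= B on every feature
print(is_monotone([(A, 1), (B, 2)]))  # True
print(is_monotone([(A, 2), (B, 1)]))  # False: ordering violated
```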
Relevance of ordinal classification
• selection-problems
• credit worthiness
• pricing (e.g. real estate)
• etc.
Induction of monotone decision trees
• using C4.5 or CART: non-monotone trees
• needed: an algorithm that guarantees to generate only monotone trees
• Makino, Ibaraki et al. (1996)
• only for 2-class problems, cumbersome
• Potharst & Bioch (2000)
• for k-class problems, fast and efficient
The algorithm

try to split subset T:
1) update D for subset T
2) if D ∩ T is homogeneous then
      assign class label to T and make T a leaf definitively
   else
      split T into two non-empty subsets TL and TR using entropy
      try to split subset TL
      try to split subset TR
The update rule
update D for T:
1) if min(T) is not in D then
- add min(T) to D
- class ( min(T) ) = the maximal value allowed, given D
2) if max(T) is not in D then
- add max(T) to D
- class ( max(T) ) = the minimal value allowed, given D
The minimal value allowed given D
• For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.
• Let ↓x be the downset { y ∈ X | y ≤ x } of x
• Let y* be an element of D ∩ ↓x with highest class value
• Then the minimal class value possible for x is class(y*).
The maximal value allowed given D
• Let ↑x be the upset { y ∈ X | y ≥ x } of x
• Let y* be an element of D ∩ ↑x with lowest class value
• Then the maximal class value possible for x is class(y*)
• if there is no such element, take the maximal class value (or the minimal, in the former case)
Example

attr. 1: values 0, 1, 2
attr. 2: values 0, 1, 2
attr. 3: values 0, 1, 2
classes: 0, 1, 2, 3

X = all attribute vectors: 0 0 0, 1 0 0, 0 1 0, …, 2 2 2

D (the dataset; three attribute values, then the class):
0 0 1  0
0 0 2  1
1 1 2  2
2 0 2  2
2 1 2  3

Let us calculate the min and max possible value for x = 022:
min value: y* = 002, so the min value = 1
max value: there is no y*, so the max value = 3
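These two computations can be transcribed directly from the downset/upset definitions of the previous slides; the function names below are mine, not from the course.

```python
# Min and max class value allowed for x given D, transcribed from the
# downset/upset definitions (function names are illustrative).

def min_allowed(x, D, classes):
    # highest class among examples in the downset of x (y <= x)
    below = [c for y, c in D.items() if all(a <= b for a, b in zip(y, x))]
    return max(below) if below else min(classes)

def max_allowed(x, D, classes):
    # lowest class among examples in the upset of x (y >= x)
    above = [c for y, c in D.items() if all(a >= b for a, b in zip(y, x))]
    return min(above) if above else max(classes)

D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
print(min_allowed((0, 2, 2), D, [0, 1, 2, 3]))  # 1: y* = 002
print(max_allowed((0, 2, 2), D, [0, 1, 2, 3]))  # 3: nothing above 022 in D
```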
Tracing the algorithm

Try to split subset T = X:
update D for X:
min(X) = 000 is not in D; max value of 000 is 0
add 000 with class 0 to D
max(X) = 222 is not in D; min value of 222 is 3
add 222 with class 3 to D
D ∩ X is not homogeneous
so consider all the possible splits:
A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

D is now (attribute values, then class):
0 0 0  0
0 0 1  0
0 0 2  1
1 1 2  2
2 0 2  2
2 1 2  3
2 2 2  3
The entropy of each split

The split A1 ≤ 0 splits X into TL = [000,022] and TR = [100,222]

D ∩ TL (attribute values, then class):
0 0 0  0
0 0 1  0
0 0 2  1
Entropy = 0.92

D ∩ TR:
1 1 2  2
2 0 2  2
2 1 2  3
2 2 2  3
Entropy = 1

Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
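The numbers can be checked mechanically; note that the exact weighted average is 0.965, and the 0.97 on the slide comes from plugging in the rounded 0.92.

```python
# Entropy of the split A1 <= 0, weighted by how many elements of D
# fall into each half.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())

left = [0, 0, 1]        # classes of D ∩ TL
right = [2, 2, 3, 3]    # classes of D ∩ TR
avg = (3 * entropy(left) + 4 * entropy(right)) / 7
print(round(entropy(left), 2))  # 0.92
print(entropy(right))           # 1.0
print(round(avg, 3))            # 0.965
```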
Going on with the trace

The split with lowest entropy is A1 ≤ 0, so we go on with T = TL = [000,022]:

Try to split subset T = [000,022]:
update D for T:
min(T) = 000 is already in D
max(T) = 022 has minimum value 1, so it is added to D

D is now (attribute values, then class):
0 0 0  0
0 0 1  0
0 0 2  1
0 2 2  1
1 1 2  2
2 0 2  2
2 1 2  3
2 2 2  3

D ∩ T is not homogeneous, so we go on to consider
the following splits: A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1 (lowest entropy: A3 ≤ 1)
We now have the following tree:

A1 ≤ 0
  yes: A3 ≤ 1
    yes: ?
    no:  ?
  no: ?
Going on...

The split A3 ≤ 1 splits T into TL = [000,021] and TR = [002,022]
We go on with T = TL = [000,021]
Try to split subset T = [000,021]:
min(T) = 000 is already in D
max(T) = 021 has minimum value 0, so it is added to D
D ∩ T is homogeneous, so we stop and make T into a leaf with class value 0
Next, we go on with T = TR = [002,022], etc.
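The whole trace can be reproduced mechanically. Below is a runnable sketch of the tree-growing procedure for small discrete attribute spaces, under assumptions of my own: a subset T is represented as a box [lo, hi] of attribute vectors, ties between equally good splits go to the first attribute, and all names are illustrative rather than taken from Potharst & Bioch.

```python
# Runnable sketch of monotone tree induction (representation and
# naming choices are mine). D maps attribute vectors to classes and is
# extended by the update rule as the tree grows.
import math
from collections import Counter

def min_allowed(x, D, classes):
    below = [c for y, c in D.items() if all(a <= b for a, b in zip(y, x))]
    return max(below) if below else min(classes)

def max_allowed(x, D, classes):
    above = [c for y, c in D.items() if all(a >= b for a, b in zip(y, x))]
    return min(above) if above else max(classes)

def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())

def in_box(y, lo, hi):
    return all(l <= v <= h for l, v, h in zip(lo, y, hi))

def grow(lo, hi, D, classes):
    # update rule: pin down the corner elements of T = [lo, hi]
    if lo not in D:
        D[lo] = max_allowed(lo, D, classes)
    if hi not in D:
        D[hi] = min_allowed(hi, D, classes)
    seen = [c for y, c in D.items() if in_box(y, lo, hi)]
    if len(set(seen)) == 1:     # D ∩ T homogeneous: make a leaf
        return seen[0]
    best = None                 # pick the split Ai <= v with lowest entropy
    for i in range(len(lo)):
        for v in range(lo[i], hi[i]):
            hi_l = hi[:i] + (v,) + hi[i + 1:]
            lo_r = lo[:i] + (v + 1,) + lo[i + 1:]
            left = [c for y, c in D.items() if in_box(y, lo, hi_l)]
            right = [c for y, c in D.items() if in_box(y, lo_r, hi)]
            e = (len(left) * entropy(left) +
                 len(right) * entropy(right)) / (len(left) + len(right))
            if best is None or e < best[0]:
                best = (e, i, v, hi_l, lo_r)
    _, i, v, hi_l, lo_r = best
    return (i, v, grow(lo, hi_l, D, classes), grow(lo_r, hi, D, classes))

def predict(tree, x):
    while isinstance(tree, tuple):
        i, v, left, right = tree
        tree = left if x[i] <= v else right
    return tree

# The example dataset from the trace:
D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
tree = grow((0, 0, 0), (2, 2, 2), dict(D), classes=[0, 1, 2, 3])
print(tree[0], tree[1])          # root split: A1 <= 0
print(predict(tree, (0, 2, 1)))  # 0
print(predict(tree, (0, 2, 2)))  # 1
print(predict(tree, (2, 1, 0)))  # 3
```

On this dataset the sketch reproduces the trace: the root splits on A1 ≤ 0, its left child on A3 ≤ 1, and the leaves carry the classes 0, 1, 2, 2 and 3.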
Finally...

A1 ≤ 0
  yes: A3 ≤ 1
    yes: class 0
    no:  class 1
  no: A1 ≤ 1
    yes: class 2
    no:  A2 ≤ 0
      yes: class 2
      no:  class 3
A monotone tree for the Bankruptcy problem
• can be seen on p. 107 of the paper that was handed out with this course
• a tree with 6 leaves
• uses the same attributes as those that come up in an ordinal version of the rough set approach: see Viara Popova’s lecture
Conclusions and remaining problems
• We described an efficient algorithm for the induction of monotone decision trees, provided the dataset itself is monotone
• We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger
• What if we have noise in the dataset?
• Is it possible to repair by pruning?