Fuzzy Interpretation of Discretized Intervals
Author: Dr. Xindong Wu
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 7, NO. 6, DECEMBER 1999
Presented by: Gong Chen
Outline
• Concepts Review
• Overview
• Problem
• Solution
• Related Techniques
• Algorithms Design in HCV
• Experimental Results
• Conclusions
• Answers for Final Exam
Concepts Review
• Induction: generalize rules from training data
• Deduction: apply the generalized rules to testing data
• Three possible results of deduction:
  – Single match
  – No match
  – Multiple match
Concepts Review
• Discretization of Continuous domains
– Continuous numerical domains can be discretized into intervals
– The discretized intervals can be treated as nominal values
Concepts Review
• Using the Information Gain Heuristic (IGH) for discretization (employed by HCV):
  – Candidate cut points: $x = (x_i + x_{i+1})/2$ for $i = 1, \dots, n-1$
  – $x$ is a possible cut point if $x_i$ and $x_{i+1}$ are of different classes
  – Use IGH to find the best $x$
  – Recursively split on the left and right
  – Stop recursive splitting when a stopping criterion is met
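The discretization steps on this slide can be sketched in Python as follows. This is a minimal illustration rather than HCV's actual code: the function names are hypothetical, and the stopping rule (a `min_gain` threshold) is an assumption, since the slide only says that some criterion is used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain) for the midpoint with the highest information gain.

    Only midpoints between adjacent examples of different classes are tried.
    Returns (None, 0.0) when no such midpoint exists.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([c for _, c in pairs])
    best_cut, best_gain = None, 0.0
    for (x_i, c_i), (x_next, c_next) in zip(pairs, pairs[1:]):
        if c_i == c_next or x_i == x_next:
            continue
        cut = (x_i + x_next) / 2
        left = [c for x, c in pairs if x <= cut]
        right = [c for x, c in pairs if x > cut]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain


def discretize(values, labels, min_gain=1e-9):
    """Recursively collect cut points; min_gain is an assumed stopping rule."""
    cut, gain = best_cut_point(values, labels)
    if cut is None or gain < min_gain:
        return []
    left = [(x, c) for x, c in zip(values, labels) if x <= cut]
    right = [(x, c) for x, c in zip(values, labels) if x > cut]
    return discretize(*zip(*left), min_gain) + [cut] + discretize(*zip(*right), min_gain)
```

For example, `discretize([18, 25, 35, 50, 60, 70], ["young"] * 3 + ["old"] * 3)` yields a single cut point halfway between 35 and 50.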
Overview
• Training Data → Discretization → Induction → Rules
• Testing Data → Deduction → No match / Single match / Multiple match
• Fuzzy Borders
Problem
• Discretization of continuous domains does not always yield an accurate interpretation!
• Recall that Information Gain is a heuristic measure applied to the training data, so the intervals it produces cannot be expected to fit “data in the real world” exactly.
• Example
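Problem
• Heuristic 1 (e.g. Information Gain)
• Heuristic 2 (e.g. Gain Ratio)
[Figure: two age axes with intervals labeled "young" and "old". The Heuristic 1 axis is marked at 18, 35, and 49; the Heuristic 2 axis is marked at 18, 35, and 50. The test value 49.49 is marked on both axes.]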
Problem
• Suppose that after induction we get just one rule:
  – If (age = old) then Class = MORE_EXPERIENCE
• According to Heuristic 2:
  – Instance (age = 49.49) → No match!
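The no-match case can be reproduced with a small sketch of sharp-border rule matching. The `Rule` class, the `matching_rules` helper, and the concrete borders for "old" (50 inclusive up to a hypothetical 120) are assumptions made for illustration; the slides give only the lower border of 50 under Heuristic 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    attribute: str
    low: float    # inclusive lower border of the interval
    high: float   # exclusive upper border of the interval
    label: str    # class predicted when the rule fires


def matching_rules(example, rules):
    """Return the rules whose sharp interval contains the example's value."""
    return [r for r in rules if r.low <= example[r.attribute] < r.high]


# "old" under Heuristic 2: the upper border 120 is a hypothetical placeholder.
rules = [Rule("age", 50.0, 120.0, "MORE_EXPERIENCE")]

print(matching_rules({"age": 49.49}, rules))  # empty list: no match
print(matching_rules({"age": 55.0}, rules))   # one rule: single match
```

With sharp borders, a value just below the cut point matches nothing, even though it is very close to the interval the rule covers.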
Solution
• A safer way to describe age = 49.49 is to say: to some degree it is young, and to some degree it is old,
• rather than using one assertion that states definitively that it is young or old.
• Thus, to some degree, it can obtain a rule and a classification result instead of no match.
  – No match → single match or multiple match with some degree
• This is the so-called fuzzy match!
Solution
• “Fuzziness is a type of deterministic uncertainty. It describes the event class ambiguity.”
• “Fuzziness works when there are the outcomes that belong to several event classes at the same time but to different degrees.”
• “Fuzziness measures the degree to which an event occurs.”
– Jim Bezdek, Didier Dubois, Bart Kosko, Henri Prade
Solution
• “To some degree”?
  – A membership function describes the “degree”.
  – A membership function tells you to what degree an event belongs to a class.
  – A membership function calculates this degree.
• Three widely used membership functions are employed by HCV:
  – Linear
  – Polynomial
  – Arctan
Solution
• Linear membership function
[Figure: an interval from $x_{left}$ to $x_{right}$ of length $l$, with a spread of $sl$ at each border.]
  – $k = 1/(2sl)$; $a = -k\,x_{left} + 1/2$; $b = k\,x_{right} + 1/2$
  – $lin_{left}(x) = kx + a$
  – $lin_{right}(x) = -kx + b$
  – $lin(x) = \max(0, \min(1, lin_{left}(x), lin_{right}(x)))$
• $s$ is a user-specified parameter, e.g. 0.1 indicates that the interval spreads out into the adjacent intervals by 10% of its original length at each end.
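The linear membership function translates almost directly into code. The sketch below follows the slide's formulas; the function name and the example interval for "old" (50 to 70) are hypothetical, and it assumes $s > 0$ and $x_{right} > x_{left}$.

```python
def linear_membership(x, x_left, x_right, s):
    """Degree (0..1) to which x belongs to the interval [x_left, x_right].

    s is the user-specified spread: each border is fuzzified over a band of
    half-width s * l around it, where l = x_right - x_left.
    """
    length = x_right - x_left
    k = 1.0 / (2.0 * s * length)
    a = -k * x_left + 0.5
    b = k * x_right + 0.5
    lin_left = k * x + a      # rises from 0 to 1 across the left border
    lin_right = -k * x + b    # falls from 1 to 0 across the right border
    return max(0.0, min(1.0, lin_left, lin_right))


# Hypothetical interval "old" = [50, 70] with s = 0.1, so each border
# spreads 2 units to either side.
for age in (48.0, 49.49, 50.0, 52.0, 60.0):
    print(age, linear_membership(age, 50.0, 70.0, 0.1))
```

Under these assumed borders, age 49.49 receives a membership strictly between 0 and 0.5 in "old" instead of being rejected outright, which is exactly the "to some degree" behaviour the slides describe.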
Solution
• Polynomial membership function: uses a smoother curve in place of the linear function.
• Arctan membership function
• Experimental results show no significant difference among the three kinds of functions, so the polynomial membership function is chosen.
Solution
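$poly_{side}(x) = a_{side}x^3 + b_{side}x^2 + c_{side}x + d_{side}$, with $side \in \{left, right\}$

$a_{side} = \dfrac{1}{4(ls)^3}$
$b_{side} = -3a_{side}x_{side}$
$c_{side} = 3a_{side}\left(x_{side}^2 - (ls)^2\right)$
$d_{side} = -a\left(x_{side}^3 - 3x_{side}(ls)^2 + 2(ls)^3\right)$

$$
poly(x) =
\begin{cases}
poly_{left}(x), & \text{if } x_{left} - ls \le x \le x_{left} + ls \\
poly_{right}(x), & \text{if } x_{right} - ls \le x \le x_{right} + ls \\
1, & \text{if } x_{left} + ls \le x \le x_{right} - ls \\
0, & \text{otherwise}
\end{cases}
$$

$poly(x)$ gives the degree to which $x$ belongs to an interval.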
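A polynomial border can be sketched as a cubic that is 0 at $x_{side} - ls$, 1/2 at $x_{side}$, and 1 at $x_{side} + ls$, with zero slope at both ends. The sign conventions of the coefficients on this slide could not be confirmed from the extracted text, so the sketch does not reuse them term by term; it keeps the same leading magnitude $1/(4(ls)^3)$ and picks the signs so that these boundary conditions hold. The function names and the example interval are hypothetical, and the sketch assumes $0 < s < 0.5$ so that the two border bands do not overlap.

```python
def cubic_rise(x, x_side, spread):
    """Smooth cubic step: 0 at x_side - spread, 0.5 at x_side, 1 at x_side + spread."""
    u = x - x_side
    return 0.5 + 3.0 * u / (4.0 * spread) - u ** 3 / (4.0 * spread ** 3)


def poly_membership(x, x_left, x_right, s):
    """Degree (0..1) to which x belongs to [x_left, x_right] with cubic borders."""
    spread = s * (x_right - x_left)
    if x_left - spread <= x <= x_left + spread:
        return cubic_rise(x, x_left, spread)          # rising left border
    if x_right - spread <= x <= x_right + spread:
        return 1.0 - cubic_rise(x, x_right, spread)   # falling right border
    if x_left + spread < x < x_right - spread:
        return 1.0                                    # core of the interval
    return 0.0


# Hypothetical interval "old" = [50, 70] with s = 0.1.
for age in (48.0, 49.49, 50.0, 52.0, 60.0, 70.0, 72.0):
    print(age, poly_membership(age, 50.0, 70.0, 0.1))
```

Compared with the linear version, the cubic flattens out near both ends of the border band, so membership changes more gradually close to 0 and 1.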
Related Techniques
– No match
  • Largest Class
    – Assign all no-match examples to the largest class (the default class)
– Multiple match
  • Largest Rule
    – Assign each example to the rule that covers the largest number of examples
  • Estimate of Probability
    – Fuzzy borders can themselves produce multiple-match conflicts, so a hybrid method is desired for the whole process
Related Techniques
• Estimate of Probability
[Formula not recovered from the slide. Its annotations read: "number of examples in the training set covered by conj"; "the probability that e belongs to class Ci"; "Conj1 and Conj2 are two rules supporting that e belongs to Ci".]
Algorithms Design in HCV
• HCV (Large)
  – No match: Largest Class
  – Multiple match: Largest Rule
• HCV (Fuzzy)
  – No match: Fuzzy Match
  – Multiple match: Fuzzy Match
• HCV (Hybrid)
  – No match: Fuzzy Match
  – Multiple match: Estimate of Probability
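The three variants differ only in which fallback handles the no-match and multiple-match cases, which suggests a deduction routine with pluggable strategies. The sketch below is an illustration under assumptions, not HCV's implementation: the function names are hypothetical, a "match" is represented as a (class label, number of training examples covered) pair, and only the two HCV (Large) strategies are implemented. Fuzzy Match and Estimate of Probability would be supplied as further callables.

```python
from collections import Counter


def largest_class(training_labels):
    """Largest Class: the most frequent class in the training data."""
    return Counter(training_labels).most_common(1)[0][0]


def largest_rule(matches):
    """Largest Rule: the class of the matching rule with the greatest coverage."""
    return max(matches, key=lambda match: match[1])[0]


def deduce(matches, on_no_match, on_multiple_match):
    """Classify one example from the rules it matched.

    matches: list of (label, coverage) pairs for the rules the example satisfies.
    on_no_match: zero-argument callable used when no rule matches.
    on_multiple_match: callable taking the matches when several rules match.
    """
    if not matches:
        return on_no_match()
    if len(matches) == 1:
        return matches[0][0]
    return on_multiple_match(matches)


# HCV (Large) configuration on toy data (labels and coverages are made up).
training_labels = ["A", "A", "B"]


def hcv_large(matches):
    return deduce(matches, lambda: largest_class(training_labels), largest_rule)


print(hcv_large([]))                      # no match: falls back to the largest class
print(hcv_large([("B", 5)]))              # single match: the rule's own class
print(hcv_large([("A", 3), ("B", 7)]))    # multiple match: the larger rule wins
```

Keeping the strategies as plain callables means a hybrid configuration would only need to swap the two arguments passed to `deduce`.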
Experimental Results
• Data:
  – 17 datasets from the UCI Machine Learning Repository
  – Why these were selected:
    1) Numerical data
    2) Situations where no rules clearly apply
• Test conditions
  – All 68 parameters in HCV are left at their defaults except the deduction strategy
  – Parameters for C4.5 and NewID are those recommended by their respective inventors
Experimental Results

| Dataset | HCV (hybrid) | HCV (large) | HCV (fuzzy) | C4.5 (R 8) | C4.5 (R 5) | NewID |
|---|---|---|---|---|---|---|
| Anneal | 98.00% | 93.00% | 93.00% | 95.00% | 93.00% | 81.00% |
| Bupa | 57.60% | 55.90% | 55.90% | 71.20% | 61.00% | 73.00% |
| Cleveland 2 | 78.00% | 68.10% | 73.60% | 71.40% | 76.90% | 67.00% |
| Cleveland 5 | 54.90% | 56.00% | 52.70% | 51.60% | 56.00% | 47.30% |
| CRX | 82.50% | 72.50% | 82.00% | 83.00% | 80.00% | 79.00% |
| Glass (w/out ID) | 72.30% | 60.00% | 60.00% | 71.50% | 64.60% | 66.00% |
| Hungarian 2 | 86.30% | 85.00% | 85.00% | 81.20% | 80.00% | 78.00% |
| Hypothyroid | 97.80% | 86.30% | 96.30% | 99.40% | 99.40% | 92.00% |
| Imports 85 | 62.70% | 59.30% | 61.00% | 61.00% | 67.80% | 61.00% |
| Ionosphere | 88.00% | 81.20% | 81.20% | 86.30% | 85.50% | 82.00% |
| Labor Neg | 76.50% | 76.50% | 76.50% | 82.40% | 82.40% | 65.00% |
| Pima | 73.90% | 69.10% | 69.10% | 73.50% | 75.50% | 73.00% |
| Swiss 2 | 96.90% | 96.90% | 96.90% | 96.90% | 96.90% | 97.00% |
| Swiss 5 | 28.10% | 25.00% | 28.10% | 40.60% | 31.20% | 22.00% |
| Va 2 | 78.90% | 78.90% | 78.90% | 77.50% | 70.40% | 77.00% |
| Va 5 | 28.20% | 25.40% | 29.60% | 31.00% | 26.80% | 20.00% |
| Wine | 90.40% | 76.90% | 76.90% | 90.40% | 90.00% | 90.40% |
Experimental Results
• Predictive accuracy
  – HCV (hybrid) outperforms the others in 9 datasets
  – HCV (large): 3 datasets
  – HCV (fuzzy): 2 datasets
  – C4.5 (R 8): 7 datasets
  – C4.5 (R 5): 6 datasets
  – NewID: 3 datasets
  – HCV (hybrid) clearly and significantly outperforms the other interpretation techniques in HCV for datasets with numerical data in the “no match” and “multiple match” cases.
• C4.5 and NewID are included for reference, not for extensive comparison.
Conclusion
• Fuzziness is strongly domain dependent; HCV allows users to specify their own intervals and fuzzy functions.
  – An important direction to take with specific domains
• The fuzzy borders design combined with probability estimation achieves better results in terms of predictive accuracy.
  – Applicable to other machine learning and data mining algorithms
Answers for Final Exam Problems
• Q1: When doing deduction on real-world data, what are the three possible cases for each test example?
  – Single match
  – No match
  – Multiple match
• Q2: Of the three cases during deduction, which ones does the HCV hybrid interpretation algorithm use fuzzy borders to classify?
  – No match
• Q3: In the hybrid interpretation algorithm used in HCV,
  – When are sharp borders set up?
    • “Sharp borders are set up as usual during induction”
  – When are fuzzy borders defined?
    • In deduction, “only in the no match case, fuzzy borders are set up in order to find a rule which is closest to the test example in question”
Thank You!