A Fully Distributed Framework for Cost-sensitive Data Mining

Wei Fan, Haixun Wang, and Philip S. Yu
IBM T.J. Watson, Hawthorne, New York

Salvatore J. Stolfo
Columbia University, New York City, New York
Inductive Learning

Training Data:
($43.45, retail, 10025, 10040, ..., nonfraud)
($246.70, weapon, 10001, 94583, ..., fraud)

Learner (1. Decision trees, 2. Rules, 3. Naive Bayes, ...) → Classifier
Transaction → {fraud, nonfraud}

Test Data:
($99.99, pharmacy, 10013, 10027, ..., ?)
($1.00, gas, 10040, 00234, ..., ?)

Classifier → Class Labels (nonfraud, fraud)
Distributed Data Mining
- Data is inherently distributed across the network. Many credit card authorization servers are distributed, and data are collected at each individual site. Other examples include supermarket customer and transaction databases, hotel reservations, travel agencies, and so on.
- In some situations, data cannot even be shared. Many different banks have their own data servers. They would rather share the model, but cannot share the data for reasons such as privacy, legal, and competitive concerns.
Cost-sensitive Problems
- Charity donation: solicit people who will donate a large amount. It costs $0.68 to send a letter. With A(x) the donation amount, only solicit if A(x) > $0.68; otherwise we lose money.
- Credit card fraud detection: detect frauds with a high transaction amount. It costs $90 to challenge a potential fraud. With A(x) the fraudulent transaction amount, only challenge if A(x) > $90; otherwise we lose money.
Different Learning Frameworks
Fully Distributed Framework (training)

D1, D2, ..., Dk — data at k sites
ML1, ML2, ..., MLk — learners
C1, C2, ..., Ck — generate k models
Fully Distributed Framework (predicting)

D — test set
Sent to the k models C1, C2, ..., Ck
Compute k predictions P1, P2, ..., Pk
Combine into one prediction P
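The prediction flow above can be sketched in a few lines: the test example is sent to the k models, each returns a prediction, and the k outputs are combined into one. The stand-in lambda models and the averaging combiner are illustrative placeholders, not the paper's actual classifiers.

```python
# Sketch of the fully distributed prediction flow: dispatch x to k models,
# collect their predictions P1..Pk, and combine them into a single P.

def distributed_predict(x, models, combine):
    predictions = [m(x) for m in models]   # P1 ... Pk, one per site
    return combine(predictions)            # combined prediction P

# Toy stand-ins: k = 3 models, each emitting a probability estimate.
models = [lambda x: 0.5, lambda x: 0.75, lambda x: 1.0]
average = lambda ps: sum(ps) / len(ps)
print(distributed_predict(None, models, average))  # 0.75
```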
Cost-sensitive Decision Making

- Assume that b[l*, l] records the benefit received by predicting an example of class l* to be an instance of class l.
- The expected benefit received by predicting an example x to be an instance of class l (regardless of its true label) is E[b(l, x)] = Σ_{l*} p(l* | x) · b[l*, l].
- The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., L(x) = argmax_l E[b(l, x)].
- When b[l*, l] = 1 if l = l* and 0 otherwise, this is a traditional accuracy-based problem.
- Total benefits: the sum of the benefits received over all test examples.
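The expected-benefit decision rule above can be sketched directly. The benefit matrix and probability estimates below are illustrative values, not figures from the paper; the accuracy-based special case (benefit 1 for a correct prediction, 0 otherwise) is shown to confirm the rule reduces to picking the most probable label.

```python
# Minimal sketch of expected-benefit decision making.
# benefit[true_label][predicted_label] is the benefit matrix b[l*, l].

def expected_benefit(benefit, probs, predicted):
    """E[b(predicted, x)] = sum over true labels l* of p(l*|x) * b[l*][predicted]."""
    return sum(probs[true] * benefit[true][predicted] for true in probs)

def best_label(benefit, probs):
    """Optimal policy: choose the label maximizing the expected benefit."""
    return max(benefit, key=lambda l: expected_benefit(benefit, probs, l))

# Accuracy-based special case: b[l*, l] = 1 iff l = l*.
accuracy_benefit = {"fraud": {"fraud": 1, "nonfraud": 0},
                    "nonfraud": {"fraud": 0, "nonfraud": 1}}
probs = {"fraud": 0.3, "nonfraud": 0.7}
print(best_label(accuracy_benefit, probs))  # nonfraud: the more probable label
```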
Charity Donation Example

- It costs $0.68 to send a solicitation.
- Assume that y(x) is the best estimate of the donation amount.
- Cost-sensitive decision making will solicit an individual if and only if p(donate | x) · y(x) > $0.68.
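As a quick sketch of the rule above: solicit only when the expected donation exceeds the $0.68 mailing cost. The probability and amount estimates below are hypothetical model outputs, not data from the paper.

```python
# Charity decision rule: solicit iff p(donate|x) * y(x) > $0.68.

MAILING_COST = 0.68

def should_solicit(p_donate, estimated_amount):
    """Expected donation must exceed the cost of sending the letter."""
    return p_donate * estimated_amount > MAILING_COST

print(should_solicit(0.10, 15.00))  # True: expected donation $1.50 > $0.68
print(should_solicit(0.05, 10.00))  # False: expected donation $0.50 < $0.68
```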
Credit Card Fraud Detection Example

- It costs $90 to challenge a potential fraud.
- Assume that y(x) is the transaction amount.
- The cost-sensitive decision-making policy will predict a transaction to be fraudulent if and only if p(fraud | x) · y(x) > $90.
Adult Dataset

- Downloaded from the UCI database.
- Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
- The decision to predict positive is 2 · p(+ | x) > 1 · p(− | x), i.e., p(+ | x) > 1/3.
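Under the expected-benefit rule with this benefit matrix (correct positives earn 2, correct negatives earn 1, mistakes earn 0), predicting positive pays off whenever 2·p(+|x) exceeds 1·(1 − p(+|x)), which works out to p(+|x) > 1/3. A sketch:

```python
# Adult decision rule derived from the benefit factors above:
# predict positive iff 2 * p(+|x) > 1 * p(-|x), i.e. p(+|x) > 1/3.

def predict_positive(p_pos):
    """Compare expected benefit of predicting positive vs. negative."""
    return 2 * p_pos > 1 - p_pos

print(predict_positive(0.4))  # True:  0.8 > 0.6
print(predict_positive(0.3))  # False: 0.6 < 0.7
```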
Calculating Probabilities

- For decision trees, if n is the number of examples in a node and k is the number of examples with class label l, then the probability is k/n. More sophisticated methods: smoothing, early stopping, and early stopping plus smoothing.
- For rules, the probability is calculated in the same way as for decision trees.
- For naive Bayes, if s(l) is the score for class label l, then p(l | x) = s(l) / Σ_l' s(l'); binning can also be used.
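The probability estimates above can be sketched as follows. The raw decision-tree leaf estimate is k/n; the smoothed variant shown is Laplace smoothing, one common choice (the exact smoothing formula used in the paper may differ), and the naive Bayes scores are simply normalized.

```python
# Probability estimates from model outputs, as described above.

def leaf_probability(k, n, smooth=False, num_classes=2):
    """P(label | leaf): k matching examples out of n in the leaf."""
    if smooth:
        # Laplace smoothing, a common choice (assumption, not the paper's formula)
        return (k + 1) / (n + num_classes)
    return k / n

def naive_bayes_probability(scores, label):
    """Normalize per-class scores s(l) into probabilities."""
    return scores[label] / sum(scores.values())

print(leaf_probability(3, 10))               # 0.3
print(leaf_probability(3, 10, smooth=True))  # 4/12, about 0.333
print(naive_bayes_probability({"fraud": 2.0, "nonfraud": 6.0}, "fraud"))  # 0.25
```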
Combining Technique: Averaging

- Each model computes an expected benefit for example x over every class label.
- The individual expected benefits are combined by averaging.
- We choose the label with the highest combined expected benefit.
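The averaging combiner can be sketched directly: each site's model contributes an expected benefit per label, the combined benefit is their mean, and the prediction is the argmax. The per-model numbers below are illustrative.

```python
# Averaging combiner over k models' expected-benefit estimates.

def combine_by_averaging(per_model_benefits):
    """per_model_benefits: list of {label: expected benefit} dicts, one per site."""
    labels = per_model_benefits[0].keys()
    k = len(per_model_benefits)
    combined = {l: sum(m[l] for m in per_model_benefits) / k for l in labels}
    # Choose the label with the highest combined expected benefit.
    return max(combined, key=combined.get), combined

benefits = [{"fraud": 120.0, "nonfraud": 0.0},
            {"fraud": 40.0,  "nonfraud": 0.0},
            {"fraud": 95.0,  "nonfraud": 0.0}]
label, combined = combine_by_averaging(benefits)
print(label, combined["fraud"])  # fraud 85.0
```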
Why is accuracy higher?

1. Decision threshold line.
2. Examples on the left are more profitable than those on the right.
3. "Evening effect": biases towards big fish.
Partially Distributed Combining Techniques

- Regression: treat the base classifiers' outputs as independent variables of a regression and the true label as the dependent variable.
- Modified meta-learning: learn a classifier that maps the base classifiers' class-label predictions to the true class label. For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
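As a sketch of the meta-learning idea: a top-level model maps the tuple of base classifiers' predictions to a probability over true labels. Here the "meta classifier" is just a frequency table built from hypothetical validation data; the actual framework trains a full learner at the top level, so this is only a toy stand-in.

```python
from collections import Counter, defaultdict

def train_meta(base_predictions, true_labels):
    """Estimate P(true label | tuple of base predictions) by counting."""
    counts = defaultdict(Counter)
    for preds, label in zip(base_predictions, true_labels):
        counts[tuple(preds)][label] += 1
    # Normalize counts into per-tuple probability distributions.
    return {key: {l: c / sum(ctr.values()) for l, c in ctr.items()}
            for key, ctr in counts.items()}

# Hypothetical validation data: predictions from 2 base models + true label.
base_preds = [("fraud", "fraud"), ("fraud", "nonfraud"),
              ("fraud", "fraud"), ("nonfraud", "nonfraud")]
truth = ["fraud", "nonfraud", "fraud", "nonfraud"]
meta = train_meta(base_preds, truth)
print(meta[("fraud", "fraud")]["fraud"])  # 1.0: both models agreeing on fraud
```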
Communication Overhead Summary
Experiments

- Decision tree learner: C4.5 version 8
- Datasets: Donation, Credit Card, Adult
Accuracy comparison
Accuracy comparison
Accuracy comparison
Detailed Spread
Credit Card Fraud Dataset
Adult Dataset
Why is accuracy higher?
Summary and Future Work
- Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
- Averaging, although simple, has the highest accuracy.
- Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
- Future work: ensemble pruning and performance estimation.