A Novel Local Patch Framework for Fixing Supervised Learning Models
Yilei Wang¹, Bingzheng Wei², Jun Yan², Yang Hu², Zhi-Hong Deng¹, Zheng Chen²
¹Peking University  ²Microsoft Research Asia
Outline
- Motivation & Background
- Problem Definition & Algorithm Overview
- Algorithm Details
- Experiments - Classification
- Experiments - Search Ranking
- Conclusion
Motivation & Background
Supervised Learning: the machine learning task of inferring a function from labeled training data.
Prediction Error: no matter how strong a learning model is, it will suffer from prediction errors, caused by noise in the training data, dynamically changing data distributions, and weaknesses of the learner.
Feedback from Users: a good signal for learning models to find their limitations and then improve accordingly.
Learning to Fix Errors from Failure Cases
Goal: automatically fix model prediction errors using the failure cases in feedback data.
Input:
- A well-trained supervised model (which we call the mother model)
- A collection of failure cases from a feedback dataset
Output:
- A model that automatically fixes the mother model's bugs on the failure cases
Previous Works
- Model Retraining
- Model Aggregation
- Incremental Learning
Local Patching: from Global to Local
Learning models are generally optimized globally, so fixing the old prediction errors can introduce new ones.
Our key idea: learn to fix the model locally using patches.
(Figure: a globally retrained model introduces new errors elsewhere; a local patch does not.)
Problem Definition
Our proposed Local Patch Framework (LPF) aims to learn a new model combining:
- the original mother model
- a set of patch models
- a Gaussian weight K_i for each patch, defined by a centroid z_i and a range σ_i:

K_i(x) = exp( −‖x − z_i‖² / (2σ_i²) )
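As a minimal sketch of how such patches could be applied at prediction time, assuming an additive combination of the mother model and the Gaussian-weighted patch outputs (the exact combination rule in the paper may differ):

```python
import numpy as np

def gaussian_weight(x, z, sigma):
    """K_i(x) = exp(-||x - z_i||^2 / (2 sigma_i^2))"""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def patched_predict(x, mother, patches):
    """Mother model plus locally weighted patch corrections.

    patches: list of (centroid z_i, range sigma_i, patch model g_i),
    where g_i is any callable producing a correction for input x.
    """
    y = mother(x)
    for z, sigma, g in patches:
        y += gaussian_weight(x, z, sigma) * g(x)
    return y
```

Far from every patch centroid the weights vanish and the mother model's prediction is returned essentially unchanged, which is what keeps the fix local.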
Algorithm Overview
1. Failure Case Collection
2. Learning Patch Regions (Failure Case Clustering): cluster the failure cases into N groups through subspace learning, compute the centroid and range for every group, then define the patches.
3. Learning Patch Models: learn each patch model using only the data samples that are sufficiently close to the patch centroid.
Learning Patch Regions - Key Challenge
Failure cases may be distributed diffusely.
- Small N = large patch ranges → many success cases will also be patched
- Big N = small patch ranges → high computational complexity
How to make the trade-off?
Solution: Clustered Metric Learning
Our solution to diffusion: metric learning.
Learn a distance metric, i.e. a subspace, for the failure cases, such that similar failure cases aggregate while staying distant from the success cases.
(Red circles = failure cases; blue circles = success cases)
Key idea of patch model learning:
- (Left): The cases in the original data space.
- (Middle): The cases mapped to the learned subspace.
- (Right): The failure cases repaired using a single patch.
Metric Learning
- Define a conditional distribution over case pairs in the learned subspace.
- Define an ideal distribution over the desired neighbor assignments.
- Learn the metric so that the conditional distribution satisfies the ideal one, which is equivalent to maximizing the likelihood under the ideal distribution.
Clustered Metric Learning Algorithm:
1. Initialize each failure case with a random group.
2. Repeat the following steps:
   a) For the given clusters, perform a metric-learning step.
   b) Update the centroids of the groups, and re-assign each failure case to its closest centroid.
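These alternating steps can be sketched as follows; the metric-learning step here is a crude stand-in (per-dimension whitening by within-cluster variance) for the paper's subspace learning, which also uses the success cases:

```python
import numpy as np

def clustered_metric_learning(X_fail, n_clusters, n_iters=10, seed=0):
    """Alternate a (simplified) metric step with k-means-style
    centroid updates and reassignment over the failure cases."""
    rng = np.random.default_rng(seed)
    n, d = X_fail.shape
    labels = rng.integers(0, n_clusters, size=n)   # 1. random groups
    w = np.ones(d)                                 # diagonal metric
    centroids = X_fail[rng.choice(n, n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        # a) metric step (stand-in): down-weight dimensions with high
        #    within-cluster variance so similar failure cases aggregate
        within = np.zeros(d)
        for k in range(n_clusters):
            members = X_fail[labels == k]
            if len(members) > 1:
                within += members.var(axis=0)
        w = 1.0 / np.sqrt(within + 1e-8)
        # b) update centroids, then re-assign each failure case to its
        #    closest centroid under the learned metric
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = X_fail[labels == k].mean(axis=0)
        dist = np.linalg.norm((X_fail[:, None, :] - centroids[None]) * w, axis=2)
        labels = dist.argmin(axis=1)
    return labels, centroids, w
```

The resulting centroids and per-cluster spreads then define the patch centroids z_i and ranges σ_i.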
Local Patch Region: for each cluster i, we define a corresponding patch with z_i as its centroid and σ_i² as its variance, giving the Gaussian weight K_i(x).
Learning Patch Model
Objective: fit the patch model on the feedback samples near the patch, where the patch-model parameters and the sample labels define the loss.
Update: the parameters are updated by gradient steps on this objective.
Notice: the update rule is dependent on the specific patch model.
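A minimal sketch of such a locally weighted fit, assuming a logistic-regression patch trained by gradient descent with each feedback sample weighted by its Gaussian closeness to the patch centroid (both the logistic patch and this weighting scheme are illustrative choices, not the paper's exact objective):

```python
import numpy as np

def train_patch_model(X, y, z, sigma, lr=0.5, n_iters=200):
    """Fit a logistic-regression patch on feedback samples,
    each weighted by its Gaussian weight K_i(x)."""
    # Gaussian weight of every sample w.r.t. the patch centroid z
    K = np.exp(-np.sum((X - z) ** 2, axis=1) / (2.0 * sigma ** 2))
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability
        g = K * (p - y)                         # per-sample weighted error
        w -= lr * (X.T @ g) / len(y)            # gradient step on weights
        b -= lr * g.mean()                      # gradient step on bias
    return w, b
```

Samples far from the centroid receive weight near zero, so the patch model only learns the local behavior around its region.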
Experiments - Classification
Dataset:
- Three randomly selected UCI datasets: Spambase, Waveform, Optical Digit Recognition
- Converted to binary classification; ~5000 instances in each dataset
- Split: 60% training, 20% feedback, 20% test
Baseline Algorithms:
- SVM
- Logistic Regression
- SVM retrained with training + feedback data
- Logistic Regression retrained with training + feedback data
- SVM with Incremental Learning
- Logistic Regression with Incremental Learning
Classification Accuracy

Classification accuracy on the feedback dataset:

Dataset    SVM     SVM+LPF  LR      LR+LPF
Spam       0.8230  0.8838   0.9055  0.9283
Wave       0.7270  0.8670   0.8600  0.8850
Optdigit   0.9066  0.9724   0.9306  0.9689

Classification accuracy on the test dataset:

Dataset    SVM     SVM-Retrain  SVM-IL  SVM+LPF  LR      LR-Retrain  LR-IL   LR+LPF
Spam       0.8196  0.8348       0.8478  0.8587   0.9152  0.9174      0.9185  0.9217
Wave       0.7530  0.7780       0.7850  0.8620   0.8460  0.8600      0.8770  0.8800
Optdigit   0.9101  0.9128       0.9217  0.9635   0.9332  0.9368      0.9388  0.9413
Experiments - Search Ranking
Dataset:
- Data from a commonly used commercial search engine
- ~14,126 <q, d> pairs with 5-grade labels
Metrics: NDCG@K, K ∈ {1, 3, 5}
Baseline Algorithms: GBDT, GBDT + Incremental Learning
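The NDCG@K metric above can be computed per query as below; `ndcg_at_k` is a hypothetical helper using the standard exponential-gain formulation (2^label − 1), which may differ in details from the engine's internal metric:

```python
import numpy as np

def ndcg_at_k(labels, scores, k):
    """NDCG@K for a single query.

    labels: graded relevance judgments (e.g. the 5 grades 0-4)
    scores: the ranker's predicted scores for the same documents
    """
    order = np.argsort(scores)[::-1][:k]                 # top-k by score
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    dcg = np.sum((2.0 ** labels[order] - 1) * discounts)
    ideal = np.sort(labels)[::-1][:k]                    # top-k by label
    idcg = np.sum((2.0 ** ideal - 1) * discounts[:len(ideal)])
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking scores 1.0; any misordering of differently graded documents scores strictly less.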
Experiment Results - Ranking

Metric   GBDT    GBDT+IL  GBDT+LPF
NDCG@1   0.9115  0.9122   0.9422
NDCG@3   0.8837  0.8910   0.9149
NDCG@5   0.8790  0.8873   0.9090
Conclusion
We proposed:
- The local model fixing problem
- A novel patch framework for fixing the failure cases in the feedback dataset from a local view
The experimental results demonstrate the effectiveness of our proposed Local Patch Framework.