Learning Similarity Functions from Qualitative Feedback
DESCRIPTION
The performance of a case-based reasoning system often depends on the suitability of an underlying similarity (distance) measure, and specifying such a measure by hand can be very difficult. In this paper, we therefore develop a machine learning approach to similarity assessment. More precisely, we propose a method that learns how to combine given local similarity measures into a global one. As training information, the method merely assumes qualitative feedback in the form of similarity comparisons, revealing which of two candidate cases is more similar to a reference case. Experimental results, focusing on the ranking performance of this approach, are very promising and show that good models can be obtained with a reasonable amount of training information. See more at www.chengweiwei.com

TRANSCRIPT
Learning Similarity Functions from Qualitative Feedback
Weiwei Cheng and Eyke Hüllermeier, University of Marburg, Germany
Introduction
Proper definition of similarity (distance) measures is crucial for CBR systems.
The specification of local similarity measures, pertaining to individual properties (attributes) of a case, is often less difficult than their combination into a global measure.
Goal of this work: using machine learning techniques to support the elicitation of similarity measures (combining local measures into a global one) on the basis of qualitative feedback.
Problem Setting
Local-global principle: The global distance is an aggregation of local distances
For now, we focus on a linear model (a code sketch follows below):

$$\Delta(a, b) = \sum_{i=1}^{d} w_i \, \delta_i(a, b), \qquad w_i \geq 0 \ \text{(monotonicity)}$$
… easy to incorporate background knowledge
… amenable to an efficient learning scheme
… non-linear extension via kernelization
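To make the model concrete, here is a minimal sketch, assuming for illustration that cases are numeric feature vectors and that each local distance δ_i is a plain absolute difference (both choices are hypothetical, not prescribed by the approach):

```python
import numpy as np

def local_distances(a, b):
    # Vector of local distances (delta_1, ..., delta_d); the absolute
    # feature-wise difference is just an illustrative choice.
    return np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

def global_distance(w, a, b):
    # Linear local-global aggregation: Delta(a, b) = sum_i w_i * delta_i(a, b).
    # Monotonicity requires nonnegative weights.
    assert np.all(w >= 0), "monotonicity requires w_i >= 0"
    return float(np.dot(w, local_distances(a, b)))

w = np.array([0.5, 0.2, 0.3])                      # hypothetical weights
print(global_distance(w, [1, 0, 2], [0, 0, 1]))    # 0.5*1 + 0.2*0 + 0.3*1 = 0.8
```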
Problem Setting cont.
Learning the weights from qualitative feedback: a training example $(a, b, c)$ means "case a is more similar to b than to c", i.e., $\Delta(a, b) < \Delta(a, c)$.

Given a query $q$, a distance measure induces a linear order on the cases: $x \preceq_q y \Leftrightarrow \Delta(q, x) \leq \Delta(q, y)$.

Notice: Often the ordering of cases is more important than the distances themselves, so it is sufficient to find a weight vector $w$ whose induced ranking is correct; for instance, $w$ and $2w$ yield different distances but exactly the same ordering.
The Learning Algorithm
Basic idea: From distance learning to classification
Extension 1: Incorporating monotonicity
Extension 2: Ensemble learning
Extension 3: Active learning
From Distance Learning to Classification
[Diagram: feedback on cases from the CASE BASE is encoded as a d-dim. vector, i.e., as a classification example.]
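In code, the reduction can be sketched as follows: a feedback triple $(a, b, c)$ demands $\Delta(a, b) < \Delta(a, c)$, i.e., $\langle w, z \rangle > 0$ for the difference vector $z = \delta(a, c) - \delta(a, b)$, which is exactly a linear classification constraint (reusing the illustrative local_distances helper from above):

```python
import numpy as np

def triplets_to_classification(triplets):
    # "a is more similar to b than to c"  =>  <w, z> > 0 for
    # z = delta(a, c) - delta(a, b); emit (z, +1) and (-z, -1).
    X, y = [], []
    for a, b, c in triplets:
        z = local_distances(a, c) - local_distances(a, b)
        X.append(z)
        y.append(+1)
        X.append(-z)
        y.append(-1)
    return np.array(X), np.array(y)
```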
Monotonicity

Our model requires that when a local distance increases, the global distance cannot decrease; with the linear model, this amounts to $w_i \geq 0$ for all $i$.

Our approach: (noise-tolerant) perceptron learning with a modified update rule that keeps the weight vector nonnegative.

The modified algorithm provably converges after a finite number of iterations.
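A minimal sketch of such a learner, assuming the simplest monotonicity-preserving variant: a standard perceptron update followed by clipping negative weights to zero (the paper's noise-tolerant rule may differ in its details):

```python
import numpy as np

def train_monotone_perceptron(X, y, epochs=100):
    # Perceptron on the classification data from the reduction above;
    # after every update the weights are projected back onto w_i >= 0,
    # so the induced global distance stays monotone.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for z, label in zip(X, y):
            if label * np.dot(w, z) <= 0:            # misclassified example
                w = np.maximum(w + label * z, 0.0)   # update, then clip
                mistakes += 1
        if mistakes == 0:                            # all constraints satisfied
            break
    return w
```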
Ensemble Learning
[Diagram: hypothesis space ⊃ version space; the center of mass (CoM) of the version space is the Bayes point, approximated by a committee.]

Ensemble of perceptrons: training on several permutations of the training data yields a committee that approximates the CoM of the version space, i.e., the Bayes point.
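A minimal sketch of the committee, assuming the Bayes point is approximated by simply averaging the weight vectors learned on shuffled copies of the training data (the averaging scheme is an illustrative assumption):

```python
import numpy as np

def train_ensemble(X, y, n_members=11, seed=0):
    # Train one perceptron per random permutation of the training data;
    # the mean weight vector approximates the center of mass of the
    # version space, i.e., the Bayes point.
    rng = np.random.default_rng(seed)
    members = [
        train_monotone_perceptron(X[perm], y[perm])
        for perm in (rng.permutation(len(X)) for _ in range(n_members))
    ]
    return np.mean(members, axis=0)
```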
Active Learning

Goal: reducing the feedback effort of the user by choosing the most informative training data.

Our approach (a variation of QBC):
1. choose the 2 most conflicting models
2. generate 2 rankings with these 2 models
3. get the first conflict pair of these rankings (sketched in code below)

Example:
ranking 1: a b c d e
ranking 2: a b d e c
The two rankings first disagree on the pair (c, d), so the user is asked which of c and d is more similar to the query.
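Step 3 can be sketched as follows, assuming rankings are plain lists of case identifiers ordered from most to least similar (how the two most conflicting committee members are chosen is left out here):

```python
def first_conflict_pair(r1, r2):
    # Scan r1 from the top and return the earliest pair of cases that
    # the two rankings order differently; None if the rankings agree.
    pos2 = {case: i for i, case in enumerate(r2)}
    for i, x in enumerate(r1):
        for y in r1[i + 1:]:
            if pos2[x] > pos2[y]:   # r1: x before y, but r2: y before x
                return (x, y)
    return None

print(first_conflict_pair(list("abcde"), list("abdec")))  # ('c', 'd')
```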
Experimental Setting
Goal: investigating the efficacy of our approach and the effectiveness of the extensions:
1. incorporating monotonicity
2. ensemble learning
3. active learning
Data sets

            uni   iris   wine   yeast    nba
#features     6      4     13      24     15
#cases      200    150    178    2465   3924
Quality Measures
Kendall's tau (a common rank correlation measure)
… defined via the number of rank inversions between the true and the predicted ranking, normalized to [-1, +1]:
$$\tau = 1 - \frac{4 \cdot \#\text{inversions}}{n(n-1)}$$

Recall (a common retrieval measure)
… defined as the number of predicted among the true top-k cases (k = 10), divided by k.

Position error
… defined by the position of the true topmost case in the predicted ranking (minus 1, so that 0 is optimal).

(Code sketches of all three measures follow below.)
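A minimal sketch of the three measures, assuming each ranking is a list of case identifiers ordered from most to least similar:

```python
from itertools import combinations

def kendall_tau(true_rank, pred_rank):
    # 1 minus the normalized number of rank inversions, in [-1, +1].
    pos = {c: i for i, c in enumerate(pred_rank)}
    n = len(true_rank)
    inversions = sum(1 for x, y in combinations(true_rank, 2) if pos[x] > pos[y])
    return 1 - 4 * inversions / (n * (n - 1))

def recall_at_k(true_rank, pred_rank, k=10):
    # Fraction of the true top-k cases recovered in the predicted top-k.
    return len(set(true_rank[:k]) & set(pred_rank[:k])) / k

def position_error(true_rank, pred_rank):
    # Position of the true topmost case in the predicted ranking, minus 1.
    return pred_rank.index(true_rank[0])
```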
Extension to Nonlinear Models
Actually, we only need linearity in the coefficients, not in the local distances. Therefore, some generalizations are easily possible, such as adding products of local distances:
$$\Delta(a, b) = \sum_{i} w_i \, \delta_i + \sum_{i \leq j} w_{ij} \, \delta_i \delta_j$$

More generally, with basis functions $\phi_k : [0, 1]^d \to \mathbb{R}$ over the local distances:
$$\Delta(a, b) = \sum_{k} w_k \, \phi_k(\delta_1, \ldots, \delta_d)$$
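As an illustration, a degree-2 expansion of the local distance vector (a hypothetical instance of such a generalization); the expanded vectors remain linear in the coefficients, so the same perceptron applies unchanged:

```python
import numpy as np

def expand_degree2(delta):
    # Map (delta_1, ..., delta_d) to the feature vector that also contains
    # all products delta_i * delta_j (i <= j); the model is nonlinear in the
    # local distances but still linear in its coefficients.
    d = len(delta)
    products = [delta[i] * delta[j] for i in range(d) for j in range(i, d)]
    return np.concatenate([np.asarray(delta, dtype=float), products])
```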
Extensions
If the generalized model corresponds to a kernel function, this leads to kernelization: nonlinear classification and sorting.

[Diagram: a triple (a, b, c) is fed to the classifier, which decides between b and c; a pair (b, c) is fed to the distance function, which returns a value such as 0.7.]
Conclusions
Learning to combine local distance measures into a global measure.
Only assuming qualitative feedback of the type “a is more similar to b than to c”.
Reduction of distance learning to classification.