![Page 1: Tomer Sagi and Avigdor Gal Technion - Israel Institute of Technology Non-binary Evaluation for Schema Matching Presentation @ ER 2012 October 2012, Florence](https://reader035.vdocuments.us/reader035/viewer/2022062716/56649dca5503460f94ac09e5/html5/thumbnails/1.jpg)
Tomer Sagi and Avigdor Gal
Technion - Israel Institute of Technology
Non-binary Evaluation for Schema Matching
Presentation @ ER 2012
October 2012, Florence, Italy

Presentation Outline
Background
• Schema Matching
Schema Matching Evaluation
• Current model: set-based Precision and Recall
• Proposed model: Similarity Spaces, a vector-space model
• Non-binary measures
Usage example
• Tuning schema matchers using non-binary measures

Background
Schema Matching
Schema matching is the task of providing correspondences between concepts describing the meaning of data
Schema matching is recognized to be a basic operation required by data integration and web-query interface integration

Background
Schema Matching: Schemas
Schemas contain attributes
Each attribute may have a name, label, type, domain (allowed values), instances, etc.
Structural links and relationships are defined between attributes
[Figure: two example schemas. A web-form schema (fields: First Name, Last Name, Gender, "What are your favourite hobbies", Requested Password, Password confirmation, a "Yes" option and a Register button) and a small business-document schema with 5 concepts.]

Background
Schema Matching: First Line Matchers
First line matchers (a.k.a. similarity measures) compare two schemas, generating correspondences between them.
Each correspondence is assigned a confidence value over [0,1].
The result is often represented as a similarity matrix:
[Figure: example similarity matrix; legible entries: 0.32, 0.64, 0.84, 0.35, 0.62]
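As a rough illustration of what a first line matcher produces, the sketch below builds a similarity matrix from attribute names alone. The matcher itself (`name_similarity`, based on Python's `difflib.SequenceMatcher`) is an assumption made for this example and is not one of the matchers used in the talk; the attribute names are borrowed from the hotel-reservation example shown later in the deck.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Hypothetical first line matcher: string similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(rows, cols):
    """One confidence value per candidate correspondence (row attribute, column attribute)."""
    return [[name_similarity(r, c) for c in cols] for r in rows]


row_attrs = ["clientNum", "city", "checkInDay"]
col_attrs = ["cardNum", "city", "arrivalDay", "checkInTime"]
matrix = similarity_matrix(row_attrs, col_attrs)

for name, row in zip(row_attrs, matrix):
    print(name, [round(x, 2) for x in row])
```

Every entry lies in [0,1], and identical names such as city/city receive the maximal confidence of 1.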

Background
Schema Matching: Second Line Matchers
Second line matchers transform similarity matrices.
• Filters transform a matrix by removing those values which do not satisfy the constraint function. Examples: Threshold, MaxDelta.
[Figure: Similarity Matrix → Transformed Similarity Matrix]

Background
Schema Matching: Second Line Matchers
Second line matchers transform similarity matrices.
• Filters transform a matrix by removing those values which do not satisfy the constraint function. Examples: Threshold, MaxDelta.
• Decision makers transform a matrix by changing the values of some correspondences to 1 and the rest to 0.
[Figure: Similarity Matrix → Binary Similarity Matrix]
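The two kinds of second line matchers can be sketched as follows. This is a minimal illustration under stated assumptions: "removing" a value is modelled as setting it to 0, only a Threshold filter is shown, and the decision rule (`row_max_decision`, which picks the largest entry in each row) is a hypothetical stand-in rather than a rule named in the talk.

```python
def threshold_filter(matrix, t):
    """Filter: keep values that satisfy the constraint (>= t); the rest are set to 0."""
    return [[x if x >= t else 0.0 for x in row] for row in matrix]


def row_max_decision(matrix):
    """Decision maker (illustrative rule): the largest entry of each row becomes 1, the rest 0."""
    result = []
    for row in matrix:
        best = max(range(len(row)), key=row.__getitem__)
        result.append([1 if j == best else 0 for j in range(len(row))])
    return result


sim = [
    [0.84, 0.32, 0.32, 0.30],
    [0.29, 1.00, 0.33, 0.30],
    [0.34, 0.33, 0.35, 0.64],
]

filtered = threshold_filter(sim, 0.5)  # still non-binary: surviving values are kept as-is
binary = row_max_decision(sim)         # binary similarity matrix

print(filtered)
print(binary)
```

The filter keeps the surviving confidence values, whereas the decision maker discards them, which is why only the latter yields a binary matrix.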

Background
Schema Matching: Systems
Schema matching systems employ various first and second line matchers; their results are composed, aggregated and combined.
[Figure: pipeline of a schema matching system. A Schema Pair is fed to the 1st line matchers (String Matcher, Domain Matcher, Parent-Child Matcher, Instance Matcher); their outputs pass through the 2nd line matchers (Aggregation, Filter, Decision maker), producing a Binary Similarity Matrix. Two variants are labeled: "Schema Matching System" and "Uncertain Schema Matching System".]
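One simple way to aggregate the outputs of several first line matchers is a weighted average of their similarity matrices. The talk does not say which aggregation its systems use, so the function below (`aggregate`) and the sample values are purely illustrative assumptions.

```python
def aggregate(matrices, weights):
    """Weighted average of same-shaped similarity matrices (illustrative aggregation rule)."""
    total = sum(weights)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n_cols)]
        for i in range(n_rows)
    ]


string_matcher_out = [[0.8, 0.2]]    # hypothetical output of a string matcher
instance_matcher_out = [[0.4, 0.6]]  # hypothetical output of an instance matcher

combined = aggregate([string_matcher_out, instance_matcher_out], weights=[1, 1])
print(combined)
```

The aggregated matrix is itself a similarity matrix, so it can be passed on to a filter or a decision maker.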

Schema Matching Evaluation
Current Model
The current evaluation model provides measures for evaluating a complete system using set-based measures.
Major shortcoming: evaluation of individual components (e.g. first line matchers) and of non-binary results (uncertain schema matching systems) is undefined.
[Figure: the schema matching system pipeline (Schema Pair → String, Domain, Parent-Child and Instance Matchers → Aggregation → Filter → Decision maker → Binary Similarity Matrix). Its output is compared with the Exact Match, partitioning correspondences into True Positives (TP), False Positives (FP) and False Negatives (FN).]
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
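The set-based measures can be computed directly from two sets of correspondences. In the sketch below the correspondences are hypothetical and serve only to exercise the formulas Precision = TP / (TP + FP) and Recall = TP / (TP + FN); returning 0 when a denominator is empty is a convention chosen for this example.

```python
def precision_recall(matched: set, exact: set):
    """Set-based precision and recall of a binary match against the exact match."""
    tp = len(matched & exact)   # correspondences found and correct
    fp = len(matched - exact)   # correspondences found but wrong
    fn = len(exact - matched)   # correct correspondences that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical exact match and system output.
exact = {("clientNum", "cardNum"), ("city", "city"), ("checkInDay", "checkInTime")}
matched = {("clientNum", "cardNum"), ("city", "city"), ("checkInDay", "arrivalDay")}

p, r = precision_recall(matched, exact)
print(p, r)
```

Both inputs must be sets of chosen correspondences, which is why this model cannot score a matcher that outputs confidence values.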

Schema Matching Evaluation
Similarity Spaces: A Vector Space Model
Begin with a similarity matrix over the attributes of schemas S1 and S2:

| | 1 cardNum | 2 city | 3 arrivalDay | 4 checkInTime |
|---|---|---|---|---|
| 1 clientNum | 0.84 | 0.32 | 0.32 | 0.30 |
| 2 city | 0.29 | 1.00 | 0.33 | 0.30 |
| 3 checkInDay | 0.34 | 0.33 | 0.35 | 0.64 |

Taking each entry as an element in a vector (here, column by column) transforms this matrix into a similarity vector:
(0.84,0.29,0.34,0.32,1.00,0.33,0.32,0.33,0.35,0.30,0.30,0.64)
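The flattening step can be reproduced in a few lines. The similarity vector on the slide lists the matrix entries column by column, so the sketch below does the same; the function name `to_similarity_vector` is an assumption made for this example.

```python
def to_similarity_vector(matrix):
    """Flatten a similarity matrix column by column, matching the order used on the slide."""
    n_cols = len(matrix[0])
    return [row[j] for j in range(n_cols) for row in matrix]


sim = [
    [0.84, 0.32, 0.32, 0.30],  # clientNum
    [0.29, 1.00, 0.33, 0.30],  # city
    [0.34, 0.33, 0.35, 0.64],  # checkInDay
]

vector = to_similarity_vector(sim)
print(vector)
```

Any fixed ordering would work, as long as every vector being compared (matcher results and the exact match) uses the same one.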

Schema Matching Evaluation
Similarity Spaces: A Vector Space Model
We propose a vector space model for evaluation:
• Dimensions are possible correspondences between an attribute pair
• Vectors are matching results

A binary matching result over schemas S1 and S2:

| | 1 cardNum | 2 city | 3 arrivalDay | 4 checkInTime |
|---|---|---|---|---|
| 1 clientNum | 1 | 0 | 0 | 0 |
| 2 city | 0 | 1 | 0 | 0 |
| 3 checkInDay | 0 | 0 | 0 | 1 |

A non-binary matching result over the same schemas:

| | 1 cardNum | 2 city | 3 arrivalDay | 4 checkInTime |
|---|---|---|---|---|
| 1 clientNum | 0.84 | 0.32 | 0.32 | 0.30 |
| 2 city | 0.29 | 1.00 | 0.33 | 0.30 |
| 3 checkInDay | 0.34 | 0.33 | 0.35 | 0.64 |

The Schema Matching Evaluation Problem
[Table: the evaluation problem space, organized by K (columns: 0, 1, 2, >2) and by "Informed?" (rows: Yes, No), with each cell split into Binary and Non-Binary.]
The area in green marks where most research has focused to date. Areas in yellow designate limited work done.

Schema Matching Evaluation
Similarity Spaces: A Vector Space Model
Over this vector space, evaluation functions are defined.
For example, the well-known precision and recall functions are functions of two vectors ($v$ = outcome of a decision maker, $v^e$ = exact match):
$$P(v, v^e) = \frac{v \cdot v^e}{\lVert v \rVert_1}, \qquad R(v, v^e) = \frac{v \cdot v^e}{\lVert v^e \rVert_1}$$
Example with $v = (0,1)$ and $v^e = (1,1)$:
$$P = \frac{1}{1} = 1, \qquad R = \frac{1}{2} = 0.5$$

Schema Matching Evaluation
Non-binary measures
Accommodating non-binary evaluation is now trivial: allow $v$ to be non-binary.
Example with $v = (0.64, 0.35)$ and $v^e = (1,1)$:
$$P = \frac{v \cdot v^e}{\lVert v \rVert_1} = \frac{0.99}{0.99} = 1, \qquad R = \frac{v \cdot v^e}{\lVert v^e \rVert_1} = \frac{0.99}{2} \approx 0.49$$
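The vector-space measures are easy to check numerically. The sketch below assumes, consistently with the slide's numbers, that precision divides the dot product of the result and the exact match by the sum of the result's entries, and recall divides it by the sum of the exact match's entries. The function names `nb_precision` and `nb_recall` are chosen for this example, as is the convention of returning 0 for an all-zero vector.

```python
def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def nb_precision(v, ve):
    """Share of the result's total confidence that falls on correct correspondences."""
    total = sum(v)
    return dot(v, ve) / total if total else 0.0


def nb_recall(v, ve):
    """Confidence placed on correct correspondences, relative to the size of the exact match."""
    total = sum(ve)
    return dot(v, ve) / total if total else 0.0


exact = (1, 1)

# Binary result: the same functions reduce to set-based precision and recall.
print(nb_precision((0, 1), exact), nb_recall((0, 1), exact))

# Non-binary result from the slide.
print(nb_precision((0.64, 0.35), exact), nb_recall((0.64, 0.35), exact))
```

For the binary vector the result is precision 1 and recall 0.5; for the non-binary vector precision stays at 1 while recall drops to roughly 0.495, reflecting the matcher's lower confidence in correspondences that are in fact correct.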

Schema Matching Evaluation
Non-binary measures - Implications
[Figure: the schema matching system pipeline (Schema Pair → 1st line matchers: String, Domain, Parent-Child, Instance → 2nd line matchers: Aggregation, Filter, Decision maker → Binary Similarity Matrix), shown for both a schema matching system and an uncertain schema matching system, annotated with the implications below.]
Implications:
• Can evaluate individual 1st line matchers
• Can evaluate interim results
• Can evaluate uncertain results

Match Distance
Sometimes you need a metric…
We define two complementary distance metrics:

Match Distance
Behavior vs. NBPrecision and NBRecall
Results of synthetic evaluation
[Figure: two panels. (1) Nonsense solution of increasing magnitude. (2) Noisy solution with increasingly strict filter applied.]

Schema Matching Evaluation
Predictors
[Figure: EXACT MATCH]
What if you don’t have an exact match?
In most applications, this is the case…

Schema Matching Evaluation
Predictors
Predictors are a special class of schema matching evaluation methods, which do not use an exact match as part of the input.
Predictors can be classified into two subclasses:
• Internalizers refer to the internal structure of a vector as an indication of match quality (e.g. max, stdev, average)
• Idealizers assume the existence of an ideal vector and compare with it
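Both predictor subclasses can be sketched without an exact match. The internalizers below are the three named on the slide (max, stdev, average). The idealizer is a guess at the general idea only: both the way the ideal vector is built (entries of at least 0.5 become 1, the rest 0) and the comparison used (cosine similarity) are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, pstdev


def internalizers(v):
    """Predictors computed from the internal structure of the vector alone."""
    return {"max": max(v), "average": mean(v), "stdev": pstdev(v)}


def idealizer(v, ideal):
    """Hypothetical idealizer: cosine similarity between the vector and an assumed ideal vector."""
    norm_v = sqrt(sum(x * x for x in v))
    norm_i = sqrt(sum(x * x for x in ideal))
    if norm_v == 0 or norm_i == 0:
        return 0.0
    return sum(a * b for a, b in zip(v, ideal)) / (norm_v * norm_i)


v = [0.84, 0.29, 0.34, 0.32, 1.00, 0.33, 0.32, 0.33, 0.35, 0.30, 0.30, 0.64]

# Assumed ideal vector for illustration: a crisp version of v itself.
ideal = [1 if x >= 0.5 else 0 for x in v]

print(internalizers(v))
print(idealizer(v, ideal))
```

Neither function receives the exact match, which is what separates predictors from measures such as precision and recall.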

Schema Matching Evaluation
Idealizers
[Figure: (Ideal Vector)]

Schema Matching Evaluation
Desired Properties of Predictors
Desired design properties:
• Tunable: We should be able to tune predictors towards the desired quality in a specific scenario.
• Generalizable: Predictors should be based upon principles which are applicable at several levels of granularity and can be specialized to some levels of granularity.
Desired empirical properties:
• Correlated: Well correlated with the quality they are designed to evaluate.
• Robust: Correlations are robust and statistically significant over varied matching systems and datasets.

Using Prediction
Tunable Prediction Models
Loosely correlated predictors can be composed into a model.
• The weights of its participating predictors can be tuned.
• Construction is by (multiple) step-wise regression.
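A composed model of this kind can be pictured as a weighted sum of predictor values. The sketch below shows only how such a model is evaluated once its weights are known. The weights and intercept are hypothetical, and the fitting step (step-wise regression in the talk) is deliberately left out.

```python
from statistics import mean, pstdev

# Predictors available to the model; each maps a similarity vector to a number.
PREDICTORS = {"max": max, "average": mean, "stdev": pstdev}


def predict_quality(v, weights, intercept=0.0):
    """Linear prediction model: intercept plus a weighted sum of predictor values."""
    return intercept + sum(w * PREDICTORS[name](v) for name, w in weights.items())


# Hypothetical weights; in practice they would come from regression on training data.
weights = {"max": 0.5, "stdev": 1.2}

v = [0.84, 0.29, 0.34, 0.32, 1.00, 0.33]
print(predict_quality(v, weights, intercept=0.1))
```

Changing the weights retargets the same set of predictors at a different quality measure, which is the sense in which the model is tunable.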

Using Prediction
Tunable Prediction Models
Added bonus: Increased correlation

Schema Matching Evaluation
Why Granularity Matters
Consider the following example and how a matrix-level vs. an attribute-level predictor would fare in it.
[Figure: an Exact Match vector alongside Matcher Vectors]

Presentation Outline
Background
• Schema Matching
Schema Matching Evaluation
• Current model: set-based Precision and Recall
• Proposed model: Similarity Spaces, a vector-space model
• Non-binary measures
Usage examples
• Tuning schema matchers using non-binary measures
• Weighting ensembles using attribute-level prediction

Usage Examples
Tuning scenario
A first-line matcher named Term has a tunable parameter, label score weight (α), defining the relative weight of the term’s label and name:
$$\alpha \cdot \text{labelScore} + (1 - \alpha) \cdot \text{nameScore}$$
[Figure: a web-form field showing its Label and its Name (leaveSlice)]
Tuning can be done via machine learning methods or statistical methods.
All tuning methods benefit from:
• Smoothness: Gradual changes in α yield gradual changes in the measure
• Robustness: Observed behavior is robust w.r.t. the number of test cases
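The role of the label score weight can be illustrated with a small stand-in for such a matcher. Everything below is an assumption made for the example: the text comparison (`difflib.SequenceMatcher`), the function names, and the sample labels and names; only the weighting α·labelScore + (1 − α)·nameScore follows the tuning scenario.

```python
from difflib import SequenceMatcher


def text_score(a: str, b: str) -> float:
    """Hypothetical string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_score(attr_a, attr_b, alpha):
    """Weighted combination of label similarity and name similarity.

    Each attribute is a (label, name) pair; alpha is the label score weight.
    """
    label_score = text_score(attr_a[0], attr_b[0])
    name_score = text_score(attr_a[1], attr_b[1])
    return alpha * label_score + (1 - alpha) * name_score


# Hypothetical web-form attributes: (label, name).
a = ("Leave date", "leaveSlice")
b = ("Departure date", "departDay")

for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(term_score(a, b, alpha), 3))
```

With α = 0 the score depends on names only, with α = 1 on labels only, and intermediate values interpolate linearly between the two.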

Usage Example
Smoothness
To use binary precision, a decision maker is required.
Introducing a decision maker adds random noise caused by arbitrary thresholds.
[Figure: two plots over label score weight (x-axis, 0 to 1). "Precision Binary Measures": Precision for Threshold(0.2), Threshold(0.25) and Stable Marriage (y-axis ticks 0 to 0.16). "Non Binary Precision": NBPrecision (y-axis ticks 0 to 0.2).]
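The contrast between the two measures can be reproduced on a toy case. All numbers below are hypothetical: two candidate correspondences with made-up label and name scores, an exact match in which only the first is correct, and a Threshold(0.5) decision maker. Non-binary precision is taken as the confidence placed on correct correspondences divided by the total confidence, and a result with no selected correspondence is scored 0 by convention.

```python
def combined(alpha, label_score, name_score):
    """Matcher output for one correspondence at label score weight alpha."""
    return alpha * label_score + (1 - alpha) * name_score


def nb_precision(v, ve):
    """Non-binary precision; returns 0 for an all-zero vector (convention of this sketch)."""
    total = sum(v)
    return sum(a * b for a, b in zip(v, ve)) / total if total else 0.0


def binary_precision(v, ve, threshold):
    """Precision after a threshold decision maker has turned v into a binary vector."""
    decided = [1 if x >= threshold else 0 for x in v]
    return nb_precision(decided, ve)


LABEL = (0.9, 0.2)  # hypothetical label scores of the two candidates
NAME = (0.3, 0.6)   # hypothetical name scores of the two candidates
EXACT = (1, 0)      # only the first candidate is correct

rows = []
for i in range(11):
    alpha = i / 10
    v = [combined(alpha, l, n) for l, n in zip(LABEL, NAME)]
    rows.append((alpha, binary_precision(v, EXACT, 0.5), nb_precision(v, EXACT)))

for alpha, bp, nbp in rows:
    print(f"alpha={alpha:.1f}  binary={bp:.2f}  non-binary={nbp:.3f}")
```

On this toy case binary precision jumps from 0 to 1 between two neighbouring values of α, while the non-binary measure rises gradually, which is the smoothness property a tuning procedure benefits from.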

Usage Example
Robustness: effect of sample size
An outlier in schema pair no.2 causes Binary-precision (fig. (b)) to diverge greatly from the eventual observed behavior with 10 pairs.
Unperturbed by outliers, NBPrecision (fig. (a)) displays robust behavior
[Figure: (a) NBPrecision and (b) binary precision, plotted against the number of schema pairs]

Using Prediction
Dynamic Prediction Models - Results

Conclusions
Non-binary measures are a useful addition to the schema matching evaluation tool-kit.
Non-binary evaluation presents desired characteristics in tuning scenarios (smoothness and robustness)
Using the similarity vector space model we can generate additional measures, breaking from traditional measures (e.g. binary precision and recall) to measures more attuned to modern schema matching needs.

Thank You
Questions?