Data Fusion: Jens Bleiholder and Felix Naumann, presented by Aaron Stewart
Posted on 19-Dec-2015
Complete / Concise
• Analogous to recall (completeness) and precision (conciseness)
• Complete: coverage of real-world objects
• Concise: avoid duplicates
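Read as formulas, completeness is the fraction of real-world objects that appear in the result, and conciseness is the number of distinct objects divided by the number of result tuples. The Python sketch below is a minimal illustration under one assumption: every result tuple has already been mapped to an identifier of the real-world object it describes. The identifiers and data are hypothetical.

```python
def completeness(result_ids, real_world_ids):
    """Fraction of real-world objects represented at least once in the result."""
    covered = set(result_ids) & set(real_world_ids)
    return len(covered) / len(set(real_world_ids))


def conciseness(result_ids):
    """Distinct objects divided by tuples; 1.0 means no duplicates."""
    if not result_ids:
        return 1.0
    return len(set(result_ids)) / len(result_ids)


# Hypothetical example: each entry is the object id a result tuple refers to.
real_world = ["alice", "bob", "carol", "dave"]
result = ["alice", "bob", "bob", "carol"]  # dave is missing, bob is duplicated

print(completeness(result, real_world))  # 0.75
print(conciseness(result))               # 0.75
```

Missing objects lower completeness, just as missed items lower recall; duplicate tuples lower conciseness, just as spurious items lower precision.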
Equi-join
SELECT U1.Name, U2.Name, U1.Age, U2.Age, U1.Status, U2.Status,
U1.Address, U2.Address, U1.Field, U2.Field, U1.Library, U2.Phone
FROM U1 JOIN U2 ON U1.Name=U2.Name
Equi-join Result
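To illustrate what the equi-join on Name returns, here is a minimal Python sketch over small hypothetical versions of U1 and U2. The rows, values, and the helper `equi_join` are invented for illustration; U1 is assumed to carry Library and U2 to carry Phone, matching the SELECT list.

```python
SHARED = ["Name", "Age", "Status", "Address", "Field"]

# Hypothetical rows: Ann is in both sources (with conflicting ages),
# Ben only in U1, Cat only in U2.
U1 = [
    {"Name": "Ann", "Age": 31, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Library": "L-17"},
    {"Name": "Ben", "Age": 25, "Status": "student", "Address": "Elm St 4",
     "Field": "AI", "Library": "L-02"},
]
U2 = [
    {"Name": "Ann", "Age": 32, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Phone": "555-0101"},
    {"Name": "Cat", "Age": 40, "Status": "staff", "Address": "Oak St 9",
     "Field": "HCI", "Phone": "555-0199"},
]


def equi_join(left, right, key):
    """Inner join on a single key; shared columns are kept once per source."""
    out = []
    for l in left:
        for r in right:
            if l[key] == r[key]:
                row = {}
                for col in SHARED:
                    row["U1." + col] = l[col]
                    row["U2." + col] = r[col]
                row["U1.Library"] = l["Library"]
                row["U2.Phone"] = r["Phone"]
                out.append(row)
    return out


for row in equi_join(U1, U2, "Name"):
    print(row)
```

Only Ann survives: Ben and Cat have no partner and are lost (incompleteness), while Ann's shared attributes appear twice, including the conflicting ages 31 and 32 (lack of conciseness).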
Natural Join
SELECT U1.Name, U1.Age, U1.Status, U1.Address, U1.Field,
U1.Library, U2.Phone
FROM U1 JOIN U2 ON U1.Name=U2.Name AND U1.Age=U2.Age
AND U1.Status=U2.Status AND U1.Address=U2.Address AND U1.Field=U2.Field
Natural Join Result
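The natural join requires equality on every shared attribute and keeps a single copy of each. A minimal Python sketch of that behavior, again over hypothetical rows (the data and the `natural_join` helper are invented for illustration):

```python
SHARED = ["Name", "Age", "Status", "Address", "Field"]

# Hypothetical rows: Ann's age conflicts between sources, Dan's data agrees.
U1 = [
    {"Name": "Ann", "Age": 31, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Library": "L-17"},
    {"Name": "Dan", "Age": 28, "Status": "student", "Address": "Pine St 2",
     "Field": "IR", "Library": "L-05"},
]
U2 = [
    {"Name": "Ann", "Age": 32, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Phone": "555-0101"},
    {"Name": "Dan", "Age": 28, "Status": "student", "Address": "Pine St 2",
     "Field": "IR", "Phone": "555-0142"},
]


def natural_join(left, right, shared):
    """Inner join requiring equality on all shared columns, each kept once."""
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in shared):
                row = {c: l[c] for c in shared}
                row["Library"] = l["Library"]
                row["Phone"] = r["Phone"]
                out.append(row)
    return out


print(natural_join(U1, U2, SHARED))
```

Ann is dropped entirely because the two sources disagree on her age, so a natural join is concise but silently discards exactly the objects whose data conflicts.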
Full Outer Join
SELECT U1.Name, U2.Name, U1.Age, U2.Age, U1.Status, U2.Status,
U1.Address, U2.Address, U1.Field, U2.Field, U1.Library, U2.Phone
FROM U1 FULL OUTER JOIN U2 ON U1.Name=U2.Name
Full Outer Join Result
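A full outer join keeps matched pairs plus the unmatched rows of both sides, padding the missing side with NULL. The Python sketch below emulates that with None; the rows and the helpers `combine` and `full_outer_join` are hypothetical.

```python
SHARED = ["Name", "Age", "Status", "Address", "Field"]

# Hypothetical rows: Ann is in both sources, Ben only in U1, Cat only in U2.
U1 = [
    {"Name": "Ann", "Age": 31, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Library": "L-17"},
    {"Name": "Ben", "Age": 25, "Status": "student", "Address": "Elm St 4",
     "Field": "AI", "Library": "L-02"},
]
U2 = [
    {"Name": "Ann", "Age": 32, "Status": "faculty", "Address": "Main St 1",
     "Field": "DB", "Phone": "555-0101"},
    {"Name": "Cat", "Age": 40, "Status": "staff", "Address": "Oak St 9",
     "Field": "HCI", "Phone": "555-0199"},
]


def combine(l, r):
    """Build one output row; a missing side contributes None (SQL NULL)."""
    row = {}
    for col in SHARED:
        row["U1." + col] = l.get(col)
        row["U2." + col] = r.get(col)
    row["U1.Library"] = l.get("Library")
    row["U2.Phone"] = r.get("Phone")
    return row


def full_outer_join(left, right, key):
    """Matched pairs plus unmatched rows from either side, padded with None."""
    out, matched = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        matched.update(hits)
        if hits:
            out.extend(combine(l, right[i]) for i in hits)
        else:
            out.append(combine(l, {}))
    out.extend(combine({}, r) for i, r in enumerate(right) if i not in matched)
    return out


for row in full_outer_join(U1, U2, "Name"):
    print(row["U1.Name"], row["U2.Name"], row["U1.Age"], row["U2.Age"])
```

Every person now appears (the result is complete), but Ann still occupies two sets of columns with conflicting ages, so the result is not yet concise and the conflict still has to be fused.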
Information Systems for Data Fusion
1. Conflict resolution
2. Conflict avoidance
3. Conflict ignorance
4. No conflict handling
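As a hedged illustration of the first three classes, the sketch below applies one simple strategy from each to a single conflicting attribute. The chosen strategies (returning all values, trusting a preferred source, majority vote), the function names, and the sample data are assumptions for illustration; each class admits many other strategies.

```python
from collections import Counter

# Hypothetical conflicting values for one attribute of one object, by source.
ages = {"U1": 31, "U2": 32, "U3": 31}


def ignore(values):
    """Conflict ignorance: pass every distinct value on; nothing is decided."""
    return sorted(set(values.values()))


def avoid(values, preferred_order):
    """Conflict avoidance: take the first available value from a trusted
    source without comparing it to the others."""
    for source in preferred_order:
        if values.get(source) is not None:
            return values[source]
    return None


def resolve(values):
    """Conflict resolution: inspect all values and decide, here by majority vote."""
    counts = Counter(v for v in values.values() if v is not None)
    return counts.most_common(1)[0][0] if counts else None


print(ignore(ages))               # [31, 32]
print(avoid(ages, ["U2", "U1"]))  # 32
print(resolve(ages))              # 31
```

Only `resolve` looks at all the values before choosing; `avoid` decides by a rule fixed in advance, and `ignore` leaves the decision to whoever consumes the result.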
Architecture
• Database management system (DBMS)
• Multidatabase management system (MDBMS)
• Mediator-wrapper (MW)
• Multi-agent system (MAS)
• Stand-alone application (APP)
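Of these architectures, the mediator-wrapper pattern is the easiest to picture in code. The sketch below is a simplified, hypothetical illustration (the sources, `wrap_a`, `wrap_b`, and `Mediator` are invented names, not the API of any system listed here): each wrapper translates one source into a shared mediated schema, and the mediator forwards a query to all wrappers and collects the answers, which a fusion step would then reconcile.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical native sources with different schemas.
SOURCE_A = [("Ann", 31), ("Ben", 25)]           # (name, age) tuples
SOURCE_B = [{"full_name": "Ann", "years": 32}]  # dicts with other keys


def wrap_a() -> Iterable[Record]:
    """Wrapper for source A: map its tuples onto the mediated schema."""
    for name, age in SOURCE_A:
        yield {"Name": name, "Age": age, "Source": "A"}


def wrap_b() -> Iterable[Record]:
    """Wrapper for source B: rename its keys to the mediated schema."""
    for row in SOURCE_B:
        yield {"Name": row["full_name"], "Age": row["years"], "Source": "B"}


class Mediator:
    """Forwards a query to every wrapper and collects the answers."""

    def __init__(self, wrappers: List[Callable[[], Iterable[Record]]]) -> None:
        self.wrappers = wrappers

    def query(self, name: str) -> List[Record]:
        """Return all wrapped records about `name`, from every source."""
        return [rec for wrap in self.wrappers for rec in wrap()
                if rec["Name"] == name]


mediator = Mediator([wrap_a, wrap_b])
print(mediator.query("Ann"))
```

The two records returned for Ann disagree on Age; the mediator has integrated the schemas but not yet fused the data.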
Hermes
• HEterogeneous Reasoning and MEdiator System
• C. 1996
• Mediator-specified conflict resolution
  – Created by an expert
Fusionplex
• Multiplex, Fusionplex, Autoplex
• Classifies quality of data
• User-prioritized feature “importance”
• Able to incorporate new/unknown databases
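User-prioritized feature importance can be pictured as a weighted score over per-value quality features. The sketch below is not Fusionplex's actual algorithm: the feature names (`recency`, `accuracy`, `cost`), their normalization to [0, 1], the weighted-average `utility` function, and the data are all assumptions made for illustration.

```python
# Hypothetical candidates for one conflicting value, each with quality
# features normalized to [0, 1], where higher is better.
candidates = [
    {"source": "U1", "value": 31,
     "features": {"recency": 0.2, "accuracy": 0.9, "cost": 0.8}},
    {"source": "U2", "value": 32,
     "features": {"recency": 0.9, "accuracy": 0.6, "cost": 0.5}},
]


def utility(features, weights):
    """Weighted average of feature scores using user-supplied importance."""
    total = sum(weights.values())
    return sum(w * features.get(f, 0.0) for f, w in weights.items()) / total


def best_value(cands, weights):
    """Pick the value whose source scores highest under the user's weights."""
    return max(cands, key=lambda c: utility(c["features"], weights))["value"]


print(best_value(candidates, {"recency": 3, "accuracy": 1, "cost": 1}))  # 32
print(best_value(candidates, {"recency": 1, "accuracy": 3, "cost": 1}))  # 31
```

The same data yields different answers depending on which feature the user declares most important, which is the point of letting users prioritize.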
HumMer
• Humboldt-Merger
• C. 2006
• Handles conflicts in schema, identity, data
• Clusters duplicates
• User-defined aggregation functions
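A rough sketch of the last two bullets: rows are grouped into duplicate clusters, then each cluster collapses into one row by applying a user-chosen aggregation function per attribute. This is a simplification, not HumMer's implementation; in particular, exact matching on a normalized Name stands in for real similarity-based duplicate detection, and the rows and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical union of rows from two sources, with a duplicate and conflicts.
rows: List[Row] = [
    {"Name": "Ann Lee", "Age": 31, "Phone": None},
    {"Name": "ann lee ", "Age": 32, "Phone": "555-0101"},
    {"Name": "Ben Ito", "Age": 25, "Phone": "555-0123"},
]


def cluster(rows: List[Row]) -> List[List[Row]]:
    """Group rows whose normalized Name matches (a stand-in for
    similarity-based duplicate detection)."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        key = " ".join(str(row["Name"]).lower().split())
        groups[key].append(row)
    return list(groups.values())


def coalesce(values):
    """First non-null value, or None."""
    return next((v for v in values if v is not None), None)


def fuse(rows: List[Row], functions: Dict[str, Callable]) -> List[Row]:
    """Collapse each duplicate cluster into one row, applying the
    user-chosen aggregation function to every attribute."""
    return [{attr: fn([r[attr] for r in group])
             for attr, fn in functions.items()}
            for group in cluster(rows)]


# max assumes Age has no nulls in this sample.
result = fuse(rows, {"Name": coalesce, "Age": max, "Phone": coalesce})
print(result)
```

Swapping `max` for another function (for example `min`, or a custom vote) changes how the Age conflict is resolved without touching the clustering step.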
Other Systems
• Research Systems
  – Trio
  – Information Manifold
  – Garlic
  – Disco (Distributed Information Search Component)
  – Papyrus, Nomenclature
  – DIOM, KOMET, Infomaster, Occam, SIMS, Internet Softbot
  – Singapore, Magic, Observer
  – Lore, Tukwila
  – SIRIUS-DELTA, DDTS, Mermaid, UNIBASE
  – MRDSM, OMNIBASE, CALIDA, DQS
• Commercial
  – IBM, Oracle, Microsoft, others
  – IBM Information Server (IIS)
  – Microsoft SQL Server Integration Services (SSIS)