©2004, Philippe Cudré-Mauroux
Semantic Interoperability for Global Information Systems
Microsoft Research Asia 08.20.04
Philippe Cudré-Mauroux
Distributed Information Systems Laboratory (LSIR)
Swiss Federal Institute of Technology, Lausanne (EPFL)
Outline
I. Classical Information Integration (overview)
  – Global Schema
  – Multidatabase Language Approach
  – Federated Databases
II. Information Integration in the Large
  – Context: The Semantic Web
  – Shared ontologies
  – The Chatty Web
III. State of the Art in Ontology Alignment (overview)
IV. Semantic Integration in a Large-Scale Image Sharing Scenario
I. Classical Information Integration
• Goal: providing a uniform access to multiple heterogeneous information sources
• More than data exchange (e.g., ASCII, EDI, XML)
• Old, difficult problem with well-known (partial) solutions
Global Schema Integration
• Merge multiple databases into one global database
• Performed by human expert
• Time consuming and error prone
• Local autonomy lost
• Static solution
Global schema: Book(ISBN, Title, Price, Author), Author(Name, ISBN)
S1: Book(ISBN, Title), Author(Name, ISBN)
S2: Livre(ISBN, Prix, Titre), Auteur(Prenom, Nom, ISBN)
Multidatabase Language Approach
• No attempt at integrating schemas
• Language (e.g., MSQL) used to integrate information sources at run-time
• Simple example:
    Use S1, S2
    Select Titre
    From S1.Book, S2.Livre
    Where S1.Book.ISBN = S2.Livre.ISBN
• Not transparent
• Heavy burden on (expert) users
• Global queries subject to local changes
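The multidatabase query can be approximated with Python's built-in sqlite3 module. This is only a rough analogue, not MSQL: ATTACH DATABASE stands in for "Use S1, S2", and the in-memory databases, the sample rows and the helper name titles_in_both are assumptions made for illustration. The user still has to know both local schemas, which is the "not transparent" drawback listed on the slide.

```python
import sqlite3

# Two independent in-memory databases play the roles of sources S1 and S2.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS S1")
conn.execute("ATTACH DATABASE ':memory:' AS S2")

conn.execute("CREATE TABLE S1.Book (ISBN TEXT, Title TEXT)")
conn.execute("CREATE TABLE S2.Livre (ISBN TEXT, Prix REAL, Titre TEXT)")

# Hypothetical sample rows (not from the slides).
conn.executemany(
    "INSERT INTO S1.Book VALUES (?, ?)",
    [("111", "The Little Prince"), ("222", "Emma")],
)
conn.executemany(
    "INSERT INTO S2.Livre VALUES (?, ?, ?)",
    [("111", 9.5, "Le Petit Prince"), ("333", 12.0, "Candide")],
)


def titles_in_both(connection):
    """Run the cross-database join and return the matching Titre values."""
    rows = connection.execute(
        "SELECT Titre FROM S1.Book, S2.Livre "
        "WHERE S1.Book.ISBN = S2.Livre.ISBN"
    ).fetchall()
    return [row[0] for row in rows]


print(titles_in_both(conn))
```

If S2 renames Livre or Titre, the global query breaks, which illustrates "global queries subject to local changes".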
Federated Databases
• Idea: Each information source exports a schema specifying shared relations
• Tight-coupling:
  – Global schema integration on all export schemas (cf. global schema integration)
• Loose-coupling:
  – Dynamic add / drop, e.g., by creating views (logical relations)
GAV (Global as View)
• Global (mediated) schema is expressed as a view on local schemas
Mediated schema: Book(ISBN, Title, Author)
S1: Book(ISBN, Title), Author(Name, ISBN)
S2: […]
S3: […]
    Create VIEW Book As
    Select Book.ISBN, Title, Name As Author
    From S1.Book, S1.Author
    Where Book.ISBN = Author.ISBN
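A minimal sketch of the GAV idea using Python's sqlite3 module. The s1_ table prefix (standing in for source S1), the sample row and the helper authors_of are assumptions for illustration. A query against the mediated relation Book is answered by unfolding the view definition over the local relations, which the SQL engine does automatically here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Local relations of source S1 (the s1_ prefix stands in for the S1 schema).
conn.execute("CREATE TABLE s1_book (ISBN TEXT, Title TEXT)")
conn.execute("CREATE TABLE s1_author (Name TEXT, ISBN TEXT)")

# Hypothetical sample data.
conn.execute("INSERT INTO s1_book VALUES ('111', 'The Little Prince')")
conn.execute("INSERT INTO s1_author VALUES ('Saint-Exupery', '111')")

# GAV: the mediated relation Book is defined as a view over the sources.
conn.execute(
    """
    CREATE VIEW Book AS
    SELECT b.ISBN AS ISBN, b.Title AS Title, a.Name AS Author
    FROM s1_book AS b JOIN s1_author AS a ON b.ISBN = a.ISBN
    """
)


def authors_of(connection, title):
    """Query the mediated schema only; the view hides the local schemas."""
    rows = connection.execute(
        "SELECT Author FROM Book WHERE Title = ?", (title,)
    ).fetchall()
    return [row[0] for row in rows]


print(authors_of(conn, "The Little Prince"))
```

Adding a new source under GAV means editing the view definition itself, which is one reason the approach is hard to scale.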
LAV (Local as View)
• Local schemas are expressed as a view on global schema
Mediated schema: Book(ISBN, Title, Author)
S1: Book(ISBN, Title), Author(Name, ISBN)
S2: […]
S3: […]
    Create VIEW S1.Book As
    Select ISBN, Title
    From Book
LAV / GAV (cont.)
• Transparent access to heterogeneous databases in the federation
• Local autonomy is (usually) preserved
• Query processing through query reformulation
• Requires global agreement on the mediated schema (tight semantic coupling)
• Does not scale well
II. Information Integration in the Large
• Goal: providing a uniform access to many heterogeneous information sources
• Traditional approaches are inadequate
  – Lack of adaptability
  – Lack of transparency
  – Lack of scalability
• Hot research area
Some Applications
• Agent communication
• Web services integration
• Information retrieval from heterogeneous databases
• Catalog matching
• P2P information sharing
• Personal information delivery
• Vertical information publishing
General Context: The Semantic Web
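[Figure: Semantic Web layer stack, bottom to top: Unicode and URI; XML + NS + xmlschema; RDF + rdfschema; Ontology vocabulary; Logic; Proof; Trust. Digital Signature is shown alongside the stack. Side labels: Self-desc. doc., Data, Data, Rules.]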
• Providing machine-processable data to the Web
RDF/RDFS 2’ Overview
• RDF triple: (Subject, Property, Object)
• RDF Schemas
  – Classes of resources
  – Classes of properties
  – Constraints on the subject (domain) or object (range)
  – Subclassing
• Extensible!
  – Full-fledged ontological language: OWL
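To make the triple model and RDFS subclassing concrete, here is a small sketch in plain Python (no RDF library). The ex: resources and class names are invented for illustration; only rdf:type and rdfs:subClassOf are real RDF/RDFS vocabulary. It shows how subclassing lets a reasoner infer additional types for a resource.

```python
# Each statement is a (subject, property, object) triple.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical graph: ex:img42 is a Photo, Photo < Image < Document.
triples = {
    ("ex:Photo", SUBCLASS_OF, "ex:Image"),
    ("ex:Image", SUBCLASS_OF, "ex:Document"),
    ("ex:img42", TYPE, "ex:Photo"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls through rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for subject, prop, obj in graph:
            if subject == current and prop == SUBCLASS_OF and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def types_of(resource, graph):
    """Return the asserted types of resource plus those implied by subclassing."""
    direct = {o for s, p, o in graph if s == resource and p == TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred


print(sorted(types_of("ex:img42", triples)))
```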
Example: CreativeCommons
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://example.org/gnomophone.mp3">
  <dc:title>Compilers in the Key of C</dc:title>
  <dc:description>A lovely classical work on compiling code.</dc:description>
  <dc:creator><Agent>
    <dc:title>Yo-Yo Dyne</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Gnomophone</dc:title>
  </Agent></dc:rights>
  <dc:date>1842</dc:date>
  <dc:format>audio/mpeg</dc:format>
  <dc:type rdf:resource="http://purl.org/dc/dcmitype/Sound" />
  <dc:source rdf:resource="http://example.net/gnomovision.mov" />
  <license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/2.0/" />
  <license rdf:resource="http://www.eff.org/IP/Open_licenses/eff_oal.html" />
</Work>
<License rdf:about="http://creativecommons.org/licenses/by-nc-nd/2.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
</License>
</rdf:RDF>
Semantic Interoperability in The Semantic Web
• Common ontologies provide for shared context
  – Requires global agreement!
    • Intractable standardization effort!
    • Back to stage 1…
• Two plausible solutions:
  – Agreed-upon corpora of basic concepts
    • IEEE SUMO
    • Stanford TAP
    • …
  – Local federation of ontologies fostering global interoperability
    • EPFL Chatty Web
    • U. Washington Piazza
    • …
• Complementary approaches
The Chatty Web
[Figure: a query on "organism" posted at EPFL is forwarded through a network of peers: a lab in Trondheim, the EMBLChange site at Cambridge, the SwissProt site at Geneva, and a lab at MIT. EMBLChange peers use "species, …"; SwissProt peers use "authors, titles, organism, …"; other peers use "authors, …". Mapping labels shown between peers: organism / authors, organism / species, species / organism.]
• Local translations enabling global agreements
• Analyzing transitive closures of local mappings
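One way to picture the analysis of transitive closures: compose the local mappings along a cycle of peers and check whether an attribute comes back unchanged. The sketch below is a simplified illustration, not the Chatty Web algorithm itself. Peer names are borrowed from the figure, but the mapping tables, including the deliberately wrong organism-to-authors mapping, are assumptions.

```python
def compose(path, mappings, attribute):
    """Translate attribute peer by peer along path; None if it cannot be mapped."""
    current = attribute
    for source, target in zip(path, path[1:]):
        current = mappings[(source, target)].get(current)
        if current is None:
            return None
    return current


def cycle_is_consistent(cycle, mappings, attribute):
    """A cycle gives positive feedback when the attribute maps back to itself."""
    return compose(cycle, mappings, attribute) == attribute


# Hypothetical local mappings between pairs of peers.
mappings = {
    ("EPFL", "Geneva"): {"organism": "organism"},
    ("Geneva", "Cambridge"): {"organism": "species"},
    ("Cambridge", "EPFL"): {"species": "organism"},
    ("Geneva", "MIT"): {"organism": "authors"},  # deliberately wrong mapping
    ("MIT", "EPFL"): {"authors": "authors"},
}

good_cycle = ["EPFL", "Geneva", "Cambridge", "EPFL"]
bad_cycle = ["EPFL", "Geneva", "MIT", "EPFL"]

print(cycle_is_consistent(good_cycle, mappings, "organism"))
print(cycle_is_consistent(bad_cycle, mappings, "organism"))
```

An inconsistent cycle only says that at least one mapping on it is suspect; combining evidence from several overlapping cycles is what would let a peer localize the faulty link.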
III. State of the Art in Ontology Alignment
• Problem: Given two ontologies, each describing a set of discrete entities, find the relationships that hold between these entities
• Alignments can then be used to foster interoperability locally
• Difficult problem (fully automatic solutions?)
• Active area of research
Local Ontology Alignment Techniques
1. Terminological methods
  – String-based
  – Language-based
    • Intrinsic
    • Extrinsic
    • Multilingual
2. Structural methods
  – Internal
  – External
3. Others
  – Extensional
  – Semantic
  – User Feedback
1. Terminological Methods
• String-based: compare labels of entities
  – (Sub-)string equality
  – Edit distances
  – Token-based distances (e.g., TF/IDF on substrings)
• Language-based
  – Intrinsic
    • Terminological matching with morphological / syntactic analysis (allomorphies)
  – Extrinsic
    • Use of external resources (e.g., WordNet synsets)
  – Multilingual methods
    • Matches terms in different languages
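As an illustration of the string-based family, the sketch below computes the Levenshtein edit distance between two labels and normalizes it into a similarity in [0, 1]. The normalization by the longer label's length and the function name label_similarity are one common choice assumed here, not something prescribed by the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity: 1.0 for identical labels, 0.0 for disjoint ones."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("Titre", "Title"))
print(label_similarity("Titre", "Title"))
```

Labels such as Titre and Title score high, while Prix and Price score lower despite being equivalent, which is why string-based measures are typically combined with language-based or structural ones.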
2. Structural Methods
• Internal (constraint-based):
  – Data-based domain comparison
  – Multiplicities / Properties comparison
  – Similarity between collections
• External
  – Mereological structures
  – Taxonomic structures
  – Relations between similar entities
3. Other
• Extensional methods
  – Extension: set of instances of a class
  – Jaccard similarity: $\mathrm{Jaccard}(A, B) = \frac{P(A \cap B)}{P(A \cup B)}$
  – Similarity-based extension comparison
• Semantic Methods
  – Based on model-theoretic semantics
  – SAT problem (e.g., subsumption)
• User Feedback
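When the probabilities are estimated by counting over a finite set of instances, the Jaccard similarity of two classes A and B reduces to |A ∩ B| / |A ∪ B| over their extensions. A minimal sketch, with invented instance identifiers and class names:

```python
def jaccard(extension_a, extension_b):
    """Jaccard similarity of two class extensions (collections of instance ids)."""
    a, b = set(extension_a), set(extension_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty extensions
    return len(a & b) / len(union)


# Hypothetical extensions of two classes from different ontologies.
photo_instances = {"img1", "img2", "img3"}
picture_instances = {"img2", "img3", "img4"}

print(jaccard(photo_instances, picture_instances))
```

The measure only works when the two ontologies share instances (or instances can be matched), which is the main precondition of extensional methods.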
A Handful of Systems
• APrompt (Stanford) [T,I,S,U]
• Cupid (Microsoft Research) [T,I,S]
• Bibster (U. Karlsruhe) [T,I,S]
• Glue (U. Washington) [E]
• S-Match (U. Trento) [T,S,M]
• …
Typically: a mix of techniques
[Terminological, Internal structure, external Structure, Extensional, seMantic, User]
IV. Semantic Integration in a Large-Scale Image Sharing Scenario
• Problem: retrieve a specific image from a large collection of shared images
• So far: most applications mix content-based (CB) and text analysis
  – CB image analysis provides a low-level, objective representation of an image
    • Good for comparing image features
    • Not so good w.r.t. end-user needs expressed in natural language (N.L.)
  – Surrounding text / filenames might sometimes provide a high-level, subjective view of the image
    • Incomplete, out-of-context description
    • Good w.r.t. N.L. (cf. Google Images)
Potential Opportunity
• Emerging applications make use of high-level, local and semi-contextualized image metadata
  – Structured metadata (Photoshop Album, XML)
  – Ontological metadata (RDF, Adobe XMP)
  – Type-based metadata (Microsoft WinFS)
• Paradigm shift from the old metadata standards (e.g., keywords, EXIF)
  – Extensible formats
• Personal conjecture:
  – Metadata will be prominent in a few years
  – Huge opportunity for image retrieval
Structured Metadata
• Ex.: Photoshop Album
• Hierarchy of tags
• Stored in a relational, proprietary, local database
• Non-exportable
Ontological Metadata (1)
• Ex.: Extensible Metadata Platform (XMP)
• Subset of RDF/S
• Metadata might be embedded into the file
• Supported by a wide range of Adobe applications
  – Adobe® Acrobat®
  – Adobe FrameMaker®
  – Adobe GoLive®
  – Adobe Illustrator®
  – Adobe InCopy®
  – Adobe InDesign®
  – Adobe LiveMotion™
  – Adobe Photoshop®
  – Adobe Document Server
  – Adobe Graphics Server
  – Version Cue™
Type-Based Metadata (1)
• New file-system for Longhorn (NTFS+++)
• No more hierarchies (i.e., folders) but metadata
• Items – Attributes – Relationships – Schemas – Sub-Schemas (extensions)
  – Déjà vu?
Observations
• So far, all applications using these metadata are local
  – It is a typical semantic interoperability problem!
  – Efficient, distributed WinFS is not for tomorrow…
• Image metadata will always be incomplete and subjective
• #images >> #peers >> #schemas
• All these formats can be formally described by a subset of Description Logics
  – Use them all and in the darkness bind them!
Outline of my Project
• Objective: large-scale image retrieval framework taking advantage of metadata
• Outline
  – Import images
  – Import metadata
  – Extract low-level features (thanks to Lei :-)
  – Store everything in a common, scalable representation
  – Export data in a shared repository
    • SQL server
    • P2P network (SP2 ?)
  – Infer Metadata / Schema mappings locally
  – Cross validate mappings
  – Cluster peers / images vis-à-vis their subjective views
Specificities
• Different metadata models
• Incompleteness of metadata (e.g., WinFS dangling links)
• Metadata sparseness
• Few (but widely-used) core-classes
• Many custom extensions
• Many resources
• Low-level representation of the resources
• Embedded user feedback
Unique application
Finding Mapping Candidates (sketch)
• [T,I,U] U-Inference based on mutual information (scalable!)
[Figure: peers exchanging schemas, metadata, low-level features and user feedback.]
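The slide only states that inference relies on mutual information. One possible reading, assumed here, is to score a candidate attribute mapping by the mutual information between the values two peers assign to the same images: a high score suggests the two attributes carry the same information. The attribute values and sample data below are invented.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between the two components of paired values.

    pairs holds one (value_from_peer_1, value_from_peer_2) tuple per shared image.
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) rewritten with raw counts.
        total += p_xy * log2(count * n / (left[x] * right[y]))
    return total


# Hypothetical values of peer 1's "place" and peer 2's "lieu" on four shared images.
correlated = [("beach", "plage"), ("beach", "plage"),
              ("city", "ville"), ("city", "ville")]
# Hypothetical values of peer 1's "place" and peer 2's "camera" on the same images.
unrelated = [("beach", "x100"), ("beach", "d70"),
             ("city", "x100"), ("city", "d70")]

print(mutual_information(correlated))
print(mutual_information(unrelated))
```

Because each score only needs the co-annotated images of two peers, it can be computed locally, which fits the "scalable" remark on the slide.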
Cross-Validating Mappings (sketch)
• [S,M] Cross-validation based on graph partitioning, semantic gossiping or SAT techniques
Ref.: Jiying Wang, Ji-Rong Wen, Frederick H. Lochovsky, Wei-Ying Ma. Instance-based Schema Matching for Web Databases by Domain-specific Query Probing. VLDB 2004.
Conclusions
• Leveraging local metadata produced by end-users
  – Complex problem! Good heuristics could take years to be developed…
• Local communications / computations
  – Scalability
• Hopefully, better results than keywords / low-level analyses even for simple heuristics
  – Take advantage of context
• Images given local semantics
• Analyze the dynamics of the overall system
• Objectivity vs. subjectivity of interpretation becomes a measurable quality