semantic interoperability in infocosm: beyond infrastructural and data interoperability in federated...
DESCRIPTION
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4 1997. Related technical paper: http://knoesis.org/library/resource.php?id=00230
TRANSCRIPT
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability in Federated Information Systems
Keynote Talk
International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4 1997
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
Thanks: Vipul Kashyap, Kshitij Shah
Three perspectives
• Information Integration Perspective: Distribution, Heterogeneity, Autonomy
• Information Brokering Perspective: Data, Metadata, Semantic (Terminological, Contextual)
• “Vision” Perspective: Connectivity+Computation, Information, Knowledge
Evolving targets and approaches in integrating data and information: a personal perspective
[Figure: timeline from the early 80s through 1990 to 1997, divided into Generation 1, Generation 2 and Generation 3. Systems shown: Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...; Infoscopes, HERMES, SIMS, TSIMMIS, Harvest, RUFUS, ...; InfoHarness, VisualHarness; Digital Library Projects, ...; InfoQuilt; Infocosm.]
Generation I
• Data recognized as corporate resource -- leverage it!
• Most data in structured databases (and the rest in files), different data models, transitioning from Network and Hierarchical to Relational DBMSs
• Connectivity/access -- a major issue
• Heterogeneity (system, modeling and schematic) as well as need to support autonomy posed main challenges
• Support for corporate IS applications as the primary objective, update often required, data integrity important
Generation II
• Significant improvements in computing and connectivity (standardization of protocol, public network, Internet/Web); remote data access as given
• Increasing diversity in data formats, with focus on variety of textual data and semi-structured documents (and lesser focus on structured data)
• Many more data sources, diverse domains, but not necessarily better understanding of data
• Use of data beyond traditional business applications -- mining + warehousing, marketing, commerce
• Query only, little attention to updates; extensive use of IR techniques
Generation II
• Focus shift from data to metadata; earlier, distribution applied to data only, now it also applies to metadata
• Wrapper part of Mediator Architecture*, Metadata component of Information Brokering Architecture
• Early work on ontology support
* Gio Wiederhold
Generation III
• Increasing information overload
• Changes in Web architecture: push, …
• Broader variety of content with increasing amount of visual information
• Continued standardization related to Web for representational and metadata issues (MCF, RDF, XML) and distributed computing (CORBA, Java)
• Not just metadata, logical correlation
• Users demand simplicity, but complexities continue to rise
Generation III (contd)
• Broader variety of users and applications; well beyond business and scientific uses (e.g., focused marketing -- more than information on the web)
• Not just data access, but decision support through “data mining and information discovery, information fusion, information dissemination, knowledge creation and management”, “information management complemented by cooperation between the information system and humans”
Dimensions for interoperability and integration: Perspective used for Federated Databases
[Figure: three axes: Distribution, Autonomy, Heterogeneity.]
FDBS: Schema Architecture
[Figure: five-level schema architecture. Each Component DBS has a Local Schema, which schema translation turns into a Component Schema; Export Schemas are derived from the Component Schemas; schema integration combines Export Schemas into a Federated Schema; External Schemas are defined over the Federated Schema.]
• Model Heterogeneity: Common/Canonical Data Model, Schema Translation
• Information Sharing while preserving Autonomy
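The schema levels named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the architecture's actual machinery: the `Schema` class, the function names, the sample relations (`EMP`, `PAYROLL`) and the idea that the "common data model" is just a shared attribute vocabulary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    level: str       # "local", "component", "export", "federated" or "external"
    relations: dict  # relation name -> list of attribute names


def translate(local, renames):
    """Schema translation: restate a local schema in the common data model.
    Assumption: the common model is only a shared attribute vocabulary."""
    relations = {rel: [renames.get(a, a) for a in attrs]
                 for rel, attrs in local.relations.items()}
    return Schema("component", relations)


def export(component, shared):
    """Export schema: only what the component DBS chooses to share (autonomy)."""
    relations = {rel: attrs for rel, attrs in component.relations.items()
                 if rel in shared}
    return Schema("export", relations)


def integrate(exports):
    """Schema integration: merge export schemas, taking the union of
    attributes for relations that carry the same name."""
    merged = {}
    for schema in exports:
        for rel, attrs in schema.relations.items():
            target = merged.setdefault(rel, [])
            for attr in attrs:
                if attr not in target:
                    target.append(attr)
    return Schema("federated", merged)


def external(federated, wanted):
    """External schema: a view over the federated schema for one user group."""
    relations = {rel: federated.relations[rel] for rel in wanted
                 if rel in federated.relations}
    return Schema("external", relations)


# Two hypothetical component databases.
db1 = Schema("local", {"EMP": ["eno", "nm", "sal"], "PAYROLL": ["eno", "bank"]})
db2 = Schema("local", {"EMP": ["eno", "name", "dept"]})

exp1 = export(translate(db1, {"nm": "name", "sal": "salary"}), shared={"EMP"})
exp2 = export(translate(db2, {}), shared={"EMP"})
federated = integrate([exp1, exp2])
view = external(federated, ["EMP"])
print(view.relations)
```

Note that `PAYROLL` never reaches the federated schema because its owner did not export it, which is one simple way to picture information sharing while preserving autonomy.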
Heterogeneity in FDBMSs
Hardware/System
• instruction set
• data representation/coding
• configuration
Operating System
• file system
• naming, file types, operation
• transaction support
• IPC
Database System
• Semantic Heterogeneity
• Differences in DBMS
– data models (abstractions, constraints, query languages)
– System level support (concurrency control, commit, recovery)
Communication
1970s
1980s
Characterization of Schematic Conflicts in Multidatabase Systems
Schematic Conflicts
• Domain Definition Incompatibility
– Naming Conflicts
– Data Representation Conflicts
– Data Scaling Conflicts
– Data Precision Conflicts
– Default Value Conflicts
– Attribute Integrity Constraint Conflicts
• Data Value Incompatibility
– Known Inconsistency
– Temporal Inconsistency
– Acceptable Inconsistency
• Abstraction Level Incompatibility
– Generalization Conflicts
– Aggregation Conflicts
• Schematic Discrepancies
– Data Value Attribute Conflict
– Entity Attribute Conflict
– Data Value Entity Conflict
• Entity Definition Incompatibility
– Naming Conflicts
– Database Identifier Conflicts
– Schema Isomorphism Conflicts
– Missing Data Items Conflicts
Sheth & Kashyap, Kim & Seo
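Two of the conflict classes above, naming conflicts and data scaling conflicts, can be illustrated with a small Python sketch. The source names (`personnel`, `payroll`), the attribute names and the per-source mapping table are hypothetical; the point is only that each local attribute is mapped to a canonical name together with a value converter.

```python
# Per-source mapping: local attribute -> (canonical attribute, value converter).
# All names and units here are hypothetical illustrations.
MAPPINGS = {
    "personnel": {
        "emp_name": ("name", str),             # naming conflict: emp_name vs name
        "salary_usd": ("salary_usd", float),
    },
    "payroll": {
        "name": ("name", str),
        # data scaling conflict: this source stores thousands of dollars
        "salary_k_usd": ("salary_usd", lambda v: float(v) * 1000.0),
    },
}


def to_canonical(source, record):
    """Rewrite one record from a named source into the canonical attributes."""
    canonical = {}
    for attr, value in record.items():
        target, convert = MAPPINGS[source][attr]
        canonical[target] = convert(value)
    return canonical


a = to_canonical("personnel", {"emp_name": "Ada", "salary_usd": 52000})
b = to_canonical("payroll", {"name": "Ada", "salary_k_usd": 52})
print(a == b)
```

Once both records are in canonical form they can be compared directly, which is the precondition for detecting the data value incompatibilities listed in the taxonomy.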
Observations and Lessons Learnt
• “tightly coupled” vs “loosely coupled” debate
• “good common data model” debate
• “tightly coupled” harder to build, but can give better control over data sharing, provide more transparent access, and can possibly support update; lessons learned in schema integration can be reapplied in newer situations
• “loosely coupled” more flexible, but generally requires more user involvement
Retracing the path without learning from past expeditions
Steps for transitioning from Data Marts to Warehouses:
• Create consistent dimensions in the data marts
• Create a data warehouse data model and convert data marts to it
• Go back and build an enterprise data warehouse, then convert data marts to the new common data model and architectures
The above is doomed to repeat past mistakes. Integrating metadata is not easy!
PC Week, November 24, 1997
System | Autonomy | Heterogeneity | Distribution
InfoHarness | Semi autonomous | High - semantic issues not addressed | Data fully distributed - metadata centralized
Conceptual Indexing | N/A | Extensible semantic knowledge base | N/A
HERMES | Autonomous | High | High
TSIMMIS | Autonomous | High - semantic issues not addressed | High
SIMS | Autonomous | High | High
InfoSleuth | Semi autonomous | High - semantic brokering | High
KMed | Centralized control | High semantic heterogeneity focused on images only | None
Generation 1 concern: So far (schematically), yet so near (semantically)!
Generation 3 concern: So near (schematically), yet so far (semantically)!
Generation II and Generation III
Information Brokering: A Three-Level Approach
[Figure: three levels connected by "used-by" (top down) and "abstracted-into" (bottom up) relationships, with the emphasis shifting upward from Gen. I to Gen. III.]
• Semantic (Domain, Application specific): Ontology
• Metadata (content descriptions, intensional): Content
• Data (heterogeneous types, media): Representation
An Architecture for Information Brokering
[Figure: User Query/Information Requests enter the INFORMATION BROKERING layer, which comprises Vocabulary Brokering (Vocabulary Brokers, Inter-Vocabulary Relationships Manager), Metadata Brokering (Metadata Systems, each with a Metadata Broker and a Metadata Repository) and Data Brokering (CORBA, HTTP, IIOP), over the DATA REPOSITORIES (Information System 1 ... Information System N).]
Generation 2: Limited Types of Metadata, Extractors, Mappers, Wrappers
[Figure: METADATA EXTRACTORS over Global/Enterprise Web Repositories, Nexis, UPI, AP and DB sources.]
Junglee (Gen. 2)
[Figure: Junglee architecture. Data Integration: Wrappers (SDL Description), Extractor (Extraction Rules), Mapper (Mapping Rules) over Internet, Text, IDT Application and RDBMS sources. Data Publishing: Publisher (Publishing Rule).]
Example query: Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years
Junglee, SIGMOD Record, Dec. 1997
Extractors
• can automatically identify data/media type
• can be extended at any time (pre-specified or parameterized routines)
• can run at data source, metadata storage site or at IQ server
• can run at pre-specified times or events, or on demand
• can route metadata to appropriate metadatabase repositories
Extractors use agent & networking computing (NC) technologies and are implemented in PERL/Java
A Classification of Metadata
• Content Independent Metadata e.g. creation-date, location, ...
• Content Dependent Metadata e.g. size, number of colors in an image
  – Content-(directly)based Metadata e.g. inverted lists, doc vectors
  – Content-descriptive Metadata
    • Domain Independent (structural) Metadata e.g. parse tree of a C++ program, HTML/SGML DTDs
    • Domain Specific Metadata e.g. scale, coordinate, land-cover, relief (GIS Domain), area, population (Census Domain), concept descriptions from Domain Specific Ontologies
Move in this direction to tackle information overload !!
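The classification above can be made concrete with a toy extractor that fills one slot per metadata class for a text document. This is a sketch under stated assumptions: the function name, the record layout, the sample URL and the tiny `GIS_TERMS` vocabulary standing in for a domain specific ontology are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for terms drawn from a GIS domain ontology.
GIS_TERMS = {"land-cover", "relief", "scale"}


def extract_metadata(location, created, text):
    """Return one metadata record, grouped by the classes of the taxonomy."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    inverted = defaultdict(list)
    for position, token in enumerate(tokens):
        inverted[token].append(position)
    return {
        # known without looking at the content
        "content_independent": {"location": location, "creation-date": created},
        # depends on the content, but does not describe it
        "content_dependent": {"size": len(text)},
        # derived directly from the content
        "content_based": {"inverted_list": dict(inverted)},
        # describes the content in domain vocabulary
        "domain_specific": sorted(GIS_TERMS & set(tokens)),
    }


doc = "Relief map showing land-cover and relief"
record = extract_metadata("http://example.org/map1.txt", "1997-12-03", doc)
print(record["domain_specific"])
```

Moving down the record mirrors the direction the slide recommends: each slot carries more of the document's meaning and less of its physical form.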
Query Processing and Information Requests
• traditional queries based on keywords
• attribute-based queries
• content-based queries
• 'high-level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
• user selected ontology, use of profile
E.g., Kabila’s political activities (in all media)
Generation 2
Generation 3
Metadata Brokering in VisualHarness
[Figure: a User Query is sent to VisualHarness (VH), which returns Results. VH draws on metadata for combined access: image features extracted from Image Data by VIR (Color, Composition, Texture, Structure) together with Other Attributes. Additional label in the figure: Null Image.]
VisualHarness: An Example
What else can Information Brokering do?
WWW
• A confusing heterogeneity of media, formats (Tower of Babel)
• Information correlation using physical (HREF) links at the extensional data level
• Location dependent browsing of information using physical (HREF) links => User has to keep track of information content !!
WWW+Information Brokering
• Domain Specific Ontologies as “semantic conceptual views”
• Information correlation using concept mappings at the intensional concept level
• Browsing of information using terminological relationships across ontologies => Higher level of abstraction, closer to user view of information !!
Ontologies for semantic interchange
• Need for “transcending” local subject areas/domains => Design Adaptable systems which “adapt/adjust” themselves in the face of vocabularies from different domains
• Coordination and interrelation of models across domains
One approach => utilize terminological relationships across concepts in ontologies
• Specification languages for ontologies:
– Description Logics, Rule-based Languages
– Support for mechanisms for Coordination and Correlation, viz., representation and reasoning with terminological relationships
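One way to picture "terminological relationships across concepts in ontologies" is a lookup table that maps a term in one vocabulary to related terms in another, each tagged with the kind of relationship. The sketch below is only illustrative: the relationship kinds, the sample terms and the preference order (an exact synonym first, then a narrower term, then a broader one) are assumptions, not something the slides specify.

```python
from typing import NamedTuple, Optional


class Rel(NamedTuple):
    kind: str    # "synonym", "hyponym" (target is narrower) or "hypernym" (target is broader)
    target: str  # term in the target ontology


# Hypothetical inter-ontology relationships, keyed by (source ontology, term).
RELATIONSHIPS = {
    ("census", "block"): [Rel("synonym", "parcel")],
    ("census", "city"): [Rel("hypernym", "settlement"), Rel("hyponym", "metropolis")],
}

# Assumed preference: exact match first, then narrower, then broader.
PREFERENCE = ("synonym", "hyponym", "hypernym")


def translate(ontology: str, term: str) -> Optional[Rel]:
    """Pick the preferred related term in the target ontology, if any."""
    candidates = RELATIONSHIPS.get((ontology, term), [])
    if not candidates:
        return None
    return min(candidates, key=lambda rel: PREFERENCE.index(rel.kind))


print(translate("census", "block"))
print(translate("census", "city"))
print(translate("census", "river"))
```

Returning the relationship kind along with the term lets a caller see when a translation was not exact, which is where a difference in vocabulary starts to cost information.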
The InfoQuilt Project
http://lsdis.cs.uga.edu/infoquilt
MREF
Metadata Reference Link -- complementing HREF
Creating “logical web” through Media Independent Metadata based Correlation
Metadata Reference Link (<A MREF …>)
• <A HREF=“URL”>Document Description</A>
physical link between document (components)
• <A MREF KEYWORDS=<list-of-keywords>; THRESH=<real>>Document Description</A>
• <A MREF ATTRIBUTES(<list-of-attribute-value-pairs>)>Document Description</A>
• <A MREF(<parameterized_routine(….)> Document Description</A>
Correlation based on Content-descriptive Metadata
Some interesting <A MREF KEYWORDS=“scenic waterfall mountain”; THRESH = 0.9>information on scenic waterfalls</A> is available here.
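A keyword MREF like the one above could be resolved at browse time by scoring each document's content-descriptive text against the keyword list and keeping those that reach THRESH. The slides do not say how the score is computed, so the measure used here (the fraction of keywords present in the description) is an assumption, as are the function names and the sample collection.

```python
def keyword_score(keywords, description):
    """Fraction of the keywords that occur as words in the description.
    Assumed scoring rule; the original system may rank differently."""
    words = set(description.lower().split())
    wanted = [k.lower() for k in keywords]
    return sum(k in words for k in wanted) / len(wanted)


def resolve_mref(keywords, thresh, collection):
    """Return the names of documents whose score reaches the threshold."""
    return [name for name, description in collection.items()
            if keyword_score(keywords, description) >= thresh]


# Hypothetical content-descriptive metadata, keyed by document name.
collection = {
    "waterfall.gif": "a scenic waterfall on a mountain",
    "harbor.gif": "scenic harbor at dusk",
}

matches = resolve_mref(["scenic", "waterfall", "mountain"], 0.9, collection)
print(matches)
```

Unlike an HREF, the link target is not fixed when the page is written: adding a matching document to the collection changes what the same MREF returns.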
Content Descriptive Metadata
[Figure: waterfall.gif (Data) is accompanied by the description "Marina wonderland: You are seeing the nature’s beauty of marina wonderland situated in the coastal region of the southern part of India. It consists of huge mountains and water flowing in between the mountains." Full Text Indexing of the description (WAIS, LSI, Glimpse, SMART, ...) supplies the content-descriptive metadata.]
Content based Metadata
[Figure: waterflow.gif (Data) with Content Dependent Metadata held in Metadata Storage: height, width and size; format (gif, ppm); Major component (RGB): Blue.]
Correlation based on Content-based Metadata
Some interesting <A MREF KEYWORDS= “scenic waterfalls”; THRESH = 0.9; ATTRIBUTES (major-color = ‘blue’)> information on scenic waterfalls</A> is available here.
Metadata, Domain Specific Ontologies
Get the titles, authors, documents, maps published by the United States Geological Survey (USGS) about regions having a population greater than 5000, area greater than 1000 acres, having a low density urban area land cover
domain specific metadata: terms chosen from domain specific ontologies
What is Metadata?
- data/information about data
- useful/derived properties of media
- properties/relationships between objects
What are Ontologies?
- collection of terms, definitions and their interrelationships
- specification of a representational vocabulary for a shared domain of discourse
Repositories and the Media Types
[Figure: repositories and the domain specific metadata drawn from each. Census DB: Population, Area (Regions, via SQL). TIGER/Line DB: Boundaries. Image/Map DB: Land cover, Relief (Image Features, via image processing routines).]
Domain Specific Correlation
Potential locations for a future shopping mall identified by all regions having a population greater than 500 and area greater than 50 sq ft having an urban land cover and moderate relief <A MREF ATTRIBUTES(population > 500; area > 50; region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’)>can be viewed here</A>
=> media-independent relationships between domain specific metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher domain specific level as opposed to physical “link-chasing” in the WWW
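The attribute MREF above can be read as a conjunction of predicates evaluated over metadata that comes from more than one repository. The following sketch parses such a specification and applies it to per-region metadata merged from a census-like source and an image-like source. The parser, the repository contents and the region identifiers are hypothetical, and the specification string uses straight quotes rather than the typographic quotes shown on the slide.

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def parse_attributes(spec):
    """Split 'name op value; ...' into (name, comparison, value) triples."""
    predicates = []
    for clause in spec.split(";"):
        match = re.fullmatch(r"\s*([\w-]+)\s*([<>=])\s*(.+?)\s*", clause)
        if match is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        name, op, raw = match.groups()
        # Quoted values are strings, everything else is treated as a number.
        value = raw.strip("'\"") if raw[0] in "'\"" else float(raw)
        predicates.append((name, OPS[op], value))
    return predicates


def correlate(spec, *repositories):
    """Merge metadata per region id and keep regions satisfying every predicate."""
    predicates = parse_attributes(spec)
    merged = {}
    for repository in repositories:
        for region_id, attrs in repository.items():
            merged.setdefault(region_id, {}).update(attrs)
    return sorted(
        region_id for region_id, attrs in merged.items()
        if all(name in attrs and op(attrs[name], value)
               for name, op, value in predicates)
    )


# Hypothetical metadata from a structured source and an image source.
census = {
    "r1": {"population": 1200, "area": 80, "region-type": "block"},
    "r2": {"population": 300, "area": 90, "region-type": "block"},
}
images = {
    "r1": {"land-cover": "urban", "relief": "moderate"},
    "r2": {"land-cover": "urban", "relief": "moderate"},
}

spec = ("population > 500; area > 50; region-type = 'block'; "
        "land-cover = 'urban'; relief = 'moderate'")
regions = correlate(spec, census, images)
print(regions)
```

The request never mentions which repository or media type holds each attribute, which is the sense in which the correlation is media-independent.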
InfoQuilt Architecture (partial)
[Figure: Media Independent Information Requests (Browsing Collections, Keyword-based queries, Attribute-based queries) are handled by the InfoQuilt Server, shown with a Correlation Server, a Knowledge Base, and the IQR: Metadata & Domain Knowledge Repository and Registry (Attr. Metadata such as loc, type, author; Parameterized Routines; Domain Knowledge; Indices). Media and Domain specific Extractor Agents and Wrappers sit over Text, Image, Audio, Video media repositories. Other InfoQuilt Servers are also shown.]
What next (after comprehensive use of metadata)?
• Context, context, context
• Semantic Proximity
– domain
– context
– modeling/abstraction/representation
– state
• Characterizing Loss of Information incurred due to differences in vocabulary
BIG challenge: identifying relationship or similarity between objects of different media, developed and managed by different persons and systems
A Semantic Taxonomy
Semantic Proximity
• Semantic Resemblance
• Semantic Relevance
• Semantic Relationship
• Semantic Equivalence
• Semantic Incompatibility
Tools to support semantics
• ontologies
• profiles
• context
• domain-specific metadata
[Figure: layered diagram built on Computing and Communication, with levels Data, Information, Knowledge and Decision, and stages Connectivity and Data Access, Interoperability, Cooperation.]
Interoperability in the ‘80s
System level interoperability like TCP/IP. Standard communication channels, data exchange formats, etc.
Basic infrastructural work for higher level interoperability.
HTTP, IIOP, TCP/IP
Interoperability in the ‘90s
Information level interoperability. Standards evolve that go beyond connectivity and define information standards. Systems start exchanging metadata (MCF, RDF, ..).
Business Objects, CORBA, DCOM, EDI
Where we are headed
Semantic interoperability where systems share ontologies and knowledge.
Systems and humans can cooperate in decision making and can generate new knowledge as a collective entity.
KNOWLEDGE
Cooperative Information Systems
Collective exploitation of complementary technologies
• Information Management
• Coordination
» Scheduling
» Workflow
• Collaboration
» Video Conferencing
» Whiteboarding
» Application sharing
Infocosm
[Figure: Computing, Communication, Data, Information Interoperability, Knowledge; Cooperating Information Systems.]
Summary
• We have addressed many data level (schematic, representational, …) issues so far
• We are in a good position to solve additional issues at the metadata level; need to support domain-specific metadata and “media-independent” information requests, qualified by use of ontologies
• Some challenges remain: e.g., consistency of metadata
Agenda for Research
• Interoperation not at systems level, but at informational and possibly knowledge level
– traditional database and information retrieval solutions do not suffice
– need to understand context; measures of similarities
• Need to increase impetus on semantic level issues involving terminological and contextual differences, possible perceptual or cognitive differences in future
– information systems and humans need to cooperate, possibly involving coordination and collaborative processes
http://lsdis.cs.uga.edu
[See publications on Metadata, Semantics, InfoHarness/InfoQuilt]
[email protected]