
Stanford DB Seminar, October 20, 2000

Web, Semantics, OIL and FUEL: Semantic Interoperability and learning on the Web

by Amit Sheth

Director, Large-Scale Distributed Information Systems Lab.

University of Georgia, Athens, GA USA

http://lsdis.cs.uga.edu

Founder/Chairman, Taalee, Inc.

http://www.taalee.com

Special thanks, Digital Library project team at LSDIS

Semantics: “meaning or relationship of meanings, or relating to meaning …” (Webster), meaning and use of data (Information System)

Semantic Web: “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)

• “A Web in which machine reasoning will be ubiquitous and devastatingly powerful.”

• “A place where the whim of a human being and the reasoning of a machine coexist in an ideal, powerful mixture.”

• “A semantic Web would permit more accurate and efficient Web searches, which are among the most important Web-based activities.”

A personal definition of the Semantic Web: the concept that Web-accessible content can be organized semantically, rather than through syntactic and structural methods.

• Markups/Standards: DAML: Semantic Annotations and Directory; DSML: Directory (of course, XML, RDF, namespaces)

• Commercialization 1 (Oingo): Taxonomy – Ontology and Semantic Techniques

• Commercialization 2 (Taalee): Knowledge-base (Taxonomy, Domain Modeling, Entities and Relationships) and Semantic Techniques

• Research (Digital Earth at UGA): Complex Relationships and “deep semantics”

1. Create an Agent Mark-Up Language (DAML) built upon XML that allows users to provide machine-readable semantic annotations for specific communities of interest.

2. Create tools that embed DAML markup on to web pages and other information sources in a manner that is transparent and beneficial to the users.

3. Use these tools to build up, instantiate, operate, and test sets of agent-based programs that markup and use DAML.

4. 5. 6. ….applications

allow semantic interoperability at the level we currently have syntactic interoperability in XML

DARPA Agent Mark Up Language (DAML) Program Manager: Professor James Hendler http://dtsn.darpa.mil/iso/programtemp.asp?mode=347

<Title> DAML <subtitle> an Example </subtitle> </Title>
<USE-ONTOLOGY ID="PPT-ontology" VERSION="1.0" PREFIX="PP" URL="http://iwp.darpa.mil/ppt..html">
<CATEGORY NAME="pp.presentation" FOR="http://iwp.darpa.mil/jhendler/agents.html">
<RELATION-VALUE POS1="Agents" POS2="/madhan">
<ONTOLOGY ID="powerpoint-ontology" VERSION="1.0" DESCRIPTION="formal model for powerpoint presentations">
<DEF-CATEGORY NAME="Title" ISA="Pres-Feature">
<DEF-CATEGORY NAME="Subtitle" ISA="Pres-Feature">
<DEF-RELATION NAME="title-of" SHORT="was written by">
<DEF-ARG POS=1 TYPE="presentation">
<DEF-ARG POS=2 TYPE="presenter">

Source : http://www.darpa.mil/iso/DAML/

Objects in the web can, in principle, be marked (manually or automatically) to include the following information:

• Descriptions of data they contain (DBs)

• Descriptions of functions they provide (Code)

• Descriptions of data they can provide (Sensors)

Example of searching on DAML-centric semantic Web

Source: http://www.zdnet.com/pcweek/stories/jumps/0,4270,2432946,00.html

Value of Information

Directory; Structure; Table of Contents

Targeting

Search; Syntax; Index

Semantics results in deep understanding of content, resulting in a more relevant and timely match with the information needs and targeting.

Semantics; Entity+Rel+Events;Meaning with Context

• Oingo Ontology – ODP based(?), the database of millions of concepts and relationships that powers Oingo's semantic technology

• Oingo Seek - the database of millions of concepts and relationships that powers Oingo's semantic technology

• Oingo Sense - the knowledge extraction tool that uncovers the essential meaning of information by sensing concepts and context

• Oingo Lingua - the language of meaning used to state intent. The basis for intelligent interaction

• Assets catalogued are Web sites or Web pages.

Broad taxonomy, shallow understanding and results

After 3 or 4 clicks

Taalee WorldModel™: Domain Models (metadata of domain-media-business attributes, types), Ontologies, Entities, Relationships, Automated “Experts”, Reference Data (Live Encyclopedia), Mappings

Taalee Distributed Intelligent Agent Infrastructure: push/pull/scheduled agents for fresh extraction

Taalee Metabase of A/V assets

Taalee Semantic Engine™ with contextual reasoning

Taalee Semantic Engine

WorldModel™

Extractor Agents

WorldModel: Understanding of content, profiles, targeting needs

Automatic Extraction Agents: Expert driven value addition

Metabase

Metabase: Rapidly growing A/V aggregation

Semantic Personalization

Semantic Cataloging

Semantic Search

Semantic Targeting

Semantic Directory

Semantic Categorization

  Virage Search on football touchdown

Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...

Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...

Metadata from Typical Cataloging of Football Assets

Taalee Metadata on Football Assets

Rich Media Reference Page

Baltimore 31, Pit 24

http://www.nfl.com

Qadry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.

League: Professional
Teams: Ravens, Steelers
Score: Bal 31, Pit 24
Players: Qadry Ismail, Tony Banks
Event: Touchdown
Produced by: NFL.com
Posted date: 2/02/2000

What else can a context do? (a commercial perspective)

Semantic Enrichment

Simply the most precise and freshest A/V search

Context and Domain Specific Attributes Uniform Metadata for Content from Multiple Sources, Can be sorted by any field

Delightful, relevant information, exceptional targeting opportunity

Creating a Web of related information

What can a context do?

System recognizes ENTITY & CATEGORY

Relevant portion of the Directory is automatically presented.

Users can explore semantically related information.

Looking ahead

FROM: Browsing; Lexical search; Data exchange; Data retrieval

TO: Information requests; Content search; Semantic retrieval; Interpretation; Knowledge creation; Knowledge sharing

Evolving targets and approaches in integrating data and information (a personal perspective)

Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...

Generation II (1990s): InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...; VisualHarness, InfoHarness

Generation III (1997...): DL-II/DARPA/KA2 projects, OntoBroker, …; Taalee, Observer, ADEPT, InfoQuilt

Terminology (and language) transparency

Domain modeling (entities with domain-specific attributes) and complex relationships

Comprehensive metadata management

Context-sensitive information processing

Semantic correlation

enablers of the emerging concepts

Digital Earth Prototype System at UGA

Develop a Digital Earth Modeling System

Answer requests for collection of information from distributed resources

Develop a supportive learning environment for undergraduate geography students

A Digital Library Scenario VOLCANOES ACTIVITY

Some volcanoes are more active than others, and a few are in a state of permanent eruption, at least for the geological present. Volcanoes may become quiescent (dormant) for months or years. The danger to life posed by active volcanoes is not limited to eruption of molten rock or showers of ash and cinders. Mudflows that melt ice and snow on the volcano's flanks are equally hazardous*.

* Encarta® 98 Desk Encyclopedia © & 1996-97 Microsoft Corporation. All rights reserved.

Pu'u'O'o, Hawaii

A sample information request:

Find information on volcanoes in St. Helens and how they affect the environment.

Some of the ontologies involved in processing this information request are:

• Ontology for GIS Datasets;

• Ontology for Natural Disasters;

• Ontology for Volcanoes;

• Ontology for Environment;

TRY HERE THIS AND OTHER CONCEPT DEMOS


“An iscape is an information request that supports learning and semantic interoperability (about Digital Earth)” (ADEPT at UGA)

Iscape working definition

Iscapes are useful for understanding geographical phenomena, typically involving relationships between them

Iscapes are created by instructors using an iscape specification framework

Iscapes are run by students while learning about Digital Earth

The iscape creation framework fits in the ADEPT agent-based architecture prototype

Iscapes in the context of digital earth (ADEPT)

Iscape specification framework

Information Landscape

Ontologies

Relationships

Learning/What-if

Operations/Simulation

Presentation

Creation

Information Landscapes

A modular specification framework to represent information landscapes

Specifications of complex information requests over multiple ontologies

Specification of relationships, including “affects”

Enabling user-configurable parameters

Enabling operations, including simulations

A graphical toolkit for easy creation of iscapes

Information Landscapes

Learning paradigm for students: uses embedded ontological terms and iscapes

Metadata framework: models spatial, temporal and theme-based metadata

Uses FGDC and Dublin Core standards to represent domain-independent metadata

Relations

Given a set X, a relation is some property that may or may not hold between one member of X and a member of another set.

Various relationships:

“equals”, “less_than”, “is_a”, “is_part_of”, “like”

Semantic Relations

Most of these relations are hierarchical or similarity based

These are not powerful enough for our task of semantic interoperability between domains like Geography

In these domains, we have a natural “affects” relation between the ontologies

Semantic Relations

How does A affect B?

A, in its entirety or by a set of its components, induces some changes or properties on a set of components of B

Design of “affects”

How do volcanoes affect the environment?

VOLCANO: LOCATION, ASH RAIN, PYROCLASTIC FLOW

ENVIRON.: LOCATION, PEOPLE, ATMOSPHERE, PLANT, BUILDING

Relationships shown between components: DESTROYS, COOLS, DESTROYS, KILLS

[Area (Pyroclastic Flow) INTERSECT Area (Plant)] => [Pyroclastic Flow destroys Plant]

[Size (Ash Particles) < 2] => [Ash Rain cools Atmosphere]

[Pyroclastic Flow destroys Plant] and [Ash Rain cools Atmosphere] => [Volcano affects Environment]

(x | x ∈ ASC) and (y | y ∈ BSC) [ FN(x) operator FN(y) ]* => [ ASC relation BSC ]

[ ASC relation BSC ]* => A affects B
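The rule chain above can be read as a small forward-chaining check: component-level conditions yield component-level relations, and the set of relations yields "affects". The following is only a sketch under stated assumptions. Areas are approximated by axis-aligned bounding boxes, and the names `Box`, `intersects`, `component_relations` and `volcano_affects_environment` are hypothetical, not part of the ADEPT prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box standing in for a geographic area (assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """INTERSECT operator: true when the two boxes share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def component_relations(flow_area: Box, plant_area: Box, ash_particle_size: float):
    """Evaluate the two component-level rules from the slide."""
    relations = []
    if intersects(flow_area, plant_area):
        relations.append(("Pyroclastic Flow", "destroys", "Plant"))
    if ash_particle_size < 2:
        relations.append(("Ash Rain", "cools", "Atmosphere"))
    return relations


def volcano_affects_environment(relations) -> bool:
    """Top-level rule: both component relations must hold."""
    required = {
        ("Pyroclastic Flow", "destroys", "Plant"),
        ("Ash Rain", "cools", "Atmosphere"),
    }
    return required.issubset(relations)


# Usage: overlapping areas and fine ash particles satisfy both rules.
found = component_relations(Box(0, 0, 10, 10), Box(5, 5, 15, 15), 1.5)
print(volcano_affects_environment(found))
```

A real system would presumably load the rules from the relation specification rather than hard-code them; they are hard-coded here only to keep the example short.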

Design of “affects”

Mapping Functions

How do volcanoes affect the environment?

[ Location (Volcano) = Location (Environment) ]

Enclosing function provides a standard interface to the operator

Operator does imprecise or fuzzy match

Achieves Geo-spatial interoperability
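One way to picture the spatial mapping function: an enclosing function reduces each entity to a common representation, and the "=" operator is a tolerance-based comparison rather than strict equality. This is a sketch under assumptions. The dictionary layout, the 50 km default tolerance, and the names `location`, `haversine_km` and `fuzzy_equals` are all illustrative and not taken from the prototype.

```python
import math


def location(entity: dict) -> tuple:
    """Enclosing function: expose any entity's position as (lat, lon) in degrees."""
    return (entity["lat"], entity["lon"])


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fuzzy_equals(a: tuple, b: tuple, tolerance_km: float = 50.0) -> bool:
    """Imprecise '=' operator: true when the points lie within tolerance_km."""
    return haversine_km(a, b) <= tolerance_km


# Usage: [ Location (Volcano) = Location (Environment) ]
volcano = {"name": "St. Helens", "lat": 46.2, "lon": -122.18}
environment = {"name": "nearby forest", "lat": 46.3, "lon": -122.2}
print(fuzzy_equals(location(volcano), location(environment)))
```

Keeping the enclosing function separate from the operator means the operator never needs to know how each ontology stores its location.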

Mapping Functions

How do volcanoes affect the environment?

[ Time (Volcano) = Time (Environment) ]

Matches, with a tolerance depending on the granularity of values

Tolerance different for different entities; Specified default; Can be user-defined

Achieves temporal interoperability
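The temporal match can be sketched the same way: two time values "match" when they differ by no more than a tolerance chosen from the granularity of the values, with a default that the user may override. The tolerance table and the function name `time_matches` below are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Assumed default tolerances per granularity, in days.
DEFAULT_TOLERANCE_DAYS = {"day": 1, "month": 31, "year": 366}


def time_matches(a: date, b: date, granularity: str = "year",
                 tolerance_days: Optional[int] = None) -> bool:
    """[ Time(A) = Time(B) ] with a tolerance.

    The default tolerance depends on the granularity of the values;
    passing tolerance_days models a user-defined override.
    """
    if tolerance_days is None:
        limit = DEFAULT_TOLERANCE_DAYS[granularity]
    else:
        limit = tolerance_days
    return abs((a - b).days) <= limit


# Usage: an eruption date compared with an environmental observation date.
eruption = date(1980, 5, 18)
observation = date(1980, 9, 1)
print(time_matches(eruption, observation, granularity="year"))
print(time_matches(eruption, observation, granularity="month"))
```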

Operations

Powerful mechanism for studying geographical domains and other complex phenomena

Input parameters can be changed to support learning

E.g., statistical operations, numerical analysis, simulation modeling, etc.

Clarke’s Urban Growth Model (UGM)

Demonstrates the utility of integrating existing historic maps with remotely sensed data and related geographic information to dynamically map urban land characteristics for large metropolitan areas.

San Francisco Bay Area prediction of urban extent in 2100

Domain of Learning – URBAN DYNAMICS

Digital Earth Prototype: run-time architecture overview

RELATE

Correlation Agent

Planning Agent

User Agent

Wrapped Resource Agent

Ontology Agent

Broker

Cost Model

Web Wrapper

Simulation Database Wrapper

ADEPT Metabase

Metabase Resource Agent

Simulation Resource Agent


Semantic Web: Possible Evolution

HTML XML

XHTML SMIL RDF

Declarative Languages

DAML-O, OIL

FUEL – User defined/supplied operators, functions, computations

OIL,FUEL

FUEL as OIL Extension?

RDF(S): class-def, subclass-of, slot-def, subslot-of, domain, range

OIL: class-expressions (AND, OR, NOT); slot-constraints (has-value, value-type, cardinality); slot-properties (trans, symm)

FUEL: framework for mapping data/formats; user-defined operators, e.g., affects, simulations

Semantic Web can be a basis for handling information overload and providing semantic interoperability

Stepwise enrichment: starting with a constrained and well-understood language (such as one based on Description Logic), let us explore how we can support richer/deeper semantics for enabling complex decision making and learning involving heterogeneous digital media on the Global Information Infrastructure

The Promise of the Web with Semantics….

“Humankind has not woven the web of life. We are but one thread within it. Whatever we do to the web, we do to ourselves. All things connect.” – Chief Seattle, 1854

[email protected] – http://lsdis.cs.uga.edu

Further reading

http://www.semanticweb.org

http://www.daml.org

http://lsdis.cs.uga.edu/~adept

“DAML could take search to a new level”, http://www.zdnet.com/pcweek/stories/news/0,4153,2432538,00.html

V. Kashyap and A. Sheth, Information Brokering, Kluwer Academic Publishers, 2000.

Tim Berners-Lee, Weaving the Web, Harper, 1999.

Editorial writing by Ramesh Jain in IEEE Multimedia. Gio’s papers. OIL ….

For additional details on Information Brokering Architecture: Realizing Semantic Information Brokering and Semantic Web, ITC-IRST/University of Trento Seminar Series on Perspectives on Agents: Theories and Technologies, April 27, 2000, Trento, Italy. http://lsdis.cs.uga.edu/~adept/presenta.html

For additional details on ISCAPE specification and execution: Project Overview and Detailed Presentation at: http://lsdis.cs.uga.edu/~adept/presenta.html

Demonstrations at: http://lsdis.cs.uga.edu/~adept

Iscape specification using XML

<?xml version="1.0"?>
<!-- A template collection for all iscapes -->
<!DOCTYPE IscapeCollection SYSTEM "IscapeCollection.dtd">
<!-- All Iscapes -->
<IscapeCollection>
  <!-- An iscape specification for how stratovolcanoes affect the environment -->
  <Iscape>
    <!-- Identifying this iscape -->
    <Name> How do stratovolcanoes affect the environment </Name>
    <Description> An iscape using the affects relationship </Description>
    <!-- All ontologies which participate -->
    <Ontologies>
      <Ontology>Volcano</Ontology>
      <Ontology>Environment</Ontology>
    </Ontologies>
    <!-- Operations involved -->
    <Operation>
      <Relation>Affects</Relation>
    </Operation>
    <!-- Constraints on ontologies -->
    <OntologicalConstraints>
      <Constraint> Volcano morphology is stratovolcano </Constraint>
      <Constraint> Volcano start year is 1950 </Constraint>
    </OntologicalConstraints>
    <!-- Metadata to present in the result -->
    <Presentation> Volcano and Environment Metadata </Presentation>
    <!-- What can the student configure -->
    <Student>
      <Config> Location of Environment </Config>
    </Student>
  </Iscape>
  <!-- This Iscape ends -->
  <!-- Next Iscape starts -->
  <Iscape>
  </Iscape>
</IscapeCollection>
<!-- Iscape Collection ends here -->
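To illustrate how an agent might consume such a specification, here is a minimal sketch that parses a well-formed version of the iscape collection with Python's standard `xml.etree.ElementTree`. The element names follow the slide; the `load_iscapes` helper and the dictionary layout it returns are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# A well-formed rendering of the iscape from the slide (DTD reference omitted).
ISCAPE_XML = """<IscapeCollection>
  <Iscape>
    <Name> How do stratovolcanoes affect the environment </Name>
    <Ontologies>
      <Ontology>Volcano</Ontology>
      <Ontology>Environment</Ontology>
    </Ontologies>
    <Operation><Relation>Affects</Relation></Operation>
    <OntologicalConstraints>
      <Constraint> Volcano morphology is stratovolcano </Constraint>
      <Constraint> Volcano start year is 1950 </Constraint>
    </OntologicalConstraints>
    <Student><Config> Location of Environment </Config></Student>
  </Iscape>
</IscapeCollection>"""


def load_iscapes(xml_text: str) -> list:
    """Return one plain dictionary per <Iscape> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    iscapes = []
    for node in root.findall("Iscape"):
        iscapes.append({
            "name": node.findtext("Name", default="").strip(),
            "ontologies": [o.text.strip() for o in node.findall("Ontologies/Ontology")],
            "relation": node.findtext("Operation/Relation", default="").strip(),
            "constraints": [c.text.strip()
                            for c in node.findall("OntologicalConstraints/Constraint")],
            "configurable": [c.text.strip() for c in node.findall("Student/Config")],
        })
    return iscapes


for iscape in load_iscapes(ISCAPE_XML):
    print(iscape["name"], iscape["ontologies"], iscape["relation"])
```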

Relations

<?xml version="1.0"?>
<!-- Template collection of all relations in the system -->
<!DOCTYPE Relations SYSTEM "Relations.dtd">
<Relations>
  <!-- Relation specification starts here -->
  <Relation>
    <!-- Information to correlate with base iscape -->
    <Name> Affects </Name>
    <!-- Ontologies Involved -->
    <OntologyA> Volcano </OntologyA>
    <OntologyB> Environment </OntologyB>
    <!-- All operators -->
    <OperatorSet>
      <!-- Specification has value and mapping conditions -->
      <ValueCondition>
        <OntologyName> Environment </OntologyName>
        <Attribute> Damage </Attribute>
        <ValOperator> GREATERTHANEQUALS </ValOperator>
        <Value> 10000 </Value>
        <Type> Integer </Type>
      </ValueCondition>
      <MappingCondition>
        <FunctionA>Area</FunctionA>
        <ElementA>Volcano</ElementA>
        <Operator>EQUALS</Operator>
        <FunctionB>Area</FunctionB>
        <ElementB>Environment</ElementB>
      </MappingCondition>
    </OperatorSet>
    <!-- End of all operators -->
  </Relation>
  <!-- End of this relation specification -->
</Relations>
<!-- End of relation collection -->
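As an illustration of how a value condition from the relation specification could be evaluated, the sketch below reads the `<ValueCondition>` element and applies it to an instance. The operator table, the type table, and the `holds` function are assumptions; only the element names and the `GREATERTHANEQUALS`/`EQUALS` operator names come from the slides.

```python
import operator
import xml.etree.ElementTree as ET

# Mapping from operator names in the specification to Python functions (assumed).
VAL_OPERATORS = {
    "GREATERTHANEQUALS": operator.ge,
    "EQUALS": operator.eq,
}
# Mapping from declared types to Python casts (assumed).
VALUE_TYPES = {"Integer": int, "String": str}

VALUE_CONDITION = """<ValueCondition>
  <OntologyName> Environment </OntologyName>
  <Attribute> Damage </Attribute>
  <ValOperator> GREATERTHANEQUALS </ValOperator>
  <Value> 10000 </Value>
  <Type> Integer </Type>
</ValueCondition>"""


def holds(condition_xml: str, instances: dict) -> bool:
    """Check one value condition against {ontology: {attribute: value}}."""
    cond = ET.fromstring(condition_xml)

    def field(tag: str) -> str:
        return cond.findtext(tag).strip()

    cast = VALUE_TYPES[field("Type")]
    compare = VAL_OPERATORS[field("ValOperator")]
    actual = instances[field("OntologyName")][field("Attribute")]
    return compare(cast(actual), cast(field("Value")))


# Usage: damage of 25000 meets the >= 10000 threshold.
print(holds(VALUE_CONDITION, {"Environment": {"Damage": 25000}}))
```

A mapping condition could be handled analogously by looking up `FunctionA` and `FunctionB` in a table of enclosing functions.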

Ontological Constraints

<?xml version="1.0"?>
<!-- Template to specify ontological constraints -->
<!DOCTYPE OntologicalConstraints SYSTEM "OntologicalConstraints.dtd">
<!-- A collection of ontological constraints for all iscapes -->
<OntologicalConstraints>
  <!-- A constraint on this iscape -->
  <Constraint>
    <IscapeID>Volcano-Env</IscapeID>
    <Name>Volcano morphology is stratovolcano</Name>
    <LHSOntology>Volcano</LHSOntology>
    <LHSAttribute>Morphology</LHSAttribute>
    <Operator>LIKE</Operator>
    <Type>String</Type>
    <RHSValue>Stratovolcano</RHSValue>
  </Constraint>
</OntologicalConstraints>
<!-- Collection of ontological constraints ends here -->
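A constraint like this can be applied as a filter over ontology instances. In the sketch below the constraint is held as a dictionary keyed by the element names from the template, and `LIKE` is assumed to mean a case-insensitive substring match; the slides do not define its semantics, so that choice and the helper names `like` and `select` are hypothetical.

```python
# The constraint from the template, keyed by its element names.
CONSTRAINT = {
    "LHSOntology": "Volcano",
    "LHSAttribute": "Morphology",
    "Operator": "LIKE",
    "Type": "String",
    "RHSValue": "Stratovolcano",
}


def like(value: str, pattern: str) -> bool:
    """Assumed LIKE semantics: case-insensitive substring match."""
    return pattern.lower() in value.lower()


OPERATORS = {"LIKE": like}


def select(constraint: dict, instances: list) -> list:
    """Keep only the instances whose attribute satisfies the constraint."""
    test = OPERATORS[constraint["Operator"]]
    attribute = constraint["LHSAttribute"]
    return [item for item in instances
            if test(str(item.get(attribute, "")), constraint["RHSValue"])]


# Usage with two illustrative volcano records.
volcanoes = [
    {"Name": "St. Helens", "Morphology": "Stratovolcano"},
    {"Name": "Kilauea", "Morphology": "Shield volcano"},
]
print([v["Name"] for v in select(CONSTRAINT, volcanoes)])
```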

Presentation

<?xml version="1.0"?>
<!-- Template for presentation attributes -->
<!DOCTYPE Presentation SYSTEM "Presentation.dtd">
<!-- All presentation attributes are embedded here -->
<Presentation>
  <!-- Presentation attributes for this iscape -->
  <IncludeThese>
    <IscapeID>Volcano-Env</IscapeID>
    <Name>Volcano and Environment Metadata</Name>
    <Include>
      <Ontology>Volcano</Ontology>
      <Attribute>TectonicSetting</Attribute>
    </Include>
    <Include>
      <Ontology>Volcano</Ontology>
      <Attribute>EndYear</Attribute>
    </Include>
  </IncludeThese>
</Presentation>
<!-- Presentation attributes end here -->

Student

<!-- Template for student configurable attributes -->
<!DOCTYPE Student SYSTEM "Student.dtd">
<!-- All parameters which can be configured by a student -->
<UserConfigurable>
  <!-- Configuration for a particular iscape -->
  <Config>
    <!-- Correlating information -->
    <Name>Location of environment</Name>
    <!-- The parameters which are configurable -->
    <Parameter>
      <Ontology>Environment</Ontology>
      <Attribute>LocationName</Attribute>
      <DisplayName>Configure Location</DisplayName>
      <Value>Hawaii</Value>
      <Value>Kilauea</Value>
    </Parameter>
  </Config>
  <!-- Configuration for this iscape ends here -->
</UserConfigurable>
<!-- End of all student configurable parameters -->

Student interface

Results

Receives the result collections from each of the resource agents

Correlates the results on the basis of information provided in the iscape and the query plan generated by the planning agent

Performs data cleaning operations, merges the results into a uniform result set, and passes it on to the user agent

Responsible for performing operations, if specified in the iscape

The correlation agent
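The clean-and-merge step of the correlation agent might look roughly like the following. This is a deliberately simplified sketch: it assumes each resource agent returns a list of dictionaries, that records are correlated on a single shared key, and that "cleaning" means normalizing field names and trimming strings. The function name `correlate` is hypothetical, and the query plan is not modeled.

```python
def correlate(result_sets: list, key: str) -> list:
    """Merge records from several resource agents into one uniform result set."""
    merged = {}
    for results in result_sets:
        for record in results:
            # Data cleaning: normalize field names, trim string values.
            cleaned = {
                name.strip().lower(): (value.strip() if isinstance(value, str) else value)
                for name, value in record.items()
            }
            identifier = cleaned.get(key)
            if identifier is None:
                continue  # cannot be correlated without the key
            merged.setdefault(identifier, {}).update(cleaned)
    return list(merged.values())


# Usage: one record from a metabase agent, one from a wrapped Web resource.
metabase_results = [{"Name ": " St. Helens", "Morphology": "Stratovolcano"}]
web_results = [{"name": "St. Helens", "damage": 1100000}]
print(correlate([metabase_results, web_results], key="name"))
```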

Realizing Semantic Information Brokering and Semantic Web in summary

Text, Structured Databases; Data; Syntax, System; Federated DB

Semi-structured; Metadata; Structural, Schematic; Mediator, Federated IS

Visual, Scientific/Eng.; Knowledge; Semantic; Knowledge Mgmt., Information Brokering/Mediator, Cooperative IS

Popular Alternative perspective/approach: Linguistics, IR, AI

Graduate students in a College of Geography have a final project in which a case study is proposed. In the case, they are supposed to help a City Council make decisions about the planning of a new landfill. This is a hands-on learning exercise through interaction with a Digital Earth, and the starting point would be to find the best location for the landfill*.

Tacoma Landfill

* This scenario supports one of the suggestions for Digital Earth scenarios sampled by the “First Inter-Agency Digital Earth Working Group”, an effort on behalf of NASA’s inter-agency Digital Earth Program.

Taking advantage of the Web for learning

A high-level information request would be:

Find a landfill site for a new landfill near the source of the wastes. The earthquakes’ impacts must be evaluated.

A first-cut refinement (by definition, by semantics, by synonymy) leads us to the following information request:

Find a proper soil in sites not subject to flooding or high groundwater levels for a new landfill near the industrial zone. Liquefaction phenomenon cannot occur.

An example scenario of learning on the Web

Adding on-the-fly user constraints while processing the information request:

Retrieve satellite images in 12-meter resolution or higher, looking for soils with permeability rate < 10 (silty clay loam) for a new landfill whose distance from the city industrial park is less than 5km.

Using the images’ coordinates, forecast seismic activity up to moderate magnitude (5 - 5.9, Richter scale) in the pointed areas.

domain specific metadata; correlation among multiple ontologies; return results in multiple media (in this case, images and a simulation)
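The on-the-fly constraints above amount to a conjunction of attribute tests over candidate sites. A minimal sketch follows, assuming each candidate carries the three attributes as plain numbers; the `CandidateSite` class, its field names, and the sample values are illustrative, and the seismic forecast step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    name: str
    image_resolution_m: float              # ground resolution of the available imagery
    soil_permeability: float               # permeability rate, same units as the request
    distance_to_industrial_park_km: float


def satisfies_request(site: CandidateSite) -> bool:
    """Apply the three user constraints from the refined request."""
    return (
        site.image_resolution_m <= 12      # "12-meter resolution or higher" means 12 m or finer
        and site.soil_permeability < 10    # e.g., silty clay loam
        and site.distance_to_industrial_park_km < 5
    )


# Usage with two made-up candidates.
candidates = [
    CandidateSite("Site A", image_resolution_m=4, soil_permeability=6,
                  distance_to_industrial_park_km=3.2),
    CandidateSite("Site B", image_resolution_m=30, soil_permeability=6,
                  distance_to_industrial_park_km=1.0),
]
print([site.name for site in candidates if satisfies_request(site)])
```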


Partial sample ontologies for semantic information brokering:

LANDUSE: COMMERCIAL, INDUSTRIAL, RURAL, RESIDENTIAL, AGRICULTURAL, MILITARY, RECREATIONAL

LAND (SITE): CULTIVATED AREA, GREENLAND AREA, LAND BANK, ZONING, LANDFILL SITE

WASTE DISPOSAL: RECYCLING, HAZARDOUS, LANDFILL, RESOURCE REC., SOLID, SEWAGE (shredding, magnetic separation, screening, washing)

NATURAL DISASTER: EARTHQUAKE, LANDSLIDE, VOLCANO, STORM, FLOOD, FIRE, AVALANCHE, TSUNAMI (four “causes” relationships are drawn between disaster types)


A sample result (depending on information providers) could be:

OrbView-4’s stereo imaging capacity providing 3-D terrain images

Hyperspectral data will be valuable for identifying material types

images source: http://www.orbimage.com

5km

industrial zone

identified landfill site

The students now have the information requested for helping the City Council in the planning of the new landfill
