april 2002 gio artint 1 information interoperation versus integration gio wiederhold stanford...

62
April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002 www-db.stanford.edu/people/gio.html Thanks to Jan Jannink, Prasenjit Mitra, Stefan Decker.

Post on 22-Dec-2015

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 1

Information

Interoperation versus Integration

Gio Wiederhold

Stanford University

April 2002

www-db.stanford.edu/people/gio.html

Thanks to Jan Jannink, Prasenjit Mitra, Stefan Decker.

Page 2: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 2

Layout

• Objectives and definitions 3 – 8– information, knowledge, actions

• Integration, articulation, interoperation 9 -- 13• Semantic problems 14 – 20• Effect of the Internet 21 – 26• Precision 27 -- 31• SKC project: 31 -- 52

– Ontology algebra & tools

• Summary 53 -- 59

Page 3: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 3

Today's sources and needs

Many resources• computing

– databases– analytical

software– simulations

• people– organizations– domains

semantics

Many objectives

• operations– reports– routine actions– optimization

• decision-making– crisis response– allocations– conflict resolution

Require information

Page 4: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 4

Objective of Information Systems

• Initiate Optimal Actions, i.e., that are – highest in delivering benefits for the costs and risks

Effective -- achieve the goal

Satisfactory -- move towards the goal

Feasible -- valid and possible

Optimality requires perfect -- complete and error-free --

data and knowledge. Not always attainable

The best is often the enemy of the good

Page 5: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 5

Sources of Information

• Recall of what you have forgotten

• Information from external sources

• news• papers• books

• Induction from Data – Experience

• Processing of Data – Statistics • Combining Information that was previously disjoint

Processing by intermediaries

Page 6: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 6

Data + Knowledge Information

ActionAction

Knowledge LoopKnowledge Loop

= rules = rules processprocess

ExperienceExperience

Data LoopData Loop

= facts= factsStorageStorage

Education

Education SelectionSelection

IntegrationIntegration

AbstractionAbstraction

Decision-makingDecision-making

State changesState changes

RecordingRecording

ProjectionProjection

Info

rmat

ion

Info

rmat

ionAssim

ilation

Assimilatio

n

Page 7: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 7

4 Information Technology Tasks

Actionable Information is created at the confluence

of data -- the state &

knowledge -- the ability toselect, integrate, abstract

and project the stateinto the future

Selection: SQL Selection: SQL

one verb languageone verb languageIntegration: Middleware Integration: Middleware

Mediation . Mediation .

Abstraction: VisualizationAbstraction: VisualizationSummarization Summarization

Projection: Simulation Projection: Simulation Spreadsheets Spreadsheets

Mediation * Mediation *

* Focus of this talk

Page 8: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 8

Encoded Knowledge

Lexicons: collect terms used in information systems

Taxonomies: categorize, abstract, classify terms

Schemas of databases: attributes, ranges, constraints

Data dictionaries: systems with multiple files, owners

Object libraries: grouped attributes, inherit., methods

Symbol tables: terms bound to implemented programs

Domain object models: (XML DTD): interchange terms

Ontologies: networks of terms and relationships

. . . Metadata formalizes Knowledge

Page 9: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 9

Combining data the obvious way

Application

Data A

MetaData A

Data B

MetaData B

Data AB

MetaData AB }Integrate

1. Integration:

Create a large database,that incorporates all

material in the sources

2. Operation:

Access the integrated dataUpdate the integrated data

when sources change

Page 10: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 10

or linkages to the data

Data A

MetaData A

Data B

MetaData B

Application

AB

}

Articulate

Rules for

AB

1. Articulation:

Create Links to needed Dataperhaps with rules

2. Interoperation:

Exploit the linkages and rules

Inter- operation

Page 11: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 11

Benefits and problems

Integration– Long time to establish common metadata more+ Easy for applications, everything is there– Massive replication

– Difficult to maintain as sources are updated

Articulation+ Metadata defined by application need more

+ Rapid implementation and modification– Interoperation via indirect access

+ Mitigated by caching– Many articulations

– Up to one per application+ Affected by selected source changes

Page 12: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 12

Alternatives for Applications

Data A

MetaData A

Data B

MetaData B

Application

AB

}

Articulate

Rules for

AB

Application

Data A

MetaData A

Data B

MetaData B

Data AB

MetaData AB }

Integrate

Operate&Maintain

Inter-Operate

Page 13: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 13

Central Solutions do not Scale

What workswith 7 modules(Apps and DBs)and one person in charge

fails when we have 100

and need a committee

Any changes in sources affect the integrated module

Page 14: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 14

Semantic Heterogeneity:

A Major Cause of Problems:

Integration / Interoperation joins domains

Domains have their own terminologies Need autonomy to deal with knowledge growth

The usage of terms in a domain is efficient Appropriate granularity

Mechanic working on a truck vs. logistics manager Shorthand notations

PS (power steering) vs. PS (partial shipment)

Functions differ in scope Payroll versus Personnel

getting paid vs. available (includes contract staff)

Page 15: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 15

Why now semantic heterogeneity

Heterogeneity has been addressed

• Platform and Operating Systems

• Representation and Access Conventions • Naming and Ontology

Whenever interoperation involves distinct domains mismatch ensues

• Autonomy conflicts with consistency, – Local Needs have Priority,– Outside uses are a Byproduct

Page 16: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 16

Domains and Consistency .

• a domain will contain many objects• the object configuration is consistent• within a domain all terms are consistent &• relationships among objects are consistent

• context is implicit

No committee is needed to forge compromises * within a domain

Compromises hide valuable details

Domain Ontology

Page 17: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 17

SKC grounded definition .

• Ontology: a set of terms and their relationships• Term: a reference to real-world and abstract objects• Relationship: a named and typed set of links between

objects• Reference: a label that names objects• Abstract object: a concept which refers to other objects• Real-world object: an entity instance with a physical

manifestation

Page 18: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 18

Examples of Semantic Mismatches

Information comes from many autonomous sources• Differing viewpoints ( by source )

– differing terms for similar items { lorry, truck }

– same terms for dissimilar items trunk( luggage, car)

– differing coverage vehicles ( DMV, AIA )

– differing granularity trucks ( shipper, manuf. )

– different scope student ( museum fee, Stanford )

• Hinders use of information from disjoint sources – missed linkages loss of information, opportunities– irrelevant linkages overload on user or application program

• Poor precision when merged

Still ok for web browsing , poor for business & science

Page 19: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 19

Structural Semantic Heterogeneity

Same concept?If incorporated in differing structures?

Page 20: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 20

Effect on Information Integration

• Big integrated systems ?Big integrated systems ?– cost, delaycost, delay– obsolescenceobsolescence– inflexibilityinflexibility– unmaintainableunmaintainable

• Composed systems ?Composed systems ?– no or limited controlno or limited control– operational inefficiencyoperational inefficiency– understanding heterogeneityunderstanding heterogeneity– distributed maintenance distributed maintenance

Page 21: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 21

Integration up-to-dateness

Frequency of source change

frequency of visits Feb.2000

F(capability given 2.2M public sites with 288M pages )

never

1/second

1/minute

1/hour

1/day

1/week

1/month

1/year

as often as possible0 1 ?

100%

50%

0%

%tageup-to-date F(user need)

effort, methods

Page 22: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 22

Web capabilities: more domains

• World-wide connectivity– internet– world-wide web

• More databases– public & corporate

• Faster communication– digital– packeting: TCP-IP, ATM

• Disintermediation– ubiquitous publishing

Information overload Data starvation

Page 23: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 23

Global consistency: Hope, but . .

Common semantic assumptions in assembling and integrating distributed information resources

• The language used by the resources is the same

• Sub languages used by the resources are subsets of a globally consistent language

These assumptions are provably false

Working towards the goal of globally consistency is

1. naïve -- the goal cannot be achieved

2. inefficient -- languages are efficient in local contexts

3. unmaintainable – terminology evolves with progress

Page 24: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 24

Knowledge from the Web

• Knowledge about the data sources– where is what– contents - semantics– scope of contents

• Knowledge to apply the data sources– mapping to application terms– mapping to application scope– integration from multiple sources

Page 25: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 25

Availability of Knowledge

• XML based schemasDTDs, extended semantics

a very small fraction now

• Embedded or related knowledgetext, dictionaries, context ontologies, DB schemas

the majority of sources

must exploit both flexibly

in order to make progress with KB-processing

costly to develop

Page 26: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 26

Search engines need knowledge

Yahoo humans catalog and organize useful web sites into hierarchies with crosslinks. About 300 topic experts in 2001. Scalable?demoz public service classification with 3000 editors. Quality?

AltaVista worms (surfs) and indexes all of the web. Frequency/volatility?

Excite also tracks queries and classifies customers. 1D, privacy?

Firefly provides customer control over their profiles. Ease/effective?

Junglee integrates diverse wrapped sources, compares Semantics?

Alexa collects webpages and their usage, keeps old pages Massive, suggests pages with similar access reference pattern. Value?

Google Pagerank gives the reference importance of web pages, using weighted votes, iterated closure. Find new stuff?

and others, used by most of the e-commerce sites .

Page 27: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 27

Problems for integrating the web

• Unsuitable source representations• part classification: HTML --- XML . . . . . . . . . .• print formats: postscript, adobe PDF . . . . . . .

• non-text: images, video (more ), • sound [Audio Mining (Dragon)], . . . . . . . . .

• hidden in databases behind CGI scripts . . .

• Inconsistent semantics • context distinct / scope / view . . . . . . . . . . . . .

• Naïve modeling of customers/apps• personalization, versus context, roles & growth . . . . . . . . . . . . . . . . . . . . . . . . .

Search engines, trying to deal with all of the web, cannot solve the problems precisely

Being improved.

Rate?

Progress

Page 28: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 28

Historical note

Ich weiss nicht was soll es bedeuten, ...

• -- an early complaint about semantics [Heinrich Heine: Die Lorelei]

Page 29: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 29

Precision of Language

• Our languages must serve our needs– Efficiency of expression: minimal symbols

carpenter versus householder domain

– Commitment surgeon versus pathologist

– Locality provides context geographical locality: downtown irrelevant on the webconceptual locality: 5 pounds remains important

Business requires precision, including that terms used map consistently to instances

knowledge controls use of data

Page 30: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 30

Need for precision

adapted from Warren Powell, Princeton Un.

More precision is needed as data volume increases--- a small error rate still leads to too many errors False Positives have to be investigated ( attractive-looking supplier - makes toys apparent drug-target with poor annotation )

False Negatives cause lost opportunities, suboptimal to some degree

False positives = poor precision typically cost more thanfalse negatives = poor recall

information quantity

human lim

it

acceptable limit

hum

an w

ith to

ols?

data

err

ors

Information Wall

Page 31: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 31

Relationships among search parameters

volume retrie

ved

volume available

100%

50%

0%space of methods

% tage actually relevant

perfect recall

v.relevantp= v.retrieved

v.relevantr = v.availablerecall

perfect precision

precision

Page 32: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 32

Large scale demands articulation

• Many source ontologies

• Diverse application domains– with their ontologies

Articulation = application-specific intersection of sources

– keeps and exploits connection to sources

– many articulations are possible

– articulation results are small ( not )

– reliable articulations require consistent sources

– modest amount of knowledge needed

SKC project:Scalable

KnowledgeComposition

Page 33: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 33

Sample Intersections .

Shoe Store• Shoes { . . . }• Customers { . . . }• Employees { . . . }

size = size color =table(colcode)

style = style

Ana-tomy {. . . }

• Material inventory {...}• Employees { . . . }• Machinery { . . . }• Processes { . . . }• Shoes { . . . }

Shoe Factory

Hard-ware

Articulation ontology

matching rules :

foot = foot Employees Employees Nail (toe, foot) Nail (fastener). . . . . .

Department Store

Page 34: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 34

System setting: mediation

data and simulation resources

value-added services

decision-making support applicationsApplication Layer

Mediation Layer

Foundation Layer

Page 35: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 35

Sample Operation: INTERSECTION

Source Domain 1:Owned and maintained by Store

Result contains shared terms,useful for purchasing

Source Domain 2:Owned and maintainedby Factory

Articulation

Page 36: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 36

Sample Processing in HPKB

• What is the most recent year an OPEC member nation was on the UN security council?– Related to DARPA HPKB

Challenge Problem– SKC resolves 3 Sources

CIA Factbook ‘96 (nation) OPEC (members, dates) UN (SC members, years)

– SKC obtains the Correct Answer

1996 (Indonesia)

– Other groups obtained more,

but factually wrong answers

– Problems resolved by SKC* Factbook has out of date

OPEC & UN SC lists Indonesia not listed Gabon (left OPEC 1994)

* different country names Gambia => The Gambia

* historical country names Yugoslavia

UN lists future security council members

Gabon 1999 intent of original question

Temporal variants

Page 37: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 37

Many ontologies Interoperation

Have devolved maintenance onto domain-specific experts / authorities

• Many articulations

But applications often span multiple domains

• Can't make domains bigger and keep them consistent

• Need an algebra to compute composed domains, – whose ontologies are limited to

the terms defined by their articulations

SKCSKCSKCSKC

Page 38: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 38

An Ontology Algebra

A knowledge-based algebra for ontologies

The Articulation Ontology (AO) consists of matching rules that link domain ontologies

Intersection create a subset ontology keep sharable entries

Union create a joint ontology merge entries

Difference create a distinct ontology remove shared entries

Page 39: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 39

Other Basic Operations

typically priorintersections

UNION: mergingentire ontologies

DIFFERENCE: materialfully under local control

Arti-culation ontology

Page 40: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 40

Features of an algebra

Operations can be composed

Operations can be rearranged

Alternate arrangements can be evaluated

Optimization is enabled

The record of past operations can be

kept and reused when sources change

Page 41: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 41

INTERSECTION support

Store Ontology

Articulation ontology

Matching rules that use terms from the 2 source domains

Factory Ontology

Terms usefulfor purchasing

Page 42: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 42

Knowledge CompositionArticulationknowledgefor U

U

U

(A B)U

(B C)U

(C E)

Knowledge resource

B

Knowledge resource

A

Knowledge resource

C Knowledge

resourceD

U

(C D)

U

(B C)

Articulationknowledge

Composed knowledge forapplications using A,B,C,E

Knowledge resource

E

U

(C E)

Legend:

U : unionU: intersection

Articulationknowledgefor (A B)

U

Page 43: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 43

Tools to create articulations

Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.)

Veh

icle

reg

istr

atio

n

on

tolo

gy

Veh

icle

sale

s o

nto

log

y

more

Page 44: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 44

1. Initialize by word matching

Graph matcherforArticulation- creatingExpert

Vehicle ontology

Transport ontology

Suggestionsfor articulations

Page 45: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 45

2. continue from initial point

Also suggest similar terms for further articulation:

• by spelling similarity,• by graph position• by term match repository

Expert response:1. Okay2. False3. Irrelevant to this articulation

All results are recorded

Okay’s are converted into articulation rules

Page 46: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 46

3. A Term Net to Propose Match Candidates

Term linkages automatically extracted from 1912 Webster’s dictionary *

* free, other sources .have been processed.

Based on processing headwords definitions using algebra primitives

Notice presence of 2 domains: chemistry, transport

Page 47: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 47

4. Using the term net

Page 48: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 48

5. Navigating the term net

Page 49: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 49

Primitive Operations

Unary• Summarize -- structure up• Glossarize - list terms• Filter - reduce instances• Extract - circumscription

Binary • Match - data corrobaration• Difference - distance

measure• Intersect - schem discovery• Blend - schema extension

Constructors• create object• create setConnectors• match object• match setEditors• insert value• edit value• move value• delete valueConverters• object - value• object indirection• reference indirection

Model and Instance

Page 50: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 50

Matching rules (in process)

• A equals U• A equals {u, … }• A equals to Union (U, V) • A equals to Union (U, v ...) • A equals to Intersection (U, W) • A equals to Intersection (U, w ...) • A equals to Difference (U, X) • A equals to Difference (U, x ...)

must obey algebraic

properties

Page 51: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 51

Future: exploiting the result

Processing & query evaluation is best performed within Source

Domains & by their engines

Result has linksto source

Avoid n2 problem of interpretermapping as stated by Swartout as an issue in HPKB year 1

Page 52: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 52

Innovation in SKC

• No need to harmonize full ontologies• Focus on what is critical for interoperation• Rules specific for articulation• Potentially many sets of articulation rules

• Maintenance is distributed– to n sources– to m articulation agents

is m < n2 , depends on architecture density a research question

Page 53: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 53

Application• Informal, pragmatic

• Client-control

• Use up to 72 mediators

Mediation• Formal, reliable service

• Domain-Expert control

• Use up to 72 sources

Integration at higher Levels

Page 54: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 54

Domain Specialization .

• Knowledge Acquisition (20% effort) &• Knowledge Maintenance (80% effort *)

to be performed• Domain specialists• Professional organizations• Field teams of modest size

Empowermentautonomouslymaintainable

* based on experience with software

Page 55: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 55

Summary

To sustain the growth of web usage1. The value of the results has to keep increasing

precision, relevance not volume2. Value is provided by mediating experts,

encoded as models of diverse resources, diverse customersProblems being addressed redundancy mismatches quality maintenance

Need to develop flexible tools for these tasks

} Clear models

Page 56: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 56

Integration Science ?

IntegrationScience

IntegrationScience

ArtificialIntelligence

knowledge mgmtlinguistic models

uncertainty

ArtificialIntelligence

knowledge mgmtlinguistic models

uncertainty

Systems Engineering

analysisdocumentation

costing

Systems Engineering

analysisdocumentation

costing

Databasesaccessstoragealgebras

scalability

Databasesaccessstoragealgebras

scalability

Page 57: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 57

E-commerce

• Gartner: 2000 prediction for 2004: 7.3 T$

• Revision:2001 prediction for 2004: 5.9 T$ drastic loss?

Growth and Perception

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...

Realist

ic growth

Perceived growth

Invisible growth

Extrapolated growth

Disap- pointment Combi-

natorial growth

Perceived initial growthPerception level

ExamplesArtificial IntelligenceDatabasesNeural networksE-commerce

50 companies, each after 20% of the market

Failures

Page 58: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 58

E-Commerce Trends 1998/1999/ ...

• Users of the Internet 40% 52% of U.S. population

• Growth of Net Sites (now 2.2M public sites with 288M pages)• Expected growth in E-commerce by Internet users [BW, 6 Sep.1999]

segment 1998 1999– books 7.2% 16.0%– music & video 6.3% 16.4%– toys 3.1% 10.3%– travel 2.6% 4.0%– tickets 1.4% 4.2%– Overall 8.0% 33.0% = $9.5Billion

• Will change all retail businesses, when? An unstainable trend cannot be sustained [Herbert Stein]

new services

98 99 00 01 02 03 04 0.3 1 3 9 27 81 **

90 80 70 60 50 40 30 20 10

0

Year / %

%

Centroid, in 1999 ~1% of total market

E-penetration Toys

Page 59: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 59

Stanford

Some Groups at Stanford University• KSL: Feigenbaum, Fikes et al.• Logic: Genesereth, Koller, McCarthy, Nielsen et al.• Industrial: Petrie et al.• Medical Informatics: Musen et al.• Infolab: Garcia-Molina, Manning, Ullman, Wiederhold

www-db.stanford.edu/people/gio.html

The configurations are not fixed,

projects specifically can include a variety of combinations

Page 60: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 60

Final Panel

• Agreement with Colette Rolland and Arne Solvberg– consider the research question as a Hypothesis (Hx)

note that the job is to Convert a hypo (less than) -thesis to a thesis

– justify the Hx in terms of a real-world problem

• Computer Science is broadening into all areas of human endeavor– Most new reserachers will interact with other disciplines– Many new entrants will contribute competencies from other areas

• This means that we all will need to develop an understanding for the differences among these areas

Page 61: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 61

Differences among Sciences

Scientific incompatibilities sciences have often been characterized as

• differences in language, perhaps, mor formally• differences in Ontology, and • differences in problem decomposition

– alternate high-level objectives– divide-and-conquer paradigms

One person's (or discipline's) optimal approach is viewed by others as a bureaucracy to be fought

There are a more fundamental differences ..

Page 62: April 2002 Gio ArtInt 1 Information Interoperation versus Integration Gio Wiederhold Stanford University April 2002

April 2002 Gio ArtInt 62

Paradigms of Science

Disciplines use distinct scientific models1. Mathematical Sciences:

• Formal proofs, to hold in all cases. • A single exception invalidates the proof• Complexity is dealt with explicitly (big Oh)

2. Engineering sciences• recognition of alternative approaches to a goal• A thesis shows the best (cost-benefit) case in some range

3. Social Sciences• surveys can measure truth of a Hx to some degree of belief• having some exceptions does not invalidate the Hx

4. Biological Sciences• cases provide distinct examples and • prediction assumes a linear world

One should know in which paradigm one's research is performed