© Daniel S. Weld, PLANET 2003 Tutorial on Data Integration

Planning for the Web II: Execution & Service Integration
Dan Weld
University of Washington
June, 2003


Acknowledgements

• Oren Etzioni
• Yolanda Gil
• Keith Golden
• Alon Halevy
• Zack Ives
• Tal Shaked

Caveat

Outline

• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
    • Background
    • Representational issues
    • Planning algorithms
  Automated data analysis

Optimization and Execution

• Problem:
  Few and unreliable statistics about the data.
  Unexpected (possibly bursty) network transfer rates.
  Generally, an unpredictable environment.
• General solution (research area):
  Adaptive query processing.
  Interleave optimization and execution. As you get to know more about your data, you can improve your plan.

Adaptivity & Incremental Processing Query Performance

[Figure labels: User's Query; Query Translation; Query over Sources; Tukwila Network-Based Query Processor; Query Results]

Evaluated within the Tukwila system [Ives PhD]

Query Optimization: Model Query Plans' Execution & Choose the Best

[Figure: two candidate operator trees over the same three sources]
  Sources: Restock (R), 100 tuples; Orders (O), 50 tuples; Shipping (S), 90 tuples
  Plan joining R ⋈ O first: R ⋈ O ~30 tuples; R ⋈ O ⋈ S ~270 tuples; 50 sec
  Plan joining O ⋈ S first: O ⋈ S ~15 tuples; R ⋈ O ⋈ S ~270 tuples; 30 sec

From source sizes and statistics, estimate result sizes and costs.

Estimates and assumptions introduce error:
  Exponential increase in estimation error with each join [Ioannidis & Christodoulakis 91] [Antoshenkov 93, 96]
  Worse if no detailed statistics
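
To see why per-join errors compound, here is a minimal sketch under a deliberately simple assumption that is not taken from the cited papers: every join's selectivity estimate is off by the same relative factor, and the errors multiply. The function name and the 50% figure are illustrative only.

```python
def compounded_error_factor(relative_error: float, num_joins: int) -> float:
    """Factor by which a cardinality estimate may be off after num_joins joins.

    Assumption (illustrative): each join multiplies in the same relative error.
    """
    if num_joins < 0:
        raise ValueError("num_joins must be non-negative")
    return (1.0 + relative_error) ** num_joins


if __name__ == "__main__":
    # With a 50% error per join, the possible error grows geometrically.
    for joins in range(1, 5):
        print(joins, round(compounded_error_factor(0.5, joins), 3))
```

Under this toy model the growth is geometric in the number of joins, which is the qualitative point of the slide.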

Why Does Data Integration Make Optimization Harder?

Query optimization estimates costs using knowledge about environment and data:
  Data source sizes ("cardinalities")
    Often unavailable or not meaningful in data integration
  Histograms
    Too expensive to maintain in data integration
  I/O costs
    Network I/O costs fluctuate

Need a way to gain this sort of knowledge!

Some Solutions

1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling [Franklin et al.]
5. Eddies [Hellerstein et al.]

Tukwila Data Integration System

[Figure: architecture diagram. Component labels: Reformulator; Catalog (source mappings); Optimizer; (Re-)Optimizer; Mem Alloc-Fragmenter; Execution Engine; Temp Store; Event Handler; Query Operators. Data-flow labels: query; logical plan; exec plan; exec results; data; answer.]

Novel components:
  Event handler
  Optimization-execution loop
  Adaptive operators

Double Pipelined Join

Hybrid Hash Join
  No output until build relation read
  Asymmetric (build vs. probe): optimization requires source behavior knowledge

Double Pipelined Hash Join
  Outputs data immediately
  Symmetric: requires less source knowledge to optimize
  Threads overlap I/O, computation

Performance on Networked Data

[Figure: time (sec) vs. tuples output (1000s)]

Join of 3 tables sent via JDBC over 10Mb Ethernet: TPC-H Lineitem, Supplier, Order

Double Pipelined Join in Summary

Benefits:
  Easier to optimize (symmetric)
  Sub-operations scheduled flexibly
  Allows overlap of I/O and computation

Incurs some overhead:
  Threading, queues
  Required extensions to intelligently handle overflow:
    • Same hash function, number of buckets for each side
    • Approaches: flush buckets on left side or flush symmetrically
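
The symmetric idea can be sketched in a few lines. This is a single-threaded illustration, not Tukwila's implementation: it alternates between the two inputs instead of using threads and queues, and it ignores memory overflow entirely. The function and parameter names are made up for the example.

```python
from collections import defaultdict
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input


def symmetric_hash_join(left, right, left_key, right_key):
    """Yield (left_row, right_row) pairs as soon as both halves have arrived.

    Each arriving row is inserted into its own side's hash table and then
    probes the other side's table, so neither input is a blocking "build" side.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for l_row, r_row in zip_longest(left, right, fillvalue=_DONE):
        if l_row is not _DONE:
            key = left_key(l_row)
            left_table[key].append(l_row)
            for match in right_table.get(key, ()):
                yield (l_row, match)
        if r_row is not _DONE:
            key = right_key(r_row)
            right_table[key].append(r_row)
            for match in left_table.get(key, ()):
                yield (match, r_row)


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b")]
    shipping = [(2, "x"), (1, "y"), (1, "z")]
    for pair in symmetric_hash_join(orders, shipping, lambda r: r[0], lambda r: r[0]):
        print(pair)
```

Because every pair is emitted by whichever row arrives second, each matching pair appears exactly once, and the first results can appear before either input is fully read.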

Some Solutions

1. Adaptive operators
2. Mid-query reoptimization
   • Interleaved planning and execution
3. Convergent query processing
4. Query scrambling
5. Eddies

Mid-query reoptimization

[Figure: join plans over A, B, C, D, before and after the intermediate result AB is materialized]

Materialization point: write AB to disk

If actual statistics differ from predicted statistics, replan [Kabra & DeWitt]
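
The check made at a materialization point can be sketched as a simple comparison. This is a hedged illustration, not the rule from the cited work: the ratio test, the `tolerance` parameter, and its default of 2.0 are all assumptions chosen for the example.

```python
def should_replan(predicted_rows: int, actual_rows: int, tolerance: float = 2.0) -> bool:
    """Return True when the observed cardinality is far from the estimate.

    "Far" means off by more than `tolerance` in either direction
    (a hypothetical threshold, not taken from the source).
    """
    if predicted_rows <= 0 or actual_rows <= 0:
        # Degenerate case: any disagreement with an empty result triggers replanning.
        return predicted_rows != actual_rows
    ratio = max(predicted_rows, actual_rows) / min(predicted_rows, actual_rows)
    return ratio > tolerance


if __name__ == "__main__":
    print(should_replan(predicted_rows=30, actual_rows=300))
    print(should_replan(predicted_rows=30, actual_rows=33))
```

A real optimizer would also weigh the cost of replanning against the expected savings on the remaining plan; that trade-off is omitted here.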

Some Solutions

1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling
5. Eddies

Convergent Query Processing

• Instead of adapting the remainder of the plan
  after executing all data on a plan prefix
• Adapt the whole plan
  after executing the whole plan on part of the data
• Can better gather information this way…

Convergent Query Processing in Action: Changing Join Plans in Mid-Stream

Join Restock, Orders, Shipping (R ⋈ O ⋈ S)

[Figure: each phase joins its own subsets of the three tables with its own plan. Labels: R0, O0, S0 with intermediate result R0 ⋈ S0; R1, O1, S1 with intermediate result O1 ⋈ S1; R2, O2, S2 with intermediate result R2 ⋈ O2; a "cleanup" query plan completes R ⋈ O ⋈ S.]

Breaking a Join into Phases: One Subset per Table, Each Phase

[Figure: Restock (R) and Orders (O) split into subsets. Phase 0: R0, O0. Phase 1: R1, O1. Cleanup phase: labels O1, O0.]

$$T^1 \bowtie \cdots \bowtie T^m = \bigcup_{1 \le c_1 \le n, \ldots, 1 \le c_m \le n} \left( T^1_{c_1} \bowtie \cdots \bowtie T^m_{c_m} \right)$$

The Cleanup Plan Reuses Previous Work Where Possible

[Figure: Restock, Orders, Shipping partitioned into R0, R1, R2; O0, O1, O2; S0, S1, S2. Intermediate results from earlier phases (R0 ⋈ S0, O1 ⋈ S1, R2 ⋈ O2) feed the cleanup plan.]
  Exclude R2 ⋈ O2
  Exclude R0 ⋈ S0 ⋈ O0, R1 ⋈ S1 ⋈ O1, R2 ⋈ S2 ⋈ O2
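
The decomposition behind the phases can be checked on toy data. The sketch below uses two tables rather than three and made-up rows; it only demonstrates the set identity (same-phase results plus cross-phase "cleanup" results equal the full join), not how the cleanup plan reuses intermediate results.

```python
from itertools import product


def join(r_rows, o_rows):
    """Equi-join on the first field of each row, returned as a set of pairs."""
    return {(r, o) for r in r_rows for o in o_rows if r[0] == o[0]}


def phased_join(r_parts, o_parts):
    """Split the join of partitioned inputs into same-phase and cleanup results."""
    same_phase = set()
    for r_part, o_part in zip(r_parts, o_parts):
        same_phase |= join(r_part, o_part)
    cleanup = set()
    for (i, r_part), (j, o_part) in product(enumerate(r_parts), enumerate(o_parts)):
        if i != j:  # combinations that no single phase covered
            cleanup |= join(r_part, o_part)
    return same_phase, cleanup


if __name__ == "__main__":
    restock = [(1, "r1"), (2, "r2"), (3, "r3")]
    orders = [(1, "o1"), (3, "o3"), (2, "o2")]
    same, rest = phased_join([restock[:2], restock[2:]], [orders[:1], orders[1:]])
    print(sorted(same), sorted(rest))
```

Excluding the combinations where all subsets come from the same phase is what keeps the cleanup plan from producing duplicate answers.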

CQP on a 100Mbps LAN: Nearly "Optimal" Performance

866MHz P-III, 256MB buffer pool, re-optimization every 10sec

[Figure: performance chart; annotation: cost to parse XML]


Slow WAN, Faster CPU: CQP Reduces Work

1GHz P-III, 256MB, re-optimization every 10sec. 1Mbps network, RTT ~50msec

Outline

• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
    • Background
    • Representational issues
    • Planning algorithms
  Automated data analysis

What is a Web Service

• A web service is a network-accessible interface to application functionality, built using standard Internet protocols (TCP/IP, XML, SOAP, WSDL…)
  Clients of a web service do NOT need to know how it is implemented.
• Why interesting?
  Increased automation

[Figure labels: Application client; Network; Web Service; Application code]

Case Study: Amazon

• Services Exported
  Product details (short, long, images, samples)
  Purchase functionality
  Ratings, reviews, collaborative filtering data, lists, …
• Examples
  Store builder tools
  Amazon Browser – visualization tool
  Windows desktop interfaces – drag-n-drop…
  MP3 Piranha
  Games
  Automatic review writer??

Case Study: Google

• Services Exported
  Search interface
  Limits on items returned, queries / day
• Examples
  Metacrawler functionality
  Geosearch 'nearby thai restaurants'
    • TIGER, FIPS -> lat, long of pages
  Robust hyperlinks
    • Creates a signature for destination pages & tracks with query

Case Study: Fed Express

• Shipment tracking
• Proof of delivery
• Invoice reviewed, adjusted, settled
• Schedule pickup time, location
  Outgoing or returns
• Order supplies (airbills, envelopes, boxes)
• Review shipping history
• Rate requests
  Location, package size
• International trade
  Required documents, duties, taxes

Case Study: Hailstorm / MyServices

• Web Services
  MyDocuments
  MyAddressbook
  MyWallet
  MyNotifications ….
• Scenario
  Wallet keeps receipts, arranges product return
  Expedia uses notifications to warn of canceled flight
• Reality
  Ebay, AmEx, Groove, …

Case Study: OAA

• Common schema for travel industry
• Reservations
  Flights, trains, rental cars, hotels
• Time & distances
• Payment, deposits, vouchers
• Vacation Packages

Web Service Technology Stack

Layers (top to bottom): Discovery, Description, Packaging, Transport, Network

[Figure labels: Web Service Client; UDDI ("shopping web service?", WSDL URIs); Proxy; WSDL; SOAP pkg request; SOAP pkg response; Web Service]

SOAP (Simple Object Access Protocol)

• SOAP Messages
  XML Payload
• Using SOAP as RPC (Remote Procedure Call) Messages

[Figure: SOAP client sends a request message to the SOAP server, which returns a response message]

If a WS were a Phone Call…

• XML represents the conversation,
• SOAP describes the rules for how to call someone,
• UDDI is the phone book,
• WSDL describes what the phone call is about and how you can participate.

WSDL for int foo(int arg);

<types>
  <schema targetNamespace="http://tempuri.org/xsd"
          xmlns="http://www.w3.org/2001/XMLSchema"
          xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
          xmlns:wsdl="http://schemas...l/"
          elementFormDefault="qualified">
  </schema>
</types>
<message name="Simple.foo">
  <part name="arg" type="xsd:int"/>
</message>
<message name="Simple.fooResponse">
  <part name="result" type="xsd:int"/>
</message>
<portType name="SimplePortType">
  <operation name="foo" parameterOrder="arg">
    <input message="wsdlns:Simple.foo"/>
    <output message="wsdlns:Simple.fooResponse"/>
  </operation>
</portType>
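
To connect the description to a message on the wire, the sketch below builds and parses a SOAP 1.1 request body for foo using only Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `http://tempuri.org/message/` and the helper names are assumptions for illustration, since the slide does not show the binding.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://tempuri.org/message/"  # hypothetical namespace for the foo call


def build_foo_request(arg: int) -> bytes:
    """Serialize an RPC-style request for `int foo(int arg)`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}foo")
    ET.SubElement(call, "arg").text = str(arg)
    return ET.tostring(envelope, encoding="utf-8")


def parse_foo_request(payload: bytes) -> int:
    """Extract the integer argument from a request built as above."""
    root = ET.fromstring(payload)
    arg = root.find(f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}foo/arg")
    if arg is None or arg.text is None:
        raise ValueError("request does not contain foo/arg")
    return int(arg.text)


if __name__ == "__main__":
    message = build_foo_request(42)
    print(message.decode("utf-8"))
    print(parse_foo_request(message))
```

A real client would normally generate this plumbing from the WSDL rather than write it by hand, which is the role of the proxy in the technology stack.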

DISCO

• If you know the URL for a service
• DISCO lets you query it
• And get back a WSDL description
• But what if you don't know the right URL?

UDDI

• Hosted Registries
  Microsoft, IBM, HP, SAP, NTT, BEA
• Entries defined with
  Business information
    • Name, contacts, descriptions, identifier, yellow pages category
  Service information
    • Entities, each of which describes a family of related services which together implement a business process
  Binding information
    • How to invoke: URI, required parameters, options, & Tmodel
  Service specifications (Tmodel)
    • As a symbol – fingerprint to recognize a known service
    • Decomposable to find WSDL description

Acronyms (W3C, MSFT, IBM)

• UDDI
  Discover, describe, register services
  SOAP-based service for locating WSDL-formatted service descriptions
• DISCO
  Discover / retrieve SCL+SDL descriptions
• SDL / NASSL
  SOAP description language – get params / types
• SCL
  SOAP contract language – extends SDL – orchestration of messages
• WSDL
  Describe abstract interface and protocol bindings of arbitrary network services (extends SCL)
• XLANG / WSFL / BPEL4WS
  Language for business processes used in BizTalk
  Business process execution language for web services
    • MSFT, IBM, BEA proposal

[Figure labels: NASSL, SCL, SDL, WSDL, WSFL, XLANG, BPEL4WS]


The Layer Cake [TBL,XML2000]

RDF (Resource Description Framework)

• Way to describe resources via metadata
• Makes no assumptions about a particular application domain
• Based on XML
• Another one?
• Standard for semantic web
• Restricts resource descriptions to triples (subject, predicate, object)
• Provides a lightweight ontology system
  Subproperty, Subclass, Domain & Range
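
A small sketch may make the triple model and the subclass part of the "lightweight ontology system" concrete. It stores triples as plain Python tuples and applies one RDFS-style rule (an instance of a class is also an instance of its superclasses). The `ex:` names are invented for the example, and no RDF library is used.

```python
def infer_types(triples):
    """Close a set of (subject, predicate, object) triples under the rule:
    (x rdf:type C) and (C rdfs:subClassOf D) imply (x rdf:type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        for s, p, o in list(facts):
            if p != "rdf:type":
                continue
            for sub, sup in subclass_pairs:
                inferred = (s, "rdf:type", sup)
                if sub == o and inferred not in facts:
                    facts.add(inferred)
                    changed = True
    return facts


if __name__ == "__main__":
    triples = {
        ("ex:tukwila", "rdf:type", "ex:QueryProcessor"),
        ("ex:QueryProcessor", "rdfs:subClassOf", "ex:Software"),
        ("ex:Software", "rdfs:subClassOf", "ex:Artifact"),
    }
    for fact in sorted(infer_types(triples)):
        print(fact)
```

Subproperty, domain, and range can be handled with rules of the same shape; they are left out to keep the sketch short.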

DAML+OIL (www.daml.org)

• DAML extends RDF and RDFS with richer modeling primitives.
  disjointWith, intersectionOf, oneOf, cardinality
• Able to provide properties of properties
  uniqueness, transitivity, etc.

DAML-S

• DAML+OIL ontology describing Web Services
• Complements low-level descriptions like WSDL (mapping to WSDL)
  Describes what a service does and why,
  Not just how to communicate with it.
• Goals: Discovery, Invocation, Composition, Verification, Execution Monitoring

Outline

• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
    • Background
    • Representational issues
    • Planning algorithms
  Automated data analysis

Partial Survey of Planners

• UW Internet Softbot
  Planners: SENSp / XII / PUCCINI
  Repr. languages: UWL / SADL; LCW
• PKS
  Planning at the knowledge level
• McDermott
  Forward-chaining search w/ GRG guidance
• McIlraith et al.
  ConGolog (procs, loops, conditionals, w/ nondet)
• Papazoglou, Traverso et al.
  Stratified service arch; XSRL language; MBP
• Finin; Srivastava; Knoblock; Ambite; Nau…

Planning for image processing tasks

[Figure: data-flow pipeline. Column headings: Inputs, Filters, Models, Visualization. Node labels: MODIS FPAR; MODIS LAI; RUC2; GOES Radiation; Soil; Topography; Mosaic; Re-project; Composite (8-day); WGRIB bin; GRIB Statistics; Daily LAZEA; Land Surface Models; Phenology; NPP; Mean wind; Mean Precip.; Min, Max Temp; FPAR; LAI; Stream flow; Snow cover; Soil Moisture; False Color; Drill-down]

• Many fielded systems
  Lansky's COLLAGE, Chien et al. MVP/ASIP,
  Golden ADLIM, Blythe GRID…
• Spatial representations important

Motivating Scenarios

• Planning a trip
  Yahoo maps -> driving time -> travel prefs
  Automatic expense form filing
• Purchasing a group of items
  Aggregation from multiple vendors
  Select for: payment types, stock level, deliv
  Local & 3rd party reputation services (BBB)
• Monitoring marketplace
  Auction sites
  Events (check calendar / notification service)

UW Internet Softbot

• Software robot
• Effectors: mv, ftp, chmod, cd, lpr, rm, ...
• Sensors: ls, finger, INSPEC, netfind, wc, ...
• Say what we want, not how to do it
  Find phone numbers, fetch/print online papers, …
• Integrate multiple resources

Motivation/Contributions

• Represent actions like ls, finger
• Represent goals such as
  "Rename paper.tex to kr.tex"
  "Print all files in directory papers."
  (even with incomplete information)
• No previous system could express these

The Middle Ground

1. Action Representation

                   Tractability                  Expressiveness
  Complete info    STRIPS          ADL           Situation Calculus
  Incomplete info  UWL                           Moore et al.

2. Knowledge Representation

                   Tractability                        Expressiveness
  Complete info    Closed World Assumption (CWA)
  Incomplete info  OWA                                 Circumscription

Softbot Architecture

[Figure labels: Task Manager; PUCCINI Planner; SADL Actions; LCW Knowledge; Sensors; Effectors; UNIX shell & WWW]

SADL Family Tree

[Figure: family tree]
  STRIPS [Fikes & Nilsson, 71]
    UWL [Etzioni et al, 92]: incomplete info, noise-free sensors
    ADL [Pednault, 89]: conditional effects
      SADL [Golden & Weld, 96]: represents ls, "Rename", finger...

SADL/UWL Annotations

Goal annotations:
  satisfy = achieve by any means
  hands-off = don't change (maintenance)
Effect annotations:
  cause = change world
  observe = change agent's knowledge

"Delete the file named junk"
  satisfy (name (ƒ, junk)) ∧ satisfy (deleted (ƒ))

Information Goals are Temporal

• Two time points
  When proposition sampled
  When reply given
• "Tell me now who was President in 1883"
• "Tell me tomorrow who is President now"
• "Identify (ASAP) the file now named `junk'"

Information Goals are Temporal

"Rename paper.tex to kr.tex"
  designator (name) changes
  UWL can't express

SADL solution
  initially = time goal was posed

  initially (name (ƒ, paper.tex)) ∧ satisfy (name (ƒ, kr.tex))
  initially (name (ƒ, core)) ∧ satisfy (deleted (ƒ))

Compare to more general temporal representation

Tidiness Goals

"Print paper, but don't leave it uncompressed."

  initially (compressed (paper), tv) ∧ satisfy (printed (paper)) ∧ satisfy (compressed (paper), tv)

State of paper.ps may change temporarily but must be restored

Compare to more general goal lang, e.g. LTL

Unbounded Information Gain

action ls (d)
  precondition: satisfy (current.shell (csh)) ∧ satisfy (readable (d))
  effect: ∀ f when in.dir (f, d)
            ∃ l, n, d
              observe (length (f, l)) ∧
              observe (name (f, n)) ∧
              observe (in.dir (f, d))

Compare PKS Representation

Initial State:
  Kf = {(= (pwd) root), (indir papers root), (indir planner root), (dir root), (dir papers), (dir planner), (file paper_tex)}
  Kx = {((indir paper_tex planner) | (indir paper_tex papers))}
Goal:
  K(indir paper_tex (pwd))

The Internet Softbot

[Figure labels: Task Manager; PUCCINI Planner; SADL Actions; LCW Knowledge; Sensors; Effectors; UNIX shell & WWW]

Knowledge Representation

• Closed World Assumption (CWA)
  Made by classical planners
  Anything not recorded as true is false
• Open World Assumption (OWA)
  Anything not recorded true or false is unknown
  Sensor abuse
  Can't handle universally quantified (∀) goals

Sensor Abuse

• OWA: Don't know when to stop sensing
  Many ways to find same information
  Many plans containing same action
• After executing find / -name foo, should know
  ls bin won't reveal more files named foo
  ls tex won't reveal more files named foo
  Google may reveal more files named foo

How Classical Planners Handle ∀

∀ block (x) OnTable (x) is replaced with: OnTable (A) ∧ OnTable (B) ∧ OnTable (C)

• Relies on CWA
  Must know all blocks
  Under OWA, can never be sure

[Figure: blocks A, B, C]

Local Closed World Knowledge

• Complete info over restricted domain
  All blocks on table, all products at Amazon
• Local Closed World Knowledge (LCW)
  Restricted form of circumscription
  Provides fast closed world inference
  Allows fast updates
  Suited to planner action representations.

LCW Semantics

"I know all files in directory bin"
  LCW(in.dir(f, bin))

LCW(in.dir(f, bin)) ≡ ∀f ([M ⊨ in.dir(f, bin)] ∨ [M ⊨ ¬in.dir(f, bin)])

LCW Representation

• M: Ground literals in agent's model
  in.dir(icaps03, papers)
  in.dir(junk, papers)
  executable(core)
• L: LCW formulas in agent's model
  LCW(in.dir(f, papers))
• If P ∉ M, and L ⊢ LCW(P), then ¬P
  Conclude: ¬in.dir(foo, papers)
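
The inference rule above amounts to a three-valued lookup. The following sketch is a simplification made for illustration: the model holds only positive ground literals as tuples, an LCW formula is a single-literal pattern with `None` standing for a variable, and conjunctive LCW formulas are not handled. All names are hypothetical.

```python
from typing import Optional


def matches(pattern, fact) -> bool:
    """True if the ground fact is an instance of the pattern (None = variable)."""
    return len(pattern) == len(fact) and all(
        p is None or p == v for p, v in zip(pattern, fact)
    )


def query(fact, model, lcw_patterns) -> Optional[bool]:
    """Return True, False, or None (unknown) for a ground fact.

    True:  the fact is recorded in the model M.
    False: the fact is absent from M but covered by some LCW formula in L.
    None:  the fact is absent and no LCW formula covers it (open world).
    """
    if fact in model:
        return True
    if any(matches(pattern, fact) for pattern in lcw_patterns):
        return False
    return None


if __name__ == "__main__":
    M = {("in.dir", "icaps03", "papers"), ("in.dir", "junk", "papers"), ("executable", "core")}
    L = [("in.dir", None, "papers")]  # "I know all files in directory papers"
    print(query(("in.dir", "foo", "papers"), M, L))
    print(query(("in.dir", "foo", "bin"), M, L))
```

The second query stays unknown because nothing in L covers directory bin, which is the case where a sensing action is still worth planning.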

LCW Reasoning

• Inference
  If I know all files in tex, and I know the size of every file, then do I know the size of every file in tex?
• Updates
  If I know the size of every file in tex, and I remove a file from tex, do I still know the size of every file in tex?
  What if I add a file to tex?

LCW Reasoning is Hard

Theorem: If LCW formulas can contain and then answering an LCW query is NP-hard.

But we need fast inference!

• Solution: restrict representation
• Positive first-order conjunctions
• Fast polynomial time inference/updates

[Etzioni et al. AIJ] [Levy VLDB96] [Friedman & Weld IJCAI97]

LCW Updates

• L must be updated when M changes.
• All changes to M fall into one of four categories:
  Information loss: Δ(φ, T ∨ F → U)
  Information gain: Δ(φ, U → T ∨ F)
  Domain growth: Δ(φ, F → T)
  Domain contraction: Δ(φ, T → F)
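
The four categories are determined entirely by a literal's old and new truth values, which a small classifier makes explicit. The string encoding of truth values ("T", "F", "U") and the function name are choices made for this sketch.

```python
TRUTH_VALUES = {"T", "F", "U"}  # true, false, unknown


def classify_update(old: str, new: str) -> str:
    """Name the kind of change when a literal's truth value goes from old to new."""
    if old not in TRUTH_VALUES or new not in TRUTH_VALUES:
        raise ValueError("truth values must be 'T', 'F', or 'U'")
    if old == new:
        return "no change"
    if new == "U":
        return "information loss"    # T or F becomes unknown
    if old == "U":
        return "information gain"    # unknown becomes T or F
    if old == "F":
        return "domain growth"       # F becomes T
    return "domain contraction"      # T becomes F


if __name__ == "__main__":
    for old, new in [("T", "U"), ("U", "F"), ("F", "T"), ("T", "F")]:
        print(old, "->", new, ":", classify_update(old, new))
```

Each category then calls for a different update to L, since only some changes can invalidate existing LCW formulas.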


Domain Growth

Adding core to bin invalidates LCW(in.dir(f, bin) ∧ size(f, c)) unless the size of core is known!

Theorem: If Δ(φ, F → T) then L’ ← L − MREL(φ)

MREL(φ) = {Φ ∈ REL(φ) | ⊬ LCW(Φ − X)θ}
REL(φ) = {Φ ∈ L | ∃(X ∈ Φ, θ, α): Xθ = φα ∧ ⊬ ¬(Φ − X)θ}



LCW Updates

• Information loss, {T, F} → U: L’ ← L − REL(φ) (e.g. compress)
• Information gain, U → {T, F}: L’ ← L ∪ LCW(φ) (e.g. ls, wc)
• Domain growth, F → T: L’ ← L − MREL(φ) (e.g. cp)
• Domain contraction, T → F: L’ ← L (e.g. rm)
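The four update rules can be sketched as a single Python function. This is a simplified illustration under stated assumptions, not the algorithm from the cited papers: literals are tuples of strings, arguments starting with "?" are variables, an LCW formula is a tuple of literals read as a conjunction, and the entailment test `lcw_known` only checks whether each remaining conjunct is an instance of a one-literal formula already in L. All function names are hypothetical.

```python
def unify(pattern, target):
    """Bind the variables of pattern so that it equals target, else None."""
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    theta = {}
    for p, t in zip(pattern[1:], target[1:]):
        if p.startswith("?"):
            if theta.setdefault(p, t) != t:   # repeated variable must agree
                return None
        elif p != t:
            return None
    return theta

def substitute(literal, theta):
    """Apply the substitution theta to the arguments of a literal."""
    return (literal[0],) + tuple(theta.get(a, a) for a in literal[1:])

def relevant(phi, L):
    """Yield (formula, rest): formula has a conjunct X with X.theta == phi,
    and rest is the remaining conjuncts with theta applied."""
    for formula in L:
        for i, x in enumerate(formula):
            theta = unify(x, phi)
            if theta is not None:
                rest = tuple(substitute(lit, theta)
                             for j, lit in enumerate(formula) if j != i)
                yield formula, rest

def lcw_known(conjuncts, L):
    """Simplified test: each conjunct is an instance of a one-literal formula."""
    return all(any(len(f) == 1 and unify(f[0], lit) is not None for f in L)
               for lit in conjuncts)

def update(L, phi, change):
    """Return L' after literal phi changes truth value in the given way."""
    if change == "information loss":      # {T, F} -> U: drop REL(phi)
        return L - {f for f, _ in relevant(phi, L)}
    if change == "information gain":      # U -> {T, F}: add LCW(phi)
        return L | {(phi,)}
    if change == "domain growth":         # F -> T: drop MREL(phi)
        return L - {f for f, rest in relevant(phi, L)
                    if not lcw_known(rest, L)}
    if change == "domain contraction":    # T -> F: L is unchanged
        return set(L)
    raise ValueError(change)

# Adding core to bin, with and without knowing the size of core
L = {(("in.dir", "?f", "bin"),),
     (("in.dir", "?f", "bin"), ("size", "?f", "?c"))}
grown = update(L, ("in.dir", "core", "bin"), "domain growth")
L2 = L | {(("size", "core", "?c"),)}
grown2 = update(L2, ("in.dir", "core", "bin"), "domain growth")
```

In this sketch `grown` keeps only LCW(in.dir(f, bin)), while `grown2` keeps every formula because LCW on the size of core is already present.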


Pruning Redundant Sensing

[Plot: time (CPU seconds) versus experience (problems attempted)]


The Internet Softbot

[Architecture diagram. Labels: Sensors, Task Manager, Effectors, UNIX shell & WWW, SADL Actions, PUCCINI Planner, LCW Knowledge]


XII / Puccini Planner

• Based on UCPOP
  Generative, Partial-Order, Causal-Link
  I.e. much like Gerevini’s LPG
• Efficient sensing (LCW control)
• Lifted support of goals

[Golden et al. 94, Golden PhD]


Satisfying Goals

• Link Directly to Effect
  rm *: ∀f Satisfy(Deleted(f))
• Subgoal on LCW; Then Expand to Ground Form
  ls: LCW; lpr foo, lpr bar: ∀f Satisfy(Printed(f))
• Partition
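The second strategy, subgoaling on LCW and then expanding to ground form, can be sketched as follows. This is a minimal illustration, not PUCCINI's procedure: the function name, the tuple encoding of literals, the `lcw_dirs` set, and the directory name /tex are assumptions made for the example.

```python
def expand_universal_goal(goal, directory, model, lcw_dirs):
    """Ground the goal 'for every file f in directory: goal(f)'.

    Returns the list of ground goals, or None when LCW on the directory
    contents is missing and must be achieved first (for example by ls).
    """
    if directory not in lcw_dirs:
        return None
    files = sorted(lit[1] for lit in model
                   if len(lit) == 3 and lit[0] == "in.dir" and lit[2] == directory)
    return [(goal, f) for f in files]

# Model after sensing: two files are known to be in the directory
model = {("in.dir", "foo", "/tex"), ("in.dir", "bar", "/tex")}

# Without LCW the planner cannot enumerate the files safely
print(expand_universal_goal("Printed", "/tex", model, lcw_dirs=set()))
# → None

# With LCW (e.g. after ls) the universal goal becomes two ground goals
print(expand_universal_goal("Printed", "/tex", model, lcw_dirs={"/tex"}))
# → [('Printed', 'bar'), ('Printed', 'foo')]
```

Each ground goal can then be supported by an ordinary action such as lpr foo or lpr bar.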


Threats to LCW

• Link from ls -l /tex to goal: LCW(in.dir(f, /tex) & size(f, l))
• compress /tex/paper: cause(length(paper), U)
  Threat = “Information Loss”
  Promote, Demote, Confront, Shrink
• mv junk /tex/: cause(in.dir(junk, /tex), T)
  Threat = “Domain Growth”
  Promote, Demote, Confront, Shrink, Enlarge


Softbot Status

• Fully Implemented (1997)
• Hundreds of Unix, Internet Actions
• Daunting Combinatorics
  Declarative Search Control
  Laborious, Brittle
• Hence...
  ? Improved Declarative Control
  ? Reactive Control
  ? Less Expressive Language

[Diagram labels: Rodney, Simon, Info Manifold, MetaCrawler, BargainFinder, ShopBot, Occam, ILA, SIMS, Ahoy]


PG-based Heuristics / Sensing

[Figure: planning graph over levels 0, 1, 2. Node labels: (own PlayGo), (subject PlayGo go), (subject MySystem chess), (search amazon chess), (atStore *b amazon), (subject *b chess), (LCW((atStore !b amazon)(subject !b chess))), (trade PlayGo *b amazon), (order MySystem amazon), (not (own PlayGo)), (own *b), (atStore MySystem amazon). Two nodes are marked “?”.]

[Shaked03]


Using the Graph

• LPG-like search (local search on POP)
• Propagating sensing action links
• Executing to reach ‘better’ states
• Sophisticated heuristics!


Conclusion

• Planning for the web is ripe for progress
• Data integration
  Modeling sources: GAV, LAV, …
  Answering queries using views
  Interleaved planning and execution, eddies, CQP
• Service integration
  Web service composition
  Representing unbounded information gain
  Latest heuristic search techniques => fast!


PKS

• Contingent, forward-chaining planner
  Constructs a complete, correct plan
  Separates plan-time and execution-time effects
• Less Expressive
  No universal quantification
• Still needs search control heuristics

[Petrick & Bacchus KR00, AIPS02]