

Page 1: Recording Actor Provenance in Scientific Workflows Ian Wootten, Shrija Rajbhandari, Omer Rana I.M.Wootten@cs.cf.ac.uk Cardiff University, UK

Recording Actor Provenance in Scientific Workflows

Ian Wootten, Shrija Rajbhandari, Omer Rana
I.M.Wootten@cs.cf.ac.uk

Cardiff University, UK

Page 2

What?

Provenance is concerned with process
  This may or may not be documented

Data Provenance - The process which leads to a particular piece of data

Actor Provenance - The process which leads to a particular actor state
  How an actor (client or service) arrived at a particular state during an interaction (for stateless actors)

Page 3

What? Actor Provenance

[Figure: Enactment Engine and Service, with assertion points labelled A1, A2, B1 and B2]

Interaction Assertions: Asserting the contents of a message by an actor sending or receiving it.

Actor State Assertions: Asserting the state of an actor at a particular time during an interaction.

Page 4

Metrics for Actor State Assertion

Static
  No variation in value over actor lifetime
  Per Node - Node identity, Operating system
  Per Actor - Actor identity, Name, Owner, Version

Dynamic
  Variation in value over actor lifetime
  Per Node - Memory usage, Network traffic
  Per Actor - Execution Time, Availability

Instrumented
  Actor is ‘Instrumented’ at key points in its execution
  Description of internal data flow
  E.g. German Aerospace Center (DLR)
    Completion states for action events and file transfers
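
As an illustration only, the three metric categories above could be carried in a single actor state assertion record. The sketch below is not the authors' representation: the class names, field names and sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class MetricKind(Enum):
    STATIC = "static"              # no variation over the actor lifetime
    DYNAMIC = "dynamic"            # varies over the actor lifetime
    INSTRUMENTED = "instrumented"  # emitted at key points inside the actor


class Scope(Enum):
    NODE = "node"    # describes the hosting machine
    ACTOR = "actor"  # describes the client or service itself


@dataclass(frozen=True)
class Metric:
    name: str
    kind: MetricKind
    scope: Scope
    value: object


@dataclass
class ActorStateAssertion:
    # Hypothetical record layout: who asserted, in which interaction, and when.
    actor_id: str
    interaction_id: str
    timestamp: float = field(default_factory=time.time)
    metrics: list = field(default_factory=list)

    def of_kind(self, kind):
        """Return only the metrics belonging to one category."""
        return [m for m in self.metrics if m.kind is kind]


# Sample values are invented for illustration.
assertion = ActorStateAssertion(
    actor_id="example-service",
    interaction_id="interaction-001",
    metrics=[
        Metric("operating_system", MetricKind.STATIC, Scope.NODE, "Linux"),
        Metric("version", MetricKind.STATIC, Scope.ACTOR, "1.0"),
        Metric("memory_usage_kb", MetricKind.DYNAMIC, Scope.NODE, 204800),
        Metric("execution_time_ms", MetricKind.DYNAMIC, Scope.ACTOR, 5200),
        Metric("file_transfer", MetricKind.INSTRUMENTED, Scope.ACTOR, "completed"),
    ],
)
```

Static metrics need recording only once per actor, while dynamic and instrumented metrics would be captured again for each interaction.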

Page 5

How? Actor Provenance

[Figure: Enactment Engine and Services, with labels B1, B2, M1 and M2, and outputs labelled Instrumented Output and Monitor Output]

Monitoring Sources: Service information derived from hosting platform via monitoring sources (e.g. Ganglia)

Instrumented Actor: Service information obtained from instrumented points within an actor.
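
A minimal sketch of the two sources, under stated assumptions: `instrumented_point`, `read_monitor` and `invoke_service` are hypothetical names, and `read_monitor` returns fixed values rather than querying a real monitoring tool such as Ganglia. It shows a service reporting completion states from instrumented points while a separate reading stands in for the hosting platform.

```python
import time
from contextlib import contextmanager

instrumented_output = []  # hypothetical sink for instrumented records


@contextmanager
def instrumented_point(name):
    """Record the completion state and duration of one step inside the actor."""
    start = time.perf_counter()
    record = {"point": name, "state": "started"}
    try:
        yield record
        record["state"] = "completed"
    except Exception as exc:
        record["state"] = "failed"
        record["error"] = repr(exc)
        raise
    finally:
        record["elapsed_ms"] = (time.perf_counter() - start) * 1000.0
        instrumented_output.append(record)


def read_monitor():
    """Stand-in for a monitoring source; the values here are invented."""
    return {"memory_usage_kb": 204800, "network_traffic_kb": 12}


def invoke_service(data):
    monitor_output = read_monitor()  # state derived from the hosting platform
    with instrumented_point("file_transfer"):
        payload = list(data)
    with instrumented_point("build_model"):
        result = sum(payload) / len(payload)  # placeholder for real work
    return result, monitor_output


result, monitor_output = invoke_service([1.0, 2.0, 3.0])
```

The monitor reading needs no change to the service code, whereas the instrumented points do, which is why the latter depends on the service provider.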

Page 6

Why? Standalone and Combined Value

Standalone State Assertion Value
  Actor Selection
    Performance: Evaluation of Past / Prediction of Future
  Resource Allocation
    Actor administrator allocates resources according to performance metrics

Combined Value - Putting Assertions into Context
  Interaction - Through Actor State Assertions
    Determining the likely cause of error / results
    Understanding what an actor is doing
  Actor - Through Interaction Assertions
    Understanding performance pattern observations
    Understanding instrumented metric observations
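
To make the combined value concrete, the sketch below joins interaction assertions with actor state assertions recorded for the same interaction and actor. The record fields, identifiers and values are invented for illustration and are not taken from the prototype.

```python
# Hypothetical records; in practice these would come from the provenance store
# and from the actor provenance registry respectively.
interaction_assertions = [
    {"interaction": "i1", "actor": "service", "message": "result: model built"},
    {"interaction": "i2", "actor": "service", "message": "fault: out of memory"},
]
state_assertions = [
    {"interaction": "i1", "actor": "service", "memory_free_kb": 480000},
    {"interaction": "i2", "actor": "service", "memory_free_kb": 1200},
]


def in_context(interactions, states):
    """Attach the matching actor state to each interaction assertion."""
    by_key = {(s["interaction"], s["actor"]): s for s in states}
    combined = []
    for a in interactions:
        state = by_key.get((a["interaction"], a["actor"]), {})
        combined.append({**state, **a})
    return combined


combined = in_context(interaction_assertions, state_assertions)
```

Read together, the fault message in the second interaction sits next to a very low free-memory reading, which suggests a likely cause that neither assertion shows alone.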

Page 7

How? Actor Provenance Registry

Attempt to provide a mechanism to specify and record actor state assertions for any application

Generic Mechanism Problems
  No Knowledge of Potential Resources
    Monitoring sources, containers
  No Direct Knowledge of Implementation
    Instrumented Data Capture

Page 8

How? Actor Provenance Registry

Resource and Rule Registration
  Resource - Monitoring Tool
  Rule - User defined instructions

Indirectly from Resources
  Coordinator polls resources for information
  Times of interest - Service Invocation, Request

Directly from actor
  Collection of Instrumented data

Representation?
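
The registration and polling steps above can be sketched as follows. This is an assumption-laden illustration rather than the registry's actual interface: the `Registry` class, its method names and the sample resource are hypothetical, and rules are plain Python callables here.

```python
class Registry:
    """Hypothetical registry: resources are polled, rules select what to record."""

    def __init__(self):
        self.resources = {}    # name -> zero-argument callable returning a dict
        self.rules = []        # (rule name, resource name, extract function)
        self.local_store = []  # recorded actor state assertions

    def register_resource(self, name, poll):
        self.resources[name] = poll

    def register_rule(self, name, resource, extract):
        if resource not in self.resources:
            raise KeyError(f"unknown resource: {resource}")
        self.rules.append((name, resource, extract))

    def record(self, actor_id, event):
        """Coordinator step: poll each resource once, apply every rule, store."""
        readings = {name: poll() for name, poll in self.resources.items()}
        assertion = {
            "actor": actor_id,
            "event": event,
            "values": {rule: extract(readings[res])
                       for rule, res, extract in self.rules},
        }
        self.local_store.append(assertion)
        return assertion


registry = Registry()
# Invented readings standing in for a monitoring tool.
registry.register_resource(
    "host_monitor", lambda: {"mem_free_kb": 512000, "load_one": 0.42})
registry.register_rule("free_memory_kb", "host_monitor", lambda r: r["mem_free_kb"])
registry.register_rule("load", "host_monitor", lambda r: r["load_one"])

# Times of interest: the client's request and the service invocation.
registry.record("client", "request")
registry.record("service", "invocation")
```

Keeping resources and rules separate means the registry needs no built-in knowledge of which monitoring tools exist, which is the generic-mechanism problem raised earlier.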

Page 9

How? Actor Provenance Registry

Integration with PReP [Groth et al.]

[Figure: architecture diagram. Labels: Provenance Store; Client; Service; Invoke; Result; Record Provenance (x2); Registry (x2); Monitoring Sources (x2); Record Actor Provenance (x2); Local Store (x2)]

Page 10

Data Mining Prototype

Record assertions using registry during invocation of a data modelling service

Service takes incoming data sets and generates a model based upon them
  Uses Quantitative Structure-Activity Relationship (QSAR) to attempt to correlate biological activity to a chemical compound
  Larger data set = longer run time

Page 11

Performance Evaluation

[Figure: Invocation Time (ms), axis from 0 to 40000, plotted against Size of Data Set (KB), axis from 0 to 400. Series: No rules, 1 rule, 5 rules]

Page 12

Conclusions / Future Work

Actor Provenance data is important
  Without it, we don’t get the full picture

Prototype shows that it can be done
  Room for improvement
    Interface to Monitoring System
    Caching of results

No inclusion of ‘instrumented’ actor capture
  Requires service provider adoption to work

Page 13

Prototype Configuration

Single machine holding client, service and registry

Rules executed on invocation of service
  XQuery

Invocations performed 100 times on datasets between 30KB and 340KB in size

Coordinator records rule results to a local file store
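
A rough sketch of how such a measurement could be set up, assuming a stand-in workload: the `service` function, the Python rule callables and the fixed reading are placeholders, not the QSAR service or the XQuery rules of the prototype. Only the two dataset sizes named above are used, since the intermediate sizes are not given.

```python
import statistics
import time


def service(data):
    # Placeholder workload whose cost grows with the input size.
    return sum(data) / len(data)


def make_rules(n):
    # Each hypothetical rule reads one value from a fake resource reading.
    return [lambda reading, i=i: reading["load"] + i for i in range(n)]


def timed_invocation(data, rules):
    """Time one invocation, including evaluation of the registered rules."""
    start = time.perf_counter()
    reading = {"load": 0.5}  # stand-in for a polled resource
    recorded = [rule(reading) for rule in rules]
    service(data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, recorded


def benchmark(sizes_kb, rule_counts, repeats=100):
    """Mean invocation time in ms for each (rule count, dataset size) pair."""
    results = {}
    for n_rules in rule_counts:
        rules = make_rules(n_rules)
        for size in sizes_kb:
            data = bytes(size * 1024)  # zero-filled payload of the given size
            times = [timed_invocation(data, rules)[0] for _ in range(repeats)]
            results[(n_rules, size)] = statistics.mean(times)
    return results


results = benchmark(sizes_kb=[30, 340], rule_counts=[0, 1, 5])
```

Comparing the entries for 0, 1 and 5 rules at the same dataset size gives the recording overhead; the absolute timings from this stand-in workload say nothing about the prototype itself.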