Recording application executions enriched with domain semantics of computations and data Master of Science Thesis Michał Pelczar Krakow, 30.9.2008

Recording application executions enriched with domain semantics of computations and data

Master of Science Thesis

Michał Pelczar

Krakow, 30.9.2008

Outline

• Background
• Objectives
• Provenance model
• Information building
• Feasibility study
• QUaTRO
• State of the art
• Research outline
• Publications

Background

• E-Science
  – Advanced computing technologies supporting scientists
  – Global collaboration in key areas of science
• Semantic Web provides data scalability
  – XML, RDF, RDFS, OWL
  – Ontology serves as taxonomy
• Grid computing provides computation scalability
• Virtual experiments influence the pace of scientific discoveries

Provenance

• metadata that pertains to the derivation history of a data product starting from its original sources

• the seven W’s: Who, What, Where, Why, When, Which, hoW

• Reproducibility of scientific results
• Guarantee of data reliability and quality
• Regulatory mechanism for protecting sensitive data
• Means of efficiency optimization
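
As an illustration of the seven W's, a provenance record could carry one field per question. The sketch below is a minimal Python example under stated assumptions: the class name, the field meanings given in the comments, and the sample values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record with one answer per 'W' question."""
    who: str    # agent that ran the computation
    what: str   # data product that was derived
    where: str  # resource on which the computation ran
    why: str    # purpose of the run
    when: str   # timestamp as an ISO 8601 string
    which: str  # tool or rule set used
    how: str    # derivation step that produced the product


def missing_answers(rec: ProvenanceRecord) -> list:
    """Return the names of the questions that were left unanswered."""
    return [name for name, value in asdict(rec).items() if not value]


# Sample values are placeholders chosen for illustration only.
record = ProvenanceRecord(
    who="example-user",
    what="drug-ranking-result",
    where="example-grid-node",
    why="resistance interpretation",
    when="2007-06-28T12:00:00",
    which="HIVDB",
    how="",
)
print(missing_answers(record))
```

A record with an empty field is reported as incomplete, which is one simple way to check that a derivation history answers every question.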

ViroLab

• Virtual laboratory for infectious diseases
• Prevention, diagnosis and treatment
• Medical science, computer science, healthcare

Objectives

• Design information model for provenance
• Design data model for monitoring system
• Adapt existing monitoring infrastructure to the provenance requirements
• Define ontology creation process
  – Ontology and data model independent
  – Manageable
  – Augmentable
  – Described semantically
• Design and implement component realizing the process
• Incorporate the component into system grid infrastructure
• Design and implement provenance querying component

Provenance model

• Experiment re-execution
• Data dependencies
• Results management
• Performance
• Resource availability
• Related to ontologies:
  – Data
  – Domain

Ontology extension

• Derivation concepts
  – XML
  – Delegates
• Aggregation rules
• Annotations
  – Classes
  – Properties

Information building

• OWL and XSD independent
• Manageable
• Event correlation
• Event aggregation
• Experiment transaction support
• Knowledge history tracking
• Association strategy
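
Event correlation and aggregation can be pictured as grouping monitoring events by the experiment they belong to and then folding each group into a single record. The following is a minimal Python sketch, assuming hypothetical event fields (`experiment_id`, `kind`, `value`); the slide does not show the actual event format or aggregation rules, so none of these names should be read as the thesis design.

```python
from collections import defaultdict


def correlate(events):
    """Correlation: group events by the experiment that produced them."""
    groups = defaultdict(list)
    for event in events:
        groups[event["experiment_id"]].append(event)
    return dict(groups)


def aggregate(group):
    """Aggregation: fold one group of events into a single record."""
    record = {}
    for event in group:
        record.setdefault(event["kind"], []).append(event["value"])
    return record


# Hypothetical events; field names and values are illustrative only.
events = [
    {"experiment_id": "exp-1", "kind": "executedBy", "value": "example-user"},
    {"experiment_id": "exp-2", "kind": "executedBy", "value": "other-user"},
    {"experiment_id": "exp-1", "kind": "usedRuleSet", "value": "HIVDB"},
]

records = {exp_id: aggregate(group) for exp_id, group in correlate(events).items()}
print(records)
```

Keeping correlation and aggregation as separate steps means the grouping key can change without touching the folding logic.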

Proof of concept: Drug resistance case study

• Alignment
• Subtyping
• Drug ranking
• Different levels of semantics
  – Data
  – Computation

QUaTRO

• Abstract query language
  – Data representation and storage transparent
  – Understandable by non-IT specialists
  – Configurable by ontologies
  – Easy to integrate with GUI
  – Extensible

Query processing

[Diagram: example query tree]

NewDrugRanking
– executedBy: Matthew Brown
– dateOfBloodSample: 2007-06-28
– usedRuleSet: RuleSet
  – name: HIVDB
  – version: 4.2.8

//*[local-name() eq 'NewDrugRanking' and ( (child::*[ name() = 'executedBy' and . eq 'MrHyde']))]

//*[local-name() eq 'NewDrugRanking' and ( (child::*[ name() = 'dateOfBloodSample' and . eq '2007-06-28']))]

SELECT id FROM rulesets WHERE name = 'HIVDB'

SELECT id FROM rulesets WHERE version = '4.2.8'

//*[local-name() eq 'RuleSet' and ( (child::*[ name() = 'vl-data-protos:dasId' and . eq 'cyfronet_mysql:test:id:2']))]

//*[local-name() eq 'NewDrugRanking' and (child::*[ name() = 'usedRuleSet' and (@*[name()='rdf:resource' and ( ( . eq 'http://www.virolab.org/onto/drs-protos/HIVDB_4_2_7' )) ]) ])]

• Provenance ontologies
• Mapping ontologies
• File systems
• Databases
• Operators
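
The subqueries on this slide follow a regular pattern: a class name, a property and a value become an XPath filter, while criteria on rule sets become SQL lookups. The sketch below shows that pattern in Python. It is an assumption about the shape of the translation, not the QUaTRO implementation; the function names and the column whitelist are hypothetical.

```python
def property_filter_xpath(class_name, property_name, value):
    """Build an XPath selecting elements of a class by one property value."""
    for part in (class_name, property_name, value):
        if "'" in part:
            # Quote escaping is left out of this sketch on purpose.
            raise ValueError("single quotes are not supported in this sketch")
    return (
        f"//*[local-name() eq '{class_name}' and "
        f"(child::*[name() = '{property_name}' and . eq '{value}'])]"
    )


# Hypothetical whitelist, so column names are never taken from user input.
ALLOWED_COLUMNS = {"name", "version"}


def ruleset_id_sql(column, value):
    """Return a parameterized SQL lookup on the rulesets table."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    return f"SELECT id FROM rulesets WHERE {column} = ?", (value,)


xpath = property_filter_xpath("NewDrugRanking", "executedBy", "MrHyde")
sql, params = ruleset_id_sql("name", "HIVDB")
print(xpath)
print(sql, params)
```

The SQL value is passed as a parameter rather than spliced into the statement, which is the usual way to avoid injection when query criteria come from a user interface.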

Summary

• Data model for operations and resources
• Ontologies for data, experiments and geno2drs scenario
• Monitoring infrastructure: remote logging, automatic generation of helpers
• Semantic Event Aggregator implemented and deployed as OneJAR application
• QUaTRO integrated into GridSphere portal

Future work

• QUaTRO extensions
  – Join operation
  – Provenance graph rendering
  – File system querying
• Model extensions
  – Performance recording
  – Data origin recording
• Explicit provenance recording
  – Domain ontologies generation
  – Partial results storage
  – Domain events publication

Publications

• B. Balis, M. Bubak, M. Pelczar, From Monitoring Data to Experiment Information – Monitoring of Grid Scientific Workflows. In G. Fox, K. Chiu, and R. Buyya, editors, Third IEEE International Conference on e-Science and Grid Computing, e-Science 2007, Bangalore, India, 10-13 December 2007, pages 187-194. IEEE Computer Society, 2007.

• B. Balis, M. Bubak, M. Pelczar, J. Wach, Provenance Tracking and Querying in ViroLab. In Cracow Grid Workshop 2007 Workshop Proceedings, pp. 71-76, ACC CYFRONET AGH 2008.

• B. Balis, M. Bubak, M. Pelczar, J. Wach, Provenance Querying for End-Users: A Drug Resistance Case Study. In: Bubak, M., Albada, G.D.v., Dongarra, J., Sloot, P.M.A. (Eds.), Proceedings ICCS 2008, Krakow, Poland, June 23-25, 2008, LNCS 5103, pp. 80-89, Springer 2008.

Detailed information

• ViroLab: http://www.virolab.org

• VLvl: http://www.virolab.cyfronet.pl

http://grid.cyfronet.pl/virolab/wiki

• QUaTRO: http://virolab.cyfronet.pl/trac/quatro

• Ontologies: http://virolab.cyfronet.pl/onto