Workflow Management in GridMiner
Günter Kickinger, Jürgen Hofer, Peter Brezany, A Min Tjoa
Institute for Software Science, University of Vienna
The 3rd Cracow Grid Workshop
Outline
• Overview
• The Knowledge Discovery Process
• GridMiner Architecture
• Collaboration of Services
• Workflows
• Dynamic Service Composition
Overview
• GridMiner
  – Service-oriented, grid-aware data mining system
  – Copes with
    • very large data sets
    • high-dimensional data sets
    • geographically distributed data sets
    • different types of data sets
  – Implemented on top of Globus Toolkit 3.0
The Knowledge Discovery Process
[Figure: the knowledge discovery process, with the stages Cleaning and Integration, Selection and Transformation, Data Mining, and Evaluation and Presentation; further labels: DWH, Knowledge]
GridMiner Architecture
[Figure: layered GridMiner architecture]
• Layer labels: GridMiner Workflow, GridMiner Core, GridMiner Base, Grid Core, Fabric
• GridMiner services
  – GMMS: Mediation
  – GMPPS: Pre Processing
  – GMDMS: Data Mining
  – GMPRS: Presentation
  – GM DSCE: Dynamic Service Control
  – GMDIS: Integration
  – GMOMS: OLAM
  – GMIS: Information
  – GMRB: Resource Broker
  – GMCMS: OLAP / Cubes
• Grid Core Services: Security, File and Database Access Service, Replica Management
• Fabric: Grid Resources, Data Source
Collaboration of GM-Services
Simple Scenario:
[Figure: data flow through the GM-Services]
• Services: GMDIS (Integration), GMPPS (Pre Processing), GMDMS (Data Mining), GMPRS (Presentation)
• Data: Data Sources, Intermediate Result 1, Intermediate Result 2 (e.g. "flat table"), Intermediate Result 3 (e.g. PMML), Final Result
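The simple scenario can be pictured as a chain in which each service consumes the previous result. The sketch below is only an illustration in Python: it assumes the order integration, pre-processing, data mining, presentation (suggested by the "flat table" and PMML intermediate results), and every function name and data shape in it is hypothetical. The real GM-Services are Grid services, not local functions.

```python
from typing import Dict, List, Tuple

# Hypothetical local stand-ins for the four GM-Services.
Row = Dict[str, float]


def gmdis_integrate(sources: List[List[Row]]) -> List[Row]:
    """Integration: merge the records of all data sources (intermediate result 1)."""
    return [row for source in sources for row in source]


def gmpps_preprocess(rows: List[Row]) -> List[Tuple[float, float]]:
    """Pre-processing: drop incomplete rows, build a 'flat table' (intermediate result 2)."""
    return [(r["x"], r["y"]) for r in rows if "x" in r and "y" in r]


def gmdms_mine(table: List[Tuple[float, float]]) -> Dict[str, float]:
    """Data mining: a toy 'model' of column means, standing in for e.g. PMML."""
    n = len(table)
    return {
        "mean_x": sum(x for x, _ in table) / n,
        "mean_y": sum(y for _, y in table) / n,
    }


def gmprs_present(model: Dict[str, float]) -> str:
    """Presentation: render the model as text (final result)."""
    return ", ".join(f"{key}={value:.2f}" for key, value in sorted(model.items()))


def run_simple_scenario(sources: List[List[Row]]) -> str:
    """Feed each stage's output into the next stage."""
    result = sources
    for stage in (gmdis_integrate, gmpps_preprocess, gmdms_mine, gmprs_present):
        result = stage(result)
    return result
```

Calling `run_simple_scenario` with two small sources runs all four stages in turn; the hard-coded stage order is exactly what the complex scenarios can no longer assume.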
Collaboration (2)
Complex Scenarios:
[Figure: compositions built from multiple instances of GMDIS, GMPPS, GMDMS, GMCMS, GMOMS and GMPRS]
Workflow Management
• Motivation
  – Highly complex and dynamic process
    • order of service execution
    • selection of services
    • sequential and parallel execution
  – Long-running process
    • termination of the client would terminate the workflow
=> Additional workflow layer needed!
Workflow Models
• Static workflows
• Dynamic workflows
Dynamic Workflows
[Figure: DSCE, driven by DSCL, with Service A, Service B, Service C and Service D]
• Dynamic Service Control Language (DSCL)
  – based on XML
  – easy to use
• Dynamic Service Control Engine (DSCE)
  – processes the workflow according to DSCL
Dynamic Service Control Language
• Features
  – Control flow
    • parallel execution of activities
    • sequential execution of activities
  – Activities
    • creating new Grid Service Instances
    • invoking operations on Grid Service Instances
    • querying Service Data Elements (SDEs) of Grid Service Instances
    • assigning and copying variables
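The two control-flow constructs can be illustrated with a small interpreter. This is a minimal Python sketch, not DSCL itself: the class names `Activity`, `Sequence` and `Parallel` and the `execute` function are hypothetical, and activities are plain callables rather than operations on Grid Service Instances.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Activity:
    """A single step; its result is stored in the variables under `name`."""
    name: str
    action: Callable[[Dict[str, object]], object]


@dataclass
class Sequence:
    """Children run one after another."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Parallel:
    """Children run concurrently; the block ends when all have finished."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Activity, Sequence, Parallel]


def execute(node: Node, variables: Dict[str, object]) -> None:
    """Walk the workflow tree and run every activity."""
    if isinstance(node, Activity):
        # Each activity writes only its own key, so no extra locking is used here.
        variables[node.name] = node.action(variables)
    elif isinstance(node, Sequence):
        for child in node.children:
            execute(child, variables)
    elif isinstance(node, Parallel):
        with ThreadPoolExecutor(max_workers=max(1, len(node.children))) as pool:
            futures = [pool.submit(execute, child, variables) for child in node.children]
            for future in futures:
                future.result()  # re-raises any exception from the child
    else:
        raise TypeError(f"unknown workflow node: {node!r}")
```

A workflow is then a nested value such as `Sequence([a, Parallel([b, c]), d])`: `b` and `c` may both read what `a` produced, and `d` starts only after both have finished.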
[Figure: DSCL document structure; * = zero or more, ? = optional]
dscl
  variables
    variable *
      value ?
  composition
    activity *
DSCL - Example
[Figure: example DSCL document]
dscl
  variables
  composition
    createService, invoke, query SDE
    createService, invoke, query SDE
    createService, invoke
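To make the document structure concrete, the sketch below parses a DSCL-like XML document with Python's standard `xml.etree.ElementTree`. Only the element names `dscl`, `variables`, `variable`, `value` and `composition` and the activity kinds come from the slides; the spelling `querySDE`, all attribute names (`name`, `service`, `operation`, `sde`), the values and the `parse_dscl` helper are assumptions made for illustration, not the actual DSCL schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical DSCL-like document; attributes and values are invented.
DSCL_EXAMPLE = """
<dscl>
  <variables>
    <variable name="inputTable"><value>table.csv</value></variable>
    <variable name="model"/>
  </variables>
  <composition>
    <createService name="mining"/>
    <invoke service="mining" operation="train"/>
    <querySDE service="mining" sde="status"/>
  </composition>
</dscl>
"""


def parse_dscl(text: str) -> Tuple[Dict[str, Optional[str]], List[Tuple[str, Dict[str, str]]]]:
    """Return (variables, activities) from a DSCL-like document."""
    root = ET.fromstring(text.strip())
    if root.tag != "dscl":
        raise ValueError(f"expected <dscl> root element, got <{root.tag}>")

    # variable * with an optional value child
    variables: Dict[str, Optional[str]] = {}
    for var in root.findall("./variables/variable"):
        value = var.find("value")
        variables[var.get("name")] = value.text if value is not None else None

    # activity *: keep document order, record tag and attributes
    composition = root.find("composition")
    children = list(composition) if composition is not None else []
    activities = [(child.tag, dict(child.attrib)) for child in children]
    return variables, activities
```

`parse_dscl(DSCL_EXAMPLE)` yields the variable table and the ordered activity list, which is the minimum an engine would need before it can start executing.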
Dynamic Service Control Engine
• Features
  – processing of a DSCL document
  – parallelism
  – hiding complexity
  – delivery of intermediate results
  – status of executed services
  – caching mechanism included
Dynamic Service Control Engine
• Implementation
  – transient, stateful OGSA Grid Service
  – Operations
    • updateDSCL()
    • start()
    • stop()
    • resume()
  – SDE
    • activities
      – results, failures, states for each activity
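The four operations and the per-activity SDE can be pictured as a small stateful object. The Python sketch below is an assumption-laden illustration, not the DSCE implementation: the class `EngineSketch`, the `State` values and the exact semantics of `stop()` and `resume()` (pause after the current activity, continue with the next one) are hypothetical, and activities run synchronously and in sequence only.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class State(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class EngineSketch:
    """Toy engine mirroring updateDSCL(), start(), stop() and resume()."""

    def __init__(self) -> None:
        self._activities: List[Tuple[str, Callable[[], object]]] = []
        self._next = 0
        self._stop_requested = False
        # Stand-in for the 'activities' SDE: result, failure and state per activity.
        self.sde: Dict[str, Dict[str, object]] = {}

    def update_dscl(self, activities: List[Tuple[str, Callable[[], object]]]) -> None:
        """Load a new workflow and reset all bookkeeping."""
        self._activities = list(activities)
        self._next = 0
        self.sde = {
            name: {"state": State.PENDING, "result": None, "failure": None}
            for name, _ in self._activities
        }

    def start(self) -> None:
        self._next = 0
        self._run()

    def stop(self) -> None:
        """Request a pause; takes effect after the current activity."""
        self._stop_requested = True

    def resume(self) -> None:
        self._run()

    def _run(self) -> None:
        self._stop_requested = False
        while self._next < len(self._activities) and not self._stop_requested:
            name, action = self._activities[self._next]
            entry = self.sde[name]
            try:
                entry["result"] = action()
                entry["state"] = State.DONE
            except Exception as exc:  # record the failure instead of aborting
                entry["failure"] = str(exc)
                entry["state"] = State.FAILED
            self._next += 1
```

Because the state lives in the engine rather than in the caller, a client could read `sde` at any time to see which activities are done, failed or still pending, which is the point of keeping the workflow in a separate stateful service.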
DSCE - Architecture
[Figure: DSCE architecture; components: Service Interface, Factory Interface, DSC Engine, DGS Invocation, Dynamic Invoker, Axis 1.1, Globus 3.0]
Current and Future Work
• This is work in progress
• Additional features
  – Notification model
  – Exception handling
Related Work
• BPEL4WS: Business Process Execution Language (BEA, IBM, Microsoft, SAP, Siebel)
• GSFL: Grid Services Flow Language (Krishnan, Wagstrom, Laszewski)
• Data Mining: Concepts and Techniques (Han)
• Anatomy of the Grid (Foster, Kesselman, Tuecke)
• Physiology of the Grid (Foster, Kesselman, Nick, Tuecke)
• Open Grid Services Infrastructure (Tuecke, Czajkowski, Foster)
Conclusions
• Dynamic Service Control is an approach that allows the service consumer to specify a workflow
• General approach, not restricted to GridMiner