CORE Architecture. Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste, Antonino Virgillito, Diego Zardetto (Istat)


Post on 14-Dec-2015


CORE Objective

• Provide a single environment for:
  – Designing
    • Statistical processes in terms of abstract services
    • Exchanged data and metadata
  – Running
    • Designed processes by invoking existing (wrapped) tools

CORE Design: Services

• Abstract services: specify a well-defined functionality in a technology-independent way
• An abstract service can be implemented by one or more concrete services, i.e. IT tools
• Examples: sample allocation, record linkage, computation of estimates and errors, etc.
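The relation between abstract and concrete services can be sketched as follows. This is a minimal illustration, not the CORE implementation: the class names, the tool names `ToolA` and `ToolB`, and the platforms are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConcreteService:
    """An IT tool implementing an abstract service (hypothetical fields)."""
    tool_name: str
    platform: str


@dataclass
class AbstractService:
    """A technology-independent functionality, e.g. record linkage."""
    name: str
    implementations: list[ConcreteService] = field(default_factory=list)

    def implemented_by(self, tool: ConcreteService) -> None:
        # One abstract service may be implemented by several tools.
        self.implementations.append(tool)


record_linkage = AbstractService("record linkage")
record_linkage.implemented_by(ConcreteService("ToolA", "Java"))
record_linkage.implemented_by(ConcreteService("ToolB", "R"))
tool_names = [tool.tool_name for tool in record_linkage.implementations]
```

A process designed against `record_linkage` does not need to change when a different tool is chosen to run it.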

CORE Design: Services

• GSBPM classification
  – Documentation purpose
  – Given that a CORE service can be linked to IT tools, GSBPM tagging enables searches, e.g. retrieving “all the IT tools implementing the 5.4 Impute subprocess of the GSBPM proposal”
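A search of this kind could look like the sketch below. The registry layout, the service names, the tool names and every tag other than 5.4 are assumptions made for illustration; the slides do not describe how the CORE repository is actually queried.

```python
# Hypothetical registry: each CORE service carries a GSBPM tag and linked tools.
services = [
    {"name": "imputation", "gsbpm": "5.4", "tools": ["ToolX", "ToolY"]},
    {"name": "record linkage", "gsbpm": "5.1", "tools": ["ToolZ"]},
    {"name": "donor imputation", "gsbpm": "5.4", "tools": ["ToolY", "ToolW"]},
]


def tools_for_subprocess(registry, gsbpm_code):
    """Return the sorted, de-duplicated tools of services tagged with the code."""
    found = set()
    for service in registry:
        if service["gsbpm"] == gsbpm_code:
            found.update(service["tools"])
    return sorted(found)


impute_tools = tools_for_subprocess(services, "5.4")
```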

CORE Design: Services

• Service inputs and outputs
  – Specified by logical names
  – Characterized with respect to their “role” in data exchange
    • Non-CORE: they are not provided by/to other services of the process, but are only “local” to a specific service
    • CORE: they are passed by/to other services and hence need to undergo CORE transformations
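The distinction between the two roles can be illustrated with a small sketch. In CORE the role is part of the service design; here, purely as an assumption for illustration, it is derived from a hypothetical process description by checking which logical names are both produced and consumed.

```python
# Hypothetical process: each service lists the logical names it reads and writes.
process = {
    "S1": {"inputs": ["frame", "params_s1"], "outputs": ["sample"]},
    "S2": {"inputs": ["sample"], "outputs": ["estimates", "log_s2"]},
    "S3": {"inputs": ["estimates"], "outputs": ["report"]},
}


def classify_roles(proc):
    """Label a logical name CORE if one service writes it and another reads it."""
    produced = {name for svc in proc.values() for name in svc["outputs"]}
    consumed = {name for svc in proc.values() for name in svc["inputs"]}
    exchanged = produced & consumed
    return {
        name: ("CORE" if name in exchanged else "non-CORE")
        for name in sorted(produced | consumed)
    }


roles = classify_roles(process)
```

In this toy process only `sample` and `estimates` travel between services, so only they would need CORE transformations.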

CORE Design: Data and Metadata

• They are specified as service inputs and outputs
  – Logical names link them to previously specified services
  – Non-CORE data only need the file system path where they can be retrieved

CORE Design: CORE Data

The specification of CORE data is provided by three elements:

– Domain descriptor

– CORE data model

– Mapping model

Domain Descriptor: Model

Entity
• Like “entities” in the Entity-Relationship model

Entity properties
• Like “attributes” in the Entity-Relationship model

Very simple (meta-)model: can easily describe other evolving models, e.g. GSIM

Domain Descriptor: Example

<schema name="DEMO_Domain_Descriptor">
  <entity name="SamplePlan">
    <property name="STRATIFICATION_VAR"/>
    <property name="STRATUM_SAMPLE_SIZE"/>
    <property name="STRATUM_POPULATION_SIZE"/>
  </entity>
  <entity name="Enterprise">
    <property name="IDENTIFIER"/>
    <property name="STRATIFICATION_VAR"/>
    <property name="WEIGHT"/>
    <property name="SAMPLING_FRACTION"/>
    <property name="ENTERPRISE_FLAG"/>
    <property name="EMPLOYEES_NUM"/>
    <property name="VALUE_ADDED"/>
    <property name="AREA"/>
  </entity>
</schema>
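A descriptor in this form is easy to read programmatically. The sketch below parses the example with Python's standard `xml.etree.ElementTree` module; the function name `load_descriptor` and the resulting dictionary layout are assumptions for illustration, not part of CORE.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<schema name="DEMO_Domain_Descriptor">
  <entity name="SamplePlan">
    <property name="STRATIFICATION_VAR"/>
    <property name="STRATUM_SAMPLE_SIZE"/>
    <property name="STRATUM_POPULATION_SIZE"/>
  </entity>
  <entity name="Enterprise">
    <property name="IDENTIFIER"/>
    <property name="STRATIFICATION_VAR"/>
    <property name="WEIGHT"/>
    <property name="SAMPLING_FRACTION"/>
    <property name="ENTERPRISE_FLAG"/>
    <property name="EMPLOYEES_NUM"/>
    <property name="VALUE_ADDED"/>
    <property name="AREA"/>
  </entity>
</schema>
"""


def load_descriptor(xml_text):
    """Return {entity name: [property names]} for a domain descriptor."""
    root = ET.fromstring(xml_text.strip())
    return {
        entity.get("name"): [p.get("name") for p in entity.findall("property")]
        for entity in root.findall("entity")
    }


entities = load_descriptor(DESCRIPTOR)
```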

Domain Descriptor: Role

Role of the Domain Descriptor (DD): from service-to-service data mapping to service-to-global data mapping

[Figure: two services, S1 (input i1, output o1) and S2 (input i2, output o2). In the first case, o1 is mapped to i2 via an ad-hoc mapping. In the second case, o1 and i2 are each mapped to the DD, so o1 is mapped to i2 via the DD.]
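The idea of mapping via the DD can be sketched as the composition of two per-service mappings. All column names below (`strat`, `id`, `ENT_ID`, `STRATUM`) are hypothetical; only the entity and property names come from the example descriptor.

```python
# Each service maps its own column names to DD properties once.
s1_output_to_dd = {
    "strat": "Enterprise.STRATIFICATION_VAR",
    "id": "Enterprise.IDENTIFIER",
}
s2_input_from_dd = {
    "Enterprise.IDENTIFIER": "ENT_ID",
    "Enterprise.STRATIFICATION_VAR": "STRATUM",
}


def via_dd(out_to_dd, dd_to_in):
    """Derive the o1 -> i2 column mapping by going through the DD."""
    return {
        column: dd_to_in[prop]
        for column, prop in out_to_dd.items()
        if prop in dd_to_in
    }


o1_to_i2 = via_dd(s1_output_to_dd, s2_input_from_dd)
```

With ad-hoc mappings, every pair of communicating services needs its own mapping; with the DD, each service input or output is mapped once and the pairwise mappings are derived.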

CORE Data Model

• Rectangular data set
• CORE tag:
  • Data set level (mandatory)
  • Column level (optional)
  • Row level (optional)
• Data set kind
• Column kind
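One way to picture these concepts is as a small data structure. The sketch below is an assumption for illustration only: the field names, the tag values and the kind values are invented, since the slides do not list the actual CORE vocabularies.

```python
from dataclasses import dataclass, field


@dataclass
class CoreDataSet:
    core_tag: str                 # data set level tag (mandatory)
    kind: str                     # data set kind
    columns: list[str]
    rows: list[list]
    column_tags: dict[str, str] = field(default_factory=dict)   # optional
    column_kinds: dict[str, str] = field(default_factory=dict)
    row_tags: dict[int, str] = field(default_factory=dict)      # optional

    def __post_init__(self):
        if not self.core_tag:
            raise ValueError("the data set level CORE tag is mandatory")
        # Rectangular: every row has exactly one value per column.
        for row in self.rows:
            if len(row) != len(self.columns):
                raise ValueError("data set is not rectangular")


dataset = CoreDataSet(
    core_tag="microdata",                      # hypothetical tag value
    kind="sample",                             # hypothetical kind value
    columns=["IDENTIFIER", "WEIGHT"],
    rows=[["A1", 1.5], ["A2", 2.0]],
    column_kinds={"IDENTIFIER": "identifier", "WEIGHT": "weight"},
)
```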

CORE Data Model: Role

• Specified once and valid for all processes
• Extensible, i.e. CORE tag, data set kind and column kind can be modified
• Adds more semantics to data
  – Example of usage: mapping to other models

Mapping Model

• Rectangular data assumption
• Mapping is intended to be specified with respect to the Domain Descriptor
  • Columns are to be mapped to properties of an entity
• It contains the specification of how CORE data model concepts are associated with data
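A mapping of this kind can be checked against the Domain Descriptor before a process runs. The sketch below assumes a hypothetical in-memory form of both the descriptor and the mapping; the column names and the `check_mapping` helper are illustrative, not the CORE mapping file format.

```python
# Hypothetical in-memory descriptor: entity name -> property names.
descriptor = {"Enterprise": ["IDENTIFIER", "WEIGHT", "AREA"]}

# Hypothetical mapping: columns of a rectangular data set -> entity properties.
mapping = {
    "entity": "Enterprise",
    "columns": {"ent_id": "IDENTIFIER", "w": "WEIGHT"},
}


def check_mapping(mapping, descriptor):
    """Return the mapped properties that the descriptor does not define."""
    properties = descriptor.get(mapping["entity"])
    if properties is None:
        raise KeyError(f"unknown entity: {mapping['entity']}")
    return sorted(p for p in mapping["columns"].values() if p not in properties)


unknown = check_mapping(mapping, descriptor)
bad_mapping = {"entity": "Enterprise", "columns": {"t": "TURNOVER"}}
unknown_bad = check_mapping(bad_mapping, descriptor)
```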

CORE Logical Architecture

[Figure: CORE logical architecture. Labels recovered from the figure: GUI, CORE Repository, Integration APIs, Process Engine, Runtime, Services.]

CORE GUIs

• Process design
  • Ad-hoc customization of an existing tool (Oryx)
  • Service data flow
• Service design
  • Set of interfaces for the definition of services and related data flow
• Data design
  • Set of interfaces for the specification of domain descriptors and mapping files

Process design: Oryx

• Oryx is an academic open source framework for graphical process modeling
• Based on web technology
• Extensible via a plugin mechanism and new stencil sets
• Supports BPMN and other process modeling languages
• Programming languages: JavaScript and Java; internal data format based on RDF


Stencil Set

• Set of graphical objects and rules that specify how to relate those graphical objects to others

• Additional properties that can later be used by other applications or Oryx extensions (e.g. setting element colors and visibility)

• Can be used to build process models

The CORE Stencil Set

• Graphical representation of CORE processes
• Easy-to-use editor (desktop feeling)
• Easy-to-extend source (JSON)
• Defined from BPMN
• Guarantees complete BPMN compliance

Integration APIs

• Purpose: wrapping a tool in a CORE service
  – Translates inputs and outputs of the tool in a completely transparent and automatic way

[Figure: a tool wrapped inside a CORE Service]
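The wrapping idea can be sketched for CSV data as follows. This is an illustration under assumptions, not the CORE Integration API: `toy_tool` stands in for a legacy tool, and the translation is reduced to renaming column headers between hypothetical tool names (`id`, `w`, `w2`) and Domain Descriptor property names.

```python
import csv
import io


def rename_columns(csv_text, renames):
    """Rewrite the header row of a CSV text using the given name mapping."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def wrap_tool(tool, input_renames, output_renames):
    """Return a service that translates data before and after calling the tool."""
    def core_service(core_csv):
        tool_csv = rename_columns(core_csv, input_renames)
        return rename_columns(tool(tool_csv), output_renames)
    return core_service


def toy_tool(csv_text):
    """Stand-in for a legacy tool: doubles column 'w' and emits it as 'w2'."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "w2"], lineterminator="\n")
    writer.writeheader()
    for row in csv.DictReader(io.StringIO(csv_text)):
        writer.writerow({"id": row["id"], "w2": float(row["w"]) * 2})
    return out.getvalue()


service = wrap_tool(
    toy_tool,
    input_renames={"IDENTIFIER": "id", "WEIGHT": "w"},
    output_renames={"id": "IDENTIFIER", "w2": "WEIGHT"},
)
result = service("IDENTIFIER,WEIGHT\nA1,1.5\n")
```

The caller only ever sees the shared names; the tool only ever sees its own.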

Repository

• Processes and their instances
• Services with their GSBPM and CORE classifications
• Tools and their runtime features
• Data with their logical classification within CORE processes

Process Engine

• Official statistics processes can be viewed from two perspectives:
  • Functional: they are data-oriented, reflecting a common feature of scientific workflows
  • Organizational: they are workflow-oriented and have the complexity of real production lines, with the need to harmonize the work of different actors

Process Engine

• Hence our process engine has two layers

WF ENGINE
• Complex control flows: synchronizing constructs, cycles, conditions, etc. E.g.: interactive multi-user editing and imputation

DATA FLOW CONTROL SYSTEM
• Simple control flows: a sequence of tasks is composed by connecting the output of one task to the input of another
• Data-intensive operations
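The simple control flow, where each task's output feeds the next task's input, can be sketched as below. The task functions and the sample rows are hypothetical and only illustrate the chaining; they are not CORE services.

```python
def run_data_flow(tasks, initial):
    """Run tasks in sequence, feeding each output to the next task's input."""
    data = initial
    for task in tasks:
        data = task(data)
    return data


def drop_missing(rows):
    return [row for row in rows if row["value"] is not None]


def add_weighted(rows):
    return [{**row, "weighted": row["value"] * row["weight"]} for row in rows]


def total(rows):
    return sum(row["weighted"] for row in rows)


rows = [
    {"value": 10, "weight": 2},
    {"value": None, "weight": 3},
    {"value": 5, "weight": 4},
]
estimate = run_data_flow([drop_missing, add_weighted, total], rows)
```

Cycles, conditions and multi-user interaction do not fit this linear pattern, which is the case the workflow engine layer is meant to cover.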


Implementation issues

• Java web application implementing:

• GUIs

• CSV-CORE Integration API

• Data flow control system

• Layered design firmly based on frameworks:

• Hibernate: database mapping

• Struts2: model-view-controller approach

• Repository implementation: MySQL DBMS

Web Application Design

[Figure: web application design. Labels recovered from the figure: Model (Entities), Data access (DAOs), Business Logic (Services), View (GUI: Forms, Input validation), Controller (Actions), Struts2, Hibernate.]

Architecture Deployment

• Web architecture based on a centralized component
  – CORE Environment
• Different CORE deployments can co-exist
  – Intra- or inter-organization
• Services can be remotely executed
  – Support is needed in the form of a distributed component for tool execution and data transfer

Types of runtime services

• Batch

– Tool executed by a command line call

– Can be automated

• Interactive

– User interacts with the tool through the GUI provided by the tool

– Cannot be automated

• Web service

– No tool: the procedure is exposed as a web service, activated by a programming language call

– Can be automated
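The three runtime types and their automation property can be summarized in a short sketch. The `RuntimeType` enum and the `run_batch` helper are hypothetical; the batch call uses Python's standard `subprocess.run`, with the Python interpreter itself standing in for a command line tool.

```python
import subprocess
import sys
from enum import Enum


class RuntimeType(Enum):
    BATCH = "batch"
    INTERACTIVE = "interactive"
    WEB_SERVICE = "web service"

    @property
    def can_be_automated(self):
        # Only interactive tools require a user at the tool's own GUI.
        return self is not RuntimeType.INTERACTIVE


def run_batch(command):
    """Execute a tool through a command line call and return its output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# The interpreter stands in for a real batch tool here.
output = run_batch([sys.executable, "-c", "print('tool finished')"])
automatable = [t.value for t in RuntimeType if t.can_be_automated]
```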

CORE Distributed Deployment

[Figure: CORE distributed deployment. Labels recovered from the figure: CORE Environment (GUI, Definition Repository, Integration APIs, Process Engine, Runtime), Web service client, Remote activation, Runtime, Runtime agent, Batch-Interactive runtime, Web service runtime, Web container.]


Conclusions

• CORE implementation is a proof-of-concept prototype showing:

– Real implementation of industrialized (standardized and automated) statistical processes

– Reuse of IT tools possibly developed on different platforms and by different NSIs

– GSBPM-aware services implementation

– A single common data model enabling integration of heterogeneous data exchanged between services

– Openness to evolving statistical information models (e.g. GSIM)