towards exascale-ready data service...

30
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777533. Towards Exascale-ready Data Service Solutions Maximilian Höb 10 th July 2019, Athens PROviding Computing solutions for ExaScale ChallengeS

Upload: others

Post on 22-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777533.

Towards Exascale-readyData Service Solutions

Maximilian Höb

10th July 2019, Athens

PROviding Computing solutions for ExaScale ChallengeS

Page 2: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Consortium

Ludwig-Maximilians-Universität München, Germany

Universiteit van Amsterdam, The Netherlands

Stichting Netherlands eScience Center, The Netherlands

Haute école spécialisée de Suisse occidentale, Switzerland

Lufthansa Systems GmbH & Co. KG, Germany

Inmark Europa SA, Spain

Ústav informatiky, Slovenská Akadémia Vied, Slovakia

Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie, Poland

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 2

PROviding Computing solutions for ExaScale ChallengeS

Page 3: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 3

Storage and Computing Centres

Partner‘s location

Storage Resources

Compute Resources

Page 4: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS will deliver a comprehensive set of mature service

prototypes and tools specially developed to enable extreme

scale data processing in both scientific research and

advanced industry settings

3 Principles

1. Leapfrog beyond the current state of the art

2. Ensure broad research and innovation impact

3. Support the long tail of science and broader innovation

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 4

Vision of PROCESS

Page 5: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

ATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

RIMROCK

LOBCDERDATANET

DISPEL

A user-friendly modular exascale service platform to combinedata and computational services on top of European researchinfrastructures

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 5

PROCESS Concept

SuperMUC-NGLeibniz Supercomputing Centre Munich

Page 6: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Mature, modular, generalizable Open Source solutions for user friendly exascale data.

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 6

Goals of PROCESS

UC#1: Exascale learning on medical image data

UC#4: Ancillary pricing/airline revenue management

UC#3: Supporting innovation based on global disaster risk data

UC#2: Analysis of Radioastronomy Observations

UC#5: Agro-Copernicus (correlating data between simulation and observation)

ATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

RIMROCK

LOBCDERDATANET

DISPELATMOS-PHERE

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

LOBCDER

DISPELATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

TOSCA

RIMROCK

LOBCDERDATANET

UC#2: Analysis of Radioastronomy Observations

Page 7: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS Architecture

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 7

ATMOS-PHERE

GRIDFTP

JUPYTER

RIMROCK

IEE

AUTHENTI-FICATION

CLOUDIFY

TOSCA

KUBER-NETES

NEXT-CLOUD

LOBCDERDATANET

DISPEL

ATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

RIMROCK

LOBCDERDATANET

DISPEL

Page 8: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 8

PROCESS Architecture

Page 9: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Approach: Tiered system with a layer of virtual (data) nodes facilitating:

• data transfers,

• distributed management,

• scheduling and staging.

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 9

Data Delivery for extreme Data Applications

• Independent of resource providers (storage & computing)• Work with data across distributed provider data. • collaboration across research groups

Implementation: container-centric, orchestrated using Kubernetes.software: https://github.com/recap/MicroInfrastructure

A programmable micro-infrastructure

Through

enab

le

Page 10: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 10

Data Management with containerized Services

DISPEL graphical authoring and execution environment based on Eclipse

Jupyter service using the storage adaptor.

Querying files from the Prometheus adaptor through WebDAV service

NextCloud service exposes a GUI to the user to work with user’s files

WebDAV service deployed through the micro-infrastructure. Through the API the user sets the user-name and password which will protect the WebDAV point

Page 11: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

AppTask 1

AppTask 2

AppTask N

Log AppManager

ComputeAdaptor

ComputeAdaptor Data

Adaptor

DataAdaptor

WorkflowManager

ExtraApp

Services

Workflow to infrastructure

Application logic containers

Application specific management containers

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 11

ApplicationDescription

Page 12: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

DeploymentServices

ApplicationDescription

DeploymentServices

DeploymentServices

Container orchestrator e.g. Kubernetes

Deploy application infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 12

Page 13: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 13

Data infrastructure

PROCESS Data infrastructure including data adaptors for UC#1 and UC#2:• Reuse of container adaptors across use cases• Ability to add new application specific container adaptors

Page 14: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Use Case 1

Machine Learning in Medical Imaging

Haute école spécialisée

de Suisse occidentale, Switzerland

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 14

PROviding Computing solutions for ExaScale ChallengeS

Page 15: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

• Use of machine learning to analyse large histopathology images(>100,000x100,000 pixels)

• Cut into small patches for treatment

• Mainly for cancer care to highlight regions of interest

• Use of standard tools such as Keras, Tensorflow, … for Deep learning.

• Adapt the machine learning tools to large data centres and make themscale to improve the amount of training data and thus improve thequality of the models

• Histopathology data is produced in massive quantities constantly

• Use a safe environment for possibly confidential data

• Have a simple user interface to test new pipelines

Use Case Scenario and Objectives

15PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC

Page 16: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Use Case 2

Analysis of Radioastronomy ObservationsLOFAR / SKA

Stichting Netherlands eScience Center, The Netherlands

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 16

PROviding Computing solutions for ExaScale ChallengeS

Page 17: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

LOFAR: Low Frequency Array radio telescope – is a “distributed

software telescope” consisting of ~88.000 antennas in ~51 stations

scattered over Europe. It produces up to 35 TB/h of intermediate data

(visibilities) which is stored for further analysis.

Images courtesy of:

ASTRON

Analysis of Radioastronomy Observations

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 17

SKA: Square Kilometer Array (Operational in 2022+)

130K ~ 1M (LOFAR-style) antenna in Australia + 200 ~ 2000 dishes in

South Africa. Wider frequency range and higher sensitivity and survey

speed than existing telescopes.

Zettabytes/year raw data: 130~300PB/year of correlated data

Huge data and processing problem

Page 18: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Effect of Processing

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 18

Page 19: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 19

UCs prototypes based on modular services

ATMOS-PHERE

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

LOBCDER

DISPELATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

TOSCA

RIMROCK

LOBCDERDATANET

Page 20: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 20

Pipeline and Workflow Configuration Portal

Page 21: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 21

Pipeline Deployment and Output

Page 22: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 22

Towards an Exascale-ready Solutions

ATMOS-PHERE

GRIDFTP

IEE

KUBER-NETES

CLOUDIFY

NEXT-CLOUD

JUPYTER

TOSCA

RIMROCK

LOBCDERDATANET

DISPEL

Page 23: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 23

Enabling Exascale

https://www.top500.org/statistics/perfdevel/

Page 24: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777533.

Maximilian Hö[email protected]

eScience Workshop Platform-driven e-Infrastructure Innovations

September 24, 2019, San Diego, USA

PROviding Computing solutions for ExaScale ChallengeS

Page 25: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

• Data Transfer Nodes (DTN)• Needed for optimal data transfers between different data centres

• Problems• Limitation of TCP

• Firewalls „are evil“

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 25

Data Transfer Nodes

Dedicated, optimized Data Transfer Nodes (DTN)

Source: Peter Hinrich SURFnet “Problems with data transfers“

UvA

SURFnet, DFN, Sanet, PSNC

WAN

Page 26: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

AppTask 1

AppTask 2

AppTask N

Log AppManager

ComputeAdaptor

ComputeAdaptor Data

Adaptor

DataAdaptor

WorkflowManager

ExtraApp

Services

Workflow to infrastructure

Application logic containers

Application specific management containers

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 26

ApplicationDescription

Page 27: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

DeploymentServices

ApplicationDescription

DeploymentServices

DeploymentServices

Container orchestrator e.g. Kubernetes

Deploy application infrastructure

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 27

Page 28: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

● Data-staging service (in progress)○ Batch system for data transfers between sites○ Minimize data copies e.g. JIT data transfers○ Containerize protocol handlers (adaptors)

● Data adaptors (in progress)○ SCP to SCP○ gridFTP to gridFTP○ FTS3 to FTS3○ Define a common container interface

● Compute offload (to do)○ Compute scheduler to decide where to run processing; on the

application infrastructure or offload to an HPC site.

Some containerized services

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 28

Page 29: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Containers Description

WebDAV container Protected a public WebDav entry point.

Token-based WebDAV

meant for access by computing services.

DataNet-adaptor performs operations on metadata

Staging service

stage data just-in-time on the HPC file systems.

DISPELaccess to data (pre)processing environment.

Jupytercontainer access data through Jupyter notebook.

NextCloud view data in Dropbox fashion.

PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 29

Evolution of Data Management

TO: specification of the interaction of the different services

User

LOBCDER

K8s Staging Authservice

Datanet

Adaptors EE RimRocDISPEL

Upload jupyter

Pre -process

DataStores

DataNet

NextCloud

Micro-infrastructure containers

deploy container

deploy container

deploy container

deploy container

deploy container

deploy container

deploy container

Mount physical data stores

Register files

Create application

start application

get user infrastructure

user endpoint

Storage data

getfile (url-encoded-parameter)

Call preprocessing staging pipeline

Webhook: callback

upload: results

Delegate copy data

run application

get infrastructure info

Upload files

get security token

Submit micro-architecture description

Provision micro-architecture

Provisioning

Deploying

Preparing application

Running application

From: Functional Architecture Technology selection Architecture design

Page 30: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure

Workflow of the Use Case

30PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC