![Page 1: Towards Exascale-ready Data Service Solutionsproject-dare.eu/wp-content/uploads/2019/07/8.Hob_PROCESS.pdf · friendly exascale data. PROCESS - Creating Platform-Driven E-Infrastructure](https://reader035.vdocuments.us/reader035/viewer/2022071117/60034deab16dca1435038e3d/html5/thumbnails/1.jpg)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777533.
Towards Exascale-ready Data Service Solutions
Maximilian Höb
10th July 2019, Athens
PROviding Computing solutions for ExaScale ChallengeS
Consortium
Ludwig-Maximilians-Universität München, Germany
Universiteit van Amsterdam, The Netherlands
Stichting Netherlands eScience Center, The Netherlands
Haute école spécialisée de Suisse occidentale, Switzerland
Lufthansa Systems GmbH & Co. KG, Germany
Inmark Europa SA, Spain
Ústav informatiky, Slovenská Akadémia Vied, Slovakia
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie, Poland
PROCESS - Creating Platform-Driven E-Infrastructure Innovation On EOSC 2
Storage and Computing Centres
Map legend: partners' locations, storage resources, compute resources
PROCESS will deliver a comprehensive set of mature service
prototypes and tools specially developed to enable extreme
scale data processing in both scientific research and
advanced industry settings
3 Principles
1. Leapfrog beyond the current state of the art
2. Ensure broad research and innovation impact
3. Support the long tail of science and broader innovation
Vision of PROCESS
Services: Atmosphere, GridFTP, IEE, Kubernetes, Cloudify, NextCloud, Jupyter, TOSCA, Rimrock, LOBCDER, DataNet, DISPEL
A user-friendly modular exascale service platform to combine data and computational services on top of European research infrastructures
PROCESS Concept
SuperMUC-NG, Leibniz Supercomputing Centre, Munich
Mature, modular, generalizable open-source solutions for user-friendly exascale data.
Goals of PROCESS
UC#1: Exascale learning on medical image data
UC#2: Analysis of Radioastronomy Observations
UC#3: Supporting innovation based on global disaster risk data
UC#4: Ancillary pricing/airline revenue management
UC#5: Agro-Copernicus (correlating data between simulation and observation)
Service sets shown in the figure:
• Atmosphere, GridFTP, IEE, Kubernetes, Cloudify, NextCloud, Jupyter, TOSCA, Rimrock, LOBCDER, DataNet, DISPEL
• Atmosphere, IEE, Kubernetes, Cloudify, NextCloud, Jupyter, TOSCA, LOBCDER, DISPEL
• Atmosphere, GridFTP, IEE, Kubernetes, Cloudify, NextCloud, TOSCA, Rimrock, LOBCDER, DataNet
PROCESS Architecture
Architecture diagram components: Atmosphere, GridFTP, Jupyter, Rimrock, IEE, Authentication, Cloudify, TOSCA, Kubernetes, NextCloud, LOBCDER, DataNet, DISPEL
Approach: Tiered system with a layer of virtual (data) nodes facilitating:
• data transfers,
• distributed management,
• scheduling and staging.
Data Delivery for extreme Data Applications
• Independent of resource providers (storage & computing)
• Work with data distributed across providers
• Collaboration across research groups
Implementation: container-centric, orchestrated using Kubernetes.
Software: https://github.com/recap/MicroInfrastructure
A programmable micro-infrastructure
Diagram labels: "Through", "enable"
Data Management with containerized Services
DISPEL graphical authoring and execution environment based on Eclipse.
Jupyter service using the storage adaptor.
Querying files from the Prometheus adaptor through the WebDAV service.
The NextCloud service exposes a GUI for working with the user's files.
WebDAV service deployed through the micro-infrastructure. Through the API, the user sets the username and password that protect the WebDAV endpoint.
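As a rough illustration of querying files through a password-protected WebDAV endpoint, the sketch below builds a standard `PROPFIND` request with HTTP Basic authentication and extracts the file paths from a multistatus response. The endpoint URL, credentials, and sample response are hypothetical; none of them come from the PROCESS deployment, and the request is only built, not sent.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = {"d": "DAV:"}  # XML namespace used by WebDAV responses

# Hypothetical multistatus body, shaped like a WebDAV PROPFIND reply.
SAMPLE = (
    '<d:multistatus xmlns:d="DAV:">'
    "<d:response><d:href>/data/</d:href></d:response>"
    "<d:response><d:href>/data/image-001.tif</d:href></d:response>"
    "</d:multistatus>"
)


def build_propfind(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a depth-1 PROPFIND request with Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Depth": "1", "Authorization": f"Basic {token}"}
    return urllib.request.Request(url, method="PROPFIND", headers=headers)


def list_hrefs(multistatus_xml: str) -> list:
    """Return the href of every resource listed in a multistatus document."""
    root = ET.fromstring(multistatus_xml)
    return [el.text for el in root.findall("d:response/d:href", DAV_NS)]


if __name__ == "__main__":
    request = build_propfind("https://example.org/webdav/", "alice", "secret")
    print(request.get_method(), request.full_url)
    print(list_hrefs(SAMPLE))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the response body to `list_hrefs`.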
Workflow to infrastructure
Diagram: an Application Description is mapped to application logic containers (AppTask 1, AppTask 2, …, AppTask N) and application-specific management containers. Components shown: Log, AppManager, ComputeAdaptor, DataAdaptor, WorkflowManager, ExtraApp Services.
Deploy application infrastructure
Diagram: the Application Description is handed to Deployment Services, which deploy the application infrastructure on a container orchestrator, e.g. Kubernetes.
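The deployment step can be pictured as a translation from an application description into orchestrator resources. The following is a minimal sketch, assuming a made-up description format: the keys `application`, `containers`, `name`, `image`, and `port`, as well as the image names, are hypothetical. Only the layout of the generated `apps/v1` Deployment objects follows the Kubernetes resource schema.

```python
import json


def deployment_manifest(app: str, name: str, image: str, port: int) -> dict:
    """Build one Kubernetes Deployment (apps/v1) for a single container."""
    labels = {"app": app, "component": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{app}-{name}", "labels": labels},
        "spec": {
            "replicas": 1,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def render(description: dict) -> list:
    """Turn a (hypothetical) application description into manifests."""
    return [
        deployment_manifest(description["application"], c["name"], c["image"], c["port"])
        for c in description["containers"]
    ]


# Hypothetical description; names and images are placeholders.
DESCRIPTION = {
    "application": "uc-demo",
    "containers": [
        {"name": "webdav", "image": "example/webdav:latest", "port": 8080},
        {"name": "jupyter", "image": "example/jupyter:latest", "port": 8888},
    ],
}

if __name__ == "__main__":
    for manifest in render(DESCRIPTION):
        print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON as well as YAML, the printed objects could be saved to files and applied with `kubectl apply -f`.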
Data infrastructure
PROCESS data infrastructure including data adaptors for UC#1 and UC#2:
• Reuse of container adaptors across use cases
• Ability to add new application-specific container adaptors
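One way to picture adaptor reuse and extension is a small registry keyed by protocol name. The sketch below is an assumption-laden illustration: the `DataAdaptor` interface, the registry, the adaptor classes, and the assignment of adaptors to use cases are all hypothetical and not taken from the PROCESS code base.

```python
from abc import ABC, abstractmethod


class DataAdaptor(ABC):
    """Hypothetical interface that every adaptor container would expose."""

    protocol: str

    @abstractmethod
    def describe(self) -> str:
        """Return a short human-readable description of the adaptor."""


REGISTRY = {}


def register(cls):
    """Class decorator: make an adaptor available under its protocol name."""
    REGISTRY[cls.protocol] = cls
    return cls


@register
class WebDavAdaptor(DataAdaptor):
    protocol = "webdav"

    def describe(self) -> str:
        return "exposes files over WebDAV"


@register
class GridFtpAdaptor(DataAdaptor):
    protocol = "gridftp"

    def describe(self) -> str:
        return "moves files with GridFTP"


# Which adaptors each use case asks for (illustrative assignment only).
USE_CASES = {"uc1": ["webdav"], "uc2": ["webdav", "gridftp"]}


def adaptors_for(use_case: str) -> list:
    """Instantiate the registered adaptors that a use case requests."""
    return [REGISTRY[name]() for name in USE_CASES[use_case]]


if __name__ == "__main__":
    for uc in USE_CASES:
        print(uc, [a.describe() for a in adaptors_for(uc)])
```

Here both use cases share the same `WebDavAdaptor` class, and a new application-specific adaptor only needs the `@register` decorator to become available.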
Use Case 1
Machine Learning in Medical Imaging
Haute école spécialisée de Suisse occidentale, Switzerland
Use Case Scenario and Objectives
• Use of machine learning to analyse large histopathology images (>100,000 x 100,000 pixels)
• Images are cut into small patches for processing
• Mainly for cancer care, to highlight regions of interest
• Use of standard deep learning tools such as Keras, TensorFlow, …
• Adapt the machine learning tools to large data centres and make them scale, to increase the amount of training data and thus improve the quality of the models
• Histopathology data is constantly produced in massive quantities
• Use a safe environment for possibly confidential data
• Provide a simple user interface to test new pipelines
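To make the patch-cutting step concrete, the sketch below enumerates the top-left corners of fixed-size patches over a large image and counts them. The patch size of 224 pixels and the non-overlapping stride are assumptions chosen for illustration; the slides do not state the values used in the use case.

```python
from typing import Iterator, Tuple


def patch_origins(width: int, height: int, patch: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) of every patch that fits fully inside the image."""
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield x, y


def patch_count(width: int, height: int, patch: int, stride: int) -> int:
    """Count the patches without enumerating them."""
    if width < patch or height < patch:
        return 0
    per_row = (width - patch) // stride + 1
    per_col = (height - patch) // stride + 1
    return per_row * per_col


if __name__ == "__main__":
    # Hypothetical settings: 224-pixel patches, no overlap.
    total = patch_count(100_000, 100_000, patch=224, stride=224)
    print(f"{total} patches from one 100,000 x 100,000 image")
```

Under these assumed settings a single 100,000 x 100,000 image yields 446 patches per axis, or 198,916 patches in total, which hints at why training on many such images needs data-centre scale.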
Use Case 2
Analysis of Radioastronomy Observations
LOFAR / SKA
Stichting Netherlands eScience Center, The Netherlands
Analysis of Radioastronomy Observations
LOFAR: the Low Frequency Array radio telescope is a “distributed software telescope” consisting of ~88,000 antennas in ~51 stations scattered over Europe. It produces up to 35 TB/h of intermediate data (visibilities), which is stored for further analysis.
SKA: Square Kilometre Array (operational in 2022+). 130,000 to 1 million LOFAR-style antennas in Australia plus 200 to 2,000 dishes in South Africa. Wider frequency range, higher sensitivity and higher survey speed than existing telescopes.
Zettabytes per year of raw data; 130 to 300 PB/year of correlated data.
A huge data and processing problem.
Images courtesy of ASTRON.
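The quoted volumes are easier to compare as sustained byte rates. The short calculation below converts the LOFAR peak of 35 TB/h and the SKA range of 130 to 300 PB/year into GB/s, assuming decimal (SI) units and a 365-day year; both are assumptions, since the slides do not say which convention is meant.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year (assumption)

GB = 10**9   # decimal units assumed throughout
TB = 10**12
PB = 10**15


def bytes_per_second(volume_bytes: float, period_seconds: float) -> float:
    """Average rate needed to move a volume within a period."""
    return volume_bytes / period_seconds


lofar_peak = bytes_per_second(35 * TB, SECONDS_PER_HOUR)
ska_low = bytes_per_second(130 * PB, SECONDS_PER_YEAR)
ska_high = bytes_per_second(300 * PB, SECONDS_PER_YEAR)

if __name__ == "__main__":
    print(f"LOFAR peak:          {lofar_peak / GB:.2f} GB/s")
    print(f"SKA correlated data: {ska_low / GB:.2f} to {ska_high / GB:.2f} GB/s")
```

With these units the LOFAR peak is about 9.7 GB/s, and the SKA correlated-data range corresponds to roughly 4.1 to 9.5 GB/s sustained over a whole year.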
Effect of Processing
UCs prototypes based on modular services
Service sets of the use case prototypes shown in the figure:
• Atmosphere, IEE, Kubernetes, Cloudify, NextCloud, Jupyter, TOSCA, LOBCDER, DISPEL
• Atmosphere, GridFTP, IEE, Kubernetes, Cloudify, NextCloud, TOSCA, Rimrock, LOBCDER, DataNet
Pipeline and Workflow Configuration Portal
Pipeline Deployment and Output
Towards Exascale-ready Solutions
Services: Atmosphere, GridFTP, IEE, Kubernetes, Cloudify, NextCloud, Jupyter, TOSCA, Rimrock, LOBCDER, DataNet, DISPEL
Enabling Exascale
https://www.top500.org/statistics/perfdevel/
Maximilian Höb
eScience Workshop Platform-driven e-Infrastructure Innovations
September 24, 2019, San Diego, USA
Data Transfer Nodes
• Dedicated, optimized Data Transfer Nodes (DTN)
  • Needed for optimal data transfers between different data centres
• Problems
  • Limitation of TCP
  • Firewalls “are evil”
Diagram labels: UvA; WAN; SURFnet, DFN, Sanet, PSNC
Source: Peter Hinrich, SURFnet, “Problems with data transfers”
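The slide does not say which TCP limitation is meant. One commonly cited one, assumed here, is that a single TCP stream cannot exceed its window size divided by the round-trip time, so long-distance links need large windows to stay full. The link speed, round-trip time, and window size below are hypothetical values for illustration.

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-stream throughput for a given window."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link_bps = 10e9   # hypothetical 10 Gbit/s path between two data centres
    rtt_s = 0.020     # hypothetical 20 ms round-trip time
    bdp = bandwidth_delay_product(link_bps, rtt_s)
    print(f"Window needed to fill the link: {bdp / 1e6:.0f} MB")

    small_window = 64 * 1024  # 64 KiB, the limit without TCP window scaling
    limit = max_throughput_bps(small_window, rtt_s)
    print(f"Throughput with a 64 KiB window: {limit / 1e6:.1f} Mbit/s")
```

Under these assumptions the link needs about 25 MB in flight, while a 64 KiB window caps a stream at roughly 26 Mbit/s, which is one reason for tuning transfer hosts specifically for wide-area transfers.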
Some containerized services
● Data-staging service (in progress)
  ○ Batch system for data transfers between sites
  ○ Minimize data copies, e.g. JIT data transfers
  ○ Containerize protocol handlers (adaptors)
● Data adaptors (in progress)
  ○ SCP to SCP
  ○ gridFTP to gridFTP
  ○ FTS3 to FTS3
  ○ Define a common container interface
● Compute offload (to do)
  ○ Compute scheduler to decide where to run processing: on the application infrastructure or offloaded to an HPC site.
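The batch and just-in-time ideas in the data-staging bullets can be sketched as a priority queue ordered by the latest moment each transfer may start. This is an illustration under stated assumptions, not the PROCESS staging service: the `Transfer` and `StagingQueue` names, the URIs, and the fixed transfer-rate estimate are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Transfer:
    latest_start: float                       # needed_at minus estimated duration
    source: str = field(compare=False)
    destination: str = field(compare=False)
    protocol: str = field(compare=False)


class StagingQueue:
    """Batch of pending transfers, most urgent first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def submit(self, source, destination, protocol, size_bytes, rate_bps, needed_at):
        """Queue a transfer; return False if the same copy is already queued."""
        key = (source, destination)
        if key in self._seen:
            return False  # avoid making a second copy of the same data
        self._seen.add(key)
        duration = size_bytes * 8 / rate_bps
        heapq.heappush(
            self._heap, Transfer(needed_at - duration, source, destination, protocol)
        )
        return True

    def next_due(self, now):
        """Pop the most urgent transfer if it has to start by `now`, else None."""
        if self._heap and self._heap[0].latest_start <= now:
            return heapq.heappop(self._heap)
        return None


if __name__ == "__main__":
    queue = StagingQueue()
    queue.submit("scp://site-a/a.dat", "scp://site-b/a.dat", "scp", 1e9, 8e9, needed_at=100)
    queue.submit("gridftp://site-a/b.dat", "gridftp://site-c/b.dat", "gridftp", 1e10, 8e9, needed_at=105)
    print(queue.next_due(now=96))
```

Starting each transfer as late as its deadline allows keeps staged copies on the target file system for as short a time as possible, which is one reading of the just-in-time bullet.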
Evolution of Data Management
Containers and their descriptions:
• WebDAV container: protected public WebDAV entry point.
• Token-based WebDAV: meant for access by computing services.
• DataNet adaptor: performs operations on metadata.
• Staging service: stages data just-in-time on the HPC file systems.
• DISPEL: access to the data (pre)processing environment.
• Jupyter container: access to data through a Jupyter notebook.
• NextCloud: view data in Dropbox fashion.
From: functional architecture, technology selection, architecture design
To: specification of the interaction of the different services
Sequence diagram participants: User, LOBCDER, K8s, Staging, Auth service, DataNet, Adaptors, EE, Rimrock, DISPEL, Jupyter, Data stores, NextCloud, micro-infrastructure containers
Phases: Provisioning, Deploying, Preparing application, Running application
Messages include: submit micro-architecture description, provision micro-architecture, deploy container, mount physical data stores, register files, create application, start application, get user infrastructure, user endpoint, get security token, upload files, get infrastructure info, run application, delegate copy data, storage data, getfile (url-encoded-parameter), call preprocessing staging pipeline, pre-process, webhook: callback, upload: results
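The four phase labels in the sequence diagram (Provisioning, Deploying, Preparing application, Running application) can be read as a linear lifecycle. The sketch below models that reading; the strict ordering and the `Lifecycle` class are assumptions made for illustration, since the extracted diagram only preserves the labels.

```python
from enum import Enum


class Phase(Enum):
    PROVISIONING = 1
    DEPLOYING = 2
    PREPARING_APPLICATION = 3
    RUNNING_APPLICATION = 4


def next_phase(phase):
    """Return the phase that follows `phase`, or None after the last one."""
    members = list(Phase)
    index = members.index(phase)
    return members[index + 1] if index + 1 < len(members) else None


class Lifecycle:
    """Tracks the current phase and rejects out-of-order transitions."""

    def __init__(self):
        self.phase = None
        self.history = []

    def advance(self, phase):
        expected = Phase.PROVISIONING if self.phase is None else next_phase(self.phase)
        if phase is not expected:
            raise ValueError(f"expected {expected}, got {phase}")
        self.phase = phase
        self.history.append(phase)


if __name__ == "__main__":
    lifecycle = Lifecycle()
    for phase in Phase:
        lifecycle.advance(phase)
    print([p.name for p in lifecycle.history])
```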
Workflow of the Use Case