knime for the masses: flexible & scalable deployment of knime … · 2017. 5. 23. · frontend...

Post on 22-Sep-2020


Nils Weskamp

KNIME for the Masses: Flexible & Scalable Deployment of KNIME-Workflows


Overview

• Background & Motivation

• Issues and challenges

• Technical compatibility: Web Service ≠ Web Service

• Software licenses and access control

• Robust and scalable deployment of KNIME-workflows

• Results and current status

• Frontend-integration examples

• Usage statistics

• Summary and discussion


Motivation – Background

“MedChem World”

• Windows-based setup

• Large number of end users

• Well-defined installation

• Small number of relatively mature software packages

• Corporate IT:

“CompChem World”

• Linux-based setup

• Small number of power users

• High number of software packages, often from academic groups or small vendors

• Corporate IT:

Tools shown: D360, MOE, Spotfire, Marvin, KNIME, Pipeline Pilot, Tool X, Tool Y, Tool Z, Tool A, Tool B, Tool C, Tool …

⇒ Increasing need to make scientific calculation engines directly available to end users


Motivation – Status quo

[Diagram: “CompChem World” with Tool X, Tool Y, Tool Z, Tool A, Tool B, Tool C, Tool …; “MedChem World” with Frontend X, Frontend Y, Frontend Y]

• Various, isolated integrations of some calculation engines into some frontends

• Inconsistencies across frontends

• Need to set up and maintain various related calculation engines in parallel

• Need to use a certain frontend to access a given calculation


Motivation – Brave new world

[Diagram: “CompChem World” with Tool X, Tool Y, Tool Z, Tool A, Tool B, Tool C, Tool …; “MedChem World” with Frontend X, Frontend Y, Frontend Y; together with the Computational Chemistry Framework (CCFW)]

• All calculation engines are available to all end users in relevant frontends

• Opportunities for service consolidation based on science, not technology

• Consistency of calculation results across frontends


Motivation – Brave new world (continued)

Web Services!


Web Services – Reality check

[Diagram: Tool A, Tool B and Tool C, each labelled “Web Services” and each illustrated with filler text in a different script (Latin, Arabic, East Asian); CCFW]


KNIME – The magic glue to integrate tools and applications

• KNIME makes it very simple to combine nodes from various sources and contributors

• Significant improvements over the last year: no more explicit format conversions / type casts necessary

• Many node collections (from commercial vendors) require a software license

• Each license comes with its own terms, conditions and restrictions

• Users, Tokens, Sites, Nodes, Servers, Clients, Copies, Installations, Intended Use, Cloud, Principal Place of Business as Registered, Annual, Perpetual, Increment …

• Not trivial to decide whether a given person in a global organization has the right to execute the KNIME workflow

• To make things worse, internal rules for data access also have to be considered
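The decision described above can be sketched as a simple entitlement check. This is only an illustration under strong assumptions: real license terms (tokens, sites, intended use and so on) are far richer than set membership, and every name here (`Workflow`, `User`, `may_execute`, the license and group labels) is hypothetical rather than part of the CCFW.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workflow:
    name: str
    required_licenses: frozenset     # licenses of the node collections used
    required_data_groups: frozenset  # internal data-access groups needed


@dataclass
class User:
    name: str
    licenses: set = field(default_factory=set)
    data_groups: set = field(default_factory=set)


def may_execute(user: User, workflow: Workflow) -> bool:
    """Allow execution only if every license and data-access rule is met."""
    has_licenses = workflow.required_licenses <= user.licenses
    has_data_access = workflow.required_data_groups <= user.data_groups
    return has_licenses and has_data_access


# Hypothetical workflow and users, for illustration only.
logp = Workflow("logP prediction", frozenset({"vendor_a"}), frozenset({"project_x"}))
alice = User("alice", {"vendor_a", "vendor_b"}, {"project_x"})
bob = User("bob", {"vendor_b"}, {"project_x"})

print(may_execute(alice, logp))  # True: license and data access both present
print(may_execute(bob, logp))    # False: the vendor_a license is missing
```

Keeping license entitlements and data-access groups as two separate checks mirrors the two concerns named in the bullets, so either rule set can change without touching the other.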


CCFW – Access group management


CCFW – Integration with KNIME

• Deployment of KNIME-workflows across a global Research IT landscape poses a significant challenge

• 500+ end users benefit from a deployment, but are also affected by a service outage

• Unpredictable usage / load patterns

• Robust error handling essential

• The deployment mechanism has to

• be adaptable to variable load / usage (dynamic scalability)

• reattempt failed calculations up to n times

• pre-load / cache workflows to ensure responsiveness

• restart KNIME instances regularly

• Decision was made to implement a customized deployment mechanism


CCFW – Integration with KNIME

[Diagram labels: Job Monitor; BI-internal HPC / cloud environment; SGE Queueing System; several KNIME Worker instances]

• Conductor ensures a given number (50-100) of KNIME worker instances is active at all times

• Workers terminate after a given period of time to avoid Java-specific memory / resource issues

• Jobs are placed in the cluster by a cluster queueing system (SGE)
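The conductor behaviour described on this slide might look roughly like the following. It is a minimal simulation, not the actual implementation: the class names, the `tick` method and the numeric values are assumptions, and `submit_worker` merely stands in for submitting a worker job to the cluster queue. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Worker:
    worker_id: int
    started_at: float
    max_lifetime: float  # worker retires after this many seconds

    def expired(self, now: float) -> bool:
        return now - self.started_at >= self.max_lifetime


class Conductor:
    """Keeps a fixed number of workers alive and replaces expired ones."""

    def __init__(self, target: int, max_lifetime: float):
        self.target = target
        self.max_lifetime = max_lifetime
        self.workers = []
        self._ids = count(1)

    def submit_worker(self, now: float) -> Worker:
        # Stand-in for submitting a worker job to the cluster queue.
        worker = Worker(next(self._ids), now, self.max_lifetime)
        self.workers.append(worker)
        return worker

    def tick(self, now: float) -> int:
        """Drop expired workers, top up to the target; return number started."""
        self.workers = [w for w in self.workers if not w.expired(now)]
        missing = max(self.target - len(self.workers), 0)
        for _ in range(missing):
            self.submit_worker(now)
        return missing


conductor = Conductor(target=3, max_lifetime=60.0)
started_first = conductor.tick(now=0.0)    # pool is empty: 3 workers started
started_second = conductor.tick(now=30.0)  # nobody expired yet: 0 started
started_third = conductor.tick(now=60.0)   # all 3 reached their lifetime
print(started_first, started_second, started_third)  # 3 0 3
```

In this sketch the lifetime limit and the pool size are independent settings, so workers can be recycled regularly without the number of active instances dropping for longer than one conductor cycle.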


CCFW – Integration with KNIME

[Diagram labels: Job Monitor; BI-internal HPC / cloud environment; SGE Queueing System; KNIME Worker instances; CCFW; Requests; Request / Response Spooler; Responses]

• Requests are placed by the CCFW in a request spooling mechanism

• KNIME worker instances monitor the spooler and retrieve requests (modified KNIME batch executor)

• Surplus requests are stored until workers become available

• Up to n attempts to process a request in case of failure / long processing time
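The spooling and retry logic above can be illustrated with a small in-memory sketch. It is an assumption-laden stand-in, not the modified KNIME batch executor: `Spooler`, `work_once` and the sample `process` function are hypothetical names, a single loop plays the role of all workers, and only failures are retried (the "long processing time" case would need a timeout, which is not modelled here).

```python
from collections import deque


class Spooler:
    """In-memory stand-in for the request / response spooler."""

    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self.pending = deque()  # (request_id, payload, attempts so far)
        self.responses = {}     # request_id -> result
        self.failed = {}        # request_id -> last error message

    def submit(self, request_id, payload):
        self.pending.append((request_id, payload, 0))

    def work_once(self, process) -> bool:
        """Let one worker take one request; return False if none is waiting."""
        if not self.pending:
            return False
        request_id, payload, attempts = self.pending.popleft()
        try:
            self.responses[request_id] = process(payload)
        except Exception as exc:
            attempts += 1
            if attempts < self.max_attempts:
                # Re-queue so that a worker can retry the request later.
                self.pending.append((request_id, payload, attempts))
            else:
                self.failed[request_id] = str(exc)
        return True


calls = {"flaky": 0}


def process(payload):
    # Hypothetical calculation: "flaky" fails on its first attempt only.
    if payload == "flaky":
        calls["flaky"] += 1
        if calls["flaky"] == 1:
            raise RuntimeError("transient failure")
    if payload == "broken":
        raise ValueError("cannot be processed")
    return payload.upper()


spooler = Spooler(max_attempts=3)
for i, payload in enumerate(["ok", "flaky", "broken"]):
    spooler.submit(i, payload)
while spooler.work_once(process):
    pass

print(spooler.responses)  # {0: 'OK', 1: 'FLAKY'}
print(spooler.failed)     # {2: 'cannot be processed'}
```

Because surplus requests simply wait in the queue, the CCFW side never needs to know how many workers are currently alive.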


CCFW – Integration with KNIME

• Workflow templates are available for typical input / output types, allowing users to focus on content


CCFW – Integration with KNIME

• Service publishing / registration can be done within minutes for typical input / output types



CCFW – Usage Statistics

[Chart legend: PhysChem & Molecular Descriptors; ADME(T) Predictions; General CI & Infrastructure]

• 1.5M+ requests processed in 2014 with a very high usage / load variability over time


CCFW – Integration into MarvinSketch


CCFW – Integration into D360


CCFW – Integration into KNIME and Pipeline Pilot


Summary and Discussion

• Calculated parameters & predictive models contribute significantly to drug discovery research

• Some hurdles to adoption are difficult to address and are likely to persist:

• “Black-box” models with limited interpretability

• Skepticism concerning the “unphysical” character of models

• Applicability domain / error-bar estimations

• Other issues of practical relevance are much easier to address and might increase end-user acceptance:

• Convenient access to calculations and models from relevant frontends

• Consistent use of the same engines across a large organisation

• Response times and throughput


Summary and Discussion (II)

• Large-scale deployment of KNIME-workflows to various tools and heterogeneous user groups is possible

• Some challenges and issues had to be resolved

• Technical incompatibilities of different tools and platforms

• Licensing & access control

• Robust and scalable deployment

• Development opportunities for KNIME for this application field:

• Performance tuning of workflows; identification of major bottlenecks

• Debugging of workflows; linking log file entries to individual nodes

• “Workflow refactoring”


Acknowledgements

• Torry Harris Business Solutions

• Janez Banic

• Ajay Kannan

• Shree Lakshmi

• Maya Madhusudan

• Shubha Sridhar

• Rishu Srivastava

• KNIME

• Bernd Wiswedel

• Boehringer Ingelheim

• Bernd Beck

• Jörg Bentzien

• Andreas Bergner

• Gerald Birringer

• Robert Happel

• Oliver Krämer

• Araz Jakalian

• Jan Kriegl

• Johannes Koppe

• Ingo Mügge

• Prasenjit Mukherjee

• Edith Richter

• Matthias Röhm

• Andreas Teckentrup

• Nils Weskamp

• Matthias Zentgraf