a visual programming approach for intuitive handling of

1
2019-06-13 17:35:59,040 2019-06-13 17:36:10,677 2019-06-13 17:36:22,710 2019-06-13 17:36:34,827 2019-06-13 17:36:35,296 2019-06-13 17:36:47,292 2019-06-13 17:36:47,760 2019-06-13 17:36:59,290 2019-06-13 17:36:59,813 2019-06-13 17:37:00,287 2019-06-13 17:37:11,483 2019-06-13 17:37:11,952 2019-06-13 17:37:12,421 1 1 1 1 3 1 3 1 3 4 1 3 4 91,4165724839181 91,4977221972207 91,4828750193322 91,4680310948919 94,5678994962959 91,4531904229532 94,6518133572467 91,4531904229532 94,5834377966272 95,6246640379906 91,3572556621537 94,6673653803266 95,370168263591 I D Dissolved Oxygen conc. Date&Time During the last decade the expansion of high throughput biology and automated bioprocessing workflows has progressed by the advances in robotics, automation and sensor technology. As a result, the size and number of datasets dramatically grew as did the complexity. An upstream bioprocess engineer or scientist may have to manipulate & wrangle datasets from several different sources, such as online bioreactor profiles, and offline sample analysis, to process one typical experiment. This data wrangling burden to the end users often counteracts any efficiency advantage from using high throughput tools. This generated a requirement for data aware scientists to intervene over the wrangling and processing of complex datasets. This inefficient approach desperately needs a streamlined workflow to eliminate the need of experts for repetitive data processing tasks. An innovative approach to simplify and democratize basic data processing tasks is the introduction of visual programming. Visual programming is a paradigm that enables programs or scripts to be created by connecting graphical elements and illustrations instead of writing complex computation as text. This allows an end user, with no prior coding experience, to describe a process using an intuitive drag’n’drop approach. This degree of freedom enables lab scientists to perform rapid data processing based upon functions already implemented as graphical building blocks. In addition, this frees data scientists to focus on more complex challenges, than general data wrangling tasks. This simplification process allows for the easy combination of workflow steps into “macro” elements, which enhances the clarity of the application. Several of these overarching macro elements were implemented. These included the transfer of specific measured parameters (DO, pH, current volume) of a specific fermentation vessel into designated tables, as well as string to number conversion combined with a forward fill of the imported raw data. Furthermore, data visualization features were introduced as part of the data processing “assembly line”. Again, the basic components of KNIME were modified to fit the demanded visualization application (see Figure 2). All generated visual programming building blocks were evaluated by laboratory staff. The technique was identified as a fast an simple approach to perform basic data wrangling and visualization tasks. Introduction Robert Söldner ,*1 , Jonas Austerjost 1 , Simon Stumm 1 , David J. Pollard 1 1 Sartorius AG, Corporate Research, August-Spindler-Straße 11, 37097 * Corresponding author: [email protected] A visual programming approach for intuitive handling of bioprocess data The initialized visual programming application is based on the free and open-source KNIME Analytics Platform [1]. For the realization of the component, several recurrent data wrangling tasks within an R&D bioprocessing laboratory were identified. For multi-parallel bioreactor system batch data, these data wrangling tasks included the removal of undefined data points (NaN – Not a Number), forward filling of data points, data visualization and more. Based on these identifications, specific building components were implemented that cover the specific tasks. Subsequently the usability and the intuitiveness of the custom components was tested by lab scientists with no prior coding experience. Approach The implemented components are based on KNIME standard nodes, which were customized to fit to the structure and content of the bioprocessing raw data. These standard nodes were then condensed into single components to remove the complexity of the workflow (see Figure 1). Results Figure 1: Connected nodes (A), which depict the data wrangling process of multi-parallel bioreactor system batch raw data into a structured table with the desired features (B). A simplification of the data wrangling workflow was introduced by condensing the processing nodes of flow A) into a single component (flow B). A) B) C) Condensation Figure 2: Connected nodes, which depict a data wrangling and visualization process of multi-parallel bioreactor system batch raw data. Plottable features of the preprocessed raw data file were selected and plotted in line plots. The presented visual programming approach identified as a valuable tool for democratizing and accelerating the data wrangling and processing routine in a bioprocessing laboratory. Work that previously was performed by SMEs can now be executed by all parties in a short amount of time. Although the concept was assessed on bioprocessing data, the initialized approach is applicable to various kinds of structured or unstructured data. A unified, and broadly agreed standard for analytical and process data could make the process of custom building blocks redundant, though. Future implementations are aiming to integrate data analysis components, such as SIMCA-Q [2]. The concept of visual programming is not limited to data processing. The integration of already existing visual hardware, service and software wiring platforms into the laboratory and manufacturing space could facilitate and democratize equipment integration tasks and the setup of process automation pipelines. Conclusion & Outlook [1] KNIME Analytics Platform - https ://www.knime.com/knime- analytics-platform [2] SIMCA-Q Data Analysis Software - https:// umetrics.com/products/simca-q References

Upload: others

Post on 08-Apr-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

2019-06-13 17:35:59,0402019-06-13 17:36:10,6772019-06-13 17:36:22,7102019-06-13 17:36:34,8272019-06-13 17:36:35,2962019-06-13 17:36:47,2922019-06-13 17:36:47,7602019-06-13 17:36:59,2902019-06-13 17:36:59,8132019-06-13 17:37:00,2872019-06-13 17:37:11,4832019-06-13 17:37:11,9522019-06-13 17:37:12,421

1111313134134

91,416572483918191,497722197220791,482875019332291,468031094891994,567899496295991,453190422953294,651813357246791,453190422953294,583437796627295,624664037990691,357255662153794,667365380326695,370168263591

ID

Dissolved Oxygen conc.

Date&Time

During the last decade the expansion of high throughput biology andautomated bioprocessing workflows has progressed by the advances inrobotics, automation and sensor technology. As a result, the size andnumber of datasets dramatically grew as did the complexity. Anupstream bioprocess engineer or scientist may have to manipulate &wrangle datasets from several different sources, such as onlinebioreactor profiles, and offline sample analysis, to process one typicalexperiment. This data wrangling burden to the end users oftencounteracts any efficiency advantage from using high throughput tools.This generated a requirement for data aware scientists to interveneover the wrangling and processing of complex datasets. This inefficientapproach desperately needs a streamlined workflow to eliminate theneed of experts for repetitive data processing tasks.An innovative approach to simplify and democratize basic dataprocessing tasks is the introduction of visual programming.Visual programming is a paradigm that enables programs or scripts tobe created by connecting graphical elements and illustrations insteadof writing complex computation as text. This allows an end user, with noprior coding experience, to describe a process using an intuitivedrag’n’drop approach. This degree of freedom enables lab scientists toperform rapid data processing based upon functions alreadyimplemented as graphical building blocks. In addition, this frees datascientists to focus on more complex challenges, than general datawrangling tasks.

This simplification process allows for the easy combination of workflowsteps into “macro” elements, which enhances the clarity of theapplication. Several of these overarching macro elements wereimplemented. These included the transfer of specific measuredparameters (DO, pH, current volume) of a specific fermentation vesselinto designated tables, as well as string to number conversioncombined with a forward fill of the imported raw data.Furthermore, data visualization features were introduced as part of thedata processing “assembly line”. Again, the basic components ofKNIME were modified to fit the demanded visualization application(see Figure 2).

All generated visual programming building blocks were evaluated bylaboratory staff. The technique was identified as a fast an simpleapproach to perform basic data wrangling and visualization tasks.

Introduction

Robert Söldner,*1, Jonas Austerjost1, Simon Stumm1, David J. Pollard1

1 Sartorius AG, Corporate Research, August-Spindler-Straße 11, 37097* Corresponding author: [email protected]

A visual programming approach for intuitive handling of bioprocess data

The initialized visual programming application is based on the free andopen-source KNIME Analytics Platform [1]. For the realization of thecomponent, several recurrent data wrangling tasks within an R&Dbioprocessing laboratory were identified. For multi-parallel bioreactorsystem batch data, these data wrangling tasks included the removal ofundefined data points (NaN – Not a Number), forward filling of datapoints, data visualization and more. Based on these identifications,specific building components were implemented that cover thespecific tasks. Subsequently the usability and the intuitiveness of thecustom components was tested by lab scientists with no prior codingexperience.

Approach

The implemented components are based on KNIME standard nodes,which were customized to fit to the structure and content of thebioprocessing raw data. These standard nodes were then condensedinto single components to remove the complexity of the workflow (seeFigure 1).

Results

Figure 1: Connected nodes (A), which depict the data wrangling process of multi-parallel bioreactor system batch raw datainto a structured table with the desired features (B). A simplification of the data wrangling workflow was introduced bycondensing the processing nodes of flow A) into a single component (flow B).

A) B)

C)Condensation

Figure 2: Connected nodes, which depict a data wrangling and visualization process of multi-parallel bioreactor systembatch raw data. Plottable features of the preprocessed raw data file were selected and plotted in line plots.

The presented visual programming approach identified as a valuabletool for democratizing and accelerating the data wrangling andprocessing routine in a bioprocessing laboratory. Work that previouslywas performed by SMEs can now be executed by all parties in a shortamount of time. Although the concept was assessed on bioprocessingdata, the initialized approach is applicable to various kinds of structuredor unstructured data. A unified, and broadly agreed standard foranalytical and process data could make the process of custom buildingblocks redundant, though. Future implementations are aiming tointegrate data analysis components, such as SIMCA-Q [2].The concept of visual programming is not limited to data processing.The integration of already existing visual hardware, service and softwarewiring platforms into the laboratory and manufacturing space couldfacilitate and democratize equipment integration tasks and the setup ofprocess automation pipelines.

Conclusion & Outlook

[1] KNIME Analytics Platform - https://www.knime.com/knime-analytics-platform[2] SIMCA-Q Data Analysis Software -https://umetrics.com/products/simca-q

References