Automated Workflows for Integrated Project Data Analysis Using Spotfire: the SpotAPP Family. Nicolas Zorn, CADD Group, Roche Innovation Center Basel. Basel Tibco Spotfire UGM, Nov 3rd 2016

Page 1

Automated Workflows for Integrated Project Data Analysis Using Spotfire: the SpotAPP Family

Nicolas Zorn, CADD Group, Roche Innovation Center Basel

Basel Tibco Spotfire UGM, Nov 3rd 2016

Page 2

Preamble

Drug discovery cycle: Design, Synthesis, Testing, Analysis

Data optimization cycle: Retrieval, Enrichment, Processing, Presentation

Key features for success: relevance, efficiency, agility – integration & interactivity

Page 3

Building-up Capabilities to Help Answer Key Project Questions

[Diagram labels: Action, Knowledge, Information, Data; Decision-making!]

• SAR, SPR exploration
• Predicted properties
• Virtual Compounds
• Project-centric data mining and analysis

Infrastructure: Automation, Flexibility, Modularity

System for Agile Development

Page 4

Presentation Outline

• Overview and concepts of the SpotAPP workflow

• Feature highlights
  – Activity-efficacy data analysis
  – MMPs and project SAR analyses
  – ivive (in vitro to in vivo extrapolation) and PK/PD

• Workflow expansion to HTS data analysis: SpotHTS

• Conclusions, Perspectives

Page 5

SpotAPP: Integrated Project Data Mining and Analysis

Spotfire Automated Project Data Processing

• Roche internal databases (in house)
• Off-target, HTS and LTS data
• Structural data
• In house molecular property predictions
• Matched Molecular Pairs and SAR analyses
• Cluster analysis and series tagging
• Custom calculations (selectivity, LE, ...)
• External data (ChEMBL, patent data…)
• PK/PD and ivive
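
The slide does not detail the "Custom calculations (selectivity, LE, ...)" component. As a minimal sketch, assuming IC50 values in nM and the commonly used definition of ligand efficiency (LE = -ΔG / heavy atom count, roughly 1.37 × pIC50 / heavy atoms at 300 K), such derived columns could be computed as shown here. The function names and input units are illustrative assumptions, not taken from SpotAPP.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 (-log10 of the molar IC50)."""
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency(pic50: float, heavy_atoms: int,
                      temperature_k: float = 300.0) -> float:
    """Ligand efficiency in kcal/mol per heavy atom, using pIC50 as a
    stand-in for pKd (a common approximation, assumed here)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(10) * pic50 / heavy_atoms


def selectivity(ic50_off_target_nm: float, ic50_on_target_nm: float) -> float:
    """Fold selectivity: off-target IC50 divided by on-target IC50."""
    return ic50_off_target_nm / ic50_on_target_nm
```

For example, a compound with an on-target IC50 of 100 nM and 25 heavy atoms has a pIC50 of 7 and an LE of about 0.38 kcal/mol per heavy atom.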

Page 6

Implementation Overview

• Project data flow
  – Data sources
  – Processing
  – Project data tables (Linux FS and WIN shares)
  – Spotfire library project template: tables are linked and auto-embedded
  – Spotfire server automation service: reloads and embeds data tables after each update

• Overall process control: CRON table drives SpotAPP project runs (frequency and options)
  – Full flexibility: options passed as one-letter codes for CRON jobs
  – Daily update: all projects run in ~3h every night
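
The slide states only that options are passed to the CRON jobs as one-letter codes. The following sketch shows how such codes might be decoded into named processing switches. The letters, the option names and the `PROJECT codes` line layout are all hypothetical; the actual SpotAPP codes are not given in the slides.

```python
# Hypothetical mapping of one-letter codes to processing options.
OPTION_CODES = {
    "o": "off_target",
    "p": "predicted_properties",
    "c": "clustering",
    "k": "pkpd",
}


def parse_option_codes(codes: str) -> dict:
    """Turn a string such as 'ock' into {option_name: enabled} flags."""
    unknown = sorted(set(codes) - set(OPTION_CODES))
    if unknown:
        raise ValueError(f"unknown option code(s): {', '.join(unknown)}")
    return {name: letter in codes for letter, name in OPTION_CODES.items()}


def parse_job_line(line: str) -> tuple:
    """Split an assumed 'PROJECT codes' job entry into project and options."""
    project, _, codes = line.strip().partition(" ")
    return project, parse_option_codes(codes.strip())
```

A job entry such as `PROJ_A ock` would then enable off-target retrieval, clustering and the PK/PD module for project `PROJ_A`, while leaving predicted properties switched off.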

Page 7

Project Data Structure during Processing

• Standard aggregation rules apply, then possibility to override/refine with project-custom rules

• Optionally, unpivoted data can be exposed at several aggregation levels

Aggregation levels:
• Non-pivoted, non-aggregated: 1 row per result
• Batch aggregation: 1 row per result & per batch
• Substance aggregation: 1 row per result & per substance
• Project custom aggregation then pivoting: 1 row per substance

Resulting tables: partially aggregated or raw data tables, unpivoted data table, pivoted data table
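
The aggregation levels listed on this slide can be sketched with plain Python structures. This is a minimal illustration, assuming an arithmetic mean as the aggregation rule and made-up column names (`substance`, `batch`, `assay`, `value`); the real standard and project-custom rules are not specified in the slides.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, keys, agg=mean):
    """Group rows by the given key columns and aggregate their 'value'."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["value"])
    return [dict(zip(keys, key), value=agg(values))
            for key, values in groups.items()]


def pivot(rows):
    """Turn unpivoted (substance, assay, value) rows into 1 row per substance."""
    table = defaultdict(dict)
    for row in rows:
        table[row["substance"]][row["assay"]] = row["value"]
    return dict(table)


# Non-pivoted, non-aggregated data: 1 row per result.
raw = [
    {"substance": "S1", "batch": "B1", "assay": "IC50_uM", "value": 1.0},
    {"substance": "S1", "batch": "B1", "assay": "IC50_uM", "value": 3.0},
    {"substance": "S1", "batch": "B2", "assay": "IC50_uM", "value": 4.0},
    {"substance": "S1", "batch": "B1", "assay": "logD", "value": 2.5},
    {"substance": "S2", "batch": "B1", "assay": "IC50_uM", "value": 8.0},
]

per_batch = aggregate(raw, ("substance", "batch", "assay"))   # batch aggregation
per_substance = aggregate(per_batch, ("substance", "assay"))  # substance aggregation
pivoted = pivot(per_substance)                                # 1 row per substance
```

Keeping `per_batch` and `per_substance` alongside `pivoted` mirrors the idea of exposing unpivoted data at several aggregation levels for later drill-down.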

Page 8

Project Data Processing: Standard & Customized

[Diagram: processing flow]
• Project data: Input cleanup; Retrieve data, std aggregation (unpivoted data); Process structures, properties; Project aggregation, pivoting (pivoted data); Final formatting
• Ancillary data: Retrieve and process off-tgt data; Predicted properties; Clustering, series tagging; Final formatting
• PKPD module

• Auxiliary Pipeline Pilot protocol: easy to set up from a template to insert specific data manipulations

• Project control file: defines input, processing options and team decisions on desired data format
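
The format of the project control file is not shown in the slides. Purely as an illustration, the sketch below assumes an INI-style file and reads the inputs, processing options and enabled modules from it; every section name, key and value is hypothetical.

```python
import configparser

# Hypothetical control file content; the real SpotAPP format is not documented here.
CONTROL_FILE = """
[input]
project = DEMO
assays = IC50_target, IC50_offtarget

[processing]
aggregation = mean
pivot = yes

[modules]
clustering = yes
pkpd = no
"""


def load_control(text: str) -> dict:
    """Parse the assumed INI-style control file into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "project": parser.get("input", "project"),
        "assays": [a.strip() for a in parser.get("input", "assays").split(",")],
        "aggregation": parser.get("processing", "aggregation"),
        "pivot": parser.getboolean("processing", "pivot"),
        "modules": [name for name in parser.options("modules")
                    if parser.getboolean("modules", name)],
    }


control = load_control(CONTROL_FILE)
```

Keeping such decisions in a per-project file rather than in code is one way to let standard processing stay shared while each team adjusts its own options.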

Page 9

Advanced Series Tagging

• Project-specific substructure definitions (Molfile). Complex search hierarchy, multiple fields possible.

• Used to label series, substructures, motifs…

• Can be combined with on-the-fly substructure (SS) searches.

Page 10

SpotAPP Highlight: Activity and Efficacy Data Analysis

[Diagram: SpotAPP components, repeated from Page 5]

Page 11

On-target Activity Data Browsing and Compound Profiling

IronPython script captures marked compounds as tag.

Marked compounds put in ‘shopping cart’ for side-by-side comparison and profiling

• Latest processed data is always exposed in project SpotAPP package

• Browse data across all available dimensions (activities, properties…)

Pivoted data used for overview and data correlations.

Page 12

Activity–Efficacy Data Analysis and Drilldown

• Efficacy data analyzed using complex set of conditions: target subtype, species, doses, measurement mode…

Interplay of Spotfire pivoted and unpivoted tables: instantaneous data drill-down.

Page 13

SpotAPP Highlight: SAR Analysis of a Complex Data Set

[Diagram: SpotAPP components, repeated from Page 5]

Page 14

SAR Analyses: Integration of Standard Methods into Project Context

• SpotAPP exposes and connects in-house Python-based SAR tools:
  – R-group decomposition
  – Matched pairs/series
  – Non-additivity analysis

• Advanced SAR analyses are pre-processed using activities/properties selected by the team and can make use of the SpotAPP series tags.

• Analysis results from SAR tools can then be connected to any other project data.

• Customization is possible by applying the same concepts as presented for regular project data.

[Diagram labels: Project SAR; R-groups; MMPs, Non-add; Free-Wilson]
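
Non-additivity analysis is named on this slide without further detail. A common formulation uses a double-transformation cycle of four compounds that combine two substituent changes; the sketch below assumes that formulation and log-scale activities (e.g. pIC50). The in-house tool may work differently.

```python
def nonadditivity(p_r1_r2: float, p_r1x_r2: float,
                  p_r1_r2x: float, p_r1x_r2x: float) -> float:
    """Non-additivity of a double-transformation cycle.

    Arguments are log-scale activities of the four compounds carrying
    (R1, R2), (R1', R2), (R1, R2') and (R1', R2'). The result is zero when
    the effect of R1 -> R1' is the same regardless of the second substituent.
    """
    effect_with_r2x = p_r1x_r2x - p_r1_r2x  # R1 -> R1' in the presence of R2'
    effect_with_r2 = p_r1x_r2 - p_r1_r2     # R1 -> R1' in the presence of R2
    return effect_with_r2x - effect_with_r2
```

A cycle with pIC50 values 6.0, 7.0, 6.5 and 7.5 is perfectly additive (result 0.0), whereas 6.0, 7.0, 6.5 and 9.0 gives a non-additivity of 1.5 log units.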

Page 15

R-group Decomposition Analysis

• Setup for single/multiple activity R-group matrix visualization and analysis

• Uses series tags when R attachments are defined → browse R-group cores using project definitions

1. Select core, R-groups and activity
2. Select compounds in R-group matrix
3. See SAR and trends for 1+ activity(ies)

Connect to MMPs data

Page 16

Matched Molecular Pairs in SpotAPP

[Figure: matched molecular pairs (X, Y) sharing a common core, with property differences shown per pair: ΔIC50, ΔKi, ΔPgp, ΔHepCl]

Data connected through MMPs:
• 'Entry point' data
• Other project activity data
• Project MDO data
• Global MMPs knowledge

Approach can be combined with ad hoc MMPs calculations…

Page 17

MMPs Workflow: Filter-down to Desired Pairs / Series

MMPs workhorse: Python platform designed for processing of large data & interactive queries

1. Filter-to/mark set of MMPs to answer question
2. Browse/select individual pairs
3. Analyze Δ(activity/properties) for selected pairs and/or current marked compounds (R-groups…)

Aggregated MMPs trends (e.g. Avg, Geo…)

Line plot: identify outliers
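
The Δ(activity/properties) analysis and the aggregated trends in this workflow can be sketched as follows. This is a minimal illustration, assuming pairs are given as (left compound, right compound, transformation) tuples and activities as pIC50 values; the identifiers and transformation labels are invented, and the in-house Python platform is not shown in the slides.

```python
from collections import defaultdict
from statistics import mean


def pair_deltas(pairs, values):
    """Compute the property difference (right minus left) for each pair
    where both compounds have a measured value."""
    deltas = []
    for left, right, transformation in pairs:
        if left in values and right in values:
            deltas.append((transformation, left, right,
                           values[right] - values[left]))
    return deltas


def transformation_trends(deltas):
    """Aggregate deltas per transformation: pair count and mean delta."""
    grouped = defaultdict(list)
    for transformation, _left, _right, delta in deltas:
        grouped[transformation].append(delta)
    return {t: {"n": len(d), "mean_delta": mean(d)} for t, d in grouped.items()}


# Invented example data: compound IDs with pIC50 values.
pic50 = {"C1": 6.0, "C2": 6.5, "C3": 5.0, "C4": 6.0, "C5": 5.5}
pairs = [
    ("C1", "C2", "H>>F"),
    ("C3", "C4", "H>>F"),
    ("C1", "C5", "H>>CH3"),
    ("C1", "C9", "H>>Cl"),  # C9 has no measurement and is skipped
]

trends = transformation_trends(pair_deltas(pairs, pic50))
```

On a log scale, the mean delta corresponds to a geometric mean of the underlying activity ratios, which may be what the "Avg, Geo" aggregation options refer to.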

Page 18

MMPs Workflow: Interactive Substructure Search

• Allows fast substructure-based filtering using core and variable fragments; can be combined with other filters

Discngine Panel used as UI for the substructure (SS) search process and for reporting results into the SpotAPP package

Page 19

SpotAPP Highlight: Prototype ivive Calculations

[Diagram: SpotAPP components, repeated from Page 5]

Page 20

Concepts for SpotAPP ‘PK/PD’ Module Prototype

Goals:
• Provide facilitated and standardized calculations of derived PK properties to team PK reps.
• Enable consistent decision making & expose key visualizations to teams.

Key principles:
• Automation of calculations using a central, validated R script
• Implementation of different clearance scaling calculation methods → comparison and selection of most appropriate method to share and use
• Customization of script behavior per project using control file and ad hoc data files (if needed)

[Diagram labels: Internal DBs; Main data table; Main SpotAPP package; Custom input / parameters; PK/PD processing (process controlled by PK rep.); Complete PK tables; Advanced PK and PK/PD data package & features]

Page 21

Example of Advanced PK Calculations Available to PK Rep. for Decision-making

• Example of different methods providing clearance scaling from hepatocytes

• PK concentration-time curves from individual animals and aggregated over treatment groups

[Plots: in vivo CLb_unbound_int_hep and in vitro CLint_hep, CLint in mL/min/kg, one panel per method]
• Assume no binding
• Estimate unspecific binding (Houston); dilution method
• Estimation of protein binding in hepatocyte incubation medium: fu from preclinical species
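
The comparison of in vitro and in vivo intrinsic clearance shown here can be illustrated with the well-stirred liver model, a standard textbook approach. The sketch below is an assumption about the kind of calculation involved, not the validated R script mentioned in the slides. It is written in Python for illustration, and the scaling factors (hepatocellularity, liver weight, hepatic blood flow) are placeholders to be set per species.

```python
def scale_in_vitro_clint(clint_ul_min_per_1e6_cells: float,
                         million_cells_per_g_liver: float,
                         g_liver_per_kg: float,
                         fu_inc: float = 1.0) -> float:
    """Scale hepatocyte CLint (uL/min/10^6 cells) to unbound CLint in mL/min/kg.

    fu_inc is the unbound fraction in the incubation; 1.0 corresponds to
    the 'assume no binding' case.
    """
    scaled = (clint_ul_min_per_1e6_cells * million_cells_per_g_liver
              * g_liver_per_kg / 1000.0)
    return scaled / fu_inc


def predicted_hepatic_clearance(clint_unbound: float, fu_blood: float,
                                q_h: float) -> float:
    """Well-stirred model: hepatic blood clearance from unbound CLint."""
    return q_h * fu_blood * clint_unbound / (q_h + fu_blood * clint_unbound)


def in_vivo_unbound_clint(cl_blood: float, fu_blood: float, q_h: float) -> float:
    """Back-calculate unbound intrinsic clearance from observed blood clearance
    by rearranging the well-stirred model."""
    if not 0.0 <= cl_blood < q_h:
        raise ValueError("blood clearance must be below hepatic blood flow")
    return cl_blood / (fu_blood * (1.0 - cl_blood / q_h))
```

With placeholder values of 120 million cells per g liver and 25 g liver per kg, an in vitro CLint of 10 µL/min/10^6 cells scales to 30 mL/min/kg; correcting for an incubation unbound fraction of 0.5 doubles that estimate, which is the kind of method difference the slide compares.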

Page 22

SpotAPP Workflow Adaptation To HTS Analysis

[Diagram labels: Action, Knowledge, Information, Data; HTS data package]

• Early HIT SAR exploration
• Off-target, ancillary profile
• Hit-expansion
• HTS screen results, general compound info

Infrastructure: Automation, Flexibility, Modularity

Re-use of Standard Processing Modules

Page 23

Multi-Dimensional HTS Data Analysis from a Chemistry Perspective

• HTS potency
• HTS counter-screens
• HTS promiscuity
• Ligand efficiency
• Properties
• SAR potential
• Chemical tractability: availability, purity, synthesis…
• Target / gene / pathway data
• Internal & external knowledge
• Chemical diversity

Page 24

SpotHTS Workflow Overview

[Diagram labels, components of the HTS Package:]
• Primary screen data (single points); HTS confirmation data (dose-responses)
• HTS results; DR curves; HTS hit history
• Roche DB; MDO, props; MDO, purity; Off-tgt. data; Tox/Safety data; Kinase panel; CEREP panel
• External data
• Ligand eff.; Clusterings; PAINS; Tags, labels
• SEA analysis; Ontology analysis
• Advanced data mining (Phenotypic/assay profiling…)

Page 25

HTS Data Package: Highlights

HTS history and known in-house activities

Chemical space clustering and diversity analysis

Dashboard for multi-parametric hit analysis

• Platform for narrowing down a high primary hit rate and analyzing hits

• Used for internal prioritization & sharing with external partners

Page 26

Knowledge Capture: Hits Annotation within SpotHTS Package

• Team members can annotate and label compounds inside SpotHTS package.

Annotations captured in Oracle DB via information link, then embedded in HTS package as data table.

1. Select hit(s) in analyses and provide annotation

2. Retrieve and analyze compounds based on annotations

Page 27

Integrated Hit Expansion within SpotHTS Package

• Retrieval of top-100 similar compounds for all hits with dose-response data. Done as part of data processing (2D-based similarity).

→ Use as initial info to assess hit SAR potential and screening follow-up activities

IronPython scripts in Spotfire to perform automatic list logic and markings.

1. Select hit(s) in analyses
2. See all IRCI closest neighbors
3. Find overlap and compounds not screened yet

Visualize data and stock availability, purity...
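
The hit-expansion steps on this slide combine a similarity ranking with list logic. The sketch below illustrates both in plain Python, assuming fingerprints are available as sets of "on" bits and using Tanimoto similarity as the 2D measure. The fingerprint type, compound identifiers and function names are assumptions; the slides do not specify them, and the original list logic runs as IronPython scripts inside Spotfire.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_neighbors(hit_id: str, fingerprints: dict, k: int = 100) -> list:
    """Return the k compounds most similar to the hit (ties broken by ID)."""
    query = fingerprints[hit_id]
    scored = [(tanimoto(query, fp), cid)
              for cid, fp in fingerprints.items() if cid != hit_id]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cid for _score, cid in scored[:k]]


def expansion_sets(selected_hits: list, neighbors: dict, screened: set) -> dict:
    """List logic for selected hits: shared neighbors and unscreened neighbors."""
    neighbor_sets = [set(neighbors[h]) for h in selected_hits]
    all_neighbors = set().union(*neighbor_sets) - set(selected_hits)
    shared = set.intersection(*neighbor_sets) if neighbor_sets else set()
    return {"shared": shared, "not_screened": all_neighbors - screened}


# Toy fingerprints (sets of on-bits) for two hits and four other compounds.
fingerprints = {
    "H1": {1, 2, 3, 4}, "H2": {3, 4, 5, 6},
    "N1": {1, 2, 3}, "N2": {3, 4, 5}, "N3": {3, 4}, "N4": {7, 8},
}
neighbors = {hit: top_neighbors(hit, fingerprints, k=3) for hit in ("H1", "H2")}
result = expansion_sets(["H1", "H2"], neighbors, screened={"H1", "H2", "N2"})
```

In this toy data set, `N2` and `N3` are neighbors of both hits, and `N1` and `N3` are the neighbors that have not been screened yet.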

Page 28

Conclusions

• SpotAPP ecosystem has grown organically over the last couple of years at Roche as a DEV platform
  – For project teams: provides integrated project data, advanced SAR and PK analyses…
  – For experts-developers: helps test new features, data models & custom visualizations.

• Core design features for efficient data delivery: flexible, automated, customizable to project critical needs

• SpotAPP standard components and logic shared by other Spot* family members (potentially also by Roche New Assay Data Analysis Landscape tools)
  – SpotHTS variant for HTS analysis: on the way to integrated hit delivery

[Diagram labels, Spot* family: HTS, PK, MDO, *]

Page 29

Perspectives

• Spotfire is efficient and versatile as vector of new data models
  – especially for fast/complex data optimization cycles
  – excellent for relational data tables and large data volumes
  – IronPython, R, information links and data connections are powerful
  – enhanced features possible via add-ons (Discngine, JS D3,…)

• Challenges still remain for drug discovery community
  – UI and data presentation simplification for non-experts
  – Increased chemical intelligence & cheminformatics-guided workflows

[Diagram: data optimization cycle: Retrieval, Enrichment, Processing, Presentation]

Page 30

Acknowledgements

• Brian Jones, Yi Lin, Lisa Sach-Peltason, Christian Blumenroehr, Daniel Wenger, Olivier Roche, Martin Blapp, Gunther Doernen, Peter Hilty (pREDi), Paula Petrone

• Jerome Hert, Christian Kramer, Michael Reutlinger, Wolfgang Guba (CADD)

• Stefanie Bendels, Martin Kapps + many other contributors (PS)

• Katrin Groebke-Zbinden, John Cumming + many medicinal chemists for feedback-suggestions

• Eric Leroux (Discngine)

Page 31

Doing now what patients need next