
Automated Workflows for Integrated Project Data Analysis Using Spotfire: the SpotAPP Family

Nicolas Zorn, CADD Group, Roche Innovation Center Basel

Basel Tibco Spotfire UGM, Nov 3rd 2016

Preamble

Drug discovery cycle: Design, Synthesis, Testing, Analysis

Data optimization cycle: Retrieval, Enrichment, Processing, Presentation

Key features for success: relevance, efficiency, agility – integration & interactivity

Action

Building-up Capabilities to Help Answer Key Projects Questions

Knowledge

Information

Infrastructure

• SAR, SPR exploration

• Predicted properties

• Virtual Compounds

• Project-centric data mining and analysis

• Automation

• Flexibility

• Modularity

Data

Decision-making!

System for Agile Development

Presentation Outline

• Overview and concepts of the SpotAPP workflow

• Features highlights

– Activity-efficacy data analysis

– MMPs and project SAR analyses

– ivive and PK/PD

• Workflow expansion to HTS data analysis: SpotHTS

• Conclusions, Perspectives

SpotAPP: Integrated Project Data Mining and Analysis

Roche internal databases in house

Off-target, HTS and LTS data

Structural data

In house molecular property predictions

Matched Molecular Pairs and SAR analyses

Cluster analysis and series tagging

Custom calculations (selectivity, LE, ...)

Spotfire Automated Project Data Processing

External data (Chembl, patent data…)

PK/PD and ivive
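The custom calculations named above (selectivity, LE) can be sketched as follows. This is a minimal illustration, not the SpotAPP implementation: the function names are hypothetical, IC50 is used as a stand-in for a dissociation constant, and the temperature default is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


def ligand_efficiency(ic50_molar: float, heavy_atoms: int,
                      temp_k: float = 298.15) -> float:
    """Approximate LE in kcal/mol per heavy atom.

    Assumption: IC50 is treated as if it were Kd, so that
    deltaG = -RT * ln(10) * pIC50.
    """
    delta_g = -R_KCAL * temp_k * math.log(10) * pic50(ic50_molar)
    return -delta_g / heavy_atoms


def selectivity_ratio(ic50_off_target: float, ic50_on_target: float) -> float:
    """Fold selectivity: values above 1 mean the on-target is hit more potently."""
    return ic50_off_target / ic50_on_target
```

With these definitions a 10 nM compound with 25 heavy atoms gives an LE of roughly 0.44 kcal/mol per heavy atom.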

Implementation Overview

• Project data flow

• Overall process control: CRON table drives SpotAPP project runs (frequency and options)

Spotfire library project template

Project data tables (Linux FS and Windows shares)

Data sources

Tables are linked and auto-embedded

Spotfire server automation service: reloads and embeds data tables after each update.

Full flexibility: options passed as one-letter codes for CRON jobs

Daily update: all projects run in ~3 h every night
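How one-letter option codes on a CRON entry might be decoded into processing modules is sketched below. The letters, module names and line layout are all hypothetical; the slides do not list the real codes.

```python
# Hypothetical mapping: the real SpotAPP option letters are not given in the slides.
OPTION_CODES = {
    "o": "off_target_data",
    "p": "predicted_properties",
    "c": "clustering_series_tagging",
    "k": "pkpd_module",
}


def parse_options(flags: str) -> list[str]:
    """Translate a string of one-letter codes into an ordered list of module names."""
    modules = []
    for letter in flags:
        if letter not in OPTION_CODES:
            raise ValueError(f"unknown option code: {letter!r}")
        module = OPTION_CODES[letter]
        if module not in modules:  # ignore repeated letters
            modules.append(module)
    return modules


def parse_job_line(line: str) -> tuple[str, str, list[str]]:
    """Split an assumed '<5 cron fields> <project> <flags>' line into its parts."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 5 cron fields, a project name and an option string")
    schedule = " ".join(fields[:5])
    return schedule, fields[5], parse_options(fields[6])
```

For example, `parse_job_line("0 1 * * * projectA opk")` yields the schedule `0 1 * * *`, the project name and three module names, so frequency and options are both controlled from one table row.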

processing

Project Data Structure during Processing

• Standard aggregation rules apply, then possibility to override/refine with project-custom rules

• Optionally, unpivoted data can be exposed at several aggregation levels

Non-pivoted, non-aggregated

1 row per result

Batch aggregation

1 row per result & per batch

Substance aggregation

1 row per result & per subst.

Project custom aggregation then pivoting

1 row per substance

Unpivoted data table

Pivoted data table

Partially aggregated or raw data tables
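The aggregation levels above can be illustrated with a small standard-library sketch. The data, identifiers and the choice of geometric mean as the standard rule are assumptions for illustration; the `agg` parameter stands in for a project-custom override.

```python
from collections import defaultdict
from statistics import geometric_mean

# Non-pivoted, non-aggregated: one row per result (substance, batch, assay, value).
results = [
    ("S1", "B1", "IC50_nM", 10.0),
    ("S1", "B1", "IC50_nM", 40.0),
    ("S1", "B2", "IC50_nM", 80.0),
    ("S1", "B1", "Sol_uM", 100.0),
    ("S2", "B3", "IC50_nM", 500.0),
]


def aggregate(rows, key_fn, agg=geometric_mean):
    """Group rows by key_fn and reduce the values (last column) with agg."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row[-1])
    return {key: agg(values) for key, values in groups.items()}


def pivot(substance_values):
    """Turn {(substance, assay): value} into one row (dict) per substance."""
    table = defaultdict(dict)
    for (substance, assay), value in substance_values.items():
        table[substance][assay] = value
    return dict(table)


# Batch aggregation: one row per result type and per batch.
batch_level = aggregate(results, lambda r: (r[0], r[1], r[2]))

# Substance aggregation, computed from the batch-level values.
batch_rows = [(s, b, a, v) for (s, b, a), v in batch_level.items()]
substance_level = aggregate(batch_rows, lambda r: (r[0], r[2]))

# Pivoting: one row per substance, one column per result type.
pivoted = pivot(substance_level)
```

Passing a different `agg` (for instance `min` or `statistics.median`) at either level mimics a project-specific rule replacing the standard one, while the unpivoted intermediate tables stay available for drill-down.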

Project Data Processing: Standard & Customized

Processing steps (diagram labels): input cleanup; retrieve data; standard aggregation; process structures, properties, ancillary data; unpivoted data; project aggregation, pivoting; pivoted data

Input cleanup

• Auxiliary Pipeline Pilot protocol: easy to set up from a template to insert specific data manipulations

• Project control file: defines input, processing options and team decisions on desired data format

Final formatting

Retrieve and process off-tgt data

Predicted properties

Final formatting

Clustering Series Tagging

Final formatting

PKPD module

…

Advanced Series Tagging

• Project-specific substructure definitions (Molfile); complex search hierarchy, multiple fields possible

• Used to label series, substructures, motifs…

• Can be combined with on-the-fly SS searches.
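One way to picture a search hierarchy for series tagging is a first-match-wins walk over ordered definitions. The sketch below is an assumption about the logic, not the SpotAPP code: tags and query names are placeholders, and the substructure matcher is injected because in practice a cheminformatics toolkit would evaluate the Molfile queries.

```python
from typing import Callable, Optional

# Ordered (tag, query) definitions: earlier entries take precedence.
# Tags and query identifiers are placeholders, not real project definitions.
SERIES_DEFINITIONS = [
    ("Series A / subseries 1", "query_A1"),
    ("Series A", "query_A"),
    ("Series B", "query_B"),
]


def tag_series(compound: str,
               matches: Callable[[str, str], bool],
               definitions=SERIES_DEFINITIONS,
               default: Optional[str] = "Unassigned") -> Optional[str]:
    """Return the tag of the first definition whose query matches the compound.

    `matches(compound, query)` is supplied by the caller and is expected to
    perform the actual substructure search.
    """
    for tag, query in definitions:
        if matches(compound, query):
            return tag
    return default
```

Putting the most specific definitions first means a compound matching both a subseries and its parent series receives the subseries label.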

SpotAPP Highlight: Activity and Efficacy Data Analysis


On-target Activity Data Browsing and Compound Profiling

IronPython script captures marked compounds as tag.

Marked compounds put in ‘shopping cart’ for side-by-side comparison and profiling

• Latest processed data is always exposed in project SpotAPP package

• Browse data across all available dimensions (activities, properties…)

Pivoted data used for overview and data correlations.

Activity–Efficacy Data Analysis and Drilldown

• Efficacy data analyzed using complex set of conditions: target subtype, species, doses, measurement mode…

Interplay of Spotfire pivoted and unpivoted tables: instantaneous data drill-down.

SpotAPP Highlight: SAR Analysis of a Complex Data Set


SAR Analyses: Integration of Standard Methods into Project Context

• SpotAPP exposes and connects in-house Python-based SAR tools:

– R-group decomposition

– Matched pairs/series

– Non-additivity analysis

• Advanced SAR analyses are pre-processed using activities/properties selected by the team and can make use of the SpotAPP series tags.

• Analysis results from SAR tools can then be connected to any other project data.

• Customization is possible by applying the same concepts as presented for regular project data.
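As an illustration of the non-additivity analysis listed above, the sketch below evaluates double-transformation cycles: four compounds that differ at two positions, where perfect additivity would give a value of zero. This is the generic textbook formulation with invented data, not the in-house tool; the sign depends on how the cycle is oriented, so the magnitude is the quantity of interest.

```python
from itertools import combinations


def nonadditivity(activity, r1_pair, r2_pair):
    """Non-additivity of one cycle on a log scale (e.g. pIC50).

    activity maps (R1, R2) -> value; the cycle corners are the four
    combinations of the two R1 groups and the two R2 groups.
    """
    (a, a2), (b, b2) = r1_pair, r2_pair
    return (activity[(a2, b2)] - activity[(a2, b)]
            - activity[(a, b2)] + activity[(a, b)])


def all_cycles(activity):
    """Evaluate every complete cycle present in the data set."""
    r1_groups = sorted({r1 for r1, _ in activity})
    r2_groups = sorted({r2 for _, r2 in activity})
    cycles = {}
    for r1_pair in combinations(r1_groups, 2):
        for r2_pair in combinations(r2_groups, 2):
            corners = [(x, y) for x in r1_pair for y in r2_pair]
            if all(corner in activity for corner in corners):
                cycles[(r1_pair, r2_pair)] = nonadditivity(activity, r1_pair, r2_pair)
    return cycles


# Invented pIC50 values keyed by (R1, R2).
example = {("H", "H"): 6.0, ("F", "H"): 6.5, ("H", "Me"): 7.0, ("F", "Me"): 8.5}
```

In the example, F adds 0.5 and Me adds 1.0 on their own, but together they add 2.5, so the single cycle has a non-additivity of magnitude 1.0 log unit.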

MMPs, Non-add

Free-Wilson

R-groups

Project SAR

R-group Decomposition Analysis

• Setup for single/multiple activity R-group matrix visualization and analysis

• Uses series tags when R attachments are defined → browse R-group cores using project definitions

1. Select core, R-groups and activity

2. Select compounds in R-group matrix

3. See SAR and trends for one or more activities

Connect to MMPs data

Matched Molecular Pairs in SpotAPP

Other project activity data

Project MDO data

Global MMPs knowledge

[Figure: matched molecular pairs X → Y on a common core, with ‘entry point’ data (ΔIC50, ΔKi) shown alongside ΔPgp and ΔHepCl for the same pairs]

Approach can be combined with ad hoc MMPs calculations…

MMPs Workflow: Filter-down to Desired Pairs / Series

MMPs workhorse: Python platform designed for processing of large data & interactive queries

1. Filter to/mark set of MMPs to answer question

2. Browse/select individual pairs

3. Analyze Δ(activity/properties) for selected pairs

Diagram labels: current marked compounds (R-groups…) and/or aggregated MMPs trends (e.g. Avg, Geo… )

Line plot: identify outliers
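The workflow steps above (compute Δ per pair, aggregate per transformation, spot outliers) can be sketched as follows. The pairs, transformation labels and the 1.0 log-unit threshold are invented for illustration and do not come from the Python platform described in the slides.

```python
from collections import defaultdict
from statistics import mean

# (transformation, compound_x, compound_y, pIC50_x, pIC50_y); all values invented.
pairs = [
    ("H>>F", "c1", "c2", 6.0, 6.5),
    ("H>>F", "c3", "c4", 7.0, 7.4),
    ("H>>F", "c5", "c6", 5.5, 7.5),
    ("H>>CH3", "c1", "c7", 6.0, 5.8),
]


def delta_by_transformation(pairs):
    """Group pairs by transformation and compute delta = value(Y) - value(X)."""
    deltas = defaultdict(list)
    for transformation, x, y, px, py in pairs:
        deltas[transformation].append((x, y, py - px))
    return deltas


def trends(deltas):
    """Average delta per transformation (arithmetic mean on the log scale)."""
    return {t: mean(d for _, _, d in rows) for t, rows in deltas.items()}


def outliers(deltas, threshold=1.0):
    """Pairs whose delta deviates from their transformation mean by more than threshold."""
    avg = trends(deltas)
    return [(t, x, y, d)
            for t, rows in deltas.items()
            for x, y, d in rows
            if abs(d - avg[t]) > threshold]
```

Because pIC50 is a log quantity, the arithmetic mean of ΔpIC50 corresponds to a geometric mean of the underlying IC50 ratios.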

MMPs Workflow: Interactive Substructure Search

• Allows fast substructure-based filtering using core and variable fragments; can be combined with other filters

Discngine Panel used as UI for SS search process and result reporting into SpotAPP package

SpotAPP Highlight: Prototype ivive Calculations


Concepts for SpotAPP ‘PK/PD’ Module Prototype

Goals:

• Provide facilitated and standardized calculations of derived PK properties to team PK reps.

• Enable consistent decision making & expose key visualizations to teams.

Key principles:

• Automation of calculations using a central, validated R script

• Implementation of different clearance scaling calculation methods → comparison and selection of most appropriate method to share and use

• Customization of script behavior per project using control file and ad hoc data files (if needed)

Complete PK tables

Main data table

Main SpotAPP package

Process controlled by PK rep.

PK/PD processing

Advanced PK and PK/PD data package & features

Custom input / parameters

Internal DBs


Example of Advanced PK Calculations Available to PK Rep. for Decision-making

• Example of different methods providing clearance scaling from hepatocytes

• PK concentration-time curves from individual animals and aggregated over treatment groups

Assume no binding

in vitro CLint_hep

in vivo CLb_unbound_int_hep

Estimate unspecific binding (Houston) Dilution method

Estimation of protein binding in hepatocyte incubation medium: fu from preclinical species

CLint in [mL/min/kg]
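A generic sketch of hepatocyte clearance scaling is given below, written in Python for illustration although the slides state that the production calculations run in a validated R script. It uses the common physiological scaling and the well-stirred liver model; the function names are hypothetical, all physiological parameters must be supplied by the caller, and nothing here is confirmed to match the methods compared in the slides.

```python
def scale_clint(clint_ul_min_per_million_cells: float,
                hepatocellularity_million_per_g: float,
                liver_weight_g_per_kg: float,
                fu_inc: float = 1.0) -> float:
    """Scale in vitro hepatocyte CLint to unbound intrinsic clearance in mL/min/kg.

    fu_inc is the unbound fraction in the incubation; 1.0 corresponds to
    'assume no binding'.
    """
    unbound = clint_ul_min_per_million_cells / fu_inc
    # uL/min/10^6 cells * 10^6 cells/g liver * g liver/kg body weight -> uL/min/kg
    return unbound * hepatocellularity_million_per_g * liver_weight_g_per_kg / 1000.0


def well_stirred_clh(clint_u: float, fu_blood: float, q_h: float) -> float:
    """Hepatic blood clearance predicted by the well-stirred model (mL/min/kg)."""
    return q_h * fu_blood * clint_u / (q_h + fu_blood * clint_u)


def clint_from_in_vivo(cl_blood: float, fu_blood: float, q_h: float) -> float:
    """Back-calculate unbound intrinsic clearance from observed blood clearance.

    Rearrangement of the well-stirred model; only valid for cl_blood < q_h.
    """
    if cl_blood >= q_h:
        raise ValueError("blood clearance must be below hepatic blood flow")
    return cl_blood * q_h / (fu_blood * (q_h - cl_blood))
```

For example, with illustrative inputs of 10 µL/min/10^6 cells, 120 × 10^6 cells per g liver and 25.7 g liver per kg, `scale_clint` returns 30.84 mL/min/kg; halving `fu_inc` doubles that value, which is how a binding correction changes the in vitro to in vivo comparison.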

Action

SpotAPP Workflow Adaptation To HTS Analysis

Knowledge

Information

Infrastructure

• Early HIT SAR exploration

• Off-target, ancillary profile

• Hit-expansion

• HTS screen results, general compound info

• Automation

• Flexibility

• Modularity

Data

HTS data package

Re-use of Standard Processing Modules

Multi-Dimensional HTS Data Analysis from a Chemistry Perspective

HTS potency

HTS counter-screens

HTS promiscuity

Ligand efficiency

Properties

SAR potential

Chemical tractability: availability, purity, synthesis…

Target / gene / pathway data

Internal & external knowledge

Chemical diversity

SpotHTS Workflow Overview

HTS results

MDO, props

SEA analysis

HTS hit history

Off-tgt. data

DR curves

Tox/Safety data

Ligand eff.

Ontology analysis

Primary screen data (single points)

HTS confirmation data (dose-responses)

Tags, labels

HTS Package

External data

External data

Clusterings Roche DB

MDO, purity

Kinase panel

CEREP panel

PAINS

Advanced data mining (Phenotypic/assay profiling…)

HTS Data Package: Highlights

HTS history and known in-house activities

Chemical space clustering and diversity analysis

Dashboard for multi-parametric hit analysis

• Platform for narrowing down a high primary hit rate and analyzing hits

• Used for internal prioritization & sharing with external partners

Knowledge Capture: Hits Annotation within SpotHTS Package

• Team members can annotate and label compounds inside SpotHTS package.

Annotations captured in Oracle DB via information link, then embedded in HTS package as data table.

1. Select hit(s) in analyses and provide annotation

2. Retrieve and analyze compounds based on annotations
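The two annotation steps can be mimicked with a small table. The slides describe an Oracle DB reached through a Spotfire information link; the sketch below swaps in an in-memory SQLite database so that it runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hit_annotation ("
        " compound_id TEXT NOT NULL,"
        " annotation TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def annotate(conn, compound_ids, annotation, author):
    """Step 1: store one annotation row per selected hit."""
    conn.executemany(
        "INSERT INTO hit_annotation (compound_id, annotation, author) VALUES (?, ?, ?)",
        [(cid, annotation, author) for cid in compound_ids],
    )
    conn.commit()


def compounds_with(conn, annotation):
    """Step 2: retrieve the compounds carrying a given annotation."""
    rows = conn.execute(
        "SELECT DISTINCT compound_id FROM hit_annotation"
        " WHERE annotation = ? ORDER BY compound_id",
        (annotation,),
    )
    return [row[0] for row in rows]
```

Keeping annotations in their own table, keyed by compound identifier, lets them be joined back to any other table in the package.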

Integrated Hit Expansion within SpotHTS Package

• Retrieval of top-100 similar compounds for all hits with dose-response data. Done as part of data processing (2D-based similarity).

→ Use as initial info to assess hit SAR potential and screening follow-up activities

IronPython scripts in Spotfire to perform automatic list logic and markings.

1. Select hit(s) in analyses

2. See all IRCI closest neighbors

3. Find overlap and compounds not screened yet

Visualize data and stock availability, purity...
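The hit-expansion logic (top-N 2D neighbours per hit, then overlap and not-yet-screened lists) can be sketched with plain sets. Fingerprints are represented as sets of on-bits and Tanimoto similarity is assumed as the 2D measure; neither choice, nor any of the names, is confirmed by the slides.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def nearest_neighbours(hit_fp, library, top_n=100):
    """Return the ids of the top_n most similar library compounds."""
    scored = sorted(
        ((tanimoto(hit_fp, fp), cid) for cid, fp in library.items()),
        key=lambda item: (-item[0], item[1]),  # best score first, id as tie-break
    )
    return [cid for _, cid in scored[:top_n]]


def expansion_candidates(hits, library, screened, top_n=100):
    """Neighbours shared by all selected hits, and those among them not screened yet."""
    neighbour_sets = [set(nearest_neighbours(fp, library, top_n))
                      for fp in hits.values()]
    overlap = set.intersection(*neighbour_sets) if neighbour_sets else set()
    return overlap, overlap - set(screened)
```

The final set difference is the list logic that the marking scripts would expose: compounds close to the selected hits that have no screening data yet.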

Conclusions

• The SpotAPP ecosystem has grown organically over the last couple of years at Roche as a development platform

– For project teams: provides integrated project data, advanced SAR and PK analyses…

– For experts/developers: helps test new features, data models & custom visualizations.

• Core design features for efficient data delivery:

Flexible, automated, customizable to project critical needs

• SpotAPP standard components and logic shared by other Spot* family members (potentially also by Roche New Assay Data Analysis Landscape tools)

– SpotHTS variant for HTS analysis: on the way to integrated hit delivery

HTS PK MDO *

Perspectives

• Spotfire is efficient and versatile as a vector of new data models

– especially for fast/complex data optimization cycles

– excellent for relational data tables and large data volumes

– IronPython, R, information links and data connections are powerful

– enhanced features possible via add-ons (Discngine, JS D3,…)

• Challenges still remain for the drug discovery community

– UI and data presentation simplification for non-experts

– Increased chemical intelligence & cheminformatics-guided workflows


Acknowledgements

• Brian Jones, Yi Lin, Lisa Sach-Peltason, Christian Blumenroehr, Daniel Wenger, Olivier Roche, Martin Blapp, Gunther Doernen, Peter Hilty (pREDi), Paula Petrone

• Jerome Hert, Christian Kramer, Michael Reutlinger, Wolfgang Guba (CADD)

• Stefanie Bendels, Martin Kapps + many other contributors (PS)

• Katrin Groebke-Zbinden, John Cumming + many medicinal chemists for feedback and suggestions

• Eric Leroux (Discngine)

Doing now what patients need next
