Automated Workflows for Integrated Project Data Analysis Using Spotfire: the SpotAPP Family
Nicolas Zorn, CADD Group, Roche Innovation Center Basel
Basel Tibco Spotfire UGM, Nov 3rd 2016
Preamble
Key features for success: relevance, efficiency, agility – integration & interactivity
Drug discovery cycle: Design, Synthesis, Testing, Analysis
Data optimization cycle: Retrieval, Enrichment, Processing, Presentation
Building-up Capabilities to Help Answer Key Project Questions
Levels: Action, Knowledge, Information, Infrastructure
• SAR, SPR exploration
• Predicted properties
• Virtual Compounds
• Project-centric data mining and analysis
• Automation
• Flexibility
• Modularity
Data
Decision-making!
System for Agile Development
Presentation Outline
• Overview and concepts of the SpotAPP workflow
• Feature highlights
  – Activity-efficacy data analysis
  – MMPs and project SAR analyses
  – ivive and PK/PD
• Workflow expansion to HTS data analysis: SpotHTS
• Conclusions, Perspectives
SpotAPP: Integrated Project Data Mining and Analysis
Roche internal databases in house
Off-target, HTS and LTS data
Structural data
In house molecular property predictions
Matched Molecular Pairs and SAR analyses
Cluster analysis and series tagging
Custom calculations (selectivity, LE, ...)
Spotfire Automated Project Data Processing
External data (ChEMBL, patent data…)
PK/PD and ivive
Implementation Overview
• Project data flow
• Overall process control: CRON table drives SpotAPP project runs (frequency and options)
Data sources
Processing
Project data tables (Linux FS and Windows shares)
Spotfire library project template
Tables are linked and auto-embedded
Spotfire server automation service: reloads and embeds data tables after each update.
Full flexibility: options passed as one-letter codes for CRON jobs
Daily update: all projects run in ~3h every night
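The idea of driving each project run with one-letter option codes can be illustrated with a small parser. This is only a sketch: the slides do not list the actual codes, so the letters, option names and the `<project> <codes>` layout below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping of one-letter codes to processing options.
# The real SpotAPP codes are not given in the slides.
OPTION_CODES = {
    "o": "off_target",
    "p": "predicted_properties",
    "c": "clustering",
    "k": "pkpd",
}


@dataclass(frozen=True)
class ProjectRun:
    project: str
    options: frozenset


def parse_run_spec(spec: str) -> ProjectRun:
    """Parse a '<project> <codes>' string such as 'PROJ_A ok' (assumed layout)."""
    parts = spec.split()
    if not parts:
        raise ValueError("empty run specification")
    project = parts[0]
    codes = parts[1] if len(parts) > 1 else ""
    unknown = sorted(set(codes) - set(OPTION_CODES))
    if unknown:
        raise ValueError(f"unknown option code(s): {unknown}")
    return ProjectRun(project, frozenset(OPTION_CODES[c] for c in codes))


# Example: a nightly entry requesting off-target and PK/PD processing.
run = parse_run_spec("PROJ_A ok")
```

A scheduler entry would then only need to pass the project name and its code string to the processing script, which keeps per-project options in one place.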
Project Data Structure during Processing
• Standard aggregation rules apply, then possibility to override/refine with project-custom rules
• Optionally, unpivoted data can be exposed at several aggregation levels
Non-pivoted, non-aggregated: 1 row per result
Batch aggregation: 1 row per result & per batch
Substance aggregation: 1 row per result & per substance
Project custom aggregation then pivoting: 1 row per substance
Unpivoted data table
Pivoted data table
Partially aggregated or raw data tables
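The aggregation levels can be sketched with plain Python data structures. This is an illustration under stated assumptions, not the SpotAPP implementation: the field names, the example values and the choice of the geometric mean as the aggregation rule are all made up for the example.

```python
from collections import defaultdict
from statistics import geometric_mean

# Unpivoted raw results: one row per result (all values are invented).
results = [
    {"substance": "S1", "batch": "B1", "assay": "IC50_nM", "value": 10.0},
    {"substance": "S1", "batch": "B1", "assay": "IC50_nM", "value": 40.0},
    {"substance": "S1", "batch": "B2", "assay": "IC50_nM", "value": 20.0},
    {"substance": "S1", "batch": "B1", "assay": "Ki_nM", "value": 5.0},
    {"substance": "S2", "batch": "B3", "assay": "IC50_nM", "value": 100.0},
]


def aggregate(rows, keys, agg=geometric_mean):
    """Group rows by `keys` and reduce each group's values with `agg`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["value"])
    return [dict(zip(keys, key), value=agg(vals)) for key, vals in groups.items()]


def pivot(rows):
    """Turn (substance, assay, value) rows into one record per substance."""
    table = defaultdict(dict)
    for row in rows:
        table[row["substance"]][row["assay"]] = row["value"]
    return dict(table)


# Batch aggregation: one row per result type and per batch.
per_batch = aggregate(results, ("substance", "batch", "assay"))
# Substance aggregation: one row per result type and per substance.
per_substance = aggregate(per_batch, ("substance", "assay"))
# Pivoting: one row per substance, one column per result type.
pivoted = pivot(per_substance)
```

Passing a different `agg` function is where a project-custom rule could override the standard one; aggregating batch values before substance values is itself a design choice that a project might make differently.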
Project Data Processing: Standard & Customized
Flow diagram labels: Retrieve data; Std aggregation; Process structures, properties; Project aggregation, pivoting; Input cleanup; Final formatting
Outputs: unpivoted data, pivoted data, ancillary data
Further modules: Retrieve and process off-tgt data; Predicted properties; Clustering, Series Tagging; PKPD module; …
• Auxiliary Pipeline Pilot protocol: easy to set up from a template to insert specific data manipulations
• Project control file: defines input, processing options and team decisions on desired data format
Advanced Series Tagging
• Project-specific substructure definitions (Molfile); complex search hierarchy, multiple fields possible
• Used to label series, substructures, motifs…
• Can be combined with on-the-fly SS searches.
SpotAPP Highlight: Activity and Efficacy Data Analysis
On-target Activity Data Browsing and Compound Profiling
IronPython script captures marked compounds as tag.
Marked compounds put in ‘shopping cart’ for side-by-side comparison and profiling
• Latest processed data is always exposed in project SpotAPP package
• Browse data across all available dimensions (activities, properties…)
Pivoted data used for overview and data correlations.
Activity–Efficacy Data Analysis and Drilldown
• Efficacy data analyzed using complex set of conditions: target subtype, species, doses, measurement mode…
Interplay of Spotfire pivoted and unpivoted tables: instantaneous data drill-down.
SpotAPP Highlight: SAR Analysis of a Complex Data Set
SAR Analyses: Integration of Standard Methods into Project Context
• SpotAPP exposes and connects in-house Python-based SAR tools:
  – R-group decomposition
  – Matched pairs/series
  – Non-additivity analysis
• Advanced SAR analyses are pre-processed using activities/properties selected by the team and can make use of the SpotAPP series tags.
• Analysis results from SAR tools can then be connected to any other project data.
• Customization is possible by applying the same concepts as presented for regular project data.
MMPs, Non-add
Free-Wilson
R-groups
Project SAR
R-group Decomposition Analysis
• Setup for single/multiple activity R-group matrix visualization and analysis
• Uses series tags when R attachments are defined → browse R-group cores using project definitions
1. Select core, R-groups and activity
2. Select compounds in R-group matrix
3. See SAR and trends for 1+ activity(ies)
Connect to MMPs data
Matched Molecular Pairs in SpotAPP
[Figure: matched molecular pairs (X, Y) sharing a common core, with property differences ΔIC50, ΔKi, ΔPgp, ΔHepCl. Labels: ‘Entry point’ data; Other project activity data; Project MDO data; Global MMPs knowledge]
Approach can be combined with ad hoc MMPs calculations…
MMPs Workflow: Filter-down to Desired Pairs / Series
MMPs workhorse: Python platform designed for processing of large data & interactive queries
1. Filter-to/mark set of MMPs to answer question
2. Browse/select individual pairs
3. Analyze Δ(activity/properties) for selected pairs and/or current marked compounds (R-groups…)
Aggregated MMPs trends (e.g. Avg, Geo…) and/or line plot: identify outliers
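The per-pair differences and aggregated trends in this workflow can be sketched as follows. The pair list, transformation labels and IC50 values are invented for illustration, and the in-house Python MMP platform is not shown here; this only demonstrates the arithmetic on a log scale.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical matched pairs: (compound X, compound Y, transformation label).
pairs = [
    ("C1", "C2", "H>>F"),
    ("C3", "C4", "H>>F"),
    ("C5", "C6", "H>>CH3"),
]
# Invented IC50 values in nM.
ic50_nm = {"C1": 100.0, "C2": 10.0, "C3": 50.0, "C4": 50.0, "C5": 10.0, "C6": 1000.0}


def pic50(ic50_nanomolar):
    """Convert an IC50 in nM to pIC50 (-log10 of the molar concentration)."""
    return 9.0 - math.log10(ic50_nanomolar)


def pair_deltas(pairs, ic50):
    """Delta pIC50 = pIC50(Y) - pIC50(X) for each pair with both values available."""
    out = []
    for x, y, transform in pairs:
        if x in ic50 and y in ic50:
            out.append((transform, x, y, pic50(ic50[y]) - pic50(ic50[x])))
    return out


def trends(deltas):
    """Aggregate per transformation: (number of pairs, mean delta pIC50)."""
    by_transform = defaultdict(list)
    for transform, _, _, delta in deltas:
        by_transform[transform].append(delta)
    return {t: (len(v), mean(v)) for t, v in by_transform.items()}


deltas = pair_deltas(pairs, ic50_nm)
summary = trends(deltas)
```

Averaging ΔpIC50 is equivalent to taking the geometric mean of the IC50 fold changes, which is one way to read the "Avg, Geo" aggregation options; pairs whose individual delta sits far from the transformation mean are the outliers a line plot would expose.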
MMPs Workflow: Interactive Substructure Search
• Allows fast substructure-based filtering using core and variable fragments; can be combined with other filters
Discngine Panel used as UI for SS search process and result reporting into SpotAPP package
SpotAPP Highlight: Prototype ivive Calculations
Concepts for SpotAPP ‘PK/PD’ Module Prototype
Goals:
• Provide facilitated and standardized calculations of derived PK properties to team PK reps.
• Enable consistent decision making & expose key visualizations to teams.
Key principles:
• Automation of calculations using a central, validated R script
• Implementation of different clearance scaling calculation methods → comparison and selection of most appropriate method to share and use
• Customization of script behavior per project using control file and ad hoc data files (if needed)
[Figure labels: Internal DBs; Custom input / parameters; PK/PD processing (process controlled by PK rep.); Complete PK tables; Main data table; Main SpotAPP package; Advanced PK and PK/PD data package & features]
Example of Advanced PK Calculations Available to PK Rep. for Decision-making
• Example of different methods providing clearance scaling from hepatocytes
• PK concentration-time curves from individual animals and aggregated over treatment groups
[Figure: in vivo CLb_unbound_int_hep plotted against in vitro CLint_hep; CLint in mL/min/kg]
Methods compared: assume no binding; estimate unspecific binding (Houston); dilution method
Estimation of protein binding in hepatocyte incubation medium: fu from preclinical species
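A minimal sketch of how clearance scaling from hepatocytes is commonly computed, assuming the well-stirred liver model. The slides do not state which liver model the R script uses, so that model, the function names and all numbers below are assumptions for illustration; the Houston and dilution estimates of incubation binding are not implemented and `fu_inc` is simply taken as an input.

```python
def scale_clint(clint_ul_min_per_million_cells, hepatocellularity,
                liver_weight_g_per_kg, fu_inc=1.0):
    """Scale in vitro hepatocyte CLint to unbound CLint in mL/min/kg.

    hepatocellularity: million cells per g liver.
    fu_inc: unbound fraction in the incubation (1.0 = assume no binding).
    """
    clint = (clint_ul_min_per_million_cells * hepatocellularity
             * liver_weight_g_per_kg / 1000.0)  # uL -> mL
    return clint / fu_inc


def well_stirred_clh(clint_u, fu_blood, q_h):
    """Predicted hepatic blood clearance (well-stirred model), mL/min/kg."""
    return q_h * fu_blood * clint_u / (q_h + fu_blood * clint_u)


def in_vivo_clint_u(cl_blood, fu_blood, q_h):
    """Back-calculate unbound intrinsic clearance from in vivo blood clearance."""
    if cl_blood >= q_h:
        raise ValueError("blood clearance must be below hepatic blood flow")
    return cl_blood / (fu_blood * (1.0 - cl_blood / q_h))


# Placeholder inputs, not reference physiological values.
no_binding = scale_clint(10.0, hepatocellularity=120.0, liver_weight_g_per_kg=25.0)
with_binding = scale_clint(10.0, 120.0, 25.0, fu_inc=0.5)
predicted_clh = well_stirred_clh(no_binding, fu_blood=0.1, q_h=20.0)
round_trip = in_vivo_clint_u(predicted_clh, fu_blood=0.1, q_h=20.0)
```

Comparing the scaled in vitro value with the back-calculated in vivo value, once per binding assumption, is the kind of side-by-side view that lets a PK rep. pick the most appropriate method for a project.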
SpotAPP Workflow Adaptation To HTS Analysis
Levels: Action, Knowledge, Information, Infrastructure
• Early HIT SAR exploration
• Off-target, ancillary profile
• Hit-expansion
• HTS screen results, general compound info
• Automation
• Flexibility
• Modularity
Data
HTS data package
Re-use of Standard Processing Modules
Multi-Dimensional HTS Data Analysis from a Chemistry Perspective
• HTS potency
• HTS counter-screens
• HTS promiscuity
• Ligand efficiency
• Properties
• SAR potential
• Chemical tractability: availability, purity, synthesis…
• Target / gene / pathway data
• Internal & external knowledge
• Chemical diversity
SpotHTS Workflow Overview
HTS results
MDO, props
SEA analysis
HTS hit history
Off-tgt. data
DR curves
Tox/Safety data
Ligand eff.
Ontology analysis
Primary screen data (single points)
HTS confirmation data (dose-responses)
Tags, labels
HTS Package
External data
External data
Clusterings
Roche DB
MDO, purity
Kinase panel
CEREP panel
PAINs
Advanced data mining (Phenotypic/assay profiling…)
HTS Data Package: Highlights
HTS history and known in-house activities
Chemical space clustering and diversity analysis
Dashboard for multi-parametric hit analysis
• Platform for narrowing down a high primary hit rate and analyzing hits
• Used for internal prioritization & sharing with external partners
Knowledge Capture: Hits Annotation within SpotHTS Package
• Team members can annotate and label compounds inside SpotHTS package.
Annotations captured in Oracle DB via information link, then embedded in HTS package as data table.
1. Select hit(s) in analyses and provide annotation
2. Retrieve and analyze compounds based on annotations
Integrated Hit Expansion within SpotHTS Package
• Retrieval of top-100 similar compounds for all hits with dose-response data. Done as part of data processing (2D-based similarity).
→ Use as initial info to assess hit SAR potential and screening follow-up activities
IronPython scripts in Spotfire to perform automatic list logic and markings.
1. Select hit(s) in analyses
2. See all IRCI closest neighbors
3. Find overlap and compounds not screened yet
Visualize data and stock availability, purity...
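The hit-expansion logic (nearest neighbors by 2D similarity, then list logic on the neighbor sets) can be sketched in plain Python. Fingerprints are represented here as sets of on-bits, a stand-in for whatever 2D fingerprint the actual processing uses; the compound IDs and bit sets are invented, and this is not the IronPython script running inside Spotfire.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearest_neighbors(hit_fp, library, top_n=100):
    """Return the IDs of the top_n most similar library compounds."""
    scored = sorted(
        ((tanimoto(hit_fp, fp), cid) for cid, fp in library.items()),
        key=lambda s: (-s[0], s[1]),  # highest similarity first, ties by ID
    )
    return [cid for _, cid in scored[:top_n]]


def split_neighbors(neighbor_lists, screened):
    """List logic: neighbors shared by all hits, and neighbors not screened yet."""
    sets = [set(n) for n in neighbor_lists]
    overlap = set.intersection(*sets) if sets else set()
    union = set().union(*sets)
    return overlap, union - set(screened)


# Toy library and two selected hits (all values invented).
library = {"L1": {1, 2, 3}, "L2": {1, 2, 3, 4}, "L3": {7, 8}, "L4": {2, 3, 9}}
hit_a, hit_b = {1, 2, 3}, {2, 3, 4}
nn_a = nearest_neighbors(hit_a, library, top_n=2)
nn_b = nearest_neighbors(hit_b, library, top_n=2)
overlap, not_screened = split_neighbors([nn_a, nn_b], screened={"L1"})
```

Because the neighbor lists are precomputed during data processing, the interactive part reduces to set operations on the marked hits, which is cheap enough to run on every selection change.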
Conclusions
• The SpotAPP ecosystem has grown organically over the last couple of years at Roche as a DEV platform
➢ For project teams: provides integrated project data, advanced SAR and PK analyses…
➢ For experts/developers: helps test new features, data models & custom visualizations.
• Core design features for efficient data delivery: flexible, automated, customizable to project-critical needs
• SpotAPP standard components and logic are shared by other Spot* family members (potentially also by Roche New Assay Data Analysis Landscape tools)
➢ SpotHTS variant for HTS analysis: on the way to integrated hit delivery
HTS PK MDO *
Perspectives
• Spotfire is efficient and versatile as a vector of new data models
  – especially for fast/complex data optimization cycles
  – excellent for relational data tables and large data volumes
  – IronPython, R, information links and data connections are powerful
  – enhanced features possible via add-ons (Discngine, JS D3,…)
• Challenges still remain for the drug discovery community
  – UI and data presentation simplification for non-experts
  – Increased chemical intelligence & cheminformatics-guided workflows
Acknowledgements
• Brian Jones, Yi Lin, Lisa Sach-Peltason, Christian Blumenroehr, Daniel Wenger, Olivier Roche, Martin Blapp, Gunther Doernen, Peter Hilty (pREDi); Paula Petrone
• Jerome Hert, Christian Kramer, Michael Reutlinger, Wolfgang Guba (CADD)
• Stefanie Bendels, Martin Kapps + many other contributors (PS)
• Katrin Groebke-Zbinden, John Cumming + many medicinal chemists for feedback and suggestions
• Eric Leroux (Discngine)
Doing now what patients need next