Integrated CADD Methods: a Cocktail of KNIME, Bash and Modeling Software
Loris Moretti
KNIME Spring Summit 2018 Berlin, March 5-9, 2018
Slide 2 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT
TOPIC OF THE DAY
Outline
Nuevolution status and technology
The ligand-binding quest in Drug Discovery
Modeling infrastructure
HIV-1 protease as modeling example
Summary
Drug Discovery at Nuevolution
NUEVOLUTION A/S: A Powerful Technology for Hit-Finding
Nuevolution A/S
Founded 2001
Located in central Copenhagen
37 employees in Science Department
Small molecule drug discovery
Chemetics® drug discovery platform
Internal and partnered programs
Inflammation, Cancer & Immuno-oncology
Listed on Nasdaq First North, Sweden, 2015 (uplisting soon)
...with global partnerships
...and global CRO support
THE CHEMETICS® LEAD DISCOVERY PLATFORM: Fast and Efficient Generation, Selection and Identification of Hits
DNA Encoded Library (DEL), Selection, Identification
~1 month, ~2 days, ~2 weeks
~60,000 Fragments
~5 Libraries/Year
~500 Screenings/Year
~20 B Templates/Year
Re-synthesis, Confirmation, Optimization, etc.
PURSUED TARGETS: Internal Pipeline and Collaborations
We are active in the fields of INFLAMMATION and ONCOLOGY (~25 targets)
EVERYDAY SCENARIO AT NUEVOLUTION: The Ligand-Binding Players
NEED: Ligand-binding hypothesis for ligand optimization
LIGAND
Small Molecules, 10s-1000s hits from CHEMETICS
TARGET
Many different proteins: receptors, enzymes, recognition domains, etc.
The Ligand Binding Quest
LIGAND-BINDING STUDY: Self Docking
knowledge
• Biological information
inputs
• Protein structure
• Ligand structure
modeling
• Docking software
outputs
• 1 or more poses
• Energy estimation
Ligand-binding prediction
• LB known
• RMSD
• score
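When the ligand binding is known, a docked pose can be compared with the reference pose by RMSD. The following is a minimal sketch, assuming both poses list the same atoms in the same order and already share the protein's coordinate frame (no superposition is done). The function name, the coordinates and the 2.0 Å success cut-off are illustrative assumptions, not taken from the slides.

```python
import math


def pose_rmsd(reference, pose):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and no re-alignment of the ligand.
    """
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


def reproduces_reference(reference, pose, cutoff=2.0):
    """True when the pose lies within the (assumed) RMSD cut-off in angstrom."""
    return pose_rmsd(reference, pose) <= cutoff


# Made-up coordinates: the docked pose is the reference shifted by 1 angstrom in z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(pose_rmsd(crystal, docked))             # 1.0
print(reproduces_reference(crystal, docked))  # True
```

Symmetric ligands would need a symmetry-aware atom mapping, which this sketch does not attempt.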
LIGAND-BINDING STUDY: Cross Docking (Non-Native)
knowledge
• Biological information
inputs
• Protein structure
• Ligand structure
modeling
• Docking software
outputs
• 1 or more poses
• Energy estimation
Ligand-binding prediction
• LB known
• van der Waals
• Induced fit
LIGAND-BINDING STUDY: A More Complex Picture: Drug Discovery Environment
knowledge
• literature information
• in house data
• activity data: biophysical, biochemical, in vitro, etc.
• Kd, Ki, IC50, EC50, etc…
• ADME/Tox data
inputs
Protein
• Xray (one or more), homology model
• binding sites, conformations, induced fit, plasticity
• role of waters, ionization, cofactor, ions, phys-chem properties
Ligand
• chemotypes, flexibility, planarity
• stereoisomers, tautomers, ionization
modeling
• protein preparation
• ligand preparation
• ligand exploration (QM)
• Software selection
• filters (pharmacophore)
• scoring
• Optimization (MM, MD)
outputs
• visualization
• prior knowledge
• 1 or more poses
• metrics for energy estimation
• ranking
• more binding modes
Ligand-binding prediction
• LB unknown
MODELING HYPOTHESIS: deal with these characteristics, issues, aspects…
LIGAND-BINDING STUDY: When the Answer is Unknown
Expand and explore all the possibilities: protein states and ligand states
Consider more solutions: different hypotheses
Look for confirmation: prior knowledge, SAR, consistency
Unbiased view: different software and technology
Split into steps: more control over the process
Evaluate and explore each step: process tuning
…to be robust, reliable, automated, modifiable, traceable
COMPUTATIONAL INFRASTRUCTURE: an environment to control and explore the modeling process
The Computational Infrastructure
COMPUTATIONAL INFRASTRUCTURE: Software Setup Concept
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.
Modeling software through command line interface
Bash scripts wrapping modeling software
KNIME Analytics Platform protocols for data flow and system calls
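The layering above (a protocol that issues system calls to scripts wrapping command-line software) can be illustrated with a small Python stand-in. This is only a sketch of the idea of "one modeling step = one external command with a captured outcome"; it is not KNIME code, and the `StepResult` type, `run_step` function and the script name in the comment are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one external modeling step."""
    command: list
    returncode: int
    stdout: str
    stderr: str


def run_step(command, timeout=60):
    """Run one step as an external process and capture what it printed."""
    done = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return StepResult(list(command), done.returncode,
                      done.stdout.strip(), done.stderr.strip())


# A real call might look like ["bash", "dock.sh", "TARGET", "ligands.sdf"]
# (hypothetical script). Here the Python interpreter stands in for it.
demo = run_step([sys.executable, "-c", "print('docking finished')"])
print(demo.returncode, demo.stdout)  # 0 docking finished
```

Keeping the return code and both output streams for every step is one simple way to make a run traceable after the fact.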
MODELING WORKFLOW: Infrastructure in Layers (KNIME)
Reads TXT input for file locations and modeling parameters
Modeling steps interconnected through flow variables
Email with experiment specifications and results
Condition to run the step
SDF reader
Actual modeling step
Handling of molecule files
Settings handling for the job
Processing of the outputs
Command line for KNIME batch mode
Layer labels in the figure: 1, 0, -1, -2, -3
System call to the modeling software (Bash)
LEGO, from Danish "LEg GOdt" (play well)
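The first and fourth items above (a TXT input holding file locations and parameters, and a condition deciding whether a step runs) can be sketched in plain Python. The `key = value` file format, the key names, the paths and the `run_<step>` flag convention are all assumptions made for illustration; the slides do not specify the actual format.

```python
def parse_settings(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def step_enabled(settings, step):
    """A step runs only when its (assumed) 'run_<step>' flag is yes/true/1."""
    return settings.get(f"run_{step}", "no").lower() in {"yes", "true", "1"}


# Hypothetical experiment file content.
example = """
# experiment settings
protein_file = /data/targets/1hxw/protein.pdb
ligand_file  = /data/ligands/set01.sdf
n_poses      = 10
run_docking  = yes
run_md       = no
"""
cfg = parse_settings(example)
print(cfg["n_poses"], step_enabled(cfg, "docking"), step_enabled(cfg, "md"))
# 10 True False
```

A step with no flag in the file is treated as disabled, so adding a new step never changes the behaviour of existing experiment files.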
MODELING WORKFLOW: Infrastructure in Layers (Bash)
Main Script
Intro and variables
Files and conditions
Modeling software execution and file transformation
Conclusion
Slurm queuing management
Variables
Modeling task script
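The script sections listed above can be pictured as a generated Slurm batch script. The sketch below renders such a script as text from Python; it uses standard `#SBATCH` directives (`--job-name`, `--cpus-per-task`, `--output`), while the function name, the job name and the `my_docking_tool` command are hypothetical placeholders rather than the author's actual scripts.

```python
import shlex


def render_job_script(job_name, input_file, command, cpus=1):
    """Return the text of a Slurm batch script that runs one modeling task."""
    quoted_cmd = " ".join(shlex.quote(part) for part in command)
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        "#SBATCH --output=slurm-%j.out",
        "set -euo pipefail",
        "# Intro and variables",
        f"INPUT={shlex.quote(input_file)}",
        "# Files and conditions",
        '[ -s "$INPUT" ] || { echo "missing input: $INPUT" >&2; exit 1; }',
        "# Modeling software execution and file transformation",
        f'{quoted_cmd} "$INPUT"',
        "# Conclusion",
        'echo "done: $INPUT"',
        "",
    ])


# 'my_docking_tool' is a placeholder, not a real program.
script = render_job_script("dock_1hxw", "ligands.sdf",
                           ["my_docking_tool", "--poses", "10"], cpus=4)
print(script)
```

The resulting text would be written to a file and handed to `sbatch`; generating it from one template keeps every task script structured the same way.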
COMPUTATIONAL INFRASTRUCTURE: Features
Hardware and Software related
Servers with multiple CPUs and GPUs
Run in parallel
GNU/Linux Debian OS
Installation of third-party software (for modeling, analysis, etc.)
Python and Bash to glue together software and procedures (make a “flow”)
Process related
Nomenclature (identifiers) for targets, small-molecules and experiments
Environment variables customizable for target, small-molecules and experiments
File system structure for storing inputs and outputs, and for temporary files
Targets prepared in the same way (consistency)
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.
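The process-related points (identifiers for targets, small molecules and experiments, plus a fixed file system structure for inputs, outputs and temporary files) could be combined as in the sketch below. The directory layout, the folder names and the identifier formats are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def experiment_dirs(root, target_id, ligand_set_id, experiment_id):
    """Map the three identifiers to input, output and temporary folders."""
    base = Path(root) / target_id / ligand_set_id / experiment_id
    return {name: base / name for name in ("inputs", "outputs", "tmp")}


def create_experiment(root, target_id, ligand_set_id, experiment_id):
    """Create the folders (idempotent) and return their paths."""
    dirs = experiment_dirs(root, target_id, ligand_set_id, experiment_id)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Hypothetical identifiers; a throwaway directory stands in for the storage root.
with tempfile.TemporaryDirectory() as root:
    dirs = create_experiment(root, "T001", "L0042", "E0007")
    created = sorted(p.name for p in dirs["inputs"].parent.iterdir())
print(created)  # ['inputs', 'outputs', 'tmp']
```

Deriving every path from the identifiers means any result file can be traced back to its target, ligand set and experiment from its location alone.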
"MODELING BRICKS" AND PROTOCOLS: Everyday Tasks
KNIME Protocols
Docking
Scoring
Molecular Dynamics
Virtual Screening
Quantum Mechanics
Tasks and Software
Protein Preparation: Bash script with modeling software
Ligand Preparation: LigPrep, RDKit
Docking: AutoDock, Vina, PLANTS, Glide, rDock
Poses Clustering: ACIAP [1] and cut-off based
Molecular Mechanics: AmberTools and GROMACS
Scoring: PLANTS, XScore, DrugScore, BEAR [2], consensus score [3,4]
Interaction Fingerprint: PLANTS
Quantum Mechanics: GAMESS-US
Reference Comparison: Python script
Binding Site Analysis: VOIDOO, fpocket, CAVER
Favorable Interaction Regions: AutoGrid, AutoDock/Vina PyMOL plugin [5]
Visualization: PyMOL, Maestro, Jmol, VMD, Bodil, Coot
Web interface: Django and Python
[1] Bottegoni G. et al. (2006) Bioinformatics, 22(14), e58-e65.
[2] Degliesposti G. et al. (2011) Journal of Biomolecular Screening, 16(1), 129-133.
[3] Charifson P. S. et al. (1999) Journal of Medicinal Chemistry, 42(25), 5100-5109.
[4] Oda A. et al. (2006) Journal of Chemical Information and Modeling, 46(1), 380-391.
[5] Seeliger D., & de Groot B. L. (2010) Journal of Computer-Aided Molecular Design, 24(5), 417-422.
A Modeling Example
LIGAND-BINDING STUDY: Docking Example
• Protein: HIV-1 Protease
• 11 ligands (15-23 rotatable bonds)
• PDB complexes available
• Cross docking on 1HXW
DOCKING EXAMPLE: Software and Sampling
AutoDock, Glide SP, PLANTS, rDock, Vina
Poses × ligand × software: 1, 3, 10, 100
More poses and more programs
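The sampling idea (more poses and more programs) amounts to a grid of docking jobs. The sketch below enumerates that grid, assuming the numbers 1, 3, 10 and 100 on the slide are the pose counts requested per ligand and per program, and that each setting is run separately. The ligand identifiers are placeholders.

```python
from itertools import product

PROGRAMS = ["AutoDock", "Glide SP", "PLANTS", "rDock", "Vina"]
POSE_COUNTS = [1, 3, 10, 100]                    # assumed meaning of the slide's numbers
ligands = [f"lig{i:02d}" for i in range(1, 12)]  # 11 placeholder ligand IDs

# One job per (program, ligand, pose count) combination.
jobs = [
    {"program": program, "ligand": ligand, "n_poses": n_poses}
    for program, ligand, n_poses in product(PROGRAMS, ligands, POSE_COUNTS)
]
print(len(jobs))  # 220
print(jobs[0])    # {'program': 'AutoDock', 'ligand': 'lig01', 'n_poses': 1}
```

Each dictionary could then be turned into one command line or one queued task.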
DOCKING EXAMPLE: Scoring
Post-optimization
PBSA:
• optimization
Cscore:
• optimization
• Xscore + DrugscoreX + Plants
• Customizable
• Wider scope
AutoDock, Glide SP, PLANTS, rDock, Vina
Combination of scoring metrics
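One common way to combine scoring metrics that live on different scales is to standardise each metric and average the results. The sketch below does that with z-scores; it is an assumed, generic scheme and not necessarily how the Cscore on this slide is computed. Metric names, score values and the "higher is better" flags are made up.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise a list of scores; a constant list maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus(score_table, higher_is_better):
    """Average signed z-score per pose; a higher result means a better pose.

    score_table maps a metric name to one score per pose.
    """
    n_poses = len(next(iter(score_table.values())))
    total = [0.0] * n_poses
    for metric, values in score_table.items():
        sign = 1.0 if higher_is_better[metric] else -1.0
        for i, z in enumerate(zscores(values)):
            total[i] += sign * z
    return [t / len(score_table) for t in total]


# Made-up scores for three poses under three hypothetical metrics.
scores = {
    "metric_a": [6.1, 5.2, 4.0],        # higher is better
    "metric_b": [-120.0, -90.0, -60.0], # lower is better
    "metric_c": [-85.0, -70.0, -75.0],  # lower is better
}
direction = {"metric_a": True, "metric_b": False, "metric_c": False}
combined = consensus(scores, direction)
best = max(range(len(combined)), key=combined.__getitem__)
print(best)  # 0
```

Because each metric enters through a table and a direction flag, adding or dropping a scoring function does not change the code, which fits the "customizable" point above.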
DOCKING EXAMPLE: Clustering
Docking poses
Cluster best RMSD
Cluster medoids
Cluster best Cscore
• ACIAP implementation
• Simplify conformational space
Map and simplify the conformational space
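To show how clustering simplifies a set of poses, here is a deliberately simple cut-off based scheme with medoid selection. It is a greedy leader clustering on a pairwise RMSD matrix, swapped in for illustration; it is not the ACIAP algorithm. The matrix values and the 2.0 Å cut-off are made up.

```python
def cluster_by_cutoff(rmsd, cutoff):
    """Greedy leader clustering on a symmetric RMSD matrix.

    Poses are visited in index order; each joins the first cluster whose
    leader (first member) is within the cut-off, else starts a new cluster.
    """
    clusters = []
    for i in range(len(rmsd)):
        for members in clusters:
            if rmsd[i][members[0]] <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def medoid(members, rmsd):
    """Member with the smallest summed RMSD to the rest of its cluster."""
    return min(members, key=lambda i: sum(rmsd[i][j] for j in members))


# Made-up pairwise RMSD values (angstrom) for five poses.
rmsd = [
    [0.0, 0.8, 1.0, 6.0, 6.5],
    [0.8, 0.0, 0.5, 6.2, 6.1],
    [1.0, 0.5, 0.0, 5.9, 6.3],
    [6.0, 6.2, 5.9, 0.0, 1.2],
    [6.5, 6.1, 6.3, 1.2, 0.0],
]
clusters = cluster_by_cutoff(rmsd, cutoff=2.0)
medoids = [medoid(c, rmsd) for c in clusters]
print(clusters)  # [[0, 1, 2], [3, 4]]
print(medoids)   # [1, 3]
```

Visiting the poses in order of score would make each cluster leader its best-scored member; the result of this greedy scheme depends on that visiting order.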
DOCKING EXAMPLE: Post-Docking Optimization
"fast"
• MMFF94
• Ligand minimization
• AmberTools
• Cscore
"slow"
• BEAR
• AM1-BCC
• Complex min - Ligand MD - complex min
• AmberTools
• PBSA
Improve results with optimization
The Summary
SUMMARY: Integration for Robustness and Flexibility
A Drug Discovery environment: Nuevolution
“The Need” in a Drug Discovery environment: Ligand-Binding Assessment
Complexity: self, cross, real docking
Modeling infrastructure: …to be robust, reliable, automated, modifiable, traceable
Modeling infrastructure: Integration and customizable environment (LEGO)
KNIME + Bash + Third-party software
HIV-1 protease case: example of integration
THANKS
• Alex Haahr Gouliaev
• Thomas Franch
• Mads Nørregaard-Madsen
• Johannes Dolberg
• Aleksejs Kontijevskis
• All others at Nuevolution
• To the open-source and free software community
• To the KNIME team and community
…and you for the attention
NUEVOLUTION
TRANSFORMING CHALLENGES INTO MEDICINE
Visit us at:
https://nuevolution.com
Contact me at: