steve brudz - chemaxon · cbip v1.0 – transactional, data producer focused reagent inventory...
Steve Brudz
Manager, Software Engineering, Chemical Biology Platform
The Evolution of the Broad Chemical Biology Informatics Platform From Screening to Lead Optimization
Faculty: Nine Core Members in Broad buildings
155 Associate Members: Faculty with labs at MIT, Harvard, Whitehead, or affiliated hospitals
An Experiment in Philanthropy and Organization
2003 – 2006: $200M “venture philanthropy” investment by Eli and Edythe Broad to accelerate the transformation of medicine
2008: The Broads announced an endowment of $400 million, making the Institute a permanent nonprofit organization
Unique Collaborative Structure Enables the Broad Mission
Broad Platforms
The Broad's scientific platforms are made up of professional scientists with the expertise and organization to carry out major projects that cannot be done within a single laboratory
Imaging
RNAi
Chemical Biology / Novel Therapeutics
Proteomics
Genetic Analysis
Genome Sequencing
Biological Samples
Broad Programs
The Broad's scientific programs bring together research groups with a shared commitment to important biomedical challenges
Infectious Disease
Chemical Biology
Medical & Population Genetics
Psychiatric Disease
Stanley Center
Metabolic Disease
Cell Circuits
Genome Biology
Cancer
Computational Biology
Advances needed to explore human biology systematically:
• To innovate in fundamental chemistry: Developing next-generation synthetic chemistry that reaches ‘undruggable’ targets or processes.
• To bring complex biology to small-molecule science: Screening in physiologically relevant conditions.
• To understand how small molecules affect disease: Determining the proteins that small-molecule modulators bind in cells.
From Opportunistic to Disciplined: Modulating Challenging Targets
Broad Institute Chemical Biology Platform
[Timeline figure; axis ticks: 1997, 2000, 2002, 2004, 2006, 2008, 2010, 2012]
Milestones: Early DOS Concepts; NCI Initiative for Chemical Genetics; Broad Institute of Harvard & MIT; Design DOS library; Design screening system; Automation, MLPCN comprehensive center, 1st DOS compounds; The Broad Institute (7/2009); 150K DOS Library, Y3/Y4 MLPCN
Other timeline labels: ActivityBase, PubChem, 1.0, 2.0
Focus over time: Screening, Hits; Structural Diversity, Screening, Hit to Probe; High Quality Probes, Lead Optimization
Academic Screening Center for collaborative high throughput screening through hit validation
– DOS Chemistry Library Production
– HTS screening on 100,000 compound set
Comprehensive Center for the Molecular Libraries Probe Production Centers Network (MLPCN)
– HTS screening on 300,000 compound set
– Confirmation and dose assays
– Follow-up medicinal chemistry
Goal to find novel probes and therapeutics
Screening Collection
MLPCN (~324K)
New code  Collection name                          # of compounds
COMB      Commercially available                   296,780
BIOA      Bioactives                               442
STRD      Non-commercial stereochemically diverse  14,801
NATP      Natural products - purified              1,472
GPCR      G protein-coupled receptor biased        3,880
IONC      Ion channel targeted                     1,956
KINA      Kinase biased                            3,892
NUCR      Nuclear receptor targeted                152
PROT      Protease targeted                        684
Total                                              324,059
Broad (~102K)
Code  Collection name                          # of compounds
CBLI  DOS                                      61,004
CHRM  Chromatin biased                         2,327
BIOA  Bioactives                               4,920
STRD  Non-commercial stereochemically diverse  2,115
NATP  Natural products - purified              1,291
KINA  Kinase biased                            12,119
COMA  Commercial - Forma                       1,133
COMB  Commercial - other                       12,278
GPCR  G protein-coupled receptor biased        5,000
Total                                          102,187
[Figure: MW vs. ALogP; axes: Molecular Weight and ALogP; collections shown: COMA, CBLI, STRD, NATP]
T. E. Nielsen, S. L. Schreiber, Angew Chem Int Ed Engl 47, 48 (2008)
• Objective: Generate novel small-molecule probes
• Assays sourced from the scientific community
- “Fast-track” (specific objective in NIH grant, e.g. R01)
- “HTS-ready” (R03)
- Assay development (R01)
• Perform HTS, secondary screens and medicinal chemistry
• Data shared globally through PubChem, published probe reports
MLPCN: Molecular Libraries Probe Production Centers Network
https://mli.nih.gov/mli
CBIP v1.0 – Transactional, Data Producer focused
Reagent Inventory
Cellario
Library Design Chemistry ELN
CBIP Library Production
CBIP Cleavage & Formatting
CBIP Compound Management
CBIP Analytical Viewer
HTS/Data Analysis
Analytical Data CBIP Screening LIMS
Walk-up Instruments
Purchasing
Screening Data Warehouse
Data Visualization & Analysis
Chemistry/Cheminformatics Analytical Chemistry
Compound Management HTS/Data analysis
Automation Integration
CBIP Compound Registration
CBIP v1.0 – March 2010 – High Level Architecture
800K samples, 450K structures, 30M observations
Challenges: Compound Registration, PubChem Data Submission, Results Data Querying
ChemAxon products: Marvin, JChem API, Standardizer, Oracle Cartridge, Pipeline Pilot Plug-in, Chemical Terms, Instant JChem
CBIP v1.5 – Sept. 2011 – High Level Architecture
1.8M samples, 520K structures, 77M observations
• Compound Registration integrated with ELN
• PubChem Data Submission automated via in-house tool
• Hit calling and Cherry Pick workflows via TIBCO Spotfire
• Improved SAR via Seurat and SSAR Visualization Tool
SD File, Mol File
Conversion to SMILES
Standardization
Desalting
Calculate chemical properties
Pipeline Pilot canonical tautomer
Unspecified chirality flag
Structure registration and Broad ID assignment
Compound Registration
Automated single and batch compound registration from CambridgeSoft ELN via Registration Service
Marvin for editing and rendering
Standardizer for Standardization & Desalting
JChem API for properties calculation (mass and chirality detection)
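The registration steps above (desalting, then structure lookup and ID assignment) can be sketched roughly as follows. This is a toy illustration, not the Registration Service itself: it assumes structures arrive as SMILES strings, uses the raw string as the uniqueness key where the real pipeline relies on Standardizer and a canonical tautomer, and the `Registry` class, the `strip_salts` helper, and the `CPD-` ID format are all invented for the example.

```python
def strip_salts(smiles: str) -> str:
    """Keep the largest dot-separated fragment of a SMILES string.

    Size is approximated by counting letters, a crude stand-in for a
    real desalting step.
    """
    fragments = smiles.strip().split(".")
    return max(fragments, key=lambda frag: sum(ch.isalpha() for ch in frag))


class Registry:
    """Hypothetical in-memory registry: one ID per distinct parent structure."""

    def __init__(self) -> None:
        self._ids = {}

    def register(self, smiles: str):
        """Return (compound_id, is_new) for the desalted structure."""
        parent = strip_salts(smiles)
        if parent in self._ids:
            return self._ids[parent], False
        compound_id = f"CPD-{len(self._ids) + 1:06d}"  # made-up ID format
        self._ids[parent] = compound_id
        return compound_id, True


registry = Registry()
print(registry.register("CC(=O)[O-].[Na+]"))  # sodium acetate, registered as new
print(registry.register("CC(=O)[O-]"))        # same parent, so the ID is reused
```

Because the key is the unmodified string, two different SMILES spellings of the same molecule would get separate IDs here; that gap is what the standardization and canonical tautomer steps close in a real system.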
Cherry Pick Workflow in TIBCO Spotfire
Scientists choose compounds for follow up using activity, structure, and chemical properties
Examine Hit Calling Results
Filter out undesirable substructures
Filter by structure properties
Find related compounds of interest
Cherry Pick compounds for follow up
Compound collection is pre-annotated by substructure in database to improve performance
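A minimal sketch of the pre-annotation idea: run the expensive substructure matching once per compound at load time, store the resulting flags, and answer filter requests from the stored flags alone. The alert names, the `annotate` and `without_alerts` functions, and the compound IDs are hypothetical. The `naive_match` function is a plain substring test on SMILES that only stands in for a real substructure search; it is not chemically valid.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical alert patterns. A production system would store real
# substructure queries and evaluate them with a chemistry engine.
ALERTS = {"nitro": "[N+](=O)[O-]", "acyl_chloride": "C(=O)Cl"}


def naive_match(smiles: str, pattern: str) -> bool:
    """Toy matcher: plain substring test, NOT a real substructure search."""
    return pattern in smiles


def annotate(compounds: Dict[str, str],
             matcher: Callable[[str, str], bool] = naive_match) -> Dict[str, Set[str]]:
    """Run every alert once per compound and store the matching alert names."""
    return {
        cid: {name for name, pat in ALERTS.items() if matcher(smi, pat)}
        for cid, smi in compounds.items()
    }


def without_alerts(hits: Iterable[str], annotations: Dict[str, Set[str]],
                   excluded: Set[str]) -> List[str]:
    """Filter hits using only the stored annotations (no matching at query time)."""
    return [cid for cid in hits if not (annotations[cid] & excluded)]


compounds = {
    "C1": "c1ccccc1[N+](=O)[O-]",  # nitrobenzene
    "C2": "CC(=O)Cl",              # acetyl chloride
    "C3": "CCO",                   # ethanol
}
annotations = annotate(compounds)  # done once, at load time
print(without_alerts(["C1", "C2", "C3"], annotations, {"nitro"}))
```

The trade-off is the usual one for precomputation: filters become set lookups, but the annotations must be refreshed whenever compounds or alert definitions change.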
Chemical properties are pre-computed using ChemAxon’s Chemical Terms during ETL of compounds into database
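The slides do not show the Chemical Terms expressions or the ETL code, so the following is only a sketch of the general pattern: compute a property once while loading and store it in a column, so that later filters are plain comparisons. It assumes each record carries a molecular formula, computes molecular weight from a small table of standard atomic weights, and uses a hypothetical `compound` table in an in-memory SQLite database.

```python
import re
import sqlite3

# Standard atomic weights for a few common elements (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "S": 32.06, "Cl": 35.45}


def molecular_weight(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C9H8O4'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def load_compounds(conn, records):
    """ETL step: compute properties once and store them next to each compound."""
    conn.execute("CREATE TABLE compound (id TEXT PRIMARY KEY, formula TEXT, mw REAL)")
    conn.executemany(
        "INSERT INTO compound VALUES (?, ?, ?)",
        [(cid, formula, molecular_weight(formula)) for cid, formula in records],
    )


conn = sqlite3.connect(":memory:")
load_compounds(conn, [("C1", "C9H8O4"), ("C2", "C2H6O"), ("C3", "C46H62N4O11")])

# At query time the filter is a plain comparison on the stored column.
rows = conn.execute("SELECT id FROM compound WHERE mw <= 500 ORDER BY id").fetchall()
print([r[0] for r in rows])
```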
Similarity search via the JChem cartridge is used to find additional compounds of interest
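The cartridge query itself is not shown in the slides. As a generic illustration of fingerprint similarity search, the sketch below ranks compounds by the Tanimoto coefficient, a common metric for this purpose. The fingerprints are made-up sets of bit positions, and the `similar_compounds` function and the 0.7 threshold are assumptions for the example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_compounds(query, library, threshold=0.7):
    """Return (id, score) pairs at or above the threshold, best first."""
    scored = ((cid, tanimoto(query, fp)) for cid, fp in library.items())
    return sorted((hit for hit in scored if hit[1] >= threshold),
                  key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints, each given as the set of bit positions that are on.
library = {
    "C1": frozenset({1, 4, 7, 9, 12}),
    "C2": frozenset({1, 4, 7, 9, 15}),
    "C3": frozenset({2, 3, 20}),
}
hit_fp = frozenset({1, 4, 7, 9, 12, 15})
print(similar_compounds(hit_fp, library))
```

A database cartridge does the same comparison inside the database, which avoids pulling every fingerprint into the client.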
Nielsen, T. E.; Schreiber, S. L. Angew. Chem. Int. Ed. 2008, 47, 48-56
What is Diversity-Oriented Synthesis?
DOS is a strategy for efficient synthesis of collections of small molecules having skeletal and stereochemical diversity with defined coordinates in chemical space
SSAR Visualization Tool
2 out of 8 stereoisomers active
RSS SRR RRR SSS
RSR SRS RRS SSR
Tools for Stereo/structure-activity analysis
The tool reads activity data from a file, pulls structures from the database, and uses the JChem API to render and sort building blocks
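The eight labels on the slide correspond to every R/S assignment across three stereocenters, and each adjacent pair on the slide (for example RSS and SRR) is a pair of enantiomers. A short sketch of that enumeration follows; the function names are illustrative, and it assumes no molecular symmetry collapses any of the combinations.

```python
from itertools import product

SWAP = str.maketrans("RS", "SR")


def stereoisomers(n_centers: int) -> list:
    """All R/S assignments for n stereocenters, e.g. 'RSS'."""
    return ["".join(labels) for labels in product("RS", repeat=n_centers)]


def enantiomer(label: str) -> str:
    """Mirror image: every center inverted."""
    return label.translate(SWAP)


def enantiomer_pairs(n_centers: int) -> list:
    """Group the stereoisomers into unordered enantiomer pairs."""
    pairs = {tuple(sorted((s, enantiomer(s)))) for s in stereoisomers(n_centers)}
    return sorted(pairs)


print(len(stereoisomers(3)))
print(enantiomer_pairs(3))
```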
SSAR Visualization Tool
[Figure: grid of R1 by R2 building blocks; legend bins: >90%, 75-90%, 50-75%, <50%]
QC Analysis of DOS Libraries
Repurposing Stereo/structure-activity analysis tools
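The legend bins on the QC slide can be reproduced with a small binning function. This sketch assumes the percentages are per-compound QC values such as purity, which the slide does not state, and it assumes boundary values fall into the lower bin (exactly 90 counts as 75-90%). The building-block names and values are invented.

```python
from collections import Counter


def qc_bin(percent: float) -> str:
    """Map a QC percentage to one of the four legend bins."""
    if percent > 90:
        return ">90%"
    if percent >= 75:
        return "75-90%"
    if percent >= 50:
        return "50-75%"
    return "<50%"


# Hypothetical results keyed by (R1, R2) building-block pair.
results = {
    ("R1-a", "R2-a"): 96.0,
    ("R1-a", "R2-b"): 82.5,
    ("R1-b", "R2-a"): 61.0,
    ("R1-b", "R2-b"): 34.0,
}
grid = {pair: qc_bin(value) for pair, value in results.items()}
print(Counter(grid.values()))
```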
CBIP v2.0+ – 2012/2013 – High Level Architecture
Resolve Current Challenges:
• Research Data Management
• Improve Results Data Querying
• Dimensional Data Model
• Modularize transactional CBIP
• Transition to Groovy & Grails
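To make the dimensional data model item concrete, here is a minimal star-schema sketch: one fact table of results joined to compound and assay dimensions, queried by grouping on dimension attributes. The table names, column names, assay names, and values are all hypothetical and are not the CBIP schema; only the collection codes (CBLI, KINA) come from the collection tables earlier in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_compound (compound_key INTEGER PRIMARY KEY,
                           compound_id TEXT, collection TEXT);
CREATE TABLE dim_assay    (assay_key INTEGER PRIMARY KEY, assay_name TEXT);
CREATE TABLE fact_result  (compound_key INTEGER REFERENCES dim_compound,
                           assay_key INTEGER REFERENCES dim_assay,
                           pct_inhibition REAL);
""")
conn.executemany("INSERT INTO dim_compound VALUES (?, ?, ?)",
                 [(1, "C1", "CBLI"), (2, "C2", "CBLI"), (3, "C3", "KINA")])
conn.executemany("INSERT INTO dim_assay VALUES (?, ?)",
                 [(1, "assay_A"), (2, "assay_B")])
conn.executemany("INSERT INTO fact_result VALUES (?, ?, ?)",
                 [(1, 1, 80.0), (2, 1, 60.0), (3, 1, 10.0), (1, 2, 5.0)])

# Slice the facts by two dimensions: assay and compound collection.
query = """
SELECT a.assay_name, c.collection, COUNT(*) AS n,
       AVG(f.pct_inhibition) AS mean_inhibition
FROM fact_result f
JOIN dim_compound c ON c.compound_key = f.compound_key
JOIN dim_assay a    ON a.assay_key = f.assay_key
GROUP BY a.assay_name, c.collection
ORDER BY a.assay_name, c.collection
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Keeping descriptive attributes in the dimension tables means new groupings (by collection, by assay, or both) need only a different GROUP BY, not a new results table.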
Data Producers
Data Consumers
Multi-dimensional Querying & Reporting
Multi-dimensional Data Analysis
SSAR Visualization Tool
Accelrys Pipeline Pilot
TIBCO Spotfire Schrodinger Seurat
Results Dimensional Warehouse
Other Data
Content
Unstructured Data/Files
Transactional/Operational
Results DB
PubChem Submission Tool
Data Dictionary (MDM)
Catalog of Assay Protocols
Compound Registration
DOS Chemistry
Analytical
ADME
Compound Management
CambridgeSoft ELN NuGenesis SDMS
Screening, Instrument Automation, Confirmation & LTS Assays
Genedata Assay Analyzer
Biology-specific ELN
Accelrys Pipeline Pilot
Compound Ordering
Genedata Condoseo LTS
SSAR Visualization Tool
Accelrys Pipeline Pilot
TIBCO Spotfire Schrodinger Seurat
Lessons Learned
• It’s great to have a toolkit!
– We’ve successfully integrated many ChemAxon tools across many systems
– The Java API is extensive and well-documented
• ChemAxon’s support is excellent
– Forum is a great resource
– Frequent releases are nice…
• …but we can’t keep up
– Testing cartridge upgrades is expensive
– So many integration points require extensive testing for every upgrade
• We would love to have Grails plug-ins from ChemAxon
– Integration of Molecule and Reaction data types into GORM
– Easy integration of Marvin into GSPs
Broad Chemical Biology Platform
Cheminformatics: Lakshmi Akella, Carol Mulrooney
Informatics
Platform Director: Mike Foley
Program Director: Stuart Schreiber
Screening: Michelle Palmer
Medicinal Chemistry: Benito Munoz, Siva Dandapani, Rich Heidebrecht
Funding: Broad, NIH (MLPCN, CMLD, GBDD, CTD2)
DOS: Lisa Marcaurelle, Jeremy Duvall, Eamon Comer, Sarathy Kesavan, Jason Lowe, Baudoin Gerard
Mary Pat Happ, Dennis Moccia, Gil Walzer, Karen Emmith, Evan Mulligan, David Lahr, Michael Quintin
Steve Brudz, Daniel Durkin, Jacob Asiedu, Xiaorong Xiang, Jeri Levine, Peter Chen, Ben Alexander
Novel Therapeutics: Christina Scherer
Lead Discovery: Joshua Bittker
Assay Development: Jose Perez
Analytical Chemistry: Stephen Johnston
Outreach Lab: Nicola Tolliday
HTS: James Spoonamore
Automation: Gavin McKeown, Tom Hasaka
Enzymology: Yan-Ling Zhang
Compound Management: John McGrath
Project Management: Patti Aha
Stanley Center: Ed Holson, Florence Wagner
Andrea De Souza, Director