chemmodlab: a web-based cheminformatics modeling laboratory s. stanley young + eccr and chemspider...
TRANSCRIPT
ChemModLab: A Web-based Cheminformatics Modeling Laboratory
S. Stanley Young + ECCR and ChemSpider Teams
ChemSpider: A Web-based Chemical Informatics Resource
What is ChemSpider?
ChemSpider is a molecular structure-centric web service for chemists:
• Chemical structure drawing, manipulation, visualization, modeling & databasing
• Web location to deposit, curate and enhance data associated with chemical structures
• Web structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data
How do people generally use ChemSpider?
• Searching for chemical structures, in rank order, via:
  - Registry numbers, trade names and synonyms
  - Structure identifiers such as SMILES or InChI
  - Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  - Systematic names: IUPAC or CAS Index name
• Generation of physicochemical properties
• Text-based searching of Open Access articles
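As an illustration of the mass-based searches mentioned above, here is a minimal sketch that computes a monoisotopic mass from a molecular formula and filters a tiny in-memory table by a mass window. The record table, function names, and tolerance are hypothetical and chosen only for this example; this is not ChemSpider's API or its actual search implementation.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def search_by_mass(records, target, tol=0.005):
    """Return names of records whose mass lies within target +/- tol."""
    return [name for name, formula in records
            if abs(monoisotopic_mass(formula) - target) <= tol]

# Hypothetical mini-database of (name, molecular formula) pairs.
RECORDS = [("caffeine", "C8H10N4O2"), ("aspirin", "C9H8O4"), ("glucose", "C6H12O6")]

hits = search_by_mass(RECORDS, target=194.0804)
```

A narrow tolerance matters here: aspirin and glucose share the same nominal mass (180) and are only separated by their exact masses.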
ChemSpider Status August 2007
• Online database of over 16.5 million structures
• Systems in place for:
  - Single structure and data collection depositions
  - Association of analytical data with structures
  - Ability to curate data for each individual record
• Indexing of and integration to:
  - Over 70 individual databases
  - Patents from the US, European and Asian Patent offices
• Text-based searching of over 50,000 Open Access articles
• Over a thousand unique users access ChemSpider per day
External Integrations - Wikipedia
The links between Wikipedia and ChemSpider are formed automatically
What is ChemModLab?
ChemModLab is a Web Service for building and evaluating QSAR models.
• Send your data: assay results and SD file.
• Use any or all of five descriptor types (2D). (Use your own descriptors)
• Use any or all of 16 statistical modeling methods.
• Predict potency of untested compounds.
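The build-then-predict workflow above can be sketched with one of the simplest methods on the list, k-nearest neighbors. This is a minimal illustration only: the descriptor vectors, potency values, and function names are made up for the example and do not reflect ChemModLab's descriptors or code.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Average the potencies of the k tested compounds nearest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical descriptor vectors (one row per tested compound)
# and the corresponding assay potencies.
train_x = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
train_y = [8.0, 7.5, 3.0, 3.5, 5.0]

# Predict the potency of an untested compound from its descriptors.
predicted = knn_predict(train_x, train_y, query=[0.05, 0.95], k=2)
```

The untested compound sits next to the two most potent training compounds, so its prediction is the mean of their potencies.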
ChemModLab Modeling Methods
16 Statistical Modeling Methods
• Trees: RandomForest, rpart, tree
• Neural networks
• k-nearest neighbors
• Support vector machines
• Partial least squares
• Partial least squares with linear discriminant analysis
• Least angle regression
• Ridge regression
• Elastic net
• Principal components regression
• Family ensemble of k-nearest neighbors, using 70% selection
• Family ensemble of tree, using 70% selection
• Family ensemble of rpart, using 70% selection
• randomForest using 70% selection
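The slides do not say what "70% selection" selects. The sketch below assumes one plausible reading: each ensemble member is a nearest-neighbor model built on a random 70% of the descriptor columns, and the member predictions are averaged. Selecting 70% of the compounds instead would be an equally reasonable reading. All names and data are hypothetical.

```python
import random

def nearest_neighbor(train_x, train_y, query, cols):
    """1-nearest-neighbor prediction using only the descriptor columns in cols."""
    def dist(row):
        return sum((row[c] - query[c]) ** 2 for c in cols)
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i]))
    return train_y[best]

def ensemble_predict(train_x, train_y, query,
                     n_members=25, fraction=0.7, seed=0):
    """Average member predictions; each member sees a random column subset.

    ASSUMPTION: 'fraction' is applied to descriptor columns, not compounds.
    """
    rng = random.Random(seed)
    n_cols = len(train_x[0])
    n_keep = max(1, round(fraction * n_cols))
    votes = []
    for _ in range(n_members):
        cols = rng.sample(range(n_cols), n_keep)
        votes.append(nearest_neighbor(train_x, train_y, query, cols))
    return sum(votes) / len(votes)

# Hypothetical binary descriptors and activity labels (1.0 = active).
train_x = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
train_y = [1.0, 1.0, 0.0, 0.0]

# The result is the fraction of members that vote "active".
score = ensemble_predict(train_x, train_y, query=[1, 0, 1, 1])
```

Fixing the seed makes the random subsets, and therefore the score, reproducible between runs.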
ECCR@NCSU + ChemSpider
Plan
User submits data to ChemModLab to get QSAR Model(s).
Model is sent to ChemSpider.
ChemSpider computes a “virtual screen”.
The hit-list is clustered and sent to the user.
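The last two steps of the plan, scoring a library and clustering the hit list, can be sketched as follows. This is an illustration under stated assumptions: fingerprints are stored as sets of bit indices, the "model" is a stand-in scoring function, and leader clustering on Tanimoto similarity is one simple choice. The slides do not say which clustering method is used, and none of these names come from ChemSpider or ChemModLab.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def virtual_screen(library, score, n_hits):
    """Rank library compounds by model score and keep the top n_hits."""
    ranked = sorted(library, key=lambda name: score(library[name]), reverse=True)
    return ranked[:n_hits]

def leader_cluster(names, library, threshold=0.5):
    """Put each hit in the first cluster whose leader is similar enough."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if tanimoto(library[name], library[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar leader: start a new cluster
    return clusters

# Hypothetical library: compound name -> set of fingerprint bits.
library = {
    "cpd1": {1, 2, 3, 4},
    "cpd2": {1, 2, 3, 5},
    "cpd3": {7, 8, 9},
    "cpd4": {7, 8, 10},
    "cpd5": {20, 21},
}

# Stand-in for a QSAR model: count bits shared with a reference active.
reference = {1, 2, 3, 7, 8}

def model(fingerprint):
    return len(fingerprint & reference)

hits = virtual_screen(library, model, n_hits=4)
clusters = leader_cluster(hits, library)
```

Clustering the hit list groups structurally similar hits, so the user can review one representative per cluster rather than every compound.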
Model Evaluation
Take detailed looks at which models?
AID348 (NCGC):
• KNN – Ph
• ENet – CAP
• RF – B#
• RF – CAP
• RF – FF
• Tree – CAP
• Tree – Ph
• Tree – FF
• PLS – CAP
Summary
1. ChemSpider is a web chemical informatics center.
2. ChemModLab is a free web service for QSAR.
3. Together they support sophisticated virtual screening.
* ChemModLab is supported by the NCI RoadMap project.
ECCR@NCSU Group and ChemSpider Group
ChemModLab Team
Jacqueline M. Hughes-Oliver, Atina D. Brooks, Gary W. Howell, Kirtesh Patil, Stan Young, Qianyi Zhang
ChemSpider Team
Antony Williams (project lead)
A rotating team of advisors and developers including many contributions from the Open Source community
eccr.stat.ncsu.edu www.chemspider.com