[vermeer]slides/IR/DataMining.ppt© Gasteiger et al.C3
Data Mining in Chemistry
Markus C. Hemmer
Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
D-91054 Erlangen, Germany
What is Data Mining?
Data mining is
an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships.
"...a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Srikant, Agrawal, 1996)
Amount of Information in Chemistry
[Figure: Yearly number of documents in Chemical Abstracts, 1920 to 2000 (y-axis: thousands, scale 100 to 800)]
[Figure: Number of registered substances, 1970 to 2000 (y-axis: millions, scale 4 to 24)]
The Chemical Language
[Figure: 2D structure diagram of the molecule]
Molecular formula: C10H13Cl2O3PS
Common name: Dichlofenthion
Systematic name: Phosphorothioic acid O-2,4-dichlorophenyl O,O-diethyl ester
SMILES: ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl
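The SMILES line notation on this slide encodes the complete constitution of the molecule, so the molecular formula can be derived from it mechanically. The following is a minimal sketch, assuming a restricted SMILES subset only (uppercase organic-subset atoms, bonds `-`, `=`, `#`, branches, single-digit ring closures; no aromatic lowercase atoms, bracket atoms, charges, or stereo) and default valences for implicit hydrogens. The function name `molecular_formula` and the valence table are illustrative assumptions, not the API of CACTVS or any other toolkit.

```python
from collections import Counter

# Default valences for the SMILES "organic subset" (assumption: neutral atoms, no radicals).
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def molecular_formula(smiles: str) -> str:
    """Hill-order formula for a restricted SMILES subset (no aromatics or brackets)."""
    atoms, bond_sum = [], []   # element symbol and summed explicit bond order per atom
    branches, rings = [], {}   # branch stack; open ring digit -> (atom index, bond order)
    prev, pending, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles.startswith(("Cl", "Br"), i):   # two-letter atoms first
            symbol, i = smiles[i:i + 2], i + 2
        elif ch in "BCNOPSFI":
            symbol, i = ch, i + 1
        else:
            if ch in BOND_ORDERS:
                pending = BOND_ORDERS[ch]
            elif ch == "(":
                branches.append(prev)            # remember the branch point
            elif ch == ")":
                prev = branches.pop()            # return to the branch point
            elif ch.isdigit():
                if ch in rings:                  # ring closure: bond back to the opener
                    j, order = rings.pop(ch)
                    order = max(order, pending)
                    bond_sum[j] += order
                    bond_sum[prev] += order
                else:                            # ring opening
                    rings[ch] = (prev, pending)
                pending = 1
            else:
                raise ValueError(f"unsupported SMILES character {ch!r}")
            i += 1
            continue
        atoms.append(symbol)
        bond_sum.append(0)
        if prev is not None:                     # bond the new atom to the previous one
            bond_sum[prev] += pending
            bond_sum[-1] += pending
        prev, pending = len(atoms) - 1, 1
    if rings:
        raise ValueError("unclosed ring bond")
    counts = Counter(atoms)
    for symbol, used in zip(atoms, bond_sum):
        # Implicit hydrogens fill up to the lowest default valence covering the explicit bonds.
        valence = next((v for v in VALENCES[symbol] if v >= used), used)
        counts["H"] += valence - used
    # Hill system: C, then H, then the other elements alphabetically.
    order = (["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
             if "C" in counts else sorted(counts))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e])
```

Applied to the SMILES above, `molecular_formula("ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl")` returns `"C10H13Cl2O3PS"`, matching the molecular formula on the slide. Real toolkits also handle aromaticity, charges, isotopes, and stereochemistry, which this sketch deliberately omits.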
Search for Cancerostatic Drugs
[Figure: similar substrates; protein/substrate complex]
Representation of Properties
[Figure labels: chemical reactivity; biological activity]
Comparison of Steroid Surfaces
[Figure: molecular surfaces of 3,20-Allopregnanedione and 3,20-Pregnanedione]
Research and Projects at the CCC
TeleSpec
Evaluation of Reactions
Drug Design
Synthesis Design
Structure/Spectrum Correlation
Dissertation online
SOL
Biochemical Pathways
ChemVis
QSAR/QSPR
VS-C
Software Development at the CCC
CORINA 3D structure generator
PETRA atomic property calculator
ARC descriptor generator
KMAP Kohonen network generator
CACTVS chemical information system
EROS reaction prediction expert system
CORA reaction classification system
WODCA synthesis design expert system
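KMAP generates Kohonen networks, i.e. self-organizing maps that project high-dimensional descriptor vectors onto a 2D grid of neurons while preserving neighbourhood relations. To illustrate the idea, here is a minimal, generic self-organizing map in plain Python. It is not KMAP's implementation: the Gaussian neighbourhood, the linearly decaying learning rate and radius, the initialization from randomly chosen training vectors, and the names `train_som` and `best_matching_unit` are all assumptions made for this sketch.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x (Euclidean)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors; returns weights[r][c]."""
    rng = random.Random(seed)                    # fixed seed for reproducible maps
    dim = len(data[0])
    # Initialize each neuron with a copy of a randomly chosen training vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0               # initial neighbourhood radius
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2      # squared distance on the grid
                    h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])      # pull neuron towards x
            step += 1
    return weights
```

In a chemical setting each input vector would typically be a molecular descriptor, and compounds that land on the same or neighbouring neurons are candidates for sharing structural features or properties.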
Data Mining Dienst – Chemie (Data Mining Service – Chemistry)
Pattern Recognition
Substructure Search
Similarity Search
Diversity Search
Pattern Analysis
Property Search
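Similarity and diversity search are commonly built on fingerprint comparisons. As a hedged sketch, assume each compound is already encoded as a set of "on" bit positions in a structural fingerprint; the Tanimoto coefficient then ranks library compounds against a query, and a greedy MaxMin selection picks a structurally diverse subset. The fingerprint encoding, the 0.7 threshold, and the names `tanimoto`, `similarity_search`, and `maxmin_diversity` are assumptions for illustration; the Data Mining Service may use other descriptors and measures.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 1.0                      # two empty fingerprints count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(query: Set[int], library: Dict[str, Set[int]],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with Tanimoto >= threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


def maxmin_diversity(library: Dict[str, Set[int]], k: int) -> List[str]:
    """Greedy MaxMin: repeatedly add the compound farthest from everything already picked."""
    names = list(library)
    picked = [names[0]]                 # arbitrary start: the first compound
    while len(picked) < min(k, len(names)):
        best = max((n for n in names if n not in picked),
                   key=lambda n: min(1.0 - tanimoto(library[n], library[p]) for p in picked))
        picked.append(best)
    return picked
```

For a toy library `{"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {10, 11, 12}, "D": {1, 2, 3, 4, 6}}`, a search with A's fingerprint returns `[("A", 1.0), ("D", 0.8)]`, while `maxmin_diversity(library, 3)` returns `["A", "C", "B"]`: the dissimilar C is picked before the near-duplicates of A.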
Information Sources
Simulation
Analysis
Databases
Calculation
The Concept of Data Mining Service – Chemistry
Acknowledgements
Chemical Information: Dr. Thomas Engel
Databases & Visualization: Dr. Wolf-Dietrich Ihlenfeldt, Frank Oellien
Expert Systems: Achim Herwig
Genetic Algorithms: Dr. Sandra Handschuh
Neural Networks: Dr. Andreas Teckentrup, Dr. Lothar Terfloth
Spectroscopy: Dr. Paul Selzer, Thomas Kostka
Structures & Properties: Thomas Kleinöder, Christof Schwab
Structure Coding: Dr. Joao Aires de Sousa, Dr. Valentin Steinhauer
Synthesis Planning: Dr. Matthias Pförtner, Markus Sitzmann
Team Coordination: Prof. Dr. Johann Gasteiger
Contact Information
Email: [email protected]
WWW: http://www2.chemie.uni-erlangen.de