[vermeer]slides/IR/DataMining.ppt © Gasteiger et al. C 3 Data Mining in Chemistry Markus C. Hemmer Computer-Chemie-Centrum, Universität Erlangen-Nürnberg D-91054 Erlangen, Germany

[vermeer]slides/IR/DataMining.ppt© Gasteiger et al.C3

Data Mining in Chemistry

Markus C. Hemmer

Computer-Chemie-Centrum, Universität Erlangen-Nürnberg

D-91054 Erlangen, Germany

What is Data Mining?

Data Mining is an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships.

"...a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Srikant, Agrawal, 1996)

Amount of Information in Chemistry

[Figure: Yearly number of documents in Chemical Abstracts, in thousands (axis 100 to 800), 1920 to 2000]

[Figure: Number of registered substances, in millions (axis 4 to 24), 1970 to 2000]

The Chemical Language

[Structure diagram]

C10H13Cl2O3PS

Dichlophenthion

Phosphorothioic acid O-2,4-dichlorophenyl O,O-diethyl ester

ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl
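The slide shows the same compound as a molecular formula, a trivial name, a systematic name, and a SMILES string. As a minimal sketch of how two of these notations can be cross-checked, the Python snippet below counts the non-hydrogen atoms in the SMILES string and compares them with the molecular formula. The function names `parse_formula` and `heavy_atoms_from_smiles` are hypothetical helpers introduced for illustration, not part of any software named in the slides, and the SMILES handling is deliberately simplified: it assumes only uppercase, unbracketed atoms from the organic subset, which is sufficient for this string but not for SMILES in general.

```python
import re


def parse_formula(formula):
    """Turn a molecular formula such as 'C10H13Cl2O3PS' into element counts."""
    counts = {}
    # An element symbol is one uppercase letter plus an optional lowercase
    # letter, followed by an optional count (a missing count means 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def heavy_atoms_from_smiles(smiles):
    """Count non-hydrogen atoms in a simple SMILES string.

    Simplifying assumption: atoms are written as uppercase organic-subset
    symbols without brackets, so hydrogens stay implicit and are not counted.
    """
    counts = {}
    # 'Cl' and 'Br' are listed first so they are not misread as 'C' or 'B'.
    for symbol in re.findall(r"Cl|Br|[BCNOPSFI]", smiles):
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


formula_counts = parse_formula("C10H13Cl2O3PS")
smiles_counts = heavy_atoms_from_smiles("ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl")

# Drop hydrogen from the formula before comparing, because the SMILES
# string does not list hydrogens explicitly.
heavy_formula = {el: n for el, n in formula_counts.items() if el != "H"}
print(heavy_formula == smiles_counts)
```

Under these assumptions, both notations give 10 carbon, 2 chlorine, 3 oxygen, 1 phosphorus, and 1 sulfur atom. A real chemical information system would use a full SMILES parser instead of a regular expression.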

Search for Cancerostatic Drugs

similar substrates

protein/substrate complex

Representation of Properties

chemical reactivity

biological activity

[Structure diagram]

Non-linear Projection onto a Torus

Comparison of Steroid Surfaces

3,20-Allopregnanedione

3,20-Pregnanedione

Descriptor of a Polycyclic System

Visualization of Multidimensional Data

Research and Projects at the CCC

TeleSpec

Evaluation of Reactions

Drug Design

Synthesis Design

Structure/Spectrum Correlation

Dissertation online

SOL

Biochemical Pathways

ChemVis

QSAR/QSPR

VS-C

Software Development at the CCC

CORINA 3D structure generator

PETRA atomic property calculator

ARC descriptor generator

KMAP Kohonen network generator

CACTVS chemical information system

EROS reaction prediction expert system

CORA reaction classification system

WODCA synthesis design expert system

Data Mining Dienst – Chemie (Data Mining Service – Chemistry)

Pattern Recognition

Substructure Search

Similarity Search

Diversity Search

Pattern Analysis

Property Search

Information Sources

Simulation

Analysis

Databases

Calculation

The Concept of Data Mining Service - Chemistry

Descriptor Software

Searching a Substructure

substructure search

Acknowledgements

Chemical Information: Dr. Thomas Engel

Databases & Visualization: Dr. Wolf-Dietrich Ihlenfeldt, Frank Oellien

Expert Systems: Achim Herwig

Genetic Algorithms: Dr. Sandra Handschuh

Neural Networks: Dr. Andreas Teckentrup, Dr. Lothar Terfloth

Spectroscopy: Dr. Paul Selzer, Thomas Kostka

Structures & Properties: Thomas Kleinöder, Christof Schwab

Structure Coding: Dr. Joao Aires de Sousa, Dr. Valentin Steinhauer

Synthesis Planning: Dr. Matthias Pförtner, Markus Sitzmann

Team Coordination: Prof. Dr. Johann Gasteiger

Contact Information

Email:[email protected]

[email protected]

WWW: http://www2.chemie.uni-erlangen.de