diversity filtering
DESCRIPTION
Compounds diversity filtering using RDKit @ 2nd RDKit UGM 2013
TRANSCRIPT
Diversity Filtering
Christos Kannas
University of Cyprus
2nd RDKit UGM
Outline
• Introduction
• Methodology
• Implementation
3rd October, 2013
Introduction
• The need to select all the diverse molecules from a dataset (based on a threshold).
• Divide the dataset into diverse molecules and similar molecules.
Methodology
• 2D Fingerprints
• Similarity Metric: Tanimoto, Dice
• Similarity Matrix
  • Diagonal has 1…
    • Make diagonal 0, or
    • Skip it…
• Max/Mean/Min Similarity (row/column based)
• Divide molecules into 2 datasets
  • One with diverse molecules (below similarity threshold)
  • One with similar molecules (above similarity threshold)
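The two metrics and the similarity matrix can be sketched in plain Python. This is an illustration only, not the code shown in the talk: fingerprints are modelled here as sets of on-bit indices, and the names `tanimoto`, `dice`, and `similarity_matrix` are hypothetical. With real RDKit fingerprints, `DataStructs.TanimotoSimilarity` and `DataStructs.DiceSimilarity` would normally take the place of the two hand-written metrics.

```python
from typing import Callable, List, Set

Fingerprint = Set[int]  # assumption: a fingerprint is the set of its on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Fingerprint, b: Fingerprint) -> float:
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_matrix(
    fps: List[Fingerprint],
    metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
) -> List[List[float]]:
    """Symmetric pairwise similarity matrix with the diagonal set to 0.

    The self-similarity of every molecule is 1, so the diagonal is zeroed
    here (the "make diagonal 0" option) to keep it out of max/mean/min.
    """
    n = len(fps)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(fps[i], fps[j])
    return matrix


# Toy fingerprints (made-up bit indices, not derived from real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
sim = similarity_matrix(fps)
```

The other option from the slide, skipping the diagonal, gives the same max and min; it differs for the mean only in whether the zeroed entry is counted in the denominator.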
Implementation 1/4
• Diversity Score Function [O(n²)]
• Inputs:
  • Query Molecules == Reference Molecules
  • Similarity Metric [Tanimoto, Dice]
  • Scoring Method [Max, Mean, Min]
• Output:
  • Diversity Score
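A function with these inputs and this output could look roughly like the sketch below. It is not the source shown in the talk. It assumes fingerprints are sets of on-bit indices, that the score of a molecule is its max, mean, or min similarity to every other molecule in the same set, and that self-comparisons are skipped; `diversity_scores`, `SCORERS`, and `tanimoto` are hypothetical names.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set

Fingerprint = Set[int]  # assumption: a fingerprint is the set of its on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient on sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Scoring methods named on the slide, mapped to plain Python aggregators.
SCORERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "mean": mean,
    "min": min,
}


def diversity_scores(
    fps: List[Fingerprint],
    metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
    method: str = "max",
) -> List[float]:
    """Score each molecule against all others in the same set, O(n^2).

    Query and reference molecules are the same list, so the comparison
    of a molecule with itself (similarity 1) is skipped.
    """
    scorer = SCORERS[method]
    scores = []
    for i, query in enumerate(fps):
        sims = [metric(query, ref) for j, ref in enumerate(fps) if j != i]
        scores.append(scorer(sims) if sims else 0.0)
    return scores


# Toy fingerprints (made-up bit indices, not derived from real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
max_scores = diversity_scores(fps, method="max")
mean_scores = diversity_scores(fps, method="mean")
min_scores = diversity_scores(fps, method="min")
```

With the "max" method a molecule counts as diverse only if it is dissimilar to every other molecule, which is the strictest of the three choices; "min" is the most permissive.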
Implementation 2/4
• Show source code for fingerprint similarity/diversity…
Implementation 3/4
• Filtering Engine [O(n)]
• Inputs:
  • Molecules + Diversity Score
  • Threshold
• Outputs:
  • Diverse Molecules
  • Similar Molecules
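The filtering step is a single pass over the scored molecules, which is where the O(n) comes from. The sketch below is one possible shape for it, not the source shown in the talk. The function name `split_by_diversity` is hypothetical, molecules are stood in for by SMILES strings, the scores are made-up values, and sending a score exactly equal to the threshold to the "similar" set is an assumption the slides do not settle.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # any molecule representation (SMILES string, RDKit Mol, ...)


def split_by_diversity(
    molecules: Sequence[T],
    scores: Sequence[float],
    threshold: float,
) -> Tuple[List[T], List[T]]:
    """Split molecules into (diverse, similar) in one O(n) pass.

    A molecule is diverse when its similarity-based score is below the
    threshold; a score equal to the threshold is treated as similar
    (an assumption made for this sketch).
    """
    if len(molecules) != len(scores):
        raise ValueError("molecules and scores must have the same length")
    diverse: List[T] = []
    similar: List[T] = []
    for mol, score in zip(molecules, scores):
        (diverse if score < threshold else similar).append(mol)
    return diverse, similar


# Illustrative inputs only: the scores are invented, not computed.
molecules = ["CCO", "CCN", "c1ccccc1"]
scores = [0.6, 0.6, 0.0]
diverse, similar = split_by_diversity(molecules, scores, threshold=0.5)
```

Keeping scoring and filtering separate means the O(n²) scores can be computed once and then re-split cheaply for different thresholds.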
Implementation 4/4
• Show source code for diversity filtering…