UGM 2007, Miklós Vargyas*, Judit Vaskó-Szedlár: What’s new in LibraryMCS
UGM 2007
Miklós Vargyas*, Judit Vaskó-Szedlár
What’s new in LibraryMCS
Talk Overview
• Introduction to LibraryMCS
  – Concepts, motivation
  – Main features
  – GUI
• 2006 Roadmap accomplishment
• New features in detail
  – Performance
  – Iterative clustering
  – Additive clustering
• Current roadmap and wishlist
Introduction – Concept of MCS
Maximum Common Substructure
Looks simple, yet hard to compute!
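To illustrate why MCS "looks simple, yet hard to compute", here is a minimal brute-force sketch in Python. It is not the LibraryMCS algorithm. It assumes molecules are given as heavy-atom graphs (a list of element symbols plus a list of bonded index pairs), ignores bond orders, and searches for the largest connected common induced subgraph by trying every way of growing an atom mapping. The function names `adjacency` and `mcs` are made up for this example.

```python
from itertools import product

def adjacency(n_atoms, bonds):
    """Build a neighbour table from a list of bonded atom index pairs."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def mcs(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return the largest connected common substructure as a dict {atom in A: atom in B}."""
    adj_a = adjacency(len(atoms_a), bonds_a)
    adj_b = adjacency(len(atoms_b), bonds_b)
    best = {}

    def compatible(mapping, a, b):
        # Same element, and bonded to already mapped atoms in exactly the same pattern.
        if atoms_a[a] != atoms_b[b]:
            return False
        return all((a2 in adj_a[a]) == (b2 in adj_b[b]) for a2, b2 in mapping.items())

    def extend(mapping):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        used_b = set(mapping.values())
        # Only atoms bonded to the current match are tried, so the match stays connected.
        frontier = {n for a in mapping for n in adj_a[a]} - set(mapping)
        for a in frontier:
            for b in range(len(atoms_b)):
                if b not in used_b and compatible(mapping, a, b):
                    mapping[a] = b
                    extend(mapping)
                    del mapping[a]

    # Try every pair of equal-element atoms as the seed of a match.
    for a, b in product(range(len(atoms_a)), range(len(atoms_b))):
        if atoms_a[a] == atoms_b[b]:
            extend({a: b})
    return best

# Heavy-atom graphs of propan-1-ol (C-C-C-O) and propan-2-ol (C-C(O)-C).
atoms_1, bonds_1 = ["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]
atoms_2, bonds_2 = ["C", "C", "O", "C"], [(0, 1), (1, 2), (1, 3)]
common = mcs(atoms_1, bonds_1, atoms_2, bonds_2)
print(len(common), "atoms in common")
```

The search revisits the same partial match in many different orders, so its cost grows explosively with molecule size. That is why a practical tool needs a much smarter strategy than this.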
Introduction – Motivations
• MCS based clustering
  – More intuitive than similarity based clustering
  – Closer to the chemist’s gold standard
• Initial requirements
  – Focused set analysis
    • screens: 2000 – 10000 structures
    • lead optimization: 3000 – 5000 structures
  – Should be hierarchical (outliers)
  – Ultimate goal: cluster 5000 compounds in 5 seconds
• Further application areas
  – Library profiling
  – Compound acquisition
Introduction – Main features
• MCS based hierarchical clustering
• Flexible search options
• No theoretical size limitation
• Fast operation
• Filtering by chemical properties
• Cluster statistics
• Hierarchy browser
GUI – Dendrogram view
• Interactive navigation, selection
• Zoom & move
GUI – Molecule view
GUI – SAR-table
• Cluster statistics, structure filtering by properties
GUI – R-table
2006 Roadmap accomplishment
...
Preserving rings
Iterative clustering
• Outliers
  – Singletons
  – Large blobby clusters
• Aim
  – Minimise number of singletons
  – Maintain high quality
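The slide states the aim but not the method, so the following Python sketch shows only one plausible reading of "iterative clustering": cluster once with a strict minimum common-core size, then re-cluster only the leftover singletons with a gradually relaxed threshold. Everything here is a hypothetical stand-in, not LibraryMCS code. Structures are modelled as sets of fragment labels, and `common` (set intersection) takes the place of a real MCS computation.

```python
def cluster_once(items, common, min_size):
    """Greedy pass: each item joins the first cluster whose core it shares, else starts one."""
    clusters = []
    for item in items:
        for cluster in clusters:
            shared = common(cluster["core"], item)
            if len(shared) >= min_size:
                cluster["core"] = shared
                cluster["members"].append(item)
                break
        else:
            clusters.append({"core": item, "members": [item]})
    return clusters

def iterative_cluster(items, common, start_size, floor_size):
    """Repeat the pass on leftover singletons, lowering the threshold by one each round."""
    final, pending, size = [], list(items), start_size
    while pending and size >= floor_size:
        clusters = cluster_once(pending, common, size)
        final += [c for c in clusters if len(c["members"]) > 1]
        pending = [c["members"][0] for c in clusters if len(c["members"]) == 1]
        size -= 1
    # Whatever is still unmatched at the floor threshold stays a singleton.
    final += [{"core": p, "members": [p]} for p in pending]
    return final

# Toy "structures": sets of made-up fragment labels.
toy = [frozenset("abcd"), frozenset("abce"), frozenset("xyz"), frozenset("xyw"), frozenset("q")]
shared_core = lambda a, b: a & b

single_pass = cluster_once(toy, shared_core, 3)
iterated = iterative_cluster(toy, shared_core, start_size=3, floor_size=2)
print(sum(len(c["members"]) == 1 for c in single_pass), "singletons after one pass")
print(sum(len(c["members"]) == 1 for c in iterated), "singletons after iterating")
```

The `floor_size` parameter is where the "maintain high quality" constraint would live in this sketch: clusters formed in later rounds have smaller cores, so the floor limits how far quality may be traded for fewer singletons.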
Additive clustering
[Diagram labels: Corporate database; Pre-clustering, stored; New set; Registration; Cluster diversity enrichment]
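The diagram suggests that a stored pre-clustering of the corporate database is extended when a new set is registered, without re-clustering everything. The Python sketch below shows that idea under strong assumptions. Structures are modelled as sets of fragment labels, set intersection stands in for MCS, and the `register` function and its `min_size` parameter are invented for this example.

```python
def register(stored, new_items, common, min_size):
    """Add new items to stored clusters in place, without revisiting stored members."""
    for item in new_items:
        # Pick the stored cluster that shares the largest core with the new item.
        best = max(stored, key=lambda c: len(common(c["core"], item)), default=None)
        if best is not None and len(common(best["core"], item)) >= min_size:
            best["core"] = common(best["core"], item)
            best["members"].append(item)
        else:
            # Nothing close enough: the new item founds its own cluster.
            stored.append({"core": item, "members": [item]})
    return stored

# Stored pre-clustering of a toy "corporate database" (made-up fragment labels).
database = [{"core": frozenset("abc"), "members": [frozenset("abcd"), frozenset("abce")]}]
new_set = [frozenset("abcf"), frozenset("xyz")]

updated = register(database, new_set, lambda a, b: a & b, min_size=3)
print([len(c["members"]) for c in updated])
```

In this sketch each new structure is compared only against the stored cluster cores, so the cost of registration grows with the number of clusters rather than with the size of the whole database.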
Performance
• Depends on various factors
  – average structure size
  – diversity
  – minimal required MCS size
  – atom/bond constraints
[Bar chart: CombiLib, MixedLib and Maybridge libraries in Normal, Fast and Fastest modes; value axis from 0 to 16, unit not given]
Performance
• Scales linearly
[Chart: running time (sec) versus structure count (0 to 35000); series: 2006, 2007, Linear (2007)]
Performance
• Maximum speed achieved: 1 000 structures/s
[Chart: run time (s) versus library size (100 to 100000); series: Ward 512, Jarp 512, LibMCS 6]
• Memory requirements
  – scalable
  – 50 000 structures occupy <100MB
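The figures quoted on the slides can be put side by side with a little arithmetic. The Python sketch below uses only numbers from the talk. The reading of "<100MB" as 10^8 bytes and the extrapolation to a million-compound library are assumptions: the extrapolation supposes that both the peak speed and the linear memory scaling hold at that size, which the slides do not claim.

```python
# Figures quoted on the slides.
target_structures, target_seconds = 5000, 5   # "cluster 5000 compounds in 5 seconds"
max_speed = 1000                              # structures/s, "maximum speed achieved"
mem_structures = 50_000                       # "50 000 structures occupy <100MB"
mem_bytes = 100 * 10**6                       # assumption: MB read as 10^6 bytes

# Throughput needed to meet the stated goal.
required_speed = target_structures / target_seconds    # 1000.0 structures/s

# Upper bound on memory per structure implied by the memory figure.
bytes_per_structure = mem_bytes / mem_structures        # 2000.0 bytes

# Hypothetical extrapolation to a million-compound library.
million = 1_000_000
est_seconds = million / max_speed                       # 1000.0 s at peak speed
est_bytes = million * bytes_per_structure               # 2e9 bytes upper bound

print(f"required speed: {required_speed:.0f} structures/s (peak reported: {max_speed})")
print(f"memory per structure: under {bytes_per_structure:.0f} bytes")
print(f"one million structures: at least {est_seconds:.0f} s, under {est_bytes / 1e9:.0f} GB")
```

The reported peak speed matches the rate the "5000 compounds in 5 seconds" goal requires. Because it is a peak figure, the time estimate for a million compounds is a lower bound rather than a prediction.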
In the pipeline
• Multi-stage clustering
• Additive clustering
• Disconnected MCS (Maximum Overlapping Set)
• Enhanced R-group decomposition
• Markush export
• Further clustering criteria
– Ring count
• Performance tuning
– Easier control of memory usage
Current roadmap and wishlist
• Simpler table view
• IJC integration
• Multi-cluster members
• Clustering million compound libraries
• Integrate Chemical Terms
• Stereo care MCS
Acknowledgements
• Co-workers
  – Péter Vadász
  – Judit Vaskó-Szedlár
• Ideas
  – Ferenc Csizmadia, Szabolcs Csepregi, Ákos Papp, György Pirok
• Partners, early adopters