![Page 1: Building a Massive Virtual Screening using Grid Infrastructure Chak Sangma Centre for Cheminformatics Kasetsart University Putchong Uthayopas High Performance](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e445503460f94b38152/html5/thumbnails/1.jpg)
Building a Massive Virtual Screening using Grid Infrastructure
Chak Sangma, Centre for Cheminformatics, Kasetsart University
Putchong Uthayopas, High Performance Computing and Networking Center, Kasetsart University
Motivation
• Thailand's medicinal plants are important to Thai society
  – Over 1,000 species
  – Over 200,000 compounds
  – Multiple disease targets
• Problem
  – No complete compound database
  – Practice still relies mostly on local knowledge and conventional wisdom
  – Lack of systematic verification by scientific methods
[Photos: Asiatic pennywort; Barleria lupulina Lindl.]
Kasetsart University Thai Medicinal Plants Effort
• Led by the Center for Cheminformatics, Kasetsart University (Dr. Chak Sangma)
• Goal
  – Establish a Thai medicinal plant knowledge base by building a 3D molecular database
  – Employ virtual screening to verify active compounds against conventional knowledge
[Figure: screening workflow. Labels: Reports and Literatures; 2D Structures; Approximated 3D Structures; Optimized 3D Structures with GAMESS; Calculated Binding Energy with Autodock 3.0; Structure in 0.5 Å from Binding Site; SOM Neural Network Map; Results. Callout: "Compute intensive!"]
ThaiGrid Drug Design Portal
• Partners
  – High Performance Computing and Networking Center, KU
  – Center for Cheminformatics, KU
  – IBM Thailand
• Goal
  – Build a virtual screening infrastructure on the ThaiGrid system
  – Start from the KU campus grid and extend to other ThaiGrid partner universities later
• Links
  – http://tgcc.cpe.ku.ac.th
  – http://www.thaigrid.net
Challenge
• Recent project for the National Center for Genetic Engineering and Biotechnology, Thailand
  – Screen 3,000 compounds in 3 months
• Computation time on a 2.4 GHz Pentium 4 system
  – Over 30 minutes per structure optimization
  – Over 30 minutes per docking
• Estimated computing time on a single processor
  – (3,000 x 30) + (3,000 x 30) = 180,000 minutes
  – 3,000 hours
  – 125 days
  – Just over 4 months
• Not fast enough!
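The single-processor estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it treats the "over 30 minutes" figures as exactly 30, takes "3 months" to mean 90 days, and assumes the work divides perfectly across processors. None of these assumptions come from the slides.

```python
import math

COMPOUNDS = 3000
MIN_PER_OPTIMIZATION = 30  # lower bound quoted on the slide
MIN_PER_DOCKING = 30       # lower bound quoted on the slide
DEADLINE_DAYS = 90         # assumption: "3 months" read as 90 days


def serial_estimate(compounds, opt_min, dock_min):
    """Return (minutes, hours, days) for one processor doing every step."""
    total_min = compounds * opt_min + compounds * dock_min
    hours = total_min / 60
    days = hours / 24
    return total_min, hours, days


def min_processors(serial_days, deadline_days):
    """Fewest processors that meet the deadline, assuming perfect scaling."""
    return math.ceil(serial_days / deadline_days)


def ideal_hours(serial_hours, cpus):
    """Best-case wall-clock hours on `cpus` processors (no overhead at all)."""
    return serial_hours / cpus


total_min, hours, days = serial_estimate(
    COMPOUNDS, MIN_PER_OPTIMIZATION, MIN_PER_DOCKING
)
print(f"{total_min} minutes = {hours:.0f} hours = {days:.0f} days")
print("processors needed for the deadline:", min_processors(days, DEADLINE_DAYS))
```

Under the same idealized assumptions, `ideal_hours(3000, 158)` comes to about 19 hours for the 158 CPUs listed on the Infrastructure slide; real runs would be slower because of data transfer and queueing.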
Key Technologies
• Three key technologies must be combined to provide the solution
  – Cluster computing
  – Grid computing
  – Portal technology
What do we want to do?
Hide the complexity of the Grid and of computational chemistry software from scientists while providing the massive computational power they need
Infrastructure
• The ThaiGrid infrastructure is used
• 10 clusters from 6 organizations
  – AMATA (KU)
  – GASS (KU)
  – MAEKA (KU)
  – WARINE (KU)
  – CAMETA (SUT)
  – OPTIMA (AIT)
  – ENQUEUE (KMUTNB)
  – PALM (KMUTNB)
  – SPIRIT (CU)
  – INCA (KMUTT)
• 158 CPUs on 110 nodes
[Figure: a ThaiGrid user submits jobs to the ThaiGrid Portal (tgcc.cpe.ku.ac.th); grid job scheduling sends them over the network to the clusters at KU (GASS, AMATA, MAEKA, WARINE), CU (SPIRIT), KMUTT (INCA), KMUTNB (PALM, ENQUEUE), AIT (OPTIMA), and SUT (CAMETA)]
Software Architecture
• Each cluster has a local scheduler
  – SGE, OpenPBS, or Condor can be used
  – We use our own SQMS scheduler
• Globus 2.4 is used as middleware
  – Resource control and security (GSI)
• A grid-level scheduler controls multi-cluster job submission
  – Uses KU's own SQMS/G
[Figure: software stack on the KU Gigabit Campus Network. Labels: four clusters (AMATA, Warine, GASS, Maeka), each with a local SQMS scheduler; Globus 2.4; SQMS/G; Portal; SCMSWeb]
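The two-level arrangement on this slide, a grid-level scheduler in front of per-cluster local schedulers, can be illustrated with a toy dispatcher. This is a minimal sketch and not SQMS/G itself: the `Cluster` class, the least-loaded placement policy, and the CPU counts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A cluster with its own queue (stand-in for a local scheduler)."""
    name: str
    cpus: int
    queue: list = field(default_factory=list)

    def load(self) -> float:
        # Queued jobs per CPU: a crude measure of how busy the cluster is.
        return len(self.queue) / self.cpus


def dispatch(jobs, clusters):
    """Grid-level step: hand each job to the currently least-loaded cluster."""
    for job in jobs:
        target = min(clusters, key=lambda c: c.load())
        target.queue.append(job)
    return {c.name: list(c.queue) for c in clusters}


if __name__ == "__main__":
    # CPU counts are made up for the example, not taken from the slides.
    clusters = [Cluster("AMATA", 16), Cluster("GASS", 8), Cluster("MAEKA", 8)]
    placement = dispatch([f"dock-{i}" for i in range(32)], clusters)
    for name, jobs in placement.items():
        print(name, len(jobs))
```

Each cluster's queue would then be drained by its local scheduler; the grid-level step only decides where a job goes, not when it runs.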
The Portal
• Roles
  – User interface
  – Automate execution flow
  – File access and management
• Features
  – Create project
  – Add ligand, enzyme
  – Submit screening job, monitor job status
  – Download output
• The current portal is built using Plone
  – http://www.plone.org/
  – Python-based web content management
  – Flexible and extensible
How things work!
[Figure: Labels: Portal; Resource Broker (SQMS/G); Grid Middleware (Globus 2.4); KU campus network; five Compute Resources, each running a Task; Monitor]
Results
• First version of the compound database (around 3,000 compounds)
• 3,000 compounds screened (30 high-potential compounds found)
  – 4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)
XK-263
Experiences
• Some files, such as enzyme structures and output, are very large
  – Requires good bandwidth between sites
  – Some simple optimization techniques can help
    • Caching the enzyme structure file at target hosts substantially reduces the number of transfers needed
• A batch scheduling approach works well if the systems are very homogeneous
  – Allows dynamic staging of execution code to the target host without installation or recompilation
• Many script tools must be developed to
  – Streamline the execution
  – Handle data and code staging
  – Clean up after execution
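The caching idea mentioned on this slide can be sketched as follows. This is an assumed design, not the project's actual scripts: the `StagingCache` class and the file names are hypothetical, a local file copy stands in for the network transfer, and a real deployment would compute the digest once at the submitting site and send it along with the job.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, used as the cache key."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class StagingCache:
    """Keeps one copy of each staged input file on the target host."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.transfers = 0  # number of real copies made

    def stage(self, source: Path) -> Path:
        cached = self.cache_dir / f"{file_digest(source)}-{source.name}"
        if not cached.exists():
            shutil.copyfile(source, cached)  # stand-in for a network transfer
            self.transfers += 1
        return cached


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        enzyme = root / "enzyme.pdb"  # hypothetical input file
        enzyme.write_text("placeholder structure data\n")
        cache = StagingCache(root / "cache")
        for _ in range(100):  # 100 docking jobs against the same enzyme
            cache.stage(enzyme)
        print("transfers:", cache.transfers)
```

Keying the cache on a content digest rather than the file name means a changed enzyme file with the same name is staged again instead of silently reusing a stale copy.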
Next Generation Massive Screening on Grid
• Move to a service-oriented Grid
  – Use Grid and Web services to encapsulate key applications
  – Build broker and service discovery infrastructure
  – Rely heavily on OGSA and GT 3.x, 4.x
• Portlet-based portal
  – JSR 168 (Portlet Specification) compliance
  – More modular, customizable, flexible
  – Plan to adopt GridSphere from GridLab (www.gridlab.org)
• Use a database as the backend instead of files
  – OGSA-DAI might be used for data access
Progress
• We are working on
  – New portal using GridSphere technology (done, testing)
  – Service wrapper for legacy code
    • GAMESS, Autodock (done, testing)
  – MMJFS interface (in progress)
  – OGSA-DAI integration (in progress)
  – Service registration and discovery (partial)
  – Broker system (design)
  – New monitoring (done)
• Schedule
  – Finish and test in Jan-Feb 2005
  – Deploy in March 2005
[Figure: architecture diagram. Labels: Portal; Portlet; Broker Server; Registration Server; Backend DB; Molecular DB; OGSA-DAI; Gamess Service; Gamess; MMJFS; Scheduler; File Server; GridFTP]
Design Choices
• Mass data transport across sites
  – A central FTP server is used to store data and the database
  – Each compute node can pull the required data from this FTP server
    • Ad hoc: ftp, wget/http (firewall friendly)
    • Next: GridFTP
• Cluster / single server
  – Gridify using a service wrapper that exposes the legacy application to the grid as a grid service
  – Does not work for clusters, since compute nodes are hidden behind the head node
    • Fall back to an MMJFS interface that talks to the local scheduler
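The pull model described above, where each compute node fetches its inputs from a central server and can fall back from one protocol to another, might look like the sketch below. It is an assumption-laden illustration rather than the project's tooling: the `pull` helper is hypothetical, and the demo uses local `file://` URLs so that it runs without a server. Python's `urllib.request.urlopen` accepts `ftp://` and `http://` URLs through the same call.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def pull(urls, dest: Path, timeout: float = 30.0) -> str:
    """Try each URL in order; save the first that works and return that URL."""
    errors = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    dest.open("wb") as out:
                shutil.copyfileobj(resp, out)
            return url
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "ligands.dat"  # hypothetical input file
        data.write_bytes(b"example payload")
        sources = [
            (root / "missing.dat").as_uri(),  # first source is unavailable
            data.as_uri(),                    # fallback source succeeds
        ]
        used = pull(sources, root / "staged.dat")
        print("fetched from", used)
```

Listing the sources in priority order keeps the fallback policy in data rather than code, so switching the preferred transport later only means changing the first entry.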
Design Choices
• Service discovery mechanism
  – Publish/subscribe model
    • Service advertising interface/protocol
    • Backend database shared between the registration service component and the broker component
• Adoption of the Grid Notification service and model
  – Available from the myGrid project; seems useful for more dynamic environments
  – Scalability…
[Figure: Broker Service; Registration Service; Discovery (SQL)]
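A shared backend database queried with SQL, as this slide describes, can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and endpoint URLs below are assumptions made for illustration; the slides do not describe the actual schema or database product.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (and if needed create) the shared registry database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS services (
               name      TEXT NOT NULL,
               endpoint  TEXT NOT NULL UNIQUE,
               free_cpus INTEGER NOT NULL
           )"""
    )
    return db


def register(db, name, endpoint, free_cpus):
    """Registration side: advertise or refresh one service instance."""
    db.execute(
        "INSERT OR REPLACE INTO services (name, endpoint, free_cpus) "
        "VALUES (?, ?, ?)",
        (name, endpoint, free_cpus),
    )
    db.commit()


def discover(db, name):
    """Broker side: endpoints offering `name`, most free CPUs first."""
    rows = db.execute(
        "SELECT endpoint FROM services WHERE name = ? AND free_cpus > 0 "
        "ORDER BY free_cpus DESC, endpoint",
        (name,),
    )
    return [endpoint for (endpoint,) in rows]


if __name__ == "__main__":
    db = open_registry()
    # Endpoints are placeholders, not real hosts.
    register(db, "gamess", "https://cluster-a.example/gamess", 4)
    register(db, "gamess", "https://cluster-b.example/gamess", 12)
    register(db, "autodock", "https://cluster-a.example/autodock", 0)
    print(discover(db, "gamess"))
    print(discover(db, "autodock"))
```

Because registration and discovery only share a table, the two components stay decoupled: the broker never calls the registration service directly, it just reads what has been advertised.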
Job Submission
Job Status
Result visualization
System Status
Performance Record
Job Queue Monitoring
Service Discovery
Conclusion
• Grid and cluster computing are key technologies that can give us the computing power. Grid works if used wisely!
• Challenges
  – Grid standards are still evolving rapidly
    • Things change before you can finish!
  – Difficult to configure and maintain; some parts are still unstable
  – Firewall and security concerns
  – Lack of manpower with expertise
• Opportunity
  – Secure infrastructure
  – Cost reduction through on-demand integration of networked resources
Acknowledgement
• HPCNC Team
  – Somsak Sriprayoonsakul
  – Nuttaphon Thangkittisuwan
  – Thanakit Petchprasan
  – Isiriya Paireepairit
The End
Backup
Process
[Figure: process flow. Labels: 2D Structure; 3D Structure; GAMESS (multiple instances); Optimized 3D Structure; Molecular Structure Database; Enzyme; Autodock (multiple instances); Grid; SOM Neural Network; Analysis Results]
[Figure: layered architecture. Labels: Grid Portal with Portlets; Workflow Engine; Broker Services; Docking Services; Optimizing Services; Monitoring Services; Molecule Database; OGSA-DAI; Grid Middleware (OGSA); Resources (Computer, Network)]