National Institute of Advanced Industrial Science and Technology
Status report on the large-scale long-run simulation on the grid - Hybrid QM/MD simulation -
Grid Technology Research Center, AIST
Hiroshi Takemiya, Yoshio Tanaka

TRANSCRIPT

  • Slide 1

National Institute of Advanced Industrial Science and Technology
Status report on the large-scale long-run simulation on the grid - Hybrid QM/MD simulation -
Grid Technology Research Center, AIST
Hiroshi Takemiya, Yoshio Tanaka

  • Slide 2

Goal of the experiment: to verify the effectiveness of our programming approach for large-scale, long-run grid applications (flexibility, robustness, efficiency).
Friction simulation: a nano-scale probe moves on the Si substrate.
- Requires hundreds of CPUs.
- Requires a long simulation time of over a few months.
- The number of QM regions and the number of QM atoms change dynamically.
- 2 QM regions with 72 + 41 QM atoms; 28,598 atoms in total.
- Simulation length 525 fs, v = 0.009 /fs.
The application is gridified using GridRPC + MPI (a sketch follows the transcript).

  • Slide 3

Testbed for the Friction Simulation: 11 clusters with 632 CPUs in total, across 8 organizations.
- PRAGMA clusters: SDSC (32 CPUs), KU (8 CPUs), NCSA (8 CPUs), NCHC (8 CPUs), Titech-1 (8 CPUs), AIST (8 CPUs)
- AIST Super Cluster: M64 (128 CPUs), F32-1 (128 CPUs + 128 CPUs)
- Japan clusters: U-Tokyo (128 CPUs), Tokushima-U (32 CPUs), Titech-2 (16 CPUs)
(Figure: map of the testbed clusters.)

  • Slide 4

Result of the Friction Simulation
- Experiment time: 52.5 days
- Longest calculation time: 22 days
- Manual restarts: 2
- Execution failures: 165; all of them were recovered (a retry sketch follows the transcript)
- Number of CPUs changed 18 times; succeeded in adjusting the number of CPUs to the number of QM regions/QM atoms

  • Slide 5

Summary and future work
- Our approach is effective for running large-scale grid applications for a long time.
- Need more grid services: getting information on available resources, resource reservation, and coordination with a resource manager/scheduler.
- Need a cleaner MPI: mpich quits leaving processes and IPC resources behind; we are using GridMPI in place of mpich.

  • Slide 6
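
The gridification approach mentioned on Slide 2 pairs a GridRPC client (the MD driver) with MPI-parallel QM solvers running on remote clusters. Below is a minimal C sketch using the standard GridRPC client API (as implemented by, e.g., Ninf-G); the server host name, the remote function name "qm_solver/qm_force", and its argument list are hypothetical illustrations, not the interface actually used in the experiment.

```c
/*
 * Minimal GridRPC client sketch: the MD driver runs locally and, at each
 * time step, ships every QM region to an MPI-parallel QM solver on a
 * remote cluster.  Host name, function name and arguments are examples.
 */
#include <stdio.h>
#include <stdlib.h>
#include <grpc.h>                 /* GridRPC API header (e.g. Ninf-G) */

#define N_QM_REGIONS 2            /* Slide 2: 2 QM regions (72 + 41 atoms) */
#define N_STEPS      1000

int main(int argc, char *argv[])
{
    grpc_function_handle_t handles[N_QM_REGIONS];
    grpc_sessionid_t       sessions[N_QM_REGIONS];
    int                    natoms[N_QM_REGIONS] = { 72, 41 };
    double                *coords[N_QM_REGIONS], *forces[N_QM_REGIONS];
    int                    r, step;

    if (grpc_initialize(argv[1]) != GRPC_NO_ERROR) {   /* client config file */
        fprintf(stderr, "grpc_initialize failed\n");
        return 1;
    }

    /* Bind one handle per QM region; each server side runs the QM code
     * under MPI on its own cluster. */
    for (r = 0; r < N_QM_REGIONS; r++) {
        coords[r] = malloc(3 * natoms[r] * sizeof(double));
        forces[r] = malloc(3 * natoms[r] * sizeof(double));
        grpc_function_handle_init(&handles[r],
                                  "cluster.example.org",   /* hypothetical */
                                  "qm_solver/qm_force");   /* hypothetical */
    }

    for (step = 0; step < N_STEPS; step++) {
        /* ... classical MD update of all 28,598 atoms happens here ... */

        /* Launch the QM force calculations asynchronously, one per region. */
        for (r = 0; r < N_QM_REGIONS; r++)
            grpc_call_async(&handles[r], &sessions[r],
                            natoms[r], coords[r], forces[r]);

        grpc_wait_all();          /* block until every QM region is done */

        /* ... merge QM forces into the MD force field and advance time ... */
    }

    for (r = 0; r < N_QM_REGIONS; r++) {
        grpc_function_handle_destruct(&handles[r]);
        free(coords[r]);
        free(forces[r]);
    }
    grpc_finalize();
    return 0;
}
```

The split keeps the cheap MD part on the client while the expensive QM regions run independently on whichever clusters are available, which is what lets the run spread over 11 clusters in 8 organizations.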
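Slide 4 reports recovering from 165 execution failures and changing the number of CPUs 18 times. One plausible client-side pattern for this (a hedged sketch under assumed server names, not the code actually used in the experiment) is to treat each remote QM call as disposable: on error, destruct the handle, re-bind it to the next candidate cluster, and retry.

```c
/*
 * Hedged sketch of a fault-tolerant GridRPC call: if a remote QM call
 * fails, rebind the handle to the next candidate cluster and retry.
 * Server names and the remote function name are illustrative only.
 */
#include <stdio.h>
#include <grpc.h>

/* Candidate servers, e.g. ordered by how many CPUs they can provide. */
static const char *servers[] = {
    "m64.aist.example.org", "f32.aist.example.org", "sdsc.example.org"
};
#define N_SERVERS (sizeof(servers) / sizeof(servers[0]))

/* Call the remote QM solver for one region; rebind and retry on failure. */
static int robust_qm_call(int natoms, double *coords, double *forces)
{
    grpc_function_handle_t handle;
    size_t s;

    for (s = 0; s < N_SERVERS; s++) {
        if (grpc_function_handle_init(&handle, (char *)servers[s],
                                      "qm_solver/qm_force") != GRPC_NO_ERROR)
            continue;                 /* could not bind: try the next cluster */

        if (grpc_call(&handle, natoms, coords, forces) == GRPC_NO_ERROR) {
            grpc_function_handle_destruct(&handle);
            return 0;                 /* success */
        }

        /* The call failed (node crash, network error, queue limit, ...):
         * release the handle and fall through to the next candidate. */
        grpc_function_handle_destruct(&handle);
        fprintf(stderr, "QM call on %s failed, retrying elsewhere\n", servers[s]);
    }
    return -1;                        /* every candidate failed */
}
```

Choosing the candidate list per call is also a natural place to adjust the CPU count: when the number of QM atoms in a region grows or shrinks, the client can simply prefer a larger or smaller cluster for the next binding.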