1

First-Principles Molecular Dynamics for Petascale Computers

François Gygi, Dept of Applied Science, UC Davis
http://eslab.ucdavis.edu

Zhaojun Bai, Dept of Computer Science, UC Davis

Giulia Galli, Dept of Chemistry, UC Davis

Kwan-Liu Ma, Dept of Computer Science, UC Davis

Supported by NSF-ITR-HECURA 0749217

2

The Qbox project

• Qbox is a C++/MPI implementation of First-Principles Molecular Dynamics (FPMD)

• Qbox includes a quantum mechanical description of electronic structure within Density Functional Theory

• Applications to Materials Science, Chemistry, Nanoscience

• Software development focuses on large-scale parallelism

3

Qbox code architecture

Qbox is built on the following libraries:

• ScaLAPACK/PBLAS
• BLACS
• MPI
• BLAS/ATLAS
• XercesC (XML parser)
• FFTW lib
• DGEMM lib

http://eslab.ucdavis.edu/software/qbox
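ScaLAPACK and BLACS, both part of this stack, distribute each dense matrix over a 2D process grid using a block-cyclic layout. The minimal sketch below illustrates that layout along one dimension. It is not Qbox code. It assumes 0-based indices and that the first block sits on process row 0, and the function names are hypothetical.

```cpp
#include <cassert>

// Owning process row of global (0-based) row index i in a 1D block-cyclic
// layout with block size nb over nprow process rows (first block on row 0).
int ownerProc(int i, int nb, int nprow) {
  assert(i >= 0 && nb > 0 && nprow > 0);
  return (i / nb) % nprow;
}

// Local (0-based) index of global row i on its owning process row.
int localIndex(int i, int nb, int nprow) {
  assert(i >= 0 && nb > 0 && nprow > 0);
  int block = i / nb;               // global block number
  int localBlock = block / nprow;   // owner's blocks that precede this one
  return localBlock * nb + i % nb;  // plus the offset inside the block
}

// Inverse mapping: global index of local row l stored on process row p.
int globalIndex(int l, int p, int nb, int nprow) {
  assert(l >= 0 && p >= 0 && p < nprow && nb > 0);
  return ((l / nb) * nprow + p) * nb + l % nb;
}
```

For example, with nb = 2 and nprow = 3, global row 7 lies in block 3, which wraps back to process row 0 as that process's second block, so its local index is 3. Columns follow the same formulas with the number of process columns in place of nprow.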

4

Qbox performance results

8 k-points: 207.3 TFlop/s (56% of peak)

4 k-points: 187.7 TFlop/s (51% of peak)

1 k-point: 108.8 TFlop/s (30% of peak)

2006 ACM/IEEE Gordon Bell Award for peak performance

• Electronic structure of a 1000-atom molybdenum sample

• 12,000 electrons

• LLNL BlueGene/L
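Percent of peak is the sustained rate divided by the machine's theoretical peak, so each result above implies a peak of sustained / fraction. The sketch below back-computes that implied peak for the three quoted runs as a consistency check. Because the quoted percentages are rounded to whole numbers, the implied peaks agree only to within about 2%. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Run {
  double sustainedTflops;  // sustained rate quoted on the slide (TFlop/s)
  double fractionOfPeak;   // quoted percent of peak as a fraction (0.56 = 56%)
};

// Implied theoretical peak (TFlop/s) for one run: sustained / fraction.
double impliedPeak(const Run& r) {
  assert(r.fractionOfPeak > 0.0);
  return r.sustainedTflops / r.fractionOfPeak;
}

// Ratio of the largest to the smallest implied peak across runs.
// A value close to 1 means the quoted numbers are mutually consistent.
double peakSpread(const std::vector<Run>& runs) {
  assert(!runs.empty());
  double lo = impliedPeak(runs[0]);
  double hi = lo;
  for (const Run& r : runs) {
    lo = std::min(lo, impliedPeak(r));
    hi = std::max(hi, impliedPeak(r));
  }
  return hi / lo;
}
```

For instance, {207.3, 0.56} implies a peak of about 370 TFlop/s, and {108.8, 0.30} implies about 363 TFlop/s.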

5

Current Qbox availability on TeraGrid Platforms

• Mercury, NCSA
• Cobalt, NCSA
• Tungsten, NCSA
• BlueGene/L, SDSC
• IBM p655, SDSC

Other platforms
• ANL BG/L
• ANL BG/P
• NERSC Franklin, Cray XT4
• NCSA Abe

6

New scalable algorithms for electronic structure calculations

• One-sided Jacobi simultaneous diagonalization algorithm used in electronic structure calculations, benchmarked on:
– 64-node dual-dual-core AMD Opteron/InfiniPath cluster
– 1 rack ANL BlueGene/L

[Figure: speedup t(N_CPU,min)/t(N_CPU) versus N_CPU (0 to 1200) for m=8192 on BG/L and on the AMD/Opteron cluster, with the ideal speedup for each platform]
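The slide names a one-sided Jacobi simultaneous diagonalization algorithm but gives no details. As a point of reference, the sketch below implements a different, simpler relative: the classical two-sided cyclic Jacobi method for jointly diagonalizing a set of real symmetric matrices A_k. Each rotation angle is chosen with the Cardoso–Souloumiac rule, which minimizes the summed squares of the (p,q) off-diagonal entries over all matrices. This is a swapped-in serial technique, not the one-sided parallel variant benchmarked above, and names such as jointDiagonalize are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // dense n x n, row-major

// Sum of squared off-diagonal entries over all matrices; joint
// diagonalization drives this quantity toward zero.
double offNorm2(const std::vector<Mat>& mats) {
  double s = 0.0;
  for (const Mat& a : mats)
    for (std::size_t i = 0; i < a.size(); ++i)
      for (std::size_t j = 0; j < a.size(); ++j)
        if (i != j) s += a[i][j] * a[i][j];
  return s;
}

// Two-sided cyclic Jacobi joint diagonalization with Cardoso-Souloumiac
// angles. On return mats[k] holds V^T A_k V and V holds the product of
// all rotations (an orthogonal matrix).
void jointDiagonalize(std::vector<Mat>& mats, Mat& V, int maxSweeps, double tol) {
  assert(!mats.empty());
  const std::size_t n = mats[0].size();
  V.assign(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) V[i][i] = 1.0;

  for (int sweep = 0; sweep < maxSweeps && offNorm2(mats) > tol; ++sweep) {
    for (std::size_t p = 0; p + 1 < n; ++p) {
      for (std::size_t q = p + 1; q < n; ++q) {
        // G = sum_k g_k g_k^T with g_k = (a_pp - a_qq, 2 a_pq).
        double g11 = 0.0, g12 = 0.0, g22 = 0.0;
        for (const Mat& a : mats) {
          double x = a[p][p] - a[q][q], y = 2.0 * a[p][q];
          g11 += x * x; g12 += x * y; g22 += y * y;
        }
        // (cos 2theta, sin 2theta) is G's principal eigenvector, cos 2theta >= 0.
        double theta = 0.25 * std::atan2(2.0 * g12, g11 - g22);
        double c = std::cos(theta), s = std::sin(theta);
        if (std::fabs(s) < 1e-15) continue;  // rotation is negligible

        for (Mat& a : mats) {
          for (std::size_t i = 0; i < n; ++i) {  // A <- A J (columns p, q)
            double aip = a[i][p], aiq = a[i][q];
            a[i][p] = c * aip + s * aiq;
            a[i][q] = -s * aip + c * aiq;
          }
          for (std::size_t j = 0; j < n; ++j) {  // A <- J^T A (rows p, q)
            double apj = a[p][j], aqj = a[q][j];
            a[p][j] = c * apj + s * aqj;
            a[q][j] = -s * apj + c * aqj;
          }
        }
        for (std::size_t i = 0; i < n; ++i) {  // V <- V J
          double vip = V[i][p], viq = V[i][q];
          V[i][p] = c * vip + s * viq;
          V[i][q] = -s * vip + c * viq;
        }
      }
    }
  }
}
```

With a single matrix the angle rule reduces to the standard Jacobi eigenvalue rotation. For commuting matrices the off-diagonal norm reaches zero, and otherwise it converges to a minimum. In one-sided Jacobi methods generally, rotations act only on columns, so column pairs can be processed independently on different tasks. The two-sided version shown here updates both rows and columns.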

7

Qbox scalability for nanoscience applications

• Electronic structure of a 2260-atom silicon nanowire

• Cray-XT4, up to 8k CPUs

• Superlinear scaling due to cache effects and size-dependent MPI protocols

• 86% parallel efficiency between 2k and 8k CPUs

[Figure: Qbox speedup t(N_CPU,min)/t(N_CPU) versus N_CPU (0 to 8192) on the Cray-XT4, with ideal speedup]
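Parallel efficiency relative to a reference run on N_ref CPUs is E = t(N_ref) N_ref / (t(N) N). Going from 2k to 8k CPUs quadruples the CPU count, so 86% efficiency corresponds to a relative speedup of 4 × 0.86 = 3.44. Values of E above 1 indicate the superlinear regime mentioned above. The sketch below computes both quantities from wall-clock times. The function names are hypothetical, and the timings used with them are illustrative, not measured.

```cpp
#include <cassert>

// Relative speedup between a reference run and a larger run:
// S = t(N_ref) / t(N), from the two wall-clock times.
double relativeSpeedup(double tRef, double t) {
  assert(tRef > 0.0 && t > 0.0);
  return tRef / t;
}

// Parallel efficiency relative to the reference run:
// E = (t(N_ref) * N_ref) / (t(N) * N). E > 1 means superlinear scaling.
double parallelEfficiency(int nRef, double tRef, int n, double t) {
  assert(nRef > 0 && n > 0);
  return relativeSpeedup(tRef, t) * nRef / n;
}
```

For example, a step taking 100 s on 2048 CPUs and 100/3.44 ≈ 29.1 s on 8192 CPUs gives E = 0.86.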

8

Qbox parallel I/O strategy

• Advanced functions in MPI-IO are not supported by all file systems (MPI_File_write_shared, etc.)

• Qbox uses a strategy based on shared file pointer objects

• Achieves write rates of about 700 MB/s (687–814 MB/s in the measurements below) for file sizes of 50–250 GB

Platform      Tasks   Write speed
Cray-XT4      2048    778 MB/s
Cray-XT4      4096    715 MB/s
Cray-XT4      8192    687 MB/s
BG/P (ANL)    2048    814 MB/s
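When MPI_File_write_shared is unavailable, a common portable alternative is for each task to compute its own starting byte offset in the file and write at that explicit offset, for example with MPI_File_write_at_all. The offset is the exclusive prefix sum of the sizes written by lower-ranked tasks, which MPI_Exscan computes in parallel. The serial sketch below only simulates that offset computation. It illustrates the general technique and does not describe Qbox's implementation. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the number of bytes each task will write (indexed by MPI rank),
// return each task's starting offset in the shared file: the exclusive
// prefix sum of the sizes of all lower-ranked tasks (what MPI_Exscan
// would produce, with rank 0 starting at offset 0).
std::vector<std::int64_t> fileOffsets(const std::vector<std::int64_t>& localBytes) {
  std::vector<std::int64_t> offsets(localBytes.size());
  std::int64_t running = 0;  // 64-bit: files of 50-250 GB overflow 32 bits
  for (std::size_t r = 0; r < localBytes.size(); ++r) {
    assert(localBytes[r] >= 0);
    offsets[r] = running;
    running += localBytes[r];
  }
  return offsets;
}
```

Using 64-bit offsets is the key design choice here, because a single file in the 50–250 GB range cannot be addressed with 32-bit integers.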

9

Analysis of MPI message traffic patterns in Qbox

• Multiple traffic patterns are involved during a Qbox simulation:
– physics kernels
– 3D Fourier transforms
– ScaLAPACK linear algebra

• Logical-to-physical mapping of tasks has a large impact on performance on large platforms (> 4k CPUs)

• We are developing instrumentation and visualization tools to analyze message traffic patterns on various interconnect architectures

Mapping of 65536 MPI tasks on the 32x32x64 torus of the LLNL BG/L
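On a 3D torus the cost of a message grows with the number of network hops between the two nodes, so the logical-to-physical mapping determines whether logically adjacent tasks are physically close. The sketch below assumes one task per node and a hypothetical mapping in which x varies fastest, then y, then z. It computes the minimum hop count with wraparound on the 32x32x64 torus (32 × 32 × 64 = 65536 nodes). The names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>

using Coord = std::array<int, 3>;  // (x, y, z) node coordinates

// Hypothetical rank-to-node mapping: x varies fastest, then y, then z.
Coord rankToCoord(int rank, const Coord& dims) {
  assert(rank >= 0 && rank < dims[0] * dims[1] * dims[2]);
  return {{rank % dims[0], (rank / dims[0]) % dims[1], rank / (dims[0] * dims[1])}};
}

// Minimum hop count between two nodes on a torus: along each axis take
// the shorter of the direct path and the wraparound path.
int torusHops(const Coord& a, const Coord& b, const Coord& dims) {
  int hops = 0;
  for (int d = 0; d < 3; ++d) {
    int delta = std::abs(a[d] - b[d]);
    hops += (delta < dims[d] - delta) ? delta : dims[d] - delta;
  }
  return hops;
}

// Hop count between two MPI ranks under the mapping above.
int rankHops(int r1, int r2, const Coord& dims) {
  return torusHops(rankToCoord(r1, dims), rankToCoord(r2, dims), dims);
}
```

Under this mapping, ranks 31 and 32 are 2 hops apart because x wraps and y increments, while ranks 0 and 32768 sit 32 hops apart along z, the largest possible distance on that axis. A pattern that pairs such distant ranks would benefit from a different mapping.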

10

Analysis of MPI message traffic patterns in Qbox

• Screenshot of the message traffic visualization tool showing MPI calls in a ScaLAPACK matrix multiplication (C. Muelder, K-L. Ma, UC Davis)

11

Qbox current developments

• Deployment on TeraGrid track-2 platforms

• Applications to Nanoscience simulations
– G. Galli, Chemistry, UC Davis

• Specialized linear algebra algorithms
– Z. Bai, Computer Science, UC Davis

• Visualization
– K-L. Ma, Computer Science, UC Davis

• Application-specific data compression algorithms

• Large dataset management (10^10–10^12 bytes)

• XML standards for electronic structure data (http://www.quantum-simulation.org)

Supported by NSF-ITR-HECURA 0749217

http://eslab.ucdavis.edu