ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Page 1: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Nov 14, 08 ACES III and SIAL 1

ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett

AcesQC, LLC
QTP, University of Florida
Gainesville, Florida

Page 2: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Outline of the talk

Performance results
  What can be done today?
Design of petascale capable software
  How does SIAL work?
  What makes it different?
Outlook

Page 3: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

ACES III software

Developed under CHSSI CBD-03
Parallel for shared and distributed memory
Capabilities
  Hartree-Fock (RHF, UHF)
  MBPT(2) energy, gradient, Hessian
  CCSD(T) energy and gradient (DROPMO)
  EOM-CC excited state energies

Page 4: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Two examples

Luciferin (C11H8O3S2N2)
  RHF
  C1 symmetry
  Basis = aug-cc-pVDZ (494 bf)
  Ncorr occ = 46

Sucrose (C12H22O11)
  RHF
  C1 symmetry
  Basis = 6-311G** (546 bf)
  occ = 91

Page 5: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Luciferin CCSD(T)

CCSD on 128 processors
  One iteration: 23 min
  Total 12 iterations: 275 min
(T)
  Hardest 8 occupied orbitals: 420 min on 128 processors
  Total 48 correlated orbitals: 420 min on 768 processors
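
The timings above fit a simple reading, offered here as an assumption rather than something the slide states: the (T) step is split into batches of 8 occupied orbitals, each batch runs on its own group of 128 processors, and the hardest batch sets the wall time. The sketch below only checks that arithmetic; all variable names are illustrative.

```python
# Figures quoted on the slide.
ccsd_min_per_iter = 23
ccsd_iterations = 12
batch_orbitals = 8        # "hardest 8 occupied orbitals"
batch_processors = 128
total_orbitals = 48

# CCSD: 12 iterations at about 23 min each (the slide quotes 275 min in total).
ccsd_total_estimate = ccsd_min_per_iter * ccsd_iterations

# (T): ASSUMED batching into groups of 8 orbitals, one processor group per batch.
batches = total_orbitals // batch_orbitals
processors_needed = batches * batch_processors

print(ccsd_total_estimate, batches, processors_needed)
```

Under that assumption, six batches on 128 processors each account for the 768 processors quoted for the full (T) calculation at the same 420 min wall time.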

Page 6: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Luciferin CCSD scaling

min per iter; 12 iterations; two versions

[Figure: minutes per iteration (axis 0 to 140) versus processor count (32, 64, 128, 256). Series: Jan code, May code, ideal.]

Page 7: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Sucrose CCSD scaling

min per iter, 8 iterations, on Cray XT4

[Figure: minutes per iteration (axis 0 to 35) versus processor count (256, 512, 1024, 1536, 2048). Series: Sep code, ideal.]

Page 8: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

(H2O)21H+ scaling

min per iter; 657 bf, 84 corr occ

[Figure: minutes per iteration (axis 0 to 35) versus processor count (1024, 2048, 3072, 4096). Series: Sep code, ideal.]

Page 9: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Outline of the talk

Performance results
  What can be done today?
Design of petascale capable software
  How does SIAL work?
  What makes it different?
Outlook

Page 10: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

A computer with a single CPU

Basic data item: 64 bit number
High level language: Fortran, C
  c = a + b
Assembly language
  ADD dest,src
  ADD is an operation code
  dest and src are registers

Page 11: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

The ACES III parallel machine

Basic data item: data block
  10,000 64 bit numbers -> super number
High level language: being developed
Assembly language: SIAL
  super instruction assembly language
  R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L)
xaces3 -> super instruction processor
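
To make the super instruction R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L) concrete, here is a minimal Python sketch of what a single block contraction computes. It is not SIAL and not the xaces3 implementation; the dictionary storage, the function names, and the segment length `SEG` are assumptions chosen for readability.

```python
import itertools

SEG = 3  # ASSUMED segment length per index; 10 per index would give a 10,000-number block


def new_block(fill):
    """Return a four-index block stored as a dict keyed by index tuples."""
    return {t: fill for t in itertools.product(range(SEG), repeat=4)}


def contract_accumulate(R, V, T):
    """Block analogue of R(i,j,k,l) += sum over c,d of V(i,j,c,d) * T(c,d,k,l)."""
    for i, j, k, l in itertools.product(range(SEG), repeat=4):
        R[i, j, k, l] += sum(
            V[i, j, c, d] * T[c, d, k, l]
            for c, d in itertools.product(range(SEG), repeat=2)
        )


V = new_block(1.0)
T = new_block(2.0)
R = new_block(0.0)
contract_accumulate(R, V, T)
print(R[0, 0, 0, 0])  # every element sums SEG * SEG products of 1.0 * 2.0
```

The point of the analogy with ADD dest,src is that the whole function body is one instruction from the SIAL programmer's point of view: the operands are blocks rather than single 64 bit numbers.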

Page 12: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

User level execution flow

[Diagram: algo.sial -> SIAL compiler -> algo.sio -> ACES III <- input]

Page 13: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Coarse grain parallelism

Executing super instructions in SIAL algorithm
Example: memory super instruction GET block
  Can be from
    Local node RAM
    Other node RAM
  Time for data to become available differs

Page 14: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Fine grain parallelism

Inside super instructions
Example: Compute super instruction
  * (contractions)
  compute_integrals
Can use multiple cores
Can use accelerators
  GPGPUs and Cell processors
  FPGAs (field programmable gate arrays)

Page 15: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Super instruction flow

Worker i            Worker j
GET a  -> ask j     …
…                   <- send a
d=b*c               …
…                   …
wait for a?         …
a arrives <-        …
e=a*d               …
…
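
The flow above can be imitated with a future: worker i issues the GET, keeps computing, and only blocks when the requested block is actually needed. This is a hedged sketch using Python threads, not the SIP's messaging layer; `worker_j_send`, `worker_i`, the artificial delay, and the use of element-wise products in place of block operations are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def worker_j_send(block_name, store):
    """Worker j looks up the requested block and returns it after a simulated delay."""
    time.sleep(0.05)
    return store[block_name]


def worker_i(executor, remote_store, b, c):
    # GET a: issue the request and continue without waiting.
    pending_a = executor.submit(worker_j_send, "a", remote_store)
    # d = b * c proceeds while a is in flight (element-wise stand-in for a block operation).
    d = [x * y for x, y in zip(b, c)]
    # Wait for a only at the point where it is needed.
    a = pending_a.result()
    # e = a * d
    return [x * y for x, y in zip(a, d)]


remote_store = {"a": [1.0, 2.0, 3.0]}
with ThreadPoolExecutor(max_workers=1) as executor:
    e = worker_i(executor, remote_store, b=[2.0, 2.0, 2.0], c=[3.0, 3.0, 3.0])
print(e)
```

If the transfer finishes before `d` is computed, the wait costs nothing, which is the overlap of communication and computation the slide illustrates.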

Page 16: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Super instruction performance

Super instructions are asynchronous
  Makes execution very elastic
  Helps maintain consistent performance on many parallel architectures

Page 17: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Distributed data

N worker tasks, each with local RAM
Data distributed in RAM of workers
  AO-based: direct use of integrals
  MO-based: use transformed integrals
Array blocks are spread over all workers
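
One simple way to spread array blocks over N workers is a round-robin assignment of linearized block indices. The slides do not say which layout the SIP uses, so the scheme below is purely an assumption for illustration, and `owner` and `distribute` are hypothetical names.

```python
from collections import defaultdict
from itertools import product


def owner(block_index, blocks_per_dim, n_workers):
    """Map a block index tuple to a worker rank (ASSUMED round-robin layout)."""
    linear = 0
    for b in block_index:
        linear = linear * blocks_per_dim + b
    return linear % n_workers


def distribute(n_dims, blocks_per_dim, n_workers):
    """Return {worker rank: [block index tuples held by that worker]}."""
    layout = defaultdict(list)
    for block in product(range(blocks_per_dim), repeat=n_dims):
        layout[owner(block, blocks_per_dim, n_workers)].append(block)
    return dict(layout)


layout = distribute(n_dims=2, blocks_per_dim=4, n_workers=3)
for rank in sorted(layout):
    print(rank, len(layout[rank]))
```

Any worker can compute `owner` locally, so it knows where to send a GET without consulting a central directory.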

Page 18: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Served (disk resident) data

M server tasks
  have access to local or global disk storage
  accept, store and retrieve blocks
  also can compute integrals when asked
Data served to and from disk
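
A server task's accept, store and retrieve role can be pictured as a small block store backed by files. The class below is a toy stand-in under stated assumptions: `BlockServer`, its `put`/`get` methods, and the one-file-per-block layout are hypothetical and say nothing about how ACES III actually organizes its disk files.

```python
import os
import tempfile
from array import array


class BlockServer:
    """Toy server task: stores blocks as files and returns them on request."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, array_name, block_index):
        tag = "_".join(str(i) for i in block_index)
        return os.path.join(self.directory, f"{array_name}_{tag}.blk")

    def put(self, array_name, block_index, values):
        # Write the block as raw 64 bit floats.
        with open(self._path(array_name, block_index), "wb") as f:
            array("d", values).tofile(f)

    def get(self, array_name, block_index):
        # Read back as many 64 bit floats as the file holds.
        path = self._path(array_name, block_index)
        data = array("d")
        with open(path, "rb") as f:
            data.fromfile(f, os.path.getsize(path) // data.itemsize)
        return list(data)


with tempfile.TemporaryDirectory() as scratch:
    server = BlockServer(scratch)
    server.put("T", (0, 1), [1.5, 2.5, 3.5])
    block = server.get("T", (0, 1))
print(block)
```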

Page 19: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

ACES III design

[Diagram labels: Problem; High level: super instruction assembly language, SIAL (concepts, data structures, algorithms); Low level: super instruction processor, SIP (xaces3) (communication, input/output, performance); input; output]

Page 20: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Outline of the talk

Performance results
  What can be done today?
Design of petascale capable software
  How does SIAL work?
  What makes it different?
Outlook

Page 21: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Clear divisions

Extreme object oriented approach
High level = problem domain specific
  Concepts
  Data structures
  Algorithms
Low level = focus on performance
  Processor and memory speed
  Communication latency and bandwidth

Page 22: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Super Instruction Coding

Write algorithm in high level super instruction assembly language
  Declare (block) arrays, (block) indices
  DO - END DO construct
  PARDO - END PARDO construct
  Basic operations: add and multiply and contract
  SIP_BARRIER
Each line maps to a few super instructions
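
The PARDO and SIP_BARRIER constructs listed above have a rough analogue in a parallel map followed by a join. The sketch below is Python, not SIAL syntax, and is only an assumption about the semantics: iterations over block index pairs are independent, and nothing after the barrier runs until all of them have finished. The names `body`, `pardo`, and `N_BLOCKS` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_BLOCKS = 3  # ASSUMED number of blocks along each index


def body(pair):
    """Work for one (I, J) iteration; a stand-in for a few super instructions."""
    i, j = pair
    return pair, float(i * N_BLOCKS + j)


def pardo(index_pairs, n_workers):
    """Run independent iterations on a pool; leaving the 'with' block acts as the barrier."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = dict(pool.map(body, index_pairs))
    return results  # every iteration has finished at this point


pairs = list(product(range(N_BLOCKS), repeat=2))
result = pardo(pairs, n_workers=4)
print(len(result), result[(2, 1)])
```

The loop body never mentions which worker runs which iteration; in the slide's terms, that decision belongs to the low level, not to the algorithm.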

Page 23: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Optimize and Tune

Optimize with traditional techniques
  optimize the basic contraction operations by mapping them to DGEMM calls
  create fast code to generate integrals
  optimize memory allocation by using multiple block stacks
  optimize execution and data movement
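
Mapping a contraction to DGEMM means treating index pairs as single matrix indices, so that a four-index contraction becomes one matrix product. The sketch below shows the idea in plain Python, with a hand-written `matmul` standing in for the DGEMM call; the helper names and the segment length `N` are assumptions, and the result is checked against an explicit loop version.

```python
from itertools import product

N = 2  # ASSUMED segment length for every index


def matmul(A, B):
    """Plain matrix product; in production code this is where DGEMM would be called."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[r][m] * B[m][c] for m in range(inner)) for c in range(cols)]
            for r in range(rows)]


def flatten(block):
    """View block[p][q][r][s] as a matrix with row index (p,q) and column index (r,s)."""
    return [[block[p][q][r][s] for r, s in product(range(N), repeat=2)]
            for p, q in product(range(N), repeat=2)]


def contract_direct(V, T):
    """Reference: R(i,j,k,l) = sum over c,d of V(i,j,c,d) * T(c,d,k,l), explicit loops."""
    return [[[[sum(V[i][j][c][d] * T[c][d][k][l]
                   for c, d in product(range(N), repeat=2))
               for l in range(N)] for k in range(N)]
             for j in range(N)] for i in range(N)]


def make_block(offset):
    """Build a small test block with distinct integer-valued entries."""
    return [[[[float(offset + 8 * p + 4 * q + 2 * r + s) for s in range(N)]
              for r in range(N)] for q in range(N)] for p in range(N)]


V, T = make_block(1), make_block(2)
R_matrix = matmul(flatten(V), flatten(T))
R_direct = flatten(contract_direct(V, T))
print(R_matrix == R_direct)
```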

Page 24: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Programmer productivity: Other

Other tools for parallel development
  UPC (Unified Parallel C)
  CAF (Co-Array Fortran)
  GA (Global Arrays Toolkit)
  DDI (Distributed Data Interface)
Simple syntax
Specify precise data layout
  PGAS (partitioned global address space)
  Rigorous array blocking

Page 25: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Programmer productivity: SIAL

SIAL has simple syntax
  Experience shows it is more expressive
Exact data layout is done by SIP
  Allows runtime tuning and optimization
SIAL has rich set of data structures
  Distributed array
  Served array
  Temporary array
  Local array

Page 26: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Outline of the talk

Performance results
  What can be done today?
Design of petascale capable software
  How does SIAL work?
  What makes it different?
Outlook

Page 27: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

New SIAL developer tools coming

Develop higher level programming language
Programmer support
  Eclipse as IDE (integrated development environment) for SIAL coding
  Understands SIAL syntax
  Code refactoring tools
    Rewrite code
    Help improve performance

Page 28: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

New algorithms being explored

SIAL: Data staging
  Huge served array
  Copy section in distributed array
  Work efficiently on distributed array
  Similar to BLAS-3 management of cache
ACES III: Linear scaling
  Localized orbitals
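
The data staging idea can be sketched as a loop that pulls one section of a large, slow array into faster storage and does all the work for that section before moving on, much as a blocked BLAS-3 routine works on one cache-sized tile at a time. The code below is an illustration under assumptions: a Python list stands in for the served array, a copied slice stands in for the distributed array, and `stage_sections` is a hypothetical helper.

```python
def stage_sections(served, section_size):
    """Yield successive sections of a large array, one working set at a time."""
    for start in range(0, len(served), section_size):
        yield start, served[start:start + section_size]


served_array = list(range(10))  # stand-in for a huge disk-resident (served) array
total = 0
for start, section in stage_sections(served_array, section_size=4):
    distributed_copy = list(section)               # copy the section into fast memory
    total += sum(x * x for x in distributed_copy)  # do all the work on the copy
print(total)
```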

Page 29: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

New domains being explored

Need
  A domain specialist, or a few of them
  Willingness and expertise to explore alternative algorithms
Apply “super instruction” design pattern
  Find “super number”, the basic data item in the domain
  “Super instructions” then follow

Page 30: ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Towards petascale computing

ACES III
  Ready for real work
  Has run on 8,192 processors
SIAL
  Useful in electronic structure
  Can be used in other domains