aces iii and sial: technologies for petascale computing in chemistry and materials physics
DESCRIPTION
ACES III and SIAL: technologies for petascale computing in chemistry and materials physics. Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett AcesQC, LLC QTP, University of Florida Gainesville, Florida. Outline of the talk. Performance results - PowerPoint PPT PresentationTRANSCRIPT
Nov 14, 08 ACES III and SIAL 1
ACES III and SIAL: technologies for petascale computing in chemistry and materials physics
Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett
AcesQC, LLCQTP, University of FloridaGainesville, Florida
Nov 14, 08 ACES III and SIAL 2
Outline of the talk
Performance results What can be done today?
Design of petascale capable software How does SIAL work? What makes it different?
Outlook
Nov 14, 08 ACES III and SIAL 3
ACES III software
Developed under CHSSI CBD-03 Parallel for shared and distributed
memory Capabilities
Hartree-Fock (RHF, UHF) MBPT(2) energy, gradient, hessian CCSD(T) energy and gradient
(DROPMO) EOM-CC excited state energies
Nov 14, 08 ACES III and SIAL 4
Two examples
Luciferin(C11H8O3S2N2) RHF C1 symmetry Basis = aug-cc-pvdz
(494 bf) Ncorr
occ = 46
Sucrose (C12H22O11) RHF C1 symmetry Basis = 6-311G**
(546 bf) =91
Nov 14, 08 ACES III and SIAL 5
Luciferin CCSD(T)
CCSD on 128 processors One iteration: 23 min Total 12 iterations: 275 min
(T) Hardest 8 occupied orbitals: 420 min on
128 processors Total 48 correlated orbitals: 420 min on
768 processors
Nov 14, 08 ACES III and SIAL 6
Luciferin CCSD scalingmin per iter; 12 iterations; two versions;
0
20
40
60
80
100
120
140
32 64 128 256
Jan code
May code
ideal
Nov 14, 08 ACES III and SIAL 7
Sucrose CCSD scalingmin per iter, 8 iterations, on Cray XT4
0
5
10
15
20
25
30
35
256 512 1024 1536 2048
Sep code
ideal
Nov 14, 08 ACES III and SIAL 8
(H2O)21H+ scalingmin per iter; 657 bf 84 corr occ
0
5
10
15
20
25
30
35
1024 2048 3072 4096
Sep code
ideal
Nov 14, 08 ACES III and SIAL 9
Outline of the talk
Performance results What can be done today?
Design of petascale capable software How does SIAL work? What makes it different?
Outlook
Nov 14, 08 ACES III and SIAL 10
A computer with a single CPU
Basic data item: 64 bit number High level language: Fortran, C
c = a + b Assembly language
ADD dest,src ADD is an operation code dest and src are registers
Nov 14, 08 ACES III and SIAL 11
The ACES III parallel machine
Basic data item: data block 10,000 64 bit numbers -> super number
High level language: being developed
Assembly language: SIAL super instruction assembly language R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L)
xaces3 -> super instruction processor
Nov 14, 08 ACES III and SIAL 12
User level execution flow
ACES III
algo.sio
SIAL compiler
algo.sial
input
Nov 14, 08 ACES III and SIAL 13
Coarse grain parallelism
Executing super instructions in SIAL algorithm
Example: memory super instruction GET block Can be from
Local node RAM Other node RAM
Time for data to become available differs
Nov 14, 08 ACES III and SIAL 14
Fine grain parallelism
Inside super instructions Example: Compute super instruction
* (contractions) compute_integrals
Can use multiple cores Can use accelerators
GPGPUs and Cell processors FPGAs (field programmable gate arrays)
Nov 14, 08 ACES III and SIAL 15
Super instruction flow
Worker i GET a -> ask j … d=b*c … wait for a? a arrives <- e=a*d …
Worker j … <- send a … … … … …
Nov 14, 08 ACES III and SIAL 16
Super instruction performance
Super instructions are asynchronous Makes execution very elastic Helps maintain consistent
performance on many parallel architectures
Nov 14, 08 ACES III and SIAL 17
Distributed data
N worker tasks, each with local RAM Data distributed in RAM of workers
AO-based: direct use of integrals MO-based: use transformed integrals
Array blocks are spread over all workers
Nov 14, 08 ACES III and SIAL 18
Served (disk resident) data
M server tasks have access to local or global disk
storage accept, store and retrieve blocks also can compute integrals when asked
Data served to and from disk
Nov 14, 08 ACES III and SIAL 19
High level Low levelProblem
ACESIII design
concepts
Data structures
algorithms
communication
Input/output
Super instruction Assembly language
SIAL
Super instructionProcessor
SIP (xaces3)
input output
Performance
Nov 14, 08 ACES III and SIAL 20
Outline of the talk
Performance results What can be done today?
Design of petascale capable software How does SIAL work? What makes it different?
Outlook
Nov 14, 08 ACES III and SIAL 21
Clear divisions
Extreme object oriented approach High level = problem domain
specific Concepts Data structures Algorithms
Low level = focus on performance Processor and memory speed Communication latency and bandwidth
Nov 14, 08 ACES III and SIAL 22
Super Instruction Coding
Write algorithm in high level super instruction assembly language Declare (block) arrays, (block) indices DO - END DO construct PARDO – END PARDO construct Basic operations: add and multiply and
contract SIP_BARRIER Each line maps to a few super
instructions
Nov 14, 08 ACES III and SIAL 23
Optimize and Tune
Optimize with traditional techniques optimize the basic contraction
operations by mapping them to DGEMM calls
create fast code to generate integrals optimize memory allocation by using
multiple block stacks optimize execution and data movement
Nov 14, 08 ACES III and SIAL 24
Programmer productivity: Other
Other tools for parallel development UPC (Universal Parallel C) CAF (Co-Array Fortran) GA (Global Array Tools) DDI (Distributed Data Interface)
Simple syntax Specify precise data layout
PGAS partitioned global address space Rigorous array blocking
Nov 14, 08 ACES III and SIAL 25
Programmer productivity: SIAL
SIAL has simple syntax Experience shows it is more expressive
Exact data layout is done by SIP Allows runtime tuning and optimization
SIAL has rich set of data structures Distributed array Served array Temporary array Local array
Nov 14, 08 ACES III and SIAL 26
Outline of the talk
Performance results What can be done today?
Design of petascale capable software How does SIAL work? What makes it different?
Outlook
Nov 14, 08 ACES III and SIAL 27
New SIAL developer tools coming
Develop higher level programming language
Programmer support Eclipse as IDE (integrated development
environment) for SIAL coding Understands SIAL syntax Code refactoring tools
Rewrite code Help improve performance
Nov 14, 08 ACES III and SIAL 28
New algorithms being explored
SIAL: Data staging Huge served array Copy section in distributed array Work efficiently on distributed array Similar to BLAS-3 management of
cache ACES III: Linear scaling
Localized orbitals
Nov 14, 08 ACES III and SIAL 29
New domains being explored
Need A domain specialist, or a few of them Willingness and expertise to explore
alternative algorithms Apply “super instruction” design
pattern Find “super number”, the basic data
item in the domain “Super instructions” then follow
Nov 14, 08 ACES III and SIAL 30
Towards petascale computing
ACES III Ready for real work Has run on 8,192 processors
SIAL Useful in electronic structure Can be used in other domains