
2014 BLIS Retreat 1

Beyond GEMM: How Can We Make Quantum Chemistry Fast?

or: Why Computer Scientists Don’t Like Chemists

Devin Matthews

9/25/14


A Motivating Example

Equation-of-Motion Coupled Cluster Theory: what is the difference in energy between the ground and excited states of some molecule?

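[Figure: energy-level diagram with the ground state S0, the excited state S1, and the gap ΔE between them.]

$\bar{H} R = \Delta E \, R$

“matrix” ($\bar{H}$): Describes the interactions in the system. The bar means it is “dressed” (i.e. tuned to a specific ground state).

“vector” ($R$): Describes the excited state. Should be an eigenvector of $\bar{H}$.

scalar ($\Delta E$): The energy difference.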

This is Linear Algebra, But…

[Figure: the vector R split into the components R1, R2, R3, R4.]

Tensors!

This is Linear Algebra, But…

(+ all permutations!)

…It’s Really Multi-(non)-linear Algebra

Hundreds of tensor contractions in a single “matrix-vector multiply”…
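As a rough illustration of how one such contraction is mapped onto GEMM, the sketch below evaluates a single hypothetical contraction, Z[a,b,i,j] = Σ_{m,e} T[a,e,i,m] W[m,b,e,j], by transposing both tensors into matrices, calling one matrix multiply, and transposing the result back. The index pattern, the sizes, and the function name `contract_via_gemm` are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def contract_via_gemm(T, W):
    """Z[a,b,i,j] = sum_{m,e} T[a,e,i,m] * W[m,b,e,j] using one matrix multiply."""
    na, ne, ni, nm = T.shape
    nm2, nb, ne2, nj = W.shape
    assert (nm, ne) == (nm2, ne2), "contracted dimensions must match"
    # Transpose so external indices come first and contracted (m, e) come last.
    T_mat = T.transpose(0, 2, 3, 1).reshape(na * ni, nm * ne)    # rows (a,i), cols (m,e)
    # Transpose so contracted (m, e) come first and external indices come last.
    W_mat = W.transpose(0, 2, 1, 3).reshape(nm * ne, nb * nj)    # rows (m,e), cols (b,j)
    Z_mat = T_mat @ W_mat                                        # the GEMM itself
    # Unfold (a,i,b,j) and transpose back to the requested (a,b,i,j) order.
    return Z_mat.reshape(na, ni, nb, nj).transpose(0, 2, 1, 3)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5, 6))   # T[a,e,i,m]
W = rng.standard_normal((6, 2, 4, 7))   # W[m,b,e,j]
Z = contract_via_gemm(T, W)
```

Every transpose here copies data, so a contraction that looks like one line of algebra turns into several passes over memory around a single GEMM call.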

Oh Yeah, It’s Sparse Too…

O2

~0.002% non-zero…

~0.39% non-zero…

Oh Yeah, It’s Sparse Too…

Spin-orbital: 100.0%
+Symmetry: 0.174%
+Spin-integration: 0.047%
+Non-orthogonal spin-adaptation: 0.016%
+More symmetry: ~0.002%

Oh Yeah, It’s Sparse Too…

• This symmetry is very unwieldy to use and maintain when using GEMM.
• This tensor may be very large and need to be split amongst several processors or be cached to disk.

[Figure: the tensor sliced into blocks over A, B, E, F, one block for each value of ijkl = 0000, 0001, 0002, 0010, 0011, 0012, …]

• Blocks may be distributed to disk or other processors.
• The blocks have no symmetry, which makes using GEMM easier.
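To make the slicing idea concrete, here is a minimal NumPy sketch under assumed names and sizes. A hypothetical tensor T[a,b,i,j] is antisymmetric in (i,j). Fixing a pair i < j leaves a dense slice T[:,:,i,j] with no symmetry of its own, so each stored slice can be contracted with a plain matrix multiply, and the i > j slices follow by a sign flip. The helper names and the contraction Z[a,c,i,j] = Σ_b T[a,b,i,j] W[b,c] are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def slice_antisymmetric(T):
    """Keep only the dense slices T[:, :, i, j] with i < j."""
    n = T.shape[2]
    return {(i, j): T[:, :, i, j].copy() for i, j in combinations(range(n), 2)}

def contract_slices(slices, W):
    """Z[a,c,i,j] = sum_b T[a,b,i,j] W[b,c], one matrix multiply per stored slice."""
    return {ij: block @ W for ij, block in slices.items()}

def unpack(slices, n):
    """Rebuild the full tensor, filling the i > j slices by antisymmetry."""
    na, nc = next(iter(slices.values())).shape
    Z = np.zeros((na, nc, n, n))
    for (i, j), block in slices.items():
        Z[:, :, i, j] = block
        Z[:, :, j, i] = -block
    return Z

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4, 5, 5))
T = X - X.transpose(0, 1, 3, 2)          # antisymmetric in the last two indices
W = rng.standard_normal((4, 2))
slices = slice_antisymmetric(T)
Z = unpack(contract_slices(slices, W), n=5)
```

Each dictionary entry is an independent dense block, so the blocks could just as well live on disk or on different processes.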

Oh Yeah, It’s Sparse Too…

The final reduction from 0.016% to ~0.002% in the previous example is due to point group symmetry:

[Figure: block structure of a tensor indexed by abij, drawn over the indices a and b.]
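The roughly eightfold drop from 0.016% to ~0.002% is what a simple counting argument gives for an Abelian point group with eight irreducible representations. The sketch below assumes such a group (D2h is one example), labels the irreps 0 to 7 so that the direct product of two irreps is the bitwise XOR of their labels, and counts the blocks of a four-index tensor whose irrep product is totally symmetric (label 0). The choice of group, the labeling, and the function name are assumptions for illustration; the slides do not say which molecule or group was used.

```python
from itertools import product

def allowed_block_fraction(n_irreps=8, n_indices=4):
    """Fraction of irrep blocks that can be non-zero by symmetry.

    Assumes an Abelian group whose irreps are labeled 0..n_irreps-1 with the
    direct product given by XOR (valid when n_irreps is 1, 2, 4, or 8).
    """
    total = 0
    allowed = 0
    for irreps in product(range(n_irreps), repeat=n_indices):
        total += 1
        combined = 0
        for g in irreps:
            combined ^= g          # direct product of the irreps seen so far
        if combined == 0:          # totally symmetric: block may be non-zero
            allowed += 1
    return allowed / total

fraction = allowed_block_fraction()
```

The fraction of allowed blocks equals the fraction of allowed elements only if every irrep contains the same number of orbitals, so the real saving for a given molecule will differ somewhat.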

Adding It All Up

1 matrix-vector multiply → 100s-1000s of tensor contractions
× 1 complicated tensor → 100s-1000s of simpler tensors
× Point group symmetry → Multiple GEMMs per contraction
× Column symmetry → 10s of permutations
× Solution of eigenproblem → 10s of iterations

= Potentially billions (!!) of calls to GEMM
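A back-of-the-envelope check of the "billions" figure, multiplying the five factors listed on this slide. The ranges for "100s-1000s" come from the slide; the numeric bounds used for "multiple" and "10s" are assumptions picked only to show the order of magnitude.

```python
from math import prod

# (low, high) bounds per factor. Entries marked "assumed" are illustrative guesses.
FACTORS = {
    "tensor contractions per matrix-vector multiply": (100, 1000),  # "100s-1000s"
    "simpler tensors per complicated tensor": (100, 1000),          # "100s-1000s"
    "GEMMs per contraction": (2, 8),        # assumed range for "multiple"
    "permutations": (10, 50),               # assumed range for "10s"
    "iterations": (10, 50),                 # assumed range for "10s"
}

def gemm_call_bounds(factors=FACTORS):
    """Multiply the low ends and the high ends of all factors."""
    low = prod(lo for lo, _ in factors.values())
    high = prod(hi for _, hi in factors.values())
    return low, high

low, high = gemm_call_bounds()
print(f"{low:,} to {high:,} GEMM calls")
```

Even the low end is in the millions, and the high end lands well into the billions, which is why per-call overhead matters as much as peak GEMM speed.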

The Big Picture

[Figure: a stack of layers along an axis running from “Chemistry” to “Linear Algebra”.]

“Simple” eigenproblem…

In terms of tensors…

In terms of other tensors…

With structured sparsity…

With symmetry…

With slicing (or blocking etc.)…

With more sparsity…

In terms of matrices.

Status Quo (CFOUR)

[Figure: the same stack of layers, labeled Layer 4, Layer 3, Layer 2, Layer 1, with the work split between “Me” and “Someone Else”.]

“Simple” eigenproblem…

In terms of tensors…

In terms of other tensors…

With structured sparsity…

With symmetry…

With slicing (or blocking etc.)…

With more sparsity…

In terms of matrices.

[Figure annotations: MPI, OMP, OMP, +]

Dealing With Chemistry: Large Scale

[Figure: a 3×3 grid of nodes, Node 1 through Node 9.]

Pros:
• Each block has little to no symmetry/sparsity.
• Blocks can be distributed in many ways.
• Load balancing can be static or dynamic.

Cons:
• Blocks require padding for edge cases. Padding can be excessive for many dimensions or short edge lengths.
• To avoid padding, some blocks must keep complex structure.
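The padding cost can be estimated with a one-line formula: if every edge of length n is rounded up to a multiple of the block size b, the stored volume grows by (ceil(n/b)·b / n) raised to the number of dimensions. The sketch below is a minimal illustration with hypothetical numbers; the function name and the sample sizes are assumptions.

```python
def padding_overhead(edge, block, ndim):
    """Ratio of padded to unpadded storage for a tensor with equal edge lengths."""
    n_blocks = -(-edge // block)          # ceiling division
    padded_edge = n_blocks * block
    return (padded_edge / edge) ** ndim

# A short edge that just misses a block boundary is the bad case:
# 10 is padded to 16, and the overhead compounds over four dimensions.
bad = padding_overhead(edge=10, block=8, ndim=4)
# An edge that is an exact multiple of the block size needs no padding.
good = padding_overhead(edge=16, block=8, ndim=4)
```

The same 60% padding per edge that would be tolerable for a matrix becomes a factor of more than six for a four-index tensor.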

Dealing With Chemistry: Large Scale

[Figure: a 3×3 grid of nodes, Node 1 through Node 9.]

Pros:
• Load balancing is automatic.
• Communication is regular.
• Little to no padding needed.
• Can be composed with blocking.

Cons:
• Complex structure is retained at all levels.
• Communication and local computation need to take this structure into account.
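These pros and cons match what a cyclic layout provides, in which element (i, j) lives on process (i mod p, j mod p). Reading the slide that way is an assumption, since the figure itself is not recoverable. Under that assumption, the sketch below counts how many elements of the upper triangle (i ≤ j) of an n×n symmetric matrix each process owns on a p×p grid, for a contiguous blocked layout and for a cyclic one. All names and sizes are hypothetical.

```python
from collections import Counter

def owner_counts(n, p, layout):
    """Count stored elements (i <= j) per process on a p x p grid.

    Assumes n is divisible by p. layout is "blocked" or "cyclic".
    """
    counts = Counter({(r, c): 0 for r in range(p) for c in range(p)})
    b = n // p                                   # block edge for the blocked layout
    for i in range(n):
        for j in range(i, n):                    # only the upper triangle is stored
            if layout == "blocked":
                counts[(i // b, j // b)] += 1    # contiguous blocks
            else:
                counts[(i % p, j % p)] += 1      # cyclic: round-robin over indices
    return counts

blocked = owner_counts(12, 3, "blocked")
cyclic = owner_counts(12, 3, "cyclic")
```

In the blocked layout the processes below the diagonal own nothing, while in the cyclic layout every process owns a triangular-looking share, which is the sense in which load balance is automatic but the structure is retained at every level.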

Dealing With Chemistry: Small Scale

[Figure: the same contraction done two ways, “The Old Way” (BLAS) and “The New Way?” (BLIS), over tensors with index labels ck, em, and ai. A legend marks the steps that are memory movement.]

Dealing With Chemistry: Small Scale

AXPY!

[Figure (BLIS): tensors W, R, and Z with index labels kl, mn, and abcd.]

Flexibility Through Interfaces

Tensor<…>

Basic Operator

Similarity-transform operator

Spin-orbital operator

Index permutation symmetry

Distributed

Point group symmetry

(Basic tensor functionality)

Capabilities:

Commutator expansion

Factorization, operator resolution

Tensor<DIST|IPS|SO|PGS>

Spin-integration or spin-adaptation

Blocking/packing

Tensor<DIST|IPS>

CTF
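One way to read the Tensor<DIST|IPS|SO|PGS> notation is as a set of capability flags that adapters strip away layer by layer until a simpler interface is reached. The sketch below models that reading with Python's `enum.Flag`. The class names `Cap` and `Tensor`, the `lower` adapter, and the mapping of each step to particular flags are all hypothetical; this is not the API of any library named on the slide.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Cap(Flag):
    """Capability flags, named after the abbreviations on the slide."""
    DIST = auto()   # distributed
    IPS = auto()    # index permutation symmetry
    SO = auto()     # spin-orbital
    PGS = auto()    # point group symmetry

@dataclass(frozen=True)
class Tensor:
    """Hypothetical tensor descriptor: a name plus the structure it still carries."""
    name: str
    caps: Cap

    def supports(self, needed: Cap) -> bool:
        return (self.caps & needed) == needed

def lower(tensor: Tensor, handled: Cap) -> Tensor:
    """Adapter: return a descriptor with the `handled` structure resolved away."""
    return Tensor(tensor.name, tensor.caps & ~handled)

# Start from the richest interface and peel off structure one layer at a time.
full = Tensor("R2", Cap.DIST | Cap.IPS | Cap.SO | Cap.PGS)
no_spin = lower(full, Cap.SO)          # assumed: spin-integration or spin-adaptation
simple = lower(no_spin, Cap.PGS)       # assumed: blocking/packing by irrep
```

Keeping the capabilities explicit lets code written against the richest interface reach the simplest one through a chain of small, well-defined adapters.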

Summary

• Chemistry is hard.
• A fast GEMM implementation is nice, but doesn’t go far enough.
• Complex structure can be dealt with
  – By breaking the problem into simple blocks,
  – By incorporating the structure into communication and computation,
  – By relating a complex object to a simpler one (a matrix) bit by bit.
• Layered and composable interfaces are important.
  – Implementations written at a “high level” can use “low level” interfaces through intermediate ones.
  – Adapters can go from one well-defined interface to another.

Thanks!

BLIS: Field Van Zee, Tyler Smith, many others…

CTF/AQ: Edgar Solomonik, Jeff Hammond

Tensormental: Martin Schatz, Bryan Marker

Tensor packing: Woody Austin, Martin Schatz

Robert van de Geijn

John Stanton

The CFOUR developers