linear scaling dft based on daubechies wavelets for ... filebigdft bigdft wavelets operations...

42
BigDFT BigDFT Wavelets Operations Performances O(N ) Minimal basis Performance Fragment Applications Perspectives Conclusions Linear Scaling DFT based on Daubechies Wavelets for Massively Parallel Archiecture L Ratcliff 1 , Stephan Mohr 1,2 , Paul Boulanger 1,3 , Luigi Genovese 1 , Damien Caliste 1 , Stefan Goedecker 2 , Thierry Deutsch 1 Laboratoire de Simulation Atomistique (L Sim), CEA Grenoble 2 Institut f ¨ ur Physik, Universit¨ at Basel, Switzerland 3 Institut N ´ eel, CNRS, Grenoble Centre Blaise Pascal, Lyon Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

Upload: duongkhanh

Post on 11-May-2019

216 views

Category:

Documents


0 download

TRANSCRIPT

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Linear Scaling DFT based on DaubechiesWavelets for Massively Parallel Archiecture

L Ratcliff1, Stephan Mohr1,2, Paul Boulanger1,3,Luigi Genovese1, Damien Caliste1, Stefan Goedecker2,

Thierry Deutsch

1Laboratoire de Simulation Atomistique (L Sim), CEA Grenoble2Institut fur Physik, Universitat Basel, Switzerland

3Institut Neel, CNRS, Grenoble

Centre Blaise Pascal, Lyon

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Outline

1 The machinery of BigDFT (cubic scaling version)Mathematics of the waveletsMain OperationsMPI/OpenMP and GPU performances

2 O(N ) (linear scaling) BigDFT approachMinimal basis set approachPerformance and AccuracyFragment approachApplicationsPerspectives

3 Conclusions

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

A basis for nanosciences: the BigDFT project

STREP European project: BigDFT(2005-2008)Four partners, 15 contributors:CEA-INAC Grenoble (T.Deutsch), U. Basel (S.Goedecker),U. Louvain-la-Neuve (X.Gonze), U. Kiel (R.Schneider)

Aim: To develop an ab-initio DFT code basedon Daubechies Wavelets, to be integrated inABINIT, distributed under GNU-GPL license

After six years, BigDFT project still alive and wellBigDFT formalism from HPC perspective

Wavelets for O(N) DFT

Resonant states of open quantum systems

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Why do we use wavelets in BigDFT?

AdaptivityOne grid, two resolution levels in BigDFT:

• 1 scaling function (“coarse region”)

• 1 scaling function and 7 wavelets(“fine region”)

Ideal for big inhomogeneous systemsEfficient Poisson solver, capable ofhandling different boundary conditions –free, wire, surface, periodicExplicit treatment of charged systemsEstablished code with many capabilites

-1.5

-1

-0.5

0

0.5

1

1.5

-6 -4 -2 0 2 4 6 8

x

φ(x)

ψ(x)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

A brief description of wavelet theory

Two kind of basis functions

A Multi-Resolution real space basisThe functions can be classified following the resolution level theyspan.

Scaling FunctionsThe functions of low resolution level are a linear combination ofhigh-resolution functions

= +

φ(x) =m

∑j=−m

hjφ(2x− j)

Centered on a resolution-dependent grid: φj = φ0(x− j).

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

A brief description of wavelet theory

WaveletsThey contain the DoF needed to complete the information which islacking due to the coarseness of the resolution.

= 12 + 1

2

φ(2x) =m

∑j=−m

hjφ(x− j) +m

∑j=−m

gjψ(x− j)

Increase the resolution without modifying grid spaceSF + W = same DoF of SF of higher resolution

ψ(x) =m

∑j=−m

gjφ(2x− j)

All functions have compact support, centered on grid points.

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Adaptivity of the mesh

Atomic positions (H2O)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Adaptivity of the mesh

Fine grid (high resolution)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Adaptivity of the mesh

Coarse grid (low resolution)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Basis set features

Orthogonality, scaling relation∫dx φk (x)φj (x) = δkj φ(x) =

1√2

m

∑j=−m

hjφ(2x− j)

The hamiltonian-related quantities can be calculated up to machineprecision in the given basis.

The accuracy is only limited by the basis set (O(

h14grid

))

Exact evaluation of kinetic energyObtained by convolution with filters:

f (x) = ∑`

c`φ`(x) , ∇2f (x) = ∑

`

c` φ`(x) ,

c` = ∑j

cj a`−j , a` ≡∫

φ0(x)∂2x φ`(x) ,

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

http://www.bigdft.org version 1.7.x

• Isolated, surfaces and 3D-periodic boundary conditions(k-points, symmetries)

• All XC functionals of the ABINIT package (libXC library)

• Hybrid functionals, Fock exchange operator

• Direct Minimisation and Mixing routines (metals)

• Local geometry optimizations (with constraints)

• External electric fields (surfaces BC)

• Born-Oppenheimer MD

• Vibrations

• Unoccupied states

• Empirical van der Waals interactions

• Saddle point searches (NEB, Granot & Bear)

• All these functionalities are GPU-compatible

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Optimal for isolated systems

Test case: cinchonidine molecule (44 atoms)

10-4

10-3

10-2

10-1

100

101

105 106

Abs

olut

e en

ergy

pre

cisi

on (

Ha)

Number of degrees of freedom

Ec = 125 Ha

Ec = 90 Ha

Ec = 40 Ha

h = 0.3bohr

h = 0.4bohr

Plane waves

Wavelets

Allows a systematic approach for moleculesConsiderably faster than Plane Waves codes.10 (5) times faster than ABINIT (CPMD)Charged systems can be treated explicitly with the same time

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Systematic basis set

Two parameters for tuning the basisThe grid spacing hgrid

The extension of the low resolution points crmult

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Massively parallel

Two kinds of parallelisationBy orbitals (Hamiltonian application, preconditioning)

By components (overlap matrices, orthogonalisation)

A few (but big) packets of dataMore demanding in bandwidth than in latency

No need of fast network

Optimal speedup (eff. ∼ 85%), also for big systems

Cubic scaling codeFor systems bigger than 500 atomes (1500 orbitals) :orthonormalisation operation is predominant (N3)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Orbital distribution scheme

Used for the application of the hamiltonianThe hamiltonian (convolutions) is applied separately onto eachwavefunction

ψ5

ψ4

ψ3

ψ2

ψ1

MPI 0

MPI 1

MPI 2

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Coefficient distribution scheme

Used for scalar product & orthonormalisationBLAS routines (level 3) are called, then result is reduced

ψ5

ψ4

ψ3

ψ2

ψ1

MPI 0 MPI 1 MPI 2

Communications are performed via MPI ALLTOALLV

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

OpenMP parallelisation

Innermost parallelisation level(Almost) Any BigDFT operation is parallelised via OpenMP

4 Useful for memory demanding calculations

4 Allows further increase of speedups

4 Saves MPI processes and intra-node Message Passing

6 Less efficient thanMPI

6 Compiler and systemdependent

6 OpenMP sectionsshould be regularlymaintained 1.5

2

2.5

3

3.5

4

4.5

0 20 40 60 80 100 120

OM

P S

peed

up

No. of MPI procs

2OMP3OMP6OMP

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

High Performance Computing

Localisation & Orthogonality→ Data localityPrincipal code operations can be intensively optimised

Optimal for application on supercomputers

Little communication, bigpackets of data

No need of fast network

Optimal speedup

Efficiency of the order of 90%,up to thousands of processors

88

90

92

94

96

98

100

1 10 100 1000

Effi

cien

cy (

%)

Number of processors

(with orbitals/proc)

32 at

8

4

2

65 at

2

1

173 at

44

22

11

5

3257 at

16

4

11025 at

4

2

1

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Task repartition for a small system (ZnO, 128 atoms)

0

10

20

30

40

50

60

70

80

90

100

16 24 32 48 64 96 144 192 288 576 1

10

100

1000P

erce

nt

Sec

onds

(lo

g. s

cale

)

No. of cores

1 Th. OMP per core

CommsLinAlgConvCPUOtherTime (sec)Efficiency (%)

Data repartition optimal for material accelerators (GPU)Graphic Processing Units can be used to speed up the computation

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Using GPUs in a Big systematic DFT code

Nature of the operationsOperators approach via convolutions

Linear Algebra due to orthogonality of the basis

Communications and calculations do not interfere

* A number of operations which can be accelerated

Evaluating GPU convenienceThree levels of evaluation

1 Bare speedups: GPU kernels vs. CPU routinesDoes the operations are suitable for GPU?

2 Full code speedup on one processAmdahl’s law: are there hot-spot operations?

3 Speedup in a (massively?) parallel environmentThe MPI layer adds an extra level of complexity

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Hybrid and Heterogeneous runs with OpenCL

NVidia S2070 Connected each toa NehalemWorkstation

BigDFT may run onboth

ATI HD 6970

Sample BigDFT run: Graphene, 4 C atoms, 52 kpts

No. of Flop: 8.053 · 1012

MPI 1 1 4 1 4 8GPU NO NV NV ATI ATI NV + ATITime (s) 6020 300 160 347 197 109Speedup 1 20.07 37.62 17.35 30.55 55.23GFlop/s 1.34 26.84 50.33 23.2 40.87 73.87

Next Step: handling of Load (un)balancing

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Configuration space of the cage-like boron clusters

Stabilize the buckyball configuration of B80 systemsPRB 83 081403(R), 2011

-2

-1.5

-1

-0.5

0

0.5

-2

-1.5

-1

-0.5

0

0.5

B @B12

B 80

14:6

68

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Configuration space of the cage-like boron clusters

Stabilize the buckyball configuration of B80 systemsPRB 83 081403(R), 2011

-2

-1.5

-1

-0.5

0

0.5

-2

-1.5

-1

-0.5

0

0.5

-12.76 eV

-13.55 eV on B @B

12 68

@ B 80

8:12

Sc 3N

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Outline

1 The machinery of BigDFT (cubic scaling version)Mathematics of the waveletsMain OperationsMPI/OpenMP and GPU performances

2 O(N ) (linear scaling) BigDFT approachMinimal basis set approachPerformance and AccuracyFragment approachApplicationsPerspectives

3 Conclusions

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Scaling of BigDFT (I)

We can reach systems containing up to a few hundred electronsthanks to wavelet properties and efficient parallelization:

• MPI • OpenMP • GPU ported

But the various operations scale differently:

• O(N logN ): Poisson solver

• O(N 2): convolutions

• O(N 3): linear algebra

and have different prefactors:

• cO(N 3)� cO(N 2)� cO(N logN ) 0

10

20

30

40

50

60

0 1000 2000

Tim

e / i

tera

tion

(s)

Number of atoms

ax3 + bx2 + cx:2.8e-09; 8.4e-06; 4.1e-031.3e-10; 6.1e-07; 3.9e-035.7e-11; 6.1e-07; 4.9e-047.1e-11; 1.7e-11; 3.4e-032.1e-11; 1.2e-11; 3.8e-03

−→ for bigger systems the O(N 3) will dominate and so we need anew approach

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Scaling of BigDFT (II)

To improve the scaling and thus the scope of BigDFT, we must takeadvantage of the nearsightedness principle:

• the behaviour of large systems is short-ranged, ornear-sighted

• the density matrix decays exponentially in insulating systems

• can take advantage of this locality to create linear-scalingO(N ) DFT formalisms by having localized orbitals e.g. as inONETEP, Conquest, SIESTA...

• we want to adapt this formalism for BigDFT

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Minimal basis and density kernel

Write the KS orbitals as linearcombinations of minimal basisset φα(r):

Ψi (r) = ∑α

cαi φα(r)

• localized

• atom-centred

• expanded in wavelets

Define the density matrix ρ(r, r′)and kernel K αβ:

ρ(r, r′) = ∑i

∣∣Ψi (r)〉〈Ψi (r′)∣∣

= ∑α,β

∣∣∣φα(r)〉K αβ〈φβ(r′)∣∣∣

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

The algorithm

Initial Guess:Atomic Orbitals

Basis optimization:

Kernel Optimization:

Update potentialDiagonalize Hamiltonian

Forces

or

Minimal basis set• Initialized to atomic orbitals, optimized

using a quartic confining potential subjectto an orthogonality constraint

• Minimize the ‘trace’, band structureenergy or a combination at fixed potential

• SD or DIIS with k.e. preconditioning

Density kernelThree methods for self-consistent optimization:

• Diagonalization

• Direct minimization

• Fermi operator expansion (linear-scaling)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Scaling

0

10

20

30

40

50

60

0 1000 2000 3000 4000 5000 6000 7000 8000

Tim

e/itera

tion

(s)

Number of atoms

ax3

+ bx2

+ cx:1.0e+00; 3.0e+03; 1.5e+06

4.6e-02; 2.2e+02; 1.4e+06

7.6e-03; 4.3e-03; 1.4e+06

CubicDminFOE

2

4

6

8

10

12

0 4000 8000

Mem

ory

(GB

)

Number of atoms

Improved time and memory scaling• 301 MPI, 8 OpenMP for above results

• ∼ O (N ) for FOE

• need sparse matrix algebra for directminimisation→ O (N )

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Parallelization – MPI

0

5

10

15

20

0 500 1000 1500 2000 2500 3000 3500 4000 0 10 20 30 40 50 60 70 80 90 100

Spe

edup

wrt

160

cor

es

Effi

cien

cy (

%)

Number of cores (MPI tasks x 4 OMP threads)

SpeedupIdeal speedup

Amdahl’s law, P=0.97Efficiency

Efficient parallelization for 1000s of CPUs• ∼ 97% of code parallelized with MPI

• ∼ 93% of code parallelized with OpenMP

960 atomsLaboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Accuracy: Total energies and forces

Several systems 44 – 450 atoms• compared with standard BigDFT

• total energies accuracy: ∼ 1 meV/atom

• forces accuracy: ∼ a few meV/bohr

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Geometry optimization

1e−03

1e−02

1e−01

1e+00

RM

S F

(eV

/Å)

0

5

10

Tim

e (m

in.)

0

20

40

60

80

100

0 5 10 15

Cum

. tim

e (m

in.)

CubicReset

Reformat 0.00

0.05

0.10

0 5 10 15

∆ R

)

288 atoms

Further savings through support function reuse

• Pulay forces not needed • lower prefactor than single point

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Energy differences

Silicon cluster with a vacancy defect• 291 atoms

• need 9 support functions per Si

• accuracy in defect energy of 12 meV

pristine vacancy ∆ ∆-∆cubic

eV eV eV meVcubic −20674.2 −20563.1 111.167 –4/1 (sp-like) −20667.6 −20556.5 111.038 1299/1 (spd-like) −20672.9 −20561.7 111.155 12

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Fragment approach• By dividing a system into fragments,

we can avoid optimizing the supportfunctions entirely for large systems

• Substantially reduces the cost(need an efficient reformatting)

• Many useful applications, including theexplicit treatment of solvents

• A necessary first step towards atight-binding like approach, with eachatom as a fragment

Reformatting the minimal basis set in the same grid

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Support function reuse

As we have seen, the reuse of support functions lends itself tosome interesting applications, e.g. geometry optimizations,charged systemsFor this, we need an accurate and efficient reformatting scheme

0.001

0.01

0.1

1

10

100

E-

Ecubic

(meV

/ato

m)

Cubic eggboxLinear eggbox

Linear template

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Transfer integrals and site energies

0.00

0.20

0.40

0.60

0.80

1.00

0 30 60 90

Ene

rgy

(eV

)

Rotation angle

DZ J12 LUMOTZP J12 LUMOTMB J12 LUMO

0.00

0.20

0.40

0.60

0.80

1.00

0 30 60 90

Ene

rgy

(eV

)

Rotation angle

DZ J12 HOMOTZP J12 HOMOTMB J12 HOMO

−4.00

−3.50

−3.00

−2.50

−2.00

−1.50

0 30 60 90

Ene

rgy

(eV

)

Rotation angle

DZ e1,2 LUMOTZP e1,2 LUMOTMB e1,2 LUMO

−6.00

−5.50

−5.00

−4.50

−4.00

0 30 60 90E

nerg

y (e

V)

Rotation angle

DZ e1,2 HOMOTZP e1,2 HOMOTMB e1,2 HOMO

BigDFT compared to ADF fragment approach• support functions from molecules reused in dimers

• Application to OLED (Organic Light-Emitting Diode)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Transfer integrals for OLEDs (I)

We want to consider environment effects in realistic ‘host-guest’morphologies: 6192 atoms, 100 molecules

Host molecule

Guest molecule

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Transfer integrals for OLEDs (II)

Transfer integrals (left) and site energies (right) calculated with(bottom) and without (top) constrained DFT

-0.06 -0.04 -0.02 0 0.02 0.04 0.06Energy (eV)

all

host-host

guest-guest

host-guest-5.5 -5.4 -5.3 -5.2 -5.1 -5 -4.9 -4.8 -4.7 -4.6 -4.5

-7.3 -7.2 -7.1 -7 -6.9 -6.8 -6.7 -6.6 -6.5 -6.4 -6.3

Energy (eV)

hostguest

Generate good statistics.Then use of Markus model to calculate the efficiency of OLEDS.

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Perspectives of linear scaling

Combination of wavelets and localized basis functions• Highly accurate (energies and forces) and efficient

• Linear scaling – Thousands of atoms

• Ideal for massively parallel calculations

• Flexible e.g. reuse of support functions to accelerategeometry optimizations and for charged systems

Fragment approach• Accurate reformatting scheme – can accelerate calculations

on large systems e.g. explicit treatment of solvents

• Wide range of applications: constrained DFT, transfer integrals

• Crucial for future developments e.g. tight-binding, embeddedsystems (QM/QM)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Outline

1 The machinery of BigDFT (cubic scaling version)Mathematics of the waveletsMain OperationsMPI/OpenMP and GPU performances

2 O(N ) (linear scaling) BigDFT approachMinimal basis set approachPerformance and AccuracyFragment approachApplicationsPerspectives

3 Conclusions

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Conclusions

Wavelets: Powerful formalism• Linear scaling – Thousands of atoms

• Accurate reformatting scheme – can accelerate calculationson large systems e.g. explicit treatment of solvents

• Wide range of applications: constrained DFT, transfer integrals

• Towards multi-scale approach (embedded systems)

Beyond DFT• Resonant states are the way to describe fully a system

• Give a compact description (few states) of the systems

• Linear-response, TD-DFT, ...

• Many-body perturbation theory (GW, ...)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch

BigDFT

BigDFT

Wavelets

Operations

Performances

O(N )

Minimal basis

Performance

Fragment

Applications

Perspectives

Conclusions

Acknowledgments

Main maintainer Luigi Genovese

Group of Stefan Goedecker A. Ghazemi, A. Willand, S. De,A. Sadeghi, N. Dugan (Basel University)

Link with ABINIT and bindings Damien Caliste (CEA-Grenoble)

Order N methods S. Mohr, L. Ratcliff, P. Boulanger (Neel Institute,CNRS)

Exploring the PES N. Mousseau (Montreal University)

Resonant states A. Cerioni, A. Mirone (ESRF, Grenoble),I. Duchemin (CEA) M. Moriniere (CEA)

Optimized convolutions B. Videau (LIG, computer scientist,Grenoble)

Applications P. Pochet, E. Machado, S. Krishnan, D. Timerkaeva(CEA-Grenoble)

Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch