
Recent Developments in the CONQUEST code: cDFT, EXX, TDDFT and Basis Sets

David Bowler

Thomas Young Centre, UCL, Gower St, London, UK
London Centre for Nanotechnology, 17-19 Gordon St, London, UK
Department of Physics & Astronomy, UCL, Gower St, London, UK

http://www.cmmp.ucl.ac.uk/~drb/

Petascale computing

• Petascale computing is already here:
  • Jaguar (ORNL) has 200,000+ cores
  • K computer in Japan (京) is already ≥10 PFLOPS
  • 800,000+ cores
• The increase in power comes from massive numbers of CPUs
  • Each CPU is multi-core (16-core standard)
• How can we use these new resources efficiently?

Extending DFT
• DFT is enormously successful
• But it has O(N³) scaling, which limits applicability and parallel scalability
• It is a ground state theory with known problems (especially self-interaction and charge transfer)
• We are developing ways past these problems:
  • Linear scaling DFT
  • cDFT, EXX, real-time TDDFT, ∆SCF
  • Spin and vdW in O(N) code

Locality

•Locality is key to parallel efficiency and scaling

•Standard DFT is non-local: wavefunctions span all space (or system)

•Plane waves also highly non-local

•O(N³) scaling will prevent efficient use of thousands of cores

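Making DFT local
• Work with the one-particle density matrix:

$$\rho(\mathbf{r},\mathbf{r}') = \sum_n f_n\,\psi_n(\mathbf{r})\,\psi_n^{*}(\mathbf{r}')$$

• We know that this decays with distance (Kohn):

$$\rho(\mathbf{r},\mathbf{r}') \to 0, \quad |\mathbf{r}-\mathbf{r}'| \to \infty$$

• We will enforce locality on the density matrix:

$$\rho(\mathbf{r},\mathbf{r}') = 0, \quad |\mathbf{r}-\mathbf{r}'| > R_c$$

• N.B. This is a well-controlled approximation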

Linear scaling DFT

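• Rewrite DFT in terms of the density matrix
• Hartree and XC energies depend only on the charge density

$$E_{\mathrm{tot}} = E_{\mathrm{KE}} + E_{\mathrm{ps}} + E_{\mathrm{Har}} + E_{\mathrm{XC}} + E_{\mathrm{C}}$$

$$E_{\mathrm{KE}} = -\frac{\hbar^2}{2m} \int d\mathbf{r}\, \left( \nabla^2_{\mathbf{r}}\, \rho(\mathbf{r},\mathbf{r}') \right)_{\mathbf{r}=\mathbf{r}'}$$

$$E_{\mathrm{ps}} = 2 \int d\mathbf{r}\, d\mathbf{r}'\, V_{\mathrm{ps}}(\mathbf{r},\mathbf{r}')\, \rho(\mathbf{r},\mathbf{r}')$$

$$n(\mathbf{r}) = \rho(\mathbf{r},\mathbf{r})$$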

Linear scaling DFT
• We need a practical problem to solve
• Assume that the density matrix is separable:

$$\rho(\mathbf{r},\mathbf{r}') = \sum_{i\alpha,j\beta} \phi_{i\alpha}(\mathbf{r})\, K_{i\alpha,j\beta}\, \phi_{j\beta}(\mathbf{r}')$$

• Here $\phi_{i\alpha}(\mathbf{r})$ is a support function centred on atom i
• $K_{i\alpha,j\beta}$ is the density matrix in the basis of support functions
• Assumes a finite number of non-zero eigenvalues

Truncating density matrix
• The support functions are confined within a sphere, radius R_reg
• The K matrix is truncated:

$$K_{i\alpha j\beta} = 0, \quad |\mathbf{R}_i - \mathbf{R}_j| > R_c$$

• All matrices are sparse, ρ is local
• By increasing radii, approach exact result
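The truncation rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the atom positions, the single support function per atom, the random symmetric stand-in for K, and the helper name `truncate_by_distance` are all hypothetical and not taken from CONQUEST.

```python
import numpy as np

def truncate_by_distance(K, positions, r_cut):
    """Zero every element K[i, j] whose atoms are further apart than r_cut."""
    # Pairwise distances |R_i - R_j| (no periodic images in this toy example)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = dist <= r_cut
    return np.where(mask, K, 0.0), mask

# Hypothetical system: 200 atoms on a line, spacing 1.0, one support function each
n_atoms = 200
positions = np.zeros((n_atoms, 3))
positions[:, 0] = np.arange(n_atoms) * 1.0

rng = np.random.default_rng(0)
K = rng.standard_normal((n_atoms, n_atoms))
K = 0.5 * (K + K.T)  # symmetric stand-in for a density matrix

K_trunc, mask = truncate_by_distance(K, positions, r_cut=5.0)
nonzero_per_row = mask.sum(axis=1)
print("max non-zeros per row:", nonzero_per_row.max())
print("fill fraction:", mask.mean())
```

The number of non-zero elements per row depends only on R_c and the local atomic density, not on the total number of atoms, which is why storage and matrix products can scale linearly with system size.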

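Support functions & matrices
• We can now form Hamiltonian and overlap matrices:

$$H_{i\alpha j\beta} = \int d\mathbf{r}\, \phi_{i\alpha}(\mathbf{r})\, \hat{H}\, \phi_{j\beta}(\mathbf{r})$$

$$S_{i\alpha j\beta} = \int d\mathbf{r}\, \phi_{i\alpha}(\mathbf{r})\, \phi_{j\beta}(\mathbf{r})$$

• This will require integration
• Support functions need a representation
• Action of operators on support functions needed: $\hat{H}\phi_{i\alpha}(\mathbf{r})$
• The charge density follows from K and the support functions:

$$n(\mathbf{r}) = \sum_{i\alpha j\beta} \phi_{i\alpha}(\mathbf{r})\, K_{i\alpha j\beta}\, \phi_{j\beta}(\mathbf{r}) = \sum_{i\alpha} \phi_{i\alpha}(\mathbf{r})\, \Phi_{i\alpha}(\mathbf{r})$$

$$\Phi_{i\alpha}(\mathbf{r}) = \sum_{j\beta} K_{i\alpha j\beta}\, \phi_{j\beta}(\mathbf{r})$$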

Linear scaling DFT
• Minimise total energy with respect to $\phi_{i\alpha}(\mathbf{r})$ and $K_{i\alpha,j\beta}$, subject to:
  1. Correct electron number
  2. Self consistency (potential, charge density)
  3. Idempotency of density matrix, $\rho^2 = \rho$
• We can consider the minimisation in three separate stages:
  1. Density matrix, $K_{i\alpha,j\beta}$
  2. Self consistency
  3. Support functions

Brief Thoughts on O(N) DFT
• Two basic approaches:
• Spatial truncation
  • Approximate DM (imposed sparsity)
  • Variational
  • Consistent sparsity patterns (parallelisation)
  • LNV, OMM
• Numerical truncation
  • Exact DM to some tolerance
  • Non-variational
  • Varying sparsity patterns
  • McWeeny, TC2 etc.
• See Rep. Prog. Phys. 75, 036503 (2012)

CONQUEST: Capabilities

Minimising K
• Imposing idempotency is difficult
• We use the McWeeny transform:

$$\rho = 3\sigma^2 - 2\sigma^3$$

• Here σ is an auxiliary density matrix
• If the eigenvalues λ_σ lie in [-0.5, 1.5] then the eigenvalues λ_ρ will lie in [0, 1]
• Vary energy with respect to elements of σ
• During minimisation, ρ tends towards idempotency
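A small numerical check of the McWeeny transform may help. The sketch below assumes an orthonormal basis (S = identity) and a hand-picked, nearly idempotent starting matrix. It applies the transform repeatedly, which is McWeeny purification rather than the energy minimisation over σ described on the slide, purely to show that the map pushes eigenvalues towards 0 and 1.

```python
import numpy as np

def mcweeny(sigma):
    """One McWeeny step: rho = 3 sigma^2 - 2 sigma^3."""
    s2 = sigma @ sigma
    return 3.0 * s2 - 2.0 * s2 @ sigma

rng = np.random.default_rng(1)
n = 8
# Random orthogonal eigenvectors (orthonormal basis assumed, so S = identity)
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Eigenvalues close to 0 or 1 and inside [-0.5, 1.5]: a nearly idempotent start
occ = np.array([1.08, 0.95, 1.03, 0.90, 0.07, -0.04, 0.10, -0.06])
sigma = q @ np.diag(occ) @ q.T

# A single transform already maps every eigenvalue into [0, 1]
first_eigs = np.linalg.eigvalsh(mcweeny(sigma))

# Repeated application drives rho^2 - rho to zero
rho = sigma
for step in range(6):
    rho = mcweeny(rho)
    err = np.linalg.norm(rho @ rho - rho)
    print(step, err)
```

The idempotency error falls roughly quadratically from one step to the next, and the trace settles at the number of eigenvalues that started near 1.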

Support Functions
• Support functions are represented in terms of a basis:

$$\phi_{i\alpha}(\mathbf{r}) = \sum_s c_{i\alpha s}\, \chi_s(\mathbf{r})$$

• CONQUEST uses two different basis sets:
  • Pseudo-atomic orbitals (cf. OpenMX, SIESTA)
  • B-splines or blips (cf. ONETEP psincs)
• PAOs: analytic operations, small basis, intuitive
• Blips: systematic convergence (to plane-wave accuracy)

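Forces
• Forces are exact derivatives of energy (include Pulay contributions)
• Available at all levels of approximation (non-SC, SC, LDA, GGA, all bases)

$$E_{\mathrm{Tot}} = 2\,\mathrm{Tr}[KH] + \Delta E_{\mathrm{Har}} + \Delta E_{\mathrm{xc}} + E_{\mathrm{c}}$$

$$\Delta E_{\mathrm{Har}} = -\frac{1}{2} \int d\mathbf{r}\, n(\mathbf{r})\, V_{\mathrm{Har}}(\mathbf{r})$$

$$\Delta E_{\mathrm{xc}} = \int d\mathbf{r}\, \left( f_{\mathrm{xc}}(n(\mathbf{r}), g(\mathbf{r})) - n(\mathbf{r})\, V_{\mathrm{xc}}(\mathbf{r}) \right)$$

$$F_i = -2\,\mathrm{Tr}[K \nabla_i H - \nabla_i K H] + \nabla_i [\Delta E_{\mathrm{Har}} + \Delta E_{\mathrm{xc}}] + F^{\mathrm{c}}_i$$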

Parallelisation•Most computer time is spent on matrix multiplication

•Own parallel, sparse matrix multiplication based on groups of atoms (partitions)

•Uniform grid for integration and FFTs, divided into blocks of grid points

•Cores have responsibility for a bundle of atoms and a domain of grid points (compact, overlapping)

•We use MPI for parallel calls (developing mix with OpenMP)

CONQUEST: Applications

• DNA in water: 3,400 atoms. Exact & O(N) static

• Water: 32 molecules. Exact & O(N) NVE MD

• DHFR: ~16,000 atoms. O(N) static

• GramicidinA: 16,000 atoms. Exact & O(N) static

• Ge/Si(001) hut clusters: 23,000 atoms. O(N) relaxation

Conquest Simulations

Ge/Si(001): Optimisation
• DMM is robust

• Relaxation is efficient

Hut Nucleation

• One reconstruction gives transition ~3ML

• Matches experiment well

• Face reconstruction key (edges not)

[Figure: energy per Ge hut atom (eV) versus coverage, for the flat, zigzag-1, zigzag-2, 2x8, 2x6 and 2x4 structures]

• Difficult test for default partitioner

• Two different sizes, (harder for larger hut)

[Figure: speed-up factor and efficiency versus increase in number of cores, compared with ideal scaling, for the two hut sizes; one panel is the 22,746 atom hut (64-288 cores)]

Scaling: Ge/Si(001)

[Figure: increase in time versus increase in size, log-log axes from 1 to 1000]

[Figure: total time (s) versus number of atoms, and energy (Ha) versus number of atoms, log-log axes]

Million atom DFT
• Bulk silicon for convenience

• 512 atoms/core (memory limited)

• Cubic cells, cubic numbers of cores

• Four support functions, slightly coarse grid

• Self-consistency done for smaller cells

Million atom DFT: details

Atoms       Time/core (s)   Energy (Ha)     Cores
4,096       7068.878        -308.268        8
32,768      6893.759        -2,466.150      64
262,144     6931.418        -19,729.202     512
2,097,152   7032.496        -157,833.618    4096
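The table can be read as a weak-scaling test: the number of atoms per core is fixed while the system grows. The short sketch below uses only the numbers in the table to make that explicit; the variable names are illustrative.

```python
# (atoms, time per core in s, energy in Ha, cores), copied from the table
runs = [
    (4_096, 7068.878, -308.268, 8),
    (32_768, 6893.759, -2_466.150, 64),
    (262_144, 6931.418, -19_729.202, 512),
    (2_097_152, 7032.496, -157_833.618, 4096),
]

t_ref = runs[0][1]
atoms_per_core = [atoms // cores for atoms, _, _, cores in runs]
# Weak-scaling efficiency relative to the smallest run (1.0 means no slowdown)
weak_efficiency = [t_ref / t for _, t, _, _ in runs]
energy_per_atom = [e / atoms for atoms, _, e, _ in runs]

for (atoms, _, _, cores), apc, eff, epa in zip(
    runs, atoms_per_core, weak_efficiency, energy_per_atom
):
    print(f"{atoms:>9} atoms on {cores:>4} cores: "
          f"{apc} atoms/core, efficiency {eff:.3f}, {epa:.6f} Ha/atom")
```

Every run has 512 atoms per core, the time per core stays within a few per cent of the 8-core run, and the energy per atom is the same to about 1e-6 Ha across all four cell sizes.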

Testing Defect Convergence

• We are working on Si dopants

• Cells from 512 atoms to 262,144 atoms

• Run on 8 to 4096 cores

• Look at monovacancy, divacancy, P substitutional

• Investigate O(N) convergence, formation energy with system size

Convergence to ground state

[Figure: residual (log scale) versus iteration for bulk Si, monovacancy, divacancy and P substitutional]

Defect Energies

[Figure: energy (eV) versus number of atoms (100 to 1e+06, log scale) in three panels: monovacancy, divacancy and P substitutional]

Ion Channel

2-c. Ion channel: gramicidin A
Comparison of forces by CHARMM and CONQUEST

[Figure: gramicidin A (GA) channel in lipid bilayers (DMPC) with bulk water and an ion; (a) Na ion is bound to GA, (b) Ca ion is bound to GA]

Forces acting on water molecules and ions in gramicidin A (GA) channel

Scaling on 京
• The machine is still in development: these results may improve
• Scaling tests:
  • 393,216 atom system on 12,288 nodes: 295 s
  • 786,432 atom system on 12,288 nodes: 478 s
  • 786,432 atom system on 24,576 nodes: 289 s
  • (24,576 nodes = 196,608 cores)
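The three timings allow one strong-scaling and one weak-scaling estimate. The arithmetic below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
# Timings quoted on the slide: (atoms, nodes, seconds)
small = (393_216, 12_288, 295.0)
large_few_nodes = (786_432, 12_288, 478.0)
large_many_nodes = (786_432, 24_576, 289.0)

cores_per_node = 196_608 // 24_576  # from "24,576 nodes = 196,608 cores"

# Strong scaling: same system, twice the nodes
speedup = large_few_nodes[2] / large_many_nodes[2]
node_ratio = large_many_nodes[1] / large_few_nodes[1]
strong_efficiency = speedup / node_ratio

# Weak scaling: twice the atoms on twice the nodes
weak_efficiency = small[2] / large_many_nodes[2]

print(f"cores per node      : {cores_per_node}")
print(f"strong-scaling eff. : {strong_efficiency:.3f}")
print(f"weak-scaling eff.   : {weak_efficiency:.3f}")
```

Doubling the nodes for the 786,432 atom system gives a speed-up of about 1.65 (roughly 83% efficiency), while doubling atoms and nodes together leaves the time essentially unchanged.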

Constrained DFT

•DFT is a successful ground state theory

•Self-interaction causes charges to spread out

•TDDFT works well for excitations

•Charge transfer excitations poorly described

•A good solution to these problems is constrained DFT

•We add an extra potential to constrain charge (or spin)

•Excited state becomes ground state

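cDFT equations
• We define a constraint on the charge density:

$$\int w_c(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} = N_c$$

• Then define a new energy functional:

$$W[\rho, V_c] = E[\rho] + V_c \left( \int w_c(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} - N_c \right)$$

• We can find the derivatives, and minimise:

$$\frac{dW}{dV_c} = \int w_c(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} - N_c$$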

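Implementation
• Write constraint in terms of K:

$$\int w_c(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} = \int w_c(\mathbf{r}) \sum_{ij} \phi_i(\mathbf{r})\, K_{ij}\, \phi_j(\mathbf{r})\, d\mathbf{r} = \sum_{ij} w^c_{ij} K_{ij}$$

$$w^c_{ij} = \int \phi_i(\mathbf{r})\, w_c(\mathbf{r})\, \phi_j(\mathbf{r})\, d\mathbf{r}$$

• Now we need to define N_c or w_c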
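The constraint written in terms of K can be tried on a toy model. The sketch below is not the CONQUEST implementation: it assumes an orthonormal tight-binding chain, a weight matrix w that simply selects the left half of the chain, one electron per occupied state, and plain bisection on V_c (the helper names `constrained_population` and `solve_vc` are hypothetical).

```python
import numpy as np

def constrained_population(h0, w, v_c, n_occ):
    """Return Tr[w K] for the ground state of h0 + v_c * w."""
    _, eigvec = np.linalg.eigh(h0 + v_c * w)
    c_occ = eigvec[:, :n_occ]
    k = c_occ @ c_occ.T  # density matrix, one electron per occupied state
    return np.trace(w @ k)

def solve_vc(h0, w, n_target, n_occ, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisect on V_c until dW/dV_c = Tr[w K] - N_c vanishes."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        resid = constrained_population(h0, w, mid, n_occ) - n_target
        if abs(resid) < tol:
            break
        # The population falls as v_c rises, so a positive residual needs a larger v_c
        if resid > 0:
            lo = mid
        else:
            hi = mid
    return mid, resid

# Hypothetical 6-site chain with nearest-neighbour hopping and 3 electrons
n_sites, n_occ = 6, 3
h0 = -1.0 * (np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
w = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # constrain the left half

n_free = constrained_population(h0, w, 0.0, n_occ)
v_c, resid = solve_vc(h0, w, n_target=2.0, n_occ=n_occ)
print("unconstrained population:", n_free)
print("V_c for N_c = 2:", v_c, "residual:", resid)
```

Without the constraint the left half holds 1.5 electrons by symmetry; a negative V_c is needed to pull the extra half electron across.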

Testing cDFT

•Charge transfer from BC to ZnBC

•DFT fails to predict 1/R form

•Use cDFT to confine charges

cDFT: scaling

• Confine charge in PPV
  • At ends and on adjacent monomers
• Change length and test scaling

[Figure: elapsed time (s) versus oligomer length (units) for four constraints: ends Δq=+1, ends Δq=-1, adjacent Δq=+1, adjacent Δq=-1]

cDFT: moving charge in DNA

• DNA 10-mer in water
• We can move charge along the backbone
  • Near charges
  • Far charges
• cDFT takes about 6-8 times longer than DFT

Molecular Switches

• Biphenyls have natural twist angle
• Shallow minimum
• Photo-excitation changes geometry

Photo-excitation: cDFT

Mol    Expt   DFT    cDFT
I      0      0      0
II     0      35     8
III    36     75     41

•DFT fails to predict any change (charge spread)

•cDFT agrees well with experiment

•N.B. experiments in solution

Linear Scaling TDDFT
• Standard packages calculate linear response (LR-TDDFT)
  • Based on Casida approach; needs KS states
• We use real-time propagation of the density matrix (RTP-TDDFT)
• Absorption spectrum found from dipole moment of molecule
• Simple to implement, compatible with O(N)
• Range of density matrix needs testing

Linear Scaling TDDFT
• The K matrix evolves according to:

$$i\dot{K} = S^{-1} H K - K H S^{-1}$$

• The K matrix is propagated formally as:

$$K(t) = U(t, t_0)\, K(t_0)\, U^{\dagger}(t_0, t)$$

• The propagator from t to t+∆t is:

$$U(t+\Delta t, t) = \exp\left[ -i S^{-1} H(\tau)\, \Delta t \right]$$

• We use a standard matrix exponential form
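A toy propagation shows what the propagator preserves. The sketch below makes several assumptions that are not from the slides: a 4-function model with a tridiagonal overlap matrix S, a constant H during the run, a made-up dipole matrix, and an exponential built by exact diagonalisation (via S^(-1/2)) instead of whatever matrix exponential form CONQUEST uses. Each step applies K → U K U† with U = exp(-i S⁻¹ H Δt).

```python
import numpy as np

def inv_sqrt_and_sqrt(s):
    """Return S^(-1/2) and S^(1/2) for a symmetric positive definite S."""
    s_val, s_vec = np.linalg.eigh(s)
    s_inv_half = (s_vec / np.sqrt(s_val)) @ s_vec.T
    s_half = (s_vec * np.sqrt(s_val)) @ s_vec.T
    return s_inv_half, s_half

def propagator(h, s, dt):
    """U = exp(-i S^-1 H dt), built by diagonalisation (toy sizes only)."""
    s_inv_half, s_half = inv_sqrt_and_sqrt(s)
    e, v = np.linalg.eigh(s_inv_half @ h @ s_inv_half)
    phase = np.exp(-1j * e * dt)
    # exp(-i S^-1 H dt) = S^(-1/2) exp(-i H' dt) S^(1/2), H' = S^(-1/2) H S^(-1/2)
    return s_inv_half @ (v * phase) @ v.conj().T @ s_half

# Hypothetical model: 4 support functions, 2 occupied states
n, n_occ, dt = 4, 2, 0.05
x = np.arange(n, dtype=float)
s = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
h0 = -1.0 * (np.eye(n, k=1) + np.eye(n, k=-1))
dip = 0.5 * (x[:, None] + x[None, :]) * s  # made-up symmetric dipole matrix

# Ground-state K of h0 in the non-orthogonal basis
s_inv_half, _ = inv_sqrt_and_sqrt(s)
_, v0 = np.linalg.eigh(s_inv_half @ h0 @ s_inv_half)
c = s_inv_half @ v0[:, :n_occ]
k = (c @ c.T).astype(complex)

# Switch on a constant field and propagate
u = propagator(h0 + 0.1 * dip, s, dt)
dipole = []
for _ in range(200):
    k = u @ k @ u.conj().T
    dipole.append(np.trace(k @ dip).real)

n_elec = np.trace(k @ s).real
idem_err = np.linalg.norm(k @ s @ k - k)
print("electrons:", n_elec, "idempotency error:", idem_err)
```

Because U† S U = S for this form of U, both the electron number Tr[KS] and the idempotency condition KSK = K survive the propagation to rounding error, while the dipole moment Tr[K D] oscillates in time.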

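Linear Scaling TDDFT
• We apply an electric field at t=0, and evaluate the dipole moment from the density:

$$p_j(t) = p_j(0) - \int n(\mathbf{r}, t)\, x_j\, d^3r$$

• We find polarisability from the dipole moment:

$$\alpha_{\mu j} = \frac{\int dt\, e^{i\omega t}\, p_j(t)}{\int dt\, e^{i\omega t}\, E(t)}$$

• The cross-section follows:

$$\sigma(\omega) = \frac{4\pi\omega}{c}\, \mathrm{Im}\left( \frac{1}{3} \mathrm{Tr}\,(\alpha_{\mu j}) \right)$$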

Linear Scaling TDDFT

•Benzene dipole and spectrum

•Good agreement with experiment

• PAO basis sets need care
• We are working on blips

Linear Scaling TDDFT

• tPA simulation
• TDDFT evolved for 1.5 fs

•Linear scaling seen with system size

•Preliminary results

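EXX in Conquest
• Linear scaling EXX seen already
• We have developed a new approach
• Contract with K to give 3-centre integrals (3CRI)
• Efficient, linear scaling

$$E_X = -2 \sum_{i,j,k,l} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\phi_i(\mathbf{r})\, K_{ij}\, \phi_j(\mathbf{r}')\, \phi_k(\mathbf{r})\, K_{kl}\, \phi_l(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} = -2 \sum_{ij} K_{ij} X_{ij}$$

$$X_{ij} = \sum_{kl} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\phi_i(\mathbf{r})\, \phi_j(\mathbf{r}')\, \phi_k(\mathbf{r})\, K_{kl}\, \phi_l(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}$$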

EXX in Conquest

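• We use density matrix
• 3CRIs improve efficiency
• Range on X gives O(N)
• Use ISF to solve Poisson

$$\Phi_k(\mathbf{r}) = \sum_l K_{kl}\, \phi_l(\mathbf{r})$$

$$\rho_{kj}(\mathbf{r}) = \Phi_k(\mathbf{r})\, \phi_j(\mathbf{r})$$

$$v_{kj}(\mathbf{r}) = \int d\mathbf{r}'\, \frac{\rho_{kj}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}$$

$$\Omega_j(\mathbf{r}) = \sum_k v_{kj}(\mathbf{r})\, \phi_k(\mathbf{r})$$

$$X_{ij} = \int d\mathbf{r}\, \phi_i(\mathbf{r})\, \Omega_j(\mathbf{r})$$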
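The staged contraction above can be checked against the explicit four-index sum on a tiny model. Everything in this sketch is an assumption made for illustration: a 1D grid, Gaussian stand-ins for the support functions, a soft-Coulomb kernel in place of 1/|r - r'|, direct quadrature in place of the ISF Poisson solver, and no range cut-off on X.

```python
import numpy as np

# 1D toy grid and a soft-Coulomb kernel standing in for 1/|r - r'|
x = np.linspace(-8.0, 8.0, 161)
dx = x[1] - x[0]
kernel = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)

# Four Gaussian "support functions" on hypothetical centres
centres = np.array([-3.0, -1.0, 1.0, 3.0])
phi = np.exp(-((x[None, :] - centres[:, None]) ** 2))  # shape (n_sf, n_grid)
n_sf = len(centres)

rng = np.random.default_rng(2)
K = rng.standard_normal((n_sf, n_sf))
K = 0.5 * (K + K.T)  # symmetric stand-in for the density matrix

# Staged contraction
Phi = K @ phi                                # Phi_k(r) = sum_l K_kl phi_l(r)
X = np.zeros((n_sf, n_sf))
for j in range(n_sf):
    rho_kj = Phi * phi[j]                    # rho_kj(r) = Phi_k(r) phi_j(r), all k
    v_kj = rho_kj @ kernel * dx              # v_kj(r) = int dr' rho_kj(r') kernel(r, r')
    omega_j = np.sum(v_kj * phi, axis=0)     # Omega_j(r) = sum_k v_kj(r) phi_k(r)
    X[:, j] = phi @ omega_j * dx             # X_ij = int dr phi_i(r) Omega_j(r)
E_x = -2.0 * np.sum(K * X)

# Reference: explicit four-index integrals (ik|jl), feasible only for tiny systems
pair = phi[:, None, :] * phi[None, :, :]     # pair[i, k, r] = phi_i(r) phi_k(r)
eri = np.einsum("ikr,rs,jls->ikjl", pair, kernel, pair) * dx * dx
X_ref = np.einsum("ikjl,kl->ij", eri, K)
E_ref = -2.0 * np.einsum("ij,ikjl,kl->", K, eri, K)

print("max |X - X_ref| :", np.abs(X - X_ref).max())
print("E_x, E_ref      :", E_x, E_ref)
```

The staged route never stores the four-index object: each column of X needs only pair densities built from one contracted function Φ_k and one support function φ_j, which is what makes a range cut-off on X practical.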

EXX in Conquest

Basis Sets in Conquest

•Using DZP for O(N) can be hard (inverting S)

•Definition of second zeta is important

•Experimenting with combining PAOs into SFs

[Figure: fractional error in energy versus grid spacing (bohr), comparing numerical and analytic evaluation of the kinetic energy (KE) and non-local (NL) terms]

Basis Sets in Conquest

•Blips can be grid sensitive

•We are making all integrals analytic

•More efficient, accurate

Outlook & Conclusions
• Linear scaling DFT allows simulations with 10⁴-10⁶ atoms (and beyond!)

•Constrained DFT allows simulations with charge transfer and localised charges

•TDDFT will soon be available for excitations

•EXX will soon be available for hybrids

•Basis sets are making excellent progress

•We are applying these to problems in biomolecules, nanostructures, dye-sensitised solar cells

Acknowledgements
• Tsuyoshi Miyazaki (Conquest co-leader)
• Lionel Truflandier (EXX)
• Alex Sena (cDFT)
• Conn O’Rourke (TDDFT)
• Conquest team
  • Lianheng Tong
  • Michiaki Arita
  • Veronika Brázdová
  • Umberto Terranova
  • Ayako Nakata

Coming soon...


www.wiley-vch.de


V. Brázdová, D. R. Bowler
Atomistic Computer Simulations: A Practical Guide
(Physics Textbook)


Available at APS March Meeting 2013

References
• General:
  • Rep. Prog. Phys. 75, 036503 (2012)
  • J. Phys.: Condens. Matter 14, 2781-2798 (2002)
  • phys. stat. sol. b 243, 989-1000 (2006)
• Support functions:
  • Blips: Phys. Rev. B 55, 13485 (1997)
  • PAOs: J. Phys.: Condens. Matter 20, 294206 (2008)
• Forces: J. Chem. Phys. 121, 6186-6194 (2004)
• cDFT: J. Chem. Theory Comput. 7, 884 (2011)

http://www.conquest.ucl.ac.uk/
http://www.linear-scaling.org/
http://www.order-n.org/
