Recent Developments in the CONQUEST code: cDFT, EXX, TDDFT and Basis Sets
David Bowler
Thomas Young Centre, UCL, Gower St, London, UK
London Centre for Nanotechnology, 17-19 Gordon St, London, UK
Department of Physics & Astronomy, UCL, Gower St, London, UK
[email protected]
http://www.cmmp.ucl.ac.uk/~drb/
Petascale computing
•Petascale computing is already here:
  •Jaguar (ORNL) has 200,000+ cores
  •K computer in Japan (京) is already ≥10 PFLOPS
  •800,000+ cores
•The increase in power comes from massive numbers of CPUs
•Each CPU is multi-core (16-core standard)
•How can we use these new resources efficiently?
Extending DFT
•DFT is enormously successful
•But it has O(N^3) scaling, which limits applicability and parallel scalability
•Ground state theory with known problems (especially self-interaction, charge transfer)
•We are developing ways past these problems:
  •Linear scaling DFT
  •cDFT, EXX, real-time TDDFT, ∆SCF
  •Spin and vdW in O(N) code
Locality
•Locality is key to parallel efficiency and scaling
•Standard DFT is non-local: wavefunctions span all space (or system)
•Plane waves are also highly non-local
•O(N^3) scaling will prevent efficient use of thousands of cores
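Making DFT local
•Work with the one-particle density matrix:
\[ \rho(\mathbf{r},\mathbf{r}') = \sum_n f_n \psi_n(\mathbf{r}) \psi_n^*(\mathbf{r}') \]
•We know that this decays with distance (Kohn):
\[ \rho(\mathbf{r},\mathbf{r}') \to 0, \quad |\mathbf{r}-\mathbf{r}'| \to \infty \]
•We will enforce locality on the density matrix:
\[ \rho(\mathbf{r},\mathbf{r}') = 0, \quad |\mathbf{r}-\mathbf{r}'| > R_c \]
•N.B. This is a well-controlled approximation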
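Linear scaling DFT
•Rewrite DFT in terms of density matrix:
\[ E_{\mathrm{tot}} = E_{\mathrm{KE}} + E_{\mathrm{ps}} + E_{\mathrm{Har}} + E_{\mathrm{XC}} + E_{\mathrm{C}} \]
\[ E_{\mathrm{KE}} = -\frac{\hbar^2}{2m} \int d\mathbf{r} \left( \nabla^2_{\mathbf{r}} \rho(\mathbf{r},\mathbf{r}') \right)_{\mathbf{r}=\mathbf{r}'} \]
\[ E_{\mathrm{ps}} = 2 \int d\mathbf{r}\, d\mathbf{r}'\, V_{\mathrm{ps}}(\mathbf{r},\mathbf{r}') \rho(\mathbf{r},\mathbf{r}') \]
•Hartree, XC energy depend only on charge density:
\[ n(\mathbf{r}) = \rho(\mathbf{r},\mathbf{r}) \]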
Linear scaling DFT
•We need a practical problem to solve
•Assume that the density matrix is separable:
\[ \rho(\mathbf{r},\mathbf{r}') = \sum_{i\alpha,j\beta} \phi_{i\alpha}(\mathbf{r}) K_{i\alpha,j\beta} \phi_{j\beta}(\mathbf{r}') \]
•Here $\phi_{i\alpha}(\mathbf{r})$ is a support function centred on atom i
•$K_{i\alpha,j\beta}$ is the density matrix in the basis of support functions
•Assumes finite number of non-zero eigenvalues
Truncating density matrix
• The support functions are confined within a sphere, radius $R_{\mathrm{reg}}$
• The K matrix is truncated:
\[ K_{i\alpha j\beta} = 0, \quad |\mathbf{R}_i - \mathbf{R}_j| > R_c \]
• All matrices are sparse, ρ is local
• By increasing radii, approach exact result
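The distance cutoff on K can be sketched in a few lines. This is an illustration only, not CONQUEST code: the function name `truncate_by_distance`, the chain of 200 atoms with unit spacing, and the exponentially decaying model matrix are all assumptions made for the example.

```python
import numpy as np

def truncate_by_distance(K, positions, r_cut):
    """Zero every element K[i, j] whose atoms are further apart than r_cut."""
    # Pairwise distances |R_i - R_j| between atomic positions.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.where(dist <= r_cut, K, 0.0)

# Hypothetical test system: 200 atoms on a line, 1 unit apart, one support
# function per atom, and a model matrix decaying exponentially with distance.
n_atoms = 200
positions = np.zeros((n_atoms, 3))
positions[:, 0] = np.arange(n_atoms)
dist = np.abs(positions[:, None, 0] - positions[None, :, 0])
K_full = np.exp(-dist)

K_trunc = truncate_by_distance(K_full, positions, r_cut=5.0)

# Each row keeps only the neighbours inside the cutoff, so the number of
# stored elements grows linearly with the number of atoms.
nonzero_per_row = np.count_nonzero(K_trunc, axis=1)
print(nonzero_per_row.max(), np.count_nonzero(K_trunc))
```

The number of non-zero elements per row is set by the cutoff, not by the system size, which is what makes the storage and the matrix products scale linearly.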
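Support functions & matrices
• We can now form Hamiltonian and overlap matrices:
\[ H_{i\alpha j\beta} = \int d\mathbf{r}\, \phi_{i\alpha}(\mathbf{r}) \hat{H} \phi_{j\beta}(\mathbf{r}) \]
\[ S_{i\alpha j\beta} = \int d\mathbf{r}\, \phi_{i\alpha}(\mathbf{r}) \phi_{j\beta}(\mathbf{r}) \]
• This will require integration
• Support functions need a representation
• Action of operators on support functions needed: $\hat{H}\phi_{i\alpha}(\mathbf{r})$
\[ n(\mathbf{r}) = \sum_{i\alpha j\beta} \phi_{i\alpha}(\mathbf{r}) K_{i\alpha j\beta} \phi_{j\beta}(\mathbf{r}) = \sum_{i\alpha} \phi_{i\alpha}(\mathbf{r}) \psi_{i\alpha}(\mathbf{r}) \]
\[ \psi_{i\alpha}(\mathbf{r}) = \sum_{j\beta} K_{i\alpha j\beta} \phi_{j\beta}(\mathbf{r}) \]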
Linear scaling DFT
•Minimise total energy with respect to $\phi_{i\alpha}(\mathbf{r})$ and $K_{i\alpha,j\beta}$, subject to:
  1. Correct electron number
  2. Self consistency (potential, charge density)
  3. Idempotency of density matrix: $\rho^2 = \rho$
•We can consider the minimisation in three separate stages:
  1. Density matrix, $K_{i\alpha,j\beta}$
  2. Self consistency
  3. Support functions, $\phi_{i\alpha}(\mathbf{r})$
Brief Thoughts on O(N) DFT
•Two basic approaches:
•Spatial truncation
  •Approximate DM (imposed sparsity)
  •Variational
  •Consistent sparsity patterns (parallelisation)
  •LNV, OMM
•Numerical truncation
  •Exact DM to some tolerance
  •Non-variational
  •Varying sparsity patterns
  •McWeeny, TC2 etc
•See Rep. Prog. Phys. 75 036503 (2012)
CONQUEST: Capabilities
Minimising K
•Imposing idempotency is difficult
•We use the McWeeny transform:
\[ \rho = 3\sigma^2 - 2\sigma^3 \]
•Here σ is an auxiliary density matrix
•If the eigenvalues of σ, $\lambda_\sigma$, lie in [-0.5, 1.5] then the eigenvalues of ρ, $\lambda_\rho$, will lie in [0, 1]
•Vary energy with respect to elements of σ
•During minimisation, ρ tends towards idempotency
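The effect of the McWeeny transform ρ = 3σ² − 2σ³ on the eigenvalues can be checked numerically. The sketch below is an illustration under stated assumptions, not the CONQUEST implementation: the 6×6 auxiliary matrix and its eigenvalues are made up, and an orthonormal basis is assumed so that plain matrix products can be used.

```python
import numpy as np

def mcweeny(sigma):
    """One McWeeny step: rho = 3 sigma^2 - 2 sigma^3."""
    s2 = sigma @ sigma
    return 3.0 * s2 - 2.0 * s2 @ sigma

def idempotency_error(m):
    """Frobenius norm of m^2 - m; zero for an idempotent matrix."""
    return np.linalg.norm(m @ m - m)

# Hypothetical auxiliary matrix: symmetric, with eigenvalues inside
# [-0.5, 1.5] that are close to, but not exactly, 0 or 1.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
sigma_eigs = np.array([-0.1, 0.05, 0.1, 0.9, 0.95, 1.1])
sigma = q @ np.diag(sigma_eigs) @ q.T

rho = mcweeny(sigma)
rho_eigs = np.linalg.eigvalsh(rho)

err_before = idempotency_error(sigma)
err_after = idempotency_error(rho)
print(rho_eigs, err_before, err_after)
```

All eigenvalues of ρ fall in [0, 1] and the idempotency error drops, which is the behaviour the slide relies on during the minimisation.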
Support Functions
•Support functions are represented in terms of a basis:
\[ \phi_{i\alpha}(\mathbf{r}) = \sum_s c_{i\alpha s} \chi_s(\mathbf{r}) \]
•CONQUEST uses two different basis sets:
  • Pseudo-atomic orbitals (cf OpenMX, SIESTA)
  • B-splines or blips (cf ONETEP psincs)
•PAOs: analytic operations, small basis, intuitive
•Blips: systematic convergence (to plane-wave accuracy)
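Forces
•Forces are exact derivatives of energy (include Pulay contributions)
•Available at all levels of approximation (non-SC, SC, LDA, GGA, all bases)
\[ E_{\mathrm{Tot}} = 2\mathrm{Tr}[KH] + \Delta E_{\mathrm{Har}} + \Delta E_{\mathrm{xc}} + E_c \]
\[ \Delta E_{\mathrm{Har}} = -\frac{1}{2} \int d\mathbf{r}\, n(\mathbf{r}) V_{\mathrm{Har}}(\mathbf{r}) \]
\[ \Delta E_{\mathrm{xc}} = \int d\mathbf{r} \left( f_{\mathrm{xc}}(n(\mathbf{r}), g(\mathbf{r})) - n(\mathbf{r}) V_{\mathrm{xc}}(\mathbf{r}) \right) \]
\[ F_i = -2\mathrm{Tr}[K \nabla_i H - \nabla_i K H] + \nabla_i [\Delta E_{\mathrm{Har}} + \Delta E_{\mathrm{xc}}] + F_i^c \]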
Parallelisation
•Most computer time is spent on matrix multiplication
•We use our own parallel sparse matrix multiplication, based on groups of atoms (partitions)
•Uniform grid for integration and FFTs, divided into blocks of grid points
•Cores have responsibility for a bundle of atoms and a domain of grid points (compact, overlapping)
•We use MPI for parallel calls (developing mix with OpenMP)
CONQUEST: Applications
• DNA in water: 3,400 atoms. Exact & O(N) static
• Water: 32 molecules. Exact & O(N) NVE MD
• DHFR: ~16,000 atoms. O(N) static
• GramicidinA: 16,000 atoms. Exact & O(N) static
• Ge/Si(001) hut clusters: 23,000 atoms. O(N) relaxation
Conquest Simulations
Ge/Si(001): Optimisation
• DMM is robust
• Relaxation is efficient
Hut Nucleation
• One reconstruction gives transition ~3ML
• Matches experiment well
• Face reconstruction key (edges not)
[Figure: energy per Ge hut atom (eV) against coverage, for the flat, zigzag-1, zigzag-2, 2x8, 2x6 and 2x4 structures]
Scaling: Ge/Si(001)
• Difficult test for default partitioner
• Two different sizes (harder for larger hut)
[Figure: speed-up factor and parallel efficiency against increase in number of cores, for the 22,746 atom hut (64-288 cores) and the 11,620 atom hut (16-512 cores), compared with ideal scaling]
Million atom DFT
• Bulk silicon for convenience
• 512 atoms/core (memory limited)
• Cubic cells, cubic numbers of cores
• Four support functions, slightly coarse grid
• Self-consistency done for smaller cells
[Figure: increase in time against increase in size; total time (s) against number of atoms; energy (Ha) against number of atoms]
Million atom DFT: details
Atoms        Time/core (s)   Energy (Ha)     Cores
4,096        7068.878        -308.268        8
32,768       6893.759        -2,466.150      64
262,144      6931.418        -19,729.202     512
2,097,152    7032.496        -157,833.618    4096
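The table can be read as a weak-scaling test: the load per core is fixed while the system grows. A short worked check, using only the numbers in the table (the variable names are made up for the example):

```python
# (atoms, time per core in s, energy in Ha, cores), copied from the table.
rows = [
    (4_096, 7068.878, -308.268, 8),
    (32_768, 6893.759, -2_466.150, 64),
    (262_144, 6931.418, -19_729.202, 512),
    (2_097_152, 7032.496, -157_833.618, 4096),
]

atoms_per_core = [atoms // cores for atoms, _, _, cores in rows]
energy_per_atom = [energy / atoms for atoms, _, energy, _ in rows]
times = [time_s for _, time_s, _, _ in rows]

# Relative spread of the time per core across a 512-fold increase in size.
time_spread = (max(times) - min(times)) / min(times)
# Spread of the energy per atom (Ha) across the four cells.
energy_spread = max(energy_per_atom) - min(energy_per_atom)

print(atoms_per_core, time_spread, energy_spread)
```

Every run has 512 atoms per core, the time per core varies by under 3%, and the energy per atom agrees between cells to better than 1e-6 Ha, as expected for bulk silicon.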
Testing Defect Convergence
• We are working on Si dopants
• Cells from 512 atoms to 262,144 atoms
• Run on 8 to 4096 cores
• Look at monovacancy, divacancy, P substitutional
• Investigate O(N) convergence, formation energy with system size
Convergence to ground state
[Figure: residual (log scale, 1e-05 to 1000) against iteration, for bulk Si, monovacancy, divacancy and P substitutional]
Defect Energies
[Figure: three panels of energy (eV) against number of atoms (100 to 1e+06), for the monovacancy, divacancy and P substitutional]
Ion Channel
2-c. Ion channel: gramicidin A
Comparison of forces by CHARMM and CONQUEST
[Figure: forces acting on water molecules and ions in the gramicidin A (GA) channel, with bulk water, lipid bilayers (DMPC) and the ion labelled. (a) Na ion is bound to GA; (b) Ca ion is bound to GA]
Scaling on 京
•The machine is still in development: these results may improve
•Scaling tests:
  •393,216 atom system on 12,288 nodes: 295 sec
  •786,432 atom system on 12,288 nodes: 478 sec
  •786,432 atom system on 24,576 nodes: 289 sec
  •(24,576 nodes = 196,608 cores)
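The three timings give both a strong-scaling and a weak-scaling data point. A small worked calculation, using only the quoted numbers (the dictionary layout and variable names are illustrative):

```python
# (atoms, nodes) -> wall time in seconds, as quoted in the scaling tests.
runs = {
    (393_216, 12_288): 295.0,
    (786_432, 12_288): 478.0,
    (786_432, 24_576): 289.0,
}

# Strong scaling: same 786,432 atom system, twice the nodes.
speedup = runs[(786_432, 12_288)] / runs[(786_432, 24_576)]
strong_efficiency = speedup / 2.0

# Weak scaling: twice the atoms on twice the nodes, so the load per node
# is the same in both runs.
atoms_per_node = {key: key[0] // key[1] for key in runs}
weak_efficiency = runs[(393_216, 12_288)] / runs[(786_432, 24_576)]

cores_per_node = 196_608 // 24_576

print(speedup, strong_efficiency, weak_efficiency, cores_per_node)
```

Doubling the nodes for the larger system gives a speed-up of about 1.65 (roughly 83% efficiency), while doubling atoms and nodes together leaves the time essentially unchanged at 32 atoms per node.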
Constrained DFT
•DFT is a successful ground state theory
•Self-interaction causes charges to spread out
•TDDFT works well for excitations
•Charge transfer excitations poorly described
•A good solution to these problems is constrained DFT
•We add an extra potential to constrain charge (or spin)
•Excited state becomes ground state
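cDFT equations
•We define a constraint on the charge density:
\[ \int w_c(\mathbf{r}) \rho(\mathbf{r})\, d\mathbf{r} = N_c \]
•Then define a new energy functional:
\[ W[\rho, V_c] = E[\rho] + V_c \left( \int w_c(\mathbf{r}) \rho(\mathbf{r})\, d\mathbf{r} - N_c \right) \]
•We can find the derivatives, and minimise:
\[ \frac{dW}{dV_c} = \int w_c(\mathbf{r}) \rho(\mathbf{r})\, d\mathbf{r} - N_c \]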
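Implementation
•Write constraint in terms of K:
\[ \int w_c(\mathbf{r}) \rho(\mathbf{r})\, d\mathbf{r} = \int w_c(\mathbf{r}) \sum_{ij} \phi_i(\mathbf{r}) K_{ij} \phi_j(\mathbf{r})\, d\mathbf{r} = \sum_{ij} w^c_{ij} K_{ij} \]
\[ w^c_{ij} = \int \phi_i(\mathbf{r}) w_c(\mathbf{r}) \phi_j(\mathbf{r})\, d\mathbf{r} \]
•Now we need to define $N_c$ or $w_c$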
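The role of the constraint potential V_c can be illustrated on a toy model. The sketch below is not CONQUEST code: it assumes a hypothetical two-site tight-binding Hamiltonian in an orthonormal basis, a single electron, a weight matrix w that projects onto the first site, and a simple bisection search for the V_c at which dW/dV_c = Σ w_ij K_ij − N_c vanishes.

```python
import numpy as np

def density_matrix(h, n_occ):
    """Ground-state K in an orthonormal basis: sum over the n_occ lowest eigenvectors."""
    _, vecs = np.linalg.eigh(h)
    occ = vecs[:, :n_occ]
    return occ @ occ.T

def constraint_error(v_c, h0, w, n_c, n_occ):
    """dW/dV_c = sum_ij w_ij K_ij - N_c for the ground state of h0 + v_c * w."""
    k = density_matrix(h0 + v_c * w, n_occ)
    return np.sum(w * k) - n_c

def solve_v_c(h0, w, n_c, n_occ, lo=-20.0, hi=20.0, tol=1e-12):
    """Bisection on V_c; in this model the constrained charge falls as V_c rises."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if constraint_error(mid, h0, w, n_c, n_occ) > 0.0:
            lo = mid  # too much charge in the region: raise the potential
        else:
            hi = mid  # too little charge: lower the potential
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical model: site energies 0 and 0.5, hopping -1, one electron.
h0 = np.array([[0.0, -1.0], [-1.0, 0.5]])
w = np.array([[1.0, 0.0], [0.0, 0.0]])   # weight: all of site 0, none of site 1
n_c = 0.9                                # target charge on site 0

free_charge = np.sum(w * density_matrix(h0, 1))
v_c = solve_v_c(h0, w, n_c, n_occ=1)
constrained_charge = np.sum(w * density_matrix(h0 + v_c * w, 1))
print(free_charge, v_c, constrained_charge)
```

Without the constraint the charge on the first site is about 0.62; the search finds the potential that makes the ground state of the modified Hamiltonian carry the requested 0.9, which is the sense in which the constrained state becomes a ground state.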
Testing cDFT
•Charge transfer from BC to ZnBC
•DFT fails to predict 1/R form
•Use cDFT to confine charges
cDFT: scaling
•Confine charge in PPV
  •At ends and on adjacent monomers
•Change length and test scaling
[Figure: elapsed time (s) against oligomer length (2 to 6 units), for Ends Δq=+1, Ends Δq=-1, Adjacent Δq=+1 and Adjacent Δq=-1]
cDFT: moving charge in DNA
•DNA 10-mer in water
•We can move charge along the backbone
  •Near charges
  •Far charges
•cDFT takes about 6-8 times longer than DFT
Molecular Switches
•Biphenyls have natural twist angle
•Shallow minimum
•Photo-excitation changes geometry
Photo-excitation: cDFT
Mol    Expt   DFT   cDFT
I      0      0     0
II     0      35    8
III    36     75    41
•DFT fails to predict any change (charge spread)
•cDFT agrees well with experiment
•N.B. experiments in solution
Linear Scaling TDDFT
•Standard packages calculate linear response (LR-TDDFT)
  •Based on Casida approach; needs KS states
•We use real-time propagation of the density matrix (RTP-TDDFT)
•Absorption spectrum found from dipole moment of molecule
•Simple to implement, compatible with O(N)
•Range of density matrix needs testing
Linear Scaling TDDFT
•The K matrix obeys the equation of motion:
\[ i\dot{K} = S^{-1}HK - KHS^{-1} \]
•The K matrix is propagated formally as:
\[ K(t) = U(t, t_0) K(t_0) U^\dagger(t_0, t) \]
•The propagator from t to t+∆t is:
\[ U(t+\Delta t, t) = \exp\left[ -i S^{-1} H(\tau) \Delta t \right] \]
•We use a standard matrix exponential form
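A small dense-matrix sketch shows how the propagator U = exp(−i S⁻¹H ∆t) acts on K and what it conserves. This is an illustration under stated assumptions, not the CONQUEST implementation: the 3×3 matrices H and S and the "kick" term are made up, H is held fixed over the run, atomic units are used, and the exponential is built by diagonalisation rather than by whatever form the code itself uses.

```python
import numpy as np

def inverse_sqrt_and_sqrt(s):
    """Return S^-1/2 and S^1/2 for a symmetric positive-definite overlap."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs / np.sqrt(vals)) @ vecs.T, (vecs * np.sqrt(vals)) @ vecs.T

def ground_state(h, s, n_occ):
    """K = sum of occupied generalised eigenvectors of H c = e S c."""
    s_mh, _ = inverse_sqrt_and_sqrt(s)
    _, c = np.linalg.eigh(s_mh @ h @ s_mh)
    occ = s_mh @ c[:, :n_occ]
    return occ @ occ.T

def propagator(h, s, dt):
    """U = exp(-i S^-1 H dt), using S^-1 H = S^-1/2 (S^-1/2 H S^-1/2) S^1/2."""
    s_mh, s_h = inverse_sqrt_and_sqrt(s)
    e, c = np.linalg.eigh(s_mh @ h @ s_mh)      # the bracketed matrix is Hermitian
    exp_h = (c * np.exp(-1j * e * dt)) @ c.conj().T
    return s_mh @ exp_h @ s_h

def step(k, u):
    """K(t + dt) = U K(t) U^dagger."""
    return u @ k @ u.conj().T

# Hypothetical three-function model with a non-orthogonal basis.
s = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 1.0]])
h0 = np.array([[-1.0, -0.5, 0.0], [-0.5, 0.0, -0.5], [0.0, -0.5, 1.0]])
kick = 0.1 * np.diag([-1.0, 0.0, 1.0])          # made-up field-like perturbation

k = ground_state(h0, s, n_occ=1).astype(complex)
u = propagator(h0 + kick, s, dt=0.05)
for _ in range(200):
    k = step(k, u)

n_electrons = np.trace(k @ s).real
print(n_electrons)
```

Because U†S = S U⁻¹ for this propagator, the electron number Tr[KS] and the idempotency condition KSK = K are both preserved step by step, even though U itself is not unitary in a non-orthogonal basis.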
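Linear Scaling TDDFT
•We apply an electric field at t=0, and evaluate the dipole moment from the density:
\[ p_j(t) = p_j(0) - \int n(\mathbf{r}, t)\, x_j\, d^3r \]
•We find polarisability from the dipole moment:
\[ \alpha_{\mu j} = \frac{\int dt\, e^{i\omega t} p_j(t)}{\int dt\, e^{i\omega t} E(t)} \]
•The cross-section follows:
\[ \sigma(\omega) = \frac{4\pi\omega}{c}\, \mathrm{Im}\left( \frac{1}{3} \mathrm{Tr}(\alpha_{\mu j}) \right) \]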
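The route from dipole moment to spectrum, α(ω) = ∫dt e^{iωt} p(t) / ∫dt e^{iωt} E(t) followed by σ(ω) = (4πω/c) Im(α), can be sketched with a synthetic signal. Everything specific here is an assumption for illustration: the dipole is a single damped sine at a made-up frequency of 0.25 Ha rather than output from a real propagation, the field is a delta-like kick at t = 0, the response is taken as isotropic so that (1/3)Tr α equals the one computed component, and the time integrals use a simple rectangle rule in atomic units.

```python
import numpy as np

C_LIGHT = 137.035999  # speed of light in atomic units

def polarisability(t, p, e_field, omega):
    """alpha(w) = int dt e^{iwt} p(t) / int dt e^{iwt} E(t), rectangle rule."""
    dt = t[1] - t[0]
    phase = np.exp(1j * np.outer(omega, t))
    return (phase @ p) * dt / ((phase @ e_field) * dt)

# Synthetic data (assumed, not from a real calculation).
dt = 0.5
t = np.arange(0.0, 2000.0, dt)
e0, omega0, gamma = 1e-3, 0.25, 0.005
p = e0 * np.sin(omega0 * t) * np.exp(-gamma * t)   # damped dipole response
e_field = np.zeros_like(t)
e_field[0] = e0 / dt                               # kick: integrates to e0

omega = np.linspace(0.1, 0.4, 301)
alpha = polarisability(t, p, e_field, omega)
sigma = 4.0 * np.pi * omega / C_LIGHT * alpha.imag

peak = omega[np.argmax(sigma)]
print(peak)
```

The absorption peak sits at the frequency of the dipole oscillation; the damping sets the linewidth, which is why the length of the propagation limits the resolution of the spectrum.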
Linear Scaling TDDFT
•Benzene dipole and spectrum
•Good agreement with experiment
•PAO basis sets need care
•We are working on blips
Linear Scaling TDDFT
•tPA simulation
•TDDFT evolved for 1.5 fs
•Linear scaling seen with system size
•Preliminary results
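EXX in Conquest
•Linear scaling EXX seen already
•We have developed a new approach
•Contract with K to give 3-centre integrals (3CRI)
•Efficient, linear scaling
\[ E_X = -2 \sum_{i,j,k,l} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\phi_i(\mathbf{r}) K_{ij} \phi_j(\mathbf{r}') \phi_k(\mathbf{r}) K_{kl} \phi_l(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} = -2 \sum_{ij} K_{ij} X_{ij} \]
\[ X_{ij} = \sum_{kl} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\phi_i(\mathbf{r}) \phi_j(\mathbf{r}') \phi_k(\mathbf{r}) K_{kl} \phi_l(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} \]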
EXX in Conquest
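•We use density matrix
•3CRIs improve efficiency
•Range on X gives O(N)
•Use ISF to solve the Poisson equation
\[ \Phi_k(\mathbf{r}) = \sum_l K_{kl} \phi_l(\mathbf{r}) \]
\[ \rho_{kj}(\mathbf{r}) = \Phi_k(\mathbf{r}) \phi_j(\mathbf{r}) \]
\[ v_{kj}(\mathbf{r}) = \int d\mathbf{r}'\, \frac{\rho_{kj}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} \]
\[ \Omega_j(\mathbf{r}) = \sum_k v_{kj}(\mathbf{r}) \phi_k(\mathbf{r}) \]
\[ X_{ij} = \int d\mathbf{r}\, \phi_i(\mathbf{r}) \Omega_j(\mathbf{r}) \]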
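The staged contraction (contract K into the support functions, form pair densities, apply the Coulomb kernel, contract again, integrate) can be checked against the direct four-index sum on a toy problem. The sketch below is an illustration only: it uses a 1D grid, Gaussian functions standing in for support functions, a random symmetric K, and a softened kernel 1/sqrt(x² + 1) in place of 1/|r − r'|; no Poisson solver is involved and none of the names come from CONQUEST.

```python
import numpy as np

# Hypothetical 1D model on a uniform grid.
x = np.linspace(-8.0, 8.0, 81)
dx = x[1] - x[0]
centres = np.array([-3.0, -1.0, 1.0, 3.0])
phi = np.exp(-(x[None, :] - centres[:, None]) ** 2)           # phi[i, r]
kernel = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)  # kernel[r, r']

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4))
K = 0.5 * (a + a.T)                                           # model symmetric K

def exchange_matrix(phi, K, kernel, dx):
    """X_ij built in stages, following the five equations on the slide."""
    big_phi = K @ phi                                # Phi_k(r) = sum_l K_kl phi_l(r)
    rho = big_phi[:, None, :] * phi[None, :, :]      # rho_kj(r) = Phi_k(r) phi_j(r)
    v = (rho @ kernel.T) * dx                        # v_kj(r) = int dr' rho_kj(r') kernel(r, r')
    omega = np.einsum("kjr,kr->jr", v, phi)          # Omega_j(r) = sum_k v_kj(r) phi_k(r)
    return (phi @ omega.T) * dx                      # X_ij = int dr phi_i(r) Omega_j(r)

X = exchange_matrix(phi, K, kernel, dx)

# Direct evaluation of the four-index double integral for comparison.
X_direct = np.einsum(
    "ir,js,kr,kl,ls,rs->ij", phi, phi, phi, K, phi, kernel, optimize=True
) * dx ** 2

E_x = -2.0 * np.sum(K * X)
print(np.max(np.abs(X - X_direct)), E_x)
```

Both routes give the same X; the staged form never builds a four-index object, and once X is given a finite range each stage involves only local quantities.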
Basis Sets in Conquest
•Using DZP for O(N) can be hard (inverting S)
•Definition of second zeta is important
•Experimenting with combining PAOs into SFs
Basis Sets in Conquest
•Blips can be grid sensitive
•We are making all integrals analytic
•More efficient, accurate
[Figure: fractional error in energy against grid spacing (0.1 to 0.4 bohr), comparing numerical and analytic KE, and numerical and analytic NL]
Outlook & Conclusions
•Linear scaling DFT allows simulations with 10^4 to 10^6 atoms (and beyond!)
•Constrained DFT allows simulations with charge transfer and localised charges
•TDDFT will soon be available for excitations
•EXX will soon be available for hybrids
•Basis sets are making excellent progress
•We are applying these to problems in biomolecules, nanostructures, dye-sensitised solar cells
Acknowledgements
•Tsuyoshi Miyazaki (Conquest co-leader)
•Lionel Truflandier (EXX)
•Alex Sena (cDFT)
•Conn O’Rourke (TDDFT)
•Conquest team:
  •Lianheng Tong
  •Michiaki Arita
  •Veronika Brázdová
  •Umberto Terranova
  •Ayako Nakata
Coming soon...
V. Brázdová, D. R. Bowler
Atomistic Computer Simulations: A Practical Guide
Physics Textbook
www.wiley-vch.de
Available at APS March Meeting 2013
References
•General:
  •Rep. Prog. Phys. 75, 036503 (2012)
  •J. Phys.: Condens. Matter 14, 2781–2798 (2002)
  •phys. stat. sol. b 243, 989-1000 (2006)
•Support functions:
  •Blips: Phys. Rev. B 55, 13485 (1997)
  •PAOs: J. Phys.: Condens. Matter 20, 294206 (2008)
•Forces: J. Chem. Phys. 121, 6186-6194 (2004)
•cDFT: J. Chem. Theory Comput. 7, 884 (2011)
http://www.conquest.ucl.ac.uk/
http://www.linear-scaling.org/
http://www.order-n.org/