Fast Algorithms for Electronic Structure Analysis
Lin Lin
Computational Research Division, Lawrence Berkeley National Laboratory
Laboratoire Jacques-Louis Lions,
Paris 6, June 2013
Supported by the Luis Alvarez fellowship at LBNL, DOE SciDAC and BES Partnership.
Acknowledgment
Collaborators on past and ongoing projects on this topic:
• Roberto Car, Princeton University
• Mohan Chen, Princeton University
• Weinan E, Princeton University and Peking University
• Alberto Garcia, Institute de Ciencia de Materiales de Barcelona
• Lixin He, University of Science and Technology of China
• Georg Huhs, Barcelona Supercomputing Center
• Mathias Jacquelin, Lawrence Berkeley National Laboratory
• Juan Meza, UC Merced
• Jianfeng Lu, Duke University
• Chao Yang, Lawrence Berkeley National Laboratory
• Lexing Ying, Stanford University
Electronic structure theory
Main goal: Given fixed atomic positions $\{R_I\}_{I=1}^{M}$, compute the ground state electron energy $E_e(\{R_I\})$. Useful in a large number of applications.
Ground state electron wavefunction $\Psi_e(r_1, \cdots, r_N; \{R_I\})$:
$$\left( -\frac{1}{2} \sum_{i=1}^{N} \Delta_{r_i} - \sum_{i=1}^{N} \sum_{I=1}^{M} \frac{Z_I}{|r_i - R_I|} + \frac{1}{2} \sum_{i,j=1,\, i \neq j}^{N} \frac{1}{|r_i - r_j|} \right) \Psi_e = E_e(\{R_I\}) \, \Psi_e$$
Curse of dimensionality
The fundamental laws necessary to the mathematical treatment of large parts of physics and the whole of chemistry are thus fully known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.
- P. Dirac, 1929
Pople diagram
John Pople, Nobel Prize in Chemistry, 1998
[Figure: Pople diagram. Accuracy versus number of atoms (axis ticks 10, 100, 1000, 10000) for CI, CCSD(T), RPA, MP2, DFT, and TB.]
Density functional theory (DFT): best compromise between efficiency and accuracy. Most widely used electronic structure theory for condensed matter systems.
Kohn-Sham density functional theory
• Efficient: Single particle theory
• Accurate: Exact ground state energy for exact $V_{xc}[\rho]$, [Hohenberg-Kohn, 1964], [Kohn-Sham, 1965]
$$H[\rho] \, \psi_i(x) = \left( -\frac{1}{2}\Delta + V_{ext} + \int dx' \, \frac{\rho(x')}{|x - x'|} + V_{xc}[\rho] \right) \psi_i(x) = \varepsilon_i \psi_i(x)$$
$$\rho(x) = 2 \sum_{i=1}^{N/2} |\psi_i(x)|^2, \qquad \int dx \, \psi_i^*(x) \psi_j(x) = \delta_{ij}$$
Walter Kohn, Nobel Prize in Chemistry, 1998
Self Consistent Field Iteration
[Diagram: SCF loop. Discretization builds $H[\rho_{in}]$ from $\rho_{in}$; Evaluation computes $\rho_{out}$; Iteration feeds the result back into the next $\rho_{in}$.]
Evaluation step:
1) Very costly step.
2) Limits practical calculations to hundreds of atoms.
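The loop on this slide can be sketched on a toy model. The following is a minimal illustration only, not the method of any production code: the 1D grid, the harmonic external potential, the local interaction term `g * rho` standing in for the Hartree and exchange-correlation terms, and the linear mixing parameter `mix` are all assumptions made for the example. The evaluation step here uses dense diagonalization, which is exactly the costly step the slide points at.

```python
import numpy as np

def scf_toy(n_grid=21, n_electrons=2, g=0.1, mix=0.5, tol=1e-8, max_iter=200):
    """Toy SCF loop on a 1D grid (hypothetical model, not a real KSDFT code)."""
    x = np.arange(n_grid) - (n_grid - 1) / 2.0
    # Discretization: -1/2 Laplacian by second-order finite differences (h = 1).
    kinetic = np.eye(n_grid) - 0.5 * (np.eye(n_grid, k=1) + np.eye(n_grid, k=-1))
    v_ext = 0.5 * x**2                      # assumed harmonic external potential
    rho_in = np.zeros(n_grid)
    for iteration in range(1, max_iter + 1):
        # Build H[rho_in]; g * rho_in is a stand-in for the density-dependent terms.
        h = kinetic + np.diag(v_ext + g * rho_in)
        # Evaluation: diagonalize and occupy the lowest N/2 orbitals twice.
        _, psi = np.linalg.eigh(h)
        occupied = psi[:, : n_electrons // 2]
        rho_out = 2.0 * np.sum(occupied**2, axis=1)
        residual = np.linalg.norm(rho_out - rho_in)
        if residual < tol:
            return rho_out, iteration, residual
        # Iteration: simple linear mixing of input and output densities.
        rho_in = (1.0 - mix) * rho_in + mix * rho_out
    return rho_out, max_iter, residual

rho, n_iter, residual = scf_toy()
print(n_iter, residual, rho.sum())
```

The density always integrates to the electron count because the orbitals returned by `np.linalg.eigh` are orthonormal; only the self-consistency residual depends on the mixing.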
Cubic scaling of KSDFT
• KS orbitals are delocalized in the global domain.
• N atoms. $\mathcal{O}(N)$ grid points. $\mathcal{O}(N)$ KS orbitals.
• Orthogonalization of an $\mathcal{O}(N) \times \mathcal{O}(N)$ matrix → $\mathcal{O}(N^3)$ scaling, regardless of what eigensolver is being used. Cannot efficiently use high performance supercomputers.
• Conclusion: DO NOT directly treat KS orbitals that are delocalized in the global domain.
Evaluation: Alternatives?
• Linear scaling algorithms
  • Near-sightedness [Kohn, 1996]
  • Truncation based algorithm: low to intermediate accuracy
  • Only applicable to insulators.
[Bowler and Miyazaki, Rep. Prog. Phys 2012] "…The second challenge is that of metallic systems: there is no clear route to linear-scaling solution for systems with low or zero gaps and extended electronic structure…"
• Difficult task:
  • Accurate and efficient
  • Uniformly applicable to metals as well as insulators.
[Figure: near-sightedness illustration, with labels for a perturbation at $r'$, the response at $r$, and the distance $|r' - r|$.]
Alternative solution?
Linear scaling methods
• Truncation (KS orbital, 1-dm). Near-sightedness.
• Very costly for metals (large preconstant)
• Complicated user interface (select truncation region)
[Yang, 1991], [Kohn, 1996]. Review: [Goedecker, 1999]. [Bowler-Miyazaki, 2012].
What we propose
• No truncation. Not based on near-sightedness.
• Applicable to insulators and metals.
• Black-box user interface.
• Scales better than $\mathcal{O}(N^3)$.
Outline
PEXSI: Pole EXpansion Selected Inversion
• Pole Expansion
• Selected Inversion
• How it works in practice
PEXSI at work
C-BN-C layered system, weak scaling for more than 10,000 atoms. All examples use 40*256=10240 procs on Hopper.

Number of atoms   Equivalent cells   Matrix dimension   Time per iteration   Scaling
2532              1 × 1              32916              32                   1
10128             2 × 2              131664             258                  8.06
20256             4 × 2              263328             554                  17.3

$\mathcal{O}(N^{1.5})$ scaling from 2532 to 10128 atoms; $\mathcal{O}(N)$ scaling from 10128 to 20256 atoms.
ScaLAPACK performance: 230 sec for 2532 atoms using 768 processors, and it does not scale beyond that.
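The two scaling annotations can be checked directly from the table. The sketch below fits a local exponent $p$ in $t \propto N^p$ between consecutive rows; the helper name `local_exponent` is introduced only for this illustration, and the numbers are the atom counts and times per iteration quoted above.

```python
import math

# Atom counts and times per iteration taken from the table above.
atoms = [2532, 10128, 20256]
time_per_iteration = [32.0, 258.0, 554.0]

def local_exponent(n0, t0, n1, t1):
    """Exponent p such that t1 / t0 = (n1 / n0) ** p."""
    return math.log(t1 / t0) / math.log(n1 / n0)

exponents = [
    local_exponent(atoms[k], time_per_iteration[k], atoms[k + 1], time_per_iteration[k + 1])
    for k in range(len(atoms) - 1)
]
print(exponents)
```

The first exponent comes out close to 1.5 and the second close to 1.1, in line with the two annotations on the slide.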
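KSDFT: Matrix point of view
$$\rho(x) = 2 \sum_{i=1}^{N/2} |\psi_i(x)|^2 = \begin{pmatrix} \psi_1(x) & \cdots & \psi_{N_t}(x) \end{pmatrix} \begin{pmatrix} f(\varepsilon_1 - \mu) & & \\ & \ddots & \\ & & f(\varepsilon_{N_t} - \mu) \end{pmatrix} \begin{pmatrix} \psi_1(x) \\ \vdots \\ \psi_{N_t}(x) \end{pmatrix} = f(H[\rho] - \mu I)(x, x)$$
• $\mu$: Chemical potential such that $\#\{\varepsilon(H) \le \mu\} = N/2$
• $f$: Heaviside function satisfying $f(x) = \begin{cases} 2, & x \le 0, \\ 0, & x > 0 \end{cases}$
$$\rho = \mathrm{diag} \, f(H[\rho] - \mu I)$$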
Finite temperature: Fermi operator
$$\rho = \mathrm{diag} \, \frac{2}{1 + e^{\beta (H[\rho] - \mu I)}}$$
• $\beta = 1/k_B T$: inverse temperature
• $\mu$: Chemical potential
• Finite temperature: Fermi-Dirac
• Zero temperature: Heaviside
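As a small numerical illustration of the two occupation functions, the sketch below evaluates $\rho = \mathrm{diag}\, f(H - \mu I)$ by dense diagonalization for both the Fermi-Dirac and the Heaviside choice of $f$. The 8-site tight-binding chain used for `h` and the values of `mu` and `beta` are assumptions chosen for the example; the point is only that the finite-temperature density approaches the zero-temperature one as $\beta$ grows when $\mu$ lies in a gap.

```python
import numpy as np

def fermi_density(h, mu, beta):
    """rho = diag( 2 / (1 + exp(beta (H - mu I))) ) via eigendecomposition."""
    eps, psi = np.linalg.eigh(h)
    # Overflow-safe form of the occupation: 2 / (1 + e^t) = 1 - tanh(t / 2).
    occupation = 1.0 - np.tanh(0.5 * beta * (eps - mu))
    return np.einsum("xi,i,xi->x", psi, occupation, psi)

def heaviside_density(h, mu):
    """Zero-temperature limit: occupation 2 below mu, 0 above."""
    eps, psi = np.linalg.eigh(h)
    occupation = np.where(eps <= mu, 2.0, 0.0)
    return np.einsum("xi,i,xi->x", psi, occupation, psi)

n = 8
h = -np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical 1D tight-binding chain
mu = 0.0                                # lies in the gap of this chain for even n

rho_cold = fermi_density(h, mu, beta=200.0)
rho_zero = heaviside_density(h, mu)
print(np.max(np.abs(rho_cold - rho_zero)), rho_zero.sum())
```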
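Fermi operator expansion
• $\Delta E$: spectral width of $H - \mu I$.
• Fermi operator expansion: solving KSDFT without diagonalization
  • [Goedecker, 1993], $P \sim \mathcal{O}(\beta \Delta E)$
  • [Head-Gordon et al, 2004], $P \sim \mathcal{O}(\beta \Delta E)$ but with $\mathcal{O}(\sqrt{\beta \Delta E})$ operations
  • [Ceriotti et al, 2008], $Q \sim \mathcal{O}(\beta \Delta E)$; other work
$$\rho = \mathrm{diag} \, \frac{2}{1 + e^{\beta (H[\rho] - \mu I)}} = \mathrm{diag} \, \frac{2}{1 + e^{\beta \Delta E \, \frac{H[\rho] - \mu I}{\Delta E}}} \approx \mathrm{diag} \left[ \sum_{i=1}^{P} a_i \left( \frac{H[\rho] - \mu I}{\Delta E} \right)^{i} + \sum_{j=1}^{Q} \frac{c_j}{\left( z_j I - \frac{H[\rho] - \mu I}{\Delta E} \right)^{q_j}} \right]$$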
Pole expansion
• [LL, Lu, Ying and E, 2009], $P \sim \mathcal{O}(\log \beta \Delta E)$
$$\rho \approx \mathrm{diag} \sum_{l=1}^{P} \frac{\omega_l}{H - z_l I}$$
• $z_l, \omega_l \in \mathbb{C}$ are complex shifts and complex weights
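To make the structure "sum of shifted inverses" concrete, here is a sketch that uses the classical Matsubara poles $z_l = \mu + i(2l-1)\pi/\beta$ instead of the optimized poles of [LL, Lu, Ying and E, 2009]. This is a swapped-in, much more slowly converging expansion, chosen only because its shifts and weights have a closed form: $\frac{2}{1+e^{\beta x}} = 1 - \frac{4}{\beta} \, \mathrm{Re} \sum_{l \ge 1} \frac{1}{x - i(2l-1)\pi/\beta}$. The test matrix and the values of `mu`, `beta`, and `n_poles` are assumptions.

```python
import numpy as np

def matsubara_density(h, mu, beta, n_poles):
    """diag of 2 f(H - mu I) from truncated Matsubara poles (illustration only)."""
    n = h.shape[0]
    omega = (2.0 * np.arange(1, n_poles + 1) - 1.0) * np.pi / beta
    shifts = mu + 1j * omega                       # complex shifts z_l
    # Stacked resolvents (H - z_l I)^{-1}, shape (n_poles, n, n).
    resolvents = np.linalg.inv(h[None, :, :] - shifts[:, None, None] * np.eye(n))
    diagonal_sum = np.einsum("lii->i", resolvents)  # sum over poles of each diagonal
    return 1.0 - (4.0 / beta) * diagonal_sum.real

n = 8
h = -np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical 1D tight-binding chain
mu, beta = 0.3, 2.0

approx = matsubara_density(h, mu, beta, n_poles=20000)

# Reference value by diagonalization, using 2 / (1 + e^t) = 1 - tanh(t / 2).
eps, psi = np.linalg.eigh(h)
exact = np.einsum("xi,i,xi->x", psi, 1.0 - np.tanh(0.5 * beta * (eps - mu)), psi)
error = np.max(np.abs(approx - exact))
print(error)
```

Only the diagonals of the inverses enter the density, which is what motivates selected inversion. The large number of Matsubara poles needed here, even for a small $\beta \Delta E$, is the cost that a logarithmically scaling pole count avoids.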
Contour integral technique
Fermi-Dirac:
$$f(\varepsilon) = \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(z)}{z - \varepsilon} \, dz \approx \frac{1}{2\pi i} \sum_{l=1}^{P} \frac{f(z_l) \, w_l}{z_l - \varepsilon}$$
Simpler problem:
[Hale, Higham and Trefethen, 2008] $f(\varepsilon) - f_P(\varepsilon) \sim \mathcal{O}(e^{-CP / \log(M/m)})$
Contour selection
• [Hale, Higham, Trefethen 2008] $\frac{K'}{2K} \sim \frac{1}{\log(M/m)}$
• Trapezoid rule for periodic functions gives geometric convergence
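The geometric convergence of the trapezoid rule can be seen on a deliberately simple case. The sketch below applies it to the Cauchy integral of an entire function ($f = \exp$) on a circle enclosing the spectrum, rather than to the Fermi-Dirac function with the conformally mapped contour of [Hale, Higham, Trefethen 2008]. The circle, the test matrix, and the function name `contour_matrix_function` are assumptions made for the illustration.

```python
import numpy as np

def contour_matrix_function(f, a, center, radius, n_nodes):
    """f(A) ~ (1/P) sum_l f(z_l) (z_l - c) (z_l I - A)^{-1}: trapezoid rule on a circle."""
    n = a.shape[0]
    theta = 2.0 * np.pi * np.arange(n_nodes) / n_nodes
    offsets = radius * np.exp(1j * theta)          # z_l - c, also dz / (i dtheta)
    nodes = center + offsets
    result = np.zeros((n, n), dtype=complex)
    for z_l, w_l in zip(nodes, offsets):
        result += f(z_l) * w_l * np.linalg.inv(z_l * np.eye(n) - a)
    return result / n_nodes

n = 6
a = -np.eye(n, k=1) - np.eye(n, k=-1)   # eigenvalues lie in (-2, 2)

eps, vecs = np.linalg.eigh(a)
exact = (vecs * np.exp(eps)) @ vecs.T   # reference exp(A)

errors = {
    p: np.max(np.abs(contour_matrix_function(np.exp, a, 0.0, 4.0, p) - exact))
    for p in (16, 32, 64)
}
print(errors)
```

Each quadrature node contributes one shifted inverse, so the quadrature is itself a pole expansion with shifts `nodes` and weights proportional to `f(z_l) * w_l`.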
Selected inversion
$$\rho \approx \mathrm{diag} \sum_{l=1}^{P} \frac{\omega_l}{H - z_l I}$$
• All the diagonal elements of an inverse matrix.
• $H$ is a sparse matrix, but $(H - z_l I)^{-1}$ is a full matrix.
• Naïve approach: $\mathcal{O}(N^3)$.
• Need selected inversion.
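Selected inversion: basic idea
• $LDL^T$ factorization
$$A = \begin{pmatrix} A_{11} & A_{21}^T \\ A_{21} & \hat{A}_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 \\ 0 & S_{22} \end{pmatrix} \begin{pmatrix} 1 & L_{21}^T \\ 0 & I \end{pmatrix}$$
$$L_{21} = A_{21} A_{11}^{-1}, \qquad S_{22} = \hat{A}_{22} - A_{21} L_{21}^T$$
• Inversion
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & S_{22}^{-1} \end{pmatrix}$$
Observation: If $L_{21}$ is sparse, $L_{21}^T S_{22}^{-1} L_{21}$ only requires the rows and columns of $S_{22}^{-1}$ corresponding to the sparsity pattern of $L_{21}$.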
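The block inversion formula is easy to verify numerically. The sketch below does so for a small dense symmetric positive definite matrix with a scalar leading block $A_{11}$; the random test matrix and the variable names are assumptions for the illustration, and no sparsity is exploited here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
b = rng.standard_normal((n, n))
a = b @ b.T + n * np.eye(n)            # symmetric positive definite test matrix

a11 = a[0, 0]                           # scalar leading block A_11
a21 = a[1:, 0]                          # first column below the diagonal, A_21
a22_hat = a[1:, 1:]                     # trailing block

l21 = a21 / a11                         # L_21 = A_21 A_11^{-1}
s22 = a22_hat - np.outer(a21, l21)      # Schur complement S_22
s22_inv = np.linalg.inv(s22)

# Assemble A^{-1} block by block from the formula on the slide.
inv_blocks = np.empty_like(a)
inv_blocks[0, 0] = 1.0 / a11 + l21 @ s22_inv @ l21
inv_blocks[0, 1:] = -l21 @ s22_inv
inv_blocks[1:, 0] = -s22_inv @ l21
inv_blocks[1:, 1:] = s22_inv

print(np.max(np.abs(inv_blocks - np.linalg.inv(a))))
```

In the (0, 0) entry, `l21 @ s22_inv @ l21` only touches the entries of `s22_inv` whose row and column indices are nonzeros of `l21`, which is the observation that selected inversion builds on.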
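Recursive relation
$$S_{22} = \begin{pmatrix} A_{22} & A_{32}^T \\ A_{32} & \hat{A}_{33} \end{pmatrix}$$
$$A = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & L_{32} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 & 0 \\ 0 & A_{22} & 0 \\ 0 & 0 & S_{33} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & L_{32}^T \\ 0 & 0 & I \end{pmatrix} \begin{pmatrix} 1 & L_{21}^T \\ 0 & I \end{pmatrix}$$
with $L_{32} = A_{32} A_{22}^{-1}$ and $S_{33} = \hat{A}_{33} - A_{32} L_{32}^T$, by analogy with $L_{21}$ and $S_{22}$.
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & S_{22}^{-1} \end{pmatrix}, \qquad S_{22}^{-1} = \begin{pmatrix} A_{22}^{-1} + L_{32}^T S_{33}^{-1} L_{32} & -L_{32}^T S_{33}^{-1} \\ -S_{33}^{-1} L_{32} & S_{33}^{-1} \end{pmatrix}$$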
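Recursive relation
• $\mathcal{C} = \{ i \mid L_{21}(i, 1) \neq 0 \}$, $2 \in \mathcal{C}$
• $L_{21}(i, 1) \neq 0 \Rightarrow S_{22}(i, j) \neq 0$ for $i, j \in \mathcal{C}$, because $S_{22} = \hat{A}_{22} - A_{21} L_{21}^T$ $\Rightarrow L_{32}(i, 2) \neq 0$, $i \in \mathcal{C}$
• $$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & S_{22}^{-1} \end{pmatrix}, \qquad S_{22}^{-1} = \begin{pmatrix} A_{22}^{-1} + L_{32}^T S_{33}^{-1} L_{32} & -L_{32}^T S_{33}^{-1} \\ -S_{33}^{-1} L_{32} & S_{33}^{-1} \end{pmatrix}$$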
Selected inversion
• $A = LDL^T$: $A^{-1}$ restricted to the non-zero pattern of $L$ is "self-contained". Exact method with exact arithmetic.
• For a KS Hamiltonian discretized by a local basis set, the cost of selected inversion is $\mathcal{O}(N)$ for 1D systems, $\mathcal{O}(N^{1.5})$ for 2D systems, and $\mathcal{O}(N^2)$ for 3D systems.
• Combined with pole expansion: At most $\mathcal{O}(N^2)$ scaling for solving the Kohn-Sham problem.
• Idea of selected inversion dates back to [Erisman and Tinney, 1975], [Takahashi et al 1973]; For electronic structure [LL-Lu-Ying-Car-E, 2009]; For quantum transport [Li, Darve et al, 2008]
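The 1D case can be written out in a few lines. For a symmetric tridiagonal matrix, $L$ is bidiagonal, and applying the block inversion formula recursively gives the diagonal of $A^{-1}$ through the backward recursion $(A^{-1})_{ii} = 1/d_i + l_i^2 \, (A^{-1})_{i+1,i+1}$, where $d_i$ and $l_i$ are the entries of $D$ and the subdiagonal of $L$. The sketch below is a minimal illustration under those assumptions (real symmetric tridiagonal input, no pivoting); it is not the SelInv or PEXSI implementation, and the function name is hypothetical.

```python
import numpy as np

def tridiag_selected_inverse_diag(diag, off):
    """Diagonal of A^{-1} for symmetric tridiagonal A in O(N) operations."""
    n = len(diag)
    d = np.empty(n)                     # entries of D in A = L D L^T
    l = np.empty(n - 1)                 # subdiagonal of the unit lower bidiagonal L
    d[0] = diag[0]
    for i in range(n - 1):              # forward pass: LDL^T factorization
        l[i] = off[i] / d[i]
        d[i + 1] = diag[i + 1] - l[i] * off[i]
    inv_diag = np.empty(n)
    inv_diag[-1] = 1.0 / d[-1]
    for i in range(n - 2, -1, -1):      # backward pass: selected inversion
        inv_diag[i] = 1.0 / d[i] + l[i] ** 2 * inv_diag[i + 1]
    return inv_diag

n = 50
diag = np.full(n, 2.5)                  # diagonally dominant test matrix
off = np.full(n - 1, -1.0)
a = np.diag(diag) + np.diag(off, k=1) + np.diag(off, k=-1)

selected = tridiag_selected_inverse_diag(diag, off)
reference = np.diag(np.linalg.inv(a))   # dense O(N^3) reference
print(np.max(np.abs(selected - reference)))
```

Both passes touch each index once, which is the $\mathcal{O}(N)$ cost quoted above for 1D systems; the full inverse is never formed.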
SelInv: Numerical results
SelInv: a selected inversion package for general sparse symmetric matrices, written in FORTRAN. [LL-Yang-Meza-Lu-Ying-E, TOMS, 2011]
Force
$$F_I = -\mathrm{Tr}\left[ \gamma \frac{\partial H}{\partial R_I} \right] + \mathrm{Tr}\left[ \gamma^E \frac{\partial S}{\partial R_I} \right]$$
• Including both the Hellmann-Feynman force and the Pulay force
• Energy density matrix
$$\gamma^E = C f^E(\Xi - \mu) C^T, \qquad f^E(x - \mu) = x f(x - \mu)$$
• Pole expansion with the same shifts but different weights
• The same selected elements of $(H - z_l I)^{-1}$
• Similar treatment for other physical quantities
[LL-Chen-Yang-He, JPCM, 2013, in press]
Efficiency of the selected inversion
[Figure: carbon nanotube (metallic).]
SZ: single-zeta (4 basis functions per atom)
DZP: double-zeta with polarization (13 basis functions per atom)
All on a single core, 80 poles (not parallelized) and 2 iterations for chemical potential.
PEXSI in parallel
• Distributed memory parallel selected inversion for general matrices (factorization is based on SuperLU_DIST); preliminary version scalable to 64 ~ 256 procs. A more efficient version is in progress (ongoing work with Mathias Jacquelin and Chao Yang)
• Pole expansion parallelized. With 40 poles used in practice, PEXSI can scale to 256*40~10,000 procs.
• C++ implementation. Nearly black-box interface, being integrated into SIESTA (ongoing work with Alberto Garcia, Georg Huhs and Chao Yang)
Conclusion
• Pole Expansion and Selected Inversion (PEXSI) method for KSDFT at large scale.
• Based on the sparsity of the Hamiltonian and overlap matrices. Requires a local basis set with a small number of basis functions per atom (such as NAO and GTO; not applicable to PW).
• Accurate calculation of density, total energy, free energy and force (no truncation) for insulating and metallic systems.
• $\mathcal{O}(N)$ for quasi-1D systems, $\mathcal{O}(N^{1.5})$ for quasi-2D systems, and $\mathcal{O}(N^2)$ for 3D bulk systems.
• Black-box: suitable for all codes using localized basis sets such as atomic orbitals.
Thank you for your attention!