Fast Algorithms for Electronic Structure Analysis

Lin Lin
Computational Research Division, Lawrence Berkeley National Laboratory

Laboratoire Jacques-Louis Lions, Paris 6, June 2013

Supported by Luis Alvarez fellowship in LBNL, DOE SciDAC and BES Partnership.



Acknowledgment

Collaborators of past and ongoing projects on this topic:
• Roberto Car, Princeton University
• Mohan Chen, Princeton University
• Weinan E, Princeton University and Peking University
• Alberto Garcia, Institute de Ciencia de Materiales de Barcelona
• Lixin He, University of Science and Technology of China
• Georg Huhs, Barcelona Supercomputing Center
• Mathias Jacquelin, Lawrence Berkeley National Laboratory
• Juan Meza, UC Merced
• Jianfeng Lu, Duke University
• Chao Yang, Lawrence Berkeley National Laboratory
• Lexing Ying, Stanford University

Electronic structure theory

Main goal: Given fixed atomic positions $\{R_\alpha\}_{\alpha=1}^{M}$, compute the ground state electron energy $E_e(\{R_\alpha\})$. Useful in a large number of applications.

Ground state electron wavefunction $\Psi_e(r_1, \cdots, r_N; \{R_\alpha\})$:

$$\left( -\frac{1}{2}\sum_{i=1}^{N} \Delta_i - \sum_{i=1}^{N}\sum_{\alpha=1}^{M} \frac{Z_\alpha}{|r_i - R_\alpha|} + \frac{1}{2}\sum_{i,j=1,\, i\neq j}^{N} \frac{1}{|r_i - r_j|} \right) \Psi_e = E_e(\{R_\alpha\})\, \Psi_e$$

Curse of dimensionality

The fundamental laws necessary to the mathematical treatment of large parts of physics and the whole of chemistry are thus fully known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.

(P. Dirac, 1929)

Pople diagram

John Pople, Nobel Prize in Chemistry, 1998

[Figure: Pople diagram. Vertical axis: accuracy. Horizontal axis: number of atoms (10, 100, 1000, 10000). Methods shown: CI, CCSD(T), RPA, MP2, DFT, TB.]

Density functional theory (DFT): best compromise between efficiency and accuracy. Most widely used electronic structure theory for condensed matter systems.

Density functional theory [S. Redner, Citation Statistics from 110 Years of Physical Review]


Kohn-Sham density functional theory

• Efficient: Single particle theory
• Accurate: Exact ground state energy for exact $V_{xc}[\rho]$, [Hohenberg-Kohn, 1964], [Kohn-Sham, 1965]

$$H[\rho]\,\psi_i(x) = \left( -\frac{1}{2}\Delta + V_{ext} + \int dx' \frac{\rho(x')}{|x - x'|} + V_{xc}[\rho] \right)\psi_i(x) = \varepsilon_i\, \psi_i(x)$$

$$\rho(x) = 2\sum_{i=1}^{N/2} |\psi_i(x)|^2, \qquad \int dx\, \psi_i^*(x)\, \psi_j(x) = \delta_{ij}$$

Walter Kohn, Nobel Prize in Chemistry, 1998

Self Consistent Field Iteration

[Diagram: SCF loop with labels $\rho_{in}$, $H[\rho_{in}]$, $\rho_{out}$; steps: Discretization, Evaluation, Iteration.]

Evaluation step:
1) Very costly step.
2) Limits practical calculations to hundreds of atoms.
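The self-consistent loop can be sketched on a toy model. This is only an illustration under stated assumptions: the 1D hopping Hamiltonian, the local potential `u * rho`, the linear mixing parameter `mix`, and all function names are invented for the example and do not come from the talk. The evaluation step here is a dense diagonalization, which is the costly step the slide refers to.

```python
import numpy as np

def build_hamiltonian(rho, v_ext, t=1.0, u=0.5):
    """Toy H[rho]: nearest-neighbour hopping plus a local potential v_ext + u * rho."""
    n = len(rho)
    h = np.zeros((n, n))
    idx = np.arange(n - 1)
    h[idx, idx + 1] = -t
    h[idx + 1, idx] = -t
    h[np.diag_indices(n)] = v_ext + u * rho
    return h

def evaluate_density(h, n_electrons):
    """Evaluation step: rho = 2 * sum over the N/2 lowest orbitals of |psi_i|^2."""
    _, psi = np.linalg.eigh(h)            # eigenvalues ascending, orbitals in columns
    occupied = psi[:, : n_electrons // 2]
    return 2.0 * np.sum(occupied**2, axis=1)

def scf(v_ext, n_electrons, mix=0.3, tol=1e-10, max_iter=500):
    """Iterate rho_in -> H[rho_in] -> rho_out until rho_out matches rho_in."""
    n = len(v_ext)
    rho_in = np.full(n, n_electrons / n)  # uniform initial guess
    residual = np.inf
    for iteration in range(1, max_iter + 1):
        rho_out = evaluate_density(build_hamiltonian(rho_in, v_ext), n_electrons)
        residual = np.linalg.norm(rho_out - rho_in)
        if residual < tol:
            return rho_in, iteration, residual
        rho_in = rho_in + mix * (rho_out - rho_in)   # simple linear mixing
    return rho_in, max_iter, residual

v_ext = 0.1 * np.cos(np.arange(8))
rho, n_iter, residual = scf(v_ext, n_electrons=4)
```

Every pass through the loop calls `np.linalg.eigh`, so the cost per SCF step is cubic in the matrix size.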

Cubic scaling of KSDFT

• KS orbitals are delocalized in the global domain.
• $N$ atoms. $O(N)$ grid points. $O(N)$ KS orbitals.
• Orthogonalization of an $O(N) \times O(N)$ matrix ⇒ $O(N^3)$ scaling, regardless of what eigensolver is being used. Cannot efficiently use high performance supercomputers.
• Conclusion: DO NOT directly treat KS orbitals that are delocalized in the global domain.

Evaluation: Alternatives?

• Linear scaling algorithms
  • Near-sightedness [Kohn, 1996]
  • Truncation based algorithm: low to intermediate accuracy
  • Only applicable to insulators.

[Bowler and Miyazaki, Rep. Prog. Phys. 2012] "…The second challenge is that of metallic systems: there is no clear route to linear-scaling solution for systems with low or zero gaps and extended electronic structure…"

• Difficult task:
  • Accurate and efficient
  • Uniformly applicable to metals as well as insulators.

[Figure: near-sightedness illustration with labels $\Delta V(r')$, $\Delta\rho(r)$, $|r' - r|$.]

Alternative solution?

Linear scaling methods
• Truncation (KS orbital, 1-dm). Near-sightedness.
• Very costly for metals (large preconstant)
• Complicated user-interface (select truncation region)

[Yang, 1991], [Kohn, 1996]. Review: [Goedecker, 1999]. [Bowler-Miyazaki, 2012].

What we propose
• No truncation. Not based on near-sightedness.
• Applicable to insulator and metal.
• Black-box user-interface.
• Scales better than $O(N^3)$.

Outline

PEXSI: Pole EXpansion Selected Inversion

• Pole Expansion
• Selected Inversion
• How it works in practice

PEXSI at work

C-BN-C layered system, weak scaling for more than 10,000 atoms. All examples use 40*256=10240 procs on Hopper.

Number of atoms   Equivalent cells   Matrix dimension   Time per iteration   Scaling
2532              1 × 1              32916              32                   1
10128             2 × 2              131664             258                  8.06
20256             4 × 2              263328             554                  17.3

$O(N^{1.5})$ scaling from 2532 to 10128 atoms; $O(N)$ scaling from 10128 to 20256 atoms.

ScaLAPACK performance: 230 sec for 2532 atoms using 768 processors and does not scale beyond.
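The scaling labels can be checked directly from the table. The short sketch below estimates the exponent $p$ in $t \propto N^p$ between consecutive rows; the variable and function names are illustrative, and the time unit cancels in the ratio.

```python
import math

atoms = [2532, 10128, 20256]
time_per_iteration = [32.0, 258.0, 554.0]   # values copied from the table

def scaling_exponent(n0, t0, n1, t1):
    """Exponent p such that t1 / t0 = (n1 / n0) ** p."""
    return math.log(t1 / t0) / math.log(n1 / n0)

exponents = [
    scaling_exponent(atoms[k], time_per_iteration[k],
                     atoms[k + 1], time_per_iteration[k + 1])
    for k in range(len(atoms) - 1)
]
```

The first step gives an exponent close to 1.5 and the second an exponent slightly above 1.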

KSDFT: Matrix point of view

$$\rho(x) = 2\sum_{i=1}^{N/2} |\psi_i(x)|^2 = \begin{pmatrix} \psi_1(x) & \cdots & \psi_{N_t}(x) \end{pmatrix} \begin{pmatrix} \chi(\varepsilon_1 - \mu) & & \\ & \ddots & \\ & & \chi(\varepsilon_{N_t} - \mu) \end{pmatrix} \begin{pmatrix} \psi_1(x) \\ \vdots \\ \psi_{N_t}(x) \end{pmatrix} = \chi(H[\rho] - \mu I)(x, x)$$

• $\mu$: Chemical potential such that $\#\{\sigma(H) \le \mu\} = N/2$
• $\chi$: Heaviside function satisfying $\chi(x) = 2$ for $x \le 0$ and $\chi(x) = 0$ for $x > 0$

$$\rho = \mathrm{diag}\, \chi(H[\rho] - \mu I)$$
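A small numerical check of the matrix point of view: the density built from the occupied orbitals equals the diagonal of the matrix function $\chi(H - \mu I)$. The random symmetric matrix standing in for $H$ and all names are assumptions for illustration only.

```python
import numpy as np

def heaviside_occupation(x):
    """chi(x) = 2 for x <= 0 and 0 for x > 0 (the factor 2 accounts for spin)."""
    return np.where(x <= 0.0, 2.0, 0.0)

rng = np.random.default_rng(0)
n, n_electrons = 10, 6
a = rng.standard_normal((n, n))
h = 0.5 * (a + a.T)                      # stand-in for a discretized Hamiltonian

eps, psi = np.linalg.eigh(h)
# Place mu between the highest occupied and the lowest unoccupied eigenvalue.
mu = 0.5 * (eps[n_electrons // 2 - 1] + eps[n_electrons // 2])

# Orbital formula: rho = 2 * sum_{i <= N/2} |psi_i|^2
rho_orbitals = 2.0 * np.sum(psi[:, : n_electrons // 2] ** 2, axis=1)

# Matrix formula: rho = diag chi(H - mu I), evaluated through the eigendecomposition
chi_of_h = psi @ np.diag(heaviside_occupation(eps - mu)) @ psi.T
rho_matrix = np.diag(chi_of_h)
```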

Finite temperature: Fermi operator

$$\rho = \mathrm{diag}\, \frac{2}{1 + e^{\beta(H[\rho] - \mu I)}}$$

• $\beta = 1/k_B T$: inverse temperature
• $\mu$: Chemical potential

• Finite temperature, Fermi-Dirac
• Zero temperature, Heaviside
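At finite temperature the chemical potential has to be adjusted so that the density integrates to the electron count. The sketch below does this by bisection on the eigenvalues of a toy tight-binding chain. The bisection strategy, the bracket of ten thermal widths, and the function names are assumptions for illustration; the talk does not say how $\mu$ is updated.

```python
import numpy as np

def fermi_dirac(x, beta):
    """Spin-degenerate occupation 2 / (1 + exp(beta * x)), in an overflow-safe form."""
    return 1.0 - np.tanh(0.5 * beta * x)

def find_mu(eps, n_electrons, beta, tol=1e-12):
    """Bisection on mu: the electron count is monotonically increasing in mu."""
    lo = eps.min() - 10.0 / beta
    hi = eps.max() + 10.0 / beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fermi_dirac(eps - mid, beta).sum() < n_electrons:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, beta, n_electrons = 8, 20.0, 4
h = -(np.eye(n, k=1) + np.eye(n, k=-1))          # 1D tight-binding chain

eps, psi = np.linalg.eigh(h)
mu = find_mu(eps, n_electrons, beta)
# rho = diag 2 / (1 + exp(beta (H - mu I))), evaluated by diagonalization
rho = np.diag(psi @ np.diag(fermi_dirac(eps - mu, beta)) @ psi.T)
```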

Fermi operator expansion

• $\Delta E = \sigma(H - \mu I)$: spectral width.
• Fermi operator expansion: solving KSDFT without diagonalization
  • [Goedecker, 1993], $P \sim O(\beta\Delta E)$
  • [Head-Gordon et al, 2004], $P \sim O(\beta\Delta E)$ but with $O(\sqrt{\beta\Delta E})$ operation
  • [Ceriotti et al, 2008], $Q \sim O(\sqrt{\beta\Delta E})$; other work

$$\rho = \mathrm{diag}\,\frac{2}{1 + e^{\beta(H[\rho] - \mu I)}} = \mathrm{diag}\,\frac{2}{1 + e^{\beta\Delta E\, \frac{H[\rho] - \mu I}{\Delta E}}} \approx \mathrm{diag}\left[ \sum_{l=1}^{P} c_l \left( \frac{H[\rho] - \mu I}{\Delta E} \right)^{l} + \sum_{l=1}^{Q} \frac{\omega_l}{\left( z_l I - \frac{H[\rho] - \mu I}{\Delta E} \right)^{q_l}} \right]$$

Pole expansion

• [LL, Lu, Ying and E, 2009], $Q \sim O(\log \beta\Delta E)$

$$\rho \approx \mathrm{diag} \sum_{i=1}^{Q} \frac{\omega_i}{H - z_i I}$$

• $z_i, \omega_i \in \mathbb{C}$ are complex shifts and complex weights
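To show the structure of a pole expansion in code, the sketch below uses the textbook Matsubara expansion of the Fermi-Dirac function, $\frac{2}{1+e^{y}} = 1 - 4\,\mathrm{Re}\sum_{k \ge 0} \frac{1}{y - \mathrm{i}(2k+1)\pi}$. This is a stand-in and not the expansion of [LL, Lu, Ying and E, 2009]: it has the same "sum of shifted inverses" form (plus a constant), but it converges slowly and needs far more poles than the $O(\log \beta\Delta E)$ quoted above. The toy Hamiltonian and all names are assumptions.

```python
import numpy as np

def density_from_matsubara_poles(h, mu, beta, n_poles):
    """rho ~ 1 - (4/beta) Re sum_k diag[(H - z_k I)^{-1}], z_k = mu + i(2k+1)pi/beta."""
    n = h.shape[0]
    identity = np.eye(n)
    acc = np.zeros(n)
    for k in range(n_poles):
        z = mu + 1j * (2 * k + 1) * np.pi / beta
        # Dense inverse for clarity; only its diagonal is used.
        acc += np.diag(np.linalg.inv(h - z * identity)).real
    return 1.0 - (4.0 / beta) * acc

n, beta, mu = 8, 10.0, 0.1
h = -(np.eye(n, k=1) + np.eye(n, k=-1))               # 1D tight-binding chain

# Reference density from diagonalization.
eps, psi = np.linalg.eigh(h)
occupation = 1.0 - np.tanh(0.5 * beta * (eps - mu))   # equals 2 / (1 + exp(beta (eps - mu)))
rho_exact = np.sum(psi**2 * occupation, axis=1)

err_400 = np.max(np.abs(density_from_matsubara_poles(h, mu, beta, 400) - rho_exact))
err_4000 = np.max(np.abs(density_from_matsubara_poles(h, mu, beta, 4000) - rho_exact))
```

Each term needs only the diagonal of one shifted inverse, which is the quantity selected inversion is designed to deliver.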

Contour integral technique

Fermi-Dirac

$$\rho(\xi) = \frac{1}{2\pi \mathrm{i}} \oint_{\Gamma} \frac{\rho(z)}{z - \xi}\, dz \approx \frac{1}{2\pi \mathrm{i}} \sum_{i=1}^{Q} \frac{\rho(z_i)\, w_i}{z_i - \xi}$$

Simpler problem

[Hale, Higham and Trefethen, 2008] $\rho(\xi) - \rho_Q(\xi) \sim O(e^{-CQ/\log(M/m)})$

Domain transformation

Contour selection

• [Hale, Higham, Trefethen 2008] $\frac{K'}{2K} \sim \frac{1}{\log(M/m)}$
• Trapezoid rule for periodic function gives geometric convergence
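The geometric convergence of the trapezoid rule can be seen on the simplest possible contour. The sketch below applies it to the Cauchy integral of the Fermi-Dirac function on a circle, not on the conformally mapped contour of [Hale, Higham, Trefethen 2008]. The radius, the evaluation point, $\beta = 1$, and the function names are assumptions chosen so that the poles at $\pm \mathrm{i}\pi$ stay outside the circle.

```python
import numpy as np

def fermi(z, beta=1.0):
    """Fermi-Dirac function 1 / (1 + exp(beta z)); poles at z = i (2k+1) pi / beta."""
    return 1.0 / (1.0 + np.exp(beta * z))

def contour_trapezoid(f, xi, radius, n_points):
    """Trapezoid rule for (1 / 2 pi i) * contour integral of f(z) / (z - xi) on |z| = radius."""
    theta = 2.0 * np.pi * (np.arange(n_points) + 0.5) / n_points
    z = radius * np.exp(1j * theta)
    # With dz = i z dtheta the quadrature reduces to an average over the nodes,
    # i.e. weights w_k = 2 pi i z_k / n_points in the notation of the slide.
    return np.mean(f(z) * z / (z - xi))

xi = 0.5
exact = fermi(xi)
errors = {
    q: abs(contour_trapezoid(fermi, xi, radius=2.0, n_points=q) - exact)
    for q in (8, 16, 32, 64)
}
```

The error shrinks by a roughly constant factor per added node, governed by the distance from the contour to the nearest pole.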

Pole expansion

Numerical result

H: Tight binding model on a 2D grid

Outline

PEXSI: Pole EXpansion Selected Inversion

• Pole Expansion
• Selected Inversion
• How it works in practice

Selected inversion

$$\rho \approx \mathrm{diag} \sum_{i=1}^{Q} \frac{\omega_i}{H - z_i I}$$

• All the diagonal elements of an inverse matrix.
• $H$ is a sparse matrix, but $(H - z_i I)^{-1}$ is a full matrix.
• Naïve approach: $O(N^3)$.
• Need selected inversion.

Selected inversion: basic idea

• $LDL^T$ factorization

$$A = \begin{pmatrix} A_{11} & A_{21}^T \\ A_{21} & \hat{A}_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 \\ 0 & S_{22} \end{pmatrix} \begin{pmatrix} 1 & L_{21}^T \\ 0 & I \end{pmatrix}$$

$$L_{21} = A_{21} A_{11}^{-1}, \qquad S_{22} = \hat{A}_{22} - A_{21} L_{21}^T$$

• Inversion

$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & S_{22}^{-1} \end{pmatrix}$$

Observation: If $L_{21}$ is sparse, $L_{21}^T S_{22}^{-1} L_{21}$ only requires the rows and columns of $S_{22}^{-1}$ corresponding to the sparsity pattern of $L_{21}$.
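The block inversion formula can be verified numerically for one elimination step. The dense random test matrix and the variable names below are assumptions for illustration; a real selected inversion would never form $S_{22}^{-1}$ densely.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
b = rng.standard_normal((n, n))
a = b @ b.T + n * np.eye(n)           # symmetric positive definite test matrix

a11 = a[0, 0]                         # scalar pivot A_11
a21 = a[1:, [0]]                      # column below the pivot, shape (n-1, 1)
a22_hat = a[1:, 1:]                   # trailing block

l21 = a21 / a11                       # L_21 = A_21 A_11^{-1}
s22 = a22_hat - a21 @ l21.T           # Schur complement S_22
s22_inv = np.linalg.inv(s22)

# Assemble A^{-1} block by block from the formula on the slide.
a_inv = np.empty_like(a)
a_inv[0, 0] = 1.0 / a11 + (l21.T @ s22_inv @ l21).item()
a_inv[0, 1:] = -(l21.T @ s22_inv).ravel()
a_inv[1:, 0] = -(s22_inv @ l21).ravel()
a_inv[1:, 1:] = s22_inv
```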

Recursive relation

$$S_{22} = \begin{pmatrix} A_{22} & A_{32}^T \\ A_{32} & \hat{A}_{33} \end{pmatrix}$$

$$A = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & L_{32} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 & 0 \\ 0 & A_{22} & 0 \\ 0 & 0 & S_{33} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & L_{32}^T \\ 0 & 0 & I \end{pmatrix} \begin{pmatrix} 1 & L_{21}^T \\ 0 & I \end{pmatrix}$$

where, as in the first step, $L_{32} = A_{32} A_{22}^{-1}$ and $S_{33} = \hat{A}_{33} - A_{32} L_{32}^T$.

$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & \begin{pmatrix} A_{22}^{-1} + L_{32}^T S_{33}^{-1} L_{32} & -L_{32}^T S_{33}^{-1} \\ -S_{33}^{-1} L_{32} & S_{33}^{-1} \end{pmatrix} \end{pmatrix}$$

• $\mathcal{I} = \{ i \mid L_{21}(i, 1) \neq 0 \}$, $2 \in \mathcal{I}$
• $L_{21}(i, 1) \neq 0 \Rightarrow S_{22}(i, j) \neq 0$ for $i, j \in \mathcal{I}$, because $S_{22} = \hat{A}_{22} - A_{21} L_{21}^T$ $\Rightarrow L_{32}(i, 2) \neq 0$, $i \in \mathcal{I}$

Selected inversion

• $A = LDL^T$: $A^{-1}$ restricted to the non-zero pattern of $L$ is "self-contained". Exact method with exact arithmetic.
• For KS Hamiltonian discretized by local basis set, the cost of selected inversion is $O(N)$ for 1D systems, $O(N^{1.5})$ for 2D systems, and $O(N^2)$ for 3D systems.
• Combined with pole expansion: At most $O(N^2)$ scaling for solving Kohn-Sham problem.
• Idea of selected inversion dates back to [Erisman and Tinney, 1975], [Takahashi et al, 1973]; For electronic structure [LL-Lu-Ying-Car-E, 2009]; For quantum transport [Li, Darve et al, 2008]
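The $O(N)$ cost in the 1D case can be made concrete for a symmetric tridiagonal matrix, where $L$ has a single sub-diagonal and the recursion touches only one entry of each Schur complement inverse. This is a minimal sketch without pivoting, not the SelInv or PEXSI implementation; the function name, the test matrix, and the complex shift are assumptions.

```python
import numpy as np

def selinv_tridiagonal(diag, off):
    """Diagonal of A^{-1} for a symmetric tridiagonal A, in O(N) operations.

    diag holds A[i, i] (length n), off holds A[i+1, i] (length n-1).
    """
    n = len(diag)
    dtype = np.result_type(diag, off, np.float64)
    d = np.empty(n, dtype=dtype)
    l = np.empty(n - 1, dtype=dtype)
    # Forward pass: LDL^T factorization, d[i] is the i-th pivot.
    d[0] = diag[0]
    for i in range(n - 1):
        l[i] = off[i] / d[i]
        d[i + 1] = diag[i + 1] - l[i] * off[i]
    # Backward pass: (A^{-1})_{ii} = 1/d_i + l_i^2 (A^{-1})_{i+1,i+1}.
    g = np.empty(n, dtype=dtype)
    g[-1] = 1.0 / d[-1]
    for i in range(n - 2, -1, -1):
        g[i] = 1.0 / d[i] + l[i] ** 2 * g[i + 1]
    return g

# Example: diagonal of (H - z I)^{-1} for a 1D chain and one complex shift z.
n, z = 50, 0.3 + 0.5j
diag = np.full(n, 2.0) - z
off = np.full(n - 1, -1.0)
selected = selinv_tridiagonal(diag, off)

dense = np.diag(diag) + np.diag(off, k=1) + np.diag(off, k=-1)
reference = np.diag(np.linalg.inv(dense))
```

The backward pass is the scalar form of $A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21}$, with $L_{21}$ having one non-zero entry.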

SelInv: Numerical results

SelInv: a selected inversion package for general sparse symmetric matrix written in FORTRAN. [LL-Yang-Meza-Lu-Ying-E, TOMS, 2011]

Outline

PEXSI: Pole EXpansion Selected Inversion

• Pole Expansion
• Selected Inversion
• How it works in practice

Force

$$F_I = -\mathrm{Tr}\left[ \gamma\, \frac{\partial H}{\partial R_I} \right] + \mathrm{Tr}\left[ \gamma_E\, \frac{\partial S}{\partial R_I} \right]$$

• Including both the Hellmann-Feynman force and the Pulay force
• Energy density matrix

$$\gamma_E = C f_E(\Xi - \mu) C^T, \qquad f_E(x - \mu) = x f(x - \mu)$$

• Pole expansion with the same shift but different weight
• The same selected elements of $(H - z_i S)^{-1}$
• Similar treatment for other physical quantities

[LL-Chen-Yang-He, JPCM, 2013, in press]
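The density matrix and energy density matrix can be illustrated with a small generalized eigenproblem $HC = SC\Xi$, $C^T S C = I$. The sketch below builds both by diagonalization (not by pole expansion) and checks the identity $\mathrm{Tr}[\gamma_E S] = \mathrm{Tr}[\gamma H]$, which follows from the definitions. The random $H$ and $S$, the spin factor 2 in `fermi`, and the helper names are assumptions for illustration.

```python
import numpy as np

def generalized_eigh(h, s):
    """Solve H C = S C Xi with C^T S C = I by Cholesky reduction (S = L L^T)."""
    l = np.linalg.cholesky(s)
    l_inv = np.linalg.inv(l)
    xi, y = np.linalg.eigh(l_inv @ h @ l_inv.T)
    return xi, l_inv.T @ y

def fermi(x, beta):
    """Occupation 2 / (1 + exp(beta x)), in an overflow-safe form."""
    return 1.0 - np.tanh(0.5 * beta * x)

rng = np.random.default_rng(2)
n, beta, mu = 6, 5.0, 0.0
a = rng.standard_normal((n, n))
h = 0.5 * (a + a.T)                          # stand-in Hamiltonian matrix
b = rng.standard_normal((n, n))
s = np.eye(n) + 0.1 * (b @ b.T)              # stand-in overlap matrix (positive definite)

xi, c = generalized_eigh(h, s)
occ = fermi(xi - mu, beta)
gamma = c @ np.diag(occ) @ c.T               # density matrix
gamma_e = c @ np.diag(xi * occ) @ c.T        # energy density matrix, f_E(x - mu) = x f(x - mu)

band_energy_from_gamma = np.trace(gamma @ h)
band_energy_from_gamma_e = np.trace(gamma_e @ s)
```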

Numerical examples with atomic orbitals

[Figures: Boron Nitride Nanotube; Carbon Nanotube]

Sparsity is the key

Accuracy of the pole expansion

PEXSI

Efficiency of the selected inversion

Carbon nanotube (metallic)
SZ: single-zeta (4 basis per atom)
DZP: Double-zeta with polarization (13 basis per atom)

All on a single core, 80 poles (not parallelized) and 2 iterations for chemical potential.

Geometry optimization: BNNT

Truncated BNNT. 504 B atoms, 504 N atoms, 16 H atoms

PEXSI in parallel

• Distributed memory parallel selected inversion for general matrix (factorization is based on SuperLU_DIST), preliminary version scalable to 64 ~ 256 procs. More efficient version in progress (ongoing work with Mathias Jacquelin and Chao Yang)
• Pole expansion parallelized. With 40 poles used in practice, PEXSI can scale to 256*40 ~ 10,000 procs.
• C++ implementation. Nearly black-box interface, being integrated into SIESTA (ongoing work with Alberto Garcia, Georg Huhs and Chao Yang)

PEXSI in parallel

C-BN-C layered system, weak scaling for more than 10,000 atoms. All examples use 40*256=10240 procs on Hopper.

Number of atoms   Equivalent cells   Matrix dimension   Time per iteration   Scaling
2532              1 × 1              32916              32                   1
10128             2 × 2              131664             258                  8.06
20256             4 × 2              263328             554                  17.3

$O(N^{1.5})$ scaling from 2532 to 10128 atoms; $O(N)$ scaling from 10128 to 20256 atoms.

ScaLAPACK performance: 230 sec for 2532 atoms using 768 processors and does not scale beyond.

Conclusion

• Pole Expansion and Selected Inversion (PEXSI) method for KSDFT at large scale.
• Based on the sparsity of the Hamiltonian and overlap matrices. Requires a local basis set with a small number of basis functions per atom (such as NAO and GTO; not applicable to PW).
• Accurate calculation of density, total energy, free energy and force (no truncation) for insulating and metallic systems.
• $O(N)$ for quasi-1D systems, $O(N^{1.5})$ for quasi-2D systems, and $O(N^2)$ for 3D bulk systems.
• Black-box: suitable for all codes using localized basis sets such as atomic orbitals.

Thank you for your attention!