10 aflow school ml - aflowlib.org

Materials Database and Machine Learning: AFLOW-ML. Cormac Toher, June 18th, 2020


Materials Database and Machine Learning: AFLOW-ML

Cormac Toher

June 18th, 2020

AFLOW.org: Using the data

• Tools like the REST API and AFLUX provide a way to wrangle data.

• Each entry's properties were calculated with DFT, which requires a high-performance computing environment.

• Even with such resources, calculations can take days to months, depending on the system and property.

Goal: Use existing data to construct a model that predicts these properties with high accuracy, to accelerate materials discovery.

AFLOW Machine Learning

• AFLOW data for > 26,674 materials on AFLOW.org was used to train a gradient boosting decision tree machine-learning model

• Predictions are based on structural morphology and elemental properties

[Figure: PLMF construction. (a) Crystal structure; (b) Voronoi tessellation and neighbors search; (c) infinite periodic graph construction and property labeling, with nodes (atoms) and edges (bonds); (d) decomposition to fragments: path fragments of length l, l = 2, 3, …, and circular fragments (polyhedrons).]

O. Isayev et al., Nat. Commun. 8, 15679 (2017).

• Voronoi tessellation is used to determine atomic connectivity

• Atoms which share a Voronoi cell face are connected to form a graph
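As a rough illustration of the gradient boosting idea named on this slide, the sketch below fits a sequence of one-split decision trees ("stumps") to the residuals of the running prediction, using squared-error loss. It is a toy one-dimensional version written for this transcript and is not the AFLOW-ML implementation; the function names, the learning rate, the number of rounds and the data are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the threshold split of 1-D inputs that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Toy data (made up): a step from 1 to 5 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gradient_boosting(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Each round only corrects a fraction (the learning rate) of the remaining error, which is why many small trees are combined.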


AFLOW Machine Learning

• Connected atoms form structure-fragment descriptors

[Figure: PLMF construction workflow: crystal structure; Voronoi tessellation and neighbors search; infinite periodic graph construction and property labeling, with nodes (atoms) and edges (bonds); decomposition to path fragments of length l, l = 2, 3, …, and circular fragments (polyhedrons).]

O. Isayev et al., Nat. Commun. 8, 15679 (2017).

• Atomic nodes in structure fragments are decorated with elemental properties to form Property-Labeled Materials Fragments (PLMF)

• Properties used include number of valence electrons, ionization potential, electron affinity, electronegativity, covalent radii, etc.

AFLOW Machine Learning

• Model predicts electronic and thermo-mechanical properties

[Figure: modeling workflow. Starting from the crystal structure, a classification model decides "metal or insulator?". Metals have no E_BG; for insulators a regression model gives the band gap energy prediction, {E_BG ∈ ℝ : E_BG > 0}. For the thermo-mechanical properties, regression models give predictions such as the bulk modulus (VRH), {X ∈ ℝ}.]

FIG. 2. Outline of the modeling workflow. ML models are represented by orange diamonds. Target properties predicted by these models are highlighted in green.

… structure of the material and determine the atomic connectivity within it. In general, atomic connectivity is not a trivial property to determine within materials. Not only must we consider the potential bonding distances among the atoms, but also whether the topology of nearby atoms allows for bonding. Therefore, we have employed a computational geometry approach to partition the crystal structure (Figure 1a) into atom-centered Voronoi-Dirichlet polyhedra [59–62] (Figure 1b). This partitioning scheme was found to be invaluable in the topological analysis of metal organic frameworks (MOF), molecules, and inorganic crystals [63, 64]. Connectivity between atoms is established by satisfying two criteria: (i) the atoms must share a Voronoi face (perpendicular bisector between neighboring atoms), and (ii) the interatomic distance must be shorter than the sum of the Cordero covalent radii [65] to within a 0.25 Å tolerance. Here, we consider only strong interatomic interactions such as covalent, ionic, and metallic bonding, ignoring van der Waals interactions. Due to the ambiguity within materials, the bond order (single/double/triple bond classification) is not considered. Taken together, the Voronoi centers that share a Voronoi face and are within the sum of their covalent radii form a three-dimensional graph defining the connectivity within the material.
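The two connectivity criteria above can be sketched as a small check. This is a minimal illustration, not the authors' code: the Voronoi face test is assumed to have been done elsewhere and is passed in as a flag, the radii in the table are illustrative values for Na and Cl only, and reading "to within a 0.25 Å tolerance" as "distance below the radii sum plus 0.25 Å" is an interpretation made for this example.

```python
# Illustrative covalent radii in angstrom (assumed values for this sketch;
# consult the Cordero reference cited in the text for the actual table).
COVALENT_RADIUS = {"Na": 1.66, "Cl": 1.02}

TOLERANCE = 0.25  # angstrom, as quoted in the text


def is_connected(species_a, species_b, distance, shares_voronoi_face):
    """Return True only if both criteria hold.

    (i)  the atoms share a Voronoi face (supplied by the caller), and
    (ii) the distance is below the sum of covalent radii plus the tolerance.
    """
    if not shares_voronoi_face:
        return False
    limit = COVALENT_RADIUS[species_a] + COVALENT_RADIUS[species_b] + TOLERANCE
    return distance < limit


# Rock-salt NaCl with a = 5.63931 angstrom: nearest Na-Cl distance is a / 2.
print(is_connected("Na", "Cl", 5.63931 / 2, True))
```

Both tests are needed: two atoms can be close without sharing a face, or share a face while being too far apart to count as bonded.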

In the final steps of the PLMF construction, the full graph and corresponding adjacency matrix (Figure 1c) are constructed from the total list of connections. The adjacency matrix A of a simple graph (material) with n vertices (atoms) is a square matrix (n × n) with entries a_ij = 1 if atom i is connected to atom j, and a_ij = 0 otherwise. This adjacency matrix reflects the global topology for a given system, including interatomic bonds and contacts within the crystal. The full graph is partitioned into smaller subgraphs, corresponding to individual fragments (Figure 1d). While there are several subgraphs to consider in general, we restrict the length l to a maximum of three, where l is the largest number of consecutive, non-repetitive edges in the subgraph. This restriction serves to curb the complexity of the final descriptor vector. In particular, we consider two types of fragments. Path fragments are subgraphs of at most l = 3 that encode any linear strand of up to four atoms. Only the shortest paths between atoms are considered. Circular fragments are subgraphs of l = 2 that encode the first shell of nearest neighbor atoms. In this context, circular fragments represent coordination polyhedra, or clusters of atoms with anion/cation centers each surrounded by a set of its respective counter ion. Coordination polyhedra are used extensively in crystallography and mineralogy [66].
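The adjacency matrix and the path fragments described above can be sketched for a small finite graph. This is a simplified illustration under stated assumptions: it works on a finite list of bonds rather than an infinite periodic graph, it enumerates every simple path of up to three edges without applying the shortest-path filter mentioned in the text, and the function names are hypothetical.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build the symmetric n x n matrix with a_ij = 1 for connected atoms."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def path_fragments(matrix, max_edges=3):
    """List simple paths with 1..max_edges edges, each direction counted once."""
    n_atoms = len(matrix)
    found = set()

    def extend(path):
        if len(path) > 1:
            # Store one canonical orientation so a path and its reverse coincide.
            found.add(min(tuple(path), tuple(reversed(path))))
        if len(path) - 1 == max_edges:
            return
        for neighbor in range(n_atoms):
            if matrix[path[-1]][neighbor] and neighbor not in path:
                extend(path + [neighbor])

    for start in range(n_atoms):
        extend([start])
    return sorted(found)


# A four-atom chain 0-1-2-3 (made-up example).
chain = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
print(path_fragments(chain))
```

With max_edges = 3 the longest fragment spans four atoms, matching the "linear strand of up to four atoms" in the text.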

Property labeling. PLMFs are differentiated by local (standard atomic) reference properties [57], which include: (i) general properties: the Mendeleev group and period numbers, number of valence electrons (N_V); (ii) measured properties [57]: atomic mass, electron affinity (EA), thermal conductivity (λ), heat capacity (C), enthalpies of atomization (ΔH_at), fusion (ΔH_fusion), and vaporization, first three ionization potentials (IP_1,2,3); and (iii) derived properties: effective atomic charge (Z_eff), molar volume (V_molar), chemical hardness (η) [57, 67], covalent (r_cov) [65], absolute [68], and van der Waals radii [57], electronegativity (χ), and polarizability. We also combine pairs of properties in the form of their multiplication and ratio, as well as include the property value …

• Model predicts electronic band gap for non-metals

O. Isayev et al., Nat. Commun. 8, 15679 (2017).
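The two-stage electronic workflow on this slide (classify first, then regress the band gap only for non-metals) can be sketched as follows. The two model arguments are placeholders for trained models; the toy stand-ins and the descriptor key used here are invented for the example and carry no physical meaning.

```python
def predict_band_gap(descriptor, classify_is_metal, regress_band_gap):
    """Chain a metal/insulator classifier with a band gap regressor."""
    if classify_is_metal(descriptor):
        # Metals have no band gap, so the regression model is never called.
        return {"type": "Metal", "band_gap": None}
    return {"type": "Insulator", "band_gap": regress_band_gap(descriptor)}


# Hypothetical stand-ins for trained models, for demonstration only.
def toy_classifier(descriptor):
    return descriptor["score"] < 0.5


def toy_regressor(descriptor):
    return 2.0 * descriptor["score"]


print(predict_band_gap({"score": 0.2}, toy_classifier, toy_regressor))
print(predict_band_gap({"score": 1.5}, toy_classifier, toy_regressor))
```

Keeping the classifier in front means the regression model only ever sees materials with a positive gap.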


AFLOW Machine Learning

• Good agreement of predictions with both DFT and experiment

[Figure: panels (a), (b), (c).]

• Partial dependence of properties on descriptors:

B, G: r² = 0.99; θ_D: r² = 0.97

O. Isayev et al., Nat. Commun. 8, 15679 (2017).

AFLOW-ML Online

• Models are available online at aflow.org/aflow-ml

[Screenshot: AFLOW-ML web application with model tabs PLMF, MFD and ASC, an input box for a POSCAR (VASP 5), and a "Run prediction" button.]

PLMF: O. Isayev et al., Nat. Commun. 8, 15679 (2017)

MFD: F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018)

ASC: V. Stanev et al., npj Comput. Mater. 4, 29 (2018)

O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018).

AFLOW-ML Online

• Convert POSCAR for VASP 4 to POSCAR for VASP 5

VASP 4

    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na

VASP 5: Add line with list of elements

    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    Cl Na
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na
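The conversion shown on this slide amounts to inserting one line of element symbols before the atom-count line. A minimal sketch, assuming the standard POSCAR layout (comment, scale factor, three lattice vectors, then atom counts on line 6) and a caller who supplies the element symbols in the same order as the counts; the function name is hypothetical.

```python
def poscar_vasp4_to_vasp5(text, elements):
    """Insert the element-symbol line before the atom counts (line 6)."""
    lines = text.splitlines()
    counts = lines[5].split()
    if not all(token.isdigit() for token in counts):
        raise ValueError("line 6 is not a list of atom counts; already VASP 5?")
    if len(counts) != len(elements):
        raise ValueError("need exactly one element symbol per atom count")
    lines.insert(5, " ".join(elements))
    return "\n".join(lines) + "\n"


NACL_VASP4 = """ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
1.000000
   0.00000000000000   2.81965500000000   2.81965500000000
   2.81965500000000   0.00000000000000   2.81965500000000
   2.81965500000000   2.81965500000000   0.00000000000000
1 1
Direct(2) [A1B1]
   0.00000000000000   0.00000000000000   0.00000000000000  Cl
   0.50000000000000   0.50000000000000   0.50000000000000  Na
"""

converted = poscar_vasp4_to_vasp5(NACL_VASP4, ["Cl", "Na"])
print(converted)
```

The digit check is a small guard against converting a file twice, since a VASP 5 file already has symbols on line 6.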


AFLOW-ML Online


Exercises:

• Convert the Heusler structure POSCAR you decorated in Session 4 from VASP 4 to VASP 5.

• Copy this structure into the AFLOW-ML application, and run the PLMF model. Is it a metal or insulator? What are the values of the bulk and shear moduli?

• Run the MFD model for the same structure. What properties does this model give?

• Upload the chemical formula for this material to the AFLOW-ML application, and run the ASC model. What is the superconducting critical temperature for this composition?

O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)


AFLOW-ML API

• With machine learning models becoming more prevalent, we wanted to create a programmable interface to access our ML models.

• In tandem, we wanted this interface to be simple and to give users access to predictions without the need to install ML libraries or codebases.

• Finally, we wanted a centralized location to continuously update our models as well as add those of our collaborators.

AFLOW-ML API

• Models are now programmatically accessible via AFLOW-ML API

[Figure: API workflow. POST the POSCAR to the endpoint <model>/prediction; the response is a task object (which includes {id}). GET the endpoint /prediction/result/{id}; the response is a status or prediction object. If status = "SUCCESS" (yes), the prediction is available; if not (no), the GET request is repeated.]

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API

• AFLOW-ML API Python client can be downloaded from: http://aflow.org/src/aflow-ml/

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Applications

• PLMF model integrated with genetic algorithm code XtalOpt to discover new superhard carbon phases

[Figure: Hv (GPa, axis from 0 to 80) versus energy (eV per formula unit, axis from -9 to -5), with regions labeled "diamond like", "graphite like" and "superhard and stable". Panels (a) to (f) show the structures P-1-12 (75.6 GPa), P-1-16b (72.4 GPa), P-1-16c (71.3 GPa), P1-16d (72.4 GPa), P-1-16e (71.5 GPa) and P-1-16d (76.2 GPa).]

P. Avery et al., npj Comput. Mater. 5, 89 (2019).

AFLOW-ML API: Example

• Submit VASP 5 POSCAR to prediction endpoint with curl:

    curl http://aflow.org/API/aflow-ml/v1.0/plmf/prediction --data-urlencode file@POSCAR

• Receive task object with task ID:

    {"id": "39b0f11a-671d-4144-9465-997013ab19c0", "model": "plmf", "results_endpoint": "/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0"}

• Query task ID to retrieve results:

    curl http://aflow.org/API/aflow-ml/v1.0/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0

• Receive results object

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Example

• PLMF results object

    {
      "citation": "10.1038/ncomms15679",
      "description": "The job has completed.",
      "ml_ael_bulk_modulus_vrh": 144.522,
      "ml_ael_shear_modulus_vrh": 104.453,
      "ml_agl_debye": 777.163,
      "ml_agl_heat_capacity_Cp_300K": 4.33,
      "ml_agl_heat_capacity_Cp_300K_per_atom": 2.194,
      "ml_agl_heat_capacity_Cv_300K": 4.178,
      "ml_agl_heat_capacity_Cv_300K_per_atom": 2.139,
      "ml_agl_thermal_conductivity_300K": 3.509,
      "ml_agl_thermal_expansion_300K": 6.18e-05,
      "ml_egap": 3.375,
      "ml_egap_type": "Insulator",
      "ml_energy_per_atom": -5.742,
      "model": "plmf",
      "status": "SUCCESS"
    }

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
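A results object like the one on this slide can be reduced to just the predicted quantities. The sketch below assumes, based only on the example object shown, that predicted properties are the keys beginning with "ml_" and that everything else is metadata; the function name is hypothetical and the sample is an abbreviated copy of the slide's object.

```python
import json

# Abbreviated copy of the results object from the slide.
RESULTS = json.loads("""
{"description": "The job has completed.",
 "ml_ael_bulk_modulus_vrh": 144.522,
 "ml_ael_shear_modulus_vrh": 104.453,
 "ml_egap": 3.375,
 "ml_egap_type": "Insulator",
 "model": "plmf",
 "status": "SUCCESS"}
""")


def predicted_properties(results):
    """Return only the 'ml_' entries of a finished prediction."""
    if results.get("status") != "SUCCESS":
        raise ValueError("prediction not finished: " + str(results.get("status")))
    return {key: value for key, value in results.items() if key.startswith("ml_")}


properties = predicted_properties(RESULTS)
for name in sorted(properties):
    print(name, properties[name])
```

Checking the status first avoids reading property keys from an object that only reports a pending or failed task.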


AFLOW-ML API: Example

• ML API python script

    #!/usr/bin/python3
    import json, sys, os
    from time import sleep                  # Sleep library
    from urllib.parse import urlencode
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import HTTPError

    SERVER = "http://aflow.org"             # AFLOW-ML server
    API = "/API/aflow-ml/v1.0"
    MODEL = "plmf"                          # PLMF model

    # Encode POSCAR
    with open('POSCAR', 'r') as poscar_file:
        poscar = poscar_file.read()
    encoded_data = urlencode({'file': poscar}).encode('utf-8')

    # Retrieve task object
    url = SERVER + API + "/" + MODEL + "/prediction"
    request_task = Request(url, encoded_data)
    task = urlopen(request_task).read()

    # Extract task ID and results endpoint
    task_json = json.loads(task.decode('utf-8'))
    results_endpoint = task_json["results_endpoint"]

    # Results URL
    results_url = SERVER + API + results_endpoint

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Example

• ML API python script (continued)

    incomplete = True
    while incomplete:
        # Retrieve status/results object
        request_results = Request(results_url)
        results = urlopen(request_results).read()
        results_json = json.loads(results.decode('utf-8'))
        if results_json["status"] == 'PENDING':
            # Check status: if PENDING or STARTED, sleep for 10 seconds and recheck
            sleep(10)
            continue
        elif results_json["status"] == 'STARTED':
            sleep(10)
            continue
        elif results_json["status"] == 'FAILURE':
            # Check status: if FAILURE, write error message
            print("Error: prediction failure")
            incomplete = False
        elif results_json["status"] == 'SUCCESS':
            # Check status: if SUCCESS, write out the results json
            print("Successful prediction")
            print(results_json)
            incomplete = False

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
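One possible variant of the polling loop, shown only as a sketch: it stops after a fixed number of attempts instead of looping forever, and it takes the status-fetching step as a function argument so the logic can be tried without contacting the server. The function name, the attempt limit and the injected `wait` parameter are assumptions of this example and are not part of the AFLOW-ML client.

```python
from time import sleep


def poll_until_done(fetch_status, interval=10, max_attempts=30, wait=sleep):
    """Call fetch_status() until the status is SUCCESS or FAILURE, or give up."""
    for _ in range(max_attempts):
        result = fetch_status()
        if result["status"] in ("SUCCESS", "FAILURE"):
            return result
        # PENDING, STARTED or anything else: wait and ask again.
        wait(interval)
    raise TimeoutError("no final status after %d attempts" % max_attempts)


# Offline demonstration with canned responses instead of HTTP requests.
responses = iter([
    {"status": "PENDING"},
    {"status": "STARTED"},
    {"status": "SUCCESS", "ml_egap": 3.375},
])
final = poll_until_done(lambda: next(responses), wait=lambda seconds: None)
print(final)
```

In the script above, `fetch_status` would wrap the `urlopen` call on `results_url` and return the decoded JSON.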


AFLOW-ML Online


Exercises:

• Copy the VASP 5 Heusler structure POSCAR from the previous exercise to the appropriate directory. Modify the aflow_ml_api.py script to print whether the material is a metal or an insulator, and if it is an insulator, to print the band gap.

• Modify the script to run the MFD model for the same structure. What results are returned?

• Use AFLUX or the AFLOW.org advanced search page to find the entry in the Mo-Ti alloy system with the lowest formation enthalpy per atom. Download the relaxed structure and convert it to VASP 5 format, and use the AFLOW-ML API to find the bulk and shear moduli.

O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)