10 aflow school ml - aflowlib.org
Materials Database and Machine Learning: AFLOW-ML
Cormac Toher
June 18th, 2020
AFLOW.org: Using the data
• Tools like the REST API and AFLUX provide a way to wrangle data.
• Each entry's properties were calculated with DFT, which requires a high-performance computing environment.
• Even with such resources, calculations can take days to months depending on the system and property.
Goal: Use existing data to construct a model that predicts these properties with high accuracy to accelerate materials discovery.
AFLOW Machine Learning
• AFLOW data for > 26,674 materials on AFLOW.org were used to train a gradient boosting decision tree machine-learning model
• Predictions are based on structural morphology and elemental properties
[Figure: PLMF construction. (a) crystal structure; (b) Voronoi tessellation and neighbors search; (c) infinite periodic graph construction and property labeling, with nodes (atoms) and edges (bonds); (d) decomposition to fragments: path fragments of length l, l = 2, 3, …, and circular fragments (polyhedrons).]
O. Isayev et al., Nat. Commun. 8, 15679 (2017).
• Voronoi tessellation used to determine atomic connectivity
• Atoms which share a Voronoi cell face are connected to form a graph
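As a rough illustration of the graph step, the sketch below turns a list of atom pairs into a connectivity graph. It assumes the pairs of atoms whose Voronoi cells share a face have already been found by some tessellation code (not shown); the function name `build_graph` and the sample pairs are hypothetical, and periodic images are ignored for simplicity.

```python
from collections import defaultdict


def build_graph(face_sharing_pairs):
    """Return {atom index: set of neighbor indices} from undirected pairs."""
    graph = defaultdict(set)
    for i, j in face_sharing_pairs:
        if i == j:
            continue  # simplification: skip an atom paired with its own image
        graph[i].add(j)
        graph[j].add(i)
    return dict(graph)


# Hypothetical input: index pairs of atoms whose Voronoi cells share a face.
pairs = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
graph = build_graph(pairs)

# Number of graph neighbors of each atom.
coordination = {atom: len(neighbors) for atom, neighbors in graph.items()}
print(coordination)
```

Each node of the resulting graph is an atom and each edge a bond, which is the form the fragment decomposition works on.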
AFLOW Machine Learning
• Connected atoms form structure-fragment descriptors
[Figure: PLMF construction, panels (a)-(d): crystal structure; Voronoi tessellation and neighbors search; infinite periodic graph construction and property labeling (nodes = atoms, edges = bonds); decomposition to fragments (path fragments of length l, l = 2, 3, …; circular fragments, i.e. polyhedrons).]
O. Isayev et al., Nat. Commun. 8, 15679 (2017).
• Atomic nodes in structure fragments are decorated with elemental properties to form Property-Labeled Materials Fragments (PLMF)
• Properties used include number of valence electrons, ionization potential, electron affinity, electronegativity, covalent radii, etc.
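To make the labeling idea concrete, here is a minimal sketch that decorates the atoms of a fragment with elemental properties and derives simple numbers from them. The table holds only two properties for two elements, and the helper names (`label_fragment`, `bond_differences`) are hypothetical. The property difference across each bond is just one simple choice for illustration; the published PLMF descriptors use their own definitions.

```python
# Small illustrative property table (valence electrons, Pauling electronegativity).
ELEMENT_PROPERTIES = {
    "Na": {"valence_electrons": 1, "electronegativity": 0.93},
    "Cl": {"valence_electrons": 7, "electronegativity": 3.16},
}


def label_fragment(fragment, properties=ELEMENT_PROPERTIES):
    """Attach the property dictionary of each element to the fragment nodes."""
    return [(symbol, properties[symbol]) for symbol in fragment]


def bond_differences(fragment, key, properties=ELEMENT_PROPERTIES):
    """Absolute difference of one property across each bond of a path fragment."""
    values = [properties[symbol][key] for symbol in fragment]
    return [abs(b - a) for a, b in zip(values, values[1:])]


path = ("Na", "Cl", "Na")  # a path fragment with two bonds
labeled = label_fragment(path)
print(labeled[0])
print(bond_differences(path, "valence_electrons"))
```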
AFLOW Machine Learning
• Model predicts electronic and thermo-mechanical properties
[Figure: modeling workflow starting from the crystal structure, with two branches. Electronic properties: a classification model answers "metal or insulator?"; metals have no band gap (no E_BG), while for insulators a regression model predicts the band gap energy {E_BG ∈ ℝ : E_BG > 0}. Thermo-mechanical properties: regression models predict properties {X ∈ ℝ} such as the bulk modulus (VRH).]
FIG. 2. Outline of the modeling workflow. ML models are represented by orange diamonds. Target properties predicted by these models are highlighted in green.
… structure of the material and determine the atomic connectivity within it. In general, atomic connectivity is not a trivial property to determine within materials. Not only must we consider the potential bonding distances among the atoms, but also whether the topology of nearby atoms allows for bonding. Therefore, we have employed a computational geometry approach to partition the crystal structure (Figure 1a) into atom-centered Voronoi-Dirichlet polyhedra [59–62] (Figure 1b). This partitioning scheme was found to be invaluable in the topological analysis of metal organic frameworks (MOF), molecules, and inorganic crystals [63, 64]. Connectivity between atoms is established by satisfying two criteria: (i) the atoms must share a Voronoi face (perpendicular bisector between neighboring atoms), and (ii) the interatomic distance must be shorter than the sum of the Cordero covalent radii [65] to within a 0.25 Å tolerance. Here, we consider only strong interatomic interactions such as covalent, ionic, and metallic bonding, ignoring van der Waals interactions. Due to the ambiguity within materials, the bond order (single/double/triple bond classification) is not considered. Taken together, the Voronoi centers that share a Voronoi face and are within the sum of their covalent radii form a three-dimensional graph defining the connectivity within the material.

In the final steps of the PLMF construction, the full graph and corresponding adjacency matrix (Figure 1c) are constructed from the total list of connections. The adjacency matrix A of a simple graph (material) with n vertices (atoms) is a square matrix (n × n) with entries a_ij = 1 if atom i is connected to atom j, and a_ij = 0 otherwise. This adjacency matrix reflects the global topology for a given system, including interatomic bonds and contacts within the crystal. The full graph is partitioned into smaller subgraphs, corresponding to individual fragments (Figure 1d). While there are several subgraphs to consider in general, we restrict the length l to a maximum of three, where l is the largest number of consecutive, non-repetitive edges in the subgraph. This restriction serves to curb the complexity of the final descriptor vector. In particular, we consider two types of fragments. Path fragments are subgraphs of at most l = 3 that encode any linear strand of up to four atoms. Only the shortest paths between atoms are considered. Circular fragments are subgraphs of l = 2 that encode the first shell of nearest neighbor atoms. In this context, circular fragments represent coordination polyhedra, or clusters of atoms with anion/cation centers each surrounded by a set of its respective counter ion. Coordination polyhedra are used extensively in crystallography and mineralogy [66].

Property labeling. PLMFs are differentiated by local (standard atomic) reference properties [57], which include: (i) general properties: the Mendeleev group and period numbers, number of valence electrons (N_V); (ii) measured properties [57]: atomic mass, electron affinity (EA), thermal conductivity (λ), heat capacity (C), enthalpies of atomization (ΔH_at), fusion (ΔH_fusion), and vaporization, first three ionization potentials (IP_1,2,3); and (iii) derived properties: effective atomic charge (Z_eff), molar volume (V_molar), chemical hardness (η) [57, 67], covalent (r_cov) [65], absolute [68], and van der Waals radii [57], electronegativity (χ), and polarizability. We also combine pairs of properties in the form of their multiplication and ratio, as well as include the property value …
• Model predicts electronic band gap for non-metals
O. Isayev et al., Nat. Commun. 8, 15679 (2017).
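The fragment decomposition described in the excerpt above (an adjacency matrix with a_ij = 1 for connected atoms, path fragments of at most three edges, circular fragments as first neighbor shells) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the test graph is a plain four-atom chain, and the sketch enumerates all simple paths rather than only the shortest paths between atoms.

```python
def neighbors(adjacency, i):
    """Indices j with a_ij = 1."""
    return [j for j, a_ij in enumerate(adjacency[i]) if a_ij]


def path_fragments(adjacency, max_edges=3):
    """Enumerate simple paths with 1..max_edges edges, ignoring direction."""
    found = set()

    def extend(path):
        if len(path) > 1:
            found.add(min(path, path[::-1]))  # one canonical orientation
        if len(path) - 1 == max_edges:
            return
        for j in neighbors(adjacency, path[-1]):
            if j not in path:  # no repeated atoms
                extend(path + (j,))

    for start in range(len(adjacency)):
        extend((start,))
    return sorted(found, key=lambda p: (len(p), p))


def circular_fragments(adjacency):
    """Map each atom to its first shell of neighbors."""
    return {i: tuple(neighbors(adjacency, i)) for i in range(len(adjacency))}


# Adjacency matrix of a linear chain 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
paths = path_fragments(A)
shells = circular_fragments(A)
print(paths)
print(shells)
```

Capping `max_edges` at three mirrors the restriction in the text that keeps the descriptor vector from growing too large.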
AFLOW Machine Learning
• Good agreement of predictions with both DFT and experiment
• Partial dependence of properties on descriptors:
B, G: r² = 0.99; θ_D: r² = 0.97
O. Isayev et al., Nat. Commun. 8, 15679 (2017).
AFLOW-ML Online
• Models are available online at aflow.org/aflow-ml
O. Isayev et al., Nat. Commun. 8, 15679 (2017), E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018).
AFLOW-ML Online
• Models are available online at aflow.org/aflow-ml
PLMF: O. Isayev et al., Nat. Commun. 8, 15679 (2017)
MFD: F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018)
ASC: V. Stanev et al., npj Comput. Mater. 4, 29 (2018)
[Screenshot: AFLOW-ML web application with the model choices PLMF, MFD, and ASC, a POSCAR (VASP 5) input field, and a "Run prediction" button.]
AFLOW-ML Online
• Convert POSCAR for VASP 4 to POSCAR for VASP 5
VASP 4:
    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na

VASP 5 (add a line with the list of elements before the atom counts):
    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    Cl Na
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na
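The conversion above can also be scripted. The sketch below is one possible way to do it, under stated assumptions: the element symbols are read from the trailing label on each coordinate line (as in the AFLOW-generated example above), the file has no "Selective dynamics" line, and the function name `poscar_vasp4_to_vasp5` is hypothetical rather than part of AFLOW.

```python
def poscar_vasp4_to_vasp5(text):
    """Insert the element-symbol line that VASP 5 expects before the atom counts."""
    lines = text.splitlines()
    counts = lines[5].split()
    if not counts or not all(token.isdigit() for token in counts):
        return text  # line 6 is not a list of atom counts: assume VASP 5 already
    symbols = []
    start = 7  # first coordinate line, right after the Direct/Cartesian line
    for count in map(int, counts):
        labels = set()
        for line in lines[start:start + count]:
            fields = line.split()
            if len(fields) < 4:
                raise ValueError("coordinate line has no element label: " + line)
            labels.add(fields[3])
        if len(labels) != 1:
            raise ValueError("inconsistent element labels within one species")
        symbols.append(labels.pop())
        start += count
    return "\n".join(lines[:5] + [" ".join(symbols)] + lines[5:]) + "\n"


VASP4_POSCAR = """\
ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
1.000000
   0.00000000000000   2.81965500000000   2.81965500000000
   2.81965500000000   0.00000000000000   2.81965500000000
   2.81965500000000   2.81965500000000   0.00000000000000
1 1
Direct(2) [A1B1]
   0.00000000000000   0.00000000000000   0.00000000000000  Cl
   0.50000000000000   0.50000000000000   0.50000000000000  Na
"""

vasp5_poscar = poscar_vasp4_to_vasp5(VASP4_POSCAR)
print(vasp5_poscar)
```

Running the function on a file that already has the element line returns it unchanged, so it is safe to apply twice.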
AFLOW-ML Online
Exercises:
• Convert the Heusler structure POSCAR you decorated in Session 4 from VASP 4 to VASP 5.
• Copy this structure into the AFLOW-ML application, and run the PLMF model. Is it a metal or insulator? What are the values of the bulk and shear moduli?
• Run the MFD model for the same structure. What properties does this model give?
• Upload the chemical formula for this material to the AFLOW-ML application, and run the ASC model. What is the superconducting critical temperature for this composition?
O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)
AFLOW-ML API
• With machine learning models becoming more prevalent, we wanted to create a programmable interface to access our ML models.
• In tandem, we wanted this interface to be simple and to give users access to predictions without the need to install ML libraries or codebases.
• Finally, we wanted a centralized location to continuously update our models as well as add those of our collaborators.
AFLOW-ML API
• Models are now programmatically accessible via the AFLOW-ML API
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
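[Figure: API workflow. POST the POSCAR to the endpoint <model>/prediction; the response is a task object (which includes {id}). GET the endpoint /prediction/result/{id}; the response is a status or prediction object. If status = "SUCCESS" (yes), the response is the prediction; if not (no), repeat the GET request.]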
AFLOW-ML API
• AFLOW-ML API Python client can be downloaded from: http://aflow.org/src/aflow-ml/
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
AFLOW-ML API: Applications
• PLMF model integrated with the genetic algorithm code XtalOpt to discover new superhard carbon phases
[Figure: Vickers hardness Hv (GPa, axis from 0 to 80) versus energy (eV per formula unit, axis from -9 to -5) for predicted carbon phases, with diamond-like and graphite-like structures distinguished and a "superhard and stable" region marked. Panels (a)-(f) show structures labeled P-1-12, 75.6 GPa; P-1-16b, 72.4 GPa; P-1-16c, 71.3 GPa; P1-16d, 72.4 GPa; P-1-16d, 76.2 GPa; P-1-16e, 71.5 GPa.]
P. Avery et al., npj Comput. Mater. 5, 89 (2019).
AFLOW-ML API: Example
• Submit VASP 5 POSCAR to prediction endpoint with curl:
    curl http://aflow.org/API/aflow-ml/v1.0/plmf/prediction --data-urlencode file@POSCAR
• Receive task object with task ID:
    {"id": "39b0f11a-671d-4144-9465-997013ab19c0", "model": "plmf", "results_endpoint": "/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0"}
• Query task ID to retrieve results:
    curl http://aflow.org/API/aflow-ml/v1.0/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0
• Receive results object
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
AFLOW-ML API: Example
• PLMF results object
    {
      "citation": "10.1038/ncomms15679",
      "description": "The job has completed.",
      "ml_ael_bulk_modulus_vrh": 144.522,
      "ml_ael_shear_modulus_vrh": 104.453,
      "ml_agl_debye": 777.163,
      "ml_agl_heat_capacity_Cp_300K": 4.33,
      "ml_agl_heat_capacity_Cp_300K_per_atom": 2.194,
      "ml_agl_heat_capacity_Cv_300K": 4.178,
      "ml_agl_heat_capacity_Cv_300K_per_atom": 2.139,
      "ml_agl_thermal_conductivity_300K": 3.509,
      "ml_agl_thermal_expansion_300K": 6.18e-05,
      "ml_egap": 3.375,
      "ml_egap_type": "Insulator",
      "ml_energy_per_atom": -5.742,
      "model": "plmf",
      "status": "SUCCESS"
    }
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
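Once the results object has been received, the predicted quantities can be separated from the bookkeeping fields. The sketch below works on a shortened copy of the PLMF results object shown above; it assumes, based only on that sample, that predicted quantities carry the "ml_" prefix, and the helper name `extract_predictions` is hypothetical.

```python
import json

# Shortened copy of the PLMF results object from the slide.
RESULTS = json.loads("""
{"citation": "10.1038/ncomms15679",
 "description": "The job has completed.",
 "ml_ael_bulk_modulus_vrh": 144.522,
 "ml_ael_shear_modulus_vrh": 104.453,
 "ml_egap": 3.375,
 "ml_egap_type": "Insulator",
 "model": "plmf",
 "status": "SUCCESS"}
""")


def extract_predictions(results):
    """Return only the 'ml_'-prefixed entries of a successful results object."""
    status = results.get("status")
    if status != "SUCCESS":
        raise ValueError("no predictions available, status is %r" % status)
    # Assumption: every predicted quantity is stored under a key starting with "ml_".
    return {key: value for key, value in results.items() if key.startswith("ml_")}


predictions = extract_predictions(RESULTS)
for key in sorted(predictions):
    print(key, "=", predictions[key])
```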
AFLOW-ML API: Example
• ML API Python script
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
    #!/usr/bin/python3
    import json, sys, os
    from time import sleep                      # sleep library
    from urllib.parse import urlencode
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import HTTPError

    SERVER = "http://aflow.org"                 # AFLOW-ML server
    API = "/API/aflow-ml/v1.0"
    MODEL = "plmf"                              # PLMF model

    # Encode POSCAR
    with open('POSCAR', 'r') as poscar_file:
        poscar = poscar_file.read()
    encoded_data = urlencode({'file': poscar}).encode('utf-8')

    # Retrieve task object
    url = SERVER + API + "/" + MODEL + "/prediction"
    request_task = Request(url, encoded_data)
    task = urlopen(request_task).read()
    task_json = json.loads(task.decode('utf-8'))

    # Extract results endpoint (which contains the task ID)
    results_endpoint = task_json["results_endpoint"]
    # Results URL
    results_url = SERVER + API + results_endpoint
AFLOW-ML API: Example
• ML API Python script
E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
    incomplete = True
    while incomplete:
        # Retrieve status/results object
        request_results = Request(results_url)
        results = urlopen(request_results).read()
        results_json = json.loads(results.decode('utf-8'))
        # Check status: if PENDING or STARTED, sleep for 10 seconds and recheck
        if results_json["status"] == 'PENDING':
            sleep(10)
            continue
        elif results_json["status"] == 'STARTED':
            sleep(10)
            continue
        # Check status: if FAILURE, write error message
        elif results_json["status"] == 'FAILURE':
            print("Error: prediction failure")
            incomplete = False
        # Check status: if SUCCESS, write out the results json
        elif results_json["status"] == 'SUCCESS':
            print("Successful prediction")
            print(results_json)
            incomplete = False
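The polling loop above can be wrapped in a reusable function. The following is a sketch under stated assumptions, not part of the AFLOW-ML client: the names `fetch_json` and `wait_for_result`, the `max_wait` limit, and the choice to raise on an unrecognized status are all additions for illustration. The fetch and sleep functions are passed in as parameters so the logic can be tried without contacting the server.

```python
import json
import time
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(results_url, fetch=fetch_json, interval=10,
                    max_wait=600, sleep=time.sleep):
    """Poll until the status is SUCCESS; raise on FAILURE or timeout."""
    waited = 0
    while True:
        result = fetch(results_url)
        status = result["status"]
        if status == "SUCCESS":
            return result
        if status == "FAILURE":
            raise RuntimeError("prediction failed")
        if status not in ("PENDING", "STARTED"):
            raise RuntimeError("unexpected status: %r" % status)
        if waited >= max_wait:  # hypothetical limit, not defined by the API
            raise TimeoutError("no result after %d seconds" % waited)
        sleep(interval)
        waited += interval


# Offline demonstration with canned responses instead of the real server.
responses = iter([
    {"status": "PENDING"},
    {"status": "STARTED"},
    {"status": "SUCCESS", "ml_egap": 3.375},
])
result = wait_for_result("unused-url",
                         fetch=lambda url: next(responses),
                         sleep=lambda seconds: None)
print(result["ml_egap"])
```

With the defaults, `wait_for_result(results_url)` would query the real results endpoint every 10 seconds, as in the script above.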
AFLOW-ML Online
Exercises:
• Copy the VASP 5 Heusler structure POSCAR from the previous exercise to the appropriate directory. Modify the aflow_ml_api.py script to print whether the material is a metal or an insulator, and if it is an insulator, to print the band gap.
• Modify the script to run the MFD model for the same structure. What results are returned?
• Use AFLUX or the AFLOW.org advanced search page to find the entry in the Mo-Ti alloy system with the lowest formation enthalpy per atom. Download the relaxed structure, convert it to VASP 5 format, and use the AFLOW-ML API to find the bulk and shear moduli.
O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)