Materials Genome Project
For a fisherman, efficient data mining means deducing where he has the highest probability of finding fish, but does not guarantee that he will catch one.
H. Aourag, Tlemcen University, Algeria
Differences-Society
• We should not be divided into developed and developing countries; rather, we are developing our common future together, with the gifts of friends, i.e., data and knowledge.
• We need to work together.
Essential changes
• In the information era, the actors in S&T are not only experts but people in general.
• Techno-democracy enabled by IT may emerge as a new relationship between people and S&T, in which experts are those who can set examples for people and help them to do things by themselves.
FAIR COMPETITION!
We have grown through both mistakes and successes, and we need to turn our experiences into public goods. This starts with collaborating to fight against public bads!
The Sumatran earthquakes of 2004 and 2005:
What’s next? What can be done?
We are repeating mistakes.
Is it our intrinsic feature?
We must be brave enough to take on challenges, even at the risk of making mistakes.
Success achieved through total quality control can make people stop thinking together and thinking differently.
How can we go beyond domain-differentiated disciplines while ensuring universal access to scientific knowledge?
Millennium Development Goals and targets:
• Eradicate extreme poverty and hunger
– Reduce by half the proportion of people living on less than a dollar a day
– Reduce by half the proportion of people who suffer from hunger
• Achieve universal primary education
– Ensure that all boys and girls complete a full course of primary schooling
• Promote gender equality and empower women
– Eliminate gender disparity in primary and secondary education, preferably by 2005, and at all levels by 2015
• Reduce child mortality
– Reduce by two thirds the mortality rate among children under five
• Improve maternal health
– Reduce by three quarters the maternal mortality ratio
• Combat HIV/AIDS, malaria and other diseases
– Halt and begin to reverse the spread of HIV/AIDS
– Halt and begin to reverse the incidence of malaria and other major diseases
• Ensure environmental sustainability
– Integrate the principles of sustainable development into country policies and programmes; reverse the loss of environmental resources
– Reduce by half the proportion of people without sustainable access to safe drinking water
– Achieve significant improvement in the lives of at least 100 million slum dwellers by 2020
• Develop a global partnership for development
– Develop further an open trading and financial system that is rule-based, predictable and non-discriminatory; this includes a commitment to good governance, development and poverty reduction, nationally and internationally
– Address the least developed countries’ special needs. This includes tariff- and quota-free access for their exports; enhanced debt relief for heavily indebted poor countries; cancellation of official bilateral debt; and more generous official development assistance for countries committed to poverty reduction
– Address the special needs of landlocked and small island developing States
– Deal comprehensively with developing countries’ debt problems through national and international measures to make debt sustainable in the long term
– In cooperation with the developing countries, develop decent and productive work for youth
– In cooperation with pharmaceutical companies, provide access to affordable essential drugs in developing countries
– In cooperation with the private sector, make available the benefits of new technologies, especially information and communications technologies
Data Activities in General
• Databases are everywhere, but they are not well organized.
– Many databases, but too much duplication
– Poor interoperability
• Need for practically useful interfaces
– Piecewise
• How to integrate them for ad hoc applications
– Positive incentives to go beyond “collection”
• Next: long-tail possibilities, individual care
Working Hypothesis
• Data Science
– Friendly interface for many sciences!
• Design Science
– Value extraction/design/creation from data
• Management Science
– Knowledge (Physics, Chemistry, Mathematics, Technology)
– Environment (Nature, Artifacts, Human beings)
– Society (Politics, Economy, Sociology)
Components: Mind-Sets in E-Science
Data Science
• Universality
– Data for everyone: sharing, standards, metadata, interoperability, …
– Data of no one: equitable, universal, open, … access
• Individual care: establishing service channels
– Data services for each person and each context, with appropriate expression, timing and contents
– Differences between individuals are the key to evolution.
What are our objectives?
The improvement of the quality and accessibility of data, as well as of the methods by which data are acquired, managed, analyzed and evaluated, with particular emphasis on the digital divide.
The facilitation of national and international co-operation among those collecting, organizing and using data.
The promotion of an increased awareness in the scientific and technical community of the importance of these activities.
The consideration of data access and intellectual property issues.
Let’s work together, starting now!
Data Science is not pursued as an end in itself, but as a means for humans to attain wisdom.
Diagram illustrating how, in particular, information and knowledge derive from raw data through the understanding of relationships and then patterns.
The concepts of preservation, curation, provenance, discovery and access in the context of the research lifecycle.
H.Aourag 29
So Why Design Materials?
Systems | Experimentally known | Percent known | Maximum number (combinations of 100 elements)
Unaries | 100 | 100% | 100
Binaries | 4000 | 81% | 4950
Ternaries | 8000 | 5% | 161700
Quaternaries | 1000 | <1% | 3921225
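The maximum numbers in the table are the number of ways of choosing k distinct elements out of 100, i.e. C(100, k). The short sketch below, assuming exactly 100 elements and taking the experimentally known counts from the table, recomputes both columns; the helper names are ours, chosen for illustration.

```python
from math import comb

# Experimentally known systems, from the table above (order -> count).
KNOWN = {1: 100, 2: 4000, 3: 8000, 4: 1000}
# Assumption: the "maximum number" column counts combinations of 100 elements.
N_ELEMENTS = 100


def possible_systems(order: int, n_elements: int = N_ELEMENTS) -> int:
    """Number of distinct chemical systems made of `order` different elements."""
    return comb(n_elements, order)


def coverage(order: int) -> float:
    """Percentage of the possible systems of this order that are known experimentally."""
    return 100.0 * KNOWN[order] / possible_systems(order)


for order, name in [(1, "Unaries"), (2, "Binaries"), (3, "Ternaries"), (4, "Quaternaries")]:
    print(f"{name:<13} max={possible_systems(order):>8}  known={coverage(order):.2f}%")
```

Running it reproduces the maximum numbers 4950, 161700 and 3921225, and coverages that round to the table's 81%, 5% and <1%: the number of possible systems grows far faster than experiments can cover them.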
Combinatorial Materials Techniques
Methodology
• Computational and database software tools should be configured in a manner that maximally exploits the synergy between them
Diagram: problem solving/analysis draws on data (crystal structure, property and phase data, both experimental and calculated), theory (ab initio quantum-mechanical methods) and correlations (statistics, rules, regularities, data patterns, structure).
One of the most challenging tasks in materials science is the design of new materials with tailored properties. Two different approaches are generally explored:
► The first one consists of simulating the motion of the atoms in the material and their electronic interactions by performing ab initio calculations at the quantum-mechanical level. This approach does not (at least in principle) rely on experiments, but it is computationally demanding and can currently be applied only to a limited number of rather simple solids.
► The second approach remains at a more pragmatic level: most of our current knowledge in materials science has been collected empirically, by searching for patterns in experimental observations. During the past 100 years, huge amounts of data have been collected, making it possible to use modern computer technology to search for additional correlations. This approach, however, depends on the availability of a sufficiently large amount of experimental data of appropriate quality.
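Materials Design (diagram): starting from the Periodic Table of the Elements, design spans unary, binary, ternary, quaternary, … multinary systems (too many possibilities?), guided by regularities. Materials are characterized by their atomic constituents, structures and properties. Two directions are shown, materials design (resolution line) and prediction (production line), linking needs and functions through specification design, function design, structure design and process design.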
II. Approaches: Data-Driven Approach
- Formation of compounds in a given binary system
- Composition of stable compounds in “compound formers”
- Structures of a given compound
- Properties of a given compound
Postulation: the property of a material can be expressed in terms of elemental property parameters (EPPs).
Tool: Materials Databases: Pauling File
Use a comprehensive materials database to reveal regularities:
Basic Idea
phase diagrams + crystal structures + physical properties together in the world's largest database for inorganic compounds
Purpose of Mapping (Data-Driven Approach)
Two key points in mapping:
• Characterization: find optimal coordinates, i.e., proper elemental properties as axes
• Classification: define the meaning of domains, i.e., groups of substances with the same or similar structures/properties
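The classification step can be illustrated with a deliberately simple rule: a new substance is assigned to the domain whose centroid on the map is closest. Nearest-centroid assignment is our choice for this sketch, not necessarily what the source uses; the domain labels and coordinates below are hypothetical.

```python
from math import dist
from statistics import mean

Point = tuple[float, float]


def centroids(domains: dict[str, list[Point]]) -> dict[str, Point]:
    """Mean map position of each domain (a group of substances with the same structure)."""
    return {
        label: (mean(x for x, _ in points), mean(y for _, y in points))
        for label, points in domains.items()
    }


def classify(point: Point, centres: dict[str, Point]) -> str:
    """Assign a substance to the domain whose centroid is nearest in map coordinates."""
    return min(centres, key=lambda label: dist(point, centres[label]))


# Hypothetical map coordinates on two elemental-property axes.
known_domains = {
    "structure type 1": [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18)],
    "structure type 2": [(0.80, 0.70), (0.75, 0.85), (0.90, 0.78)],
}
centres = centroids(known_domains)
print(classify((0.2, 0.3), centres))
```

A centroid rule only works when domains are compact and well separated on the chosen axes, which is exactly what a good choice of coordinates (the characterization step) is meant to achieve.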
Modeling-Driven Approach
Calculations based on various physical models provide:
• a complement to empirical data, i.e., new data;
• further screening and prediction of hypotheses;
• insight into the origin of observed regularities;
• prediction of materials with required properties.
Theoretical Approaches
• First-Principles Electronic Structure (FLAPW, Wien)
• Ab initio Molecular Dynamics (Car-Parrinello MD in CPMD; Born-Oppenheimer MD in VASP)
• Cluster Expansion Method (CEM)
• Cluster Variation Method (CVM)
• Phase Field Method (PFM)
• Classical Molecular Dynamics (MD)
• …
Data/Modeling-Driven Approach
Diagram: from the Periodic Table of the Elements, design spans unary, binary, ternary, quaternary, … multinary materials (too many possibilities?). Regularities connect the two routes: the data-driven approach (discovery) and the model-driven approach (origin).
Diagram: density vs. melting point; for each property, compounds cluster by structure.
Purpose: find regularities between crystal structures and element properties
Structural Map
Structural map workflow (diagram), with decreasing possibilities at each step:
• Definition of domains: ~3,500 conventional structure types, reduced to ~30 atomic environment types
• Optimal coordinates: 56 element property parameters, reduced to the 6 most distinct EPP groups (atomic number, group number, Mendeleev number, cohesion energy, electrochemical factor, size), then to 2-3 optimal EPP expressions
• EPP expressions: EP(tot) = EP(A) op EP(B), where op is one of
– Sum: EP(A) + EP(B)
– Difference: EP(A) - EP(B)
– Product: EP(A) * EP(B)
– Ratio: EP(A) / EP(B)
– Maximum: Max(EP(A), EP(B))
– Minimum: Min(EP(A), EP(B))
• Selection criterion: maximum gap
• Result: distributions, patterns, …
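A minimal sketch of the expression search, assuming two classes of binary compounds AB and reading "maximum gap" as: keep the expression EP(A) op EP(B) that leaves the widest empty interval between the two classes. That reading is our interpretation; the property names P1 and P2, the element labels and all values are hypothetical placeholders, not EPPs from the database.

```python
import operator
from itertools import product

# The six operations listed above for combining EP(A) and EP(B).
OPERATIONS = {
    "sum": operator.add,
    "difference": operator.sub,
    "product": operator.mul,
    "ratio": operator.truediv,
    "maximum": max,
    "minimum": min,
}


def epp_expression(ep_a: float, ep_b: float, op: str) -> float:
    """EP(tot) = EP(A) op EP(B)."""
    return OPERATIONS[op](ep_a, ep_b)


def gap(first: list[float], second: list[float]) -> float:
    """Width of the empty interval between two classes on one axis (negative if they overlap)."""
    return max(min(second) - max(first), min(first) - max(second))


def best_expression(compounds, labels, properties):
    """Return (property, operation, gap) with the widest gap between the two classes.

    compounds:  list of (element_A, element_B) pairs
    labels:     class label of each compound (exactly two distinct labels)
    properties: {property_name: {element: value}}
    """
    class_a, class_b = sorted(set(labels))
    best = None
    for prop, op in product(properties, OPERATIONS):
        table = properties[prop]
        values = {class_a: [], class_b: []}
        for (a, b), label in zip(compounds, labels):
            values[label].append(epp_expression(table[a], table[b], op))
        g = gap(values[class_a], values[class_b])
        if best is None or g > best[2]:
            best = (prop, op, g)
    return best


# Hypothetical data: two property tables and four binary compounds in two classes.
properties = {
    "P1": {"E1": 1, "E2": 2, "E3": 3, "E4": 4},
    "P2": {"E1": 5, "E2": 5, "E3": 5, "E4": 5},
}
compounds = [("E1", "E2"), ("E1", "E3"), ("E3", "E4"), ("E2", "E4")]
labels = ["former", "former", "non-former", "non-former"]
print(best_expression(compounds, labels, properties))
```

With these toy values the product of P1 separates the two classes most widely, while the constant P2 separates nothing. A real search would repeat this over the EPP groups and over pairs of expressions for the two map axes.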
Compound Formation Map
Separation of 2,330 binary systems into compound formers (blue) and non-formers (yellow) in a compound formation map showing max[PN(A) / PNmax, PN(B) / PNmax] (y-axis) vs. [PN(A) / PNmax × PN(B) / PNmax] (x-axis), where PN is the Periodic Number (a distinct integer assigned to each chemical element based on its position in Mendeleev's periodic system)
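The map coordinates in the caption can be computed directly from the Periodic Numbers. The sketch below assumes PN values are positive integers no larger than PNmax; the numbers in the usage line are hypothetical, not the actual Periodic Numbers of any element.

```python
def formation_map_coords(pn_a: int, pn_b: int, pn_max: int) -> tuple[float, float]:
    """Position of binary system A-B in the compound formation map.

    x = PN(A)/PNmax * PN(B)/PNmax
    y = max(PN(A)/PNmax, PN(B)/PNmax)
    """
    for pn in (pn_a, pn_b):
        if not 1 <= pn <= pn_max:
            raise ValueError(f"Periodic Number {pn} outside 1..{pn_max}")
    ra, rb = pn_a / pn_max, pn_b / pn_max
    return ra * rb, max(ra, rb)


# Hypothetical Periodic Numbers, for illustration only.
x, y = formation_map_coords(20, 50, pn_max=100)
print(f"x = {x:.2f}, y = {y:.2f}")
```

Both coordinates lie in (0, 1], and the map is symmetric in A and B, so the order in which the two elements are given does not matter.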
Atomic environment type stability map for AB compounds
Atomic environment type (AET) stability map showing the Periodic Number PNmax (y-axis) vs. PNmin / PNmax (x-axis) for equiatomic AB compounds [4]. The AET of the element with the highest Periodic Number is given on the left-hand side of x = 1, and the AET of the element with the lowest Periodic Number in the same compound on the right-hand side of the same row.
Generalized atomic environment type matrix
Generalized AET matrix PN(A) vs. PN(B), which is independent of the stoichiometry and the number of chemical elements in the compound [5]. The element A occupying the center of the AET is given on the y-axis and the coordinating element B on the x-axis.
Phase Diagrams
Distribution according to publication year: 38'592 database entries processed (06.2012)
Crystal Structures
● Journals: 81'290 publications processed (09.2012): Acta Crystallographica, Journal of Alloys and Compounds, Journal of Solid State Chemistry, Zeitschrift für Anorganische und Allgemeine Chemie, Inorganic Materials, Russian Journal of Inorganic Chemistry, Inorganic Chemistry, Physical Review B, Zeitschrift für Kristallographie, C.R. des Seances de l'Academie des Sciences, Materials Research Bulletin, American Mineralogist, others
Distribution according to journal
Distribution according to chemical class: 1 or 2 elements: 52'769; 3 elements: 99'438; 4 or more elements: 100'392
Physical Properties
● Journals: 33'458 publications processed (06.2012): Physical Review B, Journal of Alloys and Compounds, Solid State Communications, Physica B+C, Journal of Magnetism and Magnetic Materials, Journal of Solid State Chemistry, Journal of Physics: Condensed Matter, Physica Status Solidi A, Journal of the Physical Society of Japan, Physical Review Letters, Materials Research Bulletin, Journal of Applied Physics, others
Distribution according to journal
Distribution according to chemical class: 1 or 2 elements: 34'889; 3 elements: 30'627; 4 or more elements: 25'618
● Property classes: (1) mechanical, (2) thermal and thermodynamic, (3) electronic and electrical, (4) optical, (5) ferroelectric, (6) magnetic, (7) superconductor properties
● Data categories (from bottom to top): numerical values, figure descriptions, additional data
Distribution according to property class
AMASS – 7/25/03
Predicting Properties with Atomistic Modeling
Diagram: atomistic modeling (atom positions, electronic structure, energies) is linked to macroscopic properties (elastic properties, conductivity, toxicity) by three routes:
• Direct calculation: band gap, elastic constants
• Physical laws and constitutive relations: segregation energies and activation barriers → embrittlement, transport
• Data mining of atomic-scale descriptors: weldability, toxicity
Power of Data Mining
• Does not require complete and accurate multiscale theories
• New physics in the relationships R
• Quick, cheap screening for desired properties, errors, etc.; can be qualitative
Diagram:
– Use known data to establish R: calculated atomistic properties database → R → measured macroscopic properties database
– Use R to predict new data: calculated atomistic properties database → R → predicted macroscopic properties database
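The two-step workflow (use known data to establish R, then use R to predict new data) can be sketched with the simplest possible R: a straight line fitted by ordinary least squares between one calculated atomistic descriptor and one measured macroscopic property. The linear form, the single descriptor and all numbers are assumptions for illustration only.

```python
def fit_relationship(descriptors: list[float], properties: list[float]) -> tuple[float, float]:
    """Establish R from known data: least-squares line property ~ slope * descriptor + intercept."""
    n = len(descriptors)
    if n != len(properties) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptors) / n
    mean_y = sum(properties) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    if sxx == 0:
        raise ValueError("descriptor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptors, properties))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(relationship: tuple[float, float], descriptors: list[float]) -> list[float]:
    """Use R to turn calculated descriptors into predicted macroscopic properties."""
    slope, intercept = relationship
    return [slope * x + intercept for x in descriptors]


# Hypothetical training data: calculated descriptor vs. measured property.
calculated_known = [1.0, 2.0, 3.0, 4.0]
measured_known = [3.0, 5.0, 7.0, 9.0]
R = fit_relationship(calculated_known, measured_known)

# New compounds for which only the calculated descriptor is available.
print(predict(R, [5.0, 6.0]))
```

In practice R is usually multivariate and nonlinear; the point of the sketch is only the separation between fitting on measured data and predicting from calculated data.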
Key Issues
• Descriptors accessible to modeling
• Descriptors optimally chosen
– Use known relationships/physics
– Optimize from a large set of possibilities
• Descriptor→property relationship is robust
– Sensible choice of methods
– Tested with cross-validation and test sets
• Data
– Large enough
– Clean enough
Diagram: atomic-scale descriptors → data mining → macroscopic properties
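Testing the robustness of the descriptor→property relationship with cross-validation can be sketched as k-fold cross-validation of a one-descriptor least-squares line. The model, the seeded shuffle and the function names are our assumptions; the score returned is the mean held-out root-mean-square error.

```python
import math
import random


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_rmse(xs: list[float], ys: list[float], k: int = 5, seed: int = 0) -> float:
    """Mean held-out RMSE of the descriptor->property line over k folds."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded, so the folds are reproducible
    fold_scores = []
    for i in range(k):
        held_out = set(indices[i::k])  # every k-th shuffled index forms one fold
        train = [j for j in indices if j not in held_out]
        slope, intercept = fit_line([xs[j] for j in train], [ys[j] for j in train])
        sq_err = [(slope * xs[j] + intercept - ys[j]) ** 2 for j in held_out]
        fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / k


# Hypothetical data: an exactly linear relationship and a noisy version of it.
xs = [float(i) for i in range(10)]
exact = [2 * x + 1 for x in xs]
noisy = [y + (0.5 if i % 2 else -0.5) for i, y in enumerate(exact)]
print(k_fold_rmse(xs, exact), k_fold_rmse(xs, noisy))
```

Each fold is scored only on compounds the line never saw, which is what makes the estimate a test of robustness rather than of goodness of fit.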
It is common for chemists to propose new compounds by substituting a chemically similar ion into a known compound. For instance, as illustrated in Figure 1, knowing that BaTiO3 forms a perovskite structure, one can deduce that replacing Ba2+ with a chemically similar ion such as Ca2+ is likely to give the same structure.
Data-mined tendency for ionic substitutions. Red indicates a high substitution tendency; blue indicates that the two ions tend not to substitute for each other.
Procedure for proposing new compound candidates in a quaternary system using the ionic substitution probability
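A heavily simplified stand-in for this procedure: replace one ion of a known compound at a time and keep only replacements whose pairwise substitution probability reaches a threshold. The probabilities below are made-up placeholders, not data-mined values, and the sketch checks neither charge balance nor structural stability, which a real workflow would have to do.

```python
Composition = dict[str, int]  # ion -> number of that ion per formula unit

# Made-up placeholder probabilities for illustration only (not data-mined values).
# Keys are unordered ion pairs, so the lookup is symmetric.
SUBSTITUTION_PROB = {
    frozenset({"Ba2+", "Sr2+"}): 0.9,
    frozenset({"Ba2+", "Ca2+"}): 0.8,
    frozenset({"Ti4+", "Zr4+"}): 0.5,
    frozenset({"Ba2+", "K+"}): 0.05,
}


def substitution_candidates(
    known: Composition, probs: dict[frozenset, float], threshold: float
) -> list[tuple[Composition, float]]:
    """Propose new compositions by replacing one ion of a known compound at a time.

    A replacement is kept only if the substitution probability of the
    (old ion, new ion) pair is at least `threshold`. Charge balance is NOT checked.
    """
    candidates = []
    for old_ion in known:
        for pair, p in probs.items():
            if old_ion not in pair or p < threshold:
                continue
            (new_ion,) = pair - {old_ion}
            if new_ion in known:
                continue  # would merge two sites; skipped in this sketch
            new = {(new_ion if ion == old_ion else ion): n for ion, n in known.items()}
            candidates.append((new, p))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


barium_titanate = {"Ba2+": 1, "Ti4+": 1, "O2-": 3}
for composition, p in substitution_candidates(barium_titanate, SUBSTITUTION_PROB, 0.3):
    print(composition, p)
```

With a threshold of 0.3 the sketch proposes the Sr, Ca and Zr analogues of BaTiO3, ranked by probability, and drops the low-probability K+ replacement, which would in any case not be charge balanced.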
• The Materials Genome Initiative will create a new era of materials innovation that will serve as a foundation for strengthening domestic industries in these fields. This initiative offers a unique opportunity for the United States to discover, develop, manufacture, and deploy advanced materials at least twice as fast as is possible today, at a fraction of the cost. Essential to this effort is the development of a data infrastructure that provides the needed data and tools. Among the fundamental data needed for this infrastructure are phase-based materials data.
On-Line Distributed Materials Development: the aflowlib.org Consortium
Stefano Curtarolo, Duke University, DMR 0639822
Creation of the AFLOWLIB.ORG repository of electronic structures:
• High-throughput data mining
• Phenomenological rules
• Automatic correlations
Take-home message: high-throughput ab initio calculations are used to study:
• Thermoelectrics
• Photovoltaics
• Topological insulators
• Scintillators
• Magnetic alloys
INTELLECTUAL MERIT
ACS Comb. Sci. 13(4), 382-390 (2011), Comp. Mat. Sci. 49, 299-312 (2010)
BROADER IMPACT
Technological outputs: scintillator design
Serving the community: online infrastructure, aconvasp-online (web interface for high-throughput electronic structure calculations)
Pearson's Crystal Data is a crystallographic database published by ASM International (Materials Park, Ohio, USA), edited by Pierre Villars and Karin Cenzual. It has its roots in the well-known PAULING FILE project and contains crystal structures of a large variety of inorganic materials and compounds. The "PCD" (as it is typically abbreviated) is a collaboration between ASM International and Material Phases Data System, Vitznau, Switzerland (MPDS), aiming to create and maintain the world's largest critically evaluated "Non-organic database".
The current release 2013/14 contains more than 242,600 structural data sets (including atom coordinates and displacement parameters, when determined) for about 141,600 different chemical formulas, roughly 16,000 experimental powder diffraction patterns and about 220,000 calculated patterns (interplanar spacings, intensities, Miller indices). This release achieves nearly full overlap with ICSD entries.