Materials Genome Project

For a fisherman, efficient data mining means deducing where he has the highest probability of finding fish, but does not guarantee that he will catch one.

H. Aourag, Tlemcen University, Algeria

Differences: Mass Production of Data

Differences: Society

• We should not be divided into developed countries and developing countries, but we are developing our common futures together with the gifts from friends, i.e., data & knowledge.

• We need to work together.

Essential changes are

• In the information era, the actors in science and technology (S&T) are not only experts but people in general.

• Techno-democracy through IT may emerge as a new relation between people and S&T, in which experts show exemplars and help people to do things by themselves.

FAIR COMPETITION!

We have been emerging with mistakes and successes, and we need to make our experiences into public goods. It starts from our collaboration fighting against public bads!

Floods

August 10, 2002

August 11, 2002

August 15, 2002 Oregon (AP)

Starvation

Water Shortage

Desertification and Refugee

Malaria and Climate Change

Refugee

The Sumatran earthquakes of 2004 and 2005: What’s next? What can be done?

Lessons from Failures

We are repeating mistakes.

Is it our intrinsic feature?

To be brave enough to make challenges (= mistakes and/or challenges).

Successes by total quality control let people stop thinking together and differently.

How can we go beyond domain-differentiated disciplines and ensure universal access to scientific knowledge?

Eradicate extreme poverty and hunger
• Reduce by half the proportion of people living on less than a dollar a day
• Reduce by half the proportion of people who suffer from hunger

Achieve universal primary education
• Ensure that all boys and girls complete a full course of primary schooling

Promote gender equality and empower women
• Eliminate gender disparity in primary and secondary education preferably by 2005, and at all levels by 2015

Reduce child mortality
• Reduce by two thirds the mortality rate among children under five

Improve maternal health
• Reduce by three quarters the maternal mortality ratio

Combat HIV/AIDS, malaria and other diseases
• Halt and begin to reverse the spread of HIV/AIDS
• Halt and begin to reverse the incidence of malaria and other major diseases

Ensure environmental sustainability
• Integrate the principles of sustainable development into country policies and programmes; reverse loss of environmental resources
• Reduce by half the proportion of people without sustainable access to safe drinking water
• Achieve significant improvement in lives of at least 100 million slum dwellers, by 2020

Develop a global partnership for development
• Develop further an open trading and financial system that is rule-based, predictable and non-discriminatory. Includes a commitment to good governance, development and poverty reduction, nationally and internationally
• Address the least developed countries’ special needs. This includes tariff- and quota-free access for their exports; enhanced debt relief for heavily indebted poor countries; cancellation of official bilateral debt; and more generous official development assistance for countries committed to poverty reduction
• Address the special needs of landlocked and small island developing States
• Deal comprehensively with developing countries’ debt problems through national and international measures to make debt sustainable in the long term
• In cooperation with the developing countries, develop decent and productive work for youth
• In cooperation with pharmaceutical companies, provide access to affordable essential drugs in developing countries
• In cooperation with the private sector, make available the benefits of new technologies, especially information and communications technologies

Data Activities in General

• Databases everywhere, but not well organized.
– Many databases, but too many duplications
– Less interoperability
  • Necessity to make practically useful interface
– Piecewise
  • How to integrate for ad hoc application
– Positive incentives to go beyond “collection”
• Next: Long Tail Possibilities, Individual Cares

Working Hypothesis

• Data Science
– Friendly interface for many sciences!
• Design Science
– Value extraction/design/creation from data
• Management Science
– Knowledge (Physics, Chemistry, Mathematics, Technology)
– Environment (Nature, Artifact, Human beings)
– Society (Politics, Economy, Sociology)

Components : Mind Sets in E-Science

Data Science

• Universality
– Data for everyone
  • Sharing, standards, metadata, interoperability, …
– Data of no one
  • Equitable, universal, open, … access
• Individual Care: establishing service channels
– Data services for each person and each context with appropriate expression, timing and contents.
– Differences of individuals are the key for evolutions.

What are our objectives?

The improvement of the quality and accessibility of data, as well as the methods by which data are acquired, managed, analyzed and evaluated, with particular emphasis on digital divide.

The facilitation of national and international co-operation among those collecting, organizing and using data.

The promotion of an increased awareness in the scientific and technical community of the importance of these activities.

The consideration of data access and intellectual property issues.

Let’s work together from now!

Data Science is not pursued as an end in itself, but as a means to the attainment of wisdom as human.

Diagram illustrating how, in particular, information and knowledge derive from raw data through the understanding of relationships and then patterns.

The concepts of preservation, curation, provenance, discovery, access in the context of the research lifecycle.

Human-beings : Human Genome

AMASS – 7/25/03

Our Genome

So Why Design Materials?

Systems        Experimentally Known   Percent Known   Maximum Number
Unaries        100                    100%            100
Binaries       4000                   81%             4950
Ternaries      8000                   5%              161700
Quaternaries   1000                   <1%             3921225
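The "Maximum Number" column matches the number of ways to choose n elements out of 100 without regard to order. A minimal sketch that reproduces it, assuming the slide's round figure of 100 elements (the names `N_ELEMENTS`, `KNOWN` and `max_systems` are illustrative, not from the slides):

```python
from math import comb

N_ELEMENTS = 100  # assumption: the round number of elements used in the table

# Experimentally known systems, copied from the table above.
KNOWN = {1: 100, 2: 4000, 3: 8000, 4: 1000}
NAMES = {1: "Unaries", 2: "Binaries", 3: "Ternaries", 4: "Quaternaries"}


def max_systems(order, n_elements=N_ELEMENTS):
    """Number of distinct element combinations of the given order."""
    return comb(n_elements, order)


for order, n_known in KNOWN.items():
    total = max_systems(order)
    percent = 100 * n_known / total
    print(f"{NAMES[order]:<13}{n_known:>6}{total:>10}{percent:>9.2f}%")
```

The point of the table is the growth rate: the known fraction collapses as soon as the order exceeds two, which is the argument for guided design rather than exhaustive synthesis.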

Combinatorial Materials Techniques

Methodology

• Computational and database software tools should be configured in a manner that maximally exploits the synergy between them

Problem Solving / Analysis
• Data: crystal structure, property, phase data, both experimental and calculated
• Theory: ab initio quantum mechanical methods
• Correlations: statistics, rules, regularities, data patterns, structure

One of the most challenging tasks in materials science is the design of new materials with tailored properties. Two different approaches are generally explored:

► The first one consists of simulating the motion of the atoms in the material and their electronic interactions by performing ab-initio calculations at the quantum-mechanical level. This approach does (at least in principle) not rely on experiments, but is computationally demanding and can currently only be applied to a limited number of rather simple solids.

► The second approach remains at a more pragmatic level: Most of our current knowledge in materials science has been collected empirically, by searching for patterns in experimental observations. During the past 100 years, huge amounts of data have been collected making it possible to use modern computer technology to search for additional correlations. This approach, however, depends on the availability of a sufficiently large amount of experimental data of appropriate quality.

Regularities ?

Materials Design

Periodic Table of the Elements

Too many possibilities…?

Design: Unary, Binary, Ternary, Quaternary, … Multinary

Materials

Properties, Structures, Atomic Constituents

Materials Design (Resolution Line)

Prediction (Production Line)

Functions, Needs

Specification design

Function design

Structure design

Process design

II. Approaches: Data-Driven Approach

- Formation of compound in a given binary system
- Composition of stable compounds in “compound formers”
- Structures of a given compound
- Properties of a given compound

Postulation

Property of Materials

Elemental Property Parameters (EPPs)

Expression

Tool: Materials Databases: Pauling File

Based on the comprehensive materials database to reveal regularities:

Basic Idea

phase diagrams + crystal structures + physical properties together in the world's largest database for inorganic compounds

Purpose of Mapping

Mapping

Data-Driven Approach

Two key points in mapping

Characterization: To find optimal coordinates

Classification: To define meaning of domains

Proper Elemental Properties as Axes

Substances in same/similar structure/properties Groups

Modeling-Driven Approach

Calculations based on various physical models provide:
• A complement to empirical data, and new data;
• Further screening and prediction of hypotheses;
• Understanding of, and insight into, the origin;
• Prediction of materials with required properties.

Theoretical Approaches

• First Principles Electronic Structures (FLAPW, Wien)

• Car-Parrinello Molecular Dynamics (CPMD, VASP)

• Cluster Expansion Method (CEM)

• Cluster Variation Method (CVM)

• Phase Field Method (PPM)

• Classical Molecular Dynamics (MD)

• ……

Modeling-Driven Approach

Data/Modeling-Driven Approach

Periodic Table of the Elements: too many possibilities…?

Design

Regularities

Unary, Binary, Ternary, Quaternary, … Multinary

Materials

Model-Driven Approach: Origin

Data-Driven Approach: Discovery

Density vs. Melting point: each property clusters structures

Purpose: Regularity between Crystal structure & Element properties

Structural Map

Definition of domains: ~3,500 conventional structure types

Optimal coordinates: 56 Element Property Parameters

? Decreasing possibilities !

6 most distinct EPP groups
• Atomic number
• Group number
• Mendeleev number
• Cohesion energy
• Electrochemical factor
• Size

Operations
• Sum: EP(A) + EP(B)
• Difference: EP(A) - EP(B)
• Product: EP(A) * EP(B)
• Ratio: EP(A) / EP(B)
• Maximum: Max(EP(A), EP(B))
• Minimum: Min(EP(A), EP(B))

EPP: EP(tot) = EP(A) op EP(B)
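The six operations can be applied mechanically to any elemental property of a binary A-B pair. A minimal sketch, assuming the property values are plain numbers; the names `OPERATIONS` and `epp_expressions` are illustrative, and atomic number is used here only because its values are unambiguous:

```python
# The six binary operations listed above, keyed by name.
OPERATIONS = {
    "sum": lambda a, b: a + b,
    "difference": lambda a, b: a - b,
    "product": lambda a, b: a * b,
    "ratio": lambda a, b: a / b,
    "maximum": max,
    "minimum": min,
}


def epp_expressions(ep_a, ep_b):
    """Return EP(tot) = EP(A) op EP(B) for every operation."""
    return {name: op(ep_a, ep_b) for name, op in OPERATIONS.items()}


# Example: atomic numbers of Na (11) and Cl (17).
for name, value in epp_expressions(11, 17).items():
    print(f"{name:<11}{value}")
```

Each resulting expression is a candidate map coordinate; the search for the 2-3 optimal ones then amounts to ranking such candidates by how well they separate the structure domains.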

2-3 Optimal EPP Expressions

~30 Atomic Environment Types

Conventional structure types

Max Gap

Distribution, Patterns, …… ☺

Compound Formation Map

Separation of 2,330 binary systems into compound formers (blue) and non-formers (yellow) in a compound formation map showing max[PN(A) / PNmax, PN(B) / PNmax] (y-axis) vs. [PN(A) / PNmax × PN(B) / PNmax] (x-axis), where PN is the Periodic Number (a distinct integer assigned to each chemical element based on its position in Mendeleev's periodic system)
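The two map coordinates in this caption follow directly from the Periodic Numbers of the two elements. A minimal sketch of that calculation; the function name and the numeric values in the example are placeholders, not actual Periodic Number assignments:

```python
def formation_map_point(pn_a, pn_b, pn_max):
    """Return (x, y) for a binary system A-B on the compound formation map.

    x = [PN(A) / PNmax] * [PN(B) / PNmax]
    y = max[PN(A) / PNmax, PN(B) / PNmax]
    """
    ratio_a = pn_a / pn_max
    ratio_b = pn_b / pn_max
    return ratio_a * ratio_b, max(ratio_a, ratio_b)


# Placeholder values only; real Periodic Numbers come from the periodic system.
x, y = formation_map_point(pn_a=20, pn_b=50, pn_max=100)
print(x, y)
```

Both coordinates are symmetric in A and B, so a binary system maps to the same point whichever element is listed first.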

Atomic environment type stability map for AB compounds

Atomic environment type (AET) stability map showing the Periodic Number PNmax (y-axis) vs. PNmin / PNmax (x-axis) for equiatomic AB compounds [4]. AET of the element with the highest Periodic Number is given on the left-hand side of x = 1, AET of the element with the lowest Periodic Number in the same compound on the right-hand side in the same row.

Generalized atomic environment type matrix

Generalized AET matrix PN(A) vs. PN(B), which is independent of the stoichiometry and the number of chemical elements in the compound [5]. The element A occupying the center of the AET is given on the y-axis and the coordinating element B on the x-axis.

Phase Diagrams

38'592 database entries processed 06.2012

distribution according to publication year

distribution according to chemical class: binary systems 11'027, ternary systems 27'565

Crystal structures

81'290 publications processed 09.2012

distribution according to journal

● journals: Acta Crystallographica; Journal of Alloys and Compounds; Journal of Solid State Chemistry; Zeitschrift für Anorganische und Allgemeine Chemie; Inorganic Materials; Russian Journal of Inorganic Chemistry; Inorganic Chemistry; Physical Review B; Zeitschrift für Kristallographie; C.R. des Seances de l'Academie des Sciences; Materials Research Bulletin; American Mineralogist; others

252'599 database entries processed 09.2012

distribution according to publication year

distribution according to chemical class: 1 or 2 elements 52'769, 3 elements 99'438, 4 or more elements 100'392

Physical properties

33'458 publications processed 06.2012

distribution according to journal

● journals: Physical Review B; Journal of Alloys and Compounds; Solid State Communications; Physica B+C; Journal of Magnetism and Magnetic Materials; Journal of Solid State Chemistry; Journal of Physics: Condensed Matter; Physica Status Solidi A; Journal of the Physical Society of Japan; Physical Review Letters; Materials Research Bulletin; Journal of Applied Physics; others

91'134 database entries processed 06.2012

distribution according to publication year

distribution according to chemical class: 1 or 2 elements 34'889, 3 elements 30'627, 4 or more elements 25'618

distribution according to property class

● property class
1 mechanical properties
2 thermal and thermodynamic properties
3 electronic and electrical properties
4 optical properties
5 ferroelectric properties
6 magnetic properties
7 superconductor properties

● data category (from bottom to top)
- numerical values
- figure descriptions
- additional data

Predicting Properties with Atomistic Modeling

Atomistic modeling
• Atom positions
• Electronic structure
• Energies

Macroscopic properties
• Elastic properties
• Conductivity
• Toxicity

Direct calculation: Band Gap, Elastic Constants

Segregation Energies, Activation Barriers

Physical laws, Constitutive relations

Embrittlement, Transport

Weldability, Toxicity

Data Mining, Atomic Scale Descriptors

Power of Data Mining

• Does not require complete and accurate multiscale theories
• New physics in relationships R
• Quick, cheap screening for desired properties, errors, etc. (can be qualitative)

Use known data to establish R:
Calculated Atomistic Properties Database → R → Measured Macroscopic Properties Database

Use R to predict new data:
Calculated Atomistic Properties Database → R → Predicted Macroscopic Properties Database
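The two-step idea (establish R from known pairs, then apply R to new descriptors) can be sketched with the simplest possible R, a least-squares straight line. This is only an illustration: the slides do not say what form R takes, and the data below are made up.

```python
def fit_linear(descriptors, properties):
    """Least-squares fit of property = slope * descriptor + intercept."""
    n = len(descriptors)
    mean_x = sum(descriptors) / n
    mean_y = sum(properties) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptors, properties))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(relationship, descriptor):
    """Apply the fitted relationship R to a new calculated descriptor."""
    slope, intercept = relationship
    return slope * descriptor + intercept


# Step 1: use known data to establish R (made-up numbers).
calculated = [1.0, 2.0, 3.0, 4.0]   # atomistic descriptor
measured = [3.0, 5.0, 7.0, 9.0]     # macroscopic property
R = fit_linear(calculated, measured)

# Step 2: use R to predict the property of a material not yet measured.
print(predict(R, 5.0))
```

In practice R would be a richer model over many descriptors, but the workflow of fitting on the measured set and predicting on the calculated-only set stays the same.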

Key Issues

– Descriptors accessible to modeling
– Descriptors optimally chosen
  • Use known relationships/physics
  • Optimize from large set of possibilities
– Descriptors→Property relationship is robust
  • Sensible choice of methods
  • Tested with cross validation, test sets
– Data
  • Large enough
  • Clean enough
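The robustness check named above, cross validation, can be sketched in its leave-one-out form: each data point is held out in turn, the relationship is refitted on the rest, and the held-out point is predicted. The linear model and the numbers are assumptions chosen for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_rmse(xs, ys):
    """Root-mean-square error of predictions for held-out points."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit without point i, then predict point i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((slope * xs[i] + intercept - ys[i]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


clean = leave_one_out_rmse([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
dirty = leave_one_out_rmse([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 20.0])
print(clean, dirty)
```

A single bad entry (the last value in the second data set) inflates the held-out error sharply, which is why "clean enough" sits next to "large enough" in the list.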

Macroscopic Properties

Data Mining, Atomic scale descriptors

It is common for chemists to propose new compounds from the substitution of another, chemically similar, ion. For instance, as illustrated in Figure 1, knowing that BaTiO3 forms a perovskite structure, one can deduce that another chemically similar ion such as Ca2+ is likely to form the same structure.

Data-mined tendency for ionic substitutions. Red indicates high substitution tendency. Blue indicates that the two ions tend not to substitute.

Procedure for proposing new compound candidates in a quaternary system using the ionic substitution probability
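A much-simplified sketch of the idea: start from a known compound, look up ion pairs with a high substitution tendency, and propose the swapped compositions as candidates. The probability values, the threshold and all names below are hypothetical; the actual procedure and data-mined probabilities are not given in the slides.

```python
# Hypothetical pairwise substitution tendencies (not data-mined values).
SUBSTITUTION_PROB = {
    frozenset({"Ba2+", "Ca2+"}): 0.8,
    frozenset({"Ba2+", "Sr2+"}): 0.9,
    frozenset({"Ti4+", "Zr4+"}): 0.7,
    frozenset({"Ba2+", "Ti4+"}): 0.01,
}


def propose_candidates(known_ions, threshold=0.5):
    """Swap one ion of a known compound for a likely substitute."""
    candidates = []
    for pair, prob in SUBSTITUTION_PROB.items():
        if prob < threshold:
            continue  # the two ions tend not to substitute
        for old in pair:
            (new,) = pair - {old}
            if old in known_ions and new not in known_ions:
                ions = tuple(sorted(new if ion == old else ion
                                    for ion in known_ions))
                candidates.append((ions, prob))
    # Most likely substitutions first.
    return sorted(candidates, key=lambda item: -item[1])


# Ions of BaTiO3, the perovskite example from the text.
for ions, prob in propose_candidates(("Ba2+", "Ti4+", "O2-")):
    print(ions, prob)
```

Candidates produced this way are only suggestions of where to look; their stability would still have to be checked by calculation or experiment.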

• The Materials Genome Initiative will create a new era of materials innovation that will serve as a foundation for strengthening domestic industries in these fields. This initiative offers a unique opportunity for the United States to discover, develop, manufacture, and deploy advanced materials at least twice as fast as possible today, at a fraction of the cost. Essential to this effort is the development of a data infrastructure that will provide the needed data and tools to support this effort. Some of the fundamental data needed for this infrastructure is phase based material data.  

Quantum Materials Informatics Project

On-Line Distributed Materials Development: the aflowlib.org Consortium

Stefano Curtarolo, Duke University, DMR 0639822

Creation of the AFLOWLIB.ORG repository of electronic structures.
• High-throughput data-mining
• Phenomenological rules
• Automatic correlations

Take-home message: high-throughput ab-initio is used to study:
• Thermoelectrics
• Photovoltaics
• Topological insulators
• Scintillators
• Magnetic alloys

INTELLECTUAL MERIT

ACS Comb. Sci. 13(4), 382-390 (2011), Comp. Mat. Sci. 49, 299-312 (2010)


BROADER IMPACT

Technological outputs: scintillators design

Serving the community: online-infrastructure, aconvasp-online (web interface for high-throughput electronic structure calculations)

Pearson's Crystal Data is a crystallographic database published by ASM International (Materials Park, Ohio, USA), edited by Pierre Villars and Karin Cenzual. It has its roots in the well-known PAULING FILE project and contains crystal structures of a large variety of inorganic materials and compounds. The "PCD" (as it is typically abbreviated) is a collaboration between ASM International and Material Phases Data System, Vitznau, Switzerland (MPDS), aiming to create and maintain the world's largest critically evaluated "Non-organic database".

The current release 2013/14 contains more than 242,600 structural data sets (including atom coordinates and displacement parameters, when determined) for about 141,600 different chemical formulas, roughly 16,000 experimental powder diffraction patterns and about 220,000 calculated patterns (interplanar spacings, intensities, Miller indices). This release achieves nearly full overlap with ICSD entries.

Let’s work together from now!

Complementary Set

Data Science is not pursued as an end in itself, but as a means to the attainment of wisdom as human.