using mongodb for materials discovery

Post on 29-Jul-2015


Using MongoDB for Materials Discovery

Michael Kocher and Dan Gunter

Lawrence Berkeley National Lab

Energy Mission at LBNL

• Li-ion Batteries

• Photovoltaic (Solar Cells)

• Thermoelectrics

• Biofuels

• New Computational Tools

• Cutting edge Spectroscopic Tools (Advanced Light Source)

http://carboncycle2.lbl.gov/

Current Material Design model is Slow

18 years: the average time from the discovery of a new material to its commercialization

Bringing New Materials to the Market: Eagar, T.W. Technology Review Feb 1995, 98, 42.

Materials Genome Initiative: A Renaissance of American Manufacturing

“To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.”

- President Obama at Carnegie Mellon University 6/24/2011

What is a Material?

[Figure: example crystal structures: NaCl, Silicon, LiCoO2 (atoms labeled Li, O, Co)]

What can we Compute using quantum mechanics?

No empirical parameters!

• volume

• density

• total energy

• formation energy

• metallic?

• etc.

MIT and LBNL collaboration

“The Google of Material Science Data”

MaterialsProject.org

Inverting the Problem

Detailed Properties

Machine Learning

[Diagram: Structure 1 through Structure 6; materials.bson; Learning Algorithm; (new materials)]

Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)

What about Na, V, P, O?

How often can you substitute Mg for Ca?

Materials Project: A Play in Three Acts

I. Data generation using HTC

II. Data storage

III. Data analysis/logging

Act I: Managing Calculations

• Centralized distributed model is the only way to go

• Hub is at LBNL

• Store the state in db

• Overview of running many MPI jobs at many different HPC centers
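The "store the state in db" idea can be sketched as a pair of query documents that move a job through its states. This is a minimal illustration, not the project's actual schema: the field names (`state`, `machine`, `worker`, `claimed_at`), the state values, and both helper functions are hypothetical.

```python
from datetime import datetime, timezone


def claim_job_spec(worker, machine):
    """Build the filter and update documents for claiming one ready job.

    All field names and state values here are assumptions for illustration.
    """
    job_filter = {"state": "ready", "machine": machine}
    update = {
        "$set": {
            "state": "running",
            "worker": worker,
            "claimed_at": datetime.now(timezone.utc),
        }
    }
    return job_filter, update


def finish_job_spec(job_id, succeeded):
    """Build the filter and update documents for closing out a running job."""
    job_filter = {"_id": job_id, "state": "running"}
    update = {"$set": {"state": "done" if succeeded else "failed"}}
    return job_filter, update


job_filter, update = claim_job_spec("manager.x@hopper", "hopper")
print(job_filter)
print(update["$set"]["state"])
```

With pymongo, passing such a pair to `Collection.find_one_and_update(job_filter, update)` would claim the job in a single atomic operation on one document, so two managers polling the same queue should not pick up the same job.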

[Diagram: MasterQueue (master_queue.bson) with one manager.x process per HPC machine: Franklin, Hopper, and Carver at NERSC (Oakland); lr1 and lr2 on Lawrencium (Berkeley). builder.x: pull crystal, create a new engine, add to queue.]

‘The Brain’

[Diagram: example with MongoDB at the center; manager.x processes on Franklin, Hopper, Carver, lr1, lr2, CathodeO1, and DLX; MIT also labeled]

Centralized Logging and Management

NERSC (Oakland) LBNL Kentucky

query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
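To make the semantics of this query concrete, here is a small pure-Python sketch that evaluates it against a few in-memory documents. The `matches` helper and the sample documents (including their `nelectrons` values) are made up for illustration; it handles only the two operators used here and is not how MongoDB evaluates queries internally.

```python
def matches(doc, query):
    """Return True if doc satisfies query (supports only $all and $lte)."""
    for field, condition in query.items():
        value = doc.get(field)
        for op, operand in condition.items():
            if op == "$all":
                # The array field must contain every listed value.
                if not isinstance(value, list) or not all(x in value for x in operand):
                    return False
            elif op == "$lte":
                # The scalar field must be less than or equal to the operand.
                if value is None or not value <= operand:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}

# Hypothetical sample documents, not real database entries.
docs = [
    {"formula": "Li2O", "elements": ["Li", "O"], "nelectrons": 14},
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
]

hits = [d["formula"] for d in docs if matches(d, query)]
print(hits)
```

With a live database the same dict would simply be passed to pymongo's `Collection.find(query)`.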

Act II: Core Data Storage

Very Complex Documents

Powerful Querying

Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si) plus one other element except (Zn or Ni or Fe or Cu or Co)

{
  "lattice.volume": {"$lt": 500},
  "elements": {"$all": ["Mn"], "$size": 4, "$nin": ["Zn", "Ni", "Fe", "Cu", "Co"]},
  "atoms": {"$elemMatch": {"oxidation_state": 3, "symbol": "Mn"}},
  "$where": "match_all(this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])"
}
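The slides do not show the body of `match_all`. One plausible reading, given the plain-language description of the search, is "every group must contribute at least one element". The sketch below implements that reading in Python and also shows how the same condition could be written with the standard `$and` and `$in` operators instead of `$where`. Both functions are assumptions for illustration, as is applying `$in` to the `elements` field.

```python
def match_all(element_names, *groups):
    """True if each group has at least one member in element_names.

    Assumed semantics: the original JavaScript helper is not shown.
    """
    present = set(element_names)
    return all(present & set(group) for group in groups)


def groups_to_query(*groups):
    """Express the same condition with $and/$in instead of $where."""
    return {"$and": [{"elements": {"$in": list(group)}} for group in groups]}


groups = (["Li", "Na", "K"], ["Mn"], ["O", "S", "F", "Si"])

print(match_all(["Li", "Mn", "O", "P"], *groups))
print(match_all(["Li", "Fe", "O", "P"], *groups))
print(groups_to_query(*groups))
```

Applied to an array field, `$in` matches when the array contains at least one of the listed values, so the `$and` form expresses the group constraint without running JavaScript on every document.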

pre-MongoDB:

(((SELECT structure.structureid FROM structure
    NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
    WHERE structureid IN
      ((select structure.structureid from structure
          NATURAL INNER JOIN elemententry where elemententry.symbol='Li'
        INTERSECT
        select structure.structureid from structure
          NATURAL INNER JOIN elemententry where elemententry.symbol='O')
       INTERSECT
       select structure.structureid from structure
         NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
         where database.title='ICSD'))
  EXCEPT
  (SELECT structure.structureid FROM structure
     where structure.entryid IN
       (select duplicateentry.entryid from duplicateentry)))
 EXCEPT
 (SELECT structure.structureid FROM structure
    where structure.entryid IN (select entryid from removals)))

Search for materials with Li and O, excluding duplicates

Map/Reduce

[Diagram: tasks.bson (✓ Calculation 12, Calculation 13, Calculation 14, Calculation 15) → MR → materials.bson]
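The tasks-to-materials step can be pictured as a map phase that emits one record per usable calculation and a reduce phase that collapses the records for each material into one document. The sketch below does this in plain Python. The field names, the `successful` state, the energy values, and the rule of keeping the lowest-energy calculation are all assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict


def map_task(task):
    """Map phase: emit (key, value) for each successful calculation."""
    if task["state"] == "successful":
        yield task["formula"], {"task_id": task["task_id"], "energy": task["energy"]}


def reduce_material(formula, values):
    """Reduce phase: collapse all calculations for one formula into one document."""
    best = min(values, key=lambda v: v["energy"])  # assumed rule: keep lowest energy
    return {
        "formula": formula,
        "energy": best["energy"],
        "task_ids": sorted(v["task_id"] for v in values),
    }


def map_reduce(tasks):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for task in tasks:
        for key, value in map_task(task):
            grouped[key].append(value)
    return [reduce_material(key, values) for key, values in sorted(grouped.items())]


# Hypothetical task documents; the energies are invented numbers.
tasks = [
    {"task_id": 12, "formula": "LiCoO2", "state": "successful", "energy": -22.5},
    {"task_id": 13, "formula": "LiCoO2", "state": "successful", "energy": -22.9},
    {"task_id": 14, "formula": "NaCl", "state": "failed", "energy": None},
    {"task_id": 15, "formula": "NaCl", "state": "successful", "energy": -6.8},
]

materials = map_reduce(tasks)
for material in materials:
    print(material)
```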

Every App uses MongoDB

by G. Hautier

structure_predictors.bson

candidate_materials.bson

diffraction_patterns.bson

Structure Predictor

Diffraction Pattern

Act III: Analytics and Logging

Rich Error Analysis

Experimental Calculated

Integrated logging just makes sense

• Semi-structured data easily stored

• Can correlate with all other data

• Automation Layer: Failed tasks

• Web/App Layer
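As a rough illustration of the points above, the following sketch turns Python log records into semi-structured documents that carry the same keys as other data, so they can be correlated later. The `DocumentLogHandler` class, the `context` key, and the field names are hypothetical. A plain list stands in for a database collection so the example runs without a server.

```python
import logging


class DocumentLogHandler(logging.Handler):
    """Convert each log record into a dict and append it to a sink.

    The sink is a plain list here; in practice it would be a collection.
    """

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        doc = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "created": record.created,
        }
        # Merge free-form context (e.g. task id, machine) into the document.
        doc.update(getattr(record, "context", {}))
        self.sink.append(doc)


log_docs = []
logger = logging.getLogger("automation")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(DocumentLogHandler(log_docs))

logger.error("task failed", extra={"context": {"task_id": 14, "machine": "hopper"}})

failed = [d for d in log_docs if d["level"] == "ERROR"]
print(failed[0]["task_id"], failed[0]["message"])
```

Because each log entry carries a `task_id`, a failed task found in the log can be looked up directly in the task data, which is the kind of correlation the automation layer would rely on.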

Conclusions

• MongoDB is a very versatile tool

• Used in several different cases

• Elegant query syntax

• Very useful for scientific data storage

• A lot of exciting future ideas

Acknowledgements

Thanks!

MaterialsProject.org
