Networking materials data

Ian Foster, computationinstitute.org, [email protected], ianfoster.org

Upload: ian-foster

Post on 23-Aug-2014

DESCRIPTION

A talk given at a workshop in Atlanta on "Building an Integrated MGI Accelerator Network": see http://acceleratornetwork.org/event/building-an-integrated-mgi-accelerator-network/. The US Materials Genome Initiative seeks to develop an infrastructure that will accelerate advanced materials development and deployment. The term Materials Genome suggests a science that is fundamentally driven by the systematic capture of large quantities of elemental data. In practice, we know, things are more complex—in materials as in biology. Nevertheless, the ability to locate and reuse data is often essential to research progress. I discuss here three aspects of networking materials data: data publication and discovery; linking instruments, computations, and people to enable new research modalities based on near-real-time processing; and organizing data generation, transformation, and analysis software to facilitate understanding and reuse. I use these three problems to motivate a discussion of recent results in cloud computing, data publication management, high-performance computing, and related topics.

TRANSCRIPT

  • computationinstitute.org Networking materials data Ian Foster [email protected] ianfoster.org
  • Materials Innovation Infrastructure. A data sharing system to facilitate: use of a broader set of data to render more accurate models; multi-disciplinary communication among scientists and engineers working on different stages of materials development; searches for advanced materials with specific, desired properties; curating and sharing of reliable computational code for modeling and simulation. Diagram labels: Computation; Data; Experiment. Credit: Meredith Drosback, OSTP.
  • Data: Rare treasure? http://www.thejakartapost.com/news/2011/05/14/holy-water.html
  • Or chaotic deluge? Wellington bucket fountain: https://www.youtube.com/watch?v=_p_FNNDu16w It's both. We must manage the data deluge, both to enhance user productivity and to increase data capture. Network materials data.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne. Diagram labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; La 60%, Sr 40%. Detect errors (secs-mins). Knowledge base: past experiments; simulations; literature; expert knowledge. Select experiments (mins-hours). Contribute to knowledge base. Simulations driven by experiments (mins-days). Knowledge-driven decision making. Evolutionary optimization.
  • An expensive business: network engineer, parallel programmer, software engineer, database architect, database manager, software engineer, data engineer, parallel programmer, postdoc, postdoc.
  • A small business, 20 years ago: secretary, HR manager, marketing, database manager, accountant, IT department, personal assistant, shipping department, intern, payroll.
  • A small business, today: business cloud. Reduce costs; speed innovation; reliable, scalable, simple.
  • Can we do the same for research? Discovery cloud: reduce costs; speed discovery; reliable, scalable, simple?
  • Discovery cloud: Globus research data management services (www.globus.org). File transfer & sharing.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne. Diagram labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; La 60%, Sr 40%. Globus transfer service. Cloud hosted: reliable, secure, fast. 20K users, 3B files, 50 PB transferred. Available at www.globus.org.
  • Discovery cloud: Globus research data management services (www.globus.org). File transfer & sharing; identity & group management.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne. Diagram labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; La 60%, Sr 40%. Evolutionary optimization. Globus sharing. Identities, groups, profiles. Cloud hosted.
  • Discovery cloud: Globus research data management services (www.globus.org). File transfer & sharing; data publication & discovery; identity & group management.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne. Diagram labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; La 60%, Sr 40%. Knowledge base: past experiments; simulations; literature; expert knowledge. Contribute to knowledge base. Knowledge-driven decision making. Globus data publication and discovery. Cloud hosted.
  • Data publication and discovery. We are looking for pilot users! Diagram labels: Community; Collection; Policies (Metadata, Access Control, License, Storage, Curation Workflow); three Datasets, each with Metadata and Data.
  • Publish dashboard
  • Start a new submission
  • Describe submission: 1) Dublin Core
  • Describe submission: 2) Science metadata
  • Assemble the dataset
  • Transfer files to submission endpoint
  • Check dataset is assembled correctly
  • Submission now in curation workflow
  • Search published datasets
  • Search across collections
  • Discover a published dataset
  • Select a published dataset
  • View downloaded dataset
  • Discovery cloud: Globus research data management services (www.globus.org). File transfer & sharing; data publication & discovery; simulation & data analysis; identity & group management.
  • Linking simulation and experiment to study disordered structures. Diffuse scattering images from Ray Osborn et al., Argonne. Diagram labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; La 60%, Sr 40%. Detect errors (secs-mins). Knowledge base: past experiments; simulations; literature; expert knowledge. Select experiments (mins-hours). Contribute to knowledge base. Simulations driven by experiments (mins-days). Knowledge-driven decision making. Evolutionary optimization.
  • Justin Wozniak et al.
  • Simulation and data analysis: point and click parallelism. Tool shed: simulation models & analysis tools. Data space: local and remote datasets. Workflows: link data, tools in reusable form. Capture domain knowledge: data and code. Reusable workflows encode commonly used modeling and analysis pipelines. Builds on widely used Galaxy, Globus, and Swift systems (galaxyproject.org, globus.org, swift-lang.org). Large simulation campaigns. Hosted on Amazon cloud for reliable, on-demand access and scalability.
  • Discovery Cloud: Three common themes. 1) Accelerate discovery via automation. 2) Slash costs of trying new methods: no local software installation; no need to read manual; on-demand, elastic scalability; low operational costs, proactive support. 3) Make data preservation trivial.
  • Take-away messages. Data has a dual nature: rare treasure and chaotic deluge. MGI must embrace this duality. Treasure: store, curate, index, preserve. Deluge: slash management costs, to both accelerate use and facilitate data preservation. Cloud services can help in both areas.
  • Thanks to great colleagues and collaborators: Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, Steve Tuecke, Justin Wozniak, and other CS colleagues; Ray Osborn, Francesco de Carlo, Chris Jacobsen, Nicola Ferrier, and other Argonne scientists; Juan de Pablo, Peter Voorhees, and other NIST CHiMaD participants.
  • Thank you to our sponsors!
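
The "Linking simulation and experiment" slides mention evolutionary optimization for matching simulated scattering to experimental scattering. The slides do not describe the algorithm, so the following is only a minimal sketch of the general idea, not the method used by the Argonne team. The function `simulate_scattering`, the single `sr_fraction` parameter, and all numeric settings are hypothetical stand-ins chosen for illustration; a real workflow would call an actual structure and scattering simulation.

```python
import random


def simulate_scattering(sr_fraction, n_points=16):
    """Hypothetical toy model: intensity profile as a function of Sr fraction.

    This is NOT a physical scattering simulation; it only gives the optimizer
    something smooth to fit.
    """
    qs = [i / (n_points - 1) for i in range(n_points)]
    return [(1.0 - sr_fraction) * q + sr_fraction * q * q for q in qs]


def misfit(candidate, measured):
    """Sum of squared differences between simulated and measured intensities."""
    simulated = simulate_scattering(candidate, len(measured))
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))


def evolve(measured, generations=60, population=20, keep=5, sigma=0.1, seed=0):
    """Simple elitist evolutionary search over one composition parameter."""
    rng = random.Random(seed)
    # Start from random candidate compositions in [0, 1].
    pop = [rng.random() for _ in range(population)]
    for _ in range(generations):
        # Rank candidates by how well their simulation matches the data.
        pop.sort(key=lambda x: misfit(x, measured))
        parents = pop[:keep]
        # Refill the population with mutated copies of the best candidates.
        children = []
        while len(parents) + len(children) < population:
            child = rng.choice(parents) + rng.gauss(0.0, sigma)
            children.append(min(1.0, max(0.0, child)))
        pop = parents + children
        # Shrink the mutation size so the search narrows over time.
        sigma *= 0.95
    return min(pop, key=lambda x: misfit(x, measured))


# Stand-in "experiment": data generated from the toy model at 40% Sr.
measured = simulate_scattering(0.4)
best = evolve(measured)
print(f"estimated Sr fraction: {best:.3f}")
```

In this sketch the "experimental" data are generated from the same toy model, so the search should recover a value close to 0.4. In the setting the slides describe, each misfit evaluation would be a costly simulation, which is where the parallel workflow tools mentioned in the talk would come in.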