CISM Collaboratory Development Plan Philip J. Maechling Information Technology Architect Southern California Earthquake Center March 11, 2015


Page 1

CISM Collaboratory Development Plan

Philip J. Maechling
Information Technology Architect

Southern California Earthquake Center
March 11, 2015

Page 2

CISM Collaboratory Development Plan

• System Requirements
• System Architecture
• Computing Environment
• Essential Software Components
• Computational and Data Estimates
• Software Development Process
• IP Considerations

Page 3

Key CISM Use Cases

Based on the CISM Proposal, we identified the following CISM Use Cases:

Year 1:
• Couple the empirical Uniform California Earthquake Rupture Forecast to the CyberShake ground-motion forecasting models of the Los Angeles region.
• Provide new computational tools to assist the development of rupture simulators such as RSQSim and ground-motion simulators such as CyberShake.

Year 2:
• Couple the RSQSim physics-based rupture simulator to the CyberShake ground-motion forecasting models.
• Retrospectively calibrate and test the resulting comprehensive forecasting models.

Year 3:
• Construct a computational environment that can sustain the long-term development of comprehensive, physics-based earthquake forecasting models.
• Submit exemplars to CSEP for prospective testing against observed earthquake activity in California.

Page 4

Additional CISM System Requirements

CISM must be designed to meet several additional non-functional requirements:

1. Must use existing scientific software written in a variety of programming languages
2. Must use local computing resources and high-performance parallel computing resources from external resource providers
3. Must be able to “show our work” to support scientific review of results
4. Must be inexpensive to design, build, maintain, and operate
5. Must be easy to modify without significant re-implementation or down time
6. Must support new development without impacting ongoing operations
7. Must run for years to get statistically significant results

Page 5

CISM Information Technology Overview

• System Requirements
• System Architecture
• Computing Environment
• Essential Software Components
• Computational and Data Estimates
• Software Development Process
• IP Considerations

Page 6

CISM Modular Processing Architecture

Build a modular, extensible, distributed, high performance computing framework:

1) Define and execute a multi-stage series of scientific calculations

2) Execute calculations on SCEC and external resources and return results to SCEC

3) Modular construction to enable evaluation of multiple alternative methods

4) Ensure repeatable and reviewable results

* We will use a workflow-based distributed computing framework developed on SCEC HPC Projects

Define Rupture Catalog: Define the list of possible earthquakes for the region of interest during the period of interest

Assign Rupture Probabilities: Assign a probability to each rupture in the catalog during the period of interest

Calculate Rupture Ground Motions: Calculate the ground motions produced by each rupture in the region of interest

Forecast Future Ground Motions: Combine ground motions with probabilities to produce a probabilistic ground motion forecast
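
The four stages above can be read as a chain of interchangeable functions, which is what makes it possible to evaluate alternative methods for any one stage. The following is a minimal Python sketch of that idea, not CISM software: every function name, rupture name, and number is a hypothetical placeholder, and the final stage assumes ruptures occur independently.

```python
from typing import Callable, Dict, List

# All names and values here are illustrative assumptions, not CISM code.
Rupture = str


def define_rupture_catalog() -> List[Rupture]:
    """Stage 1: list the possible earthquakes for the region and period."""
    return ["rupture_A", "rupture_B"]


def assign_probabilities(catalog: List[Rupture]) -> Dict[Rupture, float]:
    """Stage 2: give each rupture a probability for the period of interest."""
    toy = {"rupture_A": 0.01, "rupture_B": 0.002}
    return {r: toy[r] for r in catalog}


def calculate_ground_motions(catalog: List[Rupture]) -> Dict[Rupture, float]:
    """Stage 3: toy peak ground motion (in g) produced by each rupture."""
    toy = {"rupture_A": 0.15, "rupture_B": 0.40}
    return {r: toy[r] for r in catalog}


def forecast(probs: Dict[Rupture, float],
             motions: Dict[Rupture, float],
             threshold_g: float) -> float:
    """Stage 4: probability that at least one rupture exceeds the threshold,
    treating ruptures as independent events."""
    p_no_exceedance = 1.0
    for rupture, p in probs.items():
        if motions[rupture] > threshold_g:
            p_no_exceedance *= 1.0 - p
    return 1.0 - p_no_exceedance


def run_pipeline(
    catalog_stage: Callable[[], List[Rupture]] = define_rupture_catalog,
    probability_stage: Callable = assign_probabilities,
    ground_motion_stage: Callable = calculate_ground_motions,
    threshold_g: float = 0.1,
) -> float:
    """Run the four stages in order; any stage can be swapped for an alternative."""
    catalog = catalog_stage()
    return forecast(probability_stage(catalog),
                    ground_motion_stage(catalog),
                    threshold_g)
```

Passing a different `probability_stage` or `ground_motion_stage` to `run_pipeline` swaps one method while leaving the other stages untouched.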

Page 7

CISM Modular Processing Architecture

[Diagram: the four processing stages (Define Rupture Catalog, Assign Rupture Probabilities, Calculate Rupture Ground Motions, Forecast Future Ground Motions) annotated with the component labels: OpenSHA; UCERF 3 Ruptures; 3D Wave Propagation Simulations; UCERF 3 Probabilities; CyberShake; OpenSHA; Combine Amplitudes into Forecast]

Page 8

Focus CISM Software Development on Defining Workflows to Minimize Software Development

• Workflow Configuration Environment (CISM Software Development)
• Workflow Execution Environment (Existing Open-Source Software)

Page 9

CISM Workflow-oriented System Implementation

1. CISM forecasts are implemented by running a series of scientific programs.

2. CISM Workflows define the programs used, the input and output files, and the order in which they must be run.

3. Workflows are defined without machine-specific or computing-environment-specific details (called abstract workflows).

4. After the target run site is selected, the abstract workflow is “planned”: specific executables and physical file names are inserted (called concrete workflows).
– This technique is well suited for computing environments that move computing from one system to another.
– Workflow tools also provide logging, metadata collection, and restart capabilities.
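
The abstract-to-concrete planning step can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the API of any workflow tool: the `AbstractJob`, `ConcreteJob`, and `Site` classes, the program names, and all paths are hypothetical. It needs Python 3.9 or later for the standard-library `graphlib` module.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AbstractJob:
    name: str                    # logical job name
    transformation: str          # logical program name, no path
    inputs: Tuple[str, ...]      # logical file names
    outputs: Tuple[str, ...]


@dataclass(frozen=True)
class ConcreteJob:
    name: str
    executable: str              # site-specific path to the program
    inputs: Tuple[str, ...]      # physical file names on the site
    outputs: Tuple[str, ...]


@dataclass
class Site:
    name: str
    executables: Dict[str, str]  # logical program name -> installed path
    scratch_dir: str


def run_order(jobs: List[AbstractJob]) -> List[str]:
    """Derive the execution order from which job produces each input file."""
    producer = {out: job.name for job in jobs for out in job.outputs}
    graph = {job.name: {producer[f] for f in job.inputs if f in producer}
             for job in jobs}
    return list(TopologicalSorter(graph).static_order())


def plan(jobs: List[AbstractJob], site: Site) -> List[ConcreteJob]:
    """Turn an abstract workflow into a concrete one for the chosen site."""
    def physical(name: str) -> str:
        return f"{site.scratch_dir}/{name}"

    return [ConcreteJob(job.name,
                        site.executables[job.transformation],
                        tuple(physical(f) for f in job.inputs),
                        tuple(physical(f) for f in job.outputs))
            for job in jobs]


# Hypothetical two-job workflow and run site.
workflow = [
    AbstractJob("hazard", "hazard_calc", ("seismograms.dat",), ("hazard_curve.txt",)),
    AbstractJob("simulate", "wave_sim", ("ruptures.txt",), ("seismograms.dat",)),
]
hpc = Site("hpc",
           {"wave_sim": "/hpc/apps/wave_sim", "hazard_calc": "/hpc/apps/hazard_calc"},
           "/hpc/scratch")
concrete = plan(workflow, hpc)
```

Because the abstract workflow never mentions paths, planning the same `workflow` against a second `Site` is enough to move the run to another system.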


Page 11

Computing Environment

Develop a distributed computing environment, based at USC HPCC, utilizing NSF and DOE HPC systems.
• Establish both an (1) operational and (2) development computing environment
• Maintain cumulative data results locally
• Provide external interfaces to forecasts and forecast results


Page 14

Essential CISM Scientific Codes

[1] OpenSHA: Implements the Uniform California Earthquake Rupture Forecast 2 and 3, GMPEs, and probabilistic seismic hazard processing / Language: Java / Multi-threaded / Primary Developers: Ned Field, Kevin Milner

[2] RSQSim: Large scale simulations of earthquake occurrence to characterize system-level response of fault systems, including processes that control time, place, and extent of earthquake slip / Language: C / MPI-based / Primary Developers: James Dieterich, Keith Richards-Dinger

[3] CyberShake: 3D wave propagation simulations for a large set of ruptures, and seismogram processing resulting in peak ground motions and other parameters / Language: C / MPI-based / Primary Developers: Robert Graves, Scott Callaghan, Philip Maechling, Thomas Jordan

[4] CSEP: Automated execution and evaluation of short term earthquake forecast models / Language: Python / Multi-threaded / Primary Developers: D. Schorlemmer, T. Jordan, M. Liukis

[1] Field, E.H., T.H. Jordan, and C.A. Cornell (2003), OpenSHA: A Developing Community-Modeling Environment for Seismic Hazard Analysis, Seismological Research Letters, 74, no. 4, p. 406-419.

[2] Richards-Dinger, K., and J.H. Dieterich (2012), RSQSim Earthquake Simulator, Seismological Research Letters, v. 83, no. 6, p. 983-990, doi: 10.1785/0220120105.

[3] Graves, R., T. Jordan, S. Callaghan, E. Deelman, E. Field, G. Juve, C. Kesselman, P. Maechling, G. Mehta, K. Milner, D. Okaya, P. Small, and K. Vahi (2010), CyberShake: A Physics-Based Seismic Hazard Model for Southern California, Pure Applied Geophys., v. 169, i. 3-4, doi: 10.1007/s00024-010-0161-6.

[4] Zechar, J.D., D. Schorlemmer, M. Liukis, J. Yu, F. Euchner, P.J. Maechling, and T.H. Jordan (2010), The Collaboratory for the Study of Earthquake Predictability Perspective on Computational Earthquake Science, Concurrency and Computation: Practice and Experience, Vol. 22, 1836-1847.


Page 16

Computing and Data Estimates

Estimated Annual Large-scale HPC Runs:

RSQSim: Regional (1200 km faults), Simulated time: 100K years, Number of Ruptures: 100M, Repetitions: 50
• Core Hours Required: 40M
• Local Results Data: 60 TB

CyberShake: Regional (300 sites), Spacing: 10 km, Max Freq: 1 Hz, Min Vs: 500 m/s
• Core Hours Required: 70M
• Local Results Data: 10 TB
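
As a quick check on scale, the figures above can be combined with simple arithmetic. The sketch below only adds and divides the numbers stated on this slide; the derived quantities (per-site cost, sustained core count) are illustrative and assume a 365-day year and that these two runs are the only large-scale usage.

```python
# Figures copied from the annual estimates above.
runs = {
    "RSQSim": {"core_hours": 40e6, "data_tb": 60},
    "CyberShake": {"core_hours": 70e6, "data_tb": 10},
}

total_core_hours = sum(r["core_hours"] for r in runs.values())
total_data_tb = sum(r["data_tb"] for r in runs.values())

# CyberShake cost spread evenly over the 300 sites (an averaging assumption).
cybershake_sites = 300
core_hours_per_site = runs["CyberShake"]["core_hours"] / cybershake_sites

# Number of cores that would have to run continuously for a year
# to deliver the total (8760 hours in a 365-day year).
hours_per_year = 365 * 24
sustained_cores = total_core_hours / hours_per_year

print(f"Total core hours per year: {total_core_hours / 1e6:.0f}M")
print(f"Total local results data:  {total_data_tb} TB")
print(f"CyberShake core hours per site: {core_hours_per_site:,.0f}")
print(f"Equivalent cores running all year: {sustained_cores:,.0f}")
```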


Page 18

SCEC Software Engineering Practices

Iterative Development Process (Typically 3 Month Iterations)
• Develop end-to-end processing that provides scientific value
• Deploy operational system and operate during next iteration
• Extend the system on the development system, preserving existing capabilities and adding new ones
• Migrate development system to operational at end of iteration

Software Engineering Practices
• Software Version Control
• Automated Testing frameworks
• Standards based data formats and management
• Metadata collection
• Process logging
• Error detection and monitoring


Page 20

CISM IP Principles

1. Integrate best-available academic codes developed and contributed by the research community

2. Accept NSF support and private-company gifts to support software development

3. Release as free and open-source software to support scientific transparency and build confidence in results

4. License software so that it can be used by academic institutions and US agencies, including the USGS.

Page 21

Open-source license criteria focus on the availability of the source code and the ability to modify and share it, while free software licenses focus on the user's freedom to use the program, to modify it, and to share it.

Key Rights and Issues: Apache License v2.0

1. Software distribution must include the license
2. Software distribution must include source code
3. No warranty is offered
4. Users agree to no liability
5. Users are granted a copyright license to the software and source code
6. Users are granted a patent license to use the software
7. Users are not permitted to use any trademarks in a distribution without permission
8. Private use is allowed
9. Commercial use is allowed
10. Redistribution is allowed with licenses intact
11. Users are allowed to make modifications
12. Users must state what changes they made
13. Users can distribute their modifications under different, including proprietary, licenses
14. Users are permitted to link to any other software that uses different, including proprietary, licenses

Page 22

Backup Slides

Page 24

System Requirement Details

Our focus will be on system-specific models for time-dependent earthquake forecasting that are comprehensive, physics-based, data calibrated, and prospectively testable.

A model is system-specific if it identifies a physics-based parameter, such as future ground motions, that it seeks to predict, and integrates all relevant physics into the model needed to predict that parameter.

A model is comprehensive if it forecasts ground-motion exceedance probabilities; i.e., the chances that ground motions at any surface site will exceed an adjustable (risk-sensitive) intensity threshold during a forecasting interval. Comprehensive forecasts must thus combine earthquake rupture forecasts with ground-motion predictions that are conditional on the rupture (Fig. 2).

Physics-based models adhere to the laws of physics and thus automatically approximate many essential physical constraints; they can thereby capture more predictability than strictly empirical models.

A model is data calibrated if the model is validated against some type of observational data, and can be improved with additional observations.
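
The way a comprehensive forecast combines the two ingredients can be written compactly. The expression below is a commonly used form, given here as an assumption for illustration rather than as the formula CISM adopts: it treats the N ruptures in the forecast as independent, with p_i the probability that rupture i occurs during the forecasting interval and P(IM > x | rup_i) the probability that the ground-motion intensity measure at the site exceeds the threshold x given that rupture.

```latex
P(\mathrm{IM} > x) \;=\; 1 - \prod_{i=1}^{N} \Bigl[\, 1 - p_i \, P(\mathrm{IM} > x \mid \mathrm{rup}_i) \Bigr]
```

The rupture forecast supplies the p_i, the conditional ground-motion prediction supplies P(IM > x | rup_i), and x is the adjustable intensity threshold.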

Page 25

Key CISM Use Cases

Based on the CISM Proposal, we identify the following key CISM system use cases:

1. Incorporate into a common software framework the UCERF3 models, one or more rupture simulators developed by SCEC’s Earthquake Simulator technical activity group, and SCEC’s CyberShake ground-motion simulation platform.

2. Integrate UCERF3 and CyberShake into a comprehensive forecasting model, replacing the empirical ground motion prediction equations used in the national maps with a physics-based model derived from simulations of seismic wave excitation and propagation through realistic 3D crustal structures.

3. Couple CyberShake with RSQSim, replacing the empirical UCERF3 model with a physics based rupture simulator that accounts for earthquake nucleation and stress transfer.

4. Use Monte Carlo techniques to incorporate the deterministic, physics-based models into a probabilistic framework that properly accounts for epistemic uncertainties in the models.

Page 26

Project Personnel

• Executive Director of Science Programs - With the Project PI, responsible for defining and conducting a collaborative scientific process that can identify and address the full range of scientific issues encountered during CISM development, ensuring that all necessary scientific models are integrated into a system that produces testable forecasts.

• Graduate Students - Responsible for in-depth analysis of selected scientific issues related to time-dependent earthquake forecasts that arise during CISM development, under the direction of the Project PI and the Executive Director of Science Programs.

• Post-Doc - Responsible for integrating existing CISM scientific models into time-dependent earthquake forecast models and developing software prototypes that can be used as initial implementations of necessary CISM processing systems.

• Phil - Responsible for defining and developing the CISM system and software architecture, including the internal CISM processing and data management capabilities, and CISM external interfaces to observational data sources, external high-performance computer resources, and presentation of results to end-users and stakeholders.

• Software Engineer - Responsible for software implementation of the CISM system architecture, for implementation of the CISM data interfaces, and for integration of existing and new scientific software into the CISM end-to-end processing systems.

Page 27

System Architecture

Page 28

CISM Modular Processing Architecture

[Diagram: the four processing stages (Define Rupture Catalog, Assign Rupture Probabilities, Calculate Rupture Ground Motions, Forecast Future Ground Motions) annotated with the component labels: RSQSim; Long-Period Earthquake Simulations; 3D Wave Propagation Simulations; ETAS Probabilities; CyberShake; OpenSHA; Combine Amplitudes into Forecast]

Page 29

CISM Software Combines CSEP and CyberShake Capabilities

Page 30

CyberShake workflows

[Diagram of three chained workflows.
• Tensor Workflow: Mesh generation, Tensor simulation (job counts shown: 1 job, 2 jobs)
• Post-Processing Workflow: Tensor extraction, Seismogram synthesis, PSA (job counts shown: 7,000 jobs, 415,000 jobs, 415,000 jobs)
• Data Products Workflow: DBInsert, HazardCurve]

Page 31

Conceptual CSEP Processing Model For Seismicity Based Forecasts

[Diagram, CSEP Collaboratory. Data items: Earthquake Catalog, Filtered Earthquake Catalog, Earthquake Forecast, Evaluation of Earthquake Predictions. Processing steps: Retrieve Data, Filter Catalog, Forecast EQs, Evaluate Forecast]

Page 32

CSEP Software

• Retrieve data on a daily basis
• Prepare data sets
• Prepare for testing
• Test
• Publish results

Page 33

SCEC: An NSF + USGS Research Center

Benefits of Scientific Workflows (from the point of view of an application scientist)

• Conducts a series of computational tasks.
– Resources distributed across Internet.

• Chaining (outputs become inputs) replaces manual hand-offs.
– Accelerated creation of products.

• Ease of use - gives non-developers access to sophisticated codes.
– Avoids need to download-install-learn how to use someone else's code.

• Provides framework to host or assemble community set of applications.
– Honors original codes. Allows for heterogeneous coding styles.

• Framework to define common formats or standards when useful.
– Promotes exchange of data, products, codes. Community metadata.

• Multi-disciplinary workflows can promote even broader collaborations.
– E.g., ground motions fed into simulation of building shaking.

• Certain rules or guidelines make it easier to add a code into a workflow.

Page 34

SCEC/CME HPC Allocation Growth

Page 35

SCEC Pursuing Leadership Class Computer Systems

[Figure: pyramid of computing tiers labeled HPC Centers, Departmental HPC, and Workstations, with capacity labels: 100 TF Systems, 10’s of Projects; 10’s of 10 TF Systems, 1,000’s of Users; 100’s of 1 TF Systems, 10,000’s of Users; GigaFLOPS, Millions of Users]

Key function of the NSF Supercomputer Centers: Provide facilities over and above what can be found in the typical campus/lab environment

[Figure: Scientific Computing plotted on two axes, Compute (more FLOPS) and Data (more BYTES), with regions labeled Home, Lab, Campus, Desktop; Traditional HPC environment; Data-oriented Science and Engineering Environment]

Page 36

CyberShake Estimates

G4 CyberShake PSHA Jordan 1.0Hz CyberShake Hazard map at 1.0Hz 500m/s Min Vs, output 3 components using 10 billion elements, 40k timesteps

AWP-ODC-GPU

300 0.33 99.00 10.00

SGT data: 878.90625

Page 37

RSQSim Estimates

Page 39

Intensity-Measure Relationship

• List of Supported IMTs
• List of Site-Related Ind. Params
• IMT, IML(s); Site(s); Rupture

Two types of IMRs (subclasses): Attenuation Relationship; Waveform-Simulation Based IMR

Figure 3

Attenuation Relationship: Here, the possible IMLs for a given Site and Earthquake Rupture are assumed to exhibit a Gaussian distribution. Thus, in addition to reporting Prob(IMT>IML), this class can also give the predicted mean IML and the standard deviation. These models are usually constructed by regression of observed IMLs onto some functional form.

Waveform-Simulation Based IMR: Given an arbitrary Site and Earthquake Rupture, a suite of “viable” synthetic seismograms is computed using Pathway 2, where the range of synthetics reflects uncertainties in the modeling process. The IMLs computed from the suite are then used to compute the probability of exceeding the specified IML.
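
The two ways of obtaining an exceedance probability described above can be contrasted in a few lines of Python. This is a hedged sketch, not OpenSHA code: the function names are hypothetical, the Gaussian case takes the mean and standard deviation as given (in whatever units the relationship uses, which are often logarithmic), and the simulation case assumes every synthetic seismogram in the suite carries equal weight.

```python
import math
from typing import Sequence


def gaussian_exceedance(iml: float, mean: float, std: float) -> float:
    """Attenuation-relationship style: probability that a normally
    distributed intensity measure exceeds the level `iml`."""
    z = (iml - mean) / std
    # Survival function of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulation_exceedance(iml: float, simulated_imls: Sequence[float]) -> float:
    """Waveform-simulation style: fraction of the simulated intensity
    measures that exceed `iml`, with all synthetics weighted equally."""
    if not simulated_imls:
        raise ValueError("need at least one simulated value")
    exceeding = sum(1 for value in simulated_imls if value > iml)
    return exceeding / len(simulated_imls)


# Toy inputs chosen only for illustration.
p_gaussian = gaussian_exceedance(iml=0.3, mean=0.2, std=0.1)
p_simulated = simulation_exceedance(0.2, [0.10, 0.25, 0.30, 0.15])
```

Both functions answer the same question, Prob(IMT>IML), so a hazard calculation can call either one without knowing which kind of IMR produced the number.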

Page 40

SCEC/CME OpenSHA Conceptual Model

[Diagram: Seismic Hazard Calculation, drawing on four components.
• Intensity Measure Type & Level (IMT & IML)
• Intensity-Measure Relationship: List of Supported Intensity-Measure Types; List of Site-Related Independent Parameters
• Earthquake-Rupture Forecast: List of Adjustable Parameters; Time Span
• Site: Location; List of Site-Related Parameters]

Page 41

Computing Estimates

80M Core Hours/Year x 0.0280M Total
• 10M Year RSQSim 20M
• 1Hz CyberShake regional Map 50M
• Post-processing Local

Page 42

Key Databases

• Simulation History
• Rupture Lists
• Rupture Variation Lists
• Seismograms and Amplitudes
• Seismic Hazard Curves at different periods
• Ground Motion Forecasts

Page 43

Existing Data Exchange Formats

• OpenSHA: Earthquake Rupture Catalog
• RSQSim: Rupture Exchange Format
• CyberShake: Standard Rupture Format
• Ground Motion Forecast:

Page 44

Standardized Rupture Description Development

Standardized Rupture Description (Robert Graves) supports exchange of ruptures between Pathways 1 and 2 and between Pathway 2 codes.

1.0
PLANE 1
-118.1020 33.9670 46 27 46.00 27.00
289 27 5.00 -10.00 20.25
POINTS 1242
-117.8700 33.9049 5.2270 289 27 1.00000e+10 8.5146 1.00000e-01
90 6.51 25 0.00 0 0.00 0
0.00000e+00 8.35041e-01 1.67008e+00 6.81352e-01 6.48907e-01 6.16462e-01
5.84016e-01 5.51571e-01 5.19126e-01 4.86680e-01 4.54235e-01 4.21790e-01
3.89344e-01 3.56899e-01 3.24454e-01 2.92008e-01 2.59563e-01 2.27117e-01
1.94672e-01 1.62227e-01 1.29781e-01 9.73361e-02 6.48907e-02 3.24454e-02
0.00000e+00
-117.8802 33.9078 5.2270 289 27 1.00000e+10 8.3068 1.00000e-01
90 19.61 25 0.00 0 0.00 0
0.00000e+00 8.35041e-01 1.67008e+00 6.81352e-01 6.48907e-01 6.16462e-01
5.84016e-01 5.51571e-01 5.19126e-01 4.86680e-01 4.54235e-01 4.21790e-01
3.89344e-01 3.56899e-01 3.24454e-01 2.92008e-01 2.59563e-01 2.27117e-01
1.94672e-01 1.62227e-01 1.29781e-01 9.73361e-02 6.48907e-02 3.24454e-02
0.00000e+00
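
To show how a rupture description like the sample above might be read by another code, here is a minimal Python sketch of a reader. The field layout is an assumption inferred from the sample (a version line, a PLANE block of 11 values per plane, then POINTS records of 8 location values, a rake, three slip/count pairs, and the slip-rate time series); the class and field names are hypothetical, and the sketch is not the reference reader for the format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# First point record from the sample above (POINTS declares 1242 records).
SAMPLE = """1.0
PLANE 1
-118.1020 33.9670 46 27 46.00 27.00
289 27 5.00 -10.00 20.25
POINTS 1242
-117.8700 33.9049 5.2270 289 27 1.00000e+10 8.5146 1.00000e-01
90 6.51 25 0.00 0 0.00 0
0.00000e+00 8.35041e-01 1.67008e+00 6.81352e-01 6.48907e-01 6.16462e-01
5.84016e-01 5.51571e-01 5.19126e-01 4.86680e-01 4.54235e-01 4.21790e-01
3.89344e-01 3.56899e-01 3.24454e-01 2.92008e-01 2.59563e-01 2.27117e-01
1.94672e-01 1.62227e-01 1.29781e-01 9.73361e-02 6.48907e-02 3.24454e-02
0.00000e+00
"""


@dataclass
class RupturePoint:
    location: List[float]          # assumed: lon, lat, depth, strike, dip, area, tinit, dt
    rake: float
    slip: List[float]              # total slip for each of three components
    slip_rates: List[List[float]]  # one time series per component


def parse_rupture(text: str) -> Tuple[str, List[List[float]], int, List[RupturePoint]]:
    """Return (version, planes, declared point count, points actually present)."""
    tokens = text.split()
    pos = 0

    def take(n: int) -> List[str]:
        nonlocal pos
        chunk = tokens[pos:pos + n]
        if len(chunk) < n:
            raise ValueError("rupture text ends in the middle of a record")
        pos += n
        return chunk

    version = take(1)[0]
    keyword, plane_count = take(2)
    if keyword != "PLANE":
        raise ValueError("expected PLANE block")
    planes = [[float(v) for v in take(11)] for _ in range(int(plane_count))]

    keyword, declared = take(2)
    if keyword != "POINTS":
        raise ValueError("expected POINTS block")

    points = []
    # Read as many records as the text contains, up to the declared count.
    while len(points) < int(declared) and pos < len(tokens):
        location = [float(v) for v in take(8)]
        rake = float(take(1)[0])
        pairs = take(6)  # slip1 nt1 slip2 nt2 slip3 nt3
        slip = [float(pairs[i]) for i in (0, 2, 4)]
        counts = [int(pairs[i]) for i in (1, 3, 5)]
        slip_rates = [[float(v) for v in take(n)] for n in counts]
        points.append(RupturePoint(location, rake, slip, slip_rates))
    return version, planes, int(declared), points


version, planes, declared_points, points = parse_rupture(SAMPLE)
```

For the excerpt, the reader finds one plane and one point whose first slip component has a 25-sample slip-rate series, matching the count given in the record header.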

Page 45

The testing area is separated into cells (grid-based models)

A bin defines a volume (cell), magnitude range, and range of focal mechanism angles for which a forecast is issued

The default binning:

Lon/Lat: 0.1° x 0.1°
Depth: 0-30 km
Magnitude: 0.1
Focal Mech.: None (30°)

0.1°x0.1° Cells
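
The default binning can be made concrete with a short Python sketch that maps an earthquake to its bin. This is an illustration under assumptions, not CSEP code: the function name, the region origin (`lon_min`, `lat_min`), and the minimum magnitude are hypothetical parameters; only the 0.1° cell size, the 0-30 km depth range, and the 0.1 magnitude step come from the slide.

```python
import math
from typing import Optional, Tuple


def bin_index(lon: float, lat: float, depth_km: float, mag: float,
              lon_min: float, lat_min: float, mag_min: float,
              cell_deg: float = 0.1, mag_step: float = 0.1,
              max_depth_km: float = 30.0) -> Optional[Tuple[int, int, int]]:
    """Return (column, row, magnitude bin) for an event, or None if the
    event lies outside the depth range or below the region minimums."""
    if not 0.0 <= depth_km <= max_depth_km:
        return None
    if lon < lon_min or lat < lat_min or mag < mag_min:
        return None
    eps = 1e-9  # guards against values that sit just below a bin edge in floating point
    col = math.floor((lon - lon_min) / cell_deg + eps)
    row = math.floor((lat - lat_min) / cell_deg + eps)
    mag_bin = math.floor((mag - mag_min) / mag_step + eps)
    return col, row, mag_bin


# Hypothetical region origin and minimum magnitude, for illustration only.
example = bin_index(lon=-117.85, lat=34.05, depth_km=12.0, mag=5.23,
                    lon_min=-125.0, lat_min=31.0, mag_min=4.95)
```

A forecast then assigns one expected rate to each such bin, and observed earthquakes are counted in the same bins when the forecast is tested.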

Page 46

Identification of Key Interfaces

• UCERF3 -> OpenSHA
• OpenSHA -> CyberShake
• CyberShake -> OpenSHA
• OpenSHA -> CSEP
• RSQSim -> OpenSHA
• Users -> RSQSim
• CISM -> Users
• CISM - CSEP

Page 47

CSEP Objectives & Design

1. Establish rigorous procedures for registering and evaluating prediction experiments

2. Construct community standards and protocols for comparative testing of predictions

3. Develop an infrastructure that allows groups of researchers to participate in prediction experiments

4. Provide access to authorized data sets and monitoring products for calibrating and testing prediction algorithms

5. Accommodate experiments involving fault systems in different geographic and tectonic environments

Page 48

SCEC Computational Platform Concept

• Computational Platform Concept emerged from the following observations:

– Using Cyberinfrastructure in large scale research quickly identifies which technologies are ready for application, and which are still research.

– A significant portion of the work involved in a large research study is the vertical integration of the Cyberinfrastructure used. It is desirable to preserve this integration once achieved.

– Large scale research computing needs geoscientists and computer scientists working together.

Page 49

SCEC Computational Platform Concept

• Definition of Computational Platform
– A vertically integrated collection of hardware, software, and people that provides a broadly useful research capability

• Implied capabilities
– Validated simulation software and geophysical models
– Re-usable simulation capabilities
– Imports parameters from other systems. Exports results to other systems
– IT/geoscience collaboration involved in operation
– Access to High-performance hardware and large scale data and metadata management
– May use Workflow management tools

Page 50

[Diagram with three columns, items listed in the order extracted from the slide.

SCEC Computational Research Users:
• Public and Governmental Forecasts
• Engineering and interdisciplinary Research
• Collaborative Research Project
• Individual Research Project

Scientific and Engineering Requirements for Computational Research Systems:
• Computational codes, structural models, and simulation results versioned with associated tests.
• Development of new computational, data, and physical models.
• Automated retrospective testing of forecast models using community defined validation problems.
• Automated prospective performance evaluation of forecast models over time within collaborative forecast testing center.

Platform Maturity Levels:
• Quantitatively Managed
• Defined
• Managed
• Initial Activity]

Page 51

Our NSF awards require a “standard open-source license” and have approved the Open Source Initiative (http://opensource.org) requirements as the standard. The Open Source Initiative requires that the distribution terms of open-source software comply with the following criteria:

1. Free Redistribution
The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.

2. Source Code
The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

3. Derived Works
The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

4. Integrity of The Author's Source Code
The license may restrict source-code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.

5. No Discrimination Against Persons or Groups
The license must not discriminate against any person or group of persons.

6. No Discrimination Against Fields of Endeavor
The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

7. Distribution of License
The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

8. License Must Not Be Specific to a Product
The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

9. License Must Not Restrict Other Software
The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.

10. License Must Be Technology-Neutral
No provision of the license may be predicated on any individual technology or style of interface.