SCEC/CME Project - How Earthquake Simulations Drive Middleware Requirements
Philip Maechling, SCEC IT Architect
24 June 2005

TRANSCRIPT

Page 1:

SCEC/CME Project - How Earthquake Simulations Drive Middleware Requirements

Philip Maechling

SCEC IT Architect

24 June 2005

Page 2:

Southern California Earthquake Center

• Consortium of 15 core institutions and 39 other participating organizations, founded as an NSF STC in 1991
• Co-funded by NSF and USGS under the National Earthquake Hazards Reduction Program (NEHRP)
• Mission:
  – Gather data on earthquakes in Southern California
  – Integrate information into a comprehensive, physics-based understanding of earthquake phenomena
  – Communicate understanding to end-users and the general public to increase earthquake awareness and reduce earthquake risk

Core Institutions:
University of Southern California (lead)
California Institute of Technology
Columbia University
Harvard University
Massachusetts Institute of Technology
San Diego State University
Stanford University
U.S. Geological Survey (3 offices)
University of California, Los Angeles
University of California, San Diego
University of California, Santa Barbara
University of Nevada, Reno

Participating Institutions:
39 national and international universities and research organizations

http://www.scec.org

Page 3:

Recent Earthquakes In California

Page 4:

Observed Areas of Strong Ground Motion

Page 5:

Simulations Supplement Observed Data

Page 6:

SCEC/CME Project

Goal: To develop a cyberinfrastructure that can support system-level earthquake science – the SCEC Community Modeling Environment (CME)

Support: 5-yr project funded by the NSF/ITR program under the CISE and Geoscience Directorates

Start date: Oct 1, 2001

[Diagram: the SCEC/ITR Project is funded by NSF (CISE and GEO) and links Information Science partners (ISI, SDSC) with Earth Science partners (SCEC institutions, IRIS, USGS)]

www.scec.org/cme

Page 7:

SCEC/CME Scientific Workflow Construction

A major SCEC/CME objective is the ability to construct and run complex scientific workflows for SHA.

[Pathway 1 example diagram: Define Scenario Earthquake → Calculate Hazard Curves (inputs: Gridded Region Definition, IMR Definition, ERF Definition) → 9,000 Hazard Curve files (9,000 x 0.5 MB = 4.5 GB) → Extract IMR Value (input: Probability of Exceedance and IMR Definition) → Lat/Long/Amp (xyz file) with 3,000 data points (100 KB) → Plot Hazard Map (input: GMT Map Configuration Parameters)]
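A minimal sketch of how the three Pathway 1 stages could be chained as a Condor DAGMan DAG. The submit-file names are hypothetical placeholders; in SCEC/CME the DAG is actually produced by the DAX Generator and Pegasus shown on the next slide.

```python
# Sketch: emit a Condor DAGMan .dag file mirroring the Pathway 1 hazard-map pipeline.
# Stage names and submit-file names are hypothetical placeholders.
STAGES = [
    ("calc_curves", "calc_hazard_curves.sub"),  # produces the 9,000 hazard-curve files
    ("extract_imr", "extract_imr_value.sub"),   # produces the lat/lon/amp xyz file
    ("plot_map",    "plot_hazard_map.sub"),     # produces the GMT hazard map
]

def write_dag(path: str = "pathway1.dag") -> None:
    """Write a linear DAG: each stage consumes the previous stage's output."""
    with open(path, "w") as dag:
        for name, submit_file in STAGES:
            dag.write(f"JOB {name} {submit_file}\n")
        for (parent, _), (child, _) in zip(STAGES, STAGES[1:]):
            dag.write(f"PARENT {parent} CHILD {child}\n")

if __name__ == "__main__":
    write_dag()
```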

Page 8:

SCEC/CME Scientific Workflow System

[Diagram: a Pathway Composition Tool, a Grid-Based Data Selector, and the Compositional Analysis Tool (CAT) – backed by the CAT Knowledge Base and the SCEC Datatype DB – feed a DAX Generator; Pegasus maps the DAX to a DAG using the Metadata Catalog Service and Replica Location Service; Condor DAGMan submits the DAG as RSL jobs to grid hosts (host1, host2), which stage data and produce the final HAZARD MAP]

Page 9:

SCEC/CME SRB-based Digital Library

[Diagram: SCEC Community Library – select a scenario (fault model, source model) and a receiver (lat/lon) to retrieve output time-history seismograms]

SRB-based Digital Library:
– More than 100 Terabytes of tape archive
– 4 Terabytes of on-line disk
– 5 Terabytes of disk cache for derivations

Page 10:

INTEGRATED WORKFLOW ARCHITECTURE

[Diagram: components (with execution requirements and I/O data descriptions) are registered in a Component Library; a Workflow Template Editor (CAT), supported by a Domain Ontology and a Workflow Library, answers queries for Workflow Templates (WT); a Conceptual Data Query Engine (DataFinder) queries the Metadata Catalog for data given metadata to perform Data Selection, turning a WT into a Workflow Instance (WI); Workflow Mapping (Pegasus) uses Grid information services to produce an Executable Workflow that runs on the Grid. Engineers interact through tools at each stage. Contributors noted on the slide: L. Hearn @ UBC, K. Olsen @ SDSU, J. Zechar @ USC (Teamwork: Geo + CS), D. Okaya @ USC]

Page 11:

SCEC/CME HPC Allocations

• SCEC/CME researchers need, and have access to, significant High Performance Computing capabilities
• TeraGrid Allocations (April 2005 – March 2006)
  – TG-MCA03S012 (Olsen): 1,020,000 SUs
  – TG-BCS050002S (Okaya): 145,000 SUs
• USC HPCC Allocations
  – CME Group Allocation (Maechling): 100,000 SUs
  – Investigator Allocations (Li, Jordan): 300,000 SUs
• SCEC Cluster
  – Dedicated 16-processor Pentium 4 cluster (102 GFlops)

Page 12:

SCEC/CME TeraGrid Support

• TeraGrid Strategic Application Collaboration (SAC) greatly improved our AWM run-time on TeraGrid
• Advanced TeraGrid Support (ATS) for TeraShake 2 and CyberShake simulations
• SDSC Visualization Services support for SCEC simulations

Page 13:

Three Types of Simulations

• SCEC/CME supports widely varying types of earthquake simulations
• Each simulation type creates its own set of middleware requirements
• We will describe three examples and comment on their middleware implications and computational system requirements:
  – Probabilistic Seismic Hazard Maps
  – 3D Waveform Propagation Simulations
  – 3D Waveform-based Intensity Measure Relationship

Page 14:

(1) Earthquake-Rupture Forecast (ERF)

Probability of all possible fault-rupture events (M ≥ ~5) for a region & time span

(2) Intensity-Measure Relationship (IMR)

Gives Prob(IMT ≥ IML) for a given site and fault-rupture event

Attenuation Relationships (traditional, no physics)

Full-Waveform Modeling (developmental, more physics), governed by the elastodynamic equations:
$\rho \ddot{u}_i = \sigma_{ij,j} + f_i$, with $\sigma_{ij} = \lambda \delta_{ij} \varepsilon_{pp} + 2\mu \varepsilon_{ij}$

Probabilistic Seismic Hazard Maps
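For reference, the two ingredients combine into a hazard curve in the standard PSHA way; this is a hedged reconstruction (the slide does not spell it out) under the usual assumption that rupture occurrences are independent:

$$P(\mathrm{IMT} \ge \mathrm{IML}\ \text{in time span } T) \;=\; 1 \;-\; \prod_{k}\Bigl[\,1 - P_k(T)\,P(\mathrm{IMT} \ge \mathrm{IML} \mid \mathrm{Site}, \mathrm{Rup}_k)\Bigr],$$

where $P_k(T)$ is the ERF probability that rupture $k$ occurs in the time span and the conditional exceedance probability comes from the IMR.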

Page 15:

Example Hazard Curve

Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years

Page 16:

Probabilistic Hazard Map Calculations

Page 17:

Characteristics of PSHA Simulations

• 10k independent hazard curve calculations for each map calculation
  – A high-throughput, not high-performance, computing problem
• 10k resulting files per map
  – Metadata saved for each file
• Short run time for each calculation
  – Overhead of starting up each job is expensive
• Would like to offer map calculations as a service to SCEC users (who may not have an allocation)

Page 18:

Middleware Implications

• High-throughput scheduling
  – Well suited to a Condor pool
• Bundling of short run-time jobs will reduce job startup overhead
• Bundling of jobs is also useful for cluster execution (see the bundling sketch below)
• Metadata tracking with an RDBMS-based catalog system (e.g., the Metadata Catalog Service (MCS) and Replica Location Service (RLS))
  – Databases present installation and operational problems at every site where we request them
• Software support for interpreted languages on computational clusters
  – Our PSHA codes are implemented in an interpreted programming language
• On-demand execution by non-allocated users
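A minimal sketch of the bundling idea: pay the job-startup overhead once per bundle of sites rather than once per hazard-curve calculation. The executable name `calc_hazard_curve` and the site-list format are hypothetical; the project's actual bundling would be handled through its Condor/Pegasus tooling.

```python
# Sketch: run many short hazard-curve calculations inside one batch job.
# The executable and its arguments are hypothetical placeholders.
import subprocess
import sys

def run_bundle(site_file: str, start: int, count: int) -> None:
    """Run `count` consecutive hazard-curve calculations from a list of sites."""
    with open(site_file) as f:
        sites = [line.strip() for line in f if line.strip()]
    for lat_lon in sites[start:start + count]:
        lat, lon = lat_lon.split(",")
        # Each call is one short calculation; unbundled, it would be a separate grid job.
        subprocess.run(["calc_hazard_curve", "--lat", lat, "--lon", lon], check=True)

if __name__ == "__main__":
    # e.g. 100 bundles of 100 sites instead of 10,000 single-site jobs
    run_bundle(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
```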

Page 19:

3D Wave Propagation Simulations

Page 20:

Characteristics of 3D Wave Propagation Simulations

• More physically realistic than existing PSHA, but more computationally expensive
• High Performance Computing, cluster-based codes
• 4D data calculations (time-varying volumetric data)
• Output large volumetric data sets
• Physics is limited by the resolution of the grid: higher ground-motion frequencies require a denser grid, and doubling the grid density increases storage by a factor of 8 (see the scaling note below)
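The factor of 8 is simply the cubic scaling of grid points (and hence storage per saved snapshot) with resolution:

$$N_{\text{grid}} \propto \left(\frac{L}{\Delta x}\right)^{3} \quad\Longrightarrow\quad \Delta x \to \frac{\Delta x}{2} \;\;\Longrightarrow\;\; N_{\text{grid}} \to 2^{3}\,N_{\text{grid}} = 8\,N_{\text{grid}}.$$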

Page 21:

Example: TeraShake Simulation

• Magnitude 7.7 earthquake on the southern San Andreas
• Mesh of ~2 billion cubes, dx = 200 m
• 0.011 s time step, 20,000 time steps: a 3-minute simulation
• Kinematic source (from Denali) from Cajon Creek to Bombay Beach
  – 60 s source duration
  – 18,886 point sources, each 6,800 time steps in duration
• 240 processors on the San Diego Supercomputer Center DataStar
• ~20,000 CPU hours, approximately 5 days wall clock
• ~50 TB of output
• During execution, "on-the-fly" graphics (...attempt aborted!)
• Metadata capture and storage in the SCEC digital library

Page 22:

Domain Decomposition For TeraShake Simulations

Page 23:

Simulations Supplement Observed Data

Page 24:

Peak Velocity: NW-SE Rupture vs. SE-NW Rupture

Page 25:

Peak velocities by site for the two rupture directions:

Site            SE-NW rupture    NW-SE rupture
Montebello      337 cm/s         8 cm/s
Downtown        52 cm/s          4 cm/s
Long Beach      48 cm/s          9 cm/s
San Diego       8 cm/s           6 cm/s
Palm Springs    36 cm/s          23 cm/s

Page 26:

Break-down of TeraShake output:

Full volume velocities (every 10th time step)   43.2 TB
Full surface velocities (every time step)        1.1 TB
Checkpoints / restarts (every 1,000 steps)        3.0 TB
Input files, etc.                                 0.1 TB
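A rough size check of the two largest items, assuming the commonly quoted 600 km x 300 km x 80 km TeraShake box (not stated on this slide), three velocity components, and single-precision values:

```python
# Sketch: back-of-the-envelope check of the TeraShake output sizes above.
# The 600 x 300 x 80 km domain is an assumption; the slide only says "~2 billion cubes".
nx, ny, nz = 3000, 1500, 400          # 600/0.2, 300/0.2, 80/0.2 grid points
n_steps = 20_000                      # time steps, from the TeraShake slide
components = 3                        # vx, vy, vz
bytes_per_value = 4                   # single-precision float

volume_pts = nx * ny * nz             # ~1.8 billion mesh points
surface_pts = nx * ny

full_volume = volume_pts * components * bytes_per_value * (n_steps // 10)   # every 10th step
full_surface = surface_pts * components * bytes_per_value * n_steps         # every step

print(f"full volume velocities:  {full_volume / 1e12:.1f} TB")   # ~43.2 TB
print(f"full surface velocities: {full_surface / 1e12:.2f} TB")  # ~1.08 TB
```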

Page 27:

Middleware Implications for 3D Wave Propagation Simulations

• Multi-day high-performance runs
  – Checkpoint/restart support needed
• Schedule reservations on clusters
  – Reservations and special queues are often arranged
• Large file and data movement
  – Terabyte transfers require highly reliable, long-running data transfers
• Ability to stop and restart
  – Can we move a restart from one system to another?
• Draining of temporary storage during runs
  – Storage required for the full output often exceeds scratch capacity, so output files must be moved off during the simulation (a sketch of this draining idea follows)
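A minimal sketch of the draining idea: a helper process that runs alongside the simulation and moves completed output files off scratch. The paths and the `.done` sentinel convention are hypothetical; the project's actual data movement went through tools such as SRB/GridFTP.

```python
# Sketch: periodically move completed output files from scratch to archive during a run.
# Directory paths and the ".done" marker convention are hypothetical placeholders.
import shutil
import time
from pathlib import Path

SCRATCH = Path("/scratch/terashake/output")   # hypothetical scratch directory
ARCHIVE = Path("/archive/terashake/output")   # hypothetical archive mount

def drain_once() -> int:
    """Move every output file the solver has marked complete (via a .done sentinel)."""
    moved = 0
    for marker in SCRATCH.glob("*.done"):
        data_file = marker.with_suffix("")    # e.g. surf_000100.bin.done -> surf_000100.bin
        if data_file.exists():
            shutil.move(str(data_file), str(ARCHIVE / data_file.name))
            marker.unlink()
            moved += 1
    return moved

if __name__ == "__main__":
    ARCHIVE.mkdir(parents=True, exist_ok=True)
    while True:                               # runs alongside the multi-day simulation
        print(f"drained {drain_once()} files")
        time.sleep(600)                       # check every 10 minutes
```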

Page 28:

Middleware Implications for 3D Wave Propagation Simulations (continued)

• On-the-fly visualization for rapid validation of results
  – Verify before the full simulation is completed
• Standard protocols for data transfers and metadata registration into SRB-based storage

Page 29:

(1) Earthquake-Rupture Forecast (ERF)

Probability of all possible fault-rupture events (M ≥ ~5) for a region & time span

(2) Intensity-Measure Relationship (IMR)

Gives Prob(IMT ≥ IML) for a given site and fault-rupture event

Attenuation Relationships (traditional, no physics)

Full-Waveform Modeling (developmental, more physics), governed by the elastodynamic equations:
$\rho \ddot{u}_i = \sigma_{ij,j} + f_i$, with $\sigma_{ij} = \lambda \delta_{ij} \varepsilon_{pp} + 2\mu \varepsilon_{ij}$

Waveform-based Intensity Measure Relationship (CyberShake)

Page 30:

Intensity-Measure Relationship

Inputs: IMT, IML(s), Site(s), Rupture (plus a list of supported IMTs and a list of site-related independent parameters)
Output: Prob(IMT ≥ IML | Site, Rup)

Various IMR types (subclasses):
• Attenuation Relationships – a Gaussian distribution is assumed; mean and std. come from various parameters
• Simulation IMRs – exceedance prob. computed using a suite of synthetic seismograms
• Vector IMRs – compute joint prob. of exceeding multiple IMTs (Bazzurro & Cornell, 2002)
• Multi-Site IMRs – compute joint prob. of exceeding IML(s) at multiple sites (e.g., Wesson & Perkins, 2002)
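A minimal sketch of the subclass idea on this slide: every IMR answers Prob(IMT ≥ IML | Site, Rup), but different subclasses compute it differently. The class and method names are hypothetical illustrations, not the OpenSHA API.

```python
# Sketch: two IMR subclasses answering Prob(IMT >= IML | Site, Rupture).
# Class/method names are hypothetical, not the OpenSHA API.
import math
from abc import ABC, abstractmethod
from statistics import NormalDist

class IntensityMeasureRelationship(ABC):
    @abstractmethod
    def exceed_prob(self, iml: float, site, rupture) -> float:
        """Probability that the intensity measure exceeds level `iml`."""

class AttenuationRelationship(IntensityMeasureRelationship):
    """Traditional IMR: Gaussian (in log space), mean and std. from empirical parameters."""
    def exceed_prob(self, iml, site, rupture):
        mean, std = self._mean_and_std(site, rupture)   # magnitude, distance, site terms
        return 1.0 - NormalDist(mean, std).cdf(math.log(iml))
    def _mean_and_std(self, site, rupture):
        raise NotImplementedError                       # a specific empirical model goes here

class SimulationIMR(IntensityMeasureRelationship):
    """Waveform-based IMR: exceedance computed from a suite of synthetic seismograms."""
    def __init__(self, synthetic_im_values):
        self.values = list(synthetic_im_values)         # one IM value per rupture variation
    def exceed_prob(self, iml, site, rupture):
        return sum(v >= iml for v in self.values) / len(self.values)
```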

Page 31:

CyberShake Simulations Push Macro and Micro Computing

• CyberShake requires large forward wave propagation simulations and volumetric data storage
• CyberShake requires ~100k seismogram synthesis computations using multi-terabyte volumetric data sets; during synthesis processing, this data needs to be disk-based
• ~100k data files, plus their metadata, must be managed
• High-throughput requirements are driving the implementation toward a TeraGrid-wide computing approach
• High-throughput requirements are also driving integration of non-TeraGrid grids with TeraGrid

Page 32:

Example CyberShake Region (200km x 200km)

Site USC: 34.05, -118.24; region bounds: minLat = 31.889, minLon = -120.60, maxLat = 36.1858, maxLon = -115.70
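For illustration only, a sketch of how an approximate lat/lon bounding box for a square region around a site can be derived; the bounds listed above were produced by SCEC's own tools and span somewhat more than the nominal 200 km x 200 km, and the conversion factors below are rough.

```python
# Sketch: approximate lat/lon bounding box for a square region centred on a site.
# Uses the rough conversions 1 deg latitude ~ 111 km, 1 deg longitude ~ 111 km * cos(lat).
import math

def bounding_box(lat: float, lon: float, half_width_km: float):
    dlat = half_width_km / 111.0
    dlon = half_width_km / (111.0 * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

if __name__ == "__main__":
    # USC site from the slide, 100 km half-width for a nominal 200 km x 200 km region
    print(bounding_box(34.05, -118.24, 100.0))
```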

Page 33:

CyberShake Strain Green Tensor AWM

• Large (TeraShake-scale) forward calculations for each site
  – SHA typically ignores ruptures > 200 km from the site, so this is used as the cutoff distance
  – A 20 km buffer is used around the edges of the volume to reduce edge effects
  – 65 km depth to support the frequencies of interest
  – Volume is 440 km x 440 km x 65 km at 200 m spacing
• 1.573 billion mesh points
• Simulated time: 240 seconds
  – Volumetric data saved for 2 horizontal-component simulations
• Estimated storage per site: 7 TB (4.5 TB data + 2.5 TB checkpoint files)
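The mesh-point count follows directly from the stated volume and spacing:

$$\frac{440\,\mathrm{km}}{0.2\,\mathrm{km}} \times \frac{440\,\mathrm{km}}{0.2\,\mathrm{km}} \times \frac{65\,\mathrm{km}}{0.2\,\mathrm{km}} \;=\; 2200 \times 2200 \times 325 \;\approx\; 1.573 \times 10^{9}\ \text{mesh points}.$$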

Page 34:

Ruptures in ERF within 200 km of USC

43,227 ruptures in the Frankel-02 ERF with M 5.0 or larger within 200 km of USC

Page 35:

CyberShake Computational Elements

Page 36:

CyberShake Seismogram Synthesis

• Requires calculation of 100,000+ seismograms for each site
• Estimated rupture variations, scaled by magnitude:
  – Mw 5.0 × 1 = 20,450
  – Mw 6.0 × 10 = 216,990
  – Mw 7.0 × 100 = 106,900
  – Mw 8.0 × 1,000 = 9,000
  – Total: 353,340 rupture variations (× 2 components)
• The current estimated number of seismogram files per site is 43,000 (due to combining components and variations into a single file per rupture)
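As a consistency check (not stated on the slide): reading each "× N" as N rupture variations per rupture implies 20,450, 21,699, 1,069, and 9 ruptures in the four magnitude bins, so

$$20{,}450 + 21{,}699 + 1{,}069 + 9 = 43{,}227 \qquad\text{and}\qquad 20{,}450(1) + 21{,}699(10) + 1{,}069(100) + 9(1000) = 353{,}340,$$

matching the 43,227 ruptures within 200 km of USC on the earlier slide; one combined file per rupture then gives the ~43,000 seismogram files per site.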

Page 37:

CyberShake Seismogram Synthesis

• The seismogram synthesis stage requires disk-based storage of the large volumetric data sets, so a tape-based archive of volumetric data does not work
• To distribute seismogram synthesis across TeraGrid, we need to either duplicate terabytes of data or have global visibility of disk systems

Page 38:

Example Hazard Curve

Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years

Page 39:

Workflows Run Using Grid/VDS Workflow Tools

[Diagram: an application-dependent layer (Input Data Selector, Compositional Analysis Tool (CAT), Workflow Template, and an Abstract Workflow Service such as Montage) produces an Abstract Workflow; an application-independent layer (Chimera, Pegasus, Condor DAGMan) turns it into a Concrete Workflow whose jobs run on Grid resources and return the results]

Page 40:

Example Hazard Map Region (50 km x 50 km at 2 km grid spacing = 625 sites)

OpenSHA, SA 1.0 s, Frankel 2002 ERF and Sadigh IMR, with 10% probability of exceedance in 50 years

Page 41:

Summary of SCEC Experiences

• As soon as we develop a computational capability, the geophysicists develop applications that push the technology
  – Compute technology, data management technology, and resource sharing technology are all applied
• In many ways, the IT capabilities required for geophysical problems exceed what is currently possible, and this limits the state of knowledge in geophysics and public safety
  – For example, higher-frequency simulations are of significant interest, but exceed the computational and storage capabilities currently available

Page 42:

Major Middleware-Related Issues for SCEC/CME: Security and Allocation Management

• The lack of a widely accepted CA makes adding organizations to the SCEC grid problematic
• Ability to run under group allocations for "on demand" requests (a community allocation?)

Page 43:

Major Middleware-Related Issues for SCEC/CME: Software Installation and Maintenance

• The middleware software stack, even at supercomputer centers, should include support for "micro jobs" and interpreted languages such as Java
• Database management support for database-oriented tools such as metadata catalogs is important (backup, recovery, cleanup, performance, modifications)
• Guidelines for tools in the middleware software stack should describe when local installations are required and when remote installations are acceptable for tools such as RLS and MCS

Page 44:

Major Middleware-Related Issues for SCEC/CME: Supercomputing and Storage

• Globally (TeraGrid-wide) visible disk storage
• Well-supported, reliable file transfers, with monitoring and restart of problem transfers, are essential
• Interoperability between grid tools and data management tools such as SRB must cover data, metadata, and metadata search

Page 45:

Major Middleware-Related Issues for SCEC/CME: Scheduling Issues

• Support for Reservation-based scheduling

• Partial run and restart capability

• Failure detection and alerting

Page 46:

Major Middleware-Related Issues for SCEC/CME: Usability and Monitoring

• Monitoring tools that include the status of available storage resources
• On-the-fly visualizations for run-time validation of results
• Interfaces to workflow systems are complex, developer-oriented interfaces; easier-to-use interfaces are needed