DKRZ German Climate Computing Center
Distributed Data Handling Infrastructures in Climatology and “the Grid”
Stephan Kindermann <[email protected]>


Page 1: DKRZ German Climate Computing Center

DKRZ German Climate Computing Center

Stephan Kindermann <[email protected]>

Distributed Data Handling Infrastructures in Climatology

and “the Grid”

Page 2: DKRZ German Climate Computing Center

ECSAC09 Stephan Kindermann / DKRZ Veli Losinj, August 2009

Talk Context: From climatology to grid infrastructures

Climatology: the study of climate, scientifically defined as weather conditions averaged over a period of time; it is a branch of the atmospheric sciences (Wikipedia)

We concentrate on the part of climatology dealing with complex global climate models and especially on the aspect of data handling:

Climatology → global climate models → HPC computers (intro part of talk)

Huge amounts of model data → data handling infrastructure → grid (main focus of talk)

Page 3: DKRZ German Climate Computing Center


Grid infrastructures: From prototypes towards a sustainable infrastructure

Access to distributed heterogeneous data repositories

• A national grid project: C3Grid

• Prototype C3Grid/EGEE integration

An emerging worldwide infrastructure to support intercomparison and management of climate model data

Page 4: DKRZ German Climate Computing Center


Climate Models and HPC

Page 5: DKRZ German Climate Computing Center



Motivation: Unprecedented environmental change is indisputable

– The red areas on these two images show the expansion of seasonal melting of the Greenland ice sheet from 1992 to 2002.
– The yellow line shows that the temperature increased by 1 °C from 1900 to 2000.

Page 6: DKRZ German Climate Computing Center


(One) Question: Is the environmental change due to anthropogenic forcings?!

Models to understand the earth system are needed!

Page 7: DKRZ German Climate Computing Center


"Science may be described as the art of oversimplification: the art of discerning what we may with advantage omit."[Karl Popper, “The Open Universe”, Hutchinson, London (1982)]

But:

The earth system is complex, with many highly coupled subsystems (and often poorly understood coupling effects)

Hence the need for (complex) coupled General Circulation Models (GCMs), requiring tightly coupled HPC resources

Page 8: DKRZ German Climate Computing Center


Complex Earth System Models: Components

Page 9: DKRZ German Climate Computing Center


Atmosphere GCM: dynamics + physics ECHAM5, aerosols HAM (M7)

Ocean + ice GCM: dynamics + physics MPI-OM, biogeochemistry HAMOCC/DMS

Land model: hydrology HD, vegetation JSBACH

Example: The COSMOS Earth System Model

COSMOS: Community Earth System Model Initiative (http://cosmos.enes.org)

Page 10: DKRZ German Climate Computing Center


The complexity of models is increasing

Page 11: DKRZ German Climate Computing Center


Increasing Complexity, increasing computing demands

Page 12: DKRZ German Climate Computing Center


Complexity is just one dimension .. !

Disagreement about what terms mean:
What is a model?
What is a component?
What is a coupler?
What is a code base?

Page 13: DKRZ German Climate Computing Center


Thus the need for dedicated HPC resources ...

Page 14: DKRZ German Climate Computing Center


The DKRZ: A national facility for the climate community

(providing compute + data services)

Page 15: DKRZ German Climate Computing Center


The German Climate Computing Centre: DKRZ

DKRZ is unique in Europe as a national service in its combination of

• HPC

• Data services

• Applications consulting

Non-profit organization (GmbH) with 4 shareholders: MPG (6/11), HH/UniHH (3/11), GKSS (1/11), AWI (1/11); investment costs from BMBF (until now)

Hamburg „centre of excellence“ for climate-related studies

Page 16: DKRZ German Climate Computing Center


A brand new building ..

Page 17: DKRZ German Climate Computing Center


• 252x32 IBM System p575 Power6

• 8x 288 port Qlogic 4x DDR IB-Switch

.. for a brand new supercomputer

Power6 cluster and HPSS mover nodes connected to the same InfiniBand switches

• Storage Capacity 10 PB / year

• Archive Capacity 60 PB

Transfer Rates (proposed)

• 5 GB/s (peak)

• 3 GB/s (sustained)

Data migration from GPFS to HPSS

Page 18: DKRZ German Climate Computing Center


Compute power for the next generation of climate model runs ..

Linpack = 115.9 TFLOPS*

252 nodes = 8064 cores

76.4% of 152 TFLOPS peak

Aggregate transfer rate*

Write: 29 GB/s

Read: 32 GB/s

Single stream transfer rate

Write: 1.3 GB/s

Read: 1.2 GB/s

Metadata operations

10 k/s – 55 k/s

* 12x p575 I/O-Servers

Page 19: DKRZ German Climate Computing Center


Fine, but …

.. Centralized HPC Centers ..

.. Centralized Data Centers ..

.. And where is the „Grid“ perspective ??

[Ma:07]

Page 20: DKRZ German Climate Computing Center


The Climate Model Data Handling Problem

Modeling centers produce an exponentially growing amount of data, stored in distributed data centers

Integration of model data and observation data

Page 21: DKRZ German Climate Computing Center


Expected growth rate for data archive @ DKRZ

We are forced to limit data archiving to ~10 PB/year

Page 22: DKRZ German Climate Computing Center


Data management for the IPCC Assessment Report (AR4 vs. AR5)

AR4:
• Data volume: 10s of terabytes (10^12 bytes); downloads ~500 GB/day
• Models: 25 models
• Metadata: CF-1 + IPCC-specific
• User community: thousands of users; WG1, domain knowledge

AR5:
• Data volume: 1-10 petabytes (10^15 bytes); downloads 10s of TB/day
• Models: ~35 models; increased resolution, more experiments, increased complexity (e.g. biogeochemistry)
• Metadata: CF-1 + IPCC-specific; richer set of search criteria, model configuration, grid specification from CF (support for native grids)
• User community: 10s of thousands of users; a wider range of user groups will require better descriptions of data and attention to ease-of-use

Page 23: DKRZ German Climate Computing Center


Network Traffic, Climate and Physics Data, and Network Capacity (foil from ESG-CET)

All three data series are normalized to “1” at Jan. 1990. Ignore the units of the quantities being graphed (they are normalized to 1 in 1990) and just look at the long-term trends: all of the “ground truth” measures are growing significantly faster than ESnet projected capacity.

Page 24: DKRZ German Climate Computing Center


The problem: accessing data stored at distributed data centers all over the world

Move computation to data

Infrastructural (grid) support components needed

Page 25: DKRZ German Climate Computing Center


Collect & Prepare

Visualize4

Analyse

Find & Select

Distributed Climate Data

Model DataObservation Data

Analysis Dataset

Result Dataset

Scenario data

3

2

Data description

1

A typical scientific workflow

E-infrastructure components needed to support 1,2,3,4:

Data volume

“humidity flux”

workflow example:

Several PB

~3,1TB

(300-500 files)

~10,3GB

(28 files)

~76 MB

~6MB

~66KB

Page 26: DKRZ German Climate Computing Center


E-Science Infrastructures for Climate Data Handling

(1) A National Climate Community Grid:

The German Collaborative Climate Community

Data and Processing Grid (C3Grid) Project

Page 27: DKRZ German Climate Computing Center


C3Grid: Overview (architecture diagram)

C3Grid data providers: World Data Centers (Climate, Mare, RSAT), research institutes (PIK, GKSS, AWI, MPI-M, IFM-Geomar), universities (FU Berlin, Uni Köln), DWD, DKRZ, C3RC

Components shown: ISO 19139 discovery metadata catalog, data access interface (data + metadata), grid data / job interface (workflow, data + metadata), C3Grid data and job management middleware on top of D-Grid (SRM, dCache, ..), portal, collaborative grid workspace (A)(B), result data products + metadata

Page 28: DKRZ German Climate Computing Center


(A) Finding data

Description at aggregate level (e.g. experiment)

Aggregate extent description with multiple verticalExtent sections

Sub-selection in data request

C3Grid metadata description based on ISO 19139
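As a rough illustration of such an aggregate extent description (a sketch only, not an actual C3Grid record), ISO 19139 vertical extent sections can be assembled with Python's standard XML tooling; the gmd/gco namespaces are the standard ISO 19139 ones and the level values are invented:

```python
# Minimal sketch (not an actual C3Grid record): an ISO 19139 extent block
# with several verticalExtent sections, e.g. one per level range of an
# aggregated experiment. The numeric values are made up for illustration.
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

def vertical_extent(min_value: float, max_value: float) -> ET.Element:
    """One gmd:verticalElement / gmd:EX_VerticalExtent section."""
    elem = ET.Element(f"{{{GMD}}}verticalElement")
    vert = ET.SubElement(elem, f"{{{GMD}}}EX_VerticalExtent")
    for tag, value in (("minimumValue", min_value), ("maximumValue", max_value)):
        holder = ET.SubElement(vert, f"{{{GMD}}}{tag}")
        ET.SubElement(holder, f"{{{GCO}}}Real").text = str(value)
    return elem

extent = ET.Element(f"{{{GMD}}}EX_Extent")
extent.append(vertical_extent(1000.0, 10.0))   # e.g. pressure levels in hPa
extent.append(vertical_extent(0.0, 10.0))      # e.g. soil depths in m

print(ET.tostring(extent, encoding="unicode"))
```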

Page 29: DKRZ German Climate Computing Center


(A) Finding Data: The C3Grid Portal

Page 30: DKRZ German Climate Computing Center


(B) Accessing Data: Portal

Page 31: DKRZ German Climate Computing Center


(B) Accessing Data: Server Side

(Diagram: primary data / base data → pre-processing → compute resource → workspace)

A generic data request is served through a web service interface on top of a provider-specific data access interface:
• analysis / selection of preprocessing tools
• metadata generation
• geographical + vertical + temporal + content + file format selection
• data formats: netCDF, GRIB, HDF, XML, ..
• grid-based data management of data + metadata

Implementation examples:
• DB + archive wrapper (DKRZ, M&D)
• data warehouse (Pangaea)
• OGSA-DAI + DB (DWD)
• ....

Initial implementation: WSDL web service; next: WSRF web service
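On the provider side, the geographical + vertical + temporal + content sub-selection boils down to a subsetting step over the archived model output. A minimal sketch of such a step, assuming an xarray-readable netCDF file with CF-style coordinate names (time, lat, lon, plev); the file name, variable and coordinate names are assumptions, not the actual C3Grid interface:

```python
# Minimal sketch of the server-side sub-selection behind a generic data
# request: temporal + geographical + vertical + variable selection from a
# self-describing netCDF file. Names are placeholders.
import xarray as xr

def subset(path, variable, time_range, lat_range, lon_range, levels):
    ds = xr.open_dataset(path)
    da = ds[variable]
    da = da.sel(time=slice(*time_range),
                lat=slice(*lat_range),
                lon=slice(*lon_range))
    da = da.sel(plev=levels, method="nearest")   # vertical selection
    return da

# Example request: specific humidity over Europe for one season
q = subset("echam5_hus.nc", "hus",
           ("2000-06-01", "2000-08-31"),
           (35.0, 70.0), (-10.0, 40.0),
           levels=[85000.0, 50000.0])
q.to_netcdf("request_result.nc")                 # staged into the workspace
```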

Page 32: DKRZ German Climate Computing Center


Workflow Processing

(Diagram: portal → JSDL-based workflow description → workflow scheduler → RIS → compute resource → workspace; local resources and interfaces)

• GT4 WS-GRAM interfaces
• Preinstalled SW packages (use of „modules“ system)
• „modules“ info published to Grid Resource Information Service (MDS based)
• Scheduler controls execution (decision based e.g. on modules info + data availability)
• Initial set of fixed workflows integrated in the portal

Open issues:
• workflow composition support: interdependency between processing and data
• user-defined processing: debugging, substantial user support needed; security!
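As a rough illustration of the JSDL-based workflow description handed from the portal to the scheduler (a sketch only, not the actual C3Grid schema extensions), a single job definition might look like the document produced below; the executable, arguments and staging URI are invented:

```python
# Rough sketch of one JSDL job description such as the portal might pass to
# the workflow scheduler. Executable, arguments and the staging URI are
# invented; C3Grid adds its own extensions on top of plain JSDL.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

job = f"""<jsdl:JobDefinition xmlns:jsdl="{JSDL_NS}" xmlns:posix="{POSIX_NS}">
  <jsdl:JobDescription>
    <jsdl:JobIdentification>
      <jsdl:JobName>qflux-preprocessing</jsdl:JobName>
    </jsdl:JobIdentification>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>cdo</posix:Executable>
        <posix:Argument>vertmean</posix:Argument>
        <posix:Argument>analysis.nc</posix:Argument>
        <posix:Argument>result.nc</posix:Argument>
      </posix:POSIXApplication>
    </jsdl:Application>
    <jsdl:DataStaging>
      <jsdl:FileName>analysis.nc</jsdl:FileName>
      <jsdl:Source><jsdl:URI>gsiftp://provider.example.org/workspace/analysis.nc</jsdl:URI></jsdl:Source>
    </jsdl:DataStaging>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""

print(job)
```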

Page 33: DKRZ German Climate Computing Center


(Diagram: C3Grid architecture: portal, DIS, DMS, workflow scheduler, RIS in the distributed grid infrastructure; local resources and interfaces at the providers: primary data + metadata, databases, pre-processing, compute resources, workspace)

C3Grid data / compute providers: World Data Centers (Climate, Mare, RSAT), research institutes (PIK, GKSS, AWI, MPI-M), universities (FU Berlin, Uni Köln), DWD, IFM-Geomar, DKRZ

Page 34: DKRZ German Climate Computing Center


C3Grid Security Infrastructure:

Shibboleth + GSI + VOMS / SAML attributes embedded in grid certificates …

I omit details in this talk ..

Page 35: DKRZ German Climate Computing Center


E-Science Infrastructures for Climate Data Handling

(2) Climate data handling in an international Grid infrastructure: The C3Grid / EGEE Prototype

Page 36: DKRZ German Climate Computing Center


Collect & Prepare

Visualize4

Analyse

Find & Select

AWI, GKSS, …

World Data Centers

Analysis Dataset

Result Dataset

DKRZ

3

2

1

C3Grid: community specific tools and agreements

• Standardized data description

• Uniform data access with preprocessing functionality

• Grid based data delivery

EGEE: Approved international grid infrastructure

• mature middleware

• secure and consistent data management

• established 7-24 support infrastructureC3Grid Middleware

Page 37: DKRZ German Climate Computing Center


Bridging EGEE and C3Grid (diagram)

German climate data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS (data resource + metadata)

C3Grid side: Web Portal C3 with Lucene index and OAI-PMH server, C3Grid data interface (webservice interface), climate data workspace

EGEE side: UI, CE with worker nodes (WN), SE, LFC catalog, AMGA metadata catalog, OAI-PMH server, webservice interface

Metadata flow: the providers publish ISO 19115/19139 records (a), which are harvested via OAI-PMH (b) into the portal index; records are likewise published (f) and harvested via OAI-PMH (g) into the AMGA metadata catalog on the EGEE side.
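Metadata moves between the providers, the C3 portal and the EGEE-side catalog via standard OAI-PMH harvesting. A minimal sketch of such a harvesting loop; the endpoint URL and the metadataPrefix value are placeholders, while the verbs and the resumptionToken mechanism are standard OAI-PMH:

```python
# Minimal sketch of an OAI-PMH harvesting loop as used between the metadata
# providers and the portal / AMGA-side catalog. Endpoint and metadataPrefix
# are placeholders; the protocol elements are standard OAI-PMH.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
ENDPOINT = "https://provider.example.org/oai"    # hypothetical provider endpoint

def harvest(metadata_prefix="iso19139"):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = ENDPOINT + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            tree = ET.fromstring(resp.read())
        for record in tree.iter(OAI_NS + "record"):
            yield record                          # hand the ISO record to the indexer
        token = tree.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break                                 # harvest complete
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for rec in harvest():
    header = rec.find(OAI_NS + "header/" + OAI_NS + "identifier")
    print(header.text if header is not None else "<no identifier>")
```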

Page 38: DKRZ German Climate Computing Center


Finding Data

Page 39: DKRZ German Climate Computing Center


Accessing Data (diagram, same components as on the previous slide)

(1) Find & Select, (2) Collect & Prepare:
(a) request (webservice) → (b) retrieve (JDBC or archive) → (c) stage & provide → (d) notify → (e) request (webservice) → (f) transfer & register (lcg-tools) → (g) register (Java API); the resulting metadata is published (ISO 19115/19139).

Page 40: DKRZ German Climate Computing Center


Trigger qflux workflow (diagram, same components as on the previous slides)

(3) Analyse, (4) Visualize:
(a) request (webservice) → (b) submit the qflux job (gLite) to the CE / worker nodes → (c) retrieve (lcg-tools) → (d) update (Java API) → (e) return graphic; (f) publish (ISO 19115/19139), (g) harvest (OAI-PMH).

Page 41: DKRZ German Climate Computing Center


Talk Overview

The context: climate models and HPC
• A national climate research facility: the DKRZ

Climate data handling e-/grid infrastructures; bridging heterogeneity: access to distributed data repositories
• A national grid project: C3Grid
• Prototype C3Grid/EGEE integration

An emerging infrastructure to support intercomparison and management of climate model data (in the context of CMIP5 and IPCC AR5)

Page 42: DKRZ German Climate Computing Center


Motivation (1): Different models, different results

(Figure: change in mean annual temperature (°C), SRES A2, as simulated by the CCMa, ECHAM, GFDL and HADCM models)

Page 43: DKRZ German Climate Computing Center


Motivation (2): Complexity adds uncertainty and new data intercomparison requirements !

Friedlingstein et al., 2006

„Carbon cycle feedbacks are likely to play a critical role in determining the atmospheric concentration of CO2 over the coming centuries (Friedlingstein et al. 2006; Denman et al. 2007; Meehl et al. 2007)” – taken from Climate-Carbon Cycle Feedbacks: The implications for Australian climate policy, Andrew Macintosh and Oliver Woldring, CCLP Working Paper Series

Coupled Carbon Cycle Climate Model Intercomparison Project

Page 44: DKRZ German Climate Computing Center


The Climate Model Intercomparison Project (CMIP)

• There are different, highly complex global coupled atmosphere-ocean general circulation models (‘climate models’)

• They provide different results over the next decades and longer timescales

Intercomparisons are necessary to discover why and where different models give different output, or to detect ‘consensus’ aspects.

The World Climate Research Programme's Working Group on Coupled Modelling (WGCM) proposed and developed CMIP (now in phase 5).

CMIP5 will provide the basis for the next Intergovernmental Panel on Climate Change Assessment Report (AR5), which is scheduled for publication in 2013.

Page 45: DKRZ German Climate Computing Center


Data management for the IPCC Assessment Report (AR4 vs. AR5)

AR4:
• Data volume: 10s of terabytes (10^12 bytes); downloads ~500 GB/day
• Models: 25 models
• Metadata: CF-1 + IPCC-specific
• User community: thousands of users; WG1, domain knowledge

AR5:
• Data volume: 1-10 petabytes (10^15 bytes); downloads 10s of TB/day
• Models: ~35 models; increased resolution, more experiments, increased complexity (e.g. biogeochemistry)
• Metadata: CF-1 + IPCC-specific; richer set of search criteria, model configuration, grid specification from CF (support for native grids)
• User community: 10s of thousands of users; a wider range of user groups will require better descriptions of data and attention to ease-of-use

Page 46: DKRZ German Climate Computing Center


An emerging worldwide infrastructure for climate model data intercomparison

The scene:
• CMIP5 / IPCC AR5
• ESG-CET (Earth System Grid - Center for Enabling Technologies)
• IS-ENES and Metafor FP7 programs

Page 47: DKRZ German Climate Computing Center


(Diagram: IPCC core, gateways (Tier 1) and data nodes: GB (BADC), US (PCMDI), DE (WDCC, DKRZ))

Data nodes:
• holding data from individual modeling groups

Gateways:
• search and access services to data
• often co-located with (big) data nodes
• roadmap: Curator+ESG in the US, Metafor+IS-ENES in Europe

Core nodes:
• providing CMIP5-defined CORE data (on rotating disks)
• roadmap: several in the US, two in Europe (BADC, WDCC) and one in Japan

The CMIP5 federated architecture:

Federation is a virtual trust relationship among independent management domains that have their own set of services. Users authenticate once to gain access to data across multiple systems and organizations.

Page 48: DKRZ German Climate Computing Center


CMIP5:

> 20 modelling centres

> 50 numerical experiments

> 86 simulations (total ensemble members) within experiments

> 6500 years of simulation

> Data to be available from “core-nodes” and “modelling-nodes” in a global federation.

> Users need to find & download datasets, and discriminate between models, and between simulation characteristics.

CMIP5, IPCC-AR5, Timeline:

- Simulations Starting in mid-2009.

- Model and Simulation Documentation needed in 2009 (while models are running).

- Data available: end of 2010

- Scientific Analysis, Paper Submission and Review: early to mid 2012 (current absolute deadline, July).

- Reports: early 2013!

Page 49: DKRZ German Climate Computing Center


Page 50: DKRZ German Climate Computing Center


An emerging worldwide infrastructure for climate model data intercomparison

The scene:

CMIP5 / IPCC AR5

ESG-CET (Earth System Grid – Center for enabling technologies)

Page 51: DKRZ German Climate Computing Center


Architecture of AR5 federation based on ESG (diagram)

AR5 ESG Gateway (PCMDI): user registration, security services, monitoring services, metadata services, notification services, metrics services, replica location services, replica management, services startup/shutdown, OPeNDAP/OLFS (aggregation), product server, publishing (harvester), storage management, backend analysis and vis engine, workflow

ESG Node (GFDL): access control, HTTP/FTP/GridFTP servers, metrics services, publishing (extraction), publication GUI, OPeNDAP/OLFS, OPeNDAP/BES, backend analysis and vis engine, monitoring info provider, storage management, disk cache, deep archive, online data

Further ESG gateways (CCES, CCSM) with centralized metrics services and centralized security services, each serving several ESG nodes; nodes JOIN the FEDERATION

Legend: data provider, service API / service implementation, client, global services, AR5 mandatory component

ESG-CET architecture

Page 52: DKRZ German Climate Computing Center


Security infrastructure:

Web-based single sign-on (SSO):
• Authentication based on OpenID
• Authorization based on Attribute Service
• Details omitted in this talk ..

Page 53: DKRZ German Climate Computing Center


That's the basic technology, but …..

… to compare model data we need a common understanding / common language ….

• Metadata definition in the Metafor FP7 project (EU)

(Metafor is cooperating with the US metadata initiative - the Earth System Curator project)

Page 54: DKRZ German Climate Computing Center


Metafor: Metadata Definition

An activity uses software to produce data to be archived in a repository.
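The sentence above summarizes the Metafor top-level conceptual picture. A minimal sketch of that picture as plain data classes; the field names and example values are illustrative only, not the normative CIM schema:

```python
# Minimal sketch of "an activity uses software to produce data to be archived
# in a repository", expressed as plain data classes. Field names and example
# values are illustrative, not the normative Metafor/CIM schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Software:            # e.g. a model component such as an atmosphere GCM
    name: str
    version: str

@dataclass
class Data:                # e.g. one simulation output dataset
    identifier: str
    variables: List[str]

@dataclass
class Repository:          # e.g. a data node / gateway holding the archive
    name: str
    url: str

@dataclass
class Activity:            # e.g. a numerical experiment or simulation run
    name: str
    uses: List[Software] = field(default_factory=list)
    produces: List[Data] = field(default_factory=list)

run = Activity(
    name="historical r1i1p1",
    uses=[Software("ECHAM5", "x.y"), Software("MPI-OM", "x.y")],
    produces=[Data("example.dataset.mon.hus", ["hus"])],
)
archive = Repository("WDCC", "https://cera-www.dkrz.de")
print(f"{run.name} archived at {archive.name}")
```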

Page 55: DKRZ German Climate Computing Center


Metafor is also defining a common vocabulary ..

Page 56: DKRZ German Climate Computing Center


An emerging worldwide infrastructure for climate model data intercomparison

The scene:

CMIP5 / IPCC AR5

ESG-CET (Earth System Grid – Center for enabling technologies)

The Metafor FP7 project

European deployment: The IS-ENES FP7 project

Page 57: DKRZ German Climate Computing Center


Infrastructure for European Network for Earth System Modelling (IS-ENES)

• IS-ENES will provide a service for models and model results both to modelling groups and to the users of model results, especially the impact community.

• Joint research activities will improve:
– efficient use of high-performance computers
– model evaluation tool sets
– access to model results
– climate services for the impact community

• Networking activities will
– increase the cohesion of the European ESM community
– advance a coherent European Network for Earth System modelling

• A 4-year FP7 project, starting March 2009

• Led by IPSL, 20 partners

Page 58: DKRZ German Climate Computing Center


IS-ENES data services

(Diagram: core data nodes, large data node, ancillary data node, supercomputer, server cluster; v.E.R.C.: virtual Earth System Resource Centre)

Enhancing European data services infrastructure

– OGC service infrastructure

– Access to distributed data and processing resources

– Integration into CMIP5 federation

Page 59: DKRZ German Climate Computing Center


Summary: Infrastructure building for climate model data intercomparison

• CMIP5 / AR5: a big problem: climate model data management
• ESG-CET: a technology provider: .. + „grid“
• IS-ENES: common portal + resource sharing
• Metafor: a community vocabulary and a common conceptual model
• E-Infra: from a community nebula towards a community e-infrastructure !?

Page 60: DKRZ German Climate Computing Center


Is this only for the climate model community ?

What about related communities ?

Page 61: DKRZ German Climate Computing Center


Climate Impact Community

(Diagram: international climate model data federation: IPCC core (Tier 0), gateways (Tier 1): GB (BADC), US (PCMDI), DE (WDCC, DKRZ), and data nodes (Tier 2); IS-ENES portal and impact community portal on top)

IS-ENES plan:
• OGC interfaces
• Analysis services
• ..

A long way to go towards standardized interfaces / services..

Page 62: DKRZ German Climate Computing Center


(Diagram: international climate model data federation (IPCC AR5): IPCC core (Tier 0), gateways (Tier 1): GB (BADC), US (PCMDI), DE (WDCC, DKRZ), and data nodes (Tier 2); C3Grid data nodes: WDC RSAT, WDC Mare, DWD, ….)

C3Grid infrastructure: portal, data life cycle management (Datenlebenszyklusverwaltung), workflow management, virtual workspace; DKRZ, Uni Köln

Climate model data analysis in the (proposed) C3-INAD project

Page 63: DKRZ German Climate Computing Center


Summary (1): Infrastructure building/using experience

• (Prototype) grid infrastructures:
C3Grid (in the context of D-Grid), C3Grid/EGEE
heterogeneous data integration, few users so far

• New infrastructure building effort for a highly demanding community problem:
CMIP5/IPCC data federation and associated e-infrastructure initiatives
community-specific e-infrastructure components, lots of users, a „must not fail“ project ..!

Page 64: DKRZ German Climate Computing Center


Summary (2): A social perspective

The term scientific cyberinfrastructure refers to a new research environment.

BUT: a cyberinfrastructure is NOT only technical.

A cyberinfrastructure is also an infrastructure with heterogeneous participants (informatics, domain scientists, technologists, etc.), organizational and political practices and social norms. Therefore developing cyberinfrastructure is a technical and a social endeavor!

[Thanks to Sonja Palfner (TU Darmstadt) for the following foils]

Page 65: DKRZ German Climate Computing Center


„Speaking of cyberinfrastructure as a machine to be built or technical system to be designed tends to downplay the importance of social, institutional, organizational, legal, cultural, and other non-technical problems developers always face.“ (Edwards et al. 2007: 7)

Page 66: DKRZ German Climate Computing Center


Example:

Monitoring, Modeling and Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures (2008-2011)

The project investigates different cases of cyberinfrastructure developments: Long Term Ecological Research Network, the Center for Embedded Networked Sensing, the WATer and Environmental Research Systems Network, and the Earth System Modeling Framework.

Objective: to understand how scientists actually create and share data in practice, and how they use it to create new knowledge.

(www.si.umich.edu/~pne/mmm.htm)

The National Science Foundation (NSF) pays attention to this complexity of cyberinfrastructure developments.

Page 67: DKRZ German Climate Computing Center


What can social sciences bring to cyberinfrastructure developments?

Reflection on the social challenges and problems within cyberinfrastructures.

Making the social, political and cultural dimensions visible.

Understanding the larger national and transnational context of cyberinfrastructures in different scientific cultures.

Analyzing the conditions for successful cyberinfrastructure projects and „best practices“.

Social scientists can „act as honest brokers between designers and users, explaining the contingencies of each to the other and suggesting ways forward“.

(Edwards et al. 2007:34)

Page 68: DKRZ German Climate Computing Center


Thank You !

Page 69: DKRZ German Climate Computing Center


Appendix – Additional foils …

Page 70: DKRZ German Climate Computing Center


(Figure: a data cube with dimensions time, level and variable) [David Viner, CRU]

• High-volume „gridded“ datasets

• self-describing („container“) data formats (netCDF, GRIB, HDF)
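Self-describing here means that dimensions, coordinates and variable attributes travel inside the file itself. A minimal sketch of inspecting that embedded description with the netCDF4 library; the file name is a placeholder:

```python
# Minimal sketch of what "self-describing" means in practice: dimensions,
# variables and their attributes can be read from the file itself.
# The file name is a placeholder.
from netCDF4 import Dataset

with Dataset("model_output.nc") as nc:
    print("global attributes:", {k: nc.getncattr(k) for k in nc.ncattrs()})
    print("dimensions:", {name: len(dim) for name, dim in nc.dimensions.items()})
    for name, var in nc.variables.items():
        units = getattr(var, "units", "?")
        print(f"{name}{var.dimensions}: {units}")
```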

Page 71: DKRZ German Climate Computing Center


Page 72: DKRZ German Climate Computing Center


Data access and security

(Diagram: the C3Grid architecture as before: portal, DIS, DMS, workflow scheduler, RIS in the distributed grid infrastructure; local resources and interfaces (primary data + metadata, databases, pre-processing, compute resources, workspace) at the C3Grid data / compute providers and World Data Centers; AA (authentication and authorization) points are marked)

• single sign-on
• support of users without grid certificates
• federated identity management

• X.509 grid certificates (EU-GridPMA CA)
• Grid Security Infrastructure (GSI)

• legacy AA infrastructure (LDAP, DB based, ..)
• legacy data access infrastructure

Page 73: DKRZ German Climate Computing Center


(Diagram: C3Grid security setup: identity providers at the home organisation and the virtual organisation, WAYF, portal with workflow client and GridShib SAML tools, SLCS (CA) issuing X.509 grid proxies that carry SAML assertions („home attributes + VO attributes“), MyProxy, delegation service, GridShib for GT policy at the grid services / grid resource (GRAM / DataRAM), personal / group accounts, C3Grid middleware)

Page 74: DKRZ German Climate Computing Center


(Diagram: the workflow again: 1 Find & Select relevant & available datasets from distributed climate data (temperature, specific humidity, wind speed), 2 Collect & Prepare a temporal and spatial subset of the data (analysis dataset), 3 Analyse the integrated transport of humidity between selected levels (result dataset), 4 Visualize selected result)

User concerns:
• I want to control where my job is running!!
• Uniform discovery for these data centers is nice, but I also need data from ….
• I need version xx of yy and …
• I want to know exactly what's happening, e.g. I need reproducible results
• I don't want to learn a new job description language or get a certificate to do a simple analysis …
• Debugging???!!! What went wrong???
• Data collection is fine, but I don't need a „grid“ to get my results!!

Page 75: DKRZ German Climate Computing Center


ESG-CET

Earth System Grid - Center for Enabling Technologies (ESG-CET)

• Will deliver a federation architecture capable of allowing data held at “nodes” to be visible via “gateways”.

• Support for CMIP5 via “modelling-nodes” and “core-nodes”, with the former holding all the data from one modelling group, and the latter holding the CMIP5 defined “core” data.

• Expect multiple “core-nodes”, with two in Europe (BADC, WDCC), several in the US, and one in Japan.

• Expect multiple gateways (Metafor+IS-ENES in Europe, Curator+ESG in the US). ESG is led by the U.S. Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory.

Page 76: DKRZ German Climate Computing Center


Page 77: DKRZ German Climate Computing Center


Compare !! What ???

Disagreement about what terms mean

What is a model?

What is a component?

What is a coupler?

What is a code base?

What is a (canonical) dataset (data-aggregate)?

What is a model configuration?

Little or no documentation of the “simulation context” (the whys and wherefores and issues associated with any particular simulation).

Need to collect information from modelling groups !!!!

Page 78: DKRZ German Climate Computing Center


Metafor

Common Metadata for Climate Modelling Digital Repositories (http://metaforclimate.eu)

SEVENTH FRAMEWORK PROGRAMME, Research Infrastructures

INFRA-2007-1.2.1 - Scientific Digital Repositories

METAFOR describes activities, software, and data involved in the simulation of climate so that “models” can be discovered and compared between distributed digital repositories

Page 79: DKRZ German Climate Computing Center


• “scientific” words end up in controlled vocabularies

• definitions of “other” end up in description

• choices end up in values

.. and METAFOR is responsible for the CMIP5 metadata questionnaire

[from Bryan Lawrence]

Page 80: DKRZ German Climate Computing Center


Page 81: DKRZ German Climate Computing Center


(Diagram: the C3Grid / EGEE bridge as before: Web Portal C3 with Lucene index and OAI-PMH server, C3Grid data interface (webservice interface), climate data workspace, data resource + metadata; EGEE UI, CE with worker nodes, SE, LFC catalog, AMGA/… metadata catalog, OAI-PMH server. Metadata: publish (ISO 19115/19139), harvest (OAI-PMH). Data: download, preprocessing & analysis (webservice); download, upload & analysis incl. republishing (webservice).)

Nice early prototype, but ..

Community?? Users?? ..

Page 82: DKRZ German Climate Computing Center


further info:

www.c3grid.de

[email protected]

Page 83: DKRZ German Climate Computing Center


(Figure: CMIP5 experiment design; pink = core, yellow = tier 1, green = tier 2)

“A Summary of the CMIP5 Experiment Design”

Lead authors: Karl E. Taylor, Ronald J. Stouffer and Gerald A. Meehl.

31 December 2008

The Climate Model Intercomparison Project CMIP5

Page 84: DKRZ German Climate Computing Center


A concrete example: “qflux”

1. Find & Select relevant & available datasets from the distributed climate data (temperature, specific humidity, wind speed)
2. Collect & Prepare a temporal and spatial subset of the data (analysis dataset)
3. Analyse the integrated transport of humidity between selected levels (result dataset)
4. Visualize selected result

Location moves from various data centers & portals over institutional storage & computing facilities and local facilities to the personal computer; the data volume shrinks accordingly: several PB → ~3.1 TB (300-500 files) → ~10.3 GB (28 files) → ~76 MB → ~6 MB → ~66 KB
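Step 3 (analysing the integrated transport of humidity between selected levels) amounts to a vertical integral of the moisture flux over pressure, Q = (1/g) ∫ q v dp. A minimal sketch of that analysis step with xarray/numpy, assuming CF-style variable and coordinate names (hus, ua, va, plev in Pa) and a plev axis ordered bottom-up; this is an illustration only, not the actual C3Grid qflux tool:

```python
# Minimal sketch of the qflux analysis step: vertically integrated humidity
# transport Q = (1/g) * integral of q*wind dp between two pressure levels.
# Variable/coordinate names (hus, ua, va, plev in Pa, plev decreasing with
# height) are assumptions about the analysis dataset.
import numpy as np
import xarray as xr

G = 9.81  # gravitational acceleration, m s-2

def humidity_transport(ds, p_top=30000.0, p_bot=100000.0):
    """Return zonal/meridional vertically integrated moisture flux (kg m-1 s-1)."""
    col = ds.sel(plev=slice(p_bot, p_top))       # plev ordered bottom -> top
    qu = col["hus"] * col["ua"]                  # q * u on each level
    qv = col["hus"] * col["va"]                  # q * v on each level
    # integrate over pressure; sign flipped because plev decreases upward
    qu_int = -qu.integrate("plev") / G
    qv_int = -qv.integrate("plev") / G
    return qu_int, qv_int

ds = xr.open_dataset("analysis_dataset.nc")     # step-2 output (placeholder name)
qu_int, qv_int = humidity_transport(ds)
np.hypot(qu_int, qv_int).rename("qflux").to_netcdf("qflux_result.nc")
```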

Page 85: DKRZ German Climate Computing Center


A common metadata description

(Simplified) system overview (diagram: graphical user interface, query, metadata (model metadata, model run output), models, simulation datasets)