
9-Sept-2003 CAS2003, Annecy, France, WFS 1

Distributed Data Management at DKRZ

Wolfgang Sell
Hartmut Fichtel

Deutsches Klimarechenzentrum GmbH
[email protected], [email protected]

9-Sept-2003 CAS2003, Annecy, France, WFS 2

Table of Contents

• DKRZ - a German HPC Center
• HPC System Architecture suited for Earth System Modeling
• The HLRE Implementation at DKRZ
• Implementing IA64/Linux based Distributed Data Management
• Some Results
• Summary

9-Sept-2003 CAS2003, Annecy, France, WFS 3

DKRZ - a German HPCC

• Mission of DKRZ
• DKRZ and its Organization
• DKRZ Services
• Model and Data Services

9-Sept-2003 CAS2003, Annecy, France, WFS Page 4

Mission of DKRZ

In 1987 DKRZ was founded with the Mission to

• Provide state-of-the-art supercomputing and data service to the German scientific community to conduct top of the line Earth System and Climate Modelling.
• Provide associated services including high level visualization.

9-Sept-2003 CAS2003, Annecy, France, WFS Page 5

DKRZ and its Organization (1)

Deutsches KlimaRechenZentrum = DKRZ
German Climate Computer Center

• organised under private law (GmbH) with 4 shareholders
• investments funded by federal government, operations funded by shareholders
• usage 50 % shareholders and 50 % community

9-Sept-2003 CAS2003, Annecy, France, WFS Page 6

DKRZ and its Organization (2)

DKRZ internal Structure

• 3 departments for
  • systems and networks
  • visualisation and consulting
  • administration
• 20 staff in total
• until the restructuring at the end of 1999, a fourth department supported climate model applications and climate data management

9-Sept-2003 CAS2003, Annecy, France, WFS Page 7

DKRZ Services

• operations center: DKRZ
• technical organization of computational resources (compute, data and network services, infrastructure)
• advanced visualisation
• assistance for parallel architectures (consulting and training)

9-Sept-2003 CAS2003, Annecy, France, WFS Page 8

Model & Data Services

competence center: Model & Data

• professional handling of community models
• specific scenario runs
• scientific data handling

Model & Data Group external to DKRZ, administered by MPI for Meteorology, funded by BMBF

9-Sept-2003 CAS2003, Annecy, France, WFS 9

HPC System Architecture suited for Earth System Modeling

• Principal HPC System Configuration
• Links between Different Services
• The Data Problem

9-Sept-2003 CAS2003, Annecy, France, WFS Page 10

Principal HPC System Configuration

9-Sept-2003 CAS2003, Annecy, France, WFS Page 11

Link between Compute Power and Non-Computing Services

• Functionality and Performance Requirements for Data Service
• Transparent Access to Migrated Data
• High Bandwidth for Data Transfer
• Shared Filesystem
• Possibility for Adaptation in Upgrade Steps due to Changes in Usage Profile

9-Sept-2003 CAS2003, Annecy, France, WFS Page 12

Compute server power

[Figure: installed compute power (peak) in GFlops, logarithmic axis from 0.1 to 10,000]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 13

Adaptation Problem for Data Server

[Figure: "Data problem in HPC". Data generation rate in TByte/year (0 to 3,000) plotted against effective compute power P in GFlops (0 to 500). Curves for data increase: linear (P^1), P^(3/4), P^(2/3).]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 14

Pros of Shared Filesystem Coupling

• High Bandwidth between the Coupled Servers
• Scalability supported by Operating System
• No Need for Multiple Copies
• Record Level Access to Data with High Performance
• Minimized Data Transfers

9-Sept-2003 CAS2003, Annecy, France, WFS Page 15

Cons of Shared Filesystem Coupling

• Proprietary Software needed
• Standardisation still missing
• Limited Number of Vendors whose Systems can be connected

9-Sept-2003 CAS2003, Annecy, France, WFS 16

HLRE Implementation at DKRZ

HöchstLeistungsRechnersystem für die Erdsystemforschung = HLRE
High Performance Computer System for Earth System Research

• Principal HLRE System Configuration
• HLRE Installation Phases
• IA64/Linux based Data Services
• Final HLRE Configuration

9-Sept-2003 CAS2003, Annecy, France, WFS Page 17

Principal HLRE System Configuration

9-Sept-2003 CAS2003, Annecy, France, WFS Page 18

HLRE Phases

                                                    Phase 1    Phase 2    Phase 3
Date                                                Feb 2002   4Q 2002    3Q 2003
Nodes                                               8          16         24
CPUs                                                64         128        192
Expected Sustained Performance [Gflops]             ca. 200    ca. 350    ca. 500
Expected Increase in Throughput vs. CRAY C916       ca. 40     ca. 75     ca. 100
Main Memory [Tbytes]                                0.5        1.0        1.5
Disk Capacity [Tbytes]                              ca. 30     ca. 50     ca. 60
Mass Storage Capacity [Tbytes]                      >720       >1400      >3400

9-Sept-2003 CAS2003, Annecy, France, WFS Page 19

DS phase 1: basic structure

• CS performance increase
  • f = 37
  • F = f^(3/4) = 15
• minimal component performance indicated in diagram
• explicit user access
  • ftp, scp ...
• CS disks with local copies
• DS disks for cache
• physically distributed DS
• NAS architecture

[Diagram: CS client(s), DS and other clients, connected via GE. Values shown: 180 MB/s, 45 MB/s, 150 MB/s, 375 MB/s, 16.5 TB, ~ PB, 11 TB.]
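The scaling figure on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the rule stated above that the data service factor F grows as the 3/4 power of the compute server performance increase f. The function name `data_service_factor` and its `exponent` parameter are illustrative and do not come from the slides.

```python
def data_service_factor(f: float, exponent: float = 0.75) -> float:
    """Return F = f**exponent, the scaling factor applied to the data service.

    f is the compute server (CS) performance increase; exponent=0.75
    corresponds to the P^(3/4) rule quoted on the slide.
    """
    if f <= 0:
        raise ValueError("performance increase f must be positive")
    return f ** exponent


if __name__ == "__main__":
    # f values quoted in the slides for phase 1 and for phases 2 and 3
    for f in (37, 63, 100):
        print(f"f = {f:>3}: F = {data_service_factor(f):.1f}")
```

Passing `exponent=2/3` or `exponent=1.0` gives the other two growth assumptions shown in the "Data problem in HPC" plot.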

9-Sept-2003 CAS2003, Annecy, France, WFS Page 20

Adaptation Option for Data Server

[Figure: "Data problem in HPC". Data generation rate in TByte/year (0 to 3,000) plotted against effective compute power P in GFlops (0 to 500). Curves for data increase: linear (P^1), P^(3/4), P^(2/3).]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 21

DS phases 2,3: basic structure

• CS performance increase
  • f = 63/100
  • F = f^(3/4) = 22.4/31.6
• minimal component performance indicated in diagram
• implicit user access
  • local UFS commands
• CS disks with local copies
• shared disks (GFS)
• DS disks for IO buffer cache
• Intel/Linux platforms
  • homogeneous HW
• technological challenge

[Diagram: CS client(s), DS and other clients, connected via GE and FC. Values shown: 270/325 MB/s, 70/80 MB/s, 225/270 MB/s, 560/675 MB/s, 16.5 TB, ~ PB, 25/30 TB, 11 TB.]

9-Sept-2003 CAS2003, Annecy, France, WFS 22

Implementing IA64/Linux based Distributed Data Management

• Overall Phase 1 Configurations
• Introducing Linux based Distributed HSM
• Introducing Linux based Distributed DBMS
• Final Overall Phase 3 Configuration

9-Sept-2003 CAS2003, Annecy, France, WFS Page 23

Proposed final phase 3 configuration

[Diagram: proposed final phase 3 configuration. Labels recoverable from the diagram:]

• Compute server: SX-6 nodes, IXS, 24 nodes
• Servers: AsAmA 16way GFS/Server (UVDM, UDSN); AsAmA 4way GFS/Client with Oracle (UDSN/UDNL); AzusA 16way GFS/Server, post processing system (UCFM/UDSN)
• GFS Disk (Polestar): 0.28 TB x 53 = 14.8 TB
• Disk Cache (Polestar): 0.57 TB x 15 = 8.5 TB
• Disk Cache (DDN): 0.69 TB x 12 = 8.3 TB
• Local Disk FC-RAID: 0.28 TB x 20 = 5.6 TB
• Local Disk (Polestar): 0.14 TB x 2 = 0.28 TB
• Oracle DB (DDN): 2 TB x 4 = 8 TB
• Tape drives: 9940B x 20, 9840B x 0, 9840C x 5
• Switch: Silkworm 12000
• Interconnects: Fibre Channel, Gigabit Ethernet, HS/MS LAN (GE x 48), FE x 2/node for PolestarLite
• Database access: Oracle Application Server, SQLNET, Sun 4CPU, AsamA 4CPU, the Internet

Migration upon market availability of components

9-Sept-2003 CAS2003, Annecy, France, WFS 24

Some Results

• Growth of the Data Archive
• Growth of Transfer Rates
• Observed Transfer Rates for HLRE
• FLOPS Rates

9-Sept-2003 CAS2003, Annecy, France, WFS Page 25

DS archive capacity [TB]

[Figure: archive capacity in TB (0 to 600), 1992 to 2002. Series: original, duplicates.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 26

DS archive capacity (2001-2003)

[Figure: archive capacity in TB (0 to 1,000), Sep 01 to Jun 03. Series: original, duplicates.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 27

DS transfer rates [GB/day]

[Figure: daily transfer volume in GB (0 to 3,000), 1992 to 2002. Series: fetch, store.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 28

DS transfer rates (2001-2003)

[Figure: daily transfer volume in GB (0 to 4,000), Sep 01 to Jun 03. Series: fetch, store.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 29

DS transfer rates (2001-2003)

[Figure: daily transfer volume in GB (0 to 10,000), Sep 01 to Jun 03. Series: minimum, average, maximum.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 30

Observed Transfer Rates for HLRE

Link                                 Single stream [MB/s]   Aggregate [MB/s]
CS -> DS via ftp (12.1 SUPER-UX)     13                     100
CS -> DS via ftp (12.2 SUPER-UX)     25                     200
CS -> local disk (12.1 SUPER-UX)     40 - 50                > 2,000
CS -> GFS disk (13.1 SUPER-UX)       up to 90               3,900
DS -> GFS disk (Linux)               up to 80               500 per node
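To put these rates in perspective, the sketch below converts a transfer rate into the time needed to move a given data volume. It is illustrative only: the helper name `transfer_hours`, the 1 TB example volume, and the decimal unit convention (1 TB = 1,000,000 MB) are assumptions and do not come from the slides.

```python
def transfer_hours(volume_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move volume_tb terabytes at rate_mb_per_s MB/s.

    Uses decimal units (1 TB = 1,000,000 MB), which is an assumption.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = volume_tb * 1_000_000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Aggregate rates in MB/s as listed for three of the links above
    aggregate_rates = {
        "CS -> DS via ftp (12.1 SUPER-UX)": 100,
        "CS -> DS via ftp (12.2 SUPER-UX)": 200,
        "CS -> GFS disk (13.1 SUPER-UX)": 3900,
    }
    for link, rate in aggregate_rates.items():
        print(f"{link}: {transfer_hours(1.0, rate):.2f} h per TB")
```

The same helper can be applied to the single stream rates to see how long one user's transfer would take without parallel streams.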

9-Sept-2003 CAS2003, Annecy, France, WFS Page 31

Observed FLOPS-rates for HLRE

• 4-node performance > approx. 100 GFLOPS (about 40 % efficiency) for
  • ECHAM (70-75)
  • MOM
  • Radar Reflection on Sea Ice
• 24-node performance for Turbulence Code about 470 GFLOPS (30+ % efficiency)
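The efficiency figures can be reproduced as sustained performance divided by peak performance. The sketch below assumes 8 CPUs per node (consistent with the 24 nodes and 192 CPUs listed for phase 3) and a peak of 8 GFLOPS per SX-6 CPU. The per-CPU peak is the vendor's published figure and does not appear in the slides, so treat it as an assumption. The function and constant names are illustrative.

```python
CPUS_PER_NODE = 8          # 192 CPUs / 24 nodes, from the HLRE phases slide
PEAK_GFLOPS_PER_CPU = 8.0  # assumption: published SX-6 peak, not in the slides


def efficiency(sustained_gflops: float, nodes: int) -> float:
    """Fraction of peak performance achieved on the given number of nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    peak = nodes * CPUS_PER_NODE * PEAK_GFLOPS_PER_CPU
    return sustained_gflops / peak


if __name__ == "__main__":
    print(f"4 nodes, 100 GFLOPS:  {efficiency(100, 4):.0%}")
    print(f"24 nodes, 470 GFLOPS: {efficiency(470, 24):.0%}")
```

Under these assumptions the two cases come out slightly below 40 % and slightly above 30 %, in line with the values quoted on the slide.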

9-Sept-2003 CAS2003, Annecy, France, WFS 32

Summary

• DKRZ provides Computing Resources for Climate Research in Germany on a competitive international level
• The HLRE System Architecture is suited to cope with a data-intensive Usage Profile
• Shared Filesystems today are operational in Heterogeneous System Environments
• Standardisation Efforts for Shared Filesystems are needed

9-Sept-2003 CAS2003, Annecy, France, WFS 33

Thank you for your attention!

9-Sept-2003 CAS2003, Annecy, France, WFS Page 34

Tape transfer rates (2001-2003)

[Figure: daily transfer volume in GB (0 to 8,000), Sep 01 to Jun 03. Series: repack, client.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 35

DS transfer requests (2001-2003)

[Figure: daily transfer requests (0 to 40,000), Sep 01 to Jun 03. Series: fetch, store.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 36

DS archive capacity (2001-2003)

[Figure: archive capacity in TB (0 to 1,000), Sep 01 to Jun 03. Series: 9940, 9840, SD3, VHS, 9490.]

9-Sept-2003 CAS2003, Annecy, France, WFS Page 37

DS archive capacity (2001-2003)

[Figure: number of files stored in millions (0 to 12), Sep 01 to Jun 03. Series: 9940, 9840, SD3, VHS, 9490.]