a grid approach to geographically distributed data analysis for virgo

A Grid Approach to Geographically Distributed Data Analysis for Virgo. F. Barone, M. de Rosa, R. De Rosa, R. Esposito, P. Mastroserio, L. Milano, F. Taurino, G. Tortone (INFN Napoli, Università di Napoli “Federico II”, Università di Salerno); L. Brocco, S. Frasca, C. Palomba, F. Ricci (INFN Roma1, Università di Roma “La Sapienza”). GWADW 2002 – Isola d’Elba (Italy) – May 19-26 2002

TRANSCRIPT

Page 1: A Grid Approach to Geographically Distributed Data Analysis for Virgo

A Grid Approach to Geographically Distributed Data Analysis for Virgo

F. Barone, M. de Rosa, R. De Rosa, R. Esposito, P. Mastroserio, L. Milano, F. Taurino, G. Tortone

INFN Napoli

Università di Napoli “Federico II”

Università di Salerno

L. Brocco, S. Frasca, C. Palomba, F. Ricci

INFN Roma1

Università di Roma “La Sapienza”

GWADW 2002 – Isola d’Elba (Italy) – May 19-26 2002

Page 2: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Outline

• scientific goals and requirements
• basic concepts of GRID
• what the Grid offers
• layout of VIRGO Virtual Organisation
• application to gravitational waves data analysis
• conclusions

Page 3: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Scientific goals and requirements

• the coalescing binaries and periodic sources analyses need large computing power:
  ~ 300 Gflops for the coalescing binaries search
  ~ 1000 Gflops for the periodic sources search
• computational grids allow the use of computing resources available in different laboratories/institutions

Page 4: A Grid Approach to Geographically Distributed Data Analysis for Virgo

GRID: a definition

GRID:

an infrastructure that allows the sharing and coordinated use of resources within large, dynamic and multi-institutional communities;

Page 5: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Basic resources of DataGrid Middleware

DataGrid is a European Community project (3 years) to develop Grid middleware and testbed infrastructure on a European scale;

• need to execute a program: Computing Element (CE)
• need to access data: Storage Element (SE)
• need to move data: network

Page 6: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Computing Element (CE)

GRID resource that provides CPU cycles

Examples:
• clusters of PCs
• supercomputers
• ...

Page 7: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Storage Element (SE)

GRID resource that provides disk space to store files

Examples:
• simple disk pool
• big Mass Storage System
• ...

Data are accessible to all processes running on CEs via multiple protocols

Page 8: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid resource

A Grid resource provides a standard interface (protocol and API) that is common to that type of resource:

all CEs talk the same protocol (CE protocol) independently of the underlying batch system;

all SEs talk the same protocol (SE protocol) independently of the underlying Mass Storage System
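
The idea of a common interface can be illustrated with a minimal sketch. It assumes a hypothetical `submit` method; the class names and the simulated batch systems are illustrative only and are not the DataGrid API:

```python
from abc import ABC, abstractmethod


class ComputingElement(ABC):
    """Hypothetical common CE interface (illustration only)."""

    @abstractmethod
    def submit(self, executable: str) -> str:
        """Queue a program and return a job identifier."""


class PbsLikeCE(ComputingElement):
    """CE backed by a PBS-like batch system (simulated here)."""

    def __init__(self) -> None:
        self._count = 0

    def submit(self, executable: str) -> str:
        self._count += 1
        return f"pbs-{self._count}:{executable}"


class OtherBatchCE(ComputingElement):
    """CE backed by some other batch system (simulated here)."""

    def __init__(self) -> None:
        self._count = 0

    def submit(self, executable: str) -> str:
        self._count += 1
        return f"other-{self._count}:{executable}"


def run_everywhere(elements, executable):
    # The caller uses one interface and never sees the batch system behind it.
    return [ce.submit(executable) for ce in elements]
```

The caller only depends on `ComputingElement`, which is the point of the slide: the batch system behind each CE can change without affecting the user.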

Page 9: A Grid Approach to Geographically Distributed Data Analysis for Virgo

What the Grid offers

• independence from execution location: the user doesn’t want to know where a job will run (which CE);
• independence from data location: the user doesn’t want to know where the data are (which SE);
• security: authentication, authorization.

Page 10: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Independence from execution location

Page 11: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Workload Management System

Resource Broker (RB)
A Resource Broker tries to find a good match between the job requirements and preferences and the available resources, in particular CEs.

Job Submission Service (JSS)
The Job Submission Service then guarantees reliable job submission and monitoring.

Page 12: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Scheduling criteria

1. authorization information
2. data availability
3. job requirements
4. job preferences
5. accounting
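
How such criteria could combine into a match is sketched below. This is a toy illustration, not the Resource Broker's actual algorithm: the `CE` and `Job` records, their fields and the ranking rule are all assumptions, and accounting is left out.

```python
from dataclasses import dataclass, field


@dataclass
class CE:
    name: str
    allowed_vos: set
    free_cpus: int
    close_ses: set = field(default_factory=set)


@dataclass
class Job:
    vo: str         # virtual organisation of the submitter
    min_cpus: int   # job requirement
    input_se: str   # SE holding the input data


def match(job, ces):
    """Return the candidate CEs for a job, best first."""
    candidates = [
        ce for ce in ces
        if job.vo in ce.allowed_vos        # 1. authorization
        and ce.free_cpus >= job.min_cpus   # 3. job requirements
    ]
    # 2. data availability and 4. preferences act as ranking keys:
    # CEs close to the input data come first, then those with more free CPUs.
    return sorted(
        candidates,
        key=lambda ce: (job.input_se not in ce.close_ses, -ce.free_cpus),
    )
```

Filtering first and ranking afterwards keeps hard constraints (authorization, requirements) separate from soft ones (data proximity, preferences).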

Page 13: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Monitoring/Information System

The Resource Broker needs some information:

• what are the available resources?
• what is their status?

The Resource Broker queries the Monitoring Information System to locate producers (CE, SE, ...) and then obtains data directly from the producers.

Page 14: A Grid Approach to Geographically Distributed Data Analysis for Virgo

[Diagram labels: status update “pushed” on MIS; data obtained from CE]

Page 15: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Logging and bookkeeping

The LB service is a database of events concerning jobs and the other services of the Workload Management System (RB and JSS). It:

• provides status info for jobs;
• is designed to be highly reliable and available.

Page 16: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Independence from data location

Page 17: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Replica Catalogue (RC)

With the Replica Catalogue, the same file (master) can exist in multiple copies (replicas).

LFN – Logical File Name: name for a set of replicas
example: lfn://virgo.org/virgofile-1.dat

PFN – Physical File Name: location of a replica
example: pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat

It is up to the RB to translate the LFN into a PFN and to locate the SE “closest” to a CE.
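
The LFN to PFN translation can be sketched as a lookup followed by a choice among replicas. The in-memory catalogue, the host `se.example.org` and the notion of a set of "close" SE hosts are assumptions made for illustration; the real Replica Catalogue interface is not shown in these slides.

```python
from urllib.parse import urlparse

# Hypothetical in-memory replica catalogue: one LFN maps to several PFNs.
CATALOGUE = {
    "lfn://virgo.org/virgofile-1.dat": [
        "pfn://se.example.org/virgo/virgofile-1.dat",
        "pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat",
    ],
}


def host_of(pfn):
    """Return the storage host part of a PFN."""
    return urlparse(pfn).netloc


def resolve(lfn, close_ses, catalogue=CATALOGUE):
    """Return a PFN for lfn, preferring a replica on an SE close to the CE."""
    replicas = catalogue[lfn]
    for pfn in replicas:
        if host_of(pfn) in close_ses:
            return pfn
    # No close replica: fall back to the first registered copy.
    return replicas[0]
```

A job running on a CE near the Napoli SE would call `resolve(lfn, {"virgo-se.na.infn.it"})` and receive the Napoli replica even though it is not the first one registered.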

Page 18: A Grid Approach to Geographically Distributed Data Analysis for Virgo

GridFTP

GridFTP is an efficient data transfer protocol. Features:

• GSI security;
• multiple data channels for parallel transfers;
• partial file transfers;
• third-party (direct server-to-server) transfers;
• interrupted transfer recovery.

Page 19: A Grid Approach to Geographically Distributed Data Analysis for Virgo

[Figure: Napoli - Bologna file transfer performance for different file sizes and numbers of sockets using GSIFTP. x-axis: number of sockets used (1, 2, 4, 8, 16, 32, 64); y-axis: bandwidth (Mbit/s), 0 to 30; one curve per file size: 1 MB, 10 MB, 50 MB, 100 MB, 500 MB. Annotations: saturation of the lowest bandwidth; INFN Napoli – 34 Mbit/s; CNAF Bologna – 98 Mbit/s; GridFTP tests period; “standard FTP” average bandwidth.]

Page 20: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid Approach to Geographically Distributed Data Analysis for Virgo

Page 21: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Layout of VIRGO Virtual Organisation

[Diagram: three sites, CNAF-Bologna, INFN Roma1 and INFN Napoli, connected through GARR. Elements shown: Computing Elements with their Worker Nodes, Storage Elements, User Interfaces, Resource Broker, Information Index, Replica Catalogue. Additional label: E0 run.]

Page 22: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Job submission mechanism

[Diagram: elements shown are the User Interface, the Resource Broker, Computing Elements with their Worker Nodes, a Storage Element and PBS. Additional labels: II, IS, OS.]

Page 23: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Job submission mechanism

The general scheme for distributed computation is the following:

• multiple jobs are submitted from the Rome UI;
• the Resource Broker interrogates the Information Index and submits each job to an available WN;
• the input data file is staged from the SE to the WN;
• the output is sent back to the UI or published on the SE;
• the Resource Broker automatically distributes the jobs among the nodes (according to the specifications in the JDL file) unless we decide to tie a given job to a particular node;
• job scheduling at the node level is done via PBS.
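
The JDL file mentioned above is a plain-text job description. The sketch below shows how one such description per job might be generated. The ClassAd-style attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are an assumption here and should be checked against the middleware documentation; the file names are hypothetical.

```python
def render_jdl(executable, arguments, input_files, output_files):
    """Render a ClassAd-style job description as text (illustrative only)."""

    def quoted_list(items):
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    outputs = ["std.out", "std.err", *output_files]
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # Files shipped with the job from the User Interface.
        f"InputSandbox = {quoted_list(input_files)};",
        # Files returned to the User Interface when the job ends.
        f"OutputSandbox = {quoted_list(outputs)};",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_jdl("search.sh", "--band 3", ["search.sh"], ["candidates.dat"]))
```

Generating the description from a function makes it easy to submit many jobs that differ only in their arguments.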

Page 24: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid tests for coalescing binaries search 1/2

Algorithm: standard matched filters
Templates: generated at PN order 2 with Taylor approximants

Data:
• VIRGO E0 run
• start GPS time: 685112730
• data length: 600 s

Conditions:
• raw data resampled at 2 kHz
• lower frequency: 60 Hz
• upper frequency: 1 kHz
• search space: 2 – 10 solar masses
• minimal match: 0.97
• number of templates: ~ 40000

Page 25: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid tests for coalescing binaries search 2/2

Step 1
The data were extracted from the CNAF-Bologna Mass Storage System. The extraction process reads the VIRGO standard frame format, performs a simple resampling and publishes the selected data file on the Storage Element.

Step 2
The search was performed by dividing the template space into 200 subspaces and submitting a job for each template subspace from the Napoli User Interface. Each job reads the selected data file from the Storage Element (located at CNAF-Bologna) and runs on the Worker Nodes selected by the Resource Broker in the VIRGO VO. Finally, the output data of each job were retrieved from the Napoli User Interface.
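
The division of work in Step 2 can be illustrated with a short sketch. It assumes that the templates are simply numbered and split into contiguous index ranges, which may differ from how the template space was actually partitioned; the figure of 40000 templates is the approximate value quoted in the test conditions.

```python
N_TEMPLATES = 40000   # approximate number of templates quoted for the test
N_JOBS = 200          # one job per template subspace


def template_subspaces(n_templates, n_jobs):
    """Return inclusive (first, last) template indices for each job."""
    base, extra = divmod(n_templates, n_jobs)
    bounds, first = [], 0
    for job in range(n_jobs):
        # Spread any remainder over the first `extra` jobs.
        count = base + (1 if job < extra else 0)
        bounds.append((first, first + count - 1))
        first += count
    return bounds


subspaces = template_subspaces(N_TEMPLATES, N_JOBS)
templates_per_job = N_TEMPLATES // N_JOBS
```

With these numbers each job filters the same data file against about 200 templates, so the jobs are independent and need no communication with each other.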

Page 26: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid tests for periodic sources search

The analysis for the periodic sources search is based on a hierarchical approach in which coherent steps, based on FFTs, and incoherent ones, based on the Hough Transform, alternate. At each iteration a more refined analysis is done on the selected candidates.

This procedure fits very well in a geographically distributed computational scheme.

The whole problem can be divided into a number of independent smaller tasks, each performed by a given computational node. E.g. each node can analyze a frequency band and/or a portion of the sky.

We have performed some preliminary tests to evaluate the DataGrid software with respect to our analysis problem.

For the GRID tests we have used the code for the Hough Transform. The source spin-down is not taken into account. The input of the code is given by a “peak map” in the time-frequency plane.

Page 27: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid tests for periodic sources search 1/2

The tests consist of two phases:

1. Production of input data on the SE;
2. Distributed computation.

We start from raw data of the engineering run E1 (~ 5 hours); the steps are the following:

• channel extraction;
• decimation at 1 kHz;
• generation of periodograms by computing interlaced and windowed FFTs (T_FFT = 4194.304 s);
• peak selection (above two times the average noise).

The resulting time-frequency peak map covers 20 Hz in frequency (from 480 to 500 Hz).
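
A small sketch of the numbers and of the peak selection step follows. The FFT duration and the decimation rate are taken from the list above. Estimating the "average noise" as the plain mean of the periodogram and requiring a local maximum are assumptions made for illustration, not necessarily the procedure used in the tests.

```python
SAMPLE_RATE_HZ = 1000   # decimation rate quoted above
T_FFT_S = 4194.304      # FFT duration quoted above

# At 1 kHz the FFT duration corresponds to a power-of-two number of samples.
N_SAMPLES = round(T_FFT_S * SAMPLE_RATE_HZ)   # 4194304 = 2**22
FREQ_RESOLUTION_HZ = 1.0 / T_FFT_S            # about 2.4e-4 Hz


def select_peaks(periodogram, factor=2.0):
    """Return indices of local maxima above factor times the mean level."""
    threshold = factor * sum(periodogram) / len(periodogram)
    return [
        i for i in range(1, len(periodogram) - 1)
        if periodogram[i] > threshold
        and periodogram[i] > periodogram[i - 1]
        and periodogram[i] >= periodogram[i + 1]
    ]
```

Applying `select_peaks` to each periodogram and stacking the selected indices over time gives a time-frequency peak map of the kind used as input to the Hough Transform code.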

Page 28: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Grid tests for periodic sources search 2/2

Each computing node processes a subset of the whole frequency band. Each job runs according to this scheme:

• reads its initial reference frequency and the velocity vector direction;
• migrates to a worker node;
• takes from the SE the input data corresponding to the frequency band associated with that job;
• calculates the current frequency band of interest, i.e. the Doppler band;
• calculates the Hough Transform;
• iterates on the reference frequency until the full band has been processed.

The output of each job would be a set of candidates, to be followed up in the next coherent phase.
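
The band splitting and the Doppler band can be sketched as follows. The sketch assumes that the Doppler shift is dominated by the Earth's orbital motion, so that v/c is about 1e-4 (an approximate textbook value, not quoted in the slides); the number of jobs and the equal-width split are hypothetical.

```python
V_OVER_C = 1.0e-4   # approximate Earth orbital speed divided by c (assumption)


def job_bands(f_min, f_max, n_jobs):
    """Split [f_min, f_max] into n_jobs equal frequency bands, one per job."""
    width = (f_max - f_min) / n_jobs
    return [(f_min + i * width, f_min + (i + 1) * width) for i in range(n_jobs)]


def doppler_band(f0, v_over_c=V_OVER_C):
    """Return the frequency interval a source at f0 can be Doppler-shifted into."""
    half_width = f0 * v_over_c
    return (f0 - half_width, f0 + half_width)


bands = job_bands(480.0, 500.0, 4)   # the 20 Hz band of the peak map
low, high = doppler_band(500.0)      # roughly 499.95 Hz to 500.05 Hz
```

Because the Doppler band is much narrower than the band assigned to each job, a job only needs the part of the peak map around its current reference frequency at any given iteration.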

Page 29: A Grid Approach to Geographically Distributed Data Analysis for Virgo

Conclusions

• we have successfully verified that multiple jobs can be submitted and the output retrieved with small overhead time;
• computational grids seem very suitable for data analysis in the coalescing binaries and periodic sources searches.

Future plans

• testing MPI job submission for the coalescing binaries search (a feature provided in the next DataGrid release);
• testing the whole data analysis chain for the periodic sources search;
• first tests for network analysis among interferometers.
