
Page 1: A Tale of Two Grids

A Tale of Two Grids

April 2004

Dr. Philip Papadopoulos, Program Director, Grid and Cluster Computing

San Diego Supercomputer Center, University of California, San Diego

[email protected]

http://www.pragma-grid.net http://www.nbirn.net

Two Very Successful Grids

• PRAGMA – Pacific Rim Applications and Grid Middleware Assembly
– Focused on making grid infrastructure usable by scientists
– Cooperating administrators

• BIRN – Biomedical Informatics Research Network
– Focused on neuro-imaging scientists as a test bed
– Highly prescribed software infrastructure

Page 2

Agenda

• Overview of PRAGMA
• Overview of BIRN
• High-level comparisons
• What have we learned?

PRAGMA Founding Motivations

• The grid is transforming computing and collaboration

• The grid is too difficult to use

• Middleware software needs to interoperate

• Science is an intrinsically global activity

Page 3

PRAGMA PARTNERS

Key Goals: Establish sustained collaborations and advance the use of grid technologies for applications among a community of investigators working with leading institutions around the Pacific Rim.

PRAGMA works closely with established activities that promote grid activities or the underlying infrastructure, both in the Pacific Rim and globally.

Page 4

Series of Meetings

• PRAGMA 4: 4-5 June 2003, Melbourne, Australia
– ICCS2003: 3-4 June
– David Abramson (APAC): Chair; Co-chair: Fang-Pang Lin (NCHC)

• PRAGMA 5: 22-23 October 2003, Hsinchu/Fushan, Taiwan
– Fang-Pang Lin (NCHC): Chair; Co-chair: Kai Nan (CNIC)

• PRAGMA 6: 16-18 May 2004, Beijing, China
– Baoping Yan (CNIC): Chair; Co-chairs: Mason Katz (UCSD), Jim Williams (TransPAC)

• PRAGMA 7: 15-17 September 2004, San Diego, USA
– Chairs: Mason Katz (UCSD), Jim Williams (TransPAC)

PRAGMA Success Stories

• Grid Community Pulls Together to Battle SARS
• Merging Grid Technology and Computational Chemistry
• Telescience Marshals Rich Network of Technologies at iGRID2002
• Grid Demo Sets US to Japan Data Speed Records
• EcoGrids
• Encyclopedia of Life

Page 5

NCHC SARS Task Force

http://antisars.nchc.gov.tw/

Developers at the NCHC Access Grid node test the SARS Grid network links.

[Chart: number of suspected and probable SARS cases by date of onset, March-June 2003]

PRAGMA 4 Program Committee / request for help (16 May, Fri)

Suggestion in PRAGMA 4 draft agenda to help SARS relief… (15 Apr)

1st hospital outbreak (Taipei Municipal Ho-Pin Hospital); Chang Gung Hospital outbreak (South)

PRAGMA 4: 2 AG nodes + H.323, X-ray image interface + medical information + high-speed network

14 May: SARS AG Task Force

Source: Fang-Pang Lin

Using Grids to Battle SARS

Page 6

Telescience/BIRN Portal Was Quickly Adapted to a SARS Portal for Taiwan

GRID TECHNOLOGIES: Portals, Middleware, Graphics, Computational Chemistry Engine, Data Analysis Tools, Hardware

ENABLING NEW SCIENCE

Exploiting grid technology & hybrid computational methods

PARAMETER SEARCH: 4 variables, 15,876 points, refineable hypersurface

GEOGRAPHIC DISTRIBUTION OF JOBS DURING EXECUTION (ICCS’03, PRAGMA 4): Monash, Australia; HPCC, Japan; CRAY, Japan; SDSC, USA; UCSD, USA; CPE, Thailand; KISTI, Korea

Source: Wibke Sudholt, Kim Baldridge, David Abramson, Colin Enticott, Slavisa Garic

GAMESS and Nimrod/G

Page 7

Demonstrate advanced features of the Telescience Portal:

1. Perform telemicroscopy controlling the IVEM at NCMIR
• Digital video is encapsulated in IPv6 and transmitted at 30 fps over native IPv6 networks (SDSC, Abilene, SURFnet) between San Diego and Amsterdam
2. Data will be computed with heterogeneous, distributed resources within NCMIR, NPACI, NCHC and Osaka University
3. Render and visualize data in Amsterdam using distributed resources in NCHC

PRAGMA Telescience at iGRID 2002

Courtesy: Abel Lin, Steve Peltier, Mark Ellisman, Shinji Shimojo, Toyokazu Akiyama, Fang-Pang Lin

File Replication Performance Between Japan and US (Total)

National Institute of Advanced Industrial Science and Technology. Source: Osamu Tatebe

Stable transfer rate of 3.79 Gbps out of a theoretical peak of 3.9 Gbps (97%), using 11 node pairs (MTU 6000B); 1.5 TB of data was transferred in an hour.

Participants: Maffin, APAN, NII, Abilene, Tsukuba, SuperSINET, Force10 Networks, PRAGMA, ApGrid, SDSC, TransPAC/Indiana U, Kasetsart U
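As a quick sanity check on the figures above (our arithmetic, not from the source): 1.5 TB moved in one hour corresponds to roughly 3.3 Gbps of effective payload throughput, consistent with the reported 3.79 Gbps stable link rate.

```python
# Back-of-the-envelope check of the quoted transfer numbers.
# Assumes TB means 10^12 bytes (decimal), as usual for network figures.
terabytes = 1.5          # data moved
seconds = 3600           # one hour
bits = terabytes * 1e12 * 8
effective_gbps = bits / seconds / 1e9
print(round(effective_gbps, 2))  # ~3.33 Gbps effective, below the 3.79 Gbps link rate
```

The gap between 3.33 Gbps effective and 3.79 Gbps on the wire is what protocol overhead and per-transfer startup costs typically absorb.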

Applications: Astronomical Object Survey, Subaru Telescope [AIST]; Lattice QCD [CCP, U Tsukuba]

Page 8

Ninf-G: a GridRPC system based on the Globus Toolkit. Enables large-scale computing across supercomputers on the grid (ROCKS Roll, TeraGrid).

Weather Portal demonstration: client (AIST), 3D visualization system (by NCHC), results on the Web; 22 clusters from 21 institutions and 10 countries (853 CPUs total). Courtesy Yoshio Tanaka.

EcoGrid: Fushan

[Map: EcoGrid sites in Taiwan, including MOE, NPUST, NDHU, NCHC-HQ, NCHC-CENTRAL, and NCHC-SOUTH]

Press coverage: Liberty Times, 9 March 2003; United Daily, 9 March 2003

Page 9

I2G Web Services Infrastructure and Sensor-Based Lake Monitoring and Analysis

Understanding impacts of episodic events in lake metabolism.

Sites: LTER-AND (Corvallis, OR); SDSC (La Jolla, CA); LTER-VCR (Charlottesville, VA); CNIC/CAS (Beijing, China); NARC (Tsukuba, Japan); NCHC (Hsinchu, Taiwan); LTER-NTL (Madison, WI). Sites communicate via SOAP/XML, with JDBC (and EML) connections to database servers and HTTP access to field web cams.

• SOAP servers where web services are deployed
• Database servers where data sources are hosted
• Sensor data from web cams deployed in the field

Sensors in North Temperate Lakes: Trout Lake, Allequash Lake, Big Muskellunge, Sparkling Lake, Crystal Lake, etc. Sensors in Yuan Yang Lake.

PRAGMA Testbed Details (Resources/Middleware Working Group)

• Bottom-up approach on hardware
– Most systems are Linux-based, but other types are available
– More than 240 nodes across 15 (or more) sites
– Work with ApGrid (http://www.apgrid.org)

• Agreement to participate:
– Minimum software requirement to join (e.g. Globus 2.2)
– Need to exchange certificates with all other sites

• Technical and policy issues
– Compatibility of basic middleware (versions of Globus)
– What other software to have at all sites

• Using software that others have developed
– E.g. the PRAGMA cluster at SDSC runs Rocks (SDSC) and SCE (Thailand)

• Challenges
– Run applications on the grid on a routine basis (only a few at first)
– Capture a rough measure of usage (international resource)
– What does it mean to dedicate a resource to an international group (with national funding supporting the resource)?
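The "agreement to participate" rules above lend themselves to a mechanical check. A minimal sketch, using invented site data rather than PRAGMA's actual tooling: each site must run at least Globus 2.2 and hold the CA certificates of every other site in the testbed.

```python
# Hypothetical participation check: minimum middleware version plus
# pairwise certificate exchange. Site names and versions are illustrative.
sites = {
    "SDSC":  {"globus": (2, 4), "trusted_cas": {"AIST", "KISTI"}},
    "AIST":  {"globus": (2, 2), "trusted_cas": {"SDSC", "KISTI"}},
    "KISTI": {"globus": (2, 0), "trusted_cas": {"SDSC"}},
}
MIN_GLOBUS = (2, 2)  # the minimum named on the slide

def ready(name, info):
    # must meet the software floor and trust every other site's CA
    others = set(sites) - {name}
    return info["globus"] >= MIN_GLOBUS and others <= info["trusted_cas"]

for name, info in sites.items():
    print(name, "ok" if ready(name, info) else "not ready")
```

In this toy data, the third site fails on both counts, which is exactly the kind of gap the working group had to chase down by hand.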

Page 10

BIRN is Team Science Applied to Stretch Goals

A Big Challenge or Vision: "Enable new understanding of the healthy and diseased brain by linking data about macroscopic brain function to its molecular and cellular underpinnings."

Taking practical steps toward a grand goal using cyberinfrastructure:

• Federate geographically distributed brain data of the same & different types
• Accommodate requirements to collaboratively interact with shared databases of large-scale data, share methods, and computational resources

Scales of NS data from Maryann Martone

IT infrastructure to hasten the derivation of new understanding and treatment of disease through use of distributed knowledge.

BIRN Network

Page 11

BIRN Today is …

• Three neuroscience test beds building on research projects:
– Mouse BIRN
– Morph BIRN
– Functional BIRN
– BIRN Coordinating Center
• Integrating the activities of the advanced biomedical imaging and clinical research centers in the US
• Developing hardware and software infrastructure for managing distributed data: creation of data grids
• Exploring data using "intelligent" query engines that can make inferences upon locating "interesting" data
• Building bridges across tools and data formats
• Changing the use pattern for research data from the individual laboratory/project to shared use

BIRN Project Coordination

[Diagram: Functional Imaging BIRN test-bed, Human Morphometry BIRN test-bed, and Mouse BIRN test-bed sites connected to the BIRN Coordinating Center over Internet2]

The BIRN-CC leads…

• the deployment and maintenance of a network infrastructure capable of quickly moving large amounts of data between BIRN sites across the country;
• the creation of a federation of databases pertaining to the BIRN scientific projects;
• the development and integration of software to refine, combine, compare, and analyze complex biomedical data;

…and cultivates group activities to overcome cultural barriers to building a forum for collaborative research, co-authoring research papers, and sharing methods/tools/codes across institutions.

Page 12

Each BIRN Site Has Standard Hardware

• Controlled software and hardware configuration
• Software managed from the BIRN Coordinating Center
• OS and BIRN tool integration enabled by Rocks cluster management
• Software stack components:
– Globus
– Storage Resource Broker
– Test bed application tools
– Portal technologies
– Oracle database
– Data mediation SW

BIRN Forms a Virtual Data Grid

• Defines a distributed data handling system
• Integrates storage resources in the BIRN network
• Integrates access to data and to computational and visualization resources
• Acts as a virtual platform for knowledge-based data integration activities
• Provides a uniform interface to users

Page 13

Function BIRN: Integrated Data Query

Example query: Are chronic, but not first-onset, patients associated with superior temporal gyrus dysfunction (MMN)?

[Diagram: a mediator presents an integrated view over wrapped sources: fMRI, ERP, receptor density, structure, and clinical databases, plus Web resources (PubMed, Expasy). Each source sits behind a wrapper; the mediator combines their answers.]
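The wrapper/mediator flow above can be sketched in a few lines. This is an illustrative toy, not BIRN's actual mediation software; the class names, fields, and values are all invented.

```python
# Toy mediator/wrapper pattern: each wrapper answers a common query
# against its own source; the mediator fans out and builds an integrated view.
class Wrapper:
    def __init__(self, name, records):
        self.name, self.records = name, records

    def query(self, region):
        # translate the mediator's query into a source-specific lookup
        return [r for r in self.records if r["region"] == region]

class Mediator:
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def integrated_view(self, region):
        # one logical query, answered per-source, merged into one view
        return {w.name: w.query(region) for w in self.wrappers}

fmri = Wrapper("fMRI", [{"region": "STG", "activation": -0.14}])
erp = Wrapper("ERP", [{"region": "STG", "mmn_amplitude": 0.18}])
print(Mediator([fmri, erp]).integrated_view("STG"))
```

The point of the pattern is that the user asks one question; the per-source translation lives entirely inside the wrappers.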

[Chart: MMN measures by treatment group (ARIP 20 mg, ARIP 30 mg, RISP 6 mg, placebo)]

Function BIRN: Federated Imaging Databases. Calibration and integration from a half-dozen sites.

Page 14

• Overall goal: develop the capability to analyze and mine data acquired at multiple sites using processing and visualization tools developed at multiple sites
• Context: human brain MR-based morphometry
• Initial applications: Alzheimer's, depression, the aging brain
• Participants: BWH, MGH, Duke, UC Los Angeles, UC San Diego, Johns Hopkins, UC Irvine, Washington University

Morphometry BIRN

Multi-site Structural MRI Data Acquisition & Calibration

Methods: common acquisition protocol, distortion correction, evaluation by scanning human phantoms multiple times at all sites.

• MGH (NMR): J. Jovicich, A. Dale, D. Greve, E. Haley
• BWH (SPL): S. Pieper
• UCI: D. Keator
• UCSD (fMRI): G. Brown
• Duke University (NIRL): J. MacFall

[Figure: image intensity variability for the same subject scanned at 4 sites, uncorrected vs. corrected]

Morphometry BIRN: Solving Issues in Distributed Data Acquisition

Accomplishment: developed acquisition & calibration protocols that improve reproducibility, within- and across-sites.

Page 15

MIRIAD Project: Improving Throughput

Improved computational capabilities:

Item                                  | Duke (semi-automated) | BIRN-MIRIAD (fully-automated)
# of tissue classes                   | 3 (Fig 1)             | 23 (Fig 2)
Time for 200 brains                   | 400 hours             | 1 hour
Time for 200 lobe & regional analyses | 250 hours             | all lobes (Fig 3) and 27 regions, included above

BIRN Portal: Launches Scientific Workflow

1. User logs in to the BIRN Portal, selects data and LONI settings
2. LONI Pipeline is launched from the Portal
3. Results are automatically displayed in 3D Slicer

Page 16

Mouse BIRN: Multiscale Data Mediation

1. Create databases at each site
2. Create conceptual links to a shared ontology
3. Situate the data in a common spatial framework
4. Use a mediator to navigate and query across data sources

Accomplishments of Mouse BIRN

1) Established a data-sharing infrastructure using the BIRN for multiscale investigations of animal models of human neurological disease
• Shared file collections using the Storage Resource Broker
• Developed common specimen preparation protocols
• Developed a set of shared analysis and visualization tools working through the BIRN portal

2) Developed a database federation as a data-sharing mechanism and a persistent data archive
• Established independent databases at each site and populated them with mouse imaging data
• Mapped data to shared knowledge sources like the UMLS and atlas coordinate systems
• Created a virtual data federation through semantic and spatial mediation tools

Page 17

Human-Mouse Data Integration

[Figure: Purkinje neuron; registering data against the UMLS; spatial registration]

Query Atlas (3D Slicer)

– Alex Joyner, Steve Pieper, Greg Brown, Nicole Aucoin

Page 18

Key Systems Challenges

• Large-scale data is distributed on a national scale
– How do you easily locate what you want?
– How do you translate it to what your SW tools understand?
– Where do you analyze it?
– How do you move it efficiently?
– How do you secure it to properly limit and log access?
• The underlying software systems are complex
– How effectively can this complexity be hidden?
• Software technology continually evolves, and BIRN must adapt
• Goal: provide a systems "cookie-cutter" for adding new, secured resources to form a federation

A View on BIRN Federated Data

[Diagram: a BIRN user asks, "Give me an index of all DAT-KO striatum images." The query passes through the BIRN CC (portal server, software server, metadata catalog, storage server with a multi-TB disk array, DB server, access control) and out to site databases (Mouse DB-A through DB-D) holding MRI images, EM images, two-photon images, and histology, each behind its own access control.]

Federated data may be in a variety of representations:

• databases
• image files
• simulation files
• flat text files
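A toy version of the index query above, assuming (purely for illustration) that one source is a relational table and another is a set of flat image files; every name and path here is invented, not real BIRN data.

```python
# Toy federated index: one query fans out over sources with different
# representations and returns a single uniform list of hits.
import re

db_rows = [  # stand-in for one site's relational database
    {"strain": "DAT-KO", "region": "striatum", "path": "em/img001"},
    {"strain": "wild-type", "region": "cortex", "path": "em/img002"},
]
file_names = [  # stand-in for another site's flat image files
    "mouseB/DAT-KO_striatum_slice07.tif",
    "mouseB/DAT-KO_cerebellum_slice01.tif",
]

def federated_index(strain, region):
    # structured source: match on fields
    hits = [r["path"] for r in db_rows
            if r["strain"] == strain and r["region"] == region]
    # unstructured source: match on a filename convention
    pat = re.compile(re.escape(strain) + "_" + region, re.IGNORECASE)
    hits += [f for f in file_names if pat.search(f)]
    return hits

print(federated_index("DAT-KO", "striatum"))
```

The user sees one index; the per-representation matching logic stays hidden behind the federation layer, which is the "uniform interface" the slide describes.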

Page 19

http://www.nbirn.net

High-Level Comparisons

PRAGMA                                                    | BIRN
Agreed-upon minimal software at each site                 | Well-controlled hardware and software environment
Some dedicated, most shared                               | Dedicated resources
How to share resources and software across the Pacific Rim | Security of information is a key driver
How to make the grid work                                 | Integration across scales and federation of data
A variety of scientific disciplines                       | 3 well-defined biomedical test beds

Page 20

Key Software Systems Being Deployed

BIRN:
• Rocks cluster management
• BIRN Certificate Authority
• Globus
• Storage Resource Broker
• Oracle
• Data mediator
• ½ dozen specific applications
• NetScout monitoring
• BIRN Portal

PRAGMA:
• Globus
• Accept certs from many authorities
• Rocks
• SCE
• Ninf-G
• Gfarm
• Nimrod
• ½ dozen specific applications
• SARS portal
• Telescience Portal

What have we learned?

• Top-down (BIRN) and bottom-up (PRAGMA) can both work
– These work because of committed collaborators
– Application drivers are critical to keeping focus
• Both grids deployed and used the infrastructure even when all SW was not available
– Hands-on experience has taught us a great deal
– A large fraction of grid software is still "fragile"
• Software packaging and availability is critical to making things practical
• Integration of networked resources and people has enabled new ways of doing research