design, construction and early use of the biomedical informatics research network

Design, Construction and Early Use of the Biomedical Informatics Research Network July 2004 Dr. Philip Papadopoulos Program Director, Grid and Cluster Computing San Diego Supercomputer Center University of California, San Diego [email protected] http://www.nbirn.ne


Page 1: Design, Construction and Early Use of the Biomedical Informatics Research Network

Design, Construction and Early Use of the Biomedical Informatics Research Network

July 2004

Dr. Philip Papadopoulos

Program Director, Grid and Cluster Computing

San Diego Supercomputer Center

University of California, San Diego

[email protected]

http://www.nbirn.net

Page 2: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN Overview

• BIRN – Biomedical Informatics Research Network

– Funded by the National Institutes of Health

– Focused on the data sharing needs of neuro-imaging scientists

• 17 Institutions

• 3 Test bed application groups

• Security, integrity, and tracking of data access very important

– Well-defined software and hardware infrastructure that is replicated across sites

– Challenges are not just technical

• Differing policies across universities

• Sharing of data is new to the scientists

Page 3: Design, Construction and Early Use of the Biomedical Informatics Research Network

Agenda

• Overview of BIRN

• Some of the software/hardware details

• Initial results for grid-based science

• An incomplete set of challenges

Page 4: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN is Team Science Applied to Stretch Goals

A Big Challenge or Vision: “Enable new understanding of the healthy and diseased brain by linking data about macroscopic brain function to its molecular and cellular underpinnings”

Taking practical steps toward a grand goal using cyberinfrastructure:

• Federate geographically distributed brain data of the same & different types

• Accommodate requirements to collaboratively interact with shared databases of large-scale data, share methods, and computational resources

Scales of NS data from Maryann Martone

Page 5: Design, Construction and Early Use of the Biomedical Informatics Research Network

IT Infrastructure to hasten the derivation of new understanding and treatment of disease through use of distributed knowledge

The BIRN Network

Page 6: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN Today is …

• Three neuroscience test beds building on research projects

– Mouse BIRN

– Morph BIRN

– Functional BIRN

• BIRN Coordinating Center (BIRN-CC) – IT hub for BIRN

• Major Activities include

– Integrating advanced biomedical imaging and clinical research centers in the US.

– Developing hardware and software infrastructure for managing distributed data: creation of data grids.

– Exploring data using “intelligent” query engines that can make inferences upon locating “interesting” data.

– Building bridges across tools and data formats.

– Changing the use pattern for research data from the individual laboratory/project to shared use

Page 7: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN Project Coordination

[Diagram labels: Internet 2; Functional Imaging BIRN Test-bed; Human Morphometry BIRN Test-bed; Mouse BIRN Test-bed; BIRN Coordinating Center]

The BIRN-CC leads…

• the deployment and maintenance of a network infrastructure capable of quickly moving large amounts of data between BIRN sites across the country.

• the creation of a federation of databases pertaining to the BIRN scientific projects.

• the development and integration of software to refine, combine, compare, and analyze complex biomedical data.

• and cultivates group activities to overcome cultural barriers to building a forum for collaborative research, co-authoring research papers, and sharing methods/tools/codes across institutions.

Page 8: Design, Construction and Early Use of the Biomedical Informatics Research Network

Basic Premise of BIRN

• If given access to larger data populations, scientists can

– Investigate new scientific questions

– Have a better statistical basis for testing hypotheses

• Working together

– Improves the pace with which discoveries can be made

– Reduces redundant activities in labs

Page 9: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN Forms a Virtual Data Grid

• Defines a Distributed Data Handling System

• Integrates Storage Resources in the BIRN network

• Integrates Access to Data, to Computational and Visualization Resources

• Acts as a Virtual Platform for Knowledge-based Data Integration Activities

• Provides a Uniform Interface to Users

Page 10: Design, Construction and Early Use of the Biomedical Informatics Research Network

Each BIRN Site Has Standard Hardware

• Controlled Software and Hardware configuration

• Software managed from the BIRN Coordinating Center

• OS and BIRN tool integration enabled by Rocks Cluster management

• Software Stack Components

– Globus

– Storage Resource Broker

– Test bed application tools

– Portal Technologies

– Oracle Database

– Data Mediation SW

Page 11: Design, Construction and Early Use of the Biomedical Informatics Research Network

Function BIRN: Integrated Data Query

Example query: Are chronic, but not first-onset patients, associated with superior temporal gyrus dysfunction (MMN)?

[Diagram labels: a Mediator and an Integrated View over sources that each sit behind a Wrapper: fMRI, Receptor Density, ERP, Structure, Clinical, and Web (PubMed, Expasy)]

[Plot by Treatment Group: ARIP - 20MG, ARIP - 30MG, RISP - 06MG, PLACEBO; value labels 0.15, 0.18, 0.14, 0.11 (assignment of values to groups is not recoverable from the transcript)]
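The wrapper/mediator arrangement named on this slide can be sketched roughly as follows. This is a minimal illustration of the general pattern, not the BIRN-CC Data Mediator itself: the `Record` schema, the class names, and the toy rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    """Common schema exposed by the mediator (field names are hypothetical)."""
    subject_id: str
    modality: str   # e.g. "fMRI", "clinical"
    diagnosis: str
    source: str


class Wrapper:
    """Adapts one source's native rows to the common Record schema."""

    def __init__(self, name: str, rows: Iterable[dict],
                 translate: Callable[[dict], Record]):
        self.name = name
        self._rows = list(rows)
        self._translate = translate

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Translate every native row, then keep the ones the query wants.
        return [r for r in map(self._translate, self._rows) if predicate(r)]


class Mediator:
    """Fans a query out to every wrapper and merges the results by subject."""

    def __init__(self, wrappers: List[Wrapper]):
        self._wrappers = wrappers

    def integrated_view(self, predicate: Callable[[Record], bool]) -> Dict[str, List[Record]]:
        view: Dict[str, List[Record]] = {}
        for wrapper in self._wrappers:
            for rec in wrapper.query(predicate):
                view.setdefault(rec.subject_id, []).append(rec)
        return view


# Two toy sources with different native layouts (made-up rows).
fmri_rows = [{"subj": "s01", "dx": "chronic"}, {"subj": "s02", "dx": "first-onset"}]
clinical_rows = [{"patient": "s01", "status": "chronic"}]

fmri = Wrapper("fmri_db", fmri_rows,
               lambda r: Record(r["subj"], "fMRI", r["dx"], "fmri_db"))
clinical = Wrapper("clinical_db", clinical_rows,
                   lambda r: Record(r["patient"], "clinical", r["status"], "clinical_db"))

mediator = Mediator([fmri, clinical])
chronic = mediator.integrated_view(lambda rec: rec.diagnosis == "chronic")
```

The point of the design is that a query is written once against the common schema; each source needs only its own small translation function.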

Page 12: Design, Construction and Early Use of the Biomedical Informatics Research Network

Function BIRN: Federated Imaging Databases

Calibration, Integration from ½ dozen sites. First-ever normalization protocol for fMRI machines

Page 13: Design, Construction and Early Use of the Biomedical Informatics Research Network

Human Morphometry BIRN

• Overall Goal: Develop capability to analyze and mine data acquired at multiple sites using processing and visualization tools developed at multiple sites

• Context: Human Brain MR Based Morphometry

• Initial Applications: Alzheimer’s, Depression, Aging Brain

• Participants: BWH, MGH, Duke, UC Los Angeles, UC San Diego, Johns Hopkins, UC Irvine, Washington University

Page 14: Design, Construction and Early Use of the Biomedical Informatics Research Network

Morphometry BIRN: Solving Issues in Distributed Data Acquisition

Multi-site Structural MRI Data Acquisition & Calibration

Methods: common acquisition protocol, distortion correction, evaluation by scanning human phantoms multiple times at all sites

• MGH (NMR): J. Jovicich, A. Dale, D. Greve, E. Haley

• BWH (SPL): S. Pieper

• UCI: D. Keator

• UCSD (fMRI): G. Brown

• Duke University (NIRL): J. MacFall

[Figure: image intensity variability on same subject scanned at 4 sites; panels: Corrected, Uncorrected]

Accomplishment: develop acquisition & calibration protocols that improve reproducibility, within- and across-sites
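One common way to put a number on across-site reproducibility is the coefficient of variation (standard deviation divided by mean) of the same measurement taken at each site. The sketch below assumes that metric; the slide does not say which statistic BIRN used, and the intensity values are invented purely to make the example run.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (dimensionless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


# Hypothetical mean intensities of one region, same subject, four sites.
# These numbers are made up for illustration; they are not BIRN results.
uncorrected = [100.0, 112.0, 91.0, 105.0]
corrected = [100.0, 102.0, 99.0, 101.0]

cv_uncorrected = coefficient_of_variation(uncorrected)
cv_corrected = coefficient_of_variation(corrected)

print(f"uncorrected CV: {cv_uncorrected:.3f}")
print(f"corrected CV:   {cv_corrected:.3f}")
```

A lower value after correction would indicate that the calibration protocol reduced site-to-site spread for that measurement.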

Page 15: Design, Construction and Early Use of the Biomedical Informatics Research Network

MIRIAD Project: Improving throughput

Improved computational capabilities

Segmentation                             Duke                    BIRN-MIRIAD
Item                                     (semi-automated)        (fully-automated)
# of tissue classes                      3 (Fig1)                23 (Fig2)
Time for 200 brains                      400 hours               1 hour
Time for 200 lobe & regional analysis    250 hours               all lobes (Fig3) and 27 regions included above

[Figures 1, 2, 3]
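The segmentation times quoted above (400 hours versus 1 hour for 200 brains) can be turned into per-brain figures with simple arithmetic. The sketch uses only those two numbers from the slide; the variable names are ours.

```python
BRAINS = 200
SEMI_AUTOMATED_HOURS = 400   # Duke, semi-automated (from the slide)
FULLY_AUTOMATED_HOURS = 1    # BIRN-MIRIAD, fully automated (from the slide)

# Ratio of total segmentation time for the same 200 brains.
speedup = SEMI_AUTOMATED_HOURS / FULLY_AUTOMATED_HOURS

# Per-brain cost in convenient units.
semi_minutes_per_brain = SEMI_AUTOMATED_HOURS * 60 / BRAINS
auto_seconds_per_brain = FULLY_AUTOMATED_HOURS * 3600 / BRAINS

print(f"speedup: {speedup:.0f}x")
print(f"semi-automated: {semi_minutes_per_brain:.0f} min per brain")
print(f"fully automated: {auto_seconds_per_brain:.0f} s per brain")
```

This compares wall-clock time only. It does not account for the fully automated run also producing more tissue classes (23 versus 3) or folding the lobe and regional analysis into the same hour.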

Page 16: Design, Construction and Early Use of the Biomedical Informatics Research Network

BIRN Portal: Launches Scientific Workflow

1. User logs in to the BIRN Portal, selects data and LONI settings

2. LONI Pipeline is launched from Portal

3. Results are automatically displayed in Slicer 3D

Page 17: Design, Construction and Early Use of the Biomedical Informatics Research Network

Mouse BIRN: Multiscale Data Mediation

1. Create databases at each site

2. Create conceptual links to a shared ontology

3. Situate the data in a common spatial framework

4. Use mediator to navigate and query across data sources
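Steps 2 through 4 can be illustrated with a small sketch: each site's local term is linked to a shared concept, every item carries coordinates in a common frame, and a query filters on both. The concept IDs, site names, labels, and coordinates below are hypothetical; a real deployment would draw concepts from a shared knowledge source and coordinates from an atlas.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Item:
    site: str
    local_label: str   # term used by that site's own database
    position: Point    # assumed already registered to the shared frame


# Step 2: link each site's local vocabulary to a shared concept ID.
# Both the IDs and the labels are made up for this sketch.
CONCEPT_LINKS: Dict[Tuple[str, str], str] = {
    ("site_a", "Purkinje cell"): "C:purkinje_neuron",
    ("site_b", "PC"): "C:purkinje_neuron",
    ("site_b", "granule"): "C:granule_cell",
}


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """Step 3: test a position against an axis-aligned box in the shared frame."""
    return all(l <= c <= h for c, l, h in zip(p, lo, hi))


def find(items: List[Item], concept: str, lo: Point, hi: Point) -> List[Item]:
    """Step 4: query across sites by shared concept and spatial region."""
    return [it for it in items
            if CONCEPT_LINKS.get((it.site, it.local_label)) == concept
            and in_box(it.position, lo, hi)]


items = [
    Item("site_a", "Purkinje cell", (1.0, 2.0, 3.0)),
    Item("site_b", "PC", (1.5, 2.5, 3.5)),
    Item("site_b", "PC", (9.0, 9.0, 9.0)),        # right concept, outside the box
    Item("site_b", "granule", (1.2, 2.2, 3.2)),   # inside the box, wrong concept
]

hits = find(items, "C:purkinje_neuron", (0.0, 0.0, 0.0), (2.0, 3.0, 4.0))
```

Here the two sites never agree on a label, yet the query finds matching data from both because the match is made on the shared concept and the shared coordinates.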

Page 18: Design, Construction and Early Use of the Biomedical Informatics Research Network

Accomplishments of Mouse BIRN

1) Established a data sharing infrastructure using the BIRN for multiscale investigations of animal models of human neurological disease

• Shared file collections using the Storage Resource Broker

• Developed common specimen preparation protocols

• Developed a set of shared analysis and visualization tools working through the BIRN portal

2) Developed a database federation as a data sharing mechanism and a persistent data archive

• Established independent databases at each site and populated them with mouse imaging data

• Mapped data to shared knowledge sources like the UMLS and atlas coordinate systems

• Created a virtual data federation through semantic and spatial mediation tools

Page 19: Design, Construction and Early Use of the Biomedical Informatics Research Network

Registering My Data

[Figure labels: Purkinje neuron; UMLS; Spatial Registration]

Page 20: Design, Construction and Early Use of the Biomedical Informatics Research Network

Human-Mouse Data Integration (Unanticipated New Science Questions)

Query Atlas (3D Slicer)

-Alex Joyner, Steve Pieper, Greg Brown, Nicole Aucoin

Page 21: Design, Construction and Early Use of the Biomedical Informatics Research Network

Key Systems Challenges

• Large-scale data is distributed on a National Scale

– How do you easily locate what you want?

– How do you translate it to what your SW tools understand?

– Where do you analyze it?

– How do you move it efficiently?

– How do you secure it to properly limit and log access?

• The underlying software systems are complex

– How effectively can this complexity be hidden?

• Software technology continually evolves and BIRN must adapt

• Goal: provide a systems “cookie-cutter” for adding new, secured, resources to form a federation
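The requirement to "limit and log access" can be sketched as a store that checks an access list and records every attempt, allowed or not. This is a generic illustration, not BIRN's security layer: the class names, resource path, and user names are hypothetical.

```python
import datetime
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AuditEntry:
    when: datetime.datetime
    user: str
    resource: str
    allowed: bool


@dataclass
class GuardedStore:
    acl: Dict[str, Set[str]]    # resource -> users permitted to read it
    data: Dict[str, bytes]      # resource -> payload
    log: List[AuditEntry] = field(default_factory=list)

    def read(self, user: str, resource: str) -> bytes:
        allowed = user in self.acl.get(resource, set())
        # Record the attempt before acting on it, so denials are logged too.
        now = datetime.datetime.now(datetime.timezone.utc)
        self.log.append(AuditEntry(now, user, resource, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {resource}")
        return self.data[resource]


store = GuardedStore(
    acl={"mri/subject01": {"alice"}},
    data={"mri/subject01": b"voxels"},
)

payload = store.read("alice", "mri/subject01")
try:
    store.read("mallory", "mri/subject01")
except PermissionError as err:
    denied = str(err)
```

Logging before the permission check raises is the key choice: the audit trail then shows who tried to reach the data, not only who succeeded.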

Page 22: Design, Construction and Early Use of the Biomedical Informatics Research Network

A View on BIRN Federated Data

[Diagram labels: BIRN CC with Meta Data Catalog, Portal Server, and Software Server; Multi TB Disk array, Storage Server, DB Server, and Access Control; databases Mouse DB-A, Mouse DB-B, Mouse DB-C, and Mouse DB-D, each behind an Access layer, with MRI Images, EM Images, Histology, and 2 Ph. Img]

BIRN User: “Give me an index of all DAT-KO Striatum Images”

Federated data may be in a variety of representations

• databases

• image files

• simulation files

• flat text files
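A request such as the index of all DAT-KO striatum images can be answered from metadata alone if a catalog records, for every item, where it lives and how it is described. The sketch below assumes such a catalog; the entry layout, site names, logical names, and attribute keys are hypothetical and are not the Storage Resource Broker's actual schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CatalogEntry:
    logical_name: str            # site-independent name shown to the user
    site: str                    # which federated site holds the item
    representation: str          # "database", "image file", ...
    attributes: Dict[str, str]   # descriptive metadata used for search


def build_index(catalog: List[CatalogEntry], **wanted: str) -> List[CatalogEntry]:
    """Return entries whose attributes match every requested key/value pair."""
    matches = [e for e in catalog
               if all(e.attributes.get(k) == v for k, v in wanted.items())]
    return sorted(matches, key=lambda e: (e.site, e.logical_name))


# Made-up catalog rows spanning different sites and representations.
catalog = [
    CatalogEntry("em/dat-ko-017.tif", "site_a", "image file",
                 {"genotype": "DAT-KO", "region": "striatum"}),
    CatalogEntry("mri/wt-004.img", "site_b", "image file",
                 {"genotype": "wild-type", "region": "striatum"}),
    CatalogEntry("histology_db/row/88", "site_c", "database",
                 {"genotype": "DAT-KO", "region": "striatum"}),
    CatalogEntry("em/dat-ko-021.tif", "site_a", "image file",
                 {"genotype": "DAT-KO", "region": "cerebellum"}),
]

index = build_index(catalog, genotype="DAT-KO", region="striatum")
for entry in index:
    print(entry.site, entry.representation, entry.logical_name)
```

Because the index is built from catalog metadata, the user gets one list regardless of whether an item is a database row or a file, and no image data moves until something is actually requested.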

Page 23: Design, Construction and Early Use of the Biomedical Informatics Research Network

http://www.nbirn.net

Page 24: Design, Construction and Early Use of the Biomedical Informatics Research Network

Key Software Systems Being Deployed

• Rocks Cluster Mgmt – www.rocksclusters.org

• BIRN Certificate Authority - MyProxy

• Globus

• Storage Resource Broker – www.sdsc.edu/srb

• Oracle

• Data Mediator – being developed by BIRN-CC

• ½ dozen specific applications

• Netscout Monitoring – Commercial tooling

• BIRN Portal

BIRN

Page 25: Design, Construction and Early Use of the Biomedical Informatics Research Network

What have we learned

• Top-down

– Works because of committed collaborators

– Application drivers are critical to keeping focus

• Grid is deployed and used even though not all SW was available.

– Hands-on experience has taught us a great deal

– A large fraction of grid software is still “fragile”

• Software packaging and availability is critical to making things practical

• Integration of networked resources and people has enabled new ways of doing research

Page 26: Design, Construction and Early Use of the Biomedical Informatics Research Network

Key Observations

• Computer scientists have to learn some new language to better understand needs

• Grids are new to scientists and it is natural for them to be skeptical

• Data sharing policy issues are quite troublesome

– No uniform policy across institutions on how, but

• NIH has declared that all data taken with public (tax) money will eventually be public

– Tracking use of human data is important

• Removing identifiers (like facial features in a full-skull MRI) is essential.

Page 27: Design, Construction and Early Use of the Biomedical Informatics Research Network

One Final Thought

• I have been involved in 4 large-scale scientific collaborations

– BIRN

– GEON (GeoSciences Network)

– OptIPuter

– TeraGrid

• In all cases it has taken at least 18 months for the large projects to make their first significant steps as a group.

– Is there something fundamental about large group creation for distributed projects that limits how quickly new results can be obtained?

– Is there a way to shorten this “spin-up” time?