nesc data projects and initiatives dr. dave berry research manager

Post on 28-Mar-2015


NeSC Data Projects and Initiatives

Dr. Dave Berry, Research Manager

Contents

• The Data Deluge
• Web Services
• The DAI vision
• The OGSA-DAI Project and GGF
• The OGSA-DAI Software
• Edikt
• Other relevant projects in the UK

Acknowledgements

This talk includes material prepared by:
• The OGSA-DAI project
• The e-Diamond project
• The BRIDGES project
• The GGF OGSA Working Group
• and others…

The Data Deluge

[Figure labels: Mont Blanc (4810 m); Downtown Geneva]

• Entering an age of data
  – CERN: LHC will generate 1 GB/s = 10 PB/y
  – VLBA (NRAO) generates 1 GB/s today
  – Pixar generate 100 TB/Movie
• Data stored in many different ways
  – Relational databases
  – XML databases
  – Flat files
• Need ways to facilitate
  – Data discovery
  – Data access
  – Data integration
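The "1 GB/s = 10 PB/y" figure on this slide can be checked with a few lines of arithmetic. The sketch below is not from the talk. It assumes the rate is sustained for roughly 10^7 seconds of running per year, a common rule of thumb for accelerator operation, rather than for the whole calendar year. The function name and constants are illustrative.

```python
# Rough check of the "1 GB/s = 10 PB/y" figure.
# ASSUMPTION (not stated on the slide): data is taken for about 1e7 seconds
# per year, not for every second of the calendar year.
GB = 10**9   # bytes in a gigabyte (decimal units)
PB = 10**15  # bytes in a petabyte (decimal units)

RUNNING_SECONDS = 1e7                # assumed live time per year
CALENDAR_SECONDS = 365 * 24 * 3600   # seconds in a non-leap year


def petabytes_per_year(rate_gb_per_s: float, live_seconds: float) -> float:
    """Convert a sustained rate in GB/s into a yearly volume in PB."""
    return rate_gb_per_s * GB * live_seconds / PB


lhc_running = petabytes_per_year(1.0, RUNNING_SECONDS)
lhc_calendar = petabytes_per_year(1.0, CALENDAR_SECONDS)

print(f"1 GB/s over 1e7 s:           {lhc_running:.1f} PB")
print(f"1 GB/s over a calendar year: {lhc_calendar:.1f} PB")
```

Under that assumption the yearly volume comes out at 10 PB. The same rate held for a full calendar year would give roughly three times as much.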

Astronomical Databases

Number and sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects

Data and images courtesy Alex Szalay, Johns Hopkins

Bioinformatics Databases

• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT…)
• Molecular Classifications (SCOP, CATH,…)
• Motif Libraries (PROSITE, Blocks, …)

[Figure: PDB Content Growth]

Web Services

• Using the protocols and ideas that have made the web a success for humans…
• And applying them to distributed programming
  – HTTP
  – Single networking port
  – Autonomy & Failure handling
  – Open standards
• Tools & Platforms
  – Apache Axis
  – WebSphere, .NET, Oracle Application Server, Sun ONE, …

From Browsing to Programming

            | Browsing the web     | Programming the web
Readers     | People               | Software
Discovery   | Google, Altavista, … | UDDI, …
Description | N/A                  | WSDL
Operations  | GET, POST, …         | Service-specific
Protocol    | HTTP                 | SOAP over HTTP
Format      | HTML, XHTML          | XML + Schema
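To make the "Programming the web" column concrete, here is a minimal sketch of what a SOAP-over-HTTP request looks like on the wire. It is not taken from the talk. It uses only the Python standard library, and the service namespace, the operation name `findDataSources` and the SOAPAction value are hypothetical placeholders. Only the SOAP 1.1 envelope namespace is a real identifier.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:data-service"                 # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap a service-specific operation and its parameters in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def http_headers(payload: str, soap_action: str) -> dict:
    """Headers for POSTing the envelope; the action value is an assumption."""
    return {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{soap_action}"',
        "Content-Length": str(len(payload.encode("utf-8"))),
    }


message = build_request("findDataSources", {"topic": "x"})
headers = http_headers(message, "urn:example:data-service#findDataSources")
print(message)
print(headers)
```

The payload is plain XML and the transport is an ordinary HTTP POST, which is why a single networking port and existing web infrastructure are enough.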

A Perspective on WS Specifications

[Figure: Open Grid Services Architecture. Labels: Web Services; Business integration; Secure and universal access; Applications on demand; Grid Protocols; Vast resource scalability; Global accessibility; Resources on demand; Continuous availability; Access resource; Manage resource; Share resource]

The architecture of the Global Grid Forum

[Figure: OGSA service categories and capabilities]

Service categories: Context Services; Information Services; Infrastructure Services; Security Services; Resource Mgmt Services; Execution Mgmt Services; Data Services; Self Mgmt Services

Capabilities shown: Policy Mgmt; VO Mgmt; Access; Integration; Provisioning; Cataloging; Boundary Traversal; Integrity; Authorization; Authentication; WSRF, WSN, WSDM; Event Mgmt; Trouble-shooting; Discovery; Job Mgmt; Logging; Execution Planning; Workflow Mgmt; Workload Mgmt; Provisioning; Application Mgmt; Deployment, Configuration, Reservation; Naming; Heterogeneity Mgmt; Service Level Attainment; QoS Mgmt; Optimization

GGF11: OGSA specification (informational document)

Data Access and Integration

• Web Services for querying and integrating structured data resources
• The foundation framework for:
  – Building tailored DAI applications
  – Higher-level services:
    Replication: data located in multiple locations
    Federation: composition of multiple sources
    Provenance: how was data generated?

The OGSA-DAI Project

Funded by the Grid Core Programme

• OGSA-DAI
  – £3 million, 18 months, from Feb 2002
  – Three major releases, three interim releases
• DAIT (DAI-Two)
  – Keep the OGSA-DAI brand name
  – £1.5 million, 24 months, from Oct 2003
  – Four major releases

DAI in GGF and OGSA

• Data Access and Integration Services WG
  – Strong involvement from OGSA-DAI members
  – Standardise the interfaces – WS-DAI
  – OGSA-DAI a reference implementation
  – Experience informing specification work
• OGSA WG Data Design Team
  – Designing the data-oriented aspects of OGSA
  – Created after GGF10 (March 2004)
  – Led by NeSC

[Figure: the OGSA service-category diagram from the earlier slide "The architecture of the Global Grid Forum", shown again]

OGSA Design Teams

OGSA-WG design teams:
• Information Service design team
• Data Service design team
• EMS design team
• Resource Mgmt design team
• Security Service design team
• Self Mgmt design team
• Core (roadmap) design team
• Naming design team

Data Services design team

• Informal domain expert groups within OGSA
• May include co-chairs of other WG/RGs
• Output is included in OGSA specification

[Figure labels: OGSA-WG; OGSA Data Service Design team; DAIS-WG; GSM-WG; GFS-WG; Info-D WG; ADF, OREP, …; Telecons, F2F meetings]

OGSA v2 Document Deliverables

Root documents: Use case doc, Architecture v2, Glossary

Design team documents: Service descriptions, Scenarios

Working Group specifications: GGF Recommendation documents

How OGSA-DAI works

1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML

[Figure labels: Client; Registry; Factory; Grid Data Service; XML / Relational database. Legend: SOAP/HTTP; service creation; API interactions]
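The numbered Registry, Factory and GDS interactions on this slide can be walked through with a small in-process sketch. This is not the OGSA-DAI API. The class and method names (`Registry.find`, `Factory.create_service`, `GridDataService.perform`) are invented for illustration, an in-memory SQLite database stands in for the relational resource, and the SOAP/HTTP transport is left out entirely.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Stand-in for a GDS: runs a query next to the data and returns XML."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def perform(self, sql: str) -> str:
        cursor = self._connection.execute(sql)           # 3b. GDS interacts with database
        columns = [d[0] for d in cursor.description]
        results = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(results, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(results, encoding="unicode")  # 3c. results returned as XML


class Factory:
    """Creates a service instance to manage access to one data resource."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def create_service(self) -> GridDataService:
        return GridDataService(self._connection)         # 2b and 2c


class Registry:
    """Maps a topic to the factory that can provide access to data about it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def find(self, topic: str) -> Factory:
        return self._factories[topic]                    # 1b. registry returns a factory


# Server-side setup (hypothetical data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (name TEXT, value INTEGER)")
db.execute("INSERT INTO observations VALUES ('x', 42)")
registry = Registry()
registry.register("x", Factory(db))

# Client side: steps 1a to 3c.
factory = registry.find("x")
gds = factory.create_service()
xml_result = gds.perform("SELECT name, value FROM observations")
print(xml_result)
```

The client only ever holds handles and exchanges documents. The database connection stays on the service side, and that separation is what the slide's diagram shows.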

OGSA-DAI compared to JDBC

• Language independence at the client end
• Platform independence
  – Do not have to worry about connection technology, drivers, etc
• Can handle XML resources
• Can embed additional functionality at the service end
  – Transformations
  – Third party delivery
  – Avoiding unnecessary data movement
• Provision of Metadata is powerful
• Usefulness of the Registry for service discovery

Dynamic service binding process

Future DAI Services

1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits a sequence of scripts, each with a set of queries to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation

[Figure labels: Client; Analyst; Data Registry; Data Access & Integration master; GDS1, GDS2, GDS3; GDTS1, GDTS2; Sx; Sy; XML database; Relational database. Legend: SOAP/HTTP; service creation; API interactions]

Application Code

Activities are the drivers

• Express a task to be performed by a GDS
• Three broad classes of activities:
  – Statement
  – Transformation
  – Delivery
• Extensible:
  – Easy to add new functionality
  – Does not require modification to the service interface
  – Extensions operate within the OGSA-DAI framework
• Functionality:
  – Implemented at the service
  – Work where the data is (no need to move data back)
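The extensibility point above, adding functionality without touching the service interface, can be illustrated with a small registry pattern. This is a sketch under assumptions, not OGSA-DAI code. The `activity` decorator, the `ACTIVITIES` table and the activity names are all hypothetical.

```python
from typing import Callable, Dict

# Table of named activities known to the (hypothetical) service.
ACTIVITIES: Dict[str, Callable] = {}


def activity(name: str):
    """Register a function as a named activity."""
    def register(func: Callable) -> Callable:
        ACTIVITIES[name] = func
        return func
    return register


def perform(name: str, data=None):
    """The single, fixed entry point of the service; it never changes."""
    return ACTIVITIES[name](data)


@activity("statement")
def run_statement(_):
    # Stands in for a query executed against local data.
    return [("x", 3), ("y", 5)]


@activity("transform")
def to_csv(rows):
    # Transformation applied at the service, next to the data.
    return "\n".join(f"{key},{value}" for key, value in rows)


# Later extension: a new activity is added, perform() is untouched.
@activity("count")
def count_rows(rows):
    return len(rows)


rows = perform("statement")
print(perform("transform", rows))
print(perform("count", rows))
```

Clients keep calling the same `perform` entry point. Only the set of activity names the service understands grows.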

OGSA-DAI Deck

Building Applications

• Activities are grouped together
  – Perform document
  – Data can flow between activities
• Optimisation
  – Avoids multiple message exchanges
• Can deliver to other GDSs
  – Prerequisite for data integration
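The idea of grouping activities in one document, with data flowing between them, can be sketched as follows. The XML element and attribute names here are invented for illustration and are not the real OGSA-DAI perform document schema. The three handlers stand in for a statement, a transformation and a delivery activity.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical request document: three activities wired together by stream names.
REQUEST = """
<request>
  <statement output="rows"/>
  <transform input="rows" output="packed"/>
  <delivery input="packed"/>
</request>
"""

delivered = []  # stands in for a delivery target such as a file or another service


def statement(_):
    return "x,3\ny,5"                          # stands in for a query result


def transform(text):
    return gzip.compress(text.encode("utf-8"))  # compress before delivery


def delivery(data):
    delivered.append(data)


HANDLERS = {"statement": statement, "transform": transform, "delivery": delivery}


def perform(document: str) -> dict:
    """Run every activity in one request, passing data through named streams."""
    streams = {}
    for step in ET.fromstring(document):
        result = HANDLERS[step.tag](streams.get(step.get("input")))
        if step.get("output"):
            streams[step.get("output")] = result
    return streams


streams = perform(REQUEST)
print(gzip.decompress(delivered[0]).decode("utf-8"))
```

All three steps travel in a single message, so the client avoids one round trip per activity and the intermediate data never leaves the service.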

Base middleware for projects requiring data access

Some capability for data integration

Release 4, April 2004

• Provides Data Access components, an extensible framework for building applications and some integration components
• Built on top of Globus Toolkit 3.2
• Supports relational, XML and some files
  – MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV
• Supports various delivery options
  – SOAP, FTP, GridFTP, HTTP, files, email, inter-service
• Supports various transforms
  – XSLT, ZIP, GZip
• Supports message level security using X509 certificates
• Client Toolkit library for application developers
• GUI data browser (contributed by FirstDIG project)
• Separate Distributed Query Processing components
• Comprehensive documentation and tutorials in XHTML format

Downloads by Release

[Chart: downloads (axis 0 to 3000) against date, from 15/01/2003 to 15/07/2004, with releases R1, R2, R3 and R4 marked]

2746 downloads (~4.7 downloads a day)

Downloads by country

792 registered users @ 23/8/04

Release 5, October 2004

• Re-engineered interface-independent core OGSA-DAI functionality.
• Improved dependability and security integration.
• New file data resources representing flat files queried using full text searches (e.g. EMBL format).
• Installation and Configuration Wizard, including “all-in-one installer”
• Improved Data Browser which allows XPath querying.
• Set of standard benchmarks.
• JSP Quick View interface.
• Support for other databases (e.g. Access, Exist, HSQL).

Release 6, April 2006

• Data Integration applications supporting identified scenarios
• OGSA-DQP as an integrated part of release
• Fully compliant JDBC Driver for OGSA-DAI
• Support for WS-Security implementations
• Support for stored procedures on all supported databases
• Improved support for different database specific SQL types
• SQL translation between vendor dialects for subset of queries
• Support for XQuery data resources
• We expect to comply with a version of the emerging DAIS specification at this release.

Who is Using OGSA-DAI?

• OGSA-DAI (http://www.ogsadai.org.uk)
• AstroGrid (http://www.astrogrid.org/)
• BioSimGrid (http://www.biosimgrid.org/)
• BioGrid (http://www.biogrid.jp/)
• Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
• eDiaMoND (http://www.ediamond.ox.ac.uk/)
• FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
• GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
• GEON (http://www.geongrid.org/)
• IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
• myGrid (http://www.mygrid.org.uk/)
• N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
• ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
• OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
• MCS (http://www.isi.edu/~deelman/MCS/)
• INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
• GridMiner (http://www.gridminer.org/)

Project classification

[Figure: projects using OGSA-DAI, classified as Biological Sciences, Physical Sciences, Computer Sciences or Commercial Applications. Projects shown: FirstDig, INWA, Bridges, AstroGrid, BioSimGrid, BioGrid, eDiamond, myGrid, ODD-Genes, N2Grid, GEON, MCS, IU RGRBench, OGSA-WebDB, GeneGrid, GridMiner]

Edikt

• The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
• SHEFC funded research and development grant
• 3 years funding: May 2002 – 2005
• +3 years funding upon successful project and review

[Figure labels: Standards; Edikt project; Requirements analysis; Technology matchmaking; Gap filling; Rigorous engineering; CS Research; Grid Services for e-Science Data Management; Commercial SW components and skills; E-Science Apps; Java Framework]

ELDAS – Data Access Service

• Implemented using Enterprise Java Beans
• Data Access Components interface to distinct DBMSs
• Accessible as a grid data service or a web data service
• Another (partial) implementation of the GGF WS-DAI specifications

[Figure labels: ELDAS; EJB - DAS; DAC (x4); DB2 DB; MySQL DB; Xindice DB; Oracle 9i DB; Web Servlet; Grid Proxy; Web User1; Grid User1; Grid User2; e-Science Application]

BinX – accessing legacy binary data

The Problem:
• Many binary data files
• Applications must “know” the data format
• Binary data formats are machine-specific

The Solution:
• Write a “stand-aside” format description in XML
• Provide a library to
  – Interpret the description
  – Provide file access across different machines
• Build higher-level services

[Figure labels: Binary Data File (several); BinX Library; BinX file describes binary file structure]
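The "stand-aside description" idea can be shown in miniature: an XML document names the fields, their types and the byte order, and a small library function uses it to decode raw bytes the same way on any machine. This sketch does not use the real BinX schema or library. The element names, attribute names and type names below are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical stand-aside description of a binary record (not the BinX schema).
DESCRIPTION = """
<dataset byteOrder="bigEndian">
  <field name="count" type="int32"/>
  <field name="mean" type="float64"/>
</dataset>
"""

STRUCT_CODES = {"int32": "i", "float64": "d"}          # type name -> struct code
ORDER_PREFIX = {"bigEndian": ">", "littleEndian": "<"}  # explicit byte order


def read_record(description: str, raw: bytes) -> dict:
    """Decode raw bytes according to the XML description, not the host machine."""
    root = ET.fromstring(description)
    fields = root.findall("field")
    fmt = ORDER_PREFIX[root.get("byteOrder")] + "".join(
        STRUCT_CODES[field.get("type")] for field in fields
    )
    values = struct.unpack(fmt, raw)
    return {field.get("name"): value for field, value in zip(fields, values)}


# Bytes as they might have been written on a big-endian machine.
raw = struct.pack(">id", 7, 2.5)
record = read_record(DESCRIPTION, raw)
print(record)
```

Because the byte order and layout come from the description, the application no longer has to hard-code the data format, which is the problem the slide names.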

simulations

Mammography

Mammograms have different appearances, depending on image settings and acquisition systems

[Figure labels: Standard Mammo Format; Temporal mammography; Computer Aided Detection; 3D View]

A prototype of a national database of mammographic images in support of the UK breast screening programme

[Figure labels: sites UCL, KCL, UED, CHU; DB2 Content Manager (x4); DB2 Federation; OGSA-DAI; Database; Files; Core Services (x4); Data Load; Training App; Training Services; Core API; Training API; Training Application; Core & Training API]

The BRIDGES Project

Biomedical Research Informatics Delivered by Grid Enabled Services

NeSC (Edinburgh and Glasgow) and IBM
www.brc.dcs.gla.ac.uk/projects/bridges

• Supporting project for CFG project
  – Generating data on hypertension
  – Rat, Mouse, Human genome databases
• Variety of tools used
  – BLAST, BLAT, Gene Prediction, visualisation, …
• Variety of data sources and formats
  – Microarray data, genome DBs, project partner research data, medical records, …
• Aim is integrated infrastructure supporting
  – Data federation
  – Security

[Figure labels: BRIDGES; sites Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands; Private data (x6); CFG Virtual Organisation; Publicly Curated Data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …; Data Hub; Synteny Grid Service; blast; VO Authorisation; Information Integrator; OGSA-DAI]

INWA Project

• Innovation Node Western Australia
• Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
• Involved 10 partners (6 UK + 4 Australia)
• Aim
  – Data mine commercially sensitive data
  – Security an absolute MUST
  – Employ Grid technologies
  – Need access to data and computational resources
• OGSA-DAI
  – Access data resources
• SunDCG's TOG (Transfer-queue Over Globus)
  – Handle job submission to analyse micro array data

[Figure labels: user@australia; user@edinburgh; Curtin, Australia; EPCC, UK; INWA; Grid Engine (x2); TOG (x2); OGSA-DAI (x4); Data Browser (x2); Bank; Telco; Telco data; Bank data; Australian property; UK Property]

Further Information on OGSA-DAI

• The OGSA-DAI Project Site: http://www.ogsadai.org.uk
• The DAIS-WG site: http://cs.man.ac.uk/grid-db
• OGSA-DAI Users Mailing list: users@ogsadai.org.uk
  – General discussion on grid DAI matters
• Formal support for OGSA-DAI releases: http://www.ogsadai.org.uk/support, support@ogsadai.org.uk

OGSA-DAI training courses
