nesc data projects and initiatives dr. dave berry research manager
Post on 28-Mar-2015
NeSC Data Projects and Initiatives
Dr. Dave Berry
Research Manager
Contents
• The Data Deluge
• Web Services
• The DAI vision
• The OGSA-DAI Project and GGF
• The OGSA-DAI Software
• Edikt
• Other relevant projects in the UK
Acknowledgements
This talk includes material prepared by:
• The OGSA-DAI project
• The e-Diamond project
• The BRIDGES project
• The GGF OGSA Working Group
• and others…
The Data Deluge
Mont Blanc (4810 m)
Entering an age of data
• CERN: LHC will generate 1 GB/s = 10 PB/y
• VLBA (NRAO) generates 1 GB/s today
• Pixar generates 100 TB/movie
Data stored in many different ways
• Relational databases
• XML databases
• Flat files
Need ways to facilitate
• Data discovery
• Data access
• Data integration
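The LHC figure on this slide equates a data rate with a yearly volume. A quick back-of-envelope check, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes), suggests that 1 GB/s reaches 10 PB after 10^7 seconds, roughly a third of a calendar year. Reading that as the effective data-taking time per year is an assumption; the slide does not say so.

```python
# Back-of-envelope check of "1 GB/s = 10 PB/y" (decimal units assumed).
GB = 10**9
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s in a non-leap year

rate_bytes_per_s = 1 * GB
yearly_volume = 10 * PB

# Seconds of data-taking needed to accumulate the quoted yearly volume.
implied_seconds = yearly_volume // rate_bytes_per_s

# Fraction of a calendar year that this running time represents.
duty_fraction = implied_seconds / SECONDS_PER_YEAR

# Volume (in PB) if the same rate were sustained for the whole year.
continuous_volume_pb = rate_bytes_per_s * SECONDS_PER_YEAR / PB

print(implied_seconds)
print(round(duty_fraction, 2))
print(round(continuous_volume_pb, 1))
```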
Downtown Geneva
Astronomical Databases
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
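To make "doubling every 12 months" concrete, here is a small sketch that extrapolates from the 200 TB total quoted for mid-2002. The function name and the chosen horizons are illustrative, and the projection assumes the doubling rate simply continues, which the slide does not claim.

```python
def projected_volume_tb(initial_tb: float, years: float, doubling_years: float = 1.0) -> float:
    """Volume after `years`, assuming it doubles every `doubling_years`."""
    return initial_tb * 2 ** (years / doubling_years)


# 200 TB is the mid-2002 total from the slide; the horizons are arbitrary.
for years in (0, 1, 2, 5):
    print(years, projected_volume_tb(200, years))
```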
Bioinformatics Databases
PDB Content Growth
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)
Web Services
Using the protocols and ideas that have made the web a success for humans…
And applying them to distributed programming
• HTTP
• Single networking port
• Autonomy & failure handling
• Open standards
Tools & Platforms
• Apache Axis
• WebSphere, .NET, Oracle Application Server, Sun ONE, …
From Browsing to Programming
              Browsing the web        Programming the web
Readers       People                  Software
Discovery     Google, Altavista, …    UDDI, …
Description   N/A                     WSDL
Operations    Get, post, …            Service-specific
Protocol      HTTP                    SOAP over HTTP
Format        HTML, XHTML             XML + Schema
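The "SOAP over HTTP" and "XML + Schema" rows can be illustrated with a minimal sketch that wraps a service-specific operation in a SOAP 1.1 envelope and prepares (but does not send) the HTTP POST. The service namespace, endpoint URL, operation name and parameter are all hypothetical and do not refer to any service mentioned in the talk.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:catalogue"                      # hypothetical service namespace
ENDPOINT = "http://localhost:8080/services/Catalogue"     # hypothetical endpoint


def build_envelope(operation: str, params: dict) -> bytes:
    """Wrap a service-specific operation and its parameters in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_request(operation: str, params: dict) -> urllib.request.Request:
    """Prepare the HTTP POST that would carry the envelope; nothing is sent here."""
    return urllib.request.Request(
        ENDPOINT,
        data=build_envelope(operation, params),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{SERVICE_NS}#{operation}"',
        },
        method="POST",
    )


request = build_request("findObjects", {"waveband": "x-ray"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

In practice a toolkit generated from the service's WSDL would normally build these messages; the hand-rolled envelope is only meant to show what travels over the single HTTP port.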
A Perspective on WS Specifications
Open Grid Services Architecture
Web Services
Business integration
Secure and universal access
Applications on demand
Grid Protocols
Vast resource scalability
Global Accessibility
Resources on demand
Continuous Availability
Access resource
Manage resource
Share resource
The architecture of the Global Grid Forum
[Diagram: OGSA service categories]
Service categories: Context Services, Information Services, Infrastructure Services, Security Services, Resource Mgmt Services, Execution Mgmt Services, Data Services, Self Mgmt Services
Capabilities shown: Policy Mgmt, VO Mgmt, Access, Integration, Provisioning, Cataloging, Boundary Traversal, Integrity, Authorization, Authentication, WSRF, WSN, WSDM, Event Mgmt, Trouble-shooting, Discovery, Job Mgmt, Logging, Execution Planning, Workflow Mgmt, Workload Mgmt, Provisioning, Application Mgmt, Deployment, Configuration, Reservation, Naming, Heterogeneity Mgmt, Service Level Attainment, QoS Mgmt, Optimization
GGF11: OGSA specification informational document
Data Access and Integration
Web Services for querying and integrating structured data resources
The foundation framework for:
• Building tailored DAI applications
• Higher-level services:
  • Replication: Data located in multiple locations
  • Federation: Composition of multiple sources
  • Provenance: How was data generated?
The OGSA-DAI Project
Powered by ….
Funded by the Grid Core Programme
OGSA-DAI
• £3 million, 18 months, from Feb 2002
• Three major releases, three interim releases
DAIT (DAI-Two)
• Keep the OGSA-DAI brand name
• £1.5 million, 24 months, from Oct 2003
• Four major releases
DAI in GGF and OGSA
Data Access and Integration Services WG
• Strong involvement from OGSA-DAI members
• Standardise the interfaces – WS-DAI
• OGSA-DAI a reference implementation
• Experience informing specification work
OGSA WG Data Design Team
• Designing the data-oriented aspects of OGSA
• Created after GGF10 (March 2004)
• Led by NeSC
[Diagram repeated: OGSA service categories, including Data Services (Access, Integration, Provisioning, Cataloging)]
OGSA Design Teams
OGSA-WG
Information Service design team
Data Service design team
EMS design team
Resource Mgmt design team
Security Service design team
Self Mgmt design team
Core (roadmap) design team
Naming design team
Data Services design team
• Informal domain expert groups within OGSA
• May include co-chairs of other WG/RGs
• Output is included in OGSA specification
OGSA-WG
OGSA Data Service Design team
DAIS-WG
GSM-WG
GFS-WG
Info-D WG
ADF, OREP, …
Telecons, F2F meetings
OGSA v2 Document Deliverables
Root Documents: Use case doc, Architecture v2, Glossary
Design team Documents: Service descriptions, Scenarios
Working Group Specifications: GGF Recommendation documents
How OGSA-DAI works
Components: Client, Registry, Factory, Grid Data Service (GDS), XML / Relational database
Interaction types: SOAP/HTTP, service creation, API interactions
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
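The Registry, Factory and Grid Data Service steps can be sketched in a single Python process. This is only an illustration of the interaction pattern under stated assumptions: the class and method names are hypothetical and are not the OGSA-DAI API, SQLite stands in for the database, and plain method calls stand in for the SOAP/HTTP exchanges.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Manages access to one database on behalf of a client (steps 3a to 3c)."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def query(self, sql: str) -> str:
        cursor = self._connection.execute(sql)            # 3b. GDS interacts with database
        columns = [d[0] for d in cursor.description]
        results = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(results, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(results, encoding="unicode")   # 3c. results returned as XML


class Factory:
    """Creates a GridDataService for the database it fronts (steps 2a to 2c)."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def create_service(self) -> GridDataService:
        return GridDataService(self._connection)


class Registry:
    """Maps a topic to the Factory that can give access to it (steps 1a and 1b)."""

    def __init__(self):
        self._factories = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def find(self, topic: str) -> Factory:
        return self._factories[topic]


# Set-up: an in-memory database with invented sample rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
db.executemany("INSERT INTO stars VALUES (?, ?)", [("Sirius", -1.46), ("Vega", 0.03)])
registry = Registry()
registry.register("stars", Factory(db))

factory = registry.find("stars")        # 1a, 1b
gds = factory.create_service()          # 2a, 2b, 2c
xml_result = gds.query("SELECT name, magnitude FROM stars ORDER BY magnitude")  # 3a to 3c
print(xml_result)
```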
OGSA-DAI compared to JDBC
• Language independence at the client end
• Platform independence
  • Do not have to worry about connection technology, drivers, etc
• Can handle XML resources
• Can embed additional functionality at the service end
  • Transformations
  • Third party delivery
  • Avoiding unnecessary data movement
• Provision of Metadata is powerful
• Usefulness of the Registry for service discovery
  • Dynamic service binding process
Future DAI Services
Components: Client, Analyst, Data Registry, Data Access & Integration master, GDS, GDS1, GDS2, GDS3, GDTS, GDTS1, GDTS2, Sx, Sy, XML database, Relational database
Interaction types: SOAP/HTTP, service creation, API interactions
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits a sequence of scripts, each with a set of queries, to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
Application Code
Activities are the drivers
Express a task to be performed by a GDS
Three broad classes of activities:
• Statement
• Transformations
• Delivery
Extensible:
• Easy to add new functionality
• Does not require modification to the service interface
• Extensions operate within the OGSA-DAI framework
Functionality:
• Implemented at the service
• Works where the data is (no need to move data back)
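One way to picture the extensibility point is a service with a single fixed entry point and a table of pluggable activities. The sketch below is an assumption-laden illustration, not OGSA-DAI code: `DataService`, `register_activity`, `perform` and the three sample activities (one per class: statement, transformation, delivery) are hypothetical names, and the rows are invented.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[dict]
Activity = Callable[[Rows], Rows]   # an activity maps incoming rows to outgoing rows


class DataService:
    """Hypothetical service whose interface is one fixed method, perform()."""

    def __init__(self, rows: Rows):
        self._rows = rows
        self._activities: Dict[str, Callable[..., Activity]] = {}

    def register_activity(self, name: str, factory: Callable[..., Activity]) -> None:
        # New functionality is added here; perform() itself never changes.
        self._activities[name] = factory

    def perform(self, steps: List[Tuple[str, dict]]) -> Rows:
        data = list(self._rows)
        for name, kwargs in steps:
            data = self._activities[name](**kwargs)(data)
        return data


def select_where(field: str, value) -> Activity:      # statement
    return lambda rows: [r for r in rows if r[field] == value]


def project(columns: List[str]) -> Activity:          # transformation
    return lambda rows: [{c: r[c] for c in columns} for r in rows]


def deliver_to(sink: list) -> Activity:               # delivery
    def activity(rows: Rows) -> Rows:
        sink.extend(rows)
        return rows
    return activity


service = DataService([
    {"id": "a", "kind": "image", "size": 10},
    {"id": "b", "kind": "table", "size": 20},
    {"id": "c", "kind": "image", "size": 30},
])
service.register_activity("select_where", select_where)
service.register_activity("project", project)
service.register_activity("deliver_to", deliver_to)

outbox: list = []
result = service.perform([
    ("select_where", {"field": "kind", "value": "image"}),
    ("project", {"columns": ["id"]}),
    ("deliver_to", {"sink": outbox}),
])
print(result)

# Extending the service: a new activity, with no change to DataService.
service.register_activity("limit", lambda n: (lambda rows: rows[:n]))
print(service.perform([("limit", {"n": 1})]))
```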
OGSA-DAI Deck
Building Applications
Activities are grouped together
• Perform document
• Data can flow between activities
Optimisation
• Avoids multiple message exchanges
Can deliver to other GDSs
• Prerequisite for data integration
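The idea of grouping activities in one request, with data flowing from one to the next, might be sketched as follows. The XML vocabulary (`perform`, `sqlQuery`, `toCsv`, `gzip`, and the `name`/`from` attributes) is invented for this illustration and is not the real OGSA-DAI perform document schema; SQLite stands in for the data resource and the rows are made up.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical request: three activities chained through named outputs.
PERFORM_DOCUMENT = """
<perform>
  <sqlQuery name="q">SELECT name FROM stars ORDER BY name</sqlQuery>
  <toCsv name="csv" from="q"/>
  <gzip name="packed" from="csv"/>
</perform>
"""


def run_perform_document(document: str, connection: sqlite3.Connection) -> dict:
    """Execute every activity in one pass, so the client needs a single exchange."""
    outputs = {}
    for element in ET.fromstring(document):
        source = outputs.get(element.get("from"))   # output of an earlier activity, if any
        if element.tag == "sqlQuery":
            result = connection.execute(element.text).fetchall()
        elif element.tag == "toCsv":
            result = "\n".join(",".join(str(v) for v in row) for row in source)
        elif element.tag == "gzip":
            result = gzip.compress(source.encode("utf-8"))
        else:
            raise ValueError(f"unknown activity: {element.tag}")
        outputs[element.get("name")] = result
    return outputs


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (name TEXT)")
db.executemany("INSERT INTO stars VALUES (?)", [("Vega",), ("Sirius",)])

outputs = run_perform_document(PERFORM_DOCUMENT, db)
print(gzip.decompress(outputs["packed"]).decode("utf-8"))
```

Because the query, the conversion and the compression all run at the service, only the final compressed payload would need to cross the network.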
Base middleware for projects requiring data access
Some capability for data integration
Release 4, April 2004
• Provides Data Access components, an extensible framework for building applications and some integration components
• Built on top of Globus Toolkit 3.2
• Supports relational, XML and some files
  • MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV
• Supports various delivery options
  • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
• Supports various transforms
  • XSLT, ZIP, GZip
• Supports message level security using X509 certificates
• Client Toolkit library for application developers
• GUI data browser (contributed by FirstDIG project)
• Separate Distributed Query Processing components
• Comprehensive documentation and tutorials in XHTML format
Downloads by Release
[Chart: downloads (axis 0 to 3000) against date, 15/01/2003 to 15/07/2004, with releases R1 to R4 marked]
2746 downloads (~4.7 downloads a day)
Downloads by country
792 registered users @ 23/8/04
Release 5, October 2004
• Re-engineered interface-independent core OGSA-DAI functionality.
• Improved dependability and security integration.
• New file data resources representing flat files queried using full text searches (e.g. EMBL format).
• Installation and Configuration Wizard, including “all-in-one installer”.
• Improved Data Browser which allows XPath querying.
• Set of standard benchmarks.
• JSP Quick View interface.
• Support for other databases (e.g. Access, eXist, HSQL).
Release 6, April 2006
• Data Integration applications supporting identified scenarios
• OGSA-DQP as an integrated part of release
• Fully compliant JDBC Driver for OGSA-DAI
• Support for WS-Security implementations
• Support for stored procedures on all supported databases
• Improved support for different database specific SQL types
• SQL translation between vendor dialects for subset of queries
• Support for XQuery data resources
• We expect to comply with a version of the emerging DAIS specification at this release.
Who is Using OGSA-DAI?
OGSA-DAI (http://www.ogsadai.org.uk)
AstroGrid (http://www.astrogrid.org/)
BioSimGrid (http://www.biosimgrid.org/)
BioGrid (http://www.biogrid.jp/)
Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
GEON (http://www.geongrid.org/)
IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
myGrid (http://www.mygrid.org.uk/)
N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
MCS (http://www.isi.edu/~deelman/MCS/)
INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
GridMiner (http://www.gridminer.org/)
[Diagram: projects using OGSA-DAI, classified as Biological Sciences, Physical Sciences, Computer Sciences or Commercial Applications]
Projects shown: FirstDig, INWA, Bridges, AstroGrid, BioSimGrid, BioGrid, eDiaMoND, myGrid, ODD-Genes, N2Grid, GEON, MCS, IU RGRBench, OGSA-WebDB, GeneGrid, GridMiner
Project classification
Edikt
• The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
• SHEFC funded research and development grant
  • 3 years funding: May 2002 – 2005
  • +3 years funding upon successful project and review
Standards
Edikt project
Requirements analysis
Technology matchmaking
Gap filling
Rigorous engineering
CS Research
Grid Services for e-Science Data Management
Commercial SW components and skills
E-Science Apps
Java Framework
ELDAS – Data Access Service
• Implemented using Enterprise Java Beans
• Data Access Components interface to distinct DBMSs
• Accessible as a grid data service or a web data service
• Another (partial) implementation of the GGF WS-DAI specifications
[Diagram labels: ELDAS; Web User1, Grid User1, Grid User2; Web Servlet, Grid Proxy; EJB - DAS; DAC (x4); DB2 DB, MySQL DB, Xindice DB, Oracle 9i DB]
[Diagram labels: e-Science Application; Binary Data File (x6)]
BinX – accessing legacy binary data
The Problem:
• Many binary data files
• Applications must “know” the data format
• Binary data formats are machine-specific
BinX Library
The Solution:
• Write a “stand-aside” format description in XML
• Provide a library to
  • Interpret the description
  • Provide file access across different machines
• Build higher-level services
BinX file describes binary file structure
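The "stand-aside description" idea can be sketched with Python's `struct` module: the layout and byte order live in a small XML document, and a generic reader interprets it, so the application never hard-codes the format. The XML vocabulary below (`binaryFormat`, `field`, `byteOrder`, the type names) is invented for this example and is not the BinX schema; the sample record is made up.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of a 14-byte big-endian record.
DESCRIPTION = """
<binaryFormat byteOrder="bigEndian">
  <field name="samples" type="int32"/>
  <field name="temperature" type="float64"/>
  <field name="flags" type="uint16"/>
</binaryFormat>
"""

TYPE_CODES = {"int32": "i", "uint16": "H", "float64": "d"}
BYTE_ORDERS = {"bigEndian": ">", "littleEndian": "<"}


def compile_description(xml_text: str):
    """Turn the XML description into a struct layout plus the field names."""
    root = ET.fromstring(xml_text)
    fields = root.findall("field")
    # An explicit byte-order prefix makes struct use standard sizes and no padding,
    # so the layout is the same on every machine.
    fmt = BYTE_ORDERS[root.get("byteOrder")] + "".join(
        TYPE_CODES[f.get("type")] for f in fields
    )
    return struct.Struct(fmt), [f.get("name") for f in fields]


def read_record(xml_text: str, data: bytes) -> dict:
    """Decode one record from raw bytes according to the description."""
    layout, names = compile_description(xml_text)
    return dict(zip(names, layout.unpack(data[:layout.size])))


# Bytes as they might have been written on a big-endian machine.
raw = struct.pack(">idH", 1024, 21.5, 3)
record = read_record(DESCRIPTION, raw)
print(record)
```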
simulations
Mammography
Mammograms have different appearances, depending on image settings and acquisition systems
Standard Mammo Format
Temporal mammography
Computer Aided Detection
3D View
A prototype of a national database of mammographic images in support of the UK breast screening programme
[Diagram labels: sites UCL, KCL, UED, CHU; DB2 Content Manager (x4); DB2 Federation; OGSA-DAI; Database; Files; Core Services (x4); Data Load (x4); Training App (x4); Core & Training API (x4); Training Services; Core API; Training API; Training Application]
The BRIDGES Project
Biomedical Research Informatics Delivered by Grid Enabled Services
NeSC (Edinburgh and Glasgow) and IBM
www.brc.dcs.gla.ac.uk/projects/bridges
Supporting project for CFG project
• Generating data on hypertension
• Rat, Mouse, Human genome databases
Variety of tools used
• BLAST, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
• Microarray data, genome DBs, project partner research data, medical records, …
Aim is integrated infrastructure supporting
• Data federation
• Security
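Data federation, in the sense of answering one query over several independently held sources, can be illustrated with a toy example. This sketch uses SQLite's `ATTACH DATABASE` purely as a stand-in and is not how BRIDGES was implemented; the database names, tables and values are invented placeholders.

```python
import sqlite3

# One connection acts as the hub; two separate in-memory databases are attached to it.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS public_db")    # stands in for a curated source
hub.execute("ATTACH DATABASE ':memory:' AS private_db")   # stands in for a partner's data

hub.execute("CREATE TABLE public_db.genes (gene_id TEXT PRIMARY KEY, species TEXT)")
hub.execute("CREATE TABLE private_db.expression (gene_id TEXT, level REAL)")

# Placeholder rows, invented for the example.
hub.executemany(
    "INSERT INTO public_db.genes VALUES (?, ?)",
    [("G1", "rat"), ("G2", "mouse"), ("G3", "human")],
)
hub.executemany(
    "INSERT INTO private_db.expression VALUES (?, ?)",
    [("G1", 2.5), ("G3", 0.7)],
)

# A single query spanning both sources, joined on the shared identifier.
rows = hub.execute(
    """
    SELECT g.gene_id, g.species, e.level
    FROM public_db.genes AS g
    JOIN private_db.expression AS e ON e.gene_id = g.gene_id
    ORDER BY g.gene_id
    """
).fetchall()
print(rows)
```

A real federation layer would also have to enforce the security requirement listed on the slide, for example by checking which users may see each private source; that is left out here.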
BRIDGES
[Diagram labels: CFG Virtual Organisation sites Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands; Private data (x6); Publicly Curated Data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …; DATA HUB; Synteny Grid Service; blast; VO Authorisation; Information Integrator; OGSA-DAI]
INWA Project
Innovation Node Western Australia
Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
Involved 10 partners (6 UK + 4 Australia)
Aim
• Data mine commercially sensitive data
• Security an absolute MUST
• Employ Grid technologies
• Need access to data and computational resources
OGSA-DAI
• Access data resources
SunDCG's TOG (Transfer-queue Over Globus)
• Handle job submission to analyse micro array data
[Diagram labels: INWA; user@australia, user@edinburgh; Curtin, Australia; EPCC, UK; Grid Engine (x2); Bank, Telco (x2); OGSA-DAI (x4); TOG (x2); Data Browser (x2); Telco data, Bank data, Australian property, UK Property]
Further Information on OGSA-DAI
The OGSA-DAI Project Site:
• http://www.ogsadai.org.uk
The DAIS-WG site:
• http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing list
• users@ogsadai.org.uk
• General discussion on grid DAI matters
Formal support for OGSA-DAI releases
• http://www.ogsadai.org.uk/support
• support@ogsadai.org.uk
OGSA-DAI training courses