1 overall architectural design of the earth system grid

Post on 21-Jan-2016


1

Overall Architectural Design of the Earth System Grid

2

Architecture of the Production Earth System Grid

Centralized portal provides all user interactions, most system services

Data may be co-located with gateway or at remote sites

Data nodes respond to gateway requests for specific files

Users access gateway via web browser or Data Mover Lite (DML)

Users do not talk to data nodes directly
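The brokering pattern on this slide can be sketched in a few lines. This is a minimal illustration, not the ESG implementation: the `DataNode` and `Gateway` classes, their methods, and the file paths are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Hypothetical data node: stores files and answers gateway requests only."""
    name: str
    files: Dict[str, bytes] = field(default_factory=dict)

    def fetch(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class Gateway:
    """Hypothetical gateway: the single entry point that users interact with."""
    nodes: List[DataNode] = field(default_factory=list)

    def locate(self, path: str) -> DataNode:
        # Find the node (co-located or remote) that holds the requested file.
        for node in self.nodes:
            if path in node.files:
                return node
        raise FileNotFoundError(path)

    def request_file(self, path: str) -> bytes:
        # The user never contacts a node; the gateway relays the request.
        return self.locate(path).fetch(path)


# Usage with made-up holdings
ncar = DataNode("NCAR", {"ccsm/run1.nc": b"abc"})
lanl = DataNode("LANL", {"pop/run2.nc": b"xyz"})
gateway = Gateway([ncar, lanl])
payload = gateway.request_file("pop/run2.nc")
```

The user-facing call is only `Gateway.request_file`; which node serves the file stays an internal detail of the gateway.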

3

Technologies Underlying the Production ESG

Climate Data
• Metadata Catalog
• NcML (metadata schema)
• OPeNDAP-g (aggregation and subsetting)

Data Management
• Storage Resource Manager

Data Transfer
• Globus Security Infrastructure
• Data Mover Lite
• GridFTP
• Monitoring and Discovery Services
• Replica Location Service

Security
• Access Control
• MyProxy
• User Registration

4

Current Production Deployments

Holdings: CCSM, POP, CISM, CLM, NARCCAP, PCM
• Gateway: NCAR
• Data nodes: LANL, NCAR, NERSC, ORNL

Holdings: CMIP3 (IPCC AR4)
• Gateway: LLNL
• Data node: LLNL

Holdings: C-LAMP
• Gateway: ORNL
• Data node: ORNL

5

Key Requirements for Next Generation ESG

CMIP5 drives most requirements for the scale and global reach of ESG

We are expecting…
• 30+ contributing sites in 17+ countries
• Data volumes: 600+ TB “core”, 6+ PB total
• Collect and replicate core to ~4 sites

Surveyed initial testbed sites for details of setup, plans, expectations

Keep data (close to) where it is generated
• Server-side analysis and processing to minimize delivered data volumes
• Deliver to users from archive/processing location, not gateway

Give contributors significant autonomy to ease participation
• ESG team does not own or operate all (most) nodes
• Flexibility on hardware, personnel commitments
• Nodes can come & go without taking down ESG

Interface with local data, identity management where appropriate

Support topical & institutional gateways as needed

6

The Next-Generation ESG: A Federated Global Enterprise

Independent gateways federating metadata, users

Any user can discover any data from any gateway

Each data node publishes to one or more gateways

Specific data collections are managed through specific gateways
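A small sketch of what "any user can discover any data from any gateway" implies. Everything here is an assumption made for illustration: the `Gateway` class, the `federate` helper, and the dataset and node identifiers are hypothetical, and the sketch queries peer catalogs at search time, whereas a real federation might harvest and cache metadata instead.

```python
class Gateway:
    """Hypothetical gateway holding a local catalog plus links to its peers."""

    def __init__(self, name):
        self.name = name
        self.local_catalog = {}  # dataset id -> name of the publishing data node
        self.peers = []

    def publish(self, dataset_id, node_name):
        # A data node publishes a dataset to this gateway.
        self.local_catalog[dataset_id] = node_name

    def search(self, term):
        # Search the local catalog first, then every federated peer.
        hits = {}
        for gw in [self, *self.peers]:
            for dataset_id, node_name in gw.local_catalog.items():
                if term in dataset_id:
                    hits.setdefault(dataset_id, (gw.name, node_name))
        return hits


def federate(gateways):
    """Make every gateway aware of every other gateway."""
    for gw in gateways:
        gw.peers = [other for other in gateways if other is not gw]


# Usage with made-up dataset and node identifiers
pcmdi = Gateway("PCMDI")
ncar = Gateway("NCAR")
federate([pcmdi, ncar])
pcmdi.publish("cmip5.tas", "node-a")
ncar.publish("ccsm.tas", "node-b")
results = ncar.search("tas")
```

Each hit records the managing gateway alongside the data node, which matches the idea that a collection is managed through a specific gateway even though it is discoverable from all of them.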

7

Gateways and Data Nodes

Federated architecture
• Federation is a virtual trust relationship among independent management domains that have their own set of services
• Users authenticate once to gain access to data across multiple systems and organizations

Gateways
• Where data is discovered, requested
• Portals, search capability, distributed metadata, registration and user management
• May be customized to an institution’s requirements, topical focus
• More complex architecture than nodes, fewer sites
• Initially PCMDI, NCAR, ORNL, eventually GFDL

Nodes
• Where data is stored and published
• Data may be on disk or tertiary mass store
• Each data node can publish to any gateway (facilitates topical gateways)
• Data reduction/analysis
• Less complex architecture, including possible minimalist deployment w/o services
• Anticipate ~20 data nodes for CMIP5, many others have expressed interest

Sites
• A site can be both a gateway and a data node

8

Next-Generation ESG Architectural Details

New architectural features
• “Global services” layer
• Gateway adds data products UI, metadata harvesting
• Data node adds subsetting and analysis capabilities

More details about next-gen software stack throughout the day…

9

OpenID for Accessing Federated Data Systems

ESG-CET invested a lot of effort in examining security/identity approaches

Relatively open data access for thousands of users around the world

More in common with social networking than high-value computational environments

OpenID provides a user-centric federated identity
• Estimates are upwards of a billion OpenIDs, 40+K sites accepting
• IBM, Microsoft, Google, Verisign, PayPal, Facebook as corporate board members (BBC, Orange, SourceForge adoption)

10

Federated Registration and Authentication

All users must register their credentials with ESG
• OpenID identities might be managed outside of ESG

Data “owners” manage authorizations to access their collections
• Groups may have special requirements

User searching for data is redirected to authenticate or apply for authorizations as needed
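The redirect logic described on this slide can be sketched as a three-way decision. This is only an illustration under stated assumptions: the `Outcome` enum, the `check_access` function, the group name, and the example OpenID URLs are hypothetical, and no real OpenID or ESG API is being called.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Outcome(Enum):
    AUTHENTICATE = "redirect to authenticate"
    APPLY = "redirect to apply for authorization"
    GRANTED = "access granted"


def check_access(
    openid: Optional[str],
    group_members: Dict[str, Set[str]],
    required_group: str,
) -> Outcome:
    """Decide what happens when a user requests data from a protected collection.

    `openid` is None when the user has no authenticated session (assumption).
    `group_members` maps a group name to the identities its data owner approved.
    """
    if openid is None:
        return Outcome.AUTHENTICATE
    if openid not in group_members.get(required_group, set()):
        return Outcome.APPLY
    return Outcome.GRANTED


# Usage with a made-up group and made-up identities
groups = {"cmip5-research": {"https://example.org/alice"}}
anonymous = check_access(None, groups, "cmip5-research")
outsider = check_access("https://example.org/bob", groups, "cmip5-research")
member = check_access("https://example.org/alice", groups, "cmip5-research")
```

Keeping group membership in the hands of the data owner, separate from identity, is what lets the OpenID itself be managed outside ESG.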
