NAREGI Middleware Beta 1 and Beyond
Satoshi Matsuoka
Professor, Global Scientific Information and Computing Center
Deputy Director, NAREGI Project
Tokyo Institute of Technology / NII
http://www.naregi.org
The Titech TSUBAME Production Supercomputing Cluster, Spring 2006
• Sun Galaxy 4 (Opteron dual-core, 8-way): 10,480 cores / 655 nodes, 50.4 TeraFlops
• ClearSpeed CSX600 SIMD accelerator: 360 boards, 35 TeraFlops (current)
• Storage: 1 Petabyte (Sun "Thumper", 48 disks x 500 GB per unit) + 0.1 Petabyte (NEC iStore); Lustre FS, NFS (v4?)
• NEC SX-8 small vector nodes (under plan)
• Unified InfiniBand network: Voltaire ISR9288, 10 Gbps x2 (xDDR), ~700 ports; 10 Gbps+ external network
• OS: Linux (SuSE 9, 10); NAREGI Grid MW
• 7th on the June 2006 Top500, 38.18 TFlops
• ~80+ racks, 350 m2 floor area, 1.2 MW (peak)
Titech Supercomputing Grid 2006
• ~13,000 CPUs, 90 TeraFlops, ~26 TeraBytes memory, ~1.1 Petabytes disk
• CPU cores: x86: TSUBAME (~10,600), Campus Grid Cluster (~1,000), COE-LKR cluster (~260), WinCCS (~300) + ClearSpeed CSX600 (720 chips)
(Map: the Suzukakedai and Ookayama campuses, 35 km apart and linked at 10 Gbps, with a 1.2 km span on campus; the planned Mathematical/Computational Science Center and Computational Engineering Center; the Campus Grid Cluster, the COE-LKR (knowledge) cluster, TSUBAME, and WinCCS.)
University Computer Centers (excl. National Labs), circa Spring 2006
~60 SC centers in Japan; 10 Gbps SuperSINET interconnecting the centers
• Hokkaido University, Information Initiative Center: HITACHI SR11000, 5.6 Teraflops
• Tohoku University, Information Synergy Center: NEC SX-7, NEC TX7/AzusA
• University of Tokyo, Information Technology Center: HITACHI SR8000, HITACHI SR11000 (6 Teraflops), others (in institutes)
• University of Tsukuba: FUJITSU VPP5000, CP-PACS 2048 (SR8000 proto)
• Tokyo Inst. Technology, Global Scientific Information and Computing Center: 2006 NEC/Sun TSUBAME, 85 Teraflops
• National Inst. of Informatics: SuperSINET/NAREGI testbed, 17 Teraflops
• Nagoya University, Information Technology Center: FUJITSU PrimePower2500, 11 Teraflops
• Kyoto University, Academic Center for Computing and Media Studies: FUJITSU PrimePower2500, 10 Teraflops
• Osaka University, CyberMedia Center: NEC SX-5/128M8, HP Exemplar V2500/N (1.2 Teraflops)
• Kyushu University, Computing and Communications Center: FUJITSU VPP5000/64, IBM Power5 p595 (5 Teraflops)
Scaling Towards Petaflops…
(Roadmap, 2002-2012, 1 TF to 10 PF; goal: a 10 Petaflop center by 2011)
• Earth Simulator: 40 TF (2002); Titech Campus Grid: 1.3 TF
• BlueGene/L: 360 TF (2005); KEK: 59 TF (BG/L + SR11000)
• Titech Supercomputing Campus Grid (incl. TSUBAME): ~90 TF (2006)
• Korean machine: >100 TF (2006~7); Chinese national machine: >100 TF (2007~8)
• US petascale (2007~8); US HPCS (2010); US 10P (2011~12?)
• TSUBAME upgrade: >200 TF (2008-2H)
• Next-gen Titech "PetaGrid": 1 PF (2010); i.e. interim 200 TeraFlops @ 2008, "petascale" @ 2010
• "Keisoku": >10 PF (2011)
• The norm for a typical Japanese center? HPC software is the key!
Nano-Science: Coupled Simulations on the Grid as the Sole Future for True Scalability
• The only way to achieve true scalability: multi-physics coupling between continuum and quanta (10^-6 to 10^-9 m)
• Material physics (infinite systems): fluid dynamics, statistical physics, condensed matter theory, … (limit of idealization)
• Molecular science: quantum chemistry, the Molecular Orbital method, Molecular Dynamics, … (limit of computing capability)
LifeCycle of Grid Apps and Infrastructure
• Old HPC environment: decoupled resources, limited users, special software, …; meta-computing and high-throughput computing coordinate decoupled resources across distributed servers
• Grid environment: multi-physics simulation with components and data from different groups within a VO, composed in real time; workflows and coupled apps per user; many VO users
• VO application developers & managers work through the Application Contents Service, high-level workflow (NAREGI WFML), the distributed grid information service, GridRPC/GridMPI user apps, GridVM, and the Super Scheduler
• Grid-wide data management service (GridFS, metadata, staging, etc.): place & register data on the Grid (Data 1 … Data n) and assign metadata to the data
NAREGI Software Stack (beta 1, 2006): a WS(RF)-based (OGSA) SW stack
• Computing resources and virtual organizations: NII, IMS, research organizations, major university computing centers
• WSRF (GT4 + Fujitsu WP1) + GT4 and other services
• SuperSINET
• Grid-enabled nano-applications (WP6)
• Grid PSE, Grid Programming (WP2: GridRPC, GridMPI), Grid Visualization, Grid Workflow (WFML (Unicore + WF)), and packaging (WP3)
• Super Scheduler, Distributed Information Service (CIM), GridVM (WP1)
• Data (WP4)
• Grid security and high-performance grid networking (WP5)
List of NAREGI "Standards" (beta 1 and beyond)
• GGF standards and pseudo-standard activities set/employed by NAREGI: OGSA CIM profile, AuthZ, DAIS, GFS (Grid Filesystems), Grid CP (GGF CAOPs), GridFTP, GridRPC API (as Ninf-G2/G4), JSDL, OGSA-BES, OGSA-ByteIO, OGSA-DAI, OGSA-EMS, OGSA-RSS, RUS, SRM (planned for beta 2), UR, WS-I RUS, ACS, CDDLM
• Other industry standards employed by NAREGI: ANSI/ISO SQL, DMTF CIM, IETF OCSP/XKMS, MPI 2.0, OASIS SAML 2.0, OASIS WS-Agreement, OASIS WS-BPEL, OASIS WSRF 2.0, OASIS XACML
• De facto standards / commonly used software platforms employed by NAREGI: Ganglia, Gfarm 1.1, Globus 4 GRAM, Globus 4 GSI, Globus 4 WSRF (also Fujitsu WSRF for C binding), IMPI (as GridMPI), Linux (RH8/9 etc.), Solaris (8/9/10), AIX, …, MyProxy, OpenMPI, Tomcat (and associated WS/XML standards), Unicore WF (as NAREGI WFML), VOMS
• Necessary for longevity and vendor buy-in; a metric of WP evaluation
• Implement "specs" early, even if nascent, if seemingly viable
Highlights of NAREGI Beta (May 2006, GGF17/GridWorld)
• Professionally developed and tested
• "Full" OGSA-EMS incarnation: a full C-based WSRF engine (Java -> Globus 4); OGSA-EMS/RSS WSRF components; GGF JSDL 1.0-extension job submission, authorization, etc.; support for more OSes (AIX, Solaris, etc.) and batch queues
• Sophisticated VO support for identity/security/monitoring/accounting (extensions of VOMS/MyProxy, WS-* adoption)
• WS application deployment support via GGF-ACS
• Comprehensive data management with a grid-wide FS
• Complex workflow (NAREGI-WFML) for various coupled simulations
• Overall stability/speed/functional improvements
• To be interoperable with EGEE, TeraGrid, etc. (beta 2)
• Release next week at GGF17, press conferences, etc.
Ninf-G: A Reference Implementation of the GGF GridRPC API
• Large-scale computing across supercomputers on the Grid: the user ① calls remote procedures (remote libraries on remote supercomputers, over the Internet) and ② is notified of the results
• What is GridRPC? A programming model using RPCs on a Grid; provides an easy and simple programming interface; the GridRPC API is published as a proposed recommendation (GFD-R.P 52)
• What is Ninf-G? A reference implementation of the standard GridRPC API, built on the Globus Toolkit; now in NMI Release 8 (the first non-US software in NMI)
• Easy three steps to make your program Grid-aware:
– Write an IDL file that specifies the interface of your library
– Compile it with an IDL compiler called ng_gen
– Modify your client program to use the GridRPC API
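The three-step model above yields the usual GridRPC call pattern: initialize a handle for a remote function, call it, receive the result. The sketch below is a toy Python illustration of that pattern only; every name is a hypothetical stand-in, not the real Ninf-G C API.

```python
# Toy illustration of the GridRPC programming model (GFD-R.P 52).
# All names are hypothetical Python stand-ins; the real Ninf-G API
# is C-based and reaches remote servers via the Globus Toolkit.

class FunctionHandle:
    """Binds a named remote library function on a chosen server."""
    def __init__(self, server, func_name, func):
        self.server = server      # where the remote library would live
        self.func_name = func_name
        self._func = func         # local stand-in for the remote code

    def call(self, *args):
        # Real GridRPC ships the arguments to the server and blocks
        # until the result is notified back to the client.
        return self._func(*args)

def matmul(a, b):
    """Stand-in for a remote numerical library routine."""
    n, m, k = len(a), len(b[0]), len(b)
    return [[sum(a[i][x] * b[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]

h = FunctionHandle("sc.example.org", "lib/matmul", matmul)
result = h.call([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[19, 22], [43, 50]]
```

The point of the pattern is that only the handle initialization names a server; the call site looks like an ordinary local function call.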
GridMPI
• MPI applications run on the Grid environment
• Metropolitan-area, high-bandwidth environment: 10 Gbps, 500 miles (less than 10 ms one-way latency): parallel computation
• Larger than metropolitan area: MPI-IO
• A single (monolithic) MPI application runs over the Grid environment, spanning the computing resources of site A and site B across a wide-area network
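The 10 Gbps / 10 ms envelope quoted above puts a large amount of data in flight on the link, which is one reason wide-area MPI needs careful buffering and flow control. A back-of-envelope check, using only the numbers from the slide:

```python
# Bandwidth-delay product for the GridMPI target environment:
# a 10 Gbps link with up to 10 ms one-way latency (20 ms RTT).
bandwidth_bps = 10e9          # 10 Gbps
one_way_latency_s = 0.010     # 10 ms
rtt_s = 2 * one_way_latency_s

bdp_bytes = bandwidth_bps * rtt_s / 8   # bits in flight -> bytes
print(f"{bdp_bytes / 1e6:.0f} MB can be in flight on the link")  # 25 MB
```

So a sender must keep roughly 25 MB outstanding to fill the pipe, far beyond default TCP window sizes of the era.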
Grid Application Environment (WP3)
• NAREGI Portal: portal GUI with Workflow GUI, Visualization GUI, Register UI, and Deployment UI; per-VO views (Bio VO, Nano VO)
• Gateway services: Grid PSE, Grid Workflow, Grid Visualization
• Grid PSE: deployment of applications on the Grid and support for execution of deployed applications, via the Deployment Service and Application Contents Service (compile, deploy, un-deploy) backed by an application repository (GGF-ACS)
• Grid Workflow: a workflow language (NAREGI-WFML) independent of specific grid middleware; GUI in task-flow representation; Workflow Service and JM I/F module emitting BPEL+JSDL over WSRF
• Grid Visualization: remote visualization of massive data distributed over the Grid; general grid services for visualization: CFD Visualization Service (CFD Visualizer), Molecular Visualization Service (Molecular Viewer), Parallel Visualization Service (Parallel Visualizer)
• Underlying/core grid services: Workflow Engine & Super Scheduler, file transfer (RFT), File/Execution Manager, Distributed Information Service, Grid File System, VOMS, MyProxy
WP-3: User-Level Grid Tools & PSE
The NAREGI SSS Architecture (2007/3)
• PETABUS (Peta Application Services Bus): application-specific services
• WESBUS (Workflow Execution Services Bus): JM (BPEL interpreter service), EPS, CSG, FTS-SC
• CESBUS (Co-allocation Execution Services Bus, a.k.a. BES+ bus): AGG-SC with RS (aggregate SCs)
• BESBUS (Basic Execution Services Bus): Globus WS-GRAM I/F (with reservation) over GridVM; UniGridS atomic services (with reservation): GRAM-SC, UniGridS-SC
• Underneath: grid resources and grid middleware
NAREGI beta 1 SSS Architecture: an Extended OGSA-EMS Incarnation
(Figure: the NAREGI-WP3 WorkFlowTool, PSE, and GVS hand NAREGI-WFML to the JM-Client (Submit / Status / Delete / Cancel). The NAREGI JM (SS) Java I/F module translates WFML to BPEL (WFML2BPEL, BPEL2WFST) and drives the NAREGI JM (BPEL engine), which invokes the EPS with BPEL (including JSDL) through CreateActivity(FromBPEL), GetActivityStatus, and RequestActivityStateChanges. The EPS consults the CSG and the IS (OGSA-DAI, CIM, PostgreSQL DB), issues MakeReservation / CancelReservation through CES interfaces, and invokes SCs with JSDL for co-allocation and file transfer (globus-url-copy, uber-ftp).)
Abbreviations: SS: Super Scheduler; JSDL: Job Submission Description Language; JM: Job Manager; EPS: Execution Planning Service; CSG: Candidate Set Generator; RS: Reservation Service; IS: Information Service; SC: Service Container; AGG-SC: Aggregate SC; GVM-SC: GridVM SC; FTS-SC: File Transfer Service SC; BES: Basic Execution Service I/F; CES: Co-allocation Execution Service I/F (BES+); CIM: Common Information Model; GNIS: Grid Network Information Service
(Figure, continued: the EPS selects resources from the JSDL (SelectResourceFromJSDL); the CSG generates candidate sets (GenerateCandidateSet) by generating SQL queries from the JSDL against the IS (is-query); the AGG-SC/RS drives per-site service containers (GRAM4-specific GVM-SCs) over CES, which submit to local schedulers (PBS, LoadLeveler) through WS-GRAM / GridVM using globusrun-ws and fork/exec; the FTS-SC performs file transfers via the Gfarm server; the GNIS answers GetGroupsOfNodes.)
A meta-computing scheduler is required to allocate and execute jobs on multiple sites simultaneously.
The Super Scheduler negotiates with local reservation services on job execution time and reserves resources that can execute the jobs simultaneously.
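The negotiation just described amounts to finding a time window that every local reservation service can honour at once. A minimal Python sketch of that idea; the slot lists and times are invented for illustration:

```python
# Pick the earliest window of `duration` hours that is free at every
# site simultaneously: the core idea of co-allocation via reservation.

def common_window(free_slots_per_site, duration):
    """free_slots_per_site: one list of (start, end) hours per site."""
    starts = sorted(s for slots in free_slots_per_site for s, _ in slots)
    for t in starts:  # candidate start times: any slot start at any site
        if all(any(s <= t and t + duration <= e for s, e in slots)
               for slots in free_slots_per_site):
            return (t, t + duration)
    return None  # no simultaneous window exists

site1 = [(14, 17), (20, 24)]   # free 14:00-17:00 and 20:00-24:00
site2 = [(15, 19)]             # free 15:00-19:00
print(common_window([site1, site2], 2))   # (15, 17)
```

A real Super Scheduler additionally has to commit the chosen window atomically at every site, or cancel the partial reservations, which is what the agreement instances in the figure below this slide are for.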
3, 4: Co-allocation and Reservation
(Figure: the Execution Planning Services with meta-scheduling receive an abstract JSDL requesting 10 CPUs ① and pass it to the Candidate Set Generator ②, which queries the Distributed Information Service and returns candidates: Local RS 1 EPR (8 CPUs) and Local RS 2 EPR (6 CPUs). The Super Scheduler creates an abstract agreement instance and negotiates concrete reservations with the local Reservation Services and Service Containers of each cluster (site) via MakeReservation / CancelReservation, e.g. concrete JSDL (8) and concrete JSDL (2), both at 15:00-18:00, creating an agreement instance per reservation ③-⑪. Local RS #: Local Reservation Service #.)
NAREGI Info Service (beta) Architecture
(Figure: clients (e.g. a resource broker) use a client library / Java API to query RDBs on information service nodes, which aggregate hierarchically (hierarchical filtered aggregation, parallel query) from cell domain information services on nodes A, B, C. On each node, a light-weight CIMOM service with CIM providers (OS, processor, file system, job queue, …) feeds an aggregator service; chargeable services such as GridVM insert usage records through the Resource Usage Service (RUS::insertURs); ACLs guard the cell domain services; Ganglia supplies performance data; a data service and viewer serve users and administrators.)
• The CIMOM service classifies information according to a CIM-based schema.
• The information is aggregated and accumulated in RDBs hierarchically.
• The client library utilizes the OGSA-DAI client toolkit.
• Accounting information is accessed through the RUS.
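The hierarchical aggregation above can be pictured as each cell domain keeping full local detail and forwarding only a filtered projection upward. A small sketch; the record layout and field names are invented:

```python
# Hierarchical filtered aggregation: cell-domain services hold full
# detail; the information-service node receives only selected fields.

def filter_for_upper_tier(records, keep_fields):
    """Forward only the attributes the upper tier needs."""
    return [{k: r[k] for k in keep_fields} for r in records]

cell_a = [{"node": "a01", "load": 0.3, "os": "Linux", "temp_c": 41},
          {"node": "a02", "load": 0.9, "os": "Linux", "temp_c": 55}]
cell_b = [{"node": "b01", "load": 0.1, "os": "AIX",   "temp_c": 38}]

upper_tier = (filter_for_upper_tier(cell_a, ["node", "load", "os"]) +
              filter_for_upper_tier(cell_b, ["node", "load", "os"]))
print(upper_tier[0])   # {'node': 'a01', 'load': 0.3, 'os': 'Linux'}
```

Filtering at each tier keeps upper-level databases small while leaving full detail queryable at the cell domain.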
NAREGI IS: Standards Employed in the Architecture
(Figure: the same architecture annotated with the standards used: GT4.0.1 distributed information service; clients (OGSA-RSS, OGSA-BES, applications) using the OGSA-DAI client toolkit and a Java API; OGSA-DAI WSRF 2.1 on Tomcat 5.0.28; CIM schema 2.10 with extensions; CIM/XML (per the CIM spec) between the light-weight CIMOM service (with CIM providers for OS, processor, file system, job queue) and the aggregator service; GGF/UR usage records inserted through WS-I RUS (RUS::insertURs) by GridVM as a chargeable service; ACLs on the cell domain information services; Ganglia for performance data.)
GridVM Features
• Platform independence as an OGSA-EMS SC: a WSRF OGSA-EMS Service Container interface for heterogeneous platforms and local schedulers; "extends" Globus 4 WS-GRAM; job submission using JSDL; job accounting using UR/RUS; a CIM provider for resource information
• Meta-computing and coupled applications: advance reservation for co-allocation
• Site autonomy: WS-Agreement-based job execution (beta 2); XACML-based access control of resource usage
• Virtual Organization (VO) management: access control and job accounting based on VOs (VOMS & GGF-UR)
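Job submission to GridVM is JSDL-based. As a rough illustration of what such a document looks like, a minimal JSDL 1.0 JobDefinition can be assembled with Python's ElementTree; the namespaces follow GFD.56, while the executable and argument values are invented:

```python
# A minimal JSDL 1.0 JobDefinition of the general shape a JSDL-based
# submission interface consumes. Namespaces are those of GFD.56; the
# executable path and argument are made-up example values.
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)

job = ET.Element(f"{{{JSDL}}}JobDefinition")
desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
app = ET.SubElement(desc, f"{{{JSDL}}}Application")
posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
ET.SubElement(posix, f"{{{POSIX}}}Executable").text = "/usr/local/bin/nano_sim"
ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "--steps=1000"

print(ET.tostring(job, encoding="unicode"))
```

NAREGI extends plain JSDL with its own elements (e.g. for reservation), which would appear as additional namespaced children of the same structure.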
NAREGI GridVM (beta) Architecture
• A virtual execution environment on each site: virtualization of heterogeneous resources; resource and job management services with a unified I/F
(Figure: the Super Scheduler and Information Service reach each site's GridVM Scheduler through GRAM4 / WSRF I/Fs; each GridVM Scheduler drives a local scheduler (e.g. AIX/LoadLeveler, Linux/PBSPro) and a GridVM Engine, with GridMPI linking the sites. Per-site policy governs advance reservation, monitoring, control, accounting, and resource information; jobs execute in a sandbox.)
NAREGI GridVM: Standards Employed in the Architecture
(Figure: the same architecture annotated with the standards used: an XACML-like access control policy per site; a CIM-based resource information provider; GT4 GRAM integration and WSRF-based extension services; UR/RUS-based job accounting; job submission based on JSDL with NAREGI extensions.)
GT4 GRAM-GridVM Integration
(Figure: the SS submits via globusrun with RSL+JSDL' and delegates credentials; the GRAM services (Delegation, Scheduler Event Generator, GRAM Adapter) hand the job to the GridVM scheduler through the GridVMJobFactory / GridVMJob extension service via SUDO; RFT handles file-transfer requests; the local scheduler (PBS Pro, LoadLeveler, …) runs jobs under the GridVM Engine on the site.)
• Basic job management plus authentication and authorization
• Integrated as an extension module to GT4 GRAM, aiming to make both sets of functionality available
Next Steps for WP1 – Beta 2
• Stability, robustness, ease of installation
• Standard-setting core OGSA-EMS: OGSA-RSS, OGSA-BES/ESI, etc.
• More supported platforms (VM): SX series, Solaris 8-10, etc.; more batch queues: NQS, N1GE, Condor, Torque
• "Orthogonalization" of the SS, VM, IS, and WSRF components:
– Better, more orthogonal WSRF APIs; minimize sharing of state (e.g., reservation APIs, event-based notification)
– Mix-and-match of multiple SS/VM/IS/external components, with many benefits: robustness, better and more realistic center VO support, better interoperability with external grid MW stacks, e.g. Condor-C
VO and Resources in Beta 2
• Decoupling of WP1 components for pragmatic VO deployment
(Figure: research organizations RO1, RO2, and RO3 each host resources (A.RO1 … N.RO1, etc.) running GridVM and an IS; each resource carries a policy listing the VOs it accepts, e.g. VO-RO1, VO-APL1, VO-APL2. Each VO (VO-RO1, VO-APL1, VO-RO2, VO-APL2) runs its own IS and SS, which clients contact; a resource serves exactly the VOs named in its policy.)
NAREGI Data Grid beta 1 Architecture (WP4)
• Grid Workflow: jobs 1 … n import data (Data 1 … Data n) into the workflow
• Grid-wide Data Sharing Service: Data Access Management; Metadata Management (meta-data per data item); Data Resource Management
• Data Grid components:
– Import data into the workflow
– Place & register data on the Grid
– Assign metadata to data
– Store data into distributed file nodes (currently Gfarm v1.x as the grid-wide file system)
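The component steps above (place & register, assign metadata, import by query) can be sketched in a few lines; all names, paths, and metadata keys here are invented for illustration:

```python
# Sketch of the WP4 flow: place data on the grid-wide file system,
# register it, and attach metadata so that workflow jobs can later
# import their inputs by metadata query rather than by location.

metadata_db = {}   # stand-in for the metadata-management service
grid_fs = {}       # stand-in for the grid-wide file system (Gfarm)

def place_and_register(logical_name, content, metadata):
    grid_fs[logical_name] = content        # store into file nodes
    metadata_db[logical_name] = metadata   # assign metadata

def import_into_workflow(query):
    """A job locates inputs by metadata, not by physical location."""
    return [name for name, md in metadata_db.items()
            if all(md.get(k) == v for k, v in query.items())]

place_and_register("/grid/vo1/data1", b"...", {"app": "nano", "step": 1})
place_and_register("/grid/vo1/data2", b"...", {"app": "bio", "step": 1})
print(import_into_workflow({"app": "nano"}))   # ['/grid/vo1/data1']
```

Separating the metadata catalogue from the file store is what lets jobs on any site find their inputs without knowing which file node holds them.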
NAREGI WP4: Standards Employed in the Architecture
(Figure: the workflow (NAREGI WFML => BPEL+JSDL) feeds the Super Scheduler (SS, OGSA-RSS); data staging runs through the OGSA-RSS FTS SC; data is placed on the Grid over GridFTP, with GGF-SRM planned for beta 2; Gfarm 1.2 PL4 serves as the grid file system spanning file-system and computational nodes; metadata construction and data-resource management are backed by data-specific metadata DBs and a data resource information DB, via OGSA-DAI WSRF 2.0, Globus Toolkit 4.0.1, Tomcat 5.0.28, and PostgreSQL 8.0.)
NAREGI beta 1 Security Architecture (WP5)
(Figure: the client environment holds the user certificate and private key. The User Management Server (UMS) and the portal (WFT, PSE, GVS, SS client) obtain VOMS proxy certificates via VOMS and MyProxy; the Super Scheduler retrieves proxy certificates from MyProxy+; VOMS proxy certificates flow to the GridVMs; the NAREGI CA anchors trust; the Data Grid's grid file system (AIST Gfarm) spans the disk nodes.)
NAREGI beta 1 Security Architecture: WP5 and the Standards
(Figure: the user logs in to the client environment / portal (WFT, PSE, GVS, SS client) and requests/gets a certificate from the certificate management server (NAREGI CA; CP/CPS; GRID CP (GGF CAOPs); audit criteria: a subset of the WebTrust Programs for CA). voms-myproxy-init gets VOMS attributes from VOMS and puts a proxy certificate with VO into MyProxy; the Super Scheduler retrieves it (ssh + voms-myproxy-init), queries the Information Service (requirements + VO info) for resources in the VO, and submits signed job descriptions via globusrun-ws to the GridVM services (incl. GSI); GridVM publishes local resource info, including VO info, to the IS; the VO/certificate management service ties together VO info, execution info, and resource info.)
VO and User Management Service
• Adoption of VOMS for VO management: proxy certificates with VO attributes are used for interoperability with EGEE; GridVM is used instead of LCAS/LCMAPS
• Integration of the MyProxy and VOMS servers into NAREGI: with the UMS (User Management Server) to realize a one-stop service at the NAREGI Grid Portal; a gLite implementation at the UMS connects to the VOMS server
• MyProxy+ for the Super Scheduler: a special-purpose certificate repository to realize safe delegation between the NAREGI Grid Portal and the Super Scheduler; the Super Scheduler receives jobs with the user's signature, just like UNICORE, and submits them through the GSI interface
Computational Resource Allocation Based on VO
• Resource configuration: pbg1042 (4 CPUs), png2041 (4 CPUs), png2040 (2 CPUs), pbg2039 (8 CPUs)
• Workflows: VO1 and VO2 each run jobs requiring between 1 and 8 CPUs
• Different resource mapping for different VOs
Local-File Access Control (GridVM)
• Provides VO-based access control functionality that does not use gridmap files.
• Controls file access based on a policy specified by a tuple of Subject, Resource, and Action.
• Subject is a grid user ID or a VO name.
(Figure: with the policy "Permit: Subject=X, Resource=R, Action=read,write; Deny: Subject=Y, Resource=R, Action=read", GridVM access control lets grid user X, but not grid user Y, reach resource R through the local account, based on each user's DN.)
Structure of Local-File Access Control Policy
+TargetUnit
<GridVMPolicyConfig>
<AccessControl>
1
1
<AccessProtection>+Default+RuleCombiningAlgorithm
1
1
<AccessRule>+Effect
1
1..*
<Resources>
1
1<AppliedTo>
1
0..1
<Actions>
1
0..1
<Subjects>
1
0..1
<Resource>
1
1..*<Action>
1
0..*
<Subject>
1
0..*
WhoWho
What Resouce
What Resouce
Access Type
Access Type
ControlControl
user / VO
read / write / execute
permit / deny
file / directory
Policy Example (1)

<gvmcf:AccessProtection gvmac:Default="Permit" gvmac:RuleCombiningAlgorithm="Permit-overrides">
  <!-- Access Rule 1: for all users -->
  <gvmcf:AccessRule gvmac:Effect="Deny">
    <gvmcf:AppliedTo> <gvmac:Subjects> … </gvmac:Subjects> </gvmcf:AppliedTo>
    <gvmac:Resources>
      <gvmac:Resource>/etc/passwd</gvmac:Resource>
    </gvmac:Resources>
    <gvmac:Actions> … </gvmac:Actions>
  </gvmcf:AccessRule>
  <!-- Access Rule 2: for a specific user -->
  <gvmcf:AccessRule gvmac:Effect="Permit">
    <gvmcf:AppliedTo gvmcf:TargetUnit="user">
      <gvmcf:Subjects> <gvmcf:Subject>User1</gvmcf:Subject> </gvmcf:Subjects>
    </gvmcf:AppliedTo>
    <gvmac:Resources>
      <gvmac:Resource>/etc/passwd</gvmac:Resource>
    </gvmac:Resources>
    <gvmac:Actions> <gvmac:Action>read</gvmac:Action> </gvmac:Actions>
  </gvmcf:AccessRule>
</gvmcf:AccessProtection>

The Default attribute and the rule-combining algorithm ("Permit-overrides") determine how these rules are applied.
Policy Example (2)

<gvmcf:AccessRule gvmac:Effect="Permit">
  <gvmcf:AppliedTo gvmcf:TargetUnit="vo">
    <gvmcf:Subjects> <gvmcf:Subject>bio</gvmcf:Subject> </gvmcf:Subjects>
  </gvmcf:AppliedTo>
  <gvmac:Resources>
    <gvmac:Resource>/opt/bio/bin</gvmac:Resource>
    <gvmac:Resource>./apps</gvmac:Resource>
  </gvmac:Resources>
  <gvmac:Actions>
    <gvmac:Action>read</gvmac:Action>
    <gvmac:Action>execute</gvmac:Action>
  </gvmac:Actions>
</gvmcf:AccessRule>

Here the subject is a VO name ("bio"), and the resources name a directory and a relative path.
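A toy evaluator shows how such rules combine under a default decision and a rule-combining algorithm. The rule data below mirrors the /etc/passwd example above, heavily simplified; this is an illustrative sketch, not the GridVM implementation:

```python
# Toy evaluator for GridVM-style access rules. A rule is a tuple
# (effect, subjects, resources, actions); empty subjects = all users.

def evaluate(rules, default, combining, subject, resource, action):
    effects = [eff for eff, subs, ress, acts in rules
               if (not subs or subject in subs)
               and resource in ress and action in acts]
    if not effects:
        return default                      # no rule matched
    if combining == "Permit-overrides":
        return "Permit" if "Permit" in effects else "Deny"
    return "Deny" if "Deny" in effects else "Permit"

rules = [
    ("Deny",   [],        ["/etc/passwd"], ["read", "write"]),  # all users
    ("Permit", ["User1"], ["/etc/passwd"], ["read"]),           # one user
]
print(evaluate(rules, "Permit", "Permit-overrides",
               "User1", "/etc/passwd", "read"))   # Permit
print(evaluate(rules, "Permit", "Permit-overrides",
               "User2", "/etc/passwd", "read"))   # Deny
```

With Permit-overrides, the specific Permit rule for User1 wins over the blanket Deny; every other user falls through to the Deny rule.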
VO-based Resource Mapping in the Global File System (2)
• The next release of Gfarm (version 2.0) will have access-control functionality.
• We will extend the Gfarm metadata server for data-resource mapping based on VO.
(Figure: clients contact the Gfarm metadata server, which maps file accesses onto the file servers belonging to VO1 or VO2.)
Current Issues and the Future Plan
• Current issues on VO management:
– VOMS platform: gLite runs on GT2 while the NAREGI middleware runs on GT4
– Authorization control on the resource side: new functions for resource control need to be implemented on GridVM, such as Web services, reservation, etc.
– Proxy certificate renewal: a new mechanism needs to be invented
• Future plans:
– Cooperation with GGF security-area members to realize interoperability with other grid projects
– Proposal of a new VO management methodology and a trial reference implementation
NAREGI Application Mediator (WP6) for Coupled Applications
(Figure: a workflow in the NAREGI WFT, with the Information Service and Super Scheduler, runs co-allocated jobs 1 … n on GridVMs: instances of simulation A, Mediator A, Mediator B, and simulation B, coupled via MPI.)
• Supports data exchange between coupled simulations
• Data transformation management: semantic transformation libraries for different simulations; coupled accelerator (OGSA-DAI WSRF 2.0, SQL, JNI)
• Data transfer management: synchronized file transfer; multiple protocols: GridFTP / MPI (Globus Toolkit 4.0.1, GridFTP API)
• SBC-XML (SBC: storage-based communication): global job ID, allocated nodes, transfer protocol, etc. (Data 1, Data 2, Data 3)
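The semantic-transformation role of the Mediator can be sketched as a conversion step between the data representations of the two coupled codes; the field names and units below are invented for illustration:

```python
# Sketch of the Mediator's semantic transformation in a coupled
# simulation: convert records between the representations of two
# codes before forwarding them over MPI/GridFTP.

def mediator_transform(record):
    """Sim A emits nanometres; Sim B expects Angstroms (1 nm = 10 A)."""
    return {"x_angstrom": record["x_nm"] * 10.0,
            "charge": record["charge"]}

sim_a_output = [{"x_nm": 0.25, "charge": -1}, {"x_nm": 0.5, "charge": 1}]
sim_b_input = [mediator_transform(r) for r in sim_a_output]
print(sim_b_input[0])   # {'x_angstrom': 2.5, 'charge': -1}
```

In the real system this transformation lives in per-pairing libraries, so neither simulation code needs to know the other's data model.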
NAREGI Beta on "VM Grid": Create a "Grid-on-Demand" Environment
• Uses Xen and Globus Workspaces to build a vanilla personal virtual grid/cluster from our Titech lab's research results, with dynamic deployment of the NAREGI beta image
"VM Grid": Prototype "S"
• http://omiij-portal.soum.co.jp:33980/gridforeveryone.php
• Request a number of virtual grid nodes; fill in the necessary info in the form; a confirmation page appears; follow the instructions
• SSH login to the NMI stack (GT2 + Condor) + selected NAREGI beta MW (Ninf-G, etc.)
• Entire beta installation in the works
• Other "Instant Grid" research in the works in the lab
From Interoperation to Interoperability
GGF16 "Grid Interoperations Now"
Charlie Catlett, Director, NSF TeraGrid (on vacation)
Satoshi Matsuoka, Sub-Project Director, NAREGI Project, Tokyo Institute of Technology / NII
Interoperation Activities
• The GGF GIN (Grid Interoperations Now) effort: real interoperation between major grid projects; four interoperation areas identified: security, data management, information service, job submission (not scheduling)
• EGEE/gLite - NAREGI interoperation: based on the four GIN areas; several discussions, including a 3-day meeting at CERN in mid-March and email exchanges; updates at GGF17 Tokyo next week; some details in my talk tomorrow
The Ideal World: Ubiquitous VO & User Management for International e-Science
(Figure: regional grid infrastructural efforts (Europe: EGEE, UK e-Science, …; US: TeraGrid, OSG; Japan: NII CyberScience (w/NAREGI), …; other Asian efforts: GFK, China Grid, etc.) with collaborative talks on PMA, etc., and cross-cutting application VOs: HEP Grid VO, NEES-ED Grid VO, Astro IVO.)
• Standardization and commonality in software platforms will realize this: different software stacks, but interoperable
The Reality: Convergence/Divergence of Project Forces
(original slide by Stephen Pickles, edited by Satoshi Matsuoka)
(Figure: a map of projects and their middleware allegiances: NGS (UK), TeraGrid (US), OSG (US), EGEE (EU), LCG (EU) / GridPP (UK), OMII (UK); DEISA (EU) shares common staff & procedures and common users; Globus (US) around GT4 WSRF (OGSA?); gLite / GT2; own WSRF & OGSA stacks; NAREGI (JP) and CSI (JP) on WSRF & OGSA with GT4/Fujitsu WSRF; UniGrids (EU) and IBM/Unicore around WS-I+ & OGSA?; NMI (US), Condor (US), AIST-GTRC, APAC Grid (Australia), EU-China Grid (China); GGF in the middle, with "interoperable infrastructure talks" linking several of the projects.)
GGF Grid Interoperation Now
• Started Nov. 17, 2005 @SC05 by Catlett and Matsuoka; now has participation by all major grid projects
• "Agreeing to agree on what needs to be agreed first"
• Identified 4 essential key common services:
– Authentication, authorization, identity management: individuals, communities (VOs)
– Jobs: submission, auditing, tracking: job submission interface, job description language, etc.
– Data management: data movement, remote access, filesystems, metadata management
– Resource discovery and information service: resource description schemas, information services
"Interoperation" versus "Interoperability"
• Interoperability: "The ability of software and hardware on multiple machines from multiple vendors to communicate", based on commonly agreed, documented specifications and procedures
• Interoperation: "Just make it work together": whatever it takes; could be ad hoc, undocumented, fragile; the low-hanging fruit on the way to future interoperability
Interoperation Status
• GIN meetings at GGF16 and GGF17; a 3-day meeting at CERN at the end of March
• Security: common VOMS/GSI infrastructure; NAREGI's use of GSI/MyProxy and proxy delegation is more complicated, but should be OK
• Data: SRM commonality and data catalog integration; Gfarm and dCache consolidation
• Information service: CIM vs. GLUE schema differences; fairly large monitoring-system differences; schema translation (see next slides)
• Job submission: JDL vs. JSDL; Condor-C/CE vs. OGSA SS/SC-VM architectural differences, etc.; simple job submission only (see next slides)
Information Service Characteristics
• Basic syntax: resource description schemas (e.g., GLUE, CIM); data representations (e.g., XML, LDIF); query languages (e.g., SQL, XPath); client query interfaces (e.g., WS Resource Properties queries, LDAP, OGSA-DAI)
• Semantics: which pieces of data are needed by each grid (various previous works and actual deployment experience already exist)
• Implementation: information service software systems (e.g., MDS, BDII); the ultimate sources of this information (e.g., PBS, Condor, Ganglia, WS-GRAM, GridVM, various grid monitoring systems, etc.)
NAREGI Information Service
(Architecture figure, identical to "NAREGI IS: Standards Employed in the Architecture" above: GT4.0.1 distributed information service; cell domain information services with a light-weight CIMOM and CIM providers; hierarchical filtered aggregation and distributed query via OGSA-DAI WSRF 2.1 on Tomcat 5.0.28; CIM schema 2.10 with extensions; GGF/UR job accounting through WS-I RUS from GridVM.)
Relational Grid Monitoring Architecture
• An implementation of the GGF Grid Monitoring Architecture (GMA)
• All data modelled as tables: a single schema gives the impression of one virtual database for the VO
(Figure: a producer application publishes tuples (SQL "CREATE TABLE", "INSERT") through the Producer Service, which registers with the Registry Service; a consumer application sends queries (SQL "SELECT") to the Consumer Service, which locates producers via the registry, the Schema Service, and the Mediator, and receives the tuples.)
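The "one virtual database" idea is easy to demonstrate with plain SQL; here sqlite3 stands in for the distributed producer, registry, and consumer services, and the table and column names are invented:

```python
# R-GMA's model: producers INSERT and consumers SELECT against what
# appears to be a single relational database, regardless of where the
# tuples were actually published.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE service_status (site TEXT, svc TEXT, up INTEGER)")

# Producers publish tuples
db.executemany("INSERT INTO service_status VALUES (?, ?, ?)",
               [("NAREGI", "GridVM", 1), ("EGEE", "CE", 1),
                ("EGEE", "SE", 0)])

# A consumer sends a query and receives the matching tuples
rows = db.execute(
    "SELECT site, svc FROM service_status WHERE up = 1").fetchall()
print(rows)
```

In R-GMA the same three statements are split across services: CREATE TABLE defines the schema, producers issue the INSERTs, and consumers issue the SELECT without knowing which producers will answer.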
Syntax Interoperability Matrix

Grid      | Schema vocabulary | Data representation | Query language | Client interface    | Software
TeraGrid  | GLUE              | XML                 | XPath          | WSRF RP queries     | MDS4
OSG       | GLUE              | LDIF                | LDAP           | LDAP                | BDII
NAREGI    | CIM 2.10 + ext    | Relational          | SQL            | OGSA-DAI, WS-I RUS  | CIMOM + OGSA-DAI
EGEE/LCG  | GLUE              | LDIF                | LDAP           | LDAP                | BDII
EGEE/LCG  | GLUE              | Relational          | SQL            | R-GMA i/f           | R-GMA
NorduGrid | ARC               | LDIF                | LDAP           | LDAP                | GIIS
Low-Hanging Fruit: "Just Make It Work by GLUEing"
• Identify the minimum common set of information required for interoperation in the respective information services
• Employ GLUE and extended CIM as the base schemas for the respective grids
• Each info service in a grid acts as an information provider for the other
• Embed a schema translator to perform the schema conversion
• Present data in a common fashion on each grid: WebMDS, NAREGI CIM Viewer, SCMSWeb, …
Minimal Common Attributes
• Define the minimal common set of attributes required
• Each system component in a grid accesses only the translated information
(Figure: two grids, each with JM, EPS, CSG, reservation, provisioning (ACS), accounting, IS, and SC/DC components, exchange only the common attributes through their information services.)
(Figure: on the gLite/R-GMA side, the producer, registry, and consumer services with the mediator and schema handle SQL "CREATE TABLE", "INSERT", and "SELECT"; a CIM-to-GLUE producer publishes NAREGI data into R-GMA, while a GLUE-to-CIM translator with a CIM provider skeleton (e.g. NRG_UnicoreAccountOSSystem) feeds a light-weight CIMOM in a cell domain information service for EGEE resources; RDB / OGSA-DAI aggregator services combine this with the CDIS for NAREGI in a multi-grid information service node, using a GLUE-CIM mapping over the selected minimal attributes.)
GLUE-to-CIM translation
• Development of information providers that translate from the GLUE data model to CIM for selected common attributes, such as the up/down status of grid services
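The translation step over the minimal common attributes is essentially a keyed mapping with everything else dropped. A sketch; the GLUE-side names follow GLUE 1.x naming style, while the CIM-side class and property names are invented stand-ins:

```python
# Sketch of a GLUE -> CIM translation over the selected minimal
# common attributes (e.g. service up/down status). Attributes not
# in the common set are deliberately discarded.

GLUE_TO_CIM = {
    "GlueCEStateStatus":  ("NRG_ComputeService", "OperationalStatus"),
    "GlueCEInfoHostName": ("NRG_ComputeService", "HostName"),
}

def glue_to_cim(glue_record):
    cim = {}
    for attr, value in glue_record.items():
        if attr in GLUE_TO_CIM:          # keep only the common set
            cim_class, cim_prop = GLUE_TO_CIM[attr]
            cim.setdefault(cim_class, {})[cim_prop] = value
    return cim

glue = {"GlueCEStateStatus": "Production",
        "GlueCEInfoHostName": "ce01.example.org",
        "GlueCEPolicyMaxCPUTime": "2880"}   # outside the common set
print(glue_to_cim(glue))
```

Restricting the mapping to the agreed minimal set is what keeps the translator maintainable as both schemas evolve.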
Interoperability: NAREGI Short-Term Policy
• gLite: simple/single jobs (up to SPMD); bi-directional submission: NAREGI to gLite via GT2-GRAM, gLite to NAREGI via Condor-C; exchange of resource information
• GIN: simple/single jobs (up to SPMD); NAREGI-to-GIN submission via WS-GRAM; exchange of resource information
• BES/ESI: TBD; the status is somewhat confusing: is the ESI middleware already developed? On Globus 4.X and/or UnicoreGS?
Job Submission Standards Comparison: Goals
Goal                                   | ESI | BES | NAREGI SS | NAREGI SC(VM)
Use JSDL                               | ✔*1 | ✔   | ✔*1       | ✔*1
WSRF OGSA Base Profile 1.0 platform    | ✔   | ✔   | ✔         | ✔
Job management service                 | ✔   | ✔   | ✔         | ✔
Extensible support for resource models | ✔   | ✔   |           |
Reliability                            | ✔   | ✔   | ✔         | ✔
Use WS-RF modeling conventions         | ✔   | ✔   | ✔         | ✔
Use WS-Agreement                       |     |     |           |
Advance reservation                    |     |     | ✔         | ✔
Bulk operations                        |     |     | ✔         |
Generic management frameworks (WSDM)   |     |     |           |
Define alternative renderings          |     |     |           |
Server-side workflow                   |     |     | ✔         |
*1: Extended
Job Factory Operations

Group | ESI (0.6) | BES (draft v16) | NAREGI
Original operations | CreateManagedJob (with a subscribe option) | CreateActivityFromJSDL, GetActivityStatus, RequestActivityStateChanges, StopAcceptingNewActivities, StartAcceptingNewActivities, IsAcceptingNewActivities, GetActivityJSDLDocuments | MakeReservations, CommitReservations
WS-ResourceProperties | GetResourceProperty, GetMultipleResourceProperties, QueryResourceProperties | GetResourceProperty, GetMultipleResourceProperties, QueryResourceProperties | GetResourceProperty, GetMultipleResourceProperties, SetResourceProperties
WS-ResourceLifeTime | ImmediateResourceDestruction, ScheduledResourceDestruction | Destroy, SetTerminationTime | (not listed)
WS-BaseNotification | NotificationProducer | Notify, Subscribe | (not listed)

• Now working with Dave Snelling et al. to converge on the ESI API (which is similar to WS-GRAM), plus advance reservation and bulk submission
Interoperation: gLite-NAREGI
(Figure: a gLite user submits JDL to the gLite-WMS; a gLite-to-NAREGI bridge (Condor-C plus the NAREGI client library) translates ClassAd to JSDL, generates a NAREGI workflow, submits/controls jobs via the NAREGI-SS [JSDL], and propagates status (certification still an open question). An IS bridge translates the gLite-IS [GLUE] into the NAREGI-IS [CIM] (GLUE to CIM). In the other direction, a NAREGI user submits via the NAREGI Portal; a NAREGI-SC Interop-SC translates JSDL to RSL and submits/controls jobs on a gLite-CE via GT2-GRAM, with status propagation; NAREGI GridVM serves the NAREGI side.)
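The bridge's first translation step is an attribute mapping from Condor ClassAd-style job descriptions onto JSDL-like fields. A sketch; the attribute names on both sides are simplified stand-ins for illustration:

```python
# Sketch of the gLite -> NAREGI bridge's ClassAd -> JSDL mapping.
# Unmapped ClassAd attributes are simply not forwarded.

CLASSAD_TO_JSDL = {
    "Cmd": "Executable",
    "Arguments": "Argument",
    "RequestCpus": "TotalCPUCount",
}

def classad_to_jsdl(classad):
    return {CLASSAD_TO_JSDL[k]: v for k, v in classad.items()
            if k in CLASSAD_TO_JSDL}

ad = {"Cmd": "/bin/sim", "Arguments": "-n 8",
      "RequestCpus": 8, "Owner": "user1"}
print(classad_to_jsdl(ad))
# {'Executable': '/bin/sim', 'Argument': '-n 8', 'TotalCPUCount': 8}
```

The real bridge must additionally wrap the result in a NAREGI workflow document and propagate job status back, as the figure indicates.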
Interoperation: GIN (Short Term)
(Figure: analogous to the gLite bridge: an IS bridge translates another grid's IS [CIM or GLUE] into the NAREGI-IS [CIM] (GLUE to CIM); the NAREGI-SS [JSDL] drives a NAREGI-SC Interop-SC that translates JSDL to RSL and submits/controls jobs, with status propagation, via WS-GRAM (GT4) to the other grid's CE, while NAREGI users enter through the NAREGI Portal and NAREGI GridVM serves the NAREGI side. Sorry: one-way only for now!)