CTSS 4 Strategy and Status

Upload: piers-murphy

Post on 29-Dec-2015



General Character of CTSSv4

• To meet project milestones, CTSS changes must accelerate in the coming years.

• Process
– Process will be the focus of CTSSv4.
– Significant changes in who and how, not so much in what.
– Process changes now will enable us to more effectively manage content changes in the future.
• Content
– Newer component versions that include features we need
– More allowable versions
– Support for more platforms

CTSS 4 Process Goals

• Change the focus from software packages to capabilities.
– Software should be deployed to meet user capability requirements, not to satisfy abstract agreements.
– Which capabilities ought to be coordinated, and why?
• Be explicit about which capabilities are expected to be on which systems.
– The CTSS core (mandatory capabilities) is radically smaller.
– Each RP explicitly decides which additional capabilities it will provide, based on the intended purpose of each system.
• Make the process of defining CTSS more open and inclusive and more reflective of the TeraGrid project structure (GIG + multiple RPs, working groups, RATs, etc.).
– GIG/RP working groups and areas have an open mechanism for defining, designing, and delivering CTSS capabilities.
– Expertise is distributed, so the process should be distributed also.
• Improve coordination significantly.
– Changes are coordinated more explicitly with more TeraGrid sub-teams.
– Each sub-team has a part in change planning.

CTSS 4 Strategy

• Break the CTSS monolith into multiple capability modules.
• Employ a formal change planning process.

CTSS “Kits”

• Reorganize CTSS into a series of kits.
– A kit provides a small set of closely related capabilities (job execution service, dataset hosting service, high-performance data movement, global filesystem, etc.).
– A kit is (as much as possible) independent from other kits.
• Each kit includes:
– a definition of the kit that focuses on purpose, requirements, and capabilities, including a problem statement and a design;
– a set of software packages that RP administrators can install on their system(s) in order to implement the design;
– documentation for RP administrators on how to install and test the software;
– Inca tests that check whether a given system satisfies the stated requirements;
– SoftEnv definitions that allow users to access the software.
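
The five parts of a kit listed above can be pictured as a simple record. The following is only a minimal sketch: the class names, field names, and the `missing_parts` helper are assumptions made for illustration and are not part of CTSS itself. The package names in the example come from the CTSS 3 mapping for the Remote Compute kit.

```python
from dataclasses import dataclass, field


@dataclass
class KitDefinition:
    """Purpose-focused description of a kit (field names are illustrative)."""
    purpose: str
    requirements: list[str] = field(default_factory=list)
    capabilities: list[str] = field(default_factory=list)
    problem_statement: str = ""
    design: str = ""


@dataclass
class Kit:
    """One kit: a definition plus the material needed to deliver it."""
    name: str
    definition: KitDefinition
    software_packages: list[str] = field(default_factory=list)
    admin_docs: list[str] = field(default_factory=list)    # install/test docs for RP admins
    inca_tests: list[str] = field(default_factory=list)    # names of Inca tests
    softenv_keys: list[str] = field(default_factory=list)  # SoftEnv definitions for users

    def missing_parts(self) -> list[str]:
        """Return the names of the kit parts that are still empty."""
        parts = {
            "definition": self.definition.capabilities,
            "software_packages": self.software_packages,
            "admin_docs": self.admin_docs,
            "inca_tests": self.inca_tests,
            "softenv_keys": self.softenv_keys,
        }
        return [name for name, value in parts.items() if not value]


# Example: a kit that has a definition and packages but nothing else yet.
remote_compute = Kit(
    name="Remote Compute",
    definition=KitDefinition(
        purpose="job execution service",
        capabilities=["job execution service"],
    ),
    software_packages=["Pre-WS GRAM", "WS GRAM"],
)
print(remote_compute.missing_parts())
```

A check like `missing_parts` is one way a kit owner might notice that documentation, tests, or SoftEnv definitions have not been supplied yet.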

The “Core” Kit

• Provides the capabilities that are absolutely necessary for a resource to meet the most basic integrative requirements of the TeraGrid.
– Common Authentication, Authorization, Auditing, and Accounting capabilities
– A system-wide registry of capabilities and service information
– A Verification & Validation mechanism for capabilities
– System-wide Usage Reporting capabilities
• This is much smaller than the current set of “required” CTSSv3 components.
• Unlike other capability kits, the Core Kit is focused on TeraGrid operations, as opposed to user capabilities.

Core Kit Provides Integrative Services

• Authentication, Authorization, Auditing, Accounting Mechanisms
– Supports TeraGrid allocation processes
– Allows coordinated use of multiple systems
– Supports TeraGrid security policies
– Goal: Forge a useful link between campus authentication systems, science gateway authentication systems, and TeraGrid resources
• Service Registry
– Goal: Provide a focal point for registering the presence of services and capabilities on TeraGrid resources
– Goal: Support documentation, testing, automatic discovery, and automated configuration for distributed services (tgcp)
• Verification & Validation
– Independently verifies the availability of capabilities on each resource
– Goal: Focus more clearly on the specific capabilities each resource is intended to offer
• Usage Reporting
– Goal: Support the need to monitor and track usage of TeraGrid capabilities

CTSS Capability Kits

• Each CTSS capability kit is an opportunity for resource providers to deploy a specific capability in coordination with other RPs.
– Focal point for collecting and clarifying user requirements (via a RAT)
– Focal point for designing, documenting, and implementing a capability (via a WG)
– Focal point for deploying the capability (via the software WG)
• RPs can explicitly decide and declare which capabilities they intend to provide on each resource.
– What is appropriate for each resource?
– What is the RP’s strategy for delivering service to the community?
• TeraGrid’s system-wide registry tracks which CTSS capabilities are provided by each resource.
– By declaration, by registration, and by verification
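
The three ways the registry tracks a capability (declaration, registration, verification) can be sketched as three flags per resource and kit. This is a hypothetical in-memory model, not the actual TeraGrid registry: the class and method names and the resource name `example-cluster` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CapabilityRecord:
    declared: bool = False    # the RP states it intends to provide the capability
    registered: bool = False  # the capability has been registered for the resource
    verified: bool = False    # an independent test confirmed it is available


class CapabilityRegistry:
    """Hypothetical registry keyed by (resource, kit)."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], CapabilityRecord] = {}

    def _record(self, resource: str, kit: str) -> CapabilityRecord:
        return self._records.setdefault((resource, kit), CapabilityRecord())

    def declare(self, resource: str, kit: str) -> None:
        self._record(resource, kit).declared = True

    def register(self, resource: str, kit: str) -> None:
        self._record(resource, kit).registered = True

    def verify(self, resource: str, kit: str) -> None:
        self._record(resource, kit).verified = True

    def unverified(self, resource: str) -> list[str]:
        """Kits declared for a resource that have not been verified yet."""
        return sorted(
            kit
            for (res, kit), rec in self._records.items()
            if res == resource and rec.declared and not rec.verified
        )


registry = CapabilityRegistry()
registry.declare("example-cluster", "Remote Login")
registry.declare("example-cluster", "Data Movement")
registry.register("example-cluster", "Remote Login")
registry.verify("example-cluster", "Remote Login")
print(registry.unverified("example-cluster"))
```

Keeping the three states separate makes it possible to ask where a declared capability has stalled, for example declared but never verified.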

CTSS Capability Kits

• Kits may be defined, designed, implemented, packaged, documented, and supported by a broad range of people.
– RATs
– Working groups
– GIG areas
– Resource providers
– Other communities
• The key feature of a CTSS capability is that its deployment is coordinated among RPs.

CTSS 4 Capability Kits

• Led by GIG SI team and Software Working Group
– TeraGrid Core Capabilities
– Application Development & Runtime
– Remote Compute
– Remote Login
– Science Workflow Support
• Led by GIG DIVS team and Data Working Group
– Data Management
– Data Movement
– Wide Area Filesystem

CTSS 3 Mapped to Capability Kits

• Led by GIG SI Team and Software WG
– TeraGrid Core: AMIE Resource Toolkit, gx-map, Inca, Pacman, MDS4 Index, tg-policy, tgresid
– App Devel & Runtime: Ant, BLAS, gcc, Globus clients & libs, gsissh client, HDF4, HDF5, HIS, Intel compiler & MKL, Java, MPICH-G2, MPIs - local, PHDF5, Python, SoftEnv, SRB Client, TCL, TGCP, XLF
– Remote Compute: Pre-WS GRAM, WS GRAM
– Remote Login: Myproxy client, SSH/GSISSH, tgusage, UberFTP
– Science Workflow Support: Condor, GridShell, Condor-G
• Led by GIG DIVS Team and Data WG
– Data Management: RLS
– Data Movement: GridFTP, GridFTP SRB, RFT
– Wide Area Filesystem: GPFS
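
The mapping above is essentially a lookup table from kits to components, which can be inverted to answer "which kit owns this component?". The sketch below transcribes part of the slide into a dictionary. The variable and function names are assumptions, and the App Devel & Runtime and Remote Login lists are abbreviated to keep the example short.

```python
# Kit -> CTSS 3 components, transcribed from the mapping above.
# Some of the longer lists are abbreviated here (see comments).
KIT_COMPONENTS = {
    "TeraGrid Core": ["AMIE Resource Toolkit", "gx-map", "Inca", "Pacman",
                      "MDS4 Index", "tg-policy", "tgresid"],
    "App Devel & Runtime": ["Ant", "BLAS", "gcc", "HDF4", "HDF5", "Java",
                            "Python", "SoftEnv", "TCL", "TGCP"],  # abbreviated
    "Remote Compute": ["Pre-WS GRAM", "WS GRAM"],
    "Remote Login": ["SSH/GSISSH", "tgusage", "UberFTP"],  # abbreviated
    "Science Workflow Support": ["Condor", "GridShell", "Condor-G"],
    "Data Management": ["RLS"],
    "Data Movement": ["GridFTP", "GridFTP SRB", "RFT"],
    "Wide Area Filesystem": ["GPFS"],
}

# Lead team -> kits it leads, also taken from the slide.
KIT_LEADS = {
    "GIG SI Team and Software WG": ["TeraGrid Core", "App Devel & Runtime",
                                    "Remote Compute", "Remote Login",
                                    "Science Workflow Support"],
    "GIG DIVS Team and Data WG": ["Data Management", "Data Movement",
                                  "Wide Area Filesystem"],
}


def build_component_index(kit_components: dict[str, list[str]]) -> dict[str, str]:
    """Invert the kit -> components table into component -> kit."""
    return {
        component: kit
        for kit, components in kit_components.items()
        for component in components
    }


def lead_for(kit: str) -> str:
    """Return the team that leads a kit; raises KeyError for unknown kits."""
    for lead, kits in KIT_LEADS.items():
        if kit in kits:
            return lead
    raise KeyError(kit)


COMPONENT_TO_KIT = build_component_index(KIT_COMPONENTS)
print(COMPONENT_TO_KIT["GridFTP"], "/", lead_for(COMPONENT_TO_KIT["GridFTP"]))
```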

Change Coordination Process

• Motivation
– New CTSS kit structure results in more potential sources of changes.
– Scaling of resources (both number and diversity) results in more potential points of confusion and coordination failure.
– In general, we’d like to do this better.
• Goals
– Clarity of purpose for changes
– Help for documentation
– Help in identifying points requiring coordination
– Tracking deployment steps and progress
– Easy to use for small changes, helpful for large changes.

Change Coordination Structure

• Change Description Data Sheet
– Collects the basic facts about a proposed change
– Provides everyone with the information needed to understand what is planned and who is involved and to identify potential risks
– Will provide a record of changes
• Change Planning Checklist
– Helps the planning team brainstorm about what the necessary points of coordination are
– Provides opportunities for recording the coordination plans for each sub-team (docs, user services, RP admins, security, etc.)
– Helps get coordination started early
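
As a rough illustration of how the two documents relate, the sketch below models a data sheet and a checklist as small records. The field names, the default sub-team list (taken from the examples in the bullet above, which ends in "etc."), and the sample change are all hypothetical and are not the project's actual forms.

```python
from dataclasses import dataclass, field

# Sub-teams named on the slide; the real list is open-ended ("etc.").
DEFAULT_SUB_TEAMS = ("docs", "user services", "RP admins", "security")


@dataclass
class ChangeDataSheet:
    """Basic facts about a proposed change (field names are illustrative)."""
    title: str
    what_is_planned: str
    who_is_involved: list[str]
    potential_risks: list[str] = field(default_factory=list)


@dataclass
class ChangePlanningChecklist:
    """Coordination plans recorded per sub-team for one change."""
    sheet: ChangeDataSheet
    sub_teams: tuple[str, ...] = DEFAULT_SUB_TEAMS
    plans: dict[str, str] = field(default_factory=dict)

    def record_plan(self, sub_team: str, plan: str) -> None:
        self.plans[sub_team] = plan

    def outstanding(self) -> list[str]:
        """Sub-teams that have no coordination plan recorded yet."""
        return [team for team in self.sub_teams if team not in self.plans]


# Hypothetical change, invented purely for illustration.
sheet = ChangeDataSheet(
    title="Update a package in the Data Movement kit",
    what_is_planned="Roll out a newer version of one kit component",
    who_is_involved=["Data WG", "RP admins"],
    potential_risks=["client compatibility"],
)
checklist = ChangePlanningChecklist(sheet)
checklist.record_plan("docs", "Revise the user guide before rollout")
print(checklist.outstanding())
```

Listing the sub-teams that still lack a plan is one way the checklist could help coordination start early.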

(Introduce Examples)

A Note on Deployment Schedules

• Each kit can have its own schedule.
– Software WG will serve as coordination point for schedules (load management, schedule conflicts, etc.)
– Software WG will also manage dependency issues (between kits)
• Expectation is that each kit rollout (updates, etc.) will follow the change management process.
– Coordination with RPs, docs team, user services, operations, etc.
– Staff & friendly user testing
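
One way to think about managing dependencies between kits is as an ordering problem: a kit should not be rolled out before the kits it depends on. The sketch below uses a topological sort from the Python standard library. The dependency table is invented for illustration only; the slides do not state which kits depend on which.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL dependencies (kit -> kits it depends on), for illustration only.
KIT_DEPENDENCIES = {
    "TeraGrid Core": set(),
    "Remote Compute": {"TeraGrid Core"},
    "Data Movement": {"TeraGrid Core"},
    "Science Workflow Support": {"Remote Compute", "Data Movement"},
}


def rollout_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every kit comes after the kits it depends on.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = rollout_order(KIT_DEPENDENCIES)
print(order)
```

Because each kit keeps its own schedule, an ordering like this only constrains which rollouts must wait for others. It does not fix dates.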

A Note on Documentation

• The “kit” design results in early documentation.
– A coherent story about the kit’s capabilities
– What is it and why do we have it?
– Who is it for?
• Documentation team can use this to build plans for user documentation.
• User services can use this to build plans for testing.
– Test plans can be created before the software is deployed.
– Work can begin on identifying friendly user candidates.
• Change coordination process reinforces this.

A Note on Inca

• Inca tests only the capabilities that are provided by each resource.
– Which capabilities are declared for each resource?
– Which capabilities are registered on each resource?
• Inca can provide a piece of the system-wide registry.
– Which capabilities have been verified on each resource?
• Inca tests are provided by kit owners, so are more closely linked to intended capabilities.

GIG Software Integration Team

• Role changing from Sole Integrator to Integrator + Integration Service Provider
• GIG SI still has ownership of several kits.
• For other kit owners, we provide services.
– Consultation
– Publishing guidelines/recommendations
– Relationships with software developers (Globus, VDT, NMI, etc.)
– Multiplatform software builds (and testing) service
– Multiplatform packaging service
– Software clearinghouse (website, CVS, Pacman cache, etc.)

CTSS 4 Timeline - June

• Develop the list of capability kits that spans CTSS 3.
– What distinct capabilities are provided by CTSS?
– Which kit should each CTSS 3 component belong to?
• This work has been done.

CTSS 4 Timeline - July

• Assign each capability kit to a team.
– Done (see previous slide)
• Define capability kit purposes.
– What capabilities does each kit provide?
– In progress (delayed)
• Identify kits that will be updated sooner rather than later (new versions, etc.).
– In progress
• Draft change coordination process and test it.
– In progress

CTSS 4 Timeline - August

• Draft change plans for kits requiring updates.
– Coordination documentation delivered
• Begin executing change plans for updated kits.
– Begin implementing packages
– Begin test plans, documentation plans, security reviews…
– Decide which resources will have which capabilities
– Schedule deployment activities with each RP