Notes On the GAE

Harvey B. Newman, California Institute of Technology
Grid-enabled Analysis Environment Workshop, June 24, 2003

Page 1: Notes On the GAE

Notes On the GAE

Harvey B. Newman
California Institute of Technology

Grid-enabled Analysis Environment Workshop
June 24, 2003

Page 2: Notes On the GAE

GAE Workshop Goals (1)

- “Getting Our Arms Around” the Grid-Enabled Analysis “Problem”
- Review Existing Work Towards a GAE: Components, Interfaces, System Concepts
- Review Client Analysis Tools; Consider How to Integrate Them
- User Interfaces: What Does the GAE Desktop Look Like? (Different Flavors)
- Look At Requirements, Ideas for a GAE Architecture
  - A Vision of the System’s Goals and Workings
  - Attention to Strategy and Policy
- Develop (Continue) a Program of Simulations of the System
  - For the Computing Model, and Defining the GAE
  - Essential for Developing a Feasible Vision; Developing Strategies, Solving Problems and Optimizing the System
  - With a Complementary Program of Prototyping

Page 3: Notes On the GAE

GAE Collaboration Desktop Example

- Four-screen Analysis Desktop
  - 4 Flat Panels: 5120 X 1024; RH9
  - Driven by a single server and single graphics card
- Allows simultaneous work on:
  - Traditional analysis tools (e.g. ROOT)
  - Software development
  - Event displays (e.g. IGUANA)
  - MonALISA monitoring displays; Other “Grid Views”
  - Job-progress Views
  - Persistent collaboration (e.g. VRVS; shared windows)
  - Online event or detector monitoring
  - Web browsing, email

Page 4: Notes On the GAE

GAE Workshop Goals (2)

- Architectural Approaches: Choose A Feasible Direction
  - For example a Managed Services Architecture
  - Be Prepared to Learn by Doing; Simulating and Prototyping
- Where to Start, and the Development Strategy
  - Existing and Missing Parts of the System [Layers; Concepts]
  - When to Adapt Existing Components, Or to Re-Build Them “from Scratch”
- Manpower Available to Meet the Goals; Shortfalls
- Allocation of Tasks; Including Generating a Plan
- Linkage Between Analysis and Grid-Enabled Production
- Planning for Closer Relationship with LCG, Trillium, and the Experiments’ Starting Efforts in this Area

Page 5: Notes On the GAE

HENP Grids: Services Architecture Design for a Global System

- Self Discovering, Cooperative
  - Registered Services, Lookup Services; self-describing
  - “Spaces” for Mobile Code and Parameters
- Scalable and Robust
  - Multi-threaded: with a thread pool managing engine
  - Loosely Coupled: errors in a thread don’t stop the task
- Stateful: System State as well as task state
  - Rich set of “problem” situations: implies Grid Views, and User/System Dialogues on what to do
    - For Example: Raise Priority (Burn Quota); or Redirect Work
  - Eventually may be increasingly automated as we scale up and gain experience
- Managed; to deal with a Complex Execution Environment
  - Real time higher level supervisory services monitor, track, optimize and Revive/Restart services as needed
  - Policy and strategy-driven; Self-Evaluating and Optimizing
- Investable with increasing intelligence
  - Agent Based; Evolutionary Learning Algorithms
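
The “multi-threaded, loosely coupled” point on this slide can be illustrated with a minimal Python sketch. It is not taken from the slides: the function `run_subtasks` and the sub-task names are hypothetical, and the standard-library `concurrent.futures` pool merely stands in for the “thread pool managing engine”. The idea shown is only that a failure in one worker is recorded and reported while the rest of the task completes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_subtasks(subtasks, max_workers=4):
    """Run named callables in a thread pool; collect failures instead of aborting.

    `subtasks` maps a (hypothetical) sub-task name to a zero-argument callable.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in subtasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Loosely coupled: an error in one thread is recorded,
                # and the remaining sub-tasks still run to completion.
                failures[name] = exc
    return results, failures


def broken_fit():
    """Simulated failing sub-task (illustrative only)."""
    raise RuntimeError("site unreachable")


if __name__ == "__main__":
    subtasks = {
        "histogram": lambda: sum(range(10)),
        "fit": broken_fit,
        "skim": lambda: [x for x in range(5) if x % 2 == 0],
    }
    results, failures = run_subtasks(subtasks)
    print("completed:", sorted(results))
    print("needs attention:", sorted(failures))
```

The `failures` mapping is where a supervisory service, or a user/system dialogue, could decide whether to restart or redirect the failed piece of work.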

Page 6: Notes On the GAE

Getting Started Towards a Workable GAE (1)

- Work on Computing Model (Essential) in Parallel
- Focus on a Few Scenarios for Doing Analysis
  - “Grid Enabled PROOF” [in CMS; in ATLAS]
- Start with Existing Analysis Applications: Can they be recast in GAE Form?
- Make Some Starting Assumptions
  - Need some simple picture of persistency
- Supplementary considerations:
  - Multiuser situation (e.g. with avatars; then Analysis Challenges)
- Coming to a few Either/Or Decisions
  - List of rudimentary analysis tools, and way of working
- “External” to the application considerations:
  - Job planning
  - Key role of query estimation (not only beforehand)
  - Transparency versus tracking

Page 7: Notes On the GAE

Getting Started Towards a Workable GAE (2)

- Session or Sessions on the Desktop
- Three Modes of Working; All in the GAE
  - Immediate (within a few seconds)
  - In the background (seconds to a few minutes)
  - Spawn batch job or jobs (minutes to hours)
- Decisions and tradeoffs
  - Lay out the strategies and consequences (time, quota etc.)
  - Present Choices
  - Monitor progress or get “alarms” and be prepared to re-strategize
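
As a rough illustration of the modes of working listed on this slide, the sketch below maps an estimated run time to a mode. The slide gives only approximate ranges, so the cut-off values (5 seconds and 300 seconds) and the names `Mode` and `choose_mode` are assumptions made for this example, not part of the GAE design.

```python
from enum import Enum


class Mode(Enum):
    IMMEDIATE = "immediate"    # within a few seconds
    BACKGROUND = "background"  # seconds to a few minutes
    BATCH = "batch"            # minutes to hours


# Assumed cut-offs in seconds; the slide states only rough ranges.
IMMEDIATE_LIMIT_S = 5.0
BACKGROUND_LIMIT_S = 300.0


def choose_mode(estimated_seconds: float) -> Mode:
    """Pick a working mode from an estimated time to completion."""
    if estimated_seconds < 0:
        raise ValueError("estimated_seconds must be non-negative")
    if estimated_seconds <= IMMEDIATE_LIMIT_S:
        return Mode.IMMEDIATE
    if estimated_seconds <= BACKGROUND_LIMIT_S:
        return Mode.BACKGROUND
    return Mode.BATCH


if __name__ == "__main__":
    for seconds in (2, 60, 3600):
        print(seconds, "->", choose_mode(seconds).value)
```

In a real system the choice would presumably also weigh quota and policy, which is why the slide speaks of presenting choices to the user rather than deciding silently.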

Page 8: Notes On the GAE

Getting Started Towards a Workable GAE (3)

- Smart Caching: Of Methods, of Data, or Time to Process Info.
- Intelligence in the system does not only mean problem solving
  - Need to apply intelligence/experience to progressively improve system performance
- Time-to-completion estimation: process a small amount of data to get a realistic first estimate.
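
The time-to-completion idea on this slide (process a small amount of data, then extrapolate) can be sketched as follows. This is a minimal example under a strong assumption: that processing time scales linearly with the number of events, with no start-up or data-access overhead. The function name `estimate_completion_seconds` and its parameters are hypothetical.

```python
import time


def estimate_completion_seconds(process_event, events, sample_size=100,
                                clock=time.perf_counter):
    """Time a small sample and extrapolate linearly to the full data set.

    Assumes a constant per-event cost; `clock` is injectable for testing.
    """
    sample = events[:sample_size]
    if not sample:
        return 0.0
    start = clock()
    for event in sample:
        process_event(event)
    elapsed = clock() - start
    # Per-event cost from the sample, scaled to the whole collection.
    return elapsed / len(sample) * len(events)


if __name__ == "__main__":
    events = list(range(200_000))
    estimate = estimate_completion_seconds(lambda e: e * e, events)
    print(f"first estimate for {len(events)} events: {estimate:.4f} s")
```

Such an estimate could be refreshed while the job runs, in line with the earlier remark that query estimation matters “not only beforehand”.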

Page 9: Notes On the GAE

3 Slides About Building a Computing Model & the GAE System

These Slides Focus on Simulation/Prototyping, as an Integral part of designing and building distributed systems for the GAE, and the Grid-Enabled Production Environment (GPE) as well.

Page 10: Notes On the GAE

Building a Computing Model and an Analysis Strategy (I)

- Generate a Blueprint: A “Computing Model”
  - Tasks, Workload, Facilities, Priorities & GOALS
  - Persistency; Modes of Accessing Data (e.g. Object Collections)
  - What runs where; when to redirect
- The User’s Working Environment
  - What is normal (managing expectations)?
  - Guidelines for dealing with problems: based on which information?
  - Performance and problem reporting/tracking/handling?
  - Known Problems: Strategies to deal with those
- Set up, code a Simulation of the Model
  - Develop mechanisms and sub-models as needed
  - Set up prototypes to measure the performance parameters where not already known to sufficient precision

Page 11: Notes On the GAE

Building a Computing Model and an Analysis Strategy (II)

- Run simulations (avatars for “actors”; agents; tasks; mechanisms)
- Analyze and evaluate performance
  - General performance (throughput; turnaround)
  - Ensure “all” work is done: learn how to do this: within a reasonable time; compatible with the Collaboration’s guidelines
- Vary Model to Improve Performance
  - Deal with bottlenecks and other problems
  - New strategies and/or mechanisms to manage workflow
  - Represent key features and behaviors, for example:
    - Responses to Link or Site failures
    - User input to redirect data or jobs
    - Monitoring information gathering
    - Monitoring and management agent actions and behaviors in a variety of situations
- Validate the Model
  - Using Dedicated setups
  - Using Data Challenges (measure, evaluate, compare; fix key items)
  - Learn of new factors and/or behaviors to take into account
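
To make the “run simulations, then vary the model” loop on this slide concrete, here is a deliberately tiny toy model. It is not the simulation framework used by the authors; the function `simulate`, the site names, and the failure probabilities are all invented for illustration. It compares one strategy (redirect a job once when its site fails) against doing nothing, which is the kind of comparison the slide describes.

```python
import random


def simulate(n_jobs, sites, failure_prob, redirect=True, seed=0):
    """Toy model: each job goes to a random site; a failing site loses the job
    unless the redirect strategy retries it once at another site."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    done = lost = redirected = 0
    for _ in range(n_jobs):
        site = rng.choice(sites)
        if rng.random() >= failure_prob[site]:
            done += 1
            continue
        # The chosen site failed for this job.
        others = [s for s in sites if s != site]
        if redirect and others:
            redirected += 1
            backup = rng.choice(others)
            if rng.random() >= failure_prob[backup]:
                done += 1
                continue
        lost += 1
    return {"done": done, "lost": lost, "redirected": redirected}


if __name__ == "__main__":
    # Hypothetical sites and per-job failure probabilities.
    sites = ["site_a", "site_b", "site_c"]
    failure_prob = {"site_a": 0.05, "site_b": 0.30, "site_c": 0.10}
    for redirect in (False, True):
        print("redirect =", redirect, simulate(10_000, sites, failure_prob, redirect=redirect))
```

A realistic study would add queues, link bandwidth, and monitoring agents; the point here is only the workflow of changing one strategy and comparing the resulting throughput.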

Page 12: Notes On the GAE

Building a Computing Model and an Analysis Strategy (III)

- MAJOR Milestone: Obtain a first picture of a Model that Seems to Work
  - This may or may not involve changes in the computing resource requirements-estimates; or Collaboration policies and expectations
  - It is hard to estimate how long it will take to reach this milestone [most experiments until now have reached it after the start of data taking]
- Evolve the Model to
  - Distinguish what works and what does not
  - Incorporate evolving site hardware and network performance
  - Progressively incorporate new and “better” strategies, to improve throughput and/or turnarounds, or fix critical problems
  - Take into account experience with the actual software-system components as they develop
- In parallel with the Model evolution keep developing the overall data analysis + Grid + monitoring “system”; represent it in the simulation
  - And the associated strategies