Post on 21-Dec-2015
1
Dr. Frederica Darema, CISE/NSF
Performance Engineering Large Scale Computing Systems
SC07-APART Workshop on: Performance Analysis and Optimization of High-End Computing Systems
2
Outline
• The BIG PICTURE
• Applications Directions
• Computing Platforms Directions
• Research and Technology Directions
• Examples of some advances
• Future Challenges and Opportunities
Science, Engineering, and “Commercial” Applications Environments: how are they shaping up for the future?
What does this entail for Large-Scale Computing, and for Large-Scale High-End Computing?
4
Small-Scale and Large-Scale Systems – Increasing complexity of systems and applications …
• Processing at multiple levels
• Computation and data processing, both at the application and the instruments/sensors side
• New Computational Units
– Beyond commodity microprocessors / superscalar / (D)MT GPU/(GP)2Us (MC-P), MT, FPGAs, GPUs, …
– Populating: high-end platforms, workstations, visualization servers, data servers, etc.
• Potentially:
– MC-Ps, FPGAs, GPUs at application side
– MC-Ps, FPGAs, GPUs at the data acquisition side
• One kind of processor EVERYWHERE???
• Or Mix of MC-Ps, FPGAs, GPUs???
• Pros & deficiencies in each - advances close gaps
• Complexity persists and increases
5
Platforms Directions
Past / Present / Future (in chronological order):
– Vector Processors
– SIMD MPPs
– Distributed Memory MPs
– Shared Memory MPs
– Distributed Platforms, Heterogeneous Computers and Networks
• Heterogeneity
– architecture (computer & network)
– node power (supernodes, MCP)
• Latencies
– variable (internode, intranode)
• Bandwidths
– different for different links
– different based on traffic
Petaflops Platform (Grid-in-a-Box)
[Figure: Distributed Platform, with nodes labeled MPP, NOW, SP, SAR, tac-com, database, firecntl, alg accelerator]
6
Applications Directions
Past
– Mostly monolithic
– Mostly one programming language
– Computation Intensive
– Batch
– Hours/days
Present / Future
– Multi-Modular
– Multi-Language
– Multi-Developers
– Multi-Source Data
– Computation Intensive
– Data Intensive
– Real Time
– Few Minutes/hours
– Visualization
– Interactive Steering
– Integrated Simulations & Experiments
– Dynamic Data Driven Applications Systems
7
Dynamic Integration of Computation & Measurements/Data (from the Real-Time to the High-End)
Unification of Computing Platforms & Sensors/Instruments
DDDAS guides sensor systems architectures
Example of new applications and systems directions
Dynamic Data Driven Application Systems (DDDAS) (www.cise.nsf.gov/dddas & www.dddas.org)
[Figure: Dynamic Feedback & Control Loop. Labels: Theory (First Principles); Simulations (Math. Modeling, Phenomenology, Observation Modeling, Design); Experiment; Measurements; Field-Data (on-line/archival); User]
Challenges:
• Application Simulations Methods
• Algorithmic Stability
• Measurement/Instrumentation Methods
• Computing Systems Software Support
DDDAS: ability to dynamically incorporate additional data into an executing application, and in reverse, ability of an application to dynamically steer the measurement process
Software Architecture Frameworks
Synergistic, Multidisciplinary Research
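The two-way coupling in the DDDAS definition (data flowing into a running application, and the application steering the measurement process) can be illustrated with a toy loop. This is only a sketch under stated assumptions: the `Simulation` class, the blending `gain`, the `threshold`, and the sampling-interval policy are all hypothetical and are not taken from any DDDAS project.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Simulation:
    """Toy executing application: a scalar state that decays each step."""
    state: float
    decay: float = 0.9  # hypothetical model parameter
    gain: float = 0.5   # hypothetical weight given to a new measurement

    def advance(self) -> None:
        self.state *= self.decay

    def assimilate(self, measurement: float) -> float:
        """Blend a measurement into the state; return the size of the mismatch."""
        innovation = measurement - self.state
        self.state += self.gain * innovation
        return abs(innovation)


def run_dddas(
    sim: Simulation,
    sensor: Callable[[int], float],
    steps: int,
    threshold: float = 1.0,
) -> List[Tuple[float, int]]:
    """Run the loop and record (state, sampling interval) after every step."""
    interval = 4  # initially take a measurement every 4th step
    history: List[Tuple[float, int]] = []
    for step in range(steps):
        sim.advance()
        if step % interval == 0:
            # Data -> application: incorporate the new measurement.
            mismatch = sim.assimilate(sensor(step))
            # Application -> measurement: sample every step when model and
            # data disagree, otherwise back off (capped at every 8th step).
            interval = 1 if mismatch > threshold else min(interval * 2, 8)
        history.append((sim.state, interval))
    return history
```

For instance, `run_dddas(Simulation(state=0.0), lambda step: 10.0, steps=3)` starts far from the sensor reading, so the loop immediately tightens the sampling interval to every step.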
8
TeraGrid
• A distributed system of unprecedented scale
– 30+ TF, 1+ PB, 40 Gb/s net
• Unified user environment across resources
– User software environment
– User support resources
• Integrated new partners to introduce new capabilities
– Additional computing, visualization capabilities
– New types of resources: data collections, instruments
• Created an initial community of over 500 users, 80 PIs
• Created User Portal in collaboration with NMI
courtesy Charlie Catlett
9
DDDAS: Beyond Grid Computing
“Extended Grid” – “SuperGRID”: the Application Platform is the computational & measurement system
[Figure labels: Applications; Computational Platforms; Instruments; Sensors; Archival/Stored Data]
Measurement Grids, Computational Grids
SuperGrids: Dynamically Coupled Networks of Data and Computations
10
Examples of TeraGrid Applications
• Lattice-Boltzmann Simulations: Coveney, UCL; Bruce Boghosian, Tufts
• Reservoir Modeling: Wheeler/UTAustin, Saltz/OSU, Parashar/Rutgers
• Aquaporin Mechanism: Schulten, UIUC (animation pointed to by 2003 Nobel chemistry prize announcement)
• Groundwater/Flood Modeling: Maidment, Wells, UT
• Atmospheric Modeling: Droegemeier, OU
Advanced Support for TeraGrid Applications:
TeraGrid staff are “embedded” with applications to create
- Functionally distributed workflows
- Remote data access, storage and visualization
- Distributed data mining
- Ensemble and parameter sweep run and data management
courtesy Charlie Catlett
11
To address the complexity of today’s and future systems, applications and their environments, we need systematic modeling and analysis approaches for designing, supporting the runtime, and management of such systems
Systems Performance Engineering
12
Background
• Systems Modeling and Analysis increasingly important:
– systems design cycle and runtime
– measurements (static and runtime)
– functional correctness of hw, hw and sw performance, dependability, reliability, power management, security, debugging, …
• Traditionally/in the past (for example):
– modeling specific aspects or components, rather than the full system
– architectural simulators trade speed for accuracy – full-system simulators trade accuracy for speed
• Want modeling/simulation capabilities that provide
– accurate, cycle-level resolution
– complete modeling of the entire system
– simulated execution of real workloads (full applications or realistic benchmarks) on top of real operating systems
– the ability for users to probe features in the systems (hardware, systems software, application)
• A number of research efforts are addressing such challenges, and more…
13
System Modeling and Analysis
Develop methods and tools for modeling, measuring, analyzing, evaluating, and predicting the performance, dependability, reliability, runtime management, debugging, security, etc., for design & runtime support of complex computing and communications systems
• Hardware and Software modeling
– methods, tools and measurements providing multimodal, hierarchical or multilevel modeling and analysis capabilities of such systems;
– methods that describe components of the system, but also the system as a whole, and enable assessment of the effects of individual hardware and software layers and components of these systems;
– ability to describe the system at multiple levels of detail (characteristics and time-scales);
– combine different (hybrid) methods of describing components and layers, from analytical and statistical to simulation, emulation, etc.
– performance specification languages and compilers
– testing & validation of developed methods and tools
14
System Modeling and Analysis
• Modeling and measurement approaches
– capabilities to describe, analyze and predict the behavior of the components as well as the systems;
– analysis and prediction due to characteristics or changes in the application, system software, hardware;
– multilevel approaches and multi-modal approaches
• Performance Frameworks
– combine tools in “plug-and-play” fashion
– multiple views of the system
• Use of systems modeling and analysis methods and tools beyond the design cycle, that is: to support optimized application composition, mapping, and runtime with performance, dependability, fault-tolerance
15
[Figure: layered view of the system stack. Labels: Distributed Applications; Collaboration Environments; Prog. Models, Libraries, Tools, Compilers; Services (Authentication/Authorization, Fault Recovery, Distributed Systems Management, Visualization, Scalable I/O, Data Management, Archiving/Retrieval, …); Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks; Device Technology, CPU Technology, Memory Technology, …]
[Figure labels, continued: Systems Modeling and Analysis; Performance Frameworks; Application Models; File/IO Models; OS Scheduler Models; Architecture/Network Models; Memory Models]
16
Multiple views of the system
The Operating Systems’ view
[Figure: the same layered stack seen from the operating system. Labels: Distributed Applications; Collaboration Environments; Languages, Libraries, Tools, Compilers; Services (Authentication/Authorization, Dependability Services, Distributed Systems Management, Visualization, Scalable I/O, Data Management, Archiving/Retrieval, Other Services, …); Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks; Device Technology, CPU Technology, Memory Technology, …; Application Models; Architecture/Network Models; Memory Models; OS Scheduler Models; IO/File Models]
17
Technology for integrated feedback & control
Runtime Compiling System (RCS) and Dynamic Application Composition
[Figure labels: Application Model; Application Program; Compiler Front-End; Application Intermediate Representation; Compiler Back-End; Performance Measurements & Models; Distributed Programming Model; Application Components & Frameworks; Dynamic Analysis Situation; Dynamically Link & Execute; Launch Application(s); Adaptable Computing Systems Infrastructure; Distributed Platform; Distributed Computing Resources (MPP, NOW, SP, SAR, tac-com, database, firecntl, alg accelerator)]
A great set of efforts is developing systems modeling methods along these directions, leading to performance frameworks
Emphasis on Multidisciplinary Research (across sub-areas of CS)
Application-driven validation of research and technology advances
Collaborations with industry are fruitful
Projects can be found in the proceedings of the Next Generation Software Workshop Series, organized every year in conjunction with IPDPS
19
GrADS Project & VGrADS
PI: Ken Kennedy (& Dan Reed, Andrew Chien, Fran Berman, Dennis Gannon, Ian Foster, Jack Dongarra, et al.)
Performance Contracts - At the Heart of the GrADS Model:
• Fundamental mechanism for managing mapping and execution
What are they?
• Mappings from resources to performance
• Mechanisms for determining when to interrupt and reschedule
Abstract Definition
• Random Variable: r(A,I,C,t0) with a probability distribution
– A = app, I = input, C = configuration, t0 = time of initiation
• Important statistics: lower and upper bounds (95% confidence)
Challenge
• When should a contract be violated?
– Strict adherence balanced against cost of reconfiguration
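A performance contract of this kind can be sketched in a few lines. The sketch below is not the GrADS implementation: it assumes past performance samples are roughly normally distributed (so bounds are mean ± 1.96 standard deviations), and the names `Contract`, `make_contract`, and `should_reschedule`, as well as the rate-based cost model, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Sequence


@dataclass(frozen=True)
class Contract:
    lower: float  # lower bound on the expected performance (e.g. work units/s)
    upper: float  # upper bound on the expected performance


def make_contract(samples: Sequence[float], z: float = 1.96) -> Contract:
    """Bounds covering about 95% of outcomes, ASSUMING a normal distribution."""
    mu, sigma = mean(samples), stdev(samples)
    return Contract(lower=mu - z * sigma, upper=mu + z * sigma)


def should_reschedule(
    contract: Contract,
    observed_rate: float,
    remaining_work: float,
    alt_rate: float,
    reconfig_cost: float,
) -> bool:
    """Reschedule only if the contract is violated AND moving pays off."""
    if observed_rate >= contract.lower:
        return False  # within contract: keep running
    if observed_rate <= 0:
        return True   # no progress at all: any alternative is better
    time_if_stay = remaining_work / observed_rate
    time_if_move = reconfig_cost + remaining_work / alt_rate
    return time_if_move < time_if_stay
```

The second function captures the "Challenge" bullet: a violation alone does not trigger rescheduling; the expected time saved must exceed the cost of reconfiguration.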
[Figure labels: Program Preparation System; Program Execution System; Source Application; Whole-Program Compiler; Libraries; Software Components; Configurable Object Program; Service Negotiator; Scheduler; Grid Runtime System; Dynamic Optimizer; Real-time Performance Monitor; Performance Problem; Performance Feedback; Negotiation]
Project Goals: To develop program preparation system support for computational Grid applications and technologies to support efficient run-time management of computational Grid resources, and achieve reliable performance under varying load.
GrADSoft Architecture
20
Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems
{Adve & Sanders}
21
Montage - An Integrated End-to-End Design and Development Framework for Wireless Networks
PI: Rappaport (& Browne, Shakkottai, Ramakrishnan, Varadarajan) {UTAustin, VTech}
• Project advanced the state of the art in fast and efficient methods for simulating large-scale networks
• Deliverables:
– Generated a wide range of analytical and simulation-based modeling methods
– Developed a wireless channel simulator (the Site Specific Software Simulator for Wireless - S^4W)
• S^4W was used by the PIs to develop more powerful and efficient techniques for improved end-to-end network performance for users of both wired and wireless networks
• S^4W has been used by several universities (in US and Canada), industry (Boeing), NASA, and commercial business (Schlotzsky’s deli)
• Developed fast simulation capabilities of networks
• Fast hybrid network simulation using spatiotemporal dilations
• FluNet: hybrid simulation-emulation environment, based on combined fluid models
• Developed scalable parallel discrete event simulator (Shakkottai, Ramakrishnan)
• Open Network Emulator
– Highly scalable distributed direct code execution environment; supports both simulation and emulation in a single tool; novel method, using the notion of Relativistic Time, so that the global virtual time is derived by dilating the real (wall-clock) time
• Productivity with Performance through Components & Composition (Browne)
– P-COM^2 environment: automated compile-time/runtime composition of parallel programs, applied here to performance modeling
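The idea of deriving virtual time by dilating wall-clock time can be shown with a very small clock class. This is an illustrative sketch only, not the Open Network Emulator's mechanism: the class name `DilatedClock`, the single constant `dilation` factor, and the convention that a factor of 10 means ten real seconds per virtual second are all assumptions.

```python
import time
from typing import Callable


class DilatedClock:
    """Virtual clock whose time is wall-clock time scaled by a fixed factor.

    With dilation = 10, ten real seconds elapse per virtual second, which
    gives a slow host ten times longer to emulate each unit of virtual time.
    """

    def __init__(
        self,
        dilation: float,
        wall_clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if dilation <= 0:
            raise ValueError("dilation must be positive")
        self._dilation = dilation
        self._wall_clock = wall_clock
        self._start = wall_clock()

    def now(self) -> float:
        """Virtual seconds elapsed since the clock was created."""
        return (self._wall_clock() - self._start) / self._dilation
```

Passing the wall-clock source as a parameter is a design choice made here so that the clock can be driven by a fake time source in tests; every emulated node sharing one such clock would observe the same global virtual time.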
22
A Fast, Cycle-Accurate Computer System Technology
23
Fast and Accurate Simulation of Scalable Computer Systems
(a) Hybrid Emulation
(b) Multiple-context Interleaved Emulation
ProtoFlex addresses full-system and scaling complexity for FPGA-based simulation in two ways. Hybrid emulation (a) avoids reconstruction of the entire system on FPGAs. Interleaved emulation (b) lets us decouple the size and complexity of the simulated system from that of the underlying FPGA host.
{Falsafi & Hoe}
24
Examples of Modeling & Analysis Efforts (Performance Modeling Frameworks)
• FPGA Accelerated Simulation Technologies – functional simulator + timing model (implemented in FPGAs) for fastest cycle-accurate, full system simulator (within 1-3 orders of real hw)
• Fast and accurate simulator through sampling, checkpointing to capture the microarchitectural state, and performing cycle-accurate simulation in the selected sampled regions, to simulate full (unmodified) applications
• Structural and composable performance simulation of complex systems: the effort constructs simulators from system descriptions and component libraries (e.g. an Itanium2 simulator accurate to within 3% of actual hardware was produced in 11 weeks)
• Real-time large-scale network simulation environment, through a hybrid of continuous and event-driven simulation paradigms: a fluid-model representation of the mean traffic combined with a packet-oriented simulation. The hybrid testbed will combine advantages of analytical models, simulation and emulation, and physical network testbeds.
• Component based software environment for simulation, emulation and synthesis of network protocols, integrating model-checking with event-driven simulations to allow performance evaluation and protocol validation in a unified way
• End-to-end design and development framework for large-scale wireless networks, composed from capabilities developed under problem-solving environments: application compile-time and runtime composition methods to compose the simulation and emulation systems for setting up experimental testbeds; performance engineering methods (of the POEMS project); the Weaves runtime and P-COM for parallel/distributed execution of discrete event simulations; and integration of low-level channel models with higher-level protocol layers and the relativistic time temporal model developed under the collaboration.
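The sampling approach listed above (detailed simulation only in selected regions of a full application run) can be sketched as a simple statistical estimator. This is a generic illustration, not any listed project's tool: `detailed_cpi` is a hypothetical stand-in for restoring a checkpoint and running a cycle-accurate simulation of one window, and the systematic every-`period`-th-window sampling and normal-approximation confidence interval are assumptions.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Callable, Tuple


def sampled_cpi(
    num_windows: int,
    period: int,
    detailed_cpi: Callable[[int], float],
    z: float = 1.96,
) -> Tuple[float, float]:
    """Estimate whole-run cycles-per-instruction from sampled windows.

    Only every `period`-th window is simulated in detail; the rest of the
    run is skipped (in practice, fast-forwarded functionally). Returns the
    estimate and the half-width of an approximate 95% confidence interval.
    """
    samples = [detailed_cpi(i) for i in range(0, num_windows, period)]
    if len(samples) < 2:
        raise ValueError("need at least two sampled windows")
    estimate = mean(samples)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return estimate, half_width
```

With 100 windows and a period of 10, only 10 windows incur the cost of detailed simulation; the half-width indicates whether more samples are needed.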
25
Examples of Modeling & Analysis Efforts (Application modeling, resource management, …)
• Modeling system for enabling algorithm designers and programmers to develop, evaluate and compare application algorithms for CMP/CMT systems
• Software tools to enable access to coordinated information collected through hardware-based profiling of local and remote memory access of application computation and communication patterns
• Dynamic profiling of application phases for optimizing power consumption under set performance constraints for reconfigurable multi-core environments and data servers
• Cross platform performance estimation by partial execution of applications, capturing computation and communication parameters, and generalizing prediction to problem-scaling scenarios, in parallel and distributed platforms
• Language support for continuous monitoring of distributed systems, grids and other data-centric and network systems
• Adaptive resource sharing mechanisms autonomically matching resources to dynamically changing needs via statistical and stochastic approaches
• Data driven resource allocation in complex systems, through workload characterization, analytical models and policy development
• Compiler enabled model- and measurement-driven adaptation environment for dependability and performance (performability)
• Engineering reliability at software design time by coupling software component architectural models with statistical methods to address uncertainties in design stage
• Tools for proactive runtime system health monitoring and enhancement for large-scale parallel systems: on-line models analyze data collected over extended periods of time and in real time, filter and correlate evolving failure data with respect to factors such as workload and operating temperature, and use this information to schedule or checkpoint jobs
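As one illustration of the efforts listed above, performance estimation by partial execution can be reduced to a small extrapolation. The sketch assumes that per-iteration cost is steady after the measured iterations and that computation and communication scale by caller-supplied factors; `PartialRun`, `predict_total`, and the scale parameters are hypothetical names, not part of any listed project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialRun:
    """Timings captured by running only the first few iterations."""
    iterations: int
    compute_seconds: float
    comm_seconds: float


def predict_total(
    run: PartialRun,
    total_iterations: int,
    compute_scale: float = 1.0,
    comm_scale: float = 1.0,
) -> float:
    """Extrapolate full-run time from a partial execution.

    compute_scale and comm_scale express how much more (or less) work per
    iteration the target problem size or platform is ASSUMED to require,
    relative to the measured run.
    """
    if run.iterations <= 0:
        raise ValueError("partial run must cover at least one iteration")
    per_iter_compute = run.compute_seconds / run.iterations
    per_iter_comm = run.comm_seconds / run.iterations
    per_iter = per_iter_compute * compute_scale + per_iter_comm * comm_scale
    return total_iterations * per_iter
```

Keeping computation and communication as separate parameters is what allows the prediction to be generalized: doubling the problem size might, for example, double compute cost per iteration while leaving communication cost unchanged.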
26
Summary Thoughts
• Large-scale high-end systems cannot be treated as isolated platforms
• Such systems demand enhanced and optimized computation, communication and data management capabilities, in the presence of resource heterogeneity, dynamicity, adaptivity
• Need to advance the technologies that will automate the mapping of complex and dynamic applications onto complex platforms with multiple and heterogeneous levels of processors, memory, and networks
• Modeling and Analysis Methods – Performance Engineering of systems – are crucial in enabling optimized design, runtime, and management of such systems
27
Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems
• Award 0406351: A Compiler-Enabled Model- and Measurement-Driven Adaptation Environment for Dependability and Performance
William Sanders and Vikram Adve
Develops compiler-controlled performance data monitoring together with performance models for adaptive and optimized runtime support, in environments where the underlying computational, communication, and storage resources may be changing, as well as environments where the application requirements may also be changing.
Combines and advances in novel directions work on dynamic runtime compilation methods (LLVM) developed by Adve in 0093426 (CAREER) - NGS: Techniques and Applications of Dynamic Compilation; and system-level integrated performance methods developed by Sanders in 0228762 - Next Generation Software: An Integrated Framework for Performance Engineering and Resource-Aware Compilation.
In addition to the multidisciplinary work from two sub-areas of computer science (compilers, and performance modeling and analysis), the project includes collaboration with industry, specifically with two senior researchers from AT&T Labs-Research, which provides resources such as production-level software to drive and validate the research methods, and also provides opportunities for student internships at the AT&T Research Lab.
Other Technical impacts of the individual projects: The LLVM compiler infrastructure has been publicly distributed since October 2003 and downloaded well over 2000 times since. It has attracted at least 40 serious users in academia (instructors and researchers) and industry (startups and established companies). Apple Computer has not only adopted LLVM but has also set up an active group of developers working on incorporating LLVM in Apple’s products, such as the next release of MacOS due in Spring 2007. A paper, Automatic Pool Allocation, on novel methods developed under the project and incorporated in LLVM, won a Best Paper award at the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), the premier conference in the area of compilers.
Other Technical impacts of the individual projects: Möbius is a performance engineering framework and tool for the evaluation of distributed and parallel computing systems, accounting for system components including the application software itself, the operating system, and the underlying computing and communication hardware. The framework provides a means by which multiple, heterogeneous models can be composed together, each representing a different module (software or hardware), component, or view of the system.
Möbius has made a significant worldwide impact in the research area of stochastic model analysis. The impact spans both academic and commercial domains. In addition to being the principal tool used in the graduate-level system reliability courses at the University of Illinois, USA and the Univ. of Florence, Italy, Möbius has been licensed to over 150 university sites throughout the world for teaching and research purposes.
International partnerships: research groups from the Univ. of Twente, Dortmund University, the University of the Federal Armed Forces München, and Saarland University are partnering with the Möbius team to develop plug-in modules for the Möbius framework. The first International Möbius Developer’s Working group meeting was held in Sept. 2004, further increasing the number of groups that use Möbius in their research.
Möbius has also been licensed for commercial use to many companies, including Motorola, Iridium, Pioneer Hybrids, Windber Research Institute, General Dynamics and Boeing. For example, Möbius has been used for numerous telecommunications and computer system applications at Motorola and was designated one of three company-wide system availability modeling packages.
Recently, researchers have begun to use Möbius for biological applications; over 25 universities, Pioneer Hybrid (the world's largest seed producer), and Windber Research Incorporated (a non-profit research organization with projects studying the disease progression of breast cancer) have licensed it for use with biological systems.