Technology Drivers


Upload: walt

Post on 05-Jan-2016


Page 1: Technology Drivers

Technology Drivers

• Traditional HPC application drivers
  – OS noise, resource monitoring and management, memory footprint
  – Complexity of resources to be managed
• New and evolving programming models
  – Shifting emphasis from managing cycles to managing data
  – Programming models require more access to resource management decisions
  – Hybrid/mixed programming models (composing applications)
• Node and memory structures
  – On-node RAM, DRAM, Flash
  – Stacked memory (performance implications for different access patterns)
  – Explicit cache/hierarchy management
  – On-node interconnect
  – Heterogeneous cores
  – On-node power management
• Global structures
  – Global address space
  – Integration of collectives, especially synchronization
• Resilience (soft errors and damaged cores)
• HPC OS sustainability

Increasing importance and complexity of resource management
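To make the "OS noise" driver concrete, the following is a minimal sketch of a fixed-work probe: it repeats an identical quantum of CPU work and reports how far the samples drift above the fastest one. The function names, sample counts, and work size are illustrative assumptions, not something taken from the slides, and in Python the measured jitter also includes interpreter effects, so the numbers are only indicative.

```python
import time
from typing import Dict, List


def fixed_work(n_iters: int) -> int:
    """Perform a fixed, deterministic amount of CPU work (hypothetical quantum)."""
    acc = 0
    for i in range(n_iters):
        acc += i * i
    return acc


def measure_quanta(n_samples: int = 200, n_iters: int = 20_000) -> List[float]:
    """Time the same work quantum repeatedly; variation hints at interference."""
    durations = []
    for _ in range(n_samples):
        start = time.perf_counter()
        fixed_work(n_iters)
        durations.append(time.perf_counter() - start)
    return durations


def noise_summary(durations: List[float]) -> Dict[str, float]:
    """Summarize samples; the fastest sample approximates the noise-free time."""
    fastest = min(durations)
    mean = sum(durations) / len(durations)
    return {
        "min": fastest,
        "mean": mean,
        "max": max(durations),
        # Fraction of extra time, on average, relative to the fastest sample.
        "mean_overhead": mean / fastest - 1.0,
    }


if __name__ == "__main__":
    print(noise_summary(measure_quanta()))
```

A large gap between "min" and "max" suggests that something other than the application (daemons, interrupts, the runtime itself) is taking cycles during the measured quanta.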

Page 2: Technology Drivers

Alternate R&D Strategies

• Evolve an existing OS
  – Linux, Plan 9, IBM CNK, Kitten
• Start with an empty emacs buffer
• Steal components from existing operating systems
• Partitioning resources
  – Independent management within a partition
  – Composability
• Collective/Global OS
  – Global address space?

It’s time to define the winner

Page 3: Technology Drivers

Research Agenda

• HPC Community OS
  – Define basic structure
  – Individual groups work on components
• Expose management of critical resources
• Simulation to evaluate scalability of resource management strategies
• Enable co-design of hardware to support resource management
• Define and implement OS mechanisms that will enable global, autonomic runtime systems

Page 4: Technology Drivers

Priority Research Direction: Community OS Framework for HPC Systems

Key challenges

1. HPC applications have unique resource management needs (e.g., memory layout)
2. Anticipated rapid evolution/revolution in architectures and programming models
3. Limited ability to innovate in existing commodity operating systems
4. Sustainability of HPC OS is difficult

Summary of research direction

1. Develop an OS framework specific to the needs of HPC
2. Open system architecture that exposes the management of critical resources
3. Empower developers of libraries and runtime systems

Potential impact on software component

1. Context for individual innovation and contribution
2. Common foundation for libraries and runtime environments

Potential impact on usability, capability, and breadth of community

1. This will enable full access to hardware resources
2. Timeframe: 2-3 years

Page 5: Technology Drivers

Priority Research Direction: Scalable System Simulation

Key challenges

1. Inability to conduct “apples to apples” comparisons in scalable resource management
2. Evolution/revolution in new systems
3. Wide variety of existing simulators

Summary of research direction

1. Develop a scalable, full system simulation capability
2. Address multi-scale challenges
3. Adapt techniques that have been used in other branches of computational science
4. Develop common interfaces between simulators

Potential impact on software component

1. Ability to evaluate resource management mechanisms and policies at scale
2. Enable architecture/OS co-design

Potential impact on usability, capability, and breadth of community

1. Critical for the OS research/development community
2. Important for runtime community
3. Timeframe: 2-4 years
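As a rough illustration of why evaluation "at scale" matters, the sketch below is a toy model, not any existing simulator: each node in a bulk-synchronous step is independently delayed with a small probability, and the step finishes only when the slowest node does. The parameter names and default values (`p_noise`, `delay`, `work`) are assumptions chosen for illustration.

```python
import random


def expected_slowdown(n_nodes: int, work: float = 1.0,
                      p_noise: float = 0.001, delay: float = 0.5) -> float:
    """Analytic slowdown of the toy model: a step is delayed if any node is hit."""
    p_step_hit = 1.0 - (1.0 - p_noise) ** n_nodes
    return 1.0 + (delay / work) * p_step_hit


def simulate_bsp(n_nodes: int, n_steps: int, work: float = 1.0,
                 p_noise: float = 0.001, delay: float = 0.5,
                 seed: int = 0) -> float:
    """Simulate barrier-synchronized steps; return time relative to noise-free."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        step_time = work
        for _ in range(n_nodes):
            if rng.random() < p_noise:
                # One delayed node is enough to delay the whole step.
                step_time = work + delay
                break
        total += step_time
    return total / (n_steps * work)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(simulate_bsp(n, 500), 3), round(expected_slowdown(n), 3))
```

In this model the same per-node noise that is negligible on one node delays a large share of steps on a thousand nodes, which is the kind of effect a common simulation capability would let different resource management policies be compared against.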

Page 6: Technology Drivers

Priority Research Direction: Open System APIs

Key challenges

1. Communication management
2. Thread management
3. Memory management
4. Power management
5. Resilience (fault/failure isolation/management)

Summary of research direction

1. Develop community-based APIs to expose critical resources
2. Develop prototype runtime environments for common programming models

Potential impact on software component

1. Provides a fixed point for innovation in API implementation and innovation in the implementation of runtimes (hourglass principle)
2. Differentiation based on performance, not functionality

Potential impact on usability, capability, and breadth of community

1. Critical for supporting the development of new programming models
2. Critical for enabling the development of new architectures
3. Timeframe: 3 to 8 years
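The hourglass principle can be sketched as a narrow, fixed interface with independent implementations below it and runtimes above it. Everything named here (`MemoryManager`, `ExactAllocator`, `PageAllocator`, `run_workload`) is hypothetical and only illustrates the shape of the idea; the slides do not define any concrete API.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class MemoryManager(ABC):
    """Hypothetical narrow-waist API: the fixed point both sides agree on."""

    @abstractmethod
    def allocate(self, n_bytes: int) -> int:
        """Reserve memory and return an opaque handle."""

    @abstractmethod
    def bytes_reserved(self) -> int:
        """Total bytes currently reserved by this manager."""


class ExactAllocator(MemoryManager):
    """One implementation below the waist: reserves exactly what is requested."""

    def __init__(self) -> None:
        self._sizes = []

    def allocate(self, n_bytes: int) -> int:
        self._sizes.append(n_bytes)
        return len(self._sizes) - 1

    def bytes_reserved(self) -> int:
        return sum(self._sizes)


class PageAllocator(MemoryManager):
    """Another implementation: rounds every request up to whole pages."""

    def __init__(self, page_size: int = 4096) -> None:
        self._page_size = page_size
        self._sizes = []

    def allocate(self, n_bytes: int) -> int:
        pages = (n_bytes + self._page_size - 1) // self._page_size
        self._sizes.append(pages * self._page_size)
        return len(self._sizes) - 1

    def bytes_reserved(self) -> int:
        return sum(self._sizes)


def run_workload(mm: MemoryManager, request_sizes: Iterable[int]) -> int:
    """A 'runtime' above the waist: it depends only on the MemoryManager API."""
    for size in request_sizes:
        mm.allocate(size)
    return mm.bytes_reserved()


if __name__ == "__main__":
    print(run_workload(ExactAllocator(), [100, 5000]))
    print(run_workload(PageAllocator(), [100, 5000]))
```

Both implementations offer the same functionality to the runtime and differ only in a measurable property (here, memory footprint), which is the sense in which differentiation is based on performance rather than functionality.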

Page 7: Technology Drivers

4.1 Operating Systems: A Community HPC OS

[Roadmap timeline figure, 2010 to 2019. Tracks: Next Generation Interconnect API; Community OS Framework; Robust, Scalable System Simulation; APIs for energy management; API for node resilience; Autonomic runtime systems. Milestones: Prototype implementation of OS Framework; Runtime Environments enabled.]