Technology Drivers

Post on 05-Jan-2016

Technology Drivers

• Traditional HPC application drivers
  – OS noise, resource monitoring and management, memory footprint
  – Complexity of resources to be managed
• New and evolving programming models
  – Shifting emphasis from managing cycles to managing data
  – Programming models require more access to resource management decisions
  – Hybrid/mixed programming models (composing applications)
• Node and memory structures
  – On-node RAM, DRAM, Flash
  – Stacked memory (performance implications for different access patterns)
  – Explicit cache/hierarchy management
  – On-node interconnect
  – Heterogeneous cores
  – On-node power management
• Global structures
  – Global address space
  – Integration of collectives, especially synchronization
• Resilience (soft errors and damaged cores)
• HPC OS sustainability

Increasing importance and complexity of resource management
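OS noise, the first driver listed above, is commonly estimated by timing a fixed quantum of work many times and looking at the spread: on a quiet core every repetition takes about the same time, and interruptions show up as slow outliers. The following is a minimal sketch of that idea, not a calibrated benchmark. The function names (`fixed_work`, `sample_noise`, `summarize`) and the default sizes are assumptions made for illustration, and in Python the interpreter itself adds jitter, so a serious measurement would use a compiled language on a pinned core.

```python
import statistics
import time


def fixed_work(n: int) -> int:
    """Run a fixed amount of integer work; n sets the quantum size."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sample_noise(iterations: int = 200, work_size: int = 20000) -> list:
    """Time the same work quantum repeatedly and return the durations in seconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fixed_work(work_size)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list) -> dict:
    """Report the spread; the fastest sample approximates the undisturbed run."""
    fastest = min(samples)
    slowest = max(samples)
    return {
        "min_s": fastest,
        "median_s": statistics.median(samples),
        "max_s": slowest,
        # Ratio of the slowest to the fastest repetition (hypothetical metric name).
        "max_slowdown": slowest / fastest if fastest > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(summarize(sample_noise()))
```

In a bulk-synchronous application every process waits for the slowest one at each synchronization point, which is why the maximum slowdown tends to matter more than the median.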

Alternate R&D Strategies

• Evolve an existing OS
  – Linux, Plan 9, IBM CNK, Kitten
• Start with an empty emacs buffer
• Steal components from existing operating systems
• Partitioning resources: independent management within a partition
  – Composability
• Collective/Global OS
  – Global address space?

It’s time to define the winner

Research Agenda

• HPC Community OS
  – Define basic structure
  – Individual groups work on components
• Expose management of critical resources
• Simulation to evaluate scalability of resource management strategies
• Enable co-design of hardware to support resource management
• Define and implement OS mechanisms that will enable global, autonomic runtime systems

Priority Research Direction: Community OS Framework for HPC Systems

Key challenges

1. HPC applications have unique resource management needs (e.g., memory layout)
2. Anticipated rapid evolution/revolution in architectures and programming models
3. Limited ability to innovate in existing commodity operating systems
4. Sustainability of HPC OS is difficult

Summary of research direction

1. Develop an OS framework specific to the needs of HPC
2. Open system architecture that exposes the management of critical resources
3. Empower developers of libraries and runtime systems

Potential impact on software component

1. Context for individual innovation and contribution
2. Common foundation for libraries and runtime environments

Potential impact on usability, capability, and breadth of community

1. This will enable full access to hardware resources
2. Timeframe: 2-3 years

Priority Research Direction: Scalable System Simulation

Key challenges

1. Inability to conduct “apples to apples” comparisons in scalable resource management
2. Evolution/revolution in new systems
3. Wide variety of existing simulators

Summary of research direction

1. Develop a scalable, full system simulation capability
2. Address multi-scale challenges
3. Adapt techniques that have been used in other branches of computational science
4. Develop common interfaces between simulators

Potential impact on software component

1. Ability to evaluate resource management mechanisms and policies at scale
2. Enable architecture/OS co-design

Potential impact on usability, capability, and breadth of community

1. Critical for the OS research/development community
2. Important for runtime community
3. Timeframe: 2-4 years

Priority Research Direction: Open System APIs

Key challenges

1. Communication management
2. Thread management
3. Memory management
4. Power management
5. Resilience (fault/failure isolation/management)

Summary of research direction

1. Develop community-based APIs to expose critical resources
2. Develop prototype runtime environments for common programming models

Potential impact on software component

1. Provides a fixed point for innovation in API implementation and innovation in the implementation of runtimes (hourglass principle)
2. Differentiation based on performance, not functionality

Potential impact on usability, capability, and breadth of community

1. Critical for supporting the development of new programming models
2. Critical for enabling the development of new architectures
3. Timeframe: 3 to 8 years
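The hourglass principle named in the Open System APIs direction can be illustrated with a small sketch: one narrow, fixed API in the middle, several implementations below it, and runtimes above it that depend only on the API. Everything here is hypothetical and chosen for illustration (`MemoryAPI`, `BumpAllocator`, `RecyclingAllocator`, `runtime_step`); the slides do not define a concrete interface. Memory management is used because it is one of the resource areas the direction lists.

```python
from abc import ABC, abstractmethod


class MemoryAPI(ABC):
    """Hypothetical narrow-waist API: the only surface a runtime may depend on."""

    @abstractmethod
    def allocate(self, nbytes: int) -> int:
        """Return the offset of a new block, or raise MemoryError."""

    @abstractmethod
    def release(self, offset: int) -> None:
        """Give a block back to the allocator."""


class BumpAllocator(MemoryAPI):
    """Simplest implementation: hands out space in order and never reuses it."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_free = 0

    def allocate(self, nbytes: int) -> int:
        if self.next_free + nbytes > self.capacity:
            raise MemoryError("out of space")
        offset = self.next_free
        self.next_free += nbytes
        return offset

    def release(self, offset: int) -> None:
        pass  # released blocks are not recycled


class RecyclingAllocator(MemoryAPI):
    """Same API, different policy: reuses released blocks of the same size."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_free = 0
        self.live = {}          # offset -> size of each block in use
        self.free_by_size = {}  # size -> offsets available for reuse

    def allocate(self, nbytes: int) -> int:
        recycled = self.free_by_size.get(nbytes)
        if recycled:
            offset = recycled.pop()
        else:
            if self.next_free + nbytes > self.capacity:
                raise MemoryError("out of space")
            offset = self.next_free
            self.next_free += nbytes
        self.live[offset] = nbytes
        return offset

    def release(self, offset: int) -> None:
        nbytes = self.live.pop(offset)
        self.free_by_size.setdefault(nbytes, []).append(offset)


def runtime_step(mem: MemoryAPI, rounds: int, nbytes: int) -> int:
    """Toy runtime written only against MemoryAPI; counts completed rounds."""
    completed = 0
    for _ in range(rounds):
        try:
            block = mem.allocate(nbytes)
        except MemoryError:
            break
        mem.release(block)
        completed += 1
    return completed


if __name__ == "__main__":
    print(runtime_step(BumpAllocator(1024), rounds=100, nbytes=64))       # 16
    print(runtime_step(RecyclingAllocator(1024), rounds=100, nbytes=64))  # 100
```

Both allocators offer the same functionality to `runtime_step`; they differ only in how far the same workload gets, which is the sense in which implementations behind a fixed API differentiate on performance rather than functionality.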

4.1 Operating Systems: A Community HPC OS

[Figure: roadmap timeline, 2010 to 2019. Milestones shown: Next Generation Interconnect API; Community OS Framework; Robust, Scalable System Simulation; APIs for energy management; API for node resilience; Autonomic runtime systems; Prototype implementation of OS Framework; Runtime Environments enabled.]
