
Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

D2.1: TANGO Requirements and Architecture Specification Alpha Version

Authors Karim Djemame (UNIVLEEDS), David García Pérez (ATOS), Django Armstrong (ULEEDS), Richard Kavanagh (ULEEDS), Jorge Ejarque (BSC), Yiannis Georgiou (BULL), Renaud De Landtsheer (CETIC), Jean-Christophe Deprez (CETIC), Bruno Wery (DELTATEC), Rosa M. Badia (BSC)

Institution lead UNIVLEEDS

Version 1.31

Reviewers Clara M. Pezuela Robles (ATOS), Francisco Javier Nieto De Santos (ATOS), Jesús Gorronogoitia Cruz (ATOS)

Work package WP2

Task T2.1, T2.2, T2.3, and T2.4

Due date 30/04/2016

Submission date 11/05/2016

Distribution level (CO, PU): Public


Abstract

Computer systems have faced significant power challenges over the past 20 years; these challenges have shifted from the device and circuit level to their current position as first-order constraints for system architects and software developers. TANGO's goal is to characterise the factors which affect power consumption in software development and operation for heterogeneous parallel hardware environments. Our main contribution is the combination of requirements engineering and design modelling for self-adaptive software systems with power-consumption awareness in relation to these environments. Energy efficiency and application quality factors are integrated into the application lifecycle (design, implementation, operation). To support this, the key novelty of the project is a reference architecture and its implementation. Moreover, a programming model with built-in support for various hardware architectures, including heterogeneous clusters, heterogeneous chips and programmable logic devices, will be provided. TANGO will create a new cross-layer programming approach for heterogeneous parallel hardware architectures featuring automatic code generation, including software and hardware modelling. This will consider power, performance, data location and time-criticality optimization, in addition to security and dependability, on the target hardware architecture. These results will be demonstrated in two real-world applications: a reconfigurable power-optimized connected platform and HPC. In order to improve collaboration and the sustainability of TANGO's and fellow projects' results, TANGO considers the foundation of a Research Alliance in which complementary research efforts into novel programming approaches will nucleate, leading to strong research collaboration and effective integration of project results.

Keywords Requirements, market analysis, architecture, SOTA, use cases definition

Licensing information: This work is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

http://creativecommons.org/licenses/by-sa/3.0/


Document Revision History

Version | Date | Description of change | Modified by
v0.1 | 20/01/2016 | First draft version | Oliver Barreto (ATOS)
v0.1.2 | 27/01/2016 | First comments and suggestions received from consortium partners | ALL
v0.1.3 | 10/02/2016 | Changes in several sections | Oliver Barreto (ATOS)
v0.1.4.1 | 17/02/2016 | Changes in several sections | Oliver Barreto (ATOS)
v0.1.4.2 | 24/02/2016 | Sent for first round of comments and suggestions from consortium partners | Oliver Barreto (ATOS)
v0.1.5.1 | 02/03/2016 | Changes in several sections | Oliver Barreto (ATOS)
v0.2 | 04/02/2016 | ToC for SotA | Karim Djemame (ULEEDS), David García Pérez (ATOS)
v0.2.1.1 | 02/03/2016 | Changes in several sections | Oliver Barreto (ATOS)
v0.3 | 01/03/2016 | Contributions to sections 3.3, 3.5, 3.6 and 3.8 of SotA | Karim Djemame (ULEEDS), Django Armstrong (ULEEDS), Richard Kavanagh (ULEEDS), Jorge Ejarque (BSC), David García Pérez (ATOS)
v0.4 | 03/03/2016 | Contributions to sections 3.3, 3.5, 3.6 and 3.8 of SotA | Karim Djemame (ULEEDS), Django Armstrong (ULEEDS), Richard Kavanagh (ULEEDS), Jorge Ejarque (BSC), David García Pérez (ATOS)
v0.4 | 07/03/2016 | General updates to all sections of SotA | Karim Djemame (ULEEDS), Django Armstrong (ULEEDS), Richard Kavanagh (ULEEDS), Jorge Ejarque (BSC), David García Pérez (ATOS)
v0.4.5.1 | 09/03/2016 | Comments and suggestions received from consortium partners; introduced changes from first internal revision | Oliver Barreto (ATOS), ALL
v0.4.6.1 | 11/03/2016 | Changes in several sections; sent for second round of comments and suggestions from consortium partners | Oliver Barreto (ATOS)
v0.5 | 15/03/2016 | Outline of the document structure for Architecture and Requirements | Richard Kavanagh (ULEEDS), Django Armstrong (ULEEDS), Karim Djemame (ULEEDS)
v0.6 | 22/03/2016 | Updates to sections 3.4, 3.5, 3.6, 3.7, 3.8 and 3.9 of SotA | Yiannis Georgiou (BULL), Renaud De Landtsheer (CETIC), Jean-Christophe Deprez (CETIC), Karim Djemame (ULEEDS), Django Armstrong (ULEEDS), Richard Kavanagh (ULEEDS), David García Pérez (ATOS)
v0.6.1 | 23/03/2016 | Comments and suggestions received from consortium partners; introduced changes from last internal revision | Oliver Barreto (ATOS), ALL
v0.6.2 | 28/03/2016 | Introduced other changes and modifications | Oliver Barreto (ATOS)
v0.7 | 06/04/2016 | Adding parts of the front and back matter to the document | Richard Kavanagh (ULEEDS)
v0.7.1 | 11/04/2016 | Added requirements elicitation approach | Richard Kavanagh (ULEEDS)
v0.8 | 12/04/2016 | Adding content for various ULEEDS and BULL components | Richard Kavanagh (ULEEDS), Django Armstrong (ULEEDS), Yiannis Georgiou (BULL)
v0.8.1 | 15/04/2016 | Adding content for the Programming Model | Jorge Ejarque (BSC)
v0.9 | 22/04/2016 | Integrating contributions for section 4 | Richard Kavanagh (ULEEDS), Karim Djemame (ULEEDS), Django Armstrong (ULEEDS), Bruno Wery (DELTATEC), David García Pérez (ATOS)
v0.9.1 | 22/04/2016 | Adding content to section 3 requirements analysis and section 4 | Richard Kavanagh (ULEEDS), Django Armstrong (ULEEDS), Bruno Wery (DELTATEC), Jean-Christophe Deprez (CETIC)
v1.0 | 25/04/2016 | Final version for internal review | David García Pérez (ATOS)
v1.1 | 29/04/2016 | Updates to BULL UC and Introduction | Yiannis Georgiou (BULL), Karim Djemame (ULEEDS), David García Pérez (ATOS)
v1.2 | 04/05/2016 | Corrections to the whole document | Rosa M. Badia (BSC)
v1.3 | 09/05/2016 | Version with corrections after the internal review | David García Pérez (ATOS), Richard Kavanagh (ULEEDS), Oliver Barreto (ATOS)
v1.31 | 10/05/2016 | Version with additional corrections after the internal review | Richard Kavanagh (ULEEDS), Renaud De Landtsheer (CETIC), Lotfi Guedria (CETIC), Yiannis Georgiou (BULL)


Table of Contents

Table of Contents .............................................................................................................................................. 6

Table of Figures ................................................................................................................................................. 9

Table of Tables ................................................................................................................................................. 10

Terms and abbreviations ................................................................................................................................. 12

Executive Summary ......................................................................................................................................... 14

Part 1. Introduction ..................................................................................................................................... 15

1.1 Introduction to Market Analysis ...................................................................................................... 15

1.2 Introduction to State of the Art ....................................................................................................... 19

1.3 Introduction to architecture ............................................................................................................ 20

Part 2. Market Analysis ............................................................................................................................... 22

2.1 Introduction ..................................................................................................................................... 22

2.1.1 About the market analysis ....................................................................................................... 22

2.1.2 Market analysis structure ........................................................................................................ 22

2.2 Initial Market Analysis ..................................................................................................................... 23

2.2.1 Market Introduction ................................................................................................................ 23

2.2.2 Markets, Contexts and Trends ................................................................................................. 24

2.2.2.1 Markets ................................................................................................................................ 24

2.2.2.2 Business-Driven Trends ....................................................................................................... 36

2.2.2.3 IT Trends .............................................................................................................................. 38

2.2.2.4 Legislation and Regulation................................................................................................... 43

2.2.3 Business & Market Landscape Analysis ................................................................................... 44

2.2.3.1 Competitors, Substitutive or Existing Solutions .................................................................. 44

2.2.3.2 Comparative analysis ........................................................................................................... 46

2.2.4 Market and Business Analysis Conclusions ............................................................................. 47

2.3 Market Analysis Conclusions ........................................................................................................... 50

Part 3. State of the Art ................................................................................................................................ 51

3.1 Introduction ..................................................................................................................................... 51

3.2 State of the Art structure ................................................................................................................ 51

3.3 Architecture Support for Low Power Computing ............................................................................ 52

3.3.1 State of the Art ........................................................................................................................ 52

3.3.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 53

3.4 Handling Quality Properties in the Software Development Life Cycle for Customised Low-Power Heterogeneous ............................................................................................................................................ 54

3.4.1 State of the Art ........................................................................................................................ 54

3.4.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 55

3.4.3 Optimizing the scheduling of software on hardware .............................................................. 56


3.4.3.1 Requirements ...................................................................................................................... 56

3.4.3.2 Approaches for solving scheduling problems ...................................................................... 58

3.4.3.3 Dealing with on-line optimization ....................................................................................... 58

3.5 Programming Models and Run-Time Management techniques for Heterogeneous Parallel Architectures ............................................................................................................................................... 60

3.5.1 State of the Art ........................................................................................................................ 60

3.5.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 62

3.6 Modelling tools for prototyping with software emulation/simulation of heterogeneous parallel architectures ................................................................................................................................................ 63

3.6.1 State of the Art ........................................................................................................................ 63

3.6.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 64

3.7 Monitoring of heterogeneous architectures ................................................................................... 65

3.7.1 State of the Art ........................................................................................................................ 65

3.7.1.1 Hardware Counters ............................................................................................................. 65

3.7.1.2 FPGA Monitoring ................................................................................................................. 65

3.7.1.3 Energy measurements ......................................................................................................... 65

3.7.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 67

3.8 Workload management techniques for heterogeneous architectures ........................................... 68

3.8.1 State of the Art ........................................................................................................................ 68

3.8.2 Relevance for TANGO & Progress beyond the SotA ................................................................ 68

3.9 Other: Distributed systems, security, networking, and Data Management ................................... 69

3.9.1 Security .................................................................................................................................... 69

3.9.1.1 State of the Art .................................................................................................................... 69

3.9.1.2 Relevance to TANGO and Progress beyond SotA ................................................................ 69

Part 4. Requirements and Architecture Specification ................................................................................. 71

4.1 Introduction ..................................................................................................................................... 71

4.1.1 Requirements and Architecture Specification Structure ......................................................... 71

4.2 Vision ............................................................................................................................................... 72

4.3 Requirements .................................................................................................................................. 75

4.3.1 Tango Requirement Elicitation Approach ............................................................................... 75

4.3.1.1 Description of the Overall Requirements Approach ........................................................... 75

4.3.2 Business Requirements ........................................................................................................... 77

4.3.2.1 Summary of Business Interviews Conducted ...................................................................... 77

4.3.2.2 Business Goals Synthesis - Pilot Cases from TANGO Industrial Partners ............................ 82

4.3.2.3 DELTATEC ............................................................................................................................. 82

4.3.2.4 Bull ....................................................................................................................................... 83

4.4 Architecture ..................................................................................................................................... 85

4.4.1 Overview .................................................................................................................................. 85


4.4.2 Layer 1 – SDK ........................................................................................................................... 88

4.4.2.1 Programming Model ............................................................................................................ 88

4.4.2.2 Requirements & Design Modelling Plug-in .......................................................................... 93

4.4.2.3 Code Optimizer Plug-in ...................................................................................................... 101

4.4.2.4 Runtime Abstraction Layer ................................................................................................ 104

4.4.3 Layer 2 – Middleware ............................................................................................................ 108

4.4.3.1 Application Lifecycle Deployment Engine ......................................................................... 108

4.4.3.2 Self-Adaptation Manager .................................................................................................. 111

4.4.3.3 Energy Modeller ................................................................................................................ 114

4.4.3.4 Monitoring Infrastructure ................................................................................................. 120

4.4.4 Layer 3 – Fabric Layer ............................................................................................................ 124

4.4.4.1 Device Supervisor .............................................................................................................. 124

4.4.4.2 Heterogeneous Parallel Device Clusters ............................................................................ 127

4.4.4.3 Device Emulator ................................................................................................................ 130

4.5 The Tango Architecture Workflow ................................................................................................ 135

4.5.1 Service Deployment ............................................................................................................... 135

4.5.2 Service Operation .................................................................................................................. 135

4.6 Critical Path .................................................................................................................................... 137

4.6.1 Construction .......................................................................................................................... 137

4.6.2 Deployment ........................................................................................................................... 137

4.6.3 Operation ............................................................................................................................... 138

4.6.4 Interface Work Plan ............................................................................................................... 138

Part 5. Conclusions .................................................................................................................................... 140

Annex A. Results from Market and Value Internal Workshop .................................................................. 153

A.1 Markets .......................................................................................................................................... 153

A.2 Market Top Influencers ................................................................................................................. 154

Annex B. Trends/Requirements identification from Internal Market and Value Workshop.................... 155

B.1 Energy consumption/optimization: ............................................................................................... 155

B.2 Design\Development\Operation for (Optimized) Heterogeneity: ................................................ 156

B.3 Technical Requirements ................................................................................................................ 156

Annex C. Characterizing an Interviewee ................................................................................................... 158

C.1 Interviewee and Organization Data ............................................................................................... 158

C.2 Understanding philosophical and psychological viewpoint .......................................................... 159

C.2.1 Green Energy ......................................................................................................................... 159

C.2.2 Heterogeneity ........................................................................................................................ 160

C.3 Expertise and experiences of an interviewee ................................................................................ 161

Annex D. Business/Technical Requirements Questionnaire ..................................................................... 163

D.1 Questions to Consortium R&D Partners, to Consortium Industry Partners and beyond: ............. 163


D.1.1 Business Requirements: ........................................................................................................ 163

D.2 Technical Requirements: ............................................................................................................... 169

Table of Figures

FIGURE 1: IT SPENDING ..... 24
FIGURE 2: IT MARKET SPENDING TREND ..... 25
FIGURE 3: CLOUD MARKET TRENDS ..... 26
FIGURE 4: IOT VALUE CHAIN ..... 27
FIGURE 5: IOT REVENUES ..... 27
FIGURE 6: SEMICONDUCTOR (IC) REVENUE FORECASTS ..... 29
FIGURE 7: IC MARKET BY SYSTEM TYPE ..... 29
FIGURE 8: HPC MARKET FORECASTS ..... 31
FIGURE 9: HPC MARKET REVENUE ..... 32
FIGURE 10: ENERGY CONSUMPTION BY CLOUD COMPUTING EXTRACTED FROM "HOW GREEN IS YOUR CLOUD?" GREENPEACE REPORT (2012) ..... 34
FIGURE 11: GARTNER DATA CENTER CONFERENCE 2012 CONCLUSIONS ..... 36
FIGURE 12: THE TARGET MARKETS OF INTERVIEWEES ..... 77
FIGURE 13: CLASSIFICATION OF THE TYPE OF BUSINESS OF THE INTERVIEWEES ..... 78
FIGURE 14: EXPERIENCE OF INTERVIEWEES ..... 78
FIGURE 15: FAVOURITISM TOWARDS SCENARIOS 1, 2 AND 3 REGARDING ENERGY EFFICIENCY: 1) THE DOOM'S DAY SCENARIO, 2) THE OPTIMIST'S SCENARIO, 3) THE CENTRIST SCENARIO ..... 79
FIGURE 16: INTERVIEWEE'S OUTLOOK ON DEVICE HETEROGENEITY ..... 79
FIGURE 17: REASONS FOR ENERGY SAVING ..... 80
FIGURE 18: RESULTS OF SEVERAL LIKERT SCALES FOR DETERMINING THE IMPORTANCE OF SECURITY, ENERGY/POWER AND PERFORMANCE TO POTENTIAL END USERS ..... 81
FIGURE 19: PROPOSED REFERENCE ARCHITECTURE ..... 86
FIGURE 20: PROGRAMMING MODEL COMPONENT DIAGRAM ..... 90
FIGURE 21: PROGRAMMING MODEL APPLICATION DEVELOPMENT SEQUENCE DIAGRAM ..... 91
FIGURE 22: PROGRAMMING MODEL BUILDING AND DEPLOYMENT SEQUENCE DIAGRAM ..... 92
FIGURE 23: PROGRAMMING MODEL DEPLOYMENT DIAGRAM ..... 92
FIGURE 24: DESIGN-TIME USE CASES ..... 98
FIGURE 25: COMPONENT DEPENDENCY DIAGRAM ..... 99
FIGURE 26: COMPONENT INTERACTION DIAGRAM ..... 99
FIGURE 27: COP COMPONENT DIAGRAM ..... 102
FIGURE 28: SEQUENCE DIAGRAM OF INTERNAL FUNCTIONALITY OF THE CODE OPTIMIZER PLUG-IN ..... 103
FIGURE 29: COP DEPLOYMENT DIAGRAM ..... 104
FIGURE 30: RUNTIME COMPONENT DIAGRAM ..... 105
FIGURE 31: RUNTIME OPERATION SEQUENCE DIAGRAM DURING APPLICATION EXECUTION ..... 106
FIGURE 32: RUNTIME OPERATION SEQUENCE DIAGRAM DURING SELF-ADAPTATION ..... 106
FIGURE 33: RUNTIME DEPLOYMENT DIAGRAM ..... 107
FIGURE 34: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE COMPONENT DIAGRAM ..... 109
FIGURE 35: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE SEQUENCE DIAGRAM ..... 110
FIGURE 36: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE DEPLOYMENT DIAGRAM ..... 111
FIGURE 37: SELF ADAPTATION MANAGER COMPONENT DIAGRAM ..... 112
FIGURE 38: SELF ADAPTATION MANAGER SEQUENCE DIAGRAM ..... 113
FIGURE 39: SELF ADAPTATION MANAGER DEPLOYMENT DIAGRAM ..... 114
FIGURE 40: ENERGY MODELLER COMPONENT DIAGRAM ..... 116
FIGURE 41: ENERGY MODELLER SEQUENCE DIAGRAM ..... 117
FIGURE 42: ENERGY MODELLER DEPLOYMENT DIAGRAM ..... 118
FIGURE 43: MONITOR INFRASTRUCTURE COMPONENT DIAGRAM ..... 122
FIGURE 44: COLLECTING INFORMATION FROM A DEVICE SEQUENCE DIAGRAM ..... 122
FIGURE 45: COLLECTING INFORMATION FROM A NODE SEQUENCE DIAGRAM ..... 123
FIGURE 46: MONITORING INFRASTRUCTURE DEPLOYMENT DIAGRAM ..... 123
FIGURE 47: DEVICE SUPERVISOR COMPONENT DIAGRAM ..... 125
FIGURE 48: DEVICE SUPERVISOR SEQUENCE DIAGRAM ..... 126
FIGURE 49: DEVICE SUPERVISOR DEPLOYMENT DIAGRAM ..... 127
FIGURE 50: OVERVIEW OF THE EMBEDDED USE-CASE PLATFORM ..... 129
FIGURE 51: DEVICE EMULATOR COMPONENT DIAGRAM ..... 132
FIGURE 52: DEVICE EMULATOR SEQUENCE DIAGRAM ..... 133
FIGURE 53: DEVICE EMULATOR DEPLOYMENT DIAGRAM ..... 133
FIGURE 54: ARCHITECTURE SUPPORT FOR TRAINING APPLICATION POWER PROFILES AND DEPLOYMENT ..... 135
FIGURE 55: ARCHITECTURE SUPPORT FOR SELF-ADAPTATION AT RUNTIME ..... 136
FIGURE 56: TANGO ARCHITECTURE CRITICAL PATH ..... 138
FIGURE 57: TANGO ARCHITECTURE INTERFACE WORK PLAN ..... 139

Table of Tables

TABLE 1: COMPARISON CHART AMONG EXISTING PLATFORMS AND TARGET TANGO OBJECTIVES ..... 47
TABLE 2: OVERVIEW OF TECHNICALLY RELEVANT SURVEY QUESTIONS ..... 75
TABLE 3: QUESTIONNAIRE TRACEABILITY MATRIX ..... 76
TABLE 4: PROGRAMMING MODEL REFERENCED REQUIREMENTS ..... 89
TABLE 5: PROGRAMMING MODEL BASELINE TECHNOLOGY ..... 89
TABLE 6: PROGRAMMING MODEL API ..... 93
TABLE 7: REQUIREMENTS AND DESIGN TOOLING REFERENCED REQUIREMENTS ..... 97
TABLE 8: REQUIREMENT & DESIGN TOOLING (R&D-T) BASELINE TECHNOLOGY ..... 98
TABLE 9: COP REFERENCED REQUIREMENTS ..... 101
TABLE 10: COP BASELINE TECHNOLOGY ..... 102
TABLE 11: RUNTIME REFERENCED REQUIREMENTS ..... 104
TABLE 12: RUNTIME BASELINE TECHNOLOGY ..... 105
TABLE 13: RUNTIME API ..... 107
TABLE 14: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE REFERENCED REQUIREMENTS ..... 108
TABLE 15: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE BASELINE TECHNOLOGY ..... 109
TABLE 16: APPLICATION LIFE-CYCLE DEPLOYMENT ENGINE MANAGER API ..... 111
TABLE 17: SELF-ADAPTATION MANAGER REFERENCED REQUIREMENTS ..... 111
TABLE 18: SELF-ADAPTATION MANAGER BASELINE TECHNOLOGY ..... 112
TABLE 19: SELF ADAPTATION MANAGER API ..... 114
TABLE 20: ENERGY MODELLER REFERENCED REQUIREMENTS ..... 115
TABLE 21: ENERGY MODELLER BASELINE TECHNOLOGY ..... 116
TABLE 22: ENERGY MODELLER API ..... 120
TABLE 23: MONITORING INFRASTRUCTURE REFERENCED REQUIREMENTS ..... 120
TABLE 24: MONITORING INFRASTRUCTURE BASELINE TECHNOLOGY ..... 121
TABLE 25: MONITORING INFRASTRUCTURE API ..... 124
TABLE 26: DEVICE SUPERVISOR REFERENCED REQUIREMENTS ..... 124
TABLE 27: DEVICE SUPERVISOR BASELINE TECHNOLOGY ..... 125
TABLE 28: DEVICE SUPERVISOR MANAGER API ..... 127
TABLE 29: DEVICE EMULATOR REFERENCED REQUIREMENTS ..... 131
TABLE 30: DEVICE EMULATOR BASELINE TECHNOLOGY ..... 132
TABLE 31: DEVICE EMULATOR API ..... 134


Terms and abbreviations

EC European Commission

API Application Programming Interface

APU Accelerated Processing Unit

ASIP Application-specific instruction set processor

BMC Baseboard Management Controller

COMPSs A programming model for distributed infrastructures.

COP Code Optimizer Plug-in

CPS Cyber-Physical Systems

CPU Central Processing Unit

CUDA Programming Model for NVIDIA GPUs

DC Data Center

DSP Digital Signal Processing

FPGA Field Programmable Gate Array

GPGPU General-Purpose Computing on Graphics Processing Units

GPU Graphics Processor Unit

HPC High Performance Computing

HPD Heterogeneous Parallel Devices

HPA Heterogeneous Parallel Architectures

HSA Heterogeneous System Architectures

HW Hardware

IC Integrated Circuit

IDE Integrated Development Environment

IoT Internet of Things

IPMI Intelligent Platform Management Interface

KPI Key Performance Indicator


JVM Java Virtual Machine

MARCOMM Marketing and Communication

MAPE Monitor, Analyze, Plan, Execute

MPSoC Multi Processor System on Chip

OEM Original Equipment Manufacturer

OmpSs Programming Model based on OpenMP.

OpenCL The open standard for parallel programming of heterogeneous systems

OpenMP API specification for parallel programming

OpenSPL Open Spatial Programming Language

PM Programming Model

QEMU Open Source Processor Emulator

QoP Quality of Protection

QoS Quality of Service

RJMS Resource and Job Management System

SAM Self-Adaptation Manager

SDK Software Development Kit

SIMD Single Instruction, Multiple Data

SLURM Simple Linux Utility for Resource Management

SoC System on Chip

TANGO Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation


Executive Summary

Energy efficiency is at the heart of the EU’s Europe 2020 Strategy for smart, sustainable and inclusive growth as well as for the transition to a resource efficient economy.

The project Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO) aims to simplify the way developers approach the development of next-generation applications based on heterogeneous hardware architectures, configurations and software systems, including heterogeneous clusters, chips and programmable logic devices.

This deliverable, D2.1 TANGO Requirements and Architecture Specification Alpha Version, takes the first step towards the TANGO vision by analysing the business needs in order to understand how best to shape the TANGO architecture and increase the chance of successful industry adoption.

To initiate the business requirements gathering, a market analysis has been performed. This preliminary analysis has looked at current Heterogeneous Parallel Architectures, including needs and competitors as well as future trends, and serves as a basis for positioning the TANGO project results in the market.

The state of the art in the central technology areas of TANGO is also reviewed. The review gives a thorough definition of the state of the art, an indication of the envisaged progress beyond it, and the baseline for the project's research. It identifies challenges and action items that have to be handled throughout the course of the project to achieve the objectives defined in the Description of Work. These tasks and challenges give an insight into the project's expected outcomes.

Finally, this document contains a high-level view of TANGO's main architectural components, an identification of the interfaces among them, and an analysis of communication patterns. The components are described alongside the respective business and technical requirements that need to be fulfilled in Year 1. The aim of the architecture is to control and abstract underlying heterogeneous hardware architectures, configurations and software systems, including heterogeneous clusters, chips and programmable logic devices, while providing tools to optimize various dimensions of software design and operation (energy efficiency, performance, data movement and location, cost, time-criticality, security, and dependability on target architectures). To validate the architecture, this deliverable illustrates how the architectural components can be used in two use cases provided by the industrial partners DELTATEC and Bull.
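To make the kind of trade-off the architecture targets concrete, the following is a minimal, purely illustrative sketch of selecting, among candidate implementations of the same workload, the one with the lowest estimated energy (power x time) that still meets a time-criticality constraint. All names, device labels, power and timing figures are invented for this example; this is not TANGO code nor a TANGO API.

```cpp
// Illustrative only: pick the candidate implementation that minimises estimated
// energy (E = P * t) while respecting a deadline; fall back to the fastest option.
#include <cstdio>
#include <string>
#include <vector>

struct Implementation {
    std::string device;   // hypothetical target, e.g. "CPU", "GPU", "FPGA"
    double runtime_s;     // predicted execution time in seconds
    double avg_power_w;   // predicted average power draw in watts
    double energy_j() const { return runtime_s * avg_power_w; }
};

const Implementation* choose(const std::vector<Implementation>& options, double deadline_s) {
    const Implementation* best = nullptr;     // lowest energy within the deadline
    const Implementation* fastest = nullptr;  // fallback if nothing meets the deadline
    for (const auto& impl : options) {
        if (!fastest || impl.runtime_s < fastest->runtime_s) fastest = &impl;
        if (impl.runtime_s <= deadline_s && (!best || impl.energy_j() < best->energy_j()))
            best = &impl;
    }
    return best ? best : fastest;
}

int main() {
    std::vector<Implementation> options = {
        {"CPU",  12.0,  95.0},   // 1140 J estimated
        {"GPU",   3.0, 220.0},   //  660 J estimated
        {"FPGA",  5.0,  40.0},   //  200 J estimated
    };
    const Implementation* pick = choose(options, 6.0);  // hypothetical 6-second deadline
    std::printf("Selected %s: %.0f J estimated\n", pick->device.c_str(), pick->energy_j());
    return 0;
}
```

In a realistic setting the predicted time and power figures would of course come from monitoring and energy modelling rather than being hard-coded constants as they are in this sketch.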

The main body of this deliverable is organised into three parts:

Part 2: Market Analysis

Part 3: State of the Art

Part 4: Requirements and Architecture Specification


Part 1. Introduction

This document is divided into three main parts. The first, the Market Analysis (Part 2), studies the current market status in relation to the project; the second, the State of the Art (Part 3), updates the state-of-the-art study written for the project proposal to reflect the current status; finally, the last part (Part 4) introduces the first-year requirements and architecture for TANGO and the reasoning behind them.

Due to the length of the document, we felt it necessary to provide this introduction. It summarises the information presented for the Market Analysis, the State of the Art, the Requirements, and the Architecture. All of these summaries contain references to the rest of the document, so that the section giving more detail on a specific topic can be found easily.

The rest of this section is organised as follows: Subsection 1.1 contains the summary of the Market Analysis; Subsection 1.2 contains a short summary of the SotA related to TANGO; finally, Subsection 1.3 contains a short summary of the requirements elicitation and the first-year architecture for TANGO.

1.1 Introduction to Market Analysis

This part of the document, in conjunction with the State of the Art analysis, intends to provide a perspective on the current market status in the IT areas related to the project. It has also been key to establishing an initial list of TANGO stakeholders, relevant players, and main competitors or substitutive solutions. All of this will serve as a market vision and a starting point for later impact-creation activities.

This part analyses IT and market trends, and their context, in the various IT areas in which TANGO can later create impact. The analysis also covers other factors that influence the development of the market and the potential impact of the project results.

There are several markets and areas in which TANGO can have an impact and which must therefore be analysed and followed during the project's life. For TANGO, it is the confluence of several evolving and colliding trends and markets that matters (analysed in full detail in section 2.2.2.1):

- The IT Market in general: The IT market is expected to continue growing slowly. The main drivers found in the market are that cloud services keep growing due to rising demand, and that a major transformation is under way in industry and in vertical industries. It is also noted that the main areas of spending include the Internet of Things (IoT), High Performance Computing (HPC) and energy-saving and carbon-reducing technologies. The main inhibitors identified in the market are the poor worldwide economic situation and fluctuating exchange rates.

- Software industry: The software industry is more important than ever in strategic transitions and in enabling greater degrees of digitalization for businesses, even though it has itself been immersed in recent years in a transition that is provoking disruptive changes in the way software services and technologies are developed, deployed, accessed and used. The main driver is cloud computing, the key pillar of this transformational process, as software vendors acquire and provide applications and infrastructure technology to support the cloud and the Internet of Things (IoT) movement. This generates a temporary competitive advantage for new entrants and strain on the incumbents.

- Cloud Computing: In terms of new ways for organizations to adopt the cloud model, IDC also predicts that the main driver is organizations adopting multi-cloud and hybrid-cloud solutions. More than 65% of enterprises will commit to hybrid cloud technologies before 2016, vastly driving the rate and pace of change in IT organizations. Within the hybrid cloud, two trends emerge: first, more than half of these organizations will purchase new or updated workload-aware cloud management solutions; second, by 2017, 35% of new applications will use cloud-enabled continuous delivery and DevOps life cycles for faster rollout of new features and business innovation.


- IoT, CPS, and Embedded Systems: The market drivers found include a growing number of smart-city initiatives, an increasing role for system integrators, and more demanding applications and user experiences that span the mesh of devices. End users' expectations are increasing, and in this context companies must be agile and meet demands by using the right enabling technologies, such as virtualization, converged infrastructure, cloud, IoT, Big Data and networking. For organizations, the top drivers for creating an IoT strategy are increased employee productivity, faster time to market, cost reduction, and improved supply chain and logistics. On the inhibitor side, methodologies shifted in 2015 from the purely theoretical to being more anchored in early-adoption performance gains, and the top challenges for organizations adopting IoT are security, upfront costs, and ongoing costs.

- Semiconductors & Heterogeneous Mobile Processor Hardware Markets: The semiconductor industry lives a steady growth trend that will reach double digits in 2016, followed by a cyclical downturn to -1% in 2017, having a strong correlation with global economy and consumer expenditure. A trend that has had an enormous impact on the Integrated Circuit market (IC market) is the unexpected growth of tablets, smartphones and consumer products, especially the expected growth of the IoT and the Fog Computing markets. In terms of European market, the same report concludes that Europe’s IC industry is healthy in terms of being successful at “More than Moore”, but still remains as not significant player in terms of global volume, being stuck at 7% of world‐wide production. EU still needs to gather better traction in the IC market, and there are several EU and national initiatives launched to increase Europe’s market‐share and competitiveness. The advent of Mobile and Internet Services during the last years and the battle from not only device vendors, but also from semiconductor vendors to serve the more demanding needs expected for mobile devices. These demands started to push more and more functionalities into the same chip, and semiconductor industries realized the concept of single System on Chip (SoC), which then brought the concept of Heterogeneous Mobile Processing (HMP) and computing, in which one single chip integrates together various components such as processors, connectivity, digital signal processors, graphics processing units solutions are integrated on a single chip, they work simultaneously to improve the performance of the device. In addition to showcasing achievement, part of this process requires opening up the technology to an even broader basis of users, making it possible for less and less specialized programmers to use – effectively hiding complexity through an intelligent programming model. Therefore, Software also plays a relevant role in the market. In order to utilize HMP hardware components, the market requires the introduction of programming languages and tools used to develop for these heterogeneous mobile processing and computing devices and platforms. The software components consist of the programming languages, and middleware tools such as Open CL, C/C++, Open VX among others. The main driver is the growing demand for better performance, fast computational speed and power saving are some of the factors contributing towards the development of HMP and computing solutions, clearly providing the means for ramping up the IoT/Fog1 context and consumer electronics segment. Inhibitors in the market include factors such as the consolidation trend has greatly reduced the number of IC manufacturers, which will lessen oversupply. The strong correlation of the market with global economy spending is also another relevant factor

- HPC market: The market will continue to grow at a solid rate. Within HPC, Big Data continues to expand and is perhaps the most transformative trend in the HPC world. Cloud adoption for HPC purposes is probably higher than anyone might have thought, with more users taking advantage of HPC and more potential adoption options in the future. Software issues continue to grow, requiring control of computing resources to be entirely automated by software in pursuit of the goal of computing delivered in the form of "IT as a service". The vendor ecosystem will continue evolving as new heterogeneous hardware is adopted by the HPC market, with increasing adoption rates for non-x86 processors, FPGAs, GPUs and other accelerators expected to alter the vendor landscape. There is a growing influence of the datacenter in the IT chain: from consolidation to expanding scale to embrace growing demands, HPC can be a key factor in a variety of data center strategies. HPC provides the means to solve some of the scale limitations exposed by current data centers, scaling computing capacity and reducing the energy versus computing capacity ratio. Several main drivers are fostering this solid expansion. First, reduced pricing of entry solutions is attracting new users, including SMEs, and consolidating scale among established players. Second, the emergence of new technologies that provide more cost-effective solutions for extreme computing (e.g. GPUs/FPGAs) is now at the top of the hype cycle. Third, the global race towards "exascale computing" is propelling sales of high-end supercomputers; just as important, more SMEs and research organizations are exploiting HPC servers for high-performance data analysis and advanced simulations used in everyday operations. One inhibitor is that the high-performance business analytics and data-intensive simulations that firms adopting HPC for the first time want to run are too complex and time-critical for enterprise server technology to handle effectively alone.

1 IoT: the Internet of Things is the network of physical objects (devices, vehicles, buildings and other items) embedded with electronics, software, sensors, and network connectivity that enables these objects to collect and exchange data (https://en.wikipedia.org/wiki/Internet_of_Things). Fog Computing: an architecture that uses one or a collaborative multitude of end-user clients or near-user edge devices to carry out a substantial amount of storage (rather than storing primarily in cloud data centers), communication (rather than routing over the Internet backbone), and control, configuration, measurement and management (rather than relying primarily on network gateways such as those in the LTE core network) (https://en.wikipedia.org/wiki/Fog_computing).

- Data Center market: The main driver is that organizations need to address more effectively the almost infinitely growing demand coming from mobile, cloud, analytics and IoT services, which is shifting workload allocation in corporate datacenters and driving greater use of service provider datacenters. At the same time, datacenter operations have to deal with lower budgets and energy constraints (pure economics), as well as with CO2 emissions and environmental concerns. Organizations are also pursuing the goal of IT delivered in the form of "IT as a service", which requires more automated control of IT resources, through proper software and middleware layers, for better, more agile and more flexible operation of disparate systems and technologies across multiple data centers. In terms of inhibitors, while IT executives are now immersed in the trend of sourcing and deploying IT infrastructure in new ways, datacenter executives are concerned with limiting factors such as power and cooling, datacenter life-cycle management, floor space, and staffing. Within this market, the Data Center Infrastructure Management (DCIM) segment has been undergoing a huge transformation in recent years, with trends embracing DevOps and Agile methodologies; this represents a major shift in the industry and a high pace of transformation of infrastructure management within organizations and data centers. The most important drivers and forces pushing the market are availability and sustainable IT.

- System Integration Market: the market will continue to grow as businesses continue to search for open and distributed systems and architectures. The main driver is that organizations are moving to eradicate the heterogeneity, multiplicity, and silos created by the myriad of applications and infrastructures in use, so that they can truly cooperate at one pace for the organization, with cost-effective and unified solutions for managing IT infrastructures and application software available globally (distributed or centralized). Inhibitors restraining the market are the challenges organizations face while integrating various systems together to become more flexible and productive, and the extremely high adoption rate of new technology approaches in various areas of the enterprise, which requires better management of IT infrastructures, services and data.

In terms of pure business-driven factors (covered in section 2.2.2.2), the document has identified various major aspects that require attention:

- In the global economy, budgets are still tight, and IT budgets are no different. This is especially visible in the case of Data Centers or large supercomputing farms. In these cases energy is at the top of the cost list, forcing organizations to search for different approaches to address power/cost reductions.

- Organizations are Going Green, driven either by Corporate Social Responsibility roadmaps or by an effort to create a Green Badge for marketing purposes. Companies are also using Green as positioning, creating new services for customers that enable the user to choose from a list of potential green options, and even applying gamification techniques to develop their opportunities.

- Companies now seek to reduce cost and/or enable greener policies, which might come from using greener energy sources or from implementing energy reduction plans. These plans might consider energy optimizations from various angles, and now include more ambitious and holistic approaches in which applications, not only workload allocation managers, also take part in their own energy optimization. This optimization, even advancing in small steps, takes advantage of the law of large numbers, given the thousands of servers that datacenters manage. The same optimizations can be pursued for HPC or supercomputing farms. Another related aspect is datacenters seeking new hardware configurations that provide better performance and computing power vs. cost. This turns out to be an old way of approaching problems: continuously searching for new potential candidate technologies. Today, old CPU-based architectures and Moore’s Law are almost reaching a limit in clock rate and computing power due to power/thermal and memory issues related to operation at higher frequencies, which can now be addressed by using other hardware architectures such as GPU, DSP, FPGA, SoC and multi-core configurations.

- Organizations now also have to deal with, and manage, risks associated with their environmental performance, raising a critical role for optimizing energy consumption and sources, due to a growing trend of government initiatives increasing environmental safety requirements and societal pressure on organizations to adopt green solutions. This makes clear the need for proper strategies to cut the environmental implications and costs of the digital economy.

- And finally, a very important aspect is the emergence of new business models and applications in the IoT/CPS/Embedded Systems/Fog Computing domains. The IoT can be highly disruptive for individuals, enterprises and whole industries in different ways. First, it has the potential to reduce costs, lower organizational knowledge barriers, and integrate and scale today’s infrastructures to meet future demands and futuristic visions. Second, IoT allows disruptive individual innovation in products, services, and solutions. And third, IoT opens up possibilities for new business models.

In terms of pure technical market-oriented factors (covered in section 2.2.2.3), the document has identified the following aspects:

- For datacenters, energy is an issue, and a very expensive one. The workload carried by datacenters is also rapidly increasing due to the high growth in demand for Cloud, IoT and Big Data. As energy costs grow with larger energy consumption and increasing operational costs, optimizing energy and improving the energy-efficiency design of the facilities will be at the top of the priority list.

- In terms of HPC Data Center Trends, the document highlights that HPC enables new ways to solve current data center scalability issues in several areas such as computing capacity, growing space limitations, reduced IT budgets, and cooling and energy constraints.

- In terms of HPC Hardware Trends, the market is “Going Mobile” by using processor (and co-processor) technologies initially envisioned for mobile contexts but which now provide valuable solutions for other contexts.

- In terms of HPC Middleware Trends, the document highlights two relevant trends: first, the emergence of HPC Cloud, or cloud virtualization in an HPC environment; and second, Big Data. Both need new approaches to proper middleware software in the form of Workload Managers and Resource and Job Scheduling Management Systems (RJMS) that take into account hardware heterogeneity, energy and performance.

- In the IoT and Fog Computing contexts, the digital mesh of smart devices needs to run at speeds which are now being achieved by using novel heterogeneous hardware architectures, especially SoCs and FPGAs, which provide significant gains in computing power and efficiency. Another factor is that these devices are packed into small spaces and face demanding energy constraints.

The document also covers regulations in the areas of Energy and IoT (section 2.2.2.4). We provide an initial outlook on regulation because it is expected to become relevant in the future. However, this section does not claim to be an exhaustive analysis of regulatory conditions, since that is outside the scope of the project and the task itself. The section covers the European Union 2020 Energy Strategy, which defines the EU's energy priorities between 2010 and 2020; the EU Emissions Trading System – EU ETS (European Commission Climate Change Policies); the European Code of Conduct for Data Centres Programme; the Community Energy Savings Programme (CESP) and the Green Deal; and the UK CRC Energy Efficiency Scheme (CRC).

Section 2.2.3 covers the first assessment, from the business analysis perspective, of the TANGO value proposition generated by the analysis carried out by the consortium and included in the D7.1 Dissemination and Communication Plan (M3) document. It provides an initial step to position the project results, and the alliance benefits, in the market, focused on solving real problems found in the market. First, it covers the dimension of Hardware Heterogeneity (notably CPU+GPU and CPU+FPGA on chip, CPU+FPGA on board/chip, and also GPU clusters). All of these are difficult to program, and a range of tools exists. They require complex global architectures and architecting processes that make them less attractive. Second, it covers the dimension of Resource Optimization, showing that these platforms require optimization in performance, security, criticality and power consumption, and that this optimization also depends on the heterogeneous hardware and tools used.

The ecosystem includes several stakeholders grouped in several categories: Hardware Vendors (designers, manufacturers, resellers); Software Lifecycle (software developers and designers; system engineers; system integrators; software middleware or tools vendors on workload management); IT Community (software engineering; other programming model approaches; heterogeneous systems research); HPC, IoT/CPS, Smart Anything Everywhere, and Big Data; Data Centers (asset management – energy management and environmental sustainability management; software middleware or tools vendors on data center asset management and Data Center Infrastructure Management); Business & Influencers (analysts; policy makers); Technology Landscape Influencers (standardization bodies (OpenCL, OpenMP, OpenSPL, HSAIL) and industry groups; research projects and initiatives); Open Source Communities (SLURM, OscaR, etc.).

The document concludes by including (in Annex A) the results and conclusions extracted from the work done in the internal market and value workshop. Also (in Annex B), the document presents various trends and requirements grouped in three categories: energy consumption/optimization; development frameworks; and technical aspects. These were extracted from the market analysis in order to be used in the definition of the questions that have been included in the engagement process with stakeholders, now underway. The intention is to validate the trends and requirements that were found in this early-stage market analysis.

1.2 Introduction to State of the Art

Evolving the State of the Art that can be found in the DoW of the project, Part 3 of the document examines the current SotA for three fundamental steps in the application lifecycle: development of the application, deployment of the application, and operation of the application. Considering these three steps, we decided to study the current SotA for the following topics:

In Architecture Support for Low Power Computing (section 3.3), we studied several proposed architectures to support low power computing, including several previous European projects. We expect that the TANGO architecture (subsection 4.4) will go beyond the current state of the art by tackling self-adaptation of both heterogeneous parallel devices and the applications that make use of them, using wider optimization criteria (energy consumption, cost, time criticality). TANGO will not only support these architectures at runtime, but will also study the deployment process.

In Handling Quality Properties in the Software Development Life Cycle for Customised Low-Power Heterogeneous (section 3.4), we studied how to model hardware and software to express requirements that must be met by the mapping, as well as the relevant terms of the objective function to optimize. We expect that the requirement modelling technique explored in TANGO will make it possible to express variability in quality-property trade-offs and to identify requirement patterns. A second part focuses on how to optimize the mapping of software onto hardware.
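To make the optimization view above concrete, a purely illustrative sketch (not the formulation actually adopted by TANGO) is to score a candidate mapping m of software components onto hardware devices with a weighted objective, where E(m), T(m) and C(m) denote the energy, execution time and cost of the mapping, and the weights and bounds are assumed placeholders:

\min_{m} \; w_E\,E(m) + w_T\,T(m) + w_C\,C(m) \qquad \text{subject to} \qquad T(m) \le T_{\max}, \;\; E(m) \le E_{\max}

Hard requirements (e.g. time-criticality bounds) enter as constraints, while softer quality-property trade-offs are captured by the weights.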

In Programming Models and Run-Time Management Techniques for Heterogeneous Parallel Architectures (section 3.5), we study how, in recent years, the complexity of hardware architectures has been tackled through programming models. In TANGO we will use a combination of StarSs models to provide users with a highly productive programming model to develop applications for clusters of heterogeneous parallel nodes.
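To give a flavour of the task-based style of the StarSs family, a minimal sketch follows. It is written with standard OpenMP 4.x task dependences, which play the role of the in/out data-directionality annotations of StarSs/OmpSs, so that it compiles with a stock compiler (e.g. gcc -fopenmp); the block count, block size and variable names are illustrative assumptions, not part of the TANGO programming model.

/* Minimal illustrative sketch of task-based programming in the StarSs style,
 * expressed with standard OpenMP 4.x task dependences. Illustrative only. */
#include <stdio.h>

#define N  4       /* number of blocks (illustrative) */
#define BS 1024    /* block size (illustrative) */

static double a[N][BS], b[N][BS];

int main(void) {
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i++) {
        /* producer task: writes block a[i] */
        #pragma omp task depend(out: a[i])
        for (int j = 0; j < BS; j++)
            a[i][j] = i + j;

        /* consumer task: reads a[i], writes b[i]; the runtime derives the
         * task graph from these declared accesses and schedules ready tasks
         * on the available resources */
        #pragma omp task depend(in: a[i]) depend(out: b[i])
        for (int j = 0; j < BS; j++)
            b[i][j] = 2.0 * a[i][j];
    }
    /* all tasks complete at the implicit barrier of the parallel region */
    printf("b[0][0] = %f\n", b[0][0]);
    return 0;
}

The programmer only declares what each task reads and writes; the runtime extracts the parallelism, which is the property that such models exploit to target clusters of heterogeneous parallel nodes.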

In Modelling Tools for Prototyping with Software Emulation/Simulation of Heterogeneous Parallel Architectures (section 3.6), we look at the current state of modelling and simulating the underlying hardware for the purpose of fast application prototyping. We believe that no project before TANGO has focused its simulation study on the field of energy consumption reduction.

In Monitoring of Heterogeneous Architectures (section 3.7), we look at the SotA techniques for monitoring energy consumption in CPUs, FPGAs and GPUs. The project will not focus on evolving these technologies, but we need the best possible accuracy in energy consumption per job if we want to be able to reduce the total energy consumption of an application without performance impact.
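As a hedged illustration of the kind of coarse-grained measurement involved, the sketch below samples the cumulative CPU-package energy counter that many Linux hosts expose through the powercap/RAPL sysfs interface and derives an average power figure. The sysfs path, the one-second window and the single-socket assumption are assumptions of the example; GPUs or FPGAs would require vendor-specific counters or external power meters instead.

/* Illustrative sketch: sample a cumulative RAPL energy counter (microjoules)
 * twice and report average power. Path and interval are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long long read_energy_uj(const char *path) {
    FILE *f = fopen(path, "r");
    long long uj = -1;
    if (f == NULL || fscanf(f, "%lld", &uj) != 1) {
        perror("reading energy counter");
        exit(EXIT_FAILURE);
    }
    fclose(f);
    return uj;  /* cumulative energy in microjoules */
}

int main(void) {
    /* assumed location of the package-0 counter; may differ per machine */
    const char *path = "/sys/class/powercap/intel-rapl:0/energy_uj";
    long long e0 = read_energy_uj(path);
    sleep(1);                              /* 1-second sampling window */
    long long e1 = read_energy_uj(path);
    /* counter wrap-around is ignored in this simple sketch */
    printf("average package power: %.2f W\n", (e1 - e0) / 1e6);
    return 0;
}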

In Workload Management Techniques for Heterogeneous Architectures (section 3.8), we study the current status of Resource Management and Job Scheduling in traditional HPC systems. The objective of TANGO is to enhance SLURM to facilitate the deployment of jobs onto heterogeneous architectures.

Finally, in Other: Distributed Systems, Security, Networking, and Data Management (section 3.9), this year we focus on security aspects for heterogeneous applications.

1.3 Introduction to architecture

Following the analysis of the business and technical requirements to understand how to best shape the TANGO solution for an increased chance of successful industry adoption, the specification of the architecture is proposed. The development of this reference architecture covers the whole application development life cycle and includes tools to assist engineers with fast prototyping from requirements/design modelling using software emulation and hardware-in-the-loop prototypes, as well as tools to assist engineers with the specification of formal critical behaviours of their applications and of quality-property target levels for other, less critical behaviours. The architecture specification also includes the architectural roles, scope and interfaces of the architectural components and communication patterns. The commonalities between the envisioned use cases have been a topic of significant attention following the interaction between the business goals analysis, the technical requirements elicitation and the architecture definition. These requirements will assist with the development of high-quality software and the identification of adequate design approaches to guarantee these factors.

The architecture comprises standard IDE, middleware and infrastructure layers and supports components such as the Programming Model, the Application Lifecycle Deployment Engine, the VM manager and the Heterogeneous Parallel Device Cluster. The design of the architectural components is described in detail; some of them will require specific extensions in order to deal with energy efficiency/low-power management. In addition, the architecture also requires specific components to be developed from scratch, such as the Energy Modeller and the Device Emulator. The rationale and functionalities of all these components are explained in this document.

Part 2. Market Analysis

2.1 Introduction

2.1.1 About the market analysis

This second part analyses IT trends, market trends and context in various IT areas in which TANGO can later create impact. The analysis covers various other factors that influence the development of the market and the potential impact of the project results.

The purpose of this deliverable is to provide an initial general overview of the current status of markets related to the TANGO project. This includes aspects such as: vendors of HPA systems (CPU/GPU/FPGA/DSP/embedded chip market); HPC vendors; software development vendors (software vendors/integrators, workload/resource/scheduler management & system administration); efficient management of heterogeneous computing clusters with HPAs; HPC workload management (pure HPC & hybrid HPC/BigData/Cloud infrastructures); software for data centers, HPC and computational farms; optimization tool suites and techniques (energy/power/performance); and computationally intensive end-user applications.

This document has been created in synchronization with the exploitation task, to define the initial vision of target stakeholders, value definition, and the impact intentions roadmap on which the dissemination and communication plan is based.

It is also important to highlight that this analysis and its conclusions have been created with the intention of becoming a living vision, evolving and incorporating changes as the market evolves and matures. This will be achieved by carrying out a continuous market watch.

The conclusions extracted from this analysis are expected to serve as the main guidelines for the dissemination and communication team, as well as for the exploitation and sustainability team.

2.1.2 Market analysis structure

The market analysis has been structured in two main sections, each providing details on the specific topics related to the market analysis for TANGO.

- Section 2.2 provides a detailed analysis of the market and contains the following subsections:

o Section 2.2.1 establishes the baseline and introduction of the market related to TANGO scope.

o Section 2.2.2 analyses all related markets in detail. This section includes the pure market analysis in Section 2.2.2.1; the analysis of trends from the organizational perspective in Section 2.2.2.2; the analysis of important trends imposed by the technical perspective in Section 2.2.2.3; and finally, the analysis of legislation and regulations that might affect the scope of TANGO results for energy (Section 2.2.2.4) and IoT (Section 2.2.2.5).

o Section 2.2.3 includes the results and main conclusions extracted from the initial business analysis carried out by the consortium. The section includes the vision of current problems solved by TANGO; the list of stakeholders; and the current vision of competitors and other relevant players in the market (Section 2.2.3.1). Also related to this business analysis, the document includes as annexes the results on markets and market influencers (Annex A), and the inputs used to define the aspects to validate in the stakeholder engagement questionnaires for market aspects (Annex B).

- Section 2.3 provides final conclusions of the analysis.

2.2 Initial Market Analysis

2.2.1 Market Introduction

There are several markets and areas in which TANGO may have an impact and which, therefore, must be analysed and followed during the project's life. For TANGO, it is the confluence of various colliding trends and evolving markets which is important:

- the emergence of markets like the Internet of Things (IoT) and Cyber Physical Systems (CPS), and the evolution of the existing, but related, Embedded Systems market;

- the rise of new markets out of the high demands of the Mobile and Global Internet Services markets, creating the new market of Heterogeneous Mobile Processing (HMP) and computing for semiconductor vendors, enabling a myriad of new potential with powerful new computing devices and changing the rules from the well-established CPU world to a heterogeneous-hardware IT world;

- the evolution of the High Performance Computing (HPC) market, adopting other forms of powerful computing and middleware software in the quest for better performance vs. cost and power consumption ratios;

- the consolidation of trends in the needs of data center management and operations, which must operate more efficiently while reducing costs and power consumption, driven by high demands from businesses and by regulations;

- and the continuous growth of next-generation applications and domains that require computationally intensive environments, like Big Data (& near-real-time Big Data analytics); eScience (scientific/research); and highly demanding computing power applications (HPC apps, image compression for space applications, neural networks, etc.).

It is important to highlight here that these are all direct factors that drive influence. However, we want to highlight one more, which is among the most relevant. This refers to software evolution and the way the IT industry, and more concretely the industry related to the development lifecycle, evolves. This includes how the industry reacts to these changes and evolves to provide a way to develop applications ready for the new world. This is a world that needs to create applications using existing, and expected, hardware, to create new business opportunities or to support current ones more efficiently.

Paul Valery once said2 that “The trouble with our times is that the future is not what it used to be.” Nowhere is this truer than in the IT landscape. Every year, we witness market evolution, changes in trends, and collisions of once-distant technologies. Surely, in the case of TANGO, there is truth in that. The future of several technical and business-driven areas is leveraging an opportunity for TANGO to create impact in the market, taking advantage of a future that is now definitely not what it used to be, or at least is heading in that direction.

Therefore, in the next sections, we analyse various adjacent and distant markets, providing the conclusions extracted from our initial work.

2 http://www.goodreads.com/quotes/87454-the-trouble-with-our-times-is-that-the-future-is

2.2.2 Markets, Contexts and Trends

We now present, and analyse in more detail, the markets, trends and specific contexts relevant for TANGO.

2.2.2.1 Markets

2.2.2.1.1 IT Market in general

The IT market is expected to continue growing slowly. The forecasts for the last quarter of 2015 were updated to a negative 5.8%, instead of the previously forecasted negative 4.9%, in Gartner's forecast for worldwide dollar-valued IT spending growth in 2015. The analyst firm highlighted currency fluctuations as responsible for the change; however, it maintained its growth forecast for 2015, predicting it to be virtually unchanged at 2.4%.

In terms of categories, the Gartner Worldwide IT Spending Forecast3 predicts the following indicator behaviours for various major technology trends across the devices, IT services, data center systems, enterprise software, and communication and telecom services markets, and across geographies. For the next period, Gartner predicts4 that worldwide IT spending will total $3.54 trillion in 2016, which represents a slight increment over 2015 spending of $3.52 trillion.

Figure 1: IT Spending

From the Gartner analysis, we should highlight that only two segments are forecast to decline. First, the devices market (PCs, ultramobiles, mobile phones, tablets and printers) is forecast to decline 1.9%; however, ultramobile premium devices are expected to drive the PC market forward with the move to Windows 10 and Intel Skylake-based PCs in 2016. Second, telecom services spending is projected to decline 1.2% in 2016, impacted by the abolition of roaming charges in the European Union and parts of North America, which outpaces the growth in mobile voice and data traffic.

On the contrary, other segments continue to grow for various reasons. One of the main reasons is the accelerating momentum in cloud infrastructure adoption and buyer acceptance of the cloud model. This impacts the IT services market, whose expenditure is expected to return to growth in 2016, up 3.1% from 2015 and now projected to reach $940 billion in 2016. Data center systems' market expenditure is projected to increase 3.0% from 2015, also due to an increase in demand, which is expected to remain strong through 2016. An important fact to highlight is that the server market segment has seen stronger-than-expected demand from the hyper-scale sector, which has lasted longer than expected and has not followed the usual pattern of demand spikes that last for a couple of quarters before moderating. Another segment on which worse economic conditions have had little effect is the global enterprise software market, which is forecast to grow 5.3% from 2015.

3 http://www.gartner.com/technology/research/it-spending-forecast

4 http://www.gartner.com/newsroom/id/3186517

The main drivers found in the market are that cloud services continue growing due to growing demand, and that there is a major transformation in industry and vertical industries. According to another analyst report on top tech spending in 20155, enterprises predicted increases of 46% in spending on security technologies and 42% in cloud computing in 2015, especially among larger enterprises (more than 1,000 employees). It is also noted that 43% of IT executives predicted an increase in IT budgets in 2015, pointing out that the main areas of spending included the Internet of Things (IoT) with 32%, High Performance Computing (HPC) with 22%, and energy-saving & carbon-reducing technologies with 16%.

The main inhibitors identified in the market are a bad economic situation worldwide and fluctuating exchange rates. IDC expects6 constant-currency CAGRs of 5.1% and 5.3% for the United States and Western Europe, respectively, over the forecast period. In this context, it is important to note that the global economy has been volatile in 2015 and downside risks have increased, but mature economies have remained relatively stable (the United States rebounded from a weak first quarter to record stronger growth in the second quarter, and Western Europe has for the most part continued its gradual recovery).

2.2.2.1.2 Software Market

The software industry is more important than ever in strategic transitioning and in enabling greater degrees of digitalization for businesses. According to analyst Gartner7, the software market has been immersed during the last years in a transition provoking disruptive changes in the way software services and technologies are developed, deployed, accessed and used.

Providers must now deliver solutions within much shorter time-to-market periods while providing tools that underpin business reinvention. According to the same Gartner report, worldwide software expenditure is expected to grow 5.3% in 2016. This trend is also reinforced as organizations' main focus is to stay competitive, focusing investment on technologies that support existing system structures while adopting cloud infrastructures (private or public), sustaining existing business models or creating new ones when possible, to grow and advance the business. According to the same Gartner report, the main driver is cloud as the key pillar of this transformational process, as software vendors acquire and provide applications and infrastructure technology to support the cloud and the Internet of Things (IoT) movement. This generates temporary competitive advantage for new entrants and strain on the incumbents. A clear indicator of this is that there is a pure cloud vendor, Salesforce.com, in the top 10 list, with more than $5 billion in annual revenue, growing faster than any other enterprise software company.

2.2.2.1.3 Cloud Market

According to a study from analyst firm Goldman Sachs8, spending on cloud computing infrastructure and platforms will grow at a 30% CAGR from 2013 through 2018, compared with 5% growth for the overall

5 Computerworld’s 2015 Forecast Predicts Security, Cloud Computing And Analytics Will Lead IT Spending

6 https://www.idc.com/getdoc.jsp?containerId=US40710015

7 http://www.gartner.com/newsroom/id/2696317

8 http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.pdf

Figure 2: IT Market Spending Trend

enterprise IT, with Amazon as the big player, taking 26% of the IaaS and PaaS markets ($4B in revenue). According to IDC forecasts9, public cloud spending will more than double to $127.5 billion by 2018, with $24.6 billion for IaaS, $20.3 billion in PaaS expenditure and $82.7 billion in SaaS expenditure. Another analysis from Cisco provides a comparison of IaaS, PaaS and SaaS forecasts from 2013 to 201810, predicting that 59% of total cloud workloads will be Software-as-a-Service (SaaS) workloads, up from 41% in 2013, eating space from IaaS (28% of workloads, down from 44% in 2013) and PaaS (13% of workloads, down from 15% in 2013).

In terms of types of cloud, IDC FutureScape predicts11 that by 2016 “there will be an 11% shift of IT budget away from traditional in-house IT delivery, toward various versions of cloud computing as a new delivery model”, estimating that 27.8% of the worldwide enterprise applications market will be SaaS-based, generating $50.8B in revenue, up from $22.6B or 16.6% of the market in 2013. Another relevant figure is that, by 2017, 35% of new applications will use cloud-enabled continuous delivery, enabled by faster DevOps life cycles, to streamline time-to-market and support more agile business models and a faster innovation pace.

According to the IDC report Worldwide Quarterly Cloud IT Infrastructure Tracker12, total spending on cloud IT infrastructure (server, storage, and Ethernet switch, excluding double counting between server and storage) was expected to grow 24.1% in 2015 ($32.6 billion). This corresponds to growth in expenditure on private cloud IT infrastructure of 15.8% ($12.1 billion) and on public cloud IT infrastructure of 29.6% ($20.5 billion).

For the five-year forecast period, IDC expects that cloud IT infrastructure spending will grow at a compound annual growth rate (CAGR) of 15.1% and will reach $53.1 billion by 2019 accounting for 46% of the total spending on enterprise IT infrastructure. In detail, they predict that spending on public cloud IT infrastructure will grow at a higher rate than spending on private cloud IT infrastructure (16.3% vs 13.2% CAGR) with predictions that show that by 2019, service “providers will spend $33.6 billion on IT infrastructure for delivering public cloud services, while spending on private cloud IT infrastructure will reach $19.4 billion”.

In terms of vendors, the cloud landscape continues in a consolidation phase in which the big leaders continue to grow, with 75% of enterprises working with fewer than ten cloud vendors, and some vendor consolidations being witnessed.

In terms of new ways of adopting the cloud model, IDC also predicts in the same report that the main driver is organizations adopting multi-cloud and hybrid-cloud solutions. More than 65% of enterprises will commit to hybrid cloud technologies before 2016, vastly driving the rate and pace of change in IT organizations. In a hybrid cloud context, two trends emerge: first, more than half of these organizations will purchase new or updated workload-aware cloud management solutions; and second, by 2017,

9 http://www.enterprisetech.com/2015/10/05/idc-forecasts-worldwide-cloud-it-infrastructure-market-to-reach-32-6-billion-in-2015

10 http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.pdf

11 http://www.idc.com/research/Predictions15/index.jsp

12 http://www.idc.com/tracker/showproductinfo.jsp?prod_id=961

Figure 3: Cloud Market Trends

35% of new applications will use cloud-enabled, continuous delivery and DevOps life cycles for faster rollout of new features and business innovation.

2.2.2.1.4 IoT, CPS, and Embedded Systems

Opportunities for adopting what is called the Internet of Things (IoT) are proliferating. Organizations are realizing the potential of this new context to streamline user experience and deliver better response times and cost savings across a broad spectrum of enterprise tasks, current and future, with the emergence of new contexts and business models beyond current organizational boundaries (physical and technological, in the mobile world).

According to ABI Research's forecast13, IoT-related value-added services are forecast to grow from $50B in 2012 to $120B in 2018, a 15.71% CAGR.

According to MarketsAndMarkets.com report14 the global Internet of Things Market (which includes CPS, Embedded Systems and M2M communications markets) is estimated to grow from “$255.87 Billion in 2014 to $947.29 Billion in 2019, at a CAGR of 29.9%” during the forecast period of 2014-2019. It also says that “the software platform segment in the technologies, platform and services market is expected to grow at the highest rate”.

Cisco predicts15 that the global IoT market will be $14.4T by 2022, with the majority invested in improving customer experiences; and then followed by aspects such as reducing time-to-market ($3T); improving supply chain and logistics ($2.7T); cost reduction strategies ($2.5T); and increasing employee productivity ($2.5T).

IC Insights predicted16 that IoT-related revenues (excluding Internet servers, network infrastructure, and cloud-computing systems) will grow at a compound annual growth rate (CAGR) of 21.1% from 2013 to 2018 (to $104.1 billion). It also segments the industry into five IoT market categories: connected homes, connected vehicles, wearable systems, industrial Internet, and connected cities (smart cities), and provides figures for segmented growth. A study from Oracle17 shows that 50% of today’s IoT activity is in the manufacturing, transformation, smart cities and consumer markets. The market still debates whether the wearable systems segment will evolve from a niche to a major end-use category. However, according to the same IDC report, “thanks to Apple’s smartwatches, which uses lots of ICs, sensors, and other components, there has been a jump in semiconductor shipments and sales to wearable IoT systems”.

13 http://www.woodsidecap.com/wp-content/uploads/2015/03/WCP-IOT-M_and_A-REPORT-2015-3.pdf

14 http://www.marketsandmarkets.com/Market-Reports/internet-of-things-market-573.html?gclid=CJ-h7_PPrssCFXMz0wodr-ULJg

15 http://www.cisco.com/c/dam/en_us/about/ac79/docs/innov/IoE_Economy.pdf

16 http://electronicspurchasingstrategies.com/2015/07/16/ic-insights-raises-growth-forecast-for-iot

17 http://tamarafranklin.com/wp-content/uploads/2015/09/Oracle-Internet-of-Things-Cloud-Service_RGB.pdf

Figure 4: IoT Value Chain

Figure 5: IoT Revenues

The EC foresees that the CPS and Embedded Systems markets are ramping up, and estimates the value of the embedded ICT market at more than €850 billion worldwide, with more than 3 billion embedded systems integrated into devices and other systems every year.

But market reports state an important factor: IoT is a transformative technology in itself. IoT has turned out to be a game in which everyone plays; hardware and networking vendors have lost leadership of the market, while software/data analytics/service vendors and device/component vendors have gained in market awareness. Gartner reiterated its forecast of more than 30 billion installed IoT units and estimated this will result in a 20% increase in potential revenue generated from software for manufacturers running ‘intelligent devices’. In Gartner’s view, the IoT context turns every manufacturer into a software provider, a transformation which will have a profound impact on application strategy, architecture, development and integration. In fact, Gartner recommends that manufacturers differentiate with software, increasing device intelligence by adding interactive software, and thereby also securing a relevant role in the derived value chain (licensing and tools to manage the software).

Gartner also highlights that standards and their associated APIs will be essential because IoT devices will need to interoperate and communicate, and many IoT business models will rely on sharing data and services between multiple devices and organizations.

It is expected that the market will witness commercial battles between platforms and ecosystems in areas such as the smart home, the smart city, healthcare and mass consumer products and wearables due to the emergence of disparate vendor-based IoT approaches. In this context, mass market product vendors might need to release variants supporting multiple standards and ecosystems.

Another interesting market trend derived from the IC Insights report is the shift in the location of data processing, with organizations demanding data processing capabilities on IoT sensors at the “edge” rather than in the data center, which reverses the trend from last year’s survey.

In order to facilitate the process of creating IoT applications, a number of stakeholders have created an industry in itself. Industrial service providers, consulting firms, system integrators, OEMs and operators, but also a number of suppliers of IoT platforms and middleware focused on simplifying all these tasks, are now involved in what is called the IoT Middleware Market. According to analyst firm MarketsAndMarkets.com18, the IoT Middleware Market is estimated to grow at a CAGR of 24.6% from 2015 to 2020, from $3.86 Billion in 2015 to $11.58 Billion by 2020.

Every computing, sensor, or communication device requires assessment to determine the best software to gather, transport and analyze relevant data from the IoT device to an application, which may run in the cloud, locally on the device, or in an intermediate fog device. IoT middleware is responsible for addressing all these aspects, but also for taking care of the large number of events generated in the network of devices, and of connectivity, data and device management issues, especially in heterogeneous connected device scenarios.

The market drivers found include factors such as the growing number of smart city initiatives, the increasing role of system integrators, and more demanding applications and user experiences that involve the mesh of devices. End users’ expectations increase, and in this context companies must have the ability to be agile and meet demands by using the right enabling technologies, such as virtualization, converged infrastructure, cloud, IoT, Big Data and networking. For organizations, the top drivers for creating an IoT strategy are increased employee productivity ($2.5T investment), time to market ($3T investment), cost reduction strategies ($2.5T) and improving supply chain and logistics ($2.7T).

On the inhibitors side, adoption methodologies have shifted in 2015 from the purely theoretical to being more anchored in early-adoption performance gains. The top challenges and factors for organizations adopting

18 http://www.marketsandmarkets.com/Market-Reports/iot-middleware-market-84839232.html

IoT are security, upfront costs, and ongoing costs. The battle in IoT platforms might force vendors to offer platform-aware versions of their products, which introduces the need to be prepared to update products during their life span as current standards evolve and, finally, as new standards and APIs emerge.

2.2.2.1.5 Semiconductors Market & Heterogeneous Hardware Market

Analyst firm IC Insights' report19 predicts a steady growth trend for the semiconductor industry that will reach double digits in 2016, followed by a cyclical downturn to -1% in 2017, with a strong correlation with the global economy and consumer expenditure.

A trend that has had an enormous impact on the IC market is the unexpected growth of tablets and smartphones. While it had the collateral effect of adversely affecting PC sales, in the “Post-PC World” it was good for the overall IC market and can be credited with the rapid rise of companies like Samsung, TSMC, and Qualcomm, which invested early in the mobile market. According to the same report, the Japanese semiconductor industry and Intel missed the initial wave. In fact, TSMC now has more impact on the IC market than Intel (Intel still leads the market in semiconductor sales, but TSMC has surpassed Intel in “final market sales”, assuming a 57% gross margin for TSMC’s customers). The consolidation trend has also greatly reduced the number of IC manufacturers.

The market is segmented, according to the same report from IC Insights, into Computer, Communications, Consumer, Auto, and Industrial/other categories, showing that communications systems changed the evolution trend, will have more market share than computers, and will hold dominant in their projections until 2017. The same market analysis from Semi states that specific segments within the IC market will grow at a CAGR of 9% in industrial applications, 10% in automotive, 15% in equipment, and 23% in the Internet of Things, the most relevant growth ratio, which means more chips, sensors, actuators and memories at lower costs.

In terms of the European market, the same report concludes that Europe's IC industry is healthy in terms of being successful at “More than Moore”, but still remains a non-significant player in terms of global volume, being stuck at 7% of worldwide production. The EU still needs to gain better traction in the IC market, and several EU and national initiatives have been launched to increase Europe's market share and competitiveness.

Another trend to watch is the consumer products segment, the third segment in dispute, with wearable devices expected to ramp up mass-market adoption, bringing a lot of debate about whether this will finally become a reality. This type of device has not yet made the jump to mass-market adoption. These novel devices seem very attractive to younger generations, but they are not for everyone; they do not create the “need” factor that smartphones have established across users of all ages.

Another trend in the market is the continuous evolution of shrinking technology according to Moore's Law. On the topic of the transition to 450mm, the report states that it is not clear whether it will finally happen, with recent debate predicting that it will be delayed until 2019. Memory manufacturers seem not to be interested in 450mm because of the device scaling roadmap. The reason appears to be that “when 450mm enters volume production, there will be new memory architectures being implemented and it is unlikely the memory manufacturers will transition to 450mm at the same time.”

19 http://www.3dincites.com/2014/09/major-trends-shaping-future-ic-industry

Figure 6: Semiconductor (IC) Revenue Forecasts

Figure 7: IC Market by system type

Another trend in the market is the IoT and, more recently, the Fog effect. This scenario needs specialized hardware that can handle and act locally on the data acquired in the interaction with humans, other machines and sensors. Thus, cost (for large scale adoption) and complexity (computing capabilities) are two of the biggest challenges facing IoT/Fog implementations on a large scale, and the best way to overcome them, is to combine multiple functions into a single chip or system-on-a-chip (SoC) that handles processing, connectivity and sensor interfaces on a single component. These solutions also simplify and speed development and time to market, which is expected to keep the market hot. As these systems become available to business, a series of new business models and user experiences leveraged by adoption in mobile and IoT applications and services will help demonstrate market readiness and effectively help the technologies “cross the chasm” to become mainstream.

The previous context has brought the rise of a new market, Heterogeneous Mobile Processing (HMP) and computing, for semiconductor vendors, enabling a myriad of new potential with powerful new computing devices and changing the rules from the well-established CPU world to a heterogeneous-hardware IT world.

The concept of HMP and computing came into focus when increasing demands for higher functionality on a single System on Chip (SoC) were realized by the semiconductor industry. As different heterogeneous hardware components and connectivity solutions are integrated on a single chip, they work simultaneously to improve the performance of the device. HMP and computing is therefore defined as an arrangement of different components, such as processors, GPUs (Graphics Processing Units), DSPs (Digital Signal Processors), and other accelerators, which work together in a System on Chip (SoC) to improve the performance and power efficiency of the device. The development of these heterogeneous systems is expected to continue over the next decade, with an increasing number of systems in deployment.

Heterogeneous systems have evolved from being mere specialized systems for appliances that accomplish IT purposes, and being an emerging trend in HPC to address the requirements of the most demanding workloads in the first years of this decade. Now, they have reached the general-purpose computing market from a different angle.

The last years have seen the advent of Mobile and Internet Services and a battle not only among device vendors but also among semiconductor vendors to serve the increasingly demanding needs of mobile devices. These demands pushed more and more functionality into the same chip, and the semiconductor industry realized the concept of the single System on Chip (SoC), which then brought the concept of Heterogeneous Mobile Processing (HMP) and computing, in which a single chip integrates various components such as processors, connectivity, digital signal processors and graphics processing units that work simultaneously to improve the performance of the device.

According to a MarketsAndMarkets.com report20, this market is expected to grow at a CAGR of 20.75% from 2014 onwards and reach $61.7 billion by 2020. Some of the key players identified by the same analyst firm in this specific market, parallel to the IC market, include ARM Holdings Plc. (U.K.), Auviz Systems (U.S.), Advanced Micro Devices Inc. (U.S.), Imagination Technologies Group Plc. (U.K.), MediaTek Inc. (Taiwan), Qualcomm Inc. (U.S.), Texas Instruments Inc. (U.S.), and Samsung Electronics Co. Ltd. (South Korea).

20 http://www.marketsandmarkets.com/PressReleases/heterogeneous-mobile-processing-computing.asp

In addition to showcasing achievement, part of this process requires opening up the technology to an ever broader base of users, making it possible for less specialized programmers to use it, effectively hiding complexity through an intelligent programming model. Therefore, software also plays a relevant role in the market. In order to utilize HMP hardware components, the market requires the introduction of programming languages and tools to develop for these heterogeneous mobile processing and computing devices and platforms. The software components consist of programming languages and middleware tools such as OpenCL, C/C++ and OpenVX, among others.

According to the predictions included in the MarketsAndMarkets.com report, the main driver is the growing demand for better performance and faster computational speed, which are some of the factors contributing towards the development of HMP and computing solutions. The benefits of HMP and computing are improved performance and power saving, clearly providing the means for ramping up the IoT/Fog context and the consumer segment. It also predicts that, in the overall market, the consumer electronics sector is expected to be the highest revenue generator and to lead the market from the demand side. Also, with a large number of benefits, the market is gaining increased visibility in a variety of applications like consumer electronics, medical, telecommunications, automotive, and military and defense.

Inhibitors in the market include factors such as the consolidation trend, which has greatly reduced the number of IC manufacturers and will lessen oversupply. The strong correlation of the market with global economic spending is also another relevant factor.

2.2.2.1.6 HPC Market

IDC market analysis21 highlights that the HPC market has been one of the fastest-growing global IT markets over the past 15 years and will continue to grow at a solid rate to reach $31 billion in 2019.

The HPC market (divided into supercomputer, divisional, departmental and workgroup categories) recovered in 2015 after a period of slowness. IDC estimates that it is now around $10 billion (servers) and that it will continue to grow substantially this year and steadily through 2019. An important factor in this behavior is the collision of HPC and Big Data trends, which has perhaps been bigger than expected. The report states that the storage segment remains hot and is expected to grow faster than other segments, fueled by more data collection and more data analysis. It also highlights that the middleware segment has the potential to establish a fast growth rate because of the expected increase in tool buying to accompany any large-scale movement to upgrade software. The report also shows that HPC plays a relevant role in a growing number of vertical industries, most notably in financial technologies.

21 http://www.idc.com/getdoc.jsp?containerId=259211

Figure 8: HPC Market forecasts

The study remarks that the scenario has dramatically changed in terms of vendor shuffling with the effect of IBM's sale of its x86 business to Lenovo. HP is now the market leader with almost one third of the market. Other vendors such as Dell are expected to sit at around 15%, with Lenovo strengthening while IBM stumbles.

Big Data continues to expand and is perhaps the most transformative trend in the HPC world; the combination of Big Data and HPC is creating new solutions for a more demanding scenario in which business requirements lead the way. Simply put, competitiveness depends on the ability to process enormous amounts of data, and HPC environments are necessary to properly deal with this kind of demand, driven by mobile and social analytics and the cloud, which are all playing a crucial role in driving new experiences.

Software issues continue to grow, requiring control of computing resources to be entirely automated by software, pursuing the goal of computing delivered in the form of “IT as a service”. There is an emerging need for middleware software that provides centralized management and unified orchestration, handling resource provisioning, performance monitoring, and management of multiple HPC-capable technologies and their collaboration with other IT resources across multiple data centers and cloud computing resources.

Cloud adoption for HPC purposes is probably higher than anyone might have thought. According to the IDC report, 25% of 157 surveyed HPC sites report that they use clouds, and 31.2% of their workloads were run on clouds, due to an increase in the number of parallel applications that do not require a very specific architecture. The key point is the number of differences that distinguish traditional cloud applications from HPC application types requiring parallel computing power, which can be translated into different offerings with cloud-like resources. A number of companies, including Amazon, Univa, Penguin, R-HPC, SGI, Sabalcore, and Gompute, offer specialized HPC clouds. At this point, it is worth highlighting the absence in this segment of IBM, who provides many options for private clouds. HPC clouds will increase the accessibility and flexibility of HPC systems, bringing HPC to wider audiences by lowering costs. Thus, with more users taking advantage of HPC, there are more potential adoption options in the future.

Nascent HPC ROI models are showing dramatic value. According to the IDC report, simply put, the model input is HPC investment and the output is revenue growth, profit, and job creation. The report shows that for every $1 invested in HPC, $356 in revenue and $38 in profit were generated. In this context, cloud adoption for HPC and the optimization of the computing performance vs. cost (energy, hardware) ratio play a critical role in the ROI equation.

The vendor ecosystem will continue evolving, with new heterogeneous hardware being adopted by the HPC market; the increasing adoption rates for non-x86 processors, FPGAs, GPUs and other accelerator hardware are expected to alter the vendor landscape.

There is a growing influence of the datacenter in the IT chain. From consolidation to expanding scale to embrace growing demands, HPC can be a key factor in a variety of Data Center strategies. HPC provides the means to solve some of the scale limitations exposed by current data centers, scaling computing capacity and reducing energy vs. computing capacity ratios.

This solid expansion is being fostered by several drivers, according to the IDC report. First, reduced pricing of entry solutions is attracting new users, including SMEs, and consolidating scaling on established players. Second, the emergence of new technologies that provide more cost-effective solutions for extreme computing is now on top of the hype; for example, GPU and accelerator tracking has been promoted to a formal activity. Third, the global race towards “exascale computing” is propelling sales of high-end supercomputers; also important is the fact that more SMEs and research organizations are exploiting HPC servers for high-performance data analysis and advanced simulations used in everyday operations.

Figure 9: HPC Market Revenue

Among the Inhibitors, firms adopting HPC for the first time find that high-performance business analytics and data-intensive simulations are too complex and time-critical for enterprise server technology to handle effectively alone. This creates the need to buy new tools to ease the move to large scale and, at the same time, it creates an opportunity for existing HPC sites and commercial software players.

2.2.2.1.7 Data Center Market

A MarketsAndMarkets.com report22 states that the global service market for data centers is estimated to grow at a CAGR of 15.80% during 2016-2019, due to the expected global rise in the number of new data centers and to high demands and requirements for improved operational efficiency and cost and energy optimization under shrinking IT budgets. This presents a great opportunity for various established and new datacenter solution and service providers (construction, consulting, installation, professional, training and development, maintenance, and colocation services).

However, the tendency towards lower IT budgets, the emergence of new infrastructure types and approaches to manage IT, and the increasing adoption rate of new technologies, solutions and services while providing better computing performance are some of the key factors affecting the growth of this market. Also important is the rising role of, and awareness of, optimizing energy consumption, with a growing trend of government initiatives increasing environmental safety requirements and pressure on organizations to adopt green solutions.

Organizations pursuing the goal of IT delivered in the form of “IT as a service” are now struggling with the need to bring automated control, through proper software and middleware layers over IT resources, for better, more agile and flexible operation of disparate systems and technologies across multiple data centers. Novel Software Defined Data Center (SDDC) technologies allow data centers to be created with keystrokes and mouse clicks instead of lifting servers and handling cables.

According to the same analyst, “over the past few years, many established datacenter solutions providers and new companies have emerged in this market providing individual services or full portfolio of services for data center. Various companies such as HP (U.S.), Cisco (U.S.) and IBM (U.S.) etc. have adopted expansion of its Research and Development (R&D) capabilities, new service launches, and strategic partnerships/agreements as its key business strategies to ensure its dominant position in this market. In addition, Schneider Electric (France), Emerson Network Power (U.S.) and Dell (U.S.) are some of the key players in the global service market for data center.”

According to a 451 Research report23, the worldwide colocation data center market is expected to reach $36 billion by the end of 2017 (63% growth), and the global footprint will grow by over 40 million square feet in that time as well, reaching 150 million square feet in 2017 (75% growth). An important aspect the report states is that the market seems to remain fragmented, with 75% of current revenue generated by local providers with less than $500 million in annualized colocation revenue. In terms of market leaders, Equinix, with close to 8.5% of global market revenue, remains the leader, followed by Digital Realty with 5.6%, which is however number one in terms of operational square feet, with 9.6% of global capacity.

22 Service Market for Data Center by Service Type (Professional, Training and Development, Maintenance), by Data Center Type (Mid-Size, Enterprise, Large), and by Data Center Tier Type - Global Forecast to 2019
23 http://www.datacenterknowledge.com/archives/2015/04/17/colocation-data-center-market-to-reach-36b-by-2017


Consolidation in the data center market has been ongoing. The biggest recent deal was the merger between Interxion and TelecityGroup in Europe; a recent example in the U.S. was the merger of Fortune Data Centers and Dallas Infomart last October. Consolidation still governs the market, with telcos and cable companies buying service providers: Latisys was acquired by Zayo, and Canada’s Shaw Communications acquired ViaWest, following the steps of Verizon acquiring Terremark in the past. The market focus seems to be split between two dimensions: providers focusing on core markets (e.g. Equinix and CoreSite) and those focusing on emerging markets (e.g. 365 Data Centers and EdgeConneX).

In the same report, 451 Research estimates that today, “less than half of the world’s total operational space for colocation (space supporting IT equipment) is in North America: about 43 percent. EMEA and Asia-Pacific compose a large portion of the other half, each accounting for one quarter of the market. However, this is the first quarter that APAC has edged out EMEA as the second-largest market. Latin America is around 4.5 percent of the market.”

As seen, reducing energy consumption is a critical factor in the datacenter market. Significant improvements in efficiency have occurred in datacenters during recent years. However, market demand has grown exponentially, driven especially by cloud computing and mobile services: according to a Cisco report24, annual global data center IP traffic will reach 10.4 zettabytes (863 exabytes [EB] per month) by the end of 2019, up from 3.4 zettabytes (ZB) per year (287 EB per month) in 2014, and annual global cloud IP traffic will reach 8.6 ZB (719 EB per month) by the end of 2019, up from 2.1 ZB per year (176 EB per month) in 2014. This growth far outstrips these energy savings.

According to a report from Greenpeace25, “If the cloud were a country, it would have the fifth largest electricity demand in the world”, as shown in the next diagram. Energy used to satisfy Internet demand alone was approximately 623bn kWh and, even worse, the same report shows that, in 2012, datacenter power demand was increasing at a rate of 58%. Energy consumed in datacenters becomes critical, and proper strategies must be established for rapidly cutting the environmental implications and costs of the digital economy.

This is pure economics. Energy input is a significant cost for datacenters. The analyst firm 451 Research identified26 that operational energy consumption is about 30% of a 15-year Total Cost of Ownership for datacenters. The Power Usage Effectiveness (PUE) of a datacenter, the ratio of total facility energy to the energy delivered to the IT equipment, is typically 1.8-1.9, meaning that for every 1.8 Watts of energy put in, 0.8 Watts are used in cooling and power distribution. As an example, the electricity bill for all US datacenters is predicted to be 14 billion dollars by 202027. Therefore, clearly even small percentage-point improvements have very high market value.
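To make the PUE arithmetic above concrete, the minimal sketch below computes the total draw, the cooling/distribution overhead and a yearly energy bill; the IT load and the electricity price used here are purely illustrative assumptions, not figures from the cited reports.

```c
/* Minimal illustration of the PUE arithmetic quoted above (hypothetical figures). */
#include <stdio.h>

int main(void)
{
    double pue = 1.8;            /* assumed facility PUE */
    double it_load_kw = 1000.0;  /* assumed IT load of the facility, in kW */
    double price_per_kwh = 0.10; /* assumed electricity price, in $/kWh */
    double hours_per_year = 24.0 * 365.0;

    double total_kw = it_load_kw * pue;          /* total power drawn from the grid */
    double overhead_kw = total_kw - it_load_kw;  /* power spent on cooling and distribution */
    double yearly_bill = total_kw * hours_per_year * price_per_kwh;

    printf("Total draw: %.0f kW (%.0f kW overhead)\n", total_kw, overhead_kw);
    printf("Yearly energy bill: %.0f USD\n", yearly_bill);
    return 0;
}
```

Even with these rough assumptions, a one-point PUE improvement directly removes the overhead term from the bill, which is why small efficiency gains translate into high market value.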

Furthermore, environmental concerns, focusing on CO2 emissions, have come to the forefront of global and national politics, societal activism and corporate social responsibility (CSR).

24 http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html
25 GreenPeace: http://www.greenpeace.org
26 http://www.zdnet.com/article/energy-costs-mean-tough-decisions-for-datacentre-owners
27 http://www.computerworld.com/article/2598562/data-center/data-centers-are-the-new-polluters.html

Figure 10: Energy Consumption by Cloud Computing extracted from “How Green is your Cloud?” GreenPeace Report (2012)


In Europe, the whole ICT environment accounts for about 10% of all energy consumed28. Large internet, cloud and datacenter providers are a target for criticism because of the scale of their contribution to this and because they are large corporations with well-known brands. Beyond economic benefits and CSR, reducing consumption and emissions for these companies is also necessary as part of risk management.

The most important Driver is that organizations need to more effectively address demand coming from mobile, cloud, analytics, and IoT services, which are shifting workload allocation in corporate datacenters and driving greater use of service provider datacenters.

In terms of Inhibitors, while IT executives are now immersed in the trend of sourcing and deploying IT infrastructure in new ways, datacenter executives are now concerned with limiting factors such as power and cooling, datacenter life-cycle management, floor space, and staffing.

Also important within the datacenter market, the datacenter infrastructure management (DCIM) segment has been undergoing a huge transformation in recent years, with trends embracing DevOps and Agile methodologies; this represents a major shift in the industry and a rapid pace of transformation of infrastructure management within organizations.

According to a MarketsAndMarkets.com report29, the global DCIM market is estimated to grow at a CAGR of 47.33%, from $307 million in 2011 to $3.14 billion in 2017. It also states that banking currently continues to be the largest adopter of DCIM and that, in terms of geographies, North America continues to be the biggest market for DCIM software and services. However, it expects that over the next five years the Asia-Pacific region will experience increased market traction and become the biggest DCIM market globally.
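As a quick sanity check of the growth rate quoted above, the CAGR implied by the two market sizes can be reproduced as in the sketch below; this is only the standard compound-growth formula applied to the reported figures, and the report's own methodology may differ.

```c
/* Sketch of the CAGR arithmetic behind the market figures quoted above. */
#include <math.h>
#include <stdio.h>

/* Compound Annual Growth Rate: (end/start)^(1/years) - 1 */
static double cagr(double start_value, double end_value, double years)
{
    return pow(end_value / start_value, 1.0 / years) - 1.0;
}

int main(void)
{
    /* DCIM market: $307 million in 2011 to $3.14 billion in 2017 (6 years). */
    double growth = cagr(307.0, 3140.0, 6.0);
    printf("Implied DCIM CAGR 2011-2017: %.2f%%\n", growth * 100.0); /* roughly 47.3% */
    return 0;
}
```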

The most important Drivers and forces pushing the market are availability and sustainable IT. The report also states that “these factors will positively impact the data center infrastructure management market because of the global push for Green datacenters”.

2.2.2.1.8 System Integration Market

The analyst firm MarketsAndMarkets.com forecasts30 that the system integration market will grow at a CAGR of 11.6%, from $191.36 billion in 2013 to $331.76 billion in 2018, and the future outlook for growth is expected to remain bright as businesses continue to search for open and distributed systems and architectures.

System integration solutions have brought enterprises improved management of IT infrastructures, services and data, reducing the boundaries of management. This market has moved organizations to eradicate the heterogeneity, multiplicity and silos created by the myriad of applications and infrastructures in use, so that they truly cooperate at one pace for the organization. This brings cost-effective and unified solutions for managing IT infrastructures and application software available globally (distributed or centralized) in the organization, thus supporting the decentralization, digitalization and globalization of business processes.

Relevant segments of the market are: Cloud Integration (Cloud Stack-as-a-Service); Data Center Infrastructure Management (DCIM); Application Integration services (application integration, data integration, unified communication, integrated security software, and integrated social software); and Integration Consulting (application lifecycle management, business transformation, and business process integration).

The same report identifies relevant players in this market such as Accenture, Capgemini Group, Computer Sciences Corporation (CSC), Fujitsu Limited, IBM Corporation, Infosys Technologies, Lockheed Martin Corporation, Northrop Grumman Corporation, Science Applications International Corporation, Tata Consultancy Services, and Wipro.

28 “Mobilising Information and Communications Technologies to facilitate the transition to an energy-efficient, low-carbon economy”, European Commission Recommendation C(2009) 7604, Brussels, 9.10.2009
29 http://www.marketsandmarkets.com/PressReleases/dcim-market.asp
30 http://www.marketsandmarkets.com/PressReleases/system-integration.asp



The most important Driver of the consistent demand for integration of systems globally is current demand itself: today’s economy and organizations have more demanding requirements, which increase the need for industrial and competitive growth. The telecommunications and IT verticals are the ones driving the market, followed by others like defense and, especially, the rise of public administration investment in IT.

Inhibitor aspects restraining the market are the challenges faced by organizations when integrating various systems together to become more flexible and productive, and the extremely high adoption rate of new technology approaches in various areas of the enterprise, which requires better management of IT infrastructures, services and data.

2.2.2.2 Business-Driven Trends

In terms of pure business-driven factors, we identify various major aspects that require attention:

- In the global economy, budgets are still tight, and IT budgets are no different. One of the key aspects that CIOs mention about their strategies is the need to reduce, or limit, the IT budget while extending IT infrastructures to become more flexible with new approaches like cloud. This is especially visible in the case of data centers or large supercomputing farms. In these cases energy is at the top of the cost list, forcing organizations to search for different approaches to address power/cost reductions (Gartner Data Center Conference 201231).

Figure 11: Gartner Data Center Conference 2012 conclusions

- Organizations are Going Green, driven either by Corporate Social Responsibility roadmaps or by an effort to create a Green Badge for marketing purposes. Companies are also using Green as positioning, creating new services for customers that enable the user to choose from a list of potential green options, and even applying gamification techniques to develop these opportunities.

- Companies now seek to reduce cost and/or enable greener policies, which might come from using greener energy sources or from establishing energy reduction plans. These plans might consider energy optimizations from various angles, and now include more ambitious and holistic approaches in which applications, not only workload allocation managers, also take part in their own energy optimization. This optimization, even when advancing in small steps, takes advantage of the law of large numbers across the thousands of servers that datacenters manage. The same optimizations can be pursued for HPC or supercomputing farms.

31 Gartner Data Center Conference 2012 Profile, December 3 – 6, 2012, Las Vegas USA, gartner.com/us/datacentre, (http://www.gartnerinfo.com/lsc31/DataCenter2012Profile.pdf)


Even though technology is continuously optimizing and reaching new limits, efficiency is still important, at large scale. Quantity makes small increments add up to significant improvement and optimization in large data centers of thousands of servers, especially in the current context of a growing number of data centers and growing workloads. Moreover, from the operational perspective, it provides ROI in the face of volatile and rising energy prices and of regulation to reduce the carbon footprint. Another related aspect is that datacenters are seeking new hardware configurations that provide better performance and computing power vs. cost. This is in fact an old way of approaching problems: continuously searching for new potential candidate technologies. Today, old CPU-based architectures and Moore’s Law are almost reaching a limit in clock rate and computing power due to power/thermal and memory issues related to operation at higher frequencies, which can now be addressed by using other hardware architectures such as GPU, DSP, FPGA, SoC, multi-core, etc. configurations.

- Organizations now also have to deal with, and manage, risks associated with their environmental performance. Corporations have been affected by their green involvement, as in the case of Facebook32, which built a data center completely optimized in terms of energy efficiency, but located in a town that generates the majority of its power from a coal plant. This fact rapidly spread on the Internet and social networks, including Facebook itself, and finally led Greenpeace to ask Facebook to find ways to run its facilities entirely on renewable energy sources. To mitigate this kind of undesired publicity, brands now buy or generate their own green or renewable energy sources (RES), or sign green energy purchase agreements. Another relevant derived aspect is that public administrations are participating by giving companies governmental incentives to put this kind of initiative in place.

- As sustainability continues to climb the corporate strategy agenda, organizations are starting to adopt (and implement) various methodologies and software technologies that help them measure, monitor and manage energy aspects. Moreover, corporations have already started to extend accountability for these aspects along the full supply chain, making the use of software and metrics a requirement. This trend needs strong support from reporting, optimization and performance management tools now that companies use multiple data center facilities around the world, with Cloud Platform Management (CPM) and workload scheduling and allocation software, and different and green energy sources.

- Environmental sustainability is a novel market developing sustainability services and solutions for corporate and also small organizations, enabling companies to embrace greener policies and strategies across various dimensions of the value chain. The evolution of this market can be traced from larger software and IT service vendors who started with their own internal solutions. Nowadays the market encompasses players from a number of sectors, including consulting and IT services companies that have created broader portfolios including sustainability, and software industry players pushing green and optimization techniques into DCIE/DCIM/CPM/RJMS systems, among others.

- Emergence of new business models and applications in the IoT/CPS/Embedded Systems/Fog Computing domains. The IoT can be highly disruptive for individuals, enterprises and whole industries in different ways. First, it has the potential to reduce costs, lower organizational knowledge barriers, and integrate and scale today’s infrastructures to meet future demands and futuristic visions. Second, the IoT allows disruptive individual innovation in products, services and solutions. And third, the IoT opens up possibilities for new business models.

For business, the IoT brings the potential to help companies develop new markets and customer experiences, built on the foundation that businesses are now able to gather valuable real-time data directly from customers, sensors and other systems, allowing them to know how customers use products and services.

32 http://www.datacenterknowledge.com/the-facebook-data-center-faq-newest-page


Therefore, opportunities for adopting IoT solutions and platforms are proliferating. Organizations are realizing the potential of this new context to streamline user experience and deliver better response times and cost savings across a broad spectrum of enterprise tasks, current and future, as new contexts and business models emerge beyond current organizational boundaries (physical and technological) in the mobile world.

On the offering side, with the introduction of embedded software and app-driven hardware into IoT devices, and the ability, through software licensing, to monetize device functions and features, devices have now become intelligent solutions capable of generating completely new types of revenue streams when connected to the Internet, enabling all kinds of services, solutions and big data offerings around everyday consumer and industrial contexts. In this regard, hardware companies and IoT makers are becoming more software-centric. Manufacturers are already starting to think and act more like software companies, leveraging the software applications they build into their products as a driver to reduce manufacturing costs, increase product innovation, and capture new revenue streams. Taking a software-centric approach means that device makers are now redesigning products from fixed-function, disconnected devices to more flexible and seamlessly connected systems. However, for this to happen, developers must be able to build such (not so) visionary scenarios, and device vendors must create software that facilitates the development of applications for the heterogeneous hardware market. Moreover, IoT platforms will also need to be able to manage multiple heterogeneous devices and provide software middleware that helps manage complex IoT scenarios. Every computing, sensor or communication device requires assessment to determine the best software to gather, transport and analyze relevant data from the IoT device to an application, which may run on the cloud, locally on the device, or on an intermediate fog device. This supports a whole industry of IT and industrial service providers, consulting firms, system integrators, OEMs and operators, but also a number of suppliers of IoT platforms and middleware focused on simplifying all these tasks, thus facilitating the development process for creating IoT applications.

2.2.2.3 IT Trends

The use of heterogeneous hardware and software platforms offers significant promise in terms of increasing performance, lowering cost, boosting security and dramatically reducing energy consumption. We now describe some of the IT trends that the market is showing.

2.2.2.3.1 Data Center

Energy is an issue, a very expensive one. The workload carried by datacenters is also rapidly increasing due to a high growth rate in demand. Every day, more services are used by more people, with more devices, for more time, from everywhere, at any time, in more countries, etc. This demand scenario grows further if we consider the future effect of IoT and Big Data trends.

As energy costs grow, due to larger energy consumption or to increasing operational costs as demand grows, optimizing energy will be at the top of the priority list. The datacenter industry is therefore taking new approaches to use more efficient systems, scaling from pure technology components and equipment such as servers, chassis, racks and networking, to other equipment such as air cooling systems, and finally to construction-related aspects such as aisle design, capacity planning, facility construction and energy source co-generation.

Moreover, the energy-efficiency design of the facilities themselves is improving: datacenters are now built according to modern specifications to optimize energy, generate their own energy, often from renewable sources, and even, in some cases, reuse their own by-products, such as internal heat, to feed energy generation.

2.2.2.3.2 Heterogeneous Hardware

Our world is heterogeneous, with millions of different lifeforms in which every form is unique and has its own particularities and needs; this is why it is so beautiful and, at the same time, so complex when different species interact in an environment. Computing systems are no different. There are several different architectures and hardware best suited to different tasks and requirements, each expected to provide different outcomes and performance.

Just like in the real world, why should programmers stick with just one of them to solve a problem? Why should they accept all the pros and cons that come with that single solution? Thankfully, recent trends experiment with and embrace diversity and heterogeneity when building systems. We are now seeing new systems that combine a number of different classes of hardware and architectures, in an effort to have the flexibility to use the best architecture for each task, or the ability to pick the right task to make optimal use of a given architecture or hardware.

This brings two problems to the table. First, there is the question of how heterogeneous systems will evolve as they adopt new architecture types and heterogeneous hardware, making them collaborate and remain compatible while abstracting the underlying complexity. Second, this is exactly where middleware software comes in. The lack of middleware and of standardized programming models and environments that empower developers, simplify programming for diverse hardware configurations and disparate architectures, and can manage the diverse set of resources in a common framework, is what has slowed down the evolution of this paradigm. However, things seem to be changing, and the need for this kind of solution in contexts such as the IoT and HPC shows that organizations will invest in them. Nevertheless, the part of the equation that says “heterogeneity is properly architected and managed” has been easier said than done: heterogeneous systems collaborating only make sense if they cause less trouble than they are worth. And now is the moment for this.

Driven by the relentless and increasing volume and velocity of Big Data analytics, the largest datacenters and leading Internet services, social networks, analytics and search engines are now accelerating their data analysis and search operations using systems based on heterogeneous hardware like FPGAs and GPUs that dramatically speed up the entire lifecycle of data analysis. In the old days, demand for computing power was solved using spare space and adding hardware; now, it needs greater horsepower, lower latency, and the ability to instantly analyze disparate types of data, which can only be provided by new purpose-built hardware. These configurations are enabling the replacement of thousands of old CPU-based servers with new systems that execute operations massively in parallel, consuming less space and less power and providing massive Total Cost of Ownership (TCO) savings.

2.2.2.3.3 HPC Evolution

Today there are a large number of contexts demanding high computing power in industry, research, academia and other collectives: data analytics; climate and weather simulation; ocean simulation; electro-magnetics; computational chemistry, quantum mechanics and dynamics; computational fluid dynamics; reservoir simulation; computational biology; seismic processing; and implicit and explicit structural mechanics. And there will be more tomorrow, with the emergence of the digital health sector, nano and bio technology, materials, several nano-sciences, etc.

These needs have typically been satisfied with HPC solutions, which have evolved to now offer cheaper and more flexible, yet still powerful, HPC solutions based on cloud resources. But now the HPC community is starting to look at a broader concept, using heterogeneous types of computing devices and even embracing distributed parallel architectures to provide HPC services.

In terms of energy and computing power efficiency in HPC we identify trends in the following categories: Data Center, Hardware, Middleware and Software.

- In terms of HPC Data Center Trends, we have already analyzed how HPC enables new ways to solve current data center scalability issues in several areas such as computing capacity, growth space limitations, reduced IT budgets, cooling and energy constraints, etc.


- In terms of HPC Hardware Trends, the market is “Going Mobile” by using processor (and co-processor) technologies initially envisioned for mobile contexts.

According to an NVIDIA report33, at the time of writing the top two systems on the Green500 list34 (which rates the 500 most energy-efficient supercomputers based on performance achieved relative to power consumed) were powered by advanced GPU accelerators. This shows that raw performance is no longer the exclusive measure of the value and impact of supercomputers; systems now need to deliver higher performance with reduced power consumption, and GPU accelerators, among other types of hardware, demonstrate that heterogeneous hardware has the ability to deliver unmatched levels of energy-efficient computing power. Two trends are important to highlight here. First, while GPUs can condense more processing power into smaller chips than CPUs, they both consume similar amounts of power. This is where FPGAs come into the game: the need now is to pack as much computing power as possible with the least power consumption possible, and FPGAs shine here since they make it possible to reduce an algorithm to an optimized hardware implementation. Moreover, hardware can reach an even better power vs. performance ratio with custom non-reprogrammable chips such as ASICs or ASSPs, but these are not flexible, since they can never run a different algorithm.

- In terms of HPC Middleware Trends we must cite two relevant trends. The first is the emergence of the HPC Cloud, or cloud virtualization in an HPC environment. Cloud computing would seem to be an HPC user’s dream, offering almost unlimited storage and instantly available, scalable computing resources on a pay-as-you-go cost basis. However, this trend must be understood bearing in mind that virtualization is not suitable for applications requiring pure HPC performance. On the other hand, not all workloads running on HPC are purely parallel, and this is the key pillar on which this emerging trend provides value: virtualizing resources and managing parallel workloads with the aid of hypervisors.

Moving up the HPC tree, we find that the requirements imposed by the type of application are the most critical aspect determining whether cloud environments can be used in HPC. Particularly important is the special case of the constraints imposed by Big Data applications, which require large amounts of data and low-latency, high-throughput interconnects not found in the traditional cloud. For instance, the time to move large datasets to the cloud can outweigh computation time, making the cloud approach a slow solution. Also, if high-performance networks are not available, many HPC applications run slowly and suffer from poor scalability, and there is no performance gain when adding more nodes to the system. In the case of I/O-sensitive applications, without a very fast I/O subsystem they will perform poorly because of storage bottlenecks, which are solved in pure HPC systems. Also important is the trend of using specialized accelerators and new types of hardware that provide high computing performance ratios, which is mostly specific to HPC systems and therefore not found on typical cloud hardware. And on top of all these aspects, Cloud HPC suffers the overhead introduced by hypervisors, possibly making the gains not worth the effort. However, basic HPC applications requiring non-critical parallel computing power can benefit from the cloud approach with proper use of HPC services that range from shared HPC clusters to fully virtualized cloud environments.

In the context of large HPC supercomputers with a large number of resources, and the submission of large numbers of computation jobs across larger network diameters and more complex designs, new approaches to proper middleware software in the form of Workload Managers and Resource and Job Scheduling Management Systems (RJMS) are needed. The goal of the RJMS is to satisfy users’ demands for computation and assign user jobs to the computational resources in the most efficient manner along a number of dimensions such as time criticality, performance, cost, energy optimization, etc. (a minimal illustrative sketch of such an energy-aware placement decision is given after this set of trends). Current trends include, firstly, the need to benefit from scheduling capabilities that use, and abstract, heterogeneous hardware and cloud virtualization environments; and secondly, the need to introduce Multi-Agent Systems (MAS) for energy-aware applications and their enabling frameworks, for more efficient and dynamic approaches to HPC workload allocation and adaptation.

33 http://nvidianews.nvidia.com/news/nvidia-tesla-gpu-accelerators-power-world-s-most-energy-efficient-supercomputer
34 http://green500.org/


As we look forward to the future of workload management, we identify several major inter-related trends: application insight, Big Data awareness, virtualization and the HPC Cloud, and HPC heterogeneous architectures. First, workload managers need greater insight into the applications they run. The more understanding they have, the more efficiently they can schedule, manage and adapt the computing environment. Beyond the basic workload requirements that today’s systems are able to track, the future expects an evolution that puts more emphasis on understanding an application’s purpose and its current and future needs, via a set of key metrics, to make more optimal decisions. Application-specific metrics are more important than generic CPU, storage or memory metrics, since they provide insights into the application’s expected behavior. In this context, energy-related metrics are expected to grow in importance in HPC’s quest for exascale computing.

- Second, managing the flood of data that the Big Data world has brought is difficult, and in terms of workload management, the future depends on being able to manage it efficiently. With multiple applications running simultaneously in a cluster, workload managers need a holistic vision of how to satisfy the I/O requirements of all applications, so that their I/O demands do not conflict and become a bottleneck. This puts the focus on future workload managers integrating directly with the storage management system. Another relevant aspect in terms of data is the fact that today’s systems have a strong dependence on data locality. Current workload managers treat data as blobs of raw bytes to simply be exchanged, with little understanding of their content. Future, more demanding scenarios need workload managers that understand the data’s structure and attributes, and exploit those factors in scheduling decisions to deliver more I/O operations per second.

Third, the overhead of virtualization has decreased in recent years to the point of being almost negligible for many applications; combined with virtualization’s flexibility, this has made virtualization a growing demand for HPC. Today, more and more HPC sites are adopting virtualization for a wider variety of workloads because of this increased flexibility, resulting in higher system utilization and greater return on investment. Virtualization can then be taken to the next level, with HPC clouds (public or private) combining automated machine provisioning with workload management technologies, enabling pay-per-use cost models and self-service job submission. Add to the equation the power of multiple underlying heterogeneous hardware architectures providing computing power, and a bright future can be expected.
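As a complement to these workload-management trends, the sketch below shows one possible greedy, energy-aware placement heuristic of the kind an RJMS could apply when heterogeneous nodes are available. The node model (idle/busy power, speed factor) and all figures are hypothetical assumptions chosen only for illustration; this is not TANGO's scheduler nor any existing RJMS implementation.

```c
/* Illustrative greedy energy-aware job placement (hypothetical model, not an existing RJMS). */
#include <stdio.h>

struct node {
    const char *name;
    double idle_watts;   /* assumed idle power draw */
    double busy_watts;   /* assumed power draw while running a job */
    double speed_factor; /* relative performance: higher means the job finishes sooner */
    int    busy;         /* 1 if the node is already occupied */
};

/* Estimated marginal energy (joules) to run a job of 'base_seconds' on node 'n'. */
static double estimated_energy(const struct node *n, double base_seconds)
{
    double runtime = base_seconds / n->speed_factor;
    return (n->busy_watts - n->idle_watts) * runtime;
}

/* Pick the free node with the lowest estimated marginal energy for the job. */
static int pick_node(const struct node *nodes, int count, double base_seconds)
{
    int best = -1;
    double best_energy = 0.0;
    for (int i = 0; i < count; i++) {
        if (nodes[i].busy)
            continue;
        double e = estimated_energy(&nodes[i], base_seconds);
        if (best < 0 || e < best_energy) {
            best = i;
            best_energy = e;
        }
    }
    return best;
}

int main(void)
{
    struct node cluster[] = {
        { "cpu-node",  100.0, 250.0, 1.0, 0 },
        { "gpu-node",  120.0, 400.0, 4.0, 0 },
        { "fpga-node",  60.0, 110.0, 2.0, 0 },
    };
    double job_base_seconds = 3600.0; /* runtime of the job on the reference CPU node */

    int chosen = pick_node(cluster, 3, job_base_seconds);
    if (chosen >= 0)
        printf("Schedule job on %s (estimated marginal energy: %.0f J)\n",
               cluster[chosen].name,
               estimated_energy(&cluster[chosen], job_base_seconds));
    return 0;
}
```

A real RJMS would of course weigh energy against the other dimensions mentioned above (time criticality, performance, cost), but even this simple model shows how accelerator nodes can win on energy despite a higher instantaneous power draw, because they finish the job sooner.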

- In terms of HPC Software Trends, we are immersed in a phase of rethinking parallel computing. The market is now in the quest to reach exascale computing, a major achievement in computer engineering as it is of the order of the processing power of the human brain. This scenario will imply new requirements for software in various areas: it imposes extreme power constraints; it introduces extreme scalability issues; and it introduces the problem of fault detection and correction in systems in which hardware and software are extremely correlated. All these factors require rethinking the current parallel computing paradigm from the hardware and architecture perspectives but, more importantly, in terms of the execution approaches (performance, synchronization, memory operation, hardware operations, etc.) and the programming model challenges for exascale. The latter require solutions to coordinate resource allocation; a clean way to share data with consistent memory models; the introduction of guidance from mathematical models that enable adaptive and continuous representation; the ability to manage code through an Abstract Data Structure Language (ADSL); and adaptation under multi-level approaches considering lightweight locally-optimized vs. intra-node vs. regional levels. All these aspects might require relying on different programming models.

2.2.2.3.4 IoT, CPS, and Embedded Systems

Gartner identifies that the IoT and the device landscape will shape the future of businesses through 2020. In this list35, the first thing to note is that they identify the merging of the physical and virtual worlds and the emergence of a digital mesh encompassing multiple, heterogeneous types of hardware.

35 http://www.gartner.com/newsroom/id/3143521


They then recognize the emergence of algorithmic business, based on the capacity to gather and analyze data, in which many things will happen in the background involving not humans but smart machines. In both cases, smart devices and hardware, mobile, IoT, CPS and embedded systems play a relevant role. Thus, establishing this new IT reality requires new architecture and platform trends to support digital and algorithmic business.

In this vision, the combination of smart machines, algorithms, analytics, data architectures and IoT platforms has the potential to revolutionize businesses. The firm predicts the emergence of an entirely new class of business models based on technologies for smart machines, advanced analytics and big data that encompass advances from other technology fields, such as the advances and adoption rates of Internet of Things (IoT) sensors and supporting data models, Big Data and machine learning technologies with the advanced intelligence to interpret data, and computing capabilities that enable acting on the data.

The device mesh concept introduced by Gartner will expand over the next years to include IoT/CPS/embedded devices that scale well beyond the enterprise walls, in a post-mobile world in which traditional computing and communication devices, which used to mean desktop and mobile devices, are augmented by new smart devices (wearables, home electronics and appliances, sensors, etc.), all capable of capturing, and some even of processing and analyzing, data in real time.

This concept creates the foundation for new business models based on a new reach of the enterprise’s boundaries of influence, assuring continuity of services across boundaries of device type, time and space, and providing much richer ambient user experiences. The experience flows seamlessly across a heterogeneous set of smart devices and interactions between channels, melting the frontier between physical, virtual and electronic environments, with users changing from context to context and everything producing, using and transmitting information. Therefore a digital mesh, encompassing smart devices, services, platforms, informational networks and individuals, will continue to proliferate. This scenario will require aligning apps and devices to individuals’ specific roles and tasks, by providing contextual intelligence and enabling greater collaboration with new applications, which impose new designs and approaches.

The proliferation of algorithm-based businesses enabling automated background tasks, including smart machines, predicted by Gartner’s technology trends for 2016, sets a solid foundation for the growth of other platforms besides pure cloud platforms. In addition, other technologies also need to advance (e.g. more robust analytics and predictive modeling systems, and support for unstructured data analysis, including latent semantic indexing (LSI), data taxonomy and classification algorithms to ensure data fidelity and scalability), which can in fact be provided by cloud computing power or by now more powerful local devices. Advances in machine learning will enhance these experiences even further, establishing the foundation for acting in a (still only) semi-autonomous manner.

In this context, the digital mesh requires smart devices that run at speeds greater than a teraflop with high energy efficiency in order to be viable for IoT devices and organizations. This is being achieved by using novel heterogeneous hardware architectures, especially SoCs and FPGAs, that provide significant gains by being both high-powered and ultra-efficient. Another factor is that they are packed into small spaces, which allows advanced machine learning capabilities to proliferate into the tiniest IoT/Fog devices and consumer devices in various areas (wearables, homes, cars, etc.).

In the mobile and IoT world, applications are not built the way they used to be in the past. They do not scale well using, for example, a three-tier architecture; they need a more loosely coupled, distributed and, at the same time, integrative approach. In this context, apps and services architectures are winning the battle, providing flexibility, agility and Web-scale performance. Besides this software-defined application services approach, another approach that is starting to shape the future is micro-service architectures, which are emerging and helping to build distributed applications that support agile delivery and scalable deployment, both on-premises and in the cloud.


In this landscape, containers are emerging as a critical enabling technology, becoming more and more popular, leveraged by their traction in cloud environments and empowered by the DevOps and Agile development movement.

Now that organizations are seeing the potential of new business models in the post-mobile world of IoT devices and user experiences, application teams must create new and much more flexible, dynamic and modern approaches and architectures. Bringing mobile and IoT elements into the app and service architecture creates a comprehensive model to address back-end cloud scalability and front-end device mesh experiences.

Today’s landscape forces enterprises embracing the IoT into the difficult task of developing an IoT platform strategy, since the market is very fragmented and there are multiple, as yet incomplete, competing vendor approaches without compliance with any standardization roadmap. IoT platforms are in charge of providing all the work that IT needs behind the scenes from an architectural and technological point of view. IoT platforms are what make the IoT a reality, providing basic capabilities like management, integration, data, security and other technologies and standards for building, managing and securing elements in the IoT world, enabling the interactions of the digital mesh and the ambient user experience.

Fog Computing is one of these novel approaches and represents an extremely significant evolution in cloud computing, and in computing in general, to meet the needs of the post-mobile world, in which the agility and flexibility needed for Big Data applications taking the form of the IoT, and the low- or no-latency requirements introduced by a world of enriched user experiences, must be properly addressed. Its emergence reflects the rise of decentralized models that enable more flexible and agile computing than traditional centralized approaches. Fog Computing may not prove a panacea, at least at this early point, but it recognizes and attempts to address many of the limitations in data and response latencies as Big Data continues to grow and organizations envision more and more interactive scenarios that extend the Fog to more contexts.

In this regard, the Fog is starting to embrace modern heterogeneous hardware that adds higher computing power and requires less energy, an aspect that might sound obvious but is really important, since it helps relieve the low energy availability constraints imposed by devices at the edge of the Fog and is essential to create these enriched experiences. However, the Fog still requires more maturity, and platforms and programming models need to be taken to the next level in order to be ready for mass adoption.

2.2.2.4 Legislation and Regulation

There are a number of regulations expected to appear on the horizon in both areas, IoT and energy. However, this section does not claim to be a full analysis of regulatory conditions, since that is outside the scope of the project and of the task itself.

However, we identified a number of regulations at the European level that try to foster the reduction of energy consumption or the use of greener energy sources; for example, the European Union 2020 Energy Strategy includes Europe’s 20% Policy36, which sets 20% targets for renewable energy, greenhouse gas reduction and energy efficiency by 2020. All these principles are well aligned with TANGO’s core objectives.

Also, in the scope of IoT-related regulations, there are reports and analysis documents that fully cover the implications of regulation and legislation for the IoT at this point. Again, this is outside the scope of the project. However, an important aspect that all studies cover is the relevance of privacy and security, especially in large-scale IoT deployments.

36 https://ec.europa.eu/energy/en/topics/energy-strategy


Without adequate security across the full chain of IoT systems, there is a huge potential risk of intentional access to potentially sensitive personal information, of vulnerable devices being used to attack other local devices, and of networks, applications or smart-city systems being accessed in both public and private sector contexts. Regulators suggest that IoT companies should follow a security and privacy “by design” approach, building security and privacy functionality into the device from the outset of the development process, when it is much more likely to be effective.

Therefore, security and privacy issues are something the project should keep monitoring, and the architecture should include an approach that follows this security-by-design principle. In the State of the Art section we provide more details about the approach that will be followed by TANGO in this regard in the future.

2.2.3 Business & Market Landscape Analysis

We included in section 2.5 of the deliverable “D2.1 Dissemination and Communication Plan” (M3) the results of the initial approach to the market, the result of the initial work done in the business analysis task. We refer the reader there for a deeper understanding of the TANGO perspective. That document includes important sections such as section 2.5.1, Stakeholders, which provides a list of potential stakeholders within the range of the previously analyzed markets. It also includes section 2.5.2, Value Proposition, Positioning and Message, which describes the problems that TANGO tries to solve in these markets and the project positioning.

However, we include here the results of the work carried out by the business team after that deliverable was submitted, which includes a preliminary comparative analysis of current solutions that will be updated and used during the project.

2.2.3.1 Competitors, Substitutive or Existing Solutions

2.2.3.1.1 Existing solutions

This section lists tools that help to solve only a part of the problem covered by TANGO. Some of them may become underlying components of the framework.

Compilers or tools for specific technologies

Various HLS compilers that enable the generation of FPGA cores from C or C-style code (includes Vivado HLS from Xilinx, Stratus HLS from Cadence, Synphony C Compiler from Synopsys, and CyberWorkBench from Aldec)

CUDA (from NVidia - Language, compiler and tool set for NVidia GPU parallel programming, with connection to the CPU)

Tools and standards for parallel programming on non-heterogeneous platforms

OpenMP (standard for parallel programming on shared-memory platforms – extensions towards heterogeneous platforms are currently being considered; a minimal directive example follows this list)

MPI (Message Passing Interface – Inter-process communication protocol for parallel programming on distributed platforms)

Intel composer (Optimising compiler with Intel Threading Building Blocks library – This compiler also supports OpenMP and includes support for Xeon Phi)

Various programming languages (or extensions) based on the Partitioned Global Address Space concept (like Unified Parallel C, Titanium, X10, Chapel, etc.)
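For orientation, the minimal sketch below shows the directive-based style used by OpenMP on a shared-memory CPU and, analogously, by OpenACC when offloading the same kind of loop to an accelerator. It is only an illustration of the programming style referred to in the entries above, not part of any of the listed products, and the pragmas are simply ignored by compilers without the corresponding support.

```c
/* Minimal illustration of directive-based parallel programming (OpenMP / OpenACC). */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];

    /* Shared-memory parallel loop on the host CPU (OpenMP). */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        x[i] = 2.0f * i;

    /* The same loop offloaded to an accelerator (OpenACC), if such a compiler is used. */
    #pragma acc parallel loop copyin(x) copyout(y)
    for (int i = 0; i < N; i++)
        y[i] = 3.0f * x[i];

    printf("y[42] = %f\n", y[42]);
    return 0;
}
```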

Tools and standards for workload management and task scheduling

SLURM (workload manager for Linux clusters)


IBM Platform LSF (workload manager for heterogeneous, local or distributed HPC platforms – includes support for GPU or Xeon Phi based platforms)

Grid Engine (or Open Grid scheduler – Batch-queuing system for distributed resource management)

MAUI (From Adaptive computing - Job scheduler for clusters and supercomputers)

MOAB (From Adaptive computing - Workload management for massive scale, multi-technology HPC)

TORQUE (From Adaptive computing - Resource manager for distributed systems)

PBS Works (From Altair – workload manager and job scheduler for HPC)

SUN grid engine (Formerly from ORACLE and SUN, now belonging to UNIVA)

UNIVA grid engine (Distributed resource management for data centers)

LSF (Platform Load Sharing Facility – Workload manager and job scheduler from IBM – OpenLava is a GPL distributed derivative of LSF)

Power Management and Control

POWERAPI (specification of an API for power monitoring and management at process level)

GEOPM (Global Energy Optimization Power Management – A power management framework for performance computing)

Distributed system simulators

Simgrid (From INRIA: behavioral simulation of large-scale distributed systems)

Development tools

Eclipse Parallel Tools Platform (extension for the Eclipse development environment that supports the development of parallel applications based on various technologies)

2.2.3.1.2 Tools for programming on specific heterogeneous platforms

The following tools are closer to a framework like TANGO, but target specific platforms and are less generic than the frameworks presented in the next section.

SDSoC (From Xilinx - Framework for C/C++-based CPU/FPGA parallel development on the Xilinx Zynq platform – based on Vivado HLS)

SDAccel (From Xilinx - Framework for generic C/C++-based (+ OpenCL) CPU/FPGA development from Xilinx – probably based on Vivado HLS)

PLDA quickplay (Framework for generic C/C++-based (+ OpenCL) CPU/FPGA development for Xilinx and Altera)

Maxeler management and compilation tools for Maxeler Data Flow Engines (Tools dedicated to HPCs based on Maxeler hardware)

2.2.3.1.3 Frameworks and Standardization Efforts

This section lists frameworks or standardization actions aiming at making parallel programming on heterogeneous platforms easier or more efficient.

Frameworks or development environments


HSAIL-based ecosystem (HSAIL is an intermediate language to ease compilation and tooling for heterogeneous platforms, defined by the HSA consortium)

Dyplo (From TOPIC embedded products - framework for dynamic process loading and management on CPU/FPGA platforms – currently limited to Xilinx Zynq)

StarPU (Graph based programming environment for CPU/GPU hybrid systems)

PGI Compilers and tools (from Portland – Some of them based on the PGI Accelerator Programming Model – Also supports other standards like MPI and OpenMP – Targets CPU and GPUs)

OmpSs and related software chain (from BSC - OpenMP extensions to support asynchronous parallelism and heterogeneity. BSC provides run-time and compilation software)

COMPSs (from BSC – Programming model for distributed infrastructure such as clusters, grids and clouds. BSC provides run-time software)

Standards, standardization initiatives or organizations

OpenACC (Compiler directives for parallel and heterogeneous platforms)

OpenCL (Open standard for parallel programming of heterogeneous systems)

HSA consortium (Heterogeneous platform architecture, programming language and intermediate language)

PGAS “group” (Partitioned Global Address Space). This organization promotes a programming concept, but it does not seem to have a single implementation.

2.2.3.2 Comparative analysis

The comparative analysis considers, for each framework or standard, the following criteria: whether it is a standard (std) or a software framework (sw); free software available; commercial software available; commercial integration possible (1); embedded targets (3); cluster support; HPC targets; Internet-distributed targets; GPU support; FPGA support (2); asynchronous parallelism support; scheduling and workload management; and energy-aware (4) job scheduling and resource management.

The frameworks and standards compared against the TANGO objectives are: HSAIL (std), Dyplo (sw), StarPU (sw), PGI (sw), OmpSs (sw), COMPSs (sw), OpenACC (std), OpenCL (std) and TANGO (sw).

Each criterion is rated on the following scale: not available; available, foreseen or intended; implicit through a run-time; seems intended for these targets, but does not include explicit support; no information available, not relevant, or possible but not explicitly supported or available.

- (1) This corresponds to licencing terms that allow integration into a propietary product without the needto disclose the code nor to make the application free software. Free software under the GPL is not suitable for integration in propietary software.

- (2) FPGA synthesis software is usually commercial software. Free software frameworks that support FPGA may require somecommercial software being present to enable this support.

- (3) This means that the software is not intended for clusters or HPC centres only. Software that can provide benefits on a low-end PC is considered as suitable for embedded targets.

- (4) This means that the software includes tools or scheduling policies that takes energy consumption into account.

Table 1: Comparison chart among existing platforms and target TANGO objectives

Disclaimer: Purpose of this chart is to show the relevance of TANGO objectives, not to perform a comparison among existing products nor to present these products. The accuracy of this information cannot be guaranteed. Most of this information has been collected from the Internet (it is not collected from the provider nor from the product documentation). Moreover some aspects of the existing products may evolve with time and the information may not reflect the current state of the products.

2.2.4 Market and Business Analysis Conclusions

After the previous analysis of the market of this section (2.2), under the perspective of pure markets trends (2.2.2.1), business-driven trends (2.2.2.2), IT trends (2.2.2.3), and legislation trends (2.2.2.4), we can extract a few final conclusions:

- Energy is a problem, a huge problem. The world has a problem scaling power supply to meet the growth of the digital economy. In datacenters, there is always the option of increasing computing power by adding more cores, servers, or processors, with a significant increase in power consumption. But this implies that there is a maximum capacity, unless you plug in more power into the facility; and properly decrease the heat (which requires more energy); and you have endless available floor space. It seems to create an issue establish since there is not endless energy, floor or air-cooling capacity. Heterogeneous hardware is now seen as the sustainable solution to compete in the exascale race.

- The world is heterogeneous. The market has embraced heterogeneous hardware, and it is getting more and more competitive with players like Altera and Xilinx competing on acceleration performance with different technical approaches on their programing tools, with different strategies for different audiences. Altera’s strategy appears to pursue software engineers, while Xilinx’s strategy appears to pursue hardware engineers. However, what’s more interesting is that the market leader in CPU, Intel,

Page 48: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 48 of 173

has now announced that they plan to produce versions of their Xeon processor with Intel FPGAs inside, and could render both of these approaches obsolete. Heterogeneous solutions are now being adopted as the best affordable solutions to cope with contexts like BigData and HPC.

- There are big, and small things. This competitive market of heterogeneous hardware is also expected to grow, and bring even more players in the equation. It is expected that the IoT will soon be driven by FPGA-like devices. These devices can provide lowest power, lowest latency and best determinism while very easily interfacing with the outside world (temperature, pressure, position, acceleration, analogue-to-digital converters (ADCs), digital-to-analogue converters (DACs), current and voltage, among other sensors). However, power, cost, complexity and limited space are some of the biggest challenges facing IoT implementation on a large scale. The best way to overcome those challenges is to combine multiple functions into a single chip like FPGAs+CPU combined resulting a system-on-a-chip (SoC).

- Programming for heterogeneity is a challenge. This idea is not new, it’s been there for decades, but it needs highly specialized engineers, and time, to create applications of this type. Heterogeneity provides a lot of benefits including FPGAs, GPUs, DSPs, etc. especially the power part. For instance, GPUs provide the same computing power than a CPU and require the same amount of power. And FPGAs shine since they provide the more processing that require less power. But the problem is about programming for this type of devices. And this is when middleware for efficiently and securely abstracting underlying hardware architectures to facilitate the life of developers shine. In this regards, industry vendors have taken the step forward and have been talking for a while about their support for various languages and tools that allows code written in kind of parallel-friendly programming dialects that target each vendor hardware.

- Embracing heterogeneity is a challenge, and an opportunity. The IT industry in general, and more concretely the IC industry, as the supplier side, is interested in growing the production and adoption of new hardware while at the same time investing in developing technologies that help to fully embrace heterogeneous hardware which allow performance improvements and bring energy reductions. In the demand side, the software industry is interested in facilitating the creation and execution of applications that require high computing power; and the datacentre industry is interested in providing and operating infrastructures in a much more optimized, efficient and automated way, requiring technologies that leverage the use under today’s paradigms and trends such as DevOps, Agile, etc. expected by todays organizational needs, facilitating its adoption.

Therefore, as identified by various analysts, having IoT, HPC, datacenter markets and energy issues as some of the main drivers found in the market in the top list of current’s CIO budget spending intentions, TANGO is really well positioned to provide impact by positioning on Simplifying & Optimizing Heterogeneity.

Moreover, TANGO will do so by creating a toolbox that includes runtime, optimization, modeling and development tools to express heterogeneous hardware capabilities, which implies transparent adaptation of application execution upon heterogeneous architectures; and that includes a number of optimized energy efficient techniques for heterogeneous architectures that increases power and computing efficiency of implementations (asynchronism, task management) providing simple deployment and efficient (optimized) use of resources at runtime.

TANGO approach assures addressing these conclusions by providing the means to:

- cope with energy problem providing optimization techniques for application and datacentre operations

- use heterogeneous hardware and architectures for IoT, HPC and datacenters growing markets

- simplify heterogeneity and the use of disparate hardware solutions with the proposed architecture that will provide broader support for a wide range of heterogeneous parallel device resources with the

Page 49: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 49 of 173

abstraction of new available hardware, from small (embedded e.g. for IoT/Embedded/CPS, etc.) to large (HPC datacentre environments) and with varying architectures (SoC, CPU, GPGPU, FPGA, etc.).

- abstract and simplify programming and operation for heterogeneity with the release of a toolbox. Moreover, the reference architecture to be implemented as part of TANGO will go beyond the current state of the art by tackling self-adaptation of both heterogeneous parallel devices and the applications that make use of them using a wider range of optimization criteria (energy consumption, cost, time criticality) while it addresses important aspects such as security and privacy.

- democratize the use of heterogeneous hardware and the creation and operation of applications. Simply put, with these needs and trends, enabling developers with TANGO to create applications in a simplified way, embracing technologies that help them forget about heterogeneity will not require rocket science engineers to create applications. Moreover, it will enable applications to be designed and executed with energy optimization techniques built-in will help better reduce energy constrains.

To cope with these market needs and trends, all these aspects are then translated into what TANGO has identified as the main innovations of the project at this stage, which are later analyzed in full detail in the State of The Art section, in part 2 of this deliverable, which also provides details of the relevance for our project to advance and provide a valuable solution for all these needs and trends with the approach proposed by TANGO that includes:

- Architecture Support for Low Power Computing

- Handling Quality Properties in the Software Development Life Cycle for Customised Low-Power Heterogeneous

- Programming Models and Run-Time Management techniques for Heterogeneous Parallel Architectures

- Modeling tools for prototyping with software emulation/simulation of heterogeneous parallel architectures

- Monitoring of heterogeneous architectures

- Workload management techniques for heterogeneous architectures

- Security and Privacy

Page 50: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 50 of 173

2.3 Market Analysis Conclusions

In this second part, we have presented an initial version of the project vision on the market, providing details of markets involved, its main drivers and trends, and stakeholders that the project will need to monitor during the project life in order to elicit valid outcomes and potential impact routes.

We see that the IT landscape is shaping its future right now and that TANGO is well aligned with current trends and expectations for the future, with a potential market ahead that will develop around the opportunity not only to exploit the capabilities provided by new heterogeneous hardware being applied into novel fields such as IoT, Big-Data HPC and others, but also by influencing with its valuable technologies and development tools. It is worth mentioning the expected influence in domains like IoT, CPS, and Embedded Systems and HPC computing, thanks to its approach that simplifies heterogeneity, and facilitates developers daily tasks, a key aspect used to position the project and to be used by the Dissemination and Communication Team to promote and engage with targeted audiences.

However, the project will put in place a continuous market watch and assessment to monitor relevant changes that could affect our project positioning, and an updated vision of this initial picture will be included in future deliverables.

Page 51: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 51 of 173

Part 3. State of the Art

3.1 Introduction

In this part the state of the art and the progress beyond the state-of-the-art of TANGO is discussed in related thematic areas, thus describing the innovation potential which TANGO represents. It gives a thorough definition of the state of the art, an indication of the envisaged progress beyond the state of the art and the baseline for its research. It also identifies challenges and action items that have to handle throughout the course of the project to achieve the objectives defined in the Description of Work. These tasks and challenges give an insight into the project expected outcomes.

3.2 State of the Art structure

The structure of this part follows the full service lifecycle of an application running on Heterogeneous Parallel Architectures.

There are three fundamental steps in the application lifecycle: development of the application, deployment of the application, and operation of the application. The lifecycle starts when a developer implements an application (using a Programmin Model) and describes the functional and non-functional requirements necessary for deployment and operation of this application. Once the application is deployed and running, Key Performance Indicators (KPIs) including power consumption/energy efficiency are monitored. TANGO architectural components act upon them, which may require self-adaptation (planned in Year 2).

An overview of seven main topics covered by the project is given, each topic is also discussed from the project point of view and scope, which leads to the identification of specific technical requirements to be addressed by the project.

Page 52: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 52 of 173

3.3 Architecture Support for Low Power Computing

3.3.1 State of the Art

There have been several previously proposed architectures to support low power computing, including those in research projects such as: ALMA [11], 2PARMA [2], PEPPHER [3] [4], EXCESS [5], P-SOCRATES [6], FiPS [7] [8], HARPA [9] and ADEPT [1].

The PEPPHER [3], [4] architecture provides a programming framework for C++ applications that targets heterogeneous many-core processors with the aim of ensuring performance and portability. This is achieved by utilising implementations variants for different types of hardware, which are then selected at runtime. Variants can themselves be parallelized in the most suitable framework. The creation of variants was partly automated by libraries created by “expert” programmers with the use of transformation and compilation techniques. The 2PARMA [2] adopted a different approach by using bytecode with a final stage of optimisation to target more specific architectures. It also includes a runtime management component that adapts the code while it is running. An outcome of this project was the Barbeque Open Source Project [10].

It is common for these projects to target specific hardware environments that are heterogeneous. ALMA [11] for example utilises annotated MATLAB and Scilab code, to target two very specific architectures, namely Recode and KIT Kahrisma. The ALMA project resulted in a spin out company Emmtrix whose products generate embedded C/C++ code from MATLAB and Scilab code [12]. The FiPS project [7], like TANGO, recognises the benefits that the convergence of HPC and embedded systems technology can create. Its goal is to improve the power performance ratio within data-centres by integrating FPGAs and other accelerators in high-performance and low-power heterogeneous computing servers, with a focus on the RECS server architecture.

The convergence of HPC and embedded systems has also led to the acceleration of regular HPC systems such as the usage of GPUs. This leads to challenges of restructuring frameworks or individual programs to cope with the parallelism. Glasswing [13] is one such example that is a Map Reduce framework that utilises OpenGL and GPUs for acceleration while simplifying the interface to a simple API, which has not been restricted to device specific issues such as the size of the cores memory. Similar issues apply when considering accelerators as a means of providing low power computing. GPU based processing tends not to consider power, where as other stream processor architectures have considered such issues [14].

Aspects of power and time criticality have also featured in the literature. The P-SOCRATES [2] project examines time-criticality and parallelization challenges for executing workload-intensive applications with real-time requirements on top of commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The ADEPT [1] project developed a tool that guides software developers and help them to model and predict the power consumption and performance of parallel software and hardware.

Time criticality and energy are important in many fields of computing, such as Embedded Systems (ES) and HPC. Several past projects have attempted to bridge the perceived gap between these two fields. The EXCESS project [5] in this regards looked at programming methodologies to drastically simplify the development of energy-aware applications over a range of computing systems whilst considering performance. The HARPA project [9] and its architecture is another example of this with the overall aim of providing efficient mechanisms to offer performance guarantees in the presence of unreliable heterogeneous systems. It provides proactive and reactive adaptive mechanisms, targeting both embedded and HPC-based systems.

Euroserver [15] aims to find power-efficient solutions for the future datacentre. It is addressing these challenges in a holistic manner: from the architecture point of view, investigating the use of state-of-the-art low-power ARM processors, taking into account the memory and I/O, all managed by new systems

Page 53: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 53 of 173

software providing transparent system-wide virtualization and efficient resource use by cloud applications. The project aims to build a prototype that will demonstrate how the proposed approach can lead to 10x DC energy efficiency by 2020.

3.3.2 Relevance for TANGO & Progress beyond the SotA

The reference architecture to be implemented as part of TANGO will go beyond the current state of the art by tackling self-adaptation of both heterogeneous parallel devices and the applications that make use of them using a wider range of optimization criteria (energy consumption, cost, time criticality). Furthermore, the proposed architecture will provide broader support for a wide range of heterogeneous parallel device resources from small (embedded) to large (HPC datacentre environments) and with varying architectures (SoC, CPU, GPGPU, FPGA, etc.). This support will not only be limited to the runtime environment but also filter up the stack to enable device agnostic deployment and provide capabilities to an application developer, through a range of fully integrated software engineering tools (design time modelling, profiling etc.), which are energy aware.

Page 54: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 54 of 173

3.4 Handling Quality Properties in the Software Development Life Cycle for Customised Low-Power Heterogeneous

This section is made of two parts: a generic introduction on the modelling of hardware and software to express requirements that must be met by the mapping, as well as relevant terms of the objective function to optimize. The second part is focusing on how the actual mapping of software onto hardware can be optimized, and focuses on this problem as a scheduling problem. This scheduling can be performed at run time or design time, but this section mainly focuses on the optimization that can take place at design time.

3.4.1 State of the Art

In the last decade, methodologies and technologies to eliminate or reduce hardware lock-in have emerged. Notably, OpenCL has the ambition to facilitate compiling annotated C code to various hardware targets ranging from multi-CPU, CPU+GPUs, FPGA among others [16]. OpenCL specification are led by the Khronos Group, an industry consortium with the largest heterogeneous hardware creators in the World such as Apple, Intel, Qualcomm, Advanced Micro Devices (AMD), Nvidia, Xilinx, Altera, Samsung, Vivante, Imagination Technologies and ARM Holdings. Another actor, namely, the Heterogeneous System Architecture (HAS) Foundation [17] led by AMD with many other hardware and chip makers also contribute to opening hardware by providing open specification for building HAS compatible chips combining CPU with many GPUs based on a shared (or coherent) memory architecture. In parallel, technologies such as Simulink, MatLab and open source alternative like SciLab or Octave have been used to prototype rapidly numerical simulation dear to the eScience and Engineering community as well as signal processing required in all cyber physical systems [18] [19]. Various initiatives have focused on integrating these prototyping technologies to use OpenCL either in a hidden fashion or explicitly by providing new OpenCL commands to be invoked by the users in their models [20] [21] [22].

Independent of the domain, the increasing heterogeneity of infrastructure from set of programmable hardware devices to large heterogeneous clusters offered to users make the task of shaping a software to exploit the capabilities of infrastructure daunting. In particular, when the infrastructure owner must also keep the control of its infrastructure, while several software applications or jobs must be run concurrently. In such a case, the owner of the heterogeneous infrastructure must have the prerogative to freely assign a computing job to one type of hardware or another while on the other end, the owner of software applications or jobs will still expect reliable results and deadlines met while not jeopardising the security of data. Exploiting new hardware under increasingly complexity demands from eScience, Engineering, IoT and Big Data now requires scientists and engineers to rely on proven requirement engineering methods to explicitly model goals and problems being solved as well as the environment in which a solution will be developed. The Requirement Engineering field initiated in the 90’s has developed modelling techniques and tools used successfully for mission critical system development projects, for example, Goal oriented requirement analysis with methods and tools such as KAOS, i* or GRL, problem analysis method such as Problem Frames [23] [24] [25]. These diverse requirement engineering methods and tools used in development project targeting heterogeneous hardware will serve development team expressing explicitly the trade-offs between quality properties to achieve by the system to develop. These quality properties will then guide searching the solution space to find how to best distribute software elements across the different computing hardware to identify the optimal or nearly optimal solution. Researchers have also proposed approaches to relate requirements and software architecture models, in particular, to identify generic related patterns between requirements and software architecture models [26] [27]. However, requirement analysis method and approaches to relate requirement and software architecture did not consider the context of heterogeneous hardware. This adaptation is not trivial as it requires to capture the specification of the heterogeneous hardware environment. Furthermore, when programmable hardware such as FPGA is used then the specification of how to use this programmable hardware is so open that exploring how to best make use of it will require incremental approaches. For example, Simulink or similar modelling languages could help to specify the various hardware layers, their capabilities, constraints

Page 55: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 55 of 173

and existing reusable hardware configuration available to solve various kinds of sub-problems such as fast Fourier transform or other filters. Models specifying the capability of various hardware nodes and of data exchange across node will provide crucial information when searching the solution space. Some frameworks, such as PREESM [28] [29], developed within COMPA project [30] have explicitly targeted the modeling of distributed applications over embedded heterogeneous hardware architectures. The tool allows to simulate and generate code for target hardware, leveraging a S-LAM model (System Level Architecture Model). S-LAM provides a high-level architecture description through modeling of processing nodes, communications nodes and data links. It allows easily identifying bottlenecks (latency, memory) and sub optimal resource loads. However, the tool is designed for a specific type of applications that would comply with the PiSDF (Parameterized and Interfaced Synchronous Datafow) dataflow MoC (Model of Computation). This is typically the case for dataflow oriented signal processing applications. Furthermore, while PREESM allows for automatic mapping and scheduling, it requires the user to provide an input simulation scenario that specifies the mapping constraints, the simulation timings and parameters and to choose the transformations that can be applied. For code generation, the tool rather assembles chunks of manually implemented code that match the “actors” functionalities as modeled in the MoC. There is clearly a need for more developed hardware modeling approaches that allow dealing with more system attributes than only latency and memory, that could manage different levels of abstraction and wider application scope. In addition to language for modelling requirements, design approaches to help system and software analysts move away from the traditional programming approach tailor to sequential single CPU hardware. Although a fully automated program analysis and rewriting to exploit heterogeneous hardware is desirable, the shape and paradigm used for shaping the design model and the source code of a program will heavily influence how easy or hard it may be to exploit certain types of heterogeneous hardware. Programming models envisaged by POLCA [31] will help developers with new annotations to specify potential dataflow parallelism opportunities in their code and subsequently, the POLCA tool chain will transform the code automatically to exploit its parallelism extensively. Other “more” explicit parallel programming approaches, such as FastFlow [32], require user to write a code that complies with predefined set of high-level patterns for parallelism that the framework is able to process and transform toward the supported target platforms. From input code perspective, the various parallel programming approaches would fit for specific developer profiles. However, they rely on tools that are able to handle significant part of the complexity of efficiently adapting code to target infrastructure. Hence, the involved tools should continioulsy address the critical trade-off of being both enough generic and still performant for a wide range of hardware. 
At the moment, new architecture and design approaches appropriate to the new heterogeneous hardware have not been extensively studied for new type of application families tied to the Internet of Things and big data where a multitude and massively distributed set of heterogeneous hardware technologies will need to cooperate efficiently to run complex IoT and Big Data applications of tomorrow.

3.4.2 Relevance for TANGO & Progress beyond the SotA

The requirement modelling technique explored in TANGO will enable to express variability on quality properties trade-offs and identify requirement patterns on these trade-offs in relation to contexts of various application families. Furthermore, an incremental approach to architecture specification will be used for modelling hardware though developing software design models. This will facilitate the exploration of solution space in heterogeneous hardware environments, enabling development teams without knowledge or extensive expertise on how to program such environments. The modelling technique explored in TANGO will also study patterns to express requirement variability and architecture patterns to allow for self-adaptation and re-configuration at runtime to exploit the full potential of the underlying heterogeneous hardware potentially executing a variable number of applications at the same time.

Page 56: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 56 of 173

3.4.3 Optimizing the scheduling of software on hardware

Optimizing the allocation of software onto large heterogeneous hardware is a well-known problem. It has been encountered in the context of embedded systems as well as in the context of large computer farms running software batches.

Various techniques can be deployed to optimize this mapping, depending on the available information, and resulting in various trade-offs involving notably the run time of the optimization process, and the quality of the delivered mapping.

The run time of the optimization process notably determines whether an optimization can be performed on-line (during the execution of the considered software) or off-line (before the execution of the considered software).

3.4.3.1 Requirements

We propose here below to review the classes of non-functional requirements that can be encountered on such optimization. These are focusing on the context of the mapping, and not so much on the actual mathematical problem:

Structure and granularity of the considered software

Information on the task resources and duration

Complexity of the optimization problem

Delay and lifetime of the optimization solution

3.4.3.1.1 Structure, and granularity of the considered software Software can be structured in various ways, notably depending on the algorithm they implement, and based on the granularity of the considered model. Three structure of such software are:

Task models where the software is divided into a set of tasks, and precedence constraints between

them. Typically, all communication of tasks is performed at startup (input) and on completion

(output) so that such communication constraints can be captured by precedence constraints.

Petri nets that similarly to task model capture tasks and precedence, but support the notion of cycles

in the task graph.

Communicating permanent tasks that perform communication with other tasks throughout their

execution. In such model, the notion of task is not the same as in the above models, and some

mapping between the two models can be found, with a change in granularity: communicating tasks

are expanded into a set of little tasks that are structured into e.g. some task model presented

above.

Task models are suitable to perform offline optimization, since they do not incorporate cycles, so that a single schedule specifies only the start date of all task, possibly as condition on the termination of other tasks.

Petri nets are slightly more difficult because they incorporate cycles, which are potentially unbounded in time and in number of tasks that can be executed in parallel, so that the configuration of task to execute at some point can differ a lot from the configuration encountered at another point of the execution, although they are the same tasks. The boundedness of petri net is a decidable property that can be assessed through static analysis.

Communicating tasks is a model that does not allow any time-related optimization because tasks are permanently executing throughout the run of the software. The optimization of such model is more related to the choice of the appropriate hardware, and the reduction of the network overhead by locating tasks that communicate together on hardware nodes that can communicate easily with one another.

Page 57: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 57 of 173

3.4.3.1.2 Additional aspects of the software and additional objectives Software can exhibit additional requirements. Capturing them into the optimized model can improve the quality of the resulting real execution, but might also increase the complexity of the optimization. These other aspects include notably:

Additional decision variables such as hybrid hardware, or frequency scaling

Data transfer cost, at the beginning and end of the execution

Configuration cost of hardware nodes, beyond data transfer

Sharing of the considered hardware platform among several software application that execute

concurrently, and compete for some resource usage

Deadline on some task execution

Other objective functions such as energy consumption and cost of the mapping (depending on how

the cost is to be computed, actually)

Other constraints, such as security of some communication, preventing some data to transit between

some nodes, or security concerns of some computation node that cannot execute some tasks even

though they are able to do so

3.4.3.1.3 Precision of the information on the task resources and duration The typical information that is needed about tasks is their resource usage, notably their memory, their duration, and the amount of data they will input and output. Besides, some technical information that restrains the type of processing unit is supposed to be known in advance.

The duration of the task is a critical piece of information to perform any optimization of the scheduling. When this information is perfectly known, a schedule can be computed and executed as is. When this information is known with less precision, a static schedule might lead to suboptimal result when executed as is.

Imprecisions can be accounted for in two ways. A first option is to incorporate information about this imprecision in the computing of the schedule, so that a robust schedule can be computed, that might lead to better overall result upon execution, compared to a non-robust one. A second option is to consider online optimization of the schedule, where the schedule is computed gradually throughout its execution, so that it can be adjusted to the actual situation once it is known with more precision. Of course, the first option of robust schedule can be incorporated when performing on-line scheduling.

The higher is the precision on the task model available offline, the better quality will be reached upon execution of an offline-generated schedule, and so the more relevant it is to produce off-line a high quality schedule. The higher the imprecision, the less opportune it is to focus on producting offline a high quality schedule and the better it is to focus on efficient handling on the imprecision, notably through online scheduling. Besides this the runtime of the scheduling algorithms must also be considered when selecting the scheduling stratergy.

3.4.3.1.4 Complexity of the optimization problem Given an optimization problem, some instances are more difficult to optimize properly than others.

For instance, the Job-Shop scheduling problem is a well-studied optimization problem, with standard benchmarks. The MT10 instance is a rather small one, but that turned out to be especially difficult to optimize, probably because the duration of the tasks in this instance exhibit very different order of magnitudes.

The size of the optimization problem can also influence on the choice of the selected optimization algorithm. Typically, the choice starts with constraint programming for the smallest instances, then passes to local search (e.g. iFlatRelax) and large neighborhood search, and ends up with greedy approaches for the larger instances.

Page 58: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 58 of 173

3.4.3.1.5 Delay and lifetime of the optimization solution All optimizer implement a trade-off between the run time of the solver and the quality of the delivered solution. Typically, if more time can be invested in the computation of a solution, the better the solution will be, possible by using another optimization algorithm. If delay is a critical apsect, one must therefore ensure that the selected algorithm provides an accepteabe answer within the accepteable delay. Also, some agorithms might descend faster than others at the beginning or might have difficulties in improving their solution if they are given more time. A timing analysis or a calibration might therefore be necessary before selecting one of them.

3.4.3.2 Approaches for solving scheduling problems

Approaches for solving scheduling problems can roughly be divided into three classes, namely: greedy approaches, meta-heuristic approaches, and exact approaches.

Backfilling is a greedy approach. It is an extension of a constructive strategy such as first-come-first-serve, where holes left by the constructive strategy can be filled by new jobs, as long as they fit within the size of the hole. Various tunings are possible, related to the constructive part, as well as related the strategy to fill the holes left by the initial construction. It can not only be used as a static scheduling, but also as an online scheduling strategy, to react to imprecision on the run time of task.

Iterative flattening and relaxation is a well-known meta-heuristic procedure for solving JobShop scheduling problems. JobShop scheduling problem is the fundamental scheduling problem supporting the notion of tasks, with known duration, and precedence constraints. The goal is to minimize the end time of the last executed task. Tasks use a known amount of resource, and release them on completion. The iterative flattening and relaxation algorithm proceeds by repeatedly performing a flattening, and a relaxation. The flattening introduces additional precedence constraints, in order to resolve resource constraints. Once all such conflicts are resolved, the schedule is so-called feasible. The relaxation proceeds by removing some of the added precedence constraints that are found one the critical path. The initial concept was proposed in [33], improved by performing several relaxations in a row in [34], and various improvements have been proposed around this procedure, notably by introducing some taboo on the precedence, to force the exploration of alternative schedules [35].

Flexible JobShop is a more comprehensive formulation of the JobShop problem where additional decision variables are introduced, and must be handled by the optimization algorithm as well. A notable extension is to add the possibility to select the most appropriate resource for each task among a set of given ones. In the context of TANGO, this would allow introducing the notion of hybrid hardware. Such optimization problems are still actively researched, but some solutions have been proposed, notably based on meta-heuristic approaches that were proved working for the simple JobShop problem [36].

Constraint programming is an exact approach to perform combinatorial optimization, including scheduling. Recent techniques such as edge finding [37] have proven useful in the context of scheduling. Constraint programming per se does however not scale to very large problem size, as it remains an exhaustive search paradigm that will –in the worst case- explore the full solution space. While constraint programming is an exhaustive search, it can also be used in a local search fashion, by performing exhaustive search on fractions of a given solution, the rest being frozen. This process can be repeated, and give rises to the concept of large neighborhood search.

Optimization problems close to the one we might encounter in the TANGO project are subject to research, as notably denoted by the ROADEF challenge 2012 [38].

3.4.3.3 Dealing with on-line optimization

On-line optimization is when decisions must be taken in real-time, notably to mitigate the lack of information about run-time, approximates made in the optimization models or other factors such as the other executing application competing for the same resources.

Page 59: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 59 of 173

These optimizations generally proceed with greedy approaches that exhibit very short response time, such as variants of the backfilling approach [39] [40].

Some approach have been developed to enable the use of time-consuming optimization algorithm for scheduling processes at run time, by performing multiple optimizations off-line, and selecting the appropriate optimized schedule at run time. In the context of embedded multi-core heterogeneous systems, [41] and proceeds with an off-line scheduling phase that delivers a set of Pareto-optimal schedules for each application. The final schedule is then selected on-line for each running application, and tasks are then executed. It is not clear at first reading if such approach can be transposed to other contexts such as computer farms.

For the sake of completeness, specific online algorithms are available for taking optimized decisions under incomplete information, such as Monte-Carlo tree search [42], or regret algorithm [43]. However, the response time of these algorithms must be checked for compatibility with the context of TANGO.

Page 60: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 60 of 173

3.5 Programming Models and Run-Time Management techniques for Heterogeneous Parallel Architectures

3.5.1 State of the Art

Complexity of computing architectures has been growing in recent years. The fact of the appearance of multi and many-core architectures has forced programmers to take parallelism into account in their applications. Several language extensions, such as Unified Parallel C [44]or Cilk [45], and programming models or APIs, such as OpenMP [46], have been proposed to overcome this parallelism complexity. In the case of distributed memory architectures, developers have the Message Passing Interface Standard (MPI) [47], or Partitioned Global Address Space (PGAS) [48].

Another grade of complexity is because of the appearance of heterogeneity in the computing nodes. This heterogeneity can be in the form of different architectures and programming APIs (like in the case of GPUs and FPGAs) or by the fact that processors of the same architecture but of different capacities are found on the same chip. Examples of the latter are the big.LITTLE architectures where more powerful processors, and also more energy hungry, are paired with less powerful ones in the same chip. The appearance of the GPUs combined with general-purpose processors has had a significant impact of how applications are developed. While GPUs contribute to high performance capability, they need to be programmed with different APIs. NVIDIA introduced the CUDA API [49] to facilitate the programming of the GPU devices. While relatively easy to use, CUDA imposes to the programmer the responsibility of handling several operations such as allocation of memory in the device, transfer of data between the host memory and the device memory, scheduling of the computations in the device, synchronization, etc. which represent an additional effort and make codes less interoperable and less focused in the algorithm that the application wants to solve.

With FPGAs it can be even more difficult, since the actual computation of the software needs to be compiled into the hardware that will be configured in the device. Despite some vendors provide tools and libraries to facilitate the compilation and configuration of the device, such as the Xilinx SDSoC Development Environment [50], the programmer still has to manually manage the integration and configuration process which is extremely complex to non-specialized programmers. The extra complexity for the programmer introduced by new architectures depends on the type heterogeneity, for instance, when the heterogeneity does not affect the Instruction Set Architecure (ISA) such as big.LITTLE processors, the programming complexity is relatively low compared with the GPU and FPGA examples described before.

Efforts to reduce the complexity of programming these heterogeneous nodes, as well as, providing portability between architectures, have been topics of research during the last years. OpenCL [51] was born with the ambition of providing a common programming interface for heterogeneous devices (including not only GPUs, but also DSPs and FPGAs). With syntax very based on C, it has had a significant impact because the same code could be used in several accelerators. However, similar to CUDA, it requires the programmer to write specific code for the device handling, which reduces programmability. OpenACC [52] is another example of programming standard for parallel computing designed to simplify parallel programming of heterogeneous CPU/GPU systems. Based on directives, the programmer can annotate the code to indicate those parts that should be run in the heterogeneous device. The OpenMP standard tackles the programmability issues in a similar way as OpenACC with regard the heterogeneous devices and also considers many other aspects of parallelism which makes it a stronger option. OmpSs [53], a task based programming model developed by BSC, has been pushing ideas in the OpenMP standard a long time, proposing features that are now in the OpenMP standard such as the tasking model, task dependences and specific accelerators directives. OmpSs promotes both programmability and portability of codes by hiding the details of the architecture to the programmer (i.e., aspects such as allocation of memory in the device or data transfers are performed automatically by the runtime) at the same time that parallelism of the architectures is inherently exploited by analysing the data dependencies between tasks. Finally, the EU POLCA project also proposed a programming model [54] to manage heterogeneity which is following a pragma-based API which has similarities with the OpenMP or openACC ones.

Page 61: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 61 of 173

Other proposals are based on programming frameworks which proposed templates/skeleton libraries to access the underlying computing hardware. StarPU [55] [56] is one example of such a framework which provides skeletons to manage the issues of sending data to the different types of processing units, on a multicore architecture. SkePU [57] is a C++ template library with multiple implementations for each hardware environment offer useful abstractions and common parallel features such as: map, farm, scan and reduce. Similar template libraries such as Fastflow [58], and Nvidia Thrust [59] can also be found in the literature. These libraries can be at various levels of abstraction. For instance, FastFlow provides a set of ready-to-use parametric algorithmic skeletons modelling the most common parallelism exploitation pattern. Frameworks such as PEPPHER [60], Kessler and Lowe [61], Merge Framework [62] and the EXOCHI complier [63] aims to address the issues of selecting appropriate implementation variants and providing programming abstractions to ease the usage of the underlying hardware. This can include providing fat binaries in the case of the EXOCHI complier or provide the ability to specify execution context restrictions such as the Merge Framework.

Heterogeneous environments offer the opportunity to offload work from the main CPU to others processing hardware. This can be easily achieved through the use of dedicated libraries offering some generic primitives. One of these options is the hStreams proposed by Intel which is included in the Manycore Platform Software Stack [64]. hStreams is a library that provides a streaming abstraction for mapping user-defined tasks to heterogeneous platforms. It provides the plumbing to support remote memory management, data movement, remote invocation and synchronization. Another alternative is to rely on alternative approaches to merely providing libraries includes language extensions that hide the low abstraction levels of OpenCL. Such language extensions include Offload C++ [65] [66] that allows for explicit compilation and offloading to GPUs and other accelerators such as Cell processors.

Regarding energy efficiency, in the literature we can find different techniques to reduce the consumption of running application. It includes the usage of low-power processor architectures or Dynamic Voltage Frequency Scaling (DVFS), re-design of algorithms using energy efficient patterns in compilers or apply runtime scheduling techniques for allocating task-based applications on the available resources.

The state of the art in low power architectures has already treated in section 3.3, however a hardware feature which is commonly used at runtime to reduce energy consumption is the Dynamic Voltage Frequency Scaling (DVFS). By using DVFS, processors can run at different voltage, impacting on the frequency and energy consumption. In [67], [68] and [69], authors present different techniques to reduce power consumption by decreasing voltage in resources which are not executing the application critical path, reducing the energy consumption without affecting too much in the application performance.

DVFS can be used by the cluster scheduler without knowing the workload of an application (the frequency is dynamically adapted to the load on each processor); however, the energy savings of such black-box approaches are limited (a review [70] reports energy savings on NAS Parallel Benchmark of up to 20-25% with roughly 3-5% performance degradation). Other methods complement DVFS. A simple, system-level technique is to switch off unused processors—here the key problem is the energy needed to switch the processor back on [71].

Rountree et al. [72] developed a performance prediction model outperforming previous models. Etinski et al. [73] studied how to improve the trade-off energy versus completion time on applications. Freeh et al. [74] provided a huge number of experiments for measuring the Energy over Time.

A new tool to improve energy efficiency of applications during runtime is currently under development by Intel and called GEOpm [75] [76]. It is an open source, scalable, extensible runtime and framework for power management in HPC systems developed for Intel processors. The tool has very interesting features such as machine learning techniques to adapt energy consumption based on application phases during runtime. It certainly worths our evaluations to see if and how it could be adopted within TANGO.

Page 62: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 62 of 173

Compiler optimizations are usually applied to obtain a better execution performance, however with the appearance of battery operated embedded devices, the compiler research have also focused on applying optimizations to reduce energy consumption. In [77], [78] and [79], different loop optimization and register re-labelling techniques, used previously for gaining performance, are analysed from the point of view of energy consumption in memory system and cores. A compiler could also change the instruction-scheduling algorithm in Very Long Instruction Word (VLIW) architectures to be energy-aware, as done in [80], or reduce data cache energy through compiler techniques, as presented in [81]. In mobile environments, research has been conducted to discuss the results of an energy performance evaluation between the Sun and Android Dalvik JVMs, presenting interesting findings on the implications of dynamic (JIT) compilation [82].

Finally, another topic where energy consumption can be optimized is in the programming model runtime. Parallel programs can be broken into separate tasks, as part of a task graph structure. The different task-based programming model runtimes, such as OmpSs, also implement different techniques to manage and efficiently execute this task graphs to the underlying processing units. The scheduling performed by these runtimes has big effect on the performance and energy consumed by the application execution as studied in [83] and [84].

3.5.2 Relevance for TANGO & Progress beyond the SotA

Nowadays, production computing nodes are in most cases heterogeneous. This means that in the same node general purpose processors are placed together with GPUs and or FPGAs. While the previous approaches consider some aspects of programmability and portability between different heterogeneous processing units, the goal of programming for heterogeneous platforms is still in early stages.

In TANGO, we will use a combination of StartSs models (OmpSs and PyCOMPSs/COMPSs [85] a sibling of OmpSs programming model that targets distributed platforms), to provide users a high-productive programming model to develop application for clusters of heterogeneous parallel nodes. We will extend the features of current programming models to better integrate heterogeneity aspects without reducing programmability and portability.

Regarding power efficiency in most cases it is not considered explicitly beyond the fact that “if you execute faster, you will consume less”. In the project ASCETIC [86], COMPSs is extended to be aware of power efficiency but only for cloud computing platforms [87]. In TANGO, we will port this experience to the heterogeneous parallel architectures, where heterogeneity in the compute units introduces an extra variable in the optimization of energy consumption. The programming models will be augmented by enabling application self-adaptation in cooperation with infrastructure self-adaptation to further optimise overall performance on time and energy without impacting other quality properties beyond targeted level.

3.6 Modelling tools for prototyping with software emulation/simulation of heterogeneous parallel architectures

3.6.1 State of the Art

Heterogeneous parallel architectures provide large potential energy-efficiency gains, in addition to opportunities to speed up predominantly sequential applications. The key to taking effective advantage of such heterogeneous architectures lies in the ability to model and simulate the underlying hardware for the purpose of fast application prototyping [88]. Several projects have tackled the technical and scientific challenges involved in effective modelling of heterogeneous parallel architectures for the purpose of increasing the speed at which an application can be prototyped. The ALMA project [11] aims to enable the efficient programming of embedded reconfigurable multi-core machines through a tool chain whose final tool enables binaries to be executed on a simulated multi-core machine [89]. Insight into the performance of the binary is fed back from the simulator to other tools to extract further parallelism. However, this project did not tackle the optimization of energy consumption. The PEPPHER project [3], devised as a unified framework for programming and optimizing applications for architecturally diverse heterogeneous many-core processors, tries to ensure performance portability. This is achieved with the aid of the PeppherSim component from the project's reference architecture. This simulation component enables the exploration of application performance and scheduling before an application is deployed to a real system. Similarly to ALMA, PEPPHER did not tackle the optimization of application energy consumption. However, the project did consider support for energy consumption metrics on a per-heterogeneous-parallel-architecture basis, but left the implementation as future work that has yet to materialise (the project ended in 2012).

Within the literature, several key papers discuss the problems of modelling and simulating heterogeneous parallel architectures. These are broken up into two specific problem domains: simulators that support array-style SIMD cores for throughput-oriented workloads, and those that do not and are generally latency-oriented [90]. There are several simulators that model traditional individual out-of-order (OOO) cores and simultaneous multithreaded (SMT) cores. Two well-known simulators that model OOO and SMT cores are SimpleScalar [91] and SMTSIM [92]. With the move to more modern architectures that employ heterogeneous chip multiprocessors (CMPs), several other simulators have been released. These include PTLsim [93], Sesc [94], Simics [95], Gems [96] and SimFlex [97]. In the other problem domain, there are a number of GPU simulators that support the modelling of SIMD cores, some good examples of which are GPGPUsim [98], Qsilver [99] and Attila [100]. SimGrid [101] aims at simulating a parallel application on a distributed grid environment. It is an active and flexible open-source instrument that has been extended to cover various types of platforms and applications. SimGrid's authors claim a large application field, covering any application distributed at large scale (cloud, P2P, MPI, etc.).

In order to tackle the problem of emulating a heterogeneous environment, Canon et al. designed the Wrekavoc [102] system. The main objective is to have a configurable environment that allows reproducible experiments on a large set of configurations using real applications. Wrekavoc degrades the performance of nodes and network links independently in order to build a new heterogeneous cluster. Then, any application can be run on this new cluster without modifications. However, none of the above supports energy consumption metrics. A different kind of emulation is used in [102], where the authors emulate a petaflop machine of 5040 nodes using a small number of physical machines through the multiple-slurmd emulation technique provided by SLURM. The multiple-slurmd technique bypasses the typical functioning model of SLURM and allows the execution of multiple compute daemons (slurmd) on the same physical machine (same IP address) but on different communication ports. This technique supports the heterogeneity of resources, as shown in [102].

Georgiou et al. extended the same technique with energy consumption accounting per job emulations [103].

3.6.2 Relevance for TANGO & Progress beyond the SotA

Two previous projects have made use of simulation, but neither prioritised the reduction of application energy consumption on heterogeneous parallel architectures. Additionally, from the above literature review it can be seen that there is scope for researching the development of a unified heterogeneous hardware simulator that can accommodate both traditional parallel architectures, such as x86, and newer SIMD accelerators, such as GPUs, as presented in [104], with the aim of optimizing not only application performance but also energy consumption. Extensions in SLURM will allow finer-grained power consumption profiling based on the monitoring of different heterogeneous resources per job.

3.7 Monitoring of heterogeneous architectures

In this section we detail different ways to access performance indicators and energy consumption measurements in different types of heterogeneous hardware architectures: CPUs, GPUs and FPGAs.

3.7.1 State of the Art

3.7.1.1 Hardware Counters

Hardware counters, or hardware performance counters, are special registers present in CPUs/GPUs/FPGAs that are programmatically accessible and count hardware-related activities (such as cache misses or the usage of floating-point units). The main usage of this type of counter is for low-level performance analysis of microprocessors. Counters of this kind first appeared in the original Intel Pentium processor, although they had to be reverse engineered since their presence was not documented [105].

Libraries to access hardware counters

The Performance Application Programming Interface (PAPI) [106] [107] is an open-source project that aims to specify a standard application programming interface (API) for accessing hardware performance counters. In its nearly 16 years of existence, it has become the most widely used API for accessing hardware counters on the majority of microprocessor platforms. The events that can be monitored cover a wide range of architectural features, from floating-point or integer operations to cache misses. Its major advantage is that it offers a standard API to access hardware performance counters independently of platform and operating system.

With the popularization of heterogeneous architectures, such as multicore CPUs, GPUs, etc., PAPI evolved into Component PAPI, or PAPI-C [108]. This allows PAPI to access hardware counters in domains beyond the traditional single-core CPU architecture. PAPI-C supports multicore CPUs, but also other, non-traditional devices that incorporate hardware counters, such as GPUs and network or I/O devices. This also has the benefit of allowing users to monitor synchronization and data exchange between computing elements.

PAPI is nowadays supported on nearly all architectures, from Intel, AMD and ARM multicore processors, through NVIDIA GPUs [111] [112], to more exotic architectures such as the Intel Xeon Phi [109] and IBM Blue Gene/Q [110].
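As a concrete illustration of the API, the following is a minimal sketch of reading two PAPI preset counters around a region of interest (the preset events actually available depend on the platform; error handling is omitted for brevity):

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_INS);   /* total instructions completed */
        PAPI_add_event(eventset, PAPI_L2_DCM);    /* level-2 data cache misses */

        PAPI_start(eventset);
        /* ... code region under measurement ... */
        PAPI_stop(eventset, counts);

        printf("instructions: %lld, L2 data cache misses: %lld\n",
               counts[0], counts[1]);
        return 0;
    }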

3.7.1.2 FPGA Monitoring

Some Xilinx FPGAs provide a System Monitor port which, similarly to the hardware counters described above for CPUs and GPUs, can be accessed to inspect the status of the FPGA [113]. This port gives access to metrics collected from the different sensors in the FPGA, ranging from input voltage and temperature to reconfiguration status.

For some of its FPGAs, Altera offers the possibility of performance analysis using the ARM DS-5 software [114]. This is mainly available for FPGAs that internally incorporate an ARM processor.

3.7.1.3 Energy measurements

In the following subsections we describe several ways to measure energy in processors and accelerators.

Energy measurements via external devices

Concerning monitoring, the best approach to accurately track power and energy consumption data in HPC platforms would be to use dedicated devices, such as power meters that collect the power of the whole node, or devices like PowerInsight [115] and PowerPack [116] for the power consumption of a node's individual components. PowerPack [116] consists of hardware sensors for power data collection and software components that use those data for power profiling and code synchronization. It employs external Watt's Up Pro power meters that send the energy consumption of CPUs, PCI, memory, etc. to an external controller. That information must then be collected by the runtime application if it is to be used for live decision making.

IBM provides its PowerExecutive Toolkit to access the proprietary hardware measurement devices included in its blade servers [117].

Unlike the processor families discussed in the next subsection, the ARM family, at least as far as we could find, does not incorporate internal energy measurement devices, so energy must be measured externally. It is possible to place external monitors on the same board as the ARM processor and read them from the running system [118]. ARM also provides an ARM Energy Probe [119] to be used together with its ARM DS-5 Development Studio [120].

The energy consumption of the CPU and of each individual component can be approximated through models [73], [121]. It is also possible to measure GPU power consumption [122]. External power meters can monitor whole nodes (and also other equipment like switches or routers): for instance, [123] describes a deployment of power meters on three clusters, and [124] proposes a software framework that integrates power meters in datacenters. Nevertheless, introducing power meters on large HPC clusters would be too expensive in terms of both money and additional energy consumption; exploiting built-in interfaces is therefore the most viable approach.

Indeed, modern hardware provides various interfaces for monitoring energy consumption. Standards include the Intelligent Platform Management Interface (IPMI) [125], which uses a specialized controller (the Baseboard Management Controller, BMC) to monitor various hardware parameters of a system, including power.

Energy measurements internal to the processor

To measure energy from inside the different heterogeneous architectures, we need to understand what information is available from them. In this subsection we detail the information that can be extracted from different common and less common processors:

Intel RAPL [126] – With the introduction of the Intel Sandy Bridge processors, Intel included the "Running Average Power Limit" (RAPL) interface. The main idea of the interface is to keep the energy consumption of a processor within a limit specified by the user. The processors are able to estimate their energy usage via internal models [127] and make this information available in counters. RAPL provides different energy readings, for the whole processor package and also for all the cores (PP0). Some processors additionally report the energy consumed by the integrated GPU (PP1), and some expose a separate reading for the DRAM package. A minimal userspace read of these counters is sketched after this overview.

AMD Power Management [128] – Since the introduction of its family 15h processors, AMD provides a counter to access the actual power consumption, in watts, of the whole package.

NVIDIA Management Library [129] – For some GPUs, it is possible via the NVIDIA Management Library (NVML) to query the power usage of the whole GPU card (this also includes the power drawn by the memory). NVML reports the power usage in milliwatts, with a stated accuracy of around 5 watts.

Intel Xeon Phi [130] – The Intel Xeon Phi onboard System Management Controller (SMC) exposes, via a command-line tool or through the sysfs of the operating system, the instantaneous power usage of the whole coprocessor package.
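As an illustration of how such built-in counters can be consumed from userspace, the sketch below reads the RAPL package energy counter exposed by the Linux powercap framework (assumptions: the intel_rapl kernel driver is loaded and package domain 0 appears as intel-rapl:0; error handling and counter wrap-around are ignored for brevity):

    #include <stdio.h>
    #include <unistd.h>

    /* Read a cumulative energy value (microjoules) from a powercap sysfs file. */
    static long long read_energy_uj(const char *path)
    {
        long long uj = -1;
        FILE *f = fopen(path, "r");
        if (f) { fscanf(f, "%lld", &uj); fclose(f); }
        return uj;
    }

    int main(void)
    {
        const char *pkg = "/sys/class/powercap/intel-rapl:0/energy_uj";
        long long e0 = read_energy_uj(pkg);
        sleep(1);                             /* one-second measurement window */
        long long e1 = read_energy_uj(pkg);
        /* microjoules consumed over one second give average watts */
        printf("average package power: %.3f W\n", (e1 - e0) / 1e6);
        return 0;
    }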

PAPI Energy measurements

With the introduction of PAPI 5 [131], one of the main focuses of the library became support for energy readings from different computational devices. It supports the Intel RAPL [126] interface (the MSR driver must be configured at the Linux kernel level to expose those registers to userspace so that PAPI can access them) [132] [133]. PAPI also supports reading NVIDIA GPU power consumption via the NVML [129] API. Finally, for Xeon Phi coprocessors, PAPI is able to access their instantaneous power reports.
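For completeness, the power reading that PAPI obtains from NVIDIA GPUs can also be queried directly through NVML; a minimal sketch (assuming an NVIDIA GPU with the NVML library installed, linked with -lnvidia-ml) might look as follows:

    #include <stdio.h>
    #include <nvml.h>

    int main(void)
    {
        nvmlDevice_t dev;
        unsigned int milliwatts;

        if (nvmlInit() != NVML_SUCCESS)
            return 1;
        nvmlDeviceGetHandleByIndex(0, &dev);         /* first GPU in the system */
        nvmlDeviceGetPowerUsage(dev, &milliwatts);   /* instantaneous board power */
        printf("GPU power draw: %.3f W\n", milliwatts / 1000.0);
        nvmlShutdown();
        return 0;
    }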

Power and Energy consumption per job

Monitoring the energy of a platform provides general knowledge about the consumption of the system overall. However, to go further in our understanding of energy consumption per application, we need to attribute the measured energy to the executed jobs. This allows us to better analyse the internals of an application in terms of energy and enables optimizations.
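In its simplest form, such per-job attribution integrates the power samples collected on a node while the job was running; the sketch below (our own illustrative example, not a tool described in the cited works) uses trapezoidal integration over samples taken at a fixed interval:

    #include <stddef.h>

    /* Estimate a job's energy (joules) from node power samples power_w[0..n-1],
     * taken every dt seconds while the job occupied the node. */
    double job_energy_joules(const double *power_w, size_t n, double dt)
    {
        double energy = 0.0;
        for (size_t i = 1; i < n; i++)
            energy += 0.5 * (power_w[i - 1] + power_w[i]) * dt;
        return energy;   /* divide by 3.6e6 to convert to kWh */
    }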

Various Resource and Job Management Systems, such as LoadLeveler and LSF, provide power and energy monitoring, and Hennecke et al. present the retrieval of power consumption data from the IBM Blue Gene/P supercomputer [134] through a particular LoadLeveler tool called LLview. In addition, LLview can provide power consumption data on a per-job basis at high precision, with about 4 samples per second. Nevertheless, this tool is closed source, was developed specifically for Blue Gene/P platforms, and it is not well documented how the 4 samples per second are retrieved.

Georgiou et al. have extended the Resource and Job Management System (SLURM) to provide functionalities that enable power monitoring per node along with energy accounting and power profiling per job [135].

The work presented in [135] demonstrated sampling at 1 Hz with decent precision and low overhead. Hackenberg et al. went even further in [136], providing improvements to the per-job energy calculation based on internal BMC sampling at 4 Hz, with optimized accuracy and even lower overhead. This infrastructure is based on temporal resolution optimizations through internal BMC polling and querying via IPMI.

Furthermore, the HDEEM framework [136] proposes FPGA-based high-resolution measurement techniques that will radically improve the quality of system monitoring and software profiling.

3.7.2 Relevance for TANGO & Progress beyond the SotA

The objective of TANGO is to optimize, at runtime, software running on heterogeneous architectures. To do so, it is necessary to precisely measure the status and performance of the different devices, as well as their energy consumption. The previous section collects information about how such devices can be monitored today. Depending on the platforms selected, it must be understood how that information can be gathered by the runtime environment and stored in a monitoring architecture for later analysis.

TANGO will continue to build on the emerging tools that monitor energy consumption in heterogeneous devices and will advance the current SotA by using that information in a runtime environment, rather than only in the typical post-mortem analysis of application performance.

3.8 Workload management techniques for heterogeneous architectures

3.8.1 State of the Art

Resource management and job scheduling in traditional HPC systems is performed by specialized software called the Resource and Job Management System (RJMS). This software holds an important position in the HPC stack, since it stands between the user workloads (jobs) and the hardware platform (resources). It is responsible for delivering computing power to applications efficiently.

More than two decades of research and development in the field have resulted in the various open-source and proprietary RJMS implementations that exist today, offering basic and advanced functionalities to deal with specialized HPC platforms and workloads.

Nowadays, various systems exist, either as evolutions of older software (e.g. PBSPro or LSF) or as new designs (e.g. OAR and SLURM). Commercial systems like LSF [133] [137], LoadLeveler [138], PBSPro [139] and Moab [140] generally support a large number of architecture platforms and operating systems, provide highly developed graphical interfaces for visualization, monitoring and transparency of usage, and offer good support for interfacing standards such as parallel libraries, Grids and Clouds. On the other hand, their open-source alternatives, like Condor [141], OAR [142] and SLURM [143], provide more innovation and flexibility than the commercial solutions. These open-source projects, and especially SLURM, are used as tools for research in resource management and scheduling by various teams. Taking advantage of large open-source communities, new algorithms and features are implemented, experimented with and put into production usually faster than in proprietary RJMS. Furthermore, since these tools are more recent than the commercial solutions, they have been designed from scratch with flexibility and scalability in mind.

Since 2010, new-generation schedulers have started to appear: Mesos [145], Yarn [146], Omega [84] and Fuxi [147] can execute both compute- and data-intensive workloads based on new types of internal architectures that try to deal with scalability, efficiency and fault-tolerance issues. In this group we can also add Flux [148], which is currently under active development and destined for extreme-scale HPC systems.

To the best of our knowledge, none of the above RJMS provides support for heterogeneous resource allocations tightly integrated with MPI.

3.8.2 Relevance for TANGO & Progress beyond the SotA

The TANGO workload and resource management layer will be enhanced through new developments in SLURM, which will make this software more flexible and allow the allocation of heterogeneous resources to jobs through a tight integration with the Message Passing Interface (MPI). Furthermore, it will enable self-adaptive, automatic workload execution on the best-fitting heterogeneous resources, along with finer-grained power profiling and energy accounting per job.
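As a point of reference for these extensions, heterogeneous resources are today typically requested from SLURM through generic resources (GRES); the batch script below is an illustrative sketch only (the available GRES types, partition names and option defaults depend on the site's SLURM configuration):

    #!/bin/bash
    #SBATCH --job-name=hetero-demo
    #SBATCH --nodes=2                 # two heterogeneous compute nodes
    #SBATCH --ntasks-per-node=4       # four MPI ranks per node
    #SBATCH --gres=gpu:2              # request two GPUs on each node
    #SBATCH --time=00:10:00

    srun ./mpi_application            # ranks use the GPUs granted via GRES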

3.9 Other: Distributed systems, security, networking, and Data Management

3.9.1 Security

3.9.1.1 State of the Art

The use of heterogeneous parallel architectures and software platforms offers significant promise in terms of increasing performance, lowering cost, reducing energy consumption and boosting security.

The security market is evolving alongside new requirements arising from the Internet of Things, cloud computing and sophisticated targeted attacks. The IoT, for example, adds new security dimensions to consider: an insecure connection could give a hacker access not just to the confidential information transmitted by the device, but to everything else on a user's network. The risk is thus not only related to data, but is associated with device access as well. Recent studies of IoT devices seem to agree that "security" is not a word that always gets associated with this category of devices, leaving consumers potentially exposed [149].

In the literature, “Data Centric” approaches [150] [151] [152] are largely adopted by both academia and industry to date and focus on delivering end-to-end IoT data security at device, network, platform and data levels with the hope that privacy and trust can be cultivated as a by-product. Another approach recognises that security is a complex notion with context-specific meaning, culture-specific requirements and stakeholder-specific perceptions [153].

In the European research space, RERUM (REliable, Resilient and secUre IoT for sMart city applications) [154] aims to develop, evaluate and trial an architectural framework for dependable, reliable and secure networks of heterogeneous smart objects supporting innovative Smart City applications. The framework is based on the concept of "security and privacy by design", addressing the most critical factors for the success of Smart City applications. The project makes use of the OpenLDAP software, an open-source implementation of the Lightweight Directory Access Protocol used for authentication and authorization. The aim of SMARTIE (Secure and smarter cities data management) [155] is to create a distributed framework to share large volumes of heterogeneous information for use in smart-city applications, enabling end-to-end security and trust in information delivery for decision-making purposes while following data owners' privacy requirements. A secure, trusted, but easy-to-use IoT system for a Smart City benefits the various stakeholders of a smart city. Furthermore, the services offered are more reliable if the quality and trustworthiness of the underlying information is ensured. The SMARTIE project uses a number of baseline technologies, such as Public Key Cryptography (PKC) and industry-standard encryption techniques, to achieve its aim via a number of open-source projects such as OpenSSL.

Finally, TANGO will investigate the security recommendations and solutions proposed by the EU project SHARCS [156], which aims at designing, building and demonstrating secure-by-design system architectures that achieve end-to-end security for their users. SHARCS achieves this by systematically analysing and extending, as necessary, every hardware and software layer in a computing system. This is of relevance to TANGO not only for its layered architecture but also for its support of heterogeneous parallel architectures.

Therefore, security mechanisms for the SHARCS framework will be evaluated against security requirements of each TANGO application that are to be supported in hardware. Thus, the usage of SHARCS developed technologies will be investigated in the context of TANGO applications and services requiring end-to-end security.

3.9.1.2 Relevance to TANGO and Progress beyond SotA

TANGO, by leveraging existing baseline technologies, will build dependability and public trust in the use of embedded applications by maintaining the security of the architecture as a whole. This will be achieved through the secure gateway used within the project, which provides capabilities to secure embedded applications and their use of remote computational resources. This is of extreme importance in the context of the two Use Cases that TANGO will address: IoT and HPC.

However, research into securing embedded IoT devices within TANGO will be limited in scope. Instead, the following baseline technologies will be reused within the proposed architecture:

- RERUM's OpenLDAP software for the purpose of authentication and authorization will be used to secure TANGO's architecture and improve its dependability;

- SMARTIE's technologies, such as Public Key Cryptography (PKC) and industry-standard encryption techniques, will be used in TANGO for the purpose of securing inter-device communication and end-to-end security.

Part 4. Requirements and Architecture Specification

4.1 Introduction

The project Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO) aims to simplify the way developers approach the development of next-generation applications based on heterogeneous hardware architectures, configurations and software systems, including heterogeneous clusters, chips and programmable logic devices.

This part presents the specification of the TANGO architecture - Year 1. This includes the architectural roles, scope and interfaces of the architectural components, as well as the communication patterns. The commonalities between the envisioned use cases have been a topic of significant attention following the interaction between the business goals analysis, the technical and business requirements elicitation and the architecture definition. These requirements will assist in the development of high quality software and the identification of the adequate design approaches to guarantee these factors.

An initial view of TANGO quality model and architecture is based on the reference architecture introduced in the DoW and past research projects reviewed in the state of the art section. It is also the result of a first iteration on business and technical requirements. This three-layer architecture considers the classic IDE, middleware and infrastructure approach, and supports various components such as the Programming Model (IDE), the Application Life-cycle Deployment Engine (Middleware) and the Heterogeneous Parallel Device Cluster (Infrastructure). The design of the various architectural components is described in detail, some of which will require specific extensions in order to be able to deal with low power and energy efficiency management. In addition to this, the architecture also requires specific components to be developed from scratch such as the Requirements and Design Modelling tool and the Device Emulator. The rationale and functionalities of all those components will also be explained in this document.

4.1.1 Requirements and Architecture Specification Structure

This architecture design document is structured as follows:

Section 4.2 introduces the TANGO vision to characterise the factors which affect energy efficiency in software development, application deployment and operations. The section also describes the lifecycle of an application and outlines the project's various objectives and contributions.

Section 4.3 describes TANGO requirements elicitation approach. This section outlines both the requirements engineering process used, and the specification of business and technical requirements for developing appropriate methods and tools matching TANGO vision.

Section 4.4 presents an overview of TANGO architecture and details the components of which it is comprised, their roles, and interfaces. It also illustrates the industry pilots (DELTATEC and Bull) at an architectural level within the context of TANGO.

Section 4.5 discusses the interactions of the architectural components in the context of power consumption/energy efficiency and the ramifications this has on application design, deployment and operation.

Section 4.6 discusses the future steps needed to implement TANGO architecture.

4.2 Vision

Heterogeneous parallel architectures have received considerable attention, as an efficient approach to run applications and deliver services, by combining different processor types in one system to improve absolute performance, minimise power consumption and/or lower cost. New platforms incorporating multi-core CPUs, many-core GPUs, and a range of additional devices into a single solution have been introduced. These platforms are showing up in a wide range of environments spanning supercomputers to personal smartphones. As the range of applications continues to grow, e.g. Cyber-Physical Systems (CPS), the Internet of Things (IoT), connected smart objects, High Performance Computing (HPC), mobile computing, wearable computing etc. there is an urgent need to design more flexible software abstractions and improved system architectures to fully exploit the benefits of these heterogeneous platforms.

Heterogeneous parallel architectures have the potential to be applied to any sizable workload. Adopting heterogeneous systems to run HPC as well as non-HPC workloads has the potential to deliver higher performance on extreme-scale applications, which is particularly useful when homogeneous servers are too slow. As the HPC community heads toward the era of exascale machines, these are expected to exhibit an unprecedented level of complexity and size. The biggest challenges to future application performance lie not only with efficient node-level execution but with power consumption as well, which is a key focal point of TANGO.

Families of applications such as complex engineering simulations are already benefiting from customised low-power heterogeneous computing architectures. In the upcoming era of IoT and Big Data, new families of applications will show significant interest in exploiting the capabilities offered by customised heterogeneous hardware such as FPGAs, ASIPs, MPSoCs, heterogeneous CPU+GPU chips and heterogeneous multi-processor clusters, all of which have various memory hierarchies, sizes and access performance properties.

One of the major challenges to reaping these benefits, in data-centric and emerging domains, is the complexity of using them, or more precisely of designing and maintaining software that can deliver this benefit. Developers need to fully understand the nuances of different hardware configurations and software systems (both rapidly evolving), as well as consider the additional difficulties in performance, security, mixed-criticality and power consumption resulting from these heterogeneous systems.

TANGO will complement low-power multi/many-core computing systems developments by addressing the power consumption and efficiency of the software which runs on these heterogeneous infrastructures. The major consumption of energy by software is the power consumed in its operation. The primary aim of TANGO is thus to relate software design and power consumption awareness. As energy use is of relevance across all software design and implementation steps, TANGO focuses on the design of shared software components which are likely to be used and reused many times in many different applications. This makes it imperative that the software developed is not only as aware of power consumption as it possibly can be, but also takes into account trade-offs with other key requirements of the environment in which it runs, such as performance, time-criticality, dependability, data movement, security and cost-effectiveness.

While these developments represent significant contributions to the area of energy efficiency in heterogeneous parallel architectures, the TANGO approach states that the energy requirements of the software applications which run on a hardware unit must be incorporated into the overall development and deployment process. TANGO will therefore address the total characterisation of software with respect to the impact of software structure on power consumption, which is not incorporated into current solutions. Determining the relationship between software structure and its power usage will allow the definition of a set of software power metrics similar in concept to those for hardware. By associating those metrics with software components and libraries it will be possible not only to populate a software development environment with information to predict and illustrate the power requirements of applications, enabling the programmer to see the consequences of their work, but also to automatically optimise the code by allowing alternative selections of software components to be made, using power consumption as an additional selection criterion.

TANGO will measure how software systems actually use heterogeneous parallel devices, with the goal of optimizing consumption of these resources. In this way, the awareness of the amount of energy needed by software will help in learning how to target software optimization where it provides the greatest energy returns. To do so, the three architectural layers (IDE, middleware and infrastructure) will be supported thanks to a MAPE (Monitor, Analyse, Plan and Execute) loop. Each layer monitors relevant metrics such as performance and power consumption/energy efficiency status information locally and shares this with the other layers, assesses its current energy status and forecasts future energy consumption as needed. Actions can then be decided and executed according to this assessment.

Therefore, the objectives of TANGO are to:

1. Propose and implement a self-adaptive reference architecture.
2. Extend existing software development models and methodologies for heterogeneous parallel architectures.
3. Develop an energy-aware, hardware-agnostic programming environment.
4. Develop and evaluate a self-adaptive model with identified low power parameters and QoS metrics.
5. Develop hardware power consumption and software energy models.
6. Found a Research Alliance around which complementary research efforts into programming approaches will nucleate, collaborating and integrating results.

TANGO will make significant contributions to software engineering, programming models and adaptive architectures for heterogeneous parallel architectures. For each objective, the project has defined a clear set of results:

Objective 1 Results: A toolbox-based implementation of the reference architecture: TANGO will address power management in heterogeneous parallel architectures across the entire software stack. The reference architecture will support self-adaptation regarding energy efficiency while at the same time being aware of the impact on other quality characteristics of the overall infrastructure, such as space and time performance.

Objective 2 Results: Reference software development models and methodologies for best practice: TANGO will propose methods for energy efficiency for applications running on heterogeneous parallel architectures by combining power-awareness related to these architectures with the principles of requirements engineering and design modelling.

Objective 3 Results: A collection of reusable IDE plugins, programming models and runtimes: the programming framework will bring a set of functionalities that allows developers to program their applications in a seamless way with regard to the underlying heterogeneous parallel infrastructure.

Objective 4 Results: An adaptive quality model for holistic system performance: TANGO will develop mechanisms to translate high-level energy efficiency, performance (including time criticality), data locality and cost goals into a holistic model associated with application components, and will propose methods and tools to manage the life cycle of applications from requirements to run-time through construction, deployment, operation and their adaptive evolution over time.

Objective 5 Results: Hardware and software energy models: TANGO will develop novel energy models that will allow estimation of the power consumed by an application over a range of heterogeneous parallel architectures.

Objective 6 Results: Inter-project collaboration: through actively seeking cooperation and the formation of an alliance, we expect to enhance coordination and collaboration between projects, thus enhancing the sustainability and impact of TANGO through common APIs, standards and the integration of the developed tools.

4.3 Requirements

This deliverable has a strong focus on requirements gathering, and aims to ensure the correct requirements drive the project forward, leading to better scientific outcomes. In this section we discuss the approach taken for the elicitation process followed by the results of these endeavours.

4.3.1 Tango Requirement Elicitation Approach

4.3.1.1 Description of the Overall Requirements Approach

The TANGO requirements elicitation approach consists of identifying the business-related requirements as well as reflecting upon the more technical requirements.

To gather business requirements, the TANGO consortium has already proposed two industry pilots, with DELTATEC and Bull respectively. However, to guarantee a broader scope of applicability of the TANGO results, a larger set of stakeholders potentially interested in the results of TANGO was interviewed. This led to the preparation of an initial interview with two types of interview questions, one oriented towards business goals and the other reflecting on the technical aspects that a software architecture would have to consider. Annex C contains copies of the questions used during these interviews.

To select a broad variety of stakeholders potentially interested in TANGO results, an initial list of stakeholder roles involved in heterogeneous parallel architectures was derived; this list of stakeholders is presented in a dedicated section of this deliverable. The selection of stakeholders for the interviews therefore attempted to cover the broadest possible range of the roles identified.

The analysis of the interview results from this questionnaire identified high-level business goals. Among the 14 interviews based on this general questionnaire, two were conducted with the TANGO industry partners DELTATEC and Bull. The thorough analysis of all 14 interviews resulted in the assessment presented in Section 4.3.2.1.

4.3.1.1.1 Collecting TANGO Business Goals and Requirements

The business goals were collected by organising a questionnaire to ascertain the interest of the market. The questionnaire also sought to validate its coverage by enquiring about the types of businesses being surveyed. The main questions asked focused upon:

- How much power and energy saving was an issue? (Including the reasons why businesses would want power and energy saving to be addressed.)
- How much energy and power saving could be traded off against other quality aspects?
- How much was heterogeneity going to impact businesses?
- What are the release strategies for code?

4.3.1.1.2 Collecting TANGO Technical Goals and Requirements

As part of the requirements elicitation process in TANGO, a number of technical questions were asked of potential stakeholders. Table 2 provides a summary of the technical questions that were asked as part of the stakeholder survey for the purpose of defining technical goals. Some questions, which were used to weigh the relevance of the interviewee's feedback, have been omitted.

Table 2: Overview of Technically Relevant Survey Questions

Question 1: Do you design or plan to design heterogeneous computing systems?
Purpose / Technical Relevance: To prioritise support for specific types of device.

Question 2: Do you use secure internet connections in your applications?
Purpose / Technical Relevance: To define the degree to which security should be considered within the architecture.

Question 3: Do you develop hard real-time applications?
Purpose / Technical Relevance: Helped define the technical goal to support real-time IoT applications.

Question 4: What are the relevant operating systems for your product development?
Purpose / Technical Relevance: Used to limit the scope of potential operating systems to support.

Question 5: Do you develop specific ASICs for embedded systems?
Purpose / Technical Relevance: Proved the need for a technical goal covering support for ASIC-based devices.

Question 6: Do you develop GPU code?
Purpose / Technical Relevance: Helped define the technical need for supporting a range of GPU devices and potential abstraction layers.

Question 7: What are the programming languages used for the development of relevant products in your company?
Purpose / Technical Relevance: Provided insight into what languages should be supported.

Question 8: Do you already use some technologies (middleware, libraries, compilers or architectures) for parallel computing or for management of distributed software?
Purpose / Technical Relevance: Provided insight into the technical objective to support de facto standard baseline technologies within the TANGO architecture.

Question 9: Do your applications include some image processing or signal processing elements?
Purpose / Technical Relevance: Motivated the technical goal of having out-of-the-box processing support.

Question 10: What energy-related measures are you interested to monitor within your computing system?
Purpose / Technical Relevance: Used to define the relevance of energy metrics in the context of the TANGO architecture when benchmarking and optimising application deployment.

The remaining questions were used by the consortium to guide the technical objectives and the envisaged development of components within the TANGO architecture. The results of the questions were circulated to component owners within the architecture and used as technical goals. The following table provides a traceability matrix mapping the previously discussed questions to relevant component requirements:

Table 3: Questionnaire Traceability Matrix

Question 1: Relevant components: Programming Model, Device Supervisor, Device Emulator. Requirements: R-PM-2, R-DE-1, R-DE-2, R-DE-3, R-DE-4. Cross-references: Section 4.4.2.1, Section 4.4.4.1, Section 4.4.4.3.

Question 2: Relevant component: Programming Model. Requirements: R-PM-5. Cross-reference: Section 4.4.2.1.

Question 3: Relevant component: Runtime Abstraction Layer. Requirements: R-RT-2. Cross-reference: Section 4.4.2.4.

Question 4: Relevant components: Device Supervisor, Runtime Abstraction Layer. Requirements: R-RT-2, R-DS-1, R-DS-2. Cross-references: Section 4.4.4.1, Section 4.4.2.4.

Question 5: Relevant components: Device Supervisor, Device Emulator. Requirements: R-DS-1, R-DE-1, R-DE-2, R-DE-3, R-DE-4. Cross-references: Section 4.4.4.1, Section 4.4.4.3.

Question 6: Relevant components: Programming Model, Runtime Abstraction Layer, Device Supervisor. Requirements: R-PM-2, R-RT-2, R-DS-1. Cross-references: Section 4.4.2.1, Section 4.4.2.4, Section 4.4.4.1.

Question 7: Relevant component: Programming Model. Requirements: R-PM-1, R-PM-2, R-PM-3, R-PM-4. Cross-reference: Section 4.4.2.1.

Question 8: Relevant component: Device Supervisor. Requirements: R-DS-1. Cross-reference: Section 4.4.4.1.

Question 9: Relevant component: Programming Model. Requirements: R-PM-1. Cross-reference: Section 4.4.2.1.

Question 10: Relevant components: Self-Adaptation Manager, Energy Modeller, Monitoring Infrastructure, Device Emulator. Requirements: R-SAM-1, R-SAM-2, R-EM-1, R-EM-2, R-EM-3, MI-1, MI-2, MI-3, MI-5, R-DE-5. Cross-references: Section 4.4.3.2, Section 4.4.3.3, Section 4.4.3.4, Section 4.4.4.3.

For further details on the technical requirements, we advise the reader to use the cross-references to sections within this deliverable presented in the previous table, to view the requirement traceability matrix of each component within the context of its associated implementation.

4.3.2 Business Requirements

4.3.2.1 Summary of Business Interviews Conducted

An important aspect of the interviewing process is to ensure that a broad audience is queried as part of the study. Altogether we interviewed individuals from 14 different companies, with one interview per company, including DELTATEC and Bull, who are the use case partners. This was done in order to get as broad a range of input as possible with regard to the requirements that drive the project. The interviewees include: AWTC Europe, Bull, DELTATEC, HLRS, IntoPIX, IP Trade, IMEC, TSTSYSTEMS, Vodafone, ARICO, Intel, ARM and Xilinx.

Ten out of the fourteen interviewees completed a complementary characterization questionnaire, which resulted in an assessment of the representativeness and breadth of the interviewees. These interviewees covered a range of different markets (see Figure 12) as well as different classifications of business (Figure 13).

Figure 12: The target markets of interviewees

Figure 13: Classification of the type of business of the interviewees

The ten interviewees who completed the complementary characterization questionnaire indicated whether they had experience of High Performance Computing (HPC), the Internet of Things (IoT), or both. Two stated neither; they focused their business on embedded systems but had yet to consider IoT.

Figure 14: Experience of interviewees

Figure 15: Favouritism towards Scenarios 1, 2 and 3 regarding energy efficiency: 1) the Doomsday scenario, 2) the Optimist's scenario, 3) the Centrist scenario

Figure 15 shows how much each of the participants favoured three scenarios regarding how much of a problem future energy supply will be. The pessimistic scenario (1) states that energy supply will not meet demand in the near future; this was selected by only two of the interviewees. The middle approach (3.1 and 3.2) was split into a slightly pessimistic and an optimistic version and was overall the most commonly selected, at 57%. The final case, in which energy supply would not be a problem in the future, was selected by one interviewee. There was therefore no general consensus, but the middle option of having to be careful with energy supply remained predominant.

A similar query (see Figure 16) regarding the mindset of interviewees towards device heterogeneity revealed a more positive outlook, with most selecting the optimistic point of view or the centrist ground, and only one having a negative outlook on the challenges of heterogeneity.

Figure 16: Interviewee's outlook on device heterogeneity

The interviewees' interest in energy saving was queried, as one of the key goals of the project's call was low-power computing. In all 10 cases which gained a response, the interviewees were interested in energy saving. In one case a clarification was given that energy saving was only important for the embedded systems and not for the larger compute resources they utilise. The reasoning was widespread, with the top aspects being cost, corporate social responsibility and heat. The results of this question are shown in Figure 17. The less common responses are still of particular interest and relate specifically to cases where lower power is critical to the overall aims of a business's products, e.g. it extends a product's lifespan or meets specific needs such as being powered using "Extended Power Over Ethernet (UPOE)", which supplies roughly 50 to 60 W.

The interviewees covered a broad spectrum of parties interested in the TANGO project, as previously shown in Figure 12 and Figure 13. To help assess their awareness of power and energy saving efforts in the EU, they were asked if they had "heard about the EU's 20% policy objective" [157]. The result was that 41.7% had heard of the objective while 58.3% had no knowledge of it. This shows promise as a demonstrator of a greater interest in energy reduction, as opposed to merely indicating that energy saving was useful.

Figure 17: Reasons for energy saving

A key factor in Figure 17 was cost. Cost is shown to be a very strong driver within businesses and dominates over other aspects when interviewees are asked to rank power savings against other quality of service (QoS) factors. To examine the priority of energy saving, the following question was asked: "If given an alternative between similar hardware platforms, would you be attracted by the one displaying lower overall energy consumption?" The answers focused upon cost as the main driving factor, as shown in the table below:

Answer                                              Tally
Cost is key                                         1
Same cost only                                      4
Not relevant to short or mid-term business goals    1

The dominant trend from this is that energy saving must bring a cost benefit or at the very worst be cost neutral in order for the TANGO project to obtain widespread acceptance.

Figure 18: Results of several Likert scales for determining the importance of Security, energy/power and performance, to potential end users.

In Figure 18, the results of determining the importance of security, energy/power saving and performance are evaluated. Security is a concern that is always rated highly by respondents. Performance is quite polarised in terms of responses: it is either important or not at all. Energy and power saving receives responses in the middle of the scale, either 2 or 3, and in no case does it receive ratings that are very high or very low.

The interviewees showed an interest in the TANGO project's software when they were asked if they were "interested in experimenting with a new open-source toolbox for optimization of energy efficiency within heterogeneous architectures". In all cases they responded positively and said they would be willing to test it. In some cases this was qualified based upon their own commercial interests, as shown in the table below:

Form of Agreement                    Count
Yes                                  4
Yes - final version                  2
Yes - if stable                      1
Yes - integrate into own tools       1
No answer given                      6

There was overall a willingness to test the outputs of TANGO and use its final product. It is therefore important not to squander this when considering the licensing and support that would be made available. The middleware would be accepted without certification, and none said that they would not accept it without certification. In one case this was qualified in that the source code would have to be made available. There was a general consensus that having the source code available was good and very useful. A BSD-style licence, which does not restrict the usage of code in the way GPL does, was favoured by all participants. GPL licensing produced a 50:50 split in regard to acceptance (out of 8 respondents). This impact can be further seen in the comments that were made. One respondent described GPL as contaminating code from a commercial perspective and so not suitable. With regard to the respondents who said yes, one said:

"Anything but GPL v3 for embedded, but otherwise yes", while another said "yes it's ok but BSD is better". Given the interviews, a more permissive open-source license was therefore seen as preferable.

In terms of release support, three said "a free open source distribution with non-contractual support through a community" would be acceptable, while two more were inclined towards paid support, accepting "a fee-based commercial distribution with contractual support".

4.3.2.2 Business Goals Synthesis - Pilot Cases from TANGO Industrial Partners

In this section the use case partners indicate the initial status of their use cases, followed by a description of their business interests in these pilot cases. This is concluded by indicating how they plan to measure power consumption and perform the static benchmarking of energy consumption, as part of the first-year goals of the project.

4.3.2.3 DELTATEC

4.3.2.3.1 Initial Status

DELTATEC presents two use cases to the TANGO project. The first is an embedded use case and the second is a remote processing use case.

DELTATEC's embedded use case is derived from an application that has already been developed for a customer.

This application has been selected because it combines multiple aspects that are typically present in DELTATEC applications: image processing, parallel processing, distributed clients.

The target platform is based on industrial technologies that DELTATEC uses or will reuse in various projects. In particular, platforms are needed that comply with strong environmental constraints, such as temperature range, dust protection or mechanical strengthening. DELTATEC also needs its platforms to be easily deployable. Moreover, long-lasting availability and reproducibility are often major requirements. This means that standard PCs are generally not suitable.

The remote processing use case is based on a core technology of the DELTACAST product line, and uses the code of a DELTACAST product. Improving this technology, and widening the range of hardware on which it runs, will help the DELTACAST subsidiary of DELTATEC to enlarge its commercial offering. In particular, this application will help bring the technologies inside DELTATEC's high-end systems into lower-end systems, for which the processing tasks will be executed "remotely" or in the cloud.

4.3.2.3.2 DELTATEC’s Business Interest in TANGO Regarding its embedded activities, DELTATEC’s business interest in TANGO is first a “time-to-market” consideration. It is expected that a framework like TANGO will provide an environment that helps develop efficiently on heterogeneous multi-core platforms. Thanks to a unified programming model, DELTATEC can expect to develop the hardware and the software efficiently in parallel. It is also expected to reduce the development cycle through the integration of FPGA source code directly inside the application code.

Regarding its broadcast-oriented activities, the first interest is deploying a unified environment for parallel programming. DELTATEC mainly targets SMP systems with GPU computing capabilities. The objective is to reduce “time-to-market” constraints and to simplify code maintenance. The second interest is being able to manage efficiently remote computing requests that can be issued asynchronously, while optimising the usage of a shared computing resource. DELTATEC expects TANGO to establish a suitable solution that can be naturally implemented through its task management functions.


4.3.2.3.3 Approach for energy benchmarking DELTATEC is very interested in everything that concerns energy efficiency for its embedded market. In the embedded world, energy efficiency can bring many benefits, such as mechanical simplification, component cost reduction, improvement of battery autonomy and battery life, simplification of deployment (for instance through the use of “phantom style” power supplies) or reliability improvement.

In the broadcast market, the interest of DELTATEC in energy efficiency is related to the technology. Energy efficiency is very important for some of the DELTACAST products through the reduction of form-factors and through reliability improvements. These two elements are key features in the DELTACAST broadcast market.

4.3.2.4 Bull

4.3.2.4.1 Initial Status A fundamental hurdle to experimenting with a new development and deployment framework (such as TANGO) in an HPC context is the size (in number of code lines) and complexity of modern scientific applications destined for HPC, along with the scale of the typical HPC computational system. Applications can be millions of lines of code and are written to use complex algorithms and data structures. Furthermore, large-scale infrastructures are not easily available for reproducible experiments. It is in this context that the notion of a mini-application can be explored: a miniapp is a condensed partial implementation of an HPC application of interest (e.g. weather forecast) that highlights one or multiple performance aspects that can affect the parent application’s codebase. These miniapps are written to be easily refactored or modified while remaining representative and useful in their scientific problem domain. In addition, they can be executed on a small-scale platform and still provide a good approximation of the actual execution at large scale.

Miniapps are designed specifically to capture some key performance issue from a full blown application and to enable its execution in a more controlled and simplified setting at a smaller scale.

Hence, the idea of this use case is to make a selection of miniapps that would best characterize the typical workloads of BULL’s HPC clients and use a small scale platform for the development and deployment phases to evaluate and experiment with the TANGO toolbox.

4.3.2.4.2 Bull’s Business Interest in TANGO Given the difficulties of experimenting with an HPC framework in real-life conditions, the BULL use case will consist of different mini-app workloads submitted to a testbed platform composed of heterogeneous resources. Each mini-app will be selected to approximate the characteristics of a real application executed on BULL customers’ production HPC platforms. The selection of mini-apps is made so as to cover as many business sectors as possible, based on the clients of BULL HPC platforms. The mini-app use case will enable us to approximate real-scale executions of HPC applications using a small-scale testbed platform under realistic conditions.

The goal is to evaluate the energy and performance of those workloads when developed and executed using the TANGO toolbox and to compare them with runs developed and executed without it, in order to show the benefit of this new framework.

Hence, in this use case we propose to study and select some of the miniapps of the Mantevo project or those proposed by LLNL. For example, a workload of miniapps composed of MiniFE, which is memory sensitive, and phdMesh, which is compute bound, reflects the kind of heterogeneous executions performed by BULL clients such as KNMI and CEA. Their simultaneous combination will be executed more efficiently with support for heterogeneous resources, and optimally through the self-adaptation provided by the TANGO toolbox. The selection of optimal execution parameters, along with runtime adaptation, will be possible if heterogeneity and energy concerns are taken into account during the development phase. This is another important aspect taken into consideration by TANGO. However, the simplicity of considering heterogeneity and power during the development phase will be a critical point for application engineers in their decision whether or not to adopt TANGO.

In a similar context, miniapps such as LULESH or CloverLeaf provide Computational Fluid Dynamics (CFD) simulation capabilities for a wide variety of scientific and engineering problems that require the modelling of hydrodynamics. Both have been ported to a range of programming models to take advantage of different types of heterogeneous parallel hardware architectures. The existence of different programming-model versions of an application may simplify its deployment on different types of heterogeneous platforms (CUDA for GPU, OpenCL for both GPU and FPGA). In our search for mini-apps we will consider those using programming models suited to all types of heterogeneous resources treated by TANGO.

Another scenario of mini-app usage consists of submitting multiple workloads of various types of mini-apps and, while evaluating the performance/power trade-offs of the applications, also evaluating scheduling characteristics such as system utilisation, fragmentation and stretch, compared with runs executed without the TANGO toolbox, in order to show the benefit of these new developments. The system scheduling metrics are important evaluation criteria for HPC platform owners and administrators because they reflect the overall performance of the system. Having near 100% utilisation with low stretch for jobs is ideal, whereas poorer values can waste resources, result in starvation and leave users unhappy.

Finally, the use case will show how, in an HPC context, we can discover the power consumption of a job allocating heterogeneous resources, which will in turn exercise the flexibility and finer-grained profiling capabilities of the TANGO toolbox.

In the context of HPC platforms, the TANGO toolbox also needs to pay special attention to the scalability of its different components, in order to allow TANGO to be adopted by stakeholders that may have very large-scale deployment scenarios.

4.3.2.4.3 Approach for energy benchmarking Energy consumption is a very important aspect of HPC platforms and probably the most important one on the way to exascale supercomputers. BULL is very interested in improving the energy efficiency of the system at all levels, hardware and software.

The testbed platform will provide means of retrieving the power consumption of different components, and TANGO should exploit these means and consider the power measurements for development, execution and runtime adaptation. Particular energy benchmarks will ensure that energy consumption is kept as low as possible during the deployment phase. The precision of energy benchmarking is an important issue that has to be guaranteed by the toolbox and specifically experimented with and demonstrated through the use case.


4.4 Architecture

This section introduces the TANGO architecture, first with an overview and then with an in-depth examination of each of the components in the TANGO architecture. The components are discussed within their layers in turn, with an overall description of the workflow of the layer given at the start of each sub-section.

4.4.1 Overview

The TANGO preliminary high-level architecture is introduced on a per-component basis, as shown in Figure 19. From a high level of abstraction, the architecture is separated into remote processing and management capabilities in the upper layers (above the IoT Network Fabric), which in turn are separated into distinct components that support the standard application deployment model (construct, deploy, run, monitor, adapt), and local processing capabilities in the lowest layer (below the IoT Network Fabric), which illustrates support for secure embedded management of IoT devices and associated I/O.

SDK Layer: This layer comprises the IDE component, a collection of plug-ins that interact to facilitate the modelling, design and construction of applications, in addition to a Runtime Abstraction Layer component that manages application execution. These plug-ins aid in evaluating the power consumption of an application during its construction. The plug-ins provide a frontend within the IDE as a means for developers to interact with the functionality provided within this component. Lastly, this component enables architecture-agnostic deployment of the constructed application, while also maintaining low power consumption awareness. The components in this layer are:

Requirements & Design Modelling Plug-in: aims at guiding the development and configuration of applications to achieve the targeted Quality of Service (QoS), Quality of Protection (QoP), cost of operation and power consumption behaviour. It also aims at exploring how questioning patterns and preference elicitation can guide the refinement process of common patterns of goals and requirements for applications running on heterogeneous parallel architectures. In particular, it is anticipated that different deployment alternatives of an application will lead to different levels of quality, power consumption behaviour and operational cost. The Requirements & Design Modelling tools must therefore help to better understand deployment alternatives in particular situations.

Programming Model Plug-in: supports developers when coding their applications. Although complex applications are written in a sequential fashion without parallel APIs, they are annotated in such a way that the Runtime Abstraction Layer sub-component can execute them in parallel on heterogeneous parallel architectures. At runtime, applications described using the Programming Model are aware of the power consumption of the components’ implementations. The task-based programming paradigm supported by the programming model is well suited to a wide range of applications such as IoT, CPS and Big Data. In this approach, tasks are annotated by the developer, indicating the directionality of the task parameters, and at runtime a task dependence graph is built which inherently describes the parallelism of the application. Heterogeneity is easily handled with this paradigm, since tasks that better fit a given device will be executed there, and locality aspects can also be easily taken into account by the runtime. A hierarchy of task-based programming models will be used in the project, combining coarser-grain and finer-grain tasks, which will make it possible on the one hand to better capture the high-level structure of the application (coarse grain) and on the other hand the details of the architecture (fine grain). Different instances of tasks will be available, in such a way that at runtime the best one (in terms of energy or time) can be chosen and executed on the optimum device.


[Architecture diagram: the SDK layer (IDE with the Requirements & Design Modelling, Programming Model and Code Optimizer plug-ins and the Runtime Abstraction Layer, supported by design patterns, algorithmic skeletons, a code annotation library and a hotspot identifier); the Middleware layer (Application Life-cycle Deployment Engine, Monitor Infrastructure with RRD storage and agents, Self-Adaptation Manager with Energy Modeller, Low Power Modeller, Power Profiles, Knowledge Base and Decision Support Engine, Workflow Manager); and the Fabric layer (Meta Scheduler, Cluster Scheduler, Device Supervisor, Device Emulator with backend driver, Optimisation Engine, Secure Gateway with authentication and authorization, and heterogeneous parallel device clusters of CPU/APU, GPGPU and FPGA resources), connected through the network fabric to smart IoT devices comprising embedded processing (microcontroller), connectivity (M2M, BAN, PAN, WAN, ...), sensors/actuators (gyroscope, GPS, temperature, ...) and Systems on Chip such as the Intel Curie or the Michigan Micro Mote.]

Figure 19 Proposed reference architecture

Code Optimizer Plug-in: plays an essential role in the reduction of the energy consumed by an application. This is achieved through the adaptation of the software development process and by giving software developers the ability to directly understand the energy footprint of the code they write. The proposed novelty of this component is in its generic code-based static analysis and energy profiling capabilities (Java, C, C++, etc., beyond what is available in the discipline of mobile computing), which enable the energy assessment of code out-of-band of an application’s normal operation within a developer’s IDE.

The Middleware Layer: Below the IDE, a set of components is responsible for application deployment and handles the placement of an application, considering energy models, on target heterogeneous parallel architectures. This layer aggregates the tools that are able to assess and predict the performance and energy consumption of an application. Application-level monitoring is also accommodated, in addition to support for self-adaptation, for the purpose of making decisions using application-level objectives given the current state of the application in question. The components in this layer are:

Application Life Cycle Deployment Engine: this component manages the lifecycle of an application deployed from the IDE. Once a deployment request is received, this component must choose the infrastructure that is most suitable according to various criteria, including for example: 1) energy constraints/goals that indicate the minimum energy efficiency that is required/desired for the deployment and operation of an application; 2) application performance constraints that indicate the minimum requirements in terms of performance for the application (time-criticality, data location, cost, etc.). This will be made possible through the enhanced heterogeneous resource description implemented within the resource and job management system used, SLURM. The different application needs and criteria will be selected through the interface provided by SLURM. The enhanced SLURM will perform automatic workload execution upon the heterogeneous platform, in addition to managing data (stage-in, stage-out), by applying efficient scheduling techniques between jobs (fair sharing, backfilling, pre-emption, etc.) and by selecting the best-suited resources for each job (based on resource characteristics, network topology, internal node topology, power management, etc.). Moreover, this component’s role is also to optimise the lifecycle of an application to ensure its constraints are fulfilled, considering: 1) the status of the heterogeneous parallel devices in terms of power consumption and workload; 2) the description of the cluster in terms of platform type, hardware specification and its power consumption profile; and 3) the profile of the application in terms of how it stresses each of the devices (CPU, memory, network, etc.). Using SLURM’s support for heterogeneous resources, the accounting and profiling of each heterogeneous resource will take place for all jobs.

Monitor Infrastructure: this component is able to monitor the heterogeneous parallel devices (CPU, memory, network…) that are being consumed by a given application by providing historical statistics for device metrics. The monitoring of an application must be performed in terms of power/energy consumed (e.g. Watts that an application requires during a given period of its execution), and performance (e.g. CPU that an application is consuming during a given period of its execution).

Self-Adaptation Manager: This component provides the key functionality to manage the entire adaptation strategy applied to applications and Heterogeneous Parallel Devices (HPDs). This entails the dynamic optimisation of energy efficiency, time-criticality, data movement and cost-effectiveness through continuous feedback to other components within the architecture and a set of architecture-specific actuators that enable environmental change. Examples of such actuators could be: redeployment to another HPD, restructuring a workflow task graph or dynamic recompilation. Furthermore, the component provides functionality to guide the deployment of an application to a specific HPD through predictive energy modelling capabilities and policies, defined within a decision support engine, which specify cost constraints via Business Level Objectives (BLOs). An illustrative sketch of such a policy is given below.
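For illustration only, a decision-support policy of this kind could be expressed as a simple rule that compares monitored power against a Business Level Objective and triggers an actuator when the objective is violated. The class and interface names below (PowerMonitor, Actuator, PowerCapRule) are hypothetical and not part of any defined TANGO interface:

```java
// Hypothetical BLO-driven adaptation rule; a sketch, not a TANGO API.
public final class PowerCapRule {

    interface PowerMonitor { double averageWatts(String applicationId, int windowSeconds); }
    interface Actuator { void redeploy(String applicationId, String targetDevice); }

    private final PowerMonitor monitor;
    private final Actuator actuator;
    private final double powerCapWatts; // Business Level Objective: maximum allowed power draw

    PowerCapRule(PowerMonitor monitor, Actuator actuator, double powerCapWatts) {
        this.monitor = monitor;
        this.actuator = actuator;
        this.powerCapWatts = powerCapWatts;
    }

    /** Evaluate the rule for one application and fire an actuator if the BLO is violated. */
    void evaluate(String applicationId) {
        double observed = monitor.averageWatts(applicationId, 60);
        if (observed > powerCapWatts) {
            // One of several possible actuators: redeploy to a lower-power HPD.
            actuator.redeploy(applicationId, "low-power-hpd");
        }
    }
}
```

In practice such rules would be only one of several adaptation mechanisms, alongside workflow restructuring and dynamic recompilation as listed above.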

The Fabric Layer: This layer is divided into two parts, above and below the IoT Network Fabric line, with the latter illustrating the interaction of the architecture with IoT devices. The layer addresses the heterogeneous parallel devices and their management. The application admission, allocation and management of HPDs are performed through the orchestration of a number of components. Power consumption is monitored, estimated and optimised using translated application-level metrics. These metrics are gathered via a monitoring infrastructure and a number of software probes. At runtime, HPDs will be continually monitored to give continuous feedback to the Self-Adaptation Manager. This will ensure the TANGO architecture adapts to changes in the current environment and in the demand for energy. Optimisations take into account several approaches, e.g. redeployment to another HPD and dynamic power management policies that consider heterogeneous execution platforms and application energy models. The components in this layer are:

Device Supervisor: This component provides scheduling capabilities across devices during application deployment and operation. This covers the scheduling of workloads of both clusters (Macro level, including distributed network and data management) and HPDs (Micro level, including memory hierarchy management). The component essentially realises abstract workload graphs, provided to it by the Application Life-cycle Deployment Engine component, by mapping tasks to appropriate HPDs. Meta-scheduling heuristics manage multiple clusters efficiently, while cluster level heuristics optimise the use of HPD resources and resource sets. Optimisation criteria (such as power consumption) and environment state are provided as input by the Self-Adaptation Manager and Monitoring Infrastructure components respectively.

Device Emulator: This component provides out-of-band application deployment and operation on emulated HPD resources for the purpose of training application power profiles. Emulated HPD resources execute application code while KPIs are monitored. The output of this process calibrates the metrics within a power model, which is provided to the Self-Adaptation Manager as a power profile: the normalised performance results of running an application on a specific type, or combination, of HPDs. Emulation of a range of HPDs is realised through a generic backend driver that interfaces to hardware emulators such as QEMU, the OpenCL Emulator (ocl-emu) or vendor-specific ASIC (FPGA) emulators. The device emulator could also be re-purposed to provide development-time debugging capabilities.

Furthermore, a Secure Gateway supports pervasive authentication and authorization which, at the core of the proposed architecture, enables both mobility and dynamic security. This protects components, and thus applications, from unauthorised access, which in turn improves the dependability of the architecture as a whole. The component provides embedded smart devices from the IoT paradigm with secure access to remote processing resources through the network fabric, as well as enabling the secure management of these devices through the upper layers of the architecture. These smart devices, comprised of a combination of discrete embedded components (providing connectivity, embedded processing, sensors, etc.) or a System on Chip (SoC), sense or actuate on an environment and filter acquired data using limited processing capabilities. After this local processing, data is sent securely over the network fabric for further remote processing on more capable heterogeneous parallel devices, supported by the upper layers of the architecture.

4.4.2 Layer 1 – SDK

In the SDK layer, a collection of plug-ins and components interact to facilitate the modelling, design and construction of applications.

4.4.2.1 Programming Model

This component provides a programming model and development tools to facilitate the implementation of parallel applications for distributed heterogeneous parallel systems.

4.4.2.1.1 Novelty beyond the State of the Art The novelty of this component beyond the State of the Art lies in facilitating the implementation of complex distributed parallel applications, hiding the programming complexity of heterogeneous platforms and enabling easy deployment and efficient execution.

4.4.2.1.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-PM-1 | FUN | Multi-level task support | Enable the decomposition of coarse-grain tasks in a workflow of fine-grain tasks | BSC | MAN | 3.2
R-PM-2 | FUN | HPA support | Support the definition of tasks for the different execution environments available in HPA (SMP, GPUs, FPGAs) | BSC | MAN | 3.2
R-PM-3 | FUN | Application KPI constraints | Enable the definition of application-based KPIs as constraints for guiding the application execution on the heterogeneous platforms | BSC | DES | 3.2
R-PM-4 | USR | Application development widgets | Provide graphical widgets and wizards to facilitate the implementation of applications and code generation | BSC | OPT | 4.2
R-PM-5 | USR | Remote application deployment | Provide a set of tools (scripts, etc.) to facilitate the remote compilation and deployment of applications on the heterogeneous platforms | BSC | OPT | 4.2

Table 4: Programming Model Referenced Requirements

4.4.2.1.3 Internal Architecture The Programming Model component, situated in the IDE of the TANGO Architecture, facilitates the implementation of parallel distributed applications for heterogeneous platforms. The programming model proposed for TANGO consists of a combination of different task-based programming models of the Star Superscalar family (COMPSs [158], OmpSs [159]) applied at different levels (platform and node levels). At the platform level, the developer implements the main application workflow, defining the coarse-grain tasks which will be executed in the different remote computing nodes. At the node level, each of the coarse-grain tasks can be decomposed as a workflow of fine-grain tasks which will be executed on the different devices of a node (processors, GPUs, FPGAs). Task definition with the programming model consists of annotating the code to indicate which methods implement the tasks and the directionality of the different method parameters, while the application workflows are implemented in a sequential fashion using well-known high-level languages such as C/C++, Java or Python. At runtime, the annotated code is executed by means of the runtime system, which is in charge of detecting data dependencies between tasks and the inherent parallelism, as well as spawning the application execution across the available resources. More details about the runtime are described in Section 4.4.2.4. The sketch below illustrates how a sequential workflow relates to task annotations.
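To make the idea concrete, the following self-contained Java sketch mimics the annotation style described above. The annotations and the computation are simplified stand-ins defined locally for illustration; the actual COMPSs and OmpSs annotations have their own names, packages and attributes:

```java
// Simplified, self-contained sketch of the task-annotation idea. The
// annotations below are local stand-ins, not the real COMPSs/OmpSs API.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class TaskAnnotationSketch {

    enum Direction { IN, OUT, INOUT }

    // Marks a method as a coarse-grain task the runtime may offload.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Task { }

    // Declares the directionality of a task parameter, from which the runtime
    // derives data dependencies between task invocations.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
    @interface Param { Direction value(); }

    // A coarse-grain task: reads blocks a and b, updates block c in place.
    @Task
    static void multiplyBlock(@Param(Direction.IN) double[] a,
                              @Param(Direction.IN) double[] b,
                              @Param(Direction.INOUT) double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] += a[i] * b[i];   // placeholder computation
        }
    }

    // The main workflow stays sequential; a task-based runtime would intercept
    // the annotated calls, build the dependence graph from the IN/OUT/INOUT
    // declarations and run independent tasks in parallel on suitable devices.
    public static void main(String[] args) {
        int blocks = 4, blockSize = 1024;
        double[][] a = new double[blocks][blockSize];
        double[][] b = new double[blocks][blockSize];
        double[][] c = new double[blocks][blockSize];

        for (int i = 0; i < blocks; i++) {
            multiplyBlock(a[i], b[i], c[i]);   // each call is one task
        }
    }
}
```

At the node level the same idea applies, with OmpSs-style pragmas decomposing each coarse-grain task into fine-grain tasks targeted at the devices within a node.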

4.4.2.1.4 Baseline Technologies The following baseline technologies are used within this component:

Name | Description | Version
COMPSs | Provides a coarse-grain task-based programming model for distributed platforms | 1.4+
OmpSs | Provides a fine-grain task-based programming model for intra-node application execution | 15.06+
Eclipse | Eclipse is a Java-based open source platform that allows a software developer to create a customized development environment (IDE) from plug-in components built by Eclipse members. Eclipse is managed and directed by the Eclipse.org Consortium. | 4.4.0+ (Luna Release)

Table 5: Programming Model Baseline Technology


4.4.2.1.5 Component Diagram The TANGO Programming Model is implemented as an Eclipse plug-in in order to facilitate the code annotation, building and deployment of applications developed with the Programming Model in the same integrated environment as the other development tools (editors, compilers, etc.). As depicted in Figure 20, this plug-in is composed of a set of wizards and actions that enable users to define the application tasks in an easy way, as well as automating the processes of building and deploying the implemented application.

[Component diagram: the PM Plug-in component with its Task Definition Wizard and Build & Deployment Actions sub-components, the Extended COMPSs and Extended OmpSs code annotation libraries, and a remote call to the Application Deployment Engine component.]

Figure 20: Programming Model Component Diagram

4.4.2.1.6 Sequence Diagrams The following figures show how the different components and sub-components interact with each other to achieve the required functionality of the Programming Model. On the one hand, Figure 21 shows how application developers interact with the Task Definition Wizard of the Programming Model plug-in to define coarse- and fine-grain tasks and to assign target devices and constraints to them, according to the code annotations defined in the COMPSs and OmpSs libraries. On the other hand, Figure 22 shows the sequence diagram describing the process of building and deploying an application developed with the TANGO Programming Model.


[Sequence diagram: the Developer invokes the PM Plugin's Task Definition Wizard to run coarse-grain task definitions (annotated via the Extended COMPSs code annotations) or fine-grain task definitions (annotated via the Extended OmpSs code annotations), followed by target device and constraint definitions, each resulting in code annotations.]

Figure 21: Programming Model Application Development Sequence Diagram


[Sequence diagram: the Developer triggers the Build and Deployment Actions of the PM Plugin, which build and cross-compile the coarse-grain part (Extended COMPSs code annotations) and the fine-grain part (Extended OmpSs code annotations), link the libraries, and finally deploy the application (deployApplication(projectID)).]

Figure 22: Programming Model Building and Deployment sequence diagram

4.4.2.1.7 Deployment Diagram

[Deployment diagram: the Developer's PC (a Linux physical machine) hosts the Eclipse IDE with the Code Optimizer, Requirements & Design Modelling and Programming Model plug-ins, the latter using the Extended COMPSs and Extended OmpSs libraries.]

Figure 23: Programming Model Deployment Diagram

Like the other plug-ins of the TANGO Architecture, the Programming Model plug-in resides inside the Eclipse IDE. In addition to the plug-in, the developer's machine should have the COMPSs and OmpSs libraries installed to enable code annotation and application compilation.

4.4.2.1.8 External Interface It should be noted that this component is not invoked by other components of the TANGO Architecture and thus does not include an external interface. It is, however, used as part of the tool chain within the IDE workflow. The following table describes the actions which a user can perform with the Programming Model.


Method | Input | Output | Description
defineCoarseGrainTask | Class file (String), method name (String), method parameters (array of name-type-direction tuples) | Task definition written down in the coarse-grain interface file | Defines a coarse-grain task with the Programming Model
defineFineGrainTask | Class file (String), method name (String), method parameters (array of name-type-direction tuples) | Task definition directive written down in the class file | Defines a fine-grain task with the Programming Model
defineTargetDevice | Class file (String), method signature (String), target device (String) | Target device directive written down in the class file | Defines a specific target device for a fine-grain task with the Programming Model
defineConstraint | Class file (String), method signature (String), constraint name, constraint value | Constraint annotation added in the task definition | Defines a constraint for a task with the Programming Model
deployApplication | Project identifier (String) | Success (Boolean) | Builds and deploys an application developed with the Programming Model in the computing environment

Table 6: Programming Model API

4.4.2.2 Requirements & Design Modelling Plug-in

The TANGO Requirements and Design Tooling supports the requirements and design effort needed to later obtain the targeted time and energy performance when operating a software application. It aims at guiding the application development team in making appropriate design-time decisions on:

What heterogeneous hardware devices to consider for later operating the various software components of an application

How to break down the application into various software components to best exploit the potential of heterogeneous hardware devices, knowing that workloads and even available processing power may vary in anticipated (or unanticipated) ways at operation time. This variability in workloads and processing power availability at runtime means that the complete mapping of software components onto the hardware devices that process them cannot be fully frozen at design time.

Part of the guidance provided by R&D-T should therefore assist the software development team in making decisions that leave an appropriate level of flexibility when mapping software components onto hardware devices, so as to maintain the targeted time and energy performance.

On the other hand, factors such as security, data privacy, network or processing performance bottlenecks, or support and maintenance costs may force a development team to eliminate part of this flexibility by freezing, at design time, the mapping of certain software components onto specific kinds of hardware devices. It is in such cases that R&D-T must provide quantified information on which hardware will deliver the most adequate time and energy performance for the part of the mapping solution frozen at design time. Clearly, this information must be available as early as possible in the project lifecycle, i.e. based on early software and system architecture information and prior to the software application being fully implemented.

Different situations will be studied to guarantee that R&D-T is broadly applicable to various development projects. Notably, certain decisions may already have been taken at organisation or project level, hence more or fewer details can be provided in the system and application models, and even partial pieces of software code may be available, for instance from reuse. Likewise, the type of hardware to use for operating part of the application may also be frozen, or at least a specific list of hardware vendor devices to consider may be known at requirements and design time. Such decisions will often have an impact on the development-time and runtime technologies (languages and associated development tooling and runtime frameworks) to be used.

The types of situations to consider for shaping R&D-T will come from the use cases provided by the industry partners Bull and DELTATEC. However, to guarantee the broad applicability of R&D-T, first, external people such as interviewees from WP2 or EAB members will be given the opportunity to specify design-time knowledge and decisions for which they would seek assistance from R&D-T, and second, the research team building R&D-T may also stimulate new interest from Bull and DELTATEC by suggesting additional questions that R&D-T could help address.

Although the final details of how to implement R&D-T are part of the research effort, it is already possible to partition the tooling into three types of tools:

R&D-Modelling – R&D-M: Tools for modelling software tasks, data-flow processing and the heterogeneous hardware elements used in an execution environment, including programmable devices such as FPGAs. This tool will capture knowledge on software details and hardware considerations that the development team will rely on when building their application. An agile approach will be of interest, where knowledge is iteratively introduced in the various models, making it possible to answer questions more accurately as well as to address new kinds of questions as additional knowledge is introduced into the software architecture and design.

Design Time Characteriser - DTC: Tools for measuring and characterising the runtime behaviour of an application, or parts of an application, at design and construction time. Iteratively, the software architecture will identify algorithms and even actual code to be used for implementing certain software components. In the earlier phases, where few architectural details are provided, DTC will only provide rough estimates. Later, refined design models and algorithmic information may be used to increase the accuracy of the results provided by DTC. For instance, rapid prototyping may rely on a device emulator or even hardware-in-the-loop execution.

Design Time Optimiser – DTO: Tools to guide decisions regarding which parts of the solution to fix at design time and which parts to leave open for deployment time or runtime. DTC will provide several types of characterisation: on overall time, on idle time, on power, on energy and on heat generation for various possible software-to-hardware mappings. In the end, a development team may want to explore different ways to specify the trade-off between time and energy and verify whether their design-time decisions hold true for these various trade-off functions. Combining and ranking all the results from DTC can quickly become an overwhelming manual effort, in particular as additional details are added to the models; it is therefore the role of DTO to automate this exploration and ranking. The goal of DTO is to provide an optimal answer (or, in the case of very large problems, a nearly optimal answer) to design-time questions. Alternatively, in some cases, DTO will also be capable of providing a ranking of various potential answers according to their degree of optimality, as illustrated by the sketch below.
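As a minimal sketch of the kind of exploration DTO automates, candidate software-to-hardware mappings could be ranked by a weighted utility over the time and energy estimates produced by DTC. The utility function, weights, normalisation and the example figures below are assumptions made for illustration, not the project's defined formulation:

```java
import java.util.Comparator;
import java.util.List;

// Ranks candidate mappings by a weighted utility over normalised time and
// energy estimates; lower utility is better in this sketch.
public final class MappingRanker {

    /** One candidate mapping of software processing units onto hardware, with DTC estimates. */
    record Candidate(String description, double estimatedSeconds, double estimatedJoules) {}

    static List<Candidate> rank(List<Candidate> candidates, double timeWeight, double energyWeight) {
        // Normalise against the best (smallest) estimate per dimension so the
        // two criteria are comparable before weighting.
        double bestTime = candidates.stream().mapToDouble(Candidate::estimatedSeconds).min().orElse(1.0);
        double bestEnergy = candidates.stream().mapToDouble(Candidate::estimatedJoules).min().orElse(1.0);

        Comparator<Candidate> byUtility = Comparator.comparingDouble(c ->
                timeWeight * (c.estimatedSeconds() / bestTime)
              + energyWeight * (c.estimatedJoules() / bestEnergy));
        return candidates.stream().sorted(byUtility).toList();
    }

    public static void main(String[] args) {
        // Purely illustrative candidates and figures.
        List<Candidate> ranked = rank(List.of(
                new Candidate("all tasks on CPU cluster", 120.0, 9000.0),
                new Candidate("filter stage on FPGA, rest on CPU", 95.0, 6500.0),
                new Candidate("compute kernels on GPGPU", 70.0, 8200.0)),
                0.5, 0.5);
        ranked.forEach(c -> System.out.println(c.description()));
    }
}
```

Changing the weights corresponds to exploring different time/energy trade-off functions, which is precisely the manual effort DTO is intended to automate.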

4.4.2.2.1 Novelty beyond the State of the Art Current design-time tool chains are either very specific to a particular heterogeneous platform, in many cases even bound to a specific vendor (emulators for CUDA, or simulators/prototyping systems for FPGA vendors such as Xilinx or Altera), or the modelling tools are completely generic, such as UML/SysML, Matlab or Simulink, hence requiring high modelling expertise from the development team regarding what should be modelled from the hardware and software.

Furthermore, current tools often provide an answer for a given fixed design but do not allow searching for a more optimal solution in a broad design space. When an environment can include heterogeneity at the level of chips, boards, blades and server clusters, as well as in the I/O transfer technology and memory access speed, it becomes difficult to determine how to break down the processing of tasks or of a data flow to best exploit a pre-defined broad set of targeted hardware architectures.

The three types of tools mentioned above will collaborate:

First, to provide means to model hardware capabilities (hierarchical representation of hardware devices and their specifications) and software processing needs (tasks or data-flow along with representative datasets).

Second, to obtain a quantitative characterisation of executing tasks or data-flow processing on particular hardware targets. This characterisation need not be fully accurate, but it should provide relative information that can be relied on to guide design-time decisions.

Third, to search for a few optimal design alternatives worth implementing, in order to fix part of a solution at design time while leaving opportunities to adapt, at runtime, the execution of the appropriate software processing on the various types of hardware targets whose availability may vary in time.

Together, these tools will provide the means for a development team to study the most adequate solution space in terms of optimising the design-time aspects of an architecture, while leaving enough runtime decisions open to best exploit the underlying hardware infrastructure available at a given moment in time. Dynamic scheduling is of particular interest when several programs execute on a shared heterogeneous hardware infrastructure.

4.4.2.2.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R&D-M-1 | FUN | Hardware Characteristic Modelling | R&D-M shall provide means to express important hardware characteristics that impact time performance and energy consumption (for instance, aspects related to hardware processing speed, power consumption, I/O transfer speed, memory access speed) | CET | MAN | 3.1
R&D-M-2 | FUN | Hardware Processing Categorisation | R&D-M shall provide means to express information on the categories of processing for which a hardware target is fit. For instance, "single program/instructions, multiple data" for all CPU+GPU platforms, graphics rendering for client-side GPUs, "custom data-flow filtering" for all FPGA platforms, etc. | CET | MAN | 3.1
R&D-M-3 | FUN | | R&D-M shall provide means to model software processing aspects of different kinds, for instance task-based, data-flow based, or other paradigms if needed | CET | MAN | 3.1
R&D-M-4 | FUN | | R&D-M shall provide means to express opportunities to exploit control and data dependencies and independencies within a software processing unit | CET | MAN | 3.1
R&D-M-5 | FUN | | R&D-M shall provide means to associate representative data workloads to consider during characterisation | CET | MAN | 3.1
DTC-1 | FUN | | DTC shall provide a systematic methodology relying on existing tools (emulator, simulator) to characterise the execution of a software processing unit (including I/O transfer and program component deployment) | CET | MAN | 3.1
DTC-2 | FUN | | DTC characterisation shall provide approximate quantitative values for the execution of a software processing unit (including I/O transfer tasks and program component deployment tasks, and compilation if needed) on hardware with given characteristics. The characterisation metrics considered shall include power and energy as well as time performance | CET | MAN | 3.1
DTO-1 | FUN | | DTO shall provide a ranking of design alternatives regarding the optimality of the placement or scheduling of software processing units on heterogeneous hardware nodes, in terms of a utility function based on time performance and power/energy consumption | CET | MAN | 3.1
DTO-2 | FUN | | DTO shall be able to choose among several mappings of a given SW onto a given HW in order to maximise optimality, as stated by the utility function | CET | MAN | 3.1
DTO-3 | FUN | | DTO shall be able to choose among several HW options for mapping a given SW onto them in order to maximise optimality, as stated by the utility function | CET | MAN | 3.1
DTO-4 | FUN | | DTO shall be able to answer speculative questions related to the targeted efficiency of a future implementation of some part of the SW and its impact on the overall efficiency on a given HW | CET | MAN | 3.1
DTO-5 | FUN | | DTO shall be able to address uncertainty in the values used for hardware and software characterisation | CET | MAN | 3.1

Table 7: Requirements and Design Tooling Referenced Requirements

4.4.2.2.3 Internal Architecture Requirements and Design Tooling is composed of tools to be used at design time by the development team to explore design alternatives. Thanks to the developed tools, the development team will have the ability to explore which parts of a software solution should be fixed and implemented at development time and which parts should be implemented in a way that allows for deployment-time and runtime adaptation, so that operational behaviour can maintain the targeted quality criteria while minimising energy consumption.

Requirements and Design Tooling (R&D-T) is composed of the following tools:

Tools for modelling software tasks, data flow processing as well as heterogeneous hardware elements used in an execution environment including programmable devices such as FPGA (R&D-Modelling – R&D-M)

Tools for measuring and characterising the execution of software tasks with different representative datasets on various targeted heterogeneous hardware architectures (Design Time Characteriser - DTC)

Tools to guide decisions regarding what part of the solution to fix at design time and what parts to leave open for deployment time or runtime (Design Time Optimiser - DTO)

The set of tools mentioned above must allow for an iterative refinement of the design solution space explored. In the early iterations of a development, requirements and design models may remain high level. In such cases, the characterisation may not provide accurate absolute values; however, it must be able to provide valid relative information for comparing different high-level design solutions. In particular, it must determine whether two solutions are roughly equivalent or whether one is better than the other when executed on a given set of heterogeneous architecture devices. In the initial phases, R&D-T will be used to filter out poorer solutions.

In later phases of the development, software and hardware models will iteratively provide additional details for the parts of the solution identified as of interest for better exploiting hardware heterogeneity. Accordingly, the design-time characterisation methodology and tools will involve more accurate models for estimating possible value ranges; for instance, MATLAB and Simulink models, or even vendor-specific device emulators, may be used and associated with finer model elements. The main goal of the iterative design approach is to provide the necessary information to the development team so that it can decide whether or not it is worthwhile to further improve a current software solution in order to exploit a given set of heterogeneous hardware devices.


4.4.2.2.4 Baseline Technologies The following baseline technologies are used within this component:

Name | Description | Version
OscaR framework | OscaR includes a Constraint Programming engine that will be used to implement the DTO requirements. Constraint programming is an exhaustive symbolic search approach: it explores all possibilities without explicitly enumerating them. Rather, it can evaluate sets of potential solutions (here, the potential mappings) and decide whether the set includes any relevant mapping or not, thus drastically reducing the set of possibilities and the run time of the mapping optimisation task. Optimisation of the mapping resembles a bin-packing problem with additional constraints. It can be tackled, for instance, using the constraint programming approach mentioned above, in particular since the size of the problem will be relatively small. | 2.1 (or newer)
Papyrus | UML and SysML editor working as an Eclipse plugin. In Y1, R&D-M will be implemented through UML or SysML profiles. |
Acceleo | Model-to-Text framework that can work as a stand-alone technology or as an Eclipse plugin. Acceleo will be used to communicate design model knowledge to DTC. |
MongoDB | NoSQL database used to store design knowledge information. It is used to store and share data between DTC and DTO. |

Table 8: Requirements & Design Tooling (R&D-T) Baseline Technology

4.4.2.2.5 Use Case Diagram The Use Case Diagram in this section presents actions performed by application developers at design-time.

Figure 24: Design-time Use cases


A comment highlights the relationship between each use case and the tooling components mentioned earlier, namely, RDM, DTC and DTO.

4.4.2.2.6 Static Component Diagram This section presents a static view of RDT-related components and their dependencies.

Figure 25: Component Dependency diagram

The components in white are the baseline technologies used to build the RDT tooling. All tools to be developed or augmented during TANGO are the components filled in blue. Finally, the green component represents the input to be provided by the application development team, in particular the design-time knowledge on the software and hardware elements to consider, which DTC and DTO respectively characterise and explore for optimal design alternatives.

4.4.2.2.7 Component Interaction Diagram This section presents a view of the component interactions. Subsequently, a short description explains the order in which interactions will take place.

Figure 26: Component Interaction Diagram.

This diagram shows that RDT is a top level component that will orchestrate interaction and information flow across other tools.

RDT is implemented as an Eclipse plugin. It will require as input the location of the model file, the Acceleo extraction script, links to DTC and DTO, as well as a reference to the Mongo database that will store the design-time knowledge acquired throughout the development lifecycle of an application, prior to it being transitioned to operation in production.


At this time of the TANGO project, a list explaining the flow of interactions, as well as what each component will be performing, is more appropriate than a sequence diagram.

Let us remark that the modelling of an application's software and hardware is not represented in the interaction diagram above; it is assumed to have already taken place. In order to achieve this first step, the application development team will have initially installed Eclipse, the Papyrus UML designer plugin and the TANGO Profile (part of the RDM tools). It will then be possible for the application development team to model the design knowledge known at a given point in time. The TANGO Profile will augment standard UML (or SysML) elements with TANGO-provided stereotypes. These stereotypes will provide the necessary interface to request additional metadata to attach to the model elements. Although determining the exact metadata to use is part of the research, we may already state that for hardware elements, information related to clock speed, data throughput speed and the ability to scale energy consumption will be provided, among others. For software elements, information on the languages and technologies used (or to be used) for their implementation will be provided, or, earlier in the design phase, perhaps only higher-level information on the anticipated parallelisation potential. The resulting model with the added metadata will be saved in a file, either on a development team member's local machine or in a file system shared with other team members. In the interaction flow below it is assumed that the myApplicationModel property in Figure 26 points to that file location.

Subsequently, the RDT plugin will make the components interact as follows.

First, the information pointing to myTANGODesignKnowledgeDB (the location of the MongoDB instance) will be retrieved by the RDT plugin, as it is needed as input for the other tools, which will eventually read additional input from it and, in all cases, write their results back to it.

Second, the reference to the application model file (myApplicationModel) is used in combination with the Acceleo script myModelDataExtractor. In particular, the Acceleo script has been built to retrieve the UML/SysML data and the associated TANGO metadata provided in the application model. Furthermore, the Acceleo script myModelDataExtractor will write its results to myTANGODesignKnowledgeDB in a meta-model agreed with the other components, DTC and DTO.

Third, the application development team may then decide to invoke DTC to further characterise certain properties of the model or, if the desired characterisation data is already available, invoke DTO to obtain the answers highlighted in the Use Case Diagram in Figure 24. Concretely, the RDT plugin will respectively use the reference to myDTC or myDTO to launch the appropriate action. The team will also be able to invoke a selected characterisation or optimisation among the several alternatives offered. This selection will be implemented as part of the user interface of the RDT plugin. The sketch below illustrates what one stored piece of design knowledge might look like.
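Purely as an illustration of this flow, and assuming the standard MongoDB Java driver (MongoClients API), writing one piece of extracted design knowledge to myTANGODesignKnowledgeDB could look like the sketch below. The connection string, collection name and field names are hypothetical, since the meta-model shared between RDM, DTC and DTO is still to be agreed:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class DesignKnowledgeWriter {
    public static void main(String[] args) {
        // Connection string, database and collection names are placeholders.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> knowledge =
                    client.getDatabase("myTANGODesignKnowledgeDB").getCollection("hardwareElements");

            // Hypothetical record for one hardware element extracted from the
            // Papyrus model by the Acceleo script.
            Document gpuNode = new Document("elementId", "gpu-node-01")
                    .append("kind", "GPGPU")
                    .append("clockMHz", 1100)
                    .append("idlePowerWatts", 25.0)
                    .append("maxPowerWatts", 180.0)
                    .append("processingCategories", java.util.List.of("SIMD", "graphics-rendering"));
            knowledge.insertOne(gpuNode);
        }
    }
}
```

DTC and DTO would then read such records back, attach characterisation results to them, and query them when ranking design alternatives.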

4.4.2.2.8 Sequence Diagram This level of detail is unneeded at this time.

4.4.2.2.9 Deployment Diagram Eclipse and the required RDT plugins and components will most likely be installed on each of the development team's workstations. Other setups are possible, as explained in Section 4.4.2.2.10 below. RDT results are meant for the development team to take decisions at design time and will not need to be communicated to the other TANGO components used by the runtime framework.

4.4.2.2.10 External Interface At this point, RDT is expected to live in a closed loop with the application development team during design time. Its goal is to help the team make design-time decisions. Consequently, it is not anticipated that RDT information will be communicated to other runtime components of the TANGO framework.


4.4.2.3 Code Optimizer Plug-in

The Code Optimiser Plugin (COP) plays an essential role in the reduction of the energy consumed by an application. This is achieved through the adaptation of the software development process and by giving software developers the ability to directly understand the energy footprint of the code they write.

4.4.2.3.1 Novelty beyond the State of the Art The proposed novelty of this component beyond the SotA is in its generic Java profiling capabilities (beyond those available in the discipline of mobile computing), which enable the energy assessment of code out-of-band of an application's normal operation within a developer's IDE.

4.4.2.3.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-COP-1 | FUN | Static Code Analysis | Functionality to enable the detection of energy consumption hot spots. Satisfaction criteria: given any unit test, the COP component will provide visual block-by-block estimations of energy consumption, validating this output via offline energy model calibration. | ULE | MAN | T3.1, T3.2
R-COP-2 | FUN | Runtime energy profiling | Functionality providing the ability to run an application outside of normal operation and ascertain its power consumption through the translation of profiled performance metrics via an energy model. Satisfaction criteria: given a Java class, the COP component will provide power consumption metrics over the duration of execution. | ULE | MAN | T3.1, T3.2

Table 9: COP Referenced Requirements

4.4.2.3.3 Internal Architecture The COP component, situated in the IDE of the TANGO Architecture, provides functionality to indicate to a developer where the most energy is being consumed within an application during development and construction time. This functionality is necessary to enable an agile, iterative and self-adaptive approach to green software engineering. It is achieved through the profiling of code to ascertain resource consumption and the translation of this consumption, via an energy model, into energy and power consumption metrics. It should be noted that this component does not interact with other components of the TANGO Architecture and thus does not include a sequence diagram or external interface. It is, however, used as part of the tool chain within the IDE workflow.

4.4.2.3.4 Baseline Technologies The following baseline technologies are used within this component:

Name | Description | Version
JVM Monitor [160] | JVM Monitor is a Java profiler integrated with Eclipse to monitor CPU, threads and memory usage of Java applications. | 3.8
JouleUnit [161] | The JouleUnit workbench is Eclipse-based additional tooling for the JouleUnit framework, especially for the visualization of JouleUnit profiling and testing results. | 0.2
Eclipse [162] | Eclipse is a Java-based open source platform that allows a software developer to create a customized development environment (IDE) from plug-in components built by Eclipse members. Eclipse is managed and directed by the Eclipse.org Consortium. | 4.4.0+ (Luna Release)
Java SE Development Kit [163] | The JDK is a development environment for building applications, applets, and components using the Java programming language. The JDK includes tools useful for developing and testing programs written in the Java programming language and running on the Java platform. | 7+

Table 10: COP Baseline Technology

4.4.2.3.5 Component Diagram

Figure 27: COP Component Diagram

The COP component comprises a single Eclipse plugin that contains two distinct sub-components: the Energy Profiler and the Hot Spot Identifier. The Energy Profiler, reusing and adapting functionality of JVM Monitor, provides the ability to run an application out-of-band, ascertain its resource consumption (i.e. CPU time) and correlate this to its estimated energy consumption. The Hot Spot Identifier, through reusing and adapting the capabilities of JouleUnit, will enable offline calibration of performance metrics to energy consumption, providing the ability to associate a block of code with its predicted energy consumption.


4.4.2.3.6 Sequence Diagram

Figure 28: Sequence Diagram of internal functionality of the Code Optimizer Plug-in

Figure 28 describes the internal interactions of the COP plug-in, where two alternative uses of the component are illustrated. The first provides the ability to profile an application, ascertaining per-line-of-code energy consumption metrics and enabling a ranked list of energy hotspots to be returned as a result set to the developer for evaluation. The second describes the execution of JUnit-based workflows, where the total energy consumption of specific elements of the application is calculated using an energy model that again attributes CPU time to energy consumption.
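To make the energy attribution step concrete, the following is a minimal sketch of how CPU time gathered by the Energy Profiler could be translated into an energy estimate through a linear model calibrated offline. The class, the method names and the linear form itself are illustrative assumptions, not the actual COP implementation.

public class EnergyAttributorSketch {

    private final double idlePowerWatts;  // calibrated power of the host with no load
    private final double maxPowerWatts;   // calibrated power of the host at full CPU load

    public EnergyAttributorSketch(double idlePowerWatts, double maxPowerWatts) {
        this.idlePowerWatts = idlePowerWatts;
        this.maxPowerWatts = maxPowerWatts;
    }

    // Attributes energy (in Joules) to a code block, given the CPU time it consumed and the
    // average CPU utilisation observed while it was running (a value between 0 and 1).
    public double attributeEnergy(double cpuTimeSeconds, double avgCpuUtilisation) {
        double powerWatts = idlePowerWatts + (maxPowerWatts - idlePowerWatts) * avgCpuUtilisation;
        return powerWatts * cpuTimeSeconds;
    }
}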

4.4.2.3.7 Deployment Diagram

Figure 29: COP Deployment Diagram

Like the other plug-ins of the TANGO Architecture, the COP plug-in resides inside the Eclipse IDE, which is agnostic of the underlying operating system.

4.4.2.4 Runtime Abstraction Layer

This component provides transparent and efficient execution of parallel applications on distributed heterogeneous platforms.

4.4.2.4.1 Novelty beyond the State of the Art The novelty of this component beyond the State of the Art lies in hiding the complexity of executing applications on heterogeneous platforms (data transfers, execution management and data synchronization), in detecting the inherent application parallelism by performing data dependency analysis, and in scheduling task execution on the available resources to achieve an efficient application execution according to different key performance indicators.

4.4.2.4.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-RT-1 | FUN | Multi-level task execution support | Enable the execution of applications with coarse-grain tasks which contain a workflow of fine-grain tasks | BSC | MAN | 3.2
R-RT-2 | FUN | HPA execution support | Support the execution of tasks for the different execution environments available in HPA (SMP, GPUs, FPGAs) | BSC | MAN | 3.2
R-RT-3 | FUN | KPI guided execution | Enable the scheduling of tasks to available resources based on the monitoring of application-based KPIs | BSC | MAN | Y2-Y3
R-RT-4 | FUN | Self-Adaptation support | Enable the application reconfiguration to adapt to application changes | BSC | MAN | Y2-Y3

Table 11: Runtime Referenced Requirements

4.4.2.4.3 Internal Architecture The Runtime Abstraction Layer is the component which hides from the application the underlying execution platform and the parallelization complexity. Based on the Programming Model directives, the runtime environment is capable of detecting the inherent parallelism of the application and of transparently and efficiently managing the application execution on distributed heterogeneous computing platforms, including data management between the distributed computing nodes and between the different computing devices within a node.

This component is deployed with the application in the computing nodes and is in charge of performing the main interaction with lower layer components of the TANGO architecture.

4.4.2.4.4 Baseline Technologies The following baseline technologies are used within this component:

Name | Description | Version
COMPSs runtime | Provides the runtime environment for the efficient execution of coarse-grain tasks in distributed platforms | 1.4+
OmpSs runtime | Provides the runtime environment for the efficient execution of fine-grain tasks in the heterogeneous node devices | 14.06+

Table 12: Runtime Baseline Technology

4.4.2.4.5 Component Diagram The following figure depicts the component diagram of the Runtime Abstraction Layer. It is mainly composed of two runtime levels: the Platform-level runtime and the Node-level runtime. The Platform-level runtime is in charge of detecting the inherent parallelism between the coarse-grain tasks and of their efficient execution on the distributed computing platform, while the Node-level runtime does the same for the fine-grain tasks and their execution on the different heterogeneous devices available in a computing node. Since the Platform-level runtime is focused on distributed platforms, it is composed of two sub-components: the master and worker parts. On the one hand, the master part is a library which is executed together with the main workflow of the application in the master node. It performs the data dependency analysis between the coarse-grain task invocations and coordinates the overall execution between the different worker nodes, and if unexpected events or performance degradations appear during execution, it interacts with lower-level TANGO components to react and adapt the execution. On the other hand, the worker part is a daemon which is started in the worker nodes when the main application starts, and its main task is managing the execution of the coarse-grain tasks. The execution of these coarse-grain tasks is managed by the Node-level runtime.

Figure 30: Runtime Component Diagram

4.4.2.4.6 Sequence Diagrams The following figures show the interactions between the user application and the different runtime components, and between those runtime components and other TANGO components, in order to support the required functionalities. In more detail, Figure 31 depicts the interactions between components during the application execution, and Figure 32 depicts the interactions between the runtime components and other TANGO components to adapt the resource configuration of the application execution.


Figure 31: Runtime Operation Sequence Diagram during Application Execution

Figure 32: Runtime Operation Sequence Diagram during self-adaptation


4.4.2.4.7 Deployment Diagram The runtime libraries are deployed together with the user application in the compute nodes, as shown in the following figure. For simplicity, we have depicted the master and worker parts of the application in separate compute nodes. However, the master’s compute node can also host the worker’s components in order to maximize resource usage. In this case, the master node will execute the main application as well as part of the application tasks.

Figure 33: Runtime Deployment Diagram

4.4.2.4.8 External Interface As depicted in the previous diagrams, the Runtime components are not directly called by other TANGO components. However, in the building phase, the Programming Model code annotations introduced in the user application are converted into a set of calls to the following API, which initializes and stops the runtime and manages task executions and data synchronizations.

Method | Input | Output | Description
startRuntime | Configuration file | void | Method to start the runtime environment
executeTask | Task description | void | Method to submit the asynchronous execution of a task in the runtime
getData | Data reference | data | Waits until the referenced data is generated and returns it to be used in the main application code.
stopRuntime | - | void | Method to stop the runtime environment

Table 13: Runtime API
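The following sketch illustrates what the generated calls to this API could look like for a simple main workflow. The TangoRuntime facade and the TaskDescription type are stand-ins assumed for the purpose of the example; they are not the actual bindings produced by the Programming Model.

public class GeneratedWorkflowSketch {

    // Minimal stand-in for a task descriptor (method name plus data references).
    record TaskDescription(String method, String... dataRefs) { }

    // Minimal stand-in for the runtime facade exposing the API of Table 13.
    interface TangoRuntime {
        void startRuntime(String configurationFile);
        void executeTask(TaskDescription task);  // asynchronous submission, added to the task DAG
        Object getData(String dataReference);    // blocks until the referenced data has been produced
        void stopRuntime();
    }

    static void run(TangoRuntime runtime) {
        runtime.startRuntime("project.xml");  // hypothetical configuration file
        runtime.executeTask(new TaskDescription("blurImage", "frame_in.raw", "frame_out.raw"));
        Object result = runtime.getData("frame_out.raw");  // wait for the coarse-grain task result
        System.out.println(result);
        runtime.stopRuntime();
    }
}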


4.4.3 Layer 2 – Middleware

The application deployment layer consists of a set of components to handle the placement of an application considering energy models on target heterogeneous parallel architectures. This layer aggregates the tools that are able to assess and predict performance and energy consumption of an application. Application level monitoring is also accommodated, in addition to support of self-adaptation for the purpose of making decisions using application level objectives given the current state of the application in question.

4.4.3.1 Application Lifecycle Deployment Engine

This component is responsible for workload scheduling and for the management of the application life-cycle while it is executed. First of all, the deployment request passes through a secure gateway for authentication and authorization purposes. Once permission is granted, the deployment request is received by the main part of the ALDE. The workload scheduling component of the ALDE will try to make an optimal decision on how, when and where the application will be started, based on inputs from the Device Supervisor. While the application is executed, the application life-cycle manager component of the ALDE can provide adaptations and reconfigurations based on inputs from the Self-Adaptation Manager.

4.4.3.1.1 Novelty beyond the State of the Art The novelty of the Application Life-cycle Deployment Engine is derived from the fact that it is responsible for the scheduling of workloads along with the management of the application life-cycle and control of execution. It takes into account static workload requirements as given by the user on the application submission and it can also deal with dynamic adaptation of the application based on the inputs of the self-adaptation manager.

4.4.3.1.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-ALDE-1 | FUN | ALDE heterogeneity | Support of heterogeneous allocations | BULL | MAN | 4.4
R-ALDE-2 | FUN | ALDE scheduling self-adaptation | Enable self-adaptation upon heterogeneous allocations | BULL | MAN | 4.5

Table 14: Application Life-cycle Deployment Engine Referenced Requirements

4.4.3.1.3 Internal Architecture The Application Life-cycle Deployment Engine, in conjunction with the Device Supervisor, provides an important part of the TANGO toolbox. Together they are responsible for the actual application deployment upon the platforms. They need information from different components in order to enable the submission of an application fulfilling its needs. The Application Life-cycle Deployment Engine is the software that will initially get the requirements for the application execution. Based on the information returned from the Device Supervisor about the allocation state of the HPDs, it can provide workload scheduling in order to prioritize and execute workloads in an efficient manner. The secure gateway is responsible for allowing only users with the right permissions to submit jobs, follow their execution and collect results. Based on input given by the Self-Adaptation Manager and the Monitoring Infrastructure, adaptations and reconfigurations may be triggered, and the Device Supervisor will pass this information to the HPDs.

4.4.3.1.4 Baseline Technologies The following baseline technologies are used within this component:


Name | Description | Version
Slurm | This is used for the workload scheduling, resource selection and task placement of applications upon the heterogeneous parallel devices. | 16.05
Munge | This is used for authentication through the creation and validation of credentials on the secure gateway. It is directly used by Slurm. | 0.5.12

Table 15: Application Life-cycle Deployment Engine Baseline Technology

4.4.3.1.5 Component Diagram

Figure 34: Application Life-cycle Deployment Engine Component Diagram

The primary purpose of the Application Life-cycle Deployment Engine is to perform workload scheduling and application life-cycle management. It performs this through its interactions with the following major components:

Secure Gateway: This component provides the authentication and authorization of the user and examines whether he/she has the right privileges for the submission.

Self-Adaptation Manager: This provides actuators that may be used to adapt an application deployed upon a platform in order to respect QoS goals. The Self-Adaptation Manager communicates directly with the application life-cycle management part of the ALDE.

Device Supervisor: This component will return the resource management details to the ALDE and will enable the workload scheduler to take the most efficient decisions. Furthermore, when the application life-cycle manager issues a reconfiguration action, the Device Supervisor is responsible for effectuating the action and continuing execution.


4.4.3.1.6 Sequence Diagram

Figure 35: Application Life-cycle Deployment Engine Sequence Diagram

The Application Life-cycle Deployment Engine is the primary means of deciding when a workload will be executed and, through close application life-cycle management, it may dynamically modify its execution to improve certain aspects of it.

It is directly related to the Device Supervisor. When the application is submitted, information is needed about the state of the clusters and resources (HPDs).

Once the application is deployed, a decision for adaptation and reconfiguration can be taken based on the results of monitoring and on internal calculations in the Self-Adaptation Manager. If reconfigurations are needed, the adaptation is requested by the Self-Adaptation Manager and the particular actions to be taken are directed by the Application Life-cycle Deployment Engine to the Device Supervisor.


4.4.3.1.7 Deployment Diagram

Figure 36: Application Life-cycle Deployment Engine Deployment Diagram

The Application Life-cycle Deployment Engine will be deployed on the same physical host as the Device Supervisor. It will communicate with one or multiple user login nodes through a secure gateway to guarantee authentication and authorization.

4.4.3.1.8 External Interface

Method | Input | Output | Description
Application Submission | Application, Deployment ID | Outcome of Request | This issues a workload scheduling task
Application Reconfiguration | Application, Type of Adaptation | None | This allows the ALDE to provide application adaptation along with hardware reconfigurations to adapt the application execution

Table 16: Application Life-cycle Deployment Engine Manager API
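A compact Java rendering of this interface is sketched below. The type names and the enumeration of adaptation types are assumptions introduced only for illustration; the deliverable does not prescribe them.

public interface AldeExternalInterfaceSketch {

    // Issues a workload scheduling task for the given application (Table 16, Application Submission).
    RequestOutcome submitApplication(Application application, String deploymentId);

    // Applies an application adaptation, possibly including a hardware reconfiguration
    // (Table 16, Application Reconfiguration).
    void reconfigureApplication(Application application, AdaptationType typeOfAdaptation);

    // Placeholder types, assumed for illustration only.
    enum AdaptationType { REDEPLOY_TO_OTHER_HPD, CHANGE_RESOURCE_ALLOCATION, APPLY_POWER_CAP }
    record Application(String name, String executablePath) { }
    record RequestOutcome(boolean accepted, String message) { }
}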

4.4.3.2 Self-Adaptation Manager

The Self-Adaptation Manager (SAM) plays an essential role in maintaining the energy and performance goals of an application at runtime. It supports self-adaptation in the middleware. This is achieved through the careful consideration of violations in service quality and of the actuators that can be utilised to perform self-adaptation.

4.4.3.2.1 Novelty beyond the State of the Art The novelty of the Self-Adaptation Manager is derived from it being at the heart of the self-adaptation process, in which it focuses upon both macro and micro aspects of an application’s deployment upon heterogeneous parallel architectures.

4.4.3.2.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-SAM-1 | FUN | QoS Monitor | Create a QoS monitor sub-component | ULE | MAN | 3.1
R-SAM-2 | FUN | QoS Rules | Create a set of QoS rules for applications | ULE | MAN | 3.2

Table 17: Self-Adaptation Manager Referenced Requirements


4.4.3.2.3 Internal Architecture The Self-Adaptation Manager is the key component in the middleware for runtime adaptation. It is the main component for deciding when to perform adaptation in the TANGO architecture. It works with other components (actuators) to invoke adaptation in order to perform self-adaptation of applications at runtime.

The Self-Adaptation Manager achieves an adaptive response by enforcing QoS goals that consider aspects such as energy efficiency and performance. This enforcement is to be performed in a holistic fashion, considering the business rules and objectives of a middleware provider.

4.4.3.2.4 Baseline Technologies The following baseline technologies are used within this component:

Name | Description | Version
ActiveMQ | This is used for the interactions between the Quality of Service monitoring component and actuators such as the Application Life-cycle Deployment Engine. | 5.11.1

Table 18: Self-Adaptation Manager Baseline Technology

4.4.3.2.5 Component Diagram

Figure 37: Self Adaptation Manager Component Diagram

The primary purpose of the self-adaptation manager is to perform adaptation in the middleware at runtime. It performs this through its interactions with the following major components:

Application Lifecycle Deployment Engine: This provides the actuators used by the self-adaptation manager so that it may influence the applications that are running on the physical infrastructure.

Programming Model: This similarly provides actuators that may be used to adapt an application running within the model.

QoS Monitor: The QoS Monitor will provide monitoring behaviour for QoS control. It will inform the self-adaptation manager that the QoS goals are being violated or are potentially going to be violated, thus allowing the self-adaptation manager to decide what action if any to take.

Page 113: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 113 of 173

Monitoring Infrastructure: The monitoring infrastructure is the primary source of information about the performance of the deployed application. The QoS monitor utilises this in order to indicate when QoS goals are not being met. This enables it to notify the Self-Adaptation Manager of any QoS goal violations and enables the latter to make any required adaptation decisions.

Energy Modeller: The energy modeller is an additional source of information regarding the power consumption of a given application. This information can be used to understand the scale of any adaptation that is required to be undertaken to meet an application’s QoS goals.

4.4.3.2.6 Sequence Diagram

Figure 38: Self Adaptation Manager Sequence Diagram

The Self-Adaptation Manager is the primary means of deciding when runtime adaptation shall be performed in the middleware.

The QoS monitor continually monitors the applications’ overall performance with respect to the QoS goals. These goals cover aspects such as performance, power and energy consumption. If one of these goals is breached, the QoS monitor notifies the SAM that a breach has occurred. From the information provided and the rule set applied to the application, the SAM then decides what action to take and invokes the appropriate actuator (e.g. the Application Lifecycle Deployment Engine) to perform the required adaptation.

The SAM decides how to adapt in several stages. In the first stage it obtains the application’s information regarding the current deployment. This leads to the second stage, in which it can discover which adaptation rules can be fired.

Given this information and the current environmental information, the adaptation rules are evaluated and a rule is fired. The adaptor that invokes the adaptation then requires further environmental information to decide on the scale and exact nature of the adaptation to make, e.g. how is the application to be adapted? How much energy would be consumed upon adaptation, and is that too much? These questions require information from additional data sources, such as the energy modeller and the monitoring infrastructure. The monitoring infrastructure is shown on the sequence diagram as a representative example of this occurring.
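A minimal sketch of how such a rule set might be evaluated is given below. The rule shape, the thresholds and the printed actions are illustrative assumptions; they do not describe the SAM’s actual rule engine.

import java.util.List;
import java.util.function.Predicate;

public class AdaptationRulesSketch {

    record Measurements(double avgPowerWatts, double throughputFps) { }
    record Rule(String name, Predicate<Measurements> breached, Runnable action) { }

    // Fires the action of the first breached rule; at most one rule per cycle (an assumption).
    static void evaluate(Measurements m, List<Rule> rules) {
        for (Rule rule : rules) {
            if (rule.breached().test(m)) {
                rule.action().run();  // e.g. ask the ALDE to redeploy onto a lower-power HPD
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("power cap", m -> m.avgPowerWatts() > 200.0,
                        () -> System.out.println("request reconfiguration: reduce power")),
                new Rule("performance floor", m -> m.throughputFps() < 25.0,
                        () -> System.out.println("request reconfiguration: add resources")));
        evaluate(new Measurements(230.0, 30.0), rules);  // breaches the power cap rule
    }
}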

4.4.3.2.7 Deployment Diagram

Figure 39: Self Adaptation Manager Deployment Diagram

The Self-Adaptation Manager will work as a callable component of the QoS Monitor. It will be deployed on a physical host and invoked by the QoS monitor. It will then communicate as required with other components to ensure the adaptation takes place.

4.4.3.2.8 External Interface

Method | Input | Output | Description
notifyQoSFailure | Application, Deployment ID | Outcome of Request | This launches the adaptive behaviour of the Self-Adaptation Manager.
notificationOfExternalAdaptationEvent | Application, Type of Adaptation | None | This allows the SAM to be made aware of changes made by other actors so as to prevent unnecessary adaptation.

Table 19: Self Adaptation Manager API

4.4.3.3 Energy Modeller

The Energy Modeller (EM) provides power and energy consumption information for compute devices in the current, future and historical contexts. This is done with the intent of providing key information that guides the selection of the most appropriate compute setup from the possible list of heterogeneous configurations, with the intent of minimising energy consumption.

4.4.3.3.1 Novelty beyond the State of the Art The energy modeller is the key component for providing energy and power performance data of applications. It provides a log of real measurement values, along with the facility to produce future estimates that can be utilised to drive adaptation within the architecture and, more generally, to select the best configuration in order to minimise and maintain low power consumption of an application.

4.4.3.3.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-EM-1 | FUN | Support Calibration for Heterogeneous Architectures | To enable calibration of multiple heterogeneous setups, which may differ based upon the type of workload instantiated upon them (application level) and upon the heterogeneous parallel architectures (hardware level). This will include the identification of the parameters required to monitor different types of physical infrastructure and relate them to the executing workload. | ULE | MAN | 3.1
R-EM-2 | FUN | Support emulation of Watt Meters for Heterogeneous Architectures | This enables the large-scale deployment (i.e. HPC) of the TANGO architecture by ensuring that, in cases where direct measurement is not possible, an estimated value for power consumption remains available. | ULE | MAN | 3.1
R-EM-3 | FUN | Support Prediction for Heterogeneous Architectures | To support basic prediction for the submission of jobs that may have different execution paths dependent upon the underlying architecture available. | ULE | MAN | 3.2

Table 20: Energy Modeller Referenced Requirements

4.4.3.3.3 Internal Architecture The energy modeller is a key component in the energy reduction process. It provides the mathematical models that estimate the power consumption and energy usage of a given deployment decision. Thus it is able to advise and drive the selection of hardware for service deployment and to inform the process of self-adaptation in the TANGO architecture. Energy modelling is used at deployment time for assessing the best possible assignment of resources to an application, and also at runtime in order to advise a continuing energy mitigation strategy.

In addition, it offers the facility to assess historic energy consumption, which forms the heart of any advisory service for end users who wish to understand the energy consumption of their application. The advice to end users goes further by informing them of the current power consumption of their software and hardware setup, so that they can gauge the current impact of running their applications.

In order to achieve this advisory service, the energy modeller utilises the infrastructure monitor as a data source, which provides key performance metrics for the cloud environment. The energy modeller then utilises a mathematical model to determine from these metrics the host resource’s current and likely future energy consumption, while also recording its past energy performance.

An estimation of the power consumption of a physical resource derives from two aspects. The first is the correct profiling of the resource’s characteristics, encompassing aspects such as its idle energy consumption and its energy consumption under various load conditions. The second aspect is the profiling of the workload to be performed. This workload derives from the application, which is characterised based upon the hardware it runs on, in order to better understand how much energy an application is expected to consume in the future.
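As an illustration of how these two aspects could be combined, the sketch below uses a simple utilisation-based linear model for host power and a proportional share to attribute part of it to one application. Both the linear form and the proportional allocation are assumptions made for the example, not the project’s actual energy model.

public class HostPowerModelSketch {

    private final double idleWatts;  // from host calibration: power with no load
    private final double maxWatts;   // from host calibration: power at full load

    public HostPowerModelSketch(double idleWatts, double maxWatts) {
        this.idleWatts = idleWatts;
        this.maxWatts = maxWatts;
    }

    // Host power estimate for a given overall utilisation in [0, 1].
    public double hostPowerWatts(double utilisation) {
        return idleWatts + (maxWatts - idleWatts) * utilisation;
    }

    // Share of host power attributed to one application, proportional to its share of the utilisation.
    public double appPowerWatts(double hostUtilisation, double appUtilisation) {
        if (hostUtilisation <= 0.0) {
            return 0.0;
        }
        return hostPowerWatts(hostUtilisation) * (appUtilisation / hostUtilisation);
    }
}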

4.4.3.3.4 Baseline Technologies The following baseline technologies are used within this component:


Name | Description | Version
MySQL | To be used to store historical data. | 5.6+
MariaDB JDBC Drivers | To be used to connect to the database. | 1.32+
Tomcat | Provides a container for the remote calling of the calibration component of the energy modeller. | 6.0+
Apache Commons Math | A mathematical function library used for the models. | 3

Table 21: Energy Modeller Baseline Technology

4.4.3.3.5 Component Diagram

Figure 40: Energy Modeller Component Diagram

The major components of the energy modeller, as shown, have been identified in Section 4.4.3.3. The infrastructure monitor is seen as an information source for the energy modeller at the middleware level.

The energy modeller is broken down into six distinct sub-components, namely the energy predictor, the application energy allocation policy, the host calibrator, the host profile, the workload profiler and the historic record. The energy predictor is the main mathematical model, which is supported by the various other services. The other sub-components’ roles are described below:

Application Energy Allocation Policy: This allocation policy determines how much of the host’s power consumption and energy usage is allocated to each application.

Host Calibrator: This performs the calibration of a physical host, which provides the power consumption profile of that host.


Host Profile: This is a profile that determines the host machine’s power consumption for a given spot workload.

Workload Profiler: A mechanism to profile application workloads to assist in estimating future power consumption levels, by utilising a projected future workload profile to determine energy consumption.

Historic Record: A historic record that will assist the profiling process.

The Programming Model, the Application Life-cycle Deployment Engine and the Self-Adaptation Manager are listed in the component diagram as they are the principal sources of invocation of the Energy Modeller. The Programming Model and the Application Life-cycle Deployment Engine do this in order to make their application deployments as energy efficient as possible, while the Self-Adaptation Manager performs invocations as a means of maintaining energy efficiency. The main approach for calibration is to invoke the calibration load locally and monitor its performance as locally as possible to avoid scope for error.

The final component on the diagram is a Watt Meter emulator, which aims at providing scalability for the monitoring of power consumption. This is achieved by emulating Watt meters rather than requiring a Watt meter to be attached to every physical host, which would be impractical.

4.4.3.3.6 Sequence Diagram

Figure 41: Energy Modeller Sequence Diagram

Figure 41 demonstrates the energy modeller working in conjunction with the Application Life-cycle Deployment Engine. The Application Life-cycle Deployment Engine requests the energy associated with an application’s deployment by specifying the mapping between an application and the host, along with the expected workload (including usage of accelerators) to be induced on the physical host by the application. The energy predictor’s role is to assess the specified deployment strategy and to derive its likely energy usage. This therefore allows different plans to be compared, allowing the best option to be selected.

The Application life-cycle deployment engine is therefore expected to provide the mappings between applications and the physical resources, as well as a workload profile that is as complete as possible.


Given the information provided by the Application life-cycle deployment engine, the energy modeller is able to look up the host’s power consumption profile, as well as the historical resource usage profile of a given application. From this information the energy modeller is able to invoke its mathematical models for the physical host’s characterisation and application’s workload characteristics. This will provide the power consumption and energy usage information for the physical host. The host’s energy usage is then fractioned among the applications that are running on the physical host by the application energy allocation policy to provide an estimate of the applications’ energy consumption.

The energy modeller is thus able to support the decision-making process both during the initial placement of applications and during operation.

4.4.3.3.7 Deployment Diagram The Energy Modeller is to be deployed on a physical machine with access to the infrastructure monitor. It is able, if required, to remotely invoke calibration tools through a Tomcat-based service. A Watt meter emulator is expected to be deployed on the host on which the infrastructure monitor is installed. The standalone calibrator is to be run on the physical hosts that need profiling. The deployment is shown in Figure 42.

Figure 42: Energy Modeller Deployment Diagram

4.4.3.3.8 External Interface The table below lists the main API of the energy modeller:

Method | Input | Output | Description
getEnergyRecordForApp | Collection<AppDeployed> apps, TimePeriod timePeriod | HashSet<HistoricUsageRecord> | Provides a historical record of power consumption data for a set of applications.
getEnergyRecordForApp | AppDeployed app, TimePeriod timePeriod | HistoricUsageRecord | Provides a historical record of power consumption data for an application.
getCurrentEnergyForApp | Collection<AppDeployed> apps | HashSet<CurrentUsageRecord> | Provides current power consumption data for a set of applications.
getCurrentEnergyForApp | AppDeployed app | CurrentUsageRecord | Provides current power consumption data for an application.
getPredictedEnergyForApp | App AppDeployment, Collection<App> AppsOnHost, Host host | EnergyUsagePrediction | Provides predicted power consumption data for an app, based upon the projected induced load of a set of applications.
getEnergyRecordForHost | Collection<Host> hosts, TimePeriod timePeriod | HashSet<HistoricUsageRecord> | Provides a historical record of power consumption data for a set of physical hosts.
getEnergyRecordForHost | Host host, TimePeriod timePeriod | HistoricUsageRecord | Provides a historical record of power consumption data for a physical host.
getCurrentEnergyForHost | Collection<Host> hosts | HashSet<CurrentUsageRecord> | Provides current power consumption data for a set of physical hosts.
getCurrentEnergyForHost | Host host | CurrentUsageRecord | Provides current power consumption data for a physical host.
getHostPredictedEnergy | Host host, Collection<Apps> applications | EnergyUsagePrediction | Provides predicted power consumption data for a physical host, based upon the projected induced load of a set of applications.
calibrateModelForHost | A set of hosts | none | Calibrates a set of hosts
calibrateModelForHost | A host | none | Calibrates a host
getAppTotalCurrentPowerConsumption | none | double | Provides the provider-level metric totalling all application power consumption.
getHostsTotalCurrentPowerConsumption | none | double | Provides the provider-level metric totalling all host power consumption.
getHostPowerUnallocatedToApps | none | double | Indicates how much power has not been allocated to apps (i.e. idle hosts without applications).
getAppsToHostPowerRatio | none | double | Returns the ratio between app power and host power consumption.
setApplicationProfileData() | An App | none | Sets profile data to better describe an application, i.e. the possible workload variations.

Table 22: Energy Modeller API
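For orientation, the core of this API can be read as the following Java interface. The domain types are shown only as empty placeholders (and the App/AppDeployed distinction of the table is collapsed into one placeholder), so this is a simplified sketch rather than the real signatures.

import java.util.Collection;
import java.util.HashSet;

public interface EnergyModellerSketch {

    HashSet<HistoricUsageRecord> getEnergyRecordForApp(Collection<AppDeployed> apps, TimePeriod timePeriod);
    HistoricUsageRecord getEnergyRecordForApp(AppDeployed app, TimePeriod timePeriod);

    HashSet<CurrentUsageRecord> getCurrentEnergyForApp(Collection<AppDeployed> apps);
    CurrentUsageRecord getCurrentEnergyForApp(AppDeployed app);

    EnergyUsagePrediction getPredictedEnergyForApp(AppDeployed appDeployment, Collection<AppDeployed> appsOnHost, Host host);
    EnergyUsagePrediction getHostPredictedEnergy(Host host, Collection<AppDeployed> applications);

    void calibrateModelForHost(Collection<Host> hosts);
    void calibrateModelForHost(Host host);

    double getAppTotalCurrentPowerConsumption();
    double getHostsTotalCurrentPowerConsumption();
    double getHostPowerUnallocatedToApps();
    double getAppsToHostPowerRatio();

    // Placeholder domain types, assumed for illustration only.
    final class AppDeployed { }
    final class Host { }
    final class TimePeriod { }
    final class HistoricUsageRecord { }
    final class CurrentUsageRecord { }
    final class EnergyUsagePrediction { }
}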

4.4.3.4 Monitoring Infrastructure

The responsibility of this component is twofold. On one side, it must provide running applications with metrics on the status of different devices, such as CPUs, GPUs, FPGAs, etc.; this information should cover both energy consumption and performance status. On the other side, it should provide historical statistics of device metrics for later analysis by the upper components of the TANGO architecture.

4.4.3.4.1 Novelty beyond the State of the Art Innovation in any kind of monitoring solution is quite difficult nowadays: countless solutions are already available, covering needs from small-footprint devices to entire clusters. That said, it is highly possible that the solution presented here, especially the different probes related to IPMI, RAPL, FPGA monitoring, PAPI and so on, together with application performance monitoring, could help to improve the state of the art with respect to similar solutions that are more focused on HPC environments [164].

4.4.3.4.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
MI-1 | FUN | Energy Measurements | The different agent probes need to be able to recover energy consumption information from the different processing devices connected to a node: CPUs, FPGAs, GPUs, etc. | Atos | MAN | 3.3
MI-2 | FUN | Historic values | The monitoring solution needs to be able to store, if necessary, historic values of all the measurements | Atos | MAN | 3.3
MI-3 | INT | API | The monitoring solution should provide an API to access the monitoring data | Atos | MAN | 3.3
MI-4 | INT | GUI | The monitoring solution should provide a GUI to look at the stored monitoring data | Atos | DES | 3.3
MI-5 | FUN | Multinode | The monitoring solution should be able to monitor several nodes at the same time | Atos | MAN | 3.3

Table 23: Monitoring Infrastructure Referenced Requirements

4.4.3.4.3 Internal Architecture The Monitor Infrastructure component is sub-divided into three different types of sub-components, as can be seen in the figure in subsection 4.4.3.4.5. From the bottom up:

Sub-agent modules, such as IPMI, PAPI or proc probes, that connect to different devices or OS monitoring systems to collect performance and energy metrics.

Agent, which can operate either stand-alone or connected to a central server. The main mission of the agent is to provide a unique API for the application to connect to the different probes, and to temporarily store performance and energy metrics of the node to be sent, if necessary, to a central server for study (a sketch of such an agent API is given after this list).

Central Monitoring server: this component does not need to exist, especially in embedded systems that are not connected to a central infrastructure. It can collect metrics from the different nodes and aggregate them in a single place, providing an API and a GUI for later study.
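A sketch of how a probe and the agent API described above might fit together is shown below; the interface names, the fake probe and its fixed reading are assumptions for illustration only.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MonitoringAgentSketch {

    // A probe reads one or more metrics from a device or OS facility (IPMI, RAPL, PAPI, /proc, ...).
    interface Probe {
        String deviceId();
        Map<String, Double> readLastMetricValues();  // metric id -> latest value
    }

    // The agent exposes a single API over all registered probes.
    static class Agent {
        private final List<Probe> probes;

        Agent(List<Probe> probes) {
            this.probes = probes;
        }

        // Returns the latest value of a metric on a given device, or NaN if unknown.
        double getMetricDevice(String deviceId, String metricId) {
            for (Probe probe : probes) {
                if (probe.deviceId().equals(deviceId)) {
                    return probe.readLastMetricValues().getOrDefault(metricId, Double.NaN);
                }
            }
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        // A fake RAPL-like probe returning a fixed package power reading, for illustration only.
        Probe fakeRapl = new Probe() {
            public String deviceId() { return "node1/cpu0"; }
            public Map<String, Double> readLastMetricValues() {
                Map<String, Double> metrics = new HashMap<>();
                metrics.put("package_power_watts", 35.2);
                return metrics;
            }
        };
        Agent agent = new Agent(List.of(fakeRapl));
        System.out.println(agent.getMetricDevice("node1/cpu0", "package_power_watts"));
    }
}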


4.4.3.4.4 Baseline Technologies The following baseline technologies could be used within this component (we say "could" because, this being the first year of the project, a careful study of the mentioned tools should still be carried out in order to later adapt them to the specific requirements of the rest of the components):

Name | Description | Version
FreeIPMI [165] | Provides an API to access IPMI information in a node. | 1.5.1 or higher
PAPI [166] | Provides access to read processor counters | 5.4.3 or higher
VampirTrace [167] | VampirTrace is an open source library that allows detailed logging of program execution for parallel applications using message passing (MPI) and threads (OpenMP, Pthreads). | 5.14.4 or higher

Table 24: Monitoring Infrastructure Baseline Technology

4.4.3.4.5 Component Diagram Figure 43 shows the different components that make up the Monitor Infrastructure. This is a modular approach. From the bottom up:

At the bottom we have the different sub-agents, responsible for collecting metrics from the different devices. The figure just shows some common examples, but more could be added (for example, to monitor an FPGA, a GPU, etc.). The ones depicted in the figure are:

o IPMI – If the node has an IPMI interface that measures energy, this module will be responsible for collecting how much energy the node is consuming.

o Proc – It will read different performance metrics from the proc filesystem in a Linux environment.

o RAPL – If an Intel processor is present, this sub-agent will connect to it to obtain information about the energy consumption of the processor cores.

o PAPI – If the devices in the node are compatible with the PAPI libraries, this module will be responsible for reading the counters of these devices.

Agent – This is the MI component installed in the node; it can be run in standalone mode. The agent will connect to the different subsystems and expose the metrics via a common API. Also, if necessary, the agent can temporarily store the information in a file to be sent later to a Central Monitoring component.

Central Monitoring – If the TANGO tools are deployed in a cluster environment, a Central Monitoring component could be deployed. This component will collect all the metrics from the different nodes and aggregate them, providing a common API to access them and a GUI.


Figure 43: Monitor Infrastructure Component Diagram

4.4.3.4.6 Sequence Diagrams The following sequence diagram describes how an Application collects information from the monitor agent in a node:

Figure 44: Collecting information from a device Sequence Diagram

The following sequence diagram describes how the central monitoring system collects all the monitoring information from a node:


Figure 45: Collecting information from a node Sequence Diagram

4.4.3.4.7 Deployment Diagram The following figure shows the deployment diagram of the Monitor Infrastructure. No base running technologies are depicted because they had not yet been decided at the time of writing.

Figure 46: Monitoring Infrastructure Deployment Diagram

4.4.3.4.8 External Interface The Monitoring Infrastructure should present the following API functionality:

Method | Input | Output | Description
getMetricDevice | ID of device and node + metric id | Double value with the metric + units | This method will return the latest value of a specific metric for a specific device in a node.
getHistoricMetricDevice | ID of device and node + metric id + period of time | A collection of values + units | This method will return the historic values for a specific period of time for a specific metric of a specific device in a specific node.

Table 25: Monitoring Infrastructure API
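To show how a client of this API might consume the returned values, the sketch below queries historic power samples and integrates them into an energy figure using the trapezoidal rule. The client interface and the Sample record are assumptions made for illustration; they are not part of the specified API.

import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class MonitoringClientSketch {

    record Sample(Instant time, double powerWatts) { }

    // Assumed client-side view of the API in Table 25.
    interface MonitoringApi {
        double getMetricDevice(String nodeId, String deviceId, String metricId);
        List<Sample> getHistoricMetricDevice(String nodeId, String deviceId, String metricId,
                                             Instant from, Instant to);
    }

    // Integrates a series of power samples (Watts) into energy (Joules) using the trapezoidal rule.
    static double energyJoules(List<Sample> samples) {
        double energy = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            double seconds = Duration.between(samples.get(i - 1).time(), samples.get(i).time()).toMillis() / 1000.0;
            energy += 0.5 * (samples.get(i - 1).powerWatts() + samples.get(i).powerWatts()) * seconds;
        }
        return energy;
    }
}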

4.4.4 Layer 3 – Fabric Layer

The Fabric Layer is divided into two parts, namely above and below the network fabric line. This layer addresses the heterogeneous parallel devices and their management. The application admission, allocation and management of HPDs are performed through the orchestration of a number of components. Power consumption is monitored, estimated and optimized using translated application level metrics. These metrics are gathered via a monitoring infrastructure and a number of software probes. At runtime HPDs will be continually monitored to give continuous feedback to the Self-Adaptation Manager. This will ensure the TANGO architecture adapts to changes in the current environment and in the demand for energy. Optimizations take into account several approaches, e.g. redeployment to another HPD, dynamic power management policies considering heterogeneous execution platforms and application energy models.

4.4.4.1 Device Supervisor

This component provides the workload scheduling capabilities, resource selection and task placement of the different applications upon the heterogeneous parallel devices of the platforms used. This covers both inter-cluster (across different networks) and intra-cluster (dealing with individual components of the heterogeneous parallel devices) scheduling. Based on optimization criteria that can be given by both the Self-Adaptation Manager and the Monitoring Infrastructure, and in conjunction with the current state of the scheduling queue and the HPDs, it can make reconfigurations in order to decrease the energy consumption of the application or adapt the usage of resources to respect certain QoS goals such as performance or power capping.

4.4.4.1.1 Novelty beyond the State of the Art The novelty of the Device Supervisor is derived from the fact that it is responsible for the scheduling of workloads upon all types of resources, at both the macro level (such as multi-clusters) and the micro level (such as compute nodes, HPDs and memory).

4.4.4.1.2 Requirements This component includes features and functionality that fulfil the following technical requirements:

ID | Type | Short Name | Description | Part | Pri. | Task
R-DS-1 | FUN | DS heterogeneity | Support of heterogeneous allocations | BULL | MAN | 4.4
R-DS-2 | FUN | DS scheduling self-adaptation | Enable self-adaptation upon heterogeneous allocations | BULL | MAN | 4.5

Table 26: Device Supervisor Referenced Requirements

4.4.4.1.3 Internal Architecture The Device Supervisor, in conjunction with the Application Life-cycle Deployment Engine, provides an important part of the TANGO toolbox. Together they are responsible for the actual application deployment upon the platforms. They need information from different components in order to enable the submission of an application fulfilling its needs. The Device Supervisor is the direct communicator with the actual heterogeneous parallel devices that execute the application. It needs to get information from them concerning the application workflow and the different results. Certain details may be exchanged between the DS and the HPDs throughout the application execution to guarantee that the application is executed correctly. Based on input given by the Self-Adaptation Manager and the Monitoring Infrastructure, adaptations and reconfigurations may be triggered, and the DS is responsible for passing them to the HPDs.

4.4.4.1.4 Baseline Technologies

The following baseline technologies are used within this component:

Name   Description                                                                                                                              Version
Slurm  This is used for the workload scheduling, resource selection and task placement of applications upon the heterogeneous parallel devices.  16.05

Table 27: Device Supervisor Baseline Technology
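As a concrete illustration of the baseline, the minimal sketch below submits a job to Slurm with a heterogeneous (GPU) resource request; the batch script path, job name and resource values are assumptions for illustration only and do not describe the TANGO implementation.

    # Minimal sketch: submitting a job to Slurm with a GPU (GRES) request.
    # Batch script path, job name and resource values are illustrative assumptions.
    import subprocess

    batch_script = "/home/user/tango_app.sbatch"   # hypothetical batch script
    cmd = [
        "sbatch",
        "--job-name=tango-app",
        "--nodes=2",                # two compute nodes
        "--gres=gpu:1",             # one GPU per node, if the partition offers GPUs
        "--time=00:30:00",          # wall-clock limit
        batch_script,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # sbatch prints e.g. "Submitted batch job 12345"; keep the job id for later queries.
    job_id = result.stdout.strip().split()[-1]
    print("Submitted Slurm job", job_id)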

4.4.4.1.5 Component Diagram

Figure 47: Device Supervisor Component Diagram

The primary purpose of the device supervisor is to perform workload scheduling, resource selection and task placement upon different levels of resources. It performs this through its interactions with the following major components:

Application Lifecycle Deployment Engine: This provides the particular applications’ needs in terms of resources and execution characteristics before and during the application deployment.

Self-Adaptation Manager: This provides actuators that may be used to adapt an application deployed upon a platform in order to respect QoS goals.

Monitoring Infrastructure: The monitoring infrastructure is the primary source of information on the resource consumption and performance of the deployed application. The stored information about previous application executions can play an important role in the resource selection for future application deployments. Furthermore, it can be used dynamically, in conjunction with the Self-Adaptation Manager and the HPDs, to reconfigure the hardware and adapt the application.

Heterogeneous Parallel Devices: The HPDs are the actual hardware components responsible for the application execution. Based on their characteristics and state they can execute particular parts of applications. They should provide these details to the Device Supervisor so that it can decide upon the right combination of HPDs to use for each application. The DS gives them the details of the application tasks to execute, and they need to control the execution and report back the workflow and results.

4.4.4.1.6 Sequence Diagram

Figure 48: Device Supervisor Sequence Diagram

The Device Supervisor is the primary means of deciding when and where an application will be executed.

It is directly related to the application lifecycle deployment engine. When the application is submitted, information is needed on the state of the clusters and resources (HPDs). Further details may be collected from the monitoring infrastructure concerning previous deployments of the application. Processing these details, together with consideration of the other jobs in the queue, leads to decisions on the actual scheduling and resource selection for the application deployment.

Once the application is deployed, a decision for adaptation and reconfiguration can be made based on the results of monitoring and on internal calculations within the self-adaptation manager. If reconfigurations are needed, the adaptation is requested by the self-adaptation manager and the particular actions to be taken are directed by the device supervisor to the heterogeneous parallel devices.

4.4.4.1.7 Deployment Diagram

Figure 49: Device Supervisor Deployment Diagram

The device supervisor will work as a callable component of the Application Lifecycle Deployment Engine. The Device Supervisor will be deployed on a physical host and invoked by the Application Lifecycle Deployment Engine, which will be on the same host. It will then communicate as required with other components to ensure that the workload scheduling and dynamic adaptations take place.

4.4.4.1.8 External Interface

Method                       Input                            Output                              Description
Application Submission       Application                      Deployment ID; Outcome of Request   This launches the application upon the HPDs
Application Reconfiguration  Application; Type of Adaptation  None                                This allows the DS to provide hardware reconfigurations to adapt the application execution

Table 28: Device Supervisor Manager API
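A minimal, hypothetical sketch of this interface is given below; only the inputs and outputs follow Table 28, while the class, method and helper names are assumptions for illustration, not the actual TANGO implementation.

    # Minimal sketch of the Device Supervisor external interface from Table 28.
    # Class, method and helper names are hypothetical placeholders.
    class DeviceSupervisor:

        def submit_application(self, application):
            # Launches the application upon the HPDs; returns a deployment ID
            # together with the outcome of the request.
            deployment_id = self._schedule(application)
            return {"deployment_id": deployment_id, "outcome": "QUEUED"}

        def reconfigure_application(self, application, adaptation_type):
            # Applies a hardware reconfiguration (e.g. power capping or migration)
            # requested by the Self-Adaptation Manager; returns nothing.
            self._apply_adaptation(application, adaptation_type)

        def _schedule(self, application):
            return 42        # placeholder for the real scheduler interaction

        def _apply_adaptation(self, application, adaptation_type):
            pass             # placeholder for the real HPD reconfiguration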

4.4.4.2 Heterogeneous Parallel Device Clusters

4.4.4.2.1 HPC

BULL will provide a small-scale HPC platform as a testbed so that all TANGO partners can use it for experimentation, validation and performance evaluation purposes.

The platform is named nova2 and consists of 48 heterogeneous compute nodes interconnected with two different networks: Ethernet for administration-related tasks and InfiniBand, which is dedicated to the applications.

The compute nodes are based upon various types of BULL machines: blades (bullx B510, B515, etc.) and data-center servers (bullion) containing different versions of Intel x86 processors (such as the E5-2470 at 2.3 GHz or the E5-2670 at 2.6 GHz). Some of the nodes contain GPGPUs and other nodes contain Intel Xeon Phi accelerators. There are plans to include nodes with Intel KNL processors as well as nodes with ARM processors.

The power consumption of the different nodes is provided through the BMC of each node and can be passed to the different TANGO components through either in-band or out-of-band IPMI.
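For illustration, the following minimal sketch reads the instantaneous node power out-of-band with ipmitool; the BMC hostname and credentials are placeholders, and the exact output format can vary between BMC vendors.

    # Minimal sketch: reading node power out-of-band through the BMC with ipmitool.
    # BMC address and credentials are placeholders; 'dcmi power reading' is a
    # standard ipmitool sub-command, but output formats differ between vendors.
    import re
    import subprocess

    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", "node01-bmc.example.org",   # hypothetical BMC hostname
        "-U", "admin", "-P", "secret",    # placeholder credentials
        "dcmi", "power", "reading",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    match = re.search(r"Instantaneous power reading:\s+(\d+)\s+Watts", out)
    if match:
        print("Node power:", int(match.group(1)), "W")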

The nova2 platform makes use of SLURM for resource management and job scheduling, the Lustre parallel file system for the applications, and the NFS distributed file system for the user directories shared among nodes.

Access to the nova2 platform is done through an “ssh” connection to the login node of the platform. Each user will get his/her own account created on the platform after following the procedure provided by BULL. The usage of nodes has to go through a SLURM allocation. By default, users will have the possibility to use the particular software environment set up by the administrators on the platform, which may be limited depending on the type of experimentation.

Hence, one solution is to make reservations in advance for the experiments; through coordination with an administrator, a custom software environment will be prepared and deployed on the reserved nodes when the allocation starts. However, we argue that this method may not be practical for TANGO components that need root access to be installed and configured upon the nodes. Hence, other methods need to be defined to simplify the process.

Based on the specificities of TANGO, we will try to adapt the usage of the nova2 platform to the needs of the project. Since all partners will be interested in experimenting with different components and versions of the TANGO toolbox, ideally we should enable user-level environment deployment, through virtual machines or containers, dynamically on node allocation. This will enable users to test whichever version of the TANGO components they prefer, even if root access is needed for installation. Furthermore, it will allow them to make changes and replay experiments through a simplified procedure. Currently this is not supported but, based on ongoing studies in BULL, we hope that it soon will be, so that we can take advantage of the flexibility and simplicity of this approach.
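A minimal sketch of the intended usage pattern is shown below: a Slurm allocation is requested and a user-provided environment setup script is launched on the allocated node. The script name is a hypothetical placeholder for whichever user-level environment (e.g. a container image or prebuilt toolchain) a partner wants to test; nova2 support for this workflow is, as noted above, still under study.

    # Minimal sketch: obtaining a Slurm allocation and launching a user-provided
    # environment setup script on the allocated node.  The script name is a
    # hypothetical placeholder, not part of the nova2 platform.
    import subprocess

    cmd = [
        "salloc", "--nodes=1", "--time=01:00:00",   # one node for one hour
        "srun", "bash", "setup_tango_env.sh",       # runs inside the allocation
    ]
    subprocess.run(cmd, check=True)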

4.4.4.2.2 IoT

DELTATEC presents two use-cases to the project: the first is an embedded use-case and the second is a distributed one. Each has its own hardware requirements, which are discussed next. For each use-case, the hardware, operating system, access and security aspects of the platform are discussed.

4.4.4.2.2.1.1 Embedded use-case

Platform Hardware

The computing platform provides a typical embedded architecture based on a Zynq (FPGA + CPU) and a quad-core ARM Cortex-A9 (i.MX6) with a GPU.

Figure 50: Overview of the Embedded Use-Case platform

Efficient communication channels are implemented between the Zynq and the multi-core CPU:

- A Gigabit Ethernet network connection
- A PCIe 2.0 (1-lane) link providing 5 Gbit/s (full duplex)

It is not expected that this target platform will be available by the end of year 1.

During year 1, development will take place on development boards that are procured from the computing device manufacturers or from their partners.

Therefore, the intermediate platform will be composed of two boards: one with the i.MX6 processor and one with the Zynq SoC.

These development boards will be linked by an Ethernet network and sensors will be emulated using pictures transferred over the network.

This intermediate platform should enable development in a software environment that is very close to the final software environment, using the same devices and the same operating systems. The main differences will be:

The PCIe link between the processors will not be available. Communication between the processors must be performed through the Ethernet link (a minimal sketch of such a transfer follows this list).

The input picture feed takes place through the Ethernet network instead of the Zigbee link. This point should have no impact on TANGO.

The power monitoring capabilities may be limited.
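As a minimal illustration of the first difference above (pictures pushed over the Ethernet link instead of the PCIe link and the Zigbee sensor feed), the following sketch sends a length-prefixed image to the other board over a TCP connection; the address, port and file name are illustrative assumptions.

    # Minimal sketch: emulating the sensor feed by pushing a picture to the
    # other board over the Ethernet link.  Host address, port and file name
    # are illustrative assumptions.
    import socket
    import struct

    def send_picture(path, host="192.168.1.20", port=5000):
        with open(path, "rb") as f:
            payload = f.read()
        with socket.create_connection((host, port)) as sock:
            # Prefix the image with its length so the receiver knows how much to read.
            sock.sendall(struct.pack("!I", len(payload)) + payload)

    send_picture("frame_0001.png")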

Platform Operating System

The two processors will run under a Linux 4.0 kernel, configured through a Yocto distribution.

Please note that the Yocto distribution has been selected in order to have the same operating system on the two processing devices. Other distributions are suitable for each of them, so this choice may be reconsidered. The final selection of the Linux kernel and distribution is to be agreed by the partners, taking the development constraints into account.

Access

As the platforms run on a Linux distribution, the partners may need these platforms mainly for final integration tasks. Moreover, we consider that working remotely on an embedded platform is not productive.

Consequently, we foresee having more than one intermediate platform, and being able to lend them to the partners for certain periods.

Security

We do not foresee addressing security issues, which are not relevant in this use-case.

4.4.4.2.2.1.2 Remote Processing Use-Case

Platform Hardware

The platform would be composed of:

End-user laptops (or desktop computers), executing the main application software. These laptops will be standard-grade x86 machines; they will be based on an INTEL Core i7 processor and will include an NVIDIA GPU with CUDA capabilities.

A dual multi-core INTEL XEON server, equipped with two NVIDIA GPU boards with CUDA capabilities (GeForce GTX-970 or better), providing a computational service.

These machines will be linked by an IP-based network, which can be an internet connection.

Platform Operating System and TANGO Requirements

The laptops will run under the Windows 10 Operating System.

The INTEL XEON server workstation will run either under the Windows 10 Professional operating system or an Ubuntu LTS distribution; the choice is left to the partners.

While the full TANGO functionality is expected on the INTEL XEON workstation, the TANGO functionality on the laptops may be reduced.

On the laptops, we expect to be able to build the application inside the TANGO framework using its programming model, but we expect neither job scheduling nor energy management to be relevant, because other software that cannot be supported by TANGO will be running at the same time.

Whether the TANGO framework, or a part of it, is ported to Windows is to be agreed by the partners. If not, TANGO will be restricted to the server side. Please note that Windows support would significantly increase the impact of the TANGO project with regard to DELTATEC's markets.

Access

The laptops are common computers. We do not foresee organising access to such computers.

On the server side, the platform may be required by some partners for short periods during the framework integration phases. We foresee being able to lend one machine for some short periods.

If useful, we can consider creating remote access to such a machine using remote control software like TeamViewer. We must still check what is practically feasible and see whether such a connection is suitable for the development tasks. Nevertheless, partners must be aware that such a connection will not be secured.

Security

We do not foresee security as presenting major issues within this use-case.

4.4.4.3 Device Emulator

This component provides out-of-band application deployment and operation on emulated HPD resources for the purpose of training application power profiles. Emulated HPD resources execute application code while KPIs are monitored. The output of this process calibrates metrics within a power model that is created as a power profile within the Self-Adaptation Manager. The results from the device emulator are normalised performance metrics discovered when running (either via simulation or emulation) an application on a specific type or combination of HPDs. Emulation of a range of HPDs is realised through a generic backend driver that interfaces with hardware emulators such as QEMU, the OpenCL Emulator (ocl-emu) or vendor-specific ASIC (FPGA) emulators. The device emulator can also be re-purposed to provide development-time debugging capabilities.
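As a minimal illustration of how an emulated HPD resource could be observed, the sketch below uses the libvirt Python bindings (libvirt drives QEMU, both baseline technologies of this component) to sample a simple KPI from a hypothetical, pre-defined emulated guest; a real profiling run would also deploy the application and collect many more metrics.

    # Minimal sketch: sampling a KPI from an emulated guest via the libvirt
    # Python bindings.  The domain name is a hypothetical, pre-defined QEMU guest.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("tango-emulated-arm")          # hypothetical emulated device
    state, max_mem, mem, vcpus, cpu_time_ns = dom.info()   # basic guest statistics
    print("Guest CPU time consumed so far: %.2f s" % (cpu_time_ns / 1e9))
    conn.close()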

4.4.4.3.1 Novelty beyond the State of the Art

The scientific novelty beyond the state of the art within this component lies in its core functionality of providing performance metrics that form device profiles for applications executed on a range of different HPDs. This enables the use of scientific reasoning when selecting an appropriate device on which to deploy a given application.

4.4.4.3.2 Requirements

This component includes features and functionality that fulfil the following technical requirements:

ID      Type  Short Name                     Description                                                                                                                                                                                          Part  Pri.  Task
R-DE-1  FUN   Execution Environment Support  Support execution environments of the Runtime Abstraction Layer within the Programming Model Runtime for various HPDs (SMP, GPGPU, FPGA).                                                            BULL  MAN   3.2
R-DE-2  FUN   Simulation and Modelling       Simulation and modelling functionality with the aim of reducing the number of HPD alternatives to be considered for deployment.                                                                      BULL  MAN   3.3
R-DE-3  FUN   CPU Device Emulation           Device emulation for decision making on which HPD is ideal for application deployment, limited to QEMU in Y1: X86, ARM & POWER.                                                                      BULL  MAN   3.3
R-DE-4  FUN   Enhanced Device Emulation      Device emulation for decision making on which HPD is ideal for application deployment: X86, ARM, POWER, GPGPUs & FPGAs.                                                                              BULL  MAN   4.3
R-DE-5  FUN   Metric Data Store              A data store for normalised (against real devices, accounting for emulation overhead) performance metrics (cost, energy, util. etc.) on a per-HPD basis for a given application, in a format useful for self-adaptation and energy modelling.  BULL  MAN   3.3

Table 29: Device Emulator Referenced Requirements

4.4.4.3.3 Internal Architecture

The Device Emulator provides an environment in which the runtime abstraction layer of a programming model can evaluate a range of HPDs for a given application. This might involve the simulation and modelling of an HPD for the application in question, with the aim of limiting the number of suitable devices during the process of resource selection. Additionally, after the selection of a range of suitable HPDs, normalised device emulation results can be used to gain further insight into which devices provide the most benefit, be it performance, energy, cost or some balanced trade-off. This is achieved through the gathering of metrics during the execution of an application on an emulated or simulated device, which can then be used later by the Programming Model or by other components such as the Application Life-Cycle Engine and the Self-Adaptation Manager.
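As an illustration of the metric data store referred to by R-DE-5, the following sketch persists one normalised metric for an application/device pair in the MySQL baseline technology; the schema, credentials and values are illustrative assumptions rather than a defined TANGO schema.

    # Minimal sketch of the metric data store (R-DE-5): persisting normalised
    # per-HPD metrics for an application run.  Table, columns, credentials and
    # values are illustrative assumptions.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="tango",
                                   password="secret", database="device_emulator")
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS emulation_metrics (
            application VARCHAR(64),
            device      VARCHAR(64),
            metric      VARCHAR(64),
            value       DOUBLE,
            normalised  BOOLEAN
        )""")
    cur.execute(
        "INSERT INTO emulation_metrics VALUES (%s, %s, %s, %s, %s)",
        ("matrix-mul", "x86_qemu", "energy_joules", 182.5, True),
    )
    conn.commit()
    cur.close()
    conn.close()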

4.4.4.3.4 Baseline Technologies

The following baseline technologies are used within this component in year 1:

Name Description Version

MySQL [168] MySQL – Database Server 5.7.12+

QEMU [169] QEMU - Device Emulator 2.5.1+

Libvirt [170] Libvirt: The virtualization API 1.3.3+

Table 30: Device Emulator Baseline Technology

4.4.4.3.5 Component Diagram

The following component diagram illustrates the interaction of the Device Emulator with other components within the TANGO architecture.

[Figure 51 shows the Device Emulator (DE) service exposing a REST API and driving the Qemu executable through the Libvirtd service via system calls; the Programming Model plug-in, the Life-Cycle Engine service, the Self-Adaptation Manager service and the R & D Modelling plug-in interact with it through Java APIs.]

Figure 51: Device Emulator Component Diagram

4.4.4.3.6 Sequence Diagram

The following sequence diagram shows the API-level interactions of the following components:

1) Requirements and Design Modelling Plug-in
2) Programming Model
3) Life-Cycle Engine
4) Self-Adaptation Manager

[Figure 52 shows the R & D Modelling and Programming Model components (IDE) looping over model(application, device) calls, the Life-Cycle Engine calling execute(application, device), and the Self-Adaptation Manager calling getHistoricData() / findResults(application, device) on the Device Emulator middleware; internally the Device Emulator runs runSimulationAndStore() or runEmulationAndStore() and returns results[metrics] to each caller.]

Figure 52: Device Emulator Sequence Diagram

Three core API methods are shown: one that enables the modelling of an application on a specific device via simulation, one that provides an execution environment through software device emulation, and finally one that fetches historic data from previous simulation and emulation runs.

4.4.4.3.7 Deployment Diagram

As with the other components within the TANGO architecture, the device emulator and its baseline technologies will be deployed on a host machine running the TANGO middleware:

[Figure 53 shows the Device Emulator service deployed in a Tomcat container on a Linux TangoMiddleware physical machine, alongside the Qemu executable, the Libvirtd service, the Life-Cycle Engine service and the Self-Adaptation Manager service, connected over TCP to a Linux DataBaseServer physical machine with an attached storage device.]

Figure 53: Device Emulator Deployment Diagram

4.4.4.3.8 External Interface

The following table summarises the methods as illustrated in the component’s sequence diagram:

Method: model(Object, Object) : ArrayList
Input: application, an object defining the characteristics of the application to be simulated; device, an object defining the characteristics of the device
Output: results, an array defining the results of the simulation. This contains a number of relevant metrics and the suitability of a range of devices for the application in question.
Description: Method to model an application on a specific type of HPD. Uses simulation to estimate the performance and cost of the application / device pair.

Method: execute(Object, Object) : ArrayList
Input: application, an object containing the location and details of the application to be executed in an emulated environment; device, an object defining the emulated device
Output: results, an array defining the results of the emulation. This contains a number of relevant metrics.
Description: Method to deploy and execute an application on a specific type of HPD. The application is executed in an emulated device with results normalised to those of an equivalent real HPD.

Method: findResults(Object, Object) : ArrayList
Input: application, an object defining the characteristics of the application to be simulated; device, an object defining the characteristics of the device
Output: results, an array defining the past results of either a simulation or an emulation run. This contains a number of relevant performance, energy and cost metrics.
Description: Method to find results of previous invocations of the Device Emulator.

Table 31: Device Emulator API
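For illustration, a minimal usage sketch of the three calls summarised in Table 31 follows; the client class and the application/device descriptors are hypothetical placeholders, and only the method names and their input/output roles are taken from the table.

    # Minimal usage sketch of the Table 31 calls.  'DeviceEmulatorClient' and
    # the descriptor fields are hypothetical placeholders with stubbed results.
    class DeviceEmulatorClient:
        def model(self, application, device):        # simulate: estimated metrics
            return [{"metric": "estimated_energy_joules", "value": 0.0}]
        def execute(self, application, device):      # emulate: normalised metrics
            return [{"metric": "energy_joules", "value": 0.0}]
        def findResults(self, application, device):  # fetch stored past results
            return []

    app = {"name": "video-pipeline", "binary": "/opt/apps/pipeline"}   # illustrative
    dev = {"type": "GPGPU", "model": "generic"}                        # illustrative

    de = DeviceEmulatorClient()
    estimates = de.model(app, dev)
    measured = de.execute(app, dev)
    history = de.findResults(app, dev)
    print(estimates, measured, history)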

4.5 The Tango Architecture Workflow

The workflow of the TANGO architecture is discussed in further detail in this section, in particular the interactions of its components in the context of power consumption and the ramifications this has on application design, deployment and adaptation. The discussion is split into service deployment and service operation.

4.5.1 Service Deployment

Figure 54: Architecture support for training application power profiles and deployment

In supporting the standard application deployment model (construct, deploy, run, monitor, adapt), the TANGO architecture (shown in Figure 54) supports application profiling at design time and the deployment of applications on heterogeneous parallel devices. This provides the means to train and assess an application for a specific parallel architecture. In doing so, the Energy Modeller is able to provide better power consumption predictions once the application is deployed, resulting in energy savings.
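As a concrete illustration of what such training can produce, the sketch below fits a simple linear power model from monitored utilisation and power samples; the sample values are invented for illustration, and the actual TANGO power models are expected to be considerably richer.

    # Minimal sketch of "training" a power profile: fit a linear model of power
    # against device utilisation from monitored samples; such a model is the
    # kind of input an energy modeller could use for predictions.
    import numpy as np

    utilisation = np.array([0.10, 0.25, 0.50, 0.75, 0.90])   # fraction of HPD busy
    power_watts = np.array([41.0, 55.0, 78.0, 101.0, 115.0]) # measured node power

    slope, idle_power = np.polyfit(utilisation, power_watts, 1)
    predict = lambda u: idle_power + slope * u
    print("Predicted power at 60%% utilisation: %.1f W" % predict(0.60))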

4.5.2 Service Operation

In addition to support for application design and deployment, the proposed architecture provides capabilities to perform continuous autonomic self-adaptation at runtime, as shown in Figure 55. This leverages fine-grained monitored metrics of heterogeneous parallel devices and application software to create an adaptation plan supporting the performance and cost goals of an application. It is achieved through advances in modelling and prototyping that enable power, cost and performance awareness during operation, through emulation and simulation under various “what-if” scenarios.

Figure 55: Architecture support for self-adaptation at runtime
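To make the self-adaptation idea concrete, the following minimal sketch shows a power-cap check that requests an adaptation when the monitored power exceeds the cap; the helper functions are hypothetical stand-ins for the Monitoring Infrastructure and Self-Adaptation Manager interfaces.

    # Minimal sketch of a runtime self-adaptation loop.  All helper functions
    # are hypothetical placeholders, not TANGO APIs.
    import time

    POWER_CAP_W = 250.0          # illustrative power cap for the application

    def read_monitored_power(deployment_id):
        return 180.0             # placeholder: would query the Monitoring Infrastructure

    def request_adaptation(deployment_id, reason):
        print("adaptation requested for", deployment_id, "because", reason)

    def adaptation_loop(deployment_id, iterations=3, period_s=30):
        for _ in range(iterations):
            if read_monitored_power(deployment_id) > POWER_CAP_W:
                # Possible actuators: redeploy to another HPD, lower the CPU
                # frequency, or restructure the workflow task graph (Section 4.6.3).
                request_adaptation(deployment_id, reason="power cap exceeded")
            time.sleep(period_s)

    adaptation_loop("deployment-42", iterations=1, period_s=0)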

The following section discusses the future steps needed to implement the TANGO architecture in Year 1. It comprises the critical path through the components, for identifying component dependencies, and a work plan outlining provisional deadlines for the delivery of the implemented external interfaces of each component.

4.6 Critical Path

Identifying the critical path of execution through the architecture identifies component dependencies and thus the order in which components must be delivered. This should aid in mitigating the risk of one component blocking the development of others due to dependencies on incomplete features or functionality. The following subsections describe the dependencies of the components as outlined in Figure 56.

4.6.1 Construction

The entry point into the system is the IDE within the top layer. This is done through two plugins, namely the Requirements and Design plugin and the Programming Model plugin.

The programming model plugin makes use of the programming model runtime library and the Application Descriptor tool. This builds up a profile of the application that is to be deployed.

4.6.2 Deployment

During the deployment stage, the Application Life-cycle Deployment Engine initiates a plan for how to deploy the application, based upon 1) energy constraints/goals that indicate the minimum energy efficiency that is required/desired for the deployment and operation of the application, and 2) application performance constraints that indicate the minimum requirements in terms of performance for the application (time-criticality, data location, cost, etc.).

The different application needs and criteria will be selected through the interface provided by the scheduler/workload manager, e.g. SLURM. The latter will perform automatic workload execution upon the heterogeneous platform, in addition to managing data (stage-in, stage-out), by applying efficient scheduling techniques between jobs (fair sharing, backfilling, pre-emption, etc.) and by selecting the best-suited resources for each job (based on resource characteristics, network topology, internal node topology, power management, etc.). Moreover, this component's role is also to optimize the life cycle of an application to ensure its constraints are fulfilled, considering: 1) the status of the heterogeneous parallel devices in terms of power consumption and workload; 2) the description of the cluster in terms of platform type, hardware specification and power consumption profile; and 3) the profile of the application in terms of how it stresses each of the devices (CPU, memory, network, etc.). Using SLURM's support for heterogeneous resources, the accounting and profiling of each heterogeneous resource will take place for all jobs.
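As an illustration of points 1) and 2) above, the following sketch shows one possible, purely hypothetical way of expressing such energy and performance constraints when a deployment is requested; the field names are not a defined TANGO schema.

    # Minimal sketch of a deployment request carrying energy and performance
    # constraints.  All field names and values are illustrative assumptions.
    deployment_request = {
        "application": "render-farm-job",
        "energy": {"max_joules": 5.0e6, "preferred_power_cap_w": 300},
        "performance": {"deadline_s": 3600, "min_gflops": 150},
        "data": {"input_location": "lustre:///projects/tango/in"},
        "cost": {"max_eur": 12.0},
    }
    print(deployment_request)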

[Figure 56 depicts the critical path and dependencies between the R & D Modelling Plugin, Programming Model, Code Optimizer, Runtime Abstraction Layer, Monitoring Infrastructure, Application Life-Cycle Deployment Engine, Self-Adaptation Manager, Energy Modeller, Device Supervisor, Device Emulator, Heterogeneous Parallel Device Cluster and Smart Devices components.]

Figure 56: TANGO Architecture Critical Path

4.6.3 Operation

During the operation phase, the Application Life-cycle Deployment Engine and the Monitoring Infrastructure continually gather performance data regarding the deployed applications.

The Self-Adaptation Manager provides key functionality to manage the entire adaptation strategy applied to applications and Heterogeneous Parallel Devices (HPDs). This entails the dynamic optimisation of energy efficiency, time-criticality, data movement and cost-effectiveness through continuous feedback to other components within the architecture and a set of architecture-specific actuators that enable environmental change. Examples of such actuators could be: redeployment to another HPD, restructuring a workflow task graph or dynamic recompilation. Furthermore, the component provides functionality to guide the deployment of an application to a specific HPD through predictive energy modelling capabilities and policies, defined within a decision support engine, which specify cost constraints via Business Level Objectives (BLOs).

4.6.4 Interface Work Plan

Given the previous discussion of the component critical path, the following subsection presents a high-level work plan for completing the implementation of the external interfaces, the "wiring" of baseline technology and any "dummy" code needed to enable the deployment of an application following the cloud lifecycle. This should help prevent blocking issues during component development, mitigating the risk of failure to deliver the architecture.

ID  Task Name                                  Start        Finish       Duration
1   Work Package 3                             01/03/2016   30/12/2016   219d
2   Code Optimizer                             02/05/2016   10/05/2016   7d
3   R & D Modelling Plug-in                    02/05/2016   10/05/2016   7d
4   Programming Model Plug-in                  02/05/2016   10/05/2016   7d
5   Runtime Abstraction Layer                  11/05/2016   19/05/2016   7d
6   Application Life-Cycle Deployment Engine   20/05/2016   30/05/2016   7d
7   Self-Adaptation Manager                    09/06/2016   17/06/2016   7d
8   Energy Modeller                            31/05/2016   08/06/2016   7d
9   Heterogeneous Parallel Device Cluster      09/06/2016   17/06/2016   7d
10  Device Supervisor                          31/05/2016   08/06/2016   7d
11  Device Emulator                            20/06/2016   28/06/2016   7d
12  Smart Devices                              09/06/2016   17/06/2016   7d
13  Monitoring Infrastructure                  29/06/2016   07/07/2016   7d
14  Continuous Integration                     08/07/2016   22/12/2016   120d

Figure 57: TANGO Architecture Interface Work Plan

Part 5. Conclusions

This deliverable D2.1 has presented the work carried out by the TANGO consortium as part of WP2 – Requirements, Architecture and V&V Approach during the period M1-M4 of the project. In this first year, TANGO concentrates on delivering energy awareness in all components for Heterogeneous Parallel Architectures in a cross-layer programming flow. Monitoring and benchmark information is measured at the hardware level and propagated through the various layers of the TANGO Toolbox. This iteration will produce, validate and demonstrate the TANGO Toolbox Alpha version. The first part of the deliverable, the Market Analysis, can be considered a starting point for the exploitation of the project, as it provides a general overview of the current market situation. Taking into account that the ICT market, and especially the field of Heterogeneous Parallel Architectures / low power computing, is continuously evolving, the results of this investigation will be constantly updated. This will be done within WP7 and the corresponding Exploitation documents.

The second part of the deliverable has presented a thorough state of the research on various topics, such as architectural support for low power computing and programming models/run-time management techniques for Heterogeneous Parallel Architectures. This also included a review of past and current related research projects such as POLCA, EUROSERVER, P-SOCRATES, FiPS, ADEPT, Mont-Blanc and EXCESS. The third part of the deliverable has focused on the specification of the TANGO architecture. This includes the architectural roles, scope and interfaces of the components, as well as communication patterns. The commonalities between the envisioned use cases have been a topic of significant attention, as has the interaction between the business goals analysis, the technical requirements elicitation and the architecture definition. An initial view of the TANGO quality model and architecture has been presented. This architecture comprises standard IDE, middleware and infrastructure layers and supports components such as the Programming Model, the Application Lifecycle Deployment Engine, the VM manager and the Heterogeneous Parallel Device Cluster. The design of the architectural components was described in detail; some of them will require specific extensions in order to be able to deal with energy efficiency/low power management. In addition, the architecture also requires specific components to be developed from scratch, such as the Energy Modeller and the Device Emulator. The rationale and functionalities of all those components were also explained in this document.

The take-home message in year 1 of TANGO with regard to the architecture is that energy efficiency will be addressed at all layers of the software stack and during the complete lifecycle of the application. This will be showcased via two architecture deployment illustrations: the DELTATEC and BULL use cases.

In the second year, TANGO will augment the alpha-version software modelling tools, code optimisation and programming model by adding the capabilities required to enable run-time self-adaptation. This iteration will produce, validate and demonstrate the TANGO Toolbox Beta version.

References

[1] ADEPT Technologies, "ADEPT - Address Energy in Parallel Technologies," 2015. [Online]. Available: http://www.adept-project.eu. [Accessed: 02/04/2015].

[2] C. Silvano, W. Fornaciari, S. C. Reghizzi, G. Agosta, G. Palermo, V. Zaccaria, P. Bellasi, F. Castro, S. Cobertta, E. Speziale, D. Melpignano, J. M. Zins, D. Siorpaes, H. Hubert, B. Stabernack, J. Brandenburg, M. Palkovic, P. Raghavan, C. Ykman-Couvreur, A. Bartzas, D. Soudris, T. Kempf, G. Ascheid, H. Meyr, J. Ansari, P. Mahonen and B. Vanthournout, "Parallel Paradigms and Run-time Management Techniques for Many-core Architectures: The 2PARMA Approach," in Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop, New York, 2012.

[3] PEPPHER Project, "PEPPHER Project - Programmability & Portability," [Online]. Available: http://www.peppher.eu. [Accessed: 14/03/2014].

[4] S. Benkner, S. Pllana, J. Traff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney and V. Osipov, "PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems," IEEE Micro, pp. 28-41, 21/07/2011.

[5] EXCESS Consortium, "EXCESS - Execution Models for Energy-Efficient Computing Systems," 2015. [Online]. Available: http://excess-project.eu. [Accessed: 02/04/2015].

[6] D. Bortolotti, C. Pinto, A. Marongiu, M. Ruggiero and L. Benini, "VirtualSoC: A Full-System Simulation Environment for Massively Parallel Heterogeneous System-on-Chip," in IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW), Cambridge, MA (USA), 2013.

[7] FiPS Consortium, "FiPS Project - Developing Hardware and Design Methodologies for Heterogeneous Low Power Field Programmable Servers," [Online]. [Accessed: 02/04/2015].

[8] L. M. Pinho, E. Quinones, M. Bertogna, A. Marongiu, J. Pereira Carlos, C. Scordino and M. Ramponi, "P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems," in 17th Euromicro Conference on Digital Systems Design (DSD), 2014.

[9] HARPA Consortium, "Harnessing Performance Variability," 2015. [Online]. [Accessed: 02/04/2015].

[10] The Barbeque Open Source Project, "The Barbeque Open Source Project - A highly modular and extensible run-time resource manager," 2014. [Online]. [Accessed: 02/04/2015].

[11] ALMA Consortium, "The ALMA project," 2014. [Online]. [Accessed: 18/03/2014].

[12] Emmtrix Technologies, "Emmtrix Solutions for Leveraging Embedded Multicore Computing," [Online]. Available: http://www.emmtrix.com. [Accessed: 16/02/2016].

[13] I. El-Helw, R. Hofman and H. E. Bal, "Glasswing: Accelerating Mapreduce on Multi-core and Many-core Clusters," in 23rd International Symposium on High-performance Parallel and Distributed Computing, 2014.

[14] K. Kanoun, M. Ruggiero, D. Atienza and V. D. Schaar, "Low Power and Scalable Many-Core Architecture for Big-Data Stream Computing," 2014.

[15] EuroServer Consortium, "EUROSERVER - Green Computing Node for European micro-servers," 16/05/2016. [Online]. Available: http://www.euroserver-project.eu/.

[16] Khronos OpenCL Working Group, "The OpenCL Specification, version 2.0," 2014.

[17] HSA Foundation, "Heterogeneous System Architecture," 2015. [Online]. Available: http://www.hsafoundation.com/. [Accessed: 22/02/2016].

[18] Altera, "OpenCL to FPGA Tutorial at 2015 International Symposium on Code Generation and Optimization," 2015. [Online]. Available: http://cgo.org/cgo2015/event/altera-compiling-opencl-to-a-streaming-dataflow-architecture-on-fpgas/. [Accessed: 22/03/2016].

[19] S. van Beek and S. Sharma, "Best Practices for FPGA Prototyping of MATLAB and Simulink Algorithms," 11/08/2011. [Online]. Available: http://www.eejournal.com/archives/articles/20110825-mathworks/. [Accessed: 22/03/2016].

[20] MathWorks, "Prototyping Algorithms and Testing CUDA Kernels in MATLAB," [Online]. Available: http://es.mathworks.com/company/newsletters/articles/prototyping-algorithms-and-testing-cuda-kernels-in-matlab.html?requestedDomain=nl.mathworks.com#. [Accessed: 22/03/2016].

[21] L. B. Bosi, M. Mariotti and A. Santocchia, "GPU Linear algebra extensions for GNU/Octave," in ACAT, Journal of Physics: Conference Series 368, 2012.

[22] A. D. a. S., 2014. [Online]. Available: http://www.scilab.org/community/news/20141110. [Accessed: 22/03/2016].

[23] O. T. f. MatLab. [Online]. Available: https://code.google.com/p/opencl-toolbox/. [Accessed: 22/03/2016].

[24] A. Dardenne, A. van Lamsweerde and S. Fickas, "Goal-directed Requirements Acquisition," Science of Computer Programming, vol. 20, pp. 3-50, 1993.

[25] E. S. K. Yu, "Towards Modelling and Reasoning Support for Early-Phase Requirement Engineering," in IEEE Int. Symp. Requirements Eng., 1997.

[26] D. Amyot, "Introduction to the User Requirements Notation: learning by example," Computer Networks, vol. 42, no. 3, pp. 285-301, 2003.

[27] L. Rapanotti, J. G. Hall, M. Jackson and B. Nuseibeh, "Architecture-driven Problem Decomposition," IEEE Computer Society, pp. 80-89, 2004.

[28] M. Pelcat, K. Desnos, J. Heulot, C. Guy, J. Nezan and S. Aridhi, "Dataflow-Based Rapid Prototyping for Multicore DSP Systems - Technical Report PREESM/2014-05TR01," 2014.

[29] M. Pelcat, K. Desnos, J. Heulot, C. Guy, J.-F. Nezan and S. Aridhi, "PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming," in 6th European Embedded Design in Education and Research Conference (EDERC), Milano, 2014.

[30] COMPA Project. [Online]. Available: http://www.compa-project.org. [Accessed: 22/03/2016].

[31] POLCA Consortium, "Programming Large Scale Heterogeneous Architectures," 2015. [Online]. Available: http://cluster013.ovh.net/~polcapro/. [Accessed: 22/03/2016].

[32] M. Torquati (Parallel computing group, U. o. P. I.) and M. Aldinucci (Parallel computing group, U. o. T. I.), "FastFlow parallel programming framework website," 2015. [Online]. Available: http://calvados.di.unipi.it/fastflow.

[33] A. Cesta, A. Oddi and S. F. Smith, "Iterative flattening: A scalable method for solving multi-capacity scheduling problems," pp. 742-747, 2000.

[34] L. M. and P. V. Hentenryck, "Iterative Relaxations for Iterative Flattening in Cumulative Scheduling," in Proceedings of the 14th International Conference on Automated Planning & Scheduling, 2004.

[35] A. C. A. P. N. & S. S. F. Oddi, "Iterative flattening search for resource constrained scheduling," Journal of Intelligent Manufacturing, 21(1), pp. 17-30, 2010.

[36] M. M. S.-O. R. Jansen K., "Approximation Algorithms for Flexible Job Shop Problems," in Proceedings of Latin American Theoretical Informatics (LATIN'2000), 1999.

[37] L. & V. H. P. Mercier, "Edge finding for cumulative scheduling," INFORMS Journal on Computing, 20(1), pp. 143-153, 2008.

[38] "ROADEF/EURO Challenge 2012: Machine Reassignment," 2012. [Online]. Available: http://challenge.roadef.org/2012/en/.

[39] R. C. G. P. M. P. D. R. L. T. A. D. Baraglia, "Backfilling Strategies for Scheduling Streams of Jobs On Computational Farms," in Proceedings of the CoreGRID Workshop on Programming Models Grid and P2P System Architecture Grid Systems, Tools and Environments, Heraklion, Crete, Greece, 2007.

[40] S. K. R. S. V. & S. P. Srinivasan, "Characterization of backfilling strategies for parallel job scheduling," in Proceedings Workshops International Conference on Parallel Processing, IEEE, 2002.

[41] C. A. P. M. G. P. G. S. C. & Z. V. Ykman-Couvreur, "Linking run-time resource management of embedded multi-core platforms with automated design-time exploration," Computers & Digital Techniques, IET, 5(2), pp. 123-135, 2011.

[42] E. P. D. W. S. L. P. I. C. P. R. S. T. D. P. S. S. S. C. Cameron Browne, "A Survey of Monte Carlo Tree Search Methods," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, 2012.

[43] P. V. H. and R. Bent, Online Stochastic Combinatorial Optimization, The MIT Press, 2006.

[44] UPC Consortium, "Unified Parallel C Language Specification, v1.3," 2013. [Online]. Available: https://upc-lang.org/assets/Uploads/spec/upc-lang-spec-1.3.pdf. [Accessed: 23/02/2016].

[45] Intel, "Cilk+ Open Source Project," [Online]. Available: https://www.cilkplus.org/. [Accessed: 23/02/2016].

[46] OpenMP ARB, "OpenMP Application Program Interface, v. 4.5," November 2015. [Online]. Available: http://www.openmp.org/mp-documents/openmp-4.5.pdf. [Accessed: 23/02/2016].

[47] MPI Forum, "Message Passing Interface Standard, v3.1," July 2015. [Online]. Available: http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf. [Accessed: 23/02/2016].

[48] "Partition Global Address Space website," [Online]. Available: http://www.pgas.org. [Accessed: 23/02/2016].

[49] NVIDIA, "CUDA Toolkit," [Online]. Available: https://developer.nvidia.com/cuda-toolki. [Accessed: 23/02/2016].

[50] "Xilinx SDSoC Development Environment," [Online]. Available: http://www.xilinx.com/products/design-tools/software-zone/sdsoc.html. [Accessed: 23/02/2016].

[51] Khronos OpenCL Working Group, "The OpenCL Specification, version 2.0," 2014.

[52] "OpenACC website," [Online]. Available: http://openacc-standard.org. [Accessed: 23/02/2016].

[53] Eduard Ayguade, Rosa M. Badia, Pieter Bellens, Daniel Cabrera, Alejandro Duran, Marc Gonzalez, Francisco Igual, Daniel Jimenez-Gonzalez, Jesus Labarta, Luis Martinell, Xavier Martorell, Rafael Mayo, Jose M. Perez, Judit Planas and Enrique S. Quintana, "Extending OpenMP to Survive the Heterogeneous Multi-core Era," Journal of Parallel Programming, vol. 38, pp. 440-459, 2010.

[54] Schubert, Lutz, Jan Kuper and José Gracia, "POLCA – A Programming Model for Large Scale, Strongly Heterogeneous Infrastructures," Parallel Computing: Accelerating Computational Science and Engineering (CSE), vol. 25, p. 43, 2014.

[55] C. Augonnet, S. Thibault, R. Namyst and P.-A. Wacrenier, "StarPU: a unified platform for task scheduling on heterogeneous multicore architectures," Concurrency and Computation: Practice and Experience, vol. 23, no. 2, pp. 187-198, 2011.

[56] C. Augonnet, J. Clet-Ortega, S. Thibault and R. Namyst, "Data-Aware Task Scheduling on Multi-accelerator Based Platforms," in IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS), 2010.

[57] J. Enmyren and C. W. Kessler, "SkePU: A Multi-backend Skeleton Programming Library for multi-GPU Systems," in The Fourth International Workshop on High-level Parallel Programming and Applications, New York, 2010.

[58] M. Aldinucci, M. Danelutto, P. Kilpatrick and M. Torquati, "FastFlow: high-level and efficient streaming on multi-core," in Programming Multi-core and Many-core Computing Systems, Wiley, 2014.

[59] Nvidia Corporation, "Thrust: C++ Template Library for CUDA," 2015.

[60] S. Benkner, S. Pllana, J. L. Traff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney and V. Osipov, "PEPPHER: Efficient and Productive Usage of Hybrid," Computing Systems, vol. 31, pp. 28-41, 2011.

[61] C. Kessler and W. Lowe, "Optimized composition of performance-aware parallel components," Concurrency and Computation: Practice and Experience, vol. 24, no. 5, pp. 481-498, 2012.

[62] M. D. Linderman, J. D. Collins, H. Wang and T. H. Meng, "Merge: A Programming Model for Heterogeneous Multi-core Systems," SIGPLAN Not., vol. 43, no. 3, pp. 287-296, 2008.

[63] P. H. Wang, J. D. Collins, G. N. Chinya, H. Jiang, M. Girkar, N. Yang, G.-y. Lueh and H. Wang, "EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system," in ACM SIGPLAN Conference on Programming Language Design and Implementation, 2007.

[64] Intel, "Manycore Platform Software Stack," [Online]. Available: https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss#about. [Accessed: 23/02/2016].

[65] P. Cooper, U. Dolinsky, A. Donaldson, A. Richards, C. Riley and G. Russell, "Offload – Automating Code Migration to Heterogeneous Multicore Systems," High Performance Embedded Architectures and Compilers SE - 25, vol. 5952, pp. 337-352, 2010.

[66] A. F. Donaldson, U. Dolinsky, A. Richards and G. Russell, "Offloading of C++ for the Cell BE Processor: A Case Study Using Offload," in International Conference on Complex, Intelligent and Software Intensive Systems, 2010.

[67] A. Miyoshi, C. Lefurgy, E. Van Hensbergen, R. Rajamony and R. Rajkumar, "Critical power slope: understanding the runtime effects of frequency scaling," in 16th International Conference on Supercomputing, 2002.

[68] L. Wang, G. Von Laszewski, J. Dayal and F. Wang, "Towards energy aware scheduling for precedence constrained parallel tasks in a cluster with DVFS," in 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010.

[69] C.-M. Wu, R.-S. Chang and H.-Y. Chan, "A green energy-efficient scheduling algorithm using the DVFS technique for cloud datacenters," Future Generation Computer Systems, vol. 37, pp. 141-147, 2014.

[70] G. L. Valentini, W. Lassonde, S. U. Khan, N. Min-Allah, S. A. Madani, J. Li, L. Zhang, L. Wang, N. Ghani, J. Kolodziej, H. Li, A. Y. Zomaya, C.-Z. Xu, P. Balaji, A. Vishnu, F. Pinel, J. E. Pecero, D. Kliazovich and P. Bouvry, "An overview of energy efficiency techniques in cluster computing systems," Cluster Computing, vol. 16, no. 1, pp. 3-15, March 2013.

[71] S. Albers, "Energy-efficient algorithms," Communications of the ACM, 2010.

[72] B. Rountree, D. Lowenthal, M. Schulz and B. De Supinski, "Practical performance prediction under dynamic voltage frequency scaling," in International Green Computing Conference and Workshops (IGCC), Orlando, 2011.

[73] M. Etinski, J. Corbalan, J. Labarta and M. Valero, "Understanding the future of energy-performance trade-off via DVFS in HPC environments," Journal of Parallel and Distributed Computing, 2012.

[74] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree and M. E. Femal, "Analyzing the energy-time trade-off in high-performance computing applications," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 6, pp. 835-848, 2007.

[75] Intel Inc., "Intel GEOpm slides," 12/2015. [Online]. Available: https://eehpcwg.llnl.gov/documents/webinars/systems/120915_eastep-geo.pdf. [Accessed: 22/03/2016].

[76] Intel Inc., "Intel GEOpm," [Online]. Available: https://github.com/geopm/geopm. [Accessed: 22/03/2016].

[77] V. Tiwari, S. Malik and A. Wolfe, "Compilation techniques for low energy: An overview," in IEEE Symposium on Low Power Electronics, 1994.

[78] M. Kandemir, N. Vijaykrishnan, M. J. Irwin and W. Ye, "Influence of compiler optimizations on system power," in The 37th Annual Design Automation Conference, 2000.

[79] H. Mehta, R. M. Owens, M. J. Irwin, R. Chen and D. Ghosh, "Techniques for low energy software," in International Symposium on Low Power Electronics and Design, 1997.

[80] M. Guo, "Energy-aware compiler scheduling for VLIW embedded software," in International Conference Workshops on Parallel Processing, 2005.

[81] W. Zhang, M. Karakoy, M. Kandemir and G. Chen, "A compiler approach for reducing data cache energy," in The 17th Annual International Conference on Supercomputing, 2003.

[82] Kundu, Paul Kolin and Tapas Kumar, "Android on mobile devices: an energy perspective," in 10th International Conference on Computer and Information Technology, 2010.

[83] A. Beloglazov, J. Abawajy and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing," Future Generation Computer Systems, vol. 28, pp. 755-768, 2012.

[84] I. Pietri, M. Malawski, G. Juve, E. Deelman, J. Nabrzyski and R. Sakellariou, "Energy-constrained provisioning for scientific workflow ensembles," in Third International Conference on Cloud and Green Computing (CGC), 2013.

[85] F. Lordan, E. Tejedor, J. Ejarque, R. Rafanell, J. Álvarez, F. Marozzo, D. Lezzi, R. Sirvent, D. Talia and R. M. Badia, "ServiceSs: An Interoperable Programming Framework for the Cloud," Journal on Grid Computing, vol. 12, no. 1, pp. 67-91, 2014.

[86] "Adapting Service lifeCycle towards EfficienT Clouds Project website," [Online]. Available: http://www.ascetic-project.eu. [Accessed: 23/02/2016].

[87] F. Lordan, J. Ejarque, R. Sirvent and R. M. Badia, "Energy-Aware Programming Model for Distributed Infrastructures," in 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2016.

[88] D. Bortolotti, C. Pinto, A. Marongiu, M. Ruggiero and L. Benini, "VirtualSoC: A Full-System Simulation Environment for Massively Parallel Heterogeneous System-on-Chip," in IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW), Cambridge, MA, USA, 2013.

[89] T. Bruckschloegi, O. Oey, M. Ruckauer, T. Stripf and J. Becker, "A Hierarchical Architecture Description for Flexible Multicore System Simulation," in IEEE International Symposium on Parallel and Distributed Processing with Applications, Los Alamitos, 2014.

[90] J. Meng and K. Skadron, "A reconfigurable simulator for large-scale heterogeneous multicore architectures," in IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), Austin, 2011.

[91] D. C. Burger, T. M. Austin and S. Bennett, "Evaluating future microprocessors: the SimpleScalar tool set," 1996.

[92] D. M. Tullsen, "Simulation and modelling of a simultaneous multithreading processor," CMG, vol. 22, 1996.

[93] M. Yourst, "PTLsim: A cycle accurate full system x86-64 microarchitectural simulator," in ISPASS, 2007.

[94] P. Montesinos Ortego and P. Sack, "SESC: SuperESCalar Simulator," 2004. [Online]. [Accessed: 01/03/2016].

[95] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt and B. Werner, "Simics: A full system simulation platform," Computer, vol. 35, no. 2, 2002.

[96] M. M. K. Martin, D. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill and D. A. Wood, "Multifacet's general execution-driven multiprocessor simulator," CAN, vol. 33, no. 4, 2005.

[97] N. Hardavellas, S. Somogyi, T. F. Wenisch, R. E. Wunderlich, S. Chen, J. Kim, B. Falsafi, J. C. Hoe and A. G. Nowatzyk, "SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture," SIGMETRICS PER, vol. 31, no. 4, 2004.

[98] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong and T. M. Aamodt, "GPGPU-Sim: A performance simulator for massively multithreaded processors," in ISPASS, 2009.

[99] J. W. Sheaffer, D. Luebke and K. Skadron, "A flexible simulation framework for graphics architecture," Graphics Hardware, 2004.

[100] Attila Project, 2015. [Online]. Available: http://attila.ac.upc.edu/wiki/index.php/Main_Page. [Accessed: 01/03/2015].

[101] H. Casanova, A. Giersch, A. Legrand, M. Quinson and F. Suter, "Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms," Journal of Parallel and Distributed Computing, 10/2014.

[102] L.-C. Canon and E. Jeannot, "Wrekavoc: a tool for emulating heterogeneity," IPDPS, 2006.

[103] Y. Georgiou and M. Hautreux, "Evaluating scalability and efficiency of the Resource and Job Management System on large HPC Clusters," in Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, 2012.

[104] J. Meng and K. Skadron, "A reconfigurable simulator for large-scale heterogeneous multicore architectures," in IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), Austin, 2011.

[105] T. Mathisen, "Pentium Secrets," Byte, July 1994.

[106] PAPI, "Performance Application Programming Interface," 2016. [Online]. Available: http://icl.cs.utk.edu/papi/index.html. [Accessed: 29/02/2016].

[107] S. Browne, J. Dongarra, N. Garner, G. Ho and P. Mucci, "A Portable Programming Interface for Performance Evaluation on Modern Processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189-204, 8/2000.

[108] D. Terpstra, H. Jagode, H. You y J. Dongarra, «Collecting Performance Data with PAPI-C,» de Proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing, Dresden, 2009.

[109] H. McCraw, J. Ralph, A. Danalis y J. Dongarra, «Power monitoring with PAPI for extreme scale architectures and dataflow-based programming models,» de IEEE International Conference On Cluster Computing (CLUSTER), Madrid, 2014.

[110] H. McCraw, D. Terpstra, J. Dongarra, K. Davis y R. Musselman, «Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,» de 28th International Supercomputing Conference, Leipzig, 2013.

[111] Nvidia, «PAPI CUDA Component,» 2016. [En línea]. Available: https://developer.nvidia.com/papi-cuda-component. [Último acceso: 1 03 2016].

[112] A. D. Malony, S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole y C. Lamb, « Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,» de International Conference on Parallel Processing (ICPP), Taipei, 2011.

[113] Xilinx, «Virtex-5 FPGA System Monitor - User Guide,» 2011. [En línea]. Available: http://www.xilinx.com/support/documentation/user_guides/ug192.pdf. [Último acceso: 01 03 2016].

[114] Altera Corporation, «FPGA-Adaptive Software Debug and Performance Analysis,» 2013. [En línea]. Available: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01198-fpga-software-debug-soc.pdf. [Último acceso: 01 03 2016].

[115] L. James, D. David y P. Phi, «Powerinsight - a commodity power measurement capability.,» de The Third International Workshop on Power Measurement and Profiling, 2013.

[116] R. Ge, F. Zizhou, S. Shuaiwen, H.-C. Chang y D. Li, «PowerPack: Energy Profiling and Analysis of High Performance Systems and Applications,» IEEE Transactions on Parallel and Distributed Systems, vol. 21, nº 5, pp. 658-671, 5 2010.

[117] IBM corporation, «IBM PowerExecutive Toolkit,» 2007. [En línea]. Available: https://www-01.ibm.com/marketing/iwm/tnd/demo.jsp?id=IBM+PowerExecutive+Power+Capping+Mar07. [Último acceso: 01 03 2016].

[118] D. Abdurachmanov, P. Elmer, G. Eulisse, R. Knight, T. Niemi, J. K. Nurminen, F. Nyback, G. Pestana, Z. Ou y K. Khan, «Techniques and tools for measuring energy efficiency of scientific software applications,» de Conference Series of Journal of Physics, 2015.

[119] ARM Ltd., «ARM Energy Probe,» 2016. [En línea]. Available: http://ds.arm.com/ds-5/optimize/arm-energy-probe/. [Último acceso: 01 03 2016].

[120] ARM Ltd., «ARM DS-5 Development Studio,» 2016. [En línea]. Available: http://ds.arm.com/ds-5/. [Último acceso: 01 03 2016].

[121] G. L. Tsafack Chetsa, G. Da Costa, L. Lefevre, J.-M. Pierson, O. Ariel y B. Robert, «Energy aware approach for hpc systems,» de High-Performance Computing on Complex Environments, 2014.

Page 149: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 149 of 173

[122] J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. Aamodt y V. J. Reddi, «Gpuwattch: Enabling energy optimizations in gpgpus,» de Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013.

[123] M. Assuncao, J.-P. Gelas, L. L. Lefèvre y A.-C. Orgerie, «The green grid’5000: Instrumenting and using a grid with energy sensors,» de Remote Instrumentation for eScience and Related Aspects, 2012.

[124] F. Rossigneux, J.-P. Gelas, L. Lefevre y M. D. de Assuncao, «“A generic and extensible framework for monitoring energy consumption of openstack clouds,” arXiv preprint arXiv:1408.6328, 2014.,» de IEEE Fourth International Conference on Big Data and Cloud Computing (BdCloud), Sydney, 2014.

[125] Intel Inc., «IPMI Specification, V2.0, Rev. 1.1,» 22 03 2016. [En línea]. Available: http://www.intel.com/content/www/us/en/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html.

[126] Intel Corportation, «Intel® 64 and IA-32 Architectures Software Developer’s Manual,» 2015.

[127] E. Rotem, A. Naveh, D. Rajwan, A. Anathakrishman y E. Weissmann, «Power-management architecture of the Interl microarchitecture code-named Sandy Bridge,» IEEE Micro, vol. 32, nº 2, pp. 20-27, 2012.

[128] AMD, «AMD Family 15th Processor BIOS and Kernel Developer Guide,» 2015.

[129] NVIDIA Corporation, NVML API Reference Manual, 2012.

[130] Intel Corportation, «Intel Xeon Phi Coprocessor System Software Developers Guide,» 2012.

[131] V. M. Weaver, D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan y S. Moore, « PAPI 5: Measuring power, energy, and the cloud,» de IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, 2013.

[132] V. M. Weaver, M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra y S. Moore, «Measuring Energy and Power with PAPI,» de 41st International Conference on Parallel Processing Workshops (ICPPW), Pittsburgh, 2012.

[133] J. Dongorra, H. Ltaief, P. Luszczek y V. M. Weaver, «Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures,» de Second International Conference on Cloud and Green Computing (CGC), Xiangtan, 2012.

[134] M. Hennecke, W. Frings, W. Homberg, A. Zitz, M. Knobloch y H. Böttiger, «Measuring power consumption on ibm blue gene/p,» Computer Science - Research and Development, vol. 27, nº 4, p. 329–336, 2012.

[135] Y. Georgiou, T. Cadeau, D. Glesser, D. Auble, M. Jette y M. Hautreux, «Energy Accounting and Control with SLURM Resource and Job Management System,» de Distributed Computing and Networking - Lecture Notes in Computer Science, 2014.

[136] D. Hackenberg, T. Ilsche, J. Schuchart, R. Schone, W. E. Nagel, M. Simon y Y. Georgiou, «HDEEM: High Definition Energy Efficiency Monitoring,» de Energy Efficient SuperComputing SC, 2014.

[137] Co-design, «Co-design at Lawrence Livermore National Lab,» 2015. [En línea]. Available:

Page 150: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 150 of 173

https://codesign.llnl.gov/. [Último acceso: 1 3 2015].

[138] A. Yoo, M. Jette y M. Grondona, «SLURM: Simple Linux Utility for Resource Management,» de 9th International Workshop, Job Scheduling Strategies for Parallel Processing (JSSPP), 2003.

[139] S. Zhou, X. Zheng, J. Wang y P. Delisle, «Utopia: A load sharing facility for large, heterogeneous distributed computer systems.,» 1993. [En línea].

[140] IBM, «IBM loadleveler,» 2001. [En línea]. Available: http://www.redbooks.ibm.com/redbooks/pdfs/sg246038.pdf . [Último acceso: 22 03 2016].

[141] R. Henderson, «Job scheduling under the portable batch system,» de Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing (IPPS), London, 1995.

[142] Adaptative Computing, «Moab workload manager,,» [En línea]. Available: http://www.adaptivecomputing.com/resources/docs/mwm/7-0/help.htm. [Último acceso: 22 03 2016].

[143] D. Thain y T. L. M. Tannenbaum, «Distributed computing in practice: the condor experience,» Concurrency - Practice and Experience, vol. 17, nº 2-4, p. 323–356, 2005.

[144] N. Capit, G. Da Costa, Y. Georgiou, G. Huard, C. Martin, G. Mouni ́, P. Neyron y O. Richard, «A batch scheduler with high level components,» de 5th Int. Symposium on Cluster Computing and the Grid, 2005.

[145] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker y I. Stoica, «Mesos: A platform for fine-grained resource sharing in the data center,» de Proceedings of the 8th USENIX conference on Networked systems design and implementation.

[146] V. Vavilapalli, A. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth y e. al., «Apache hadoop yarn: Yet another resource negotiator,» de Proceedings of the 4th annual Symposium on Cloud Computing, 2013.

[147] Z. Zhang, C. Li, Y. Tao, R. Yang, H. Tang y J. Xu, «Fuxi: a Fault-Tolerant Resource Management and Job Scheduling System at Internet Scale,» de Proceedings of the VLDB Endowment, 2014.

[148] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek y J. Wilkes, «Omega: flexible, scalable schedulers for large compute clusters,» de Proc. EuroSys., 2013.

[149] M. Ballano Barcena y C. Wueest, «Insecurity in the Internet of Things. Symantec Report Version 1.0,» [En línea]. Available: https://www.symantec.com/conten/dam/symantec/docs/white-papers/insecurity-in-the-internet-of-things.pdf. [Último acceso: 2016 03 08].

[150] B. Lowans y E. Perkins, «Focus, Big Data Needs a Data-Centric Security,» Gartner, 2014.

[151] M. Abomhara y G. M. Køien, « Security and privacy in the Internet of Things: Current status and open issues,» de International Conference on Privacy and Security in Mobile Systems (PRISMS), Aalborg, 2014.

[152] H. Yue, L. Guo, R. Li, H. Asaeda y Y. Fang, «DataClouds: Enabling Community-Based Data-Centric Services Over the Internet of Things,» IEEE Internet of Things Journal , pp. 472 - 482, 10 2014.

Page 151: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 151 of 173

[153] J. Rose, C. Barton, R. Souza y J. Platt, «The Trust Advantage: How to Win with Big Data,» Boston Consulting Group, 2015.

[154] RERUM Project, «RERUM: REliable, resilient and secUre IoT for sMart city applications,» 2016. [En línea]. Available: https://ict-rerum.eu. [Último acceso: 01 04 2015].

[155] SMARTIE Project, «SMARTIE - Secure and smarter cities data management,» 2013. [En línea]. Available: http://www.smartie-project.eu. [Último acceso: 2015 04 01].

[156] Sharcs Consortium, «Secure Hardware-Software Architectures for Robust Computing Systems.,» 2016. [En línea]. Available: http://www.sharcs-project.eu/.

[157] European Commission, «2020 Energy Strategy - European Commission,» 06 05 2016. [En línea]. Available: https://ec.europa.eu/energy/en/topics/energy-strategy/2020-energy-strategy.

[158] COMP Superscalar. [En línea]. Available: https://www.bsc.es/computer-sciences/grid-computing/comp-superscalar. [Último acceso: 25 4 2016].

[159] OMPs, «The OmpSs Programming Model,» [En línea]. Available: https://pm.bsc.es/ompss. [Último acceso: 25 4 2016].

[160] «JVM Monitor - Java profiler intergrated with Eclipse,» [En línea]. Available: http://www.jvmmonitor.org/. [Último acceso: -4 12 2014].

[161] «JouleUnit - A generic framework for profiling ICT applications,» [En línea]. Available: https://code.google.com/p/jouleunit/. [Último acceso: 12 04 2014].

[162] «Eclipse Open Source IDE,» [En línea]. Available: http://www.eclipse.org. [Último acceso: 12 04 2014].

[163] Oracle, «Java SE Development Kit 7,» [En línea]. Available: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html. [Último acceso: 04 12 2014].

[164] Y. Georgiou, T. Cadeau, D. Glesser, D. Auble, M. Jette y M. Hautreux, «Energy Accounting and Control with SLURM Resource and Job Management System,» de 15th International Conference, ICDCN, Coimbatore, 2014.

[165] FreeIPMI Core Team, «GNU FreeIPMI,» 2014. [En línea]. Available: http://www.gnu.org/software/freeipmi/. [Último acceso: 19 04 2016].

[166] ICL Team, «Performance Application Programming Interface,» [En línea]. Available: http://icl.cs.utk.edu/papi/. [Último acceso: 19 04 2016].

[167] M. Jurenz, «VampirTrace,» 2014. [En línea]. Available: https://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampirtrace. [Último acceso: 19 04 2016].

[168] «Mysql - An Open Source Database,» [En línea]. Available: www.mysql.com. [Último acceso: 01 20 2014].

Page 152: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 152 of 173

[169] «QEMU – A generic and open source machine emulator and virtualizer,» [En línea]. Available: http://wiki.qemu.org. [Último acceso: 29 01 2014].

[170] «Libvirt – The virtualization API. A toolkit to interact with the virtualization capabilities of recent versions of Linux.,» [En línea]. Available: http://libvirt.org/. [Último acceso: 01 12 2015].

[171] A. Project, «Title 1,» Computer Magazine, vol. 3, nº 1, p. 2, 2013.

[172] A. Project, «Title 2,» de CloudCom, Madrid, 2012.

[173] ASCETiC Consortium, D2.2.3: ASCETiC Requirement Specification - Year 3, 2015.

Page 153: Transparent heterogeneous hardware Architecture deployment ...tango-project.eu/sites/default/files/tango/public/content-files/article... · v0.1 20/01/2016 First draft version Oliver

D2.1. – TANGO Requirements and Architecture Specification – Alpha Version 10/05/2016

TANGO Consortium 2016

Page 153 of 173

Annex A. Results from Market and Value Internal Workshop

This annex presents the results and conclusions of the internal market and value workshop.

A.1 Markets

- Vendors of HPA systems (CPU/GPU/FPGA/DSP/embedded chip market): paving the ground for developers… enabling them to develop/create/migrate for their (and mixed) hardware platforms… with frameworks and toolkits for their systems… making it dramatically easier to program heterogeneous computing devices… to program for their HW… to sell their HW
  o Frameworks/environments/architectures/programming models/languages for all kinds of HW
  o Simulation of heterogeneous infrastructures and hardware
  o Creating hardware
- HPC vendors: creating tools for HPC
- Software development: tools for design, development, operation, simulation
  o Developing applications for HPAs… parallel
  o High computing demands
  o HPC developers
  o New developers moving into HPC, parallel, optimized SW
  o Hybrid systems: Cloud + more…
  o Simulation: energy; heterogeneous infrastructures and hardware
  o For optimized SW (energy, other dimensions…)
  o Embedded systems, CPS and IoT platforms that need computing capabilities and/or low-power characteristics: Internet of Things for industry, smart homes, smart cities
- Software vendors/integrators, workload/resource/scheduler management & system administration:
  o Efficient management of heterogeneous computing clusters with HPAs
  o HPC workload management (pure HPC & hybrid HPC/Big Data/Cloud infrastructures)
  o SW for data centers, HPC and computational farms:
    Resource & Job Management Systems (RJMS); schedulers & workload allocation managers
    Data Center Infrastructure Management (DCIM)
    Cloud Platform Management (CPM) solutions
    Data Center Infrastructure Efficiency (DCIE) vendors (commercial & OSS)
  o Optimization suites


- Data centers and computational farms: data center administrators/operators/users as demanding users for management & workload allocation tools, optimization tools and energy reduction/management tools
- Domains & contexts of use:
  o Computationally intensive end-user applications:
    Big Data (& near-real-time Big Data analytics)
    eScience: scientific/research
    HPC applications
    Image compression for space applications
    Neural networks

A.2 Market Top Influencers

Hardware:

- HW Vendors:

o CPUs: Intel, AMD (which plans CPU+GPU with shared memory on chip for 2017), ARM (low-power chip maker), Cisco, Samsung
o FPGAs: Xilinx, Altera… Intel (which recently acquired Altera, an FPGA maker)

o GPUs: Nvidia

o … others …

Software:

- Khronos Group (consortium behind OpenCL)

- SLURM Workload Manager

- Operating system market

Domains and Context of Use:

- IoT:

o Jasper

o Google Brillo

o Xively


Annex B. Trends/Requirements Identification from Internal Market and Value Workshop

The market analysis has informed the definition of the questions that will be asked to stakeholders in the engagement process that is now underway. The intention is to validate the trends and requirements identified in the early-stage market analysis.

These trends and requirements are presented below in three groups: energy consumption/optimization; development frameworks; and technical aspects.

B.1 Energy consumption/optimization:

- Compliance with regulations and policies (EU 20% policy)

- What is the incentive for accepting a reduction in QoS (performance, security, availability, other) versus energy optimization trade-offs?
  o Corporate policies:
    Corporate social responsibility
    Cost reduction (power/energy price; data center heating; other)
    Marketing value: green badge/award (at provider, company or user-consumption level); different green service offerings; SLAs
  o Technical:
    Global optimizations (energy cost)
    Optimizing in specific areas (server-side computing, server-side data storing, server-side data networking, telco networking, client devices)
    Customers' QoS fulfillment
    Architecture performance vs cost… minimum optimization margin to adopt the technology… ROI
    Readiness of developers/applications/frameworks to consider energy profiles
    Savings may improve products regarding the following aspects: autonomy, battery cost or life, reliability, size, deployment simplification, ergonomics
  o IT-related cost:
    IT-related hardware cost (servers, storage, cooling)
    Hardware depreciation cost
    Maintenance cost


B.2 Design/Development/Operation for (Optimized) Heterogeneity:
- Current solutions:
  o What types of hardware products are considered to be heterogeneous computing systems?
  o What types of software products are considered to leverage heterogeneous computing systems?
- Concerns/capabilities required for heterogeneous computing systems:
  o Security
  o Energy/power saving
  o Performance
- What current solutions offer for HCS:
  o Current development environment
  o Current programming model
  o Tools for HCS: design new; adapt current; optimization dimensions (energy, etc.)
  o Tools for benchmarking energy
- Solutions must have:
  o Certification for the platform & tools
  o Open source vs commercial

B.3 Technical Requirements

- Design or plan to design heterogeneous computing systems:
  o CPU + GPU
  o CPU + FPGA
  o CPU + DSP
  o DSP + FPGA
  o Operating system heterogeneity
  o Other (identify)
  o For multi-core CPUs or SMP (symmetric multiprocessor) architectures
  o Develop internet-connected devices
  o Network or internet distributed devices that share some kind of processing
  o Embedded systems (that require intensive computation tasks)
    Embedded processors used for embedded developments:
      x86
      ARM
      PowerPC
      Microcontrollers
      DSPs
      Others (please identify)
    Operating systems for product development:
      Embedded Linux
      Workstation Linux


      Android
      Embedded Windows
      Workstation Windows
      MacOS
      Real-time kernels (please identify)
      None (development without OS)
      Others (please identify)
- Type of applications:
  o Hard real-time applications
  o Image processing or signal processing
  o Neural network technologies in distributed or potentially distributed applications
  o Artificial intelligence
- Middleware-specific operating system?
  o A Linux kernel?
  o A Linux distribution?
  o Windows?
- Development IDE:
  o Eclipse?
  o Visual Studio?
  o QT IDE?
  o None?
- Technical capabilities required for middleware:
  o OS support
  o Processor family support
  o FPGA compilation support
  o GPU code generation support
  o Network distribution support
  o Support for Internet protocols
  o Support for web services
  o Multi-CPU support
  o Available programming languages


Annex C. Characterizing an Interviewee

To best capture the valuable information provided by an interviewee, it is crucial to understand who the interviewee is and where her or his expertise and experience lie. The questionnaire is divided into three parts.
The first two parts attempt to understand the philosophical and psychological viewpoint of the interviewee with regard to:
  ◦ energy concerns in general
  ◦ application heterogeneity (in our context…)
The last part identifies the stakeholder roles for which an interviewee has expertise or significant experience.

C.1 Interviewee and Organization Data

Name:

Organisation:

Business type (i.e. what is the business of your organization?):
Must interview data be anonymized? YES / NO

What are your target market(s) for your product(s) or project(s)?

o Consumers

o Industrials

o Medical

o Telecommunications

o Aeronautics

o Space

o Security & Defence

o Energy

o TV broadcast

o Entertainment (except TV broadcast)

o Graphics & publishing

o Finance

o Other (please identify)

Are you familiar with the development of multi-core applications (embedded, desktop, grid, etc.)? Yes/No
Do you design, use or make heterogeneous computing systems? Yes/No/Plan to in the future


C.2 Understanding philosophical and psychological viewpoint

C.2.1 Green Energy

The goal of this part of the interview is to understand whether an interviewee is receptive to the Green argument or whether they are by now tired of too much greenwashing, both in general and in the particular case of ICT and Cloud computing.
Question: Can you identify the general scenario that best fits the mindset of the interviewee? Feel free to add any additional relevant information provided by the interviewee.

Scenario 1 - The Doom's day scenario - Due to increasing energy demand, all types of energy (brown, green) will become scarce in the next XX years (XX to be decided by the interviewee). Thus, while it is important to develop technologies for generating green energy, it is even more crucial to drastically reduce the energy consumption of all aspects of society. Otherwise, we will have no reliable and sustainable energy production available in XX years. In all cases, we can anticipate an explosion in energy prices.
Scenario 2 - The Optimist's scenario - Although there will be an increasing demand for energy, no shortage will happen in the next 50 to 100 years, and by then new technologies will be able to supply all the energy needed (by decreasing the energy consumption of various devices and by increasing the energy supply, brown or green). Although we might see shifting energy costs, this will be due to speculation rather than a real problem of supply versus demand. Hence, in the long term we can expect energy prices to increase in line with the overall inflation level.
Scenario 3 - The centrist scenario - The increasing demand for energy, and the inability of humans to develop green energy producing technologies quickly enough, will generate pollution problems; however, there will not be an overall supply problem. It is therefore equally important to develop technologies for producing green energy as it is to develop technologies that require less power to run. (The change in the cost factor can then take two directions for Scenario 3.)
Scenario 3.1 - ... with a pessimistic touch - To control the increased pollution level, energy costs will likely increase so that supply from green and brown energy suppliers becomes competitive.
Scenario 3.2 - ... with an optimistic touch - The increased pollution level will not play a role in energy pricing. Hence, in the long term we can expect energy prices to increase in line with overall inflation.


C.2.2 Heterogeneity

The goal of this part of the interview is to understand whether an interviewee is receptive to the hardware/architecture heterogeneity argument or not, in general and in the particular case of ICT.
Question: Can you identify the general scenario that best fits the mindset of the interviewee? Feel free to add any additional relevant information provided by the interviewee.

Scenario 1 - The Doom's day scenario – In the next XX years (XX to be decided by the interviewee), there will be so many different types of hardware architecture that it will become increasingly difficult, and cost a substantial amount, to maintain and develop on top of these varying hardware platforms.
Scenario 2 - The Optimist's scenario – There is increasing heterogeneity in hardware platforms, which is offering new opportunities. It will lead to the widespread acceleration of many different applications.
Scenario 3 – The centrist scenario – The increasing heterogeneity in hardware platforms offers opportunities in some areas for the use of hardware accelerators. This will result in the acceleration of specific applications, but overall the impact will be limited to these areas.


C.3 Expertise and experiences of an interviewee

This part of the interview attempts to identify the expertise and significant experience that an interviewee has, and then to deduce which stakeholder roles of interest to TANGO she or he can answer for.
The approach used is to see which of the following categories the interviewee feels they belong to:

Role (for each role, indicate Personal Experience and/or Organisation Level):
  Software designer
  Software developer
  System engineer
  Hardware vendor (designer, manufacturer, seller)
  Software Middleware or Tools Vendor (Workload Management, Data Center Asset Management, Data Center Infrastructure Management)
  System Integrator
  Platform user
  Platform Operator/Owner
  Asset Management (Energy Management and Environment Sustainability Management)
  Others that you see…


It would be useful to record notes of contextual information showing the interviewee's expertise in the roles identified, i.e. they may have held such a role in the past, or X is done in their organisation but they are not directly involved.
Notes:
Do you have experience with the Internet of Things? Yes/No
If so, what?: …………….
Do you have any experience of High Performance Computing (HPC)?
If so, what?: …………….


Annex D. Business/Technical Requirements Questionnaire

In the questionnaire below, each question is targeted at a particular stakeholder role.
If not done yet, first proceed by answering the questionnaire "Characterizing an Interviewee" (Annex C).
Subsequently, only provide answers to the questions for the identified stakeholder roles.

The stakeholder roles covered in this questionnaire are:

Application/Software designer

Application/Software developer

System engineer

Hardware vendor (engineer, designer, manufacturer, seller)

Software Middleware or Tools Vendor (Workload Management, Data Center Asset Management, Data Center Infrastructure Management)

System Integrator

Platform user

Platform Operator/Owner

Asset Management (Energy Management and Environment Sustainability Management)

The following set of questions aims to understand the business perspective of the person being interviewed. Based on the role(s) determined in "Characterizing an Interviewee", the interviewer has to select which questions are applicable. The roles associated with a question are indicated before each question.

D.1 Questions to Consortium R&D Partners, to Consortium Industry Partners and beyond:

D.1.1 Business Requirements:

All stakeholders - Have you ever heard about the EU's 20% policy objective?

Platform user - If given an alternative between similar hardware platforms, would you be attracted by the one displaying lower overall energy consumption?
  Even if the one with the lower energy consumption has the same cost as the other offers?
  Even if the one with the lower energy consumption is more expensive than the other offers? How much more would you accept to pay for the offer with lower energy consumption?
  If you answered 'NO' to the two questions above, how much cheaper would the offer with lower energy consumption need to be for you to opt for it?


Platform user, Platform Operator, Hardware vendor - To what degree is energy consumption a problem for you?
  If yes, what is the motivation for energy concerns: [] cost, [] corporate social responsibility, [] heat, [] other (which ones)?
Platform user, Platform Operator, Hardware vendor – Is heterogeneity an important aspect of your current or future computing platform? What type of heterogeneous hardware are you (or do you expect to be) interested in: manycore CPU, GPGPU, FPGA, SoC, other? For what type of usage/workloads: HPC, Big Data, Cloud, IoT, combinations?
Platform user, Platform Operator, Hardware vendor – What type of middleware/runtime environment do you currently use? Are you willing to change the middleware/runtime environment you are used to if a new one would allow your system to better handle heterogeneity and become more energy efficient? How flexible do you think you are on this aspect? Ranking (low) 1-5 (high)
Platform user, Platform Operator, Hardware vendor – Would you be interested in experimenting with a new open-source toolbox for optimizing energy efficiency within heterogeneous architectures and in sharing your use cases and feedback? If yes, would you care for intermediate versions or only the final one?
Platform user - For what quality of service would you accept to lower your requirements if this reduced energy consumption? In particular:
  To what degree are you willing to decrease performance, security, availability or other quality of service? For this decrease, what minimum percentage of energy savings would you expect? How would you prioritize between these objectives?
  In general, if you are NOT willing to decrease quality of service for the overall cloud service, what about a subset of functionality, for example: "for this given subset of functionality of the cloud service, I would accept to reduce my requirements for performance, availability, security, or other quality of service."
Platform user - What other kind of incentive would you expect from a "Hardware platform Operator" to convince you to select the service with a lower energy consumption?
  The two questions above propose incentives on price and on quality of service against energy performance. What other considerations would you find relevant?
    Green energy vs brown energy?
    Other types of rewards or penalties?
    Other ways for you to transitively value your image (as a customer who displays an energy-responsible behaviour)?
Platform Operator - What percentage of your organization's costs do energy costs represent? (In particular, this question only covers IT-related costs such as IT-related hardware cost (servers, storage, cooling), hardware depreciation cost, maintenance cost and manual operation cost, versus the energy cost of operating the IT-related hardware.)


Platform Operator - What level of dependence on external software -such as the one to be delivered by TANGO- are you willing to accept?
Platform Operator - What percentage of energy saving would you consider as a minimum for applying the TANGO solution in your business? (NOTE: the solution may go beyond a product and actually be a set of tools and techniques to be learned and applied by your development team as well as by the staff on the operations side.)
Platform Operator - Do you generate your own energy (is it green or brown)?
  The question above is only relevant if the Platform Operator runs the applications on their own infrastructure.
Platform Operator - Are you already using any tools or techniques for energy optimization? Which ones?
Platform Operator - What type of energy consumption concerns you? Server-side computing, server-side data storing, server-side data networking, telco networking (e.g. for Internet connectivity to the client), client devices.
Platform Operator - What pricing model have you seen that directly or indirectly takes energy into account?
  Is this pricing model reflective of a theoretical or an effective energy consumption?
  It could also well be that energy consumption is just a side effect and is not directly targeted, e.g. a pay-per-access charge for storage may be viewed as an energy-related approach even though it may have been introduced because of the mean time to failure of hard disks.
Platform Operator - Do you use a complex cost model for the energy you pay for, such as lower cost during some unpredictable or predictable periods of time (e.g. cheaper at night)?
Platform Operator - Do you believe that various usage profiles would allow deploying applications differently for different customers, eventually leading to energy savings?
Platform Operator - What other approaches besides usage profiles do you believe could help save energy on your hardware platform?
Platform Operator - Would you find it beneficial to propose variable business offers to your customers where energy consumption concerns could be included next to other quality of service aspects? (In particular, current offers propose different costs for improved availability, improved reliability and improved performance. Do you believe that adding energy consumption aspects to your offers would be appreciated (and used) by existing customers or would help to attract new customers?)
Platform Operator - What kind of incentive or disincentive would you expect from relevant authorities to encourage you to run a platform with a lower energy consumption or a lower brown energy consumption?
  That they award an eco-label to a platform that remains under a given Watt per CPU? Other metrics?

  That they give a tax rebate/exemption?
  What else?
  Would inaction from authorities encourage you not to move with regard to energy aspects?
Platform Operator - Do you consider that displaying an energy-aware behaviour can positively impact your customer experience or constitute a motivation for businesses to become new customers?
Platform Operator - What kind of incentive or disincentive would you be interested in offering to your customers to promote an energy-responsible behaviour?
  Grant/award an eco-label to your customers (who show an energy-responsible behaviour)?
Platform Operator - (Furthermore, this question should be addressed to a stakeholder at management level.) How interested would you be for your software designers, software developers, system engineers, hardware designers and testers to increase their knowledge of green development practices?
  Would you pay for their training? (Explain/argue your answer)
Platform Operator, Hardware Vendor, System Integrator – What software products do you use/sell that leverage heterogeneous computing systems?
Platform Operator, Hardware Vendor, System Integrator – What hardware products do you use/sell that could be considered heterogeneous computing systems?
Platform Operator, Platform User – How much of a concern is security within your heterogeneous computer systems? Ranking (low) 1-5 (high).
Platform Operator, Platform User – How much of a concern is energy/power saving within your heterogeneous computer systems? Ranking (low) 1-5 (high).
Platform Operator, Platform User – How much of a concern is performance within your heterogeneous computer systems? Ranking (good) 1-5 (bad).
System Engineer, Software Designer and Hardware Designer - Do you believe it is worth adding energy considerations to SLAs? Why or why not?
Software Designer, Software Developer and Software Engineer – What development environment and programming model do you currently/mostly use for your applications? How flexible do you think you are in changing them for new ones that will help you leverage heterogeneity and optimize your code in terms of energy efficiency? Ranking (low) 1-5 (high)
Software Designer, Software Developer and Software Engineer – Would you be interested in tools to help you design / adapt your applications in order to execute optimally upon heterogeneous architectures? Ranking (low) 1-5 (high). How much would you be willing to invest in that? Ranking (low) 1-5 (high)

Software Designer, Software Developer and Software Engineer – Would you be interested in tools to help you design / adapt your applications in order to become energy efficient? Ranking (low) 1-5 (high). How much would you be willing to invest in that? Ranking (low) 1-5 (high)
Software Designer, Software Developer and Software Engineer – If you answered positively to at least one of the two previous questions, would you be willing to evaluate intermediate versions of the TANGO toolbox, share your use cases and give us your feedback?
Software Developer - Would you like to learn how to develop applications that require less energy to run? Or would you give priority to other types of training? If so, why, e.g. does other training add more value to your CV? What would it take for you to attend such a training?
Platform Operator - In order to determine whether the TANGO tool chain and framework generate energy savings, we must be able to benchmark the energy needs of your hardware platform before and after using the TANGO results. If you were a TANGO use case, what strategies do you believe would be appropriate in your case to obtain the "before TANGO" benchmark results and then the "after TANGO" benchmark results?
Platform Operator, Platform User, Software Middleware or Tools Vendor – Do the platforms that you use require any specific certification to allow their integration in your development flow? If yes, which ones?
Platform Operator, Platform User – Would you be ready to pay for such middleware? What is the acceptable price range (consider a royalty model – indicate the considered number of pieces)?
Platform Operator, Software Middleware or Tools Vendor – Would you be interested in using middleware even if it is not certified?
Platform Operator, System Integrator – Do you prefer:
  a free open-source distribution with non-contractual support through a community, or
  a fee-based commercial distribution with contractual support?
Software Developer, Software Middleware or Tools Vendor – Is having access to the source code of the middleware and being allowed to modify it important or useful for you?
Software Developer, Software Middleware or Tools Vendor – Would it be acceptable for a middleware to be published under a licence like the GPL, which implies that you must publish the source files of your applications?
Software Developer, Software Middleware or Tools Vendor – Would it be acceptable for a middleware to be published under an open-source licence like BSD, which implies only reproducing the licence information with your products?


D.2 Technical Requirements:

Software Designer, Software Developer and Software Engineer – Do you design or plan to design heterogeneous computing systems:
  o CPU + GPU
  o CPU + FPGA
  o CPU + DSP
  o DSP + FPGA
  o Operating system heterogeneity
  o Other (identify)
Software Designer, Software Developer and Software Engineer – Do you develop for multi-core CPUs or SMP (symmetric multiprocessor) architectures?
Software Designer, Software Developer and Software Engineer – Do you develop internet-connected devices?
Software Designer, Software Developer and Software Engineer – Do you develop network or internet distributed devices that share some kind of processing?
Software Designer, Software Developer and Software Engineer – Do you have some need for remotely accessible processing capabilities?
Software Designer, Software Developer and Software Engineer – Do you use secure internet connections in your applications?
  o If yes, which secure protocols do you use?
  o If yes, what are your security constraints?
Software Designer, Software Developer and Software Engineer – Do you develop hard real-time applications?
Software Designer, Software Developer and Software Engineer – What kind of embedded processors are you using or foreseeing to use for your embedded developments?
  o x86
  o ARM
  o PowerPC
  o Microcontrollers
  o DSPs
  o Others (please identify)
Software Designer, Software Developer and Software Engineer – What are the relevant operating systems for your product development?
  o Embedded Linux
  o Workstation Linux
  o Android
  o Embedded Windows
  o Workstation Windows
  o MacOS
  o Real-time kernels (please identify)
  o None (development without OS)
  o Others (please identify)


Software Designer, Software Developer and Software Engineer – Do you develop on FPGAs?
  o Xilinx
  o Altera
  o Other (please identify)
Software Designer, Software Developer and Software Engineer – Do you develop embedded systems that require intensive computation tasks?
Software Designer, Software Developer and Software Engineer – Do you use or foresee using FPGAs as computational accelerators?
Software Designer, Software Developer and Software Engineer – Do you use C or C++ compilation for FPGAs?
Software Designer, Software Developer and Software Engineer – Do you develop specific ASICs for embedded systems?
Software Designer, Software Developer and Software Engineer – Do you use SoC components?
  o If yes, what kind of SoC do you use?
    Xilinx Zynq
    Multi-core DSPs
    NVidia Tegra
    Other (please identify)
Software Designer, Software Developer and Software Engineer – Do you develop GPU code?
  o Using CUDA
  o Using OpenCL
  o Using C or C++ (please identify the framework that is used)
  o Using another language (identify)
Software Designer, Software Developer and Software Engineer – What are the programming languages used for the development of relevant products in your company?

o C

o C++

o C#

o Java

o VB

o VB.net

o ADA

o Fortran

o Python

o Assembler

o Other (please identify)


Software Designer, Software Developer and Software Engineer – Do you already use some technologies (middleware, libraries, compilers or architectures) for parallel computing or for the management of distributed software?
  o MPI
  o CORBA-compliant
  o OpenMP
  o OpenCL
  o OpenACC
  o SLURM
  o COMPSs
  o OmpSs
  o Web services
  o Parallel compiler (please identify)
  o Proprietary middleware
  o Other (please identify)
Software Designer, Software Developer and Software Engineer – Do your applications include some image processing or signal processing?
Software Designer, Software Developer and Software Engineer – Do you use external libraries for this? Which ones?
  o OpenCV
  o Intel Integrated Performance Primitives (IPP) libraries
  o PIL
  o Other (please identify)
Software Designer, Software Developer and Software Engineer – Do you use or intend to use neural network technologies in distributed or potentially distributed applications?
  o If yes, what framework or library do you use?
    Caffe
    Torch
    FANN
    OpenCV
    Commercial libraries or code generators (please identify)
    Proprietary libraries or code
    Other (please identify)
Software Designer, Software Developer and Software Engineer – Do you use other "artificial intelligence" oriented technologies in your development? If yes, identify the technologies used.
Software Designer, Software Developer and Software Engineer – What is the computing power range of your applications that use parallelism or distributed or heterogeneous computing (min – max in MFLOPS or MIPS)?
  o < 1 GIPS
  o 1..1000 GIPS
  o Over 1000 GIPS
Software Designer, Software Developer and Software Engineer – What kind of distribution would be the most relevant to you?


o SMP CPUs

o Distributed computing nodes over a network or the internet

o CPU + GPU

o CPU + FPGA

o CPU + DSP

o DSP + FPGA

Software Designer, Software Developer and Software Engineer – Do you feel energy saving is an important design criterion for your products?
Software Designer, Software Developer and Software Engineer – Do you feel energy savings may improve your products regarding the following aspects?
  o Autonomy
  o Battery cost or life
  o Reliability
  o Size
  o Deployment simplification
  o Ergonomics
Software Designer, Software Developer and Software Engineer – Do you feel that the way software is designed may have a significant impact on the energy consumption of your product?
Software Designer, Software Developer and Software Engineer – Do you feel that using a distribution middleware like TANGO may bring such energy savings?
Software Designer, Software Developer and Software Engineer – Do you feel that using FPGA accelerators may help save energy?
Software Designer, Software Developer and Software Engineer – Do you expect a middleware for the development of heterogeneous or distributed platforms to include energy management functions?
Software Designer, Software Developer and Software Engineer – Is the presence of these energy management functions a criterion for the selection of this middleware?
Software Designer, Software Developer and Software Engineer – Is software scalability important for your product development cycles?
Software Designer, Software Developer and Software Engineer – What are the most important technical criteria for the selection of such a middleware? Please select the aspects that would have a significant relevance in an evaluation.

o OS support

o Processor family support

o FPGA compilation support

o GPU code generation support

o Network distribution support

o Support for Internet protocols

o Support for web-services

o Multi-CPU support

o Available programming languages


o Code optimisation efficiency

o Presence of energy management functions

o Access to OS native functions

o Access to peripherals and/or drivers

o Data distribution management tools

o Development system IDE integration

o Middleware code size

o Security management capabilities

o Debugging tools

o Simulation tools

o Integration capabilities with mathematical libraries

o Integration with image processing or signal processing libraries

o Integration with neural network libraries

o Aspects related to graphical user interface integration

o Deployment tools

Software Designer, Software Developer and Software Engineer – Would you use a middleware that imposes the presence of a specific operating system? What do you think if the operating system is
  o A Linux kernel?
  o A Linux distribution?
  o Windows?
Software Designer, Software Developer and Software Engineer – Would you use a middleware that is bound to a single development environment (IDE)? What do you think if this environment is
  o Eclipse?
  o Visual Studio?
  o QT IDE?
  o None?
Software Designer, Software Developer and Software Engineer – Would you accept that such a middleware requires compilation or other configuration steps outside the development environment (IDE)?

What energy-related measures are you interested in monitoring within your computing system?
  o Maximum instantaneous power consumption (Watts)
  o Total energy consumption within a certain period (kWh)
  o Energy efficiency (Flops per Watt)
  o Other
Which energy-related metrics are you interested in: TCO, PUE, TUE, iTUE, CUE, ERE, other?
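As a minimal reference sketch for interviewers (illustrative only, not a TANGO specification), the measures and metrics listed in the two questions above are commonly related as follows in data-centre practice:

  Total energy over a period: E = ∫ P(t) dt, typically reported in kWh (1 kWh = 3.6 MJ); the maximum instantaneous power is the peak of P(t) over the same period.
  Energy efficiency: sustained computational performance divided by average power, e.g. Flops per Watt.
  PUE (Power Usage Effectiveness) = total facility energy / IT equipment energy (ideal value 1.0).
  ERE (Energy Reuse Effectiveness) = (total facility energy − energy reused outside the facility) / IT equipment energy.
  CUE (Carbon Usage Effectiveness) = total CO2 emissions caused by the facility's energy use / IT equipment energy.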