The AXIOM project (Agile, eXtensible, fast I/O Module)

Dimitris Theodoropoulos, Dionisis Pnevmatikatos

Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH)

GR-70013 Heraklion - Crete, Greece
Email: {pnevmati,dtheodor}@ics.forth.gr

Carlos Alvarez, Eduard Ayguade, Javier Bueno, Antonio Filgueras, Daniel Jimenez-Gonzalez, Xavier Martorell, Nacho Navarro

Computer Sciences Dept., Barcelona Supercomputing Center, Barcelona, Spain

and Computer Architecture Dept., Universitat Politecnica de Catalunya

Barcelona, Spain
Email: [email protected]

Carlos Segura, Carles Fernandez, David Oro, Javier Rodriguez Saeta

Herta Security, S.L.
Email: [email protected]

Paolo Gai
Evidence SRL,

Pisa, Italy
Email: [email protected]

Antonio Rizzo, Roberto Giorgi

University of Siena
Siena, Italy

Email: [email protected]

Abstract—The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for the future Cyber-Physical Systems (CPSs). These systems are expected to react in real-time, provide enough computational power for the assigned tasks, consume the least possible energy for such tasks (energy efficiency), scale up through modularity, allow for easy programmability across performance scaling, and make the best use of existing standards at minimal cost.

Current solutions for providing enough computational power are mainly based on multi- or many-core architectures. For example, some current research projects (such as ADEPT or P-SOCRATES) are already investigating how to join efforts from the High-Performance Computing (HPC) and the Embedded Computing domains, which are both focused on high power efficiency, while GPUs and new Dataflow platforms such as Maxeler, or in general FPGAs, are claimed as the most energy efficient.

We present the project’s initial approach, ideas and key concepts, and describe the AXIOM preliminary architecture. Our starting point uses power efficient multi-core nodes, such as ARM cores and FPGA accelerators on the same die, as in the Xilinx Zynq. We will work to provide an integrated environment that supports programmability of the parallel, interconnected nodes that form a CPS system, and evaluate our ideas using demanding test application scenarios.

Keywords-Cyber-Physical Systems, Parallel Programming Models, Smart Video Surveillance, Smart Living/Home

I. INTRODUCTION AND PROJECT OBJECTIVES

We are entering the Cyber-Physical age, in which both objects and people will become nodes of the same digital network for exchanging information. Therefore, the general expectation is that "things" or systems will become somewhat as smart as people, allowing a rapid and close interaction not only system-system, but also human-system and system-human. Moreover, through smart systems, daily human routines are hopefully simplified.

The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new hardware/software architectures for CPSs in which the above expectations can be realized. The project, started in February 2015, will span 3 years. The coordination of the project is carried out by the University of Siena (UNISI). UNISI also leads the evaluation part of the project. Foundation for Research and Technology - Hellas (FORTH) develops the interconnection between boards. Barcelona Supercomputing Center (BSC) is responsible for the OmpSs [1] programming model and software toolchain. Partner EVIDENCE takes the lead on the development of the runtime systems. Partner SECO designs and builds the prototype board. Partner HERTA Security provides the video-surveillance use case, and partner VIMAR provides the smart-building use case.

The specific objectives of the AXIOM project are:

• Realizing a small board that is flexible, energy efficient and modularly scalable. We will use an ARM and FPGA-based chip with custom high-speed interconnects to build the AXIOM prototype board.

• Easy programmability of multi-core, multi-board, FPGA-based nodes, with the OmpSs programming model, and improved thread management and real-time support from the operating system. The software will be Open-Source.

• Easy interfacing with the Cyber-Physical world, based on the Arduino shields [2], pluggable onto the board.

• Contributing to standards, in the context of the Standardization Group for Embedded Systems (SGET) and OpenMP.

978-1-4673-7311-1/15/$31.00 ©2015 IEEE

These are the expected impacts of the AXIOM project:

• Platform interfacing with the physical world. The AXIOM project intends to create a platform that enables the designed module to interact with the physical world through compatibility with Arduino Shields [3].

• Production platform. The AXIOM design is aimed at becoming a hardware and software platform for large scale production.

• Development of autonomous technology. This technology may break the Embedded Systems energy efficiency and programmability barriers. The same set of technologies is expected to represent the base for future European industrial exploitation in the HPC and Embedded Computing markets.

• Providing the basis for new European-level research at the forefront of the development of extreme performance embedded system software and tools.

II. PROJECT OVERVIEW

In the near future, we expect that CPSs will at least react in real-time, provide enough computational power for the assigned tasks, consume the least possible energy for such tasks (energy efficiency), scale up through modularity, allow for easy programmability across performance scaling, and make the best use of existing standards at minimal cost. The whole set of these expectations imposes scientific and technological challenges that need to be properly addressed.

Towards this goal, the AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for CPSs to meet the above expectations. Research will not be limited to only one kind of technology, but it will start from power efficient multi-cores, such as ARM cores and FPGA accelerators on the same die as in the Xilinx Zynq. Moreover, the AXIOM project will build on the consolidated expertise of its consortium within the HPC domain to bring, also in the CPSs domain, the simplicity and scalability of the OpenMP framework, through its extension with the StarSs paradigm, called OmpSs.

Regarding the easy programmability challenge, we observe that current toolchains for CPSs are too specific to the selected domain, involving customized or expensive Software Development Kits (SDKs). In contrast, recent boards such as the Raspberry Pi or the UDOO (the latter being funded by two partners of AXIOM, shown in Figure 1) are successful examples of eliminating the tedious learning steps of customized SDKs and their associated high costs, relying on totally open hardware designs and open-source software. This has the potential to open the development to a broader community, not confined to experts but also open to other end-users like the wider educational sector and the "Makers" community.

Figure 1. The UDOO development platform is an open hardware, low-cost computer equipped with an ARM i.MX6 Freescale processor for Android and Linux, alongside an Arduino Due ARM SAM3X. Both CPUs are integrated on the same board.

Regarding parallelism, several solutions have been proposed during the last decades. However, a unanimous consensus on the best solution has not been reached. The AXIOM project will build on the consortium's consolidated expertise in the HPC domain to bring, also in the CPSs domain, the simplicity and scalability of the OpenMP framework. In particular, we will focus on its extension with the StarSs paradigm, called OmpSs.

Previous experimentation produced the UDOO, the first board worldwide to integrate a powerful quad-core ARM (able to run both Linux and Android smoothly) and the Arduino 2 interface, for easy plug-and-play of a huge number of sensors and actuators. The AXIOM project will push further the idea of a general, modular, reusable single module capable of providing further specialization through the reconfigurability features of the FPGAs.

AXIOM-based CPSs will include classical connectivity (e.g., Internet), but will also bring modularity to the next level by allowing more compute-intensive systems to be built through a low-cost, scalable, high-speed interconnect. The latter will make it simple to build (or upgrade at a later moment) flexible and low-cost systems by re-using the same basic (small) module, without the need for costly connectors and cables.

Moreover, AXIOM addresses pan-European needs and problems, which require collaboration at the multinational level to find, develop and test adequate solutions. More specifically, AXIOM aims at promoting the usage of innovative software and hardware technologies, with a specific focus on heterogeneous hardware accelerators on FPGA plus custom interconnects for high-speed, low-cost board-to-board communication, and on their integration, as a means to increase the competitiveness of the European industrial and academic worlds in the context of worldwide Embedded System research.

As described in Section I, the AXIOM-based CPSs will be evaluated with two applications from the video-surveillance and smart-building domains, presented in Section V.

Figure 2. Energy efficiency of FPGAs (courtesy Altera corp.)

III. THE AXIOM HARDWARE PLATFORM

A. Genesis of the platform

As illustrated in Figure 2, FPGAs are considered the most energy efficient approach for specific tasks [4]. Within the AXIOM project, the choice is toward a chip like the Xilinx Zynq (Zynq is a chip family; the chip can include a dual ARM Cortex-A9@1GHz, [email protected] to [email protected], low-power programmable logic from 28k to 444k logic cells + 240 to 3020 KB BRAM + 80 to 2020 18x25 DSP slices, PCI Express, DDR3 memory controller, 2 USB, 2 GbE, 2 CAN, 2 SDIO, 2 UART, 2 SPI, 2 I2C, 4x32b GPIO, security features, 2 ADC@12bit 1Msps). The rationale for the FPGA is that it can become the central heart of the board, making it possible to integrate all the features, providing customized and reconfigurable acceleration for the specific scenario where the board is deployed, and providing the necessary substrate for board-to-board communication.

The platform is ideally augmented with additional 32-bit ARM cores; we choose to demonstrate and prototype the 32-bit version, and we will consider, at project start, whether it will be possible to use a 64-bit ARM (currently only the Applied Micro X-Gene is providing some samples, and one has also been announced by Xilinx for the next-generation Zynq). We considered the possibility of using an x86 core, but we think that it could be more strategic for Europe to build on the ARM processor, which will also bring us a completely homogeneous platform and will enable further optimizations in the software stack. For the board-to-board connection, our choice is to integrate a standard connector (e.g., SATA or PCIe) for low-cost reasons, possibly with multiple channels (similar to the FORMIC board [5] developed by the project partner FORTH).

B. The AXIOM Memory System

The AXIOM project will design and develop a reliable and efficient mechanism inspired by the Distributed Shared Memory (DSM) paradigm, as illustrated in Figure 3, integrated in the Linux OS. The mechanism will run on the reference platform, consisting of a set of ARM boards connected through a high-speed bus interconnection. It will make it possible to leverage the consolidated expertise of partner BSC in the HPC domain, bringing to CPSs - at any scale - the simplicity and scalability of the OmpSs framework on top of low-cost commercial off-the-shelf (COTS) hardware.

Figure 3. Distributed Shared Memory (DSM) on the AXIOM modular platform

The DSM mechanism will be supported in both software and hardware. OS-level mechanisms will aim at the best performance and the lowest latency. The DSM mechanism will be released as Open-Source software, and it is expected to bring benefits to both the ICT and the embedded industries. Moreover, we will investigate the possibility of integrating features in the OS to allow balancing of the work across the basic modules through the high-speed interconnection. Current solutions for work balancing across distributed systems, in fact, are expensive, too specific, or too difficult to program (with paradigms such as MPI). All the Linux kernel drivers needed for accessing the interconnection media will provide scalability and low latency by implementing lock-free data structures.

Another relevant aspect will be the need to properly manage actions in real-time. The project partner Evidence SRL will provide its consolidated expertise to integrate real-time management in the infrastructure, having recently succeeded in integrating its real-time scheduler¹ into the official Linux kernel. In particular, Evidence SRL will realize a real-time Linux distribution for the reference platform, which will be given to the rest of the consortium and possibly released as Open-Source software.

C. Thread Management

The advent of multi-core processors has made the problem of thread management on modern computing systems more pressing. Moreover, the introduction of multi-core processors has led to the need to manage the assignment of computational resources also from a spatial point of view. This means that the scheduler should establish not only the order of execution and the time quantum per task, but also where (i.e., on which core) to execute the task. This aspect becomes more critical as the architecture of the platform grows in complexity, with distributed computational resources and memory hierarchies. However, since energy efficiency has emerged as a foreground requirement, thread management goals cannot be limited to the sole maximization of performance.

¹ http://en.wikipedia.org/wiki/SCHED_DEADLINE

Concerning reconfigurable hardware platforms based on FPGAs, several works have proposed solutions to address the problem of dynamic allocation of tasks to general-purpose multi-core processors [6] or reconfigurable logic (hardware kernels) [7]. However, such approaches have so far been successfully explored only on single and multi-core superscalar architectures. In recent systems, the large adoption of many-core accelerators has worsened the problem, introducing synchronization issues also among different types of computing elements. Research has always been active in this specific context, providing solutions that attempt to efficiently solve the synchronization problem. A more general scheduling unit must be used to efficiently exploit workload distribution, not only among CPU cores, but also on the specific accelerators.

The challenging point in the project will be the study, design and implementation of thread management mechanisms and policies that lead to an optimal exploitation of the hardware resources at run-time, considering the distribution of the platform across multiple boards. The challenge presents itself at two levels: (1) at the system level, all the available resources and the health of the whole system must be considered by the managing component; (2) at the low level, the fine-grain threads coming from the adoption of the dataflow execution model must be distributed across the computing elements (CPUs, FPGAs).

This requires determining at run-time the best resource assignment (scheduling/mapping on CPU or reconfigurable HW) for a task (or thread), according to multiple goals (e.g., performance/QoS, power consumption minimization, thermal hotspots). The policies should operate effectively both in a single-application and in a mixed-workload scenario. The scheduler will be further extended to enable it to distribute fine-grain threads across the different computing elements. To avoid limiting the performance of the system and introducing bottlenecks, the Low-Level Thread Scheduler (LLTS) will be accelerated in hardware, by mapping its structure onto the FPGA cards composing the evaluation platform. To this end, we are extending previous research prototypes [8] [9] [10] [11] and leveraging dataflow-based execution models [12] [13].
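As an illustration of the multi-goal resource assignment described above, the following minimal sketch maps a task to the resource with the lowest weighted cost while excluding thermally constrained resources. It is not the AXIOM scheduler or the LLTS: the `Estimate` fields, the weights, the temperature limit and the resource names are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-resource prediction for running one task."""
    time_ms: float    # predicted execution time
    energy_mj: float  # predicted energy consumption
    temp_c: float     # predicted temperature of the resource


def cost(est, w_time=1.0, w_energy=1.0, temp_limit_c=85.0):
    """Weighted sum of time and energy; hot resources are excluded."""
    if est.temp_c > temp_limit_c:
        return float("inf")
    return w_time * est.time_ms + w_energy * est.energy_mj


def map_task(estimates, **policy):
    """Return the name of the resource with the lowest cost."""
    return min(estimates, key=lambda name: cost(estimates[name], **policy))


# Illustrative numbers only, not measurements.
estimates = {
    "cpu0": Estimate(time_ms=10.0, energy_mj=20.0, temp_c=60.0),
    "fpga0": Estimate(time_ms=4.0, energy_mj=5.0, temp_c=70.0),
    "fpga1": Estimate(time_ms=3.0, energy_mj=4.0, temp_c=90.0),  # over the limit
}

print(map_task(estimates))                     # fpga0
print(map_task(estimates, temp_limit_c=95.0))  # fpga1
```

Changing the weights or the temperature limit changes the mapping without touching the task itself, which is the property a run-time policy of this kind relies on.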

Standard high-speed and low-latency interconnections (e.g., PCIe 3.0) may provide enough bandwidth to avoid becoming the bottleneck of the evaluation platform. Innovative scheduling algorithms will be investigated and integrated in the LLTS, taking into consideration different metrics (e.g., performance, power consumption). The expected final result is, first of all, a competitive benefit in terms of modular scalability of the platform.

D. High-Speed Boards Interconnection

Current systems are typically isolated, and the only way to build larger systems is to design a new system from scratch by adding more resources (e.g., more cores, more memory, etc.). The AXIOM project will bring scalability/extensibility (i.e., "modularity") to CPSs. To achieve this result, we will provide easy programmability, not limited to the single module but also covering larger systems consisting of several tightly interconnected modules. To enable this modularity, AXIOM aims to carry high-speed communication over an inexpensive interconnect.

Based on the consortium's accumulated experience, AXIOM will investigate advanced interconnect structures for efficient (in terms of speed and power/energy) communication, and the proper acceleration support for the OmpSs programming model. The innovation is to optimize and customize the networking logic so as to achieve efficient and fast communication among boards, leading to more effective systems and faster applications.

Towards a fast and inexpensive board-to-board interconnection, AXIOM will leverage the Formic experience [5], which used relatively cheap SATA connectors to interconnect multiple boards and enable MPI computations. In AXIOM, the Network Interface will have to efficiently implement a (distributed) Shared Memory model with support from the programming model (OmpSs), the Operating System, and the Runtime.

Project partner FORTH will develop the interconnect logic and interfaces. The interconnect will provide acceleration functions for common uses of the applications, the Operating System and the Programming model, so as to achieve more effective program execution. Finally, the interconnect logic will make possible the seamless interconnection of systems spanning multiple boards.

E. The Prototype Board

Heterogeneous computing is a challenge that involves distributing computational tasks between different devices, even between different processor architectures. This need typically arises in applications where there are many different tasks to execute in parallel using the architectures available on the market, each with its advantages and disadvantages. A common way to overcome the limitations of a specific processor is to realize boards where the main core is interfaced to a programmable device (FPGA or CPLD). This approach is able to satisfy project requirements with different CPU types. FPGAs have been widely used in partner SECO’s projects over the years for this purpose. This know-how has recently been used to develop a solution like UDOO, which merges the potential of a scalable multi-core ARM Cortex-A9 architecture (i.MX6 processor) with the flexibility of an FPGA-based solution.

AXIOM aims to enhance such a concept, realizing not only a single heterogeneous computing module, but also a modular solution. Moreover, AXIOM aims to realize a single board computer with the main characteristics listed in Section III.A. Those basic characteristics should be enough to ensure:

1) Flexibility in I/O management, which can be totally managed by the FPGA.

2) Flexibility in software, since the standard devices embedded in the module allow using pre-existing software, both commercial and open-source.

3) Modularity, using the high-speed interface for communication between modules in a computer cluster. This will make it easy to build a High-Performance Embedded Computing solution tailored to the specific computational needs.

4) Power efficiency, where single sections of the module can be disabled when they are not used.

5) Real-time computing.

6) Ease of programming, by using standard development environments.

7) Cost efficiency.

Such a solution can be efficiently used in many different applications, from smart buildings (where one or more modules can be used to collect and manage, even remotely over the Internet, the data coming from many different devices used in common home living) to video-surveillance, where huge streams of video data require real-time processing.

IV. THE AXIOM PROGRAMMING MODEL

AXIOM will leverage OmpSs, a task dataflow programming model that includes heterogeneous execution support as well as data and task dependency management [1] and has significantly influenced the recently released OpenMP 4.0 specification.

Figure 4. General view of OmpSs@FPGA and OmpSs@Cluster execution context

One of the goals of OmpSs is to bring the programmability of SMP systems to heterogeneous systems composed of non-shared memory subsystems, by keeping track of where in the whole memory system the most recent version of each datum is stored and moving it to where it is needed, transparently to the programmer. To do so, OmpSs provides an initial team of threads as specified by the user upon starting the application. These threads execute tasks that are generated as the application executes. Tasks are defined as portions of code enclosed in the task directive, or as user-defined functions also annotated as tasks. A task is created when the code reaches the task construct, or when a call is made to a function annotated as a task. The task construct allows the programmer to specify, among others, the clauses in, out and inout. The information provided is used to derive dependencies among tasks at runtime, and to schedule/fire a task. Tasks are fired when their inputs are ready and their outputs can be generated.
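The way in, out and inout clauses can yield task dependencies may be sketched as follows. This is a simplified model written for illustration, not the Nanos++ implementation: the function names, the tuple layout of a task and the example task names are assumptions. A task waits for the last writer of anything it reads (its inputs are ready) and for earlier readers and writers of anything it writes (its outputs can be generated).

```python
def derive_dependencies(tasks):
    """tasks: list of (name, ins, outs, inouts) in program order.

    Returns {task name: set of earlier tasks it must wait for}.
    """
    last_writer = {}  # datum -> last task that wrote it
    readers = {}      # datum -> tasks that read it since the last write
    deps = {}
    for name, ins, outs, inouts in tasks:
        reads = set(ins) | set(inouts)
        writes = set(outs) | set(inouts)
        waits = set()
        for datum in reads:
            if datum in last_writer:            # read-after-write
                waits.add(last_writer[datum])
        for datum in writes:
            if datum in last_writer:            # write-after-write
                waits.add(last_writer[datum])
            waits |= readers.get(datum, set())  # write-after-read
        waits.discard(name)
        deps[name] = waits
        for datum in reads:
            readers.setdefault(datum, set()).add(name)
        for datum in writes:
            last_writer[datum] = name
            readers[datum] = set()
    return deps


def firing_order(deps):
    """Group tasks into waves; each wave holds tasks whose dependencies are done."""
    done, waves = set(), []
    while len(done) < len(deps):
        ready = sorted(t for t in deps if t not in done and deps[t] <= done)
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical task graph: two loads, one add, one in-place scale.
tasks = [
    ("load_a", [], ["a"], []),
    ("load_b", [], ["b"], []),
    ("add", ["a", "b"], ["c"], []),
    ("scale", [], [], ["c"]),
]
deps = derive_dependencies(tasks)
print(firing_order(deps))  # [['load_a', 'load_b'], ['add'], ['scale']]
```

The two loads share no data and can run in parallel, while `add` and `scale` are serialized through `c`. A real runtime fires each task as soon as its own dependencies complete rather than in whole waves; waves are used here only to keep the sketch short.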

OmpSs is based on two main components: i) the Mercurium compiler takes C/C++ and FORTRAN code, annotated with the task directives presented above, and transforms the sequential code into parallel code with calls to the Nanos++ runtime system; and ii) the Nanos++ runtime system takes the information generated by the compiler about the parallel tasks to be run, manages the task dependences and schedules the tasks on the available resources when they are ready. Nanos++ supports the execution of tasks on remote nodes and on heterogeneous accelerators.

The AXIOM project will investigate and implement the OmpSs programming model, integrating different communication technologies. Intra-node, OmpSs@FPGA will ease FPGA acceleration. Inter-node, two different approaches will be explored: OmpSs@cluster and DSM.

Figure 4 shows the overall view of the OmpSs@FPGA and OmpSs@cluster execution context in a multi-board system. Each FPGA-based node will be addressed by the OmpSs@FPGA support, while OmpSs@cluster will help to transparently program the whole multi-node system.

Figure 3 shows the overall view of a DSM system where OmpSs@FPGA would have the same intra-node influence and OmpSs@cluster will appear like a single intra-node OmpSs running over a transparent DSM system.

A. OmpSs@Cluster over OmpSs@FPGA

OmpSs@cluster is the OmpSs flavor that provides support for a single address space over a cluster of SMP nodes with accelerators. In this environment, the Nanos++ runtime system supports a master-worker execution scheme. One of the nodes of the cluster acts as the master node, where the application starts. On the rest of the nodes where the application is executed, worker processes just wait for work to be provided by the master. OmpSs@FPGA, on the other hand, provides support for compilation and execution from source code written in C/C++ to an ARM binary and an FPGA bitstream for Zynq, generating the communication infrastructure that allows the master core to send work to the FPGA resources and obtain back the computation results that will start subsequent tasks.

In both environments, the data copies generated by the in, out and inout task clauses are executed over the connections across nodes, or intra-node with the FPGA, to bring the data to the appropriate resource where the tasks are to be executed.

Three components are key for the integration of these two OmpSs flavors: the scheduler, the data cache and the data directory. When a task is ready, the scheduler takes care of deciding where this task should be executed (a local thread, a local FPGA or a remote node) depending on resource availability and efficiency. Once this is decided, the data cache component manages the operations needed at the master node to transfer data to and from worker memories. The data directory is used to maintain the coherence of the memory. It contains the information of where the last produced values of a memory reference are located, so the system can determine which transfer operations must be performed to execute a task on any resource of the system.
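The role of the data directory can be sketched with a small model that records which locations hold the latest version of each datum and derives the copies needed before a task runs on a chosen resource. This is only an assumption-laden illustration, not the Nanos++ data directory: the class, its method names and the location names (`master`, `fpga0`, `node1`) are hypothetical, and the sketch allows copies between any two locations, whereas the text above describes transfers managed at the master node.

```python
class DataDirectory:
    """Tracks which locations hold the latest version of each datum."""

    def __init__(self, home="master"):
        self.home = home
        self.valid = {}  # datum -> set of locations with the latest copy

    def holders(self, datum):
        # Data never seen before is assumed to live at the home location.
        return self.valid.setdefault(datum, {self.home})

    def prepare(self, target, ins=(), outs=(), inouts=()):
        """Return the (datum, source, target) copies needed to run a task on target."""
        transfers = []
        for datum in list(ins) + list(inouts):
            holders = self.holders(datum)
            if target not in holders:
                source = sorted(holders)[0]  # deterministic choice for the sketch
                transfers.append((datum, source, target))
                holders.add(target)          # target now has a valid copy too
        for datum in list(outs) + list(inouts):
            self.valid[datum] = {target}     # all other copies become stale
        return transfers


directory = DataDirectory()
first = directory.prepare("fpga0", ins=["a"], outs=["b"])
second = directory.prepare("node1", ins=["a", "b"], outs=["c"])
third = directory.prepare("fpga0", ins=["b"])
print(first)   # [('a', 'master', 'fpga0')]
print(second)  # [('a', 'fpga0', 'node1'), ('b', 'fpga0', 'node1')]
print(third)   # []
```

The third call needs no transfer because `fpga0` still holds the last produced value of `b`, which is the kind of decision the directory information enables.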

B. OmpSs on DSM-like systems

DSM is a well-known research topic, and it can be implemented either at software or at hardware level (with a full range of hybrid approaches). Several attempts to create Software DSM implementations for Linux have been carried out during the last decades. Examples are Treadmarks (TMK), JIAJIA [14], Omni/SCASH [15], [16], Jump [17], [18], Parade [19], [20], and NanosDSM [21]. Some of these projects only supported very specific hardware, and none of them has been maintained during the last decade. We will work on the design and development of a proper, reliable and efficient mechanism to implement a DSM-like paradigm integrated in the Linux OS. The mechanism will run on the reference platform. It will make it possible to leverage the simplicity and scalability of the OmpSs framework on top of the AXIOM platform. It will be released as Open-Source software, and it is expected to bring benefits to both the ICT and the embedded industries.

Figure 5. Video-surveillance is a multidisciplinary field related to computer vision, pattern recognition, signal processing, communication, embedded computing and image sensors.

C. Operating System Support

The operating system that we chose is Linux, for its greater flexibility and its Open-Source nature. We are investigating the possibility of integrating features in the OS to allow balancing of the work across the basic modules through the high-speed interconnection. Current solutions for work balancing across distributed systems are expensive, too specific, or too difficult to program (with paradigms such as MPI).

Particular attention will be given to scalability and latency issues, by implementing lock-free data structures. Another relevant aspect will be the need to properly manage events in real-time.

The OS scheduler will be extended to enable it to distribute threads across the different computing elements. The low-level thread scheduler (LLTS [11]) will be accelerated in hardware, by mapping its structure onto the FPGA cards composing the evaluation platform. This will avoid bottlenecks from the scheduler, thus increasing the performance of parallel applications.

V. APPLICATION DOMAINS

The AXIOM project will use two real-life application domains as test cases, namely Video-surveillance and Smart-home. They will operate as benchmarks for assessing the potential and the limits of the proposed architecture. The two application domains have been chosen for the different kinds of challenges they pose to processing capabilities.

A. Video surveillance

Intelligent multi-camera video surveillance (an excerpt is shown in Figure 5) is a multidisciplinary field related to computer vision, pattern recognition, signal processing, communication, embedded computing and image sensors. Video surveillance has a wide variety of applications both in public and private environments, such as homeland security, crime prevention, traffic control, accident prediction and detection, and monitoring patients, elderly and children at home. These applications require monitoring indoor and outdoor scenes of airports, train stations, highways, parking lots, stores, shopping malls and offices. There is an increasing interest in video surveillance due to the growing availability of cheap sensors and processors, and also a growing need for safety and security from the public. Nowadays there are tens of thousands of cameras in a city collecting a huge amount of data on a daily basis.

The main challenge is to efficiently extract useful information from the huge amount of video collected by surveillance cameras by automatically detecting, tracking and recognizing objects of interest, and understanding and analyzing their activities. The view of a single camera is finite and limited by scene structures. In order to monitor a wide area, such as tracking a vehicle traveling through the road network of a city or analyzing the global activities happening in a large train station, video streams from multiple cameras have to be used. Many intelligent multi-camera video surveillance systems have been developed [22] [23] [24], since by employing distributed camera networks, video surveillance systems substantially extend their capabilities and improve their robustness through data fusion and cooperative sensing. With multi-camera surveillance systems, activities in wide areas are analyzed; the accuracy and robustness of object tracking are improved by fusing data from multiple camera views, and one camera hands objects over to another camera to realize tracking over long distances without interruption.

However, as the sizes and complexities of camera networks increase rapidly, there are higher requirements on the robustness, reliability, scalability, transferability, and self-adaptability of intelligent multi-camera video surveillance systems. While most conventional surveillance systems assume a one-directional information flow, recent studies show that different modules can actually support each other. For example, activity modeling can improve inter-camera tracking, and multi-camera tracking provides information for camera calibration and inference of the topology of camera views. Jointly solving some of these problems not only improves robustness and accuracy but also reduces human intervention.

B. Smart-home

A Smart Home is a home-like environment that possesses ambient intelligence and automatic control, which allow it to respond to the behavior of residents and provide them with various facilities. Moreover, Smart-Home scenarios include new emergent behavior and unpredictable challenges and opportunities (Smart Living): we aim at providing humans with adequate tools for managing them properly, in order to get the most from these new enabling technologies.

The standard approach for building smart homes is to computerize them. A set of sensors gathers different types of data regarding the residents and the utility consumption of the home. Computers or devices with computing power (e.g., micro-controllers) analyze these data to identify actions of residents or events. They then respond to these actions and events by controlling certain mechanisms that are built into the home. A simple example of such smart behavior is turning the lights on when a person enters a room or managing electricity consumption.
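The sense, analyze and respond loop described above can be sketched with a small rule function. This is only an illustration: the sensor names, the `Event` type, the `decide_actions` function and the 3000 W limit are hypothetical and are not part of the AXIOM use case.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sensor: str    # hypothetical sensor kinds: "presence" or "power_meter"
    room: str
    value: float   # occupant count for presence, watts for the power meter


def decide_actions(event, power_limit_watts=3000.0):
    """Map one sensor event to a list of (actuator, room, command) actions."""
    actions = []
    if event.sensor == "presence":
        # Turn the lights on when someone enters, off when the room empties.
        command = "on" if event.value > 0 else "off"
        actions.append(("light", event.room, command))
    elif event.sensor == "power_meter" and event.value > power_limit_watts:
        # Switch off a deferrable load when consumption exceeds the limit.
        actions.append(("deferrable_load", event.room, "off"))
    return actions


log = [
    decide_actions(event)
    for event in (
        Event("presence", "kitchen", 1),
        Event("power_meter", "kitchen", 3500.0),
        Event("presence", "kitchen", 0),
    )
]
```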

Smart homes have been researched for nearly a couple of decades. The pioneering work in this area is the “Smart Rooms” implemented by the MIT Media Lab [25]. Thereafter, several researchers have investigated this topic with a wide range of prospective applications. Apart from a special type of smart home that helps the occupants reduce the energy consumption of the house by monitoring and controlling the devices and rescheduling their operating time according to energy demand and supply, there are currently many types of smart homes, falling into three major application categories [26].

The first category aims at providing services to the residents by detecting and recognizing their actions or by detecting their health conditions. Such smart homes act as information collectors to support the wellbeing of the residents of the home. The second category of smart homes aims at storing and retrieving multimedia captured within the smart home, at different levels and from different kinds of sensors. One might raise the issue of privacy for this type of information collection, but it will be a matter of acceptance into one’s lifestyle over time. The third category is surveillance, where the data captured in the environment are processed to obtain information that can help to raise alarms, in order to protect the home and the residents from burglaries, theft and natural disasters such as floods.

A few research efforts have attempted to combine these functions into smart homes. With recent advances in electronics and computing, the sensing technologies and computing power required to implement a smart home are now available in small sizes, at low prices and with good energy efficiency. However, providing the ambient intelligence that is required to make decisions for smart behavior is still a challenging task. This is especially true when the aim is not just reinforcing the current patterns of human behavior and making them more reliable, but envisioning new forms of behavior enabled by the potential of CPSs.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we presented the AXIOM initial approach, ideas and key concepts along with its preliminary architecture. As our starting point, we use power-efficient multi-core nodes, such as ARM cores and FPGA accelerators on the same die, as in the Xilinx Zynq. As future work, we will provide an integrated environment that supports programmability of the parallel, interconnected nodes that form a CPS system, and evaluate our ideas using demanding test application scenarios from the Smart Video Surveillance and Smart Living/Home domains.


Finally, we believe that AXIOM addresses the need to bring together competent and motivated organizations from all over Europe that are able to contribute new research ideas and to lead competitive projects in a variety of fields (e.g., Architecture Development, Interconnects, Reconfigurable computing, Programming Models, Video-surveillance, Home-Automation). Also, AXIOM supports and reinforces the collaboration among European companies and universities by supporting the establishment of new networks and strengthening the operation of existing ones.

ACKNOWLEDGMENT

We thankfully acknowledge the support of the European Union H2020 program through the AXIOM project (grant ICT-01-2014 GA 645496), the Spanish Government, through the Severo Ochoa program (grant SEV-2011-00067), the Spanish Ministry of Science and Technology (TIN2012-34557) and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051). We also thank the Xilinx University Program for its hardware and software donations.

REFERENCES

[1] E. Ayguade, R. M. Badia, D. Cabrera, A. Duran, M. Gonzalez, F. Igual, D. Jimenez, J. Labarta, X. Martorell, R. Mayo, J. M. Perez, and E. S. Quintana-Orti, “A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures,” in IWOMP: Evolving OpenMP in an Age of Extreme Parallelism, vol. 5568. Dresden, Germany: Springer, June 2009, pp. 154–167.

[2] A. Goransson and D. C. Ruiz, “Professional Android Open Accessory Programming with Arduino,” in John Wiley & Sons, January 2013.

[3] S. Monk, “Programming Arduino Next Steps: Going Further with Sketches,” in 1st ed. USA: McGraw-Hill Professional, October 2013.

[4] R. Bolla et al., “Energy efficiency in the future internet: a survey of existing approaches and trends in energy-aware fixed network infrastructures,” in IEEE Communications Surveys & Tutorials, vol. 31, 2011, pp. 223–244.

[5] S. Lyberis, G. Kalokerinos, M. Lygerakis, V. Papaefstathiou, D. Tsaliagkos, M. Katevenis, D. Pnevmatikatos, D. Nikolopoulos, “Formic: Cost-efficient and Scalable Prototyping of Manycore Architectures,” in IEEE Field-Programmable Custom Computing Machines, April 2012, pp. 61–64.

[6] W. Ahmed, M. Shafique, L. Bauer, J. Henkel, “Adaptive resource management for simultaneous multitasking in mixed-grained reconfigurable multi-core processors,” in IEEE/ACM/IFIP international conference on Hardware/software co-design and system synthesis (CODES+ISSS), 2011, pp. 365–374.

[7] A. Clemente, V. Rana, D. Sciuto, I. Beretta, D. Atienza, “A Hybrid Mapping-Scheduling Technique for Dynamically Reconfigurable Hardware,” in Field Programmable Logic and Applications (FPL), September 2011, pp. 177–180.

[8] R. Giorgi and P. Faraboschi, “An introduction to df-threads and their execution model,” in IEEE Proceedings of MPP-2014, Paris, France, Oct. 2014, pp. 60–65.

[9] L. Verdoscia, R. Vaccaro, and R. Giorgi, “A matrix multiplier case study for an evaluation of a configurable dataflow-machine,” in ACM CF’15 - LP-EMS, May 2015, pp. 1–6.

[10] N. Ho, A. Mondelli, A. Scionti, M. Solinas, A. Portero, and R. Giorgi, “Enhancing an x86_64 multi-core architecture with data-flow execution support,” in ACM Proc. of Computing Frontiers, Ischia, Italy, May 2015, pp. 1–2.

[11] R. Giorgi and A. Scionti, “A scalable thread scheduling co-processor based on data-flow principles,” ELSEVIER Future Generation Computer Systems, no. 0, pp. 1–10, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X1400274X

[12] L. Verdoscia, R. Vaccaro, and R. Giorgi, “A clockless computing system based on the static dataflow paradigm,” in Proc. IEEE Int.l Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM-2014), Aug. 2014, pp. 30–37.

[13] R. Giorgi et al., “TERAFLUX: Harnessing dataflow in next generation teradevices,” Microprocessors and Microsystems, vol. 38, no. 8, Part B, pp. 976–990, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0141933114000490

[14] “Jiajia,” http://www-users.cs.umn.edu/ tianhe/paper/dist.htm.

[15] “Omni/scash,” http://www.hpcs.cs.tsukuba.ac.jp/omni-compiler/doc/omni-scash.html.

[16] M. Hess, G. Jost, M. Muller, R. Ruhle, “Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster,” in Workshop on OpenMP Applications and Tools (WOMPAT’02), 2002.

[17] “The jump software dsm system,” http://www.snrg.cs.hku.hk/srg/html/jump.htm.

[18] C. L. W. B. Cheung and K. Hwang, “Jump-dp: A software dsm system with low-latency communication support,” in the 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’2000), Las Vegas, Nevada, USA, 2000.

[19] “Parade,” http://peace.snu.ac.kr/research/parade/.

[20] Y. Kee, J. Kim, S. Ha, “ParADE: An OpenMP Programming Environment for SMP Cluster Systems,” in Supercomputing 2003 (SC’03), 2003.

[21] J. J. Costa, T. Cortes, X. Martorell, E. Ayguade, and J. Labarta, “Running OpenMP applications efficiently on an everything-shared SDSM,” Journal of Parallel and Distributed Computing (JPDC), vol. 6, no. 5, pp. 647–658, 2006.

[22] R.T. Collins, A.J. Lipton, H. Fujiyoshi, T. Kanade, “Algorithms for cooperative multisensor surveillance,” in IEEE, vol. 9, October 2001, pp. 1456–1477.

[23] H. Aghajan, A. Cavallaro, “Multi-Camera Networks Principles and Applications,” in Elsevier, 2009.

[24] M. Valera, S.A. Velastin, “Intelligent distributed surveillance systems: A review,” in IEE Vision, Image and Signal Processing, April 2005, pp. 192–204.

[25] Pentland, Alex P., “Smart rooms,” in Scientific American 274.4, 1996, pp. 54–62.

[26] L. C. De Silva, C. Morikawa, I. M. Petra, “State of the art of smart homes,” in Scientific American 274.4, vol. 25, October 2012, pp. 1313–1321.
