enpower: an opencl approach for energy proportional … · tnpl t recon request for turning on the...

1
ENPOWER: An OpenCL Approach for Energy Proportional Computing With Heterogeneous and Reconfigurable Processors Mohammad Hosseinabady, Jose Nunez-Yanez Department of Electronic Engineering, University of Bristol, UK This research is sponsored by EPSRC and done in collaboration with Queen's University Belfast {m.hosseinabady, j.l.nunez-yanez, }@bristol.ac.uk · Energy proportional computing applied to hybrid FPGA-ARM chip · This research proposes the concept of energy proportional computing includes scalability of a power triplet formed by · Voltage, · Frequency and · Logic (e.g. capacitance) · Applications in this design paradigm are written in an extended OpenCL 1- Abstract 2- Motivations 3-Goals Application Specification Host program (C/C++) Kernels (OpenCL-C or HDL) C/C++ files bitstreams bitstreams bitstreams C/C++ Compiler HDL files Vivado_HLS & Vivado Binary files PS PL Select a bitstream Kernels extraction 1 2 3 4 OpenCL-C OpenCL compiler Binary files Binary files Binary files Multi-core Processors Run Run Map 5 6 7 8 9 12 10 11 4-Contributions 5-Framework 6-Power & Energy Modelling (prerequisite research) SOC Consumer Portable Power Consumption Trends Source: ITRS 2011-System Drivers Xilinx ZYNQ Host program OpenCL Runtime Accelerator PS PL ARM-OpenCL · Propose a new design paradigm in order to expose the concept of energy proportional computing to the application software · Considering the logic and power scalability in a variation-aware closed-loop configuration · Using a software centric approach based on OpenCL for describing applications · Using a processor-centric platform such as Xilinx All Programmable SoCs to implement applications 7-Power Gating (prerequisite research) Turning off PL (tfpl) PL turned off (pltf) Turning on PL (tnpl) PL reconfiguration (reconf) Request for turning off the PL t tfpl t pltf t tnpl t reconf Request for turning on the PL Restore state (rs) Store State (ss) t ss t rs 0 0.2 0.4 0.6 0.8 1 1.2 0 1000 2000 3000 4000 5000 6000 Voltage (V) Time (μs) 0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.2 0.4 0.6 0.8 1 1.2 V CCINT (volt) Power on V CCINT Rail Power on V CCAUX Rail Power on (W) Power versus VCCINT voltage on ZC702 PL shut-down slop Accelerator configuration frequency voltage A simple motivation example is considered in this subsection. The example consists of a MicroBlaze soft processor core configured in the PL and the ARM processor available in the PS. The MicroBlaze runs a 32 x 32 floating point matrix multiplication. 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 Running Software Configuration Before Configuration VCCINT VCCAUX Power (W) VCCINT VCCAUX Other power lines Clock Logic Signals MMCMs Leakage Xilinx XPower Analyzer Real power measurement on ZC702 MicroBlaze power consumption System-level issues · Power modelling · Energy modelling · Power gating · Power scaling · Clock gating · Frequency scaling · Logic scaling Runtime System 12 PL Power for ME = a*V 2 +b*f*V 2 +c*V 3 High-Level Models should be: This section investigates the viability of physical power gating FPGA devices that incorporate a hardened processor in a different power domain. The run-time power gating approach is applied to Xilinx ZYNQ devices that incorporate a hardened Cortex A9 multi-processor. UCD9248 UCD9248 PS PL UCD9248 I2C PS-PL level shifter PS power rails PL power rails ZYNQ all Programmable SOC VIVADO & VIVADO-HLS System-Level Power Control Role of OpenCL runtime Choosing the best configuration and voltage/ frequency at run time. · scheduling · mapping algorithm. · Covers PL, PS and memories · Simple · Compositional · Descriptive Specially the PL High-Level Models should: · Describes different thread hardware and cores · Describes different accelerator modes · Idle · Data transfer · Active · Computational The results show that the minimum time that the FPGA fabric must remain in power-off state for the technique to be energy efficient is in the order of milliseconds and up to 96% power reduction occurs when the fabric voltage is lowered below critical level. Dynamic power caused by measurement modules with own frequency ME dynamic power Static power Motion Estimation (ME) MB PS PL ZYNQ (FPGA+ARM) (FPGA+ARM)

Upload: others

Post on 20-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • ENPOWER: An OpenCL Approach for Energy Proportional Computing With Heterogeneous and Reconfigurable Processors

    Mohammad Hosseinabady, Jose Nunez-Yanez

    Department of Electronic Engineering, University of Bristol, UKThis research is sponsored by EPSRC and done in collaboration with Queen's University Belfast

    {m.hosseinabady, j.l.nunez-yanez, }@bristol.ac.uk

    · Energy proportional computing applied to hybrid FPGA-ARM chip· This research proposes the concept of energy proportional computing

    includes scalability of a power triplet formed by · Voltage, · Frequency and · Logic (e.g. capacitance)

    · Applications in this design paradigm are written in an extended OpenCL

    1- Abstract 2- Motivations

    3-Goals

    ApplicationSpecification

    Host program(C/C++)

    Kernels(OpenCL-C or HDL)

    C/C++ files

    bitstreamsbitstreamsbitstreams

    C/C++ Compiler

    HDL files

    Vivado_HLS& Vivado

    Binary files

    PS PL

    Select a bitstream

    Kernels extraction

    1

    2

    3 4

    OpenCL-C

    OpenCL compiler

    Binary filesBinary

    filesBinary

    files

    Multi-core Processors

    Ru

    n

    Ru

    n

    Map

    5

    6 7

    8

    9

    12

    10

    11

    4-Contributions

    5-Framework

    6-Power & Energy Modelling

    (prerequisite research)

    SOC Consumer Portable Power Consumption Trends

    Sou

    rce:

    ITR

    S 2

    01

    1-S

    yste

    m D

    rive

    rs

    Xilinx ZYNQ

    Host program

    OpenCL Runtime Accelerator Accelerator Accelerator

    ARM-OpenCL

    PS PL

    ARM-OpenCL

    · Propose a new design paradigm in order to expose the concept of energy proportional computing to the application software

    · Considering the logic and power scalability in a variation-aware closed-loop configuration

    · Using a software centric approach based on OpenCL for describing applications · Using a processor-centric platform such as Xilinx All Programmable SoCs to implement

    applications

    7-Power Gating

    (prerequisite research)

    Turning off PL (tfpl) PL turned off (pltf) Turning on PL (tnpl) PL reconfiguration (reconf)

    Request for turning off the PL

    ttfpl tpltf ttnpl treconf

    Request for turning on the PL

    Restore state (rs)Store State (ss)

    tss trs

    0

    0.2

    0.4

    0.6

    0.8

    1

    1.2

    0 1000 2000 3000 4000 5000 6000

    Vo

    ltag

    e (

    V)

    Time (µs)

    0

    0.05

    0.1

    0.15

    0.2

    0.25

    0.3

    00.20.40.60.811.2

    VCCINT (volt)

    Power on VCCINT RailPower on VCCAUX Rail

    Po

    we

    r on

    (W)

    Power versus VCCINT voltage on ZC702

    PL shut-down slop

    Accelerator configuration

    freq

    uen

    cy

    volta

    ge

    A simple motivation example is considered in this subsection.The example consists of a MicroBlaze soft processor coreconfigured in the PL and the ARM processor available in thePS. The MicroBlaze runs a 32 x 32 floating point matrix multiplication.

    0

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0.14

    Running Software

    Configuration

    Before Configuration

    VCCINT VCCAUX

    Po

    we

    r (W

    )

    VCCINT

    VCCAUX

    Other power linesClock Lo

    gic

    Sign

    als

    MMCMs

    Leakage

    Xilinx XPower Analyzer Real power measurement on ZC702

    MicroBlaze power consumption

    System-level issues· Power modelling · Energy modelling· Power gating· Power scaling· Clock gating· Frequency scaling· Logic scaling

    RuntimeSystem

    12

    PL Power for ME = a*V2+b*f*V2 +c*V3

    High-Level Models should be:

    This section investigates the viability of physical power gating FPGA devices that incorporate a hardened processor in a different power domain. The run-time power gating approach is applied to Xilinx ZYNQ devices that incorporate a hardened Cortex A9 multi-processor.

    UCD9248UCD9248

    PS

    PL

    UCD9248

    I2C

    PS-PL level shifter

    PS power rails

    PL power rails

    ZYNQ all Programmable SOC

    VIV

    AD

    O &

    V

    IVA

    DO

    -HLS

    System-Level Power Control

    Role of OpenCL runtime Choosing the best configuration and voltage/

    frequency at run time. · scheduling · mapping algorithm.

    · Covers PL, PS and memories· Simple· Compositional· Descriptive

    Specially the PL High-Level Models should:· Describes different thread hardware and cores· Describes different accelerator modes

    · Idle · Data transfer· Active· Computational

    The results show that the minimum time that the FPGA fabric must remain in power-off state for the technique to be energy efficient is in the order of milliseconds and up to 96% power reduction occurs when the fabric voltage is lowered below critical level.

    Dynamic power caused by measurement modules with

    own frequency

    ME dynamic power

    Static power

    Motion Estimation

    (ME)

    MB

    PS

    PL

    ZYNQ (FPGA+ARM)

    (FPGA+ARM)

    iwocl2014.vsdPage-1