TRANSCRIPT
Power Modelling and Characterisation for Data Centers
Kristoffer Robin Stokke, PhD
FLIR UAS
Outline
• Introduction to datacenters in
smart grid
• Energy consumption and
modelling for Data Centers
• Power Saving Techniques
• Case Study: Power Modelling
for the Tegra K1 & X1
30.08.2018 3
Energy Consumption of Data Centers
• 1.1 % -> 1.5 % total (world) annual consumption (2010)
• Economic
– Cost of energy $ increases
• Environmental
– Greenhouse gas emissions
• Increasing demand for processing power & services
– TODO: Motivate with cisco forecast if possible
[1] Info-Tech, “Top 10 energy-saving tips for a greener data center,” Info-Tech Research Group, London, ON, Canada, Apr. 2010.
[2] Dayarathna, Miyuru, Yonggang Wen, and Rui Fan. "Data center energy consumption modeling: A survey." IEEE Communications Surveys & Tutorials 18.1 (2016): 732-794.
[3] Koomey, Jonathan. "Growth in data center electricity use 2005 to 2010." A report by Analytical Press, completed at the request of The New York Times 9 (2011).
Example data center energy
consumption breakdown [1]
What is Power and Energy?
Time [s]
GPU kernel
(program) launches
• Power is the rate of energy consumption
over time
• $\frac{\text{work (energy)}}{\text{time}}$ = Watts = $\frac{\text{Joules}}{\text{second}}$
• Which means that energy $E$ [Wh]..
• $E = \int_{t_1}^{t_2} P(t)\,dt$ -> area under curve
• Intuition: With a battery of 10 𝑊ℎ (watt-
hours)
• You can draw 10 W for one hour
• Or 5 W for two hours..
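A minimal sketch of the "area under curve" idea, assuming power is sampled at known timestamps and integrated with the trapezoidal rule. The function names and the sample data are illustrative and not from the slides.

```python
def energy_joules(times_s, power_w):
    """Approximate E = integral of P(t) dt with the trapezoidal rule (joules)."""
    if len(times_s) != len(power_w):
        raise ValueError("need one power sample per timestamp")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two neighbouring samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def joules_to_wh(joules):
    """1 Wh = 1 W drawn for 3600 s = 3600 J."""
    return joules / 3600.0


# Illustrative trace: a constant 10 W draw for one hour, one sample per minute.
times = [60.0 * k for k in range(61)]
power = [10.0] * 61
print(joules_to_wh(energy_joules(times, power)))
```

With this trace the result corresponds to the 10 Wh battery intuition above: 10 W for one hour.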
𝑡0 𝑡1
Smart Energy Systems and Data Centers
[1] Saad, Walid, et al. "Game-theoretic methods for the smart grid: An overview of microgrid systems, demand-side management, and smart grid communications." IEEE Signal Processing Magazine 29.5 (2012): 86-105.
[2] https://www.123rf.com/photo_52377624_stock-vector-renewable-energy-sources-vector-infographics-solar-wind-tidal-hydroelectric-geothermal-power-biofuel.html
Hydroelectric
Fossil / Nuclear
Solar
Wind
Geothermal
Data Center
Energy Storage
Fossil / Nuclear
Data Center
Data Center
Power distribution and
metering infrastructure
Solar Photovoltaic Panels
PV Panels
Data Center
Regulator /
Inverter
Limited Energy
Density Storage
Consumers
• Transient source of energy
• Limited energy storage, use now
• Predicting solar power usage
• Forecasting model
• Outputs irradiance $I$ $[W/m^2]$
• PV power model
• Actual power output
• Challenge: Predicting availability
• Scheduling decisions
• Strategic planning for workloads
• Maintenance, off-line work
Solar power from a PV farm in Jutland, Denmark
(2006), plotted over every day and time of day.
Wan, Can, et al. "Photovoltaic and solar power forecasting for smart grid energy management." CSEE Journal of Power and Energy Systems 1.4 (2015): 38-46.
Bacher, Peder, Henrik Madsen, and Henrik Aalborg Nielsen. "Online short-term solar power forecasting." Solar Energy 83.10 (2009): 1772-1783.
Solar Photovoltaic Panels (Cont.)
• Forecasting dependencies, ex:
• Solar irradiance $I$ $[W/m^2]$
• PV cell temperatures $t_0$
• Cloud cover, humidity, wind
• PV dependencies, ex:
• Panel area $S$ $[m^2]$
• Regulator efficiency $\alpha$, reflectivity..
• Prediction of solar irradiance
• Statistical, time series
• Neural networks
• Numerical Weather Prediction (NWP)
Forecasting Model -> output irradiance $[W/m^2]$ -> PV Power Model
Output PV power: $P_R = \alpha S I\,[1 - 0.05\,(t_0 - 25)]$
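The PV power model can be sketched as a small function. This is only an illustration: the coefficient 0.05 and the reference value 25 are taken as written on the slide, the cell temperature is assumed to be in degrees Celsius, and the panel values in the example are made up.

```python
def pv_power_w(alpha, area_m2, irradiance_w_m2, cell_temp_c,
               temp_coeff=0.05, ref_temp_c=25.0):
    """P_R = alpha * S * I * [1 - temp_coeff * (t0 - ref_temp)].

    temp_coeff and ref_temp_c default to the numbers shown on the slide;
    the unit of the cell temperature (deg C) is an assumption.
    """
    derating = 1.0 - temp_coeff * (cell_temp_c - ref_temp_c)
    return alpha * area_m2 * irradiance_w_m2 * derating


# Hypothetical panel: 10 m^2, efficiency 0.15, irradiance 800 W/m^2.
print(pv_power_w(0.15, 10.0, 800.0, 25.0))  # at the reference temperature
print(pv_power_w(0.15, 10.0, 800.0, 30.0))  # warmer cells, lower output
```

In a forecasting pipeline the irradiance argument would come from the forecasting model rather than from a measurement.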
Wan, Can, et al. "Photovoltaic and solar power forecasting for smart grid energy management." CSEE Journal of Power and Energy Systems 1.4 (2015): 38-46.
Bacher, Peder, Henrik Madsen, and Henrik Aalborg Nielsen. "Online short-term solar power forecasting." Solar Energy 83.10 (2009): 1772-1783.
Renewable Energy in General
• Availability changes over time, with little energy storage
– Solar, wind, hydropower (wave)
• Hydropower (dam), vehicle-to-grid
– More latent storage of energy
• Prediction of availability involves..
– A prediction of weather (wind, temperature, humidity, irradiance)
– ..which is fed to a model mapping environment to power
• Alternatively, just a statistical model (no actual physics)
• Challenging weather conditions (climate change?)
– Norway’s summer 2018 marked by exceptionally long dry periods and
warm weather.
– Meanwhile, it’s the wettest summer in Iceland.
– Opportunity for smart grid to use our resources smarter?
May 2018 marked the warmest weather
ever measured in Oslo. Picture from the
meteorological institute at Blindern.
Source: Aftenposten.
Data Center Components
• Mostly power from grid
• Diesel, solar, wind, hydrogen..
• Cooling
• CRAC (Computer
Room A/C)
• CRAH (Computer
Room Air Handler)
• Redundancy (UPS)
• Racks (IT equipment)
• Processors
• Storage
• Network
• Mainboards
• Air cooling through vents
• Lighting
Dayarathna, Miyuru, Yonggang Wen, and Rui Fan. "Data center energy consumption modeling: A survey." IEEE Communications Surveys & Tutorials 18.1 (2016): 732-794.
Simula’s Data Center
• Thermal picture + count of different hardware
• Touch/Explain heterogeneity
• Touch/Explain difficulties measuring all of it
Data Centers: Overview
Processors
CPU / GPU
Storage
HDD / SSD
Network /
Interconnect
Power
Conversion
Guest
OS
VM
Monitoring
Virtual Hardware
Disk / CPU / NIC
Host OS
Service Applications
Internet
• Users access datacenter services
over the internet
• Requests handled by running
applications
• Applications usually run in virtual
environments
• Virtual CPU, GPU, disk, etc.
• Sand-box
Spotify, Dropbox, Browsing
Virtual Machine
Physical Machine
Challenges for Data Centers in Smart Energy Systems
• Worldwide energy consumption and the (climate) bill are large and increasing
• Large cooling and processing energy consumption
• Macroscopic view: smart energy systems can help these issues
• Buy and use cleaner / cheaper energy
• Partitioning / migrating tasks / virtual machines to other centers
• Delaying work until cleaner / cheaper energy is available
• Microscopic view: can we utilise the resources in a data center more efficiently?
• More energy-efficient processing and cooling
• Mandates understanding of energy consumption (models)
• Predicting energy availability (when / how much ± uncertainty)
• Understanding how the datacenter components consume energy
• Identify tradeoffs between performance and power usage
Managing Energy Consumption in Data Centers
Feature extraction
Model Construction
Prediction / Validation
Model Usage
• Real / simulated system
• Measure component power
• Identify important consumers
• Cannot measure all
subcomponents (!)
• Need models for
research (!)
• Build models for
• Power, user
behaviour, weather
• Formal abstraction of
system
• Power model directs
optimisation approaches
• Scheduling, DVFS, task placement etc
• Validation, robustness, correctness
• Later, predicting
Heterogeneity
Measuring Power
• PDU
• Individual socket readout
J. Smith, A. Khajeh-Hosseini, J. Ward, and I. Sommerville, “Cloudmonitor: Profiling power usage,” in Proc. IEEE 5th CLOUD Comput., Jun. 2012, pp. 947–948.
https://www.rackmountmart.com
http://rsta.royalsocietypublishing.org/content/372/2018/20130278
• IBM POWER7 server
• Automated System for Temperature and Energy Reporting
• Integrated Lights-Out
• Remote server power
monitoring
Power and Synchronisation
[1] http://mlab.no/blog/2015/08/a-peek-in-the-lab-tegra-k1-power-and-voltage-measurements/
[2] Rice, Andrew, and Simon Hay. "Decomposing power measurements for mobile devices." Pervasive Computing
and Communications (PerCom), 2010 IEEE International Conference on. IEEE, 2010.
• Very few authors consider synchronisation
• The problem:
• «You» are the machine to be measured
• «Logging» is done externally
• There is latency between «your» events
and the actual measurements of the
effects of those events (timestamps)
• Causes
• Uneven time synchronisation between
«you» and the «logger»
• Internal latencies in the measuring device
• Electrical capacitance smooths out current
signature
«You»
«Logger»
«Measurement»
Rate-Based Power Models
• Power is correlated with utilisation levels (events per second) and
summed
– E.g. rate at which instructions are executed, or rate of cache misses
– Limited by availability of measurement
• Challenge: what are good hardware activity predictors?
– Exercise in learning how hardware works
Xiao, Y. et. al., 2010. A System-Level Model for Runtime Power Estimation on Mobile Devices.
Dong, M. and Zhong, L., 2011. Self-Constructive High-Rate System Energy Modeling for Battery-Powered Mobile Systems.
S. Li et al., “The MCPAT framework for multicore and manycore architectures: Simultaneously modeling power, area, and timing,” ACM Trans. Archit. Code Optim., vol. 10, no. 1, pp. 5:1–5:29, Apr. 2013.
[1] T. Li and L. K. John, “Run-time modeling and estimation of operating system power consumption,” in Proc. ACM SIGMETRICS Int. Conf. Meas. Model. Comput. Syst., 2003, pp. 160–171.
$P_{tot} = \beta_0 + \sum_{i=1}^{N_p} \beta_i \rho_i$
• $\beta_0$: constant base power
• $\rho_i$: events per second
• $\beta_i$: cost [W per (event per second)]
Example breakdown for OS functions. [1]
State-Based Power Models
• Abstracting hardware into set of states $S$
• Ex. CPU core: Off, Idle, Active
• Each state has a constant power draw 𝑷𝒔
• Often accompanied by transition costs
• Transitions also cost time
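$E_{comp,S} = \sum_{s \in S} P_s T_s + \sum_{u,v \in S} C_{u,v}\, n_{u,v}$
• $E_{comp,S}$: energy of the component (with state set $S$)
• $T_s$: total time spent in state $s$
• $C_{u,v}$: energy cost of transition from state u->v
• $n_{u,v}$: number of transitions from u->v
CPU core states: Off, Idle, Active
$P_{off} = 0\ W$, $P_{idle} = 15\ mW$, $P_{act} = 600\ mW$
$C_{act \to idle} = 2\ nWh$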
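The state-based energy sum can be sketched as follows. The per-state powers and the transition cost are the CPU core values from this slide; the residency times and the transition count are hypothetical, chosen only to make the example run.

```python
NWH_TO_J = 3.6e-6  # 1 nWh = 1e-9 W * 3600 s


def state_energy_j(state_power_w, state_time_s, transition_cost_j, transition_count):
    """Residency energy (sum of P_s * T_s) plus transition energy (sum of C_uv * n_uv)."""
    residency = sum(p * state_time_s.get(s, 0.0) for s, p in state_power_w.items())
    transitions = sum(c * transition_count.get(t, 0) for t, c in transition_cost_j.items())
    return residency + transitions


power_w = {"off": 0.0, "idle": 0.015, "active": 0.600}   # from the slide
cost_j = {("active", "idle"): 2 * NWH_TO_J}               # from the slide
time_s = {"off": 10.0, "idle": 50.0, "active": 40.0}      # hypothetical residency
count = {("active", "idle"): 1000}                        # hypothetical count

print(state_energy_j(power_w, time_s, cost_j, count))
```

With these numbers the residency term dominates; transition costs only matter when the component switches state very frequently.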
Summary of Power Model Types
Regression is the «de facto» method to estimate model coefficients!
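As an illustration of regression for a rate-based model, the sketch below fits $\beta_0$ and a single $\beta_1$ with ordinary least squares in closed form. It assumes one predictor only, and the data is synthetic; a model with several predictors would need a multivariate least-squares solver instead.

```python
def fit_rate_model(rates, powers):
    """Least-squares fit of P = beta0 + beta1 * rho for a single predictor rho."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_p = sum(powers) / n
    sxx = sum((r - mean_r) ** 2 for r in rates)
    if sxx == 0.0:
        raise ValueError("predictor never varies, coefficients are not identifiable")
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(rates, powers))
    beta1 = sxy / sxx
    beta0 = mean_p - beta1 * mean_r
    return beta0, beta1


# Synthetic measurements: 1.5 W base power plus 2 nJ per instruction.
rates = [0.0, 1e8, 2e8, 4e8]                 # instructions per second
powers = [1.5 + 2e-9 * r for r in rates]     # watts
beta0, beta1 = fit_rate_model(rates, powers)
print(beta0, beta1)
```

The guard on a constant predictor hints at why training data must vary the hardware activity: without variation the coefficients cannot be separated from the base power.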
Central / Graphical Processing Units
Generic CPU multicore architecture.
Typical predictors 𝜌𝑖 (access rates):
• Instructions
• Floating point, integer, branch
• Cache, L1+L2(+L3)
• Cache references
• Cache misses
• Texture (GPU)
• RAM (GPU-integrated)
• Dedicated counters
• Clock frequency
• Usually synchronous
• Some predictors indirectly
«measure» off-chip energy
consumption!
Rate-based model: $P_{tot} = \beta_0 + \sum_{i=1}^{N_p} \beta_i \rho_i$
DRAM Power : Typical Rate-Based Models
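$P_{ddr} = P_{static} + \alpha_{read} u_{read} + \alpha_{write} u_{write}$
$P_{ddr} = D S R \sigma + E_{rw} \rho_{rw} + D E_{ap} f_{ap}$
• Read / write throughput: $u_{read}$, $u_{write}$ (first model), $\rho_{rw}$ (second model)
• Static power: $P_{static}$ (first model), $D S R \sigma$ (second model)
• $E_{rw}$: read/write energy per bit
• $E_{ap}$: energy needed to activate / precharge a row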
Other predictors
• Refresh cycles
• Idle & active power
• What about clock frequency?
J. Lin, H. Zheng, Z. Zhu, H. David, and Z. Zhang, “Thermal modeling and management of DRAM memory systems,” SIGARCH Comput. Archit. News, vol. 35, no. 2, pp. 312–322, Jun. 2007.
N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, and W. Ye, “Energy-driven integrated hardware-software optimizations using simplepower,” SIGARCH Comput. Archit. News, vol. 28, no. 2, pp. 95–106, May 2000.
Rotating Disks
Power of rotating disk
Angular velocity, Radius, Platters
Read energy ∝ 𝐿3 (logical block number)
Rate-based model (active, seek, idle)
Time in idle I
(kind of state-based)
Requests N Seeks N
Rotating HDD states and transitions.
Solid State Drives
SSD SLC (Single-Level Cell) state diagram.
• Transistor-based
• SLC and MLC
• Same types as found in
embedded (EMMCs)
• Rate-based model predictors
• Transition costs
• Program <-> Read
• Program <-> Write
• Utilisation
• Programming sector
• Reading sector
• Erasing sector
SSD MLC (Multi-Level Cell) state diagram.
Network Switches (Ethernet)
Switch ∈ 𝑉
Switch ∈ 𝑉
Switch ∈ 𝑉
N
N
N
NN
Link ∈ 𝐸
State-based model (!) :
Cost of active link (u,v) [W] Cost of active switch (u) [W]
Link / Switch active?
𝑃𝑛𝑒𝑡
Rate-based model for per-bit-energy in switch network:
Bit-processing energy in stage i
Bit-processing energy in output of stage i
Bit-processing energy in access network
Routers / Network Interfaces
Router / switch architecture
$P_{router} = P_{base} + E_{pkt} R_{pkt} + E_{sf} R_{byte}$
• $E_{pkt}$: processing energy per packet
• $E_{sf}$: per-byte store and forward energy
$P_{base} = P_{ctrl} + P_{env} + P_{data}$
• Control Plane, Environmental Plane, Data Plane
• Conventional (ethernet)
• $P_{eth} = T_{idle} P_{idle} + (T_{active} P_{idle})\,\rho$
• Router
Power Distribution Loss
$P_{ups/pdu} = P_{idle} + \alpha P_{delivered}$
• $P_{delivered}$: power delivered by UPS or PDU to consumers
• $\alpha$: inefficiency constant
Power Delivery
According to this, the only way to reduce loss in a UPS or PDU
is to deliver less power.
An Example of Cooling Failure
• Picture of processor temperature
Cooling Systems - Inefficiency
$P_{ac} = \frac{P_{tot}}{CoP(T_{sup})}$
The higher supplied temperature, the
worse energy performance of the A/C
unit
(in example: HP Labs CRAC units)
Tang, Qinghui, Sandeep Kumar S. Gupta, and Georgios Varsamopoulos. "Energy-efficient thermal-aware task scheduling
for homogeneous high-performance computing data centers: A cyber-physical approach." IEEE Transactions on Parallel
and Distributed Systems 19.11 (2008): 1458-1472.
• $P_{tot}$: computing power
• $P_{ac}$: Computer Room AC (CRAC) power
$CoP = \frac{Q}{W}$
• $Q$: heat removed [W]
• $W$: work required [W]
Minimising Contribution to Heat Buildup
• n chassis
• Each chassis contributes to heating
up inlet temperatures
• Goal
• Reschedule VMs on machines
• Minimise total contribution to inlet
temperature
• Model for each chassis’ contribution
to inlet temperature
Inlet temperature
Chassis idle power dissipation b
Active (100 %) processor power dissipation a
Task distribution vector c
$t_{sup}$
Unknown: D [degrees / watt]
• Hybrid cooling (CRAH)
• Chiller active
• Free cooling
• Challenge
• Dynamically select approach
• Few cooling transitions
• Dynamically position VMs
Cooling power breakdown (model)[1]
[1] S. Ghosh, S. Chandrasekaran, and B. Chapman, “Statistical modeling of power/energy of scientific kernels on a
multi-gpu system,” in Proc. IGCC, Jun. 2013, pp. 1–6.
Hybrid Cooling
Dynamic Shutdown of Servers
• Core idea
• Turn off physical machines
• Eliminate static / dynamic power
• VMs migrated between PMs
• Threshold value controls decision
• Can impair QoS and violate SLA
• Question:
• Is it a good idea to impose a lot of
processing on a single PM?
• (contention, voltage scaling, locking /
other resource usage, disk etc)
CPU Util [%]
Upper threshold
Lower threshold
Too low utilisation.
About to lose all its VMs,
get powered off
Too high utilisation.
VMs will get migrated
to other PMs.
Beloglazov, Anton, and Rajkumar Buyya. "Adaptive threshold-based approach for energy-efficient
consolidation of virtual machines in cloud data centers." MGC@ Middleware. 2010.
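The two-threshold decision in the figure can be sketched as a simple classifier per physical machine (PM). This is an illustration only: the threshold values below are hypothetical placeholders, and the cited approach adapts its thresholds rather than fixing them.

```python
def classify_pm(cpu_util_pct, lower_pct=30.0, upper_pct=80.0):
    """Return the action for one physical machine given its CPU utilisation.

    lower_pct / upper_pct are hypothetical fixed thresholds.
    """
    if not 0.0 <= cpu_util_pct <= 100.0:
        raise ValueError("utilisation must be within 0..100 %")
    if cpu_util_pct < lower_pct:
        return "migrate all VMs away, power off"
    if cpu_util_pct > upper_pct:
        return "migrate some VMs to other PMs"
    return "keep as is"


for util in (12.0, 55.0, 93.0):
    print(util, "->", classify_pm(util))
```

A wider gap between the thresholds means fewer migrations, at the price of keeping more lightly loaded machines powered on.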
VM Provisioning at Perspective
• Re-allocate VMs to..
• ..turn off physical machines.
• ..minimise traffic overhead,
and avoid hotspots.
• ..minimise contribution to heat
buildup.
• Predict future workload
• Validation methodology (usually
simulations)
• PlanetLab
• CloudSim
• GreenCloud
Kliazovich, Dzmitry, Pascal Bouvry, and Samee Ullah Khan. "DENS: data center energy-efficient network-aware scheduling." Cluster computing 16.1 (2013): 65-75.
Liu, Haikun, et al. "Performance and energy modeling for live migration of virtual machines." Proceedings of the 20th international symposium on High performance distributed computing. ACM, 2011.
Maximise traffic overhead
Maximise contribution to heat
Prevent nodes from shutting off
They actually don’t
work together at all
High-Precision Power Modelling for the Tegra K1 & X1 SoCs
Jetson TX2 / TX1 blade server.
System-on-Module (SoMs)
mounted in a 1U server.
http://connecttech.com/product/jetson-tx2-tx1-array-server/
Tegra K1/X1: Heterogeneous Multicore 28 nm SoC
• Tegra family of mobile Systems-on-Chip
(SoC), < 12 W power usage
• (Tegra 2, 3, 4..)
• Tegra K1 & Tegra X1
• Programmable GPU (CUDA)
• Power management capabilities
                        | Tegra K1            | Tegra X1
CPU (High Performance)  | 4 x ARM Cortex-A15  | 4 x ARM Cortex-A57
CPU (Low Power)         | 1 x ARM Cortex-A15  | 4 x ARM Cortex-A53
GPU                     | 192-Core Kepler     | 256-Core Maxwell
Memory                  | 2 GB (Jetson-TK1)   | 4 GB (Jetson-TX1)
CMOS Devices : Static and Dynamic Power
$P_{rail} = P_{stat} + P_{dyn}$
$P_{stat} = V_{rail} I_{leak}$, $P_{dyn} = \alpha C V_{rail}^2 f$
• Power on a rail can be described using
the standard CMOS equations
• Rail voltage $V_{rail}$
• Increases with clock frequency
• Total power
• ..is the sum of power of all rails
Transistor leakage
Capacitive load per cycle
Cycles per second
Tegra K1 SoC power distribution.
(FYI: billions of CMOS transistors)
Clock Frequency and Rail Voltage
• Clock frequency, rail voltage and power
usage are deeply coupled
• For certain clocks...
• ...increasing clock frequency increases
voltage, and vice versa
• From previous slide: power ∝ $V^2$
Measured Average Power
Measured GPU Rail Voltage
$P_{stat} = V_{rail} I_{leak}$, $P_{dyn} = \alpha C V_{rail}^2 f$
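A small numeric sketch of the coupling between clock frequency and rail voltage, using the standard CMOS equations above. All operating points and constants below are hypothetical and only meant to show the trend.

```python
def rail_power_w(v_rail, i_leak_a, alpha, c_load_f, freq_hz):
    """P_rail = V * I_leak + alpha * C * V^2 * f (static plus dynamic power)."""
    p_stat = v_rail * i_leak_a
    p_dyn = alpha * c_load_f * v_rail ** 2 * freq_hz
    return p_stat + p_dyn


# Hypothetical operating points: doubling the clock also requires a higher voltage.
low = rail_power_w(v_rail=0.8, i_leak_a=0.1, alpha=0.5, c_load_f=1e-9, freq_hz=5e8)
high = rail_power_w(v_rail=1.0, i_leak_a=0.1, alpha=0.5, c_load_f=1e-9, freq_hz=1e9)
print(low, high, high / low)
```

In this made-up example the clock doubles while power grows by a factor of 2.5, because the voltage increase enters the dynamic term quadratically and the static term linearly.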
Building High-Precision Power Models
• Main innovation
– Express switching activity in terms of measurable hardware activity
– Consider voltages on all rails
– Consider core- and rail-gating
• What constitutes good hardware activity predictors?
– 𝜌𝑅,𝑖 can be cache misses, cache writebacks, instructions, cycles..
– Should cover all hardware activity on a rail
$P_{rail} = V_{rail} I_{leak} + \sum_{i=1}^{N_R} C_{R,i}\, \rho_{R,i}\, V_R^2$
• $N_R$: number of utilisation predictors on rail R
• $C_{R,i}$: capacitive load per event per second
• $\rho_{R,i}$: hardware utilisation predictor (events per second)
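The per-rail model can be evaluated as below. This is a sketch under assumptions: the predictor names, capacitance coefficients, rates, voltage and leakage current are hypothetical and are not the fitted Tegra coefficients.

```python
def rail_power_w(v_rail, i_leak_a, cap_per_event, rates_per_s):
    """P_rail = V * I_leak + sum_i(C_i * rho_i) * V^2 for one rail."""
    switching = sum(c * rates_per_s[name] for name, c in cap_per_event.items())
    return v_rail * i_leak_a + switching * v_rail ** 2


# Hypothetical coefficients and one hypothetical sample of the predictors.
cap = {"cycles": 1e-10, "l2_misses": 5e-9}
rates = {"cycles": 1e9, "l2_misses": 1e7}
print(rail_power_w(v_rail=1.0, i_leak_a=0.2, cap_per_event=cap, rates_per_s=rates))
```

Total platform power would then be the sum of this expression over all rails, each with its own voltage and predictor set.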
Memory Controller
Clock cycles per second
LP
Core
32K $D$I
512MB L2
HP
Core
32K $D$I
HP
Core
32K $D$I
HP
Core
32K $D$I
HP
Core
32K $D$I
2048 MB L2
L2 – RAM cache traffic
Rail voltage
Rail voltage
Clock cycles
1 GB DDR3
External Memory Controller
128 KB L2
64 KB L1
Rail voltage
Clock cycles
Cache reads
Cache r/w
Instructions
• Integer
• Single-precision floating point
• Double-precision floating point
• Conversion
• Control
• Misc
Global instructions
Core gating
1 GB DDR3Active clock cycles
Rail voltage
Model Predictors – Overview
* utilisation units in [events per second]
L1 – L2 cache traffic
Rail gating
Rail gating
Model Training Methodology
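[Table: benchmarks (rows) versus components under explicit stress (columns), grouped into CPU and GPU]
Components under explicit stress: RAM, L1-L2, L2-RAM, INT, FPU, NEON, L2, L1, INT, F32, F64, CNV, MISC
CPU benchmarks: Idle CPU, L1 wb, L1 refill, Mmul-int, Mmul-f32, Mmul-neon
GPU benchmarks: L2 read, L1 read, L1 write, RAM, Integer, Single-precision, Double-precision, Conversion, Misc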
Benchmarks
For each hardware config
For each mem-frequency
For each [C/G]PU-frequency
• run ( benchmark)
• Sample voltages
• Sample predictors
• Sample power
• Advantages
– Ensures diversity in model predictors
(hardware access rates)
• Very helpful for regression
– Triggers changes in voltages across the
platform
– Helps vary hardware utilisation and model
predictors
• Disadvantages
– Takes a long time to run
• X CPU configurations
• Y GPU configurations
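The nested training loop can be sketched with `itertools.product`. Everything platform-specific is an assumption here: `run_benchmark` stands in for whatever launches a benchmark and samples voltages, predictors and power, and `fake_run` fabricates numbers so the sketch runs without hardware. The configuration names and frequencies are made up.

```python
from itertools import product


def collect_training_samples(hw_configs, mem_freqs_mhz, proc_freqs_mhz,
                             benchmarks, run_benchmark):
    """Run every benchmark at every configuration and record one sample each."""
    samples = []
    for hw, mem_f, proc_f, bench in product(hw_configs, mem_freqs_mhz,
                                            proc_freqs_mhz, benchmarks):
        voltages, predictors, power_w = run_benchmark(hw, mem_f, proc_f, bench)
        samples.append({"hw": hw, "mem_mhz": mem_f, "proc_mhz": proc_f,
                        "benchmark": bench, "voltages": voltages,
                        "predictors": predictors, "power_w": power_w})
    return samples


def fake_run(hw, mem_f, proc_f, bench):
    # Placeholder measurement: invented numbers, no real hardware access.
    v = 0.8 + proc_f / 10000.0
    return {"cpu": v}, {"cycles": proc_f * 1e6}, 1.0 + v * v


samples = collect_training_samples(["1 core", "4 cores"], [204, 924],
                                   [500, 1000, 2000], ["idle", "mmul-int"],
                                   fake_run)
print(len(samples))
```

The sample count is the product of all list lengths, which is why the full sweep takes a long time to run on a real platform.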
Model Coefficient Comparison
Leakage Currents
CPU Dynamic Power
CPU Instruction Power GPU Dynamic Power
Leakage in CMOS: Subthreshold leakage (Tegra X1)
• Subthreshold leakage
• Transistor off-state leakage
• The smaller gate width W..
• The more significant it is
• Current flows from source to drain..
• Temperature-dependent (𝑽𝟎)
source
drain
gate
Thermal voltage given by temp T
Transistor supply voltage
Transistor threshold voltage
number of transistors * gate width
Tegra X1 power versus average SoC
temperature and GPU voltages.
Decent indication of
temperature-dependent
leakage
Leakage in CMOS: Gate-Oxide Leakage (Tegra X1)
• Gate-Oxide leakage
• Transistor off-state leakage
• The smaller gate width W..
• The more significant it is
• Quantum-tunneling
• Current flows through di-electric layer
• S-D channel -> gate
source
drain
gate
Transistor supply voltage Oxide thickness
Power usage over voltages and different
average SoC temperature ranges.
A Subthreshold Leakage Model for the X1
Thermal voltage coefficient 𝜶 𝟑.𝟎𝟏 ∗ 𝟏𝟎−𝟑
𝑁𝑡𝑟𝑎𝑛𝑠 ∗ 𝐾𝑤𝑖𝑑𝑡ℎ (Core rail) 11.20
𝑁𝑡𝑟𝑎𝑛𝑠 ∗ 𝐾𝑤𝑖𝑑𝑡ℎ (GPU rail) 22.31
𝑁𝑡𝑟𝑎𝑛𝑠 ∗ 𝐾𝑤𝑖𝑑𝑡ℎ (Per-CPU core) 5.44
Estimated and measured GPU (idle) power usage.
Example leakage over temperature and voltage ranges.
• Modelled using non-linear least
squares solver in python
• Validated using dedicated power
measurement sensors on board
Video Processing Filters
• Drone processing live video
stream
– Debarreling
– Frame rotation
– Motion vector search
– Compression (DCT)
– Quantisation
– Entropy encoding
• Goal: divide workloads between
cores
– To achieve high energy efficiency
Rotation
filter
60 FPS
«Shaky video»
Frame
stream
Debarrel filter
Performance Per Watt (PPW)
Workloads: SIFT, BLAS, face
recognition and tracking
Processor A
CPU
Processor B
GPU, DSP, ...
System-on-Chip
• Tegra 2
• Tegra 250
• Tegra 3
• Samsung S4
• Samsung Note II
• Nexus 7
• OMAP 3530
• Reported
• Increased performance
• Increased performance-per-
watt
• Performance Per Watt (PPW)
• E.g. frames per second per
watt
• Common methodology:
• Measure performance and
power
• Test duration: until done
(!)
Example: DCT
100 % CPU 100 % GPU
NVIDIA
GK20A
ARM
Cortex A15
DCT frame
30 %
8x8
macroblock
• Process 80 DCT frames
• 1920x1080
• Can offload macroblocks
• CPU GPU
• Fixed
• Number of CPU cores
• Frequencies
• Test runs until 80 frames
processed
• Power measured for
this duration
Highest PPW
(or is it?)
To GPU
Example: DCT
Detailed power
component breakdown
• 10 % offloading seems better (PPW) because:
• At 10 % offloading, benchmarks finish earlier -> less idle
(unavoidable) energy
• 100 % GPU offloading is now best
• Less instruction energy
• Always run energy benchmarks for the same duration
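A worked sketch of why the measurement window matters for PPW. The numbers are hypothetical: configuration A finishes the frames early at higher power, configuration B finishes later at lower power, and the platform keeps drawing idle power until the end of the window.

```python
def frames_per_joule(frames, energy_j):
    """PPW expressed as frames per joule (equivalent to FPS per watt)."""
    return frames / energy_j


def energy_until_done(active_power_w, runtime_s):
    """Energy when measurement stops as soon as the benchmark finishes."""
    return active_power_w * runtime_s


def energy_fixed_window(active_power_w, runtime_s, idle_power_w, window_s):
    """Energy over a fixed window: active phase plus unavoidable idle phase."""
    if runtime_s > window_s:
        raise ValueError("window must cover the whole run")
    return active_power_w * runtime_s + idle_power_w * (window_s - runtime_s)


FRAMES, IDLE_W, WINDOW_S = 80, 2.0, 10.0      # hypothetical values
configs = {"A": (6.0, 8.0), "B": (5.0, 10.0)}  # (active power W, runtime s)
for name, (p, t) in configs.items():
    done = frames_per_joule(FRAMES, energy_until_done(p, t))
    window = frames_per_joule(FRAMES, energy_fixed_window(p, t, IDLE_W, WINDOW_S))
    print(name, round(done, 3), round(window, 3))
```

Measured until done, A looks more efficient than B; measured over the same 10 s window, the ranking flips because A's idle energy is now counted.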
Base power
Single Filter: DCT Offloading
• Offloading 10 % to CPU
• Reduced GPU clock
power
• Reduced GPU
voltage
• Increased CPU clock
and instruction power
• 5 % energy saving
• (only)
Lower frequencies
Multiple Filter: DCT Offloading
• CPU busy
• Feature search
• Huffman encoding
• No saving possible
• CPU frequency: 1.8 GHz
• CPU voltage: 0.94 V (0.82 V in idle)
• All instructions & cycles cost more!
$P_{cpu} \propto V_{cpu}^2$
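A quick check of what the voltage difference means, assuming dynamic cost per event scales with the square of the rail voltage as stated above. The two voltages are the ones on this slide; the function name is illustrative.

```python
def dynamic_cost_ratio(v_busy, v_idle):
    """Relative cost per instruction or cycle when the rail runs at v_busy instead of v_idle."""
    return (v_busy / v_idle) ** 2


ratio = dynamic_cost_ratio(0.94, 0.82)
print(round(ratio, 3))
```

Under this assumption every instruction and cycle costs roughly 31 % more at 0.94 V than it would at 0.82 V, which is why no saving is possible while the CPU is already busy at 1.8 GHz.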
The Right Core for the Right Job
• Better to select a single core for a
job
• Offloading beneficial in very
constrained and specific cases
• Filters on the right process full-HD
video at 25 FPS
• Implementations have an affinity
with different cores
• Tightly coupled with performance
• Better performance; fewer
cycles
(25 FPS QoS requirement)