Lecture 7: Duty cycling
MO801/MC972 – Energy-Aware Computing
Lucas Wanner – IC/[email protected]/eac
Lucas Wanner – IC/Unicamp Energy-Aware Computing 2
Agenda
• Revision: variability and dark silicon
• Duty cycling
  • Concept and basic formulation
  • Variable power consumption
  • Duty cycling OS
Revision: variability
• Definition: “systematic and random variations in process, supply voltage and temperature” [Borkar, 2003]
  • Manufacturing beyond 90nm becomes probabilistic instead of deterministic
  • Transistors with different channel length and threshold voltage
• Expanded definition: Variations between identically specified components due to manufacturing (process, vendors), environment (voltage, temperature), and aging
• Effects of variability
  • Performance characteristics, e.g. clock speed
  • Reliability, e.g. device lifetime, error characteristics, gradual degradation
  • Power: Active (switching) and Sleep (leakage) power varies between parts with identical specifications
Revision: variability
• To ensure effective use by software, we need accurate characterization (of performance, power).
• Variability imposes a limit on how accurate the models can get
  • Mean error ~20% + 12% due to variability for 34% overall error in Nehalem 45nm CPUs
  • 15-20% variation across 22 DIMMs
  • 20-24% read, 40-67% write variation in Flash
  • Rooted in inherent non-observability of power states.
• New regime of hardware/software operation
  • Machines built from parts with variations in performance, power and reliability
  • Machines that incorporate sensing circuits
  • Machines w/ interfaces to change ongoing computation & structures
  • New machine models: QoS or Relaxed Reliability parts
Source: McCullough, UCSD
Adapted from Gupta, Variability Expedition
Revision: Dennard scaling
Dennard: “We can keep power consumption constant”
Per process generation, with S = 1.4:
• S² = 2x more transistors
• S = 1.4x faster transistors
• S = 1.4x lower capacitance
• Scale Vdd by S = 1.4x: S² = 2x
• Potential performance gain: S³ ≃ 2.8
• Leakage issues prevent voltage scaling!
Adapted from Taylor, UCSD
Revision: post-Dennard scaling
solution to dark silicon;[3] it is merely industry’s initial, transitional response to the shocking onset of the dark silicon age. Increasingly over time, the semiconductor industry is adapting to this new design regime, realizing that multicore chips will not scale as transistors shrink and that the fraction of a chip that can be filled with cores running at full frequency is dropping exponentially with each process generation.[1,3] This reality forces designers to ensure that, at any point in time, large fractions of their chips are effectively dark, either idle for long periods of time or significantly underclocked. As exponentially larger fractions of a chip’s transistors become darker, silicon area becomes an exponentially cheaper resource relative to power and energy consumption. This shift calls for new architectural techniques that “spend” area to “buy” energy efficiency. This saved energy can then be applied to increase performance, or to have longer battery life or lower operating temperatures.

The utilization wall that causes dark silicon

Table 1 shows the derivation of the utilization wall[1] that causes dark silicon.[2,3] It employs a scaling factor, S, which is the ratio between the feature sizes of two processes (for example, S = 32/22 = 1.4x between 32 and 22 nm). In both Dennard and post-Dennard scaling, the transistor count scales by S², and the transistor switching frequency scales by S. Thus, our net increase in computing performance is S³, or 2.8x.

However, to maintain a constant power envelope, these gains must be offset by a corresponding reduction in transistor switching energy. In both cases, scaling reduces transistor capacitance by S, improving energy efficiency by S. In Dennard scaling, we can scale the threshold voltage and thus the operating voltage, which yields another S² energy-efficiency improvement. However, in today’s post-Dennard, leakage-limited regime, we cannot scale threshold voltage without exponentially increasing leakage, and as a result, we must hold operating voltage roughly constant. The end result is a shortfall of S², or 2x per process generation. This shortfall multiplies with each process generation, resulting in exponentially darker silicon over time.

This shortfall prevents multicore from being the solution to scaling.[1,3] Although advancing a single process generation would allow enough transistors to increase core count by 2x, and frequency could be 1.4x faster, the energy budget permits only a 1.4x total improvement. Per Figure 1, across two process generations (S = 2), designers could increase core count by 2x leaving frequency constant, or they could increase frequency by 2x leaving core count constant, or they could choose some middle ground between the two. The remaining 4x potential remains inaccessible.

More positively stated, the true new potential of Moore’s law is a 1.4x energy-efficiency improvement per generation, which could be used to increase performance by 1.4x. Additionally, if we could somehow make use of dark silicon, we could do even better.

Although the utilization wall is based on a first-order model that simplifies many factors, it has proved to be an effective tool for designers to gain intuition about the future, and has proven remarkably accurate (see the sidebar “Is Dark Silicon Real? A Reality Check”). Follow-up work[6-8] has looked at extending this early work[1,3] on dark silicon and multicore scaling with more sophisticated models that incorporate factors such as application space and cache size.

Dark silicon misconceptions

Let’s clear up a few misconceptions before proceeding. First, dark silicon does not mean blank, useless, or unused silicon; it’s just
Table 1. Dennard vs. post-Dennard (leakage-limited) scaling.[1] In contrast to Dennard scaling,[5] which held until 2005, under the post-Dennard regime, the total chip utilization for a fixed power budget drops by S² with each process generation. The result is an exponential increase in dark silicon for a fixed-sized chip under a fixed area budget.

Transistor property        | Dennard | Post-Dennard
Δ Quantity                 | S²      | S²
Δ Frequency                | S       | S
Δ Capacitance              | 1/S     | 1/S
Δ V_DD²                    | 1/S²    | 1
⇒ Δ Power = Δ QFCV²        | 1       | S²
⇒ Δ Utilization = 1/Power  | 1       | 1/S²
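As a worked check of Table 1, the rows can be multiplied out for one process generation. This is only a sketch of the arithmetic already implied by the table, using S = 1.4 as in the excerpt; the subscripted labels are introduced here for illustration.

```latex
\begin{align*}
\Delta\mathrm{Power}_{\text{Dennard}}
  &= \Delta Q \cdot \Delta F \cdot \Delta C \cdot \Delta V_{DD}^2
   = S^2 \cdot S \cdot \frac{1}{S} \cdot \frac{1}{S^2} = 1 \\
\Delta\mathrm{Power}_{\text{post-Dennard}}
  &= S^2 \cdot S \cdot \frac{1}{S} \cdot 1 = S^2 \approx 2 \quad (S = 1.4) \\
\Delta\mathrm{Utilization}_{\text{post-Dennard}}
  &= \frac{1}{\Delta\mathrm{Power}_{\text{post-Dennard}}} = \frac{1}{S^2} \approx 0.5
\end{align*}
```

Under a fixed power budget, roughly half of the chip can run at full frequency after one generation, and the fraction halves again with each further generation.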
SEPTEMBER/OCTOBER 2013 9
Revision: approaches to handling Dark Silicon
• Dim silicon
  • Heavily underclocked parts of the chips
  • Inherently dark areas, e.g. caches
  • Turbo-boost: increase clock for short bursts of time
  • Near-threshold voltage computing (NTV)
    • Higher susceptibility to PVT, leakage
  • Temporal dimness: e.g. switching between cores in big.LITTLE designs
• Specialization: Accelerators, specialized cores
  • Parallel with human brain
  • Very dark, low duty cycle, low voltage operation
Duty cycling
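[Figure: timeline alternating between active and sleep states; c is the active (computation) time within each period p]
Δ = c/p
↑ Δ ⇒ ↑ Quality, ↑ Energy (c ↑, p ↓)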
Duty cycle rate
• How can you determine duty cycle as a function of PA, PS, E, L?
[Figure: Active Power (PA), Sleep Power (PS), Lifetime (L), Energy (E)]
Determining the lifetime for a given duty cycle
• Average power used by an application
  • PA: Active Power
  • PS: Sleep Power
  • Δ: Duty Cycle Rate
• Energy storage and lifetime
  • E: Battery capacity in Watt-Hours
  • L: Lifetime in hours

$P_{average} = \Delta P_A + (1 - \Delta) P_S$
$L = \frac{E}{P_{average}}$
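The two formulas on this slide can be sketched in C as follows. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; units follow the slide (watts, watt-hours, hours).

```c
#include <assert.h>

/* Average power (W) for a duty cycle rate delta in [0, 1]:
 * P_average = delta * P_A + (1 - delta) * P_S */
double average_power(double delta, double p_active, double p_sleep)
{
    assert(delta >= 0.0 && delta <= 1.0);
    return delta * p_active + (1.0 - delta) * p_sleep;
}

/* Lifetime (hours) of a battery holding energy_wh watt-hours:
 * L = E / P_average */
double lifetime_hours(double energy_wh, double delta,
                      double p_active, double p_sleep)
{
    double p_avg = average_power(delta, p_active, p_sleep);
    assert(p_avg > 0.0);
    return energy_wh / p_avg;
}
```

For example, with an assumed 10 Wh battery, PA = 0.1 W, PS = 0.001 W and Δ = 0.1, the average power is 0.0109 W and `lifetime_hours(10.0, 0.1, 0.1, 0.001)` comes out at roughly 917 hours.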
Determining the duty cycle rate for a target lifetime
• Average power used by an application
  • PA: Active Power
  • PS: Sleep Power
  • Δ: Duty Cycle Rate
• Maximum average power available for an application
  • E: Battery capacity in Watt-Hours
  • L: Lifetime in hours
• How to find the allowable duty cycle rate?

$P_{max} = \frac{E}{L}$
$P_{average} = \Delta P_A + (1 - \Delta) P_S$
Determining the duty cycle rate for a target lifetime
• Duty cycle the device at the maximum allowable power consumption

$P_{average} = P_{max}$
$\Delta P_A + (1 - \Delta) P_S = P_{max}$
$\Delta (P_A - P_S) + P_S = P_{max}$
$\Delta = \frac{P_{max} - P_S}{P_A - P_S}$
$\Delta = \frac{E/L - P_S}{P_A - P_S}$
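The closed form above translates directly into code. The following is a minimal sketch, not part of the original slides: the function name, the clamping to 1, and the -1.0 return value for an infeasible target are assumptions added for illustration.

```c
#include <assert.h>

/* Largest duty cycle rate that still meets a target lifetime:
 * delta = (E/L - P_S) / (P_A - P_S).
 * Returns -1.0 when sleep power alone exceeds the budget E/L
 * (lifetime infeasible); clamps to 1.0 when the budget allows
 * the device to stay active all the time. */
double duty_cycle_for_lifetime(double energy_wh, double lifetime_h,
                               double p_active, double p_sleep)
{
    assert(lifetime_h > 0.0 && p_active > p_sleep);
    double p_max = energy_wh / lifetime_h;   /* maximum average power (W) */
    if (p_max < p_sleep)
        return -1.0;
    double delta = (p_max - p_sleep) / (p_active - p_sleep);
    return delta > 1.0 ? 1.0 : delta;
}
```

With assumed values E = 10 Wh, L = 1000 h, PA = 0.1 W and PS = 0.001 W, the budget is 0.01 W and the allowable duty cycle is 0.009 / 0.099, about 9.1%. Asking for 20000 h from the same battery gives a budget of 0.0005 W, below the sleep power, so the sketch reports the target as infeasible.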
Feasible Duty Cycle
Datasheet: Active Power, Sleep Power
Variability
How to determine duty cycle when PA, PS vary with instance and temperature?
<c,p> = f(PA, PS, E, L)
Implications of Variation for Duty Cycling
• Scenario: deploy a network of sensors. All nodes have identical batteries, and should have identical lifetimes
  • If active and sleep power are constant for all instances, duty cycle can be obtained trivially from $\Delta = \frac{E/L - P_S}{P_A - P_S}$
• Recall power variation in ARM Cortex M3
  • More than 8x in Sleep mode at room temperature
  • Around 10% in Active mode
• Uniform duty cycle across the network will be suboptimal
Duty cycle based on datasheet spec
• Use PA, PS from datasheet
[Figure: sleep power (μW, 0 to 150) for processor instances P1 to P10, measured vs. datasheet. Instances measured above the datasheet value will not meet lifetime; instances measured below it will leave energy untapped.]
Duty Cycle based on Worst-Case Power
• Use worst case PA, PS across all instances and target temperature
[Figure: sleep power PS (μW, 10 to 180) vs. temperature (0 to 60 °C). All the nodes will leave energy untapped.]
Implications of Variation for Duty Cycling
[Figure parameters] Active Mode: 48 MHz; Sampling Task: 10 s; Battery: 2x AA (5.4 A-h); Room Temperature
Implications of Variation for Duty Cycling
[Figure parameters] Active Mode: 48 MHz; Sampling Task: 10 s; Battery: 2x AA (5.4 A-h); Lifetime: 20000 hours
Variability-Aware Duty Cycling
• Instance dependent Duty Cycle: $\Delta = \frac{E/L - P_S(i)}{P_A(i) - P_S(i)}$
  • PA(i) and PS(i) are instance-dependent active and sleep power
  • Assumes constant temperature
• Picking an arbitrary point in the DC vs temperature curve is suboptimal
  • Can we do better if we know something about temperature in advance?
[Figure annotation: 25% variation in DC in a single instance due to temperature]
Coping with Temperature-Dependent Variation
• If we knew the future: deploying a sensor network in Death Valley, CA (2009)
[Figure: Death Valley temperature, 2009, showing ~30F diurnal variation and ~50F seasonal variation]
Coping with Temperature-Dependent Variation
• If we knew the future: deploying a sensor network in Death Valley, CA (2009)
  • Annual temperature variation of ~80F
  • Picking an arbitrary point in the DC vs temperature curve is suboptimal
• Assume there is perfect knowledge about future temperature
  • Temperature as a function of time: T(t)
  • Power as a function of instance and temperature: PA(i, T) and PS(i, T)
  • Power as a function of instance and time: PA(i, T(t)) and PS(i, T(t))
• How could you define duty cycle for each instance?
Instance and Temperature-Dependent Duty Cycle
$\Delta(i) = \frac{E/L - \bar{P}_S(i)}{\bar{P}_A(i) - \bar{P}_S(i)}$
$\bar{P}_A(i) = \frac{\sum_{t=0}^{L} P_A(i, T(t))}{L}$
$\bar{P}_S(i) = \frac{\sum_{t=0}^{L} P_S(i, T(t))}{L}$
Relaxing the temperature knowledge assumption
• Having temperature as a function of time T(t) is not realistic
  • We can barely predict temperature for the next few days
  • Temperature distribution is easier to predict, can be learned over time
Figure by John L. Daly, data from NASA Goddard Institute for Space Studies
Relaxing the temperature knowledge assumption
• From temperature as a function of time T(t) to frequency of temperature f(T)
Relaxing the temperature knowledge assumption
• From temperature as a function of time T(t) to frequency of temperature f(T)
  • Power as a function of instance and temperature: PA(i, T) and PS(i, T)
  • Temperature as a frequency distribution f(T)
  • Discretized temperature bins, e.g. one bin for each degree

$\bar{P}_A(i) = \frac{\sum_{t=0}^{L} P_A(i, T(t))}{L}$
$\bar{P}_A(i) = \sum_{T=T_{min}}^{T_{max}} P_A(i, T) \times f(T)$
Variable Duty Cycle
• Operating each instance at a constant duty cycle may not be optimal
  • “Energy Cost” of remaining active at any time is determined by the difference between active and sleep mode power
  • If PA(i, T) - PS(i, T) changes with temperature T, “cost of activity” will be different for different temperatures
• Duty cycles for each temperature that maximize total active time can be found with a linear program
Variable Duty Cycle
taking worst-case characteristics into consideration, leave significant energy potential untapped. In this paper, we discuss variability-aware duty cycling methods that handle instance and temperature-dependent variability.

Variation is expected to increase with scaling of silicon technologies, and to become more pronounced in active as well as standby modes [1]. In face of this variability, finding a duty cycle schedule can be seen as an optimization problem in a five-dimensional space: instance characteristics, temperature, supply voltage (if unregulated), recent activity, and aging. In subsequent discussions, however, we focus only on the two aspects with highest impact on current generation hardware: instance and temperature-dependent variation.

Our contributions are the following: (i) proposal and analysis of variability-aware duty cycle adaptation methods, and (ii) a duty cycle adaptation framework and abstraction for TinyOS, a popular operating system for embedded sensors. Our proposal and analysis of duty cycle adaptation methods shows that ignoring instance and temperature-dependent variability in low power embedded sensing systems leads to either untapped energy potential, or unmet lifetime requirements. We also show how the common practice of reactively adjusting work according to recent energy usage observations may lead to sub-optimal quality of service across the lifetime of the system. Finally, our duty cycle abstraction for TinyOS allows applications to explicitly specify lifetime and minimum duty cycle requirements for individual tasks, and dynamically adjusts duty cycle rates so that overall quality of service is maximized.
II. RELATED WORK

Variations in power consumption can be interpreted as changes in resource (energy) usage (and hence availability). Adaptation of work to resource availability is a common theme in embedded and real-time systems. In imprecise computation [5] each task is designed to produce usable, approximate results whenever resource scarcity (e.g. due to transient failures or overloads) prevents the task from producing its desired precise result. Imprecise computation has been explored in the context of energy-aware systems, where tasks may be interrupted according to energy availability and lifetime requirements [11].

Similarly, several systems have explored the concept of alternative task implementations with different resource usage patterns and quality of service characteristics. Whenever resources are available, tasks with higher quality of service are preferred to those with lower resource usage characteristics. Levels is an energy-aware programming abstraction for TinyOS based on alternative tasks [3]. With this abstraction, programmers define task levels, which provide identical functionality with different quality of service and energy usage characteristics. The run-time system dynamically chooses the highest task levels that will meet the required lifetime.

The issue of distributing available energy resources to tasks has also been explored in the literature. ECOSystem [12] introduced the concept of “currentcy” that tasks use to allocate energy resources. The system periodically distributes currentcy to tasks, which adjust their workload according to availability. Cinder [7] is an energy-aware system for mobile computing devices that features a Capacitor abstraction associated with tasks. Each capacitor represents a task’s right to request energy from the system to perform its operations. While we do not deal with the issue of energy distribution to tasks explicitly in this work, we could support schemes like currentcies or task Capacitors by having our variability-aware power consumption model be the source of available energy to these systems.
III. DUTY CYCLE SCHEDULING

A. Uniform Duty Cycle

A duty cycle schedule indicates the activity rate of a system at any point in its lifetime. An optimal duty cycle schedule maximizes the active time of the system across its lifetime, given an energy constraint. If there is no variability in power consumption, the optimal duty cycle schedule can be uniform across the lifetime of the system. Given an energy budget of E Joules, a lifetime of L seconds, and invariable constants for active and standby power consumption PA and PS Watts, the maximum allowable duty cycle DC is given in (1).

$P_A \cdot DC + P_S \cdot (1 - DC) = \frac{E}{L}$
$DC = \frac{E - L \cdot P_S}{L \cdot P_A - L \cdot P_S}$ (1)

When instance and temperature-dependent variation is taken into consideration, the worst-case uniform duty cycle can be found by applying the worst case active and stand-by power consumption across all instances and operating temperature range as constants PA and PS in (1).
B. Optimum Duty Cycle

With prior characterization, active and sleep power can be expressed as functions of temperature PA(T) and PS(T). If the temperature profile is known (or can be learned) for the lifetime of the system, temperature can be expressed as a frequency distribution. For a known operating temperature profile and a given processor instance, the problem of finding an optimum duty cycle can be formulated as a linear program. Given the expected frequency distribution of (discretized) temperatures across the lifetime of the application, the optimum duty cycle for all temperatures T, $DC_T$, is given by (2), where $f_T$ is the relative frequency of temperature T across the lifetime L.

Maximize $\sum_{T=T_{min}}^{T_{max}} DC_T \, f_T$ (2)
s.t. $\sum_{T=T_{min}}^{T_{max}} f_T \cdot (P_A(T) \cdot DC_T + P_S(T) \cdot (1 - DC_T)) \le \frac{E}{L}$
$DC_{min} \le DC_T \le DC_{max}$, $T_{min} \le T \le T_{max}$
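Program (2) has a single budget constraint plus box bounds on each $DC_T$, which gives it the structure of a fractional knapsack. Under that observation, and assuming PA(T) > PS(T) in every bin, a greedy fill that raises the duty cycle first where PA(T) - PS(T) is smallest reaches the optimum. The sketch below is an illustration of that idea only; it is not necessarily the solver the authors used, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double cost;  /* pa - ps: extra power drawn while active in this bin */
    size_t idx;   /* index of the temperature bin */
} bin_cost;

static int cmp_cost(const void *a, const void *b)
{
    double ca = ((const bin_cost *)a)->cost;
    double cb = ((const bin_cost *)b)->cost;
    return (ca > cb) - (ca < cb);
}

/* f[k]: relative frequency of bin k; pa[k], ps[k]: power in bin k (W);
 * p_max: budget E/L (W). Fills dc[0..n-1] and returns sum_k f[k]*dc[k],
 * the objective of (2). Returns -1.0 if the budget cannot sustain
 * dc_min in every bin, or if memory allocation fails. */
double greedy_duty_cycles(const double *f, const double *pa, const double *ps,
                          size_t n, double p_max,
                          double dc_min, double dc_max, double *dc)
{
    assert(n > 0 && dc_min >= 0.0 && dc_min <= dc_max && dc_max <= 1.0);

    bin_cost *order = malloc(n * sizeof *order);
    if (order == NULL)
        return -1.0;

    /* Start every bin at dc_min and charge that baseline to the budget. */
    double budget = p_max;
    for (size_t k = 0; k < n; k++) {
        assert(pa[k] > ps[k]);
        dc[k] = dc_min;
        budget -= f[k] * (ps[k] + (pa[k] - ps[k]) * dc_min);
        order[k].cost = pa[k] - ps[k];
        order[k].idx = k;
    }
    if (budget < 0.0) {
        free(order);
        return -1.0;
    }

    /* Cheapest activity first: raise dc where pa - ps is smallest. */
    qsort(order, n, sizeof *order, cmp_cost);
    for (size_t j = 0; j < n && budget > 0.0; j++) {
        size_t k = order[j].idx;
        double unit = f[k] * order[j].cost;  /* budget used per unit of dc[k] */
        if (unit <= 0.0)
            continue;                        /* bin never occurs: keep dc_min */
        double raise = budget / unit;
        if (raise > dc_max - dc_min)
            raise = dc_max - dc_min;
        dc[k] += raise;
        budget -= raise * unit;
    }
    free(order);

    double total = 0.0;
    for (size_t k = 0; k < n; k++)
        total += f[k] * dc[k];
    return total;
}
```

As a made-up example, take two equally likely bins with PA = {10, 20} W, PS = {1, 1} W, a budget of 7.4 W, DCmin = 0 and DCmax = 1. The cooler bin is filled to 1.0 first and the remaining budget allows 0.2 in the hotter bin, for an expected duty cycle of 0.6. A single uniform duty cycle under the same budget would be (7.4 - 1) / 14, about 0.46.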
Programming duty-cycled systems
[Figure: Active Power (PA), Sleep Power (PS), Lifetime (L), Energy (E)]

while(1) {
    do_something(duration);
    sleep(time);
}
Reactive Duty Cycle
• Duty cycle may also be adapted dynamically, based on resource availability
• Typical adaptive strategy: adjust workload to resource availability
  • More resources available: higher quality of service
  • Imprecise computation (EDF)
    • Tasks are divided into mandatory and optional parts. If there is sufficient processor time, run all optional parts, else, discard a fraction
• Same principle can be applied using energy as a resource
Reactive Duty Cycle
• Duty Cycle can also be determined in a reactive fashion
  • At every decision point, estimate remaining available energy (battery capacity)
  • Analyze energy delta from time t-1 to current time t
  • Project expected lifetime from energy delta and remaining capacity
  • If there is an energy surplus, increase duty cycle
  • If there is an energy deficit, decrease duty cycle
• One potential model:
• Assumptions
  • Remaining battery capacity can be easily and accurately estimated
    • May be true for “smart” batteries, but not in general
  • Energy delta will remain constant for a given duty cycle
    • Not true with temperature-dependent variability
where $f_T$ is the relative frequency of temperature T across the lifetime L, assuming discretized temperature bins. $DC_{min}$ and $DC_{max}$ are the minimum and maximum duty cycles allowed for the application. The maximum duty cycle constraint can be used to limit duty cycles when increasing duty cycle beyond a given rate would bring no further increase to quality of service.
B. Variability-Aware Uniform Duty Cycle

Assuming a uniform duty cycle $DC_T = DC^*$ independent of temperature, we can determine $DC^*$ that satisfies the constraints given in (2).

$DC^* = \min\left[ \frac{E - L \cdot \sum_{T=T_{min}}^{T_{max}} P_S(T) \cdot f_T}{L \cdot \sum_{T=T_{min}}^{T_{max}} (P_A(T) - P_S(T)) \cdot f_T},\ DC_{max} \right]$ (3)

Moreover, it can be shown that when $P_A(T) - P_S(T)$ is constant across all T, $DC^*$ is the uniform duty cycle that optimizes the linear program in (2). We observed this to be practically true for the current generation microprocessors, like the Atmel SAM3U, because (i) their sleep power consumption PS(T) is much less than active power consumption PA(T), and (ii) the PA(T) is effectively constant as active mode leakage power is insignificant for their fabrication technology, and switching power variation across the temperatures is negligible.
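Equation (3) can be evaluated directly from a temperature histogram. The following is a minimal sketch under the assumption that the histogram and the per-bin power values are available as plain arrays; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Variability-aware uniform duty cycle, following (3).
 * f[k]: relative frequency of temperature bin k (sums to 1)
 * pa[k], ps[k]: active and sleep power in bin k (W)
 * energy_j: energy budget E (J); lifetime_s: lifetime L (s) */
double uniform_duty_cycle(const double *f, const double *pa, const double *ps,
                          size_t n, double energy_j, double lifetime_s,
                          double dc_max)
{
    double sleep_avg = 0.0;  /* sum of PS(T) * f_T */
    double delta_avg = 0.0;  /* sum of (PA(T) - PS(T)) * f_T */
    for (size_t k = 0; k < n; k++) {
        sleep_avg += ps[k] * f[k];
        delta_avg += (pa[k] - ps[k]) * f[k];
    }
    assert(lifetime_s > 0.0 && delta_avg > 0.0);

    double dc = (energy_j - lifetime_s * sleep_avg) / (lifetime_s * delta_avg);
    return dc < dc_max ? dc : dc_max;  /* the Min[...] in (3) */
}
```

With made-up values of two equally likely bins, PA = {10, 20} W, PS = {1, 1} W, E = 7.4 J and L = 1 s, the result is (7.4 - 1) / 14, about 0.457. Like (3) itself, the sketch applies no lower bound, so a negative result signals that the lifetime target cannot be met.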
C. Reactive Duty Cycle

Allowable duty cycle rates can also be found dynamically through measurements or estimations of past power consumption, given total energy capacity at the start of lifetime. Energy consumption can be directly measured with dedicated monitors [26], inferred from remaining battery capacity [21], or through variability-aware models that estimate energy expenditure by measuring conditions that affect power consumption, e.g. temperature and activity rates.

In a reactive model, duty cycle can be dynamically determined at time t as a ratio of duty cycle at time t-1, according to energy spent from time t-1 to time t, and remaining energy in the system. Remaining energy at time t is given by $E_t = E - \sum_{i=0}^{t-1} P_i$, where E is the total energy capacity and $P_i$ is power estimated or measured at time i. An example of a reactive duty cycle adaptation model is given in (4).

$DC_t = \frac{E_t \cdot DC_{t-1}}{(E_t - E_{t-1}) \cdot (L - t)}$ (4)

The reactive model in (4) assumes that the power consumption rate for the previous time period is indicative of the power consumption for the remainder of lifetime of the system. While more complex models could incorporate longer histories, any reactive model will depend on accurate measurement of past energy consumption or estimation of remaining battery energy.
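One reactive update step in the spirit of (4) might look like the sketch below. Several details are assumptions rather than part of the source: the energy spent in the last interval is taken as the positive quantity E(t-1) - E(t), time is counted in decision intervals, and the result is clamped to hypothetical bounds dc_min and dc_max.

```c
#include <assert.h>

/* One reactive duty cycle update.
 * dc_prev:    duty cycle used during the last interval
 * e_prev:     remaining energy at time t-1
 * e_now:      remaining energy at time t
 * steps_left: remaining lifetime L - t, in decision intervals
 * The previous duty cycle is scaled by the ratio between the energy
 * that can be afforded per interval and the energy actually spent. */
double reactive_duty_cycle(double dc_prev, double e_prev, double e_now,
                           double steps_left, double dc_min, double dc_max)
{
    assert(steps_left > 0.0 && dc_min >= 0.0 && dc_min <= dc_max);

    double spent = e_prev - e_now;  /* assumed positive: energy just used */
    if (spent <= 0.0)
        return dc_max;              /* nothing spent: no reason to throttle */

    double dc = dc_prev * e_now / (spent * steps_left);
    if (dc < dc_min) dc = dc_min;
    if (dc > dc_max) dc = dc_max;
    return dc;
}
```

For instance, with illustrative numbers: if 10 units were spent in the last interval at a duty cycle of 0.2, and 990 units must last 198 more intervals (5 units each), the next duty cycle is halved to 0.1. If only 49.5 intervals remained (20 units each), there is a surplus and the duty cycle doubles to 0.4.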
IV. DUTY CYCLE SCHEDULING IN TINYOS

In this section we present our design and implementation of a duty cycle scheduling framework and abstraction for TinyOS [22]. TinyOS differs from traditional operating systems in that it is event-based. Applications respond to events (e.g. interrupts from hardware, incoming radio messages) with event handlers. These handlers should typically complete within a few hundred processor cycles. To execute long running computations, applications post tasks, which work as deferred function calls. Each task runs to completion on a scheduler loop. Whenever the system has no tasks to schedule, it puts the processor in sleep mode, waiting for the next interrupt which will trigger new event handlers, and potentially new tasks. The event handler / background tasks model of TinyOS naturally lends itself to duty cycled systems: event handlers and tasks represent active periods, empty scheduler queues lead to inactive periods. Nevertheless, there’s no explicit support for discovering and adapting duty cycle in TinyOS.

[Fig. 2. System Architecture for Variability-Aware DC Scheduling in TinyOS. Labels: Application with Task (Adaptable Period), Task (Adaptable Iterations), Task (Non-Adaptable); System with TaskVariablePeriod (min, max period), TaskVariableIterations (min, max iterations), Traditional Task, Adaptable Task <Label>, Hardware Signature Inference, and DC Scheduler with Duty Cycle = f(P_sleep, P_active, ...) and lifetime.]

We introduce a new Duty Cycle Scheduler to TinyOS. Figure 2 shows our system architecture. A hardware signature inference module provides power vs. temperature curves for each processor instance. While in our work we assume that these curves are pre-characterized, extensions to this module could feature online learning through dedicated power meters, and take other variability vectors such as aging into account.

The scheduler determines allowable duty cycle based on: (i) sleep and active power vs. temperature curves provided by the hardware signature inference module, (ii) temperature profile for the application, which can be pre-characterized or learned dynamically, (iii) lifetime requirement specified by the application, (iv) battery capacity, and (v) one of the scheduling methods presented in section III.

To maintain compatibility with existing TinyOS tasks, we introduce Adaptable Tasks. These tasks respond to events from the Duty Cycle scheduler that inform them of their current and allowable duty cycle. However, the system does not enforce adaptation of these tasks. These are assumed to adapt according to the duty cycle change event from the scheduler.

Adaptable tasks are designed for flexibility. A module using adaptable tasks may, for example, use an alternative function mechanism, e.g. [21]. For standard applications, we provide two additional classes of tasks which implement two common adaptation scenarios: tasks with variable iterations and tasks with variable period. For the first class, the programmer provides a function that can be invoked repeatedly a bounded number of times within each fixed period. For the second class of tasks, the application programmer provides a function representing task functionality that is invoked once within each variable but bounded period of time. Internally, each of these tasks uses an adaptable task and unique identifier. The system adjusts the number of iterations or period of the task based on
Determining Δ for each instance
[Figure: quality vs. lifetime, with regions labeled Guard-banded, Underestimated, Optimal, Infeasible, and Sub-Optimal]
Objective: maximize active time for each instance subject to energy capacity and lifetime
Knobs for Δ control in VaRTOS
App:
    for knob do               // computation time
    sleep (constant - knob)   // period
knob: app variable shared with OS
↑ knob value ⇒ ↑ Δ, ↑ quality
Sample task: adjusting computation time

/* Task 1's quality is improved by extending a for loop (like ADC samples, etc.) */
static void vExampleTask1( void *pvParameters )
{
    portTickType xLastExecutionTime = xTaskGetTickCount();

    for( ;; )
    {
        /* Enforce task frequency */
        vTaskDelayUntil( &xLastExecutionTime, TASK1_DELAY );

        volatile unsigned long i, j, dummyVal;
        for( i=0; i<task1_knob; i++)
        {
            dummyVal = 0;
            for( j=0; j<1000; j++)
            {
                dummyVal += (((dummyVal+5)%3)*3)/2;
            }
        }
        dummyVal = 0;
    }
}
Sample task: adjusting Activation Frequency

/* Task 2's quality is improved by increasing task frequency (like sending radio messages, etc.) */
static void vExampleTask2( void *pvParameters )
{
    portTickType xLastExecutionTime = xTaskGetTickCount();

    for( ;; )
    {
        /* Enforce task frequency */
        vTaskDelayUntil( &xLastExecutionTime, 500/(task2_knob*0.1) );

        task_body();
    }
}
Knobs for Δ control in VaRTOS
↑ knob value ⇒ ↑ Δ, ↑ quality
[Figure: quality / utility vs. knob value / duty cycle]
xTaskCreate(..., &task_knob, min, max, priority);
Δ control in VaRTOS
1) Requirements
Hardware: power, temperature
App: knobs, lifetime,temperature profile
2) Model Training
T → PA, PS
knob ↔ time
3) Optimization
Maximize Δ
Assign knob values
Rough histogram, 40 points: 2.5% error; LP + Greedy Opt.
Greedy optimization of knob values
[Figure: sleep/active timelines for Task 1, Task 2, and Task 3, with Global Δ and Global Utility]
Recap: Choices in determining Δ
[Figure: quality vs. lifetime, with regions labeled Guard-banded (worst case), Underestimated (Datasheet), Optimal (Variability-Aware), Infeasible, and Sub-Optimal]
Results: lifetime reduction with datasheet spec Δ
Lifetime: 1 year, Battery: 5400 mAh
Temperature: Stovepipe Wells, CA, 2009
[Figure: lifetime reduction (days, 0 to 160) for processor copies P1 to P10]
Average: 55 days
Results: energy untapped by worst-case Δ
Lifetime: 1 year, Battery: 5400 mAh
Temperature: Stovepipe Wells, CA, 2009
[Figure: remaining battery (%, 0 to 90) for processor copies P1 to P10]
Average: 63%
Results: improvement over worst-case Δ
Lifetime: 1 year, Battery: 5400 mAh
Temperature: Stovepipe Wells, CA, 2009
[Figure: improvement (%, 0 to 3000) for processor copies P1 to P10]
Average: 22x
Average for multiple temperature profiles: 6x
VaRTOS vs. Oracle
[Figure: utility vs. Oracle (%), axis from 80 to 100, grouped by temperature (Best, Nominal, Worst) with series Best, Nominal, Worst]
Duty cycling IoT devices
• Duty cycle must be able to detect event of interest
[Figure: event timeline with two duty cycles: one that fails to capture events, and one tailored to event duration]
Multiple low-power modes, wakeup latencies
• Wakeup latency vs. power tradeoff
  • Devices typically use (close to) full power during transitions
• Typical laptop
  • Sleep mode (wake up on keyboard, LAN)
  • Hibernation mode (state dump/restore, power off)
  • Independent duty cycling of peripherals (disk, wireless, etc.)
• Typical embedded processor: NXP LPC13xx Cortex M series
  • Sleep mode: clock gated, state preserved, peripherals active
  • Deep sleep mode: clock gated, state preserved, analog peripherals off
  • Deep power down mode: power gated, limited sources of wakeup
Moving to and from low power states
• Processor: set up sleep mode, halt, and wait for an interrupt
  • Example: ARM Cortex
    • Set up sleep mode by writing to specific registers
    • Set up an interrupt source (e.g. timer, push button)
      • Available interrupt sources depend on sleep mode
    • Wait for interrupt (WFI)
• General-purpose: Advanced Configuration and Power Interface (ACPI)
  • D-States, C-States (more about this later in the course)
• Historic curiosity: look up the “HCF” instruction
  • Halt and catch fire
Summary
• Duty cycle
  • Fraction of time in which the system is active
  • Average power consumption is a function of power in active mode, power in sleep (inactive) mode, and duty cycle
• Duty cycle ↔ Lifetime
  • Trivially determined for known power consumption
  • Complicated by variations in power
    • Uniform duty cycle is suboptimal
    • Can be determined or learned for individual instances, power profiles
  • Complicated by transition latencies
  • Complicated by multiple active/sleep states