
QUICKRECALL: A Low Overhead HW/SW Approach for Enabling Computations across Power Cycles in Transiently Powered Computers

Hrishikesh Jayakumar, Arnab Raha, and Vijay Raghunathan

School of Electrical and Computer Engineering, Purdue University

Email: {hjayakum, araha, vr}@purdue.edu

Abstract: Transiently Powered Computers (TPCs) are a new class of batteryless embedded systems that depend solely on energy harvested from external sources for performing computations. Enabling long-running computations on TPCs is a major challenge due to the highly intermittent nature of the power supply (often bursts of < 100 ms), resulting in frequent system reboots. Prior work seeks to address this issue by frequently checkpointing system state in flash memory, preserving it across power cycles. However, this involves a substantial overhead due to the high erase/write times of flash memory. This paper proposes the use of FRAM, an emerging non-volatile memory technology that combines the benefits of SRAM and flash, to seamlessly enable long-running computations in TPCs. We propose a lightweight, in-situ checkpointing technique for TPCs using FRAM that decreases the time taken for saving and restoring a checkpoint to only 12.6 µs, which is over two orders of magnitude lower than the corresponding overhead using flash. We have implemented and evaluated our technique, QUICKRECALL, using the TI MSP430FR5739 FRAM-enabled microcontroller. Experimental results show that our highly efficient checkpointing translates to a significant speedup (1.4x to 4.5x) in program execution time.

I. INTRODUCTION

Transiently powered computers (TPCs) [1] represent a new class of ultra-low power embedded computing platforms that are batteryless and rely solely on external power sources for their energy supply. Examples of such TPCs include computational RFID tags [2], batteryless sensors [3], etc. Successfully performing computations on TPCs is a major challenge due to the unpredictable and highly intermittent nature of the power supply. For example, a TPC may receive power in small intermittent bursts (often less than 100 ms), far shorter than the time required to execute most programs.

Existing techniques to address this challenge are based on the idea of frequent checkpointing of system state. When power loss is imminent, a snapshot (checkpoint) of system state (e.g., processor registers, contents of SRAM) is stored to flash memory, which is non-volatile. During the next burst of power, the system reboots, restores state from the stored checkpoint, and resumes program execution. Thus, long-running programs execute gradually, in small increments, as and when power becomes available. However, checkpointing to flash involves a significant time and energy overhead due to the high erase/write times of flash memory (tens of ms). As a result, a large portion of the time when a TPC receives power (henceforth referred to as the ON time) is spent performing checkpointing, which limits the amount of time available for program execution. More importantly, if the ON time is less than the time required for storing and retrieving checkpoints, the TPC can never successfully complete program execution.
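The forward-progress argument above can be made concrete with a small calculation. The following is a minimal sketch, assuming every ON period has the same length and pays the same store-plus-restore overhead; the function name and the numbers are illustrative and are not taken from the paper.

```python
import math


def power_cycles_needed(run_time_s, on_time_s, overhead_s):
    """Number of ON periods needed to finish a program of length run_time_s.

    Assumes (for illustration) that every ON period has the same length and
    pays the same checkpoint store-plus-restore overhead. Returns None when
    the overhead consumes the whole ON period, i.e. the program can never
    complete.
    """
    useful_s = on_time_s - overhead_s
    if useful_s <= 0:
        return None
    return math.ceil(run_time_s / useful_s)


# Hypothetical numbers chosen only to show both outcomes.
print(power_cycles_needed(0.5, 0.050, 0.002))  # 11
print(power_cycles_needed(0.5, 0.010, 0.015))  # None
```

In the second call the overhead exceeds the ON time, so no useful work is ever done, which is the failure case described in the text.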

Recent advances in semiconductor technology have resulted in new forms of memory technologies such as Ferroelectric RAM (FRAM), Magnetoresistive RAM (MRAM), etc., that combine the speed, flexibility, and endurance of SRAM with the non-volatility of flash, all at a very low power consumption. This has led to the possibility of unified memory where the same type of memory technology is used as RAM and for non-volatile program and data storage. Low power microcontrollers with integrated FRAM are already commercially available. For example, the TI MSP430FR5739 has 16 KB of FRAM that can be used as unified memory [4]. This paper makes a case for (and demonstrates the benefits of) using such emerging non-volatile memories in TPCs. Specifically, this paper makes the following contributions:

• To the best of our knowledge, this is the first work to investigate the use of emerging non-volatile memory technologies (specifically FRAM) in TPCs to seamlessly enable long-running computations in the presence of frequent power interruptions.

• We propose a lightweight, in-situ checkpointing technique, called QUICKRECALL, for TPCs that use FRAM. QUICKRECALL can save and restore a checkpoint in just 12.6 µs, which is over two orders of magnitude lower than the corresponding overhead using flash memory.

• We have implemented QUICKRECALL using a TI MSP430FR5739 microcontroller and evaluated it using three typical embedded application programs. Experimental results show that the highly efficient checkpointing in QUICKRECALL results in a significant reduction (as much as 4.5x) in program execution time, compared to a state-of-the-art flash-based checkpointing technique.

• QUICKRECALL enables TPCs to perform computations when ON times are as small as 5 ms, as compared to previous flash-based checkpointing methods, which require a minimum of 15 ms ON time.

The remainder of this paper is organized as follows. Section II describes related work. Section III makes the case for using FRAM as unified memory in TPCs. Section IV presents the design requirements (and tradeoffs) for enabling efficient checkpointing in TPCs that use FRAM. Section V describes our implementation of QUICKRECALL. Section VI presents our experimental results and Section VII concludes the paper.

II. RELATED WORK

Checkpointing schemes have long been used for fault tolerance in large-scale distributed systems. Checkpointing, performed at previously determined trigger points in the program, stores a snapshot of system state in non-volatile memory. In case of a fault, the system rolls back to the most recent checkpoint and continues execution [5], [6]. Trigger points are usually periodic in nature or programmer-inserted. While checkpointing an application in this manner ensures a roll-back point, it impedes normal program execution and causes additional overhead. A checkpointing scheme for large-scale systems using FRAM was previously explored in [7]. However, the underlying design goals and targeted systems are very different from those considered in this paper.

Mementos [1] is a checkpointing solution aimed at TPCs. It instruments user-written code at compile time with trigger points, which compare the supply voltage with a threshold and trigger a checkpoint if the supply voltage is less than the threshold. Trigger points are inserted at the end of each iteration of a loop or after a function return statement. Mementos addresses the extra overhead by using a timer to periodically enable trigger points. Typical checkpointing methods, like Mementos, use flash memory for storing checkpoints. Flash memory erases/writes are cumbersome due to the large performance and energy overhead they present. The overhead for checkpointing the same program differs due to the variable stack depth at each trigger point. In contrast, by using an emerging non-volatile memory (NVM), QUICKRECALL sidesteps the data transfer latency and maximizes the time available for computation in each power cycle. Finally, Idetic [8] targets ASIC implementations of applications and embeds checkpoints during the behavioral synthesis process. In contrast, QUICKRECALL enables and maximizes the execution time of applications in off-the-shelf microcontrollers utilizing an emerging NVM.

2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems

1063-9667/14 $31.00 © 2014 IEEE

DOI 10.1109/VLSID.2014.63

III. MOTIVATION

Recent years have witnessed the emergence of non-volatile memories (NVMs) such as Ferroelectric RAM (FRAM) and Magnetoresistive RAM (MRAM). In addition to being non-volatile, these memories have distinct advantages over flash in terms of power consumption, performance, endurance, etc.

A. Ferroelectric RAM: A candidate for unified memory

The significant overhead in performance and energy of flash, due to its inherent device limitations, is the primary motivator for employing FRAM in embedded systems. Flash memory bitcells can only be written from logic 1 to logic 0. Writing a logic 1 to a cell that was previously set to logic 0 requires the flash bitcell to be erased first. Depending upon the flash memory size and architecture, the smallest memory unit for erasure can vary. As an example, for the MSP430F2132 microcontroller used in the BlueWISP RFID platform, the smallest erasable unit is a segment of size 512 bytes and erasing it takes 10 ms to 18 ms. Moreover, an erase operation requires a higher voltage and is therefore energy expensive [9]. An FRAM memory cell is DRAM-like in structure and uses the polarization on a ferroelectric capacitor to distinguish between the logic states [10]. Thus, FRAM is random-access for reads and writes and requires no erase operations. Even though FRAM involves a destructive read, the write-back is hidden and instantaneous, thereby presenting almost no latency overhead to the system. Consequently, while flash memories present asymmetric read-write latencies, FRAM access latencies are symmetric. Another limitation of flash memory is its limited endurance. While the endurance limit for flash memory is around 10^5 erase/write cycles, FRAM devices have an endurance almost 10 orders of magnitude greater than flash [7].
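The difference between the two write models can be illustrated with a toy simulation. This is a minimal sketch, assuming a byte-addressable segment of 512 bytes; the class and method names are hypothetical and do not correspond to any vendor API.

```python
class FlashSegment:
    """Toy model of one flash segment: a write can only clear bits (1 -> 0);
    setting a bit back to 1 requires erasing the whole segment."""

    def __init__(self, size=512):
        self.cells = bytearray([0xFF] * size)  # erased state is all ones
        self.erase_count = 0

    def erase(self):
        for i in range(len(self.cells)):
            self.cells[i] = 0xFF
        self.erase_count += 1

    def write(self, addr, value):
        # The stored result is (old AND new) because bits can only be pulled low.
        self.cells[addr] &= value
        return self.cells[addr] == value  # False means an erase was needed first


class FramRegion:
    """Toy model of FRAM: random-access, write-in-place, no erase step."""

    def __init__(self, size=512):
        self.cells = bytearray(size)

    def write(self, addr, value):
        self.cells[addr] = value
        return True


flash = FlashSegment()
print(flash.write(0, 0x0F))  # True: erased cell 0xFF becomes 0x0F
print(flash.write(0, 0xF0))  # False: 0x0F & 0xF0 leaves 0x00 in the cell
flash.erase()
print(flash.write(0, 0xF0))  # True after erasing the segment

fram = FramRegion()
fram.write(0, 0x0F)
print(fram.write(0, 0xF0))   # True: overwritten in place
```

The second flash write fails silently in the sense that the cell no longer holds the intended value, which is why a flash-based checkpoint has to budget for segment erases.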

B. Unified Memory for TPCs

Converting any embedded program into an executable binary involves the steps of compiling, assembling and linking. The assembler creates object files for each source file of the program. The different object files contain program information in contiguous memory locations called sections. The linker is tasked with combining sections, across multiple object files, into a single executable file as well as mapping these sections into memory. Conventionally, the linker allocates the uninitialized sections onto the RAM for run-time initialization, whereas the global/static variables that are initialized and the program code reside on the ROM. For example, in the MSP430 microcontroller, the bss, data, sysmem (heap), and stack sections reside in the RAM while all the other sections are allocated to the ROM. While the previous subsection established the advantages of FRAM over flash, its random access and write-in-place properties also allow FRAM to be utilized as RAM, thus enabling it to serve as a unified memory technology.
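The effect of the two section placements can be sketched as a simple lookup. This is an illustration only: the section names bss, data, sysmem, and stack come from the text, while "text" and "const" are assumed stand-ins for "all the other sections", and the dictionaries are not a real linker command file.

```python
VOLATILE_MEMORIES = {"SRAM"}

# Conventional placement described in the text: run-time data in RAM, the rest in ROM.
CONVENTIONAL_MAP = {
    "text": "ROM",     # assumed name for program code
    "const": "ROM",    # assumed name for read-only data
    "data": "SRAM",
    "bss": "SRAM",
    "sysmem": "SRAM",  # heap
    "stack": "SRAM",
}

# Unified placement: every section lives in the same non-volatile memory.
UNIFIED_MAP = {section: "FRAM" for section in CONVENTIONAL_MAP}


def sections_lost_on_power_failure(section_map):
    """Return the sections whose contents vanish when power is lost."""
    return sorted(s for s, mem in section_map.items() if mem in VOLATILE_MEMORIES)


print(sections_lost_on_power_failure(CONVENTIONAL_MAP))  # ['bss', 'data', 'stack', 'sysmem']
print(sections_lost_on_power_failure(UNIFIED_MAP))       # []
```

With the unified map nothing has to be copied out before power is lost, which is the property the rest of the design builds on.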

IV. DESIGN METHODOLOGY

Next, we discuss the requirements and tradeoffs associated with enabling computations across power cycles in TPCs.

A. Checkpointing and Wake-up Overhead

To enable computations across power cycles, the application needs to store the program and processor states to non-volatile storage before power is lost. In conventional checkpointing schemes, the checkpoint triggers are either periodic in nature or programmer-inserted at vantage locations in the program. While checkpointing an application in this manner ensures a roll-back point, it impedes normal program execution and causes additional overhead.

The first design choice that QUICKRECALL makes is that, for transiently powered computers, only a drop in the supply voltage should trigger a checkpoint of the current system state. Such a checkpointing scheme does not impede normal program execution and only triggers a checkpoint if power loss is imminent. However, one should note that, in such a scheme, it is imperative that checkpointing be successfully completed before power is lost. QUICKRECALL ensures this by choosing an appropriate trigger voltage to interrupt the program and initiate the checkpointing operation.

We define the system context to consist of program state, processor state, and the state of configuration registers of various peripheral subsystems. Each of these pieces of state information has to be retained for a successful recall and resumption of computation across power cycles. QUICKRECALL introduces very little overhead to retain the state of the TPC. The overhead introduced comprises checkpointing overhead and wake-up overhead. Checkpointing overhead is defined as the time required to store the system state before a power loss. Wake-up overhead is defined as the time spent in restoring the system state on power-up. A discussion of the overheads introduced and the design choices for QUICKRECALL follows.

1) Retaining Program State: The program state consists of the values of the global variables, stack, heap, bss, etc., in use by the program. Conventionally, the linker maps the code section to a non-volatile storage like flash, and the data, bss, and stack sections to the volatile SRAM. Figure 1 shows the proposed linker map of a microcontroller system that uses an NVM technology such as FRAM as unified memory. The same non-volatile memory is partitioned by the linker to include all the sections. The non-volatile memory now acts as the conventional RAM as well as the ROM. As a result, when the MCU powers off, the RAM data is saved in-situ. Similarly, when waking up, the program can pick up the data from exactly the same address locations. By using FRAM as the RAM, QUICKRECALL is superior to previous checkpointing schemes as there is no time or energy overhead incurred to retain RAM data.

Fig. 1. QUICKRECALL linker map

2) Retaining Processor State: Capturing the processor state involves retaining the state of the microcontroller register file, which includes the program counter (PC), stack pointer (SP), status register (SR), and General Purpose Registers (GPRs). The number of GPRs in use depends on the program state. For the same program at different execution stages, a variable number of GPRs might be in use. A software approach to track the number of active GPRs would hamper normal program execution. Hence, QUICKRECALL saves the values of all the registers onto FRAM during checkpointing. This step involves data transfer and introduces some checkpointing overhead.
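Saving every register unconditionally keeps the checkpoint a fixed size. The following is a minimal sketch of such a snapshot, assuming an MSP430-style register file with twelve general purpose registers (R4 to R15); the `Checkpoint` class, the dictionary layout of `cpu`, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

NUM_GPRS = 12  # assumption: R4-R15 on an MSP430-style CPU


@dataclass
class Checkpoint:
    """Fixed-size register snapshot kept in non-volatile memory."""
    pc: int = 0
    sp: int = 0
    sr: int = 0
    gprs: list = field(default_factory=lambda: [0] * NUM_GPRS)


def save_processor_state(cpu, ckpt):
    """Copy every register, whether or not the program is currently using it."""
    ckpt.sr = cpu["SR"]
    ckpt.sp = cpu["SP"]
    ckpt.gprs = list(cpu["GPR"])
    ckpt.pc = cpu["PC"]


def restore_processor_state(ckpt):
    """Rebuild the register file from the snapshot."""
    return {"PC": ckpt.pc, "SP": ckpt.sp, "SR": ckpt.sr, "GPR": list(ckpt.gprs)}


cpu = {"PC": 0xC210, "SP": 0x1FF0, "SR": 0x0008, "GPR": list(range(NUM_GPRS))}
ckpt = Checkpoint()
save_processor_state(cpu, ckpt)
print(restore_processor_state(ckpt) == cpu)  # True
```

Because the number of values copied never changes, the store and restore times do not depend on which registers the program happens to be using.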

3) Retaining Microcontroller and Peripheral Settings: Common microcontroller applications use multiple peripherals to gather data from sensors and to communicate with the external world. The microcontroller and peripheral settings that have to be configured before execution include GPIO directions, GPIO functions, and clock properties. For transiently powered computers, it is pertinent to restore the MCU and peripheral state when waking up to resume correct program execution. QUICKRECALL addresses this problem by carefully structuring programs used for transiently powered computers. Every time the microcontroller boots up, the configuration registers are re-initialized to their last known state. This step contributes to the wake-up overhead and the duration of the overhead is application and program dependent.

B. Software Flow

Writing applications aimed at resuming computations across power cycles requires minor variations to the traditional embedded programming style. Previous work has tried to address this for large-scale systems [7]. QUICKRECALL uses a similar flow although we design for scenarios where the system is severely power-deprived. QUICKRECALL places two requirements on the programmer in this regard. First, the programmer has to use the predefined QUICKRECALL global variables which store the state. The memory addresses of the program symbols reside in the ELF executable. Hence, the memory map for a particular program remains unchanged across reboots. The extra variables required for data retention are allocated in the bss as uninitialized global variables. The variables required for QUICKRECALL include a checkpoint flag, in addition to memory required to store the GPRs, SR, SP, and PC. Second, the programmer has to specify the initialization routine in a function which QUICKRECALL can use while recalling the system state.

Fig. 2. QUICKRECALL Software Flow

As shown in Figure 2, the QUICKRECALL software flow has two boot sequences upon powering up. Upon boot, QUICKRECALL checks the checkpoint flag, which is declared globally. An unset flag indicates a normal boot sequence. The normal boot sequence initiates a call to the main() function. The main() function begins by initializing the MCU and peripherals, and then executes the program. While executing the application program, the MCU is interrupted if the supply voltage goes below a preset trigger voltage (Vtrig). An explanation of how we arrived at Vtrig for an example platform is given in Section V-B. Upon entering the ISR, the program context gets pushed onto the stack. QUICKRECALL proceeds with storing the current SR, SP, and the GPRs in predefined variables. Note that these registers now point to the ISR state. QUICKRECALL then proceeds to set the checkpoint flag and saves the PC. Thus, the system is safe for a power loss and could recall this state on the following boot. The ISR spends any remaining time comparing the supply voltage to the trigger voltage. If the supply voltage rises above the trigger voltage, a reverse context-switch takes place and the program continues till the supply voltage drops again. Alternatively, the microcontroller can lose power and shut off with the entire system state saved for a future recall.


On the next power up, a set checkpoint flag launches the QUICKRECALL boot sequence that recalls the system state. It begins by restoring the stack pointer, following which the MCU and peripheral subsystems are re-initialized. The stack pointer is restored first so that the re-initialization routine may use the remainder of the stack without corrupting the checkpointed portion of the stack. QUICKRECALL's boot sequence then stalls execution till the supply voltage surpasses the trigger voltage. Note that, even though the peripherals have been initialized, if the MCU powers off before achieving the trigger voltage, the previous state remains intact as the ISR is not triggered. Otherwise, all the registers are reinstated and the checkpoint flag is cleared. QUICKRECALL resumes by re-entering the ISR (Figure 2). The ISR returns, the program context is popped from the stack, and the program continues execution oblivious to the power interruption.
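The two boot sequences and the checkpointing ISR described above can be sketched as a host-side simulation. This is a simplified model under stated assumptions, not the authors' firmware: the dictionary `nv` stands in for FRAM-resident variables, `vdd_above_vtrig` is a hypothetical callback that returns False when power is lost before the supply reaches the trigger voltage, and the initial register values of a normal boot are placeholders.

```python
def low_voltage_isr(cpu, nv):
    """Checkpoint path: runs when the supply voltage drops below Vtrig."""
    nv["SR"], nv["SP"] = cpu["SR"], cpu["SP"]
    nv["GPR"] = list(cpu["GPR"])
    nv["flag"] = True        # checkpoint flag is set, then the PC is saved
    nv["PC"] = cpu["PC"]


def boot(nv, init_peripherals, vdd_above_vtrig):
    """Simulate one power-up and return (path, cpu)."""
    if not nv.get("flag"):
        # Normal boot: main() initializes the MCU and peripherals, then runs.
        init_peripherals()
        return "normal", {"PC": 0, "SP": 0, "SR": 0, "GPR": [0] * 12}
    cpu = {"SP": nv["SP"]}   # restore SP first so init can use the free stack
    init_peripherals()
    if not vdd_above_vtrig():
        # Power was lost while waiting for Vtrig; the checkpoint is untouched.
        return "stalled", None
    cpu.update(SR=nv["SR"], GPR=list(nv["GPR"]), PC=nv["PC"])
    nv["flag"] = False       # checkpoint consumed; execution resumes in the ISR
    return "recall", cpu


nv = {}                      # stands in for FRAM contents across power cycles
init = lambda: None          # hypothetical peripheral initialization routine

path, cpu = boot(nv, init, lambda: True)
print(path)                  # normal
cpu.update(PC=0x1234, SP=0x1FF0, SR=0x0008)
low_voltage_isr(cpu, nv)     # supply dropped below Vtrig
print(boot(nv, init, lambda: False)[0])  # stalled
print(boot(nv, init, lambda: True)[0])   # recall
```

The "stalled" path shows why an aborted wake-up is harmless in this model: the flag and the saved registers are only modified once the supply has reached the trigger voltage.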

QUICKRECALL supports all normal programming paradigms including dynamic memory allocation and nested interrupts. Dynamic memory allocation requires no additional performance overhead as the heap is also retained in-situ in the non-volatile memory. The data in the heap is stored as a linked-list structure in the FRAM. The memory allocation engine stores the control variables used to keep track of free and allocated heap segments in the bss. Since QUICKRECALL retains the state of bss across power cycles, the heap and the memory allocation engine work seamlessly across power cycles without presenting any overhead. Enabling nested interrupts allows the QUICKRECALL ISR to be triggered while another interrupt is being serviced. Note that nested interrupts are not enabled within the QUICKRECALL interrupt vector itself while it performs checkpointing.

V. DESIGN IMPLEMENTATION

This section describes the implementation and experimental setup for QUICKRECALL.

A. Experimental setup

Fig. 3. Experimental Setup

Figure 3 shows our experimental setup. We use the Texas Instruments MSP-EXP430FR5739 Experimenter's Board [11] for implementing QUICKRECALL. The board is equipped with an MSP430FR5739 microcontroller that has 1 KB of SRAM and 16 KB of FRAM [4]. An Analog Devices comparator (CMP401) is interfaced with the GPIO pins to provide a digital signal output after comparing a reference Vtrig to the microcontroller's Vdd. To supply a variable Vdd, we used a function generator and supplied a square wave at varying frequencies and duty cycles. The observed positive supply voltage gradient for the function generator was 1000 V/s.

We modified the linker to allocate the data, bss, stack, and heap sections to the on-chip FRAM. Note that when the system reboots across power cycles, the global variables should not be initialized again. Hence, they are defined in the bss section of the code. The initialization routine that configures the MCU and peripherals (setting GPIO directions, clock frequency, etc.) is defined in a function (say foo()). foo() is invoked in both the main() function and the QUICKRECALL boot sequence. Lastly, we modified the boot sequence and the environment pre-initialization routines as shown in Figure 2 to implement QUICKRECALL.

Fig. 4. Microcontroller State with Vdd

B. Determining Vtrig

The choice of a suitable Vtrig is crucial for QUICKRECALL to avoid unwanted wait periods and incomplete checkpoints. The MSP430FR5739 has a non-programmable internal Supply Voltage Supervisor (SVS) that monitors the Vdd and regulates the voltage to the microcontroller core at a constant 1.5 V. Figure 4 is a conceptual graph that shows the state of the microcontroller with the change in Vdd. The comparator monitors the Vdd and its output gates the program execution window. In Figure 4, shaded region A denotes the region where Vdd is less than the SVSon voltage. The internal SVS keeps the microcontroller powered off in this region. B shows the region where the microcontroller is powered on but the program execution is stalled. In this region, Vdd is below the predefined Vtrig and the program waits, as a supply voltage of at least Vtrig is necessary to guarantee data retention. Region C denotes the window when the program executes. As soon as Vdd drops below Vtrig, the QUICKRECALL interrupt is triggered and the microcontroller operation moves from region C to D. In D, the program executes the ISR to save the system state and any remaining time in this region is spent on monitoring the supply voltage. The microcontroller is switched off once Vdd drops below SVSoff.

TABLE I. PROGRAM EXECUTION TIME (CPU FREQ = 8 MHz)

Program   QUICKRECALL Overhead (a)          Total Runtime
CRC       8.18 µs + 1.854 ms + 4.4 µs       551 ms
RSA       8.18 µs + 1.854 ms + 4.4 µs       11.12 s
SENSE     8.18 µs + 20 ms + 4.4 µs          79 ms

(a) Store Overhead + Initialization Overhead + Restore Overhead

Vtrig has to be greater than both SVSoff and SVSon since they dictate the microcontroller on-off states. For the MSP430FR5739 microcontroller, the typical voltages for SVSoff and SVSon are 1.88 V and 1.93 V respectively. The minimum voltage required for a safe FRAM operation is 2.0 V [4]. The chosen Vtrig has to guarantee correct FRAM operation for the duration of checkpointing. The overhead of storing a checkpoint at a CPU frequency of 8 MHz, measured using an oscilloscope, is 8.18 µs. For our experimental setup, the rate of decay of Vdd observed once the power supply is cut off is e^(-17.56t). Using capacitor discharge equations, we determine a Vtrig of 2.0003 V for successfully implementing QUICKRECALL.
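The trigger-voltage figure can be reproduced with a short calculation. This is a sketch of one plausible reading of the capacitor-discharge argument, assuming the supply follows V(t) = Vtrig · e^(-17.56t) after the trigger and must still be at least 2.0 V when the 8.18 µs store finishes; the paper does not spell out the formula, and the function and constant names are illustrative.

```python
import math

DECAY_RATE_PER_S = 17.56   # exponent of the observed Vdd decay
V_FRAM_MIN = 2.0           # minimum voltage for safe FRAM operation, in volts
T_STORE_S = 8.18e-6        # measured checkpoint store time at 8 MHz


def trigger_voltage(v_min, decay_rate_per_s, t_store_s):
    """Smallest starting voltage that is still >= v_min after t_store_s,
    assuming an exponential decay V(t) = V0 * exp(-decay_rate * t)."""
    return v_min * math.exp(decay_rate_per_s * t_store_s)


v_trig = trigger_voltage(V_FRAM_MIN, DECAY_RATE_PER_S, T_STORE_S)
print(f"{v_trig:.4f} V")  # 2.0003 V
```

Under these assumptions the margin above 2.0 V is only a few hundred microvolts, because the store time is very short compared with the time constant of the supply decay.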

VI. EXPERIMENTAL RESULTS

Next, we present our experimental results and compare QUICKRECALL with a state-of-the-art checkpointing solution.

A. Definitions

1) Computation Window: For our experimental setup, Computation Window (CW) is defined as the time for which the MCU is in the ON state. This corresponds to regions B, C, and D in Figure 4. It is important to note that we use a square wave as described in Section V-A.

2) Slowdown: Slowdown is defined as the ratio of the time taken by the program to complete an execution across multiple power cycles to the time taken by the same code to complete executing in a single run, without any loss in power. Mathematically, if the application takes n power cycles to complete its execution, and the duration of the i-th power cycle is given by CW_i,

$$\text{Slowdown} = \frac{\sum_{i=1}^{n} CW_i}{\text{Total Runtime}} \qquad (1)$$

Slowdown happens due to the overhead presented by checkpointing schemes to store and restore the system snapshot. Note that in the above definition, the amount of time the MCU is in the OFF state does not contribute to the calculation of slowdown.

3) Single Life Cycle: We define the execution of a program in a single continuous run in the absence of power loss as a single life cycle execution of the program.
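The slowdown definition in Equation (1) translates directly into code. The sketch below is illustrative; the function name and the example windows are assumptions and do not come from the measurements in this paper.

```python
def slowdown(computation_windows_s, total_runtime_s):
    """Equation (1): summed ON time across all power cycles divided by the
    single life cycle runtime. OFF time is deliberately not counted."""
    return sum(computation_windows_s) / total_runtime_s


# Hypothetical program: 0.5 s single life cycle runtime, finished in
# eleven computation windows of 50 ms each.
windows_s = [0.050] * 11
print(round(slowdown(windows_s, 0.5), 3))  # 1.1
```

A slowdown of 1 means the power interruptions cost nothing; larger values measure how much ON time was spent on anything other than the program itself.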

B. Results

To evaluate QUICKRECALL, three test programs were used, namely CRC, RSA and SENSE. CRC calculates a 16-bit CRC and a 32-bit CRC of a message using polynomials. RSA does a 64-bit encryption on 128 characters. The program then decrypts the encrypted value and verifies correctness. SENSE senses accelerometer data, processes it using a low pass filter, and then performs statistical computations such as finding the minimum, maximum, mean, and standard deviation of the collected data. SENSE implements nested interrupts as well as dynamic memory allocation on the heap¹.

¹ Mementos, a checkpointing scheme for TPCs, does not support dynamic memory allocation.

The overhead introduced by QUICKRECALL per power cycle and the single life cycle execution time for each test program are given in Table I. QUICKRECALL overhead comprises checkpointing (storing) overhead and wake-up overhead. Wake-up overhead comprises an initialization overhead and a restoring overhead. Initialization overhead denotes the time spent waking up the embedded platform and stabilizing the voltage regulator and PLLs, and includes the overhead for recalling the microcontroller and the peripheral state. The duration of the initialization overhead is application and platform dependent. For example, SENSE has a longer wake-up overhead due to the time required for the accelerometer to settle. Restoring overhead is the time taken to restore the checkpointed data. For QUICKRECALL, this refers to the time required to populate the GPRs, SR, SP, and PC registers upon power up. Table I shows that QUICKRECALL introduces constant overheads for storing and restoring operations for each power cycle. Comparatively, for flash-based checkpointing, the data has to be transferred to and from the SRAM and this overhead depends on the stack depth, number of global variables, etc. For example, storing 100 bytes of data in flash adds a further 8 ms overhead to checkpointing. In contrast, QUICKRECALL employs FRAM to implement in-situ checkpointing for the stack, bss, etc., and thus adds zero overhead for this data. Table I shows that the overhead related to data transfer is a constant 12.6 µs for QUICKRECALL. This is an improvement of 100x-1000x over conventional checkpointing schemes using flash². Thus, QUICKRECALL maximizes the time utilized for meaningful computation in each power cycle. The total runtime given in Table I corresponds to the time taken by a program to complete one execution in a single life cycle.
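The per-cycle figures quoted from Table I can be checked with a few lines of arithmetic. This sketch uses only the store, initialization, and restore values from the table and the 50 ms computation window used in the experiments; the variable and function names are illustrative.

```python
US, MS = 1e-6, 1e-3

STORE_S = 8.18 * US
RESTORE_S = 4.4 * US
INIT_S = {"CRC": 1.854 * MS, "RSA": 1.854 * MS, "SENSE": 20 * MS}
COMPUTATION_WINDOW_S = 50 * MS

# Data-transfer part of the overhead: the same for every program.
DATA_TRANSFER_S = STORE_S + RESTORE_S


def per_cycle_overhead_s(program):
    """Store + initialization + restore overhead for one power cycle."""
    return STORE_S + INIT_S[program] + RESTORE_S


print(f"data transfer: {DATA_TRANSFER_S / US:.1f} us")  # 12.6 us
for program in INIT_S:
    overhead_s = per_cycle_overhead_s(program)
    share = 100 * overhead_s / COMPUTATION_WINDOW_S
    print(f"{program}: {overhead_s / MS:.3f} ms per cycle, "
          f"{share:.1f}% of a 50 ms window")
```

The output makes the split visible: the data-transfer part is about 12.6 µs for every program, while the initialization time dominates the per-cycle overhead, roughly 3.7% of a 50 ms window for CRC and RSA and about 40% for SENSE.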

Figure 5 compares the normalized runtime for each program. The baseline system (normalized value of 1) is the microcontroller system, using unified FRAM memory, executing the program across a single computation window. Figure 5 shows that the total execution time for QUICKRECALL single life cycle is the same even when SRAM is used as the data memory.

We implement a conventional checkpointing scheme (henceforth referred to as Checkpoint), which uses trigger points for voltage comparison. Since the MSP430FR5739 does not have a flash memory, we use computed values of flash erase and write latencies for the MSP430F2132 [9] employed in WISP, which is used to evaluate Mementos [1]. We note that the flash read/write timing characteristic is independent of the microcontroller architecture and depends only on the memory device architecture. We assume zero overhead for reading the data back from flash to SRAM. The flash architecture that we consider contains 2 segments of 512 bytes each, which can be used for checkpointing. The erase operation is performed when the flash segment is exhausted. Using this data, we create approximate versions of the loop-latch and function-return modes of Mementos for Checkpoint. As discussed in Section II, checkpointing schemes introduce trigger points, which add overhead to program execution. The choice between the loop-latch mode and function-return mode has a strong dependence on the application program and its structure. For example, the slowdowns for the same CRC program in function-return mode and loop-latch mode were 1.1x and 18x respectively. Comparatively, QUICKRECALL does not add any overhead to normal program execution irrespective of the program structure. QUICKRECALL avoids program re-execution by a simple choice of trigger voltage which guarantees that the ISR has enough power to successfully complete checkpointing. The results in Figure 5 include a conventional checkpointing scheme with trigger points inserted according to the program. Function-return mode was used for CRC while loop-latch mode was implemented for both RSA and SENSE.

² When the flash is not being erased in a power cycle, the only overhead for conventional checkpointing schemes is the write operation.

Fig. 5. Execution Time Comparison (y-axis: normalized runtime; programs: CRC, RSA, SENSE; series: SRAM single life cycle, QuickRecall single life cycle, Checkpoint single life cycle, QuickRecall 50 ms CW, Checkpoint 50 ms CW)
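The flash model used for Checkpoint (two 512-byte segments, erased when exhausted) can be sketched as follows. This is one possible reading, not the authors' evaluation code: it assumes checkpoints are appended to the current segment, that a full segment causes a switch to the other segment, and that a segment is erased only if it already holds data. The write cost is derived from the "100 bytes adds 8 ms" figure in the text and the erase cost uses the lower bound of the 10 ms to 18 ms range; all names are hypothetical.

```python
SEGMENT_BYTES = 512
NUM_SEGMENTS = 2
WRITE_S_PER_BYTE = 8e-3 / 100   # from "storing 100 bytes ... adds 8 ms"
ERASE_S = 10e-3                 # assumed: lower bound of the 10-18 ms range


def checkpoint_costs(checkpoint_sizes):
    """Return the time cost (in seconds) of each checkpoint in sequence."""
    used = [0] * NUM_SEGMENTS   # bytes already written in each segment
    current = 0
    costs = []
    for size in checkpoint_sizes:
        if size > SEGMENT_BYTES:
            raise ValueError("checkpoint does not fit in one segment")
        cost = size * WRITE_S_PER_BYTE
        if used[current] + size > SEGMENT_BYTES:
            # Current segment is exhausted: move on and erase stale data.
            current = (current + 1) % NUM_SEGMENTS
            if used[current] > 0:
                cost += ERASE_S
                used[current] = 0
        used[current] += size
        costs.append(cost)
    return costs


costs_ms = [round(c * 1e3) for c in checkpoint_costs([100] * 12)]
print(costs_ms)  # [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 18, 8]
```

In this model most checkpoints pay only the write cost, as footnote 2 notes, and an erase is added whenever both segments have filled up.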

The computation window was set to 50 ms by feeding a square wave to the Vdd of the experimenter board from a function generator. Figure 5 compares the slowdown of QUICKRECALL with Checkpoint. Our results show that for all test cases, the overhead of the inserted trigger points in a Checkpoint single life cycle is more than that of a power-cycled QUICKRECALL implementation with a 50 ms computation window.

QUICKRECALL has similar slowdowns for both CRC and RSA with a 50 ms computation window. This is due to the same overhead per power cycle for both the programs, as shown in Table I. On the other hand, due to the variation in stack depth for CRC and RSA at each checkpoint, the overhead and thus the slowdown is significantly different for the two programs for Checkpoint. SENSE has a larger slowdown for both QUICKRECALL and Checkpoint. This is due to the overhead presented during wake-up by the initialization routine. The larger overhead consumes a significant portion of the computation window. Therefore, more power cycles are required to complete program execution. For flash-based systems, more power cycles mean more checkpointing operations and hence, more erase operations depending upon the checkpoint data size.

Figure 6 compares the slowdown of QUICKRECALL with Checkpoint when executing RSA. The computation window is varied by using the function generator. A duty cycle of 8% is maintained throughout the experiment. Predictably, QUICKRECALL does not slow down the program as much as Checkpoint, and its slowdown is almost 1 for larger computation windows. Additionally, as Figure 6 shows, due to the large overhead incurred for Checkpoint, it cannot guarantee correct operation without re-executions for computation windows less than 15 ms, which is the minimum time required for an erase and write operation. On the other hand, QUICKRECALL works for computation windows as small as 5 ms without re-executions. The extremely low overhead of QUICKRECALL gives a 3x improvement in the computation window size for which the program can execute. This is a major step in enabling TPCs to perform computations in power-deficient conditions.

Fig. 6. RSA Slowdown with QuickRecall Single Lifecycle as Baseline (x-axis: Computation Window (ms); y-axis: Slowdown (x); series: QuickRecall Slowdown, Checkpointing Slowdown; marked: region where conventional checkpointing does not work)

VII. CONCLUSION

In this work, we have successfully implemented and demonstrated QUICKRECALL, a scheme which reduces the checkpointing overhead by 100x-1000x in each power cycle and which completes a complex computation across power cycles without re-execution at any stage. Our work enables transiently powered computers to do computations even when they receive power for periods as low as 5 ms.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation (NSF) under grants CNS-0953468 and CCF-1018358. The opinions expressed here represent those of the authors and not necessarily of NSF.

REFERENCES

[1] B. Ransford, J. Sorber, and K. Fu, "Mementos: system support for long-running computation on rfid-scale devices," SIGPLAN Not., vol. 46, no. 3, pp. 159–170, Mar. 2011. [Online]. Available: http://doi.acm.org/10.1145/1961296.1950386

[2] B. Ransford, S. Clark, M. Salajegheh, and K. Fu, "Getting things done on computational rfids with energy-aware checkpointing and voltage-aware scheduling," in Proceedings of the 2008 conference on Power aware computing and systems, ser. HotPower'08. Berkeley, CA, USA: USENIX Association, 2008, pp. 5–5. [Online]. Available: http://dl.acm.org/citation.cfm?id=1855610.1855615

[3] Y. Yang, L. Wang, D. K. Noh, H. K. Le, and T. F. Abdelzaher, "Solarstore: enhancing data reliability in solar-powered storage-centric sensor networks," in Proceedings of the 7th international conference on Mobile systems, applications, and services, ser. MobiSys '09. New York, NY, USA: ACM, 2009, pp. 333–346. [Online]. Available: http://doi.acm.org/10.1145/1555816.1555850

[4] "Msp430fr573x datasheet," Texas Instruments, April 2013. [Online]. Available: http://www.ti.com/lit/ds/symlink/msp430fr5739.pdf

[5] J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: transparent checkpointing under unix," in Proceedings of the USENIX 1995 Technical Conference Proceedings, ser. TCON'95. Berkeley, CA, USA: USENIX Association, 1995, pp. 18–18. [Online]. Available: http://dl.acm.org/citation.cfm?id=1267411.1267429

[6] J. S. Plank, "An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance," Knoxville, TN, USA, Tech. Rep., 1997.

[7] S. Baek, J. Choi, D. Lee, and S. H. Noh, "Energy-efficient and high-performance software architecture for storage class memory," ACM Trans. Embed. Comput. Syst., vol. 12, no. 3, pp. 81:1–81:22, Apr. 2013. [Online]. Available: http://doi.acm.org/http://dx.doi.org/10.1145/2442116.2442131

[8] A. Mirhoseini, E. Songhori, and F. Koushanfar, "Idetic: A high-level synthesis approach for enabling long computations on transiently-powered asics," in Pervasive Computing and Communications (PerCom), 2013 IEEE International Conference on, 2013, pp. 216–224.

[9] "Msp430f21x2 datasheet slas578j," Texas Instruments, January 2012. [Online]. Available: http://www.ti.com/lit/ds/symlink/msp430f2132.pdf

[10] G. R. Fox, F. Chu, and T. Davenport, "Current and future ferroelectric nonvolatile memory technology," Journal of Vacuum Science and Technology B, vol. 19, no. 5, 2001.

[11] "Msp-exp430fr5739 fram experimenter board user guide," Texas Instruments, January 2013. [Online]. Available: http://www.ti.com/lit/ug/slau343b/slau343b.pdf
