
  • 7/26/2019 QUICKRECALL: A Low Overhead HW/SW Approach for Enabling Computations across Power Cycles in Transiently P

    1/6

    QUICKRECALL: A Low Overhead HW/SW Approach for Enabling Computations across Power Cycles in Transiently Powered Computers

    Hrishikesh Jayakumar, Arnab Raha, and Vijay Raghunathan
    School of Electrical and Computer Engineering, Purdue University

    Email: {hjayakum, araha, vr}@purdue.edu

    Abstract: Transiently Powered Computers (TPCs) are a new class of batteryless embedded systems that depend solely on energy harvested from external sources for performing computations. Enabling long-running computations on TPCs is a major challenge due to the highly intermittent nature of the power supply (often bursts of < 100 ms), resulting in frequent system reboots. Prior work seeks to address this issue by frequently checkpointing system state in flash memory, preserving it across power cycles. However, this involves a substantial overhead due to the high erase/write times of flash memory. This paper proposes the use of FRAM, an emerging non-volatile memory technology that combines the benefits of SRAM and flash, to seamlessly enable long-running computations in TPCs. We propose a lightweight, in-situ checkpointing technique for TPCs using FRAM that decreases the time taken for saving and restoring a checkpoint to only 12.6 µs, which is over two orders of magnitude lower than the corresponding overhead using flash. We have implemented and evaluated our technique, QUICKRECALL, using the TI MSP430FR5739 FRAM-enabled microcontroller. Experimental results show that our highly efficient checkpointing translates to a significant speedup (1.4x - 4.5x) in program execution time.

    I. INTRODUCTION

    Transiently powered computers (TPCs) [1] represent a new class of ultra-low power embedded computing platforms that are batteryless and rely solely on external power sources for their energy supply. Examples of such TPCs include computational RFID tags [2], batteryless sensors [3], etc. Successfully performing computations on TPCs is a major challenge due to the unpredictable and highly intermittent nature of the power supply. For example, a TPC may receive power in small intermittent bursts (often less than 100 ms), far shorter than the time required to execute most programs.

    Existing techniques to address this challenge are based on the idea of frequent checkpointing of system state. When power loss is imminent, a snapshot (checkpoint) of system state (e.g., processor registers, contents of SRAM) is stored to flash memory, which is non-volatile. During the next burst of power, the system reboots, restores state from the stored checkpoint, and resumes program execution. Thus, long-running programs execute gradually, in small increments, as and when power becomes available. However, checkpointing to flash involves a significant time and energy overhead due to the high erase/write times of flash memory (tens of ms). As a result, a large portion of the time when a TPC receives power (henceforth referred to as the ON time) is spent performing checkpointing, which limits the amount of time available for program execution. More importantly, if the ON time is less than the time required for storing and retrieving checkpoints, the TPC can never successfully complete program execution.

    Recent advances in semiconductor technology have resulted in new forms of memory technologies such as Ferroelectric RAM (FRAM), Magnetoresistive RAM (MRAM), etc., that combine the speed, flexibility, and endurance of SRAM with the non-volatility of flash, all at a very low power consumption. This has led to the possibility of unified memory, where the same type of memory technology is used as RAM and for non-volatile program and data storage. Low-power microcontrollers with integrated FRAM are already commercially available. For example, the TI MSP430FR5739 has 16KB of FRAM that can be used as unified memory [4]. This paper makes a case for (and demonstrates the benefits of) using such emerging non-volatile memories in TPCs. Specifically, this paper makes the following contributions:

    • To the best of our knowledge, this is the first work to investigate the use of emerging non-volatile memory technologies (specifically FRAM) in TPCs to seamlessly enable long-running computations in the presence of frequent power interruptions.

    • We propose a lightweight, in-situ checkpointing technique, called QUICKRECALL, for TPCs that use FRAM. QUICKRECALL can save and restore a checkpoint in just 12.6 µs, which is over two orders of magnitude lower than the corresponding overhead using flash memory.

    • We have implemented QUICKRECALL using a TI MSP430FR5739 microcontroller and evaluated it using three typical embedded application programs. Experimental results show that the highly efficient checkpointing in QUICKRECALL results in a significant reduction (as much as 4.5x) in program execution time, compared to a state-of-the-art flash-based checkpointing technique.

    • QUICKRECALL enables TPCs to perform computations when ON times are as small as 5 ms, compared to previous flash-based checkpointing methods, which require a minimum ON time of 15 ms.

    The remainder of this paper is organized as follows. Section II describes related work. Section III makes the case for using FRAM as unified memory in TPCs. Section IV presents the design requirements (and tradeoffs) for enabling efficient checkpointing in TPCs that use FRAM. Section V describes our implementation of QUICKRECALL. Section VI presents our experimental results, and Section VII concludes the paper.

    II. RELATED WORK

    Checkpointing schemes have long been used for fault tolerance in large-scale distributed systems. Checkpointing, performed at previously determined trigger points in the program, stores a snapshot of system state in non-volatile memory. In case of a fault, the system rolls back to the most recent checkpoint and continues execution [5], [6]. Trigger points are usually periodic in nature or programmer-inserted. While checkpointing an application in this manner ensures a roll-back point, it impedes normal program execution and causes additional overhead. A checkpointing scheme for large-scale systems using FRAM was previously explored in [7]. However, the underlying design goals and targeted systems are very different from those considered in this paper.

    2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems
    1063-9667/14 $31.00 © 2014 IEEE
    DOI 10.1109/VLSID.2014.63

    Mementos [1] is a checkpointing solution aimed at TPCs. It instruments user-written code at compile time with trigger points, which compare the supply voltage with a threshold and trigger a checkpoint if the supply voltage is below the threshold. Trigger points are inserted at the end of each loop iteration or after a function return statement. Mementos reduces this extra overhead by using a timer to enable the trigger points only periodically. Typical checkpointing methods, like Mementos, use flash memory for storing checkpoints. Flash memory erases/writes are costly because of the large performance and energy overhead they incur. Moreover, the overhead of checkpointing the same program varies from one trigger point to the next because the stack depth differs at each trigger point. In contrast, by using an emerging non-volatile memory (NVM), QUICKRECALL sidesteps the data transfer latency and maximizes the time available for computation in each power cycle. Finally, Idetic [8] targets ASIC implementations of applications and embeds checkpoints during the behavioral synthesis process. In contrast, QUICKRECALL targets off-the-shelf microcontrollers that use an emerging NVM and maximizes application execution time on them.
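To make the trigger-point cost concrete, here is a minimal host-side Python sketch of loop-latch style trigger points. It is not Mementos' code, which instruments programs at compile time. The function name, the Vdd samples, the threshold, and the timer period (counted in loop iterations) are all hypothetical. The sketch models only when a voltage comparison happens and when a checkpoint would be taken.

```python
def loop_latch_trigger_points(vdd_samples, v_threshold, timer_period):
    """Simulate a loop with a trigger point at the end of every iteration.

    vdd_samples[i] is the (hypothetical) supply voltage seen at the end of
    iteration i. The trigger point only performs its voltage comparison when
    the timer has enabled it (modelled as every `timer_period` iterations),
    and it takes a checkpoint when Vdd is below `v_threshold`.

    Returns (iterations at which a checkpoint was taken, number of comparisons).
    """
    checkpoints = []
    comparisons = 0
    for i, vdd in enumerate(vdd_samples):
        # ... the loop body of the user program would execute here ...
        timer_enabled = (i % timer_period == 0)
        if not timer_enabled:
            continue  # trigger point skipped: no comparison cost this iteration
        comparisons += 1
        if vdd < v_threshold:
            checkpoints.append(i)  # a real system would copy state to flash here


    return checkpoints, comparisons


trace = [3.0, 2.9, 2.5, 2.1, 2.0, 1.9]
print(loop_latch_trigger_points(trace, v_threshold=2.2, timer_period=2))  # ([4], 3)
print(loop_latch_trigger_points(trace, v_threshold=2.2, timer_period=1))  # ([3, 4, 5], 6)
```

Every comparison runs on the normal execution path whether or not it leads to a checkpoint. In this model, the timer trades that cost against how quickly a voltage drop is noticed.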

    III. MOTIVATION

    Recent years have witnessed the emergence of non-volatile memories (NVMs) such as Ferroelectric RAM (FRAM) and Magnetoresistive RAM (MRAM). In addition to being non-volatile, these memories have distinct advantages over flash in terms of power consumption, performance, endurance, etc.

    A. Ferroelectric RAM: A candidate for unified memory

    The significant performance and energy overhead of flash, due to its inherent device limitations, is the primary motivator for employing FRAM in embedded systems. Flash memory bitcells can only be written from logic 1 to logic 0. Writing a logic 1 to a cell that was previously set to logic 0 requires the flash bitcell to be erased first. Depending upon the flash memory size and architecture, the smallest erasable unit can vary. As an example, for the MSP430F2132 microcontroller used in the BlueWISP RFID platform, the smallest erasable unit is a segment of 512 bytes, and erasing it takes 10 ms to 18 ms. Moreover, an erase operation requires a higher voltage and is therefore energy-expensive [9]. An FRAM memory cell is DRAM-like in structure and uses the polarization of a ferroelectric capacitor to distinguish between the logic states [10]. Thus, FRAM is random-access for reads and writes and requires no erase operations. Even though FRAM involves a destructive read, the write-back is hidden and instantaneous, thereby presenting almost no latency overhead to the system. Consequently, while flash memories present asymmetric read/write latencies, FRAM access latencies are symmetric. Another limitation of flash memory is its limited endurance. While the endurance limit for flash memory is around 10⁵ erase/write cycles, FRAM devices have an endurance almost 10 orders of magnitude greater than flash [7].
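The write asymmetry described above can be modeled at the bit level. The Python sketch below is a simplified model, not a device driver. It assumes the usual convention that an erased flash byte reads as all ones (0xFF), which is implied by writes being able to go only from 1 to 0. It also uses the 512-byte segment size quoted for the MSP430F2132. All function names are hypothetical.

```python
FLASH_SEGMENT_SIZE = 512  # bytes; smallest erasable unit on the MSP430F2132 per the text


def flash_program(old_byte: int, new_byte: int) -> int:
    """Programming a flash byte can only clear bits (1 -> 0), so the stored
    value becomes the bitwise AND of the old contents and the new data."""
    return old_byte & new_byte


def needs_erase(old_byte: int, new_byte: int) -> bool:
    """True if writing new_byte over old_byte would require turning a 0 bit
    back into a 1, which flash can only do by erasing the whole segment."""
    return (old_byte & new_byte) != new_byte


def fram_write(old_byte: int, new_byte: int) -> int:
    """FRAM is write-in-place: the old contents are irrelevant."""
    return new_byte


def segments_to_erase(addresses):
    """Segments that must be erased before bytes at these addresses can be
    rewritten with a 0 -> 1 change (a whole segment per touched address)."""
    return sorted({addr // FLASH_SEGMENT_SIZE for addr in addresses})


print(needs_erase(0xFF, 0x0F))          # False: only clears bits
print(needs_erase(0x0F, 0xF0))          # True: needs 0 -> 1 transitions
print(hex(flash_program(0x0F, 0xF0)))   # 0x0 (data corrupted without an erase)
print(hex(fram_write(0x0F, 0xF0)))      # 0xf0
```

In this model, changing even one byte that needs a 0 to 1 transition costs a full 512-byte segment erase (10 ms to 18 ms on the MSP430F2132). The FRAM write simply replaces the value in place.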

    B. Unified Memory for TPCs

    Converting any embedded program into an executable binary involves the steps of compiling, assembling, and linking. The assembler creates object files for each source file of the program. The object files contain program information in contiguous memory regions called sections. The linker combines sections across multiple object files into a single executable file and maps these sections into memory. Conventionally, the linker allocates the uninitialized sections to RAM for run-time initialization, whereas the program code and the initial values of initialized global/static variables reside in ROM. For example, in the MSP430 microcontroller, the bss, data, sysmem (heap), and stack sections reside in RAM, while all the other sections are allocated to ROM. The previous subsection established the advantages of FRAM over flash. In addition, FRAM's random-access and write-in-place properties allow it to be utilized as RAM, enabling it to serve as a unified memory technology.

    IV. DESIGN METHODOLOGY

    Next, we discuss the requirements and tradeoffs associated with enabling computations across power cycles in TPCs.

    A. Checkpointing and Wake-up Overhead

    To enable computations across power cycles, the application needs to store the program and processor states to non-volatile storage before power is lost. In conventional checkpointing schemes, the checkpoint triggers are either periodic in nature or inserted by the programmer at selected locations in the program. While checkpointing an application in this manner ensures a roll-back point, it impedes normal program execution and causes additional overhead.

    The first design choice that QUICKRECALL makes is that, for transiently powered computers, only a drop in the supply voltage should trigger a checkpoint of the current system state. Such a checkpointing scheme does not impede normal program execution and only triggers a checkpoint if power loss is imminent. However, in such a scheme, it is imperative that checkpointing be successfully completed before power is lost. QUICKRECALL ensures this by choosing an appropriate trigger voltage to interrupt the program and initiate the checkpointing operation.

    We define the system context to consist of program state, processor state, and the state of the configuration registers of the various peripheral subsystems. Each of these components must be retained for successful recall and resumption of computation across power cycles. QUICKRECALL introduces very little overhead to retain the state of the TPC. The overhead introduced consists of checkpointing overhead and wake-up overhead. Checkpointing overhead is defined as the time required to store the system state before a power loss. Wake-up overhead is defined as the time spent restoring the system state on power-up. A discussion of the overheads introduced and the design choices for QUICKRECALL follows.

    1) Retaining Program State: The program state consists of the values of the global variables, stack, heap, bss, etc., in use by the program. Conventionally, the linker maps the code section to non-volatile storage such as flash, and the data, bss, and stack sections to volatile SRAM. Figure 1 shows the proposed linker map of a microcontroller system that uses an NVM technology such as FRAM as unified memory. The linker partitions the same non-volatile memory to hold all the sections, so the non-volatile memory now acts as both the conventional RAM and the ROM. As a result, when the MCU powers off, the RAM data is saved in situ. Similarly, on wake-up, the program can pick up the data from exactly the same address locations. Because it uses FRAM as RAM, QUICKRECALL improves on previous checkpointing schemes: no time or energy overhead is incurred to retain RAM data.

    Fig. 1. QUICKRECALL linker map
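The difference between the conventional and unified linker maps can be expressed as a section-to-memory mapping. The Python sketch below is illustrative only. It uses TI-style section names with a leading dot for the bss, data, sysmem, and stack sections named in the text. The .text and .const entries are assumptions, standing in for "all the other sections". The sketch asks which sections would have to be copied out explicitly to survive a power loss.

```python
# Which memory each linker section lives in; SRAM contents vanish at power loss.
VOLATILE = {"SRAM"}

CONVENTIONAL_MAP = {
    ".text": "flash",    # program code (illustrative name)
    ".const": "flash",   # read-only constants (illustrative name)
    ".data": "SRAM",     # initialized globals (initial values copied from ROM at boot)
    ".bss": "SRAM",      # uninitialized globals
    ".sysmem": "SRAM",   # heap
    ".stack": "SRAM",    # call stack
}

# Unified memory: the linker places every section in FRAM.
UNIFIED_FRAM_MAP = {section: "FRAM" for section in CONVENTIONAL_MAP}


def sections_lost_on_power_failure(memory_map):
    """Sections that must be checkpointed explicitly to survive a power loss."""
    return sorted(s for s, mem in memory_map.items() if mem in VOLATILE)


print(sections_lost_on_power_failure(CONVENTIONAL_MAP))  # ['.bss', '.data', '.stack', '.sysmem']
print(sections_lost_on_power_failure(UNIFIED_FRAM_MAP))  # []
```

Only the processor registers and peripheral configuration remain volatile under the unified map. The following subsections address those.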

    2) Retaining Processor State: Capturing the processor state involves retaining the state of the microcontroller register file, which includes the program counter (PC), stack pointer (SP), status register (SR), and general-purpose registers (GPRs). The number of GPRs in use depends on the program state: for the same program, a different number of GPRs might be in use at different execution stages. A software approach to tracking the number of active GPRs would hamper normal program execution. Hence, QUICKRECALL saves the values of all the registers to FRAM during checkpointing. This step involves data transfer and introduces some checkpointing overhead.

    3) Retaining Microcontroller and Peripheral Settings: Common microcontroller applications use multiple peripherals to gather data from sensors and to communicate with the external world. The microcontroller and peripheral settings that have to be configured before execution include GPIO directions, GPIO functions, and clock properties. For transiently powered computers, it is necessary to restore the MCU and peripheral state on wake-up to resume correct program execution. QUICKRECALL addresses this problem by carefully structuring the programs used for transiently powered computers. Every time the microcontroller boots up, the configuration registers are re-initialized to their last known state. This step contributes to the wake-up overhead, and its duration is application and program dependent.

    B. Software Flow

    Writing applications aimed at resuming computations across power cycles requires minor variations to the traditional embedded programming style. Previous work has tried to address this for large-scale systems [7]. QUICKRECALL uses a similar flow, although we design for scenarios where the system is severely power-deprived. QUICKRECALL places two requirements on the programmer in this regard. First, the programmer has to use the predefined QUICKRECALL global variables, which store the state. The addresses of the program symbols are fixed in the ELF executable; hence, the memory map for a particular program remains unchanged across reboots. The extra variables required for data retention are allocated in the bss as uninitialized global variables. The variables required for QUICKRECALL include a checkpoint flag, in addition to memory to store the GPRs, SR, SP, and PC. Second, the programmer has to specify the initialization routine in a function that QUICKRECALL can use while recalling the system state.

    Fig. 2. QUICKRECALL Software Flow

    As shown in Figure 2, the QUICKRECALL software flow has two boot sequences upon powering up. Upon boot, QUICKRECALL checks the checkpoint flag, which is declared globally. An unset flag indicates a normal boot sequence. The normal boot sequence initiates a call to the main() function. The main() function begins by initializing the MCU and peripherals, and then executes the program. While the application program executes, the MCU is interrupted if the supply voltage goes below a preset trigger voltage (Vtrig). Section V-B explains how we arrived at Vtrig for an example platform. Upon entering the ISR, the program context is pushed onto the stack. QUICKRECALL then stores the current SR, SP, and GPRs in predefined variables. Note that these registers now reflect the ISR state. QUICKRECALL then sets the checkpoint flag and saves the PC. The system is now safe against power loss and can recall this state on the following boot. The ISR spends any remaining time comparing the supply voltage to the trigger voltage. If the supply voltage rises above the trigger voltage, a reverse context switch takes place and the program continues until the supply voltage drops again. Otherwise, the microcontroller loses power and shuts off with the entire system state saved for a future recall.


    On the next power-up, a set checkpoint flag launches the QUICKRECALL boot sequence, which recalls the system state. It begins by restoring the stack pointer, after which the MCU and peripheral subsystems are re-initialized. The stack pointer is restored first so that the re-initialization routine can use the remainder of the stack without corrupting the checkpointed portion of the stack. QUICKRECALL's boot sequence then stalls execution until the supply voltage surpasses the trigger voltage. Note that, even though the peripherals have been initialized, if the MCU powers off before reaching the trigger voltage, the previous state remains intact because the ISR is not triggered. Otherwise, all the registers are reinstated and the checkpoint flag is cleared. QUICKRECALL resumes by re-entering the ISR (Figure 2). The ISR returns, the program context is popped from the stack, and the program continues execution oblivious to the power interruption.
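The two boot sequences and the checkpoint ISR can be traced with a small host-side Python model. This is a simulation of the flow in Figure 2, not MSP430 firmware. FramGlobals, init_mcu_and_peripherals, run, and the fixed per-cycle step budget are hypothetical. Treating R4 to R15 as the GPRs follows the MSP430 register file, in which R0 to R3 serve as the PC, SP, SR, and constant generator. The toy program keeps a running sum in a volatile register so that losing the register file would be visible.

```python
from dataclasses import dataclass, field

# Registers checkpointed by the ISR: PC, SP, SR and the general-purpose registers.
REGISTERS = ["PC", "SP", "SR"] + [f"R{i}" for i in range(4, 16)]


@dataclass
class FramGlobals:
    """Hypothetical model of QUICKRECALL's predefined globals, kept in FRAM (bss)."""
    checkpoint_flag: bool = False
    saved: dict = field(default_factory=dict)  # storage for SR, SP, GPRs and PC
    result: int = 0                            # ordinary program data, retained in situ


def init_mcu_and_peripherals():
    """Stand-in for the programmer-supplied init routine (foo() in Section V-A).
    Configuration registers are volatile, so this runs on every boot."""
    return {"clock_mhz": 8, "gpio_dir": 0x01}  # illustrative values only


def checkpoint_isr(cpu, fram):
    """Runs when Vdd falls below Vtrig: save SR, SP and GPRs, set the flag, save PC."""
    for reg in REGISTERS[1:]:
        fram.saved[reg] = cpu[reg]
    fram.checkpoint_flag = True
    fram.saved["PC"] = cpu["PC"]


def boot(fram):
    """One power-up. Returns the (volatile) register file to start executing from."""
    cpu = {reg: 0 for reg in REGISTERS}   # registers are lost at power-off
    if fram.checkpoint_flag:              # QUICKRECALL boot sequence
        cpu["SP"] = fram.saved["SP"]      # SP first, so init uses only the free stack
        init_mcu_and_peripherals()
        # (the real sequence stalls here until Vdd rises above Vtrig)
        cpu.update(fram.saved)            # reinstate all registers
        fram.checkpoint_flag = False
    else:                                 # normal boot: main() starts from scratch
        init_mcu_and_peripherals()
    return cpu


def run(total_steps, steps_per_power_cycle):
    """Sum 0..total_steps-1 in register R4 across power cycles.
    Returns (result, number of power cycles used)."""
    fram = FramGlobals()
    cycles = 0
    while True:
        cycles += 1
        cpu = boot(fram)
        budget = steps_per_power_cycle    # steps that fit before Vdd < Vtrig
        while cpu["PC"] < total_steps:
            if budget == 0:
                checkpoint_isr(cpu, fram)  # power is then lost; next iteration = next boot
                break
            cpu["R4"] += cpu["PC"]         # work held in a volatile GPR
            cpu["PC"] += 1
            budget -= 1
        else:                              # program finished within this power cycle
            fram.result = cpu["R4"]
            return fram.result, cycles


print(run(100, 7))   # (4950, 15)
print(run(10, 100))  # (45, 1)
```

The result matches an uninterrupted run because the ISR captures the registers and every other piece of state already lives in FRAM. Only the number of power cycles changes.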

    QUICKRECALL supports all normal programming paradigms, including dynamic memory allocation and nested interrupts. Dynamic memory allocation requires no additional performance overhead because the heap is also retained in situ in the non-volatile memory. The data in the heap is stored as a linked-list structure in the FRAM. The memory allocation engine stores the control variables used to keep track of free and allocated heap segments in the bss. Since QUICKRECALL retains the state of the bss across power cycles, the heap and the memory allocation engine work seamlessly across power cycles without presenting any overhead. Enabling nested interrupts allows the QUICKRECALL ISR to be triggered even while another interrupt is being serviced. Note that nested interrupts are not enabled within the QUICKRECALL interrupt vector itself, which performs the checkpointing.

    V. DESIGN IMPLEMENTATION

    This section describes the implementation and experimental setup for QUICKRECALL.

    A. Experimental setup

    Fig. 3. Experimental Setup

    Figure 3 shows our experimental setup. We use the Texas Instruments MSP-EXP430FR5739 Experimenter's board [11] for implementing QUICKRECALL. The board is equipped with an MSP430FR5739 microcontroller that has 1KB of SRAM and 16KB of FRAM [4]. An Analog Devices comparator (CMP401) is interfaced with the GPIO pins to provide a digital output after comparing a reference Vtrig to the microcontroller's Vdd. To supply a variable Vdd, we used a function generator that supplied a square wave at varying frequencies and duty cycles. The observed positive supply voltage gradient for the function generator was 1000 V/s.

    We modified the linker to allocate the data, bss, stack, and heap sections to the on-chip FRAM. Note that when the system reboots across power cycles, the global variables should not be initialized again. Hence, they are defined in the bss section of the code. The initialization routine that configures the MCU and peripherals (e.g., setting GPIO directions, clock frequency, etc.) is defined in a function (say foo()). foo() is invoked in both the main() function and the QUICKRECALL boot sequence. Lastly, we modified the boot sequence and the environment pre-initialization routines as shown in Figure 2 to implement QUICKRECALL.

    Fig. 4. Microcontroller State with Vdd

    B. Determining Vtrig

    The choice of a suitable Vtrig is crucial for QUICKRECALL to avoid unwanted wait periods and incomplete checkpoints. The MSP430FR5739 has a non-programmable internal Supply Voltage Supervisor (SVS) that monitors Vdd and regulates the voltage to the microcontroller core at a constant 1.5 V. Figure 4 is a conceptual graph that shows the state of the microcontroller as Vdd changes. The comparator monitors Vdd, and its output governs the program execution window. In Figure 4, shaded region A denotes the region where Vdd is less than the SVSon voltage; the internal SVS keeps the microcontroller powered off in this region. B shows the region where the microcontroller is powered on but program execution is stalled. In this region, Vdd is below the predefined Vtrig, and the program waits because a supply voltage of at least Vtrig is necessary to guarantee data retention. Region C denotes the window when the program executes. As soon as Vdd drops below Vtrig, the QUICKRECALL interrupt is triggered and the microcontroller operation moves from region C to D. In D, the program executes the ISR to save the system state, and any remaining time in this region is spent monitoring the supply voltage. The microcontroller is switched off once Vdd drops below SVSoff.

    TABLE I
    PROGRAM EXECUTION TIME (CPU FREQ = 8 MHz)

    Program   QUICKRECALL Overhead (a)       Total Runtime
    CRC       8.18 µs + 1.854 ms + 4.4 µs    551 ms
    RSA       8.18 µs + 1.854 ms + 4.4 µs    11.12 s
    SENSE     8.18 µs + 20 ms + 4.4 µs       79 ms

    (a) Store Overhead + Initialization Overhead + Restore Overhead

    Vtrig has to be greater than both SVSoff and SVSon, since they dictate the microcontroller on/off states. For the MSP430FR5739 microcontroller, the typical voltages for SVSoff and SVSon are 1.88 V and 1.93 V, respectively. The minimum voltage required for safe FRAM operation is 2.0 V [4]. The chosen Vtrig has to guarantee correct FRAM operation for the duration of checkpointing. The overhead of storing a checkpoint at a CPU frequency of 8 MHz, measured using an oscilloscope, is 8.18 µs. For our experimental setup, the decay of Vdd observed once the power supply is cut off follows e^{-17.56t}. Using capacitor discharge equations, we determine a Vtrig of 2.0003 V for successfully implementing QUICKRECALL.
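The Vtrig calculation and the region logic of Figure 4 can be reproduced together. The sketch assumes that, after the supply is cut, Vdd(t) = Vtrig · e^(-17.56t) with t in seconds. This is our reading of the decay law above, and the units are consistent with the stated result. FRAM then stays at or above 2.0 V for the whole 8.18 µs store phase only if Vtrig ≥ 2.0 · e^(17.56 × 8.18×10⁻⁶) ≈ 2.0003 V. The region classifier is a hypothetical state machine over sampled Vdd values, using the typical SVS thresholds quoted in the text.

```python
import math

SVS_ON = 1.93           # V, typical SVSon from the text
SVS_OFF = 1.88          # V, typical SVSoff from the text
V_FRAM_MIN = 2.0        # V, minimum for safe FRAM operation
DECAY_RATE = 17.56      # 1/s, assuming Vdd(t) = Vtrig * exp(-17.56 t)
T_CHECKPOINT = 8.18e-6  # s, measured store overhead at 8 MHz


def min_trigger_voltage(v_min, decay_rate, t_checkpoint):
    """Smallest Vtrig such that Vtrig * exp(-decay_rate * t) >= v_min for all
    t <= t_checkpoint, i.e. FRAM stays usable until the checkpoint is stored."""
    return v_min * math.exp(decay_rate * t_checkpoint)


V_TRIG = min_trigger_voltage(V_FRAM_MIN, DECAY_RATE, T_CHECKPOINT)


def classify_regions(vdd_samples, v_trig=V_TRIG):
    """Label each Vdd sample with its Fig. 4 region: A (off), B (on, stalled
    below Vtrig), C (executing), D (checkpoint ISR / monitoring after a drop)."""
    state, labels = "A", []
    for v in vdd_samples:
        if state == "A":
            if v >= SVS_ON:                  # SVS releases the MCU
                state = "C" if v >= v_trig else "B"
        elif state in ("B", "D"):
            if v < SVS_OFF:                  # SVS holds the MCU off
                state = "A"
            elif v >= v_trig:                # start, or resume after reverse context switch
                state = "C"
        elif state == "C" and v < v_trig:    # QUICKRECALL interrupt fires
            state = "D"
        labels.append(state)
    return labels


print(round(V_TRIG, 4))  # 2.0003
print(classify_regions([0.0, 1.95, 2.1, 2.1, 1.99, 1.90, 1.85, 1.0]))
# ['A', 'B', 'C', 'C', 'D', 'D', 'A', 'A']
```

The SVS hysteresis (on at 1.93 V, off at 1.88 V) is why a sample of 1.90 V is labeled D after a drop but would not wake the MCU from region A.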

    VI. EXPERIMENTAL RESULTS

    Next, we present our experimental results and compare QUICKRECALL with a state-of-the-art checkpointing solution.

    A. Definitions

    1) Computation Window: For our experimental setup, the Computation Window (CW) is defined as the time for which the MCU is in the ON state. This corresponds to regions B, C, and D in Figure 4. Note that we use a square wave as described in Section V-A.

    2) Slowdown: Slowdown is defined as the ratio of the time taken by the program to complete an execution across multiple power cycles to the time taken by the same code to complete executing in a single run, without any loss of power (TotalRuntime). Mathematically, if the application takes n power cycles to complete its execution, and the duration of the ith power cycle is given by CW_i,

    $$\mathrm{Slowdown} = \frac{\sum_{i=1}^{n} \mathrm{CW}_i}{\mathrm{TotalRuntime}} \qquad (1)$$

    Slowdown arises from the overhead that checkpointing schemes incur to store and restore the system snapshot. Note that in the above definition, the time the MCU spends in the OFF state does not contribute to the slowdown.

    3) Single Life Cycle: We define the execution of a program in a single continuous run, in the absence of power loss, as a single life cycle execution of the program.

    B. Results

    To evaluate QUICKRECALL, three test programs were used, namely CRC, RSA, and SENSE. CRC calculates a 16-bit CRC and a 32-bit CRC of a message using polynomials. RSA performs a 64-bit encryption on 128 characters; the program then decrypts the encrypted value and verifies correctness. SENSE senses accelerometer data, processes it using a low-pass filter, and then performs statistical computations such as finding the minimum, maximum, mean, and standard deviation of the collected data. SENSE uses nested interrupts as well as dynamic memory allocation on the heap.¹

    ¹ Mementos, a checkpointing scheme for TPCs, does not support dynamic memory allocation.

    The overhead introduced by QUICKRECALL per power cycle and the single life cycle execution time for each test program are given in Table I. QUICKRECALL overhead consists of checkpointing (storing) overhead and wake-up overhead. Wake-up overhead consists of an initialization overhead and a restoring overhead. Initialization overhead denotes the time spent waking up the embedded platform and stabilizing the voltage regulator and PLLs, and it includes the overhead of recalling the microcontroller and peripheral state. The duration of the initialization overhead is application and platform dependent. For example, SENSE has a longer wake-up overhead due to the time required for the accelerometer to settle. Restoring overhead is the time taken to restore the checkpointed data. For QUICKRECALL, this refers to the time required to populate the GPRs, SR, SP, and PC registers upon power-up. Table I shows that QUICKRECALL introduces constant overheads for storing and restoring in each power cycle. By comparison, for flash-based checkpointing, the data has to be transferred to and from the SRAM, and this overhead depends on the stack depth, the number of global variables, etc. For example, storing 100 bytes of data in flash adds a further 8 ms to checkpointing. In contrast, QUICKRECALL employs FRAM to implement in-situ checkpointing for the stack, bss, etc., and thus adds zero overhead to retain them. Table I shows that the overhead related to data transfer is a constant 12.6 µs for QUICKRECALL. This is an improvement of 100x-1000x over conventional checkpointing schemes using flash.² Thus, QUICKRECALL maximizes the time utilized for meaningful computation in each power cycle. The total runtime given in Table I corresponds to the time taken by a program to complete one execution in a single life cycle.
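The per-cycle arithmetic behind Table I is sketched below. The sketch simply adds up the table's store, initialization, and restore terms. It separates the data-transfer part (store + restore, which is about 12.6 µs) from the application-dependent initialization. It then compares the data-transfer part with the 8 ms quoted for writing 100 bytes to flash. The dictionary and function names are hypothetical.

```python
US, MS = 1e-6, 1e-3  # unit multipliers, in seconds

# Per-power-cycle overheads from Table I: (store, initialization, restore), in seconds.
TABLE_I = {
    "CRC": (8.18 * US, 1.854 * MS, 4.4 * US),
    "RSA": (8.18 * US, 1.854 * MS, 4.4 * US),
    "SENSE": (8.18 * US, 20 * MS, 4.4 * US),
}

FLASH_100_BYTES = 8 * MS  # extra checkpointing time quoted for storing 100 bytes in flash


def overhead_breakdown(store, init, restore):
    """Return (data-transfer overhead, total per-cycle overhead) in seconds.
    Only store + restore moves checkpoint data; init is application-dependent."""
    return store + restore, store + init + restore


for name, terms in TABLE_I.items():
    transfer, total = overhead_breakdown(*terms)
    print(f"{name}: transfer {transfer / US:.2f} us, total {total / MS:.3f} ms, "
          f"flash(100 B)/transfer = {FLASH_100_BYTES / transfer:.0f}x")
```

The ratio printed here covers only a 100-byte flash write. Adding a 10 ms to 18 ms segment erase pushes the comparison toward the upper end of the 100x-1000x range quoted above.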

    Figure 5 compares the normalized runtime for each program. The baseline system (normalized value of 1) is the microcontroller system, using unified FRAM memory, executing the program in a single computation window. Figure 5 shows that the single life cycle execution time with QUICKRECALL is the same as when SRAM is used as the data memory.

    Fig. 5. Execution Time Comparison (normalized runtime of CRC, RSA, and SENSE for: SRAM Single Life cycle, QuickRecall Single Life cycle, Checkpoint Single Life cycle, QuickRecall 50ms CW, Checkpoint 50ms CW)

    ² When the flash is not being erased in a power cycle, the only overhead for conventional checkpointing schemes is the write operation.

    We implement a conventional checkpointing scheme (henceforth referred to as Checkpoint), which uses trigger points for voltage comparison. Since the MSP430FR5739 does not have flash memory, we use computed values of flash erase and write latencies for the MSP430F2132 [9] employed in WISP, which is used to evaluate Mementos [1]. We note that flash read/write timing characteristics are independent of the microcontroller architecture and depend only on the memory device architecture. We assume zero overhead for reading the data back from flash to SRAM. The flash architecture that we consider contains 2 segments of 512 bytes each, which can be used for checkpointing. The erase operation is performed when a flash segment is exhausted. Using this data, we create approximate versions of the loop-latch and function-return modes of Mementos for Checkpoint. As discussed in Section II, checkpointing schemes introduce trigger points, which add overhead to program execution. The choice between loop-latch mode and function-return mode depends strongly on the application program and its structure. For example, the slowdowns for the same CRC program in function-return mode and loop-latch mode were 1.1x and 18x, respectively. By comparison, QUICKRECALL does not add any overhead to normal program execution, irrespective of the program structure. QUICKRECALL avoids program re-execution through a simple choice of trigger voltage, which guarantees that the ISR has enough power to successfully complete checkpointing. The results in Figure 5 include a conventional checkpointing scheme with trigger points inserted according to the program: function-return mode was used for CRC, while loop-latch mode was implemented for both RSA and SENSE.
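The two-segment flash baseline can be approximated with a simple cost model. The sketch below is our own rough model, not the authors' Checkpoint implementation. It assumes write time scales linearly at 8 ms per 100 bytes (from Section VI-B). It also assumes a 10 ms erase, the lower end of the 10 ms to 18 ms MSP430F2132 range, and that both segments start erased. Reads back to SRAM are free, as in the text. All names are hypothetical.

```python
SEGMENT_BYTES = 512
NUM_SEGMENTS = 2
WRITE_S_PER_BYTE = 8e-3 / 100  # assumed linear: 8 ms per 100 bytes
ERASE_S = 10e-3                # assumed: lower end of the 10-18 ms segment erase time
QUICKRECALL_S = 12.6e-6        # store + restore per power cycle (Table I)


def flash_checkpoint_cost(checkpoint_sizes):
    """Total time (s) and erase count for a sequence of flash checkpoints.

    Checkpoints are appended to the current segment (both segments start
    erased). When the next one does not fit, the other segment is erased and
    becomes current, so the newest complete checkpoint is never erased before
    a newer one has been written.
    """
    segment, offset = 0, 0
    total_s, erases = 0.0, 0
    for size in checkpoint_sizes:
        if size > SEGMENT_BYTES:
            raise ValueError("checkpoint larger than a flash segment")
        if offset + size > SEGMENT_BYTES:  # current segment exhausted
            segment = (segment + 1) % NUM_SEGMENTS
            total_s += ERASE_S
            erases += 1
            offset = 0
        total_s += size * WRITE_S_PER_BYTE
        offset += size
    return total_s, erases


cost, erases = flash_checkpoint_cost([100] * 12)
print(f"flash: {cost * 1e3:.1f} ms with {erases} erases; "
      f"QUICKRECALL: {12 * QUICKRECALL_S * 1e3:.4f} ms")
```

Under these assumptions, twelve 100-byte checkpoints cost roughly 116 ms on flash (96 ms of writes plus two erases). In this model, the flash cost grows with the checkpoint size, while QUICKRECALL's per-cycle cost stays constant.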

    The computation window was set to 50 ms by feeding a square wave from a function generator to the Vdd of the experimenter board. Figure 5 compares the slowdown of QUICKRECALL with that of Checkpoint. Our results show that, for all test cases, the Checkpoint single life cycle runtime, which includes only the overhead of the inserted trigger points, exceeds the runtime of a power-cycled QUICKRECALL implementation with a 50 ms computation window.

    QUICKRECALL has similar slowdowns for CRC and RSA with a 50ms computation window, because both programs incur the same overhead per power cycle, as shown in Table I. For Checkpoint, on the other hand, the stack depth at each checkpoint differs between CRC and RSA, so the overhead, and thus the slowdown, is significantly different for the two programs. SENSE has a larger slowdown under both QUICKRECALL and Checkpoint. This is due to the overhead introduced at wake-up by its initialization routine, which consumes a significant portion of each computation window. Therefore, more power cycles are required to complete program execution. For flash-based systems, more power cycles mean more checkpointing operations and hence more erase operations, depending on the checkpoint data size.
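The SENSE result follows from simple bookkeeping: a fixed wake-up cost shrinks the useful part of every computation window, so more windows are needed, and on flash every extra checkpoint moves the log closer to the next erase. A minimal sketch, assuming a fixed per-window wake-up cost, one checkpoint per window, and a hypothetical append-only checkpoint log that must be erased whenever it fills one flash segment; the function names, the 200-byte checkpoint, the 512-byte segment, and the timing values are all hypothetical, not measurements from the paper.

```python
import math


def power_cycles_needed(work_ms, window_ms, wakeup_ms, ckpt_ms):
    """Computation windows needed to finish work_ms of useful computation.

    Each window spends wakeup_ms on initialization and state restore and
    reserves ckpt_ms for saving state; only the remainder advances the program.
    Returns None if nothing is left for useful work (no forward progress).
    """
    useful_ms = window_ms - wakeup_ms - ckpt_ms
    if useful_ms <= 0:
        return None
    return math.ceil(work_ms / useful_ms)


def flash_erases(num_checkpoints, ckpt_bytes, segment_bytes):
    """Erases for a hypothetical append-only checkpoint log in one flash segment.

    The segment starts erased and holds segment_bytes // ckpt_bytes checkpoints;
    it must be erased before every checkpoint that no longer fits.
    """
    per_segment = segment_bytes // ckpt_bytes
    if per_segment == 0:
        raise ValueError("a checkpoint must fit in one segment")
    return max(0, num_checkpoints - 1) // per_segment


# Hypothetical workload: 200 ms of computation split into 50 ms windows.
light = power_cycles_needed(200, 50, wakeup_ms=2, ckpt_ms=0.02)   # short init
heavy = power_cycles_needed(200, 50, wakeup_ms=20, ckpt_ms=0.02)  # long, SENSE-like init
print(light, heavy, flash_erases(light, 200, 512), flash_erases(heavy, 200, 512))
```

With these made-up values, raising the wake-up cost from 2ms to 20ms raises the number of 50ms windows from 5 to 7 and, for the flash log, the number of erases from 2 to 3.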

    Figure 6 compares the slowdown of QUICKRECALL with that of Checkpoint when executing RSA. The computation window is varied using the function generator, while a duty cycle of 8% is maintained throughout the experiment. As expected, QUICKRECALL slows the program down much less than Checkpoint, and its slowdown approaches 1 for larger computation windows. Additionally, as Figure 6 shows, the large overhead of Checkpoint means that it cannot guarantee correct operation without re-executions for computation windows shorter than 15ms, the minimum time required for an erase and write operation. QUICKRECALL, on the other hand, works without re-executions for computation windows as small as 5ms. Its extremely low overhead thus yields a 3x reduction in the minimum computation window (5ms versus 15ms) in which the program can execute. This is a major step toward enabling TPCs to perform computations in power-deficient conditions.
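The two trends in Figure 6 can be reproduced qualitatively with a first-order model in which each computation window loses a fixed overhead to wake-up and checkpointing. This is a simplified sketch, not the paper's measurement method: it counts only powered-on time, ignores the off-time implied by the 8% duty cycle and the single-life-cycle baseline, and uses hypothetical per-window overheads of 1ms for QUICKRECALL and 15ms for Checkpoint (the latter chosen to match the erase-and-write time quoted above). The 15ms and 5ms minimum windows are the values reported in the text, and their ratio gives the 3x figure.

```python
import math


def slowdown(window_ms, overhead_ms):
    """Approximate slowdown when each window loses overhead_ms to wake-up and checkpointing.

    Only window_ms - overhead_ms of every window advances the program, so the
    powered-on time grows by window_ms / (window_ms - overhead_ms). Returns
    math.inf when the overhead fills the window (no progress without re-execution).
    """
    if window_ms <= overhead_ms:
        return math.inf
    return window_ms / (window_ms - overhead_ms)


# Minimum windows reported in the text: 15 ms for Checkpoint, 5 ms for QuickRecall.
CHECKPOINT_MIN_MS, QUICKRECALL_MIN_MS = 15, 5
print("window-size improvement:", CHECKPOINT_MIN_MS / QUICKRECALL_MIN_MS)

# Hypothetical per-window overheads, used only to show the shape of the curves.
for w in (5, 10, 20, 50, 100):
    print(w, slowdown(w, overhead_ms=1.0), slowdown(w, overhead_ms=15.0))
```

Under this model the slowdown tends to 1 as the window grows and becomes unbounded once the overhead fills the window, mirroring the region in Figure 6 where conventional checkpointing does not work.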

    [Figure 6 plot: slowdown (x) versus computation window (0 to 100 ms) for QuickRecall and conventional checkpointing, with a marked region where conventional checkpointing does not work.]

    Fig. 6. RSA Slowdown with QuickRecall Single Lifecycle as Baseline

    VII. CONCLUSION

    In this work, we have successfully implemented and demonstrated QUICKRECALL, a scheme that reduces the checkpointing overhead in each power cycle by 100x to 1000x and completes a complex computation across power cycles without re-execution at any stage. Our work enables transiently powered computers to perform computations even when they receive power for periods as short as 5ms.

    ACKNOWLEDGMENT

    This work was supported in part by the National Science Foundation (NSF) under grants CNS-0953468 and CCF-1018358. The opinions expressed here represent those of the authors and not necessarily of NSF.

    REFERENCES

    [1] B. Ransford, J. Sorber, and K. Fu, "Mementos: system support for long-running computation on RFID-scale devices," SIGPLAN Not., vol. 46, no. 3, pp. 159-170, Mar. 2011. [Online]. Available: http://doi.acm.org/10.1145/1961296.1950386

    [2] B. Ransford, S. Clark, M. Salajegheh, and K. Fu, "Getting things done on computational RFIDs with energy-aware checkpointing and voltage-aware scheduling," in Proceedings of the 2008 conference on Power aware computing and systems, ser. HotPower'08. Berkeley, CA, USA: USENIX Association, 2008, pp. 5-5. [Online]. Available: http://dl.acm.org/citation.cfm?id=1855610.1855615

    [3] Y. Yang, L. Wang, D. K. Noh, H. K. Le, and T. F. Abdelzaher, "SolarStore: enhancing data reliability in solar-powered storage-centric sensor networks," in Proceedings of the 7th international conference on Mobile systems, applications, and services, ser. MobiSys '09. New York, NY, USA: ACM, 2009, pp. 333-346. [Online]. Available: http://doi.acm.org/10.1145/1555816.1555850

    [4] "MSP430FR573x datasheet," Texas Instruments, April 2013. [Online]. Available: http://www.ti.com/lit/ds/symlink/msp430fr5739.pdf

    [5] J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: transparent checkpointing under UNIX," in Proceedings of the USENIX 1995 Technical Conference Proceedings, ser. TCON'95. Berkeley, CA, USA: USENIX Association, 1995, pp. 18-18. [Online]. Available: http://dl.acm.org/citation.cfm?id=1267411.1267429

    [6] J. S. Plank, "An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance," Knoxville, TN, USA, Tech. Rep., 1997.

    [7] S. Baek, J. Choi, D. Lee, and S. H. Noh, "Energy-efficient and high-performance software architecture for storage class memory," ACM Trans. Embed. Comput. Syst., vol. 12, no. 3, pp. 81:1-81:22, Apr. 2013. [Online]. Available: http://dx.doi.org/10.1145/2442116.2442131

    [8] A. Mirhoseini, E. Songhori, and F. Koushanfar, "Idetic: A high-level synthesis approach for enabling long computations on transiently-powered ASICs," in Pervasive Computing and Communications (PerCom), 2013 IEEE International Conference on, 2013, pp. 216-224.

    [9] "MSP430F21x2 datasheet SLAS578J," Texas Instruments, January 2012. [Online]. Available: http://www.ti.com/lit/ds/symlink/msp430f2132.pdf

    [10] G. R. Fox, F. Chu, and T. Davenport, "Current and future ferroelectric nonvolatile memory technology," Journal of Vacuum Science and Technology B, vol. 19, no. 5, 2001.

    [11] "MSP-EXP430FR5739 FRAM experimenter board user guide," Texas Instruments, January 2013. [Online]. Available: http://www.ti.com/lit/ug/slau343b/slau343b.pdf
