COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE CONSTRAINED MOBILE MULTIMEDIA SYSTEMS Kyoungwoo Lee (final defense) Prof. Nikil Dutt Prof. Nalini Venkatasubramanian Prof. Lichun Bao Nov. 26, 2008

Page 1: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE CONSTRAINED MOBILE MULTIMEDIA SYSTEMS

Kyoungwoo Lee (final defense)

Prof. Nikil Dutt
Prof. Nalini Venkatasubramanian
Prof. Lichun Bao

Nov. 26, 2008

Page 2: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Contents

Thesis Motivation
Thesis Proposal – Cooperative, Cross-layer Methods
  PPC (Partially Protected Caches)
  EAVE (Error-Aware Video Encoding)
  CC-PROTECT (Cooperative, Cross-layer Protection)
Thesis Contribution and Future Direction

Page 3: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Mobile Multimedia Embedded Systems

Applications: Web Browsing, Image Browsing, Satellite TV, Video Streaming, Animation, Video Conferencing, Map Routing, Mobile TV, 3D Graphics

Resource-limited mobile devices!

Main problem is to achieve low power with high performance, high QoS, and high reliability

Page 4: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Reliability

Reliability is an emerging and critical concern in mobile devices
  New enhanced technology makes devices vulnerable to errors due to high complexity and high integration
    Exponential increase of soft error rate as technology scales [Baumann, 05]
  Mobile applications are running close to humans
    In pervasive computing, failures of healthcare mobile devices cause serious results

Redundancy techniques incur high overheads of power and performance
  TMR (Triple Modular Redundancy) may exceed 200% overheads without optimization [Nieuwland, 06]

Challenging to optimize multiple properties (e.g., performance, power, QoS, and reliability) in mobile embedded systems

Page 5: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Soft error is becoming an every-second concern!

Soft Error Rate (SER) – FIT (Failures in Time) = number of errors in 10^9 hours

Configuration | SER (FIT) | MTTF | Reason
1 Mbit @ 0.13 µm | 1000 | 104 years |
64 MB @ 0.13 µm | 64x8x1000 | 81 days | High Integration
128 MB @ 65 nm | 2x1000x64x8x1000 | 1 hour | Technology scaling and Twice Integration
A system @ 65 nm | 2x2x1000x64x8x1000 | 30 minutes | Memory takes up 50% of soft errors in a system
A system with voltage scaling @ 65 nm | 100x2x2x1000x64x8x1000 | 18 seconds | Exponential relationship b/w SER & Supply Voltage
A system with voltage scaling @ flight (35,000 ft) @ 65 nm | 800x100x2x2x1000x64x8x1000 | 0.02 seconds | High Intensity of Neutron Flux at flight (high altitude)
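
The MTTF column of the table can be reproduced from the FIT column. The sketch below assumes a constant failure rate, so that MTTF = 10^9 hours / FIT; the function and variable names are illustrative and do not come from the slides. Under the same formula the 1 Mbit row (1000 FIT) works out to roughly 114 years rather than the listed 104, so that row may rest on a convention the slide does not show.

```python
# Sketch: convert the slide's FIT products into MTTF values.
# Assumption (not stated on the slide): MTTF = 10^9 hours / FIT.

FIT_WINDOW_HOURS = 1e9  # FIT counts failures per 10^9 device-hours


def mttf_hours(fit: float) -> float:
    """Return the mean time to failure, in hours, for a given FIT rate."""
    return FIT_WINDOW_HOURS / fit


# FIT products exactly as written in the table rows.
fit_64mb = 64 * 8 * 1000                  # 64 MB = 64 x 8 Mbit at 1000 FIT per Mbit
fit_128mb_65nm = 2 * 1000 * fit_64mb      # twice the capacity, 1000x from scaling
fit_system = 2 * fit_128mb_65nm           # memory accounts for half of the errors
fit_voltage_scaled = 100 * fit_system     # 100x from voltage scaling
fit_in_flight = 800 * fit_voltage_scaled  # 800x from neutron flux at 35,000 ft

if __name__ == "__main__":
    print(f"64 MB @ 0.13 um:       {mttf_hours(fit_64mb) / 24:.1f} days")
    print(f"128 MB @ 65 nm:        {mttf_hours(fit_128mb_65nm):.2f} hours")
    print(f"system @ 65 nm:        {mttf_hours(fit_system) * 60:.1f} minutes")
    print(f"with voltage scaling:  {mttf_hours(fit_voltage_scaled) * 3600:.1f} seconds")
    print(f"in flight:             {mttf_hours(fit_in_flight) * 3600:.3f} seconds")
```

Each multiplier in the FIT column divides the MTTF by the same factor, which is why the table falls from days to fractions of a second in four steps.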

Page 6: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Errors and Failures in Mobile Embedded Systems

Faults or Errors can cause Failures

[Figure: system abstraction layers (Application, Middleware/OS, Hardware, Network) annotated with example errors: Bug, Exception, Soft Error, Packet Loss]

Page 7: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Errors and Error Control Schemes at Hardware

Failures | Causes | Metrics | Traditional Approaches
Soft Errors, Hard Failures, System Crash | External Radiations, Thermal Effects, Power Loss, Poor Design, Aging | FIT, MTTF, MTBF | Spatial Redundancy (TMR, Duplex, RAID-1, etc.) and Data Redundancy (EDC, ECC, RAID-5, etc.)

• FIT: Failures in Time (10^9 hours)
• MTTF: Mean Time To Failure
• MTBF: Mean Time b/w Failures
• TMR: Triple Modular Redundancy
• EDC: Error Detection Codes
• ECC: Error Correction Codes
• RAID: Redundant Array of Inexpensive Drives

Hardware failures are increasing as technology scales
  (e.g.) SER increases by up to 1000 times [Mastipuram, 04]

Redundancy techniques are expensive
  (e.g.) ECC-based protection in caches can incur 95% performance penalty [Li, 05]

Page 8: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Errors and Error Control Schemes at Software

Failures | Causes | Metrics | Traditional Approaches
Wrong outputs, Infinite loops, Crash | Incomplete Specification, Poor software design, Bugs, Unhandled Exception | Number of Bugs/Klines, QoS, MTTF, MTBF | Spatial Redundancy (N-version Programming, etc.), Temporal Redundancy (Checkpoints and Backward Recovery, etc.)

• QoS: Quality of Service

Software errors become dominant as system complexity increases
  (e.g.) Several bugs per thousand lines of code

Hard to debug, and redundancy techniques are expensive
  (e.g.) Backward recovery with checkpoints is inappropriate for real-time applications

Page 9: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Errors and Error Control Schemes in Networks

Failures | Causes | Metrics | Traditional Approaches
Data Losses, Deadline Misses, Node (Link) Failure, System Down | Network Congestion, Noise/Interference, Malicious Attacks | Packet Loss Rate, Deadline Miss Rate, SNR, MTTF, MTBF, MTTR | Resource Reservation, Data Redundancy (CRC, etc.), Temporal Redundancy (Retransmission, etc.), Spatial Redundancy (Replicated Nodes, MIMO, etc.)

• SNR: Signal to Noise Ratio
• MTTR: Mean Time To Recovery
• CRC: Cyclic Redundancy Check
• MIMO: Multiple-In Multiple-Out

Network is unreliable (especially wireless networks)
  Joint approaches across OSI layers have been investigated for minimal costs [Vuran, 06][Schaar, 07]

Page 10: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Conventional Approaches

Most redundancy techniques incur overheads in terms of performance, power, area, etc.

Conventional TMR (Triple Modular Redundancy) can incur 200% overheads without optimization.

Backward Recovery with Checkpoints cannot guarantee the completion time of a task.

Recently proposed techniques have focused on the cost reduction without losing reliability

However, they still incur overheads

10

Page 11: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Thesis Problem Statement

Study tradeoffs among system properties

(e.g.) Redundancy incurs energy overheads while DVS increases SER significantly

Examine errors and error control schemes across system abstraction layers

(e.g.) network errors & error-resilient video encoding, soft errors & ECC or EDC, etc.

Maximize reliability with minimal costs of power and performance for mobile embedded systems

11

Page 12: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Cross-Layer Methods

Cross-layer approaches:
  Aim at system-level optimization
  Integrate and coordinate techniques across system layers

Classification [Srivastava, 05]
  Top-down, Bottom-up, or Both directions
    Top-down – PPC, PDVS [GRACE], etc.
    Bottom-up – EAVE, etc.
    Both directions – CC-PROTECT, etc.
  Coupling or Merging layers
    Dynamo [Mohapatra], xTune [Kim], etc.

[Figure: top-down, bottom-up, coupling, and merging interactions between layers]

Page 13: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Cross-Layer Approaches – GRACE

GRACE project @ UIUC [W. Yuan Ph.D. thesis in ’04 and A. F. Harris III, Ph.D. thesis in ’06]

QoS/Power tradeoffs
Primarily OS adaptation for power management in multimedia mobile devices
Network adaptation for power management in multimedia communications

[Figure: GRACE layers (Application, Operating System, Hardware) [GRACE, 05]]

Page 14: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Cross-Layer Approaches – DYNAMO & FORGE

DYNAMO middleware for FORGE project @ UCI [S. Mohapatra Ph.D. thesis in ’05 and R. Cornea Ph.D. thesis in ’07]

QoS/Power tradeoffs for mobile embedded systems
Middleware-driven coordination and proxy-based cooperation
  1. Content transcoding at the application layer
  2. Network traffic shaping at the network layer
  3. Backlight (LCD display) setting at the hardware layer
  4. NIC shutdown, CPU DVS/DFS at the hardware layer

[Figure: Application, Middleware/OS, and Hardware layers with a Proxy Server (NW & MW); labels 1 to 4 mark the four techniques]

Page 15: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Cross-Layer Approaches – xTune

xTune framework @ UCI and SRI [M. Kim Ph.D. thesis in ’08]
QoS/Power/Timeliness adaptation for distributed real-time embedded systems
A Formal Methodology for cross-layer tuning and verifiable timeliness of Mobile Embedded Systems

[Figure: Handheld, Server, and Proxy Server, each with Application, Middleware/OS, and Hardware layers]

Page 16: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Thesis Proposed Contribution

Thesis proposes a cross-layer design methodology for mobile multimedia embedded systems with minimal costs

Reliability/QoS/Power/Performance system optimization for mobile multimedia systems

Cooperative, Cross-Layer Protection
  PPC, EAVE, & CC-PROTECT
  Low-cost reliability

Page 17: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Overview of Thesis Proposals

[Figure: the three proposals across layers. Hardware: unprotected cache and protected cache (ECC, EDC). Application: EAVE, with an error controller (e.g., frame drop) and an error-resilient encoder (e.g., PBPAIR) turning the original video into error-aware video for a mobile video application over error-prone networks (packet loss, frame drop). MW/OS: monitor and translate SER, error detection, correction, QoS.]

PPC (Partially Protected Caches)
EAVE (Error-Aware Video Encoding)
CC-PROTECT (Cooperative, Cross-layer Protection)

Page 18: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Contents

Thesis Motivation
Thesis Proposal – Cooperative, Cross-layer Methods
  PPC (Partially Protected Caches)
  EAVE
  CC-PROTECT
Thesis Contribution and Future Direction

Page 19: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Conventional Protection for Caches

Conventional Protected Caches
  Unaware of fault tolerance at applications
  Implement a redundancy technique such as ECC to protect all data for every access

Overkill for multimedia applications
  ECC (e.g., a Hamming Code) incurs a performance penalty of up to 95%, a power overhead of up to 22%, and an area cost of up to 25%

[Figure: cache with ECC; high cost, unaware of application]

Page 20: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

PPC (Partially Protected Caches)

Observation
  Not all data are equally failure critical
    Multimedia data vs. control variables

Propose PPC architectures to provide an unequal protection for mobile multimedia systems [Lee, CASES06][Lee, TVLSI08]
  Unprotected cache and Protected cache at the same level of memory hierarchy
  Protected cache is typically smaller to keep power and delay the same as or less than those of Unprotected cache

[Figure: PPC with an unprotected cache and a protected cache above memory]

Page 21: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

PPC for Multimedia Applications

Propose a selective data protection [Lee, CASES06]
  Unequal protection at hardware layer exploiting error-tolerance of multimedia data at application layer
  Simple data partitioning for multimedia applications
    Multimedia data is failure non-critical
    All other data is failure critical

[Figure: PPC with an unprotected cache and a protected cache above memory; labels: fault tolerance, power/delay reduction]

Page 22: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

PPC for General Applications

DPExplore [Lee, PPCDIPES08]
  Explore the partitioning space by exploiting awareness of the vulnerability of each data page

Vulnerable time
  Data in the cache is vulnerable during any interval that ends with a read by the CPU or a write-back to memory
  Pages causing high vulnerable time are failure critical
  Vulnerable time closely estimates failure rate

[Figure: timeline t0 to t3 of cached data with the events incoming data, read, write, and eviction; PPC with an unprotected cache and a protected cache above memory]
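
The vulnerable-time idea on this slide can be sketched as an interval sum over the events of one cache line. This is an interpretation, not the DPExplore implementation: it assumes an interval counts as vulnerable when it ends with a CPU read or with the eviction of a dirty line (a write-back), and that intervals ending with a write or with the eviction of a clean line are harmless because the possibly corrupted value is overwritten or discarded. The event names and the `vulnerable_time` function are hypothetical.

```python
from typing import List, Tuple

# (timestamp, kind), where kind is one of "fill", "read", "write", "evict".
Event = Tuple[float, str]


def vulnerable_time(events: List[Event]) -> float:
    """Sum the intervals in which a soft error could reach the CPU or memory.

    Assumes `events` is sorted by timestamp and describes a single cache line.
    """
    total = 0.0
    dirty = False      # True once the line has been written since the fill
    previous = None    # timestamp of the preceding event
    for timestamp, kind in events:
        if previous is not None:
            ends_in_read = kind == "read"
            ends_in_write_back = kind == "evict" and dirty
            if ends_in_read or ends_in_write_back:
                total += timestamp - previous
        if kind == "write":
            dirty = True
        elif kind == "fill":
            dirty = False
        previous = timestamp
    return total


if __name__ == "__main__":
    # Incoming data at t0, read at t1, write at t2, eviction at t3.
    line = [(0.0, "fill"), (3.0, "read"), (5.0, "write"), (9.0, "evict")]
    print(vulnerable_time(line))
```

Summing this value over all lines of a page gives one possible per-page score, and pages with the highest scores would be the candidates for the protected cache.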

Page 23: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Summary – PPC

Not all data are equally failure critical
Propose a PPC architecture to provide unequal protection
  Support unequal protection at the hardware layer by exploiting error-tolerance and vulnerability at the application
  Present cost-efficient reliability

Related Publications
  [Lee, CASES06] – PPC for multimedia embedded systems
  [Lee, PPCDIPES08] – PPC for general applications
  [Lee, TVLSI08] – PPC and design space exploration

Under submission
  [Lee, TODAES??] – PPC for general applications and instruction caches

[Figure: page partitioning algorithms use the error-tolerance of MM data and the vulnerability of data and code to split application data and code into failure non-critical (FNC) and failure critical (FC) parts, which are mapped into the unprotected and protected caches of the PPC]

Page 24: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Contents

Thesis Motivation
Thesis Proposal – Cooperative, Cross-layer Methods
  PPC
  EAVE (Error-Aware Video Encoding)
  CC-PROTECT
Thesis Contribution and Future Direction

Page 25: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Active Error Exploitation – Intentional Frame Drop

[Figure: mobile video application over error-prone networks with packet loss; sender path Enc (CPU) and Tx (WNI), receiver path Rx (WNI) and Dec (CPU); frame drop types FDT-1, FDT-2, and FDT-3]

• FDT: Frame Drop Type
• Enc: Encoding, Dec: Decoding
• WNI: Wireless Network Interface

Intentional Frame Drop (one way to actively exploit errors) can result in energy reduction for each operation

FDT-1 affects the following components with respect to power, performance, and QoS in mobile video applications

Page 26: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Error-Aware Video Encoding

Propose EE-PBPAIR [Lee, DIPES08]
  Intentionally drop frames at video encoding
  Reduce the energy consumption for video encoding
  Maintain the video quality by exploiting error-resilience of PBPAIR

[Figure: Error-Aware Video Encoder (EAVE). An error controller (e.g., frame dropping) and an error-resilient encoder (e.g., PBPAIR) turn the original video into error-aware video, which is sent over error-prone networks with packet loss]

• EIR: Error Injection Rate

Page 27: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Summary – EAVE

Intentional Frame Drop is one way to exploit errors actively
Propose an error-aware video encoding (EE-PBPAIR)
  Present a knob (EIR) to adjust the amount of errors considering the QoS feedback
  Maintain the video quality using error-resilience of PBPAIR

Related Publication
  [Lee, DIPES08] – EE-PBPAIR

Considering Submission
  [Lee, TECS??] – Generalized idea for error-resilient video encodings

• EIR: Error Injection Rate
• PLR: Packet Loss Rate

Error Rate = PLR + EIR

[Figure: at the application layer, the error controller and the error-resilient video encoder produce error-aware video data; EIR is set at the middleware from PLR and QoS feedback coming from the network or decoding side; energy reduction at the hardware layer covers CPU, memory, and WNIC]
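
The relation Error Rate = PLR + EIR and the QoS feedback on the EIR knob can be sketched as a small control step. This is a minimal illustration under stated assumptions, not the EE-PBPAIR controller: the step size, the EIR ceiling, the use of PSNR as the QoS measure, and both function names are hypothetical.

```python
def total_error_rate(plr: float, eir: float) -> float:
    """Error rate handed to the error-resilient encoder: PLR + EIR, capped at 1."""
    return min(1.0, plr + eir)


def adapt_eir(eir: float, measured_psnr: float, target_psnr: float,
              step: float = 0.01, max_eir: float = 0.5) -> float:
    """One feedback step on the EIR knob (step and max_eir are illustrative).

    Drop fewer frames when quality is below the target, and more frames
    (saving more energy) when quality meets or exceeds it.
    """
    if measured_psnr < target_psnr:
        eir -= step
    else:
        eir += step
    return min(max_eir, max(0.0, eir))


if __name__ == "__main__":
    eir = 0.10
    plr = 0.10
    # Hypothetical PSNR feedback samples (dB) against a 32 dB target.
    for psnr in (30.5, 31.2, 32.4, 33.0):
        eir = adapt_eir(eir, psnr, target_psnr=32.0)
        print(f"PSNR {psnr:.1f} dB -> EIR {eir:.2f}, "
              f"error rate {total_error_rate(plr, eir):.2f}")
```

A fixed step keeps the sketch short; a real controller would likely need smoothing of the feedback signal to avoid oscillating around the target.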

Page 28: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Contents

Thesis Motivation
Thesis Proposal – Cooperative, Cross-layer Methods
  PPC
  EAVE
  CC-PROTECT (Cooperative Cross-layer Protection)
Thesis Contribution and Future Direction

Page 29: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Errors and Error Control Schemes – No Coupling

Different errors and their protection techniques have not been considered jointly

No coupling and no cooperation

Cooperating control schemes in a cross-layer manner can open a new avenue

[Figure: mobile video application over error-prone networks; layers Application, Middleware/OS, Network, and Hardware, with soft errors at the hardware and packet loss in the network]

Page 30: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

PPC still incurs overheads due to ECC protection

[Figure: PPC with an unprotected cache and a protected cache above memory]

Propose PPC architectures to provide an unequal protection for mobile multimedia systems [Lee, TVLSI08]
  Unprotected cache and Protected cache at the same level of memory hierarchy

PPC still incurs overheads due to the highly expensive ECC protection at the protected cache
  29% energy reduction compared to the protected cache
  10% energy overhead compared to the unprotected cache

Page 31: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

PBPAIR is energy-inefficient in error-free network

PBPAIR is error-resilient and energy-efficient in general
PBPAIR may not be energy efficient in case of error-free network

[Figure: PBPAIR takes PLR (packet loss in the network) and Intra_Threshold as inputs]

• PBPAIR: Probability-Based Power Aware Intra Refresh [Kim, 06]

Page 32: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Outline of CC-PROTECT

[Figure: CC-PROTECT across layers. Application: Error-Aware Video Encoder (EAVE) with an error controller (e.g., frame drop) and an error-resilient encoder (e.g., PBPAIR), turning the original video (frame K, frame K+1) into error-aware video for a mobile video application over error-prone networks (packet loss, frame drop, QoS loss, feedback). MW/OS: monitor and translate SER, trigger selective DFR (Drop & Forward Recovery) as opposed to BER (Backward Error Recovery), and support EAVE and PPC parameters. Hardware: PPC with an unprotected cache and a protected cache using EDC for error detection.]

Page 33: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Energy Saving

BASE = Error-prone video encoding + unprotected cache
HW-PROTECT = Error-prone video encoding + PPC with ECC
APP-PROTECT = Error-resilient video encoding + unprotected cache
MULTI-PROTECT = Error-resilient video encoding + PPC with ECC
CC-PROTECT1 = Error-prone video encoding + PPC with EDC
CC-PROTECT2 = Error-prone video encoding + PPC with EDC + DFR
CC-PROTECT = Error-resilient video encoding + PPC with EDC + DFR

EDC impact
  17% Reduction compared to HW-PROTECT
  4% Reduction compared to BASE

EDC + DFR impact
  36% Reduction compared to HW-PROTECT
  26% Reduction compared to BASE

EDC + DFR + PBPAIR (CC-PROTECT) impact
  56% Reduction compared to HW-PROTECT
  49% Reduction compared to BASE

Page 34: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Summary – CC-PROTECT

Propose the CC-PROTECT approach, which coordinates existing schemes across layers to mitigate the impact of soft errors on the failure rate and video quality in mobile video encoding systems
  PPC (Partially Protected Caches) with EDC (Error Detection Codes) at hardware layer
  DFR (Drop and Forward Recovery) at middleware
  PBPAIR (Probability-Based Power Aware Intra Refresh) at application layer

Demonstrate the effectiveness of low-cost (about 50%) reliability (1,000x) at the minimal cost of QoS (less than 1%)

Related Publication
  [Lee, ACMMM08] – CC-PROTECT

Considering Submission
  [Lee, ACMTOMCCAP??] – Tradeoff space exploration with CC-PROTECT

[Figure: Application: PBPAIR (error resilience); Middleware/OS: DFR (error correction); Hardware: unprotected cache and protected cache (ECC, EDC)]

Page 35: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Contents

Thesis Motivation
Thesis Proposal – Cooperative, Cross-layer Methods
  PPC
  EAVE
  CC-PROTECT
Thesis Contribution and Future Direction

Page 36: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Overall Thesis Contribution

Cross-layer methodology to design mobile multimedia embedded systems with minimal costs
  1. Effective cross-layer approaches for reliability
  2. Low-cost reliability
  3. Expanded trade-off space
  4. Extended applicability of existing techniques

[Figure: layers Application, Middleware/OS, Hardware, and Network with soft errors and packet loss]

Page 37: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Effectiveness of Thesis Proposals (Energy Saving)

PPC: 25% energy reduction, as compared to a conventional protected cache with ECC

EAVE: 30% energy reduction, as compared to a conventional video encoding

CC-PROTECT: 56% energy reduction, as compared to a conventional composition of protections

Page 38: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Publication

[Lee, ACMMM08] K. Lee, A. Shrivastava, M. Kim, N. Dutt, and N. Venkatasubramanian, “Mitigating the impact of hardware defects on multimedia applications – A cross-layer approach”, In ACM International Conference on Multimedia, Oct. 2008.

[Lee, TVLSI08] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, “Partially protected caches to reduce failures due to soft errors in multimedia applications”, In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2008, to appear.

[Lee, DIPES08] K. Lee, M. Kim, N. Dutt, and N. Venkatasubramanian, “Error exploiting video encoder to extend energy/QoS tradeoffs for mobile embedded systems”, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008.

[Lee, PPCDIPES08] K. Lee, A. Shrivastava, N. Dutt, and N. Venkatasubramanian, “Data partitioning techniques for partially protected caches to reduce soft error induced failures”, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008.

[Lee, CASES06] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, “Mitigating soft error failures for multimedia applications by selective data protection”, In Int. Conference on Compilers, Architecture, & Synthesis for Embedded Systems (CASES), Oct. 2006.

[Lee, ICME05] K. Lee, N. Dutt, and N. Venkatasubramanian, “Experimental Study on Energy Consumption of Video Encryption for Mobile Handheld Devices", In IEEE International Conference on Multimedia and Expo (ICME 05), Poster Session, July 2005.

[Mohapatra, IPDPS05] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, and N. Venkatasubramanian, “A cross-layer approach for power-performance optimization in distributed mobile systems”, In Next Generation Software Program in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2005.

[Figure: the publications placed on the system layers (Application, Middleware/OS, Hardware, Network): [Lee, TVLSI08], [Lee, PPCDIPES08], [Lee, CASES06], [Lee, DIPES08], [Lee, ACMMM08], [Mohapatra, IPDPS05], [Lee, ICME05]]

Page 39: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Future Direction

Error Rate Translation/Integration
  Different types of errors
  Different components across system layers

Cross-layer methods for distributed embedded systems (Horizontal Expansion)
  Network-aware methods
  Context-aware approaches

[Figure: mobile video application over error-prone networks; layers Application, Middleware/OS, Hardware, and Network with the errors Soft Error, Packet Loss, Bug, and Exception]

Page 40: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Thank you! Any Questions or Comments?

40

Page 41: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Backup Slides

Page 42: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Why Cross-Layer Approach?

Cross-layer interactions and conflicts arise between system properties
  DVS increases SER exponentially
Over protection or under protection
  All ECC for multimedia data is an overkill
Cross-layer approaches can maximize the reliability with minimal power and performance overheads

Benefits of Cross-layer approaches
  Global system view
  Coordination for intelligent selection
  Adaptation

Cross-layer approaches have been promising to save the resources at the cost of QoS [Mohapatra, 05][Yuan, 04]

• DVS: Dynamic Voltage Scaling
• SER: Soft Error Rate
• ECC: Error Correction Codes
• QoS: Quality of Service

Page 43: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Thesis Proposed Contribution: CC-PROTECT

Cooperative Cross-layer Protection (CC-PROTECT) by exploiting error-awareness and error control schemes across system abstraction layers

Contribution
  Present cost-efficient reliability methods (cooperative cross-layer protection)
  Open expanded tradeoff spaces and operating points
  Rediscover applicability of existing approaches for other purposes

Page 44: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Performance vs. Capacity

Total energy available from a battery is a design issue and is fixed at design time, along with its weight and size

Stark contrast between the linear growth rate of battery capacity and the exponential technology improvement rate of system components

[Udani] Sanjay Udani and Jonathan Smith, “Power management in mobile computing”

Page 45: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Generalized Fault Tolerance Techniques

1) Modular Redundancy
2) N-Version Programming
3) Error-Control Coding
4) Checkpoints and Rollbacks
5) Recovery Blocks

[Chetan, SPC04] S. Chetan, A. Ranganathan, and R. Campbell, “Towards Fault Tolerant Pervasive Computing”, in SPC ’04
[Somani, IEEECom97] A. K. Somani and N. H. Vaidya, “Understanding Fault Tolerance and Reliability”, in IEEE Computer ’97 vol. 30 issue 4

Page 46: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

1) Modular Redundancy

Modular Redundancy
  Multiple identical replicas of hardware modules
  Voter mechanism
    Compare outputs and select the correct output
  Tolerate most hardware faults
  Effective but expensive

[Figure: Producer A (with a fault) and Producer B feed a voter, which passes data to the Consumer]
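
The voter mechanism can be sketched as a majority vote over replica outputs. This is a software illustration of the idea with three replicas (TMR), not a hardware design; `majority_vote`, `run_replicated`, and the example replicas are hypothetical names.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def run_replicated(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Models a replica hit by a fault: one bit of the result is flipped.
    return (x * x) ^ 0b100


if __name__ == "__main__":
    print(run_replicated([square, faulty_square, square], 7))
```

Every input is computed three times, which is where the overhead of 200% or more quoted earlier in the talk comes from.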

Page 47: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

2) N-version Programming

N-version Programming
  Different versions by different teams
    Different versions may not contain the same bugs
  Voter mechanism
  Tolerate some software bugs

[Figure: Program i by Programmer K (with a fault) and Program j by Programmer L run inside Producer A; a voter passes data to the Consumer]

Page 48: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

3) Error-Control Coding

Error-Control Coding
  Replication is effective but expensive
  Error-Detection Coding and Error-Correction Coding
    (example) Parity Bit, Hamming Code, CRC
  Much less redundancy than replication

[Figure: Producer A sends data plus error-control data to the Consumer; a fault corrupts the data in transit]
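
Two of the codes named on this slide can be sketched in a few lines: an even parity bit, which only detects a single-bit error, and a Hamming(7,4) code, which corrects one. The bit layout below (parity bits at positions 1, 2, and 4) is the textbook arrangement and is an assumption here, since the slides do not say which Hamming code the caches use.

```python
from typing import List


def even_parity_bit(bits: List[int]) -> int:
    """Parity bit that makes the total number of 1s even (detection only)."""
    return sum(bits) % 2


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> List[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    sent = hamming74_encode(word)
    received = list(sent)
    received[4] ^= 1                   # a soft error flips one bit
    print(hamming74_decode(received) == word)
```

Four data bits cost three check bits here, against eight extra bits for triplication, which is the "much less redundancy than replication" point of the slide.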

Page 49: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

4) Checkpoints & Rollbacks

Checkpoints and Rollbacks
  Checkpoint
    A copy of an application’s state
    Save it in storage immune to the failures
  Rollback
    Restart the execution from a previously saved checkpoint
  Recover from transient and permanent hardware and software failures

[Figure: the application in Producer A moves from state (K-1) to state K and takes a checkpoint; after a fault it rolls back to the checkpointed state K; data flows to the Consumer]
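
Checkpoint and rollback can be sketched with an in-memory copy of the application state. This is a minimal illustration: the class and function names are hypothetical, the deep copy stands in for "storage immune to the failures", and a `RuntimeError` stands in for a detected fault.

```python
import copy
from typing import Callable, Dict, List


class CheckpointedTask:
    """Holds application state plus the most recent checkpoint of it."""

    def __init__(self, state: Dict[str, int]) -> None:
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self) -> None:
        """Save a copy of the current state."""
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        """Discard the current state and restore the last checkpoint."""
        self.state = copy.deepcopy(self._saved)


def run_steps(task: CheckpointedTask,
              steps: List[Callable[[Dict[str, int]], None]],
              max_retries: int = 3) -> None:
    """Run each step, rolling back and retrying when a fault is detected."""
    for step in steps:
        for _ in range(max_retries + 1):
            try:
                step(task.state)
                task.checkpoint()   # the step succeeded: save the new state
                break
            except RuntimeError:
                task.rollback()     # undo partial updates, then retry
        else:
            raise RuntimeError("step failed after all retries")


if __name__ == "__main__":
    attempts = {"count": 0}

    def add_one(state: Dict[str, int]) -> None:
        state["total"] += 1

    def flaky_add_hundred(state: Dict[str, int]) -> None:
        state["total"] += 100       # partial update happens before the fault
        attempts["count"] += 1
        if attempts["count"] == 1:
            raise RuntimeError("transient fault")

    task = CheckpointedTask({"total": 0})
    run_steps(task, [add_one, flaky_add_hundred, add_one])
    print(task.state["total"])
```

The retry loop also shows why the talk calls backward recovery inappropriate for real-time applications: the number of re-executions, and therefore the completion time, is not known in advance.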

Page 50: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

5) Recovery Blocks

Recovery Blocks
  Multiple alternates to perform the same functionality
    One Primary module and Secondary modules
  Different approaches
    1) Select a module with output satisfying acceptance test
    2) Recovery Blocks and Rollbacks
       Restart the execution from a previously saved checkpoint with secondary module
  Tolerate software failures

[Figure: the application in Producer A runs Block X, Block Y, and Block Z from state (K-1) to state K with a checkpoint; after a fault it rolls back and runs the alternate Block X2; data flows to the Consumer]
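
A recovery block can be sketched as a loop over alternates guarded by an acceptance test. The sketch below is illustrative only: `recovery_block`, the sorting alternates, and the acceptance test are hypothetical, and copying the input before each attempt plays the role of the checkpoint.

```python
import copy
from typing import Callable, List, Sequence


def recovery_block(alternates: Sequence[Callable[[List[int]], List[int]]],
                   acceptance_test: Callable[[List[int]], bool],
                   data: List[int]) -> List[int]:
    """Try the primary module first, then each secondary, until one passes."""
    for alternate in alternates:
        attempt_input = copy.deepcopy(data)   # each attempt starts from the checkpoint
        try:
            result = alternate(attempt_input)
        except Exception:
            continue                          # a crash counts as a failed alternate
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def buggy_sort(values: List[int]) -> List[int]:
    # Primary module with a software bug: it forgets to sort.
    return values


def is_sorted(values: List[int]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(recovery_block([buggy_sort, sorted], is_sorted, [3, 1, 2]))
```

Unlike N-version programming, the alternates run one after another rather than in parallel, so the cost is paid only when the primary module fails its test.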

Page 51: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Soft Errors (Transient Faults)

SER increases exponentially as technology scales
  Integration, voltage scaling, altitude, latitude

Caches are hit the most due to:
  Larger portion in processors (more than 50%)
  No masking effects (e.g., logical masking)

[Figure: a bit flip (0 to 1) in a transistor; Intel Itanium II Processor; labels 5 hours MTTF and 1 month MTTF [Baumann, 05]]

• MTTF: Mean Time To Failure

Page 52: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Related Work

Process Technology Solutions
  Hardening [Baze, IEEE Trans. on Nuclear Science 00]
  SOI [O. Musseau, IEEE Trans. on Nuclear Science 96]
  Process complexity, yield loss, and substrate cost

Microarchitectural Solutions for Caches
  Cache Scrubbing [Mukherjee, PRDC04]
  Low Power Cache [Li, ISLPED04]
  Area Efficient Protection [Kim, DATE06]
  Multiple Bit Correction [Neuberger, TODAES 03]
  Cache Size Selection [Cai, ASP-DAC06]
  In-Cache Replication [Zhang, DSN03]
  Replication Cache [Zhang, IEEE Computers 05]
  High overheads in terms of power, performance, and area

Our Solution
  Protects caches from failures due to soft errors by exploiting error-tolerance of applications
  Protection can be used in conjunction with any of these techniques

Page 53: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Unequal Data Protection

Not all pages are equally failure critical
  Multimedia data is failure non-critical
  Program variables are failure critical
  Failures: system crash, infinite loop, segmentation faults, etc.
    QoS degradation is not a failure

Only 9 pages out of 83 are failure critical

Page 54: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Failure Critical and Failure Non-Critical Data

Page 55: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Soft Errors on Increase

Increase exponentially due to technology scaling
  0.18 µm: 1,000 FIT per Mbit of SRAM
  0.13 µm: 10,000 to 100,000 FIT per Mbit of SRAM

Voltage Scaling
  Voltage scaling increases SER significantly

$$\mathrm{SER} \propto N_{flux} \times CS \times \exp\left\{-\frac{Q_{critical}}{Q_s}\right\}, \quad \text{where } Q_{critical} = C \times V$$
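
Taking this model at face value, and assuming that the neutron flux, CS, the capacitance C, and Q_s all stay fixed while only the supply voltage drops from V_1 to V_2, the ratio of the two soft error rates follows directly:

```latex
\frac{\mathrm{SER}(V_2)}{\mathrm{SER}(V_1)}
  = \frac{\exp\{-C V_2 / Q_s\}}{\exp\{-C V_1 / Q_s\}}
  = \exp\left\{\frac{C\,(V_1 - V_2)}{Q_s}\right\}
```

Each equal step down in voltage therefore multiplies SER by the same factor, which is the exponential relationship between SER and supply voltage referred to in the talk.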

Page 56: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Setup for Page Failure Rates

Page 57: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Framework

Page 58: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Failure Rate

Failure rate of PPC is close to that of Safe (Safe is a protected cache configuration with an ECC protection, i.e., protecting all data, and Unsafe is an unprotected cache)

58

Page 59: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Performance

Runtime of PPC is close to that of Unsafe

59

Page 60: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Power

Energy consumption of PPC is close to that of Unsafe

60

Page 61: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Setup for DPExplore

Page 62: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

DPExplore Results

Page 63: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Video Encoding

Page 64: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Error-Resilient Video Encoding

Error-resilient video encodings have been developed to combat errors in networks

PBPAIR – energy-efficient and error-resilient video encoding [Kim, 06]
  Passive Error Exploitation
  It compresses video data according to PLR
  Maintain the QoS
  Embed error-resilience against packet losses

[Figure: PBPAIR takes the network PLR as a resilience parameter for a mobile video application over error-prone networks with packet loss]

• PBPAIR: Probability-Based Power Aware Intra Refresh

Page 65: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Related Work

Energy/QoS-aware video encoding
  Video encoding parameters [Mohapatra, IPDPS05]
  Motion estimation algorithm [Tourapis, VCIP00]
  Integrated power management [Mohapatra, ACM MM03]
  Global cross-layer adaptation [Yuan, MMCN04]
  Transmission power and QoS [Eisenberg, IEEE Trans. on CSVT 02]
  Do not consider error-resilience

Error-resilient video encoding
  Error-resilient GOP [Yang, JVCIP07]
  AIR (Adaptive Intra Refreshing) [Worral, ICASSP01]
  PGOP (Progressive GOP) [Cheng, PCS04]
  PBPAIR (Probability-Based Power Aware Intra Refresh) [Kim, MCCR06]
  Passive error exploitation

Our Solution
  Error-aware video encoding: exploits errors actively to minimize energy consumption

Page 66: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

EE-PBPAIR

Page 67: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Setup

Page 68: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Energy Reduction

Energy saving occurs at every component in a path from encoding to decoding in mobile video applications

EC = Energy Consumption

Enc EC = EC for Encoding

Tx EC = EC for Transmission

Dec EC = EC for Decoding

Rx EC = EC for Receiving

• PSNR: Peak Signal-to-Noise Ratio

PLR = 10% and EIR = 10%
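For readers unfamiliar with the quality metric, PSNR is conventionally computed as 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and decoded frames. The sketch below applies that standard definition to flat lists of pixel values; the function name `psnr` and the 8-bit default `max_value=255` are assumptions, and the slides do not say how PSNR was measured in the experiments.

```python
import math


def psnr(reference, decoded, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(decoded):
        raise ValueError("frames must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([10, 20, 30, 40], [11, 19, 30, 40]))
```

A higher PSNR means the decoded frame is closer to the original, which is why it serves as the QoS measure alongside energy consumption.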

Page 69: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Expanded Tradeoff Space

Page 70: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Energy Saving

• Source EC = Enc EC + Tx EC

• Destination EC = Rx EC + Dec EC
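The two sums above can be written out as a small helper. This is a minimal sketch: the class name `EnergyBreakdown`, the function `reduction_percent`, and every numeric value are placeholders invented for illustration, not measurements from the experiments.

```python
from dataclasses import dataclass


@dataclass
class EnergyBreakdown:
    """Per-component energy consumption (EC), in any consistent unit."""
    enc: float  # Enc EC: encoding
    tx: float   # Tx EC: transmission
    rx: float   # Rx EC: receiving
    dec: float  # Dec EC: decoding

    @property
    def source(self) -> float:
        # Source EC = Enc EC + Tx EC
        return self.enc + self.tx

    @property
    def destination(self) -> float:
        # Destination EC = Rx EC + Dec EC
        return self.rx + self.dec


def reduction_percent(baseline: float, candidate: float) -> float:
    """Energy reduction of candidate relative to baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


# Placeholder numbers only, chosen to make the arithmetic easy to follow.
baseline = EnergyBreakdown(enc=4.0, tx=1.0, rx=0.5, dec=1.5)
candidate = EnergyBreakdown(enc=3.0, tx=1.0, rx=0.5, dec=1.0)
print(reduction_percent(baseline.source, candidate.source))
print(reduction_percent(baseline.destination, candidate.destination))
```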

Page 71: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Experimental Results – Adaptive EIR

In contrast to Static EE-PBPAIR, the feedback-based approach (Adaptive EE-PBPAIR) maintains the required video quality
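One way to picture a feedback-based adjustment of EIR is a simple step controller: lower the EIR when the reported quality falls below the requirement, and raise it otherwise to save more energy. This is only a sketch under stated assumptions. It assumes EIR is the rate at which the encoder intentionally injects errors, that quality feedback arrives as a PSNR value, and that a fixed step size is used; the function `adapt_eir` and its parameters are hypothetical, and the actual Adaptive EE-PBPAIR policy is not given on the slide.

```python
def adapt_eir(eir, measured_psnr, required_psnr, step=0.01, eir_max=0.5):
    """Return the next EIR given quality feedback (hypothetical controller).

    eir           -- current rate of intentionally injected errors, in [0, eir_max]
    measured_psnr -- quality reported by the feedback channel, in dB
    required_psnr -- quality requirement, in dB
    step, eir_max -- assumed tuning knobs, not values from the thesis
    """
    if measured_psnr < required_psnr:
        eir -= step  # quality too low: inject fewer errors
    else:
        eir += step  # quality headroom: inject more errors to save energy
    # Clamp to the valid range.
    return min(max(eir, 0.0), eir_max)


eir = 0.10
for feedback_psnr in [32.0, 31.0, 29.5, 29.0, 30.5]:
    eir = adapt_eir(eir, feedback_psnr, required_psnr=30.0)
    print(round(eir, 2))
```

A static scheme corresponds to never calling the update, so its EIR stays fixed regardless of the feedback.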


Page 72: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Adaptive EIR

Page 73: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Conclusion

Studied two main cross-layer approaches

PPC

EAVE

Demonstrated the effectiveness of our cooperative cross-layer approaches by exploiting error tolerance and error control schemes

[Figure labels: Network; EIR; FLR; Resilience; PLR feedback; Tolerance; Unequal Protection]

Page 74: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Failure Rate

Page 75: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Video Quality

Page 76: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Memory Access Time (performance)

Page 77: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Future Direction

Cooperative approaches combining PPC and EAVE

Middleware-driven cross-layer approach manages error control schemes

Translate errors to exploit existing approaches at other abstraction layers

PPC

Apply our approach to other components

Instruction caches and logic

EAVE

Intelligent frame dropping techniques

To maximize the energy saving while minimizing the quality degradation

[Figure labels: EIR; FLR; Resilience; PLR feedback; Tolerance; Unequal Protection; SER]

Page 78: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

Thesis Outline

Thesis proposes a cross-layer method

Exploit errors and error control schemes across layers to maximize reliability with minimal costs for mobile embedded systems

Topic 1 – Approach at hardware and application layers

PPC (unequal data protection at hardware exploiting error tolerance at application) [Lee, CASES06][Lee, DIPES08][Lee, TVLSI08]

Topic 2 – Approach at application, middleware, and network layers

EAVE (intentional exploitation of errors at application, incorporating error resilience in networks) [Lee, DIPES08]

Topic 3 – Approach across application/middleware-OS/HW

CC-PROTECT (middleware-driven cooperative exploitation of errors and error control schemes across layers) [Lee, ACM MM 08]

[Figure labels: Application; Hardware; Middleware/OS; Network]

Page 79: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

References (cross-layers and tools)

[Bajic, 07] I. V. Bajic. Efficient cross-layer error control for wireless video multicast. 53(1):276–285, Mar 2007.

[Dynamo] DYNAMO. Power Aware Middleware for Distributed Mobile Computing. University of California at Irvine, http://dynamo.ics.uci.edu/.

[Forge] FORGE Project. A Framework for Optimization of Distributed Embedded Systems Software. University of California at Irvine, http://www.ics.uci.edu/~forge/.

[Grace] GRACE Project. Global Resource Adaptation through CoopEration. University of Illinois at Urbana-Champaign, http://rsim.cs.uiuc.edu/grace/.

[Kim, 08] M. Kim, N. Dutt, N. Venkatasubramanian, and C. Talcott. xTune: Online verifiable cross-layer adaptation for distributed real-time embedded systems. ACM SIGBED Review: Special Issue on the RTSS Forum on Deeply Embedded Real-Time Computing, 5(1), Jan 2008.

[Mohapatra, 03] S. Mohapatra, R. Cornea, N. Dutt, A. Nicolau, and N. Venkatasubramanian. Integrated power management for video streaming to mobile handheld devices. In ACM international conference on Multimedia, 2003.

[Mohapatra, 05] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, and N. Venkatasubramanian. A cross-layer approach for power-performance optimization in distributed mobile systems. In Next Generation Software Program in conjunction with IPDPS, page 218.1, April 2005.

[Shivakumar, 01] P. Shivakumar and N. Jouppi. CACTI 3.0: An Integrated Cache Timing, Power, and Area Model. In WRL Technical Report 2001/2, 2001.

[Synopsys] Synopsys Inc., Mountain View, CA, USA. Design Compiler Reference Manual, 2001.

[Schaar, 07] M. van der Schaar and D. S. Turaga. Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission. IEEE Transactions on Multimedia, 9(1):185–197, Jan. 2007.

[Vuran, 06] M. C. Vuran and I. F. Akyildiz. Cross-layer analysis of error control in wireless sensor networks. In IEEE Communications Society on Sensor and Ad Hoc Communications and Networks (SECON), pages 585–594, Sep 2006.

[Yuan, 03] W. Yuan and K. Nahrstedt. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. 37(5):149–163, Dec 2003.

[Yuan, 04] W. Yuan and K. Nahrstedt. Practical voltage scaling for mobile multimedia devices. In ACM international conference on Multimedia, pages 924–931, 2004.


Page 80: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

References (soft errors and reliability)

[Baumann, 05] R. Baumann. Soft errors in advanced computer systems. IEEE Design and Test of Computers, pages 258–266, 2005.

[Hazucha, 00] P. Hazucha and C. Svensson. Impact of CMOS technology scaling on the atmospheric neutron soft error rate. IEEE Trans. on Nuclear Science, 47(6):2586–2594, 2000.

[Li, 05] J.-F. Li and Y.-J. Huang. An error detection and correction scheme for RAMs with partial-write function. In IEEE International Workshop on Memory Technology, Design and Testing (MTDT), pages 115–120, 2005.

[Li, 04] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin. Soft error and energy consumption interactions: A data cache perspective. In ISLPED, Aug 2004.

[Mastipuram, 04] R. Mastipuram and E. C. Wee. Soft Errors’ Impact on System Reliability. http://www.edn.com/article/CA454636, Sep 2004.

[Phelan, 03] R. Phelan. Addressing soft errors in ARM core-based designs. Technical report, ARM, 2003.

[Pradhan, 96] D. K. Pradhan. Fault-Tolerant Computer System Design. Prentice Hall, 1996. ISBN 0-1305-7887-8.

[Shrivastava, 05] A. Shrivastava, I. Issenin, and N. Dutt. Compilation techniques for energy reduction in horizontally partitioned cache architectures. In CASES, pages 90–96, 2005.

[Wrobel, 01] F. Wrobel, J. M. Palau, M. C. Calvet, O. Bersillon, and H. Duarte. Simulation of nucleon-induced nuclear reactions in a simplified SRAM structure: Scaling effects on SEU and MBU cross sections. IEEE Trans. on Nuclear Science, 48(6), 2001.

[Xu, 96] J. Xu and B. Randell. Roll-forward error recovery in embedded real-time systems. In ICPADS, page 414, 1996.

[Nieuwland, 06] A. K. Nieuwland, S. Jasarevic, and G. Jerin. Combinational Logic Soft Error Analysis and Protection. In IOLTS06, 2006.


Page 81: COOPERATIVE CROSS-LAYER PROTECTION FOR RESOURCE

References (error-resilient encoding, etc.)

[Cheng, 04] L. Cheng and M. E. Zarki. PGOP: An error resilient techniques for low bit rate and low latency video communications. In Picture Coding Symposium (PCS), Dec 2004.

[Kim, 06] M. Kim, H. Oh, N. Dutt, A. Nicolau, and N. Venkatasubramanian. PBPAIR: An energy-efficient error-resilient encoding using probability based power aware intra refresh. ACM SIGMOBILE Mobile Computing and Communications Review, 10(3):58–69, July 2006.

[Wang, 98] Y. Wang and Q.-F. Zhu. Error control and concealment for video communication: A review. 86(5):974–997, May 1998.

[Worrall, 01] S. Worrall, A. Sadka, P. Sweeney, and A. Kondoz. Motion adaptive error resilient encoding for MPEG-4. In ICASSP, May 2001.
