Post on 18-Jan-2016

An Integrated Framework for Dependable and Revivable Architecture Using Multicore Processors

Weidong Shi, Motorola Labs
Hsien-Hsin “Sean” Lee, Georgia Tech
Laura Falk, University of Michigan
Mrinmoy Ghosh, Georgia Tech

2

Problem Statement

• Highly Available, Reliable, and Revivable networked services.

• Explore new programming and usage models for Multi-core processors

• Provide “architectural support” for network services to be

– Autonomic

– Remote-exploit revivable

– Self-recoverable

• Achieve high performance

4

Toward Self-recovery Network Services

Causes of Network Service Loss

• Accidental: Transient, Heisenbugs, Damage, Aging
• Intentional: DoS, Buffer Overflow

Solutions

• Replication
• Rejuvenation
• Checkpoint
• Remote Exploit Self-recovery

5

Multicore: An ideal platform

• Exploit insulation: each core of a multicore processor can be programmed to run at a different privilege level with a different OS.

• Tighter coupling of cores than in SMP allows fine-grained processor state monitoring

• Concurrent monitoring, efficient state backup and recovery

• Massive multi-core processors will have many idle cores

[Figure: Dual Core (Merom) with a Server Core and a Monitor Core sharing an L2 cache]

6

INDRA: A Dependable and Revivable Architecture

[Figure: INDRA block diagram. Labeled components: Monitor Core and Server Core (Network Apps), each with IL1 and DL1 caches; L2 cache; memory interface with watchdog; Trace Filter and Trace FIFO; code origin check and CFG check; control signals; the labels Monitor, Insulation, Issue Recovery, and Control; physical memory space (used by service OS and applications); protected memory space (monitor BIOS, OS, and SW)]

7

Monitor Core: Insulated Parallel Inspection [Kiriansky et al., USENIX 2002]

[Figure: two code pages and a data page, with the functions below]

FunctionA() {
    Vuln_func();
    A = 3;
}

Vuln_func() {
    // Attack!!
    // Return address changed
}

Malicious_func() {
}

• Code Origin Check
• Control Flow Graph Check
• Exception Handling
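The two checks named on this slide can be illustrated with a small sketch. It is a toy model, not the INDRA monitor: the `Monitor` class, the 4 KB page size, and every address are hypothetical and chosen only to mirror the FunctionA / Vuln_func / Malicious_func picture. A control transfer passes the code origin check if its target lies on a page known to hold code, and passes the control flow graph check if the (source, target) pair is an edge of the expected graph.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class Monitor:
    """Toy inspector for control transfers reported by the server core."""

    def __init__(self, code_pages, cfg_edges):
        self.code_pages = set(code_pages)  # page numbers known to hold code
        self.cfg_edges = set(cfg_edges)    # allowed (source, target) transfers

    def inspect(self, source, target):
        # Code origin check: the target must lie on a known code page.
        if target // PAGE_SIZE not in self.code_pages:
            return "code-origin violation"
        # Control flow graph check: the transfer must be an expected edge.
        if (source, target) not in self.cfg_edges:
            return "control-flow violation"
        return "ok"


# Hypothetical layout: FunctionA calls Vuln_func at 0x1100 from 0x1004 and
# expects it to return from 0x1140 to 0x1008. Malicious_func sits at 0x1200
# on the same code page; 0x8000 is on a data page.
monitor = Monitor(
    code_pages={1},
    cfg_edges={(0x1004, 0x1100), (0x1140, 0x1008)},
)
print(monitor.inspect(0x1140, 0x1008))  # legitimate return
print(monitor.inspect(0x1140, 0x1200))  # return address redirected to other code
print(monitor.inspect(0x1140, 0x8000))  # return address redirected to data
```

In this model a redirected return into existing code is caught only by the control flow graph check, while a jump into injected data is already caught by the code origin check.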

8

Server Core: Request Based Recovery

[Flowchart]

• Issue state backup request
• Read network request (request for page arch.ece.gatech.edu)
• Process network request
• Monitor signalled error?
– No: handle the next request
– Yes: restore checkpointed state
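The flow on this slide can be sketched as a software loop. This is only an analogy under stated assumptions: `MonitorError`, `serve`, and `handler` are hypothetical names, and `copy.deepcopy` stands in for the hardware backup mechanism, which the slides describe as working without page copies.

```python
import copy


class MonitorError(Exception):
    """Hypothetical signal raised when the monitor flags a request."""


def serve(requests, handler, state):
    """Process each request; restore the checkpoint if the monitor signals."""
    responses = []
    for request in requests:
        checkpoint = copy.deepcopy(state)      # issue state backup request
        try:
            responses.append(handler(request, state))  # process the request
        except MonitorError:
            state.clear()                      # restore checkpointed state
            state.update(checkpoint)
            responses.append(None)             # the bad request gets no reply
    return responses


def handler(request, state):
    # Toy service: count hits; an "exploit" corrupts state before detection.
    state["hits"] += 1
    if request == "exploit":
        state["corrupted"] = True
        raise MonitorError(request)
    return state["hits"]


state = {"hits": 0}
print(serve(["a", "exploit", "b"], handler, state), state)
```

Because the checkpoint is taken at the request boundary, everything the failed request changed is discarded and the next request sees the state as it was before the attack.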

9

Comparison of Backup and Recovery

Approach | Backup | Recovery
Software checkpointing | Slow | Fast, modify page translation
Memory update log | Fast | Log-based undo, slow
Virtual checkpointing | Copy dirty page on demand, slow | Fast, modify TLB entry
INDRA | Fast, no page copy | Fast, no page copy

10

INDRA Backup Page Record

[Figure: INDRA backup page record, spanning processor and memory]

• Processor: Global Timestamp Register (GT), GT=4; modified TLB
• TLB entry: Tag, Active Page (Physical Address)
• TLB extension for backup and rollback: Backup Page (Physical Address), Dirty Block Bitvector, Rollback Bitvector, Rollback Valid, Local Timestamp
• Memory: Active Page, Backup Page, and a backup page record holding Dirty Block Bitvector, Backup Page (Physical Address), Local Timestamp, Rollback Bitvector, Rollback Valid
• Value shown in the figure: 3

12

INDRA Recovery Example

[Figure sequence: the modified TLB (Tag, Active Page physical address, and the extension fields Backup Page physical address, Dirty Block Bitvector, Rollback Bitvector, Rollback Valid, Local Timestamp), the Global Timestamp Register (GT), the active page, the backup page, and the backup record, animated through the steps below]

• Request n, GT=5: write memory line 7
• Request n, GT=5: write memory line 2
• Request n, GT=5: failure signal; restore system resource allocation and process context
• Request n+1, GT=5: read memory line 7
• Request n+1, GT=5: write memory line 1
• Handle next request: GT advances from 5 to 6; record system resource allocation and process context
• Request n+2, GT=6: write memory line 4

Numeric field values visible in the figures: 3 and 5 in every step, 1 next to Rollback Valid from the failure signal onward, and 6 after the write in request n+2.
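The recovery example can be read as a per-line, timestamp-driven backup scheme. The sketch below is one possible interpretation, not the authors' hardware design: the class name `LineBackupPage`, its methods, and the exact rules for when bits are set or cleared are assumptions. In this model a line is copied to the backup page on its first write in an epoch (an epoch being one value of GT), a failure turns the dirty bits into rollback bits, and each marked line is restored lazily on its next access, so neither backup nor recovery copies a whole page.

```python
class LineBackupPage:
    """Toy model of one page with per-line backup and lazy rollback."""

    def __init__(self, lines, gt=0):
        self.active = list(lines)  # active page contents
        self.backup = list(lines)  # backup page contents
        self.dirty = 0             # dirty block bitvector (current epoch)
        self.rollback = 0          # rollback bitvector (lines still to restore)
        self.local_ts = gt         # local timestamp of this page
        self.gt = gt               # global timestamp register

    @property
    def rollback_valid(self):
        return self.rollback != 0

    def _restore_if_pending(self, line):
        bit = 1 << line
        if self.rollback & bit:    # line was modified by a failed request
            self.active[line] = self.backup[line]
            self.rollback &= ~bit

    def read(self, line):
        self._restore_if_pending(line)
        return self.active[line]

    def write(self, line, value):
        self._restore_if_pending(line)
        if self.local_ts != self.gt:   # first write to the page in this epoch
            self.dirty = 0
            self.local_ts = self.gt
        bit = 1 << line
        if not self.dirty & bit:       # first write to the line: back it up
            self.backup[line] = self.active[line]
            self.dirty |= bit
        self.active[line] = value

    def recover(self):
        """Failure signal: mark this epoch's dirty lines for rollback."""
        if self.local_ts == self.gt:
            self.rollback |= self.dirty
        self.dirty = 0

    def commit(self):
        """Request finished cleanly: advance the global timestamp."""
        self.gt += 1


page = LineBackupPage([10 * i for i in range(8)], gt=5)
page.write(7, 777)      # request n
page.write(2, 222)
page.recover()          # failure signal
print(page.read(7))     # request n+1 sees the value from before request n
page.write(1, 111)
page.commit()           # GT becomes 6
page.write(4, 444)      # request n+2
print(page.read(2), page.local_ts)
```

Line 2 is never touched by request n+1, so in this model it is restored only when request n+2 finally reads it, which is the point of keeping a rollback bitvector instead of undoing everything at failure time.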

19

Test Bed (Bochs + TAXI [Vlaovic & Davidson, ICCD’02])

[Figure: Monitor (stripped-down OS, security SW, 10MB), Linux network server, Bochs + TAXI, and host OS; network requests in, server responses out]

• Run production OS with real service applications, httpd, ftpd, bind, sendmail, etc.

• Recoverability evaluated by applying real x86 remote exploits from security websites.

• Experiment with documented exploits

20

Inter-Request Interval (# of Instructions)

[Figure: Average Network Request Interval (instructions per request); y-axis from 0 to 2,500,000]

21

I-Cache Miss Rate

[Figure: L1 Miss Rate (y-axis from 0.0% to 4.0%) for ftp, http, bind, sendmail, imap, nfs, and average]

• Code origin check reads traces of code fetched from the L2 cache

• Number of instructions in the trace is proportional to the L1 I-cache miss rate

• Overhead of monitoring code origin depends on the L1 I-cache miss rate

22

Monitoring Overhead

[Figure: Request Response Time Slowdown; y-axis from 0 to 1.2]

23

Sensitivity of Monitoring Queue Size

[Figure: Queue Size vs. Performance; slowdown (y-axis from 1 to 1.6) for queue sizes 8, 16, 32, 64, and 128]
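Why queue size matters can be illustrated with a small timing model. The slides do not state what happens when the monitoring queue fills, so the sketch assumes the server core stalls until the monitor frees a slot; the function `slowdown`, the one-instruction-per-cycle server, the fixed per-entry service time, and the bursty trace are all hypothetical.

```python
def slowdown(trace_points, n_instructions, queue_size, service_cycles):
    """Estimate server slowdown when a full trace queue stalls the server.

    trace_points: sorted instruction indices that emit one trace entry each.
    The server retires one instruction per cycle; the monitor needs
    service_cycles per entry; a slot frees when the monitor finishes an entry.
    """
    done = []   # completion time of each trace entry in the monitor
    stall = 0   # total cycles the server has spent stalled
    for i, instr in enumerate(trace_points):
        ready = instr + stall                 # when the server emits entry i
        enqueue = ready
        if i >= queue_size:                   # wait for the oldest slot to free
            enqueue = max(ready, done[i - queue_size])
        stall += enqueue - ready
        start = max(enqueue, done[-1]) if done else enqueue
        done.append(start + service_cycles)
    return (n_instructions + stall) / n_instructions


# Hypothetical workload: a burst of 32 trace entries every 1000 instructions.
trace = [base + k for base in range(0, 10_000, 1000) for k in range(32)]
for size in (8, 16, 32, 64, 128):
    print(size, round(slowdown(trace, 10_000, size, service_cycles=4), 3))
```

In this model a queue at least as large as the longest burst hides the monitor's latency completely, while smaller queues turn part of each burst into server stall time.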

24

Backup Overhead of Modified Lines

[Figure: Percentage of Modified Lines Requiring Backup; y-axis from 0% to 14%]

25

Performance of Recovery + Monitoring

[Figure: Slowdown of Service Response Time (y-axis from 1 to 2.8) for ftpd, httpd, bind, sendmail, imap, nfs, and average; series: Monitor+Backup and Monitor+Backup+Rollback]

26

Conclusions

• Real-time exploit monitoring with autonomic recovery increases revivability and availability.

• Multicore architectures are an ideal candidate for a new type of revivable system.

• An INDRA-based multicore system can provide improved reliability and availability.

• More research is required to explore the trade-off between availability, performance, architecture design, and cost.

27

Questions and Answers

http://arch.ece.gatech.edu

Thank you!
