RAMP

Jan’08

Raksha & Atlas: Prototyping & Emulation at Stanford

Christos Kozyrakis

work done by S. Wee, N. Njoroge, M. Dalton, H. Kannan

Computer Systems Laboratory

Stanford University

Outline

Raksha: prototyping security architectures

• Raksha goals

• Generations of Raksha prototypes

• Experience & lessons

Atlas: emulating transactional memory architectures

• Atlas goals

• Architecture overview

• New programmability features


Raksha Goals

Architectural support for software security

1. Protect existing software from attacks

• Prevent buffer overflows, SQL injections, …

• Based on dynamic information flow tracking (DIFT)

2. Reduce trusted code base (TCB) for new software

• Simplify design & verification of security guarantees

• Using word-granularity protection on physical memory

Robust, flexible, practical, end-to-end, fast
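The DIFT idea behind goal 1 can be illustrated with a toy model: every register carries a taint tag, tags propagate through operations, and using tainted data in a sensitive role raises a security exception. This is a minimal sketch under assumed, simplified rules (a single taint bit, OR-propagation, and a check only on indirect jumps); the names `TaggedMachine` and `SecurityException` are hypothetical and are not Raksha's interface.

```python
class SecurityException(Exception):
    """Raised when tainted data is used in a way the policy forbids."""


class TaggedMachine:
    """Toy register machine where every register carries a 1-bit taint tag.

    Assumed policy (illustrative only): tags propagate by OR through ALU
    operations, and an indirect jump through a tainted register traps.
    """

    def __init__(self):
        self.regs = {}   # register name -> value
        self.tags = {}   # register name -> taint bit (True = untrusted)

    def load_input(self, dst, value):
        # Data from an untrusted source (e.g., the network) is tainted.
        self.regs[dst] = value
        self.tags[dst] = True

    def load_const(self, dst, value):
        # Constants from the program binary are treated as trusted.
        self.regs[dst] = value
        self.tags[dst] = False

    def add(self, dst, a, b):
        # Tag ALU: the result is tainted if either source is tainted.
        self.regs[dst] = self.regs[a] + self.regs[b]
        self.tags[dst] = self.tags[a] or self.tags[b]

    def jump_indirect(self, src):
        # Tag check: refuse to use untrusted data as a code pointer.
        if self.tags[src]:
            raise SecurityException(f"tainted jump target in {src}")
        return self.regs[src]


m = TaggedMachine()
m.load_const("r1", 0x4000)
m.load_input("r2", 0x10)         # attacker-controlled offset
m.add("r3", "r1", "r2")          # taint flows into r3
print(m.jump_indirect("r1"))     # trusted target: allowed
try:
    m.jump_indirect("r3")
except SecurityException as e:
    print("blocked:", e)
```

Real DIFT hardware applies the same propagate/check steps to every instruction and memory word without software involvement; the model only shows the data flow.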

Raksha Architecture, Version 1

[Figure: Leon pipeline (PC, I-Cache, Decode, RegFile, ALU, D-Cache, Traps, WB) extended with Policy Decode, Tag ALU, and Tag Check logic]

Modified Sparc V8 processor (Leon)

• 4 programmable security policies using 4 bits/word

• User-level handling of security exceptions

• +7% logic, +0% clock cycle time over base design

Full Linux distribution with >120 software packages

1st DIFT architecture to detect high-level attacks on binaries

• Have shared this design with 3 other institutions so far…
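With 4 policies sharing 4 tag bits per word, one natural reading is one tag bit per policy, with per-policy settings deciding which operations propagate that bit and which operations check it; a failed check invokes a user-level handler rather than trapping into the OS. The encoding below (bit i = policy i, OR-propagation, per-opcode sets) and the names `PolicyConfig`, `propagate_tag`, and `check_tag` are assumptions for illustration only; Raksha's actual policy configuration registers are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

NUM_POLICIES = 4  # one tag bit per policy -> 4 bits/word


@dataclass
class PolicyConfig:
    """Hypothetical per-policy settings (not Raksha's real register layout)."""
    propagate: set = field(default_factory=set)  # opcodes that propagate this bit
    check: set = field(default_factory=set)      # opcodes that check this bit


def propagate_tag(op: str, src_tags: List[int], policies: List[PolicyConfig]) -> int:
    """Tag ALU: OR the source tags, but only for policies that propagate on `op`."""
    result = 0
    for bit, cfg in enumerate(policies):
        if op in cfg.propagate:
            for t in src_tags:
                result |= t & (1 << bit)
    return result


def check_tag(op: str, tag: int, policies: List[PolicyConfig],
              handler: Callable[[int, str], None]) -> bool:
    """Tag check: call the user-level handler for each policy whose bit is set
    on an operand of a checked opcode. Returns True if no policy fired."""
    ok = True
    for bit, cfg in enumerate(policies):
        if op in cfg.check and tag & (1 << bit):
            handler(bit, op)
            ok = False
    return ok


policies = [
    PolicyConfig(propagate={"add", "load"}, check={"jmp"}),  # policy 0: no tainted jumps
    PolicyConfig(propagate=set(), check={"store"}),          # policy 1: hypothetical extra bit
    PolicyConfig(), PolicyConfig(),                          # policies 2-3 unused
]
assert len(policies) == NUM_POLICIES
events = []
t = propagate_tag("add", [0b0001, 0b0010], policies)  # only bit 0 propagates on add
print(bin(t))
check_tag("jmp", t, policies, lambda p, op: events.append((p, op)))
print(events)
```

Keeping the policies independent means one tag bit can track, for example, untrusted input while another enforces a different rule, without either policy needing to know about the other.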

Raksha Architecture, Version 2

Small off-core coprocessor for all DIFT functionality + state

• Can be reused across multiple chips

Requires minimal changes to main processor core

• <1% for our Sparc V8 processor

Same security features as original architecture

• 8% performance overhead for SPECint2000

[Figure: processor core (I Cache, D Cache, ROB) and DIFT coprocessor (Policy Decode, Tag ALU, Tag Check, Tag Cache, Tag RF, WB), linked by "PC, Inst, Address" and "Security exception" signals and sharing the L2 Cache]
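The off-core design can be pictured as a producer/consumer pair: the main core forwards a (PC, instruction, address) record for each committed instruction, and the coprocessor consumes these records, updating its own tag register file and tag memory and raising a security exception on a failed check. This is a software analogy under assumptions of our own: a single-bit taint policy, a tiny made-up instruction format, and a `drain()` step standing in for a synchronization point (e.g., before a system call) where the core waits for all checks to finish. `DIFTCoprocessor` and its methods are hypothetical names.

```python
from collections import deque


class SecurityException(Exception):
    pass


class DIFTCoprocessor:
    """Consumes committed-instruction records; all tag state lives off-core."""

    def __init__(self):
        self.queue = deque()   # records forwarded by the main core
        self.tag_rf = {}       # tag register file: reg -> taint bit
        self.tag_mem = {}      # tag memory/cache: address -> taint bit

    def enqueue(self, pc, inst, addr=None):
        # Main-core side: forward the record and keep executing.
        self.queue.append((pc, inst, addr))

    def step(self):
        pc, inst, addr = self.queue.popleft()
        op, *args = inst
        if op == "load":                     # ("load", dst) from addr
            self.tag_rf[args[0]] = self.tag_mem.get(addr, False)
        elif op == "add":                    # ("add", dst, a, b)
            dst, a, b = args
            self.tag_rf[dst] = self.tag_rf.get(a, False) or self.tag_rf.get(b, False)
        elif op == "jr":                     # ("jr", src): indirect jump
            if self.tag_rf.get(args[0], False):
                raise SecurityException(f"tainted jump target at pc={pc:#x}")

    def drain(self):
        # Assumed synchronization point: check every forwarded record.
        while self.queue:
            self.step()


cop = DIFTCoprocessor()
cop.tag_mem[0x100] = True                    # word received from the network
cop.enqueue(0x10, ("load", "r1"), 0x100)
cop.enqueue(0x14, ("add", "r2", "r1", "r0"))
cop.enqueue(0x18, ("jr", "r2"))
try:
    cop.drain()
except SecurityException as e:
    print(e)
```

The queue is what lets the main core stay almost unmodified: it only has to export the record stream and accept an exception signal back.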

Raksha Architecture, Version 3 (Loki)

Supports fine-grain permission checks on physical memory

• All words associated with a 32-bit tag

• Permission table provides access rights for different tags

• Trusted SW specifies permissions; HW enforces them, independently of the OS; device accesses are checked as well

Reduces TCB of a full OS down to 5 KLOC

• Invariant: malicious user/kernel code cannot access data without permission

• Virtual memory & all device drivers outside of the TCB

[Figure: Leon pipeline (PC, I-Cache, Decode, RegFile, ALU, D-Cache, Traps, WB) with a permission cache (P-cache) next to both the I-TLB and the D-TLB]
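Loki's model (a 32-bit tag on every word, a permission table giving access rights per tag, permissions set by trusted software and enforced by hardware) amounts to a lookup on every access. The sketch below assumes rights are read/write/execute bits, that untagged words default to tag 0, and that a missing table entry means no access; `Perm`, `PermissionTable`, `check_access`, and `PermissionFault` are hypothetical names, not Loki's interface.

```python
from enum import IntFlag


class Perm(IntFlag):
    NONE = 0
    R = 1
    W = 2
    X = 4


class PermissionFault(Exception):
    pass


class PermissionTable:
    """Maps a 32-bit word tag to the rights the current domain holds on it."""

    def __init__(self):
        self._rights = {}

    def grant(self, tag, rights):
        # Assumed to be callable only by trusted software (the small TCB).
        if not 0 <= tag < 2**32:
            raise ValueError("tags are 32 bits wide")
        self._rights[tag] = rights

    def rights(self, tag):
        # Assumption: no entry means no rights.
        return self._rights.get(tag, Perm.NONE)


def check_access(mem_tags, table, addr, needed):
    """Hardware-style check: find the word's tag and require every right in `needed`."""
    word = addr // 4                   # word-granularity tags (32-bit words)
    tag = mem_tags.get(word, 0)        # assumption: untagged words carry tag 0
    if (table.rights(tag) & needed) != needed:
        raise PermissionFault(f"access to {addr:#x} (tag {tag}) needs rights {int(needed)}")


mem_tags = {0x1000 // 4: 7}            # word at 0x1000 carries hypothetical tag 7
user = PermissionTable()
user.grant(0, Perm.R | Perm.W)         # ordinary data (tag 0)
check_access(mem_tags, user, 0x2000, Perm.R)       # allowed
try:
    check_access(mem_tags, user, 0x1000, Perm.R)   # denied: no entry for tag 7
except PermissionFault as e:
    print(e)
```

Because the check depends only on the word's tag and the table, it does not rely on page tables, which is consistent with keeping virtual memory outside the TCB.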

Experience & Lessons

HW: a stable starting point is critical

• Despite deficiencies, Leon has been a reasonable base

• Good compromise of size, performance, flexibility, and support, even for ISA-level research

• Can we match this with upcoming RAMP models?

SW: full system is important (full OS + devices)

• Enables experimentation with a wide range of apps

• Increases credibility of results

• What is the OS story for RAMP models?

System: need low-cost board option

• Makes it easier to attract collaborators & disseminate design

• What is the replacement plan for XUPv5?

Repeat outline

Raksha: prototyping security architectures

• Raksha goals

• Generations of Raksha prototypes


Atlas: emulating transactional memory architectures

• Atlas goals

• Architecture overview

• New programmability features


Atlas Goals

Fast: at-speed experiments with hardware TM

• ~100x faster than a simulator

Comfortable: full-system environment

• Full Linux OS

• Integration with standard debugging tools

Easy-to-use: rich support for programmability

• Automatic detection of performance bottlenecks

• Deterministic replay

• Automatic detection of atomicity bugs

ATLAS Hardware Architecture

9-way CMP with hardware support for TM

• TM support builds upon private caches & the coherence protocol

• One processor dedicated to system code

• Uses hard-core PowerPC cores in the user & control FPGAs of the BEE2

[Figure: ATLAS block diagram: eight TCC PPC cores (TCC PPC 0 to 7) attached to four User Switches; a Control Switch connecting the Linux PPC (with I/O) and Main Memory]
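In the TCC model that ATLAS implements, each processor executes everything inside transactions: reads are tracked, writes stay buffered in the private cache until commit, and at commit the write-set is made visible so that any other in-flight transaction that read one of those addresses is aborted and re-executed. The sketch below models only that conflict-detection rule; `Transaction` and `commit` are hypothetical names, and arbitration is reduced to "whoever calls commit first wins".

```python
class Transaction:
    def __init__(self, cpu):
        self.cpu = cpu
        self.read_set = set()
        self.write_buffer = {}    # addr -> value, invisible until commit
        self.aborted = False

    def read(self, memory, addr):
        # Reads of the transaction's own buffered writes do not depend on
        # other processors, so they are not added to the read-set.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return memory.get(addr, 0)

    def write(self, addr, value):
        self.write_buffer[addr] = value


def commit(tx, memory, others):
    """Publish tx's write-set and abort every other active transaction
    whose read-set overlaps it (it read a now-stale value)."""
    memory.update(tx.write_buffer)
    write_set = set(tx.write_buffer)
    for other in others:
        if other is not tx and not other.aborted and other.read_set & write_set:
            other.aborted = True   # would be rolled back and re-executed


memory = {0x10: 1}
t0, t1 = Transaction(cpu=0), Transaction(cpu=1)
t1.read(memory, 0x10)            # t1 reads the old value
t0.write(0x10, 42)               # buffered, not yet visible
commit(t0, memory, [t0, t1])     # t0 wins arbitration and commits
print(memory[0x10], t1.aborted)
```

Since the private caches already track which lines a processor has touched, this rule can piggyback on the cache and coherence hardware, which is the point of the first bullet above.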

ATLAS Software Architecture

[Figure: software stack, top to bottom: Application (OpenMP+TM); TM API and ATLAS Profiler; ATLAS Runtime System; Linux OS; ATLAS HW on BEE2]

High-level application development

• OpenMP + TM, (Java + TM), …

High-level application debugging

• GDB-based, for common & new features (e.g., infinite watchpoints)

Deterministic Replay with ReplayT

A critical tool for multiprocessor debugging

• Small system variations can mask bugs

ReplayT: record & replay transaction commit order

• Sufficient for TCC’s “all transactions, all the time” execution model

• Serializable commit order captures all thread interactions

• Minimal runtime & space overhead (1 byte/transaction)

[Figure: logging-phase and replay-phase timelines for threads T0, T1, T2, with the commit order recorded in a LOG (labels: Commit, write-set); in the replay phase, the commit protocol replays the logged commit order. Legend: Computation, Arbitration, Commit, Abort]
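Because every thread interaction in TCC happens at commit, recording which thread committed next is enough to reproduce an execution. The sketch below logs one byte (the thread ID) per committed transaction, in line with the 1 byte/transaction figure, and replays by granting commit only to the thread at the head of the log. `CommitLog`, `ReplayArbiter`, and `run` are hypothetical names; threads are simulated sequentially and real hardware arbitration is not modeled.

```python
import random


class CommitLog:
    """Logging phase: one byte per committed transaction (the thread ID)."""

    def __init__(self):
        self.data = bytearray()

    def record(self, thread_id):
        self.data.append(thread_id)   # thread IDs must fit in a byte (< 256)


class ReplayArbiter:
    """Replay phase: grant commit only to the thread at the head of the log."""

    def __init__(self, log_bytes):
        self.order = list(log_bytes)
        self.pos = 0

    def may_commit(self, thread_id):
        return self.pos < len(self.order) and self.order[self.pos] == thread_id

    def committed(self):
        self.pos += 1


def run(threads, rng, arbiter=None, log=None):
    """Simulate threads (thread ID -> number of transactions) committing.
    Without an arbiter the order is random; with one it follows the log."""
    remaining = dict(threads)
    order = []
    while any(remaining.values()):
        ready = [t for t, n in remaining.items() if n > 0]
        if arbiter is not None:
            ready = [t for t in ready if arbiter.may_commit(t)]
        t = rng.choice(ready)
        remaining[t] -= 1
        order.append(t)
        if log is not None:
            log.record(t)
        if arbiter is not None:
            arbiter.committed()
    return order


threads = {0: 3, 1: 2, 2: 2}
log = CommitLog()
original = run(threads, random.Random(1), log=log)
replayed = run(threads, random.Random(99), arbiter=ReplayArbiter(log.data))
print(original == replayed, len(log.data), "bytes logged")
```

A different random seed in the replay run stands in for the "small system variations" that would otherwise change the interleaving; the arbiter removes that freedom.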

ReplayT Runtime Overhead (logging phase)

[Figure: runtime overhead (%) on 1p, 2p, 4p, and 8p configurations for vacation, kmeans, genome, delaunay, labyrinth, radix, mp3d, and ocean; y-axis from -4% to 20%]

Average slowdown is 1.05%

Can continuously log on production runs

ReplayT Extensions

Unique replay

• Problem: maximize usefulness of test runs

• Approach: shuffle commit order to generate unique scenarios

Replay with monitoring code

• Problem: replay accuracy after recompilation

• Approach: faithfully repeat the commit order even if the binary changes, e.g., printf statements inserted for monitoring purposes

Cross-platform replay

• Problem: debugging on multiple platforms

• Approach: support for replaying log across platforms & ISAs
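For unique replay, one way to shuffle the commit order while keeping every run legal is to generate interleavings that preserve each thread's own transaction order and to skip orders already tested. The function below is our own illustration of that idea (the name `unique_commit_orders` and the rejection-sampling approach are assumptions, not ReplayT's implementation); each generated order could then be enforced during replay as if it were a recorded log.

```python
import random


def unique_commit_orders(txn_counts, how_many, seed=0, max_tries=10_000):
    """Return up to `how_many` distinct commit orders.

    txn_counts maps thread ID -> number of transactions. An order is a tuple
    of thread IDs; the k-th occurrence of a thread ID stands for that
    thread's k-th transaction, so per-thread program order is preserved.
    """
    rng = random.Random(seed)
    base = [t for t, n in sorted(txn_counts.items()) for _ in range(n)]
    seen, orders = set(), []
    for _ in range(max_tries):
        if len(orders) == how_many:
            break
        candidate = base[:]
        rng.shuffle(candidate)
        key = tuple(candidate)
        if key not in seen:          # skip scenarios already tested
            seen.add(key)
            orders.append(key)
    return orders


orders = unique_commit_orders({0: 2, 1: 2}, how_many=6)
for o in orders:
    print(o)
```

For two threads with two transactions each there are exactly 6 interleavings, so asking for 6 enumerates all of them; for realistic runs the space is far larger and sampling is the only practical option.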

Atomicity Bug Detection

Problem: user splits an atomic task into two transactions

• Hard to pinpoint problem even with replay

The AVIO proposal [Lu et al. @ ASPLOS’06]

• Unserializable access interleavings are likely bugs

• Whitelist unserializable interleavings seen in correct runs, collected during application testing

• AVIO challenges: long & intrusive data collection phase; long analysis phase; corner cases (false positives & false negatives)
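AVIO's core test looks at two accesses by one thread to the same address (a preceding access P and a current access C) with an access R from another thread interleaved between them. Of the eight read/write combinations, four match neither serial order (P, C, R or R, P, C); these unserializable interleavings are the likely bugs. The table below encodes that classification; the names `UNSERIALIZABLE` and `is_unserializable` are ours.

```python
from itertools import product

# (preceding local, interleaved remote, current local) -> why it is unserializable
UNSERIALIZABLE = {
    ("R", "W", "R"): "two local reads of one value observe different values",
    ("W", "W", "R"): "local read sees the remote write instead of its own",
    ("W", "R", "W"): "remote read observes an intermediate local value",
    ("R", "W", "W"): "remote write is overwritten by a write based on a stale read",
}


def is_unserializable(p, r, c):
    """True if P (local), R (remote), C (local) on one address matches
    neither serial order P,C,R nor R,P,C."""
    return (p, r, c) in UNSERIALIZABLE


for p, r, c in product("RW", repeat=3):
    print(p, r, c, "UNSERIALIZABLE" if is_unserializable(p, r, c) else "ok")
```

The split-counter bug from the slide shows up as R-W-W: the first transaction reads the counter, another thread updates it, and the second transaction writes back a value computed from the stale read.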

Atomicity Bug Detection on ATLAS

Based on the general approach of AVIO, but with:

• Fast & non-intrusive data collection: a single log entry for each address accessed in a transaction, collected during deterministic replay

• Fast analysis: interleavings examined at transaction granularity

• More accurate analysis: eliminates false negatives due to intermediate writes
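At transaction granularity, the AVIO test can be applied to whole committed transactions: for two consecutive transactions P and C of one thread, every transaction R that committed between them is checked on each address all three touched. This sketch assumes a per-transaction, per-address log entry holding the ordered access types (e.g. "RW"); using the last access of P and the first access of C is our own illustrative choice, and the whitelist of interleavings seen in correct runs is omitted. `TxLog` and `find_atomicity_violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

UNSERIALIZABLE = {("R", "W", "R"), ("W", "W", "R"), ("W", "R", "W"), ("R", "W", "W")}


@dataclass
class TxLog:
    """Hypothetical log of one committed transaction: for each address,
    the ordered access types it performed ("RW" = read, then write)."""
    thread: int
    accesses: Dict[int, str]


def find_atomicity_violations(commit_order: List[TxLog]) -> List[Tuple[int, int, int, str]]:
    """Return (address, P index, C index, pattern) for every unserializable
    P-R-C interleaving in a commit-ordered log."""
    found = []
    last_by_thread = {}
    for i, cur in enumerate(commit_order):
        j = last_by_thread.get(cur.thread)
        if j is not None:
            prev = commit_order[j]
            for remote in commit_order[j + 1:i]:      # committed in between
                for addr, r_ops in remote.accesses.items():
                    if addr in prev.accesses and addr in cur.accesses:
                        p = prev.accesses[addr][-1]   # last access in P
                        c = cur.accesses[addr][0]     # first access in C
                        for r in sorted(set(r_ops)):
                            if (p, r, c) in UNSERIALIZABLE:
                                found.append((addr, j, i, p + r + c))
        last_by_thread[cur.thread] = i
    return found


log = [
    TxLog(thread=0, accesses={0xA0: "R"}),    # T0: tmp = counter
    TxLog(thread=1, accesses={0xA0: "RW"}),   # T1: counter += 1
    TxLog(thread=0, accesses={0xA0: "W"}),    # T0: counter = tmp + 1
]
print(find_atomicity_violations(log))
```

Working on whole transactions means the number of interleavings to examine grows with commits rather than with individual loads and stores, which is where the faster analysis comes from.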

Experience & Lessons

HW: need multiple grades of hardware modeling

• Enable fast prototyping of new ISA & HW features, even if timing or other details are not exactly accurate

• Atlas experience: 40+ tutorial participants enjoyed using new features in a timing-“inaccurate” system

SW: full system is important (full OS + devices)

• Enables experimentation with a wide range of apps

System: need low-cost board option

• Makes it easier to attract collaborators & disseminate design

Scalability: need access to multiple boards

• Students will not scale a design until the 2nd board arrives

ISA: unfortunately, the key to more sharing of HW & SW models

• Difficult to share across ISAs due to differences in specification, interfaces, etc.

• Should RAMP simply adopt Sparc?

Questions?
