RAMP
Jan’08
Raksha & Atlas: Prototyping & Emulation at Stanford
Christos Kozyrakis
work done by S. Wee, N. Njoroge, M. Dalton, H. Kannan
Computer Systems Laboratory
Stanford University
Outline
Raksha: prototyping security architectures
• Raksha goals
• Generations of Raksha prototypes
• Experience & lessons
Atlas: emulating transactional memory architectures
• Atlas goals
• Architecture overview
• New programmability features
• Experience & lessons
Raksha Goals
Architectural support for software security
1. Protect existing software from attacks
• Prevent buffer overflows, SQL injections, …
• Based on dynamic information flow tracking (DIFT)
2. Reduce trusted code base (TCB) for new software
• Simplify design & verification of security guarantees
• Using word-granularity protection on physical memory
Robust, flexible, practical, end-to-end, fast
Raksha Architecture, Version 1
[Figure: processor pipeline diagram with blocks PC, I-Cache, Decode, RegFile, ALU, D-Cache, Traps, and WB, plus Policy Decode, Tag ALU, and Tag Check]
Modified Sparc V8 processor (Leon)
• 4 programmable security policies using 4 bits/word
• User-level handling of security exceptions
• +7% logic, +0% clock cycle time over base design
Full Linux distribution with > 120 software packages
1st DIFT architecture to detect high-level attacks on binaries
• Have shared this design with 3 other institutions so far…
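The Tag ALU and Tag Check stages named on this slide can be illustrated with a minimal Python sketch of DIFT. The propagation rule (OR of the source tags) and the check rule (trap on a tainted jump target) are assumptions chosen for illustration, as are the names `Word`, `tag_alu`, `add`, and `tag_check_jump`; the slide only states that each word carries 4 tag bits, one per programmable policy.

```python
from dataclasses import dataclass

TAG_BITS = 4  # from the slide: 4 tag bits per word, one per policy


@dataclass(frozen=True)
class Word:
    value: int
    tag: int = 0  # bit i set => word is tainted under policy i


class SecurityException(Exception):
    """Raised when a tagged word is used in a way the policy forbids."""


def tag_alu(a: Word, b: Word) -> int:
    # Assumed propagation rule: the result inherits the union of source tags.
    return (a.tag | b.tag) & ((1 << TAG_BITS) - 1)


def add(a: Word, b: Word) -> Word:
    # The data ALU and the tag ALU operate side by side on the same operands.
    return Word((a.value + b.value) & 0xFFFFFFFF, tag_alu(a, b))


def tag_check_jump(target: Word, policy: int) -> int:
    # Assumed check rule: a jump target tainted under `policy` raises an
    # exception instead of redirecting control flow.
    if target.tag & (1 << policy):
        raise SecurityException(f"tainted jump target {target.value:#x}")
    return target.value
```

For example, adding an untainted base address `Word(0x1000)` to user input `Word(0x44, tag=0b0001)` yields a word whose policy-0 bit is set, so `tag_check_jump` on that result raises `SecurityException`.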
Raksha Architecture, Version 2
Small off-core coprocessor for all DIFT functionality + state
• Can be reused across multiple chips
Requires minimal changes to main processor core
• <1% for our Sparc V8 processor
Same security features as original architecture
• 8% performance overhead for SpecInt2000
[Figure: block diagram of the processor core (I-cache, D-cache, ROB), the DIFT coprocessor (Policy Decode, Tag ALU, Tag Check, Tag RF, Tag Cache, WB), and the L2 cache; labeled links: "PC, Inst, Address" and "Security exception"]
Raksha Architecture, Version 3 (Loki)
Supports fine-grain permission check on physical memory
• All words associated with a 32-bit tag
• Permission table provides access rights for different tags
• Trusted SW specifies permissions; HW enforces them
  • Independently from OS; checks on device accesses as well
Reduces TCB of a full OS down to 5KLOC
• Invariant: malicious user/kernel code cannot access data without permission
• Virtual memory & all device drivers outside of the TCB
[Figure: processor pipeline diagram with blocks PC, I-Cache, Decode, RegFile, ALU, D-Cache, Traps, and WB, plus an I-TLB and a D-TLB, each paired with a P-cache]
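The tag-based permission check described on this slide can be sketched as follows. This is an illustrative model only: the `READ`/`WRITE` rights encoding, the `TaggedMemory` class, and its method names are hypothetical, and the real hardware caches permissions (the P-cache in the figure) rather than consulting a dictionary.

```python
READ, WRITE = 1, 2  # assumed encoding of access rights


class PermissionFault(Exception):
    """Raised when an access is not allowed by the permission table."""


class TaggedMemory:
    def __init__(self):
        self.words = {}  # physical address -> value
        self.tags = {}   # physical address -> 32-bit tag
        self.perms = {}  # tag -> rights granted to the running code

    # The next two methods model operations reserved for trusted software.
    def set_tag(self, addr, tag):
        self.tags[addr] = tag & 0xFFFFFFFF

    def set_permissions(self, tag, rights):
        self.perms[tag] = rights

    def _check(self, addr, needed):
        tag = self.tags.get(addr, 0)
        # Tags without a table entry grant no rights at all.
        if self.perms.get(tag, 0) & needed != needed:
            raise PermissionFault(f"tag {tag:#x} denies access to {addr:#x}")

    # Every load and store is checked, regardless of who issues it.
    def load(self, addr):
        self._check(addr, READ)
        return self.words.get(addr, 0)

    def store(self, addr, value):
        self._check(addr, WRITE)
        self.words[addr] = value
```

In this model, a word tagged read-only can be loaded by any code, including kernel code, but a store to it raises `PermissionFault` until trusted software updates the permission table.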
Experience & Lessons
HW: a stable starting point is critical
• Despite deficiencies, Leon has been a reasonable base
• Good compromise of size, performance, flexibility, support
  • Even for ISA-level research
• Can we match this with upcoming RAMP models?
SW: full system is important (full OS + devices)
• Enables experimentation with wide range of apps
• Increases credibility of results
• What is the OS story for RAMP models?
System: need low-cost board option
• Makes it easier to attract collaborators & disseminate design
• What is the replacement plan for XUPv5?
Atlas Goals
Fast: at speed experiments with hardware TM
• ~100x faster than simulator
Comfortable: full-system environment
• Full Linux OS
• Integration with standard debugging tools
Easy-to-use: rich support for programmability
• Automatic detection of performance bottlenecks
• Deterministic replay
• Automatic detection of atomicity bugs
ATLAS Hardware Architecture
9-way CMP with hardware support for TM
• TM support builds upon private caches & coherence protocol
• One processor dedicated for system code
• Uses hard-core PowerPC cores in user & control FPGAs in BEE2
[Figure: block diagram with TCC PPC 0 through TCC PPC 7, four user switches, a control switch, the Linux PPC, I/O, and main memory]
ATLAS Software Architecture
[Figure: software stack, top to bottom: Application (OpenMP+TM); TM API and ATLAS Profiler; ATLAS Runtime System; Linux OS; ATLAS HW on BEE2]
High-level application development
• OpenMP + TM, (Java + TM), …
High-level application debugging
• Gdb based for common & new features (e.g., infinite watchpoints)
Deterministic Replay with ReplayT
A critical tool for multiprocessor debugging
• Small system variations can mask bugs
ReplayT: record & replay transaction commit order
• Sufficient for TCC’s “all transactions, all the time” execution model
  • Serializable commit order captures all thread interactions
• Minimal runtime & space overhead (1 byte/transaction)
[Figure: timelines for T0, T1, and T2 in the logging phase and the replay phase. The log records the commit order during logging; during replay, the commit protocol replays the logged commit order. T2's write-set is marked. Legend: Computation, Arbitration, Commit, Abort]
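The record-and-replay idea on this slide can be sketched in a few lines of Python. This is a simplified software model, not the ATLAS commit protocol: the functions `record_commit_order` and `replay_commit_order` are hypothetical, and storing the committing thread's id as the one byte per transaction is an assumption (the slide gives only the log size).

```python
from collections import Counter


def record_commit_order(commits):
    """Logging phase: keep one byte per committed transaction.

    `commits` is the sequence of thread ids in the order they committed.
    bytes() raises ValueError for ids outside 0..255.
    """
    return bytes(commits)


def replay_commit_order(log, ready):
    """Replay phase: grant commits strictly in the logged order.

    `ready` is the order in which threads happen to request a commit during
    the replay run; requests that arrive early wait until it is their turn.
    Returns the order in which commits were actually granted.
    """
    waiting = Counter()
    committed = []
    pos = 0
    for tid in ready:
        waiting[tid] += 1
        # Drain every waiting thread whose turn has come up in the log.
        while pos < len(log) and waiting[log[pos]] > 0:
            waiting[log[pos]] -= 1
            committed.append(log[pos])
            pos += 1
    return committed
```

Even if the threads reach their commit points in a different order during replay, the granted order matches the log, which is what makes the run deterministic under an execution model where all thread interactions happen at commit.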
ReplayT Runtime Overhead (logging phase)
[Figure: bar chart of runtime overhead (%) with 1p, 2p, 4p, and 8p for vacation, kmeans, genome, delaunay, labyrinth, radix, mp3d, and ocean; y-axis from -4 to 20]
Average slowdown is 1.05%
Can continuously log on production runs
ReplayT Extensions
Unique replay
• Problem: maximize usefulness of test runs
• Approach: shuffle commit order to generate unique scenarios
Replay with monitoring code
• Problem: replay accuracy after recompilation
• Approach: faithfully repeat commit order if binary changes
  • E.g., printf statements inserted for monitoring purposes
Cross-platform replay
• Problem: debugging on multiple platforms
• Approach: support for replaying log across platforms & ISAs
Atomicity Bug Detection
Problem: user splits an atomic task into two transactions
• Hard to pinpoint problem even with replay
The AVIO proposal [Lu et al. @ ASPLOS’06]
• Unserializable access interleavings are likely bugs
• Whitelist unserializable interleavings from correct runs
  • Performed during application testing
• AVIO challenges
  • Long & intrusive data collection phase
  • Long analysis phase
  • Corner cases (false positives & false negatives)
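The notion of an unserializable access interleaving can be made concrete with a small Python sketch. It classifies each triple of (previous local access, interleaved remote access, current local access) to the same address, using the four patterns that cannot be reordered into a serial execution. The trace format, the function name `find_unserializable`, and the whitelist keyed by `(addr, pattern)` are simplifying assumptions for illustration; AVIO itself whitelists by instruction.

```python
from collections import defaultdict

# (previous local op, interleaved remote op, current local op)
UNSERIALIZABLE = frozenset({
    ("R", "W", "R"),  # two local reads see different values
    ("W", "W", "R"),  # local read misses the local write
    ("W", "R", "W"),  # remote read sees an intermediate value
    ("R", "W", "W"),  # remote write is lost
})


def find_unserializable(trace, whitelist=frozenset()):
    """Scan a global trace of (thread, op, addr) tuples, op in {"R", "W"}."""
    prev = {}                   # (addr, thread) -> previous local op
    remote = defaultdict(list)  # (addr, thread) -> remote ops since prev
    reports = []
    for thread, op, addr in trace:
        # This access is a remote access for every other thread on addr.
        for (a, t) in prev:
            if a == addr and t != thread:
                remote[(a, t)].append(op)
        key = (addr, thread)
        if key in prev:
            for r in remote[key]:
                pattern = (prev[key], r, op)
                if pattern in UNSERIALIZABLE and (addr, pattern) not in whitelist:
                    reports.append((addr, thread, pattern))
        prev[key] = op
        remote[key] = []
    return reports
```

A read, a remote write, and a second read of the same address is reported, while three reads are not; passing the reported `(addr, pattern)` pair in `whitelist` suppresses it, mirroring the training step on correct runs.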
Atomicity Bug Detection on ATLAS
Based on the general approach of AVIO but
• Fast & non-intrusive data collection
  • Single log for each address accessed in transaction
  • Log collected during deterministic replay
• Fast analysis
  • Interleavings examined at transaction granularity
• More accurate analysis
  • Eliminated false negatives due to intermediate writes
Experience & Lessons
HW: need multiple grades of hardware modeling
• Enable fast prototyping of new ISA & HW features
  • Even if timing or other details are not exactly accurate
• Atlas experience: 40+ tutorial participants enjoyed using new features in a timing “inaccurate” system
SW: full system is important (full OS + devices)
• Enables experimentation with wide range of apps
System: need low-cost board option
• Makes it easier to attract collaborators & disseminate design
Scalability: need access to multiple boards
• Students will not scale design until 2nd board arrives
ISA: unfortunately, the key to more sharing of HW & SW models
• Difficult to share across ISAs due to differences in specification, interfaces, etc.
• Should RAMP simply adopt Sparc?