Tornado: Maximizing Locality and Concurrency in a SMMP OS

Post on 21-Dec-2015


Contents

Types of Locality
Locality: A closer look
Requirements for locality
Design Basics of Tornado
Test Results
Conclusion

Types of Locality*

Temporal locality: “The concept that a resource that is referenced at one point in time will be referenced again sometime in the near future.”

Spatial locality: “The concept that the likelihood of referencing a resource is higher if a resource near it has been referenced.”

Sequential locality: “The concept that memory is accessed sequentially.”

*Source: Wikipedia
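
As a rough illustration of the spatial and sequential cases defined above, the sketch below sums the same row-major matrix in two orders. The function names and the flat `std::vector` layout are assumptions made for this example and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major, visiting elements in memory order.
// Consecutive accesses touch adjacent addresses (sequential and spatial locality).
long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same sum, but consecutive accesses are `cols` elements apart,
// so neighbouring elements are not used right after each other.
long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same result; only the access order differs, which is what typically makes the first one friendlier to the cache on large matrices.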

Locality: A closer look, Read only case

    bool x = true;
    while (x) {
        // Do some work
        // reading but not writing x
        …
    }

[Figure: Processor #1 and Processor #2, each with its own cache holding a copy of x, above a shared memory that also holds x.]

Notes: Once each cache holds a copy of x, there are no accesses on the bus, because all accesses are reads that are satisfied in the local caches and no invalidations are sent.

Locality: A closer look, Read/Write case

Both processors run:

    bool x = true;
    while (x) {
        x = false;
        // Do other work
        …
    }

[Figure: Processor #1 and Processor #2, each with its own cache, above a shared memory that holds x.]

Sequence shown in the figure:
1. Cache miss
2. Read request
3. Data
4. Write
5. Invalidate block containing x

Notes:

x becomes a bottleneck: the valid copy keeps jumping from one cache to the other.

Every write access causes an invalidation.

Almost every read causes a read miss and a bus read.
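
To make the contrast between the read-only and read/write cases concrete, here is a toy model of a single cache line under a write-invalidate scheme. It is a deliberately simplified sketch, not a real coherence protocol: `LineModel`, `BusStats`, and the two `run_*` helpers are hypothetical names, and only read misses and invalidations are counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bus traffic counters for the toy model.
struct BusStats {
    int read_misses = 0;    // reads that had to fetch the line over the bus
    int invalidations = 0;  // writes that invalidated a copy in another cache
};

// One cache line shared by several processors, write-invalidate style.
class LineModel {
public:
    explicit LineModel(std::size_t processors) : valid_(processors, false) {}

    void read(std::size_t p) {
        assert(p < valid_.size());
        if (!valid_[p]) {  // miss: fetch the line over the bus
            ++stats_.read_misses;
            valid_[p] = true;
        }
    }

    void write(std::size_t p) {
        assert(p < valid_.size());
        bool others_held_copy = false;
        for (std::size_t q = 0; q < valid_.size(); ++q) {
            if (q != p && valid_[q]) {
                valid_[q] = false;  // the other cache loses its copy
                others_held_copy = true;
            }
        }
        if (others_held_copy) ++stats_.invalidations;
        valid_[p] = true;  // the writer now holds the only valid copy
    }

    const BusStats& stats() const { return stats_; }

private:
    std::vector<bool> valid_;  // valid_[p]: does processor p's cache hold the line?
    BusStats stats_;
};

// Two processors that only read x.
BusStats run_read_only(int iterations) {
    LineModel line(2);
    for (int i = 0; i < iterations; ++i) { line.read(0); line.read(1); }
    return line.stats();
}

// Two processors that each test x (read) and then write it.
BusStats run_read_write(int iterations) {
    LineModel line(2);
    for (int i = 0; i < iterations; ++i) {
        line.read(0); line.write(0);
        line.read(1); line.write(1);
    }
    return line.stats();
}
```

In this model, 100 iterations of the read-only loop record 2 read misses (the initial fetches) and no invalidations, while 100 iterations of the read/write loop record 200 read misses and 199 invalidations.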

Locality: A closer look, Effect of Cache Line Length

Processor #1 runs:

    bool x = true;
    while (x) {
        x = false;
        // Do other work
        …
    }

Processor #2 runs:

    bool y = true;
    while (y) {
        y = false;
        // Do other work
        …
    }

[Figure: the memory diagram marks addresses 0x0 and 0x4 for x and y; both variables sit in one cache block, labeled "x,y". The bus message reads "Invalidate block containing x & y".]

Notes:

x & y have different addresses but fall into the same cache line (block)!

Read doesn’t cause any problem.

Remember: invalidations are per cache line/block, not per word! So we have pretty much the same behavior as the read/write case on a single variable.
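
One common way to keep two independently written variables out of the same block is to align each to its own cache line. The sketch below assumes a 64-byte line, which is not stated in the slides; `kCacheLine`, `same_cache_line`, `Packed`, and `Padded` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed line size, not from the slides

// True if both addresses fall into the same cache line of the given size.
inline bool same_cache_line(const void* a, const void* b,
                            std::size_t line = kCacheLine) {
    assert(line != 0);
    auto pa = reinterpret_cast<std::uintptr_t>(a);
    auto pb = reinterpret_cast<std::uintptr_t>(b);
    return pa / line == pb / line;
}

// x and y are adjacent, so they share one line: writes to x invalidate y too.
struct alignas(kCacheLine) Packed {
    bool x = true;
    bool y = true;
};

// Each flag starts on its own line, so writes to x leave y's line alone.
struct Padded {
    alignas(kCacheLine) bool x = true;
    alignas(kCacheLine) bool y = true;
};
```

The price is memory: `Padded` occupies two full lines for two flags, so this layout is worth it only for data that different processors write frequently.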

Requirements for Locality

Spatial and temporal locality
Minimize read/write and write sharing
Minimize false sharing
Minimize the distance between the accessing processor and the target memory module

Design Basics for Tornado

Individual resources are individual objects
Clustered objects
Protected procedure calls (PPC)
Semi-automatic garbage collection

Clustered Objects

Appears as a single object from the outside but is internally split into reps
Each rep handles requests from one or more processors
Lots of advantages to this design
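
A counter is a convenient way to picture the idea: callers see one object, but each update lands in the rep that serves the calling processor. The sketch below is only an illustration under that reading; `ClusteredCounter` and its members are hypothetical names, not Tornado's actual interface, and the code is single-threaded (a real rep shared by several processors would need synchronization and would be placed on its own cache line).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ClusteredCounter {
public:
    // cpus_per_rep controls how many processors share one rep.
    ClusteredCounter(std::size_t cpus, std::size_t cpus_per_rep)
        : cpus_per_rep_(cpus_per_rep), reps_(reps_needed(cpus, cpus_per_rep)) {}

    // Update only the rep that serves the calling processor.
    void increment(std::size_t cpu) { rep_for(cpu).count += 1; }

    // A global read has to visit every rep.
    long value() const {
        long total = 0;
        for (const Rep& r : reps_) total += r.count;
        return total;
    }

    std::size_t rep_count() const { return reps_.size(); }

private:
    struct Rep { long count = 0; };

    static std::size_t reps_needed(std::size_t cpus, std::size_t per_rep) {
        assert(cpus > 0 && per_rep > 0);
        return (cpus + per_rep - 1) / per_rep;  // round up
    }

    Rep& rep_for(std::size_t cpu) {
        std::size_t index = cpu / cpus_per_rep_;
        assert(index < reps_.size());
        return reps_[index];
    }

    std::size_t cpus_per_rep_;
    std::vector<Rep> reps_;
};
```

The trade-off is visible in the two methods: `increment` stays local to one rep, while `value` pays for that by touching all of them, so the split suits data that is updated often and read globally only rarely.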

Clustered Objects (cont.)

Per-processor translation tables
Partitioned global translation table
Default “miss” handlers
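
The interplay of per-processor tables and miss handlers can be pictured as a lazy lookup: a processor consults its own table first, and only a miss invokes a handler that installs a rep for that processor. The sketch below is an assumption-laden simplification: `TranslationTables`, `Rep`, and `default_miss_handler` are invented names, and the partitioned global table is collapsed into a single callback rather than modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical rep: remembers which processor and object it serves.
struct Rep {
    std::size_t cpu;
    int object_id;
};

class TranslationTables {
public:
    using MissHandler =
        std::function<std::unique_ptr<Rep>(std::size_t cpu, int object_id)>;

    TranslationTables(std::size_t cpus, MissHandler on_miss)
        : local_(cpus), on_miss_(std::move(on_miss)) {}

    // Look up the rep for object_id in the calling processor's own table.
    // On a miss, the handler creates a rep, which is then cached locally.
    Rep& lookup(std::size_t cpu, int object_id) {
        assert(cpu < local_.size());
        auto& table = local_[cpu];
        auto it = table.find(object_id);
        if (it == table.end()) {
            ++misses_;
            it = table.emplace(object_id, on_miss_(cpu, object_id)).first;
        }
        return *it->second;
    }

    int misses() const { return misses_; }

private:
    std::vector<std::unordered_map<int, std::unique_ptr<Rep>>> local_;
    MissHandler on_miss_;
    int misses_ = 0;
};

// Stand-in for a default miss handler: one fresh rep per processor.
inline std::unique_ptr<Rep> default_miss_handler(std::size_t cpu, int object_id) {
    return std::unique_ptr<Rep>(new Rep{cpu, object_id});
}
```

Because reps are installed on first use, a processor that never touches an object never pays for a rep of it, and repeated lookups from the same processor stay within that processor's table.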

Protected Procedure Calls

Microkernel: relies on servers to carry out part of the OS's job
As many server threads as there are clients
A request is handled on the same processor where it was issued

*Image source: Wikipedia

Garbage Collection

Semi-automatic
Makes a distinction between temporary and persistent references to objects
Eliminates the need for two locks to guarantee existence, and eliminates locking altogether for read-only data

Test Results: Effect of rep Count (1)

Test Results: Effect of rep Count (2)

Test Results: Effect of Cache Associativity

Test Results: Tornado vs. Commercial OSes

Conclusion

Tornado performs much better than many commercial OSes

The concept of clustered objects gives it several advantages:
  High locality of data
  Diminished need for locking
  Higher degree of sharing, concurrency and modularity