
Page 1: Ph.D. thesis presentation

1

Operating System Management of Shared Caches on Multicore Processors

-- Ph.D. Thesis Presentation --

Apr. 20, 2010

David Tam

Supervisor: Michael Stumm

Page 2: Ph.D. thesis presentation

2

Multicores Today

Multicores are Ubiquitous
● Unexpected by most software developers
● Software support is lacking (e.g., OS)

General Role of OS
● Manage shared hardware resources

New Candidate
● Shared cache: performance critical
● Focus of thesis

[Figure: cores with private caches contrasted with cores sharing a single cache]

Page 3: Ph.D. thesis presentation

3

Thesis: OS should manage on-chip shared caches of multicore processors

Demonstrate:
● Properly managing shared caches at the OS level can increase performance

Management Principles
1. Promote sharing
  ● For threads that share data
  ● Maximize the major advantage of shared caches
2. Provide isolation
  ● For threads that do not share data
  ● Minimize the major disadvantage of shared caches

Supporting Role
● Provision the shared cache online

Page 4: Ph.D. thesis presentation

4

#1 - Promote Sharing
Problem: Cross-chip accesses are slow
Solution: Exploit the major advantage of shared caches: fast access to shared data
OS Actions: Identify & localize data sharing
View: Match software sharing to hardware sharing

[Figure: Thread A on Chip A and Thread B on Chip B, with copies of the shared data in both L2 caches and shared-data traffic crossing between the chips]

Page 5: Ph.D. thesis presentation

5

#1 - Promote Sharing

[Figure: Thread A and Thread B migrated onto the same chip, accessing the shared data in a single L2 cache]


Page 6: Ph.D. thesis presentation

6

Identify Data Sharing
● Detect sharing online with hardware performance counters
● Monitor remote cache accesses (data addresses)
● Track on a per-thread basis
● Sampled data addresses reveal which memory regions are shared with other threads

Localize Data Sharing
● Identify clusters of threads that access the same memory regions
● Migrate the threads of each cluster onto the same chip (a clustering sketch follows this slide)

[Figure: Thread A and Thread B on separate chips, with shared-data traffic crossing between their L2 caches]

Thread Clustering [EuroSys'07]
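The clustering step can be pictured with a small C sketch (hypothetical names and data layout, not the EuroSys'07 implementation): assume the sampled remote-access addresses of each thread have already been hashed into a fixed-size bit vector over coarse memory regions (its sharing signature); threads whose signatures overlap strongly form a cluster, and each cluster would then be migrated onto one chip.

    /* Toy thread-clustering sketch; signature layout and threshold are
     * illustrative assumptions, not the thesis implementation. */
    #include <stdio.h>
    #include <stdint.h>

    #define NTHREADS  8
    #define SIG_WORDS 4                 /* 256 memory-region bits per thread */

    typedef struct { uint64_t bits[SIG_WORDS]; } signature_t;

    /* Jaccard-style overlap of two signatures: |A & B| / |A | B|
     * (__builtin_popcountll is a GCC/Clang builtin). */
    static double similarity(const signature_t *a, const signature_t *b)
    {
        int inter = 0, uni = 0;
        for (int w = 0; w < SIG_WORDS; w++) {
            inter += __builtin_popcountll(a->bits[w] & b->bits[w]);
            uni   += __builtin_popcountll(a->bits[w] | b->bits[w]);
        }
        return uni ? (double)inter / uni : 0.0;
    }

    /* Greedy clustering: each unassigned thread seeds a cluster and pulls in
     * every other thread whose signature is similar enough.  Threads in one
     * cluster would then be migrated onto the same chip. */
    static void cluster_threads(const signature_t sig[], int cluster[], double thresh)
    {
        for (int t = 0; t < NTHREADS; t++)
            cluster[t] = -1;
        int next = 0;
        for (int t = 0; t < NTHREADS; t++) {
            if (cluster[t] != -1)
                continue;
            cluster[t] = next;
            for (int u = t + 1; u < NTHREADS; u++)
                if (cluster[u] == -1 && similarity(&sig[t], &sig[u]) >= thresh)
                    cluster[u] = next;
            next++;
        }
    }

    int main(void)
    {
        signature_t sig[NTHREADS] = {0};
        int cluster[NTHREADS];
        /* made-up input: threads 0-3 touch regions 0-9, threads 4-7 regions 100-109 */
        for (int t = 0; t < NTHREADS; t++)
            for (int r = (t < 4 ? 0 : 100); r < (t < 4 ? 10 : 110); r++)
                sig[t].bits[r / 64] |= 1ull << (r % 64);
        cluster_threads(sig, cluster, 0.5);
        for (int t = 0; t < NTHREADS; t++)
            printf("thread %d -> cluster %d\n", t, cluster[t]);
        return 0;
    }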

Page 7: Ph.D. thesis presentation

7

[Figure: Thread A and Thread B clustered onto the same chip, sharing data in one L2 cache]


Thread Clustering [EuroSys'07]

Page 8: Ph.D. thesis presentation

8

Visualization of Clusters

● SPECjbb 2000
● 4 warehouses, 16 threads per warehouse
● Threads have been sorted by cluster for visualization

[Figure: sharing matrix with threads on the y-axis (grouped in sets of 16) and memory regions on the x-axis (virtual addresses 0 to 2^64); shading indicates sharing intensity: high, medium, low, none]

Page 9: Ph.D. thesis presentation

9


Page 10: Ph.D. thesis presentation

10

Performance Results

● Multithreaded commercial workloads: RUBiS, VolanoMark, SPECjbb2k
● 8-way IBM POWER5 Linux system
  ● 22%, 32%, 70% reduction in stalls caused by cross-chip accesses
  ● 7%, 5%, 6% performance improvement
● 32-way IBM POWER5+ Linux system
  ● 14% SPECjbb2k potential improvement

[Figure: system diagram showing chips with 1.9 MB L2 caches, 36 MB L3 caches, and 4 GB memory]

Page 11: Ph.D. thesis presentation

11

#2 – Provide Isolation

[Figure: Apache and MySQL competing for space in one shared cache]

Problem: Major disadvantage of shared caches: cache space interference
Solution: Provide cache space isolation between applications
OS Actions: Enforce isolation during physical page allocation
View: Partition into smaller private caches

Page 12: Ph.D. thesis presentation

12

#2 – Provide Isolation


Page 13: Ph.D. thesis presentation

13

#2 – Provide Isolation


Page 14: Ph.D. thesis presentation

14

#2 – Provide Isolation

[Figure: a partition boundary drawn between Apache's and MySQL's regions of the shared cache]


Page 15: Ph.D. thesis presentation

15

#2 – Provide Isolation


Page 16: Ph.D. thesis presentation

16

Cache Partitioning
● Apply the page-coloring technique
● Guide physical page allocation to control cache line usage
● Works on existing processors

[Figure: the OS maps an application's virtual pages to physical pages of Color A; the fixed hardware mapping places those pages into the Color A sets (N sets) of the L2 cache]

[WIOSCA'07]
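As a rough illustration of the page-coloring arithmetic only (not the thesis kernel allocator; the cache parameters below are assumptions and are not the POWER5 L2 geometry), a page's color is its physical frame number modulo the number of colors, and a partitioned allocator would hand an application only pages whose color falls in its assigned range.

    /* Page-coloring arithmetic sketch; assumed cache geometry, not POWER5's. */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE   4096u
    #define CACHE_SIZE  (2u * 1024 * 1024)    /* assumed 2 MB L2 */
    #define CACHE_ASSOC 8u                    /* assumed 8-way   */
    #define LINE_SIZE   128u

    /* One color = the group of physical pages mapping to the same slice of
     * cache sets: #colors = (#sets * line size) / page size.              */
    #define NUM_SETS    (CACHE_SIZE / (CACHE_ASSOC * LINE_SIZE))
    #define NUM_COLORS  (NUM_SETS * LINE_SIZE / PAGE_SIZE)

    static unsigned page_color(uint64_t phys_addr)
    {
        return (unsigned)((phys_addr / PAGE_SIZE) % NUM_COLORS);
    }

    /* A partitioned allocator would keep one free list per color and, for an
     * application restricted to colors [lo, hi), hand out only such pages. */
    static int page_allowed(uint64_t phys_addr, unsigned lo, unsigned hi)
    {
        unsigned c = page_color(phys_addr);
        return c >= lo && c < hi;
    }

    int main(void)
    {
        printf("%u colors of %u bytes each\n",
               (unsigned)NUM_COLORS, (unsigned)(CACHE_SIZE / NUM_COLORS));
        printf("page at 0x12345000 -> color %u, allowed in [0,8): %d\n",
               page_color(0x12345000ull), page_allowed(0x12345000ull, 0, 8));
        return 0;
    }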

Page 17: Ph.D. thesis presentation

17

[Figure: Application A's virtual pages are mapped to Color A physical pages, which the hardware maps into the Color A sets (N sets) of the L2 cache; Application B's virtual pages are mapped to Color B pages and the Color B sets]


Cache Partitioning [WIOSCA'07]

Page 18: Ph.D. thesis presentation

18

Impact of Partitioning

[Figure: performance of art and mcf across L2 cache sizes (16 down to 0 colors), relative to performance without isolation]

Performance of Other Combos
● 10 pairs of applications: SPECcpu2k, SPECjbb2k
● 4% to 17% improvement (36 MB L3 cache)
● 28%, 50% improvement (no L3 cache)

Page 19: Ph.D. thesis presentation

19

[Figure: miss rate (%) vs. allocated cache size (%) for Application X]

Provisioning the Cache
Problem: How to determine cache partition size
Solution: Use the L2 cache miss rate curve (MRC) of the application
Criteria: Obtain the MRC rapidly, accurately, online, with low overhead, on existing hardware
OS Actions: Monitor L2 cache accesses using hardware performance counters
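One way to read the provisioning step, sketched below in C with made-up miss-rate curves (an illustration, not the thesis policy): given each application's MRC as misses per access at every possible color count, try each split of the 16 colors and keep the one that minimizes total misses, weighted by how often each application accesses the cache.

    /* Partition-size selection from two miss-rate curves; all numbers are
     * invented for illustration. */
    #include <stdio.h>

    #define TOTAL_COLORS 16

    /* Total misses if A receives colors_a colors and B receives the rest. */
    static double cost(const double mrc_a[], const double mrc_b[],
                       double accesses_a, double accesses_b, int colors_a)
    {
        return accesses_a * mrc_a[colors_a] +
               accesses_b * mrc_b[TOTAL_COLORS - colors_a];
    }

    int main(void)
    {
        /* mrc_x[i] = misses per access when the application holds i colors;
         * A flattens out after ~6 colors, B keeps benefiting from more cache. */
        double mrc_a[TOTAL_COLORS + 1] = {1.0,.60,.40,.30,.25,.22,.20,.20,
                                          .20,.20,.20,.20,.20,.20,.20,.20,.20};
        double mrc_b[TOTAL_COLORS + 1] = {1.0,.90,.80,.72,.65,.58,.52,.47,
                                          .42,.38,.34,.31,.28,.26,.24,.22,.21};
        double best = 1e30;
        int best_split = 1;
        for (int c = 1; c < TOTAL_COLORS; c++) {
            double m = cost(mrc_a, mrc_b, 1e9, 1e9, c);
            if (m < best) { best = m; best_split = c; }
        }
        printf("give A %d colors, B %d colors\n",
               best_split, TOTAL_COLORS - best_split);
        return 0;
    }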

Page 20: Ph.D. thesis presentation

20

Design
● Upon every L2 access:
  ● Update the sampling register with the data address
  ● Trigger an interrupt to copy the register to a trace log in main memory
● Feed the trace log into Mattson's stack algorithm [1970] to obtain the L2 MRC (sketched below)
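A minimal version of that stack processing is sketched below in C, approximating the L2 as a fully-associative LRU cache and running over a tiny hand-made address trace; the real RapidMRC design works from sampled POWER5 trace logs and is described in the ASPLOS'09 paper.

    /* Toy Mattson-style stack-distance pass over an address trace. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_SHIFT 7            /* 128-byte cache lines (assumed)      */
    #define MAX_LINES  4096         /* distinct lines tracked in this toy  */
    #define MAX_DIST   4096

    static uint64_t stack[MAX_LINES];    /* stack[0] = most recently used  */
    static int      depth = 0;
    static uint64_t hist[MAX_DIST + 1];  /* hist[d] = hits at reuse distance d */
    static uint64_t cold = 0;            /* first-touch (compulsory) misses */

    static void access_addr(uint64_t addr)
    {
        uint64_t line = addr >> LINE_SHIFT;
        int d;
        for (d = 0; d < depth; d++)      /* find the line's current depth  */
            if (stack[d] == line)
                break;
        if (d == depth) {                /* never seen before: cold miss   */
            cold++;
            if (depth < MAX_LINES) depth++;
            d = depth - 1;
        } else {
            hist[d < MAX_DIST ? d : MAX_DIST]++;
        }
        memmove(&stack[1], &stack[0], d * sizeof(stack[0]));  /* move to top */
        stack[0] = line;
    }

    /* An LRU cache of `lines` lines misses on every cold access plus every
     * access whose reuse distance is >= `lines`; evaluating this over a range
     * of sizes yields the miss rate curve. */
    static uint64_t misses_for_size(int lines)
    {
        uint64_t m = cold;
        for (int d = lines; d <= MAX_DIST; d++)
            m += hist[d];
        return m;
    }

    int main(void)
    {
        /* tiny made-up trace standing in for the sampled L2 access log */
        uint64_t trace[] = {0x1000, 0x2000, 0x1000, 0x3000, 0x2000, 0x1000};
        int n = sizeof(trace) / sizeof(trace[0]);
        for (int i = 0; i < n; i++)
            access_addr(trace[i]);
        for (int lines = 1; lines <= 4; lines *= 2)
            printf("cache of %d lines: %llu misses out of %d accesses\n",
                   lines, (unsigned long long)misses_for_size(lines), n);
        return 0;
    }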

Results
● Workloads
  ● 30 apps from SPECcpu2k, SPECcpu2k6, SPECjbb2k
● Latency
  ● 227 ms to generate an online L2 MRC
● Accuracy
  ● Good, e.g., up to 27% performance improvement when applied to cache partitioning

RapidMRC [ASPLOS'09]

Page 21: Ph.D. thesis presentation

21

Accuracy of RapidMRC
● Execution slice at 10 billion instructions

[Figure: miss rate (MPKI) vs. cache size (# colors) for xalancbmk, jbb, mcf (2k), gzip, mgrid, and ammp]

Page 22: Ph.D. thesis presentation

22

Effectiveness on Provisioning

[Figure: performance of twolf and equake across L2 cache sizes (0 to 16 colors), comparing partitions sized by RapidMRC and by the real MRC against performance without isolation]

Performance of Other Combos Using RapidMRC
● 12% improvement for vpr+applu
● 14% improvement for ammp+3applu

Page 23: Ph.D. thesis presentation

23

Contributions
On commodity multicores, first to demonstrate:
● Mechanism: To detect data sharing online & automatically cluster threads
  Benefits: Promoting sharing [EuroSys'07]
● Mechanism: To partition the shared cache by applying page coloring
  Benefits: Providing isolation [WIOSCA'07]
● Mechanism: To approximate L2 MRCs online in software
  Benefits: Provisioning the cache [ASPLOS'09]

...all performed by the OS.

Page 24: Ph.D. thesis presentation

24

Concluding Remarks
Demonstrated Performance Improvements
● Promoting Sharing
  ● 5% – 7%: SPECjbb2k, RUBiS, VolanoMark (2 chips)
  ● 14% potential: SPECjbb2k (8 chips)
● Providing Isolation
  ● 4% – 17%: 8 combos of SPECcpu2k, SPECjbb2k (36 MB L3 cache)
  ● 28%, 50%: 2 combos of SPECcpu2k (no L3 cache)
● Provisioning the Cache Online
  ● 12% – 27%: 3 combos of SPECcpu2k

OS should manage on-chip shared caches

Page 25: Ph.D. thesis presentation

25

Thank You

Page 26: Ph.D. thesis presentation

26

24-9=15 slides

Page 27: Ph.D. thesis presentation

27

Future Research Opportunities
Shared cache management principles can be applied to other layers:
● Application, managed runtime, virtual machine monitor

Promoting sharing
● Improve locality on NUMA multiprocessor systems

Providing isolation
● Finer granularity, within one application [MICRO'08]
  ● Regions
  ● Objects

RapidMRC
● Online L2 MRCs
  ● Reducing energy
  ● Guiding co-scheduling
● Underlying tracing mechanism
  ● Trace other hardware events