Taming Non-blocking Caches to Improve Isolation in Multicore Real-time Systems (RTAS 2016) Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas

Upload: others

Post on 02-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Taming Non-blocking Caches to Improve Isolation in Multicore …heechul/courses/eecs753/S17/slides/W5.2... · 2017-02-16 · Taming Non-blocking Caches to Improve Isolation in Multicore

Taming Non-blocking Caches to Improve Isolation in Multicore Real-time Systems

(RTAS 2016)

Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi
University of Kansas


Multicore Processors in Real-Time Systems
● Real-time systems need more performance as they become more intelligent:
  ○ Computer vision
  ○ Collision avoidance
● Real-time systems still need high levels of predictability to be effective and safe


Time Predictability in Multicore Processors
● Multicore processors are less predictable than single-core processors because the cores share resources:
  ○ Last-Level Cache (LLC)
  ○ Bus interface
● Out-of-order cores with non-blocking caches also share the Miss Status Holding Registers (MSHRs) of the shared LLC

(Figure: shared LLC. Source: http://www.cse.wustl.edu/~jain/cse567-11/ftp/multcore/)


Page Coloring for Cache Partitioning
● Cache partitioning prevents cores from interfering with each other's share of the cache space
● Here, partitioning is done through page coloring
  ○ Cache partitioning can be implemented in hardware or in software; page coloring is the software (OS-level) approach
  ○ Allocates non-overlapping partitions of the LLC to cores
● Prevents unpredictable cache-line evictions caused by other cores
● However, this does NOT fully isolate cores that share a non-blocking cache
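To make page coloring concrete, here is a minimal Python sketch, assuming a physically indexed, set-associative LLC whose set-index bits are not hashed (many real LLCs hash addresses across slices, which this simple model ignores). The cache geometry and the helper names (`num_colors`, `page_color`, `color_partition`, `frames_for_core`) are hypothetical illustrations, not the allocator used in the paper.

```python
# Minimal page-coloring sketch. ASSUMPTIONS (hypothetical, not the paper's setup):
# a physically indexed, set-associative LLC with no set-index hashing.


def num_colors(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Number of colors = bytes covered by one cache way / page size."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


def page_color(paddr: int, cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Color of the physical page frame that contains address `paddr`."""
    frame = paddr // page_bytes
    return frame % num_colors(cache_bytes, ways, page_bytes)


def color_partition(cores: int, colors: int) -> dict:
    """Give each core a non-overlapping, contiguous group of colors."""
    per_core = colors // cores
    return {c: list(range(c * per_core, (c + 1) * per_core)) for c in range(cores)}


def frames_for_core(core, free_frames, partition, colors):
    """Keep only the free frame numbers whose color is owned by `core`."""
    owned = set(partition[core])
    return [f for f in free_frames if f % colors in owned]


if __name__ == "__main__":
    # Hypothetical LLC: 2 MiB, 16-way, with 4 KiB pages.
    size, ways, page = 2 * 1024 * 1024, 16, 4096
    colors = num_colors(size, ways, page)  # 128 KiB per way / 4 KiB = 32 colors
    parts = color_partition(4, colors)     # 8 colors per core
    print(colors, parts[1])
    print(page_color(0x12345000, size, ways, page))
```

Because a page's color fixes which group of cache sets it maps to, an OS that hands each core only frames of its own colors keeps the cores' lines in disjoint sets. Note that nothing in this scheme touches the MSHRs.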


Non-blocking Cache
● Memory-Level Parallelism (MLP)
  ○ The ability to handle multiple memory operations concurrently
● Continues to serve cache hits while earlier cache misses are still waiting to be served
● Miss Status Holding Registers (MSHRs)
  ○ Cache miss: allocate an MSHR entry to "pend" the memory operation until it can be fulfilled
  ○ Data received: clear the entry from the MSHRs
● The number of available MSHRs determines the cache's Memory-Level Parallelism
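The MSHR life cycle above can be modeled in a few lines. This is a teaching sketch, not gem5 code: the class and function names are hypothetical, hits are not modeled because they need no MSHR, and treating a repeated miss to an already-pending line as sharing its entry is an assumption. `time_for_misses` is an idealized estimate that assumes independent misses of equal latency and no other bottleneck.

```python
# Minimal model of a non-blocking cache's MSHR file (hypothetical sketch).
import math


class MSHRFile:
    """Tracks the outstanding misses of one non-blocking cache."""

    def __init__(self, entries: int):
        self.entries = entries
        self.pending = set()  # line addresses whose miss is still outstanding

    def on_miss(self, line_addr: int) -> bool:
        """Allocate an entry to 'pend' the miss; False means the cache must stall."""
        if line_addr in self.pending:
            return True  # ASSUMPTION: a repeat miss to a pending line shares its entry
        if len(self.pending) >= self.entries:
            return False  # every entry is busy
        self.pending.add(line_addr)
        return True

    def on_fill(self, line_addr: int) -> None:
        """Data received from the next level: clear the entry."""
        self.pending.discard(line_addr)


def time_for_misses(n_misses: int, mshrs: int, miss_latency: int) -> int:
    """Idealized: independent misses overlap in waves of at most `mshrs`."""
    return math.ceil(n_misses / mshrs) * miss_latency
```

For example, 8 independent misses of 100 cycles each finish in 200 cycles with 4 MSHRs but take 800 cycles with 1, which is the sense in which the MSHR count bounds the cache's MLP.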


MSHR Contention
● The MSHRs of the shared LLC are also shared by the cores
● If all MSHRs are full:
  ○ The cache becomes blocked
  ○ Memory operations (including cache hits) are blocked until an MSHR becomes free
● Cache partitioning does not prevent MSHR contention

(Figure: an L1 cache miss on a core sends an MSHR request to the LLC; when all MSHRs are occupied, the LLC is inaccessible until an MSHR becomes available.)
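The blocking behavior can be sketched with a single global MSHR pool. This is a hypothetical model (the `SharedLLC` class and its methods are illustrative names) that assumes, as the slide states, that a full MSHR pool blocks the whole LLC, so even a hit in the subject's own colored partition must wait.

```python
# Sketch of a shared LLC whose MSHRs form one global pool (hypothetical model).
# ASSUMPTION: when every MSHR is occupied the whole LLC blocks, hits included.


class SharedLLC:
    def __init__(self, global_mshrs: int):
        self.global_mshrs = global_mshrs
        self.outstanding = []  # (core, line) pairs waiting on DRAM, oldest first

    def access(self, core: int, line: int, hit: bool) -> str:
        if len(self.outstanding) >= self.global_mshrs:
            return "blocked"  # no free MSHR: hits and misses both stall
        if hit:
            return "hit"
        self.outstanding.append((core, line))
        return "miss issued"

    def dram_reply(self) -> None:
        """The oldest outstanding miss completes and frees its MSHR."""
        if self.outstanding:
            self.outstanding.pop(0)


if __name__ == "__main__":
    llc = SharedLLC(global_mshrs=4)
    for i in range(4):  # co-runners on cores 2-4 stream misses
        llc.access(2 + i % 3, i, hit=False)
    print(llc.access(1, 100, hit=True))  # subject's hit waits on others' misses
    llc.dram_reply()
    print(llc.access(1, 100, hit=True))
```

The subject's data is in its own partition, yet its first access is reported as blocked until a co-runner's miss completes: page coloring kept the line resident but could not keep an MSHR free.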


MSHR Contention as a Source of Interference
● The subject's performance is measured alone and with co-runners
● Page coloring prevents unwanted cache-line evictions
● If page coloring alone were sufficient for isolation, the co-runners would not affect the subject's performance

(Figure: Core 1 runs the subject and Cores 2-4 run co-runners; all cores share a partitioned LLC backed by DRAM.)


Testing Platforms


Results

● LLC: all memory accesses are LLC hits
● DRAM: all memory accesses are LLC misses
● Latency: has data dependencies, so it generates only one outstanding request at a time
● BwRead: has no data dependencies, so it can generate multiple outstanding requests at a time
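The difference between the Latency and BwRead patterns is a difference in dependency structure, which this Python sketch illustrates. Python cannot reproduce the hardware timing, so only the access pattern is shown; the helper names, the use of Sattolo's algorithm to build a single-cycle pointer chain, and the stride of 8 elements (standing in for one 64-byte line of 8-byte words) are illustrative assumptions, not the benchmarks' actual source.

```python
# Access-pattern sketch only (hypothetical helpers, not the real benchmarks).
import random


def make_chain(n: int, seed: int = 0) -> list:
    """Sattolo's algorithm: a random permutation forming one cycle of length n."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def latency_walk(nxt: list, steps: int) -> int:
    """Latency-style: each load's address comes from the previous load."""
    i = 0
    for _ in range(steps):
        i = nxt[i]  # next address is unknown until this load completes
    return i


def bwread_sum(data: list, stride: int) -> int:
    """BwRead-style: addresses are known in advance, so loads can overlap."""
    return sum(data[k] for k in range(0, len(data), stride))
```

In a compiled version of `latency_walk`, at most one miss is in flight, so extra MSHRs do not help it; in `bwread_sum`, the hardware can issue many independent misses at once, which is exactly what consumes MSHRs.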


Results
● The number of global MSHRs (in the shared LLC) relative to local MSHRs (in each core's private cache) significantly affects the amount of contention between the cores
● MSHR setting notation: (local MSHRs / global MSHRs)
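A back-of-the-envelope check shows why the local/global ratio matters. Under a simplified model that is our assumption, not a result from the paper (each co-runner keeps at most its local MSHR count of misses in flight, and prefetches and writebacks are ignored), co-runners can exhaust the global pool whenever their combined local MSHRs reach the global count. The settings below are hypothetical.

```python
# Hypothetical check, not a result from the paper: can the co-runners alone
# occupy every global (shared LLC) MSHR? Ignores prefetching and writebacks.


def corunners_can_saturate(cores: int, local_mshrs: int, global_mshrs: int) -> bool:
    """Each co-runner can keep at most `local_mshrs` misses in flight."""
    corunners = cores - 1
    return corunners * local_mshrs >= global_mshrs


def describe(local_mshrs: int, global_mshrs: int, cores: int = 4) -> str:
    setting = f"({local_mshrs}/{global_mshrs})"
    if corunners_can_saturate(cores, local_mshrs, global_mshrs):
        return f"{setting}: co-runners can fill all global MSHRs"
    return f"{setting}: co-runners cannot fill all global MSHRs"
```

With 4 cores, a (6/12) setting lets three co-runners hold 18 misses, more than the 12 global entries, while a (2/12) setting caps them at 6 and always leaves entries for the subject.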


Proposition
● Dynamically controlling the MSHRs will improve isolation between the cores
● Add "TargetCount" and "ValidCount" registers to each core's local cache MSHRs
● This allows the OS to control each core's MLP independently
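The two registers can be sketched as a throttle on MSHR allocation. This is our reading of the slide, not hardware documentation: we assume ValidCount counts the core's in-use local MSHRs and that a new miss is admitted only while ValidCount < TargetCount, a value written by the OS. The class and method names are hypothetical.

```python
# Sketch of per-core MLP throttling with TargetCount / ValidCount (hypothetical).


class ThrottledMSHRs:
    def __init__(self, hw_entries: int):
        self.hw_entries = hw_entries
        self.target_count = hw_entries  # OS-programmable limit (default: all entries)
        self.valid_count = 0            # entries currently holding a pending miss

    def set_target(self, n: int) -> None:
        """OS writes TargetCount; clamp to [0, physical entries]. 0 fully throttles."""
        self.target_count = max(0, min(n, self.hw_entries))

    def try_allocate(self) -> bool:
        """Admit a new miss only while ValidCount < TargetCount."""
        if self.valid_count >= self.target_count:
            return False  # the core stalls as if its MSHRs were full
        self.valid_count += 1
        return True

    def release(self) -> None:
        """A pending miss completed: its entry is no longer valid."""
        if self.valid_count > 0:
            self.valid_count -= 1
```

Lowering TargetCount below the current ValidCount does not cancel in-flight misses; it only stops new ones until enough entries drain, so the change takes effect gradually.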


Implementation

● Utilized the gem5 cycle-accurate simulator
● MSHR reservations are updated upon each context switch in a core:
  ○ If the next task is a real-time task, configure the core's TargetCount register to reserve the appropriate number of MSHR slots for the task
  ○ Any remaining (unreserved) MSHR slots are distributed across the cores for best-effort tasks
  ○ If no currently running task requires an MSHR reservation, each core's TargetCount is reset to the maximum
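The context-switch policy can be sketched as a function that recomputes every core's TargetCount. This is a hypothetical sketch, not the authors' kernel code: `compute_targets` is an illustrative name, and splitting the spare slots evenly among cores that run best-effort work (capped at the local maximum) is our assumption about how "distributed across the cores" works.

```python
# Hypothetical OS hook run on each context switch (sketch, not the authors' code).


def compute_targets(running: dict, global_mshrs: int, local_max: int) -> dict:
    """running maps core id -> MSHR reservation (int) of its real-time task, or None."""
    reserved = {c: min(r, local_max) for c, r in running.items() if r is not None}
    if not reserved:
        # No reservations anywhere: every core may use all of its local MSHRs.
        return {c: local_max for c in running}

    targets = dict(reserved)
    best_effort = sorted(c for c in running if c not in reserved)
    spare = max(0, global_mshrs - sum(reserved.values()))
    if best_effort:
        share, extra = divmod(spare, len(best_effort))
        for i, c in enumerate(best_effort):
            # Hand out the remainder one slot at a time; cap at the hardware limit.
            # A best-effort core may get 0, i.e. be fully throttled.
            targets[c] = min(local_max, share + (1 if i < extra else 0))
    return targets


if __name__ == "__main__":
    # Core 0 runs a real-time task reserving 4 of 12 global MSHRs.
    print(compute_targets({0: 4, 1: None, 2: None, 3: None}, global_mshrs=12, local_max=6))
```

Because the targets never sum to more than the global MSHR count while a reservation is active, the best-effort cores cannot fill the pool, so the real-time task's reserved slots stay available.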


Evaluation
● BwWrite (DRAM) runs on each core as a "best-effort" task
● Periodic EEMBC benchmarks with computation times of ~8 ms serve as the "real-time" tasks, with periods:
  ○ Core 1: 20 ms
  ○ Core 2: 30 ms
  ○ Core 3: 40 ms
  ○ Core 4: 60 ms
● Real-time tasks improve by up to 20% due to reduced MSHR contention
● Best-effort tasks suffer a 3% throughput reduction
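For a sense of how busy each core is with its real-time task, the per-core utilization U = C / T follows directly from the setup above. This sketch uses the slide's approximate ~8 ms computation time, so the figures are rough; `utilization` is an illustrative helper name.

```python
# Per-core utilization U = C / T of the real-time tasks (C ~ 8 ms, approximate).


def utilization(compute_ms: float, period_ms: float) -> float:
    return compute_ms / period_ms


periods = {1: 20, 2: 30, 3: 40, 4: 60}
for core, period in periods.items():
    print(f"Core {core}: U = {utilization(8, period):.2f}")
# Prints 0.40, 0.27, 0.20 and 0.13 for Cores 1-4.
```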


Questions?
● What are some real-time systems that could benefit from this architecture?
● Why don't current multicore processors allow control over MSHR allocation?
● What source of contention remains?
● Why must the system implement page coloring before these experiments can be performed accurately?