don't shoot down tlb shootdowns! - eurosys 2020 · 2020. 4. 29. · don't shoot down tlb...

Don't shoot down TLB shootdowns!

Nadav Amit, Amy Tai, Michael Wei

April 2020

Virtual Address

Translation Lookaside Buffer (TLB)

TLB = cache for virtual to physical address translations

[Figure: page-table walk (PGD, PUD, PMD) and the TLB caching the resulting VA→PA translation]
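As a rough illustration of the walk that a TLB hit avoids, the sketch below splits a virtual address into per-level page-table indices. It assumes x86-64 4-level paging with 4 KiB pages (9 index bits per level, 12 offset bits); `struct va_parts` and `split_va` are hypothetical names, not taken from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 4-level paging with 4 KiB pages. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((UINT64_C(1) << LEVEL_BITS) - 1)

/* Four levels of 9 bits plus a 12-bit offset cover a 48-bit address. */
static_assert(PAGE_SHIFT + 4 * LEVEL_BITS == 48, "expected 48-bit virtual addresses");

/* Hypothetical helper type: one index per page-table level. */
struct va_parts {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Split a virtual address into the indices used at each level of the walk. */
struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.offset = (unsigned)(va & ((UINT64_C(1) << PAGE_SHIFT) - 1));
    p.pte = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
    p.pmd = (unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK);
    p.pud = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
    p.pgd = (unsigned)((va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK);
    return p;
}
```

Each index selects one entry in one table, so a TLB miss costs up to four dependent memory reads; a TLB hit skips all of them.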

TLB Coherency

Hardware does not keep TLBs coherent

The problem is left for software (OS)

[Figure: the PTEs and the TLBs hold different translations (VA→PA, VA→PA′, VA→PA″), so the TLBs are incoherent]
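The incoherence can be reproduced in a toy model. The sketch below is not kernel code: `toy_mm`, `toy_tlb`, `toy_translate` and `toy_invalidate` are hypothetical names, and the "TLB" is a plain array. It only shows that a cached translation survives a page-table update until software invalidates it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES 4 /* hypothetical tiny address space */

/* Toy page table: virtual page number -> physical frame number. */
struct toy_mm {
    uint64_t pte[TOY_PAGES];
};

/* Toy per-CPU TLB: caches translations, never snoops page-table writes. */
struct toy_tlb {
    uint64_t pfn[TOY_PAGES];
    bool valid[TOY_PAGES];
};

/* Return the cached translation on a hit; walk and cache on a miss. */
uint64_t toy_translate(struct toy_tlb *tlb, const struct toy_mm *mm, unsigned vpn)
{
    assert(vpn < TOY_PAGES);
    if (!tlb->valid[vpn]) {
        tlb->pfn[vpn] = mm->pte[vpn];
        tlb->valid[vpn] = true;
    }
    return tlb->pfn[vpn];
}

/* Software-driven invalidation of a single entry. */
void toy_invalidate(struct toy_tlb *tlb, unsigned vpn)
{
    assert(vpn < TOY_PAGES);
    tlb->valid[vpn] = false;
}
```

After a write to `mm->pte[vpn]`, `toy_translate` keeps returning the old frame until `toy_invalidate` runs on that TLB. With one such TLB per core, every core that may hold the entry has to be told to invalidate it, which is what a shootdown does.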

TLB Shootdown (in Linux)

[Figure: timeline of the initiator and the responder]

Challenge

TLB shootdowns are expensive.

How can we further optimize them?

This work focuses on:
• Linux/x86 – common lessons
• Userspace mappings – common case

Lessons are relevant to other environments

Existing Solutions

Hardware-based TLB invalidations
• Not available on all architectures
• Do not coexist (yet) with software techniques:
– No selective target cores for TLB invalidation

Software solutions
• Replicating page-tables [RadixVM, Clements’13]
– Can increase overhead with low-latency IPIs
• Aggressive batching [LATR, Kumar’18]
– Breaks POSIX semantics

TLB Flushes in Linux and FreeBSD

[Figure: timeline of the initiator and the responder; the initiator busy-waits]

Optimization 1: Concurrent Flushes (forgotten lesson)

[Figure: timeline of the initiator and the responder]

RP3 TLB consistency algorithm [Rosenburg’89]
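A back-of-the-envelope model of why overlapping the flushes helps is sketched below. It assumes the baseline initiator flushes its own TLB first and only then sends the IPI and busy-waits, while the concurrent variant sends the IPI first and flushes locally while the IPI is in flight. The struct, its fields and the function names are hypothetical, and the costs are arbitrary time units, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative costs in arbitrary time units; NOT measured values. */
struct shootdown_cost {
    unsigned local_flush;  /* initiator flushes its own TLB */
    unsigned ipi_delivery; /* IPI reaches the responder */
    unsigned remote_flush; /* responder flushes its TLB */
    unsigned ack;          /* acknowledgment becomes visible to the initiator */
};

static unsigned max_u(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Serial order: local flush, then IPI, remote flush and ack, one after another. */
unsigned serial_latency(const struct shootdown_cost *c)
{
    assert(c != NULL);
    return c->local_flush + c->ipi_delivery + c->remote_flush + c->ack;
}

/* Concurrent order: the local flush overlaps with the remote path. */
unsigned concurrent_latency(const struct shootdown_cost *c)
{
    assert(c != NULL);
    unsigned remote_path = c->ipi_delivery + c->remote_flush + c->ack;
    return max_u(c->local_flush, remote_path);
}
```

In this model the concurrent order hides the local flush entirely whenever it is shorter than the remote path, so the initiator's latency drops by `local_flush` units.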

TLB Shootdown Responder

Page Table Isolation

Optimization 2: Cacheline Consolidation

[Figure: memory layout of the SMP info and the TLB flush info]

Optimization 3: Early Acknowledgment

Safe: flush will happen
Better: Initiator is faster
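The difference between late and early acknowledgment is only the order of two steps in the responder's IPI handler. The toy sketch below records that order; `toy_responder` and the handler names are hypothetical, and it assumes the handler always completes the flush before the responder can run userspace code again, which is what makes acknowledging first safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy responder state (hypothetical, not a kernel structure). */
struct toy_responder {
    bool acked;              /* initiator may stop busy-waiting once set */
    bool flushed;            /* responder's stale TLB entries are gone */
    bool acked_before_flush; /* records which step came first */
};

static void toy_ack(struct toy_responder *r)
{
    r->acked_before_flush = !r->flushed;
    r->acked = true;
}

static void toy_flush(struct toy_responder *r)
{
    r->flushed = true;
}

/* Late acknowledgment: the initiator waits for the whole remote flush. */
void handle_ipi_late_ack(struct toy_responder *r)
{
    assert(r != NULL);
    toy_flush(r);
    toy_ack(r);
}

/* Early acknowledgment: release the initiator first, then flush. */
void handle_ipi_early_ack(struct toy_responder *r)
{
    assert(r != NULL);
    toy_ack(r);
    toy_flush(r);
}
```

Both handlers end with the TLB flushed; the early variant only shortens the time the initiator spends waiting.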

Optimization 4: In-Context Flushes

1. Efficient
2. Better batching

In the Paper

Userspace-safe batching
• Deferring TLB shootdowns while the kernel runs

Avoiding TLB flushes on Copy-on-Write
• Special case we can optimize

TLB flushes in virtualization
• The effect of page size mismatch

Many important and subtle details

Evaluation: Unmapping and Flushing 10 PTEs

madvise(MADV_DONTNEED)

[Figure: two panels, Initiator and Responder; groups: same core, same socket, different socket; series: base, concurrent, cacheline, early-ack, in-context]
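The kind of call this microbenchmark exercises can be sketched from userspace as follows. This is a minimal, Linux-specific sketch and not the authors' benchmark: `zap_ten_pages` is a hypothetical name, and it performs no timing and pins no threads. It maps 10 pages, touches them so that PTEs exist, and drops them with `madvise(MADV_DONTNEED)`.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 10 /* matches the 10 PTEs in the slide title */

/* Map, touch and zap NPAGES pages. Returns 0 on success, -1 on failure. */
int zap_ten_pages(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page * NPAGES;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Touch every page so the kernel installs a PTE for each one. */
    memset(buf, 0xAB, len);

    /* Drop the pages; the kernel must remove the PTEs and flush stale TLB entries. */
    if (madvise(buf, len, MADV_DONTNEED) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* On Linux, private anonymous pages read back as zero after MADV_DONTNEED. */
    int ok = 1;
    for (size_t i = 0; i < NPAGES; i++) {
        if (buf[i * (size_t)page] != 0)
            ok = 0;
    }

    munmap(buf, len);
    return ok ? 0 : -1;
}
```

In a multithreaded process whose other threads run on different cores, the `madvise` call is the point at which the calling core acts as the shootdown initiator.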

Evaluation: SysBench – Random Writes

[Figure: x-axis: threads [#], 0 to 25; series: base, concurrent, cacheline-consol, early-ack, in-context flushes, userspace-safe batching]

Random writes

Periodic flushes

Memory-mapped file

Emulated persistent memory, no write-cache

Conclusions

TLB shootdown can be improved

Doing it well in software → better hardware interfaces

We are working to push these enhancements to Linux
