Redundant Memory Mappings for Fast Access to Large Memories

Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Ünsal

Post on 19-Jan-2016


Executive Summary

• Problem: Virtual memory overheads are high (up to 41%)
• Proposal: Redundant Memory Mappings
  – Propose compact representation called range translation
  – Range Translation – arbitrarily large contiguous mapping
  – Effectively cache, manage and facilitate range translations
  – Retain flexibility of 4KB paging
• Result:
  – Reduces overheads of virtual memory to less than 1%


Outline

Motivation
  Virtual Memory Refresher + Key Technology Trends
  Previous Approaches
  Goals + Key Observation
Design: Redundant Memory Mappings
Results
Conclusion


Virtual Memory Refresher

[Figure: the Virtual Address Spaces of Process 1 and Process 2 map to Physical Memory through the Page Table, with translations cached in the TLB (Translation Lookaside Buffer)]

Challenge: How to reduce costly page walks?


Two Technology Trends

[Figure: Memory capacity for $10,000* by year, 1980 to 2015; log-scale memory size axis spanning MB, GB and TB]

*Inflation-adjusted 2011 USD, from: jcmit.com

TLB reach is limited

Year  Processor  L1 DTLB entries
1999  Pent. III  72
2001  Pent. 4    64
2008  Nehalem    96
2012  IvyBridge  100
2015  Broadwell  100
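
TLB reach is the amount of memory the TLB can map at once: the number of entries times the page size. A minimal sketch of that arithmetic for the entry counts in the table, assuming every entry maps one 4KB page (the name `tlb_reach_kb` is illustrative, not from the talk):

```python
PAGE_SIZE_KB = 4  # assumption: every L1 DTLB entry maps one 4KB page

# Entry counts copied from the table on this slide.
L1_DTLB_ENTRIES = {
    "1999 Pent. III": 72,
    "2001 Pent. 4": 64,
    "2008 Nehalem": 96,
    "2012 IvyBridge": 100,
    "2015 Broadwell": 100,
}


def tlb_reach_kb(entries: int, page_size_kb: int = PAGE_SIZE_KB) -> int:
    """Memory covered by a TLB when all of its entries are in use."""
    return entries * page_size_kb


for name, entries in L1_DTLB_ENTRIES.items():
    print(f"{name}: {entries} entries -> {tlb_reach_kb(entries)} KB reach")
```

Under this assumption the reach stays in the hundreds of kilobytes for all five processors, while the memory capacity plot spans MB to TB over the same period.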


0. Page-based Translation

[Figure: Virtual Memory pages mapped to Physical Memory; TLB entry: VPN0 -> PFN0]


1. Multipage Mapping

[Figure: Virtual Memory pages mapped to Physical Memory; Sub-blocked TLB/CoLT and Clustered TLB entries: VPN(0-3), PFN(0-3), Bitmap, Map]

[ASPLOS’94, MICRO’12 and HPCA’14]


2. Large Pages

[Figure: Virtual Memory mapped to Physical Memory with large pages; Large Page TLB entry: VPN0 -> PFN0]

[Transparent Huge Pages and libhugetlbfs]


3. Direct Segments

[Figure: a Direct Segment of Virtual Memory, bounded by BASE and LIMIT, mapped to Physical Memory at OFFSET; segment registers: (BASE, LIMIT), OFFSET]

[ISCA’13 and MICRO’14]


Can we get best of many worlds?

Approaches: Multipage Mapping, Large Pages, Direct Segments, Our Proposal

Compared on:
• Flexible alignment
• Arbitrary reach
• Multiple entries
• Transparent to applications
• Applicable to all workloads


Key Observation

[Figure: Virtual Memory regions (Code, Heap, Stack, Shared Lib.) and their mappings in Physical Memory]

1. Large contiguous regions of virtual memory
2. Limited in number: only a handful


Compact Representation: Range Translation

[Figure: Range Translation 1 maps the virtual range from BASE1 to LIMIT1 to Physical Memory at OFFSET1]

Range Translation: a mapping from contiguous virtual pages to contiguous physical pages with uniform protection
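
To make the definition concrete, the sketch below coalesces page-level mappings into range translations: consecutive virtual pages join the same range only when their physical pages are also consecutive and their protection matches. The names `RangeTranslation` and `coalesce`, and the use of page numbers for BASE, LIMIT and OFFSET, are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTranslation:
    base: int    # first virtual page number in the range (inclusive)
    limit: int   # one past the last virtual page number (exclusive)
    offset: int  # physical page number minus virtual page number
    prot: str    # protection shared by every page in the range


def coalesce(page_map: dict[int, tuple[int, str]]) -> list[RangeTranslation]:
    """Group {vpn: (pfn, prot)} mappings into maximal range translations."""
    ranges: list[RangeTranslation] = []
    for vpn in sorted(page_map):
        pfn, prot = page_map[vpn]
        last = ranges[-1] if ranges else None
        if (last is not None and last.limit == vpn
                and last.offset == pfn - vpn and last.prot == prot):
            # Virtually and physically contiguous with the same protection:
            # extend the current range by one page.
            ranges[-1] = RangeTranslation(last.base, vpn + 1, last.offset, prot)
        else:
            ranges.append(RangeTranslation(vpn, vpn + 1, pfn - vpn, prot))
    return ranges


# Pages 10-12 are contiguous in both spaces; page 13 breaks physical contiguity.
pages = {10: (100, "rw"), 11: (101, "rw"), 12: (102, "rw"), 13: (500, "rw")}
for r in coalesce(pages):
    print(r)
```

Four page mappings collapse into two range translations here; a long contiguous region would collapse into one entry regardless of its size.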


Redundant Memory Mappings

[Figure: Range Translations 1 to 5 map regions of Virtual Memory to Physical Memory]

Map most of a process’s virtual address space redundantly, with a modest number of range translations in addition to page mappings


Outline

Motivation
Design: Redundant Memory Mappings
  A. Caching Range Translations
  B. Managing Range Translations
  C. Facilitating Range Translations
Results
Conclusion


A. Caching Range Translations

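[Figure: translation of virtual address bits V47 … V12 to physical address bits P47 … P12; the L1 DTLB is backed by the L2 DTLB and a Range TLB, and the Page Table Walker becomes an Enhanced Page Table Walker]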

[Figure: Hit in the L1 DTLB]

[Figure: Miss in the L1 DTLB; Hit in the L2 DTLB; Refill]


[Figure: Miss in the L1 DTLB; Hit in the Range TLB or L2 DTLB; Refill]

Range TLB:
Entry 1: BASE 1 ≤ virtual address, LIMIT 1 > virtual address; OFFSET 1, Protection 1
…
Entry N: BASE N ≤ virtual address, LIMIT N > virtual address; OFFSET N, Protection N

L1 TLB Entry Generator Logic: (Virtual Address + OFFSET), Protection
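
The comparison and refill logic on this slide can be sketched as follows. Each entry is checked with BASE ≤ virtual address < LIMIT, and on a hit a page-sized L1 TLB entry is generated from (Virtual Address + OFFSET) and the protection bits. The class and function names, the byte-granular addresses with a page-aligned offset, and the linear scan (standing in for a parallel hardware compare) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # 4KB pages: address bits 47..12 form the page number


@dataclass(frozen=True)
class RangeEntry:
    base: int    # first virtual address of the range (inclusive)
    limit: int   # end of the range (exclusive)
    offset: int  # physical address minus virtual address (page-aligned)
    prot: str


@dataclass(frozen=True)
class L1Entry:
    vpn: int
    pfn: int
    prot: str


def range_tlb_lookup(entries: list[RangeEntry], vaddr: int) -> Optional[L1Entry]:
    """Return a generated L1 TLB entry on a Range TLB hit, else None."""
    for e in entries:  # hardware would compare all entries in parallel
        if e.base <= vaddr < e.limit:
            paddr = vaddr + e.offset
            return L1Entry(vaddr >> PAGE_SHIFT, paddr >> PAGE_SHIFT, e.prot)
    return None


range_tlb = [RangeEntry(base=0x10000, limit=0x50000, offset=0x7F0000, prot="rw")]
print(range_tlb_lookup(range_tlb, 0x12345))  # inside the range: entry generated
print(range_tlb_lookup(range_tlb, 0x60000))  # outside every range: None
```

A single range entry can refill the L1 DTLB for any page inside the range, which is why a small Range TLB can cover a large region.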

[Figure: Miss in the L1 DTLB; Miss in both the L2 DTLB and the Range TLB; Enhanced Page Table Walker]


B. Managing Range Translations

Range Table
• Stores all the range translations in an OS-managed structure
• Per-process, like the page table

[Figure: Range Table with nodes RTA to RTG, rooted at CR-RT]

On an L2 + Range TLB miss, which structure should be walked?

A) Page Table
B) Range Table
C) Both A) and B)
D) Either?

Is a virtual page part of a range? – Not known at a miss

Redundancy to the rescue

One bit in the page table entry denotes that the page is part of a range

1. Page Table Walk
2. Insert into L1 TLB
3. Application resumes memory access

If the page is part of a range:
• Range Table Walk (Background)
• Insert into Range TLB

[Figure: page table rooted at CR-3; Range Table (RTA to RTG) rooted at CR-RT]
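
A minimal sketch of this miss path, assuming hypothetical structures: the page table is a dict of PTEs carrying the range bit, and the range table is a sorted list searched with `bisect` (the slide draws it as a tree rooted at CR-RT). The background walk is modeled as deferred work on a queue rather than real concurrency.

```python
import bisect
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PTE:
    pfn: int
    in_range: bool  # the one bit that marks the page as part of a range


@dataclass(frozen=True)
class Range:
    base: int    # first VPN (inclusive)
    limit: int   # last VPN + 1 (exclusive)
    offset: int  # PFN minus VPN


class MMU:
    def __init__(self, page_table: dict[int, PTE], range_table: list[Range]):
        self.page_table = page_table
        self.range_table = sorted(range_table, key=lambda r: r.base)
        self.bases = [r.base for r in self.range_table]
        self.l1_tlb: dict[int, int] = {}
        self.range_tlb: list[Range] = []
        self.background: deque = deque()  # pending range table walks

    def handle_miss(self, vpn: int) -> int:
        pte = self.page_table[vpn]       # 1. page table walk
        self.l1_tlb[vpn] = pte.pfn       # 2. insert into L1 TLB
        if pte.in_range:                 # range bit set: schedule the walk
            self.background.append(vpn)
        return pte.pfn                   # 3. application resumes

    def run_background(self) -> None:
        while self.background:
            vpn = self.background.popleft()
            i = bisect.bisect_right(self.bases, vpn) - 1  # range table walk
            if i >= 0 and vpn < self.range_table[i].limit:
                r = self.range_table[i]
                if r not in self.range_tlb:
                    self.range_tlb.append(r)  # insert into Range TLB


mmu = MMU({7: PTE(107, True), 900: PTE(3, False)}, [Range(0, 64, 100)])
print(mmu.handle_miss(7), mmu.handle_miss(900))
mmu.run_background()
print(mmu.range_tlb)
```

Because the page table already holds a valid translation for every page, the access completes after the page walk alone; the range table walk only affects later misses.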


C. Facilitating Range Translations

Demand Paging
• Does not facilitate physical page contiguity for range creation

[Figure: Virtual Memory to Physical Memory mapping under demand paging]

Eager Paging
• Allocate physical pages when virtual memory is allocated
• Increases range sizes
• Reduces number of ranges

[Figure: Virtual Memory to Physical Memory mapping under eager paging]
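
The effect on contiguity can be illustrated with a toy model. It assumes a bump allocator that hands out the next free physical frame (a stand-in, not the Linux allocator) and two regions whose pages are first touched in interleaved order; `count_ranges` counts maximal runs where virtual and physical pages advance together. All names are hypothetical.

```python
from itertools import count


def count_ranges(mapping: dict[int, int]) -> int:
    """Number of maximal runs where VPN and PFN both advance by one."""
    runs, prev = 0, None
    for vpn in sorted(mapping):
        if prev is None or vpn != prev + 1 or mapping[vpn] != mapping[prev] + 1:
            runs += 1
        prev = vpn
    return runs


region_a = range(0, 4)      # virtual pages of the first allocation
region_b = range(100, 104)  # virtual pages of the second allocation

# Demand paging: a frame is assigned only when a page is first touched.
# The touches interleave the two regions, so their frames interleave too.
frames = count()
demand: dict[int, int] = {}
for a, b in zip(region_a, region_b):
    demand[a] = next(frames)
    demand[b] = next(frames)

# Eager paging: all frames of a region are assigned at allocation time.
frames = count()
eager: dict[int, int] = {}
for region in (region_a, region_b):
    for vpn in region:
        eager[vpn] = next(frames)

print("demand paging ranges:", count_ranges(demand))
print("eager paging ranges:", count_ranges(eager))
```

In this toy setup demand paging leaves every page in its own range, while eager paging yields one range per allocation.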


Outline

Motivation
Design: Redundant Memory Mappings
Results
  Methodology
  Performance Results
  Virtual Contiguity
Conclusion


Methodology

• Measure cost of page walks on real hardware
  – Intel 12-core Sandy Bridge with 96GB memory
  – 64-entry L1 TLB + 512-entry L2 TLB, 4-way associative, for 4KB pages
  – 32-entry L1 TLB, 4-way associative, for 2MB pages
• Prototype Eager Paging and Emulator in Linux v3.15.5
  – BadgerTrap for online analysis of TLB misses and to emulate the Range TLB
• Linear model to predict performance
• Workloads
  – Big-memory workloads, SPEC 2006, BioBench, PARSEC


Comparisons

• 4KB: Baseline using 4KB paging

• THP: Transparent Huge Pages using 2MB paging [Transparent Huge Pages]

• CTLB: Clustered TLB with cluster of 8 4KB entries [HPCA’14]

• DS: Direct Segments [ISCA’13 and MICRO’14]

• RMM: Our proposal: Redundant Memory Mappings [ISCA’15]


Performance Results

[Figure: bar chart of Execution Time Overhead (0% to 45%) for 4KB, CTLB, THP, DS and RMM on cactusADM, canneal, graph500, mcf and tigr]

Legend: Measured using performance counters; Modeled based on emulator

5/14 workloads; rest in paper

Assumptions:
• CTLB: 512-entry fully-associative
• RMM: 32-entry fully-associative
• Both in parallel with L2

Overheads of using 4KB pages are very high

Clustered TLB works well, but limited by 8x reach

2MB pages help with 512x reach, but overheads are still not very low

[Figure: Execution Time Overhead chart with bar labels 0.00%, 0.00%, 0.06%]

Direct Segment perfect for some but not all workloads

[Figure: Execution Time Overhead chart with bar labels 0.00%, 0.25%, 0.40%, 0.00%, 0.14%, 0.06%, 0.26%, 1.06%]

RMM achieves low overheads robustly across all workloads


Why low overheads? Virtual Contiguity

            Paging            Ideal RMM ranges
Benchmark   4KB + 2MB THP     # of ranges   # of ranges to cover more than 99% of memory
cactusADM   1365 + 333        112           49
canneal     10016 + 359       77            4
graph500    8983 + 35725      86            3
mcf         1737 + 839        55            1
tigr        28299 + 235       16            3

• 1000s of TLB entries required
• Only 10s-100s of ranges per application
• Only a few ranges for 99% coverage
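
The last column can be computed greedily: sort an application's ranges by size and count how many of the largest ones are needed to exceed 99% of the mapped memory. The range sizes below are made-up values for illustration, not measurements from the talk, and `ranges_for_coverage` is a hypothetical helper.

```python
def ranges_for_coverage(sizes_in_pages: list[int], target: float = 0.99) -> int:
    """Fewest ranges (largest first) whose pages exceed `target` of the total."""
    total = sum(sizes_in_pages)
    covered = 0
    for n, size in enumerate(sorted(sizes_in_pages, reverse=True), start=1):
        covered += size
        if covered > target * total:
            return n
    return len(sizes_in_pages)


# Hypothetical process: three big ranges plus many small ones.
sizes = [500_000, 300_000, 195_000] + [100] * 50
print(len(sizes), "ranges in total")
print(ranges_for_coverage(sizes), "ranges cover more than 99% of memory")
```

A skewed size distribution like this one is what lets a small Range TLB cover almost all of a process's memory even when the total number of ranges is larger.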


Summary

• Problem: Virtual memory overheads are high
• Proposal: Redundant Memory Mappings
  – Propose compact representation called range translation
  – Range Translation – arbitrarily large contiguous mapping
  – Effectively cache, manage and facilitate range translations
  – Retain flexibility of 4KB paging
• Result:
  – Reduces overheads of virtual memory to less than 1%


Questions?