cstalks - the multicore midlife crisis - 30 mar

The Multicore Midlife Crisis Bogdan Marius Tudor CSTalks 30 March 2011

Page 1: CSTalks - The Multicore Midlife Crisis - 30 Mar

The Multicore Midlife Crisis

Bogdan Marius Tudor

CSTalks 30 March 2011

Outline

•  The Memory Problem
•  Do We Need All These Cores?
•  Tomorrow’s Multicore
•  Research Perspective

5/4/11 2

Remember Single Core?

[Image credit: Wikipedia]

[Figure: "My Next Processors". Y-axis: cache size [kB], 0 to 4000. X-axis: Apr-94, Apr-98, Nov-01, May-04, Jul-06, Jul-08, Mar-11. Data labels, in order of appearance: 66 MHz, 200 MHz, 1000 MHz, 2250 MHz, 1600 MHz, 2400 MHz, 2400 MHz.]

So What?

Yep, they increased the cache size. Do I care? The interesting part is why they did it.

The Memory Problem

•  Moore’s Law: the number of transistors doubles roughly every two years (often quoted as 18 months)
   – Single core: new transistors = faster speed
   – Multicore: new transistors = more cores
•  Memory speed increase does not obey Moore’s Law!

[Diagram: a Processor containing four Cores and a Cache, attached to Memory]

The Memory Problem

•  Problem: More cores compete for the same slow memory!

•  Implications:

[Diagram: pipeline stages IF, ID, X, M, W. Normal case: 5 cycles :) With an access to cache or RAM: IF, ID, then the instruction waits in the ID queue, stalled for > 100 cycles :(]
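The cost of such stalls can be estimated with a back-of-the-envelope model. The sketch below is not from the talk: it assumes an ideal pipeline that retires one instruction per cycle and charges the full penalty for every miss. The function name and the workload numbers (30% memory instructions, 2% miss rate) are illustrative assumptions; only the 100-cycle penalty echoes the slide.

```python
def effective_cpi(base_cpi, mem_fraction, miss_rate, miss_penalty_cycles):
    """Average cycles per instruction when every miss stalls the pipeline.

    base_cpi: cycles per instruction with a perfect memory system
    mem_fraction: fraction of instructions that access memory
    miss_rate: fraction of those accesses that miss the cache
    miss_penalty_cycles: stall cycles paid on each miss
    """
    stall_cycles = mem_fraction * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles


# Assumed workload: 30% memory instructions, 2% of them miss, 100-cycle penalty.
cpi = effective_cpi(base_cpi=1.0, mem_fraction=0.3, miss_rate=0.02,
                    miss_penalty_cycles=100)
print(f"effective CPI = {cpi:.2f}, slowdown = {cpi / 1.0:.2f}x")
```

Under these assumed numbers, a 2% miss rate already adds 0.6 stall cycles to every instruction on average.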

The Memory Problem

•  Problem: More cores compete for the same slow memory!

•  Solution: Increase cache size :)
   – Maintain cache hit rate
      •  Halving the miss rate requires roughly 4x the cache size
      •  Exponential increase in #transistors needed
   – Cache coherence overhead
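The "4x cache" rule of thumb is what a power-law miss-rate model predicts when the exponent is 0.5. The sketch below assumes that model; the exponent is workload dependent, and the baseline of a 10% miss rate at 256 kB is a made-up starting point, not a figure from the talk.

```python
def miss_rate(cache_kb, base_miss_rate, base_cache_kb, alpha=0.5):
    """Power-law estimate: miss rate shrinks as (size ratio) ** -alpha."""
    return base_miss_rate * (cache_kb / base_cache_kb) ** (-alpha)


def cache_growth_for(miss_reduction, alpha=0.5):
    """Factor by which the cache must grow to divide the miss rate
    by `miss_reduction`."""
    return miss_reduction ** (1.0 / alpha)


# Hypothetical baseline: 10% miss rate with a 256 kB cache.
base = miss_rate(256, base_miss_rate=0.10, base_cache_kb=256)
quadrupled = miss_rate(1024, base_miss_rate=0.10, base_cache_kb=256)
print(base, quadrupled)        # miss rate before and after 4x the cache
print(cache_growth_for(2))     # growth needed to halve the miss rate
print(cache_growth_for(4))     # growth needed to quarter it
```

With this exponent, each further halving of the miss rate costs another factor of four in cache area, which is why the approach runs out of transistors quickly.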

Increasing Cache Size

Not practical!

B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009

Other Approaches

•  Improve memory speed
   – Slow, power-hungry and error-prone
•  Better caching
•  Improve memory bandwidth
   – Latency tradeoff
•  Prefetch
   – Mixed blessings
•  Allow more in-flight requests
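Why more in-flight requests help can be seen from Little's law: sustained bandwidth is roughly the number of outstanding requests times the bytes per request, divided by the latency. The sketch below is an illustration only; the 64-byte line size, 100 ns latency and 20 GB/s target are assumed values, not measurements from the talk.

```python
def sustained_bandwidth_gb_s(in_flight_requests, line_bytes, latency_ns):
    """Little's law: throughput = requests in flight / latency.

    Bytes per nanosecond is numerically equal to GB/s (10**9 bytes per second).
    """
    return in_flight_requests * line_bytes / latency_ns


def requests_needed(target_gb_s, line_bytes, latency_ns):
    """Outstanding requests required to sustain `target_gb_s`."""
    return target_gb_s * latency_ns / line_bytes


for n in (1, 10, 32):
    print(n, sustained_bandwidth_gb_s(n, line_bytes=64, latency_ns=100))
print(requests_needed(target_gb_s=20, line_bytes=64, latency_ns=100))
```

If latency cannot be reduced, the only way to raise bandwidth in this model is to keep more requests outstanding at once.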

Do We Need All These Cores?

•  Average utilization: < 20%
•  We don’t have many parallel apps
•  We already have enough compute power
•  Until you try to encode an HD video
   – Star Trek holodecks: not there yet
•  CPU vendors still have to make a living

Tomorrow’s Multicore

[Image credit: Intel]

Tomorrow’s Multicore

•  Intel Core i3, i5, i7
   – Video is integrated into the CPU
   – Must balance sequential and parallel performance
   – Lower energy requirements than previous generations
•  Heterogeneous cores
   – Many, slow, good at floating point
   – Some general-purpose cores
   – “Combine” cores into super-cores
•  Must live with the memory problems
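The need to balance sequential and parallel performance can be illustrated with Amdahl's law. The slides do not name it; it is used here only as an illustration, and the 90% parallel fraction is an assumed value.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed application: 90% of the work parallelizes perfectly.
for cores in (2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.9, cores), 2))

# Upper bound as the core count grows without limit: 1 / serial fraction.
print(round(1.0 / (1.0 - 0.9), 2))
```

With 10% serial work, no number of cores gets past roughly 10x, so a faster sequential core can matter more than additional slow ones.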

Tomorrow’s Multicore

•  The number of cores is becoming less important
   – They can’t keep increasing them
   – i3, i5, i7: how many cores each?

Tomorrow’s Multicore

[Image credit: Wikipedia]

Tomorrow’s Multicore

•  What matters is what the system provides
   – FLOP intensive: GPU-style cores
   – I/O intensive: FAWN (CMU)
   – Memory intensive: Opteron/Xeon NUMA servers

A Research Perspective

•  Coping with heterogeneity is hard
   – Different degrees of parallelism have different sequential execution speeds
   – Many tradeoffs: Speed vs. Energy vs. Memory intensity vs. I/O intensity
•  Need models for heterogeneity
   – Understand the cost of the applications in terms of FLOPS, INTOPS, memory, I/O etc.
•  Silver lining: stick to sequential apps (?)
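One simple way to cost an application in terms of FLOPs and memory traffic is a roofline-style bound: the runtime is at least the larger of the compute time and the memory-transfer time. This is a swapped-in illustration, not the model proposed in the talk, and both machines and the workload below are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    peak_gflops: float         # hypothetical peak compute rate, GFLOP/s
    mem_bandwidth_gb_s: float  # hypothetical memory bandwidth, GB/s


def lower_bound_seconds(machine, gflop, gbytes):
    """Roofline-style bound: the slower of compute and memory traffic dominates."""
    compute_time = gflop / machine.peak_gflops
    memory_time = gbytes / machine.mem_bandwidth_gb_s
    return max(compute_time, memory_time)


def bottleneck(machine, gflop, gbytes):
    """Name the resource that sets the bound for this workload."""
    compute_time = gflop / machine.peak_gflops
    memory_time = gbytes / machine.mem_bandwidth_gb_s
    return "compute" if compute_time >= memory_time else "memory"


# Two made-up machines: FLOP-oriented cores vs. a few general-purpose cores.
gpu_style = Machine("gpu-style", peak_gflops=1000.0, mem_bandwidth_gb_s=100.0)
general = Machine("general-purpose", peak_gflops=100.0, mem_bandwidth_gb_s=25.0)

# A made-up workload: 100 GFLOP of arithmetic over 20 GB of memory traffic.
for m in (gpu_style, general):
    print(m.name, lower_bound_seconds(m, gflop=100.0, gbytes=20.0),
          bottleneck(m, gflop=100.0, gbytes=20.0))
```

In this toy setup the same workload is memory bound on one machine and compute bound on the other, which is the kind of distinction a heterogeneity model has to capture.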

A Research Perspective

•  Coping with slow memory
   – Need to improve data locality by orders of magnitude
   – Compiler support, auto-tuners etc.
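To see how much data locality matters, the sketch below counts misses in a toy direct-mapped cache while walking a row-major matrix in two different orders. The cache geometry (64 lines of 64 bytes), the 8-byte elements and the 256 x 256 matrix are assumptions chosen for illustration; real caches are associative and larger.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    lines = [None] * num_lines          # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # memory block containing this address
        index = block % num_lines       # cache line the block maps to
        if lines[index] != block:
            misses += 1
            lines[index] = block
    return misses


def traversal(rows, cols, elem_bytes=8, row_major=True):
    """Byte addresses touched when walking a row-major matrix in either order."""
    if row_major:
        return [(r * cols + c) * elem_bytes
                for r in range(rows) for c in range(cols)]
    return [(r * cols + c) * elem_bytes
            for c in range(cols) for r in range(rows)]


rows = cols = 256
good = count_misses(traversal(rows, cols, row_major=True))
bad = count_misses(traversal(rows, cols, row_major=False))
print("row-major misses:   ", good)
print("column-major misses:", bad)
```

In this toy model the row-major walk misses once per cache line (every eighth element), while the column-major walk misses on every access: the same work with eight times the memory traffic.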

•  Space-efficient data types:
   – HOT area in algo & systems
   – Bloom filters: NSDI’10: 3 papers!
   – Succinct data structures: STOC’08-STOC’10
   – Cache-oblivious algorithms
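As a concrete example of a space-efficient data type, here is a minimal Bloom filter sketch. It is not taken from any of the cited papers: the class name, the 1024-bit size, the three hash functions and the use of salted SHA-256 are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Set membership in a fixed number of bits; false positives are possible,
    false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter(num_bits=1024, num_hashes=3)
for word in ("cache", "core", "memory"):
    seen.add(word)

print("cache" in seen)    # True: added items are always reported present
print("disk" in seen)     # usually False; a false positive is possible
```

The whole set lives in 128 bytes regardless of how long the stored strings are, at the price of occasional false positives.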

A Research Perspective

•  Software-helped cache coherence
   – Or go without it :)
•  Renounce some programming patterns
   – Java initializes all objects to some value…
   – Rethink those hash tables
•  Go for approximate solutions
   – It’s better if you can provide error bounds
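A small example of an approximate solution that comes with an error bound: estimating a fraction from a random sample, with the sample size chosen from Hoeffding's inequality. This is an illustrative sketch rather than anything shown in the talk; the function names, the data set and the tolerance values are assumptions.

```python
import math
import random


def samples_needed(epsilon, delta):
    """Hoeffding bound: samples so that the estimate of a fraction is within
    +/- epsilon of the truth with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def approximate_fraction(items, predicate, epsilon, delta, rng):
    """Estimate the fraction of items satisfying `predicate` from a random
    sample (drawn with replacement) instead of scanning everything."""
    n = samples_needed(epsilon, delta)
    hits = sum(1 for _ in range(n) if predicate(rng.choice(items)))
    return hits / n, n


rng = random.Random(2011)          # fixed seed, for repeatability only
data = range(1_000_000)            # stand-in for a data set too big to scan
estimate, n = approximate_fraction(
    data, lambda x: x % 4 == 0, epsilon=0.05, delta=1e-6, rng=rng)
print(f"estimated fraction = {estimate:.3f} from {n} samples (exact: 0.25)")
```

The sample size depends only on the requested accuracy and confidence, not on the size of the data, so far fewer memory accesses are needed than for an exact scan.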

Discussion

Thank you for your attention
