The Multicore Midlife Crisis
Bogdan Marius Tudor
CSTalks 30 March 2011
Outline
• The Memory Problem
• Do We Need All These Cores?
• Tomorrow’s Multicore
• Research Perspective
5/4/11 2
Remember Single Core?
[Image credit: Wikipedia]
My Next Processors
[Figure: cache size [kB] (axis 0 to 4000) over time, with entries at Apr-94, Apr-98, Nov-01, May-04, Jul-06, Jul-08 and Mar-11. Clock-speed labels as listed: 66 MHz, 200 MHz, 1000 MHz, 2250 MHz, 1600 MHz, 2400 MHz, 2400 MHz]
So What?
Yep, they improved the cache size. Do I care? The interesting part is why they did it.
The Memory Problem
• Moore’s Law: the number of transistors doubles roughly every 18 months
  – Single core: new transistors = faster speed
  – Multicore: new transistors = more cores
• Memory speed does not keep up with Moore’s Law!
[Diagram: Processor (four cores on top of a shared cache) and Memory]
The Memory Problem
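• Problem: More cores compete for the same slow memory!
• Implications:
[Diagram: pipeline stages IF, ID, X, M, W. Without stalls :) an instruction takes 5 cycles. With an access to cache or RAM :( it takes > 100 cycles: the pipeline is stalled and the following instructions wait in IF, ID and the ID queue.]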
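The 5-cycle versus > 100-cycle contrast above can be put into numbers with the textbook effective-CPI model. This is a minimal sketch, not something shown in the talk: the function name, the 30% share of memory instructions and the 2% stall rate are illustrative assumptions; only the 100-cycle penalty is taken from the slide.

```python
def effective_cpi(base_cpi, mem_fraction, miss_rate, miss_penalty):
    """Average cycles per instruction when some memory accesses stall.

    base_cpi     -- cycles per instruction with no stalls (1.0 for an ideal pipeline)
    mem_fraction -- share of instructions that access memory (assumed value)
    miss_rate    -- share of those accesses that stall (assumed value)
    miss_penalty -- length of one stall in cycles (the slide says > 100)
    """
    return base_cpi + mem_fraction * miss_rate * miss_penalty


ideal = effective_cpi(1.0, 0.3, 0.0, 100)
stalled = effective_cpi(1.0, 0.3, 0.02, 100)
print(f"no stalls: {ideal:.2f} CPI, 2% stalls: {stalled:.2f} CPI")
```

Under these assumed numbers, stalling on only 2% of memory accesses already costs 60% more cycles per instruction.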
The Memory Problem
• Problem: More cores compete for the same slow memory!
• Solution: Increase cache size :)
  – Maintain cache hit rate
    • Halving the miss rate requires roughly 4x the cache size
    • Exponential increase in #transistors needed
  – Cache coherence overhead
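The 4x figure is what a commonly used empirical power law for cache misses gives: miss rate proportional to (cache size)^(-alpha). The sketch below assumes alpha = 0.5, a rule-of-thumb value rather than a measurement; the function names and the 10% starting miss rate are hypothetical.

```python
def miss_rate(cache_size, base_size, base_miss_rate, alpha=0.5):
    """Power-law estimate: miss rate shrinks as (size ratio) ** -alpha.

    alpha = 0.5 is an assumed rule-of-thumb value, not a measured one.
    """
    return base_miss_rate * (cache_size / base_size) ** (-alpha)


def size_factor_for(miss_reduction, alpha=0.5):
    """How many times larger the cache must be to divide misses by miss_reduction."""
    return miss_reduction ** (1.0 / alpha)


print(size_factor_for(2))            # cache growth needed to halve the miss rate
print(size_factor_for(4))            # ... and to cut it to a quarter
print(miss_rate(4096, 1024, 0.10))   # a 4x larger cache, starting from 10% misses
```

Each further halving of the miss rate multiplies the required cache size by four again, which is why the transistor budget runs out quickly.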
Increasing Cache Size
Not practical!
B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009
Other Approaches
• Improve memory speed
  – Slow, power-hungry and error-prone
• Better caching
• Improve memory bandwidth
  – Latency tradeoff
• Prefetch
  – Mixed blessings
• Allow more in-flight requests
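Why more in-flight requests help can be seen from Little's law: sustained bandwidth is at most (requests in flight x bytes per request) / latency. A rough sketch follows; the 64-byte request size, the 100 ns latency and the request counts are assumed, illustrative values, not figures from the talk.

```python
def max_bandwidth_gb_s(in_flight, line_bytes=64, latency_ns=100.0):
    """Upper bound on bandwidth from Little's law, in GB/s.

    One byte per nanosecond equals 1 GB/s, so no unit conversion is needed.
    """
    return in_flight * line_bytes / latency_ns


for n in (1, 10, 40):
    print(f"{n:2d} requests in flight -> at most {max_bandwidth_gb_s(n):.2f} GB/s")
```

With latency fixed, the only way to raise this bound is to keep more requests outstanding or to move more bytes per request.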
Do We Need All These Cores?
• Average utilization: < 20%
• We don’t have too many parallel apps
• We already have enough compute power
• Until you try to encode an HD video
  – Star Trek holodecks: not there yet
• CPU vendors still have to make a living
Tomorrow’s Multicore
[Image credit: Intel]
Tomorrow’s Multicore
• Intel Core i3, i5, i7
  – Video is integrated into the CPU
  – Must balance sequential and parallel performance
  – Lower energy requirements than previous generations
• Heterogeneous cores
  – Many slow cores that are good at floating point
  – Some general-purpose cores
  – “Combine” cores into super-cores
• Must live with the memory problem
Tomorrow’s Multicore
• The number of cores is becoming less important
  – They can’t keep increasing them
  – i3, i5, i7: how many cores each?
Tomorrow’s Multicore
[Image credit: Wikipedia]
Tomorrow’s Multicore
• What matters is what the system provides
  – FLOP intensive: GPU-style cores
  – I/O intensive: FAWN (CMU)
  – Memory intensive: Opteron/Xeon NUMA servers
A Research Perspective
• Coping with heterogeneity is hard
  – Different degrees of parallelism have different sequential execution speeds
  – Many tradeoffs: Speed vs. Energy vs. Memory intensity vs. I/O intensity
• Need models for heterogeneity
  – Understand the cost of the applications in terms of FLOPS, INTOPS, memory, I/O etc.
• Silver lining: stick to sequential apps (?)
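One very simple form such a model could take is a bottleneck estimate: describe an application by its resource demands, a machine by its peak rates, and take the slowest resource as the run time. This is an illustration under strong assumptions (all resources overlap perfectly, no sequential part), not the model proposed in the talk; every name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    flops_per_s: float      # peak floating-point rate
    intops_per_s: float     # peak integer rate
    mem_bytes_per_s: float  # memory bandwidth
    io_bytes_per_s: float   # I/O bandwidth


@dataclass
class App:
    flops: float
    intops: float
    mem_bytes: float
    io_bytes: float


def estimate(app, machine):
    """Return (seconds, bottleneck), assuming all resources overlap perfectly."""
    times = {
        "FLOPS": app.flops / machine.flops_per_s,
        "INTOPS": app.intops / machine.intops_per_s,
        "memory": app.mem_bytes / machine.mem_bytes_per_s,
        "I/O": app.io_bytes / machine.io_bytes_per_s,
    }
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck


# Hypothetical machines and workload, for illustration only.
gpu_like = Machine("gpu-like", 1e12, 1e11, 1e11, 1e8)
numa_server = Machine("numa-server", 1e11, 1e11, 4e10, 1e9)
workload = App(flops=1e12, intops=1e10, mem_bytes=2e11, io_bytes=1e8)

for m in (gpu_like, numa_server):
    seconds, limit = estimate(workload, m)
    print(f"{m.name}: about {seconds:.1f} s, limited by {limit}")
```

Even this crude model shows the same workload hitting a different bottleneck on each machine, which is the kind of tradeoff the slide refers to.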
A Research Perspective
• Coping with slow memory
  – Need to improve data locality by orders of magnitude
  – Compiler support, auto-tuners etc.
• Space-efficient data types:
  – HOT area in algo & systems
  – Bloom filters: NSDI’10: 3 papers!
  – Succinct data structures: STOC’08-STOC’10
  – Cache-oblivious algorithms
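To make the Bloom filter item concrete, here is a minimal sketch of the data structure. The class name, the 1024-bit size, the three hash functions and the use of salted SHA-256 are illustrative choices, not taken from the talk or from the papers it mentions.

```python
import hashlib


class BloomFilter:
    """Set membership in a fixed number of bits: no false negatives,
    but a tunable chance of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter()
for word in ("cache", "core", "memory"):
    bf.add(word)

print("cache" in bf)   # True: added items are always found
print(len(bf.bits))    # 128 bytes, however long the keys are
```

The filter never stores the keys themselves, so its footprint stays fixed and small enough to remain in cache; the price is that a lookup for an item that was never added can occasionally return True.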
A Research Perspective
• Software-assisted cache coherence
  – Or go without it :)
• Renounce some programming patterns
  – Java initializes all objects to some value…
  – Rethink those hash tables
• Go for approximate solutions
  – It’s better if you can provide error bounds
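As one example of an approximate solution that comes with an error bound, the sketch below estimates a mean from a random sample instead of touching every element, and uses Hoeffding's inequality for the bound. The function name, the data and the sample size are hypothetical; the bound assumes every value lies within the stated range.

```python
import math
import random


def approx_mean(values, sample_size, lo, hi, delta=0.05, seed=0):
    """Estimate the mean of values (all within [lo, hi]) from a random sample.

    Returns (estimate, error): by Hoeffding's inequality, the true mean lies
    within estimate +/- error with probability at least 1 - delta.
    """
    rng = random.Random(seed)
    sample = [rng.choice(values) for _ in range(sample_size)]
    estimate = sum(sample) / sample_size
    error = (hi - lo) * math.sqrt(math.log(2 / delta) / (2 * sample_size))
    return estimate, error


data = [i % 100 for i in range(100_000)]   # hypothetical data, values in [0, 99]
estimate, error = approx_mean(data, sample_size=10_000, lo=0, hi=99)
exact = sum(data) / len(data)
print(f"mean is about {estimate:.2f} +/- {error:.2f} (exact: {exact:.2f})")
```

The sample reads a tenth of the data here, and the error bound depends only on the sample size and the value range, not on how large the full data set is.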
Discussion
Thank you for your attention