Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)
* Work done during summer internship at Microsoft Research
[Figure: CPU cores (P) laid out on a symmetric and on an asymmetric multicore processor]
Symmetric multicore processor (SMP): identical CPU cores
Asymmetric multicore processor (AMP): different types of CPU cores
[Figure: execution timeline (T, 2T, 3T) of an application with phases A, B and C on an SMP and on an AMP, labeled "Speedup!"]
• How good are AMPs as compared to SMPs?
• Can datacenter applications save power using AMPs?
[Figure: a datacenter built from many servers (S); each server contains a processor (SMP or AMP) plus other components, and the datacenter as a whole delivers throughput $\lambda_{\text{datacenter}}$]
• Constant work
• Meet latency SLA
$P_{\text{datacenter}}^{\text{AMP}} < P_{\text{datacenter}}^{\text{SMP}}$ ?
• Energy Scaling (ES): sequential execution
• Parallel Speedup (PS): parallel execution
[Figure: a sequential application on an SMP and on an area-equivalent AMP]
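[Figure: timeline on SMP and AMP showing the response times $T_{SMP}$ and $T_{AMP}$, the execution times $t_{large}$ and $t_{small}$ on the large and small cores, and the slack relative to the latency target $T_{SLA}$]
Smaller core = less power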
[Figure: a parallel application on an SMP and on an AMP]
[Figure: the parallel application on an SMP and on an AMP, highlighting its sequential phase]
• Sequential phase: small cores are the bottleneck
• Run it on the fast core
• Speedup = higher throughput
[Figure: requests arrive at rate λ into a request queue and are served by the server at service rate µ, subject to a latency SLA]
M/M/1 Queuing Model
Average response time: $E[T] = \frac{1}{\mu - \lambda}$
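The M/M/1 expression can be evaluated directly. The sketch below is illustrative only: the function name `mm1_response_time` is hypothetical, and λ and µ are assumed to be in requests per second.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Average response time E[T] = 1 / (mu - lambda) of an M/M/1 queue."""
    if arrival_rate >= service_rate:
        # The queue is unstable (grows without bound) when lambda >= mu.
        raise ValueError("arrival rate must be below the service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical server: completes 100 requests/s while receiving 80 requests/s.
print(mm1_response_time(80.0, 100.0))
```

Response time grows sharply as λ approaches µ, which is why a latency SLA caps how heavily each server can be loaded.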
[Figure: a parallel application on an SMP and on an AMP]
Amdahl’s Law for Multicores
Parallel Speedup (PS); see the paper for Energy Scaling (ES)
[Figure: example chips with n = 16: SMP with r = 1, AMP with r = 4, and SMP with r = 4]
• Base core: area = 1; big core: area = r, performance = perf(r)
• n = chip area
• r = area of the big core
• f = fraction of computation that can be parallelized
$\mu_{AMP}(f,n,r) = \dfrac{1}{\dfrac{1-f}{\text{perf}(r)} + \dfrac{f}{n-r}}$
$\mu_{SMP}(f,n,r) = \dfrac{1}{\dfrac{1-f}{\text{perf}(r)} + \dfrac{f}{\frac{n}{r}\,\text{perf}(r)}}$
Ref: Hill and Marty, Amdahl's law in the multicore era (IEEE Computer’08)
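The two speedup models can be compared numerically. This is a minimal sketch, assuming perf(r) = √r (a common choice in Hill and Marty's paper, not stated on the slide); the function names are hypothetical. The AMP expression is implemented as written on the slide; Hill and Marty's original asymmetric formula also lets the big core help in the parallel phase, replacing n − r with perf(r) + n − r.

```python
import math


def perf(r: float) -> float:
    # Assumption: perf(r) = sqrt(r), i.e. a core built from r base-core
    # areas runs sqrt(r) times faster than a base core.
    return math.sqrt(r)


def mu_smp(f: float, n: float, r: float) -> float:
    """Speedup of a symmetric chip with n/r cores, each of area r."""
    return 1.0 / ((1.0 - f) / perf(r) + f / ((n / r) * perf(r)))


def mu_amp(f: float, n: float, r: float) -> float:
    """Speedup of an asymmetric chip: one big core of area r plus n - r base cores.

    Follows the slide's form, in which the parallel phase is credited n - r.
    """
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


# Compare the slide's example chips (n = 16) for a few parallel fractions.
for f in (0.5, 0.9, 0.99):
    print(f"f={f}: SMP(r=1)={mu_smp(f, 16, 1):.2f}  AMP(r=4)={mu_amp(f, 16, 4):.2f}")
```

Under these assumptions the AMP is ahead at f = 0.5 and f = 0.9, while the 16-core SMP overtakes it at f = 0.99, in line with the later observation that f should be high, but not too high.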
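Datacenter capacity = No. of servers × server throughput
$\lambda_{\text{datacenter}} = N_{\text{server}}^{\text{SMP}} \cdot \lambda_{\text{server}}^{\text{SMP}}$
$\lambda_{\text{datacenter}} = N_{\text{server}}^{\text{AMP}} \cdot \lambda_{\text{server}}^{\text{AMP}}$
Constant work: both datacenters serve the same $\lambda_{\text{datacenter}}$.
Peak server throughput that still meets the latency SLA, from requiring $E[T] = \frac{1}{\mu - \lambda} \le T_{SLA}$:
$\lambda_{\text{server}}^{\text{peak}} = \mu - \frac{1}{T_{SLA}}$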
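Combining the SLA bound with the capacity equations gives the number of servers each design needs for the same work. A sketch under stated assumptions: the service rate µ is taken in requests per second (for example, an Amdahl speedup multiplied by a hypothetical base-core rate), and the function names are hypothetical.

```python
def peak_server_throughput(service_rate: float, t_sla: float) -> float:
    """Largest lambda satisfying E[T] = 1 / (mu - lambda) <= T_SLA."""
    return service_rate - 1.0 / t_sla


def servers_needed(lambda_datacenter: float, service_rate: float, t_sla: float) -> float:
    """N_server = lambda_datacenter / lambda_server^peak under constant work."""
    peak = peak_server_throughput(service_rate, t_sla)
    if peak <= 0:
        # Even a nearly idle server would miss the SLA, since 1 / mu >= T_SLA.
        raise ValueError("a single server cannot meet the latency SLA")
    return lambda_datacenter / peak


# Hypothetical numbers: 10,000 req/s overall, mu = 100 req/s, T_SLA = 50 ms.
print(servers_needed(10_000.0, 100.0, 0.05))
```

The result is kept fractional to match the slide's equation; rounding up with `math.ceil` would give a whole number of servers.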
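Datacenter power (P) = No. of servers × server power
$P_{\text{datacenter}}^{\text{SMP}} = N_{\text{server}}^{\text{SMP}} \cdot P_{\text{server}}^{\text{SMP}}$
$P_{\text{datacenter}}^{\text{AMP}} = N_{\text{server}}^{\text{AMP}} \cdot P_{\text{server}}^{\text{AMP}}$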
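With the server counts fixed by constant work, the power comparison reduces to two products. A minimal sketch; the function names, server counts and per-server powers in the example are hypothetical inputs, not results from the paper.

```python
def datacenter_power(n_servers: float, server_power_w: float) -> float:
    """P_datacenter = N_server * P_server (watts)."""
    return n_servers * server_power_w


def amp_power_savings(n_smp: float, p_smp: float, n_amp: float, p_amp: float) -> float:
    """Fraction of SMP datacenter power saved by the AMP datacenter (negative = loss)."""
    return 1.0 - datacenter_power(n_amp, p_amp) / datacenter_power(n_smp, p_smp)


# Hypothetical: 125 SMP servers at 200 W versus 100 AMP servers at 220 W.
print(f"{amp_power_savings(125, 200.0, 100, 220.0):.0%}")
```

A positive result means $P_{\text{datacenter}}^{\text{AMP}} < P_{\text{datacenter}}^{\text{SMP}}$; in this made-up example the AMP design saves power even though each of its servers draws more, because fewer servers are needed.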
[Figure: server power consumption P(U) vs. CPU utilization (U), ranging from idle power to peak power]
Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007
[Figure: server load distribution ($W_{\text{load}}$): fraction of time spent at each CPU utilization (U)]
$P_{\text{server}} = \sum_{U} W_{\text{load}}(U) \cdot P_{\text{server}}(U)$
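The load-weighted server power can be computed from a utilization histogram. A sketch with two assumptions not on the slides: server power is modeled as linear between idle and peak power, and the histogram values and function names are hypothetical.

```python
def linear_server_power(u: float, p_idle: float, p_peak: float) -> float:
    # Assumption: power rises linearly from p_idle at u = 0 to p_peak at u = 1.
    return p_idle + (p_peak - p_idle) * u


def average_server_power(w_load: dict[float, float], p_idle: float, p_peak: float) -> float:
    """P_server = sum over U of W_load(U) * P_server(U)."""
    if abs(sum(w_load.values()) - 1.0) > 1e-9:
        raise ValueError("W_load fractions of time must sum to 1")
    return sum(w * linear_server_power(u, p_idle, p_peak) for u, w in w_load.items())


# Hypothetical load distribution: utilization -> fraction of time.
w_load = {0.1: 0.3, 0.3: 0.5, 0.7: 0.2}
print(average_server_power(w_load, p_idle=100.0, p_peak=200.0))
```

Substituting a measured P(U) curve only requires replacing `linear_server_power`.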
$P_{\text{datacenter}}^{\text{AMP}} < P_{\text{datacenter}}^{\text{SMP}}$ ?
Up to 52% power savings (n = 64)
[Figure: power savings of AMP over SMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0 to 1), for r = 32, 16, 8, 4]
Up to 14% power savings
[Figure: power savings of AMP over SMP (-25% to 20%) vs. fraction of area sacrificed for the small core (5% to 45%); legend: small core bias, uniform bias, large core bias; Application A, Application B, Application C]
• PS looks more promising than ES
• Can we achieve these savings in reality?
• High r (realistic r = 3)
• High (but not too high!) f
[Figure: power savings of AMP over SMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0 to 1), for r = 32, 16, 8, 4]
• Scalability: Amdahl’s law assumes unbounded scalability
• Migration overhead: the model assumes zero migration overhead
• Perfect scheduling: the model assumes an oracle scheduler
Actual savings will therefore be lower
• Potential for power savings in datacenters using AMPs
• Parallel Speedup more promising than Energy Scaling
• Practical considerations must be addressed to realize the full benefits
Future work: Extend our analysis to functional asymmetry