Reevaluating Amdahl’s Law in the Multicore Era
TRANSCRIPT
Today
● Massively parallel machines: IBM’s Roadrunner (25.2k processors), Sun Microsystems’ Ranger (15.7k processors).
● Consumer-available CPUs: up to four cores, though.
● Why? Amdahl’s law (1967)
Cost model for multicore chips
Hill and Marty introduce the following model:
BCE = Base Core Equivalent
https://www.youtube.com/watch?v=KfgWmQpzD74
A core using r BCEs has performance perf(r)
Amdahl’s law
● Assume the program can be divided into 2 fractions: parallelizable f and sequential (1-f)
● For m cores: Speedup = 1 / ((1-f) + f/m)
● If the parallelizable fraction is f = 90%, then the sequential part takes roughly 47% of total execution time with 8 processors and 64% with 16 processors.
=> Pessimistic view on the prospects of parallel computing
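The 90% example can be checked numerically. The following is a minimal sketch, assuming the standard form of Amdahl’s law, speedup = 1 / ((1-f) + f/m); the function names are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, m: int) -> float:
    """Fixed-size speedup for parallel fraction f on m cores."""
    return 1.0 / ((1.0 - f) + f / m)


def sequential_share(f: float, m: int) -> float:
    """Fraction of the parallel run time spent in the sequential part."""
    return (1.0 - f) / ((1.0 - f) + f / m)


if __name__ == "__main__":
    # Print speedup and sequential share for the 90% example.
    for m in (8, 16):
        print(m, amdahl_speedup(0.9, m), sequential_share(0.9, m))
```

However many cores are added, the speedup in this model stays below 1/(1-f), which is 10 for f = 0.9.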
Gustafson’s law
● Amdahl’s law: fixed-size speedup model.
● Gustafson’s law: fixed-time speedup model.
○ Problem size should scale up as computing capability increases.
● Assume time is fixed:
○ original problem size: w
○ scaled problem size: w’ = (1-f)*w + f*m*w
○ fixed-time speedup: w’/w = (1-f) + f*m
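The fixed-time model can be sketched in a few lines. This is only an illustration of the scaled workload w’ = (1-f)*w + f*m*w and the resulting speedup w’/w; the function names are assumptions.

```python
def scaled_workload(w: float, f: float, m: int) -> float:
    """Workload w' that m cores finish in the time one core needs for w."""
    return (1.0 - f) * w + f * m * w


def gustafson_speedup(f: float, m: int) -> float:
    """Fixed-time speedup w'/w = (1 - f) + f*m."""
    return (1.0 - f) + f * m


if __name__ == "__main__":
    # The speedup grows linearly with the number of cores m.
    for m in (8, 16, 32):
        print(m, gustafson_speedup(0.9, m))
```

Unlike the fixed-size model, the speedup here is linear in m, because only the parallel part of the workload grows.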
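Yet another model: memory-bounded speedup model (Sun and Ni)
Let w* be the scaled workload under a memory-space constraint.
The parallel workload grows as a function of memory: y = g(x), where x is the memory requirement and y is the computational requirement.
Assume that each node is a processor-memory pair with memory capacity M. Increasing the number of processors m times => memory capacity also increases m times.
w = g(M)
w* = g(m*M) = g(m * g^-1(w)) =>
Speedup = ((1-f)*w + f*g(m*g^-1(w))) / ((1-f)*w + f*g(m*g^-1(w))/m)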
Memory-bounded model example
Matrix multiplication:
computational requirement y = 2*N^3
memory requirement x = 3*N^2
=> y = g(x) = 2*(x/3)^(3/2), so g(m*x) = m^(3/2) * g(x)
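To make the matrix multiplication example concrete, the sketch below eliminates N from the two requirements to obtain g and its inverse, then evaluates w* = g(m * g^-1(w)). The function names are illustrative assumptions, not taken from the slides.

```python
def g(x: float) -> float:
    """Computation y for memory x: N = sqrt(x/3), y = 2*N^3."""
    return 2.0 * (x / 3.0) ** 1.5


def g_inverse(y: float) -> float:
    """Memory x needed for computation y: N = (y/2)^(1/3), x = 3*N^2."""
    return 3.0 * (y / 2.0) ** (2.0 / 3.0)


def memory_bounded_workload(w: float, m: int) -> float:
    """w* = g(m * g^-1(w)): the workload that fills m times the memory."""
    return g(m * g_inverse(w))


if __name__ == "__main__":
    w = g(3 * 100**2)  # workload of a 100 x 100 multiplication
    for m in (2, 4, 8):
        # The ratio w*/w should track m^(3/2).
        print(m, memory_bounded_workload(w, m) / w, m**1.5)
```

With m times the memory, the workload in this example grows by a factor of m^(3/2), which is faster than the factor m of the fixed-time model.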
So? In general, if we assume each element stored in memory is used at least once, then w* ≥ w’, and the memory-bounded speedup is greater than or equal to the fixed-time speedup.
Sun and Ni’s law
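Speedup = ((1-f) + f*ḡ(m)) / ((1-f) + f*ḡ(m)/m), where ḡ(m) is the workload scaling factor: for a power function g(x) = a*x^b, g(m*x) = m^b * g(x), so ḡ(m) = m^b
Generalization of Amdahl’s law and Gustafson’s law
● Amdahl’s law is a special case with ḡ(m) = 1
● Gustafson’s law is a special case with ḡ(m) = m
In practice, the computational workload increases faster than the memory requirement, so ḡ(m) > m and the memory-bounded speedup model gives a higher speedup than the fixed-time model.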
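The two special cases can be verified with a short sketch. It assumes the usual statement of Sun and Ni’s law, speedup = ((1-f) + f*G) / ((1-f) + f*G/m), where G is the workload scaling factor for m cores; the parameter name `g_bar` is hypothetical.

```python
from typing import Callable


def sun_ni_speedup(f: float, m: int, g_bar: Callable[[int], float]) -> float:
    """Memory-bounded speedup for parallel fraction f on m cores.

    g_bar(m) is the factor by which the parallel workload grows
    when memory grows m times (an assumed, caller-supplied function).
    """
    scale = g_bar(m)
    return ((1.0 - f) + f * scale) / ((1.0 - f) + f * scale / m)


if __name__ == "__main__":
    f, m = 0.9, 8
    print("Amdahl    :", sun_ni_speedup(f, m, lambda k: 1.0))
    print("Gustafson :", sun_ni_speedup(f, m, lambda k: float(k)))
    # m^(3/2) is the scaling factor of the matrix multiplication example.
    print("Matrix mul:", sun_ni_speedup(f, m, lambda k: k**1.5))
```

Passing the scaling factor as a function keeps the three models in a single formula, which mirrors how the law generalizes the other two.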
Performance of the speedup models under the cost model
Amdahl’s:
Gustafson’s:
Memory-bounded:
where c = perf(r) is the performance of one core built from r BCEs, and m = n/r is the number of cores on a chip with a budget of n BCEs
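As an illustration of how c and m enter the cost model, the sketch below covers only the Amdahl case. It assumes the expression is Hill and Marty’s symmetric-multicore formula, 1 / ((1-f)/c + f/(c*m)), with speedup measured against a single one-BCE core. The choice perf(r) = sqrt(r) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def perf(r: float) -> float:
    """Assumed performance of a core built from r BCEs (illustrative choice)."""
    return math.sqrt(r)


def symmetric_amdahl_speedup(f: float, n: int, r: int) -> float:
    """Amdahl speedup on a chip of n BCEs split into cores of r BCEs each."""
    c = perf(r)  # performance of one core
    m = n / r    # number of cores on the chip
    return 1.0 / ((1.0 - f) / c + f / (c * m))


if __name__ == "__main__":
    # Trade-off between many small cores and few large ones for n = 16.
    for r in (1, 2, 4, 8, 16):
        print(r, symmetric_amdahl_speedup(0.9, 16, r))
```

With r = 1 the expression reduces to plain Amdahl’s law on n cores, and with r = n it reduces to the performance of one large core.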
Memory wall and scalability
● High latency
● For memory in general, latency tends to increase with size
● For the power function g, we have the following speedup, assuming that the total work w can be divided into w = w(c) + w(p)
● Constant memory access delay is assumed, as well as independence of workload (size) and number of cores.