An Accumulative Computation Framework on MapReduce (PPL 2013)
TRANSCRIPT
Examples of the Accumulative Computation Benchmarks on MapReduce Clusters
• Programmability: the parallel accumulate programming interface simplifies many problems that involve data dependencies.
• Efficiency and Scalability: the experiments show that the framework can process large data sets in reasonable time, and that it achieves near-linear speed-up as the number of CPUs increases.
Conclusions
Line-of-sight
Hideya Iwasaki and Zhenjiang Hu. A New Parallel Skeleton for General Accumulative Computations. International Journal of Parallel Programming, 2004.
Yu Liu, Kento Emoto, Kiminori Matsuzaki, and Zhenjiang Hu. Accumulative Computation on MapReduce (submitted to Euro-Par 2013).
Computations that have data dependencies are usually hard to parallelize with MapReduce or other parallel programming models. For example, given an input list [ x1, x2, x3, x4 ] and a binary operator ⊙, compute:
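Such a computation threads a single accumulator through the whole list, so each step depends on the result of all earlier steps. A minimal sequential sketch, with an assumed illustrative ⊙ (not the poster's operator), is:

```python
from functools import reduce

def odot(acc, x):
    # Illustrative non-commutative operator: the result of each step
    # depends on the accumulated value of all earlier steps, which is
    # exactly the data dependency that blocks naive parallelization.
    return acc * 2 + x

# x1 ⊙ x2 ⊙ x3 ⊙ x4, evaluated strictly left to right:
xs = [1, 2, 3, 4]
result = reduce(odot, xs)  # ((1*2+2)*2+3)*2+4 = 26
```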
An Accumulative Computation Framework on MapReduce
Yu Liu 1,4   Kento Emoto 2   Kiminori Matsuzaki 3   Zhenjiang Hu 1,4
1 The Graduate University for Advanced Studies (SOKENDAI)  2 The University of Tokyo  3 Kochi University of Technology  4 National Institute of Informatics
Parallel Accumulative Computation MapReduce-Accumulation
Contact: Yu Liu / Hu Laboratory, Information Systems Architecture Science Research Division, National Institute of Informatics
TEL : 03-4212-2611 FAX : 03-4212-2533 Email : [email protected]
EduBaseCloud of National Institute of Informatics
Eliminate Smallers
Tag Match
Here are four accumulative computation examples:
< , <, />, <, />, <, />
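Tag Match asks whether a sequence of opening and closing tags, like the one above, is properly nested; the running nesting depth is the accumulating parameter. A sketch (the token spelling `"<"` / `"/>"` follows the example input, but is otherwise an assumption):

```python
def tag_match(tokens):
    # Scan left to right, tracking nesting depth; the running depth is
    # the accumulating parameter that creates the data dependency.
    depth = 0
    for t in tokens:
        if t == "<":
            depth += 1
        elif t == "/>":
            if depth == 0:
                return False  # a close with no matching open
            depth -= 1
    return depth == 0  # matched only if every open was closed

tag_match(["<", "/>", "<", "<", "/>", "/>"])  # balanced -> True
tag_match(["<", "<", "/>"])                   # one unclosed -> False
```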
[ 1, 3, 2, 9, 4, 6, 7, 12, 10 ]
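On one common reading of Eliminate Smallers, an element is kept only if it is larger than every element before it, so the running maximum is the accumulating parameter. A sequential sketch over the list above:

```python
def eliminate_smallers(xs):
    # Keep x only if it exceeds the maximum of all earlier elements;
    # the running maximum is the accumulating parameter.
    out, cur_max = [], float("-inf")
    for x in xs:
        if x > cur_max:
            out.append(x)
            cur_max = x
    return out

eliminate_smallers([1, 3, 2, 9, 4, 6, 7, 12, 10])  # -> [1, 3, 9, 12]
```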
A communication-efficient MapReduce algorithm
To simplify problems like the above, we propose an accumulative computation framework on MapReduce. We provide a general pattern, accumulate, to encode many parallel computations in this framework.
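Roughly, the accumulate skeleton of Iwasaki and Hu threads an accumulating parameter e through the list while combining per-element results. The recursion below is a sketch of that shape; the parameter names and the exact form are assumptions for illustration, not the framework's API:

```python
def accumulate(p, otimes, oplus, g, xs, e):
    # acc [] e     = g e
    # acc (x:xs) e = p(x, e) "oplus" acc(xs, e "otimes" x)
    # e is the accumulating parameter threaded left to right.
    if not xs:
        return g(e)
    x, rest = xs[0], xs[1:]
    return oplus(p(x, e), accumulate(p, otimes, oplus, g, rest, otimes(e, x)))

# Eliminate Smallers expressed as an instance of accumulate:
res = accumulate(
    p=lambda x, e: [x] if x > e else [],  # emit x iff it beats the running max
    otimes=max,                           # update the accumulating parameter
    oplus=lambda a, b: a + b,             # concatenate partial results
    g=lambda e: [],                       # base case: empty tail
    xs=[1, 3, 2, 9, 4, 6, 7, 12, 10],
    e=float("-inf"),
)  # -> [1, 3, 9, 12]
```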
The above definition can be rewritten in the following form:
Programs written in terms of accumulate can be automatically transformed to efficient MapReduce programs by our framework.
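The idea behind the transformation can be illustrated, in a much simplified form that is not the framework's actual algorithm, for Eliminate Smallers: each mapper summarizes its block independently of the incoming accumulator, and a cheap combine pass threads the accumulating parameter (the running maximum) across block summaries:

```python
def map_block(block):
    # Mapper: without knowing the incoming accumulator, record each
    # element with the max of the elements before it *inside* the block,
    # plus the block's overall max for later threading.
    pairs, local_max = [], float("-inf")
    for x in block:
        pairs.append((x, local_max))
        local_max = max(local_max, x)
    return pairs, local_max

def combine(blocks):
    # Combine: thread the running max across blocks; an element survives
    # iff it beats both the incoming max and its in-block prefix max.
    result, acc = [], float("-inf")
    for pairs, block_max in blocks:
        for x, before in pairs:
            if x > max(acc, before):
                result.append(x)
        acc = max(acc, block_max)
    return result

xs = [1, 3, 2, 9, 4, 6, 7, 12, 10]
blocks = [map_block(xs[0:3]), map_block(xs[3:6]), map_block(xs[6:9])]
combine(blocks)  # -> [1, 3, 9, 12], same as the sequential computation
```

The mappers run in parallel because each block summary is independent of the accumulator; only the final, lightweight combine pass is sequential.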
Here are six accumulate programs.
The input data size is about 5 × 10⁹ items for each program.