An Accumulative Computation Framework on MapReduce (PPL 2013)
TRANSCRIPT
Examples of the Accumulative Computation Benchmarks on MapReduce Clusters
• Programmability: the parallel accumulate programming interface simplifies many problems that involve data dependencies.
• Efficiency and Scalability: the experiments show that the framework can process large data sets in reasonable time, and that it achieves near-linear speed-up as the number of CPUs increases.
Conclusions
Line-of-sight
Hideya Iwasaki and Zhenjiang Hu. A New Parallel Skeleton for General Accumulative Computations. International Journal of Parallel Programming, 2004.
Yu Liu, Kento Emoto, Kiminori Matsuzaki, and Zhenjiang Hu. Accumulative Computation on MapReduce (submitted to Euro-Par 2013).
Computations that have data dependencies are usually hard to parallelize with MapReduce or other parallel programming models. For example, given an input list [ x1, x2, x3, x4 ] and a binary operator ⊙, compute:
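Such a computation threads a single accumulator through the whole list, so each step depends on the result of all earlier steps. A minimal sequential sketch, with an assumed illustrative ⊙ (not the poster's operator), is:

```python
from functools import reduce

def odot(acc, x):
    # Illustrative non-commutative operator: the result of each step
    # depends on the accumulated value of all earlier steps, which is
    # exactly the data dependency that blocks naive parallelization.
    return acc * 2 + x

# x1 ⊙ x2 ⊙ x3 ⊙ x4, evaluated strictly left to right:
xs = [1, 2, 3, 4]
result = reduce(odot, xs)  # ((1*2+2)*2+3)*2+4 = 26
```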
An Accumulative Computation Framework on MapReduce
Yu Liu 1,4   Kento Emoto 2   Kiminori Matsuzaki 3   Zhenjiang Hu 1,4
1 The Graduate University for Advanced Studies (SOKENDAI)  2 The University of Tokyo  3 Kochi University of Technology  4 National Institute of Informatics
Parallel Accumulative Computation MapReduce-Accumulation
Contact: Yu Liu / Hu Laboratory, Information Systems Architecture Science Research Division, National Institute of Informatics
TEL : 03-4212-2611 FAX : 03-4212-2533 Email : [email protected]
EduBaseCloud of National Institute of Informatics
Eliminate Smallers
Tag Match
Here are four accumulative computation examples:
< , <, />, <, />, <, />
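Tag Match asks whether a sequence of opening and closing tags, like the one above, is properly nested; the running nesting depth is the accumulating parameter. A sketch (the token spelling `"<"` / `"/>"` follows the example input, but is otherwise an assumption):

```python
def tag_match(tokens):
    # Scan left to right, tracking nesting depth; the running depth is
    # the accumulating parameter that creates the data dependency.
    depth = 0
    for t in tokens:
        if t == "<":
            depth += 1
        elif t == "/>":
            if depth == 0:
                return False  # a close with no matching open
            depth -= 1
    return depth == 0  # matched only if every open was closed

tag_match(["<", "/>", "<", "<", "/>", "/>"])  # balanced -> True
tag_match(["<", "<", "/>"])                   # one unclosed -> False
```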
[ 1, 3, 2, 9, 4, 6, 7, 12, 10 ]
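On one common reading of Eliminate Smallers, an element is kept only if it is larger than every element before it, so the running maximum is the accumulating parameter. A sequential sketch over the list above:

```python
def eliminate_smallers(xs):
    # Keep x only if it exceeds the maximum of all earlier elements;
    # the running maximum is the accumulating parameter.
    out, cur_max = [], float("-inf")
    for x in xs:
        if x > cur_max:
            out.append(x)
            cur_max = x
    return out

eliminate_smallers([1, 3, 2, 9, 4, 6, 7, 12, 10])  # -> [1, 3, 9, 12]
```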
A communication-efficient MapReduce algorithm
To simplify problems like the above, we propose an accumulative computation framework on MapReduce. We provide a general pattern, accumulate, to encode many parallel computations in this framework.
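Roughly, the accumulate skeleton of Iwasaki and Hu threads an accumulating parameter e through the list while combining per-element results. The recursion below is a sketch of that shape; the parameter names and the exact form are assumptions for illustration, not the framework's API:

```python
def accumulate(p, otimes, oplus, g, xs, e):
    # acc [] e     = g e
    # acc (x:xs) e = p(x, e) "oplus" acc(xs, e "otimes" x)
    # e is the accumulating parameter threaded left to right.
    if not xs:
        return g(e)
    x, rest = xs[0], xs[1:]
    return oplus(p(x, e), accumulate(p, otimes, oplus, g, rest, otimes(e, x)))

# Eliminate Smallers expressed as an instance of accumulate:
res = accumulate(
    p=lambda x, e: [x] if x > e else [],  # emit x iff it beats the running max
    otimes=max,                           # update the accumulating parameter
    oplus=lambda a, b: a + b,             # concatenate partial results
    g=lambda e: [],                       # base case: empty tail
    xs=[1, 3, 2, 9, 4, 6, 7, 12, 10],
    e=float("-inf"),
)  # -> [1, 3, 9, 12]
```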
The above definition can be rewritten in the following form:
Programs written in terms of accumulate can be automatically transformed to efficient MapReduce programs by our framework.
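The idea behind the transformation can be illustrated, in a much simplified form that is not the framework's actual algorithm, for Eliminate Smallers: each mapper summarizes its block independently of the incoming accumulator, and a cheap combine pass threads the accumulating parameter (the running maximum) across block summaries:

```python
def map_block(block):
    # Mapper: without knowing the incoming accumulator, record each
    # element with the max of the elements before it *inside* the block,
    # plus the block's overall max for later threading.
    pairs, local_max = [], float("-inf")
    for x in block:
        pairs.append((x, local_max))
        local_max = max(local_max, x)
    return pairs, local_max

def combine(blocks):
    # Combine: thread the running max across blocks; an element survives
    # iff it beats both the incoming max and its in-block prefix max.
    result, acc = [], float("-inf")
    for pairs, block_max in blocks:
        for x, before in pairs:
            if x > max(acc, before):
                result.append(x)
        acc = max(acc, block_max)
    return result

xs = [1, 3, 2, 9, 4, 6, 7, 12, 10]
blocks = [map_block(xs[0:3]), map_block(xs[3:6]), map_block(xs[6:9])]
combine(blocks)  # -> [1, 3, 9, 12], same as the sequential computation
```

The mappers run in parallel because each block summary is independent of the accumulator; only the final, lightweight combine pass is sequential.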
Here are six accumulate programs.
The input data size is about 5 × 10⁹ items for each program.