making sense of performance and identifying stragglers in data analytics framework

Making sense of performance and identifying stragglers in Data Analytics Framework CSCI 8780 Advanced Distributed Systems Manish Ranjan and Narita Pandhe

Upload: manish-ranjan

Post on 14-Apr-2017



Introduction

- Large-scale data analytics has become widespread

- Research devoted to improving the performance of data analytics frameworks

- BUT comparatively little effort has been spent on identifying the performance bottlenecks!


More resource efficient

Faster


Experiments


What Cluster Configuration did we use?

- 1 master, 6 slaves

- Master config: 64-bit, 8GB RAM, 2 cores, 50GB SSD

- Slave config (each): 64-bit, 2GB RAM, 1 core, 30GB SSD

Config-related modifications: e.g. replication + SSDs


First: benchmarking the NameNode

To first test the NameNode hardware and config: NNBench

What it does:

Generates a lot of HDFS-related requests

Why it does it:

To put a “HIGH” HDFS management stress on the NameNode

How it does it:

Simulates requests for creating, reading, renaming and deleting files on HDFS
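
The request mix described above (create, read, rename, delete) can be illustrated with a small Python sketch. This is not NNBench and it does not talk to HDFS or a NameNode: it runs the same four operations against a local temporary directory, just to show the shape of a metadata-heavy workload. The function names `metadata_stress` and `run_demo` and the file count are hypothetical choices made for this illustration.

```python
import os
import tempfile
import time


def metadata_stress(base_dir, num_files=200):
    """Time create/read/rename/delete over many tiny files in base_dir.

    Local-filesystem stand-in for the NNBench request mix; no HDFS involved.
    Returns a dict mapping operation name to elapsed seconds.
    """
    paths = [os.path.join(base_dir, f"file_{i}") for i in range(num_files)]
    renamed = [p + ".renamed" for p in paths]

    def create():
        for p in paths:
            with open(p, "wb") as f:
                f.write(b"x")  # one byte: the cost is metadata, not data

    def read():
        for p in paths:
            with open(p, "rb") as f:
                f.read()

    def rename():
        for src, dst in zip(paths, renamed):
            os.rename(src, dst)

    def delete():
        for p in renamed:
            os.remove(p)

    timings = {}
    for name, op in (("create", create), ("read", read),
                     ("rename", rename), ("delete", delete)):
        start = time.perf_counter()
        op()
        timings[name] = time.perf_counter() - start
    return timings


def run_demo(num_files=200):
    """Run the stress loop in a throwaway directory; report what is left."""
    with tempfile.TemporaryDirectory() as base_dir:
        timings = metadata_stress(base_dir, num_files)
        leftover = os.listdir(base_dir)
    return timings, leftover


if __name__ == "__main__":
    timings, _ = run_demo()
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f}s")
```

The files are kept tiny on purpose, so the time measured is dominated by namespace operations rather than data transfer, which is the kind of load the slide describes for the NameNode.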


What Workload did we use?

- TeraSort benchmark suite

- Goal of TeraSort: sort 1TB of data (or any other amount of data you want) as fast as possible.

- Limited by our cluster configuration, we performed several experiments with data of size 1GB, 5GB and 10GB.

- The TeraSort benchmark can be used to iron out your Hadoop configuration
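
To make the workload concrete, here is a toy, single-machine Python sketch of the three TeraSort steps (generate, sort, validate). It relies on the TeraGen record layout of 100-byte rows with a 10-byte key. Everything else is an assumption made for illustration: the function names, the zero-filled payload, and reading "1GB" as 10^9 bytes. It is not the Hadoop or Spark implementation and says nothing about how those frameworks distribute the sort.

```python
import random

RECORD_LEN = 100  # TeraGen rows are 100 bytes long
KEY_LEN = 10      # the first 10 bytes of each row are the sort key


def rows_for_size(size_bytes):
    """Number of 100-byte rows needed to produce size_bytes of input."""
    return size_bytes // RECORD_LEN


def tera_gen(num_records, seed=0):
    """Generate records with a random key and a zero-filled payload."""
    rng = random.Random(seed)
    payload = bytes(RECORD_LEN - KEY_LEN)
    return [bytes(rng.getrandbits(8) for _ in range(KEY_LEN)) + payload
            for _ in range(num_records)]


def tera_sort(records):
    """Sort records by their 10-byte key (in memory, single process)."""
    return sorted(records, key=lambda r: r[:KEY_LEN])


def tera_validate(records):
    """Check that keys are in non-decreasing order."""
    return all(records[i][:KEY_LEN] <= records[i + 1][:KEY_LEN]
               for i in range(len(records) - 1))


if __name__ == "__main__":
    print("rows for 1GB (10**9 bytes):", rows_for_size(10**9))
    data = tera_gen(10_000)
    print("sorted correctly:", tera_validate(tera_sort(data)))
```

Under that reading of "1GB", the 1GB, 5GB and 10GB runs correspond to 10, 50 and 100 million rows respectively.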


Hadoop

i-6c76c1da (M), i-40684ef0 (s1), i-41684ef1 (s2), i-42684ef2 (s3), i-43684ef3 (s4), i-4e684efe (s5), i-4f684eff (s6)

Red: s6; Dark Green: s4

Observations for 10GB

Red: s6; Dark Green: s4



Identified Stragglers


Spark

Orange: s2; Red: s6

Hadoop and Spark

Red: s6; Bright Blue: s5; Orange: s2


Conclusions

- A straggler task spends an unusually long amount of time in a particular part of task execution.

- It is usually not too hard to find a straggler for a specific execution; what is hard is to find it consistently enough!

- Still, we were lucky enough to spot a few even in a cluster of mediocre strength, which emphasizes the necessity of understanding the cluster meta info well.

E.g.: DFS disk read time, shuffle write time, shuffle read time, and Java’s garbage collection

- Note that Spark:

- often breaks jobs into many more tasks

- has much lower task launch overhead than Hadoop
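
The straggler definition from the conclusions can be turned into a simple check. The Python sketch below flags a task whose total time exceeds 1.5 times the median task time, then reports the phase where it exceeds that phase's median by the most. The 1.5x threshold, the phase field names, the function name `find_stragglers`, and the sample numbers are all assumptions made for illustration; none of them are measurements or methods taken from these experiments.

```python
from statistics import median

# Hypothetical per-task phase names, mirroring the metrics listed above.
PHASES = ("dfs_read", "shuffle_write", "shuffle_read", "gc")


def find_stragglers(tasks, threshold=1.5):
    """Return (task id, node, slowest phase) for each straggler.

    A task counts as a straggler when its total time is more than
    `threshold` times the median total (an assumed cutoff). The slowest
    phase is the one with the largest excess over that phase's median.
    """
    totals = {t["id"]: sum(t[p] for p in PHASES) for t in tasks}
    median_total = median(totals.values())
    phase_medians = {p: median(t[p] for t in tasks) for p in PHASES}

    stragglers = []
    for t in tasks:
        if totals[t["id"]] > threshold * median_total:
            worst = max(PHASES, key=lambda p: t[p] - phase_medians[p])
            stragglers.append((t["id"], t["node"], worst))
    return stragglers


# Made-up timings in seconds, for illustration only.
SAMPLE_TASKS = [
    {"id": "t1", "node": "n1", "dfs_read": 2.0, "shuffle_write": 1.0,
     "shuffle_read": 1.0, "gc": 0.2},
    {"id": "t2", "node": "n2", "dfs_read": 2.1, "shuffle_write": 1.0,
     "shuffle_read": 1.1, "gc": 0.2},
    {"id": "t3", "node": "n3", "dfs_read": 2.0, "shuffle_write": 1.1,
     "shuffle_read": 1.0, "gc": 0.3},
    {"id": "t4", "node": "n4", "dfs_read": 9.0, "shuffle_write": 1.0,
     "shuffle_read": 1.0, "gc": 0.2},
]

if __name__ == "__main__":
    for task_id, node, phase in find_stragglers(SAMPLE_TASKS):
        print(f"{task_id} on {node} is slow in {phase}")
```

Comparing against the median rather than the mean keeps one very slow task from shifting the baseline it is judged against. Running the same check across repeated executions is one way to see whether the same node shows up consistently.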


