apache hadoop india summit 2011 keynote talk "scaling hadoop applications" by milind...

29
Scaling Hadoop Applications Milind Bhandarkar Distributed Data Systems Engineer [[email protected]] http://www.flickr.com/photos/theplanetdotcom/4878812549/

Upload: yahoo-developer-network

Post on 25-May-2015

1.291 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Scaling Hadoop Applications

Milind BhandarkarDistributed Data Systems Engineer

[[email protected]]

http://www.flickr.com/photos/theplanetdotcom/4878812549/

Page 2: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

About Me

http://www.linkedin.com/in/milindb

Founding member of Hadoop team at Yahoo! [2005-2010]

Contributor to Apache Hadoop & Pig since version 0.1

Built and Led Grid Solutions Team at Yahoo! [2007-2010]

Parallel Programming Paradigms since 1989 (PhD, cs.illinois.edu)

Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, and LinkedIn

http://www.flickr.com/photos/tmartin/32010732/

Page 3: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Outline

Scalability of Applications

Causes of Sublinear Scalability

Best practices

Q & A

http://www.flickr.com/photos/86798786@N00/2202278303/

Page 4: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Importance of Scaling

Zynga’s Cityville: 0 to 100 Million users in 43 days ! (http://venturebeat.com/2011/01/14/zyngas-cityville-grows-to-100-million-users-in-43-days/)

Facebook in 2009 : From 150 Million users to 350 Million ! (http://www.facebook.com/press/info.php?timeline)

LinkedIn in 2010: Adding a member per second ! (http://money.cnn.com/2010/11/17/technology/linkedin_web2/index.htm)

Public WWW size: 26M in 1998, 1B in 2000, 1T in 2008 (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html)

Scalability allows dealing with success gracefully !

http://www.flickr.com/photos/palmdiscipline/2784992768/

Page 5: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Explosion of Data

(Data-Rich Computing theme proposal. J. Campbell, et al, 2007)

http://www.flickr.com/photos/28634332@N05/4128229979/

Page 6: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Hadoop

Simplifies building parallel applications

Programmer specifies Record-level and Group-level sequential computation

Framework handles partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting etc.

User provides hints to control parallel execution

Page 7: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Scalability of Parallel Programs

If one node processes k MB/s, then N nodes should process (k*N) MB/s

If some fixed amount of data is processed in T minutes on one node, the N nodes should process same data in (T/N) minutes

Linear Scalability

http://www-03.ibm.com/innovation/us/watson/

Page 8: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Reduce Latency

Minimize program

execution time

http://www.flickr.com/photos/adhe55/2560649325/

Page 9: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Increase Throughput

Maximize data processed per unit

time

http://www.flickr.com/photos/mikebaird/3898801499/

Page 10: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Three Equations

http://www.flickr.com/photos/ucumari/321343385/

Page 11: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Amdahl’s Law

http://www.computer.org/portal/web/awards/amdahl

Page 12: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Multi-Phase Computations

If computation C is split into N different parts, C1..CN

If partial computation Ci can be speeded up by a factor of Si

http://www.computer.org/portal/web/awards/amdahl

Page 13: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Amdahl’s Law, Restated

http://www.computer.org/portal/web/awards/amdahl

Page 14: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

MapReduce Workflow

http://www.computer.org/portal/web/awards/amdahl

Page 15: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Little’s Law

http://sloancf.mit.edu/photo/facstaff/little.gif

Page 16: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Message Cost Model

http://www.flickr.com/photos/mattimattila/4485665105/

Page 17: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Message Granularity

For Gigabit Ethernet α = 300 μS β = 100 MB/s

100 Messages of 10KB each = 40 ms

10 Messages of 100 KB each = 13 ms

http://www.flickr.com/photos/philmciver/982442098/

Page 18: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Alpha-Beta

Common Mistake: Assuming that α is constant Scheduling latency for responder Jobtracker/Tasktracker time slice inversely

proportional to number of concurrent tasks

Common Mistake: Assuming that β is constant Network congestion TCP incast

Page 19: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Causes of Sublinear Scalability

http://www.flickr.com/photos/sabertasche2/2609405532/

Page 20: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Sequential Bottlenecks

Mistake: Single reducer (default) mapred.reduce.tasks Parallel directive

Mistake: Constant number of reducers independent of data size default_parallel

http://www.flickr.com/photos/bogenfreund/556656621/

Page 21: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Load Imbalance

Unequal Partitioning of Work

Imbalance = Max Partition Size – Min Partition Size

Heterogeneous workers Example: Disk Bandwidth

Empty Disk: 100 MB/s 75% Full disk: 25 MB/s

http://www.flickr.com/photos/kelpatolog/176424797/

Page 22: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Over-Partitioning

For N workers, choose M partitions, M >> N

Adaptive Scheduling, each worker chooses the next unscheduled partition

Load Imbalance = Max(Sum(Wk)) – Min(Sum(Wk))

For sufficiently large M, both Max and Min tend to Average(Wk), thus reducing load imbalance

http://www.flickr.com/photos/infidelic/3238637031/

Page 23: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Critical Paths

Maximum distance between start and end nodes

Long tail in Map task executions Enable Speculative Execution

Overlap communication and computation Asynchronous notifications Example: deleting intermediate outputs after

job completion

http://www.flickr.com/photos/cdevers/4619306252/

Page 24: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Algorithmic Overheads

Repeating computation in place of adding communication

Parallel exploration of solution space results in wastage of computations

Need for a coordinator

Share partial, unmaterialized results Consider memory-based small replicated K-V

stores with eventual consistency

http://www.flickr.com/photos/sbprzd/140903299/

Page 25: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Synchronization

Shuffle stage in Hadoop, M*R transfers

Slowest system components involved: Network, Disk

Can you eliminate Reduce stage ?

Use combiners ?

Compress intermediate output ?

Launch reduce tasks before all maps finish

http://www.flickr.com/photos/susan_d_p/2098849477/

Page 26: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Task Granularity

Amortize task scheduling and launching overheads

Centralized scheduler Out-of-band Tasktracker heartbeat reduces latency,

but May overwhelm the scheduler

Increase task granularity Combine multiple small files Maximize split sizes Combine computations per datum

http://www.flickr.com/photos/philmciver/982442098/

Page 27: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Hadoop Best Practices

Use higher-level languages (e.g. Pig)

Coalesce small files into larger ones & use bigger HDFS block size

Tune buffer sizes for tasks to avoid spill to disk

Consume only one slot per task

Use combiner if local aggregation reduces size

http://www.flickr.com/photos/yaffamedia/1388280476/

Page 28: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Hadoop Best Practices

Use compression everywhere CPU-efficient for intermediate data Disk-efficient for output data

Use Distributed Cache to distribute small side-files (<< than input split size)

Minimize number of NameNode/JobTracker RPCs from tasks

http://www.flickr.com/photos/yaffamedia/1388280476/

Page 29: Apache Hadoop India Summit 2011 Keynote talk "Scaling Hadoop Applications" by Milind Bhandarkar

Questions ?

http://www.flickr.com/photos/horiavarlan/4273168957/