Advanced Topics: MapReduce ECE 454 Computer Systems
Programming
Topics: Reductions Implemented in Distributed Frameworks; Distributed Key-Value Stores; Hadoop Distributed File System
Cristiana Amza
– 2 –
Motivation: Big Data Analytics
http://www.google.org/flutrends/ca/
(2012)
Average Searches Per Day:
5,134,000,000
– 3 –
Motivation: Big Data Analytics
Process lots of data: Google processed about 24 petabytes of data per day in 2009.
A single machine cannot serve all the data: you need a distributed system to store and process it in parallel.
Parallel programming? Threading is hard! How do you facilitate communication between nodes? How do you scale to more machines? How do you handle machine failures?
– 4 –
MapReduce
MapReduce [OSDI'04] provides:
Automatic parallelization and distribution
I/O scheduling: load balancing, network and data transfer optimization
Fault tolerance: handling of machine failures
Need more power? Scale out, not up! Use a large number of commodity servers as opposed to a few high-end specialized servers.
Apache Hadoop: open-source implementation of MapReduce
– 5 –
Typical problem solved by MapReduce
Read a lot of data
Map: extract something you care about from each record
Shuffle and Sort
Reduce: aggregate, summarize, filter, or transform
Write the results
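The steps above can be sketched as a single-process Python simulation (illustrative only; names like `run_mapreduce` are hypothetical and not part of Hadoop's API):

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(records, map_fn, reduce_fn):
    # Map: extract (key, value) pairs from every input record
    pairs = [kv for rec in records for kv in map_fn(rec)]
    # Shuffle and sort: bring all values for one key together
    pairs.sort(key=itemgetter(0))
    # Reduce: aggregate the values for each key, then "write" the results
    return {key: reduce_fn(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Word count run through the pipeline
result = run_mapreduce(
    ["the cat", "cat and dog"],
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda key, values: sum(values))
# result == {"and": 1, "cat": 2, "dog": 1, "the": 1}
```

In the real framework each step runs on many machines, but the data flow per key is exactly this.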
– 6 –
MapReduce workflow
[Figure: MapReduce workflow. Input data is divided into Split 0, Split 1, and Split 2. Map workers read the splits and extract something you care about from each record, writing intermediate results to local disk. Reduce workers remote-read and sort the intermediate data, then aggregate, summarize, filter, or transform it, writing Output File 0 and Output File 1.]
– 7 –
Mappers and Reducers
Need to handle more data? Just add more Mappers/Reducers!
No need to handle multithreaded code: Mappers and Reducers are typically single-threaded and deterministic. Determinism allows for restarting of failed jobs.
Mappers/Reducers run entirely independently of each other. In Hadoop, they run in separate JVMs.
– 8 –
http://kickstarthadoop.blogspot.ca/2011/04/word-count-hadoop-map-reduce-example.html
Example: Word Count
– 9 –
Mapper
Reads in input pair <Key, Value>
Outputs a pair <K', V'>
Let's count the number of each word in user queries (or Tweets/Blogs). The input to the mapper will be <queryID, QueryText>:
<Q1, "The teacher went to the store. The store was closed; the store opens in the morning. The store opens at 9am.">
The output would be:
<The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the, 1> <store, 1> <opens, 1> <at, 1> <9am, 1>
– 10 –
Reducer
Accepts the Mapper output and aggregates the values for each key. For our example, the reducer input would be:
<The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the, 1> <store, 1> <opens, 1> <at, 1> <9am, 1>
The output would be:
<The, 6> <teacher, 1> <went, 1> <to, 1> <store, 4> <was, 1> <closed, 1> <opens, 2> <in, 1> <morning, 1> <at, 1> <9am, 1>
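For this example, the mapper and reducer can be sketched in Python (a hypothetical single-process version; words are lowercased so "The" and "the" aggregate to one key, matching the slide's totals):

```python
import re
from collections import defaultdict

def mapper(query_id, query_text):
    # Emit <word, 1> for every word in the record
    for word in re.findall(r"[a-z0-9]+", query_text.lower()):
        yield (word, 1)

def reducer(key, values):
    # Aggregate all counts emitted for one word
    return (key, sum(values))

# Shuffle: group the mapper's output by key, then reduce each group
groups = defaultdict(list)
query = ("The teacher went to the store. The store was closed; "
         "the store opens in the morning. The store opens at 9am.")
for key, value in mapper("Q1", query):
    groups[key].append(value)
counts = dict(reducer(k, vs) for k, vs in groups.items())
# counts["the"] == 6, counts["store"] == 4, counts["opens"] == 2
```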
– 11 –
MapReduce
[Figure: MapReduce execution. The Hadoop program forks a Master and Worker processes. The Master assigns map and reduce tasks to Workers. Map workers read input splits (Split 0, 1, 2) and write intermediate results to local disk; reduce workers remote-read and sort that data, then write Output File 0 and Output File 1. Peta-scale data is transferred through the network.]
– 12 –
Google File System (GFS) / Hadoop Distributed File System (HDFS): split the data and store three replicas on commodity servers.
– 13 –
MapReduce
[Figure: MapReduce with HDFS. The Master asks the HDFS NameNode "Where are the chunks of input data?", and the NameNode replies with the locations of the chunks. Map tasks are then assigned so that each worker reads its split (Split 0, 1, or 2) from local disk and writes intermediate results locally; reduce workers remote-read and sort, then write Output File 0 and Output File 1.]
– 14 –
Locality Optimization
Master scheduling policy: ask GFS for the locations of the replicas of the input file blocks, and schedule map tasks so that a GFS input block replica is on the same machine or the same rack.
Effect: thousands of machines read input at local disk speed. Eliminates the network bottleneck!
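This scheduling policy can be sketched as a small preference function (a hypothetical sketch; `pick_worker` and `rack_of` are illustrative names, not framework APIs):

```python
def pick_worker(replica_hosts, free_workers, rack_of):
    # Node-local: a free worker that already holds a replica of the block
    for w in free_workers:
        if w in replica_hosts:
            return w, "node-local"
    # Rack-local: a free worker in the same rack as some replica
    replica_racks = {rack_of[h] for h in replica_hosts}
    for w in free_workers:
        if rack_of[w] in replica_racks:
            return w, "rack-local"
    # Otherwise: any free worker, paying for a remote read over the network
    return free_workers[0], "remote"

rack_of = {"w1": "rack1", "w2": "rack1", "w3": "rack2", "w4": "rack2"}
print(pick_worker({"w2"}, ["w1", "w2"], rack_of))  # ('w2', 'node-local')
print(pick_worker({"w4"}, ["w3"], rack_of))        # ('w3', 'rack-local')
print(pick_worker({"w4"}, ["w1"], rack_of))        # ('w1', 'remote')
```

With three replicas per block, most map tasks find a node-local or rack-local slot, which is why thousands of machines can read at local-disk speed.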
– 15 –
Failure in MapReduce
Failures are the norm in commodity hardware.
Worker failure: detect failure via periodic heartbeats; re-execute in-progress map/reduce tasks.
Master failure: single point of failure; resume from the execution log.
Robust: in Google's experience, they once lost 1600 of 1800 machines, but the job finished fine.
– 16 –
Fault tolerance: handled via re-execution. On worker failure:
Detect failure via periodic heartbeats
Re-execute completed and in-progress map tasks
Task completion is committed through the master
Robust: in Google's experience, they lost 1600 of 1800 machines, but finished fine.
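The heartbeat-and-re-execute logic above can be sketched as follows (a hypothetical sketch; the function and field names are illustrative, not the framework's):

```python
def detect_failed_workers(last_heartbeat, now, timeout):
    # A worker whose last heartbeat is older than `timeout` is presumed dead
    return {w for w, t in last_heartbeat.items() if now - t > timeout}

def tasks_to_reexecute(tasks, failed):
    # Completed *map* tasks on a failed worker must also be redone, because
    # their output lives on that worker's local disk; completed reduce
    # output is already in the global file system.
    redo = []
    for task in tasks:
        if task["worker"] in failed and (
                task["state"] == "in-progress"
                or (task["state"] == "completed" and task["kind"] == "map")):
            task["state"], task["worker"] = "idle", None  # back to the pool
            redo.append(task["id"])
    return redo

last_heartbeat = {"w1": 100.0, "w2": 107.0}
failed = detect_failed_workers(last_heartbeat, now=110.0, timeout=5.0)
tasks = [
    {"id": "m1", "kind": "map",    "worker": "w1", "state": "completed"},
    {"id": "m2", "kind": "map",    "worker": "w1", "state": "in-progress"},
    {"id": "r1", "kind": "reduce", "worker": "w1", "state": "completed"},
    {"id": "m3", "kind": "map",    "worker": "w2", "state": "in-progress"},
]
redo = tasks_to_reexecute(tasks, failed)  # only w1's map tasks are redone
```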
– 17 –
Refinement: Redundant Execution
Slow workers significantly lengthen completion time:
Other jobs consuming resources on the machine
Bad disks with soft errors transfer data very slowly
Weird things: processor caches disabled (!!)
Solution: spawn backup copies of tasks. Whichever one finishes first "wins".
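Backup (speculative) execution can be sketched with threads standing in for workers (a toy sketch; `run_with_backup` is a hypothetical name, and a real framework would also kill the losing copies):

```python
import threading
import queue

def run_with_backup(task, n_copies=2):
    # Spawn identical copies of the task; whichever finishes first "wins".
    # This is only safe because tasks are deterministic, so every copy
    # produces the same result.
    results = queue.Queue()
    for _ in range(n_copies):
        threading.Thread(target=lambda: results.put(task()),
                         daemon=True).start()
    return results.get()  # block until the first copy finishes; ignore losers

print(run_with_backup(lambda: sum(range(10))))  # 45
```

A straggler copy on a slow machine no longer delays the job; its result is simply discarded.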
– 18 –
Refinement: Skipping Bad Records
Map/Reduce functions sometimes fail for particular inputs.
The best solution is to debug and fix, but that is not always possible.
If the master sees two failures for the same record, the next worker is told to skip that record.
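The skip-bad-records policy can be sketched as a toy simulation (hypothetical names; in the real system a bad record crashes the whole task attempt and the worker reports the record to the master, whereas here each failure is caught in-process):

```python
from collections import Counter

class SkippingRunner:
    # After a record has caused `max_failures` crashes, the master tells
    # the next worker to skip it instead of re-running it forever.
    def __init__(self, max_failures=2):
        self.failures = Counter()
        self.max_failures = max_failures

    def run(self, map_fn, records):
        out = []
        for i, rec in enumerate(records):
            if self.failures[i] >= self.max_failures:
                continue  # master: skip this record
            try:
                out.append(map_fn(rec))
            except Exception:
                self.failures[i] += 1  # worker reports the bad record
        return out

def bad_div(x):
    return 10 // x  # fails on the 0 record

runner = SkippingRunner()
records = [1, 0, 5]
runner.run(bad_div, records)          # attempt 1: record 1 fails
runner.run(bad_div, records)          # attempt 2: record 1 fails again
third = runner.run(bad_div, records)  # attempt 3: record 1 skipped -> [10, 2]
```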