![Page 1: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/1.jpg)
MapReduces
PATTERNS FOR PROCESS
![Page 2: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/2.jpg)
Agenda
Overview of all the MapReduce Design Patterns
MapReduce Design Patterns Overview
Deep Dive into the following Patterns
Filtering Patterns
Join Patterns
Input and Output Patterns
Other Patterns Overview
Summarization Patterns
Data Organization Patterns
MetaPatterns
Comparison chart of when to use which design patterns
Best Practices
![Page 3: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/3.jpg)
MapReduce Patterns
Summarization Patterns
Filtering Patterns
Data Organization Patterns
Join Patterns
Meta Patterns
Input & Output Patterns
BIG DATA SERIES. Powered by Prognosive © 2015
![Page 4: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/4.jpg)
Summarization Patterns
Numerical Summarizations
Inverted Indexes
Counters
![Page 5: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/5.jpg)
Numerical Summarizations
Word Count
Record Counts
Min / Max
Average/Median/Std Deviation
![Page 6: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/6.jpg)
Inverted Indexes
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
![Page 7: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/7.jpg)
Counters
Record Count
Unique Instances
Summations
if (StringUtils.startsWithLetter(token)) {
    context.getCounter(WordsNature.STARTS_WITH_LETTER).increment(1);
}
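The counter idea in the snippet above can be tried without a cluster. The following is a minimal JDK-only sketch, not Hadoop code: an `EnumMap` stands in for the job's counters, `Character.isLetter` stands in for the slide's `StringUtils.startsWithLetter`, and the `STARTS_WITH_OTHER` constant is an assumption added for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

class CounterSketch {
    // STARTS_WITH_LETTER comes from the slide; STARTS_WITH_OTHER is hypothetical.
    enum WordsNature { STARTS_WITH_LETTER, STARTS_WITH_OTHER }

    // Counts whitespace-separated tokens by the nature of their first character.
    static Map<WordsNature, Long> count(String text) {
        Map<WordsNature, Long> counters = new EnumMap<>(WordsNature.class);
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // happens only for blank input
            WordsNature kind = Character.isLetter(token.charAt(0))
                    ? WordsNature.STARTS_WITH_LETTER
                    : WordsNature.STARTS_WITH_OTHER;
            counters.merge(kind, 1L, Long::sum); // same role as increment(1)
        }
        return counters;
    }

    public static void main(String[] args) {
        System.out.println(count("map 2 reduce"));
    }
}
```

In a real job the framework aggregates these per-task counts across all tasks, which is why counters suit record counts and simple summations.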
![Page 8: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/8.jpg)
Filter Patterns
Filters
Bloom Filters
Top Ten
Distinct
![Page 9: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/9.jpg)
Filters
Narrowing Views
Tracking Event Threads
Distributed Grep
Data Cleansing
Simple Random Sampling
Low Scoring Data
![Page 10: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/10.jpg)
Bloom Filters
Similar to other filters
Check each record: decide to keep or remove
Different: Filter based on set membership
Set membership is evaluated as well
Compares one list to another
Sometimes emits a false positive
Often this is OK
Steps:
Train the filter on the list of values; store it in HDFS
Do the filtering
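The two steps (train, then filter) can be sketched in plain Java. This is an illustrative JDK-only Bloom filter, not Hadoop's own implementation; the class name, bit-array size, hash scheme, and sample values are all assumptions.

```java
import java.util.BitSet;
import java.util.List;

class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Training step: set every bit position for a known member.
    void add(String value) {
        for (int i = 0; i < hashCount; i++) bits.set(position(value, i));
    }

    // Filtering step: false means definitely absent, true means probably present.
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(value, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1024, 3);
        for (String member : List.of("hadoop", "hdfs", "mapreduce")) filter.add(member);
        for (String record : List.of("hadoop", "spark", "hdfs")) {
            if (filter.mightContain(record)) System.out.println("keep " + record);
        }
    }
}
```

A trained member is never rejected; a non-member may occasionally pass, which is the false positive the slide mentions.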
![Page 11: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/11.jpg)
Bloom Filters
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
![Page 12: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/12.jpg)
Top Ten
![Page 13: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/13.jpg)
Distinct (De-dupe)
Several Methods:
HDFS & MapReduce Alone
HBase & HDFS
HDFS, MapReduce & Storage Controller
Streaming, HDFS & MapReduce
MapReduce with Blocking
![Page 14: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/14.jpg)
Data Organization Patterns
Structured to Hierarchical
Partitioning
Binning
Total Ordering
Shuffling
![Page 15: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/15.jpg)
Structured to Hierarchical
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
![Page 16: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/16.jpg)
Partitioning
![Page 17: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/17.jpg)
Binning
Uses MultipleOutputs class
Emits multiple distinct files
The mapper:
looks at each line
iterates through a list of criteria for each bin
If the record meets the criteria, it is sent to that bin
No combiner, partitioner, or reducer used
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
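The binning mapper logic can be sketched without Hadoop. In this JDK-only illustration, in-memory lists stand in for the files that MultipleOutputs would write, and the bin names and criteria are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BinningSketch {
    // Hypothetical bins: bin name -> criterion a line must meet.
    static Map<String, Predicate<String>> criteria() {
        Map<String, Predicate<String>> bins = new LinkedHashMap<>();
        bins.put("hadoop", line -> line.contains("hadoop"));
        bins.put("hive", line -> line.contains("hive"));
        return bins;
    }

    // The "mapper": test each line against each bin's criterion.
    static Map<String, List<String>> bin(List<String> lines) {
        Map<String, Predicate<String>> bins = criteria();
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String name : bins.keySet()) out.put(name, new ArrayList<>());
        for (String line : lines) {
            for (Map.Entry<String, Predicate<String>> e : bins.entrySet()) {
                if (e.getValue().test(line)) out.get(e.getKey()).add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop tuning", "hive on hadoop", "pig latin");
        System.out.println(bin(lines));
    }
}
```

A line can land in several bins or in none, and nothing is grouped or sorted, which is why the pattern needs no combiner, partitioner, or reducer.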
![Page 18: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/18.jpg)
Total Order Sort
Mapper extracts the sort key
Custom partitioner:
loads partition file
takes data ranges from file
decides which reducer to target
Reducers: identity reducer; the number of reducers must equal the number of partitions
$ hadoop fs -cat output/part-r-*
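The partitioner's decision can be sketched as a binary search over split points. This is a JDK-only illustration of the idea, not Hadoop's partitioner; the split points are hypothetical values standing in for the contents of a partition file.

```java
import java.util.Arrays;

class RangePartitionSketch {
    // Hypothetical sorted split points, as they might be read from a partition file.
    // N split points define N + 1 key ranges, one per reducer.
    static final String[] SPLITS = {"g", "n", "t"};

    // Returns the index of the reducer whose key range contains the key.
    static int partitionFor(String key) {
        int idx = Arrays.binarySearch(SPLITS, key);
        // Exact match on a split point: the key opens the next range.
        // Otherwise binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "grape", "mango", "zebra"}) {
            System.out.println(key + " -> reducer " + partitionFor(key));
        }
    }
}
```

Because every key sent to reducer i sorts before every key sent to reducer i + 1, reading the part files in order, as the `hadoop fs -cat` command does, yields a globally sorted result.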
![Page 19: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/19.jpg)
Shuffling
Mapper just outputs a random K for each K,V pair
Reducer sorts these, which randomizes the data further
Use case: random sampling
Load-balances well
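The shuffling idea can be sketched in plain Java: tag each record with a random key, then emit the records in key order. This JDK-only illustration uses a `TreeMap` in place of the framework's sort; the seed parameter is an assumption added so that runs are repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

class ShuffleSketch {
    // "Map" phase: tag each record with a random key.
    // "Reduce" phase: emit records in sorted key order, dropping the key.
    static List<String> shuffle(List<String> records, long seed) {
        Random random = new Random(seed);
        TreeMap<Integer, List<String>> byKey = new TreeMap<>();
        for (String record : records) {
            // Records that draw the same key share a group, as they would share a reduce call.
            byKey.computeIfAbsent(random.nextInt(), k -> new ArrayList<>()).add(record);
        }
        List<String> out = new ArrayList<>();
        for (List<String> group : byKey.values()) out.addAll(group);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shuffle(List.of("a", "b", "c", "d", "e"), 42L));
    }
}
```

Random keys spread records evenly across reducers, which is the load-balancing property noted on the slide.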
![Page 20: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/20.jpg)
Join Patterns
Reduce-Side Joins
Replicated Joins
Composite Joins
Cartesian Product
![Page 21: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/21.jpg)
Review: Inner Join
![Page 22: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/22.jpg)
Review: Outer Join
![Page 23: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/23.jpg)
Review: Cartesian Product
![Page 24: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/24.jpg)
Reduce-side Join
![Page 25: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/25.jpg)
Replicated Join
Map-only
Mapper reads join file at startup from cache
stores it in memory
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
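The replicated join can be sketched as a hash lookup. In this JDK-only illustration the small data set is loaded into a `HashMap` once, as a mapper would do at startup, and each record of the large data set probes it. The tab-separated layout, the inner-join behavior, and the sample records are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicatedJoinSketch {
    // Startup step: load the small side (id TAB name) into memory once.
    static Map<String, String> loadSmallSide(List<String> lines) {
        Map<String, String> users = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2); // assumes well-formed lines
            users.put(parts[0], parts[1]);
        }
        return users;
    }

    // Map step: probe the in-memory table for each large-side record (inner join).
    static List<String> join(Map<String, String> users, List<String> comments) {
        List<String> joined = new ArrayList<>();
        for (String line : comments) {
            String[] parts = line.split("\t", 2);
            String name = users.get(parts[0]);
            if (name != null) joined.add(name + "\t" + parts[1]);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> users = loadSmallSide(List.of("1\tann", "2\tbob"));
        System.out.println(join(users, List.of("1\tnice post", "3\torphan comment")));
    }
}
```

No shuffle or reduce phase is needed because every mapper holds the whole small side; the approach only works while that side fits in memory.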
![Page 26: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/26.jpg)
Composite Join
Map-only
Driver code handles most of the work
Hadoop does the rest
![Page 27: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/27.jpg)
Cartesian Product
Map-only
Driver code handles most of the work
Simple mapper
![Page 28: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/28.jpg)
Input/Output Patterns
Custom Input & Output
Generating Data
External Sources
Partition Pruning
![Page 29: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/29.jpg)
MapReduce Input and Output
![Page 30: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/30.jpg)
Custom Inputs
![Page 31: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/31.jpg)
OutputFormats
FileOutputFormat<K,V>: superclass
TextOutputFormat<K,V>: default output format
SequenceFileOutputFormat<K,V>
MultipleOutputs<K,V>: sends to various destinations
NullOutputFormat<K,V>: null output
LazyOutputFormat<K,V>
![Page 32: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/32.jpg)
Custom Output
Extend OutputFormat, usually FileOutputFormat
Implement getRecordWriter() returning a RecordWriter instance
Define write() in the RecordWriter class; it is invoked for each K-V pair
public void write(AccountKey key, Account value) {
    // "\t" as a String keeps this a concatenation even if the ID is numeric
    out.println(key.getAccountKeyId() + "\t"
        + value.getAccountNbr());
}
Class: BankRecordWriter
OutputFormat
RecordWriter
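The write-per-pair contract can be tried outside Hadoop with a small stand-in. This JDK-only sketch does not extend Hadoop's RecordWriter; the `AccountKey` and `Account` classes here are hypothetical minimal versions of the slide's types, and the sample values are invented for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class RecordWriterSketch {
    // Hypothetical stand-ins for the slide's key and value types.
    static class AccountKey {
        final long accountKeyId;
        AccountKey(long accountKeyId) { this.accountKeyId = accountKeyId; }
        long getAccountKeyId() { return accountKeyId; }
    }

    static class Account {
        final String accountNbr;
        Account(String accountNbr) { this.accountNbr = accountNbr; }
        String getAccountNbr() { return accountNbr; }
    }

    // Same shape as a record writer: write() once per pair, close() at the end.
    static class BankRecordWriter implements AutoCloseable {
        private final PrintWriter out;
        BankRecordWriter(PrintWriter out) { this.out = out; }

        void write(AccountKey key, Account value) {
            // One tab-separated line per key-value pair.
            out.print(key.getAccountKeyId() + "\t" + value.getAccountNbr() + "\n");
        }

        @Override
        public void close() { out.close(); }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        try (BankRecordWriter writer = new BankRecordWriter(new PrintWriter(buffer))) {
            writer.write(new AccountKey(7L), new Account("ACC-1"));
        }
        System.out.print(buffer);
    }
}
```

In a real job the output format's getRecordWriter() would construct the writer over a file in the task's output directory instead of an in-memory buffer.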
![Page 33: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/33.jpg)
Generating Data
Map-Only
Good for generating sample data
MapReduce is a good tool to use
Seldom done
![Page 34: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/34.jpg)
External Outputs
![Page 35: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/35.jpg)
Partition Pruning
source: MapReduce Design Patterns, Miner & Shook, O’Reilly
![Page 36: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/36.jpg)
MetaPatterns
Job Chaining
Chain Folding
Job Merging
![Page 37: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/37.jpg)
End of Chapter
![Page 38: MapReducessource: MapReduce Design Patterns, Miner & Shook, O’Reilly Total Order Sort Mapper extracts the sort key Custom partitioner loads partition file takes data ranges from](https://reader033.vdocuments.us/reader033/viewer/2022051903/5ff4093ed158b3227431feef/html5/thumbnails/38.jpg)
Lab