MAP-REDUCE
Curt Clifton
Rose-Hulman Institute of Technology
SVN Update ErlangInClass
GOOGLE’S MAP-REDUCE
Described by Jeffrey Dean and Sanjay Ghemawat [OSDI 2004]
Relies on the Google File System for storing massive data sets across thousands of commodity drives
Open source version implemented by Yahoo! et al.
FUNCTIONS FTW
Algorithms implemented by a pair of functions
map: processes a key/value pair, generates a set of new key/value pairs
reduce: gets a single key and a set of all associated values, processes the set into a single result for the key
Automatically parallelized and distributed!
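To make the division of labor concrete, here is a minimal single-process sketch of the model in Python. It only imitates the grouping step that a real framework performs across machines. The driver name `run_map_reduce` and the word-count functions are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

def run_map_reduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for the framework (hypothetical helper)."""
    # Map phase: every input pair may emit any number of new pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Reduce phase: each key is reduced together with all of its values.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

def count_words_map(doc_name, text):
    # Emit (word, 1) for every word occurrence in the document.
    return [(word, 1) for word in text.split()]

def count_words_reduce(word, counts):
    # Collapse all the 1s for a word into a single total.
    return sum(counts)

word_counts = run_map_reduce(
    [("a.txt", "to be or not to be")], count_words_map, count_words_reduce
)
```

Because `map_fn` looks at one pair at a time and `reduce_fn` looks at one key at a time, a framework is free to run many of these calls in parallel.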
EXAMPLE: INDEXING
map:
takes a (URL, textual contents) pair
emits a list of (word, URL) pairs
reduce:
takes every URL for a given word
produces a (word, [URL]) pair
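The indexing example can be sketched in Python as follows. The names `index_map`, `index_reduce`, and `build_index` are hypothetical, the example URLs are made up, and lowercasing and de-duplicating words are assumptions added for tidiness. The grouping loop stands in for the framework.

```python
from collections import defaultdict

def index_map(url, contents):
    # Emit one (word, URL) pair per distinct word on the page.
    return [(word, url) for word in set(contents.lower().split())]

def index_reduce(word, urls):
    # Produce the (word, [URL]) pair for this word.
    return (word, sorted(urls))

def build_index(pages):
    # Group the map output by word, as the framework would.
    grouped = defaultdict(list)
    for url, contents in pages.items():
        for word, page_url in index_map(url, contents):
            grouped[word].append(page_url)
    return dict(index_reduce(word, urls) for word, urls in grouped.items())

pages = {
    "http://a.example": "Erlang is fun",
    "http://b.example": "Haskell is fun",
}
index = build_index(pages)
```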
GOOGLE FILE SYSTEM
[Diagram: a grid of disks, a row of map(k,v) tasks, further disks, and a row of reduce(k,v) tasks]
TYPES
map :: (Key k1, Key k2, Value v1, Value v2) => k1 -> v1 -> [(k2, v2)]
reduce :: (Key k2, Value v2, Value v3) => k2 -> [v2] -> v3
OTHER EXAMPLES
Inverted Index
Distributed Grep
Count of URL Access Frequency
Reverse Web-Link Graph
Q1
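Three of the examples above fit in a few lines each. The sketch below is one possible reading, with assumptions: the driver `run_map_reduce` is a hypothetical sequential stand-in for the framework, grep output is keyed by file name, the search pattern is fixed to "error", and all sample data is invented.

```python
import re
from collections import defaultdict

def run_map_reduce(inputs, map_fn, reduce_fn):
    # Sequential stand-in for the framework's map, group, and reduce steps.
    grouped = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

PATTERN = re.compile(r"error")  # assumed search pattern

# Distributed grep: map emits matching lines, reduce is the identity.
def grep_map(file_name, line):
    return [(file_name, line)] if PATTERN.search(line) else []

def grep_reduce(file_name, lines):
    return lines

# Count of URL access frequency: map emits (URL, 1), reduce sums.
def access_map(log_name, url):
    return [(url, 1)]

def access_reduce(url, ones):
    return sum(ones)

# Reverse web-link graph: map emits (target, source), reduce collects sources.
def links_map(source, targets):
    return [(target, source) for target in targets]

def links_reduce(target, sources):
    return sorted(sources)

matches = run_map_reduce(
    [("app.log", "disk error on node 7"), ("app.log", "all good"),
     ("db.log", "error: timeout")],
    grep_map, grep_reduce)
hits = run_map_reduce(
    [("log", "/home"), ("log", "/about"), ("log", "/home")],
    access_map, access_reduce)
incoming = run_map_reduce(
    [("a", ["b", "c"]), ("b", ["c"])], links_map, links_reduce)
```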
PAGE RANK: RANDOM WALK OF THE WEB
Suppose user starts at a random page
Surfs by either:
Clicking some link from the page at random, or
Entering a new random URL
What is the probability that she arrives at a given page?
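One way to build intuition for that question is to simulate the surfer directly. The sketch below is an illustration rather than how page rank is computed in practice. The function `random_surf`, the three-page graph, the click probability of 0.85, and the rule that a page without links always triggers a random jump are all assumptions.

```python
import random
from collections import Counter

def random_surf(links, steps, click_probability, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)       # seeded so runs are repeatable
    pages = sorted(links)
    current = rng.choice(pages)     # start at a random page
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        targets = links[current]
        if targets and rng.random() < click_probability:
            current = rng.choice(targets)   # click a random link
        else:
            current = rng.choice(pages)     # enter a new random URL
    return {page: visits[page] / steps for page in pages}

frequencies = random_surf(
    {"a": ["b", "c"], "b": ["c"], "c": ["a"]},
    steps=20000, click_probability=0.85)
```

In this small graph, page b has only one incoming link, and that link is shared with c, so the surfer lands on b noticeably less often than on a or c.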
THE FORMULA
Given a page A, and pages T1–Tn that link to A, page rank of A is:
$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$$
where:
C(Ti) is the number of edges leaving page Ti
d represents the likelihood of a user clicking (rather than randomly entering a new URL)
Q2
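As a quick numeric check of the formula, the sketch below evaluates PR(A) for a page with two incoming links. The helper name `page_rank_of`, the value d = 0.85, and the sample ranks and edge counts are assumptions chosen for illustration.

```python
def page_rank_of(incoming, d=0.85):
    """Evaluate PR(A) from a list of (PR(Ti), C(Ti)) pairs."""
    # Each linking page Ti contributes its rank divided by its outgoing edges.
    contributions = sum(rank / out_edges for rank, out_edges in incoming)
    return (1 - d) + d * contributions

# T1 has rank 1.0 and 2 outgoing edges; T2 has rank 1.0 and 1 outgoing edge.
pr_a = page_rank_of([(1.0, 2), (1.0, 1)])  # 0.15 + 0.85 * (0.5 + 1.0), about 1.425
```

A page with no incoming links gets the floor value 1 - d, which is the contribution of users who arrive by entering a random URL.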
PAGE RANK USING MAP-REDUCE
Phase 1:
map :: URL -> pageText -> [(URL, (1, [targetURL]))]
(the 1 is PRinit, the initial page rank)
reduce is just the identity function
Multiple passes!
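A rough Python sketch of Phase 1 follows. The regular expression is a deliberately simplistic assumption about how links appear in the page text (a real crawler would use an HTML parser), and the names `phase1_map` and `phase1_reduce` and the example URLs are hypothetical.

```python
import re

# Assumed link syntax: double-quoted href attributes only.
HREF = re.compile(r'href="([^"]+)"')

def phase1_map(url, page_text):
    # Pair the page with its initial rank and its outgoing links.
    targets = HREF.findall(page_text)
    return [(url, (1.0, targets))]

def phase1_reduce(url, values):
    # Identity: pass the values through unchanged.
    return values

page = '<a href="http://b.example">b</a> and <a href="http://c.example">c</a>'
phase1_output = phase1_map("http://a.example", page)
```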
PAGE RANK USING MAP-REDUCE
Phase 2:
map :: URL -> (currentRank, [targetURL]) -> (URL, [targetURL]) : [(targetURL, partialRank)]
reduce :: targetURL -> ([targetsTargets]) : [partialRank] -> (targetURL, (newRank, [targetsTargets]))
map-reduce isn’t statically typed!
partialRank = currentRank / len([targetURL])
newRank is computed from ∑[partialRank]
Repeat Phase 2 until it converges!
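Phase 2 and the repeat-until-convergence loop can be sketched as below. This is a sequential illustration under stated assumptions: the reduce step applies the page rank formula with d = 0.85, convergence means the largest rank change falls below a chosen tolerance, and the helper names `run_phase2` and `page_rank` are hypothetical. The `isinstance` check is where the lack of static typing shows up, since reduce receives a link list and partial ranks mixed together under one key.

```python
from collections import defaultdict

def phase2_map(url, value):
    current_rank, targets = value
    pairs = [(url, targets)]  # carry the link list forward to the next pass
    for target in targets:
        pairs.append((target, current_rank / len(targets)))  # partialRank
    return pairs

def phase2_reduce(url, values, d=0.85):
    targets, total = [], 0.0
    for value in values:
        if isinstance(value, list):
            targets = value       # the page's own outgoing links
        else:
            total += value        # a partialRank from a linking page
    return (url, ((1 - d) + d * total, targets))

def run_phase2(graph, d=0.85):
    # Group the map output by URL, as the framework would.
    grouped = defaultdict(list)
    for url, value in graph.items():
        for key, out_value in phase2_map(url, value):
            grouped[key].append(out_value)
    return dict(phase2_reduce(url, values, d) for url, values in grouped.items())

def page_rank(graph, d=0.85, tolerance=1e-6, max_passes=100):
    for _ in range(max_passes):
        updated = run_phase2(graph, d)
        change = max(abs(updated[url][0] - graph[url][0]) for url in graph)
        graph = updated
        if change < tolerance:
            break
    return graph

start = {"a": (1.0, ["b", "c"]), "b": (1.0, ["c"]), "c": (1.0, ["a"])}
one_pass = run_phase2(start)
final = page_rank(start)
```

Emitting `(url, targets)` from map is what lets each pass feed the next: the output of reduce has the same shape as the input to map.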
FAULT TOLERANCE
Google file system stores data in triplicate!
HADOOP
Yahoo’s open source implementation of
Google File System
Map-Reduce
Includes several interfaces: Java, pipes (including Bash, Perl, and Python), and Pig
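For the script-based interfaces, a job is typically a pair of programs that exchange text lines. The sketch below assumes the Hadoop Streaming convention of one tab-separated key/value pair per line, with the reducer receiving its input sorted by key. The word-count task and the function names are illustrative, and the call to `sorted` merely imitates the framework's shuffle on in-memory data.

```python
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word on every input line.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal words are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Local simulation: sorted() stands in for the shuffle between the two stages.
local_run = list(reducer(sorted(mapper(["to be or", "not to be"]))))
```

In an actual streaming job the two functions would live in separate scripts that read standard input and write standard output.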
DEMO
PAC-MAN
DUE NEXT THURSDAY
CAN PAIR PROGRAM THIS ONE
ACKNOWLEDGEMENTS
Slides contain material © 2008 Google, Inc. and © Spinnaker Labs, Inc., distributed under the Creative Commons Attribution 2.5 license.
Original materials from the 2008 NSF Data-Intensive Scalable Computing in Education Workshop, Seattle, WA.