App Engine MapReduce
Mike Aizatsky, 11 May 2011
Hashtags: #io2011 #AppEngine
Feedback: http://goo.gl/SnV2i
Agenda
● MapReduce Computational Model
● Mapper library
● Announcement
● Technical bits:
○ Files API
○ User-space shuffling
● MapReduce & Pipeline API
● Examples and Demos
MapReduce Computational Model
MapReduce
● A model to do efficient distributed computing over large data sets.
● Used at Google for years
● Every project uses MapReduce!
MapReduce Computational Model
Map
● Input: user data
● Output: (key, value) pairs
● User code
Shuffle
● Collates values with the same key
● Input: (key, value) pairs
● Output: (key, [value]) pairs
● No user code
Reduce
● Input: (key, [value]) pairs
● Output: user data
● User code
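The three phases above can be sketched as a single-process driver in plain Python. This is a minimal illustration under stated assumptions, not the App Engine library: `run_mapreduce`, `length_map`, and `join_reduce` are hypothetical names, and the shuffle is a simple in-memory dictionary grouping.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, shuffle, and reduce in one process (illustration only)."""
    # Map: user code turns each input record into (key, value) pairs.
    pairs = []
    for record in inputs:
        pairs.extend(map_fn(record))

    # Shuffle: no user code; collate the values that share a key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    # Reduce: user code turns each (key, [value]) pair into output.
    output = []
    for key in sorted(grouped):
        output.extend(reduce_fn(key, grouped[key]))
    return output


def length_map(word):
    # Hypothetical mapper: key each word by its length.
    yield (len(word), word)


def join_reduce(key, values):
    # Hypothetical reducer: emit the words of each length, sorted.
    yield (key, sorted(values))


result = run_mapreduce(["map", "reduce", "key", "value"],
                       length_map, join_reduce)
print(result)
```

Only `map_fn` and `reduce_fn` are user code; the grouping step in the middle is what the framework provides.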
MapReduce Computational Model
Common App Engine Approach
● Take what works for us at Google
● Give it to people
App Engine & Google's MapReduce
● Additional scaling dimension:
○ Lots and lots of applications
○ Many of them will run MapReduce at the same time
● Isolation: one application shouldn't affect the performance of another
App Engine & Google's MapReduce
● Rate limiting: you don't want to burn a whole day's resources in 15 minutes and kill your online traffic
● Very slow execution: free apps want to go really slow, staying under their resource limit
● Protection: from malicious App Engine users
Mapper
Mapper Library
● Released at Google I/O 2010
● Heavily used by developers outside and inside Google (admin console, new indexer pipeline, etc.)
● Has seen lots of improvements since
Mapper Library Improvements
● Control API - start your jobs programmatically (and transactionally)
● Custom mutation pools - batch work between map function calls
● Namespaces support - iterate over data in different namespaces or over namespaces themselves
● Better sharding with scatter indices
● And more!
Mapper => MapReduce?
● Storage system for intermediate data:
○ Files API, released in 1.4.3 (March 2011)
● Shuffler
● Lots of glue code
Launching Shuffler Functionality
● In-memory, user-space, task-driven shuffle for small (100 MB) datasets.
● Trusted-tester access to the big shuffler.
● All the integration pieces needed to run your own MapReduce jobs are part of the Mapper library.
● Mapper library => Mapreduce library!
● Python today, Java soon.
http://mapreduce.appspot.com
Examples
Example 1: Word Count
# Map
def map(line):
    for w in clean(line).split():
        yield (w, '')

# Reduce
def reduce(key, values):
    yield (key, len(values))
Zed's dead, baby, Zed's dead!
("zed's", ''), ('dead', ''), ('baby', ''), ("zed's", ''), ('dead', '')
("zed's", ['', '']), ('dead', ['', '']), ('baby', [''])
("zed's", 2), ('dead', 2), ('baby', 1)
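The word-count trace above can be reproduced locally with a small sketch. The slides do not show `clean`, so the version here is an assumption (lowercase, keep letters and apostrophes), and `shuffle` is a hypothetical in-memory stand-in for the library's shuffle phase. The functions are renamed `word_map` and `word_reduce` to avoid shadowing Python built-ins.

```python
import re
from collections import defaultdict


def clean(line):
    # Hypothetical helper: lowercase and drop everything except letters,
    # apostrophes, and whitespace. The real clean() is not in the slides.
    return re.sub(r"[^a-z'\s]", "", line.lower())


def word_map(line):
    for w in clean(line).split():
        yield (w, '')


def word_reduce(key, values):
    yield (key, len(values))


def shuffle(pairs):
    # Stand-in for the library's shuffle: group values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


mapped = list(word_map("Zed's dead, baby, Zed's dead!"))
shuffled = shuffle(mapped)
counts = dict(kv for key, values in shuffled.items()
              for kv in word_reduce(key, values))
print(counts)
```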
Demo
Example 2: Inverse Index
# Map
def map(line, filename):
    for w in clean(line).split():
        yield (w, filename)

# Reduce
def reduce(key, values):
    yield (key, list(set(values)))
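A runnable sketch of the same index over a tiny corpus follows. The corpus, the `clean` helper, and the in-memory grouping are all assumptions made for illustration; the reducer uses `sorted(set(...))` rather than `list(set(...))` only so that the output order is deterministic.

```python
from collections import defaultdict


def clean(line):
    # Hypothetical helper (not shown in the slides): lowercase and keep
    # only letters and whitespace.
    return "".join(c for c in line.lower() if c.isalpha() or c.isspace())


def index_map(line, filename):
    for w in clean(line).split():
        yield (w, filename)


def index_reduce(key, values):
    # Deduplicate filenames; sorted() keeps the result deterministic.
    yield (key, sorted(set(values)))


# Hypothetical two-file corpus.
corpus = {
    "a.txt": ["the quick fox", "the lazy dog"],
    "b.txt": ["the dog barks"],
}

# In-memory stand-in for the shuffle phase.
grouped = defaultdict(list)
for filename, lines in corpus.items():
    for line in lines:
        for word, name in index_map(line, filename):
            grouped[word].append(name)

index = dict(kv for word, names in grouped.items()
             for kv in index_reduce(word, names))
print(index["the"], index["dog"], index["fox"])
```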
Demo
MapReduce: Technical Bits
Technical Bits
● Files API: solution to MapReduce storage problem
● User-space shuffler
Files API
Mapreduce Storage
● Mapreduce jobs generate lots of intermediate data.
● Datastore: expensive, 1MB entity limit
● Blobstore: read-only
● Memcache: small, volatile
Files API
● Familiar, file-like interface to various virtual file systems.
● Released in 1.4.3, integrated with Mapper library.
● Considered to be a low-level API.
Files API
● Files have two states: writable and readable.
● Start in writable. Moved to readable by "finalization".
● Can't read writable, can't write to readable.
● Write is append-only, atomic and fully serializable between concurrent clients.
● Concrete filesystems might have their own reliability constraints and/or additional APIs.
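The writable/readable lifecycle described above can be modeled in a few lines of plain Python. This is only an in-memory sketch of the rules on this slide; `SketchFile` and its methods are hypothetical names and are not the Files API itself.

```python
class SketchFile(object):
    """In-memory model of the writable -> readable lifecycle (hypothetical)."""

    def __init__(self):
        self._chunks = []
        self._finalized = False

    def append(self, data):
        # Writes are append-only and allowed only before finalization.
        if self._finalized:
            raise IOError("file is finalized; it is read-only")
        self._chunks.append(data)

    def finalize(self):
        # One-way transition from writable to readable.
        self._finalized = True

    def read(self):
        # Reads are allowed only after finalization.
        if not self._finalized:
            raise IOError("file is still writable; finalize it first")
        return "".join(self._chunks)


f = SketchFile()
f.append("key1,")
f.append("value1")
try:
    f.read()
    read_before_finalize = True
except IOError:
    read_before_finalize = False
f.finalize()
print(f.read())
```

Keeping the two states disjoint means a reader never observes a partially written file, which is convenient when one MapReduce stage consumes the output of another.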
Blobstore Filesystem
● Write directly to blobstore.
● Files can be >2G.
● Finalized files are durable.
● Writable files are not (just restart your MapReduce)
● Can fetch a blob key for finalized files and use blobstore api.
Blobstore Filesystem Python Example
from __future__ import with_statement
from google.appengine.api import files

# Create the file.
file_name = files.blobstore.create()

# Open the file for append.
with files.open(file_name, 'a') as f:
    f.write('data')

# All data is in. Finalize the file.
files.finalize(file_name)
Blobstore Filesystem Python Example
# Open the file for read.
with files.open(file_name, 'r') as f:
    data = f.read(4)

# Fetch blobkey for blobstore api.
blob_key = files.blobstore.get_blob_key(file_name)
Mapper Integration
# mapreduce.yaml
...
mapper:
  output_writer: mapreduce.output_writers.BlobstoreOutputWriter

# Handler function
def map(entity):
    yield entity.to_csv_line() + '\n'
Low-level Features
● Exclusive locks: files can be opened exclusively by a single client only.
● Sequence keys: each write can have a "sequence key" attached. Our backends make sure that they only increase.
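The slide does not say what sequence keys are used for. One plausible use, assumed here, is making a retried write safe: if a task replays a write with a key that is not larger than the last accepted one, the write can be rejected instead of duplicated. The sketch below models that rule in memory; `SequencedLog` is a hypothetical class, not part of the Files API.

```python
class SequencedLog(object):
    """Append-only log that rejects non-increasing sequence keys (sketch)."""

    def __init__(self):
        self.records = []
        self.last_key = None

    def append(self, data, sequence_key):
        # Accept the write only if its key is strictly greater than the
        # last accepted key; a replayed write is dropped, not duplicated.
        if self.last_key is not None and sequence_key <= self.last_key:
            return False
        self.records.append(data)
        self.last_key = sequence_key
        return True


log = SequencedLog()
log.append("a", 1)
log.append("b", 2)
accepted = log.append("b", 2)  # simulated retry of the same write
print(log.records, accepted)
```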
Future Plans
● "Tempfile" file system: much faster, much cheaper, but not durable, several days of storage only (geared specifically towards MapReduce)
● Integrations with other Google storage technologies and other reliability guarantees
User-Space Shuffler
User-Space Shuffler
● Consolidates values for the same key together.
● [(key, value)] => [(key, [value])]
● Should be reasonably fast, scalable and efficient.
● User-space: full source code, no new App Engine components.
Take 1
● Load all data into memory
● Sort
● Read sorted array
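Take 1 fits in a few lines of plain Python. This is a minimal sketch of the idea, not the library's code; `shuffle_in_memory` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_in_memory(pairs):
    """Take 1: load everything, sort by key, then read off the groups."""
    data = sorted(pairs, key=itemgetter(0))  # load + sort
    # Read the sorted array; equal keys are now adjacent.
    return [(key, [value for _, value in group])
            for key, group in groupby(data, key=itemgetter(0))]


pairs = [("dead", 1), ("zed", 2), ("baby", 3), ("zed", 4), ("dead", 5)]
print(shuffle_in_memory(pairs))
```

Everything lives in one list in one process, which is where the limits of this approach come from.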
Take 1
Take 1 Properties
● Memory-bound
● No parallelism
Take 2
● Sort chunks of data and store them back to Files API
● Merge-sort all chunks (or merge-read)
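A sketch of Take 2 follows, with plain Python lists standing in for the sorted chunk files that would be written back through the Files API. The function names and the chunk size are assumptions; the merge uses the standard library's `heapq.merge`, which reads its sorted inputs lazily.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sort_chunks(pairs, chunk_size):
    """Sort fixed-size chunks independently (this step could run in parallel)."""
    # Each sorted chunk stands in for a file stored via the Files API.
    return [sorted(pairs[i:i + chunk_size], key=itemgetter(0))
            for i in range(0, len(pairs), chunk_size)]


def merge_read(chunks):
    """Merge the sorted chunks lazily and group values by key."""
    merged = heapq.merge(*chunks, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield (key, [value for _, value in group])


pairs = [("dead", 1), ("zed", 2), ("baby", 3), ("zed", 4), ("dead", 5)]
chunks = sort_chunks(pairs, chunk_size=2)
shuffled = list(merge_read(chunks))
print(shuffled)
```

Only one chunk needs to fit in memory at a time, but a single reader still has to walk every chunk during the merge.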
Take 2
Take 2 Properties
● No longer memory-bound
● Sorting is parallel
● Merge phase is not parallel
● Difficult (and slow) to read from too many files
Take 3
● Shard mapper output by key hash code
● Sort each shard into chunks
● Merge-read each shard
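The sketch below illustrates Take 3 in plain Python. It is not the released shuffler: the function names, the shard count, the chunk size, and the choice of `zlib.crc32` as the key hash are all assumptions made for illustration.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def shard_by_key(pairs, num_shards):
    """Route each pair to a shard chosen by a hash of its key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in pairs:
        # crc32 is an arbitrary stable hash picked for this sketch.
        index = zlib.crc32(key.encode("utf-8")) % num_shards
        shards[index].append((key, value))
    return shards


def shuffle_shard(shard, chunk_size):
    """Sort one shard in chunks, then merge-read it."""
    chunks = [sorted(shard[i:i + chunk_size], key=itemgetter(0))
              for i in range(0, len(shard), chunk_size)]
    merged = heapq.merge(*chunks, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


pairs = [("dead", 1), ("zed", 2), ("baby", 3), ("zed", 4), ("dead", 5)]
# Shards are independent, so each call below could run as its own task.
results = [shuffle_shard(s, chunk_size=2) for s in shard_by_key(pairs, 3)]
print(results)
```

Because all pairs with the same key hash to the same shard, no key is split across shards, and each shard's merge can proceed without coordinating with the others.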
Take 3
Take 3 Properties
● No longer memory-bound
● Sorting is parallel
● Merge phase is now parallel
● This is the shuffler we are releasing today.
MapReduce & Pipeline
Pipeline API
● New API to chain complex work together.
● The glue that holds Mapper + Shuffler + Reducer together.
● MapReduce library is fully integrated with Pipeline.
● For an in-depth look, see the "Large-scale Data Analysis Using the App Engine Pipeline API" talk later today.
More Complex Example
Example 3: Distinguishing Phrases
# Map
def map(text, filename):
    for words in ngrams(text):
        yield (words, filename)

# Reduce
def reduce(key, values):
    if len(values) < 10:
        return
    for filename, count in count_occurences(values):
        if count > len(values) / 2:
            yield (key, filename)
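The slide leaves `ngrams` and `count_occurences` undefined. The sketch below fills them in with assumed implementations (word bigrams and a `Counter`) so the example runs end to end on a tiny, made-up corpus. The threshold of 10 becomes a `min_total` parameter purely so the small corpus can lower it; a phrase is reported for a file when that file accounts for more than half of the phrase's occurrences.

```python
from collections import Counter, defaultdict


def ngrams(text, n=2):
    # Hypothetical helper: yield each run of n consecutive words.
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def count_occurences(values):
    # Hypothetical helper: (filename, count) pairs for a list of filenames.
    return Counter(values).items()


def phrase_map(text, filename):
    for words in ngrams(text):
        yield (words, filename)


def phrase_reduce(key, values, min_total=10):
    # Skip rare phrases, then keep files that own a majority of the uses.
    if len(values) < min_total:
        return
    for filename, count in count_occurences(values):
        if count > len(values) / 2:
            yield (key, filename)


# Hypothetical two-file corpus.
corpus = {
    "a.txt": "to be or not to be to be",
    "b.txt": "not to be confused",
}

# In-memory stand-in for the shuffle phase.
grouped = defaultdict(list)
for filename, text in corpus.items():
    for phrase, name in phrase_map(text, filename):
        grouped[phrase].append(name)

distinguishing = sorted(kv for phrase, names in grouped.items()
                        for kv in phrase_reduce(phrase, names, min_total=3))
print(distinguishing)
```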
Demo
Summary
● Small & Medium MapReduce jobs can be run by anyone today!
● Contact us to get access to large MapReduce jobs.
http://mapreduce.appspot.com
Questions?