Presenters: Abhishek Verma, Nicolas Zea
Map Reduce
- Clean abstraction
- Extremely rigid 2-stage group-by aggregation
- Code reuse and maintenance difficult
Google → MapReduce, Sawzall
Yahoo → Hadoop, Pig Latin
Microsoft → Dryad, DryadLINQ
Improving MapReduce in heterogeneous environment
[Figure: MapReduce data flow. Input records (k1 v1, k2 v2, k1 v3, k2 v4, k1 v5) are divided into splits and passed to map tasks. Each map output is sorted locally (local QSort), giving k1 v1, k1 v3, k2 v2 for one split and k1 v5, k2 v4 for the other. The shuffle then delivers all values for a key to the same reduce task (k1: v1, v3, v5; k2: v2, v4), which writes the output records.]
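The data flow in the figure can be sketched in a few lines of Python. This is a single-process illustration, not how Hadoop or Google MapReduce is implemented; `run_mapreduce`, `identity_map`, and `collect` are hypothetical names chosen for this example.

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn):
    """Run map, local sort, shuffle, and reduce over in-memory splits."""
    # Map phase: each split is processed independently, then sorted locally by key.
    map_outputs = []
    for split in splits:
        emitted = [pair for record in split for pair in map_fn(record)]
        map_outputs.append(sorted(emitted, key=lambda kv: kv[0]))

    # Shuffle: route every (key, value) pair to the bucket for its key.
    buckets = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            buckets[key].append(value)

    # Reduce phase: one reduce call per key, in key order.
    return [reduce_fn(key, values) for key, values in sorted(buckets.items())]

# The five input records from the figure, divided into two splits.
splits = [
    [("k1", "v1"), ("k2", "v2"), ("k1", "v3")],
    [("k2", "v4"), ("k1", "v5")],
]
identity_map = lambda record: [record]
collect = lambda key, values: (key, values)
result = run_mapreduce(splits, identity_map, collect)
```

With the identity map, `result` groups v1, v3, v5 under k1 and v2, v4 under k2, matching the reduce inputs in the figure.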
Extremely rigid data flow
- Other flows hacked in: stages, joins, splits
Common operations must be coded by hand
- Join, filter, projection, aggregates, sorting, distinct
Semantics hidden inside map-reduce functions
- Difficult to maintain, extend, and optimize
[Figure: a single M → R stage, and a longer flow chaining M → R stages.]
Pig Latin: A Not-So-Foreign Language for Data Processing
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
Yahoo! Research
Pigs Eat Anything
- Can operate on data without metadata: relational, nested, or unstructured
Pigs Live Anywhere
- Not tied to one particular parallel framework
Pigs Are Domestic Animals
- Designed to be easily controlled and modified by its users
- UDFs: transformation functions, aggregates, grouping functions, and conditionals
Pigs Fly
- Processes data quickly(?)
Dataflow language
- Procedural: different from SQL
Quick Start and Interoperability
Nested Data Model
UDFs as First-Class Citizens
Parallelism Required
Debugging Environment
Data Model
- Atom: 'cs'
- Tuple: ('cs', 'ece', 'ee')
- Bag: { ('cs', 'ece'), ('cs') }
- Map: [ 'courses' → { ('523', '525', '599') } ]
Expressions
- Fields by position: $0
- Fields by name: f1
- Map lookup: #
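For readers more used to general-purpose languages, the four data types map roughly onto Python built-ins. The correspondence below is only an analogy chosen for this note, not how Pig stores data, and the variable names are hypothetical.

```python
# Atom: a simple value such as a string or number.
atom = "cs"

# Tuple: an ordered sequence of fields.
tup = ("cs", "ece", "ee")

# Bag: a collection of tuples; duplicates are allowed, so a list is used here.
bag = [("cs", "ece"), ("cs",)]

# Map: keys mapped to data items (here, a bag holding one tuple).
data_map = {"courses": [("523", "525", "599")]}

# The expressions from the slide, in Python terms.
first_field = tup[0]            # field by position, like $0
courses = data_map["courses"]   # map lookup, like #'courses'
```

Note that tuples inside a bag need not have the same number of fields, which is what makes the model nested rather than strictly relational.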
Find the top 10 most visited pages in each category

Visits

| User | URL | Time |
| --- | --- | --- |
| Amy | cnn.com | 8:00 |
| Amy | bbc.com | 10:00 |
| Amy | flickr.com | 10:05 |
| Fred | cnn.com | 12:00 |

URL Info

| URL | Category | PageRank |
| --- | --- | --- |
| cnn.com | News | 0.9 |
| bbc.com | News | 0.8 |
| flickr.com | Photos | 0.7 |
| espn.com | Sports | 0.9 |
[Figure: data flow for the query]
Load Visits → Group by url → Foreach url generate count
Load Url Info
Join on url → Group by category → Foreach category generate top10 urls
visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);

urlInfo = load '/data/urlInfo' as (url, category, pRank);

visitCounts = join visitCounts by url, urlInfo by url;

gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts, 10);

store topUrls into '/data/topUrls';
Operates directly over files
Schemas are optional and can be assigned dynamically
UDFs can be used in every construct
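To make the semantics of the script concrete, the same steps can be traced in plain Python over the sample Visits and URL Info rows. This is a single-machine sketch for illustration only; `top_urls_by_category` is a hypothetical helper, and the ordering inside each category is an assumption about what the `top` UDF returns.

```python
from collections import Counter, defaultdict

visits = [
    ("Amy", "cnn.com", "8:00"),
    ("Amy", "bbc.com", "10:00"),
    ("Amy", "flickr.com", "10:05"),
    ("Fred", "cnn.com", "12:00"),
]
url_info = [
    ("cnn.com", "News", 0.9),
    ("bbc.com", "News", 0.8),
    ("flickr.com", "Photos", 0.7),
    ("espn.com", "Sports", 0.9),
]

def top_urls_by_category(visits, url_info, n=10):
    # group visits by url; foreach group generate url, count(visits)
    visit_counts = Counter(url for _user, url, _time in visits)

    # join visitCounts by url, urlInfo by url (inner join: unvisited urls drop out)
    joined = [
        (url, visit_counts[url], category)
        for url, category, _rank in url_info
        if url in visit_counts
    ]

    # group by category; foreach category generate top(visitCounts, n)
    by_category = defaultdict(list)
    for url, count, category in joined:
        by_category[category].append((url, count))
    return {
        category: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for category, rows in by_category.items()
    }

top_urls = top_urls_by_category(visits, url_info)
```

On the sample data, News contains cnn.com (2 visits) ahead of bbc.com (1 visit), Photos contains flickr.com, and Sports is absent because espn.com has no visits and the join is an inner join.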
LOAD: specifying input data
FOREACH: per-tuple processing
FLATTEN: eliminate nesting
FILTER: discarding unwanted data
COGROUP: getting related data together
- GROUP, JOIN
STORE: asking for output
Other: UNION, CROSS, ORDER, DISTINCT
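The relationship between COGROUP and JOIN is easier to see with a small example. The sketch below treats an equi-join as a COGROUP followed by flattening the cross product within each group; the function names and sample rows are hypothetical, and Python lists stand in for bags.

```python
from collections import defaultdict

def cogroup(left, right, left_key, right_key):
    """Group tuples from two inputs by key: {key: (left_bag, right_bag)}."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return dict(groups)

def join(left, right, left_key, right_key):
    """Equi-join expressed as COGROUP plus a flattened per-group cross product."""
    return [
        l_row + r_row
        for l_bag, r_bag in cogroup(left, right, left_key, right_key).values()
        for l_row in l_bag
        for r_row in r_bag
    ]

visit_counts = [("cnn.com", 2), ("bbc.com", 1)]
url_info = [("cnn.com", "News", 0.9), ("espn.com", "Sports", 0.9)]
by_url = lambda row: row[0]

grouped = cogroup(visit_counts, url_info, by_url, by_url)
joined = join(visit_counts, url_info, by_url, by_url)
```

COGROUP keeps keys that appear on only one side (with an empty bag for the other input), while the join drops them because the cross product with an empty bag is empty.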
Every group or join operation forms a map-reduce boundary
Other operations are pipelined into the map and reduce phases
[Figure: the same data flow (Load Visits, Group by url, Foreach url generate count, Load Url Info, Join on url, Group by category, Foreach category generate top10 urls) divided into three map-reduce jobs: Map1/Reduce1, Map2/Reduce2, Map3/Reduce3.]
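The boundary rule can be illustrated with a toy planner. This sketch simplifies the plan to a linear list of operators (the second load feeding the join is left out) and is not Pig's actual compiler; `split_into_jobs` and `BOUNDARY_OPS` are hypothetical names.

```python
BOUNDARY_OPS = {"GROUP", "COGROUP", "JOIN"}

def split_into_jobs(operators):
    """Assign each operator of a linear plan to a (job number, phase) slot."""
    placement = []
    job, phase = 1, "map"
    for op in operators:
        kind = op.split()[0].upper()
        if kind in BOUNDARY_OPS:
            if phase == "reduce":
                # The current job's reduce phase is already in use, so start a new job.
                job += 1
            phase = "reduce"
        # Non-boundary operators are pipelined into whichever phase is open.
        placement.append((op, job, phase))
    return placement

plan = [
    "load visits",
    "group by url",
    "foreach url generate count",
    "join on url",
    "group by category",
    "foreach category generate top10 urls",
]
placement = split_into_jobs(plan)
job_count = max(job for _op, job, _phase in placement)
```

For the example query this yields three jobs, one per group or join, with each foreach pipelined into the reduce phase of the job that precedes it.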
Write-run-debug cycle
Sandbox dataset
Objectives:
- Realism
- Conciseness
- Completeness
Problems:
- UDFs
Optional "safe" query optimizer
- Performs only high-confidence rewrites
User interface
- Boxes-and-arrows UI
- Promote collaboration, sharing code fragments and UDFs
Tight integration with a scripting language
- Use loops, conditionals of host language
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
[Figure: Dryad system architecture. The job manager sends the job schedule over the control plane, using the name server (NS) and the process daemons (PD) on the cluster machines. Vertices (V) exchange data over the data plane through files, TCP, FIFOs, or the network.]
Collection<T> collection;
bool IsLegal(Key);
string Hash(Key);

var results = from c in collection
              where IsLegal(c.key)
              select new { Hash(c.key), c.value };
[Figure: a collection of C# objects divided into partitions.]
Partitioning: Hash, Range, RoundRobin
Apply, Fork
Hints
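The three partitioning schemes named above differ only in how a record is mapped to a partition. The sketch below shows one plausible version of each in Python; it is not the DryadLINQ API, and the function names, the CRC32 hash, and the boundary convention are assumptions made for illustration.

```python
import zlib
from bisect import bisect_right

def hash_partition(records, key, n):
    """Send each record to the partition chosen by a stable hash of its key."""
    parts = [[] for _ in range(n)]
    for record in records:
        parts[zlib.crc32(str(key(record)).encode()) % n].append(record)
    return parts

def range_partition(records, key, boundaries):
    """Partition i holds keys below boundaries[i]; the last one holds the rest."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        parts[bisect_right(boundaries, key(record))].append(record)
    return parts

def round_robin_partition(records, n):
    """Deal records out in turn, ignoring their contents."""
    parts = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts

records = [5, 12, 7, 30, 18, 1]
identity = lambda r: r
by_range = range_partition(records, identity, boundaries=[10, 20])
by_turn = round_robin_partition(records, 3)
by_hash = hash_partition(records, identity, 3)
```

Hash partitioning co-locates equal keys, which suits grouping and joins; range partitioning keeps partitions ordered relative to each other, which suits sorting; round-robin only balances sizes.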
[Figure: the query over collection is compiled into a query plan (a Dryad job). Each vertex of the plan runs generated C# vertex code over part of the data, and together the vertices produce results.]
[Figure: DryadLINQ execution overview. On the client machine, a C# query expression is invoked (ToDryadTable) and DryadLINQ generates a distributed query plan. In the data center, the job manager (JM) runs the Dryad execution, reading the input tables and writing the output tables. The results return to the client as an output DryadTable, which the program iterates over (foreach) as C# objects.]
LINQ expressions converted to execution plan graph (EPG)
similar to database query plan
DAG
annotated with metadata properties
EPG is skeleton of Dryad DFG
as long as native operations are used, properties can propagate helping optimization
Pipelining
Multiple operations in a single process
Removing redundancy
Eager Aggregation
Move aggregations in front of partitionings
I/O Reduction
Try to use TCP and in-memory FIFO instead of disk space
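Eager aggregation is the easiest of these optimizations to show in miniature. The sketch below counts words either after or before the partitioning step and reports how many records would have to be shipped in each case; it is an illustration in plain Python, not DryadLINQ's optimizer, and the function names are hypothetical.

```python
from collections import Counter

def shuffle_then_count(partitions):
    """Baseline: ship every record across the partitioning step, then count."""
    shipped = [word for part in partitions for word in part]
    return Counter(shipped), len(shipped)

def count_then_shuffle(partitions):
    """Eager aggregation: count inside each partition, ship only partial counts."""
    partials = [Counter(part) for part in partitions]
    shipped = sum(len(partial) for partial in partials)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, shipped

partitions = [["a", "b", "a", "a"], ["b", "b", "c", "b"]]
baseline, baseline_shipped = shuffle_then_count(partitions)
eager, eager_shipped = count_then_shuffle(partitions)
```

Both strategies produce the same totals, but on this input the eager version ships 4 partial counts instead of 8 raw records. The rewrite is only valid when the aggregate can be combined from partial results, as a count or sum can.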
As information from the job becomes available, mutate the execution graph
- Dataset-size-based decisions
  - Intelligent partitioning of data
Aggregation can turn into a tree to improve I/O based on locality
- Example: part of the computation is done locally, then aggregated before being sent across the network
TeraSort: scalability
240-computer cluster of 2.6 GHz dual-core AMD Opterons
Sort 10 billion 100-byte records on a 10-byte key
Each computer stores 3.87 GB
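The per-computer figure follows from the other numbers on the slide, as a quick calculation shows. The only assumption here is that the data is spread evenly across the 240 computers and that GB is read as 2^30 bytes.

```python
RECORDS = 10 * 10**9   # 10 billion records
RECORD_BYTES = 100
COMPUTERS = 240

total_bytes = RECORDS * RECORD_BYTES            # 10**12 bytes, about 1 TB
bytes_per_computer = total_bytes / COMPUTERS    # about 4.17e9 bytes
gib_per_computer = bytes_per_computer / 2**30   # binary gigabytes
```

This comes to a little under 3.9 binary gigabytes per computer, in line with the 3.87 GB quoted above.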
DryadLINQ vs. Dryad: SkyServer
Dryad version is hand optimized
- No dynamic optimization overhead
DryadLINQ version is about 10% the size of the native Dryad code
High level and data type transparent
Automatic optimization friendly
Manual optimizations using Apply operator
Leverage any system running LINQ framework
Support for interacting with SQL databases
Single computer debugging made easy
Strong typing, narrow interface
Deterministic replay execution
Dynamic optimizations appear data intensive
- What kind of overhead?
EPG analysis overhead -> high latency
No real comparison with other systems
Progress tracking is difficult
- No speculation
Will solid-state drives diminish the advantages of MapReduce?
Why not use parallel databases?
MapReduce vs. Dryad
How different from Sawzall and Pig?
| Language | Sawzall | Pig Latin | DryadLINQ |
| --- | --- | --- | --- |
| Built by | Google | Yahoo | Microsoft |
| Programming | Imperative | Imperative | Imperative & Declarative Hybrid |
| Resemblance to SQL | Least | Moderate | Most |
| Execution Engine | Google MapReduce | Hadoop | Dryad |
| Performance * | Very efficient | 5-10 times slower | 1.3-2 times slower |
| Implementation | Internal, inside Google | Open source, Apache License | Internal, inside Microsoft |
| Model | Operate per record | Sequence of MR | DAGs |
| Usage | Log analysis | + Machine learning | + Iterative computations |
Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica
University of California at Berkeley
Speculative tasks are executed only if no failed or waiting tasks are available
Notion of progress
- 3 phases of execution
  1. Copy phase
  2. Sort phase
  3. Reduce phase
- Each phase weighted by % of data processed
- Determines whether a task has failed or is a straggler and available for speculation
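The progress notion above can be written down directly for a reduce task. In this sketch each of the three phases contributes one third of the score; the straggler test with a fixed gap of 0.2 below the average is an assumption consistent with the later remark that tasks above 80% are never speculated, and the function names are hypothetical.

```python
REDUCE_PHASES = ("copy", "sort", "reduce")

def reduce_progress_score(phase, fraction_done):
    """Progress score in [0, 1]: each of the three phases contributes one third."""
    completed_phases = REDUCE_PHASES.index(phase)
    return (completed_phases + fraction_done) / len(REDUCE_PHASES)

def is_straggler(score, average_score, gap=0.2):
    """Flag a task whose score trails the average by more than a fixed gap."""
    return score < average_score - gap

halfway_through_copy = reduce_progress_score("copy", 0.5)   # 1/6
done_copying = reduce_progress_score("sort", 0.0)           # 1/3
```

Because the score depends only on the phase and the fraction of data processed, it says nothing about how long each phase actually takes, which is the weakness the following slides examine.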
1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately same time
5. Tasks with a low progress score are stragglers
6. Maps and Reduces require roughly the same amount of work
Virtualization breaks down homogeneity
Amazon EC2: multiple VMs on the same physical host
- Compete for memory/network bandwidth
- Ex: two map tasks can compete for disk bandwidth, causing one to be a straggler
Progress threshold in Hadoop is fixed and assumes low progress = faulty node
Too many speculative tasks executed
Speculative execution can harm running tasks
A task's phases are not equal
- Copy phase is typically the most expensive, due to network communication cost
- Many tasks jump rapidly from 1/3 progress to 1, creating fake stragglers
- Real stragglers get usurped
- Unnecessary copying due to fake stragglers
Fixed progress threshold means any task with a progress score above 80% is never speculatively executed
LATE: Longest Approximate Time to End
Primary assumption: the best task to speculate on is the one that will finish furthest into the future
Secondary assumption: tasks make progress at an approximately constant rate
Progress Rate = ProgressScore / T
- T = time the task has run for
Estimated time to completion = (1 - ProgressScore) / Progress Rate
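A short worked example shows how the estimate ranks tasks. The estimate is the remaining progress divided by the progress rate; the task names and numbers below are made up for illustration, and a task with a score of zero would need special handling that this sketch omits.

```python
def progress_rate(progress_score, elapsed_seconds):
    """Progress made per second so far."""
    return progress_score / elapsed_seconds

def time_left(progress_score, elapsed_seconds):
    """Estimated seconds to completion, assuming the rate stays constant."""
    return (1.0 - progress_score) / progress_rate(progress_score, elapsed_seconds)

# (task name, progress score, seconds running); hypothetical values.
tasks = [("t1", 0.9, 90), ("t2", 0.2, 100), ("t3", 0.5, 20)]
ranked = sorted(tasks, key=lambda t: time_left(t[1], t[2]), reverse=True)
```

Here t2 has about 400 seconds left, t3 about 20, and t1 about 10, so t2 is ranked first. Note that t3 has a lower progress score than t1 but is not much further from finishing, because it is progressing faster; ranking by progress score alone would miss that.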
Launch speculative tasks on fast nodes
- Best chance to overcome the straggler, vs. using the first available node
Cap on total number of speculative tasks
'Slowness' minimum threshold
Does not take into account data locality
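Put together, these rules give a small decision procedure that runs when a node asks for work. The sketch below is one reading of the slide, not the scheduler's actual code: the parameter names, the dictionary layout of a task, and the use of raw progress rates as thresholds are all assumptions.

```python
def time_left(task):
    """Estimated time to completion at the task's current progress rate."""
    rate = task["score"] / task["elapsed"]
    return (1.0 - task["score"]) / rate

def pick_speculative_task(tasks, node_rate, running_speculative,
                          speculative_cap, slow_node_threshold,
                          slow_task_threshold):
    """Return the task to duplicate on the requesting node, or None."""
    # Cap on the total number of speculative tasks running at once.
    if running_speculative >= speculative_cap:
        return None
    # Launch only on fast nodes: ignore requests from nodes below the threshold.
    if node_rate < slow_node_threshold:
        return None
    # 'Slowness' minimum: only tasks progressing slower than the threshold qualify.
    candidates = [t for t in tasks
                  if t["score"] / t["elapsed"] < slow_task_threshold]
    # Among those, pick the longest approximate time to end.
    return max(candidates, key=time_left, default=None)

# Hypothetical running tasks: progress score and seconds elapsed.
tasks = [
    {"name": "t1", "score": 0.9, "elapsed": 90},
    {"name": "t2", "score": 0.2, "elapsed": 100},
    {"name": "t3", "score": 0.5, "elapsed": 20},
    {"name": "t4", "score": 0.3, "elapsed": 100},
]
choice = pick_speculative_task(tasks, node_rate=1.0, running_speculative=0,
                               speculative_cap=2, slow_node_threshold=0.5,
                               slow_task_threshold=0.005)
```

With these numbers only t2 and t4 are slow enough to qualify, and t2 is chosen because it has the longer estimated time left. Data locality plays no part in the choice, as the slide notes.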
[Figure: Sort results]
EC2 test cluster: 1.0-1.2 GHz Opteron/Xeon with 1.7 GB memory
[Figure: Sort results]
Manually slowed down 8 VMs with background processes
[Figure: Grep and WordCount results]
1. Make decisions early
2. Use finishing times
3. Nodes are not equal
4. Resources are precious
Is focusing work on small VMs fair?
- Would it be better to pay for a large VM and implement a system with more customized control?
Could this be used in other systems?
- Progress tracking is key
Is this a fundamental contribution, or just an optimization?
- "Good" research?