Presented by Nirupam Roy
Starfish: A Self-tuning System for Big Data Analytics
Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu
Department of Computer Science, Duke University
The Growth of Data
MAD: Features of Ideal Analytics System
Magnetism -- accept all data
Agility -- adapt with data, real-time processing
Depth -- allow complex analysis
Hadoop is MAD
Magnetism -- accept all data
- Blindly loads data into HDFS.
Agility -- adapt with data, real-time processing
- Fine-grained scheduler
- End-to-end data pipeline
- Dynamic node addition/dropping
Depth -- allow complex analysis
- Well integrated with programming languages
Tuning for Good Performance: Challenges
- Multiple dimensions of performance
-- time, cost, scalability …
- Tons of parameters
-- more than 190 parameters in Hadoop.
- Multiple levels of abstraction
-- job-level, workflow-level, workload-level …
Tuning for Good Performance: Challenges
Rule of thumb
Starfish: A Self-tuning System
- Builds on Hadoop
- Tunes to ‘good’ performance automatically
Starfish Architecture
The “What-if” Engine
Model + simulation based prediction algorithm.
Input: profile of a job (P) + new parameter set (S)
- Learning from previous job profiles
- Analytical models to estimate dataflow
- Simulating the execution of MR workload
Output: predicted performance
[Ref:] A What-if Engine for Cost-based MapReduce Optimization. H. Herodotou et al.
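The input/output relationship on this slide can be illustrated with a toy sketch: a job profile P and a new parameter set S go in, and a predicted running time comes out. Every field name and the cost formula below are hypothetical simplifications chosen for illustration; they are not the models used by Starfish, which are described in the referenced paper.

```python
import math
from dataclasses import dataclass


@dataclass
class JobProfile:
    """P: statistics from a previous run (all fields are hypothetical)."""
    input_mb: float          # total input size
    map_ms_per_mb: float     # measured map cost per MB of input
    map_selectivity: float   # map output MB per MB of input
    reduce_ms_per_mb: float  # measured reduce cost per MB of shuffled data


@dataclass
class ParamSet:
    """S: the hypothetical configuration being asked about."""
    split_mb: float
    num_reducers: int
    map_slots: int
    reduce_slots: int


def predict_runtime_ms(p: JobProfile, s: ParamSet) -> float:
    """Toy analytical model: tasks run in waves limited by available slots."""
    num_maps = math.ceil(p.input_mb / s.split_mb)
    map_waves = math.ceil(num_maps / s.map_slots)
    map_task_ms = s.split_mb * p.map_ms_per_mb

    shuffle_mb = p.input_mb * p.map_selectivity
    reduce_waves = math.ceil(s.num_reducers / s.reduce_slots)
    reduce_task_ms = shuffle_mb / s.num_reducers * p.reduce_ms_per_mb

    return map_waves * map_task_ms + reduce_waves * reduce_task_ms


profile = JobProfile(input_mb=1024, map_ms_per_mb=10,
                     map_selectivity=0.5, reduce_ms_per_mb=20)
baseline = ParamSet(split_mb=64, num_reducers=4, map_slots=8, reduce_slots=4)
what_if = ParamSet(split_mb=64, num_reducers=8, map_slots=8, reduce_slots=8)
```

Calling `predict_runtime_ms(profile, baseline)` and `predict_runtime_ms(profile, what_if)` answers a "what if the job ran with S instead?" question without running the job again.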
The “What-if” Engine
Figure panels: ground truth; estimate by the What-if engine
Starfish Architecture: Job Level
Just-in-time optimizer
-- Searches the parameter space
Profiler
-- Collects information on MapReduce job execution through dynamic instrumentation
-- Reports timings, data size, and resource utilization
Sampler
-- Generates profile statistics from training benchmark jobs
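To make "searches the parameter space" concrete, here is a minimal sketch that scores every combination in a small grid and keeps the cheapest. The exhaustive strategy, the parameter names, and the `toy_cost` function are assumptions for illustration only; the slides do not say which search algorithm the optimizer uses, and in a real system the cost would come from a what-if prediction rather than a hand-written formula.

```python
import itertools


def exhaustive_search(space, cost_fn):
    """Evaluate every configuration in `space` and return the cheapest one."""
    names = list(space)
    best_cfg, best_cost = None, float("inf")
    for combo in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, combo))
        cost = cost_fn(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def toy_cost(cfg):
    # Hypothetical stand-in for a predicted running time; lower is better.
    return abs(cfg["num_reducers"] - 16) + abs(cfg["sort_buffer_mb"] - 200) / 50


# Hypothetical parameter names and candidate values.
space = {
    "num_reducers": [1, 2, 4, 8, 16, 32],
    "sort_buffer_mb": [50, 100, 200, 400],
}

best_cfg, best_cost = exhaustive_search(space, toy_cost)
```

An exhaustive grid is only practical for a handful of parameters; with many more, a sampling-based search would be needed.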
Starfish Architecture: Workflow Level
Scheduler for balanced distribution of data
Block placement policy for data collocation
-- deals with skewed data, addition/dropping of nodes, and the tradeoff between balanced data and data locality
-- Local-write vs. round-robin
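The local-write versus round-robin tradeoff can be sketched with a toy placement function. This is a simplified illustration that tracks a single replica per block and uses hypothetical node names; it is not HDFS or Starfish code.

```python
from collections import Counter


def place_blocks(num_blocks, nodes, writer, policy):
    """Return the node chosen for each block under the given policy."""
    if policy == "local-write":
        # Every block stays on the node that wrote it.
        return [writer] * num_blocks
    if policy == "round-robin":
        # Blocks are spread evenly across all nodes.
        return [nodes[i % len(nodes)] for i in range(num_blocks)]
    raise ValueError(f"unknown policy: {policy}")


def summarize(placement, writer):
    """Report balance (max blocks on one node) and locality for the writer."""
    load = Counter(placement)
    return {
        "max_blocks_on_one_node": max(load.values()),
        "local_fraction": load[writer] / len(placement),
    }


nodes = ["n0", "n1", "n2", "n3"]  # hypothetical cluster
local = summarize(place_blocks(8, nodes, "n0", "local-write"), "n0")
spread = summarize(place_blocks(8, nodes, "n0", "round-robin"), "n0")
```

In this toy setup local-write gives full locality for the writer but puts all eight blocks on one node, while round-robin balances the load at two blocks per node and leaves only a quarter of the data local.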
Starfish Architecture: Workflow Level
Figure labels: Producer, Consumer, Wasted production
Starfish Architecture: Workflow Level
File-level parallelism
Block-level parallelism
Starfish Architecture: Workflow Level
Workflow-aware optimizer
-- Selects best data layout and job parameters
What-if simulation
• MR job execution
• Task scheduling
• Block placement
Compare costs & benefits
-- Running time?
-- Data layout?
Starfish Architecture: Workload Level
Workload Optimizer
• Jumbo operator
• Cost-based estimation for best optimization
Elastisizer
• Determine best cluster and Hadoop configurations
Starfish: Summary
- Optimizes at different granularities
-- Workload, workflow, job (procedural & declarative)
- Considers different decision points
-- Provisioning, optimization, scheduling, data layout
Starfish: Piazza Discussion
Top criticisms (till 1:30pm, 17 reviews):
1) Limited evaluation: 10
2) Not explained well: 7
3) Profiler overhead/better search algo: 5
* What is the effect of wrong prediction?
* What-if engine requires prior knowledge.
http://www.cs.duke.edu/starfish/
Thank you.
Photo courtesy: Starfish group, Duke University
Going MAD with Big Data
Magnetic system
Agile system and Analytics
Deep Analytics
Data Life Cycle Awareness
Elasticity
Robustness
Backup: What-if Engine 1
Backup: What-if Engine 2