starfish: a self-tuning system for big data analytics · 2011-09-16 · analysis in the big data...
TRANSCRIPT
Herodotos Herodotou, Harold Lim,
Gang Luo, Nedyalko Borisov, Liang Dong,
Fatma Bilgen Cetin, Shivnath Babu
Duke University
Analysis in the Big Data Era
01/12/2011 Duke University 2
Automate decision processes
Increase cost savings and revenue
Massive Data
Data Analysis
Insight
Key to Success = Timely and Cost-Effective Analysis
Analysis in the Big Data Era
Popular option
Hadoop software stack
[Figure: Hadoop software stack. Clients: Oozie, Hive, Pig, Elastic MR, Java Client, …; Hadoop: MapReduce Execution Engine, Distributed File System]
Analysis in the Big Data Era
Popular option
Hadoop software stack
Burden on the users
Responsible for provisioning & configuration
Usually lack expertise to tune the system
Challenges
Tasks expressed in general-purpose programming languages
Input data stored as files and interpreted at run-time
Starfish: Self-Tuning System
NOT our goal: Improve Hadoop’s peak performance
Our goal: Provide good performance automatically
[Figure: software stack. Clients: Oozie, Hive, Pig, Elastic MR, Java Client, …; Starfish analytics system; Hadoop: MapReduce Execution Engine, Distributed File System]
Workload on a Starfish Cluster
MapReduce (MR) Job
Example MR job: Word Counts For Docs
Workflow
Physical: directed graph of MR job nodes
Logical: directed graph of SPJA & UDF nodes
Workload: Collection of workflows
[Figure: logical workflow. Load users (id, age, ip); Filter age < 21; Load clicks (id, url, count); Join; Count clicks per url type]
[Figure: physical workflow. MR Job 1: Word Frequency in Doc; MR Job 2: Word Counts For Docs; MR Job 3: Compute TF-IDF]
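The definition of a physical workflow as a directed graph of MR job nodes can be sketched as a small data structure. This is a minimal illustration in Python, not Starfish code: the job names come from the TF-IDF example on the slide, while the chain of dependencies (Job 1 feeds Job 2, which feeds Job 3) and the helper name execution_order are assumptions.

```python
from graphlib import TopologicalSorter

# Physical workflow: each MR job maps to the set of jobs whose output it consumes.
# Job names are from the slide; the chain of dependencies is an assumption.
workflow = {
    "MR Job 1: Word Frequency in Doc": set(),
    "MR Job 2: Word Counts For Docs": {"MR Job 1: Word Frequency in Doc"},
    "MR Job 3: Compute TF-IDF": {"MR Job 2: Word Counts For Docs"},
}

# A workload is a collection of workflows.
workload = [workflow]


def execution_order(wf):
    """Return the jobs in an order that respects producer-consumer edges."""
    return list(TopologicalSorter(wf).static_order())


order = execution_order(workflow)
```

Storing predecessors per node keeps the producer-consumer relationships explicit, which is the information a workflow-level tuner would need.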
Starfish Architecture
Job-level tuning: Just-in-Time Optimizer, Profiler, Sampler
Workflow-level tuning: Workflow-aware Scheduler
Workload-level tuning: Workload Optimizer, Elastisizer
Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.
What-if Engine
Job Configuration Parameters
Over 190 parameters
Many affect performance in complex ways
Impact depends on Job, Data, and Cluster properties
Current Approach
Rules of thumb
mapred.reduce.tasks = 0.9 * number_of_reduce_slots
io.sort.record.percent = 16 / (16 + average_record_size)
Rules of thumb may not suffice
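The two rules of thumb above are simple enough to write down directly. The sketch below is illustrative only: the function name, the input values, and the choice to round the reduce task count down are assumptions, and average_record_size is assumed to be in bytes.

```python
def rule_of_thumb_settings(number_of_reduce_slots, average_record_size):
    """Apply the two rules of thumb from the slide."""
    return {
        # 0.9 * number_of_reduce_slots, rounded down (rounding is an assumption).
        "mapred.reduce.tasks": (9 * number_of_reduce_slots) // 10,
        # 16 / (16 + average_record_size)
        "io.sort.record.percent": 16 / (16 + average_record_size),
    }


# Illustrative inputs, not taken from the talk.
settings = rule_of_thumb_settings(number_of_reduce_slots=40, average_record_size=84)
```

Both formulas depend only on one cluster property and one data property, which hints at why such rules can miss interactions with the job itself.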
Just-in-Time Job Optimization
Just-in-Time Optimizer
Searches through the high-dimensional space of
parameter settings
What-if Engine
Uses mix of simulation and model-based estimation
Sampler
Collects statistics about input, intermediate, and output
key-value spaces of MapReduce jobs
Profiler
Collects information about MR job executions
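The division of labor described above, where an optimizer searches parameter settings and asks a what-if engine to estimate each candidate, can be sketched as follows. Everything here is a hypothetical stand-in: the cost formula in what_if_estimate is invented for illustration and is not Starfish's model, the profile fields are assumed names, and the exhaustive grid is a simplification that would not scale to a high-dimensional space.

```python
import itertools


def what_if_estimate(profile, config):
    """Toy stand-in for a what-if engine: predicted running time in seconds.

    The formula is invented for illustration; it is not Starfish's model.
    """
    reducers = config["mapred.reduce.tasks"]
    record_pct = config["io.sort.record.percent"]
    # Reduce work is split across reducers, at a fixed per-reducer startup cost.
    reduce_time = profile["reduce_work_sec"] / reducers + 2.0 * reducers
    # Penalize buffer settings far from what the record size would suggest.
    ideal_pct = 16 / (16 + profile["avg_record_size"])
    spill_penalty = 100.0 * abs(record_pct - ideal_pct)
    return profile["map_time_sec"] + reduce_time + spill_penalty


def just_in_time_optimize(profile, search_space):
    """Score every combination of settings and return the cheapest one."""
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        cost = what_if_estimate(profile, config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


# Hypothetical job profile and candidate settings.
profile = {"map_time_sec": 120.0, "reduce_work_sec": 800.0, "avg_record_size": 84}
search_space = {
    "mapred.reduce.tasks": [5, 10, 20, 40],
    "io.sort.record.percent": [0.05, 0.16, 0.30],
}
best_config, best_cost = just_in_time_optimize(profile, search_space)
```

Keeping the estimator separate from the search loop mirrors the slide's split between the What-if Engine and the optimizer, so either side could be swapped out independently.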
Job Profiler
Dynamic instrumentation
Monitor specific components in a system
Collect run-time information
Benefits
Zero overhead when it is turned off
Works with unmodified MapReduce programs
Used to construct a job profile
Concise representation of the job execution
Allows for in-depth analysis of the job behavior
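The two benefits listed above can be illustrated with a small Python analogue of dynamic instrumentation. This is a sketch of the idea only, not the mechanism Starfish uses on Hadoop; the names instrument, run_job, and word_length are hypothetical. Monitoring is attached at run time around an unmodified function, and when profiling is off the wrapper is never created.

```python
import time


def instrument(func, profile, phase):
    """Wrap func so each call adds its call count and elapsed time to profile."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        stats = profile.setdefault(phase, {"calls": 0, "seconds": 0.0})
        stats["calls"] += 1
        stats["seconds"] += time.perf_counter() - start
        return result
    return wrapper


def run_job(map_func, records, profiling=False):
    """Run a map-only job; attach instrumentation only when profiling is on."""
    profile = {}
    # With profiling off, the unmodified function runs with no wrapper at all.
    mapper = instrument(map_func, profile, "map") if profiling else map_func
    output = [mapper(r) for r in records]
    return output, profile


def word_length(word):
    """An unmodified user program: it knows nothing about profiling."""
    return (word, len(word))


out_off, profile_off = run_job(word_length, ["big", "data"])
out_on, profile_on = run_job(word_length, ["big", "data"], profiling=True)
```

The resulting dictionary plays the role of a very small job profile: a concise summary of the execution that later analysis can read without rerunning the job.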
12/07/2010 Herodotos Herodotou 12
Job Workflows
Producer-Consumer relationships among jobs
Data Layout crucial for later jobs
Effective use of parallelism
Task scheduling
Major Problem
Unbalanced data layouts
Unbalanced Data Layouts
Issues with data-locality-aware schedulers
Performance degradation due to reduced parallelism
Further unbalanced layout due to job outputs
[Figure: Disk Usage (%) per Data Node (nodes 1 to 15) for four cases: Initial layout; Partition (replication count = 1); Map-only aggregation (data local); Map-only aggregation (non-data local)]
Workflow-Aware Scheduler
Goal: Optimize overall performance of workflow
Select best data layout + job parameters
Space of options
Block placement policy
Replication factor
Block size
Output compression
Approach
Simulate task scheduling and block placement policies
Perform cost-based search
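The approach above, simulating block placement and then searching by cost, can be sketched with a toy model. All of it is assumed for illustration: the two placement policies, the cost function, and the cluster size of 15 nodes are stand-ins, and the sketch covers only two of the four options listed (block placement policy and replication factor).

```python
import random


def simulate_layout(num_nodes, num_blocks, replication, policy, seed=0):
    """Return the number of blocks stored per node under a toy placement policy."""
    rng = random.Random(seed)
    usage = [0] * num_nodes
    for block in range(num_blocks):
        if policy == "round_robin":
            first = block % num_nodes
        else:  # "local_write": the first replica stays on the writer's node
            first = 0
        # Remaining replicas go to other nodes chosen at random.
        others = [n for n in range(num_nodes) if n != first]
        for node in [first] + rng.sample(others, replication - 1):
            usage[node] += 1
    return usage


def layout_cost(usage):
    """Toy cost: load on the fullest node relative to the average node."""
    return max(usage) / (sum(usage) / len(usage))


# Cost-based search over a small space of layout options.
options = [(p, r) for p in ("round_robin", "local_write") for r in (1, 2, 3)]
costs = {
    opt: layout_cost(simulate_layout(15, 300, replication=opt[1], policy=opt[0]))
    for opt in options
}
best = min(costs, key=costs.get)
```

This cost measures balance only, so it ignores fault tolerance and data locality; a realistic search would weigh those as well, which is why the slide pairs layout choices with job parameters.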
Optimizing Starfish Workloads
Data-flow sharing
Materialization
Reorganization
Jumbo Operator
Single MapReduce job to process multiple Select-Project-Aggregate operations over a table
Enables sharing of scans, computation, sorting,
shuffling, and output generation
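The scan-sharing part of this idea can be shown with an in-memory analogue: several select-project-aggregate queries are evaluated in one pass over the table instead of one pass each. This is a sketch under stated assumptions, not the Jumbo operator itself. The function name jumbo_scan, the two queries, and the sample rows are hypothetical; only the clicks (id, url, count) schema comes from the slides. Sharing of sorting and shuffling is not modeled.

```python
from collections import defaultdict

# Each query: (name, select predicate, group-by key, value to sum).
queries = [
    ("clicks_per_url", lambda r: True, lambda r: r["url"], lambda r: r["count"]),
    ("heavy_clicks_per_id", lambda r: r["count"] >= 5, lambda r: r["id"], lambda r: r["count"]),
]


def jumbo_scan(table, queries):
    """Evaluate all queries in a single shared scan of the table."""
    results = {name: defaultdict(int) for name, *_ in queries}
    for row in table:  # the table is read once, not once per query
        for name, select, key, value in queries:
            if select(row):
                results[name][key(row)] += value(row)
    return {name: dict(groups) for name, groups in results.items()}


# Made-up rows following the clicks (id, url, count) schema.
clicks = [
    {"id": 1, "url": "a.com", "count": 3},
    {"id": 2, "url": "b.com", "count": 7},
    {"id": 1, "url": "a.com", "count": 5},
]
results = jumbo_scan(clicks, queries)
```

Running the queries serially would read the table once per query, so the saving from the shared scan grows with the number of queries over the same input.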
[Figure: Running Time (sec), axis from 0 to 1000, for four strategies: Serial; Concurrent; Jumbo (Share Scans); Jumbo (Share Interm)]
Challenges and Opportunities
1. Study the interactions among optimizations across different levels
2. Construct a What-if Engine that can model these interactions
[Figure: What-if Engine and the interactions among three levels: Optimize logical workloads; Translate to MapReduce jobs; Schedule on Map & Reduce Slots]
Elastisizer – Hadoop Provisioning
Goal: Make provisioning decisions based on workload requirements (e.g., completion time, cost)
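A provisioning decision of this kind can be sketched as a filter-then-minimize step over estimated outcomes per cluster configuration. The function choose_cluster and its selection rule (cheapest feasible configuration, ties broken by running time) are assumptions, and the numbers are placeholders for illustration only; they are not the measurements from the talk.

```python
def choose_cluster(estimates, deadline_sec=None, budget_usd=None):
    """Pick the cheapest configuration that meets the deadline and budget.

    estimates maps (node_type, num_nodes) to (time_sec, cost_usd).
    Returns None when no configuration satisfies the requirements.
    """
    feasible = {
        cfg: (t, c)
        for cfg, (t, c) in estimates.items()
        if (deadline_sec is None or t <= deadline_sec)
        and (budget_usd is None or c <= budget_usd)
    }
    if not feasible:
        return None
    # Cheapest first; break ties by shorter running time.
    return min(feasible, key=lambda cfg: (feasible[cfg][1], feasible[cfg][0]))


# Placeholder estimates, invented for this example.
estimates = {
    ("m1.small", 2): (9000, 1.00),
    ("m1.small", 6): (4000, 1.20),
    ("m1.large", 4): (2500, 2.40),
    ("m1.xlarge", 6): (1200, 4.80),
}
choice = choose_cluster(estimates, deadline_sec=3000)
```

In practice the estimates would come from a predictor rather than a hand-written table, since running the workload on every candidate cluster would defeat the purpose.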
[Figure: Execution Time (sec) by Node Type on Amazon EC2 (m1.small, m1.large, m1.xlarge) for clusters of 2, 4, and 6 nodes]
[Figure: Cost ($) by Node Type on Amazon EC2 (m1.small, m1.large, m1.xlarge) for clusters of 2, 4, and 6 nodes]
Starfish: Self-Tuning System
Focus simultaneously on
Different workload granularities
Workload
Workflows
Jobs (procedural and declarative)
Across various decision points
Provisioning
Optimization
Scheduling
Data layout