starfish: a self-tuning system for big data analytics
DESCRIPTION
Starfish: A Self-tuning System for Big Data Analytics. Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu. Duke University. Analysis in the Big Data Era. Data Analysis. Massive Data. Insight. Key to Success = Timely and Cost-Effective Analysis. Hadoop MapReduce Ecosystem. - PowerPoint PPT PresentationTRANSCRIPT
Starfish: A Self-tuning System for Big Data Analytics
Herodotos Herodotou,Harold Lim, Fei Dong, Shivnath Babu
Duke University
2
Analysis in the Big Data Era
9/26/2011
Massive Data
DataAnalysi
s
Insight
Key to Success = Timely and Cost-Effective Analysis
Starfish
3
Hadoop MapReduce EcosystemPopular solution to Big Data Analytics
9/26/2011
MapReduce Execution Engine
Distributed File System
Hadoop
Java / C++ / R / Python OozieHivePig Elastic
MapReduceJaql
HBase
Starfish
4
Practitioners of Big Data AnalyticsWho are the users?
Data analysts, statisticians, computational scientists…Researchers, developers, testers…You!
Who performs setup and tuning?The users!Usually lack expertise to tune the system
9/26/2011 Starfish
5
Tuning ChallengesHeavy use of programming languages for
MapReduce programs (e.g., Java/python)
Data loaded/accessed as opaque files
Large space of tuning choices
Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
Terabyte-scale data cycles
9/26/2011 Starfish
6
Our goal: Provide good performance automatically
Starfish: Self-tuning System
9/26/2011
MapReduce Execution Engine
Distributed File System
Hadoop
Java / C++ / R / Python OozieHivePig Elastic
MapReduceJaql
HBase
StarfishAnalytics System
Starfish
7
What are the Tuning Problems?
9/26/2011
Job-level MapReduce
configuration
Workload management
Datalayout tuning
Cluster sizing
Workflow optimization
J1 J2
J3
J4
Starfish
8
Starfish’s Core Approach to Tuning
9/26/2011
1) if Δ(conf. parameters) then what …? 2) if Δ(data properties) then what …? 3) if Δ(cluster properties) then what …?
ProfilerCollects concisesummaries of
execution
What-if EngineEstimates impact of hypothetical
changes on execution
OptimizersSearch through space of tuning choices
Job
WorkflowWorkload
Data layout
Cluster
Starfish
Starfish Architecture
9/26/2011 9
Profiler What-if EngineWorkflow Optimizer
Workload Optimizer Elastisizer
Job Optimizer
Data ManagerMetadata
Mgr.Intermediate
Data Mgr.Data Layout & Storage Mgr.
Starfish
10
MapReduce Job Execution
9/26/2011
split 0 map out 0reduce
Two Map Waves One Reduce Wave
split 2 map
split 1 map split 3 map Out 1reduce
job j = < program p, data d, resources r, configuration c >
Starfish
11
What Controls MR Job Execution?
Space of configuration choices:Number of map tasksNumber of reduce tasksPartitioning of map outputs to reduce tasksMemory allocation to task-level buffersMultiphase external sorting in the tasksWhether output data from tasks should be compressedWhether combine function should be used
9/26/2011
job j = < program p, data d, resources r, configuration c >
Starfish
12
Effect of Configuration Settings
Use defaults or set manually (rules-of-thumb)Rules-of-thumb may not suffice
9/26/2011
Two-dimensional projection of a multi-dimensional surface(Word Co-occurrence MapReduce Program)
Rules-of-thumb settings
Starfish
13
MapReduce Job Tuning in a NutshellGoal:
Challenges: p is an arbitrary MapReduce program; c is high-dimensional; …
9/26/2011
),,,(minarg crdpFcSc
opt
),,,( crdpFperf
Profiler
What-if Engine
Optimizer
Runs p to collect a job profile (concise execution summary) of <p,d1,r1,c1> Given profile of <p,d1,r1,c1>, estimates virtual profile for <p,d2,r2,c2>Enumerates and searches through the optimization space S efficiently
Starfish
14
Job ProfileConcise representation of program execution as a jobRecords information at the level of “task phases”Generated by Profiler through measurement or by the
What-if Engine through estimation
9/26/2011
Memory Buffer
Merge
Sort,[Combine],[Compress]
Serialize,Partitionmap
Merge
split
DFS
SpillCollectMapReadStarfish
15
Job Profile FieldsDataflow: amount of data flowing through task phasesMap output bytes
Number of spills
Number of records in buffer per spill
9/26/2011
Costs: execution times at the level of task phasesRead phase time in the map task
Map phase time in the map task
Spill phase time in the map task
Dataflow Statistics: statistical information about dataflowWidth of input key-value pairs
Map selectivity in terms of records
Map output compression ratio
Cost Statistics: statistical information about resource costsI/O cost for reading from local disk per byte
CPU cost for executing the Mapper per record
CPU cost for uncompressing the input per byte
Starfish
16
Generating Profiles by MeasurementGoals
Have zero overhead when profiling is turned offRequire no modifications to HadoopSupport unmodified MapReduce programs written in
Java or Hadoop Streaming/Pipes (Python/Ruby/C++)
Approach: Dynamic (on-demand) instrumentationEvent-condition-action rules are specified (in Java)Leads to run-time instrumentation of Hadoop internalsMonitors task phases of MapReduce job executionWe currently use Btrace (Hadoop internals are in Java)
9/26/2011 Starfish
17
Generating Profiles by Measurement
9/26/2011
split 0 map out 0reduce
split 1 map
raw data
raw data
raw data
map profile
reduce profile
job profile
Use of Sampling• Profile fewer tasks• Execute fewer tasks
JVM = Java Virtual Machine, ECA = Event-Condition-Action
JVM JVM
JVM
Enable Profiling
ECA rules
Starfish
18
What-if Engine
Job Oracle
Virtual Job Profile for <p, d2, r2, c2>
What-if Engine
9/26/2011
Task Scheduler Simulator
JobProfile
<p, d1, r1, c1>
Properties of Hypothetical job
Input DataProperties
<d2>
ClusterResources
<r2>
ConfigurationSettings
<c2>
Possibly Hypothetical
Starfish
19
Virtual Profile Estimation
9/26/2011
Given profile for job j = <p, d1, r1, c1> estimate profile for job j' = <p, d2, r2, c2>
(Virtual) Profile for j'DataflowStatistics
Dataflow
CostStatistics
Costs
Profile for jInput
Data d2
Confi-guration
c2
Resourcesr2
Costs
White-box Models
CostStatisticsRelative
Black-boxModels
Dataflow
White-box Models
DataflowStatistics
CardinalityModels
Starfish
20
Job Optimizer
9/26/2011
Best Configuration Settings <copt> for <p, d2, r2>
Subspace Enumeration
Recursive Random Search
Just-in-Time Optimizer
JobProfile
<p, d1, r1, c1>
Input DataProperties
<d2>
ClusterResources
<r2>
What-ifcalls
Starfish
21
Workflow Optimization Space
9/26/2011
Job-level Configuration
Dataset-level Configuration
Physical
Optimization Space
Logical
Join Selection
Partition Function Selection
Vertical Packing
Inter-job Inter-job
Starfish
22
Optimizations on TF-IDF Workflow
9/26/2011
LogicalOptimization
M1R1
…D0 <{D},{W}>
J1
D1
D2
D4
M2R2
…<{D, W},{f}>
J2
…<{D},{W, f, c}>
J3, J4
…<{W},{D, t}>
Partition:{D}Sort: {D,W}
M1R1M2R2
…D0 <{D},{W}>
J1, J2
D2
D4
M3R3M4
…<{D},{W, f, c}>
J3, J4
…<{W},{D, t}>
PhysicalOptimization
Reducers= 50Compress = offMemory = 400…
…
…
…
Reducers= 20Compress = onMemory = 300…
LegendD = docname f = frequencyW = word c = countt = TF-IDF
M3R3M4
Starfish
23
New ChallengesWhat-if challenges:
Support concurrent job execution
Estimate intermediate data properties
Optimization challengesInteractions across jobsExtended optimization spaceFind good configuration
settings for individual jobs
9/26/2011
J1 J2
J3
J4
Workflow
Starfish
24
Cluster Sizing ProblemUse-cases for cluster sizing
Tuning the cluster size for elastic workloadsWorkload transitioning from development cluster to
production clusterMulti-objective cluster provisioning
GoalDetermine cluster resources & job-level configuration
parameters to meet workload requirements
9/26/2011 Starfish
25
Multi-objective Cluster Provisioning
9/26/2011
m1.small m1.large m1.xlarge c1.medium c1.xlarge0
200400600800
1,0001,200
Run
ning
Tim
e (m
in)
m1.small m1.large m1.xlarge c1.medium c1.xlarge0.002.004.006.008.00
10.00
EC2 Instance Type
Cos
t ($)
Cloud enables users to provision clusters in minutes
Starfish
Experimental Evaluation
9/26/2011 26
Starfish (versions 0.1, 0.2) to manage Hadoop on EC2Different scenarios: Cluster × Workload × Data
EC2 Node Type
CPU: EC2 units
Mem I/O Perf. Cost /hour
#Maps /node
#Reds/node
MaxMem /task
m1.small 1 (1 x 1) 1.7 GB moderate $0.085 2 1 300 MB
m1.large 4 (2 x 2) 7.5 GB high $0.34 3 2 1024 MB
m1.xlarge 8 (4 x 2) 15 GB high $0.68 4 4 1536 MB
c1.medium 5 (2 x 2.5) 1.7 GB moderate $0.17 2 2 300 MB
c1.xlarge 20 (8 x 2.5) 7 GB high $0.68 8 6 400 MB
cc1.4xlarge 33.5 (8) 23 GB very high $1.60 8 6 1536 MB
Starfish
Experimental Evaluation
9/26/2011 27
Starfish (versions 0.1, 0.2) to manage Hadoop on EC2Different scenarios: Cluster × Workload × Data
Abbr. MapReduce Program Domain Dataset
CO Word Co-occurrence Natural Lang Proc. Wikipedia (10GB – 22GB)
WC WordCount Text Analytics Wikipedia (30GB – 1TB)
TS TeraSort Business Analytics TeraGen (30GB – 1TB)
LG LinkGraph Graph Processing Wikipedia (compressed ~6x)
JO Join Business Analytics TPC-H (30GB – 1TB)
TF Term Freq. - Inverse Document Freq.
Information Retrieval Wikipedia (30GB – 1TB)
Starfish
28
Job Optimizer Evaluation
9/26/2011
Hadoop cluster: 30 nodes, m1.xlargeData sizes: 60-180 GB
TS WC LG JO TF CO0
10
20
30
40
50
60
Default Set-tingsRule-based OptimizerCost-based Optimizer
MapReduce Programs
Spee
dup
over
Def
ault
Starfish
29
Estimates from the What-if Engine
9/26/2011
Hadoop cluster: 16 nodes, c1.mediumMapReduce Program: Word Co-occurrenceData set: 10 GB Wikipedia
True surface Estimated surface
Starfish
30
Profiling Overhead Vs. Benefit
9/26/2011
1 5 10 20 40 60 80 1000
5
10
15
20
25
30
35
Percent of Tasks Profiled
Perc
ent O
verh
ead
over
Job
R
unni
ng T
ime
with
Pro
filin
g T
urne
d O
ff
1 5 10 20 40 60 80 1000.0
0.5
1.0
1.5
2.0
2.5
Percent of Tasks Profiled
Spee
dup
over
Job
run
w
ith R
BO
Set
tings
Hadoop cluster: 16 nodes, c1.mediumMapReduce Program: Word Co-occurrenceData set: 10 GB Wikipedia
Starfish
31
Multi-objective Cluster Provisioning
9/26/2011
m1.small m1.large m1.xlarge c1.medium c1.xlarge0
200400600800
1,0001,200
ActualPredicted
Run
ning
Tim
e (m
in)
m1.small m1.large m1.xlarge c1.medium c1.xlarge0.002.004.006.008.00
10.00
ActualPredicted
EC2 Instance Type for Target Cluster
Cos
t ($)
Instance Type for Source Cluster: m1.large
Starfish
32
More info: www.cs.duke.edu/starfish
9/26/2011
Job-level MapReduce
configuration
Workflow optimization
Workload management
Datalayout tuning
Cluster sizing
J1 J2
J3
J4
Starfish