meeting service level objectives of pig programs zhuoyao zhang, ludmila cherkasova, abhishek verma,...
![Page 1: Meeting Service Level Objectives of Pig Programs Zhuoyao Zhang, Ludmila Cherkasova, Abhishek Verma, Boon Thau Loo University of Pennsylvania Hewlett-Packard](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649cfa5503460f949cbd64/html5/thumbnails/1.jpg)
Meeting Service Level Objectives of Pig Programs
Zhuoyao Zhang, Ludmila Cherkasova,
Abhishek Verma, Boon Thau Loo
University of Pennsylvania, Hewlett-Packard Labs
Cloud Environment
• Advantages
  ▫ Large amount of resources
  ▫ Elasticity
  ▫ Pay-as-you-go pricing model
• Challenges
  ▫ Distributed resources
  ▫ Error-prone
MapReduce and Pig
• MapReduce: a simple, fault-tolerant framework for data processing in the cloud
• Pig
  ▫ High-level platform built on MapReduce
  ▫ Widely used: Yahoo!, Twitter, LinkedIn
  ▫ Pig Latin: a high-level declarative language for expressing data analysis tasks as Pig programs
[Figure: a Pig program represented as a workflow of MapReduce jobs j1 to j7]
Motivation
• Latency-sensitive applications
  ▫ Personalized advertising
  ▫ Spam and fraud detection
  ▫ Real-time log analysis
• How many resources does an application need to meet its deadline?
Contributions
• Performance modeling for Pig programs
  ▫ Given a Pig program, estimate its completion time as a function of the assigned resources
• Deadline-driven resource allocation estimates for Pig programs
  ▫ Given a completion time target, determine the amount of resources a Pig program needs to achieve it
Outline
• Introduction
• Building block
  ▫ Performance model for single MapReduce jobs
• Resource allocation for Pig programs
• Evaluation
• Conclusion and ongoing work
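Theoretical Makespan Bounds
• Bounds-based makespan estimates
  ▫ n tasks, k servers, each task assigned to the next available server
  ▫ avg: average duration of the n tasks
  ▫ max: maximum duration of the n tasks
• Lower bound: $T^{low} = \frac{n \cdot avg}{k}$
• Upper bound: $T^{up} = \frac{(n-1) \cdot avg}{k} + max$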
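A minimal Python sketch of the two bounds, assuming the task durations are known in advance; the function name `makespan_bounds` and the example durations are hypothetical, not taken from the talk.

```python
def makespan_bounds(durations: list[float], k: int) -> tuple[float, float]:
    """Return (T_low, T_up) for n tasks greedily assigned to k servers."""
    n = len(durations)
    if n == 0 or k <= 0:
        raise ValueError("need at least one task and one server")
    avg = sum(durations) / n
    t_low = n * avg / k                        # total work spread perfectly evenly
    t_up = (n - 1) * avg / k + max(durations)  # worst case: longest task starts last
    return t_low, t_up


# Hypothetical example: six tasks on three servers.
low, up = makespan_bounds([2.0, 4.0, 6.0, 2.0, 4.0, 6.0], k=3)
print(f"lower = {low:.2f}, upper = {up:.2f}")  # lower = 8.00, upper = 12.67
```

Both bounds need only avg, max, n and k, which is why they can be computed from summary statistics instead of a full schedule.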
Illustration
• Schedule 1: 1 4 3 2 3 1 2
  ▫ Makespan = 4; lower bound = 4
• Schedule 2: 3 1 2 3 2 1 4
  ▫ Makespan = 7; upper bound = 8
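The makespan of Schedule 2 can be reproduced with greedy list scheduling, where each task in turn goes to the slot that becomes free first. This sketch assumes k = 4 slots (consistent with a lower bound of 4 for durations summing to 16) and reads the Schedule 2 sequence as task durations in assignment order; both readings are assumptions about the original figure, and `greedy_makespan` is a hypothetical name.

```python
import heapq


def greedy_makespan(durations: list[float], k: int) -> float:
    """Assign tasks, in order, to whichever of k slots becomes free first."""
    free_at = [0.0] * k  # time each slot becomes free (already a valid min-heap)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    return max(free_at)


schedule_2 = [3, 1, 2, 3, 2, 1, 4]
print(greedy_makespan(schedule_2, k=4))  # 7.0
```

The long task arriving last is what pushes the makespan toward the upper bound; the same tasks packed differently reach the lower bound.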
Estimate Completion Time for Single MR Job
• Estimate bounds on the job completion time from the job profile
  ▫ Most production jobs are executed routinely on new data sets
  ▫ Job profile built from previous runs:
    Map stage: M_avg, M_max, AvgInputSize, Selectivity
    Reduce stage: Sh_avg, Sh_max, R_avg, R_max, Selectivity
  ▫ Predict the completion time of future runs from the profile
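One way to hold such a profile is a small record type. This is a sketch only: the field names, units and the `num_map_tasks` helper (which assumes the number of map tasks scales with input size divided by AvgInputSize) are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Statistics from a previous run of one MapReduce job (hypothetical field names)."""
    # Map stage
    m_avg: float               # average map task duration, seconds
    m_max: float               # maximum map task duration, seconds
    avg_input_size: float      # average input size per map task, MB
    map_selectivity: float     # map output size / map input size
    # Reduce stage
    sh_avg: float              # average shuffle duration, seconds
    sh_max: float              # maximum shuffle duration, seconds
    r_avg: float               # average reduce task duration, seconds
    r_max: float               # maximum reduce task duration, seconds
    reduce_selectivity: float  # reduce output size / reduce input size

    def num_map_tasks(self, input_size: float) -> int:
        """Hypothetical helper: map tasks expected for a new input of input_size MB."""
        return math.ceil(input_size / self.avg_input_size)


profile = JobProfile(m_avg=12.0, m_max=20.0, avg_input_size=64.0, map_selectivity=0.5,
                     sh_avg=8.0, sh_max=15.0, r_avg=30.0, r_max=45.0,
                     reduce_selectivity=0.2)
print(profile.num_map_tasks(7680.0))  # 120
```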
Estimate Completion Time for Single MR Job
• Estimate bounds on the duration of the map and reduce stages
• Map stage duration depends on:
  ▫ N_M: the number of map tasks
  ▫ S_M: the number of map slots
• Reduce stage duration depends on:
  ▫ N_R: the number of reduce tasks
  ▫ S_R: the number of reduce slots
• Map stage bounds (the makespan bounds with n = N_M, k = S_M):
  $T_M^{low} = \frac{N_M \cdot M_{avg}}{S_M}$
  $T_M^{up} = \frac{(N_M - 1) \cdot M_{avg}}{S_M} + M_{max}$
• Job duration estimates $T_J^{low}$, $T_J^{up}$, $T_J^{avg}$
  ▫ Sum of the map and reduce stage durations
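A sketch of the per-job bounds as a sum of stage bounds. Two assumptions are made here and are not stated on the slide: each reduce task's duration (r_avg, r_max) already includes its shuffle time, and $T_J^{avg}$ is taken as the midpoint of the two bounds. The function names are hypothetical.

```python
def stage_bounds(n_tasks: int, n_slots: int, avg: float, mx: float) -> tuple[float, float]:
    """Lower/upper bounds for one stage of n_tasks tasks on n_slots slots."""
    low = n_tasks * avg / n_slots
    up = (n_tasks - 1) * avg / n_slots + mx
    return low, up


def job_bounds(n_m, s_m, m_avg, m_max, n_r, s_r, r_avg, r_max):
    """Return (T_low, T_up, T_avg) for one job; r_avg/r_max assumed to include shuffle."""
    map_low, map_up = stage_bounds(n_m, s_m, m_avg, m_max)
    red_low, red_up = stage_bounds(n_r, s_r, r_avg, r_max)
    t_low = map_low + red_low
    t_up = map_up + red_up
    t_avg = (t_low + t_up) / 2  # assumption: midpoint of the two bounds
    return t_low, t_up, t_avg


# Hypothetical job: 100 map tasks on 20 map slots, 10 reduce tasks on 5 reduce slots.
print(job_bounds(100, 20, 10.0, 14.0, 10, 5, 30.0, 40.0))  # (110.0, 157.5, 133.75)
```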
Resource Allocation for Single MR Job
• Given a deadline D and the job profile, find the minimal resources needed to complete the job within D:
  $\frac{A_J}{S_M^J} + \frac{B_J}{S_R^J} + C_J = D$
  ▫ A_J, B_J, C_J: computed from the given number of map/reduce tasks and the statistics from the job profile
• Find the values of $S_M^J$ and $S_R^J$ that minimize $S_M^J + S_R^J$, using Lagrange multipliers
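Writing the job's completion time as A/S_M + B/S_R + C, the Lagrange-multiplier problem has a closed-form solution, derived in the docstring below. This sketch rounds both slot counts up to integers, which is an added assumption: it keeps the predicted time within the deadline but need not give the best integer pair. `min_slots_for_deadline` is a hypothetical name.

```python
import math


def min_slots_for_deadline(a: float, b: float, c: float, deadline: float) -> tuple[int, int]:
    """Slot counts with a/S_M + b/S_R + c <= deadline, from the continuous optimum.

    Lagrange multipliers on  min S_M + S_R  s.t.  a/S_M + b/S_R = E  (E = deadline - c)
    give S_M = sqrt(lam * a), S_R = sqrt(lam * b), hence
        S_M = (a + sqrt(a * b)) / E,   S_R = (b + sqrt(a * b)) / E.
    """
    budget = deadline - c
    if budget <= 0:
        raise ValueError("deadline must exceed the constant term c")
    root = math.sqrt(a * b)
    s_m = (a + root) / budget
    s_r = (b + root) / budget
    # Slots are integers; rounding up can only shorten the predicted time.
    return math.ceil(s_m), math.ceil(s_r)


print(min_slots_for_deadline(a=100.0, b=25.0, c=5.0, deadline=20.0))  # (10, 5)
```

With these numbers the continuous optimum is exactly 10 map and 5 reduce slots, and 100/10 + 25/5 + 5 = 20 meets the deadline exactly.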
Performance Model for Pig Programs
• Let P = {J_1, J_2, ..., J_N}; extract the job profile of each job contained in P
  ▫ Assign a unique name to each job within a program
• The program completion time is the sum of the completion times of all the jobs contained in P (the jobs are assumed to execute one after another):
  $T_P = \sum_{i=1}^{N} T_{J_i}$
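A sketch of the program-level estimate, keyed by the unique job names. It assumes, as a hypothetical per-job model, that each job's time has the form A/S_M + B/S_R + C for the given slot counts; the names `job_time` and `program_time` are illustrative.

```python
def job_time(a: float, b: float, c: float, s_m: int, s_r: int) -> float:
    """Hypothetical per-job model: T = a/S_M + b/S_R + c."""
    return a / s_m + b / s_r + c


def program_time(jobs: dict[str, tuple[float, float, float]], s_m: int, s_r: int) -> float:
    """T_P: sum of completion times of the program's jobs, run one after another."""
    return sum(job_time(a, b, c, s_m, s_r) for a, b, c in jobs.values())


# Hypothetical profiles keyed by the unique job name inside the program.
jobs = {"j1": (100.0, 25.0, 5.0), "j2": (50.0, 10.0, 2.0)}
print(program_time(jobs, s_m=10, s_r=5))  # 29.0
```

The plain sum is only valid while jobs do not overlap; concurrent jobs would need a different combination rule.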
Resource Allocation for Pig Programs
• Possible strategy: find an appropriate pair of map and reduce slots for each job in the program, splitting the deadline D into per-job deadlines d_i:
  $\frac{A_i}{S_M^i} + \frac{B_i}{S_R^i} + C_i = d_i$ for $i = 1, \dots, N$, with $\sum_{i=1}^{N} d_i = D$
• Problem: such per-job allocations are difficult for the scheduler to implement and manage
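Resource Allocation for Pig Programs
• A simpler and more elegant solution
  ▫ Allocate the same set of resources to the entire program instead of to each job
• Rewrite the per-job equations as a single equation for the program:
  $T_P = \frac{\sum_{i=1}^{N} A_i}{S_M^P} + \frac{\sum_{i=1}^{N} B_i}{S_R^P} + \sum_{i=1}^{N} C_i = D$
• Find the minimum set of map and reduce slots $(S_M^P, S_R^P)$ for the entire Pig program; the equation has the same form as for a single job, so the same Lagrange-multiplier solution applies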
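Because the program-level equation has the single-job form with A = ΣA_i, B = ΣB_i and C = ΣC_i, the same closed form yields one slot pair for the whole program. A minimal sketch, assuming each job is described by its (A_i, B_i, C_i) triple and rounding up to integer slots; `program_slots` is a hypothetical name.

```python
import math


def program_slots(jobs: list[tuple[float, float, float]], deadline: float) -> tuple[int, int]:
    """One (S_M^P, S_R^P) pair for the whole program, given per-job (A_i, B_i, C_i)."""
    a = sum(job[0] for job in jobs)  # sum of A_i
    b = sum(job[1] for job in jobs)  # sum of B_i
    c = sum(job[2] for job in jobs)  # sum of C_i
    budget = deadline - c
    if budget <= 0:
        raise ValueError("deadline must exceed the sum of the C_i terms")
    root = math.sqrt(a * b)
    # Same Lagrange-multiplier closed form as for a single job, rounded up.
    return math.ceil((a + root) / budget), math.ceil((b + root) / budget)


print(program_slots([(60.0, 16.0, 3.0), (40.0, 9.0, 2.0)], deadline=20.0))  # (10, 5)
```

The scheduler then only has to reserve a single pair of slot counts for the program's lifetime, rather than a different pair per job.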
Experiment Setup
• 66-node cluster in 2 racks
  ▫ 4 AMD 2.39GHz cores
  ▫ 8 GB RAM
  ▫ Two 160GB hard disks
• Configuration
  ▫ 1 jobtracker, 1 namenode, 64 worker nodes
  ▫ 2 map slots and 1 reduce slot per worker node
Benchmark
• PigMix benchmark
  ▫ 17 programs
  ▫ 8 tables as the input data
• Dataset
  ▫ Test dataset: generated with the PigMix data generator; total size around 1TB
  ▫ Experimental dataset: same layout as the test dataset, 20% larger in size
Model Accuracy
• How well does our performance model capture Pig program completion time?
Figure: Normalized predicted and measured completion times
Meeting Deadlines
• Are we meeting deadlines with our resource allocation model?
Figure: PigMix executed on the experimental dataset: do we meet deadlines?
Conclusion
• Conclusion
  ▫ The performance model can accurately estimate the completion time of MapReduce workflows
  ▫ It enables automatic resource provisioning for MapReduce workflows with deadlines
• Ongoing work
  ▫ Refine the performance model for workflows with concurrent jobs
  ▫ Incorporate failure scenarios into the current model
Thank you