Scheduling Jobs With Dependencies: New Applications, Classic Problems
Janardhan Kulkarni, Microsoft Research, Redmond.
31 July 2018, TTI Chicago. TTIC Summer Workshop: Data Center Scheduling from Theory to Practice
Roadmap
• Which theory models are closer to data-center settings?
Srikanth Kandula, Ratul Mahajan, Amar Phanishayee, Monia Ghobadi
• Focus on algorithms: even complex algorithms can have algorithmic intuitions that are useful in practice.
• One example: one system heuristic and one complex provable algorithm (using LP hierarchies) that has good heuristic value.
Luleå FB Data Center, south of the Arctic Circle
It is beautiful like this for 3 days…
Cold, cold place…
5%, “as large as cities”
Efficiency Matters a Lot: Emphasis on Principled Algorithms
[Figure: cost vs. time for simple heuristics and for theoretically sound algorithms]
Simplicity is not everything!
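How we Measure Efficiency
• Makespan: minimize the maximum completion time among a set of jobs, i.e., the length of the schedule.
• Average (or total) flow time (a.k.a. job completion time):
  • same as response time;
  • measures the time a job spends in the system.
$F_j = C_j - r_j$, where $C_j$ is the completion time and $r_j$ the release (arrival) time of job $j$.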
Other measures: throughput, energy, fairness, utilization, etc.
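As a small illustration of the two main objectives, the sketch below computes the makespan and the flow times $F_j = C_j - r_j$ of a finished schedule. The `Job` record and the three-job schedule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    release: int     # r_j: arrival (release) time
    completion: int  # C_j: completion time in the schedule

def makespan(jobs):
    """Length of the schedule: the largest completion time."""
    return max(job.completion for job in jobs)

def flow_times(jobs):
    """Flow time F_j = C_j - r_j: the time job j spends in the system."""
    return {job.name: job.completion - job.release for job in jobs}

def average_flow_time(jobs):
    f = flow_times(jobs)
    return sum(f.values()) / len(f)

# Hypothetical schedule of three jobs.
jobs = [Job("a", 0, 4), Job("b", 1, 3), Job("c", 2, 8)]
print(makespan(jobs))           # 8
print(flow_times(jobs))         # {'a': 4, 'b': 2, 'c': 6}
print(average_flow_time(jobs))  # 4.0
```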
Challenges of Data Center Scheduling
• Resources: heterogeneous (FPGA+CPU, GPU+CPU); multidimensional (CPU, memory, network).
• Jobs: complex dependencies: DAGs, co-flows, etc.
• Algorithms: fast, simple, often online.
There is a rich theory with many nice algorithms when jobs have simple structures.
Scheduling on Heterogeneous Clusters
Why are clusters heterogeneous?
• Special-purpose hardware
• Data locality
• Geographic location
• Privacy concerns
Scheduling on Heterogeneous Clusters
[Figure: one job with running times 1000, 100 and 300 on three different clusters]
Modeling Heterogeneity
Jobs run faster on some clusters and slower on others. Jobs arrive over time.
[Figure: a jobs × machines matrix of processing times]
Heterogeneous == “Unrelated Machines Scheduling”
Assign (match) jobs to clusters and schedule them to optimize QoS (e.g., minimize makespan or flow time).
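In the unrelated-machines model, each job $j$ has its own processing time $p_{ij}$ on each machine $i$, with no relation across machines. The sketch below is only a naive greedy baseline (assign each job, in order, to the machine where it would finish earliest) over a hypothetical matrix `p`; it is not one of the provable algorithms cited below.

```python
def greedy_unrelated(p):
    """p[j][i]: processing time of job j on machine i (unrelated machines).
    Naive greedy: assign each job, in order, to the machine on which it would
    finish earliest given the load assigned so far.
    Returns (machine index per job, makespan)."""
    m = len(p[0])
    load = [0] * m
    assignment = []
    for times in p:
        best = min(range(m), key=lambda i: load[i] + times[i])
        load[best] += times[best]
        assignment.append(best)
    return assignment, max(load)

# Hypothetical 3 jobs x 3 machines matrix of processing times.
p = [[1000, 100, 300],
     [50, 400, 60],
     [70, 80, 500]]
print(greedy_unrelated(p))  # ([1, 0, 0], 120)
```

On this matrix the greedy ends at 120, while the assignment [1, 2, 0] reaches 100, a small reminder of why simple heuristics come without the guarantees of the algorithms on the next slide.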
Beautiful Algorithms for Unrelated Machines Scheduling Problems
Objectives: makespan, flow time, energy. Results include LST’87, CGK’09, AGK’12, ST’89, KLS’10, Svensson’12, BK’15, IKMP’14, AAFPW’97, P’07, KD’18, A’06.
Settings: offline, online, multidimensional, clairvoyant, non-clairvoyant, stochastic, truthfulness…
This has led to the development of very nice ideas: the use of vertex solutions and duality in the design of algorithms, configuration LPs, potential functions, connections to game-theoretic ideas…
RESEARCH DIRECTION: few machine types. Can we get better algorithms for some classic unrelated machines scheduling problems?
The plan
One heuristic: GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. OSDI 2016. Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni.
Very general, works well in practice, as bad as any other algorithm on paper :)
One complex theoretical framework: Levey and Rothvoss ’16; Garg, Kulkarni, Li ’18.
Very specific, provable, and quite complex.
One of the biggest hammers in approximation algorithms: “Lift and Project”.
A Directed Acyclic Graph (DAG) Scheduling Problem in Large Clusters
GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. OSDI 2016.
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni.
DAG Model Supported in Hadoop
• Multidimensionality
• Heterogeneity of clusters
Cluster Scheduling
Resources of a cluster: D types of resources, each with capacity 1, i.e., the capacity vector (1, 1, …, 1).
A single job is represented as a DAG of tasks. Each task has
• a demand vector, e.g., (1, 0, …, 1/2), (1/2, 1/2, …, 1/2) or (1/4, 1, …, 1/10), and
• a processing length (duration).
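Cluster Scheduling: Minimize Makespan
A single job represented as a DAG.
[Figure: a DAG whose tasks are labeled (demand vector), duration: (1,0),2; (0,1),1; (1,1),1; (1,1),1; (0,1),1; a cluster with capacity (1,1); two schedules drawn on a time axis from 1 to 7]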
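The model can be made concrete with a small feasibility checker: a schedule is valid if every task starts after all of its parents finish and, in every time slot, the demands of the running tasks fit within capacity 1 in each of the D dimensions. This is a minimal sketch assuming integer durations and start times; the `Task` class and the three-task DAG are hypothetical (the edges of the slide's example are not recoverable).

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    demand: tuple   # fraction of each of the D resources used (capacity 1 each)
    duration: int   # processing length in time slots
    parents: list = field(default_factory=list)

def is_feasible(tasks, start, eps=1e-9):
    """tasks: name -> Task; start: name -> integer start slot."""
    # Precedence: a task starts only after every parent has finished.
    for name, task in tasks.items():
        for parent in task.parents:
            if start[name] < start[parent] + tasks[parent].duration:
                return False
    # Capacity: in every slot, the running tasks fit in each dimension.
    horizon = max(start[n] + t.duration for n, t in tasks.items())
    dims = len(next(iter(tasks.values())).demand)
    for slot in range(horizon):
        used = [0.0] * dims
        for name, task in tasks.items():
            if start[name] <= slot < start[name] + task.duration:
                for d in range(dims):
                    used[d] += task.demand[d]
        if any(u > 1 + eps for u in used):
            return False
    return True

# Hypothetical DAG with D = 2 resources: a and b are parents of c.
tasks = {
    "a": Task((1, 0), 2),
    "b": Task((0, 1), 1),
    "c": Task((1, 1), 1, ["a", "b"]),
}
print(is_feasible(tasks, {"a": 0, "b": 0, "c": 2}))  # True
print(is_feasible(tasks, {"a": 0, "b": 0, "c": 1}))  # False: c starts before a ends
```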
Is There a Good Algorithm?
Theorem (Bansal and Khot ’09): It is unlikely (UGC-hard) that a polynomial-time algorithm can achieve better than a D-approximation for the DAG scheduling problem. This holds even if all tasks of the DAG 1) have the same length and 2) require exactly one resource.
• Any non-idling algorithm is equally good or equally bad!
Not a useful intuition for system designers.
[Figure: a greedy schedule on a time axis from 1 to 7]
Optimal algorithm: do a greedy schedule respecting precedence constraints. At every step at least one resource is used, so the congestion for that resource decreases.
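The greedy rule can be sketched as a list scheduler over discrete time slots: at every slot, scan the tasks in a fixed priority order and start each ready task (all parents finished) whose demand vector still fits in the remaining capacity. A minimal sketch assuming integer durations of at least 1, capacity 1 per resource, and parents that name tasks of the same DAG; the `Task` class, the `priority` argument and the example are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    demand: tuple   # fraction of each resource (capacity 1 each), every entry <= 1
    duration: int   # processing length in slots, assumed >= 1
    parents: list = field(default_factory=list)

def greedy_schedule(tasks, priority=None, eps=1e-9):
    """Greedy list scheduling respecting precedence and multi-resource capacity.
    Returns name -> start slot."""
    if any(x > 1 for t in tasks.values() for x in t.demand):
        raise ValueError("a task demands more than the cluster capacity")
    order = priority or list(tasks)
    dims = len(next(iter(tasks.values())).demand)
    start, finish, running = {}, {}, []
    t = 0
    while len(start) < len(tasks):
        running = [n for n in running if finish[n] > t]  # drop completed tasks
        used = [sum(tasks[n].demand[d] for n in running) for d in range(dims)]
        for name in order:
            task = tasks[name]
            if name in start:
                continue
            ready = all(p in finish and finish[p] <= t for p in task.parents)
            fits = all(used[d] + task.demand[d] <= 1 + eps for d in range(dims))
            if ready and fits:
                start[name], finish[name] = t, t + task.duration
                running.append(name)
                used = [used[d] + task.demand[d] for d in range(dims)]
        t += 1
    return start

tasks = {
    "a": Task((1, 0), 2),
    "b": Task((0, 1), 1),
    "c": Task((1, 1), 1, ["a", "b"]),
    "d": Task((1, 0), 1),
}
print(greedy_schedule(tasks))  # {'a': 0, 'b': 0, 'c': 2, 'd': 3}
```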
When did System Designers Care for Lower Bounds?
GRAPHENE (OSDI 2016; Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni):
• Could find almost optimal solutions on MS datasets.
• Improves makespan by at least 30% compared to simple greedy heuristics.
Intuition of Graphene
“pathologically bad schedules in today’s approaches mostly arise due to two reasons: (a) long-running tasks have no other work to overlap with them, which reduces parallelism, and (b) the tasks that are runnable do not pack well with each other, which increases resource fragmentation.”
What do greedy algorithms (list scheduling, critical path, etc.) miss?
Intuition of Graphene
Main steps
• Our approach is to identify the potentially troublesome tasks, such as those that run for a very long time or are hard to pack.
• Place the troublesome tasks first onto a virtual resource-time space. This space would have d+1 dimensions when tasks require d resources; the last dimension being time.
• Our intuition is that placing the troublesome tasks first leads to a good schedule since the remaining tasks can be placed into resultant holes in this space.
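To see why the placement order matters, the sketch below packs tasks into a discretized resource-time space (one row per time slot, one column per resource, capacity 1) by putting each task at the earliest start where it fits for its whole duration, and compares an arbitrary order with a troublesome-first order. Assumptions, stated loudly: dependencies are ignored, and the troublesomeness score (longest duration first, then largest single demand) is a hypothetical stand-in, not Graphene's actual rule.

```python
def place(tasks, order, dims):
    """Place tasks in the given order into a resource-time grid (capacity 1 per
    resource per slot), each at the earliest start where it fits for its whole
    duration. tasks: name -> (demand tuple, duration). Returns name -> start."""
    usage = []  # usage[t][d]: amount of resource d used in slot t
    start = {}
    for name in order:
        demand, dur = tasks[name]
        s = 0
        while True:
            while len(usage) < s + dur:  # grow the virtual space as needed
                usage.append([0.0] * dims)
            if all(usage[t][d] + demand[d] <= 1 + 1e-9
                   for t in range(s, s + dur) for d in range(dims)):
                break
            s += 1
        for t in range(s, s + dur):
            for d in range(dims):
                usage[t][d] += demand[d]
        start[name] = s
    return start

def makespan(tasks, start):
    return max(start[n] + tasks[n][1] for n in tasks)

def troublesome_first(tasks):
    """Hypothetical score: longest tasks first, then largest single demand."""
    return sorted(tasks, key=lambda n: (tasks[n][1], max(tasks[n][0])), reverse=True)

# Hypothetical independent tasks over D = 2 resources: (demand, duration).
tasks = {
    "f": ((0.6, 0.2), 1),
    "s1": ((0.5, 0.2), 1), "s2": ((0.5, 0.2), 1),
    "s3": ((0.5, 0.2), 1), "s4": ((0.5, 0.2), 1),
    "long": ((0.5, 0.3), 4),
}
print(makespan(tasks, place(tasks, list(tasks), 2)))               # 7
print(makespan(tasks, place(tasks, troublesome_first(tasks), 2)))  # 5
```

In the arbitrary order the short tasks fragment the first slots and the long task is pushed to the end; placed first, it leaves holes that the short tasks fill.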
Can we formalize this intuition?
A (1 + ε)-Approximation for Makespan Scheduling with Precedence Constraints using LP Hierarchies. Levey and Rothvoss ’16
Identical Machines Scheduling (a special case of DAG scheduling)
• A single DAG. Each task needs to be scheduled on exactly one machine, and each task needs 1 unit of CPU.
• m identical machines (or CPUs).
• Minimize makespan.
[Figure: an example DAG containing a chain of length 4]
Identical Machines Scheduling
Theorem (Graham ’66): Greedy, or list scheduling, is a 2-approximation for minimizing makespan.
Call a time slot good if all m machines are busy and bad otherwise. In a bad slot every available task is running, so the longest remaining chain shrinks by one; and there are at most n/m good slots. Hence
$\text{Makespan} = \#\text{bad slots} + \#\text{good slots} \le (\text{length of longest chain}) + n/m \le \mathrm{OPT} + \mathrm{OPT}.$
• Optimal theoretically, but conveys very little information in practice.
• Does not work well in practice when there is more than one resource type.
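The bound can be checked on a small instance. The sketch below runs list scheduling of unit tasks on m identical machines, counts bad slots, computes the longest chain, and confirms Makespan ≤ n/m + (longest chain). The task numbering, the edge list and the instance (a chain of length 4 plus three independent tasks on m = 2 machines) are hypothetical, and the input is assumed to be a DAG.

```python
from collections import deque

def list_schedule(n, edges, m):
    """List scheduling of unit tasks 0..n-1 on m identical machines.
    edges: (i, j) pairs meaning i must finish before j starts (a DAG).
    Returns (makespan, number of bad slots), where a bad slot has an idle machine."""
    preds = {j: set() for j in range(n)}
    for i, j in edges:
        preds[j].add(i)
    done, t, bad = set(), 0, 0
    while len(done) < n:
        # Ready: not yet run and all predecessors finished in earlier slots.
        ready = [j for j in range(n) if j not in done and preds[j] <= done]
        batch = ready[:m]
        if len(batch) < m:
            bad += 1
        done |= set(batch)
        t += 1
    return t, bad

def longest_chain(n, edges):
    """Number of tasks on the longest path, via a topological sweep."""
    succ = {i: [] for i in range(n)}
    indeg = [0] * n
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    depth = [1] * n
    queue = deque(i for i in range(n) if indeg[i] == 0)
    while queue:
        i = queue.popleft()
        for j in succ[i]:
            depth[j] = max(depth[j], depth[i] + 1)
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return max(depth)

# Hypothetical instance: chain 0 -> 1 -> 2 -> 3 plus independent tasks 4, 5, 6.
n, m = 7, 2
edges = [(0, 1), (1, 2), (2, 3)]
mk, bad = list_schedule(n, edges, m)
print(mk, bad, longest_chain(n, edges))  # 4 1 4
assert mk <= n / m + longest_chain(n, edges)
```

Here the single bad slot is the last one, when only the tail of the chain remains; this schedule is also optimal, since OPT ≥ max(⌈n/m⌉, longest chain) = 4.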
Identical Machines Scheduling
Theorem (Levey and Rothvoss ’16): There is a quasi-polynomial time (1 + ε)-approximation for minimizing makespan when jobs have unit lengths and the number of machines is a constant.
Garg ’17 made it strictly quasi-polynomial time.
Theorem (Kulkarni, Li ’18): There is a quasi-polynomial time (1 + ε)-approximation for minimizing makespan when jobs have arbitrary lengths and the number of machines is a constant. The algorithm schedules each job on a single machine and may preempt jobs within a machine.
Theorem (Garg, Kulkarni, Li ’18): There is a polynomial time (2 + ε)-approximation, which is optimal, for minimizing the weighted completion time of jobs when machines and job sizes are uniform.
Crucial Observation
If the longest chain has length at most ε·OPT, the same good/bad slot argument gives
$\text{Makespan} \le (\text{length of longest chain}) + n/m \le \varepsilon \cdot \mathrm{OPT} + \mathrm{OPT} = (1 + \varepsilon)\cdot\mathrm{OPT}.$
The tasks on long chains are the troublesome tasks. How do we schedule the troublesome tasks?
Framework
[Figure: the time interval split at T0, T1, T2, T3; top tasks drawn above, bottom tasks below in each sub-interval]
Partition the tasks into sets of bottom tasks and a single set of top tasks. For each set of bottom tasks, we find a sub-interval where they should be scheduled. Then recursively schedule the bottom tasks.
• Precedence constraints across bottom tasks are automatically satisfied.
• Precedence constraints going from bottom to top tasks are loose: based on the tentative assignment of the bottom tasks, every top task $j$ gets a window $[r_j, d_j]$ (e.g., $[T_2, T_3]$) in which it should be scheduled.
• There is enough space to schedule the top tasks if there are no precedence constraints between top tasks.
• EDF will schedule all top tasks in the empty space, but may violate the precedence constraints between top tasks.
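The EDF step can be sketched for unit-length top tasks: sweep the time slots, make each task available at its release $r_j$, and fill the machines left free by the bottom tasks with the available tasks of earliest deadline $d_j$. As on the slide, precedence among top tasks is ignored. The names `windows` and `free` and the example numbers are hypothetical; slots are integers and windows are inclusive.

```python
import heapq

def edf(windows, free):
    """windows: task -> (r, d), an inclusive window of integer slots.
    free[t]: number of machines left free in slot t by the bottom tasks.
    Unit-length tasks; precedence among top tasks is ignored.
    Returns (task -> slot, list of tasks that could not be placed in their window)."""
    by_release = sorted(windows, key=lambda j: windows[j][0])
    heap, slot_of, late, k = [], {}, [], 0
    for t, cap in enumerate(free):
        # Release every task whose window has opened.
        while k < len(by_release) and windows[by_release[k]][0] <= t:
            j = by_release[k]
            heapq.heappush(heap, (windows[j][1], j))
            k += 1
        # Tasks whose deadline has already passed can no longer be placed.
        while heap and heap[0][0] < t:
            late.append(heapq.heappop(heap)[1])
        # Fill the free machines with the earliest-deadline tasks.
        for _ in range(cap):
            if not heap:
                break
            _, j = heapq.heappop(heap)
            slot_of[j] = t
    late += [j for _, j in heap] + by_release[k:]
    return slot_of, late

windows = {"a": (0, 1), "b": (0, 3), "c": (2, 2), "d": (1, 3)}
print(edf(windows, free=[1, 0, 2, 1]))
# ({'a': 0, 'c': 2, 'b': 2, 'd': 3}, [])
```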
Framework
[Figure: the time interval split at T0, T1, T2, T3, with bottom tasks in each sub-interval and a top task with window [T2, T3]]
Precedence constraints going from bottom to top tasks are loose.
The chain length among top tasks is very small: the algorithm has already recognized a crude schedule for the troublesome tasks. That’s why the chain length among top tasks is small.
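“Chain length among top tasks” can be measured as the largest number of top tasks on any single path of the DAG, where the path may pass through bottom tasks. A minimal sketch with memoized recursion, assuming unit-length tasks and a successor map `succ` that lists every task (names and example are hypothetical; Python's recursion limit restricts it to modest DAG depths).

```python
from functools import lru_cache

def top_chain_length(succ, top):
    """succ: task -> list of successors for the whole DAG (bottom tasks included).
    top: set of top tasks. Returns the largest number of top tasks that lie on
    one path of the DAG."""
    @lru_cache(maxsize=None)
    def best(v):
        # Most top tasks on any path that starts at v.
        here = 1 if v in top else 0
        return here + max((best(w) for w in succ.get(v, [])), default=0)
    return max((best(v) for v in succ), default=0)

# Hypothetical DAG: a -> x -> b -> c and y -> c, where x and c are bottom tasks.
succ = {"a": ["x"], "x": ["b"], "b": ["c"], "y": ["c"], "c": []}
print(top_chain_length(succ, {"a", "b", "y"}))  # 2
```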
LR’16 Framework
[Figure: the time interval split at T0, T1, T2, T3; top tasks above, bottom tasks below]
• Precedence constraints going from bottom to top tasks are loose; every top task $j$ gets a window $[r_j, d_j]$, e.g., $[T_2, T_3]$.
• The chain length among top tasks is very small.
• There is enough space to schedule top tasks if there are no precedence constraints between top tasks.
• EDF will schedule all top tasks in the empty space (all except a few), but may violate the precedence constraints between top tasks.
How to partition the DAG?
1. Precedence constraints between bottom tasks should be implied.
2. The precedence constraints between top and bottom tasks are loose.
3. The chain length among top tasks is small.
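Linear Programming Formulation
Binary search the optimal makespan as T. The variable $x_{jt}$ is the fraction of task $j$ scheduled in time slot $t$.
$\sum_{t=1}^{T} x_{jt} = 1$ for every task $j$: every task is scheduled.
$\sum_{j} x_{jt} \le m$ for every time slot $t$: slot $t$ has at most $m$ jobs.
$\sum_{t' < t} x_{it'} \ge \sum_{t' \le t} x_{jt'}$ for every precedence relation $i \to j$ and every $t$: precedence is satisfied at each time step.
$x_{jt} \ge 0$: all variables are non-negative.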
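One direct way to read the constraints is to check them on a candidate fractional solution. The sketch below does exactly that, with `x[j][t]` stored as nested dictionaries (missing entries mean 0) and slots numbered 1..T; the function name and the examples are hypothetical.

```python
def lp_feasible(x, edges, m, T, eps=1e-9):
    """Check a fractional solution of the makespan LP.
    x[j][t]: fraction of task j in slot t (t = 1..T); missing entries are 0.
    edges: (i, j) pairs for precedence i -> j."""
    def val(j, t):
        return x[j].get(t, 0.0)
    slots = range(1, T + 1)
    # Non-negativity, and no mass outside slots 1..T.
    if any(v < -eps or t not in slots for row in x.values() for t, v in row.items()):
        return False
    # Every task is scheduled: sum_t x_jt = 1.
    if any(abs(sum(val(j, t) for t in slots) - 1) > eps for j in x):
        return False
    # Each slot holds at most m jobs: sum_j x_jt <= m.
    if any(sum(val(j, t) for j in x) > m + eps for t in slots):
        return False
    # Precedence i -> j: sum_{t' < t} x_it' >= sum_{t' <= t} x_jt' for every t.
    for i, j in edges:
        for t in slots:
            done_i = sum(val(i, s) for s in range(1, t))
            done_j = sum(val(j, s) for s in range(1, t + 1))
            if done_i < done_j - eps:
                return False
    return True

chain = [("a", "b")]
print(lp_feasible({"a": {1: 1.0}, "b": {2: 1.0}}, chain, m=1, T=2))  # True
print(lp_feasible({"a": {1: 0.5, 2: 0.5}, "b": {2: 0.5, 3: 0.5}}, chain, m=1, T=3))  # True
print(lp_feasible({"a": {1: 1.0}, "b": {1: 1.0}}, chain, m=2, T=1))  # False
```

The second solution splits each task across two slots and is still feasible, which is the kind of fractional freedom the LP can exploit.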
LP Cheats…
[Figure: a DAG and a fractional LP schedule with values 2/3, 1/3, 2/3]
The optimal makespan is 4, but the LP can complete in 3 time slots: the LP can schedule a job fractionally in a time slot.
Interval of a Task
Consider the LP solution. The interval of a task is the smallest interval that contains the fractional schedule of the task.
[Figure: a task scheduled fractionally as 1/10, 1/10, 3/10, 5/10 between t1 and t2; its interval is [t1, t2]]
What does the LP give? An interval for each task. We use these intervals to partition the DAG into top and bottom tasks.
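Computing a task's interval from the LP solution amounts to taking the first and last slot with positive mass. A minimal sketch using exact fractions; the slot numbers 3, 4, 6, 7 attached to the masses 1/10, 1/10, 3/10, 5/10 are hypothetical.

```python
from fractions import Fraction

def task_interval(xj):
    """Smallest interval (first, last) of slots where the task has positive LP mass."""
    support = [t for t, v in xj.items() if v > 0]
    return min(support), max(support)

# Hypothetical fractional schedule of one task: slot -> fraction.
xj = {3: Fraction(1, 10), 4: Fraction(1, 10), 6: Fraction(3, 10), 7: Fraction(5, 10)}
assert sum(xj.values()) == 1
print(task_interval(xj))  # (3, 7)
```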
Building a Binary Tree
The LP schedules all tasks in [0, T]. Build a binary tree of depth log T over this interval: the root is [0, T], its children are [0, T/2] and [T/2 + 1, T], their children are the quarters, and so on.
Assign each task to the smallest interval node in the tree that fully contains it.
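Finding that node amounts to descending from the root while the task's interval lies entirely inside one child. A minimal sketch, assuming T is a power of two and slots are numbered 0..T-1 (so the halves are [0, T/2 - 1] and [T/2, T - 1], a slightly different convention from the slide's); the function name is hypothetical.

```python
def tree_node(a, b, T):
    """Smallest node of the binary interval tree over slots 0..T-1 (T a power of
    two) that fully contains the task interval [a, b]. Returns (depth, lo, hi)."""
    lo, hi, depth = 0, T - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2  # children are [lo, mid] and [mid + 1, hi]
        if b <= mid:
            hi = mid          # interval lies in the left child
        elif a > mid:
            lo = mid + 1      # interval lies in the right child
        else:
            break             # interval straddles the split: stop here
        depth += 1
    return depth, lo, hi

print(tree_node(2, 3, 8))  # (2, 2, 3)
print(tree_node(5, 6, 8))  # (1, 4, 7)
print(tree_node(0, 7, 8))  # (0, 0, 7)
```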
Defining Top and Bottom Tasks
[Figure: the binary tree rooted at [0, T] with children [0, T/2] and [T/2 + 1, T]. Tasks assigned to the top $(\log\log T)^2$ levels are the top tasks; tasks in the next $\log\log T$ levels: throw them away!!; each subtree below defines a set of bottom tasks.]
1. Precedence constraints between bottom tasks should be implied.
2. The precedence constraints between top and bottom tasks are loose.
3. The chain length among top tasks is small.
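Given the node (depth, lo, hi) of each task in the interval tree, the partition is a rule on depths. A minimal sketch with the band sizes `k_top` and `k_band` passed in as parameters: they play the roles of $(\log\log T)^2$ and $\log\log T$, but are left as inputs because for realistic T those expressions are not small (for T = 2^16, (log log T)^2 = 16 = log T). Bottom tasks are grouped by their ancestor at depth k_top + k_band, with slots numbered 0..T-1 and T a power of two. Names and example values are hypothetical.

```python
def classify(node, T, k_top, k_band):
    """node = (depth, lo, hi) in the interval tree over slots 0..T-1.
    Top: depth < k_top. Thrown away: the next k_band levels.
    Bottom: deeper nodes, grouped by their ancestor at depth k_top + k_band."""
    depth, lo, _ = node
    if depth < k_top:
        return ("top", None)
    if depth < k_top + k_band:
        return ("discard", None)
    block = T >> (k_top + k_band)   # width of a node at the cut depth
    return ("bottom", lo // block)  # index of the bottom-task set

T = 16
print(classify((0, 0, 15), T, 1, 1))   # ('top', None)
print(classify((1, 8, 15), T, 1, 1))   # ('discard', None)
print(classify((2, 4, 7), T, 1, 1))    # ('bottom', 1)
print(classify((3, 12, 13), T, 1, 1))  # ('bottom', 3)
```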
[Figure: the time interval 0, T1, T2, T3, T4 with top tasks above and bottom tasks below; a top task with window [T2, T3]]
Precedence constraints going from bottom to top tasks are loose.
Every top task can lose one interval to the left and one interval to the right in terms of the space in which it should be scheduled. But bottom intervals are tiny compared to top intervals, so this is not a big loss.
Lift and Project Method (LP Hierarchies)
[Figure: a hierarchy of LPs from the original LP at the bottom to the top level, where all the variables are integral; the vertical axis (“dimensions”) is the number of variables in the LP that you want integral]
Each level increases the running time by a factor of n; S levels take $O(n^S)$ time.
A systematic way of placing troublesome tasks!
Lift and Project Method (LP Hierarchies)
“Conditioning”: touch a variable, and it becomes integral!
[Figure: a task scheduled fractionally as 1/10, 1/10, 3/10, 5/10 between t1 and t2; after conditioning it is scheduled 10/10 in a single slot]
The LP solution changes in such a way that, for every other task, the interval in which it is scheduled in the new solution only shrinks.
I have a better understanding of where this task got scheduled.
Reducing Chain Length of Top Tasks
[Figure: the tree node [0, T] with children [0, T/2] and [T/2 + 1, T]; a time axis marked 0, εT, T, showing the slots where $x_{jt} > 0$]
The interval is of length T. We will make sure that there is no chain of length εT assigned to this interval.
How many conditionings are required? m/ε per interval. Now recall that the number of intervals in the top levels is $2^{(\log\log T)^2} = (\log T)^{\log\log T}$.
Total number of conditionings, which determines the running time: $O\big(m/\varepsilon \cdot (\log T)^{\log\log T}\big)$.
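The count on this slide follows from two short steps (logarithms base 2): count the nodes in the top $L = (\log\log T)^2$ levels of the binary tree, then rewrite that power of two.

```latex
\begin{aligned}
\#\{\text{nodes in the top } L \text{ levels}\} &= \sum_{i=0}^{L-1} 2^{i} = 2^{L} - 1 < 2^{L},
  \qquad L = (\log\log T)^2,\\
2^{(\log\log T)^2} &= \bigl(2^{\log\log T}\bigr)^{\log\log T} = (\log T)^{\log\log T},\\
\frac{m}{\varepsilon} \cdot (\log T)^{\log\log T}
  &= \text{total number of conditionings (up to constants).}
\end{aligned}
```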
Theorem (Garg, Kulkarni, Li ’18):
• There is a quasi-polynomial time (1 + ε)-approximation for minimizing makespan when jobs have arbitrary lengths and the number of machines is a constant. The algorithm schedules each job on a single machine and may preempt jobs within a machine.
• There is a polynomial time (2 + ε)-approximation, which is optimal, for minimizing the weighted completion time of jobs when machines and job sizes are uniform.
These results use more sophisticated conditioning and new algorithms for scheduling top tasks.
Intuition of Graphene vs. Lift and Project Algorithms
• Using lift and project to figure out how to place long tasks: is there a simple approach to it, say DP?
• Can we use LP support for placing tasks?
• Can recursion help in the Graphene setting?
Big Picture
Identical Machines Scheduling and Training Neural Networks
PipeDream: Fast and Efficient Pipeline Parallel DNN Training. Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons.
Training Deep Learning Models
• A large fraction of the data center workloads for many companies.
• Improving training time is considered very important.
• DAGs are good abstractions of DNN training computations.
• Connections to DAG scheduling and communication delay problems.
Two Paradigms
• Data parallelism
• Model parallelism
Model Parallelism
• Schedule the layers among a set of machines, typically identical,
• or of at most 2 types: CPU+FPGA, CPU+GPU, etc.
• There is communication between layers. Communication cost is crucial.
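One simple instance of this problem: the layers form a chain, and we cut it into at most k contiguous stages, one per identical machine, minimizing the time of the slowest stage. The sketch below solves that simplified model by dynamic programming. Assumptions (hypothetical, and not PipeDream's actual planner): a stage's time is its total compute plus the time to send its last layer's output to the next stage, and pipelining overlap is ignored.

```python
from functools import lru_cache

def partition_layers(compute, comm, k):
    """Split a chain of layers into at most k contiguous stages (one per machine)
    and return the smallest possible time of the slowest stage.
    compute[i]: compute time of layer i.
    comm[i]: time to send layer i's output to the next machine."""
    n = len(compute)
    prefix = [0]
    for c in compute:
        prefix.append(prefix[-1] + c)

    def stage_time(i, j):
        # Layers i..j-1 on one machine; pay communication unless it is the last stage.
        return prefix[j] - prefix[i] + (comm[j - 1] if j < n else 0)

    @lru_cache(maxsize=None)
    def best(i, stages):
        # Smallest bottleneck for layers i..n-1 using at most `stages` stages.
        if i == n:
            return 0
        if stages == 1:
            return stage_time(i, n)
        return min(max(stage_time(i, j), best(j, stages - 1))
                   for j in range(i + 1, n + 1))

    return best(0, k)

print(partition_layers([4, 4, 4, 4], [1, 1, 1, 1], k=2))      # 9
print(partition_layers([4, 4, 4, 4], [20, 20, 20, 20], k=2))  # 16
```

With cheap communication the best plan splits the chain in the middle (stages of 8 + 1 and 8); with expensive communication the best plan keeps all layers on one machine, which is why communication cost is crucial.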
Model Parallelism
These problems are quite similar to scheduling with communication delays when there are precedence constraints (PY’90, VLL’90, MH’95, HLV’94), which is very poorly understood.
Good scheduling has the same effect as caching!
Zhicheng Yin, Jin Sun, Ming Li, Jaliya Ekanayake, Haibo Lin, Marc Friedman, José A. Blakeley, Clemens A. Szyperski, Nikhil R. Devanur. Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale. PVLDB 11(7).
PipeDream: Fast and Efficient Pipeline Parallel DNN Training. Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons.
Summary: Data Center Scheduling
• Resources: heterogeneous (FPGA+CPU, GPU+CPU); multidimensional (CPU, memory, network).
• Jobs: complex dependencies: DAGs, co-flows, etc.
• Algorithms: fast, simple, often online.
• Often hard in the worst case. What’s the right model?
• Understand DAGs that arise in practice, say DNNs.
• What are the high-level algorithmic intuitions?