Post on 25-Apr-2020
TRANSCRIPT
U B E R | Data
Hadoop Infrastructure @ Uber: Past, Present and Future
Mayank Bansal

Uber's Mission
“Transportation as reliable as running water, everywhere, for everyone”
75+ Countries, 500+ Cities
And growing…

How Uber works

Data Driven Decisions

Data Infra Once Upon a Time.. (2014)
[Architecture diagram. Labels: Kafka Logs, Key-Val DB, RDBMS DBs, S3, Applications, …, ETL, Business Ops, A/B Experiments, Adhoc Analytics, City Ops, Vertica Data Warehouse, Data Science, EMR]

Data Infrastructure Today
[Architecture diagram. Labels: Kafka8 Logs, Schemaless DB, SOA DBs, Service Accounts, …, ETL, Machine Learning, Experimentation, Data Science, Adhoc Analytics, Ops/Data Science, HDFS, City Ops, Data Science, Spark, Presto, Hive]

Few Takeaways…
● Strict Schema Management
  ○ Because our largest data audience is SQL savvy! (1000s of Uber Ops!)
  ○ SQL = Strict Schema
● Big Data Processing Tools Unlocked - Hive, Presto and Spark
  ○ Migrate SQL-savvy users from Vertica to Hive & Presto (1000s of Ops & 100s of data scientists & analysts)
  ○ Spark for more advanced users - 100s of data scientists

Hadoop Evolution @ebay
2014: 1X Nodes, 1X PB
2015: 10X Nodes, 4X PB Data; 3000+ node, 30,000+ cores, 50+ PB
2016: 90X Nodes, 40X PB Data
Hadoop Evolution @Uber

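The node and data multiples above are all relative to the 2014 baseline, so the year-over-year growth has to be derived. A minimal sketch of that arithmetic, using only the multiples shown on the slide (the helper name `yoy_growth` is made up for illustration):

```python
# Cluster size per year as multiples of the 2014 baseline (values from the slide).
nodes = {2014: 1, 2015: 10, 2016: 90}
data_pb = {2014: 1, 2015: 4, 2016: 40}


def yoy_growth(series):
    """Return {year: factor}, where factor = value(year) / value(previous year)."""
    years = sorted(series)
    return {year: series[year] / series[prev] for prev, year in zip(years, years[1:])}


node_growth = yoy_growth(nodes)
data_growth = yoy_growth(data_pb)
print(node_growth)  # {2015: 10.0, 2016: 9.0}
print(data_growth)  # {2015: 4.0, 2016: 10.0}
```

Read this way, node count grew roughly tenfold in each of the two years, while data volume growth accelerated from 4x to 10x.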
Hadoop Cluster Utilization
• Over-provisioning for the peak loads
• Over-capacity in anticipation of future growth

Hadoop Evolution @ebay
2014: 0 Nodes
2015: X Nodes
2016: 300X Nodes
Mesos Evolution @Uber

Mesos Cluster Utilization
• Over-provisioning for the peak loads
• Over-capacity in anticipation of future growth

End Goal
[Diagram. Labels: Online, Presto]

What we need?
GLOBAL VIEW OF RESOURCES

Available Resource Managers

Mesos vs YARN

YARN | Mesos
Single-level scheduler | Two-level scheduler
Uses cgroups for isolation | Uses cgroups for isolation
CPU, memory as a resource | CPU, memory and disk as a resource
Works well with Hadoop workloads | Works well with longer-running services
YARN supports time-based reservations | Mesos does not have support for reservations
Dominant resource scheduling | Scheduling is done by frameworks and depends on a case-by-case basis

Callouts on the slide, in order: Scales better; Similar isolation; Disk is better; This is important; Important for batch SLAs; Better for batch

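The comparison lists "dominant resource scheduling" for YARN. Assuming this refers to Dominant Resource Fairness (DRF), the idea is to repeatedly give the next task to whichever user currently has the smallest dominant share, meaning the largest fraction of any one resource that the user holds. The sketch below is a simplified greedy version, not YARN's scheduler code. The function name, the capacities and the per-task demands are illustrative assumptions. One deliberate simplification: a user whose next task no longer fits is dropped from consideration rather than stopping the whole loop.

```python
def drf_allocate(capacity, demands):
    """Greedy DRF sketch: hand out one task at a time to the user
    with the smallest dominant share until nothing more fits."""
    used = {res: 0 for res in capacity}
    tasks = {user: 0 for user in demands}
    dominant = {user: 0.0 for user in demands}
    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical by user).
        user = min(sorted(active), key=lambda u: dominant[u])
        need = demands[user]
        if any(used[res] + need[res] > capacity[res] for res in capacity):
            active.remove(user)  # this user's next task does not fit
            continue
        for res in capacity:
            used[res] += need[res]
        tasks[user] += 1
        # Dominant share = largest fraction of any single resource held.
        dominant[user] = max(tasks[user] * need[res] / capacity[res]
                             for res in capacity)
    return tasks


# Illustrative cluster: 9 CPUs and 18 GB of memory.
capacity = {"cpu": 9, "mem_gb": 18}
demands = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
result = drf_allocate(capacity, demands)
print(result)  # {'A': 3, 'B': 2}
```

With these numbers both users end at a dominant share of two thirds: A holds 12 of 18 GB and B holds 6 of 9 CPUs.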
Let’s tie them together
In a Nutshell
YARN is good for Hadoop
Mesos is good for Longer Running Services

• Myriad is a Mesos Framework for Apache YARN
• Mesos manages Data Center resources
• YARN manages Hadoop workloads
• Myriad
• Gets resources from Mesos
• Launches NodeManagers

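The flow described above (Mesos offers resources, Myriad accepts some of them and launches NodeManagers, YARN then schedules on that capacity) can be sketched as a toy model. This does not use the real Mesos or Myriad APIs. Every class, method and parameter name below is hypothetical, and the fixed NodeManager size is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A resource offer from the (simulated) Mesos master."""
    host: str
    cpus: int
    mem_gb: int


@dataclass
class YarnCluster:
    """Stand-in for YARN's view of its registered NodeManagers."""
    node_managers: list = field(default_factory=list)

    def capacity(self):
        return (sum(nm["cpus"] for nm in self.node_managers),
                sum(nm["mem_gb"] for nm in self.node_managers))


class MyriadLikeFramework:
    """Accepts offers large enough for one fixed-size NodeManager."""

    def __init__(self, yarn, nm_cpus, nm_mem_gb, target_nms):
        self.yarn = yarn
        self.nm_cpus = nm_cpus
        self.nm_mem_gb = nm_mem_gb
        self.target_nms = target_nms

    def on_offers(self, offers):
        """Launch NodeManagers on suitable offers; return declined offers."""
        declined = []
        for offer in offers:
            wants_more = len(self.yarn.node_managers) < self.target_nms
            fits = offer.cpus >= self.nm_cpus and offer.mem_gb >= self.nm_mem_gb
            if wants_more and fits:
                # "Launching" a NodeManager just registers capacity with YARN here.
                self.yarn.node_managers.append(
                    {"host": offer.host, "cpus": self.nm_cpus, "mem_gb": self.nm_mem_gb})
            else:
                declined.append(offer)  # goes back to Mesos for other frameworks
        return declined


yarn = YarnCluster()
framework = MyriadLikeFramework(yarn, nm_cpus=4, nm_mem_gb=16, target_nms=2)
declined = framework.on_offers(
    [Offer("host1", 8, 32), Offer("host2", 2, 4), Offer("host3", 16, 64)])
print([o.host for o in declined])  # ['host2']
print(yarn.capacity())             # (8, 32)
```

Once a NodeManager is launched, its share of the host belongs to YARN, and whatever was not accepted stays with Mesos.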
Myriad’s Limitations: Static Resource Partitioning
• YARN will handle resources handed over to it.
• Mesos will work on the rest of the resources

Myriad’s Limitations: Resource Oversubscription
• YARN will never be able to do oversubscription.
• NodeManager will go away
• Fragmentation of resources
• Mesos oversubscription can kill YARN too

Myriad’s Limitations
• No Global Quota Enforcement
• No Global Priorities

Myriad’s Limitations
• Elastic Resource Management
• Bin Packing
• Stability
• Long List…

Unified Scheduler

High Level Characteristics
• Global Quota Management
• Central Scheduling policies
• Oversubscription for both Online and Batch
• Isolation and bin packing
• SLA guarantees at Global Level

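To make "global quota management" and "central scheduling policies" concrete, here is a minimal sketch of one admission loop that sees online and batch tasks together, orders them by a global priority and enforces a per-team quota across the whole cluster. It is an illustration under stated assumptions, not the design of the scheduler in the talk. The `Task` fields, the team names, the quota values and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    team: str
    cpus: int
    priority: int  # higher value is scheduled first


def schedule(tasks, cluster_cpus, team_quota):
    """Admit tasks in global priority order, subject to total cluster
    capacity and a per-team CPU quota."""
    free = cluster_cpus
    used_by_team = {team: 0 for team in team_quota}
    admitted, rejected = [], []
    for task in sorted(tasks, key=lambda t: -t.priority):
        within_quota = used_by_team[task.team] + task.cpus <= team_quota[task.team]
        if within_quota and task.cpus <= free:
            free -= task.cpus
            used_by_team[task.team] += task.cpus
            admitted.append(task.name)
        else:
            rejected.append(task.name)
    return admitted, rejected


tasks = [
    Task("etl-batch", team="data", cpus=30, priority=1),
    Task("api-online", team="rides", cpus=30, priority=10),
    Task("adhoc-query", team="data", cpus=30, priority=5),
    Task("ml-train", team="data", cpus=40, priority=3),
]
admitted, rejected = schedule(tasks, cluster_cpus=100,
                              team_quota={"rides": 50, "data": 60})
print(admitted)  # ['api-online', 'adhoc-query', 'etl-batch']
print(rejected)  # ['ml-train']
```

Here `ml-train` is rejected because it would push the `data` team past its quota even though the cluster still has free CPUs, which is the kind of decision that needs a single global view.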
Unified Scheduler

Few Takeaways…
• We need one scheduling layer across all workloads
• Partitioning resources is not good
• Can save at least 30% of resources by not partitioning
• Stability and simplicity win in Production
• Multiple levels of resource management and scheduling will not be scalable

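The argument against partitioning can be illustrated with simple arithmetic: separate clusters must each be sized for their own peak, while a shared pool only needs to cover the peak of the combined load. The hourly demand figures below are invented for illustration and are not Uber measurements. The result is not meant to reproduce the 30% figure above.

```python
# Hypothetical CPU demand per time slot for two workload classes.
online = [80, 90, 100, 70, 40, 30]   # busiest during the day
batch = [20, 30, 50, 60, 100, 90]    # busiest at night

# Partitioned: each cluster is provisioned for its own peak.
partitioned = max(online) + max(batch)

# Unified: one pool is provisioned for the peak of the summed demand.
unified = max(o + b for o, b in zip(online, batch))

saving = 1 - unified / partitioned
print(partitioned, unified, saving)  # 200 150 0.25
```

The saving comes entirely from the two peaks not coinciding. If both workloads peaked in the same slot, the two sizes would be equal.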
Questions?
mabansal@uber.com
mayank@apache.org

Thank You!!!