hado "ops" or had "oops"
TRANSCRIPT
Proprietary & Confidential. Copyright © 2014
Hado'ops' or Had'oops'
We're Hiring: rocketfuel.com/careers
Kishore Kumar Yellamraju, Abhijit Pol
The Web Is Monetized By Advertising
Delivery Methods
» Display
» Video
» Mobile
» Social
Overview
[Diagram: ad-serving flow. Numbered steps: 1 Page Request, 2 Ad Request, 3 Bid Request, 4 Bid & Ad, 5 Rocketfuel Winning Ad, 6 Ad Served. Parties: Browser, Publishers, Ad Exchange (some exchange partners), Data Partners, Advertisers. Rocket Fuel Platform components: Real-time Bidder, Automated Decisions, Models, Model Scores, Events, Data Store, Ads & Budget. Other labels: User Segments, User Engagements, Optimize, Refresh learning.]
Real Time Bidding and Serving
Always buying the best impressions & serving the best ad
[Figure: candidate impressions with dollar amounts (values garbled in extraction), evaluated on Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User.]
Real Time Bidding and Serving
Goal: Leads & sales
Goal: Coupon downloads
Goal: Brand awareness
[Figure: three Impression Scorecards, each listing Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, and Response. The per-factor scores and totals are garbled in extraction.]
Throughput
Requests per day:
Facebook likes: 5 B
Searches on Google: 6 B
Bid Requests Considered by Rocketfuel: 45 B
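To put the throughput figure in per-second terms, here is a rough back-of-the-envelope sketch. It assumes the slide's "45 B" means 45 billion bid requests per day and that load is spread evenly across the day; real traffic is peaky, so the actual peak rate would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(requests_per_day: float) -> float:
    """Average requests per second, assuming perfectly even load."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumption: "45 B" on the slide is read as 45 billion requests per day.
    bid_requests_per_day = 45e9
    rate = average_rate_per_second(bid_requests_per_day)
    print(f"{rate:,.0f} bid requests per second on average")
```

Under those assumptions the average works out to roughly half a million bid requests per second.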
Latency
Time (ms):
Blink of an eye: 400
SF to Tokyo network round trip: 100
One beat of a hummingbird's wing: 20
Look up in Blackbird: 2
Architecture and Scale
» Datacenters
» Scale
» Growth
» Architecture
Data Center Expansion
» abc
Data Center Design
• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
Hadoop Server: 20 2U servers (85kW)
Bidders: 40 2-U Twin 2 servers (17kW)
Rocket Fuel Scale
» 34474 CPU processor cores
  – 2655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of IBM computer Watson that played Jeopardy
» 42 Petabytes of storage
  – 106X the data volume of the entire Library of Congress
Hadoop at Rocket Fuel
» 1400 servers
» 15K Disks
» 15K Cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs
Growth
200 Servers to 1400 Servers
5 PB to 41 PB
8x
Data Architecture 30
Hadoop Setup
[Diagram: Active NN and Standby NN, each with a ZKFC; QJM and ZK Quorum; JT; worker nodes each running DN and TT.]
Node specs shown:
» 6x2TB Disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB Disks, 2x6 core, 64 GB RAM, 10G NIC
» Same as DN's, dedicated disk to ZK or JN
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
Maintenance is Not Easy
Puppet + Infradb
Automation is key
Puppet and Infradb
» Automate as much as you can
» Adding a slave node to Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config
Isn't it cool?
Just define once
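The slide does not say how slot counts are derived from the hardware config. As a minimal sketch of one possible heuristic, the code below bounds the total slots by both cores and usable memory. The function name, the 2 GB heap per task, the 8 GB reserved for the OS and daemons, and the 2:1 map-to-reduce split are all illustrative assumptions, not the actual Puppet logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotPlan:
    map_slots: int
    reduce_slots: int


def plan_slots(cores: int, ram_gb: int,
               task_heap_gb: int = 2,   # assumed heap per task JVM
               reserved_gb: int = 8,    # assumed headroom for OS, DN, TT
               map_parts: int = 2,      # assumed map:reduce split of 2:1
               reduce_parts: int = 1) -> SlotPlan:
    """Derive map/reduce slot counts from a node's cores and memory."""
    usable_gb = max(ram_gb - reserved_gb, 0)
    by_memory = usable_gb // task_heap_gb
    # The scarcer resource wins; always keep at least one slot of each kind.
    total = max(min(cores, by_memory), 2)
    map_slots = max(total * map_parts // (map_parts + reduce_parts), 1)
    reduce_slots = max(total - map_slots, 1)
    return SlotPlan(map_slots, reduce_slots)


if __name__ == "__main__":
    # A 2x6-core, 64 GB node like the one described in the Hadoop Setup slide.
    print(plan_slots(cores=12, ram_gb=64))
```

Computing the values from hardware facts once, in configuration management, is what lets heterogeneous nodes join the cluster without hand-edited slot settings.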
Performance Tuning
No issues when the cluster is small. Problems start when it grows.
Performance Tuning
HDFS
» dfs.namenode.handler.count
» dfs.image.transfer.timeout
MR
» mapred.reduce.parallel.copies
» mapreduce.reduce.shuffle.parallelcopies
» mapred.job.tracker.handler.count
» io.sort.mb, io.sort.factor
» IMP: MAPREDUCE-2026
ZK
» maxClientCnxns
» ha-timeoutms
JVM
» -XX:+UseConcMarkSweepGC
» -XX:CMSFullGCsBeforeCompaction=1
» -XX:CMSInitiatingOccupancyFraction=60
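The slide names the properties but gives no values. As a hedged sketch of how such settings could be templated rather than hand-edited, the helper below renders name/value pairs in the `<configuration>` layout that Hadoop's `*-site.xml` files use. The function name and the numeric values are placeholders for illustration, not recommendations from the talk.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; the slide gives no numbers.
hdfs_tuning = {
    "dfs.namenode.handler.count": 64,
    "dfs.image.transfer.timeout": 600000,
}

if __name__ == "__main__":
    print(render_site_xml(hdfs_tuning))
```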
We Have an Issue
MAPREDUCE-5351
MAPREDUCE-5508
keep.failed.task.files=true
JT OOM
Instances of "JobInProgress" class = number of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum
mapred.jobtracker.completeuserjobs.maximum
mapred.jobtracker.retirejob.interval
mapred.jobtracker.retiredjobs.cache.size
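The relationship on this slide is simple multiplication, and a short worked sketch shows why it can exhaust the JobTracker heap. The user count, the per-user limit, and especially the per-job memory footprint below are illustrative assumptions; the real footprint depends on how many tasks and counters each job carries.

```python
def retained_job_objects(users_with_jobs: int, completed_jobs_per_user: int) -> int:
    """Upper bound on completed JobInProgress objects kept in JobTracker memory."""
    return users_with_jobs * completed_jobs_per_user


def estimated_heap_mb(job_objects: int, mb_per_job: float) -> float:
    """Rough heap held by retained jobs, given an assumed per-job footprint."""
    return job_objects * mb_per_job


if __name__ == "__main__":
    users = 50          # assumption: users who submitted jobs
    mb_per_job = 2.0    # assumption: average footprint of one retained job
    for limit in (100, 25):  # candidate values for completeuserjobs.maximum
        jobs = retained_job_objects(users, limit)
        print(f"limit={limit}: {jobs} jobs, ~{estimated_heap_mb(jobs, mb_per_job):,.0f} MB")
```

Lowering the per-user limit, or retiring jobs sooner, shrinks the product directly, which is why those three properties appear together on the slide.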
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
Monitoring
Wall of Ops
Monitoring
hadoopnamenodeCallQueueLength
hadoopjobtrackerjvmmemheapusedm
Don't fly blind; you will crash
MR Workload Monitoring
Network Monitoring
Don't blame the network; monitor it instead. A network mesh can be a mess.
Alerting
Monitoring is not enough; you need better alerting.
Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT
http://hostname:port/jmx
qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TT's, jobs, slots_used, ThreadCount
qry=java.lang:type=Memory
  Used JVM, free JVM, etc.
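The custom scripts themselves are not shown in the talk. The following is a minimal sketch of what such a check might look like: it reads two of the NameNode beans named above from the `/jmx` JSON endpoint and folds every problem into a single summary alert. The hostname, port, threshold, and function names are assumptions for illustration.

```python
import json
from urllib.request import urlopen


def fetch_bean(base_url: str, query: str) -> dict:
    """Return the first bean matching `query` from a /jmx JSON endpoint."""
    with urlopen(f"{base_url}/jmx?qry={query}", timeout=10) as resp:
        return json.load(resp)["beans"][0]


def namenode_problems(info: dict, state: dict,
                      max_under_replicated: int = 1000) -> list:
    """Collect NameNode problems from NameNodeInfo and FSNamesystemState beans."""
    problems = []
    missing = info.get("NumberOfMissingBlocks", 0)
    if missing > 0:
        problems.append(f"{missing} missing blocks")
    if state.get("FSState", "Operational") != "Operational":
        problems.append(f"FSState is {state['FSState']}")
    dead = state.get("NumDeadDataNodes", 0)
    if dead > 0:
        problems.append(f"{dead} dead DataNodes")
    under = state.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:  # assumed threshold
        problems.append(f"{under} under-replicated blocks")
    return problems


def summary_alert(problems: list) -> str:
    """One aggregate line instead of one alert per symptom."""
    return "OK" if not problems else "NameNode: " + "; ".join(problems)


if __name__ == "__main__":
    base = "http://namenode.example.com:50070"  # hypothetical host and port
    info = fetch_bean(base, "Hadoop:service=NameNode,name=NameNodeInfo")
    state = fetch_bean(base, "Hadoop:service=NameNode,name=FSNamesystemState")
    print(summary_alert(namenode_problems(info, state)))
```

Keeping the evaluation separate from the HTTP fetch means the alert logic can be exercised against saved JSON without a live cluster.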
MR Workload Alerting
» Monitoring MR workload and alert
  – In-house tool that uses the "houdah" ruby gem monitors
  – Long running jobs, jobs with more map tasks, blacklisted TT's with more failure counts, etc.
» Collect details and auto-restart blacklisted TT's
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hraven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
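The in-house tool is not shown, and this sketch does not use the houdah gem's API. It only illustrates the kind of rule the bullets describe: given job snapshots collected by some means, flag the long-running jobs and the jobs with unusually many map tasks. The record fields and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical record of a running job; field names are illustrative."""
    job_id: str
    user: str
    runtime_minutes: float
    map_tasks: int


def flag_jobs(jobs, max_runtime_minutes=240, max_map_tasks=50000):
    """Return (job_id, reasons) for each job that breaks an assumed threshold."""
    flagged = []
    for job in jobs:
        reasons = []
        if job.runtime_minutes > max_runtime_minutes:
            reasons.append("long running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too many map tasks")
        if reasons:
            flagged.append((job.job_id, reasons))
    return flagged


if __name__ == "__main__":
    snapshots = [
        JobSnapshot("job_0001", "etl", 30, 1200),
        JobSnapshot("job_0002", "adhoc", 600, 80000),
    ]
    for job_id, reasons in flag_jobs(snapshots):
        print(job_id, ", ".join(reasons))
```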
Multi Tenancy
Modeling, OPS, ETL, Ad-hoc
Scheduling
No scheduler is perfect unless you understand it and tune it properly.
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
BCP
» BCP: Business Continuity Plan
» Near real time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data
Data BCP Cluster
[Diagram: INW Data Cluster and LSV Data Cluster; US, EU, and HK Serving Clusters; Processed Data; Amazon Backup. Workloads shown: Modeling, Reporting, User Queries, Research, Ad-hoc Queries.]
YARN
» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages life cycle of container
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution
YARN at Rocket Fuel
» YARN is in production
» 700+ nodes
» 31TB RAM, 8500 disks, 8500 cores
» Primary use case: Map-Reduce
» No more static slots
» Tez, Spark, and Storm are in the race
YAY
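To illustrate what "no more static slots" means in practice, here is a small sketch of how many containers of a given size fit on one node. The inputs stand in for the NodeManager's `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores` settings. The function name and the sample numbers are assumptions, and whether vcores actually constrain placement depends on the scheduler's resource calculator.

```python
def containers_per_node(node_memory_mb: int, node_vcores: int,
                        container_memory_mb: int, container_vcores: int = 1) -> int:
    """Containers of one size that fit on a node; the scarcer resource wins."""
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Assumed node: 48 GB offered to YARN and 12 vcores.
    for size_mb in (2048, 4096, 8192):
        n = containers_per_node(49152, 12, size_mb)
        print(f"{size_mb} MB containers: {n} per node")
```

Unlike fixed map and reduce slots, the count changes with each application's request size, so the same node can run many small tasks or a few large ones.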
Obligatory "we are hiring" slide
http://rocketfuel.com/careers
THANKS
kishore@rocketfuel.com
apol@rocketfuel.com
6 Ad Served
User
Segment
s
3 Bid
Reques
t
Overview
Publishers
2 Ad Request
1 Page Request
4 Bid amp
Ad
User Engagement
s
Data Partners
Advertisers
Browser
Some Exchange Partners
Ad Exchange
Optimize
Rocket Fuel Platform
Real-time BidderAutomated Decisions
Models
Refresh
learning
Data
Store
Ads amp
Budget
Model
ScoresEvents
5 RocketfuelWinning Ad
Proprietary amp Confidential Copyright copy 2014
125$211$126$278
$1256$1809$242125
$211$126$278
$0586$2009
125$211$126$278$156
$000
[ + ][ + ]
SitePageGeoWeatherTime of DayBrand AffinityUser
Always buying the best impressions amp serving the best ad
Real Time Bidding and Serving
Proprietary amp Confidential Copyright copy 2014
GoalLeadsamp sales
GoalCoupondownloads
GoalBrandawareness
SitePageGeoWeatherTime of DayBrand AffinityDemo
Impression Scorecard
DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehavior
Response
Impression Scorecard
DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior
Response X
Impression Scorecard
DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior
Response
+100+40-20+20+15+10+40+35
+97
+40-70-20+10+15-25-40-18
+07
+10-10-20+20+10-35-25+10
+14
Real Time Bidding and Serving
X
Proprietary amp Confidential Copyright copy 2014
6 Ad Served
User
Segment
s
3 Bid
Reques
t
Overview
Publishers
2 Ad Request
1 Page Request
4 Bid amp
Ad
User Engagement
s
Data Partners
Advertisers
Browser
Some Exchange Partners
Ad Exchange
Optimize
Rocket Fuel Platform
Real-time BidderAutomated Decisions
Models
Refresh
learning
Data
Store
Ads amp
Budget
Model
ScoresEvents
5 RocketfuelWinning Ad
Proprietary amp Confidential Copyright copy 2014
5 B
6 B
45 B
Facebook likes
Searches on Google
Bid Requests Considered by Rocketfuel
Requests per day
Throughput
Proprietary amp Confidential Copyright copy 2014
400
100
20
2
Blink of an eye
SF to Tokyo network round trip
One beat of a hummindbirds wing
Look up in Blackbird
Time (ms)
Latency
Proprietary amp Confidential Copyright copy 2014
Architecture and Scale
raquoDatacenters
raquoScale
raquoGrowth
raquoArchitecture
Proprietary amp Confidential Copyright copy 2014
Data Center Expansion
raquoabc
Proprietary amp Confidential Copyright copy 2014
Data Center Design
bull Racks custom built at Rocket Fuel
bull Leased spacebandwidth in colocation facilities
Hadoop Server20 2U servers (85kW)
Bidders40 2-U Twin 2 servers (17kW)
Proprietary amp Confidential Copyright copy 2014
Rocket Fuel Scale
raquo34474 CPU processor cores
ndash2655 servers
ndash1874 Teraflops of computing
raquo188 Terabytes of memory
ndash13X the memory of IBM computer Watson that played Jeopardy
raquo42PB Petabytes of storage
ndash106X the data volume of the entire Library of Congress
Proprietary amp Confidential Copyright copy 2014
Hadoop at Rocket Fuel
raquo 1400 servers
raquo 15K Disks
raquo 15K Cores
raquo 90 TB
raquo 30K MR slots
raquo 12K daily MR jobs
Proprietary amp Confidential Copyright copy 2014
200 Servers 1400 Servers
5 PB
41 PB
8x
Growth
Proprietary amp Confidential Copyright copy 2014
Data Architecture 30
Proprietary amp Confidential Copyright copy 2014
Hadoop Setup
QJM ZK Quorum
raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC
raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC
raquo same as DNrsquosraquo Dedicated disk
to ZK or JN
JT
Standby NN
ZKFCZKFC
Active NN
DN
TTDN
TT
DN
TT
DN
TTDN
TT
DN
TT
Proprietary amp Confidential Copyright copy 2014
Operations
raquo Maintenance
raquo Performance Tuning
raquo Monitoring
raquo BCP
raquo YARN
Proprietary amp Confidential Copyright copy 2014
Puppet
+
Infradb
Automation is key
Maintenance is Not Easy
Proprietary amp Confidential Copyright copy 2014
Puppet and Infradb
raquo Automate as much as you can
raquo Adding a slave node to Hadoop cluster lt 120 seconds
raquo Bringing up a new Hadoop cluster lt 500 seconds
raquo MR slots are automatically determined based on hardware config
Isnrsquot it cool
Just define once
Proprietary amp Confidential Copyright copy 2014
No issues when cluster is small Problems starts when it grows
Performance Tuning
Proprietary amp Confidential Copyright copy 2014
dfsnamenodehandlercount
dfsimagetransfertimeout
mapredreduceparallelcopies
mapredjobtrackerhandlercount
iosortmbiosortfactor
maxClientCnxns
ZK
HDFS
MR
IMP MAPREDUCE-2026
-XX+UseConcMarkSweepGC
-XXCMSFullGCsBeforeCompaction=1
-XXCMSInitiatingOccupancyFraction=60
ha-timeoutms
JVM
Performance Tuning
mapreducereduceshuffleparallelcopies
Proprietary amp Confidential Copyright copy 2014
MAPREDUCE-5351
MAPREDUCE-5508
keepfailedtaskfiles=true
We Have an Issue
Proprietary amp Confidential Copyright copy 2014
instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum
mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval
mapredjobtrackerretiredjobscachesize
JT OOM
Proprietary amp Confidential Copyright copy 2014
Operations
raquo Maintenance
raquo Performance Tuning
raquo Monitoring
raquo BCP
raquo YARN
Proprietary amp Confidential Copyright copy 2014
Monitoring
Wall of Ops
Proprietary amp Confidential Copyright copy 2014
Monitoring
hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm
Donrsquot fly blind you will crash
Proprietary amp Confidential Copyright copy 2014
MR Workload Monitoring
Proprietary amp Confidential Copyright copy 2014
Network Monitoring
Donrsquot blame network instead monitor it Network Mesh can be mess
Proprietary amp Confidential Copyright copy 2014
Alerting
Monitoring is not enough need better Alerting
Proprietary amp Confidential Copyright copy 2014
Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query JMX for NN and JT

http://hostname:port/jmx
qry=Hadoop:service=NameNode,name=NameNodeInfo
    NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
    FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
    Blacklisted TTs, jobs, slots_used, ThreadCount
qry=java.lang:type=Memory
    Used JVM, free JVM, etc.

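The JMX queries listed above can be polled by a small script. The following is a rough sketch of that idea, not the custom scripts the speakers describe: the function names and the thresholds are assumptions for illustration, while the bean and attribute names come from the slide. It folds several checks into one summary list, in the spirit of the "summary/aggregate alerts" advice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jmx_url(host: str, port: int, bean: str) -> str:
    """Build a /jmx URL that selects a single MBean through the qry parameter."""
    return f"http://{host}:{port}/jmx?{urlencode({'qry': bean})}"


def fetch_bean(host: str, port: int, bean: str, timeout: float = 5.0) -> dict:
    """Fetch one MBean; the servlet replies with JSON shaped like {"beans": [{...}]}."""
    with urlopen(jmx_url(host, port, bean), timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    return beans[0] if beans else {}


def namenode_alerts(fs_state: dict, max_dead: int = 0,
                    max_under_replicated: int = 1000) -> list:
    """Return one summary list of problems rather than one alert per metric.

    The thresholds are illustrative assumptions; tune them per cluster.
    """
    problems = []
    if fs_state.get("NumDeadDataNodes", 0) > max_dead:
        problems.append(f"dead DataNodes: {fs_state['NumDeadDataNodes']}")
    if fs_state.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(f"under-replicated blocks: {fs_state['UnderReplicatedBlocks']}")
    if fs_state.get("FSState") != "Operational":
        problems.append(f"FSState is {fs_state.get('FSState')}")
    return problems
```

A cron job could call `fetch_bean(host, port, "Hadoop:service=NameNode,name=FSNamesystemState")`, pass the result to `namenode_alerts`, and page only when the returned list is non-empty.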
MR Workload Alerting
» Monitoring MR workload and alerting
  – In-house tool that uses the “houdah” Ruby gem
  – Monitors long-running jobs, jobs with more map tasks, blacklisted TTs with more failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics

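The workload checks described above boil down to comparing per-job figures against thresholds. Below is a small sketch of that step only. It is not the in-house tool: the `JobSnapshot` record, its field names, and both thresholds are hypothetical, and a real tool would fill the records from the JT log or the metrics page.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical per-job record used only for this illustration."""
    job_id: str
    user: str
    runtime_minutes: float
    map_tasks: int


def rogue_jobs(jobs, max_runtime_minutes=240, max_map_tasks=50000):
    """Return (job_id, reasons) pairs for jobs that exceed either threshold."""
    flagged = []
    for job in jobs:
        reasons = []
        if job.runtime_minutes > max_runtime_minutes:
            reasons.append("long running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too many map tasks")
        if reasons:
            flagged.append((job.job_id, reasons))
    return flagged
```

Returning the reasons together with the job ID keeps the alert to one message per job, which helps keep alert noise down.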
Multi Tenancy
Modeling, OPS, ETL, Ad-hoc

Scheduling
No scheduler is perfect unless you understand and tune it properly

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP
» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster
[Diagram labels: INW Data Cluster; US Serving Clusters; EU Serving Clusters; HK Serving Clusters; Modeling; Reporting; User Queries; Amazon Backup; LSV Data Cluster; US/EU/HK Serving Clusters; Research; Ad-hoc Queries; Processed Data]

YARN
» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages life cycle of containers
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution

YARN at Rocket Fuel
» YARN is in production
» 700+ nodes
» 31 TB RAM, 8500 disks, 8500 cores
» Primary use case: Map-Reduce
» No more static slots
» Tez, Spark, and Storm are in the race
YAY

Obligatory “we are hiring” slide
http://rocketfuel.com/careers

THANKS
kishore@rocketfuel.com
apol@rocketfuel.com

Throughput
Requests per day
Facebook likes: 5 B
Searches on Google: 6 B
Bid requests considered by Rocketfuel: 45 B

Latency
Time (ms)
Blink of an eye: 400
SF to Tokyo network round trip: 100
One beat of a hummingbird’s wing: 20
Lookup in Blackbird: 2

Architecture and Scale
» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion
» abc

Data Center Design
• Racks custom-built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
Hadoop servers: 20 2U servers (85kW)
Bidders: 40 2-U Twin 2 servers (17kW)

Rocket Fuel Scale
» 34474 CPU processor cores
  – 2655 servers
  – 1874 teraflops of computing
» 188 terabytes of memory
  – 13X the memory of Watson, the IBM computer that played Jeopardy
» 42 PB of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel
» 1400 servers
» 15K disks
» 15K cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth
200 servers to 1400 servers
5 PB to 41 PB
8x

Data Architecture 30

Hadoop Setup
[Diagram: JT, Active NN and Standby NN (each with a ZKFC), QJM/ZK quorum, and DN/TT worker nodes]
» 6x2TB disks » 2x6 core » 196 GB RAM » 2x1G NIC
» 12x3TB disks » 2x6 core » 64 GB RAM » 10G NIC
» Same as DNs » Dedicated disk to ZK or JN

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy
Puppet + Infradb
Automation is key

Puppet and Infradb
» Automate as much as you can
» Adding a slave node to the Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config
Isn’t it cool?
Just define once

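Deriving MR slots from the hardware config, as the list above mentions, might look something like the sketch below. The talk does not give the actual rule, so everything here is an assumption made for illustration: the function name, the reserved cores and memory, the memory per slot, and the roughly 2:1 map-to-reduce split.

```python
def mr_slots(cores: int, ram_gb: int, gb_per_slot: int = 2,
             reserved_cores: int = 2, reserved_gb: int = 8):
    """Return (map_slots, reduce_slots) for one worker node.

    Hypothetical heuristic: slots are capped by whichever of CPU or memory
    runs out first, then split roughly 2:1 between maps and reduces.
    """
    by_cpu = max(cores - reserved_cores, 1)
    by_ram = max((ram_gb - reserved_gb) // gb_per_slot, 1)
    total = min(by_cpu, by_ram)
    reduce_slots = max(total // 3, 1)
    map_slots = max(total - reduce_slots, 1)
    return map_slots, reduce_slots


if __name__ == "__main__":
    # A node with 2x6 cores and 64 GB RAM.
    print(mr_slots(cores=12, ram_gb=64))
```

Computing the slot counts from facts about the machine, rather than hard-coding them per host, is what lets a new node type join the cluster without any hand-edited configuration.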
No issues when cluster is small Problems starts when it grows
Performance Tuning
Proprietary amp Confidential Copyright copy 2014
dfsnamenodehandlercount
dfsimagetransfertimeout
mapredreduceparallelcopies
mapredjobtrackerhandlercount
iosortmbiosortfactor
maxClientCnxns
ZK
HDFS
MR
IMP MAPREDUCE-2026
-XX+UseConcMarkSweepGC
-XXCMSFullGCsBeforeCompaction=1
-XXCMSInitiatingOccupancyFraction=60
ha-timeoutms
JVM
Performance Tuning
mapreducereduceshuffleparallelcopies
Proprietary amp Confidential Copyright copy 2014
MAPREDUCE-5351
MAPREDUCE-5508
keepfailedtaskfiles=true
We Have an Issue
Proprietary amp Confidential Copyright copy 2014
instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum
mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval
mapredjobtrackerretiredjobscachesize
JT OOM
Proprietary amp Confidential Copyright copy 2014
Operations
raquo Maintenance
raquo Performance Tuning
raquo Monitoring
raquo BCP
raquo YARN
Proprietary amp Confidential Copyright copy 2014
Monitoring
Wall of Ops
Proprietary amp Confidential Copyright copy 2014
Monitoring
hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm
Donrsquot fly blind you will crash
Proprietary amp Confidential Copyright copy 2014
MR Workload Monitoring
Proprietary amp Confidential Copyright copy 2014
Network Monitoring
Donrsquot blame network instead monitor it Network Mesh can be mess
Proprietary amp Confidential Copyright copy 2014
Alerting
Monitoring is not enough need better Alerting
Proprietary amp Confidential Copyright copy 2014
Alerts
httphostnameportjmx
qry=Hadoopservice=NameNodename=NameNodeInfo
gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT
qry=hadoopservice=JobTrackername=JobTrackerInfo
NameDirStatuses DeadNodes NumberOfMissingBlocks
qry=Hadoopservice=NameNodename=FSNamesystemState
FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks
Blacklisted TTrsquos jobs slots_used ThreadCount
qry=javalangtype=Memory
Used jvm free jvm etc
Proprietary amp Confidential Copyright copy 2014
MR Workload Alerting
raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted
TTrsquos with more failure counts etchellip
raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page
httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics
Proprietary amp Confidential Copyright copy 2014
Modeling
OPS
ETL
Ad-hoc
Multi Tenancy
Proprietary amp Confidential Copyright copy 2014
No Scheduler is perfect unless you understand and tune it properly
Scheduling
Proprietary amp Confidential Copyright copy 2014
Operations
raquo Maintenance
raquo Performance Tuning
raquo Monitoring
raquo BCP
raquo YARN
Proprietary amp Confidential Copyright copy 2014
BCP
raquo BCP Business Continuity Plan
raquo Near real time reporting over 15+ TB of daily data
raquo Freshness of models trained over petabytes of data
Proprietary amp Confidential Copyright copy 2014
Data BCP Cluster
INW Data
Cluster
US Serving Clusters
EU Serving Clusters
HK Serving Clusters
Modeling
Reporting
User Queries
Amazon Backup
LSV Data
Cluster
USEUHK Serving Clusters
Research
Ad-hoc Queries
Processed Data
Proprietary amp Confidential Copyright copy 2014
YARN
raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management
raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring
raquo Application Master- Per-application- Manages application scheduling and
task execution
Proprietary amp Confidential Copyright copy 2014
YARN at Rocket FueI
raquo Yarn is in production
raquo 700+ nodes
raquo 31TB RAM 8500 disks 8500 cores
raquo Primary use case Map-Reduce
raquo No more static slots
raquo Tez Spark Storm are in race
YAY

Obligatory "we are hiring" slide
http://rocketfuel.com/careers

THANKS
kishore@rocketfuel.com
apol@rocketfuel.com

Latency
Time (ms)
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Lookup in Blackbird: 2

Architecture and Scale
» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion
» abc

Data Center Design
• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
Hadoop Server: 20 2U servers (85kW)
Bidders: 40 2-U Twin 2 servers (17kW)

Rocket Fuel Scale
» 34474 CPU processor cores
– 2655 servers
– 1874 Teraflops of computing
» 188 Terabytes of memory
– 13X the memory of IBM's Watson computer that played Jeopardy
» 42 PB of storage
– 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel
» 1400 servers
» 15K Disks
» 15K Cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth
» 200 servers to 1400 servers
» 5 PB to 41 PB
» 8x

Data Architecture 30

Hadoop Setup
[Diagram. Components: QJM / ZK Quorum; JT; Active NN and Standby NN, each with a ZKFC; DN + TT worker nodes]
Hardware specs shown in the diagram:
» 6x2TB Disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB Disks, 2x6 core, 64 GB RAM, 10G NIC
» Same as DNs, with a dedicated disk for ZK or JN

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy
Puppet + Infradb
Automation is key

Puppet and Infradb
» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config
Isn't it cool?
Just define once
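As a rough illustration of deriving MR slots from a node's hardware facts, a rule like the following could be used. The slides do not say how Puppet and Infradb actually compute slots, so the reserved cores and memory, the per-slot memory, and the map/reduce split are all assumed placeholder values.

```python
def mr_slots(cores, ram_gb, reserved_cores=2, reserved_ram_gb=8,
             ram_per_slot_gb=2, map_fraction=0.67):
    """Derive (map_slots, reduce_slots) for one TaskTracker node.

    Every default here is a hypothetical placeholder, not Rocket Fuel's value.
    """
    usable_cores = max(cores - reserved_cores, 1)
    usable_ram_gb = max(ram_gb - reserved_ram_gb, ram_per_slot_gb)
    # A slot needs both a core and a memory share, so the tighter limit wins.
    total = min(usable_cores, usable_ram_gb // ram_per_slot_gb)
    map_slots = max(int(total * map_fraction), 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots
```

Under these assumed defaults, a node with 12 cores and 64 GB of RAM gets `mr_slots(12, 64) == (6, 4)`. Defining the rule once and feeding it per-node facts is what lets new hardware types join the cluster without hand-edited slot counts.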

Performance Tuning
No issues when the cluster is small. Problems start when it grows.

Performance Tuning
HDFS
» dfs.namenode.handler.count
» dfs.image.transfer.timeout
MR
» mapred.reduce.parallel.copies
» mapreduce.reduce.shuffle.parallelcopies
» mapred.job.tracker.handler.count
» io.sort.mb, io.sort.factor
» IMP: MAPREDUCE-2026
ZK
» maxClientCnxns
JVM
» -XX:+UseConcMarkSweepGC
» -XX:CMSFullGCsBeforeCompaction=1
» -XX:CMSInitiatingOccupancyFraction=60
ha-timeoutms

We Have an Issue
» MAPREDUCE-5351
» MAPREDUCE-5508
» keep.failed.task.files=true

JT OOM
Instances of the "JobInProgress" class = no. of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum
» mapred.jobtracker.completeuserjobs.maximum
» mapred.jobtracker.retirejob.interval
» mapred.jobtracker.retiredjobs.cache.size
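The relationship above can be turned into a quick back-of-the-envelope estimate. This is a sketch only: the function names are made up for illustration, and the per-job heap cost is something you would have to measure from a heap dump rather than a known constant.

```python
def retained_jobs(active_users, completeuserjobs_maximum):
    """Upper bound on completed JobInProgress objects kept in JT memory."""
    return active_users * completeuserjobs_maximum


def estimated_heap_mb(active_users, completeuserjobs_maximum, mb_per_job):
    """Rough heap held by retained jobs; mb_per_job is an assumed, measured cost."""
    return retained_jobs(active_users, completeuserjobs_maximum) * mb_per_job
```

For example, with a hypothetical 50 submitting users and a per-user limit of 100, up to 5000 completed jobs can stay in memory; at an assumed 2 MB each, that is about 10000 MB of JobTracker heap. Lowering the per-user limit or retiring jobs sooner shrinks that bound.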

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring
Wall of Ops

Monitoring
» hadoopnamenodeCallQueueLength
» hadoopjobtrackerjvmmemheapusedm
Don't fly blind; you will crash

MR Workload Monitoring

Network Monitoring
Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting
Monitoring is not enough; you need better alerting

Alerts
http://hostname:port/jmx
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT
qry=Hadoop:service=NameNode,name=NameNodeInfo
» NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
» FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
» Blacklisted TTs, jobs, slots_used, ThreadCount
qry=java.lang:type=Memory
» Used JVM, free JVM, etc.
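A custom check of the kind described above might look like the sketch below. It assumes the daemon's `/jmx` endpoint returns JSON of the form `{"beans": [...]}`; the function names and the threshold defaults are hypothetical, and the real scripts are not shown in the slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jmx_url(host, port, bean):
    """Build the /jmx URL for one MBean query (qry is URL-encoded)."""
    return f"http://{host}:{port}/jmx?" + urlencode({"qry": bean})


def fetch_bean(host, port, bean, timeout=10):
    """Fetch one MBean as a dict; returns {} if the daemon reports none."""
    with urlopen(jmx_url(host, port, bean), timeout=timeout) as resp:
        payload = json.load(resp)
    beans = payload.get("beans", [])
    return beans[0] if beans else {}


def check_fs_state(bean, max_dead=0, max_under_replicated=1000):
    """Return one summary alert string for FSNamesystemState, or None."""
    problems = []
    dead = bean.get("NumDeadDataNodes", 0)
    under = bean.get("UnderReplicatedBlocks", 0)
    if dead > max_dead:
        problems.append(f"{dead} dead DataNodes")
    if under > max_under_replicated:
        problems.append(f"{under} under-replicated blocks")
    return "; ".join(problems) if problems else None
```

A cron job could call `fetch_bean(host, port, "Hadoop:service=NameNode,name=FSNamesystemState")` and page only when `check_fs_state` returns a string. Folding several conditions into one message is one way to get the summary-style alerts the slide recommends.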
Hadoop at Rocket Fuel
raquo 1400 servers
raquo 15K Disks
raquo 15K Cores
raquo 90 TB
raquo 30K MR slots
raquo 12K daily MR jobs
Proprietary amp Confidential Copyright copy 2014
200 Servers 1400 Servers
5 PB
41 PB
8x
Growth
Proprietary amp Confidential Copyright copy 2014
Data Architecture 3.0

Hadoop Setup
[Diagram labels: QJM / ZK Quorum; JT; Active NN and Standby NN, with two ZKFCs; six DN + TT worker nodes]
Hardware specs shown on the slide:
» 6 x 2 TB disks, 2 x 6 cores, 196 GB RAM, 2 x 1G NIC
» 12 x 3 TB disks, 2 x 6 cores, 64 GB RAM, 10G NIC
» Same as DNs, with a dedicated disk for ZK or JN

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance Is Not Easy
Puppet + Infradb
Automation is key.

Puppet and Infradb
» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config
Isn't it cool?
Just define once.

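The slide does not say how MR slots are derived from the hardware config. As a rough sketch only, assuming a simple heuristic (cap the slot count by both memory and cores, then split about 2:1 between map and reduce), the calculation might look like the Python below. The function name, the reserved memory, the memory per slot, and the split ratio are all hypothetical values chosen for illustration, not the presenters' settings.

```python
def mr_slots(cores, ram_gb, reserved_gb=8, gb_per_slot=2, slots_per_core=2):
    """Return (map_slots, reduce_slots) for one TaskTracker node.

    Every default here is a hypothetical illustration value.
    """
    # Memory left for tasks after the OS and Hadoop daemons take their share.
    usable_gb = max(ram_gb - reserved_gb, 0)
    by_memory = usable_gb // gb_per_slot
    by_cpu = cores * slots_per_core
    # The scarcer resource wins; always keep at least one map and one reduce slot.
    total = max(min(by_memory, by_cpu), 2)
    map_slots = max(total * 2 // 3, 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots


if __name__ == "__main__":
    # A node with 2x6 cores and 64 GB RAM, as listed on the Hadoop Setup slide.
    print(mr_slots(cores=12, ram_gb=64))
```

A configuration-management template could call a function like this once per hardware profile, which is one way to read "just define once".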
Performance Tuning
No issues when the cluster is small. Problems start when it grows.

Performance Tuning
» HDFS: dfs.namenode.handler.count, dfs.image.transfer.timeout
» MR: mapred.reduce.parallel.copies, mapreduce.reduce.shuffle.parallelcopies, mapred.job.tracker.handler.count, io.sort.mb, io.sort.factor
» Important: MAPREDUCE-2026
» ZK: maxClientCnxns
» JVM: -XX:+UseConcMarkSweepGC, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSInitiatingOccupancyFraction=60
» ha-timeoutms

We Have an Issue
» MAPREDUCE-5351
» MAPREDUCE-5508
» keep.failed.task.files=true

JT OOM
» Number of instances of the "JobInProgress" class = number of users who submitted jobs x mapred.jobtracker.completeuserjobs.maximum
» mapred.jobtracker.completeuserjobs.maximum
» mapred.jobtracker.retirejob.interval
» mapred.jobtracker.retiredjobs.cache.size

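The formula on the JT OOM slide can be turned into a quick back-of-the-envelope estimate. The sketch below is only an illustration: the function names and the average in-memory size per retained job are assumptions, and real JobInProgress footprints vary widely with the number of tasks per job.

```python
def retained_jobs(num_users, completed_jobs_per_user):
    """Upper bound on completed JobInProgress objects held by the JobTracker.

    completed_jobs_per_user stands in for
    mapred.jobtracker.completeuserjobs.maximum.
    """
    return num_users * completed_jobs_per_user


def retained_heap_mb(num_users, completed_jobs_per_user, avg_job_mb):
    """Rough heap estimate; avg_job_mb is a hypothetical per-job footprint."""
    return retained_jobs(num_users, completed_jobs_per_user) * avg_job_mb


if __name__ == "__main__":
    # 50 users, 100 retained jobs each, 2 MB per job (all illustrative numbers).
    print(retained_jobs(50, 100), retained_heap_mb(50, 100, 2))
    # Lowering the per-user limit to 25 shrinks the bound proportionally.
    print(retained_jobs(50, 25), retained_heap_mb(50, 25, 2))
```

Because the bound grows with the number of submitting users, lowering the per-user limit is the knob that scales it back down.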
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring
Wall of Ops
Monitoring
» hadoop.namenode.CallQueueLength
» hadoop.jobtracker.jvm.memheapusedm
Don't fly blind; you will crash.

MR Workload Monitoring
Network Monitoring
Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting
Monitoring is not enough; you need better alerting.

Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query JMX for NN and JT
http://hostname:port/jmx
» qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks
» qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
» qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TTs, jobs, slots_used, ThreadCount
» qry=java.lang:type=Memory
  Used JVM, free JVM, etc.

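The custom scripts themselves are not shown in the deck. A minimal sketch of the idea, assuming the /jmx servlet returns JSON of the form {"beans": [...]} and using only the FSNamesystemState attributes named on the slide, might look like this in Python. The function names and the under-replication threshold are hypothetical, and the checks are folded into one summary line to keep alert noise down.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_bean(host, port, query, timeout=10):
    """Fetch the first bean matching `query` from a Hadoop /jmx servlet."""
    url = "http://%s:%d/jmx?qry=%s" % (host, port, quote(query, safe=":=,"))
    with urlopen(url, timeout=timeout) as response:
        beans = json.load(response).get("beans", [])
    return beans[0] if beans else {}


def namenode_problems(fs_state, max_under_replicated=1000):
    """Check an FSNamesystemState bean; the threshold is a hypothetical value."""
    problems = []
    if fs_state.get("FSState") != "Operational":
        problems.append("FSState is %s" % fs_state.get("FSState"))
    dead = fs_state.get("NumDeadDataNodes", 0)
    if dead > 0:
        problems.append("%d dead DataNodes" % dead)
    under = fs_state.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:
        problems.append("%d under-replicated blocks" % under)
    return problems


def summarize(problems):
    """Collapse all findings into a single aggregate alert line."""
    if not problems:
        return "OK"
    return "ALERT (%d): %s" % (len(problems), "; ".join(problems))
```

In use, a cron job or monitoring check would call fetch_bean(host, port, "Hadoop:service=NameNode,name=FSNamesystemState"), pass the result to namenode_problems, and page only on the summarized line.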
MR Workload Alerting
» Monitoring MR workload and alerting
  – An in-house tool that uses the "houdah" Ruby gem monitors:
  – Long-running jobs, jobs with more map tasks, blacklisted TTs with more failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
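The in-house tool is not shown, and it was built on a Ruby gem. Purely as an illustration of the kind of rule it might apply, the Python sketch below flags long-running jobs and jobs with an unusually large number of map tasks. The record fields (id, start_time, map_tasks) and both thresholds are hypothetical; in practice the records would come from the JobTracker, its logs, or the metrics page.

```python
def flag_jobs(jobs, now, max_runtime_s=6 * 3600, max_map_tasks=50000):
    """Return (job_id, reason) pairs for jobs that break a workload rule.

    `jobs` is a list of dicts with hypothetical keys: id, start_time
    (epoch seconds), and map_tasks. Thresholds are illustrative only.
    """
    flagged = []
    for job in jobs:
        runtime = now - job["start_time"]
        if runtime > max_runtime_s:
            flagged.append((job["id"], "running for %d s" % runtime))
        if job["map_tasks"] > max_map_tasks:
            flagged.append((job["id"], "%d map tasks" % job["map_tasks"]))
    return flagged


if __name__ == "__main__":
    sample = [
        {"id": "job_1", "start_time": 0, "map_tasks": 100},
        {"id": "job_2", "start_time": 30000, "map_tasks": 80000},
    ]
    for job_id, reason in flag_jobs(sample, now=36000):
        print(job_id, reason)
```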
Puppet and Infradb
raquo Automate as much as you can
raquo Adding a slave node to Hadoop cluster lt 120 seconds
raquo Bringing up a new Hadoop cluster lt 500 seconds
raquo MR slots are automatically determined based on hardware config
Isnrsquot it cool
Just define once
Proprietary amp Confidential Copyright copy 2014
No issues when cluster is small Problems starts when it grows
Performance Tuning

Performance Tuning
HDFS: dfs.namenode.handler.count, dfs.image.transfer.timeout
MR: mapred.reduce.parallel.copies, mapreduce.reduce.shuffle.parallelcopies, mapred.job.tracker.handler.count, io.sort.mb, io.sort.factor
IMP: MAPREDUCE-2026
ZK: maxClientCnxns
JVM: -XX:+UseConcMarkSweepGC, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSInitiatingOccupancyFraction=60
ha-timeoutms

We Have an Issue
MAPREDUCE-5351
MAPREDUCE-5508
keep.failed.task.files=true

JT OOM
Instances of the "JobInProgress" class = number of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum
mapred.jobtracker.completeuserjobs.maximum
mapred.jobtracker.retirejob.interval
mapred.jobtracker.retiredjobs.cache.size
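
The relationship on this slide can be turned into a quick back-of-the-envelope estimate. This is a sketch only: the user count, the per-user limit, and especially the heap cost per retained job (`kb_per_job`) are made-up example values, not figures from the talk.

```python
def retained_job_objects(users_with_jobs, completed_jobs_per_user):
    """Upper bound on JobInProgress instances the JobTracker keeps in memory.

    completed_jobs_per_user stands for the value of
    mapred.jobtracker.completeuserjobs.maximum.
    """
    return users_with_jobs * completed_jobs_per_user


def estimated_heap_mb(users_with_jobs, completed_jobs_per_user, kb_per_job):
    """Rough heap held by retained jobs; kb_per_job is a hypothetical average."""
    jobs = retained_job_objects(users_with_jobs, completed_jobs_per_user)
    return jobs * kb_per_job // 1024


if __name__ == "__main__":
    # Example: 50 users, 100 completed jobs kept per user, ~512 KB per job.
    print(retained_job_objects(50, 100), "retained jobs")
    print(estimated_heap_mb(50, 100, kb_per_job=512), "MB of heap")
```

Because the count scales with the number of users, lowering the per-user limit is the most direct lever when the JobTracker heap fills up with completed jobs.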

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring
Wall of Ops

Monitoring
hadoop.namenode.CallQueueLength
hadoop.jobtracker.jvm.memheapusedm
Don't fly blind, you will crash

MR Workload Monitoring

Network Monitoring
Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting
Monitoring is not enough; you need better alerting.

Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT
http://hostname:port/jmx
qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TT's, jobs, slots_used, ThreadCount
qry=java.lang:type=Memory
  Used JVM, free JVM, etc.
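
A custom JMX check of the kind described above might look roughly like this. It is a sketch under assumptions, not the scripts used at Rocket Fuel: the function names and thresholds are invented, and it relies on the /jmx servlet returning JSON with a top-level "beans" list and on FSState reading "Operational" when the NameNode is out of safe mode. Several checks are folded into one line to match the summary/aggregate alert advice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FS_STATE_QUERY = "Hadoop:service=NameNode,name=FSNamesystemState"


def fetch_bean(host, port, query, timeout=5.0):
    """Return the first bean the /jmx servlet reports for `query`."""
    params = urlencode({"qry": query})
    url = f"http://{host}:{port}/jmx?{params}"
    with urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    if not beans:
        raise LookupError(f"no bean matched {query!r}")
    return beans[0]


def summarize_namenode(bean, max_dead_nodes=0, max_under_replicated=0):
    """Fold several NameNode checks into one alert line, or None if healthy."""
    problems = []
    if bean.get("FSState") != "Operational":
        problems.append(f"FSState={bean.get('FSState')}")
    dead = bean.get("NumDeadDataNodes", 0)
    if dead > max_dead_nodes:
        problems.append(f"{dead} dead DataNodes")
    under = bean.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:
        problems.append(f"{under} under-replicated blocks")
    if not problems:
        return None
    return "NameNode: " + "; ".join(problems)
```

Usage would be along the lines of `summarize_namenode(fetch_bean("hostname", port, FS_STATE_QUERY))`, with the host and port of the NameNode web UI filled in.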

MR Workload Alerting
» Monitoring MR workload and alerting
  – In-house tool, built on the "houdah" Ruby gem, monitors long-running jobs, jobs with many map tasks, blacklisted TT's with high failure counts, etc.
» Collect details and auto-restart blacklisted TT's
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hraven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
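
The in-house tool itself is not shown in the deck, so the following is only a sketch of the rules it is described as applying. The `JobInfo` record, the thresholds, and both function names are hypothetical; how the job and TaskTracker data is collected (JT log parsing, the houdah gem, or the metrics page) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobInfo:
    """Hypothetical summary of one running job."""
    job_id: str
    runtime_s: int
    map_tasks: int


def flag_jobs(jobs, max_runtime_s=6 * 3600, max_map_tasks=50_000):
    """Map job_id to the reasons it looks like a rogue job."""
    flagged = {}
    for job in jobs:
        reasons = []
        if job.runtime_s > max_runtime_s:
            reasons.append("long-running")
        if job.map_tasks > max_map_tasks:
            reasons.append("many map tasks")
        if reasons:
            flagged[job.job_id] = reasons
    return flagged


def trackers_to_restart(failure_counts, min_failures=4):
    """Pick blacklisted TaskTrackers whose failure count reaches the threshold."""
    return sorted(tt for tt, n in failure_counts.items() if n >= min_failures)


if __name__ == "__main__":
    jobs = [JobInfo("job_1", 7200, 100), JobInfo("job_2", 30000, 100)]
    print(flag_jobs(jobs))
    print(trackers_to_restart({"tt1": 1, "tt2": 9}))
```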

Multi Tenancy
Modeling
OPS
ETL
Ad-hoc

Scheduling
No scheduler is perfect unless you understand and tune it properly.
Proprietary amp Confidential Copyright copy 2014
MR Workload Monitoring
Proprietary amp Confidential Copyright copy 2014
Network Monitoring
Donrsquot blame network instead monitor it Network Mesh can be mess
Proprietary amp Confidential Copyright copy 2014
Alerting
Monitoring is not enough need better Alerting
Proprietary amp Confidential Copyright copy 2014
Alerts
httphostnameportjmx
qry=Hadoopservice=NameNodename=NameNodeInfo
gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT
qry=hadoopservice=JobTrackername=JobTrackerInfo
NameDirStatuses DeadNodes NumberOfMissingBlocks
qry=Hadoopservice=NameNodename=FSNamesystemState
FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks
Blacklisted TTrsquos jobs slots_used ThreadCount
qry=javalangtype=Memory
Used jvm free jvm etc
Proprietary amp Confidential Copyright copy 2014
MR Workload Alerting
» Monitor the MR workload and alert
  – In-house tool that uses the "houdah" Ruby gem monitors:
  – Long-running jobs, jobs with many map tasks, blacklisted TTs with high failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics

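Once job and task details have been collected (from the JT log, the scheduler page, or the metrics page), the alerting rules themselves can be quite small. The sketch below is one possible shape for them; the record fields, limits, and function names are hypothetical and are not taken from the in-house tool.

```python
from collections import Counter

# Hypothetical limits; the slides do not state the values used in production.
MAX_RUNTIME_SECS = 6 * 3600
MAX_MAP_TASKS = 50000
MAX_TT_FAILURES = 4


def flag_jobs(jobs, now):
    """Return (job_id, reason) pairs for jobs that look suspicious.

    Each job is a dict with hypothetical keys:
    id, start_time (epoch seconds), map_tasks.
    """
    flagged = []
    for job in jobs:
        if now - job["start_time"] > MAX_RUNTIME_SECS:
            flagged.append((job["id"], "long running"))
        if job["map_tasks"] > MAX_MAP_TASKS:
            flagged.append((job["id"], "too many map tasks"))
    return flagged


def trackers_to_restart(failed_task_hosts):
    """Given one hostname per failed task, return hosts over the failure limit."""
    counts = Counter(failed_task_hosts)
    return sorted(host for host, n in counts.items() if n > MAX_TT_FAILURES)
```

Keeping the rules as pure functions over already-parsed records means the same checks work no matter which source the job details were scraped from.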
Multi Tenancy
[Diagram labels: Modeling, OPS, ETL, Ad-hoc]

Scheduling
No scheduler is perfect unless you understand it and tune it properly.

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP
» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster
[Diagram labels: INW Data Cluster; US Serving Clusters; EU Serving Clusters; HK Serving Clusters; Modeling; Reporting; User Queries; Amazon Backup; LSV Data Cluster; US/EU/HK Serving Clusters; Research; Ad-hoc Queries; Processed Data]

YARN
» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages the life cycle of containers
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution

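To make "hierarchical queues" concrete, here is a toy sketch of how a queue's share of the whole cluster follows from the percentages assigned at each level of a queue tree. The queue names and percentages are invented for illustration and are not the Rocket Fuel configuration; real YARN schedulers layer further behavior (such as borrowing idle capacity) on top of this basic arithmetic.

```python
def absolute_shares(tree, parent_share=1.0, prefix="root"):
    """Flatten a queue tree into {queue_path: fraction of the whole cluster}.

    `tree` maps a queue name to (percent_of_parent, children_dict).
    """
    shares = {}
    for name, (percent, children) in tree.items():
        path = prefix + "." + name
        share = parent_share * percent / 100.0
        shares[path] = share
        # Children split their parent's share, not the whole cluster.
        shares.update(absolute_shares(children, share, path))
    return shares


# Hypothetical two-level layout.
queues = {
    "prod": (70, {"etl": (60, {}), "modeling": (40, {})}),
    "adhoc": (30, {}),
}
shares = absolute_shares(queues)
```

Here root.prod.etl ends up with 60% of prod's 70%, that is, 42% of the cluster, which is the kind of number worth checking before tuning any scheduler.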
YARN at Rocket Fuel
» YARN is in production
» 700+ nodes
» 31 TB RAM, 8,500 disks, 8,500 cores
» Primary use case: MapReduce
» No more static slots
» Tez, Spark, and Storm are in the race
YAY

Obligatory "we are hiring" slide
http://rocketfuel.com/careers

THANKS
kishore@rocketfuel.com
apol@rocketfuel.com