Hado "OPS" or Had "oops"

Proprietary & Confidential. Copyright © 2014. Hado’ops’ or Had’oops’. We’re Hiring: rocketfuel.com/careers. Kishore Kumar Yellamraju, Abhijit Pol.

Upload: hadoop-summit

Posted on 14-Jul-2015

TRANSCRIPT

Page 1: Hado "OPS" or Had "oops"

Hado’ops’ or Had’oops’

We’re Hiring: rocketfuel.com/careers

Kishore Kumar Yellamraju, Abhijit Pol

The Web Is Monetized By Advertising

Delivery Methods

» Display
» Video
» Mobile
» Social

Overview (diagram)

Ad serving flow: 1 Page Request, 2 Ad Request, 3 Bid Request, 4 Bid & Ad, 5 Rocketfuel Winning Ad, 6 Ad Served.

Participants: Browser, Publishers, Ad Exchange (some exchange partners shown), Advertisers, Data Partners. The Rocket Fuel Platform box contains the Real-time Bidder (Automated Decisions), Models (Refresh, learning) and the Data Store. The diagram also shows Ads & Budget, User Segments, User Engagements, Model Scores, Events and Optimize.

Real Time Bidding and Serving

(Figure: candidate impressions, each described by Site, Page, Geo, Weather, Time of Day, Brand Affinity and User, with a dollar value per impression. The individual values are not recoverable from the slide.)

Always buying the best impressions & serving the best ad

Real Time Bidding and Serving

(Figure: three Impression Scorecards, one per campaign goal: Leads & sales, Coupon downloads, Brand awareness. Each scores an impression on Demo, Brand Affinity, Time of Day, Geo, Weather, Site, Page, Ad Position, In-market and Behavior. Each feature carries a signed contribution, and the card ends in a resulting Response. One response is marked X. The individual values are not recoverable from the slide.)
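The scorecard idea can be illustrated with a toy additive model. Each feature of an impression contributes a signed score toward a campaign goal, and the sum is the predicted response. This is a hypothetical sketch only: the weights in `SCORECARDS` are invented, and the talk does not describe Rocket Fuel's actual models.

```python
# Hypothetical per-goal feature contributions (invented numbers for illustration).
SCORECARDS = {
    "Leads & sales":    {"Brand Affinity": 0.40, "In-market": 0.35, "Time of Day": -0.20},
    "Coupon downloads": {"Brand Affinity": 0.10, "In-market": -0.25, "Geo": 0.05},
    "Brand awareness":  {"Site": 0.20, "Ad Position": 0.10, "Demo": -0.10},
}

def response_score(goal, impression_features):
    """Sum the contributions of the features this impression has for one goal."""
    card = SCORECARDS[goal]
    return sum(card.get(feature, 0.0) for feature in impression_features)

# One impression, described by the features it exhibits.
impression = {"Brand Affinity", "In-market", "Time of Day", "Geo"}
scores = {goal: round(response_score(goal, impression), 2) for goal in SCORECARDS}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Choosing the goal with the highest score loosely mirrors the slide's "serving the best ad" idea. It is meant only as an analogy.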

Throughput (requests per day)

» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid requests considered by Rocket Fuel: 45 B
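The daily figures can be put in per-second terms with a quick calculation. For illustration only, it assumes the requests are spread evenly over 24 hours. Real traffic peaks well above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_second(requests_per_day: float) -> float:
    """Average request rate, assuming traffic is spread evenly across the day."""
    return requests_per_day / SECONDS_PER_DAY

daily = {
    "Facebook likes": 5e9,
    "Searches on Google": 6e9,
    "Bid requests considered by Rocket Fuel": 45e9,
}

for label, per_day in daily.items():
    print(f"{label}: ~{average_rate_per_second(per_day):,.0f} per second")
```

On that even-arrival assumption, 45 B bid requests a day averages out to roughly 520,000 requests per second.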

Latency (time in ms)

» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird’s wing: 20
» Lookup in Blackbird: 2
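The throughput and latency figures can be combined through Little's law (items in flight = arrival rate × time in system). This gives a rough sense of scale. It is an illustrative estimate, not a figure from the talk, and it rests on two assumptions: every bid request triggers exactly one 2 ms Blackbird lookup, and requests arrive evenly over the day.

```python
SECONDS_PER_DAY = 86_400

def in_flight(rate_per_second: float, latency_ms: float) -> float:
    """Little's law: average number of requests in flight = rate x latency."""
    return rate_per_second * (latency_ms / 1000.0)

# Average bid-request rate under the even-arrival assumption (~520,833/s).
bid_rate = 45e9 / SECONDS_PER_DAY

# Assumption: one 2 ms lookup per bid request.
print(f"Lookups in flight at 2 ms each: ~{in_flight(bid_rate, 2):,.0f}")
```

Under these assumptions, about a thousand lookups are outstanding at any moment on average. Peak-hour concurrency would be higher.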

Architecture and Scale

» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion

Data Center Design

• Racks custom built at Rocket Fuel
• Leased space and bandwidth in colocation facilities

Hadoop servers: 20 2U servers (8.5 kW)
Bidders: 40 2U Twin² servers (17 kW)

Rocket Fuel Scale

» 34474 CPU processor cores
  – 2655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of IBM’s Watson, the computer that played Jeopardy!
» 42 PB of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel

» 1400 servers
» 15K disks
» 15K cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth

» 200 servers → 1400 servers
» 5 PB → 41 PB
» 8x

Data Architecture 3.0

Hadoop Setup (diagram)

» Active NN and Standby NN, each paired with a ZKFC
» QJM and ZK quorum
» JT
» Worker nodes, each running a DN and a TT

Hardware:
» Masters (NN, JT): 6x2TB disks, 2x6 core, 196 GB RAM, 2x1G NIC
» DN/TT nodes: 12x3TB disks, 2x6 core, 64 GB RAM, 10G NIC
» QJM/ZK nodes: same as DN’s, with a dedicated disk for ZK or JN
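The HA part of this layout consists of an active/standby NameNode pair, a QJM JournalNode quorum, and ZKFC failover through a ZooKeeper quorum. It is driven by a handful of standard HDFS properties. The sketch below renders them as Hadoop-style XML. The nameservice ID `rfcluster`, the NameNode IDs `nn1`/`nn2`, and all hostnames are hypothetical placeholders, not Rocket Fuel's values.

```python
import xml.etree.ElementTree as ET

def ha_properties(nameservice, namenodes, journalnodes, zk_quorum):
    """Return (name, value) pairs for QJM-based NameNode HA with automatic failover."""
    props = [
        ("dfs.nameservices", nameservice),
        (f"dfs.ha.namenodes.{nameservice}", ",".join(namenodes)),
    ]
    for nn_id, host in namenodes.items():
        # 8020 is the conventional NameNode RPC port.
        props.append((f"dfs.namenode.rpc-address.{nameservice}.{nn_id}", f"{host}:8020"))
    # Shared edits go to the JournalNode quorum (QJM); 8485 is the default JN RPC port.
    jn_hosts = ";".join(f"{h}:8485" for h in journalnodes)
    props.append(("dfs.namenode.shared.edits.dir", f"qjournal://{jn_hosts}/{nameservice}"))
    props.append((f"dfs.client.failover.proxy.provider.{nameservice}",
                  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"))
    # Automatic failover via the ZKFCs; ha.zookeeper.quorum normally lives in core-site.xml.
    props.append(("dfs.ha.automatic-failover.enabled", "true"))
    props.append(("ha.zookeeper.quorum", ",".join(f"{h}:2181" for h in zk_quorum)))
    return props

def to_configuration_xml(props):
    """Render (name, value) pairs in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical hostnames and IDs, for illustration only.
props = ha_properties(
    "rfcluster",
    {"nn1": "nn1.example.com", "nn2": "nn2.example.com"},
    ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
    ["zk1.example.com", "zk2.example.com", "zk3.example.com"],
)
print(to_configuration_xml(props))
```

A real deployment also needs fencing (`dfs.ha.fencing.methods`) and the NameNode HTTP addresses. Both are omitted here for brevity.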

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn’t it cool? Just define once.
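The slide says MR slots are derived from each node's hardware but does not give the rule. The sketch below shows how such a rule could look, using a hypothetical heuristic:

- about 1.5 slots per core
- a fixed memory reservation for the OS and the DataNode/TaskTracker daemons
- a fixed heap per task
- a 2:1 map-to-reduce split

The results map onto the real Hadoop 1 properties `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, which could then be rendered by a Puppet template, for instance.

```python
from dataclasses import dataclass

@dataclass
class NodeHardware:
    cores: int
    ram_gb: int

# Hypothetical tuning constants; adjust to your own workload.
SLOTS_PER_CORE = 1.5
RESERVED_RAM_GB = 8      # OS + DataNode + TaskTracker daemons
HEAP_PER_TASK_GB = 2     # e.g. -Xmx2g per child task
MAP_FRACTION = 2 / 3     # two map slots for every reduce slot

def mr_slots(hw: NodeHardware) -> dict:
    """Derive map/reduce slot counts, bounded by both CPU and memory."""
    by_cpu = int(hw.cores * SLOTS_PER_CORE)
    by_ram = (hw.ram_gb - RESERVED_RAM_GB) // HEAP_PER_TASK_GB
    total = max(2, min(by_cpu, by_ram))       # keep at least one map and one reduce slot
    maps = max(1, round(total * MAP_FRACTION))
    reduces = max(1, total - maps)
    return {
        "mapred.tasktracker.map.tasks.maximum": maps,
        "mapred.tasktracker.reduce.tasks.maximum": reduces,
    }

# The DN/TT hardware from the Hadoop Setup slide: 2x6 cores, 64 GB RAM.
print(mr_slots(NodeHardware(cores=12, ram_gb=64)))
```

On the DN/TT hardware this heuristic is CPU-bound, giving 12 map and 6 reduce slots. On a memory-poor node, the RAM bound takes over instead.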

Performance Tuning

No issues when the cluster is small; problems start when it grows.

Performance Tuning

» ZK: maxClientCnxns
» HDFS: dfs.namenode.handler.count, dfs.image.transfer.timeout, ha.*-timeout.ms
» MR: mapred.job.tracker.handler.count, mapred.reduce.parallel.copies / mapreduce.reduce.shuffle.parallelcopies, io.sort.mb, io.sort.factor; important: MAPREDUCE-2026
» JVM: -XX:+UseConcMarkSweepGC, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSInitiatingOccupancyFraction=60
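The JVM flags on the slide do three things:

- enable the CMS collector
- limit how many full collections may run before the old generation is compacted
- start a concurrent cycle once the old generation is 60% full

The sketch below assembles them into an options string, for example for `HADOOP_JOBTRACKER_OPTS` or `HADOOP_NAMENODE_OPTS` in `hadoop-env.sh`. It also computes the old-generation occupancy at which CMS would start. The heap and old-generation sizes are hypothetical. `-XX:+UseCMSInitiatingOccupancyOnly`, which makes the JVM honour the fraction rather than its own heuristics, is an addition that is not on the slide.

```python
# JVM flags from the slide.
SLIDE_FLAGS = [
    "-XX:+UseConcMarkSweepGC",
    "-XX:CMSFullGCsBeforeCompaction=1",
    "-XX:CMSInitiatingOccupancyFraction=60",
]
# Not on the slide; an assumption that pins CMS to the fraction above.
EXTRA_FLAGS = ["-XX:+UseCMSInitiatingOccupancyOnly"]

def jvm_opts(heap_gb: int) -> str:
    """Build a daemon options string with a fixed-size heap."""
    heap = [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]
    return " ".join(heap + SLIDE_FLAGS + EXTRA_FLAGS)

def cms_trigger_gb(old_gen_gb: float, fraction_pct: int = 60) -> float:
    """Old-generation occupancy at which a concurrent CMS cycle starts."""
    return old_gen_gb * fraction_pct / 100

print(f'HADOOP_JOBTRACKER_OPTS="{jvm_opts(16)}"')
print(cms_trigger_gb(12.0))  # hypothetical 12 GB old generation
```

Starting CMS early, at 60% occupancy, trades some extra GC work for headroom. The aim is that the concurrent cycle finishes before the old generation fills and forces a stop-the-world full GC.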

We Have an Issue

» MAPREDUCE-5351
» MAPREDUCE-5508
» keep.failed.task.files=true

JT OOM

Instances of the “JobInProgress” class = no. of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum

Related settings: mapred.jobtracker.completeuserjobs.maximum, mapred.jobtracker.retirejob.interval, mapred.jobtracker.retiredjobs.cache.size
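The slide's formula makes the JobTracker's memory exposure easy to estimate. Completed jobs are kept in memory per user until they are retired, so retained `JobInProgress` objects grow with the number of distinct submitting users. Below is a back-of-the-envelope sketch. The user count and the per-job memory footprint are hypothetical inputs, not measurements from the talk, and real footprints vary widely with task and counter counts. The limit of 100 is the stock Hadoop 1 default for `mapred.jobtracker.completeuserjobs.maximum`.

```python
def retained_jobs(submitting_users: int, completeuserjobs_maximum: int) -> int:
    """Upper bound on completed JobInProgress objects held by the JobTracker,
    per the slide: users x mapred.jobtracker.completeuserjobs.maximum."""
    return submitting_users * completeuserjobs_maximum

def retained_heap_gb(jobs: int, mb_per_job: float) -> float:
    """Rough heap those jobs pin, given an assumed average size per job."""
    return jobs * mb_per_job / 1024

# Hypothetical inputs: 50 users, per-user limit of 100, ~5 MB per retained job.
jobs = retained_jobs(50, 100)
print(jobs, "jobs,", round(retained_heap_gb(jobs, 5), 1), "GB")

# Lowering the per-user limit shrinks the bound proportionally.
print(retained_jobs(50, 10), "jobs")
```

Because the bound scales linearly with both factors, lowering the per-user limit is an effective lever. So is retiring jobs sooner, via the retirejob interval and the retired-jobs cache size.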

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring

Wall of Ops

Monitoring

Example metrics: hadoop.namenode.CallQueueLength, hadoop.jobtracker.jvm.memheapusedm

Don’t fly blind, you will crash.

MR Workload Monitoring

Network Monitoring

Don’t blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting.

Alerts

» Checking whether NN and JT are up is a no-brainer
» Reduce alert noise by having summary/aggregate alerts
» We heavily rely on custom scripts that query http://hostname:port/jmx for NN and JT:
  – qry=Hadoop:service=NameNode,name=NameNodeInfo: NameDirStatuses, DeadNodes, NumberOfMissingBlocks
  – qry=Hadoop:service=NameNode,name=FSNamesystemState: FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
  – qry=hadoop:service=JobTracker,name=JobTrackerInfo: blacklisted TT’s, jobs, slots_used, ThreadCount
  – qry=java.lang:type=Memory: used JVM memory, free JVM memory, etc.
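A minimal sketch of such a custom check follows. It assumes the NameNode's `/jmx` JSON servlet, which returns `{"beans": [...]}`, and the `FSNamesystemState` attributes listed above. `CapacityTotal`, which is not on the slide, is assumed in order to compute a free-space ratio. The host, the port (50070 is the usual NameNode HTTP port), the thresholds and the alert wording are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

def fetch_bean(base_url: str, query: str) -> dict:
    """Fetch the first MBean matching `query` from a Hadoop /jmx JSON endpoint."""
    url = f"{base_url}/jmx?" + urllib.parse.urlencode({"qry": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        beans = json.load(resp)["beans"]
    return beans[0] if beans else {}

def check_fsnamesystem(bean: dict, max_dead: int = 0,
                       max_under_replicated: int = 10_000,
                       min_free_fraction: float = 0.10) -> list:
    """Turn one FSNamesystemState bean into alert messages (example thresholds)."""
    alerts = []
    state = bean.get("FSState")
    if state != "Operational":  # e.g. "safeMode"
        alerts.append(f"FSState is {state}")
    dead = bean.get("NumDeadDataNodes", 0)
    if dead > max_dead:
        alerts.append(f"{dead} dead DataNodes")
    under = bean.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:
        alerts.append(f"{under} under-replicated blocks")
    total = bean.get("CapacityTotal", 0)
    if total and bean.get("CapacityRemaining", 0) / total < min_free_fraction:
        alerts.append("HDFS capacity remaining below threshold")
    return alerts

# Live use (placeholder host):
# bean = fetch_bean("http://namenode.example.com:50070",
#                   "Hadoop:service=NameNode,name=FSNamesystemState")
sample = {"FSState": "Operational", "NumDeadDataNodes": 2,
          "UnderReplicatedBlocks": 15, "CapacityTotal": 100, "CapacityRemaining": 40}
print(check_fsnamesystem(sample))  # ['2 dead DataNodes']
```

Returning a list of messages, rather than paging per condition, makes it easy to roll several checks into one summary alert. That follows the slide's advice on reducing noise.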

MR Workload Alerting

» Monitoring MR workload and alerting
  – In-house tool that uses the “houdah” Ruby gem
  – Monitors long-running jobs, jobs with more map tasks, blacklisted TT’s, TT’s with more failure counts, etc.
» Collect details and auto-restart blacklisted TT’s
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page:
  http://<JT-hostname>:50030/scheduler?advanced
  http://<JT-hostname>:50030/metrics
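The "long-running jobs" and "jobs with more map tasks" checks reduce to simple threshold rules once job details have been collected. The details could come from the JT log, the scheduler page, or a tool such as hRaven. This sketch is in Python rather than the in-house Ruby tool. The `Job` record, its fields, the job IDs and the thresholds are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    user: str
    start_epoch_s: float
    num_maps: int

def rogue_jobs(jobs, now_s, max_runtime_h=6.0, max_maps=50_000):
    """Flag jobs that exceed a runtime or map-count threshold."""
    flagged = []
    for job in jobs:
        reasons = []
        runtime_h = (now_s - job.start_epoch_s) / 3600
        if runtime_h > max_runtime_h:
            reasons.append(f"running {runtime_h:.1f} h")
        if job.num_maps > max_maps:
            reasons.append(f"{job.num_maps} map tasks")
        if reasons:
            flagged.append((job.job_id, job.user, reasons))
    return flagged

now = 1_700_000_000.0  # fixed "current time" so the example is reproducible
jobs = [
    Job("job_201407140000_0001", "etl", now - 8 * 3600, 1_200),
    Job("job_201407140000_0002", "adhoc", now - 600, 120_000),
    Job("job_201407140000_0003", "modeling", now - 3600, 5_000),
]
for job_id, user, reasons in rogue_jobs(jobs, now):
    print(job_id, user, ", ".join(reasons))
```

Keeping the reasons per job makes the resulting alert actionable: it says both which job to look at and why it was flagged.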

Multi Tenancy

» Modeling
» OPS
» ETL
» Ad-hoc

Scheduling

No scheduler is perfect unless you understand it and tune it properly.
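As one concrete example of such tuning, the sketch below writes a Hadoop 1 Fair Scheduler allocation file with one pool per tenant from the multi-tenancy slide. Both the choice of the Fair Scheduler and every number in it are hypothetical starting points: minimum map/reduce shares, weights and running-job limits alike. The talk does not say which scheduler or settings Rocket Fuel used.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-tenant settings: (minMaps, minReduces, weight, maxRunningJobs)
POOLS = {
    "Modeling": (6000, 1500, 2.0, 50),
    "OPS":      (2000, 500, 3.0, 50),
    "ETL":      (8000, 2000, 2.0, 100),
    "Ad-hoc":   (1000, 200, 1.0, 20),
}

def allocations_xml(pools, user_max_jobs_default=10):
    """Render a Fair Scheduler <allocations> document with one <pool> per tenant."""
    root = ET.Element("allocations")
    for name, (min_maps, min_reduces, weight, max_jobs) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "minMaps").text = str(min_maps)
        ET.SubElement(pool, "minReduces").text = str(min_reduces)
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "maxRunningJobs").text = str(max_jobs)
    # Default cap on concurrently running jobs per user.
    ET.SubElement(root, "userMaxJobsDefault").text = str(user_max_jobs_default)
    return ET.tostring(root, encoding="unicode")

print(allocations_xml(POOLS))
```

Whatever values are chosen, keep the sum of minimum shares below the cluster's total slots so the guarantees can actually be met. The example shares total 21,200, against the 30K MR slots listed earlier, with maps and reduces combined.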

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP

» BCP: Business Continuity Plan
» Near real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster (diagram)

Components shown: INW Data Cluster and LSV Data Cluster; US, EU and HK Serving Clusters; Modeling, Reporting, User Queries, Research, Ad-hoc Queries and Processed Data; Amazon Backup.

YARN

» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages life cycle of containers
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution

YARN at Rocket Fuel

» YARN is in production
» 700+ nodes
» 31 TB RAM, 8500 disks, 8500 cores
» Primary use case: MapReduce
» No more static slots
» Tez, Spark, Storm are in the race

YAY!
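With static slots gone, each NodeManager instead advertises memory and vcores. It does so through `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`. Dividing the cluster totals above by the node count gives rough per-node averages. The sketch assumes homogeneous nodes (hypothetically) and a 2 GB minimum container, which is not a value from the talk.

```python
NODES = 700
TOTAL_RAM_TB = 31
TOTAL_CORES = 8500

def per_node_resources(nodes, total_ram_tb, total_cores):
    """Average memory (MB) and cores per node, assuming identical nodes."""
    memory_mb = total_ram_tb * 1024 * 1024 // nodes
    vcores = total_cores // nodes
    return memory_mb, vcores

memory_mb, vcores = per_node_resources(NODES, TOTAL_RAM_TB, TOTAL_CORES)
MIN_CONTAINER_MB = 2048  # hypothetical yarn.scheduler.minimum-allocation-mb

print(f"yarn.nodemanager.resource.memory-mb  ~ {memory_mb}")
print(f"yarn.nodemanager.resource.cpu-vcores ~ {vcores}")
print(f"max {MIN_CONTAINER_MB} MB containers per node: {memory_mb // MIN_CONTAINER_MB}")
```

These figures are averages of the slide's totals, not Rocket Fuel's actual per-node settings. On a real node, the value handed to YARN must leave headroom for the OS and the DataNode.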

Obligatory “we are hiring” slide

http://rocketfuel.com/careers

THANKS

kishore@rocketfuel.com
apol@rocketfuel.com

Page 2: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

The Web Is Monetized By Advertising

Proprietary amp Confidential Copyright copy 2014

Delivery Methods

raquoDisplayraquoVideoraquoMobileraquoSocial

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

125$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehavior

Response

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response X

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

X

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

5 B

6 B

45 B

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

400

100

20

2

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacenters

raquoScale

raquoGrowth

raquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 3: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Delivery Methods

raquoDisplayraquoVideoraquoMobileraquoSocial

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

125$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehavior

Response

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response X

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

X

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

5 B

6 B

45 B

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

400

100

20

2

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacenters

raquoScale

raquoGrowth

raquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 4: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

125$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehavior

Response

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response X

Impression Scorecard

DemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehavior

Response

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

X

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User

Segment

s

3 Bid

Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp

Ad

User Engagement

s

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh

learning

Data

Store

Ads amp

Budget

Model

ScoresEvents

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

5 B

6 B

45 B

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

400

100

20

2

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacenters

raquoScale

raquoGrowth

raquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

Performance Tuning

» ZK: maxClientCnxns
» HDFS: dfs.namenode.handler.count, dfs.image.transfer.timeout, HA timeouts (ha.*timeout.ms)
» MR: mapred.reduce.parallel.copies (mapreduce.reduce.shuffle.parallelcopies in MR2), mapred.job.tracker.handler.count, io.sort.mb, io.sort.factor
» Important: MAPREDUCE-2026
» JVM: -XX:+UseConcMarkSweepGC, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSInitiatingOccupancyFraction=60
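The tuning list mixes MR1 property names such as mapred.reduce.parallel.copies with their MR2 equivalents such as mapreduce.reduce.shuffle.parallelcopies. This matters once jobs move to YARN. Below is a small sketch that rewrites MR1 names in a job configuration using Hadoop's deprecated-property mapping. The table is deliberately partial: it covers only the MR properties named on these slides, not Hadoop's full deprecation list.

```python
# Partial MR1 -> MR2 rename table, limited to properties on the tuning slides.
MR1_TO_MR2 = {
    "mapred.reduce.parallel.copies": "mapreduce.reduce.shuffle.parallelcopies",
    "mapred.job.tracker.handler.count": "mapreduce.jobtracker.handler.count",
    "io.sort.mb": "mapreduce.task.io.sort.mb",
    "io.sort.factor": "mapreduce.task.io.sort.factor",
    "keep.failed.task.files": "mapreduce.task.files.preserve.failedtasks",
}


def to_mr2_names(props):
    """Return (renamed_props, renamed_keys); unknown names pass through unchanged.

    If a config already sets both the old and the new name, whichever
    appears later in `props` wins.
    """
    out, renamed = {}, []
    for name, value in props.items():
        new_name = MR1_TO_MR2.get(name, name)
        if new_name != name:
            renamed.append(name)
        out[new_name] = value
    return out, renamed


if __name__ == "__main__":
    # Illustrative values only, not tuning recommendations.
    job_conf = {"io.sort.mb": "256", "mapred.reduce.parallel.copies": "20",
                "dfs.image.transfer.timeout": "600000"}
    print(to_mr2_names(job_conf))
```

Hadoop 2 still accepts the deprecated names and logs a warning. Renaming mainly keeps the configs consistent and removes the noise from the logs.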

We Have an Issue

» MAPREDUCE-5351
» MAPREDUCE-5508
» keep.failed.task.files=true

JT OOM

» Instances of the "JobInProgress" class = number of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum
» Related settings: mapred.jobtracker.completeuserjobs.maximum, mapred.jobtracker.retirejob.interval, mapred.job.tracker.retiredjobs.cache.size
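The formula above bounds how many completed jobs the JT keeps in its heap. The sketch below is a back-of-the-envelope version of it. The default of 100 is the Hadoop 1 default for mapred.jobtracker.completeuserjobs.maximum. The user count and the average heap per job are made-up inputs, because the real per-job cost depends on task and counter counts.

```python
def retained_jobs(users, completeuserjobs_maximum=100):
    """Completed JobInProgress objects the JT can hold in heap, per the slide:
    users who submitted jobs x mapred.jobtracker.completeuserjobs.maximum."""
    return users * completeuserjobs_maximum


def retained_heap_gb(users, completeuserjobs_maximum=100, mb_per_job=5.0):
    """Rough heap cost. mb_per_job is a hypothetical average; real jobs vary
    widely with task and counter counts."""
    return retained_jobs(users, completeuserjobs_maximum) * mb_per_job / 1024


if __name__ == "__main__":
    users = 60  # hypothetical number of distinct submitting users
    for limit in (100, 25):
        print(limit, retained_jobs(users, limit), round(retained_heap_gb(users, limit), 1))
```

With 60 users, lowering the per-user limit from 100 to 25 cuts the retained jobs from 6000 to 1500. The slide's other two settings cover the rest of the picture: mapred.jobtracker.retirejob.interval controls how long completed jobs stay before retirement, and mapred.job.tracker.retiredjobs.cache.size caps the cache of retired jobs.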

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring

Wall of Ops

Monitoring

[Graphs: NameNode CallQueueLength; JobTracker JVM heap used (memHeapUsedM)]

Don't fly blind, you will crash.

MR Workload Monitoring

Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting.

Alerts

» Checking whether the NN and JT are up is a no-brainer
» Reduce alert noise by using summary/aggregate alerts
» We rely heavily on custom scripts that query JMX on the NN and JT at http://hostname:port/jmx
  - ?qry=Hadoop:service=NameNode,name=NameNodeInfo: NameDirStatuses, DeadNodes, NumberOfMissingBlocks
  - ?qry=Hadoop:service=NameNode,name=FSNamesystemState: FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
  - ?qry=hadoop:service=JobTracker,name=JobTrackerInfo: blacklisted TTs, jobs, slots_used, ThreadCount
  - ?qry=java.lang:type=Memory: used JVM memory, free JVM memory, etc.
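A custom JMX check of the kind described above might look like the sketch below. It fetches one MBean from the /jmx servlet, turns the NameNode attributes listed on the slide into a list of problems, and then emits one aggregated message per host instead of one alert per symptom. Several details are assumptions on our part. The NN HTTP port 50070 is the Hadoop 1/2 default. Both the under-replication threshold and the host names are hypothetical. The checks run on plain dicts, so they can be exercised without a live cluster.

```python
import json
import urllib.parse
import urllib.request


def query_jmx(base_url, qry, timeout=10):
    """Fetch one MBean from Hadoop's /jmx servlet, e.g.
    query_jmx("http://nn-host:50070", "Hadoop:service=NameNode,name=FSNamesystemState")."""
    url = f"{base_url}/jmx?{urllib.parse.urlencode({'qry': qry})}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp)["beans"]
    return beans[0] if beans else {}


def check_namenode(fs_state, nn_info, max_under_replicated=1000):
    """Return a list of problems from the FSNamesystemState and NameNodeInfo beans."""
    problems = []
    if fs_state.get("FSState") != "Operational":
        problems.append(f"FSState is {fs_state.get('FSState')}")
    if fs_state.get("NumDeadDataNodes", 0) > 0:
        problems.append(f"{fs_state['NumDeadDataNodes']} dead DataNodes")
    if fs_state.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(f"{fs_state['UnderReplicatedBlocks']} under-replicated blocks")
    if nn_info.get("NumberOfMissingBlocks", 0) > 0:
        problems.append(f"{nn_info['NumberOfMissingBlocks']} missing blocks")
    # NameDirStatuses is itself a JSON string with "active" and "failed" maps.
    failed_dirs = json.loads(nn_info.get("NameDirStatuses", "{}")).get("failed", {})
    if failed_dirs:
        problems.append(f"failed name dirs: {sorted(failed_dirs)}")
    return problems


def summarize(host, problems):
    """One aggregated alert line per host."""
    return f"{host}: OK" if not problems else f"{host}: " + "; ".join(problems)


if __name__ == "__main__":
    # Offline example with hand-written bean contents.
    fs = {"FSState": "Operational", "NumDeadDataNodes": 2, "UnderReplicatedBlocks": 10}
    info = {"NumberOfMissingBlocks": 0,
            "NameDirStatuses": '{"active": {"/data/nn": "IMAGE_AND_EDITS"}, "failed": {}}'}
    print(summarize("nn1", check_namenode(fs, info)))  # nn1: 2 dead DataNodes
```

In a live check, the two dicts would come from query_jmx(...) calls for the NameNodeInfo and FSNamesystemState beans. The JT beans can be handled the same way.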

MR Workload Alerting

» Monitor the MR workload and alert
  - In-house tool that uses the "houdah" Ruby gem
  - Monitors long-running jobs, jobs with many map tasks, blacklisted TTs, TTs with high failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page:
  - http://<JT-hostname>:50030/scheduler?advanced
  - http://<JT-hostname>:50030/metrics
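The long-running and many-map-task checks reduce to simple threshold rules once the job details have been collected. The sketch below assumes a hypothetical JobSummary record that you would assemble from the JT log or web UI; the field names and both thresholds are assumptions. It is not the in-house houdah-based tool, and it does not show houdah's API.

```python
import time
from dataclasses import dataclass


@dataclass
class JobSummary:
    # Hypothetical record, e.g. assembled from JT log lines or the JT web UI.
    job_id: str
    user: str
    start_ts: float  # epoch seconds
    num_maps: int
    state: str       # "RUNNING", "SUCCEEDED", ...


def rogue_jobs(jobs, now=None, max_runtime_s=6 * 3600, max_maps=50_000):
    """Flag running jobs that exceed a runtime or map-count threshold.

    Returns (job_id, reason) pairs; a job can appear once per rule it breaks.
    """
    now = time.time() if now is None else now
    flagged = []
    for job in jobs:
        if job.state != "RUNNING":
            continue
        runtime = now - job.start_ts
        if runtime > max_runtime_s:
            flagged.append((job.job_id, f"running {runtime / 3600:.1f} h"))
        if job.num_maps > max_maps:
            flagged.append((job.job_id, f"{job.num_maps} map tasks"))
    return flagged


if __name__ == "__main__":
    jobs = [JobSummary("job_1", "etl", 0.0, 100, "RUNNING"),
            JobSummary("job_2", "adhoc", 30000.0, 80000, "RUNNING"),
            JobSummary("job_3", "model", 0.0, 10, "SUCCEEDED")]
    for job_id, reason in rogue_jobs(jobs, now=36000.0):
        print(job_id, reason)
```

Passing `now` explicitly keeps the rule deterministic for testing. In production it defaults to the current time.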

Multi-Tenancy

[Diagram: Modeling, OPS, ETL, and Ad-hoc workloads sharing the cluster]

Scheduling

No scheduler is perfect unless you understand it and tune it properly.
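Tuning a shared cluster usually starts with understanding how a fair scheduler divides capacity among tenants such as Modeling, OPS, ETL, and Ad-hoc. The sketch below is a simplified weighted max-min ("water-filling") allocation. It is not the Hadoop Fair Scheduler's actual code: it ignores minimum shares, preemption, and per-job limits. The slot count, demands, and weights in the example are made-up numbers.

```python
def fair_shares(total_slots, demands, weights=None):
    """Weighted max-min fair allocation of slots across queues.

    Queues demanding no more than their weighted share get exactly their
    demand; the leftover is split again among the rest, until the slots run
    out or every demand is met. `weights`, if given, must cover every queue.
    """
    weights = weights or {q: 1.0 for q in demands}
    alloc = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(total_slots)
    while active and remaining > 1e-9:
        wsum = sum(weights[q] for q in active)
        satisfied = {q for q in active
                     if remaining * weights[q] / wsum >= demands[q]}
        if not satisfied:
            # No queue is capped by its demand: split the rest by weight.
            for q in active:
                alloc[q] = remaining * weights[q] / wsum
            remaining = 0.0
        else:
            # Give capped queues their full demand, then redistribute.
            for q in satisfied:
                alloc[q] = float(demands[q])
                remaining -= demands[q]
            active -= satisfied
    return alloc


if __name__ == "__main__":
    demands = {"Modeling": 80, "ETL": 30, "Ad-hoc": 10, "OPS": 5}
    print(fair_shares(100, demands))
    print(fair_shares(100, demands, {"Modeling": 2, "ETL": 1, "Ad-hoc": 1, "OPS": 1}))
```

With equal weights, Ad-hoc and OPS get their full demand in the first round and ETL in the second. Modeling takes the remaining 55 slots. Raising Modeling's weight to 2 leaves Ad-hoc and OPS fully served but splits the remaining 85 slots 2:1, so ETL ends up below its demand. Reasoning through shifts like this is the "understand it" half of tuning.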

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP

» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster

[Diagram: INW data cluster (Modeling, Reporting, User Queries) and LSV data cluster (Research, Ad-hoc Queries); US, EU, and HK serving clusters; Amazon backup; processed data]
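The slides do not say how processed data is copied between the INW and LSV data clusters. One standard Hadoop tool for this kind of mirroring is DistCp. The sketch below only builds the command line and does not run it. The NameNode addresses, the path, and the map count are all hypothetical. -update copies only files that are missing or differ at the target, and -m caps the number of simultaneous copy maps.

```python
import shlex


def distcp_command(src_nn, dst_nn, path, max_maps=50, update=True):
    """Build (but do not run) a DistCp command mirroring one HDFS path."""
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")  # skip files that already match at the target
    cmd += ["-m", str(max_maps),  # limit copy maps to protect both clusters
            f"hdfs://{src_nn}{path}", f"hdfs://{dst_nn}{path}"]
    return cmd


if __name__ == "__main__":
    cmd = distcp_command("inw-nn:8020", "lsv-nn:8020", "/data/processed/2014-07-14")
    print(shlex.join(cmd))
```

With -update, DistCp copies the contents of the source directory into the target directory rather than creating a nested subdirectory. That is why the same path is used on both sides.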

YARN

» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages the life cycle of containers
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution
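The RM is the single component that sees every NM and application, so the NN/JT JMX checks used for alerting have a natural RM counterpart. The sketch below assumes the RM web UI on its default port 8088 and the standard /ws/v1/cluster/metrics REST endpoint, which returns a "clusterMetrics" object. The host name and the bad-node threshold are hypothetical.

```python
import json
import urllib.request


def fetch_cluster_metrics(rm_url, timeout=10):
    """GET the ResourceManager's cluster metrics, e.g. rm_url="http://rm-host:8088"."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]


def yarn_health(m, max_bad_nodes=5):
    """Summarize RM-level health from a clusterMetrics dict."""
    bad_nodes = m["lostNodes"] + m["unhealthyNodes"]
    used_pct = 100.0 * m["allocatedMB"] / m["totalMB"] if m["totalMB"] else 0.0
    return {
        "memory_used_pct": round(used_pct, 1),
        "apps_running": m["appsRunning"],
        "bad_nodes": bad_nodes,
        "alert": bad_nodes > max_bad_nodes,
    }


if __name__ == "__main__":
    # Offline example; in practice: yarn_health(fetch_cluster_metrics("http://rm-host:8088"))
    sample = {"lostNodes": 2, "unhealthyNodes": 5, "allocatedMB": 750,
              "totalMB": 1000, "appsRunning": 42}
    print(yarn_health(sample))
```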

YARN at Rocket Fuel

» YARN is in production
» 700+ nodes
» 31 TB RAM, 8500 disks, 8500 cores
» Primary use case: MapReduce
» No more static slots
» Tez, Spark, and Storm are in the race

YAY!
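"No more static slots" means that a node's concurrency comes from memory arithmetic instead of a fixed slot count. The sketch below shows that arithmetic under two assumptions: Capacity Scheduler-style normalization, where each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and CPU (vcore) limits being ignored. The 48 GB of NodeManager memory and the request sizes are hypothetical.

```python
import math


def normalize_mb(request_mb, min_alloc_mb=1024):
    """Round a container request up to a multiple of the scheduler's minimum allocation."""
    return max(min_alloc_mb, math.ceil(request_mb / min_alloc_mb) * min_alloc_mb)


def containers_per_node(nm_memory_mb, request_mb, min_alloc_mb=1024):
    """How many same-sized containers fit in yarn.nodemanager.resource.memory-mb."""
    return nm_memory_mb // normalize_mb(request_mb, min_alloc_mb)


if __name__ == "__main__":
    nm_mb = 48 * 1024  # hypothetical memory given to YARN on one node
    for req in (1024, 1536, 3072):  # e.g. small maps, larger maps, reduces
        print(req, normalize_mb(req), containers_per_node(nm_mb, req))
```

The 1536 MB case shows why request sizes are worth tuning. With a 1024 MB minimum allocation, each such container is rounded up to 2048 MB, so 512 MB per container goes unused and the node runs 24 containers instead of 32.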

Obligatory "we are hiring" slide

http://rocketfuel.com/careers

THANKS

kishore@rocketfuel.com

apol@rocketfuel.com

Page 8: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

5 B

6 B

45 B

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

400

100

20

2

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacenters

raquoScale

raquoGrowth

raquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring


Network Monitoring

Don’t blame the network; monitor it instead. A network mesh can be a mess.


Alerting

Monitoring is not enough; we need better alerting.


Alerts

>> Checking whether the NN and JT are up is a no-brainer.
>> Reduce alert noise by using summary/aggregate alerts.
>> We rely heavily on custom scripts that query JMX for the NN and JT:

http://hostname:port/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
   NameDirStatuses, DeadNodes, NumberOfMissingBlocks
http://hostname:port/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState
   FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
http://hostname:port/jmx?qry=hadoop:service=JobTracker,name=JobTrackerInfo
   Blacklisted TTs, jobs, slots_used, ThreadCount
http://hostname:port/jmx?qry=java.lang:type=Memory
   Used JVM, free JVM, etc.
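Here is a minimal sketch of such a custom script. It queries the `/jmx` servlet for the `FSNamesystemState` bean, which the servlet returns as JSON of the form `{"beans": [...]}`. It then folds every problem into one summary message per cluster, in the spirit of the aggregate alerts above. The attribute names come from the slide. The thresholds, the host name, the cluster label, and the helper names are hypothetical; the authors' actual scripts are not shown.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional

FS_STATE_QRY = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(host: str, port: int, qry: str) -> str:
    """Build a query against a Hadoop daemon's /jmx servlet."""
    return f"http://{host}:{port}/jmx?qry={qry}"


def fetch_bean(host: str, port: int, qry: str, timeout: float = 10.0) -> dict:
    """Fetch the first MBean matching qry from the JSON the servlet returns."""
    with urllib.request.urlopen(jmx_url(host, port, qry), timeout=timeout) as resp:
        beans = json.load(resp)["beans"]
    if not beans:
        raise LookupError(f"no MBean matched {qry}")
    return beans[0]


def check_fs_state(bean: dict,
                   max_under_replicated: int = 10_000,         # hypothetical threshold
                   min_remaining_bytes: int = 100 * 1024 ** 4  # hypothetical: 100 TiB
                   ) -> list[str]:
    """Collect every problem found in one FSNamesystemState bean."""
    problems = []
    if bean.get("FSState") != "Operational":
        problems.append(f"FSState={bean.get('FSState')}")
    if bean.get("NumDeadDataNodes", 0) > 0:
        problems.append(f"NumDeadDataNodes={bean['NumDeadDataNodes']}")
    if bean.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(f"UnderReplicatedBlocks={bean['UnderReplicatedBlocks']}")
    if bean.get("CapacityRemaining", 0) < min_remaining_bytes:
        problems.append(f"CapacityRemaining={bean.get('CapacityRemaining', 0)}")
    return problems


def summary_alert(cluster: str, problems: list[str]) -> Optional[str]:
    """One aggregate message per cluster instead of one page per symptom."""
    if not problems:
        return None
    return f"[{cluster}] NameNode: " + "; ".join(problems)


# Live usage (hypothetical host; 50070 is the usual NameNode HTTP port):
# print(summary_alert("prod", check_fs_state(fetch_bean("nn1.example.com", 50070, FS_STATE_QRY))))
```

The same `fetch_bean` call works for the other queries listed above. Only the attribute checks change.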


MR Workload Alerting

» Monitoring MR workload and alerting
  – In-house tool that uses the “houdah” Ruby gem
  – Monitors long-running jobs, jobs with many map tasks, blacklisted TTs with high failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page:
  http://<JT-hostname>:50030/scheduler?advanced
  http://<JT-hostname>:50030/metrics
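The rules this tool applies are simple once per-job facts have been collected, whether from the JT log, the metrics page, or a tool like hRaven. The sketch below assumes a hypothetical `JobSummary` record shape and hypothetical thresholds. It does not parse any real log format. It only shows the flagging step and how to pick which TTs to restart.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JobSummary:
    """Per-job facts gathered elsewhere (hypothetical record shape)."""
    job_id: str
    user: str
    runtime_s: int
    num_maps: int


def flag_jobs(jobs: list[JobSummary],
              max_runtime_s: int = 6 * 3600,  # hypothetical: "long-running" after 6 h
              max_maps: int = 50_000          # hypothetical map-task cap
              ) -> dict[str, list[str]]:
    """Return job ids grouped by the rule they break."""
    flagged: dict[str, list[str]] = {"long_running": [], "too_many_maps": []}
    for job in jobs:
        if job.runtime_s > max_runtime_s:
            flagged["long_running"].append(job.job_id)
        if job.num_maps > max_maps:
            flagged["too_many_maps"].append(job.job_id)
    return flagged


def trackers_to_restart(failures: dict[str, int], blacklisted: set[str],
                        max_failures: int = 10) -> list[str]:  # hypothetical threshold
    """Blacklisted TTs plus TTs whose task-failure count crossed the threshold."""
    noisy = {tracker for tracker, count in failures.items() if count > max_failures}
    return sorted(set(blacklisted) | noisy)


jobs = [
    JobSummary("job_201401010000_0001", "etl", 30_000, 1_200),
    JobSummary("job_201401010000_0002", "adhoc", 600, 80_000),
]
print(flag_jobs(jobs))
print(trackers_to_restart({"tt1": 3, "tt2": 15}, {"tt3"}))
```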


Multi Tenancy

» Modeling
» OPS
» ETL
» Ad-hoc


No scheduler is perfect unless you understand it and tune it properly.

Scheduling
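Tuning a scheduler across tenants like Modeling, OPS, ETL, and Ad-hoc starts with knowing how shares are computed. The sketch below is generic weighted max-min fairness, computed by "water-filling", which is the core idea behind fair-share scheduling. It is not the Hadoop Fair Scheduler's actual code; the real scheduler also handles minimum shares, preemption, and per-pool limits. The capacity, demands, and weights are hypothetical, and every weight must be positive.

```python
from __future__ import annotations


def weighted_max_min(capacity: float, demands: dict[str, float],
                     weights: dict[str, float]) -> dict[str, float]:
    """Split capacity so no queue gets more than it asks for and leftovers go by weight."""
    alloc = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        unit = remaining / sum(weights[q] for q in active)  # capacity per unit of weight
        satisfied = {q for q in active if demands[q] <= weights[q] * unit}
        if not satisfied:
            # Every remaining queue wants more than its weighted share: cap them all.
            for q in active:
                alloc[q] = weights[q] * unit
            break
        for q in satisfied:  # small demands are met in full; leftovers are redistributed
            alloc[q] = demands[q]
            remaining -= demands[q]
        active -= satisfied
    return alloc


# Hypothetical: 100 slots, Modeling weighted 2x.
print(weighted_max_min(100, {"Modeling": 60, "OPS": 5, "ETL": 40, "Ad-hoc": 30},
                       {"Modeling": 2, "OPS": 1, "ETL": 1, "Ad-hoc": 1}))
```

Here OPS gets its full 5 slots. The remaining 95 are split 2:1:1, giving 47.5 for Modeling and 23.75 each for ETL and Ad-hoc. With equal weights, Ad-hoc's 30 would also be met in full.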


Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN


BCP

» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data


Data BCP Cluster

[Diagram: INW Data Cluster and LSV Data Cluster alongside US, EU, and HK Serving Clusters; also labeled: Modeling, Reporting, User Queries, Amazon Backup, Research, Ad-hoc Queries, Processed Data]


YARN

» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages the life cycle of containers
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution


YARN at Rocket Fuel

» YARN is in production
» 700+ nodes
» 31 TB RAM, 8,500 disks, 8,500 cores
» Primary use case: MapReduce
» No more static slots
» Tez, Spark, and Storm are in the race

YAY
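"No more static slots" means a node's capacity is expressed as memory and vcores rather than as fixed map and reduce slot counts. The sketch below shows the resulting arithmetic. `nm_memory_mb` and `nm_vcores` stand for `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`. The container sizes stand for settings such as `mapreduce.map.memory.mb` and `mapreduce.map.cpu.vcores`. All numbers are hypothetical. The memory-only mode mirrors the CapacityScheduler's default resource calculator, which considers memory alone. The sketch ignores rounding of requests up to the scheduler's minimum allocation.

```python
from __future__ import annotations


def containers_per_node(nm_memory_mb: int, nm_vcores: int,
                        container_memory_mb: int, container_vcores: int = 1,
                        count_vcores: bool = True) -> int:
    """How many identical containers fit on one NodeManager."""
    by_memory = nm_memory_mb // container_memory_mb
    if not count_vcores:
        return by_memory  # memory-only accounting
    return min(by_memory, nm_vcores // container_vcores)


def fits(nm_memory_mb: int, container_sizes_mb: list[int]) -> bool:
    """Unlike static slots, any mix of container sizes can share a node's memory."""
    return sum(container_sizes_mb) <= nm_memory_mb


# Hypothetical NodeManager: 40 GB and 12 vcores offered to containers, 2 GB map containers.
print(containers_per_node(40960, 12, 2048))                      # 12 (vcore-bound)
print(containers_per_node(40960, 12, 2048, count_vcores=False))  # 20 (memory only)
print(fits(40960, [2048] * 12 + [4096] * 4))                      # True: 12 maps + 4 big reduces
```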


Obligatory “we are hiring” slide

http://rocketfuel.com/careers


THANKS

kishore@rocketfuel.com

apol@rocketfuel.com



Latency (time in ms)

» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird’s wing: 20
» Lookup in Blackbird: 2


Architecture and Scale

» Datacenters
» Scale
» Growth
» Architecture


Data Center Expansion

» abc


Data Center Design

• Racks custom-built at Rocket Fuel
• Leased space/bandwidth in colocation facilities

Hadoop servers: 20 2U servers (8.5 kW)
Bidders: 40 2U Twin 2 servers (17 kW)


Rocket Fuel Scale

» 34,474 CPU processor cores
  – 2,655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of the IBM Watson computer that played Jeopardy!
» 42 PB of storage
  – 106X the data volume of the entire Library of Congress


Hadoop at Rocket Fuel

» 1,400 servers
» 15K disks
» 15K cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs


Growth: 8x

» Servers: 200 → 1,400
» Storage: 5 PB → 41 PB


Data Architecture 3.0


Hadoop Setup

» 6x2TB disks, 2x6 cores, 196 GB RAM, 2x1G NIC
» 12x3TB disks, 2x6 cores, 64 GB RAM, 10G NIC
» QJM/ZK quorum: same as DNs, with a dedicated disk for ZK or JN

[Diagram: Active NN and Standby NN, each with a ZKFC; JT; QJM/ZK quorum; worker nodes running DN and TT]
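Both pieces of the HA setup above depend on majority voting. A ZooKeeper ensemble needs a strict majority of its servers to serve, and QJM needs a majority of JournalNodes to acknowledge each edit. The slide does not say how many ZK or JN nodes were deployed, so the ensemble sizes below are only examples. The arithmetic shows why such ensembles are sized with an odd number of members.

```python
def majority(members: int) -> int:
    """Smallest strict majority of an ensemble (ZooKeeper servers or QJM JournalNodes)."""
    if members < 1:
        raise ValueError("an ensemble needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return members - majority(members)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 to 4 members adds hardware without adding fault tolerance. Only the step to 5 lets the quorum survive a second failure.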

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 10: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacenters

raquoScale

raquoGrowth

raquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 11: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 12: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuel

bull Leased spacebandwidth in colocation facilities

Hadoop Server20 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 13: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor cores

ndash2655 servers

ndash1874 Teraflops of computing

raquo188 Terabytes of memory

ndash13X the memory of IBM computer Watson that played Jeopardy

raquo42PB Petabytes of storage

ndash106X the data volume of the entire Library of Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

5 PB

41 PB

8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DN

TTDN

TT

DN

TT

DN

TTDN

TT

DN

TT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet

+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

We Have an Issue

» MAPREDUCE-5351

» MAPREDUCE-5508

» keep.failed.task.files=true


JT OOM

Number of "JobInProgress" instances = number of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum

» mapred.jobtracker.completeuserjobs.maximum

» mapred.jobtracker.retirejob.interval

» mapred.jobtracker.retiredjobs.cache.size
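The JT OOM formula above shows why lowering mapred.jobtracker.completeuserjobs.maximum helps: the JT's retained completed-job objects grow linearly with both users and the per-user limit. In the worked example below, the user count and the average retained size per job are HYPOTHETICAL. We believe 100 is the Hadoop 1.x default for the per-user limit; check it for your version.

```python
def retained_job_objects(users_with_jobs, completeuserjobs_maximum):
    """JobInProgress objects the JT may keep for completed jobs (formula from the slide)."""
    return users_with_jobs * completeuserjobs_maximum

def retained_heap_gb(users_with_jobs, completeuserjobs_maximum, mb_per_job):
    # mb_per_job is a HYPOTHETICAL average; real size grows with task count and counters.
    return retained_job_objects(users_with_jobs, completeuserjobs_maximum) * mb_per_job / 1024

# HYPOTHETICAL cluster: 150 users who submitted jobs, ~2 MB retained per completed job.
for limit in (100, 25, 10):
    objects = retained_job_objects(150, limit)
    heap = retained_heap_gb(150, limit, mb_per_job=2)
    print(f"completeuserjobs.maximum={limit}: {objects} objects, ~{heap:.1f} GB of JT heap")
```

Cutting the per-user limit from 100 to 10 shrinks this part of the heap tenfold. The retire interval and the retired-jobs cache size bound how long, and how many, finished jobs linger afterwards.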


Operations

» Maintenance

» Performance Tuning

» Monitoring

» BCP

» YARN


Monitoring

Wall of Ops


Monitoring

hadoop.namenode.CallQueueLength, hadoop.jobtracker.jvm.memheapusedm

Don't fly blind; you will crash.


MR Workload Monitoring


Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.


Alerting

Monitoring is not enough; you need better alerting.


Alerts

>> Checking whether the NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for the NN and JT

http://hostname:port/jmx, with queries such as:

» ?qry=Hadoop:service=NameNode,name=NameNodeInfo: NameDirStatuses, DeadNodes, NumberOfMissingBlocks
» ?qry=Hadoop:service=NameNode,name=FSNamesystemState: FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
» ?qry=hadoop:service=JobTracker,name=JobTrackerInfo: blacklisted TTs, jobs, slots_used, ThreadCount
» ?qry=java.lang:type=Memory: used JVM, free JVM, etc.
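The JMX queries above can be turned into a single summary alert. The following is a minimal sketch of that kind of custom script; it is not the authors' actual tooling. It folds several NameNode checks into one message instead of paging once per check. fetch_bean uses the /jmx?qry= servlet shown above. The thresholds, cluster name, and sample bean contents are HYPOTHETICAL. The example runs offline on sample data, and the live call is left as a comment.

```python
import json
import urllib.parse
import urllib.request

def fetch_bean(host, port, qry):
    """Return the first MBean matching `qry` from a Hadoop daemon's /jmx servlet."""
    url = f"http://{host}:{port}/jmx?qry=" + urllib.parse.quote(qry, safe=":,=")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["beans"][0]

def check_namenode(nn_info, fs_state, max_dead_nodes=0, max_under_replicated=10_000):
    """Turn NameNodeInfo + FSNamesystemState beans into a list of problems (empty = healthy)."""
    problems = []
    dead = json.loads(nn_info["DeadNodes"])  # NameNodeInfo exposes DeadNodes as a JSON string
    if len(dead) > max_dead_nodes:
        problems.append(f"{len(dead)} dead DataNode(s): {', '.join(sorted(dead))}")
    if nn_info["NumberOfMissingBlocks"] > 0:
        problems.append(f"{nn_info['NumberOfMissingBlocks']} missing block(s)")
    if fs_state["FSState"] != "Operational":  # e.g. "safeMode"
        problems.append(f"NameNode FSState is {fs_state['FSState']}")
    if fs_state["UnderReplicatedBlocks"] > max_under_replicated:
        problems.append(f"{fs_state['UnderReplicatedBlocks']} under-replicated blocks")
    return problems

def summary_alert(cluster, problems):
    """Collapse all problems into ONE alert message (None when there is nothing to report)."""
    if not problems:
        return None
    return f"[{cluster}] {len(problems)} HDFS issue(s): " + "; ".join(problems)

# Live usage (HYPOTHETICAL host; 50070 was the NN web port before Hadoop 3):
#   nn_info = fetch_bean("nn-host", 50070, "Hadoop:service=NameNode,name=NameNodeInfo")
#   fs_state = fetch_bean("nn-host", 50070, "Hadoop:service=NameNode,name=FSNamesystemState")
nn_info = {"DeadNodes": json.dumps({"dn042": {"lastContact": 900}}), "NumberOfMissingBlocks": 0}
fs_state = {"FSState": "Operational", "UnderReplicatedBlocks": 25}
print(summary_alert("example-cluster", check_namenode(nn_info, fs_state)))
```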


MR Workload Alerting

» Monitoring MR workload and alerting
– In-house tool that uses the "houdah" Ruby gem
– Monitors long-running jobs, jobs with a large number of map tasks, blacklisted TTs with high failure counts, etc.

» Collect details and auto-restart blacklisted TTs
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page

http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
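The workload checks listed above can be sketched as follows. This is not the houdah-based tool itself. It assumes the job and TaskTracker details have already been collected, for example from JT logs, the scheduler page, or hRaven. The record shapes and thresholds are HYPOTHETICAL. The logic flags long-running jobs and jobs with too many maps, and lists TTs that an auto-restart step would bounce.

```python
from dataclasses import dataclass

@dataclass
class Job:
    # HYPOTHETICAL record shape; fields would come from JT logs or scheduler/metrics pages.
    job_id: str
    user: str
    runtime_min: float
    num_maps: int

@dataclass
class Tracker:
    host: str
    blacklisted: bool
    failed_tasks: int

def rogue_jobs(jobs, max_runtime_min=360, max_maps=100_000):
    """Flag long-running jobs and jobs with an unusually large number of map tasks."""
    flagged = []
    for job in jobs:
        reasons = []
        if job.runtime_min > max_runtime_min:
            reasons.append(f"running {job.runtime_min:.0f} min")
        if job.num_maps > max_maps:
            reasons.append(f"{job.num_maps} maps")
        if reasons:
            flagged.append((job.job_id, job.user, ", ".join(reasons)))
    return flagged

def trackers_to_restart(trackers, max_failed_tasks=50):
    """Blacklisted TTs, or TTs with many failed tasks, sorted by hostname."""
    return sorted(t.host for t in trackers if t.blacklisted or t.failed_tasks > max_failed_tasks)

jobs = [
    Job("job_201401010000_0001", "etl", 420.0, 800),
    Job("job_201401010000_0002", "adhoc", 35.0, 150_000),
    Job("job_201401010000_0003", "modeling", 50.0, 2_000),
]
for job_id, user, why in rogue_jobs(jobs):
    print(f"ALERT {job_id} ({user}): {why}")
print(trackers_to_restart([Tracker("tt-02", True, 3), Tracker("tt-01", False, 80)]))
```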


Multi-Tenancy

Modeling, OPS, ETL, Ad-hoc


Scheduling

No scheduler is perfect unless you understand it and tune it properly.
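Tuning a scheduler for the four tenants above is easier with a feel for how weighted fair sharing splits capacity. The following is a simplified sketch of weighted max-min fairness ("water-filling"), the idea behind fair-share schedulers. It is not the exact algorithm of any Hadoop scheduler: real schedulers add minimum shares, preemption, and locality. The slide does not name the scheduler used. The queue weights and demands are HYPOTHETICAL.

```python
def weighted_fair_shares(capacity, demands, weights):
    """Weighted max-min fair allocation of `capacity` across queues.

    Queues asking for less than their weighted slice get exactly their demand; the
    leftover is re-split among still-hungry queues in proportion to their weights.
    """
    shares = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[q] for q in active)
        # Queues whose whole demand fits inside their weighted slice of what is left.
        satisfied = {q for q in active if demands[q] <= remaining * weights[q] / total_weight}
        if not satisfied:
            # Everyone is still hungry: split the rest by weight and stop.
            for q in active:
                shares[q] = remaining * weights[q] / total_weight
            break
        for q in satisfied:
            shares[q] = float(demands[q])
            remaining -= demands[q]
        active -= satisfied
    return shares

# HYPOTHETICAL: 100 units of capacity, tenants from the slide with made-up demands/weights.
demands = {"modeling": 60, "ops": 10, "etl": 50, "adhoc": 30}
weights = {"modeling": 2, "ops": 1, "etl": 2, "adhoc": 1}
print(weighted_fair_shares(100, demands, weights))
```

Here "ops" needs less than its slice, so it gets its full demand. The remaining capacity is then split 2:2:1 among modeling, ETL, and ad-hoc.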


Operations

» Maintenance

» Performance Tuning

» Monitoring

» BCP

» YARN


BCP

» BCP: Business Continuity Plan

» Near-real-time reporting over 15+ TB of daily data

» Freshness of models trained over petabytes of data


Data BCP Cluster

[Diagram: INW Data Cluster with US, EU, and HK serving clusters; modeling, reporting, and user queries; Amazon backup. LSV Data Cluster with US/EU/HK serving clusters; research and ad-hoc queries; processed data.]


YARN

» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management

» Node Manager
- Per-machine agent
- Manages the life cycle of containers
- Container resource monitoring

» Application Master
- Per-application
- Manages application scheduling and task execution


YARN at Rocket Fuel

» YARN is in production

» 700+ nodes

» 31 TB RAM, 8500 disks, 8500 cores

» Primary use case: MapReduce

» No more static slots

» Tez, Spark, Storm are in the race

YAY!
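"No more static slots" means each node's capacity is expressed as memory and vcores, and a node fits as many containers as those resources allow. The sketch below derives average per-node resources from the slide's totals, assuming an even spread over 700 nodes. The OS headroom and container sizes are HYPOTHETICAL. The commented names are the standard YARN and MapReduce properties these values would map to.

```python
def containers_per_node(node_mem_mb, node_vcores, container_mem_mb, container_vcores):
    """How many identical containers fit on one NodeManager, bounded by memory and vcores."""
    by_memory = node_mem_mb // container_mem_mb
    by_cpu = node_vcores // container_vcores
    return min(by_memory, by_cpu)

# Per-node averages derived from the slide (31 TB RAM, 8500 cores, ~700 nodes).
NODES = 700
avg_node_mem_mb = 31 * 1024 * 1024 // NODES   # about 46 GB per node
avg_node_vcores = 8500 // NODES               # about 12 cores per node

# HYPOTHETICAL settings: headroom for OS/daemons and a fixed map container size.
nm_memory_mb = avg_node_mem_mb - 8 * 1024     # yarn.nodemanager.resource.memory-mb
map_memory_mb = 2048                          # mapreduce.map.memory.mb
map_vcores = 1                                # mapreduce.map.cpu.vcores

per_node = containers_per_node(nm_memory_mb, avg_node_vcores, map_memory_mb, map_vcores)
print(f"{per_node} map containers per node, ~{per_node * NODES} cluster-wide")
```

With these numbers, CPU is the binding limit. Note, though, that the CapacityScheduler's default resource calculator considers only memory. In that case the memory bound, 18 containers per node here, would apply unless the DominantResourceCalculator is configured.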


Obligatory "we are hiring" slide

http://rocketfuel.com/careers


THANKS

kishore@rocketfuel.com

apol@rocketfuel.com


qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 20: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you can

raquo Adding a slave node to Hadoop cluster lt 120 seconds

raquo Bringing up a new Hadoop cluster lt 500 seconds

raquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 21: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 22: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

dfsnamenodehandlercount

dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns

ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 23: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

MAPREDUCE-5351

MAPREDUCE-5508

keepfailedtaskfiles=true

We Have an Issue

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 24: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

instances of JobInProgressrdquo class = no of users submitted jobs Xmapredjobtrackercompleteuserjobsmaximum

mapredjobtrackercompleteuserjobsmaximum mapredjobtrackerretirejobinterval

mapredjobtrackerretiredjobscachesize

JT OOM

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 25: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted

TTrsquos with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page

httpltJT-hostnamegt50030scheduleradvancedhttpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenance

raquo Performance Tuning

raquo Monitoring

raquo BCP

raquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Plan

raquo Near real time reporting over 15+ TB of daily data

raquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data

Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Queries

Amazon Backup

LSV Data

Cluster

USEUHK Serving Clusters

Research

Ad-hoc Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

raquo Resource Manager- Global resource scheduler- Hierarchical queues- Application management

raquo Node Manager- Per-machine agent- Manages life cycle of container- Container resource monitoring

raquo Application Master- Per-application- Manages application scheduling and

task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn is in production

raquo 700+ nodes

raquo 31TB RAM 8500 disks 8500 cores

raquo Primary use case Map-Reduce

raquo No more static slots

raquo Tez Spark Storm are in race

YAY

Proprietary amp Confidential Copyright copy 2014

Obligatory ldquowe are hiringrdquo slide

httprocketfuelcomcareers

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

apolrocketfuelcom

Page 26: Hado "OPS" or Had "oops"

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alertsgtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

MR Workload Alerting

» Monitoring MR workload and alerting
– In-house tool that uses the “houdah” Ruby gem
– Monitors long-running jobs, jobs with too many map tasks, blacklisted TTs, TTs with high failure counts, etc.

» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page

http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
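Once job details have been collected, the long-running-job and too-many-maps checks reduce to simple rules. The details might come from the JT log, job history files, or a tool such as hRaven. Below is a sketch under that assumption. The `JobInfo` record and the thresholds are hypothetical; they are not the houdah-based tool's actual format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class JobInfo:
    """Hypothetical job record assembled from JobTracker logs or history files."""
    job_id: str
    user: str
    start_epoch_s: float
    num_maps: int


def rogue_jobs(jobs: Iterable[JobInfo], now_epoch_s: float,
               max_runtime_s: float, max_maps: int) -> List[str]:
    """Flag jobs that have run too long or launched too many map tasks."""
    flagged = []
    for job in jobs:
        reasons = []
        runtime_s = now_epoch_s - job.start_epoch_s
        if runtime_s > max_runtime_s:
            reasons.append(f"running {runtime_s / 3600:.1f}h")
        if job.num_maps > max_maps:
            reasons.append(f"{job.num_maps} map tasks")
        if reasons:
            flagged.append(f"{job.job_id} ({job.user}): {', '.join(reasons)}")
    return flagged
```

The flagged lines can feed the same summary alert as the JMX checks rather than paging once per job.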

Multi Tenancy

Modeling

OPS

ETL

Ad-hoc

Scheduling

No scheduler is perfect unless you understand it and tune it properly.
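Tuning a fair-share style scheduler is easier if you can compute by hand how it should divide a cluster among tenants such as Modeling, OPS, ETL and Ad-hoc. The tool for that is weighted max-min fair sharing. The sketch below is the textbook water-filling algorithm, not the exact logic of Hadoop's Fair Scheduler, which also handles minimum shares, preemption and more. The demands and weights in the usage are made up.

```python
from typing import Dict


def weighted_fair_shares(capacity: float, demand: Dict[str, float],
                         weight: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min fair allocation of `capacity` (water-filling).

    Queues whose weighted share exceeds their demand are capped at demand,
    and the leftover is redistributed among the remaining queues by weight.
    """
    share = {q: 0.0 for q in demand}
    active = {q for q in demand if demand[q] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weight[q] for q in active)
        # Queues whose proportional offer would cover their outstanding demand.
        satisfied = {q for q in active
                     if remaining * weight[q] / total_w >= demand[q] - share[q]}
        if not satisfied:
            # Nobody is capped: split what is left strictly by weight and stop.
            for q in active:
                share[q] += remaining * weight[q] / total_w
            remaining = 0.0
        else:
            # Cap satisfied queues at their demand and loop to redistribute the rest.
            for q in satisfied:
                remaining -= demand[q] - share[q]
                share[q] = demand[q]
            active -= satisfied
    return share


if __name__ == "__main__":
    print(weighted_fair_shares(
        100,
        demand={"Modeling": 60, "OPS": 5, "ETL": 50, "Ad-hoc": 40},
        weight={"Modeling": 2, "OPS": 1, "ETL": 2, "Ad-hoc": 1},
    ))
```

Take 100 slots with weights 2/1/2/1 and demands of 60/5/50/40 for Modeling/OPS/ETL/Ad-hoc. OPS is capped at its demand of 5. The remaining 95 are split 2:2:1, giving 38, 38 and 19.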

Operations

» Maintenance

» Performance Tuning

» Monitoring

» BCP

» YARN

BCP

» BCP: Business Continuity Plan

» Near-real-time reporting over 15+ TB of daily data

» Freshness of models trained over petabytes of data
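A quick back-of-the-envelope calculation shows what near-real-time BCP implies. Mirroring 15 TB of daily data to a second cluster needs a sustained transfer rate of roughly 174 MB/s. The sketch assumes decimal terabytes, a uniform arrival rate, no compression and a single copy. Since the slide says 15+ TB, treat the result as a lower bound.

```python
from typing import Tuple


def sustained_rate(bytes_per_day: float) -> Tuple[float, float]:
    """Return (MB/s, Gbit/s) needed to move `bytes_per_day` evenly over 24 hours."""
    seconds_per_day = 24 * 60 * 60
    bytes_per_s = bytes_per_day / seconds_per_day
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e9


if __name__ == "__main__":
    mb_s, gbit_s = sustained_rate(15e12)  # 15 TB/day, decimal TB assumed
    print(f"{mb_s:.1f} MB/s, {gbit_s:.2f} Gbit/s")
```

For 15 TB this prints `173.6 MB/s, 1.39 Gbit/s`. That figure has no headroom for bursts, catch-up after an outage, or an additional backup copy.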

Data BCP Cluster

[Diagram: INW Data Cluster (US, EU and HK Serving Clusters; Modeling; Reporting; User Queries; Amazon Backup) and LSV Data Cluster (US/EU/HK Serving Clusters; Research; Ad-hoc Queries; Processed Data)]

YARN

» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management

» Node Manager
- Per-machine agent
- Manages the life cycle of containers
- Container resource monitoring

» Application Master
- Per-application
- Manages application scheduling and task execution
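With hierarchical queues, a child queue's configured capacity is a percentage of its parent. Its share of the whole cluster is therefore the product of the percentages along its path. This is how YARN's CapacityScheduler interprets `yarn.scheduler.capacity.<queue-path>.capacity`, where sibling capacities must sum to 100. Below is a small sketch; the queue names are hypothetical.

```python
from typing import Dict


def absolute_capacities(capacity_pct: Dict[str, float]) -> Dict[str, float]:
    """Convert per-queue capacities (percent of parent) into percent of the cluster.

    Keys are dotted queue paths under an implicit 'root' that owns 100%.
    """
    absolute = {}
    for path in capacity_pct:
        parts = path.split(".")
        pct = 100.0
        # Walk from the top-level queue down to `path`, scaling by each level.
        for depth in range(2, len(parts) + 1):
            pct *= capacity_pct[".".join(parts[:depth])] / 100.0
        absolute[path] = pct
    return absolute


if __name__ == "__main__":
    print(absolute_capacities({
        "root.prod": 70, "root.prod.etl": 50, "root.prod.modeling": 50,
        "root.adhoc": 30,
    }))
```

Here `root.prod.etl` gets 50% of 70%, i.e. 35% of the cluster. `root.adhoc` keeps its 30%.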

YARN at Rocket Fuel

» YARN is in production

» 700+ nodes

» 31 TB RAM, 8500 disks, 8500 cores

» Primary use case: MapReduce

» No more static slots

» Tez, Spark and Storm are in the race

YAY
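The cluster totals on the slide translate into rough per-node averages. Once static slots are gone, container sizing works from those averages. The sketch rests on several assumptions:

- exactly 700 nodes (the slide says 700+, so these are upper-bound averages);
- binary terabytes;
- no memory reserved for the OS and daemons;
- a hypothetical container size of 4 GB / 1 vcore.

```python
def per_node(total: float, nodes: int) -> float:
    """Average of a cluster-wide total over `nodes` machines."""
    return total / nodes


def containers_per_node(node_mem_gb: float, node_vcores: float,
                        container_mem_gb: float, container_vcores: float) -> int:
    """How many identical containers fit on one node; the scarcer resource wins."""
    by_memory = node_mem_gb // container_mem_gb
    by_cores = node_vcores // container_vcores
    return int(min(by_memory, by_cores))


NODES = 700                           # slide: "700+ nodes"
ram_gb = per_node(31 * 1024, NODES)   # 31 TB RAM, binary TB assumed
cores = per_node(8500, NODES)
disks = per_node(8500, NODES)
print(f"per node: {ram_gb:.1f} GB RAM, {cores:.1f} cores, {disks:.1f} disks")
print("4 GB / 1 vcore containers per node:", containers_per_node(ram_gb, cores, 4, 1))
```

That works out to about 45 GB of RAM, 12 cores and 12 disks per node, or about 11 such containers. Whether the vcore limit applies depends on the scheduler's resource calculator. The CapacityScheduler's default calculator considers memory only; the DominantResourceCalculator accounts for both.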

Obligatory “we are hiring” slide

http://rocketfuel.com/careers

THANKS

kishore@rocketfuel.com

apol@rocketfuel.com
