Pepperdata's Real-time Hadoop Cluster Optimization


Upload: becky-mendenhall

Post on 16-Aug-2015

Category:

Technology


TRANSCRIPT

Page 1: Pepperdata's Real-time Hadoop Cluster Optimization

© 2015 Enterprise Integration News, Inc.

Introduction

Agenda

Making Hadoop just work better for varied workloads

Details top challenges in adopting Hadoop

How Pepperdata automatically improves performance, visibility, controls

Bio

A pioneer in production-ready Hadoop

15+ years web search and big data; Focus on huge scale, huge impact products

Started the Silicon Valley branch of Microsoft’s Bing engineering & product team

Visibility & Optimization for Hadoop

Sean Suchter, Co-Founder and CEO

Page 2: Pepperdata's Real-time Hadoop Cluster Optimization

©2015 Pepperdata

VISIBILITY & OPTIMIZATION FOR HADOOP

Page 3: Pepperdata's Real-time Hadoop Cluster Optimization

©2014 Pepperdata

AGENDA

CHALLENGES USING HADOOP

HOW PEPPERDATA ADDRESSES THESE CHALLENGES

Q&A and NEXT STEPS

Page 4: Pepperdata's Real-time Hadoop Cluster Optimization

HADOOP CHALLENGES YOU FACE DAILY

[Figure: challenges arranged along a scale labeled Tactical, Strategic, Very Strategic]

Call from your CEO asking “WTF is happening?!?”

Can’t make SEC filing

EOQ and can’t send the invoice

Critical feature on your website is broken!

Online ad impression data unavailable

External customer reports unavailable

Users complaining

Customer churn metrics unavailable

Revenue report doesn’t complete

Have to buy more servers

End user SLAs compromised

Finding root-cause of problems is manual

Low priority jobs taking over the cluster

Ad hoc jobs interfere with production jobs

HBase & MapReduce contention

Rogue jobs hammer cluster performance

Cluster seems near maximum capacity

Developers can’t submit new jobs

Page 5: Pepperdata's Real-time Hadoop Cluster Optimization

HADOOP WASTES VALUABLE CAPACITY

[Chart: physical hardware resource (vertical axis) over time (horizontal axis), comparing theoretical maximum usage (reservation) with actual physical capacity used]

1. Production clusters are sized for peak SLA with lots of headroom, so capacity is wasted

2. Ad-hoc jobs consume capacity from high-priority jobs, so companies run them on a separate cluster

3. Hadoop’s allocations are predefined and static, resulting in wasted capacity
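The gap between reserved and actually used capacity described on this slide can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not Pepperdata's method: the function name `wasted_fraction` and the sample figures are hypothetical and chosen only for illustration.

```python
def wasted_fraction(reserved, used):
    """Return the share of reserved capacity that was never actually used.

    reserved, used: equal-length sequences of per-interval capacity
    (for example, GB of RAM). Both are hypothetical sample data.
    """
    if len(reserved) != len(used):
        raise ValueError("reserved and used must cover the same intervals")
    total_reserved = sum(reserved)
    if total_reserved == 0:
        return 0.0
    # Usage above the reservation is not counted as negative waste.
    idle = sum(max(r - u, 0) for r, u in zip(reserved, used))
    return idle / total_reserved


# Hypothetical static reservation of 100 units over four intervals.
reserved = [100, 100, 100, 100]
used = [40, 75, 55, 30]
print(f"{wasted_fraction(reserved, used):.0%} of reserved capacity sat idle")
```

With a static reservation, every interval in which usage falls below the reserved level adds to the idle total, which is the "wasted capacity" the slide refers to.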

Page 6: Pepperdata's Real-time Hadoop Cluster Optimization

MORE AND MORE WASTED CAPACITY

Over time, more and more clusters are built to isolate the different workloads

Production Cluster, Ad Hoc Cluster, Priority Job Cluster, HBase Cluster, Bulk Load Cluster

But they are full of “holes”!

Page 7: Pepperdata's Real-time Hadoop Cluster Optimization

PEPPERDATA MAKES HADOOP WORK BETTER

FINE-GRAINED VISIBILITY

Monitor CPU, RAM, I/O, network per task, job, user, group

Identify bottlenecks in real-time or at any moment historically

TOTAL PREDICTABILITY

SLA enforcement for true multi-tenancy: dynamically adjusts resource usage

Set policies to protect high-priority jobs

30-50% GREATER THROUGHPUT ON ALREADY HIGHLY TUNED CLUSTERS

Reclaims wasted capacity: use all true hardware capacity

Run more jobs with our Dynamic Capacity Creation
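To illustrate what monitoring "per task, job, user, group" can mean in practice, here is a small Python sketch that rolls per-task measurements up to a coarser level. It is an assumption-laden illustration, not Pepperdata's implementation or API: the record fields (`task`, `job`, `user`, `cpu_seconds`, `io_mb`), the function `roll_up`, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical metric names; a real system would track more (RAM, network).
METRICS = ("cpu_seconds", "io_mb")


def roll_up(samples, key):
    """Sum per-task metrics by a coarser key such as 'job' or 'user'."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0.0))
    for sample in samples:
        for metric in METRICS:
            totals[sample[key]][metric] += sample[metric]
    return dict(totals)


# Hypothetical per-task samples.
samples = [
    {"task": "t1", "job": "etl", "user": "alice", "cpu_seconds": 12.0, "io_mb": 300.0},
    {"task": "t2", "job": "etl", "user": "alice", "cpu_seconds": 8.0, "io_mb": 100.0},
    {"task": "t3", "job": "adhoc", "user": "bob", "cpu_seconds": 30.0, "io_mb": 50.0},
]

print(roll_up(samples, "job"))
print(roll_up(samples, "user"))
```

Collecting at the task level and aggregating upward is one way to answer both "which job is the bottleneck?" and "which user is consuming the most?" from the same data.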

Page 8: Pepperdata's Real-time Hadoop Cluster Optimization

PEPPERDATA REAL-TIME ARCHITECTURE

VISIBILITY

Delivers real-time, granular visibility into resource consumption by user, job, and task

CONTROL

Allows user-defined prioritization of Hadoop jobs and automatically allocates resources to ensure jobs run safely

CAPACITY

Reclaims wasted capacity and allows mixed workloads to be shared on a single cluster

[Architecture diagram labels: Developer, Analyst, Financial Report, Product, ETL; Pepperdata Dashboard; Policies; Hadoop Configuration; Your existing Hadoop: MapReduce, HBase, etc.; Job Tracker / Resource Manager (Scheduler & YARN)]

Page 9: Pepperdata's Real-time Hadoop Cluster Optimization

FINE-GRAINED VISIBILITY INTO THE CLUSTER

Page 10: Pepperdata's Real-time Hadoop Cluster Optimization

FINE-GRAINED VISIBILITY INTO YOUR CLUSTER

Page 11: Pepperdata's Real-time Hadoop Cluster Optimization

EASILY PINPOINT BOTTLENECKS IN THE CLUSTER

Page 12: Pepperdata's Real-time Hadoop Cluster Optimization

PEPPERDATA MAKES HADOOP WORK BETTER

Page 13: Pepperdata's Real-time Hadoop Cluster Optimization

NEXT STEPS

Like what you saw? Want to learn more?

Visit pepperdata.com for more product information.

Visit pepperdata.com/demo to request a free demo from one of our technical experts!

Page 14: Pepperdata's Real-time Hadoop Cluster Optimization

THANK YOU

Page 15: Pepperdata's Real-time Hadoop Cluster Optimization

Questions & Answers

Page 16: Pepperdata's Real-time Hadoop Cluster Optimization

What is the form factor of Pepperdata, and how long does it take to install?

How do we make sure Pepperdata ‘agents’ are where they need to be -- and working?

Sean Suchter, Co-Founder and CEO

Page 17: Pepperdata's Real-time Hadoop Cluster Optimization

PEPPERDATA REAL-TIME ARCHITECTURE

Page 18: Pepperdata's Real-time Hadoop Cluster Optimization

We have mixed workloads that often force us to overprovision Hadoop resources.

Does Pepperdata help us deal with this by allowing Hadoop to adjust dynamically?

Sean Suchter, Co-Founder and CEO

Page 19: Pepperdata's Real-time Hadoop Cluster Optimization

Given Pepperdata’s intelligent and dynamic environment,

how does that impact the way we do Hadoop prep or set-up?

Sean Suchter, Co-Founder and CEO

Page 20: Pepperdata's Real-time Hadoop Cluster Optimization

How much Hadoop cluster resource does Pepperdata use?

Sean Suchter, Co-Founder and CEO

Page 21: Pepperdata's Real-time Hadoop Cluster Optimization

How do customers use the Pepperdata dashboard?

Where is it hosted?

Sean Suchter, Co-Founder and CEO

Page 22: Pepperdata's Real-time Hadoop Cluster Optimization

PEPPERDATA REAL-TIME ARCHITECTURE

Page 23: Pepperdata's Real-time Hadoop Cluster Optimization

How is the Pepperdata approach different from YARN?

Sean Suchter, Co-Founder and CEO

Page 24: Pepperdata's Real-time Hadoop Cluster Optimization

Please detail some customer successes from using Pepperdata with Hadoop.

Sean Suchter, Co-Founder and CEO

Page 25: Pepperdata's Real-time Hadoop Cluster Optimization

For More Information

Pepperdata – Rely on Hadoop http://pepperdata.com/

Visibility, Capacity, Control, Technology

Learn More About Pepperdata

Product http://pepperdata.com/products/

Real-Time Architecture http://pepperdata.com/products/#pd-technology

Benefits http://pepperdata.com/benefits/

Blog http://pepperdata.com/blog/

Other Pepperdata Resources (Whitepapers & Case Studies) http://pepperdata.com/resources/

Request a Demo http://pepperdata.com/demo/