Keynote #2 - Operability in Hadoop Ecosystem @ ABDW17, Pune

Upload: datatorrent

Post on 21-Jan-2018

© 2015 DataTorrent Confidential – Do Not Distribute

Big Data & Operability


Agenda

• Big Data so far

• Operability Definition

• Components of Operability

• Laws of Operability

• Guiding Principles


Big Data Journey So Far

• 1990s: Data at Rest, Databases, Scale-Up

• 2007-2009: MapReduce, Batch, Scale-Out

• 2015-2016 …Today: Data in Motion, Real-Time Streaming, Scale-Out


Productization & Operations of Big Data

• Big Data is, so far, neither productized nor operationalized

• Total Cost of Ownership (TCO) = Cost to Develop + Cost to Launch + Cost of Ongoing Operations

• Time to Value (TTV) = Time to Develop + Time to Test/Launch + Time to Continue Extracting Value
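The two cost formulas above can be sketched as code; all figures below are made-up examples for illustration, not numbers from the deck.

```python
# Illustrative sketch of the TCO and TTV formulas; the function names
# and all dollar/day figures are hypothetical examples.

def total_cost_of_ownership(develop, launch, ongoing_ops):
    """TCO = Cost to Develop + Cost to Launch + Cost of Ongoing Operations."""
    return develop + launch + ongoing_ops

def time_to_value(develop_days, test_launch_days):
    """TTV = Time to Develop + Time to Test/Launch (value extraction then continues)."""
    return develop_days + test_launch_days

tco = total_cost_of_ownership(develop=200_000, launch=50_000, ongoing_ops=500_000)
ttv = time_to_value(develop_days=90, test_launch_days=30)
```

Note that ongoing operations typically dominate TCO, which is the point the later cost-structure slide makes with its 20/80 split.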


Operability Definition

• Can the enterprise operate the product/application to meet its SLA within the planned total cost of ownership?


Operability Components

• SLA
  • Latency
  • Resources
  • Uptime

• Fault Tolerance and High Availability

• SecOps: Security and Certifications; Laws

• Resource Cost: Scalability and Performance

• DevOps: Ease of Integration and Native Operational Support

• Operational Expertise

• Maintenance: Ease of Upgrading and Backward Compatibility


Laws of Operability for a Pipeline

• What are the laws of operability?

• Why are these laws even needed?

• Measure

• Predict/Forecast

• Architectural Decisions

• Evaluate Impact of Native Hadoop Applications on Operations


Uptime

1: Non-Native Hadoop Pipeline: Job A (Cluster X) → Job B (Hadoop) → Job C (Cluster Y), each cluster with Uptime = 95%

2: Native Hadoop Pipeline: Jobs A, B, and C all run on Hadoop, with Uptime = 95%

Uptime for Non-Native Hadoop Pipeline

• Cluster X = 365 * .95 ≈ 347 days

• Hadoop = 347 * .95 ≈ 329 days

• Cluster Y = 329 * .95 ≈ 312 days

• Downtime of ~52 days a year

Uptime for Native Hadoop Pipeline

• Hadoop = 365 * .95 ≈ 347 days

• Downtime of ~18 days a year
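The uptime arithmetic above generalizes: a serial pipeline is up only when every hop is up, so its availability is the product of the hops' availabilities. A minimal sketch (function name is illustrative, not from the deck):

```python
# Sketch: serial-pipeline availability is the product of each hop's
# availability. Each hop here is assumed to fail independently.

def pipeline_uptime_days(uptimes, days_per_year=365):
    """Expected up-days per year for a serial pipeline of hops."""
    total = days_per_year
    for u in uptimes:
        total *= u
    return total

# Non-native pipeline: Cluster X -> Hadoop -> Cluster Y, each 95% up.
non_native = pipeline_uptime_days([0.95, 0.95, 0.95])  # ~312.9 days
# Native pipeline: a single Hadoop cluster at 95%.
native = pipeline_uptime_days([0.95])                  # ~346.8 days
```

Adding hops multiplies in another factor below 1, so every extra cluster in the chain strictly reduces pipeline uptime.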


Cost Structure

1: Non-Native Hadoop Pipeline: Job A (Cluster X) → Job B (Hadoop) → Job C (Cluster Y)

2: Native Hadoop Pipeline: Jobs A, B, and C all run on Hadoop

Cost of Non-Native Hadoop Pipeline

• Resources needed for Cluster X
• + Resources needed for Hadoop
• + Resources needed for Cluster Y

Cost of Native Hadoop Pipeline

• Resources needed for Hadoop (already invested)

Resources = Machines (Hardware + Software) + Human (Expertise)


Single Point of Failure

1: Non-Native Hadoop Pipeline: Job A (Cluster X) → Job B (Hadoop) → Job C (Cluster Y)

2: Native Hadoop Pipeline: Jobs A, B, and C all run on Hadoop

No Single Point of Failure for Pipeline 1 = No (Cluster X) AND Yes (Hadoop) AND Yes (Cluster Y) = No

No Single Point of Failure for Pipeline 2 = Yes (Hadoop) = Yes


Ease of Integration and DevOps

1: Non-Native Hadoop Pipeline: Job A (Cluster X) → Job B (Hadoop) → Job C (Cluster Y)

2: Native Hadoop Pipeline: Jobs A, B, and C all run on Hadoop

Fully Integrated with DevOps Tools for Pipeline 1 = No (Cluster X) AND Yes (Hadoop) AND Yes (Cluster Y) = No

Fully Integrated with DevOps Tools for Pipeline 2 = Yes (Hadoop) = Yes


Laws of Operability

Infrastructure for (Big Data) Pipeline Processing: a pipeline of Jobs 1…n, with Job i running on Cluster i

• Uptime = U1 * U2 * … * Un

• Cost = C1 + C2 + … + Cn

• No Single Point of Failure = S1 AND S2 AND … AND Sn

• Easy to Integrate = I1 AND I2 AND … AND In
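The four laws can be expressed compactly: uptime multiplies, cost adds, and the two boolean properties AND together across hops. A minimal sketch, with class and field names chosen here for illustration (not from the deck), and arbitrary per-cluster costs:

```python
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    uptime: float          # fraction of the year the cluster is up (U)
    cost: float            # resource cost: machines + expertise (C)
    no_spof: bool          # True if the cluster has no single point of failure (S)
    devops_ready: bool     # True if integrated with DevOps tooling (I)

def pipeline_laws(clusters):
    """Apply the four laws of operability to a serial pipeline of clusters."""
    return {
        "uptime": math.prod(c.uptime for c in clusters),        # U1 * U2 * ... * Un
        "cost": sum(c.cost for c in clusters),                  # C1 + C2 + ... + Cn
        "no_spof": all(c.no_spof for c in clusters),            # S1 AND ... AND Sn
        "easy_to_integrate": all(c.devops_ready for c in clusters),  # I1 AND ... AND In
    }

# Non-native pipeline: Cluster X -> Hadoop -> Cluster Y (costs are made up).
non_native = pipeline_laws([
    Cluster(0.95, 100, False, False),  # Cluster X
    Cluster(0.95, 100, True, True),    # Hadoop
    Cluster(0.95, 100, True, True),    # Cluster Y
])
# Native pipeline: everything runs on the one Hadoop cluster.
native = pipeline_laws([Cluster(0.95, 100, True, True)])
```

A single weak hop drags the whole pipeline down on every law: one cluster with a single point of failure makes the `all(...)` false, just as the slide's boolean formula shows.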


Cost Structure for Big Data Products (Fortune 50)

Functional Design

• Read File

• Hash Join

• Apply Rules

• Write Results

• Upload Results

Operational Design

• Parallel read

• Parallel write

• Skew analysis of entire data flow

• Which design meets SLA

• Analyze single point of failure

• Bottleneck analysis: Data, Compute, CPU, Memory, Disk, I/O

• Node outage, fault tolerance. Data center outage?

• Multi-Tenancy: Multiple apps at the same time

• Uptime analysis

• Infrastructure requirements and design (Hadoop grid, and node design)

• Error handling

• Alerts, and escalation policy

• Integration with current monitoring: Webservices

• Launch runbooks

• Testing and certification design+infrastructure

• Upgrade path, runbooks

• Audit: Intermediate results

• Versioning and backward compatibility

• Expertise, and outsourcing

• Support structure, escalation

• Pre and Post launch support. Ongoing cost

• DevOps Training

• Security and Access

• …

Functional Cost < 20%; Operational Cost > 80%


Guiding Principles of Operability

• Cost = At most 20% Functional, at least 80% Operational

• Operability has to be a first-class citizen of the platform. It cannot be slapped on

• Operability is inversely proportional to the number of hops (clusters) in a pipeline

• Operability is vastly higher if taken care of by the platform as opposed to user code

• Operability is a design decision on day one


Operability – The Graveyard of Big Data Projects


Impact of Operability

Customers do not just pay for software; they bet their careers on it

Operable Software = Successful Launch = Low TCO and Short TTV