
The DataOps Transformation

www.datakitchen.io


Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.

Topics

Why DataOps Is Essential

Seven Steps to DataOps

Next Steps With DataOps


Strategic Trend: DataOps

• High Data Analytic Project Failure Rate

• Increased rate of market adoption of DataOps principles by leaders of data and analytic teams

• Gartner Hype Cycle in late 2018


Data Analytics has a dirty secret:

FAILURE

Data Analytics is Hot:

• ‘New Oil’ amount of data increasing fast

• Buzz: Big Data, Data Science, Data Lakes, Machine Learning, AI, etc.

With Rampant Failure:

• 87% of data science projects never get to production.

• Data analytics investment is up, yet the share of “data driven” organizations is down from 37% to 31%

• 60% of all data analytic projects fail

• 79% of data projects have too many errors


Challenge: Your Team’s Time Not Well Spent

[Chart: percentage of time the team spends per week (current state), split into Errors & Operational Tasks, New Features & Data For Customers, and Improvements & Debt]

Challenges:

• Many people involved

• Complex toolchains

• Innovative insights cannot be delivered at the speed of business

• Collaboration and coordination across teams, locations, and clouds/data centers


Challenge: Data Analytics is like the US Auto Industry in the 1970s

[Diagram: current state of the Data Analytics Team. High errors in production; deployment latency from Dev to Prod of weeks or months]

Challenges:

• Slow to add new features, rapidly address consumer requests, changing data sets

• Errors in dashboards, reports, and data pipelines - lack of trust

• Lacking pipeline execution and visibility on premise, multi-cloud, etc.

• Team morale


DevOps has resulted in a transformative improvement in Software Development

• High-performing IT organizations deploy 200 times more frequently

• They have 24 times faster recovery times and three times lower change failure rates

• And they spend 22 percent less time on unplanned work and rework

Source: State of DevOps Report


Lean has resulted in a transformative improvement in manufacturing

• Lean manufacturing improves efficiency, reduces waste, and increases productivity.

• The benefits are manifold:

  • Increased product quality

  • Reduced rework

  • Employee satisfaction

  • Higher profits


Focus on Operations, Not The Next Feature

“We realized that the true problem, the true difficulty, and where the greatest potential is – is building the machine that makes the machine. In other words, it’s building the factory. I’m really thinking of the factory like a product.” -- Elon Musk

“If any engineer has to choose between working on a feature or working on dev productivity, always choose developer productivity.” -- Satya Nadella

“The average salary for a DevOps pro is roughly 20% more than a traditional software developer” -- Will Markow … a labor market analysis firm

“At Google, we have over 2,000 engineers who contribute to Engineering Productivity.”


Currently, Teams Have High Errors

DataKitchen/Eckerson Survey (May 2019)

Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad


Currently, Teams Struggle to Deploy

DataKitchen/Eckerson Survey (May 2019)

Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad


Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.

Source: Google, Advances in Neural Information Processing Systems 28 (NIPS 2015)


Business Need

Prep Data

Feature Extraction

Build Model

Evaluate Model

Deploy Model

Monitor Model

Iterate, Test and Improve

Model building


Why are world class, game changing data analytics so difficult?

Challenges:

• Innovative insights cannot be delivered at the speed of business

• Data Quality can be compromised

• New innovative technologies are difficult to test, deploy and leverage

Dichotomies:

• Develop & protect production

• Experimentation & reproducibility

• Central control & self service

• Group sharing & individual control

• Reuse & branching

Key Observation: Innovation Requires Iteration, And Iteration Is Hard


From → To

Change Fear → Change Velocity

Manual Operations → Automated Operations

Hope For Quality → Integrated Quality

Hero Mentality → Repeatable Processes

Tool Centric → Code Centric

Vendor Lock-In → Diverse Tools

How To Succeed?

A Mindset Change to DataOps…

…to power your highly agile data culture.


DataOps – Transformative to Data Analytics

DataOps – The technical practices, cultural norms, and architecture that enable:

• Rapid experimentation and innovation for the fastest delivery of new insights to our customers

• Low error rates

• Collaboration across complex sets of people, technology, and environments

• Clear measurement and monitoring of results

Source: Gartner

“Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.”

Sumit Pal, Gartner, November 2018

[Diagram labels: People, Process, Organization; Technical Environment]


Topics

Why DataOps Is Essential

Seven Steps to DataOps

Next Steps With DataOps


What to do? Seven Steps to DataOps (+3)

1. Orchestrate Two Journeys

2. Add Tests And Monitoring

3. Use a Version Control System

4. Branch and Merge

5. Use Multiple Environments

6. Reuse & Containerize

7. Parameterize Your Processing

+ Three (Architecture, Metrics and Inter/Intra Team Collaboration)


Orchestrate data to customer value

Analytic processes are like manufacturing: materials (data) and production outputs (refined data, charts, graphs, models)

Access: Python Code

Transform: SQL Code, ETL

Model: R Code

Visualize: Tableau Workbook

Report: Tableau Online


Speed deployment to production

Analytic processes are like software development: deliverables continually move from development to production

Diverse Team: Data Engineers, Data Scientists, Data Analysts

Diverse Tools

Diverse Customers: Business Customer, Products & Systems


Innovation and Value Pipeline Together

Focus on both orchestration and deployment while automating & monitoring quality

Don’t want to break production when I deploy my changes

Don’t want to learn about data quality issues from my customers


Add Automated Monitoring And Tests

Move Fast and Count Things

Monitoring: To ensure that data quality remains high while data flows through the Value Pipeline.

Tests: Before promoting work, running new and old tests gives high confidence that the change did not break anything in the Innovation Pipeline.


Automate Monitoring & Tests In Production

Test Every Step And Every Tool in Your Value Pipeline

• Are data inputs free from issues?

• Is your business logic still correct?

• Are your outputs consistent?

And Save Test Results!

Access: Python Code

Transform: SQL Code, ETL

Model: R Code

Visualize: Tableau Workbook

Report: Tableau Online


Support Multiple Types Of Tests

Testing Data Is Not Just Pass/Fail in Your Value Pipeline

Support Test Types

• Error – stop the line

• Warning – investigate later

• Info – list of changes

Keep Test History

• Statistical Process Control
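
The three test types and the saved history can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Severity`, `DataCheck`, `CheckResult` and `run_checks`, and the example metrics, are hypothetical and do not come from any DataKitchen product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Severity(Enum):
    ERROR = "error"      # stop the line
    WARNING = "warning"  # investigate later
    INFO = "info"        # list of changes


@dataclass
class DataCheck:
    name: str
    severity: Severity
    check: Callable[[dict], bool]  # returns True when the check passes


@dataclass
class CheckResult:
    name: str
    severity: Severity
    passed: bool


def run_checks(checks: List[DataCheck], metrics: dict,
               history: List[CheckResult]) -> bool:
    """Run every check, save each result, and say whether the pipeline may continue."""
    may_continue = True
    for item in checks:
        passed = item.check(metrics)
        history.append(CheckResult(item.name, item.severity, passed))
        # Only a failed ERROR check stops the line; warnings and info are just recorded.
        if not passed and item.severity is Severity.ERROR:
            may_continue = False
    return may_continue


history: List[CheckResult] = []
checks = [
    DataCheck("row_count_positive", Severity.ERROR, lambda m: m["row_count"] > 0),
    DataCheck("null_rate_below_5pct", Severity.WARNING, lambda m: m["null_rate"] < 0.05),
    DataCheck("schema_unchanged", Severity.INFO, lambda m: m["new_columns"] == 0),
]
ok = run_checks(checks, {"row_count": 1200, "null_rate": 0.08, "new_columns": 1}, history)
```

Here the warning and info checks fail but the run still continues, because only a failed error-level check stops the line. The saved `history` list is what a statistical process control check would later read.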


Example Tests (Basic)❷


Example Tests: Simple


Example Test: More Complex

Make sure all table counts are the same in the production and development environment
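
A minimal sketch of such a table-count comparison, assuming both environments are reachable as SQL databases. Two in-memory SQLite databases stand in for production and development here; the table names and the helper functions `table_counts`, `count_mismatches` and `make_db` are made up for illustration.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Table names come from the database catalog, so quoting them is enough here.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}


def count_mismatches(prod: sqlite3.Connection, dev: sqlite3.Connection) -> dict:
    """Return {table: (prod_count, dev_count)} for tables whose counts differ."""
    prod_counts, dev_counts = table_counts(prod), table_counts(dev)
    return {name: (prod_counts.get(name), dev_counts.get(name))
            for name in sorted(set(prod_counts) | set(dev_counts))
            if prod_counts.get(name) != dev_counts.get(name)}


def make_db(order_rows: int) -> sqlite3.Connection:
    """Build a small stand-in database with a fixed customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(i,) for i in range(3)])
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(order_rows)])
    return conn


prod, dev = make_db(order_rows=5), make_db(order_rows=4)
mismatches = count_mismatches(prod, dev)  # only the orders table differs
```

An empty result means the two environments agree; any entry names the table and the two counts to investigate.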


Example Test (Location Balance)❷

Access: Python Code

Transform: SQL Code, ETL

Model: R Code

Visualize: Tableau Workbook

Report: Tableau Online

source: 1 million rows

database: 1 million rows (300K facts, 700K dimensions)

report: 300K facts, 700K dimensions
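
The location balance idea on this slide can be sketched as a comparison of row counts at each location in the pipeline. The counts below are the ones shown on the slide; the function name `location_balance` and the dictionary layout are assumptions made for this example.

```python
def location_balance(counts: dict) -> list:
    """Compare row counts between pipeline locations; return a list of failures."""
    failures = []
    source, database, report = counts["source"], counts["database"], counts["report"]

    # Every source row should arrive in the database.
    if database["rows"] != source["rows"]:
        failures.append(f"source has {source['rows']} rows, database has {database['rows']}")

    # Facts plus dimensions should account for every database row.
    if database["facts"] + database["dimensions"] != database["rows"]:
        failures.append("facts + dimensions do not add up to database rows")

    # The report should see the same facts and dimensions as the database.
    for kind in ("facts", "dimensions"):
        if report[kind] != database[kind]:
            failures.append(f"report has {report[kind]} {kind}, database has {database[kind]}")
    return failures


slide_counts = {
    "source": {"rows": 1_000_000},
    "database": {"rows": 1_000_000, "facts": 300_000, "dimensions": 700_000},
    "report": {"facts": 300_000, "dimensions": 700_000},
}
failures = location_balance(slide_counts)  # empty: the slide's numbers balance
```

A non-empty list points to the location where rows were lost or duplicated.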


Example Test (Historical Balance)❷

Production Data, Pipeline & Environment

Access: Python Code; Transform: SQL Code, ETL; Model: R Code; Visualize: Tableau Workbook; Report: Tableau Online

SKU    Product   Product Group   Volume
SKU1   P1        G1              100
SKU2   P1        G1              50
SKU3   P2        G1              75
SKU4   P3        G2              125
SKU5   P4        G2              200
SKU6   P5        G2              25
Total                            575

Histbal: G1 225, G2 350

Pre-Production Data, Pipeline & Environment

Access: Python Code; Transform: SQL Code, ETL; Model: R Code; Visualize: Tableau Workbook; Report: Tableau Online

SKU    Product   Product Group   Volume
SKU1   P1        G1              101
SKU2   P1        G1              55
SKU3   P2        G1              76
SKU4   P3        G1              126
SKU5   P4        G2              200
SKU6   P5        G2              29
Total                            587

Histbal: G1 358, G2 229
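
A historical balance test over this slide's numbers could look roughly like the sketch below. The rows are read off the two tables (SKU4 is taken to sit in G2 in production and in G1 in pre-production, which is what the Histbal totals imply). The 5% tolerance and the function names are assumptions, not something the slide specifies.

```python
from collections import defaultdict

# (SKU, product, product group, volume), as read from the slide's two tables.
production = [
    ("SKU1", "P1", "G1", 100), ("SKU2", "P1", "G1", 50), ("SKU3", "P2", "G1", 75),
    ("SKU4", "P3", "G2", 125), ("SKU5", "P4", "G2", 200), ("SKU6", "P5", "G2", 25),
]
pre_production = [
    ("SKU1", "P1", "G1", 101), ("SKU2", "P1", "G1", 55), ("SKU3", "P2", "G1", 76),
    ("SKU4", "P3", "G1", 126), ("SKU5", "P4", "G2", 200), ("SKU6", "P5", "G2", 29),
]


def group_totals(rows) -> dict:
    """Sum volume per product group."""
    totals = defaultdict(int)
    for _sku, _product, group, volume in rows:
        totals[group] += volume
    return dict(totals)


def historical_balance(baseline, candidate, tolerance: float = 0.05) -> dict:
    """Return {group: (before, after)} where the total moved by more than `tolerance`."""
    base, cand = group_totals(baseline), group_totals(candidate)
    drifted = {}
    for group in sorted(set(base) | set(cand)):
        before, after = base.get(group, 0), cand.get(group, 0)
        change = abs(after - before) / before if before else float("inf")
        if change > tolerance:
            drifted[group] = (before, after)
    return drifted


drifted = historical_balance(production, pre_production)
```

The overall total changes only slightly (575 to 587), but the per-group totals shift sharply, which is the kind of change a historical balance test is meant to surface before promotion.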


Production Testing and Monitoring

Lower Your Error Rates and Embarrassment!

[HOW] Test and Monitor:

• Automatically, in production

• On top of the entire tool chain

• Send alerts / notification

• Keep track of history

• Make it easy to create tests

[WHAT] Test Types:

• Traditional Data Quality

• Statistical Process Control

• Location Balance Test

• Historic Balance Test

• Business Based Tests

[WHY] Benefits:

• Less Errors

• More Innovation

• More Customer Data Trust

• Less Stress

• Less Embarrassment
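
Of the test types listed above, statistical process control is the one that depends on kept history. A minimal sketch, assuming a saved series of daily row counts and the common mean plus or minus three standard deviations rule (the slides do not prescribe a specific rule, and the numbers are invented):

```python
from statistics import mean, stdev


def control_limits(history, sigmas: float = 3.0):
    """Lower and upper control limits from past measurements."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, history, sigmas: float = 3.0) -> bool:
    """True when the new measurement falls inside the control limits."""
    low, high = control_limits(history, sigmas)
    return low <= value <= high


# Hypothetical saved test history: row counts from the last seven runs.
daily_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]

todays_count_ok = in_control(1003, daily_row_counts)
spike_ok = in_control(1200, daily_row_counts)
```

A value outside the limits would typically raise a warning or an error, depending on how much the team trusts the metric.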


For the Innovation Pipeline

Tests Are Also For Code: Keep Data Fixed

Deploy Feature

Run all tests here before promoting


Duality of Tests❷

Automated ‘Tests’ Serve a Dual Purpose:

1. Data Tests and Monitoring in Production

2. Regression, Functional and Performance Tests in Development

Quality Your Customer Receives = f (data, code)

• Value Pipeline: code fixed, data variable

• Innovation Pipeline: code variable, data fixed

https://medium.com/data-ops/disband-your-impact-review-board-automate-analytics-testing-42093d09fe11


Use a Version Control System

At The End Of The Day, Analytic Work Is All Just Code

Access: Python Code

Transform: SQL Code, ETL Code

Model: R Code

Visualize: Tableau Workbook XML

Report: Tableau Online

Source Code Control


Branch & Merge

Source Code Control

Branching & Merging enables people to safely work on their own tasks


Example Branch And Merge Pattern

[Diagram: feature branches f1, f2, f3 and f5 off main / master / trunk across Sprint 1 and Sprint 2]


Access: Python Code

Transform: SQL Code, ETL Code

Model: R Code

Visualize: Tableau Workbook XML

Report: Tableau Online

Use Multiple Environments

Analytic Environment

Your Analytic Work Requires Coordinating Tools And Hardware


Use Multiple Environments

Provide an Analytic Environment for each branch

• Analysts need a controlled environment for their experiments

• Engineers need a place to develop outside of production

• Update Production only after all tests are run!


Sandboxes Are Complex

An Analytic Environment contains:

• Data Engineers, Scientists or Analytics Team’s Analytic Tools: R (model), Alteryx (business ETL), Redshift (data), SQL (ETL), Tableau (workbook), Python

• Hardware & Network Configurations

• Right Hardware and Software Versions

• Test Data Sets

• Code Branch

• Test Result History

Analytic Environment / Development Sandbox Creation is Complex: Hard to create the right set of data, tools, people, history and configuration for a fast build test debug cycle


Reuse & Containerize

Containerize

1. Manage the environment for each component (e.g. Docker, AMI)

2. Practice Environment Version Control

Reuse

1. Do not create one ‘monolith’ of code

2. Reuse the code and results


Parameterize Your Processing

Think Of Your Value Pipeline Like A Big Function

• Named sets of parameters will increase your velocity

• With parameters, you can vary:

  • Inputs

  • Outputs

  • Steps in the workflow

• You can make a time machine

• Secure storage for credentials
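
Treating the pipeline as one big function suggests passing it a named set of parameters. The sketch below is an illustration only: `PipelineParams`, the table names, the `credentials_ref` paths and `run_pipeline` are all hypothetical, and the function returns a plan instead of running real tools.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class PipelineParams:
    input_table: str
    output_table: str
    steps: tuple          # which steps of the workflow to run
    as_of: date           # an older date re-runs the past: the "time machine"
    credentials_ref: str  # pointer into a secret store, never the secret itself


# Named parameter sets: one for production, one derived from it for development.
PRODUCTION = PipelineParams(
    input_table="raw.sales",
    output_table="reporting.sales_summary",
    steps=("access", "transform", "model", "visualize", "report"),
    as_of=date(2019, 5, 1),
    credentials_ref="secrets/prod/warehouse",
)
DEV = replace(
    PRODUCTION,
    output_table="dev.sales_summary",
    steps=("access", "transform"),
    credentials_ref="secrets/dev/warehouse",
)


def run_pipeline(params: PipelineParams) -> list:
    """Stand-in for a real orchestrator: return the plan it would execute."""
    return [
        f"{step}: {params.input_table} -> {params.output_table} "
        f"as of {params.as_of.isoformat()}"
        for step in params.steps
    ]


dev_plan = run_pipeline(DEV)
```

Because the same code runs under either named set, switching from development to production means swapping the parameter set, not editing the pipeline.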


The Seven Steps In Action

1. Select story

2. Create branch

3. Create environment

4. Implement feature

5. Write new tests

6. Run new and existing tests

7. Check in to branch

8. Merge to parent

9. Delete environment

When sprint ends

• Deliver all completed features to customer

• Merge sprint branch to master

• Roll un-merged features into the next sprint


The 7 Steps and Data Science

Journeys Tests Version Control Branch and Merge Environments Reuse / Containerize Parameterize

Business Need Agile

Prep Data x x x x x x x

Feature Extraction x x x x x x x

Build Model x x x x x x x

Evaluate Model x

Deploy Model x x x x x x x

Monitor Model x


Three Bonus Steps!

• DataOps Data Architecture

• DataOps Collaboration

• Inter and Intra Team

• DataOps Measurement


Why a DataOps Centric Architecture?

• Canonical Data Architectures only think about production, not the process to make changes to production.

• A DataOps Data Architecture makes the steps to change what is in production a “central idea.”

• Think first about change over time: changes to your code, your servers, and your tools, along with monitoring for errors, are first-class citizens in the design.

• Why?

• Designing only for production is a little like designing a mobile phone with a fixed battery.

• You can end up with processes characterized by unplanned work, manual deployment, errors, and bureaucracy.

• A ‘Right to Repair’ Data Architecture


Canonical Data Architecture

Does not reflect collaboration and operations

[Diagram, Production Environment: Source Data, Raw Lake, Data Engineering, Refined Data, Data Science, Data Viz., Data Governance, Data Customers]


DataOps Data Architecture

Spans tool chain & environments

[Diagram, Cloud / On-Prem environments: Production Environment, Test, Dev]

Tool chain: Source Data, Raw Lake, Data Engineering, Refined Data, Data Science, Data Viz., Data Governance, Data Customers

Orchestrate, Monitor, Test (shown three times in the diagram)

DataKitchen DataOps:

• Storage & Version Control

• History & Metadata

• Auth & Permissions

• Environment Secrets

• DataOps Metrics & Reports

• Automated Deployment

• Environment Creation and Management

DataOps Team


A Cornucopia of Collaboration Complexity

Legend: D = Development - Data Analytic Team; P = Production - Data Analytic Team

• Centralized Dev: How do we create together without conflicts? (Data Engineer & Data Scientist)

• Centralized Dev & Prod: How do we deploy safely and rapidly? (Data Team and Production Team)

• Decentralized Dev: How to balance centralized control vs self service freedom? (Home Office Data Team and Line of Business Analysts)

• Decentralized Dev & Prod: How to reuse/incorporate what another team deployed? (Multiple Data & Production Teams in Many Orgs)

Team labels: DS, DE, BI, DG


DataOps and Intra-Team Coordination

Team: Chris – DataOps Engineer; Eric – Production Engineer; Betty – Data Engineer; Pat – Data Scientist

This Is A Multi Step, Multi Person, Multi Environment Process To Make this Request a Reality

Challenges:

• How to leverage best practices and re-use?

• How to collaborate and coordinate work?

• How to ease movement between team members with many tools and environments?

• How to maintain security?

• How to automate work and reduce manual errors?


Intra-Team Coordination

Team: Chris – DataOps Engineer; Eric – Production Engineer; Betty – Data Engineer; Pat – Data Scientist

PRODUCTION

Production Environment:

• Separate Hardware/Software Environment

• Secure

• No Access By Developers

• Managed by Eric

• Separate Credentials

DEVELOPMENT

Development Environment:

• Separate Hardware/Software Environment

• Secure

• Access By Data Engineers, Data Scientists, Analysts and DataOps Engineers

• Setup by Chris (DataOps Engineer)

• Separate Credentials


Inter-Team Coordination

Multiple organizations, perhaps multiple bosses


Inter-Team Coordination

Multiple locations, orgs, tasks


Inter-Team Coordination: Two Locations, Multiple Tools

Home Office Team (Data Engineer, Data Scientist): Centralized, Weekly Cadence of Changes

Local Office ‘Self-Service’ Team (Data Analyst): Distributed, Daily/Hourly Cadence of Changes

VP Marketing

Locations: Boston, New Jersey

Page 58: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Challenges With Coordination

Home Office Team: Data Engineer, Data Scientist

Local Office ‘Self-Service’ Team: Data Analyst

VP Marketing

• Make a Change in Schema? Break Reports?

• Add New Data Sets? Not Available For All?

• Change Report Calculations? Inconsistencies?

• New Data & Schema Update/New Report? Not Working?
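
The "Make a change in schema? Break reports?" question above can be turned into an automated check. The following is only a minimal sketch, not something taken from the slides: the function name `schema_breaks`, the column names, and the dictionary-based schema description are all hypothetical. It compares the columns a report depends on with a proposed schema and lists anything that would break.

```python
def schema_breaks(expected, proposed):
    """Return a list of problems a proposed schema would cause for a report.

    Both arguments map column name -> type name (illustrative format only).
    Extra columns in `proposed` are allowed; missing or retyped ones are not.
    """
    problems = []
    for column, col_type in expected.items():
        if column not in proposed:
            problems.append(f"missing column: {column}")
        elif proposed[column] != col_type:
            problems.append(
                f"type changed: {column} {col_type} -> {proposed[column]}"
            )
    return problems


# Columns a hypothetical marketing report reads today.
report_needs = {"customer_id": "int", "region": "str", "revenue": "float"}

# Hypothetical schema after a data engineer renames one column and adds another.
new_schema = {
    "customer_id": "int",
    "region_code": "str",
    "revenue": "float",
    "segment": "str",
}

print(schema_breaks(report_needs, new_schema))
```

Running a check like this before a schema change is deployed lets the team making the change see which downstream reports it would break, rather than hearing about it from the team that owns them.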

Page 59: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Shared Result, Separate Responsibilities

Home Office Team: Data Engineer, Data Scientist

Local Office ‘Self-Service’ Team: Data Analyst

VP Marketing

Steps and tools:

• Calculate: SQL

• Segment: Python

• Transform: SSIS

• Load: SQLServer

• Deploy: Tableau

• Publish: T Server

• Add Data: Alteryx

Page 60: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Overall Orchestration

Home Office Team: Data Engineer, Data Scientist

Local Office ‘Self-Service’ Team: Data Analyst

VP Marketing

Steps and tools:

• Calculate: SQL

• Segment: Python

• Transform: SSIS

• Load: SQLServer

• Deploy: Tableau

• Publish: T Server

• Add Data: Alteryx

Page 61: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

DataOps Process Analytics

Analytic teams are not very analytic about measuring and improving their internal work

• Prove your teams’ value, measure:

• Team and individual productivity

• Production error rates

• Data provider error rates

• SLAs

• Production deployment rates

• Release environments

• Test Coverage

• Customizable with data export to fit your company’s needs
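
Two of the measures listed above, production error rate and production deployment rate, can be computed from a simple log of pipeline runs. This is a rough sketch under assumed data: the `PipelineRun` record, its fields, and the sample week are hypothetical and do not represent any particular tool's export format.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    environment: str     # e.g. "development" or "production" (assumed labels)
    succeeded: bool      # did the run finish without error?
    is_deployment: bool  # did the run release a change into the environment?


def production_error_rate(runs):
    """Fraction of production runs that failed (0.0 if there were none)."""
    prod = [r for r in runs if r.environment == "production"]
    if not prod:
        return 0.0
    failures = sum(1 for r in prod if not r.succeeded)
    return failures / len(prod)


def production_deployments(runs):
    """Number of runs that deployed a change into production."""
    return sum(
        1 for r in runs if r.environment == "production" and r.is_deployment
    )


# One hypothetical week of runs.
week_of_runs = [
    PipelineRun("production", True, False),
    PipelineRun("production", False, False),
    PipelineRun("production", True, True),
    PipelineRun("production", True, True),
    PipelineRun("development", False, False),
]

print(production_error_rate(week_of_runs))   # 1 failure out of 4 production runs
print(production_deployments(week_of_runs))  # 2 production deployments
```

Tracking these numbers week over week is one way to show whether errors are declining and deployments are increasing.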

Page 62: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

DataOps: data about data

Statistical process control graphs monitoring “bad IDs” and raw row counts
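
As an illustration of the statistical process control idea, the sketch below derives control limits from past raw row counts and flags a new load that falls outside them. It is a simplified example, not the method behind the graphs on this slide: the three-sigma rule is a common SPC convention chosen here as an assumption, and the function names and sample counts are invented for illustration.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, center, upper) control limits from past observations."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(history, new_value, sigmas=3.0):
    """True if new_value falls outside the control limits built from history."""
    lower, _, upper = control_limits(history, sigmas)
    return not (lower <= new_value <= upper)


# Hypothetical raw row counts from the last seven daily loads.
daily_row_counts = [10120, 9980, 10050, 10210, 9890, 10075, 10010]

print(out_of_control(daily_row_counts, 10040))  # a typical day
print(out_of_control(daily_row_counts, 6200))   # a sharp drop in rows
```

The same check can be applied to a count of "bad IDs" per load, so that an unusual spike raises an alert before the data reaches customers.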

Page 63: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Per Project Analytics

• Error Rates Decline in Production

• On Time Delivery within SLA, & decreasing build time

• Productivity: Recipe Work Increasing

• Team Collaboration Increased

• Deploys Between Environments Increased

• Number of Automated Tests Increasing

Page 64: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Topics

Why DataOps Is Essential

Seven Steps to DataOps

Next Steps With DataOps

Page 65: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Where to Start With DataOps?

Look to manufacturing/DevOps ‘Theory of Constraints’

• Where are the ‘bottlenecks’ (or constraints) in your data science or analytic process?

• What impedes you from creating new insight for your customers?

• Iterate & improve

Page 66: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Where to Start

Part 1: [Factory] “I don’t want to learn about data quality issues from my customers”

Part 2: [Flow] “I don’t want to break production when I deploy my changes”

Part 3: [Intra-team coordination] “I don’t want my team to struggle working together”

Part 4: [Inter-team coordination] “I don’t like the Hatfields vs McCoys war between different analytic teams”

Part 5: [Management] “How to measure team progress and show results to leadership?”

Errors: Bottleneck / Constraint

Deployment: Bottleneck / Constraint

Team Coordination: Bottleneck / Constraint

Page 67: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Where to start

Part 1: [Factory] “I don’t want to learn about data quality issues from my customers”

Part 2: [Flow] “I don’t want to break production when I deploy my changes”

Part 3: [Intra-team coordination] “I don’t want my team to struggle working together”

Part 4: [Inter-team coordination] “I don’t like the Hatfields vs McCoys war between different analytic teams”

Part 5: [Management] “How to measure team progress and show results to leadership?”

Errors, Deployment, and Team Coordination Are All Bottlenecks That

GOAL: Flow of Innovation

Page 68: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Write down your answer

What can I take from this session and apply tomorrow?

Page 69: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

DataOps Benefit: Time Well Spent

[Chart: Percentage of Time Team Spends Per Week, Before DataKitchen vs. After DataKitchen]

Categories:

• New Features & Data For Customers

• Errors & Operational Tasks

• Process Improvements & Tech Debt Reduction

Page 70: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

DataOps Benefit: Faster, Better & Happier

Data Analytics Team: Before DataKitchen vs. After DataKitchen

• Production Errors: High Errors (before), Low Errors (after)

• Deployment Latency (Dev to Prod): Weeks, Months (before), Hours & Mins (after)

Page 71: The DataOps Transformation · DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently •

Learn More about DataKitchen & DataOps

• For these slides, contact me:

• cbergh at datakitchen dot io

• DataOps Manifesto:

• http://dataopsmanifesto.org

• Free DataOps Cookbook:

• https://www.datakitchen.io/dataops-cookbook-main.html

• Excerpt from New Unicorn Project Book on DataOps

• https://www.datakitchen.io/unicorn-project.html