Data Scientists and Analysts Are Also Software Engineers

DATA SCIENTISTS AND ANALYSTS ARE ALSO SOFTWARE ENGINEERS W. Whipple Neely Director of Data Science, EA

Upload: domino-data-lab

Post on 20-Mar-2017

TRANSCRIPT

Page 1: Data Scientists Are Analysts Are Also Software Engineers

DATA SCIENTISTS AND ANALYSTS ARE ALSO SOFTWARE ENGINEERS
W. Whipple Neely
Director of Data Science, EA

Page 2: Data Scientists Are Analysts Are Also Software Engineers

THIS TALK IS ABOUT …

Moving data science and analytics teams to a software development model.

• The motivation is to create repeatable, verifiable processes.
• It also means that we can use powerful but “personal” analysis environments (such as R) to produce enterprise-level systems, creating work that typical dashboarding systems cannot achieve.
• In many ways this is a story about one set of teams. It may not apply to all groups, but it has helped ours.

Page 3: Data Scientists Are Analysts Are Also Software Engineers

THE TYPICAL VENN DIAGRAM: WHO IS A DATA SCIENTIST

Statistics

Some Version of Domain Expertise

Computer Science

“hacker skills”

Data Science

“What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser.”
Davenport and Patil, “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, 2012

“Hacker skills” is the wrong term

Page 4: Data Scientists Are Analysts Are Also Software Engineers

GOOGLE IMAGE SEARCH: “WHO DATA SCIENTIST VENN DIAGRAM”

Page 5: Data Scientists Are Analysts Are Also Software Engineers

WHAT WE DO INSTEAD OF WHO WE ARE

[Venn diagram: Data Science at the intersection of Engineering, Science, and Collaboration]

• Engineering: data engineering, coding discipline, software engineering, style guides, reproducibility, source code control, regression tests
• Science: math, stats, computer science, machine learning, probability models, economics, “substantive domain expertise”, vast quantities of common sense
• Collaboration: rules of engagement, empathy, communication and listening skills, flexibility, reliability, extreme social skills

Page 6: Data Scientists Are Analysts Are Also Software Engineers

THE PROBLEMS

We have a team of data scientists who are experts at probability modeling and machine learning, and a few of them are pretty good at programming in R, Matlab or Python on a laptop. However …

1. Most have no experience of team programming.
2. Many come without experience of creating software that others can use, or that is robust enough to run.
3. Creating an enterprise-level repeatable process can’t be left to the kind of programming that most of us do on our laptops.
4. There is no easy intermediate step between working on a laptop and creating something that works on the enterprise platform.

Page 7: Data Scientists Are Analysts Are Also Software Engineers

WHERE WE STARTED

Write R or Python Script → Run Script Manually → Update Report

OR

Write R or Python Script → Run Script Manually → Update a Static Model Implementation

Page 8: Data Scientists Are Analysts Are Also Software Engineers

THE PROBLEMS WITH WHERE WE STARTED

• Code/methods/models got lost.
• Lots of manual work.
• No automated checks for correctness or robustness of models or predictions.
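
One way to picture the missing automated checks is a small validation step that runs after every scoring job. The sketch below is an illustration in Python, not the checks the team actually built: the function names, the [0, 1] range, and the tolerance are assumptions chosen for the example.

```python
import math


def check_predictions(predictions, lower=0.0, upper=1.0):
    """Sanity check: return a list of problems found in a batch of predictions.

    The default bounds assume the model outputs probabilities (an assumption
    made for this example).
    """
    if not predictions:
        return ["no predictions produced"]
    problems = []
    for i, p in enumerate(predictions):
        if p is None or (isinstance(p, float) and math.isnan(p)):
            problems.append(f"prediction {i} is missing")
        elif not (lower <= p <= upper):
            problems.append(f"prediction {i} = {p} is outside [{lower}, {upper}]")
    return problems


def check_against_baseline(predictions, baseline, tolerance=1e-6):
    """Regression check: compare new predictions with stored reference output."""
    if len(predictions) != len(baseline):
        return [f"expected {len(baseline)} predictions, got {len(predictions)}"]
    return [
        f"prediction {i} changed from {b} to {p}"
        for i, (p, b) in enumerate(zip(predictions, baseline))
        if abs(p - b) > tolerance
    ]
```

An empty list from both functions means the run passed. Anything else can be emailed to the job owner or used to stop a report from being published.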

Page 9: Data Scientists Are Analysts Are Also Software Engineers

WE TALKED TO THE TEAMS ABOUT WHAT WAS WRONG

“Our analysts are pretty good at writing scripts and generating reports, but our team needs help with the bookends: scheduling tasks and serving the reports automatically”
– Colleen Chrisco, Director of Analytics, PopCap Games

Page 10: Data Scientists Are Analysts Are Also Software Engineers

IN TERMS OF OUR DIAGRAM

[The same diagram as on Page 5: Data Science at the intersection of Engineering, Science, and Collaboration.]

Page 11: Data Scientists Are Analysts Are Also Software Engineers

THIS WAS A LITTLE SCARY FOR SOME OF OUR TEAMS ….

“We’re not programmers.”

“I don’t even know where to start.”

“I’ve never scheduled a job before.”

Page 12: Data Scientists Are Analysts Are Also Software Engineers

SO, TO ANSWER THESE CONCERNS WE DID THE FOLLOWING…

[Diagram labels: Perforce, R Server, R Script]

1. Check in Code: P4V, R-Checkin
2. Submit Job: schedule file, API, Web
3. Run Script: Reporting, Models, ETLs, Forecasting

Script Inputs: csv, DBs, URL, logs, RDS

Script Outputs: csv, DBs, email, doc, pdf, html, shiny, RDS

By “we did the following” I really mean that we hired a brilliant computer scientist named Ben Weber, who became part of the team. Ben learned the workflows of the team members and created this system for us.
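
The three steps on this slide (check in code, submit a job, run the script) can be sketched as a tiny scheduler. This is a hypothetical illustration in Python, not Ben Weber's system: the JSON schedule format, the `Job` fields, and the function names are all assumptions, and the script runner is passed in so the sketch does not depend on R being installed.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    name: str    # hypothetical job identifier
    script: str  # path of the checked-in script, e.g. an R file
    hour: int    # hour of day (0-23) at which the job should run
    owner: str   # who to contact when the job fails


def load_schedule(text):
    """Parse a schedule file; here it is assumed to be a JSON list of jobs."""
    return [Job(**record) for record in json.loads(text)]


def due_jobs(jobs, now: datetime):
    """Select the jobs scheduled for the current hour."""
    return [job for job in jobs if job.hour == now.hour]


def run_due_jobs(jobs, now: datetime, runner):
    """Run every due job and record failures instead of stopping at the first."""
    results = {}
    for job in due_jobs(jobs, now):
        try:
            runner(job.script)
            results[job.name] = "ok"
        except Exception as exc:
            results[job.name] = f"failed: {exc}"
    return results
```

In practice the runner might shell out to R, for example `lambda script: subprocess.run(["Rscript", script], check=True)`, so that a failing script raises an exception and shows up in the results rather than going unnoticed.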

Page 13: Data Scientists Are Analysts Are Also Software Engineers

WHERE IT LANDED US

• We’d automated.
• We’d gotten the “bookends” covered.
• Many analytics teams, including the data science team, are using the system.

As a result …

• Teams started using the technology to improve their work.
• Teams became more efficient: “I no longer have to be a walking dashboard.”
• Astonishingly, these teams now have their routine code in source control.

Page 14: Data Scientists Are Analysts Are Also Software Engineers

BUT IT DIDN’T SOLVE EVERYTHING

• We had produced more tools and simplified tasks, but hadn’t really created a culture of being a software-producing organization.
• We had extended the laptop model … a little, by introducing VMs that could run the code.

And giving teams more tools had introduced some issues …

• A proliferation of models/predictions being run without curating the processes.
• People leave, and their work continues to be run automatically …. This is not always a bad thing, but it is often not a good thing either.
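
A lightweight way to keep automated work curated is to record an owner and a last-review date for each scheduled job and flag the ones that have drifted. The sketch below assumes such a registry exists. The field names and the 180-day threshold are hypothetical.

```python
from datetime import date, timedelta


def jobs_needing_review(jobs, active_staff, today, max_age_days=180):
    """Flag scheduled jobs whose owner has left or that nobody has reviewed lately.

    `jobs` is a list of dicts with hypothetical keys "name", "owner" and
    "last_reviewed" (a datetime.date). `active_staff` is a set of user names.
    """
    flagged = []
    for job in jobs:
        reasons = []
        if job["owner"] not in active_staff:
            reasons.append("owner no longer on the team")
        if today - job["last_reviewed"] > timedelta(days=max_age_days):
            reasons.append(f"not reviewed in {max_age_days} days")
        if reasons:
            flagged.append((job["name"], reasons))
    return flagged
```

Flagged jobs are candidates for a decision, not automatic shutdown: some should be handed to a new owner, others retired.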

Page 15: Data Scientists Are Analysts Are Also Software Engineers

WHAT WE KNEW WE HAD TO DO NEXT

We needed to make a cultural change from what is essentially “hacking” to engineering.

• So, we did start hiring people with more software engineering skills.
• Introduced a style guide for our R code.
• We started code and project reviews.
• Hired a very non-technical writer to start helping the team produce documentation on our internal Confluence site.
• Start providing training in team programming, engineering, and new languages and tools (Spark, Python).
• Assign some of the positions on the team to be the software/coding gurus.

Page 16: Data Scientists Are Analysts Are Also Software Engineers

WHAT’S NEXT

• Dev/Test/Prod environments.
• Upgrading our toolset to work with RStudio Server and Git.
• Pair programming: a team member whose primary background is software engineering, programming together with a data scientist who has focused on statistical modeling and machine learning.