Scaling Analysis Responsibly

Post on 14-Apr-2017

Hilary Parker, @hspter

#rcatladies

Not So Standard Deviations

@keegsdur

“We just don’t have enough analysts!”

“Let’s scale by building the perfect BI tool!”

Product team: We should automate some of the things that are slowing you down.

Data: That sounds great!

http://xkcd.com/

Product team: Let’s just enlist some folks from engineering to help you with it.

Data: That seems perfectly reasonable!

Several months pass…

Product team: ...and finally, can it add this last graph?

Eng: Sure thing!

Product team: Can we add these 132 extra metrics to the testing?

Eng: Sure! File a ticket!

Data: You can’t do that, your family-wise error rate will tend to 1!!
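
The family-wise error rate objection above can be made concrete. As a rough sketch, assuming the 132 added metrics are tested independently and each at a significance level of 0.05 (both are assumptions for illustration, not figures from the talk), the chance of at least one false positive is 1 - (1 - alpha)^m. The function name `fwer` is hypothetical.

```r
# Family-wise error rate for m independent tests, each at level alpha.
# Independence and alpha = 0.05 are assumptions made for illustration.
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

fwer(1)    # a single test: 0.05
fwer(132)  # 132 extra metrics: very close to 1

# One standard remedy is a Bonferroni correction: test each metric at
# alpha / m, which keeps the family-wise rate at or below alpha.
fwer(132, alpha = 0.05 / 132)
```

Under these assumptions, adding the metrics without any correction makes at least one spurious "significant" result nearly certain, which is the analyst's point.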

Product team: I’d really like this tool to be more stable.

Eng: That’s a reasonable expectation for an internal product. I’m on it!

Data: Our test violates a subtle statistical assumption for this new application, and we need to gut this stable product!

Almost impossible to avoid 2-against-1 dysfunction as product teams become “self-service” with engineering support

Invariably becomes a race to the bottom as internal competition for the simplest tool emerges

Stability prioritized over flexibility

(In tech)

Building = Owning

Analysis Developer!

“Analysis Developer”

Someone on the analyst team who develops reproducible, flexible analyses in R and helps all analysts scale their work

Product team: We should automate some of the things that are slowing you down.

Data: I’ll work with the analysis developer on my team!

Avoids common types of dysfunction

Allows for flexible, accurate analysis

Analysts acquire marketable skills!

Instead of creating dashboards or using static BI tools...

http://dilbert.com/strip/2007-05-16

A series of R packages highly specific to the business case, with “mix and match” elements for rapidly creating common reports.

library("internal_package")

Instead of “assembly line” data processing…

Close 2-way partnership with data engineers to optimize the creation of datasets for certain common analyses.

The assembly line handoff from scientist to engineer creates [an uncreative] environment. The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved. - Jeff Magnusson

http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/

Instead of PM anxiously watching dashboards…

https://www.youtube.com/watch?v=CCbWyYr82BM

Analysts can create shorter-lived, reproducible reports

Manage expectations about the report’s shorter lifespan, but point out that the report will require less work from teams once created

Productionize in the short term with cron jobs

Can add in more stats this way! Y/Y turns into semiparametric models, etc.
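
A minimal sketch of what such a short-lived, cron-scheduled report might look like. The script name, path, column names, and the `yoy_change` helper are all hypothetical placeholders; only `Rscript` and the crontab schedule syntax are standard.

```r
# yoy_report.R: hypothetical script; the file name, paths, and column
# names are placeholders, not taken from the talk.
#
# A crontab entry such as the following would run it at 06:00 every Monday:
#   0 6 * * 1 Rscript /path/to/yoy_report.R

# Year-over-year change: each period compared with the value `lag`
# periods earlier. The first `lag` values have no baseline and are NA.
yoy_change <- function(x, lag = 12) {
  n <- length(x)
  if (n <= lag) {
    return(rep(NA_real_, n))
  }
  previous <- c(rep(NA_real_, lag), x[seq_len(n - lag)])
  (x - previous) / previous
}

# Toy monthly data standing in for a real metric.
monthly <- data.frame(
  month = seq(as.Date("2015-01-01"), by = "month", length.out = 24),
  orders = c(100:111, 110:121)
)
monthly$yoy <- yoy_change(monthly$orders)

# Write a dated snapshot so each run can be traced and easily retired.
out_file <- file.path(tempdir(), paste0("yoy_report_", Sys.Date(), ".csv"))
write.csv(monthly, out_file, row.names = FALSE)
```

Because the report is a script rather than a dashboard, the Y/Y step could later be swapped for a model fit without touching the schedule.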

http://dilbert.com/strip/2004-04-05

Instead of promotion based on deliverables…

Consider skill acquisition for analyst promotion

Promote analysis developers based on whether they were able to help other analysts become more efficient

Support for skill acquisition!

Education support for learning better analysis development methods for all analysts

Internally created resources

Instead of PMs self-teaching analysis based on what’s presented in dashboarding tools…

https://xkcd.com/605/

PMs can use the analysts’ educational tools if they want to “ramp up” on analytical skills like R

This way you can bake in statistical education as well.

“Isn’t this just package development?”

No!

Ad-hoc spreadsheet work

+ scripting

R workflows

+ reproducibility, some functions, “analysis testing”

Reproducible R analyses

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

Internal package development

+ industry-wide audience

- company-specific code and functions

External package development

Analysis Developer

Open-Source Developer

Stop trying to scale with static BI tools -- this will (almost) always lead to dysfunction

Instead, scale by increasing analyst efficiency using R and education!

Hire Analysis Developers to help with all this!

Thanks!

Hilary Parker

@hspter
