agile data science - agile tour lithuania...



Page 1: Agile Data Science - Agile tour Lithuania 2016 · 2016.agileturas.lt/.../10/5.3-2-Waclaw-Kusnierszyk-Agile-Data-Science… · What Agile Data Science Is Not? Some actual responses from

Agile Data Science

Waclaw Kusnierczyk | [email protected]

Slide 1 of 15

Outline

1. What is Agile Data Science?

2. What is Agile?

3. What is Data Science?

4. What is Agile Data Science?

5. What Agile Data Science Is Not

Slide 2 of 15

What is Agile Data Science?

Everybody knows what agile and data science are.

'Data science' is no longer a buzzword (or is it?), so we need another one. What's better for insight into agile data science than agile data science?

Slide 3 of 15

What is Agile Data Science?

What does Google search tell us about agile data science?

Question | Have you ever heard about agile data science?

Slide 4 of 15

What is Agile?

Agile is...

- an alternative to traditional sequential development;
- meant to help teams respond to unpredictability;
- incremental, iterative work with empirical feedback;
- focused on potentially shippable product increments delivered in short iterations;
- continually revisiting every aspect of development throughout the lifecycle;
- continuously re-evaluating the direction, with the possibility of change.

Slide 5 of 15

What is Agile?

Agile is...

- a well-established software development methodology;
- somewhat unpopular before Christmas.

Slide 6 of 15

What is Data Science?

Data science is...

- the "Sexiest Job of the 21st Century";
- particularly exciting after vacation.

Slide 7 of 15

What is Data Science?

Data science involves:

- extraction, transformation and loading of data (ETL), cleaning and preprocessing;
- exploratory data analysis (EDA);
- statistical inference and modeling, machine learning;
- model testing, validation, tuning and optimisation;
- visualisation and delivery of insight;
- design and prototyping of data-based products.

Slide 8 of 15
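The first activities on the list above can be illustrated end to end on a toy dataset. The sketch below is a minimal Python example using only the standard library; the CSV content, the column names and the function names (`extract`, `transform`, `explore`) are hypothetical, and the load step into a data store is omitted. It parses raw text, drops rows with missing or malformed values, and computes a few exploratory summary statistics.

```python
import csv
import io
import statistics

# Hypothetical raw export: one row has a missing value, one a malformed number.
RAW_CSV = """user_id,session_minutes
1,12.5
2,
3,7.0
4,abc
5,20.5
"""

def extract(text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform and clean: convert types, drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append({"user_id": int(row["user_id"]),
                          "session_minutes": float(row["session_minutes"])})
        except (ValueError, TypeError):
            continue  # skip missing or malformed values
    return clean

def explore(rows):
    """EDA: a few summary statistics of the cleaned column."""
    values = [r["session_minutes"] for r in rows]
    return {"n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values)}

summary = explore(transform(extract(RAW_CSV)))
print(summary)
```

In a real project each step would more likely be a separate, tested module, so that a change to the cleaning rules cannot silently alter the exploratory results.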

What is Data Science?

A data scientist is...

- a better statistician than most programmers;
- a better programmer than most statisticians.

Slide 9 of 15

What is Agile Data Science?

Agile data science...

- focuses on delivering insight and predictions efficiently;
- creates research plans for building minimum viable products (MVPs);
- prefers simple but efficient models to elaborate but slow ones;
- uses off-the-shelf tools as far as possible;
- evaluates the results against business objectives.

Slide 10 of 15
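The preference for simple, efficient models and the evaluation against business objectives can be combined into one check: compare the simplest reasonable model with a naive baseline, and ship it only if it clears a threshold agreed with the business. The sketch below assumes a single numeric feature, a mean-prediction baseline, closed-form least-squares regression and mean absolute error (MAE); the data and the 20% improvement threshold are hypothetical.

```python
# Hypothetical data: x = ad spend, y = sales, split into train and holdout sets.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
holdout = [(6.0, 12.0), (7.0, 13.8), (8.0, 16.1)]

def fit_mean_baseline(data):
    """Naive baseline: always predict the training mean of y."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_linear(data):
    """Ordinary least squares for y = a + b*x (closed form, one feature)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def mae(model, data):
    """Mean absolute error of a model on held-out data."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Hypothetical business objective: ship only if the model cuts holdout error
# by at least 20% relative to the naive baseline.
REQUIRED_IMPROVEMENT = 0.20

baseline_error = mae(fit_mean_baseline(train), holdout)
linear_error = mae(fit_linear(train), holdout)
improvement = 1 - linear_error / baseline_error
ship = improvement >= REQUIRED_IMPROVEMENT
print(f"baseline MAE={baseline_error:.2f}, linear MAE={linear_error:.2f}, ship={ship}")
```

Only if such a simple model falls short of the objective would a more elaborate one be worth its extra development and running cost.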

What is Agile Data Science?

Agile data science requires a modular, fast architecture for storing and processing event and other data.

Slide 11 of 15
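The slide does not prescribe a specific technology. As a minimal sketch of what "modular" could mean, the example below keeps events in an append-only log and lets independent processing modules subscribe to it; a module added later can be replayed over the stored history. The `Event` fields, the module names and the in-memory storage are assumptions for illustration only; a production system would typically use a durable log or message broker instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List

@dataclass(frozen=True)
class Event:
    """An immutable event, e.g. a click or a purchase (fields are illustrative)."""
    user_id: str
    kind: str
    value: float = 0.0

Processor = Callable[[Event], None]

@dataclass
class EventLog:
    """In-memory, append-only event store with pluggable processing modules."""
    events: List[Event] = field(default_factory=list)
    processors: List[Processor] = field(default_factory=list)

    def subscribe(self, processor: Processor) -> None:
        """Register an independent module to receive every new event."""
        self.processors.append(processor)

    def append(self, event: Event) -> None:
        """Store the event, then pass it to each registered module."""
        self.events.append(event)
        for process in self.processors:
            process(event)

    def replay(self, processor: Processor) -> None:
        """Run a module over the stored history, e.g. to backfill a new metric."""
        for event in self.events:
            processor(event)

# Two hypothetical modules that know nothing about each other.
counts_by_kind: DefaultDict[str, int] = defaultdict(int)
revenue_by_user: DefaultDict[str, float] = defaultdict(float)

def count_kinds(event: Event) -> None:
    counts_by_kind[event.kind] += 1

def sum_revenue(event: Event) -> None:
    if event.kind == "purchase":
        revenue_by_user[event.user_id] += event.value

log = EventLog()
log.subscribe(count_kinds)
log.subscribe(sum_revenue)
for e in [Event("u1", "click"), Event("u1", "purchase", 9.5),
          Event("u2", "click"), Event("u2", "purchase", 3.0)]:
    log.append(e)

# A module added later still sees the full history through replay.
users_seen: set = set()
log.replay(lambda event: users_seen.add(event.user_id))

print(dict(counts_by_kind), dict(revenue_by_user), sorted(users_seen))
```

Because modules only read events and never modify them, a new analysis can be added or an old one dropped without touching the others.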

What is Agile Data Science?

Agile data science requires mixed-competence teams.

Very roughly,

- a data engineer focuses on data;
- a data scientist focuses on models;
- a data analyst focuses on insights.

Question | Have you heard about a full-stack data scientist?

Slide 12 of 15

What Agile Data Science Is Not

Some actual responses from data scientists:

On my laptop, the results were better.

The code ran successfully in PyCharm.

Change alpha to 0.2 in line 324 of the script.

I'm not sure what alpha was.

I'm not sure what alpha is.

The code seemed to run fine before the last update I committed.

You need to install RServe on your production cluster.

Slide 13 of 15
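Several of the responses above ("Change alpha to 0.2 in line 324 of the script", "I'm not sure what alpha was", "On my laptop, the results were better") point to parameters and randomness that are not recorded together with the results. A minimal sketch, assuming a hypothetical `TrainingConfig` whose `alpha` stands in for some model hyperparameter: every tunable value is named and documented in one place, the random seed is fixed, and the configuration is stored alongside the score. The "score" here is a placeholder, not a real model metric.

```python
import json
import random
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Every tunable value has a name, a documented meaning and one home."""
    alpha: float = 0.2       # hypothetical: regularisation strength
    sample_size: int = 1000  # rows drawn for training
    seed: int = 42           # fixes the random sample so runs are repeatable

def run_experiment(config: TrainingConfig) -> dict:
    """Draw a seeded sample and return a result record that includes the config."""
    rng = random.Random(config.seed)  # local generator; global state untouched
    sample = [rng.random() for _ in range(config.sample_size)]
    score = sum(sample) / len(sample)  # placeholder for a real model metric
    return {"config": asdict(config), "score": score}

record_a = run_experiment(TrainingConfig())
record_b = run_experiment(TrainingConfig())
# The same config yields the same record, given the same code and library versions.
assert record_a == record_b
print(json.dumps(record_a, indent=2))
```

Saving such records to version control or an experiment tracker (not shown) answers "what was alpha?" after the fact. Identical results across machines still assume the same code and dependency versions, so pinning library versions belongs to the same habit.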

What Agile Data Science Is Not

Some actual responses from data scientists:

The library I import is used by many, why would I test the code?

It worked but that library changed.

It worked with a small batch of data.

It makes no sense to try linear regression.

The new model may perform better than the old one.

I can tell you how to feed the model.

Let me recompile those LaTeX report sources for you.

Slide 14 of 15
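A widely used library does not test the project code that wraps it, and "it worked with a small batch of data" often means an edge case was never exercised. The sketch below shows a hypothetical project helper, `min_max_scale`, together with plain `assert`-based tests (runnable directly, or collected by a test runner such as pytest) that cover an empty input and a constant column, where a naive implementation would divide by zero.

```python
import math
from typing import List

def min_max_scale(values: List[float]) -> List[float]:
    """Scale values to [0, 1] (hypothetical project helper).

    A constant column (max == min) would cause a division by zero; a small,
    hand-picked batch may never contain one, so it is mapped to 0.0 explicitly.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if math.isclose(hi, lo):
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_min_max_scale() -> None:
    # Typical case.
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    # Edge cases a small sample rarely exercises.
    assert min_max_scale([]) == []
    assert min_max_scale([3.0, 3.0, 3.0]) == [0.0, 0.0, 0.0]
    # Negative values and a larger batch.
    big = [float(i) for i in range(-500, 501)]
    scaled = min_max_scale(big)
    assert scaled[0] == 0.0 and scaled[-1] == 1.0 and len(scaled) == 1001

test_min_max_scale()
print("all checks passed")
```

Running such tests on every commit also catches the case where "it worked but that library changed", since a behaviour change in a dependency shows up as a failing test rather than as a surprise in production.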

Thank you!

Slide 15 of 15