You Don't Have to Be a Data Scientist to Do Data Science


Upload: carmen-mardiros

Post on 21-Apr-2017


Page 1: You Don't Have to Be a Data Scientist to Do Data Science

You don’t have to be a Data Scientist to do Data Science

@carmenmardiros (not a data scientist)

Page 2

“Sexiest job of the 21st century”

Why do I, a mere analyst, care?

Page 3

The appeal of Data Science (for me as an analyst)

Increase confidence

My own and others’ confidence in my analyses, as the complexity of the data and the business ecosystem increases.

Become more productive

Speed up the analysis cycle from exploration to hypothesis to experimentation.

Add value in new ways

As the business and technology landscape changes. Operationalise analysis outcomes as data products.

Page 4

“It’s just not for me...”

“I don’t have a degree in statistics or programming.”

Page 5

Predictive Analytics Summit 2013

No confidence to attend the sessions.

Worried I would not understand the content.

Worried I’d be spotted as a fraud.

Predictive Analytics Summit 2016

(3m into my data science foray)

Understood much of the content and terminology.

Had already thought of the questions others asked.

I knew more than I thought I did.

Page 6

Myth: Doing data science requires a PhD/going back to school.

Truth: Doing data science requires enthusiasm and confidence in ourselves.

Myth: Can’t do data science until you can write an algorithm.

Truth: Can and should do data science once we’ve conceptually understood how and why the algorithm works.

Myth: Bottom-up is the only way.

Truth: Top-down works. Provide value, learn as you go.

Page 7

Adapt. Grow. Stay relevant.

Page 8

Digital Analytics is changing fast

Increasingly scientific approaches

Essential as we move towards prescriptive analytics at speed.

Become familiar with the data science toolkit

We will be key to bridging the gap between PhDs, machines and management. May even use it ourselves for our day-to-day work.

Future-proof ourselves

MS Office for Machine Learning: coming soon to a cloud near you.

Page 9

3 Transformative Data Science techniques

Page 10

#1 Resampling

Page 11

The Bootstrap

Number of observations: 100

Sample is representative (to the best of our knowledge).

Observed mean: 17.54 months

Page 12

The Bootstrap

Draw 100 random samples with replacement, each the same size as the original sample.

Calculate the mean of each one: [17.61, 16.21, 17.13, 14.08, 19.58, …] # 100 means

Plot all means, the 2.5 and 97.5 percentiles and original observed mean.

Bootstrap is extremely versatile:

● Fewer assumptions than parametric methods.

● Can be used on almost any statistic.
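The bootstrap steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the sample is made-up data (100 invented durations in months), and the `percentile` and `bootstrap_means` helpers are hypothetical names introduced here.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_means(sample, n_resamples, rng):
    """Mean of each resample; every resample is drawn with replacement
    and has the same size as the original sample."""
    return [
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    ]


rng = random.Random(42)
# Assumption: invented data standing in for the 100 observations on the slide.
sample = [rng.expovariate(1 / 17.5) for _ in range(100)]
observed_mean = statistics.fmean(sample)

means = sorted(bootstrap_means(sample, n_resamples=100, rng=rng))
ci_low, ci_high = percentile(means, 2.5), percentile(means, 97.5)

print(f"observed mean: {observed_mean:.2f} months")
print(f"2.5 to 97.5 percentile of bootstrap means: [{ci_low:.2f}, {ci_high:.2f}]")
```

The slide uses 100 resamples; in practice a few thousand is a common choice when the percentiles need to be stable, and only `n_resamples` has to change.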

Page 13

Simulations & Sensitivity Analysis

Simple simulation: Given the existing distribution of order values and a given range of possible conversion rates, how much £££ would we make if we doubled the traffic to our website?
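A simple simulation of this kind might look like the sketch below. Everything numeric is an assumption made for illustration: the order values, the current traffic of 2,000 sessions and the 1% to 3% conversion-rate range are invented, and `simulate_revenue` is a hypothetical helper.

```python
import random
import statistics


def simulate_revenue(order_values, sessions, cr_low, cr_high, n_runs, rng):
    """Return one simulated revenue total per run."""
    totals = []
    for _ in range(n_runs):
        # Pick a plausible conversion rate for this scenario.
        conversion_rate = rng.uniform(cr_low, cr_high)
        # Each session converts independently with that probability.
        n_orders = sum(rng.random() < conversion_rate for _ in range(sessions))
        # Draw order values from the observed distribution, with replacement.
        totals.append(sum(rng.choices(order_values, k=n_orders)))
    return totals


rng = random.Random(1)
order_values = [12.5, 20.0, 20.0, 35.0, 48.0, 60.0, 95.0, 150.0]  # invented
current_sessions = 2000  # invented

totals = sorted(
    simulate_revenue(
        order_values,
        sessions=2 * current_sessions,  # "what if we doubled the traffic?"
        cr_low=0.01,
        cr_high=0.03,
        n_runs=500,
        rng=rng,
    )
)

print(f"median revenue: £{statistics.median(totals):,.0f}")
print(f"roughly the middle 90% of runs: £{totals[25]:,.0f} to £{totals[474]:,.0f}")
```

The output is a distribution of outcomes rather than a single number, which is the point of the exercise.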

Sensitivity analysis (or how to open up black boxes): Given a predictive model, randomly generate new data points for each input based on observed distributions, create predictions using the model and interpret the distribution of outcome scenarios.
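The sensitivity analysis described above could be sketched as follows. The "model" here is a hand-written stand-in, not a trained one, and the input names and observed values are invented; in real use `predict_order_value` would be replaced by the prediction call of whatever model is being examined.

```python
import random
import statistics


def predict_order_value(basket_items, discount_pct):
    """Stand-in for a trained model; the formula is invented for illustration."""
    return 12.0 * basket_items * (1 - discount_pct / 100)


rng = random.Random(7)
# Assumption: invented observed values for each model input.
observed = {
    "basket_items": [1, 1, 2, 2, 2, 3, 4, 6],
    "discount_pct": [0, 0, 0, 5, 10, 10, 20, 25],
}

# Draw each input independently from its observed distribution, then predict.
scenarios = []
for _ in range(5000):
    inputs = {name: rng.choice(values) for name, values in observed.items()}
    scenarios.append(predict_order_value(**inputs))

cuts = statistics.quantiles(scenarios, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"5th percentile:  {cuts[0]:.2f}")
print(f"median:          {statistics.median(scenarios):.2f}")
print(f"95th percentile: {cuts[-1]:.2f}")
```

Sampling each input independently ignores any correlation between inputs, so the resulting scenarios are best read as a rough picture of how the model responds, not as a forecast.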

Page 14

Cross Validation

Iteration  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5
1          Train   Train   Train   Train   Test
2          Train   Train   Train   Test    Train
3          Train   Train   Test    Train   Train
4          Train   Test    Train   Train   Train
5          Test    Train   Train   Train   Train

Assesses how well a predictive model generalises to unseen data.
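A 5-fold cross validation like the one in the table can be sketched with the standard library alone. The data is synthetic and the model is a one-variable least-squares line, both chosen only to keep the example short; `k_fold_indices` and `fit_line` are hypothetical helper names.

```python
import random
import statistics


def k_fold_indices(n_rows, k, rng):
    """Shuffle row indices and split them into k near-equal folds."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


rng = random.Random(0)
# Assumption: synthetic data, a noisy straight line.
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 2.0) for x in xs]

folds = k_fold_indices(len(xs), k=5, rng=rng)
fold_errors = []
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(xs)) if i not in held_out]
    # Fit on the four training folds only.
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Score on the fold the model has never seen.
    errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in test_fold]
    fold_errors.append(statistics.fmean(errors))

print("mean absolute error per fold:", [round(e, 2) for e in fold_errors])
print("average across folds:", round(statistics.fmean(fold_errors), 2))
```

The spread of the per-fold errors is as informative as their average: large differences between folds suggest the model's performance depends heavily on which rows it happened to see.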

Page 15

Resampling

Protects you from unsound inference

Acknowledges and mitigates effects of variance and noise in the data.

You already do this when you use confidence intervals. Quantify uncertainty more often.

Paints possible future scenarios

Leverages randomness and probability to give you glimpses into possible future outcomes.

Embrace randomness. It's your ally on the road to prescriptive analytics.

Page 16

#2 Faceted visualisation

Page 17

Segmented view, side-by-side

Outstanding tools for exploratory data analysis: Seaborn in Python and ggplot in R


Page 20

#3 Feature Engineering

What?!

Page 21

#3 Feature Engineering

#3 Calculated Metrics or Content Groupings?

Back on familiar territory.

Page 22

Feature Engineering Examples

Unique content views per user by content type

# politics content views, # business content views

# short/long-form content views

Distribution of content seen per user

% politics content views in total content viewed, adjusted for uncertainty of small samples

Result: fat user-level table of attributes and behaviour for analysis and modelling.
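A user-level table of this kind could be built along the lines below. The events are invented, `build_user_table` is a hypothetical helper, and the small-sample adjustment shown (shrinking each user's share towards the overall share) is only one possible approach; the slides do not say which adjustment was used.

```python
from collections import defaultdict

# Assumption: invented page-view events as (user_id, content_id, content_type).
events = [
    ("u1", "a1", "politics"), ("u1", "a1", "politics"), ("u1", "a2", "business"),
    ("u2", "a3", "politics"),
    ("u3", "a2", "business"), ("u3", "a4", "business"), ("u3", "a5", "politics"),
    ("u3", "a6", "business"),
]


def build_user_table(events, prior_strength=5.0):
    """One row of engineered features per user."""
    # Unique content views: count each (user, content) pair only once.
    unique_views = {(user, content): ctype for user, content, ctype in events}

    counts = defaultdict(lambda: defaultdict(int))
    for (user, _content), ctype in unique_views.items():
        counts[user][ctype] += 1

    overall_politics_share = (
        sum(1 for ctype in unique_views.values() if ctype == "politics")
        / len(unique_views)
    )

    table = {}
    for user, by_type in counts.items():
        total = sum(by_type.values())
        politics = by_type.get("politics", 0)
        table[user] = {
            "politics_views": politics,
            "business_views": by_type.get("business", 0),
            "total_views": total,
            "pct_politics_raw": politics / total,
            # Shrink towards the overall share; users with few views move most.
            "pct_politics_adjusted": (
                (politics + prior_strength * overall_politics_share)
                / (total + prior_strength)
            ),
        }
    return table


table = build_user_table(events)
for user, row in sorted(table.items()):
    print(user, row)
```

User `u2` has a single view, so the raw share of 100% politics is pulled strongly towards the overall share, while users with more views keep a value closer to their raw share.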

Page 23

Feature Engineering Examples

Infer trading calendar activities from data (for time series analysis)

# new marketing campaigns (first date with sessions)

# new brands launched (first date with pageviews)

# voucher codes at peak redemption rate (date with highest redemptions)

# AB tests started (date with first events tracked)

# VIPs active on each date, etc

Result: fat date-level table of leading KPIs and activities (model the ecosystem).
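As one example of inferring calendar activity from data, the sketch below derives "# new marketing campaigns" per date by treating the first date a campaign has sessions as its launch date. The session log and campaign names are invented, and `new_campaigns_per_date` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Assumption: invented session log as (session_date, campaign_name).
sessions = [
    (date(2017, 3, 1), "spring_sale"),
    (date(2017, 3, 1), "newsletter"),
    (date(2017, 3, 2), "spring_sale"),
    (date(2017, 3, 3), "retargeting"),
    (date(2017, 3, 3), "newsletter"),
]


def new_campaigns_per_date(sessions):
    """Count campaigns whose first observed session falls on each date."""
    first_seen = {}
    for session_date, campaign in sessions:
        if campaign not in first_seen or session_date < first_seen[campaign]:
            first_seen[campaign] = session_date

    launches = Counter(first_seen.values())
    all_dates = sorted({session_date for session_date, _ in sessions})
    # Keep dates with no launches as explicit zeros for time series use.
    return {d: launches.get(d, 0) for d in all_dates}


date_table = new_campaigns_per_date(sessions)
for d, n_new in date_table.items():
    print(d.isoformat(), n_new)
```

Campaigns already running before the start of the data window will appear as launches on the first date, so the earliest rows of such a table deserve some caution.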

Page 24

Feature Engineering

New ways of capturing underlying phenomena

Seasoned data scientists: Feature engineering often yields higher rewards than pushing the latest algorithms.

You probably already do this, most likely in Excel. It’s painful and limiting. Your analytical creativity needs better tools.

SQL: The single most valuable tool in our toolkit. We become self-sufficient analysts.

Page 25

Resources

Page 26

Inspired?

Learn Python

https://try.jupyter.org/ (start learning Python for data science right now, no setup!)

https://learncodethehardway.org/python/

Learn Machine Learning

http://machinelearningmastery.com/

Understand how algorithms work using spreadsheets. Top-down approach. No programming required.

Learn SQL

https://learncodethehardway.org/sql/