Data Science: Not Just for Big Data

Revolution Confidential. Data Science: Not just for big data! David Smith, Revolution Analytics, @revodavid. October 16, 2013.

Upload: revolution-analytics

Posted on 26-Jan-2015


DESCRIPTION

From the webinar presentation "Data Science: Not Just for Big Data", hosted by Kalido and presented by David Smith, Data Scientist at Revolution Analytics, and Gregory Piatetsky, Editor, KDnuggets. These are the slides for David Smith's portion of the presentation. Watch the full webinar at: http://www.kalido.com/data-science.htm

TRANSCRIPT

Page 1: Data Science: Not Just For Big Data

Revolution Confidential

Data Science: Not just for big data!

David Smith, Revolution Analytics, @revodavid

October 16, 2013

Page 2: Data Science: Not Just For Big Data

Big Data: the new oil?

Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0

Page 3: Data Science: Not Just For Big Data

Big Data is just raw material

Data distillation: extract quantities of interest, find complete cases, derive missing information.

Big Data pitfalls: data cleanliness and accuracy; observational bias.

Do the data I have represent the population I’m interested in?
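The two ideas on this slide, keeping complete cases and asking whether the data represent the population, can be sketched in a few lines. This is only an illustration: the records, the field names, and the assumed 50/50 population mix are all made up and do not come from the presentation.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"region": "north", "age": 34, "spend": 120.0},
    {"region": "north", "age": None, "spend": 80.0},
    {"region": "south", "age": 51, "spend": None},
    {"region": "north", "age": 27, "spend": 60.0},
    {"region": "south", "age": 45, "spend": 200.0},
]

def complete_cases(rows):
    """Keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def shares(rows, key):
    """Fraction of rows falling in each category of `key`."""
    counts = Counter(r[key] for r in rows)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

complete = complete_cases(records)

# Assumed population mix, used only to illustrate the comparison.
population_share = {"north": 0.5, "south": 0.5}
sample_share = shares(complete, "region")
for region, expected in population_share.items():
    observed = sample_share.get(region, 0.0)
    print(f"{region}: sample {observed:.2f} vs population {expected:.2f}")
```

A large gap between the sample shares and the population shares is a hint, not proof, that dropping incomplete rows (or the way the data were collected) has skewed the sample.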

Page 4: Data Science: Not Just For Big Data

Surveys & Experiments

Even with Big Data, the data you need isn’t always in the building!

… so ask (survey)! Survey design, stratified sampling.
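As one possible reading of "stratified sampling", the sketch below draws the same fraction from every stratum so that small groups still appear in the survey sample. The `stratified_sample` helper, the customer list, and the `tier` field are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        # Take at least one row per stratum so small groups are not lost.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical customer list: 100 "premium" and 900 "basic" accounts.
customers = [{"id": i, "tier": "premium" if i < 100 else "basic"}
             for i in range(1000)]
sample = stratified_sample(customers, "tier", fraction=0.1)
```

A simple random sample of the same size would get the tier mix right only on average; stratifying fixes the mix by construction.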

… or experiment! A/B testing, experimental design.
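A common way to read the result of an A/B test is a two-proportion z-test. The sketch below is a minimal version under the usual large-sample assumptions; the conversion counts are invented and the function name is not from the slides.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 200 of 4000 visitors convert on A, 260 of 4000 on B.
z, p = two_proportion_z_test(200, 4000, 260, 4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The test only says whether the difference is larger than chance would explain; whether the experiment was designed to isolate the change is a separate question.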

Page 5: Data Science: Not Just For Big Data

Data Exploration & Visualization

Limited by pixels: big data = a big black blob.

Extract signal from noise: aggregations, heat maps, smoothing, small multiples.
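Two of the techniques listed here, aggregation into heat-map cells and smoothing, can be sketched without any plotting library. The data below are simulated, and the cell width and window size are arbitrary choices for illustration.

```python
import random
from collections import Counter

def bin_counts(points, cell):
    """Aggregate (x, y) points into square cells of width `cell`."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)

def moving_average(values, window):
    """Smooth a series with a trailing window of up to `window` values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

rng = random.Random(42)
# Simulated data: 100,000 points would overplot into a blob on screen.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
grid = bin_counts(points, cell=0.5)  # cell counts to colour instead of raw points

# Simulated noisy trend, smoothed with a 20-point trailing window.
noisy = [0.1 * t + rng.gauss(0, 1) for t in range(200)]
smooth = moving_average(noisy, window=20)
```

The `grid` counts are what a heat map would colour, and `smooth` is what a trend line would draw; both shrink the data to something a screen can actually show.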

Page 6: Data Science: Not Just For Big Data

Statistical Modeling & Forecasting

You don’t always need big data: sampling can help with observational bias.

Model selection: feature extraction; confounding? interactions?

Model validation: overfitting.
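One simple way to check for overfitting is to hold out part of the data and compare errors. The sketch below uses simulated data and two deliberately contrasting models chosen for illustration: a least-squares line and a 1-nearest-neighbour model that memorises the training set. Neither is taken from the presentation.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + b * (x - mx)

def fit_nearest(xs, ys):
    """1-nearest-neighbour: memorises the training data."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]  # simulated linear signal plus noise
train_x, train_y = xs[:100], ys[:100]
test_x, test_y = xs[100:], ys[100:]

for name, fit in [("line", fit_line), ("1-NN", fit_nearest)]:
    model = fit(train_x, train_y)
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "held-out MSE:", round(mse(model, test_x, test_y), 2))
```

The memorising model reproduces its training points exactly, so its training error says nothing about how it will do on new data; only the held-out error does.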

Prediction: extrapolation, confidence.

http://xkcd.com/605/
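To make the link between extrapolation and confidence concrete, the sketch below computes the textbook standard error of a new observation under simple linear regression, s * sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx), at points inside and far outside the observed range. The data are simulated and the helper name is an assumption for illustration.

```python
import math
import random

def prediction_se(xs, ys, x0):
    """Standard error of a new observation at x0 under simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard deviation
    return s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)

rng = random.Random(7)
# Simulated observations, all with x between 0 and 10.
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

for x0 in (5, 10, 30, 100):
    print(f"x0 = {x0:>3}: prediction standard error = {prediction_se(xs, ys, x0):.2f}")
```

The error grows with the distance of `x0` from the mean of the observed `xs`, and this formula still assumes the straight-line model holds out there, which is exactly what extrapolation cannot verify.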

Page 7: Data Science: Not Just For Big Data

Summary

Big Data is great, but think of it as the “raw materials” for data science. After refining, “big” isn’t always so “Big”.

Use statistical insight to avoid pitfalls. Inferences: observational bias / sampling bias. Predictions: confounding / overfitting. Think about variances and means (risk!).

Some data scientists may miss these issues. Look for statistical expertise.

Further reading: ComputerWorld: 12 predictive analytics screw-ups