
No-Bullshit Data Science

Szilárd Pafka, PhD
Chief Scientist, Epoch

Domino Data Science Popup
San Francisco, Feb 2017

Disclaimer:

I am not representing my employer (Epoch) in this talk

I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk

Example #1

[Benchmark charts: aggregation of 100M rows into 1M groups, and join of 100M rows x 1M rows; y-axis: time [s]; markers show the largest data size analyzed by each tool]
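The shape of that benchmark (group-by aggregation and join timings) can be sketched in Python with pandas. This is a hypothetical, scaled-down re-creation (1M rows, 100K groups instead of 100M rows, 1M groups), not the original benchmark code:

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows, n_groups = 1_000_000, 100_000  # scaled down from 100M rows / 1M groups

df = pd.DataFrame({
    "key": rng.integers(0, n_groups, n_rows),
    "value": rng.random(n_rows),
})
lookup = pd.DataFrame({"key": np.arange(n_groups), "weight": rng.random(n_groups)})

t0 = time.perf_counter()
agg = df.groupby("key", sort=False)["value"].mean()  # aggregation benchmark
t_agg = time.perf_counter() - t0

t0 = time.perf_counter()
joined = df.merge(lookup, on="key")                  # join benchmark
t_join = time.perf_counter() - t0

print(f"aggregate: {t_agg:.2f}s  join: {t_join:.2f}s")
```

Timing the same two operations across tools (data.table, dplyr, pandas, Spark, databases, etc.) at increasing data sizes produces the kind of comparison charted above.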

[Chart: training time [s] vs data size [M]; "10x" annotation]

Gradient Boosting Machines

[Chart: accuracy vs data size]

- linear tops off
- more data & better algo
- random forest on 1% of data beats linear on all data
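The comparison behind that slide sequence can be sketched with scikit-learn: fit a linear model on all the training data and a random forest on 1% of it, then compare test AUC. This is a hypothetical illustration on synthetic data, not the talk's actual experiment, and which model wins depends on how nonlinear the data is; the point here is the comparison harness, not the outcome:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic binary classification data with some label noise
X, y = make_classification(n_samples=100_000, n_features=20, n_informative=10,
                           flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# linear model trained on ALL of the training data
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_lin = roc_auc_score(y_te, lin.predict_proba(X_te)[:, 1])

# random forest trained on 1% of the training data
n_small = len(X_tr) // 100
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr[:n_small], y_tr[:n_small])
auc_rf = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

print(f"linear (100% of data) AUC: {auc_lin:.3f}")
print(f"random forest (1% of data) AUC: {auc_rf:.3f}")
```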

Example #2

http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

http://lowrank.net/nikos/pubs/empirical.pdf


- R packages
- Python scikit-learn
- Vowpal Wabbit
- H2O
- xgboost
- Spark MLlib
- a few others

EC2

n = 10K, 100K, 1M, 10M, 100M

Measured: training time, RAM usage, AUC, CPU % by core
(plus: read data, pre-process, score test data)
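A single cell of such a benchmark grid (one tool, one data size) boils down to timing the fit and score steps and computing AUC on held-out data. A minimal sketch, assuming a scikit-learn model as a stand-in for whichever tool is being measured (RAM and per-core CPU would need OS-level tooling and are omitted here):

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# toy stand-in for one (tool, data size) cell of the benchmark grid
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:8000], y[:8000], X[8000:], y[8000:]

t0 = time.perf_counter()
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # training time
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
scores = model.predict_proba(X_te)[:, 1]                   # scoring time
score_time = time.perf_counter() - t0

auc = roc_auc_score(y_te, scores)                          # accuracy metric
print(f"train: {train_time:.3f}s  score: {score_time:.3f}s  AUC: {auc:.3f}")
```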


Best linear AUC: 71.1

learn_rate = 0.1, max_depth = 6, n_trees = 300
learn_rate = 0.01, max_depth = 16, n_trees = 1000
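Those two hyperparameter settings can be expressed in scikit-learn's GBM naming as a sketch; this is a hypothetical mapping on tiny synthetic data (fitting only the cheaper setting to keep it fast), not the benchmark's actual runs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# the two GBM settings from the slide, in scikit-learn parameter names
configs = [
    dict(learning_rate=0.1, max_depth=6, n_estimators=300),
    dict(learning_rate=0.01, max_depth=16, n_estimators=1000),
]

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

# fit only the cheaper setting here; the deeper/slower one works the same way
gbm = GradientBoostingClassifier(**configs[0], random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
print(f"GBM {configs[0]} AUC: {auc:.3f}")
```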

...

Summary
