
No-Bullshit Data Science

Szilárd Pafka, PhD
Chief Scientist, Epoch

R/Finance Conference
Chicago, May 2017

Disclaimer:

I am not representing my employer (Epoch) in this talk

I can neither confirm nor deny whether Epoch is using any of the methods, tools, results etc. mentioned in this talk

Example #1

[Benchmark charts: aggregation of 100M rows into 1M groups, and join of 100M rows x 1M rows; y-axis: time [s]; "(largest data analyzed)" marks how far each tool scaled]

[Chart: training time [s] vs. data size [M]; slide annotation: 10x]
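A minimal sketch of the two operations benchmarked above, using data.table; sizes are scaled down from the slide's 100M rows / 1M groups, and the data here is synthetic (an assumption), so only the shape of the benchmark carries over:

    library(data.table)

    n <- 1e7; m <- 1e5                 # scaled down from 100M rows / 1M groups
    d <- data.table(id = sample(m, n, replace = TRUE), x = runif(n))

    # aggregation: one summary value per group
    system.time(d[, .(mean_x = mean(x)), by = id])

    # join: a lookup table with one row per group, joined onto the big table
    lkp <- data.table(id = 1:m, y = runif(m))
    system.time(d[lkp, on = "id"])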

Gradient Boosting Machines

[Chart: accuracy vs. data size]

- linear tops off
- more data & better algo: random forest on 1% of data beats linear on all data
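A toy illustration of the last point above, on synthetic XOR-style data (an assumption, not the talk's dataset): the linear model tops off no matter how much data it sees, while a random forest trained on a 1% sample captures the nonlinearity:

    library(randomForest)

    n  <- 1e5
    x1 <- runif(n); x2 <- runif(n)
    y  <- factor(as.integer(xor(x1 > 0.5, x2 > 0.5)))   # nonlinear (XOR) boundary
    d  <- data.frame(x1, x2, y)
    test <- d[1:20000, ]; train <- d[-(1:20000), ]

    fit_lin <- glm(y ~ x1 + x2, binomial, data = train)          # linear, all data
    smp     <- train[sample(nrow(train), 0.01 * nrow(train)), ]  # 1% sample
    fit_rf  <- randomForest(y ~ x1 + x2, data = smp)

    mean((predict(fit_lin, test, type = "response") > 0.5) == (test$y == "1"))  # ~0.5
    mean(predict(fit_rf, test) == test$y)                                       # much higher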

Summary / tips for analyzing “big” data:

- Get lots of RAM (physical/cloud)

- Use R/Python and high-performance packages (e.g. data.table, xgboost)

- Do data reduction in a database (analytical db / big data system)

- (Only) distribute embarrassingly parallel tasks (e.g. hyperparameter search for machine learning; see the sketch after this list)

- Let engineers (store and) ETL the data (“scalable”)

- Use statistics / domain knowledge / thinking

- Use “big data tools” only if the above tips are not enough
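As an example of the "embarrassingly parallel" tip, a minimal sketch of a parallel hyperparameter search with xgboost and parallel::mclapply; the grid values are illustrative assumptions, and mc.cores must be 1 on Windows:

    library(parallel)
    library(xgboost)

    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    grid <- expand.grid(max_depth = c(2, 4, 6), eta = c(0.3, 0.1))

    # each grid point is independent of the others -> trivially parallel
    auc <- mclapply(seq_len(nrow(grid)), function(i) {
      cv <- xgb.cv(params = list(objective = "binary:logistic", eval_metric = "auc",
                                 max_depth = grid$max_depth[i], eta = grid$eta[i]),
                   data = dtrain, nrounds = 50, nfold = 5, verbose = 0)
      tail(cv$evaluation_log$test_auc_mean, 1)
    }, mc.cores = 2)

    grid$auc <- unlist(auc)
    grid[order(-grid$auc), ]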

Example #2

"I usually use other people’s code [...] I can find open source code for what I want to do, and my time is much better spent doing research and feature engineering" -- Owen Zhang

http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/

binary classification, 10M records; numeric & categorical features, non-sparse

http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

http://lowrank.net/nikos/pubs/empirical.pdf


- R packages 30%
- Python scikit-learn 40%
- Vowpal Wabbit 8%
- H2O 10%
- xgboost 8%
- Spark MLlib 6%
- a few others

EC2

n = 10K, 100K, 1M, 10M, 100M

Measured: training time, RAM usage, AUC, CPU % by core; also: read data, pre-process, score test data

[Benchmark result charts; slide annotation: 10x]

http://datascience.la/benchmarking-random-forest-implementations/#comment-53599
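A minimal sketch of one cell of this benchmark: train a model on n rows, record training time and test AUC. The data here is synthetic numeric stand-in data (an assumption; the actual benchmark data had numeric & categorical features), and xgboost stands in for any one of the tools above:

    library(xgboost)
    library(ROCR)

    n <- 1e5                                  # the talk scales n from 10K to 100M
    x <- matrix(runif(n * 10), n)
    y <- as.integer(rowSums(x[, 1:3]) + rnorm(n, sd = 0.3) > 1.5)
    tr <- sample(n, 0.9 * n)

    elapsed <- system.time(
      md <- xgboost(data = x[tr, ], label = y[tr], nrounds = 100,
                    objective = "binary:logistic", verbose = 0)
    )[["elapsed"]]

    phat <- predict(md, x[-tr, ])
    auc  <- performance(prediction(phat, y[-tr]), "auc")@y.values[[1]]
    c(time_s = elapsed, auc = auc)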

Best linear model: 71.1 (AUC)

learn_rate = 0.1, max_depth = 6, n_trees = 300
learn_rate = 0.01, max_depth = 16, n_trees = 1000
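These parameter names look like H2O GBM settings (an assumption based on the naming; n_trees corresponds to h2o.gbm's ntrees argument). A minimal sketch of the two configurations on small synthetic stand-in data, not the benchmark's 10M-row set:

    library(h2o)
    h2o.init()

    # synthetic stand-in data with a numeric and a categorical feature (an assumption)
    n <- 1e4
    d <- data.frame(x1 = runif(n),
                    x2 = factor(sample(letters[1:5], n, replace = TRUE)))
    d$y <- factor(ifelse(d$x1 + 0.3 * (d$x2 %in% c("a", "b")) +
                         rnorm(n, sd = 0.3) > 0.8, "Y", "N"))
    parts <- h2o.splitFrame(as.h2o(d), ratios = 0.8, seed = 1)

    # the two hyperparameter settings from the slide
    m1 <- h2o.gbm(x = c("x1", "x2"), y = "y", training_frame = parts[[1]],
                  learn_rate = 0.1,  max_depth = 6,  ntrees = 300)
    m2 <- h2o.gbm(x = c("x1", "x2"), y = "y", training_frame = parts[[1]],
                  learn_rate = 0.01, max_depth = 16, ntrees = 1000)

    h2o.auc(h2o.performance(m1, newdata = parts[[2]]))
    h2o.auc(h2o.performance(m2, newdata = parts[[2]]))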

...

Summary