XGBOOST: A SCALABLE TREE BOOSTING SYSTEM (T. CHEN, C. GUESTRIN, 2016) NATALLIE BAIKEVICH HARDWARE ACCELERATION FOR DATA PROCESSING SEMINAR ETH ZÜRICH


Page 1:

XGBOOST: A SCALABLE TREE BOOSTING SYSTEM (T. CHEN, C. GUESTRIN, 2016)

NATALLIE BAIKEVICH

HARDWARE ACCELERATION FOR DATA PROCESSING SEMINAR

ETH ZÜRICH

Page 2:

MOTIVATION

✓ Effective statistical models
✓ Scalable system
✓ Successful real-world applications

XGBoost = eXtreme Gradient Boosting

Page 3:

BIAS-VARIANCE TRADEOFF

Random Forest → variance ↓ (independently grown trees combined by voting)
Boosting → bias ↓ (trees added sequentially, each correcting the ensemble so far)

Page 4:

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001

Page 5:

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package

Page 6:

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package

1st Kaggle success: Higgs Boson Challenge
17/29 winning solutions in 2015

Page 7:

WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION? (Master thesis, D. Nielsen, 2016)

Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

Page 8:

TREE ENSEMBLE

Page 9:

REGULARIZED LEARNING OBJECTIVE

Prediction of the ensemble of K trees:

    \hat{y}_i = \sum_{k=1}^{K} f_k(x_i)

Objective = loss + regularization:

    L = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)

    \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2    (T = # of leaves, w = leaf weights)

Source: http://xgboost.readthedocs.io/en/latest/model.html
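The objective above can be evaluated numerically with a minimal sketch. Here each tree is abstracted as its vector of leaf weights and the loss is squared error; the function names are illustrative, not the library's API.

```python
# Minimal sketch of the regularized objective
#   L = sum_i l(y_hat_i, y_i) + sum_k Omega(f_k)
# with squared-error loss. Each tree f_k is reduced to its leaf-weight vector w;
# Omega(f) = gamma * T + 0.5 * lambda * ||w||^2, where T is the number of leaves.

def omega(leaf_weights, gamma=1.0, lam=1.0):
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, y_pred, trees_leaf_weights, gamma=1.0, lam=1.0):
    loss = sum((yh - y) ** 2 for yh, y in zip(y_pred, y_true))
    reg = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return loss + reg

# toy example: two trees with 2 and 3 leaves
trees = [[0.5, -0.5], [0.1, 0.2, -0.3]]
print(objective([1.0, 0.0], [0.9, 0.2], trees))
```

Note how the penalty grows with both the number of leaves (γT) and the magnitude of the leaf weights (½λ‖w‖²), which is what discourages overly complex trees.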

Page 10:

SCORE CALCULATION

Per-instance statistics: 1st-order gradient g_i and 2nd-order gradient h_i

Statistics aggregated for each leaf → leaf score
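For reference, the closed-form leaf score and split gain from the paper's second-order approximation, with G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i summed over the instances falling into leaf j:

```latex
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\tilde{L}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T
```

The gain of a candidate split into left/right instance sets L, R follows directly:

```latex
\mathrm{Gain} = \frac{1}{2}\left[ \frac{G_L^{2}}{H_L + \lambda}
  + \frac{G_R^{2}}{H_R + \lambda}
  - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] - \gamma
```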

Page 11:

ALGORITHM FEATURES

✓ Regularized objective
✓ Shrinkage and column subsampling
✓ Split finding: exact & approximate, global & local
✓ Weighted quantile sketch
✓ Sparsity-awareness
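The exact split-finding variant listed above can be sketched in a few lines: instances are scanned in sorted feature order while gradient statistics are accumulated, and each candidate threshold is scored with the gain formula. This is an illustrative sketch, not the library's implementation.

```python
# Sketch of exact greedy split finding on a single feature. A linear scan over
# instances sorted by feature value accumulates G_L, H_L and scores each split:
#   Gain = 1/2 [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
#                - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma

def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    order = sorted(range(len(values)), key=lambda i: values[i])
    G, H = sum(grads), sum(hess)
    GL = HL = 0.0
    best_gain, best_thresh = 0.0, None
    for pos, i in enumerate(order[:-1]):
        GL += grads[i]
        HL += hess[i]
        GR, HR = G - GL, H - HL
        # can only split between distinct feature values
        if values[i] == values[order[pos + 1]]:
            continue
        gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)
                      - G * G / (H + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_thresh = (values[i] + values[order[pos + 1]]) / 2
    return best_gain, best_thresh

# toy: squared loss with initial prediction 0 gives g_i = -y_i, h_i = 1
ys = [1.0, 1.0, -1.0, -1.0]
xs = [0.0, 1.0, 2.0, 3.0]
print(best_split(xs, [-y for y in ys], [1.0] * 4))
```

In the toy run the best threshold falls between x = 1.0 and x = 2.0, exactly where the labels flip sign. The approximate variant scores only quantile-based candidate thresholds instead of every distinct value.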

Page 12:

SYSTEM DESIGN: BLOCK STRUCTURE

Time complexity of split finding:

    O(K d ‖x‖₀ log n)  →  with block structure:  O(K d ‖x‖₀ + ‖x‖₀ log B)

    K = # trees, d = max depth, ‖x‖₀ = # non-missing entries, B = max # rows per block

Sorted structure → linear scan

Blocks can be
✓ Distributed across machines
✓ Stored on disk in the out-of-core setting

Page 13:

SYSTEM DESIGN: CACHE-AWARE ACCESS

Problem: non-contiguous memory access to gradient statistics

Improved split finding:
✓ Allocate an internal buffer
✓ Prefetch gradient statistics

The benefit is most visible on the larger datasets.

Page 14:

SYSTEM DESIGN: BLOCK STRUCTURE

Block compression by columns (CSC): trade decompression cost against disk-reading cost

Block sharding: spread data across multiple disks

Block size tradeoff:
- Too large → cache misses
- Too small → inefficient parallelization

Prefetch in an independent thread
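The column-block layout the slides refer to can be sketched as follows: per feature, store (value, row index) pairs sorted by value, with missing entries skipped, so split finding is a single linear scan per column. This is an illustrative simplification, not XGBoost's actual on-disk format.

```python
# Illustrative sketch of the sorted column-block ("CSC-like") layout: for each
# feature, store (feature_value, row_index) pairs sorted by value, skipping
# missing entries. Split finding then only needs a linear scan per column.

def build_blocks(rows, missing=None):
    """rows: list of per-instance feature lists; returns one sorted
    (value, row_index) list per feature, omitting missing entries."""
    n_features = len(rows[0])
    blocks = []
    for j in range(n_features):
        col = [(row[j], i) for i, row in enumerate(rows) if row[j] is not missing]
        col.sort()
        blocks.append(col)
    return blocks

data = [
    [3.0, None],
    [1.0, 5.0],
    [2.0, None],
]
blocks = build_blocks(data)
print(blocks[0])  # feature 0 sorted by value: [(1.0, 1), (2.0, 2), (3.0, 0)]
print(blocks[1])  # missing entries skipped: [(5.0, 1)]
```

Skipping missing entries is also where the sparsity-awareness comes from: scan cost is proportional to ‖x‖₀, the number of non-missing entries, not to n × m.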

Page 15:

EVALUATION

Single machine: AWS c3.8xlarge — 32 virtual cores, 2×320 GB SSD, 60 GB RAM

Distributed: 32 m3.2xlarge machines, each with 8 virtual cores, 2×80 GB SSD, 30 GB RAM

Page 16:

DATASETS

Dataset       n      m     Task
Allstate      10M    4227  Insurance claim classification
Higgs Boson   10M    28    Event classification
Yahoo LTRC    473K   700   Learning to rank
Criteo        1.7B   67    Click-through rate prediction

Page 17:

WHAT’S NEXT?

Model extensions: DART (+ dropouts), LinXGBoost
Parallel processing: GPU, FPGA
Tuning: hyperparameter optimization
More applications

XGBoost recap: scalability, weighted quantiles, sparsity-awareness, cache-awareness, data compression

Page 18:

QUICK OVERVIEW

+ Nicely structured paper, easily comprehensible
+ Real framework, widely used for many ML problems
+ Combines improvements on both the model and implementation sides to achieve scalability
+ Reference point for further research in tree boosting

- The concepts themselves are not that novel
- Does not explain why some models are excluded from certain experiments
- Is the compression efficient for dense datasets?
- What if there are many more columns than rows (e.g., medical data)?

Page 19:

THANK YOU!