Deep Neural Decision Forests (ICCV 2015)




Peter Kontschieder 1, Samuel Rota Bulo 1,3, Antonio Criminisi 1, Madalina Fiterau 2

MOTIVATION

Standard decision forests cannot build new feature representations; they rely on predefined ones. The proposed solution is to let decision forests build ad-hoc feature representations through a deep network, which is jointly trained with the forest.

MODEL

A deep network generates the node feature maps: the prefixed feature map of a standard forest is replaced by a parametrized feature map whose parameters are those of the deep network. Raw data is first mapped by the network to a feature representation, which then feeds the decision forest. Each decision node applies a probabilistic decision function, each leaf stores a class distribution, and the routing function gives the probability of reaching a given leaf. The decision tree prediction combines the leaf class distributions with the routing probabilities, and the decision forest prediction averages the predictions of the trees in the forest.
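In formulas, these pieces fit together as sketched below. The notation (f_n, d_n, \mu_\ell, \pi_{\ell y}, \Theta) is shorthand introduced here for the quantities named above, not copied from the poster:

    d_n(x;\Theta) = \sigma\big(f_n(x;\Theta)\big)   % probabilistic decision function at node n
    \mu_\ell(x \mid \Theta) = \prod_n d_n(x;\Theta)^{\mathbf{1}[\ell \swarrow n]} \big(1 - d_n(x;\Theta)\big)^{\mathbf{1}[n \searrow \ell]}   % routing function: probability of reaching leaf \ell
    \mathbb{P}_T[y \mid x, \Theta, \pi] = \sum_\ell \pi_{\ell y} \, \mu_\ell(x \mid \Theta)   % decision tree prediction
    \mathbb{P}_F[y \mid x] = \frac{1}{k} \sum_{t=1}^{k} \mathbb{P}_{T_t}[y \mid x, \Theta, \pi_t]   % decision forest prediction over the k trees

Here f_n(x;\Theta) is the response of the parametrized feature map (the deep network with parameters \Theta) for decision node n, \sigma is the sigmoid, \pi_{\ell y} is the class distribution stored in leaf \ell, and \mathbf{1}[\ell \swarrow n] (resp. \mathbf{1}[n \searrow \ell]) indicates that leaf \ell lies in the left (resp. right) subtree of node n.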

TRAINING

Training is empirical risk minimization of the log-loss over the training set. Two groups of parameters are learned: the decision nodes, i.e. the parameters of the deep network, and the leaf nodes, i.e. the class distributions stored in the leaves.

Decision nodes update: stochastic gradient descent over mini-batches, with back-propagation through the deep network. The gradient at each decision node is obtained by a forward pass over the tree followed by a backward pass over the tree, using the node responses computed in the deep network's forward pass.
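A minimal sketch of this step for a single tree is given below, in PyTorch. The tree depth, the stand-in network and every variable name are illustrative assumptions, not the configuration used on the poster; autograd supplies the back-propagation through the deep network, so an SGD step on the resulting loss updates the decision-node parameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

DEPTH = 3                       # illustrative tree depth (assumption)
N_SPLIT = 2 ** DEPTH - 1        # number of decision (split) nodes
N_LEAF = 2 ** DEPTH             # number of leaves
N_CLASS = 10

# Deep network producing one response f_n(x; Theta) per decision node.
# A tiny MLP stands in for the real architecture used on the poster.
split_net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, N_SPLIT))

# Leaf class distributions pi, held fixed during the decision-node update.
pi = torch.full((N_LEAF, N_CLASS), 1.0 / N_CLASS)

def routing_probs(d):
    """Forward pass over the tree: probability mu of reaching each leaf,
    given the probabilistic decisions d = sigmoid(f), shape (batch, N_SPLIT)."""
    mu = torch.ones(d.shape[0], 1)
    for level in range(DEPTH):
        first = 2 ** level - 1                    # index of the first node at this level
        d_level = d[:, first:first + 2 ** level]  # decisions of that level
        # each node's incoming mass splits into (left, right) = (d, 1 - d)
        mu = torch.stack((mu * d_level, mu * (1 - d_level)), dim=2).reshape(d.shape[0], -1)
    return mu                                     # (batch, N_LEAF)

x = torch.randn(32, 784)                          # dummy mini-batch of raw data
y = torch.randint(0, N_CLASS, (32,))              # dummy labels

d = torch.sigmoid(split_net(x))                   # probabilistic decision functions
mu = routing_probs(d)                             # routing function, per leaf
p = mu @ pi                                       # decision tree prediction
loss = F.nll_loss(torch.log(p + 1e-12), y)        # log-loss on the mini-batch
loss.backward()                                   # back-propagation through the deep network
# an optimizer step on split_net.parameters() would complete the SGD update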

Leaf nodes update: with the decision nodes fixed, learning the leaf class distributions is a convex optimization problem. It admits a step-size-free update rule in which a normalization factor keeps each leaf's class distribution properly normalized, and the iteration converges to the global optimum. In this way the deep network and the forest are trained jointly.
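A sketch of one such step-size-free update for a single tree is shown below, in NumPy. The multiplicative form is an assumption consistent with the labels above (convex subproblem, normalization factor, no step size), not a formula quoted from the poster; mu holds the routing probabilities of the mini-batch and pi the current leaf class distributions.

import numpy as np

def update_leaves(pi, mu, y, eps=1e-12):
    """One step-size-free leaf update for a single tree.

    pi : (n_leaf, n_class) current leaf class distributions
    mu : (batch, n_leaf)   routing probabilities of the mini-batch
    y  : (batch,)          integer class labels
    Returns updated leaf distributions; each row sums to 1."""
    p = mu @ pi                                   # tree prediction for every class, (batch, n_class)
    new_pi = np.zeros_like(pi)
    for i, label in enumerate(y):
        # credit each leaf in proportion to its contribution to the true-class prediction
        new_pi[:, label] += pi[:, label] * mu[i] / (p[i, label] + eps)
    z = new_pi.sum(axis=1, keepdims=True) + eps   # normalization factor, one per leaf
    return new_pi / z

# illustrative usage with random stand-in quantities
rng = np.random.default_rng(0)
n_leaf, n_class, batch = 8, 10, 32
pi = np.full((n_leaf, n_class), 1.0 / n_class)
mu = rng.dirichlet(np.ones(n_leaf), size=batch)   # rows sum to 1, like routing probabilities
y = rng.integers(0, n_class, size=batch)
pi = update_leaves(pi, mu, y)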

The two updates are combined by alternating optimization: leaf-node updates are interleaved with one epoch of decision-node updates, alternating between the two. The leaf-node updates are performed per mini-batch and are independent per tree; for every mini-batch, the tree whose leaves are updated is selected at random. A sketch of this schedule follows.
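The skeleton below shows how that schedule can be wired together; the callables and argument names (sgd_step, update_leaves, batches, trees) are placeholders chosen here, and the no-op usage at the end only demonstrates the control flow.

import random

def train(n_epochs, batches, trees, sgd_step, update_leaves):
    """Alternating optimization: per mini-batch, one leaf update on a randomly
    chosen tree is interleaved with one SGD step on the decision nodes, so each
    epoch of decision-node updates alternates with leaf-node updates.

    batches       : list of mini-batches, re-iterated every epoch
    trees         : list with one leaf-distribution array per tree
    sgd_step      : callable(batch), one decision-node / deep-network SGD step
    update_leaves : callable(leaves, batch) -> updated leaves for one tree"""
    for epoch in range(n_epochs):
        for batch in batches:
            t = random.randrange(len(trees))           # random tree for the leaf update
            trees[t] = update_leaves(trees[t], batch)  # independent, per-tree leaf update
            sgd_step(batch)                            # back-prop through the deep network

# no-op wiring example
train(n_epochs=2, batches=list(range(5)), trees=[None] * 10,
      sgd_step=lambda batch: None,
      update_leaves=lambda leaves, batch: leaves)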

[Motivation figure: the standard pipeline ("train decision forest" on a predefined feature representation) versus the proposed solution, in which the feature representation is built ad-hoc by a deep network trained jointly with the forest.]

EXPERIMENTS

Benchmarks: machine learning datasets, MNIST and ImageNet.

[Figure: "Split responses", histograms of the split node outputs after 100 and after 1000 training epochs.]
[Figure: "Average Leaf Entropy during Training", average leaf entropy (bits) versus number of training epochs.]
[Figure: "ImageNet Top5-error", Top5-error (%) versus number of training epochs; one panel zooms in on epochs 500 to 1000, the other shows dNDF0, dNDF1, dNDF2 and dNDF.NET on validation together with dNDF.NET on training.]
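The leaf-entropy curve tracks the average entropy, in bits, of the class distributions stored in the leaves (the natural reading of the axis label). A minimal way to compute it, with illustrative variable names:

import numpy as np

def average_leaf_entropy(pi, eps=1e-12):
    """Mean entropy, in bits, of the leaf class distributions.
    pi : (n_leaf, n_class) array, each row a class distribution."""
    h = -np.sum(pi * np.log2(pi + eps), axis=1)   # entropy per leaf, in bits
    return h.mean()

# uniform 10-class leaves have entropy log2(10), about 3.32 bits
print(average_leaf_entropy(np.full((16, 10), 0.1)))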

MNIST

A deep Neural Decision Forest (dNDF) built on LeNet-5 is compared with LeNet-5 itself and with a shallow Neural Decision Forest (sNDF), which forgoes the deep network. Test error: dNDF 0.7% versus LeNet-5 0.9%. During training the split node outputs converge to deterministic routing, as the split-response histograms above show.

IMAGENET

dNDF.NET, the combination of three deep Neural Decision Forests (dNDF.NET = dNDF0 + dNDF1 + dNDF2), is compared with GoogLeNet. Top5-error on the validation set:

model       # models   # crops   Top5-err.
GoogLeNet       1          1      10.07%
GoogLeNet       1         10       9.15%
GoogLeNet       1        144       7.89%
GoogLeNet       7          1       8.09%
GoogLeNet       7         10       7.62%
GoogLeNet       7        144       6.67%
dNDF.NET        1          1       7.84%
dNDF.NET        1         10       7.08%
dNDF.NET        7          1       6.38%