
Artificial Neural Networks: Deep or Broad? An Empirical Study

Nian Liu and Nayyar A. Zaidi

AI 2016: The 29th Australasian Joint Conference on Artificial Intelligence


Introduction

- Two significant trends in machine learning over the last 10 years:
  - Ever-growing quantities of training data – the advent of Big Data
  - Success of Deep Learning on many problems
- Lessons learned:
  - For big data we need low-bias models
  - Feature engineering: the main reason behind the success of deep learning
- Big Learning: feature engineering (low bias), minimal passes over the data, minimal tuning parameters, dynamic models
- Are feature engineering and low-bias models two new phenomena?


The Need for Low-Bias

- Much of machine learning has been conducted in the context of small datasets:
  - Variance dominates most of the error
  - Low-bias models lead to over-fitting
  - Hence the strong emphasis on regularization
- Big datasets, in contrast, require low-bias models


Low-Bias Models

- Bayesian Networks
- Higher-order Logistic Regression
- Generalized Linear Models
- Artificial Neural Networks
  - Deep Learning
- Random Forests
  - Other ensemble-based and tree models
- Support Vector Machines
  - Kernel Engineering ≡ Feature Engineering


Low-Bias Models

- Bayesian Networks
  - Zaidi, N. A., Webb, G. I., Carman, M. J., Petitjean, F., Buntine, W., Hynes, M. and De Sterck, H. – Efficient Parameter Learning of Bayesian Network Classifiers, to appear in Machine Learning (2017)
  - Martinez, A. M., Chen, S., Webb, G. I. and Zaidi, N. A. – Scalable Learning of Bayesian Network Classifiers, Journal of Machine Learning Research, vol. 17, pp. 1-35 (2016)
- Higher-order Logistic Regression
  - Zaidi, N. A., Webb, G. I., Carman, M. J., Petitjean, F. and Cerquides, J. – ALR^n: Accelerated Higher-order Logistic Regression, Machine Learning, vol. 104, pp. 151-194 (2016)
- Artificial Neural Networks
  - Why broad? One-hidden-layer ANNs are universal function approximators
  - Why deep? Constant-depth circuits are less powerful than deep circuits, and deep networks need fewer parameters
  - Why not deep?
    - Architecture selection
    - Vanishing gradients
    - Solution: greedy layer-wise training


Low-Bias Models

I Bayesian Networks

PBNk (y |x) =P(y)

∏ni=1 P(xi |pa(xi ), y)∑C

c=1 P(c)∏n

i=1 P(xi |pa(xi ), c).

I Higher-order Logistic Regression

PLRn (y |x) =exp(βy +

∑α∈(An ) βy ,α,xα

)∑c∈ΩY

exp(βc +

∑α∗∈(An ) βc,α∗,xα∗

) .I Artificial Neural Networks

PANNb,d (y |x) =f1[∑nH

j=1 βk,0 + wk,j f0(βj,0 + βT

j x)]

Z.
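
As a concrete reading of the last formula, here is a minimal numpy sketch of the one-hidden-layer class posterior; the parameter shapes and the tanh/softmax choices for f0 and f1 are assumptions, not the authors' code.

```python
# Illustrative sketch: P_ANN(y|x) for a one-hidden-layer network.
import numpy as np

def ann_posterior(x, W_in, b_in, W_out, b_out):
    """x: (n_features,); W_in: (n_hidden, n_features); b_in: (n_hidden,)
    W_out: (n_classes, n_hidden); b_out: (n_classes,). Returns P(y|x)."""
    h = np.tanh(W_in @ x + b_in)      # f0: hidden-layer activation
    scores = W_out @ h + b_out        # one score per class
    scores -= scores.max()            # numerical stability
    p = np.exp(scores)
    return p / p.sum()                # f1 with normaliser Z: softmax

# usage with random parameters: 5 features, 2 hidden nodes (NN2), 3 classes
rng = np.random.default_rng(0)
x = rng.normal(size=5)
print(ann_posterior(x, rng.normal(size=(2, 5)), np.zeros(2),
                    rng.normal(size=(3, 2)), np.zeros(3)))
```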


Observations and Motivations

Observations

- We know that:
  - a higher k leads to a lower-bias BN_k
  - a higher n leads to a lower-bias LR^n
- We do not know:
  - whether a higher b or d leads to a lower-bias ANN_{b,d}
  - whether b should be preferred over d, or vice versa
  - what the effect on convergence is

Motivations

- A comparative analysis of low-bias models warrants further investigation
- Efficient, low-bias and dynamic models are the key to solving the big-data enigma


Experimental Design: Broad vs. Deep ANN

- 73 datasets from the UCI repository
- 2-fold cross-validation
- 0-1 Loss, RMSE, Bias, Variance and convergence performance
- Bias and Variance definitions of Kohavi and Wolpert
- Win-Draw-Loss (W-D-L) results are reported, assessed with a two-tail binomial sign test (a sketch follows this list)
- Separate analysis on Big datasets:
  - 12 datasets with more than 10,000 instances
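
A minimal sketch of the W-D-L tally and the two-tail binomial sign test used in the tables that follow; the helper name is an assumption, and SciPy's binomtest is used purely for illustration of the p-value computation.

```python
# Win-Draw-Loss comparison of two classifiers across datasets, with a
# two-tail binomial sign test on wins vs. losses (draws are excluded;
# assumes at least one non-draw).
from scipy.stats import binomtest  # SciPy >= 1.7

def win_draw_loss(errors_a, errors_b):
    """errors_a[i], errors_b[i]: error of classifier A resp. B on dataset i."""
    wins   = sum(a < b for a, b in zip(errors_a, errors_b))
    losses = sum(a > b for a, b in zip(errors_a, errors_b))
    draws  = len(errors_a) - wins - losses
    p = binomtest(wins, wins + losses, p=0.5, alternative='two-sided').pvalue
    return wins, draws, losses, p
```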


Experimental Design: Broad vs. Deep ANN

- Deep models are denoted NN2, NN22, NN222, NN2222 and NN22222, representing 1, 2, 3, 4 and 5 hidden layers with two nodes each
- Broad models are denoted NN2, NN4, NN6, NN8 and NN10, representing 1 hidden layer with 2, 4, 6, 8 and 10 nodes
- For the sake of comparison, we also include NN0, a zero-hidden-layer ANN, which is equivalent to linear logistic regression (a sketch of this architecture grid follows below)
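
The slides do not name an implementation, so purely as an illustration (an assumption, not the authors' setup) the same architecture grid could be instantiated with scikit-learn:

```python
# Illustrative only: the compared architectures as scikit-learn estimators.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

broad = {f"NN{b}": (b,) for b in (2, 4, 6, 8, 10)}       # 1 hidden layer, b nodes
deep  = {"NN" + "2" * d: (2,) * d for d in range(1, 6)}  # d hidden layers, 2 nodes each

models = {name: MLPClassifier(hidden_layer_sizes=sizes, max_iter=1000)
          for name, sizes in {**broad, **deep}.items()}
models["NN0"] = LogisticRegression(max_iter=1000)        # zero hidden layers
```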


Broad ANN – Bias, Variance Comparison

             vs. NN0            vs. NN2            vs. NN4            vs. NN6            vs. NN8
             W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p

All Datasets – Bias
  NN2       35/3/34    1
  NN4       45/4/23    0.010   49/7/16   <0.001
  NN6       47/4/21    0.002   47/5/20    0.001   37/7/28    0.321
  NN8       48/3/21    0.002   44/5/23    0.014   37/7/28    0.321   36/11/25   0.200
  NN10      52/3/17   <0.001   47/5/20    0.001   41/9/22    0.023   43/10/19   0.003   40/15/17   0.003

All Datasets – Variance
  NN2       20/2/50   <0.001
  NN4       21/2/49    0.001   38/6/28    0.268
  NN6       27/3/42    0.091   43/7/22    0.013   40/8/24    0.060
  NN8       32/2/38    0.550   42/7/23    0.025   44/8/20    0.004   36/9/27    0.314
  NN10      30/3/39    0.336   42/7/23    0.025   43/9/20    0.005   34/13/25   0.298   33/10/29   0.704

Table: A comparison of Bias and Variance of broad models in terms of W-D-L on All datasets. p is a two-tail binomial sign test. Results are significant if p ≤ 0.05.


Broad ANN – Error Comparison

             vs. NN0            vs. NN2            vs. NN4            vs. NN6            vs. NN8
             W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p

All Datasets – 0-1 Loss
  NN2       27/2/43    0.072
  NN4       31/6/35    0.712   50/9/13   <0.001
  NN6       33/3/36    0.801   49/3/20   <0.001   45/7/20    0.003
  NN8       37/1/34    0.813   50/5/17   <0.001   44/8/20    0.004   31/14/27   0.694
  NN10      40/2/30    0.282   51/4/17   <0.001   49/5/18   <0.001   38/9/25    0.130   40/8/24    0.060

Big Datasets – 0-1 Loss
  NN2        6/0/6     1.226
  NN4        7/0/5     0.774   12/0/0     0.011
  NN6        7/0/5     0.774   12/0/0     0.001   11/0/1     0.006
  NN8        8/0/4     0.388   12/0/0    <0.001    9/0/3     0.146    8/0/4     0.388
  NN10       8/0/4     0.388   12/0/0    <0.001   10/0/2     0.039    9/0/3     0.146    9/0/3     0.146

Table: A comparison of 0-1 Loss and RMSE of broad models in terms of W-D-L on All and Big datasets. p is a two-tail binomial sign test. Results are significant if p ≤ 0.05.


Broad ANN – Geometric Averages

[Figure: Comparison (geometric average) of 0-1 Loss, RMSE, Bias and Variance for broad models (NN0, NN2, NN4, NN6, NN8, NN10) on All and Big datasets. Results are normalized w.r.t. NN0.]
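
A short sketch of how such a normalised geometric average can be computed (illustrative numbers only, not the paper's results):

```python
# Geometric mean of a metric across datasets, normalised w.r.t. the NN0 baseline.
import numpy as np

def geo_average_vs_nn0(scores, baseline_scores):
    ratios = np.asarray(scores, dtype=float) / np.asarray(baseline_scores, dtype=float)
    return float(np.exp(np.mean(np.log(ratios))))

# e.g. 0-1 loss of some model vs. NN0 on three made-up datasets
print(geo_average_vs_nn0([0.10, 0.20, 0.05], [0.12, 0.25, 0.05]))
```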


Deep ANN – Bias, Variance Comparison

             vs. NN0            vs. NN2            vs. NN22           vs. NN222          vs. NN2222
             W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p

All Datasets – Bias
  NN2       35/3/34    1
  NN22      30/3/39    0.336   28/4/40    0.182
  NN222     26/1/45    0.032   21/3/48    0.002   24/4/44    0.021
  NN2222     5/0/67   <0.001    3/1/68   <0.001    3/2/67   <0.001    4/9/59   <0.001
  NN22222    0/1/71   <0.001    0/1/71   <0.001    1/2/69   <0.001    1/9/62   <0.001   0/61/11   <0.001

All Datasets – Variance
  NN2       20/2/50   <0.001
  NN22      20/1/51   <0.001   27/6/39    0.175
  NN222     24/1/47    0.009   34/3/35    1       32/4/36    0.905
  NN2222    34/1/37    0.813   34/1/37    0.813   36/2/34    0.905   32/9/31    1
  NN22222   40/2/30    0.282   38/1/33    0.6353  39/2/31    0.403   35/9/28    0.450   8/61/3     0.227

Table: A comparison of Bias and Variance of deep models in terms of W-D-L on All datasets. p is a two-tail binomial sign test. Results are significant if p ≤ 0.05.


Deep ANN – Error Comparison

             vs. NN0            vs. NN2            vs. NN22           vs. NN222          vs. NN2222
             W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p       W-D-L      p

All Datasets – 0-1 Loss
  NN2       27/2/43    0.072
  NN22      28/1/43    0.096   24/5/43    0.027
  NN222     24/1/47    0.009   25/5/42    0.050   28/3/41    0.148
  NN2222     7/0/65   <0.001    4/2/66   <0.001    4/2/66   <0.001    3/9/60   <0.001
  NN22222    7/1/64   <0.001    5/1/66   <0.001    4/2/66   <0.001    3/9/60   <0.001   1/61/10    0.012

Big Datasets – 0-1 Loss
  NN2        6/0/6     1.226
  NN22       5/0/7     0.774    4/0/8     0.388
  NN222      4/0/8     0.388    2/0/10    0.039    4/0/8     0.388
  NN2222     2/0/10    0.039    0/0/12   <0.001    1/0/11    0.006    1/1/10    0.012
  NN22222    1/1/10    0.012    0/0/12   <0.001    0/0/12   <0.001    0/1/11   <0.001   0/6/6      0.031

Table: 0-1 Loss W-D-L of deep models on All and Big datasets. p is a two-tail binomial sign test. Results are significant if p ≤ 0.05.


Deep ANN – Geometric Averages

[Figure: Comparison (geometric average) of 0-1 Loss, RMSE, Bias and Variance for deep models (NN0, NN2, NN22, NN222, NN2222, NN22222) on All and Big datasets. Results are normalized w.r.t. NN0.]


Convergence Analysis (Broad)

[Figure: Variation in Mean Square Error of NN2, NN4, NN6, NN8 and NN10 with increasing number of (optimization) iterations on sample datasets: Connect-4, Localization, Nursery, Letter-recog, Magic and Sign.]
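
A minimal sketch of how such convergence curves can be produced for a one-hidden-layer network. The slides do not state the optimiser, so plain gradient descent on the softmax cross-entropy is assumed here, while the recorded curve is the mean square error of the class probabilities, as in the figure.

```python
# Record mean square error after every optimisation iteration (sketch only).
import numpy as np

def train_and_trace(X, Y, n_hidden=2, lr=0.5, iters=1000, seed=0):
    """X: (n, d) inputs; Y: (n, k) one-hot targets. Returns MSE per iteration."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, k)); b2 = np.zeros(k)
    trace = []
    for _ in range(iters):
        H = np.tanh(X @ W1 + b1)                        # hidden layer
        S = H @ W2 + b2
        P = np.exp(S - S.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)               # softmax class posteriors
        trace.append(float(np.mean((P - Y) ** 2)))      # curve plotted in the figure
        dS = (P - Y) / n                                # softmax cross-entropy gradient
        dW2 = H.T @ dS; db2 = dS.sum(axis=0)
        dH = dS @ W2.T
        dS1 = dH * (1.0 - H ** 2)                       # tanh derivative
        dW1 = X.T @ dS1; db1 = dS1.sum(axis=0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
    return trace
```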


Convergence Analysis (Deep)

[Figure: Variation in Mean Square Error of NN2, NN22, NN222, NN2222 and NN22222 with increasing number of (optimization) iterations on sample datasets: Connect-4, Localization, Nursery, Letter-recog, Magic and Sign.]


Conclusion

- Results warrant further investigation:
  - Deep versus Broad
  - Deep versus Shallow
- Q & A
- For further discussions:
  - @nayyar zaidi
  - [email protected]
  - nayyar zaidi
  - http://users.monash.edu.au/~nzaidi
