TRANSCRIPT
Fred Daum
13 July 2017
data fusion: history,
open problems & future progress
Copyright © 2017 Raytheon Company. All rights reserved.
Customer Success Is Our Mission is a trademark of Raytheon Company.
Daniel Svensson, Martin Ulmke & Lars Danielsson, “multi-target tracking with partially unresolved measurements,” 2011.
Fred Daum & Bob Fitzgerald, “importance of resolution in multiple-target tracking,” Proceedings of SPIE, 1994.
Wolfgang Koch & Günter van Keuk, “multiple hypothesis track maintenance with possibly unresolved measurements,” IEEE Transactions on Aerospace and Electronic Systems, 1997.
Henk Blom & Edwin Bloem, “Bayesian tracking of two possibly unresolved targets,” IEEE Transactions on Aerospace and Electronic Systems, 2007.
Darko Musicki, Taek Lyul Song & HaeHo Lee, “multiscan multitarget tracking with finite resolution sensors,” 2012.
[Figure: tracking results with a resolution model in the algorithm vs. without one; perfect resolution assumed in the simulation]
GNPL vs. JVC with Bias
7 remote tracks and 29 local tracks (2 of the remote tracks have no local track) with residual radar bias
[Figure: two fused-track maps: GNPL yields perfect fusion; JVC fusion is entirely incorrect]
so-called “bias” can ruin multi-sensor fusion
300 total targets: 30 missiles, 10 targets per missile
Position error σ = 100 m; separation of targets in missile complex = 500 m, 1500 m
[Figure: % correct assignments vs. magnitude of bias (1 m to 10,000 m) for JVC and GNPL]
Mark Levedahl, “explicit pattern matching assignment algorithm,” Proceedings of SPIE Conference on Signal Processing, Orlando, 2002.
[Figure: computer speed & memory per unit cost vs. time, 1960 to 2020, log scale from 1 to 10^12]
still no useful theory to explain performance or guide design for deep learning, MCMC, particle filters, EKF, etc.!

theoretical bound: σ² ≤ c/N (but the constant c is not tight and can grow exponentially with the state dimension, so the bound neither explains performance nor guides design)
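A minimal numerical illustration of the σ² ≤ c/N rate, on a toy one-step Bayes update with a linear-Gaussian model (all parameters below are illustrative assumptions): the Monte Carlo error does fall like 1/N, but the constant c is model- and dimension-dependent, which is why the bound offers so little design guidance.

```python
# One bootstrap-particle-filter measurement update on a conjugate Gaussian
# model, repeated to estimate the Monte Carlo MSE as a function of N.
import numpy as np

rng = np.random.default_rng(1)

def pf_posterior_mean(z, n, d):
    """Prior N(0, I), likelihood N(z; x, I): weighted posterior-mean estimate."""
    x = rng.standard_normal((n, d))                  # particles from the prior
    w = np.exp(-0.5 * np.sum((z - x) ** 2, axis=1))  # importance weights
    return (w / w.sum()) @ x

d = 5
z = np.ones(d)
exact = z / 2.0   # conjugate model: posterior is N(z/2, I/2)
for n in [100, 1_000, 10_000]:
    mse = np.mean([np.sum((pf_posterior_mean(z, n, d) - exact) ** 2)
                   for _ in range(200)])
    print(n, mse, mse * n)   # mse * n is roughly constant: that constant is c
```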
many opportunities to solve open problems using faster computers & new theory: improved algorithms for unresolved data & residual sensor “bias” & nonlinear filters & tight useful theoretical bounds on accuracy

FUTURE
application of Gromov’s theorem:
[Figure: estimation error (10⁻¹ to 10³) vs. number of particles (10¹ to 10⁶) for state dimensions 5, 10, 20, 40 and 100, with Q_opt = 1]
Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
The flow is defined by the transport PDE together with the Gromov-type drift and diffusion:

$$\log h = -\,\mathrm{div}(f) - \frac{\partial \log p}{\partial x}\,f + \frac{1}{2}\,\frac{\mathrm{div}\!\left(Q\,\partial p/\partial x\right)}{p}$$

$$f = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\!\left(\frac{\partial \log h}{\partial x}\right)^{T}$$

$$Q = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}$$
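For the linear-Gaussian special case these formulas can be checked directly against the Kalman filter. A minimal sketch of one measurement update by stochastic particle flow (Euler-Maruyama discretization in λ; the model matrices, particle count, and step count are illustrative assumptions, not tuned values):

```python
# Stochastic particle flow for a linear-Gaussian measurement update:
#   f(x, lam) = M H'R^-1 (z - Hx),  Q(lam) = M H'R^-1 H M,
#   M = [P^-1 + lam H'R^-1 H]^-1, matching the Gromov formulas above.
import numpy as np

rng = np.random.default_rng(2)
d = 2
P = np.eye(d)                    # prior covariance (prior mean = 0)
H = np.eye(d)                    # measurement matrix
R = 0.25 * np.eye(d)             # measurement noise covariance
z = np.array([1.0, -1.0])        # measurement
Pinv, Rinv = np.linalg.inv(P), np.linalg.inv(R)

n_particles, n_steps = 1000, 100
dlam = 1.0 / n_steps
x = rng.multivariate_normal(np.zeros(d), P, size=n_particles)  # prior particles

lam = 0.0
for _ in range(n_steps):
    M = np.linalg.inv(Pinv + lam * H.T @ Rinv @ H)  # = -[d^2 log p / dx^2]^-1
    Q = M @ H.T @ Rinv @ H @ M                      # Gromov diffusion matrix
    drift = (z - x @ H.T) @ (M @ H.T @ Rinv).T      # f(x) for every particle
    noise = rng.multivariate_normal(np.zeros(d), dlam * Q, size=n_particles)
    x += drift * dlam + noise                       # Euler-Maruyama step in lambda
    lam += dlam

K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # exact Kalman gain
print(x.mean(axis=0), K @ z)                        # flow mean vs. Kalman mean
```

The particle mean at λ = 1 should agree with the exact Kalman posterior mean to within Monte Carlo and discretization error.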
DEEP BACKUP
Almost all progress has been driven by faster, low-cost computers rather than by new theory or brilliant deep ideas.
standard MHT explicitly assumes that one measurement corresponds to at most one target, contrary to the real world for any sensor!!!
Wolfgang Koch & Günter van Keuk, “multiple hypothesis track maintenance with possibly unresolved measurements,” IEEE Transactions on Aerospace and Electronic Systems, 1997.
Henk Blom & Edwin Bloem, “Bayesian tracking of two possibly unresolved targets,” IEEE Transactions on Aerospace and Electronic Systems, 2007.
no useful theory for deep learning, MCMC or particle filters
“The convergence properties of the particle filter are well understood on a theoretical level (see Crisan & Doucet, IEEE Transactions on Signal Processing, 2002)…. In practice the performance degrades quickly with the state dimension due to the curse of dimensionality.” (2010)
“Despite their great success, there is still no comprehensive understanding of the optimization process or the internal organization of deep neural networks, and they are often criticized for being used as mysterious black boxes.”
Ravid Schwartz-Ziv (2017)
“Essentially none of these applications [of MCMC] is accompanied by any kind of practically useful running time analysis.” Persi Diaconis (2009)
multi-sensor data
fusion is often worse
than no fusion in the
real world because of
residual so-called
“bias” errors
Mark Levedahl, “explicit pattern matching assignment algorithm,” Proceedings of SPIE Conference on Signal Processing, Orlando, 2002.
item | deep learning | particle flow
purpose | learning & decisions | learning & estimation & decisions
interesting wrinkle (which annoys many people) | lack of uniqueness of solution for highly non-convex loss functions | lack of uniqueness of solution for highly underdetermined transport PDE
architecture | many layers | many steps in log-homotopy
fundamental issues | curse of dimensionality & ill-conditioning & singularity of Hessian | curse of dimensionality & ill-conditioning & singularity of Hessian
tools | stochastic gradient or natural gradient | stochastic natural gradient
representation of geometry | Hessian of loss function (log p) | Hessian of log p
useful theory to explain performance | none | none
performance evaluation | numerical experiments | numerical experiments
theory of design | ersatz Bayesian | echt Bayesian
computers of choice today | GPUs | GPUs
regularization | random dropout & sparsity of coupling between layers and within layers | Tychonov regularization or shrinkage or preferred coordinate system
key adaptive method | adaptive learning rate | adaptive step size in λ
dynamics of learning | backpropagation (i.e., chain rule) | Fokker-Planck equation (i.e., chain rule)
failure of deep learning in high dimensions*

*Shai Shalev-Shwartz, et al., “failures of gradient-based deep learning,” April 2017.
(1) Beskos, Crisan, Jasra & Whiteley, “error bounds and normalizing constants for sequential Monte Carlo in high dimensions,” Dec 2011.
(2) Beskos, Crisan & Jasra, “on the stability of sequential Monte Carlo methods in high dimensions,” April 2012.
(3) Crisan & Doucet, “a survey of convergence results on particle filtering methods for practitioners,” IEEE Transactions Signal Processing, March 2002.
(4) Arnak Dalalyan, “theoretical guarantees for approximate sampling from smooth and log-concave densities,” arXiv:1412.7392v4, September 2015.
(5) Krzysztof Łatuszyński, Błażej Miasojedow and Wojciech Niemiro, “nonasymptotic bounds on the estimation error of MCMC algorithms,” 2011.
(6) Erich Novak and Daniel Rudolf, “computation of expectations by Markov chain Monte Carlo methods,” September 2014.
(7) Paul Bui Quang, Christian Musso and Francois Le Gland, “An Insight into the Issue of Dimensionality in Particle Filtering,” Proceedings of 13th international conference on information fusion, Edinburgh Scotland, July 2010.
(8) Thomas Bengtsson, Peter Bickel & Bo Li, “curse of dimensionality revisited: collapse of the particle filter in very large scale systems,” IMS 2008.
(9) Erich Novak, “some results on the complexity of numerical integration,” pages 161 to 183 in “Monte Carlo and quasi-Monte Carlo methods,” edited by Ron Cools and Dirk Nuyens, Springer-Verlag, 2016.
(10) Snyder, Bengtsson & Morzfeld, “performance bounds for particle filters using the optimal proposal,” Monthly Weather Review, November 2015.
(11) Simone Surace, Anna Kutschireiter & Jean-Pascal Pfister, “how to avoid the curse of dimensionality: scalability of particle filters with and without importance weights,” March 2017.
(12) Mathieu Gerber and Nicolas Chopin, “Sequential quasi Monte Carlo,” Journal of Royal Statistical Society, series B, pages 509 to 579, with rejoinders and rebuttal, 2015.
(13) Michael Elad, “Deep, deep trouble: deep learning’s impact on image processing, mathematics and humanity,” SIAM NEWS, May 2017.
(14) Persi Diaconis, “the MCMC revolution,” AMS Bulletin, 2009.
(15) Moritz Hardt, Benjamin Recht & Yoram Singer, “training faster, generalize better: stability of stochastic gradient descent,” 2016.
(16) Yann LeCun, et al., “efficient backprop,” 1998.
(17) Geoff Hinton, et al., “improving neural networks by preventing co-adaptation of feature detectors,” 2012.
(18) Geoff Hinton, et al., “on the importance of initialization and momentum in deep learning,” 2013.
(19) Geoff Hinton, et al., “overview of mini-batch gradient descent,” 2013.
(20) Jürgen Schmidhuber, “deep learning in neural networks: an overview,” Oct. 2014.
(21) Ian Goodfellow, Yoshua Bengio & Aaron Courville, “deep learning,” MIT Press, 2016.
(22) Rony Ronen, et al., “why and when deep learning works,” 2017.
(23) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “deep learning,” Nature, 2015.
[24] Behnam Neyshabur, et al., “geometry of optimization and implicit regularization in deep learning,” May 2017.
[25] Shai Shalev-Shwartz, et al., “failures of gradient-based deep learning,” April 2017.
[26] Nadav Cohen, et al., “analysis and design of convolutional networks via hierarchical tensor decompositions,” June 2017.
[27] Ravid Schwartz-Ziv, et al., “opening the black box of deep neural networks via information,” April 2017.
[28] Tom Zahavy, et al., “graying the black box: understanding deep Q-networks,” April 2017.
[29] Anna Choromanska, et al., “the loss surfaces of multilayer networks,” 2015.
[30] Chiyuan Zhang, et al., “theory of deep learning III: generalization properties of stochastic gradient descent (SGD),” MIT Center for Brains, Minds & Machines, April 2017.
[31] Kenji Kawaguchi, “deep learning without poor local minima,” NIPS 2016.
[32] Yoshua Bengio and Yann LeCun, “scaling learning algorithms towards AI,” in “large-scale kernel machines”, edited by Bottou, Chapelle, DeCoste, and Weston, MIT Press 2007.
[33] Hao Wang & Dit-Yan Yeung, “towards Bayesian deep learning: a survey,” April 2016.
[34] Yann LeCun, et al., “singularity of the Hessian in deep learning,” 2017.
Oh’s Formula for Monte Carlo errors
assumptions:
(1) Gaussian density (zero mean & unit covariance matrix)
(2) d-dimensional random variable
(3) proposal density is also Gaussian with mean ε and covariance matrix kI, but it is not exact for k ≠ 1 or ε ≠ 0
(4) N = number of Monte Carlo trials
$$\sigma^{2} \;\approx\; \frac{1}{N}\left[\left(\frac{k^{2}}{2k-1}\right)^{d/2}\exp\!\left(\frac{\varepsilon^{T}\varepsilon}{2k-1}\right) - 1\right] \qquad (k > 1/2)$$
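A quick numerical check of the formula under exactly the assumptions listed above (the sample size and parameter values below are arbitrary choices): the empirical second moment of the importance weight matches the predicted factor [k²/(2k−1)]^(d/2) · exp(εᵀε/(2k−1)), which drives σ² and grows exponentially with d.

```python
# Monte Carlo check of Oh's formula: target N(0, I_d), proposal N(eps*1, k*I_d).
import numpy as np

rng = np.random.default_rng(3)

def empirical_Ew2(d, k, eps, n=200_000):
    """Mean squared importance weight, estimated from proposal samples."""
    mu = np.full(d, eps)
    x = mu + np.sqrt(k) * rng.standard_normal((n, d))
    log_w = (-0.5 * np.sum(x ** 2, axis=1)              # log target (const dropped)
             + 0.5 * np.sum((x - mu) ** 2, axis=1) / k  # minus log proposal
             + 0.5 * d * np.log(k))                     # (same const dropped)
    return np.mean(np.exp(2.0 * log_w))

def oh_Ew2(d, k, eps):
    # predicted E[w^2]; here eps'eps = d * eps**2 since every coordinate is offset
    return (k ** 2 / (2.0 * k - 1.0)) ** (d / 2.0) * np.exp(d * eps ** 2 / (2.0 * k - 1.0))

for d in [1, 5, 10, 20]:
    print(d, empirical_Ew2(d, 1.2, 0.1), oh_Ew2(d, 1.2, 0.1))
```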
curse of dimensionality for classic particle filter*

[Figure: optimal accuracy: r = 1.0]

*Daum, IEEE AES Systems Magazine, August 2005.
Nima Moshtagh, Jonathan Chan & Moses Chan, “Homotopy Particle Filter for Ground-Based Tracking of Satellites at GEO,” AMOS Conference, Hawaii, September 2016.

[Figure: GEO satellite tracking results: boring old EKF vs. standard particle filter vs. particle flow filter]
application of Gromov’s theorem:
For the log-homotopy of densities and the corresponding stochastic flow:

$$\log p(x,\lambda) = \log g(x) + \lambda \log h(x)$$

$$dx = f\,d\lambda + Q^{1/2}\,dw$$

$$\log h = -\,\mathrm{div}(f) - \frac{\partial \log p}{\partial x}\,f + \frac{1}{2}\,\frac{\mathrm{div}\!\left(Q\,\partial p/\partial x\right)}{p}$$

$$f = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\!\left(\frac{\partial \log h}{\partial x}\right)^{T},\qquad Q = -\left[\frac{\partial^{2}\log g}{\partial x\,\partial x^{T}} + \lambda\,\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\right]^{-1}\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\left[\frac{\partial^{2}\log g}{\partial x\,\partial x^{T}} + \lambda\,\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\right]^{-1}$$

which for linear-Gaussian measurements reduces to

$$Q = \left[P^{-1} + \lambda H^{T}R^{-1}H\right]^{-1}H^{T}R^{-1}H\left[P^{-1} + \lambda H^{T}R^{-1}H\right]^{-1}$$
last year at Baden-Baden

Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
application of Gromov’s theorem:
Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
“Demonstration of quantum advantage in machine learning,” Diego Ristè, Marcus P. da Silva, Colm A. Ryan, Andrew W. Cross, Antonio D. Córcoles, John A. Smolin, Jay M. Gambetta, Jerry M. Chow & Blake R. Johnson, npj Quantum Information, April 2017.
comparison of multi-sensor data fusion algorithms
Algorithm | Bias estimation? | Performance
1. GNPL + adaptive gating | yes (jointly with association) |
2. Iterative bias estimation & JVC | yes |
3. Histogram of bias over association hypotheses | yes |
4. Covariance inflation & JVC | no |
5. Association of objects (JVC) | no |
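As a concrete illustration of row 4 of the table (covariance inflation & JVC), here is a minimal sketch: scipy’s linear_sum_assignment stands in for JVC (both solve the same linear assignment problem), and the bias covariance B and all scenario numbers are illustrative assumptions.

```python
# Covariance inflation for track-to-track association: inflate the innovation
# covariance by an assumed residual-bias covariance B before computing
# Mahalanobis assignment costs, then solve the 2-D assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse(local, remote, P_local, P_remote, B):
    """Assign remote tracks to local tracks with bias-inflated Mahalanobis costs."""
    S = P_local + P_remote + B            # innovation covariance, inflated by B
    Sinv = np.linalg.inv(S)
    diff = local[:, None, :] - remote[None, :, :]
    cost = np.einsum('ijk,kl,ijl->ij', diff, Sinv, diff)  # squared Mahalanobis
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

rng = np.random.default_rng(4)
truth = rng.uniform(0.0, 5000.0, size=(10, 2))
local = truth + 100.0 * rng.standard_normal((10, 2))
remote = truth + np.array([300.0, -200.0]) + 100.0 * rng.standard_normal((10, 2))
P = np.diag([100.0 ** 2, 100.0 ** 2])    # track position covariances
B = np.diag([500.0 ** 2, 500.0 ** 2])    # assumed residual bias covariance
print(fuse(local, remote, P, P, B))      # pairs (local index, remote index)
```

Inflation only widens the gates; unlike rows 1 to 3 it makes no attempt to estimate or remove the bias itself.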