TRANSCRIPT
Fred Daum
13 July 2017
data fusion: history,
open problems & future progress
Copyright © 2017 Raytheon Company. All rights reserved.
Customer Success Is Our Mission is a trademark of Raytheon Company.
Daniel Svensson, Martin Ulmke & Lars Danielsson, “multi-target tracking with partially unresolved measurements,” 2011.
Fred Daum & Bob Fitzgerald, “importance of resolution in multiple-target tracking,” Proceedings of SPIE, 1994.
Wolfgang Koch & Günter van Keuk, “multiple hypothesis track maintenance with possibly unresolved measurements,” IEEE Transactions on Aerospace and Electronic Systems, 1997.
Henk Blom & Edwin Bloem, “Bayesian tracking of two possibly unresolved targets,” IEEE Transactions on Aerospace and Electronic Systems, 2007.
Darko Musicki, Taek Lyul Song & HaeHo Lee, “multiscan multitarget tracking with finite resolution sensors,” 2012.
[Figure: tracking results with a resolution model in the algorithm vs. without one; perfect resolution assumed in the simulation]
GNPL vs. JVC with Bias
7 remote tracks and 29 local tracks (2 of the remote tracks have no local track) with residual radar bias
[Figure: two fused-track maps: GNPL yields perfect fusion; JVC fusion is entirely incorrect]
so-called “bias” can ruin multi-sensor fusion
300 total targets: 30 missiles, 10 targets per missile
Position error σ = 100 m; separation of targets in missile complex = 500 m, 1500 m
[Figure: % correct assignments vs. magnitude of bias (1 m to 10,000 m) for JVC and GNPL]
Mark Levedahl, “explicit pattern matching assignment algorithm,” Proceedings of SPIE Conference on Signal Processing, Orlando, 2002.
[Figure: computer speed & memory per unit cost vs. time, 1960 to 2020, log scale from 1 to 10^12]
still no useful theory to explain performance or guide design for deep learning, MCMC, particle filters, EKF, etc.!

theoretical bound: σ² ≤ c/N (but the constant c is not tight and can grow exponentially with the state dimension, so the bound neither explains performance nor guides design)
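A minimal numerical illustration of the σ² ≤ c/N rate, on a toy one-step Bayes update with a linear-Gaussian model (all parameters below are illustrative assumptions): the Monte Carlo error does fall like 1/N, but the constant c is model- and dimension-dependent, which is why the bound offers so little design guidance.

```python
# One bootstrap-particle-filter measurement update on a conjugate Gaussian
# model, repeated to estimate the Monte Carlo MSE as a function of N.
import numpy as np

rng = np.random.default_rng(1)

def pf_posterior_mean(z, n, d):
    """Prior N(0, I), likelihood N(z; x, I): weighted posterior-mean estimate."""
    x = rng.standard_normal((n, d))                  # particles from the prior
    w = np.exp(-0.5 * np.sum((z - x) ** 2, axis=1))  # importance weights
    return (w / w.sum()) @ x

d = 5
z = np.ones(d)
exact = z / 2.0   # conjugate model: posterior is N(z/2, I/2)
for n in [100, 1_000, 10_000]:
    mse = np.mean([np.sum((pf_posterior_mean(z, n, d) - exact) ** 2)
                   for _ in range(200)])
    print(n, mse, mse * n)   # mse * n is roughly constant: that constant is c
```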
many opportunities to solve open problems using faster computers & new theory: improved algorithms for unresolved data & residual sensor “bias” & nonlinear filters & tight useful theoretical bounds on accuracy

FUTURE
application of Gromov’s theorem:
[Figure: estimation error (10⁻¹ to 10³) vs. number of particles (10¹ to 10⁶) for state dimensions 5, 10, 20, 40 and 100, with Q_opt = 1]
Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
The flow is defined by the transport PDE together with the Gromov-type drift and diffusion:

$$\log h = -\,\mathrm{div}(f) - \frac{\partial \log p}{\partial x}\,f + \frac{1}{2}\,\frac{\mathrm{div}\!\left(Q\,\partial p/\partial x\right)}{p}$$

$$f = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\!\left(\frac{\partial \log h}{\partial x}\right)^{T}$$

$$Q = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}$$
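For the linear-Gaussian special case these formulas can be checked directly against the Kalman filter. A minimal sketch of one measurement update by stochastic particle flow (Euler-Maruyama discretization in λ; the model matrices, particle count, and step count are illustrative assumptions, not tuned values):

```python
# Stochastic particle flow for a linear-Gaussian measurement update:
#   f(x, lam) = M H'R^-1 (z - Hx),  Q(lam) = M H'R^-1 H M,
#   M = [P^-1 + lam H'R^-1 H]^-1, matching the Gromov formulas above.
import numpy as np

rng = np.random.default_rng(2)
d = 2
P = np.eye(d)                    # prior covariance (prior mean = 0)
H = np.eye(d)                    # measurement matrix
R = 0.25 * np.eye(d)             # measurement noise covariance
z = np.array([1.0, -1.0])        # measurement
Pinv, Rinv = np.linalg.inv(P), np.linalg.inv(R)

n_particles, n_steps = 1000, 100
dlam = 1.0 / n_steps
x = rng.multivariate_normal(np.zeros(d), P, size=n_particles)  # prior particles

lam = 0.0
for _ in range(n_steps):
    M = np.linalg.inv(Pinv + lam * H.T @ Rinv @ H)  # = -[d^2 log p / dx^2]^-1
    Q = M @ H.T @ Rinv @ H @ M                      # Gromov diffusion matrix
    drift = (z - x @ H.T) @ (M @ H.T @ Rinv).T      # f(x) for every particle
    noise = rng.multivariate_normal(np.zeros(d), dlam * Q, size=n_particles)
    x += drift * dlam + noise                       # Euler-Maruyama step in lambda
    lam += dlam

K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # exact Kalman gain
print(x.mean(axis=0), K @ z)                        # flow mean vs. Kalman mean
```

The particle mean at λ = 1 should agree with the exact Kalman posterior mean to within Monte Carlo and discretization error.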
DEEP BACKUP
Almost all progress has been driven by faster, low-cost computers rather than by new theory or brilliant deep ideas.
standard MHT explicitly assumes that one measurement corresponds to at most one target, contrary to the real world for any sensor!!!
Wolfgang Koch & Günter van Keuk, “multiple hypothesis track maintenance with possibly unresolved measurements,” IEEE Transactions on Aerospace and Electronic Systems, 1997.
Henk Blom & Edwin Bloem, “Bayesian tracking of two possibly unresolved targets,” IEEE Transactions on Aerospace and Electronic Systems, 2007.
no useful theory for deep learning, MCMC or particle filters
“The convergence properties of the particle filter are well understood on a theoretical level (see Crisan & Doucet, IEEE Transactions on Signal Processing, 2002)…. In practice the performance degrades quickly with the state dimension due to the curse of dimensionality.” (2010)
“Despite their great success, there is still no comprehensive understanding of the optimization process or the internal organization of deep neural networks, and they are often criticized for being used as mysterious black boxes.”
Ravid Schwartz-Ziv (2017)
“Essentially none of these applications [of MCMC] is accompanied by any kind of practically useful running time analysis.” Persi Diaconis (2009)
multi-sensor data
fusion is often worse
than no fusion in the
real world because of
residual so-called
“bias” errors
Mark Levedahl, “explicit pattern matching assignment algorithm,” Proceedings of SPIE Conference on Signal Processing, Orlando, 2002.
item | deep learning | particle flow
purpose | learning & decisions | learning & estimation & decisions
interesting wrinkle (which annoys many people) | lack of uniqueness of solution for highly non-convex loss functions | lack of uniqueness of solution for highly underdetermined transport PDE
architecture | many layers | many steps in log-homotopy
fundamental issues | curse of dimensionality & ill-conditioning & singularity of Hessian | curse of dimensionality & ill-conditioning & singularity of Hessian
tools | stochastic gradient or natural gradient | stochastic natural gradient
representation of geometry | Hessian of loss function (log p) | Hessian of log p
useful theory to explain performance | none | none
performance evaluation | numerical experiments | numerical experiments
theory of design | ersatz Bayesian | echt Bayesian
computers of choice today | GPUs | GPUs
regularization | random dropout & sparsity of coupling between layers and within layers | Tychonov regularization or shrinkage or preferred coordinate system
key adaptive method | adaptive learning rate | adaptive step size in λ
dynamics of learning | backpropagation (i.e., chain rule) | Fokker-Planck equation (i.e., chain rule)
failure of deep learning in high dimensions*

*Shai Shalev-Shwartz, et al., “failures of gradient-based deep learning,” April 2017.
(1) Beskos, Crisan, Jasra & Whiteley, “error bounds and normalizing constants for sequential Monte Carlo in high dimensions,” Dec 2011.
(2) Beskos, Crisan & Jasra, “on the stability of sequential Monte Carlo methods in high dimensions,” April 2012.
(3) Crisan & Doucet, “a survey of convergence results on particle filtering methods for practitioners,” IEEE Transactions Signal Processing, March 2002.
(4) Arnak Dalalyan, “theoretical guarantees for approximate sampling from smooth and log-concave densities,” arXiv:1412.7392v4, September 2015.
(5) Krzysztof Łatuszyński, Błażej Miasojedow and Wojciech Niemiro, “nonasymptotic bounds on the estimation error of MCMC algorithms,” 2011.
(6) Erich Novak and Daniel Rudolf, “computation of expectations by Markov chain Monte Carlo methods,” September 2014.
(7) Paul Bui Quang, Christian Musso and Francois Le Gland, “An Insight into the Issue of Dimensionality in Particle Filtering,” Proceedings of 13th international conference on information fusion, Edinburgh Scotland, July 2010.
(8) Thomas Bengtsson, Peter Bickel & Bo Li, “curse of dimensionality revisited: collapse of the particle filter in very large scale systems,” IMS 2008.
(9) Erich Novak, “some results on the complexity of numerical integration,” pages 161 to 183 in “Monte Carlo and quasi-Monte Carlo methods,” edited by Ron Cools and Dirk Nuyens, Springer-Verlag, 2016.
(10) Snyder, Bengtsson & Morzfeld, “performance bounds for particle filters using the optimal proposal,” Monthly Weather Review, November 2015.
(11) Simone Surace, Anna Kutschireiter & Jean-Pascal Pfister, “how to avoid the curse of dimensionality: scalability of particle filters with and without importance weights,” March 2017.
(12) Mathieu Gerber and Nicolas Chopin, “Sequential quasi Monte Carlo,” Journal of Royal Statistical Society, series B, pages 509 to 579, with rejoinders and rebuttal, 2015.
(13) Michael Elad, “Deep, deep trouble: deep learning’s impact on image processing, mathematics and humanity,” SIAM NEWS, May 2017.
(14) Persi Diaconis, “the MCMC revolution,” AMS Bulletin, 2009.
(15) Moritz Hardt, Benjamin Recht & Yoram Singer, “training faster, generalize better: stability of stochastic gradient descent,” 2016.
(16) Yann LeCun, et al., “efficient backprop,” 1998.
(17) Geoff Hinton, et al., “improving neural networks by preventing co-adaptation of feature detectors,” 2012.
(18) Geoff Hinton, et al., “on the importance of initialization and momentum in deep learning,” 2013.
(19) Geoff Hinton, et al., “overview of mini-batch gradient descent,” 2013.
(20) Jürgen Schmidhuber, “deep learning in neural networks: an overview,” Oct. 2014.
(21) Ian Goodfellow, Yoshua Bengio & Aaron Courville, “deep learning,” MIT Press, 2016.
(22) Rony Ronen, et al., “why and when deep learning works,” 2017.
(23) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “deep learning,” Nature, 2015.
[24] Behnam Neyshabur, et al., “geometry of optimization and implicit regularization in deep learning,” May 2017.
[25] Shai Shalev-Shwartz, et al., “failures of gradient-based deep learning,” April 2017.
[26] Nadav Cohen, et al., “analysis and design of convolutional networks via hierarchical tensor decompositions,” June 2017.
[27] Ravid Schwartz-Ziv, et al., “opening the black box of deep neural networks via information,” April 2017.
[28] Tom Zahavy, et al., “graying the black box: understanding deep Q-networks,” April 2017.
[29] Anna Choromanska, et al., “the loss surfaces of multilayer networks,” 2015.
[30] Chiyuan Zhang, et al., “theory of deep learning III: generalization properties of stochastic gradient descent (SGD),” MIT Center for Brains, Minds & Machines, April 2017.
[31] Kenji Kawaguchi, “deep learning without poor local minima,” NIPS 2016.
[32] Yoshua Bengio and Yann LeCun, “scaling learning algorithms towards AI,” in “large-scale kernel machines”, edited by Bottou, Chapelle, DeCoste, and Weston, MIT Press 2007.
[33] Hao Wang & Dit-Yan Yeung, “towards Bayesian deep learning: a survey,” April 2016.
[34] Yann LeCun, et al., “singularity of the Hessian in deep learning,” 2017.
Oh’s Formula for Monte Carlo errors
assumptions:
(1) Gaussian density (zero mean & unit covariance matrix)
(2) d-dimensional random variable
(3) proposal density is also Gaussian with mean ε and covariance matrix kI, but it is not exact for k ≠ 1 or ε ≠ 0
(4) N = number of Monte Carlo trials
$$\sigma^{2} \;\approx\; \frac{1}{N}\left[\left(\frac{k^{2}}{2k-1}\right)^{d/2}\exp\!\left(\frac{\varepsilon^{T}\varepsilon}{2k-1}\right) - 1\right] \qquad (k > 1/2)$$
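A quick numerical check of the formula under exactly the assumptions listed above (the sample size and parameter values below are arbitrary choices): the empirical second moment of the importance weight matches the predicted factor [k²/(2k−1)]^(d/2) · exp(εᵀε/(2k−1)), which drives σ² and grows exponentially with d.

```python
# Monte Carlo check of Oh's formula: target N(0, I_d), proposal N(eps*1, k*I_d).
import numpy as np

rng = np.random.default_rng(3)

def empirical_Ew2(d, k, eps, n=200_000):
    """Mean squared importance weight, estimated from proposal samples."""
    mu = np.full(d, eps)
    x = mu + np.sqrt(k) * rng.standard_normal((n, d))
    log_w = (-0.5 * np.sum(x ** 2, axis=1)              # log target (const dropped)
             + 0.5 * np.sum((x - mu) ** 2, axis=1) / k  # minus log proposal
             + 0.5 * d * np.log(k))                     # (same const dropped)
    return np.mean(np.exp(2.0 * log_w))

def oh_Ew2(d, k, eps):
    # predicted E[w^2]; here eps'eps = d * eps**2 since every coordinate is offset
    return (k ** 2 / (2.0 * k - 1.0)) ** (d / 2.0) * np.exp(d * eps ** 2 / (2.0 * k - 1.0))

for d in [1, 5, 10, 20]:
    print(d, empirical_Ew2(d, 1.2, 0.1), oh_Ew2(d, 1.2, 0.1))
```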
curse of dimensionality for classic particle filter*

[Figure: optimal accuracy: r = 1.0]

*Daum, IEEE AES Systems Magazine, August 2005.
Nima Moshtagh, Jonathan Chan & Moses Chan, “Homotopy Particle Filter for Ground-Based Tracking of Satellites at GEO,” AMOS Conference, Hawaii, September 2016.

[Figure: GEO satellite tracking results: boring old EKF vs. standard particle filter vs. particle flow filter]
application of Gromov’s theorem:
For the log-homotopy of densities and the corresponding stochastic flow:

$$\log p(x,\lambda) = \log g(x) + \lambda \log h(x)$$

$$dx = f\,d\lambda + Q^{1/2}\,dw$$

$$\log h = -\,\mathrm{div}(f) - \frac{\partial \log p}{\partial x}\,f + \frac{1}{2}\,\frac{\mathrm{div}\!\left(Q\,\partial p/\partial x\right)}{p}$$

$$f = -\left[\frac{\partial^{2}\log p}{\partial x\,\partial x^{T}}\right]^{-1}\!\left(\frac{\partial \log h}{\partial x}\right)^{T},\qquad Q = -\left[\frac{\partial^{2}\log g}{\partial x\,\partial x^{T}} + \lambda\,\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\right]^{-1}\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\left[\frac{\partial^{2}\log g}{\partial x\,\partial x^{T}} + \lambda\,\frac{\partial^{2}\log h}{\partial x\,\partial x^{T}}\right]^{-1}$$

which for linear-Gaussian measurements reduces to

$$Q = \left[P^{-1} + \lambda H^{T}R^{-1}H\right]^{-1}H^{T}R^{-1}H\left[P^{-1} + \lambda H^{T}R^{-1}H\right]^{-1}$$
last year at Baden-Baden

Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
application of Gromov’s theorem:
Daum, Huang & Noushin, “new theory & numerical experiments for Gromov’s method,” ResearchGate (free online), May 2017.
“Demonstration of quantum advantage in machine learning,” Diego Ristè, Marcus P. da Silva, Colm A. Ryan, Andrew W. Cross, Antonio D. Córcoles, John A. Smolin, Jay M. Gambetta, Jerry M. Chow & Blake R. Johnson, npj Quantum Information, April 2017.
comparison of multi-sensor data fusion algorithms
Algorithm | Bias estimation? | Performance
1. GNPL + adaptive gating | yes (jointly with association) |
2. Iterative bias estimation & JVC | yes |
3. Histogram of bias over association hypotheses | yes |
4. Covariance inflation & JVC | no |
5. Association of objects (JVC) | no |
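As a concrete illustration of row 4 of the table (covariance inflation & JVC), here is a minimal sketch: scipy’s linear_sum_assignment stands in for JVC (both solve the same linear assignment problem), and the bias covariance B and all scenario numbers are illustrative assumptions.

```python
# Covariance inflation for track-to-track association: inflate the innovation
# covariance by an assumed residual-bias covariance B before computing
# Mahalanobis assignment costs, then solve the 2-D assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse(local, remote, P_local, P_remote, B):
    """Assign remote tracks to local tracks with bias-inflated Mahalanobis costs."""
    S = P_local + P_remote + B            # innovation covariance, inflated by B
    Sinv = np.linalg.inv(S)
    diff = local[:, None, :] - remote[None, :, :]
    cost = np.einsum('ijk,kl,ijl->ij', diff, Sinv, diff)  # squared Mahalanobis
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

rng = np.random.default_rng(4)
truth = rng.uniform(0.0, 5000.0, size=(10, 2))
local = truth + 100.0 * rng.standard_normal((10, 2))
remote = truth + np.array([300.0, -200.0]) + 100.0 * rng.standard_normal((10, 2))
P = np.diag([100.0 ** 2, 100.0 ** 2])    # track position covariances
B = np.diag([500.0 ** 2, 500.0 ** 2])    # assumed residual bias covariance
print(fuse(local, remote, P, P, B))      # pairs (local index, remote index)
```

Inflation only widens the gates; unlike rows 1 to 3 it makes no attempt to estimate or remove the bias itself.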