TRANSCRIPT
Harmonic Analysis &
Deep Learning
Sungbin Lim
In this talk…
Mathematical theory of filters, activations, and pooling through multiple layers, based on DCNNs
Encompasses the general ingredients
Lipschitz continuity & deformation sensitivity
WARNING: Very tough mathematics… though without non-Euclidean geometry (e.g. Geometric DL)
What is Harmonic Analysis?
f(x) = \sum_{n \in \mathbb{N}} a_n \phi_n(x), \quad a_n := \langle f, \phi_n \rangle_{\mathcal{H}}
How to represent a function efficiently, in the sense of a Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory, Stochastic Analysis
Hilbert space & Inner product
© Kyung-Min Rho
Banach space: Normed space + Completeness
Hilbert space: Banach space + Inner product
Hilbert spaces: \mathbb{R}^d, L^2, W^n_2, \cdots (among Banach spaces: C^n, L^p, W^n_p, \cdots)
\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k
\langle f, g \rangle_{L^2} = \int f(x)\, g(x)\, dx
\langle f, g \rangle_{W^n_2} = \langle f, g \rangle_{L^2} + \sum_{k=1}^{n} \langle \partial_x^k f, \partial_x^k g \rangle_{L^2}
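As a concrete check, the three inner products above can be discretized on a grid. This is a minimal numerical sketch: the grid, the sample functions sin and cos, and the finite-difference derivative are illustrative choices, not from the talk.

```python
import numpy as np

# Discretize the inner products from the slide on a uniform grid over [0, 1].
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)
g = np.cos(2 * np.pi * x)

def inner_l2(f, g, dx):
    # <f, g>_{L^2} = ∫ f(x) g(x) dx, approximated by a Riemann sum
    return np.sum(f * g) * dx

def inner_sobolev(f, g, dx, n=1):
    # <f, g>_{W^n_2} = <f, g>_{L^2} + Σ_k <∂^k f, ∂^k g>_{L^2},
    # with derivatives taken by finite differences
    total = inner_l2(f, g, dx)
    df, dg = f, g
    for _ in range(n):
        df = np.gradient(df, dx)
        dg = np.gradient(dg, dx)
        total += inner_l2(df, dg, dx)
    return total

print(inner_l2(f, f, dx))       # ≈ 0.5
print(inner_l2(f, g, dx))       # ≈ 0: sin and cos are L^2-orthogonal on [0,1]
print(inner_sobolev(f, f, dx))  # ≈ 0.5 + (2π)^2 · 0.5, the extra gradient term
```

Note how the Sobolev inner product penalizes oscillation: the gradient term dominates for high-frequency functions.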
Why Harmonic Analysis?
P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0
Encoding: P_n \mapsto (a_n, a_{n-1}, \ldots, a_1, a_0)
Decoding: (a_n, a_{n-1}, \ldots, a_1, a_0) \mapsto P_n
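The encoding/decoding round-trip above is lossless: a degree-n polynomial is fully determined by its n+1 coefficients. A minimal sketch (the helper names `encode`/`decode` are hypothetical, not from the talk):

```python
import numpy as np

# A polynomial is fully determined by its coefficient vector (a_n, ..., a_0).
coeffs = np.array([2.0, -1.0, 0.0, 3.0])   # P(x) = 2x^3 - x^2 + 3

def decode(coeffs):
    # (a_n, ..., a_0) -> the function P_n, evaluated via Horner's rule
    return lambda x: np.polyval(coeffs, x)

def encode(P, n):
    # P_n -> (a_n, ..., a_0): a degree-n polynomial is pinned down
    # by its values at any n+1 distinct points
    xs = np.arange(n + 1, dtype=float)
    return np.polyfit(xs, P(xs), n)

P = decode(coeffs)
recovered = encode(P, 3)
print(np.allclose(recovered, coeffs))  # True: the round-trip is exact
```

This is exactly the harmonic-analysis ideal: represent a function by a short coefficient sequence in a fixed basis.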
Why do we prefer polynomials?
Stone-Weierstrass theorem
Polynomials are universal approximators!
\forall f \in C(\mathcal{X}),\ \forall \varepsilon > 0,\ \exists P_n \text{ s.t. } \max_{x \in \mathcal{X}} |f(x) - P_n(x)| < \varepsilon
Equivalently, \forall f \in C(\mathcal{X}),\ \exists P_n \text{ s.t. } \lim_{n \to \infty} \|f - P_n\|_{\infty} = 0
© Wikipedia
We can even approximate derivatives!
\forall f \in C^k(\mathcal{X}),\ \exists P_n \text{ s.t. } \lim_{n \to \infty} \|f - P_n\|_{C^k} = 0
Universal approximators = {DL, polynomials, trees, …}
But then why do we not use polynomials?
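The classical constructive proof of the Weierstrass theorem uses Bernstein polynomials, which can be demonstrated numerically. A sketch (the choice of Bernstein polynomials and the test function |x - 1/2| are illustrative, not from the talk):

```python
import numpy as np
from math import comb

# Bernstein polynomials B_n(f)(x) = Σ_k f(k/n) C(n,k) x^k (1-x)^(n-k)
# converge uniformly to any continuous f on [0, 1].
def bernstein(f, n, x):
    k = np.arange(n + 1)
    weights = np.array([comb(n, int(i)) for i in k], dtype=float)
    basis = weights[None, :] * x[:, None] ** k[None, :] * (1 - x[:, None]) ** (n - k)[None, :]
    return basis @ f(k / n)

f = lambda x: np.abs(x - 0.5)   # continuous but not differentiable at 1/2
x = np.linspace(0, 1, 501)
for n in (5, 50, 500):
    err = np.max(np.abs(bernstein(f, n, x) - f(x)))
    print(n, err)               # the uniform error shrinks as n grows
```

The convergence is slow (order n^{-1/2} at the kink), which already hints at the efficiency question the next slide raises.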
Local interpolation works well in low dimension © S. Mallat
Need \varepsilon^{-d} points to cover [0,1]^d at a distance \varepsilon
High dimension ⇒ Curse of dimensionality!
© H. Bölcskei
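To make the ε^{-d} count concrete, a minimal sketch (the resolution ε = 0.1 is an arbitrary illustrative choice):

```python
# Covering [0,1]^d at resolution ε requires (1/ε)^d grid points.
eps = 0.1
for d in (1, 2, 10, 100):
    print(d, (1 / eps) ** d)   # 10, 100, 1e10, 1e100: exponential blow-up in d
```

Already at d = 100 (a tiny image), the required sample count exceeds the number of atoms in the observable universe, so local interpolation is hopeless in high dimension.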
Universal approximator = Good feature extractor?
…in HIGH dimension!
Nonlinear Feature Extraction © S. Mallat, © H. Bölcskei
Dimension Reduction ⇒ Invariants
How? © S. Mallat
Main Topic in Harmonic Analysis
Linear operator ⇒ Convolution + Multiplier
L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\, \widehat{f}(\omega)
Discriminability vs Invariance: Littlewood-Paley condition ⇒ Semi-discrete frame
A\|f\|_{\mathcal{H}} \le \|L[f]\|_{\mathcal{H}} \le B\|f\|_{\mathcal{H}}
The lower bound gives discriminability:
\|L[f_1] - L[f_2]\|_{\mathcal{H}} = \|L[f_1 - f_2]\|_{\mathcal{H}} \ge A\|f_1 - f_2\|_{\mathcal{H}}, \text{ i.e. } f_1 \ne f_2 \Rightarrow L[f_1] \ne L[f_2]
The upper bound controls compositions across layers:
\|\underbrace{L \circ \cdots \circ L}_{n\text{-fold}}[f]\|_{\mathcal{H}} \le B\,\|\underbrace{L \circ \cdots \circ L}_{(n-1)\text{-fold}}[f]\|_{\mathcal{H}} \le \cdots \le B^n \|f\|_{\mathcal{H}}
Banach fixed-point theorem
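The convolution-multiplier identity on this slide can be verified directly with the DFT, where convolution is circular. A minimal sketch (the random signal and filter are illustrative):

```python
import numpy as np

# Check L[f](x) = (K * f)(x)  ⟺  \hat{L[f]}(ω) = \hat{K}(ω) \hat{f}(ω)
# on a discrete grid, where the DFT makes the convolution circular.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)
K = rng.standard_normal(64)

# Direct circular convolution: L[f](x) = Σ_y K(y) f((x - y) mod N)
direct = np.array([sum(K[y] * f[(x - y) % 64] for y in range(64)) for x in range(64)])

# Fourier side: pointwise multiplication by the multiplier \hat{K}
via_fft = np.fft.ifft(np.fft.fft(K) * np.fft.fft(f)).real

print(np.allclose(direct, via_fft))  # True
```

This is why filtering layers are diagonalized by the Fourier transform: a convolution is a pointwise multiplier in frequency.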
Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform ⇒ Lipschitz continuity
ex) ReLU, tanh, sigmoid, …
|f(x) - f(y)| \le C\|x - y\| \iff \|\nabla f(x)\| \le C
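The Lipschitz constants of the listed activations can be estimated empirically as the largest slope over random pairs. A minimal sketch (the sampling scheme is an illustrative choice):

```python
import numpy as np

# Estimate the Lipschitz constant C in |f(x) - f(y)| ≤ C|x - y|
# by the largest difference quotient over many random pairs.
rng = np.random.default_rng(0)
x = rng.standard_normal(10**5) * 5
y = rng.standard_normal(10**5) * 5

def lipschitz_estimate(f):
    return np.max(np.abs(f(x) - f(y)) / np.abs(x - y))

relu = lambda t: np.maximum(t, 0.0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

print(lipschitz_estimate(relu))      # sup|ρ'| = 1
print(lipschitz_estimate(np.tanh))   # sup|ρ'| = 1
print(lipschitz_estimate(sigmoid))   # sup|ρ'| = 1/4
```

All three activations are Lipschitz with small constants, which is precisely the property the next slides exploit.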
How to control Lipschitz?
Theorem: \|\rho(L[f])\|_{\mathcal{H}} \le N(B, C)\,\|f\|_{\mathcal{H}}
No change in Invariance!
Proof) Let \rho = \mathrm{ReLU},\ \mathcal{H} = W^1_2. Then
\|\rho(L[f])\|_{W^1_2} = \|\max\{L[f], 0\}\|_{L^2} + \|\nabla \rho(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\underbrace{\rho'(L[f])}_{=1 \text{ or } 0}\,\nabla(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\nabla(L[f])\|_{L^2} = \|L[f]\|_{W^1_2} \le B\,\|f\|_{W^1_2}
What about Discriminability?
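The inequality in the proof above can be checked numerically in a discrete W^1_2 norm. A minimal sketch (the grid, the stand-in for L[f], and the finite-difference gradient are illustrative choices):

```python
import numpy as np

# Check that ReLU does not increase the discrete Sobolev norm
# ‖u‖_{W^1_2} = ‖u‖_{L^2} + ‖∇u‖_{L^2}, mirroring the slide's proof.
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]
u = np.sin(6 * np.pi * x) * np.exp(-x)     # stand-in for L[f]

def sobolev_norm(u):
    l2 = np.sqrt(np.sum(u**2) * dx)
    grad = np.sqrt(np.sum(np.gradient(u, dx)**2) * dx)
    return l2 + grad

relu_u = np.maximum(u, 0.0)
print(sobolev_norm(relu_u) <= sobolev_norm(u) + 1e-9)  # True
```

The pointwise bounds |max(a,0)| ≤ |a| and |ReLU(a) - ReLU(b)| ≤ |a - b| make this hold for every sampled u, not just this example.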
Scale Invariant Feature
Translation Invariant
Stable at Deformation
© S. Mallat
Scattering Network (Mallat, 2012)
\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big||f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}
© H. Bölcskei
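The cascade in Φ(f) — convolve, take the modulus, convolve again, then low-pass — can be sketched in a few lines of 1-D code. The toy Gaussian-windowed filters below are illustrative stand-ins, not Mallat's actual wavelets:

```python
import numpy as np

# A tiny 1-D scattering cascade: bandpass convolution, modulus,
# and a final low-pass average χ at each order, as in Φ(f).
def conv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

N = 256
t = np.arange(N)
lowpass = np.exp(-0.5 * ((t - N // 2) / 16.0) ** 2)
lowpass /= lowpass.sum()
g = [np.cos(2 * np.pi * k * t / N) * np.exp(-0.5 * ((t - N // 2) / 8.0) ** 2)
     for k in (8, 16)]                     # two toy bandpass atoms g_λ

f = np.sin(2 * np.pi * 8 * t / N)

features = [conv(f, lowpass)]              # order 0: f * χ
for g1 in g:                               # order 1: |f * g_λ| * χ
    u1 = np.abs(conv(f, g1))
    features.append(conv(u1, lowpass))
    for g2 in g:                           # order 2: ||f * g_λ| * g_μ| * χ
        features.append(conv(np.abs(conv(u1, g2)), lowpass))

print(len(features))   # 1 + 2 + 4 = 7 feature maps
```

The tree of filter paths λ^{(j)}, …, λ^{(p)} grows with depth, and each path ends in the same low-pass averaging that creates invariance.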
Generalized Scattering Network (Wiatowski, 2015)
\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big||f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}
Gabor frame, Tensor wavelet, Directional wavelet, Ridgelet frame, Curvelet frame
© H. Bölcskei
Linearize symmetries
“Space folding”, Cho (2014)
© S. Mallat
Generalized Scattering Network (Wiatowski, 2015)
Theorem: With pooling f \mapsto S_n^{d/2}\, P_n(f)(S_n\,\cdot),
\left\| \Phi_n(T_t f) - \Phi_n(f) \right\| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)
Features become more translation invariant with increasing network depth
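The theorem's message — the larger the cumulative pooling factor, the less a translation T_t moves the features — can be illustrated with a single low-pass layer. A minimal sketch (the Gaussian low-pass of width `width` stands in for the cumulative pooling ∏ S_j; the signal is an arbitrary choice):

```python
import numpy as np

# Wider low-pass averaging (≈ larger cumulative pooling factor) shrinks
# the feature-space distance between f and its translate T_t f.
def conv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

N, shift = 512, 3
t = np.arange(N)
f = np.sin(2 * np.pi * 5 * t / N) ** 3
f_shifted = np.roll(f, shift)              # T_t f on the grid

for width in (1.0, 8.0, 64.0):             # stand-in for growing ∏ S_j
    chi = np.exp(-0.5 * ((t - N // 2) / width) ** 2)
    chi /= chi.sum()
    gap = np.linalg.norm(conv(f_shifted, chi) - conv(f, chi))
    print(width, gap)   # the gap decreases as the pooling scale grows
```

This is the vertical analogue in one layer of what the theorem states across depth: invariance is bought by accumulated averaging.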
Generalized Scattering Network (Wiatowski, 2015)
© Philip Scott Johnson
Theorem: For the deformation F_{\tau,\omega}[f](x) = e^{2\pi i \omega(x)} f(x - \tau(x)),
\left\| \Phi(F_{\tau,\omega}[f]) - \Phi(f) \right\| \le C \left( \|\tau\|_{\infty} + \|\omega\|_{\infty} \right) \|f\|_{L^2}
Multi-layer convolutions linearize features, i.e. they are stable to deformations
Ergodic Reconstructions
© Philip Scott Johnson
© S. Mallat
David Hilbert
Wir müssen wissen. Wir werden wissen.
(“We must know. We will know.”)
Q&A