TRANSCRIPT
Harmonic Analysis &
Deep Learning
Sungbin Lim
In this talk…
Mathematical theory of filters, activations, and pooling through multiple layers, based on DCNNs
Encompasses the general ingredients
Lipschitz continuity & deformation sensitivity
WARNING: Very tough mathematics… though without non-Euclidean geometry (e.g. Geometric DL)
What is Harmonic Analysis?
f(x) = \sum_{n \in \mathbb{N}} a_n \phi_n(x), \quad a_n := \langle f, \phi_n \rangle_{\mathcal{H}}
How to represent a function efficiently, in the sense of a Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory, Stochastic Analysis
Hilbert space & Inner product
© Kyung-Min Rho
Banach space: Normed space + Completeness
Hilbert space: Banach space + Inner product
Hilbert spaces: \mathbb{R}^d, L^2, W^n_2, \cdots (among Banach spaces: C^n, L^p, W^n_p, \cdots)
\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k
\langle f, g \rangle_{L^2} = \int f(x)\, g(x)\, dx
\langle f, g \rangle_{W^n_2} = \langle f, g \rangle_{L^2} + \sum_{k=1}^{n} \langle \partial_x^k f, \partial_x^k g \rangle_{L^2}
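As a concrete check, the three inner products above can be discretized on a grid. This is a minimal numerical sketch: the grid, the sample functions sin and cos, and the finite-difference derivative are illustrative choices, not from the talk.

```python
import numpy as np

# Discretize the inner products from the slide on a uniform grid over [0, 1].
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)
g = np.cos(2 * np.pi * x)

def inner_l2(f, g, dx):
    # <f, g>_{L^2} = ∫ f(x) g(x) dx, approximated by a Riemann sum
    return np.sum(f * g) * dx

def inner_sobolev(f, g, dx, n=1):
    # <f, g>_{W^n_2} = <f, g>_{L^2} + Σ_k <∂^k f, ∂^k g>_{L^2},
    # with derivatives taken by finite differences
    total = inner_l2(f, g, dx)
    df, dg = f, g
    for _ in range(n):
        df = np.gradient(df, dx)
        dg = np.gradient(dg, dx)
        total += inner_l2(df, dg, dx)
    return total

print(inner_l2(f, f, dx))       # ≈ 0.5
print(inner_l2(f, g, dx))       # ≈ 0: sin and cos are L^2-orthogonal on [0,1]
print(inner_sobolev(f, f, dx))  # ≈ 0.5 + (2π)^2 · 0.5, the extra gradient term
```

Note how the Sobolev inner product penalizes oscillation: the gradient term dominates for high-frequency functions.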
Why Harmonic Analysis?
P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0
Encoding: P_n \mapsto (a_n, a_{n-1}, \ldots, a_1, a_0)
Decoding: (a_n, a_{n-1}, \ldots, a_1, a_0) \mapsto P_n
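The encoding/decoding round-trip above is lossless: a degree-n polynomial is fully determined by its n+1 coefficients. A minimal sketch (the helper names `encode`/`decode` are hypothetical, not from the talk):

```python
import numpy as np

# A polynomial is fully determined by its coefficient vector (a_n, ..., a_0).
coeffs = np.array([2.0, -1.0, 0.0, 3.0])   # P(x) = 2x^3 - x^2 + 3

def decode(coeffs):
    # (a_n, ..., a_0) -> the function P_n, evaluated via Horner's rule
    return lambda x: np.polyval(coeffs, x)

def encode(P, n):
    # P_n -> (a_n, ..., a_0): a degree-n polynomial is pinned down
    # by its values at any n+1 distinct points
    xs = np.arange(n + 1, dtype=float)
    return np.polyfit(xs, P(xs), n)

P = decode(coeffs)
recovered = encode(P, 3)
print(np.allclose(recovered, coeffs))  # True: the round-trip is exact
```

This is exactly the harmonic-analysis ideal: represent a function by a short coefficient sequence in a fixed basis.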
Why do we prefer polynomials?
Stone-Weierstrass theorem
Polynomials are universal approximators!
\forall f \in C(\mathcal{X}),\ \forall \varepsilon > 0,\ \exists P_n \text{ s.t. } \max_{x \in \mathcal{X}} |f(x) - P_n(x)| < \varepsilon
Equivalently, \forall f \in C(\mathcal{X}),\ \exists P_n \text{ s.t. } \lim_{n \to \infty} \|f - P_n\|_{\infty} = 0
© Wikipedia
We can even approximate derivatives!
\forall f \in C^k(\mathcal{X}),\ \exists P_n \text{ s.t. } \lim_{n \to \infty} \|f - P_n\|_{C^k} = 0
Universal approximators = {DL, polynomials, trees, …}
But then why do we not use polynomials?
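The classical constructive proof of the Weierstrass theorem uses Bernstein polynomials, which can be demonstrated numerically. A sketch (the choice of Bernstein polynomials and the test function |x - 1/2| are illustrative, not from the talk):

```python
import numpy as np
from math import comb

# Bernstein polynomials B_n(f)(x) = Σ_k f(k/n) C(n,k) x^k (1-x)^(n-k)
# converge uniformly to any continuous f on [0, 1].
def bernstein(f, n, x):
    k = np.arange(n + 1)
    weights = np.array([comb(n, int(i)) for i in k], dtype=float)
    basis = weights[None, :] * x[:, None] ** k[None, :] * (1 - x[:, None]) ** (n - k)[None, :]
    return basis @ f(k / n)

f = lambda x: np.abs(x - 0.5)   # continuous but not differentiable at 1/2
x = np.linspace(0, 1, 501)
for n in (5, 50, 500):
    err = np.max(np.abs(bernstein(f, n, x) - f(x)))
    print(n, err)               # the uniform error shrinks as n grows
```

The convergence is slow (order n^{-1/2} at the kink), which already hints at the efficiency question the next slide raises.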
Local interpolation works well in low dimension © S. Mallat
Need \varepsilon^{-d} points to cover [0,1]^d at a distance \varepsilon
High dimension ⇒ Curse of dimensionality!
© H. Bölcskei
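To make the ε^{-d} count concrete, a minimal sketch (the resolution ε = 0.1 is an arbitrary illustrative choice):

```python
# Covering [0,1]^d at resolution ε requires (1/ε)^d grid points.
eps = 0.1
for d in (1, 2, 10, 100):
    print(d, (1 / eps) ** d)   # 10, 100, 1e10, 1e100: exponential blow-up in d
```

Already at d = 100 (a tiny image), the required sample count exceeds the number of atoms in the observable universe, so local interpolation is hopeless in high dimension.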
Universal approximator = Good feature extractor?
…in HIGH dimension!
Nonlinear Feature Extraction © S. Mallat, © H. Bölcskei
Dimension Reduction ⇒ Invariants
How? © S. Mallat
Main Topic in Harmonic Analysis
Linear operator ⇒ Convolution + Multiplier
L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\, \widehat{f}(\omega)
Discriminability vs Invariance: Littlewood-Paley condition ⇒ Semi-discrete frame
A\|f\|_{\mathcal{H}} \le \|L[f]\|_{\mathcal{H}} \le B\|f\|_{\mathcal{H}}
The lower bound gives discriminability:
\|L[f_1] - L[f_2]\|_{\mathcal{H}} = \|L[f_1 - f_2]\|_{\mathcal{H}} \ge A\|f_1 - f_2\|_{\mathcal{H}}, \text{ i.e. } f_1 \ne f_2 \Rightarrow L[f_1] \ne L[f_2]
The upper bound controls compositions across layers:
\|\underbrace{L \circ \cdots \circ L}_{n\text{-fold}}[f]\|_{\mathcal{H}} \le B\,\|\underbrace{L \circ \cdots \circ L}_{(n-1)\text{-fold}}[f]\|_{\mathcal{H}} \le \cdots \le B^n \|f\|_{\mathcal{H}}
Banach fixed-point theorem
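The convolution-multiplier identity on this slide can be verified directly with the DFT, where convolution is circular. A minimal sketch (the random signal and filter are illustrative):

```python
import numpy as np

# Check L[f](x) = (K * f)(x)  ⟺  \hat{L[f]}(ω) = \hat{K}(ω) \hat{f}(ω)
# on a discrete grid, where the DFT makes the convolution circular.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)
K = rng.standard_normal(64)

# Direct circular convolution: L[f](x) = Σ_y K(y) f((x - y) mod N)
direct = np.array([sum(K[y] * f[(x - y) % 64] for y in range(64)) for x in range(64)])

# Fourier side: pointwise multiplication by the multiplier \hat{K}
via_fft = np.fft.ifft(np.fft.fft(K) * np.fft.fft(f)).real

print(np.allclose(direct, via_fft))  # True
```

This is why filtering layers are diagonalized by the Fourier transform: a convolution is a pointwise multiplier in frequency.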
Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform ⇒ Lipschitz continuity
ex) ReLU, tanh, sigmoid, …
|f(x) - f(y)| \le C\|x - y\| \iff \|\nabla f(x)\| \le C
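The Lipschitz constants of the listed activations can be estimated empirically as the largest slope over random pairs. A minimal sketch (the sampling scheme is an illustrative choice):

```python
import numpy as np

# Estimate the Lipschitz constant C in |f(x) - f(y)| ≤ C|x - y|
# by the largest difference quotient over many random pairs.
rng = np.random.default_rng(0)
x = rng.standard_normal(10**5) * 5
y = rng.standard_normal(10**5) * 5

def lipschitz_estimate(f):
    return np.max(np.abs(f(x) - f(y)) / np.abs(x - y))

relu = lambda t: np.maximum(t, 0.0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

print(lipschitz_estimate(relu))      # sup|ρ'| = 1
print(lipschitz_estimate(np.tanh))   # sup|ρ'| = 1
print(lipschitz_estimate(sigmoid))   # sup|ρ'| = 1/4
```

All three activations are Lipschitz with small constants, which is precisely the property the next slides exploit.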
How to control Lipschitz?
Theorem: \|\rho(L[f])\|_{\mathcal{H}} \le N(B, C)\,\|f\|_{\mathcal{H}}
No change in Invariance!
Proof) Let \rho = \mathrm{ReLU},\ \mathcal{H} = W^1_2. Then
\|\rho(L[f])\|_{W^1_2} = \|\max\{L[f], 0\}\|_{L^2} + \|\nabla \rho(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\underbrace{\rho'(L[f])}_{=1 \text{ or } 0}\,\nabla(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\nabla(L[f])\|_{L^2} = \|L[f]\|_{W^1_2} \le B\,\|f\|_{W^1_2}
What about Discriminability?
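The inequality in the proof above can be checked numerically in a discrete W^1_2 norm. A minimal sketch (the grid, the stand-in for L[f], and the finite-difference gradient are illustrative choices):

```python
import numpy as np

# Check that ReLU does not increase the discrete Sobolev norm
# ‖u‖_{W^1_2} = ‖u‖_{L^2} + ‖∇u‖_{L^2}, mirroring the slide's proof.
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]
u = np.sin(6 * np.pi * x) * np.exp(-x)     # stand-in for L[f]

def sobolev_norm(u):
    l2 = np.sqrt(np.sum(u**2) * dx)
    grad = np.sqrt(np.sum(np.gradient(u, dx)**2) * dx)
    return l2 + grad

relu_u = np.maximum(u, 0.0)
print(sobolev_norm(relu_u) <= sobolev_norm(u) + 1e-9)  # True
```

The pointwise bounds |max(a,0)| ≤ |a| and |ReLU(a) - ReLU(b)| ≤ |a - b| make this hold for every sampled u, not just this example.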
Scale Invariant Feature
Translation Invariant
Stable at Deformation
© S. Mallat
Scattering Network (Mallat, 2012)
\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big||f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}
© H. Bölcskei
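The cascade in Φ(f) — convolve, take the modulus, convolve again, then low-pass — can be sketched in a few lines of 1-D code. The toy Gaussian-windowed filters below are illustrative stand-ins, not Mallat's actual wavelets:

```python
import numpy as np

# A tiny 1-D scattering cascade: bandpass convolution, modulus,
# and a final low-pass average χ at each order, as in Φ(f).
def conv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

N = 256
t = np.arange(N)
lowpass = np.exp(-0.5 * ((t - N // 2) / 16.0) ** 2)
lowpass /= lowpass.sum()
g = [np.cos(2 * np.pi * k * t / N) * np.exp(-0.5 * ((t - N // 2) / 8.0) ** 2)
     for k in (8, 16)]                     # two toy bandpass atoms g_λ

f = np.sin(2 * np.pi * 8 * t / N)

features = [conv(f, lowpass)]              # order 0: f * χ
for g1 in g:                               # order 1: |f * g_λ| * χ
    u1 = np.abs(conv(f, g1))
    features.append(conv(u1, lowpass))
    for g2 in g:                           # order 2: ||f * g_λ| * g_μ| * χ
        features.append(conv(np.abs(conv(u1, g2)), lowpass))

print(len(features))   # 1 + 2 + 4 = 7 feature maps
```

The tree of filter paths λ^{(j)}, …, λ^{(p)} grows with depth, and each path ends in the same low-pass averaging that creates invariance.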
Generalized Scattering Network (Wiatowski, 2015)
\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big||f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}
Gabor frame, Tensor wavelet, Directional wavelet, Ridgelet frame, Curvelet frame
© H. Bölcskei
Linearize symmetries
“Space folding”, Cho (2014)
© S. Mallat
Generalized Scattering Network (Wiatowski, 2015)
Theorem: With pooling f \mapsto S_n^{d/2}\, P_n(f)(S_n\,\cdot),
\left\| \Phi_n(T_t f) - \Phi_n(f) \right\| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)
Features become more translation invariant with increasing network depth
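The theorem's message — the larger the cumulative pooling factor, the less a translation T_t moves the features — can be illustrated with a single low-pass layer. A minimal sketch (the Gaussian low-pass of width `width` stands in for the cumulative pooling ∏ S_j; the signal is an arbitrary choice):

```python
import numpy as np

# Wider low-pass averaging (≈ larger cumulative pooling factor) shrinks
# the feature-space distance between f and its translate T_t f.
def conv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

N, shift = 512, 3
t = np.arange(N)
f = np.sin(2 * np.pi * 5 * t / N) ** 3
f_shifted = np.roll(f, shift)              # T_t f on the grid

for width in (1.0, 8.0, 64.0):             # stand-in for growing ∏ S_j
    chi = np.exp(-0.5 * ((t - N // 2) / width) ** 2)
    chi /= chi.sum()
    gap = np.linalg.norm(conv(f_shifted, chi) - conv(f, chi))
    print(width, gap)   # the gap decreases as the pooling scale grows
```

This is the vertical analogue in one layer of what the theorem states across depth: invariance is bought by accumulated averaging.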
Generalized Scattering Network (Wiatowski, 2015)
© Philip Scott Johnson
Theorem: For the deformation F_{\tau,\omega}[f](x) = e^{2\pi i \omega(x)} f(x - \tau(x)),
\left\| \Phi(F_{\tau,\omega}[f]) - \Phi(f) \right\| \le C \left( \|\tau\|_{\infty} + \|\omega\|_{\infty} \right) \|f\|_{L^2}
Multi-layer convolutions linearize features, i.e. they are stable to deformations
Ergodic Reconstructions
© Philip Scott Johnson
© S. Mallat
David Hilbert
Wir müssen wissen. Wir werden wissen.
(“We must know. We will know.”)
Q&A