Harmonic Analysis of Deep Convolutional Neural Networks
Helmut Bölcskei
Department of Information Technology and Electrical Engineering
October 2017
joint work with Thomas Wiatowski and Philipp Grohs
ImageNet
ski · rock · coffee · plant

CNNs win the ImageNet 2015 challenge [He et al., 2015]
Describing the content of an image
CNNs generate sentences describing the content of an image [Vinyals et al., 2015]
“Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989.”
Feature extraction and classification
input: f (an image)

non-linear feature extraction ⇒ feature vector Φ(f)

linear classifier output:
〈w,Φ(f)〉 > 0 ⇒ Shannon
〈w,Φ(f)〉 < 0 ⇒ von Neumann
Why non-linear feature extractors?

Task: Separate two categories of data through a linear classifier

directly on the data: 〈w, f〉 > 0 vs. 〈w, f〉 < 0 (not possible!)

on features Φ(f) = [ ‖f‖ , 1 ]ᵀ: 〈w,Φ(f)〉 > 0 vs. 〈w,Φ(f)〉 < 0, possible with w = [ 1 , −1 ]ᵀ

⇒ Φ is invariant to the angular component of the data
⇒ Linear separability in feature space!
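A minimal numerical version of this toy problem (the radii 0.5 and 2 are my own choices): two classes on concentric circles cannot be split by 〈w, f〉 in the plane, but with Φ(f) = [‖f‖, 1]ᵀ and w = [1, −1]ᵀ the score 〈w, Φ(f)〉 = ‖f‖ − 1 separates them.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)

# Two classes living on concentric circles: no single hyperplane
# separates them in R^2.
class_a = 0.5 * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # radius 0.5
class_b = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # radius 2.0

def phi(f):
    """Non-linear feature map Phi(f) = [||f||, 1]^T."""
    return np.array([np.linalg.norm(f), 1.0])

w = np.array([1.0, -1.0])  # <w, Phi(f)> = ||f|| - 1

scores_a = np.array([w @ phi(f) for f in class_a])
scores_b = np.array([w @ phi(f) for f in class_b])

# Phi discards the angular component, so a linear rule now works.
separable = (scores_a < 0).all() and (scores_b > 0).all()
print(separable)  # → True
```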
Translation invariance
Handwritten digits from the MNIST database [LeCun & Cortes, 1998]

Feature vector should be invariant to spatial location ⇒ translation invariance
Deformation insensitivity
Feature vector should be independent of cameras (of different resolutions), and insensitive to small acquisition jitters
Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015])

feature map: iterated convolutions and moduli organized in a tree, e.g. along one path

f ↦ |f ∗ g_{λ_1^{(k)}}| ↦ ||f ∗ g_{λ_1^{(k)}}| ∗ g_{λ_2^{(l)}}| ↦ ···

and along another

f ↦ |f ∗ g_{λ_1^{(p)}}| ↦ ||f ∗ g_{λ_1^{(p)}}| ∗ g_{λ_2^{(r)}}| ↦ ···

feature vector Φ(f): the outputs · ∗ χ_1, · ∗ χ_2, · ∗ χ_3, ... collected from every node of the tree

General scattering networks guarantee [Wiatowski & HB, 2015]
- (vertical) translation invariance
- small deformation sensitivity
essentially irrespective of filters, non-linearities, and poolings!
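To make the tree concrete, here is a toy 1-D, two-layer scattering computation with FFT-domain Gaussian band-pass filters and a modulus non-linearity. The filters, their centers and widths, and the test signal are my own choices for illustration, not those of the cited papers.

```python
import numpy as np

def bandpass(center, width, n):
    """Gaussian band-pass transfer function (my own toy construction)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-((np.abs(freqs) - center) ** 2) / (2 * width ** 2))

def conv(f, h_hat):
    """Circular convolution with a filter given in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(f) * h_hat))

n = 256
f = np.sin(2 * np.pi * 40 * np.arange(n) / n)  # toy input signal

centers = [0.1, 0.2, 0.3]          # band-pass filters g_lambda per layer
lowpass = bandpass(0.0, 0.02, n)   # output filter chi

# Root output (f * chi_1), then layer 1: |f * g_lambda1| * chi_2,
# then layer 2: ||f * g_lambda1| * g_lambda2| * chi_3.
features = [conv(f, lowpass)]
for c1 in centers:
    u1 = np.abs(conv(f, bandpass(c1, 0.02, n)))
    features.append(conv(u1, lowpass))
    for c2 in centers:
        u2 = np.abs(conv(u1, bandpass(c2, 0.02, n)))
        features.append(conv(u2, lowpass))

Phi = np.concatenate(features)  # 1 + 3 + 9 = 13 paths through the tree
print(Phi.shape)  # → (3328,)
```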
Building blocks

Basic operations in the n-th network layer: the input f is convolved with each filter g_{λ_n^{(k)}}, ..., g_{λ_n^{(r)}}, followed by a non-linearity and a pooling operation

Filters: Semi-discrete frame Ψ_n := {χ_n} ∪ {g_{λ_n}}_{λ_n ∈ Λ_n}

A_n ‖f‖_2^2 ≤ ‖f ∗ χ_n‖_2^2 + ∑_{λ_n ∈ Λ_n} ‖f ∗ g_{λ_n}‖_2^2 ≤ B_n ‖f‖_2^2, ∀f ∈ L²(ℝᵈ)

e.g.: structured filters, unstructured filters, or learned filters
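For discrete filters and circular convolution, the frame condition is equivalent to pointwise bounds A ≤ |χ̂(ω)|² + ∑_λ |ĝ_λ(ω)|² ≤ B on the transfer functions, which is easy to verify numerically. A sketch with toy filters of my own construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128

# Toy filter bank: one low-pass chi plus a few Gaussian band-passes
# g_lambda (hand-built here; any discrete filters could be plugged in).
freqs = np.fft.fftfreq(n)
chi_hat = np.exp(-np.abs(freqs) * 40)
g_hats = [np.exp(-((np.abs(freqs) - c) ** 2) * 800) for c in (0.1, 0.25, 0.4)]

# Littlewood-Paley sum: A = min over omega, B = max over omega.
lp = np.abs(chi_hat) ** 2 + sum(np.abs(g) ** 2 for g in g_hats)
A, B = lp.min(), lp.max()

# Sanity check against the definition on a random signal:
# A||f||^2 <= ||f*chi||^2 + sum ||f*g||^2 <= B||f||^2 (via Parseval).
f = rng.standard_normal(n)
f_hat = np.fft.fft(f)
energy = (np.abs(f_hat) ** 2 * lp).sum() / n
norm2 = (np.abs(f_hat) ** 2).sum() / n
assert A * norm2 - 1e-9 <= energy <= B * norm2 + 1e-9
print(A > 0, A <= B)  # → True True
```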
Non-linearities: Point-wise and Lipschitz-continuous

‖M_n(f) − M_n(h)‖_2 ≤ L_n ‖f − h‖_2, ∀ f, h ∈ L²(ℝᵈ)

⇒ Satisfied by virtually all non-linearities used in the deep learning literature!

e.g.: ReLU: L_n = 1; modulus: L_n = 1; logistic sigmoid: L_n = 1/4; ...
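These Lipschitz constants can be sanity-checked empirically by sampling the ratio ‖M(f) − M(h)‖₂ / ‖f − h‖₂ over random pairs; `worst_ratio` is my own helper, and the sampled maximum is only a lower estimate of the true constant.

```python
import numpy as np

rng = np.random.default_rng(2)

# The three non-linearities listed above, applied point-wise.
relu = lambda x: np.maximum(x, 0.0)        # L = 1
modulus = lambda x: np.abs(x)              # L = 1
sigmoid = lambda x: 1 / (1 + np.exp(-x))   # L = 1/4 (max slope at x = 0)

def worst_ratio(M, trials=1000, dim=32):
    """Largest observed ||M(f) - M(h)|| / ||f - h|| over random pairs."""
    r = 0.0
    for _ in range(trials):
        f, h = rng.standard_normal(dim), rng.standard_normal(dim)
        r = max(r, np.linalg.norm(M(f) - M(h)) / np.linalg.norm(f - h))
    return r

# Each sampled ratio must stay below the stated Lipschitz constant.
assert worst_ratio(relu) <= 1.0 + 1e-12
assert worst_ratio(modulus) <= 1.0 + 1e-12
assert worst_ratio(sigmoid) <= 0.25 + 1e-12
print("all Lipschitz bounds hold empirically")
```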
Pooling: In continuous time according to

f ↦ S_n^{d/2} P_n(f)(S_n ·),

where S_n ≥ 1 is the pooling factor and P_n : L²(ℝᵈ) → L²(ℝᵈ) is R_n-Lipschitz-continuous

⇒ Emulates most poolings used in the deep learning literature!

e.g.: pooling by sub-sampling P_n(f) = f with R_n = 1; pooling by averaging P_n(f) = f ∗ φ_n with R_n = ‖φ_n‖_1
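In discrete time (d = 1) this pooling model amounts to "apply P, keep every S-th sample, scale by S^{1/2}". A sketch covering both examples above; the helper name `pool` is mine:

```python
import numpy as np

def pool(f, S, phi=None):
    """Discrete analogue of f -> S^(d/2) P(f)(S*), for d = 1.

    P is sub-sampling (phi=None) or averaging with kernel phi;
    the sqrt(S) factor mirrors the continuous-time normalization."""
    if phi is not None:
        f = np.convolve(f, phi, mode="same")  # P(f) = f * phi
    return np.sqrt(S) * f[::S]                # sample on the grid S*Z

f = np.arange(8, dtype=float)
print(pool(f, 2))                    # sub-sampling: sqrt(2) * [0, 2, 4, 6]
print(pool(f, 2, phi=np.ones(2) / 2))  # averaging, then sub-sampling
```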
Vertical translation invariance

Theorem (Wiatowski and HB, 2015)
Assume that the filters, non-linearities, and poolings satisfy

B_n ≤ min{1, L_n^{−2} R_n^{−2}}, ∀n ∈ ℕ.

Let the pooling factors be S_n ≥ 1, n ∈ ℕ. Then,

|||Φ^n(T_t f) − Φ^n(f)||| = O( ‖t‖ / (S_1 ··· S_n) ),

for all f ∈ L²(ℝᵈ), t ∈ ℝᵈ, n ∈ ℕ.

⇒ Features become more invariant with increasing network depth!

Full translation invariance: If lim_{n→∞} S_1 · S_2 · ... · S_n = ∞, then

lim_{n→∞} |||Φ^n(T_t f) − Φ^n(f)||| = 0

The condition B_n ≤ min{1, L_n^{−2} R_n^{−2}}, ∀n ∈ ℕ, is easily satisfied by normalizing the filters {g_{λ_n}}_{λ_n ∈ Λ_n}.

⇒ The theorem applies to general filters, non-linearities, and poolings
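The ‖t‖/(S_1···S_n) trend can be observed numerically in a toy cascade. The filter, pooling, shift size, and depth range below are all my own choices; this illustrates the qualitative decay of the feature distance with depth, not the exact constant in the theorem.

```python
import numpy as np

rng = np.random.default_rng(3)

def layer(u, S=2):
    """One toy layer: fixed smoothing filter, modulus, then pooling by
    averaging (P(f) = f * phi) with factor S and sub-sampling."""
    g = np.array([0.25, 0.5, 0.25])
    u = np.abs(np.convolve(u, g, mode="same"))       # |u * g|
    u = np.convolve(u, np.ones(S) / S, mode="same")  # averaging phi
    return u[::S]

def features(f, depth):
    u = f
    for _ in range(depth):
        u = layer(u)
    return u

n = 512
f = rng.standard_normal(n)
f_shift = np.roll(f, 8)  # T_t f with t = 8 samples

# Relative feature distance |||Phi^n(T_t f) - Phi^n(f)||| / |||Phi^n(f)|||
dists = [np.linalg.norm(features(f_shift, d) - features(f, d))
         / np.linalg.norm(features(f, d)) for d in (1, 2, 3, 4)]
print([round(x, 3) for x in dists])
# the distance shrinks as the cumulative pooling factor S_1...S_n grows
```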
Philosophy behind invariance results

Mallat’s “horizontal” translation invariance [Mallat, 2012]:

lim_{J→∞} |||Φ_W(T_t f) − Φ_W(f)||| = 0, ∀f ∈ L²(ℝᵈ), ∀t ∈ ℝᵈ

- features become invariant in every network layer, but needs J → ∞
- applies to the wavelet transform and modulus non-linearity without pooling

“Vertical” translation invariance:

lim_{n→∞} |||Φ^n(T_t f) − Φ^n(f)||| = 0, ∀f ∈ L²(ℝᵈ), ∀t ∈ ℝᵈ

- features become more invariant with increasing network depth
- applies to general filters, general non-linearities, and general poolings
Non-linear deformations

Non-linear deformation (F_τ f)(x) = f(x − τ(x)), where τ : ℝᵈ → ℝᵈ

(illustrated for “small” τ and for “large” τ)
Deformation sensitivity for signal classes

Consider (F_τ f)(x) = f(x − τ(x)) = f(x − e^{−x²})

(plots of f_1 vs. F_τ f_1 and of f_2 vs. F_τ f_2)

For a given τ the amount of deformation induced can depend drastically on f ∈ L²(ℝᵈ)
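This sensitivity difference is easy to reproduce numerically; `f1` and `f2` below are my own stand-ins for the two plotted signals (one slowly varying, one highly oscillatory), deformed by the same τ(x) = e^{−x²}.

```python
import numpy as np

x = np.linspace(-5, 5, 2001)
tau = np.exp(-x ** 2)  # the deformation from the slide

f1 = lambda t: np.exp(-t ** 2 / 4)                    # smooth, slowly varying
f2 = lambda t: np.sin(50 * t) * np.exp(-t ** 2 / 4)   # highly oscillatory

def rel_deformation(f):
    """Relative deformation ||F_tau f - f|| / ||f||,
    with (F_tau f)(x) = f(x - tau(x))."""
    d = f(x - tau) - f(x)
    return np.linalg.norm(d) / np.linalg.norm(f(x))

r1, r2 = rel_deformation(f1), rel_deformation(f2)
print(round(r1, 3), round(r2, 3))
# the oscillatory signal is deformed far more by the same tau
```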
Philosophy behind deformation stability/sensitivity bounds

Mallat’s deformation stability bound [Mallat, 2012]:

|||Φ_W(F_τ f) − Φ_W(f)||| ≤ C (2^{−J} ‖τ‖_∞ + J ‖Dτ‖_∞ + ‖D²τ‖_∞) ‖f‖_W, for all f ∈ H_W ⊆ L²(ℝᵈ)

- the signal class H_W and the corresponding norm ‖·‖_W depend on the mother wavelet (and hence the network)
- signal class description complexity enters implicitly via the norm ‖·‖_W
- the bound depends explicitly on higher-order derivatives of τ
- the bound is coupled to horizontal translation invariance: lim_{J→∞} |||Φ_W(T_t f) − Φ_W(f)||| = 0, ∀f ∈ L²(ℝᵈ), ∀t ∈ ℝᵈ

Our deformation sensitivity bound:

|||Φ(F_τ f) − Φ(f)||| ≤ C_C ‖τ‖_∞^α, ∀f ∈ C ⊆ L²(ℝᵈ)

- the signal class C (band-limited functions, cartoon functions, or Lipschitz functions) is independent of the network
- signal class description complexity enters explicitly via C_C: L-band-limited functions: C_C = O(L); cartoon functions of size K: C_C = O(K^{3/2}); M-Lipschitz functions: C_C = O(M)
- the decay rate α > 0 of the deformation error is signal-class-specific: band-limited functions: α = 1, cartoon functions: α = 1/2, Lipschitz functions: α = 1
- the bound depends on the derivative of τ only implicitly, via the condition ‖Dτ‖_∞ ≤ 1/(2d)
- the bound is decoupled from vertical translation invariance: lim_{n→∞} |||Φ^n(T_t f) − Φ^n(f)||| = 0, ∀f ∈ L²(ℝᵈ), ∀t ∈ ℝᵈ
CNNs in a nutshell

CNNs used in practice employ potentially hundreds of layers and 10,000s of nodes!

e.g.: Winner of the ImageNet 2015 challenge [He et al., 2015]
- Network depth: 152 layers
- average # of nodes per layer: 472
- # of FLOPS for a single forward pass: 11.3 billion

Such depths (and breadths) pose formidable computational challenges in training and operating the network!
Topology reduction

Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers

Guarantee a trivial null-space for the feature extractor Φ

Specify the number of layers needed to have “most” of the input signal energy be contained in the feature vector

For a fixed (possibly small) depth, design CNNs that capture “most” of the input signal energy
![Page 55: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs](https://reader035.vdocuments.us/reader035/viewer/2022071609/6148c3352918e2056c22e629/html5/thumbnails/55.jpg)
Building blocks

Basic operations in the n-th network layer: each branch convolves the input f with a filter gλn, applies the modulus, and sub-samples:

f → ∗ gλn^(k) → | · | → ↓S (one branch per filter gλn, λn ∈ Λn)

Filters: semi-discrete frame Ψn := {χn} ∪ {gλn}λn∈Λn
Non-linearity: modulus | · |
Pooling: sub-sampling with pooling factor S ≥ 1
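The layer operation just described can be sketched in a few lines of NumPy. This is a minimal illustration only: the Gaussian band-pass filter, the random input, and the function name `scattering_branch` are our illustrative choices, not the semi-discrete frame constructions from the talk.

```python
import numpy as np

def scattering_branch(f, g_hat, S):
    """One branch of the n-th layer: convolve f with a filter g
    (given by its frequency response g_hat), take the modulus,
    then pool by sub-sampling with factor S."""
    u = np.abs(np.fft.ifft(np.fft.fft(f) * g_hat))  # |f * g| via the FFT
    return u[::S]                                   # sub-sampling by S

rng = np.random.default_rng(0)
N, S = 256, 2
f = rng.standard_normal(N)
omega = np.fft.fftfreq(N, d=1.0 / N)                   # integer frequency bins
g_hat = np.exp(-((np.abs(omega) - 40.0) ** 2) / 50.0)  # illustrative band-pass filter
y = scattering_branch(f, g_hat, S)
print(y.shape)  # (128,): feature map at half the input length
```

The modulus makes the feature map non-negative, which is what enables the demodulation effect discussed next.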
Demodulation effect of modulus non-linearity

Components of the feature vector are given by |f ∗ gλn| ∗ χn+1.

[Figure: the spectra f(ω), gλn(ω), and χn+1(ω). The filtered spectrum f(ω) · gλn(ω) is high-pass; taking the modulus shifts its energy towards low frequencies, so |f ∗ gλn|∧(ω) is concentrated around ω = 0 and is picked up by Φ(f) via the output filter χn+1.]

Modulus squared: |f ∗ gλn(x)|² has spectrum R_{f·gλn}(ω), the autocorrelation of f(ω) · gλn(ω).
Do all non-linearities demodulate?

High-pass filtered signal: F(f ∗ gλ) vanishes at low frequencies and is supported in bands of width 2R inside [−2R, 2R].

- Modulus: Yes! |F(|f ∗ gλ|)| is concentrated around ω = 0 ... but has (small) tails!
- Modulus squared: Yes, and sharply so! |F(|f ∗ gλ|²)| is supported in [−2R, 2R] ... but the modulus squared is not Lipschitz-continuous!
- Rectified linear unit: No! |F(ReLU(f ∗ gλ))| retains the high-pass components.
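These demodulation claims are easy to check numerically. The snippet below (the carrier frequency, Gaussian envelope, and cutoff are our illustrative choices, not the talk's filters) measures how much spectral energy each non-linearity moves into the low-frequency band.

```python
import numpy as np

N = 1024
t = np.arange(N)
envelope = np.exp(-((t - N / 2) ** 2) / (2 * 40.0 ** 2))  # smooth low-pass envelope
x = envelope * np.cos(2 * np.pi * 100 * t / N)            # high-pass (modulated) signal

def low_band_fraction(y, cutoff=20):
    """Fraction of the spectral energy of y in the band |omega| <= cutoff."""
    energy = np.abs(np.fft.fft(y)) ** 2
    band = np.abs(np.fft.fftfreq(N, d=1.0 / N)) <= cutoff
    return energy[band].sum() / energy.sum()

frac_in = low_band_fraction(x)                     # input: essentially no low-pass energy
frac_abs = low_band_fraction(np.abs(x))            # modulus: demodulates, small tails remain
frac_sq = low_band_fraction(np.abs(x) ** 2)        # modulus squared: demodulates sharply
frac_relu = low_band_fraction(np.maximum(x, 0.0))  # ReLU = (x + |x|)/2 keeps x/2 intact
print(frac_in, frac_abs, frac_sq, frac_relu)
```

On this example the modulus moves most of the energy below the cutoff, while ReLU retains a substantial high-pass component, consistent with the slides.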
First goal: Quantify feature map energy decay

[Figure: scattering tree. The root f is filtered by χ1 to produce the layer-0 output. The propagated signals |f ∗ gλ1^(k)|, ..., |f ∗ gλ1^(p)| form the first layer, with total energy W1(f), and are each filtered by χ2 to produce outputs. Iterating, ||f ∗ gλ1^(k)| ∗ gλ2^(l)|, ..., ||f ∗ gλ1^(p)| ∗ gλ2^(r)| form the second layer, with energy W2(f), and are filtered by χ3, and so on.]
Assumptions (on the filters)

i) Analyticity: for every filter gλn there exists a (not necessarily canonical) orthant Hλn ⊆ Rd such that supp(ĝλn) ⊆ Hλn.

ii) High-pass: there exists δ > 0 such that ∑_{λn∈Λn} |ĝλn(ω)|² = 0, for a.e. ω ∈ Bδ(0).

⇒ comprises various constructions of WH filters, wavelets, ridgelets, (α)-curvelets, shearlets

e.g., analytic band-limited curvelets [figure: tiling of the (ω1, ω2)-plane]
Input signal classes

Sobolev functions of order s ≥ 0:

Hs(Rd) = { f ∈ L2(Rd) | ∫_Rd (1 + |ω|²)^s |f(ω)|² dω < ∞ }

Hs(Rd) contains a wide range of practically relevant signal classes:

- square-integrable functions L2(Rd) = H0(Rd)
- L-band-limited functions L²_L(Rd) ⊆ Hs(Rd), ∀L > 0, ∀s ≥ 0
- cartoon functions [Donoho, 2001] C_CART ⊆ Hs(Rd), ∀s ∈ [0, 1/2)

Handwritten digits from the MNIST database [LeCun & Cortes, 1998]
Exponential energy decay

Theorem
Let the filters be wavelets with mother wavelet satisfying supp(ψ̂) ⊆ [r⁻¹, r], r > 1, or Weyl-Heisenberg (WH) filters with prototype function satisfying supp(ĝ) ⊆ [−R, R], R > 0. Then, for every f ∈ Hs(Rd), there exists β > 0 such that

Wn(f) = O(a^(−n(2s+β)/(2s+β+1))),

where a = (r² + 1)/(r² − 1) in the wavelet case, and a = 1/2 + 1/R in the WH case.

⇒ the decay factor a is explicit and can be tuned via r and R
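The decay factors from the theorem are simple closed forms; a quick helper (function names are ours):

```python
def decay_factor_wavelet(r):
    """a = (r^2 + 1) / (r^2 - 1) for wavelets with supp(psi_hat) in [1/r, r]."""
    assert r > 1
    return (r ** 2 + 1) / (r ** 2 - 1)

def decay_factor_wh(R):
    """a = 1/2 + 1/R for Weyl-Heisenberg filters with supp(g_hat) in [-R, R]."""
    assert R > 0
    return 0.5 + 1.0 / R

print(decay_factor_wavelet(2.0))  # 5/3, the wavelet value for r = 2
print(decay_factor_wh(1.0))       # 1.5, the WH value for R = 1
```

Narrower spectral supports (smaller r or R) give a larger a and hence faster energy decay.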
Exponential energy decay

Exponential energy decay:

Wn(f) = O(a^(−n(2s+β)/(2s+β+1)))

- β > 0 determines the decay of f(ω) (as |ω| → ∞) according to

  |f(ω)| ≤ µ(1 + |ω|²)^(−(s/2 + 1/4 + β/4)), ∀|ω| ≥ L,

  for some µ > 0, and L acts as an “effective bandwidth”
- smoother input signals (i.e., larger s) lead to faster energy decay
- pooling through sub-sampling f ↦ S^(1/2) f(S·) leads to decay factor a/S

What about general filters? ⇒ polynomial energy decay!
... our second goal ... trivial null-space for Φ

Why a trivial null-space?

[Figure: feature space split by a hyperplane with normal w into the half-spaces ⟨w, Φ(f)⟩ > 0 and ⟨w, Φ(f)⟩ < 0; the feature vector Φ(f∗) lies on every such hyperplane.]

Non-trivial null-space: ∃ f∗ ≠ 0 such that Φ(f∗) = 0
⇒ ⟨w, Φ(f∗)⟩ = 0 for all w!
⇒ these f∗ become unclassifiable!
... our second goal ...
Trivial null-space for feature extractor:{f ∈ L2(Rd) | Φ(f) = 0
}={
0}
Feature extractor Φ(·) =⋃∞n=0 Φn(·) shall satisfy
A‖f‖22 ≤ |||Φ(f)|||2 ≤ B‖f‖22, ∀f ∈ L2(Rd),
for some A,B > 0.
“Energy conservation”

Theorem
For the frame upper bounds {Bn}n∈N and frame lower bounds {An}n∈N, define B := ∏_{n=1}^∞ max{1, Bn} and A := ∏_{n=1}^∞ min{1, An}. If 0 < A ≤ B < ∞, then

A‖f‖₂² ≤ |||Φ(f)|||² ≤ B‖f‖₂², ∀f ∈ L2(Rd).

- For Parseval frames (i.e., An = Bn = 1, n ∈ N), this yields |||Φ(f)|||² = ‖f‖₂²
- Connection to energy decay:

  ‖f‖₂² = ∑_{k=0}^{n−1} |||Φk(f)|||² + Wn(f), with Wn(f) → 0
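For a Parseval pair of filters, i.e. |g1(ω)|² + |g2(ω)|² = 1, energy conservation across one filtering stage can be verified numerically. The raised-cosine split below is an illustrative choice of ours, not one of the talk's frame constructions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 512
f = rng.standard_normal(N)
omega = np.fft.fftfreq(N)           # normalized frequencies in [-1/2, 1/2)
g1 = np.cos(np.pi * np.abs(omega))  # low-pass branch
g2 = np.sin(np.pi * np.abs(omega))  # high-pass branch: |g1|^2 + |g2|^2 = 1

F = np.fft.fft(f)
y1, y2 = np.fft.ifft(F * g1), np.fft.ifft(F * g2)
energy_in = np.sum(np.abs(f) ** 2)
energy_out = np.sum(np.abs(y1) ** 2) + np.sum(np.abs(y2) ** 2)
print(energy_in - energy_out)  # ~0: the Parseval pair conserves energy
```

By Parseval, the branch energies sum to (1/N) Σ |F(ω)|² (|g1|² + |g2|²) = ‖f‖₂², which is exactly the An = Bn = 1 case of the theorem for a single stage.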
... and our third goal ...

For a given CNN, specify the number of layers needed to capture “most” of the input signal energy.

How many layers n are needed to have at least ((1 − ε) · 100)% of the input signal energy be contained in the feature vector, i.e.,

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^n |||Φk(f)|||² ≤ ‖f‖₂², ∀f ∈ L2(Rd)?
Number of layers needed

Theorem
Let the frame bounds satisfy An = Bn = 1, n ∈ N. Let the input signal f be L-band-limited, and let ε ∈ (0, 1). If

n ≥ ⌈log_a(L/(1 − √(1 − ε)))⌉,

then

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^n |||Φk(f)|||² ≤ ‖f‖₂².

⇒ also guarantees a trivial null-space for ⋃_{k=0}^n Φk(f)

- the lower bound depends on
  - the description complexity of the input signals (i.e., bandwidth L)
  - the decay factor (wavelets: a = (r² + 1)/(r² − 1), WH filters: a = 1/2 + 1/R)
- similar estimates hold for Sobolev input signals and for general filters (polynomial decay!)
Number of layers needed

Numerical example for bandwidth L = 1:

| (1 − ε)            | 0.25 | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
| ------------------ | ---- | --- | ---- | --- | ---- | ---- |
| wavelets (r = 2)   | 2    | 3   | 4    | 6   | 8    | 11   |
| WH filters (R = 1) | 2    | 4   | 5    | 8   | 10   | 14   |
| general filters    | 2    | 3   | 7    | 19  | 39   | 199  |

Recall: winner of the ImageNet 2015 challenge [He et al., 2015]

- network depth: 152 layers
- average # of nodes per layer: 472
- # of FLOPS for a single forward pass: 11.3 billion
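The wavelet and WH rows of this table follow directly from the depth bound n ≥ ⌈log_a(L/(1 − √(1 − ε)))⌉; the sketch below reproduces them for L = 1 (helper name is ours).

```python
import math

def layers_needed(a, L, captured):
    """Depth bound with captured = 1 - eps:
    smallest n with a^n >= L / (1 - sqrt(captured))."""
    return math.ceil(math.log(L / (1.0 - math.sqrt(captured)), a))

a_wavelet = (2 ** 2 + 1) / (2 ** 2 - 1)  # wavelets, r = 2
a_wh = 0.5 + 1.0 / 1.0                   # WH filters, R = 1
fractions = [0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
print([layers_needed(a_wavelet, 1.0, c) for c in fractions])  # [2, 3, 4, 6, 8, 11]
print([layers_needed(a_wh, 1.0, c) for c in fractions])       # [2, 4, 5, 8, 10, 14]
```

Note how quickly the required depth grows as (1 − ε) → 1: capturing 99% of the energy with general (polynomially decaying) filters takes 199 layers, versus 11 for wavelets.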
... our fourth and last goal ...

For a fixed (possibly small) depth N, design scattering networks that capture “most” of the input signal energy.

Recall: let the filters be wavelets with mother wavelet satisfying supp(ψ̂) ⊆ [r⁻¹, r], r > 1, or Weyl-Heisenberg filters with prototype function satisfying supp(ĝ) ⊆ [−R, R], R > 0.

For fixed depth N, we want to choose r in the wavelet case and R in the WH case so that

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^N |||Φk(f)|||² ≤ ‖f‖₂², ∀f ∈ L2(Rd).
Depth-constrained networks

Theorem
Let the frame bounds satisfy An = Bn = 1, n ∈ N. Let the input signal f be L-band-limited, and fix ε ∈ (0, 1) and N ∈ N. If, in the wavelet case,

1 < r ≤ √((κ + 1)/(κ − 1)),

or, in the WH case,

0 < R ≤ 1/(κ − 1/2),

where κ := (L/(1 − √(1 − ε)))^(1/N), then

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^N |||Φk(f)|||² ≤ ‖f‖₂².
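The theorem inverts the depth bound: choosing the filter parameters so that the decay factor a reaches κ makes N layers suffice. A sketch under that reading (helper names are ours):

```python
import math

def kappa(L, captured, N):
    """kappa = (L / (1 - sqrt(1 - eps)))^(1/N), with captured = 1 - eps."""
    return (L / (1.0 - math.sqrt(captured))) ** (1.0 / N)

def max_wavelet_r(k):
    """Largest admissible mother-wavelet bandwidth r (wavelet case)."""
    assert k > 1
    return math.sqrt((k + 1.0) / (k - 1.0))

def max_wh_R(k):
    """Largest admissible prototype support R (WH case)."""
    assert k > 0.5
    return 1.0 / (k - 0.5)

L, captured, N = 1.0, 0.9, 3
k = kappa(L, captured, N)
r = max_wavelet_r(k)
a = (r ** 2 + 1) / (r ** 2 - 1)  # decay factor achieved at the boundary value of r
print(k, r, a)  # a equals kappa, so a^N covers L / (1 - sqrt(0.9)): N = 3 layers suffice
```

At the boundary values, a = (r² + 1)/(r² − 1) = κ in the wavelet case and a = 1/2 + 1/R = κ in the WH case, matching the depth bound from the previous theorem with n = N.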
Depth-width tradeoff

[Figure: spectral supports of the wavelet filters. The mother wavelet ψ̂ covers [r⁻¹, r]; the wavelets g1, g2, g3, ... cover the bands up to r, r², r³, ..., tiling the frequency axis out to the bandwidth L.]

Smaller depth N ⇒ smaller “bandwidth” r of the mother wavelet
⇒ larger number of wavelets (O(log_r(L))) to cover the spectral support [−L, L] of the input signal
⇒ larger number of filters in the first layer
⇒ depth-width tradeoff
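The width side of the tradeoff can be made concrete: covering the spectrum up to L takes on the order of log_r(L) wavelet scales, so shrinking r inflates the first layer. A small sketch (L = 200 is an arbitrary example bandwidth, and the helper name is ours):

```python
import math

def num_wavelet_scales(L, r):
    """Smallest J with r^J >= L, i.e. roughly O(log_r L) wavelets to cover up to L."""
    assert L >= 1.0 and r > 1.0
    return max(1, math.ceil(math.log(L, r)))

L = 200.0
print(num_wavelet_scales(L, 2.0))             # 8 filters for r = 2
print(num_wavelet_scales(L, math.sqrt(2.0)))  # 16 filters for r = sqrt(2): smaller r, wider first layer
```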
Yours truly
Experiment: Handwritten digit classification

- Dataset: MNIST database of handwritten digits [LeCun & Cortes, 1998]; 60,000 training and 10,000 test images
- Φ-network: D = 3 layers; same filters, non-linearities, and pooling operators in all layers
- Classifier: SVM with radial basis function kernel [Vapnik, 1995]
- Dimensionality reduction: supervised orthogonal least squares scheme [Chen et al., 1991]
Experiment: Handwritten digit classification

Classification error in percent:

|      | Haar: abs | Haar: ReLU | Haar: tanh | Haar: LogSig | Bi-orth.: abs | Bi-orth.: ReLU | Bi-orth.: tanh | Bi-orth.: LogSig |
| ---- | --------- | ---------- | ---------- | ------------ | ------------- | -------------- | -------------- | ---------------- |
| n.p. | 0.57      | 0.57       | 1.35       | 1.49         | 0.51          | 0.57           | 1.12           | 1.22             |
| sub. | 0.69      | 0.66       | 1.25       | 1.46         | 0.61          | 0.61           | 1.20           | 1.18             |
| max. | 0.58      | 0.65       | 0.75       | 0.74         | 0.52          | 0.64           | 0.78           | 0.73             |
| avg. | 0.55      | 0.60       | 1.27       | 1.35         | 0.58          | 0.59           | 1.07           | 1.26             |

- modulus and ReLU perform better than tanh and LogSig
- results with pooling (S = 2) are competitive with those without pooling, at significantly lower computational cost
- state of the art: 0.43 [Bruna and Mallat, 2013]
  - similar feature extraction network with directional, non-separable wavelets and no pooling
  - significantly higher computational complexity
Energy decay: Related work
[Waldspurger, 2017 ]: Exponential energy decay
W_n(f) = O(a^{-n}),

for some unspecified a > 1.
- 1-D wavelet filters
- every network layer equipped with the same set of wavelets
- vanishing moments condition on the mother wavelet
- applies to 1-D real-valued band-limited input signals f ∈ L2(R)
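As a toy numerical illustration of what W_n(f) = O(a^{-n}) means: if the per-layer energy sequence decays like C·a^{-n}, then log W_n is affine in n with slope -log(a), so the decay rate can be read off a log-linear fit. The constants a and C below are illustrative; the cited result leaves a unspecified.

```python
import numpy as np

# Synthetic energy sequence with exact exponential decay W_n = C * a^{-n}.
a, C = 2.0, 5.0
n = np.arange(1, 11)
W = C * a ** (-n.astype(float))

# A log-linear fit recovers the decay rate: log W_n = log C - n * log a.
slope, intercept = np.polyfit(n, np.log(W), 1)
print(np.isclose(-slope, np.log(a)))  # recovered rate matches log(a)
```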
Energy decay: Related work
[Czaja and Li, 2016 ]: Exponential energy decay
W_n(f) = O(a^{-n}),

for some unspecified a > 1.
- d-dimensional uniform covering filters (similar to Weyl-Heisenberg filters), but does not cover multi-scale filters (e.g. wavelets, ridgelets, curvelets)
- every network layer equipped with the same set of filters
- analyticity and vanishing moments conditions on the filters
- applies to d-dimensional complex-valued input signals f ∈ L2(Rd)
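The contrast between the two filter families can be sketched in the frequency domain (illustrative only, not the construction in [Czaja and Li, 2016]): uniform covering (Weyl-Heisenberg/Gabor-type) filters tile frequency with equispaced translates of one fixed-bandwidth window, whereas multi-scale (wavelet-type) filters tile it with geometrically scaled copies. All window shapes and grid parameters below are assumptions for the sketch.

```python
import numpy as np

L = 256
xi = np.fft.fftfreq(L)  # normalized frequency grid in [-0.5, 0.5)

def gauss_window(center, width):
    """Gaussian bump on the frequency axis (stand-in for a filter's
    frequency response)."""
    return np.exp(-((xi - center) ** 2) / (2 * width ** 2))

# Uniform covering: fixed bandwidth, equispaced centers (Gabor-like).
gabor_bank = [gauss_window(c, 0.02) for c in np.linspace(0.0, 0.45, 10)]

# Multi-scale: center and bandwidth shrink geometrically (wavelet-like).
wavelet_bank = [gauss_window(0.25 / 2 ** j, 0.1 / 2 ** j) for j in range(5)]

print(len(gabor_bank), len(wavelet_bank))  # 10 uniform vs 5 dyadic filters
```

The energy-decay result above applies to banks of the first kind; wavelet-type banks of the second kind fall outside its scope.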