Bayesian Deep Neural Networks
Elementary mathematics
Sungjoon Choi
February 9, 2018
Seoul National University
Elementary mathematics
Figure 1: Elementary mathematics (copyright Wikipedia).
Elementary mathematics
Table of contents
1. Introduction
2. Set theory
3. Measure theory
4. Probability
5. Random variable
6. Random process
7. Functional analysis
Introduction
Introduction
• What's Wrong with Probability Notation?¹
• What's wrong?
1. overloading p(·) for every probability function.
2. using bound variables named after random variables.
• Probability notation is bad:
p(x|y) = p(y|x)p(x)/p(y)
• Random variables don't help:
P_X|Y(x|y) = P_Y|X(y|x) p_X(x)/p_Y(y)
• Great expectations:
E[x] = ∑x x p(x)
E[X] = ∑x x P_X(x)
¹https://lingpipe-blog.com/2009/10/13/whats-wrong-with-probability-notation/
Introduction
• Today, I will introduce
1. probability theory of Kolmogorov
• set theory
• measure theory.
2. basic functional analysis
• Caution
• Try to get familiar with the terminologies.
• Some facts could be counterintuitive.
• No proof will be provided here.
Introduction
Figure 2: Andrey Kolmogorov
Introduction
• Important questions to keep in mind throughout this lecture:
1. What is probability?
2. What is a random variable?
3. What is a random process?
4. What is a kernel function?
Don’t panic.
Most of the contents are from
Prof. Taejeong Kim’s slides.
Set theory
Set theory
• set, element, subset, universal set, set operations
• disjoint sets: A ∩ B = ∅
• partition of A
example: A = {1, 2, 3, 4}, partition of A: {1, 2}, {3, 4}
• Cartesian product: A × B = {(a, b) : a ∈ A, b ∈ B}
• example: A = {1, 2}, B = {3, 4, 5}
• A × B = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}
• power set 2^A: the set of all the subsets of A.
• example: A = {1, 2, 3}
• 2^A = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}}
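These set operations can be checked directly in Python; the sketch below uses the standard library and the same example sets as above:

```python
from itertools import chain, combinations, product

A = {1, 2}
B = {3, 4, 5}

# Cartesian product A x B: all ordered pairs (a, b) with a in A, b in B
cart = set(product(A, B))
assert cart == {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}

def power_set(s):
    """All subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# |2^A| = 2^|A|
assert len(power_set({1, 2, 3})) == 2 ** 3
```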
Set theory
• cardinality |A|: finite, infinite, countable, uncountable, denumerable
(countably infinite)
• |A| = m, |B| = n⇒ |A× B| = mn
• |A| = n⇒ |2A| = 2n
• If there exists a one-to-one correspondence between two sets, they
have the same cardinality.
• countable: There is a one-to-one correspondence between the set and a subset of the natural numbers. (examples: the set of all integers, the set of all rational numbers)
Set theory
• Are the set of all integers and the set of all rational numbers
countable?
• Yes, by constructing explicit one-to-one mappings onto the natural numbers.
• In fact, the two sets have the same cardinality.
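One concrete witness for the countability of the integers is an explicit bijection with the natural numbers; the interleaving map below is a standard choice (a sketch, not from the slides):

```python
def z_to_n(z):
    """Interleave ..., -2, -1, 0, 1, 2, ... onto 0, 1, 2, 3, 4, ...:
    nonnegative z -> even numbers, negative z -> odd numbers."""
    return 2 * z if z >= 0 else -2 * z - 1

# One-to-one and onto on a sample range: no two integers collide,
# and every natural number 0..200 is hit exactly once.
values = [z_to_n(z) for z in range(-100, 101)]
assert len(values) == len(set(values))
assert sorted(values) == list(range(201))
```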
Set theory
• denumerable: countably infinite
All denumerable sets have the same cardinality, denoted ℵ0 (aleph null or aleph naught).
• uncountable: not countable²
The smallest known uncountable set is (0, 1) or R, the set of all real numbers, whose cardinality is denoted c, the continuum:
c = 2^ℵ0
²Found by Georg Cantor in 1874.
Set theory
• Show that C = [0, 1] is uncountable (Cantor's diagonal argument).
• Proof sketch)
1. Suppose that C is countable.
2. Then there exists a sequence S = x1, x2, . . . covering all elements of C.
3. We can represent each xi in binary:
x1 = 0.d11 d12 d13 ...
x2 = 0.d21 d22 d23 ...
x3 = 0.d31 d32 d33 ...
where dij ∈ {0, 1}.
4. Define xnew = 0.d1 d2 d3 . . . such that di = 1 − dii.
5. Clearly, xnew does not appear in S, which is a contradiction. So C must be uncountable.
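The diagonal construction can be illustrated on a finite table of digits (only an illustration; the actual argument needs infinitely many rows, each with infinitely many digits):

```python
def diagonal_missing(rows):
    """Flip the i-th digit of the i-th row: the resulting digit sequence
    differs from every row in at least one position."""
    return [1 - rows[i][i] for i in range(len(rows))]

rows = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
x_new = diagonal_missing(rows)
# x_new differs from row i in position i, so it matches no row
assert all(x_new != row for row in rows)
```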
Set theory
• Then what is the number of real numbers between 0 and 1?
• Proof sketch)
1. We can represent a real number between 0 and 1 in binary:
r1 = 0.d11 d12 d13 ...
r2 = 0.d21 d22 d23 ...
r3 = 0.d31 d32 d33 ...
where dij ∈ {0, 1}.
2. To fully distinguish a real number ri, we need ℵ0 bits, where ℵ0 is the number of all integers.
3. Consequently, c = 2^ℵ0 (uncountable).
Set theory
• function or mapping f : U → V
• domain U, codomain V
• image f(A) = {f(x) ∈ V : x ∈ A}, A ⊆ U
• range f(U)
• inverse image or preimage
f⁻¹(B) = {x ∈ U : f(x) ∈ B}, B ⊆ V
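Image and preimage over a finite domain, as a quick sketch (note that the preimage of a single point may contain several elements, so f⁻¹ here is a set operation, not an inverse function):

```python
def image(f, A):
    """f(A) = {f(x) : x in A}"""
    return {f(x) for x in A}

def preimage(f, B, U):
    """f^{-1}(B) = {x in U : f(x) in B}"""
    return {x for x in U if f(x) in B}

U = {-2, -1, 0, 1, 2}
square = lambda x: x * x

assert image(square, U) == {0, 1, 4}
# squaring is not one-to-one, so the preimage of {1, 4} has four elements
assert preimage(square, {1, 4}, U) == {-2, -1, 1, 2}
```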
Set theory
• one-to-one or injective: f (a) = f (b)⇒ a = b
• onto or surjective: f (U) = V
• invertible: one-to-one and onto
Measure theory
Measure theory
Given a universal set U, a measure assigns a nonnegative real number to subsets of U (more precisely, to each set in a σ-field over U).
Measure theory
• set function: a function assigning a number to a set (examples: cardinality, length, area).
• σ-field B: a collection of subsets of U such that (axioms)
1. ∅ ∈ B (empty set is included.)
2. B ∈ B ⇒ Bᶜ ∈ B (closed under set complement.)
3. Bi ∈ B ⇒ ∪∞i=1Bi ∈ B (closed under countable union.)
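For a finite universal set the axioms can be checked mechanically; for a finite collection, closure under countable union reduces to closure under pairwise union:

```python
def is_sigma_field(U, B):
    """Check the sigma-field axioms for a finite collection B of
    frozensets over the finite universal set U."""
    if frozenset() not in B:
        return False                # axiom 1: empty set included
    for s in B:
        if U - s not in B:
            return False            # axiom 2: closed under complement
    for s in B:
        for t in B:
            if s | t not in B:
                return False        # axiom 3: closed under (finite) union
    return True

U = frozenset({'a', 'b', 'c'})
B = {frozenset(), frozenset({'a'}), frozenset({'b', 'c'}), U}
assert is_sigma_field(U, B)
# missing the complement {b, c} of {a} breaks axiom 2
assert not is_sigma_field(U, {frozenset(), frozenset({'a'}), U})
```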
Measure theory
• properties of σ-field B
1. U ∈ B (entire set is included.)
2. Bi ∈ B ⇒ ∩∞i=1Bi ∈ B (closed under countable intersection.)
3. 2^U is a σ-field.
4. B is either finite or uncountable, never denumerable.
5. If B and C are σ-fields, B ∩ C is a σ-field but B ∪ C need not be.
• B = {∅, {a}, {b, c}, {a, b, c}}
• C = {∅, {a, b}, {c}, {a, b, c}}
• B ∩ C = {∅, {a, b, c}} (this is a σ-field)
• B ∪ C = {∅, {a}, {c}, {a, b}, {b, c}, {a, b, c}}
(this is not a σ-field, as {a, c} = {a} ∪ {c} is not included.)
• σ(C) is called the σ-field generated by C: the smallest σ-field containing C.
A σ-field is designed to define a measure.
If a set is not in the σ-field, it cannot be measured.
Measure theory
• A set U and a σ-field of subsets of U form a measurable space
(U,B).
• A measure µ defined on a measurable space (U,B) is a set function µ : B → [0,∞] such that
1. µ(∅) = 0
2. For pairwise disjoint Bi: µ(∪∞i=1Bi) = ∑∞i=1 µ(Bi) (countable additivity)
• Probability is a measure such that µ(U) = 1, i.e., normalized
measure.
• A measurable space (U,B) and a measure µ defined on it together
form a measure space (U,B, µ).
Probability
What is probability?
Probability
• Toss a fair die and observe the outcome.
• P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6
• P(A) = P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/2
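The fair-die example, with exact arithmetic (a minimal sketch):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
p = {w: Fraction(1, 6) for w in omega}   # probability allocation on outcomes

def P(A):
    """Probability of an event A, a subset of omega, by additivity."""
    return sum(p[w] for w in A)

assert P({2, 4, 6}) == Fraction(1, 2)
assert P(omega) == 1                      # normalized measure: P(Omega) = 1
assert P(set()) == 0
```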
Probability
• The random experiment should be well defined.
• The outcomes are all the possible results of the random experiment, each of which cannot be further divided.
• The sample point w : a point representing an outcome.
• The sample space Ω: the set of all the sample points.
Probability
• Definition (probability)
• P defined on a measurable space (Ω,A) is a set function P : A → [0, 1] such that (probability axioms)
1. P(∅) = 0
2. P(A) ≥ 0 ∀A ∈ A
3. For pairwise disjoint sets Ai: P(∪∞i=1Ai) = ∑∞i=1 P(Ai) (countable additivity)
4. P(Ω) = 1
Probability
How do we assign probability to each event in A in such a way as to
satisfy the axioms?
Probability
• probability allocation function
• For discrete Ω:
p : Ω → [0, 1] such that ∑w∈Ω p(w) = 1 and P(A) = ∑w∈A p(w).
• For continuous Ω:
f : Ω → [0,∞) such that ∫w∈Ω f(w)dw = 1 and P(A) = ∫w∈A f(w)dw.
• Recall that probability P is a set function P : A → [0, 1], where A is a σ-field.
Probability
Examples of probability allocation functions:
Conditional probability
• conditional probability of A given B:
P(A|B) ≜ P(A ∩ B) / P(B)
• Again, recall that probability P is a set function, i.e., P : A → [0, 1].
Conditional probability
• From the definition of conditional probability, we can derive:
P(A|B) ≜ P(A ∩ B) / P(B)
• chain rule:
• P(A ∩ B) = P(A|B)P(B)
• P(A ∩ B ∩ C) = P(A|B ∩ C)P(B ∩ C) = P(A|B ∩ C)P(B|C)P(C)
• total probability law:
P(A) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
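The chain rule and the total probability law can be verified exactly on the fair-die example (A = even, B = greater than 3):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda A: Fraction(len(A), len(omega))   # fair-die probability

A = {2, 4, 6}                                # even outcome
B = {4, 5, 6}                                # outcome greater than 3
cond = lambda X, Y: P(X & Y) / P(Y)          # P(X|Y)

# chain rule: P(A ∩ B) = P(A|B) P(B)
assert P(A & B) == cond(A, B) * P(B)

# total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
Bc = omega - B
assert P(A) == cond(A, B) * P(B) + cond(A, Bc) * P(Bc)
```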
Bayes’ rule
• Bayes' rule:
P(B|A) = P(B ∩ A)/P(A) = P(A ∩ B)/P(A) = P(A|B)P(B)/P(A)
• When B is the event under consideration and A is an observation,
• P(B|A) is called the posterior probability.
• P(B) is called the prior probability.
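A worked numeric instance of Bayes' rule with made-up numbers (two coins, one biased; all probabilities here are assumptions for illustration only):

```python
# B = "we picked the biased coin", A = "the flip came up heads"
P_B = 0.5            # prior: pick one of two coins at random
P_A_given_B = 0.9    # assumed: biased coin lands heads 90% of the time
P_A_given_Bc = 0.5   # fair coin lands heads 50% of the time

# total probability law gives the evidence P(A)
P_A = P_A_given_B * P_B + P_A_given_Bc * (1 - P_B)

# Bayes' rule: posterior P(B|A) = P(A|B) P(B) / P(A)
P_B_given_A = P_A_given_B * P_B / P_A
assert abs(P_B_given_A - 9 / 14) < 1e-12   # 0.45 / 0.70
```

Observing heads raises the probability of the biased coin from the prior 1/2 to the posterior 9/14.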
Independence
• independent events A and B: P(A ∩ B) = P(A)P(B)
• independent ≠ disjoint (mutually exclusive)
Random variable
Random variable
• random variable:
A random variable is a real-valued function defined on Ω that is
measurable w.r.t. the probability space (Ω,A,P) and the Borel
measurable space (R,B), i.e.,
X : Ω→ R such that ∀B ∈ B,X−1(B) ∈ A.
• What is random here?
• What is the result of carrying out the random experiment?
Random variable
• Random variables are real numbers of our interest that are
associated with the outcomes of a random experiment.
• X (w) for a specific w ∈ Ω is called a realization.
• The set of all realizations of X is called the alphabet of X .
• We are interested in P(X ∈ B) for B ∈ B:
P(X ∈ B) ≜ P(X⁻¹(B)) = P({w : X(w) ∈ B})
Random variable
• discrete random variable: There is a discrete set {xi : i = 1, 2, · · · } such that ∑i P(X = xi) = 1.
• probability mass function: pX(x) ≜ P(X = x), which satisfies
1. 0 ≤ pX(x) ≤ 1
2. ∑x pX(x) = 1
3. P(X ∈ B) = ∑x∈B pX(x)
Random variable
• example: three fair-coin tosses
• X = number of heads
• probability mass function (pmf):
pX(x) = 1/8 for x = 0; 3/8 for x = 1; 3/8 for x = 2; 1/8 for x = 3; 0 otherwise
• P(X ≥ 1) = 3/8 + 3/8 + 1/8 = 7/8
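This pmf can be recovered by enumerating the 8 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# The 8 equally likely outcomes of three fair-coin tosses;
# X(w) = number of heads in outcome w.
outcomes = list(product('HT', repeat=3))
counts = Counter(w.count('H') for w in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

assert pmf == {0: Fraction(1, 8), 1: Fraction(3, 8),
               2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(pmf[x] for x in pmf if x >= 1) == Fraction(7, 8)
```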
Random variable
• Bernoulli: pX(k) = 1 − p for k = 0; p for k = 1; 0 otherwise
• uniform: pX(k) = 1/(m − l + 1) for k = l, l + 1, · · · , m; 0 otherwise
• geometric: pX(k) = (1 − p)p^k for k = 0, 1, 2, · · · ; 0 otherwise
Random variable
• continuous random variable:
There is an integrable function fX(x) such that P(X ∈ B) = ∫B fX(x)dx.
• probability density function: fX(x) ≜ lim∆x→0 P(x < X ≤ x + ∆x)/∆x, which satisfies
1. fX(x) > 1 is possible.
2. ∫∞−∞ fX(x)dx = 1
3. P(X ∈ B) = ∫x∈B fX(x)dx
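The first property (a density may exceed 1) can be checked numerically: the uniform density on [0, 0.5] equals 2 on its support, yet still integrates to 1 (midpoint-rule sketch):

```python
f = lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0   # uniform density on [0, 0.5]

n, a, b = 200000, -1.0, 1.0
dx = (b - a) / n
total = sum(f(a + (i + 0.5) * dx) * dx for i in range(n))  # midpoint rule

assert f(0.25) > 1.0               # the density exceeds 1 on its support...
assert abs(total - 1.0) < 1e-3     # ...but still integrates to 1
```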
Random variable
• uniform: fX(x) = 1/(b − a) for a ≤ x ≤ b; 0 otherwise
• exponential: fX(x) = λe^(−λx) for x ≥ 0; 0 otherwise
• Laplace: fX(x) = (λ/2)e^(−λ|x|)
• Gaussian: fX(x) = (1/(√(2π)σ)) exp(−(1/2)((x − µ)/σ)²)
• Cauchy: fX(x) = λ/(π(λ² + x²))
Expectation
EX ≜ ∑x x pX(x) for discrete X, and EX ≜ ∫∞−∞ x fX(x)dx for continuous X.
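For the three-coin-toss pmf from the earlier slide, the discrete formula gives EX = 3/2:

```python
from fractions import Fraction

# Expectation of X = number of heads in three fair-coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}
EX = sum(x * p for x, p in pmf.items())
assert EX == Fraction(3, 2)
```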
Conditional expectation
• Conditional expectation E(X|Y)
• The expectation EX = ∫ x fX(x)dx of a random variable X is a deterministic number.
• E(X|Y) is a function of Y and hence a random variable.
• For each y, E(X|Y = y) is the average of X over the event {Y = y}.
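A simulation sketch: roll two dice, let X be their sum and Y the first die. Then E(X|Y = y) is a function of y (here close to y + 3.5), illustrating that E(X|Y) is itself a random variable:

```python
import random
from collections import defaultdict

random.seed(0)

# Empirical conditional expectation: X = sum of two dice, Y = first die.
samples = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(200000)]
groups = defaultdict(list)
for d1, d2 in samples:
    groups[d1].append(d1 + d2)            # condition on the event Y = d1

cond_mean = {y: sum(v) / len(v) for y, v in groups.items()}
for y in range(1, 7):
    # E(X|Y=y) = y + E(second die) = y + 3.5
    assert abs(cond_mean[y] - (y + 3.5)) < 0.1
```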
Conditional expectation
• Definition (conditional expectation)
• Given a random variable Y with E|Y| < ∞ defined on a probability space (Ω,A,P) and a sub-σ-field G ⊂ A, the conditional expectation is the almost surely unique random variable E(Y|G) which satisfies the following two conditions:
1. E(Y|G) is G-measurable.
2. E(YZ) = E(E(Y|G)Z) for all bounded, G-measurable Z.
Conditional expectation
• Conditional expectation E (X |Y ) with different σ-fields.
Moment
• n-th moment: EX^n
• mean: mX = EX
• variance: σX² = var(X) = E(X − mX)²
• skewness: E(X − mX)³ / σX³
• kurtosis: E(X − mX)⁴ / σX⁴
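Sample versions of these moments for a fair die (a simulation sketch; the skewness should be near 0 by symmetry):

```python
import random
random.seed(1)

xs = [random.randint(1, 6) for _ in range(100000)]
n = len(xs)

m = sum(xs) / n                                     # sample mean
var = sum((x - m) ** 2 for x in xs) / n             # sample variance
skew = (sum((x - m) ** 3 for x in xs) / n) / var ** 1.5

assert abs(m - 3.5) < 0.05        # true mean 3.5
assert abs(var - 35 / 12) < 0.05  # true variance 35/12
assert abs(skew) < 0.05           # symmetric distribution
```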
Joint moment
• correlation: EXY
• covariance: cov(X,Y) = E(X − mX)(Y − mY)
• correlation coefficient: ρXY = cov(X,Y)/(σX σY)
• uncorrelated: EXY = EXEY
• independent ⇒ uncorrelated
• uncorrelated ⇏ independent
• orthogonal: EXY = 0
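A standard counterexample for "uncorrelated does not imply independent" (not from the slides): X uniform on {−1, 0, 1} and Y = X².

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)                      # X uniform on {-1, 0, 1}; Y = X^2

EX = sum(x * p for x in xs)
EY = sum(x * x * p for x in xs)
EXY = sum(x * (x * x) * p for x in xs)  # E[XY] = E[X^3]

assert EXY == EX * EY == 0              # uncorrelated (and orthogonal)

# ...but not independent: P(X=1, Y=0) = 0 while P(X=1)P(Y=0) = 1/9
P_joint = Fraction(0)                   # Y = X^2 forces Y = 1 when X = 1
assert P_joint != p * Fraction(1, 3)
```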
Random process
Random process
• We would like to extend random vectors to infinite dimensions. That
is, we would like to mathematically describe an infinite number of
random variables simultaneously, e.g., infinite trials of tossing a die.
Random process
• random process Xt(w), t ∈ I:
1. random sequence, random function, or random signal:
Xt : Ω → the set of all sequences or functions
2. indexed family of an infinite number of random variables:
Xt : I → the set of all random variables defined on Ω
3. Xt : Ω × I → R
4. If t is fixed, a random process becomes a random variable.
Random process
• A random process Xt is completely characterized if the following is
known.
• P((Xt1 , · · · ,Xtk ) ∈ B) for any B, k, and t1, · · · , tk
• Note that given a random process, only ’finite-dimensional’
probabilities or probability functions can be specified.
Random process
• For a fixed t ∈ I, Xt(w) is a random variable.
• For a fixed w ∈ Ω, Xt(w) is a deterministic function of t, which is
called a sample path.
• types of random processes
1. discrete-time
2. continuous-time
3. discrete-valued
4. continuous-valued
• Example: Brownian motion
Random process
• Moment
• mean function:
mX(t) ≜ EXt = ∑x x pXt(x) (discrete-valued), ∫ x fXt(x)dx (continuous-valued)
• auto-correlation function, acf:
RX(t, s) ≜ EXtXs
• auto-covariance function, acvf:
CX(t, s) ≜ E(Xt − mX(t))(Xs − mX(s))
• cross-covariance function, ccvf:
RXY(t, s) ≜ E(Xt − mX(t))(Ys − mY(s))
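An empirical sketch of the auto-covariance function for a white-noise sequence (CX(τ) should be close to σ² at lag 0 and near 0 at other lags):

```python
import random
random.seed(2)

n = 100000
x = [random.gauss(0.0, 1.0) for _ in range(n)]   # white noise, sigma = 1
m = sum(x) / n

def acvf(lag):
    """Sample auto-covariance at the given lag."""
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / (n - lag)

assert abs(acvf(0) - 1.0) < 0.05   # variance at lag 0
assert abs(acvf(5)) < 0.05         # uncorrelated across time
```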
Random process
• Stationarity
• (strict-sense) stationary, sss
P((Xt1+τ , · · · ,Xtk+τ ) ∈ B) = P((Xt1 , · · · ,Xtk ) ∈ B)
• If Xt is strict-sense stationary,
• mX (t + τ) = mX (t)
• RX (t + τ, s + τ) = RX (t, s)
• CX (t + τ, s + τ) = CX (t, s)
• If Xt is wide-sense stationary,
• mX (t + τ) = mX (t)
• RX (t + τ, s + τ) = RX (t, s)
Random process
• If Xt is wide-sense stationary,
• mX (t) = mX
• RX (t, s) = RX (t − s) = RX (τ)
• CX (t, s) = CX (t − s) = CX (τ)
• In (general) Gaussian processes, wss is assumed.
• RX (t, s) corresponds to a kernel function, i.e., k(t, s).
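As a concrete illustration of these definitions, the sketch below estimates the auto-covariance of a wss process and compares it to the closed form, showing that CX(t, s) depends only on the lag τ = t − s. The AR(1) model and all parameter choices are illustrative assumptions, not from the slides.

```python
import numpy as np

# An AR(1) process X_t = a X_{t-1} + e_t with |a| < 1 is wide-sense
# stationary in steady state, with
#   m_X = 0   and   C_X(tau) = sigma^2 * a^|tau| / (1 - a^2).
rng = np.random.default_rng(0)
a, sigma, n = 0.8, 1.0, 200_000

x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + sigma * rng.standard_normal()
x = x[1000:]  # discard burn-in so the process is (approximately) stationary

def acvf(x, tau):
    """Sample auto-covariance C_X(tau) = E[(X_t - m_X)(X_{t+tau} - m_X)]."""
    m = x.mean()
    return np.mean((x[:len(x) - tau] - m) * (x[tau:] - m))

theory = lambda tau: sigma ** 2 * a ** abs(tau) / (1 - a ** 2)
print(acvf(x, 0), theory(0))  # both near sigma^2 / (1 - a^2)
print(acvf(x, 1), theory(1))
```

Because the process is wss, the estimate at lag τ matches the theoretical CX(τ) regardless of where in the series we start measuring.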
Functional analysis
Functional analysis
Figure 3: Mathematical spaces (copyright to Kyungmin Noh).
Functional analysis
• Vector space: space with algebraic structures (addition, scalar
multiplication, ...)
• Metric space: space with a metric (distance)
• Normed space: space with a norm (size)
• Inner-product space: space with an inner-product (similarity)
• Hilbert space: complete inner-product space
Functional analysis
• We will go through a number of terminologies and theorems.
1. Inner product
2. Hilbert space
3. Kernel
4. Positive definite
5. Eigenfunction and eigenvalue
6. Mercer’s theorem
7. Bochner’s theorem
8. Reproducing kernel Hilbert space (RKHS)
9. Moore-Aronszajn theorem
10. Representer theorem
Functional analysis
• Definition (inner product)
• Let H be a vector space over R. A function 〈·, ·〉H : H × H → R is an inner product on H if
  1. Linear: 〈α1f1 + α2f2, g〉H = α1〈f1, g〉H + α2〈f2, g〉H
  2. Symmetric: 〈f, g〉H = 〈g, f〉H
  3. 〈f, f〉H ≥ 0, and 〈f, f〉H = 0 if and only if f = 0.
• Note that a norm can be naturally defined from the inner product:
  ‖f‖H ≜ √〈f, f〉H
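A minimal numeric sanity check of the three axioms, using the ordinary dot product on R³ as the (assumed) inner product, together with the induced norm:

```python
import numpy as np

# Check the inner-product axioms for the standard dot product on R^3.
rng = np.random.default_rng(1)
f1, f2, g = rng.standard_normal((3, 3))  # three random vectors in R^3
a1, a2 = 2.0, -0.5

ip = lambda u, v: float(np.dot(u, v))

# 1. Linearity in the first argument
assert np.isclose(ip(a1 * f1 + a2 * f2, g), a1 * ip(f1, g) + a2 * ip(f2, g))
# 2. Symmetry
assert np.isclose(ip(f1, g), ip(g, f1))
# 3. Positive definiteness
assert ip(f1, f1) >= 0
# Induced norm: ||f|| = sqrt(<f, f>)
assert np.isclose(np.sqrt(ip(f1, f1)), np.linalg.norm(f1))
```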
Don’t panic.
Functional analysis
• Definition (Hilbert space)
• An inner-product space that contains the limits of all its Cauchy sequences.
⇒ Complete space
⇒ Always possible to fill all the holes.
⇒ R is complete; Q is not.
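The incompleteness of Q can be made concrete: Newton's iteration for √2 is a Cauchy sequence that stays in Q at every step, yet its limit √2 lies outside Q. A small sketch (the use of Python's `Fraction` for exact rational arithmetic is an illustrative choice):

```python
from fractions import Fraction

# Newton's iteration x_{k+1} = (x_k + 2/x_k) / 2 converges to sqrt(2).
# Every iterate is an exact rational, but the limit is irrational:
# Q has a "hole" that the completion R fills.
x = Fraction(1)
for _ in range(6):
    x = (x + 2 / x) / 2          # stays in Q exactly at every step
    assert isinstance(x, Fraction)

print(float(x))                  # numerically indistinguishable from sqrt(2)
assert abs(float(x) - 2 ** 0.5) < 1e-12
assert x * x != 2                # yet the limit sqrt(2) is never reached in Q
```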
Functional analysis
• Definition (Kernel)
• Let X be a non-empty set. A function k : X × X → R is a kernel if there exists a Hilbert space H and a map φ : X → H such that, ∀x, x′ ∈ X,
  k(x, x′) ≜ 〈φ(x), φ(x′)〉H.
• Note that there is almost no condition on X.
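For a concrete (assumed) example, the homogeneous polynomial kernel on R² admits an explicit finite-dimensional feature map, so the definition can be checked directly:

```python
import numpy as np

# The polynomial kernel k(x, x') = (x^T x')^2 on R^2 has the explicit
# feature map  phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)  into  H = R^3,
# so that  k(x, x') = <phi(x), phi(x')>_H.
def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k(x, xp):
    return float(np.dot(x, xp)) ** 2

x, xp = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
assert np.isclose(k(x, xp), np.dot(phi(x), phi(xp)))  # both equal 4.0 here
```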
Functional analysis
• Sums and products of kernels are also kernels.
• Kernels can be defined in terms of sequences φ ∈ ℓ2, i.e., ∑∞i=1 φi²(x) < ∞.
• Theorem
• Given a sequence of functions (φi(x))i≥1 in ℓ2, where φi : X → R is the i-th coordinate of φ(x),
  k(x, x′) ≜ ∑∞i=1 φi(x) φi(x′)
  is a kernel.
• This is often used as an intuitive interpretation of a kernel function.
Functional analysis
• Let Tk be an operator defined as
  (Tk f)(x) = ∫X k(x, x′) f(x′) dµ(x′)
  where µ(·) denotes a measure (dµ(x′) → dx′).
• Tk can be viewed as a mapping between spaces of functions:
  Tk : L2(X, µ) → L2(X, µ).
• Once a kernel k(·, ·) is defined, the mapping Tk is defined accordingly.
Functional analysis
• Definition (positive definite)
• A kernel is said to be positive definite if
  ∫∫ k(x, x′) f(x) f(x′) dµ(x) dµ(x′) ≥ 0
  for all f ∈ L2(X, µ).
Functional analysis
• Definition (eigenfunction and eigenvalue)
• Given a kernel function k(·, ·), suppose
  ∫ k(x, x′) φ(x) dµ(x) = λ φ(x′).
  Then φ(x) and λ are an eigenfunction and eigenvalue of the kernel k(·, ·).
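Discretizing the eigenfunction equation on a grid turns Tk into a matrix whose eigenpairs approximate the eigenfunctions and eigenvalues of the kernel. The sketch below is a Nyström-style approximation; the RBF kernel, domain, and grid size are illustrative assumptions:

```python
import numpy as np

# Discretize  integral k(x, x') phi(x) dmu(x) = lambda phi(x')  on a uniform
# grid over [0, 1]: T_k becomes the matrix (K * dx), whose eigenvectors and
# eigenvalues approximate the eigenfunctions/eigenvalues of the kernel.
n = 400
xs = np.linspace(0.0, 1.0, n)
dx = xs[1] - xs[0]
K = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / 0.1 ** 2)  # RBF kernel

evals, evecs = np.linalg.eigh(K * dx)
evals = evals[::-1]  # eigh returns ascending order; sort descending

# For a positive-definite kernel the spectrum is (numerically) nonnegative
# and decays rapidly toward zero.
assert evals[0] > 0.05
assert evals.min() > -1e-10
print(evals[:4])
```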
Functional analysis
• Theorem (Mercer)
• Let (X, µ) be a finite measure space and k ∈ L∞(X², µ²) be a kernel such that Tk : L2(X, µ) → L2(X, µ) is positive definite.
• Let φi ∈ L2(X, µ) be the normalized eigenfunctions of Tk associated with the eigenvalues λi > 0. Then:
  1. The eigenvalues (λi)∞i=1 are absolutely summable.
  2. k(x, x′) = ∑∞i=1 λi φi(x) φi(x′)
     holds µ²-almost everywhere, where the series converges absolutely and uniformly µ²-almost everywhere.
• Absolute summability is more important than it seems.
• SB: Mercer's theorem can be interpreted as an infinite-dimensional SVD.
Functional analysis
• Theorem (Kernels are positive definite)
• Let H be a Hilbert space, X be a non-empty set, and φ : X → H. Then 〈φ(x), φ(x′)〉H is positive definite.
• The converse also holds: a positive definite k(x, x′) is an inner product in some H between φ(x) and φ(x′).
Functional analysis
• Theorem (Bochner)
• Let f be a bounded continuous function on Rd. Then f is positive semidefinite if and only if it is the (inverse) Fourier transform of a nonnegative and finite Borel measure µ, i.e.,
  f(x) = ∫Rd e^{iωᵀx} µ(dω).
• What does this mean?
Functional analysis
• Corollary (Bochner)
• If we have an isotropic kernel function, i.e., k(x, x′) = kI(t = |x − x′|), then showing the non-negativity of the Fourier series of kI(t) is equivalent to showing the positive definiteness of k(x, x′).
• Example:
  k(x, x′) = cos((π/2)|x − x′|).
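One practical consequence of Bochner's theorem is the random Fourier feature approximation of Rahimi and Recht: for the RBF kernel (an illustrative choice), the spectral measure is Gaussian, so a Monte Carlo average of cosines approximates the kernel:

```python
import numpy as np

# By Bochner, the RBF kernel k(x, x') = exp(-||x - x'||^2 / 2) is the
# Fourier transform of a standard Gaussian measure, so sampling
# w_i ~ N(0, I) gives  k(x, x') ~= (1/D) sum_i cos(w_i^T (x - x')).
rng = np.random.default_rng(0)
d, D = 3, 200_000
W = rng.standard_normal((D, d))  # D spectral samples in R^d

x, xp = rng.standard_normal(d), rng.standard_normal(d)
approx = np.mean(np.cos(W @ (x - xp)))       # Monte Carlo estimate
exact = np.exp(-0.5 * np.sum((x - xp) ** 2)) # closed-form RBF kernel
print(approx, exact)
```

The Monte Carlo error shrinks as 1/√D, independent of the input dimension, which is what makes this a useful large-scale kernel approximation.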
Functional analysis
• Definition (reproducing kernel Hilbert space)
• Let H be a Hilbert space of R-valued functions on X. A function k : X × X → R is a reproducing kernel on H, and H is a reproducing kernel Hilbert space, if
  1. ∀x ∈ X, k(·, x) ∈ H
  2. ∀x ∈ X, ∀f ∈ H, 〈f(·), k(·, x)〉H = f(x) (reproducing property)
  3. ∀x, x′ ∈ X, k(x, x′) = 〈k(·, x), k(·, x′)〉H
• What does this indicate?
Functional analysis
• Suppose we have an RKHS H, f(·) ∈ H, and k(·, x) ∈ H.
• Then the reproducing property indicates that the evaluation of f(·) at x, i.e., f(x), is the inner product of k(·, x) and f(·) itself:
  f(x) = 〈f, k(·, x)〉H.
• Recall Mercer's theorem: k(x, x′) = ∑∞i=1 λi φi(x) φi(x′). Then
  f(x) = 〈f, ∑∞i=1 λi φi(·) φi(x)〉H
       = ∑∞i=1 λi 〈f, φi(·)〉H φi(x)
       = ∑∞i=1 λ̃i φi(x)
  where λ̃i ≜ λi 〈f, φi(·)〉H.
Functional analysis
• Theorem (Moore-Aronszajn)
• Let X be a non-empty set. Then, for every positive-definite function k(·, ·) on X × X, there exists a unique RKHS whose reproducing kernel is k, and vice versa.
• This indicates:
reproducing kernels ⇔ positive definite function ⇔ RKHS
Functional analysis
• Definition (another view of RKHS)
• Consider the space of functions H defined as
  H = { f(x) = ∑ni=1 αi k(x, xi) : n ∈ N, xi ∈ X, αi ∈ R }.
• Let g(x) = ∑n′j=1 α′j k(x, x′j); then we define the inner product
  〈f, g〉H = ∑ni=1 ∑n′j=1 αi α′j k(xi, x′j).
• We can easily demonstrate the reproducing property:
  〈k(·, x), f(·)〉H = 〈k(·, x), ∑ni=1 αi k(·, xi)〉H
                   = ∑ni=1 αi k(x, xi)
                   = f(x).
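This construction can be checked numerically. In the sketch below, the RBF kernel and the centers are illustrative assumptions; k(·, x) is treated as a member of H with a single center x and coefficient 1, so the defining inner product reduces to ∑i αi k(x, xi) = f(x):

```python
import numpy as np

# Reproducing property in the span H = { f = sum_i alpha_i k(., x_i) }
# with  <f, g>_H = sum_ij alpha_i alpha'_j k(x_i, x'_j).
k = lambda x, xp: np.exp(-0.5 * (x - xp) ** 2)  # RBF kernel on R (assumed)

xi = np.array([-1.0, 0.0, 2.0])                 # centers defining f
alpha = np.array([0.5, -1.0, 2.0])              # coefficients of f

f = lambda x: float(np.sum(alpha * k(x, xi)))

# k(., x) has a single center x with coefficient 1, so
#   <k(., x), f>_H = sum_i 1 * alpha_i * k(x, x_i) = f(x).
x = 0.7
inner = float(np.sum(1.0 * alpha * k(x, xi)))   # <k(., x), f>_H
assert np.isclose(inner, f(x))
```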
Functional analysis
• Theorem (Representer)
• Let X be a nonempty set and k(·, ·) be a positive definite kernel with corresponding RKHS Hk. Given training samples D = {(x1, y1), . . . , (xn, yn)} ⊂ X × R, a strictly monotonically increasing real-valued function g : [0, ∞) → R, and an arbitrary empirical risk function E : (X × R²)ⁿ → R ∪ {∞}, any minimizer
  f∗ = arg min_{f ∈ Hk} E((x1, y1, f(x1)), . . . , (xn, yn, f(xn))) + g(‖f‖)
  admits a representation of the form
  f∗(·) = ∑ni=1 αi k(·, xi)
  where αi ∈ R.
Functional analysis
• Example
  1. Given D = {(xi, yi)}ni=1, solve
     min_{f ∈ Hk} (1/2) ∑ni=1 (f(xi) − yi)² + (γ/2)‖f‖²_Hk.  (1)
  2. From the representer theorem, solving (1) becomes
     min_{α ∈ Rn} (1/2) ∑ni=1 (∑nj=1 αj k(xi, xj) − yi)² + (γ/2)‖f‖²_Hk.  (2)
  3. Represent (2) in matrix form, using ‖f‖²_Hk = αᵀ KXX α:
     min_{α ∈ Rn} (1/2)‖KXX α − Y‖²₂ + (γ/2) αᵀ KXX α.  (3)
  4. Setting the gradient of (3) to zero: KXX(KXX α − Y) + γ KXX α = 0.
  5. Finally, α = (KXX + γI)⁻¹ Y, where f(x) = ∑ni=1 αi k(x, xi).
• Note that the form of this solution is identical to the mean function of Gaussian process regression.
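The closed-form solution above (kernel ridge regression) takes only a few lines to implement; the RBF kernel, the sin-plus-noise data, and γ below are illustrative assumptions:

```python
import numpy as np

# Kernel ridge regression following steps 1-5 above.
def rbf(A, B, ell=0.5):
    # Gram matrix of an RBF kernel between two 1-D point sets
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 40))
Y = np.sin(X) + 0.1 * rng.standard_normal(40)
gamma = 0.1

K = rbf(X, X)
alpha = np.linalg.solve(K + gamma * np.eye(len(X)), Y)  # alpha = (K_XX + gamma I)^{-1} Y

def f(xs):
    # f(x) = sum_i alpha_i k(x, x_i), the representer-theorem form
    return rbf(np.atleast_1d(xs), X) @ alpha

print(np.mean((f(X) - Y) ** 2))  # small training residual (nonzero due to gamma)
```

With the RBF kernel, α here is exactly the weight vector of the Gaussian process regression posterior mean with noise variance γ, which is the correspondence the note above points out.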
Questions?
Text book
References i