Functional Data Analysis: Introduction I
Definition (functional data)
Functional data are observed data that can be regarded as discrete observations of functions varying over a continuum.
Reference: J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Springer, 2006.
Functional Data Analysis: Introduction II
Functional Data Analysis (FDA) aims at:
representing the data in ways that help further analysis,
displaying the data so as to highlight various characteristics,
studying important sources of patterns and variation among the data.
Functional Data Analysis: Introduction III
Example: Height of Girls
Functional Data Analysis: Introduction IV
10 subjects took part in the study.
Each record has 31 discrete height values that are not equally spaced.
There is uncertainty, or noise, in the collected height values (about 3 mm).
The underlying process can be assumed to be a smooth function μ_j(t) for each subject j taking part in the study, and observations have been collected from the stochastic process
$$s_j(t) = \mu_j(t) + \epsilon_j(t)$$
Functional Data Analysis: Introduction V
For each record j, it is interesting to calculate an estimate of the function μ_j(t) (linear smoothing methods).
Given a family of functions {μ_j(t)}_{j=1,··· ,J}, it is interesting to investigate the variation within this family of functions (functional Principal Component Analysis).
For simplicity, we consider temporal stochastic processes here. All methods can easily be extended to spatio-temporal stochastic processes.
Linear Smoothing: Introduction I
We assume we have observations {(t(i), s(i))}_{i=1,··· ,n}
Definition (Linear smoothing)
A linear smoother estimates the function value s(t) by a linear combination of
the discrete observations:
$$s(t) = \sum_{i=1}^{n} \lambda_i(t)\, s(i) = \langle \vec{\lambda}(t) \mid \vec{s} \rangle$$
The behaviour of the smoother at t is controlled by the weights λi(t).
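As a minimal sketch of this idea in Python (NumPy only; all function names here are illustrative, not from the slides), a linear smoother is fully specified by how it builds the weight vector λ(t):

```python
import numpy as np

def linear_smooth(t_eval, t_obs, s_obs, weight_fn):
    """Evaluate a linear smoother: s_hat(t) = sum_i lambda_i(t) * s(i).

    weight_fn(t, t_obs) must return the weight vector lambda(t) for one t.
    """
    return np.array([weight_fn(t, t_obs) @ s_obs for t in t_eval])

# Illustrative weights: average of the 3 observations nearest to t.
def nearest3_weights(t, t_obs):
    w = np.zeros_like(t_obs, dtype=float)
    w[np.argsort(np.abs(t_obs - t))[:3]] = 1.0 / 3.0
    return w

t_obs = np.linspace(0.0, 1.0, 11)
s_obs = np.sin(2 * np.pi * t_obs)
s_hat = linear_smooth(np.array([0.5]), t_obs, s_obs, nearest3_weights)
```

Different choices of `weight_fn` recover the different smoothers discussed below (basis regression, Nadaraya-Watson), all as special cases of the same inner product.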
Linear Smoothing: Introduction II
Examples of linear smoothing methods:
1. Kriging
2. Regression on a basis of functions
3. Nadaraya-Watson estimator
Note that these methods make different hypotheses concerning the nature of
the noise ε.
Linear Smoothing: Regression on basis of functions I
A model can be proposed for the deterministic part of the process as follows:
$$\mu(t) = \sum_{k=0}^{K} \theta_k\, \phi_k(t) = \langle \vec{\phi}(t) \mid \vec{\theta} \rangle$$
The standard model for the error ε is N(0, σ²) and independent of time (so
that E[s(t)] = μ(t) and Var[s(t)] = σ²). The collected observations are
{(t(i), s(i))}_{i=1,··· ,n}
Linear Smoothing: Regression on basis of functions II
The coefficients {θ_k}_{k=0,··· ,K} are estimated using least squares:
$$\vec{\theta} = (\Phi^T \Phi)^{-1} \Phi^T \vec{s}$$
where \vec{s} = [s(1), s(2), · · · , s(n)]^T and Φ is the n × (K + 1) matrix collecting the
values {φ_k(t(i))}. So the estimate for E[s(t)] is:
$$s(t) = \langle \vec{\phi}(t) \mid \vec{\theta} \rangle = \underbrace{\vec{\phi}(t)^T (\Phi^T \Phi)^{-1} \Phi^T}_{\vec{S}(t)^T}\, \vec{s}$$
Exercise: What are the basis functions {φk}k=0,··· ,K that can be used?
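One possible answer to the exercise is the monomial basis; a minimal least-squares sketch in Python (NumPy only; helper names and the test signal are illustrative):

```python
import numpy as np

def monomial_design(t, K):
    """n x (K+1) design matrix Phi with Phi[i, k] = t(i)**k."""
    return np.vander(t, N=K + 1, increasing=True)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
s = 1.0 + 2.0 * t + rng.normal(scale=0.05, size=t.size)  # noisy line

Phi = monomial_design(t, K=1)
# lstsq computes the least-squares solution (Phi^T Phi)^-1 Phi^T s.
theta, *_ = np.linalg.lstsq(Phi, s, rcond=None)
s_hat = Phi @ theta  # smoothed values <phi(t)|theta>
```

Because the fitted values depend linearly on the observations through S(t)^T = φ(t)^T (Φ^T Φ)^{-1} Φ^T, this is indeed a linear smoother.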
Linear Smoothing: Regression on basis of functions III
Example: Vancouver precipitation data fitted with 13 Fourier basis functions.
Linear Smoothing: Regression on basis of functions IV
Choice of basis expansion.
When performing least-squares fitting, the functions φ_0(t), φ_1(t), · · · form a
basis system for s. The choice of this basis system is important, in particular if
you want to explore time derivatives of s(t).
Linear Smoothing: Regression on basis of functions V
Monomial basis: φ_0(t) = 1, φ_1(t) = t, φ_2(t) = t², · · · , φ_K(t) = t^K
Properties
◮ numerically difficult for K > 6 (the design matrix becomes ill-conditioned)
◮ derivatives of φ_k(t) get simpler, but this is often not a desirable property
when fitting real-world data
Linear Smoothing: Regression on basis of functions VI
[Figure: monomial-basis fits for K = 0, 1, 2, 3]
Linear Smoothing: Regression on basis of functions
VII
Fourier Basis
{1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt), sin(3ωt), cos(3ωt), · · · , sin(Kωt), cos(Kωt)}
Properties
◮ natural basis for periodic data
◮ derivatives retain the same complexity (sines and cosines differentiate into
cosines and sines)
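A sketch of how such a Fourier design matrix can be built in Python (NumPy; the helper name and the yearly period are illustrative assumptions):

```python
import numpy as np

def fourier_design(t, K, period):
    """Columns: 1, sin(w t), cos(w t), ..., sin(K w t), cos(K w t)."""
    w = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(k * w * t))
        cols.append(np.cos(k * w * t))
    return np.column_stack(cols)

# Daily grid over one year, K = 6 harmonics -> 1 + 2*6 = 13 basis
# functions, matching the 13 Fourier bases of the Vancouver example.
t = np.linspace(0.0, 365.0, 365, endpoint=False)
Phi = fourier_design(t, K=6, period=365.0)
```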
Linear Smoothing: Regression on basis of functions
VIII
[Figure: Fourier-basis fits for K = 3, 4, 5, 6]
Linear Smoothing: Regression on basis of functions IX
Splines. Splines are polynomial segments joined end-to-end. They are
constrained to be smooth at the joints (called knots). The order (order =
degree+1) of the polynomial segments and the location of the knots
define the system.
Linear Smoothing: Regression on basis of functions X
[Figure: spline fits for K = 1, 2, 3, 4]
Linear Smoothing: Regression on basis of functions XI
Other bases:
wavelets
...
How to choose the number K of basis functions?
information criteria: AIC, BIC
cross-validation
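A leave-one-out cross-validation sketch for choosing K (Python, NumPy only; the monomial basis, the test signal, and the noise level are made-up illustrations):

```python
import numpy as np

def loo_cv_score(t, s, K):
    """Leave-one-out CV error for a monomial-basis least-squares fit."""
    err = 0.0
    for i in range(t.size):
        mask = np.arange(t.size) != i
        Phi = np.vander(t[mask], N=K + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(Phi, s[mask], rcond=None)
        pred = np.vander(t[i:i + 1], N=K + 1, increasing=True) @ theta
        err += (s[i] - pred[0]) ** 2
    return err / t.size

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)
s = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.size)

scores = {K: loo_cv_score(t, s, K) for K in range(0, 8)}
best_K = min(scores, key=scores.get)  # K with smallest CV error
```

Small K underfits (large CV error), very large K overfits; the CV curve trades the two off.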
Linear Smoothing: Nadaraya-Watson estimator I
The model is s(t) = μ(t) + ε(t) with E[ε(t)] = 0.
The estimate of s(t) is computed as the expectation of s at time t:
$$E[s \mid t] = E[s(t)] = E[\mu(t) + \epsilon(t)] = \mu(t)$$
By definition of expectation:
$$E[s \mid t] = \int s\, p_{s|t}(s \mid t)\, ds = \frac{\int s\, p_{st}(s, t)\, ds}{p_t(t)}$$
Note how Bayes' formula is used.
Linear Smoothing: Nadaraya-Watson estimator II
Since we don't know the true densities p_st(s, t) and p_t(t), the idea of the
Nadaraya-Watson estimator is to replace them by density estimates such
that the integral is easy to compute and the solution provides a smooth
estimate of s(t).
Consequently the Nadaraya-Watson estimator can be written:
$$E[s(t)] \simeq \frac{\int s\, p_{st}(s, t)\, ds}{p_t(t)} = \frac{\sum_{i=1}^{n} \frac{1}{h}\, k\!\left(\frac{t - t(i)}{h}\right) s(i)}{\sum_{i=1}^{n} \frac{1}{h}\, k\!\left(\frac{t - t(i)}{h}\right)} = \sum_{i=1}^{n} S_i(t)\, s(i) = s(t)$$
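The estimator above can be sketched in Python (NumPy; the Gaussian kernel and the hand-picked bandwidth are illustrative choices):

```python
import numpy as np

def nadaraya_watson(t_eval, t_obs, s_obs, h):
    """s_hat(t) = sum_i S_i(t) s(i), with normalized kernel weights S_i(t)."""
    # Gaussian kernel k(u) = exp(-u^2/2) / sqrt(2 pi); the 1/h factors
    # cancel in the ratio, so they are absorbed into the normalization.
    u = (t_eval[:, None] - t_obs[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    weights = k / k.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ s_obs

rng = np.random.default_rng(2)
t_obs = np.linspace(0.0, 1.0, 50)
s_obs = t_obs**2 + rng.normal(scale=0.02, size=t_obs.size)  # noisy t^2
s_hat = nadaraya_watson(np.array([0.5]), t_obs, s_obs, h=0.05)
```

Note that the weights S_i(t) depend on t but not on the observed values s(i), which is exactly what makes this a linear smoother.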
Linear Smoothing: Nadaraya-Watson estimator III
Choosing the kernel k to be the Dirac delta function is equivalent to using the
empirical density estimates for p_st(s, t) and p_t(t).
The resulting estimate of s(t) would not be smooth with the Dirac kernel,
so other kernels are used (e.g. Gaussian, uniform, quadratic, triangle,
Epanechnikov, etc.).
Linear Smoothing: Nadaraya-Watson estimator IV
Definition (Kernel Density Estimate)
Consider a random variable x for which observations {x(i)}_{i=1,··· ,n} have been
collected. Choosing an even function k such that
$$\int k(x)\, dx = 1 \quad \text{and} \quad k(x) \ge 0 \;\; \forall x,$$
a Kernel Density Estimate (KDE) for the density function p_x is:
$$p_x(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - x(i)}{h}\right)$$
h is called the bandwidth and is a parameter controlling the level of smoothing.
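A minimal sketch of this definition in Python (NumPy; the Gaussian kernel and the bandwidth are illustrative choices):

```python
import numpy as np

def kde(x_eval, x_obs, h):
    """p_hat(x) = (1/(n h)) sum_i k((x - x(i)) / h), Gaussian kernel."""
    u = (x_eval[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (x_obs.size * h)

rng = np.random.default_rng(3)
x_obs = rng.normal(size=2000)  # sample from a standard normal
p_hat = kde(np.array([0.0]), x_obs, h=0.3)
# The true standard-normal density at 0 is 1/sqrt(2 pi), about 0.399.
```

A larger h gives a smoother but more biased estimate; a smaller h gives a noisier one, the same trade-off as the Nadaraya-Watson bandwidth.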
Linear Smoothing: Conclusion and Extensions I
Conclusion:
Nadaraya-Watson estimator: the bandwidth h needs to be chosen/estimated.
Regression on a basis of functions: {θ_k} needs to be estimated, the basis
{φ_k(t)} needs to be chosen, and the number K of functions needs to be
selected.