Empirical Modeling

Dongsup Kim
Department of Biosystems, KAIST
Fall, 2004
Empirical modeling

Moore's law: Gordon Moore made his famous observation in 1965, just four years after the first planar integrated circuit was discovered. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.

From http://www.intel.com/research/silicon/mooreslaw.htm
Covariance and correlation

Consider n pairs of measurements on variables x and y:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

A measure of linear association between the measurements of variables x and y is the "sample covariance":

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

- If $s_{xy} > 0$: positively correlated
- If $s_{xy} < 0$: negatively correlated
- If $s_{xy} = 0$: uncorrelated

The sample linear correlation coefficient ("Pearson's product moment correlation coefficient"):

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le 1$$
Correlation

| X (Years Experience) | Y (Salary in $1000s) |
| --- | --- |
| 1 | 20 |
| 16 | 83 |
| 21 | 90 |
| 11 | 59 |
| 6 | 43 |
| 3 | 36 |
| 13 | 72 |
| 9 | 64 |
| 8 | 57 |
| 3 | 30 |

$\bar{X} = 9.1$, $\bar{Y} = 55.4$, $s_X = 6.315$, $s_Y = 22.979$, $r_{XY} = 0.972$: a strong relationship.
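As a minimal sketch in plain Python (no external libraries), the correlation reported above can be reproduced straight from the definition of $r_{xy}$:

```python
import math

# Years of experience (x) and salary in $1000s (y), from the table above
x = [1, 16, 21, 11, 6, 3, 13, 9, 8, 3]
y = [20, 83, 90, 59, 43, 36, 72, 64, 57, 30]

def pearson_r(x, y):
    """Sample Pearson correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(round(r, 3))  # matches the slide's r = 0.972
```

Note that the (n−1) factors in $s_{xy}$, $s_x$, and $s_y$ cancel in the ratio, so they can be omitted here.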
Covariance & correlation matrix

Given n measurements on p variables, the sample covariance between the ith and jth variables is

$$s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j), \qquad i = 1, 2, \ldots, p, \quad j = 1, 2, \ldots, p$$

and the covariance matrix is

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

The sample correlation coefficient for the ith and jth variables is

$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}} = \frac{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{n}(x_{kj} - \bar{x}_j)^2}}$$

and the correlation matrix is

$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$
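A brief numpy sketch of these definitions, using a small made-up data matrix (the values below are illustrative only): `np.cov` and `np.corrcoef` implement exactly the $s_{ij}$ and $r_{ij}$ formulas above, and R can also be rebuilt from S directly.

```python
import numpy as np

# Hypothetical data matrix: n = 5 measurements (rows) on p = 3 variables (columns)
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.1, 0.9],
              [3.0, 5.9, 1.6],
              [4.0, 8.2, 1.9],
              [5.0, 9.8, 2.6]])

S = np.cov(X, rowvar=False)       # p x p sample covariance matrix, 1/(n-1) scaling
R = np.corrcoef(X, rowvar=False)  # p x p correlation matrix

# R from S via r_ij = s_ij / sqrt(s_ii * s_jj)
d = np.sqrt(np.diag(S))
R_from_S = S / np.outer(d, d)
```

The diagonal of R is identically 1, since each variable is perfectly correlated with itself.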
In two dimensions
Fitting a line to data

When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line:

$$y = \beta_0 + \beta_1 x$$

The criterion for a best-fit line is least squares. The resulting equation is called the "regression equation", and its graph is called the "regression line". The sum of squares of the errors is

$$SS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2$$

Setting $\partial SS/\partial\hat{\beta}_0 = 0$ and $\partial SS/\partial\hat{\beta}_1 = 0$ gives the least-squares equations:

$$\hat{\beta}_1 = \frac{n\sum_{i}x_i y_i - \sum_{i}x_i\sum_{i}y_i}{n\sum_{i}x_i^2 - \left(\sum_{i}x_i\right)^2} = \frac{s_{xy}}{s_x^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$
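A minimal sketch of the closed-form least-squares equations above, in plain Python on made-up illustration data (points lying roughly on y = 2x):

```python
# Illustration data, invented for this sketch
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

# Slope and intercept from the normal equations
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = sy / n - b1 * sx / n

print(b0, b1)  # intercept near 0.10, slope near 1.98
```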
A measure of fit

Suppose we have data points $(x_i, y_i)$ and modeled (or predicted) points $(x_i, \hat{y}_i)$ from the model $\hat{y} = f(x)$.

The data $\{y_i\}$ show two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.

Total variation in y = variation explained by the model + unexplained variation (error).

Residual sum of squares (variation not explained by the model):

$$SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Regression sum of squares (variation explained by the model):

$$SS_{Reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$

The coefficient of determination:

$$R^2 = \frac{SS_{Reg}}{SS_{Reg} + SS_{Res}}$$
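Continuing the same style of sketch, $R^2$ can be computed from the two sums of squares for a fitted line; the data and coefficients below are invented for illustration (the line is the least-squares fit for these five points):

```python
# Made-up illustration data and its least-squares line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
b0, b1 = 0.10, 1.98

yhat = [b0 + b1 * xi for xi in x]
ybar = sum(y) / len(y)

ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation
ss_reg = sum((yh - ybar) ** 2 for yh in yhat)            # explained variation
r2 = ss_reg / (ss_reg + ss_res)
```

For a least-squares line with an intercept, the decomposition is exact: $SS_{Reg} + SS_{Res}$ equals the total variation $\sum_i (y_i - \bar{y})^2$.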
Principal Component Analysis (PCA)

PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the transformed coordinate system.

The first principal axis points in the direction of maximum variation in the data.

The second principal axis is orthogonal to the first and points in the direction of maximum variation among the remaining allowable directions, and so on.

It can be used to:
- Reduce the number of dimensions in data.
- Find patterns in high-dimensional data.
- Visualize data of high dimensionality.
PCA, II

Assume X is an n × p matrix and is "centered (zero mean)".

Let a be the p × 1 column vector of projection weights (unknown at this point) that result in the largest variance when the data X are projected along a.

We can express the projected values onto a of all data vectors in X as Xa.

Now define the variance along a as

$$\sigma_a^2 = (\mathbf{X}\mathbf{a})^T(\mathbf{X}\mathbf{a}) = \mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{a} = \mathbf{a}^T\mathbf{S}\mathbf{a}$$

We wish to maximize the variance under the constraint $\mathbf{a}^T\mathbf{a} = 1$: optimization with constraints → method of Lagrange multipliers:

$$u = \mathbf{a}^T\mathbf{S}\mathbf{a} - \lambda(\mathbf{a}^T\mathbf{a} - 1), \qquad \frac{\partial u}{\partial \mathbf{a}} = 2\mathbf{S}\mathbf{a} - 2\lambda\mathbf{a} = 0 \quad\Rightarrow\quad \mathbf{S}\mathbf{a} = \lambda\mathbf{a}$$

So a is an eigenvector of S, and the variance along a is the corresponding eigenvalue λ.
Example, 2D

Covariance matrix:

$$\mathbf{S} = \begin{pmatrix} 0.9701 & 0.9596 \\ 0.9596 & 1.2049 \end{pmatrix}$$

Decomposition:

$$\lambda_1 = 2.0592, \ \mathbf{v}_1 = \begin{pmatrix} 0.66 \\ 0.74 \end{pmatrix}; \qquad \lambda_2 = 0.12, \ \mathbf{v}_2 = \begin{pmatrix} 0.74 \\ -0.66 \end{pmatrix}$$

PCA:

$$z_1 = 0.66\,x + 0.74\,y, \qquad z_2 = 0.74\,x - 0.66\,y$$

From CMU 15-385 Computer Vision by Tai Sing Lee
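The decomposition can be checked numerically with numpy's symmetric eigensolver. Since the entries of S shown on the slide are themselves rounded, the computed eigenvalues and eigenvectors agree with the printed ones only approximately:

```python
import numpy as np

# Covariance matrix from the 2D example above
S = np.array([[0.9701, 0.9596],
              [0.9596, 1.2049]])

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(S)
lam2, lam1 = eigvals          # smaller, larger eigenvalue
v1 = eigvecs[:, 1]            # first principal axis (largest eigenvalue)
v2 = eigvecs[:, 0]            # second principal axis
```

The projections $z_1 = \mathbf{v}_1^T(x, y)^T$ and $z_2 = \mathbf{v}_2^T(x, y)^T$ then match the slide's transformed coordinates up to sign and rounding.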
PCA, III

If $\mathbf{S} = \{s_{ik}\}$ is the p × p sample covariance matrix with eigenvalue–eigenvector pairs $(\lambda_1, \mathbf{v}_1), (\lambda_2, \mathbf{v}_2), \ldots, (\lambda_p, \mathbf{v}_p)$, the ith principal component is given by

$$y_i = \mathbf{v}_i^T\mathbf{x} = \sum_{j=1}^{p} v_{ji}\,x_j, \qquad i = 1, 2, \ldots, p$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and x is the p-dimensional vector formed by the random variables $x_1, x_2, \ldots, x_p$.

Also:

$$\mathrm{var}[y_i] = \lambda_i, \quad i = 1, 2, \ldots, p; \qquad \mathrm{cov}[y_i, y_j] = 0, \quad i \ne j$$

$$\text{Total variance} = \sum_{i=1}^{p} s_{ii} = \sum_{i=1}^{p} \lambda_i$$

$$\text{Proportion of total variance explained by the } k\text{th principal component} = \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}$$
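These properties can be verified numerically on a small invented dataset: the variances of the principal-component scores equal the eigenvalues, and the total variance (trace of S) is preserved.

```python
import numpy as np

# Small made-up dataset: n = 6 observations on p = 3 variables
X = np.array([[ 1.2, -0.5,  0.3],
              [-0.7,  0.8, -0.2],
              [ 0.4,  1.1,  0.9],
              [-1.0, -0.9, -0.6],
              [ 0.9,  0.2,  0.1],
              [-0.8, -0.7, -0.5]])
X = X - X.mean(axis=0)                 # center the data

S = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(S)         # eigenvalues in ascending order

# Principal-component scores: project the data onto the eigenvectors
Y = X @ V
# cov(Y) is diagonal, with the eigenvalues of S on the diagonal
```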
Applications

- Dimensionality reduction
- Image compression
- Pattern recognition
- Gene expression data analysis
- Molecular dynamics simulation
- ...
Dimensional reduction

We can throw $v_3$ away, keep $\mathbf{w} = [v_1\ v_2]$, and still represent the information almost equally well.

$v_1$ and $v_2$ also provide good dimensions in which different objects/textures form nice clusters in this 2D space.

From CMU 15-385 Computer Vision by Tai Sing Lee
Image compression, I

A set of N images, $I_1, I_2, \ldots, I_N$, each of which has n pixels.
- Dataset of N dimensions and n observations
- Corresponding pixels form vectors of intensities

Expand each of them as a series

$$I_i = \sum_{j=1}^{N} c_{ij}\,\mathbf{v}_j$$

where the optimal set of basis vectors is chosen to minimize the reconstruction error

$$\text{error} = \sum_{i=1}^{N}\left(I_i - \sum_{j=1}^{k} c_{ij}\,\mathbf{v}_j\right)^2, \qquad k < N$$

The principal components of the set form the optimal basis.
- PCA produces N eigenvectors and eigenvalues.
- Compress: choose a limited number (k < N) of components.
- Information is lost when recreating the original data.
Image compression, II

Given a large set of 8×8 image patches, convert each patch into a vector by stacking its columns into one column vector.

Compute the covariance matrix and transform into a set of new bases by PCA.

Since the eigenvalues of S drop rapidly, we can represent the image more efficiently in the new coordinates, using the eigenvectors (principal components) $v_1, \ldots, v_k$ with $k \ll 64$ (k ≈ 10) as bases.

Then $I = a_1 v_1 + a_2 v_2 + \cdots + a_k v_k$.

The idea is that you store only about 10 code words, each an 8×8 image basis; then you can transmit a patch with only 10 numbers instead of 64. From CMU 15-385 Computer Vision by Tai Sing Lee
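A hedged end-to-end sketch of this pipeline. Real image patches are not available here, so the code substitutes synthetic 64-dimensional "patches" drawn from a low-rank model (so that a few components really do capture most of the variance, as the slide assumes real patches do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 8x8 patches: 500 patches as 64-dim vectors,
# generated from 5 hidden basis vectors plus a little noise
basis = rng.normal(size=(5, 64))
coeffs = rng.normal(size=(500, 5))
patches = coeffs @ basis + 0.01 * rng.normal(size=(500, 64))

mean = patches.mean(axis=0)
Xc = patches - mean

# Principal components = eigenvectors of the covariance matrix
S = np.cov(Xc, rowvar=False)
eigvals, V = np.linalg.eigh(S)
top = V[:, ::-1][:, :10]        # k = 10 leading components (64 -> 10 numbers)

# Compress (project) and reconstruct
codes = Xc @ top                # 10 code coefficients per patch instead of 64
recon = codes @ top.T + mean
err = np.mean((patches - recon) ** 2)
```

Because the synthetic patches have rank ≈ 5, ten components reconstruct them almost perfectly; with real patches the error depends on how fast the eigenvalue spectrum decays.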
Applications

Representation
- An N × N pixel image becomes a vector $X = (x_1, \ldots, x_{N^2})$
- $x_i$ is an intensity value

PCA for pattern identification
- Perform PCA on a matrix of M images.
- Given a new image: which original image is most similar?
- Traditionally: difference between the original image and the new image.
- With PCA: difference between the PCA data and the new image.
- Advantage: the PCA data reflect similarities and differences in the image data.
- Even with omitted dimensions, performance remains good.

PCA for image compression
- M images, each containing N² pixels
- Dataset of M dimensions and N² observations
- Corresponding pixels form vectors of intensities
- PCA produces M eigenvectors and eigenvalues
- Compress: choose a limited number of components
- Information loss when recreating the original data
Interpolation & Extrapolation

Numerical Recipes, Chapter 3.

Consider n pairs of data for variables x and y, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where we don't know an analytic expression for y = f(x). The task is to estimate f(x) for arbitrary x by drawing a smooth curve through the $x_i$'s.
- Interpolation: x is between the largest and smallest of the $x_i$'s.
- Extrapolation: x is outside that range (more dangerous; example: the stock market).

Methods
- Polynomials, rational functions
- Trigonometric interpolation: Fourier methods
- Spline fits

Order: the number of points (minus one) used in an interpolation.
- Increasing the order does not necessarily increase the accuracy.
Polynomial interpolation, I

Straight-line interpolation
- Given two points $(x_1, y_1)$ and $(x_2, y_2)$, use the straight line joining them to find all the missing values in between:

$$y = P_1(x) = y_1 + (y_2 - y_1)\,\frac{x - x_1}{x_2 - x_1}$$

Lagrange interpolation
- First order:

$$y = P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2$$

- Second order:

$$y = P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3$$
Polynomial interpolation, II

In general, the interpolating polynomial of degree N−1 through the N points $y_1 = f(x_1), y_2 = f(x_2), \ldots, y_N = f(x_N)$ is

$$y = P_{N-1}(x) = \frac{(x - x_2)(x - x_3)\cdots(x - x_N)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_N)}\,y_1 + \frac{(x - x_1)(x - x_3)\cdots(x - x_N)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_N)}\,y_2 + \cdots + \frac{(x - x_1)(x - x_2)\cdots(x - x_{N-1})}{(x_N - x_1)(x_N - x_2)\cdots(x_N - x_{N-1})}\,y_N$$
Example, I

| x | y |
| --- | --- |
| 1.1 | 10.6 |
| 1.7 | 15.2 |
| 3.0 | 20.3 |

Evaluating the coefficients $C_i = y_i / \prod_{j \ne i}(x_i - x_j)$:

$$C_1 = \frac{10.6}{(1.1 - 1.7)(1.1 - 3.0)} = 9.2983, \quad C_2 = \frac{15.2}{(1.7 - 1.1)(1.7 - 3.0)} = -19.4872, \quad C_3 = \frac{20.3}{(3.0 - 1.1)(3.0 - 1.7)} = 8.2186$$

The interpolating polynomial is

P(x) = 9.2983(x − 1.7)(x − 3.0) − 19.4872(x − 1.1)(x − 3.0) + 8.2186(x − 1.1)(x − 1.7)

P(2.3) = 9.2983(2.3 − 1.7)(2.3 − 3.0) − 19.4872(2.3 − 1.1)(2.3 − 3.0) + 8.2186(2.3 − 1.1)(2.3 − 1.7) = 18.3813

(Plot: the Lagrange interpolant through the three points, x from 1 to 3, y from 0 to 25.)
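A short sketch of general Lagrange interpolation, checked against this example (working in full precision rather than with the rounded coefficients, the value at x = 2.3 comes out as 18.3814 versus the slide's 18.3813):

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [1.1, 1.7, 3.0]
ys = [10.6, 15.2, 20.3]
p23 = lagrange(xs, ys, 2.3)
print(round(p23, 4))  # 18.3814
```

By construction the interpolant passes exactly through every data point.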
Example, II

What happens if we increase the number of data points? With five points, the coefficient for the second point becomes

$$C_2 = \frac{y_2}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)(x_2 - x_5)}$$

| x | y | C_i |
| --- | --- | --- |
| 1.1 | 10.6 | 28.1765 |
| 1.7 | 15.2 | 129.9145 |
| 3.0 | 20.3 | 6.4208 |
| 1.4 | 13.4 | −116.319 |
| 2.2 | 18.7 | −53.125 |

These coefficients define a fourth-degree polynomial $P_4(x)$, which can be compared with the original $P_2(x)$. The problem is that adding additional points creates "bulges" (oscillations) in the graph.

(Plot: the five-point Lagrange interpolant, x from 1 to 3, y from 0 to 25.)
Rational Function Interpolation

A rational function interpolant is a ratio of two polynomials:

$$R(x) = \frac{P_\mu(x)}{Q_\nu(x)} = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu}$$

| x | y |
| --- | --- |
| −1 | 0.0385 |
| −0.5 | 0.1379 |
| 0 | 1 |
| 0.5 | 0.1379 |
| 1 | 0.0385 |

The tabulated values are consistent with $y = 1/(1 + 25x^2)$, a function with a sharp peak that a rational interpolant can reproduce, whereas a polynomial through the same points oscillates.
Cubic Spline Interpolation

Cubic spline interpolation fits a separate cubic polynomial between each pair of adjacent data points, joined so that the overall function passes through the data and maintains the desired smoothness (it is piecewise cubic with continuous first and second derivatives).

Given a function f defined on [a, b] and a set of nodes $a = x_0 < x_1 < \cdots < x_n = b$, a cubic spline interpolant S for f satisfies:
- S(x) is a cubic polynomial, denoted $S_j(x)$, on the subinterval $[x_j, x_{j+1}]$ for each j = 0, 1, …, n−1;
- $S_j(x_j) = f(x_j)$ for j = 0, 1, …, n;
- $S_{j+1}(x_{j+1}) = S_j(x_{j+1})$ for j = 0, 1, …, n−2;
- $S'_{j+1}(x_{j+1}) = S'_j(x_{j+1})$ for j = 0, 1, …, n−2;
- $S''_{j+1}(x_{j+1}) = S''_j(x_{j+1})$ for j = 0, 1, …, n−2;
- Boundary conditions (natural spline): $S''(a) = S''(b) = 0$.

Each piece has the form

$$S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3$$
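The conditions above can be solved with the standard tridiagonal algorithm for natural cubic splines. The sketch below (plain Python, applied to the three-point dataset from Example I) is one common formulation, not necessarily the one used in the course:

```python
def natural_cubic_spline(xs, ys):
    """Coefficients (a, b, c, d) of a natural cubic spline, one set per interval,
    via the standard tridiagonal algorithm (c_0 = c_n = 0 at the boundaries)."""
    n = len(xs) - 1
    h = [xs[j + 1] - xs[j] for j in range(n)]
    a = list(ys)
    alpha = [0.0] * (n + 1)
    for j in range(1, n):
        alpha[j] = 3 * (a[j + 1] - a[j]) / h[j] - 3 * (a[j] - a[j - 1]) / h[j - 1]
    # Forward sweep of the tridiagonal solve for the c coefficients
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for j in range(1, n):
        l[j] = 2 * (xs[j + 1] - xs[j - 1]) - h[j - 1] * mu[j - 1]
        mu[j] = h[j] / l[j]
        z[j] = (alpha[j] - h[j - 1] * z[j - 1]) / l[j]
    # Back substitution, then recover b and d from c
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return a, b, c, d

def spline_eval(xs, coeffs, x):
    """Evaluate the spline at x using the piece whose subinterval contains x."""
    a, b, c, d = coeffs
    j = max(k for k in range(len(xs) - 1) if xs[k] <= x)
    t = x - xs[j]
    return a[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

xs = [1.1, 1.7, 3.0]
ys = [10.6, 15.2, 20.3]
coeffs = natural_cubic_spline(xs, ys)
```

The interpolation conditions $S_j(x_j) = y_j$ and the natural boundary condition $S''(a) = 0$ (i.e. $c_0 = 0$) hold by construction.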