University of Copenhagen, Department of Mathematical Sciences
Expectation – continuous variables
Definition
If X is a real valued random variable with density f then
$$EX = \int_{-\infty}^{\infty} x f(x)\,dx$$

denotes its expectation provided that

$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty,$$

in which case we say that X has finite expectation.
Slide 1/21— Niels Richard Hansen — Statistics BI/E — January 18, 2010
Expectation – discrete variables
Definition
If X is a discrete random variable taking values in E ⊆ R with point probabilities (p(x))_{x ∈ E} then

$$EX = \sum_{x \in E} x\,p(x)$$

denotes its expectation provided that

$$\sum_{x \in E} |x|\,p(x) < \infty,$$

in which case we say that X has finite expectation.
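As a small sketch (the fair-die example is mine, not from the slides), the discrete expectation is just the probability-weighted sum over the values:

```python
# Expectation of a discrete random variable as a weighted sum,
# illustrated with a fair six-sided die (EX = 3.5).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# EX = sum over x in E of x * p(x)
EX = sum(x * p for x, p in zip(values, probs))
print(EX)  # 3.5
```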
Computational rule
Theorem
If h : R^n → R is a real valued function, if the distribution of X has density f : R^n → [0, ∞) and if h(X) has finite expectation then

$$Eh(X) = \int h(x) f(x)\,dx = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{n} h(x_1, \ldots, x_n) f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$
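A numerical sketch of this rule for n = 1 (my choice of example, not from the slides): for X uniform on [0, 1] and h(x) = x², both a Riemann sum for the integral of h(x)f(x) and a Monte Carlo average of h(X) should approximate the exact value 1/3.

```python
import random

# Sketch: check Eh(X) = ∫ h(x) f(x) dx numerically for X ~ Uniform(0, 1)
# and h(x) = x^2; here f(x) = 1 on [0, 1] and the exact value is 1/3.
def h(x):
    return x ** 2

# Midpoint Riemann-sum approximation of ∫_0^1 h(x) * 1 dx
m = 100_000
integral = sum(h((i + 0.5) / m) for i in range(m)) / m

# Monte Carlo approximation of Eh(X) by averaging h over draws of X
random.seed(0)
mc = sum(h(random.random()) for _ in range(m)) / m

print(integral, mc)  # both close to 1/3
```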
Expectation of the sum
Take h : R^2 → R given by h(x_1, x_2) = x_1 + x_2; then

$$\begin{aligned}
E(X_1 + X_2) &= \int\!\!\int (x_1 + x_2) f(x_1, x_2)\,dx_1 dx_2 \\
&= \int\!\!\int x_1 f(x_1, x_2)\,dx_2 dx_1 + \int\!\!\int x_2 f(x_1, x_2)\,dx_1 dx_2 \\
&= EX_1 + EX_2,
\end{aligned}$$

since the marginal distributions of X_1 and X_2 have densities

$$f_1(x_1) = \int f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int f(x_1, x_2)\,dx_1.$$
The general result
Theorem
If X and Y are two real valued random variables with finite expectation then X + Y has finite expectation and

$$E(X + Y) = EX + EY.$$

Furthermore, if c ∈ R is a real valued constant then cX has finite expectation and

$$E(cX) = cEX.$$

Moreover, if X and Y are independent real valued random variables with finite expectation then

$$E(XY) = EX\,EY.$$
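All three rules can be checked empirically by simulation. The uniform variables below are illustrative assumptions of mine, not part of the slides.

```python
import random

# Sketch: empirical check of E(X + Y) = EX + EY, E(cX) = c EX, and, for
# independent X and Y, E(XY) = EX * EY, using simulated uniform variables.
random.seed(1)
n = 200_000
xs = [random.uniform(0, 1) for _ in range(n)]  # EX = 0.5
ys = [random.uniform(0, 2) for _ in range(n)]  # EY = 1.0

def mean(v):
    return sum(v) / len(v)

s_mean = mean([x + y for x, y in zip(xs, ys)])  # close to 0.5 + 1.0 = 1.5
c_mean = mean([3 * x for x in xs])              # close to 3 * 0.5 = 1.5
p_mean = mean([x * y for x, y in zip(xs, ys)])  # close to 0.5 * 1.0 = 0.5
print(s_mean, c_mean, p_mean)
```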
Example – binomial distribution
If X is a Bernoulli random variable then

$$EX = 1 \times P(X = 1) + 0 \times P(X = 0) = p.$$

If X_1, …, X_n are iid Bernoulli variables with success probability p then

$$X = X_1 + \ldots + X_n \sim \mathrm{Bin}(n, p),$$

and

$$EX = EX_1 + \ldots + EX_n = p + \ldots + p = np.$$
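The identity EX = np can be confirmed by simulation; the parameter values n = 10 and p = 0.3 below are my own illustrative choices.

```python
import random

# Sketch: EX = np for X ~ Bin(n, p), checked by averaging simulated sums
# of iid Bernoulli variables (n = 10, p = 0.3, so EX = 3).
random.seed(2)
n, p, reps = 10, 0.3, 100_000
sims = [sum(1 for _ in range(n) if random.random() < p) for _ in range(reps)]
bin_mean = sum(sims) / reps
print(bin_mean)  # close to n * p = 3.0
```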
Example – word counts
X_1, …, X_n are iid random variables with values in the alphabet E = {a, c, g, t}, and w = w_1 w_2 … w_m denotes a word over the alphabet. Then

$$N = \sum_{i=1}^{n-m+1} 1(X_i X_{i+1} \cdots X_{i+m-1} = w)$$

is the number of occurrences of the word w. The expectation is

$$EN = \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \cdots X_{i+m-1} = w).$$
The variable 1(X_i X_{i+1} … X_{i+m-1} = w) is a Bernoulli variable, hence

$$\begin{aligned}
EN &= \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \cdots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i X_{i+1} \cdots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i = w_1) P(X_{i+1} = w_2) \cdots P(X_{i+m-1} = w_m) \\
&= (n - m + 1)\,p(w_1) p(w_2) \cdots p(w_m) \\
&= (n - m + 1)\,p(\mathrm{a})^{n_w(\mathrm{a})} p(\mathrm{c})^{n_w(\mathrm{c})} p(\mathrm{g})^{n_w(\mathrm{g})} p(\mathrm{t})^{n_w(\mathrm{t})},
\end{aligned}$$

where n_w(x) denotes the number of occurrences of the letter x in w.
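The formula holds even though occurrences may overlap, because expectation is linear over the indicator variables. A simulation sketch, with my own hypothetical choices of the word w = "acg", sequence length n = 50, and uniform letter probabilities p(a) = p(c) = p(g) = p(t) = 1/4:

```python
import random

# Sketch: EN = (n - m + 1) * p(w_1) ... p(w_m) for word counts in an iid
# letter sequence, checked by simulation for w = "acg" under uniform
# letter probabilities (each 1/4).
random.seed(3)
alphabet, w, n = "acgt", "acg", 50
expected = (n - len(w) + 1) * (1 / 4) ** len(w)  # = 48 / 64 = 0.75

def count_word(seq, word):
    # number of (possibly overlapping) occurrences of word in seq
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq[i:i + len(word)] == word)

reps = 20_000
total = 0
for _ in range(reps):
    seq = "".join(random.choice(alphabet) for _ in range(n))
    total += count_word(seq, w)
avg = total / reps
print(expected, avg)  # close to each other
```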
Variance
Definition
If X is a real valued random variable with finite second moment the variance of X is

$$VX = E(X - EX)^2$$

and the standard deviation is defined as $\sqrt{VX}$.

Since $(X - EX)^2 = X^2 - 2X\,EX + (EX)^2$ we find that

$$VX = EX^2 - 2EX\,EX + (EX)^2 = EX^2 - (EX)^2.$$

The variance (or rather the standard deviation) is a natural measure of how spread out the distribution of X is.
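The two variance formulas agree, as a direct computation shows; the fair-die example below is my own illustration.

```python
# Sketch: the formulas VX = E(X - EX)^2 and VX = EX^2 - (EX)^2 agree,
# shown for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6
EX = sum(x * p for x in values)  # 3.5

v_centered = sum((x - EX) ** 2 * p for x in values)
v_moment = sum(x ** 2 * p for x in values) - EX ** 2
print(v_centered, v_moment)  # both 35/12, about 2.9167
```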
Bernoulli random variables
If X is a Bernoulli random variable with success probability p we know that EX = p. We can compute the variance as follows:

$$VX = E(X - p)^2 = (1 - p)^2 P(X = 1) + p^2 P(X = 0) = (1 - p)^2 p + p^2 (1 - p) = (1 - p)p(1 - p + p) = (1 - p)p.$$

Alternatively, we can compute

$$EX^2 = P(X = 1) = p,$$

hence

$$VX = EX^2 - (EX)^2 = p - p^2 = (1 - p)p.$$
Empirical distribution
If x_1, …, x_n are observations from a sample space E we know that

$$\varepsilon_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A)$$

denotes the relative frequency of occurrences of the event A ⊆ E. As a function of A this is the empirical probability measure given by the data set x_1, …, x_n.

If we think of the data as realizations of iid random variables with distribution P on E, we think of ε_n as the (non-parametric) estimate of P.
Representation of the empirical distribution

Let U denote a random variable that is uniformly distributed on the finite set {1, …, n} and define the transformation

$$h_{x_1, \ldots, x_n} : \{1, \ldots, n\} \to E$$

by

$$h_{x_1, \ldots, x_n}(i) = x_i.$$

Then

$$P(h(U) \in A) = \sum_{i\,:\,x_i \in A} \underbrace{P(U = i)}_{= 1/n} = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A) = \varepsilon_n(A).$$

This gives a useful representation of the empirical probability measure as a transformation of a uniformly distributed random variable on the indices, not least for simulation from the empirical probability measure, which we will use later for the so-called bootstrapping procedure.
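The representation says that drawing a uniform index U and returning x_U samples from the empirical measure, which is exactly how bootstrap resampling is implemented. A sketch on a small hypothetical data set of my own:

```python
import random

# Sketch: the empirical measure of a data set is the distribution of
# h(U) = x_U for U uniform on the indices; sampling U therefore samples
# from the empirical measure (the basis of bootstrap resampling).
random.seed(4)
data = [2.0, 3.5, 3.5, 7.0]  # hypothetical data set x_1, ..., x_n
n = len(data)

def draw_from_empirical():
    i = random.randrange(n)  # U uniform on indices {0, ..., n-1}
    return data[i]           # h(U) = x_U

draws = [draw_from_empirical() for _ in range(100_000)]

# relative frequency of the event A = {x <= 3.5} under the empirical measure
eps_A = sum(1 for x in data if x <= 3.5) / n  # 0.75
freq_A = sum(1 for x in draws if x <= 3.5) / len(draws)
print(eps_A, freq_A)  # close to each other
```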
Plug-in for the mean and variance

For observations x_1, …, x_n from iid real valued variables, ε_n is an estimate of P, making no restrictions on the possible choices of P; hence, with reference to the plug-in principle, the mean and variance under ε_n are natural estimates of the unknown mean and variance under P.

We compute the empirical mean

$$\hat\mu_n = E_{\varepsilon_n} X = E\,h_{x_1, \ldots, x_n}(U) = \sum_{i=1}^{n} x_i P(U = i) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where U has the uniform distribution on {1, …, n}, and the empirical variance

$$\tilde\sigma_n^2 = E_{\varepsilon_n}(X - \hat\mu_n)^2 = \sum_{i=1}^{n} (x_i - \hat\mu_n)^2 P(U = i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu_n)^2.$$
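The plug-in formulas translate directly into code; the four-point data set below is a hypothetical example of mine.

```python
# Sketch: plug-in (empirical) mean and variance computed directly from the
# formulas above, for a small hypothetical data set.
data = [1.0, 4.0, 4.0, 7.0]
n = len(data)

mu_hat = sum(data) / n                                   # empirical mean
sigma2_tilde = sum((x - mu_hat) ** 2 for x in data) / n  # empirical variance
print(mu_hat, sigma2_tilde)  # 4.0 4.5
```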
Normalization

If X is a real valued random variable with mean 0 and variance 1 we find for μ ∈ R and σ > 0 that

$$E(\sigma X + \mu) = \sigma EX + \mu = \mu$$

and

$$V(\sigma X + \mu) = E(\sigma X + \mu - \mu)^2 = \sigma^2 EX^2 = \sigma^2 VX = \sigma^2.$$

In the other direction, if X has mean μ and variance σ² then

$$\frac{X - \mu}{\sigma}$$

has mean 0 and variance 1. This is in particular the case under the empirical measure; hence normalizing a data set with its empirical mean and empirical standard deviation gives a data set with empirical mean 0 and empirical variance 1.
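The empirical version of this statement can be verified on any data set; the one below is my own toy example.

```python
# Sketch: normalizing a data set with its empirical mean and standard
# deviation yields empirical mean 0 and empirical variance 1.
data = [1.0, 4.0, 4.0, 7.0]
n = len(data)
mu = sum(data) / n
sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5

z = [(x - mu) / sigma for x in data]
z_mean = sum(z) / n
z_var = sum((v - z_mean) ** 2 for v in z) / n
print(z_mean, z_var)  # 0.0 and 1.0 up to rounding
```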
Covariance
Definition
If XY has finite expectation the covariance of the random variables X and Y is

$$V(X, Y) = E\big((X - EX)(Y - EY)\big).$$

The correlation is

$$\mathrm{corr}(X, Y) = \frac{V(X, Y)}{\sqrt{VX\,VY}}.$$

The covariance is a centered measure of co-variation, depending on the marginal scales. The correlation is a unitless measure of co-variation that does not depend on the marginal scales.
Covariance rules
The covariance is symmetric:

$$V(X, Y) = V(Y, X).$$

If X = Y then

$$V(X, X) = E(X - EX)^2 = VX.$$

An alternative formula is

$$V(X, Y) = E(XY) - EX\,EY.$$

Thus if X and Y are independent then

$$V(X, Y) = 0.$$
Theorem
If X and Y are two random variables with finite variance then the sum X + Y has finite variance and

$$V(X + Y) = VX + VY + 2V(X, Y).$$

If X is a random variable with finite variance and c ∈ R is a constant then cX has finite variance and

$$V(cX) = c^2 VX.$$
Empirical versions
Just as for the mean and variance we have, with x_1, …, x_n a data set where x_l = (x_{1l}, …, x_{kl}) ∈ R^k, the empirical covariance

$$\tilde\sigma_{ij,n}^2 = V_{\varepsilon_n}(X_i, X_j) = E_{\varepsilon_n}(X_i - E_{\varepsilon_n} X_i)(X_j - E_{\varepsilon_n} X_j) = \frac{1}{n} \sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})(x_{jl} - \hat\mu_{j,n}),$$

and the empirical correlation

$$\widetilde{\mathrm{corr}}_{ij,n} = \frac{\tilde\sigma_{ij,n}^2}{\tilde\sigma_{i,n} \tilde\sigma_{j,n}} = \frac{\sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})(x_{jl} - \hat\mu_{j,n})}{\sqrt{\sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})^2 \sum_{l=1}^{n} (x_{jl} - \hat\mu_{j,n})^2}}.$$
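These empirical formulas can be computed directly; the paired data set below is my own illustration, chosen so that one coordinate is an exact multiple of the other and the correlation is exactly 1.

```python
# Sketch: empirical covariance and correlation of two coordinates, computed
# from the formulas above for a small hypothetical paired data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys = 2 * xs, so the correlation is exactly 1
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
corr = cov / (sx * sy)
print(cov, corr)  # 2.5 and 1.0
```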
Properties of the empirical measure

With X_1, …, X_n independent and identically distributed with distribution P on E, and ε_n the corresponding empirical probability measure,

$$\varepsilon_n(A) = \frac{1}{n} \sum_{k=1}^{n} 1(X_k \in A), \quad A \subseteq E,$$

we regard ε_n(A) as a real valued random variable for fixed A ⊆ E. Then it holds that

$$E\varepsilon_n(A) = P(A), \qquad V\varepsilon_n(A) = \frac{1}{n} P(A)(1 - P(A)),$$

and (not in notes)

$$V(\varepsilon_n(A), \varepsilon_n(B)) = \frac{P(A \cap B) - P(A) P(B)}{n}.$$
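The mean and variance of ε_n(A) can be checked by repeatedly simulating the relative frequency; the event A = {X ≤ 0.3} for X uniform on [0, 1], with n = 20, is my own illustrative setup.

```python
import random

# Sketch: E eps_n(A) = P(A) and V eps_n(A) = P(A)(1 - P(A))/n, checked by
# simulating the relative frequency eps_n(A) many times for A = {X <= 0.3}
# with X ~ Uniform(0, 1), so P(A) = 0.3.
random.seed(6)
n, reps, pA = 20, 100_000, 0.3
freqs = [sum(1 for _ in range(n) if random.random() < pA) / n
         for _ in range(reps)]
m = sum(freqs) / reps
v = sum((f - m) ** 2 for f in freqs) / reps
print(m, v)  # close to 0.3 and 0.3 * 0.7 / 20 = 0.0105
```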
Properties of the empirical mean and variance
Theorem
Considering the empirical mean $\hat\mu_n$ and the empirical variance $\tilde\sigma_n^2$ as estimators of the mean and variance, respectively, we have

$$E\hat\mu_n = EX \quad \text{and} \quad V\hat\mu_n = \frac{1}{n} VX,$$

together with

$$E\tilde\sigma_n^2 = \frac{n-1}{n} VX. \tag{1}$$

Due to (1) the estimator $\tilde\sigma_n^2$ systematically undershoots the true variance, and we prefer the estimator

$$\hat\sigma_n^2 = \frac{n}{n-1} \tilde\sigma_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat\mu_n)^2,$$

which has $E\hat\sigma_n^2 = \sigma^2$.
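The downward bias factor (n − 1)/n is visible in simulation; the choice of Uniform(0, 1) samples with n = 5 is my own illustrative setup.

```python
import random

# Sketch: E(sigma_tilde^2) = (n - 1)/n * VX for the empirical variance,
# checked by simulation with uniform samples (VX = 1/12 for Uniform(0, 1)).
random.seed(5)
n, reps = 5, 200_000
VX = 1 / 12

def emp_var(sample):
    mu = sum(sample) / len(sample)
    return sum((x - mu) ** 2 for x in sample) / len(sample)

avg_tilde = sum(emp_var([random.random() for _ in range(n)])
                for _ in range(reps)) / reps
print(avg_tilde, (n - 1) / n * VX)  # both close to (4/5)(1/12) = 1/15
```

In practice this is why library variance functions distinguish the two estimators (for instance a divisor of n versus n − 1).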
It is possible, but no fun, to compute the variance of $\tilde\sigma_n^2$ or $\hat\sigma_n^2$; however, we can immediately see that

$$V(\hat\sigma_n^2) = \left(\frac{n}{n-1}\right)^2 V(\tilde\sigma_n^2).$$

Hence the variance of $\hat\sigma_n^2$ is larger than the variance of $\tilde\sigma_n^2$.