University of Copenhagen, Department of Mathematical Sciences
Expectation – continuous variables
Definition
If X is a real valued random variable with density f then
$$EX = \int_{-\infty}^{\infty} x f(x)\,dx$$

denotes its expectation provided that

$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty,$$

in which case we say that X has finite expectation.
Slide 1/21— Niels Richard Hansen — Statistics BI/E — January 18, 2010
Expectation – discrete variables
Definition
If X is a discrete random variable taking values in E ⊆ R with point probabilities (p(x))_{x ∈ E} then

$$EX = \sum_{x \in E} x\,p(x)$$

denotes its expectation provided that

$$\sum_{x \in E} |x|\,p(x) < \infty,$$

in which case we say that X has finite expectation.
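As a small sketch (the fair-die example is mine, not from the slides), the discrete expectation is just the probability-weighted sum over the values:

```python
# Expectation of a discrete random variable as a weighted sum,
# illustrated with a fair six-sided die (EX = 3.5).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# EX = sum over x in E of x * p(x)
EX = sum(x * p for x, p in zip(values, probs))
print(EX)  # 3.5
```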
Computational rule
Theorem
If h : R^n → R is a real valued function, if the distribution of X has density f : R^n → [0, ∞) and if h(X) has finite expectation then

$$Eh(X) = \int h(x) f(x)\,dx = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{n} h(x_1, \ldots, x_n) f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$
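A numerical sketch of this rule for n = 1 (my choice of example, not from the slides): for X uniform on [0, 1] and h(x) = x², both a Riemann sum for the integral of h(x)f(x) and a Monte Carlo average of h(X) should approximate the exact value 1/3.

```python
import random

# Sketch: check Eh(X) = ∫ h(x) f(x) dx numerically for X ~ Uniform(0, 1)
# and h(x) = x^2; here f(x) = 1 on [0, 1] and the exact value is 1/3.
def h(x):
    return x ** 2

# Midpoint Riemann-sum approximation of ∫_0^1 h(x) * 1 dx
m = 100_000
integral = sum(h((i + 0.5) / m) for i in range(m)) / m

# Monte Carlo approximation of Eh(X) by averaging h over draws of X
random.seed(0)
mc = sum(h(random.random()) for _ in range(m)) / m

print(integral, mc)  # both close to 1/3
```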
Expectation of the sum
Take h : R^2 → R given by h(x_1, x_2) = x_1 + x_2; then

$$\begin{aligned}
E(X_1 + X_2) &= \int\!\!\int (x_1 + x_2) f(x_1, x_2)\,dx_1 dx_2 \\
&= \int\!\!\int x_1 f(x_1, x_2)\,dx_2 dx_1 + \int\!\!\int x_2 f(x_1, x_2)\,dx_1 dx_2 \\
&= EX_1 + EX_2,
\end{aligned}$$

since the marginal distributions of X_1 and X_2 have densities

$$f_1(x_1) = \int f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int f(x_1, x_2)\,dx_1.$$
The general result
Theorem
If X and Y are two real valued random variables with finite expectation then X + Y has finite expectation and

$$E(X + Y) = EX + EY.$$

Furthermore, if c ∈ R is a real valued constant then cX has finite expectation and

$$E(cX) = cEX.$$

Moreover, if X and Y are independent real valued random variables with finite expectation then

$$E(XY) = EX\,EY.$$
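All three rules can be checked empirically by simulation. The uniform variables below are illustrative assumptions of mine, not part of the slides.

```python
import random

# Sketch: empirical check of E(X + Y) = EX + EY, E(cX) = c EX, and, for
# independent X and Y, E(XY) = EX * EY, using simulated uniform variables.
random.seed(1)
n = 200_000
xs = [random.uniform(0, 1) for _ in range(n)]  # EX = 0.5
ys = [random.uniform(0, 2) for _ in range(n)]  # EY = 1.0

def mean(v):
    return sum(v) / len(v)

s_mean = mean([x + y for x, y in zip(xs, ys)])  # close to 0.5 + 1.0 = 1.5
c_mean = mean([3 * x for x in xs])              # close to 3 * 0.5 = 1.5
p_mean = mean([x * y for x, y in zip(xs, ys)])  # close to 0.5 * 1.0 = 0.5
print(s_mean, c_mean, p_mean)
```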
Example – binomial distribution
If X is a Bernoulli random variable then

$$EX = 1 \times P(X = 1) + 0 \times P(X = 0) = p.$$

If X_1, …, X_n are iid Bernoulli variables with success probability p then

$$X = X_1 + \ldots + X_n \sim \mathrm{Bin}(n, p),$$

and

$$EX = EX_1 + \ldots + EX_n = p + \ldots + p = np.$$
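The identity EX = np can be confirmed by simulation; the parameter values n = 10 and p = 0.3 below are my own illustrative choices.

```python
import random

# Sketch: EX = np for X ~ Bin(n, p), checked by averaging simulated sums
# of iid Bernoulli variables (n = 10, p = 0.3, so EX = 3).
random.seed(2)
n, p, reps = 10, 0.3, 100_000
sims = [sum(1 for _ in range(n) if random.random() < p) for _ in range(reps)]
bin_mean = sum(sims) / reps
print(bin_mean)  # close to n * p = 3.0
```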
Example – word counts
X_1, …, X_n are iid random variables with values in the alphabet E = {a, c, g, t}, and w = w_1 w_2 … w_m denotes a word over the alphabet. Then

$$N = \sum_{i=1}^{n-m+1} 1(X_i X_{i+1} \cdots X_{i+m-1} = w)$$

is the number of occurrences of the word w. The expectation is

$$EN = \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \cdots X_{i+m-1} = w).$$
The variable 1(X_i X_{i+1} … X_{i+m-1} = w) is a Bernoulli variable, hence

$$\begin{aligned}
EN &= \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \cdots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i X_{i+1} \cdots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i = w_1) P(X_{i+1} = w_2) \cdots P(X_{i+m-1} = w_m) \\
&= (n - m + 1)\,p(w_1) p(w_2) \cdots p(w_m) \\
&= (n - m + 1)\,p(\mathrm{a})^{n_w(\mathrm{a})} p(\mathrm{c})^{n_w(\mathrm{c})} p(\mathrm{g})^{n_w(\mathrm{g})} p(\mathrm{t})^{n_w(\mathrm{t})},
\end{aligned}$$

where n_w(x) denotes the number of occurrences of the letter x in w.
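The formula holds even though occurrences may overlap, because expectation is linear over the indicator variables. A simulation sketch, with my own hypothetical choices of the word w = "acg", sequence length n = 50, and uniform letter probabilities p(a) = p(c) = p(g) = p(t) = 1/4:

```python
import random

# Sketch: EN = (n - m + 1) * p(w_1) ... p(w_m) for word counts in an iid
# letter sequence, checked by simulation for w = "acg" under uniform
# letter probabilities (each 1/4).
random.seed(3)
alphabet, w, n = "acgt", "acg", 50
expected = (n - len(w) + 1) * (1 / 4) ** len(w)  # = 48 / 64 = 0.75

def count_word(seq, word):
    # number of (possibly overlapping) occurrences of word in seq
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq[i:i + len(word)] == word)

reps = 20_000
total = 0
for _ in range(reps):
    seq = "".join(random.choice(alphabet) for _ in range(n))
    total += count_word(seq, w)
avg = total / reps
print(expected, avg)  # close to each other
```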
Variance
Definition
If X is a real valued random variable with finite second moment the variance of X is

$$VX = E(X - EX)^2$$

and the standard deviation is defined as $\sqrt{VX}$.

Since $(X - EX)^2 = X^2 - 2X\,EX + (EX)^2$ we find that

$$VX = EX^2 - 2EX\,EX + (EX)^2 = EX^2 - (EX)^2.$$

The variance (or rather the standard deviation) is a natural measure of how spread out the distribution of X is.
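The two variance formulas agree, as a direct computation shows; the fair-die example below is my own illustration.

```python
# Sketch: the formulas VX = E(X - EX)^2 and VX = EX^2 - (EX)^2 agree,
# shown for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6
EX = sum(x * p for x in values)  # 3.5

v_centered = sum((x - EX) ** 2 * p for x in values)
v_moment = sum(x ** 2 * p for x in values) - EX ** 2
print(v_centered, v_moment)  # both 35/12, about 2.9167
```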
Bernoulli random variables
If X is a Bernoulli random variable with success probability p we know that EX = p. We can compute the variance as follows:

$$VX = E(X - p)^2 = (1 - p)^2 P(X = 1) + p^2 P(X = 0) = (1 - p)^2 p + p^2 (1 - p) = (1 - p)p(1 - p + p) = (1 - p)p.$$

Alternatively, we can compute

$$EX^2 = P(X = 1) = p,$$

hence

$$VX = EX^2 - (EX)^2 = p - p^2 = (1 - p)p.$$
Empirical distribution
If x_1, …, x_n are observations from a sample space E we know that

$$\varepsilon_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A)$$

denotes the relative frequency of occurrences of the event A ⊆ E. As a function of A this is the empirical probability measure given by the data set x_1, …, x_n.

If we think of the data as realizations of iid random variables with distribution P on E, we think of ε_n as the (non-parametric) estimate of P.
Representation of the empirical distribution

Let U denote a random variable that is uniformly distributed on the finite set {1, …, n} and define the transformation

$$h_{x_1, \ldots, x_n} : \{1, \ldots, n\} \to E$$

by

$$h_{x_1, \ldots, x_n}(i) = x_i.$$

Then

$$P(h(U) \in A) = \sum_{i\,:\,x_i \in A} \underbrace{P(U = i)}_{= 1/n} = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A) = \varepsilon_n(A).$$

This gives a useful representation of the empirical probability measure as a transformation of a uniformly distributed random variable on the indices, not least for simulation from the empirical probability measure, which we will use later for the so-called bootstrapping procedure.
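The representation says that drawing a uniform index U and returning x_U samples from the empirical measure, which is exactly how bootstrap resampling is implemented. A sketch on a small hypothetical data set of my own:

```python
import random

# Sketch: the empirical measure of a data set is the distribution of
# h(U) = x_U for U uniform on the indices; sampling U therefore samples
# from the empirical measure (the basis of bootstrap resampling).
random.seed(4)
data = [2.0, 3.5, 3.5, 7.0]  # hypothetical data set x_1, ..., x_n
n = len(data)

def draw_from_empirical():
    i = random.randrange(n)  # U uniform on indices {0, ..., n-1}
    return data[i]           # h(U) = x_U

draws = [draw_from_empirical() for _ in range(100_000)]

# relative frequency of the event A = {x <= 3.5} under the empirical measure
eps_A = sum(1 for x in data if x <= 3.5) / n  # 0.75
freq_A = sum(1 for x in draws if x <= 3.5) / len(draws)
print(eps_A, freq_A)  # close to each other
```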
Plug-in for the mean and variance

For observations x_1, …, x_n from iid real valued variables, ε_n is an estimate of P, making no restrictions on the possible choices of P; hence, with reference to the plug-in principle, the mean and variance under ε_n are natural estimates of the unknown mean and variance under P.

We compute the empirical mean

$$\hat\mu_n = E_{\varepsilon_n} X = E\,h_{x_1, \ldots, x_n}(U) = \sum_{i=1}^{n} x_i P(U = i) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where U has the uniform distribution on {1, …, n}, and the empirical variance

$$\tilde\sigma_n^2 = E_{\varepsilon_n}(X - \hat\mu_n)^2 = \sum_{i=1}^{n} (x_i - \hat\mu_n)^2 P(U = i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu_n)^2.$$
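The plug-in formulas translate directly into code; the four-point data set below is a hypothetical example of mine.

```python
# Sketch: plug-in (empirical) mean and variance computed directly from the
# formulas above, for a small hypothetical data set.
data = [1.0, 4.0, 4.0, 7.0]
n = len(data)

mu_hat = sum(data) / n                                   # empirical mean
sigma2_tilde = sum((x - mu_hat) ** 2 for x in data) / n  # empirical variance
print(mu_hat, sigma2_tilde)  # 4.0 4.5
```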
Normalization

If X is a real valued random variable with mean 0 and variance 1 we find for μ ∈ R and σ > 0 that

$$E(\sigma X + \mu) = \sigma EX + \mu = \mu$$

and

$$V(\sigma X + \mu) = E(\sigma X + \mu - \mu)^2 = \sigma^2 EX^2 = \sigma^2 VX = \sigma^2.$$

In the other direction, if X has mean μ and variance σ² then

$$\frac{X - \mu}{\sigma}$$

has mean 0 and variance 1. This is in particular the case under the empirical measure; hence normalizing a data set with its empirical mean and empirical standard deviation gives a data set with empirical mean 0 and empirical variance 1.
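The empirical version of this statement can be verified on any data set; the one below is my own toy example.

```python
# Sketch: normalizing a data set with its empirical mean and standard
# deviation yields empirical mean 0 and empirical variance 1.
data = [1.0, 4.0, 4.0, 7.0]
n = len(data)
mu = sum(data) / n
sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5

z = [(x - mu) / sigma for x in data]
z_mean = sum(z) / n
z_var = sum((v - z_mean) ** 2 for v in z) / n
print(z_mean, z_var)  # 0.0 and 1.0 up to rounding
```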
Covariance
Definition
If XY has finite expectation the covariance of the random variables X and Y is

$$V(X, Y) = E\big((X - EX)(Y - EY)\big).$$

The correlation is

$$\mathrm{corr}(X, Y) = \frac{V(X, Y)}{\sqrt{VX\,VY}}.$$

The covariance is a centered measure of co-variation, depending on the marginal scales. The correlation is a unitless measure of co-variation that does not depend on the marginal scales.
Covariance rules
The covariance is symmetric:

$$V(X, Y) = V(Y, X).$$

If X = Y then

$$V(X, X) = E(X - EX)^2 = VX.$$

An alternative formula is

$$V(X, Y) = E(XY) - EX\,EY.$$

Thus if X and Y are independent then

$$V(X, Y) = 0.$$
Theorem
If X and Y are two random variables with finite variance then the sum X + Y has finite variance and

$$V(X + Y) = VX + VY + 2V(X, Y).$$

If X is a random variable with finite variance and c ∈ R is a constant then cX has finite variance and

$$V(cX) = c^2 VX.$$
Empirical versions
Just as for the mean and variance we have, with x_1, …, x_n a data set where x_l = (x_{1l}, …, x_{kl}) ∈ R^k, the empirical covariance

$$\tilde\sigma_{ij,n}^2 = V_{\varepsilon_n}(X_i, X_j) = E_{\varepsilon_n}(X_i - E_{\varepsilon_n} X_i)(X_j - E_{\varepsilon_n} X_j) = \frac{1}{n} \sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})(x_{jl} - \hat\mu_{j,n}),$$

and the empirical correlation

$$\widetilde{\mathrm{corr}}_{ij,n} = \frac{\tilde\sigma_{ij,n}^2}{\tilde\sigma_{i,n} \tilde\sigma_{j,n}} = \frac{\sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})(x_{jl} - \hat\mu_{j,n})}{\sqrt{\sum_{l=1}^{n} (x_{il} - \hat\mu_{i,n})^2 \sum_{l=1}^{n} (x_{jl} - \hat\mu_{j,n})^2}}.$$
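These empirical formulas can be computed directly; the paired data set below is my own illustration, chosen so that one coordinate is an exact multiple of the other and the correlation is exactly 1.

```python
# Sketch: empirical covariance and correlation of two coordinates, computed
# from the formulas above for a small hypothetical paired data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys = 2 * xs, so the correlation is exactly 1
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
corr = cov / (sx * sy)
print(cov, corr)  # 2.5 and 1.0
```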
Properties of the empirical measure

With X_1, …, X_n independent and identically distributed with distribution P on E, and ε_n the corresponding empirical probability measure,

$$\varepsilon_n(A) = \frac{1}{n} \sum_{k=1}^{n} 1(X_k \in A), \quad A \subseteq E,$$

we regard ε_n(A) as a real valued random variable for fixed A ⊆ E. Then it holds that

$$E\varepsilon_n(A) = P(A), \qquad V\varepsilon_n(A) = \frac{1}{n} P(A)(1 - P(A)),$$

and (not in notes)

$$V(\varepsilon_n(A), \varepsilon_n(B)) = \frac{P(A \cap B) - P(A) P(B)}{n}.$$
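The mean and variance of ε_n(A) can be checked by repeatedly simulating the relative frequency; the event A = {X ≤ 0.3} for X uniform on [0, 1], with n = 20, is my own illustrative setup.

```python
import random

# Sketch: E eps_n(A) = P(A) and V eps_n(A) = P(A)(1 - P(A))/n, checked by
# simulating the relative frequency eps_n(A) many times for A = {X <= 0.3}
# with X ~ Uniform(0, 1), so P(A) = 0.3.
random.seed(6)
n, reps, pA = 20, 100_000, 0.3
freqs = [sum(1 for _ in range(n) if random.random() < pA) / n
         for _ in range(reps)]
m = sum(freqs) / reps
v = sum((f - m) ** 2 for f in freqs) / reps
print(m, v)  # close to 0.3 and 0.3 * 0.7 / 20 = 0.0105
```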
Properties of the empirical mean and variance
Theorem
Considering the empirical mean $\hat\mu_n$ and the empirical variance $\tilde\sigma_n^2$ as estimators of the mean and variance, respectively, we have

$$E\hat\mu_n = EX \quad \text{and} \quad V\hat\mu_n = \frac{1}{n} VX,$$

together with

$$E\tilde\sigma_n^2 = \frac{n-1}{n} VX. \tag{1}$$

Due to (1) the estimator $\tilde\sigma_n^2$ systematically undershoots the true variance, and we prefer the estimator

$$\hat\sigma_n^2 = \frac{n}{n-1} \tilde\sigma_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat\mu_n)^2,$$

which has $E\hat\sigma_n^2 = \sigma^2$.
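The downward bias factor (n − 1)/n is visible in simulation; the choice of Uniform(0, 1) samples with n = 5 is my own illustrative setup.

```python
import random

# Sketch: E(sigma_tilde^2) = (n - 1)/n * VX for the empirical variance,
# checked by simulation with uniform samples (VX = 1/12 for Uniform(0, 1)).
random.seed(5)
n, reps = 5, 200_000
VX = 1 / 12

def emp_var(sample):
    mu = sum(sample) / len(sample)
    return sum((x - mu) ** 2 for x in sample) / len(sample)

avg_tilde = sum(emp_var([random.random() for _ in range(n)])
                for _ in range(reps)) / reps
print(avg_tilde, (n - 1) / n * VX)  # both close to (4/5)(1/12) = 1/15
```

In practice this is why library variance functions distinguish the two estimators (for instance a divisor of n versus n − 1).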
It is possible, but no fun, to compute the variance of $\tilde\sigma_n^2$ or $\hat\sigma_n^2$; however, we can immediately see that

$$V(\hat\sigma_n^2) = \left(\frac{n}{n-1}\right)^2 V(\tilde\sigma_n^2).$$

Hence the variance of $\hat\sigma_n^2$ is larger than the variance of $\tilde\sigma_n^2$.