
Source: web.math.ku.dk/~richard/courses/StatBIE/week3monScreen.pdf
University of Copenhagen, Department of Mathematical Sciences


Expectation – continuous variables

Definition

If $X$ is a real-valued random variable with density $f$, then

$$EX = \int_{-\infty}^{\infty} x f(x)\,dx$$

denotes its expectation, provided that

$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty,$$

in which case we say that $X$ has finite expectation.

Slide 1/21— Niels Richard Hansen — Statistics BI/E — January 18, 2010
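The integral in the definition can be checked numerically. Below is a minimal sketch, assuming the exponential density $f(x) = \lambda e^{-\lambda x}$ with expectation $1/\lambda$ (an illustrative choice, not taken from the slides):

```python
import numpy as np

lam = 2.0  # rate of an exponential distribution (illustrative choice)
x = np.linspace(0.0, 50.0, 200_000)  # grid; the density is negligible beyond x = 50
dx = x[1] - x[0]
f = lam * np.exp(-lam * x)           # density f(x) = lam * exp(-lam * x), x >= 0

# Riemann approximation of  EX = \int x f(x) dx
EX = np.sum(x * f) * dx
print(EX)  # close to 1 / lam = 0.5
```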


Expectation – discrete variables

Definition

If $X$ is a discrete random variable taking values in $E \subseteq \mathbb{R}$ with point probabilities $(p(x))_{x \in E}$, then

$$EX = \sum_{x \in E} x\, p(x)$$

denotes its expectation, provided that

$$\sum_{x \in E} |x|\, p(x) < \infty,$$

in which case we say that $X$ has finite expectation.
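The same recipe works directly in code: a sketch that evaluates the truncated sum $\sum_x x\,p(x)$ for a Poisson variable (the Poisson point probabilities and $\lambda = 3$ are an illustrative choice):

```python
import math

lam = 3.0  # Poisson mean (illustrative choice)

def p(x):
    """Point probability p(x) = exp(-lam) * lam^x / x! of the Poisson distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# EX = sum_x x p(x), truncated at 100 where the remaining tail mass is negligible
EX = sum(x * p(x) for x in range(100))
print(EX)  # close to lam = 3.0
```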


Computational rule

Theorem

If $h : \mathbb{R}^n \to \mathbb{R}$ is a real-valued function, if the distribution of $X$ has density $f : \mathbb{R}^n \to [0,\infty)$, and if $h(X)$ has finite expectation, then

$$Eh(X) = \int h(x) f(x)\,dx = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{n} h(x_1,\ldots,x_n)\, f(x_1,\ldots,x_n)\,dx_1 \cdots dx_n.$$


Expectation of the sum

Take $h : \mathbb{R}^2 \to \mathbb{R}$ given by $h(x_1, x_2) = x_1 + x_2$; then

$$E(X_1 + X_2) = \int\!\!\int (x_1 + x_2) f(x_1, x_2)\,dx_1 dx_2 = \int\!\!\int x_1 f(x_1, x_2)\,dx_2 dx_1 + \int\!\!\int x_2 f(x_1, x_2)\,dx_1 dx_2 = EX_1 + EX_2,$$

since the marginal distributions of $X_1$ and $X_2$ have densities

$$f_1(x_1) = \int f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int f(x_1, x_2)\,dx_1.$$


The general result

Theorem

If $X$ and $Y$ are two real-valued random variables with finite expectation, then $X + Y$ has finite expectation and

$$E(X + Y) = EX + EY.$$

Furthermore, if $c \in \mathbb{R}$ is a constant, then $cX$ has finite expectation and

$$E(cX) = c\,EX.$$

Moreover, if $X$ and $Y$ are independent real-valued random variables with finite expectation, then

$$E(XY) = EX\,EY.$$
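A Monte Carlo sanity check of the three rules; the particular distributions of $X$ and $Y$ below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=1.0, size=n)      # EX = 1
Y = rng.normal(loc=2.0, scale=1.0, size=n)  # EY = 2, drawn independently of X

print((X + Y).mean())  # close to EX + EY = 3
print((3 * X).mean())  # close to 3 * EX = 3
print((X * Y).mean())  # close to EX * EY = 2  (requires independence)
```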


Example – binomial distribution

If $X$ is a Bernoulli random variable with success probability $p$, then

$$EX = 1 \times P(X = 1) + 0 \times P(X = 0) = p.$$

If $X_1, \ldots, X_n$ are iid Bernoulli variables with success probability $p$, then

$$X = X_1 + \ldots + X_n \sim \mathrm{Bin}(n, p),$$

and

$$EX = EX_1 + \ldots + EX_n = p + \ldots + p = np.$$
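A quick simulation check of the binomial mean (the values of $n$ and $p$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 0.3

# Each draw is a sum of n iid Bernoulli(p) variables, i.e. Bin(n, p)
samples = rng.binomial(n, p, size=500_000)
print(samples.mean())  # close to n * p = 6.0
```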


Example – word counts

$X_1, \ldots, X_n$ are iid random variables with values in the alphabet $E = \{a, c, g, t\}$, and $w = w_1 w_2 \ldots w_m$ denotes a word over the alphabet. Then

$$N = \sum_{i=1}^{n-m+1} 1(X_i X_{i+1} \ldots X_{i+m-1} = w)$$

is the number of occurrences of the word $w$. The expectation is

$$EN = \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \ldots X_{i+m-1} = w).$$


Example – word counts

The variable $1(X_i X_{i+1} \ldots X_{i+m-1} = w)$ is a Bernoulli variable, hence

$$\begin{aligned}
EN &= \sum_{i=1}^{n-m+1} E\,1(X_i X_{i+1} \ldots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i X_{i+1} \ldots X_{i+m-1} = w) \\
&= \sum_{i=1}^{n-m+1} P(X_i = w_1) P(X_{i+1} = w_2) \cdots P(X_{i+m-1} = w_m) \\
&= (n - m + 1)\, p(w_1) p(w_2) \cdots p(w_m) \\
&= (n - m + 1)\, p(a)^{n_w(a)} p(c)^{n_w(c)} p(g)^{n_w(g)} p(t)^{n_w(t)},
\end{aligned}$$

where $n_w(x)$ denotes the number of occurrences of the letter $x$ in $w$.
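The closed form can be compared with a direct simulation. A sketch assuming uniform letter probabilities $p(a) = p(c) = p(g) = p(t) = 1/4$ and the word "tata" (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
letters = list("acgt")
p = {ch: 0.25 for ch in letters}  # illustrative letter probabilities
n, w = 1000, "tata"
m = len(w)

# EN = (n - m + 1) * p(w_1) * ... * p(w_m)
EN = (n - m + 1) * float(np.prod([p[ch] for ch in w]))
print(EN)  # (1000 - 4 + 1) * 0.25**4 = 997 / 256

# Monte Carlo check: overlapping occurrences of w in simulated sequences
counts = []
for _ in range(1000):
    seq = "".join(rng.choice(letters, size=n, p=[0.25] * 4))
    counts.append(sum(seq[i:i + m] == w for i in range(n - m + 1)))
print(np.mean(counts))  # close to EN
```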


Variance

Definition

If $X$ is a real-valued random variable with finite second moment, the variance of $X$ is

$$VX = E(X - EX)^2,$$

and the standard deviation is defined as $\sqrt{VX}$.

Since $(X - EX)^2 = X^2 - 2X\,EX + (EX)^2$, we find that

$$VX = EX^2 - 2\,EX\,EX + (EX)^2 = EX^2 - (EX)^2.$$

The variance (or rather the standard deviation) is a natural measure of how spread out the distribution of $X$ is.
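Both forms of the variance can be evaluated on data; a small sketch (the normal sample with $\sigma = 2$ is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)

v1 = np.mean((x - x.mean()) ** 2)     # VX = E(X - EX)^2
v2 = np.mean(x ** 2) - x.mean() ** 2  # VX = EX^2 - (EX)^2
print(v1, v2)  # the two agree up to rounding; both close to 2**2 = 4
```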


Bernoulli random variables

If $X$ is a Bernoulli random variable with success probability $p$, we know that $EX = p$. We can compute the variance as follows:

$$VX = E(X - p)^2 = (1 - p)^2 P(X = 1) + p^2 P(X = 0) = (1 - p)^2 p + p^2 (1 - p) = (1 - p)p(1 - p + p) = (1 - p)p.$$

Alternatively, we can compute

$$EX^2 = P(X = 1) = p,$$

hence

$$VX = EX^2 - (EX)^2 = p - p^2 = (1 - p)p.$$


Empirical distribution

If $x_1, \ldots, x_n$ are observations from a sample space $E$, then

$$\varepsilon_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A)$$

denotes the relative frequency of occurrences of the event $A \subseteq E$. As a function of $A$ this is the empirical probability measure given by the data set $x_1, \ldots, x_n$.

If we think of the data as realizations of iid random variables with distribution $P$ on $E$, we think of $\varepsilon_n$ as the – non-parametric – estimate of $P$.


Representation of the empirical distribution

Let $U$ denote a random variable that is uniformly distributed on the finite set $\{1, \ldots, n\}$ and define the transformation

$$h_{x_1,\ldots,x_n} : \{1, \ldots, n\} \to E$$

by

$$h_{x_1,\ldots,x_n}(i) = x_i.$$

Then

$$P(h(U) \in A) = \sum_{i\,:\,h(i) = x_i \in A} \underbrace{P(U = i)}_{=\,1/n} = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A) = \varepsilon_n(A).$$

This gives a useful representation of the empirical probability measure as a transformation of a uniformly distributed random variable on the indices – not least for simulation from the empirical probability measure, which we will use later for the so-called bootstrap procedure.
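In code, drawing from $\varepsilon_n$ via $h(U)$ is exactly resampling with replacement, which is how the bootstrap simulates new data sets; a sketch with an illustrative data set:

```python
import numpy as np

rng = np.random.default_rng(4)
data = np.array([3.1, 4.7, 0.2, 5.5, 2.8])  # illustrative observations x_1, ..., x_n
n = len(data)

# Draw U uniformly on the indices and apply h(i) = x_i:
U = rng.integers(0, n, size=100_000)
resample = data[U]  # distributed according to the empirical measure eps_n

print(np.mean(resample == 3.1))  # each x_i is hit with frequency close to 1/n = 0.2
```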


Plug-in for the mean and variance

For observations $x_1, \ldots, x_n$ from iid real-valued variables, $\varepsilon_n$ is an estimate of $P$ – making no restrictions on the possible choices of $P$ – hence, with reference to the plug-in principle, the mean and variance under $\varepsilon_n$ are natural estimates of the unknown mean and variance under $P$.

We compute the empirical mean

$$\hat{\mu}_n = E_{\varepsilon_n} X = E\,h_{x_1,\ldots,x_n}(U) = \sum_{i=1}^{n} x_i P(U = i) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where $U$ has the uniform distribution on $\{1, \ldots, n\}$, and the empirical variance

$$\tilde{\sigma}_n^2 = E_{\varepsilon_n}(X - \hat{\mu}_n)^2 = \sum_{i=1}^{n} (x_i - \hat{\mu}_n)^2 P(U = i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_n)^2.$$
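These are one line each in code; note that numpy's defaults use the same $1/n$ normalization (the data values are illustrative):

```python
import numpy as np

x = np.array([1.2, 0.7, 3.4, 2.1, 1.6])  # illustrative observations

mu_hat = x.mean()                          # empirical mean (1/n) * sum x_i
sigma2_tilde = np.mean((x - mu_hat) ** 2)  # empirical variance, 1/n normalization

print(mu_hat)        # 1.8
print(sigma2_tilde)  # 0.852; equals np.var(x), which also divides by n
```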


Normalization

If $X$ is a real-valued random variable with mean 0 and variance 1, we find for $\mu \in \mathbb{R}$ and $\sigma > 0$ that

$$E(\sigma X + \mu) = \sigma EX + \mu = \mu$$

and

$$V(\sigma X + \mu) = E(\sigma X + \mu - \mu)^2 = \sigma^2 EX^2 = \sigma^2 VX = \sigma^2.$$

In the other direction, if $X$ has mean $\mu$ and variance $\sigma^2$, then

$$\frac{X - \mu}{\sigma}$$

has mean 0 and variance 1. This is in particular the case under the empirical measure, hence normalizing a data set with its empirical mean and empirical standard deviation gives a data set with empirical mean 0 and empirical variance 1.
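Standardizing a data set as described works in one line (the sample is an illustrative choice; np.std uses the empirical $1/n$ version by default):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=3.0, size=1000)  # illustrative data

z = (x - x.mean()) / x.std()  # subtract empirical mean, divide by empirical sd

print(z.mean())  # 0 up to rounding
print(z.var())   # 1 up to rounding
```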


Covariance

Definition

If $XY$ has finite expectation, the covariance of the random variables $X$ and $Y$ is

$$V(X, Y) = E\big((X - EX)(Y - EY)\big).$$

The correlation is

$$\mathrm{corr}(X, Y) = \frac{V(X, Y)}{\sqrt{VX\,VY}}.$$

The covariance is a centered measure of co-variation – depending on the marginal scales. The correlation is a unitless measure of co-variation that does not depend on the marginal scales.


Covariance rules

The covariance is symmetric:

$$V(X, Y) = V(Y, X).$$

If $X = Y$ then

$$V(X, X) = E(X - EX)^2 = VX.$$

An alternative formula is

$$V(X, Y) = E(XY) - EX\,EY.$$

Thus if $X$ and $Y$ are independent, then

$$V(X, Y) = 0.$$


Covariance rules

Theorem

If $X$ and $Y$ are two random variables with finite variance, then the sum $X + Y$ has finite variance and

$$V(X + Y) = VX + VY + 2V(X, Y).$$

If $X$ is a random variable with finite variance and $c \in \mathbb{R}$ is a constant, then $cX$ has finite variance and

$$V(cX) = c^2 VX.$$
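Both rules also hold exactly under the empirical measure, which gives an easy numerical check (the construction of $Y$ from $X$ below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=200_000)
Y = 0.5 * X + rng.normal(size=200_000)  # correlated with X by construction

cov = np.mean((X - X.mean()) * (Y - Y.mean()))  # empirical covariance (1/n)
lhs = np.var(X + Y)                             # np.var also divides by n
rhs = np.var(X) + np.var(Y) + 2 * cov

print(lhs, rhs)       # equal up to rounding; close to 1 + 1.25 + 2 * 0.5 = 3.25
print(np.var(3 * X))  # close to 3**2 * VX = 9
```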


Empirical versions

Just as for the mean and variance we have, with $x_1, \ldots, x_n$ a data set where $x_l = (x_{1l}, \ldots, x_{kl}) \in \mathbb{R}^k$, the empirical covariance

$$\tilde{\sigma}_{ij,n}^2 = V_{\varepsilon_n}(X_i, X_j) = E_{\varepsilon_n}(X_i - E_{\varepsilon_n} X_i)(X_j - E_{\varepsilon_n} X_j) = \frac{1}{n} \sum_{l=1}^{n} (x_{il} - \hat{\mu}_{i,n})(x_{jl} - \hat{\mu}_{j,n}),$$

and the empirical correlation

$$\widetilde{\mathrm{corr}}_{ij,n} = \frac{\tilde{\sigma}_{ij,n}^2}{\tilde{\sigma}_{i,n} \tilde{\sigma}_{j,n}} = \frac{\sum_{l=1}^{n} (x_{il} - \hat{\mu}_{i,n})(x_{jl} - \hat{\mu}_{j,n})}{\sqrt{\sum_{l=1}^{n} (x_{il} - \hat{\mu}_{i,n})^2 \sum_{l=1}^{n} (x_{jl} - \hat{\mu}_{j,n})^2}}.$$
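A small numerical version for two coordinates ($n = 4$, numbers illustrative); since the $1/n$ factors cancel in the correlation, the result agrees with np.corrcoef:

```python
import numpy as np

# Two coordinates observed on n = 4 units (illustrative numbers)
xi = np.array([1.0, 2.0, 3.0, 4.0])
xj = np.array([1.5, 1.0, 3.5, 3.0])

dev_i, dev_j = xi - xi.mean(), xj - xj.mean()
cov_tilde = np.mean(dev_i * dev_j)  # empirical covariance, 1/n normalization
corr = cov_tilde / np.sqrt(np.mean(dev_i ** 2) * np.mean(dev_j ** 2))

print(cov_tilde)  # 0.875
print(corr)       # equals np.corrcoef(xi, xj)[0, 1]
```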


Properties of the empirical measure

With $X_1, \ldots, X_n$ independent and identically distributed with distribution $P$ on $E$, and $\varepsilon_n$ the corresponding empirical probability measure,

$$\varepsilon_n(A) = \frac{1}{n} \sum_{k=1}^{n} 1(X_k \in A), \qquad A \subseteq E,$$

we regard $\varepsilon_n(A)$ as a real-valued random variable – for fixed $A \subseteq E$. Then it holds that

$$E\varepsilon_n(A) = P(A), \qquad V\varepsilon_n(A) = \frac{1}{n} P(A)(1 - P(A)),$$

and (not in notes)

$$V(\varepsilon_n(A), \varepsilon_n(B)) = \frac{P(A \cap B) - P(A)P(B)}{n}.$$
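The mean and variance of $\varepsilon_n(A)$ are easy to check by simulation, since $n\,\varepsilon_n(A) \sim \mathrm{Bin}(n, P(A))$ for fixed $A$ (the values of $n$ and $P(A)$ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, pA = 50, 100_000, 0.3  # illustrative sample size and P(A)

# n * eps_n(A) is Bin(n, P(A)), so simulate eps_n(A) directly
eps = rng.binomial(n, pA, size=reps) / n

print(eps.mean())  # close to P(A) = 0.3
print(eps.var())   # close to P(A) * (1 - P(A)) / n = 0.0042
```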


Properties of the empirical mean and variance

Theorem

Considering the empirical mean $\hat{\mu}_n$ and the empirical variance $\tilde{\sigma}_n^2$ as estimators of the mean and variance, respectively, we have

$$E\hat{\mu}_n = EX \quad \text{and} \quad V\hat{\mu}_n = \frac{1}{n} VX,$$

together with

$$E\tilde{\sigma}_n^2 = \frac{n - 1}{n} VX. \qquad (1)$$

Due to (1) the estimator $\tilde{\sigma}_n^2$ systematically undershoots the true variance, and we prefer the estimator

$$\hat{\sigma}_n^2 = \frac{n}{n - 1} \tilde{\sigma}_n^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \hat{\mu}_n)^2,$$

which has $E\hat{\sigma}_n^2 = VX$.
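The bias in (1) is visible in simulation; numpy exposes both normalizations through the ddof parameter (the sample parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps, sigma2 = 5, 200_000, 4.0
samples = rng.normal(loc=0.0, scale=2.0, size=(reps, n))

sigma2_tilde = samples.var(axis=1, ddof=0)  # 1/n version (biased)
sigma2_hat = samples.var(axis=1, ddof=1)    # 1/(n - 1) version (unbiased)

print(sigma2_tilde.mean())  # close to (n - 1) / n * sigma2 = 3.2
print(sigma2_hat.mean())    # close to sigma2 = 4.0
```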


Properties of the empirical mean and variance

It is possible, but no fun, to compute the variance of $\tilde{\sigma}_n^2$ or $\hat{\sigma}_n^2$; however, we can immediately see that

$$V(\hat{\sigma}_n^2) = \left(\frac{n}{n - 1}\right)^2 V(\tilde{\sigma}_n^2).$$

Hence the variance of $\hat{\sigma}_n^2$ is larger than the variance of $\tilde{\sigma}_n^2$.
