applied statistics - web.uniroma1.it lezioni/slides... · introduction sampling variation random...
Introduction Sampling Variation Random sample Normal distribution Other distributions Association Convergence
Applied Statistics
Lecturer: Cristina Mollica
Statistical models
Statistics concerns what can be learned from data using statistical models to study the variability of the data.
The key feature of a statistical model is that variability is represented using probability distributions, which form the building blocks from which the model is constructed.
Statistical models must accommodate:
- systematic variation;
- random variation.
Statistical models
The purpose of these first lectures is to review the concepts of
- sample statistics and sampling variation;
- moments;
- convergence;
and we will focus on the Normal distribution.
Statistics and Sampling Variation
The key idea in statistical modelling is to treat the data as the outcome of a random experiment.
The $n$ i.i.d. random variables $(Y_1, \dots, Y_i, \dots, Y_n)$ form the random sample.
Statistical analysis generally deals with the $n$ observations $(y_1, \dots, y_i, \dots, y_n)$, known as the observed sample.
We say that a quantity $t = t(y_1, \dots, y_n)$ is a sample statistic.
Data summaries
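Statistics summarize some important aspects of the data.
Example:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{(location)} \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{(scale)}$$
$$\operatorname{median}(y) = \begin{cases} y_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\left(y_{(n/2)} + y_{(n/2+1)}\right) & n \text{ even} \end{cases} \quad \text{(center)}$$
where $y_{(k)}$ denotes the $k$-th smallest observation, and
$$\operatorname{ECDF}(t) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le t) \quad \text{(empirical cumulative distribution function)}.$$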
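Below is a minimal Python sketch of these summaries on a small hypothetical sample (the values are made up for illustration). The median follows the odd/even rule above, shifted to 0-based indexing. The results are cross-checked against the standard-library `statistics` module, whose `variance` also uses the $n - 1$ divisor.

```python
from statistics import mean, median, variance

def sample_median(data):
    """Median from the order statistics y_(1) <= ... <= y_(n)."""
    ys = sorted(data)
    n = len(ys)
    if n % 2 == 1:
        return ys[(n + 1) // 2 - 1]             # y_((n+1)/2), shifted to 0-based indexing
    return (ys[n // 2 - 1] + ys[n // 2]) / 2    # (y_(n/2) + y_(n/2+1)) / 2

def ecdf(data, t):
    """ECDF(t): proportion of observations with y_i <= t."""
    return sum(1 for y in data if y <= t) / len(data)

y = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical toy sample
n = len(y)
ybar = sum(y) / n                                  # location
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)   # scale, divisor n - 1

# Cross-check against the standard library
print(ybar, mean(y))
print(s2, variance(y))
print(sample_median(y), median(y))
print(ecdf(y, 4.0))
```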
Graphs
- Histogram
- Empirical cumulative distribution function
- Boxplot
- Scatterplot (2 variables)
Graphs
Example (Italian consumption data): income and consumption of a sample of 7927 Italians in 2010.
[Figure: four panels. Histogram of the income (on log scale); ECDF of the income (on log scale), $F_n(x)$ against $x$; Consumption by class of income (on log scale), boxplots for income classes such as (5.7,7.21], (8.72,10.2], (11.8,13.3]; Income vs consumption (on log scale), scatterplot of consumption against income.]
Random sample
The fundamental idea of statistical modelling is to treat data as observed values of random variables.
The available data $y_1, y_2, \dots, y_n$ are the observed values of a random sample of size $n$, defined to be a collection of $n$ independent and identically distributed random variables $Y_1, Y_2, \dots, Y_n$.
We suppose that each of the $Y_i$ has the same cumulative distribution function $F$, representing the population $Y$ from which the sample has been taken.
Random sample
Statistical models ⇔ Random variables
Typically, we are interested in inferring specific features of the population:
- the mean of $Y \sim F$;
- the variance of $Y \sim F$;
- the moments of $Y \sim F$.
Mean of a random variable
Let $Y$ be a random variable with cumulative distribution function $F$ and density function $f$. The expected value of $Y$, $E[Y]$, is defined as
$$E[Y] = \int y \, dF(y) = \int y f(y) \, dy.$$
Variance of a random variable
The variance $V[Y]$ of $Y$ is defined as
$$V[Y] = E\left[(Y - E[Y])^2\right] = E[Y^2] - E[Y]^2 = \int y^2 \, dF(y) - \left(\int y \, dF(y)\right)^2.$$
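Both integrals can be approximated numerically. The minimal sketch below assumes an Exponential distribution with rate $\lambda = 2$, an arbitrary choice for which $E[Y] = 1/\lambda$ and $V[Y] = 1/\lambda^2$. It applies a plain trapezoidal rule truncated at $y = 50$, beyond which the remaining tail mass is negligible.

```python
import math

def trapezoid(g, a, b, n=100_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

lam = 2.0  # assumed rate parameter of the Exponential example

def f(y):
    """Exponential(lam) density on y >= 0."""
    return lam * math.exp(-lam * y)

upper = 50.0  # truncation point for integrals over [0, infinity)
EY = trapezoid(lambda y: y * f(y), 0.0, upper)        # E[Y]   = ∫ y f(y) dy
EY2 = trapezoid(lambda y: y ** 2 * f(y), 0.0, upper)  # E[Y^2] = ∫ y^2 f(y) dy
VY = EY2 - EY ** 2                                    # V[Y]   = E[Y^2] - E[Y]^2

print(EY, VY)  # close to 1/lam = 0.5 and 1/lam**2 = 0.25
```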
The computation of the moments of a random variable is facilitated by the use of the moment generating function.
Moment generating function
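The moment generating function of the random variable $Y$ is
$$M_Y(t) = E\left(e^{tY}\right), \quad t \in \mathbb{R},$$
provided that $M_Y(t) < \infty$ for all $t$ in an open interval containing $0$.
Let
$$M'(t) = \frac{dM(t)}{dt}, \quad M''(t) = \frac{d^2M(t)}{dt^2}, \quad M^{(r)}(t) = \frac{d^rM(t)}{dt^r}$$
denote the derivatives of $M$. Then the $r$-th moment is
$$\mu_r = E[Y^r] = M^{(r)}(0).$$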
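As an illustration, the relation $\mu_r = M^{(r)}(0)$ can be checked numerically with central finite differences. The sketch assumes a Poisson distribution with an arbitrary $\lambda = 3$. Its m.g.f. is $M(t) = \exp\{\lambda(e^t - 1)\}$, so $E[Y] = \lambda$ and $E[Y^2] = \lambda + \lambda^2$. The step size `h` is a tuning choice.

```python
import math

lam = 3.0  # assumed Poisson mean for the example

def M(t):
    """Moment generating function of a Poisson(lam) random variable."""
    return math.exp(lam * (math.exp(t) - 1.0))

h = 1e-4
mu1 = (M(h) - M(-h)) / (2 * h)               # M'(0)  ~ E[Y]
mu2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2   # M''(0) ~ E[Y^2]

print(mu1, mu2)  # approximately lam = 3 and lam + lam**2 = 12
```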
Moment generating function
Some properties:
- $Y_1, \dots, Y_n$ are independent if and only if their joint moment generating function factorizes as
$$E[\exp(Y_1 t_1 + \dots + Y_n t_n)] = E[\exp(Y_1 t_1)] \cdots E[\exp(Y_n t_n)].$$
- Let $Y = a + bX$. The moment generating function of $Y$ is $M_Y(t) = e^{at} M_X(bt)$.
- Any moment generating function corresponds to a unique probability distribution.
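The property $M_Y(t) = e^{at}M_X(bt)$ for $Y = a + bX$ can be verified by computing $E[e^{tY}]$ directly from a probability mass function. The sketch assumes $X \sim$ Poisson($\lambda = 3$) and arbitrary constants $a = 1$, $b = 2$, $t = 0.3$. It truncates the infinite sum at $k = 100$, which is far in the negligible tail.

```python
import math

lam, a, b, t = 3.0, 1.0, 2.0, 0.3  # assumed example values

def pmf(k):
    """Poisson(lam) probability mass function."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def M_X(s):
    """Poisson(lam) m.g.f."""
    return math.exp(lam * (math.exp(s) - 1.0))

# Left-hand side: E[exp(tY)] with Y = a + bX, summed over the support of X
lhs = sum(math.exp(t * (a + b * k)) * pmf(k) for k in range(101))
# Right-hand side: the linear-transformation property
rhs = math.exp(a * t) * M_X(b * t)

print(lhs, rhs)
```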
Examples
We will consider some examples of important random variables. In particular, we will focus on the
- Poisson distribution (blackboard);
- Binomial distribution (at home);
- Uniform distribution (at home);
- Exponential distribution (at home).
Normal distribution
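We say that $X \sim N(\mu, \sigma^2)$ when
$$f_X(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$
- $\mu \in \mathbb{R}$ → mean, median and mode (location parameter);
- $\sigma^2 > 0$ → variance (scale parameter).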
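The density can be coded directly from the formula, as in this minimal sketch. The function is parameterized by the variance $\sigma^2$, as in the slides. Python's `statistics.NormalDist`, used here only as a cross-check, takes the standard deviation $\sigma$ instead.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2), where sigma2 is the variance."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma2 = 1.0, 4.0  # arbitrary example parameters
for x in (-2.0, 0.0, 1.0, 3.5):
    print(x, normal_pdf(x, mu, sigma2), NormalDist(mu, math.sqrt(sigma2)).pdf(x))
```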
Normal distribution
[Figure: Varying location parameter. Normal densities for $\mu = 0, -1, -3, 1, 3$; y-axis: density.]
Normal distribution
[Figure: Varying scale parameter. Normal densities for $\sigma^2 = 1, 4, 9, 0.1, 0.25$; y-axis: density.]
Normal distribution
[Figure: Varying scale and location parameters. Normal densities for $\mu = 0, 2, -2$ with different values of $\sigma^2$; y-axis: density.]
Standardization
The random variable
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
has the standard Normal distribution, with density
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right).$$
Clearly, $X = \sigma Z + \mu$.
Standardization
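The cumulative distribution function is obtained by integrating the density as follows:
$$F_Z(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left(-\frac{1}{2}z^2\right) dz = \Phi(u).$$
If we want to compute $P(X \le 2)$ for $X \sim N(\mu = 3, \sigma^2 = 4)$, we standardize with $z = (2 - 3)/2 = -0.5$:
$$P(X \le 2) = P(Z \le -0.5) = \Phi(-0.5) = 1 - \Phi(0.5),$$
which cannot be computed analytically, since $\Phi$ has no closed form. Use R!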
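The slides suggest R for this computation (e.g. `pnorm(-0.5)`). To keep one language for all examples, here is an equivalent sketch using Python's standard-library `statistics.NormalDist`, which evaluates $\Phi$ numerically. Note that `NormalDist` takes the standard deviation $\sigma = 2$, not the variance.

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0        # X ~ N(mu = 3, sigma^2 = 4)
X = NormalDist(mu, sigma)   # NormalDist(mean, standard deviation)
Z = NormalDist()            # standard Normal N(0, 1)

z = (2.0 - mu) / sigma      # standardized value, -0.5
p_direct = X.cdf(2.0)       # P(X <= 2) computed on the original scale
p_std = Z.cdf(z)            # Phi(-0.5)
p_sym = 1.0 - Z.cdf(0.5)    # 1 - Phi(0.5), by symmetry of the standard Normal

print(round(p_direct, 4), round(p_std, 4), round(p_sym, 4))  # 0.3085 0.3085 0.3085
```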
M.G.F. of Normal distribution
Let $Z \sim N(0, 1)$. The moment generating function is
$$M_Z(t) = E[e^{tZ}] = e^{t^2/2}.$$
Proof (at home). Let $Y \sim N(\mu, \sigma^2)$; by standardization,
$$Y = \mu + \sigma Z.$$
By the property for linear transformations, the m.g.f. of $Y$ is
$$M_Y(t) = E[e^{tY}] = e^{\mu t} M_Z(\sigma t) = e^{\mu t} e^{\sigma^2 t^2/2}.$$
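The closed form $M_Y(t) = e^{\mu t + \sigma^2 t^2/2}$ can be checked by integrating $e^{ty} f(y)$ numerically. The sketch uses arbitrary values $\mu = 1$, $\sigma^2 = 4$, $t = 0.5$, for which the closed form equals $e$. It truncates the integral to $[-60, 60]$, outside which the integrand is negligible.

```python
import math

def trapezoid(g, a, b, n=200_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

mu, sigma2, t = 1.0, 4.0, 0.5  # assumed example values

def f(y):
    """N(mu, sigma2) density."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

numeric = trapezoid(lambda y: math.exp(t * y) * f(y), -60.0, 60.0)  # E[e^{tY}]
closed_form = math.exp(mu * t + sigma2 * t ** 2 / 2)                 # e^{mu t} e^{sigma^2 t^2 / 2}

print(numeric, closed_form)
```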
Chi-squared distribution
Let $Z \sim N(0, 1)$. The random variable
$$Y = Z^2 \sim \chi^2_{(1)}$$
has
$$F_Y(y) = 2\Phi(\sqrt{y}) - 1, \qquad f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \quad y \in \mathbb{R}^+.$$
Proof (blackboard).
$$E[Y] = 1, \qquad V[Y] = 2.$$
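Here is a quick Monte Carlo check of these results, simulating $Y = Z^2$ with Python's `random` module. The seed, the number of draws and the evaluation point $y_0 = 1.5$ are arbitrary choices. The sample mean and variance should be close to 1 and 2, and the proportion of draws with $Y \le y_0$ should be close to $2\Phi(\sqrt{y_0}) - 1$.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)   # fixed seed for reproducibility
N = 200_000
ys = [rng.gauss(0.0, 1.0) ** 2 for _ in range(N)]   # Y = Z^2 with Z ~ N(0, 1)

mean_y = sum(ys) / N
var_y = sum((y - mean_y) ** 2 for y in ys) / (N - 1)

y0 = 1.5                                             # arbitrary evaluation point
empirical_cdf = sum(1 for y in ys if y <= y0) / N
exact_cdf = 2 * NormalDist().cdf(math.sqrt(y0)) - 1  # F_Y(y0) = 2 Phi(sqrt(y0)) - 1

print(mean_y, var_y)
print(empirical_cdf, exact_cdf)
```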
Chi-squared distribution
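More generally, $Y \sim \chi^2_{(n)}$, where the parameter $n \in \mathbb{N}^+$ is referred to as the degrees of freedom, if
$$f_Y(y; n) = \begin{cases} \dfrac{1}{2^{n/2}\Gamma(n/2)} y^{n/2 - 1} e^{-y/2} & y \in \mathbb{R}^+ \\ 0 & \text{otherwise.} \end{cases}$$
Note that
$$Y \sim \chi^2_{(n)} \iff Y \sim \text{Gamma}(\alpha = n/2, \beta = 1/2),$$
where $\beta$ is the rate parameter.
It can be easily proven (at home) that
$$E[Y] = n, \qquad V[Y] = 2n.$$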
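The sketch below codes the $\chi^2_{(n)}$ density and checks that it coincides with the Gamma($n/2$, $1/2$) density in the rate parameterization. It then recovers $E[Y] = n$ and $V[Y] = 2n$ by numerical integration for $n = 4$, an arbitrary choice, truncating the integrals at $y = 200$.

```python
import math

def chi2_pdf(y, n):
    """Chi-squared density with n degrees of freedom."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and rate beta."""
    if y <= 0:
        return 0.0
    return beta ** alpha * y ** (alpha - 1) * math.exp(-beta * y) / math.gamma(alpha)

def trapezoid(g, a, b, n=200_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

# The chi-squared(n) and Gamma(n/2, rate 1/2) densities coincide
for n_df in range(1, 6):
    for y in (0.5, 2.0, 7.5):
        assert math.isclose(chi2_pdf(y, n_df), gamma_pdf(y, n_df / 2, 0.5), rel_tol=1e-12)

n = 4
EY = trapezoid(lambda y: y * chi2_pdf(y, n), 0.0, 200.0)
EY2 = trapezoid(lambda y: y ** 2 * chi2_pdf(y, n), 0.0, 200.0)
print(EY, EY2 - EY ** 2)  # close to n = 4 and 2n = 8
```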
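Chi-squared distribution
The m.g.f. of a Gamma distribution $X \sim \text{Gamma}(\alpha, \beta)$ is
$$M_X(t) = E[e^{tX}] = \left(\frac{\beta}{\beta - t}\right)^{\alpha}, \quad t < \beta,$$
and hence the m.g.f. of $Y \sim \chi^2_{(1)}$ ($\alpha = \tfrac{1}{2}$, $\beta = \tfrac{1}{2}$) is
$$M_Y(t) = E[e^{tY}] = \sqrt{\frac{1}{1 - 2t}}, \quad t < \tfrac{1}{2}.$$
THEOREM. Let $Z_1, Z_2, \dots, Z_n \overset{iid}{\sim} N(0, 1)$. Then
$$Y = Z_1^2 + Z_2^2 + \dots + Z_n^2 \sim \chi^2_{(n)}.$$
Proof (blackboard).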
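The theorem can be illustrated by simulation, sketched here with $n = 5$ and an arbitrary seed. The simulated $Y = Z_1^2 + \dots + Z_n^2$ should have mean close to $n$ and variance close to $2n$. Its empirical m.g.f. at $t = 0.1$ should be close to $(1 - 2t)^{-n/2}$, which follows from the $\chi^2_{(1)}$ m.g.f. above and the factorization property for independent variables.

```python
import math
import random

rng = random.Random(7)   # fixed seed for reproducibility
n, N, t = 5, 100_000, 0.1

# Each draw is a sum of n squared independent N(0, 1) variables
ys = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(N)]

mean_y = sum(ys) / N
var_y = sum((y - mean_y) ** 2 for y in ys) / (N - 1)
mgf_mc = sum(math.exp(t * y) for y in ys) / N   # Monte Carlo estimate of E[e^{tY}]
mgf_exact = (1 - 2 * t) ** (-n / 2)             # product of n chi^2(1) m.g.f.s

print(mean_y, var_y)       # close to n = 5 and 2n = 10
print(mgf_mc, mgf_exact)
```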
Chi-squared distribution
[Figure: Chi-squared densities for $\nu = 1, 3, 5, 10, 20$; y-axis: Density.]
Student T distribution
Let $Z \sim N(0, 1)$ and $W \sim \chi^2_{(\nu)}$ be independent. The random variable
$$Y = \frac{Z}{\sqrt{W/\nu}} \sim T_\nu,$$
where the parameter $\nu > 0$ indicates the degrees of freedom.
[Figure: Student T densities for $\nu = 1, 3, 5, 10, 20$, compared with the $N(0,1)$ density; y-axis: Density.]
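Here is a simulation sketch of the definition, with $\nu = 10$ and an arbitrary seed. It builds $W$ as a sum of $\nu$ squared standard Normals, as in the theorem above. The sample variance should be close to $\nu/(\nu - 2) = 1.25$, a standard result for $\nu > 2$ that is not derived in these slides. The tails should also be heavier than those of $N(0, 1)$.

```python
import random
from statistics import NormalDist

rng = random.Random(123)  # fixed seed for reproducibility
nu, N = 10, 100_000

def draw_t():
    """One draw of Y = Z / sqrt(W / nu), with Z ~ N(0,1) independent of W ~ chi^2(nu)."""
    z = rng.gauss(0.0, 1.0)
    w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / (w / nu) ** 0.5

ys = [draw_t() for _ in range(N)]
mean_y = sum(ys) / N
var_y = sum((y - mean_y) ** 2 for y in ys) / (N - 1)
tail_t = sum(1 for y in ys if abs(y) > 2) / N    # estimated P(|Y| > 2)
tail_normal = 2 * (1 - NormalDist().cdf(2))     # P(|Z| > 2) for Z ~ N(0, 1)

print(var_y, tail_t, tail_normal)
```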
Fisher F distribution
Let $X_1 \sim \chi^2_{(\nu_1)}$ and $X_2 \sim \chi^2_{(\nu_2)}$ be independent. The random variable
$$Y = \frac{X_1/\nu_1}{X_2/\nu_2} \sim F(\nu_1, \nu_2),$$
where the parameters $\nu_1, \nu_2 > 0$ are the degrees of freedom.
[Figure: F distribution ($\nu_1 = 5$). Densities for $\nu_2 = 1, 5, 10, 20$; y-axis: Density.]
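Analogously, here is a simulation sketch for $F(\nu_1, \nu_2)$ with arbitrary $\nu_1 = 5$, $\nu_2 = 10$ and an arbitrary seed. The sample mean should be close to $\nu_2/(\nu_2 - 2) = 1.25$, a standard result for $\nu_2 > 2$ that is not derived in these slides.

```python
import random

rng = random.Random(2024)  # fixed seed for reproducibility
nu1, nu2, N = 5, 10, 100_000

def chi2_draw(nu):
    """One chi^2(nu) draw as a sum of nu squared independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

ys = [(chi2_draw(nu1) / nu1) / (chi2_draw(nu2) / nu2) for _ in range(N)]
mean_y = sum(ys) / N

print(mean_y, nu2 / (nu2 - 2))  # simulated vs theoretical mean
```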
Association measures
Let Y1 and Y2 be two random variables.
The covariance between Y1 and Y2 is
$$\text{Cov}(Y_1, Y_2) = E[(Y_1 - E[Y_1])(Y_2 - E[Y_2])] = E[Y_1 Y_2] - E[Y_1]E[Y_2].$$
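The two expressions are also exactly equal for the empirical distribution of a sample, that is, when expectations are replaced by averages with divisor $n$. Here is a sketch on hypothetical paired data:

```python
y1 = [1.0, 2.0, 4.0, 5.0, 8.0]   # hypothetical paired observations
y2 = [3.0, 3.5, 6.0, 7.5, 9.0]
n = len(y1)

m1, m2 = sum(y1) / n, sum(y2) / n
cov_centered = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n  # E[(Y1 - E[Y1])(Y2 - E[Y2])]
cov_moments = sum(a * b for a, b in zip(y1, y2)) / n - m1 * m2       # E[Y1 Y2] - E[Y1] E[Y2]

print(cov_centered, cov_moments)
```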
Association measures
Let Y be a p-dimensional random vector.
The variance-covariance matrix is a p × p matrix defined as
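$$\Sigma = \text{Cov}(\mathbf{Y}, \mathbf{Y}) = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'\right]$$
The generic element $\sigma_{rs}$ is
$$\sigma_{rs} = \text{Cov}(Y_r, Y_s) = E[(Y_r - E[Y_r])(Y_s - E[Y_s])].$$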
Association measures
Theorem: Σ is a positive semidefinite matrix.
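Proof: Let $\mathbf{a} = (a_1, \dots, a_p)'$ be a vector of constants. It follows that
$$0 \le V(\mathbf{a}'\mathbf{Y}) = \text{Cov}(\mathbf{a}'\mathbf{Y}, \mathbf{a}'\mathbf{Y}) = \mathbf{a}'\text{Cov}(\mathbf{Y}, \mathbf{Y})\mathbf{a} = \mathbf{a}'\Sigma\mathbf{a}.$$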
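Numerically, positive semidefiniteness means that all eigenvalues of $\Sigma$ are non-negative. The NumPy sketch below uses a hypothetical sample whose third component is an exact linear combination of the first two. The sample variance-covariance matrix is therefore singular: semidefinite but not definite. The sketch also checks that $\mathbf{a}'\Sigma\mathbf{a}$ equals the sample variance of $\mathbf{a}'\mathbf{Y}$, as in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed for reproducibility
Y = rng.normal(size=(500, 3))              # 500 hypothetical draws of a 3-dimensional vector
Y[:, 2] = Y[:, 0] + 0.5 * Y[:, 1]          # third component: exact linear combination

Sigma = np.cov(Y, rowvar=False)            # 3 x 3 sample variance-covariance matrix
eigenvalues = np.linalg.eigvalsh(Sigma)    # real eigenvalues of a symmetric matrix
print(eigenvalues)                         # smallest one is zero up to rounding

a = np.array([0.3, -1.2, 2.0])             # arbitrary vector of constants
quad_form = a @ Sigma @ a                  # a' Sigma a
var_aY = np.var(Y @ a, ddof=1)             # sample variance of a'Y
print(quad_form, var_aY)
```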
Association measures
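Let
- $\mathbf{Y}$ be a $p$-dimensional random vector;
- $\mathbf{a}$ be a $q$-dimensional vector;
- $B$ be a $p \times q$ matrix.
The $q \times q$ variance-covariance matrix of the vector $\mathbf{a} + B'\mathbf{Y}$ is
$$\begin{aligned} \text{Cov}(\mathbf{a} + B'\mathbf{Y}, \mathbf{a} + B'\mathbf{Y}) &= \text{Cov}(B'\mathbf{Y}, B'\mathbf{Y}) \\ &= E\left[(B'\mathbf{Y} - E[B'\mathbf{Y}])(B'\mathbf{Y} - E[B'\mathbf{Y}])'\right] \\ &= E\left[B'(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'B\right] \\ &= B'E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'\right]B = B'\Sigma B. \end{aligned}$$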
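The identity $\text{Cov}(\mathbf{a} + B'\mathbf{Y}) = B'\Sigma B$ also holds exactly for sample variance-covariance matrices, which makes it easy to check numerically. The NumPy sketch uses hypothetical $\mathbf{a}$ and $B$ with $p = 3$, $q = 2$. Each row of `Y` is an observation $\mathbf{y}_i'$, so the transformed rows are $\mathbf{a}' + \mathbf{y}_i'B$.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(1000, 3))             # n x p data matrix, one observation per row

a = np.array([10.0, -4.0])                 # q-dimensional shift (hypothetical)
B = np.array([[1.0, 0.0],
              [2.0, -1.0],
              [0.5, 3.0]])                 # p x q matrix (hypothetical)

W = a + Y @ B                              # row i is (a + B'y_i)'
lhs = np.cov(W, rowvar=False)              # q x q sample covariance of a + B'Y
rhs = B.T @ np.cov(Y, rowvar=False) @ B    # B' Sigma B
print(np.allclose(lhs, rhs))               # True
```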
Association measures
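The correlation between two variables $Y_1$ and $Y_2$ is defined as
$$\rho(Y_1, Y_2) = \frac{\text{Cov}(Y_1, Y_2)}{\sqrt{\text{Var}(Y_1)\text{Var}(Y_2)}}.$$
Since $|\text{Cov}(Y_1, Y_2)| \le \sqrt{\text{Var}(Y_1)\text{Var}(Y_2)}$ (Cauchy–Schwarz inequality), it follows that $-1 \le \rho(Y_1, Y_2) \le 1$.
The correlation matrix is defined as
$$\Omega = \Sigma_d^{-1/2}\, \Sigma\, \Sigma_d^{-1/2},$$
where $\Sigma_d^{-1/2} = \operatorname{diag}(\sigma_{11}^{-1/2}, \dots, \sigma_{pp}^{-1/2})$ is the diagonal matrix of the reciprocals of the standard deviations of $\mathbf{Y}$.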
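Here is a NumPy sketch of $\Omega = \Sigma_d^{-1/2}\Sigma\Sigma_d^{-1/2}$ on simulated data, cross-checked against `np.corrcoef`. The mixing matrix `L` is an arbitrary choice used to induce correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [-0.5, 0.3, 0.5]])           # arbitrary mixing matrix
Y = rng.normal(size=(2000, 3)) @ L.T       # correlated hypothetical sample

Sigma = np.cov(Y, rowvar=False)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Sigma)))  # diag(sigma_11^{-1/2}, ..., sigma_pp^{-1/2})
Omega = D_inv_sqrt @ Sigma @ D_inv_sqrt

print(np.round(Omega, 3))
print(np.allclose(Omega, np.corrcoef(Y, rowvar=False)))  # True
```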
The multivariate normal distribution
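We say that $\mathbf{Y} = (Y_1, \dots, Y_p)' \sim N_p(\boldsymbol{\mu}, \Sigma)$ when its density function is
$$f(\mathbf{y}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{y} - \boldsymbol{\mu})\right\},$$
where $|\Sigma| = \det(\Sigma)$.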
The multivariate normal distribution
Example: p = 2 (blackboard).
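$$f(y_1, y_2; \mu_1, \mu_2, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2}Q(y_1, y_2)\right\},$$
where
$$Q(y_1, y_2) = \frac{1}{1 - \rho^2}\left[\left(\frac{y_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1 - \mu_1}{\sigma_1}\right)\left(\frac{y_2 - \mu_2}{\sigma_2}\right) + \left(\frac{y_2 - \mu_2}{\sigma_2}\right)^2\right].$$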
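Here is a NumPy sketch of the general density, checked against the bivariate formula for one arbitrary choice of $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho$, with $\sigma_{12} = \rho\sigma_1\sigma_2$. It uses `np.linalg.solve` rather than forming the inverse of $\Sigma$ explicitly.

```python
import numpy as np

def mvn_pdf(y, mu, Sigma):
    """N_p(mu, Sigma) density at the point y."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    p = mu.size
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (y - mu)' Sigma^{-1} (y - mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

def bivariate_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    """The p = 2 formula written in terms of Q(y1, y2)."""
    u1, u2 = (y1 - mu1) / s1, (y2 - mu2) / s2
    Q = (u1 ** 2 - 2 * rho * u1 * u2 + u2 ** 2) / (1 - rho ** 2)
    return np.exp(-0.5 * Q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

mu1, mu2, s1, s2, rho = 1.0, -1.0, 2.0, 0.5, 0.4   # arbitrary parameters
Sigma = np.array([[s1 ** 2, rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])

for point in [(0.0, 0.0), (1.0, -1.0), (2.5, -0.2)]:
    print(mvn_pdf(point, [mu1, mu2], Sigma), bivariate_pdf(*point, mu1, mu2, s1, s2, rho))
```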
The multivariate normal distribution
[Figure: Perspective plots of the bivariate normal density $f(x, y)$ for rho = −0.8, 0, 0.4, 0.7; $x, y \in [-4, 4]$.]
The multivariate normal distribution
[Figure: Contour plots of the bivariate normal density for rho = −0.8, 0, 0.4, 0.7; both axes from −2 to 4.]
The multivariate normal distribution
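The moment generating function of $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \Sigma)$ is
$$M_{\mathbf{Y}}(\mathbf{t}) = E[e^{\mathbf{t}'\mathbf{Y}}] = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\Sigma\mathbf{t}}{2}\right).$$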
The multivariate normal distribution
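Theorem. Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \Sigma)$ and $B$ a $k \times p$ matrix. Then the variable $\mathbf{W} = B\mathbf{Y}$ is
$$\mathbf{W} \sim N_k(B\boldsymbol{\mu}, B\Sigma B').$$
Proof. Compute the m.g.f. and use its property for linear combinations:
$$M_{\mathbf{W}}(\mathbf{t}) = E[\exp(\mathbf{t}'B\mathbf{Y})] = M_{\mathbf{Y}}(B'\mathbf{t}) = \exp\left(\mathbf{t}'B\boldsymbol{\mu} + \frac{\mathbf{t}'B\Sigma B'\mathbf{t}}{2}\right),$$
which is the m.g.f. of $N_k(B\boldsymbol{\mu}, B\Sigma B')$; by the uniqueness of the m.g.f., the result follows.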
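The theorem can be checked by simulation with NumPy's `Generator.multivariate_normal`. The sketch uses an arbitrary $\boldsymbol{\mu}$, a positive definite $\Sigma$ and a $2 \times 3$ matrix $B$ ($p = 3$, $k = 2$). The sample mean and covariance of $\mathbf{W} = B\mathbf{Y}$ should be close to $B\boldsymbol{\mu}$ and $B\Sigma B'$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])        # arbitrary positive definite matrix
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0]])           # k x p = 2 x 3 (hypothetical)

Y = rng.multivariate_normal(mu, Sigma, size=200_000)  # one draw of Y per row
W = Y @ B.T                                           # row i is (B y_i)'

print(W.mean(axis=0), B @ mu)                         # sample mean vs B mu
print(np.cov(W, rowvar=False), B @ Sigma @ B.T)       # sample covariance vs B Sigma B'
```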
Marginal and conditional distribution
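Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \Sigma)$ and $\mathbf{Y}' = (\mathbf{Y}_1', \mathbf{Y}_2')$, where $\mathbf{Y}_1$ is $q \times 1$ and $\mathbf{Y}_2$ is $(p - q) \times 1$.
Let
$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
be the decomposition of the mean vector and the variance-covariance matrix, where
- $\Sigma_{11}$ is $q \times q$;
- $\Sigma_{22}$ is $(p - q) \times (p - q)$;
- $\Sigma_{12}$ is $q \times (p - q)$;
- $\Sigma_{21}$ is $(p - q) \times q$;
- $\Sigma_{21} = \Sigma_{12}'$.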
Marginal and conditional distribution
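It can be shown that
1. $\mathbf{Y}_1 \sim N_q(\boldsymbol{\mu}_1, \Sigma_{11})$;
2. $\mathbf{Y}_2 \sim N_{p-q}(\boldsymbol{\mu}_2, \Sigma_{22})$;
3. $\mathbf{Y}_1 \mid \mathbf{Y}_2 = \mathbf{y} \sim N_q\left(\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{y} - \boldsymbol{\mu}_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.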
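Here is a NumPy sketch of the partitioned formulas for the conditional distribution; the helper name `conditional_normal` is hypothetical. For $p = 2$, $q = 1$, with $\Sigma_{12} = \rho\sigma_1\sigma_2$ and $\Sigma_{22} = \sigma_2^2$, item 3 reduces to mean $\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2)$ and variance $\sigma_1^2(1 - \rho^2)$. The example reproduces these for arbitrary parameter values.

```python
import numpy as np

def conditional_normal(mu, Sigma, q, y2):
    """Mean and covariance of Y1 | Y2 = y2 when (Y1', Y2')' ~ N_p(mu, Sigma), Y1 of size q."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    K = S12 @ np.linalg.inv(S22)               # Sigma_12 Sigma_22^{-1}
    cond_mean = mu1 + K @ (y2 - mu2)
    cond_cov = S11 - K @ S21
    return cond_mean, cond_cov

# p = 2, q = 1: mu = (1, 2), sigma1 = 2, sigma2 = 3, rho = 0.5 (arbitrary values)
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 3.0],
                  [3.0, 9.0]])                 # Sigma_12 = rho * sigma1 * sigma2 = 3
m, C = conditional_normal(mu, Sigma, q=1, y2=np.array([5.0]))
print(m, C)  # theory: mean 1 + 0.5 * (2/3) * (5 - 2) = 2, variance 4 * (1 - 0.25) = 3
```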
Chi-squared distribution
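Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \Sigma)$, with $\Sigma$ a full-rank matrix. Then
$$Z = (\mathbf{Y} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{Y} - \boldsymbol{\mu}) \sim \chi^2_{p}.$$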
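Here is a simulation sketch of this result with NumPy, using an arbitrary positive definite $\Sigma$ with $p = 3$ and an arbitrary seed. The simulated quadratic forms should have mean close to $p$ and variance close to $2p$, the $\chi^2_p$ moments given earlier.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])        # arbitrary full-rank covariance, p = 3
p = mu.size

Y = rng.multivariate_normal(mu, Sigma, size=200_000)
D = Y - mu
# Row-wise quadratic forms (y_i - mu)' Sigma^{-1} (y_i - mu)
Z = np.einsum("ij,ij->i", D @ np.linalg.inv(Sigma), D)

print(Z.mean(), Z.var(ddof=1))             # close to p = 3 and 2p = 6
```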
Modes of convergence
IDEA: the bigger our sample, the more faith we can have in our inferences, because the sample is more representative of the distribution $F$ from which it comes.
We will study two modes of convergence:
- convergence in distribution;
- convergence in probability.
Convergence in distribution
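We say that the sequence $Z_1, Z_2, \dots, Z_n, \dots$ converges in distribution to $Z$, $Z_n \xrightarrow{d} Z$, if
$$F_{Z_n}(z) = \Pr(Z_n \le z) \to F_Z(z) = \Pr(Z \le z) \quad \text{as } n \to \infty$$
at every point $z$ where $F_Z$ is continuous.
This implies that, for large $n$, one can use $F_Z$ to approximate $F_{Z_n}$:
$$Z_n \overset{\cdot}{\sim} Z.$$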
Some examples
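1. A sequence $X_1, X_2, \dots, X_n, \dots$, where $X_n \sim T_n$, converges in distribution to $Z \sim N(0, 1)$:
$$T_n \overset{\cdot}{\sim} N(0, 1) \quad \text{as } n \to +\infty.$$
2. A sequence $X_1, X_2, \dots, X_n, \dots$, where $X_n \sim \chi^2_{(n)}$, satisfies $(X_n - n)/\sqrt{2n} \xrightarrow{d} N(0, 1)$, so that, for large $n$,
$$\chi^2_{(n)} \overset{\cdot}{\sim} N(n, 2n).$$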
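For even $n$, the second example can be checked without simulation using the standard identity $\Pr(\chi^2_{(n)} \le x) = 1 - \sum_{j=0}^{n/2 - 1} e^{-x/2}(x/2)^j/j!$. This is the Poisson–Gamma relation, which is not derived in these slides. The sketch compares the exact c.d.f. at $x = n$ with the $N(n, 2n)$ approximation, which gives $0.5$ there; the error shrinks as $n$ grows.

```python
import math

def chi2_cdf_even(x, n):
    """Exact chi^2(n) c.d.f. at x for even n, via the Poisson-Gamma relation."""
    term = math.exp(-x / 2)            # j = 0 term of the Poisson(x/2) probabilities
    total = term
    for j in range(1, n // 2):
        term *= (x / 2) / j            # e^{-x/2} (x/2)^j / j!, built iteratively
        total += term
    return 1.0 - total

errors = []
for n in (2, 20, 200):
    exact = chi2_cdf_even(n, n)        # Pr(chi^2(n) <= n)
    approx = 0.5                       # N(n, 2n) c.d.f. at its mean
    errors.append(abs(exact - approx))
    print(n, round(exact, 4), round(abs(exact - approx), 4))
```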
Central Limit Theorem
Another important case is the Central Limit Theorem (CLT): let $Y_1, Y_2, \dots, Y_n, \dots$ be a sequence of i.i.d. variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$, and let $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Then
$$Z_n = \frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1).$$
The CLT implies that, in large samples, the sampling distribution of $\bar{Y}_n$ can be approximated with the normal density with mean $\mu$ and variance $\sigma^2/n$.
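Here is a simulation sketch of the CLT with a skewed population. It assumes an Exponential population with rate 1 (so $\mu = \sigma = 1$), samples of size $n = 50$ and an arbitrary seed. The proportion of standardized means $Z_n$ falling in $[-1.96, 1.96]$ should be close to $0.95$, the corresponding $N(0, 1)$ probability.

```python
import math
import random

rng = random.Random(11)   # fixed seed for reproducibility
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0      # mean and standard deviation of Exponential(1)

def z_n():
    """Standardized sample mean (Ybar - mu) / (sigma / sqrt(n)) of one sample of size n."""
    ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (ybar - mu) / (sigma / math.sqrt(n))

zs = [z_n() for _ in range(reps)]
coverage = sum(1 for z in zs if abs(z) <= 1.96) / reps
print(coverage)           # close to 0.95
```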
Central Limit Theorem
[Figure: four panels, for n = 1, 5, 10, 50; x-axis from −6 to 6, y-axis from 0 to 1.]
Convergence in probability
We say that the sequence $S_1, S_2, \dots, S_n, \dots$ converges in probability to $S$, $S_n \xrightarrow{p} S$, if for any $\varepsilon > 0$
$$\Pr(|S_n - S| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$
A special case of this is the weak law of large numbers: if $Y_1, Y_2, \dots$ is a sequence of i.i.d. random variables, each with finite mean $\mu$, then
$$\bar{Y}_n \xrightarrow{p} \mu.$$
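Here is a simulation sketch of the weak law of large numbers. It again assumes an Exponential population with mean $\mu = 1$, with $\varepsilon = 0.1$ and an arbitrary seed. The estimated $\Pr(|\bar{Y}_n - \mu| > \varepsilon)$ should decrease towards 0 as $n$ grows.

```python
import random

rng = random.Random(5)        # fixed seed for reproducibility
mu, eps, reps = 1.0, 0.1, 1_000

def prob_far(n):
    """Monte Carlo estimate of Pr(|Ybar_n - mu| > eps) for Exponential(1) samples of size n."""
    far = 0
    for _ in range(reps):
        ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        far += abs(ybar - mu) > eps
    return far / reps

probs = [prob_far(n) for n in (10, 100, 1000)]
print(probs)
```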
Some consequences
Consider the average $\bar{Y}$ of a random sample drawn from $Y$ with mean $E[Y] = \mu$ and variance $V[Y] = \sigma^2$. The weak law of large numbers implies that
$\bar{Y}$ is a consistent estimator of $\mu$.
It is also an unbiased estimator of $\mu$, since $E[\bar{Y}] = \mu$.
Some properties
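If $s_0$ and $u_0$ are constants, these modes of convergence are related as follows:
1. $S_n \xrightarrow{p} S \implies S_n \xrightarrow{d} S$;
2. $S_n \xrightarrow{d} s_0 \implies S_n \xrightarrow{p} s_0$;
3. $S_n \xrightarrow{p} s_0 \implies h(S_n) \xrightarrow{p} h(s_0)$, for any function $h$ continuous at $s_0$;
4. $S_n \xrightarrow{d} S$ and $U_n \xrightarrow{p} u_0 \implies S_n + U_n \xrightarrow{d} S + u_0$ and $S_n U_n \xrightarrow{d} S u_0$.
The fourth of these is known as Slutsky's lemma.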
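The third and fourth properties above can be illustrated with the studentized mean $T_n = Z_n \cdot (\sigma/S_n)$. Here $Z_n \xrightarrow{d} N(0,1)$ by the CLT. The sample variance $S_n^2 \xrightarrow{p} \sigma^2$ by the weak law, so $\sigma/S_n \xrightarrow{p} 1$ by continuity, and Slutsky's lemma gives $T_n \xrightarrow{d} N(0, 1)$. The sketch below assumes an Exponential(1) population, $n = 200$ and an arbitrary seed. The proportion of $|T_n| \le 1.96$ should be close to $0.95$ even though $\sigma$ is estimated.

```python
import math
import random

rng = random.Random(99)   # fixed seed for reproducibility
n, reps, mu = 200, 5_000, 1.0

def t_n():
    """Studentized mean (Ybar - mu) / (S_n / sqrt(n)) for one Exponential(1) sample."""
    ys = [rng.expovariate(1.0) for _ in range(n)]
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))  # sample standard deviation S_n
    return (ybar - mu) / (s / math.sqrt(n))

ts = [t_n() for _ in range(reps)]
coverage = sum(1 for t in ts if abs(t) <= 1.96) / reps
print(coverage)           # close to 0.95 for large n
```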