Post on 07-Sep-2019

Sampling Distributions of Estimators and Hypothesis Testing

Weerachart T. Kilenthong

Research Institute for Policy Evaluation and Design (RIPED), University of the Thai Chamber of Commerce

Tee (RIPED) Statistical Inference 1 / 45

Random Variable and Distribution Function

Definition (Random Variable)

A random variable $X$ is a real-valued function defined on the sample space $S$.

We usually use a random variable to represent an uncertain situation by assigning a value to each possible outcome, e.g.,

- Raining: define a random variable $X$ for the rain experiment, with $X = 1$ if it rains and $X = 0$ otherwise.

To understand a random variable completely, we need to know its distribution function, e.g.,

- Raining: suppose that the probability of rain is $p = 0.6$. Its distribution function is
\[ \Pr(X = 1) = 0.6, \qquad \Pr(X = 0) = 0.4. \]
- Income: suppose that household income is log-normally distributed with parameters $(\mu = 10, \sigma^2 = 100)$.

Distribution Function from Simulated Data: Bernoulli Distribution

[Figure: bar chart for the Bernoulli distribution with p = 0.6; vertical axis: probability, from 0.0 to 1.0; observed fractions 0.4006 at x = 0 and 0.5994 at x = 1.]

Figure: Observed sample fractions from 10,000 draws of the Bernoulli distribution
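The experiment behind this figure can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the slides: the function name `bernoulli_fractions` and the seed are assumptions, and the exact fractions will differ from those in the figure.

```python
import random

def bernoulli_fractions(p, n_draws, seed=0):
    """Return the observed fractions of 0s and 1s in n_draws Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    ones = sum(1 for _ in range(n_draws) if rng.random() < p)
    frac_one = ones / n_draws
    return {0: 1.0 - frac_one, 1: frac_one}

# 10,000 draws with p = 0.6, as in the raining example.
fractions = bernoulli_fractions(p=0.6, n_draws=10_000)
print(fractions)  # both fractions should be close to 0.4 and 0.6
```

With 10,000 draws the observed fractions are typically within about one percentage point of the true probabilities.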

Distribution Function from Simulated Data: CDF of Log-Normal Distribution

[Figure: cumulative distribution function of y for the log-normal distribution with mu = 10, sigma = 10; horizontal axis from 0 to 100, vertical axis from 0.0 to 1.0.]

Figure: Observed fraction of the sample from 10,000 draws of the log-normal distribution

Distribution Function from the Population

The question then is: how can we obtain the distribution of the random variable we are interested in?

Of course, if we can observe the whole population, then we can simply calculate the cumulative distribution function (c.d.f.)
\[ F(x) = \Pr(X \le x). \]
For example, for each level of income $x$, we can compute $F(x)$ as the fraction of households whose income is no more than $x$.

Then, for a continuous random variable, we can get the probability density function (p.d.f.) $f(x)$ by
\[ f(x) = \frac{dF(x)}{dx}, \]
which is the slope of the c.d.f. $F(x)$.
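As a rough illustration of "fraction of households whose income is no more than x", the sketch below computes an empirical c.d.f. and a finite-difference slope. The function names, the seed, and the log-normal parameters (log-mean 0, log-sd 1) are assumptions chosen for readability, not the values used in the slides.

```python
import random

def ecdf(sample, x):
    """Fraction of observations that are no more than x, an estimate of F(x)."""
    return sum(1 for v in sample if v <= x) / len(sample)

def slope(sample, x, h):
    """Finite-difference slope of the empirical c.d.f., a rough estimate of f(x)."""
    return (ecdf(sample, x + h) - ecdf(sample, x - h)) / (2 * h)

rng = random.Random(1)
# Hypothetical incomes: log-normal with log-mean 0 and log-sd 1 (illustrative only).
incomes = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

print(ecdf(incomes, 1.0))        # close to 0.5, since the median of this log-normal is 1
print(slope(incomes, 1.0, 0.1))  # rough density estimate near x = 1
```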

Distribution Function from Simulated Data: pdf of Log-Normal Distribution

[Figure: probability density function of y for the log-normal distribution with mu = 10, sigma = 10; horizontal axis from 0 to 100, vertical axis from 0.00 to 0.10.]

Figure: Observed fraction of the sample from 10,000 draws of the log-normal distribution

Population Mean and Variance

Sometimes it is more convenient to characterize a distribution function by its moments, i.e., by expectations.

The most commonly used moments are the mean $E[X] \equiv \mu$ and the variance $Var[X] \equiv \sigma^2$:
\[ \mu = E[X] = \int x f(x)\,dx, \]
\[ \sigma^2 = Var[X] = \int (x - E[X])^2 f(x)\,dx. \]

One of the reasons is that the parameters $(\mu, \sigma^2)$ fully determine a Normal distribution. That is, if we know $(\mu, \sigma^2)$ of a Normal distribution, then we know the whole distribution. Note: why do we care about the Normal distribution so much?
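As a small worked example, consider the discrete analogue of these formulas, with sums in place of integrals, applied to the raining variable with $\Pr(X = 1) = 0.6$:

```latex
\begin{align*}
\mu      &= 0 \cdot 0.4 + 1 \cdot 0.6 = 0.6, \\
\sigma^2 &= (0 - 0.6)^2 \cdot 0.4 + (1 - 0.6)^2 \cdot 0.6
          = 0.144 + 0.096 = 0.24 .
\end{align*}
```

This agrees with the general Bernoulli result $\sigma^2 = p(1 - p)$ for $p = 0.6$.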

Sample and Population

Unfortunately, it is almost never possible to observe the whole population.

Therefore, we need to rely on estimation and inference.

The data we actually observe are called a sample. We need to use the sample to infer the properties of the underlying random variable, i.e., the truth. We cannot wait for the population data.

Problem: obviously the sample is not the same as the population. What can we do?

Estimation of Mean and Variance

We can use a non-parametric method to estimate the whole distribution directly, but this requires a very large sample, which we usually do not have. Of course, this is the best approach when it is feasible.

Most of the time, though, we estimate only the mean and the variance.

There are several underlying theories that give us the following estimators for the mean and the standard deviation:
\[ \hat{\mu} = \bar{x}_n = \frac{\sum_{i=1}^n x_i}{n}, \qquad \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n}}. \]

Note: there is an alternative estimator:
\[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n - 1}}, \]
whose square is an unbiased estimator of the variance $\sigma^2$. Both will be very close when $n$ is large.
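A minimal sketch of the two estimators, using only the Python standard library. The data values are hypothetical and the `ddof` argument name is just a convention borrowed for this illustration.

```python
import math

def mean(xs):
    """Sample average, the estimator of the mean."""
    return sum(xs) / len(xs)

def sd(xs, ddof=0):
    """Standard deviation with divisor n - ddof (ddof=0: divide by n; ddof=1: divide by n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
print(mean(data))        # 5.0
print(sd(data, ddof=0))  # 2.0
print(sd(data, ddof=1))  # about 2.138
```

With only eight observations the two versions differ visibly; the gap shrinks as n grows because n / (n - 1) approaches 1.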

Estimators as Random Variables

Question: Should we consider an estimator as a constant number or as a random variable?

Answer: since we use a sample, which is not the population, we need to consider an estimator as a random variable.

As a result, an estimator itself must have a distribution function, which is the key to statistical analysis.

A statement from a statistical analysis, therefore, needs to incorporate the distribution function of the estimator. Technically, this procedure is called statistical inference.

Distribution of the Mean Estimator or Average

Question: What is the distribution of the average
\[ \hat{\mu} = \frac{\sum_{i=1}^n x_i}{n}? \]

- The expectation of the average, $E[\hat{\mu}]$, is equal to the true parameter $\mu$, and by the law of large numbers the average converges to $\mu$ as $n$ grows.

Question: what is the distribution of the estimator?

- If we assume that the distribution is Normal (we usually justify this using the central limit theorem), then we can simply ask: what is the variance of the mean estimator?

Standard Error of the Average

We usually estimate the standard deviation of an estimator by its standard error (s.e.).

Question: what is the standard error of the mean estimator for a Normal distribution?

Back to basics: what is the standard error?

- It is the (estimated) standard deviation of the mean estimator.
- But how can we calculate it?

Statistical programs (e.g., STATA, R) usually calculate the s.e. for you.

Here we will show how to construct it manually. The purpose is not for you to do it this way but to show you its meaning.

Simulated Data from the Standard Normal Distribution with $(\mu = 0, \sigma^2 = 1)$

We begin by simulating data from a known Normal distribution with $(\mu = 0, \sigma^2 = 1)$. In this case, these are the true parameters.

Suppose that we simulate 10,000 observations.

You can see the distribution of each sample set in the following figure.

- It gets closer to the theoretical distribution as the sample size grows.

Simulated Data from a Normal Distribution with $(\mu = 0, \sigma^2 = 1)$

[Figure: six density panels of z for sample sizes 30, 100, 500, 1000, 5000, and 10000, shown against the Normal distribution with $\mu = 0$, $\sigma^2 = 1$.]

Figure: Distribution of simulated samples from the standard normal distribution

Standard Error of the Average from Simulated Data

We randomly draw samples of size $n = 50, 100, 500, 1{,}000$ from the 10,000 observations, 1,000 times for each $n$.

Then, we calculate the average (the mean estimator) for each draw,
\[ \hat{\mu}_n = \frac{\sum_{i=1}^n x_i}{n}, \]
and obtain the distribution of the averages $\hat{\mu}_n$ across those draws.

This is the distribution of the estimator. Of course, its accuracy depends on the number of draws.
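The resampling procedure described above might be sketched as follows. The slides do not say whether draws are made with or without replacement; this sketch assumes sampling with replacement, and the seed and function name are likewise assumptions.

```python
import random
import statistics

rng = random.Random(42)
# "Population" of 10,000 simulated standard normal observations.
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

def simulate_means(n, n_draws=1_000):
    """Draw n_draws samples of size n (with replacement, an assumption) and return their averages."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_draws)]

sd_of_means = {}
for n in (50, 100, 500, 1_000):
    means = simulate_means(n)
    sd_of_means[n] = statistics.stdev(means)  # spread of the estimator across draws
    print(n, round(statistics.fmean(means), 4), round(sd_of_means[n], 4))
```

The printed standard deviations should shrink roughly like 1 / sqrt(n), which is the pattern the later slides explain with theory.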

Standard Error of the Average from Simulated Data

Figure: Distribution of the estimated mean (average) from randomized samples

Standard Error from Theory

Question: Do we need to do this bootstrapping all the time?

Answer: we usually do not do it this way. We use probability theory to guide us on how to calculate the standard error (s.e.) of an estimator.

For example, using a central limit theorem, we can show that $\sqrt{n}\left(\bar{X}_n - \mu\right)$ is approximately Normal with mean zero and variance $\sigma^2$. Hence, $Var\left[\bar{X}_n\right] = \frac{\sigma^2}{n}$.

As a result, we can approximate the standard error of the average (the mean estimator) by
\[ \frac{\hat{\sigma}}{\sqrt{n}}. \]
That is, we replace $\sigma$ by its estimator $\hat{\sigma}$:
\[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n - 1}}. \]
As a result, when the data are Normal, the standardized average $(\bar{X}_n - \mu)/(\hat{\sigma}/\sqrt{n})$ has the t distribution with $n - 1$ degrees of freedom.
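A short sketch of the formula-based standard error, computed from a single sample. The simulated samples and the seed are assumptions for illustration; the last printed column is the theoretical value 1 / sqrt(n) for the standard normal case.

```python
import math
import random

def standard_error_of_mean(xs):
    """Estimated standard error of the average: sigma_hat / sqrt(n), with divisor n - 1 in sigma_hat."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return sigma_hat / math.sqrt(n)

rng = random.Random(7)
for n in (50, 100, 500, 1_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one sample of size n
    print(n, round(standard_error_of_mean(sample), 4), round(1 / math.sqrt(n), 4))
```

Unlike the resampling exercise, this needs only one sample per n, which is why it is the approach used in practice.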

Standard Error from Simulated Data and Theory

Simulation

n      µ   σ   E[x̄]       Var[x̄]     SD of x̄    SE formula
50     0   1   0.018542   0.018731   0.140978   0.140585
100    0   1   0.005671   0.010035   0.099687   0.098938
500    0   1   0.012806   0.001813   0.044581   0.044573
1000   0   1   0.011740   0.000861   0.031524   0.031520

Standard Error is Not the Standard Deviation of the Sample

One may wonder whether the standard error is the same as the standard deviation of the data.

The answer is clearly NO.

To see this further, let us look at the standard error of the standard deviation estimator.

Standard Error of the Standard Deviation $\hat{\sigma}$ from Simulated Data

Figure: Distribution of the estimated standard deviation from randomized samples

Proportion/Fraction as Probability

Another simple but useful estimator is a fraction $\hat{p}_i$, the ratio of the number of observations in group $i$, $n_i$, to the total number of observations $n$:
\[ \hat{p}_i = \frac{n_i}{n}. \]
This fraction is an efficient estimator of the probability of belonging to group $i$ in the population.

Example: Bernoulli distribution with parameter $p$; the outcome is either 1 or 0. The estimator for $p$ is
\[ \hat{p} = \bar{x}_n = \frac{\sum_{i=1}^n x_i}{n} = \frac{n_1}{n}, \]
where $n_1$ is the number of observations with $x_i = 1$.

The key point: for large $n$, $\hat{p}$ is approximately distributed as $N\left(p, \frac{p(1-p)}{n}\right)$.

Proportion/Fraction from Simulation

In theory, the standard error for $\hat{p}$ is $\sqrt{\frac{p(1-p)}{n}}$.

[Figure: four density panels of the estimated $\hat{p}$ for n = 50, n = 100, n = 500, and n = 1000, each on a horizontal axis from 0.60 to 0.90, with the true value p = 0.75 marked.]

Figure: Distribution of estimated $\hat{p}$: the true parameter is $p = 0.75$.
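A simulation in the spirit of this figure can be sketched as below, comparing the simulated spread of the estimated proportion with the theoretical standard error. The number of replications (1,000), the seed, and the function name are assumptions, not details taken from the slides.

```python
import math
import random

def simulate_p_hat(p, n, n_draws, rng):
    """Return n_draws estimates of p, each computed from n Bernoulli(p) observations."""
    return [sum(1 for _ in range(n) if rng.random() < p) / n for _ in range(n_draws)]

rng = random.Random(3)
p = 0.75
results = {}
for n in (50, 100, 500, 1_000):
    p_hats = simulate_p_hat(p, n, 1_000, rng)
    mean_p = sum(p_hats) / len(p_hats)
    sd_p = math.sqrt(sum((v - mean_p) ** 2 for v in p_hats) / (len(p_hats) - 1))
    theory = math.sqrt(p * (1 - p) / n)  # standard error from the formula
    results[n] = (sd_p, theory)
    print(n, round(mean_p, 4), round(sd_p, 4), round(theory, 4))
```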

Bayes Estimation for a normal distribution

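Assume that $X_1, X_2, \ldots, X_n$ form a random sample from $N(\mu, \sigma^2)$ and that $\sigma^2$ is known.

Assume that the prior is $\mu \sim N(\mu_0, \sigma_0^2)$.

The key point: the posterior of $\mu$ is also normal, with mean and variance
\[ \mu_1 = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{X}_n}{\sigma^2 + n \sigma_0^2} \quad \text{and} \quad \sigma_1^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2}, \]
where $\bar{X}_n$ is the sample mean and $n$ is the sample size.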
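The posterior formulas translate directly into code. In this sketch the function name and the numerical inputs are hypothetical; only the two formulas come from the slide.

```python
def normal_posterior(mu0, sigma0_sq, sigma_sq, xbar, n):
    """Posterior mean and variance of mu for a N(mu0, sigma0_sq) prior and known sigma_sq."""
    denom = sigma_sq + n * sigma0_sq
    mu1 = (sigma_sq * mu0 + n * sigma0_sq * xbar) / denom
    sigma1_sq = sigma_sq * sigma0_sq / denom
    return mu1, sigma1_sq

# One observation: the posterior mean sits halfway between prior mean and data.
print(normal_posterior(mu0=0.0, sigma0_sq=1.0, sigma_sq=1.0, xbar=2.0, n=1))  # (1.0, 0.5)
# Many observations: the posterior mean moves close to the sample mean.
print(normal_posterior(mu0=0.0, sigma0_sq=1.0, sigma_sq=1.0, xbar=2.0, n=99))
```

As n grows, the weight on the sample mean approaches one and the posterior variance shrinks toward zero.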

Simulation of posterior distribution

Figure: Distribution of estimated µ1: the true parameter is µ = 0.

Maximum Likelihood Estimation for a normal distribution

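Assume that $X_1, X_2, \ldots, X_n$ form a random sample from $N(\mu, \sigma^2)$.

For this example, the likelihood function is
\[ L(\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sigma}\, \phi\left(\frac{x_i - \mu}{\sigma}\right), \tag{1} \]
where $\phi$ is the standard normal density.

The MLE of $\theta = (\mu, \sigma^2)$ is
\[ \hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \left(\bar{x}_n, \frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n}\right), \]
and the asymptotic normality result states that, when $\theta_0 = (\mu_0, \sigma_0^2)$ is the true parameter,
\[ \sqrt{n}\,(\hat{\mu} - \mu_0) \xrightarrow{d} N(0, \sigma_0^2), \]
\[ \sqrt{n}\,(\hat{\sigma}^2 - \sigma_0^2) \xrightarrow{d} N\left(0, 2(\sigma_0^2)^2\right). \]
Equivalently, for large $n$, $\hat{\mu}$ is approximately $N\left(\mu_0, \frac{\sigma_0^2}{n}\right)$ and $\hat{\sigma}^2$ is approximately $N\left(\sigma_0^2, \frac{2(\sigma_0^2)^2}{n}\right)$.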
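To make the idea of "maximizing the likelihood" concrete, the sketch below evaluates the normal log-likelihood at the closed-form MLE and at a few nearby parameter values. The data and function names are hypothetical.

```python
import math

def normal_loglik(xs, mu, sigma_sq):
    """Log-likelihood of an i.i.d. N(mu, sigma_sq) sample."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * sigma_sq) - sum((x - mu) ** 2 for x in xs) / (2 * sigma_sq)

def normal_mle(xs):
    """Closed-form MLE: the sample average and the variance with divisor n."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_sq_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma_sq_hat

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
mu_hat, sigma_sq_hat = normal_mle(data)
print(mu_hat, sigma_sq_hat)  # 2.5 1.25

best = normal_loglik(data, mu_hat, sigma_sq_hat)
# Perturbing either parameter should never increase the log-likelihood.
for d_mu, d_sig in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)):
    print(d_mu, d_sig, normal_loglik(data, mu_hat + d_mu, sigma_sq_hat + d_sig) <= best)
```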

Simulation of estimator using MLE

Figure: Distribution of estimated $\hat{\mu}$ and $\hat{\sigma}^2$: the true parameters are $\mu = 0$ and $\sigma^2 = 1$.

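Method of Moments Estimation for a normal distribution

Assume that $X_1, X_2, \ldots, X_n$ form a random sample from $N(\mu, \sigma^2)$.

The moment conditions for this example equate the sample moments to the population moments:
\[ \frac{\sum_{i=1}^n x_i}{n} = E(X) = \mu, \tag{2} \]
\[ \frac{\sum_{i=1}^n x_i^2}{n} = E(X^2) = \mu^2 + \sigma^2. \tag{3} \]

The MM estimator of $\theta = (\mu, \sigma^2)$ is
\[ \hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \left(\bar{x}_n, \frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n}\right). \]

In general, bootstrap (Monte Carlo simulation) methods provide approximations to the sampling distributions of MM estimators.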
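Solving the two moment equations for the unknowns gives the estimator directly, as this small sketch illustrates (hypothetical data and function name):

```python
def normal_method_of_moments(xs):
    """Solve m1 = mu and m2 = mu^2 + sigma^2 for (mu, sigma^2) using sample moments."""
    n = len(xs)
    m1 = sum(xs) / n                 # first sample moment
    m2 = sum(x * x for x in xs) / n  # second sample moment
    return m1, m2 - m1 ** 2

print(normal_method_of_moments([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```

For the normal distribution this coincides with the maximum likelihood estimator, since the second moment minus the squared mean equals the average squared deviation from the mean.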

Simulation of estimator using MM estimators

Figure: Distribution of estimated $\hat{\mu}$ and $\hat{\sigma}^2$: the true parameters are $\mu = 0$ and $\sigma^2 = 1$.

Hypothesis Testing

In general, we apply statistical analysis to test whether a hypothesis can be rejected or not.

This is the main reason why we need to know the distribution, or at least the standard error, of an estimator.

We will not go into detail here about the many types of tests available. We will simply focus on the basic concepts that are useful for simple analysis.

Null and Alternative Hypothesis

We usually want to test whether a parameter $\theta$ is in a set $\Omega_0$ or not:
\[ H_0 : \theta \in \Omega_0, \]
which is called the null hypothesis. At the same time, the alternative hypothesis is automatically defined as
\[ H_1 : \theta \in \Omega_1, \]
where $\Omega_0$ and $\Omega_1$ form a partition of the parameter space $\Omega$, i.e., $\Omega_0 \cap \Omega_1 = \emptyset$ and $\Omega_0 \cup \Omega_1 = \Omega$.

If we knew the true value of $\theta$, it would be very easy to tell which hypothesis is true. Unfortunately, we usually do not know the true value. We can at best estimate it.

Test Statistics

To simplify a test further, we usually define a test statistic $T = r(X)$, which is a function of the observables $X$. Examples:

- t-statistic: a test statistic for testing whether a parameter is different from zero:
\[ t = \frac{\hat{\theta}}{s.e.(\hat{\theta})}, \]
where $s.e.(\hat{\theta})$ is the standard error of the estimator $\hat{\theta}$. If the parameter of interest is the mean, whose estimator is the average, we can test whether the mean is equal to a constant $\mu$ or not, using the following t-statistic:
\[ t = \frac{\bar{X}_n - \mu}{\hat{\sigma}/\sqrt{n}}, \]
where the distribution of this t-statistic is the t distribution with $n - 1$ degrees of freedom. Note that here $\hat{\theta} = \frac{\bar{X}_n - \mu}{\sigma}$ and $s.e.(\hat{\theta}) = \frac{\hat{\sigma}}{\sigma\sqrt{n}}$.
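The t-statistic for the mean can be computed by hand as in the following sketch. The data values and the function name are hypothetical.

```python
import math

def t_statistic(xs, mu_null):
    """t-statistic for H0: mean = mu_null, using sigma_hat with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu_null) / (sigma_hat / math.sqrt(n))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations, average 5
print(t_statistic(data, 5.0))  # 0.0, because the sample average equals the hypothesized mean
print(t_statistic(data, 3.0))  # about 2.646
```

The second value would then be compared with the t distribution with n - 1 = 7 degrees of freedom.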

t and Normal Distribution

[Figure: density curves of Student's t distributions with df = 3, 5, 20, and 30 compared with the normal density; horizontal axis from -4 to 4, vertical axis from 0.0 to 0.4.]

Figure: Normal and t distributions.

Error Type I and Error Type II: Graphical Representation

Figure: Error type I and error type II

Error Type I and Error Type II: Formal Definition

Figure: Error type I and error type II

Significance Level of A Test

We usually want to report how large the probability of a Type I error, $\alpha$, is.

Figure: Significance Level

p-Value

Definition

The p-value is the smallest level $\alpha$ such that we would reject $H_0$ at level $\alpha$ with the observed data.

Equivalently, the p-value tells us the smallest significance level at which we would reject $H_0$.

Figure: p-value
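A minimal sketch of the "smallest level at which we reject" idea. It assumes a two-sided test and uses the standard normal as a large-sample approximation to the t distribution; both choices, and the function names, are assumptions made for this illustration.

```python
from statistics import NormalDist

def two_sided_p_value(t_stat):
    """Two-sided p-value, using the standard normal as a large-sample approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

def reject(t_stat, alpha):
    """Reject H0 at level alpha exactly when the p-value is no larger than alpha."""
    return two_sided_p_value(t_stat) <= alpha

p = two_sided_p_value(1.96)
print(round(p, 3))                             # 0.05
print(reject(1.96, 0.10), reject(1.96, 0.01))  # True False
```

For small samples the exact t distribution has heavier tails, so the normal approximation gives p-values that are somewhat too small.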

Confidence Interval

[Figure: the daily net return of ABSM from 11/09/2017 to 11/09/2018, plotted against date (vertical axis from -0.02 to 0.02), with fitted values and a 95% CI.]

Figure: Confidence Interval

Comparing Means of Two Distributions: t-Distribution

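When we want to compare the means of two distributions with the same variance $\sigma^2$:
\[ H_0 : \mu_1 = \mu_2, \]
we can use the following statistic to test the hypothesis:
\[ U = \frac{\dfrac{\bar{X}_n - \bar{Y}_m}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}}{\sqrt{\dfrac{S_X^2 + S_Y^2}{\sigma^2 (n + m - 2)}}}, \]
where $S_X^2 = \sum_{i=1}^n (X_i - \bar{X}_n)^2$ and $S_Y^2 = \sum_{j=1}^m (Y_j - \bar{Y}_m)^2$. Note that the unknown $\sigma$ cancels.

The key point is that, under $H_0$, it has the t distribution with $n + m - 2$ degrees of freedom.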
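Because the common standard deviation cancels out of U, the statistic can be computed from the two samples alone, as in this sketch. It assumes S_X^2 and S_Y^2 denote sums of squared deviations; the data and the function name are hypothetical.

```python
import math

def two_sample_t(xs, ys):
    """Pooled two-sample statistic U for H0: equal means, assuming equal variances."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    s_x = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations of X
    s_y = sum((y - ybar) ** 2 for y in ys)  # sum of squared deviations of Y
    pooled_var = (s_x + s_y) / (n + m - 2)
    return (xbar - ybar) / math.sqrt(pooled_var * (1 / n + 1 / m))

u = two_sample_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])  # hypothetical samples
print(u)  # about -1.225, to be compared with a t distribution with 4 degrees of freedom
```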

Comparing Means of Two Distributions: Example

[Figure: two kernel density panels of daily net returns, one for ABSM and one for 1AMSET50; horizontal axis from -0.02 to 0.02, vertical axis from 0 to 70.]

Figure: Comparing Means of Two Distributions: daily net return of ABSM and 1AMSET50 from 11/09/2017 to 11/09/2018.

Comparing Means of Two Distributions: Example

We set up a hypothesis test to compare the means of the two funds:
\[ H_0 : \mu_{ABSM} = \mu_{1AMSET50} \]
\[ H_a : \mu_{ABSM} \neq \mu_{1AMSET50} \]

Comparing Variances of Two Distributions: F-Distribution

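We can also compare the variances of two distributions:
\[ H_0 : \sigma_1 = \sigma_2. \]
We can use the following statistic to test the hypothesis:
\[ U = \frac{S_X^2 / (n - 1)}{S_Y^2 / (m - 1)}. \]
The key point is that, under $H_0$ and for Normal data, it has the F distribution with $n - 1$ and $m - 1$ degrees of freedom.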
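The variance-ratio statistic is simply the ratio of the two sample variances (each with divisor "sample size minus one"). A sketch with hypothetical data and a hypothetical function name:

```python
def variance_ratio(xs, ys):
    """Statistic U for H0: equal variances, the ratio of the two unbiased sample variances."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    s_x = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations of X
    s_y = sum((y - ybar) ** 2 for y in ys)  # sum of squared deviations of Y
    return (s_x / (n - 1)) / (s_y / (m - 1))

# Hypothetical samples: the second has twice the spread, hence four times the variance.
print(variance_ratio([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 0.25
```

Values of U far from 1 in either direction are evidence against equal variances.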

Comparing Variances of Two Distributions: Example

We set up a hypothesis test to compare the variances of the two funds:
\[ H_0 : \sigma_{ABSM} = \sigma_{1AMSET50} \]
\[ H_a : \sigma_{ABSM} \neq \sigma_{1AMSET50} \]

Figure: Comparing Variances of Two Distributions: daily net return of ABSM and 1AMSET50 from 11/09/2017 to 11/09/2018.

Comparing Two Distributions: χ2-Distribution

We sometimes would like to compare two distributions.

With categorical or discrete data, we can form the null hypothesis as
\[ H_0 : p_i^j = p_i^0 \quad \text{for } i = 1, \ldots, k \text{ and } \forall j, \]
where each sample can be categorized into $k$ groups.

The test statistic is
\[ Q = \sum_{i=1}^k \frac{(N_i - N p_i^0)^2}{N p_i^0}, \]
where $N_i$ is the number of observations in group $i$, and $\sum_{i=1}^k N_i = N$.

The key point: under $H_0$ and for large $N$, $Q$ is approximately distributed as $\chi^2_{k-1}$.
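The statistic Q is straightforward to compute from group counts and hypothesized proportions, as this sketch shows. All counts are hypothetical, and the function name is an assumption.

```python
def chi_square_statistic(counts, p0):
    """Q = sum over groups of (N_i - N * p0_i)^2 / (N * p0_i)."""
    total = sum(counts)
    return sum((n_i - total * p) ** 2 / (total * p) for n_i, p in zip(counts, p0))

# Two groups, 100 observations, hypothesized proportions 0.5 and 0.5.
print(chi_square_statistic([60, 40], [0.5, 0.5]))  # 4.0

# Four groups with hypothetical counts; Q would be compared with chi-square, 3 degrees of freedom.
print(chi_square_statistic([59, 23, 17, 1], [0.43, 0.33, 0.18, 0.06]))
```

When the observed counts match the hypothesized proportions exactly, Q is zero; larger values indicate a larger discrepancy.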

Comparing Two Distributions: Example

We set up a hypothesis test to compare the proportions of funds across two categorical variables, type and policy:
\[ H_0 : p_i^j = p_i^{Total} \]
\[ H_a : p_i^j \neq p_i^{Total} \]
where $i$ = Equity, Fixed, Mixed, Other and $j$ = global, local, Total.

          global   local   Total
Equity    0.34     0.59    0.43
Fixed     0.38     0.23    0.33
Mixed     0.18     0.17    0.18
Other     0.10     0.01    0.06
Total     1.00     1.00    1.00

Table: The proportion of the number of funds

Comparing Two Distributions: Example

Figure: Comparing proportion of Two Distributions: Type and policy.
