16943_research methodology lec6 to 11

7/29/2019 16943_Research Methodology Lec6 to 11


Central limit theorem

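If a random variable Y is the sum of n independent random variables that satisfy certain general conditions, then for sufficiently large n, Y is approximately normally distributed.

If $X_1, X_2, \dots, X_n$ is a sequence of n independent random variables with $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$, and $Y = X_1 + X_2 + \dots + X_n$, then under general conditions

$$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

is approximately standard normal, N(0, 1), for large n.

Central limit theorem (identically distributed case)

If $X_1, X_2, \dots, X_n$ is a sequence of n independent random variables with common mean $E(X_i) = \mu$ and common variance $V(X_i) = \sigma^2$, and $Y = X_1 + X_2 + \dots + X_n$, then

$$Z_n = \frac{Y - n\mu}{\sqrt{n\sigma^2}}$$

is approximately N(0, 1) for large n.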
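A quick way to see the theorem at work is to simulate it. The sketch below is an illustration added for this purpose, not part of the lecture: it sums n independent Uniform(0, 1) variables (each with $\mu = 1/2$, $\sigma^2 = 1/12$), standardizes the sum as $Z_n = (Y - n\mu)/\sqrt{n\sigma^2}$, and checks that the results behave like N(0, 1). The uniform distribution, n = 30, 20,000 repetitions and the helper name `standardized_sums` are all arbitrary assumptions.

```python
import math
import random

def standardized_sums(n, reps, seed=1):
    """Simulate Z_n = (Y - n*mu) / sqrt(n*sigma^2) for Y = X_1 + ... + X_n, X_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)
    zs = []
    for _ in range(reps):
        y = sum(rng.random() for _ in range(n))
        zs.append((y - n * mu) / math.sqrt(n * var))
    return zs

zs = standardized_sums(n=30, reps=20000)
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1)
share_within = sum(abs(z) <= 1.96 for z in zs) / len(zs)   # about 0.95 for N(0, 1)
print(f"mean={mean_z:.3f}  variance={var_z:.3f}  P(|Z|<=1.96)~{share_within:.3f}")
```

Even though each $X_i$ is flat rather than bell-shaped, the standardized sum has mean close to 0, variance close to 1, and roughly 95% of its values inside ±1.96.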


RESEARCH

METHODOLOGY

Sampling Distribution, Chi-Squared, t and F Distributions

Sampling Distribution

Statistic: any function of the observations in a random sample that does not depend on unknown parameters. Statistics are used to draw conclusions about the population.

The sampling distribution of a statistic is the probability distribution that describes the behaviour of the statistic in repeated sampling from the same universe, or from the same process variable assignment model.


The sample mean $\bar{X} = (X_1 + X_2 + \dots + X_n)/n$ of a random sample is a linear combination of n independent variables with common mean μ and variance σ², so $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2/n$, and

$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is approximately N(0, 1) for large n.


Standard error of a statistic is the standard deviation of its sampling distribution.

If the standard error involves unknown parameters whose values can be estimated, substituting these estimates into the standard error gives the estimated standard error.

The standard error of $\bar{X}$ is $\sigma/\sqrt{n}$.

If σ is unknown, the sample standard deviation s is substituted, giving the estimated standard error $s/\sqrt{n}$.

Examples

Data on the tension bond strength of a modified Portland cement mortar are 16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, and 16.57. Calculate the standard error of the mean (a) if the SD is known to be 0.25 kgf/cm², and (b) if the SD is not known.
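Both cases of this example follow directly from the standard error formulas $\sigma/\sqrt{n}$ and $s/\sqrt{n}$. The sketch below computes them with Python's `statistics` module; `statistics.stdev` uses the n - 1 divisor, which is the usual sample standard deviation assumed here.

```python
import math
import statistics

# Tension bond strength data from the example
strength = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
n = len(strength)

# (a) sigma known: standard error of the mean = sigma / sqrt(n)
sigma = 0.25
se_known = sigma / math.sqrt(n)

# (b) sigma unknown: substitute the sample standard deviation s (divisor n - 1)
s = statistics.stdev(strength)
se_estimated = s / math.sqrt(n)

print(f"mean = {statistics.mean(strength):.3f}")
print(f"SE (sigma known) = {se_known:.4f}")
print(f"estimated SE     = {se_estimated:.4f}  with s = {s:.4f}")
```

With these data the mean is 16.764, the standard error is about 0.079 when σ = 0.25 is known, and the estimated standard error is about 0.100 (s ≈ 0.316) when σ is replaced by s.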



t-DISTRIBUTION

t-Distribution Probability Density Function

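A random variable T is said to have the t-distribution with parameter ν, called the degrees of freedom, if its probability density function is given by

$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty,$$

where ν is a positive integer.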

t-Distribution Table of Probabilities

Remark: The distribution of T is usually called the Student t-distribution, or simply the t-distribution. It is customary to let $t_p$ represent the t value above which we find an area equal to p.

The t table gives values of T, $t_p$, for which $P(T > t_p) = p$.

[Figure: t density with the upper-tail area p to the right of $t_p$ shaded.]

t-distribution: Probability Density Function for Various Values of ν

[Figure: t probability density functions over -3 ≤ t ≤ 3 for several values of ν, including ν = 2 and ν = 5.]

Table of t-Distribution



t-Distribution - Example

If $T \sim t_{10}$, find:

(a) P(0.542 < T < 2.359)

(b) P(T < -1.812)

(c) t for which P(T > t) = 0.05.

Example Solution

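(a) P(0.542 < T < 2.359) = P(T > 0.542) - P(T > 2.359)

= 0.30 - 0.02 = 0.28

(b) P(T < -1.812) = F(-1.812) = P(T > 1.812) = 0.05 (by symmetry of the t density about 0)

(c) t for which P(T > t) = 1 - F(t) = 0.05:

t = $t_{0.05}$ = 1.812

[Figure: $t_{10}$ densities showing (a) the area between 0.542 and 2.359, (b) the tails beyond -1.812 and 1.812, and (c) the upper-tail area 0.05.]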
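The table lookups in this example can be cross-checked numerically. The sketch below is an illustration, not the lecture's method: it evaluates the density h(t) given above, integrates it with composite Simpson's rule to get the CDF, and inverts the CDF by bisection for part (c). The helper names `t_pdf`, `simpson`, `t_cdf` and `t_upper` are hypothetical.

```python
import math

def t_pdf(t, nu):
    """Student t density h(t) with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even. Gives a negative value when b < a."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_cdf(t, nu):
    """F(t) = P(T <= t); the density is symmetric about 0, so F(t) = 0.5 + integral of h from 0 to t."""
    return 0.5 + simpson(lambda u: t_pdf(u, nu), 0.0, t)

def t_upper(p, nu, lo=0.0, hi=50.0):
    """Value t_p with P(T > t_p) = p (0 < p < 0.5), found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, nu) > p:
            lo = mid          # upper tail still too large: t_p lies further right
        else:
            hi = mid
    return (lo + hi) / 2

nu = 10
p_a = t_cdf(2.359, nu) - t_cdf(0.542, nu)   # (a)
p_b = t_cdf(-1.812, nu)                     # (b)
t_c = t_upper(0.05, nu)                     # (c)
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) t = {t_c:.3f}")
```

The numerical values agree with the table-based answers (0.28, 0.05 and 1.812) to the precision of the printed table.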


CHI-SQUARED DISTRIBUTION

Chi-Squared Distribution Probability Density Function

A random variable X is said to have the chi-squared distribution with parameter ν, called the degrees of freedom, if the probability density function of X is

$$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, & x > 0,\\[4pt] 0, & \text{elsewhere,} \end{cases}$$

where ν is a positive integer.

Chi-Squared Distribution - Remarks

The chi-squared distribution plays a vital role in statistical inference. It has considerable application in both methodology and theory, and it is an important component of statistical hypothesis testing and estimation.

The chi-squared distribution is a special case of the gamma distribution, with α = ν/2 and β = 2.

Chi-Squared Distribution Mean and Standard Deviation

Mean or expected value: $\mu = \nu$

Standard deviation: $\sigma = \sqrt{2\nu}$

Chi-Squared Distribution Table of Probabilities

It is customary to let $\chi^2_p$ represent the value above which we find an area of p. This is illustrated by the shaded region below.

For tabulated values of the chi-squared distribution see the chi-squared table, which gives values of $\chi^2_p$ for various values of p and ν. The areas, p, are the column headings; the degrees of freedom, ν, are given in the left column; and the table entries are the $\chi^2$ values.

[Figure: chi-squared density f(x) with the upper-tail area $p = 1 - F(\chi^2_p)$ to the right of $\chi^2_p$ shaded.]

Chi-Squared Table


Chi-Squared Table Continued


Chi-Squared Distribution Example

Let $X \sim \chi^2_{15}$ (chi-squared with 15 degrees of freedom).

Example Solution

(a) P(7.261 < X < 24.996) = P(X > 7.261) - P(X > 24.996)

= 0.95 - 0.05

= 0.90

(b)P(X
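Part (a), and the mean and standard deviation stated earlier ($\mu = \nu$, $\sigma = \sqrt{2\nu}$), can be checked by integrating the chi-squared density numerically. This is a sketch under stated assumptions: composite Simpson's rule with a fixed number of subintervals, and an upper limit of 200 as a stand-in for infinity when computing the moments. The helper names `chi2_pdf` and `simpson` are hypothetical.

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared density f(x) = x^(nu/2 - 1) e^(-x/2) / (2^(nu/2) Gamma(nu/2)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    log_f = (nu / 2 - 1) * math.log(x) - x / 2 - (nu / 2) * math.log(2) - math.lgamma(nu / 2)
    return math.exp(log_f)

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

nu = 15
# (a) P(7.261 < X < 24.996) for X ~ chi-squared with 15 degrees of freedom
p_a = simpson(lambda x: chi2_pdf(x, nu), 7.261, 24.996)

# Mean and variance by numerical integration; the tail beyond 200 is negligible for nu = 15
mean = simpson(lambda x: x * chi2_pdf(x, nu), 0.0, 200.0)
var = simpson(lambda x: (x - mean) ** 2 * chi2_pdf(x, nu), 0.0, 200.0)
print(f"P(7.261 < X < 24.996) = {p_a:.4f}")
print(f"mean = {mean:.3f} (nu = {nu}), sd = {math.sqrt(var):.3f} (sqrt(2 nu) = {math.sqrt(2 * nu):.3f})")
```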


F-DISTRIBUTION

F-Distribution Probability Density Function

A random variable X is said to have the F-distribution with parameters $\nu_1$ and $\nu_2$, called the degrees of freedom, if its probability density function is given by

$$h(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{\nu_1/2 - 1}}{(1 + \nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & 0 < x < \infty,\\[4pt] 0, & \text{elsewhere.} \end{cases}$$

Note: The probability density function of the F-distribution depends not only on the two parameters $\nu_1$ and $\nu_2$ but also on the order in which we state them.

F-Distribution - Application

Remark: The F-distribution is used in two-sample situations to draw inferences about population variances. It is also applied to many other types of problems that involve sample variances. In fact, the F-distribution is also called the variance ratio distribution.

F-Distribution Probability Density Function Shapes

[Figure: F probability density functions f(x) for various values of $\nu_1$ and $\nu_2$: 6 and 24 d.f., and 6 and 10 d.f.]

F-Distribution (p=0.01) Table

F-Distribution (p=0.05) Table

F-Distribution Table of Probabilities

The value $f_p$ is the f value above which we find an area equal to p, as illustrated by the shaded area below.

For tabulated values of the F-distribution see the F table, which gives values of $f_p$ for various values of $\nu_1$ and $\nu_2$. The degrees of freedom $\nu_1$ and $\nu_2$ are the column and row headings, and the table entries are the f values.

[Figure: F density f(x) with the upper-tail area p to the right of $f_p$ shaded.]

F-Distribution - Properties

Let $f_p(\nu_1, \nu_2)$ denote $f_p$ with $\nu_1$ and $\nu_2$ degrees of freedom; then

$$f_{1-p}(\nu_1, \nu_2) = \frac{1}{f_p(\nu_2, \nu_1)}.$$

F-Distribution Example

If $Y \sim F_{6,11}$, find:

(a) P(Y < 3.09)

(b) y for which P(Y > y) = 0.01

Example Solution

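(a) P(Y < 3.09) = F(3.09)

= 1 - P(Y > 3.09) = 1 - 0.05

= 0.95, since $f_{0.05}(6, 11) = 3.09$.

(b) P(Y > y) = 0.01 gives

y = $f_{0.01}(6, 11)$ = 5.07

[Figure: $F_{6,11}$ densities f(y) showing (a) the area to the left of 3.09 and (b) the upper-tail area 0.01.]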
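As with the t and chi-squared examples, the F-table values used here can be cross-checked by integrating the density h(x) numerically. The sketch below is an illustration under assumptions (composite Simpson's rule with 2000 subintervals; hypothetical helper names `f_pdf`, `simpson`, `f_cdf`), not a replacement for the table.

```python
import math

def f_pdf(x, nu1, nu2):
    """F density h(x) with nu1 and nu2 degrees of freedom for x > 0, else 0."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
             + (nu1 / 2) * math.log(nu1 / nu2))
    log_h = log_c + (nu1 / 2 - 1) * math.log(x) - ((nu1 + nu2) / 2) * math.log(1 + nu1 * x / nu2)
    return math.exp(log_h)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f_cdf(x, nu1, nu2):
    """P(Y <= x) for Y ~ F(nu1, nu2), integrating the density from 0."""
    return simpson(lambda u: f_pdf(u, nu1, nu2), 0.0, x)

p_a = f_cdf(3.09, 6, 11)          # (a) P(Y < 3.09), expected to be about 0.95
tail_b = 1 - f_cdf(5.07, 6, 11)   # (b) P(Y > 5.07), expected to be about 0.01
print(f"(a) P(Y < 3.09) = {p_a:.4f}   (b) P(Y > 5.07) = {tail_b:.4f}")
```

Note that swapping the arguments to `f_cdf(x, 11, 6)` gives a different answer, consistent with the remark that the order of $\nu_1$ and $\nu_2$ matters.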


Learning Objectives

1. Estimate a population parameter (means) based

on a large sample selected from the population

2. Use the sampling distribution of a statistic to

form a confidence interval for the population

parameter

3. Show how to select the proper sample size for

estimating a population parameter


Statistical Interval for a Single Sample

Outlines:

Confidence interval on the mean of a normal

distribution, variance known.

Confidence interval on the mean of a normal

distribution, variance unknown.

Confidence interval on the variance and standard

deviation of a normal distribution.


Statistical Methods

[Figure: Statistical methods divide into descriptive statistics and inferential statistics; inferential statistics divides into estimation and hypothesis testing.]

Point Estimator

A point estimator of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the target parameter.

Point Estimation

1. Provides a single value

Based on observations from one sample

2. Gives no information about how close the value is

to the unknown population parameter

3. Example: Sample mean $\bar{x} = 3$ is the point estimate of the unknown population mean μ

Interval Estimator

An interval estimator (or confidence interval) is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter.

Interval Estimation

1. Provides a range of values

Based on observations from one sample

2. Gives information about closeness to unknown

population parameter

Stated in terms of probability

Knowing exact closeness requires knowing unknown

population parameter

3. Example: Unknown population mean lies between 50

and 70 with 95% confidence


2011 Pearson Education, Inc.

Key Elements of Interval Estimation

[Figure: a confidence interval drawn around the sample statistic (point estimate), running from the lower confidence limit to the upper confidence limit.]

A confidence interval provides a range of plausible values for the population parameter.

Confidence interval

Confidence interval: bounds that represent an interval of plausible values for a parameter.

Suppose that we estimate the mean viscosity of a chemical product to be $\bar{x} = 1000$. We do not know exactly how close this estimate is to the true mean: is the mean likely to be between 900 and 1100, or between 990 and 1010?

Because we use a sample from the population to compute the interval, we cannot be certain that it contains the unknown population parameter; a confidence interval is constructed so that we have high confidence that it does.

Confidence interval

Practical example

A machine fills cups with margarine and is supposed to be adjusted so that the mean content of the cups is close to 250 grams of margarine. Of course, it is not possible to fill every cup with exactly 250 grams. Hence the weight of the filling can be considered a random variable X. The distribution of X is assumed here to be a normal distribution with unknown expectation μ and known standard deviation σ = 2.5 grams.

To check whether the machine is adequately calibrated, a sample of n = 25 cups of margarine is chosen at random.

The sample shows actual weights $x_1, x_2, \dots, x_{25}$, with mean

$$\bar{x} = \frac{1}{25}\sum_{i=1}^{25} x_i = 250.2 \text{ grams}.$$

If $\bar{x} = 250.4$ or $251.1$, the population mean is plausibly around 250 g. What about $\bar{x} = 280.6$? If $\bar{x} = 280.6$, the population mean should not be close to 250 g.
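The known-σ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (Case I) makes this intuition precise. The sketch below, using a hypothetical helper `z_interval` and Python's `statistics.NormalDist` for $z_{\alpha/2}$, builds 95% intervals for two of the sample means mentioned above and checks whether 250 g lies inside each.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI for a normal mean with known sigma: xbar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

for xbar in (250.2, 280.6):
    lo, hi = z_interval(xbar, sigma=2.5, n=25)
    print(f"xbar = {xbar}: 95% CI = ({lo:.2f}, {hi:.2f}); contains 250? {lo <= 250 <= hi}")
```

The half-width is about 1.96 × 2.5/5 ≈ 0.98 g, so $\bar{x} = 250.2$ gives roughly (249.22, 251.18), which contains 250, while $\bar{x} = 280.6$ gives an interval far from 250.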


Confidence interval (Case I)

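Confidence interval on the mean of a normal distribution, variance known.

Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution with unknown mean μ and known variance σ². We know that

$$\bar{X} \sim N(\mu, \sigma^2/n), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

A 100(1 - α)% confidence interval estimate for μ is an interval $L \le \mu \le U$ with

$$P\{L \le \mu \le U\} = 1 - \alpha,$$

where 1 - α is the probability of selecting a sample whose interval contains the true value of μ.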


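In order to find the lower and upper confidence limits:

$$P\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$

$$P\left\{\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right\} = 1 - \alpha$$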


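Ex. Ten measurements of impact energy (J) on specimens of steel are: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1 J. We want to find a 95% CI for μ.

Here $\bar{x} = 64.46$ and $z_{0.025} = 1.96$, so

$$64.46 - 1.96\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\frac{1}{\sqrt{10}}, \qquad 63.84 \le \mu \le 65.08.$$

That is, based on the sample data, a range of highly plausible values for the mean impact energy of steel is 63.84 J to 65.08 J.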
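The same interval can be reproduced in a few lines. This sketch uses `statistics.NormalDist().inv_cdf` to obtain $z_{0.025}$ instead of the z table, which is the only assumption beyond the slide's formula.

```python
import math
import statistics
from statistics import NormalDist

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]   # impact energy, J
sigma = 1.0           # known standard deviation, J
alpha = 0.05          # 95% confidence

n = len(energy)
xbar = statistics.mean(energy)
z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{0.025}, about 1.96
half = z * sigma / math.sqrt(n)             # z_{alpha/2} * sigma / sqrt(n)
lower, upper = xbar - half, xbar + half
print(f"xbar = {xbar:.2f}, 95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```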


Choice of Sample Size


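Ex. Consider the previous example. We want to determine how many specimens must be tested to ensure that the 95% CI on μ for steel has a length of at most 1.0 J.

CI length: $2 z_{\alpha/2}\,\sigma/\sqrt{n} \le 1.0$, so $n \ge \left(\dfrac{2 z_{\alpha/2}\,\sigma}{1.0}\right)^2 = \left(\dfrac{2(1.96)(1)}{1.0}\right)^2 = 15.37$, and we take n = 16.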
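Solving the length condition for n and rounding up can be wrapped in a small helper. The function name `sample_size_for_length` is hypothetical, and σ is assumed known, as in the example.

```python
import math
from statistics import NormalDist

def sample_size_for_length(sigma, max_length, confidence=0.95):
    """Smallest n with CI length 2 * z_{alpha/2} * sigma / sqrt(n) <= max_length (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (2 * z * sigma / max_length) ** 2
    return n_exact, math.ceil(n_exact)   # round up so the length requirement is still met

n_exact, n = sample_size_for_length(sigma=1.0, max_length=1.0)
print(f"n >= {n_exact:.2f}, so test n = {n} specimens")
```

Rounding up rather than to the nearest integer is deliberate: n = 15 would give a length slightly above 1.0 J.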


One-Sided Confidence Bounds

Ex. From the previous example, find a lower one-sided 95% confidence bound for the mean impact energy.

$$\mu \ge \bar{x} - z_{0.05}\frac{\sigma}{\sqrt{n}} = 64.46 - 1.64\frac{1}{\sqrt{10}} = 63.94$$
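For a one-sided bound the whole of α goes into one tail, so $z_{0.05}$ (about 1.645) replaces $z_{0.025}$. A minimal sketch, reusing the summary values from the impact-energy example:

```python
import math
from statistics import NormalDist

xbar, sigma, n = 64.46, 1.0, 10       # values from the impact-energy example
z_alpha = NormalDist().inv_cdf(0.95)  # z_{0.05}, about 1.645: all of alpha in the upper tail
lower_bound = xbar - z_alpha * sigma / math.sqrt(n)
print(f"95% lower confidence bound: mu >= {lower_bound:.2f}")
```

The bound (about 63.94 J) is higher than the lower limit of the two-sided interval (63.84 J) because only one tail is excluded.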


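Large-Sample Confidence Interval for μ

$X_i$ has any distribution, n ≥ 30, variance unknown.

We can approximate the CI for μ by replacing σ by S:

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$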




Ex



Confidence interval (Case II)

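Confidence interval on the mean of a normal distribution, variance unknown.

Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution with unknown mean μ and unknown variance σ². Then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with n - 1 degrees of freedom, and a 100(1 - α)% CI on μ is

$$\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}.$$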
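As an illustration only (the slide's own example data were not preserved), the sketch below applies the t-interval to the impact-energy data from Case I, now pretending σ is unknown. It assumes $t_{0.025,9} = 2.262$ read from the t table; the helper name `t_interval` is hypothetical.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Two-sided CI for a normal mean with unknown variance: xbar -/+ t_crit * s / sqrt(n).

    t_crit must be t_{alpha/2, n-1} looked up in the t table for the chosen confidence level.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)              # sample SD with divisor n - 1
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

# Illustration: impact-energy data, treating sigma as unknown
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
lo, hi = t_interval(energy, t_crit=2.262)   # t_{0.025, 9} from the t table
print(f"95% t-interval: ({lo:.2f}, {hi:.2f})")
```

Here s ≈ 0.227 J is much smaller than the assumed σ = 1 J, so this interval, about (64.30, 64.62), is narrower than the known-σ interval even though $t_{0.025,9} > z_{0.025}$.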




Ex


Confidence interval (Case III)

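Confidence interval on the variance and standard deviation of a normal distribution.

If $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution with variance σ², then $(n-1)S^2/\sigma^2$ has a chi-squared distribution with n - 1 degrees of freedom.

Two-Sided CI

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$

One-Sided CI

$$\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \quad \text{(lower bound)}, \qquad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}} \quad \text{(upper bound)}$$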
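As an illustration only (the slide's own example was not preserved), the two-sided formula is applied below to the impact-energy data from Case I. The chi-squared values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are assumed to be read from the chi-squared table; the helper name `variance_interval` is hypothetical. Taking square roots of the limits gives a CI on σ.

```python
import math
import statistics

def variance_interval(data, chi2_alpha_half, chi2_one_minus_alpha_half):
    """Two-sided CI on sigma^2: (n-1)s^2 / chi2_{alpha/2,n-1} <= sigma^2 <= (n-1)s^2 / chi2_{1-alpha/2,n-1}."""
    n = len(data)
    ss = (n - 1) * statistics.variance(data)     # (n - 1) s^2, with s^2 using divisor n - 1
    return ss / chi2_alpha_half, ss / chi2_one_minus_alpha_half

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
var_lo, var_hi = variance_interval(energy, chi2_alpha_half=19.023, chi2_one_minus_alpha_half=2.700)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)   # CI on sigma
print(f"95% CI on sigma^2: ({var_lo:.4f}, {var_hi:.4f});  on sigma: ({sd_lo:.3f}, {sd_hi:.3f})")
```

The interval is not symmetric about $s^2$ because the chi-squared distribution is skewed to the right.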




Ex


Confidence Intervals for the Difference between Two Population Means $\mu_1 - \mu_2$: Independent Samples

Two random samples are drawn from the two populations of interest.

Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

Population 1: parameters $\mu_1$ and $\sigma_1^2$ (values are unknown); sample size $n_1$; statistics $\bar{x}_1$ and $s_1^2$.

Population 2: parameters $\mu_2$ and $\sigma_2^2$ (values are unknown); sample size $n_2$; statistics $\bar{x}_2$ and $s_2^2$.

Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$.

Confidence Interval for $\mu_1 - \mu_2$

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $z_{\alpha/2}$ is the value from the z-table that corresponds to the confidence level.

Note: when the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, the sample variances $s_1^2$ and $s_2^2$ computed from the data can be used.

Example: confidence interval for $\mu_1 - \mu_2$

Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?

A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. For each person the number of calories consumed at lunch was recorded.

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...

Solution: The parameter to be tested is the difference between two means, $\mu_1 - \mu_2$.

The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

Use $s_1^2$ = 4,103 for $\sigma_1^2$ and $s_2^2$ = 10,670 for $\sigma_2^2$.

$n_1 = 43$, $\bar{x}_1 = 604.02$; $n_2 = 107$, $\bar{x}_2 = 633.239$
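The interval can be computed directly from these summary statistics. In this sketch the sample variances stand in for $\sigma_1^2$ and $\sigma_2^2$, as the note on the previous slide allows, and $z_{0.025}$ comes from `statistics.NormalDist` rather than the z table.

```python
import math
from statistics import NormalDist

# Summary statistics from the high-fiber cereal example
n1, xbar1, s1_sq = 43, 604.02, 4103      # consumers
n2, xbar2, s2_sq = 107, 633.239, 10670   # non-consumers

z = NormalDist().inv_cdf(0.975)              # z_{0.025} for a 95% interval
diff = xbar1 - xbar2                         # point estimate of mu1 - mu2
se = math.sqrt(s1_sq / n1 + s2_sq / n2)      # sample variances replace sigma_1^2, sigma_2^2
lower, upper = diff - z * se, diff + z * se
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

This gives about (-56.60, -1.84), which matches the interval (-56.59, -1.83) reported on the slides up to rounding of the sample means.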



Interpretation

The 95% CI is (-56.59, -1.83).

We are 95% confident that the interval (-56.59, -1.83) contains the true but unknown difference $\mu_1 - \mu_2$.

Since the interval is entirely negative (that is, it does not contain 0), there is evidence from the data that $\mu_1$ is less than $\mu_2$. We estimate that non-consumers of high-fiber breakfast cereal consume on average between 1.83 and 56.59 more calories for lunch.



Table of t-Distribution



Chi-Squared Table Continued

