16943_research methodology lec6 to 11
TRANSCRIPT
-
7/29/2019 16943_Research Methodology Lec6 to 11
1/79
Central limit theorem
If a random variable Y is the sum of n independent
random variables that satisfy certain general
conditions, then for sufficiently large n , Y is
approximately normally distributed.
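If X1, X2, X3, ..., Xn is a sequence of n independent random
variables with E(Xi) = $\mu_i$ and V(Xi) = $\sigma_i^2$, and
Y = X1 + X2 + X3 + ... + Xn, then under general conditions
$$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$
has an approximately standard normal distribution, N(0, 1), for large n.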
-
Central limit theorem
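If X1, X2, X3, ..., Xn is a sequence of n independent
variables with E(Xi) = $\mu$ and V(Xi) = $\sigma^2$, and
Y = X1 + X2 + X3 + ... + Xn, then
$$Z_n = \frac{Y - n\mu}{\sqrt{n\sigma^2}}$$
is approximately standard normal, N(0, 1), for large n.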
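The statement above can be checked with a small simulation. This is only a sketch under assumptions that are not in the slides: the Xi are taken to be Uniform(0, 1) variables (mean 1/2, variance 1/12), and the values n = 30 and 20,000 trials are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 30            # number of terms in each sum Y (illustrative choice)
trials = 20_000   # number of simulated sums (illustrative choice)
mu = 0.5          # mean of a Uniform(0, 1) variable
var = 1.0 / 12.0  # variance of a Uniform(0, 1) variable


def standardized_sum():
    """Draw Y = X1 + ... + Xn and return Z_n = (Y - n*mu) / sqrt(n*var)."""
    y = sum(random.random() for _ in range(n))
    return (y - n * mu) / math.sqrt(n * var)


z_values = [standardized_sum() for _ in range(trials)]

# For a standard normal variable, P(-1.96 < Z < 1.96) is about 0.95.
inside = sum(1 for z in z_values if abs(z) < 1.96) / trials
print(f"share of Z_n within +/-1.96: {inside:.3f}")
```

If the normal approximation holds, the printed share should be close to 0.95 even though the individual Xi are far from normal.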
-
RESEARCH
METHODOLOGY
Sampling Distribution, Chi-Squared, t and F Distributions
-
Sampling Distribution
Statistic: Any function of the observations in a
random sample that does not depend on unknown
parameters. (Used for drawing conclusions.)
The sampling distribution of a statistic is the
probability function that describes the probabilistic
behaviour of the statistic in repeated sampling from
the same universe or on the same process variable
assignment model.
-
Sampling Distribution
The sample mean $\bar{X}$ of X1, X2, X3, ..., Xn is a linear
combination of n independent variables, each with mean $\mu$ and
variance $\sigma^2$. Then E($\bar{X}$) = $\mu$ and V($\bar{X}$) = $\sigma^2/n$, and
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
-
Sampling Distribution
Standard error of a statistic is the standard deviation
of its sampling distribution.
If the standard error involves unknown parameters
whose value can be estimated, substitution of these
estimates into the standard error results in the
estimated standard error
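Standard error of $\bar{X}$ is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
If $\sigma$ is unknown, the sample standard deviation s is substituted,
giving the estimated standard error $s/\sqrt{n}$.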
-
Examples
Data on the tension bond strength of a modified Portland cement mortar (kgf/cm2) are
16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57.
Calculate the standard error of the sample mean (a) if the SD is known to be
0.25 kgf/cm2, and (b) if the SD is not known.
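A minimal sketch of this calculation, using only the ten observations and the known SD of 0.25 given in the example. The variable names are illustrative; the sample SD uses the usual n - 1 divisor.

```python
import math
import statistics

# Tension bond strength data from the example (kgf/cm2).
strength = [16.85, 16.40, 17.21, 16.35, 16.52,
            17.04, 16.96, 17.15, 16.59, 16.57]
n = len(strength)

# (a) SD known: standard error = sigma / sqrt(n).
sigma_known = 0.25
se_known = sigma_known / math.sqrt(n)

# (b) SD unknown: substitute the sample SD s (n - 1 divisor)
# to get the estimated standard error s / sqrt(n).
s = statistics.stdev(strength)
se_estimated = s / math.sqrt(n)

print(f"sample mean            = {statistics.mean(strength):.3f}")
print(f"(a) standard error     = {se_known:.4f}")
print(f"(b) s = {s:.4f}, estimated standard error = {se_estimated:.4f}")
```

With these data the known-SD standard error is about 0.079 and the estimated standard error is about 0.100.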
-
t - DISTRIBUTION
-
t-Distribution Probability Density Function
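A random variable T is said to have the t-distribution
with parameter $\nu$, called degrees of freedom, if its
probability density function is given by:
$$h(t) = \frac{\Gamma\left((\nu+1)/2\right)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty$$
where $\nu$ is a positive integer.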
-
t-Distribution Table of Probabilities
Remark: The distribution of T is usually called the Student-t
or the t-distribution. It is customary to let $t_p$ represent the t value above which we find an area equal to p.
Values of T, $t_{p,\nu}$, for which P(T > $t_{p,\nu}$) = p
[Figure: t density with the upper-tail area p shaded to the right of $t_p$]
-
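t-distribution - Probability Density Function for
various values of $\nu$
[Figure: t density curves over t from -3 to 3 for several values of $\nu$; the curve labels 5 and 2 survive from the plot]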
-
Table of t-Distribution
-
t-Distribution - Example
If T ~ $t_{10}$ (10 degrees of freedom),
find:
(a) P(0.542 < T < 2.359)
(b) P(T < -1.812)
(c) t for which P(T > t) = 0.05.
-
Example Solution
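(a) P(0.542 < T < 2.359) = P(T > 0.542) - P(T > 2.359)
= 0.30 - 0.02 = 0.28
(b) P(T < -1.812) = F(-1.812) = P(T > 1.812) = 0.05, by symmetry
(c) t for which P(T > t) = 1 - F(t) = 0.05:
t = 1.812
[Figures: t densities showing the area between 0.542 and 2.359, the areas below -1.812 and above 1.812, and the upper-tail area 0.05]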
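The table look-ups in this solution can be cross-checked numerically. The sketch below is not the table method used in the lecture: it integrates the standard t density with Simpson's rule and finds part (c) by bisection. The helper names (`t_pdf`, `simpson`, `upper_tail`) are illustrative.

```python
import math


def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def upper_tail(t, nu):
    """P(T > t) for t >= 0, using symmetry: 0.5 minus the area from 0 to t."""
    return 0.5 - simpson(lambda u: t_pdf(u, nu), 0.0, t)


nu = 10
p_a = upper_tail(0.542, nu) - upper_tail(2.359, nu)  # part (a)
p_b = upper_tail(1.812, nu)                          # part (b): P(T < -1.812) = P(T > 1.812)

# Part (c): bisection for the t with P(T > t) = 0.05.
lo, hi = 0.0, 10.0
for _ in range(50):
    mid = (lo + hi) / 2
    if upper_tail(mid, nu) > 0.05:
        lo = mid   # tail area still too large, move right
    else:
        hi = mid
t_c = (lo + hi) / 2

print(f"(a) {p_a:.3f}  (b) {p_b:.3f}  (c) t = {t_c:.3f}")
```

The results should agree with the tabulated answers (0.28, 0.05 and 1.812) up to the rounding of the table entries.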
-
CHI-SQUARED DISTRIBUTION
-
Chi-Squared Distribution Probability
Density Function
A random variable X is said to have the Chi-Squared
distribution with parameter $\nu$, called degrees of
freedom, if the probability density function of X is
$$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$$
where $\nu$ is a positive integer.
-
Chi-Squared Distribution - Remarks
The Chi-Squared distribution plays a vital role in
statistical inference. It has considerable application
in both methodology and theory. It is an important
component of statistical hypothesis testing and
estimation.
The Chi-Squared distribution is a special case of the
Gamma distribution, i.e., when $\alpha = \nu/2$ and $\beta = 2$.
-
Chi-Squared Distribution Mean and Standard
Deviation
Mean or Expected Value: $\mu = \nu$
Standard Deviation: $\sigma = \sqrt{2\nu}$
-
Chi-Squared Distribution Table of Probabilities
It is customary to let $\chi^2_p$ represent the value above which we find an area of p. This is illustrated by the shaded region
below.
For tabulated values of the Chi-Squared distribution see the
Chi-Squared table, which gives values of $\chi^2_p$ for various values of p and $\nu$. The areas, p, are the column headings; the degrees
of freedom, $\nu$, are given in the left column, and the table
entries are the $\chi^2$ values.
[Figure: Chi-Squared density f(x) against x, with the upper-tail area p = 1 - F($\chi^2_{p,\nu}$) shaded to the right of $\chi^2_{p,\nu}$]
-
Chi-Squared Table
-
Chi-Squared Table Continued
-
Chi-Squared Distribution Example
X ~ $\chi^2_{15}$
-
Example Solution
(a) P(7.261 < X < 24.996) = P(X > 7.261) - P(X > 24.996)
= 0.95 - 0.05
= 0.90
(b)P(X
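Part (a) can be cross-checked without the table. The sketch below assumes the standard Chi-Squared density with 15 degrees of freedom and integrates it with Simpson's rule; it also checks the standard results that the mean is $\nu$ and the standard deviation is $\sqrt{2\nu}$. The helper names and the integration cut-off of 200 are illustrative choices.

```python
import math


def chi2_pdf(x, nu):
    """Chi-Squared density with nu degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))


def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


nu = 15

# Part (a): area under the density between the two table values.
p_a = simpson(lambda x: chi2_pdf(x, nu), 7.261, 24.996)

# Mean and standard deviation by numerical integration.
# The upper limit 200 is an arbitrary cut-off; the tail beyond it is negligible.
mean = simpson(lambda x: x * chi2_pdf(x, nu), 0.0, 200.0, steps=20000)
second_moment = simpson(lambda x: x * x * chi2_pdf(x, nu), 0.0, 200.0, steps=20000)
sd = math.sqrt(second_moment - mean ** 2)

print(f"P(7.261 < X < 24.996) = {p_a:.3f}")
print(f"mean = {mean:.3f}, sd = {sd:.3f}, sqrt(2*nu) = {math.sqrt(2 * nu):.3f}")
```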
-
F-DISTRIBUTION
-
F-Distribution Probability Density Function
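A random variable X is said to have the F-distribution with
parameters $\nu_1$ and $\nu_2$, called degrees of freedom, if the
probability density function is given by:
$$h(x) = \begin{cases} \dfrac{\Gamma\left((\nu_1+\nu_2)/2\right)\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{\nu_1/2 - 1}}{\left(1 + \nu_1 x/\nu_2\right)^{(\nu_1+\nu_2)/2}}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
Note: The probability density function of the F-distribution
depends not only on the two parameters $\nu_1$ and $\nu_2$ but also
on the order in which we state them.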
-
F-Distribution - Application
Remark: The F-distribution is used in two-sample
situations to draw inferences about the population
variances. It is applied to many other types of
problems in which the sample variances are involved.
In fact, the F-distribution is called the variance ratio
distribution.
-
F-Distribution Probability Density Function
Shapes
[Figure: F probability density functions f(x) against x for various values of $\nu_1$ and $\nu_2$; curves shown for 6 and 24 d.f. and for 6 and 10 d.f.]
-
F-Distribution (p=0.01) Table
-
F-Distribution (p=0.05) Table
-
F-Distribution Table of Probabilities
The value $f_p$ is the f value above which we find an area equal to p, illustrated by the shaded area below.
For tabulated values of the F-distribution see the F table, which gives values of $x_p$ (that is, $f_p$) for various values of $\nu_1$ and $\nu_2$. The
degrees of freedom, $\nu_1$ and $\nu_2$, are the column and row
headings; and the table entries are the x values.
[Figure: F density f(x) against x, with the upper-tail area p shaded to the right of $x_p$]
-
F-Distribution - Properties
Let $x_p(\nu_1, \nu_2)$ denote $x_p$ with $\nu_1$ and $\nu_2$ degrees of
freedom, then
$$x_{1-p}(\nu_1, \nu_2) = \frac{1}{x_p(\nu_2, \nu_1)}$$
-
F-Distribution Example
If Y ~ F6,11,
find:
(a) P(Y < 3.09)
(b) y for which P(Y > y ) = 0.01
-
Example Solution
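With $\nu_1 = 6$ and $\nu_2 = 11$:
(a) P(Y < 3.09) = F(3.09)
= 1 - P(Y > 3.09) = 1 - 0.05
= 0.95
(b) P(Y > y) = 0.01
y = 5.07
[Figures: F densities f(y) showing the area to the left of 3.09 and the upper-tail area 0.01 to the right of y]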
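As a numerical cross-check of these table values, the sketch below integrates the standard F density with Simpson's rule. It also illustrates the standard reciprocal relation between upper-tail points with the degrees of freedom swapped. This is not the table method from the lecture, and the helper names are illustrative.

```python
import math


def f_pdf(x, v1, v2):
    """Density of the F-distribution with (v1, v2) degrees of freedom, for x >= 0."""
    const = (math.gamma((v1 + v2) / 2) * (v1 / v2) ** (v1 / 2)
             / (math.gamma(v1 / 2) * math.gamma(v2 / 2)))
    return const * x ** (v1 / 2 - 1) / (1 + v1 * x / v2) ** ((v1 + v2) / 2)


def f_cdf(x, v1, v2, steps=4000):
    """P(Y < x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, v1, v2) + f_pdf(x, v1, v2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, v1, v2)
    return total * h / 3


# Part (a): P(Y < 3.09) for Y ~ F(6, 11).
p_a = f_cdf(3.09, 6, 11)

# Part (b): the tail area above the tabulated value 5.07 should be about 0.01.
p_b_tail = 1.0 - f_cdf(5.07, 6, 11)

# Reciprocal relation: the lower 5% point of F(11, 6) is 1 / 3.09,
# because 3.09 is the upper 5% point of F(6, 11).
p_reciprocal = f_cdf(1.0 / 3.09, 11, 6)

print(f"(a) P(Y < 3.09) = {p_a:.3f}")
print(f"(b) P(Y > 5.07) = {p_b_tail:.3f}")
print(f"P(F(11,6) < 1/3.09) = {p_reciprocal:.3f}")
```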
-
Learning Objectives
1. Estimate a population parameter (means) based
on a large sample selected from the population
2. Use the sampling distribution of a statistic to
form a confidence interval for the population
parameter
3. Show how to select the proper sample size for
estimating a population parameter
-
Statistical Interval for a Single Sample
Outlines:
Confidence interval on the mean of a normal
distribution, variance known.
Confidence interval on the mean of a normal
distribution, variance unknown.
Confidence interval on the variance and standard
deviation of a normal distribution.
-
Statistical Methods
Statistical Methods
    Descriptive Statistics
    Inferential Statistics
        Estimation
        Hypothesis Testing
-
Point Estimator
A point estimator of a population parameter is a rule
or formula that tells us how to use the sample data to
calculate a single number that can be used as an estimate of the target parameter.
-
Point Estimation
1. Provides a single value
Based on observations from one sample
2. Gives no information about how close the value is
to the unknown population parameter
3. Example: Sample mean $\bar{x}$ = 3 is the point
estimate of the unknown population mean $\mu$
-
Interval Estimator
An interval estimator (or confidence interval) is a
formula that tells us how to use the sample data to
calculate an interval that estimates the target parameter.
-
Interval Estimation
1. Provides a range of values
Based on observations from one sample
2. Gives information about closeness to unknown
population parameter
Stated in terms of probability
Knowing exact closeness requires knowing unknown
population parameter
3. Example: Unknown population mean lies between 50
and 70 with 95% confidence
-
2011 Pearson Education, Inc
Key Elements of Interval Estimation
[Figure: a confidence interval marked with its lower confidence limit, the sample statistic (point estimate), and its upper confidence limit]
A confidence interval provides a range of plausible
values for the population parameter.
-
Confidence interval
Confidence interval: Bounds that represent an interval of plausible
values for a parameter.
Suppose that we estimate the mean viscosity of a chemical
product to be $\bar{x}$ = 1000. We do not know how close this estimate is to the true mean: is the mean likely to be between 900 and 1100, or between 990 and 1010?
Because we use only a sample from the population to compute the
interval, we cannot be certain that it contains the unknown population
parameter, but the interval is constructed so that we have high confidence that it does.
-
Confidence interval
Practical example
A machine fills cups with margarine, and is supposed to be adjusted so that
the mean content of the cups is close to 250 grams of margarine. Of course
it is not possible to fill every cup with exactly 250 grams of margarine.
Hence the weight of the filling can be considered to be a random variable X. The distribution of X is assumed here to be a normal distribution with
unknown expectation $\mu$ and known standard deviation $\sigma$ = 2.5 grams.
To check if the machine is adequately calibrated, a sample of n = 25 cups of
margarine is chosen at random.
The sample shows actual weights $x_1, x_2, x_3, \ldots, x_{25}$, with mean:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{25} x_i = 250.2$$
If the population mean is actually around 250 g, values such as $\bar{x}$ = 250.4 or 251.1 would not be surprising.
If $\bar{x}$ = 280.6, the population mean should not be close to 250 g.
-
Confidence interval (Case I)
Confidence interval on the mean of a normal distribution,
variance known.
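Suppose that X1, X2, ..., Xn is a random sample from a normal
distribution with unknown mean $\mu$ and known variance $\sigma^2$. We know that
$$\bar{X} \sim N(\mu, \sigma^2/n), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
A confidence interval estimate for $\mu$ is an interval $L \le \mu \le U$ with
$$P\{L \le \mu \le U\} = 1 - \alpha$$
Here $1 - \alpha$ is the probability of selecting a sample whose interval contains the true value of $\mu$.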
-
Confidence interval (Case I)
In order to find lower and upper confidence limits:
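$$P\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$
$$P\left\{\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$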
7/29/2019 16943_Research Methodology Lec6 to 11
52/79
Confidence interval (Case I)
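Ex. Ten measurements of impact energy (J) on specimens of steel are: 64.1, 64.7,
64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy
is normally distributed with $\sigma$ = 1 J. We want to find a 95% CI for $\mu$.
With $\bar{x}$ = 64.46, n = 10 and $z_{0.025}$ = 1.96:
$$64.46 - 1.96\,\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\,\frac{1}{\sqrt{10}}, \qquad 63.84 \le \mu \le 65.08$$
That is, based on the sample data, a range of highly plausible values for mean impact
energy for steel is 63.84 J to 65.08 J.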
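A short sketch of this interval in code, using the ten measurements and $\sigma$ = 1 J from the example. The normal quantile comes from Python's `statistics.NormalDist`, so it is 1.95996 and not the rounded table value 1.96; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean

# Impact energy measurements from the example (J).
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0        # known population standard deviation (J)
confidence = 0.95

n = len(energy)
x_bar = mean(energy)

# z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1).
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

half_width = z * sigma / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"x_bar = {x_bar:.2f}, z = {z:.3f}")
print(f"95% CI for mu: {lower:.2f} <= mu <= {upper:.2f}")
```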
-
Confidence interval (Case I)
Choice of Sample Size
-
Confidence interval (Case I)
Ex. Continuing the previous example, we want to determine how many
specimens must be tested to ensure that the 95% CI on $\mu$ for the steel has a
length of at most 1.0 J.
CI length = $2 z_{\alpha/2}\,\sigma/\sqrt{n}$
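The worked solution is not in the transcript. As a hedged sketch, the required n follows from the two-sided interval for known variance, whose length is $2 z_{\alpha/2}\,\sigma/\sqrt{n}$. Setting this to at most 1.0 J and rounding up gives the sample size. The variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 1.0        # known standard deviation from the previous example (J)
confidence = 0.95
max_length = 1.0   # required maximum CI length (J)

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Length = 2 * z * sigma / sqrt(n) <= max_length
#   =>  n >= (2 * z * sigma / max_length) ** 2, rounded up to a whole specimen.
n_exact = (2 * z * sigma / max_length) ** 2
n_required = math.ceil(n_exact)

achieved_length = 2 * z * sigma / math.sqrt(n_required)
print(f"n must be at least {n_exact:.2f}, so test {n_required} specimens")
print(f"CI length with n = {n_required}: {achieved_length:.3f} J")
```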
-
Confidence interval (Case I)
One-Sided Confidence Bounds
Ex. From the previous example, find a lower one-sided 95% CI for mean impact energy.
$$\bar{x} - z_{0.05}\,\frac{\sigma}{\sqrt{n}} = 64.46 - 1.64\,\frac{1}{\sqrt{10}} = 63.94 \le \mu$$
-
Confidence interval (Case I)
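Large-Sample Confidence Interval for $\mu$
$X_i$ has any distribution, n >= 30, variance unknown.
We can approximate the CI for $\mu$ by replacing $\sigma$ by S:
$$\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$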
-
Confidence interval (Case II)
Confidence interval on the mean of a normal distribution,
variance unknown.
Suppose that X1, X2, ..., Xn is a random sample from a normal distribution
with unknown mean $\mu$ and unknown variance $\sigma^2$.
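The slide's formula is not recoverable from the transcript. As a hedged sketch, the standard textbook result for this case is given below. It is an assumption about what the slide showed, written with $t_{\alpha/2,\,n-1}$ as the upper-tail point of the t-distribution with n - 1 degrees of freedom.

```latex
\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
\qquad\Longrightarrow\qquad
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\;\le\; \mu \;\le\;
\bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\]
```

Compared with Case I, the known $\sigma$ is replaced by the sample standard deviation s, and the normal point $z_{\alpha/2}$ by the t point.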
-
Confidence interval (Case III)
Confidence interval on the variance and standard deviation of a
normal distribution.
-
Confidence interval (Case III)
Two-Sided CI
One-Sided CI
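The formulas under these two headings did not survive in the transcript. As a hedged sketch, the standard results for a normal sample with sample variance $s^2$ are given below. They are an assumption about what the slide showed, with $\chi^2_{p,\,n-1}$ denoting the upper-tail point with area p, as in the Chi-Squared table convention used earlier.

```latex
% Two-sided 100(1 - alpha)% confidence interval on sigma^2
\[
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}
\;\le\; \sigma^2 \;\le\;
\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}
\]
% One-sided lower and upper confidence bounds on sigma^2
\[
\frac{(n-1)\,s^2}{\chi^2_{\alpha,\,n-1}} \;\le\; \sigma^2
\qquad\text{and}\qquad
\sigma^2 \;\le\; \frac{(n-1)\,s^2}{\chi^2_{1-\alpha,\,n-1}}
\]
```

Taking square roots of the limits gives the corresponding interval for the standard deviation $\sigma$.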
-
Confidence Intervals for the Difference
between Two Population Means $\mu_1 - \mu_2$: Independent Samples
Two random samples are drawn from the two
populations of interest.
Because we compare two population means, we
use the statistic $\bar{x}_1 - \bar{x}_2$.
-
Population 1                              Population 2
Parameters: $\mu_1$ and $\sigma_1^2$      Parameters: $\mu_2$ and $\sigma_2^2$
(values are unknown)                      (values are unknown)
Sample size: $n_1$                        Sample size: $n_2$
Statistics: $\bar{x}_1$ and $s_1^2$       Statistics: $\bar{x}_2$ and $s_2^2$
Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$
-
Confidence Interval for $\mu_1 - \mu_2$
Confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm z^{*} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $z^{*}$ is the value from the z-table
that corresponds to the confidence level.
Note: when the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, the
sample variances $s_1^2$ and $s_2^2$ computed from the data can be
used.
-
Example: confidence interval for $\mu_1 - \mu_2$
Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories
for lunch than people who do not eat high-fiber cereal for breakfast?
A sample of 150 people was randomly drawn. Each person was identified as a consumer or a
non-consumer of high-fiber cereal. For each person the number of calories
consumed at lunch was recorded.
-
Example: confidence interval for $\mu_1 - \mu_2$
Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
.           .
.           .
.           .
.           .
Solution: The parameter to be tested is
the difference between two means.
The claim to be tested is: The mean caloric intake of consumers ($\mu_1$)
is less than that of non-consumers ($\mu_2$).
$n_1$ = 43, $\bar{x}_1$ = 604.02; $n_2$ = 107, $\bar{x}_2$ = 633.239
Use $s_1^2$ = 4,103 for $\sigma_1^2$ and $s_2^2$ = 10,670
for $\sigma_2^2$.
-
Interpretation
The 95% CI is (-56.59, -1.83).
We are 95% confident that the interval
(-56.59, -1.83) contains the true but unknown
difference $\mu_1 - \mu_2$.
Since the interval is entirely negative (that is, does
not contain 0), there is evidence from the data that
$\mu_1$ is less than $\mu_2$. We estimate that non-consumers
of high-fiber breakfast cereal consume on average between 1.83 and 56.59 more calories for
lunch.
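The interval quoted above can be reproduced from the summary statistics given in the solution (n1 = 43, mean 604.02, variance 4,103; n2 = 107, mean 633.239, variance 10,670). This sketch uses the large-sample z interval with the sample variances substituted for the unknown population variances. The variable names are illustrative, and the result matches the slide only to within rounding of the inputs.

```python
import math
from statistics import NormalDist

# Summary statistics from the example.
n1, x_bar1, s1_sq = 43, 604.02, 4103.0      # consumers of high-fiber cereal
n2, x_bar2, s2_sq = 107, 633.239, 10670.0   # non-consumers

confidence = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of x_bar1 - x_bar2 for independent samples.
se_diff = math.sqrt(s1_sq / n1 + s2_sq / n2)

diff = x_bar1 - x_bar2
lower = diff - z_star * se_diff
upper = diff + z_star * se_diff

print(f"difference = {diff:.2f}, standard error = {se_diff:.2f}")
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Both limits are negative, which is the basis for the conclusion that consumers of high-fiber cereal eat fewer calories at lunch on average.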