16943_research methodology lec6 to 11

Upload: akash-nigam

Post on 14-Apr-2018


  • 7/29/2019 16943_Research Methodology Lec6 to 11

    1/79

    Central limit theorem

    If a random variable Y is the sum of n independent

    random variables that satisfy certain general

    conditions, then for sufficiently large n , Y is

    approximately normally distributed.

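    If X1, X2, ..., Xn is a sequence of n independent random
    variables with $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$, and
    Y = X1 + X2 + ... + Xn, then under general conditions

    $$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

    has an approximate N(0, 1) distribution as n approaches infinity.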


    Central limit theorem

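    If X1, X2, ..., Xn is a sequence of n independent random
    variables with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$ for every i, and
    Y = X1 + X2 + ... + Xn, then

    $$Z_n = \frac{Y - n\mu}{\sqrt{n\sigma^2}}$$

    has an approximate N(0, 1) distribution as n approaches infinity.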
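The central limit theorem stated above can be checked with a small simulation. The sketch below is only an illustration: the choice of Uniform(0, 1) summands, n = 30, the number of replications, and the fixed seed are all assumptions, not part of the lecture.

```python
import math
import random
import statistics

def standardized_sum(n, rng):
    """Draw n Uniform(0, 1) values and return Z_n = (Y - n*mu) / sqrt(n*sigma^2)."""
    mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)
    y = sum(rng.random() for _ in range(n))
    return (y - n * mu) / math.sqrt(n * var)

rng = random.Random(0)                 # fixed seed so the run is repeatable
z_values = [standardized_sum(30, rng) for _ in range(20000)]

z_mean = statistics.fmean(z_values)    # should be close to 0
z_sd = statistics.pstdev(z_values)     # should be close to 1
# Fraction of simulated Z_n within +/-1.96; a standard normal gives about 0.95.
inside = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(round(z_mean, 3), round(z_sd, 3), round(inside, 3))
```

If the approximation holds, the three printed values should sit near 0, 1 and 0.95 respectively.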


    RESEARCH

    METHODOLOGY

    Sampling Distribution, Chi-Squared, t and F Distributions


    Sampling Distribution

    Statistic: Any function of the observations in a
    random sample that does not depend on unknown
    parameters. (Used to draw conclusions.)

    The sampling distribution of a statistic is the
    probability function that describes the probabilistic
    behaviour of the statistic in repeated sampling from
    the same universe or on the same process variable
    assignment model.


    Sampling Distribution

    The sample mean $\bar{X}$ of X1, X2, ..., Xn is a linear
    combination of n independent variables. Then

    $$E(\bar{X}) = \mu_{\bar{X}} = \mu, \qquad V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$

    $$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$


    Sampling Distribution

    Standard error of a statistic is the standard deviation

    of its sampling distribution.

    If the standard error involves unknown parameters

    whose value can be estimated, substitution of these

    estimates into the standard error results in the

    estimated standard error

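    Standard error of $\bar{X}$ is $\sigma/\sqrt{n}$.

    If $\sigma$ is unknown, the sample standard deviation s is
    substituted for it, giving the estimated standard error $s/\sqrt{n}$.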


    Examples

    Data on the tension bond strength (kgf/cm2) of a modified Portland cement mortar are
    16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57.
    Calculate the standard error of the sample mean (a) if the SD is known to be
    0.25 kgf/cm2, and (b) if the SD is not known.
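A minimal sketch of this calculation, using only the data and the known SD given in the example. The variable names are illustrative; when the SD is unknown the sample standard deviation (divisor n - 1) is substituted.

```python
import math
import statistics

# Tension bond strength data from the example (kgf/cm2).
strength = [16.85, 16.40, 17.21, 16.35, 16.52,
            17.04, 16.96, 17.15, 16.59, 16.57]
n = len(strength)

# Case (a): population SD known.
sigma_known = 0.25
se_known = sigma_known / math.sqrt(n)

# Case (b): SD unknown, so use the sample SD s (n - 1 in the denominator).
s = statistics.stdev(strength)
se_estimated = s / math.sqrt(n)

print(f"mean              = {statistics.fmean(strength):.3f}")
print(f"SE, sigma known   = {se_known:.4f}")
print(f"SE, sigma unknown = {se_estimated:.4f} (s = {s:.4f})")
```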


    t - DISTRIBUTION


    t-Distribution Probability Density Function

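    A random variable T is said to have the t-distribution
    with parameter $\nu$, called degrees of freedom, if its
    probability density function is given by

    $$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\,\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty$$

    where $\nu$ is a positive integer.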


    t-Distribution Table of Probabilities

    Remark: The distribution of T is usually called the Student-t
    or the t-distribution. It is customary to let $t_p$ represent the
    t value above which we find an area equal to p.

    Values of T, $t_{p,\nu}$, for which $P(T > t_{p,\nu}) = p$

    [Figure: t density with the area p shaded to the right of $t_p$]


    t-distribution - Probability Density Function for
    various values of $\nu$

    [Figure: t density curves plotted for t from -3 to 3; curves labelled 5 and 2]


    Table of t-Distribution


    t-Distribution - Example

    If T~t10,

    find:

    (a) P(0.542 < T < 2.359)

    (b) P(T < -1.812)

    (c) t for which P(T>t) = 0.05 .


    Example Solution

    (a) P(0.542 < T < 2.359) = P(T > 0.542) - P(T > 2.359)
        = 0.30 - 0.02 = 0.28

    (b) P(T < -1.812) = F(-1.812) = P(T > 1.812) = 0.05 (by symmetry)

    (c) t for which P(T > t) = 1 - F(t) = 0.05:
        t = 1.812

    [Figures: t densities with the areas for (a), (b) and (c) shaded]
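The table look-ups in this solution can be cross-checked by integrating the t density numerically. The sketch below assumes the standard t density with 10 degrees of freedom and uses a hand-written composite Simpson's rule; the helper names `t_pdf` and `simpson` are illustrative, not from the lecture.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * nu))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

nu = 10
pdf = lambda t: t_pdf(t, nu)

prob_a = simpson(pdf, 0.542, 2.359)        # part (a): P(0.542 < T < 2.359)
prob_b = 0.5 - simpson(pdf, 0.0, 1.812)    # part (b): P(T < -1.812), by symmetry

print(f"(a) {prob_a:.3f}")
print(f"(b) {prob_b:.3f}")
```

Part (c) is the inverse of part (b): the same tail area 0.05 corresponds to t = 1.812.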


    CHI-SQUARED DISTRIBUTION


    Chi-Squared Distribution Probability Density Function

    A random variable X is said to have the Chi-Squared
    distribution with parameter $\nu$, called degrees of
    freedom, if the probability density function of X is

    $$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$$

    where $\nu$ is a positive integer.


    Chi-Squared Distribution - Remarks

    The Chi-Squared distribution plays a vital role in

    statistical inference. It has considerable application

    in both methodology and theory. It is an important

    component of statistical hypothesis testing and

    estimation.

    The Chi-Squared distribution is a special case of the
    Gamma distribution, i.e., when $\alpha = \nu/2$ and $\beta = 2$.


    Chi-Squared Distribution Mean and Standard Deviation

    Mean or Expected Value: $\mu = \nu$

    Standard Deviation: $\sigma = \sqrt{2\nu}$


    Chi-Squared Distribution Table of Probabilities

    It is customary to let $\chi^2_p$ represent the value above which we
    find an area of p. This is illustrated by the shaded region
    below.

    For tabulated values of the Chi-Squared distribution see the
    Chi-Squared table, which gives values of $\chi^2_p$ for various values
    of p and $\nu$. The areas, p, are the column headings; the degrees
    of freedom, $\nu$, are given in the left column, and the table
    entries are the $\chi^2$ values.

    [Figure: Chi-Squared density f(x) with the area $p = 1 - F(\chi^2_{p,\nu})$ shaded to the right of $\chi^2_{p,\nu}$]


    Chi-Squared Table


    Chi-Squared Table Continued


    Chi-Squared Distribution Example

    $X \sim \chi^2_{15}$ (Chi-Squared with 15 degrees of freedom)


    Example Solution

    (a) P(7.261 < X < 24.996) = 0.95 - 0.05 = 0.9

    (b)P(X
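Part (a) can be cross-checked by integrating the Chi-Squared density numerically. This is a sketch under the assumption that X has 15 degrees of freedom, as in the example; `chi2_pdf` and `simpson` are illustrative helper names. It also evaluates the mean and standard deviation for the same degrees of freedom.

```python
import math

def chi2_pdf(x, nu):
    """Density of the Chi-Squared distribution with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

nu = 15
prob_a = simpson(lambda x: chi2_pdf(x, nu), 7.261, 24.996)   # P(7.261 < X < 24.996)

mean = nu                      # mean of a Chi-Squared variable
sd = math.sqrt(2 * nu)         # standard deviation of a Chi-Squared variable

print(f"P(7.261 < X < 24.996) = {prob_a:.3f}")
print(f"mean = {mean}, sd = {sd:.3f}")
```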


    F-DISTRIBUTION


    F-Distribution Probability Density Function

    A random variable X is said to have the F-distribution with
    parameters $\nu_1$ and $\nu_2$, called degrees of freedom, if the
    probability density function is given by:

    $$h(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{\nu_1/2-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$

    Note: The probability density function of the F-distribution
    depends not only on the two parameters $\nu_1$ and $\nu_2$ but also
    on the order in which we state them.


    F-Distribution - Application

    Remark: The F-distribution is used in two-sample

    situations to draw inferences about the population

    variances. It is applied to many other types of

    problems in which the sample variances are involved.

    In fact, the F-distribution is called the variance ratio

    distribution.


    F-Distribution Probability Density Function Shapes

    [Figure: F probability density functions f(x) for various values of $\nu_1$ and $\nu_2$; curves shown for 6 and 24 d.f. and for 6 and 10 d.f.]


    F-Distribution (p=0.01) Table


    F-Distribution (p=0.05) Table


    F-Distribution Table of Probabilities

    The $f_p$ is the f value above which we find an area equal to p,
    illustrated by the shaded area below.

    For tabulated values of the F-distribution see the F table,
    which gives values of $x_p$ for various values of $\nu_1$ and $\nu_2$. The
    degrees of freedom, $\nu_1$ and $\nu_2$, are the column and row
    headings; and the table entries are the x values.

    [Figure: F density f(x) with the area p shaded to the right of $x_p$]


    F-Distribution - Properties

    Let $x_p(\nu_1, \nu_2)$ denote $x_p$ with $\nu_1$ and $\nu_2$ degrees of
    freedom, then

    $$x_{1-p}(\nu_1, \nu_2) = \frac{1}{x_p(\nu_2, \nu_1)}$$


    F-Distribution Example

    If Y ~ F6,11,

    find:

    (a) P(Y < 3.09)

    (b) y for which P(Y > y ) = 0.01


    Example Solution

    Here $\nu_1 = 6$ and $\nu_2 = 11$.

    (a) P(Y < 3.09) = F(3.09)
        = 1 - P(Y > 3.09) = 1 - 0.05
        = 0.95

    (b) P(Y > y) = 0.01
        y = 5.07

    [Figures: F densities f(y) with the areas for (a) and (b) shaded]
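Both table values in this solution can be cross-checked by integrating the F density with 6 and 11 degrees of freedom. The sketch below is an illustration only; `f_pdf` and `simpson` are assumed helper names, and the integration uses a plain composite Simpson's rule.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density of the F-distribution with (nu1, nu2) degrees of freedom."""
    if x <= 0:
        return 0.0
    coef = (math.gamma((nu1 + nu2) / 2) * (nu1 / nu2) ** (nu1 / 2)
            / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2)))
    return coef * x ** (nu1 / 2 - 1) / (1 + nu1 * x / nu2) ** ((nu1 + nu2) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

pdf = lambda x: f_pdf(x, 6, 11)

cdf_309 = simpson(pdf, 0.0, 3.09)          # part (a): P(Y < 3.09)
tail_507 = 1 - simpson(pdf, 0.0, 5.07)     # part (b): P(Y > 5.07)

print(f"P(Y < 3.09) = {cdf_309:.3f}")
print(f"P(Y > 5.07) = {tail_507:.3f}")
```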


    Learning Objectives

    1. Estimate a population parameter (means) based

    on a large sample selected from the population

    2. Use the sampling distribution of a statistic to

    form a confidence interval for the population

    parameter

    3. Show how to select the proper sample size for

    estimating a population parameter


    Statistical Interval for a Single Sample

    Outlines:

    Confidence interval on the mean of a normal

    distribution, variance known.

    Confidence interval on the mean of a normal

    distribution, variance unknown.

    Confidence interval on the variance and standard

    deviation of a normal distribution.


    Statistical Methods

    [Diagram: Statistical Methods divide into Descriptive Statistics and
    Inferential Statistics; Inferential Statistics divide into Estimation
    and Hypothesis Testing]


    Point Estimator

    A point estimator of a population parameter is a rule

    or formula that tells us how to use the sample data to

    calculate a single number that can be used as an
    estimate of the target parameter.


    Point Estimation

    1. Provides a single value

    Based on observations from one sample

    2. Gives no information about how close the value is

    to the unknown population parameter

    3. Example: Sample mean $\bar{x}$ = 3 is the point
    estimate of the unknown population mean $\mu$


    Interval Estimator

    An interval estimator (or confidence interval) is a

    formula that tells us how to use the sample data to

    calculate an interval that estimates the target
    parameter.


    Interval Estimation

    1. Provides a range of values

    Based on observations from one sample

    2. Gives information about closeness to unknown

    population parameter

    Stated in terms of probability

    Knowing exact closeness requires knowing unknown

    population parameter

    3. Example: Unknown population mean lies between 50

    and 70 with 95% confidence


    2011 Pearson Education,Inc

    Key Elements of Interval Estimation

    [Diagram: the confidence interval runs from the lower confidence limit
    to the upper confidence limit and contains the sample statistic
    (point estimate)]

    A confidence interval provides a range of plausible

    values for the population parameter.


    Confidence interval

    Confidence interval: Bounds represent an interval of plausible
    values for a parameter.

    Suppose that we estimate the mean viscosity of a chemical
    product to be $\hat{\mu} = \bar{x} = 1000$. We do not know how close this
    estimate is to the true mean: is the mean likely to be between
    900 and 1100, or between 990 and 1010?

    Because we use a sample from the population to compute the
    interval, we cannot be certain that it contains the unknown
    population parameter, but we can have high confidence that it does.


    Confidence interval

    Practical example

    A machine fills cups with margarine, and is supposed to be adjusted so that
    the mean content of the cups is close to 250 grams of margarine. Of course
    it is not possible to fill every cup with exactly 250 grams of margarine.
    Hence the weight of the filling can be considered to be a random variable X.
    The distribution of X is assumed here to be a normal distribution with
    unknown expectation $\mu$ and known standard deviation $\sigma$ = 2.5 grams.

    To check if the machine is adequately calibrated, a sample of n = 25 cups of
    margarine is chosen at random.

    The sample shows actual weights $x_1, x_2, x_3, \ldots, x_{25}$, with mean:

    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{25} x_i = 250.2$$

    Values such as $\bar{x}$ = 250.4 or 251.1 are plausible if the population mean
    is actually around 250 g. If $\bar{x}$ = 280.6, the population mean should not
    be close to 250 g.


    Confidence interval (Case I)

    Confidence interval on the mean of a normal distribution,

    variance known.

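    Suppose that X1, X2, ..., Xn is a random sample from a normal
    distribution with unknown $\mu$ and known $\sigma^2$. We know that

    $$\bar{X} \sim N(\mu, \sigma^2/n), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

    A confidence interval estimate for $\mu$ is $L \le \mu \le U$, where

    $$P\{L \le \mu \le U\} = 1 - \alpha$$

    is the probability of selecting a sample whose interval contains the
    true value of $\mu$.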


    Confidence interval (Case I)

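    In order to find lower and upper confidence limits:

    $$P\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$

    $$P\left\{\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right\} = 1 - \alpha$$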


    Confidence interval (Case I)

    Ex. Ten measurements of impact energy (J) on specimens of steel are: 64.1, 64.7,
    64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy
    is normally distributed with $\sigma$ = 1 J. We want to find a 95% CI for $\mu$.

    $$\bar{x} \pm z_{0.025}\,\frac{\sigma}{\sqrt{n}} = 64.46 \pm 1.96\,\frac{1}{\sqrt{10}} \quad\Rightarrow\quad 63.84 \le \mu \le 65.08$$

    That is, based on the sample data, a range of highly plausible values for mean impact
    energy for steel is 63.84 J to 65.08 J.
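The interval in this example can be reproduced with a few lines of code. This is a minimal sketch that uses the data and $\sigma$ = 1 J from the example; the variable names are illustrative, and the z value is taken from Python's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist, fmean

# Impact energy measurements from the example (J).
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0            # known population SD, as assumed in the example
confidence = 0.95

n = len(energy)
x_bar = fmean(energy)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96
half_width = z * sigma / math.sqrt(n)

lower, upper = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```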


    Confidence interval (Case I)

    Choice of Sample Size


    Confidence interval (Case I)

    Ex. Consider the previous example. We want to determine how many
    specimens must be tested to ensure that the 95% CI on $\mu$ for steel has a
    length of at most 1.0 J.

    CI length = $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$
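A sketch of the sample-size calculation, assuming the two-sided interval length is $2 z_{\alpha/2} \sigma / \sqrt{n}$ with $\sigma$ = 1 J as in the previous example. Solving for n and rounding up gives the smallest sample size that meets the length requirement; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 1.0          # known SD from the previous example (J)
max_length = 1.0     # required total CI length (J)
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}

# Length = 2 * z * sigma / sqrt(n) <= max_length  =>  n >= (2 * z * sigma / max_length)^2
n_exact = (2 * z * sigma / max_length) ** 2
n_required = math.ceil(n_exact)                      # round up to a whole specimen

print(f"n must be at least {n_exact:.2f}, so test {n_required} specimens")
```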


    Confidence interval (Case I)

    One-Sided Confidence Bounds

    Ex. From the previous example, find a lower one-sided 95% CI for mean impact energy.

    $$\bar{x} - z_{0.05}\,\frac{\sigma}{\sqrt{n}} = 64.46 - 1.64\,\frac{1}{\sqrt{10}} = 63.94 \le \mu$$
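The one-sided bound puts all of $\alpha$ in one tail, so it uses $z_{0.05}$ instead of $z_{0.025}$. A minimal sketch, taking the sample mean 64.46 J, $\sigma$ = 1 J and n = 10 from the earlier example as given:

```python
import math
from statistics import NormalDist

x_bar = 64.46        # sample mean from the earlier example (J)
sigma = 1.0          # known SD (J)
n = 10
alpha = 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)            # one-sided critical value z_{0.05}
lower_bound = x_bar - z_alpha * sigma / math.sqrt(n)

print(f"z = {z_alpha:.3f}, lower 95% bound = {lower_bound:.2f} J")
```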


    Confidence interval (Case I)

    Large Sample Confidence Interval for $\mu$

    $X_i$ has any distribution, n >= 30, variance unknown.

    We can approximate the CI for $\mu$ by replacing $\sigma$ by S.


    Confidence interval (Case II)

    Confidence interval on the mean of a normal distribution,

    variance unknown.

    Suppose that X1, X2, ..., Xn is a random sample from a normal distribution
    with unknown $\mu$ and unknown $\sigma^2$.


    Confidence interval (Case III)

    Confidence interval on the variance and standard deviation of a

    normal distribution.


    Confidence interval (Case III)

    Two-Sided CI

    One-Sided CI


    Confidence Intervals for the Difference
    between Two Population Means $\mu_1 - \mu_2$: Independent Samples

    Two random samples are drawn from the two
    populations of interest.

    Because we compare two population means, we
    use the statistic $\bar{x}_1 - \bar{x}_2$.


    Population 1                                Population 2
    Parameters: $\mu_1$ and $\sigma_1^2$        Parameters: $\mu_2$ and $\sigma_2^2$
    (values are unknown)                        (values are unknown)
    Sample size: $n_1$                          Sample size: $n_2$
    Statistics: $\bar{x}_1$ and $s_1^2$         Statistics: $\bar{x}_2$ and $s_2^2$

    Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$


    Confidence Interval for $\mu_1 - \mu_2$

    Confidence interval:

    $$(\bar{x}_1 - \bar{x}_2) \pm z^{*} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

    where $z^{*}$ is the value from the z-table
    that corresponds to the confidence level.

    Note: when the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, the
    sample variances $s_1^2$ and $s_2^2$ computed from the data can be
    used.


    Example: confidence interval for $\mu_1 - \mu_2$

    Do people who eat high-fiber cereal for
    breakfast consume, on average, fewer calories
    for lunch than people who do not eat high-fiber
    cereal for breakfast?

    A sample of 150 people was randomly drawn.
    Each person was identified as a consumer or a
    non-consumer of high-fiber cereal. For each person the number of calories
    consumed at lunch was recorded.


    Example: confidence interval for $\mu_1 - \mu_2$

    Calories consumed at lunch (first rows of the data):

    Consumers    Non-consumers
    568          705
    498          819
    589          706
    681          509
    540          613
    646          582
    636          601
    739          608
    539          787
    596          573
    607          428
    529          754
    637          741
    617          628
    633          537
    555          748
    ...          ...

    Solution: The parameter to be tested is
    the difference between two means.

    The claim to be tested is: the mean caloric intake of consumers ($\mu_1$)
    is less than that of non-consumers ($\mu_2$).

    $n_1 = 43$, $\bar{x}_1 = 604.02$; $n_2 = 107$, $\bar{x}_2 = 633.239$

    Use $s_1^2$ = 4,103 for $\sigma_1^2$ and $s_2^2$ = 10,670
    for $\sigma_2^2$.


    Interpretation

    The 95% CI is (-56.59, -1.83).

    We are 95% confident that the interval
    (-56.59, -1.83) contains the true but unknown
    difference $\mu_1 - \mu_2$.

    Since the interval is entirely negative (that is, does
    not contain 0), there is evidence from the data that
    $\mu_1$ is less than $\mu_2$. We estimate that non-consumers
    of high-fiber breakfast consume on
    average between 1.83 and 56.59 more calories for
    lunch.
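The interval interpreted above can be recomputed from the summary statistics of the cereal example. This is a sketch under the stated large-sample assumption that the sample variances stand in for the population variances; the variable names are illustrative. With the summary values as printed in the transcript, the endpoints should agree with (-56.59, -1.83) to within rounding.

```python
import math
from statistics import NormalDist

# Summary statistics from the example (1 = consumers, 2 = non-consumers).
n1, mean1, var1 = 43, 604.02, 4103.0
n2, mean2, var2 = 107, 633.239, 10670.0

z_star = NormalDist().inv_cdf(0.975)                 # z value for 95% confidence

diff = mean1 - mean2                                 # point estimate of mu_1 - mu_2
std_err = math.sqrt(var1 / n1 + var2 / n2)           # SE of the difference in means

lower = diff - z_star * std_err
upper = diff + z_star * std_err
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because both endpoints are negative, the computed interval supports the same conclusion as the slide: the mean for consumers is lower than the mean for non-consumers.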
