test for outliers


Post on 14-Apr-2018


TRANSCRIPT

  • 7/30/2019 Test for Outliers

    1/26

    Test For Outliers

    One important aim of statistical testing is to recognize the presence or absence of outliers.

    Outliers in a series of measurements are extraordinarily small or large observations compared with the bulk of the data.

    Several test procedures exist for detecting outliers in data; we will look first at Dixon's Q-test.

    The Q-test is one of the most frequently used outlier test procedures.

    The Q-test uses the range of the measurements and can be applied even when only a few data are available.

    The n measurements are arranged in ascending order. The very small value to be tested as an outlier is denoted by $x_1$ and the very large value by $x_n$.

    The test statistic is then calculated as follows.

    For the smallest one:

    $$Q_1 = \frac{x_2 - x_1}{x_n - x_1}$$

    For the largest one:

    $$Q_n = \frac{x_n - x_{n-1}}{x_n - x_1}$$

    The null hypothesis, i.e., that the considered measurement is not an outlier, is accepted if the quantity $Q \le Q(1-\alpha; n)$. If $Q > Q(1-\alpha; n)$, we reject the null hypothesis and say that the value is an outlier.

    Q values for selected significance levels and numbers of measurements n are given in the table and in your book in Table 2.10.

    Example 2.8

    Trace analysis of polycyclic aromatic hydrocarbons (PAH) in a soil revealed the following values for the trace constituent benzo[a]pyrene, in mg/kg dry weight:

    5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15

    Apply the Q-test to check whether the smallest or the largest value might be an outlier.

    First we need to arrange the data in ascending order:

    5.00, 5.10, 5.10, 5.15, 5.20, 5.30, 6.20

    Then we can calculate the Q value for both the smallest and the largest value. For the smallest value:

    $$Q_1 = \frac{x_2 - x_1}{x_n - x_1} = \frac{5.10 - 5.00}{6.20 - 5.00} = 0.083$$

    For the largest value:

    $$Q_n = \frac{x_n - x_{n-1}}{x_n - x_1} = \frac{6.20 - 5.30}{6.20 - 5.00} = 0.75$$

    For $\alpha = 0.01$ we can use Table 2.10 and obtain the table value

    Q(1 - 0.01 = 0.99; n = 7) = 0.64

    Since the $Q_1$ value (0.083) is much smaller than the table value, we cannot eliminate the smallest value as an outlier.

    The $Q_n$ value (0.75), however, is larger than the table value, and for this reason we can eliminate the largest value as an outlier.
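    The Q-test calculation of Example 2.8 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dixon_q` and the variable names are assumptions made here, and the critical value 0.64 is simply the table value quoted above, not something the code derives.

```python
def dixon_q(values):
    """Return (Q_1, Q_n): Dixon's Q for the smallest and the largest value."""
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("the Q-test needs at least three measurements")
    data_range = x[-1] - x[0]
    if data_range == 0:
        raise ValueError("all measurements are identical")
    q_low = (x[1] - x[0]) / data_range      # gap between the two smallest values
    q_high = (x[-1] - x[-2]) / data_range   # gap between the two largest values
    return q_low, q_high


# Benzo[a]pyrene values from Example 2.8 (mg/kg dry weight)
pah = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
Q_CRIT = 0.64  # table value quoted in the text for alpha = 0.01, n = 7

q_low, q_high = dixon_q(pah)
for label, q in (("smallest", q_low), ("largest", q_high)):
    verdict = "outlier" if q > Q_CRIT else "not an outlier"
    print(f"{label}: Q = {q:.3f} -> {verdict}")
```

    For other sample sizes or significance levels, `Q_CRIT` would have to be replaced by the matching table entry.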


    F test Example

    Confidence interval is

    $$(0.255)\,\frac{14.5}{10.8} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{14.5}{10.8}\,(3.59)$$

    $$0.34 \le \frac{\sigma_1^2}{\sigma_2^2} \le 4.82$$

    Grubbs's Test for Outliers

    It can be applied to series of 3 to 150 measurements.

    The null hypothesis, according to which $x^*$ is not an outlier within the measurement series of n values, is accepted at level $\alpha$ if the test quantity T satisfies:

    $$T = \frac{|\bar{x} - x^*|}{s} \le T(1-\alpha; n)$$

    With the test quantity T, the distance of the suspicious value from the mean is determined and related to the standard deviation of the measurements.

    Example 2.9

    The data for the trace analysis of benzo[a]pyrene from the previous example are also used for Grubbs's test.

    The mean of the data was 5.29 and the standard deviation was 0.411.

    Then we can calculate the T values for the smallest and the largest values as

    $$T_1 = \frac{|\bar{x} - x_1^*|}{s} = \frac{|5.29 - 5.00|}{0.411} = 0.71$$

    $$T_n = \frac{|\bar{x} - x_n^*|}{s} = \frac{|5.29 - 6.20|}{0.411} = 2.21$$

    The table value at $\alpha = 0.01$ is

    T(1 - 0.01 = 0.99; n = 7) = 2.10

    As a result, the test is not significant for the smallest value (0.71 < 2.10) but is significant for the largest value (2.21 > 2.10).

    So the largest value is an outlier.
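    As a rough check of Example 2.9, the T values can be recomputed from the raw data. The sketch below assumes the sample standard deviation (n - 1 in the denominator), which reproduces the 0.411 quoted above; the helper name `grubbs_t` is made up for this illustration, and 2.10 is the table value from the text.

```python
import statistics


def grubbs_t(values, suspect):
    """Distance of the suspect value from the mean, in units of the sample std. dev."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return abs(mean - suspect) / s


# Benzo[a]pyrene values from Example 2.8 (mg/kg dry weight)
pah = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
T_CRIT = 2.10  # table value quoted in the text for alpha = 0.01, n = 7

t_low = grubbs_t(pah, min(pah))
t_high = grubbs_t(pah, max(pah))
for label, t in (("smallest", t_low), ("largest", t_high)):
    verdict = "outlier" if t > T_CRIT else "not an outlier"
    print(f"{label}: T = {t:.2f} -> {verdict}")
```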


    Non-parametric Tests for Method Comparison

    The tests that we have seen so far all require that the data be normally distributed. When this assumption does not hold, distribution-free methods need to be used.

    These methods do not require parameters such as the mean and standard deviation used in the previous tests. For that reason, they are called non-parametric methods.

    These methods require more replicate measurements.

    They do not use the values of the quantitative variables. They use the ranks of the data and are based on counting.

    We will look at two examples of non-parametric tests. These are:

    The Mann-Whitney U-test for the comparison of independent samples

    The Wilcoxon T-test for paired measurements

    When normality is doubtful, you should always consider these tests, especially in the case of small samples.

    The Mann-Whitney U-test

    This test is based on ranking the samples, taking both groups (group A and group B) of the data together. It gives rank 1 to the lowest result, rank 2 to the second, etc.

    If $n_1$ and $n_2$ are the numbers of data in the groups with the smallest and largest number of results, respectively, and $R_1$ and $R_2$ are the sums of the ranks in these two groups, then we can set up the equations as:

    $$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$

    $$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

    The smaller of the two U values is used to evaluate the test.

    When we have ties, the average of the ranks is given.

    The Mann-Whitney test compares the medians of the two samples.

    The smaller the difference between the medians, the smaller the difference between $U_1$ and $U_2$.

    $$H_0: U_1 = U_2$$

    $$H_1: U_1 \ne U_2$$

    Example: The following two groups of measurements are to be compared.

    A       B
    11.2    10.9
    13.7    11.2
    14.8    12.1
    11.2    12.4
    15.0    15.5
    16.1    14.6
    17.3    13.5
    10.9    10.8
    10.8
    11.7

    Here the lowest result, 10.8, would be given the rank 1. Since 10.8 occurs twice, once in group A and once in group B, both are given the rank 1.5 as the average of ranks 1 and 2:

    $$\text{Rank} = \frac{1 + 2}{2} = 1.5$$

    If we set the hypotheses as

    $$H_0: U_1 = U_2$$

    $$H_1: U_1 \ne U_2$$

    this will be a two-sided test.

    Group  Result  Rank      Group  Result  Rank
    A      10.8    1.5       B      12.4    10
    B      10.8    1.5       B      13.5    11
    A      10.9    3.5       A      13.7    12
    B      10.9    3.5       B      14.6    13
    A      11.1    5         A      14.8    14
    A      11.2    6.5       A      15.0    15
    B      11.2    6.5       B      15.5    16
    A      11.7    8         A      16.1    17
    B      12.1    9         A      17.3    18

    R1 is the sum of the ranks in group B:

    R1 = 1.5 + 3.5 + 6.5 + 9 + 10 + 11 + 13 + 16 = 70.5

    R2 is the sum of the ranks in group A:

    R2 = 1.5 + 3.5 + 5 + 6.5 + 8 + 12 + 14 + 15 + 17 + 18 = 100.5

    $$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 = 8 \cdot 10 + \frac{8 (8 + 1)}{2} - 70.5 = 45.5$$

    $$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 = 8 \cdot 10 + \frac{10 (10 + 1)}{2} - 100.5 = 34.5$$

    Notice that

    $$U_1 + U_2 = n_1 n_2$$

    $$U = \min(U_1, U_2) = 34.5$$

    From the table (Appendix, Table 4), for a two-sided test with n1 = 8 and n2 = 10, a value of 17 is found.

    If an observed U value is less than or equal to the value in the table, the null hypothesis may be rejected at the significance level of the table.

    Since our calculated value (34.5) is larger than 17, we conclude that there is no significant difference between the two groups.
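    The ranking and the two U values can be reproduced with a short Python sketch. It assumes the group A values exactly as they appear in the ranking table (including 11.1, where the raw listing shows a second 11.2); the function names are invented for this illustration, and 17 is the table value quoted above.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(smaller_group, larger_group):
    """Return (R1, R2, U1, U2) using the formulas given in the text."""
    n1, n2 = len(smaller_group), len(larger_group)
    ranks = average_ranks(list(smaller_group) + list(larger_group))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


# Group B has 8 results (n1), group A has 10 (n2); values as in the ranking table
group_b = [10.9, 11.2, 12.1, 12.4, 15.5, 14.6, 13.5, 10.8]
group_a = [10.8, 10.9, 11.1, 11.2, 11.7, 13.7, 14.8, 15.0, 16.1, 17.3]
U_CRIT = 17  # table value quoted in the text for n1 = 8, n2 = 10, two-sided

r1, r2, u1, u2 = mann_whitney_u(group_b, group_a)
u = min(u1, u2)
print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}")
print("reject H0" if u <= U_CRIT else "no significant difference")
```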


    We can now check whether the data used in this test show any tendency toward a normal distribution.

    j    raw A   raw B   ranked A  ranked B  (j-0.5)/10  (j-0.5)/8  ranked A and B  (j-0.5)/18
    1    11.20   10.90   10.80     10.80     0.05        0.06       10.80           0.03
    2    13.70   11.20   10.90     10.90     0.15        0.19       10.80           0.08
    3    14.80   12.10   11.20     11.20     0.25        0.31       10.90           0.14
    4    11.20   12.40   11.20     12.10     0.35        0.44       10.90           0.19
    5    15.00   15.50   11.70     12.40     0.45        0.56       11.20           0.25
    6    16.10   14.60   13.70     13.50     0.55        0.69       11.20           0.31
    7    17.30   13.50   14.80     14.60     0.65        0.81       11.20           0.36
    8    10.90   10.80   15.00     15.50     0.75        0.94       11.70           0.42
    9    10.80           16.10               0.85                   12.10           0.47
    10   11.70           17.30               0.95                   12.40           0.53
    11                                                              13.50           0.58
    12                                                              13.70           0.64
    13                                                              14.60           0.69
    14                                                              14.80           0.75
    15                                                              15.00           0.81
    16                                                              15.50           0.86
    17                                                              16.10           0.92
    18                                                              17.30           0.97
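    The probability columns in this table are the plotting positions (j - 0.5)/n for the j-th ranked value out of n. A minimal Python sketch of that step, with the function name `plotting_positions` chosen here for illustration, might look like this:

```python
def plotting_positions(n):
    """Cumulative probabilities (j - 0.5) / n for j = 1 .. n."""
    return [(j - 0.5) / n for j in range(1, n + 1)]


# Group B values as listed in the raw data column
raw_b = [10.90, 11.20, 12.10, 12.40, 15.50, 14.60, 13.50, 10.80]

ranked_b = sorted(raw_b)
positions_b = plotting_positions(len(ranked_b))
for value, p in zip(ranked_b, positions_b):
    print(f"{value:6.2f}  {p:.2f}")
```

    Plotting each ranked value against its position on a normal probability scale then gives the normal probability plot.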


    [Figure: Normal probability plot. x-axis: Measurement (10.00 to 18.00); y-axis: Probability (0.00 to 1.00); series: Group A, Group B, A and B]

    Wilcoxon Matched Pairs Signed-Rank test

    In this test, the differences $d_i$ of the paired data are calculated first.

    These $d_i$ values are then ranked without regard to sign, starting with the smallest value.

    Each rank is then given the same sign as the corresponding difference.

    If there are ties, the same rule (take the average) is applied as in the Mann-Whitney test.

    If any $d_i$ value is zero, you can either drop it from the analysis or assign it a rank of (p+1)/2, in which p is the number of zero differences. In this case half of the zero differences take a negative rank and the other half a positive rank.

    The null hypothesis is that methods A and B are equivalent:

    $$H_0: A = B$$

    $$H_1: A \ne B$$

    If $H_0$ is true, it would be expected that the sum of all ranks for positive differences (T+) would be close to the sum for negative differences (T-).

    For the two-sided case, the Wilcoxon test statistic is then calculated as T = min(T+, T-).

    The smaller the value of T, the larger the significance of the difference.

    Let's now do the example.

    sample  R    T    d=R-T  rank  signed rank
    1       114  116  -2     1     -1
    2       49   42   7      7.5   7.5
    3       100  95   5      4     4
    4       20   10   10     9.5   9.5
    5       90   94   -4     2.5   -2.5
    6       106  100  6      5.5   5.5
    7       100  96   4      2.5   2.5
    8       95   102  -7     7.5   -7.5
    9       160  150  10     9.5   9.5
    10      110  104  6      5.5   5.5

    The critical (table) values of T as a function of n and $\alpha$ are given in Table 5 of the appendix.

    In our example, the ranks of all positive differences add up to T+ = 44.0 and those of all negative differences to T- = 11.0.

    If the calculated T value is equal to or smaller than the table value, the null hypothesis is rejected.

    For $\alpha = 0.05$ and n = 10 in our two-sided test, the table value is T = 8.

    Since T = min(44.0, 11.0) = 11.0 is larger than 8, the null hypothesis is accepted and we can conclude that there is no significant difference between the two methods.
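    The signed ranks and the sums T+ and T- of this example can be recomputed with the following Python sketch. It takes the option of dropping zero differences (none occur here); the function name `signed_ranks` is an assumption of this illustration, and 8 is the table value quoted above.

```python
def signed_ranks(differences):
    """Rank |d| with averaged ties, then attach the sign of each difference."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped
    abs_sorted = sorted(abs(d) for d in nonzero)
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1   # lowest rank of this value
        count = abs_sorted.count(value)       # number of tied entries
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[abs(d)] if d > 0 else -rank_of[abs(d)] for d in nonzero]


# Paired results R and T from the example table
r = [114, 49, 100, 20, 90, 106, 100, 95, 160, 110]
t = [116, 42, 95, 10, 94, 100, 96, 102, 150, 104]
T_CRIT = 8  # table value quoted in the text for alpha = 0.05, n = 10, two-sided

d = [ri - ti for ri, ti in zip(r, t)]
ranks = signed_ranks(d)
t_plus = sum(x for x in ranks if x > 0)
t_minus = -sum(x for x in ranks if x < 0)
t_stat = min(t_plus, t_minus)
print(f"T+ = {t_plus}, T- = {t_minus}, T = {t_stat}")
print("reject H0" if t_stat <= T_CRIT else "no significant difference")
```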