8. testing of hypothesis for variable & attribute data

QUALITY TOOLS & TECHNIQUES. By: Hakeem–Ur–Rehman, IQTM–PU. TQT ANALYZE PHASE. STATISTICAL INFERENCE: TESTING OF HYPOTHESIS FOR VARIABLE & ATTRIBUTE DATA

Upload: hakeem-ur-rehman

Post on 16-Apr-2017


Page 1: 8. testing of hypothesis for variable & attribute  data

QUALITY TOOLS & TECHNIQUES

By: Hakeem–Ur–Rehman

IQTM–PU

TQT

ANALYZE PHASE

STATISTICAL INFERENCE: TESTING OF HYPOTHESIS FOR VARIABLE & ATTRIBUTE DATA

Page 2: 8. testing of hypothesis for variable & attribute  data

STATISTICAL METHODS

Statistical Methods
– Descriptive Statistics
– Inferential Statistics
   – Estimation
   – Hypothesis Testing

Page 3: 8. testing of hypothesis for variable & attribute  data

NATURE OF INFERENCE

in·fer·ence (n.) “The act or process of deriving logical conclusions from premises known or assumed to be true. The act of reasoning from factual knowledge or evidence.” (1)

Inferential Statistics: To draw inferences about the process or population being studied by modeling patterns of data in a way that accounts for randomness and uncertainty in the observations. (2)

1. Dictionary.com
2. Wikipedia.com

Putting the pieces of the puzzle together….

Page 4: 8. testing of hypothesis for variable & attribute  data

HYPOTHESIS TESTING

Population: “I believe the population mean age is 50” (hypothesis).

Random sample: sample mean X̄ = 20.

Reject hypothesis! Not close.

Page 5: 8. testing of hypothesis for variable & attribute  data

WHAT’S A HYPOTHESIS?

A belief about a population parameter
– The parameter is a population mean, proportion, or variance
– It must be stated before analysis

Example: “I believe the mean GPA of this class is 3.5!”

Page 6: 8. testing of hypothesis for variable & attribute  data

HYPOTHESIS

The hypotheses to be tested consist of two complementary statements:

1) The null hypothesis (denoted by H0) is a statement about the value of a population parameter; it must contain the condition of equality.

2) The alternative hypothesis (denoted by H1) is the statement that must be true if the null hypothesis is false.

e.g.:
H0: μ = some value vs H1: μ ≠ some value
H0: μ ≤ some value vs H1: μ > some value
H0: μ ≥ some value vs H1: μ < some value

Page 7: 8. testing of hypothesis for variable & attribute  data

NULL Vs. ALTERNATIVE HYPOTHESIS

EXAMPLE:

A contractor is interested in finding out whether a new material to be used in the foundation of towers will have any effect on the strength of the tower foundation. Will the foundation strength increase, decrease, or remain unchanged? If the mean foundation strength of towers is 5000 lbs/sq-in, the hypotheses for this situation are:

H0 : µ = 5000 and H1 : µ ≠ 5000

This is called a TWO-TAILED HYPOTHESIS since the possible effects of the new material could be to raise or lower the strength.

Page 8: 8. testing of hypothesis for variable & attribute  data

NULL Vs. ALTERNATIVE HYPOTHESIS (cont…)

EXAMPLE:

A design engineer develops an additive to increase the life of an automobile battery. If the mean lifetime of the automobile battery is 36 months, then his hypotheses are:

H0 : µ ≤ 36 and H1 : µ > 36

This is called a ONE-TAILED HYPOTHESIS (RIGHT-TAILED) since the interest is in an increase only.

Page 9: 8. testing of hypothesis for variable & attribute  data

NULL Vs. ALTERNATIVE HYPOTHESIS (cont…)

EXAMPLE:

A contractor wishes to lower the heating bills by using a special type of insulation in site cabins. If the average of the monthly heating bills is $78, his hypotheses about heating costs with the use of insulation are:

H0 : µ ≥ 78 and H1 : µ < 78

This is called a ONE-TAILED HYPOTHESIS (LEFT-TAILED) since the contractor is interested only in lowering the heating costs.

Page 10: 8. testing of hypothesis for variable & attribute  data

NULL Vs. ALTERNATIVE HYPOTHESIS (cont…)

EXERCISES:

1. An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of compact discs by using robots instead of humans for certain tasks. The mean number of defective discs per 1000 is 18.

2. A Safety Engineer claims that by regularly conducting a specific safety orientation program for riggers, the accident rate during tower erection will reduce. The mean accident rate is 9 accidents per year.

3. A contractor wants to investigate whether using safety gear affects the quality performance of workers erecting the towers. The contractor is not sure whether using safety gear increases or decreases the quality performance. In the past, the average defect rate was 7 per worker.

Page 11: 8. testing of hypothesis for variable & attribute  data

SAMPLING RISK

α-Risk, also referred to as Type I Error or Producer’s Risk: the risk of rejecting H0 when H0 is true, i.e. concluding that the process has drifted when it really has not.

β-Risk, also referred to as Type II Error or Consumer’s Risk: the risk of accepting (failing to reject) H0 when H0 is false, i.e. failing to detect the drift that has occurred in a process.

HYPOTHESIS STATEMENT:
H0 : μ = some value
H1 : μ ≠ some value

Criteria for “Accepting” & “Rejecting” a Null Hypothesis:

1. For any fixed α, an increase in the sample size will cause a decrease in β.

2. For any fixed sample size, a decrease in α will cause an increase in β. Conversely, an increase in α will cause a decrease in β.

3. To decrease both α and β, increase the sample size.
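The relationship between α, β and sample size can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a two-sided z-test with a known standard deviation (the slide does not name a test), and the function name `beta_risk` and the true shift of 0.5σ are hypothetical choices, not values from the deck.

```python
from statistics import NormalDist

def beta_risk(n, alpha=0.05, shift=0.5, sigma=1.0):
    """Type II error probability of a two-sided z-test (sigma assumed known)
    when the true mean differs from the hypothesized mean by `shift`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # boundary of the acceptance region
    delta = shift * n ** 0.5 / sigma       # true shift in standard-error units
    # beta = P(test statistic falls inside the acceptance region | H0 is false)
    return z.cdf(z_crit - delta) - z.cdf(-z_crit - delta)

beta_n10 = beta_risk(10)                   # baseline: n = 10, alpha = 0.05
beta_n40 = beta_risk(40)                   # same alpha, larger sample
beta_strict = beta_risk(10, alpha=0.01)    # same sample size, smaller alpha
```

Under these assumptions `beta_n40` comes out smaller than `beta_n10` (criterion 1) and `beta_strict` comes out larger than `beta_n10` (criterion 2).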

Page 12: 8. testing of hypothesis for variable & attribute  data

What is P-Value?

This is the probability, given that H0 is true, of observing a sample result at least as extreme as the one obtained (for a right-tailed test, a sample mean ≥ the observed X-bar). We reject H0 if the obtained P-value is less than α.

Interpreting P-Value:
H0 : μ = 5
H1 : μ ≠ 5
α = 0.05

A low p-value for the statistical test points to rejection of the null hypothesis because it indicates how unlikely it is that a test statistic as extreme as, or more extreme than, the one observed would be obtained from this population if the null hypothesis is true.

If the p-value = 0.005, this means that if the population mean were equal to the hypothesized value, there is only a 5 in 1000 chance that a test statistic this extreme or more extreme would be obtained using data from this population, so there is significant evidence to support the alternative hypothesis (H1).

P-value > α: Accept H0 (fail to reject)
P-value < α: Reject H0
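As a rough illustration of the decision rule, the sketch below converts a standardized test statistic into a p-value for each kind of alternative hypothesis. It assumes a z statistic (standard normal under H0); the function names `p_value` and `decide` and the example value z = 2.807 are hypothetical and not taken from the slides.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """p-value of a z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf
    if alternative == "greater":           # H1: mu > mu0 (right-tailed)
        return 1 - cdf(z)
    if alternative == "less":              # H1: mu < mu0 (left-tailed)
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))           # H1: mu != mu0 (two-tailed)

def decide(p, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"

p_two = p_value(2.807)                     # roughly 0.005, i.e. about 5 in 1000
decision = decide(p_two)
```

For a two-tailed test the tail area is doubled because departures in either direction count as "extreme".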

Page 13: 8. testing of hypothesis for variable & attribute  data

HYPOTHESIS TESTING

What Do You Do? If……

You have:

• Different types of materials (Stainless, Carbon Steel & Aluminum)
• Different types of oils (Shell & Mobil)
• Different types of cleaning solutions (Hydrocarbon & Water base)

You want to find which method of cleaning yields the best results for all these materials.

Page 14: 8. testing of hypothesis for variable & attribute  data

ANALYZE PHASE

HYPOTHESIS TESTING FOR CONTINUOUS DATA

Page 15: 8. testing of hypothesis for variable & attribute  data

PARAMETRIC STATISTICAL INFERENCE

Testing of Hypothesis

Comparing one group with a target
– One sample measured once: data distribution Normal (P ≥ 0.05): One-Sample t-Test
– One sample measured twice: data distribution Normal (P ≥ 0.05): Paired-Sample t-Test

Comparing two groups
– Data Normally distributed; check equality of variances (Levene's Test)
– Equal variances if P ≥ 0.05: Independent-Samples t-Test
– Unequal variances if P < 0.05: Independent-Samples t-Test (Welch approximation)

Comparing more than two groups
– Data distribution Normal (P ≥ 0.05); check equality of variances (Levene's Test)
– Equal variances if P ≥ 0.05: One-Way ANOVA Test
– Unequal variances if P < 0.05: Welch Test*

Decision Making

1. Data is Normal when P ≥ 0.05; use the Anderson-Darling test.

2. The variances of the groups are equal when P ≥ 0.05; use Levene's Test.

3. Accept (fail to reject) the null hypothesis when P ≥ 0.05; otherwise accept the alternative hypothesis.

* Not available in Minitab
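The roadmap is essentially a small decision tree. The sketch below is one hypothetical way to encode it; the function name `choose_parametric_test`, its parameters, and the returned labels are illustrative assumptions and are not part of the deck or of Minitab.

```python
def choose_parametric_test(n_groups, paired=False, normality_p=1.0,
                           equal_var_p=1.0, alpha=0.05):
    """Pick a test following the parametric roadmap.

    normality_p : p-value of a normality test (e.g. Anderson-Darling)
    equal_var_p : p-value of an equal-variance test (e.g. Levene's)
    """
    if normality_p < alpha:
        # Normality rejected: the parametric roadmap no longer applies.
        return "non-parametric test"
    if n_groups == 1:
        # One sample measured once vs. one sample measured twice.
        return "paired t-test" if paired else "1-sample t-test"
    if n_groups == 2:
        if equal_var_p >= alpha:
            return "2-sample t-test"
        return "2-sample t-test (Welch approximation)"
    # More than two groups.
    return "one-way ANOVA" if equal_var_p >= alpha else "Welch test"

choice = choose_parametric_test(2, normality_p=0.40, equal_var_p=0.996)
```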

Page 16: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): 1 Sample t

Measurements were made on nine widgets. You know that the distribution of widget measurements has historically been close to normal, but suppose that you do not know the population standard deviation. To test if the population mean is 5 and to obtain a 95% confidence interval for the mean, you use a t-procedure.

1. Open the worksheet EXH_STAT.MTW.

2. Check the normality of the column “Values” using the Normality Test.

3. Choose Stat > Basic Statistics > 1-Sample t.

4. In Samples in columns, enter Values.

5. Check Perform hypothesis test. In Hypothesized mean, enter 5.

6. Click Options. In Confidence level, enter 95. Click OK in each dialog box.

A 1-sample t-test is used to compare an expected population Mean (μsample) to a target.

Page 17: 8. testing of hypothesis for variable & attribute  data

1 Sample t: HISTOGRAM & BOX PLOT OF VALUES

[Figure: Individual Value Plot (dot plot) of Values, with Ho and 95% t-confidence interval for the mean]

[Figure: Histogram of Values (Frequency vs. Values), with Ho and 95% t-confidence interval for the mean]

[Figure: Boxplot of Values, with Ho and 95% t-confidence interval for the mean]

Note that our target Mean (represented by the red Ho) is outside the confidence interval for the population Mean, which tells us that there is a significant difference between the population Mean and the target Mean.

Page 18: 8. testing of hypothesis for variable & attribute  data

1 Sample t: SESSION WINDOW

One-Sample T: Values

Test of mu = 5 vs not = 5

Variable  N  Mean     StDev    SE Mean  95% CI              T      P
Values    9  4.78889  0.24721  0.08240  (4.59887, 4.97891)  -2.56  0.034

where SE Mean = $s / \sqrt{n}$ and $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Since the P-value of 0.034 is less than 0.05, reject the null hypothesis (Ho) in favor of Ha.

Based on the samples given, there is a difference between the average of the sample and the desired target (X̄ ≠ Ho).

CONCLUSIONS: The new supplier’s claim that they can meet the target of 5 for the hardness is not correct.
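The session-window figures can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library; the helper `t_two_sided_p` (numerical integration of the Student t density) and the tabulated critical value 2.306 for 8 degrees of freedom are assumptions introduced for illustration, not Minitab's own method.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    b = abs(t)
    h = b / steps                                  # steps must be even
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3                        # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)

# Summary statistics from the session window.
n, xbar, s, mu0 = 9, 4.78889, 0.24721, 5.0

se_mean = s / math.sqrt(n)                         # SE Mean = s / sqrt(n)
t_stat = (xbar - mu0) / se_mean                    # T
p = t_two_sided_p(t_stat, n - 1)                   # P, with n - 1 = 8 df

t_crit = 2.306                                     # tabulated t(0.975, df = 8)
ci = (xbar - t_crit * se_mean, xbar + t_crit * se_mean)
```

Because `p` is below 0.05 and the interval `ci` does not contain 5, both routes lead to the same conclusion as the slide: reject Ho.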

Page 19: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): 2-Sample (Independent) t Test

Practical Problem:

We have conducted a study in order to determine the effectiveness of a new heating system. We have installed two different types of dampers in homes (Damper = 1 and Damper = 2).

We want to compare the BTU.In data from the two types of dampers to determine if there is any difference between the two products.

Open the MINITAB™ worksheet: “Furnace.MTW”

Statistical Problem:

Ho:μ1 = μ2

Ha:μ1 ≠ μ2

2-Sample t-test (population Standard Deviations unknown).

α = 0.05

No, not that kind of damper!

Page 20: 8. testing of hypothesis for variable & attribute  data

2-Sample (Independent) t Test: Follow the Roadmap…

NORMALITY TEST

Page 21: 8. testing of hypothesis for variable & attribute  data

2-Sample (Independent) t Test:Follow the Roadmap…

TEST OF EQUAL VARIANCE

Stat > ANOVA > Test for Equal Variances…

[Figure: Test for Equal Variances for BTU.In. Upper panel: 95% Bonferroni confidence intervals for StDevs, by Damper (1, 2). Lower panel: boxplots of BTU.In by Damper (1, 2).]

F-Test: Test Statistic 1.19, P-Value 0.558

Levene's Test: Test Statistic 0.00, P-Value 0.996

Page 22: 8. testing of hypothesis for variable & attribute  data

2-Sample (Independent) t Test:Equal Variance

Page 23: 8. testing of hypothesis for variable & attribute  data

2-Sample (Independent) t Test: Equal Variance

[Figure: Boxplot of BTU.In by Damper (1, 2)]

State Statistical Conclusions: Fail to reject the null hypothesis.

State Practical Conclusions: There is no significant difference between the dampers in BTU.In.

Page 24: 8. testing of hypothesis for variable & attribute  data

2-Sample (Independent) t Test: EXERCISE

A bank with a branch located in a commercial district of a city has the business objective of developing an improved process for serving customers during the noon-to-1 P.M. lunch period. Management decides to first study the waiting time in the current process. The waiting time is defined as the time that elapses from when the customer enters the line until he or she reaches the teller window. Data are collected from a random sample of 15 customers, and the results (in minutes) are as follows (and stored in Bank-I):

4.21 5.55 3.02 5.13 4.77 2.34 3.54 3.20
4.50 6.10 0.38 5.12 6.46 6.19 3.79

Suppose that another branch, located in a residential area, is also concerned with improving the process of serving customers in the noon-to-1 P.M. lunch period. Data are collected from a random sample of 15 customers, and the results are as follows (and stored in Bank-II):

9.66 5.90 8.02 5.79 8.73 3.82 8.01 8.35
10.49 6.68 5.64 4.08 6.17 9.91 5.47

Is there evidence of a difference in the mean waiting time between the two branches? (Use level of significance = 0.05)
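For readers working outside Minitab, the two-sample calculation can be sketched in plain Python. This is a minimal illustration, not the deck's procedure: the helper names `t_two_sided_p` and `two_sample_t` are hypothetical, the p-value comes from numerical integration of the t density, and the normality and equal-variance checks from the roadmap are assumed to have been done separately.

```python
import math
from statistics import mean, variance

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    b = abs(t)
    h = b / steps
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

def two_sample_t(x, y, equal_var=True):
    """Independent two-sample t-test: pooled (equal_var) or Welch approximation."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)              # sample variances (n - 1)
    if equal_var:
        df = nx + ny - 2
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / df
        se = math.sqrt(pooled * (1 / nx + 1 / ny))
    else:
        se = math.sqrt(vx / nx + vy / ny)
        # Welch-Satterthwaite degrees of freedom
        df = (vx / nx + vy / ny) ** 2 / (
            (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    t = (mean(x) - mean(y)) / se
    return t, df, t_two_sided_p(t, df)

bank_1 = [4.21, 5.55, 3.02, 5.13, 4.77, 2.34, 3.54, 3.20,
          4.50, 6.10, 0.38, 5.12, 6.46, 6.19, 3.79]
bank_2 = [9.66, 5.90, 8.02, 5.79, 8.73, 3.82, 8.01, 8.35,
          10.49, 6.68, 5.64, 4.08, 6.17, 9.91, 5.47]

t_pooled, df_pooled, p_pooled = two_sample_t(bank_1, bank_2, equal_var=True)
t_welch, df_welch, p_welch = two_sample_t(bank_1, bank_2, equal_var=False)
```

Because both samples have 15 observations, the pooled and Welch t statistics coincide; only the degrees of freedom differ.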

Page 25: 8. testing of hypothesis for variable & attribute  data

PARAMETRIC STATISTICAL INFERENCE


Page 26: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST

A Paired t-test is used to compare the Means of two measurements from the same samples, generally used as a before-and-after test.

MINITAB™ performs a paired t-test. This is appropriate for testing the difference between two Means when the data are paired and the paired differences follow a Normal Distribution.

Use the Paired ‘t’ command to compute a confidence interval and perform a Hypothesis Test of the difference between population Means when observations are paired. A paired t-procedure matches responses that are dependent or related in a pair-wise manner. This matching allows you to account for variability between the pairs, usually resulting in a smaller error term, thus increasing the sensitivity of the Hypothesis Test or confidence interval.

– Ho: μδ = μo

– Ha: μδ ≠ μo

Where μδ is the population Mean of the differences and μo is the hypothesized Mean of the differences, typically zero.

Stat > Basic Statistics > Paired t

[Diagram: μbefore and μafter, with the difference delta (δ) between them]

Page 27: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST

Practical Problem:
• We are interested in changing the sole material for a popular brand of shoes for children.
• In order to account for variation in activity of children wearing the shoes, each child will wear one shoe of each type of sole material. The sole material will be randomly assigned to either the left or right shoe.

Statistical Problem:
Ho: μδ = 0
Ha: μδ ≠ 0

Paired t-test (comparing data that must remain paired).
α = 0.05

Just checking your souls, er…soles!

Worksheet: EXH_STAT.MTW

Page 28: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST

NORMALITY TEST: “Delta”

Calc > Calculator

[Figure: Normal probability plot of AB Delta (Percent vs. AB Delta). Mean 0.41, StDev 0.3872, N 10, AD 0.261, P-Value 0.622]

Page 29: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST Using 1-Sample t

Stat > Basic Statistics > 1-Sample t-test…

Since there is only one column, AB Delta, we do not test for equal variance per the Hypothesis Testing roadmap.

Check this data for statistical significance in its departure from our expected value of zero.

Page 30: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST Using 1-Sample t…

MINITAB™ Session Window

One-Sample T: AB Delta

Test of mu = 0 vs not = 0

Variable  N   Mean      StDev     SE Mean   95% CI                T     P
AB Delta  10  0.410000  0.387155  0.122429  (0.133046, 0.686954)  3.35  0.009

[Figure: Box Plot of AB Delta]

State Statistical Conclusions: Reject the null hypothesis.

State Practical Conclusions: We are 95% confident that there is a difference in wear between the two materials.

Page 31: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST

Another way to analyze this data is to use the paired t-test command.

Stat > Basic Statistics > Paired T-test

Click on Graphs and select the graphs you would like to generate.

Page 32: 8. testing of hypothesis for variable & attribute  data

TEST OF MEANS (t-tests): PAIRED T-TEST

Paired T-Test and CI: Mat-A, Mat-B

Paired T for Mat-A - Mat-B

            N   Mean       StDev     SE Mean
Mat-A       10  10.6300    2.4513    0.7752
Mat-B       10  11.0400    2.5185    0.7964
Difference  10  -0.410000  0.387155  0.122429

95% CI for mean difference: (-0.686954, -0.133046)

T-Test of mean difference = 0 (vs not = 0): T-Value = -3.35 P-Value = 0.009

[Figure: Boxplot of Differences, with Ho and 95% t-confidence interval for the mean]

The P-value of 0.009 from this Paired T-Test tells us the difference in materials is statistically significant.

Page 33: 8. testing of hypothesis for variable & attribute  data

EXERCISE: PAIRED T-TEST

Nine experts rated two brands of Colombian coffee in a taste-testing experiment. A rating on a 7-point scale (1 = extremely unpleasing, 7 = extremely pleasing) is given for each of four characteristics: taste, aroma, richness, and acidity. The following data (stored in coffee) display the ratings accumulated over all four characteristics.

Expert  Brand A  Brand B
C.C.    24       26
S.E.    27       27
E.G.    19       22
B.L.    24       27
C.M.    22       25
C.N.    26       27
G.N.    27       26
R.M.    25       27
P.V.    22       23

At the 0.05 level of significance, is there evidence of a difference in the mean ratings between the two brands?
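A paired t-test is a 1-sample t-test on the within-pair differences, which can be sketched without Minitab as follows. The helper names `t_two_sided_p` and `paired_t` are hypothetical, the p-value is obtained by numerically integrating the t density, and the normality check on the differences is assumed to have been carried out separately.

```python
import math
from statistics import mean, stdev

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    b = abs(t)
    h = b / steps
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

def paired_t(x, y, mu0=0.0):
    """Paired t-test of H0: mean(x - y) = mu0 against a two-sided alternative."""
    d = [a - b for a, b in zip(x, y)]              # within-pair differences
    n = len(d)
    se = stdev(d) / math.sqrt(n)                   # standard error of the mean difference
    t = (mean(d) - mu0) / se
    return t, n - 1, t_two_sided_p(t, n - 1)

# Ratings from the exercise, in expert order C.C. ... P.V.
brand_a = [24, 27, 19, 24, 22, 26, 27, 25, 22]
brand_b = [26, 27, 22, 27, 25, 27, 26, 27, 23]

t_coffee, df_coffee, p_coffee = paired_t(brand_a, brand_b)
```

The sign of `t_coffee` simply reflects the order of subtraction (Brand A minus Brand B).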

Page 34: 8. testing of hypothesis for variable & attribute  data

TEST FOR MORE THAN TWO MEANS (F – Test): ANOVA

PURPOSE OF ANOVA

Analysis of Variance (ANOVA) is used to investigate and model the relationship between a response variable and one or more independent variables.

Analysis of variance extends the two-sample t-test for testing the equality of two population Means to a more general null hypothesis of comparing the equality of more than two Means, versus them not all being equal.

– The classification variable, or factor, usually has three or more levels (if there are only two levels, a t-test can be used).

– Allows you to examine differences among means using multiple comparisons.

Page 35: 8. testing of hypothesis for variable & attribute  data

ANOVA: EXAMPLE

We have three potential suppliers that claim to have equal levels of quality. Supplier B provides a considerably lower purchase price than either of the other two vendors. We would like to choose the lowest cost supplier but we must ensure that we do not affect the quality of our raw material.

Supplier A  Supplier B  Supplier C
3.16        4.24        4.58
4.35        3.87        4.00
3.46        3.87        4.24
3.74        4.12        3.87
3.61        3.74        3.46

We would like to test the data to determine whether there is a difference between the three suppliers.

Page 36: 8. testing of hypothesis for variable & attribute  data

FOLLOW THE ROADMAP…TEST FOR NORMALITY

[Figure: Normal probability plots (Percent vs. measured value) for Supplier A, Supplier B and Supplier C]

Supplier A: Mean 3.664, StDev 0.4401, N 5, AD 0.246, P-value = 0.568

Supplier B: Mean 3.968, StDev 0.2051, N 5, AD 0.314, P-value = 0.385

Supplier C: Mean 4.03, StDev 0.4177, N 5, AD 0.148, P-value = 0.910

All three suppliers’ samples are Normally Distributed.

Page 37: 8. testing of hypothesis for variable & attribute  data

TEST FOR EQUAL VARIANCE

STACK DATA FIRST:

Data > Stack > Columns…

Page 38: 8. testing of hypothesis for variable & attribute  data

TEST FOR EQUAL VARIANCE…


Page 39: 8. testing of hypothesis for variable & attribute  data

ANOVA Using Minitab

Click on “Graphs…” and check “Boxplots of data”.

[Figure: Boxplot of Supplier A, Supplier B, Supplier C (Data vs. supplier)]

Page 40: 8. testing of hypothesis for variable & attribute  data

ANOVA: Session window

Test for Equal Variances: Suppliers vs ID

One-way ANOVA: Suppliers versus ID

Analysis of Variance for Supplier

Source  DF  SS     MS     F     P
ID       2  0.384  0.192  1.40  0.284
Error   12  1.641  0.137
Total   14  2.025

Individual 95% CIs for Mean, Based on Pooled StDev

Level       N  Mean    StDev
Supplier A  5  3.6640  0.4401
Supplier B  5  3.9680  0.2051
Supplier C  5  4.0300  0.4177

Pooled StDev = 0.3698

Normal data, P-value > .05: No Difference
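The sums of squares, F statistic and P-value in the session window can be reproduced from the raw supplier data. The sketch below is a plain-Python illustration with hypothetical helper names; note that `f_tail_df1_2` relies on a closed form for the F tail probability that holds only when the numerator has 2 degrees of freedom (exactly three groups), so it is not a general F-distribution routine.

```python
from statistics import mean

suppliers = {
    "A": [3.16, 4.35, 3.46, 3.74, 3.61],
    "B": [4.24, 3.87, 3.87, 4.12, 3.74],
    "C": [4.58, 4.00, 4.24, 3.87, 3.46],
}

def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f

def f_tail_df1_2(f, df_within):
    """P(F > f) for numerator df = 2 only: (1 + 2f/df2) ** (-df2/2)."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(suppliers.values()))
p_anova = f_tail_df1_2(f_stat, df_w)       # valid here because df_b == 2
```

With `p_anova` above 0.05, the data give no evidence that the supplier means differ, which agrees with the "No Difference" note in the session window.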

Page 41: 8. testing of hypothesis for variable & attribute  data

ANOVA Assumptions

In one-way ANOVA, model adequacy can be checked by either of the following:

1. Check the data for Normality at each level and for homogeneity of variance across all levels.

2. Examine the residuals (a residual is the difference between what the model predicts and the true observation).

i. Normal plot of the residuals

ii. Residuals versus fits

iii. Residuals versus order

Page 42: 8. testing of hypothesis for variable & attribute  data

ANOVA Assumptions

[Figure: Histogram of the Residuals (responses are Supplier A, Supplier B, Supplier C)]

The histogram of residuals should show a bell-shaped curve.

[Figure: Normal Probability Plot of the Residuals (responses are Supplier A, Supplier B, Supplier C)]

The Normality plot of the residuals should follow a straight line.

Results of our example look good. The Normality assumption is satisfied.

Page 43: 8. testing of hypothesis for variable & attribute  data

ANOVA Assumptions

[Figure: Residuals Versus the Fitted Values (responses are Supplier A, Supplier B, Supplier C)]

The plot of residuals versus fits examines constant variance.

The plot should be structureless with no outliers present.

Our example does not indicate a problem.

Page 44: 8. testing of hypothesis for variable & attribute  data

ANOVA EXERCISE

EXERCISE OBJECTIVE: Utilize what you have learned to conduct and analyze a one-way ANOVA using MINITAB™.

You design an experiment to assess the durability of four experimental carpet products. You place a sample of each of the carpet products in four homes and you measure durability after 60 days. Because you wish to test the equality of means and to assess the differences in means, you use the one-way ANOVA procedure (data in stacked form) with multiple comparisons. Generally, you would choose one multiple comparison method as appropriate for your data.

1. Open the worksheet EXH_AOV.MTW.

2. Choose Stat > ANOVA > One-Way.

3. In Response, enter Durability. In Factor, enter Carpet.

4. Click OK in each dialog box.

Page 45: 8. testing of hypothesis for variable & attribute  data

NON–PARAMETRIC STATISTICAL INFERENCE

Testing of Hypothesis

Comparing one group with a target
– One sample measured once: data distribution non-normal (P < 0.05): Sign Test
– One sample measured twice: data distribution non-normal (P < 0.05): Wilcoxon Signed Rank Test

Comparing two groups
– Data distribution non-normal (P < 0.05): Mann-Whitney (U) Test

Comparing more than two groups
– Data distribution non-normal (P < 0.05): Kruskal-Wallis (H) Test OR Mood’s Median Test

Decision Making

1. Data is Normal when P ≥ 0.05; use the Anderson-Darling test.

2. Accept (fail to reject) the null hypothesis when P ≥ 0.05; otherwise accept the alternative hypothesis.

Page 46: 8. testing of hypothesis for variable & attribute  data

SIGN TEST: EXAMPLE

Price index values for 29 homes in a suburban area in the Northeast were determined. Real estate records indicate the population median for similar homes the previous year was 115. This test will determine if there is sufficient evidence for judging if the median price index for the homes was greater than 115 using level of significance = 0.10.

1. Open the worksheet EXH_STAT.MTW.

2. Check the normality of the variable “PriceIndex”.

3. Choose Stat > Nonparametrics > 1-Sample Sign.

4. In Variables, enter PriceIndex.

5. Choose Test median and enter 115 in the text box.

6. In Alternative, choose greater than. Click OK.
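The logic behind a right-tailed sign test can be sketched with a binomial calculation: under H0 each observation falls above the hypothesized median with probability 0.5. The function name `sign_test_greater` is hypothetical, and the counts used below (17 above, 12 below) are purely illustrative assumptions because the PriceIndex values themselves are not reproduced in these slides.

```python
from math import comb

def sign_test_greater(n_above, n_below):
    """Right-tailed sign test p-value.

    Observations equal to the hypothesized median are assumed to have been
    dropped already; under H0 the number above is Binomial(n, 0.5).
    """
    n = n_above + n_below
    # P(X >= n_above) for X ~ Binomial(n, 0.5)
    return sum(comb(n, k) for k in range(n_above, n + 1)) / 2 ** n

# Illustrative counts only (not read from EXH_STAT.MTW).
p_sign = sign_test_greater(n_above=17, n_below=12)
reject_h0 = p_sign < 0.10                  # level of significance from the example
```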

Page 47: 8. testing of hypothesis for variable & attribute  data

NON–PARAMETRIC STATISTICAL INFERENCE


Page 48: 8. testing of hypothesis for variable & attribute  data

WILCOXON SIGNED RANK TEST: EXAMPLE

The following dataset consists of cholesterol levels in patients two and four days after a heart attack. Note: these data are paired because the two measurements have been made on the same individuals.

2 days after  4 days after  Difference
270           218           52
236           234           2
210           214           -4
142           116           26
280           200           80
272           276           -4
160           146           14
245           236           9
257           225           32
178           180           -2

Choose Stat > Nonparametrics > 1-Sample Wilcoxon…
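To show what the signed rank procedure does with these differences, the sketch below ranks the absolute differences (tied values share the average rank) and sums the ranks by sign. The function names are hypothetical, and the p-value uses a normal approximation with a continuity correction and no tie correction, which is only a rough guide for a sample of 10; software or exact tables may report a slightly different value.

```python
import math
from statistics import NormalDist

after_2_days = [270, 236, 210, 142, 280, 272, 160, 245, 257, 178]
after_4_days = [218, 234, 214, 116, 200, 276, 146, 236, 225, 180]

def signed_rank_sums(x, y):
    """Return (W+, W-, n) for paired samples; zero differences are dropped."""
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(abs(v) for v in d)
    n = len(order)
    # Average rank of each absolute value: mean of its first and last position.
    rank = {v: (order.index(v) + 1 + n - order[::-1].index(v)) / 2
            for v in set(order)}
    w_plus = sum(rank[abs(v)] for v in d if v > 0)
    w_minus = sum(rank[abs(v)] for v in d if v < 0)
    return w_plus, w_minus, n

def approx_two_sided_p(w_plus, n):
    """Normal approximation to the signed rank p-value (continuity-corrected)."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (abs(w_plus - mu) - 0.5) / sigma
    return 2 * (1 - NormalDist().cdf(z))

w_plus, w_minus, n_pairs = signed_rank_sums(after_2_days, after_4_days)
p_wilcoxon = approx_two_sided_p(w_plus, n_pairs)
```

The two rank sums always add up to n(n + 1)/2, which is a quick check on the ranking step.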

Page 49: 8. testing of hypothesis for variable & attribute  data

NON–PARAMETRIC STATISTICAL INFERENCE


Page 50: 8. testing of hypothesis for variable & attribute  data

MANN-WHITNEY TEST: EXAMPLE

Samples were drawn from two populations and diastolic blood pressure was measured. You will want to determine if there is evidence of a difference in the population locations without assuming a parametric model for the distributions. Therefore, you choose to test the equality of population medians using the Mann-Whitney test with level of significance = 0.05 rather than using a two-sample t-test, which tests the equality of population means.

1. Open the worksheet EXH_STAT.MTW.
2. Choose Stat > Nonparametrics > Mann-Whitney.
3. In First Sample, enter DBP1. In Second Sample, enter DBP2. Click OK.

Interpreting the results

The sample medians of the ordered data are 69.5 and 78. The 95.1% confidence interval for the difference in population medians (ETA1-ETA2) is [-18 to 4].

The test statistic W = 60 has a p-value of 0.2685, or 0.2679 when adjusted for ties. Since the p-value is not less than the chosen α level of 0.05, you conclude that there is insufficient evidence to reject H0. Therefore, the data do not support the hypothesis that there is a difference between the population medians.

Page 51: 8. testing of hypothesis for variable & attribute  data

NON–PARAMETRIC STATISTICAL INFERENCE


Page 52: 8. testing of hypothesis for variable & attribute  data

KRUSKAL-WALLIS TEST: EXAMPLE

Measurements in growth were made on samples that were each given one of three treatments. Rather than assuming a data distribution and testing the equality of population means with one-way ANOVA, you decide to select the Kruskal-Wallis procedure to test H0: η1 = η2 = η3, versus H1: not all η's are equal, where the η's are the population medians.

1. Open the worksheet EXH_STAT.MTW.
2. Choose Stat > Nonparametrics > Kruskal-Wallis.
3. In Response, enter Growth.
4. In Factor, enter Treatment. Click OK.

Page 53: 8. testing of hypothesis for variable & attribute  data

MOOD'S MEDIAN TEST: EXAMPLE

One hundred seventy-nine participants were given a lecture with cartoons to illustrate the subject matter. Subsequently, they were given the OTIS test, which measures general intellectual ability. Participants were rated by educational level: 0 = preprofessional, 1 = professional, 2 = college student. The Mood's median test was selected to test H0: η1 = η2 = η3, versus H1: not all η's are equal, where the η's are the median population OTIS scores for the three education levels.

1. Open the worksheet CARTOON.MTW.

2. Choose Stat > Nonparametrics > Mood's Median Test.

3. In Response, enter Otis. In Factor, enter ED. Click OK.

Page 54: 8. testing of hypothesis for variable & attribute  data

HYPOTHESIS TESTING ROADMAP ATTRIBUTE DATA

Attribute Data

One Factor

– One Sample: One Sample Proportion

– Two Samples: Two Sample Proportion
   MINITAB™: Stat - Basic Stats - 2 Proportions
   If P-value < 0.05 the proportions are different

– Two or More Samples: Chi Square Test (Contingency Table)
   MINITAB™: Stat - Tables - Chi-Square Test
   If P-value < 0.05 at least one proportion is different

Two Factors

– Chi Square Test (Contingency Table)
   MINITAB™: Stat - Tables - Chi-Square Test
   If P-value < 0.05 the factors are not independent

Page 55: 8. testing of hypothesis for variable & attribute  data

Test for association (or dependency) between two classifications (Chi–Square Test) “TWO FACTORS”: Contingency Tables

Exercise objective: To practice solving the problem presented using the appropriate Hypothesis Test.

You are the quotations manager and your team thinks that the reason you don’t get a contract depends on its complexity.

You determine a way to measure complexity and classify lost contracts as follows:

            Low  Med  High
Price       8    10   12
Lead Time   10   11   9
Technology  5    9    16

1. Write the null and alternative hypothesis.

2. Does complexity have an effect?
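The chi-square statistic for this table can be worked by hand or with a short script. The sketch below is a plain-Python illustration with hypothetical function names; `chi2_tail_even_df` uses a closed-form tail probability that is valid only for an even number of degrees of freedom (here (3 − 1)(3 − 1) = 4), so it is not a general chi-square routine.

```python
import math

# Lost contracts: rows = reason (Price, Lead Time, Technology),
# columns = complexity (Low, Med, High).
observed = [
    [8, 10, 12],
    [10, 11, 9],
    [5, 9, 16],
]

def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for obs, c in zip(row, col_totals):
            expected = r * c / total               # expected count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_tail_even_df(x, df):
    """P(chi-square > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("closed form used here requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

chi2_stat, chi2_df = chi_square_independence(observed)
p_chi2 = chi2_tail_even_df(chi2_stat, chi2_df)
```

A `p_chi2` above 0.05 means the hypothesis of independence between reason and complexity is not rejected.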

Page 56: 8. testing of hypothesis for variable & attribute  data

Test for association (or dependency) between two classifications (Chi–Square Test) “TWO FACTORS”: Contingency Tables

First, we need to create a table in MINITAB™.

Secondly, in MINITAB™, perform a Chi-Square Test.

Page 57: 8. testing of hypothesis for variable & attribute  data

Test for association (or dependency) between two classifications (Chi–Square Test) “TWO FACTORS”: Contingency Tables

Are the factors independent of each other?

Yes; both factors are independent.

Page 58: 8. testing of hypothesis for variable & attribute  data

QUESTIONS