analysis of variance
Analysis of Variance
Introduction
Analysis of Variance
The Analysis of Variance is abbreviated as ANOVA
Used for hypothesis testing in
  Simple Regression
  Multiple Regression
  Comparison of Means
Sources
There is variation any time the data values are not all identical
This variation can come from different sources, such as the model or the factor
There is always left-over variation that can’t be explained by any of the other sources. This source is called the error
Variation
Variation is the sum of squares of the deviations of the values from the mean of those values
As long as the values are not identical, there will be variation
Abbreviated as SS for Sum of Squares
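
The definition of variation can be sketched in a few lines of Python. The function name `sum_of_squares` and the sample data are illustrative assumptions, not part of the original slides.

```python
def sum_of_squares(values):
    """Variation (SS): sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# Hypothetical data: the mean is 8, so the deviations are -4, -1, 1, 4
sample = [4, 7, 9, 12]
ss = sum_of_squares(sample)
print("SS =", ss)

# Identical values have no variation
print("SS of identical values =", sum_of_squares([5, 5, 5]))
```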
Degrees of Freedom
The degrees of freedom are the number of values that are free to vary once certain parameters have been established
Usually, this is one less than the sample size, but in general, it’s the number of values minus the number of parameters being estimated
Abbreviated as df
Variance
The sample variance is the average squared deviation from the mean
Found by dividing the variation by the degrees of freedom
  Variance = Variation / df
Abbreviated as MS for Mean of the Squares
  MS = SS / df
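
As a quick check of MS = SS / df, the sketch below computes the variance by hand and compares it with Python's standard-library `statistics.variance`, which also divides by n - 1. The data values are made up for illustration.

```python
import math
import statistics

values = [4.0, 7.0, 9.0, 12.0]   # hypothetical sample

mean = sum(values) / len(values)
ss = sum((x - mean) ** 2 for x in values)   # variation (SS)
df = len(values) - 1                        # one parameter (the mean) was estimated
ms = ss / df                                # variance (MS)

print(f"SS = {ss}, df = {df}, MS = {ms:.4f}")

# The hand computation agrees with the library's sample variance
print(math.isclose(ms, statistics.variance(values)))
```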
F
F is the F test statistic
There will be an F test statistic for each source except for the error and total
F is the ratio of two sample variances
  The MS column contains variances
  The F test statistic for each source is the MS for that row divided by the MS of the error row
F
F requires a pair of degrees of freedom, one for the numerator and one for the denominator
  The numerator df is the df for the source
  The denominator df is the df for the error row
F is always a right tail test
The ANOVA Table
The ANOVA table is composed of rows; each row represents one source of variation
For each source of variation …
  The variation is in the SS column
  The degrees of freedom are in the df column
  The variance is in the MS column
  The MS value is found by dividing the SS by the df
ANOVA Table
The complete ANOVA table can be generated by most statistical packages and spreadsheets
We’ll concentrate on understanding how the table works rather than the formulas for the variations
The ANOVA Table
Source       SS (variation)   df   MS (variance)   F
Explained*
Error
Total
The explained* variation has different names depending on the particular type of ANOVA problem
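
For reference, the relationships that tie the cells of the table together can be written compactly. The subscript notation is an assumption introduced here for readability; the slides themselves use only the row and column names.

```latex
\begin{aligned}
SS_{\text{Total}} &= SS_{\text{Explained}} + SS_{\text{Error}} \\
df_{\text{Total}} &= df_{\text{Explained}} + df_{\text{Error}} \\
MS &= \frac{SS}{df} \quad \text{(within each row)} \\
F &= \frac{MS_{\text{Explained}}}{MS_{\text{Error}}}
\end{aligned}
```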
Example 1
Source      SS     df   MS   F
Explained   18.9    3   ?    ?
Error       72.0   16   ?
Total       ?       ?   ?
The Sum of Squares and Degrees of Freedom are given. Complete the table.
Example 1 – Find Totals
Source      SS     df   MS   F
Explained   18.9    3   ?    ?
Error       72.0   16   ?
Total       90.9   19   ?
Add the SS and df columns to get the totals.
Example 1 – Find MS
Source      SS        df      MS     F
Explained   18.9  ÷    3  =   6.30   ?
Error       72.0  ÷   16  =   4.50
Total       90.9  ÷   19  =   4.78
Divide SS by df to get MS.
Example 1 – Find F
Source      SS     df   MS     F
Explained   18.9    3   6.30   1.40
Error       72.0   16   4.50
Total       90.9   19   4.78
F = 6.30 / 4.50 = 1.4
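
The three steps of Example 1 (totals, then MS, then F) can be reproduced with a short script. This is a minimal sketch; the function name `complete_anova_table` and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
def complete_anova_table(ss_explained, df_explained, ss_error, df_error):
    """Fill in the totals, the MS column, and F from the given SS and df."""
    # Step 1: add the SS and df columns to get the totals
    ss_total = ss_explained + ss_error
    df_total = df_explained + df_error
    # Step 2: divide SS by df to get MS in every row
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    ms_total = ss_total / df_total
    # Step 3: F is MS(Explained) divided by MS(Error)
    f_stat = ms_explained / ms_error
    return {
        "Explained": {"SS": ss_explained, "df": df_explained, "MS": ms_explained, "F": f_stat},
        "Error": {"SS": ss_error, "df": df_error, "MS": ms_error},
        "Total": {"SS": ss_total, "df": df_total, "MS": ms_total},
    }

table = complete_anova_table(18.9, 3, 72.0, 16)
for source, row in table.items():
    print(source, {name: round(value, 2) for name, value in row.items()})
```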
Notes about the ANOVA
The MS(Total) isn’t actually part of the ANOVA table, but it represents the sample variance of the response variable, so it’s useful to find
The total df is one less than the sample size
You would need to find either a critical F value or the p-value to finish the hypothesis test
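
Statistical packages report the p-value directly. As a rough illustration of what they compute, the sketch below approximates the right-tail area of the F distribution using only the standard library. It relies on the identity that this area equals the regularized incomplete beta function I_z(df_den/2, df_num/2) with z = df_den / (df_den + df_num × F), evaluated here by Simpson's rule. The function name `f_right_tail_p` is hypothetical, and the sketch assumes F > 0 and a denominator df of at least 2.

```python
import math

def f_right_tail_p(f_stat, df_num, df_den, steps=2000):
    """Approximate P(F >= f_stat) for an F(df_num, df_den) distribution.

    Sketch only: assumes f_stat > 0 and df_den >= 2 so the integrand is finite.
    """
    a = df_den / 2.0
    b = df_num / 2.0
    z = df_den / (df_den + df_num * f_stat)   # upper limit, strictly below 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def beta_density(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0) * math.exp(-log_beta)

    # Composite Simpson's rule on [0, z]; steps must be even
    h = z / steps
    total = beta_density(0.0) + beta_density(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_density(i * h)
    return total * h / 3.0

# Example 1: F = 1.40 with 3 numerator df and 16 denominator df
p_value = f_right_tail_p(1.40, 3, 16)
print(f"p-value = {p_value:.3f}")
```

The null hypothesis is rejected when the p-value is smaller than the chosen significance level. In practice a statistics library is the better tool; this version only shows where the right-tail area comes from.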
Example 2
Source      SS      df   MS      F
Explained   106.6    ?   21.32   2.60
Error       ?       26   ?
Total       ?        ?   ?
Complete the table
Example 2 – Step 1
Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       ?       26    8.20
Total       ?        ?   ?
SS / df = MS, so 106.6 / df = 21.32. Solving for df gives df = 5.
F = MS(Source) / MS(Error), so 2.60 = 21.32 / MS. Solving gives MS = 8.20.
Example 2 – Step 2
Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       ?       31   ?
SS / df = MS, so SS / 26 = 8.20. Solving for SS gives SS = 213.2.
The total df is the sum of the other df, so 5 + 26 = 31.
Example 2 – Step 3
Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31   ?
Find the total SS by adding the explained and error SS: 106.6 + 213.2 = 319.8.
Example 2 – Step 4
Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31   10.32
Find the MS(Total) by dividing SS by df. 319.8 / 31 = 10.32
Example 2 – Notes
Since there are 31 total df, the sample size was 32
Since the sample variance was 10.32 and the standard deviation is the square root of the variance, the sample standard deviation is 3.21
Example 3
Source      SS      df   MS      F
Explained   56.7     ?   ?       ?
Error       ?       14   13.50
Total       ?        ?   ?
The sample size is n = 20. Work this one out on your own!
Example 3 - Solution
Source      SS      df   MS      F
Explained    56.7    5   11.34   0.84
Error       189.0   14   13.50
Total       245.7   19   12.93
How did you do?
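
One way to check your own answer is to test a completed table against the relationships used throughout these examples. The checker below is a sketch; the function name `check_anova_table` and the rounding tolerance of 0.01 are assumptions made for illustration.

```python
def check_anova_table(explained, error, total, n=None, tol=0.01):
    """Return a list of relationships that a completed ANOVA table violates."""
    problems = []
    if abs(explained["SS"] + error["SS"] - total["SS"]) > tol:
        problems.append("SS(Explained) + SS(Error) != SS(Total)")
    if explained["df"] + error["df"] != total["df"]:
        problems.append("df(Explained) + df(Error) != df(Total)")
    for name, row in (("Explained", explained), ("Error", error), ("Total", total)):
        if abs(row["SS"] / row["df"] - row["MS"]) > tol:
            problems.append(f"MS({name}) != SS / df")
    if abs(explained["MS"] / error["MS"] - explained["F"]) > tol:
        problems.append("F != MS(Explained) / MS(Error)")
    if n is not None and total["df"] != n - 1:
        problems.append("df(Total) != n - 1")
    return problems

# The Example 3 solution, with sample size n = 20
problems = check_anova_table(
    explained={"SS": 56.7, "df": 5, "MS": 11.34, "F": 0.84},
    error={"SS": 189.0, "df": 14, "MS": 13.50},
    total={"SS": 245.7, "df": 19, "MS": 12.93},
    n=20,
)
print("All relationships hold" if not problems else problems)
```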