g7-quantitative

QUANTITATIVE DATA ANALYSIS. INSTRUCTOR: DR. TUNG NGUYEN. GROUP 7 MEMBERS: Ly Ngoc Tra An, Ngo Huong Giang, Tran Nhu Hanh, Tran Thi My Hanh, Nguyen Thi Hong Tham, Nguyen Thi Thao Tien

Upload: gilliannguyen

Post on 28-Nov-2014


Category:

Education




TRANSCRIPT

Page 1: G7-quantitative

QUANTITATIVE DATA ANALYSIS

INSTRUCTOR: DR. TUNG NGUYEN

GROUP 7 MEMBERS:

Ly Ngoc Tra An
Ngo Huong Giang
Tran Nhu Hanh
Tran Thi My Hanh
Nguyen Thi Hong Tham
Nguyen Thi Thao Tien

Page 2: G7-quantitative

OUTLINE

Data analysis

• Central Tendency: Mean, Median, Mode

• Spread of distribution: Range, Variance, Standard Deviation

Experimental

• Paired T-Test

• ANOVA

Page 3: G7-quantitative

CENTRAL TENDENCY

The term central tendency refers to the "middle" value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation.

In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around some value.

In the simplest cases, the measure of central tendency is an average of a set of measurements, the word average being variously construed as mean, median, or other measure of location, depending on the context.

Both "central tendency" and "measure of central tendency" apply to either statistical populations or to samples from a population.

Page 4: G7-quantitative

MEASURES OF CENTRAL TENDENCY

Arithmetic mean (or simply, mean): the sum of all measurements divided by the number of observations in the data set

The mean is the most commonly-used measure of central tendency. When we talk about an "average", we usually are referring to the mean. The mean is simply the sum of the values divided by the total number of items in the set. The result is referred to as the arithmetic mean. Sometimes it is useful to give more weighting to certain data points, in which case the result is called the weighted arithmetic mean.

The mean is valid only for interval data or ratio data. Since it uses the values of all of the data points in the population or sample, the mean is influenced by outliers that may be at the extremes of the data set.

Page 5: G7-quantitative

MEDIAN: THE MIDDLE VALUE THAT SEPARATES THE HIGHER HALF FROM THE LOWER HALF OF THE DATA SET

The median is determined by sorting the data set from lowest to highest values and taking the data point in the middle of the sequence. There is an equal number of points above and below the median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points greater than this value and two data points less than this value. In this case, the median is equal to the mean. But consider the data set {1,2,3,4,10}. In this data set, the median is still 3, but the mean is equal to 4. If there is an even number of data points in the set, then there is no single point at the middle and the median is calculated by taking the mean of the two middle points.

The median can be determined for ordinal data as well as interval and ratio data. Unlike the mean, the median is not influenced by outliers at the extremes of the data set. For this reason, the median often is used when there are a few extreme values that could greatly influence the mean and distort what might be considered typical. This often is the case with home prices and with income data for a group of people, which often is very skewed. For such data, the median often is reported instead of the mean. For example, in a group of people, if the salary of one person is 10 times the mean, the mean salary of the group will be higher because of the unusually large salary. In this case, the median may better represent the typical salary level of the group.

Page 6: G7-quantitative

MODE (STATISTICS): THE MOST FREQUENT VALUE IN THE DATA SET

The mode is the most frequently occurring value in the data set. For example, in the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich. The mode also can be used with ordinal, interval, and ratio data. However, in interval and ratio scales, the data may be spread thinly with no data points having the same value. In such cases, the mode may not exist or may not be very meaningful.
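The small data sets quoted on the mean, median, and mode slides can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are ours, and the four-value list used for the even-sized case is a made-up example, not data from the slides.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 10]      # same data with one extreme value

# Mean and median agree for the symmetric set...
mean_symmetric = statistics.mean(symmetric)
median_symmetric = statistics.median(symmetric)

# ...but the outlier pulls the mean up while the median stays put.
mean_skewed = statistics.mean(skewed)
median_skewed = statistics.median(skewed)

# Even number of points (hypothetical example): the median is the
# mean of the two middle values.
median_even = statistics.median([1, 2, 3, 4])

# Mode: the most frequent value; multimode lists every mode.
single_mode = statistics.mode([1, 2, 3, 4, 4])
all_modes = statistics.multimode([1, 1, 2, 3, 3])

print(mean_symmetric, median_symmetric)
print(mean_skewed, median_skewed)
print(median_even, single_mode, all_modes)
```

`statistics.multimode` is useful for the multimodal case because `statistics.mode` reports only one value.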

Page 7: G7-quantitative

WHEN TO USE MEAN, MEDIAN, AND MODE

Measurement Scale | Best Measure of the "Middle"
Nominal (Categorical) | Mode
Ordinal | Median
Interval | Symmetrical data: Mean; Skewed data: Median
Ratio | Symmetrical data: Mean; Skewed data: Median

Page 8: G7-quantitative

A RANGE, A VARIANCE, AND A STANDARD DEVIATION

RANGE

The range indicates the distance between the two most extreme scores in a distribution

>>> Range = highest score – lowest score

Page 9: G7-quantitative

VARIANCE AND STANDARD DEVIATION

• The variance and standard deviation are two measures of variability that indicate how much the scores are spread out around the mean

• We use the mean as our reference point since it is at the center of the distribution

Page 10: G7-quantitative

Variance = the average squared distance of the numbers from the mean (how spread out the numbers are around the mean)

Standard Deviation = loosely defined as the average amount a number differs from the mean

Page 11: G7-quantitative

We will use the following sample data set to explain the range, variance, and standard deviation:

4, 6, 3, 7, 9, 4, 2, 1, 4, 2

Page 12: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

Range:

R = maximum score - minimum score

In order to figure out the range, A) arrange your data set in order from lowest to highest and B) subtract the lowest number from the highest number.

 

A) When arranged in order, 4, 6, 3, 7, 9, 4, 2, 1, 4, 2  becomes:  1, 2, 2, 3, 4, 4, 4, 6, 7, 9

 

B) The lowest number is 1 and the highest number is 9.  Therefore, R = 9-1 = 8

Page 13: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

The Computational Formula:

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

From the above formula:

S² = variance

Σ = sigma = the sum of (add up all the numbers)

X = the numbers from your data set

X² = the numbers from your data set squared

N = the total number of numbers you have in your data set

Page 14: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

The easiest way to compute variance with the computational formula is as follows:

A) List each of the numbers in your data set vertically & get the sum of that column

B) Figure out n (count how many numbers you have in your data set)

C) Square each number in your data set and get the sum of that column

 

A): X | C): X²
4 | 4² = 16
6 | 6² = 36
3 | 3² = 9
7 | 7² = 49
9 | 9² = 81
4 | 4² = 16
2 | 2² = 4
1 | 1² = 1
4 | 4² = 16
2 | 2² = 4
Σ = 42 | Σ = 232

B): N = 10

Page 15: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

Now use the sum for part A) and C), as well as the value for N which you found in part B) to fill in the formula:

$$S^2 = \frac{232 - \frac{42^2}{10}}{10} = \frac{232 - 176.4}{10} = \frac{55.6}{10}$$

Do the math and S² = 5.56
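The computational-formula steps A) to C) can be sketched in a few lines of Python. The variable names are ours; the formula divides by N, which is what reproduces the 5.56 reported on the slide.

```python
data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

n = len(data)                              # step B: N
sum_x = sum(data)                          # step A: sum of X
sum_x_squared = sum(x * x for x in data)   # step C: sum of X squared

# Computational formula: (sum of X^2 - (sum of X)^2 / N) / N
variance = (sum_x_squared - sum_x ** 2 / n) / n

print(n, sum_x, sum_x_squared, round(variance, 2))
```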

Page 16: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

The Conceptual Formula:

$$S^2 = \frac{\sum (X - M)^2}{N}$$

From the above formula:

S² = variance

Σ = sigma = the sum of (add up all the numbers)

X = the numbers from your data set

M = the mean

N = the total number of numbers you have in your data set

Page 17: G7-quantitative

SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2

The easiest way to compute variance with the conceptual formula is as follows:

A) List each of the numbers in your data set vertically & get the sum of that column

B) Figure out n (count how many numbers you have in your data set)

C) Figure out M 

D) Subtract M from each number in your data set  (Notice how the sum is zero)

E) Square the numbers you got for part D) and get the sum of that column

 

 

A): X | D): (X − M) | E): (X − M)²
4 | (4 − 4.2) = −0.2 | (−0.2)² = 0.04
6 | (6 − 4.2) = 1.8 | (1.8)² = 3.24
3 | (3 − 4.2) = −1.2 | (−1.2)² = 1.44
7 | (7 − 4.2) = 2.8 | (2.8)² = 7.84
9 | (9 − 4.2) = 4.8 | (4.8)² = 23.04
4 | (4 − 4.2) = −0.2 | (−0.2)² = 0.04
2 | (2 − 4.2) = −2.2 | (−2.2)² = 4.84
1 | (1 − 4.2) = −3.2 | (−3.2)² = 10.24
4 | (4 − 4.2) = −0.2 | (−0.2)² = 0.04
2 | (2 − 4.2) = −2.2 | (−2.2)² = 4.84
Σ = 42 | Σ = 0 | Σ = 55.6

B): N = 10

C): M = 42/10 = 4.2

Page 18: G7-quantitative

Now use the sum for part E), as well as the value for N which you found in part B) to fill in the formula:

$$S^2 = \frac{\sum (X - M)^2}{N} = \frac{55.6}{10}$$

Do the math and S² = 5.56

Page 19: G7-quantitative

STANDARD DEVIATION:

Standard deviation is simply the square root of the variance.  Therefore, it does not matter if you use the computational formula or the conceptual formula to compute variance.

For our sample data set, our variance came out to be 5.56, regardless of the formula used. The standard deviation for our data set then becomes: $S = \sqrt{5.56} \approx 2.36$
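As a cross-check, the range, the conceptual-formula variance, and the standard deviation can be recomputed in Python. This is a sketch with our own variable names. Note that the slides divide by N, which corresponds to `statistics.pvariance`; `statistics.variance` divides by N − 1 and gives a slightly larger value.

```python
import math
import statistics

data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

# Range: highest score minus lowest score.
data_range = max(data) - min(data)

# Conceptual formula: average squared deviation from the mean.
mean = sum(data) / len(data)
deviations = [x - mean for x in data]           # step D (sums to zero)
sum_squared_dev = sum(d * d for d in deviations)  # step E
variance = sum_squared_dev / len(data)

# Standard deviation is the square root of the variance.
sd = math.sqrt(variance)

# Library cross-checks: divide-by-N versus divide-by-(N - 1).
population_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)

print(data_range, round(variance, 2), round(sd, 2))
print(round(population_variance, 2), round(sample_variance, 2))
```

If the ten numbers are treated as a sample drawn from a larger population, the N − 1 version is the usual estimate, so it is worth stating which one is being reported.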

Page 20: G7-quantitative

 INDEPENDENT SAMPLES

• The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.

• E.g: suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here—if we contacted 100 people by phone and obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.

Page 21: G7-quantitative

INDEPENDENT DATA ANALYSIS

Calculations:

a. Equal sample sizes, equal variance

b. Unequal sample sizes, equal variance

c. Unequal sample sizes, unequal variance

Page 22: G7-quantitative

A. EQUAL SAMPLE SIZES, EQUAL VARIANCE

This test is only used when both:

the two sample sizes (that is, the number, n, of participants of each group) are equal;

it can be assumed that the two distributions have the same variance.

Page 23: G7-quantitative

B. UNEQUAL SAMPLE SIZES, EQUAL VARIANCE

This test is used only when it can be assumed that the two distributions have the same variance.

Page 24: G7-quantitative

C. UNEQUAL SAMPLE SIZES, UNEQUAL VARIANCE

This test, also known as Welch's t-test, is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately.

Page 25: G7-quantitative

WORKED EXAMPLE

• A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who each underwent arm exercise tests. Nine of the men were randomly selected to take a capsule containing pure caffeine one hour before the test. The other men received a placebo capsule. During each exercise the subject's respiratory exchange ratio (RER) was measured. (RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is being obtained from carbohydrates or fats).

• Question: whether, on average, caffeine changes RER.

• Populations: “men who have not taken caffeine” and “men who have taken caffeine”. (If caffeine has no effect on RER the two sets of data can be regarded as having come from the same population.)

Page 26: G7-quantitative

Subject | Placebo | Caffeine
1 | 105 | 96
2 | 119 | 99
3 | 100 | 94
4 | 97 | 89
5 | 96 | 96
6 | 101 | 93
7 | 94 | 88
8 | 95 | 105
9 | 98 | 88
Mean | 100.56 | 94.22
SD | 7.70 | 5.61

• The means show that, on average, caffeine appears to have altered RER from about 100.6% to 94.2%, a change of 6.4%. However, there is a great deal of variation between the data values in both samples and considerable overlap between them.

• Is the difference between the two means simply due to sampling variation, or do the data provide evidence that caffeine does, on average, reduce RER? The p-value answers this question.

• The t-test tests the null hypothesis that the mean of the caffeine treatment equals the mean of the placebo versus the alternative hypothesis that the mean of the caffeine treatment is not equal to the mean of the placebo treatment.

• Computer output obtained for the RER data gives the sample means and the 95% confidence interval for the difference between the means.

Page 27: G7-quantitative

COMPUTER OUTPUT

The p-value is 0.063 and, therefore, the difference between the two means is not statistically significantly different from zero at the 5% level of significance. There is an estimated change of 6.4% (SE = 3.17%). However, there is insufficient evidence (p = 0.063) to suggest that caffeine does change the mean RER.
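The summary figures quoted for the caffeine study can be reproduced by hand. The sketch below assumes the equal-variance (pooled) form of the independent samples t-test, which is consistent with the 16 degrees of freedom reported; the function name `pooled_t` is ours. It stops at the t statistic, because the p-value requires the t distribution, which is not in Python's standard library.

```python
import math
import statistics

placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]

def pooled_t(a, b):
    """Equal-variance two-sample t statistic, its standard error, and df."""
    df = len(a) + len(b) - 2
    # Pool the two sample variances, weighting each by its own df.
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, se, df

t_stat, se_diff, df = pooled_t(placebo, caffeine)

print(round(statistics.mean(placebo), 2), round(statistics.stdev(placebo), 2))
print(round(statistics.mean(caffeine), 2), round(statistics.stdev(caffeine), 2))
print(round(t_stat, 2), round(se_diff, 2), df)
```

The standard error and degrees of freedom match the values quoted on the slide (SE = 3.17, 16 df).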

Page 28: G7-quantitative

Alternative suggestion

It could be argued, however, that the researcher might only be interested in whether 'caffeine reduces RER'. That is, the researcher is looking for a specific direction for the difference between the two population means. This is an example of a one-tailed t-test as opposed to a two-tailed t-test outlined above.

SPSS only performs a 2-tailed test (the non-directional alternative hypothesis) and to obtain the p-value for the directional alternative hypothesis (one-tailed test) the p-value should be halved. Hence, in this example, p = 0.032.

Report: The mean RER in the caffeine group (94.2 ± 1.9) was significantly lower (t = 1.99, 16 df, one-tailed t-test, p = 0.032) than the mean of the placebo group (100.6 ± 2.6).

Note: It is important to decide whether a one- or two-tailed test is being carried out before analysis takes place. Otherwise it might be tempting to see what the p-value is before making your decision!

Page 29: G7-quantitative

A suitable null hypothesis in both cases is

H0: On average, caffeine has no effect on RER, with an alternative (or experimental) hypothesis,

H1: On average, caffeine changes RER (2-tail test), or H1: On average, caffeine reduces RER (1-tail case).

Page 30: G7-quantitative

2. ONE SAMPLE T-TEST

Compare the mean score of a sample to a known value. Usually, the known value is a population mean.

Assumption:

The dependent variable is normally distributed.

Page 31: G7-quantitative

In testing the null hypothesis that the population mean is equal to a specified value μ, use the statistic:

$$t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$$

$\bar{x}$: sample mean
S: sample standard deviation
n: sample size
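A minimal sketch of the one-sample statistic follows. The function name `one_sample_t` and the comparison value of 3 are illustrative assumptions, not part of the slides; the scores reuse the sample data set from the variance example. The statistic has n − 1 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t statistic for H0: population mean == mu, with n - 1 df."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample SD (divides by n - 1)
    t = (mean - mu) / (s / math.sqrt(n))
    return t, n - 1

scores = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]
t_stat, df = one_sample_t(scores, mu=3)   # mu=3 is a hypothetical known value

print(round(t_stat, 3), df)
```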

Page 32: G7-quantitative

3. PAIRED SAMPLES T-TEST

What it does:

• compares the means of two variables

• computes the difference between the two variables for each case, and tests to see if the average difference is significantly different from zero

Assumption: Both variables should be normally distributed (more precisely, the paired differences should be approximately normally distributed).

Page 33: G7-quantitative

Hypothesis:

• Null: There is no significant difference between the means of the two variables.

• Alternate: There is a significant difference between the means of the two variables.

Page 34: G7-quantitative

Difference between a paired samples t-test and an independent samples t-test?

Both tests are used to find significant differences between groups, but the independent samples t-test assumes the groups are not related to each other, while the dependent samples t-test or paired samples t-test assumes the groups are related to each other.

A dependent samples t-test or paired samples t-test would be used to find differences within groups, while the independent samples t-test would be used to find differences between groups. 

Page 35: G7-quantitative

Independent variable and dependent variable: The independent variable and the dependent variable are the same in both the dependent samples t-test and the independent samples t-test. The measured variable of interest is the dependent variable, and the grouping variable is the independent variable.

Page 36: G7-quantitative

The most common use of the dependent samples t-test is in a pretreatment vs. posttreatment scenario where the researcher wants to test the effectiveness of a treatment.

1. The participants are tested pretreatment, to establish some kind of a baseline measure

2. The participants are then exposed to some kind of treatment

3. The participants are then tested posttreatment, for the purposes of comparison with the pretreatment scores

Page 37: G7-quantitative

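The test statistic is

$$t = \frac{\bar{X}_D - \mu_0}{s_D / \sqrt{n}}$$

where $\bar{X}_D$ and $s_D$ are the average and standard deviation of the differences, $\mu_0$ is the hypothesized mean difference (usually 0), and n is the number of pairs.

For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores or pairs of persons matched into meaningful groups. The average and standard deviation of those differences are used in the equation. The degrees of freedom used are n − 1.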
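The paired calculation can be sketched as follows. The pre-test and post-test scores are made-up numbers for illustration only (they are not the SPSS data discussed in these slides), and the function name `paired_t` is ours.

```python
import math
import statistics

def paired_t(before, after, mu0=0.0):
    """Paired-samples t statistic computed on the per-pair differences."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)   # divides by n - 1
    t = (mean_diff - mu0) / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five participants.
pre_test = [10, 12, 9, 14, 11]
post_test = [12, 13, 9, 17, 12]

t_stat, df = paired_t(pre_test, post_test)
print(round(t_stat, 3), df)
```

Because only the differences enter the formula, the test is equivalent to a one-sample t-test of the differences against zero.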

Page 38: G7-quantitative

EXAMPLE: SPSS OUTPUT

We compared the mean test scores before (pre-test) and after (post-test) the subjects completed a test preparation course.

We want to see if our test preparation course improved people's score on the test

Page 39: G7-quantitative

The post-test mean scores are higher.

Page 40: G7-quantitative

There is a strong positive correlation. People who did well on the pre-test also did well on the post-test.

Page 41: G7-quantitative

Remember, this test is based on the difference between the two variables. Under "Paired Differences" we see the descriptive statistics for the difference between the two variables.

Page 42: G7-quantitative

The T value = -2.171

We have 11 degrees of freedom

Our significance is .053

Page 43: G7-quantitative
Page 44: G7-quantitative

If the significance value is less than .05, there is a significant difference. If the significance value is greater than .05, there is no significant difference.

Conclusion: Because .053 is greater than .05, there is no statistically significant difference between pre- and post-test scores. We have no evidence that our test preparation course helped.

Page 45: G7-quantitative

ANOVA

PRESENTER: TRAN NHU HANH

Page 46: G7-quantitative

WHAT IS ANOVA?

• ANOVA is an analysis of the variation present in an experiment. It is a test of the hypothesis that the variation in an experiment is no greater than that due to normal variation of individuals' characteristics and error in their measurement.

• ANOVA is a technique from statistical inference that allows us to deal with several populations

Page 47: G7-quantitative

TYPES OF ANOVA

1. One-way ANOVA

2. Two-way ANOVA

Page 48: G7-quantitative

ONE-WAY ANOVA DEFINITION

• A One-way ANOVA is used when comparing two or more group means on a continuous dependent variable. In other words, one-way ANOVA techniques can be used to study the effect of k(>2) levels of a single factor.

• The independent T-Test is a special case of the One-way ANOVA for situations where there are only two group means

Page 49: G7-quantitative

MAJOR CONCEPTS:

1. CALCULATING SUMS OF SQUARES

• The One-way ANOVA separates the total variance in the continuous dependent variable into two components: variability between the groups and variability within the groups

• Variability between the groups is calculated by first obtaining the sums of squares between groups (SSb), or the sum of the squared differences between each individual group mean and the grand mean, with each group weighted by its size

• Variability within the groups is calculated by first obtaining the sums of squares within groups (SSw), or the sum of the squared differences between each individual score and that individual’s group mean.

Page 50: G7-quantitative

TYPES OF VARIABLES FOR ONE-WAY ANOVA

• The IV (Independent Variable) is categorical. The categorical IV can be two groups or it can have more than two groups.

• The DV (Dependent Variable) is continuous

• Data are collected on both variables for each person in the study.

Page 51: G7-quantitative

EXAMPLES OF RESEARCH QUESTIONS FOR ONE-WAY ANOVA

1. Is there a significant difference in student attitudes toward the course between students who pass or fail a course?

• Student attitude is continuous

• Passing a course is categorical (pass/fail)

Because the IV has only 2 groups, we can use independent T-Test

2. Does student satisfaction significantly differ by location of institution (rural, urban, suburban)?

• Student satisfaction is continuous

• Institution location is categorical

Page 52: G7-quantitative

The linear model, conceptually, is:

SSt = SSb + SSw

SSt: total sums of squares

SSb: sums of squares between groups

SSw: sums of squares within groups

Page 53: G7-quantitative

ONE-WAY ANOVA AS A RATIO OF VARIANCES:

Formula for variance:

Numerator: a sum of squared values (or a sums of squares)

Denominator: degrees of freedom

Page 54: G7-quantitative

• The ANOVA analyzes the ratio of the variance between groups to the variance within the groups

• In ANOVA, these variances, formerly known to us as S², are referred to as mean squares (MS). Mean squares are calculated by dividing each sum of squares by the degrees of freedom associated with it.

Page 55: G7-quantitative

• Thus, a mean square between is simply the variance between groups, obtained by dividing the sums of squares between groups by its degrees of freedom

• Likewise, a mean square within is simply the variance within groups, obtained by dividing the sums of squares within groups by its degrees of freedom

Page 56: G7-quantitative

FACTORS THAT AFFECT SIGNIFICANCE

F-ratio: the variation due to an experimental treatment or effect divided by the variation due to experimental error. The null hypothesis is that this ratio equals 1.0, that is, the variation attributed to the treatment is no larger than the experimental error. This hypothesis is rejected if the F-ratio is large enough that the probability of observing it when the true ratio is 1.0 is smaller than some pre-assigned criterion such as 0.05 (one in twenty).

The MSb and the MSw are then divided to obtain the F ratio for hypothesis testing.

Page 57: G7-quantitative

DISTRIBUTION OF F - RATIO

• The F distribution is positively skewed

• If the F statistic falls near 1.0, then most likely the null is true

• If the F statistic is large, expect the null is false. Thus, significant F ratios will be in the tail of the F distribution

Page 58: G7-quantitative

P VALUE

In statistical hypothesis testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α, which is often 0.05 or 0.01.

Page 59: G7-quantitative
Page 60: G7-quantitative

• The larger the absolute value of t, the more likely we are to find significant results

• t is a special case of ANOVA when only two groups comprise the independent variable

• We’re familiar with the t distribution as approximately normally distributed (for large df), with positive and negative values. The F statistic, on the other hand, is positively skewed, and is comprised of squared values. Thus, for any two-group situation, t² = F
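The identity t² = F for two groups can be checked numerically on the caffeine data from the worked example. The sketch assumes the equal-variance (pooled) t-test, and the helper names `pooled_t` and `one_way_f` are ours.

```python
import math
import statistics

placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]

def pooled_t(a, b):
    """Equal-variance independent samples t statistic."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / se

def one_way_f(groups):
    """One-way ANOVA F ratio: mean square between / mean square within."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

t_stat = pooled_t(placebo, caffeine)
f_stat = one_way_f([placebo, caffeine])

print(round(t_stat ** 2, 4), round(f_stat, 4))
```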

Page 61: G7-quantitative

CALCULATIONS

• dfb = k − 1 (k: number of samples/groups/levels)

• dfw = N − k (N: total number of individuals across all groups)

• dfT = N -1

• MSb = SSb/ dfb

• MSw = SSw/ dfw

• F = MSb/ MSw
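The calculations above can be put together in a short function. The three groups below are hypothetical satisfaction scores invented for illustration (loosely following the rural/urban/suburban research question); the function name `one_way_anova` and the dictionary keys are ours.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities: SS, df, MS, and F."""
    scores = [x for g in groups for x in g]
    k, n_total = len(groups), len(scores)
    grand_mean = sum(scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # SSb: squared distance of each group mean from the grand mean,
    # weighted by group size.
    ss_b = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # SSw: squared distance of each score from its own group mean.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # SSt computed directly, to confirm SSt = SSb + SSw.
    ss_t = sum((x - grand_mean) ** 2 for x in scores)

    df_b, df_w, df_t = k - 1, n_total - k, n_total - 1
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SSb": ss_b, "SSw": ss_w, "SSt": ss_t,
            "dfb": df_b, "dfw": df_w, "dfT": df_t,
            "MSb": ms_b, "MSw": ms_w, "F": ms_b / ms_w}

# Hypothetical data: three groups of three scores each.
rural, urban, suburban = [1, 2, 3], [2, 3, 4], [5, 6, 7]
result = one_way_anova([rural, urban, suburban])

for name, value in result.items():
    print(name, round(value, 4))
```

The F value would then be compared with the critical value of the F distribution with (dfb, dfw) degrees of freedom, which requires statistical tables or software.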

Page 62: G7-quantitative

STEPS IN ONE-WAY ANOVA

STEP 1: STATE HYPOTHESES

To determine if different levels of factor affect measured observations differently, the following hypotheses are tested.

• H0: There is no significant difference among groups in variable X

• H1: There is a significant difference between at least two of the groups in the variable X. In other words, at least one mean will significantly differ.

Page 63: G7-quantitative

STEP 2: SET THE CRITERION FOR REJECTING HO

Page 64: G7-quantitative

STEP 3: COMPUTE TEST STATISTIC

Page 65: G7-quantitative

STEP 4: COMPARE TEST STATISTIC TO CRITERION

Page 66: G7-quantitative

STEP 5: MAKE DECISION

• Fail to reject the null hypothesis and conclude that there is no significant difference among the groups: F(dfb, dfw) = insert F statistic, p > insert α

• Reject the null hypothesis and conclude that there is a significant difference among the groups: F(dfb, dfw) = insert F statistic, p < insert α

Page 67: G7-quantitative

Difference between one-way and two-way ANOVA

ANOVA Test

TWO-WAY ANOVA

Page 68: G7-quantitative

ONE-WAY ANOVA

• One-Way ANOVA has one independent variable (1 factor) with > 2 conditions

– conditions = levels = treatments

– e.g., for a brand of cola factor, the levels are:

Coke, Pepsi, RC Cola

• Independent variables = factors

Page 69: G7-quantitative

TWO-WAY ANOVA

• Two-Way ANOVA has 2 independent variables (factors)

– each can have multiple conditions

Example

• Two Independent Variables (IV’s)

– IV1: Brand; and IV2: Calories

– Three levels of Brand: Coke, Pepsi, RC Cola

– Two levels of Calories: Regular, Diet

Page 70: G7-quantitative

WHEN TO USE

• One-way ANOVA: you have more than two levels (conditions) of a single IV

– EXAMPLE: studying effectiveness of three types of pain reliever: aspirin vs. tylenol vs. ibuprofen

• Two-way ANOVA: you have more than one IV (factor)

– EXAMPLE: studying pain relief based on pain reliever and type of pain

• Factor A: Pain reliever (aspirin vs. tylenol)

• Factor B: type of pain (headache vs. back pain)

Page 71: G7-quantitative

NOTATION

The two factors are called Factor A and Factor B.

• a: the number of categories of Factor A

• b: the number of categories of Factor B

• The total number of groups is ab.

• N: the total number of observations.

• Yijk: the response/dependent variable value for each observation, where i is the subject’s category for Factor A and j is the subject’s category for Factor B. Then i and j together identify a group, and k denotes which individual we’re talking about within this particular group.

• n: the number of observations in each group, so N = abn.

Page 72: G7-quantitative

Example: how the number of hours of TV people watch per week depends on two variables, gender and age. Each person is classified according to gender (male, female) and age (18–24, 25–54, 55+).

There are six groups, one for each combination of gender and age. We randomly sample five people from each group, and each person reports the time, in hours, that he or she watches TV per week. The data are shown in the following table.

Page 73: G7-quantitative

Gender | Age 18–24 | Age 25–54 | Age 55+
Male | 20, 27, 20, 22, 28 | 23, 21, 23, 28, 28 | 33, 33, 39, 33, 37
Female | 25, 19, 27, 32, 31 | 32, 26, 33, 33, 24 | 44, 43, 52, 43, 54

Page 74: G7-quantitative

TWO-WAY ANOVA TABLE

1. Sums of squares.

2. Degrees of freedom.

3. Mean squares.

Page 75: G7-quantitative

There are three main questions that we might ask in two-way ANOVA:

• Does the response variable depend on Factor A?

• Does the response variable depend on Factor B?

• Does the response variable depend on Factor A differently for different values of Factor B, and vice versa?

In the TV example, the first two questions ask whether TV viewing time depends on age and on gender.

The third question asks whether TV viewing time depends on gender differently for people of different ages, or whether TV viewing time depends on age differently for men than for women.

(For example, perhaps it’s true that women 55+ watch more TV than men 55+, but women 18–24 watch less TV than men 18–24.)

Page 76: G7-quantitative

1.Sums of Squares

Two-way ANOVA involves five different sums of squares:

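• The total sum of squares, SS Tot, measures the total variability in the response variable values. Its formula is

$$SS_{Tot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left( Y_{ijk} - \bar{Y}_{\bullet\bullet\bullet} \right)^2$$

• The Factor A sum of squares, SS A, measures the variability that can be explained by differences in Factor A. Its formula is

$$SS_A = bn \sum_{i=1}^{a} \left( \bar{Y}_{i\bullet\bullet} - \bar{Y}_{\bullet\bullet\bullet} \right)^2$$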

Page 77: G7-quantitative

$\bar{Y}_{ij\bullet}$ represents the sample mean of the group in category i of Factor A and category j of Factor B (always an average of n observations).

$\bar{Y}_{i\bullet\bullet}$ represents the sample mean of all the data in category i of Factor A combined (always an average of bn observations).

$\bar{Y}_{\bullet j\bullet}$ represents the sample mean of all the data in category j of Factor B combined (always an average of an observations).

$\bar{Y}_{\bullet\bullet\bullet}$ represents the overall sample mean of all the data from all groups combined (always an average of all abn = N observations).

Page 78: G7-quantitative

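• The Factor B sum of squares, SS B, measures the variability that can be explained by differences in Factor B. Its formula is

$$SS_B = an \sum_{j=1}^{b} \left( \bar{Y}_{\bullet j\bullet} - \bar{Y}_{\bullet\bullet\bullet} \right)^2$$

• The interaction sum of squares, SS AB, measures the variability that can be explained by interaction between the effects of Factors A and B. (We’ll talk more about what this means later.) Its formula is

$$SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \left( \bar{Y}_{ij\bullet} - \bar{Y}_{i\bullet\bullet} - \bar{Y}_{\bullet j\bullet} + \bar{Y}_{\bullet\bullet\bullet} \right)^2$$

• The error sum of squares, SS E, measures the variability of the observations around their group sample means. Its formula is

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left( Y_{ijk} - \bar{Y}_{ij\bullet} \right)^2$$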

Page 79: G7-quantitative

• If we call the sample standard deviation within each group sij, then another formula for SS E is

$$SS_E = (n - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} s_{ij}^2$$

Page 80: G7-quantitative

Degrees of freedom

Page 81: G7-quantitative

Mean squares.

Page 82: G7-quantitative

ANOVA TABLE
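The sums of squares, degrees of freedom, and mean squares that make up the two-way ANOVA table can be sketched for the TV-viewing data. The formula images did not survive in this transcript, so the code assumes the standard balanced-design formulas, and it treats gender as Factor A and age as Factor B, which is our own labeling. All variable names are illustrative.

```python
import statistics

# tv_hours[gender][age_group] holds the five observations in that group.
tv_hours = {
    "Male":   {"18-24": [20, 27, 20, 22, 28],
               "25-54": [23, 21, 23, 28, 28],
               "55+":   [33, 33, 39, 33, 37]},
    "Female": {"18-24": [25, 19, 27, 32, 31],
               "25-54": [32, 26, 33, 33, 24],
               "55+":   [44, 43, 52, 43, 54]},
}
levels_a = list(tv_hours)            # Factor A: gender (assumed labeling)
levels_b = list(tv_hours["Male"])    # Factor B: age group
a, b, n = len(levels_a), len(levels_b), 5

cells = [(i, j) for i in levels_a for j in levels_b]
all_values = [y for i, j in cells for y in tv_hours[i][j]]
grand = statistics.fmean(all_values)
mean_a = {i: statistics.fmean([y for j in levels_b for y in tv_hours[i][j]])
          for i in levels_a}
mean_b = {j: statistics.fmean([y for i in levels_a for y in tv_hours[i][j]])
          for j in levels_b}
mean_ab = {(i, j): statistics.fmean(tv_hours[i][j]) for i, j in cells}

ss_tot = sum((y - grand) ** 2 for y in all_values)
ss_a = b * n * sum((mean_a[i] - grand) ** 2 for i in levels_a)
ss_b = a * n * sum((mean_b[j] - grand) ** 2 for j in levels_b)
ss_ab = n * sum((mean_ab[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
                for i, j in cells)
ss_e = sum((y - mean_ab[i, j]) ** 2 for i, j in cells for y in tv_hours[i][j])

df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_a, ms_b, ms_ab, ms_e = ss_a / df_a, ss_b / df_b, ss_ab / df_ab, ss_e / df_e
f_a, f_b, f_ab = ms_a / ms_e, ms_b / ms_e, ms_ab / ms_e

for label, ss, df in [("A (gender)", ss_a, df_a), ("B (age)", ss_b, df_b),
                      ("AB", ss_ab, df_ab), ("Error", ss_e, df_e)]:
    print(label, round(ss, 2), df, round(ss / df, 2))
print("F:", round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```

In a balanced design the four component sums of squares add up to the total, which is a convenient sanity check on the arithmetic.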

Page 83: G7-quantitative

Using statistical software-

Page 84: G7-quantitative

TWO-WAY ANOVA HYPOTHESIS TESTS

• Does the response variable depend on Factor A?

• Does the response variable depend on Factor B?

• Does the response variable depend on Factor A differently for different values of Factor B, and vice versa?

Main effects

Interaction

Page 85: G7-quantitative

Interaction :

We say that there is interaction if Y depends on Factor A differently for different values of Factor B, and vice versa.

Similarly, we say that there is NO interaction if Y depends on Factor A in the same way for all values of Factor B, and vice versa.

Page 86: G7-quantitative

HYPOTHESES

In the test for interaction, the null hypothesis (Ho) is that there is no interaction, while the alternative hypothesis (Ha) is that there is interaction.

Page 87: G7-quantitative

There is no interaction on the left. For each age group, women average watching five more hours of TV per week than men. For each gender, the middle age group averages watching six more hours of TV per week than the youngest age group, and the oldest age group averages watching nine more hours of TV per week than the middle age group.

• There is interaction on the right. For each age group, women average watching more TV than men, but how much more varies for the different age groups. Also, for each gender, older people average watching more TV, but how much more varies by gender.

Page 88: G7-quantitative

ASSUMPTIONS

The assumptions for the two-way ANOVA F test for interaction are exactly the same as those of the one-way ANOVA F test, with one additional requirement: the number of observations should be the same for all groups.

Page 89: G7-quantitative

TEST STATISTIC

Page 90: G7-quantitative

P-VALUE

Page 91: G7-quantitative

DECISION

Page 92: G7-quantitative

• If we believe there is interaction, then we don’t bother to ask whether the response depends on Factor A or Factor B separately. The fact that there is interaction means that the response depends on Factor A differently for different values of Factor B, and vice versa. So we stop here and do not perform the tests for main effects (which we’ll talk about in the next subsection).

• If we believe it’s reasonable that there is no interaction, then that means we can look at the effects of Factor A and Factor B separately, so we proceed to the tests for main effects.