Please pick up an M&M activity sheet, form a group of 2-3, and choose a bag of M&Ms

Upload: godwin-sherman

Post on 16-Dec-2015

TRANSCRIPT

  • Slide 1
  • Please pick up an M&M activity sheet, form a group of 2-3, and choose a bag of M&Ms.
  • Slide 2
  • Chapter 12 The Analysis of Categorical Data and Goodness of Fit Tests
  • Slide 3
  • Suppose we wanted to determine if the proportions for the different colors in a large bag of M&M candies match the proportions that the company claims are in their candies. We could record the color of each candy in the bag. This would be univariate, categorical data. How many categories for color would there be? There are six colors, so k = 6. (k is used to denote the number of categories for a categorical variable.)
  • Slide 4
  • M&M Candies Continued... We could count how many candies of each color are in the bag. A one-way frequency table is used to display the observed counts for the k categories.
      Red   Blue   Green   Yellow   Orange   Brown
       23     28      21       19       22      25
    A goodness-of-fit test will allow us to determine if these observed counts are consistent with what we expect to have.
  • Slide 5
  • Goodness-of-Fit Test Procedure
    Null Hypothesis: $H_0$: $p_1$ = hypothesized proportion for Category 1, ..., $p_k$ = hypothesized proportion for Category k
    Alternative Hypothesis: $H_a$: $H_0$ is not true
    Test Statistic: $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
    The goodness-of-fit statistic, denoted by $X^2$ (read "chi-squared"), is a quantitative measure of the extent to which the observed counts differ from those expected when $H_0$ is true. The $X^2$ value can never be negative. The goodness-of-fit test is used to analyze univariate categorical data from a single sample.
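As a minimal sketch of the statistic described on this slide, the Python below computes $X^2$ for the M&M counts from Slide 4. The function name `goodness_of_fit_statistic` is hypothetical, and the equal proportions of 1/6 per color are only a placeholder assumption: the company's claimed proportions are not given in these slides.

```python
def goodness_of_fit_statistic(observed, hypothesized):
    """Return X^2 = sum((observed - expected)^2 / expected) over all categories."""
    if len(observed) != len(hypothesized):
        raise ValueError("need one hypothesized proportion per category")
    n = sum(observed)
    x2 = 0.0
    for count, p in zip(observed, hypothesized):
        expected = n * p  # expected count for this category when H0 is true
        x2 += (count - expected) ** 2 / expected
    return x2


# Observed counts from Slide 4: red, blue, green, yellow, orange, brown.
observed = [23, 28, 21, 19, 22, 25]
# ASSUMPTION: equal proportions, used only as a placeholder null hypothesis.
hypothesized = [1 / 6] * 6

x2 = goodness_of_fit_statistic(observed, hypothesized)
print(f"X^2 = {x2:.3f}, df = {len(observed) - 1}")
```

Every term is a squared difference divided by a positive expected count, which is why $X^2$ can never be negative.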
  • Slide 6
  • Goodness-of-Fit Test Procedure Continued...
    P-values: When $H_0$ is true and all expected counts are at least 5, $X^2$ has approximately a chi-square distribution with df = k - 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of $X^2$ under the df = k - 1 chi-square curve.
    Assumptions:
    1) Observed cell counts are based on a random sample.
    2) The sample size is large enough as long as every expected cell count is at least 5.
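The "area to the right" can be sketched in plain Python. The function name `chi_square_upper_tail` is hypothetical, and the closed-form sums it uses (valid for whole-number df only) are a swapped-in technique for illustration; the slides read P-values from a printed table, and in practice a statistics library would normally do this step.

```python
import math


def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive whole number")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, then add one correction term
    # for each additional pair of degrees of freedom.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)
    return tail


# Example: right-tail area beyond 9.488 on the df = 4 curve.
print(round(chi_square_upper_tail(9.488, 4), 3))
```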
  • Slide 7
  • Facts About $\chi^2$ Distributions
    Different df have different curves.
    $\chi^2$ curves are skewed right.
    As df increases, the $\chi^2$ curve shifts toward the right and becomes more like a normal curve.
    [Figure: $\chi^2$ curves for df = 3, df = 5, and df = 10]
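A small numeric illustration of these facts, assuming the standard results that a chi-square distribution has mean df and skewness $\sqrt{8/\text{df}}$ (neither formula appears on the slide). The helper name `chi_square_shape` is hypothetical.

```python
import math


def chi_square_shape(df):
    """Return (mean, skewness) of a chi-square distribution with df degrees of freedom."""
    return df, math.sqrt(8 / df)


# The three df values pictured on the slide.
for df in (3, 5, 10):
    mean, skewness = chi_square_shape(df)
    print(f"df = {df:2d}: mean = {mean}, skewness = {skewness:.3f}")
```

The mean grows with df (the curve shifts right) while the skewness shrinks toward 0 (the curve looks more like a normal curve).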
  • Slide 8
  • A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon. The table below shows the number of days in the eight lunar phases with the number of births in each phase for 24 lunar cycles.
      Lunar Phase        Number of Days   Number of Births
      New Moon                  24              7,680
      Waxing Crescent          152             48,442
      First Quarter             24              7,579
      Waxing Gibbous           149             47,814
      Full Moon                 24              7,711
      Waning Gibbous           150             47,595
      Last Quarter              24              7,733
      Waning Crescent          152             48,230
    There are eight phases, so k = 8.
  • Slide 9
  • Lunar Phases Continued... There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days.
      Lunar Phase        Number of Days   Number of Births   Proportion of Days   Expected Number of Births
      New Moon                  24              7,680        24/699 = .0343
      Waxing Crescent          152             48,442
      First Quarter             24              7,579
      Waxing Gibbous           149             47,814
      Full Moon                 24              7,711
      Waning Gibbous           150             47,595
      Last Quarter              24              7,733
      Waning Crescent          152             48,230
  • Slide 10
  • Lunar Phases Continued... There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days.
      Lunar Phase        Number of Days   Number of Births   Proportion of Days   Expected Number of Births
      New Moon                  24              7,680        24/699 = .0343
      Waxing Crescent          152             48,442        .217
      First Quarter             24              7,579        .0343
      Waxing Gibbous           149             47,814        .213
      Full Moon                 24              7,711        .0343
      Waning Gibbous           150             47,595        .215
      Last Quarter              24              7,733        .0343
      Waning Crescent          152             48,230        .217
  • Slide 11
  • Lunar Phases Continued... There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days.
      Lunar Phase        Number of Days   Number of Births   Proportion of Days   Expected Number of Births
      New Moon                  24              7,680        .0343                .0343 * 222,784 = 7641.49
      Waxing Crescent          152             48,442        .217
      First Quarter             24              7,579        .0343
      Waxing Gibbous           149             47,814        .213
      Full Moon                 24              7,711        .0343
      Waning Gibbous           150             47,595        .215
      Last Quarter              24              7,733        .0343
      Waning Crescent          152             48,230        .217
  • Slide 12
  • Lunar Phases Continued... There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days.
      Lunar Phase        Number of Days   Number of Births   Proportion of Days   Expected Number of Births
      New Moon                  24              7,680        .0343                .0343 * 222,784 = 7641.49
      Waxing Crescent          152             48,442        .217                 48433.24
      First Quarter             24              7,579        .0343                7641.49
      Waxing Gibbous           149             47,814        .213                 47452.27
      Full Moon                 24              7,711        .0343                7641.49
      Waning Gibbous           150             47,595        .215                 47809.44
      Last Quarter              24              7,733        .0343                7641.49
      Waning Crescent          152             48,230        .217                 48433.24
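The expected-count arithmetic in this table can be sketched as follows. The variable names are illustrative. The proportions here are kept unrounded, so the results differ slightly from the slide's values, which multiply rounded proportions by the 222,784 total births.

```python
# (phase, number of days, number of births) from the slide's table.
lunar_data = [
    ("New Moon", 24, 7680),
    ("Waxing Crescent", 152, 48442),
    ("First Quarter", 24, 7579),
    ("Waxing Gibbous", 149, 47814),
    ("Full Moon", 24, 7711),
    ("Waning Gibbous", 150, 47595),
    ("Last Quarter", 24, 7733),
    ("Waning Crescent", 152, 48230),
]

total_days = sum(days for _, days, _ in lunar_data)        # 699
total_births = sum(births for _, _, births in lunar_data)  # 222784

expected_births = {}
for phase, days, _ in lunar_data:
    proportion = days / total_days  # kept unrounded, unlike the slide
    expected_births[phase] = proportion * total_births
    print(f"{phase:16s} {proportion:.4f} {expected_births[phase]:10.2f}")
```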
  • Slide 13
  • Lunar Phases Continued...
    $H_0$: $p_1$ = .0343, $p_2$ = .2175, $p_3$ = .0343, $p_4$ = .2132, $p_5$ = .0343, $p_6$ = .2146, $p_7$ = .0343, $p_8$ = .2175
    $H_a$: $H_0$ is not true
    Test Statistic: ...
    The $X^2$ test statistic is smaller than the smallest entry in the df = 7 column of Appendix Table 8, so P-value > .10 (df = 7, $\alpha$ = .05).
    Since the P-value > $\alpha$, we fail to reject $H_0$. There is not sufficient evidence to conclude that lunar phases and number of births are related.
    What type of error could we have potentially made with this decision? Type II
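A hedged sketch of this test, using the counts from the lunar phase table and unrounded proportions (days / 699), so the statistic may not match the slide's lost value exactly. The constant 12.017 is the standard chi-square value with right-tail area .10 for df = 7; that it is the smallest entry in the slide's Appendix Table 8 column is an assumption.

```python
days = [24, 152, 24, 149, 24, 150, 24, 152]
births = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]

n = sum(births)
total_days = sum(days)

# X^2 = sum of (observed - expected)^2 / expected over the eight phases.
x2 = 0.0
for observed, d in zip(births, days):
    expected = n * d / total_days  # expected births if phase does not matter
    x2 += (observed - expected) ** 2 / expected

df = len(days) - 1
CRITICAL_10 = 12.017  # right-tail area .10 for df = 7 (standard table value)
ALPHA = 0.05

# If X^2 is below the .10 critical value, then P-value > .10 > alpha.
fail_to_reject = x2 < CRITICAL_10
print(f"X^2 = {x2:.3f}, df = {df}")
if fail_to_reject:
    print(f"P-value > .10 > {ALPHA}: fail to reject H0")
else:
    print("P-value <= .10: compare with further table entries")
```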
  • Slide 14
  • Slide 15
  • Slide 16
  • Get your quizzes and homework from your folder. Have your practice test out.
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • A study was conducted to determine if collegiate soccer players had an increased risk of concussions over other athletes or students. The two-way frequency table (also called a contingency table) below displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes.
                                 Number of Concussions
                             0      1      2    3 or more   Total
      Soccer Players        45     25     11       10          91
      Non-Soccer Players    68     15      8        5          96
      Non-Athletes          45      5      3        0          53
      Total                158     45     22       15         240
    The values in the body of the table (green on the slide) are the observed counts. The row and column totals (blue) are the marginal totals. The value 240 (red) is the grand total. This is univariate categorical data - number of concussions - from 3 independent samples. If there were no difference between these 3 populations in regard to the number of concussions, how many soccer players would you expect to have no concussions? We would expect (158/240)(91) ≈ 59.9.
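The marginal totals and the single expected count worked out on this slide can be checked with a short sketch. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Observed counts by group; columns are 0, 1, 2, and 3 or more concussions.
observed = {
    "Soccer Players": [45, 25, 11, 10],
    "Non-Soccer Players": [68, 15, 8, 5],
    "Non-Athletes": [45, 5, 3, 0],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected soccer players with 0 concussions if all three groups share
# the same category proportions: (158 / 240) * 91.
expected_soccer_zero = column_totals[0] / grand_total * row_totals["Soccer Players"]
print(row_totals, column_totals, grand_total, round(expected_soccer_zero, 1))
```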
  • Slide 23
  • $\chi^2$ Test for Homogeneity
    Null Hypothesis: $H_0$: the true category proportions are the same for all the populations or treatments
    Alternative Hypothesis: $H_a$: the true category proportions are not all the same for all the populations or treatments
    Test Statistic: $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
    The $\chi^2$ Test for Homogeneity is used to analyze univariate categorical data from 2 or more independent samples.
  • Slide 24
  • $\chi^2$ Test for Homogeneity Continued...
    Expected Counts (assuming $H_0$ is true): expected cell count = (row marginal total)(column marginal total) / (grand total)
    P-value: When $H_0$ is true and all expected counts are at least 5, $X^2$ has approximately a chi-square distribution with df = (number of rows - 1)(number of columns - 1). The P-value associated with the computed test statistic value is the area to the right of $X^2$ under the appropriate chi-square curve.
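The expected counts and degrees of freedom described here can be sketched for any two-way table. The helper names `expected_counts` and `degrees_of_freedom` are hypothetical; the example table is the concussion data from Slide 22, without the totals row and column.

```python
def expected_counts(table):
    """Expected cell counts: (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df = (number of rows - 1)(number of columns - 1), totals excluded."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: soccer players, non-soccer players, non-athletes.
# Columns: 0, 1, 2, and 3 or more concussions.
concussions = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

expected = expected_counts(concussions)
for row in expected:
    print([round(value, 1) for value in row])
print("df =", degrees_of_freedom(concussions))
```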
  • Slide 25
  • $\chi^2$ Test for Homogeneity Continued...
    Assumptions:
    1) Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups.
    2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.
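A minimal sketch of the second assumption, using the concussion table from Slide 22. The helpers `smallest_expected_count` and `combine_last_two_columns` are hypothetical names; merging the last two columns is just one way to combine categories.

```python
def smallest_expected_count(table):
    """Smallest expected cell count, (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return min(r * c / grand_total for r in row_totals for c in column_totals)


def combine_last_two_columns(table):
    """Merge the final two categories of every row into one category."""
    return [row[:-2] + [row[-2] + row[-1]] for row in table]


# Columns: 0, 1, 2, and 3 or more concussions.
concussions = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

if smallest_expected_count(concussions) < 5:
    # Some expected count is too small, so pool "2" with "3 or more".
    combined = combine_last_two_columns(concussions)
else:
    combined = concussions

print(combined, round(smallest_expected_count(combined), 1))
```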
  • Slide 26
  • Soccer Players Continued...
                                 Number of Concussions
                             0      1      2    3 or more   Total
      Soccer Players        45     25     11       10          91
      Non-Soccer Players    68     15      8        5          96
      Non-Athletes          45      5      3        0          53
      Total                158     45     22       15         240
    State the hypotheses.
    $H_0$: Proportions in each response category (number of concussions) are the same for all three groups
    $H_a$: Category proportions are not all the same for all three groups
    df = (number of rows - 1)(number of columns - 1) = (2)(3) = 6
    To find df, count the number of rows and columns, not including the totals! Another way to find df: cover one row and one column, then count the number of cells left (not including totals).
  • Slide 27
  • Soccer Players Continued... Expected counts are shown in the parentheses next to the observed counts.
                                 Number of Concussions
                             0            1            2          3 or more    Total
      Soccer Players        45 (59.9)    25 (17.1)    11 (8.3)    10 (5.7)       91
      Non-Soccer Players    68 (63.2)    15 (18.0)     8 (8.8)     5 (6.0)       96
      Non-Athletes          45 (34.9)     5 (10.0)     3 (4.9)     0 (3.3)       53
      Total                158           45           22          15            240
    Notice that NOT all the expected counts are at least 5. So combine the column for 2 concussions and the column for 3 or more concussions.
                                 Number of Concussions
                             0            1          2 or more    Total
      Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)      91
      Non-Soccer Players    68 (63.2)    15 (18.0)    13 (14.8)      96
      Non-Athletes          45 (34.9)     5 (10.0)     3 (8.2)       53
      Total                158           45           37            240
    This combined table has df = (2)(2) = 4.
    Test Statistic: ... P-value: ...
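The test statistic and P-value for the combined table did not survive in the transcript. As a sketch only, they could be computed as below, using unrounded expected counts. The P-value line uses the closed form exp(-x/2)(1 + x/2), which applies to df = 4 only; this is a swapped-in shortcut, not the slide's table lookup.

```python
import math

# Combined table: columns are 0, 1, and 2 or more concussions.
combined = [
    [45, 25, 21],  # soccer players
    [68, 15, 13],  # non-soccer players
    [45, 5, 3],    # non-athletes
]

row_totals = [sum(row) for row in combined]
column_totals = [sum(column) for column in zip(*combined)]
grand_total = sum(row_totals)

# X^2 = sum over all cells of (observed - expected)^2 / expected.
x2 = 0.0
for i, row in enumerate(combined):
    for j, observed in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        x2 += (observed - expected) ** 2 / expected

df = (len(combined) - 1) * (len(combined[0]) - 1)

# Right-tail area under the chi-square curve; this formula is for df = 4 only.
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.5f}")
```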