statistics project second semester


Upload: taegh-singh

Post on 05-May-2017


Page 1: Statistics Project Second Semester

Taegh Sokhey

Title: Is the distribution of M&M candy colors in each bag the same as the manufacturer (MARS) claims?

Introduction: A while back, my friend gave me a packet of M&Ms. Upon opening it, I was surprised to see that each and every M&M was blue! I wondered whether this was just chance variation, or whether the customer was not being told the correct distribution of candies.

Ho: The color distribution of M&Ms is consistent with the specified distribution: 24% blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow.

Ha: The color distribution of M&Ms is not consistent with the specified distribution: 24% blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow.

α: 0.05

Response Variable: Percentage of M&M colors in total.

Analysis: The data were analyzed using the chi-square goodness-of-fit (GOF) test on a TI-84 calculator, with Minitab used to confirm the TI-84 results. The idea behind the chi-square goodness-of-fit test is to see whether the sample comes from a population with the claimed distribution. Another way of looking at it is to ask whether the frequency distribution fits a specific pattern.

I am using this test because two values are involved: an observed value (the frequency of each color in the sample) and an expected frequency (calculated from MARS’s claimed distribution).

Data Collection

I purchased a wholesale bag of M&M (Fun Size) candies from the local Target. My sample consisted of 30 packets, or 617 individual M&Ms. This is less than 10% of the population, because we know that there are more than 300 packets, or 6170 individual M&Ms, in existence.

I went through each packet and pooled the M&Ms. I then separated the M&Ms by color into their respective categories: blue, brown, green, orange, red, and yellow.

I have conducted this test assuming that the M&M manufacturer, MARS, has not sneakily changed the manufacturing process of these candies to ensure that they meet the suggested distribution.

________________________________________________________________________

Conditions:

Random: Random sampling was used.

Large Sample Size: The expected counts of M&Ms were at least 5 in each category.

Independent: More than 6170 M&Ms are or were in existence, which is at least 10 times the sample size of 617.

- It is safe to conduct a chi-square GOF test.

_______________________________________________________________________
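
The large-sample and independence checks above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the project: the variable names are made up for the example, the proportions are the ones stated in the hypotheses, and 6170 is the lower bound on the population quoted in the text.

```python
# Claimed proportions as stated in the hypotheses (taken from the text).
claimed = {"blue": 0.24, "brown": 0.14, "green": 0.16,
           "orange": 0.20, "red": 0.13, "yellow": 0.14}

n = 617                  # individual M&Ms in the sample
population_floor = 6170  # the text says more than this many exist

# Large sample size: every expected count (n * p) should be at least 5.
expected = {color: n * p for color, p in claimed.items()}
large_counts_ok = all(count >= 5 for count in expected.values())

# Independence (10% condition): the sample should be at most 10% of the
# population. Integer arithmetic avoids floating-point rounding.
ten_percent_ok = 10 * n <= population_floor

print("Expected counts all >= 5:", large_counts_ok)
print("Sample <= 10% of population:", ten_percent_ok)
```

Both checks pass for these numbers, which matches the conclusion that the test is safe to run.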

After separating the M&Ms, I counted how many were in each category. After I found the “Observed” value for each color (category), I added the counts in each category to find the overall total number of M&Ms. The total came out to be N = 617. The expected values were found by multiplying N (617) by the claimed proportion for each color (24% blue, 14% brown, 16% green, 20% orange, 13% red, or 14% yellow).

The color distribution from the sample:

Colors     Blue     Brown    Green    Orange   Red      Yellow
Observed   165      94       106      96       74       82
Expected   148.08   86.38    98.72    123.4    80.21    86.38
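
As a rough sketch of how the expected row can be reproduced, the Python below multiplies N by each claimed proportion. The dictionary and variable names are illustrative assumptions; the numbers are the ones given in the project.

```python
# Claimed proportions from the hypotheses and the sample size from the text.
claimed = {"blue": 0.24, "brown": 0.14, "green": 0.16,
           "orange": 0.20, "red": 0.13, "yellow": 0.14}
n = 617

# Expected count for each color = N * claimed proportion, to 2 decimals.
expected = {color: round(n * p, 2) for color, p in claimed.items()}

# Sanity checks on the inputs: the proportions should sum to 1 and the
# expected counts to N if the claimed distribution is complete.
total_share = sum(claimed.values())
expected_total = sum(expected.values())

for color, count in expected.items():
    print(f"{color:<7} {count:>7.2f}")
print("sum of claimed proportions:", round(total_share, 2))
print("sum of expected counts:", round(expected_total, 2))
```

The per-color values agree with the table. The sanity check also shows that the percentages as stated add up to 101%, so the expected counts total about 623.17 rather than 617.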

I entered the data into my TI-84 calculator and performed the chi-square GOF test. I used df = N - 1 = 617 - 1 = 616.
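
For comparison, here is a minimal plain-Python sketch of the same test run on the tabulated counts. It is not the TI-84 or Minitab output. It assumes the usual goodness-of-fit convention of df = number of categories - 1 = 5, which differs from the 616 used above, and `chi2_sf_df5` is a helper written for this example (a closed form that is valid only for 5 degrees of freedom).

```python
import math

# Observed and expected counts, copied from the table in the text.
observed = [165, 94, 106, 96, 74, 82]
expected = [148.08, 86.38, 98.72, 123.4, 80.21, 86.38]

# Chi-square statistic: sum over categories of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: degrees of freedom = number of categories - 1.
df = len(observed) - 1


def chi2_sf_df5(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


p_value = chi2_sf_df5(chi_sq)

print("chi-square statistic:", round(chi_sq, 2))
print("degrees of freedom:", df)
print("p-value:", round(p_value, 3))
```

Under these assumptions the statistic comes out a little under 10 and the p-value falls between 0.05 and 0.10. These are not the values reported later in the project, but the p-value is still above α = 0.05.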

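The formula for the Chi-Square Test is:

χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count for each color, and the sum is taken over all six colors.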

Although no graphs were required to perform the test, I wanted to compare a pie chart I made by hand from one packet of the physical sample with a pie chart I constructed in Minitab.

From Minitab (Trial Version)

________________________________________________________________________

Discussion And Conclusions!

Procedure: Chi Square GOF test

The test statistic was 582.07, which gives a p-value of 0.833. This p-value is greater than our significance level (α = 0.05); therefore, we fail to reject the null hypothesis that the color distribution of M&Ms is consistent with the distribution that MARS claims. We do not have significant evidence to conclude that the distribution is different from their claims.

These results are valid because all of the conditions for the test were met. The test was appropriate because the data were categorical. I also carried out the test three ways: by hand, with Minitab, and with the TI-84 calculator. I have also had this test reviewed by two postgraduates of this course.

Compiling the data was very difficult. I had to count and recount these M&Ms three times to ensure I had the right number. Additionally, they were very costly.

One error I made was trying to collect the data sample by sample; this was a tabulating mess, and keeping the values straight was very frustrating. The image of me counting the blue M&Ms shows this inferior sampling method.

Instead of doing this, I decided to pool the M&Ms together and then count and divide them into categories. Luckily, I was able to fix this error.

Recommendations for next year: Have a semester project first semester also. This was a really rewarding and challenging experience. I am fortunate that I decided to do this project as I have learned a lot and refreshed my statistical comprehension.

Here is a photo of me collecting the data using the more difficult method:
