utsg sta220 midterm 2012s

Upload: examkiller

Post on 07-Feb-2018


  • 7/21/2019 Utsg Sta220 Midterm 2012s


    STA220H Term Test Jun 7, 2012

    Last Name: ____tions_____ First Name:____Solu________ Student #:_____________

    TA: Avideh Panpan Lingling (circle one)

    Time allowed: 1 hour, 50 minutes.

    Aids: one-sided handwritten aid sheet + non-programmable calculator + dictionary

    Check that you have all the consecutively numbered pages of this test, up to the last page, which says THE END at the bottom.

    Normal tables are either attached or a separate handout. Please give all probabilities to four decimal places unless they are unnecessary zeroes.

    Best marks go to best answers, as a general rule, particularly where some explanation is requested, so try to be complete but also clear and concise; a lot of nonsense can decrease your grade. Answers should be in the context of the study.

    Be sure to proportion your time carefully among the questions and limit your time spent on any single question, as time may be tight.

    Show your work and answer in the space provided (or indicate clearly where to look), and in ink. Pencil may be used, but then remarks will not be allowed. Use the back of pages for rough work.

    Marks are shown in brackets at the end of the question parts, and are distributed as follows:

    Good luck!!

    Question   1   2   3   4   5   6   7   8   Total
    Max       20  10  14  12   7  10  10  17    100
    Grade


    1) A manufacturer of cereal finds that the masses of cereal in the company's 205g-labelled boxes are normally distributed with a mean of 200g and a standard deviation of 16.3g. Sketch a picture of this distribution, and label the mean and standard deviation. [3]

    a) Describe the accuracy of the manufacturer's filling technique:

    Circle one: Biased Unbiased [1]

    b) Approximately what proportion of these boxes contain between 183.7g and 216.3g of cereal?

    [1] 68% (from the empirical rule: 183.7g and 216.3g are exactly one standard deviation below and above the mean)
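As a quick numerical check of the empirical-rule answer, here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+; the choice of Python is ours, not the exam's). The exact normal probability within one standard deviation is slightly below 0.6827, which the empirical rule rounds to 68%.

```python
from statistics import NormalDist

# Box masses as stated in Q1: normal with mean 200g and SD 16.3g
mass = NormalDist(mu=200, sigma=16.3)

# Proportion of boxes between 183.7g and 216.3g (mean +/- 1 SD)
p_within_1sd = mass.cdf(216.3) - mass.cdf(183.7)
print(f"{p_within_1sd:.4f}")  # about 0.6827, close to the empirical rule's 68%
```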

    c) What is the (exact) probability that a box selected at random contains more than 205g of cereal?

    [2]
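The question leaves the working space blank, so here is a hedged sketch of the standard approach: standardize 205g and take the upper tail. It uses `statistics.NormalDist` in place of a printed normal table. With the exact z of about 0.3067 it gives roughly 0.3795. A table lookup at z = 0.31 gives 0.3783 instead, so small differences in the fourth decimal place come from rounding z.

```python
from statistics import NormalDist

mass = NormalDist(mu=200, sigma=16.3)

# Standardize: z = (205 - 200) / 16.3
z = (205 - 200) / 16.3
# P(X > 205) = 1 - P(X <= 205)
p_over_205 = 1 - mass.cdf(205)
print(f"z = {z:.2f}, P(X > 205) = {p_over_205:.4f}")
```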

    d) Determine the range of masses that you would expect 90% of these boxes of cereal to contain (assuming we mean the middle 90%). [3]
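A sketch of part d), assuming "middle 90%" means 5% in each tail. It takes the 5th and 95th percentiles of the stated normal model with `NormalDist.inv_cdf`. This matches the hand method 200 ± 1.645 × 16.3, which gives about 173.19g to 226.81g.

```python
from statistics import NormalDist

mass = NormalDist(mu=200, sigma=16.3)

# Middle 90% leaves 5% in each tail, so use the 5th and 95th percentiles
lower = mass.inv_cdf(0.05)
upper = mass.inv_cdf(0.95)
print(f"{lower:.2f} g to {upper:.2f} g")
```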


    g) The quality control engineer decides to sample every 12th box. Could this be considered a Simple Random Sample? Why or why not? If there are any potential problems with this technique, explain what can go wrong.

    [3]

    (2) No. Two boxes beside each other have no chance of being in the sample together, so this is not an SRS; it is systematic sampling. (1) Problems would arise if there is a natural period of 12, such as 12 × 10 boxes in the shipment.
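Here is a minimal Python sketch of why every-12th-box sampling is not an SRS, and of the periodicity problem. The 120-box shipment comes from part h). The under-filling pattern (every 12th box, starting at box 5) and the helper `systematic_sample` are hypothetical and only serve as illustration.

```python
import random

def systematic_sample(n_boxes, k, start):
    """Every k-th box, beginning at box `start` (0-based, 0 <= start < k)."""
    return list(range(start, n_boxes, k))

n_boxes, k = 120, 12          # one shipment of 120 boxes, every 12th box sampled
start = random.randrange(k)   # the only random step in systematic sampling
sample = systematic_sample(n_boxes, k, start)

# Consecutive sampled boxes are always exactly 12 apart, so two neighbouring
# boxes can never both be chosen; in an SRS any set of 10 boxes is possible.
gaps = {b - a for a, b in zip(sample, sample[1:])}
print(len(sample), gaps)      # 10 {12}

# Hypothetical periodic defect: a filling head that under-fills every 12th box
# (boxes 5, 17, 29, ...). The systematic sample sees either all of them or none.
underfilled = set(range(5, n_boxes, k))
for s in (5, 0):
    hits = sum(b in underfilled for b in systematic_sample(n_boxes, k, s))
    print(f"start={s}: {hits} of 10 sampled boxes under-filled")
```

The true under-fill rate in this made-up shipment is 10/120 (about 8%). The systematic sample reports either 100% or 0%, depending only on the random start.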

    h) Another engineer decides to draw a (random) sample of 3 shipments, and then samples 3 of the 120 boxes from each shipment. Name this sampling technique.

    [2] Multi-stage (1) (or 2-stage) cluster sampling (1)
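This is a sketch of the two-stage cluster design in part h). It assumes a hypothetical population of 20 shipments, because the exam does not say how many shipments there are. `two_stage_sample` is an illustrative helper, not a standard function. Stage 1 picks whole shipments (clusters) at random. Stage 2 takes an SRS of boxes inside each chosen shipment.

```python
import random

def two_stage_sample(shipments, n_shipments=3, boxes_per_shipment=3, rng=random):
    """Stage 1: SRS of shipments (clusters). Stage 2: SRS of boxes within each chosen shipment."""
    chosen = rng.sample(sorted(shipments), n_shipments)
    return {s: sorted(rng.sample(shipments[s], boxes_per_shipment)) for s in chosen}

# Hypothetical population: 20 shipments of 120 boxes each (the shipment count is an assumption)
shipments = {f"S{i:02d}": list(range(120)) for i in range(1, 21)}
sample = two_stage_sample(shipments, rng=random.Random(2012))
for shipment, boxes in sample.items():
    print(shipment, boxes)
```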


    2)


    3) A biologist (Pierce, 1949) believed that the chirp rate of crickets might be influenced by temperature. He measured the number of chirps per second for different crickets at different temperatures (in F). A portion of the (modified) data and some Minitab output is shown below.

    Regression Analysis: chirp_rate versus temperature

    The regression equation is
    (A) = (B) + 0.214 temperature

    Predictor     Coef      SE Coef   T       P
    Constant      -0.435    2.323     -0.19   0.853
    temperature   0.21443   0.02910   7.37    0.000

    S = 0.994214 R-Sq = (C)

    Analysis of Variance

    Source           DF   SS       MS       F
    Regression        1   (D)      53.669   54.30
    Residual Error   27   26.688   0.988
    Total            28   80.358

    [Figure: Scatterplot of chirp_rate vs temperature; x-axis temperature, y-axis chirp_rate]

    chirp_rate   temperature
    20.0         88.6
    16.0         71.6
    19.8         93.3
    18.4         84.3
    17.1         80.6

    a) Four items have been replaced with letters. Fill in what they should be. [4]

    A - chirp_rate

    B - -0.435

    C - 66.8% or 0.668

    D - 53.67
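The answers for (C) and (D) follow from the ANOVA identity SST = SSR + SSE and from R-Sq = SSR/SST. This short sketch recomputes them from the printed table and cross-checks F and S. The checks agree only up to rounding, because Minitab works from unrounded sums of squares (for example, it prints MS = 53.669 for regression).

```python
# ANOVA entries from the Minitab output in Q3
ss_residual, ss_total = 26.688, 80.358
df_residual = 27

ss_regression = ss_total - ss_residual        # (D): SS components add up
r_sq = ss_regression / ss_total               # (C): R-Sq = SSR / SST
ms_residual = ss_residual / df_residual
f_stat = ss_regression / ms_residual          # should be close to F = 54.30
s = ms_residual ** 0.5                        # should be close to S = 0.994214

print(f"D = {ss_regression:.3f}, C = {r_sq:.1%}, F = {f_stat:.2f}, S = {s:.4f}")
```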

    b) What does the estimated slope of 0.214 tell us about the relationship between chirp rate and temperature? Be specific; a qualitative answer will not suffice here.

    [2] For each 1 degree F increase in temperature (1), we expect the chirp rate to increase by 0.214 chirps/sec (1).


    The residual plots for this analysis are shown below.

    [Figure: Residual Plots for chirp_rate. Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]

    c) Do you see any problems with these residuals? If so, list the problems in order of importance. If not, state why you came to this conclusion.

    [3] (1) No problems:
    - no pattern in the order plot;
    - (1) no pattern and constant variance in the residual plot;
    - (1) straight line in the NPP.

    d) Predict the chirp rate for a cricket at 78F. Are there any problems with this prediction? Explain if so, ignoring the prediction if you like.

    [2] (1) CR = -0.435 + 0.214(78) = 16.26 chirps/s

    (1) No problems.
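This sketch plugs 78F into the fitted line and shows that the slope is the change in prediction per degree. The function name `predict_chirp` is ours. The rounded equation gives 16.26, and the unrounded slope 0.21443 gives about 16.29. The stem-and-leaf display in Q4 shows temperatures from roughly 69F to 93F, so 78F is well inside the data and this is not an extrapolation.

```python
# Fitted line from the Q3 output: chirp_rate = -0.435 + 0.214 * temperature
b0, b1 = -0.435, 0.214
b1_full = 0.21443   # unrounded slope from the Coef column

def predict_chirp(temp_f, intercept=b0, slope=b1):
    """Predicted chirps per second at temperature temp_f (degrees F)."""
    return intercept + slope * temp_f

print(f"{predict_chirp(78):.2f}")                 # 16.26 with the rounded slope
print(f"{predict_chirp(78, slope=b1_full):.2f}")  # 16.29 with the unrounded slope

# The slope is the predicted change in chirp rate per 1 degree F
print(f"{predict_chirp(79) - predict_chirp(78):.3f}")  # 0.214
```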

    e) Predict the temperature for a chirp rate of 21. Are there any problems with this prediction? Explain if so, ignoring the prediction if you like. [3]

    (2) Cannot do this without regressing temperature on chirp rate. (1) Also, 21 is at the boundary of our data, so this is borderline extrapolation.
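Solving the chirp-on-temperature line for temperature is not the same as regressing temperature on chirp rate. The two slopes agree only when the correlation is ±1. This sketch fits both least-squares lines and compares their predictions at a chirp rate of 21. It uses only the five rows printed in Q3, not the full 29 observations, so the numbers are illustrative only. Note that 21 is already outside those five chirp rates. The helper `ols` is a hand-written least-squares fit, not a library call.

```python
def ols(x, y):
    """Least-squares intercept and slope for regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Only the five rows printed in Q3 (not the full 29 observations)
chirp = [20.0, 16.0, 19.8, 18.4, 17.1]
temp = [88.6, 71.6, 93.3, 84.3, 80.6]

a_yx, b_yx = ols(temp, chirp)   # chirp_rate on temperature
a_xy, b_xy = ols(chirp, temp)   # temperature on chirp_rate

target = 21.0
inverted = (target - a_yx) / b_yx   # solving the chirp-on-temperature line for temperature
direct = a_xy + b_xy * target       # predicting from the temperature-on-chirp regression
print(f"inverted line: {inverted:.1f} F, regression of temperature on chirp rate: {direct:.1f} F")
```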


    4) Let's further examine the biology data now.

    Stem-and-Leaf Display: temperature
    Stem-and-leaf of temperature  N = 29
    Leaf Unit = 1.0

       4   6  9999
       6   7  11
       6   7
       8   7  55
      10   7  66
      12   7  88
      (5)  8  00001
      12   8  22223333
       4   8  4
       3   8
       3   8  8
       2   9
       2   9  33

    a) Which numerical measures would you choose in order to provide a good picture of this distribution, and why?

    [3]

    Choose the 5-number summary (2), since the distribution is not unimodal and symmetric (1).

    b) State the two quartiles and the median for the temperature variable. [3]

    Q1 = 75, M = 80, Q3 = 83

    c) How large would a temperature measurement need to be for Minitab's boxplot routine to identify it as an outlier?

    [2]
    (1) IQR = Q3 - Q1 = 83 - 75 = 8
    (1) Upper fence = Q3 + 1.5 × IQR = 83 + 1.5(8) = 95, so a temperature above 95F would be flagged as an outlier.
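The quartiles and fence can be recomputed from the stem-and-leaf display. The leaf unit is 1.0, so these values are truncated to whole degrees. This is a reading of the display, not the raw data. The sketch uses `statistics.quantiles` with `method="exclusive"` (the default), which places quartiles at positions (n + 1)p. We believe that matches Minitab's quartile convention, but treat it as an assumption.

```python
import statistics

# Temperatures read off the stem-and-leaf display (truncated to whole degrees)
temps = ([69] * 4 + [71] * 2 + [75] * 2 + [76] * 2 + [78] * 2 +
         [80] * 4 + [81] + [82] * 4 + [83] * 4 + [84] + [88] + [93] * 2)
assert len(temps) == 29

# (n + 1)p positions: 7.5, 15 and 22.5 for n = 29
q1, median, q3 = statistics.quantiles(temps, n=4, method="exclusive")
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr
print(q1, median, q3, iqr, upper_fence, lower_fence)
```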


    If temperature is converted to degrees Celsius [C = 0.55*(F - 32)], the mean will change but the standard deviation will stay the same.

    True or False? (circle one) [1]

    d) If temperature is converted to degrees Rankine [R = F + 460], the mean will change but the standard deviation will stay the same.

    True or False? (circle one) [1]
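These two true/false items turn on how a linear transformation a + bX acts on summary statistics. The mean becomes a + b × mean, but the standard deviation becomes |b| × SD. A pure shift (Rankine) leaves the SD unchanged, while the Celsius formula also rescales it by 0.55. This small illustration uses the five temperatures printed in Q3 and the conversion formulas exactly as the question gives them.

```python
import statistics

# The five temperatures (F) printed in Q3, used here only as a small illustration
temp_f = [88.6, 71.6, 93.3, 84.3, 80.6]
temp_c = [0.55 * (f - 32) for f in temp_f]   # conversion as given in the question
temp_r = [f + 460 for f in temp_f]           # Rankine: a pure shift

for name, values in [("F", temp_f), ("C", temp_c), ("R", temp_r)]:
    print(name, round(statistics.mean(values), 2), round(statistics.stdev(values), 3))
```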

    e) If you were to calculate the 10% trimmed mean for temperature, you would expect it to be __________ than (as) the sample mean? (circle one)

    Lower About the same Higher [1]

    f) The median and the trimmed mean are more ___resistant___ to outliers than the mean.

    [1]

    5) The head table at a particular wedding consists of the bride, her four bridesmaids, the groom, and his four groomsmen. All face the crowd.

    a) How many possible arrangements of the bridal party are there? [1]

    10! = 3,628,800

    Suppose that the bride and groom want to sit together for the rest of Q5. (!)

    b) In addition, all of the groomsmen (and bridesmaids) want to sit together. How many arrangements are there now? What is the probability of this happening with a random seating arrangement?

    [2] 3! × 4! × 4! × 2! = 6912, so P(X) = 6912/10! = 0.0019
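This sketch checks parts a) and b) two ways: with the block-counting formula and with a brute-force count. The brute force enumerates seat patterns (which seats go to the bride, the groom, the bridesmaids and the groomsmen), keeps the ones where every group is contiguous, and then multiplies by 4! × 4! for the individual people. The helper `contiguous` is ours.

```python
from itertools import combinations, permutations
from math import factorial

# a) ten distinct people in a row
total = factorial(10)

# b) three blocks (bride+groom, bridesmaids, groomsmen) in 3! orders,
#    times the internal orders 2!, 4! and 4!
together = factorial(3) * factorial(2) * factorial(4) * factorial(4)

def contiguous(seats):
    """True if the seat numbers form one unbroken run."""
    return max(seats) - min(seats) == len(seats) - 1

# Brute-force check over seat patterns, then multiply by 4! x 4! for the individuals
patterns = 0
for bride, groom in permutations(range(10), 2):
    if abs(bride - groom) != 1:
        continue
    rest = [s for s in range(10) if s not in (bride, groom)]
    for maids in combinations(rest, 4):
        men = [s for s in rest if s not in maids]
        if contiguous(maids) and contiguous(men):
            patterns += 1
brute_force = patterns * factorial(4) * factorial(4)

print(total, together, brute_force, round(together / total, 4))  # 3628800 6912 6912 0.0019
```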

    c) Somehow, all of the groomsmen are identical quadruplets (they all look identical). How many uniquely identifiable arrangements are there now?

    [2]

    9!2!/4! = 30,240

    d) One of the groomsmen just broke up with one of the bridesmaids. What is the probability that they are not sitting beside each other?

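    [2] 9!2! - 8!2!2! = 564,480 (arrangements with the bride and groom together, minus those where the ex-couple also sit together), so P(X) = 564,480/(9!2!) = 0.7778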
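This sketch redoes the arithmetic for parts c) and d). Each part treats the bride and groom as one seating unit. Part c) divides out the 4! indistinguishable orders of the groomsmen. Part d) uses the complement: all arrangements with the bride and groom together, minus those in which the ex-couple also form an adjacent pair.

```python
from math import factorial

# Bride and groom treated as one unit: 9 units, with 2! orders inside the unit
bg_together = factorial(9) * factorial(2)                     # 725,760

# c) groomsmen indistinguishable: divide by the 4! ways to permute them
unique_c = bg_together // factorial(4)                        # 30,240

# d) subtract arrangements where the ex-couple also sit together
#    (two 2-person units + 6 others = 8 units; each unit has 2! internal orders)
ex_together = factorial(8) * factorial(2) * factorial(2)      # 161,280
apart = bg_together - ex_together                             # 564,480
print(unique_c, apart, round(apart / bg_together, 4))         # 30240 564480 0.7778
```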


    6) Suppose a track and field athlete at the upcoming Olympics undergoes drug testing. The testing procedure has 95% sensitivity (prob. of correctly identifying an athlete who is using drugs) and 90% specificity (prob. of correctly clearing an athlete who is not using drugs). Assume that 2% of all T&F athletes at the upcoming games have had positive drug tests in the past, and can be presumed to be using drugs now.

    a) Draw a tree diagram to represent all of the possible outcomes for this test. Include the probabilities along the branches.

    [4]

    b) If an athlete tests positive for drugs, what is the probability that they are actually on drugs?

    [3]
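The solution space for parts a) and b) is blank. This sketch computes the four tree-branch probabilities and then applies Bayes' rule: P(drugs | positive) = P(drugs and positive) / P(positive). The function name `posterior_positive` is ours. The same function, rerun at 10% prevalence, shows the increase expected in part c). The natural-frequency line at the end illustrates the specificity problem raised in part d).

```python
def posterior_positive(prevalence, sensitivity=0.95, specificity=0.90):
    """P(using drugs | positive test) by Bayes' rule, plus the four tree-branch probabilities."""
    branches = {
        ("drugs", "positive"): prevalence * sensitivity,
        ("drugs", "negative"): prevalence * (1 - sensitivity),
        ("clean", "positive"): (1 - prevalence) * (1 - specificity),
        ("clean", "negative"): (1 - prevalence) * specificity,
    }
    p_positive = branches[("drugs", "positive")] + branches[("clean", "positive")]
    return branches[("drugs", "positive")] / p_positive, branches

p2, tree = posterior_positive(0.02)
for (status, result), p in tree.items():
    print(f"{status} & {result}: {p:.4f}")
print(f"P(drugs | positive) at 2% prevalence:  {p2:.4f}")    # 0.1624
p10, _ = posterior_positive(0.10)
print(f"P(drugs | positive) at 10% prevalence: {p10:.4f}")   # 0.5135

# Per 1,000 athletes at 2% prevalence: about 98 clean athletes flagged vs 19 users caught
print(round(1000 * tree[("clean", "positive")]), round(1000 * tree[("drugs", "positive")]))
```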

    c) How would you expect this probability to change if the incidence of drug use in the population were actually 10% instead of 2%?

    [1] It should increase.

    d) Identify the main problem with this testing procedure. [2]

    (2) The specificity is way too low. Accusing 10% of all clean athletes of a drug infringement would be intolerable to the athletes.

    Note: The low conditional probability from b) is a result of the above, not the problem.


    7) A research team wants to determine what brand of cat food results in the least amount of cat hair on furniture. The team runs an experiment with 300 cats, separated into long-haired and short-haired and each with their own living space. Each cat is given one of three brands of cat food (cheap dry food, fancy dry food, or wet food) as well as either purified water or tap water. The team measures the amount of hair on the floor of each living space.

    Identify all the key design elements, such as: [10]

    a) the factors, levels, and treatments
    (1) Factor A (3 levels): cheap dry food, fancy dry food, wet food
    (1) Factor B (2 levels): purified water, tap water
    (1) Treatments (6 possible): A1B1 A1B2 A2B1 A2B2 A3B1 A3B2

    b) any blocking variables (if present)
    (1) Blocked by hair type (short/long)

    c) response variable(s)
    (2) Amount of hair on the floor

    d) use of blinding
    None, or none at the researcher level; but assume the animals are not aware of the experiment, so single-blind (1)

    e) possible improvements to the design
    One of (1):
    - introduce blinding by having the food delivered automatically to the cats, or by having the person who measures the amount of hair unaware of the treatments
    - block by sex of the cats

    f) We could use side-by-side boxplots to compare the amount of hair left on the floor for each type of cat food. True or False?

    g) If we notice a significant difference between the mean hair found from different brands, we can assume a cause-effect relationship. True or False?
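The design in Q7 is a 3 × 2 factorial run as a randomized block design, with hair length as the blocking variable. This sketch enumerates the six treatments and randomly assigns them within each block. The even 150/150 split of long- and short-haired cats, the cat IDs and the helper `assign_within_blocks` are all hypothetical, since the exam does not say how many cats are in each group. Shuffling within a block and then dealing treatments out in turn gives a random assignment with equal group sizes.

```python
import random
from collections import Counter
from itertools import product

foods = ["cheap dry", "fancy dry", "wet"]          # factor A
waters = ["purified", "tap"]                       # factor B
treatments = list(product(foods, waters))          # 3 x 2 = 6 treatment combinations

def assign_within_blocks(blocks, treatments, rng):
    """Randomized block design: shuffle cats within each block, then deal treatments out in turn."""
    plan = {}
    for block, cats in blocks.items():
        cats = cats[:]
        rng.shuffle(cats)
        for i, cat in enumerate(cats):
            plan[cat] = (block, treatments[i % len(treatments)])
    return plan

# Hypothetical split: the exam does not say how many of the 300 cats are long-haired
blocks = {
    "long-haired": [f"L{i:03d}" for i in range(150)],
    "short-haired": [f"S{i:03d}" for i in range(150)],
}
plan = assign_within_blocks(blocks, treatments, random.Random(7))
counts = Counter(plan.values())   # cats per (block, treatment) cell
print(len(treatments), len(plan), set(counts.values()))  # 6 300 {25}
```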


    8) Regression, again

    Below is some output from a regression analysis performed on a dataset containing the age and systolic blood pressure measurement for 30 patients. These patients were a random sample from all of the patients at a medical clinic in Toronto.

    The regression equation is
    blood_pressure = 98.7 + 0.971 age

    Predictor   Coef     SE Coef   T      P
    Constant    98.71    10.00     9.87   0.000
    age         0.9709   0.2102    4.62   0.000

    S = 17.3137 R-Sq = 43.2%

    Analysis of Variance

    Source           DF   SS        MS       F
    Regression        1   6394.0    6394.0   21.33
    Residual Error   28   8393.4    299.8
    Total            29   14787.5

    Unusual Observations

    Obs   age    blood_pressure   Fit      SE Fit   Residual   St Resid
      2   47.0   220.00           144.35   3.19     XXX        XXX

    [Figure: Scatterplot of blood_pressure vs age; x-axis age, y-axis blood_pressure]

    a) Something was minimized by this regression procedure. What is it, in very simple words, and what is the actual (minimal) numerical value of this quantity for the regression here?

    [3] The sum of the squares of the differences (1) between actual observed values and their predicted values (1) was minimized. The sum is 8393.4 (1).

    b) ____43.2%____ of the variation in ____blood pressure____ can be explained by the relationship with ____age____. Fill in the blanks.

    [2]
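Parts a) and b) can both be read off the ANOVA table. The least-squares criterion minimizes the residual (error) sum of squares, so its minimum is SSE. R-Sq is SSR/SST. This short sketch recomputes both and checks S = sqrt(MSE) against the printed 17.3137.

```python
# ANOVA entries from the Q8 output
ss_regression, ss_residual, ss_total = 6394.0, 8393.4, 14787.5
df_residual = 28

sse = ss_residual                          # a) minimized sum of squared residuals
r_sq = ss_regression / ss_total            # b) R-Sq = SSR / SST
s = (ss_residual / df_residual) ** 0.5     # residual standard deviation S = sqrt(MSE)
print(f"SSE = {sse}, R-Sq = {r_sq:.1%}, S = {s:.4f}")
```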

    c) Minitab has identified one unusual observation. This observation has: (circle one) [3]

    i) High leverage: True or False
    ii) High influence: True or False
    iii) Large residual: True or False

    d) The value under Residual has been replaced with XXX. What is this value? [1]

    75.65
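This sketch derives the residual for observation 2 (observed minus fitted). It also recovers the leverage from SE Fit = S × sqrt(h) and computes the standardized residual e / (S × sqrt(1 - h)), which bears on part c). The flagging rules in the comments are the cut-offs we understand Minitab to use: "R" when |St Resid| > 2, and "X" when leverage exceeds 3p/n. Treat them as assumptions to check against Minitab's documentation. Influence is not computed here.

```python
# Values for unusual observation 2, copied from the Q8 output
observed, fitted = 220.00, 144.35
s, se_fit = 17.3137, 3.19     # S and SE Fit
n, p = 30, 2                  # observations; parameters (intercept + slope)

residual = observed - fitted                        # d) residual = observed - fitted
leverage = (se_fit / s) ** 2                        # from SE Fit = S * sqrt(h)
st_resid = residual / (s * (1 - leverage) ** 0.5)   # standardized residual

print(f"residual = {residual:.2f}")
print(f"leverage = {leverage:.3f} (assumed X cut-off 3p/n = {3 * p / n:.2f})")
print(f"St Resid = {st_resid:.2f} (assumed R cut-off: |St Resid| > 2)")
```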


    e) What is the meaning of the slope estimate 0.971? Again, be specific. [2]

    For every one-year increase in age (1), we expect blood pressure to increase by 0.971 units (1).

    f) Does the intercept have any meaning in this analysis? If so, what is the interpretation of the intercept? If not, state why it is meaningless.

    [2]

    Yes (1), this is the expected blood pressure for a newborn child (1).

    g) A researcher wants to use this model to predict the blood pressure of all Toronto residents between the ages of 18 and 70. Assuming we deal with the outlier (by removing it, for example), is this an appropriate use of regression? Explain why or why not using terminology from class.

    [3]

    No (1): the population from which the sample was drawn is only the patients of that particular clinic, not all of Toronto (1). These patients (being members of a medical clinic) are likely less healthy than Torontonians in general.

    Applying the results to the entire city would give biased results (1).

    h) How many ways can I arrange these 30 patients for a photo, assuming they stand in a line?

    [1] 30!, or about 2.6525e32
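The exact value of 30! is easy to confirm, because Python integers have arbitrary precision. Formatting the result in scientific notation reproduces the 2.6525e32 quoted above.

```python
from math import factorial

ways = factorial(30)   # orderings of 30 distinct patients in a line
print(ways)            # 265252859812191058636308480000000
print(f"{ways:.4e}")   # 2.6525e+32
```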

    THE END
