biostats 640 intermediate biostatistics spring 2018


Page 1: BIOSTATS 640 Intermediate Biostatistics Spring 2018

BIOSTATS 640 Exam 1 – Spring 2018 Name ________________________________________________

Z:\bigelow\...\2018\...\BE640 Exam 1 2018.doc Page 1 of 15

BIOSTATS 640 Intermediate Biostatistics

Spring 2018 Examination 1

Units 1 and 2 – Review of Introductory Biostatistics & Regression and Correlation
Due: Monday February 26, 2018

Before you begin: This is a “take-home” exam. You are welcome to use any reference materials you wish, and you are welcome to use the computer as you wish, too. However, you MUST work this exam by yourself, and you may not consult with anyone.

Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photocopy of your exam for safekeeping prior to submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions!! I have them….

How to submit your exam (sorry, faxed exams are NOT permitted):
(1) ONLINE Students. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam1.pdf. Email it to me at: [email protected]
(2) Worcester Section. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam1.pdf. Email it to me at: [email protected]
(3) Amherst Section. Please put your exam (stapled, please) in my mailbox, located in the mail room on the 4th floor of Arnold House. If you are unable to come to Arnold House on Monday February 26, 2018, I will accept a pdf.
(4) ALL. I will also accept exams sent by U.S. Post. Please mail with a postmark no later than February 26, 2018 to:
Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319


Signature
This is to confirm that in completing this exam, I worked independently and did not consult with anyone.
Name: ___________________________________________________________ Date: ___________________________

Thank you!


1. (10 points total)

The following is a table showing the frequency and cumulative frequency distribution of the survival times of 347 patients diagnosed with cancer.

Survival Time (years)   Frequency   Cumulative Frequency
<1                             62                     62
1-2                            45                    107
2-3                            38                    145
3-4                            28                    173
4-5                            25                    198
5-6                            10                    208
6-7                            14                    222
7-8                            11                    233
8-9                             9                    242
9-10                            8                    250
10-11                           8                    258
11-12                           8                    266
12-13                           9                    275
13-14                           5                    280
14-15                           2                    282
15-16                           3                    285
16-17                           4                    289
17-18                           7                    296
18-19                           1                    297
19-20                           3                    300
At least 20                    47                    347
Total                         347

1a. (3 points) What is the modal survival time?

1b. (3 points) Estimate the median survival time.

1c. (2 points) Estimate the smallest value of the mean survival time.

1d. (2 points) In your opinion, are these data symmetric, positively skewed, or negatively skewed? Explain your answer.
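For readers who want to check the arithmetic, here is a minimal Python sketch of the usual grouped-data calculations. It assumes the "<1" class runs from 0 to 1 year, every other bounded class is one year wide, and the open-ended class starts at 20 years. The median is found by linear interpolation at the n/2-th observation (interpolating at the (n+1)/2-th observation instead gives a slightly different value), and the smallest possible mean places every patient at the lower limit of his or her class. The helper name grouped_median is hypothetical.

```python
# Problem 1 grouped data: (lower limit, upper limit, frequency) in years.
# The open-ended class "at least 20" has no upper limit (None).
classes = [
    (0, 1, 62), (1, 2, 45), (2, 3, 38), (3, 4, 28), (4, 5, 25),
    (5, 6, 10), (6, 7, 14), (7, 8, 11), (8, 9, 9), (9, 10, 8),
    (10, 11, 8), (11, 12, 8), (12, 13, 9), (13, 14, 5), (14, 15, 2),
    (15, 16, 3), (16, 17, 4), (17, 18, 7), (18, 19, 1), (19, 20, 3),
    (20, None, 47),
]
n = sum(f for _, _, f in classes)  # 347 patients

# Modal class: the interval with the largest frequency.
modal_lower, modal_upper, _ = max(classes, key=lambda c: c[2])


def grouped_median(classes, n):
    """Interpolate linearly within the class that contains the n/2-th observation."""
    cum = 0
    for lower, upper, f in classes:
        if cum + f >= n / 2:
            if upper is None:
                raise ValueError("median falls in the open-ended class")
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("no observations")


median = grouped_median(classes, n)

# Smallest possible mean: every patient sits at the lower limit of his or her class.
mean_lower_bound = sum(lower * f for lower, _, f in classes) / n

print(f"modal class: [{modal_lower}, {modal_upper}) years")
print(f"median ~ {median:.2f} years")
print(f"mean >= {mean_lower_bound:.2f} years")
```

With these data the modal class is the first one (<1 year), the interpolated median falls just above 4 years, and the mean can be no smaller than 2272/347, about 6.55 years.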


2. (5 points, total) Consider the relationship between the standard deviation and the standard error. Suppose it is known that the standard deviation is 3. How large a sample n should be taken for the standard error of the mean to have a value of 0.5?
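Since the standard error of the mean is SE = σ/√n, solving for n gives n = (σ/SE)². A short Python check, rounding up because a sample size must be a whole number:

```python
import math

sigma = 3.0      # known standard deviation
target_se = 0.5  # desired standard error of the mean

# SE = sigma / sqrt(n)  =>  n = (sigma / SE)^2, rounded up to a whole subject
n_required = math.ceil((sigma / target_se) ** 2)
print(n_required)  # 36
```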


3. (5 points total) Consider the following cross-tabulation of 100 individuals by their smoking status and systolic blood pressure. Shown are counts of individuals. For example, the entry “10” in the first row says that there are 10 non-smokers in this sample with systolic blood pressure between 90 and 109 mm Hg.

Systolic blood pressure, mm Hg   Non-Smokers   Smokers
90-109                                    10         5
110-129                                   24        15
130-149                                   18        10
150-169                                    9         3
170-189                                    2         2
190-209                                    0         3

Consider this distribution as a population and, as such, a universe of possibilities from which simple probabilities can be computed. Define two events, “A” and “B”, as follows:
A = smoker
B = systolic blood pressure of 170 mm Hg or greater.

3a. (1 point) Find Probability [ A ]

3b. (1 point) Find Probability [ B ].

3c. (3 points) Are A and B independent? Explain.
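A Python sketch of the probability calculations, using exact fractions so that the independence check P(A and B) = P(A)·P(B) is not blurred by rounding. Note that the tabulated counts add to 101 rather than the 100 stated in the problem; the sketch simply uses the tabulated counts as the denominator.

```python
from fractions import Fraction

# Counts from the Problem 3 table: SBP class -> (non-smokers, smokers)
table = {
    "90-109": (10, 5),
    "110-129": (24, 15),
    "130-149": (18, 10),
    "150-169": (9, 3),
    "170-189": (2, 2),
    "190-209": (0, 3),
}
high_bp = {"170-189", "190-209"}  # event B: SBP of 170 mm Hg or greater

total = sum(ns + s for ns, s in table.values())
p_a = Fraction(sum(s for _, s in table.values()), total)  # event A: smoker
p_b = Fraction(sum(ns + s for k, (ns, s) in table.items() if k in high_bp), total)
p_ab = Fraction(sum(s for k, (_, s) in table.items() if k in high_bp), total)

# A and B are independent exactly when P(A and B) = P(A) * P(B).
independent = p_ab == p_a * p_b
print(total, p_a, p_b, p_ab, float(p_a * p_b), independent)
```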


4. (10 points total)

4a. (2 points) ________________________ TRUE or FALSE. Consider the construction of a 95% confidence interval. Suppose one repeats the sampling process indefinitely. Suppose further that, for each sample drawn, a new 95% confidence interval estimate is obtained.
True or False → If, for each sample, the investigator states that the population parameter value is contained in the interval, about 95% of these statements will be correct.

4b. (2 points) ________________________ TRUE or FALSE. Consider testing a hypothesis about a population mean parameter for a normal probability distribution. The alternative hypothesis is two-sided. Investigators A and B each draw a sample of size n and compute a sample mean and sample variance. Suppose the sample sizes and means are identical. Suppose further that the sample variance obtained by investigator B is larger.
True or False → The achieved level of significance (the p-value) calculated by investigator B will be smaller.

4c. (2 points) ________________________ TRUE or FALSE. Many journals recognize the tendency among researchers to publish disproportionately many trials suggesting a benefit of an experimental treatment, compared with trials in which a benefit of the experimental treatment is not established. Consider hypothesis tests for which the type I and type II errors each have probability 0.05.
True or False → The phenomenon described above suggests an excess of type II errors in the literature relative to type I errors.

4d. (2 points) ________________________ TRUE or FALSE. Consider the estimation of a population parameter. A non-probability sample is drawn and used to obtain the required estimate.
True or False → Short of defining the sample as the entire population, there is no way to assess the accuracy of the resulting estimate.

4e. (2 points) ________________________ TRUE or FALSE. A one-sample t-test is performed to test the null hypothesis that the mean of a normal distribution with unknown variance parameter is equal to a specified value.
True or False → The value of the t-statistic calculated from the data will achieve a smaller level of significance if the alternative is two-sided than if it is one-sided.
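Statement 4a describes the long-run coverage of a confidence interval procedure, which is easy to see by simulation. The sketch below is illustrative only: the values μ = 50, σ = 10, n = 25, the seed, and the helper name ci_coverage are arbitrary assumptions, not part of the exam. It repeatedly draws a sample, forms the known-σ interval x̄ ± z·σ/√n, and reports the fraction of intervals that contain μ.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=10_000, seed=640):
    """Fraction of known-sigma z intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(ci_coverage())  # close to 0.95
```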


5. (10 points total)

The following table summarizes the survival, in days, of female and male cockroaches (Blattella vaga) kept without food or water.

Group     Sample size, n               Sample mean                              Sample standard deviation
Females   $n_{\text{females}} = 10$    $\bar{X}_{\text{females}} = 8.5$ days    $S_{\text{females}} = 3.6$ days
Males     $n_{\text{males}} = 10$      $\bar{X}_{\text{males}} = 4.8$ days      $S_{\text{males}} = 0.9$ days

Is the greater variability in length of survival observed for females, relative to males, statistically significant? Carry out the appropriate statistical hypothesis test. In developing your answer, please provide the following.

5a. (2 points) What is the null hypothesis?

5b. (2 points) What is the alternative hypothesis?

5c. (2 points) What is the appropriate test statistic and what is its value for these data?

5d. (2 points) What is the p-value?

5e. (2 points) Write a clear interpretation of the results of your test.
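A variance-ratio F test is the standard tool for comparing two normal variances. Python's standard library has no F distribution, so the sketch below evaluates the upper tail through the identity $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$, computing the regularized incomplete beta function with Simpson's rule. The helper names reg_inc_beta and f_upper_tail are hypothetical; if SciPy is available, scipy.stats.f.sf(F, 9, 9) should give the same upper-tail probability. Whether to report the one-sided or the two-sided p-value depends on the alternative hypothesis stated in 5b.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson's rule.
    Adequate here because a, b > 1, so the integrand is finite on [0, x]."""
    if steps % 2:
        steps += 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / steps

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3 / math.exp(log_beta)


def f_upper_tail(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), via the beta-distribution identity."""
    return reg_inc_beta(d2 / (d2 + d1 * f), d2 / 2, d1 / 2)


s_f, n_f = 3.6, 10   # females
s_m, n_m = 0.9, 10   # males
F = s_f**2 / s_m**2  # larger sample variance in the numerator
p_one_sided = f_upper_tail(F, n_f - 1, n_m - 1)
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"F = {F:.2f} on ({n_f - 1}, {n_m - 1}) df; "
      f"one-sided p = {p_one_sided:.2e}; two-sided p = {p_two_sided:.2e}")
```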


6. (10 points total)

6a. (2 points) Consider the construction of a 98% confidence interval of the mean for the setting of a simple random sample from a Normal distribution where the variance parameter $\sigma^2$ is known. What is the value of $z_{1-\alpha/2}$ that is used in this confidence interval construction?

Questions #6b, #6c, #6d, and #6e all pertain to the following: Suppose next that you are told that (262.09, 374.11) is a 95% confidence interval for the mean, calculated from a simple random sample of size n = 80 from a Normal distribution where the variance parameter $\sigma^2$ is known.

6b. (2 points) What is the point estimate of the population mean µ?

6c. (2 points) What is the value of the standard error of the mean?

6d. (2 points) What is the value of the population variance?

6e. (2 points) Write a clear interpretation of the confidence interval.
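The quantities in Problem 6 follow from the structure of a known-variance interval, $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$: the midpoint is $\bar{x}$, the half-width divided by $z_{1-\alpha/2}$ is the standard error, and $\sigma^2 = n \cdot SE^2$. A Python sketch using the standard library's NormalDist for the normal quantiles (rounding $z_{0.975}$ to 1.96 changes only the last digits):

```python
import math
from statistics import NormalDist

z = NormalDist()  # standard normal

# 6a: a 98% interval has alpha = 0.02, so the multiplier is z_{0.99}
z_98 = z.inv_cdf(1 - 0.02 / 2)

# 6b-6d: back out the pieces of the known-variance 95% interval xbar +/- z_{0.975} * SE
lower, upper, n = 262.09, 374.11, 80
z_95 = z.inv_cdf(0.975)
xbar = (lower + upper) / 2            # point estimate: the midpoint
se = (upper - lower) / 2 / z_95       # half-width divided by the multiplier
sigma2 = (se * math.sqrt(n)) ** 2     # SE = sigma / sqrt(n)  =>  sigma^2 = n * SE^2
print(f"z = {z_98:.3f}, xbar = {xbar:.2f}, SE = {se:.2f}, sigma^2 = {sigma2:,.0f}")
```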


7. (10 points total) The distribution of serum levels of alpha tocopherol (serum vitamin E) is approximately normal with mean µ = 860 µg/dL and standard deviation σ = 340 µg/dL .

7a. (3 points) What percent of people have serum alpha tocopherol levels between 400 and 1000 µg/dL?

7b. (3 points) Suppose a person is identified as having toxic levels of alpha tocopherol if his or her serum level is > 2000 µg/dL. What percentage of people will be so identified?

7c. (4 points) A study is undertaken to look for evidence of toxicity among 2000 people who regularly take vitamin E supplements. The investigators found that 4 people have serum alpha tocopherol levels > 2000 µg/dL. Is this an unusual number of people with toxic levels of serum alpha tocopherol?
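A Python sketch of Problem 7, using statistics.NormalDist for the normal probabilities. For 7c it assumes, as one reasonable model, that the 2000 supplement users are independent draws from the same N(860, 340²) distribution, so the number with levels above 2000 µg/dL is binomial; a Poisson approximation with the same mean gives nearly the same tail probability.

```python
import math
from statistics import NormalDist

vit_e = NormalDist(mu=860, sigma=340)  # serum alpha tocopherol, ug/dL

p_a = vit_e.cdf(1000) - vit_e.cdf(400)   # 7a: P(400 < X < 1000)
p_toxic = 1 - vit_e.cdf(2000)            # 7b: P(X > 2000)

# 7c: expected count among 2000 people and the binomial chance of 4 or more
n, observed = 2000, 4
expected = n * p_toxic
p_at_least_4 = 1 - sum(
    math.comb(n, k) * p_toxic**k * (1 - p_toxic) ** (n - k) for k in range(observed)
)
print(f"7a: {p_a:.4f}  7b: {p_toxic:.5f}  expected: {expected:.2f}  P(>=4): {p_at_least_4:.4f}")
```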


8. (10 points total) In a study of crop losses due to air pollution, plots of Blue Lake snap beans were grown in n = 12 open-top field chambers, which were fumigated with various concentrations of sulfur dioxide (X), in ppm. After a month of fumigation, the plants were harvested and the total yield (Y) of bean pods, in kg, was recorded for each chamber. Some preliminary calculations have been performed for you.

n = 12
$\bar{x}$ = 0.12, $s_x$ = 0.11724
$\bar{y}$ = 1.117, $s_y$ = 0.31175
$r_{xy}$ = −0.8506
SSQ(residual) = 0.2955

8a. (2 points) By hand (or in Excel!) calculate the linear regression of Y on X by obtaining the values of the estimated slope ($\hat{\beta}_1$) and intercept ($\hat{\beta}_0$).

8b. (2 points) Produce the analysis of variance table by completing the 10 blank entries in the table below.

Source        Sum of Squares    DF     Mean Square       F-Ratio    P
Regression    ______________    ____   ______________    ________   ______
Residual      ______________    ____   ______________
Total         ______________    ____
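In simple linear regression the least-squares estimates can be obtained from summary statistics alone: $\hat{\beta}_1 = r_{xy} s_y / s_x$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, while the total sum of squares is $(n-1)s_y^2$. A Python sketch of 8a and the ANOVA entries in 8b; the p-value for F on (1, 10) df still has to come from an F table or statistical software.

```python
n, xbar, ybar = 12, 0.12, 1.117
s_x, s_y, r = 0.11724, 0.31175, -0.8506
sse = 0.2955  # SSQ(residual), given

# 8a: least-squares slope and intercept from summary statistics
b1 = r * s_y / s_x
b0 = ybar - b1 * xbar

# 8b: ANOVA pieces; SST = (n - 1) s_y^2 and SSR = SST - SSE
sst = (n - 1) * s_y**2
ssr = sst - sse
df_reg, df_res, df_tot = 1, n - 2, n - 1
msr, mse = ssr / df_reg, sse / df_res
F = msr / mse

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SSR = {ssr:.4f} on {df_reg} df, SSE = {sse:.4f} on {df_res} df, "
      f"SST = {sst:.4f} on {df_tot} df, MSR = {msr:.4f}, MSE = {mse:.5f}, F = {F:.2f}")
```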


8c. (2 points) Under the assumption that the linear model is applicable, calculate a 95% confidence interval estimate of an individual (single chamber) yield of beans exposed to x = 0.24 ppm of sulfur dioxide.

8d. (2 points) Under the assumption that the linear model is applicable, calculate a 95% confidence interval estimate of the mean yield of beans grown under conditions of exposure to x = 0.24 ppm of sulfur dioxide.

8e. (2 points) What percent of the observed variability in yield is explained by the fitted model?
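Continuing from the same summary statistics, a Python sketch of 8c to 8e. It uses the standard formulas with $S_{xx} = (n-1)s_x^2$: the interval for a single chamber uses the standard error $\sqrt{MSE\,(1 + 1/n + (x_0 - \bar{x})^2/S_{xx})}$, and the interval for the mean drops the leading 1. The critical value $t_{0.975,\,10} = 2.228$ is taken from a t table, and $R^2 = r^2$ in simple linear regression.

```python
import math

n, xbar, ybar = 12, 0.12, 1.117
s_x, s_y, r = 0.11724, 0.31175, -0.8506
sse = 0.2955
t_crit = 2.228  # t_{0.975} with n - 2 = 10 df, from a t table

b1 = r * s_y / s_x
b0 = ybar - b1 * xbar
mse = sse / (n - 2)
sxx = (n - 1) * s_x**2

x0 = 0.24
y_hat = b0 + b1 * x0
h0 = 1 / n + (x0 - xbar) ** 2 / sxx  # leverage-type term at x0

# 8c: interval for a single new chamber (a prediction interval)
se_indiv = math.sqrt(mse * (1 + h0))
pi = (y_hat - t_crit * se_indiv, y_hat + t_crit * se_indiv)

# 8d: interval for the mean yield at x0
se_mean = math.sqrt(mse * h0)
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)

# 8e: proportion of variability explained
r2 = r**2
print(f"y_hat = {y_hat:.4f}; individual: ({pi[0]:.3f}, {pi[1]:.3f}); "
      f"mean: ({ci[0]:.3f}, {ci[1]:.3f}); R^2 = {r2:.1%}")
```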


9. (10 points total) To assess physical conditioning in normal individuals, it is useful to know how much energy they are capable of expending. Since the process of expending energy requires oxygen, one way to evaluate this is to look at the rate at which they use oxygen at peak physical activity. To examine the peak physical activity, tests have been designed where the individual runs on a treadmill. At specified time intervals, the speed at which the treadmill moves and the grade of the treadmill both increase. The individual is then systematically run to maximum physical capacity. The maximum capacity is determined by the individual; the person stops when unable to go further. Because physical conditioning is relative to the size of the individual, such measures take into account body size. One of these is VO2 MAX (ml/kg/min); this is computed by looking at the volume of oxygen used per minute per kilogram of body weight. Consider the following multiple predictor regression analysis of n=94 sedentary males with treadmill tests. The dependent (outcome) variable is Y = VO2 MAX . There are four predictors:

X1 = treadmill duration (seconds)
X2 = maximum heart rate (beats/minute)
X3 = height (centimeters)
X4 = weight (kilograms)

A partial display of the regression results is provided.

Coefficients Table
Constant or Predictor       $\hat{\beta}$    SE($\hat{\beta}$)
X1 = treadmill duration      0.0510          0.00416
X2 = max heart rate          0.0191          0.0258
X3 = height                 -0.0320          0.0444
X4 = weight                  0.0089          0.0520
Constant (intercept)         2.89            11.17

Analysis of Variance Table

Source        Sum of Squares    DF     Mean Square     F-Ratio     P
Regression    4,314.69          ____   ____________    _________   _____
Residual      ___________       ____   ____________
Total         5,245.31          ____


9a. (2 points) Compute the t-statistic for testing the adjusted statistical significance of X1 = treadmill duration. What is its achieved significance (the p-value)? Do we reject $H_0: \beta_1 = 0$ at the 10% significance level?

9b. (3 points) Fill in the missing values in the analysis of variance table by completing the 8 blanks below.

Source        Sum of Squares    DF      Mean Square       F-Ratio     P
Regression    4,314.69          _____   ______________    _________   _______
Residual      ______________    _____   ______________
Total         5,245.31          _____

9c. (3 points) Next, test the overall significance of the fitted model. In developing your answer, be sure to state the null and alternative hypotheses. In 1-2 sentences, interpret your findings.

9d. (2 points) What is $R^2$? In reporting your answer, give its numerical value and, in 1 sentence, explain its meaning.
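A Python sketch of the Problem 9 arithmetic. With n = 94 and 4 predictors, the residual df is 94 − 4 − 1 = 89, and the remaining ANOVA entries follow from SSE = SST − SSR. The p-values need t(89) and F(4, 89) tail areas, which the standard library does not provide; with statistics this large, both p-values are far below 0.0001.

```python
n, p = 94, 4                  # subjects, predictors
ss_reg, ss_tot = 4314.69, 5245.31

# 9a: Wald t statistic for X1 = treadmill duration
b1, se_b1 = 0.0510, 0.00416
t1 = b1 / se_b1
df_res = n - p - 1            # 89

# 9b: remaining ANOVA entries
ss_res = ss_tot - ss_reg
df_reg, df_tot = p, n - 1
ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
F = ms_reg / ms_res           # 9c: overall F on (df_reg, df_res) df

# 9d: coefficient of determination
r2 = ss_reg / ss_tot
print(f"t1 = {t1:.2f} on {df_res} df; SSE = {ss_res:.2f}; MSR = {ms_reg:.2f}; "
      f"MSE = {ms_res:.3f}; F = {F:.1f} on ({df_reg}, {df_res}) df; R^2 = {r2:.3f}")
```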


10. (10 points total) Consider a multiple linear regression to evaluate some hypothesized associations with plasma lipid levels of total cholesterol (Y), mg/dL, in a sample of 25 patients suffering from hyperlipoproteinemia. Two predictors were considered:

X1 = weight (kg)
X2 = age (years)

Three models were fit. The table below shows the estimated regression model and the residual sum of squares (SSE) for each model. The corrected total sum of squares is SSY = 145,377.04.

Model   Fitted line                                  Sum of Squares Residual, SSE
1       $\hat{Y} = 199.2975 + 1.622X_1$                135,145.3138
2       $\hat{Y} = 102.5751 + 5.321X_2$                 43,444.3743
3       $\hat{Y} = 77.983 + 0.417X_1 + 5.217X_2$        42,806.2254

10a. (3 points) For each model, what is the predicted cholesterol level $\hat{Y}$ for a 30-year-old patient who weighs 70 kg? Next, suppose the observed cholesterol for this patient is Y = 263 mg/dL. In 1-2 sentences, compare each of the 3 predicted values $\hat{Y}$ with the observed Y = 263 mg/dL.

10b. (3 points) For each model, what is the $R^2$ value?

10c. (4 points) If you use $R^2$ and model simplicity as your selection criteria, what model appears to be the best predictive model?
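A Python sketch of 10a and 10b, evaluating each fitted line at weight 70 kg and age 30 years and using $R^2 = 1 - SSE/SSY$ for each model. The dictionary layout is just one convenient choice.

```python
ssy = 145_377.04  # corrected total sum of squares
weight_kg, age_yr, observed = 70, 30, 263

# model number -> (fitted line as a function of x1 = weight, x2 = age; residual SS)
models = {
    1: (lambda x1, x2: 199.2975 + 1.622 * x1, 135_145.3138),
    2: (lambda x1, x2: 102.5751 + 5.321 * x2, 43_444.3743),
    3: (lambda x1, x2: 77.983 + 0.417 * x1 + 5.217 * x2, 42_806.2254),
}

results = {}
for k, (predict, sse) in models.items():
    y_hat = predict(weight_kg, age_yr)
    r2 = 1 - sse / ssy  # R^2 = (SSY - SSE) / SSY
    results[k] = (y_hat, r2)
    print(f"model {k}: y_hat = {y_hat:.2f}, observed - predicted = {observed - y_hat:+.2f}, R^2 = {r2:.4f}")
```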


11. (10 points total) A psychologist performed a multiple predictor linear regression analysis of anxiety level (Y), measured on a scale ranging from 1 to 50, as the average of an index determined at three points in a 2-week period. Three predictors were considered:

X1 = systolic blood pressure (mm Hg)
X2 = IQ
X3 = job satisfaction, measured on a scale ranging from 1 to 25.

The following table summarizes the results obtained from a “variables-added-in-order” regression on data from a sample of size n=22.

Source              DF    Sum of Squares
Regression
  X1                 1    981.326
  X2 | X1            1    190.232
  X3 | (X1, X2)      1    129.431
Residual, SSE       18    442.292

11a. (5 points) Test for the significance of each independent variable as it enters the model. For each test, state the null and alternative hypotheses in terms of regression coefficient parameters.

11b. (5 points) Test for the significance of adding both X2 and X3 to a model already containing X1. State the null and alternative hypotheses in terms of regression coefficient parameters.
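A Python sketch of the partial F statistics for Problem 11. It follows one common convention for variables-added-in-order tests: each F uses the residual mean square of the model containing the variables entered so far, recovered by adding back the sequential sums of squares of the later terms. Some texts instead divide every sequential sum of squares by the full-model MSE (442.292/18), which gives larger F values for X1 and X2. Critical values or p-values still come from F(1, df) and F(2, 18) tables.

```python
# Sequential ("variables-added-in-order") sums of squares from Problem 11
n = 22
seq_ss = {"X1": 981.326, "X2 | X1": 190.232, "X3 | X1, X2": 129.431}
sse_full, df_full = 442.292, 18   # residual for the model with X1, X2, X3

# 11a: partial F for each term, using the residual of the model at that step
terms = list(seq_ss)
partial_f = {}
for i, term in enumerate(terms):
    later = sum(seq_ss[t] for t in terms[i + 1:])  # SS of terms not yet entered
    sse_step = sse_full + later
    df_step = n - (i + 1) - 1                      # n - (number of predictors) - 1
    partial_f[term] = seq_ss[term] / (sse_step / df_step)  # F on (1, df_step) df

# 11b: multiple-partial (chunk) F for adding X2 and X3 to a model with X1
f_chunk = ((seq_ss["X2 | X1"] + seq_ss["X3 | X1, X2"]) / 2) / (sse_full / df_full)

for term, f in partial_f.items():
    print(f"{term}: F = {f:.3f}")
print(f"X2, X3 | X1: F = {f_chunk:.3f} on (2, {df_full}) df")
```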