
Experimental Design

There is no recovery from poorly collected data!

Upload: ngoquynh

Post on 09-Jun-2018



Vocabulary List

- Look over the list of words.
- Count how many you feel you know.
- Place a “dot” on the number line above that number.

Observational Study vs Experiment

Observational Study – no assignment, no treatments imposed; we only observe what is happening or has already happened. Best done with a random sample. Can show an association (correlation), but not causation.

Experiment – manipulate or control one or more variables, random selection and/or assignment, one or more treatments imposed. If properly done, can determine causation.

Population vs Sample

- Population
  - Census
  - Parameters
- Sample
  - Survey
  - Statistics

Explanatory and Response Variables

We are interested in how each experimental unit (or subject) responds to the explanatory variable (or factor)

- Factors
  - Levels – the different quantities or categories of the factor
- Treatments – combinations of the different factors and levels.
- Response Variable – what we measure to analyze.
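To make "treatments are combinations of factors and levels" concrete, here is a minimal sketch. The factors `fertilizer` and `watering` and their levels are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "fertilizer": ["none", "standard", "extra"],  # 3 levels
    "watering": ["daily", "weekly"],              # 2 levels
}

def list_treatments(factors):
    """Return every combination of one level per factor."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

treatments = list_treatments(factors)
for t in treatments:
    print(t)
print(len(treatments), "treatments")  # 3 levels x 2 levels = 6 treatments
```

With one factor at 3 levels and another at 2 levels, the number of treatments is the product of the level counts.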

Characteristics of a Well-Designed Experiment

- Control
  - Keeping conditions as constant as possible to reduce variation from extraneous variables
  - Doesn’t mean you have to have a “control group” or a “placebo”
  - Blinding is another form of control
- Replication
  - Having at least two experimental units in each treatment group
  - Reduces chance variation
- Randomization
  - Experimental units are randomly assigned to treatments to “average out” variation from variables that cannot be controlled and to reduce bias.
- Blocking (sometimes) – used when you believe that subgroups of the population will respond differently; helps reduce variability.
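The randomization and blocking ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: the unit names, the treatment labels `A`, `B`, `C`, and the two blocks `young` and `old` are all hypothetical, and the helper functions are not from the slides.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, seed=None):
    """Run a separate completely randomized assignment inside each block."""
    rng = random.Random(seed)
    return {
        name: completely_randomized(members, treatments, seed=rng.randrange(2**32))
        for name, members in blocks.items()
    }

# Hypothetical experimental units and treatments.
units = [f"unit{i:02d}" for i in range(1, 13)]
treatments = ["A", "B", "C"]

assigned = completely_randomized(units, treatments, seed=1)

# Hypothetical blocks: subgroups we believe will respond differently.
blocks = {"young": units[:6], "old": units[6:]}
blocked = randomized_block(blocks, treatments, seed=1)
```

Each treatment group here receives four units (two per treatment within each block), so the replication requirement of at least two units per group is met.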

Bias – systematic deviation from the truth, when our sample does not represent the population.

Sources/Types of Bias
- Undercoverage
- Voluntary Response Bias
- Response/Measurement Bias
- Nonresponse Bias

Sampling Methods

Good sampling methods help avoid bias and ensure that samples are representative of the population, so that results can be generalized to the population.
- Simple Random Sample (SRS)
- Stratified Sample
- Cluster Sample
- Systematic Sample
- Multistage Sample
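The five methods listed above can be compared side by side in a short sketch. Everything here is an assumption made for illustration: a hypothetical school of 120 students, with grades treated as strata and homerooms treated as clusters.

```python
import random

# Hypothetical sampling frame: 4 grades x 3 homerooms x 10 students = 120 students.
GRADES = ["freshman", "sophomore", "junior", "senior"]
frame = [(grade, room, i) for grade in GRADES for room in range(3) for i in range(10)]

# Strata are the grades; clusters are the homerooms (both groupings are assumptions).
strata = {g: [s for s in frame if s[0] == g] for g in GRADES}
clusters = {}
for s in frame:
    clusters.setdefault((s[0], s[1]), []).append(s)

def simple_random_sample(frame, n, rng):
    """Every group of n students is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [s for name in chosen for s in clusters[name]]

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose clusters, then take an SRS within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [s for name in chosen for s in rng.sample(clusters[name], n_per_cluster)]

rng = random.Random(2018)  # fixed seed only so the sketch is repeatable
srs = simple_random_sample(frame, 20, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(frame, 6, rng)
multi = multistage_sample(clusters, 4, 5, rng)
```

All five calls return 20 students, but only the stratified sample guarantees that every grade is represented, and the cluster sample covers just two homerooms.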

Tips:

- The real advantage of stratified sampling is that it often allows us to make more accurate inferences about a population than an SRS does. Strata are homogeneous.

- Cluster samples are ideal when clusters mirror the characteristics of the population (clusters are heterogeneous; if they are not, you must select a large number of clusters).

More tips…

- Voluntary Response and Convenience Sampling – Don’t Go There (not reliable due to bias)

- Cluster samples are not stratified samples, and neither one is an SRS.
- Systematic sampling is a type of random sampling, but it does not produce an SRS.
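A small enumeration shows why a systematic sample is random yet not an SRS. The six-unit frame and the interval `k = 3` are hypothetical values picked to keep the lists short.

```python
from itertools import combinations

# Hypothetical frame of six units; sample size 2, so the interval is k = 3.
frame = ["A", "B", "C", "D", "E", "F"]
k = 3

# Systematic sampling: one possible sample per starting position.
systematic = [tuple(frame[start::k]) for start in range(k)]

# SRS: every pair of units is a possible sample, each equally likely.
srs = list(combinations(frame, 2))

print(len(systematic), "possible systematic samples:", systematic)
print(len(srs), "possible simple random samples")

# Chance that unit "A" is included under each method.
p_systematic = sum("A" in s for s in systematic) / len(systematic)
p_srs = sum("A" in s for s in srs) / len(srs)
```

Each unit has the same 1-in-3 chance of selection under both methods, but the systematic scheme can only ever produce 3 of the 15 possible pairs. An SRS requires every possible sample of the given size to be equally likely, which is why the systematic sample does not qualify.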

Why SRS?

- Utilizes the entire sampling frame, so it is free of classification error
- Requires minimum advance knowledge of the population
- Relatively easy to interpret data
- Best used when efficiency is less important than simplicity, the cost of sampling is small enough, and there is little prior knowledge about the population

Videos

- Bad Science
- Read “FDA Approves Placebo”
- Placebo Effect

Making Inferences

- Inferences about a population require that the individuals taking part in a study be randomly selected from the larger population.

- Inferences about cause and effect require a well-designed experiment that randomly assigns treatments to experimental units.

Questions from the 2002 Multiple Choice Exam

1. Which of the following is a key distinction between well designed experiments and observational studies?

(A) More subjects are available for experiments than for observational studies.
(B) Ethical constraints prevent large-scale observational studies.
(C) Experiments are less costly to conduct than observational studies.
(D) An experiment can show a direct cause-and-effect relationship, whereas an observational study cannot.
(E) Tests of significance cannot be used on data collected from an observational study.

#9

A volunteer for a mayoral candidate's campaign periodically conducts polls to estimate the proportion of people in the city who are planning to vote for this candidate in the upcoming election. Two weeks before the election, the volunteer plans to double the sample size in the polls. The main purpose of this is to

(A) reduce nonresponse bias
(B) reduce the effects of confounding variables
(C) reduce bias due to the interviewer effect
(D) decrease the variability in the population
(E) decrease the standard deviation of the sampling distribution of the sample proportion
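As background for this question, the usual formula for the spread of a sample proportion shows what doubling the sample size does. This is a sketch that assumes random sampling from a population much larger than the sample; the symbols $p$ (the true proportion) and $n$ (the sample size) are notation introduced here, not taken from the slides.

```latex
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
\qquad\Longrightarrow\qquad
\sigma_{\hat{p}}^{(2n)} = \sqrt{\frac{p(1-p)}{2n}}
  = \frac{1}{\sqrt{2}}\,\sigma_{\hat{p}}
  \approx 0.707\,\sigma_{\hat{p}}
```

Doubling $n$ shrinks the standard deviation of $\hat{p}$ by a factor of $\sqrt{2}$, not 2. It does nothing to bias or to the variability of the population itself.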

#15

A high school statistics class wants to conduct a survey to determine what percentage of students in the school would be willing to pay a fee for participating in after-school activities. Twenty students are randomly selected from each of the freshman, sophomore, junior, and senior classes to complete the survey. This plan is an example of which type of sampling?

(A) Cluster
(B) Convenience
(C) Simple random
(D) Stratified random
(E) Systematic

#16

Jason wants to determine how age and gender are related to political party preference in his town. Voter registration lists are stratified by gender and age-group. Jason selects a simple random sample of 50 men from the 20 to 29 age-group and records their age, gender, and party registration (Democratic, Republican, neither). He also selects an independent simple random sample of 60 women from the 40 to 49 age-group and records the same information. Of the following, which is the most important observation about Jason's plan?

(A) The plan is well conceived and should serve the intended purpose.
(B) His samples are too small.
(C) He should have used equal sample sizes.
(D) He should have randomly selected the two age groups instead of choosing them nonrandomly.
(E) He will be unable to tell whether a difference in party affiliation is related to differences in age or to the difference in gender.

#22

A study of existing records of 27,000 automobile accidents involving children in Michigan found that about 10 percent of children who were wearing a seatbelt (group SB) were injured and that about 15 percent of children who were not wearing a seatbelt (group NSB) were injured. Which of the following statements should NOT be included in a summary report about this study?

(A) Driver behavior may be a potential confounding factor.
(B) The child's location in the car may be a potential confounding factor.
(C) This study was not an experiment, and cause-and-effect inferences are not warranted.
(D) This study demonstrates clearly that seat belts save children from injury.
(E) Concluding that seatbelts save children from injury is risky, at least until the study is independently replicated.

#25

A new medication has been developed to treat sleep-onset insomnia (difficulty in falling asleep). Researchers want to compare this drug to a drug that has been used in the past by comparing the length of time it takes subjects to fall asleep. Of the following, which is the best method for obtaining this information?

(A) Have subjects choose which drug they are willing to use, then compare the results.
(B) Assign the two drugs to the subjects on the basis of their past sleep history without randomization, then compare the results.
(C) Give the new drug to all subjects on the first night. Give the old drug to all subjects on the second night. Compare the results.
(D) Randomly assign the subjects to two groups, giving the new drug to one group and no drug to the other group, then compare the results.
(E) Randomly assign the subjects to two groups, giving the new drug to one group and the old drug to the other group, then compare the results.

2006A #5 - Shrimp

5. A biologist is interested in studying the effect of growth-enhancing nutrients and different salinity (salt) levels in water on the growth of shrimps. The biologist has ordered a large shipment of young tiger shrimps from a supply house for use in the study. The experiment is to be conducted in a laboratory where 10 tiger shrimps are placed randomly into each of 12 similar tanks in a controlled environment. The biologist is planning to use 3 different growth-enhancing nutrients (A, B, and C) and two different salinity levels (low and high).

(a) List the treatments that the biologist plans to use in this experiment.

2006A #5

(b) Using the treatments listed in part (a), describe a completely randomized design that will allow the biologist to compare the shrimps' growth after 3 weeks.

(c) Give one statistical advantage to having only tiger shrimps in the experiment. Explain why this is an advantage.

(d) Give one statistical disadvantage to having only tiger shrimps in the experiment. Explain why this is a disadvantage.

2006B #5

5. When a tractor pulls a plow through an agricultural field, the energy needed to pull that plow is called the draft. The draft is affected by environmental conditions such as soil type, terrain, and moisture. A study was conducted to determine whether a newly developed hitch would be able to reduce draft compared to the standard hitch. (A hitch is used to connect the plow to the tractor.) Two large plots of land were used in this study. It was randomly determined which plot was to be plowed using the standard hitch. As the tractor plowed that plot, a measurement device on the tractor automatically recorded the draft at 25 randomly selected points in the plot. After the plot was plowed, the hitch was changed from the standard one to the new one, a process that takes a substantial amount of time. Then the second plot was plowed using the new hitch. Twenty-five measurements of draft were also recorded at randomly selected points in this plot.

2006B #5

(a) What was the response variable in this study? Identify the treatments. What were the experimental units?

(b) Given that the goal of the study is to determine whether a newly developed hitch reduces draft compared to the standard hitch, was randomization used properly in this study? Justify your answer.

(c) Given that the goal of the study is to determine whether a newly developed hitch reduces draft compared to the standard hitch, was replication used properly in this study? Justify your answer.

(d) Plot of land is a confounding variable in this experiment. Explain why.

2007A #2 (Dog Health)

2. As dogs age, diminished joint and hip health may lead to joint pain and thus reduce a dog's activity level. Such a reduction in activity can lead to other health concerns such as weight gain and lethargy due to lack of exercise. A study is to be conducted to see which of two dietary supplements, glucosamine or chondroitin, is more effective in promoting joint and hip health and reducing the onset of canine osteoarthritis. Researchers will randomly select a total of 300 dogs from ten different large veterinary practices around the country. All of the dogs are more than 6 years old, and their owners have given consent to participate in the study. Changes in joint and hip health will be evaluated after 6 months of treatment.

2007A #2 (Dog Health)

(a) What would be an advantage to adding a control group in the design of this study?

(b) Assuming a control group is added to the other two groups in the study, explain how you would assign the 300 dogs to these three groups for a completely randomized design.

(c) Rather than using a completely randomized design, one group of researchers proposes blocking on clinics, and another group of researchers proposes blocking on breed of dog. How would you decide which one of these two variables to use as a blocking variable?