161.120 Introductory Statistics Week 5 Lecture slides About Relationships CAST chapter 6 Text sections 5.5 and 6.4 Surveys and Experiments Text chapters 3 and 4 CAST chapter 7

Upload: marybeth-simmons

Post on 26-Dec-2015


Page 1: 161.120 Introductory Statistics Week 5 Lecture slides About Relationships –CAST chapter 6 –Text sections 5.5 and 6.4 Surveys and Experiments –Text chapters

161.120 Introductory Statistics Week 5 Lecture slides

• About Relationships – CAST chapter 6

– Text sections 5.5 and 6.4

• Surveys and Experiments – Text chapters 3 and 4

– CAST chapter 7


Relationships

• So far we have examined relationships

– between two numerical variables (scatterplots, correlation and least squares)

– between two categorical variables (contingency tables, conditional proportions and stacked bar charts)

• Interpreting such relationships can be harder than you might imagine, and care must be taken.

• In some situations, the relationship between two variables, such as the relationship evident in a scatterplot, may not describe a meaningful 'real' relationship.


Groups and relationships

Examples of questions about differences between two or more groups, where group membership is represented by a categorical variable:

• Does application of a surface coating affect the hardness of a plastic?

– Is there a relationship between the coating and hardness?

• Are children from large families more or less likely to go to university?

– Is there a relationship between number of siblings and attendance at university?

• Which of three different varieties of corn has the greatest yield?

– Is there a relationship between corn variety and yield?

• Do boys aged 14 perform better than girls at maths?

– Is there a relationship between gender and mark in a maths test?


Means of sub-groups and overall

• Moving values from one group to another can increase the means of both groups.

– Example: the emigration joke that the large number of New Zealanders who moved to Australia in the 1980s raised the average IQ of both countries.

• If membership of sub-groups is defined differently in different data sets, comparisons of the sub-group means can give a very different impression from a comparison of the overall means.

• It is possible for group A to have a higher overall mean than group B, but lower means within all sub-groups.
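Both points can be checked with small numbers. In the first part of this Python sketch, moving a value that lies below the first group's mean but above the second group's mean raises both means (the emigration joke). In the second part, the overall mean is a weighted average of the sub-group means, so group A can be lower in every sub-group yet higher overall when most of its values fall in the high-scoring sub-group. All scores, means and sizes are hypothetical, invented for illustration, and `move_value` and `overall_mean` are illustrative helper names.

```python
from statistics import mean

# --- Part 1: moving one value can raise both group means ---
def move_value(source, destination, value):
    """Return copies of both groups after moving one value from source to destination."""
    new_source = list(source)
    new_source.remove(value)              # removes the first occurrence of value
    return new_source, list(destination) + [value]

nz = [85, 95, 100, 110, 120]   # hypothetical scores, mean 102
au = [80, 90, 94, 96]          # hypothetical scores, mean 90
# 95 lies below the first group's mean (102) but above the second's (90).
nz_after, au_after = move_value(nz, au, 95)
print(mean(nz), mean(nz_after))   # mean rises from 102 to 103.75
print(mean(au), mean(au_after))   # mean rises from 90 to 91

# --- Part 2: higher overall mean, but lower within every sub-group ---
# Hypothetical (sub-group mean, number of values) pairs.
group_a = {"sub-group 1": (60, 90), "sub-group 2": (20, 10)}
group_b = {"sub-group 1": (65, 10), "sub-group 2": (25, 90)}

def overall_mean(subgroups):
    """Overall mean = sub-group means weighted by sub-group sizes."""
    total = sum(m * n for m, n in subgroups.values())
    size = sum(n for _, n in subgroups.values())
    return total / size

print(overall_mean(group_a), overall_mean(group_b))   # 56.0 and 29.0
```

Group A has 90 of its 100 values in the high-mean sub-group, while group B has 90 of its 100 in the low-mean sub-group; that difference in composition, not the sub-group means, drives the overall comparison.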


Lurking Variables

• The marginal relationship between X and Y can be very different from the conditional relationship for specific values of a third variable, Z.

• Z, the third variable, is called a lurking variable. Compared with their conditional relationship given Z, the marginal relationship between X and Y can be

– stronger,

– weaker, or even

– in the opposite direction.

Example: Reading ability and height


Simpson's paradox

• The relationship between two categorical variables, X and Y, can also be strongly influenced by a third lurking variable, Z.

• When the direction of the relationship reverses, the effect is called Simpson's paradox.

• There is no real contradiction; it just takes a bit more thought to understand why your initial intuition is wrong.

• Example – The relationship between smoking and dying.


Extreme effects of lurking variables

• In some situations, the lurking variable Z is perfectly related to X, so it is impossible to distinguish between their effects on Y. The variables X and Z are then said to be confounded.

• If a lurking variable Z is confounded with X, the data contain no information about the nature of the relationship between X and Y.

• It is critically important that confounding is avoided when data are collected.

• Example – Experiment to compare the yield of a new variety of wheat with the standard variety.


Example 6.8 Blood Pressure and Oral Contraceptive Use

Hypothetical data on 2400 women: oral contraceptive use and whether each woman had high blood pressure were recorded.

The percentage with high blood pressure is about the same among oral contraceptive users and nonusers.


Example 6.8 Blood Pressure and Oral Contraceptive Use (cont)

Many factors affect blood pressure. If users and nonusers differ with respect to such a factor, the factor confounds the results. Blood pressure increases with age and users tend to be younger.

In each age group, the percentage with high blood pressure is higher for users than for nonusers => Simpson’s Paradox.
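A reversal like this can be reproduced with any table in which the lurking variable is related to both of the other variables. The counts below are hypothetical and are NOT the figures from Example 6.8; they are chosen so that users are mostly in the younger, low-risk group, which produces a full reversal in the overall comparison. The helper names are illustrative.

```python
# Hypothetical counts: (number with high blood pressure, number of women).
# Invented for illustration; not the data from Example 6.8.
counts = {
    "younger": {"user": (50, 500), "nonuser": (16, 200)},
    "older":   {"user": (30, 100), "nonuser": (150, 600)},
}

def conditional_rates(table):
    """Proportion with high blood pressure in each (age group, use) cell."""
    return {age: {use: high / n for use, (high, n) in cells.items()}
            for age, cells in table.items()}

def marginal_rates(table):
    """Proportion with high blood pressure for each use group, ignoring age."""
    totals = {}
    for cells in table.values():
        for use, (high, n) in cells.items():
            prev_high, prev_n = totals.get(use, (0, 0))
            totals[use] = (prev_high + high, prev_n + n)
    return {use: high / n for use, (high, n) in totals.items()}

print(conditional_rates(counts))   # users higher in both age groups
print(marginal_rates(counts))      # users lower overall
```

Users have the higher rate in both age groups (10% vs 8%, 30% vs 25%) but the lower rate overall (about 13% vs about 21%), because most users are in the younger group, where high blood pressure is rare.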


Association

• When two variables are related, we say that there is association between them.

• For example, consider the height, X, and weight, Y, of a sample of school children.

Tall children tend to be heavier, so high values of X are associated with high values of Y.

The correlation coefficient describes the amount of linear association between two such numerical variables.
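The correlation coefficient can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal Python sketch, using hypothetical heights and weights for five children (`pearson_r` is an illustrative name):

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient: sum of cross-products of deviations, divided by
    the square root of the product of the sums of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) of five school children.
heights = [120, 125, 130, 135, 140]
weights = [22, 25, 27, 31, 33]
print(round(pearson_r(heights, weights), 3))   # about 0.995
```

Because r measures only linear association, it can be 0 even when Y is completely determined by X: for x = −1, 0, 1 and y = x² (that is, 1, 0, 1), `pearson_r` returns 0.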


Causal relationships

• In some data sets, it is possible to conclude that one variable has a direct influence on the other. This is called a causal relationship.

• Example - A farmer tries two different worming treatments on herds of sheep that are otherwise identical. Any difference between the proportions of infected sheep after the two treatments would suggest that the treatment causally affects infection.

• If two variables are causally related, it is possible to conclude that changes to the explanatory variable, X, will have a direct impact on Y.


Non-causal relationships

• Not all relationships are causal.

• If two variables are not causally related, it is impossible to tell whether changes to one variable, X, will result in changes to the other variable, Y.


Is a relationship causal?

• Researchers usually hope to find causal relationships between the variables that are recorded.

• Causality can only be determined by reasoning about how the data were collected.

• The data values themselves contain no information that can help you to decide.

• Non-causal relationships usually result from lurking variables that are related to the variables under investigation. Causal relationships can only be deduced if it can be reasoned that lurking variables are not present.


Observational study: the researcher observes units and records data from each unit.

Experiment: the researcher actively changes some characteristics of the units before the data are collected.


Observational studies and experiments

• The method of data collection has a major influence on whether a relationship can be interpreted as causal.

• Observational studies – potential for a lurking variable, so difficult to interpret relationships.

• Experiments – often do allow relationships to be interpreted as causal.

– In a well-designed experiment, there is little chance of lurking variables driving the observed relationships, so an observed relationship can usually be interpreted as causal.

– In a badly designed experiment, however, lurking variables (and confounding) can still make relationships difficult to interpret.


Census

• A census is when measurements are made from every item in the target population.

• A census is often not feasible:

– The cost and time required to record information from every unit in a large population can be immense.

– Recording some variables destroys the units. For example, testing the resistance of apples to bruising leaves them damaged.

• Fortunately, we can often obtain sufficiently accurate information by only measuring a selection of units from the population.


Sample

• Data from a subset of the population is called a sample.

• The simplest way to select a representative sample from a population is called a simple random sample.

– Each unit has the same chance of being selected

– Some random mechanism is used to determine whether any particular unit is included in the sample.

• Although there is some inaccuracy when a sample is used instead of the whole population, the savings in cost and time often outweigh this.
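In software, the "random mechanism" for a simple random sample is usually a pseudo-random number generator. A minimal Python sketch using the standard library's `random.sample`, which selects without replacement and gives every unit the same chance of inclusion; the numbered list of 270 units and the helper name are hypothetical:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)      # seeded pseudo-random generator, for repeatability
    return rng.sample(population, n)

students = list(range(1, 271))     # hypothetical numbered list of 270 units
print(simple_random_sample(students, 10, seed=1))
```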


Sampling Error

• The aim of a random sample is often to estimate some population characteristic. The population characteristic of interest might be:

– the mean of some variable (e.g. the mean lifetime of a breed of dog or the mean weight of cabbages grown on a farm), or

– the proportion in some category (e.g. the proportion of students aged over 30 or the proportion of cows that fail to get pregnant in a year).

• The population characteristic is unknown, but the corresponding value from a sample can be used to estimate it.

• The difference between an estimate and the value being estimated is called the sampling error.


Effect of sample size on sampling error

• The larger the sample size, the smaller the sampling error tends to be. However, when the population is large, sampling a small proportion of the population may still give accurate estimates.

• Sampling error depends much more strongly on the sample size than on the proportion of the population that is sampled.

– For example, a sample of 10 from a population of 10,000 people will estimate the proportion of males almost as accurately as a sample of 10 from a population of 100.

• The cost savings from using a sample instead of a full census can be huge.
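The sample-size claim can be checked with the standard formula for the standard error of a sample proportion under simple random sampling without replacement, SE = √(p(1 − p)/n × (N − n)/(N − 1)), where the second factor is the finite population correction. The sketch assumes a true proportion p = 0.5, an illustrative choice, and `se_proportion` is a hypothetical helper name:

```python
from math import sqrt

def se_proportion(p, n, N):
    """Standard error of a sample proportion for a simple random sample of
    size n drawn without replacement from a population of size N."""
    fpc = (N - n) / (N - 1)        # finite population correction
    return sqrt(p * (1 - p) / n * fpc)

for N in (100, 10_000):
    print(N, round(se_proportion(0.5, 10, N), 3))
# N = 100 gives about 0.151; N = 10,000 gives about 0.158
```

Going from N = 100 to N = 10,000 changes the standard error by only about 5%, whereas increasing n from 10 to 100 in the large population reduces it by a factor of about three.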


Different sampling schemes

• Sampling with replacement (SWR)

– Each selected unit / individual is returned to the population.

– A sample with replacement can contain the same unit / individual from the population more than once.

• Sampling without replacement (SWOR)

– Each selected unit / individual is removed from the population.

– No unit / individual can appear more than once in the sample.

– SWOR covers more of the population and gives more accurate estimates than SWR.
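In Python's standard library the two schemes correspond to two different functions: `random.choices` draws with replacement and `random.sample` draws without replacement. A short sketch with a hypothetical population of five labelled units:

```python
import random

rng = random.Random(2015)               # seeded so the run is repeatable
population = ["A", "B", "C", "D", "E"]  # hypothetical units

swr = rng.choices(population, k=4)      # with replacement: repeats are possible
swor = rng.sample(population, k=4)      # without replacement: no repeats
print(swr, swor)
```

Asking `rng.sample` for more units than the population contains raises a `ValueError`, which mirrors the fact that SWOR cannot select more units than exist; `rng.choices` has no such limit.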


Practical differences

• If the sample size, n, is much smaller than the population size, N, there is little practical difference between SWR and SWOR: there would be little chance of the same individual being picked twice in SWR.

• When the population is large (and considerably larger than the sample size), SWR and SWOR are almost identical.

• If the population size is infinite, SWR and SWOR are identical.
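How small is "little chance"? Under SWR, the probability that all n draws are different units is N/N × (N − 1)/N × … × (N − n + 1)/N, so the chance of at least one repeat is one minus that product. A minimal Python sketch (`prob_repeat_swr` is a hypothetical helper name):

```python
from math import prod   # Python 3.8+

def prob_repeat_swr(n, N):
    """Probability that a sample of size n drawn WITH replacement from N units
    contains at least one unit more than once."""
    if n > N:
        return 1.0      # a repeat is certain
    # Probability that all n draws are distinct: N/N * (N-1)/N * ... * (N-n+1)/N
    return 1 - prod((N - i) / N for i in range(n))

print(round(prob_repeat_swr(10, 100), 3))      # about 0.372
print(round(prob_repeat_swr(10, 10_000), 4))   # about 0.0045
```

With n = 10, a repeat is fairly likely when N = 100 (about 37%) but very unlikely when N = 10,000 (under 0.5%), which is why SWR and SWOR behave almost identically for large populations.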


Choosing a Simple Random Sample

You need:

• A list of the units in the population.

• A source of random numbers.

Portion of a Table of Random Digits:

ROW
0 00157 37071 79553 31062 42411 79371 25506 69135
1 38354 03533 95514 03091 75324 40182 17302 64224
2 59785 46030 63753 53067 79710 52555 72307 10223
3 27475 10484 24616 13466 41618 08551 18314 57700
4 28966 35427 09495 11567 56534 60365 02736 32700
5 98879 34072 04189 31672 33357 53191 09807 85796
6 50735 87442 16057 02883 22656 44133 90599 91793
7 16332 40139 64701 46355 62340 22011 47257 74877
8 83845 41159 67120 56273 67519 93389 83590 12944
9 12522 20743 28607 63013 60346 71005 90348 86615


Simple Random Sample of Students

A class has 270 students, and we want a simple random sample of 10 students.


1. Number the units: students are numbered 001 to 270.

2. Choose a starting point: row 3, 2nd column (10484…).

3. Read off consecutive numbers (3-digit labels here): 104, 842, 461, 613, 466, 416, 180, 855, 118, 314, 577, 002, 896, …

4. If a number corresponds to a label, select that unit. If not, or if that unit has already been selected, skip it. Continue until the desired sample size is obtained.
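The four steps can be sketched in Python using the same portion of the random digit table. Reading starts at row 3, 2nd column, groups the digits into consecutive 3-digit labels, and skips labels outside 001 to 270 as well as labels already chosen, so the result is a sample without replacement. The function names are illustrative, not from the course material.

```python
# Rows 0-9 of the portion of the random digit table on the slide.
TABLE = [
    "00157 37071 79553 31062 42411 79371 25506 69135",
    "38354 03533 95514 03091 75324 40182 17302 64224",
    "59785 46030 63753 53067 79710 52555 72307 10223",
    "27475 10484 24616 13466 41618 08551 18314 57700",
    "28966 35427 09495 11567 56534 60365 02736 32700",
    "98879 34072 04189 31672 33357 53191 09807 85796",
    "50735 87442 16057 02883 22656 44133 90599 91793",
    "16332 40139 64701 46355 62340 22011 47257 74877",
    "83845 41159 67120 56273 67519 93389 83590 12944",
    "12522 20743 28607 63013 60346 71005 90348 86615",
]

def digit_stream(table, start_row, start_col):
    """Yield digits left to right, row by row, starting at the given 5-digit
    block (start_col is 0-based, so the 2nd column is 1)."""
    for r in range(start_row, len(table)):
        blocks = table[r].split()
        for block in blocks[start_col if r == start_row else 0:]:
            yield from block

def labels(digits, width=3):
    """Group consecutive digits into numbers of the given width."""
    buf = ""
    for d in digits:
        buf += d
        if len(buf) == width:
            yield int(buf)
            buf = ""

def srs_from_table(table, N, n, start_row, start_col):
    """Keep labels 1..N that have not already been chosen until n are found."""
    chosen = []
    for label in labels(digit_stream(table, start_row, start_col)):
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("table ran out before the sample was complete")

print(srs_from_table(TABLE, N=270, n=10, start_row=3, start_col=1))
# [104, 180, 118, 2, 94, 156, 204, 189, 191, 98]
```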


4.3 Other Sampling Methods

It is not always practical to take a simple random sample, because it can be difficult to get a numbered list of all units.

Example: College administration would like to survey a sample of students living in dormitories.

Shaded squares show a simple random sample of 30 rooms.


Stratified Random Sampling

Divide the population of units into groups (called strata) and take a simple random sample from each of the strata.

College survey: Two strata = undergrad and graduate dorms.

Take a simple random sample of 15 rooms from each of the strata for a total of 30 rooms.

Ideal: stratify so that there is little variability in responses within each of the strata.
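A minimal Python sketch of stratified random sampling for the college survey: a separate simple random sample is drawn from each stratum. The room labels and stratum sizes (300 undergraduate rooms, 100 graduate rooms) are hypothetical, and `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

# Hypothetical room lists for the two strata.
strata = {
    "undergrad": [f"U{i:03d}" for i in range(1, 301)],
    "graduate":  [f"G{i:03d}" for i in range(1, 101)],
}
sample = stratified_sample(strata, {"undergrad": 15, "graduate": 15}, seed=7)
print(sum(len(rooms) for rooms in sample.values()))   # 30 rooms in total
```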


Cluster Sampling

Divide the population of units into groups (called clusters), take a random sample of clusters and measure only those items in these clusters.

College survey: Each floor of each dorm is a cluster.

Take a random sample of 5 floors and all rooms on those floors are surveyed.

Advantage: need only a list of the clusters instead of a list of all individuals.
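A corresponding Python sketch of cluster sampling: whole clusters are chosen at random and every unit in them is included, so only the list of clusters is needed to make the selection. The dorm layout (4 dorms, 6 floors each, 10 rooms per floor) and the function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}

# Hypothetical dorms: each (dorm, floor) pair is one cluster of 10 rooms.
clusters = {
    (dorm, floor): [f"{dorm}-{floor}{room:02d}" for room in range(1, 11)]
    for dorm in ("A", "B", "C", "D")
    for floor in range(1, 7)
}
sample = cluster_sample(clusters, 5, seed=3)
rooms = [r for floor_rooms in sample.values() for r in floor_rooms]
print(len(sample), len(rooms))   # 5 floors, 50 rooms
```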


Systematic Sampling

Order the population of units in some way, select one of the first k units at random and then every kth unit thereafter.

College survey: Order list of rooms starting at top floor of 1st undergrad dorm. Pick one of the first 11 rooms at random => room 3, then pick every 11th room after that.

Note: often a good alternative to random sampling but can lead to a biased sample.
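A minimal Python sketch of systematic sampling. The random start is drawn from the first k positions; an explicit `start` argument (0-based) reproduces the slide's draw of room 3 with k = 11. The ordered list of 100 rooms and the function name are hypothetical.

```python
import random

def systematic_sample(units, k, start=None, seed=None):
    """Pick one of the first k units at random, then every kth unit after it.
    `start` (0-based) can be supplied to reproduce a particular draw."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return units[start::k]

rooms = list(range(1, 101))    # hypothetical ordered list of 100 rooms
print(systematic_sample(rooms, k=11, start=2))
# [3, 14, 25, 36, 47, 58, 69, 80, 91]
```

The bias risk arises when the ordering has a pattern that repeats with period k: if, hypothetically, every 11th room on the list were a corner room, the sample would consist entirely of corner rooms or contain none.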


Multistage Sampling

A combination of the sampling methods is used at various stages.

Example:

• Stratify the population by region of the country.

• For each region, stratify by urban, suburban, and rural and take a random sample of communities within those strata.

• Divide the selected communities into city blocks as clusters, and sample some blocks.

• Everyone on the block or within the fixed area may then be sampled.


Example 4.7 The Nationwide Personal Transportation Survey

Nationwide Personal Transportation Survey: taken every 5 years by the U.S. Department of Transportation.

1995 Survey = 21,000 households. Interviews conducted by telephone using a computer-assisted telephone interviewing (CATI) system.

Multistage Sample:

• U.S. households were stratified by region of country, size of metropolitan area, and whether there is a subway system.

• Households were then selected by random-digit dialing.

• Everyone in a selected household was included => each household was a cluster.


Non-sampling error

• Problems often arise when sampling; for example, some sampled people are likely to refuse to participate in your study.

• Such difficulties also result in errors and these are called non-sampling errors.

– These errors can be much larger than sampling errors.

– They are usually much more serious.

• Unlike sampling errors, the likely size of non-sampling errors cannot be estimated from a single sample -- it is extremely difficult to assess their likely size.

• Non-sampling errors often distort estimates by pulling them in one direction. The estimates are then called biased. (Sampling errors do not cause estimates to be consistently low or high.)

• It is therefore important to design a survey to minimise the risk of non-sampling errors.


'Missing' Responses

• Failure to obtain information from some members of the target population

• Coverage error occurs when the sample is not selected from the target population, but from only part of the target population. As a result, the estimates that are obtained do not describe the whole target population -- only a subgroup of it.

• Non-response error occurs when some selected individuals do not respond. This may be caused by:

– Failure to contact the individuals

– Refusal to participate in the study

– Refusal to answer particular questions


'Inaccurate' responses

• Non-sampling error can be caused by inaccurate information being obtained from the sampled individuals

• Instrument error usually results from poorly designed questions. Different wording of questions can lead to different answers being given by a respondent.

• Interviewer error occurs when some characteristic of the interviewer, such as age or sex, affects the way in which respondents answer questions. For example, questions about racial discrimination might be differently answered depending on the racial group of the interviewer.


Methods of obtaining sample information

• Whatever sampling scheme is used, information must be obtained from each individual selected in the sample.

• When sampling items produced by a factory or trees in a forest, the process of obtaining measurements from each item is usually fairly straightforward.

• However there are various options for collecting information from human populations.

– Telephone

– Mailed questionnaire

– Interviewer

– Street corner

– Self-selected


Disasters in Sampling

Responses from a self-selected group, convenience sample or haphazard sample are rarely representative of any larger group.

Example 4.10 A Meaningless Poll

“Do you support the President’s economic plan?” Results from a TV quickie poll and a proper study:

Those who were dissatisfied were more likely to respond to the TV poll, and it did not give a “not sure” option.


Experiments

• An experiment looks for a causal relationship between a response and one or more explanatory variables.

• The researcher can control the values of the explanatory variable that are used.

• For example, a researcher may wish to determine...

How does ozone affect the yield of soybean plants?

Does aspirin lower blood pressure after an operation?

Which of four varieties of sweet corn has the highest yield?

How much will the weight of chicken eggs be reduced if their feed quality is reduced?


Experimental Units

• Experiments are generally conducted on a set of experimental units

– Examples: people, animals, trees, areas in a field, herds of cows

• Response measurement/s are made from each unit

• The definition of the experimental units is closely associated with the response measurement that will be taken.

– For example, if a farmer is interested in the milk yield of a herd of cows, it may be decided that monthly measurements will be made from each cow. Each combination of a cow and a month would be considered to be a separate experimental unit.


Factors and treatments

• The researcher has control over some aspect of each unit

– perhaps a numerical characteristic such as the temperature at which a plant is grown or a categorical characteristic such as the variety of plant that is planted.

• These controlled characteristics are the explanatory variables and are called factors in the context of an experiment.

• The different values of the controlled characteristics are called experimental treatments.

• Each experimental unit receives some treatment.

• The decision about which of the different treatments is applied to each experimental unit is called the experimental design.


Differences between experimental units

• The experimental units are usually not identical; they have characteristics that affect the response.

• Experiment on sheep

– The sheep in a herd will have a variety of ages, weights and other characteristics. Even with no treatment applied, the weight gain in 6 months will vary from sheep to sheep.

• Experiment on flowers grown in a greenhouse

– Some plants will be nearer to the windows, heat sources, ventilation, etc. than others. Some plants will also be naturally more vigorous than others. The concept of 'experimental unit' includes everything about a plant other than the treatment that is applied (perhaps different amounts of fertiliser), including location, soil and genetics.


Lurking Variables & Misleading results

• The differing characteristics of the experimental units can be lurking variables if they are associated with the treatment.

• The recorded effect of a treatment may be very different from its true effect if characteristics of the experimental units are lurking variables.

• This problem can be avoided by good experimental design, which minimizes association between allocation of the treatments and characteristics of the experimental units.


Randomisation

• The best way to avoid association between differing characteristics of the experimental units and the treatments is to randomly allocate treatments to the experimental units.

• This is called randomisation of the treatments and the experimental design is called a completely randomised design.

• The simplest way to randomise the treatments involves numbering each experimental unit. Random numbers can be used to pick the desired number of units (without replacement) to receive the first treatment. Then the second treatment can be allocated, etc.
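The numbering-and-picking procedure can be sketched in Python: units for the first treatment are picked without replacement, then units for the second treatment from those remaining, and so on. The plant numbers, treatment names, group sizes and function name are hypothetical.

```python
import random

def completely_randomised(units, group_sizes, seed=None):
    """Pick units without replacement for the first treatment, then for the
    second treatment from those left, and so on."""
    if sum(group_sizes.values()) != len(units):
        raise ValueError("group sizes must add up to the number of units")
    rng = random.Random(seed)
    remaining = list(units)
    allocation = {}
    for treatment, size in group_sizes.items():
        chosen = rng.sample(remaining, size)
        for u in chosen:
            remaining.remove(u)
            allocation[u] = treatment
    return allocation

# Hypothetical: 12 numbered plants, 3 fertiliser levels, 4 plants each.
plan = completely_randomised(range(1, 13), {"none": 4, "low": 4, "high": 4}, seed=42)
for unit in sorted(plan):
    print(unit, plan[unit])
```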


Randomization: The Crucial Element

Randomizing the Type of Treatment: Randomly assigning the treatments to the experimental units keeps the researchers from making assignments favorable to their hypotheses and also helps protect against hidden or unknown biases.

Randomizing the Order of Treatments: If all treatments are applied to each unit, randomization should be used to determine the order in which they are applied.


Case Study 3.2 Kids and Weight Lifting

Randomized experiment involving 43 young volunteers.

Three groups: 1 = heavy load, 2 = moderate load, 3 = control group.

Is weight training good for children? If so, is it better to lift heavy weights for few repetitions or moderate weights more times?

“Leg extension strength significantly increased in both exercise groups compared with that in the control subjects.” Faigenbaum et al., 1999, p. e5


Variation & Replication

• Causes of variation in a completely randomised experiment

– Treatments

• The experiment is usually conducted specifically to determine how the treatments affect the response.

– Random variation

• Refers to all variation in the response that cannot be explained in terms of the treatment. It can involve measurement errors and differences between the experimental units.

• There must be enough data to estimate random variation separately from variation caused by the treatments.

• The easiest way to do this is with repeat measurements for each treatment: replication.


Blocking

• When more is known about the differences between the experimental units, we can improve on randomisation.

• In a randomised block design, the experimental units are grouped into blocks, with all units in a block similar in some way.

– The block sizes should be a multiple of the number of treatments.

– The treatments are then allocated at random within each block.

– Blocking is done to reduce natural variation within blocks and therefore give more accurate estimates of treatment effects.

• This design improves on a completely randomised design because the blocks are guaranteed to have no association with the treatments and therefore cannot correspond to a lurking variable.
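A minimal Python sketch of a randomised block allocation: within each block, the treatment labels (each repeated equally, since the block size is a multiple of the number of treatments) are shuffled over the units. Because every block receives every treatment equally often, block membership cannot be associated with treatment. The greenhouse blocks, plant labels and function name are hypothetical.

```python
import random

def randomised_block(blocks, treatments, seed=None):
    """Within each block, shuffle the treatments (each repeated equally) over the units."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r}: size must be a multiple of the number of treatments")
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)                  # random allocation within this block only
        plan[block] = dict(zip(units, labels))
    return plan

# Hypothetical greenhouse: plants grouped into blocks by bench position.
blocks = {
    "near window": ["p1", "p2", "p3"],
    "middle":      ["p4", "p5", "p6"],
    "near heater": ["p7", "p8", "p9", "p10", "p11", "p12"],
}
plan = randomised_block(blocks, ["A", "B", "C"], seed=0)
for block, allocation in plan.items():
    print(block, allocation)
```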


Control Groups, Placebos, and Blinding

Control Groups:

Treated identically in all respects except they don’t receive the active treatment. Sometimes they receive a dummy treatment or a standard/existing treatment.

Placebo: looks like the real drug but has no active ingredient. Placebo effect = people respond to placebos.

Blinding: Single-blind = participants do not know which treatment they have received. Double-blind = neither the participant nor the researcher making measurements knows who had which treatment.

Double Dummy: each group is given two “treatments”…

Group 1 = real treatment 1 and placebo treatment 2

Group 2 = placebo treatment 1 and real treatment 2


Pairing and Blocking

Matched-Pair Designs

Either two matched individuals are used, or the same individual receives each of the two treatments. This is a special case of a block design. It is important to randomize the order of the two treatments and to use blinding if possible.

Block Designs

Experimental units are divided into homogeneous groups called blocks, and each treatment is randomly assigned to one or more units in each block.

If the blocks are individuals and the units are repeated time periods in which they receive different treatments, the design is called a repeated-measures design.