Experiment Notes

Y520 Spring 2000 Page 1

Michael Y520

    Experimental Method

    "The best method, indeed the only fully compelling method, of establishing causation is to conduct a carefully

    designed experiment in which the effects of possible lurking variables are controlled. To experiment means to

    actively change x and to observe the response in y" (p. 202).

    Moore, D., & McCabe, D. (1993). Introduction to the practice of statistics. New York: Freeman.

    "The experimental method is the only method of research that can truly test hypotheses concerning cause-and-effect

    relationships. It represents the most valid approach to the solution of educational problems, both practical and

    theoretical, and to the advancement of education as a science" (p. 298).

    Gay, L. R. (1992). Educational research (4th ed.). New York: Merrill.

    Importance of Good Design: (http://www.tufts.edu/~gdallal/study.htm)

    "100% of all disasters are failures of design, not analysis." Ron Marks, Toronto, August 16, 1994.

    "To propose that poor design can be corrected by subtle [statistical] analysis techniques is contrary to good scientific

    thinking." Stuart Pocock (Controlled Clinical Trials, p. 58), regarding the use of retrospective adjustment for trials

    with historical controls.

    "Issues of design always trump issues of analysis." G. E. Dallal, 1999, explaining why it would be wasted effort to

    focus on the analysis of data from a study under challenge whose design was fatally flawed.

    Unique Features of Experiments:

    1. The investigator manipulates a variable directly (the independent variable).

    2. Empirical observations based on experiments provide the strongest argument for cause-effect relationships.

    Additional features:

    1. Problem statement → theory → constructs → operational definitions → variables → hypotheses.

    2. The research question (hypothesis) is often stated as the alternative to a null hypothesis, which is

    used to interpret differences in the empirical data.

    3. Random sampling of subjects from the population (ensures the sample is representative of the population).

    4. Random assignment of subjects to treatment and control (comparison) groups (ensures equivalency of groups;

    i.e., unknown variables that may influence the outcome are equally distributed across groups).

    5. Extraneous variables are controlled by 3 & 4 and other procedures if needed.

    6. After treatment, performance of subjects (dependent variable) in both groups is compared.
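    The random-assignment step (features 4 and 6 above) can be sketched in a few lines of Python. The pool of 20 subject IDs and the even split are hypothetical details for illustration; the point is that shuffling, not the researcher's judgment, determines group membership.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subject pool and split it evenly into treatment
    and control groups. In expectation, shuffling distributes unknown
    subject characteristics equally across the two groups."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Hypothetical pool of 20 subject IDs.
treatment, control = randomly_assign(range(20), seed=0)
```

    After treatment, the dependent variable would be measured in both groups and compared (feature 6).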

    Ways to control extraneous variables:

    1. Random assignment of subjects to groups. This is the best way to control extraneous variables in

    experimental research. Provides control for subject characteristics, maturation, and statistical regression.

    2. Variables that may still exist:

    a. Subject mortality (i.e., dropouts due to treatment)

    b. Hawthorne effect

    c. Fidelity of treatment (manipulation check)

    d. Data collector bias (double-blind studies)

    e. Location, history

    3. Additional procedures for controlling extraneous variables (use as needed)

    a. Exclude certain variables.

    b. Blocking.

    c. Matching subjects on certain characteristics.

    d. Use subject as own control.

    e. Analysis of covariance.

    Y520 Spring 2000 Page 2

    True Experimental Designs

    A. Randomized Post-test Only Control Group Design

    Treatment R X1 O R = random assignment

    Comparison R X2 O X = Treatment occurs for X1 only

    O = Observation (dependent variable)

    This is the best of all designs for experimental research. Random assignment controls for subject characteristics,
    maturation, and statistical regression.

    Potential threats not controlled: subject mortality, Hawthorne effect, fidelity of treatment, data collection bias, unique

    features of location, history of subjects.

    B. Randomized Pretest Post-test Control Group Design

    Treatment R O1 X1 O2 R = random assignment

    Comparison R O1 X2 O2 X = Treatment occurs for X1 only

    O1 = Observation (Pre-test)

    O2 = Observation (Post-test, dependent variable)

    Potential threat: effect of pre-testing.

    C. Randomized Solomon Four Group Design

    Treatment R O1 X1 O2 R = random assignment

    Comparison R O1 X2 O2 X = Treatment occurs for X1 only

    O1 = Observation (Pre-test)

    Treatment R X1 O2

    Comparison R X2 O2

    O2 = Observation (Post-test, dependent variable)

    Random sampling, random assignment.

    Best control of threats to internal validity, particularly the threat introduced by pretesting.

    Requires a relatively large number of subjects.

    D. Randomized Assignment with Matching

    1. Randomized (Sampling & Assignment), Matched Ss, Post-test only, Control Group

    Treatment M,R X1 O M = Matched Subjects

    R = Random assignment of matched pairs

    Comparison M,R X2 O X = Treatment (for X1 only)

    O = Observation (dependent variable)

    Example: An experimenter wants to test the impact of a novel instructional program in formal logic. The investigator

    infers from reports in the literature that high ability students and those with programming, mathematical, or

    music backgrounds are likely to excel in formal logic regardless of type of instruction. The experimenter

    randomly samples subjects, looks at subjects' SAT scores, matches subjects on the basis of SAT scores, and randomly

    assigns matched pairs (one of each pair to each group). The other concomitant variables (previous

    programming, mathematical, and music experience) could also be matched.
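    The match-then-randomize procedure in this example can be sketched as follows. The subject IDs and SAT scores are hypothetical; the sketch ranks subjects, pairs adjacent ones, and flips a coin within each pair, dropping any subject left without a match (a problem the notes return to below).

```python
import random

def match_and_assign(scores, seed=None):
    """Mechanical matching: rank subjects on the matching variable
    (e.g., SAT score), pair adjacent subjects, then randomly assign one
    member of each pair to treatment and the other to control. A subject
    left without a match is dropped."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    treatment, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        treatment.append(pair[0])
        control.append(pair[1])
    return treatment, control

# Hypothetical SAT scores for five subjects; "e" has no match and is dropped.
sat = {"a": 1400, "b": 1390, "c": 1200, "d": 1190, "e": 980}
treatment, control = match_and_assign(sat, seed=0)
```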

    Y520 Spring 2000 Page 3

    2. Randomized Pretest-Post-test Control Group, Matched Ss

    Treatment O1 M,R X1 O2 O1 = Pretest

    M = Matched Subjects

    Comparison O1 M,R X2 O2 R = Random assignment of matched pairs

    X = Treatment (for X1 only)

    O2 = Observation (dependent variable)

    Subjects are matched on the basis of their pretest score and pairs of subjects are randomly assigned to groups.

    3. Matching Methods

    a. Mechanical matching

    1). Rank order subjects on variable, take top two, randomly assign members of pairs to groups. Repeat for

    all pairs.

    2). Problems:

    Impossible to match on more than one or two variables simultaneously.

    May need to eliminate some Ss due to no appropriate match for one of the groups.

    b. Statistical Matching

    1). The purpose is to control for factors that cannot be randomized but nonetheless can be measured on (at

    least) an interval scale (but in practice we often treat ordinal scales as if they were interval). Statistical
    control is achieved by measuring one or more concomitant variables (referred to as the covariate) in

    addition to the variable (variate) of primary interest (i.e., the dependent or response variable).

    Statistical control can be used in experimental designs and because no direct manipulation of subjects

    or conditions is required, it can also be used in quasi-experimental and non-experimental designs.

    2). Analysis of covariance is used to test the main and interaction effects of categorical variables on a

    continuous dependent variable, controlling for the effects of selected other continuous variables which

    covary with the dependent. The control variable is called the covariate.

    (http://www2.chass.ncsu.edu/garson/pa765/ancova.htm)

    3). To control a covariate statistically means the same as to adjust for the covariate, to correct for the

    covariate, to hold it constant, or to partial it out. (http://www.psych.uiuc.edu/~mho/psy307a.html)

    4). But see:

    Loftin, L., & Madison, S. (1991). The extreme dangers of covariance corrections. In B. Thompson
    (Ed.), Advances in educational research: Substantive findings, methodological developments

    (Vol. 1, pp. 133-148). Greenwich, CT: JAI Press. (ISBN: 1-55938-316-X)

    Thompson, B. (1992). Misuse of ANCOVA and related "statistical control" procedures. Reading

    Psychology, 13, iii-xviii.
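    The idea behind statistical control can be sketched with plain numpy rather than a full ANCOVA model: measure a covariate, regress it out of the outcome, and compare the covariate-adjusted group means. The data, the built-in treatment effect of 5, and the seed are all hypothetical; a real analysis would fit a proper ANCOVA (and heed the warnings in the references above).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
group = np.repeat([0, 1], n)              # 0 = control, 1 = treatment
pretest = rng.normal(50, 10, 2 * n)       # covariate (hypothetical)
# Outcome depends on the covariate plus a built-in treatment effect of 5.
y = 0.8 * pretest + 5 * group + rng.normal(0, 3, 2 * n)

# Statistical control: regress the outcome on the covariate, then
# compare the covariate-adjusted (residual) means of the two groups.
slope, intercept = np.polyfit(pretest, y, 1)
adjusted = y - (slope * pretest + intercept)
effect = adjusted[group == 1].mean() - adjusted[group == 0].mean()
# `effect` recovers approximately the built-in treatment effect of 5.
```

    Because the groups were randomly assigned, the covariate is (in expectation) unrelated to group membership, so adjusting for it sharpens the estimate rather than biasing it.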

    Y520 Spring 2000 Page 4

    Pre-Experimental Designs

    A. One-Shot Case Study

    X O X = treatment

    O = Observation (dependent variable)

    Problems: No control group; cannot tell if treatment had any effect.

    Comments from Campbell and Stanley (1963): "As has been pointed out (e.g., Boring, 1954; Stouffer, 1949), such studies have such a total absence of

    control as to be of almost no scientific value" (p. 6).

    "Basic to scientific evidence (and to all knowledge-diagnostic processes including the retina of the eye) is the

    process of comparison, of recording differences, or of contrast. Any appearance of absolute knowledge, or

    intrinsic knowledge about singular isolated objects, is found to be illusory upon analysis. Securing scientific

    evidence involves making at least one comparison" (p. 6).

    "It seems well-nigh unethical... to allow, as theses or dissertations in education, case studies of this nature

    (i.e., involving a single group observed at one time only)" (p. 7).

    B. One-Group Pretest-Post-test Design

    O1 X O2 O1 = Pretest
    X = treatment

    O2 = Observation (dependent variable)

    Problems: No control group. Changes between pre- and post-test may be due not to the treatment but to:

    history, maturation, instrument decay, data collection characteristics, data collection bias, testing, statistical

    regression, attitude of subjects, problems with implementation, etc.

    C. Static-group comparison design

    X O1 X = treatment

    O2 O1, O2 = Observation (dependent variable)

    Intact, existing groups are used. No random selection of subjects; no random assignment to groups. No way to ensure
    equivalence of groups.

    Comments from Campbell and Stanley (1963):

    "Instances of this kind of research include, for example, the comparison of school systems which require the

    bachelor's degree of teachers (the X) versus those which do not; the comparison of students in classes given

    speed-reading training versus those not given it; the comparison of those who heard a certain TV program with

    those who did not, etc." (p. 12).

    "There is ... no formal means of certifying that the groups would have been equivalent had it not been for the

    X.... If O1 and O2 differ, this difference could well have come through the differential recruitment of persons

    making up the groups: the groups might have differed anyway, without the occurrence of X" (p. 12).

    Y520 Spring 2000 Page 5

    Quasi-Experimental Designs

    No random sampling of subjects. Intact groups often used.

    No random assignment of Ss to groups. Confidence in equivalency of groups is lower.

    A. Matching-only Group Design

    Treatment M X1 O X = treatment

    Control M X2 O

    B. Matching-only Pretest-Post test Group Design

    Treatment O1 M X1 O2 O1 = Pretest

    X1 = treatment

    Control O1 M X2 O2 O2 = Post test

    Existing, intact groups.

    Subjects matched on one or more variables; can't be certain if groups are equivalent on remaining unmatched

    variables.

    Matching is never a substitute for random sampling and random assignment to groups.

    C. Single Group Time Series Design

    "The essence of the time-series design is the presence of a periodic measurement process on some group or

    individual and the introduction of an experimental change into this time series of measurements, the results of

    which are indicated by a discontinuity in the measurements recorded in the time series" (Campbell & Stanley,

    1963, p. 37).

    O1 O2 O3 O4 O5 X1 O6 O7 O8 O9 O10 X1 = treatment
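    The discontinuity Campbell and Stanley describe can be illustrated with a toy series. The observation values below are hypothetical; a real time-series analysis would model trend and autocorrelation rather than just comparing means.

```python
# Hypothetical observations O1..O10 for one group, with treatment X1
# introduced between O5 and O6.
series = [10, 11, 10, 12, 11,   # O1..O5 (before X1)
          16, 17, 16, 18, 17]   # O6..O10 (after X1)

before, after = series[:5], series[5:]
shift = sum(after) / len(after) - sum(before) / len(before)
# A large shift relative to the ordinary period-to-period variation
# marks a discontinuity at the point of treatment.
```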

    Factorial Designs

    Requires, at a minimum, two levels of variable A crossed with two levels of variable B. That is, all levels of A

    occur with all levels of B.

    Factorial designs enable the investigator to observe an interaction, if one exists. An interaction means
    that the effect of one independent variable on the dependent variable differs across the levels of another independent variable.

    "Let us suppose that three types of teachers are all, in general, effective (e.g., the spontaneous extemporizers,

    the conscientious preparers, and the close supervisors of student work). Similarly, three teaching methods in

    general turn out to be equally effective (e.g., group discussion, formal lecture, and tutorial). In such a case...,

    teaching methods could plausibly interact strongly with types, the spontaneous extemporizer doing best with

    group discussion and poorest with tutorial, and the close supervisor doing best with tutorial and poorest with

    group discussion" (Campbell & Stanley, 1963, p. 29).
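    The crossover interaction in Campbell and Stanley's example can be made concrete with hypothetical cell means for a 2x2 slice of the design (two teacher types x two methods; all numbers invented for illustration):

```python
# Hypothetical cell means (outcome scores) for a 2x2 slice of the
# teacher-type x teaching-method factorial described above.
means = {
    ("extemporizer", "discussion"): 85,
    ("extemporizer", "tutorial"):   70,
    ("supervisor",   "discussion"): 70,
    ("supervisor",   "tutorial"):   85,
}

def simple_effect(teacher):
    """Discussion-minus-tutorial effect for one teacher type."""
    return means[(teacher, "discussion")] - means[(teacher, "tutorial")]

# An interaction exists when the simple effects differ across the
# levels of the other factor; here they cross over (+15 vs. -15),
# even though each method is equally effective on average.
interaction = simple_effect("extemporizer") - simple_effect("supervisor")
```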

    Threats to Internal Validity

    Is the investigator's conclusion correct? Are the changes in the independent variable indeed responsible for the observed

    variation in the dependent variable? Or might the variation in the dependent variable be attributable to other causes?

    This is the question of internal validity. The following list is from Campbell and Stanley (1963) as interpreted by

    Kirk (1995):

    1. History. Events other than the administration of a treatment level that occur between the time the treatment

    level is assigned to subjects and the time the dependent variable is measured may affect the dependent

    variable.

    2. Maturation. Processes not related to the administration of a treatment level that occur within subjects

    simply as a function of the passage of time (growing older, stronger, larger, more experienced, and so on) may

    affect the dependent variable.

    3. Testing. Repeated testing of subjects may result in familiarity with the testing situation or acquisition of

    information that can affect the dependent variable.

    Y520 Spring 2000 Page 6

    4. Instrumentation. Changes in the calibration of a measuring instrument, shifts in the criteria used by

    observers and scorers, or unequal intervals in different ranges of a measuring instrument can affect the

    measurement of the dependent variable.

    5. Statistical regression. When the measurement of the dependent variable is not perfectly reliable, there is a

    tendency for extreme scores to regress or move toward the mean. Statistical regression operates to (a) increase

    the scores of subjects originally found to score low on a test, (b) decrease the scores of subjects originally

    found to score high on a test, and (c) not affect the scores of subjects at the mean of the test. The amount of

    statistical regression is inversely related to the reliability of the test.

    6. Selection. Differences among the dependent-variable means may reflect prior differences among the
    subjects assigned to the various levels of the independent variable.

    7. Mortality. The loss of subjects in the various treatment conditions may alter the distribution of subject

    characteristics across the treatment groups.

    8. Interactions with selection. Some of the foregoing threats to internal validity may interact with selection to

    produce effects that are confounded with or indistinguishable from treatment effects. Among these are

    selection-history effects and selection-maturation effects. For example, selection-maturation effects occur

    when subjects with different maturation schedules are assigned to different treatment levels.

    9. Ambiguity about the direction of causal influence. In some types of research (for example,

    correlational studies) it may be difficult to determine whether X is responsible for the change in Y or vice

    versa. This ambiguity is not present when X is known to occur before Y.

    10. Diffusion or imitation of treatments. Sometimes the independent variable involves information that is

    selectively presented to subjects in the various treatment levels. If the subjects in different levels can

    communicate with one another, differences among the treatment levels may be compromised.

    11. Compensatory rivalry by respondents receiving less desirable treatments. When subjects in some

    treatment levels receive goods or services generally believed to be desirable and this becomes known to

    subjects in treatment levels that do not receive those goods and services, social competition may motivate the

    subjects in the latter group, the control subjects, to attempt to reverse or reduce the anticipated effects of the

    desirable treatment levels. Saretsky (1972) named this the John Henry effect in honor of the steel driver who,

    upon learning that his output was being compared with that of a steam drill, worked so hard that he

    outperformed the drill and died of overexertion.

    12. Resentful demoralization of respondents receiving less desirable treatments. If subjects learn that the

    treatment level to which they have been assigned received less desirable goods or services, they may

    experience feelings of resentment and demoralization. Their response may be to perform at an abnormally low

    level, thereby increasing the magnitude of the difference between their performance and that of units assigned

    to the desirable treatment level.
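    Regression toward the mean (threat 5 above) is easy to demonstrate by simulation. The population parameters, error variance, and cutoffs below are hypothetical; the point is that with imperfectly reliable measurement, extreme groups drift toward the mean on retest even with no treatment at all.

```python
import numpy as np

rng = np.random.default_rng(1)
true_score = rng.normal(100, 15, 10_000)
# Two administrations of an imperfectly reliable test: observed
# score = true score + independent measurement error.
test1 = true_score + rng.normal(0, 10, 10_000)
test2 = true_score + rng.normal(0, 10, 10_000)

low = test1 < 80      # subjects who scored low the first time
high = test1 > 120    # subjects who scored high the first time
# With no treatment whatsoever, both extreme groups move toward the
# mean on retest: the low group's average rises, the high group's falls.
low_change = test2[low].mean() - test1[low].mean()     # positive
high_change = test2[high].mean() - test1[high].mean()  # negative
```

    This is why, as the notes say, the amount of regression is inversely related to the reliability of the test: shrinking the error term toward zero shrinks both changes toward zero.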

    Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL:

    Rand McNally.

    Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.