experimental psychology review - wofford...

Chapters 1-13

Experimental Psychology Review

Basic descriptive stats and distributions

Measures of central tendency

Mean, median, mode

Measures of variability

Standard deviation and variance

Range

Types of distributions

Normal, positive or negative skew

Effect of extreme scores

Graphs & types of measurements

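Sample standard deviation: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$

Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$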

Calculating standard deviation (s)

1. Calculate each deviation score: score minus mean (X - M)

2. Square the deviations (unsquared deviations sum to zero)

3. Sum the squared deviations (SS)

4. Divide by N or N - 1; this step gives the variance. Use N for a population; use N - 1 to estimate the population from a sample

5. Take the square root of the value; this returns to the original metric (undoes the squaring)

RTs    X - M     (X - M)^2
512    -52.47    2753.101
587     22.53     507.6009
590     25.53     651.7809
578     13.53     183.0609
567      2.53       6.4009
533    -31.47     990.3609
573      8.53      72.7609
529    -35.47    1258.121
577     12.53     157.0009
572      7.53      56.7009
572      7.53      56.7009
591     26.53     703.8409
575     10.53     110.8809
577     12.53     157.0009
534    -30.47     928.4209

Avg (M) = 564.4667

Sum of (X - M)^2 = 8593.734

Variance (sum divided by N - 1) = 613.8381

SD (square root of sum / (N - 1)) = 24.77576
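As a check on the worked table, the five steps can be sketched in Python using the 15 reaction times listed above. This is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Reaction times (RTs) copied from the table
rts = [512, 587, 590, 578, 567, 533, 573, 529,
       577, 572, 572, 591, 575, 577, 534]

m = mean(rts)                           # mean of the scores
deviations = [x - m for x in rts]       # step 1: score minus mean
squared = [d ** 2 for d in deviations]  # step 2: square each deviation
ss = sum(squared)                       # step 3: sum of squares (SS)
variance = ss / (len(rts) - 1)          # step 4: divide by N - 1 (sample)
sd = sqrt(variance)                     # step 5: back to the original metric

print(f"M = {m:.4f}, SS = {ss:.3f}, variance = {variance:.4f}, SD = {sd:.5f}")
# The library routine uses the same N - 1 formula
print(f"statistics.stdev gives {stdev(rts):.5f}")
```

The unsquared deviations sum to (approximately) zero, which is why step 2 squares them before summing.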

Scales of measurement

Nominal, ordinal, interval, ratio (p. 61)

Also distinguished as discrete vs. continuous variables

Qualitative vs. quantitative

Example grade distribution

A 1

A- 2

B+ 3

B 7

B- 8

C+ 6

C 3

C- 2

D 1

F 1

[Bar chart: frequency (0 to 9) of each letter grade, A through F]

N = 34

M = 80.38

Median = 81

Mode = B-

Z-scores

Examine a score in relation to a distribution of scores

Convert scores to a standard score (z-score)

Z-score: the number of standard deviations a score lies from the sample or population mean

If score is above mean = positive score

If score is below mean = negative score

Assume a standard normal distribution

µ = 0

σ = 1

$z = \frac{x - \mu}{\sigma}$

Reliability

“Consistency and stability of a measuring instrument” (p. 65)

Observed score = true score + error

Types of errors:

Method error (e.g. test situation, equipment error)

Trait error (e.g. fatigue, health, truthfulness)

Types of reliability: test-retest reliability, alternative forms reliability, split-half reliability, inter-rater reliability

Reliability is measured with a correlation coefficient: -1 to 0 to +1

.70 - 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak

Validity

Does measure provide info on what we really want to measure? Is it useful for what we want?

Multiple types of validity:

Content validity: representative of sample of behaviors to be measured

Criterion validity: accurately predicts behavior (concurrent and predictive validity)

Construct validity: accurately measures construct

Face validity: appear valid on surface

Validity is not all-or-none, but on a scale

Other types:

Internal validity: eliminate extraneous variables

External validity: findings will generalize to other contexts

Correlations

Direction of relationship:

Positive: As value of 1 variable increases, so does the other

Direct correlation

Negative: As value of 1 variable increases, the other decreases

Indirect correlation

No relationship

Magnitude, size or strength of relationship:

-1.00 to 0 to +1.00 (“correlation coefficient”)

0 = no relationship

±1 = perfect relationship (one variable predicts the other exactly)
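A correlation coefficient's direction and magnitude can be sketched in a few lines of Python. This is only an illustration: the paired scores are made up, the function names are hypothetical, and the strength labels reuse the cutoffs given under Reliability (.70, .30), applied here to the absolute value of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over sqrt(SS_x * SS_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / sqrt(ss_x * ss_y)

def strength(r):
    """Label the magnitude, ignoring the sign (direction)."""
    size = abs(r)
    if size >= 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"

# Hypothetical paired scores, for illustration only
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, quiz_score)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f} ({direction}, {strength(r)})")
```

The sign carries the direction of the relationship and the absolute value carries its strength, so r = -.80 is a stronger relationship than r = +.40.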

Hypothesis testing

Null vs. alternative hypothesis

When you reject the null hypothesis:

“The findings are statistically significant.”

One vs. two tailed test

Critical region: defines the “unlikely event” for the H0 distribution; bounded by the critical value (cv)

Alpha (α): upper probability value for the critical region

p-value: probability of obtaining the result (or one more extreme) if H0 is true

Inferential statistic: z-test

Z-score:

Comparison of score with population distribution in terms of SD from population mean

Z-test:

Comparison of sample mean with sampling distribution

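Sampling distribution’s mean and standard error:

$\mu_{\bar{X}} = \mu$

$\sigma_{\bar{X}} < \sigma$

So… $\sigma_{\bar{X}} = \sigma / \sqrt{N}$

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$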
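The contrast between a z-score (one score against the population) and a z-test (a sample mean against the sampling distribution) can be sketched as follows. The population values µ = 100 and σ = 15, the observed value of 110, and the function names are assumptions chosen for illustration; they are not stated in the slides.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 100, 15  # assumed population IQ parameters (hypothetical)

def z_score(x, mu=MU, sigma=SIGMA):
    """Distance of one score from the population mean, in SD units."""
    return (x - mu) / sigma

def z_test(sample_mean, n, mu=MU, sigma=SIGMA):
    """Distance of a sample mean from mu, in standard-error units."""
    standard_error = sigma / sqrt(n)          # sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

single_z = z_score(110)          # one subject scoring 110
mean_z, p = z_test(110, n=10)    # ten subjects averaging 110

print(f"z-score for one subject: {single_z:.2f}")
print(f"z-test for mean of 10:   z = {mean_z:.2f}, p = {p:.3f}")
```

The same 10-point difference is less than one SD for a single subject but more than two standard errors for a mean of 10 subjects, because the sampling distribution is narrower than the population distribution.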

[Figure: “IQ for 1 Subject”, density curve over IQ (50 to 150)]

[Figure: “Mean IQ for 10 Subjects”, frequency histogram of mean IQ (85 to 115)]

Errors

                           Actual situation
Experimenter’s decision    No effect (H0 true)    Effect (H0 false)
Reject H0                  Type I error           Correct decision
Retain H0                  Correct decision       Type II error

Type I error: conclude there was an effect when there actually wasn’t; the risk of that is α

Type II error: conclude there wasn’t an effect when there actually was an effect; its probability is also called β

Statistical Power

What is the probability of making the correct decision?

If a treatment effect exists, either…

We correctly detect the effect or…

We fail to detect the effect (Type II error or β)

So, the probability of correctly detecting the effect is 1 - β

Power: probability that test will correctly reject null hypothesis (i.e. will detect effect)

Power depends on:

Size of effect

Alpha level

Sample size
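How the three factors move power can be sketched for the simplest case, a one-tailed z-test. This is a rough illustration under stated assumptions: the population values (µ0 = 100, σ = 15), the hypothetical true means, and the function name `z_test_power` are not from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-tailed (upper) z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # boundary of the critical region
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # effect in standard-error units
    beta = std_normal.cdf(z_crit - shift)       # P(fail to reject | effect exists)
    return 1 - beta                             # power = 1 - beta

baseline = z_test_power(100, 105, 15, n=10)
bigger_effect = z_test_power(100, 110, 15, n=10)
bigger_alpha = z_test_power(100, 105, 15, n=10, alpha=0.10)
bigger_sample = z_test_power(100, 105, 15, n=40)

print(f"baseline power:     {baseline:.2f}")
print(f"larger effect:      {bigger_effect:.2f}")
print(f"larger alpha:       {bigger_alpha:.2f}")
print(f"larger sample size: {bigger_sample:.2f}")
```

Each change (larger effect, larger alpha, larger sample) raises power relative to the baseline; when there is no effect at all, the rejection rate falls back to alpha.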

[Figure: two z-score axes (-3 to 3) with the “Reject H0” region marked]

Threats to internal validity

Threat: How to prevent this potential confound

Nonequivalent control grp: Use random assignment; use pretest/posttest design

History: Test at different time pts

Maturation: Use control group

Testing effect: Use control group

Regression to mean: Use control grp w/ same extreme scores

Instrumentation effect: Use control group

Mortality or attrition: Use control group

Diffusion of treatment: Tell Ss not to discuss study

Experimenter or participant effects: Use single-blind or double-blind method; use placebo group

Ceiling and floor effects: Carefully select DV to avoid

Review t-tests

Single-sample t-test (df = N – 1)

Independent samples t-test (df = (n1 – 1)+(n2 – 1) )

Related or paired-samples t-test (df = N – 1)

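Single-sample t-test:

$t = \frac{M - \mu}{s_M}$, where $s_M = \frac{s}{\sqrt{n}}$ and $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$

Independent samples t-test:

$t = \frac{M_1 - M_2}{s_{M_1 - M_2}}$, where $s_{M_1 - M_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $s^2 = \frac{\sum (X - M)^2}{N - 1}$

Related or paired-samples t-test:

$t = \frac{M_D - \mu_D}{s_{M_D}}$, where $s_{M_D} = \frac{s_D}{\sqrt{N}}$ and $s_D = \sqrt{\frac{\sum (D - M_D)^2}{N - 1}}$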
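The independent-samples and paired-samples t statistics can be sketched directly from their definitions: a mean difference divided by its standard error. The scores below are invented for illustration and the function names are hypothetical; the paired version assumes µ_D = 0 under the null hypothesis.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group1, group2):
    """t = (M1 - M2) / sqrt(s1^2/n1 + s2^2/n2); df = (n1 - 1) + (n2 - 1)."""
    n1, n2 = len(group1), len(group2)
    se_diff = sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    return t, (n1 - 1) + (n2 - 1)

def paired_t(before, after):
    """t = M_D / (s_D / sqrt(N)), assuming mu_D = 0 under H0; df = N - 1."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se_mean_diff = stdev(diffs) / sqrt(n)
    return mean(diffs) / se_mean_diff, n - 1

# Hypothetical scores, for illustration only
t_ind, df_ind = independent_t([5, 7, 9], [2, 4, 6])
t_pair, df_pair = paired_t(before=[10, 12, 9, 11], after=[12, 15, 10, 14])

print(f"independent: t({df_ind}) = {t_ind:.2f}")
print(f"paired:      t({df_pair}) = {t_pair:.2f}")
```

Both `variance` and `stdev` from the standard library divide by N - 1, matching the sample formulas used for these tests.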

Effect size

Cohen’s d: $d = \frac{M_1 - M_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$

Variance accounted for (r2): $r^2 = \frac{t^2}{t^2 + df}$

Influencing factors:

Difference between means: bigger difference, larger t value

Size of sample variance: larger variance, smaller t value

Sample sizes: larger sample, higher probability of a significant t-test (little influence on effect size)
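Both effect-size measures are short enough to sketch as functions. The means and variances passed to `cohens_d` are made-up values, and the function names are hypothetical; the t = 2.56 and df = 26 passed to `r_squared` are taken from the independent-samples practice problem later in this review.

```python
from math import sqrt

def cohens_d(m1, m2, var1, var2):
    """d = (M1 - M2) / sqrt(s1^2/2 + s2^2/2)."""
    return (m1 - m2) / sqrt(var1 / 2 + var2 / 2)

def r_squared(t, df):
    """Proportion of variance accounted for: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

# Hypothetical group means and variances, for illustration only
d = cohens_d(m1=7, m2=4, var1=4, var2=4)

# t and df from the "John" vs. "Joan" essay example
r2 = r_squared(t=2.56, df=26)

print(f"Cohen's d = {d:.2f}")
print(f"r^2 = {r2:.3f}")
```

Neither formula contains the sample size directly in a way that inflates the result, which is why sample size has little influence on effect size even though it strongly affects significance.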

1-way ANOVA: Partitions the Variance

Total Variance

Between Treatment Variance: 1. Treatment effects 2. Error

Within Treatment Variance: Error

$F = \frac{\text{Between variance}}{\text{Within variance}}$

Repeated-measures ANOVA

The partitioning of degrees of freedom for a repeated-measures experiment

Partitioning the Variance in Factorial ANOVA (2-way ANOVA)

Total variability = between-treatments variability + within-treatments variability

Between-treatments variability = Factor A variability + Factor B variability + interaction variability

$F = \frac{MS_{\text{treatment (A or B or AxB)}}}{MS_{\text{within treatment}}}$

Definitional formulas

Between treatment SS (sums of squares): sum of squared deviations of each group’s mean from the grand mean, multiplied by the number of Ss in the group

$SS_{between} = \sum [(M_g - M_G)^2 n]$

Within groups SS: sum of squared deviations of each score from its group mean

$SS_{within} = \sum (X - M_g)^2$

Participant (between subject) SS: sum of squared deviations of each participant’s mean across the conditions from the grand mean, multiplied by the number of conditions

$SS_{participant} = \sum [(M_P - M_G)^2 k]$

Total SS: sum of squared deviations of each score from the grand mean

$SS_{total} = \sum (X - M_G)^2$
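The between, within, and total sums of squares for a one-way (between-subjects) ANOVA can be sketched from the definitions. The three groups of scores are invented so the arithmetic is easy to follow, and the function name is hypothetical; the participant SS for repeated measures is not covered here.

```python
from statistics import mean

def one_way_anova(groups):
    """Partition SS into between and within, then form F = MS_b / MS_w."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    # Between: squared deviation of each group mean from the grand mean, times n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared deviation of each score from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: squared deviation of each score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "df": (df_between, df_within), "F": f_ratio}

# Hypothetical scores for three conditions, for illustration only
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result)
```

In this design the between and within sums of squares add up to the total, which is the sense in which ANOVA partitions the variance.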

Factorial Anova: Hypotheses

Main effect for gender

H0: µM = µF

H1: µM ≠ µF

Main effect for note-taking

H0: µM1 = µM2 = µC

H1: at least 1 mean different

Interaction of gender and note-taking

H0: Mean differences explained by main effects (ME)

H1: Interaction between factors

Disability and gender effects on play

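[Chart: play score (0 to 8) by disability type (typical, physical, mental), plotted separately for male and female]

          typical   physical   mental   Total
male        7.3       3.0       3.2     13.5
female      6.8       3.4       4.0     14.2
Total      14.1       6.4       7.2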

If there were interactions…


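          typical   physical   mental   Total
male        7.3       6.0       6.2     19.5
female      6.8       3.4       4.0     14.2
Total      14.1       9.4      10.2

[Chart: play score (0 to 8) by disability type, plotted separately for male and female]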


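          typical   physical   mental   Total
male        7.3       3.0       3.2     13.5
female      4.0       6.8       7.0     17.8
Total      11.3       9.8      10.2

[Chart: play score (0 to 8) by disability type, plotted separately for male and female]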

Post Hoc Tests

Significant ANOVA – there is at least 1 mean that is different

Post hoc tests examine which means are and are not significantly different

Compare 2 means at a time (pair-wise comparisons)

Controlling Type I error: divide alpha among all the tests you need to do

Planned comparisons: based on predictions

Tukey’s HSD

Scheffe test (numerator is for MSbetween for only the two treatments you want to compare)

Bonferroni
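Dividing alpha among the pairwise tests, as in a Bonferroni correction, can be sketched as below. The function name and the choice of 3 and 4 groups are illustrative assumptions.

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the alpha allotted to each."""
    n_comparisons = comb(n_groups, 2)   # every pair of means, 2 at a time
    return n_comparisons, alpha / n_comparisons

for groups in (3, 4):
    n_tests, per_test = bonferroni_alpha(groups)
    print(f"{groups} groups: {n_tests} comparisons, alpha per test = {per_test:.4f}")
```

The per-test alpha shrinks quickly as groups are added, which keeps the overall Type I error rate near .05 at the cost of lower power for each comparison.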

Independence Chi-square test: 2 variables

Examine relationship between 2 variables

Treating cocaine addiction

               Relapse
               No     Yes    Total
Desipramine    14     10     24
Lithium         6     18     24
Placebo         4     20     24
Total          24     48     72

Compare study findings to expected findings

$f_e = \frac{f_c \times f_r}{n}$ (column total times row total, divided by n)

No relapse: 24*24/72 = 8 (successes expected/drug)

Yes relapse: 48*24/72 = 16 (failures expected/drug)

Independence Chi-square test

Null hypothesis: no relationship between type of drug and relapse

df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2

Critical value χ² @ .05 = 5.99

Significant! (obtained χ² = 10.5 exceeds 5.99)

Examine proportions (#/total) of no relapse

Drug1: 14/24 = .58, Drug2: 6/24 = .25, Drug3: 4/24 = .17 (vs. expected: 8/24 = .33)

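$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

$\chi^2 = \frac{(14-8)^2}{8} + \frac{(6-8)^2}{8} + \frac{(4-8)^2}{8} + \frac{(10-16)^2}{16} + \frac{(18-16)^2}{16} + \frac{(20-16)^2}{16} = 10.5$

Observed / expected frequencies:

               Relapse
               No        Yes
Desipramine    14 / 8    10 / 16
Lithium         6 / 8    18 / 16
Placebo         4 / 8    20 / 16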
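The whole chi-square calculation for the cocaine-treatment table can be sketched in Python using the observed counts given above. The variable names are illustrative; the critical value 5.99 is the one quoted in the slides for df = 2 at .05.

```python
# Observed counts per drug: (no relapse, relapse)
observed = {
    "Desipramine": (14, 10),
    "Lithium": (6, 18),
    "Placebo": (4, 20),
}

# Column totals: [no relapse, relapse]
col_totals = [sum(row[c] for row in observed.values()) for c in range(2)]
n = sum(col_totals)

expected = {}
chi_square = 0.0
for drug, row in observed.items():
    row_total = sum(row)
    # f_e = column total * row total / n
    expected[drug] = tuple(row_total * col / n for col in col_totals)
    # Add (f_o - f_e)^2 / f_e for each cell in this row
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected[drug]))

df = (len(observed) - 1) * (len(col_totals) - 1)   # (R - 1)(C - 1)
CRITICAL_VALUE = 5.99                              # chi-square, df = 2, alpha = .05
significant = chi_square > CRITICAL_VALUE

print(f"expected: {expected}")
print(f"chi-square({df}) = {chi_square:.1f}, significant: {significant}")
```

Because every drug group has the same row total (24), the expected counts are identical across rows (8 and 16), and the test statistic reflects only how far each drug's observed split departs from that common split.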

Parametric statistics

z-test

One-sample t-test

Independent samples t-test

Paired samples t-test

One-way ANOVA

Repeated-measures ANOVA

Two-way ANOVA

Mixed ANOVA

Statistics by design and # levels

Between Ss design

1 IV; 2 levels Independent samples t-test

1 IV; 3+ levels One way ANOVA

2 IVs; 3+ levels Two way ANOVA

Within Ss design

1 IV; 2 levels Paired-samples t-test

1 IV; 3+ levels Repeated-measures ANOVA

2 IVs; 3+ levels Repeated-measures ANOVA

Between Ss and within Ss design

2 IVs; 2+ levels Mixed ANOVA

Is the statistic significant? Calculate df and look up critical values

Independent-samples t-test: 14 women read essay by “John”, 14 read essay by “Joan”; DV: rate quality; t = 2.56

df: (n - 1) + (n - 1); df = 26; cv = 2.056

Paired samples t-test: 8 Ss attitude before/after lecture; t = 2.76

df: N - 1; df = 7; cv = 2.365

One-way ANOVA: 15 Ss randomly assigned to either positive, negative, or no feedback (5 per condition); DV: self-esteem score; F = 4.37

df bet = k - 1; df w/in = N - k; df bet = 2; df w/in = 12; cv = 3.88

Repeated-measures ANOVA: 5 Ss on exercise program; assessed on well-being 4x (pre, 2wk, 4wk, post at 6wk); F = 2.12

df bet = k - 1; df w/in = N - k; df bet Ss = n - 1; df error = df w/in - df bet Ss; df bet = 3; df err = 12; cv = 3.49

Two-way ANOVA: Ss evaluated the quality of a passage of poetry, then listened to the opinion of either an expert or a novice. The opinion was either slightly, moderately, or highly discrepant from the initial ratings of quality. The 3 x 2 design examined change in quality rating. There were 5 Ss per condition. The F for the main effect of source expertise was 6.78.

df bet = k - 1; dfA = k - 1; dfB = k - 1; dfAxB = df bet - dfA - dfB; df w/in = N - k; df bet = 5; dfA = 2; dfB = 1; dfAxB = 2; df w/in = 24
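The degrees-of-freedom rules used in these problems can be sketched as small Python functions and checked against the values above. The function names are hypothetical; the test statistics and critical values are the ones given in the problems, and no critical value is supplied for the two-way example because the slides do not list one.

```python
def df_independent_t(n1, n2):
    return (n1 - 1) + (n2 - 1)

def df_paired_t(n_pairs):
    return n_pairs - 1

def df_one_way(n_total, k):
    """Returns (df between, df within)."""
    return k - 1, n_total - k

def df_repeated_measures(n_subjects, k):
    """Returns (df between treatments, df error)."""
    n_scores = n_subjects * k
    df_within = n_scores - k
    df_subjects = n_subjects - 1
    return k - 1, df_within - df_subjects

def df_two_way(levels_a, levels_b, n_per_cell):
    """Returns (df between, df A, df B, df AxB, df within)."""
    cells = levels_a * levels_b
    df_between = cells - 1
    df_a, df_b = levels_a - 1, levels_b - 1
    df_within = cells * n_per_cell - cells
    return df_between, df_a, df_b, df_between - df_a - df_b, df_within

def is_significant(statistic, critical_value):
    return abs(statistic) > critical_value

results = {
    "independent t": (df_independent_t(14, 14), is_significant(2.56, 2.056)),
    "paired t": (df_paired_t(8), is_significant(2.76, 2.365)),
    "one-way ANOVA": (df_one_way(15, 3), is_significant(4.37, 3.88)),
    "repeated-measures ANOVA": (df_repeated_measures(5, 4), is_significant(2.12, 3.49)),
}
for test, (df, significant) in results.items():
    print(f"{test}: df = {df}, significant = {significant}")

print("two-way ANOVA df:", df_two_way(3, 2, 5))
```

Comparing each statistic with its critical value is the last step of every problem: the statistic is significant only when it exceeds the critical value for its df.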

Effect sizes

t-tests

Cohen’s d

Confidence interval

r2 (variance explained)

ANOVAs

eta-squared
