experimental psychology review - wofford...
Chapters 1-13
Experimental Psychology Review
Basic descriptive stats and distributions
Measures of central tendency
Mean, median, mode
Measures of variability
Standard deviation and variance
Range
Types of distributions
Normal, positive or negative skew
Effect of extreme scores
Graphs & types of measurements
Sample: $S = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Calculating standard deviation (s)
1. Calculate deviation score: score minus mean (X - M)
2. Square the deviations (unsquared deviations sum to zero)
3. Sum the squared deviations (SS)
4. Divide by N or N - 1; this step gives the variance
Use N for a population
Use N - 1 to estimate the population from a sample
5. Take the square root of the variance
Returns to the original metric (undoes the squaring)
RT (X)   X - M   (X - M)^2
512 -52.47 2753.101
587 22.53 507.6009
590 25.53 651.7809
578 13.53 183.0609
567 2.53 6.4009
533 -31.47 990.3609
573 8.53 72.7609
529 -35.47 1258.121
577 12.53 157.0009
572 7.53 56.7009
572 7.53 56.7009
591 26.53 703.8409
575 10.53 110.8809
577 12.53 157.0009
534 -30.47 928.4209
M = 564.4667; sum of (X - M)^2 = 8593.734
Variance = 613.8381 (sum divided by N - 1)
SD = 24.77576 (square root of sum/(N - 1))
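The five steps can be checked against the reaction-time table with a short script. This is a minimal sketch, not part of the original slides; the function name `sample_sd` is made up for illustration, and it uses the N - 1 denominator because the slide treats the 15 RTs as a sample.

```python
import math

# Reaction times (ms) copied from the table above.
rts = [512, 587, 590, 578, 567, 533, 573, 529, 577, 572, 572, 591, 575, 577, 534]

def sample_sd(scores):
    """Return (mean, SS, variance, SD) using the N - 1 denominator."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: score minus mean
    ss = sum(d ** 2 for d in deviations)      # steps 2-3: square, then sum (SS)
    variance = ss / (n - 1)                   # step 4: divide by N - 1
    sd = math.sqrt(variance)                  # step 5: back to the original metric
    return mean, ss, variance, sd

mean, ss, variance, sd = sample_sd(rts)
print(round(mean, 4), round(ss, 3), round(variance, 4), round(sd, 5))
```

Swapping `n - 1` for `n` in step 4 would give the population version of the same calculation.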
Scales of measurement
Nominal, ordinal, interval, ratio (p61)
Also distinguished as discrete vs. continuous variables
Qualitative vs. quantitative
Example grade distribution
A 1
A- 2
B+ 3
B 7
B- 8
C+ 6
C 3
C- 2
D 1
F 1
[Bar chart: frequency (y-axis, 0 to 9) of each letter grade (x-axis: A, A-, B+, B, B-, C+, C, C-, D, F)]
N = 34
M = 80.38
Median = 81
Mode = B-
Z-scores
Examine a score in relation to a distribution of scores
Convert scores to a standard score (z-score)
Z-score: number of standard deviations a score lies from the sample or population mean
If score is above mean = positive score
If score is below mean = negative score
Assume a standard normal distribution
µ = 0
σ = 1
$z = \frac{X - \mu}{\sigma}$
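As a quick illustration of the conversion, the sketch below standardizes two of the reaction times from the earlier table, using that table's mean and SD. The function name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Mean and SD taken from the reaction-time example earlier in this review.
rt_mean, rt_sd = 564.4667, 24.77576

z_low = z_score(512, rt_mean, rt_sd)    # score below the mean: negative z
z_high = z_score(591, rt_mean, rt_sd)   # score above the mean: positive z
print(round(z_low, 2), round(z_high, 2))
```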
Reliability
“Consistency and stability of a measuring instrument”(p65)
Observed score = true score + error
Types of errors:
Method error (e.g. test situation, equipment error)
Trait error (e.g. fatigue, health, truthfulness)
Types of reliability:
Test-retest reliability
Alternative forms reliability
Split-half reliability
Inter-rater reliability
Measured reliability: correlation coefficient, from -1 through 0 to +1
.70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
Validity
Does the measure provide info on what we really want to measure? Is it useful for what we want?
Multiple types of validity:
Content validity: covers a representative sample of the behaviors to be measured
Criterion validity: accurately predicts behavior (concurrent and predictive validity)
Construct validity: accurately measures the construct
Face validity: appears valid on the surface
Validity is not all-or-none, but on a scale
Other types:
Internal validity: extraneous variables are eliminated
External validity: findings will generalize to other contexts
Correlations
Direction of relationship:
Positive: As value of 1 variable increases, so does the other
Direct correlation
Negative: As value of 1 variable increases, the other decreases
Indirect correlation
No relationship
Magnitude, size or strength of relationship:
-1.00 to 0 to +1.00 (“correlation coefficient”)
0 = no relationship
±1 = perfectly predictable relationship
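Direction and magnitude can both be read off a single Pearson r. The sketch below is one way to compute it from deviation scores; the data are hypothetical, and the `strength` helper simply reuses the strong/moderate/weak cutoffs listed under Reliability, which is an assumption about how they would be applied here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

def strength(r):
    """Label magnitude with the .70 / .30 cutoffs (sign gives direction)."""
    size = abs(r)
    if size >= 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"

# Hypothetical scores, for illustration only.
hours = [1, 2, 3, 4, 5]
errors = [9, 7, 6, 4, 2]
r = pearson_r(hours, errors)
print(round(r, 2), strength(r))
```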
Hypothesis testing
Null vs. alternative hypothesis
When you reject the null hypothesis:
“The findings are statistically significant.”
One- vs. two-tailed test
Critical region
Critical value (cv)
Defines “unlikely event” for H0 distribution
Alpha (α): upper probability value for the critical region
p-value: probability of obtaining a result at least this extreme if H0 is true
Inferential statistic: z-test
Z-score:
Comparison of score with population distribution in terms of SD from population mean
Z-test:
Comparison of sample mean with sampling distribution
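Sampling distribution’s mean and standard error:
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} < \sigma$
So… $\sigma_{\bar{X}} = \sigma / \sqrt{N}$
$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$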
[Figure: density curve of IQ for 1 subject (x-axis: IQ, 50 to 150; y-axis: density, 0.00 to 0.03)]
[Figure: frequency histogram of mean IQ for 10 subjects (x-axis ticks: 85, 95, 105, 115; y-axis: frequency, 0 to 80)]
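The z-test compares a sample mean with the sampling distribution, whose standard error shrinks with the square root of N. A minimal sketch follows; the IQ parameters (µ = 100, σ = 15) and the sample mean of 110 are assumed values chosen for illustration, and `z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n):
    """Return (standard error, z, two-tailed p) for a sample mean."""
    se = sigma / math.sqrt(n)             # sigma of the sampling distribution
    z = (sample_mean - mu) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p

# Assumed values: population IQ mean 100, SD 15; 10 subjects with mean IQ 110.
se, z, p = z_test(sample_mean=110, mu=100, sigma=15, n=10)
print(round(se, 2), round(z, 2), round(p, 3))
```

With these assumed numbers the standard error is much smaller than σ, which is why a mean of 10 subjects varies less than a single subject's score.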
Errors
                      Actual situation
Experimenter’s        No effect           Effect
decision              (H0 true)           (H0 false)
Reject H0             Type I error        Correct decision
Retain H0             Correct decision    Type II error
Type I error: conclude there was an effect when there actually wasn’t; the risk of that is α
Type II error: conclude there wasn’t an effect when there actually was an effect; also called β
Statistical Power
What is the probability of making the correct decision??
If treatment effect exists either…
We correctly detect the effect or…
We fail to detect the effect (Type II error, or β)
So, the probability of correctly detecting the effect is 1 - β
Power: probability that test will correctly reject null hypothesis (i.e. will detect effect)
Power depends on:
Size of effect
Alpha level
Sample size
[Figure: two distributions on z-score axes (-3 to 3), with the “Reject H0” region marked]
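How the three factors move power can be sketched for the simplest case, a one-tailed z-test. The formula used here (power = 1 - Φ(z_crit - d√n), with d the effect size in SD units) is standard for that case but is not taken from the slides, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def power_one_tailed_z(effect_size, n, alpha=0.05):
    """Power of an upper-tailed z-test; effect_size is (mu1 - mu0) / sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # boundary of the critical region
    shift = effect_size * math.sqrt(n)        # how far H1 is shifted, in SE units
    beta = std_normal.cdf(z_crit - shift)     # probability of a Type II error
    return 1 - beta

# Illustrative comparisons: larger effect, larger alpha, or larger n raise power.
print(round(power_one_tailed_z(0.5, 10), 2))
print(round(power_one_tailed_z(0.5, 40), 2))
```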
Threats to internal validity
Threat: how to prevent these potential confounds
Nonequivalent control grp: use random assignment; use pretest/posttest design
History: test at different time pts
Maturation: use control group
Testing effect: use control group
Regression to mean: use control grp w/ same extreme scores
Instrumentation effect: use control group
Mortality or attrition: use control group
Diffusion of treatment: tell Ss not to discuss study
Experimenter or participant effects: use single-blind or double-blind method; use placebo group
Ceiling and floor effects: carefully select DV to avoid
Review t-tests
Single-sample t-test (df = N – 1)
Independent samples t-test (df = (n1 – 1)+(n2 – 1) )
Related or paired-samples t-test (df = N – 1)
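Single-sample: $t = \frac{M - \mu}{s_M}$, with $s_M = \frac{s}{\sqrt{n}}$ and $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
Independent samples: $t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$, with $s_{(M_1 - M_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $s^2 = \frac{\sum (X - M)^2}{N - 1}$
Related or paired samples: $t = \frac{M_D - \mu_D}{s_{M_D}}$, with $s_{M_D} = \frac{s_D}{\sqrt{N}}$ and $s_D = \sqrt{\frac{\sum (D - M_D)^2}{N - 1}}$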
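The three t-tests share one pattern: a mean difference divided by its standard error. The sketch below follows that pattern with plain Python; the helper names and the small data sets are hypothetical, and the independent-samples version uses the unpooled standard error sqrt(s1²/n1 + s2²/n2), which is an assumption that matches the pooled formula only when group sizes are equal.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Variance with the N - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def single_sample_t(xs, mu):
    """Return (t, df) comparing a sample mean with a known population mean."""
    se = math.sqrt(sample_var(xs) / len(xs))
    return (mean(xs) - mu) / se, len(xs) - 1

def independent_t(xs1, xs2):
    """Return (t, df) for two separate groups."""
    se = math.sqrt(sample_var(xs1) / len(xs1) + sample_var(xs2) / len(xs2))
    return (mean(xs1) - mean(xs2)) / se, (len(xs1) - 1) + (len(xs2) - 1)

def paired_t(before, after):
    """Return (t, df): a single-sample t on the difference scores against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return single_sample_t(diffs, 0)

# Hypothetical scores, for illustration only.
print(independent_t([1, 2, 3], [2, 3, 4]))
print(paired_t([1, 2, 3], [2, 4, 3]))
```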
Effect size
Cohen’s d: $d = \frac{M_1 - M_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$
Variance accounted for (r2): $r^2 = \frac{t^2}{t^2 + df}$
Influencing factors:
Difference between means: bigger difference, larger t value
Size of sample variance: larger variance, smaller t value
Sample sizes: larger sample, higher probability of sig t-test (little influence on effect size)
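Both effect-size measures are one-line calculations. The sketch below applies r² to the independent-samples example that appears later in this review (t = 2.56, df = 26); the means and variances passed to `cohens_d` are hypothetical, and both function names are illustrative.

```python
import math

def cohens_d(m1, m2, var1, var2):
    """Mean difference divided by the square root of the averaged variances."""
    return (m1 - m2) / math.sqrt(var1 / 2 + var2 / 2)

def r_squared(t, df):
    """Proportion of variance accounted for, from a t value and its df."""
    return t ** 2 / (t ** 2 + df)

# t and df from the essay-rating example in this review.
print(round(r_squared(2.56, 26), 3))

# Hypothetical group means and variances, for illustration only.
print(cohens_d(10, 8, 4, 4))
```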
1-way ANOVA: Partitions the Variance
Total variance
Between-treatment variance: 1. Treatment effects 2. Error
Within-treatment variance: Error
$F = \frac{\text{between-treatment variance}}{\text{within-treatment variance}}$
Repeated-measures ANOVA
The partitioning of degrees of freedom for a repeated-measures experiment
Partitioning the Variance in Factorial ANOVA 2-way ANOVA
Total variability
  Between-treatments variability
    Factor A variability
    Factor B variability
    Interaction variability
  Within-treatments variability
$F = \frac{MS_{\text{treatment (A or B or AxB)}}}{MS_{\text{within treatment}}}$
Definitional formulas
Between treatment SS (sums of squares): sum of squared deviations of each group’s mean from the grand mean, multiplied by the number of Ss in the group
$SS_{between} = \sum [(M_g - M_G)^2 n]$
Within groups SS: sum of squared deviations of each score from its group mean
$SS_{within} = \sum (X - M_g)^2$
Participant (between subject) SS: sum of squared deviations of each participant’s mean across the conditions from the grand mean, multiplied by the number of conditions
$SS_{participant} = \sum [(M_P - M_G)^2 k]$
Total SS: sum of squared deviations of each score from the grand mean
$SS_{total} = \sum (X - M_G)^2$
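For a one-way between-subjects design, the definitional formulas translate directly into code, and the between and within SS should add up to the total SS. This is a minimal sketch with hypothetical scores; `one_way_anova` is an illustrative name, and the participant SS (used only in repeated-measures designs) is left out.

```python
def one_way_anova(groups):
    """Return SS, df and F for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between: each group mean vs. grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within: each score vs. its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Total: each score vs. grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "df": (df_between, df_within), "F": f}

# Hypothetical scores for three groups, for illustration only.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```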
Factorial Anova: Hypotheses
Main effect for gender
H0: µM = µF
H1: µM ≠ µF
Main effect for note-taking
H0: µM1 = µM2 = µC
H1: at least 1 mean different
Interaction of gender and note-taking
H0: Mean differences explained by main effects (ME)
H1: Interaction between factors
Disability and gender effects on play
[Graph: y-axis 0 to 8; x-axis: typical, physical, mental; series: male, female]
          typical  physical  mental  Total
male        7.3      3        3.2    13.5
female      6.8      3.4      4      14.2
Total      14.1      6.4      7.2
If there were interactions…
          typical  physical  mental  Total
male        7.3      6        6.2    19.5
female      6.8      3.4      4      14.2
Total      14.1      9.4     10.2
[Graph: y-axis 0 to 8; x-axis: typical, physical, mental; series: male, female]
          typical  physical  mental  Total
male        7.3      3        3.2    13.5
female      4        6.8      7      17.8
Total      11.3      9.8     10.2
[Graph: y-axis 0 to 8; x-axis: typical, physical, mental; series: male, female]
Post Hoc Tests
Significant ANOVA: at least 1 mean is different
Post hoc tests examine which means are and are not significantly different
Compare 2 means at a time (pair-wise comparisons)
Type I error: divide alpha among all the tests you need to do
Planned comparisons: based on predictions
Tukey’s HSD
Scheffe test (numerator is for MSbetween for only the two treatments you want to compare)
Bonferroni
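Dividing alpha among the pairwise tests is the Bonferroni idea. The sketch below lists every pair and the per-test alpha; the three feedback conditions are borrowed from the one-way ANOVA example later in this review, and `bonferroni_alpha` is a hypothetical helper name.

```python
from itertools import combinations

def bonferroni_alpha(conditions, alpha=0.05):
    """Return all pairwise comparisons and the alpha to use for each one."""
    pairs = list(combinations(conditions, 2))
    return pairs, alpha / len(pairs)

# Conditions from the feedback example; 3 groups give 3 pairwise comparisons.
pairs, per_test_alpha = bonferroni_alpha(["positive", "negative", "none"])
for a, b in pairs:
    print(a, "vs.", b)
print(round(per_test_alpha, 4))
```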
Independence Chi-square test: 2 variables
Examine relationship between 2 variables
Treating cocaine addiction
                  Relapse
              No     Yes    Total
Desipramine   14     10      24
Lithium        6     18      24
Placebo        4     20      24
Total         24     48      72
Compare study findings to expected findings
$f_e = \frac{f_r f_c}{n}$ (row total times column total, divided by n)
No relapse: 24*24/72 = 8 (successes expected/drug)
Yes relapse: 48*24/72 = 16 (failures expected/drug)
Independence Chi-square test
Null hypothesis: no relationship between type of drug and relapse
df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2
Critical value χ² @ .05 = 5.99
Significant!
Examine proportions (#/total) of no relapse
Drug1: 14/24 = .58, Drug2: 6/24 = .25, Drug3: 4/24 = .17 (vs. expected: 8/24 = .33)
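$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
$\chi^2 = \frac{(14-8)^2}{8} + \frac{(6-8)^2}{8} + \frac{(4-8)^2}{8} + \frac{(10-16)^2}{16} + \frac{(18-16)^2}{16} + \frac{(20-16)^2}{16} = 10.5$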
Observed / expected frequencies
                  Relapse
              No        Yes
Desipramine   14 / 8    10 / 16
Lithium        6 / 8    18 / 16
Placebo        4 / 8    20 / 16
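The whole calculation (expected frequencies, chi-square, df) can be reproduced from the observed table. This is a minimal sketch in plain Python; the function name and the dictionary layout are illustrative choices, and the counts are the ones in the cocaine-treatment table.

```python
# Observed counts from the table: (no relapse, relapse) for each drug.
observed = {"Desipramine": (14, 10), "Lithium": (6, 18), "Placebo": (4, 20)}

def chi_square_independence(table):
    """Return (chi-square, df, expected frequencies) for a rows x columns table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    chi2, expected = 0.0, []
    for row, row_total in zip(rows, row_totals):
        # Expected frequency: row total * column total / n.
        exp_row = [row_total * col_total / n for col_total in col_totals]
        expected.append(exp_row)
        chi2 += sum((o - e) ** 2 / e for o, e in zip(row, exp_row))
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = chi_square_independence(observed)
print(chi2, df, expected[0])
```

Comparing the result with the critical value of 5.99 for df = 2 gives the same "significant" conclusion as the slide.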
Parametric statistics
z-test
One-sample t-test
Independent samples t-test
Paired samples t-test
One-way ANOVA
Repeated-measures ANOVA
Two-way ANOVA
Mixed ANOVA
Statistics by design and # levels
Between Ss design
1 IV; 2 levels: Independent samples t-test
1 IV; 3+ levels: One way ANOVA
2 IVs; 3+ levels: Two way ANOVA
Within Ss design
1 IV; 2 levels: Paired-samples t-test
1 IV; 3+ levels: Repeated-measures ANOVA
2 IVs; 3+ levels: Repeated-measures ANOVA
Between Ss and within Ss design
2 IVs; 2+ levels: Mixed ANOVA
Is the statistic significant? Calculate df and look up critical values
Independent-samples t-test
14 women read essay by “John”, 14 read essay by “Joan”; DV: rate quality; t = 2.56
df: (n - 1) + (n - 1)
df = 26; cv = 2.056
Paired-samples t-test
8 Ss attitude before/after lecture; t = 2.76
df: N - 1
df = 7; cv = 2.365
One-way ANOVA
15 Ss randomly assigned to either positive, negative, or no feedback (5 per condition); DV: self-esteem score; F = 4.37
df bet = k - 1; df w/in = N - k
df bet = 2; df w/in = 12; cv = 3.88
Repeated-measures ANOVA
5 Ss on exercise program; assessed on well-being 4x (pre, 2wk, 4wk, post at 6wk); F = 2.12
df bet = k - 1; df w/in = N - k; df bet Ss = n - 1; df error = df w/in - df bet Ss
df bet = 3; df err = 12; cv = 3.49
Two-way ANOVA
Ss evaluated the quality of a passage of poetry, then listened to the opinion of either an expert or a novice. The opinion was either slightly, moderately or highly discrepant from the initial ratings of quality. The 3 x 2 design examined change in quality rating. There were 5 Ss per condition. The F for the main effect of source expertise was 6.78.
df bet = (number of cells) - 1; dfA = (levels of A) - 1; dfB = (levels of B) - 1; dfAxB = df bet - dfA - dfB; df w/in = N - (number of cells)
df bet = 5; dfA = 2; dfB = 1; dfAxB = 2; df w/in = 24
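The df rules above can be written as small functions and checked against the five worked examples. This is a sketch only; all function names are hypothetical, and `is_significant` just compares the obtained statistic with the critical value listed on the slide.

```python
def df_independent_t(n1, n2):
    return (n1 - 1) + (n2 - 1)

def df_paired_t(n_pairs):
    return n_pairs - 1

def df_one_way(n_total, k):
    """Return (df between, df within) for k groups."""
    return k - 1, n_total - k

def df_repeated_measures(n_subjects, k):
    """Return (df between, df error) for k repeated conditions."""
    n_scores = n_subjects * k
    df_between = k - 1
    df_within = n_scores - k
    df_subjects = n_subjects - 1
    return df_between, df_within - df_subjects

def df_two_way(levels_a, levels_b, n_per_cell):
    """Return df for factor A, factor B, the interaction, and within."""
    cells = levels_a * levels_b
    df_between = cells - 1
    df_a, df_b = levels_a - 1, levels_b - 1
    return {"A": df_a, "B": df_b, "AxB": df_between - df_a - df_b,
            "within": cells * n_per_cell - cells}

def is_significant(statistic, critical_value):
    return abs(statistic) > critical_value

# Values from the worked examples above.
print(df_independent_t(14, 14), is_significant(2.56, 2.056))
print(df_repeated_measures(5, 4), is_significant(2.12, 3.49))
print(df_two_way(3, 2, 5))
```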
Effect sizes
t-tests
Cohen’s d
Confidence interval
r2 (variance explained)
ANOVAs
eta-squared