IS 4800 Empirical Research Methods for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH [email protected] Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two-way Analysis of Variance
Relationship Between Population and Samples When a Treatment Had No Effect
[Figure: a single Population, with Sample 1 (mean M1) and Sample 2 (mean M2) both drawn from it]
Relationship Between Population and Samples When a Treatment Had An Effect
[Figure: Control group population (mean $\mu_c$) with Control group sample (mean $M_c$); Treatment group population (mean $\mu_t$) with Treatment group sample (mean $M_t$)]
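Population: Mean? ($\mu$) Variance? ($\sigma^2$)
Sampling
Sample of size N: $M = \frac{\sum X}{N}$, $SD^2 = \frac{\sum (X - M)^2}{N}$
Mean values from all possible samples of size N, aka “distribution of means”: $\sigma_M^2 = \frac{\sigma^2}{N}$
$Z_M = \frac{M - \mu}{\sigma_M}$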
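The Z score of a sample mean on the distribution of means can be sketched in a few lines of Python. The function name and the numbers below are hypothetical, chosen only to illustrate the arithmetic; they do not come from the lecture.

```python
import math

def z_for_sample_mean(sample_mean, pop_mean, pop_variance, n):
    """Z score of a sample mean, assuming the population variance is known."""
    variance_of_means = pop_variance / n           # sigma_M^2 = sigma^2 / N
    sd_of_means = math.sqrt(variance_of_means)     # sigma_M
    return (sample_mean - pop_mean) / sd_of_means  # Z_M = (M - mu) / sigma_M

# Hypothetical values: population mean 100, population variance 225, sample of 25 with mean 106
z = z_for_sample_mean(sample_mean=106, pop_mean=100, pop_variance=225, n=25)
print(z)
```

With these made-up numbers the standard deviation of the distribution of means is 3, so a sample mean 6 points above the population mean sits 2 standard deviations out.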
Z tests and t-tests
t is like Z:
$Z = \frac{M - \mu}{\sigma_M}$
$t = \frac{M - \mu}{S_M}$, with $\mu = 0$ for paired samples
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance while $\sigma_M$ is based on a known population variance.
$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$
$S_M^2 = \frac{S^2}{N}$
T-test with paired samples
Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means.
Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance.
Population of change scores: $\mu = 0$; $\sigma^2$ is unknown, so estimate it from the sample: $S^2 = \frac{SS}{df}$, $df = N - 1$
Distribution of means: $S_M^2 = \frac{S^2}{N}$
$t = \frac{M}{S_M}$
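The paired-samples steps above can be sketched in Python using only the standard library. The function name and the before/after scores are hypothetical illustration data, not values from the course.

```python
import math

def paired_t(before, after):
    """t statistic and df for paired samples, computed from change scores."""
    diffs = [a - b for a, b in zip(after, before)]  # change scores
    n = len(diffs)
    mean_diff = sum(diffs) / n                      # M
    ss = sum((d - mean_diff) ** 2 for d in diffs)   # SS
    df = n - 1
    s2 = ss / df                                    # S^2 = SS / df
    s2_m = s2 / n                                   # S_M^2 = S^2 / N
    t = mean_diff / math.sqrt(s2_m)                 # t = M / S_M, since mu = 0
    return t, df

# Hypothetical scores for five participants
before = [10, 12, 9, 14, 11]
after = [12, 14, 10, 17, 12]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.3f}")
```

The resulting t would then be compared against the cutoff for the chosen significance level and df in a t table.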
t test for independent samples
Given two samples
Estimate population variances (assume same)
Estimate variances of distributions of means
Estimate variance of differences between means (mean = 0)
This is now your comparison distribution
Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = \frac{SS}{N - 1}$ for one sample (take sq root for S)
For two independent samples – “pooled estimate”: $S^2_{Pooled} = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this calculate variance of sample means, needed to compute t statistic: $S_M^2 = \frac{S^2}{N}$
$S^2_{Difference} = \frac{S^2_{Pooled}}{N_1} + \frac{S^2_{Pooled}}{N_2}$
t test for independent samples, continued
Distribution of differences between means
This is your comparison distribution. It is NOT normal; it is a ‘t’ distribution.
Shape changes depending on df
df = (N1 – 1) + (N2 – 1)
Compute $t = \frac{M_1 - M_2}{S_{Difference}}$. Determine if it is beyond the cutoff score for the test parameters (df, sig, tails) from a lookup table.
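A minimal Python sketch of the independent-samples calculation, following the pooled-variance steps from these slides. The function name and the two samples are hypothetical; `statistics.variance` is the standard-library sample variance (it divides by N − 1).

```python
import math
from statistics import mean, variance

def independent_t(sample1, sample2):
    """t statistic and df for two independent samples with a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    # Pooled estimate: each sample's S^2 weighted by its share of the total df
    s2_pooled = (df1 / df_total) * variance(sample1) + (df2 / df_total) * variance(sample2)
    # Variance of the distribution of differences between means
    s2_difference = s2_pooled / n1 + s2_pooled / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(s2_difference)
    return t, df_total

# Hypothetical data with unequal sample sizes
group1 = [4.0, 5.0, 6.0, 7.0, 8.0]
group2 = [2.0, 3.0, 4.0]
t_ind, df_ind = independent_t(group1, group2)
print(f"t({df_ind}) = {t_ind:.3f}")
```

Unequal sample sizes are used here on purpose, so the df weighting in the pooled estimate actually changes the result.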
ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
– There are more than 2 levels of IV so:
– (M1 – M2) / Sm won’t work
ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less..
Basic Logic of ANOVA
• Null hypothesis
– Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
– Group = Condition = Cell
Accompanying Statistics
• Experimental
– Between-subjects
  • Single factor, N-level (for N>2)
    – One-way Analysis of Variance (ANOVA)
  • Two factor, two-level (or more!)
    – Factorial Analysis of Variance
    – AKA N-way Analysis of Variance (for N IVs)
    – AKA N-factor ANOVA
– Within-subjects
  • Repeated-measures ANOVA (not discussed)
    – AKA within-subjects ANOVA
ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
– The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
– The analysis of variance may be used with unequal sample size (weighted or unweighted means analysis)
– When there are just 2 groups, ANOVA is equivalent to the t test for independent means
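One-Way ANOVA – Assuming Null Hypothesis is True…
Within-Group Estimate Of Population Variance: the per-group estimates $\sigma^2_{\mathrm{est},1}$, $\sigma^2_{\mathrm{est},2}$, $\sigma^2_{\mathrm{est},3}$ combine into $\sigma^2_{\mathrm{est,within}}$
Between-Group Estimate Of Population Variance: the group means $M_1$, $M_2$, $M_3$ give $\sigma^2_{\mathrm{est,between}}$
$F = \frac{\sigma^2_{\mathrm{est,between}}}{\sigma^2_{\mathrm{est,within}}}$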
Justification for F statistic
Calculating F
Example
Example
Using the F Statistic
• Use a table for F(BDF, WDF)
– And also α
BDF = between-groups degrees of freedom = number of groups – 1
WDF = within-groups degrees of freedom = Σ df for all groups = N – number of groups
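To make the F ratio and its two degrees of freedom concrete, here is a small Python sketch of a one-way ANOVA computed from raw scores. It uses the sum-of-squares form (mean square = SS / df), which is the layout SPSS reports; the function name and the three groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, BDF, WDF) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: how far each score is from its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    bdf = len(groups) - 1        # number of groups - 1
    wdf = n_total - len(groups)  # N - number of groups
    f = (ss_between / bdf) / (ss_within / wdf)
    return f, bdf, wdf

# Hypothetical scores for three groups
f_stat, bdf, wdf = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({bdf},{wdf}) = {f_stat:.3f}")
```

The printed F would then be looked up in an F table at (BDF, WDF) for the chosen α.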
One-way ANOVA in SPSS
Data
[Chart: Mean Performance (y-axis, 0 to 6) for the 1 Day, 2 Day, and 3 Day groups]
Analyze/Compare Means/One Way ANOVA…
SPSS Results…
ANOVA
Performance
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   24.813           2    12.406        9.442   .001
Within Groups    27.594           21   1.314
Total            52.406           23
F(2,21)=9.442, p<.05
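As a quick check on how the columns of the SPSS output relate, the sketch below recomputes the mean squares and F from the sums of squares and df reported above. The variable names are my own.

```python
# Values copied from the one-way ANOVA output above
ss_between, df_between = 24.813, 2
ss_within, df_within = 27.594, 21

ms_between = ss_between / df_between  # Mean Square = Sum of Squares / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within      # F = MS between / MS within

print(f"F({df_between},{df_within}) = {f_ratio:.3f}")
```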
Factorial Designs
• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.
• Factorial ANOVA teases apart the contribution of each variable separately.
• For N IVs, aka “N-way” ANOVA
Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
– The MAIN EFFECT of each independent variable
  • The separate effect of each independent variable
  • Analogous to separate experiments involving those variables
– The INTERACTION between independent variables
  • When the effect of one independent variable changes over levels of a second
  • Or, when the effect of one variable depends on the level of the other variable.
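One way to see the difference between a main effect and an interaction is to compute them descriptively from the four cell means of a 2 x 2 design. The sketch below uses hypothetical cell means and a hypothetical function name; it only describes patterns in the means and is not a significance test.

```python
def describe_2x2(cells):
    """Descriptive main effects and interaction from 2 x 2 cell means.

    cells[i][j] is the mean for level i of factor A and level j of factor B.
    """
    (a1b1, a1b2), (a2b1, a2b2) = cells
    # Main effect of A: difference between the A-level averages, ignoring B
    main_a = (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2
    # Main effect of B: difference between the B-level averages, ignoring A
    main_b = (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2
    # Interaction: does the effect of B differ across levels of A?
    interaction = (a2b2 - a2b1) - (a1b2 - a1b1)
    return main_a, main_b, interaction

# Hypothetical cell means
main_a, main_b, interaction = describe_2x2([[4.0, 6.0],
                                            [5.0, 11.0]])
print(main_a, main_b, interaction)
```

A nonzero interaction value here means the effect of B is larger at one level of A than at the other, which is the "depends on the level of the other variable" idea; whether it is statistically significant is what the factorial ANOVA decides.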
Example
Wait Time Sign in Student Center vs. No Sign
Satisfaction
[Chart: Value of the Dependent Variable (y-axis, 0 to 12) against Level of Independent Variable A (Level 1, Level 2), with a legend for Level 1 and Level 2]
Example of An Interaction - Student Center Sign – 2 Genders x 2 Sign Conditions
[Chart: F and M plotted across the No Sign and Sign conditions]
Two-way ANOVA in SPSS
Analyze/General Linear Model/Univariate
Results
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507 (a)                5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23
a. R Squared = .506 (Adjusted R Squared = .369)
Results
Degrees of Freedom
• df for between-group variance estimates for main effects
– Number of levels – 1
• df for between-group variance estimates for interaction effect
– Total num cells – df for both main effects – 1
– e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
– Sum of df for each cell = N – num cells
• Report: “F(bet-group, within-group)=F, Sig.”
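The degrees-of-freedom rules above translate directly into a small helper. This is a sketch with a hypothetical function name; the 3 x 2 call uses the design from the SPSS example (3 TrainingDays levels, 2 Trainer levels, N = 24), while the N for the 2 x 2 call is made up.

```python
def factorial_dfs(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects factorial design."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                         # number of levels - 1
    df_b = levels_b - 1
    df_interaction = cells - (df_a + df_b) - 1  # cells - df for both main effects - 1
    df_within = n_total - cells                 # N - number of cells
    return df_a, df_b, df_interaction, df_within

print(factorial_dfs(3, 2, 24))  # TrainingDays x Trainer design, N = 24
print(factorial_dfs(2, 2, 40))  # a 2 x 2 design with a hypothetical N of 40
```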
Publication format
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507 (a)                5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23
a. R Squared = .506 (Adjusted R Squared = .369)
N=24, 2x3=6 cells => df TrainingDays=2, df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05
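The reported F values and R Squared figures can be recovered from the sums of squares and df in the table, which helps when deciding what to put in the "F(bet-group, within-group)" report. The sketch below is my own check, with my own variable names; the adjusted R Squared line uses the standard formula 1 − (1 − R²)·df(Corrected Total)/df(Error), which is an assumption about how SPSS derived its figure.

```python
# Sums of squares and df copied from the Tests of Between-Subjects Effects table
effects = {
    "TrainingDays": (20.728, 2),
    "Trainer": (0.002, 1),
    "TrainingDays * Trainer": (1.680, 2),
}
ss_error, df_error = 25.899, 18
ss_model = 26.507
ss_corrected_total, df_corrected_total = 52.406, 23

ms_error = ss_error / df_error
# F for each effect = (SS / df) / Mean Square Error
f_ratios = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

r_squared = ss_model / ss_corrected_total
adjusted_r_squared = 1 - (1 - r_squared) * df_corrected_total / df_error

for name, f in f_ratios.items():
    print(f"{name}: F({effects[name][1]},{df_error}) = {f:.3f}")
print(f"R Squared = {r_squared:.3f}, Adjusted R Squared = {adjusted_r_squared:.3f}")
```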
Reporting rule
• IF you have a significant interaction
• THEN
– If 2x2 study: do not report main effects, even if significant
– Else: must look at patterns of means in cells to determine whether to report main effects or not.
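The reporting rule can be written as a small decision function. This is only a sketch: the function name, the returned labels, and the default α of .05 (taken from the "p<.05" used in these notes) are my own choices, and the branch for a non-significant interaction simply reports whichever main effects are significant.

```python
def effects_to_report(sig_a, sig_b, sig_interaction, is_2x2, alpha=0.05):
    """Decide which effects to report, given the Sig. value of each effect."""
    if sig_interaction < alpha:
        if is_2x2:
            # Significant interaction in a 2x2 study: main effects are not reported
            return ["interaction"]
        # Larger designs: the cell means decide whether main effects are reported
        return ["interaction", "inspect cell means before reporting main effects"]
    report = []
    if sig_a < alpha:
        report.append("main effect A")
    if sig_b < alpha:
        report.append("main effect B")
    return report or ["n.s."]

# Hypothetical Sig. values for factor A, factor B, and A * B
print(effects_to_report(0.04, 0.12, 0.01, is_2x2=True))
print(effects_to_report(0.34, 0.12, 0.41, is_2x2=True))
```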
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.41
n.s.
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.02
Significant interaction between TrainingDays and Trainer, F(2,18)=.584, p<.05
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effect of Trainer, F(1,18)=.001, p<.05
Results?
Source                   Sig.
TrainingDays             0.04
Trainer                  0.12
TrainingDays * Trainer   0.01
Significant interaction between TrainingDays and Trainer, F(2,18)=.584, p<.05
Do not report TrainingDays as significant
Results?
Source                   Sig.
TrainingDays             0.04
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effects for both TrainingDays, F(2,18)=7.20, p<.05, and Trainer, F(1,18)=.001, p<.05
“Factorial Design”
• Not all cells in your design need to be tested
– But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”
[Design grid: Real-Time and Retrospective crossed with Agent and Text; one cell is marked X]
Higher-Order Factorial Designs
• More than two independent variables are included in a higher-order factorial design
– As factors are added, the complexity of the experimental design increases
  • The number of possible main effects and interactions increases
  • The number of subjects required increases
  • The volume of materials and amount of time needed to complete the experiment increases
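To illustrate how quickly the number of effects grows, the sketch below enumerates the main effects and interactions for a list of factors using `itertools.combinations`. The function name and the factor labels A to D are hypothetical.

```python
from itertools import combinations

def list_effects(factors):
    """Return (main effects, interactions) for a full factorial design."""
    main_effects = [(f,) for f in factors]
    # Every subset of two or more factors is a possible interaction
    interactions = [combo
                    for size in range(2, len(factors) + 1)
                    for combo in combinations(factors, size)]
    return main_effects, interactions

for factors in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    mains, inters = list_effects(factors)
    print(len(factors), "factors:", len(mains), "main effects,", len(inters), "interactions")
```

Going from two factors to four raises the count from one possible interaction to eleven, which is one reason higher-order designs become hard to run and to interpret.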