TRANSCRIPT
Descriptive statistics
922
What do we need to run an experiment?
Hypothesis (linguistic)
Participants
Task (stimuli = questions, responses = answers)
Results
Conclusions
Key terms: stimulus design, response measure
Example
Show me the cat that bit the dog
Show me the cat that the dog bit
Picture from: Friedmann & Novogrodsky (2001)
Design: number of conditions; within subject / between subject; how many items for each participant; order of items
Measure: response; variables; scales
Analysis: descriptive; inferential
Variables
Any experimental category that has a value that can vary.
Anything that is not constant and can change over time, or be different in different people, is a variable
Variables can take many forms
Variables can be manipulated and observed
Properties of Variables
Continuous variables – along a continuum with equal intervals (e.g., age, height, weight, grade in a test)
Ordinal variables – rating along a continuum with estimated intervals (e.g., evaluation)
Discrete variables (categorical, nominal) – divided into categories (e.g., language, yes/no, correct/incorrect)
Types of Variables
Independent variables – characteristics of the subject (participant variable) or conditions chosen by the experimenter
Dependent variables – what the experiment measures (e.g., degree of success)
Intervening variables – variables which are not measured or manipulated, but could influence the results (e.g., concentration, intelligence)
Field, A. & G. Hole. 2003. How to Design and Report Experiments. London: Sage Publications.
Scales
Nominal, Ordinal, Interval, Ratio
Nominal: two things with the same number are similar (same name)
Ordinal: four is more than three (but the distance from three to four is not necessarily the same as from two to three)
Interval: four is more than two (but not twice as much)
Ratio: four is more than three, the distance from three to four is the same as from two to three, and four is twice two
Which scale are the following variables rated on?
Height
Celsius degrees
TV channel number
Grades in an exam (1-100)
Psychological rating (anxiety on a scale of 1-10)
Time (13:00, 14:00)
Time (one hour, two hours, three hours)
Phone number
Rating places in a race
Variables and Scales: summary
Choose an appropriate task
Measure responses
Be aware of the variables and their properties
Choose the mathematical operations appropriate for the scale
Factorial design
Tests all possible combinations, e.g., a 2x2 design – one participant variable and one independent variable with two conditions.
Conditions: subject relatives, object relatives
Participant groups: TLD, SLI
Practical questions for offline tasks
How many subjects? At least 25
How many categories? 2x2
How many items? More subjects >> fewer items.
For 25 subjects – 6 items per category
For 50 subjects – 3 is enough
For case studies and within-subject analysis – at least 10.
SIMPLE NUMERICAL COMPUTATIONS
Ratio
The relation between two nominal variables
V/N ratio: 60/80 = 3/4
N/V ratio: 80/60 = 4/3
Word type     N
Nouns         80
Verbs         60
Other words   50
Total         190
Example
Goofy said that the Troll had to put two hoops on the pole to win.
Does the Troll win?
Musolino (2004)
Ratio
Yes/no ratio: 8/12=2/3
Proportion
Relation between a group and its part (Verb/Word, Pronouns/Subject position). Ratio out of the total
Verb/Word proportion: 60/190 ≈ 0.32 (roughly 1/3)
Percentage (%)
Relative proportion out of a hundred
Verb percentage (out of all words): 100*(60/190) ≈ 32%
Rate
The relative frequency (for a population, out of 1000)
7% of children have SLI >> 0.07 * 1000 = 70
70 children out of 1000 have SLI
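The ratio, proportion, percentage and rate computations above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the counts are the ones given in the word-count table and the SLI example.

```python
from fractions import Fraction

# Word counts from the table above.
nouns, verbs, other_words = 80, 60, 50
total_words = nouns + verbs + other_words  # 190

# Ratio: relation between two categories.
vn_ratio = Fraction(verbs, nouns)  # reduces 60/80
nv_ratio = Fraction(nouns, verbs)  # reduces 80/60

# Proportion: a part relative to the whole.
verb_proportion = verbs / total_words

# Percentage: the proportion scaled to "out of a hundred".
verb_percentage = 100 * verb_proportion

# Rate: relative frequency scaled to "out of 1000" (7% figure from the slide).
sli_rate_per_1000 = round(0.07 * 1000)

print("V/N ratio:", vn_ratio)
print("N/V ratio:", nv_ratio)
print("Verb proportion:", round(verb_proportion, 3))
print("Verb percentage:", round(verb_percentage, 1))
print("SLI per 1000 children:", sli_rate_per_1000)
```

Using `Fraction` keeps the ratios in reduced form (3/4 and 4/3), while the proportion is left as a decimal so the rounding is visible.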
Frequency
Count the number of times a score occurs.
How many times a value of a variable occurs?
Example
Show 10 pictures, and check for the number of “correct” responses
Is every bunny eating a carrot?
Roeper, Strauss and Zurer Pearson (2004)
Picture   Correct
1         1
2         1
3         0
4         0
5         0
6         0
7         1
8         1
9         1
10        1
Total     6
Frequency
Count the number of times a score occurs
Raw scores
Child   Score
1       8
2       8
3       6
4       6
5       6
6       6
7       2
8       2
Frequency = how many children got this score
Score   Frequency
2       2
6       4
8       2
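Counting how many children obtained each score can be sketched as follows. The eight raw scores are the ones in the Child/Score table above; `Counter` is just one convenient way to tally them.

```python
from collections import Counter

# Raw scores of the eight children in the table above.
scores = [8, 8, 6, 6, 6, 6, 2, 2]

# Frequency = how many children got each score.
frequency = Counter(scores)

for score in sorted(frequency):
    print(score, frequency[score])
```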
Frequency graph
Score on the test is the horizontal axis (X-axis)
Frequency is on the vertical axis (Y-axis)
Percentile
The cumulative frequency - how many scores are at or below a particular point in the distribution
Percentile = 100 * (Cumulative frequency / Total N)
Grade   Frequency   Cumulative frequency   Percentile
100     2           30                     100%
90      5           28                     93%
80      10          23                     77%
70      8           13                     43%
60      4           5                      17%
50      1           1                      3%
Total N = 30
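The cumulative frequency and percentile columns can be reproduced from the grade frequencies alone. The sketch below assumes the grades are processed from lowest to highest and that percentiles are rounded to whole numbers, as in the table.

```python
from itertools import accumulate

# Grades and their frequencies, ordered from lowest to highest.
grades = [50, 60, 70, 80, 90, 100]
frequencies = [1, 4, 8, 10, 5, 2]
total_n = sum(frequencies)

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Percentile = 100 * (cumulative frequency / total N), rounded.
percentiles = [round(100 * c / total_n) for c in cumulative]

for grade, freq, cum, pct in zip(grades, frequencies, cumulative, percentiles):
    print(grade, freq, cum, f"{pct}%")
```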
Frequency polygon (the curve)
Frequency distribution
[Figure: frequency polygon. X-axis: Grade (50 to 100); Y-axis: number of students (0 to 12)]
The frequency polygon (the curve) is a picture of the data
Types of distributions (Fig. 4.3 & 4.4, pp. 113-116)
A bell-shaped curve - a symmetric, unimodal distribution (one midpoint, one peak): the normal distribution
Peak
Tails
Pointy distribution (leptokurtic)
Flat distribution (platykurtic)
In a skewed distribution the tail is pulled in one direction:
Positively skewed distribution - most scores are low; the tail points towards the high (positive) scores, which skew the distribution
Negatively skewed distribution - most scores are high; the tail points towards the low (negative) scores, which skew the distribution
Bimodal distribution - a double-peaked curve
Min (the lowest score) and Max (the highest score)
Range – the range of observed values. Range = Max-Min
But the range changes with the extreme scores (unstable but useful informal measure).
Descriptive Statistics - Some definitions
Mode - the most frequently obtained score
Mean (average) – the average of a set of numbers
Median – the middle score of a group (when the number of scores is odd) or the average of the two middle scores (when it is even)
In a bell curve (normal) distribution the mode, mean and median will be the same
Mode
Grade   Frequency
50      1
60      4
70      8
80      10
90      5
100     2
Total   30
Which grade is most frequent?
Highest in “frequency” column
Mean (average)
Compute a sum of all grades
Divide by number of grades
Mean (average)
Grade x times
50 x 1 = 50
60 x 4 = 240
70 x 8 = 560
80 x 10 = 800
90 x 5 = 450
100 x 2 = 200
Total = 2300
Mean = 2300/30 ≈ 76.67
Median
Order all grades in a row according to value
The grade in “the middle” of the row is the median
Median
We have a row of 30 grades: 50, 60, 60, 60, 60, 70…
Half of 30 is 15
The grade in the 15th position is the median
Median
Slight complication: 30 is an even number, so there is no single middle grade; we have 15 grades on each side of the midpoint
Compute the mean of the grades in the 15th and 16th positions (both are 80, so the median is 80)
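The three measures can be computed directly from the grade/frequency table. The sketch below first expands the table into the "row of 30 grades" described above and then uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Grade -> frequency, as in the table used throughout this section.
grade_frequency = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}

# Expand into the ordered row of individual grades: 50, 60, 60, 60, 60, 70, ...
all_grades = [grade for grade, freq in grade_frequency.items() for _ in range(freq)]

mode = statistics.mode(all_grades)      # most frequent grade
mean = statistics.mean(all_grades)      # sum of all grades / number of grades
median = statistics.median(all_grades)  # even N: mean of the two middle grades

print("N:", len(all_grades))
print("Mode:", mode)
print("Mean:", round(mean, 2))
print("Median:", median)
```

With an even number of grades, `statistics.median` averages the two middle values, which is the procedure described for the 15th and 16th positions.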
Variability
(Figure from Hatch & Farhady 1982, p. 56)
Questions: Are both curves the same? How? Are they different? How?
We need to measure the accuracy of the mean.
Coming attractions
How to draw valid statistical inferences?
We have to look at the relation between our sample and the population
Today we looked at where the ‘center’ of the data is – what is the big picture
Look at variance, how the data is distributed
Deviation
The distance between a score and the Mean (see Table 4.2, p. 125), how much a score deviates from the average
Sum of squared errors (SS)
Variance
Average error in the sample, average error in the population
Variance in the sample = SS/N
33.7143/7 = 4.8163
Variance in the population (estimated from the sample) = SS/(N-1)
33.7143/6 = 5.6191
Why N-1? Degrees of freedom (read box 4.5, page 129)
Standard deviation (SD)
The average distance between a score and the Mean (square root of the Variance)
SD= √5.6191 = 2.37
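The chain from deviations to SS, variance and SD can be sketched in a few lines. Note that the seven scores below are hypothetical: they were chosen only because they give the same mean (about 3.57) and the same SS (about 33.71) as the figures quoted above, and they are not claimed to be the data of Table 4.2.

```python
import math

# Hypothetical scores (NOT taken from Table 4.2); chosen so that N = 7,
# mean ~ 3.57 and SS ~ 33.7143, matching the numbers quoted on the slide.
scores = [1, 2, 2, 3, 4, 5, 8]

n = len(scores)
mean = sum(scores) / n

# Deviation: distance between each score and the mean.
deviations = [x - mean for x in scores]

# Sum of squared errors (SS).
ss = sum(d ** 2 for d in deviations)

variance_sample = ss / n                     # SS/N
variance_population_estimate = ss / (n - 1)  # SS/(N-1)
sd = math.sqrt(variance_population_estimate)

print("Mean:", round(mean, 2))
print("SS:", round(ss, 4))
print("SS/N:", round(variance_sample, 4))
print("SS/(N-1):", round(variance_population_estimate, 4))
print("SD:", round(sd, 2))
```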
What can SD tell us about the distribution (pointy distribution vs. flat distribution)?
Standard Error (SE)
How well does the sample represent the population?
Different samples of the population might yield different means. The SE is the standard deviation of the means of several samples. Large value - big differences between sample means; small value - small differences.
SE = SD/√N
Confidence Interval
The limits within which 95% of the sample means fall
Lower boundary = Mean - 2SE
Upper boundary = Mean + 2SE
(2 is a rounding of 1.96; for 99% limits use about 2.58 SE instead)
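As a rough illustration, the standard error and the approximate 95% limits can be computed from the summary values used elsewhere in these slides (mean 3.57, SD 2.37, N = 7). The multiplier 2 follows the slide's rule of thumb; the variable names are illustrative.

```python
import math

# Summary values quoted in the slides.
mean = 3.57
sd = 2.37
n = 7

# Standard error: SD divided by the square root of N.
se = sd / math.sqrt(n)

# Approximate 95% confidence limits: mean plus or minus 2 SE.
lower_boundary = mean - 2 * se
upper_boundary = mean + 2 * se

print("SE:", round(se, 3))
print("Lower boundary:", round(lower_boundary, 2))
print("Upper boundary:", round(upper_boundary, 2))
```

With such a small sample the interval is wide, which is one way to see how N affects how well the sample represents the population.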
Inferential statistics
z-score and T-score
How can we use the standard deviation (SD) to compare two samples? two exams? two tests?
We translate the raw scores into distance in SD from the mean, by subtracting the mean from the raw score and dividing by the SD.
So for Table 4.2:
$z = \frac{1 - 3.57}{2.37} = -1.08$ and $z = \frac{8 - 3.57}{2.37} = 1.87$
These scores are z-scores. Some z-scores are negative and some are positive. Why?
If you prefer a scale with only positive numbers, you can use the T-score
T score = 10 * z-score + 50
10 * (-1.08) + 50 = 39.2
10 * 1.87 + 50 = 68.7
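The two conversions can be written as small helper functions. This is a sketch only: `z_score` and `t_score` are illustrative names, and the mean and SD are the summary values quoted for Table 4.2.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score so that the mean is 50 and one SD is 10."""
    return 10 * z + 50


mean, sd = 3.57, 2.37

for raw in (1, 8):
    z = z_score(raw, mean, sd)
    print(raw, round(z, 2), round(t_score(z), 1))
```

A raw score below the mean gives a negative z-score and a T-score below 50; a raw score above the mean gives a positive z-score and a T-score above 50.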
A few words on Covariance and Pearson correlation
Covariance - how much two variables co-vary?
$\mathrm{Cov} = (X - \bar{X})(Y - \bar{Y})$
But we are interested in sets of scores, so we need to sum up all the individual covariances and divide, as always, by N-1.
$\mathrm{COV}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$
What do we need covariance for? To measure correlations (Pearson correlation coefficient is considered the best way to estimate correlation between X & Y).
Since the two samples do not have the same SD, we must adjust the covariance to the amount of variation
$r = \frac{\mathrm{COV}_{xy}}{SD_x \cdot SD_y}$
What does r mean ?
Positive r - positive correlation
Negative r - negative correlation
Small r - small correlation
Big r - big correlation
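The covariance and Pearson r formulas above translate almost line by line into code. The sketch below uses N-1 for both the covariance and the SDs; the function names and the small data set are made up purely for illustration.

```python
import math


def covariance(xs, ys):
    """Sum of cross-products of deviations, divided by N-1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def standard_deviation(values):
    """Square root of SS/(N-1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(xs, ys):
    """Covariance adjusted for the amount of variation in each variable."""
    return covariance(xs, ys) / (standard_deviation(xs) * standard_deviation(ys))


# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("COVxy:", covariance(x, y))
print("r:", round(pearson_r(x, y), 4))
```

Reversing the order of `y` would flip the sign of r, which is the difference between a positive and a negative correlation.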
Effect size
We can use correlations to measure experimental effect size
r² - the coefficient of determination - is the fraction of the variance that is accounted for by a linear correlation.
r = 0.1 (small effect) - only 1% of the variance is accounted for by our task (1% = .01 = r²)
r = 0.3 (medium effect) - 9% of the variance is accounted for by our task (9% = .09 = r²)
r = 0.5 (large effect) - 25% of the variance is accounted for by our task (25% = 0.25 = r²)
r = 1 - a perfect effect
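The link between r and the share of variance accounted for is just a squaring step, as the short sketch below shows. The helper name `variance_explained` is illustrative.

```python
def variance_explained(r):
    """Coefficient of determination: the square of the correlation r."""
    return r ** 2


for r in (0.1, 0.3, 0.5, 1.0):
    share = variance_explained(r)
    print(f"r = {r}: r squared = {share:.2f} ({share:.0%} of the variance)")
```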
Probability
How probable is it to get a certain correlation?
How probable is it to get a certain score?
How probable is it to get a certain mean?
How probable is it that two samples are the same/different?
Playing "Heads or tails?" Throwing a die.
Probability can be calculated by dividing the number of desired events by the number of possible outcomes.
What is the probability of getting a score above the mean?
What is the probability of getting a score which is up to 1SD above the mean? up to 1SD from the mean? (For every z-score there is a probability)
Or by relying on the SD
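Both ways of getting a probability can be sketched briefly: counting outcomes for the coin and the die, and using z-scores for the questions about the mean and the SD. The second part assumes the scores follow a normal distribution, which the slides do not state explicitly at this point.

```python
from fractions import Fraction
from statistics import NormalDist

# Counting: desired events divided by possible outcomes.
p_heads = Fraction(1, 2)  # one desired side out of two
p_six = Fraction(1, 6)    # one desired face out of six

# Relying on the SD: assumes normally distributed scores, expressed as z-scores.
standard_normal = NormalDist(mu=0, sigma=1)

p_above_mean = 1 - standard_normal.cdf(0)
p_up_to_1sd_above = standard_normal.cdf(1) - standard_normal.cdf(0)
p_within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("P(heads):", p_heads)
print("P(six):", p_six)
print("P(score above the mean):", round(p_above_mean, 4))
print("P(score up to 1 SD above the mean):", round(p_up_to_1sd_above, 4))
print("P(score within 1 SD of the mean):", round(p_within_1sd, 4))
```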
Confidence Interval
The limits within which 95% of the sample means fall
Lower boundary = Mean - 2SE
Upper boundary = Mean + 2SE
Hypothesis testing
How likely is it (how probable is it) that our hypothesis is right?
The probability that some results could happen by chance is less than 5% (or 1%)
p<0.05 (or p<0.01) - the level of significance
Null hypothesis - there is no difference between our sample and the population
Positive hypothesis - the sample does better than the population.
Negative hypothesis - the sample does worse than the population
Alternative hypothesis - the sample is different but there is no direction.
p>0.05
p<0.05
(Figures from Hatch & Farhady 1982, p. 87)
If the data falls in the shaded area of 8.5 - the null hypothesis is not rejected
If the data falls in the shaded area of 8.6 - the null hypothesis is rejected
If the data falls in the shaded higher tail of 8.6 - the scores are higher than the population and the null hypothesis is rejected
If the data falls in the shaded negative tail of 8.6 - the scores are lower than the population and the null hypothesis is rejected
Since there is no direction specified by the null hypothesis, we must consider both tails - thus we use a two tailed test (with .025 in each tail).
If we test a directional hypothesis, the level of significance applies to one tail only.
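The difference between a two-tailed and a one-tailed test can be illustrated with a z-score and the normal curve. This is a sketch under assumptions: the z-score of 1.8 is hypothetical, the function names are illustrative, and the one-tailed version shown tests only the "sample does better" direction.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def two_tailed_p(z):
    """Probability of a result at least this extreme in either tail."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


def one_tailed_p(z):
    """Probability of a result at least this high (upper tail only)."""
    return 1 - standard_normal.cdf(z)


# Critical z-values for a .05 level of significance.
critical_two_tailed = standard_normal.inv_cdf(1 - 0.025)  # .025 in each tail
critical_one_tailed = standard_normal.inv_cdf(1 - 0.05)   # .05 in one tail

z = 1.8  # hypothetical result, for illustration only
print("Critical z, two-tailed:", round(critical_two_tailed, 2))
print("Critical z, one-tailed:", round(critical_one_tailed, 2))
print("Two-tailed p:", round(two_tailed_p(z), 4))
print("One-tailed p:", round(one_tailed_p(z), 4))
```

For this hypothetical z, the one-tailed p falls below .05 while the two-tailed p does not, which shows why the choice between a directional and a non-directional hypothesis has to be made before looking at the data.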
(Figures from Hatch & Farhady 1982, p. 88)
A score in the shaded area in 8.7 confirms the _____________ hypothesis
A score in the shaded area in 8.8 confirms the _____________ hypothesis