Elang 273: Statistics
September 15, 2008
Statistics
The scientific method is defined by:
1. The research question is empirical
2. The data we collect is public
3. The data is falsifiable
Statistics helps most with this
But also with this
Statistics
Research question: Is the word glistening used more often in one register (as shown in COCA) than another?
SECTION     SPOKEN  FICTION  MAGAZINE  NEWSPAPER  ACADEMIC
PER MIL        0.4     12.0       2.8        2.1       0.6
SIZE (MW)     76.6     69.6      78.1       73.4      73.0
FREQ            32      833       219        156        43
How different do these frequencies have to be before we can say they really differ?
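The PER MIL row can be recomputed from the other two rows by dividing each raw frequency by the section size in millions of words. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and do not come from the slides.

```python
# Raw frequency of "glistening" and section size in millions of words (MW),
# copied from the COCA table above.
freq = {"SPOKEN": 32, "FICTION": 833, "MAGAZINE": 219, "NEWSPAPER": 156, "ACADEMIC": 43}
size_mw = {"SPOKEN": 76.6, "FICTION": 69.6, "MAGAZINE": 78.1, "NEWSPAPER": 73.4, "ACADEMIC": 73.0}

# Normalized frequency: occurrences per million words, rounded to one decimal.
per_mil = {section: round(freq[section] / size_mw[section], 1) for section in freq}

for section, rate in per_mil.items():
    print(f"{section:<10} {rate:>5.1f}")
```

Normalizing this way is what makes sections of different sizes comparable at a glance.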
Statistics
Researchers have agreed that if the probability that a difference between two groups arose by chance falls below a certain threshold, we consider the difference statistically significant.
By convention, a difference is significant when there is less than a one-in-twenty chance of it happening by chance (p < .05). A difference that is not significant may simply be the result of random chance.
Two types of statistics
1. Descriptive
a. nominal (categorical)
b. ordinal (rank order)
c. continuous
2. Inferential
a. chi-square
b. t-tests/ANOVA
c. correlations
d. Varbrul
1. Descriptive Statistics
These are the types of statistics you are familiar with: means, percentages, and quartiles, usually shown through bar charts, pie charts, and graphs
[Bar chart by register: spoken, fiction, mag, news, academic; y-axis from 0 to 15]
1. Descriptive Statistics
Three types of data
1. Nominal (Categorical): sex, race, national origin, native speaker, how often you choose one thing over another, how often a word occurs in one register versus another
2. Continuous: height, weight, age, scores on a language test, IQ, working memory span
3. Ordinal (Rank Order): No fixed interval between ranks (first, second, third place in a race), for example the order in which people rank their favorite dialects
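As a rough illustration of how each data type is usually summarized, here is a short Python sketch using only the standard library. All of the values are invented for the example, and the variable names are hypothetical.

```python
from collections import Counter
import statistics

# All data below are invented for illustration only.
native_speaker = ["yes", "no", "yes", "yes", "no"]  # nominal: category labels
test_scores = [71, 85, 90, 64, 78]                  # continuous: measured values
dialect_ranks = [1, 3, 2, 1, 4]                     # ordinal: rank one dialect received from 5 people

# Nominal data: count how often each category occurs.
counts = Counter(native_speaker)

# Continuous data: a mean is meaningful because intervals are fixed.
mean_score = statistics.mean(test_scores)

# Ordinal data: the median respects order without assuming equal intervals.
median_rank = statistics.median(dialect_ranks)

print(counts, mean_score, median_rank)
```

The choice of summary follows the data type: counts or percentages for nominal data, means for continuous data, and medians or rank positions for ordinal data.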
1. Descriptive Statistics
How could you depict the data for each of these types?
1. Nominal
2. Continuous
3. Ordinal (rank order)
1. Nominal (Categorical)
[Figure: answers to “Where is this speaker from?” (native listeners), one panel per accent]
Birmingham: London, general England, other England, other UK, outside UK
Received Pronunciation: London, general England, other England, outside England
West Yorkshire: Scotland, Ireland, general England, other UK, outside England
1. Nominal (Categorical)
[Bar chart: correct dialect identification by American English speakers; y-axis from 0 to 100]
Australia 61, England 90, India 51, Ireland 41, Kenya 32, New York 75, Scotland 59, South Africa 8, Southern US 92
2. Continuous
[Chart: Utah vs. Non-Utahs; y-axis from 0 to 5]
[Chart: Native listeners: status vs. solidarity; series: solidarity, status; x-axis: RP, Birmingham, Network, New York, WY, Alabama; y-axis from 0 to 7]
Status: RP, Birmingham, Network, NYC, West Yorkshire, Alabama
Solidarity: RP, Birmingham, Network, New York, West Yorkshire, Alabama
3. Ordinal (Rank Order)
Coupland & Bishop, 2007
2. Inferential Statistics
a. Chi-square
b. ANOVA/t-test
c. Correlations (rank order correlations)
d. Logistic regression
e. Varbrul
2. Inferential Statistics
For each type of inferential statistic, we need to know:
1. Statistical value (chi-square value, F statistic, t statistic)
2. Probability value (p value)
3. Degrees of Freedom (df)
2. Inferential Statistics
Research question: Is the word glistening used more often in one register (as shown in COCA) than another?
SECTION     SPOKEN  FICTION  MAGAZINE  NEWSPAPER  ACADEMIC
PER MIL        0.4     12.0       2.8        2.1       0.6
SIZE (MW)     76.6     69.6      78.1       73.4      73.0
FREQ            32      833       219        156        43
2. Inferential Statistics
Research question: Is the word glistening used more often in one register (as shown in COCA) than another?
What kind of data is this? Nominal (categorical)
For this kind of data we use a chi-square test
a. Chi-square
Tells us whether something happened more often than chance would predict
http://www-user.uni-bremen.de/~anatol/qnt/qnt_chi.html
Use with multiple-choice questions (how often respondents choose a specific option) and with frequency data from one or more corpora
a. Chi-square
What the chi-square statistic answers: Is the distribution into categories random or not?
(Uses counts of nominal data)
For example, a multiple-choice question: Jill loves the taste of coffee.
A. c[æ]fi: 186    B. c[^]fi: 113    C. c[a]fi: 70
Are 186, 113, and 70 really different from what random choice would give?
a. Chi-square
To compute chi-square, you need to know the observed frequencies (the responses you got from your survey or corpus) and the expected frequencies.
To calculate the expected frequencies under random choice, add up all the observed frequencies and divide by the number of data points (categories), so that every category gets the same expected value.
              Data point 1   Data point 2
Observed
Expected
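The steps above can be sketched in Python. The slides do not spell out the formula, so this sketch assumes the standard one-way Pearson statistic, the sum over categories of (observed - expected)² / expected, with the same expected value for every category. The function name is hypothetical, and the data are the coffee responses quoted earlier in these slides.

```python
def chi_square_equal_expected(observed):
    """One-way chi-square statistic when every category is expected equally often."""
    # Expected frequency: total of the observed counts / number of categories.
    expected = sum(observed) / len(observed)
    # Pearson statistic: sum of (observed - expected)^2 / expected.
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # Degrees of freedom for a one-way test: number of categories minus one.
    df = len(observed) - 1
    return statistic, expected, df


# Responses A, B, C to the "coffee" multiple-choice question.
statistic, expected, df = chi_square_equal_expected([186, 113, 70])

# Standard table value for p = .05 with 2 degrees of freedom.
CRITICAL_05_DF2 = 5.991

print(f"expected per category = {expected:.1f}")
print(f"chi-square = {statistic:.2f}, df = {df}")
print("significant at p < .05" if statistic > CRITICAL_05_DF2 else "not significant")
```

Comparing against a tabled critical value avoids needing a statistics library; an online calculator or a package would report the p value directly.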
a. Chi-square
(Invented) frequency of use of dude in four million-word spoken corpora:
            US   NZ   AU   UK
Observed    15    9   11    5    (what they actually did)
Expected    10   10   10   10    (what you would expect by random chance)
A random distribution would give every corpus the same expected frequency: (15 + 9 + 11 + 5) / 4 = 10.
a. Chi-square
http://www.physics.csbsju.edu/stats/contingency_NROW_NCOLUMN_form.html
chi-square = 2.77 (we want this to be large)
degrees of freedom = 3
probability = 0.429 (we want this to be small)
The larger the chi-square value and the smaller the p value, the more likely it is that the difference between the observed and the expected frequencies did not occur by chance. Here p = 0.429 is well above .05, so the difference is not statistically significant.
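The reported values appear consistent with entering the observed and expected counts as the two rows of a 2 × 4 contingency table, which is what the linked calculator form computes. That reading is an inference from the numbers, not something the slides state. The sketch below, with hypothetical function names, carries out that computation using only the standard library; the p value uses the closed-form chi-square tail probability for whole-number degrees of freedom.

```python
import math


def contingency_chi_square(table):
    """Pearson chi-square for a contingency table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            # Expected cell count under independence of rows and columns.
            cell_expected = row_totals[i] * col_totals[j] / grand
            statistic += (cell - cell_expected) ** 2 / cell_expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


observed = [15, 9, 11, 5]     # US, NZ, AU, UK (invented counts from the slide)
expected = [10, 10, 10, 10]   # equal counts under a random distribution

statistic, df = contingency_chi_square([observed, expected])
p_value = chi2_sf(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

For comparison, the one-way formula that sums (observed - expected)² / expected over the four corpora gives 5.2 on the same data, also with 3 degrees of freedom. Which figure is reported therefore depends on how the numbers are entered into the calculator.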
a. Chi-square
Practice: Is the word glistening used more often in one register (as shown in COCA) than another?
SECTION     SPOKEN  FICTION  MAGAZINE  NEWSPAPER  ACADEMIC
PER MIL        0.4     12.0       2.8        2.1       0.6
SIZE (MW)     76.6     69.6      78.1       73.4      73.0
FREQ            32      833       219        156        43
To do this, multiply each PER MIL number by 10 so that you enter only whole numbers
a. Chi-square
Results:
chi-square = 97.2
degrees of freedom = 4
probability = 0.000 (that is, p < .001)
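As a cross-check that does not depend on rounded per-million figures, the raw FREQ counts can be tested directly, with expected counts proportional to each section's size. This is a different computation from the one behind the 97.2 above, so the statistic will not match; the sketch only compares the conclusion. It is a minimal Python sketch assuming the standard one-way Pearson formula, with illustrative variable names.

```python
# Raw counts of "glistening" and section sizes (millions of words) from the COCA table.
# Order: SPOKEN, FICTION, MAGAZINE, NEWSPAPER, ACADEMIC.
freq = [32, 833, 219, 156, 43]
size_mw = [76.6, 69.6, 78.1, 73.4, 73.0]

total_freq = sum(freq)
total_size = sum(size_mw)

# If the word were spread evenly, each section would get a share of the
# total count proportional to its size.
expected = [total_freq * size / total_size for size in size_mw]

# One-way Pearson statistic: sum of (observed - expected)^2 / expected.
statistic = sum((o - e) ** 2 / e for o, e in zip(freq, expected))
df = len(freq) - 1

# Standard table value for p = .05 with 4 degrees of freedom.
CRITICAL_05_DF4 = 9.488

print(f"chi-square = {statistic:.1f}, df = {df}")
print("significant at p < .05" if statistic > CRITICAL_05_DF4 else "not significant")
```

Both routes point the same way: the word is far from evenly distributed across registers, with fiction accounting for most of the excess.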
a. Chi-square
More practice
1. Multiple-choice question: Jill loves the taste of coffee.
A. c[æ]fi: 186    B. c[^]fi: 113    C. c[a]fi: 70
Did respondents choose option A more often than the other two choices?
2. Identification: American listeners gave the following answers when asked “Where is this speaker from?” (he was from Birmingham, UK):
London: 45%    England: 25%    Scotland: 25%    Ireland: 5%
Chi-square Homework