Elang 273: Statistics September 15, 2008

Page 1: Elang 273: Statistics

Elang 273: Statistics

September 15, 2008

Statistics

The scientific method is defined by:

1. The research question is empirical

2. The data we collect is public

3. The data is falsifiable

Statistics helps most with falsifiability, but it also helps with another of these criteria.

Statistics

Research question: Is the word glistening used more often in one register (as shown in COCA) than another?

SECTION     FREQ   SIZE (MILLION WORDS)   PER MILLION WORDS
Spoken      32     76.6                   0.4
Fiction     833    69.6                   12.0
Magazine    219    78.1                   2.8
Newspaper   156    73.4                   2.1
Academic    43     73.0                   0.6

How different do these frequencies have to be before we can say they really are different?
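The PER MILLION WORDS column is each raw count divided by the size of its section in millions of words. Normalizing this way is what makes sections of different sizes comparable. A minimal Python sketch that recomputes that column from FREQ and SIZE (the function name `per_million` is just for illustration):

```python
# Raw counts of "glistening" and section sizes (millions of words), from the COCA table
freq = {"spoken": 32, "fiction": 833, "magazine": 219, "newspaper": 156, "academic": 43}
size_millions = {"spoken": 76.6, "fiction": 69.6, "magazine": 78.1, "newspaper": 73.4, "academic": 73.0}


def per_million(count: int, millions_of_words: float) -> float:
    """Normalized frequency: occurrences per one million words of text."""
    return count / millions_of_words


# Round to one decimal place, as in the table
rates = {section: round(per_million(freq[section], size_millions[section]), 1) for section in freq}
print(rates)
# {'spoken': 0.4, 'fiction': 12.0, 'magazine': 2.8, 'newspaper': 2.1, 'academic': 0.6}
```

Comparing raw FREQ values directly would slightly favor the larger sections (magazine, spoken); the per-million rates remove that effect.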

Statistics

Researchers have agreed on a convention: if the probability that a difference between two groups arose by chance alone is smaller than a set threshold, we consider the difference statistically significant.

The usual threshold is one in twenty: a difference is significant when a difference at least that large would happen by chance less than 5% of the time (p < .05). A difference that is not significant could simply be the result of random chance.
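To make "happening by chance" concrete, the sketch below simulates chance directly: it drops the same number of tokens at random into equally likely categories many times and counts how often the result is at least as uneven as the real data. That proportion is an estimate of the p value. This is a Monte Carlo illustration, not the method used in these slides; it borrows the invented dude counts (15, 9, 11, 5) from the chi-square example later in the lecture, and the function name, trial count, and seed are arbitrary choices.

```python
import random


def chance_of_split_at_least_as_uneven(observed, trials=10_000, seed=42):
    """Monte Carlo estimate of how often chance alone produces a split at least
    as uneven as `observed` when every category is equally likely.
    Unevenness = sum of squared deviations from a perfectly even split."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n, k = sum(observed), len(observed)
    even = n / k

    def unevenness(counts):
        return sum((c - even) ** 2 for c in counts)

    target = unevenness(observed)
    hits = 0
    for _ in range(trials):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1  # drop each token into a random category
        if unevenness(counts) >= target:
            hits += 1
    return hits / trials


p_dude = chance_of_split_at_least_as_uneven([15, 9, 11, 5])  # invented counts: US, NZ, AU, UK
verdict = "significant" if p_dude < 0.05 else "not significant"
print(f"estimated p = {p_dude:.3f} -> {verdict} at the .05 level")
```

If the estimate came out below 0.05, a split this uneven would arise by chance less than one time in twenty.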

Two types of statistics

1. Descriptive
   a. nominal (categorical)
   b. ordinal (rank order)
   c. continuous

2. Inferential
   a. chi-square
   b. t-tests/ANOVA
   c. correlations
   d. Varbrul

1. Descriptive Statistics

These are the types of statistics you are already familiar with: means, percentages, and quartiles, usually shown in bar charts, pie charts, and graphs.

[Bar chart by register: spoken, fiction, magazine, newspaper, academic; y-axis 0 to 15]
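A short sketch of the descriptive measures named here (mean, percentages, quartiles), computed with Python's standard `statistics` module. The data are the glistening per-million rates from the COCA table; treating the five register rates as one small data set is only for illustration.

```python
import statistics

# Per-million-word rates of "glistening" by register (COCA table in these slides)
per_million = {"spoken": 0.4, "fiction": 12.0, "magazine": 2.8, "newspaper": 2.1, "academic": 0.6}
rates = list(per_million.values())

mean = statistics.mean(rates)
median = statistics.median(rates)
q1, q2, q3 = statistics.quantiles(rates, n=4)  # quartile cut points (default "exclusive" method)

# Each register's share of the combined per-million rate, as a percentage
total = sum(rates)
percent_share = {register: 100 * rate / total for register, rate in per_million.items()}

print(f"mean = {mean:.2f}, median = {median}, quartiles = {q1:.2f}, {q2:.2f}, {q3:.2f}")
for register, share in percent_share.items():
    print(f"{register:<10} {share:5.1f}%")
```

The mean is pulled well above the median by the single large fiction value, which is one reason quartiles are often reported alongside means.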

1. Descriptive Statistics

Three types of data

1. Nominal (Categorical): sex, race, national origin, native-speaker status, how often you choose one thing over another, how often a word occurs in one register versus another

2. Continuous: height, weight, age, scores on a language test, IQ, working memory span

3. Ordinal (Rank Order): ranks with no fixed interval between them, such as first, second, and third place in a race, or the order in which people rank their favorite dialects

1. Descriptive Statistics

How could you depict the data for each of these types?

1. Nominal

2. Continuous

3. Ordinal (rank order)

1. Nominal (Categorical)

[Charts: answers to “Where is this speaker from?” (native listeners). Labels, chart by chart: (1) Birmingham, London, general England, other England, other UK, outside UK; (2) Received Pronunciation, London, general England, other England, outside England; (3) West Yorkshire, Scotland, Ireland, general England, other UK, outside England]

1. Nominal (Categorical)

[Bar chart: correct dialect identification by American English speakers, percent correct (y-axis 0 to 100)]

Australia 61, England 90, India 51, Ireland 41, Kenya 32, New York 75, Scotland 59, South Africa 8, Southern US 92

2. Continuous

[Chart: values on a 0 to 5 scale for two groups, Utah and Non-Utahs]

Native listeners: status vs. solidarity

[Bar chart: status and solidarity ratings (y-axis 0 to 7) for six accents: RP, Birmingham, Network, New York (NYC), West Yorkshire (WY), Alabama]

3. Ordinal (Rank Order)

Coupland & Bishop, 2007

2. Inferential Statistics

a. Chi-square

b. ANOVA/t-test

c. Correlations (rank order correlations)

d. Logistic regression

e. Varbrul

2. Inferential Statistics

For each type of inferential statistic, we need to know:

1. The statistical value (chi-square value, F statistic, t statistic)

2. The probability value (p value)

3. The degrees of freedom (df)

2. Inferential Statistics

Research question: Is the word glistening used more often in one register (as shown in COCA) than another?

SECTION     FREQ   SIZE (MILLION WORDS)   PER MILLION WORDS
Spoken      32     76.6                   0.4
Fiction     833    69.6                   12.0
Magazine    219    78.1                   2.8
Newspaper   156    73.4                   2.1
Academic    43     73.0                   0.6

2. Inferential Statistics

Research question: Is the word glistening used more often in one register (as shown in COCA) than another?

What kind of data is this? Nominal (categorical): counts of how often the word occurs in each register.

For this kind of data, we use a chi-square test.

a. Chi-square

Tells us whether something happened more often than chance would predict

http://www-user.uni-bremen.de/~anatol/qnt/qnt_chi.html

Use it with multiple-choice questions (the percentage of time respondents pick a specific choice) and with corpus or other frequency data.

a. Chi-square

The question the chi-square statistic answers: Is the distribution into categories random or not?

(Uses counts of nominal data)

For example, a multiple-choice question: Jill loves the taste of coffee.

A. c[æ]fi: 186    B. c[ʌ]fi: 113    C. c[a]fi: 70

Are 186, 113, and 70 really different from what random choice would give?

a. Chi-square

To compute chi-square, you need the observed frequencies (the responses you got from your survey or corpus) and the expected frequencies.

To calculate the expected frequencies, add up all the observed frequencies and divide by the number of categories (data points). This assumes that, by chance alone, every category is equally likely.

[Diagram: observed vs. expected frequencies for data point 1 and data point 2]
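Written as formulas, the procedure described here is the standard chi-square goodness-of-fit computation for k categories holding N observations in total, assuming every category is equally likely by chance (O_i is the observed count in category i). The second line works it through with the invented dude counts from the example that follows. The online calculator linked in these slides appears to treat observed and expected as two rows of a contingency table instead, so its output for the same counts differs (2.77 rather than 5.2).

```latex
\begin{aligned}
E &= \frac{N}{k}, &
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E)^2}{E}, &
df &= k - 1 \\
E &= \frac{40}{4} = 10, &
\chi^2 &= \frac{(15-10)^2 + (9-10)^2 + (11-10)^2 + (5-10)^2}{10} = \frac{52}{10} = 5.2, &
df &= 3
\end{aligned}
```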

a. Chi-square

(Invented) frequency of use of dude in four million-word spoken corpora:

Observed (what speakers actually did):
US 15, NZ 9, AU 11, UK 5

A random distribution would spread the 40 tokens evenly (40 / 4 = 10 per corpus):

Expected (what you would expect by random chance):
US 10, NZ 10, AU 10, UK 10

a. Chi-square

http://www.physics.csbsju.edu/stats/contingency_NROW_NCOLUMN_form.html

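chi-square = 2.77, degrees of freedom = 3, probability = 0.429

We want the chi-square value to be large and the probability (p value) to be small.

The larger the chi-square value and the smaller the p value, the less likely it is that the difference between the observed and the expected frequencies occurred by chance. Here p = 0.429 is far above .05, so the difference in the use of dude is not significant.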
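The 2.77 above can be reproduced by entering the observed and expected counts as two rows of a 2 × 4 contingency table, which appears to be what the linked calculator does. A one-sample goodness-of-fit test of the observed counts against the expected 10 per corpus is the more usual setup for this question; it gives chi-square = 5.2 (df = 3, p ≈ 0.16). Both p values are far above .05, so the conclusion is the same. Below is a stdlib-only Python sketch of both. The helper names are made up for this example, and the tail probability uses the closed form for integer degrees of freedom rather than a statistics library.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square >= x) for integer df, via the closed form of the
    regularized upper incomplete gamma function Q(df/2, x/2)."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add the half-integer terms
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2)
    )


def goodness_of_fit(observed):
    """One-sample chi-square: every category equally likely under chance."""
    expected = sum(observed) / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return chi2, df, chi2_upper_tail(chi2, df)


def contingency(table):
    """Chi-square test of independence for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp_count = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - exp_count) ** 2 / exp_count
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_upper_tail(chi2, df)


observed = [15, 9, 11, 5]  # US, NZ, AU, UK (invented dude counts)
expected = [10, 10, 10, 10]

print("two-row table:   chi2 = %.2f, df = %d, p = %.3f" % contingency([observed, expected]))
print("goodness of fit: chi2 = %.2f, df = %d, p = %.3f" % goodness_of_fit(observed))
```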

a. Chi-square

Practice: Is the word glistening used more often in one register (as shown in COCA) than another?

SECTION     FREQ   SIZE (MILLION WORDS)   PER MILLION WORDS
Spoken      32     76.6                   0.4
Fiction     833    69.6                   12.0
Magazine    219    78.1                   2.8
Newspaper   156    73.4                   2.1
Academic    43     73.0                   0.6

To do this, multiply each PER MILLION WORDS figure by 10 so that every value is a whole number (4, 120, 28, 21, 6).

a. Chi-square

Results:

chi-square = 97.2, degrees of freedom = 4, probability = 0.000 (that is, p < .001: the difference between registers is significant)
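A sketch of how the 97.2 above can be reproduced, assuming (as with the dude example) that the calculator was given two rows: the per-million figures multiplied by 10, and an expected row of 36 in every register (179 / 5 = 35.8, rounded to a whole number). That expected row is an inference from the reported result, not something the slide states. With df = 4 (an even number), the upper-tail probability has the simple closed form e^(-x/2)(1 + x/2).

```python
import math

per_million = [0.4, 12.0, 2.8, 2.1, 0.6]  # spoken, fiction, magazine, newspaper, academic
observed = [round(rate * 10) for rate in per_million]  # whole numbers: 4, 120, 28, 21, 6
expected_each = round(sum(observed) / len(observed))  # assumed: 179 / 5 = 35.8, rounded to 36
table = [observed, [expected_each] * len(observed)]

# Chi-square for a 2 x 5 table: sum over cells of (O - E)^2 / E,
# where E = row total * column total / grand total
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(2)
    for j in range(len(observed))
)
df = (2 - 1) * (len(observed) - 1)
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)  # upper tail of chi-square, valid for df = 4 only

print(f"chi-square = {chi2:.1f}, df = {df}, p = {p:.3g}")
```

The p value is far smaller than .001, which is why a calculator rounding to three decimals shows 0.000.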

a. Chi-square

More practice

1. Multiple-choice question: Jill loves the taste of coffee.

A. c[æ]fi: 186    B. c[ʌ]fi: 113    C. c[a]fi: 70

Did respondents choose A more often than the other two choices?

2. Identification: American listeners chose the following answers when asked “Where is this speaker from?” (he was from Birmingham, UK):

London: 45%    England: 25%    Scotland: 25%    Ireland: 5%

Chi-square Homework