«A chi-square test showed that...» – or did it really? Bård Uri Jensen http://privat.hihm.no/buj/ [email protected]

Post on 23-Feb-2016


Page 1: «A chi-square test showed that...» – or did it really?

«A chi-square test showed that...» – or did it really?

Bård Uri Jensen

http://privat.hihm.no/buj/

[email protected]

Page 2

- or did it really?

Allowing [statistical software] to do our thinking is a sure recipe for disaster.

(Good & Hardin, 2012, p. xi)

Page 3

«Simple» statistical tests

• chi-square (χ²) test
• t-test

Page 4

Statistical hypothesis testing

1. Formulate a hypothesis
   E.g. In Norwegian L2, Vietnamese speakers make more TENSE errors than Somali speakers.
2. Formulate a null hypothesis
   Vietnamese and Somali speakers have the same rate of TENSE errors.
3. «Disprove» the null hypothesis = demonstrate its unlikelihood
   E.g. less than 5% probability of seeing a difference at least this large if the null hypothesis were true = «Significance»

• We choose α according to what we consider an acceptable risk of wrongly rejecting a true null hypothesis
  Often 5% in linguistic research
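The three steps can be made concrete with a small permutation test. This is only an illustrative sketch: the per-learner error rates are invented, the function name `permutation_p_value` is hypothetical, and a permutation test is just one possible choice, not a method named in the talk. It estimates how often a difference at least as large as the observed one turns up when the group labels are shuffled, that is, when the null hypothesis holds by construction.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=1):
    """One-sided permutation test: is mean(group_a) larger than mean(group_b)?"""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # shuffling labels simulates the null hypothesis
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (at_least_as_large + 1) / (n_shuffles + 1)

# Hypothetical data: ONE error rate per learner (TENSE errors per 100 verbs),
# so each observation comes from a different individual.
vietnamese = [12.0, 9.5, 14.2, 11.1, 10.4, 13.3, 8.9, 12.7]
somali = [8.1, 9.0, 7.4, 10.2, 6.8, 9.9, 8.5, 7.7]

ALPHA = 0.05  # chosen before looking at the data
p_value = permutation_p_value(vietnamese, somali)
print(f"p = {p_value:.4f}, significant at alpha = {ALPHA}: {p_value < ALPHA}")
```

Note that the p-value is a statement about the data under the null hypothesis, not the probability that the null hypothesis is true.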

Page 5

Conditions of use

• Independent observations: chi-square test, t-test
• Parametric assumptions: t-test
• The dangers of repeated testing: any test

Pages 6-9

A simple example from ornithology

Page 10

A simple example from corpus linguistics

Page 11

A simple example from corpus linguistics

• An important condition of use for the chi-squared test and the t-test: the observations should be independent.
• In practice: the observations should come from different individuals.

«Chi-square is a much-abused test in second language research studies, and often one of its assumptions (that of independence of data) is violated as a matter of course.»

Larson-Hall (2010, p. 206)

Page 12

Example 1: Chi-squared test, non-independent observations

• Blom & Paradis 2013
  Journal of Speech, Language, and Hearing Research
  On past tense production in L2 children with language impairment
• 48 children with English as L2
• Overregularization of past tense
  Hypothesis: Less common in verb stems ending in /d/ or /t/
• χ²(1) = 3.45, p (one-sided) = 0.032
• Problem: n = 85 + 140 = 225 observations, but N = 48 children
• Observations are not independent, so the result is invalid.

              overregularization   zero marking
  d# or t#            16                69
  others              42                98
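The figures for Example 1 can be rechecked from the four cell counts alone. The sketch below assumes a Pearson chi-square without continuity correction and assumes the one-sided p-value is half the two-sided one; neither detail is stated on the slide, and the function names are hypothetical. What it illustrates is that the statistic only comes out this way when all 225 tokens are fed in as if they were separate, independent observations.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_p_df1(chi2):
    """Two-sided p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Counts from the slide: rows = stem type, columns = overregularization, zero marking.
chi2 = chi_square_2x2(16, 69, 42, 98)
p_two_sided = chi_square_p_df1(chi2)
p_one_sided = p_two_sided / 2  # assumption: one-sided value = half the two-sided one

n_tokens = 16 + 69 + 42 + 98
print(f"chi2(1) = {chi2:.2f}, one-sided p = {p_one_sided:.3f}")
print(f"tokens analysed: {n_tokens}, children: 48")
```

Under these assumptions the sketch gives χ²(1) ≈ 3.45 and a one-sided p ≈ 0.032, the values on the slide. The arithmetic is fine; the problem is that the test's independence condition is not met when 48 children contribute 225 tokens.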

Page 13

Example 1: Chi-squared test, non-independent observations

• Solution A: Pick just one observation from each author/speaker
• “To exclude the author as one more relevant factor, the database was cleaned so that there is only one example for each verb from any single author.”
  Sokolova 2012, p. 94
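Solution A might look like the following in practice. This is a simplified sketch with invented tokens and a hypothetical helper `one_per_speaker`; it keeps one randomly chosen observation per speaker, whereas the procedure quoted from Sokolova keeps one example per verb per author.

```python
import random
from collections import defaultdict

def one_per_speaker(observations, seed=0):
    """Keep one randomly chosen observation per speaker.

    `observations` is a list of (speaker_id, value) pairs.
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for speaker, value in observations:
        by_speaker[speaker].append(value)
    return {speaker: rng.choice(values) for speaker, values in by_speaker.items()}

# Hypothetical tokens: (speaker, form produced)
tokens = [
    ("s01", "overregularized"), ("s01", "zero"), ("s01", "zero"),
    ("s02", "zero"),
    ("s03", "overregularized"), ("s03", "overregularized"),
]
sample = one_per_speaker(tokens)
print(sample)  # one entry per speaker, so the sample size equals the number of speakers
```

The price is visible immediately: six tokens shrink to three observations.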

Page 14

Example 1: Chi-squared test, non-independent observations

• Solution A: Pick just one observation from each author/speaker
  Sokolova 2012
• Solution B: Calculate average values for each informant
  Use the average values as independent observations
  Test significance with an appropriate test, e.g. t-test or U-test
  Gujord 2013
• Both these solutions might require a larger corpus!
• «Solution» C: Alter the research question
  Danckaert 2011

Page 15

Example 1: Chi-squared test, non-independent observations

• Solution B:
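Solution B can be sketched in two steps: collapse the tokens to one rate per informant, then compare the two groups of rates with a U-test. Everything below is illustrative: the data are invented, both function names are hypothetical, and the U-test uses the normal approximation without tie correction, which is crude for samples this small.

```python
from collections import defaultdict
from statistics import NormalDist

def rate_per_informant(tokens):
    """Collapse (informant, is_error) tokens into one error rate per informant."""
    counts = defaultdict(lambda: [0, 0])  # informant -> [errors, total]
    for informant, is_error in tokens:
        counts[informant][0] += int(is_error)
        counts[informant][1] += 1
    return {inf: errors / total for inf, (errors, total) in counts.items()}

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    values = sorted(list(xs) + list(ys))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j (ties share a rank)
        i = j
    n_x, n_y = len(xs), len(ys)
    u_x = sum(rank_of[x] for x in xs) - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    sd_u = (n_x * n_y * (n_x + n_y + 1) / 12) ** 0.5
    z = (u_x - mean_u) / sd_u
    return u_x, 2 * (1 - NormalDist().cdf(abs(z)))

# Step 1 (hypothetical tokens): many tokens per informant become one rate each.
rates = rate_per_informant([
    ("v1", True), ("v1", False), ("v1", False), ("v1", True),
    ("s1", False), ("s1", False), ("s1", True), ("s1", False),
])
print(rates)

# Step 2 (hypothetical rates, one per informant): compare the groups.
group_a = [0.50, 0.42, 0.38, 0.55, 0.47, 0.61]
group_b = [0.25, 0.31, 0.40, 0.22, 0.35, 0.28]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, p = {p:.4f}")
```

The sample size that enters the test is now the number of informants, not the number of tokens, which is why this solution might require a larger corpus.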

Page 16

Example 2: T-test, non-independent observations

• Klavan 2012
  PhD thesis from Tartu University
  Investigation of adposition ‘peal’ and adessive case
• 450 observations of each, from 2 corpora
• t = 8.02, p < 0.001
• Conclusion: adessive phrases are longer than ‘peal’-phrases
• Problem: Observations are not independent.
• The conclusion is invalid.
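Why non-independence matters for a t-test can be shown with a small simulation. The sketch below is an assumption-laden illustration, not a reanalysis: the number of authors, the tokens per author and the spread of the values are all invented (the slide gives only 450 observations per construction). Both groups are generated with no true difference, but each author has an individual habit, and every token is then treated as independent.

```python
import random
from statistics import NormalDist, mean, variance

def naive_t(xs, ys):
    """Welch t statistic that (wrongly) treats every token as independent."""
    se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
    return (mean(xs) - mean(ys)) / se

def simulate_group(rng, n_authors, tokens_per_author):
    """Phrase lengths whose mean varies by author but not by group."""
    tokens = []
    for _ in range(n_authors):
        author_mean = rng.gauss(5.0, 1.0)  # each author has their own habit
        tokens += [rng.gauss(author_mean, 1.0) for _ in range(tokens_per_author)]
    return tokens

rng = random.Random(42)
critical = NormalDist().inv_cdf(0.975)  # about 1.96; adequate for samples this large
n_runs = 200
false_positives = 0
for _ in range(n_runs):
    # Hypothetical design: 10 authors x 45 tokens = 450 tokens per group.
    a = simulate_group(rng, n_authors=10, tokens_per_author=45)
    b = simulate_group(rng, n_authors=10, tokens_per_author=45)
    if abs(naive_t(a, b)) > critical:
        false_positives += 1

rate = false_positives / n_runs
print(f"'significant' results although no difference exists: {rate:.0%}")
```

With a nominal α of 5%, a valid test would flag roughly 5% of such runs. Under these invented settings the naive test flags far more, because 450 tokens from a handful of authors carry much less information than 450 independent observations.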

Page 18

Example 3: T-test, non-normal populations

• Hunter (2011, p. 48)
  PhD thesis from Birmingham University
  On grammaticality judgements by L2 students
• Conclusion: the accuracy (max. = 1) for the teacher group (M = .98, SD = .14) was significantly higher than the student group (M = .64, SD = .49), t(1) = 4.9, p < .001.
• Problem: Mean = 0.98, Maximum value = 1, Standard deviation = 0.14
• The distribution cannot possibly be normal.
• The result is invalid.
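The claim that the distribution cannot be normal follows from the three reported numbers alone. As a quick check, the sketch below asks what share of a normal population with M = .98 and SD = .14 would lie above the maximum possible score of 1. The second calculation is a side observation rather than something stated on the slide: it computes the SD that purely right/wrong (0/1) scores with the same mean would have.

```python
from statistics import NormalDist

# Teacher group, values from the slide.
mean_accuracy, sd_accuracy, max_accuracy = 0.98, 0.14, 1.0

implied = NormalDist(mean_accuracy, sd_accuracy)
share_above_max = 1 - implied.cdf(max_accuracy)
print(f"share of a normal population above the maximum: {share_above_max:.2f}")

# For 0/1 scores the population SD would be sqrt(M * (1 - M)).
binary_sd = (mean_accuracy * (1 - mean_accuracy)) ** 0.5
print(f"SD expected for 0/1 scores with this mean: {binary_sd:.2f}")
```

A normal distribution with these parameters would put about 44% of the population above a score that cannot be exceeded. The reported SD also happens to coincide with the value for 0/1 scores; whether the scores really were binary is not stated on the slide.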

Page 19

[Figure: plot with x-axis from 0.0 to 1.5 and y-axis from 0.0 to 2.5]

Page 20

Example 4: Repeated testing

• Leedham 2011
  PhD thesis, The Open University
  Features in the writing of Chinese students in UK universities
• Conclusion: There are differences in frequencies of certain phrases between 3rd year students and younger students
• Problem: Repeated testing without adjusting the probability values
• Some of the results are not valid.
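The danger of repeated testing, and one common way of adjusting the probability values, can be sketched as follows. The number of tests (20) and the raw p-values are invented for illustration, and the Holm step-down procedure is offered as one standard option, not as the adjustment the talk recommends. The first function assumes the tests are independent and all null hypotheses are true.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

fwer = familywise_error_rate(20)  # hypothetical: 20 phrases tested at alpha = 0.05
print(f"risk of at least one false positive in 20 tests: {fwer:.2f}")

raw = [0.04, 0.001, 0.03, 0.01]  # hypothetical unadjusted p-values
adjusted = holm_adjust(raw)
for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw p = {p_raw:.3f} -> adjusted p = {p_adj:.3f}")
```

In this invented example all four raw p-values are below 0.05, but only two remain below 0.05 after adjustment.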

Page 22

Moral

There are no simple tests.

1. You should understand the conditions of the test.
2. You should take the conditions into account.
3. You should document properly
   how you perform the test,
   what numbers you put into it,
   how the conditions are met.

«A chi-square test showed that the difference is significant.»

Page 23

Is it really that important?

• «[C]ompared to other social sciences (e.g., psychology, communication, sociology, anthropology, …) or branches of linguistics (e.g., psycholinguistics, phonetics, sociolinguistics…), most of corpus linguistics has paradoxically only begun to develop this methodological awareness.»

Gries (forthcoming, p.1)

Page 24

Is it really that important?

• «It has become increasingly apparent over a period of several years that psychologists, taken in the aggregate, employ the chi-square test incorrectly.»

Lewis and Burke (1949)

Page 25

Whose responsibility is it?

Page 26

«Corpus linguistics needs to ‘catch up’ [...]»

Gries (forthcoming, p.1)

Page 27

References (http://privat.hihm.no/buj)

Boneau, A. C. (1960). The effects of violations of assumptions underlying the t test. Psychological Bulletin, 57(1), 49-64.
Good, P. I. & Hardin, J. W. (2012). Common errors in statistics (and how to avoid them). Hoboken: John Wiley.
Gries, S. (forthcoming). Quantitative designs and statistical techniques. http://www.linguistics.ucsb.edu/faculty/stgries/research/InProgr_STG_QuantDesAndMethCorpLing_CUPHb.pdf
Larson-Hall, J. (2010). A Guide to Doing Statistics in Second Language Research Using SPSS. New York: Routledge.
Lewis, D., & Burke, C. J. (1949). The use and misuse of the chi-square test. Psychological Bulletin, 46(6), 433-489.

Blom & Paradis (2013). Past Tense Production by English Second Language Learners With and Without Language Impairment. Journal of Speech, Language, and Hearing Research, 56, 281-294.
Danckaert, L. (2011). On the left periphery of Latin embedded clauses. Ph.D. thesis. University of Gent.
Gujord, A. H. (2013). Grammatical encoding of past time in L2 Norwegian: The roles of L1 influence and verb semantics. Ph.D. thesis. University of Bergen.
Hunter, J. D. (2011). A multi-method investigation of the effectiveness and utility of delayed corrective feedback in second-language oral production. Ph.D. thesis. University of Birmingham.
Klavan, J. (2012). Evidence in linguistics: corpus-linguistic and experimental methods for studying grammatical synonymy. Ph.D. thesis. University of Tartu.
Leedham, M. (2011). A corpus-driven study of features of Chinese students’ undergraduate writing in UK universities. Ph.D. thesis. The Open University.
Sokolova, S. (2012). Asymmetries in Linguistic Construal: Russian Prefixes and the Locative Alternation. Ph.D. thesis. University of Tromsø.