multifactorial exploratory approaches: fundamentals

44
HAL Id: halshs-02908471 https://halshs.archives-ouvertes.fr/halshs-02908471 Submitted on 29 Jul 2020 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Multifactorial Exploratory Approaches: fundamentals Guillaume Desagulier To cite this version: Guillaume Desagulier. Multifactorial Exploratory Approaches: fundamentals. École thématique. United Kingdom. 2019. halshs-02908471

Upload: others

Post on 22-May-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multifactorial Exploratory Approaches: fundamentals

HAL Id: halshs-02908471https://halshs.archives-ouvertes.fr/halshs-02908471

Submitted on 29 Jul 2020

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.

Multifactorial Exploratory Approaches: fundamentalsGuillaume Desagulier

To cite this version:Guillaume Desagulier. Multifactorial Exploratory Approaches: fundamentals. École thématique.United Kingdom. 2019. �halshs-02908471�

Page 2: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Multifactorial Exploratory Approachesfundamentals

Guillaume Desagulier1

1MoDyCo (UMR 7114)Paris 8, CNRS, Paris NanterreInstitut Universitaire de [email protected]

Corpus Linguistics Summer School 2019June 24th, 2019

University of Birmingham

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 3: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

outline

1 introduction

2 terminology

3 commonalities

4 differences

5 exploring not predicting

6 further reading

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 4: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

an example from home

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 5: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

rotation

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 6: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

rotation

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 7: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

rotation

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 8: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

reduction

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 9: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

a more quantitative examplePremier League 2015–2016, final table

team ranking points wins draws losses goals for goals against statusLeicester 1 81 23 12 3 68 36 champions leagueArsenal 2 71 20 11 7 65 36 champions leagueTottenham 3 70 19 13 6 69 35 champions leagueMan City 4 66 19 9 10 71 41 champions leagueMan United 5 66 19 9 10 49 35 europa leagueSouthampton 6 63 18 9 11 59 41 europa leagueWest Ham 7 62 16 14 8 65 51 safeLiverpool 8 60 16 12 10 63 50 safeStoke City 9 51 14 9 15 41 55 safeChelsea 10 50 12 14 12 59 53 safeEverton 11 47 11 14 13 59 55 safeSwansea 12 47 12 11 15 42 52 safeWatford 13 45 12 9 17 40 50 safeWest Brom 14 43 10 13 15 34 48 safeCrystal Palace 15 42 11 9 18 39 51 safeBournemouth 16 42 11 9 18 45 67 safeSunderland 17 39 9 12 17 48 62 safeNewcastle 18 37 9 10 19 44 65 relagation zoneNorwich 19 34 9 7 22 39 67 relagation zoneAston Villa 20 17 3 8 27 27 76 relagation zone

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 10: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

a more quantitative examplePremier League 2015–2016, visualisation

-1.0 -0.5 0.0 0.5 1.0

-1.0

-0.5

0.0

0.5

1.0

graphe des variables

Dim 1 (72.75%)

Dim

2 (2

7.25

%)

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

victoires

nuls

défaites

classement

points

buts_pour

buts_contre

-4 -2 0 2

-4-2

02

4

couleur : statut

Dim 1 (72.75%)

Dim

2 (2

7.25

%)

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

LeicesterArsenal

Tottenham

Man_CityMan_United

Southampton

West_Ham

Liverpool

Stoke_City

Chelsea

Everton

Swansea

Watford

West_Brom

Crystal_PalaceBournemouth

Sunderland

Newcastle

Norwich

Aston_Villa

champions_league

europa_league

non_relegable

relegable

champions_leagueeuropa_leaguenon_relegablerelegable

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 11: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

looking for patterns in the data

Once corpus linguists have collected sizeable amounts of observations,they look for patterns in the data. When the data set is too large, itbecomes impossible to summarize the table with the naked eye andsummary statistics are needed. This is where exploratory data analysissteps in.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 12: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

exploring a data set

Exploring a data set means separating meaningful trends from the noise(i.e. “random” distributions)1

1Even though “language is never, ever, ever random” (Kilgarriff 2005)Multifactorial Exploratory Approaches Guillaume Desagulier

Page 13: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

generating hypotheses

In theory, exploratory data analysis is used to generate hypothesesbecause the linguist does not yet have any assumption as to what kindsof trends should appear in the data.In practice, however, linguists collect observations in the light of specificvariables precisely because they expect that the latter influence thedistribution of the former.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 14: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

the empirical cycleask a theory-

informedquestion

theorypreviousworks

formulatehypotheses

operationalizehypotheses

gathercorpus data

testhypotheses

updatehypotheses

informback theory

quantifyyour observations

usestatistics

interpretthe results

are theresults

significant?no

yes

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 15: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

multifactorial, multivariate, and multidimensional

When a linguistic phenomenon is influenced by several factors at thesame time, its analysis is multifactorial. Experience tells us that,arguably, just about anything in language is multifactorial.

Bresnan et al. (2007)

The dative alternation in English is influenced by several factors, such as:• the meaning of the verb• the length/animacy/definiteness/pronominality/accessibility of the

recipient/theme• the realization of the recipient• etc.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 16: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

multifactorial, multivariate, and multidimensional

Once operationalized by the linguist, these multiple factors are capturedby means of several independent variables. When observations of thelinguistic phenomenon are captured by several variables, the analysis ismultivariate.

Bresnan et al. (2007)

Each of the 3263 observations in the dative data set is described by 15variables

> install.packages("languageR")> library(languageR)> str(dative)

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 17: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

terminologymultifactorial, multivariate, and multidimensional

We are now entering the world of data tables from a maths/statsviewpoint! The analysis becomes (multi)dimensional when the complextable is decomposed into meaningful dimensions.The dimensions can be:• explicit• implicit

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 18: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

terminologyexplicit dimensions

Table 1: a word-word cooccurrence matrix

different quite thing writing natural place sort become men . . .

different 535 491 43 3 0 8 21 2 5 . . .quite 491 3048 102 4 17 37 30 19 23 . . .thing 43 102 176 1 1 0 5 2 0 . . .writing 3 4 1 11 0 0 0 0 0 . . .natural 0 17 1 0 24 1 1 0 0 . . .place 8 37 0 0 1 88 2 0 0 . . .sort 21 30 5 0 1 2 75 0 2 . . .become 2 19 2 0 0 0 0 36 1 . . .men 5 23 0 0 0 0 2 1 38 . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 19: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

terminologyimplicit dimensions

Table 2: a word-vector matrix of some adjectives from the BNC (snapshot)

adjectives V1 V2 V3 V4 V5 V6 . . .gloomy -0.36405 -0.44487 -0.33327 -0.16695 -0.52404 0.31066 . . .sacred 0.60337 -0.20526 -0.042822 -0.33008 -0.68957 0.26654 . . .jaundiced -0.32168 -0.58319 -0.34614 -0.12474 0.10368 0.1733 . . .loud 0.24615 -0.24904 -0.18212 -0.14834 -0.06532 -0.3393 . . .memorable 0.30206 0.20307 0.062304 0.66816 0.048326 0.034361 . . .justified -0.080959 -0.23694 -0.43372 -0.31442 -0.31528 0.0057226 . . .scant -0.14467 -0.29329 0.10832 -0.11123 -0.57925 -0.27022 . . .continuous -0.15253 -0.082764 -0.40871 -0.53719 0.0822 -0.31482 . . .imposing 0.32043 0.155 -0.10547 -0.23157 -0.35657 -0.097553 . . .weighty 0.085281 0.015087 0.58454 0.0094917 -0.082617 0.36811 . . .. . . . . . . . . . . . . . . . . . . . . . . .

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 20: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

multifactorial exploratory methods

• multifactorial exploratory methods are used to summarize suchcomplex tables

• the tables are complex objects of which we want to get a syntheticview.

• we do it by finding clusters in the data. This is no trivial taskbecause we need to make sure the clusters are valid.

• the reward is a graphic representation of the data set in terms ofneat clusters.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 21: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

5 exploratory methodsthe first four are based on eigenvalue decomposition• correspondence analysis (CA)• multiple correspondence analysis (MCA)• principal component analysis (PCA)• exploratory factor analysis (EFA)

the fifth one is more recent• t-SNE

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 22: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

you do not compile a data set randomly

what does exploratory mean? (again)

the linguist makes no assumption as to what kinds of groupings are to befound in the data. In practice, however, you compile a table of databecause you expect to find meaningful groupings. Therefore, if you findno meaningful grouping, this is because your rows and your columns areindependent. Chances are that you might want to rethink the design ofyour study, especially your choice of explanatory variables.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 23: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

goal

• we seek to explore a cloud of points from a data set in the form of arows × columns table with as many dimensions as there are columns.

• like a complex object in real life, a data table has to be rotated so asto be observed from an optimal angle

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 24: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

dimensions

• although the dimensions of a data table are eventually projected in atwo-dimensional plane, they are not spatial dimensions

• if the table has K columns, the data points are initially positioned ina space R of K dimensions

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 25: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

dimensionality reduction

• to allow for easier interpretation, dimensionality-reduction methodsdecompose the cloud into a smaller number of meaningful planes.

• the methods covered in this course summarize the table bymeasuring how much variance there is and decomposing the varianceinto proportions.

• these proportions are eigenvalues in CA, MCA, and PCA. They areloadings in EFA (and a special kind of PCA not covered in thiscourse).2

2See baayen2008analyzing.Multifactorial Exploratory Approaches Guillaume Desagulier

Page 26: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

graphic summary

• all four methods offer graphs that facilitate the interpretation of theresults

• although convenient, these graphs do not replace a carefulinterpretation of the numerical results.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 27: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

differences

The main difference between these methods pertain mainly to the kind ofdata that one works with

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 28: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

CA

• CA takes as input a contingency table, i.e. a table thatcross-classifies observations on a number of categorical variables

• entries in each cell are integers, namely the number of times thatobservations (in the rows) are seen in the context of the variables (inthe columns).

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 29: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

CA

Table 3 is an example of a contingency table. It displays the frequencycounts of four types of nouns (rows) across three corpus files from theBNC-XML (columns).

Table 3: An example of a contingency table

A1J.xml A1K.xml A1L.xml row totalsNN0 136 14 8 158NN1 2236 354 263 2853NN2 952 87 139 1178NP0 723 117 71 911column totals 4047 572 481 5100

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 30: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

MCA

• MCA takes as input a case-by-variable table such as Table 4.• the table consists of i individuals or observations (rows) and j

variables (columns).

Table 4: A sample input table for MCA (Desagulier 2017, p. 36)

corpus file mode genre exact match intensifier syntax adjectiveKBF.xml spoken conv a quite ferocious mess quite preadjectival ferociousAT1.xml written biography quite a flirty person quite predeterminer flirtyA7F.xml written misc a rather anonymous name rather preadjectival anonymousECD.xml written commerce a rather precarious foothold rather preadjectival precariousB2E.xml written biography quite a restless night quite predeterminer restlessAM4.xml written misc a rather different turn rather preadjectival differentF85.xml spoken unclassified a rather younger age rather preadjectival youngerJ3X.xml spoken unclassified quite a long time quite predeterminer longKBK.xml spoken conv quite a leading light quite predeterminer leading

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 31: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

MCA

• historically, MCA was developed to explore the structure of surveysin which informants are asked to select an answer from a list ofsuggestions.

• for example, the question “According to you, which of thesedisciplines best describe the hard sciences: physics, biology,mathematics, computer science, or statistics?” requires informantsto select one category.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 32: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

PCA

• PCA takes as input a table of data of i individuals or observations(rows) and j variables (columns)

• the method handles continuous and nominal data• the continuous data may consist of means, reaction times, formant

frequencies, etc.• the categorical/nominal data are used to tag the observations

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 33: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

PCA

Table 6 is a table of 6 kinds of mean frequency counts further describedby 3 kinds of nominal information.

Table 5: A sample data frame (Lacheret-Dujour et al. 2019)

corpus sample fPauses fOverlaps fFiller fProm fPI fPA subgenre interactivity planning typeD0001 0.26 0.12 0.14 1.79 0.28 1.54 argumentation interactive semi-spontaneousD0002 0.42 0.11 0.10 1.80 0.33 1.75 argumentation interactive semi-spontaneousD0003 0.35 0.10 0.03 1.93 0.34 1.76 description semi-interactive spontaneousD0004 0.28 0.11 0.12 2.29 0.30 1.79 description interactive semi-spontaneousD0005 0.29 0.07 0.23 1.91 0.22 1.69 description semi-interactive spontaneousD0006 0.47 0.05 0.26 1.86 0.44 1.94 argumentation interactive semi-spontaneous. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 34: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

EFA

Like PCA, EFA takes as input a table of continuous data. However, itdoes not commonly accommodate nominal data. Typically, Table 6minus the 3 columns of nominal data can serve as input for EFA.

Table 6: A sample data frame (Lacheret-Dujour et al. 2019)

corpus sample fPauses fOverlaps fFiller fProm fPI fPAD0001 0.26 0.12 0.14 1.79 0.28 1.54D0002 0.42 0.11 0.10 1.80 0.33 1.75D0003 0.35 0.10 0.03 1.93 0.34 1.76D0004 0.28 0.11 0.12 2.29 0.30 1.79D0005 0.29 0.07 0.23 1.91 0.22 1.69D0006 0.47 0.05 0.26 1.86 0.44 1.94. . . . . . . . . . . . . . . . . . . . .

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 35: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

exploring is not predicting

The methods presented in this course are exploratory, as opposed toexplanatory or predictive. They help find structure in multivariate datathanks to observation groupings. The conclusions made with thesemethods are therefore valid for the corpus only.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 36: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

exploring is not predicting

For example, we shall see that middle-class female speakers aged 25 to59 display a preference for the use of bloody in the British NationalCorpus. This finding should not be extended to British English in general.Indeed, we may well observe different tendencies in another corpus ofBritish English.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 37: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

exploring is not predicting

Neither should the conclusions made with exploratory methods be usedto make predictions. Of course, exploratory methods serve as the basisfor the design of predictive modeling, which uses the values found in asample to predict values for another sample.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 38: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

exploring is not predicting

Glynn (2014)

Expanding on Gries (2006), Glynn (2014) finds that usage features anddictionary senses are correlated with dialect and register thanks to twoexploratory multivariate techniques (correspondence analysis and multiplecorrespondence analysis). To confirm these findings, Glynn (ibid.) turnsto logistic regression. This confirmatory multivariate technique allowshim to specify which of the usage features and dictionary senses aresignificantly associated with either dialect or register, and determine theimportance of the associations.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 39: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

so why explore after all?

Nowadays, many linguists jump to powerful predictive methods (such aslogistic regression or discriminant analysis) without going through thetrouble of exploring their data sets first.This is a shame because the point of running a multifactorial exploratoryanalysis is to generate fine research hypotheses, which the far morepowerful predictive methods can only benefit from.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 40: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Around the word

https://corpling.hypotheses.org/

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 41: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Corpus Linguistics and Statistics with R springer.com

1st ed. 2017, XIII, 353 p. 98 illus., 55

illus. in color.

Printed book

Hardcover

79,99 € | £69.99 | $99.9985,59 € (D) | 87,99 € (A) | CHF [1]

88,00

eBook

67,82 € | £55.99 | $79.9967,82 € (D) | 67,82 € (A) | CHF [2]

70,00

Available from your library or

springer.com/shop

MyCopy [3]

Printed eBook for just

€ | $ 24.99springer.com/mycopy

Guillaume Desagulier

Corpus Linguistics and

Statistics with R

Introduction to Quantitative Methods in Linguistics

Series: Quantitative Methods in the Humanities and Social Sciences

Accessible introduction to quantitative methods for linguistics with emphasis

on learning the methods and then applying them with R

Includes downloadable supplementary materials for readers

Suitable for advanced undergraduate courses, graduate courses, and self-

study

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides

both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-

on, step-by-step instructions to implement the techniques in the field. The statistical

methodology and R-based coding from this book teach readers the basic and then more

advanced skills to work with large data sets in their linguistics research and studies. Massive

data sets are now more than ever the basis for work that ranges from usage-based linguistics

to the far reaches of applied linguistics. This book presents much of the methodology in a

corpus-based approach. However, the corpus-based methods in this book are also essential

components of recent developments in sociolinguistics, historical linguistics, computational

linguistics, and psycholinguistics. Material from the book will also be appealing to researchers

in digital humanities and the many non-linguistic fields that use textual data analysis and text-

based sensorimetrics. Chapters cover topics including corpus processing, frequencing data, and

clustering methods. Case studies illustrate each chapter with accompanying data sets, R code,

and exercises for use by readers. This book may be used in advanced undergraduate courses,

graduate courses, and self-study.

Lifelong 40% discount for authors

Part of

Order online at springer.com / or for the Americas call (toll free) 1-800-SPRINGER /

or email us at: [email protected]. / For outside the Americas call +49 (0) 6221-345-4301 /

or email us at: [email protected].

The first € price and the £ and $ price are net prices, subject to local VAT. Prices indicated with [1] include VAT for

books; the €(D) includes 7% for Germany, the €(A) includes 10% for Austria. Prices indicated with [2] include VAT for

electronic products; 19% for Germany, 20% for Austria. All prices exclusive of carriage charges. Prices and other

details are subject to change without notice. All errors and omissions excepted. [3] No discount for MyCopy.

chapter 10 – (Desagulier 2017)

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 42: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Practical Handbook of Corpus Linguistics

Guillaume Desagulier (to appear). “Multifactorial exploratoryapproaches.” In: Practical Handbook of Corpus Linguistics. Ed. byMagali Paquot and Stefan Thomas Gries. New York: Springer

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 43: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Bibliography I

Bresnan, Joan et al. (2007). “Predicting the dative alternation.” In:Cognitive Foundations of Interpretation, pp. 69–94.

Desagulier, Guillaume (to appear). “Multifactorial exploratoryapproaches.” In: Practical Handbook of Corpus Linguistics. Ed. byMagali Paquot and Stefan Thomas Gries. New York: Springer.

– (2017). “Clustering Methods.” In: Corpus Linguistics and Statisticswith R. New York, NY: Springer, pp. 239–294.

Glynn, Dylan (2014). “The many uses of run.” In: Corpus Methods forSemantics: Quantitative Studies in Polysemy and Synonymy. Ed. byDylan Glynn and Justyna A. Robinson. Vol. 43. Human CognitiveProcessing. John Benjamins, pp. 117–144.

Gries, Stefan Th (2006). “Corpus-based methods and cognitive semantics:The many senses of to run.” In: Corpora in Cognitive Linguistics:Corpus-based Approaches to Syntax and Lexis. Ed. by Stefan Th Griesand Anatol Stefanowitsch. Mouton de Gruyter, pp. 57–99.

Multifactorial Exploratory Approaches Guillaume Desagulier

Page 44: Multifactorial Exploratory Approaches: fundamentals

outline introduction terminology commonalities differences exploring not predicting further reading References

Bibliography II

Kilgarriff, Adam (2005). “Language is never, ever, ever, random.” In:Corpus Linguistics and Linguistic Theory 1.2, pp. 263–276. URL:http://dx.doi.org/10.1515/cllt.2005.1.2.263.

Lacheret-Dujour, Anne et al. (2019). “The distribution of prosodicfeatures in the Rhapsodie corpus.” In: Rhapsodie: A prosodic andsyntactic treebank for spoken French. Ed. by Anne Lacheret-Dujourand Sylvain Kahane. Studies in Corpus Linguistics 89. John Benjamins.Chap. 17, pp. 315–338.

Multifactorial Exploratory Approaches Guillaume Desagulier