Tracking L2 Lexical and Syntactic Development Xiaofei Lu CALPER 2010 Summer Workshop July 14, 2010


Tracking L2 Lexical and Syntactic Development

Xiaofei Lu
CALPER 2010 Summer Workshop

July 14, 2010

Outline
  Lexical & syntactic complexity: The what and why
  Syntactic complexity in EFL writing
  Lexical complexity in EFL speaking

Lexical and Syntactic Complexity: The What and Why

What is lexical and syntactic complexity?
  Lexical complexity
    A multidimensional feature of language use encompassing lexical density, sophistication and variation (Wolfe-Quintero et al. 1998; Read 2000)
    Does not focus on errors, a dimension in Read’s (2000) conceptualization of lexical richness
  Syntactic complexity
    The range of forms that surface in language production and the degree of sophistication of such forms (Ortega 2003)

Why measure linguistic complexity?
  First language acquisition & psycholinguistics
    Studies of L1 developmental sequence
    Objective measures of L1 developmental level
    Ordering experimental stimuli by complexity
    Relationship of complexity in childhood to symptoms of Alzheimer’s disease (Kemper et al. 2001)

Why measure linguistic complexity? (cont.)
  Second language acquisition
    Objective L2 developmental indices
    Assessing cross-proficiency differences
    Assessing effect of pedagogical intervention
    Tracking L2 learners’ linguistic development over time
    Relationship between lexical/syntactic complexity and proficiency claimed in many test rating scales

Syntactic Complexity in EFL Writing

Lu, X. (forthcoming 2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15.
Lu, X. (forthcoming 2010). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 44(4).

Outline
  Measures of L2 syntactic complexity
  L2 syntactic complexity analyzer
  Syntactic complexity & EFL writing development
  Summary

Measures of L2 syntactic complexity
  Measures reviewed in two research syntheses
    Wolfe-Quintero et al. (1998)
    Ortega (2003)
  Selection criterion
    At least one previous study showed at least weak correlation with or effect for proficiency
  Issues among previous studies
    Variation in measure selection and definition
    Variation in experiment design
    Inconsistent results reported on the same measures

Measures of L2 syntactic complexity (cont.)
  Length of production
    1. Mean length of clause (MLC)
    2. Mean length of sentence (MLS)
    3. Mean length of T-unit (MLT)
  Sentence complexity
    4. Mean number of clauses per sentence (C/S)

Measures of L2 syntactic complexity (cont.)
  Subordination
    5. Mean number of clauses per T-unit (C/T)
    6. Mean number of complex T-units per T-unit (CT/T)
    7. Mean number of dependent clauses per clause (DC/C)
    8. Mean number of dependent clauses per T-unit (DC/T)

Measures of L2 syntactic complexity (cont.)
  Coordination
    9. Mean number of coordinate phrases per clause (CP/C)
    10. Mean number of coordinate phrases per T-unit (CP/T)
    11. Mean number of T-units per sentence (T/S)
  Particular grammatical structures
    12. Mean number of complex nominals per clause (CN/C)
    13. Mean number of complex nominals per T-unit (CN/T)
    14. Mean number of verb phrases per T-unit (VP/T)

L2 syntactic complexity analyzer
  Input: plain English text
  Step 1: Parsing using Stanford parser
  Step 2: Retrieving & counting occurrences of
    Words, sentences, clauses, dependent clauses
    T-units, complex T-units
    Coordinate phrases, complex nominals, verb phrases
  Step 3: Computing ratios for the 14 measures
  Output: 14 syntactic complexity indices
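Step 3 of the analyzer can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the `UnitCounts` container, the `complexity_indices` function and the sample counts are hypothetical names and values chosen for the example. Only the 14 ratio definitions come from the slides.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Raw unit counts for one text (hypothetical container, as produced by Step 2)."""
    words: int
    sentences: int
    clauses: int
    dependent_clauses: int
    t_units: int
    complex_t_units: int
    coordinate_phrases: int
    complex_nominals: int
    verb_phrases: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def complexity_indices(c: UnitCounts) -> dict:
    """Compute the 14 syntactic complexity ratios listed in the slides."""
    return {
        "MLC": ratio(c.words, c.clauses),
        "MLS": ratio(c.words, c.sentences),
        "MLT": ratio(c.words, c.t_units),
        "C/S": ratio(c.clauses, c.sentences),
        "C/T": ratio(c.clauses, c.t_units),
        "CT/T": ratio(c.complex_t_units, c.t_units),
        "DC/C": ratio(c.dependent_clauses, c.clauses),
        "DC/T": ratio(c.dependent_clauses, c.t_units),
        "CP/C": ratio(c.coordinate_phrases, c.clauses),
        "CP/T": ratio(c.coordinate_phrases, c.t_units),
        "T/S": ratio(c.t_units, c.sentences),
        "CN/C": ratio(c.complex_nominals, c.clauses),
        "CN/T": ratio(c.complex_nominals, c.t_units),
        "VP/T": ratio(c.verb_phrases, c.t_units),
    }


# Made-up counts for a single essay, for illustration only.
sample = UnitCounts(words=300, sentences=15, clauses=30, dependent_clauses=10,
                    t_units=20, complex_t_units=8, coordinate_phrases=6,
                    complex_nominals=45, verb_phrases=40)
for name, value in complexity_indices(sample).items():
    print(f"{name:5s} {value:.3f}")
```

Returning 0.0 for an empty denominator is a design choice made for this sketch; a real tool might prefer to report the value as missing.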

How counting is done
  Word: all non-punctuation tokens
  Other units: Tregex (Levy & Andrew, 2006)
    Define the units linguistically
    Formulate Tregex patterns matching the unit definitions
    Query the parse trees with the Tregex patterns
    Retrieve/count (sub)trees matching each pattern
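The word-counting rule ("all non-punctuation tokens") could be approximated as below. This is a sketch under stated assumptions: the text is already tokenized, a token counts as a word if it contains at least one letter or digit, and Penn Treebank bracket escapes such as -LRB- are treated as punctuation. The function name `count_words` is hypothetical.

```python
# Penn Treebank escapes for brackets; they contain letters but are punctuation.
BRACKET_ESCAPES = {"-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"}


def count_words(tokens):
    """Count tokens that are not punctuation.

    Assumption for this sketch: a token is a word if it is not a bracket
    escape and contains at least one alphanumeric character.
    """
    return sum(
        1
        for tok in tokens
        if tok not in BRACKET_ESCAPES and any(ch.isalnum() for ch in tok)
    )


tokens = ["I", "benefit", "a", "lot", "from", "the", "Internet", ",", "too", "."]
print(count_words(tokens))  # the comma and the period are skipped
```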

Definition and pattern examples
  Clause: subject + finite verb (Polio 1997)
    'S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD)'
  Dependent clause: adverbial, adjectival or nominal clause
    'SBAR < (S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD))'
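To make the two patterns concrete, the sketch below re-implements their intent in plain Python on a bracketed parse tree. It is not Tregex and does not use the Stanford parser: the tree string is hand-written, and the Tregex head relation `<#` is only approximated by taking the first verb or modal preterminal directly under the VP. All function names are hypothetical.

```python
import re

CLAUSE_LABELS = {"S", "SINV", "SQ"}
FINITE_TAGS = {"MD", "VBP", "VBZ", "VBD"}


def parse_tree(text):
    """Parse a Penn-style bracketed string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def subtrees(tree):
    """Yield every non-leaf node of the tree, top-down."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def vp_head_tag(vp):
    """Approximate the VP head: first MD/VB* preterminal child (an assumption)."""
    for child in vp[1]:
        if isinstance(child, tuple) and (child[0] == "MD" or child[0].startswith("VB")):
            return child[0]
    return None


def is_finite_clause(node):
    """Approximates 'S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD)'."""
    label, children = node
    return label in CLAUSE_LABELS and any(
        isinstance(c, tuple) and c[0] == "VP" and vp_head_tag(c) in FINITE_TAGS
        for c in children
    )


def count_clauses(tree):
    return sum(1 for n in subtrees(tree) if is_finite_clause(n))


def count_dependent_clauses(tree):
    """Approximates 'SBAR < (S|SINV|SQ < (VP <# ...))'."""
    return sum(
        1
        for n in subtrees(tree)
        if n[0] == "SBAR"
        and any(isinstance(c, tuple) and is_finite_clause(c) for c in n[1])
    )


# Hand-written parse of "I think that he likes to read."
EXAMPLE = (
    "(ROOT (S (NP (PRP I)) "
    "(VP (VBP think) "
    "(SBAR (IN that) "
    "(S (NP (PRP he)) "
    "(VP (VBZ likes) "
    "(S (VP (TO to) (VP (VB read))))"
    "))))"
    " (. .)))"
)
tree = parse_tree(EXAMPLE)
print(count_clauses(tree), count_dependent_clauses(tree))
```

In this example the infinitival "to read" is not counted as a clause because its VP has no finite verb or modal directly under it, which is the behaviour the finite-verb definition asks for.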

Evaluation
  Experiment setup
    40 essays from the Written English Corpus of Chinese Learners (Wen et al. 2005), average 315 words
    Written by English majors in four-year colleges in China
    20 used for training, 20 for testing
    Two annotators counted unit occurrences in the essays
  Inter-annotator agreement
    Evaluated on 10 essays
    F-score for unit identification: .907 (CN) - 1.000 (S)
    Correlations of complexity ratios: .912 (CT/T) - 1.000 (MLS)

Unit identification results on test data

            Counts                       System-annotator agreement
Structure   System   Manual   Identical  Precision   Recall   F-score
S           357      357      357        1.000       1.000    1.000
C           545      558      530        .972        .950     .961
DC          170      178      161        .947        .904     .925
T           376      380      369        .981        .971     .976
CT          129      136      126        .977        .926     .951
CP          138      135      125        .906        .926     .916
CN          660      572      511        .774        .893     .830
VP          750      758      698        .931        .921     .926
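The agreement figures in this table are consistent with the standard definitions of precision, recall and F-score applied to the three count columns. The slides do not spell the formulas out, so the sketch below assumes precision = identical / system, recall = identical / manual, and F = their harmonic mean. The function name `agreement` is hypothetical.

```python
def agreement(system: int, manual: int, identical: int):
    """Return (precision, recall, F-score) for one structure type.

    Assumed definitions: precision = identical / system,
    recall = identical / manual, F = 2PR / (P + R).
    """
    precision = identical / system
    recall = identical / manual
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Counts copied from the table rows for clauses (C) and complex nominals (CN).
rows = {"C": (545, 558, 530), "CN": (660, 572, 511)}
for name, counts in rows.items():
    p, r, f = agreement(*counts)
    print(f"{name:3s} P={p:.3f} R={r:.3f} F={f:.3f}")
```

For CN the system found many more units than the annotators did (660 vs. 572), which is what pulls its precision well below its recall.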

Correlations of complexity ratios

Measure   Development   Test      Measure   Development   Test
MLC       .941          .932      DC/T      .950          .941
MLS       1.000         1.000     CP/C      .845          .834
MLT       .989          .987      CP/T      .876          .871
C/S       .939          .928      T/S       .931          .919
C/T       .978          .961      CN/C      .883          .867
CT/T      .903          .892      CN/T      .904          .896

Error analysis
  Attachment and conjunction scope errors
    e.g., benefit a lot from [the Internet in academic study]
    More reliable in identifying higher-level units: S, C, T, CT
  Learner errors not a major cause for problems
    Advanced EFL learners
    Idiomaticity vs. grammatical completeness
    Some errors do not lead to structural misanalysis

Syntactic complexity & EFL writing development
  Research questions
  The WECCL corpus
  Results
  Summary

Research questions
  1) Effect of sampling condition
  2) Measures discriminating proficiency levels
  3) Magnitudes for differences to be significant
  4) Relationships between measures
  5) Patterns of development for the measures

The WECCL corpus

Essay length: range=[89, 892], mean=315, sd=87

School   Argumentation      Narration          Exposition         All
level    Timed   Untimed    Timed   Untimed    Timed   Untimed
1        695     395        89      0          30      0          1209
2        441     398        246     0          28      0          1113
3        504     459        91      0          30      0          1084
4        60      0          88      0          0       0          148
All      1700    1252       514     0          88      0          3554

Effect of sampling condition
  Institution: sig. inter-institution dif. for
    All metrics using all data
    12 metrics using Y1-3 timed arg essays
  Genre: sig. dif. between arg vs. nar for
    All metrics using arg & nar essays
    All metrics using timed arg & nar essays
    13 metrics using timed arg & nar essays from ND

Effect of sampling condition (cont.)
  Timing: sig. dif. between timed and untimed arg for
    13 metrics using all arg essays
    11 metrics using arg essays from ND
  Data for other research questions
    422 timed argumentative essays from ND

Measures discriminating levels
  3 showed sig. dif. between first 3 levels
    MLC, CN/C, and CN/T
  4 showed sig. dif. between first 2 levels
    MLS, MLT, CP/C, and CP/T
  5 showed sig. dif. between non-adjacent levels
    C/S, C/T, CT/T, DC/C, and DC/T
  2 showed no sig. between-level dif.
    T/S and VP/T

Significant magnitudes

Measure   Magnitude   Levels     Measure   Magnitude   Levels
MLC       .573        2-3        DC/C      -.033       1-4
MLS       1.658       1-2        DC/T      -.071       1-4
MLT       1.651       1-2        CP/C      .040        1-2
C/S       -.112       2-4        CP/T      .061        1-2
C/T       -.078       2-4        CN/C      .133        2-3
CT/T      -.043       2-4        CN/T      .178        2-3

Relationships between measures
  Strong relationship between measures of the same type or involving the same structure
  MLS and MLT show weak-moderate correlations with subordination measures
  MLC shows low-weak negative correlations with subordination measures
  Length measures show moderate-high correlations with CN measures and weak-moderate correlations with CP measures
  CN and CP measures weakly correlated with each other

Developmental patterns
  Measures with sig. positive changes
    Linear increase Y1-4: MLC, CN/C
    Increase Y1-3 (Y4=Y3): CP/C
    Increase Y1-3 (Y4<Y3, insig.): MLS, MLT, CP/T, CN/T
  Measures with sig. negative changes
    Linear decrease Y1-4: C/S
    Nonlinear Y1<Y2>Y3>Y4: DC/C, DC/T

Summary of findings
  Important to control for the effects of relevant learner-, task- and context-related factors
  Seven measures recommended for future use
    CN/C, MLC: discriminate 2+ adjacent levels, linear increases
    CN/T, MLS, MLT: 2 adjacent levels; positive sig. changes
    CP/C, CP/T: nonadjacent levels, positive sig. changes
  Developmental prediction: complexification at the phrasal level vs. the clausal level

Summary of findings (cont.)
  Smaller magnitudes than reported previously
  Clause as a potentially more informative unit of analysis than T-unit

Limitations and future research
  Incorporating more measures and flexible definitions of structures into the analyzer
  Other conceptualizations of proficiency level
  Effect of L1 on syntactic development
  Relationship between developmental measures of fluency, accuracy and complexity at different linguistic levels

Lexical Complexity in EFL Speaking

Lu, X. (under review). The relationship of lexical richness to the quality of ESL speakers’ oral narratives.

Outline
  Research goals and motivation
  Measures of lexical complexity
  Methodology
  Results
  Conclusion

Research goals and motivation
  Research goals
    Automate lexical complexity analysis using 25 measures
    Evaluate the relationship of these measures plus the D measure to the quality of EFL speakers’ oral narratives
  Motivation
    Lexical complexity an important construct in L2 teaching and research
    Relationship between lexical complexity and proficiency claimed in many test rating scales

Measures of lexical complexity
  Lexical complexity measures proposed in language acquisition studies and reviewed in
    Wolfe-Quintero et al. (1998)
    Read (2000)
    Malvern et al. (2004)
  Measures of the following three dimensions
    Lexical density
    Lexical sophistication
    Lexical variation

Lexical density
  Proportion of lexical words (Nlw / N) (Ure 1971)
  Previous findings
    Lower in spoken than written texts (Halliday 1985)
    Affected by various sources (O’Loughlin 1995)
    Relation to L2 writing non-significant (Engber 1995)
  Inconsistent definition of lexical words
    All nouns and adjectives
    Adverbs with adjective base
    Full verbs (excluding modal/auxiliary verbs)
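The density ratio Nlw / N can be sketched as below for input that is already lemmatized and tagged with Penn Treebank tags. Several choices here are assumptions made for illustration, not the study's procedure: adverbs "with adjective base" are approximated by an "-ly" ending, and auxiliaries are excluded by lemma (be, have, do), which will also drop full-verb uses of those lemmas. The names `is_lexical` and `lexical_density` are hypothetical.

```python
AUX_LEMMAS = {"be", "have", "do"}  # crude stand-in for auxiliary detection


def is_lexical(lemma: str, tag: str) -> bool:
    """Decide whether a (lemma, Penn tag) pair is a lexical word in this sketch."""
    if tag.startswith("NN") or tag.startswith("JJ"):
        return True                       # all nouns and adjectives
    if tag.startswith("RB"):
        return lemma.endswith("ly")       # proxy for adjective-based adverbs
    if tag.startswith("VB"):
        return lemma not in AUX_LEMMAS    # modals carry the MD tag, so they never reach here
    return False


def lexical_density(tagged):
    """Return Nlw / N, ignoring punctuation tokens."""
    words = [(lemma, tag) for lemma, tag in tagged
             if any(ch.isalnum() for ch in lemma)]
    if not words:
        return 0.0
    return sum(is_lexical(lemma, tag) for lemma, tag in words) / len(words)


# "My teacher was really unusual." after lemmatization and tagging
sentence = [("my", "PRP$"), ("teacher", "NN"), ("be", "VBD"),
            ("really", "RB"), ("unusual", "JJ"), (".", ".")]
print(lexical_density(sentence))
```

Swapping in a different definition of "lexical word" only requires changing `is_lexical`, which is one way to examine how much the inconsistent definitions noted above affect the ratio.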

Lexical sophistication
  Five measures examined
    LS1: Nslw / Nlw (Linnarud 1986; Hyltenstam 1988)
    LS2: Ts / T (Laufer 1984)
    VS1: Tsv / Nv (Harley & King 1989)
    CVS1: Tsv / sqrt(2Nv) (Wolfe-Quintero et al. 1998)
    VS2: Tsv^2 / Nv (Chaudron & Parker 1990)
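The five ratios can be computed as in the following sketch. It assumes the usual reading of the symbols (N = tokens, T = types, s = sophisticated, lw = lexical words, v = verbs) and treats a word as sophisticated when it is absent from a given list of common words. The tiny `COMMON` list and the function name are hypothetical and stand in for a real frequency list.

```python
import math


def sophistication_measures(words, lexical_words, verbs, common):
    """Compute LS1, LS2, VS1, CVS1 and VS2 from lemma lists.

    words: all word tokens; lexical_words: lexical word tokens;
    verbs: verb tokens; common: set of lemmas NOT counted as sophisticated.
    """
    def safe(n, d):
        return n / d if d else 0.0

    n_lw = len(lexical_words)
    n_slw = sum(1 for w in lexical_words if w not in common)
    t = len(set(words))
    t_s = len({w for w in words if w not in common})
    n_v = len(verbs)
    t_sv = len({v for v in verbs if v not in common})
    return {
        "LS1": safe(n_slw, n_lw),
        "LS2": safe(t_s, t),
        "VS1": safe(t_sv, n_v),
        "CVS1": safe(t_sv, math.sqrt(2 * n_v)),
        "VS2": safe(t_sv ** 2, n_v),
    }


# Hypothetical common-word list and a lemmatized ten-word sample.
COMMON = {"the", "and", "teacher", "essay", "praise", "student"}
words = ["the", "teacher", "scrutinize", "the", "essay",
         "and", "praise", "the", "meticulous", "student"]
lexical = ["teacher", "scrutinize", "essay", "praise", "meticulous", "student"]
verbs = ["scrutinize", "praise"]
print(sophistication_measures(words, lexical, verbs, COMMON))
```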

Lexical sophistication (cont.)
  Previous findings
    LS1: NS-NNS dif. sig. (Linnarud 1986); non-sig. (Hyltenstam 1988)
    LS2: sig. pre- and post-essay dif. (Laufer 1984)
    VS1: sig. NS-NNS dif. (Harley & King 1989)
  Varying definitions of sophistication
    2000-word BNC frequency list (Leech et al. 2001)

Lexical variation
  20 measures examined
  4 based on NDW
    NDW: Number of different words
    NDW-50: NDW in first 50 words of sample
    NDW-ER50: mean NDW of 10 random 50-word subsamples
    NDW-ES50: mean NDW of 10 random 50-word sequences
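The four NDW variants might be implemented along these lines. The sketch assumes that a "random subsample" draws 50 tokens without replacement from anywhere in the text, and that a "random sequence" is a contiguous 50-token window with a random start. Function names and the seeded generator are choices made for the example.

```python
import random


def ndw(tokens):
    """Number of different words (types)."""
    return len(set(tokens))


def ndw_first(tokens, size=50):
    """NDW in the first `size` tokens."""
    return ndw(tokens[:size])


def ndw_er(tokens, size=50, trials=10, rng=None):
    """Mean NDW over random subsamples drawn without replacement (assumed reading)."""
    if len(tokens) < size:
        raise ValueError("sample is shorter than the subsample size")
    if rng is None:
        rng = random.Random(0)
    return sum(ndw(rng.sample(tokens, size)) for _ in range(trials)) / trials


def ndw_es(tokens, size=50, trials=10, rng=None):
    """Mean NDW over random contiguous windows (assumed reading)."""
    if len(tokens) < size:
        raise ValueError("sample is shorter than the window size")
    if rng is None:
        rng = random.Random(0)
    total = 0
    for _ in range(trials):
        start = rng.randint(0, len(tokens) - size)  # inclusive bounds
        total += ndw(tokens[start:start + size])
    return total / trials


# Artificial sample: 120 tokens cycling through 30 types.
tokens = [f"w{i % 30}" for i in range(120)]
print(ndw(tokens), ndw_first(tokens), ndw_er(tokens), ndw_es(tokens))
```

Fixing the subsample size is what makes the last three variants comparable across samples of different lengths, unlike raw NDW.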

Lexical variation (cont.)
  7 based on TTR for total vocabulary
    Type-token ratio (TTR)
    Mean TTR of all 50-word segments (MSTTR)
    LogTTR, Corrected TTR, Root TTR, Uber
    The D measure (McKee et al. 2000)
  9 based on TTR for word classes
    T{LW, V, N, Adj, Adv, Mod} / Nlw
    Tv / Nv, Tv^2 / Nv, Tv / sqrt(2Nv)
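A sketch of the TTR-based indices for total vocabulary follows. The slides name the measures without giving formulas, so the commonly cited definitions are assumed here: CTTR = T / sqrt(2N), RTTR = T / sqrt(N), LogTTR = log T / log N, and Uber = (log N)^2 / log(N / T), with base-10 logarithms assumed for Uber. MSTTR is taken over consecutive non-overlapping segments, discarding any remainder. The D measure is omitted because it requires curve fitting (the study used the vocd utility for it).

```python
import math


def msttr(tokens, segment=50):
    """Mean TTR over consecutive non-overlapping segments; None if the text is too short."""
    starts = range(0, len(tokens) - segment + 1, segment)
    ttrs = [len(set(tokens[s:s + segment])) / segment for s in starts]
    return sum(ttrs) / len(ttrs) if ttrs else None


def ttr_measures(tokens, segment=50):
    """Return TTR and its transformations under the assumed definitions."""
    n = len(tokens)
    t = len(set(tokens))
    if n < 2:
        raise ValueError("need at least two tokens")
    return {
        "TTR": t / n,
        "MSTTR": msttr(tokens, segment),
        "LogTTR": math.log10(t) / math.log10(n),
        "CTTR": t / math.sqrt(2 * n),
        "RTTR": t / math.sqrt(n),
        # Uber is undefined when every token is a distinct type (log 1 = 0).
        "Uber": math.log10(n) ** 2 / math.log10(n / t) if t < n else None,
    }


# Artificial sample: 100 tokens cycling through 25 types.
tokens = [f"w{i % 25}" for i in range(100)]
for name, value in ttr_measures(tokens).items():
    print(name, value)
```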

Lexical variation (cont.)
  Previous findings
    NDW and TTR useful, but affected by sample size
    Transformations of NDW & TTR not equally useful
    D claimed superior; results mixed (Jarvis 2002; Yu 2010)
    Mixed results for word class TTR measures
    No consensus on a single best measure

Research questions
  How does LD relate to the quality of EFL speakers’ oral narratives?
  How do the LS measures compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?
  How do the LV measures compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?
  How do LD, LS and LV compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?

Data
  Spoken English Corpus of Chinese Learners (Wen et al. 2005)
    Transcripts of TEM-4 Spoken Test data in 1996-2002
    Task 2 data used: 3-minute oral narratives
    Students ranked within groups of 32-35
    12 groups of data used (1999-2002; N=32-35 each)
    Only rankings available, not actual scores
  Example topic (2001)
    Describe a teacher of yours whom you found unusual

Computing the measures
  Preprocessing
    Part-of-speech tagging (Stanford tagger)
    Lemmatization (Morpha)
  Measure computation
    D measure: vocd utility in CLAN
    Type counting: w, sw, lw, slw, v, sv, n, adj, adv
    Token counting: w, lw, slw, v
    Computation of the other 25 ratios

Analysis
  Spearman’s rho computed for each group
    X: test takers’ rankings within the group
    Y: Values of each of the 26 measures
    Meta-analysis to combine results from the 12 groups
  Students divided into 4 levels based on rankings
    Levels A, B, C and D
    ANOVAs run to determine inter-level differences
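The rank-correlation step can be illustrated with a small pure-Python sketch. Spearman's rho is computed as the Pearson correlation of average ranks. The slides do not say how the per-group values were combined, so the `combine_rhos` function below uses a Fisher z average weighted by n - 3 purely as an assumed stand-in for the meta-analysis. All names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def combine_rhos(rhos, sizes):
    """Assumed combination: Fisher z-transform, weights n - 3, back-transform.

    Raises ValueError if any rho is exactly 1 or -1.
    """
    weights = [n - 3 for n in sizes]
    z_mean = sum(w * math.atanh(r) for w, r in zip(weights, rhos)) / sum(weights)
    return math.tanh(z_mean)


# Made-up group of five speakers: rank 1 is the best narrative.
rankings = [1, 2, 3, 4, 5]
measure = [50, 45, 47, 30, 20]
print(spearman_rho(rankings, measure))
```

With rank 1 as the best speaker, a measure that rises with quality yields a negative rho against the raw rankings, so the sign convention has to be fixed before results from different groups are compared.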

Analysis (cont.)
  Alpha level = .05 / 28 = .0018
  Identification of discriminative measures
    Significant combined rho (p < .0018)
    Significant between-level differences with linear decreases from Level A to Level D

Lexical density and sophistication

Measure   Combined rho   p-value     Measure   Combined rho   p-value
Words     .437           .000        LS2       .050           .336
W/Min     .437           .000        VS1       .133           .010
LD        .011           .836        CVS1      .166           .001
LS1       .048           .355        VS2       .165           .001

Lexical density and sophistication (cont.)

Measure   A         B        C        D        F        Sig.
Words     336.16    295.95   297.76   256.34   28.335   .000
W/Min     112.052   98.650   99.252   85.446   28.335   .000
LD        .417      .415     .409     .414     .896     .443
LS1       .227      .235     .221     .225     .681     .564
LS2       .261      .272     .256     .260     2.736    .043
VS1       .072      .086     .067     .073     2.629    .050
CVS1      .343      .383     .299     .297     3.722    .042
VS2       .314      .401     .274     .262     2.760    .042

Lexical density and sophistication (cont.)

        LS1      LS2      VS1      CVS1     VS2
LS1     1.000
LS2     .637**   1.000
VS1     .456**   .391**   1.000
CVS1    .414**   .382**   .966**   1.000
VS2     .381**   .350**   .909**   .935**   1.000

Relationships among the dimensions
  Low to weak correlations among measures in different dimensions
  Lexical variation demonstrated strongest relationships to raters’ judgments of the quality of EFL speakers’ oral narratives

Summary of findings
  The three dimensions posited in language acquisition literature appear to be different constructs
  No/small effect for lexical density/sophistication found
  Lexical variation correlated strongly with quality
    9 LV measures recommended
    NDW correlates strongly with length, but worth considering in the case of timed oral narratives
    Transformed TTR measures perform better than the original TTR measures

Limitations and future research
  A factor analysis will show patterns of relationships
  No scores available, so not possible to run regression models
  Division of students into 4 levels could be problematic
  Replication using EFL writing data and other conceptualizations of proficiency
  Effects of task-related variables
  Relations among factors determining quality