
Automating Second Language Acquisition Research: Integrating Information Visualisation and Machine Learning

Helen Yannakoudakis, Ted Briscoe, Theodora Alexopoulou

University of Cambridge

Visualisation of Linguistic Patterns

EACL 2012

Outline

1 Introduction

2 Dataset

3 System

4 Case study

5 Conclusions

Introduction

Common European Framework of Reference for Languages (CEFR)

International benchmark of language attainment at different stages of learning

English Profile (EP) research programme

Enhance the learning, teaching and assessment of English as an additional language

Reference level descriptions of the language abilities expected at each learning stage

Goal

Understand the linguistic abilities that characterise different levels of attainment and, more generally, developmental aspects of learner grammars

Introduction

Common European Framework of Reference for Languages (CEFR)

Divides learners into three broad divisions:

A Basic User

A1 Breakthrough or beginner

A2 Waystage or elementary

B Independent User

B1 Threshold or intermediate

B2 Vantage or upper intermediate (e.g., can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options)

C Proficient User

C1 Effective Operational Proficiency or advanced

C2 Mastery or proficiency

Theory-driven approach

Approach

Theory-driven approach

Linguistic intuition

Literature on learner English

Hypotheses that are well understood

Target language determiner systems cause problems for learners whose native language doesn’t utilise determiners

Risks ‘finding the obvious’

Large-scale databases

How can we extract data efficiently and reliably to evaluate linguistic hypotheses?

How can we make “observations” or extract patterns that may lead to new hypotheses?

Data-driven approach

Our approach

More empirical perspective for linguistic hypotheses on learner grammars

Machine Learning

Advantages

Partially automate the process of hypothesis creation

Alternative route to learner grammars

Useful adjunct to hypothesis-driven approach

Powerful methodology for exploring a large hypothesis space

Data-driven approaches quantitatively very powerful

First Certificate in English (FCE) exam

FCE Writing Component

CEFR level: vantage or upper-intermediate (B2)

Two tasks eliciting free-text answers, each one between 120 and 180 words (e.g. ‘write a short story commencing ...’)

Answers annotated with a mark (in the range 1–40), fitted to a Rasch model (Fischer and Molenaar, 1995)

Manually error-coded using a taxonomy of ∼80 error types (Nicholls, 2003)

Meta-data

Candidate’s grades

Native language

Age

First Certificate in English (FCE) exam – cont.

FCE Writing Component

Manually error-coded using a taxonomy of ∼80 error types (Nicholls, 2003)

Examples

It is a very beautiful place and the people there <NS type="AGV"><i>is</i> <c>are</c></NS> very kind and generous.

I will give you all <NS type="MD"><c>the</c></NS> information you need.
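The annotations in the examples above can be read mechanically. The following is a minimal sketch, assuming `<NS>` spans are not nested (real scripts may nest them) and that an incorrect form sits in `<i>` and its correction in `<c>`, as on the slide. The function name `extract_errors` and the shortened example strings are hypothetical.

```python
import re

# Accept straight or curly quotes around the error code.
QUOTE = "[\"'\u2018\u2019]"
NS_PATTERN = re.compile(
    "<NS type=" + QUOTE + r"(?P<type>\w+)" + QUOTE + r">(?P<body>.*?)</NS>",
    re.DOTALL,
)
INCORRECT = re.compile(r"<i>(.*?)</i>", re.DOTALL)
CORRECTION = re.compile(r"<c>(.*?)</c>", re.DOTALL)


def extract_errors(sentence):
    """Return (error_type, incorrect, correction) triples; absent parts are None."""
    errors = []
    for match in NS_PATTERN.finditer(sentence):
        body = match.group("body")
        wrong = INCORRECT.search(body)
        fixed = CORRECTION.search(body)
        errors.append((
            match.group("type"),
            wrong.group(1) if wrong else None,
            fixed.group(1) if fixed else None,
        ))
    return errors


examples = [
    'the people there <NS type="AGV"><i>is</i> <c>are</c></NS> very kind',
    'I will give you all <NS type="MD"><c>the</c></NS> information you need.',
]
for sentence in examples:
    print(extract_errors(sentence))
```

An MD span has a correction but no incorrect form, which is why the second triple carries `None` in the middle position.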

Machine Learning

Discriminative Learning

Supervised discriminative machine learning methods to automate the assessment of the FCE exam (Briscoe et al., 2010)

Binary classifier that best discriminates passing from failing FCE scripts (trained on FCE scripts)

Linear Perceptron classifier

Feature set: lexical and part-of-speech (POS) ngrams (among other feature types)
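To make the setup concrete, here is a minimal sketch of a linear perceptron over word n-gram counts, with pass/fail coded as +1/−1. It is a plain textbook perceptron trained on made-up toy scripts, not the classifier of Briscoe et al. (2010). All function names and data are hypothetical, and POS n-grams are left out because they would need a tagger.

```python
from collections import Counter, defaultdict


def ngram_features(tokens, max_n=2):
    """Count word unigrams and bigrams in a tokenised script."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def train_perceptron(data, epochs=10):
    """data: list of (feature Counter, label) pairs with label in {+1, -1}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in data:
            score = sum(weights[f] * v for f, v in feats.items())
            if label * score <= 0:  # misclassified or on the boundary: update
                for f, v in feats.items():
                    weights[f] += label * v
    return weights


def predict(weights, feats):
    """Return +1 (pass) or -1 (fail) for a feature Counter."""
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score > 0 else -1


# Toy, made-up training scripts: +1 = pass, -1 = fail.
scripts = [
    ("it is a very good idea".split(), +1),
    ("unix is a very powerful system".split(), +1),
    ("it is very good idea".split(), -1),
    ("unix is very powerful system".split(), -1),
]
data = [(ngram_features(tokens), label) for tokens, label in scripts]
weights = train_perceptron(data)

# Features with the largest absolute weight are the most discriminative.
for feature, weight in sorted(weights.items(), key=lambda kv: -abs(kv[1]))[:4]:
    print(feature, weight)
```

In a linear model of this kind, the sign of a feature's weight shows whether it pushes a script towards pass or fail, and the magnitude gives a ranking of features.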

Highly Ranked Discriminative Feature Instances

Feature          Type           Example
VM RR (+)        POS bigram     could clearly
, because (−)    word bigram    , because of
how to (−)       word bigram    *teach the others how to dance
necessary (+)    word unigram   it is necessary that
the people (−)   word bigram    *the people are clever
probably (+)     word unigram   we are probably going
VV∅ VV∅ (−)      POS bigram     *technology keep develop
NN2 VVG (+)      POS bigram     children smiling
II VVN (−)       POS bigram     *I want to gone

Discriminative Instances

Issues

Hundreds of thousands of discriminative feature instances

Proxies to aspects of the grammar and need interpretation

Evaluate higher-level, more general and comprehensible hypotheses

Information Visualisation

Appeal

Gain a deeper understanding of important phenomena that are represented in a database

Navigate large amounts of data faster, intuitively, and with relative ease

No need to learn query language syntax

Identify the most productive paths to pursue

Visual User Interface

Feature relations

Given $S = \{s_1, s_2, \ldots, s_N\}$ and $F = \{f_1, f_2, \ldots, f_M\}$, a feature $f_i \in F$ is associated with a feature $f_j \in F$, where $i \neq j$ and $1 \le i, j \le M$, if their relative co-occurrence score is within a predefined range:

\[
\mathrm{score}(f_j, f_i) = \frac{\sum_{k=1}^{N} \mathrm{exists}(f_j, f_i, s_k)}{\sum_{k=1}^{N} \mathrm{exists}(f_i, s_k)} \tag{1}
\]

where $s_k \in S$, $1 \le k \le N$, $\mathrm{exists}()$ is a binary function that returns 1 if the input features occur in $s_k$, and $0 \le \mathrm{score}(f_j, f_i) \le 1$.

Two features are connected by an edge if their score is within a user-defined range.

Outgoing edges of $f_i$: $\mathrm{score}(f_j, f_i)$; incoming edges of $f_i$: $\mathrm{score}(f_i, f_j)$.
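The score in (1) and the edge rule can be sketched as follows. This assumes each element of S is represented by the set of features it contains, that the user-defined range is inclusive at both ends, and that the score is taken to be 0 when the conditioning feature never occurs. The names `score` and `build_edges` and the toy data are hypothetical.

```python
from itertools import permutations


def score(f_j, f_i, sentences):
    """Relative co-occurrence score(f_j, f_i) as in (1).

    sentences: list of sets, each holding the features found in one s_k.
    """
    with_fi = sum(1 for s in sentences if f_i in s)
    if with_fi == 0:
        return 0.0  # assumption: define the score as 0 when f_i never occurs
    with_both = sum(1 for s in sentences if f_i in s and f_j in s)
    return with_both / with_fi


def build_edges(features, sentences, low, high):
    """Directed edges f_i -> f_j whose score(f_j, f_i) lies in [low, high]."""
    edges = {}
    for f_i, f_j in permutations(features, 2):
        value = score(f_j, f_i, sentences)
        if low <= value <= high:
            edges[(f_i, f_j)] = value
    return edges


# Toy, made-up data: each set lists the features present in one s_k.
sentences = [
    {"RG JJ NN1", "VBZ RG", "very good"},
    {"RG JJ NN1", "VBZ RG"},
    {"VBZ RG"},
    {"very good"},
]
features = ["RG JJ NN1", "VBZ RG", "very good"]
edges = build_edges(features, sentences, low=0.8, high=1.0)
print(edges)
```

The score is asymmetric: in the toy data every s_k with RG JJ NN1 also has VBZ RG, but only two of the three with VBZ RG have RG JJ NN1, so only one direction passes the 0.8 threshold.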

Dynamic creation of graphs

The functionality, usability and tractability of graphs is severely limited when the number of nodes and edges grows by more than a few dozen (Fry, 2007)

Dynamic creation of graphs – cont.

Graph of the 5 most frequent negative features using a score range of 0.8–1

Interpreting discriminative features: a case study

RG JJ NN1 (−): 18th most discriminative (negative) feature

Degree adverb followed by an adjective and a singular noun (e.g., very good boy)

Why negative?

Related to:

very good (−)

JJ NN1 II (−) (e.g., difficult sport at)

VBZ RG (−) (e.g., is very)

Examples

1a It might seem to be very difficult sport at the beginning.

1b We know a lot about very difficult situation in your country.

1c I think it’s very good idea to spending vacation together.

1d Unix is very powerful system but there is one thing against it.

Interpreting discriminative features: a case study

RG JJ NN1 – related to article omission errors?

23% of sentences that contain RG JJ NN1 also have an MD (missing determiner) error

very good (12%), JJ NN1 II (14%), VBZ RG (15%)

MD:doc

Across all scripts: 2.18

RG JJ NN1: 2.75

VBZ RG: 2.68

JJ NN1 II: 2.48

very good: 2.32

VBZ RG JJ: 2.73

VBZ RG JJ & RG JJ NN1: 3.68

The error mostly involves the indefinite article

Why should these richer nominals be associated with article omission?

Why are only singular nouns implicated in this feature?
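The percentages on this slide are conditional proportions: of the sentences containing a feature, the share that also carries an MD annotation. The following is a minimal sketch of that calculation on made-up data. The function name `sentences_pct` and the representation of a sentence as a pair of sets are assumptions.

```python
def sentences_pct(sentences, feature, error_code="MD"):
    """Percentage of sentences containing `feature` that also contain `error_code`.

    sentences: list of (features, error_codes) pairs, both given as sets.
    Returns None if no sentence contains the feature.
    """
    with_feature = [errors for feats, errors in sentences if feature in feats]
    if not with_feature:
        return None
    with_error = sum(1 for errors in with_feature if error_code in errors)
    return 100.0 * with_error / len(with_feature)


# Toy, made-up data: (features in the sentence, error codes annotated in it).
sentences = [
    ({"RG JJ NN1", "VBZ RG"}, {"MD"}),
    ({"RG JJ NN1"}, set()),
    ({"RG JJ NN1"}, {"AGV"}),
    ({"RG JJ NN1", "very good"}, set()),
    ({"very good"}, {"MD"}),
]
print(sentences_pct(sentences, "RG JJ NN1"))  # 25.0
```

The last toy sentence has an MD error but not the feature, so it does not enter the denominator.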

Interpreting discriminative features: a case study

Why should these richer nominals be associated with article omission?

Typical of learners coming from L1s lacking an article system (Robertson, 2000; Ionin and Montrul, 2010; Hawkins and Buttery, 2010)

Learners analyse articles as adjectival modifiers rather than as a separate category of determiners or articles (Trenkic, 2008)

Hypothesis: with complex adjectival phrases, learners may omit the article

Is article omission more pronounced with more complex adjectival phrases?

Is this primarily the case for learners from L1s lacking articles?

Interpreting discriminative features: a case study

Is article omission more pronounced with more complex adjectival phrases?

MD:doc

Across all scripts: 2.18

RG JJ NN1: 2.75

VBZ RG: 2.68

JJ NN1 II: 2.48

very good: 2.32

VBZ RG JJ: 2.73

VBZ RG JJ & RG JJ NN1: 3.68

JJ NN1: 2.20

Interpreting discriminative features

Is this primarily the case for learners from L1s lacking articles?

Language    sentences%  sentences%  MD:doc      MD:doc
            RG JJ NN1   VBZ RG JJ   RG JJ NN1   VBZ RG JJ
all         23.0        15.6        2.75        2.73
Turkish     45.2        29.0        5.81        5.82
Japanese    44.4        22.3        4.48        3.98
Korean      46.7        35.0        5.48        5.31
Russian     46.7        23.4        5.42        4.59
Chinese     23.4        13.5        3.58        3.25
French      6.9         6.7         1.32        1.49
German      2.1         3.0         0.91        0.92
Spanish     10.0        9.6         1.18        1.35
Greek       15.5        12.9        1.60        1.70

Table: sentences%: proportion of sentences containing fi that also contain an MD error.
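To make the pattern in the table easier to see, its rows can be grouped by whether the L1 has an article system. The sketch below copies the table's values and averages each column per group. The grouping itself is an assumption made for illustration, and an unweighted mean over languages is only a rough summary, since the groups differ in size and the number of scripts per language is not given.

```python
COLUMNS = [
    "sentences% RG JJ NN1",
    "sentences% VBZ RG JJ",
    "MD:doc RG JJ NN1",
    "MD:doc VBZ RG JJ",
]
# Values copied from the table above, in the order of COLUMNS.
TABLE = {
    "Turkish": (45.2, 29.0, 5.81, 5.82),
    "Japanese": (44.4, 22.3, 4.48, 3.98),
    "Korean": (46.7, 35.0, 5.48, 5.31),
    "Russian": (46.7, 23.4, 5.42, 4.59),
    "Chinese": (23.4, 13.5, 3.58, 3.25),
    "French": (6.9, 6.7, 1.32, 1.49),
    "German": (2.1, 3.0, 0.91, 0.92),
    "Spanish": (10.0, 9.6, 1.18, 1.35),
    "Greek": (15.5, 12.9, 1.60, 1.70),
}
# Assumption: grouping of L1s by presence of an article system.
NO_ARTICLES = ["Turkish", "Japanese", "Korean", "Russian", "Chinese"]
ARTICLES = ["French", "German", "Spanish", "Greek"]


def column_means(languages):
    """Unweighted mean of each table column over the given languages."""
    rows = [TABLE[lang] for lang in languages]
    return [sum(column) / len(rows) for column in zip(*rows)]


no_art = column_means(NO_ARTICLES)
with_art = column_means(ARTICLES)
for name, a, b in zip(COLUMNS, no_art, with_art):
    print(f"{name}: {a:.2f} (no articles) vs {b:.2f} (articles)")
```

Under this grouping, every column is higher for the group without articles.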

Interpreting discriminative features: a case study

Why are only singular nouns implicated in this feature?

The association with predicative contexts may provide a clue. Such contexts select nominals which require the indefinite article only in the singular case.

Example: Unix is (a) very powerful system vs. Macs are very elegant machines.

Conclusions & Future Work

Formed initial interpretations for why a particular feature is negatively discriminative

Nominals with complex adjectival phrases appear particularly susceptible to article omission errors by learners of English with L1s lacking articles

Usefulness of visualisation techniques for navigating and interpreting large amounts of data

Relevance of features weighted by discriminative classifiers

More rigorous evaluation techniques, such as longitudinal case studies (Shneiderman and Plaisant, 2006; Munzner, 2009)

Investigation and evaluation of different visualisation techniques of machine learned or extracted features that support hypothesis formation about learner grammars

Available upon request as a web service on FCE scripts

Thank you!

Acknowledgments: we are grateful to Cambridge ESOL for supporting this research. The third author acknowledges support from Education First. We would like to thank Marek Rei, Øistein Andersen, Tim Parish, Paula Buttery, Angeliki Salamoura as well as the anonymous reviewers for their valuable comments and suggestions.
