Combining pattern-based and machine learning
methods to detect definitions for eLearning
purposes
Eline Westerhout &
Paola Monachesi
Overview
• Extraction of definitions within eLearning
• Types of definitory contexts
• Grammar approach
• Machine learning approach
• Conclusions
• Future work
• Discussion
Extraction of definitions within eLearning
• Definition extraction:
  – question answering
  – building dictionaries from text
  – ontology learning
• Challenges within eLearning:
  – corpus
  – size of LOs (learning objects)
Types - I
• is_def: Gnuplot is een programma om grafieken te maken
  'Gnuplot is a program for drawing graphs'
• verb_def: E-learning omvat hulpmiddelen en toepassingen die via het internet beschikbaar zijn en creatieve mogelijkheden bieden om de leerervaring te verbeteren.
  'eLearning comprises resources and applications that are available via the internet and provide creative possibilities to improve the learning experience'
• punct_def: Passen: plastic kaarten voorzien van een magnetische strip, [...] toegang krijgt tot bepaalde faciliteiten.
  'Passes: plastic cards equipped with a magnetic strip, that [...] gets access to certain facilities.'
• pron_def: Dedicated readers. Dit zijn speciale apparaten, ontwikkeld met het exclusieve doel e-boeken te kunnen lezen.
  'Dedicated readers. These are special devices, developed with the exclusive goal of making it possible to read e-books.'
Types - II
Identification of definitory contexts
• Make use of the linguistic annotation of LOs (part-of-speech tags)
• Domain: computer science for non-experts
• Use of language-specific grammars
• Workflow:
  – searching and marking definitory contexts in LOs (manually)
  – drafting local grammars on the basis of these examples
  – applying the grammars to new LOs
<rule name="simple_NP" >
<seq>
<and>
<ref name="art"/>
<ref name="cap"/>
</and>
<ref name="adj" mult="*"/>
<ref name="noun" mult="+"/>
</seq>
</rule>
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.
'A bold letter is a letter that is printed blacker than the other letters.'
<query match="tok[@ctag='V' and @base='zijn' and @msd[starts-with(.,'hulpofkopp')]]"/>
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.
<rule name="noun_phrase">
<seq>
<ref name="art" mult="?"/>
<ref name="adj" mult="*" />
<ref name="noun" mult="+" />
</seq>
</rule>
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.
<rule name="is_are_def">
<seq>
<ref name="simple_NP"/>
<query match="tok[@ctag='V' and @base='zijn' and
@msd[starts-with(.,'hulpofkopp')]]"/>
<ref name="noun_phrase" />
<ref name="tok_or_chunk" mult="*"/>
</seq>
</rule>
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.
<definingText>
<markedTerm>
<tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een" id="t214.2">Een</tok>
<tok sp="n" msd="attr,stell,vervneut" ctag="Adj" base="vet" id="t214.3">vette</tok>
<tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.4">letter</tok>
</markedTerm>
<tok sp="n" msd="hulpofkopp,ott,3,ev" ctag="V" base="zijn" id="t214.5">is</tok>
<tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een"
id="t214.6">een</tok>
<tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.7">letter</tok>
...
<tok sp="n" msd="onbep,neut,attr" ctag="Pron" base="andere"
id="t214.14">andere</tok>
<tok sp="n" msd="soort,mv,neut" ctag="N" base="letter" id="t214.15">letters</tok>
<tok sp="n" msd="punt" ctag="Punc" base="." id="t214.16">.</tok>
</definingText>
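Conceptually, the is_are_def rule above can be approximated outside the XML formalism as a regular expression over the POS-tag sequence plus a lemma test on the verb. A minimal Python sketch, assuming tokens are (word, tag, lemma) tuples like the <tok> elements above (the function name and tuple encoding are illustrative, not the actual system):

```python
import re

def match_is_are_def(tokens):
    """Rough approximation of the is_are_def rule: a simple NP, a form of
    the copula 'zijn', and a second NP. Each token is (word, POS tag,
    lemma), mirroring the <tok> elements in the annotated output above."""
    tag_string = " ".join(tag for _, tag, _ in tokens)
    np = r"Art( Adj)* N( N)*"                   # article, adjectives*, nouns+
    m = re.match(rf"(?P<defined>{np}) V {np}", tag_string)
    if m is None:
        return False
    # Token index of the verb = number of tokens in the defined term.
    verb_index = m.group("defined").count(" ") + 1
    # The verb must have lemma 'zijn', like @base='zijn' in the query.
    return tokens[verb_index][2] == "zijn"
```

Applied to the running example, the tag sequence Art Adj N V Art N ... satisfies the pattern and the verb lemma is 'zijn', so the sentence is accepted.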
Results (grammar)
Type       P       R       F
is_def     0.2810  0.8652  0.4242
verb_def   0.4464  0.7576  0.5618
punct_def  0.0991  0.6818  0.1731
pron_def   0.0918  0.4130  0.1502
Features
• Text properties: bag-of-words, bigrams, and bigram preceding the definition
• Syntactic properties: type of determiner within the defined term (definite, indefinite, no determiner)
• Proper nouns: presence of a proper noun in the defined term
Configurations

Setting  Attributes
1 using bag-of-words
2 using bigrams
3 combining bag-of-words and bigrams
4 adding bigram preceding definition to setting 3
5 adding definiteness of article in marked term to setting 3
6 adding presence of proper noun to setting 3
7 adding bigram preceding definition & definiteness of article in marked term to setting 3
8 adding bigram preceding definition & presence of proper noun to setting 3
9 adding definiteness of article in marked term & presence of proper noun to setting 3
10 using all attributes
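The ten settings amount to switching feature extractors on and off. A minimal sketch of how such feature dictionaries could be assembled (function names and the dict-based representation are assumptions; the slides do not specify the implementation):

```python
def bag_of_words(tokens):
    return {f"bow={w.lower()}": 1 for w in tokens}

def bigrams(tokens):
    return {f"bi={a.lower()}_{b.lower()}": 1 for a, b in zip(tokens, tokens[1:])}

def extract_features(tokens, preceding_bigram, determiner, has_proper_noun,
                     setting=10):
    """Build the feature dict for one candidate definition under a given
    setting (1-10), following the configuration table above."""
    feats = {}
    if setting != 2:                   # every setting except 2 uses bag-of-words
        feats.update(bag_of_words(tokens))
    if setting != 1:                   # every setting except 1 uses bigrams
        feats.update(bigrams(tokens))
    if setting in (4, 7, 8, 10):       # bigram preceding the definition
        feats["prev_bi=" + "_".join(w.lower() for w in preceding_bigram)] = 1
    if setting in (5, 7, 9, 10):       # determiner type in the marked term
        feats[f"det={determiner}"] = 1  # definite / indefinite / none
    if setting in (6, 8, 9, 10):       # proper noun in the defined term
        feats["proper_noun"] = int(has_proper_noun)
    return feats
```

For instance, setting 3 yields only the bag-of-words and bigram features, while setting 10 adds all three extra attributes.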
Results – is_def (ML)
Setting   P       R       F
1         0.6944  0.6494  0.6711
2         0.6625  0.6883  0.6752
3         0.7662  0.7662  0.7662
4         0.7662  0.7662  0.7662
5         0.7763  0.7662  0.7712
6         0.7662  0.7662  0.7662
7         0.7867  0.7662  0.7763
8         0.7632  0.7532  0.7582
9         0.7895  0.7792  0.7843
10        0.8000  0.7792  0.7895
Results – is_def (final)

Setting   P       R       F
1         0.6944  0.5618  0.6211
2         0.6625  0.5955  0.6272
3         0.7662  0.6629  0.7108
4         0.7662  0.6629  0.7108
5         0.7763  0.6629  0.7152
6         0.7662  0.6629  0.7108
7         0.7867  0.6629  0.7195
8         0.7632  0.6517  0.7030
9         0.7895  0.6742  0.7273
10        0.8000  0.6742  0.7317
Results – punct_def (ML)
Setting   P       R       F
1         0.4324  0.3556  0.3902
2         0.3171  0.2889  0.3023
3         0.4510  0.5111  0.4792
4         0.4681  0.4889  0.4783
5         0.4528  0.5333  0.4898
6         0.5000  0.5333  0.5161
7         0.5106  0.5333  0.5217
8         0.5000  0.5333  0.5161
9         0.5000  0.5778  0.5361
10        0.5000  0.5333  0.5161
Results – punct_def (final)

Setting   P       R       F
1         0.4324  0.2424  0.3107
2         0.3171  0.1970  0.2430
3         0.4510  0.3485  0.3932
4         0.4681  0.3333  0.3894
5         0.4528  0.3636  0.4034
6         0.5000  0.3636  0.4211
7         0.5106  0.3636  0.4248
8         0.5000  0.3636  0.4211
9         0.5000  0.3939  0.4407
10        0.5000  0.3636  0.4211
Final results

Type       Stage       P       R       F
is_def     before      0.2810  0.8652  0.4242
is_def     after (10)  0.8000  0.6742  0.7317
punct_def  before      0.0991  0.6818  0.1731
punct_def  after (9)   0.5000  0.3939  0.4407
• precision improves by roughly 50 and 40 percentage points (is_def and punct_def)
• recall drops by roughly 20 and 30 percentage points
• f-score improves by roughly 30 and 25 percentage points
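These shifts are consistent with the standard F-score, the harmonic mean of precision and recall. A quick sanity check against the table values:

```python
def f_score(p, r):
    """Harmonic mean of precision and recall, as reported in the tables."""
    return 2 * p * r / (p + r)

# Grammar-only is_def row:           f_score(0.2810, 0.8652) rounds to 0.4242
# Final is_def row (setting 10):     f_score(0.8000, 0.6742) rounds to 0.7317
```

The filtering step trades a large precision gain against a smaller recall loss, so the harmonic mean rises overall.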
Related work
• Question answering:
  – Fahmi & Bouma (2006)
  – Miliaraki & Androutsopoulos (2004)
• Glossary creation:
  – Muresan & Klavans (2002)
• Ontology learning:
  – Storrer & Wellinghof (2006)
  – Walter & Pinkal (2006)
Future work
• try different features
• evaluate other classifiers
• extend to all types of definitions
• scenario-based evaluation of the GCD