Question Answering on Romanian, English and French Languages
„Al. I. Cuza” University of Iași, Romania
Faculty of Computer Science
Introduction
System components
◦ Question analysis
◦ Index creation and information retrieval
◦ Answer extraction
Results
Application of QA system
◦ eLearning
◦ Robotics
◦ CriES 2010
Conclusions
Our group has participated in CLEF exercises since 2006:
◦ 2006 – Ro–En (English collection) – 9.47% right answers
◦ 2007 – Ro–Ro (Romanian Wikipedia) – 12%
◦ 2008 – Ro–Ro (Romanian Wikipedia) – 31%
◦ 2009 – Ro–Ro, En–En (JRC-Acquis) – 47.2% (48.6%)
◦ 2010 – Ro–Ro, En–En, Fr–Fr (JRC-Acquis, Europarl) – 47.5% (42.5%, 27%)
[Figure: system architecture. Components shown: initial questions; question analysis (tokenization and lemmatization; focus, keyword and named entity identification; question classification); Lucene queries; JRC-Acquis and Europarl corpora; Lucene index; information retrieval; relevant snippets; Romanian grammar; definition, reason and other answer extraction; final answers.]
Q1: What percentage of people in Italy relies on television for information?
<q q_id="0001" source_lang="EN" target_lang="RO">
  <string>Ce procent al populaţiei din Italia contează pe televiziune pentru a obţine informaţii</string>
  <focus>procent</focus>
  <verb>contează obţine</verb>
  <noun>populaţiei televiziune informaţii</noun>
  <nameEntities>Italia</nameEntities>
  <luceneQuery>procent~0.7 populaţiei~0.7 Italia^3 (contează^2 conta) televiziune~0.7 obţine informaţii~0.7</luceneQuery>
  <questionType>FACTOID</questionType> (~40 patterns)
  <answerType>MEASURE</answerType> (~30 patterns)
</q>
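The Lucene query in the example combines fuzzy terms (~0.7), a boosted named entity (^3) and a verb paired with its lemma. A minimal sketch of how such a string could be assembled is given here. The Token class, the role labels and the build_lucene_query function are hypothetical names introduced for illustration, and the rules (fuzzy matching for focus and nouns, boost for entities, lemma alternative for verbs) are inferred from this single example, not taken from the system's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str                    # surface form as it appears in the question
    role: str                    # "focus", "noun", "verb" or "entity" (assumed labels)
    lemma: Optional[str] = None  # lemma, given only when it differs from the surface form


def build_lucene_query(tokens, fuzzy=0.7, entity_boost=3, verb_boost=2):
    """Assemble a Lucene query string from analysed question tokens."""
    parts = []
    for tok in tokens:
        if tok.role == "entity":
            # Named entities are boosted.
            parts.append(f"{tok.text}^{entity_boost}")
        elif tok.role == "verb":
            if tok.lemma and tok.lemma != tok.text:
                # Boost the inflected form, keep the lemma as an alternative.
                parts.append(f"({tok.text}^{verb_boost} {tok.lemma})")
            else:
                parts.append(tok.text)
        else:
            # Focus and nouns are matched fuzzily.
            parts.append(f"{tok.text}~{fuzzy}")
    return " ".join(parts)


question_tokens = [
    Token("procent", "focus"),
    Token("populaţiei", "noun"),
    Token("Italia", "entity"),
    Token("contează", "verb", lemma="conta"),
    Token("televiziune", "noun"),
    Token("obţine", "verb"),
    Token("informaţii", "noun"),
]
query = build_lucene_query(question_tokens)
print(query)
```

With these tokens the sketch reproduces the luceneQuery string of question 0001.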
We used Lucene to create two indexes, one at paragraph level and one at document level
Using the Lucene queries and these indexes, the Lucene search engine returns a ranked list of snippets for every question; these snippets are the candidate answers
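The idea of indexing the same corpus at two granularities can be illustrated with a toy in-memory index. This is only a sketch: it does not use Lucene, the scoring is a plain count of matching query terms, and the function names and sample texts are invented for the example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,;:?!()").lower() for w in text.split())
    return [w for w in words if w]


def build_indexes(documents):
    """documents maps doc_id -> list of paragraph strings.

    Returns (paragraph_index, document_index); each maps a unit id to term counts.
    """
    paragraph_index = {}
    document_index = {}
    for doc_id, paragraphs in documents.items():
        doc_terms = Counter()
        for n, paragraph in enumerate(paragraphs):
            terms = Counter(tokenize(paragraph))
            paragraph_index[(doc_id, n)] = terms
            doc_terms.update(terms)
        document_index[doc_id] = doc_terms
    return paragraph_index, document_index


def search(index, query, top_k=3):
    """Rank units by the number of query-term occurrences they contain (toy score)."""
    q_terms = tokenize(query)
    scored = [(unit, sum(terms[t] for t in q_terms)) for unit, terms in index.items()]
    scored = [(unit, score) for unit, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up sample corpus, for illustration only.
documents = {
    "doc1": [
        "Television is the main source of information in Italy.",
        "The directive entered into force in 2005.",
    ],
    "doc2": ["Member States shall report on radio and television broadcasting."],
}
paragraph_index, document_index = build_indexes(documents)
paragraph_hits = search(paragraph_index, "television information Italy")
document_hits = search(document_index, "television information Italy")
print(paragraph_hits)
print(document_hits)
```

The paragraph-level hits point at short snippets, while the document-level hits identify whole documents.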
Answer extraction depends on the Lucene score; in addition, we built special modules to extract answers for questions of type DEFINITION, REASON-PURPOSE, PROCEDURE and OPINION
Two threshold values:
◦ A higher one – the system returns many NOA (no answer) responses; RA (right answers) is affected, but c@1 is higher
◦ A lower one – the system returns only a few NOA responses; RA is higher, but c@1 is lower
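The effect of the threshold can be sketched as a simple decision rule: return the best candidate only when its score reaches the threshold, otherwise return NOA. The function name, the candidate labels and the scores here are hypothetical, and the assumption that the threshold is compared against the score of the top candidate is ours.

```python
def select_answer(candidates, threshold):
    """Return the best-scoring answer, or None (NOA) if its score is below the threshold.

    candidates is a list of (answer_text, score) pairs.
    """
    if not candidates:
        return None
    best_text, best_score = max(candidates, key=lambda c: c[1])
    return best_text if best_score >= threshold else None


# Hypothetical candidates and scores, for illustration only.
candidates = [("candidate A", 0.62), ("candidate B", 0.35)]

strict = select_answer(candidates, threshold=0.8)   # higher threshold: NOA
lenient = select_answer(candidates, threshold=0.5)  # lower threshold: an answer is returned
print(strict, lenient)
```

A higher threshold turns more uncertain answers into NOA, which is the trade-off described in the two bullets above.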
                      RO-RO          EN-EN          FR-FR
answered right        95     102     85     78      54     47
answered wrong        74     93      98     99      124    153
total answered        169    195     183    177     178    200
unanswered right      0      0       0      0       0      0
unanswered wrong      0      0       0      0       0      0
unanswered empty      31     5       17     23      22     0
total unanswered      31     5       17     23      22     0
c@1 measure           0.55   0.42    0.46   0.43    0.30   0.24
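The c@1 row can be checked against the counts, assuming the usual definition of the measure, c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of right answers, n_U the number of unanswered questions and n the total number of questions. The total of 200 questions is derived from the table (answered plus unanswered). A minimal sketch for the first RO-RO column:

```python
def c_at_1(n_right, n_unanswered, n_total):
    """c@1: accuracy that credits unanswered questions in proportion to overall accuracy."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    accuracy = n_right / n_total
    return (n_right + n_unanswered * accuracy) / n_total


# First RO-RO column of the table: 95 right, 31 unanswered, 200 questions.
score = c_at_1(n_right=95, n_unanswered=31, n_total=200)
print(round(score, 2))  # 0.55
```

Leaving a question unanswered is rewarded relative to answering it wrongly, which is why a higher NOA threshold can raise c@1 while lowering the number of right answers.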
eLearning – fast answers for ~30% of questions
Robotics – communication
CriES 2010 – identify experts on Yahoo! Answers
Similar questions and their answers (the "Ok?" column holds a "Go" button for each row):
◦ Q: Are there applications in which using design patterns is not efficient?
  A: Obviously (for example, a Hello World program)...
◦ Q: What else could design patterns be applied to?
  A: Finding the solution to a problem, creating an advanced programmer's language, writing documentation, discussions with colleagues at a software company.
◦ Q: What is the difference between a pattern and a coding expression (idiom)?
  A: I have not heard of the expression "coding idiom", but it seems to be something specific, whereas a design pattern is general...
◦ Q: What is the difference between a pattern and classes?
  A: A design pattern is a solution to a problem and is therefore composed of a hierarchy of classes with relations between them.
◦ Q: Is a design pattern different from a pattern? Why was this name chosen?
  A: A design pattern is a pattern in the field of software engineering. I do not know why this name was chosen... :)
◦ Q: Do we use design patterns in the same application, or do we use them in different applications?
  A: In the same application.
◦ Q: What is a design pattern?
  A: First of all: a name, a problem and a solution.
Questions (columns: Question, Answer, Priority, Status, Details):
◦ Q: What are design patterns used for?
  A: (unanswered; an "Answer the question" form with a "Go" button is shown)
  normal, nevoieNeaparat ("absolutely needed")
◦ Q: Can exception handling in Java be considered an application of the Decorator pattern?
  A: (unanswered; an "Answer the question" form with a "Go" button is shown)
  urgent, nevoieNeaparat ("absolutely needed")
◦ Q: Are there applications in which using design patterns is not efficient?
  A: Obviously (for example, a Hello World program)...
  normal, doarAsa ("just because")
◦ Q: What else could design patterns be applied to?
  A: Finding the solution to a problem, creating an advanced programmer's language, writing documentation, discussions with colleagues at a software company.
  normal, saAfluMulte ("to learn more")
With Swoogle we extend the knowledge base. The ontologies returned are then converted to AIML format and saved in the robot’s memory
[Figure: CriES 2010 processing pipeline. Elements shown: initial digraph; initial Yahoo! Answers collections (en, fr, ge, sp); stop word elimination; domain keywords; initial user questions; question keywords; relevant words for questions; relevant words for domains; similarity score between questions and domains; Run 0, Run 1, Run 2.]
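The slide does not say how the similarity score between questions and domains is computed. As one possible illustration, the sketch here removes stop words from both sides and uses the Jaccard overlap of the remaining keywords. The stop word list, the function names, the sample domains and the choice of Jaccard overlap are all assumptions made for this example.

```python
# Tiny illustrative stop word list (assumed, not the one used in the system).
STOP_WORDS = {"a", "an", "the", "is", "how", "do", "i", "to", "of", "in", "on", "what", "my", "for"}


def keywords(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation and drop stop words; return the remaining word set."""
    words = (w.strip(".,;:?!").lower() for w in text.split())
    return {w for w in words if w and w not in stop_words}


def similarity(question, domain_text):
    """Jaccard overlap between question keywords and domain keywords."""
    q, d = keywords(question), keywords(domain_text)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank_domains(question, domains):
    """Return (domain, score) pairs sorted from most to least similar."""
    scores = {name: similarity(question, text) for name, text in domains.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up domains and question, for illustration only.
domains = {
    "cooking": "recipe oven bake bread flour",
    "computers": "install driver laptop windows software",
}
ranked = rank_domains("How do I install a driver on my laptop?", domains)
print(ranked)
```

The best-scoring domain would then point to the users who are active in it as candidate experts.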
The UAIC QA system has improved over time (from 9% in 2006 to 47.5% in 2010)
The main problem is the quality and quantity of the Romanian resources involved
At present we are working on using QA components in other applications in order to improve their capabilities