Triplet Extraction from Sentences

Technical University of Cluj-Napoca
Conf. Dr. Ing. Tudor Mureşan

“Jožef Stefan” Institute, Ljubljana, Slovenia
Assist. Prof. Dr. Dunja Mladenić
Blaž Fortuna
Marko Grobelnik

Lorand Dali
June 2008

Location of the project in the field of Computer Science

Artificial Intelligence
Natural Language Processing
Machine Learning

My father carries around the picture of the kid who came with his wallet.

Motivation of Triplet Extraction

Advantages
◦ compact and simple representation of the information contained in a sentence
◦ avoids the complexity of a full parse
◦ contains semantic information

Applications
◦ building the semantic graph of a document
◦ summarization
◦ question answering

Triplet Extraction – 2 Approaches

Extraction from the parse tree of the sentence using heuristic rules
◦ OpenNLP – Treebank Parsetree
◦ Link Parser – Link Grammar (a type of dependency grammar)

Extraction using Machine Learning
◦ Support Vector Machines (SVM) are used
◦ The SVM model is trained on human annotated data
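The slides do not list the heuristic rules themselves, so the following is only a minimal sketch of what rule-based extraction from a Treebank-style parse tree could look like. The tree encoding (nested tuples), the helper names, and the three rules (first noun in the subject NP, deepest verb in the VP, first noun in an NP or PP under the VP) are assumptions made for illustration, not the rules used in the project.

```python
from collections import deque

# Assumed encoding: an inner node is (label, child, child, ...),
# a leaf is (POS tag, word).

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def first_with_prefix(node, prefix):
    """Breadth-first search for the first leaf whose POS tag starts with prefix."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if is_leaf(current):
            if current[0].startswith(prefix):
                return current[1]
        else:
            queue.extend(current[1:])
    return None

def deepest_verb(node, depth=0):
    """Return (depth, word) for the deepest verb leaf below node, or None."""
    if is_leaf(node):
        return (depth, node[1]) if node[0].startswith("VB") else None
    found = [deepest_verb(child, depth + 1) for child in node[1:]]
    found = [item for item in found if item is not None]
    return max(found, key=lambda item: item[0]) if found else None

def extract_triplet(sentence_tree):
    """Apply the three illustrative rules to the children of an S node."""
    children = sentence_tree[1:]
    noun_phrase = next((c for c in children if c[0] == "NP"), None)
    verb_phrase = next((c for c in children if c[0] == "VP"), None)
    if noun_phrase is None or verb_phrase is None:
        return None
    subject = first_with_prefix(noun_phrase, "NN")
    verb = deepest_verb(verb_phrase)
    obj = None
    for child in verb_phrase[1:]:
        if not is_leaf(child) and child[0] in ("NP", "PP"):
            obj = first_with_prefix(child, "NN")
            if obj is not None:
                break
    if subject is None or verb is None or obj is None:
        return None
    return (subject, verb[1], obj)

# Hand-built toy tree for "The kid found a wallet".
toy_tree = ("S",
            ("NP", ("DT", "The"), ("NN", "kid")),
            ("VP", ("VBD", "found"),
                   ("NP", ("DT", "a"), ("NN", "wallet"))))
print(extract_triplet(toy_tree))
```

Rules this simple break on sentences with relative clauses or coordination, which is one reason a learned model is a natural second approach.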

Short review of SVM

Features of the triplet candidates

Over 300 features depending on:

Sentence
◦ length of sentence, number of words, etc.

Candidate
◦ context of Subj, Verb and Obj
◦ distance between Subj, Verb, Obj

Linkage
◦ number of links, number of link types, number of links from S, V, O

Minipar
◦ depth, diameter, siblings, uncles, cousins, categories, relations

Treebank
◦ depth, diameter, siblings, uncles, cousins, path to root, POS
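As an illustration of the Sentence and Candidate groups only, the sketch below computes a handful of such features for one candidate. The function names, the feature names, the choice of one neighbouring word on each side as "context", and the use of token positions for distance are all assumptions; the parser-based groups (Linkage, Minipar, Treebank) are not covered.

```python
def context(tokens, idx):
    """Return the words immediately before and after position idx."""
    prev_word = tokens[idx - 1] if idx > 0 else "<s>"
    next_word = tokens[idx + 1] if idx + 1 < len(tokens) else "</s>"
    return prev_word, next_word

def candidate_features(tokens, subj_idx, verb_idx, obj_idx):
    """Build a small feature dictionary for one (Subj, Verb, Obj) candidate."""
    return {
        # Sentence features
        "sentence_length": len(" ".join(tokens)),
        "num_words": len(tokens),
        # Candidate features: context of each element
        "subj_context": context(tokens, subj_idx),
        "verb_context": context(tokens, verb_idx),
        "obj_context": context(tokens, obj_idx),
        # Candidate features: distances in tokens
        "dist_subj_verb": abs(verb_idx - subj_idx),
        "dist_verb_obj": abs(obj_idx - verb_idx),
        "dist_subj_obj": abs(obj_idx - subj_idx),
    }

tokens = "My father carries around the picture".split()
features = candidate_features(tokens, subj_idx=1, verb_idx=2, obj_idx=5)
for name, value in features.items():
    print(name, value)
```

Before being passed to an SVM, categorical values such as the context words would still have to be encoded numerically.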

Evaluation and Testing

Training set = 700 annotated sentences
Test set = 100 annotated sentences

Compare the triplets extracted from a sentence to the annotated triplets from that same sentence

Comparison is done according to a similarity measure with values in [0, 1] between two triplets
◦ extracted to annotated => precision
◦ annotated to extracted => recall
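The slide does not say how the pairwise similarities are aggregated, so the sketch below assumes one plausible scheme: each triplet on one side is scored by its best match on the other side, and the scores are averaged. The placeholder `positional_sim` function and all names are hypothetical.

```python
def positional_sim(a, b):
    """Placeholder similarity: fraction of positions (S, V, O) that are equal."""
    return sum(x == y for x, y in zip(a, b)) / 3

def directed_score(source, target, sim):
    """Average over source triplets of the best similarity to any target triplet."""
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)

def precision_recall(extracted, annotated, sim=positional_sim):
    precision = directed_score(extracted, annotated, sim)  # extracted to annotated
    recall = directed_score(annotated, extracted, sim)     # annotated to extracted
    return precision, recall

extracted = [("father", "carries", "picture"),
             ("kid", "came", "wallet"),
             ("father", "carries", "wallet")]
annotated = [("father", "carries", "picture"),
             ("kid", "came", "wallet")]
precision, recall = precision_recall(extracted, annotated)
print(round(precision, 3), recall)
```

In this toy case the spurious third extracted triplet lowers precision while recall stays at 1.0, because every annotated triplet has an exact counterpart.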

Conclusions

Triplet extraction using hand rules
Triplet extraction using machine learning (SVM)
Question answering system based on triplets

Questions

Triplet Similarity Measure

Each element of one triplet is compared with the corresponding element of the other:

S        V        O
S’       V’       O’
SubjSim  VerbSim  ObjSim

TrSim = (SubjSim + VerbSim + ObjSim) / 3

TrSim, SubjSim, VerbSim, ObjSim ∈ [0, 1]
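The formula above can be sketched as a small function. The component similarity is passed in as a parameter; the `exact_match` function used here is only a stand-in chosen to keep the example self-contained, and both names are hypothetical.

```python
def exact_match(a, b):
    """Stand-in component similarity: 1.0 for equal strings (ignoring case), else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0

def triplet_similarity(triplet, other, component_sim=exact_match):
    """TrSim = (SubjSim + VerbSim + ObjSim) / 3 for two (S, V, O) triplets."""
    subj_sim = component_sim(triplet[0], other[0])
    verb_sim = component_sim(triplet[1], other[1])
    obj_sim = component_sim(triplet[2], other[2])
    return (subj_sim + verb_sim + obj_sim) / 3

print(triplet_similarity(("father", "carries", "picture"),
                         ("father", "carries", "wallet")))
```

Because each component similarity lies in [0, 1], their mean does too, which matches the range stated on the slide.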

String Similarity Measure

The way to success is under heavy construction
The road to success is always under construction

Words kept for comparison:

way success under heavy construction
road success under construction

Sim = nMatch / maxLen = 3 / 5 = 0.6
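The worked example can be reproduced with the sketch below. The slide does not state which words are dropped before comparison, so the `STOP_WORDS` set here is an assumption chosen only to match the word lists shown; matching is done on lower-cased whole words, counted as a multiset intersection.

```python
from collections import Counter

# Assumed stop-word list, picked to reproduce the slide's example.
STOP_WORDS = frozenset({"the", "to", "is", "always"})

def content_words(text):
    """Lower-case, split on whitespace, and drop the assumed stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def string_similarity(a, b):
    """Sim = nMatch / maxLen over the remaining words of both strings."""
    words_a, words_b = content_words(a), content_words(b)
    max_len = max(len(words_a), len(words_b))
    if max_len == 0:
        return 0.0  # design choice: two empty strings count as dissimilar
    n_match = sum((Counter(words_a) & Counter(words_b)).values())
    return n_match / max_len

print(string_similarity("The way to success is under heavy construction",
                        "The road to success is always under construction"))
# 3 shared words out of max(5, 4) = 5, i.e. 0.6
```

Dividing by the longer length rather than the shorter one penalises a string that merely contains the other as a subset.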

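Evaluating the extracted triplets

[Figure: for one sentence, the extracted triplets (Tr1, Tr2, Tr3) are set against the gold-standard triplets (Tr1, Tr2). Comparing extracted to gold standard gives precision; comparing gold standard to extracted gives recall.]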

My father carries around the picture of the kid who came with his wallet.

Question Types

Yes/No Questions
List Questions
Reason Questions
Quantity Questions
Location Questions
Time Questions
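The slides do not describe how a question is assigned to one of these types. As a rough illustration only, the sketch below uses the first word or two of the question; the keyword lists, the function name, and the mapping of "what/which/who" to list questions are all assumptions.

```python
# Assumed keyword list for yes/no questions.
AUXILIARIES = frozenset({
    "is", "are", "was", "were", "do", "does", "did",
    "can", "could", "will", "would", "has", "have", "had",
})

def question_type(question):
    """Guess the question type from the leading words of the question."""
    words = question.lower().rstrip("?").split()
    if not words:
        return "unknown"
    first = words[0]
    if first == "why":
        return "reason"
    if first == "how" and len(words) > 1 and words[1] in ("many", "much"):
        return "quantity"
    if first == "where":
        return "location"
    if first == "when":
        return "time"
    if first in AUXILIARIES:
        return "yes/no"
    if first in ("what", "which", "who"):
        return "list"
    return "unknown"

for q in ["Does my father carry a picture?",
          "Who came with the wallet?",
          "Why did the kid come?",
          "How many pictures does he carry?",
          "Where is the wallet?",
          "When did the kid come?"]:
    print(q, "->", question_type(q))
```

A real system would more likely take the type from the parse of the question, as the block diagram suggests, rather than from surface keywords.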

Block Diagram of QA System

Question → Parse and determine question type → Build Query → Search Triplets → Answer

If a listener nods his head while you're explaining your program, wake him up.