Triplet Extraction from Sentences. Technical University of Cluj-Napoca, Conf. Dr. Ing. Tudor Mureşan. “Jožef Stefan” Institute, Ljubljana, Slovenia, Assist. Prof. Dr. Dunja Mladenić, Blaž Fortuna, Marko Grobelnik. Lorand Dali, June 2008.


Triplet Extraction from Sentences

Technical University of Cluj-Napoca
Conf. Dr. Ing. Tudor Mureşan

“Jožef Stefan” Institute, Ljubljana, Slovenia
Assist. Prof. Dr. Dunja Mladenić
Blaž Fortuna
Marko Grobelnik

Lorand Dali
June 2008


Location of the project in the field of Computer Science

Artificial Intelligence
Natural Language Processing
Machine Learning


My father carries around the picture of the kid who came with his wallet.


Motivation of Triplet Extraction

Advantages
◦ compact and simple representation of the information contained in a sentence
◦ avoids the complexity of a full parse
◦ contains semantic information

Applications
◦ building the semantic graph of a document
◦ summarization
◦ question answering
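
As a rough illustration of the "compact representation" and "semantic graph" points above, the sketch below stores triplets as plain tuples and groups them into a graph keyed by subject. The `Triplet` type, the `build_semantic_graph` helper, and the two example triplets are assumptions made for illustration; the triplets are one plausible hand annotation of the example sentence, not output of the system described here.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """A (subject, verb, object) triplet; the field names are illustrative."""
    subject: str
    verb: str
    obj: str


def build_semantic_graph(triplets):
    """Map each subject to its outgoing (verb, object) edges."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.verb, t.obj))
    return dict(graph)


# Hand-written triplets for the example sentence (an assumed annotation).
example = [
    Triplet("father", "carries", "picture"),
    Triplet("kid", "came", "wallet"),
]
graph = build_semantic_graph(example)
```

Each node of `graph` is a subject and each edge carries the verb, which is usually enough structure for simple summarization or lookup tasks.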


Triplet Extraction – 2 Approaches

Extraction from the parse tree of the sentence using heuristic rules
◦ OpenNLP – Treebank parse tree
◦ Link Parser – Link Grammar (a type of dependency grammar)

Extraction using Machine Learning
◦ Support Vector Machines (SVM) are used
◦ The SVM model is trained on human-annotated data
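
To make the first approach more concrete, here is a minimal sketch of rule-based extraction over a Treebank-style parse tree. The slides do not list the actual heuristic rules, so the three rules used here (first noun of the subject NP, first verb of the VP, first noun in a phrase under the VP) are assumptions, and the tree is a hand-written, simplified parse rather than real OpenNLP output.

```python
def leaves(tree):
    """Yield (pos_tag, word) pairs of a tree in left-to-right order."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, children[0]
    else:
        for child in children:
            yield from leaves(child)


def first_with_prefix(tree, prefix):
    """Return the first word whose POS tag starts with `prefix`, else None."""
    for pos, word in leaves(tree):
        if pos.startswith(prefix):
            return word
    return None


def extract_triplet(sentence_tree):
    """Apply three simple (assumed) rules to an S -> NP VP parse tree."""
    _, *children = sentence_tree
    np = next((c for c in children if c[0] == "NP"), None)
    vp = next((c for c in children if c[0] == "VP"), None)
    if np is None or vp is None:
        return None
    subject = first_with_prefix(np, "NN")
    verb = first_with_prefix(vp, "VB")
    # Object: first noun inside a phrase-level child of the VP.
    obj = None
    for child in vp[1:]:
        if child[0] in ("NP", "PP", "ADJP"):
            obj = first_with_prefix(child, "NN")
            if obj:
                break
    if subject and verb and obj:
        return (subject, verb, obj)
    return None


# Simplified, hand-written parse of the start of the example sentence.
tree = (
    "S",
    ("NP", ("PRP$", "My"), ("NN", "father")),
    ("VP",
     ("VBZ", "carries"),
     ("PRT", ("RP", "around")),
     ("NP", ("DT", "the"), ("NN", "picture"))),
)
triplet = extract_triplet(tree)
```

Rules of this kind are cheap to write but brittle, which is presumably part of the motivation for the machine learning approach.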


Short review of SVM
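
For reference, a linear soft-margin SVM is usually stated as the optimization problem below. This is the standard textbook formulation and not necessarily the exact variant or kernel used in this work; the symbols $w$, $b$, $\xi_i$ and $C$ are the conventional ones and do not come from the slides.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n
```

In this setting $x_i$ would be the feature vector of a triplet candidate and $y_i \in \{-1, +1\}$ its label (correct or incorrect triplet), which is an assumed reading of how the classifier is applied.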


Features of the triplet candidates

Over 300 features depending on:

Sentence
◦ length of sentence, number of words, etc.

Candidate
◦ context of Subj, Verb and Obj
◦ distance between Subj, Verb, Obj

Linkage
◦ number of links, number of link types, number of links from S, V, O

Minipar
◦ depth, diameter, siblings, uncles, cousins, categories, relations

Treebank
◦ depth, diameter, siblings, uncles, cousins, path to root, POS
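
A few of the sentence-level and candidate-level features listed above can be sketched as follows. The function name, the feature names, and the choice to measure sentence length in characters and distances in token positions are all assumptions; the parser-based features (Linkage, Minipar, Treebank) are left out because they need parser output.

```python
def candidate_features(tokens, subj_idx, verb_idx, obj_idx):
    """Compute a handful of illustrative features for one (S, V, O) candidate."""
    return {
        "num_words": len(tokens),
        "sentence_length": sum(len(t) for t in tokens),  # characters, no spaces
        "dist_subj_verb": abs(verb_idx - subj_idx),
        "dist_verb_obj": abs(obj_idx - verb_idx),
        "dist_subj_obj": abs(obj_idx - subj_idx),
        "subj_before_verb": int(subj_idx < verb_idx),
        "verb_before_obj": int(verb_idx < obj_idx),
    }


tokens = "My father carries around the picture".split()
# Candidate: father (index 1), carries (index 2), picture (index 5).
features = candidate_features(tokens, subj_idx=1, verb_idx=2, obj_idx=5)
```

A real feature vector would concatenate such values for every candidate triplet of a sentence before passing them to the classifier.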


Evaluation and Testing

Training set = 700 annotated sentences
Test set = 100 annotated sentences

The triplets extracted from a sentence are compared to the annotated triplets of that same sentence.

The comparison uses a similarity measure in [0, 1] between two triplets.

Extracted compared to annotated => precision
Annotated compared to extracted => recall
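
One way to turn the two comparison directions into numbers is sketched below. The slides do not say how per-triplet similarities are aggregated, so averaging each triplet's best match is an assumption, as are the function names and the placeholder `exact_sim` measure.

```python
def best_match_average(source, target, sim):
    """Average, over `source`, of each item's best similarity to `target`."""
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)


def precision_recall(extracted, annotated, sim):
    """Precision: extracted vs. annotated. Recall: annotated vs. extracted."""
    precision = best_match_average(extracted, annotated, sim)
    recall = best_match_average(annotated, extracted, sim)
    return precision, recall


def exact_sim(a, b):
    """Placeholder similarity in [0, 1]: 1.0 for identical triplets, else 0.0."""
    return 1.0 if a == b else 0.0


extracted = [("father", "carries", "picture"), ("kid", "came", "father")]
annotated = [("father", "carries", "picture")]
precision, recall = precision_recall(extracted, annotated, exact_sim)
```

Any similarity function with values in [0, 1] can be passed as `sim`, so a graded measure can replace the exact-match placeholder without changing the evaluation code.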


Conclusions

Triplet extraction using hand rules
Triplet extraction using machine learning (SVM)
Question answering system based on triplets


Questions


Triplet Similarity Measure

Two triplets (S, V, O) and (S’, V’, O’) are compared component by component:

S vs. S’ => SubjSim
V vs. V’ => VerbSim
O vs. O’ => ObjSim

TrSim = (SubjSim + VerbSim + ObjSim) / 3

TrSim, SubjSim, VerbSim, ObjSim ∈ [0, 1]
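
The averaging formula above translates directly into code. In this sketch the component similarity is passed in as a function; the `exact_match` stand-in and the example triplets are assumptions used only to show the arithmetic.

```python
def triplet_similarity(a, b, component_sim):
    """TrSim = (SubjSim + VerbSim + ObjSim) / 3 for two (S, V, O) tuples."""
    subj_sim, verb_sim, obj_sim = (component_sim(x, y) for x, y in zip(a, b))
    return (subj_sim + verb_sim + obj_sim) / 3


def exact_match(x, y):
    """Stand-in component similarity: 1.0 if equal (ignoring case), else 0.0."""
    return 1.0 if x.lower() == y.lower() else 0.0


# Subject and verb agree, object differs: TrSim = (1 + 1 + 0) / 3.
tr_sim = triplet_similarity(
    ("father", "carries", "picture"),
    ("father", "carries", "wallet"),
    exact_match,
)
```

Because each component similarity lies in [0, 1], the average is guaranteed to stay in [0, 1] as well.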


String Similarity Measure

The way to success is under heavy construction

The road to success is always under construction

road success under construction

way success under heavy construction

Sim = nMatch / maxLen = 3 / 5 = 0.6
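
The worked example above can be reproduced with a few lines of code. The sketch takes the two reduced word lists exactly as they appear on the slide and does not try to guess which words were filtered out; counting matches as a multiset intersection (ignoring word order) is an assumption about how nMatch is defined.

```python
from collections import Counter


def string_similarity(words_a, words_b):
    """Sim = nMatch / maxLen over two lists of words."""
    if not words_a or not words_b:
        return 0.0
    # Multiset intersection: a word repeated in both lists counts each time.
    n_match = sum((Counter(words_a) & Counter(words_b)).values())
    max_len = max(len(words_a), len(words_b))
    return n_match / max_len


a = ["road", "success", "under", "construction"]
b = ["way", "success", "under", "heavy", "construction"]
sim = string_similarity(a, b)  # 3 matching words / 5 words in the longer list
```

Dividing by the longer list keeps the score symmetric and penalizes extra words on either side.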


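Evaluating the extracted triplets

[Diagram: for one sentence, the extracted triplets (Tr1, Tr2, Tr3) are compared with the gold-standard triplets (Tr1, Tr2). Matching extracted against gold standard gives Precision; matching gold standard against extracted gives Recall.]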


My father carries around the picture of the kid who came with his wallet.


Question Types

Yes/No Questions
List Questions
Reason Questions
Quantity Questions
Location Questions
Time Questions


Block Diagram of QA System

Question => Parse and determine question type => Build Query => Search Triplets => Answer
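
The stages of the block diagram can be sketched as a toy pipeline over a list of triplets. Everything here is an assumption made for illustration: the keyword rules in `question_type`, the single "Who <verb> ... <object>?" question shape handled by `build_query`, and the in-memory triplet store. The slides do not describe how the real system implements these steps.

```python
def question_type(question):
    """Guess the question type from its first words (a crude assumed rule)."""
    words = question.lower().rstrip("?").split()
    first = words[0] if words else ""
    if first == "why":
        return "reason"
    if first == "where":
        return "location"
    if first == "when":
        return "time"
    if first == "how" and len(words) > 1 and words[1] in ("many", "much"):
        return "quantity"
    if first in ("is", "are", "was", "were", "do", "does", "did", "can"):
        return "yes/no"
    return "list"


def build_query(question):
    """Turn 'Who <verb> ... <object>?' into a (None, verb, object) pattern.

    None marks the slot the answer should fill. Other question shapes are
    not handled in this sketch.
    """
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "who":
        raise ValueError("unsupported question shape")
    return (None, words[1].lower(), words[-1].lower())


def search_triplets(query, triplets):
    """Return the triplets that agree with every non-None slot of the query."""
    return [
        t for t in triplets
        if all(q is None or q == v for q, v in zip(query, t))
    ]


def answer(question, triplets):
    """Run the pipeline: build the query, search, return the open slot."""
    query = build_query(question)
    slot = query.index(None)
    return [t[slot] for t in search_triplets(query, triplets)]


store = [("father", "carries", "picture"), ("kid", "came", "wallet")]
kind = question_type("Who carries the picture?")
result = answer("Who carries the picture?", store)
```

Representing the query as a triplet with one empty slot keeps the search step a plain pattern match, which is the main appeal of answering questions from triplets instead of raw text.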


If a listener nods his head while you're explaining your program, wake him up.