Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Entity-Relationship Extraction from Wikipedia Unstructured Text Radityo Eko Prasojo (Rido) PhD Student @ KRDB, Free University of Bozen-Bolzano Supervised by: Mouna Kacimi & Werner Nutt 20.07.16, Bilbao, Spain

Upload: radityo-eko-prasojo

Post on 24-Jan-2018

TRANSCRIPT

Page 1: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Entity-Relationship Extraction from Wikipedia Unstructured Text

Radityo Eko Prasojo (Rido), PhD Student @ KRDB, Free University of Bozen-Bolzano

Supervised by: Mouna Kacimi & Werner Nutt

20.07.16, Bilbao, Spain

Page 2: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Automatically generated

Manually curated

Automated extraction without (yet) a KB as a result

Knowledge Vault [1]

Knowledge Graph

NELL [2]

20/07/16 RE Prasojo | KRDB@UNIBZ | WebST'16, Bilbao

Infobox completion [3][4]

Page 3: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Page 4: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Page 5: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Where was Obama born?

Who are the children of Obama?

Page 6: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Where was Obama born?

Who are the children of Obama?

Yes we can!

Honolulu, Hawaii

Malia and Sasha Obama

Page 7: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Which are Obama’s favourite sports teams?

Does Obama have pets?

Page 8: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Our goal is to enrich existing Knowledge Bases by extracting new facts, in the form of machine-readable entity relationships, from Wikipedia unstructured text.

Specific focus: RDF
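To make the RDF target concrete, here is a minimal sketch of how one extracted fact could be written as an N-Triples line. The namespace `http://example.org/` and the helper `to_ntriple` are assumptions for illustration; the slides do not name a vocabulary or a serializer. The fact itself is the one the talk uses as its running example.

```python
# Hypothetical namespace: the slides do not say which vocabulary is used.
BASE = "http://example.org/"


def to_ntriple(subject: str, predicate: str, obj: str, base: str = BASE) -> str:
    """Serialize one entity-relationship fact as a single N-Triples line."""
    return f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> ."


line = to_ntriple("Barack_Obama", "supporterOf", "Chicago_White_Sox")
print(line)
```

Each fact becomes one subject, predicate, object statement, which is the unit an RDF knowledge base stores.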

Page 9: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Why is it difficult?

• The extraction problem
  • Entity extraction & disambiguation
  • Relation extraction

• The representation problem
  • Lack of predefined schema/ontology
  • Topic-independency
  • Complex fact representation

Page 10: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Why is it difficult? Example

• “Obama is a supporter of the Chicago White Sox”
  • Straightforward, singleton information
  • Pure syntactic extraction possible
  • Barack_Obama supporterOf Chicago_White_Sox
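A sketch of what "pure syntactic extraction" can look like for this sentence, using a single surface pattern. The pattern, the `ENTITY_IDS` lookup, and the function name are all hypothetical; the talk does not say that a regular expression is what the author used.

```python
import re

# Hypothetical lookup from surface mentions to entity identifiers.
ENTITY_IDS = {
    "Obama": "Barack_Obama",
    "Chicago White Sox": "Chicago_White_Sox",
}

# One hand-written surface pattern: "<X> is a supporter of (the) <Y>".
PATTERN = re.compile(
    r"^(?P<subj>.+?) is a supporter of (?:the )?(?P<obj>.+?)\.?$"
)


def extract_supporter(sentence: str):
    """Return a (subject, predicate, object) triple, or None if nothing matches."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    subj = ENTITY_IDS.get(match.group("subj"))
    obj = ENTITY_IDS.get(match.group("obj"))
    if subj is None or obj is None:
        return None  # a mention could not be resolved to a known entity
    return (subj, "supporterOf", obj)


print(extract_supporter("Obama is a supporter of the Chicago White Sox"))
```

The pattern works here because the sentence carries one fact in a fixed word order. A sentence that packs several facts into one clause falls outside it and returns `None`.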

Page 11: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Why is it difficult? Example

• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
  • Complex, multiple information
  • Semantic understanding necessary
  • … how do we represent this?

Page 12: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Example: representing complex fact

• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
  • Barack_Obama footballFan Chicago_Bears in NFL
    • supporterOf vs footballFan
    • Is it necessary to include NFL in the whole relation?
    • What about the adverb primarily? What information does it imply?

  • Barack_Obama fanOf Pittsburgh_Steelers
    • fanOf vs supporterOf
    • Missing the time information referred to in “in his childhood and adolescence was”
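One possible way to keep the information that a plain triple drops is to attach qualifiers to each statement. The sketch below is only an illustration of that idea, not the representation the talk settles on: the `Fact` class, the qualifier names (`league`, `degree`, `validDuring`), and the choice to use `fanOf` for both statements are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    # Qualifiers hold context that a plain triple would lose.
    qualifiers: tuple = ()

    def as_triple(self):
        """Project the fact down to a plain (subject, predicate, object) triple."""
        return (self.subject, self.predicate, self.obj)

    def qualifier(self, name):
        """Return the value of one qualifier, or None when it is absent."""
        return dict(self.qualifiers).get(name)


facts = [
    Fact("Barack_Obama", "fanOf", "Chicago_Bears",
         qualifiers=(("league", "NFL"), ("degree", "primarily"))),
    Fact("Barack_Obama", "fanOf", "Pittsburgh_Steelers",
         qualifiers=(("validDuring", "childhood and adolescence"),)),
]

for fact in facts:
    print(fact.as_triple(), dict(fact.qualifiers))
```

The open questions on the slide remain open in this sketch: whether NFL belongs in the relation, and what "primarily" should mean, are simply stored as uninterpreted qualifier values.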

Page 13: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Approach

• Document preprocessing to annotate all entity occurrences.
• Grammatical dependency to extract (candidate) relations.

• Separation between the extraction problem and the representation problem
  • We first extract all candidate relations and then later apply semantic refinement for better representation.
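The two-stage idea above can be sketched on a single sentence. Everything concrete here is an assumption: the dependency parse is written by hand with Universal Dependencies-style labels rather than produced by a parser, `ENTITY_IDS` stands in for the output of the preprocessing step, `PREDICATES` stands in for semantic refinement, and the one rule shown is illustrative rather than one of the author's rules.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    text: str
    head: int     # idx of the governing token, 0 for the root
    deprel: str   # dependency label


# Hand-written parse of "Obama is a supporter of the Chicago White Sox".
PARSE = [
    Token(1, "Obama", 4, "nsubj"),
    Token(2, "is", 4, "cop"),
    Token(3, "a", 4, "det"),
    Token(4, "supporter", 0, "root"),
    Token(5, "of", 9, "case"),
    Token(6, "the", 9, "det"),
    Token(7, "Chicago", 9, "compound"),
    Token(8, "White", 9, "compound"),
    Token(9, "Sox", 4, "nmod"),
]

# Hypothetical preprocessing output: mention -> entity identifier.
ENTITY_IDS = {"Obama": "Barack_Obama", "Chicago White Sox": "Chicago_White_Sox"}
# Hypothetical refinement table: surface relation -> predicate name.
PREDICATES = {"supporter of": "supporterOf"}


def children(parse, head_idx, deprel):
    """Return the dependents of head_idx that carry the given label."""
    return [t for t in parse if t.head == head_idx and t.deprel == deprel]


def phrase(parse, head):
    """Join a noun with its compound modifiers in sentence order."""
    tokens = children(parse, head.idx, "compound") + [head]
    return " ".join(t.text for t in sorted(tokens, key=lambda t: t.idx))


def extract_candidates(parse, entity_ids):
    """Stage 1: a root with an nsubj and an nmod introduced by a preposition."""
    candidates = []
    for root in (t for t in parse if t.deprel == "root"):
        for subj in children(parse, root.idx, "nsubj"):
            for obj in children(parse, root.idx, "nmod"):
                case = children(parse, obj.idx, "case")
                subj_text, obj_text = phrase(parse, subj), phrase(parse, obj)
                # Keep only pairs whose mentions were annotated as entities.
                if case and subj_text in entity_ids and obj_text in entity_ids:
                    relation = f"{root.text} {case[0].text}"
                    candidates.append(
                        (entity_ids[subj_text], relation, entity_ids[obj_text])
                    )
    return candidates


def refine(candidate, predicates):
    """Stage 2: rename the surface relation, leaving unknown ones unchanged."""
    subj, relation, obj = candidate
    return (subj, predicates.get(relation, relation), obj)


candidates = extract_candidates(PARSE, ENTITY_IDS)
triples = [refine(c, PREDICATES) for c in candidates]
print(candidates)
print(triples)
```

Keeping `refine` separate from `extract_candidates` mirrors the separation on the slide: the extraction rule can stay purely grammatical, while naming decisions such as supporterOf versus fanOf are deferred to a later step.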

Page 14: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Preliminary results

• Ground truth manually curated from 25 Wikipedia articles of famous people.
• Preprocessing
• 4 handcrafted extraction rules leveraging grammatical dependency
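The transcript does not preserve which measures were reported. One common way to score extracted triples against a manually curated ground truth is precision and recall, sketched below. The function name is hypothetical and the triples are placeholders, not data from the talk.

```python
def precision_recall(extracted, ground_truth):
    """Compare two collections of triples by exact match."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Placeholder triples only; these are not results from the talk.
gold = {("e1", "r1", "e2"), ("e1", "r2", "e3")}
found = {("e1", "r1", "e2"), ("e1", "r2", "e4")}

print(precision_recall(found, gold))
```

Exact matching is a strict choice: a triple with the right entities but a differently named relation counts as wrong, which is one reason the naming questions raised earlier affect evaluation.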

Page 15: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Ongoing work

• Automated rule mining
• Semantic refinement for knowledge representation
  • Ontology building
    • Naming and taxonomy of entities, classes, and relations
  • Handling complex fact
    • Obama appoints x as y in z
  • Handling modality, adjectives, and sentiment
    • “In the past”, “it is rumoured that”, “it is not true that”

• Future evaluation
  • Bigger ground truth (amount + topic coverage)
  • Evaluate how well we enrich existing KBs
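As a rough illustration of the modality item above, a first step could be to flag sentences that contain cue phrases like the three quoted on the slide, so that facts extracted from them are not stored as plain assertions. The cue table, its labels, and the function name are assumptions; the talk lists this as ongoing work and does not describe a method.

```python
# Cue phrases are the ones quoted on the slide; the labels are hypothetical.
MODALITY_CUES = {
    "in the past": "past",
    "it is rumoured that": "rumour",
    "it is not true that": "negation",
}


def tag_modality(sentence: str):
    """Return the sorted labels of all cue phrases found in the sentence."""
    lowered = sentence.lower()
    return sorted(
        {label for cue, label in MODALITY_CUES.items() if cue in lowered}
    )


print(tag_modality("It is rumoured that Obama appoints x as y in z"))
print(tag_modality("Obama appoints x as y in z"))
```

Substring matching is the simplest possible choice and would miss paraphrases; it only shows where a modality label could be attached to a candidate fact.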

Page 16: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview

Future work

• Metadata extraction
  • Data quality, data completeness

• Natural language question answering based on the enriched KB.
