relation extraction - github pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf ·...
TRANSCRIPT
![Page 1: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/1.jpg)
RelationExtraction
ManyslidesfromDanJurafsky
![Page 2: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/2.jpg)
Extractingrelationsfromtext
• Companyreport: “InternationalBusinessMachinesCorporation(IBMorthecompany)wasincorporatedintheStateofNewYorkonJune16,1911,astheComputing-Tabulating-RecordingCo.(C-T-R)…”• ExtractedComplexRelation:
Company-FoundingCompany IBMLocation NewYorkDate June16,1911Original-Name Computing-Tabulating-RecordingCo.
• ButwewillfocusonthesimplertaskofextractingrelationtriplesFounding-year(IBM,1911)Founding-location(IBM,New York)
![Page 3: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/3.jpg)
ExtractingRelationTriplesfromTextTheLelandStanfordJuniorUniversity,commonlyreferredtoasStanfordUniversityorStanford,isanAmericanprivateresearchuniversitylocatedinStanford,California …nearPaloAlto,California…LelandStanford…foundedtheuniversityin1891
Stanford EQLelandStanfordJuniorUniversityStanford LOC-INCaliforniaStanford IS-Aresearch universityStanford LOC-NEARPaloAltoStanford FOUNDED-IN1891StanfordFOUNDERLelandStanford
![Page 4: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/4.jpg)
WhyRelationExtraction?
• Createnewstructuredknowledgebases,usefulforanyapp• Augmentcurrentknowledgebases• AddingwordstoWordNet thesaurus,factstoFreeBase orDBPedia
• Supportquestionanswering• Thegranddaughterofwhichactorstarredinthemovie“E.T.”?(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)
• Butwhichrelationsshouldweextract?
4
![Page 5: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/5.jpg)
AutomatedContentExtraction(ACE)
ARTIFACT
GENERALAFFILIATION
ORGAFFILIATION
PART-WHOLE
PERSON-SOCIAL PHYSICAL
Located
Near
Business
Family Lasting Personal
Citizen-Resident-Ethnicity-Religion
Org-Location-Origin
Founder
EmploymentMembership
OwnershipStudent-Alum
Investor
User-Owner-Inventor-Manufacturer
GeographicalSubsidiary
Sports-Affiliation
17relationsfrom2008“RelationExtractionTask”
![Page 6: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/6.jpg)
AutomatedContentExtraction(ACE)
• Physical-LocatedPER-GPEHe was in Tennessee
• Part-Whole-SubsidiaryORG-ORGXYZ, the parent company of ABC
• Person-Social-FamilyPER-PERJohn’s wife Yoko
• Org-AFF-FounderPER-ORGSteve Jobs, co-founder of Apple…
•
6
![Page 7: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/7.jpg)
UMLS:UnifiedMedicalLanguageSystem
• 134entitytypes,54relations
Injury disrupts PhysiologicalFunctionBodilyLocation location-of BiologicFunctionAnatomicalStructure part-of OrganismPharmacologicSubstancecauses PathologicalFunctionPharmacologicSubstancetreats PathologicFunction
![Page 8: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/8.jpg)
ExtractingUMLSrelationsfromasentence
Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes
ê
Echocardiography,DopplerDIAGNOSES Acquiredstenosis
8
![Page 9: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/9.jpg)
DatabasesofWikipedia Relations
9
RelationsextractedfromInfoboxStanfordstateCaliforniaStanfordmotto “DieLuft derFreiheit weht”…
WikipediaInfobox
![Page 10: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/10.jpg)
RelationdatabasesthatdrawfromWikipedia
• ResourceDescriptionFramework(RDF)triplessubjectpredicate objectGolden Gate Park location San Franciscodbpedia:Golden_Gate_Park dbpedia-owl:location dbpedia:San_Francisco
• DBPedia:1billionRDFtriples,385fromEnglishWikipedia• FrequentFreebaserelations:
people/person/nationality,location/location/containspeople/person/profession,people/person/place-of-birthbiology/organism_higher_classification film/film/genre
10
![Page 11: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/11.jpg)
Ontologicalrelations
• IS-A(hypernym):subsumption betweenclasses•Giraffe IS-Aruminant IS-A ungulate IS-Amammal IS-Avertebrate IS-Aanimal…
• Instance-of:relationbetween individual andclass•San Francisco instance-of city
ExamplesfromtheWordNet Thesaurus
![Page 12: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/12.jpg)
Howtobuildrelationextractors
1. Hand-writtenpatterns2. Supervisedmachinelearning3. Semi-supervisedandunsupervised• Bootstrapping(usingseeds)• Distantsupervision• Unsupervisedlearningfromtheweb
![Page 13: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/13.jpg)
RelationExtraction
Whatisrelationextraction?
![Page 14: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/14.jpg)
RelationExtraction
Usingpatternstoextractrelations
![Page 15: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/15.jpg)
RulesforextractingIS-Arelation
EarlyintuitionfromHearst(1992)• “Agarisasubstancepreparedfromamixtureofredalgae,suchasGelidium, forlaboratoryorindustrial use”
• WhatdoesGelidium mean?• Howdoyouknow?`
![Page 16: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/16.jpg)
RulesforextractingIS-Arelation
EarlyintuitionfromHearst(1992)• “Agarisasubstancepreparedfromamixtureofredalgae,suchasGelidium,forlaboratoryorindustrial use”
• WhatdoesGelidium mean?• Howdoyouknow?`
![Page 17: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/17.jpg)
Hearst’sPatternsforextractingIS-Arelations
(Hearst,1992):AutomaticAcquisitionofHyponyms
“Y such as X ((, X)* (, and|or) X)”“such Y as X”“X or other Y”“X and other Y”“Y including X”“Y, especially X”
![Page 18: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/18.jpg)
Hearst’sPatternsforextractingIS-Arelations
Hearstpattern ExampleoccurrencesXandother Y ...temples,treasuries,andotherimportantcivicbuildings.
XorotherY Bruises,wounds,brokenbonesorotherinjuries...
YsuchasX Thebowlute,suchastheBambarandang...
Such YasX ...such authorsas Herrick,Goldsmith,andShakespeare.
YincludingX ...common-lawcountries,including CanadaandEngland...
Y,especiallyX Europeancountries,especially France,England,andSpain...
![Page 19: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/19.jpg)
ExtractingRicherRelationsUsingRules
• Intuition: relationsoftenholdbetweenspecificentities• located-in(ORGANIZATION, LOCATION)• founded (PERSON,ORGANIZATION)• cures(DRUG,DISEASE)
• StartwithNamedEntitytagstohelpextractrelation!
![Page 20: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/20.jpg)
NamedEntitiesaren’tquiteenough.Whichrelationsholdbetween2entities?
Drug Disease
Cure?Prevent?
Cause?
![Page 21: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/21.jpg)
Whatrelationsholdbetween2entities?
PERSON ORGANIZATION
Founder?
Investor?
Member?
Employee?
President?
![Page 22: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/22.jpg)
ExtractingRicherRelationsUsingRulesandNamedEntities
Whoholdswhatofficeinwhatorganization?PERSON, POSITION of ORG
• GeorgeMarshall,SecretaryofStateoftheUnitedStates
PERSON(named|appointed|chose|etc.) PERSON Prep?POSITION• TrumanappointedMarshallSecretaryofState
PERSON [be]?(named|appointed|etc.) Prep?ORG POSITION• GeorgeMarshallwasnamedUSSecretaryofState
![Page 23: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/23.jpg)
Hand-builtpatternsforrelations• Plus:•Humanpatternstendtobehigh-precision• Canbetailoredtospecificdomains
•Minus•Humanpatternsareoftenlow-recall•Alotofworktothinkofallpossiblepatterns!•Don’twanttohavetodothisforeveryrelation!•We’dlikebetteraccuracy
![Page 24: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/24.jpg)
RelationExtraction
Usingpatternstoextractrelations
![Page 25: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/25.jpg)
RelationExtraction
Supervisedrelationextraction
![Page 26: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/26.jpg)
Supervisedmachinelearningforrelations
• Chooseasetofrelationswe’dliketoextract• Chooseasetofrelevantnamedentities• Findandlabeldata• Choosearepresentativecorpus• Labelthenamedentitiesinthecorpus• Hand-labeltherelationsbetweentheseentities• Breakintotraining,development,andtest
• Trainaclassifieronthetrainingset
26
![Page 27: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/27.jpg)
Howtodoclassificationinsupervisedrelationextraction
1. Findallpairsofnamedentities(usuallyinsamesentence)
2. Decideif2entitiesarerelated3. Ifyes,classifytherelation•Whytheextrastep?• Fasterclassificationtrainingbyeliminatingmostpairs• Canusedistinctfeature-setsappropriateforeachtask.
27
![Page 28: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/28.jpg)
AutomatedContentExtraction(ACE)
ARTIFACT
GENERALAFFILIATION
ORGAFFILIATION
PART-WHOLE
PERSON-SOCIAL PHYSICAL
Located
Near
Business
Family Lasting Personal
Citizen-Resident-Ethnicity-Religion
Org-Location-Origin
Founder
EmploymentMembership
OwnershipStudent-Alum
Investor
User-Owner-Inventor-Manufacturer
GeographicalSubsidiary
Sports-Affiliation
17sub-relationsof6relationsfrom2008“RelationExtractionTask”
![Page 29: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/29.jpg)
RelationExtraction
Classifytherelationbetweentwoentities inasentence
AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman
TimWagnersaid.
SUBSIDIARY
FAMILYEMPLOYMENT
NIL
FOUNDER
CITIZEN
INVENTOR…
![Page 30: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/30.jpg)
WordFeaturesforRelationExtraction
• HeadwordsofM1andM2,andcombinationAirlinesWagnerAirlines-Wagner
• BagofwordsandbigramsinM1andM2{American,Airlines,Tim,Wagner,AmericanAirlines,TimWagner}
• WordsorbigramsinparticularpositionsleftandrightofM1/M2M2:-1spokesmanM2:+1said
• Bagofwordsorbigramsbetweenthetwoentities{a,AMR,of,immediately,matched,move,spokesman,the,unit}
AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2
![Page 31: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/31.jpg)
NamedEntityTypeandMentionLevelFeaturesforRelationExtraction
• Named-entitytypes• M1:ORG• M2:PERSON
• Concatenationofthetwonamed-entitytypes• ORG-PERSON
• EntityLevelofM1andM2 (NAME,NOMINAL,PRONOUN)• M1:NAME [itor hewouldbePRONOUN]• M2:NAME [thecompanywouldbeNOMINAL]
AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2
![Page 32: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/32.jpg)
ParseFeaturesforRelationExtraction
• BasesyntacticchunksequencefromonetotheotherNPNPPPVPNPNP
• ConstituentpaththroughthetreefromonetotheotherNPé NPé Sé Sê NP
• DependencypathAirlines<- matched->Wagner->said
AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2
![Page 33: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/33.jpg)
Gazeteer andtriggerwordfeaturesforrelationextraction• Triggerlistforfamily:kinshipterms• parent,wife,husband,grandparent,etc.[fromWordNet]
• Gazeteer:• Listsofusefulgeoorgeopoliticalwords
• Countrynamelist• Othersub-entities
![Page 34: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/34.jpg)
AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesmanTimWagnersaid.
![Page 35: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/35.jpg)
Classifiersforsupervisedmethods
• Nowyoucanuseanyclassifieryoulike•MaxEnt• NaïveBayes• SVM• ...
• Trainitonthetrainingset,tuneonthedev set,testonthetestset
![Page 36: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/36.jpg)
EvaluationofSupervisedRelationExtraction
•ComputeP/R/F1 foreachrelation
36
P = # of correctly extracted relationsTotal # of extracted relations
R = # of correctly extracted relationsTotal # of gold relations
F1 =2PRP + R
![Page 37: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/37.jpg)
Summary:SupervisedRelationExtraction
+ Cangethighaccuracieswithenoughhand-labeledtrainingdata,iftestsimilarenoughtotraining- Labelingalargetraining setisexpensive
- Supervisedmodelsarebrittle, don’tgeneralizewelltodifferentgenres
![Page 38: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/38.jpg)
RelationExtraction
Supervisedrelationextraction
![Page 39: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/39.jpg)
RelationExtraction
Semi-supervisedandunsupervisedrelationextraction
![Page 40: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/40.jpg)
Seed-basedorbootstrappingapproachestorelationextraction
•Notraining set?Maybeyouhave:• Afewseedtuplesor• Afewhigh-precisionpatterns
•Canyouusethoseseedstodosomethinguseful?• Bootstrapping:usetheseedstodirectlylearntopopulatearelation
![Page 41: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/41.jpg)
RelationBootstrapping(Hearst1992)
•GatherasetofseedpairsthathaverelationR• Iterate:1. Findsentenceswiththesepairs2. Lookatthecontextbetweenoraroundthepair
andgeneralizethecontexttocreatepatterns3. Usethepatterns forgrep formorepairs
![Page 42: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/42.jpg)
Bootstrapping
• <MarkTwain,Elmira>Seedtuple• Grep (google)fortheenvironmentsoftheseedtuple“MarkTwainisburiedinElmira,NY.”
XisburiedinY“ThegraveofMarkTwainisinElmira”
ThegraveofXisinY“ElmiraisMarkTwain’sfinalrestingplace”
YisX’sfinalrestingplace.
• Usethosepatternstogrep fornewtuples• Iterate
![Page 43: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/43.jpg)
Dipre:Extract<author,book>pairs
• Startwith5seeds:
• FindInstances:TheComedyofErrors,byWilliamShakespeare,wasTheComedyofErrors,byWilliamShakespeare,isTheComedyofErrors,oneofWilliamShakespeare'searliestattemptsTheComedyofErrors,oneofWilliamShakespeare'smost
• Extractpatterns(groupbymiddle,takelongestcommonprefix/suffix)?x , by ?y , ?x , one of ?y ‘s
• Nowiterate,findingnewseedsthatmatchthepattern
Brin,Sergei.1998.ExtractingPatternsandRelationsfromtheWorldWideWeb.Author BookIsaacAsimov TheRobots ofDawnDavidBrin Startide RisingJamesGleick Chaos:MakingaNewScienceCharlesDickens GreatExpectationsWilliamShakespeare TheComedyofErrors
![Page 44: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/44.jpg)
Snowball
• Similariterativealgorithm
• Groupinstancesw/similarprefix,middle,suffix,extractpatterns• ButrequirethatXandYbenamedentities• Andcomputeaconfidenceforeachpattern
{’s, in, headquarters}
{in, based} ORGANIZATIONLOCATION
Organization LocationofHeadquartersMicrosoft RedmondExxon IrvingIBM Armonk
E.Agichtein andL.Gravano 2000.Snowball:ExtractingRelationsfromLargePlain-TextCollections.ICDL
ORGANIZATION LOCATION .69
.75
![Page 45: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/45.jpg)
DistantSupervision
•Combinebootstrappingwithsupervised learning• Insteadof5seeds,• Usealargedatabasetogethuge#ofseedexamples
•Createlotsoffeaturesfromalltheseexamples•Combineinasupervised classifier
Snow,Jurafsky,Ng.2005.Learningsyntacticpatternsforautomatichypernymdiscovery.NIPS17Fei WuandDanielS.Weld.2007.AutonomouslySemantifyingWikipeida.CIKM2007Mintz,Bills,Snow,Jurafsky.2009.Distantsupervisionforrelationextractionwithoutlabeleddata.ACL09
![Page 46: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/46.jpg)
Distantsupervisionparadigm
• Likesupervised classification:• Usesaclassifierwithlotsoffeatures• Supervisedbydetailedhand-createdknowledge• Doesn’trequireiterativelyexpandingpatterns
• Likeunsupervised classification:• Usesverylargeamountsofunlabeleddata• Notsensitivetogenreissuesintrainingcorpus
![Page 47: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/47.jpg)
Distantlysupervisedlearningofrelationextractionpatterns
Foreachrelation
Foreachtupleinbigdatabase
Findsentencesinlargecorpuswithbothentities
Extractfrequentfeatures(parse,words,etc)
Trainsupervisedclassifierusingthousandsofpatterns
4
1
2
3
5
PERwasborninLOCPER,born(XXXX), LOCPER’sbirthplaceinLOC
<EdwinHubble,Marshfield><AlbertEinstein,Ulm>
Born-In
HubblewasborninMarshfieldEinstein,born(1879),UlmHubble’sbirthplaceinMarshfield
P(born-in | f1,f2,f3,…,f70000)
![Page 48: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/48.jpg)
Unsupervisedrelationextraction
• OpenInformationExtraction:• extractrelationsfromthewebwithnotrainingdata,nolistofrelations
1. Useparseddatatotraina“trustworthytuple”classifier2. Single-passextractallrelationsbetweenNPs,keepiftrustworthy3. Assessorranksrelationsbasedontextredundancy
(FCI,specializesin,softwaredevelopment)(Tesla,invented,coiltransformer)
48
M.Banko,M.Cararella,S.Soderland,M.Broadhead, andO.Etzioni.2007.Openinformationextractionfromtheweb. IJCAI
![Page 49: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/49.jpg)
EvaluationofSemi-supervisedandUnsupervisedRelationExtraction
• Sinceitextractstotallynewrelationsfromtheweb• Thereisnogoldsetofcorrectinstancesofrelations!• Can’tcomputeprecision(don’tknowwhichonesarecorrect)• Can’tcomputerecall(don’tknowwhichonesweremissed)
• Instead,wecanapproximateprecision(only)• Drawarandomsampleofrelationsfromoutput,checkprecisionmanually
• Canalsocomputeprecisionatdifferentlevelsofrecall.• Precisionfortop1000newrelations,top10,000newrelations,top100,000• Ineachcasetakingarandomsampleofthatset
• Butnowaytoevaluaterecall49
P̂ = # of correctly extracted relations in the sampleTotal # of extracted relations in the sample
![Page 50: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any](https://reader030.vdocuments.us/reader030/viewer/2022040700/5d53392a88c993bb198b6476/html5/thumbnails/50.jpg)
RelationExtraction
Semi-supervisedandunsupervisedrelationextraction