BURC: Bootstrapping Using ResearchCyc

By Kino Coursey

Upload: urian

Post on 30-Jan-2016


Page 1: BURC: Bootstrapping Using ResearchCyc

By Kino Coursey

Page 2: BURC: Bootstrapping Using ResearchCyc

Introduction to the Problem

Goal: To extend Cyc’s knowledge base using “relationships implied to be possible, normal or commonplace in the world”

Prior work with Cyc knowledge entry has been manually oriented

How will we collect commonsense without a body and manual labor…?

Read, Parse, Mine!

Proposal: Read text, Parse into a database, Extract relations between words, Propose hypothetical relations between concepts

Page 3: BURC: Bootstrapping Using ResearchCyc

Common Knowledge

Using an information channel model
• Information the Sender considers the Receiver to already know
• If the Sender does send the info then …
    Receiver will consider the Sender to ‘lack intelligence or experience’ (The sender is stupid).
    Receiver will believe the sender thinks the Receiver ‘lacks intelligence or experience’ (The sender thinks I’m stupid)
    Possibly the Sender is clarifying which among many possible common options they mean in this case
• Since both parties know the information, to send it would generate negative information content

Explains why it is hard to find common sense on the Internet!

Page 4: BURC: Bootstrapping Using ResearchCyc

Basic Analogy

The Shotgun approach to the Human Genome

Extract millions of fragments then knit them back together by finding commonalities

Will it work for the Human Menome?

Page 5: BURC: Bootstrapping Using ResearchCyc

What is Cyc?

“the world's largest and most complete general knowledge base and commonsense reasoning engine”

Started in mid 1980’s (“should take only 10 years….”)

Logic Based

LISP oriented

For WordNet users, each Concept ≈ Synset

Available from
http://www.opencyc.org
http://researchcyc.cyc.com

Big (ResearchCyc v0.8)
• Constants 89,379
• Assertions 968,985
• Deduction 361,185

Sample Collection Extents
• EnglishWord 18,007
• Event 6,050
• PartiallyTangible 24,387
• Microtheory 1,688

Page 6: BURC: Bootstrapping Using ResearchCyc

Example of what Cyc currently knows about fingers

Collection : Finger

GAF Arg : 1

Mt : UniversalVocabularyMt
isa : AnimalBodyPartType
genls : Digit-AnatomicalPart
comment : "The collection of all digits of all Hands (q.v.). Fingers are (typically) flexibly jointed and are necessary to enabling the hand (and its owner) to perform grasping and manipulation actions."

Mt : BaseKB
definingMt : AnimalPhysiologyVocabularyMt

Mt : AnimalPhysiologyMt
properPhysicalPartTypes : Fingernail

Mt : WordNetMappingMt
(synonymousExternalConcept Finger WordNet-Version2_0 "N05247839")
(synonymousExternalConcept Finger WordNet-1997Version "N04312497")

GAF Arg : 2

Mt : UniversalVocabularyMt
(genls LittleFinger Finger)
(genls IndexFinger Finger)
(genls Thumb Finger)
(genls RingFinger Finger)
(genls MiddleFinger Finger)

Mt : HumanActivitiesMt
(bodyPartsUsed-TypeType Typing Finger)

Mt : HumanSocialLifeMt
(bodyPartsUsed-TypeType PointingAFinger Finger)

Page 7: BURC: Bootstrapping Using ResearchCyc

Example of what Cyc currently knows about fingers - 2

Mt : AnimalPhysiologyMt
-(conceptuallyRelated Fingernail Finger)
(properPhysicalPartTypes Hand Finger)
(relationAllInstance age Finger (YearsDuration 0 200))
(relationAllInstance widthOfObject Finger (Meter 0.001 0.2))
(relationAllInstance heightOfObject Finger (Meter 0.001 0.2))
(relationAllInstance lengthOfObject Finger (Meter 0.01 0.5))
(relationAllInstance massOfObject Finger (Kilogram 0.001 1))

GAF Arg : 3

Mt : HumanPhysiologyMt
(relationAllExists anatomicalParts HomoSapiens Finger)

Mt : VertebratePhysiologyMt
(relationAllExistsCount physicalParts Hand Finger 5)

Mt : UniversalVocabularyMt
(relationAllOnly wornOn Ring-Jewelry Finger)

Mt : AnimalPhysiologyMt
(relationExistsAll physicalParts Hand Finger)

GAF Arg : 4

Mt : GeneralEnglishMt
(denotation Finger-TheWord CountNoun 0 Finger)

Page 8: BURC: Bootstrapping Using ResearchCyc

Bootstrapping with ResearchCyc

Cyc has vocabulary about objects in the world and relationships

Cyc could still use more common relationships

BURC uses what Cyc already has + lots of parsed text to create new Cyc entries for common relationships found in the text

Lenat’s Bootstrap Hypothesis: once Cyc reaches a certain level/scale it can help in its own development and start using NLP to augment its knowledge base

BURC should help test this hypothesis

Page 9: BURC: Bootstrapping Using ResearchCyc

The BURC Process

From seeds…Hypothe-seed’s

Use the link grammar parser for bulk parsing of text, primarily narratives based in ‘worlds like ours’. Other text styles could be included.

Operates in two directions:
• Forward from text to CycL
• Backwards from existing CycL to the text to find new forward patterns

Page 10: BURC: Bootstrapping Using ResearchCyc

BURC Process - 2

Load the link fragments into a database (1 and 2 link fragments), and compute frequency of fragment occurrences. The database will be in a SQL format so multiple queries can be formed dynamically.

Using Cyc knowledge as a starting point (the seeds), extract knowledge for use in Cyc:
• Given a set of seed facts in Cyc, identify how those facts are represented as link fragments in the database
• Generate conjectures as to new knowledge AND new knowledge extraction patterns using the fragment patterns.
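
The loading step can be sketched roughly as follows. This is a minimal illustration, not the BURC code: it assumes fragments arrive as tuples, uses SQLite as the SQL store, and borrows the table and column names (LGPTable, NumLinks, Term1, Link1, Term2, Link2, Term3) from the queries shown on later slides. The Freq column name and the sample fragments are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical parser output: 1-link fragments are (term1, link1, term2),
# 2-link fragments are (term1, link1, term2, link2, term3).
fragments = [
    ("white.a", "a", "blouse.n"),
    ("white.a", "a", "blouse.n"),
    ("ring.n", "s", "fit.v", "o", "finger.n"),
]

def load_fragments(conn, frags):
    """Insert each distinct fragment once, with its occurrence count."""
    conn.execute(
        "CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
        "Term2 TEXT, Link2 TEXT, Term3 TEXT, Freq INTEGER)"
    )
    for frag, freq in Counter(frags).items():
        num_links = (len(frag) - 1) // 2
        padded = frag + (None,) * (5 - len(frag))  # NULL Link2/Term3 if 1 link
        conn.execute(
            "INSERT INTO LGPTable VALUES (?, ?, ?, ?, ?, ?, ?)",
            (num_links, *padded, freq),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_fragments(conn, fragments)
rows = conn.execute(
    "SELECT NumLinks, Term1, Term2, Term3, Freq FROM LGPTable ORDER BY Freq DESC"
).fetchall()
print(rows)
```

Storing one row per distinct fragment with a count keeps the table small and lets later queries rank results by frequency.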

Page 11: BURC: Bootstrapping Using ResearchCyc

BURC Process - 3

Use Cyc knowledge directly to conjecture new statements:
• Cyc has lexical knowledge, which can be used as templates against the DB to form new statements
• For example, common adjectives applied to noun classes
• Cyc knows “WhiteColor” and “Blouse” but does not know that white is a common blouse color, although it becomes apparent after reading some text

Optionally, gather supporting background statistics for hypothesis verification using other sources:
• Perhaps Google desktop with a larger than fully parsed corpus
• Perhaps check against answer extraction engines

Page 12: BURC: Bootstrapping Using ResearchCyc

Flow of Processing

[Figure: flow diagram. Components shown: BNC Data; Parser1 to Parser5; five Frag Files; Merged Frag File; Link Fragments DB; Extractor / DB Manager; Cyc/Rcyc; Hypothesis File.]

Page 13: BURC: Bootstrapping Using ResearchCyc

KNEXT (KNowledge EXtraction from Text)

Deriving general world knowledge from texts and taxonomies:
• http://www.cs.rochester.edu/~schubert/projects/world-knowledge-mining.html
• Lenhart K. Schubert and Matthew Tong, "Extracting and evaluating general world knowledge from the Brown Corpus", Proc. of the HLT-NAACL Workshop on Text Meaning, May 31, 2003, Edmonton, Alberta, pp. 7-13.

System extracts commonsense relationships from text

Limited to the pre-parsed Penn Treebank

Generated 117,326 propositions (about 2 per sentence)

About 60% judged reasonable by any given judge

Page 14: BURC: Bootstrapping Using ResearchCyc

KNEXT (Example)

(BLANCHE KNEW 0 SOMETHING MUST BE CAUSING STANLEY 'S NEW, STRANGE BEHAVIOR BUT SHE NEVER ONCE CONNECTED IT WITH KITTI WALKER.)

A FEMALE-INDIVIDUAL MAY KNOW A PROPOSITION.
SOMETHING MAY CAUSE A BEHAVIOR.
A MALE-INDIVIDUAL MAY HAVE A BEHAVIOR.
A BEHAVIOR CAN BE NEW.
A BEHAVIOR CAN BE STRANGE.
A FEMALE-INDIVIDUAL MAY CONNECT A THING-REFERRED-TO WITH A FEMALE-INDIVIDUAL.

((:I (:Q DET FEMALE-INDIVIDUAL) KNOW[V] (:Q DET PROPOS))
 (:I (:F K SOMETHING[N]) CAUSE[V] (:Q THE BEHAVIOR[N]))
 (:I (:Q DET MALE-INDIVIDUAL) HAVE[V] (:Q DET BEHAVIOR[N]))
 (:I (:Q DET BEHAVIOR[N]) NEW[A])
 (:I (:Q DET BEHAVIOR[N]) STRANGE[A])
 (:I (:Q DET FEMALE-INDIVIDUAL) CONNECT[V] (:Q DET THING-REFERRED-TO)
     (:P WITH[P] (:Q DET FEMALE-INDIVIDUAL))))

Page 15: BURC: Bootstrapping Using ResearchCyc

Other Extraction Pattern Research

Towards Terascale Knowledge Acquisition (Pantel, Ravichandran and Hovy, 2004)

Learning Surface Text Patterns for a Question Answering System (Ravichandran & Hovy, 2002)
    Defined Pattern Precision P = Ca/Co
    Ca = total number of patterns with answer term present
    Co = total number of patterns with any term present

DIRT – Discovery of Inference Rules from Text (Lin & Pantel, 2001)
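
As a small worked illustration of the pattern precision ratio P = Ca/Co defined above, the sketch below scores one pattern from the terms it matched. The function name, the match list, and the reading of Ca and Co as counts over one pattern's matches are assumptions made for this example, not details taken from the cited paper.

```python
def pattern_precision(matches, answer_term):
    """P = Ca / Co for one pattern.

    matches: the term found in the answer slot for each match (Co = len(matches)).
    answer_term: the known correct answer (Ca = matches equal to it).
    """
    co = len(matches)
    if co == 0:
        return 0.0
    ca = sum(1 for term in matches if term == answer_term)
    return ca / co

# Hypothetical matches of a single surface pattern.
matches = ["1756", "1756", "1791", "1756"]
precision = pattern_precision(matches, "1756")
print(precision)  # 3 of 4 matches carry the answer term: 0.75
```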

Page 16: BURC: Bootstrapping Using ResearchCyc

Other Lexical Knowledge Research

VerbOcean (Chklovski & Pantel): Collecting pairs and searching to verify relationships

Lexical Acquisition via Constraint Solving (Pedersen & Chen): Acquiring syntactic and semantic classification rules of unknown words for LGP

Information Extraction Using Link Grammar papers

Automatic Meaning Discovery Using Google

Page 17: BURC: Bootstrapping Using ResearchCyc

Forward Mining Adjective Relations

There are 1941 GAF’s on adjSemTrans, the primary lexical adjective predicate

Find applicable fragments and use definitions:
• “Select * from LGPTable Where NumLinks=1 and Link1='a' and Term1 like '%.a' and Term2 like '%.n'”
• Returns records [Term1.a | a | Term2.n]
• Potentially test using either an internal or search engine based relevancy metric
• Query Cyc for “(adjSemTrans <term1>-TheWord ?N RegularAdjFrame (?Pred :NOUN ?Val))”
• Generate (plausiblePredValueOfType <term2> <?Pred> <?Val>)
• Possibly generate parsing rule
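
The fragment query above can be tried against a toy table. The sketch below is only an illustration under assumptions: SQLite stands in for the SQL store, the Freq column and the sample rows are hypothetical, and an ORDER BY is added so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
    "Term2 TEXT, Freq INTEGER)"
)
conn.executemany(
    "INSERT INTO LGPTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "white.a", "a", "blouse.n", 12),
        (1, "long.a", "a", "finger.n", 155),
        (1, "finger.n", "s", "move.v", 40),  # subject-verb, not adjective-noun
    ],
)

# The WHERE clause is the one from the slide; column list and ordering are added.
QUERY = (
    "SELECT Term1, Link1, Term2, Freq FROM LGPTable "
    "WHERE NumLinks=1 AND Link1='a' AND Term1 LIKE '%.a' AND Term2 LIKE '%.n' "
    "ORDER BY Freq DESC"
)
adjective_noun = conn.execute(QUERY).fetchall()
for term1, link, term2, freq in adjective_noun:
    print(f"[{term1} | {link} | {term2}] freq={freq}")
```

Only the two adjective-noun rows come back; the subject-verb fragment is excluded by the link and suffix tests.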

Page 18: BURC: Bootstrapping Using ResearchCyc

Mining Adjective Knowledge Example

“white blouse” as factoid
[white.a | a | blouse.n]

Potentially test using an internal or search engine relevancy metric [GC=70400]

(adjSemTrans White-TheWord 11 RegularAdjFrame (mainColorOfObject :NOUN WhiteColor))

Hypothesis: (plausiblePredValueOfType Blouse mainColorOfObject WhiteColor)
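
The "white blouse" example can be traced end to end with a toy lookup. This is a sketch under stated assumptions: the two dictionaries are hypothetical stand-ins for what Cyc would return for the adjSemTrans and denotation queries, and the function name is invented for illustration.

```python
# Toy stand-in for Cyc's adjSemTrans assertions, keyed by adjective word.
# Each value lists the (predicate, value) pairs from the :NOUN template.
ADJ_SEM_TRANS = {
    "white": [("mainColorOfObject", "WhiteColor")],
}
# Toy stand-in for the denotation lookup from noun word to Cyc collection.
DENOTATION = {"blouse": "Blouse"}

def hypotheses_for(fragment):
    """Turn an [adj.a | a | noun.n] fragment into plausiblePredValueOfType forms."""
    term1, _link, term2 = fragment
    adjective = term1.rsplit(".", 1)[0]  # strip the ".a" part-of-speech suffix
    noun = term2.rsplit(".", 1)[0]       # strip the ".n" part-of-speech suffix
    collection = DENOTATION.get(noun)
    if collection is None:
        return []
    return [
        f"(plausiblePredValueOfType {collection} {pred} {val})"
        for pred, val in ADJ_SEM_TRANS.get(adjective, [])
    ]

result = hypotheses_for(("white.a", "a", "blouse.n"))
print(result)
```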

Page 19: BURC: Bootstrapping Using ResearchCyc

Mined Finger Descriptions

000010:(#$plausiblePredValueOfType #$Finger #$feelsSensation (#$PositiveAmountFn #$LevelOfSoreness))
000037:(#$plausiblePredValueOfType #$Finger #$forceCapacity #$Strong)
000025:(#$plausiblePredValueOfType #$Finger #$forceCapacity #$Strong)
000025:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject #$Hard)
000037:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject (#$MediumToVeryHighAmountFn #$Hardness))
000037:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject (#$MediumToVeryHighAmountFn #$Hardness))
000002:(#$plausiblePredValueOfType #$Finger #$hasEvaluativeQuantity (#$MediumToVeryHighAmountFn #$Goodness-Generic))
000002:(#$plausiblePredValueOfType #$Finger #$hasPhysicalAttractiveness #$GoodLooking)
000047:(#$plausiblePredValueOfType #$Finger #$isa (#$LeftObjectOfPairFn :REPLACE))
000015:(#$plausiblePredValueOfType #$Finger #$isa (#$RightObjectOfPairFn :REPLACE))
000155:(#$plausiblePredValueOfType #$Finger #$lengthOfObject (#$RelativeGenericValueFn #$lengthOfObject :REPLACE #$highAmountOf))
000155:(#$plausiblePredValueOfType #$Finger #$lengthOfObject (#$RelativeGenericValueFn #$lengthOfObject :REPLACE #$highToVeryHighAmountOf))
000003:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$BlackColor)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$LightYellowishBrown-Color)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$ModerateYellowishBrown-Color)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$SunTan-FleshColor)
000002:(#$plausiblePredValueOfType #$Finger #$possessiveRelation #$SuddenChange)

Page 20: BURC: Bootstrapping Using ResearchCyc

Mined Finger Descriptions

000006:(#$plausiblePredValueOfType #$Finger #$possessiveRelation (#$HighAmountFn #$Speed))
000094:(#$plausiblePredValueOfType #$Finger #$rigidityOfObject (#$HighAmountFn #$Rigidity))
000060:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highAmountOf))
000052:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highToVeryHighAmountOf))
000060:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highToVeryHighAmountOf))
000285:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$veryLowToLowAmountOf))
000074:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$veryLowToLowAmountOf))
000029:(#$plausiblePredValueOfType #$Finger #$speedOfObject-Underspecified (#$LowAmountFn #$Speed))
000138:(#$plausiblePredValueOfType #$Finger #$surfaceFeatureOfObj #$Slippery)
000074:(#$plausiblePredValueOfType #$Finger #$temperatureOfObject #$Warm)
000004:(#$plausiblePredValueOfType #$Finger #$textureOfObject #$Rough)
000168:(#$plausiblePredValueOfType #$Finger #$thicknessOfObject (#$RelativeGenericValueFn #$thicknessOfObject :REPLACE #$highAmountOf))
000168:(#$plausiblePredValueOfType #$Finger #$thicknessOfObject (#$RelativeGenericValueFn #$thicknessOfObject :REPLACE #$highToVeryHighAmountOf))
000182:(#$plausiblePredValueOfType #$Finger #$wetnessOfObject #$Wet)
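
A listing like the one above is easy to post-process. The sketch below splits each line into its numeric prefix and assertion, then ranks by the prefix. It assumes the six-digit prefix is a count carried over from the fragment database; the slides do not say so explicitly, and the function name is hypothetical.

```python
# Three lines copied from the listing above.
lines = [
    "000004:(#$plausiblePredValueOfType #$Finger #$textureOfObject #$Rough)",
    "000155:(#$plausiblePredValueOfType #$Finger #$lengthOfObject "
    "(#$RelativeGenericValueFn #$lengthOfObject :REPLACE #$highAmountOf))",
    "000074:(#$plausiblePredValueOfType #$Finger #$temperatureOfObject #$Warm)",
]

def parse_line(line):
    """Split 'NNNNNN:(assertion)' into (count, predicate, assertion)."""
    prefix, assertion = line.split(":", 1)  # maxsplit keeps ':REPLACE' intact
    # Tokens are: (#$plausiblePredValueOfType, #$Finger, #$<predicate>, ...
    predicate = assertion.split()[2].lstrip("#$")
    return int(prefix), predicate, assertion

# Highest (assumed) count first.
ranked = sorted((parse_line(line) for line in lines), reverse=True)
for count, predicate, _assertion in ranked:
    print(count, predicate)
```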

Page 21: BURC: Bootstrapping Using ResearchCyc

Verb Semantic Filtering - 1
Discovering what a finger can do…

A similar process can be used to find information based on verb semantic parsing frames

For each potential <NOUNWORD>-<VERB> pair, query Cyc to find basic relationships using the verb semantic templates

(#$and
   (#$denotation <NOUNWORD> ?NOUNTYPE ?N ?CYCTERM)
   (#$wordForms ?WORD ?PRED "<VERB>")
   (#$speechPartPreds ?POS ?PRED)
   (#$semTransPredForPOS ?POS ?SEMTRANSPRED)
   (?SEMTRANSPRED ?WORD ?NUM ?FRAME ?TEMPLATE))

Verify for each potential relationship (<SPRED> <VERBTERM> <CYCTERM>) derivable from ?TEMPLATE that it makes sense in the ontology

(#$and
   (#$arg1Isa <SPRED> ?VTYP)
   (#$arg2Isa <SPRED> ?CTYP)
   (#$genls <CYCTERM> ?CTYP)
   (#$genls <VERBTERM> ?VTYP))
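
The second query amounts to an argument-type check, which can be sketched over a toy hierarchy. Everything in the two dictionaries is a hypothetical stand-in for what Cyc would answer (only the performedBy argument types are taken from the slides), so the results below reflect the toy data, not Cyc itself.

```python
# Toy ontology fragments; real values would come from Cyc queries.
GENLS = {  # collection -> direct generalizations
    "Blouse": {"PartiallyTangible"},
    "Person": {"Agent-Generic"},
    "ChangeOfResidence": {"Action"},
}
ARG_ISA = {  # predicate -> (arg1Isa type, arg2Isa type)
    "performedBy": ("Action", "Agent-Generic"),
}

def is_spec_of(collection, ancestor):
    """True if `collection` equals `ancestor` or reaches it via genls links."""
    seen, stack = set(), [collection]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, ()))
    return False

def makes_sense(spred, verb_term, cyc_term):
    """Apply the arg1Isa/arg2Isa plus genls test to (spred verb_term cyc_term)."""
    vtyp, ctyp = ARG_ISA[spred]
    return is_spec_of(verb_term, vtyp) and is_spec_of(cyc_term, ctyp)

print(makes_sense("performedBy", "ChangeOfResidence", "Person"))  # True
print(makes_sense("performedBy", "ChangeOfResidence", "Blouse"))  # False
```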

Page 22: BURC: Bootstrapping Using ResearchCyc

Verb Semantic Filtering - 2
Templates of Movement…

(verbSemTrans Move-TheWord 0 IntransitiveVerbFrame
   (and
      (isa :ACTION MovementEvent)
      (primaryObjectMoving :ACTION :SUBJECT)))

(verbSemTrans Move-TheWord 1 IntransitiveVerbFrame
   (and
      (isa :ACTION ChangeOfResidence)
      (performedBy :ACTION :SUBJECT)))

(verbSemTrans Move-TheWord 2 TransitiveNPFrame
   (and
      (isa :ACTION CausingAnotherObjectsTranslationalMotion)
      (objectActedOn :ACTION :OBJECT)
      (doneBy :ACTION :SUBJECT)))

(arg1Isa performedBy Action)
(arg2Isa performedBy Agent-Generic)

Page 23: BURC: Bootstrapping Using ResearchCyc

Verb Semantic Filtering - 3

BURC can use Cyc’s knowledge of what things can perform what actions or have what attributes to filter out implausible relationships.

(#$behaviorCapableOf #$Finger #$CausingAnotherObjectsTranslationalMotion #$doneBy)
(#$behaviorCapableOf #$Finger #$ChangeOfResidence #$performedBy)
(#$behaviorCapableOf #$Finger #$Inspecting #$performedBy)
(#$behaviorCapableOf #$Finger #$Movement-TranslationEvent #$primaryObjectMoving)
(#$behaviorCapableOf #$Finger #$MovementEvent #$primaryObjectMoving)
(#$behaviorCapableOf #$Finger #$PushingAnObject #$providerOfMotiveForce)
(#$behaviorCapableOf #$Finger #$Sliding-Generic #$objectMoving)
(#$behaviorCapableOf #$Finger #$Sliding-Generic #$primaryObjectMoving)
(#$behaviorCapableOf #$Finger #$Slipping #$objectMoving)
(#$behaviorCapableOf #$Finger #$Slipping #$primaryObjectMoving)

Cyc can help in its own knowledge entry process. 62% of generated hypotheses were filtered out using semantic role filtering.
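
Comparing the Move-TheWord templates with the behaviorCapableOf listing suggests how a subject-verb fragment could be turned into candidate assertions: take the event type from the (isa :ACTION ...) clause and the role predicate whose filler is :SUBJECT. The sketch below illustrates that reading. The mapping is inferred from the slides rather than stated in them, and the data layout and function name are hypothetical.

```python
# The three Move-TheWord templates, reduced to
# (event type, [(role predicate, filler keyword), ...]).
MOVE_TEMPLATES = [
    ("MovementEvent", [("primaryObjectMoving", ":SUBJECT")]),
    ("ChangeOfResidence", [("performedBy", ":SUBJECT")]),
    (
        "CausingAnotherObjectsTranslationalMotion",
        [("objectActedOn", ":OBJECT"), ("doneBy", ":SUBJECT")],
    ),
]

def subject_hypotheses(noun_term, templates):
    """Emit one behaviorCapableOf form per role that the :SUBJECT fills."""
    out = []
    for event_type, roles in templates:
        for role, filler in roles:
            if filler == ":SUBJECT":
                out.append(
                    f"(#$behaviorCapableOf #${noun_term} #${event_type} #${role})"
                )
    return out

hyps = subject_hypotheses("Finger", MOVE_TEMPLATES)
for hyp in hyps:
    print(hyp)
```

Each generated form would still need the argument-type check before being kept.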

Page 24: BURC: Bootstrapping Using ResearchCyc

The General Backwards Model

Given some Cyc relation Pred(?X,?Y)

Create SQL search query
• Lookup in Cyc lexical entries for X & Y to get LX, LY
• Select * from LGPTable where Term1="<LX>" and Term3="<LY>"
• System returns records [LX | Link1 | Term2 | Link2 | LY] (Freq)

Generate new hypothetical extraction patterns
• Select * from LGPTable where Link1="<L1>" and Link2="<L2>" and Term2="<T2>"
• [* L1 T2 L2 *]: generate hypothetical record ( Pred |?S1|?S3 )
• Frequency information is propagated forward
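
The two-step backward model can be sketched against a toy table. This is an illustration under assumptions, not the BURC implementation: SQLite stands in for the SQL store, the rows, link labels, and the Freq column name are hypothetical, and the seed pairs wornOn with the words for ring and finger purely as an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
    "Term2 TEXT, Link2 TEXT, Term3 TEXT, Freq INTEGER)"
)
conn.executemany(
    "INSERT INTO LGPTable VALUES (2, ?, ?, ?, ?, ?, ?)",
    [
        ("ring.n", "s", "fit.v", "mvp", "finger.n", 9),  # matches the seed
        ("glove.n", "s", "fit.v", "mvp", "hand.n", 14),  # found by the pattern
        ("glove.n", "s", "cover.v", "o", "hand.n", 6),   # different pattern
    ],
)

def backward_mine(conn, pred, lx, ly):
    """Seed Pred(X, Y) -> link patterns -> new (Pred, S1, S3, freq) records."""
    patterns = conn.execute(
        "SELECT Link1, Term2, Link2 FROM LGPTable WHERE Term1=? AND Term3=?",
        (lx, ly),
    ).fetchall()
    hypotheses = []
    for l1, t2, l2 in patterns:
        rows = conn.execute(
            "SELECT Term1, Term3, Freq FROM LGPTable "
            "WHERE Link1=? AND Link2=? AND Term2=? ORDER BY Freq DESC",
            (l1, l2, t2),
        ).fetchall()
        # Skip the seed itself; carry the fragment frequency forward.
        hypotheses += [
            (pred, s1, s3, freq) for s1, s3, freq in rows if (s1, s3) != (lx, ly)
        ]
    return hypotheses

found = backward_mine(conn, "wornOn", "ring.n", "finger.n")
print(found)
```

The queries are parameterized rather than built by string substitution, which avoids quoting problems when terms contain punctuation.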

Page 25: BURC: Bootstrapping Using ResearchCyc

Flow of Processing

[Figure: flow diagram. Components shown: BNC Data; Parser1 to Parser5; five Frag Files; Merged Frag File; Link Fragments DB; Extractor / DB Manager; Cyc/Rcyc; Hypothesis File.]

Page 26: BURC: Bootstrapping Using ResearchCyc

Running the system

Used a filtered set of the BNC (650 Meg of data)

5 parsers running in parallel for 70 hours generated 1.91 Gig of output

Reduced to 1 Gig of unique records with counts

783 Meg or 22 million fragments

Page 27: BURC: Bootstrapping Using ResearchCyc

Frequency of Fragments

The distribution of fragments follows a smooth curve in log space

Similar to the Zipf distribution for words, characters and n-grams

[Figure: Number of fragments at each frequency level. Axes: Number of Occurrences and Number of Fragments; logarithmic tick labels run from 1 to 100,000,000.]
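
The quantity plotted in the figure, the number of distinct fragments at each occurrence count, is a frequency-of-frequencies histogram. A minimal sketch with hypothetical counts:

```python
from collections import Counter

# Hypothetical fragment counts (fragment id -> number of occurrences).
fragment_counts = {"f1": 1, "f2": 1, "f3": 1, "f4": 1, "f5": 2, "f6": 2, "f7": 5}

# Map each occurrence count to the number of distinct fragments having it.
level_histogram = Counter(fragment_counts.values())
for occurrences in sorted(level_histogram):
    print(occurrences, level_histogram[occurrences])
```

Plotting these pairs on log-log axes is what would reveal a Zipf-like curve on real data.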

Page 28: BURC: Bootstrapping Using ResearchCyc

The Hunt for Common Fragments

Forward mining was run over adjective links with more than one fragment and subject-verb with more than two links

In both cases this was approximately the top 15% for each search class

Page 29: BURC: Bootstrapping Using ResearchCyc

Reductions

From Fragments into Hypothesis (chart of Elements by Filtering Stage; axis runs from 0 to 1,000,000)

                 Raw Fragments   Common Fragments   Generated Hypothesis
Adjectives              996810             147074                  26690
Subject-Verbs           934029             144208                   9079
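
The reduction at each stage follows directly from the counts in the chart. The short sketch below computes the two ratios for each search class; the dictionary layout and function name are hypothetical, and the numbers are copied from the chart.

```python
# (raw fragments, common fragments, generated hypotheses) per search class.
stages = {
    "Adjectives": (996810, 147074, 26690),
    "Subject-Verbs": (934029, 144208, 9079),
}

def retention(raw, common, hypotheses):
    """Return (common/raw, hypotheses/common) as percentages."""
    return 100 * common / raw, 100 * hypotheses / common

rates = {name: retention(*counts) for name, counts in stages.items()}
for name, (kept_common, kept_hyp) in rates.items():
    print(f"{name}: {kept_common:.1f}% common, {kept_hyp:.1f}% become hypotheses")
```

Both classes keep roughly 15% of raw fragments as common fragments, which agrees with the "top 15%" cut described for the common-fragment search.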

Page 30: BURC: Bootstrapping Using ResearchCyc

A source of potential knowledge

The various versions of Cyc have 10 to 20 assertions per constant

BURC generates 14.29 hypothetical assertions per constant

Need to quantify the quality of BURC knowledge

[Figure: Hypothesis generated for constants. Axes: Number of Hypothesis (ticks 1 to 49) and Number of constants (ticks 0 to 350).]

Page 31: BURC: Bootstrapping Using ResearchCyc

Future Work - 1

Modify Cyc to utilize the extracted knowledge
• Question generation (curiosity ?)
• Noticing exceptions

Update parser and generate data in other knowledge formats (i.e. OpenMind/ConceptNet)

Generate better filtering methods for polysemous words in fragments

Use synonyms and antonyms to expand hypotheses using WordNet

Examine effect of reporting the unusual instead of the usual

Page 32: BURC: Bootstrapping Using ResearchCyc

Future Work - 2

Define admissibility criteria. How much evidence is necessary to consider a fact worthy of addition to the KB as commonplace?

Determine performance relative to and in conjunction with volunteer commonsense knowledge entry projects.

Create an interface for quick review of hypotheses by humans.

Utilize knowledge and experience on the backwards miner

Page 33: BURC: Bootstrapping Using ResearchCyc

Can we ever be “Done” ?

Explore definition of semantic coverage metrics for unmapped domains.

The space of 2.4K binary predicates applied to 85K constants provides a 16 trillion combination search space, only a fraction of which would be considered part of ‘common knowledge’.