QUIRK:Project Progress Report December 3-5 2002 Cycorp IBM

Notable Progress

• Query decomposition extensions

• Argument-structure approximation

• Syntactic analysis of textual sources

• Reflexive justifications

Single Literal Query Decomposition

P?

Q?, R?, Z?

Q? R? Z?

(Q & R & Z) → P
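A minimal Python sketch of the single-literal case, assuming a toy setting in which each removal module answers exactly one literal by returning the values of a shared variable ?X. The module table, the facts, and the names `SINGLE_LITERAL_MODULES` and `ask` are hypothetical, not Cyc's inference API:

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical single-literal removal modules: each one answers exactly one
# literal and returns the set of values of ?X that satisfy it.
SINGLE_LITERAL_MODULES: Dict[str, Callable[[], Set[str]]] = {
    "Q": lambda: {"a", "b", "c"},
    "R": lambda: {"b", "c", "d"},
    "Z": lambda: {"c", "d", "e"},
}

# The rule (Q & R & Z) -> P, stored as (antecedent literals, consequent).
RULE: Tuple[Tuple[str, ...], str] = (("Q", "R", "Z"), "P")


def ask(goal: str) -> Set[str]:
    """Answer goal? by backchaining on RULE, one literal at a time."""
    antecedents, consequent = RULE
    if goal != consequent:
        return set()
    answers: Optional[Set[str]] = None
    for literal in antecedents:
        # Each conjunct goes to its own module; the partial answers are
        # joined on the shared variable ?X by intersecting binding sets.
        bindings = SINGLE_LITERAL_MODULES[literal]()
        answers = bindings if answers is None else answers & bindings
    return answers if answers is not None else set()


print(sorted(ask("P")))  # ['c']
```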

Multi Literal Query Decomposition

P?

Q?, R?, Z?

Q?, R? Z?

(Q & R & Z) → P

Multi Literal Query Decomposition

(likes Bob ?X)

(isa X French), (isa X Movie), (likes Amy X)

(isa X French), (isa X Movie)   (likes Amy X)

((isa X French) & (isa X Movie) & (likes Amy X)) → (likes Bob X)
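The multi-literal case can be sketched the same way, under the assumption that a module registry records which group of literals each module can solve in one call. Here a hypothetical external movie source answers (isa X French) and (isa X Movie) together, while (likes Amy X) goes to a separate single-literal module; all data and names (`MODULES`, `decompose`, `answer`) are invented for illustration:

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Literal = Tuple[str, str, str]
Module = Callable[[FrozenSet[Literal]], Set[str]]

# Hypothetical toy sources: an external movie database and Amy's likes.
MOVIE_ORIGIN = {"Amelie": "French", "Breathless": "French", "Jaws": "American"}
AMY_LIKES = {"Amelie", "Jaws", "Paris"}


def french_movies(_: FrozenSet[Literal]) -> Set[str]:
    # Multi-literal module: answers (isa ?X French) & (isa ?X Movie)
    # with one lookup, i.e. the join happens inside the external source.
    return {title for title, origin in MOVIE_ORIGIN.items() if origin == "French"}


def liked_by_amy(_: FrozenSet[Literal]) -> Set[str]:
    # Single-literal module: answers (likes Amy ?X) only.
    return set(AMY_LIKES)


# Each entry records which group of literals a module solves together.
MODULES: List[Tuple[FrozenSet[Literal], Module]] = [
    (frozenset({("isa", "?X", "French"), ("isa", "?X", "Movie")}), french_movies),
    (frozenset({("likes", "Amy", "?X")}), liked_by_amy),
]


def decompose(conjuncts: Iterable[Literal]) -> List[Tuple[FrozenSet[Literal], Module]]:
    """Partition the conjuncts, trying modules that cover more literals first."""
    remaining = set(conjuncts)
    plan = []
    for covered, module in sorted(MODULES, key=lambda entry: -len(entry[0])):
        if covered <= remaining:
            plan.append((covered, module))
            remaining -= covered
    if remaining:
        raise ValueError(f"no module covers {sorted(remaining)}")
    return plan


def answer(conjuncts: Iterable[Literal]) -> Set[str]:
    """Run each group through its module and intersect the ?X bindings."""
    result = None
    for covered, module in decompose(conjuncts):
        bindings = module(covered)
        result = bindings if result is None else result & bindings
    return result if result is not None else set()


# Body of the rule whose head is (likes Bob ?X).
BODY = [("isa", "?X", "French"), ("isa", "?X", "Movie"), ("likes", "Amy", "?X")]
print(sorted(answer(BODY)))  # ['Amelie']
```

Preferring the modules that cover the most literals is one simple greedy choice; the slides do not say how competing decompositions are ranked.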

Examples

• Joins in external DBs (NIMA, USGS, …)

– Airports in Travis County, TX

– Hospitals located in port cities

– ...

• Web services, e.g. IMDB

– Actors from the ’50s

• As a bridge between KR formats

Davidsonian KR bridge

Wellington defeated Napoleon at Waterloo.

(thereExists ?EV (and (isa ?EV DefeatingAnOpponent) (performedBy ?EV Wellington) (objectActedOn ?EV Napoleon) (eventOccursAt ?EV Waterloo)))
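One way to picture the bridge, assuming the other KR format stores one flat role-to-filler record per event (a format and function names, `to_frames` and `to_davidsonian`, invented here for illustration): a multi-literal module can recognize every literal whose first argument is the event variable ?EV and fold them into one record, or expand a record back into Davidsonian literals.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Literal = Tuple[str, str, str]
Frame = Dict[str, str]

# The body of the thereExists formula above, one tuple per literal.
DAVIDSONIAN: List[Literal] = [
    ("isa", "?EV", "DefeatingAnOpponent"),
    ("performedBy", "?EV", "Wellington"),
    ("objectActedOn", "?EV", "Napoleon"),
    ("eventOccursAt", "?EV", "Waterloo"),
]


def to_frames(literals: List[Literal]) -> Dict[str, Frame]:
    """Group literals by their first argument (the event) into frames."""
    frames: Dict[str, Frame] = defaultdict(dict)
    for predicate, event, filler in literals:
        # (isa ?EV T) supplies the event type; every other literal on the
        # event variable is treated as a role with a single filler.
        role = "type" if predicate == "isa" else predicate
        frames[event][role] = filler
    return dict(frames)


def to_davidsonian(event: str, frame: Frame) -> List[Literal]:
    """Inverse direction: expand a frame back into role literals."""
    return [("isa" if role == "type" else role, event, filler)
            for role, filler in frame.items()]


frames = to_frames(DAVIDSONIAN)
print(frames["?EV"]["performedBy"])  # Wellington
```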

Argument-Type bridge

John lives in a French village

(thereExists ?V (and (isa ?V Village) (geographicalSubRegions France ?V) (residesInRegion John ?V)))
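Similarly, the argument-type bridge can be sketched as folding the literals that only restrict ?V into a typed argument of the main relation. Which predicates count as type restrictions (`TYPE_RESTRICTING`) is a hypothetical choice made for this example, as are the output shape and the name `fold_argument_types`:

```python
from typing import Dict, List, Tuple, Union

Literal = Tuple[str, str, str]
TypedArg = Dict[str, object]

BODY: List[Literal] = [
    ("isa", "?V", "Village"),
    ("geographicalSubRegions", "France", "?V"),
    ("residesInRegion", "John", "?V"),
]

# Hypothetical: predicates read as restrictions on the type of the
# existentially quantified variable rather than as the main assertion.
TYPE_RESTRICTING = {"isa", "geographicalSubRegions"}


def fold_argument_types(body: List[Literal], var: str) -> Tuple[Union[str, TypedArg], ...]:
    """Replace var in the main literal by {var, restrictions}."""
    restrictions = [lit for lit in body
                    if lit[0] in TYPE_RESTRICTING and var in lit[1:]]
    main = [lit for lit in body if lit not in restrictions]
    if len(main) != 1:
        raise ValueError(f"expected one main literal, found {len(main)}")
    typed_arg: TypedArg = {"var": var, "restrictions": restrictions}
    return tuple(typed_arg if arg == var else arg for arg in main[0])


folded = fold_argument_types(BODY, "?V")
print(folded[:2])  # ('residesInRegion', 'John')
```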

Registration of multi-literal removal modules

• At the moment, few enough such modules exist that they can be defined in code

• Plans for declarative registration of such modules in Cyc’s KB, working even with run-time KB edits

Arg based query generation

(thereExists ?EV (and (isa ?EV AttackOnObject) (maleficiary ?EV Djibouti) (performedBy ?EV ?WHO)))

[SUBJ [VERB OBJ]]

@PHR(2 PERSON$ attack *Djibouti)
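The mapping from the CycL query to the phrase query can be sketched as filling the [SUBJ [VERB OBJ]] template from the event's role literals. The lexical table, the role table, the answer-type table, and the `phrase_query` function are hypothetical; the output string simply mirrors the slide's example, including the leading `2` and the `*` prefix, whose meaning the slide does not explain:

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, str, str]

QUERY: List[Literal] = [
    ("isa", "?EV", "AttackOnObject"),
    ("maleficiary", "?EV", "Djibouti"),
    ("performedBy", "?EV", "?WHO"),
]

# Hypothetical tables (not taken from Cyc or from the IBM search system).
VERB_FOR_EVENT_TYPE = {"AttackOnObject": "attack"}
SYNTACTIC_ROLE = {"performedBy": "SUBJ", "maleficiary": "OBJ"}
ANSWER_TYPE = {"?WHO": "PERSON$"}  # e.g. derived from ?WHO:Person


def phrase_query(query: List[Literal]) -> str:
    """Linearize the event literals as [SUBJ [VERB OBJ]]."""
    slots: Dict[str, str] = {}
    for predicate, _event, arg in query:
        if predicate == "isa":
            slots["VERB"] = VERB_FOR_EVENT_TYPE[arg]
        elif arg.startswith("?"):
            # Free variable: search for its answer type, not for a word.
            slots[SYNTACTIC_ROLE[predicate]] = ANSWER_TYPE[arg]
        else:
            # Ground term: the '*' prefix is copied from the slide's example.
            slots[SYNTACTIC_ROLE[predicate]] = "*" + arg
    # The leading "2" is copied verbatim from the slide.
    return f"@PHR(2 {slots['SUBJ']} {slots['VERB']} {slots['OBJ']})"


print(phrase_query(QUERY))  # @PHR(2 PERSON$ attack *Djibouti)
```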

Secretary

• Input:

– A CycL query such as (president France ?WHO)

– A textual paragraph

• Output: a ranked list of CycL terms that

– represent entities mentioned in the paragraph

– are type-appropriate as substitutions for the free variables in the query (?WHO:Person)

• Three types of Secretary

Secretary 1

• Use IBM’s Talent system to learn new lexical entries

• Tag the paragraph with lexical mappings

• Select type-appropriate CycL tags

• Rank them by proximity to the query focus, as determined by the recorded positions in the paragraph of all the ground terms in the CycL query
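A minimal sketch of the Secretary 1 ranking step, assuming the tagger yields (term, token position, type) triples and that the query focus is the mean position of the query's ground-term mentions (the slide does not say how several positions are combined). The function name `rank_by_proximity` and the toy positions are hypothetical:

```python
from statistics import mean
from typing import List, Set, Tuple

# (CycL term, token position in the paragraph, CycL type of the term)
Tag = Tuple[str, int, str]


def rank_by_proximity(tags: List[Tag], ground_positions: List[int],
                      allowed_types: Set[str]) -> List[str]:
    """Rank type-appropriate tags by distance from the query focus."""
    # Assumed definition of the focus: the mean position of all mentions
    # of the query's ground terms in the paragraph.
    focus = mean(ground_positions)
    candidates = sorted(
        (abs(position - focus), term)
        for term, position, term_type in tags
        if term_type in allowed_types
    )
    ranked: List[str] = []
    for _distance, term in candidates:
        # A term tagged several times is ranked by its closest mention.
        if term not in ranked:
            ranked.append(term)
    return ranked


# Toy tags for a paragraph queried with (president France ?WHO); the ground
# terms president and France were found at positions 5 and 7.
TAGS = [("JacquesChirac", 3, "Person"), ("France", 7, "Country"),
        ("LionelJospin", 20, "Person"), ("JacquesChirac", 30, "Person")]
print(rank_by_proximity(TAGS, [5, 7], {"Person"}))
# ['JacquesChirac', 'LionelJospin']
```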

Secretary 2

• Use IBM’s Talent system to learn new lexical entries

• Use the output of UPenn’s dependency parser to generate a set of CycL interpretations of the paragraph

• Select the “best” interpretation

• Return the CycL entity that stands in the appropriate relationship to the query’s predicate.
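Secretary 2's last two steps can be sketched as follows, assuming each candidate interpretation arrives with a numeric score (the slide leaves the meaning of “best” open) and that the query has exactly one free variable. The function `secretary2` and the toy interpretations are hypothetical:

```python
from typing import List, Optional, Tuple

Literal = Tuple[str, ...]
Interpretation = Tuple[float, List[Literal]]  # (score, CycL literals)


def secretary2(query: Literal, interpretations: List[Interpretation]) -> Optional[str]:
    """Pick the best interpretation and return the entity that fills the
    query's free-variable slot in a literal with the query's predicate."""
    # Assumption: "best" means highest score.
    _score, best = max(interpretations, key=lambda item: item[0])
    var_index = next(i for i, arg in enumerate(query) if arg.startswith("?"))
    for literal in best:
        if len(literal) == len(query) and all(
            q == lit for i, (q, lit) in enumerate(zip(query, literal)) if i != var_index
        ):
            return literal[var_index]
    return None  # the best reading says nothing about the query


interpretations = [
    (0.4, [("president", "France", "LionelJospin")]),  # lower-scored misreading
    (0.9, [("isa", "JacquesChirac", "Person"),
           ("president", "France", "JacquesChirac")]),
]
print(secretary2(("president", "France", "?WHO"), interpretations))  # JacquesChirac
```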

Secretary 3

• Use IBM’s Talent system to learn new lexical entries

• Use the output of UPenn’s dependency parser to generate a set of CycL interpretations of the paragraph

• Select the “best” interpretation and turn it into a virtual assertion in Cyc’s KB

• Ask the original query against the resulting KB

• Return all answers.
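Secretary 3 differs in that the chosen interpretation is added to the KB and the query is asked there. The sketch below uses a toy in-memory store of ground literals in place of Cyc's KB, with the virtual assertions retracted when the `with` block exits; the `ToyKB` class and its methods are hypothetical, not Cyc's API:

```python
from contextlib import contextmanager
from typing import Iterable, Iterator, List, Set, Tuple

Literal = Tuple[str, ...]


class ToyKB:
    """Hypothetical stand-in for Cyc's KB: a set of ground literals."""

    def __init__(self, facts: Iterable[Literal]) -> None:
        self.facts: Set[Literal] = set(facts)

    @contextmanager
    def virtual_assertions(self, literals: Iterable[Literal]) -> Iterator["ToyKB"]:
        """Temporarily add literals; remove the newly added ones on exit."""
        added = {lit for lit in literals if lit not in self.facts}
        self.facts |= added
        try:
            yield self
        finally:
            self.facts -= added

    def ask(self, query: Literal) -> List[str]:
        """Return every binding of the query's single free variable."""
        var_index = next(i for i, arg in enumerate(query) if arg.startswith("?"))
        answers = {
            fact[var_index]
            for fact in self.facts
            if len(fact) == len(query)
            and all(q == f for i, (q, f) in enumerate(zip(query, fact)) if i != var_index)
        }
        return sorted(answers)


kb = ToyKB([("isa", "France", "Country")])
interpretation = [("president", "France", "JacquesChirac"),
                  ("isa", "JacquesChirac", "Person")]
with kb.virtual_assertions(interpretation):
    print(kb.ask(("president", "France", "?WHO")))  # ['JacquesChirac']
print(kb.ask(("president", "France", "?WHO")))  # []
```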

General observations

• Secretaries 2 and 3 have better precision than Secretary 1, but much lower recall

– possibly due to the non-verb-like nature of many Cyc predicates; need to check whether the same holds for multi-literal events

• The linear-proximity ranking of Secretary 1 is almost as good as the argument-based analysis of Secretaries 2 and 3

Introspective Justifications

Dialog evaluation

• Basic knowledge representation performed for each of the topics

• Used KRAKEN GUI for interpretation of questions

• Used KRAKEN NL generation for reporting answers to analyst

KRAKEN GUI

• Which paintings about war did Picasso create?

Contextual vs Keyhole approach

• Several questions asked simultaneously:

– “Need background data on the Cuban dissident Elizardo Sanchez to include birth data, education, work ethics, organization affiliations to name a few.”

• Analyst happy with a summary of all facts known about an entity of interest

Lessons learned

• Analysts like to ask/see questions/answers in context

• The single-question/single-answer approach could be extended to:

– a dossier about entity X

– a preliminary dialog on desired properties of the dossier, inferred from properties of entity X

• Justifications become interesting only if answers are sufficiently surprising.

Definitional Questions Evaluation

• Expectations:

– large answer set

– both redundancy AND irrelevance

– opportunity for structuring the answer set by salient features of the question focus

• Actual experience:

– limited answer set

– mostly redundancy

Original Plan

• Use appositives to learn the type of an entity

– “Massimo Cacciari”, “Venice Mayor”

• Use Cyc to

– understand the type (a kind of elected official)

– generate a list of questions salient for the type

• When was he elected?

• What is his party affiliation?

• …

• Answer salient questions from textual sources
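The original plan can be sketched as a short pipeline: find appositives, map the appositive's head noun to a type, and instantiate question templates for that type. The regular expression is a deliberately crude appositive detector, and the type and question tables stand in for knowledge the plan expected to get from Cyc; these tables, the example sentence, and the function names are hypothetical:

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-ins for Cyc's type knowledge and salient questions.
HEAD_NOUN_TO_TYPE: Dict[str, str] = {"mayor": "ElectedOfficial"}
SALIENT_QUESTIONS: Dict[str, List[str]] = {
    "ElectedOfficial": [
        "When was {name} elected?",
        "What is {name}'s party affiliation?",
    ],
}

# Crude appositive detector: "<Capitalized Name>, <description>,"
APPOSITIVE = re.compile(r"((?:[A-Z][a-z]+ )+[A-Z][a-z]+), ([^,]+),")


def entity_type(description: str) -> Optional[str]:
    """Guess a type from the head (last) word of the appositive."""
    words = description.split()
    if not words:
        return None
    return HEAD_NOUN_TO_TYPE.get(words[-1].lower())


def salient_questions(text: str) -> List[str]:
    """Instantiate the question templates for each typed appositive."""
    questions: List[str] = []
    for name, description in APPOSITIVE.findall(text):
        kind = entity_type(description)
        for template in SALIENT_QUESTIONS.get(kind or "", []):
            questions.append(template.format(name=name))
    return questions


print(salient_questions("A profile of Massimo Cacciari, Venice Mayor, follows."))
# ['When was Massimo Cacciari elected?', "What is Massimo Cacciari's party affiliation?"]
```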

Revised Plan

• Use syntactic analysis to extract appositives and relevant VPs

• Cluster strings so extracted

• Return one string from each cluster, ranked by the size of the cluster.
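The revised plan reduces to clustering the extracted strings and returning one representative per cluster. A minimal sketch, assuming the appositives and VPs have already been extracted, and using greedy leader clustering on token-set Jaccard similarity (a swapped-in technique: the slides do not name the clustering method, and the 0.5 threshold and toy strings are arbitrary):

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_and_rank(strings: List[str], threshold: float = 0.5) -> List[str]:
    """Greedy leader clustering; return each cluster's first string,
    largest cluster first."""
    clusters: List[List[str]] = []
    for text in strings:
        text_tokens = tokens(text)
        for cluster in clusters:
            # Compare against the cluster's first member (its "leader").
            if jaccard(text_tokens, tokens(cluster[0])) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    # list.sort is stable, so equal-sized clusters keep first-seen order.
    clusters.sort(key=len, reverse=True)
    return [cluster[0] for cluster in clusters]


EXTRACTED = ["Venice Mayor", "mayor of Venice", "philosopher",
             "the Venice mayor", "a philosopher"]
print(cluster_and_rank(EXTRACTED))  # ['Venice Mayor', 'philosopher']
```

Cluster size serves as a redundancy count, which matches the observation that the answer sets were mostly redundant rather than irrelevant.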

Lessons learned

• Punctuation and function words are crucial

• Textual sources don’t always support an analysis by “salient features”

• Semantic analysis is not necessarily useful if the end result is expected to be a string that can be easily interpreted by the analyst.