
Page 1

Dr. Douglas B. Lenat, 7718 Wood Hollow Drive, Austin, TX 78731
Email: Lenat@cyc.com Phone: 512-342-4001

Computers versus Common Sense

• Cyc: Software that partially understands

• The impact of that on the AIM dream

May 4, 2009

Page 2

ELIZA (DOCTOR)

1965 Joe Weizenbaum, MIT

• Carl Rogers advocated reflection

• Counted a lot on anthropomorphism

• Counted on patient-to-doctor respect/awe/…

Patient: “I swear a lot.”
ELIZA: “How do you feel about the fact that you swear a lot?”

Patient: “My dog’s mother died recently.”
ELIZA: “Tell me more about your mother.”

Veneer of intelligence, not intelligence
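To make the “veneer” concrete, here is a minimal ELIZA-style responder in Python. It is a sketch built on hypothetical keyword rules, not Weizenbaum’s actual DOCTOR script. It reproduces both exchanges above: it answers “your mother” simply because the keyword “mother” appears, with no idea that the mother in question is the dog’s.

```python
import re

# Hypothetical, simplified ELIZA-style rules: (pattern, response template).
# Rules are tried in order and the first match wins. NOT Weizenbaum's DOCTOR script.
RULES = [
    (re.compile(r"\bmother\b", re.IGNORECASE), "Tell me more about your mother."),
    (re.compile(r"^I (.*?)[.!]?$", re.IGNORECASE),
     "How do you feel about the fact that you {0}?"),
]
DEFAULT = "Please go on."

def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones ('my' -> 'your', ...)."""
    swaps = {"i": "you", "my": "your", "me": "you", "am": "are"}
    return " ".join(swaps.get(word.lower(), word) for word in fragment.split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance.strip())
        if match:
            # Echo the captured text back, reflected into second person.
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT

print(respond("I swear a lot."))
print(respond("My dog’s mother died recently."))
```

Nothing in the program represents dogs, mothers, or death. It only rearranges the user’s own words.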

Page 3

May 2009

ALICE (40 years post-ELIZA)

• JUDGE11[9:26:0]I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS

• PROGRAM[9:26:0]Do you think your plan will succeed?

• JUDGE11[9:26:16]WHAT IS THE COLOR OF A BLUE CAR

• PROGRAM[9:26:16]WHAT COLOR IS a blue car

• JUDGE11[9:26:35]WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE

• PROGRAM[9:26:35]Where it belongs. Try searching the Web.

Page 6

2009: Software is still Brittle

“How old was Martin Van Buren when he was elected President of the U.S.?”

“Is the Space Needle taller than the Eiffel Tower?”

Page 7

Natural Language Understanding requires having lots of knowledge

1. The pen is in the box.

The box is in the pen.

2. The police watched the demonstrators because they feared violence.

The police watched the demonstrators because they advocated violence.

3. Mary and Sue are sisters.

Mary and Sue are mothers.

4. Every American has a mother.

Every American has a president.

5. John saw his brother skiing on TV. The fool didn’t have a coat on!

John saw his brother skiing on TV. The fool didn’t recognize him!

Page 8

7. “…include all the re-do CABG procedures utilizing ITA and SVG in 1991”.

“And” usually does mean “and”. But in this query, “and” really must mean “or”. Medical knowledge, not grammar, disambiguates this: a single CABG will not have both an ITA and an SVG.

8. “…that the tumor cells are stopping dividing or dying…”

Do they mean “stopping dividing or stopping dying”? Of course not, but in 16 of 30 randomly selected syntactically similar constructions from www.clinicaltrials.gov, the wide-scope reading (the modifier, in this case the word “stopping”, applying to both coordinated terms) was the intended meaning, so syntax alone cannot supply a safe default. In each case, only one choice “makes sense” (is consistent with medical knowledge and common sense).

9. “Adult patients who underwent MAZE III with or without Mitral Valve Repair or Replacements.”

Is the second half of that query just a waste of space? Discourse pragmatics says no: the physician must have had some reason for saying that. Medical knowledge provides a plausible interpretation: “Adult patients who underwent MAZE III with no concomitant procedures other than Mitral Valve Repair or Replacements.”
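Example 8 can be sketched in a few lines of Python: generate both scopings of “stopping dividing or dying”, then keep only the reading that is consistent with what is known. The set `SENSIBLE_OUTCOMES` is a hypothetical, hard-coded stand-in for real medical knowledge; the point is only that the filter, not the grammar, picks the reading.

```python
# Toy stand-in for medical knowledge: outcomes that make sense for tumor cells
# in a treatment context. Hard-coded purely for illustration.
SENSIBLE_OUTCOMES = {"stopping dividing", "dying"}

def coordination_readings(modifier: str, head1: str, head2: str) -> list[tuple[str, str]]:
    """Both scopings of '<modifier> <head1> or <head2>'."""
    narrow = (f"{modifier} {head1}", head2)                # modifier applies to head1 only
    wide = (f"{modifier} {head1}", f"{modifier} {head2}")  # modifier distributes over both
    return [narrow, wide]

def makes_sense(reading: tuple[str, str]) -> bool:
    return all(part in SENSIBLE_OUTCOMES for part in reading)

readings = coordination_readings("stopping", "dividing", "dying")
print([r for r in readings if makes_sense(r)])  # [('stopping dividing', 'dying')]
```

For the 16 of 30 constructions where the wide reading was intended, the same filter would keep the other tuple; what changes is the knowledge, not the parser.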

Page 9

May 2009 / 2 July 2005

The basic idea:

Get the computer to understand, not just store, information. Then it can reason to answer your queries.

Okay, so let’s tell the computer the same sorts of things that human beings know about cars, colors, heights, movies, time, driving to a place, etc.: all the other stuff that everybody knows.

Page 10

The basic idea:

Get the computer to understand, not just store, information. Then it can reason to answer your queries.

MicrowaveOven is a type of Kitchen-Appliance

Dishwasher is a type of Kitchen-Appliance

Page 11

Rthagide-disjaks is a type of Kitchen-Appliance

Gracinimumples is a type of Kitchen-Appliance

Rthagide-disjaks alorxes Vorawnistz.

Gracinimumples alorxes Vorawnistz and Buzqa.

Buzqa is a Thwarn and supplied through Epluns.

You can’t use X if it alorxes Y but lacks any Y
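The slide’s point is that a program can apply “You can’t use X if it alorxes Y but lacks any Y” correctly without knowing what any of the words mean. A minimal Python sketch, assuming the facts are stored as (relation, subject, object) triples; the relation name `suppliedThrough` and the “available” situation are hypothetical additions.

```python
# The slide's facts as (relation, subject, object) triples.
# Neither the program nor this sketch knows what any of these words mean.
FACTS = {
    ("isa", "Rthagide-disjaks", "Kitchen-Appliance"),
    ("isa", "Gracinimumples", "Kitchen-Appliance"),
    ("alorxes", "Rthagide-disjaks", "Vorawnistz"),
    ("alorxes", "Gracinimumples", "Vorawnistz"),
    ("alorxes", "Gracinimumples", "Buzqa"),
    ("isa", "Buzqa", "Thwarn"),
    ("suppliedThrough", "Buzqa", "Epluns"),  # hypothetical relation name
}

def cannot_use(x: str, available: set[str]) -> bool:
    """The slide's rule: you can't use X if it alorxes Y but lacks any Y."""
    needed = {obj for (rel, subj, obj) in FACTS if rel == "alorxes" and subj == x}
    return bool(needed - available)

# Hypothetical situation: there is Vorawnistz but no Buzqa.
situation = {"Vorawnistz"}
print(cannot_use("Gracinimumples", situation))    # True
print(cannot_use("Rthagide-disjaks", situation))  # False
```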

Page 12

The basic idea:

Get the computer to understand, not just store, information. Then it can reason to answer your queries.

Eventually, after writing millions of these rules, the system knows as much about pipes, liquids, water, electricity, microwave ovens, dishwashers, cars, colors, movies, heights, etc. as you and I do.

Ultimately, there is just 1 interpretation of that model, and it corresponds to the real world.

Long before that, incrementally, the system gains competence and trustworthiness.

Page 13

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world

– The typical bird has 1 beak, 1 heart, lots of feathers,…

– Hearts are internal organs; feathers are external protrusions

– Most vehicles are steered by an awake, sane, adult,… human

– Tangible objects can’t be in 2 (disjoint) places at once

– Badly injuring a child is much worse than killing a dog

– Causes temporally precede (i.e., start before) their effects

– A stabbing requires 2 cotemporal and proximate actors

– etc.

Page 14

- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms
- Language-independent

[Figure: words in several languages (EnglishWord-Pen, EnglishWord-Plume, FrenchWord-Plume, ChineseWordForWritingPen) linked to language-independent concepts (WritingPen, Penitentiary, BirdFeather, Authoring)]

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
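The language-independence point can be sketched as a many-to-many map from words to concepts. In this Python sketch the concept names come from the slide, but the particular links are an assumption based on the ordinary meanings of the words (English “pen” as a writing pen, a penitentiary, or “to write”; French “plume” as a pen or a feather). The Chinese word is omitted because the slide does not show it.

```python
# Hypothetical lexicon linking (language, word) pairs to language-independent concepts.
LEXICON = {
    ("English", "pen"): {"WritingPen", "Penitentiary", "Authoring"},
    ("English", "plume"): {"BirdFeather"},
    ("French", "plume"): {"WritingPen", "BirdFeather"},
}

def words_for(concept: str) -> list[str]:
    """All words, in any language, that can denote `concept`."""
    return sorted(f"{lang}:{word}" for (lang, word), concepts in LEXICON.items()
                  if concept in concepts)

def concepts_for(language: str, word: str) -> set[str]:
    """Candidate meanings of a word; context has to choose among them."""
    return LEXICON.get((language, word), set())

print(words_for("WritingPen"))                  # ['English:pen', 'French:plume']
print(sorted(concepts_for("French", "plume")))  # ['BirdFeather', 'WritingPen']
```

Knowledge is attached to the concepts, so an assertion about `WritingPen` applies whichever language the word came from.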

Page 15

- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms

• An inference engine that produces, from those facts and rules, the same sorts of inferences that people would.

• Interfaces so the system can communicate with people, databases, spreadsheets, websites, etc.

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world

Page 16

What Needs to be Shared?

• bits/bytes/streams/network…

• alphabet, special characters,…

• words, morphological variants,…

• syntactic meta-level markups (HTML)

• semantic meta-level markups (SGML, XML)

• content (logical representation of doc/page/...)

• context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)

Sem. Web

Page 17

How formalized knowledge helps search

• Query: “Someone smiling”

• Caption: “A man helping his daughter take her first step”

Find information by inference (+KB):

When you become happy, you smile.

You become happy when someone you love accomplishes a milestone.

Taking one’s first step is a milestone.

Parents love their children.

    (ForAll ?P
      (ForAll ?C
        (implies
          (and
            (isa ?P Person)
            (children ?P ?C))
          (loves ?P ?C))))
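The four rules above are enough to connect the caption to the query. A minimal naive forward-chaining sketch in Python shows how: each rule is a list of premise patterns plus a conclusion, with `?`-prefixed variables. The constants (`Man1`, `Daughter1`, `Step1`) and the predicates `accomplishes` and `becomesHappy` are hypothetical encodings of the caption and rules; only `isa`, `children`, and `loves` come from the CycL above. This is a teaching sketch, not Cyc’s inference engine.

```python
def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` equals `fact`; return None if impossible."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result

def substitute(pattern, bindings):
    return tuple(bindings.get(t, t) for t in pattern)

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            matches = [{}]
            for premise in premises:  # join premises left to right
                next_matches = []
                for b in matches:
                    for fact in known:
                        extended = match(premise, fact, b)
                        if extended is not None:
                            next_matches.append(extended)
                matches = next_matches
            for b in matches:
                new_fact = substitute(conclusion, b)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known

RULES = [
    # Parents love their children (the CycL rule above).
    ([("isa", "?p", "Person"), ("children", "?p", "?c")], ("loves", "?p", "?c")),
    # Taking one's first step is a milestone.
    ([("isa", "?e", "FirstStep")], ("isa", "?e", "Milestone")),
    # You become happy when someone you love accomplishes a milestone.
    ([("loves", "?x", "?y"), ("accomplishes", "?y", "?e"), ("isa", "?e", "Milestone")],
     ("becomesHappy", "?x")),
    # When you become happy, you smile.
    ([("becomesHappy", "?x")], ("smiling", "?x")),
]

# Hypothetical encoding of the caption "A man helping his daughter take her first step".
CAPTION_FACTS = {
    ("isa", "Man1", "Person"),
    ("children", "Man1", "Daughter1"),
    ("accomplishes", "Daughter1", "Step1"),
    ("isa", "Step1", "FirstStep"),
}

derived = forward_chain(CAPTION_FACTS, RULES)
answers = sorted(f[1] for f in derived if f[0] == "smiling")
print(answers)  # ['Man1']
```

Keyword search finds nothing here, since the caption shares no words with “Someone smiling”. The match exists only as a four-step chain of inferences.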

Page 18

How formalized knowledge helps search

Query: “Show me pictures of strong and adventurous people”

Caption: “A man climbing a rock face”

Find information by inference (+KB)

Page 19

How formalized knowledge helps search

Text Document

Query: “Government buildings damaged in terrorist events in Beirut between 1990 and 2001”

Document: “1993 pipe bombing of France’s embassy in Lebanon.”

Find information by inference (+KB)

Page 20

How can our programs be intelligent, not merely have the veneer of it?

• ANSWER: By having a large corpus of knowledge, spanning the gamut from specific, domain-dependent knowledge all the way up to general common sense.

• The computer needs to be able to apply the knowledge, not just store some English gloss:

– Represent it formally (predicate calculus), and apply logic

– Represent it numerically, and apply mathematics/statistics

• And after all that: Be compelling to the human decision-maker

Page 21

One Good Explanation is worth 20 points of IQ

• Magic tricks

– “How do they do that?!” “How was I ever fooled by that?!”

• Efficacy of punishment vs. reward

– “Punishment is more effective, and the statistics back me up”

• Clinical decision-making (by doctors and by patients)

– “Because 0.814” versus “Because < plausible causal rationale >”

• Organ donation in European countries:

– Why is it so often 15%/85% or 85%/15%?

[Answer: Because when you apply for a driver’s license in some countries, you have to check a box to “opt in”; in others, you have to check a box to “opt out”; and in the U.S. and most European countries at least, 85% of the people don’t know what they should do, even though it’s an emotional, serious choice, and end up just leaving it unchecked.]

• And after all that: Be compelling to the human decision-maker
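The arithmetic behind the organ-donation answer is short. Here is a Python sketch, assuming (as the bracketed answer implies) that a fixed 85% leave the box unchecked and that everyone who does check it switches away from the default:

```python
def donor_share(default_is_donor: bool, share_leaving_unchecked: float = 0.85) -> float:
    """Share of registered donors when a fixed share of people never touch the box."""
    if default_is_donor:            # opt-out country: unchecked means donor
        return share_leaving_unchecked
    return 1.0 - share_leaving_unchecked  # opt-in country: only box-checkers donate

opt_out_rate = donor_share(default_is_donor=True)   # 0.85
opt_in_rate = donor_share(default_is_donor=False)   # about 0.15
```

The same population produces either split. Only the form’s default differs, which is the causal rationale a bare statistic would not convey.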

Page 22

Reflection: Framing Effect

Philadelphia is preparing for a Legionnaires’ disease outbreak expected to kill 600 people today. Two alternative programs to combat the disease have been proposed. The consequences of each program are as follows:

If Program A is adopted, 200 people will be saved. (72%)

If Program B is adopted, there is a 1/3 chance that all 600 will be saved, and a 2/3 chance that no lives will be saved. (28%)

If Program A’ is adopted, 400 people will die. (22%)

If Program B’ is adopted, there is a 2/3 chance that 600 will die, and a 1/3 chance that no one will die. (78%)

Program A = Program A’ and Program B = Program B’. (Percentages are the share of respondents who chose each program within its pair.)

For more information, see: Kahneman, D. and Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341-350.
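The four programs differ only in framing. Exact expected values, computed from the slide’s wording with Python’s `fractions` module, show that all four save 200 people on average:

```python
from fractions import Fraction

TOTAL = 600
# Expected number of people saved under each program, straight from the wording above.
expected_saved = {
    "A": Fraction(200),
    "B": Fraction(1, 3) * TOTAL + Fraction(2, 3) * 0,
    "A'": TOTAL - Fraction(400),
    "B'": TOTAL - (Fraction(2, 3) * TOTAL + Fraction(1, 3) * 0),
}
for program, saved in expected_saved.items():
    print(program, saved)  # each line ends in 200
```

Beyond equal expected values, A and A’ describe the identical certain outcome, and B and B’ the identical gamble. The preference reversal (72% vs 22%) comes entirely from “saved” versus “die”.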

Page 23

Conjunction Fallacy

A health survey was conducted in a representative sample of adult males in Chicago of all ages and occupations. Mr. F was included in the sample. He was selected by random chance from the list of participants.

Please rank the following statements in terms of which is most likely to be true of Mr. F. (1 = most likely to be true, 6 = least likely)

____ Mr. F smokes more than 1 cigarette per day on average.

____ Mr. F has had one or more heart attacks. (A)

____ Mr. F had a flu shot this year.

____ Mr. F eats red meat at least once per week.

____ Mr. F has had one or more heart attacks and he is over 55 years old. (A and B)

____ Mr. F never flosses his teeth.

For more information, see: Tversky, A. and Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

58% rated “A and B” more likely than A
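The rule the 58% violate follows in one line from the definition of conditional probability (for $P(A) > 0$; if $P(A) = 0$ both sides are 0). Here A is “has had one or more heart attacks” and B is “is over 55 years old”:

```latex
P(A \wedge B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

Adding the plausible detail B can only keep the probability the same or lower it, even though it makes the statement feel more likely.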

Page 24

Why there is a need for meta-logical elements (rationale and POV) to convince decision-makers

• Early hominids: pre-rational decision-makers

• Later hominids: usually rational

• Even later hominids: almost always rational

Page 25

A 67-year-old woman suffering from ICM with elevated bilirubin, a history of diabetes, a body mass index of 39.5, NYHA functional class III, mitral valve regurgitation grade (MVRG) of 2+, and no aortic valve regurgitation (AVR) is assigned to CABG surgery. RF+Cyc is consulted, and the RF (random forest statistical reasoning) component, having been trained on a large database, identifies CABG alone as the most likely treatment option, citing an odds ratio of 2.6 over the next most favorable treatment, CABG+MVA. As rationale, the Cyc (AI) component observes that the low MVRG is atypical of MVA, a surgical procedure typically reserved for patients with severe mitral regurgitation, and thus the simpler CABG procedure is preferred. However, an intraoperative transesophageal echocardiogram (TEE) suggests MVRG is 3+. Based on this, the surgical team overrides the initial recommendation without consulting RF+Cyc, opting instead for CABG+MVA. The patient dies 3 days later from complications due to surgery.

In this setting, RF+Cyc, if consulted, could have alerted the heart team to additional data that might have swayed their decision, potentially saving a life. RF+Cyc would have noted that while an MVRG of 3+ is consistent with CABG+MVA, the odds favoring CABG alone decrease only from 2.6:1 to 1.7:1 when MVRG is upstaged for this patient from 2+ to 3+, and that CABG alone offers a 20% increase in median survival compared to CABG+MVA. RF+Cyc could further argue that intraoperative MVRG can falsely appear upstaged due to altered hemodynamics in anesthetized patients. A Cyc-assisted semantic search of the recent literature reveals that transthoracic echocardiograms (TTE) reflect the degree of mitral regurgitation more reliably than TEE. That, together with the patient’s co-morbidities, argues for CABG alone.

Page 26

4 Pitfalls of Semantic Technology

• Ignorance-based: A small theory size (#terms, instances, rules)

• Static KB (massively tuned, optimized, cached ahead of time)

• Simple assertions (SAT constraints; propositional calculus; Horn clause logic; Description Logic; first order logic)

• 1 global context (no contradictions, tiny domain, simplified world)

Page 27

Applying Cyc

• Cyc is a power source, not a single application. Like oil, electricity, telephony, computers,… Cyc can spawn and sustain a knowledge utility industry.

• It can cost-effectively underlie almost all apps. (Provide a common-sense layer to reduce brittleness when faced with unexpected inputs/situations)

• To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules

Page 28

“What sequences of events could lead to the destruction of Hoover Dam?”

“Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?”

[Figure: system architecture diagram. Core: Cyc with General Knowledge, a Terrorism Knowledge Base, and Reasoning Modules. Tools: Cycorp tools for ontology-building, -browsing, -editing, and fact/rule entry (labeled “Domain Experts”); Query Formulation, Scenario Generation, and Explanation Generation (Query Formulator, Scenario Generator, Explanation Generator); others’/GOTS analysis and collaboration components; CT Analyst. Data: an interface to data repositories (border crossings, HID observations, travel records, credit card records, geopolitical data, global terrain data, weather data, satellite intel, HUMINT messages, INS data, military intel, output of COTS text extraction systems, SIGINT message content); AKB (The Analyst’s Knowledge Base) and a relational DB “projection” of the AKB; “OWL &”.]

Page 29

A more recent example

“What major US cities are particularly vulnerable to an anthrax attack?”

The answer is logically implied by data dispersed through several sources:

• USGS GNIS DB

• AMVA KB

• RAND R

• UN FAO DB

• DTRA CATS DB

Page 30

“What major US cities are particularly vulnerable to an anthrax attack?”

“major US city”: ?C is a U.S. city with >1M population

“particularly vulnerable to an anthrax attack”:

– the current ambient temperature at ?C is above freezing, and

– ?C has more than 100 people for each hospital bed, and

– the number of anthrax host animals near ?C exceeds 100k
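Once the vague phrases are defined, the query reduces to a filter. The Python sketch below assumes the facts for each city have already been gathered into one record. In the deck, that merging across several sources is exactly the hard part Cyc handles. The city names and every number are invented for illustration, and “above freezing” is read as above 0 °C.

```python
from dataclasses import dataclass

@dataclass
class CityFacts:
    name: str
    population: int
    ambient_temp_c: float             # current ambient temperature, degrees Celsius
    hospital_beds: int
    anthrax_host_animals_nearby: int

def is_major_us_city(c: CityFacts) -> bool:
    return c.population > 1_000_000

def particularly_vulnerable(c: CityFacts) -> bool:
    return (c.ambient_temp_c > 0.0                    # "above freezing" read as > 0 °C
            and c.population / c.hospital_beds > 100   # more than 100 people per bed
            and c.anthrax_host_animals_nearby > 100_000)

# Invented cities and values, for illustration only.
CITIES = [
    CityFacts("City-1", 1_200_000, 15.0, 8_000, 250_000),    # 150 people per bed
    CityFacts("City-2", 1_500_000, -5.0, 9_000, 300_000),    # below freezing
    CityFacts("City-3", 2_000_000, 20.0, 40_000, 500_000),   # only 50 people per bed
    CityFacts("City-4", 800_000, 20.0, 2_000, 500_000),      # not over 1M people
]

answers = [c.name for c in CITIES if is_major_us_city(c) and particularly_vulnerable(c)]
print(answers)  # ['City-1']
```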

Page 31

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

 state | name              | type  | county     | state_fips | primary_lat | primary_long | elevation | population | status
-------+-------------------+-------+------------+------------+-------------+--------------+-----------+------------+--------------------
 TX    | Dallas            | ppl   | Dallas     |         48 |    32.78333 |        -96.8 |       463 |    1022830 | BGN 1978 1959
 MN    | Hennepin County   | civil | Hennepin   |         27 |    45.01667 |       -93.45 |         0 |    1032431 |
 CA    | Sacramento County | civil | Sacramento |          6 |    38.46667 |   -121.31667 |         0 |    1041219 |
 AZ    | Phoenix           | ppl   | Maricopa   |          4 |    33.44833 |   -112.07333 |      1072 |    1048949 | BGN 1931 1900 1897

Page 32

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

So how do we explain to our system that:

• row 1 of that table is “about” the city of Dallas, TX

• the population field of that table contains the number of inhabitants of the city that that row is “about”

• here is exactly how to access tuples of that database

• that access will be fast, accurate, recent, complete

Page 33

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

• the population field of that table contains the number of inhabitants of the city that that row is “about”

We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.:

    (fieldDecoding Usgs-Gnis-LS ?x
      (TheFieldCalled "population")
      (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))
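What a field decoding buys can be sketched with Python’s built-in `sqlite3`. The sketch loads the four GNIS rows from the table above (only the columns it needs) into an in-memory table and turns each `population` value into a `numberOfInhabitants` assertion about the row’s referent. The function `referent_of_row`, which simply builds a readable “name, state” term, is a hypothetical stand-in for `TheReferentOfTheRow`; real Cyc mapping is far richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usgs_gnis (state TEXT, name TEXT, type TEXT, county TEXT, population INTEGER)")
conn.executemany("INSERT INTO usgs_gnis VALUES (?, ?, ?, ?, ?)", [
    ("TX", "Dallas", "ppl", "Dallas", 1022830),
    ("MN", "Hennepin County", "civil", "Hennepin", 1032431),
    ("CA", "Sacramento County", "civil", "Sacramento", 1041219),
    ("AZ", "Phoenix", "ppl", "Maricopa", 1048949),
])

# Field decoding: the 'population' field holds numberOfInhabitants of the row's referent.
FIELD_DECODINGS = {"population": "numberOfInhabitants"}

def referent_of_row(row: dict) -> str:
    """Hypothetical stand-in for TheReferentOfTheRow: a readable term for what the row is about."""
    return f"{row['name']}, {row['state']}"

def decode_rows(conn):
    cursor = conn.execute("SELECT state, name, type, county, population FROM usgs_gnis ORDER BY rowid")
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        row = dict(zip(columns, values))
        for field, predicate in FIELD_DECODINGS.items():
            yield (predicate, referent_of_row(row), row[field])

assertions = list(decode_rows(conn))
print(assertions[0])  # ('numberOfInhabitants', 'Dallas, TX', 1022830)
```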

Page 34

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

• how to access tuples of that database

We provide all the information needed for a JDBC connection script. We assert, in the context (MappingMtFn Usgs-KS), all of these:

    (passwordForSKS Usgs-KS "geografy")
    (portNumberForSKS Usgs-KS 4032)
    (serverOfSKS Usgs-KS "sksi.cyc.com")
    (sqlProgramForSKS Usgs-KS PostgreSQL)
    (structuredKnowledgeSourceName Usgs-KS "usgs")
    (subProtocolForSKS Usgs-KS "postgresql")
    (userNameForSKS Usgs-KS "sksi")
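These seven assertions are enough to assemble a standard PostgreSQL JDBC URL of the form `jdbc:postgresql://host:port/database`. In this Python sketch the assertion values are copied from the slide. The assumption that `structuredKnowledgeSourceName` names the database is this sketch’s, not necessarily how Cyc’s SKSI uses it.

```python
# Values of the slide's assertions, keyed by predicate.
SKS_ASSERTIONS = {
    "passwordForSKS": "geografy",
    "portNumberForSKS": 4032,
    "serverOfSKS": "sksi.cyc.com",
    "sqlProgramForSKS": "PostgreSQL",
    "structuredKnowledgeSourceName": "usgs",
    "subProtocolForSKS": "postgresql",
    "userNameForSKS": "sksi",
}

def jdbc_url(a: dict) -> str:
    """jdbc:<subprotocol>://<host>:<port>/<database>, assuming the source name is the database."""
    return (f"jdbc:{a['subProtocolForSKS']}://{a['serverOfSKS']}:"
            f"{a['portNumberForSKS']}/{a['structuredKnowledgeSourceName']}")

url = jdbc_url(SKS_ASSERTIONS)
credentials = (SKS_ASSERTIONS["userNameForSKS"], SKS_ASSERTIONS["passwordForSKS"])
print(url)  # jdbc:postgresql://sksi.cyc.com:4032/usgs
```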

Page 35

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

• that access will be fast, accurate, recent, complete

We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc.

We assert, in the context (MappingMtFn Usgs-KS):

    (schemaCompleteExtentKnownForValueTypeInArg Usgs-Gnis-LS USCity numberOfInhabitants 1)
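One plausible use of a completeness assertion like this is closed-world reasoning. If the source is known to list the population of every US city, absence from it justifies a “no” rather than an “I don’t know”. A Python sketch of that distinction follows. The two populations come from the GNIS table shown earlier in the deck; the function name and the three-valued result are illustrative assumptions.

```python
from typing import Optional

# Two populations taken from the GNIS table shown earlier in the deck.
US_CITY_POPULATION = {"Dallas, TX": 1022830, "Phoenix, AZ": 1048949}

def known_to_exceed_1m(city: str, extent_complete: bool) -> Optional[bool]:
    """True/False when the source settles the question; None ('unknown') otherwise."""
    if city in US_CITY_POPULATION:
        return US_CITY_POPULATION[city] > 1_000_000
    # Absent from the source: only a source declared complete licenses a 'no'.
    return False if extent_complete else None

print(known_to_exceed_1m("Dallas, TX", extent_complete=False))              # True
print(known_to_exceed_1m("Hypothetical City, ZZ", extent_complete=True))    # False
print(known_to_exceed_1m("Hypothetical City, ZZ", extent_complete=False))   # None
```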

Page 36

The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS)

USGS GNIS DB

• that access will be fast, accurate, recent, complete

We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc.

We assert, in the context (MappingMtFn Usgs-KS):

    (resultSetCardinality Usgs-Gnis-PS
      (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
      TheEmptySet
      60.0)

    (resultSetCardinality Usgs-Gnis-PS
      (TheSet
        (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
        (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
        (PhysicalFieldFn Usgs-Gnis-PS "name"))
      (TheSet
        (PhysicalFieldFn Usgs-Gnis-PS "county")
        (PhysicalFieldFn Usgs-Gnis-PS "state"))
      530.36)
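The slides do not define `resultSetCardinality`. One plausible reading of its arguments is: the expected number of distinct values of the first field set once the second field set is bound. On that reading, there are about 60 distinct states with nothing bound, and about 530 places per (county, state). A query planner could use such figures to run the most selective subgoal first. The Python sketch below computes that statistic with `sqlite3` over a tiny table of placeholder rows (S1, C1, P1, …, not real GNIS data).

```python
import sqlite3

def count_distinct(conn, table, columns):
    # Identifiers are interpolated directly: only use trusted table/column names.
    cols = ", ".join(columns)
    (n,) = conn.execute(f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})").fetchone()
    return n

def estimated_result_cardinality(conn, table, inputs, outputs):
    """Average number of distinct output tuples per distinct binding of the input fields.
    With no input fields, this is just the number of distinct output tuples."""
    pairs = count_distinct(conn, table, list(inputs) + list(outputs))
    if not inputs:
        return float(pairs)
    return pairs / count_distinct(conn, table, inputs)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gnis (state TEXT, county TEXT, name TEXT, primary_lat REAL, primary_long REAL)")
conn.executemany("INSERT INTO gnis VALUES (?, ?, ?, ?, ?)", [
    ("S1", "C1", "P1", 1.0, 1.0),
    ("S1", "C1", "P2", 2.0, 2.0),
    ("S1", "C2", "P3", 3.0, 3.0),
    ("S2", "C3", "P4", 4.0, 4.0),
])

states = estimated_result_cardinality(conn, "gnis", [], ["state"])  # 2.0
places = estimated_result_cardinality(
    conn, "gnis", ["county", "state"], ["primary_long", "primary_lat", "name"])  # 4/3
```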

Page 37

“What major US cities are particularly vulnerable to an anthrax attack?”

“major US city”: U.S. city with >1M population

“particularly vulnerable to an anthrax attack”:

– the current ambient temperature at ?C is above freezing, and

– ?C has more than 100 people for each hospital bed, and

– the number of anthrax host animals near ?C exceeds 100k

Cyc knows that pullets are chickens, so don’t add those two numbers together!
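The pullet warning is a taxonomy check applied before summing. A Python sketch, assuming a tiny hypothetical `GENLS` (child to parent) table in the spirit of Cyc’s `genls`, and invented animal counts:

```python
# Hypothetical taxonomy fragment (child -> parent), in the spirit of Cyc's genls.
GENLS = {"Pullet": "Chicken", "Chicken": "Bird", "Cattle": "Mammal"}

def is_subsumed_by(kind: str, other: str) -> bool:
    """True if `other` is a strict generalization of `kind`."""
    while kind in GENLS:
        kind = GENLS[kind]
        if kind == other:
            return True
    return False

def total_without_double_counting(counts: dict[str, int]) -> int:
    """Sum counts, skipping any category already included in a broader counted category."""
    return sum(n for kind, n in counts.items()
               if not any(is_subsumed_by(kind, other) for other in counts if other != kind))

# Invented numbers, for illustration only.
counts = {"Chicken": 120_000, "Pullet": 30_000, "Cattle": 40_000}
naive = sum(counts.values())                     # 190,000: pullets counted twice
correct = total_without_double_counting(counts)  # 160,000
```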

Page 43

“In what countries bordering Pakistan are there members of the ANVC?”

Even simple queries often require 1-4 reasoning steps

Each answer that CAE finds for this generally involves a 1-4-step (not 0-step) argument (reasoning chain):

E.g., for the answer “India”, the justification is:

• According to the web site ‘Inside Terrorism’, the ANVC’s headquarters has been in Garo Hills, India from the beginning of January, 1996 through today.

• If an organization’s HQ is in place x, then there are members of that organization in place x.

• If someone is in place x, they are in every super-region of x.

• India borders Pakistan.

Don’t include Prior & Tacit Knowledge
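The India justification above can be replayed as a short Python sketch: members are where the HQ is (rule 2), and also in every super-region of that place (rule 3), intersected with Pakistan’s neighbors (rule 4). “Meghalaya” is not on the slide. It is the Indian state containing the Garo Hills, added only to show a multi-step super-region chain. The other border facts are standard geography.

```python
HEADQUARTERS = {"ANVC": "GaroHills"}                      # from the cited web site
PART_OF = {"GaroHills": "Meghalaya", "Meghalaya": "India"}
COUNTRIES = {"India", "Pakistan", "Afghanistan", "Iran", "China"}
BORDERS = {frozenset(pair) for pair in [("India", "Pakistan"), ("Afghanistan", "Pakistan"),
                                        ("Iran", "Pakistan"), ("China", "Pakistan")]}

def places_with_members(org: str) -> list[str]:
    """Rule 2: members are where the HQ is. Rule 3: and in every super-region of it."""
    place = HEADQUARTERS[org]
    places = [place]
    while place in PART_OF:
        place = PART_OF[place]
        places.append(place)
    return places

def borders(a: str, b: str) -> bool:
    return frozenset((a, b)) in BORDERS

def countries_bordering_with_members(neighbor: str, org: str) -> list[str]:
    member_places = places_with_members(org)
    return sorted(c for c in COUNTRIES if borders(c, neighbor) and c in member_places)

print(countries_bordering_with_members("Pakistan", "ANVC"))  # ['India']
```

A “0-step” lookup would find nothing. No record states outright that the ANVC has members in a country bordering Pakistan.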

Page 44

The Cyc Knowledge Base

[Figure: map of the Cyc KB’s content areas, from the most general concepts (Thing, Intangible Thing, Individual, Temporal Thing, Spatial Thing, Partially Tangible Thing, Paths, Sets, Relations, Logic, Math) through topics such as Time, Space, Agents, Events and Scripts, Actors and Actions, Plans and Goals, Physical Objects, Human Beings, Organizations, Living Things, Social Behavior, Ecology, and Weather, to “General Knowledge about Various Domains” (e.g., Human Anatomy & Physiology; Emotion, Perception, Belief; Vehicles, Buildings, Weapons; Law; Business & Commerce; Politics, Warfare; Transportation & Logistics; Sports, Recreation, Entertainment; Natural and Political Geography) and “Specific data, facts, and observations”.]

Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions

Represented in: First Order Logic; Higher Order Logic; Context Logic; Micro-theories

These numbers are not a good way to really get a handle on the Cyc KB

Page 45

The Cyc Knowledge Base

Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions

These numbers are not a good way to really get a handle on the Cyc KB

“Is any seagull also a moose?”

If Cyc knows 10,000 kinds of animals, it should be able to answer 100,000,000 queries like that.

Option 1: Add those 100M assertions to the KB

Option 2: Add 50M disjointWith assertions instead

Option 3: Add about 10k Linnaean taxonomy assertions to the KB, plus one extra assertion: (isa BiologicalTaxon SiblingDisjointCollectionType)

If taxons A and B are not explicitly known (via those 10k assertions) to be in a subset/superset relationship, then assume that they are disjoint.

A few hundred such SiblingDisjoint assertions take the place of over 6 billion disjointness ones… which in turn take the place of 100 trillion ones like this: (not (isa Cher Moose))
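Option 3 can be sketched in Python: store only parent links, and treat any two taxa not in an ancestor/descendant relation as disjoint. The taxonomy here is a simplified, correct-as-far-as-it-goes Linnaean fragment that skips intermediate ranks. The last lines check the slide’s counts for 10,000 kinds of animals.

```python
# Simplified Linnaean fragment (child -> parent), skipping intermediate ranks.
PARENT = {
    "Moose": "Cervidae", "Cervidae": "Mammalia",
    "Seagull": "Laridae", "Laridae": "Aves",
    "Mammalia": "Chordata", "Aves": "Chordata",
}

def ancestors(taxon: str) -> list[str]:
    chain = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain

def disjoint(a: str, b: str) -> bool:
    """Sibling-disjointness default: taxa not known to be in a subset/superset
    relationship are assumed disjoint."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)

n = 10_000                                # kinds of animals
pairwise_queries = n * n                  # 100,000,000 "Is any X also a Y?" questions
disjoint_with_needed = n * (n - 1) // 2   # 49,995,000, i.e. about 50M symmetric assertions
```

The table grows with the number of taxa (one link each), while the questions it answers grow with the square of that number.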

Page 46

There is no one correct monolithic ontology.

E.g., Cyc’s 5M axioms are divided into thousands of contexts by granularity, topic, culture, geospatial place, time,...

There is a correct monolithic reasoning mechanism, but it is so deadly slow that we never call on it unless we have to.

E.g., the Cyc inference engine is a community of 1000 “agents” that attack every problem and, recursively, every subproblem (subgoal). One of these 1000 is a general theorem prover; the others have special-purpose data structures/algorithms to handle the most important, most common cases very fast.
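The division of labor can be pictured with a small Python dispatcher. Special-purpose modules handle the goals they recognize, and a general prover is the last resort. This is a sequential simplification: the modules, the tiny `FACTS`/`GENLS` knowledge base, and the stub prover are all hypothetical. Cyc’s actual agents also work recursively on subgoals, which this sketch does not show.

```python
from typing import Callable, Optional

# Tiny hypothetical KB used only by this sketch.
FACTS = {("isa", "Fido", "Dog")}
GENLS = {"Dog": "Mammal", "Mammal": "Animal"}  # collection -> more general collection

def solve_numeric(goal) -> Optional[bool]:
    """Special-purpose module: compare numbers directly, no theorem proving."""
    op, a, b = goal
    return a > b if op == "greaterThan" else a < b

def solve_isa(goal) -> Optional[bool]:
    """Special-purpose module: walk the genls chain upward from each known isa fact."""
    _, thing, target = goal
    for pred, subject, collection in FACTS:
        if pred != "isa" or subject != thing:
            continue
        while collection is not None:
            if collection == target:
                return True
            collection = GENLS.get(collection)
    return False  # closed-world simplification

def general_prover(goal) -> Optional[bool]:
    """Stand-in for the slow, fully general theorem prover; here it reports 'unknown'."""
    return None

# (applicability test, solver) pairs, tried in order; the general prover is the fallback.
MODULES: list[tuple[Callable[[tuple], bool], Callable[[tuple], Optional[bool]]]] = [
    (lambda g: g[0] in ("greaterThan", "lessThan"), solve_numeric),
    (lambda g: g[0] == "isa", solve_isa),
]

def ask(goal) -> Optional[bool]:
    for applies, solve in MODULES:
        if applies(goal):
            return solve(goal)
    return general_prover(goal)

print(ask(("isa", "Fido", "Animal")))    # True
print(ask(("greaterThan", 5, 3)))        # True
print(ask(("loves", "Fido", "Bone")))    # None
```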

Page 47

What factors argue <for/against> the conclusion that <ETA> <performed> <the March 2004 Madrid attacks>?

For:
- ETA often executes attacks near national elections
- ETA has performed multi-target coordinated attacks
- Over the past 30 years, ETA performed 75% of all terrorist attacks in Spain
- Over the past 30 years, 98% of all terrorist attacks in Spain were performed by Spain-based groups, and ETA is a Spain-based group.

Against:
- ETA warns (a few minutes ahead of time) of attacks that would result in a high number of civilian casualties, to prevent such casualties. There was no such warning prior to this attack.
- ETA generally takes responsibility for its attacks, and it did not do so this time.
- ETA has never been known to falsely deny responsibility for an attack, and it did deny responsibility for this attack.

Page 48

Building Cyc qua Engineering Task

[Figure: rate of learning plotted against amount known. Labels: “codify & enter each piece of knowledge, by hand”, “learning via natural language”, “learning by discovery”, “Frontier of human knowledge”; time marks 1984, 2004, today. CYC: 900 person-years, 23 realtime years, $90 million.]

Page 49

Page 50

Temporal Relations

37 Relations Between Temporal Things, including:

temporalBoundsIntersect, temporallyIntersects, startsAfterStartingOf, endsAfterEndingOf, startingDate, temporallyContains, temporallyCooriginating, temporalBoundsContain, temporalBoundsIdentical, startsDuring, overlapsStart, startingPoint, simultaneousWith, after
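To make a few of these relation names concrete, here is a sketch that reads them as comparisons between closed intervals with numeric endpoints. This reading is an assumption for illustration: Cyc's own definitions also cover open-ended intervals, time points, and granularity, and may differ in detail. The function names are Python renderings of the predicate names, and the example intervals are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float  # assumes start <= end; both endpoints included


def temporally_intersects(a: Interval, b: Interval) -> bool:
    return a.start <= b.end and b.start <= a.end


def temporally_contains(a: Interval, b: Interval) -> bool:
    return a.start <= b.start and b.end <= a.end


def starts_after_starting_of(a: Interval, b: Interval) -> bool:
    return a.start > b.start


def ends_after_ending_of(a: Interval, b: Interval) -> bool:
    return a.end > b.end


def temporally_cooriginating(a: Interval, b: Interval) -> bool:
    return a.start == b.start


def starts_during(a: Interval, b: Interval) -> bool:
    return b.start <= a.start <= b.end


# Hours of one day (hypothetical example data).
afternoon = Interval(14, 17)
talk = Interval(14, 15)
dinner = Interval(18, 20)

print(temporally_contains(afternoon, talk))       # True
print(temporally_cooriginating(afternoon, talk))  # True
print(ends_after_ending_of(afternoon, talk))      # True
print(temporally_intersects(talk, dinner))        # False
print(starts_after_starting_of(dinner, talk))     # True
```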

Page 51

Temporal Relations

“Ariel Sharon was in Jerusalem during 2005 with granularity calendar-week”

“Condoleezza Rice made a ten-day trip to Jerusalem in February of 2005”

Both of them were in Jerusalem during February 2005

Page 52

• Rather than struggling to reason in natural language sentences, use logic as the representation language.

• Most knowledge is default; reason by argumentation

• Rather than striving in vain for a single fast inference engine, use a suite of 1000+ heuristic modules that each handles a class of commonly-occurring problems very fast. [EL/HL split: Epistemological Level vs. Heuristic Level]

• Some of these HL modules act as tacticians (meta-reasoners) to guide the reasoning; a few are strategists (meta-meta-reasoners)

• Bridging the knowledge gap: do the “intermediate theories.”

• Probabilities / certainty factors are useful (risk: overdependence)

• Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally-consistent contexts

Lessons Learned

Page 53

Each assertion should be situated in a context: in a region of context-space

• We identified 12 dimensions of mt-space

• We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions; and

• We have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts

– E.g., if P is true in C1, and P=>Q is true in C2, in what context(s) can Q be validly concluded?

• Anthropacity
• Time
• GeoLocation
• TypeOfPlace
• TypeOfTime
• Culture
• Sophistication/Security
• Topic
• Granularity
• Modality/Disposition/Epistemology
• Argument-Preference
• Justification
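One simple answer to the C1/C2 question can be sketched under an assumption: contexts form a specialization hierarchy (in the spirit of Cyc's genlMt links) in which a context sees every assertion made in the contexts above it. Modus ponens is then safe in any context that sees both C1 and C2. The context names and the `GENL_MT` table are hypothetical, and a real calculus over the 12 dimensions is far richer than this.

```python
# Hypothetical context hierarchy: context -> the more general contexts it inherits from.
GENL_MT = {
    "BaseKB": set(),
    "MedicalMt": {"BaseKB"},
    "CardiologyMt": {"MedicalMt"},
    "US-In2005Mt": {"BaseKB"},
    "USCardiologyIn2005Mt": {"CardiologyMt", "US-In2005Mt"},
}


def visible_from(ctx: str) -> set[str]:
    """All contexts whose assertions are visible in `ctx` (including `ctx` itself)."""
    seen, stack = set(), [ctx]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(GENL_MT[c])
    return seen


def where_q_follows(c1: str, c2: str) -> set[str]:
    """Contexts in which P (asserted in c1) and P=>Q (asserted in c2) are both
    visible, so Q can be concluded there by modus ponens."""
    return {c for c in GENL_MT if {c1, c2} <= visible_from(c)}


print(sorted(where_q_follows("CardiologyMt", "US-In2005Mt")))
# ['USCardiologyIn2005Mt']
print(sorted(where_q_follows("BaseKB", "MedicalMt")))
# ['CardiologyMt', 'MedicalMt', 'USCardiologyIn2005Mt']
```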

Page 54

Mathematical Factoring of Context-space Dimensions

UnitedStatesIn1985Context: Ronald Reagan is president.

PennsylvaniaIn1985Context: Dick Thornburgh is governor.

LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president.

This inference depends on the time, space, and respective granularities of the contexts.

There are at least 900,000 doctors.

Dick Thornburgh is governor and there are at least 900,000 doctors.
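The lifting on this slide can be sketched under simplifying assumptions: each context is a (region, time span) pair, regions form a containment tree, and each assertion is flagged as either holding throughout every sub-region (like who holds an office) or only as an aggregate over its whole region (like a count of doctors). Placing the doctor count in the national 1985 context is an assumption of this sketch, as are the data structures and the `aggregate` flag; they illustrate why granularity matters, not how Cyc encodes it.

```python
from dataclasses import dataclass

# Hypothetical region containment: region -> enclosing region.
INSIDE = {"LehighCounty": "Pennsylvania", "Pennsylvania": "UnitedStates"}


def region_within(inner: str, outer: str) -> bool:
    while inner != outer:
        if inner not in INSIDE:
            return False
        inner = INSIDE[inner]
    return True


@dataclass(frozen=True)
class Context:
    region: str
    start: tuple  # (year, month)
    end: tuple    # (year, month), inclusive


@dataclass(frozen=True)
class Assertion:
    text: str
    context: Context
    aggregate: bool = False  # True if it describes the region only as a whole


def lift(assertions, target: Context) -> list[str]:
    """Assertions that may be carried into the (smaller, shorter) target context."""
    lifted = []
    for a in assertions:
        src = a.context
        covers = (region_within(target.region, src.region)
                  and src.start <= target.start and target.end <= src.end)
        # Aggregate facts hold for the source region as a whole, not for its parts.
        if covers and (not a.aggregate or target.region == src.region):
            lifted.append(a.text)
    return lifted


US_1985 = Context("UnitedStates", (1985, 1), (1985, 12))
PA_1985 = Context("Pennsylvania", (1985, 1), (1985, 12))
LEHIGH_FEB_1985 = Context("LehighCounty", (1985, 2), (1985, 2))

KB = [
    Assertion("Ronald Reagan is president.", US_1985),
    Assertion("Dick Thornburgh is governor.", PA_1985),
    Assertion("There are at least 900,000 doctors.", US_1985, aggregate=True),
]

print(lift(KB, LEHIGH_FEB_1985))
# ['Ronald Reagan is president.', 'Dick Thornburgh is governor.']
```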

Page 55

Time Indices and Granularities

Doug is talking, at 14:00-15:00, on 4 May 2009.

Therefore Cyc should infer (as a default):

Doug is talking, at 14:42-14:47, on 4 May 2009.

But should remain noncommittal about:

Doug is talking, at 14:42:09, on 4 May 2009.

Page 56

Time Indices and Granularities

P = Doug is talking.

Doug is talking, at 14:00 to 15:00, on 4 May 2009, with temporal granularity 1 calendar minute.

t = that one-hour interval

t’ = a continuous 15-min. sub-interval

[Figure: timeline from Past to Future divided into calendar minutes, with t and t’ marked.]

So: Talking during each 15-minute interval? Yes

Talking during each 2-second interval? Unknown
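The default on these two slides can be sketched as follows, assuming one reading of “with temporal granularity 1 calendar minute”: P holds at some moment within every calendar minute of t. Under that reading, P is guaranteed during a sub-interval t’ exactly when t’ lies inside t and contains at least one whole calendar minute; otherwise the answer stays unknown (`None`). Times are seconds after midnight, intervals are half-open, and `holds_during` is a hypothetical name.

```python
from typing import Optional

CALENDAR_MINUTE = 60  # seconds


def hms(h: int, m: int = 0, s: int = 0) -> int:
    """Seconds after midnight."""
    return h * 3600 + m * 60 + s


def holds_during(sub_start: int, sub_end: int, t_start: int, t_end: int,
                 granule: int = CALENDAR_MINUTE) -> Optional[bool]:
    """True if P must hold somewhere in [sub_start, sub_end); None if unknown."""
    if not (t_start <= sub_start <= sub_end <= t_end):
        return None  # outside the interval we have information about
    first_boundary = -(-sub_start // granule) * granule  # round up to a minute boundary
    if first_boundary + granule <= sub_end:
        return True  # t' contains a whole calendar minute, during which P holds
    return None      # too short or misaligned: remain noncommittal


T = (hms(14), hms(15))  # Doug is talking, 14:00-15:00, granularity 1 calendar minute

print(holds_during(hms(14, 42), hms(14, 47), *T))         # True
print(holds_during(hms(14, 20), hms(14, 35), *T))         # True (a 15-minute span)
print(holds_during(hms(14, 42, 9), hms(14, 42, 11), *T))  # None (2 seconds: unknown)
```

Any 15-minute span contains a whole calendar minute, which is why the slide can answer “Yes” for every such span.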

Page 57

Relations Between an Event and its Participants

performedBy, causes-EventEvent, objectPlaced, objectOfStateChange, outputsCreated, inputsDestroyed, assistingAgent, beneficiary, fromLocation, toLocation, deviceUsed, driverActor, damages, vehicle, providerOfMotiveForce, transportees

Over 400 more.

Page 58

“In” in Our Geospatial Ontology

• We started in 1984 with just one binary predicate, “in”.

• in(X,Y) means the inner object X is spatially located in the region defined by the outer object Y.

• If I just tell you in(X,Y), and you aren’t told what X and Y are, then you (and Cyc) can’t answer questions like these:
– From the outside of Y, can I see any part of X?
– If I turn Y over and shake it, will X fall out?
– Is there room to put more things in Y?
– Is X actually a part of Y?

• Such failures led to our introducing new, more precise, more specialized versions of “in”. By now there are over 75 such predicates, organized in a graphical taxonomy.
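A sketch of why the specialized predicates help, using hypothetical predicate names and answers (Cyc's actual 75+ predicates and their definitions are not reproduced here). Each specialization commits to answers for some of the questions above, bare `in` leaves them all unknown, and a specialization inherits the commitments of the more general predicate it refines.

```python
from dataclasses import dataclass, field
from typing import Optional

SHAKE = "If I turn Y over and shake it, will X fall out?"
PART = "Is X actually a part of Y?"


@dataclass
class InPredicate:
    name: str
    genl: Optional["InPredicate"] = None         # the more general version of "in"
    answers: dict = field(default_factory=dict)  # question -> True/False

    def answer(self, question: str) -> Optional[bool]:
        """Use the most specific commitment available; None means 'cannot tell'."""
        pred = self
        while pred is not None:
            if question in pred.answers:
                return pred.answers[question]
            pred = pred.genl
        return None


# Every name below except "in" is hypothetical, not one of Cyc's actual predicates.
IN = InPredicate("in")
IN_OPEN_CONTAINER = InPredicate("in-OpenContainer", IN, {SHAKE: True, PART: False})
IN_CLOSED_CONTAINER = InPredicate("in-ClosedContainer", IN, {SHAKE: False, PART: False})
IN_AS_COMPONENT = InPredicate("in-AsComponent", IN, {PART: True})
IN_SEALED_CONTAINER = InPredicate("in-SealedContainer", IN_CLOSED_CONTAINER)

for pred in (IN, IN_OPEN_CONTAINER, IN_SEALED_CONTAINER, IN_AS_COMPONENT):
    print(pred.name, pred.answer(SHAKE), pred.answer(PART))
# in None None
# in-OpenContainer True False
# in-SealedContainer False False
# in-AsComponent None True
```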

Page 59

Propositional Attitudes: Relations Between Agents and Propositions

• goals, intends, desires, hopes, expects, believes

• opinesThat, knowsThat, remembersThat, perceivesThat, seesThat, fearsThat

Most of these are modal; assertions using them go beyond first-order logic.

Page 60

Handcrafted Cyc KB, represented in: First Order Logic, Higher Order Logic, Context Logic, Microtheories

[Figure: layered diagram of the Cyc KB, from the most general concepts (Thing, IntangibleThing, Individual, TemporalThing, SpatialThing, PartiallyTangibleThing) down through “Real World Domain Knowledge” to “Specific cases, facts, details, …”. Areas shown: Paths; Sets, Relations; Logic, Math; Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion, Perception, Belief; Human Behavior & Actions; Products, Devices; Conceptual Works; Vehicles, Buildings, Weapons; Mechanical & Electrical Devices; Software, Literature, Works of Art; Language; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations, Governments, Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics, Warfare; Professions, Occupations; Purchasing, Shopping; Travel, Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports, Recreation, Entertainment; Artifacts; Movement; State Change, Dynamics; Materials, Parts, Statics; Physical Agents; Borders, Geometry; Events, Scripts; Spatial Paths; Actors, Actions; Plans, Goals; Time; Agents; Space; Physical Objects; Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology; Natural Geography; Earth & Solar System; Political Geography; Weather.]

Cyc contains: 15,000 Predicates, 500,000 Concepts, 5,200,000 Assertions

The pump has been primed: use it as an inductive bias to power more automatic knowledge acquisition.

Page 61

Automated Knowledge Acquisition

AKA by Shallow Fishing

(foundingDate AbuSayyaf ?X)

Search Strings
• Abu Sayyaf was founded in ___
• Al Harakat Islamiya, established in ___
• ASG was established in ___

Abu Sayyaf was founded in the early 1990s

Parse

(foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))
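The fishing loop can be sketched as three steps: turn the open query into search strings (one per known name for the entity), find text that completes a string, and hand the completion to a parser. In this sketch the pairing of every name with every template, the prefix matching (which only handles strings whose blank comes last), and the tiny `parse_date_phrase` are hypothetical stand-ins. A real system would use a search engine and Cyc's own parser.

```python
import re

# Names and paraphrase templates taken from the slide's search strings.
NAMES = ["Abu Sayyaf", "Al Harakat Islamiya", "ASG"]
TEMPLATES = ["{name} was founded in ___",
             "{name}, established in ___",
             "{name} was established in ___"]


def search_strings(names, templates):
    return [t.format(name=n) for n in names for t in templates]


def fish(sentence, search_string):
    """Return the words that fill the trailing blank, or None if the sentence does not match."""
    prefix = search_string.replace("___", "").strip()
    if not sentence.startswith(prefix):
        return None
    return sentence[len(prefix):].strip().rstrip(".")


def parse_date_phrase(phrase):
    """Toy parser (hypothetical): 'the early 1990s' -> '(EarlyPartFn (DecadeFn 199))'."""
    m = re.fullmatch(r"(?:the )?(early )?(\d{3})0s", phrase)
    if m is None:
        return None
    decade = f"(DecadeFn {m.group(2)})"
    return f"(EarlyPartFn {decade})" if m.group(1) else decade


def candidate_assertions(constant, sentences):
    results = []
    for sentence in sentences:
        for s in search_strings(NAMES, TEMPLATES):
            filler = fish(sentence, s)
            value = parse_date_phrase(filler) if filler is not None else None
            if value is not None:
                results.append(f"(foundingDate {constant} {value})")
    return results


print(candidate_assertions("AbuSayyaf", ["Abu Sayyaf was founded in the early 1990s."]))
# ['(foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))']
```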

Page 62

Automated Knowledge Acquisition

AKA by Shallow Fishing

(height EiffelTower ?x)

Search Strings
• The height of the Eiffel Tower is ___
• The Eiffel Tower is ___ tall

The height of the Eiffel Tower is 36 feet
The height of the Eiffel Tower is 984 feet

Parse

(height EiffelTower (Foot 36))
(height EiffelTower (Foot 984))
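Fishing can return conflicting candidates, as it does here. One way to use existing knowledge as a filter is sketched below. The constraint is a hypothetical stand-in (a lower bound of 100 feet for the tower's height); the slide does not say which knowledge Cyc actually applies to reject the 36-foot reading.

```python
import re

# The two fished sentences from the slide.
CANDIDATES = [
    "The height of the Eiffel Tower is 36 feet",
    "The height of the Eiffel Tower is 984 feet",
]

# Hypothetical lower bound standing in for whatever KB knowledge rules out implausible values.
MIN_PLAUSIBLE_FEET = 100


def parse_height(sentence):
    """Extract the number of feet, or None if the sentence has no '<n> feet' phrase."""
    m = re.search(r"is (\d+(?:\.\d+)?) feet", sentence)
    return float(m.group(1)) if m else None


def plausible(feet):
    return feet is not None and feet >= MIN_PLAUSIBLE_FEET


kept = [f"(height EiffelTower (Foot {h:g}))"
        for h in map(parse_height, CANDIDATES) if plausible(h)]
print(kept)  # ['(height EiffelTower (Foot 984))']
```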

Page 63

WWW.CYC.COM

Page 64

Page 65

Page 66

Page 67

Recent/Future AKB Directions

• Make it comprehensive (13% → 100%); apply it to other domains.
• Make it easier for SMEs to enter/vet/modify info.
• Improve the automatic acquisition (parsing / fishing from unstructured texts; SKSI to structured sources, incl. SPARQL).
• Make it easier for end users to pose questions:
– Automatically select (a small superset of) the relevant fragments
– Use semantic constraints (argIsa, disjointness, domain knowledge…) to combine the relevant fragments into a meaningful logical query
• Make justifications more terse and more compelling.
• Speed up inference (in general; and for AKB entry and AKB query-answering).
• Graceful degradation [halfway between QA and Google]: fall back on Semantic Search of automatically tagged documents (tagged with Cyc terms).

Page 68

Developing a Cyc App

• Extend Cyc’s KB
– Augment its ontology
– New assertions involving those new terms

• New Heuristic Level modules
– Identify the need(s) for them
– Design, build, and debug them

• New interface modules
– For manual entry; for SKSI mapping; for end users
– Domain-specific interfaces (e.g., sketching military unit movements; drawing chemical formulae; etc.)

Page 69

OpenCyc: Open Source release of [most of] the Cyc Ontology + Simple Relations + Inference Engine

ResearchCyc: Almost All of Cyc (free, for R&D purposes)

Page 70

The Ontology

Pre-existing general medical knowledge framework: Prior to the CCF project, Cyc’s KB had 184 specializations of MedicalCareEvent:

Ablation, Ligation, CoronaryArteryBypassGraft, Biopsy-SurgicalProcedure, TrephiningSomeone, Prostatectomy, RoboticSurgery, OutpatientSurgery, InpatientSurgery, LiposuctionSurgery, RemovalOfUniqueBodyPart, Appendectomy, Tonsillectomy, GumSurgery, SurgicalTreatment, TransplantSurgery, HeartTransplantSurgery, GeneralSurgery, MajorSurgery, OpenHeartSurgery, RootCanalSurgery, VaccinationEvent, BoosterVaccinationEvent, AnthraxMilitaryVaccinationScript, MedicalTesting

Page 71

The Ontology

Pre-existing general medical knowledge framework: Prior to the CCF project, Cyc’s KB had 350+ specializations of AilmentCondition:

AttentionDeficitDisorder Glaucoma SpinalStenosis SleepDeprivation Ache-AilmentCondition Migraine Hemorrhaging-TheCondition Jaundice ParasiticAilment BacillaryAngiomatosis Cryptosporidiosis Rickettsiosis EpidemicTyphus-NAmerica ArthropodInfestation ExternalArthropodInfestation InternalArthropodInfestation Trichinosis Schistosomiasis Ascariasis BladderFlukeInfestation

Atherosclerosis MultiplePersonalityDisorder Adenomyosis Scabies AmyotrophicLateralSclerosis Scoliosis Hypoglycemia TemproMandibularJointSyndrome AcetylcholinePoisoning CadmiumPoisoning CarbonMonoxidePoisoning FoodborneBotulism InhalationalBotulism WoundBotulism InfantBotulism Endometriosis Neuralgia Sciatica Diverticulitis Gout MacularDegeneration

Page 72

The Ontology

Pre-existing general medical knowledge framework: Prior to the CCF project, Cyc’s KB had 200+ specializations of Bacterium:

StreptococcusPneumoniae, StreptococcusPyogenes, Bacillaceae-Family, Bacillus-Genus, BacillusCereus-Species, Monotrichous, Bacterium-Monotrichous, Peritrichous, Bacterium-Peritrichous, Amphitrichous, Bacterium-Amphitrichous, Tenericutes-Division, Mollicutes-Class, Anaeroplasmataceae-Family, Asteroplasma-Genus, Acholeplasmatales-Order, Acholeplasmataceae-Family, Acholeplasma-Genus, Phytoplasma-Genus, Eperythrozoon-Genus, Mycoplasmatales-Order, Mycoplasmataceae-Family, Mycoplasma-Genus, MycoplasmaPneumoniae-Species, Spirillales-Order, Vibrionaceae-Family, Vibrio-Genus, VibrioCholerae-Species

Page 73

The Ontology

Hundreds of pre-existing relevant relationships

General Role Predicates: objectActedOn, eventOccursAt, dateOfEvent, objectPlaced, objectRemoved, deviceUsed

Medical-domain-specific relations: infectionCausedByOrganism, infectingPathogen, patientTreated, deviceTypeTreatsConditionType, causeOfDeathTypeOfType, formOfDisease, ailmentTypeAffects, ailmentEpidemicType, ailmentAcquiredBy, ailmentTypicallyAcquiredBy, indicatedDrug, mortalityRiskForCondition, survivalRate, riskOfInfectionFromTypeToType

Page 74

The Ontology: Methodology

• Establish bridging (translation) rules.
• Define rules that allow users to associate patients, dates, locations, etc. with the various events, e.g., define patientTreated as a relationship between a medical event and a patient.
• Define rules that allow users to easily express complicated logical conditions, e.g., the defining rules for PrimarySurgery, isolatedProcedureOfType, concomitantProcedures, etc.
• Define concise vocabulary for constructions that are complicated or difficult to express, e.g., “aortic valve replacement” is represented as a single non-atomic term. This allows the user to specify this very common procedure with a single fragment instead of the three distinct fragments needed in the CCF ontology (which in turn is a consequence of the CCF representation having no explicit construct for functional term composition).

Page 75

Typical Query for an Outcomes Study

The examples in this presentation were short, simple, “Medical English” queries; the ones focused on while building the application, and now that it is actually being used at CCF, are much larger, e.g.:

IDENTIFY PATIENT POPULATION:

• FIND all native aortic valve replacements performed at CCF between January 1, 2000 and December 31, 2004 with a pre-operative diagnosis, as determined by echocardiogram, of moderately severe or severe aortic stenosis and moderate to severe left ventricular impairment.

• INCLUDE operations in which concomitant primary CABG or concomitant mitral or tricuspid valve repair was performed.

• EXCLUDE all patients with any prior valve repair or replacement; or with concomitant pulmonary valve repair; or with concomitant mitral, tricuspid, or pulmonary valve replacement; or with aortic regurgitation greater than moderate degree.

Page 76

Researchers and clinicians sometimes ask the same queries

“Are there cases in the last decade where patients had pericardial aortic valves inserted in the reverse position, to serve as mitral valve replacements, and how often in such cases did endocarditis or tricuspid valve infection develop, and how long after the procedure?”


Page 77

• Get a large set of use-cases (CCF task: the last 900 queries)

• Arrange them into maximally mutually-dissimilar classes

• Manually represent a couple from each of those buckets
– Reveals most of the necessary new predicates (+ interfaces)

• Now go through each of the use-cases, trolling for new domain-specific terms to add to the ontology

– Can be done manually, but we are beginning to rely more on semi-automatic methods where the system itself helps with that process

– As appropriate, lexify the terms and/or align them to existing standards

• Run exemplars from each bucket (i.e., to completion)
– Tracer bullets to reveal necessary new rules, reasoning modules (+ interfaces)

• Replace the largest bucket by 2-4 specializations, and recur (i.e., repeat the preceding 3 steps, and this one, again) until there is no new gain

Page 78

• Test the system on previously-unseen use-cases (or at least ones which were not among those previously selected from their bucket)

• Have users try to use the system, and watch them (their results, of course, but also, to the extent possible, their time-feature trajectory)
– Which features did they rarely or never use (to good effect)?

– Which features did they make heavy use of?

– Independent of this, ask them for their feedback and suggestions

– Try to identify classes of users which will translate into classes of documentation and training materials/regimes/interface specifics

• All along, identify what elements of the ontology (if any) are proprietary, and assimilate everything else into future versions of OpenCyc and ResearchCyc

Page 79

Page 80

(implies
  (and
    (cCFhasLeftAtriumDiameter ?EVT ?D)
    (greaterThan ?D ((Centi Meter) 3.8))
    (patientTreated ?EVT ?PAT)
    (patientSex ?PAT FemaleHuman)
    (rdf-type ?EVT ?TYPE)
    (genls ?TYPE CCF-Evaluation))
  (isa ?EVT EvaluationThatIndicates-LeftAtrialEnlargement))
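To show what the rule checks, here is the same condition applied to plain records. The record fields, the sample records, and the `CCF_EVALUATION_TYPES` set (a stand-in for the genls lookup) are hypothetical; only the 3.8 cm threshold, the female-patient condition, and the CCF-Evaluation type test come from the rule itself.

```python
from dataclasses import dataclass

# Hypothetical stand-in for (genls ?TYPE CCF-Evaluation); genls is reflexive,
# so CCF-Evaluation itself qualifies.
CCF_EVALUATION_TYPES = {"CCF-Evaluation", "CCF-EchoEvaluation"}


@dataclass
class Evaluation:
    event_id: str
    event_type: str                # plays the role of (rdf-type ?EVT ?TYPE)
    left_atrium_diameter_cm: float  # plays the role of (cCFhasLeftAtriumDiameter ?EVT ?D)
    patient_sex: str               # "female" or "male", standing in for (patientSex ?PAT ...)


def indicates_left_atrial_enlargement(ev: Evaluation) -> bool:
    """Mirror of the rule's antecedent; True means the event gets the rule's conclusion."""
    return (ev.event_type in CCF_EVALUATION_TYPES
            and ev.left_atrium_diameter_cm > 3.8
            and ev.patient_sex == "female")


# Made-up sample records.
records = [
    Evaluation("e1", "CCF-EchoEvaluation", 4.1, "female"),
    Evaluation("e2", "CCF-EchoEvaluation", 3.5, "female"),
    Evaluation("e3", "CCF-Evaluation", 4.1, "male"),
]
print([ev.event_id for ev in records if indicates_left_atrial_enlargement(ev)])  # ['e1']
```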

Page 81

1784 pieces of pre-existing (prior to this project) Cyc KB knowledge used while handling a typical query. E.g.:

Inferred disjointness constraint: (disjointWith PericardialWindow-SurgicalProcedure MedicalPatient)

Justification [we are “counting” each of these assertions in the total]:
(genls PericardialWindow-SurgicalProcedure PericardialProcedure-Surgical) in UniversalVocabularyMt
(genls PericardialProcedure-Surgical CardiacProcedure-Surgical) in UniversalVocabularyMt
(genls CardiacProcedure-Surgical SurgicalProcedure) in UniversalVocabularyMt
(genls SurgicalProcedure MedicalCareEvent) in BaseKB
(genls MedicalCareEvent PhysicalSituation) in BaseKB
(genls PhysicalSituation Situation-Localized) in UniversalVocabularyMt
(genls Situation-Localized Situation) in UniversalVocabularyMt
(disjointWith SpatialThing-NonSituational Situation) in BaseKB
(genls EnduringThing-Localized SpatialThing-NonSituational) in UniversalVocabularyMt
(genls Agent-NonGeographical EnduringThing-Localized) in UniversalVocabularyMt
(genls EmbodiedAgent Agent-NonGeographical) in UniversalVocabularyMt
(genls PerceptualAgent-Embodied EmbodiedAgent) in UniversalVocabularyMt
(genls Animal PerceptualAgent-Embodied) in UniversalVocabularyMt
(genls MedicalPatient Animal) in UniversalVocabularyMt
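The justification above can be replayed mechanically. The procedure climbs the genls links from each collection, then looks for a disjointWith assertion between any pair of ancestors. This sketch uses exactly the assertions listed on this slide (microtheory labels dropped). It ignores everything else Cyc does, such as indexing and choosing among reasoning modules.

```python
# The slide's genls assertions as (specialization, generalization) pairs.
GENLS = [
    ("PericardialWindow-SurgicalProcedure", "PericardialProcedure-Surgical"),
    ("PericardialProcedure-Surgical", "CardiacProcedure-Surgical"),
    ("CardiacProcedure-Surgical", "SurgicalProcedure"),
    ("SurgicalProcedure", "MedicalCareEvent"),
    ("MedicalCareEvent", "PhysicalSituation"),
    ("PhysicalSituation", "Situation-Localized"),
    ("Situation-Localized", "Situation"),
    ("EnduringThing-Localized", "SpatialThing-NonSituational"),
    ("Agent-NonGeographical", "EnduringThing-Localized"),
    ("EmbodiedAgent", "Agent-NonGeographical"),
    ("PerceptualAgent-Embodied", "EmbodiedAgent"),
    ("Animal", "PerceptualAgent-Embodied"),
    ("MedicalPatient", "Animal"),
]
DISJOINT_WITH = [("SpatialThing-NonSituational", "Situation")]


def all_genls(collection):
    """The collection plus every collection reachable by following genls links upward."""
    result, frontier = {collection}, [collection]
    while frontier:
        c = frontier.pop()
        for spec, gen in GENLS:
            if spec == c and gen not in result:
                result.add(gen)
                frontier.append(gen)
    return result


def disjoint(a, b):
    """True if some disjointWith assertion separates an ancestor of a from an ancestor of b."""
    up_a, up_b = all_genls(a), all_genls(b)
    return any((x in up_a and y in up_b) or (x in up_b and y in up_a)
               for x, y in DISJOINT_WITH)


print(disjoint("PericardialWindow-SurgicalProcedure", "MedicalPatient"))  # True
```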

Page 82

Ideas for NLM Grand Challenges

• Comprehensive Ontology of Medicine
– Ties to terminological standards (SNOMED, ICD…), lexical ones (WordNet), conceptual ones (Cyc)
– Knowledge about/involving the concepts
• Contextualized for time, source, level of detail, …

• Sample sub-project: multicultural English-to-English translation

• English-to-English “translation”
– Using the above ontology of medicine, and models of discourse, models of classes of users (by age, occupation, etc.), models of individual users (built up over time and stored HIPAA-securely)

– Translate articles, web pages, medicine bottle labels, etc. into comprehensible form for that user
• In some cases this means literally writing more text, expanding its length, or paring it down (eliminating what the user already knows)

• In less clear cases (where the user might or might not already know some piece of information), the best way to expand the original text might be to add footnotes containing the borderline information, and to pare down the original text by relegating borderline material to footnote form

– The translations needn’t just be static; they can sync with the user’s calendars, cell phones, computers, etc., to provide reminders, proactively send them relevant news articles or new warnings, and so on

• Automated Clinical/Biomedical Discovery
– Hypothesis formation, Experiment design, Data gathering, Analysis, New terms & hypotheses
