
CYC: Lessons Learned in Large-Scale Ontological Engineering

Dr. Douglas B. Lenat, 3721 Executive Center Drive, Suite 100, Austin, TX 78731
Email: Lenat@cyc.com
Phone: (512) 342-4001
Fax: (512) 342-4040

2 July 2005
November 17, 2005


What Led to Cyc?

1. Programs need general world knowledge, and commonsense, to break the “brittleness bottleneck”

NL understanding, speech understanding, robotics, learning, expert systems, search,…

2. We know enough to do this; it is more an engineering task than a scientific research task.

3. The time was right (1984).


ELIZA (DOCTOR)

1965 Joe Weizenbaum, MIT

• Carl Rogers-like reflection
  Patient: “I swear a lot.”
  “How do you feel about the fact that you swear a lot?”

• Count on patient-to-doctor respect/awe/…
  Patient: “My dog’s mother died recently.”
  “Tell me more about your mother.”
  What the program “hears”: “blah blah blah mother blah blah blah”
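The keyword-and-reflection trick behind both exchanges fits in a few lines. This is a minimal illustration, not Weizenbaum's actual DOCTOR script: the pronoun table, the `mother` keyword rule, and the fallback reply are all hypothetical. The keyword rule fires on any mention of “mother”, which is exactly the brittleness the dog example exposes.

```python
import re

# Hypothetical pronoun swaps used to "reflect" the patient's words back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    # Keyword rule: any mention of "mother" triggers the family script,
    # no matter whose mother it is.
    if re.search(r"\bmother\b", text, re.IGNORECASE):
        return "Tell me more about your mother."
    # Reflection rule: "I <X>" becomes "How do you feel about the fact that you <X>?"
    match = re.match(r"i (.+)", text, re.IGNORECASE)
    if match:
        return f"How do you feel about the fact that you {reflect(match.group(1))}?"
    return "Please go on."


print(respond("I swear a lot."))
print(respond("My dog’s mother died recently."))
```

Nothing in the program represents whose mother is meant, so the second call returns “Tell me more about your mother.”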


40 years after ELIZA: ALICE

• JUDGE11 [9:26:0] I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS
• PROGRAM [9:26:01] Do you think your plan will succeed?
• JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
• PROGRAM [9:26:16] WHAT COLOR IS a blue car
• JUDGE11 [9:26:35] WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE
• PROGRAM [9:26:35] Where it belongs. Try searching the World Wide Web.


Is Natural Language a Good Representation?

• + There’s a lot of it
• + Everyone understands it
• + You can index and search it rapidly, using keywords
  – Boolean combinations of keywords
  – Synonyms, hyponyms, hypernyms,… of keywords
• - There are a lot of different languages
• - Meanings vary (era, place, age group…)
• - Often the analyst’s query requires finding and combining n pieces of data
• - Can be inefficient

[Figure labels: Arithmetic, Logic]


Carol and Sam begat Sara and Fred. Fred and Jane begat

Ethan, Elaine, and Edward. John and Sara begat Steven,

Mary, and Seth. Ann and Andy begat Sue and Bob. But

then Sara cleaved not to John and with Bob begat Joan.

Is Edward an ancestor or descendant of Sue?

[Figure: the same relationships drawn as a family tree: Carol -- Sam → Sara, Fred; Fred -- Jane → Ethan, Elaine, Edward; John -- Sara → Steven, Mary, Seth; Ann -- Andy → Sue, Bob; Sara -- Bob → Joan.]
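Once the story is turned into explicit parent links, the question becomes a graph search. A minimal Python sketch (the data layout and function names are ours, not the talk's). Edward turns out to be neither an ancestor nor a descendant of Sue. They are connected only through Joan, who is Edward's first cousin (through Sara) and Sue's niece (through Bob).

```python
from collections import deque

# (parents, children) transcribed from the story above.
UNIONS = [
    (("Carol", "Sam"), ("Sara", "Fred")),
    (("Fred", "Jane"), ("Ethan", "Elaine", "Edward")),
    (("John", "Sara"), ("Steven", "Mary", "Seth")),
    (("Ann", "Andy"), ("Sue", "Bob")),
    (("Sara", "Bob"), ("Joan",)),
]

# Index: person -> set of their children.
CHILDREN = {}
for parents, kids in UNIONS:
    for parent in parents:
        CHILDREN.setdefault(parent, set()).update(kids)


def is_ancestor(a: str, b: str) -> bool:
    """True if a is a (strict) ancestor of b: breadth-first search down from a."""
    queue, seen = deque([a]), {a}
    while queue:
        person = queue.popleft()
        for child in CHILDREN.get(person, ()):
            if child == b:
                return True
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


print(is_ancestor("Edward", "Sue"), is_ancestor("Sue", "Edward"))  # False False
```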


Five friends get together to play 5 doubles matches, with a different group of 4 players each time. The sums of the ages of the players for the different matches are 124, 128, 130, 136 and 142 years. What is the age of the youngest player?

v+w+x+y = 124

v+w+x+z = 128

v+w+y+z = 130

v+x+y+z = 136

w+x+y+z = 142
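Adding all five equations solves the system at once. Each player sits out exactly one match, so each age appears in exactly four of the five sums:

```latex
\begin{aligned}
4(v+w+x+y+z) &= 124+128+130+136+142 = 660
  \;\Rightarrow\; v+w+x+y+z = 165\\
z &= 165-124 = 41, \quad y = 165-128 = 37, \quad x = 165-130 = 35,\\
w &= 165-136 = 29, \quad v = 165-142 = 23.
\end{aligned}
```

The youngest player is therefore 23. As a check, v+w+x+y = 23+29+35+37 = 124, as required.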


Natural Language Understanding requires having lots of knowledge

1. The pen is in the box. The box is in the pen.

2. The police watched the demonstrators…
   …because they feared violence.
   …because they advocated violence.

3. Every American has a mother.
   Every American has a president.


4. Mary and Sue are sisters.

Mary and Sue are mothers.

5. The White House announced today that...

6. John saw his brother skiing on TV. The fool…

...didn’t have a coat on!

…didn’t recognize him!


An example: an analyst’s query, posed as part of HPKB (DARPA's High Performance Knowledge Bases program, 1996), that Cyc answered.

(Logically and Arithmetically Combining n Pieces of Info.)
• Information from multiple sources
• Knowledge about the domain in general
• Commonsense knowledge about the real world


Ontology holds the key to doing this! BUT there are so many ways to “cut corners” and unwittingly fool oneself!

(Logically and Arithmetically Combining n Pieces of Info.)
• Information from multiple sources
• Knowledge about the domain in general
• Commonsense knowledge about the real world

The original dream of Arpanet, EDI, EDR, the Semantic Web,…


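[Figure: eight separately maintained sources (OFAC, DB8, USGS, NARCL, FBI Most Wanted, CATS, CDE, DB4). DB4 has a YOB (year of birth) column listing Qusay Hussein and Uday Hussein, with 1964 for Uday. DB8 uses French-style field names (Prenom, Surnom, ann) and lists Qusai Hussein, aged 30, and Odai Hussein. Also shown: a SuspN field and the dates Dec. 31, 1996 and Sept. 9, 2003.]

Non-ontology-based methods for DB integration are quadratic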

Query: “How different in age were Uday and Qusay Hussein?”
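The “quadratic” claim is a matter of counting. Point-to-point integration needs a translation for every pair of sources, n(n−1)/2 of them, while mapping each source once to a shared ontology needs only n. A short Python sketch of the two counts (the function names are ours), starting with the eight sources on this slide:

```python
def pairwise_mappings(n: int) -> int:
    """Point-to-point integration: one mapping per unordered pair of sources."""
    return n * (n - 1) // 2


def ontology_mappings(n: int) -> int:
    """Hub integration: each source is mapped once, to the shared ontology."""
    return n


for n in (8, 9, 100):
    print(f"{n:>3} sources: {pairwise_mappings(n):>4} pairwise, "
          f"{ontology_mappings(n):>3} via the ontology")
```

With eight sources that is 28 pairwise mappings against 8. A ninth source adds 8 more pairwise mappings (36 in all) but only one more ontology mapping.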


[Figure: the same sources, each now mapped to Cyc. CONCEPTS: #$QusayHusseinAl-Takriti, #$UdaiHusseinAl-Takriti. RULES: (age ?PERSON (YearsDuration ?AGE)), (birthDate ?PERSON ?BIRTH-DATE). The previously blank cells now hold the values 1966 and 32.]

Ontology-Based Methods of DB Integration Can Scale Linearly

(…and, by the way, they enable DB population/enrichment)
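The enrichment step can be sketched as follows. Each source is first mapped to shared ontology terms, so that “Qusai” and “Qusay”, and “Odai” and “Uday”, resolve to the same Cyc constants. The age/birth-date rule then fills each source's gaps. Assumptions: DB8's ages are as of Dec. 31, 1996 (the date consistent with the values 30 and 32 on the slides), and age is approximated as a difference of calendar years, ignoring birthdays. `TO_CYC`, `birth_years`, and `enrich` are hypothetical names.

```python
# Hypothetical name-to-constant mapping: each source is mapped once, to Cyc terms.
TO_CYC = {
    ("DB4", "Qusay Hussein"): "#$QusayHusseinAl-Takriti",
    ("DB4", "Uday Hussein"): "#$UdaiHusseinAl-Takriti",
    ("DB8", "Qusai Hussein"): "#$QusayHusseinAl-Takriti",
    ("DB8", "Odai Hussein"): "#$UdaiHusseinAl-Takriti",
}

DB4_YOB = {"Qusay Hussein": None, "Uday Hussein": 1964}   # year of birth
DB8_AGE = {"Qusai Hussein": 30, "Odai Hussein": None}     # age ("ann")
DB8_AS_OF_YEAR = 1996  # assumption: DB8 ages are as of Dec. 31, 1996


def birth_years() -> dict:
    """Collect one birth year per Cyc constant from both sources."""
    years = {}
    for name, yob in DB4_YOB.items():
        if yob is not None:
            years[TO_CYC[("DB4", name)]] = yob
    for name, age in DB8_AGE.items():
        if age is not None:
            # Rule: birth year = reference year - age (calendar-year approximation).
            years.setdefault(TO_CYC[("DB8", name)], DB8_AS_OF_YEAR - age)
    return years


def enrich():
    """Fill each source's blank cells (assumes every person is known to one source)."""
    years = birth_years()
    db4 = {n: years[TO_CYC[("DB4", n)]] for n in DB4_YOB}
    db8 = {n: DB8_AS_OF_YEAR - years[TO_CYC[("DB8", n)]] for n in DB8_AGE}
    return db4, db8


years = birth_years()
print(enrich())
print(abs(years["#$QusayHusseinAl-Takriti"] - years["#$UdaiHusseinAl-Takriti"]))  # 2
```

Under these assumptions the sketch recovers the two filled-in values on the slide (1966 for Qusay's year of birth, 32 for Odai's age) and answers the query with an age difference of about 2 years.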


A Solution that Scales Linearly


“What major US cities are particularly vulnerable to an anthrax attack?”

The answer is logically implied by data dispersed through several sources: USGS GNIS DB, AMVA KB, RAND R, UN FAO DB, and DTRA CATS DB.


“What major US cities are particularly vulnerable to an anthrax attack?”

• “major US city”: ?C is a U.S. city with more than 1M inhabitants
  (> (NumberOfInhabitantsFn ?C) 1000000)
• “particularly vulnerable to an anthrax attack”:
  – the current ambient temperature at ?C is above freezing, and
  – ?C has more than 100 people for each hospital bed, and
  – the number of anthrax host animals near ?C exceeds 100k
  Don’t add #pullets and #chickens: pullets are young chickens, so adding the two counts would count them twice.
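The three-part definition above translates directly into a filter. The sketch below is a hypothetical illustration, not Cyc's actual query. The `CityFacts` record, the Celsius reading of “above freezing”, and all the numbers are assumptions, as is the premise that a “chicken” head count already includes pullets. It shows why the pullets warning matters: a naive sum pushes the made-up city over the 100k threshold, and the de-duplicated count does not.

```python
from dataclasses import dataclass

# Assumption: a "chicken" head count already includes pullets (young hens).
SUBSUMED_BY = {"pullet": "chicken"}


@dataclass
class CityFacts:
    name: str
    inhabitants: int
    ambient_temp_c: float   # current ambient temperature, degrees Celsius
    hospital_beds: int
    host_animals: dict      # animal kind -> head count near the city


def host_animal_total(counts: dict) -> int:
    """Sum head counts, skipping any kind whose broader kind is also counted."""
    return sum(n for kind, n in counts.items()
               if SUBSUMED_BY.get(kind) not in counts)


def particularly_vulnerable(city: CityFacts) -> bool:
    return (city.inhabitants > 1_000_000                      # "major US city"
            and city.ambient_temp_c > 0.0                     # above freezing
            and city.inhabitants / city.hospital_beds > 100   # >100 people per bed
            and host_animal_total(city.host_animals) > 100_000)


# Hypothetical figures, for illustration only.
city = CityFacts("Hypothetical City", 1_200_000, 15.0, 10_000,
                 {"chicken": 80_000, "pullet": 30_000, "cattle": 10_000})
print(sum(city.host_animals.values()))       # 120000: naive sum, pullets counted twice
print(host_animal_total(city.host_animals))  # 90000
print(particularly_vulnerable(city))         # False
```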


The Geographic Names Information System (GNIS) DB maintained by the US Geological Survey (USGS)

USGS GNIS DB

 state | name              | type  | county     | state_fips | primary_lat | primary_long | elevation | population | status
-------+-------------------+-------+------------+------------+-------------+--------------+-----------+------------+--------------------
 TX    | Dallas            | ppl   | Dallas     |         48 |    32.78333 |        -96.8 |       463 |    1022830 | BGN 1978 1959
 MN    | Hennepin County   | civil | Hennepin   |         27 |    45.01667 |       -93.45 |         0 |    1032431 |
 CA    | Sacramento County | civil | Sacramento |          6 |    38.46667 |   -121.31667 |         0 |    1041219 |
 AZ    | Phoenix           | ppl   | Maricopa   |          4 |    33.44833 |   -112.07333 |      1072 |    1048949 | BGN 1931 1900 1897


So how do we explain to our system that:

• row 1 of that table is “about” the city of Dallas, TX

• the population field of that table contains the number of inhabitants of the city that that row is “about”

• here is exactly how to access tuples of that database

• that access will be fast, accurate, recent, complete


• the population field of that table contains the number of inhabitants of the city that that row is “about”

We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.:

(fieldDecoding Usgs-Gnis-LS ?x
  (TheFieldCalled "population")
  (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))
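In effect, the decoding turns each row into assertions about the real-world thing the row describes. A minimal Python sketch of that idea, not SKSI's implementation: the `REFERENTS` resolver table and the tuple format for assertions are hypothetical, and only the predicate numberOfInhabitants and the term #$CityOfDallasTexas come from these slides.

```python
# One row of the USGS GNIS table, as a dict of physical field -> value.
ROW = {"state": "TX", "name": "Dallas", "type": "ppl", "county": "Dallas",
       "state_fips": 48, "population": 1022830}

# Hypothetical resolver: (state, name, type) -> the Cyc term the row is "about".
REFERENTS = {("TX", "Dallas", "ppl"): "#$CityOfDallasTexas"}

# Field decodings: physical field -> predicate relating the referent to the value.
FIELD_DECODINGS = {"population": "numberOfInhabitants"}


def decode_row(row: dict) -> list:
    """Turn a row into (predicate, referent, value) assertions about its referent."""
    referent = REFERENTS[(row["state"], row["name"], row["type"])]
    return [(predicate, referent, row[field_name])
            for field_name, predicate in FIELD_DECODINGS.items()
            if row.get(field_name) is not None]


print(decode_row(ROW))
```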


• row 1 of that table is “about” the city of Dallas, TX

We provide the field encodings and decodings, some of which correspond to explicit fields like population, and some of which correspond to entities whose existence is merely implied by the existence of that row in that table. In this case, the first row implies the existence of, and describes some specifics of, the geographic entity that is the real-world city of Dallas, Texas, which is represented in Cyc’s KB by the term #$CityOfDallasTexas.

There is a logical field name for that entity, (TheReferentOfTheRow Usgs-Gnis), even though the entity is only talked about indirectly, through the explicit fields.


• how to access tuples of that database

We provide all the information needed for a JDBC connection script.

We assert, in the context (MappingMtFn Usgs-KS), all of these:

(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS Usgs-KS "sksi")
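These assertions are enough to assemble a connection. A minimal sketch, assuming that the structuredKnowledgeSourceName value (“usgs”) is the database name and that a plain dict stands in for the asserted facts. It builds a URL in the standard PostgreSQL JDBC form jdbc:postgresql://host:port/database and passes the user name and password separately, as connection properties.

```python
# The connection assertions above, keyed by predicate.
SKS_FACTS = {
    "serverOfSKS": "sksi.cyc.com",
    "portNumberForSKS": 4032,
    "subProtocolForSKS": "postgresql",
    "structuredKnowledgeSourceName": "usgs",   # assumed to be the database name
    "userNameForSKS": "sksi",
    "passwordForSKS": "geografy",
}


def jdbc_url(facts: dict) -> str:
    """Build a JDBC URL of the form jdbc:<subprotocol>://<host>:<port>/<database>."""
    return (f"jdbc:{facts['subProtocolForSKS']}://"
            f"{facts['serverOfSKS']}:{facts['portNumberForSKS']}/"
            f"{facts['structuredKnowledgeSourceName']}")


def connection_properties(facts: dict) -> dict:
    """Credentials go to the driver separately from the URL."""
    return {"user": facts["userNameForSKS"], "password": facts["passwordForSKS"]}


print(jdbc_url(SKS_FACTS))  # jdbc:postgresql://sksi.cyc.com:4032/usgs
```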


• that access will be fast, accurate, recent, complete

We provide meta-level assertions about the database, about each table of the database, about the completeness (etc.) of various kinds of data in the DB, and so on.

We assert, in the context (MappingMtFn Usgs-KS):

(schemaCompleteExtentKnownForValueTypeInArg Usgs-Gnis-LS USCity numberOfInhabitants 1)
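One plausible reading of this assertion (our interpretation, not a quotation of Cyc's documentation) is that it licenses closed-world reasoning for one predicate. If the source holds numberOfInhabitants for every US city, then failing to find a city there settles a question instead of leaving it open. The sketch contrasts the two cases; the function `exceeds` is hypothetical, and the two-entry dict stands in for the full table.

```python
from typing import Optional

# Two populations from the GNIS table above; a stand-in for the full table.
POPULATIONS = {"Dallas": 1022830, "Phoenix": 1048949}


def exceeds(populations: dict, city: str, threshold: int,
            complete: bool) -> Optional[bool]:
    """Does `city` have more than `threshold` inhabitants, per this one source?

    Returns None ("unknown") when the source is not known to be complete.
    """
    if city in populations:
        return populations[city] > threshold
    # Closed world: a source declared complete for US-city populations would
    # list the city if it were a US city with a population, so absence means "no".
    return False if complete else None


print(exceeds(POPULATIONS, "Springfield", 1_000_000, complete=True))   # False
print(exceeds(POPULATIONS, "Springfield", 1_000_000, complete=False))  # None
```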


Cyc automatically gathers statistics like these, and uses them to order search:

(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
  TheEmptySet
  60.0)

(resultSetCardinality Usgs-Gnis-PS
  (TheSet
    (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
    (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
    (PhysicalFieldFn Usgs-Gnis-PS "name"))
  (TheSet
    (PhysicalFieldFn Usgs-Gnis-PS "county")
    (PhysicalFieldFn Usgs-Gnis-PS "state"))
  530.36)
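One way to read these statistics, offered here as an assumption rather than documented semantics, is as (source, fields returned, fields already bound, expected number of rows). With nothing bound, asking for state yields about 60 distinct values; with county and state bound, a lookup of name and coordinates returns about 530 rows. A planner can then try the applicable lookups cheapest-first. A minimal sketch; `AccessPath` and `order_paths` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPath:
    returns: frozenset      # physical fields the lookup yields
    requires: frozenset     # physical fields that must already be bound
    expected_rows: float    # gathered statistic: result-set cardinality


# The two statistics shown above, under the reading described in the text.
STATS = [
    AccessPath(frozenset({"state"}), frozenset(), 60.0),
    AccessPath(frozenset({"primary_long", "primary_lat", "name"}),
               frozenset({"county", "state"}), 530.36),
]


def order_paths(bound: set, wanted: str, paths=STATS) -> list:
    """Usable lookups that yield `wanted`, cheapest (fewest expected rows) first."""
    usable = [p for p in paths if p.requires <= bound and wanted in p.returns]
    return sorted(usable, key=lambda p: p.expected_rows)


print(order_paths({"county", "state"}, "name"))
```

With nothing bound, no lookup yields name, so the planner would first have to bind county and state some other way.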


Semantic Knowledge Source Integration (SKSI) summary

• Some of the knowledge needed will generally be in the Cyc KB already

• Some will reside in already-mapped sources: databases, web pages, simulators, etc.

• For each needed new source, explain the meaning of its schema elements to Cyc
  – Write Cyc assertions to convey the meaning of each field, each polymorphism, each idiosyncratic entry code, plus meta-information: when the source was created/updated, its level of granularity, its sources, its degree of completeness, what it can do quickly, what it can do (slowly), how to access it, etc.

[Figure label: Structured sources]


How “general knowledge” helps search

• Query: “Someone smiling”
• Caption: “A man helping his daughter take her first step”

Find information by inference (+KB)
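The mechanism can be sketched as forward chaining from the caption until the query's description appears. The rules below are hypothetical stand-ins for Cyc's commonsense knowledge, and they are chained as if certain, whereas Cyc treats most such rules as defaults. Plain keyword matching fails here because the caption never mentions smiling.

```python
CAPTION = "a man helps his daughter take her first step"

# Hypothetical commonsense rules: (premises, conclusion).
RULES = [
    ({CAPTION}, "a parent watches his child take her first step"),
    ({"a parent watches his child take her first step"}, "the parent is happy"),
    ({"the parent is happy"}, "someone is smiling"),
]


def forward_chain(facts, rules):
    """Apply rules until no new conclusions appear; return everything known."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def caption_matches(query, caption, rules):
    return query in forward_chain({caption}, rules)


print(caption_matches("someone is smiling", CAPTION, RULES))  # True
print(caption_matches("someone is smiling", CAPTION, []))     # False
```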


• Query: “Show me pictures of strong and adventurous people”
• Caption: “A man climbing a rock face”

Find information by inference (+KB)


Text document

• Query: “Outdoor explosions in terrorist events Lebanon between 1990 and 2001”
• Document: “1993 pipe bombing on the patio of the Beirut Olive Garden”

Find information by inference (+KB)


Text document

• Query: “Threats to low-flying US airliners in Lebanon”
• Document: “Hezballah buys ten SA-7’s.”

Find information by inference (+KB + domain knowledge)


XYZCoID # | birthdate | hiredate | salutation | firstname | lastname | emergcontact | signifother
8041      | 9/1/57    | 8/5/91   | Mr         | Pat       | Jones    | 8053         | 8053
8053      | 3/3/49    | 2/9/48   | Ms         | Jan       | Smith    | 8053         | 8199

Find and clean (consistency-check) information by inference (+KB)

If Pat and Jan are married, their date of marriage should be the same; their address is likely to be the same; their genders are likely to differ; and so on.
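A few generic commonsense checks already flag three problems in these two rows: Jan's hire date precedes her birth date, Jan is listed as her own emergency contact, and Pat names Jan as significant other while Jan names 8199. The checks below are hypothetical rules of our own, not Cyc's. The sketch assumes every two-digit year means 19xx; Python's `%y` convention would instead read 48, 49, and 57 as 2048, 2049, and 2057.

```python
from datetime import date

# Rows transcribed from the table above.
ROWS = [
    {"id": 8041, "birthdate": "9/1/57", "hiredate": "8/5/91", "salutation": "Mr",
     "firstname": "Pat", "lastname": "Jones", "emergcontact": 8053, "signifother": 8053},
    {"id": 8053, "birthdate": "3/3/49", "hiredate": "2/9/48", "salutation": "Ms",
     "firstname": "Jan", "lastname": "Smith", "emergcontact": 8053, "signifother": 8199},
]


def parse_mdy(text: str) -> date:
    """Parse m/d/yy, assuming every two-digit year means 19yy."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(1900 + year, month, day)


def find_anomalies(rows):
    by_id = {row["id"]: row for row in rows}
    problems = []
    for row in rows:
        who = f'{row["firstname"]} {row["lastname"]}'
        # Commonsense: nobody is hired before they are born.
        if parse_mdy(row["hiredate"]) < parse_mdy(row["birthdate"]):
            problems.append(f"{who}: hired before birth")
        # Commonsense: your emergency contact is someone other than you.
        if row["emergcontact"] == row["id"]:
            problems.append(f"{who}: is own emergency contact")
        # Commonsense: "significant other" is symmetric.
        other = by_id.get(row["signifother"])
        if other is not None and other["signifother"] != row["id"]:
            problems.append(f"{who}: {other['firstname']} {other['lastname']} "
                            "does not list them back as significant other")
    return problems


for problem in find_anomalies(ROWS):
    print(problem)
```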


Cyc is… millions of facts, rules of thumb, etc. that capture human common sense about our everyday world:

– The typical bird has 1 beak, 1 heart, lots of feathers,…
– Hearts are internal organs; feathers are external protrusions
– Most vehicles are steered by an awake, sane, adult,… human
– Tangible objects can’t be in 2 (disjoint) places at once
– Badly injuring a child is much worse than killing a dog
– Causes temporally precede (i.e., start before) their effects
– A stabbing requires 2 cotemporal and proximate actors
– etc.


- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms
- Language-independent

[Figure: words from several languages (EnglishWord-Pen, EnglishWord-Plume, FrenchWord-Plume, ArabicWordForWritingPen) alongside language-independent concepts (WritingPen, Penitentiary, Corral, BirdFeather, Authoring).]
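Keeping words and concepts apart can be sketched with a small lookup table. The links below are hypothetical, based on the ordinary senses of the words in the figure: English “pen” can mean a writing pen, a penitentiary, an animal enclosure, or (as a verb) to write, and French “plume” means both pen and feather. Intersecting two words' concepts yields their candidate translations.

```python
# Hypothetical word -> concept links, based on the ordinary senses of each word.
DENOTES = {
    "EnglishWord-Pen": {"WritingPen", "Penitentiary", "Corral", "Authoring"},
    "EnglishWord-Plume": {"BirdFeather"},
    "FrenchWord-Plume": {"WritingPen", "BirdFeather"},
    "ArabicWordForWritingPen": {"WritingPen"},
}


def words_for(concept: str) -> list:
    """All words, in any language, that can denote `concept`."""
    return sorted(word for word, concepts in DENOTES.items() if concept in concepts)


def shared_senses(word_a: str, word_b: str) -> set:
    """Concepts two words have in common: candidate translations."""
    return DENOTES[word_a] & DENOTES[word_b]


print(words_for("WritingPen"))
print(shared_senses("EnglishWord-Pen", "FrenchWord-Plume"))  # {'WritingPen'}
```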


Cyc is also…

• An inference engine that draws the same sorts of inferences from those facts that people would.
• Interfaces so the system can communicate with people, databases, spreadsheets, websites, etc.


[Figure: Cyc architecture. Components: the Cyc Ontology & Knowledge Base; the Cyc Reasoning Modules; Knowledge Entry Tools; a User Interface (with Natural Language Dialog); the Cyc API; and an Interface to External Data Sources. Outside parties: Knowledge Authors, Knowledge Users, Other Applications, and External Data Sources (Databases, Web Pages, Text Sources, Other KBs).]


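Painful Evolution of our Representation: from Frames & Slots to Contextualized HOL

[Figure: layers running from the Upper Ontology through Core Theories and Domain-Specific Theories down to Very specific information (some indirect, via SKSI), with an example at each level:]
• Upper Ontology: EVENT, TEMPORAL-THING, PARTIALLY-TANGIBLE-THING
• Core Theories: $\forall a, b\; \big(a \in \mathrm{EVENT} \wedge b \in \mathrm{EVENT} \wedge \mathrm{causes}(a, b)\big) \Rightarrow \mathrm{precedes}(a, b)$
• Domain-Specific Theories: $\forall m, a\; \big(m \in \mathrm{MAMMAL} \wedge a \in \mathrm{ANTHRAX}\big) \Rightarrow \mathrm{causes}\big(\text{exposed-to}(m, a), \text{infected-by}(m, a)\big)$
• Very specific information: (ist FtLaudHolyCrossERCase#403921 (caused CutaneousAnthrax (SkinLesions Ahmed_al-Haznawit)))

First Order Predicate Calculus: unambiguous; enables mechanical reasoning

Every American has a president. $\exists y\, \forall x\; \big(\mathrm{Amer}(x) \Rightarrow \mathrm{president}(x, y)\big)$
Every American has a mother. $\forall x\, \exists y\; \big(\mathrm{Amer}(x) \Rightarrow \mathrm{mother}(x, y)\big)$

Higher Order Logic (nth-order predicate calculus): contexts, predicates as variables, nested modals, reflection,…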
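The two formulas differ only in the order of their quantifiers, and a small finite model makes the difference concrete. In this hypothetical model (two Americans, their two mothers, one president), the ∃y∀x reading, one y shared by everyone, holds for president but fails for mother, while the ∀x∃y reading holds for both. Quantifying x only over Americans plays the role of the Amer(x) ⇒ guard.

```python
# A tiny hypothetical model: two Americans, their mothers, and one president.
AMERICANS = {"ann", "bob"}
PEOPLE = AMERICANS | {"carol", "dora", "pres"}
MOTHER = {("ann", "carol"), ("bob", "dora")}
PRESIDENT = {("ann", "pres"), ("bob", "pres")}


def exists_forall(relation) -> bool:
    """∃y ∀x: Amer(x) ⇒ relation(x, y): one y serves every American."""
    return any(all((x, y) in relation for x in AMERICANS) for y in PEOPLE)


def forall_exists(relation) -> bool:
    """∀x ∃y: Amer(x) ⇒ relation(x, y): each American may have a different y."""
    return all(any((x, y) in relation for y in PEOPLE) for x in AMERICANS)


print(exists_forall(PRESIDENT), forall_exists(PRESIDENT))  # True True
print(exists_forall(MOTHER), forall_exists(MOTHER))        # False True
```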


The inference engine is a community of 720 “agents” that attack every problem and, recursively, every subproblem (subgoal). One of these 720 is a general theorem prover; the others have special-purpose data structures/algorithms to handle the most important, most common cases, very fast.

The Knowledge Base is divided into thousands of contexts by granularity, topic, culture, geospatial place, time,...

Cyc is not monolithic

Cyc is not committed to any one reasoning mechanism
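The division of labor can be sketched as a dispatcher that offers a goal to special-purpose modules before falling back on a general prover. Everything here is hypothetical and drastically simplified: two toy modules instead of 719, a fixed module order, and no recursive decomposition into subgoals.

```python
# Hypothetical taxonomy for the genls (subclass) module.
GENLS = {("Dog", "Mammal"), ("Mammal", "Animal")}


def arithmetic_module(goal):
    """Special case: numeric comparisons, answered directly."""
    op, left, right = goal
    if op == ">" and all(isinstance(v, (int, float)) for v in (left, right)):
        return left > right
    return None  # not applicable


def genls_module(goal):
    """Special case: subclass queries, by depth-first search over GENLS."""
    op, sub, sup = goal
    if op != "genls":
        return None
    stack, seen = [sub], {sub}
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for child, parent in GENLS:
            if child == current and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def general_prover(goal):
    """Stand-in for the slow, fully general theorem prover."""
    return None


MODULES = [arithmetic_module, genls_module, general_prover]


def ask(goal):
    """Offer the goal to each module in turn; return the first definite answer."""
    for module in MODULES:
        answer = module(goal)
        if answer is not None:
            return answer, module.__name__
    return None, None


print(ask(("genls", "Dog", "Animal")))  # (True, 'genls_module')
print(ask((">", 5, 3)))                 # (True, 'arithmetic_module')
```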


Think of reasoning modules 721, 722, 723… as being all manner of external databases, simulators, translators…

98% of the KB's content is marked as merely being usually true, so reasoning in Cyc is default reasoning: gather up all the pro/con arguments, and compare them.

Cyc is not monotonic

Cyc is not committed to its own reasoning mechanisms
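Default reasoning of this pro/con kind can be sketched as follows. Specificity (the more specific rule wins) is a common heuristic in non-monotonic reasoning and is used here as an assumption; the talk does not say how Cyc weighs arguments. The example also shows what “not monotonic” means: adding a fact withdraws a conclusion that was previously drawn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    supports: bool      # True: argues for the claim; False: argues against it
    rule: str
    specificity: int    # hypothetical preference: more specific rules win


def decide(arguments):
    """Compare the strongest pro and con arguments; None means unresolved."""
    best_pro = max((a.specificity for a in arguments if a.supports), default=None)
    best_con = max((a.specificity for a in arguments if not a.supports), default=None)
    if best_pro is None and best_con is None:
        return None
    if best_con is None:
        return True
    if best_pro is None:
        return False
    if best_pro == best_con:
        return None
    return best_pro > best_con


birds_fly = Argument(True, "birds usually fly", specificity=1)
penguins_dont = Argument(False, "penguins usually don't fly", specificity=2)

print(decide([birds_fly]))                 # True
print(decide([birds_fly, penguins_dont]))  # False: a new fact withdrew a conclusion
```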


Cyc Knowledge Base

[Figure: the KB's topic areas: Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing; Paths; Sets, Relations; Logic, Math; Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion, Perception, Belief; Human Behavior & Actions; Products, Devices; Conceptual Works; Vehicles, Buildings, Weapons; Mechanical & Electrical Devices; Software, Literature, Works of Art; Language; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations, Governments, Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics, Warfare; Professions, Occupations; Purchasing, Shopping; Travel, Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports, Recreation, Entertainment; Artifacts; Movement; State Change, Dynamics; Materials, Parts, Statics; Physical Agents; Borders, Geometry; Events, Scripts; Spatial Paths; Actors, Actions; Plans, Goals; Time; Agents; Space; Physical Objects; Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology; Natural Geography; Earth & Solar System; Political Geography; Weather. Two further bands are labeled “General Knowledge about Various Domains” and “Specific data, facts, and observations”.]

Cyc contains: 15,000 predicates; 300,000 concepts; 3,200,000 assertions

Represented in: • First Order Logic • Higher Order Logic • Context Logic • Micro-theories


Cyc KB extended with domain knowledge about terrorism

[Figure: the same topic areas as in the Cyc Knowledge Base figure, with the domain band labeled “General Knowledge about Terrorism” and the data band labeled “Specific data, facts, and observations about terrorist groups and activities”.]


Building Cyc qua Engineering Task

[Figure: rate of learning plotted against amount known, from 1984 to 2005. Labeled regions: “codify & enter each piece of knowledge, by hand”, “learning via natural language”, and “learning by discovery”, with the “frontier of human knowledge” marked. Annotations beside CYC: 750 person-years, 21 real-time years, $75 million.]


Guiding Principle:“We have to get it to work, not appear to work”

– Don’t defer hard problems (time/space/emotions…)

– No “NIH”! Harness every good idea that others have

– Take an engineering approach, not a scientific research one: instead of one TOE (theory of everything: an elegant, complete solution), find a set of partial solutions that together cover the most common cases

– Pursue applications that require large amounts of real-world knowledge (they need Cyc and also will drive it)


Eschew the 5 pitfalls (ways to cut ontological corners and end up with something that only appears to work)

• Ignorance-based: Have a small theory size (#terms, #instances, #rules)

• Static KB (can be massively tuned, optimized, cached, etc. ahead of time)

• Simple assertions (e.g., SAT constraints; propositional calculus; Horn;…)

• One global context (no contradictions, limited domain, simplified world)

• Don’t do all the bookkeeping and forward inference required for justification maintenance (or, equivalently, don’t ever have truth maintenance “turned on”)

As with pharmaceuticals, what is toxic in one dosage is beneficial in a lesser dosage.

E.g., contexts lead to locally consistent, locally small theories (faster inference and knowledge engineering)

E.g., some (sub)problems can often be represented and solved in a simpler representation.
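The last pitfall concerns justification maintenance. As a minimal sketch of what that bookkeeping involves (a toy that assumes string facts and a single justification per derived fact; it is not Cyc's truth maintenance system, and the class name `TinyTMS` is hypothetical), each derived fact records the facts it rests on, so retracting a premise also withdraws every conclusion built on it:

```python
from collections import defaultdict


class TinyTMS:
    """Toy justification-based truth maintenance (hypothetical, not Cyc's)."""

    def __init__(self):
        self.premises = set()               # facts asserted directly
        self.justifications = {}            # derived fact -> its antecedents
        self.dependents = defaultdict(set)  # antecedent -> facts it supports

    def assert_premise(self, fact):
        self.premises.add(fact)

    def derive(self, fact, antecedents):
        # Bookkeeping: remember why `fact` holds and who relies on what.
        # Simplification: a later derive() overwrites the earlier justification.
        self.justifications[fact] = set(antecedents)
        for a in antecedents:
            self.dependents[a].add(fact)

    def retract(self, fact):
        # Remove a fact and, transitively, every conclusion resting on it.
        self.premises.discard(fact)
        self.justifications.pop(fact, None)
        for child in list(self.dependents.pop(fact, ())):
            self.retract(child)

    def believed(self):
        return self.premises | set(self.justifications)


# Hypothetical facts, loosely based on the umbrella rule discussed later.
tms = TinyTMS()
tms.assert_premise("raining")
tms.assert_premise("going-outdoors")
tms.derive("should-carry-umbrella", ["raining", "going-outdoors"])
tms.derive("umbrella-will-get-wet", ["should-carry-umbrella", "raining"])
tms.retract("raining")
print(sorted(tms.believed()))  # ['going-outdoors']
```

Skipping this bookkeeping makes a system look faster, but it then keeps believing conclusions whose support has gone away.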


Choosing what to add to Cyc

• Bottom-up: Look at a sentence, see what knowledge the writer assumed the reader already had about the world. Generalize that piece of knowledge.

• Top-down: Articulate the scope of a (sub)topic, and articulate queries that should be answerable. Get missing knowledge by introspecting, or just by asking Cyc.


The Cyc Knowledge Base

[Figure: layered diagram of the Cyc KB, running from the most general concepts at the top, through broad topic areas, down to real-world domain knowledge and specific cases. Recoverable labels, in the order extracted:]

Thing • Intangible Thing • Individual • Temporal Thing • Spatial Thing • Partially Tangible Thing • Paths • Sets • Relations • Logic • Math

Human Artifacts • Social Relations, Culture • Human Anatomy & Physiology • Emotion • Perception • Belief • Human Behavior & Actions • Products • Devices • Conceptual Works • Vehicles • Buildings • Weapons • Mechanical & Electrical Devices • Software • Literature • Works of Art • Language

Agent Organizations • Organizational Actions • Organizational Plans • Types of Organizations • Human Organizations • Nations • Governments • Geo-Politics • Business, Military Organizations • Law • Business & Commerce • Politics • Warfare • Professions • Occupations • Purchasing • Shopping • Travel • Communication • Transportation & Logistics • Social Activities • Everyday Living • Sports • Recreation • Entertainment

Artifacts • Movement • State Change • Dynamics • Materials • Parts • Statics • Physical Agents • Borders • Geometry • Events • Scripts • Spatial Paths • Actors • Actions • Plans • Goals • Time • Agents • Space • Physical Objects • Human Beings • Organization • Human Activities • Living Things • Social Behavior • Life Forms • Animals • Plants • Ecology • Natural Geography • Earth & Solar System • Political Geography • Weather

Real World Domain Knowledge

Specific cases, facts, details,…

Represented in:
• First Order Logic
• Higher Order Logic
• Context Logic
• Microtheories

Cyc contains: 15,000 Predicates; 300,000 Concepts; 3,200,000 Assertions


Cyc KB “Whitman’s Sampler”

• Temporal Relations
• Senses of “x is a physical part of y”
• Senses of “x is physically in y”
• Events and their performers (role types)
• Organizations
• Propositional Attitudes
• Biology
• Materials
• Devices
• Weather
• Information-bearing objects


Temporal Relations

37 Relations Between Temporal Things

#$temporalBoundsIntersect

#$temporallyIntersects

#$startsAfterStartingOf

#$endsAfterEndingOf

#$startingDate

#$temporallyContains

#$temporallyCooriginating

#$temporalBoundsContain

#$temporalBoundsIdentical

#$startsDuring

#$overlapsStart

#$startingPoint

#$simultaneousWith

#$after


Temporal Relations

Some of these relations are very general, such as:

#$temporallyIntersects

Such relations are particularly useful when they are known not to hold between a pair of individuals:

(#$not (#$temporallyIntersects ?X ?Y))

That implies all of these:

(#$not (#$spouse PERSON-X PERSON-Y))
(#$not (#$consultant AGENT-X AGENT-Y))
(#$not (#$accountHolder ACCOUNT-X AGENT-Y))
(#$not (#$residesInRegion AGENT-X REGION-Y))
(#$not (#$officiator EVENT-X PERSON-Y))
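The inference above can be sketched as follows. This is a hypothetical toy, not Cyc's inference engine: it assumes lifetimes are plain (start, end) year pairs and that each listed predicate requires its two arguments to exist at overlapping times. Once two individuals are known not to temporally intersect, every such relation between them can be rejected without any further lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalThing:
    name: str
    start: int  # first year of existence (inclusive)
    end: int    # last year of existence (inclusive)


def temporally_intersects(x: TemporalThing, y: TemporalThing) -> bool:
    # Two closed intervals overlap iff each one starts before the other ends.
    return x.start <= y.end and y.start <= x.end


# Assumption: these predicates only hold between things that co-exist in time.
REQUIRES_COEXISTENCE = {"spouse", "consultant", "accountHolder",
                        "residesInRegion", "officiator"}


def ruled_out(pred: str, x: TemporalThing, y: TemporalThing) -> bool:
    """True if (pred x y) can be rejected purely from temporal bounds."""
    return pred in REQUIRES_COEXISTENCE and not temporally_intersects(x, y)


lincoln = TemporalThing("AbrahamLincoln", 1809, 1865)
curie = TemporalThing("MarieCurie", 1867, 1934)
print(temporally_intersects(lincoln, curie))  # False
print(ruled_out("spouse", lincoln, curie))    # True
```

One negative temporal fact thus prunes a whole family of relations at once, which is why such general relations earn their place in the ontology.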


Senses of ‘Part’

#$parts

#$intangibleParts

#$subInformation

#$subEvents

#$physicalDecompositions

#$physicalPortions

#$physicalParts

#$externalParts

#$internalParts

#$anatomicalParts

#$constituents

#$functionalPart
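One way a family of “part” predicates like this pays off is through a specialization hierarchy between the predicates themselves: asserting a specific sense also makes every more general sense true. Cyc records such links with #$genlPreds, but the particular links below, the (whole, part) argument order, and the names `Fido` and `FidosTail` are illustrative assumptions, not the KB's actual content.

```python
# Hypothetical specialization links: predicate -> its more general predicates.
GENL_PREDS = {
    "anatomicalParts": {"physicalParts"},
    "externalParts": {"physicalParts"},
    "internalParts": {"physicalParts"},
    "physicalParts": {"parts"},
    "physicalPortions": {"parts"},
    "intangibleParts": {"parts"},
    "subEvents": {"parts"},
}


def all_genl_preds(pred):
    """Return pred plus every predicate it specializes, transitively."""
    seen, stack = set(), [pred]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(GENL_PREDS.get(p, ()))
    return seen


def entailed(assertion):
    """(pred whole part) entails (genl whole part) for every generalization."""
    pred, whole, part = assertion
    return {(g, whole, part) for g in all_genl_preds(pred)}


facts = entailed(("anatomicalParts", "Fido", "FidosTail"))
print(sorted(p for p, _, _ in facts))
# ['anatomicalParts', 'parts', 'physicalParts']
```

A query phrased with the general #$parts would then find facts that were entered with the more precise senses.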


Senses of ‘In’

• Can the inner object leave by passing between members of the outer group?
  – Yes -- Try #$in-Among


Senses of ‘In’

• Does part of the inner object stick out of the container?
  – None of it -- Try #$in-ContCompletely
  – Yes -- Try #$in-ContPartially
• If the container were turned around, could the contained object fall out?
  – No -- Try #$in-ContClosed
  – Yes -- Try #$in-ContOpen


Senses of ‘In’

• Is it attached to the inside of the outer object?
  – Yes -- Try #$connectedToInside
• Can it be removed, if enough force is used, without damaging either object?
  – Yes -- Try #$in-Snugly or #$screwedIn
• Does the inner object stick into the outer object?
  – Yes -- Try #$sticksInto
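Taken together, the three “Senses of ‘In’” slides amount to a decision procedure for choosing a predicate. The sketch below encodes those questions as keyword arguments; the order in which the questions are asked, the argument names, and the function name `choose_in_predicate` are assumptions made for illustration, not Cyc's actual selection procedure.

```python
from typing import List, Optional


def choose_in_predicate(
    between_group_members: bool = False,         # can it leave between members?
    attached_inside: bool = False,               # attached to the inside?
    sticks_into: bool = False,                   # does it stick into the outer object?
    removable_by_force: bool = False,            # removable by force, undamaged?
    sticks_out: Optional[bool] = None,           # None = question not answered
    falls_out_if_turned: Optional[bool] = None,  # None = question not answered
) -> List[str]:
    """Return candidate CycL 'in' predicates following the slides' questions."""
    if between_group_members:
        return ["#$in-Among"]
    if attached_inside:
        return ["#$connectedToInside"]
    if sticks_into:
        return ["#$sticksInto"]
    if removable_by_force:
        return ["#$in-Snugly", "#$screwedIn"]
    candidates = []
    if sticks_out is False:
        candidates.append("#$in-ContCompletely")
    elif sticks_out is True:
        candidates.append("#$in-ContPartially")
    if falls_out_if_turned is False:
        candidates.append("#$in-ContClosed")
    elif falls_out_if_turned is True:
        candidates.append("#$in-ContOpen")
    return candidates


# A coin in a sealed jar: nothing sticks out, and it cannot fall out.
print(choose_in_predicate(sticks_out=False, falls_out_if_turned=False))
# ['#$in-ContCompletely', '#$in-ContClosed']
```

The questions are phrased in everyday terms, so a person entering knowledge can answer them without first learning the predicate names.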


Event Types

#$PhysicalStateChangeEvent #$TemperatureChangingProcess #$BiologicalDevelopmentEvent #$ShapeChangeEvent #$MovementEvent #$ChangingDeviceState #$GivingSomething #$DiscoveryEvent

#$Cracking #$Carving #$Buying #$Thinking #$Mixing #$Singing #$CuttingNails #$PumpingFluid

11,000 more


A few event types pertaining to Vehicular Transportation

#$TransportationEvent #$ControllingATransportationDevice #$TransportWithMotorizedLandVehicle (#$SteeringFn #$RoadVehicle) #$TransporterCrashEvent #$VehicleAccident #$CarAccident #$Colliding #$IncurringDamage #$TippingOver #$Navigating #$EnteringAVehicle


Relations Between an Event and its Participants

#$performedBy #$causes-EventEvent #$objectPlaced #$objectOfStateChange #$outputsCreated #$inputsDestroyed #$assistingAgent #$beneficiary

#$fromLocation #$toLocation #$deviceUsed #$driverActor #$damages #$vehicle #$providerOfMotiveForce

#$transportees

Over 400 more.


Here are some slot: value pairs for Attack874:

isa: TerroristAttack
performedBy: JihadGroup
deviceUsed: Bomb8388
eventOccursAt: CityOfLondonEngland
victim: Person9399
victim: Person52666
assistingAgent: AlQaeda
objectsDestroyed: Structure2990
objectsDestroyed: Vehicle523452

These ActorSlots express each type of relation between an Event and its actors and subevents.
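Each slot: value pair above is simply a binary assertion whose predicate is the slot. A minimal sketch (the helper name `frame_to_cycl` is hypothetical, and the output only imitates CycL's surface syntax) shows that a repeated slot such as victim just becomes several separate assertions:

```python
def frame_to_cycl(unit, slot_values):
    """Render (slot, value) pairs as CycL-style ground assertions."""
    return [f"(#${slot} #${unit} #${value})" for slot, value in slot_values]


attack874 = [
    ("isa", "TerroristAttack"),
    ("performedBy", "JihadGroup"),
    ("deviceUsed", "Bomb8388"),
    ("eventOccursAt", "CityOfLondonEngland"),
    ("victim", "Person9399"),
    ("victim", "Person52666"),          # repeated slot -> two assertions
    ("assistingAgent", "AlQaeda"),
    ("objectsDestroyed", "Structure2990"),
    ("objectsDestroyed", "Vehicle523452"),
]

assertions = frame_to_cycl("Attack874", attack874)
for a in assertions:
    print(a)
# first line printed: (#$isa #$Attack874 #$TerroristAttack)
```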


Organization “Slots”

• #$governingBody
• #$parentCompany
• #$subOrgs-Command
• #$subOrgs-Permanent
• #$subOrgs-Temporary
• #$physicalQuarters
• #$hasHQinCountry
• #$officeInCountry
• #$memberTypes
• #$organizationHead
• #$PolicyFn
• #$mainProductType

+ those predicates that make sense for each generalization of Organization (e.g., #$startingTime, #$alsoKnownAs).


Emotion

• Types of Emotions:
  – #$Adulation
  – #$Abhorrence
  – #$Relaxed-Feeling
  – #$Gratitude
  – #$Anticipation-Feeling
  – Over 120 of these
• Predicates for Defining and Attributing Emotions:
  – #$contraryFeelings
  – #$appropriateEmotion
  – #$actionExpressesFeeling
  – #$feelsTowardsObject
  – #$feelsTowardsPersonType


Propositional Attitudes: Relations Between Agents and Propositions

• #$goals
• #$intends
• #$desires
• #$hopes
• #$expects
• #$beliefs
• #$opinions
• #$knows
• #$rememberedProp
• #$perceivesThat
• #$seesThat
• #$tastesThat


Materials

• Common Substances
• Attributes of Materials
• States of Matter
  – SolidStateOfMatter
  – LiquidStateOfMatter
  – GaseousStateOfMatter
• Solutions
• Electrical Conductivity
• Thermal Conductivity
• Structural Attributes
• Tangible Attributes
  – SolidTangibleThing
  – LiquidTangibleThing
  – GaseousTangibleThing


Devices

• Over 4000 specializations of #$PhysicalDevice
  – #$ClothesWasher
  – #$NuclearAircraftCarrier
• Vocabulary for describing device functions
  – #$primaryFunction-DeviceType
• Device-specific predicates
  – #$gunCaliber
  – #$speedOf
• Device states (40+)
  – #$DeviceOn
  – #$CockedState


Vehicular Transport Devices

• Over 800 specializations of #$RoadVehicle
  – #$AcuraCar
  – #$SportUtilityVehicle
  – #$Humvee
• Over 100 specializations of #$AutoPart
  – #$AutomobileTire
  – #$ShockAbsorber
  – #$Windshield
• Five facets of #$RoadVehicle
  – #$RoadVehicleByChassisType
  – #$RoadVehicleTypeByBodyStyle
  – #$RoadVehicleTypeByModel
  – #$RoadVehicleTypeByPowerSource
  – #$RoadVehicleTypeByUse
• Specialized predicates
  – #$highwayFuelConsumption
  – #$vehicleLoadClass
  – #$trafficableForVehicle
  – #$vehicle


Weather

• Weather Attributes
  – #$ClearWeather
  – #$Visibility
  – (#$LowAmountFn #$Raininess)
• Weather Objects
  – #$CloudInSky
  – #$SnowMob
• Weather Events
  – #$TornadoAsEvent
  – #$SnowProcess


Information-Bearing Things

Books, web-page copies, radio broadcasts, utterances, intelligence cables, TV series,…


What is “Moby Dick”?

“’Tis Moby Dick!”

(#$thereExists ?SEE (#$and (#$isa ?SEE #$Seeing) (#$objectPerceived ?SEE #$MobyDick) (#$perceiver ?SEE #$CaptainAhab)))

AbstractInformationStructure (AIS)

PropositionalInformationThing (PIT)

InformationBearingThing (IBT)


What is “Moby Dick”?

[Figure: diagram relating ConceptualWork (CW), InformationBearingThing (IBT), AbstractInformationStructure (AIS), and PropositionalInformationThing (PIT), with links labeled textOfIBT, instantiationOfCW, InfoStructureOfCW, #$infoStructureRepresents, ContainsInfo-Propositional-CW, and PITOfIBTFn.]


Bridging the Knowledge Gap

Upper ontology, e.g., “Water is wet”

Intermediate ontology, e.g., “Vehicles slow down in bad weather”

Lower ontology (task-specific knowledge), e.g., “HUMMVs lose 18% traction in 4-inch-deep mud”


KR Lessons Learned

We started with a straightforward “Frames & Slots” representation (in 1972), improving it over the years as -- but only as -- we needed to.

Fred Albertson
  ownsA: Dog
  isA: Person
  worksFor: UT...


KR Lessons Learned

We started with a straightforward “Frames & Slots” representation.

But Frames & Slots are inadequate to naturally express:

• disjunction (“Fred owns a dog or a parakeet.”)
• negation (“Fred does not own a dog.”)
• modals (“Fred believes Israel wants Egypt to expect…”)
• meta-assertions (“That rule is 50 years old but reliable.”)
• nested quantification (w)(x)(y)(z)…
  “Every American has a president.” versus “Every American has a mother.”
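The two example sentences differ only in quantifier order, which a single frame slot such as president: or mother: cannot capture. Written in first-order logic (the predicate names are illustrative):

```latex
\begin{align*}
\text{``Every American has a president'':}\quad
  & \exists y\, \forall x\, \big(\mathrm{American}(x) \rightarrow \mathrm{president}(x, y)\big) \\
\text{``Every American has a mother'':}\quad
  & \forall x\, \big(\mathrm{American}(x) \rightarrow \exists y\, \mathrm{mother}(x, y)\big)
\end{align*}
```

In the first formula one $y$ serves every $x$; in the second, $y$ may be different for each $x$.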


KR Lessons Learned

2. On the one hand, we must move from Frames & Slots to logic.

But on the other hand, theorem-proving is too slow! Solution: do it anyway, and to recoup efficiency, separate:

The Epistemological Problem (what should the system know?)

The Heuristic Problem (how can it reason efficiently with and about what it knows?)

I.e., represent each assertion in (at least) 2 ways:

one standard logical (predicate calculus) form (the Epistemological Level, EL), and

one (or more) efficient special-purpose representations (the Heuristic Level, HL)
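A minimal sketch of the EL/HL split, using toy data: the same #$genls assertions are kept once as logical sentences (EL) and once in a special-purpose index (HL), here a precomputed transitive closure, so a common query type becomes a set lookup instead of a proof search. The class and method names are hypothetical, and a real HL module would be far more refined.

```python
from collections import defaultdict


class GenlsStore:
    """Toy store: EL sentences plus one HL index for the genls predicate."""

    def __init__(self):
        self.el = []                     # EL: ("genls", spec, genl) sentences
        self.closure = defaultdict(set)  # HL: collection -> all generalizations

    def assert_genls(self, spec, genl):
        self.el.append(("genls", spec, genl))
        self._rebuild_closure()  # the HL pays at assert time to save query time

    def _rebuild_closure(self):
        direct = defaultdict(set)
        for _, spec, genl in self.el:
            direct[spec].add(genl)
        self.closure = defaultdict(set)
        for start in list(direct):
            stack = list(direct[start])
            while stack:
                g = stack.pop()
                if g not in self.closure[start]:
                    self.closure[start].add(g)
                    stack.extend(direct.get(g, ()))

    def prove_genls(self, spec, genl, seen=None):
        """General-purpose route: search the EL sentences from scratch."""
        if seen is None:
            seen = set()
        seen.add(spec)
        for _, s, g in self.el:
            if s == spec and (g == genl or
                              (g not in seen and self.prove_genls(g, genl, seen))):
                return True
        return False

    def fast_genls(self, spec, genl):
        """HL route: a single lookup in the precomputed closure."""
        return genl in self.closure.get(spec, ())


kb = GenlsStore()
for spec, genl in [("Dog", "Mammal"), ("Mammal", "Animal"),
                   ("Animal", "PartiallyTangible"), ("PartiallyTangible", "Thing")]:
    kb.assert_genls(spec, genl)
print(kb.prove_genls("Dog", "Thing"), kb.fast_genls("Dog", "Thing"))  # True True
```

Both routes give the same answers; only the cost differs, which is the point of keeping the representations separate.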


Lessons Learned

• Bridging the knowledge gap: do the “intermediate theories.”
• Rather than struggling to reason in NL sentences, use a more formal representation language. Make this as simple as possible (but, year by year, we had to make it ever more expressive).
• Similarly, represent only – but all – useful distinctions. Sounds trivial, but leads to huge ontologies of objects, predicates, scripts...
• Distinguish the EL and HL. Rather than striving in vain for a single fast inference engine, use a suite of 720 heuristic modules, each of which handles some commonly occurring problems very fast.
• Probabilities are great when they are known; often only relative likelihood is known.
• Most knowledge is default; reason by argumentation.
• Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally consistent contexts.
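“Most knowledge is default; reason by argumentation” can be illustrated with a classic toy example: gather the arguments for and against a conclusion and prefer the one drawn from the most specific collection. Specificity is only one possible preference criterion and is an assumption here; the slides do not say how Cyc weighs competing arguments, and the taxonomy and rules below are hypothetical.

```python
# Hypothetical toy taxonomy: collection -> its direct generalization.
GENLS = {"Penguin": "Bird", "Bird": "Animal"}

# Default rules: (collection, conclusion, polarity).
DEFAULTS = [
    ("Bird", "canFly", True),
    ("Penguin", "canFly", False),
]


def ancestors(coll):
    """coll followed by each of its generalizations, most specific first."""
    out = [coll]
    while coll in GENLS:
        coll = GENLS[coll]
        out.append(coll)
    return out


def depth(coll):
    """Distance from the top of the toy taxonomy; deeper means more specific."""
    return len(ancestors(coll)) - 1


def decide(coll, conclusion):
    """True/False from the most specific applicable argument; None if unresolved."""
    args = [(depth(c), pol) for c, concl, pol in DEFAULTS
            if concl == conclusion and c in ancestors(coll)]
    if not args:
        return None
    best = max(d for d, _ in args)
    winners = {pol for d, pol in args if d == best}
    return winners.pop() if len(winners) == 1 else None  # a tie stays unresolved


print(decide("Bird", "canFly"), decide("Penguin", "canFly"))  # True False
```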


Contexts (Microtheories)

Global Consistency: Can’t Live With It, Can’t Give It Up!

What’s the real source of the problem?

Each rule is rich: it is a simplified statement that obscures a plethora of unstated assumptions and details.

As long as the rules are all in one coherent small context, they are likely to make the same simplifying assumptions, and hence are likely to work together consistently.
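A minimal sketch of contexts, assuming each context (microtheory) sees its own assertions plus those of the more general contexts it inherits from. Cyc names this inheritance link #$genlMt, but the class, the context names other than BaseKB, and the sentences below are made up, and the sketch only looks assertions up; it does no inference. Two sibling contexts can hold contradictory assertions without conflict, because no query is answered in a context that sees both.

```python
from collections import defaultdict


class ContextKB:
    def __init__(self):
        self.genl_mt = defaultdict(set)  # context -> more general contexts
        self.local = defaultdict(set)    # context -> sentences asserted there

    def add_genl_mt(self, spec, genl):
        self.genl_mt[spec].add(genl)

    def assert_in(self, mt, sentence):
        self.local[mt].add(sentence)

    def visible_mts(self, mt):
        """mt plus every context it inherits from, transitively."""
        seen, stack = set(), [mt]
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(self.genl_mt.get(m, ()))
        return seen

    def ask(self, mt, sentence):
        return any(sentence in self.local.get(m, ()) for m in self.visible_mts(mt))


kb = ContextKB()
kb.add_genl_mt("NaiveMedicineMt", "BaseKB")
kb.add_genl_mt("ModernMedicineMt", "BaseKB")
kb.assert_in("BaseKB", "(isa Fever Symptom)")
kb.assert_in("NaiveMedicineMt", "(causes BadAir Malaria)")
kb.assert_in("ModernMedicineMt", "(not (causes BadAir Malaria))")

print(kb.ask("NaiveMedicineMt", "(causes BadAir Malaria)"))   # True
print(kb.ask("ModernMedicineMt", "(causes BadAir Malaria)"))  # False
print(kb.ask("ModernMedicineMt", "(isa Fever Symptom)"))      # True
```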


“If it’s raining, carry an umbrella” implicitly assumes:

• the performer is a human being, is sane, and can carry an umbrella; thus:
  – the performer is not a baby, not unconscious, not dead
• the performer is going to go outdoors now/soon
• their actions permit them a free hand (e.g., not wheelbarrowing)
• their actions wouldn’t be unduly hampered by it (e.g., marathon-running)
• the wind outside is not too fierce (e.g., hurricane strength)
• the time period of the action is after the invention of the umbrella
• the culture is one that uses umbrellas as a rain- (not just sun-) protection device
• the performer has easy access to an umbrella; thus:
  – not too destitute, not someone who lives where it practically never rains, not at the office/theater/… caught without an umbrella
• the performer is going to be unsheltered for some period of time
  – the more waterproof their clothing, the gentler the rain, and the warmer the air, the longer that time period
• the performer will not be wet anyway (e.g., swimming)
• the rain is annoying, but merely annoying; thus:
  – not ammonia rain on Venus, radioactive post-apocalyptic rain, or biblical rain (Noah’s-ark-sized, or frogs/blood as rained on Pharaoh)
• the performer is not a hydrophobic person, gingerbread man, etc., and not a hydrophilic person, someone dying of thirst, etc.


Each assertion should be situated in a context: in a region of context-space

• We identified 12 dimensions of mt-space:
  Anthropacity • Time • GeoLocation • TypeOfPlace • TypeOfTime • Culture • Sophistication/Security • Topic • Granularity • Modality/Disposition/Epistemology • Argument-Preference • Justification
• We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions; and
• We have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts
  – E.g., if P is true in C1, and P=>Q is true in C2, in what context C3 can Q be validly concluded?


Mathematical Factoring of Context-space Dimensions

UnitedStatesIn1985Context: Ronald Reagan is president.

PennsylvaniaIn1985Context: Dick Thornburgh is governor.

LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president.

This inference depends on the time, space, and respective granularities of the contexts.

There are at least 900,000 doctors.

Dick Thornburgh is governor and there are at least 900,000 doctors.
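A minimal sketch of factoring contexts along two of the dimensions (place and time), assuming a sentence can be lifted into a target context when the target lies inside the source context on both dimensions and the sentence holds throughout its context's sub-regions. The containment table, the `distributes` flag, and all names are illustrative assumptions. In this transcript the doctors example has lost its context labels, so treating it as a nationwide aggregate asserted in the 1985 US context is a guess. Granularity is ignored here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical containment links for the regions in the example.
REGION_PARENT = {"LehighCounty": "Pennsylvania", "Pennsylvania": "UnitedStates"}


def region_within(inner: str, outer: str) -> bool:
    while inner is not None:
        if inner == outer:
            return True
        inner = REGION_PARENT.get(inner)
    return False


@dataclass(frozen=True)
class Context:
    region: str
    start: date
    end: date

    def within(self, other: "Context") -> bool:
        """True if this context's place and time both lie inside other's."""
        return (region_within(self.region, other.region)
                and other.start <= self.start and self.end <= other.end)


US_1985 = Context("UnitedStates", date(1985, 1, 1), date(1985, 12, 31))
PA_1985 = Context("Pennsylvania", date(1985, 1, 1), date(1985, 12, 31))
LEHIGH_FEB_1985 = Context("LehighCounty", date(1985, 2, 1), date(1985, 2, 28))

# (sentence, context asserted in, does it also hold in every sub-region?)
FACTS = [
    ("Ronald Reagan is president", US_1985, True),
    ("Dick Thornburgh is governor", PA_1985, True),
    # Assumed context: a count over a whole region does not carry down to its parts.
    ("There are at least 900,000 doctors", US_1985, False),
]


def visible_in(target: Context) -> list:
    """Sentences asserted in target, or liftable into it from a broader context."""
    return [s for s, ctx, distributes in FACTS
            if target == ctx or (distributes and target.within(ctx))]


print(visible_in(LEHIGH_FEB_1985))
# ['Ronald Reagan is president', 'Dick Thornburgh is governor']
```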


Time Indices and Granularities

Doug is talking, at 10:30 to 11:30, on 11/17/05.

Therefore:

Doug is talking, at 10:50 to 11:05, on 11/17/05.

But not:

Doug is talking, at 10:55:11 to 10:55:13, on 11/17/05.


Time Indices and Granularities

P = Doug is talking.

Doug is talking, at 10:30 to 11:30, on 11/17/05, with temporal granularity calendar minute (t = that one-hour interval).

So: talking during that 15-minute interval? Yes.

Talking during that 2-second interval? Unknown.

[Figure: timeline of the interval t, divided into calendar minutes.]
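The granularity rule can be made concrete with a small sketch. It assumes one particular reading of “true at calendar-minute granularity”: P holds somewhere in every whole calendar minute of the stated interval. That reading is an interpretation, not Cyc's definition, and the function names are hypothetical. A query interval that covers at least one whole minute is then answerable; a finer one is not.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def next_minute_boundary(t: datetime) -> datetime:
    """First calendar-minute boundary at or after t."""
    floor = t.replace(second=0, microsecond=0)
    return floor if floor == t else floor + MINUTE


def talking_during(known_start, known_end, q_start, q_end):
    """Answer 'yes' or 'unknown' for P over [q_start, q_end], given that P is
    known over [known_start, known_end] at calendar-minute granularity."""
    if not (known_start <= q_start and q_end <= known_end):
        return "unknown"  # the query reaches outside what we were told
    # Assumed reading: P holds somewhere in every whole calendar minute,
    # so the query must cover at least one whole minute to get a "yes".
    if next_minute_boundary(q_start) + MINUTE <= q_end:
        return "yes"
    return "unknown"


day = datetime(2005, 11, 17)
known = (day.replace(hour=10, minute=30), day.replace(hour=11, minute=30))
fifteen_min = talking_during(*known, day.replace(hour=10, minute=50),
                             day.replace(hour=11, minute=5))
two_sec = talking_during(*known, day.replace(hour=10, minute=55, second=11),
                         day.replace(hour=10, minute=55, second=13))
print(fifteen_min, two_sec)  # yes unknown
```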


Summary (1): Technology

• Cyc is a power source, not a single application. Like oil, electricity, telephony, computers,… Cyc can spawn and sustain a new industry.

• It can cost-effectively underlie almost all apps. (Provide a common-sense layer to reduce brittleness when faced with unexpected inputs/situations.)

• To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules.


20 Motivating Applications (1984)


5 More Recent Application Ideas


Recent/Current Government Apps

• Dept. of Defense (mostly DARPA, ONR)
  – CoABS, HPKB, CPoF, DAML, ACIP
  – RKF (OE-ing by non-logicians via clarification dialogue)
  – BUTLER: Knowledge-based machine learning
  – ResearchCyc: Clean, document, speed up, interface, etc.
  – ONR: Level 2 and 3 Information Fusion (sense-making)
• Other US Government Agencies (NSF, ARDA, NIST)
  – NIST ATP: Jumpstarting a Nat’l. Knowledge Infrastructure
  – AQUAINT, NIMD, Topsail, Eagle, KSP-ATD,…
  – Building a comprehensive terrorism KB for the US
  – Automated generation of plausible terrorism threat scenarios
  – Modeling intelligence analysts (script learning/recognition)
  – Semantic knowledge source integration
  – Efficient Inference in Large Knowledge Bases


Recent/Current Commercial Apps

• using Cyc as the basis for a medical ontology – aligning Cyc with SNOMED/UMLS/MeSH/...
• multiple-thesaurus manager (align n 300k-term lists)
• spider the entire Web (indexing it in terms of Cyc concepts)
• identify inter-sentential references in NPR transcripts
• improved web (and website) search query/follow-ups
• vulnerability assessment (reason about a scanned network)
• semantic matching for a better customer experience


Summary (2): Cycorp

• 50 employees (almost all MTS’s)
• Revenue about $7M/year (some commercial licenses and apps, but >50% from US Government R&D contracts)
• Employee-owned (VC-free and debt-free)
• $75M development effort (750 person-years over 21 years)
  – Mostly spent on building up its ontology and KB
  – To a lesser extent, its reasoning modules and interfaces
  – Focus: automatically growing Cyc via learning
  – Focus: enabling Cyc users to directly extend it
  – Focus: making inference orders of magnitude faster


Summary (3): The Message: What Needs to be Shared?

• bits/bytes/streams/network…
• alphabet, special characters,…
• words, morphological variants,…
• syntactic meta-level markups (HTML)
• semantic meta-level markups (SGML, XML)
• content (logical representation of doc/page/...)
• context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)

Tiny vocabulary (# distinctions) of standard relations: rdf:type, subclass, label, domain, range, comment,…

Beyond which diversity is tolerated, which means divergence is inevitable.

“What do you mean we have no standard? We have lots of standards!”

DAML+OIL adds a few more distinctions: inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, …

To do the logical/arithmetic combination across information sources, we need tens of thousands of relations, not tens.


From the User’s POV

• The user has a question they want answered.
• The data needed to answer it is available to them, but not in one single, obvious, reliable place.
• The answers follow logically (and/or arithmetically) from m elements in n sources.
• Don’t want to have to know, ahead of time, what sources to go to, how to access them, how to combine the intermediate results.
• Do want to be able to limit, ahead of time, the uncertainty, recency, granularity, ideology… (and/or see such meta-level info for each answer)

“Which first-run movies star a teenager born in Texas and are showing today at a theater < 10 minutes’ drive from this building?”
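To make “the answers follow from m elements in n sources” concrete, here is a toy version of the movie question. It assumes four hypothetical, independently maintained sources (a film listing, an actor database, today's showtimes, and drive times), all with invented data, and uses the talk's date as “today”. No single source answers the question; the answer appears only once their records are joined on shared entities and combined with a bit of arithmetic (age, drive time).

```python
from datetime import date

# Four hypothetical sources; all names and numbers are invented.
films = {"Film A": {"first_run": True, "stars": ["Actor X"]},
         "Film B": {"first_run": True, "stars": ["Actor Y"]},
         "Film C": {"first_run": False, "stars": ["Actor X"]}}
actors = {"Actor X": {"born": date(1990, 5, 1), "birth_state": "Texas"},
          "Actor Y": {"born": date(1970, 3, 9), "birth_state": "Texas"}}
showtimes_today = [("Film A", "Theater 1"), ("Film B", "Theater 1"),
                   ("Film C", "Theater 2")]
drive_minutes = {"Theater 1": 7, "Theater 2": 25}


def age_on(born: date, today: date) -> int:
    # Subtract one if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def answer(today: date) -> list:
    hits = set()
    for film, theater in showtimes_today:
        info = films[film]
        if not info["first_run"] or drive_minutes[theater] >= 10:
            continue
        for star in info["stars"]:
            a = actors[star]
            if a["birth_state"] == "Texas" and 13 <= age_on(a["born"], today) <= 19:
                hits.add(film)
    return sorted(hits)


print(answer(date(2005, 11, 17)))  # ['Film A']
```

Here the join logic is hand-written for one question; the slides' point is that a system with enough shared relations should be able to assemble such a combination automatically.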


From the User’s POV

• The user has a question they want answered.
• The data needed to answer it is available to them, but not in one single, obvious, reliable place.
• Do want the answer to be found automatically, not a bunch of relevant pages for them to peruse.
• Don’t want to have to know, ahead of time, what sources to go to, how to access them, how to combine the intermediate results.
• Do want to be able to limit, ahead of time, the uncertainty, recency, granularity, ideology… (and/or see such meta-level info for each answer)

“Which first-run movies star a teenager born in Texas and are showing today at a theater < 10 minutes’ drive from this building?”

Page 106

Summary (3): The Message: What Needs to be Shared?

• bits/bytes/streams/network…
• alphabet, special characters,…
• words, morphological variants,…
• syntactic meta-level markups (HTML)
• semantic meta-level markups (SGML, XML)
• content (logical representation of doc/page/...)
• context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)

Page 107

End of “The Message”
End of “The Summary”

Delve into a typical domain (answering intelligence analysts’ queries) where Cyc can really help, because that domain thwarts all five “ontological corner-cutting” solutions

(+ digressions for OpenCyc, ResearchCyc,…)

Page 108

Eschew the 5 pitfalls (ways to cut ontological corners and end up with something that only appears to work)

• Ignorance-based: Have a small theory size (#terms, #instances, #rules)

• Static KB (can be massively tuned, optimized, cached, etc. ahead of time)

• Simple assertions (e.g., SAT constraints; propositional calculus; Horn;…)

• One global context (no contradictions, limited domain, simplified world)

• Don’t do all the bookkeeping and forward inference required for justification maintenance (or, equivalently, don’t ever have truth maintenance “turned on”)

As with pharmaceuticals, what is toxic in one dosage is beneficial in a lesser dosage.

E.g., contexts lead to locally-consistent locally-small theories (faster inference/KE)

E.g., often some (sub)problems can be represented/solved in a simpler repr.
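The claim that contexts yield locally consistent, locally small theories can be illustrated with a toy context hierarchy in which each context sees its own assertions plus those of the more general contexts it inherits from. This is a minimal sketch, loosely inspired by Cyc’s microtheories; the context names, the `GENL` table, and the lookup functions are hypothetical, not Cyc’s API. A fact absent from every visible context is simply not derivable (no explicit negation is modeled).

```python
# Hypothetical context hierarchy: each context lists the more general contexts it inherits from.
GENL = {
    "SherlockHolmesStoriesMt": ["GeneralMt"],
    "RealWorldMt": ["GeneralMt"],
    "GeneralMt": [],
}

# Each context holds a small, locally consistent set of assertions.
ASSERTIONS = {
    "GeneralMt": {("isa", "London", "City")},
    "SherlockHolmesStoriesMt": {("residesIn", "SherlockHolmes", "London")},
    "RealWorldMt": {("isa", "SherlockHolmes", "FictionalCharacter")},
}


def visible_contexts(ctx):
    """The context itself plus every context it inherits from, transitively."""
    seen, stack = [], [ctx]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(GENL.get(current, []))
    return seen


def holds(fact, ctx):
    """True if fact is asserted in ctx or in any context ctx inherits from."""
    return any(fact in ASSERTIONS.get(c, set()) for c in visible_contexts(ctx))


print(holds(("residesIn", "SherlockHolmes", "London"), "SherlockHolmesStoriesMt"))  # True
print(holds(("residesIn", "SherlockHolmes", "London"), "RealWorldMt"))              # False
```

Each query only searches the handful of contexts visible from where it is asked, which is the “locally small” half of the slide’s point.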

Page 109

“What sequences of events could lead to the destruction of Hoover Dam?”

“Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?”

[Figure: system architecture. Cyc with the Terrorism Knowledge Base, general knowledge, and reasoning modules; Cycorp tools for ontology building, browsing, editing, and fact/rule entry by domain experts; scenario generator, explanation generator, and query formulator; others’/GOTS analysis and collaboration components; the AKB (the Analyst’s Knowledge Base) and a relational DB “projection” of the AKB, used by the CT analyst; an interface to data repositories: border crossings, HID observations, travel records, credit card records, geopolitical data, global terrain data, weather data, satellite intel, HUMINT messages, INS data, military intel, output of COTS text extraction systems, and SIGINT message content.]

Page 110

1. Fusion of available structured terrorism knowledge sources: a tiny fraction of the Comprehensive AKB.

2. Terrorism domain experts met to develop a schema for the missing knowledge.

3. They and others are working remotely, collaboratively, to flesh out the missing 95% of the AKB.

4. Cyc uses general and domain knowledge to convert the simple English phrases into formal logic.

5. The Comprehensive AKB: first useful state: will contain over 4M facts and rules of thumb, about half of which is pre-existing general knowledge already in Cyc.

[Figure labels: structured sources MIPT, MATRIX, TTT, TKS2, TKS3, TKS6, TKS7, TKS8; Terrorism Knowledge; Preexisting Structured Relevant Knowledge; 1.92M; 80k.]

Page 111

Templatized Terrorism Analysis Queries

(1) List the [ORGANIZATIONS] at which [AGENT] was [STATUS] and when.
(1a) List the schools at which [Mohammed Atta] was [enrolled] and when.
(1b) List the companies at which [Mark Fulton] was [employed] and when.

(3) What percentage of [ATTACK-TYPE] are [ATTACK-TYPE]?
(3a) What percentage of [terrorist attacks] are [poisonings]?
(3b) What percentage of [bombings] are [suicide bombings]?

(4) Between what times was the [AGENT] a/an [ROLE-PREDICATE] in what types of acts and where?
(4a) Between what times was the [Aum Supreme Truth] a [performer] in what types of acts and where?
(4b) Between what times was the [Ulster Volunteer Force] an [assisting agent] in what types of acts and where?

Page 112

Templatized Terrorism Analysis Queries

(13) List all [AGENT-TYPE] in [LOCATION] that have used [DEVICE-TYPE] and list the specific types of (devices) that each has used.
(13a) List all [revolt organizations] in [Northern Ireland] that have used [pipe bombs] and list the specific types of pipe bombs that each has used.
(13b) List all [right wing terrorist groups] in [North America] that have used [package bombs] and list the specific types of package bombs that each has used.

(22) List the [AGENT-TYPE] who have [RELATION] [TYPE] to [AGENT] and what those supplies were.
(22a) List the [Terrorist groups] who have [given] [supplies] to [Hamas] and what those supplies were.
(22b) List the [state sponsored terrorist agents] who have [provided] [support] to [Osama Bin Laden] and what those supplies were.

Page 113

CIA Intelligence Report: “Seeking Information: Ahmad Said”, July 26, 2004

Ahmad Said, an expert on remote-controlled bombs with a degree in chemical engineering, was seen travelling to Lebanon early this month. Said claimed to be a member of the Lebanese Hizballah from the mid 1980s until late July 1999.

It is currently believed that Said assisted in the July 22nd car bombing in Beirut that damaged police barracks and destroyed several retail stores. Lebanese Hizballah's spokesman, Emad Mugniyeh, issued a statement on July 26th to the Al Aman newspaper denying the group's involvement in the attack.

Page 114

Deeper Analytical Question Answering

What factors argue <for/against> the conclusion that <ETA> <performed> <the March 2004 Madrid attacks>?

For:
- ETA often executes attacks near national elections
- ETA has performed multi-target coordinated attacks
- Over the past 30 years, ETA performed 75% of all terrorist attacks in Spain
- Over the past 30 years, 98% of all terrorist attacks in Spain were performed by Spain-based groups, and ETA is a Spain-based group.

Against:
- ETA warns (a few minutes ahead of time) of attacks that would result in a high number of civilian casualties, to prevent them. There was no such warning prior to this attack.
- ETA generally takes responsibility for its attacks, and it did not do so this time.
- ETA has never been known to falsely deny responsibility for an attack, and it did deny responsibility for this attack.

Page 115

Automatic Link Detection


Page 117

Intelligent Fusion: Disparate Data

• USS Lake Champlain is scheduled to return to its homeport (NavBase San Diego) 1300 4 September
• Hurricane Howard predicted to make landfall at Tijuana, Mexico approx. 0100 5 September
• 0600 4 September: satellite imagery reveals 126 boats berthed at Silver Gate Yacht Club.
• 1135 4 September: Coast Guard reports two cigarette boats, traveling together at 54 knots, on a trajectory consistent with a path from the Silver Gate Yacht Club to the entrance of the San Diego Naval Base.
• Monitoring of cell phone activity of a suspected Red Dawn terrorist cell member in Syria has identified four calls, each of 30 seconds’ duration, placed to that suspect from Shelter Island between 2300 September 3 and 1100 September 4.
• 0600 4 September: Silver Gate Yacht Club harbormaster manifest only lists 124 craft.
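The fusion step in this scenario is arithmetic across sources: 126 boats seen versus 124 on the manifest leaves two unaccounted for, and the Coast Guard reports exactly two boats leaving that club for the naval base. The sketch below encodes that one inference, assuming hypothetical structured versions of the reports; the `Report` record and the matching rule (a track from the same place whose boat count fits within the surplus) are illustrative assumptions, not how Cyc represents or fuses this data.

```python
from dataclasses import dataclass


@dataclass
class Report:
    source: str  # e.g. "satellite", "harbormaster", "coast_guard"
    kind: str    # "boat_count" or "boat_track"
    data: dict


# Hypothetical structured versions of the reports on the slide.
REPORTS = [
    Report("satellite", "boat_count", {"place": "Silver Gate Yacht Club", "count": 126}),
    Report("harbormaster", "boat_count", {"place": "Silver Gate Yacht Club", "count": 124}),
    Report("coast_guard", "boat_track",
           {"origin": "Silver Gate Yacht Club", "destination": "San Diego Naval Base", "boats": 2}),
]


def unaccounted_boats(reports, place):
    """Boats seen at `place` minus boats declared on its manifest (0 if either is missing)."""
    seen = [r.data["count"] for r in reports
            if r.kind == "boat_count" and r.source == "satellite" and r.data["place"] == place]
    declared = [r.data["count"] for r in reports
                if r.kind == "boat_count" and r.source == "harbormaster" and r.data["place"] == place]
    if not seen or not declared:
        return 0
    return max(seen) - min(declared)


def suspicious_tracks(reports, place):
    """Tracks leaving `place` that could consist entirely of the unaccounted-for boats."""
    surplus = unaccounted_boats(reports, place)
    return [r for r in reports
            if r.kind == "boat_track" and r.data["origin"] == place and 0 < r.data["boats"] <= surplus]


for track in suspicious_tracks(REPORTS, "Silver Gate Yacht Club"):
    print(track.data["boats"], "possibly unmanifested boats heading to", track.data["destination"])
```

No single report is alarming; the signal only appears once the counts from two sources are subtracted and matched against a third.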

Page 118

Automatic Generation of Plausible (Counter)Terrorism Scenarios

• Start from seed, if given one
• Generate chains of action and plausible reaction
• End at target, if given one
• Meet in middle
• Each step should be both plausible and interesting
• Grow whole populations of such paths, not just one.
• Employ heuristics to evaluate each node’s “promise”: plausibility x interestingness
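Growing a population of paths scored by plausibility × interestingness can be sketched as a beam search: at each step, extend every surviving path, then keep only the most promising few. This is a minimal sketch, assuming a hypothetical event graph with per-transition plausibilities and per-event interestingness scores; beam search is our stand-in technique, not necessarily the one Cyc’s scenario generator uses.

```python
import heapq

# Hypothetical successor events: event -> list of (next_event, plausibility in [0, 1]).
NEXT = {
    "seed": [("a", 0.9), ("b", 0.5)],
    "a": [("c", 0.2), ("d", 0.8)],
    "b": [("d", 0.9)],
    "c": [],
    "d": [],
}

# Hypothetical interestingness of reaching each event.
INTEREST = {"seed": 1.0, "a": 0.3, "b": 0.9, "c": 1.0, "d": 0.6}


def promise(path, plausibility):
    """Heuristic promise of a path: plausibility x interestingness of its last node."""
    return plausibility * INTEREST[path[-1]]


def grow_population(start, steps, beam_width):
    """Keep the beam_width most promising paths after each extension step."""
    population = [((start,), 1.0)]
    for _ in range(steps):
        candidates = []
        for path, plaus in population:
            for nxt, p in NEXT[path[-1]]:
                if nxt not in path:  # avoid cycles
                    candidates.append((path + (nxt,), plaus * p))
        if not candidates:
            break
        population = heapq.nlargest(beam_width, candidates, key=lambda c: promise(*c))
    return population


for path, plaus in grow_population("seed", steps=2, beam_width=2):
    print(" -> ".join(path), round(plaus, 2))
```

Keeping a population matters: with `beam_width=1` the search greedily commits to the more interesting first step (`b`) and never finds `seed -> a -> d`, which scores higher after two steps.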

Page 119

Generate chains of action and plausible reaction

Each step can be a…
• Political event (e.g., an election)
• Diplomatic event (communiqué)
• Military event (buildup along border)
• Terrorist event (suicide bombing)
• Economic event (loan; arms sale)
• Infrastructure event (power outage)
• Act of Nature (illness; hurricane)

Often a step is just a response, by one or more agents, to the prior step (or, if going right to left, it is an enabler/cause of the already-known successor step).

Page 120

Generate chains of action and plausible reaction

Target: Hoover Dam is blown up

Page 121

Generate chains of action and plausible reaction (working backward from the target)

Hoover Dam is blown up
← Destroy 3.24M tons of concrete
← Detonate a crude 100-kton nuclear bomb, 1 km away
← Buy it for $1M from Pakistan
(enabled by: Pakistan has such devices and is financially hurting; Al Qaida has high net worth (assets) and the will to do it)

Something that we can look for: Al Qaida does a sudden, atypical liquidation of $1M of its assets.

Page 123

Auto. Scen. Gen.: Lessons Learned

• Forward generation is too explosive
• Backward generation is too sterile
• Instead, use a sort of “cardiac rhythm”
– Take a large step backward (ABDUCTION)
– Work forward a little from it (DEDUCTION)
– Repeat.
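The “cardiac rhythm” alternates one large backward step with a short forward step. The sketch below runs that loop over a toy rule base loosely following the Hoover Dam chain; the rule names are hypothetical, and abduction here is deliberately naive (assume the premises of any rule that concludes the current goal). The forward step surfaces side effects of each hypothesis, such as the asset liquidation, which are “something that we can look for.”

```python
# Hypothetical causal rules: (premises, conclusion).
RULES = [
    (("destroy_concrete",), "dam_destroyed"),
    (("detonate_nuke_nearby",), "destroy_concrete"),
    (("buy_nuke_from_seller",), "detonate_nuke_nearby"),
    (("buyer_has_funds", "seller_has_device"), "buy_nuke_from_seller"),
    (("buy_nuke_from_seller",), "sudden_asset_liquidation"),  # observable side effect
]


def abduce(goal):
    """Large backward step: hypothesize the premises of every rule concluding goal."""
    return [p for premises, concl in RULES if concl == goal for p in premises]


def deduce(facts, depth=1):
    """Small forward step: fire rules whose premises all hold, at most `depth` rounds."""
    facts = set(facts)
    for _ in range(depth):
        new = {c for ps, c in RULES if set(ps) <= facts and c not in facts}
        if not new:
            break
        facts |= new
    return facts


def cardiac_rhythm(target, beats):
    """Alternate abduction (back) and shallow deduction (forward) for `beats` cycles."""
    frontier, hypotheses, observables = [target], [], set()
    for _ in range(beats):
        next_frontier = []
        for goal in frontier:
            for h in abduce(goal):
                hypotheses.append(h)
                next_frontier.append(h)
                # Consequences of h other than h itself and the goal it explains.
                observables |= deduce({h}) - {h, goal}
        frontier = next_frontier
    return hypotheses, observables


print(cardiac_rhythm("dam_destroyed", 3))
```

Pure backward chaining would stop at the chain itself; the short forward step is what turns a hypothesis into a checkable prediction.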

Page 124

Targeted Fact Gathering: Web Search

Suggested Fact: (foundingDate AbuSayyaf ?X)

Search Strings:
• Abu Sayyaf was founded in ___
• Al Harakat Islamiya, established in ___
• ASG was established in ___

Local storage: “Abu Sayyaf was founded in the early 1990s”

Parse: (foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))
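The pipeline on this slide runs from an open query to search strings, retrieved text, and a parsed assertion. The sketch below mimics it with string templates and a regular expression, assuming a hypothetical alias table and phrase templates. `EarlyPartFn` and `DecadeFn` appear on the slide; `MiddlePartFn` and `LatePartFn` are assumed names. Cyc itself parses with its NL subsystem, not a regex like this.

```python
import re

# Hypothetical alias table for the CycL constant on the slide.
ALIASES = {"AbuSayyaf": ["Abu Sayyaf", "Al Harakat Islamiya", "ASG"]}

# Hypothetical phrase templates for foundingDate, one paired with each alias.
TEMPLATES = ["{name} was founded in", "{name}, established in", "{name} was established in"]

# EarlyPartFn appears on the slide; MiddlePartFn and LatePartFn are assumed names.
PART_FN = {"early": "EarlyPartFn", "mid": "MiddlePartFn", "late": "LatePartFn"}


def search_strings(constant):
    """Instantiate the templates with the constant's aliases (pairwise, as on the slide)."""
    return [t.format(name=n) for n, t in zip(ALIASES[constant], TEMPLATES)]


def parse_founding(constant, sentence):
    """Map e.g. 'founded in the early 1990s' to a foundingDate assertion string."""
    m = re.search(r"\b(early|mid|late)?\s*(\d{3})0s\b", sentence)
    if m is None:
        return None
    term = f"(DecadeFn {m.group(2)})"  # e.g. 1990s -> (DecadeFn 199)
    if m.group(1):
        term = f"({PART_FN[m.group(1)]} {term})"
    return f"(foundingDate {constant} {term})"


print(search_strings("AbuSayyaf"))
print(parse_founding("AbuSayyaf", "Abu Sayyaf was founded in the early 1990s"))
```

The parsed result is only a suggested fact; a human or a consistency check would still vet it before assertion.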

Page 125

Targeted Fact Gathering: Web Search

Suggested Fact: (maritalStatus YassirArafat ?X)

All Possible Facts (PersonTypeByMaritalStatus):
• (maritalStatus YassirArafat Single)
• (maritalStatus YassirArafat Married)
• (maritalStatus YassirArafat Divorced) …
• (maritalStatus YassirArafat Cohabitating-Unmarried)

Search Strings:
• Yasser Arafat’s fiance
• Yasser Arafat’s wife
• Yasser Arafat’s ex-wife
• Yasser Arafat divorced

Result: (maritalStatus YassirArafat Married)
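Here the open slot is filled by enumeration: every value the type constraint allows is a candidate fact, each candidate gets its own search phrases, and the best-supported candidate is suggested. A minimal sketch, assuming a hypothetical phrase table (in particular, treating “fiance” as evidence for Single is our assumption) and a caller-supplied `hit_count` function standing in for a real search engine.

```python
# Candidate values, as enumerated on the slide from PersonTypeByMaritalStatus.
CANDIDATES = ["Single", "Married", "Divorced", "Cohabitating-Unmarried"]

# Hypothetical mapping from each candidate value to phrases that would support it.
PHRASES = {
    "Single": ["{who}'s fiance"],
    "Married": ["{who}'s wife"],
    "Divorced": ["{who}'s ex-wife", "{who} divorced"],
    "Cohabitating-Unmarried": [],
}


def suggest_value(constant, display_name, hit_count):
    """Return the candidate fact whose phrases get the most hits, or None if none are found."""
    scores = {v: sum(hit_count(p.format(who=display_name)) for p in PHRASES[v])
              for v in CANDIDATES}
    best = max(CANDIDATES, key=lambda v: scores[v])
    if scores[best] == 0:
        return None
    return f"(maritalStatus {constant} {best})"


# Made-up hit counts, for illustration only.
fake_hits = {"Yasser Arafat's wife": 40, "Yasser Arafat's fiance": 3}
print(suggest_value("YassirArafat", "Yasser Arafat", lambda s: fake_hits.get(s, 0)))
```

Enumerating the whole value space up front, rather than searching for a single guess, lets competing candidates be compared on the same evidence.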

Page 126

Harnessing Lots of Users

Useful distinguishing facts
• Identify underpopulated common sense predicates
• Use semantic constraints + shallow parsing to identify possible fact completions
• Present multiple-choice questions to novices to complete facts

150-400 commonsense GAFs/hour

Example: Hat worn on: Head Neck Foot Leg
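The question-generation step combines two filters: shallow parsing proposes candidate fillers, and the predicate’s argument-type constraint discards the ill-typed ones. The sketch below assumes a hypothetical predicate `wornOn`, a hypothetical type table, and a made-up candidate list standing in for shallow-parser output.

```python
# Hypothetical semantic constraint: the second argument of wornOn must be a BodyPart.
ARG_TYPE = {"wornOn": "BodyPart"}
INSTANCES = {"BodyPart": {"Head", "Neck", "Foot", "Leg", "Hand"}}


def make_question(pred, subject, parsed_candidates, n_choices=4):
    """Filter shallow-parse candidates by the argument-type constraint; keep the first n_choices."""
    allowed = INSTANCES[ARG_TYPE[pred]]
    choices = [c for c in parsed_candidates if c in allowed][:n_choices]
    return f"{subject} {pred}: " + " ".join(choices), choices


def record_answer(pred, subject, choice, kb):
    """Store the novice's pick as a ground atomic formula (GAF) tuple."""
    kb.add((pred, subject, choice))
    return kb


# Candidates a shallow parser might pull from text near "hat" (made up for illustration).
question, choices = make_question("wornOn", "Hat", ["Head", "Brim", "Neck", "Rain", "Foot", "Leg", "Hand"])
print(question)
kb = record_answer("wornOn", "Hat", "Head", set())
```

Because the choices are already type-correct, a novice only needs common sense, not knowledge of the ontology, to contribute a valid GAF.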

Page 127

OpenCyc: Open Source release of [most of] the Cyc Ontology + Simple Relns. + Inference Engine

ResearchCyc: Almost All of Cyc (for free for R&D purposes)

Page 128

The OpenCyc Release
• Runs on Windows, Linux
• OpenCyc Knowledge Base
– LGPL license
– 47,000 terms
– 306,000 facts
• Cyc Inference Engine
– Free license for binary runtime engine
• Application Programming Interface
– Java, SubL, Python
• Extensive documentation
– Ontological Engineer’s Handbook
– Online Cyc 101 course

Page 129

Why Do We Release All This?
• Advance the starting line for AI
• Enable a large number of users to, in effect, help us to grow the Cyc Knowledge Base
• Help Cyc become a critical component
– in the Semantic Web
– in more and more applications
– using OpenCyc hopefully leads to using ResearchCyc for free, eventually licensed

Page 130

OpenCyc is Upward-Compatible with ResearchCyc

ResearchCyc contains

• OpenCyc

• Natural Language Processing subsystem

• Many more facts/rules per term

– The “extent” of non-structural predicates

Page 131

60,000 OpenCyc Users/Contributors, 50 Active ResearchCyc User Groups:

[Figure legend: groups are marked as Government, Government-related, Commercial, or NPOs.]

• Xerox PARC
• Daxtron Labs
• Lockheed Martin ATLD
• Houston VA Medical Center
• Air Force Rome Labs
• Institute for the Study of Accelerating Change
• U of Maryland
• Language Computer Corporation
• NTT Communication Science Laboratories (Japan)
• Northwestern U
• Stanford NLP Dept.
• ANSER, Inc.
• LBJ School of Public Affairs
• Fraunhofer Institute
• U of Illinois Urbana-Champaign
• New Mexico Highlands Univ.
• Harvard U
• Linkoping U (Sweden)
• Radboud U (Netherlands)
• Tokyo Inst. of Technology
• Terra Incognita University
• Microfabrica, Inc.
• U of Stuttgart
• MIT Media Lab
• Witan International
• U of Pennsylvania
• SRI
• 21st Century Technologies
• U of Minnesota
• Stone’s Throw Technologies
• ISI
• Trimtab Consulting
• U of Hawaii
• Rensselaer AI and Reasoning Lab
• TNO-DMV (Netherlands)
• Sapio Systems (Denmark)
• U of Toronto
• Knowledge Media Institute, Open University
• Austin Info Systems


Page 134

5 Factors slowing IC inference (Problem)

(F1) Constant stream of new assertions, new data to assimilate.
– “elaboration tolerance” vs. tuned, optimized, “compiled” representations.

(F2) Theory Size: Huge vocab. and #instances (people, specific reports,…)

(F3) Sophisticated assertions and constraints strain even FOPC
– More repr. language “features” (e.g., quantification) => slower inference

(F4) Assertions are often true in one context and false in another
– Contextualized data and queries => exponentially larger search space

(F5) Truth maintenance must be “on”, to assimilate new data properly, and to provide the symbolic justifications behind its conclusions.
– Each new datum can trigger an avalanche of TMS reactions in the KB
– There can be multiple answers, each with multiple justifications
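Factor F5 can be illustrated with a justification-based truth maintenance system: each conclusion records the sets of beliefs that support it, and retracting a datum withdraws every conclusion left without a valid justification. This is a minimal sketch of the textbook technique, not Cyc’s TMS; it recomputes the believed set from scratch on each call, whereas a real system updates incrementally (which is where the “avalanche” cost comes from). The fact names are hypothetical.

```python
from collections import defaultdict


class TinyJTMS:
    """Minimal justification-based TMS: a belief is IN if it is a premise or has
    at least one justification whose antecedents are all IN."""

    def __init__(self):
        self.premises = set()
        self.justifications = defaultdict(list)  # conclusion -> list of antecedent tuples

    def assume(self, fact):
        self.premises.add(fact)

    def retract(self, fact):
        self.premises.discard(fact)

    def justify(self, conclusion, *antecedents):
        self.justifications[conclusion].append(tuple(antecedents))

    def believed(self):
        """Least fixpoint from the premises, so circular support alone never counts."""
        inn = set(self.premises)
        changed = True
        while changed:
            changed = False
            for concl, justs in self.justifications.items():
                if concl not in inn and any(all(a in inn for a in j) for j in justs):
                    inn.add(concl)
                    changed = True
        return inn

    def why(self, fact):
        """All currently valid justifications for fact (there may be several)."""
        inn = self.believed()
        return [j for j in self.justifications.get(fact, []) if all(a in inn for a in j)]


tms = TinyJTMS()
tms.assume("reportA")
tms.assume("reportB")
tms.justify("suspectInCity", "reportA")
tms.justify("suspectInCity", "reportB")
tms.justify("alert", "suspectInCity")
print(tms.why("suspectInCity"))
tms.retract("reportA")
print("alert" in tms.believed())
```

Retracting one report leaves the alert standing because a second justification survives; retracting both would withdraw it.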


Page 136

Slow Queries

• Queries that take a long time (okay, but faster is better)
– Generate scenarios resulting in destruction of NY Stock Exchange: still running after 2 months
– Answer query Q modulo a small number of plausible “unknown” clauses

• Queries that take a long time and shouldn’t
– (capableOf ArnoldSchwarzenegger RunningForPresidentOfUS): takes 40 minutes to return False.
Why: wasting time seeing if Arnold is an x where x can’t be President (e.g., Cow)
– (hasBeliefSystems AdolfHitler AntiSemitism): in the context of World History 1944, takes 16 minutes to return True.
Why: lots of ways this might not be true


Page 140

Effic. Reasoning Hypotheses

• Hypothesis 1: There is no silver bullet, no one magic key waiting to be discovered which will unlock efficient pathfinding on huge knowledge-spaces.

– Rather, such inference will only be improved incrementally, by bringing to bear a large number of efficient partial solutions.

Page 141

Effic. Reasoning Hypotheses

• Hypothesis 2: These special-case solutions are not random, but factor into a handful of different categories.
– A 2-day workshop meeting could productively be held for each such category
– Important interstitial work to be done, collaboratively, before and after the meetings.

Page 142

6 categories (workshop topics)

• Reasoners that exploit limitations in the expressivity of the repr. language they operate over
– Description Logic, 1st order, etc.
– What simplifications enable what speedups?
– At what risk?
• Domain-specific (incl. Context-specific) reasoners
• Statistical/Bayesian Reasoners
• “Unsound” (but presumably useful) reasoners
• Meta-reasoners (tacticians) and Meta2 (strategists)
• Parallel Processing, HW Acceleration, “Other”

Page 143

6 categories (workshop topics)

• Reasoners that exploit limitations in the expressivity of the repr. language they operate over
– Description Logic, 1st order, etc.
– What simplifications enable what speedups?
– At what risk?
• Domain-specific (incl. Context-specific) reasoners
– What sorts of domain knowledge do they utilize?
– How do they use that to speed up inference?
– Contexts, dimensions of context-space, algorithms for exploiting that structure of the KB to do faster reasoning

Page 144

6 categories (workshop topics)

• Statistical/Bayesian Reasoners
– How can these cooperate with, help, and be helped by non-statistical reasoners (acting as independent agents)?
– How can statistical and symbolic inference be more tightly integrated in a single reasoner (cf. Koller)?
• “Unsound” (but presumably useful) reasoners
– Abduction, induction, analogy, abstraction (ignoring details which hopefully won’t matter), scen. generation
– How can these cooperate with, help, and be helped…?
– How can unsound and sound inference be more tightly integrated in a single reasoning engine?

Page 145

6 categories (workshop topics)

• Meta-reasoners (tacticians) and Meta2 (strategists)

– Do/Improve object-level meta-level reasoning
– Types of meta-… (prior & tacit; trails; reflection;…)
• “Other”
– Parallel processing
– Hardware acceleration (special purpose chips etc.)
– New types of reasoning modules and strategies, that don’t fit in any above group, that folks are working on.
– What specific gaps are there (useful, doable, efficient reasoners no one has even started to research yet)?

Page 146

Background & Lit. Review

• Instantiation-based reasoning systems
• Lifted DPLL procedures (Davis-Putnam-Logemann-Loveland)
• Completion/Boolean Ring based methods
• ContractNet
• TeamWork
• Scatter-gather algorithms
• Auto. theory decomposition by static analysis
• Explanation-based learning/partial evaluation mechanisms that learn generalized proof schemata

Page 147

Effic. Reasoning Hypotheses

1. No silver bullet
2. 6 types of powerful partial solutions already exist
– Reasoners that exploit limitations in the expressivity of the representation language they operate over
– Domain-specific (incl. Context-specific) reasoners
– Statistical/Bayesian Reasoners
– “Unsound” (but presumably useful) reasoners
– Meta-reasoners (tacticians) and Meta2 (strategists)
– “Other”, HW accel., parallel processing
3. They can cooperate / synergize (neutral harness)

Page 148

Effic. Reasoning Hypotheses

• Hypothesis 3: They can cooperate / synergize.

– Explicitly characterize, for each “agent” (reasoner):

• A trigger -- in effect specifying its area of competence

• A procedure for estimating its cost, its chance to succeed, etc.

– Cyc’s immense KB and EL/HL architecture make it an efficient reasoning module “magnet” or “universal recipient”

Page 149

Effic. Reasoning Hypotheses

• Hypothesis 3: They can cooperate / synergize.

More than that, we can and will harness ~10 of them.
– Explicitly characterize, for each “agent” (reasoner):

• A trigger -- in effect specifying its area of competence

• A procedure for estimating its cost, its chance to succeed, etc.

– Cyc’s immense KB and EL/HL architecture make it an efficient reasoning module “magnet” or “universal recipient”

• Use Cyc [and ARDA-related assertions/queries in it] as a testbed for
– operationally “publishing” the results of each workshop
– experiments on comparative and collaborative power

SOW:
• Hold 3 workshops, on the 6 topics, in 2006
• Participation by all the leading experts
• Pre: readings. Post: actually harness them
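Hypothesis 3 describes each reasoner as an agent with a trigger (its area of competence) and a procedure estimating its cost and chance of success. A harness can then rank competent agents and try them in turn. This is a minimal sketch under stated assumptions: the `ReasonerAgent` record, the agent names, and the ranking rule (estimated success probability per second of cost) are all hypothetical, not the workshop’s standard or Cyc’s actual harness.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReasonerAgent:
    name: str
    trigger: Callable[[dict], bool]                   # area of competence
    estimate: Callable[[dict], tuple[float, float]]   # -> (cost in seconds, P(success))
    solve: Callable[[dict], Optional[str]]            # None means "gave up"


def dispatch(problem, agents):
    """Try competent agents in order of estimated success per unit cost."""
    bids = []
    for agent in agents:
        if agent.trigger(problem):
            cost, p_success = agent.estimate(problem)
            bids.append((p_success / max(cost, 1e-9), agent))
    for _, agent in sorted(bids, key=lambda b: b[0], reverse=True):
        answer = agent.solve(problem)
        if answer is not None:
            return agent.name, answer
    return None


# Hypothetical agents: a cheap special-purpose reasoner and an expensive general fallback.
KNOWN_ISA = {("Cow1", "Cow")}
AGENTS = [
    ReasonerAgent("taxonomy-reasoner",
                  trigger=lambda q: q["kind"] == "isa",
                  estimate=lambda q: (0.01, 0.95),
                  solve=lambda q: "True" if (q["x"], q["type"]) in KNOWN_ISA else None),
    ReasonerAgent("general-prover",
                  trigger=lambda q: True,
                  estimate=lambda q: (60.0, 0.5),
                  solve=lambda q: "Unknown"),
]

print(dispatch({"kind": "isa", "x": "Cow1", "type": "Cow"}, AGENTS))
print(dispatch({"kind": "isa", "x": "Bob", "type": "Cow"}, AGENTS))
```

The special-purpose reasoner wins whenever its trigger fires, and the general prover is only reached when the cheaper agent declines, which is the kind of cooperation the hypothesis anticipates.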

Page 150

Efficient Pathfinding in Very Large Data Spaces

GOALS
• Develop an ontology and a standard for specifying the applicability, % success, estimated resource cost, etc., of bringing various reasoning modules to bear on a problem
• Build an Integration Framework, a Harness, that enables several of the world’s leading reasoning systems to cooperatively solve problems [using the above ontology and standard to act as agents, broadcast subproblems, etc.]. Actually hook them up to this Harness and run them, on test problems from NIMD, AQUAINT, etc.
• Overcome the 5 problems that make IC reasoning hard:
(1) New assertions constantly (can’t just “compile” the KB)
(2) Each is true in some contexts (in 2003; believed by x)
(3) Many are complex (x believes that y believes that…)
(4) Huge vocabulary size and number of instances
(5) Justifications / sources matter (truth maint. must be “on”)

Workshop Highlights
4Q 05 Pre-start invitations and Steering Comm. planning
1Q 06 Project starts. 1st workshop: gaining efficiency by limiting representation language expressivity
2Q 06 Interstitial work on ontology and standard; building the initial Framework/harness; try out 2 “agents”; 2nd workshop: gaining efficiency by limiting the domain, the type of problem to be solved, etc.
3Q 06 3rd workshop: Integrating Bayesian probability and statistical reasoning with symbolic theorem-proving
4Q 06 4th workshop: meta-reasoning (tactics & strategy); 5th workshop: unsound reasoning (e.g., analogy)
1Q 07 6th workshop; Final Report; Hand-off to I.C./Ops “Champions” for tech transfer/operationalization

APPROACH
• Identify the most important ways in which automated reasoners gain efficiency: limit domain, limit expressiveness, integrate probabilistic and symbolic reasoning, meta-reasoning, and unsound reasoning (e.g., analogy)
• Hold a workshop on each topic (16 invitees; 15 said “Yes”)
• After/between the workshops, get these system builders to “publish” their reasoner to the growing Framework/harness so each can bid for, work on, and broadcast subproblems

Workshop PIs: Doug Lenat, Cycorp; Michael Genesereth, Stanford
Workshop “Steering Committee”: R.V. Guha, Google; Chris Welty & Andrew Tomkins, IBM; Andrei Voronkov, Manchester; + I.C./Ops. “Champions”

Page 151

2 July 2005

The pursuit of Artificial Intelligence -- from robotics to natural language processing to automated learning -- has been held back by the "brittleness bottleneck" caused by the need for common sense. For 21 years, we've been priming the pump, building up a formalized corpus of such knowledge, Cyc. Along the way, we've had to revise our preconceptions and theories, to expand our representation language and arsenal of inference methods, and to find approximate yet adequate engineering solutions to problems that philosophers have grappled with for millennia, such as ontologizing aspects of substances versus individual objects, time, space, causality, belief, social interactions, and so on. The process of ontological engineering also had to grow and evolve throughout this enterprise, for example in how Cyc represents and reasons with contradictions and context.

In this talk I will try to cover both the large-scale picture of what we've built and why, and the detailed picture of how it's built, along with the lessons learned along the way about how, and how not, to do large-scale OE. I will report on our recent efforts to make Cyc more accessible to the broader community through OpenCyc and ResearchCyc, which raise issues of how multiple individuals and groups can share and integrate their extensions (and settle their differences). Finally, I will discuss an exciting new effort we have just had funded, to gather automated reasoning researchers together for a series of workshops in 2006 on speeding up inference in large knowledge bases by orders of magnitude.

CYC: Lessons Learned in Large-Scale Ontological Engineering