

Exploiting a Thesaurus-Based Semantic Net for Knowledge-Based Search

Peter Clark, John Thompson, Lisbeth Duncan, Heather Holmback

Knowledge Systems
Boeing, Mathematics and Computing Technology

Overview
• Problem: searching for information
– in particular, for human experts
• Approach:
– Search using concepts, not words
– Use a thesaurus as the initial ontology
– Enhance it using simple AI techniques
• The Application:
– Two deployed “Expert Locator” applications

Overall Picture
[Figure: query words (e.g. “tube placement”) are passed to a search engine, which searches databases, human experts, web pages, document repositories, ...]

Problems with word searches...
• Words have many senses (polysemy)
– e.g. “plane” finds both airplanes and geometry
• Many words mean the same thing (synonymy)
– e.g. “tail fin” misses “vertical stabilizer”
• Lack of world knowledge
– e.g. “jet engine” misses “propulsion systems”

Goal: organize search around concepts, not words
Need a conceptual vocabulary (“ontology”)

The Ontology Bottleneck
• Massive up-front cost to build an ontology
• Use a technical thesaurus, enhanced with AI techniques
• Boeing’s Thesaurus:
– Highly customized to aerospace and Boeing
– Massive knowledge repository
  • 37,000 concepts, 18,000 synonyms
  • 100,000 relationships (3 types)
– Many person-years investment of effort

The Approach

A (tiny) fragment of the ontology...
[Figure: semantic-net fragment with the nodes jet engines, ramjet engines, turbojet engines, engines, propulsion systems, engine starters, starting, ignition, combustion, afterburning, flameout, burning rate, flame stability, combustion stability, flame propagation, hydrogen fuels, pneumatic equipment, thrust, lift, spray, jet spray]

Converting Words to Concepts
• Search word: “jet”
[Figure: the same ontology fragment, with four candidate concepts for the search word marked “?”]
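The word-to-concept step can be illustrated with a minimal sketch. It assumes that a word is matched against the words of each concept name; the slides do not say how the deployed system does the matching, so `candidate_concepts` and its `allow_substring` flag are hypothetical. The concept names are taken from the ontology fragment.

```python
# Concept names copied from the ontology fragment shown on the slides.
CONCEPTS = [
    "Jet engines", "Ramjet engines", "Turbojet engines", "Jet spray",
    "Engines", "Propulsion systems", "Ignition", "Combustion", "Spray",
]

def candidate_concepts(word, concepts, allow_substring=False):
    """Return the concepts whose name contains `word`.

    Assumption (not from the slides): a concept is a candidate when the
    search word equals one of the words in its name or, if
    `allow_substring` is set, is contained in one of those words.
    """
    w = word.lower()
    hits = []
    for concept in concepts:
        tokens = concept.lower().split()
        exact = w in tokens
        partial = allow_substring and any(w in t for t in tokens)
        if exact or partial:
            hits.append(concept)
    return hits

print(candidate_concepts("jet", CONCEPTS))
print(candidate_concepts("jet", CONCEPTS, allow_substring=True))
```

Whole-word matching yields two candidates for “jet”, and substring matching yields four. Either way the word alone is ambiguous, which is why a further step is needed to pick among the candidates.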

Matching Query and Target Concepts
• Semantic distance between “ignition” and “jet engines”?
[Figure: the same ontology fragment]
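One simple reading of “semantic distance” is the number of links on the shortest path between two concepts. The sketch below computes that with a breadth-first search. The node names come from the slide, but the edges in `EDGES` are hypothetical (the transcript does not preserve which nodes are linked), and the deployed system may well use a different measure.

```python
from collections import deque

# Hypothetical links between concepts from the slide; the real
# thesaurus links are not recoverable from the transcript.
EDGES = [
    ("jet engines", "engines"),
    ("engines", "propulsion systems"),
    ("jet engines", "turbojet engines"),
    ("jet engines", "ramjet engines"),
    ("jet engines", "afterburning"),
    ("afterburning", "combustion"),
    ("combustion", "ignition"),
    ("engines", "engine starters"),
    ("engine starters", "starting"),
    ("starting", "ignition"),
    ("spray", "jet spray"),
]

def build_graph(edges):
    """Build an undirected adjacency map from a list of concept pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def link_distance(graph, source, target):
    """Number of links on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

GRAPH = build_graph(EDGES)
print(link_distance(GRAPH, "ignition", "jet engines"))
```

With these assumed edges the shortest path runs ignition, combustion, afterburning, jet engines. Concepts with no connecting path (here “jet spray” and “jet engines”) get no distance at all, which is the problem that poorly connected concepts pose for this kind of search.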

Expert Locator Demo

(see end of this presentation for the demo in powerpoint form)

Enhancing the Thesaurus: 1. Increase connectivity using subsumption
• 100,000 links are not enough!
– 40% of concepts are “orphans”
• But: Many concept names are phrases
– Can add links by analyzing these phrases
[Figure: “Space Shuttle Main Engine” linked to “Engine” by a generalization link and to “Space Shuttle” by a related-to link]

Subsumption Computation Algorithm
1. Compute all possible generalizations by “word chopping” and “word generalization”...
[Figure: lattice of candidate generalizations of “Space Shuttle Main Engine”: Space Shuttle Engine, Space Engine, Space Vehicle Main Engine, Vehicle Main Engine, Vehicle Engine, Engine, Space Shuttle Main, Vehicle Main, Space Shuttle, Space Vehicle, Space, Shuttle, Vehicle]
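Step 1 can be sketched as follows, under two assumptions the slides do not spell out: “word chopping” is read as dropping one or more words while keeping the order of the rest, and “word generalization” as replacing a word with a more general one. The table `WORD_PARENTS` is hypothetical and holds only the single generalization visible in the figure (Shuttle to Vehicle).

```python
from itertools import combinations

# Hypothetical word-level generalizations; only this one is suggested
# by the candidate phrases in the figure.
WORD_PARENTS = {"shuttle": "vehicle"}

def chop(words):
    """All shorter, non-empty phrases obtained by dropping words in place."""
    n = len(words)
    out = set()
    for k in range(1, n):
        for idx in combinations(range(n), k):
            out.add(tuple(words[i] for i in idx))
    return out

def generalize_words(words, parents):
    """All phrases obtained by replacing one word with its parent word."""
    out = set()
    for i, w in enumerate(words):
        if w in parents:
            out.add(words[:i] + (parents[w],) + words[i + 1:])
    return out

def candidate_generalizations(phrase, parents=WORD_PARENTS):
    """Every phrase reachable by repeated chopping and word generalization."""
    start = tuple(phrase.lower().split())
    seen = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for nxt in chop(current) | generalize_words(current, parents):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {" ".join(p) for p in seen}

cands = candidate_generalizations("Space Shuttle Main Engine")
print(sorted(cands))
```

This reading produces every phrase named in the figure, plus a few extra (for example “main engine”) that the figure does not show, so the authors' chopping rule is probably more selective than the one assumed here.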

Subsumption Computation Algorithm
2. Identify existing Thesaurus concepts and links within these
[Figure: the lattice of candidate generalizations of “Space Shuttle Main Engine” from step 1]

Subsumption Computation Algorithm
3. Add missing connections to nearest existing concepts
[Figure: the same lattice, including the candidates “Vehicle Main Engine” and “Vehicle Main”]
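Steps 2 and 3 can be sketched together, with several simplifications that are assumptions rather than the authors' method: only word chopping is considered, an existing concept counts as “nearest” when no other existing generalization lies between it and the target, and a link is labeled a generalization when the two phrases share the same final (head) word and related-to otherwise. That last rule is inferred from the “Space Shuttle Main Engine” example. The thesaurus contents are hypothetical.

```python
def is_subsequence(short, long):
    """True if the words of `short` appear in `long` in the same order."""
    it = iter(long)
    return all(word in it for word in short)

def is_chopped_from(general, specific):
    """True if `general` results from dropping words from `specific`."""
    return len(general) < len(specific) and is_subsequence(general, specific)

def nearest_existing_links(concept, thesaurus):
    """Link `concept` to its nearest existing generalizations.

    Returns sorted (concept, link_type, existing_concept) triples.
    """
    target = tuple(concept.lower().split())
    existing = [tuple(c.lower().split()) for c in thesaurus]
    # Step 2: candidate generalizations that already exist as concepts.
    ancestors = [c for c in existing if is_chopped_from(c, target)]
    # Step 3: keep only ancestors with no other ancestor in between.
    nearest = [a for a in ancestors
               if not any(is_chopped_from(a, b) for b in ancestors)]
    links = []
    for a in nearest:
        # Assumed rule: same head word means generalization.
        kind = "generalization" if a[-1] == target[-1] else "related-to"
        links.append((" ".join(target), kind, " ".join(a)))
    return sorted(links)

# Hypothetical thesaurus contents, for illustration only.
THESAURUS = ["Engine", "Space Shuttle", "Space", "Shuttle", "Vehicle"]
print(nearest_existing_links("Space Shuttle Main Engine", THESAURUS))
```

With this toy thesaurus the sketch adds the two links shown on the slide: a generalization link to “engine” and a related-to link to “space shuttle”. “Space” and “shuttle” are skipped because “space shuttle” sits between them and the target.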

Some Example Inferred Links
[Figure: two example concept clusters. First: Equipment, Measuring Instruments, Optical Measuring Instruments, Distance Measuring Equipment, Range Finders, Optical Range Finders. Second: Halogen Compounds, Fluorine Compounds, Nitrogen Fluorine Compounds, Fluorides, Nitrogen Fluorides.]
• 21,000 generalization/specialization and 37,000 related-to links added
• Number of “orphans” down from 40% to 13%

Enhancing the Thesaurus: 2. Use NLP to refine the “related-to” links
Current: Metal Tube related-to Metal
New: Metal Tube made-of Metal
• 27 relationship types chosen (causes, location, …)
• Heuristic noun-noun rules select the relationship, e.g.
For compound “X Y” (e.g. “metal tube”):
IF X is a Material
AND Y is a Physical-Object
THEN Y made-of X
• Can use relation type to help compute semantic distance
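The noun-noun rule above can be written as a small function. This is a sketch only: the type table `WORD_TYPES` is hypothetical (the slides do not say where the Material and Physical-Object classifications come from), and only the single made-of rule from the slide is implemented, with related-to kept as the fallback.

```python
# Hypothetical semantic types for individual words.
WORD_TYPES = {
    "metal": "Material",
    "tube": "Physical-Object",
}

def refine_link(compound, word_types):
    """Refine the related-to link between a compound "X Y" and X.

    Returns (compound, relation, X). Applies the slide's rule:
    IF X is a Material AND Y is a Physical-Object THEN Y made-of X.
    Anything else keeps the generic related-to link.
    """
    words = compound.split()
    if len(words) != 2:
        # The rule is only defined for two-word compounds.
        return (compound, "related-to", words[0] if words else "")
    x, y = words
    x_type = word_types.get(x.lower())
    y_type = word_types.get(y.lower())
    if x_type == "Material" and y_type == "Physical-Object":
        return (compound, "made-of", x)
    return (compound, "related-to", x)

print(refine_link("Metal Tube", WORD_TYPES))
print(refine_link("Jet Engine", WORD_TYPES))
```

Each additional relationship type would need its own rule of the same shape; compounds that match no rule keep their original related-to link.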

Enhancing the Thesaurus: 3. Knowledge from Text
Definition: “Flap: A movable airfoil attached to an airplane’s wing, and used to increase lift or drag.”
NLP output:
Flap
  isa: Airfoil
  attribute: Movable
  attached-to: Wing
  part-of: Airplane
  purpose: Increase
    object: Lift, Drag
[Figure: the resulting links drawn as a graph around “Flap”, with nodes Airfoil, Movable, Wing, Airplane, Increase, Lift and Drag, and edges labeled isa, attribute, attached-to, part-of, purpose and object, alongside thesaurus links labeled rt and bt]
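To show how the parsed definition of “Flap” could feed back into the semantic net, the sketch below stores the frame as a nested dictionary and flattens it into (subject, relation, object) links. The data layout and the `frame_to_triples` helper are hypothetical; the slides show the frame and the graph but not the conversion.

```python
# The "Flap" frame from the slide, in a hypothetical dictionary layout.
# A nested filler is a dict with a "head" plus its own relations.
FLAP_FRAME = {
    "isa": ["Airfoil"],
    "attribute": ["Movable"],
    "attached-to": ["Wing"],
    "part-of": ["Airplane"],
    "purpose": [{"head": "Increase", "object": ["Lift", "Drag"]}],
}

def frame_to_triples(concept, frame):
    """Flatten a frame into (subject, relation, object) triples."""
    triples = []
    for relation, fillers in frame.items():
        for filler in fillers:
            if isinstance(filler, dict):
                head = filler["head"]
                triples.append((concept, relation, head))
                # Relations nested under the filler attach to its head.
                for sub_relation, sub_fillers in filler.items():
                    if sub_relation == "head":
                        continue
                    for sub_filler in sub_fillers:
                        triples.append((head, sub_relation, sub_filler))
            else:
                triples.append((concept, relation, filler))
    return triples

for triple in frame_to_triples("Flap", FLAP_FRAME):
    print(triple)
```

This simplification uses a single “Increase” node shared by Lift and Drag, whereas the figure appears to draw one per object.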

Status and Evaluation
• The Applications
– Two “Expert Locators” deployed and in use
– Sustained usage (~20 searches / day)
– Plans to quickly expand them further
  • more experts
  • also cover projects and work groups
  • add in attribute filters (years at Boeing, location, …)
• How do the Thesaurus Enhancements Affect Search?
– Study: Expert assessed relevance of “hit” concepts
– Recall increased (44% → 75%) with only minimal effect on precision (58% → 57%)
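For reference, the two measures reported above can be computed from a set of returned concepts and a set of concepts judged relevant, using the standard definitions. The sets below are toy data for illustration only and are unrelated to the study's actual results.

```python
def precision_recall(returned, relevant):
    """Standard precision and recall for one query.

    precision = |returned AND relevant| / |returned|
    recall    = |returned AND relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example (not data from the study).
returned = {"jet engines", "turbojet engines", "jet spray", "spray"}
relevant = {"jet engines", "turbojet engines", "ramjet engines"}
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Adding links to the thesaurus lets a search reach more concepts, which tends to raise recall. The risk is that the extra concepts are irrelevant and pull precision down, which is the effect the study checked for.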

Discussion
• Does the “number N of links” between concepts predict “relevance”?
– only for very small N!
• The useful bias of a domain-specific Thesaurus:
– only contains relevant concepts
  • massively reduces errors in Thesaurus enhancement
– only contains relevant links
  • provides very domain-specific search
• Limitations:
– ignored “quality” of expert, social issues, etc.
– what if the concept you want isn’t there?
• Generality: Applies to any resource, not just experts

Summary
• Search using concepts, not words
• Use of a thesaurus as an initial ontology:
– Can leverage many years of work by librarians
– Made viable using simple AI techniques of
  • search
  • subsumption computation
  • language processing
• Domain-specific thesauri provide valuable bias

End - demo in PPT follows