semantics and search

Jon Atle Gulla Semantic Days 2007 From Google Search to Semantic Exploration Jon Atle Gulla Professor Norwegian University of Science and Technology [email protected]

Upload: vestforskno

Post on 22-Jan-2015

DESCRIPTION

How the use of semantics can improve search

TRANSCRIPT

Page 1: Semantics And Search

Jon Atle Gulla Semantic Days 2007

From Google Search to Semantic Exploration

Jon Atle Gulla
Professor
Norwegian University of Science and Technology
[email protected]

Page 2: Semantics And Search

Agenda

Traditional search applications
Adding shallow linguistics to traditional search
The concept of semantic search
Ontologies in search applications
Ontologies for semantic annotation & exploration
Ontology-driven query interpretation

"Hakia thinks that indexing has plateaued and that semantic technologies will take over for the next generation of search".

MacManus, R. “Hakia Takes On Google With Semantic Technologies”. http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php

Page 3: Semantics And Search

The Language Problem in Search

People use the language differently

Authors

Information user

What is the content of the document?

How to know if this document answers the query?

Page 4: Semantics And Search

The Google Search Experience

[Diagram: a query is matched against the index using similarity, page rank and linguistics, producing results]

Page 5: Semantics And Search

Traditional Search Principles

Bag-of-words principle
    Machine understands document as a set of word frequencies

Word matching principle
    Syntactic search: Relevant documents are documents that contain exactly those words that appear in the query
    Morpho-syntactic search: Relevant documents are documents that contain inflectional variants of exactly those words that appear in the query

One shot principle
    Query and result set ignored when new query is posted

You see:

A drilling rig or oil rig is a structure housing equipment used to drill for and extract oil or natural gas from underground reservoirs. Drilling rigs can also be used to drill for water or for exploration purposes.

Machine sees:

drill(4) equip(1) explor(1) extract(1) hous(1) natur(1) oil(2) purpos(1) reservoir(1) rig(3) structur(1) underground(1) water(1)
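The "machine sees" view on this slide can be approximated in a few lines of Python. This is a minimal sketch, not the indexing pipeline used in the talk: the stop-word list and the `crude_stem` suffix stripper are assumptions made for illustration (a real system would normally use a proper stemmer), so some stems differ from the slide; for example, `equipment` is not reduced to `equip` here.

```python
import re
from collections import Counter

# Assumed stop-word list, chosen only to cover the example sentence.
STOPWORDS = {"a", "or", "is", "used", "to", "for", "and", "from", "can", "also", "be"}

def crude_stem(word):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ation", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bag_of_words(text):
    """Return stem -> frequency, ignoring stop words and word order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)

TEXT = ("A drilling rig or oil rig is a structure housing equipment used to "
        "drill for and extract oil or natural gas from underground reservoirs. "
        "Drilling rigs can also be used to drill for water or for exploration purposes.")

counts = bag_of_words(TEXT)
print(counts["drill"], counts["rig"], counts["oil"])  # prints: 4 3 2
```

The counts for `drill`, `rig` and `oil` agree with the slide; everything the sentence says about how these things relate to each other is lost.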

Page 6: Semantics And Search

Christmas tree

A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. It is normally an evergreen coniferous tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.

A Christmas tree is a set of valves, pipes, and fittings used to control the flow of oil and gas as it leaves a well and enters a pipeline.

User need:

Index

Traditional Search Principles

Relevance given by document similarity

Page 7: Semantics And Search

Christmas trees

Search query

Result set

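Implementation:

Document relevant to query if cosine similarity above a certain threshold:

$$
\mathrm{sim}(q, d) = \frac{\sum_{i=1}^{n} q_i \, d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \cdot \sqrt{\sum_{i=1}^{n} d_i^2}} = \frac{\vec{q}}{|\vec{q}|} \cdot \frac{\vec{d}}{|\vec{d}|}
$$

d: vector representation of document
q: vector representation of query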
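A minimal sketch of the cosine similarity test on this slide, using plain Python lists as term-frequency vectors. The threshold value and the example vectors are assumptions for illustration; the slides do not state which threshold was used.

```python
import math

def cosine_similarity(q, d):
    """Cosine of the angle between query vector q and document vector d."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_q * norm_d)

def is_relevant(q, d, threshold=0.5):
    """threshold is a hypothetical cut-off, not a value from the talk."""
    return cosine_similarity(q, d) > threshold

# Vectors over an assumed three-term vocabulary: ["christmas", "tree", "valve"].
query = [1, 1, 0]
doc_a = [2, 3, 0]   # mentions both query terms
doc_b = [0, 0, 4]   # mentions neither
print(is_relevant(query, doc_a), is_relevant(query, doc_b))
```

Because the score depends only on shared terms, a document about decorated conifers and one about wellhead valves score the same for the query "Christmas trees" if they use those words equally often.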

Page 8: Semantics And Search

Adding Shallow Linguistics to Search

Clustering or log analysis for grouping search results for ‘oil’
Text categorization
Entity search
Teaser generation
Spell checking
Collocations

Page 9: Semantics And Search

But

“A drilling rig or oil rig is a structure housing equipment used to drill for and extract oil or natural gas from underground reservoirs. Drilling rigs can also be used to drill for water or for exploration purposes.” (Ref: Wikipedia)

drill(4) equip(1) explor(1) extract(1) hous(1) natur(1) oil(2) purpos(1) reservoir(1) rig(3) structur(1) underground(1) water(1)

Text is still just a set of strings

Semantic Search Principle:

Use ontologies to represent domain vocabulary, documents’ content and/or user’s information needs

[Ontology diagram. Concepts: rig, drilling rig, oil rig, drill, oil, water, natural gas. Relations: subclassOf, sameAs, partOf, usedFor]
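One way to hold the ontology fragment on this slide is as subject-relation-object triples. The sketch below is illustrative only: the concept and relation names come from the slide, but which relation links which pair of concepts is an assumption, since the diagram's arrows are not preserved in the transcript.

```python
from collections import defaultdict

# Hypothetical edges: concept and relation names are from the slide,
# the pairing of concepts with relations is assumed for illustration.
TRIPLES = [
    ("drilling rig", "subclassOf", "rig"),
    ("oil rig", "sameAs", "drilling rig"),
    ("drill", "partOf", "drilling rig"),
    ("drilling rig", "usedFor", "oil"),
    ("drilling rig", "usedFor", "natural gas"),
    ("drilling rig", "usedFor", "water"),
]

def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index

def objects(index, subject, relation):
    """Return the sorted objects linked to subject by relation."""
    return sorted(index.get(subject, {}).get(relation, set()))

index = build_index(TRIPLES)
print(objects(index, "drilling rig", "usedFor"))
```

Unlike the bag of stems, this structure records that a drilling rig is a kind of rig and what it is used for, which is what a semantic search application can exploit.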

Page 10: Semantics And Search

Semantic Approaches to Search

Search principles

                    Syntactic search    Semantic search
Document view       Bag-of-words        Terms and concepts
Search approach     Word matching       Concept matching
Search process      One shot            Exploratory session

Applications of ontologies in semantic search:
    Help user formulate semantic queries
    Reformulate/reinterpret queries
    Browse domain
    Formulate related queries
    Interoperability between search applications
    Semantic indexing of documents

IIP project
Scientific reports

Page 11: Semantics And Search

1. Ontologies in Semantic Exploration

Use graphical ontologies for query formulation
    Semantic annotations of documents
    Construct queries graphically
    Use ontological structures to expand query
    Use ontology to visualize search results

Page 12: Semantics And Search

Query Formulation

Queries expanded from ontological structures
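Query expansion from ontological structures can be sketched as a walk over ontology relations. This is a hedged illustration, not the system shown in the talk: the `SAME_AS` and `SUBCLASSES` tables are hypothetical, and the choice to follow only equivalence and subclass links is an assumption.

```python
# Hypothetical ontology fragment; the edges are assumptions for illustration.
SAME_AS = {"drilling rig": {"oil rig"}, "oil rig": {"drilling rig"}}
SUBCLASSES = {"rig": {"drilling rig", "oil rig"}}

def expand_query(term):
    """Return the term plus everything reachable via sameAs or subclass links."""
    expanded = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        neighbours = SAME_AS.get(current, set()) | SUBCLASSES.get(current, set())
        for neighbour in neighbours:
            if neighbour not in expanded:
                expanded.add(neighbour)
                frontier.append(neighbour)
    return sorted(expanded)

print(expand_query("rig"))      # ['drilling rig', 'oil rig', 'rig']
print(expand_query("oil rig"))  # ['drilling rig', 'oil rig']
```

Expanding downwards to subclasses broadens recall; whether to also expand upwards to superclasses is a design choice that trades precision for coverage.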

Page 13: Semantics And Search

Query Refinement

Use ontological structures to explore the domain

Page 14: Semantics And Search

2. Ontology-Driven Query Interpretation

[Diagram. Labels: User query; Semantic query interpretation (User interpretation, Query mapping); Semantic layer; User terminology; Domain collection; Ontology trained on person and domain collection; Standard search engine; Domain document collection]

Page 15: Semantics And Search

Training Ontology for Search

christmas tree

Concept: CHRISTMAS TREE

Prominent document terms    Weight
christmas tree              0.95
christmas trees             0.80
x-tree                      0.35
valves                      0.05
wellhead                    0.02

Documents viewed by user (and considered relevant)

Characteristic terms in these documents express user’s interpretation of CHRISTMAS TREE for this document collection

Page 16: Semantics And Search

The Personalized Ontology

Each concept described in terms of weighted words
Words correspond to user’s assessment of which information is relevant to a concept for this document base
Concept – term associations created automatically based on user’s behavior

Concept: CHRISTMAS TREE

Index terms         Weight
christmas tree      0.95
christmas trees     0.80
x-tree              0.35
valves              0.05
wellhead            0.02

WELL

well

wells

...

pipe

pipes

0.98

0.95

0.35

0.95

0.50

0.10

PIPE

Football ontology

Index terms
Ontology

Concept-term matrix a dynamic structure that reflects user’s preferences and behavior
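The concept-term matrix can be pictured as a nested mapping from concept to weighted terms that is nudged each time the user views a document. The sketch below is an assumption-heavy illustration: the slides do not say how weights are updated, so the moving-average rule in `reinforce`, its `rate` parameter and the substring matching are all hypothetical.

```python
# Concept -> term -> weight. The CHRISTMAS TREE weights are from the slide.
matrix = {
    "CHRISTMAS TREE": {
        "christmas tree": 0.95,
        "christmas trees": 0.80,
        "x-tree": 0.35,
        "valves": 0.05,
        "wellhead": 0.02,
    }
}

def reinforce(matrix, concept, viewed_text, rate=0.1):
    """Assumed update rule: move a term's weight towards 1 if it occurs in a
    document the user viewed for this concept, and towards 0 otherwise."""
    text = viewed_text.lower()
    row = matrix.setdefault(concept, {})
    for term, weight in row.items():
        target = 1.0 if term in text else 0.0  # naive substring match
        row[term] = weight + rate * (target - weight)
    return matrix

reinforce(matrix, "CHRISTMAS TREE",
          "Wellhead valves are used to isolate the flow of oil or gas.")
print(round(matrix["CHRISTMAS TREE"]["wellhead"], 3))  # 0.118
```

With this kind of rule, a user who keeps opening wellhead documents gradually shifts what CHRISTMAS TREE means for their searches, which is the sense in which the matrix is dynamic.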

Page 17: Semantics And Search

Semantic Search Query

User query: CHRISTMAS TREE

Concept: CHRISTMAS TREE

Prominent document terms    Weight
christmas tree              0.95
christmas trees             0.80
x-tree                      0.35
valves                      0.05
wellhead                    0.04

christmas tree:0.95, christmas trees:0.8, x-tree:0.35, valves:0.05, wellhead:0.04

Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...

The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...

A wellhead consists of the spools, valves, and other components which contain the pressure within the well.

A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. ...

Private Christmas trees are not usually put up until at least the middle of December and are usually taken down by the 6th of January, ...

It is normally an evergreen tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.

Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.

Good understanding of topside equipment used, including x-trees and wellhead systems

VENTILTRE er en ventilenhet montert på toppen av stigerør eller brønnhode, ofte kalt juletre [Norwegian: "A VENTILTRE (valve tree) is a valve unit mounted on top of a riser or wellhead, often called a Christmas tree"]

Matches in document base

Retrieved from ontology:

An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well

Query mapping
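The query mapping step on this slide, where the concept CHRISTMAS TREE becomes a weighted term query, can be sketched as follows. The weights are the ones shown on the slide; the `score` function (sum of weights of terms found in the text, by naive substring match) is an assumption for illustration and does not reproduce the similarity judgments reported in the talk.

```python
# Weights taken from the slide's concept-term table.
CONCEPT_TERMS = {
    "CHRISTMAS TREE": {
        "christmas tree": 0.95,
        "christmas trees": 0.8,
        "x-tree": 0.35,
        "valves": 0.05,
        "wellhead": 0.04,
    }
}

def map_query(concept):
    """Rewrite a concept as a weighted keyword query string."""
    terms = CONCEPT_TERMS[concept]
    return ", ".join(f"{term}:{weight}" for term, weight in terms.items())

def score(concept, document):
    """Assumed scoring: add up the weights of concept terms found in the text."""
    text = document.lower()
    return sum(w for term, w in CONCEPT_TERMS[concept].items() if term in text)

print(map_query("CHRISTMAS TREE"))
print(score("CHRISTMAS TREE",
            "Good understanding of topside equipment used, including x-trees and wellhead systems"))
```

Note that a document never mentioning "christmas tree" can still get a non-zero score through `x-tree` or `wellhead`, which is the point of mapping the query through the concept rather than matching the user's words directly.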

Page 18: Semantics And Search

Semantic Search Results

Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...

The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...

A wellhead consists of the spools, valves, and other components which contain the pressure within the well.

A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. ...

Private Christmas trees are not usually put up until at least the middle of December and are usually taken down by the 6th of January, ...

It is normally an evergreen tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.

Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.

Good understanding of topside equipment used, including x-trees and wellhead systems

VENTILTRE er en ventilenhet montert på toppen av stigerør eller brønnhode, ofte kalt juletre [Norwegian: "A VENTILTRE (valve tree) is a valve unit mounted on top of a riser or wellhead, often called a Christmas tree"]

Matches in document base

Retrieved from ontology:

An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well

plural form

singular form

plural form, but other words different

singular form, but other words different

different words, christmas related

synonyms

related words

related words, ontology not trained in this language

related words

strong

weak

strong

weak

no

strong

acceptable

no

acceptable

Query/document similarity

Precision: 4/4
Recall: 5/6
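For reference, precision and recall are computed from the set of retrieved documents and the set of relevant documents. The sketch below uses hypothetical document identifiers and does not attempt to reproduce the figures on the slide.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical identifiers: three documents returned, all relevant,
# but one relevant document (d4) was missed.
print(precision_recall({"d1", "d2", "d3"}, {"d1", "d2", "d3", "d4"}))  # (1.0, 0.75)
```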

Page 19: Semantics And Search

Keyword Search Query

User query: x-tree

Concept: CHRISTMAS TREE

Prominent document terms    Weight
christmas tree              0.95
christmas trees             0.80
x-tree                      0.35
valves                      0.05
wellhead                    0.04

christmas tree:0.95, christmas trees:0.8, x-tree:0.35, valves:0.05, wellhead:0.04

CHRISTMAS TREE:0.35

Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...

The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...

A wellhead consists of the spools, valves, and other components which contain the pressure within the well.

Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.

Good understanding of topside equipment used, including x-trees and wellhead systems

Matches in document base

Retrieved from ontology:

An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well

Query mapping

User interpretation
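The user interpretation step for a keyword query, where the term `x-tree` is recognised as an expression of CHRISTMAS TREE with weight 0.35, amounts to a reverse lookup in the concept-term matrix. A minimal sketch, assuming the matrix is a nested dictionary; the function name `concepts_for_keyword` is hypothetical.

```python
# Weights taken from the slide's concept-term table.
MATRIX = {
    "CHRISTMAS TREE": {
        "christmas tree": 0.95,
        "christmas trees": 0.8,
        "x-tree": 0.35,
        "valves": 0.05,
        "wellhead": 0.04,
    }
}

def concepts_for_keyword(keyword, matrix=MATRIX):
    """Return (concept, weight) pairs whose term list contains the keyword,
    strongest association first."""
    keyword = keyword.lower()
    hits = [(concept, terms[keyword])
            for concept, terms in matrix.items() if keyword in terms]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

print(concepts_for_keyword("x-tree"))  # [('CHRISTMAS TREE', 0.35)]
```

Once the concept is identified, its full weighted term list can be sent to the standard search engine, so the user's single keyword retrieves documents that use related vocabulary.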

Page 20: Semantics And Search

Semantic Search - Learning

[Diagram. Labels: User query (CHRISTMAS TREE); Result page; Documents viewed by user (and considered relevant); Personalized concept-term matrix]

No fixed set of relevant documents – depends on user preferences

Page 21: Semantics And Search

2. IIP Ontology on Web Documents

Ontology adapted using web documents from the oil business

[Diagram. Labels: Horizontal tree; Semantic query interpretation (User interpretation, Query mapping); Semantic layer; User terminology; Domain collection; Reformulated query; IIP ontology trained on web oil documents; Web collection from different domains]

WELLHEAD HOUSING 0.109
CONDUCTOR HOUSING 0.109
WEAR BUSHING 0.101
RING JOINT GASKET 0.096
SUBSEA PRODUCTION MANIFOLD 0.096
TESTING TOOL 0.088
BORE PROTECTOR 0.088
HORIZONTAL CHRISTMAS TREE 0.085
RUNNING TOOL 0.076
TREE 0.068
TUBING SPOOL 0.060
SURFACE CHRISTMAS TREE 0.048
CONTROL MODULE 0.046
DELIVERY PRICE 0.045
VALVE NORMALLY OPEN 0.045
SUBSEA CHRISTMAS TREE 0.043
CHRISTMAS TREE 0.042
...

HORIZONTAL VESSEL 0.162
HORIZONTAL BOREHOLE 0.138
HORIZONTAL CHRISTMAS TREE 0.088
HORIZONTAL TUBING HANGER 0.072
PLANE 0.057
INTERSECTION 0.055
PIPING END 0.051
BENDING STRESS 0.043
SHIFTING TOOL 0.040
AXIS 0.037
FIXED STRUCTURE 0.037
FLUID SEPARATOR 0.036
ELECTRICAL PENETRATOR 0.034
TEST SEPARATOR 0.034
VOLUME FLOW RATE 0.033
HYDROGEN FLUORIDE 0.029
BASE STEEL 0.028
...

horizontal tree

HORIZONTAL CHRISTMAS TREE Score: 0.01488

CONDUCTOR HOUSING, HORIZONTAL VESSEL Score: 0.00586
WELLHEAD HOUSING, HORIZONTAL VESSEL Score: 0.00586
WEAR BUSHING, HORIZONTAL VESSEL Score: 0.00411
CHRISTMAS TREE, HORIZONTAL CHRISTMAS TREE Score: 0.00369
HORIZONTAL CHRISTMAS TREE, HORIZONTAL VESSEL Score: 0.00344
TUBING SPOOL, HORIZONTAL VESSEL Score: 0.00323
CONDUCTOR HOUSING, HORIZONTAL CHRISTMAS TREE Score: 0.00317
WELLHEAD HOUSING, HORIZONTAL CHRISTMAS TREE Score: 0.00317
TUBING HANGER, HORIZONTAL TUBING HANGER Score: 0.00295
BORE PROTECTOR, HORIZONTAL VESSEL Score: 0.00284
TESTING TOOL, HORIZONTAL BOREHOLE Score: 0.00243
WELLHEAD HOUSING, HORIZONTAL CHRISTMAS TREE Score: 0.00238
TREE, HORIZONTAL BOREHOLE Score: 0.00235
CHRISTMAS TREE, HORIZONTAL VESSEL Score: 0.00227
WEAR BUSHING, HORIZONTAL CHRISTMAS TREE Score: 0.00222
TREE, HORIZONTAL VESSEL Score: 0.00220

Interpretation of ‘horizontal tree’
horizontal christmas tree 1.0
horizontal christmas trees 1.0
horizontal x-tree 1.0
horizontal x-trees 1.0

sentre 0.465
deepwater 0.216
atlantic 0.092
horizontal 0.088
develop 0.085
investor 0.085
water 0.085
gulf 0.083
transocean 0.078
field 0.072
bluewater 0.070
deep 0.066

Mapping to query based on document content

Experiment with real document collection

Page 22: Semantics And Search

Conclusions

Traditional search based on keyword matching and shallow linguistics
Ontologies provide vocabulary for semantic search
Graphical ontology for query formulation
    Semantic exploration of domain
    Visual queries
Trained ontology for query interpretation
    Ontology maps between concepts and domain terms
    Semantic interpretation hidden to users
Challenges
    Linking concepts to terms
    Scalability

Page 23: Semantics And Search

Thank you!