Databases & Information Retrieval Maya Ramanath (Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009 DB & IR: Both Sides Now. G. Weikum, Keynote at SIGMOD 2007)


Databases & Information Retrieval

Maya Ramanath

(Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009

DB & IR: Both Sides Now. G. Weikum, Keynote at SIGMOD 2007)


DB and IR: Different Motivations

• Both deal with large amounts of information, but…

| | DB | IR |
|---|---|---|
| Applications | online reservation, banking | libraries |
| Emphasis | data consistency, efficiency | result quality, user satisfaction |
| Data | structured records | unstructured text |
| Queries | precise | interpretations vary |
| Results | exact match/all results | ranked/top-k results |


Why Combine Now?

• The applications drive the need
  – The need to manage both structured and unstructured data in an integrated manner

• Healthcare example
  – Find young patients in central Europe who have been reported, in the last two weeks, to have symptoms of tropical virus diseases and an indication of anomalies.

• Newspaper archives, product catalogues, etc.


Integrating DB & IR

[Figure: grid of query type against data type. Query types: structured queries / boolean match results (SQL); unstructured queries / ranked results (keywords/top-k). Data types: structured data (relational); unstructured data (text). Labels in the grid: DB Systems; IR Systems; top-k processing, keyword search on graphs; extracting entities and relationships, ranking for entities; query processing for text search, effective query interfaces, ranking for structured data.]


Modules

1. Top-k Processing
2. Query Processing and Interfaces
3. Keyword Search on Graphs
4. Entity and Relationship Extraction
5. Ranking and Structured Data


1. Top-k Processing (1/2)

• Structured data, with scores in multiple dimensions

• Return the top-k “objects”

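| Car | Color |
|---|---|
| BMW X1 | 0.9 |
| Honda City | 0.8 |
| Maruti Swift | 0.6 |
| Tata Nano | 0.1 |

| Car | Mileage |
|---|---|
| Honda City | 0.8 |
| Maruti Swift | 0.6 |
| Tata Nano | 0.3 |
| BMW X1 | 0.1 |

| Car | Service |
|---|---|
| Tata Nano | 0.7 |
| Maruti Swift | 0.6 |
| Honda City | 0.3 |
| BMW X1 | 0.1 |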
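The slide does not name an algorithm or a way to combine the three scores. As a minimal sketch, assume the overall score of a car is the sum of its Color, Mileage and Service scores, and use a threshold-style scan in the spirit of Fagin's threshold algorithm: read the sorted lists in parallel, fully score each newly seen car, and stop once the k-th best total is at least the sum of the scores at the current scan depth. The function name `threshold_top_k` and the sum aggregation are assumptions made for illustration, and the sketch assumes every car appears in every list.

```python
import heapq

# Score lists from the slide, each sorted by descending score.
LISTS = {
    "color": [("BMW X1", 0.9), ("Honda City", 0.8), ("Maruti Swift", 0.6), ("Tata Nano", 0.1)],
    "mileage": [("Honda City", 0.8), ("Maruti Swift", 0.6), ("Tata Nano", 0.3), ("BMW X1", 0.1)],
    "service": [("Tata Nano", 0.7), ("Maruti Swift", 0.6), ("Honda City", 0.3), ("BMW X1", 0.1)],
}


def threshold_top_k(lists, k):
    """Return (top-k [(object, total)], scan depth), assuming sum aggregation."""
    # Random-access index: for each list, object -> score.
    lookup = {name: dict(entries) for name, entries in lists.items()}
    totals = {}  # objects seen so far, with their full aggregated score
    best = []
    depth = 0
    max_depth = max(len(entries) for entries in lists.values())
    while depth < max_depth:
        threshold = 0.0
        for entries in lists.values():
            obj, score = entries[depth]  # sorted access at the current depth
            threshold += score
            if obj not in totals:
                # Random access: look the object up in every list.
                totals[obj] = sum(scores[obj] for scores in lookup.values())
        depth += 1
        best = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
        # No unseen object can score above the threshold, so we can stop early.
        if len(best) == k and best[-1][1] >= threshold:
            break
    return best, depth


top2, depth = threshold_top_k(LISTS, 2)
print([name for name, _ in top2], "after scanning", depth, "rows per list")
```

On this data the scan stops before reaching the last row of each list, which is the point of top-k processing: the answer is found without reading all the input.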


1. Top-k Processing (2/2)

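• Top-k Joins
  – Example: Return the best house-school pair

| Houses | Rating | Location |
|---|---|---|
| H1 | 0.9 | L1 |
| H2 | 0.8 | L2 |
| H3 | 0.6 | L3 |
| H4 | 0.1 | L3 |

| Schools | Rating | Location |
|---|---|---|
| S1 | 0.4 | L2 |
| S2 | 0.2 | L2 |
| S3 | 0.8 | L3 |
| S4 | 0.1 | L3 |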
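To make the example concrete, here is a naive baseline, assuming that a house and a school can be paired only when they share a location and that a pair's score is the sum of the two ratings. Both assumptions, and the name `top_k_pairs`, are illustrative and not taken from the slide. The sketch computes the full join and then picks the best pairs; rank-join algorithms aim to avoid exactly that full computation by reading both inputs in rating order and stopping early.

```python
import heapq
from collections import defaultdict

# Tables from the slide: (name, rating, location).
HOUSES = [("H1", 0.9, "L1"), ("H2", 0.8, "L2"), ("H3", 0.6, "L3"), ("H4", 0.1, "L3")]
SCHOOLS = [("S1", 0.4, "L2"), ("S2", 0.2, "L2"), ("S3", 0.8, "L3"), ("S4", 0.1, "L3")]


def top_k_pairs(houses, schools, k):
    """Join on location, score each pair by summed rating, return the k best."""
    # Hash the schools by location so each house finds its partners directly.
    schools_at = defaultdict(list)
    for school, rating, location in schools:
        schools_at[location].append((school, rating))
    pairs = (
        (house, school, h_rating + s_rating)
        for house, h_rating, location in houses
        for school, s_rating in schools_at.get(location, [])
    )
    return heapq.nlargest(k, pairs, key=lambda pair: pair[2])


best_house, best_school, best_score = top_k_pairs(HOUSES, SCHOOLS, 1)[0]
print(best_house, best_school, round(best_score, 2))
```

Note that H1 has the highest house rating but no school in its location, so it cannot appear in any result. This is why the best pair cannot be found by looking at each table's top entry alone.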


2. Query Processing and Interfaces (1/3)

• Given: Database of text documents and a text-centric task.
  – Extract information about disease outbreaks

• Strategies
  – Scan all documents – very expensive
  – Filter promising documents – affects recall

• Develop cost models and execution strategies appropriate for this setting
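The trade-off between the two strategies can be sketched on a toy collection. Everything here is made up for illustration: the four documents, the regular expression standing in for an expensive extractor, and the keyword filter on "outbreak". Cost is counted as the number of extractor calls, and recall is measured against what a full scan finds.

```python
import re

# Hypothetical documents; the third is relevant but lacks the word "outbreak".
DOCS = [
    "Cholera outbreak reported in the capital",
    "An outbreak of dengue hit the coast",
    "Dozens fell ill with measles last week",
    "The stock market rallied on Monday",
]

# Toy stand-in for an expensive information-extraction step.
DISEASE = re.compile(r"\b(cholera|dengue|measles)\b", re.IGNORECASE)


def extract(doc):
    """Return the set of disease names mentioned in one document."""
    return {name.lower() for name in DISEASE.findall(doc)}


def run(docs, keep=lambda doc: True):
    """Run the extractor on the documents that pass `keep`; count the calls."""
    cost, found = 0, set()
    for doc in docs:
        if keep(doc):
            cost += 1
            found |= extract(doc)
    return cost, found


scan_cost, scan_found = run(DOCS)
filter_cost, filter_found = run(DOCS, keep=lambda doc: "outbreak" in doc.lower())
recall = len(filter_found) / len(scan_found)
print(f"scan: {scan_cost} calls; filter: {filter_cost} calls, recall {recall:.2f}")
```

Filtering halves the number of extractor calls here but misses the measles report, which is the kind of effect a cost model for this setting would need to capture.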


2. Query Processing and Interfaces (2/3)

Querying with “typed” keywords
• Keyword querying: Easy to use
• Structured queries: Precise
Find the middle ground…

Instead of
“german has won nobel award”
or
q(x) :- GERMAN(x), hasWonPrize(x,y), NOBEL_PRIZE(y)
write
“german, has won (nobel award)”


2. Query Processing and Interfaces (3/3)

• Does the output have to be a boring list of ranked results?

• Nope!


3. Keyword Search on Graphs (1/3)

• Lots of graphs around
  – Relational DB (tuples + foreign keys)
  – XML data (elements/sub-elements/id/idrefs)
  – RDF (graph-structured knowledge-bases)

• Easy to query with keywords, instead of SQL/XQuery/SPARQL

• Results are the top-k interconnections between the keywords


3. Keyword Search on Graphs (2/3)


3. Keyword Search on Graphs (3/3)

Query: “Einstein”, “Bohr”

[Figure: graph with nodes Einstein, Bohr, Nobel Prize, vegetarian, Tom Cruise and 1962, and edges labelled won, won, isa, isa, bornIn and diedIn.]
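A keyword query over such a graph can be sketched as a search for connections between the nodes that match the keywords. The figure's node and edge labels are taken from the slide, but which nodes each edge joins is an assumption reconstructed from those labels. Ranking shorter connections first is also an assumption; the slide does not say how interconnections are scored. The function names are hypothetical.

```python
from collections import defaultdict

# (subject, label, object); edge endpoints are assumed, see the note above.
EDGES = [
    ("Einstein", "won", "Nobel Prize"),
    ("Bohr", "won", "Nobel Prize"),
    ("Einstein", "isa", "vegetarian"),
    ("Tom Cruise", "isa", "vegetarian"),
    ("Tom Cruise", "bornIn", "1962"),
    ("Bohr", "diedIn", "1962"),
]


def connections(edges, source, target):
    """Enumerate all simple paths between two nodes, shortest first."""
    graph = defaultdict(set)
    for subject, _label, obj in edges:
        # Treat edges as undirected: a connection may follow them either way.
        graph[subject].add(obj)
        graph[obj].add(subject)

    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        for neighbour in sorted(graph[node]):
            if neighbour not in path:  # keep paths simple (no repeated nodes)
                walk(neighbour, path + [neighbour])

    walk(source, [source])
    return sorted(paths, key=len)


def top_k_connections(edges, source, target, k):
    """Return the k shortest connections between the two keyword nodes."""
    return connections(edges, source, target)[:k]


for path in top_k_connections(EDGES, "Einstein", "Bohr", 2):
    print(" - ".join(path))
```

Exhaustive path enumeration only works on tiny graphs like this one; it is meant to show what a ranked list of interconnections looks like, not how a real system would compute it.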


4. Entity and Relationship Extraction (1/2)

Information Extraction (or Knowledge Harvesting)

Bill Gates was the founder of Microsoft and later its CEO.

Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.

Infosys was founded on 2 July 1981 by seven entrepreneurs: N. R. Narayana Murthy, Nandan Nilekani, …

| Company | Founder |
|---|---|
| Microsoft | Bill Gates |
| Apple | Steve Jobs |
| Apple | Steve Wozniak |
| Infosys | N. R. Narayana Murthy |
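One simple way to go from sentences like these to (Company, Founder) rows is hand-written extraction rules. The sketch below uses two hypothetical regular-expression patterns written only for the Microsoft and Apple sentences; it is not the method used in the talk, and real extractors need far more robust patterns or learned models. The Infosys sentence is left out because its founder list is elided on the slide.

```python
import re

SENTENCES = [
    "Bill Gates was the founder of Microsoft and later its CEO.",
    "Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
]

# Hypothetical rules, one per sentence shape.
FOUNDER_OF = re.compile(r"^(?P<person>.+?) was the founder of (?P<company>\w+)")
ESTABLISHED_BY = re.compile(r"^(?P<company>\w+) was established on .+? by (?P<people>.+?)\.$")


def extract_founders(sentence):
    """Return a list of (company, founder) facts found in one sentence."""
    match = FOUNDER_OF.match(sentence)
    if match:
        return [(match["company"], match["person"])]
    match = ESTABLISHED_BY.match(sentence)
    if match:
        # Split "A, B, and C" into individual names.
        people = re.split(r",\s*(?:and\s+)?|\s+and\s+", match["people"])
        return [(match["company"], person) for person in people if person]
    return []


FACTS = [fact for sentence in SENTENCES for fact in extract_founders(sentence)]
for company, founder in FACTS:
    print(company, "|", founder)
```

The rules also yield (Apple, Ronald Wayne), which the sentence states even though the slide's table does not list that row.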


4. Entity and Relationship Extraction (2/2)

• How to build a knowledge-base of facts?
  – Structurize Wikipedia
  – Construct rules for extraction

• How do I acquire all the facts in the world?
  – Extract “everything”
  – Don’t stop extracting


5. Ranking and Structured Data

• Not the same as top-k processing

• Given: Data with structure in it
  – Relational tables (flat)
  – XML (trees/graphs)
  – Text documents consisting of entities

• Task: Rank the query results
  – SQL/XQuery/“typed” keywords


QUESTIONS?