search by strategy - bcs irsg · 2011-10-31 · spinque don't build the ultimate search engine...

22
Search by Strategy Roberto Cornacchia - Spinque Search Solutions – London - October 21, 2010

Upload: others

Post on 14-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Search by StrategyRoberto Cornacchia - Spinque

Search Solutions – London - October 21, 2010

Page 2: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

A simple search task

A real-estate search scenario

● Location: Amsterdam● Price range: 220.000 - 250.000 €● Size: 75+m2

Many real-estate search engines can satisfy this easy query

So under-specified, that it is likely to be nearly useless(www.funda.nl gives back ~350 results)

Most available engines do allow some more complexityHow much more?

Page 3: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

A not-so-simple search task

A more desirable query

● Location: AmsterdamNear the historical canals, but far from high-crime districts

● Price range: 220.000 - 250.000 €Hey, 252.000 € is still fine, of course!

● Size: I'm not sure, let me choose later

● With backyardor with balcony that faces west

When is the benefit of building (and updating!)complex domain-specific search engines worth the effort?

Page 4: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Spinque● Don't build the ultimate search engine for all possible queries

● “Draw” a number of search strategies

● Click. Generate Web search engines on probabilistic DB

All housesAll houses

Rank full-textRank

full-text

Query termsQuery terms

Rankon location

Rankon location

DifferenceDifference

Selecton attribute

Selecton attribute

UnionUnion

Rankon location

Rankon location

Crime mapCrime map

Page 5: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Full-text search: a 3-block strategy

All housesAll houses

Rank full-textRank

full-text

Amsterdam, canalsAmsterdam, canals

Page 6: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

The simplest search engine

Spinque!

Page 7: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Blocks have parameters

All housesAll houses

Rank full-textRank

full-text

KeywordsKeywords

= has free parameters

Page 8: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Parameters become UI controls

Spinque!

Keywords

amsterdam canals balcony faces west|

Page 9: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Result streams can be combined...

All housesAll houses

Rank full-textRank

full-text

KeywordsKeywords

Rankon location

Rankon location

= has free parameters

UnionUnion

LocationLocation

Page 10: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

... and weighted

Spinque!

Keywords

Location

amsterdam canals balcony faces west|

Page 11: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Filtering result streams

All housesAll houses

Rank full-textRank

full-text

KeywordsKeywords

Rankon location

Rankon location

= has free parameters

Selecton attribute

Selecton attribute

UnionUnion

LocationLocation

Page 12: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

I'd love a backyard, but not a must

Spinque!

Keywords

Location

amsterdam canals balcony faces west|

Desired attributes

Page 13: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Excluding results

All housesAll houses

Rank full-textRank

full-text

KeywordsKeywords

Rankon location

Rankon location

DifferenceDifference

= has free parameters

Selecton attribute

Selecton attribute

UnionUnion

Rankon location

Rankon location

Crime mapCrime mapLocationLocation

Page 14: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Maybe not in the red light district..

Spinque!

Keywords

Location

amsterdam canals balcony faces west|

Desired attributes

Exclude high-crime locations

Page 15: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Not so sure about the rest

Spinque!

Keywords

Location

amsterdam canals balcony faces west|

Desired attributes

Exclude high-crime locations

□ 3 rooms (11351)□ 4 rooms (14502)

□ 200000-220000 € (16116)□ 220000-240000 € (14012)□ 240000-250000 € (9473)

Facets can be used to filter,but also to re-rank

Page 16: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Intellectual property search

Select reference patent

Same assignee

Same IPCR classifications

Mix andrank with keywords from reference patent

Patent corpus

Patents in the same family

● Large corpora, structured / unstructured data, multi-lingual

● IP search professionalsperform complex searches

● Flexibility in drawing and updating strategies

● Transparency of results

● Probabilistic reasoning avoids manual merging of several boolean searches

"Prior Art Candidates" Strategy

Page 17: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

How can strategies help?All housesAll houses

Rank full-textRank

full-text

Query termsQuery terms

Rankon location

Rankon location

DifferenceDifference

Selecton attribute

Selecton attribute

UnionUnion

Rankon location

Rankon location

Crime mapCrime map● Strategies encapsulatedomain expert knowledge(how to find)

● Strategies abstract awaysearch expert knowledge(how to search)

● Results streams can be analysed / debugged at any step

● Strategies can be stored / shared / improved / published

● Strategies can mix exact (DB) and ranked (IR) searches

● No “human (probabilistic) join” needed

Page 18: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

What's inside building blocks?

A = Select [attr='backyard'] (All_houses)B = Join INDEPENDENT [id=id] (Input_houses,A)RESULT = Project DISTINCT [id] (B)

A = Select [attr='backyard'] (All_houses)B = Join INDEPENDENT [id=id] (Input_houses,A)RESULT = Project DISTINCT [id] (B)

● A BB encapsulates the search expert knowledge● In principle, any language (scripting languages, SQL, XQuery, etc)● Spinque uses a subset of Probabilistic Relational Algebra

(PRA) [Fuhr, Rölleke 97]

Page 19: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

What happens to building blocks?

-- A = Select [attr='backyard'] (All_houses)Create view A asSelect *From All_housesWhere attr='backyard';-- B = Join INDEPENDENT [id=id] (Input_houses,A)Create view B asSelect I.id, A.id, I.prob * A.prob as probFrom Input_houses as I, AWhere I.id = A.id;-- RESULT = Project DISTINCT [id] (B)Create view RESULT asSelect id, 1-prod(1-prob) as probFrom BGroup by id;

-- A = Select [attr='backyard'] (All_houses)Create view A asSelect *From All_housesWhere attr='backyard';-- B = Join INDEPENDENT [id=id] (Input_houses,A)Create view B asSelect I.id, A.id, I.prob * A.prob as probFrom Input_houses as I, AWhere I.id = A.id;-- RESULT = Project DISTINCT [id] (B)Create view RESULT asSelect id, 1-prod(1-prob) as probFrom BGroup by id;

● This encapslates the DB expert knowledge● Spinque translates PRA to standard SQL and fires it to a DB engine

Page 20: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Isn't SQL too slow for IR?!● Traditional relational DB engines carry outdated design principles

● Storage model● Traditional: row-oriented, best for OLTP● Next-generation: column-oriented, best for OLAP

● CPU & RAM● Traditional: assume CPU & RAM are fast, no optimisation● Next-generation: cache-conscious, data-hungry algorithms

● MonetDB, Ingres/Vectorwise● Flexible and efficient IR using Array Databases, Cornacchia et al., VLDBJ 2008

Page 21: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Summary

● Spinque promotes a separation of concerns in the engineering of search engines

● Search strategy UI: automatised● Search strategy definition: the real focus, make it easy and flexible● Search strategy processing: automatised

● Spinque promotes IR + DB

● IR + DB computed on a relational database engine – no human joins● IR + DB optimised within the very same framework – no black boxes● Modern DB technology can support IR efficiently

Page 22: Search by Strategy - BCS IRSG · 2011-10-31 · Spinque Don't build the ultimate search engine for all possible queries “Draw” a number of search strategies Click.Generate Web

Thank you