Stop stupid fuzzy searches !
Table of contents
/01 Fuzzy search
/02 Google Did You Mean
/03 Did You Mean by Deezer
/04 Conclusion
/01 Fuzzy search
Fuzzy search: here's how it works.
Fuzzy search: is it fast or not?
fuzziness=2 is devastatingly slow: https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
Fuzzy search: observations
Pros
● Searches for all possible mistakes allowed by the fuzziness parameter
● Natively implemented in Elasticsearch and Lucene
Cons
● Increases CPU usage and query response time
● Results are not always relevant
● The system does not try to guess what the user needs
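To make the fuzziness parameter concrete, here is a minimal sketch of a fuzzy search request body, written as a Python dict. The `match` query and its `fuzziness` option are part of the Elasticsearch query DSL; the field name `title`, the sample query text, and the helper `fuzzy_match_query` are assumptions made for illustration only.

```python
def fuzzy_match_query(field, text, fuzziness):
    """Build an Elasticsearch `match` query body with a fuzziness setting.

    `field` and `text` are hypothetical values chosen by the caller;
    `fuzziness` is the maximum edit distance (0, 1, 2) or "AUTO".
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


# Example: tolerate up to two edits per term (the expensive setting).
body = fuzzy_match_query("title", "daft pumk", 2)
```

The resulting dict can be sent as the JSON body of a search request. Every term is expanded to all indexed terms within the allowed edit distance, which is why a higher fuzziness costs more CPU and can return less relevant hits.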
/02 Google Did You Mean
Did You Mean? By Google
Based on the patent "Query revision using known highly-ranked queries":
http://www.google.com/patents/US20060224554
Did You Mean? Here's how it works.
1. Assign a rank to all search queries.
2. Identify the highest-ranked queries as known highly-ranked queries (KHRQ).
3. Identify queries with a strong probability of being revised to a KHRQ as nearby queries (NQ). KHRQs and NQs are indexed.
4. Determine a revision probability for a given query with respect to each indexed query.
5. Calculate a revision score (RS) from the revision probability and the query rank of the indexed query.
6. Retrieve the indexed queries with the highest RS as alternative queries.
7. Provide the alternative queries that are KHRQs, or the corresponding KHRQ for alternative queries that are NQs.
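Steps 4 to 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent's actual revision probability is not given here, so `difflib.SequenceMatcher` string similarity stands in for it; combining probability and rank by multiplication is also an assumption; and the index contents and rank values are made up.

```python
from difflib import SequenceMatcher

# Hypothetical index: a KHRQ points at itself, an NQ points at its KHRQ.
# Rank values are invented for the example.
INDEX = [
    {"query": "daft punk", "khrq": "daft punk", "rank": 0.9},
    {"query": "daft punck", "khrq": "daft punk", "rank": 0.9},
    {"query": "david guetta", "khrq": "david guetta", "rank": 0.8},
]


def revision_probability(query, indexed_query):
    # Stand-in (assumption): plain string similarity in [0, 1].
    return SequenceMatcher(None, query, indexed_query).ratio()


def revision_score(query, entry):
    # Assumption: RS is the product of revision probability and rank.
    return revision_probability(query, entry["query"]) * entry["rank"]


def alternatives(query, index=INDEX, top_n=1):
    # Step 6: order indexed queries by decreasing revision score.
    scored = sorted(index, key=lambda e: revision_score(query, e), reverse=True)
    # Step 7: return KHRQs, mapping each NQ to its corresponding KHRQ.
    result = []
    for entry in scored:
        if entry["khrq"] not in result:
            result.append(entry["khrq"])
        if len(result) == top_n:
            break
    return result


suggestions = alternatives("daft pumk")
```

With this toy index, a misspelled query is mapped to the closest highly-ranked query, whether the best-scoring indexed entry is a KHRQ itself or an NQ pointing to one.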
/03 Did You Mean by Deezer
Did You Mean at Deezer: assign a rank to Deezer search queries
Exploit user search actions:
Compute top-ranked queries:
Did You Mean at Deezer: identify nearby queries
Use a behavioral similarity, based on the work of Elisa Gilles, which allows us to:
● group user search queries that express the same need (temporal analysis and Levenshtein distance between the tokens of two queries)
● flag reformulated queries (operations such as insertion in the middle, substitution, or deletion)
● for each NQ, keep the most frequent NQ-KHRQ pair
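The pairing described above could look roughly like the following sketch. It is not the method from the cited work: the rule in `is_reformulation` (same number of tokens, a maximum time gap, a maximum per-token Levenshtein distance), the thresholds `max_gap` and `max_dist`, and the session format are all assumptions chosen to keep the example small.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_reformulation(q1, t1, q2, t2, max_gap=30, max_dist=2):
    """Hypothetical rule: q2 follows q1 within max_gap seconds and every
    aligned token pair is within max_dist edits."""
    tok1, tok2 = q1.split(), q2.split()
    if q1 == q2 or t2 - t1 > max_gap or len(tok1) != len(tok2):
        return False
    return all(levenshtein(x, y) <= max_dist for x, y in zip(tok1, tok2))


def best_pairs(sessions, khrqs):
    """For each NQ, keep the most frequent NQ-KHRQ pair.

    `sessions` is a list of per-user lists of (query, timestamp) tuples.
    """
    counts = Counter()
    for session in sessions:
        for (q1, t1), (q2, t2) in zip(session, session[1:]):
            if q2 in khrqs and is_reformulation(q1, t1, q2, t2):
                counts[(q1, q2)] += 1
    best = {}
    for (nq, khrq), n in counts.items():
        if nq not in best or n > best[nq][1]:
            best[nq] = (khrq, n)
    return best
```

Here a query counts as an NQ only when the user's next query in the same session is a KHRQ and looks like a small correction of it; the frequency count then decides which KHRQ wins for that NQ.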
Did You Mean at Deezer: query revision system
● Use Elasticsearch to store pairs of NQs and KHRQs
○ id: MD5 of the NQ
○ nq: the nearby query
○ khrq: the known highly-ranked query corresponding to the nearby query
○ rank: rank of the KHRQ
○ frequency: frequency of the NQ-KHRQ pair
● Nearly 1 million NQs paired with 50,000 KHRQs are stored
● The query revision model is learned from the last month of user web search clicks and the last 3 months of user search queries
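As a rough illustration of the stored document, the sketch below builds one NQ-KHRQ entry with the fields listed above and looks it up by id. The helper names `make_pair_doc` and `revise`, the sample values, and the plain dict standing in for the Elasticsearch index are assumptions; only the field names come from the slide.

```python
import hashlib


def query_id(query):
    # The id is the MD5 of the nearby query, so a lookup is a direct get by id.
    return hashlib.md5(query.encode("utf-8")).hexdigest()


def make_pair_doc(nq, khrq, rank, frequency):
    """Build one NQ-KHRQ document with the fields listed on the slide."""
    return {
        "id": query_id(nq),
        "nq": nq,
        "khrq": khrq,
        "rank": rank,
        "frequency": frequency,
    }


def revise(query, store):
    """Return the KHRQ paired with `query`, or `query` itself if unknown.

    `store` is a dict keyed by id, standing in for the real index.
    """
    doc = store.get(query_id(query))
    return doc["khrq"] if doc else query


# Made-up sample values for illustration.
doc = make_pair_doc("daft pumk", "daft punk", rank=0.9, frequency=2)
store = {doc["id"]: doc}
revised = revise("daft pumk", store)
```

Keying documents by a hash of the NQ turns query revision into a single exact lookup, which is much cheaper than expanding a fuzzy query at search time.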
Did You Mean at Deezer: first results
/04 Conclusion
Pros
● For users:
○ get more relevant results
○ see what the system is really searching for
● For servers:
○ save CPU usage
○ improve query response time
○ save memcache space
Cons
● Learning from user search actions is potentially subject to “bombing”
Open questions
● How many KHRQs does the system need?
● When should we automatically replace the user's query with a revised query, and when should we leave the choice to the user?
● How many mistakes can we allow the user to make?
● Could we combine other types of similarity to pair NQs with KHRQs:
○ semantic similarity?
○ syntactic similarity?
● Is Elasticsearch the best tool to serve as a query revision server?