elasticsearch @ shopwiki 2014-03-20

Elasticsearch @ ShopWiki

Upload: rob-stewart

Post on 09-May-2015


DESCRIPTION

Slides from the NY Elasticsearch Meetup on March 20, 2014. http://www.meetup.com/Elasticsearch-NY/events/170714812/ http://vimeo.com/90124531

TRANSCRIPT

Page 1: Elasticsearch @ ShopWiki 2014-03-20

Elasticsearch @ ShopWiki

Page 2: Elasticsearch @ ShopWiki 2014-03-20

What is ShopWiki?

• ShopWiki is the retail division of Oversee.net.

• We run a collection of retail websites,

• Including the Comparison Shopping Engines (CSE)
  – ShopWiki.com
  – Compare.com

Page 3: Elasticsearch @ ShopWiki 2014-03-20
Page 4: Elasticsearch @ ShopWiki 2014-03-20
Page 5: Elasticsearch @ ShopWiki 2014-03-20

How do we use Elasticsearch?

• You know, for search (not logging).

• We index millions of products, offered from hundreds of thousands of stores, and allow users to search them.

Page 6: Elasticsearch @ ShopWiki 2014-03-20

Why Elasticsearch?

• ShopWiki was built using a proprietary search server written in C++.

• Served us well for many years, but it needed improvements, especially for non-English language search.

• What about Lucene-based solutions?

Page 7: Elasticsearch @ ShopWiki 2014-03-20

Solr3

• We tried out Solr3 when building CouponFinder.com.

• Solr worked well (for English & French), but the coupon dataset is small in comparison to our product dataset.

• The setup was simple master-slave replication.

Page 8: Elasticsearch @ ShopWiki 2014-03-20

How do we scale?

• To use Solr for our product data we needed to shard the data across multiple machines.

• But, Solr3’s sharding capabilities were clunky and difficult to use.

• Enter Elasticsearch!

• Designed to scale out-of-the-box.
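
To make the sharding idea concrete, here is a minimal sketch of hash-based document routing: each document id is hashed and assigned to one of a fixed number of shards. The function names and the CRC32 hash are assumptions chosen for illustration. They are not taken from the slides and are not Elasticsearch's actual routing code.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Hypothetical product ids spread over 6 shards.
    ids = ["product-%d" % i for i in range(20)]
    for shard, members in partition(ids, 6).items():
        print(shard, len(members))
```

Because the shard depends only on the id and the shard count, any node can work out where a document lives without a lookup table.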

Page 9: Elasticsearch @ ShopWiki 2014-03-20

Compare.com

• Compare.com was built using Elasticsearch from the start.

• Allowed us to get up & running very quickly.

• Allowed us to scale up very quickly.
  – 60 million products and growing.

• Allows us to iterate on new features quickly.

Page 10: Elasticsearch @ ShopWiki 2014-03-20

Other Languages

• ShopWiki search is being gradually ported to Elasticsearch.

• Allows us to have better non-English search right out-of-the-box.
  – French
  – German
  – Dutch
  – Spanish

Page 11: Elasticsearch @ ShopWiki 2014-03-20

Our Elasticsearch Cluster

• 12 indices, one for each website.

• 3 replicas per shard.

• 3 master nodes (quorum of 2).

• 6 data nodes.

• Plan to add more data nodes as we proceed with our migration of ShopWiki (500m products).

• Expect to need less hardware than the C++ cluster (which uses 50+ machines).
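
The "quorum of 2" for 3 master nodes follows from the usual majority rule, floor(n / 2) + 1. A small sketch of that arithmetic is below. The function name is hypothetical. In Elasticsearch 1.x this value is normally supplied through the `discovery.zen.minimum_master_nodes` setting, although the slides do not say how this cluster was configured.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "master-eligible nodes -> quorum of", quorum(n))
```

With 3 master nodes, the cluster can lose one and still elect a master, while a partitioned single node cannot form a majority on its own.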

Page 12: Elasticsearch @ ShopWiki 2014-03-20

Elasticsearch Head

Page 13: Elasticsearch @ ShopWiki 2014-03-20

Realtime Updates

• C++ search servers need to have the entire dataset re-indexed and swapped out all at once.

• Could only do this once a day, at night (affects performance).

• With Elasticsearch, we can update our data all the time (it’s not even a limiting factor).
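
As an illustration of incremental updates, the sketch below builds a request body in the newline-delimited format accepted by Elasticsearch's bulk API: one action line followed by one document line per product. The index name, type name, and product fields are made up for the example, and the slides do not state that the bulk API is what ShopWiki used.

```python
import json


def bulk_body(index, doc_type, products):
    """Build a newline-delimited bulk body that (re)indexes each product."""
    lines = []
    for product in products:
        # Action line: which index, type and id the next line belongs to.
        action = {"index": {"_index": index, "_type": doc_type, "_id": product["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(product))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical products; field names are assumptions.
    changed = [
        {"id": "p1", "title": "Glass mixing bowl", "price": 12.5},
        {"id": "p2", "title": "Steel mixing bowl", "price": 18.0},
    ]
    print(bulk_body("compare", "product", changed), end="")
```

Only the changed products need to be sent, in contrast to rebuilding and swapping the whole dataset.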

Page 14: Elasticsearch @ ShopWiki 2014-03-20

Challenges

• Use TermsFacet to suggest filters to the user.

• E.g. filter by stores or brands.

• Using the 10 most frequent brands from a search can produce bad results.
  – A single brand may have lots of products that are all weakly relevant.

Page 15: Elasticsearch @ ShopWiki 2014-03-20

Top-N Faceting

• The solution in Solr is to limit facets to the top-N results.

• Elasticsearch doesn’t have this feature (as mentioned at last Meetup).

• Solution: TermsStatsFacet (AKA aggregations in 1.0)

• Allows us to get the brands/stores with the most relevant results.

• E.g. Σ(score^n), where n allows us to tune facet results to our liking.
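
The effect of the exponent can be seen in a small offline simulation, which is not an Elasticsearch request. It sums score^n per brand over a list of hits and ranks brands by that total. The brand names and scores are invented for illustration. With n = 0 every hit contributes 1, so the ranking is the same as a plain count.

```python
from collections import defaultdict


def rank_terms(hits, n, size=10):
    """Rank terms by the sum of score**n over their hits, highest first."""
    totals = defaultdict(float)
    for term, score in hits:
        totals[term] += score ** n
    # Sort by total descending, then by term name for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


if __name__ == "__main__":
    # Made-up data: BrandA has many weak matches, BrandB a few strong ones.
    hits = [("BrandA", 0.2)] * 5 + [("BrandB", 0.9)] * 2
    print("n = 0:", rank_terms(hits, 0))
    print("n = 4:", rank_terms(hits, 4))
```

Raising n weights strong matches more heavily, so a brand with a few highly relevant products can outrank one with many weakly relevant products.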

Page 16: Elasticsearch @ ShopWiki 2014-03-20

TermsStatsFacet for Brands. Query: “mixing bowl”

[Two charts: N = 0 (same as count) and Σ(score^N) with N = 4]

Page 17: Elasticsearch @ ShopWiki 2014-03-20

De-duping Products

• Use “more_like_this” query to find similar products.

• If result’s score is “high enough”, it’s likely the same product from a different store.

• “High enough” is defined as a fraction of the identity match’s score.
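
A minimal sketch of this thresholding step follows. It assumes the search results arrive as (id, score) pairs and that the identity match's score is known. The 0.8 fraction, the field names, and the helper names are assumptions, not values from the slides. The query body uses the `like_text` parameter of the 1.x-era `more_like_this` query and should be checked against the Elasticsearch version in use.

```python
def mlt_query(text, fields):
    """Build a more_like_this query body for the given product text."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like_text": text,
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


def likely_duplicates(hits, identity_score, fraction=0.8):
    """Return ids whose score is at least `fraction` of the identity score."""
    threshold = fraction * identity_score
    return [doc_id for doc_id, score in hits if score >= threshold]


if __name__ == "__main__":
    body = mlt_query("Pyrex 3-piece glass mixing bowl set", ["title", "description"])
    print(body)
    # Hypothetical scores for other products; the product itself scored 10.0.
    hits = [("store-b-123", 9.1), ("store-c-456", 8.0), ("store-d-789", 3.2)]
    print(likely_duplicates(hits, identity_score=10.0))
```

Expressing the cutoff relative to the identity score keeps it comparable across products, since raw scores vary with the length and rarity of the terms in each product.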

Page 18: Elasticsearch @ ShopWiki 2014-03-20

• Questions?

• Rob Stewart

• Lead Software Engineer

• [email protected]