Elasticsearch at Automattic
DESCRIPTION
Presentation from the Elasticsearch Denver Meetup. Discusses scaling Elasticsearch for Related Posts across WordPress.com and some of the big changes that were needed to scale to 23 million queries a day across 800 million documents.
TRANSCRIPT
![Page 1: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/1.jpg)
Elasticsearch at Automattic
Tuesday, February 25, 14
![Page 2: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/2.jpg)
Greg Ichneumon Brown
http://gibrown.wordpress.com @gregibrown
Data Wrangler at Automattic
![Page 3: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/3.jpg)
![Page 4: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/4.jpg)
1 Billion Monthly Uniques
![Page 5: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/5.jpg)
Elasticsearch Deployments
Internal Search - 216 Internal Blogs - 750k docs [3 GB]
Support Documents - KNN Link Prediction - 1.7m docs [14 GB]
Polldaddy - Word Clouds/Freq Response - 39m docs [9 GB]
WordPress.com VIP Search
- KFF.org - 18m docs [99 MB]
- NY Post - 600k docs [2.3 GB]
WordPress.com - ~800m docs [4 TB]
- Related Posts - 48 mil reqs/day
- search.wordpress.com - 3 mil reqs/day
![Page 6: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/6.jpg)
Overview of Related Posts
Our “10X Improvements”
- Indexing
- Querying
Our Open Issues
![Page 7: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/7.jpg)
Related Posts
Search within just the one blog
![Page 8: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/8.jpg)
WordPress.com Total Elasticsearch Operations

| Operation | Ops/Day |
|---|---|
| Routed Queries | 23 mil |
| Global Queries | 2 mil |
| Docs Indexed | 13 mil |
| Docs Updated | 10 mil |
| Docs Deleted | 2.5 mil |
| Delete By Query | 250k |

![Page 9: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/9.jpg)
Global Cluster
- DC2: 14 Data, 1 Master
- DC1: 14 Data, 1 Master
- DC3: 14 Data, 1 Master
![Page 10: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/10.jpg)
Our Secret To Scaling
Routed Queries
All Posts for each Blog are on the same Shard
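
A minimal sketch of what a routed query looks like from the client side, assuming the blog ID is used as the routing value. The `routing` search parameter is a standard Elasticsearch feature, and documents must be indexed with the same routing value for this to work. The index name, blog ID, and the crc32 hash are illustrative assumptions; Elasticsearch uses its own internal hash to pick the shard.

```python
import json
import zlib
from typing import Tuple
from urllib.parse import urlencode

NUM_SHARDS = 25  # shards per index, from the Global Index slide


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Stand-in for Elasticsearch's internal routing hash (crc32 is NOT what ES uses).

    It only illustrates that one routing value always maps to one shard.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def routed_search(index: str, blog_id: int, query: dict) -> Tuple[str, str]:
    """Build the path and JSON body for a search routed by blog ID."""
    path = f"/{index}/_search?" + urlencode({"routing": blog_id})
    return path, json.dumps({"query": query})


# "global-0" and blog ID 12345 are made-up values for illustration.
path, body = routed_search("global-0", 12345, {"match": {"content": "scaling"}})
print(path)  # /global-0/_search?routing=12345
```

Because every post of a blog shares one routing value, a Related Posts query only has to visit a single shard instead of fanning out to all of them.
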
![Page 11: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/11.jpg)
Global Index
7 Indices
10 mil Blogs per Index
25 Shards per Index
175 Shards Total
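
The arithmetic on this slide can be sketched as follows. The slide does not say how blogs are assigned to indices; the sketch assumes contiguous blog ID ranges of 10 million, and the `global-N` index names are hypothetical.

```python
NUM_INDICES = 7               # from the slide
BLOGS_PER_INDEX = 10_000_000  # from the slide
SHARDS_PER_INDEX = 25         # from the slide


def index_for_blog(blog_id: int) -> str:
    """Map a blog ID to an index, ASSUMING blogs are bucketed by ID range."""
    bucket = blog_id // BLOGS_PER_INDEX
    return f"global-{bucket}"  # hypothetical naming scheme


total_shards = NUM_INDICES * SHARDS_PER_INDEX
print(total_shards)                 # 175
print(index_for_blog(23_456_789))   # global-2
```
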
![Page 12: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/12.jpg)
Overview of Related Posts
Our “10X Improvements”
- Indexing
- Querying
Our Open Issues
![Page 13: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/13.jpg)
20% Improvements Don’t solve scaling problems
![Page 14: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/14.jpg)
Entangling Elasticsearch with Existing Systems
Indexing
![Page 15: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/15.jpg)
Bulk Indexing 1.0
44 Days to Index all Posts (estimated)
![Page 16: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/16.jpg)
Bulk Indexing Problems
- Overhead: Spent too much time starting indexing jobs
WordPress.com has 500 mil MySQL tables.
- High DB Load: Corner Cases. Blogs with 1+ mil followers.
- High DB Load: Indexing sequentially doesn’t spread the load.
- High DB Load: Heavy load on archive DBs.
![Page 17: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/17.jpg)
Bulk Indexing Today
12.0?
4 Days to Index all Posts (running right now)
![Page 18: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/18.jpg)
Real Time Indexing
The Hardest Part!
![Page 19: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/19.jpg)
Real Time Goals
1) Eventually Consistent
2) Minimize Bulk Re-indexing
3) Normally updated < 1 minute
![Page 20: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/20.jpg)
Real Time Goals
1) Eventually Consistent
2) Minimize Bulk Re-indexing
3) Normally updated < 1 minute
Bulk reindexed 3 times in 5 months.
One intentional, two during system upgrades.
![Page 21: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/21.jpg)
Stuff Fails
1) Humans
2) Hardware
3) Elasticsearch (steady improvements)
Combinations of the above.
![Page 22: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/22.jpg)
Hardware Problems
1) Detect and Track Down Servers
2) Prioritize Queries over Indexing
3) Throttle Indexing Jobs
- any issues: block bulk changes to blogs
- >10 min: block doc updates
- >20 min: block all indexing
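
The throttling thresholds above could be expressed roughly like this. This is a sketch under assumptions: the slide gives only the three thresholds, so the job type names are hypothetical and "minutes" is read as how long the cluster has been showing issues.

```python
from enum import Enum
from typing import Optional, Set


class Job(Enum):
    # Hypothetical job categories; the slide only names the first two explicitly.
    BULK_BLOG_CHANGE = "bulk_blog_change"
    DOC_UPDATE = "doc_update"
    NEW_DOC = "new_doc"


def blocked_jobs(minutes_with_issues: Optional[float]) -> Set[Job]:
    """Return the job types to hold back. None means the cluster is healthy."""
    if minutes_with_issues is None:
        return set()
    blocked = {Job.BULK_BLOG_CHANGE}      # any issues: block bulk changes to blogs
    if minutes_with_issues > 10:
        blocked.add(Job.DOC_UPDATE)       # >10 min: block doc updates
    if minutes_with_issues > 20:
        blocked = set(Job)                # >20 min: block all indexing
    return blocked


print(sorted(job.value for job in blocked_jobs(15)))
```

Holding back indexing in stages like this is one way to keep queries prioritized over indexing while a degraded server is tracked down.
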
![Page 23: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/23.jpg)
Real Time Failures
1) Auto Retry Failed Indexing Jobs
2) Indexing Queue for Failures
3) Scrolling Queries to Find Bad Docs
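
The first two points, retrying and then queueing failures, might look something like the sketch below. It is not Automattic's job system: the function names, the attempt count, and the in-memory queue are assumptions chosen to keep the example self-contained.

```python
from collections import deque
from typing import Callable, Deque


def index_with_retry(index_fn: Callable[[dict], None], doc: dict,
                     failure_queue: Deque[dict], max_attempts: int = 3) -> bool:
    """Try to index a doc; after max_attempts failures, park it for a later pass."""
    for _ in range(max_attempts):
        try:
            index_fn(doc)
            return True
        except Exception:
            continue  # a real job would log the error and back off here
    failure_queue.append(doc)
    return False


# Simulated indexers standing in for a real Elasticsearch client call.
calls = {"n": 0}


def flaky_index(doc: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated transient failure")


def always_fails(doc: dict) -> None:
    raise RuntimeError("simulated permanent failure")


queue: Deque[dict] = deque()
ok = index_with_retry(flaky_index, {"post_id": 1}, queue)       # succeeds on attempt 3
failed = index_with_retry(always_fails, {"post_id": 2}, queue)  # ends up in the queue
print(ok, failed, list(queue))
```
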
![Page 24: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/24.jpg)
Cluster Restarts
Indexing across replicas is non-deterministic
Segments diverge
Slows Restart Time
![Page 25: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/25.jpg)
Simplistic Example
[Diagram labels: Docs, Primary, Replica]
- Segments w/ identical checksums
- Shard 1 merges
- Only first segment is identical
![Page 26: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/26.jpg)
After Bulk Index
Every segment is out of sync!
![Page 27: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/27.jpg)
Our Bulk Indexing Procedure
1) Bulk Index All Docs
2) Optimize the index
3) Rolling Restart (sync segments)
4) Future restarts will be much faster.
- Play with recovery settings
- SSDs? => use noop Linux scheduling
![Page 28: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/28.jpg)
Indexing
It’s all about handling Failures
![Page 29: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/29.jpg)
Overview of Related Posts
Our “10X Improvements”
- Indexing
- Querying
Our Open Issues
![Page 30: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/30.jpg)
Querying
Test and Iterate
![Page 31: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/31.jpg)
Related Posts Query
Started with MoreLikeThis API.
Did not scale well enough.
![Page 32: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/32.jpg)
MLT API
1) Get Document
2) Analyze Document
3) Search for Similar Docs
![Page 33: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/33.jpg)
MLT API vs MLT Query

| MLT API | MLT Query |
|---|---|
| 147 req/sec | 1062 req/sec |
| 40% CPU | 30% CPU |
| 306 ms median latency | 49.5 ms median latency |
| All processing by ES | Build query in PHP |

![Page 34: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/34.jpg)
Related Posts Relevancy
Great With Long Content

    {
      "more_like_this": {
        "fields": ["mlt_content"],
        "like_text": "Scaling Elasticsearch Part 1: Overview ElasticSearch scaling Search We recently launched Related Posts across WordPress.com, so its time to pop the hood and take a look at what ended up in our engine... ",
        "percent_terms_to_match": 0.08,
        "boost_terms": 5,
        "analyzer": "en_analyzer"
      }
    }
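
A sketch of building this query on the client side, which is the approach the comparison slide credits for the speedup. The talk builds the query in PHP; Python is used here only as a stand-in. The parameter names and values are copied from the query on this slide (Elasticsearch 1.x-era syntax), while the function name and the sample text are illustrative.

```python
import json


def build_mlt_query(text: str, analyzer: str = "en_analyzer") -> dict:
    """Build a more_like_this query from text the application already has,

    so Elasticsearch does not need to fetch and analyze the source document first.
    """
    return {
        "more_like_this": {
            "fields": ["mlt_content"],
            "like_text": text,
            "percent_terms_to_match": 0.08,
            "boost_terms": 5,
            "analyzer": analyzer,
        }
    }


# In practice the text would come from the application's own copy of the post.
post_text = "Scaling Elasticsearch Part 1: Overview"
body = json.dumps({"query": build_mlt_query(post_text)})
print(body)
```
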
![Page 35: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/35.jpg)
MLT Query Relevancy
Use match or multi_match for short content.
[Chart: Average Related Posts CTR]
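
One way to act on this advice is to pick the query type from the content length. This is a sketch only: the talk gives no cutoff, so the 100-word threshold is a made-up value, and the `title` and `content` field names are assumptions.

```python
SHORT_CONTENT_WORDS = 100  # hypothetical cutoff, not from the talk


def related_posts_query(title: str, content: str) -> dict:
    """Use multi_match for short posts and more_like_this for long ones."""
    text = f"{title} {content}".strip()
    if len(text.split()) < SHORT_CONTENT_WORDS:
        # Field names here are assumed for illustration.
        return {"multi_match": {"query": text, "fields": ["title", "content"]}}
    return {
        "more_like_this": {
            "fields": ["mlt_content"],
            "like_text": text,
            "percent_terms_to_match": 0.08,
            "boost_terms": 5,
        }
    }


short_query = related_posts_query("Hello", "A very short post")
long_query = related_posts_query("Scaling", "word " * 200)
print(list(short_query), list(long_query))
```
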
![Page 36: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/36.jpg)
Language Analyzers
arabic, armenian, basque, brazilian, bulgarian, catalan, chinese, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, italian, japanese, korean, norwegian, persian, portuguese, romanian, russian, spanish, swedish, turkish, thai
![Page 37: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/37.jpg)
Related Posts Relevancy
How Important is using the correct Language Analyzer?
![Page 38: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/38.jpg)
Related Posts Relevancy
How Important is using the correct Language Analyzer?
Doubled Click Through Rate
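
Choosing the analyzer per document could be sketched as a lookup from the blog's language code. The mapping below covers only a handful of the languages listed on the Language Analyzers slide, and falling back to the `standard` analyzer is an assumption. The talk's own query uses a custom analyzer named `en_analyzer`; how the other analyzers are named is not stated.

```python
# ISO 639-1 code -> analyzer language; a small subset of the languages on the slide.
LANGUAGE_ANALYZERS = {
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "de": "german",
    "ru": "russian",
    "ja": "japanese",
}
DEFAULT_ANALYZER = "standard"  # assumed fallback for unsupported languages


def analyzer_for(lang_code: str) -> str:
    """Pick an analyzer from a language tag such as "fr" or "en-US"."""
    base = lang_code.lower().split("-")[0]
    return LANGUAGE_ANALYZERS.get(base, DEFAULT_ANALYZER)


print(analyzer_for("fr"), analyzer_for("en-US"), analyzer_for("xx"))
```
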
![Page 39: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/39.jpg)
Unfortunately
Increased Slow Queries (>1 second) by 10x
Still worth it.
![Page 40: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/40.jpg)
Global Query Performance
search.wordpress.com
![Page 41: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/41.jpg)
Parent-Child Filtering
Blog Doc
- public: true|false
Post Doc
- title: “...”
- content: “...”
![Page 42: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/42.jpg)
has_parent Filter

| With has_parent | Without has_parent |
|---|---|
| 7.6 req/sec | 17.5 req/sec |
| 75% CPU | 50% CPU |
| 503 ms median latency | 207 ms median latency |

Requires more Indexing
Querying Across All Shards
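
The two variants being compared might look like the request bodies below. This is a sketch under assumptions: it uses the Elasticsearch 1.x-era `filtered` query and `has_parent` filter, the `blog` parent type name is assumed, and reading "Without has_parent" as copying the blog's public flag onto every post (hence "Requires more Indexing") is an interpretation. The `blog_public` field name is hypothetical.

```python
def search_with_has_parent(text: str) -> dict:
    """Filter posts by a flag stored on the parent blog document."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"content": text}},
                "filter": {
                    "has_parent": {
                        "parent_type": "blog",  # assumed type name
                        "query": {"term": {"public": True}},
                    }
                },
            }
        }
    }


def search_with_denormalized_flag(text: str) -> dict:
    """Filter posts by a copy of the flag stored on each post (hypothetical field)."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"content": text}},
                "filter": {"term": {"blog_public": True}},
            }
        }
    }


with_parent = search_with_has_parent("elasticsearch")
without_parent = search_with_denormalized_flag("elasticsearch")
print(list(with_parent["query"]["filtered"]["filter"]))
print(list(without_parent["query"]["filtered"]["filter"]))
```

The denormalized variant trades extra indexing work, since every post must be updated when a blog's visibility changes, for a cheaper filter at query time.
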
![Page 43: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/43.jpg)
Indexing:
Optimize to Handle Failures
Querying:
Test and Iterate
![Page 44: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/44.jpg)
Overview of Related Posts
Our “10X Improvements”
- Indexing
- Querying
Our Open Issues
![Page 45: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/45.jpg)
Open Issues
Slow Queries (> 1 second)
Getting Better. Shards are too big.
![Page 46: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/46.jpg)
Open Issues
What does it take to scale?
3x Data
5x Queries
![Page 47: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/47.jpg)
Open Issues
Elasticsearch for Natural Language Processing?
At Scale.
On Live Data.
![Page 48: Elasticsearch at Automattic](https://reader034.vdocuments.us/reader034/viewer/2022051314/54c73c1c4a795927458b45db/html5/thumbnails/48.jpg)
http://gibrown.wordpress.com @gregibrown
Feeling Inspired?
http://automattic.com/work-with-us/data-wrangler/