elasticsearch in production

Elasticsearch in production Alex Brasetvik @alexbrasetvik

Upload: foundsearch

Post on 10-May-2015

Page 1: Elasticsearch in Production

Elasticsearch in production

Alex Brasetvik @alexbrasetvik

Page 3: Elasticsearch in Production

Who?

Co-founder of Found AS 8+ years search, 3+ Elasticsearch

Herding hundreds of Elasticsearch clusters

Page 4: Elasticsearch in Production

Agenda

Page 5: Elasticsearch in Production

Agenda

• Anti-patterns

• Memory / Resource Usage

• Distributed problems

• Security

• Client concerns

• Changing a cluster

Page 6: Elasticsearch in Production

found.no/foundation

Elasticsearch in Production

Elasticsearch as a NoSQL Database

Intro to Function Scoring

All About Analyzers

Securing your Elasticsearch Cluster

Page 7: Elasticsearch in Production
Page 8: Elasticsearch in Production
Page 9: Elasticsearch in Production
Page 10: Elasticsearch in Production
Page 11: Elasticsearch in Production

Snapshot / Restore

Circuit breakers

Document values

Aggregations

Distributed percolation

Suggesters

Page 12: Elasticsearch in Production

Anti-Patterns

Page 13: Elasticsearch in Production

Arbitrary Keys

• “Schema Free”

• One field per value

• Ever-growing cluster state

acls:
  1234: READ
  42: WRITE
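
One way to avoid the one-field-per-value problem is to move the arbitrary key into a value, so the mapping stays at a fixed number of fields however many users exist. The sketch below is only an illustration: the field names `user_id` and `permission` are assumptions, not something the slides prescribe.

```python
def acls_to_entries(acls):
    """Turn {"1234": "READ"} into [{"user_id": "1234", "permission": "READ"}].

    The user id becomes a value instead of a field name, so indexing a new
    user no longer adds a new field to the mapping (and the cluster state).
    """
    return [
        {"user_id": str(user_id), "permission": permission}
        for user_id, permission in sorted(acls.items(), key=lambda kv: str(kv[0]))
    ]


doc = {"acls": {"1234": "READ", "42": "WRITE"}}
doc["acls"] = acls_to_entries(doc["acls"])
print(doc)
```

If the pairing between `user_id` and `permission` has to survive at query time, the entries would probably need to be mapped as nested objects; that choice is outside this sketch.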

Page 14: Elasticsearch in Production

Heavy Updating

• Update = Delete + Reindex

• Be careful with counters
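
Since every update is a delete plus a reindex, one common mitigation for counters is to coalesce increments on the client and write them out in batches. The following is a minimal sketch under that assumption; `CounterBuffer`, `flush_fn` and `max_pending` are hypothetical names, and the actual write to Elasticsearch is left to the caller.

```python
from collections import Counter


class CounterBuffer:
    """Accumulate counter increments and hand them over in batches."""

    def __init__(self, flush_fn, max_pending=1000):
        self._flush_fn = flush_fn        # called with {doc_id: total_delta}
        self._max_pending = max_pending  # flush after this many increments
        self._pending = Counter()
        self._events = 0

    def increment(self, doc_id, amount=1):
        self._pending[doc_id] += amount
        self._events += 1
        if self._events >= self._max_pending:
            self.flush()

    def flush(self):
        # One write per document, no matter how many increments it received.
        if self._pending:
            self._flush_fn(dict(self._pending))
        self._pending.clear()
        self._events = 0


flushed = []
buf = CounterBuffer(flushed.append, max_pending=3)
buf.increment("page-a")
buf.increment("page-a")
buf.increment("page-b")  # third increment triggers a flush
print(flushed)  # [{'page-a': 2, 'page-b': 1}]
```

Three increments turn into two document updates here; with hot counters the saving is much larger, at the cost of losing buffered increments if the client dies before flushing.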

Page 15: Elasticsearch in Production

Slow queries

• WHERE foo ILIKE '%bar%'

• {"query_string": {"query": "foo:*bar*"}}
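
A leading wildcard forces a scan over the terms in the index. A frequently used alternative is to pay at index time instead, by indexing n-grams so that a substring search becomes an ordinary term lookup. The toy inverted index below only illustrates the idea in plain Python; it is not how Elasticsearch is configured, and the gram size of 3 is an assumption.

```python
def ngrams(token, n=3):
    """Return the set of character n-grams of a lowercased token."""
    token = token.lower()
    return {token[i:i + n] for i in range(len(token) - n + 1)}


index = {}  # gram -> set of document ids


def index_doc(doc_id, text):
    for word in text.split():
        for gram in ngrams(word):
            index.setdefault(gram, set()).add(doc_id)


def search_substring(term):
    """Return candidate documents containing every gram of the term."""
    grams = ngrams(term)
    if not grams:  # term shorter than the gram size
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))


index_doc(1, "foobarbaz")
index_doc(2, "foo baz")
print(search_substring("bar"))  # {1}
```

For terms longer than the gram size the result is a candidate set, since the grams may match in different positions; a real setup would let the analyzer and query handle that.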

Page 16: Elasticsearch in Production

Arbitrary searches

query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user's query here]
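
The wrapping on this slide can be built in application code, so the user never supplies the filter part. The sketch below keeps the slide's `filtered` syntax and treats the user's input as plain text for a `match` query rather than as raw query DSL; the field name `body` is a hypothetical placeholder.

```python
import json


def build_user_search(user_id, user_text, field="body"):
    """Wrap free-text input in a query that is always filtered by user_id."""
    return {
        "query": {
            "filtered": {
                # Enforced by the application, never taken from the request.
                "filter": {"term": {"user_id": user_id}},
                # User input is only ever a string value, not query structure.
                "query": {"match": {field: user_text}},
            }
        }
    }


print(json.dumps(build_user_search(42, "quarterly report"), indent=2))
```

Passing the user's text as a value keeps them from injecting their own filters, scripts, or expensive wildcard queries.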

Page 17: Elasticsearch in Production
Page 18: Elasticsearch in Production

Time Bomb

Page 19: Elasticsearch in Production

Memory

Page 20: Elasticsearch in Production

Memory

• Field caches

• Filter caches

• Page caches

• Aggregations

• Index building

Page 21: Elasticsearch in Production

Page Cache

• Keeping index pages in memory

• Can’t have too much

• Outgrow: Gradual slowdown

Page 22: Elasticsearch in Production

Heap Space

• Memory used by Elasticsearch process

• Field / Filter caches

• Aggregations

Page 23: Elasticsearch in Production

Time Bomb

Page 24: Elasticsearch in Production

Time Bomb

Page 25: Elasticsearch in Production

OutOfMemoryError

Woah there I ate all the memories

Your cluster may or may not work any more

Page 26: Elasticsearch in Production

OutOfMemory

• Growing too big

• Selecting too large a time span in Kibana

• Document ingestion peak

Page 27: Elasticsearch in Production

Preventing OOMs

• Have enough memory :-)

• Understand your search’s memory profile

• Bulk / Circuit breaker settings

• Monitoring

• Document values
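
On the bulk side, one way to keep an ingestion peak from overwhelming a node is to cap the size of each bulk request on the client. The sketch below splits documents into newline-delimited bulk bodies that stay under a byte limit; the 5 MB default and the function name are assumptions, and sending the bodies is left to whatever client is in use.

```python
import json


def bulk_chunks(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Yield bulk request bodies, each at most max_bytes where possible.

    docs is an iterable of (doc_id, source_dict) pairs. A single document
    larger than max_bytes still gets a chunk of its own.
    """
    chunk, size = [], 0
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        pair = json.dumps(action) + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if chunk and size + pair_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += pair_size
    if chunk:
        yield "".join(chunk)


docs = [(i, {"n": i}) for i in range(10)]
for body in bulk_chunks(docs, "logs", "event", max_bytes=200):
    print(len(body.encode("utf-8")), "bytes,", body.count("\n") // 2, "docs")
```

Limiting by bytes rather than by document count tends to behave better when document sizes vary a lot.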

Page 28: Elasticsearch in Production

Marvel (/_stats)
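
Without a dashboard, the same stats can be polled and checked by a small script. The sketch below assumes an already parsed node-level stats response in which each node reports `jvm.mem.heap_used_percent`; the sample data and the 75 percent threshold are made up for illustration.

```python
def nodes_over_heap_threshold(nodes_stats, threshold_percent=75):
    """Return (node_name, heap_used_percent) for nodes at or above the threshold."""
    hot = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        used = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if used is not None and used >= threshold_percent:
            hot.append((node.get("name", node_id), used))
    # Worst offenders first.
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Trimmed, hypothetical response body.
sample = {
    "nodes": {
        "node-a": {"name": "data-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "node-b": {"name": "data-2", "jvm": {"mem": {"heap_used_percent": 42}}},
    }
}
print(nodes_over_heap_threshold(sample))  # [('data-1', 91)]
```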

Page 29: Elasticsearch in Production
Page 30: Elasticsearch in Production
Page 31: Elasticsearch in Production

Document Values

Page 32: Elasticsearch in Production

"my_field": {
  "type": "string",
  "fielddata": {
    "format": "doc_values"
  }
}

Page 33: Elasticsearch in Production

Sizing

Page 34: Elasticsearch in Production

Sizing

• Test, don’t guess

• Start big, scale down

• Index, search, monitor

Page 35: Elasticsearch in Production
Page 36: Elasticsearch in Production
Page 37: Elasticsearch in Production
Page 38: Elasticsearch in Production

Glitch Meltdown

Page 39: Elasticsearch in Production
Page 40: Elasticsearch in Production

Glitch Meltdown

Page 41: Elasticsearch in Production
Page 42: Elasticsearch in Production
Page 43: Elasticsearch in Production

• Tie-breaker can be a cheap master-node

• Applies to data centers / availability zones too
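
The tie-breaker advice follows from majority voting: a side of a network split should only elect a master if it can see more than half of the master-eligible nodes. The arithmetic can be sketched as below, assuming a simple majority quorum; the function names are hypothetical.

```python
def majority(master_eligible_nodes):
    """Smallest number of nodes that is more than half of the total."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """True if this side of a partition sees a majority."""
    return visible_nodes >= majority(master_eligible_nodes)


# Two nodes: a split leaves 1 + 1, and neither side has a majority of 2.
print(majority(2), can_elect_master(1, 2))  # 2 False

# Add a cheap tie-breaker: a split leaves 2 + 1, and exactly one side wins.
print(majority(3), can_elect_master(2, 3), can_elect_master(1, 3))  # 2 True False
```

The same counting applies when the units are data centers or availability zones: with only two of them, losing the link leaves no side with a majority.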

Page 44: Elasticsearch in Production

Data-only nodes

Master-only nodes

Page 45: Elasticsearch in Production
Page 46: Elasticsearch in Production

Jepsen

Page 47: Elasticsearch in Production

Jepsen

• Kyle Kingsbury’s series on distributed systems

• Distributed systems are hard

• aphyr.com

Page 48: Elasticsearch in Production

Security

Page 49: Elasticsearch in Production

Security

• “Not my job!” – Elasticsearch

• That’s fine!

Page 50: Elasticsearch in Production

Dynamic Scripts

• Scoring

• Aggregations

• Updating

Page 51: Elasticsearch in Production

Dynamic Scripts

Runtime.getRuntime().exec(…)

Page 52: Elasticsearch in Production

Dynamic Scripts

Runtime.getRuntime().exec(…)

<script src="http://127.0.0.1:9200/_search?callback=capture&…

Page 53: Elasticsearch in Production

Security

• Disable dynamic scripts

• Mind index patterns

• Even then, don’t accept arbitrary requests
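
For the index pattern point, an application that sits in front of Elasticsearch can resolve index names against an allowlist before building the request URL. This is a minimal sketch under that assumption; the naming rule and `resolve_index` are illustrative, not an Elasticsearch feature.

```python
import re

# Plain lowercase names only: no wildcards, commas, slashes or leading "_".
_INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def resolve_index(requested, allowed):
    """Return the index name if it is well-formed and explicitly allowed."""
    if not _INDEX_NAME.fullmatch(requested):
        raise ValueError("malformed index name: %r" % requested)
    if requested not in allowed:
        raise PermissionError("index not allowed: %r" % requested)
    return requested


allowed = {"logs-2015", "products"}
print(resolve_index("products", allowed))  # products

for attempt in ("*", "_all", "logs-*", "products,secrets"):
    try:
        resolve_index(attempt, allowed)
    except ValueError as error:
        print("rejected:", error)
```

Rejecting patterns outright is stricter than necessary for some applications, but it avoids reasoning about what a wildcard or a comma-separated list might expand to.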

Page 54: Elasticsearch in Production

Client Concerns

Page 55: Elasticsearch in Production

Client Concerns

• Connection pools

• Idempotent requests

• Have sane syncing/indexing strategies
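
One way to make indexing requests idempotent is to derive the document ID from the source record, so that a retried or replayed request overwrites the same document instead of creating a duplicate. The sketch below simulates the index with a dictionary; `document_id` and the key format are assumptions.

```python
import hashlib


def document_id(source_system, natural_key):
    """Derive a stable document ID from where the record came from."""
    raw = "{}:{}".format(source_system, natural_key).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


fake_index = {}  # stands in for an index: id -> document


def index_record(source_system, natural_key, document):
    # Same input always maps to the same ID, so retries are harmless.
    fake_index[document_id(source_system, natural_key)] = document


index_record("orders-db", 1001, {"total": 250})
index_record("orders-db", 1001, {"total": 250})  # retry after a timeout
print(len(fake_index))  # 1
```

With auto-generated IDs, the retry above would have produced two documents.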

Page 56: Elasticsearch in Production
Page 57: Elasticsearch in Production

# BOOM !

Page 58: Elasticsearch in Production

Cluster changes

Page 59: Elasticsearch in Production

Cluster changes

• Make new nodes join existing cluster

• No rolling restarts

• Easy rollback if things go bad

Page 60: Elasticsearch in Production

v1.0.0 v1.0.1

Page 61: Elasticsearch in Production

Cluster changes

• Test first

• Mind the recover_* settings (such as gateway.recover_after_nodes)

Page 62: Elasticsearch in Production

Multi-Cluster Workflows

• Snapshot/Restore

• Operations across clusters

• Swap clusters!

• Works well with good syncing strategy

Page 63: Elasticsearch in Production

Misc

• Same JVM

• ulimits

• Unicast

• SSD? Use the noop I/O scheduler

Page 64: Elasticsearch in Production

?

@foundsays