Elasticsearch in Production
Alex Brasetvik (@alexbrasetvik)
Who?
• Co-founder of Found AS
• 8+ years of search, 3+ with Elasticsearch
• Herding hundreds of Elasticsearch clusters
Agenda
• Anti-patterns
• Memory / Resource Usage
• Distributed problems
• Security
• Client concerns
• Changing a cluster
found.no/foundation
• Elasticsearch in Production
• Elasticsearch as a NoSQL Database
• Intro to Function Scoring
• All About Analyzers
• Securing your Elasticsearch Cluster
• Snapshot / Restore
• Circuit breakers
• Document values
• Aggregations
• Distributed percolation
• Suggesters
• …
Anti-Patterns
Arbitrary Keys
• “Schema Free”
• One field per value
• Ever-growing cluster state
acls:
  1234: READ
  42: WRITE
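One way to avoid the one-field-per-value problem is to move the arbitrary key into a value, so the mapping only ever sees a fixed set of field names. A minimal sketch, assuming hypothetical field names `id` and `permission` (neither is from the slides):

```python
def acls_to_entries(acls):
    """Turn {"1234": "READ", "42": "WRITE"} into a list of fixed-field objects.

    The field names "id" and "permission" are illustrative assumptions.
    """
    return [
        {"id": str(principal_id), "permission": permission}
        for principal_id, permission in sorted(
            acls.items(), key=lambda item: str(item[0])
        )
    ]


# Two mapped fields in total, no matter how many ids exist.
doc = {"acls": acls_to_entries({"1234": "READ", "42": "WRITE"})}
```

Keeping each id paired with its own permission at query time may additionally require a nested mapping, since plain object arrays are flattened.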
Heavy Updating
• Update = Delete + Reindex
• Be careful with counters
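Since every update is a delete plus a reindex, incrementing a counter once per event rewrites the whole document each time. One possible mitigation, which is an assumption and not something the slides prescribe, is to accumulate increments in the application and send one update per document per flush. `CounterBuffer` is a hypothetical helper:

```python
from collections import Counter


class CounterBuffer:
    """Accumulates increments so each document is updated once per flush."""

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        # No request is sent here; the increment is only recorded in memory.
        self._pending[doc_id] += amount

    def flush(self):
        # Return the accumulated deltas and reset the buffer. The caller
        # would turn each entry into a single update request.
        batch = dict(self._pending)
        self._pending.clear()
        return batch


buffer = CounterBuffer()
buffer.increment("page-1")
buffer.increment("page-1")
buffer.increment("page-2", 3)
pending = buffer.flush()
```

The trade-off is that buffered increments are lost if the process dies before a flush.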
Slow queries
• WHERE foo ILIKE '%bar%'
• {"query_string": {"query": "foo:*bar*"}}
Arbitrary searches
query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user’s query here]
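The slide shows a user-supplied query dropped into a filtered query. A hedged alternative is to accept only plain text from the user and build the query body server-side. In this sketch the function name and the `body` field are assumptions; the `filtered` and `term` structure follows the slide:

```python
def build_user_search(user_id, user_text, field="body"):
    """Build a filtered query from plain text instead of a raw query body.

    The searched field name ("body") is a hypothetical default.
    """
    if not isinstance(user_text, str):
        # Reject dicts or lists so callers cannot smuggle in query DSL.
        raise TypeError("user_text must be a plain string")
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": {"match": {field: user_text}},
            }
        }
    }


search_body = build_user_search(42, "foo bar")
```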
Time Bomb
Memory
• Field caches
• Filter caches
• Page caches
• Aggregations
• Index building
Page Cache
• Keeping index pages in memory
• Can’t have too much
• Outgrow: Gradual slowdown
Heap Space
• Memory used by Elasticsearch process
• Field / Filter caches
• Aggregations
Time Bomb
OutOfMemoryError
Woah there I ate all the memories
Your cluster may or may not work any more
OutOfMemory
• Growing too big
• Selecting too big a timespan in Kibana
• Document ingestion peak
Preventing OOMs
• Have enough memory :-)
• Understand your search’s memory profile
• Bulk / Circuit breaker settings
• Monitoring
• Document values
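On the client side, one way to soften ingestion peaks is to cap the size of each bulk request. A minimal sketch, assuming a hypothetical helper `chunk_bulk_actions` and an arbitrary 5 MB default that should be tuned by testing:

```python
import json


def chunk_bulk_actions(docs, max_bytes=5 * 1024 * 1024):
    """Yield lists of JSON lines whose combined size stays under max_bytes.

    A single document larger than max_bytes is still yielded on its own.
    """
    batch, size = [], 0
    for doc in docs:
        line = json.dumps(doc)
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch


batches = list(chunk_bulk_actions(({"n": i} for i in range(5)), max_bytes=20))
```

Each yielded batch would then be sent as one bulk request, so the size of any single request stays bounded.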
Marvel ( /_stats )
Document Values
"my_field": { "type": "string", "fielddata": { "format": "doc_values" } }
Sizing
• Test, don’t guess
• Start big, scale down
• Index, search, monitor
Glitch Meltdown
• Tie-breaker can be a cheap master-node
• Applies to data centers / availability zones too
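The tie-breaker point comes down to majority arithmetic. A sketch, assuming the usual rule that a master can only be elected by a strict majority of master-eligible nodes (the value typically given to `discovery.zen.minimum_master_nodes`); the function names are illustrative:

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(total_master_eligible, reachable):
    """True if one side of a partition still holds a majority."""
    return reachable >= quorum(total_master_eligible)


# Two nodes (or two zones): losing either side leaves no majority.
two_node_survives = can_elect_master(2, 1)
# Adding a cheap third master-eligible node breaks the tie.
three_node_survives = can_elect_master(3, 2)
```

The same reasoning applies when the units are data centers or availability zones rather than single nodes.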
Data-only nodes
Master-only nodes
Jepsen
• Kyle Kingsbury’s series on distributed systems
• Distributed systems are hard
• aphyr.com
Security
• “Not my job!” – Elasticsearch
• That’s fine!
Dynamic Scripts
• Scoring
• Aggregations
• Updating
Dynamic Scripts
Runtime.getRuntime().exec(…)
<script src="http://127.0.0.1:9200/_search?callback=capture&…
Security
• Disable dynamic scripts
• Mind index patterns
• Even then, don’t accept arbitrary requests
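For "mind index patterns", one possible guard is to check any index name that originates from a request before it reaches the cluster. A minimal sketch, assuming a hypothetical `checked_index` helper and a per-caller allowlist (both assumptions, not part of Elasticsearch):

```python
import re

# Plain names only: no wildcards, commas, or leading underscore (e.g. _all).
_SAFE_INDEX = re.compile(r"[a-z0-9][a-z0-9_-]*")


def checked_index(requested, allowed):
    """Return the index name if it is plain and explicitly allowed."""
    if not _SAFE_INDEX.fullmatch(requested):
        raise ValueError("index name contains a pattern or reserved form")
    if requested not in allowed:
        raise PermissionError("index is not allowed for this caller")
    return requested


allowed_indices = {"logs-2014", "users"}
index = checked_index("logs-2014", allowed_indices)
```

This only covers the index name; the request body still needs the same care, as the last bullet says.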
Client Concerns
• Connection pools
• Idempotent requests
• Have sane syncing/indexing strategies
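One way to make indexing requests idempotent is to derive the document ID from the source record, so that a retried request overwrites the same document instead of creating a duplicate. A sketch, assuming hypothetical inputs `source_system` and `primary_key`:

```python
import hashlib
import json


def deterministic_id(source_system, primary_key):
    """Derive a stable document ID from where the record came from.

    The same inputs always give the same ID, so a retry after a timeout
    indexes to the same document rather than adding a new one.
    """
    raw = json.dumps([source_system, primary_key])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


first_attempt = deterministic_id("orders-db", 1001)
retry = deterministic_id("orders-db", 1001)
other = deterministic_id("orders-db", 1002)
```

The resulting value would be passed as the explicit document ID when indexing, instead of letting Elasticsearch generate one.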
# BOOM !
Cluster changes
• Make new nodes join existing cluster
• No rolling restarts
• Easy rollback if things go bad
v1.0.0 v1.0.1
Cluster changes
• Test first
• Mind the recover_* settings
Multi-Cluster Workflows
• Snapshot/Restore
• Operations across clusters
• Swap clusters!
• Works well with good syncing strategy
Misc
• Same JVM
• ulimits
• Unicast
• SSD? Use the noop I/O scheduler
Questions?
@foundsays