Elasticsearch in Production (London version)
Post on 27-Aug-2014
Elasticsearch in Production
Alex Brasetvik alex@found.no @alexbrasetvik
Who?
• Co-founder of Found AS
• 8+ years in search, 3+ with Elasticsearch
• Herding hundreds of Elasticsearch clusters
Agenda
• Anti-patterns
• Memory / Resource Usage
• Distributed problems
• Security
• Client concerns
• Changing a cluster
found.no/foundation
Elasticsearch in Production
Elasticsearch as a NoSQL Database
Intro to Function Scoring
All About Analyzers
Securing your Elasticsearch Cluster
Snapshot / Restore
Circuit breakers
Document values
Aggregations
Distributed percolation
Suggesters
…
Anti-Patterns
Arbitrary Keys
• “Schema Free”
• One field per value
• Ever-growing cluster state
acls:
  1234: READ
  42: WRITE
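One way around the one-field-per-value problem is to move the variable part of the key into a value, so the mapping keeps a fixed set of field names no matter how many users exist. The sketch below is only an illustration: the field names user_id and permission are hypothetical, and in practice the resulting objects would probably need a nested mapping so each pair stays together.

```python
def acls_to_entries(acls):
    """Convert {user_id: permission} into a list of fixed-shape objects.

    The output always uses the same two field names ("user_id" and
    "permission", both hypothetical), so new users do not add new fields.
    """
    return [
        {"user_id": str(user_id), "permission": permission}
        for user_id, permission in sorted(acls.items(), key=lambda kv: str(kv[0]))
    ]


# The document body that would be indexed instead of the keyed version.
doc = {"acls": acls_to_entries({1234: "READ", 42: "WRITE"})}
```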
Heavy Updating
• Update = Delete + Reindex
• Be careful with counters
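Since every update is a delete plus a reindex, one way to soften counter-heavy workloads is to accumulate increments in the client and write each document once per flush. This is a minimal sketch under that assumption; the class name CounterBuffer and its methods are hypothetical, and how the drained totals get written back is left out.

```python
from collections import Counter


class CounterBuffer:
    """Accumulate counter increments in memory between flushes."""

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        # Many increments for the same document collapse into one total.
        self._pending[doc_id] += amount

    def drain(self):
        """Return the pending totals as a dict and reset the buffer."""
        pending, self._pending = dict(self._pending), Counter()
        return pending
```

Draining on a timer or after a fixed number of increments turns many small updates into one update per document, at the cost of counters lagging slightly behind.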
Slow queries
• WHERE foo ILIKE '%bar%'
• {"query_string": {"query": "foo:*bar*"}}
• Don’t ask for 3300 results :)
Arbitrary searches
query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user's query here]
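Rather than pasting a user-supplied query body into the filtered wrapper, an application can build the whole request itself from plain text input and cap the number of hits requested. The following is a sketch only: the function name, the MAX_SIZE cap, and the choice of a match query on _all are assumptions, not something the talk prescribes.

```python
MAX_SIZE = 100  # hypothetical upper bound on hits per request


def build_user_search(user_id, user_query_text, size=10):
    """Build a search body where the user only controls the search text."""
    return {
        # Clamp the requested size so nobody asks for thousands of hits.
        "size": max(0, min(int(size), MAX_SIZE)),
        "query": {
            "filtered": {
                # The filter is set by the application, never by the user.
                "filter": {"term": {"user_id": user_id}},
                # User input is treated as text, not as query DSL.
                "query": {"match": {"_all": user_query_text}},
            }
        },
    }
```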
Memory
• Field caches
• Filter caches
• Page caches
• Aggregations
• Index building
Page Cache
• Keeping index pages in memory
• Can’t have too much
• Outgrow: Gradual slowdown
Heap Space
• Memory used by Elasticsearch process
• Field / Filter caches
• Aggregations
Time Bomb
OutOfMemoryError
Woah there I ate all the memories
Your cluster may or may not work any more
OutOfMemory
• Growing too big
• Selecting too large a time span in Kibana
• Document ingestion peak
Preventing OOMs
• Have enough memory :-)
• Understand your search’s memory profile
• Bulk / Circuit breaker settings
• Monitoring
• Document values
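On the bulk side, one way to keep ingestion peaks from exhausting memory is to bound the size of each bulk request in the client. The sketch below splits documents into newline-delimited bulk bodies of roughly bounded size; the function name and the 5 MB default are hypothetical, and a sensible limit depends on the cluster.

```python
import json


def bulk_chunks(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Yield bulk request bodies, each roughly at most max_bytes long.

    A single document larger than max_bytes is still sent, alone.
    """
    lines, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        source = json.dumps(doc)
        # +2 accounts for the newline after each of the two lines.
        entry_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        if lines and size + entry_size > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines.extend([action, source])
        size += entry_size
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded string can be sent as the body of one bulk request, so the client controls the peak request size rather than the data.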
Marvel (/_stats)
"my_field": { "type": "string", "fielddata": { "format": "doc_values" } }
Document Values
• Rely on page cache
• Only caches doc values actually used
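When many fields need the same treatment, the mapping fragment shown on the slide can be generated instead of written by hand. This sketch reproduces only the slide's field definition; the function name is hypothetical, and any further requirements for a given Elasticsearch version should be checked against its documentation.

```python
import json


def doc_values_properties(field_names):
    """Build a "properties" fragment using the slide's definition per field."""
    return {
        name: {"type": "string", "fielddata": {"format": "doc_values"}}
        for name in field_names
    }


# Pretty-print the fragment for one field, matching the slide's example.
print(json.dumps(doc_values_properties(["my_field"]), indent=2))
```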
Sizing
• Test, don’t guess
• Start big, scale down
• Index, search, monitor
Glitch Meltdown
• Tie-breaker can be a cheap master-node
• Applies to data centers / availability zones too
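The tie-breaker point comes down to majority arithmetic: a master can only be elected safely while a majority of master-eligible nodes can see each other. The sketch below works through that arithmetic; the function names are hypothetical, and it assumes the quorum is set to a strict majority (in Elasticsearch 1.x this is what discovery.zen.minimum_master_nodes is typically set to).

```python
def majority(master_eligible_nodes):
    """Smallest node count that is a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def survives_loss_of_one_zone(nodes_per_zone):
    """True if a majority remains after losing any single zone."""
    total = sum(nodes_per_zone)
    quorum = majority(total)
    return all(total - lost >= quorum for lost in nodes_per_zone)
```

With two zones of one master-eligible node each, losing either zone leaves no majority; adding a third zone with a single cheap master-only node is enough to change that.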
Data-only nodes
Master-only nodes
Jepsen
• Kyle Kingsbury’s series on distributed systems
• Distributed systems are hard
• aphyr.com
Security
• “Not my job!” – Elasticsearch
• That’s fine!
Dynamic Scripts
• Scoring
• Aggregations
• Updating
Dynamic Scripts
Runtime.getRuntime().exec(…)
<script src="http://127.0.0.1:9200/_search?callback=capture&…
Security
• Disable dynamic scripts (On by default in ≤1.1)
• Mind index patterns
• Even then, don’t accept arbitrary requests
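Minding index patterns can be sketched as a strict check in the application layer before any index name reaches Elasticsearch. The naming scheme below (daily logs-YYYY.MM.DD indices) and the function name are assumptions for illustration; the point is to accept only names that match fully, which rules out wildcards, comma-separated lists and _all.

```python
import re

# Hypothetical naming scheme: one index per day, e.g. logs-2014.08.27
ALLOWED_INDEX = re.compile(r"logs-\d{4}\.\d{2}\.\d{2}")


def safe_index_name(requested):
    """Return the name unchanged if it matches the scheme, else raise."""
    if not isinstance(requested, str) or not ALLOWED_INDEX.fullmatch(requested):
        raise ValueError("index name not allowed: %r" % (requested,))
    return requested
```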
Client Concerns
• Connection pools
• Idempotent requests
• Have sane syncing/indexing strategies
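One common way to make indexing requests idempotent is to derive the document ID from the source record's own key, so a retried request overwrites the same document instead of creating a duplicate. This is a sketch under that assumption; the function names and the source/natural_key parameters are hypothetical.

```python
import hashlib


def document_id(source, natural_key):
    """Derive a stable ID from where a record came from and its key."""
    raw = "%s:%s" % (source, natural_key)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def index_request(index, doc_type, source, natural_key, doc):
    """Describe a PUT to an explicit ID; repeating it gives the same result."""
    path = "/%s/%s/%s" % (index, doc_type, document_id(source, natural_key))
    return ("PUT", path, doc)
```

Because the ID depends only on the record's origin and key, the client can safely retry after a timeout without knowing whether the first attempt succeeded.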
# BOOM !
Cluster changes
• Make new nodes join existing cluster
• No rolling restarts
• Easy rollback if things go bad
v1.0.0 v1.0.1
Cluster changes
• Test first
• Mind the recover_* settings (gateway.recover_after_*)
Multi-Cluster Workflows
• Snapshot/Restore
• Operations across clusters
• Swap clusters!
• Works well with good syncing strategy
• Rolling restarts: Risky, fast
• Grow and shrink: Less risky, copies lots of data
• Multiple clusters: Least risky, copies lots of data
Misc
• Same JVM version on every node
• ulimits
• Unicast
• Kernel-settings like IO-scheduler
?
@foundsays