elasticsearch data analyses
![Page 1: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/1.jpg)
Elasticsearch
Elasticsearch Timed Data Analyses
By Alaa Elhadba (@aelhadba)
![Page 2: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/2.jpg)
Table of Contents
- Hot-Cold Architecture
- Data High Availability
- Data design at large scale
- Search Execution
- Time framed indices
- Aggregations
![Page 3: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/3.jpg)
Hot-Cold Architecture
![Page 4: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/4.jpg)
Hot-Cold Architecture
Hot Data Nodes
- Perform indexing
- Hold the most recent data
- Use SSD storage, since writing is an IO-intensive operation
Cold Data Nodes
- Handle read-only operations
- Can use large spinning disks
![Page 5: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/5.jpg)
Hot-Cold Configuration
Hot node, in elasticsearch.yml:
node.box_type: hot
Cold node, in elasticsearch.yml:
node.box_type: cold
[Diagram: a hot node and a cold node, holding Shard 1 and Shard 2]
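The node attribute only labels a node; an index is tied to a tier through allocation filtering, with the index setting `index.routing.allocation.require.box_type`. The Python sketch below (standard library only) builds the settings body you would send with `PUT <index>/_settings` and imitates the filter locally to show which nodes may hold the index. The node names and the `eligible_nodes` helper are hypothetical, and on Elasticsearch 5.0 and later the node setting is written `node.attr.box_type` instead of `node.box_type`.

```python
import json

# Hypothetical cluster: node name -> value of its box_type attribute
NODES = {
    "node-hot-1": "hot",
    "node-hot-2": "hot",
    "node-cold-1": "cold",
}

def box_type_settings(box_type: str) -> dict:
    """Settings body for PUT <index>/_settings that pins an index to one tier."""
    return {"index.routing.allocation.require.box_type": box_type}

def eligible_nodes(nodes: dict, settings: dict) -> list:
    """Local imitation of 'require' filtering: keep nodes whose attribute matches."""
    wanted = settings["index.routing.allocation.require.box_type"]
    return sorted(name for name, box in nodes.items() if box == wanted)

# New indices live on the hot tier; an ageing index is moved by changing one setting
print(json.dumps(box_type_settings("cold")))
print(eligible_nodes(NODES, box_type_settings("hot")))
```

Moving an index from hot to cold is then a single settings update, and Elasticsearch relocates its shards in the background.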
![Page 6: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/6.jpg)
Data Availability
[Diagram: cluster nodes spread across Availability Zone 1 and Availability Zone 2]
![Page 8: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/8.jpg)
Data Availability
[Diagram: cluster nodes spread across Availability Zone 1 and Availability Zone 2]
What happens on an availability zone or rack failure? Shard Allocation Awareness
![Page 9: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/9.jpg)
Shard Allocation Awareness
[Diagram: nodes in Availability Zone 1 and Availability Zone 2]
![Page 10: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/10.jpg)
Shard Allocation Awareness
[Diagram: the copies of shards 1, 2 and 3 distributed across Availability Zone 1 and Availability Zone 2]
![Page 15: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/15.jpg)
Shard Allocation Awareness
cluster.routing.allocation.awareness.attributes: rack_1
● Data replication is spanned across AZs
● No two copies of same shard on the same rack
● Elasticsearch is fully aware of shard distribution
● Awareness can be set based cluster or index
● Elasticsearch will prefer using local shards
● Always balance your nodes across AZs
● Routing Allocation Awareness can be updated
on a live cluster
● Use Forced Awareness to avoid the extra load
of reallocation of missing shards
cluster.routing.allocation.awareness.attributes: rack_2
Availability Zone 1 Availability Zone 2
Make sure you can handle the load with less nodes!
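To make "no two copies of the same shard on the same rack" concrete, here is a toy Python model of the placement awareness aims for with one primary and one replica per shard across two racks. It is not Elasticsearch's allocator, which also balances shard counts, disk usage and other constraints; `place_copies` is a hypothetical helper.

```python
from collections import defaultdict

def place_copies(num_shards: int, copies_per_shard: int, zones: list) -> dict:
    """Toy awareness model: spread the copies of each shard over distinct zones.

    Returns {zone: [(shard, copy), ...]}; copy 0 stands for the primary.
    Only the simple case of at most one copy per zone is modelled.
    """
    if copies_per_shard > len(zones):
        raise ValueError("toy model: needs at least one zone per shard copy")
    placement = defaultdict(list)
    for shard in range(num_shards):
        for copy in range(copies_per_shard):
            # Offset by the shard number so primaries alternate between zones
            zone = zones[(shard + copy) % len(zones)]
            placement[zone].append((shard, copy))
    return dict(placement)

placement = place_copies(num_shards=3, copies_per_shard=2, zones=["rack_1", "rack_2"])
for zone, copies in sorted(placement.items()):
    print(zone, copies)
```

Losing either rack still leaves one full copy of every shard, which is the point of spreading copies by an attribute.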
![Page 16: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/16.jpg)
Forced Awareness
● In this two-zone setup, forced awareness never allocates copies of the same shard to the same zone, even after a whole zone is lost: the missing copies stay unassigned until the zone comes back.
● This avoids the extra load of reallocating unassigned shards after a rack failure.
● It leaves no single point of failure in your system.
● Make sure you can handle the load with fewer nodes.
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
cluster.routing.allocation.awareness.attributes: zone
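The difference between plain and forced awareness shows up when a zone disappears. The toy Python model below, with a hypothetical `lose_zone` helper, compares the two: without forcing, the lost copies are rebuilt in the surviving zone (recovery traffic and extra load there); with forcing, they wait unassigned. It deliberately ignores replica promotion and the rule that two copies of a shard never share a node.

```python
def lose_zone(placement: dict, lost_zone: str, forced: bool):
    """Toy model of what happens to shard copies when one zone disappears.

    placement maps zone -> list of (shard, copy) pairs.
    Returns (new_placement, unassigned_copies).
    """
    survivors = {zone: list(copies) for zone, copies in placement.items() if zone != lost_zone}
    missing = list(placement.get(lost_zone, []))
    if forced or not survivors:
        # Forced awareness: a surviving zone never receives a second copy,
        # so the lost copies wait, unassigned, for their zone to return.
        return survivors, missing
    # Plain awareness: the lost copies are rebuilt in the surviving zone(s),
    # which costs recovery traffic and needs spare capacity there.
    targets = sorted(survivors)
    for i, copy in enumerate(missing):
        survivors[targets[i % len(targets)]].append(copy)
    return survivors, []

before = {"zone1": [(0, 0), (1, 1)], "zone2": [(0, 1), (1, 0)]}
print(lose_zone(before, "zone1", forced=False))
print(lose_zone(before, "zone1", forced=True))
```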
![Page 17: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/17.jpg)
Data design at large scale
![Page 18: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/18.jpg)
Searching
[Diagram: a query is sent to Shards 1 to 4, spread over four nodes, and their answers are merged into one result]
![Page 20: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/20.jpg)
Searching
How can we avoid querying every shard? Routing
I know my shards!
![Page 21: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/21.jpg)
Routing
PUT my_index/my_type/my_id?routing=shard1
GET my_index/_search?routing=shard1,shard2
● Avoid calling all shards
● Dedicated shards per purpose
● Talk to one dedicated shard
● Less network traffic
● Better performance
● Handle sharding on your own
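Despite its name, `shard1` above is a routing key that gets hashed, not the name of a shard: Elasticsearch picks the shard as `hash(routing) % number_of_primary_shards`, using the document `_id` when no routing is given. The Python sketch below shows the idea; it uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster, and both helpers are hypothetical.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards (crc32 as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def shards_to_search(routings: list, num_primary_shards: int) -> list:
    """Shards that a search with ?routing=a,b touches, instead of all of them."""
    return sorted({shard_for(r, num_primary_shards) for r in routings})

print(shards_to_search(["shard1", "shard2"], num_primary_shards=5))
```

The same formula explains why the number of primary shards is fixed when an index is created: changing the modulus would send existing routing keys to different shards.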
![Page 22: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/22.jpg)
Routing
But once in, never out:
● Routing must always be specified: every get, update and delete of a routed document needs the same routing value
![Page 23: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/23.jpg)
Routing
[Diagram: daily indices 21.06.2016, 20.06.2016 and 19.06.2016, each split into shards 1, 2, 3]
![Page 24: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/24.jpg)
Routing
I MUST KNOW EVERYTHING!
![Page 25: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/25.jpg)
Talking to data
![Page 26: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/26.jpg)
Aliasing
[Diagram: the aliases today, yesterday and 3_days_ago point to the daily indices 21.06.2016, 20.06.2016 and 19.06.2016]
![Page 27: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/27.jpg)
Aliasing
[Diagram: a new daily index, 22.06.2016, is created next to 21.06.2016, 20.06.2016 and 19.06.2016; aliases today, yesterday and 3_days_ago]
![Page 28: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/28.jpg)
Aliasing
[Diagram: 19.06.2016 is gone; the indices 22.06.2016, 21.06.2016 and 20.06.2016 sit behind the aliases today, yesterday and 3_days_ago]
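Moving the aliases when a new daily index appears can be done in one atomic call to the `_aliases` API, which takes a list of `add` and `remove` actions. The Python sketch below only builds that request body; the `dd.mm.yyyy` index names follow the diagrams, and `rotate_aliases` is a hypothetical helper.

```python
from datetime import date, timedelta

# Alias names from the slides, ordered by how many days they lag behind the newest index
ALIASES = ["today", "yesterday", "3_days_ago"]

def index_name(day: date) -> str:
    # Daily index names in the dd.mm.yyyy style of the diagrams, e.g. "22.06.2016"
    return day.strftime("%d.%m.%Y")

def rotate_aliases(new_day: date) -> dict:
    """Body for POST /_aliases that shifts every alias forward by one day, atomically."""
    actions = []
    for lag, alias in enumerate(ALIASES):
        old_index = index_name(new_day - timedelta(days=lag + 1))
        new_index = index_name(new_day - timedelta(days=lag))
        actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}

body = rotate_aliases(date(2016, 6, 22))
```

Because all actions are applied together, the application never sees a moment where `today` points nowhere.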
![Page 29: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/29.jpg)
Aliasing
I MUST KNOW! It’s better performance.
![Page 30: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/30.jpg)
Aliasing
It’s a Data Problem!
![Page 31: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/31.jpg)
Aliasing + Routing
[Diagram: the daily indices 22.06.2016, 21.06.2016 and 20.06.2016 behind the aliases today, yesterday and 3_days_ago, plus the aliases today_returns and recent_returns]
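An alias can carry both a `routing` value and a `filter`, so that searching it touches only one shard slice and only the matching documents. The Python sketch below builds such `_aliases` actions for `today_returns` and `recent_returns`; the `category` field and the `"returns"` routing value are hypothetical.

```python
def routed_alias(index: str, alias: str, routing: str, category: str) -> dict:
    """One 'add' action for POST /_aliases: an alias that both routes and filters."""
    return {
        "add": {
            "index": index,
            "alias": alias,
            "routing": routing,                          # applied when indexing and searching
            "filter": {"term": {"category": category}},  # hypothetical field
        }
    }

body = {
    "actions": [
        routed_alias("22.06.2016", "today_returns", routing="returns", category="returns"),
        routed_alias("22.06.2016", "recent_returns", routing="returns", category="returns"),
        routed_alias("21.06.2016", "recent_returns", routing="returns", category="returns"),
        routed_alias("20.06.2016", "recent_returns", routing="returns", category="returns"),
    ]
}
```

Searching `today_returns` then behaves like searching `22.06.2016` with `?routing=returns` and the term filter applied, while the application only knows the alias name.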
![Page 32: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/32.jpg)
Aliasing + Routing + Search
[Diagram labels: Index, Index Shard, Alias, Shard slice]
![Page 33: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/33.jpg)
Search Execution Preference
By default, Elasticsearch spreads search requests over the copies of each shard (primaries and replicas) in a round-robin manner, so every copy is queried about equally often. The preference parameter changes this:
- _primary: query only primary shards (the copies on the write path)
- _primary_first: query primaries first, if available
- _replica: query replica shards only
- _replica_first: query replicas first, if available
- _local: prefer shard copies allocated on the current node, if possible
- _only_node:node_id: query a specific node
- _only_nodes:*: query only a set of nodes (the value is a node specification)
- _prefer_node:node_id: prefer a given node
- _shards:1,3: query specific shards; can be combined with another preference, e.g. _shards:1,3;_local
GET _search?preference=_replica
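Preference values such as `_shards:1,3;_local` contain `:`, `,` and `;`, so it is safest to URL-encode them when building the request yourself. A minimal Python sketch with a hypothetical `search_path` helper:

```python
from urllib.parse import urlencode

def search_path(index: str, preference: str) -> str:
    """Path and query string for a search with an explicit preference."""
    return f"/{index}/_search?" + urlencode({"preference": preference})

print(search_path("my_index", "_replica"))
print(search_path("my_index", "_shards:1,3;_local"))
```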
![Page 34: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/34.jpg)
Time Framed Indices
![Page 35: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/35.jpg)
Data Flow
[Diagram, over time: HOT → Cold → Closed, with Backed_up and Trashed as the final stages]
![Page 36: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/36.jpg)
Closing/Opening Index
➔ Closing an index
◆ Removes all of its shard allocations from the cluster
◆ But keeps the index data on disk
◆ Reduces the resources used on the cluster
◆ Consumes only disk space
➔ Opening an index
◆ Reopens a closed index
◆ Not a millisecond operation: opening an index can take from a few seconds to a couple of minutes
◆ Flushing before closing reduces the opening time
![Page 37: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/37.jpg)
Index Templates
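- Order lets a template override other templates that match the same index (higher order wins)
- Settings, such as the number of shards, can be changed at any time and apply to newly created indices, so you can scale
- Aliases can be defined in the template so they are attached when the index is created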
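When several templates match a new index, they are merged with lower `order` applied first and higher `order` overriding. The Python sketch below imitates that merge for settings and aliases only (real templates also merge mappings). The templates are hypothetical and use the 2.x layout, where `"template"` holds the index pattern (Elasticsearch 6.0 and later call it `"index_patterns"`); `fnmatchcase` stands in for Elasticsearch's wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical templates in the 2.x layout
TEMPLATES = [
    {"template": "logs-*", "order": 0,
     "settings": {"number_of_shards": 5, "number_of_replicas": 1}},
    {"template": "logs-*", "order": 1,
     "settings": {"number_of_shards": 3},
     "aliases": {"recent_logs": {}}},
]

def settings_for_new_index(index: str, templates: list) -> dict:
    """Merge every matching template: lower order first, higher order overrides."""
    merged = {"settings": {}, "aliases": {}}
    matching = [t for t in templates if fnmatchcase(index, t["template"])]
    for tpl in sorted(matching, key=lambda t: t.get("order", 0)):
        merged["settings"].update(tpl.get("settings", {}))
        merged["aliases"].update(tpl.get("aliases", {}))
    return merged

print(settings_for_new_index("logs-22.06.2016", TEMPLATES))
```

Scaling up is then a matter of raising `number_of_shards` in the higher-order template: tomorrow's index picks it up, existing indices are untouched.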
![Page 38: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/38.jpg)
Index Templates
![Page 39: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/39.jpg)
Time framed indices lifecycle
1. Use index templates to generate mappings for new indices
2. Use aliases to decouple your application from the data logic
3. Use hot nodes for fresh data
4. Move old data to cold nodes
5. Close old indices before deleting them
6. Change your time frame at any point to scale (monthly, weekly, ...)
7. Use routing if you have too many shards in a big cluster
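The lifecycle above can be driven by a scheduled job that looks at each daily index's age. The Python sketch below decides a phase and lists the REST calls for it: the tier settings, `_flush`, `_close`, a snapshot and `DELETE` are real endpoints, while the day thresholds and the `my_backup` snapshot repository are hypothetical. The snapshot is taken before closing, since a closed index cannot be snapshotted.

```python
from datetime import date

# Hypothetical thresholds, in days since the index's date
HOT_DAYS, COLD_DAYS, CLOSE_DAYS = 1, 7, 30

def phase_for(index_day: date, today: date) -> str:
    age = (today - index_day).days
    if age < HOT_DAYS:
        return "hot"
    if age < COLD_DAYS:
        return "cold"
    if age < CLOSE_DAYS:
        return "closed"
    return "delete"

def requests_for(index: str, phase: str) -> list:
    """(method, path, body) tuples for entering a phase.

    A real job would remember which transitions already ran for each index.
    """
    if phase in ("hot", "cold"):
        return [("PUT", f"/{index}/_settings",
                 {"index.routing.allocation.require.box_type": phase})]
    if phase == "closed":
        return [
            ("POST", f"/{index}/_flush", None),  # faster reopening later
            ("PUT", f"/_snapshot/my_backup/{index}?wait_for_completion=true",
             {"indices": index}),                # back up while still open
            ("POST", f"/{index}/_close", None),
        ]
    return [("DELETE", f"/{index}", None)]

print(phase_for(date(2016, 6, 20), date(2016, 6, 22)))
```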
![Page 41: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/41.jpg)
Aggregations
![Page 42: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/42.jpg)
Aggregation Types
- Buckets
- Metrics
- Pipeline
![Page 43: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/43.jpg)
Nested Bucket Aggregations
![Page 44: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/44.jpg)
Aggregation Query
![Page 45: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/45.jpg)
Aggregation Query
- Filter first to fetch only the relevant documents (better caching)
- First segmentation
- Nested segmentation
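A request with that shape, a cacheable filter followed by a bucket aggregation with a nested one, could look like the Python dict below, alongside a plain-Python version of what it computes. The field names `date`, `country` and `price` are hypothetical, and `"interval": "month"` is the 2.x spelling (newer versions use `calendar_interval`).

```python
from collections import defaultdict

def nested_agg_body(from_date: str) -> dict:
    """Filter first, then bucket by country, then by month, then sum revenue."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"range": {"date": {"gte": from_date}}}]}},
        "aggs": {
            "by_country": {                      # first segmentation
                "terms": {"field": "country"},
                "aggs": {
                    "per_month": {               # nested segmentation
                        "date_histogram": {"field": "date", "interval": "month"},
                        "aggs": {"revenue": {"sum": {"field": "price"}}},
                    }
                },
            }
        },
    }

def nested_buckets_locally(docs: list, from_date: str) -> dict:
    """The same computation in plain Python, on ISO-dated dicts."""
    result = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        if doc["date"] >= from_date:             # the filter
            month = doc["date"][:7]              # "YYYY-MM"
            result[doc["country"]][month] += doc["price"]
    return {country: dict(months) for country, months in result.items()}
```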
![Page 46: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/46.jpg)
Doc Values
- Why do we need them?
  - Sorting, aggregations, some scripting
- Doc values
  - Columnar data structure built on disk
  - Created at indexing time, stored as part of the segment
  - Read like other pieces of the Lucene index
  - Don't take up JVM heap space
  - Use the file system cache
  - Default for not_analyzed string and numeric fields in 2.0+
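Why a second structure? The inverted index answers "which documents contain this term?", but sorting and aggregations ask the reverse: "what is the value of this document?". The toy Python model below contrasts the two for a hypothetical `country` field; it illustrates the access pattern only and says nothing about Lucene's on-disk format.

```python
from collections import Counter, defaultdict

# Hypothetical "country" values of four documents, indexed by doc id
countries = ["DE", "FR", "DE", "US"]

# Inverted index: term -> doc ids (answers search: "which docs contain DE?")
inverted = defaultdict(list)
for doc_id, term in enumerate(countries):
    inverted[term].append(doc_id)

# Doc values: doc id -> value, one column per field
# (answers sorting and aggregations: "what is the value of doc N?")
doc_values = list(countries)

def terms_agg(matching_ids: list, column: list) -> Counter:
    # For every hit, look up its value by doc id: a direct column access
    return Counter(column[doc_id] for doc_id in matching_ids)
```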
![Page 47: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/47.jpg)
Raw Fields
- Use customer_name.raw for aggregations
- Use customer_name for search
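The `.raw` sub-field is a multi-field: the same value indexed twice, once analyzed for search and once as-is for aggregations. The Python sketch below shows a 2.x-style mapping (on 5.0 and later this would be `text` with a `keyword` sub-field) and a rough imitation of why a terms aggregation on the analyzed field would bucket individual tokens instead of whole names. The lowercase-and-split function only approximates the standard analyzer, and the names are hypothetical.

```python
from collections import Counter

# 2.x-style mapping: analyzed string plus a not_analyzed "raw" sub-field
MAPPING = {
    "properties": {
        "customer_name": {
            "type": "string",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    }
}

SEARCH = {"query": {"match": {"customer_name": "jane"}}}
AGG = {"size": 0, "aggs": {"customers": {"terms": {"field": "customer_name.raw"}}}}

def analyzed_terms(values: list) -> Counter:
    # Rough stand-in for the standard analyzer: lowercase and split on whitespace
    return Counter(token for v in values for token in v.lower().split())

def raw_terms(values: list) -> Counter:
    # not_analyzed: each whole value is one term
    return Counter(values)

names = ["Jane Doe", "Jane Smith"]
print(analyzed_terms(names))
print(raw_terms(names))
```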
![Page 49: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/49.jpg)
Metrics Aggregations
- Avg Aggregation
- Cardinality Aggregation
- Extended Stats Aggregation
- Max Aggregation
- Min Aggregation
- Percentiles Aggregation
- Percentile Ranks Aggregation
- Scripted Metric Aggregation
- Stats Aggregation
- Sum Aggregation
- Top hits Aggregation
- Value Count Aggregation
![Page 50: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/50.jpg)
Extended Stats Aggregation
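The extended stats aggregation returns count, min, max, avg, sum, sum of squares, variance, standard deviation and standard deviation bounds (avg ± sigma × std, sigma defaulting to 2). A plain-Python sketch of the same fields, assuming a non-empty list and the population variance:

```python
import math

def extended_stats(values: list, sigma: float = 2.0) -> dict:
    """Same fields as Elasticsearch's extended_stats; assumes a non-empty list."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance; clamp tiny negative rounding errors to zero
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std = math.sqrt(variance)
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```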
![Page 51: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/51.jpg)
Aggregation Search
[Diagram: the aggregation query is sent to Shards 1 to 4, spread over four nodes, and their partial results are merged into one result]
![Page 52: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/52.jpg)
Scripted Metric Aggregation
- init_script: executed first, before any documents are collected; allows initialization of variables.
- map_script: executed once per document collected.
- combine_script: executed once on each shard after document collection is complete.
- reduce_script: executed once on the coordinating node after all shards have returned their results.
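The four phases can be simulated in plain Python to show where each one runs. The sketch below is similar in spirit to the profit example in the Elasticsearch reference (sales count positive, costs negative); the Python functions stand in for the real scripts, and the two "shards" are just lists of hypothetical documents.

```python
def init_script() -> dict:
    # Runs first on each shard: set up the state
    return {"transactions": []}

def map_script(state: dict, doc: dict) -> None:
    # Runs once per collected document
    amount = doc["amount"]
    state["transactions"].append(amount if doc["type"] == "sale" else -amount)

def combine_script(state: dict) -> float:
    # Runs once per shard after collection: shrink the state to one number
    return sum(state["transactions"])

def reduce_script(shard_results: list) -> float:
    # Runs once on the coordinating node with one result per shard
    return sum(shard_results)

def scripted_metric(shards: list) -> float:
    shard_results = []
    for docs in shards:  # each shard works independently
        state = init_script()
        for doc in docs:
            map_script(state, doc)
        shard_results.append(combine_script(state))
    return reduce_script(shard_results)

shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}, {"type": "cost", "amount": 20}],
    [{"type": "sale", "amount": 130}, {"type": "cost", "amount": 30}],
]
print(scripted_metric(shards))  # profit across both shards
```

Keeping the combine step small matters: it is what each shard sends over the network to the coordinating node.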
![Page 53: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/53.jpg)
Buckets Aggregations
- Children Aggregation
- Date Histogram Aggregation
- Date Range Aggregation
- Filter Aggregation
- Filters Aggregation
- Global Aggregation
- Histogram Aggregation
- Missing Aggregation
- Range Aggregation
- Reverse nested Aggregation
- Sampler Aggregation
- Significant Terms Aggregation
- Terms Aggregation
![Page 54: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/54.jpg)
Date Histogram Aggregation
![Page 55: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/55.jpg)
Date Range Aggregation
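Don’t forget!
Round your dates (e.g. now-7d/d), so that repeated requests resolve to the same boundaries and can make better use of the cache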
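Date math like `now-7d/d` means "seven days ago, rounded down to the start of that day", so every request made during the same day resolves to the same boundary. The Python sketch below approximates that rounding (UTC assumed) next to a `date_range` request body; the `date` field name and the helper are hypothetical.

```python
from datetime import datetime, timedelta

def now_minus_days_rounded(now: datetime, days: int) -> datetime:
    """Approximation of 'now-<days>d/d': subtract, then round down to the day."""
    shifted = now - timedelta(days=days)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)

DATE_RANGE_AGG = {
    "size": 0,
    "aggs": {
        "last_week": {
            "date_range": {
                "field": "date",  # hypothetical field name
                "ranges": [{"to": "now-7d/d"}, {"from": "now-7d/d"}],
            }
        }
    },
}

morning = now_minus_days_rounded(datetime(2016, 6, 22, 9, 15), 7)
evening = now_minus_days_rounded(datetime(2016, 6, 22, 17, 40), 7)
print(morning == evening)  # same boundary all day long
```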
![Page 56: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/56.jpg)
Missing Aggregations
![Page 57: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/57.jpg)
Range Aggregation
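In a range aggregation each range includes its `from` value and excludes its `to` value, and an open end is written as `*` in the bucket key. A plain-Python sketch of that counting, with a hypothetical `range_agg` helper:

```python
def range_agg(values: list, ranges: list) -> list:
    """Count values per range: 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": count})
    return buckets

prices = [10, 100, 150, 200, 250]
print(range_agg(prices, [{"to": 100}, {"from": 100, "to": 200}, {"from": 200}]))
```

A value of exactly 100 lands in the second bucket, not the first, which is easy to get wrong when defining adjacent ranges.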
![Page 58: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/58.jpg)
Histogram Aggregation
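The histogram aggregation assigns each value to the bucket `floor((value - offset) / interval) * interval + offset`. The plain-Python sketch below applies that formula and returns only non-empty buckets; in Elasticsearch, empty buckets are controlled by `min_doc_count` and `extended_bounds`, which this sketch does not model.

```python
import math
from collections import Counter

def histogram(values: list, interval: float, offset: float = 0) -> list:
    """Bucket key = floor((value - offset) / interval) * interval + offset."""
    counts = Counter(math.floor((v - offset) / interval) * interval + offset for v in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]

print(histogram([1, 7, 12, 49, 50], interval=10))
```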
![Page 59: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/59.jpg)
Pipeline Aggregations
Pipeline
![Page 60: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/60.jpg)
Pipeline Aggregations
Parent
- Takes the output of its parent aggregation and computes new buckets, or new aggregations added to the existing buckets.
Sibling
- Takes the output of a sibling aggregation and computes a new aggregation at the same level as that sibling.
![Page 61: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/61.jpg)
Sibling Pipeline Aggregations
- min_bucket
- max_bucket
- sum_bucket
- avg_bucket
- stats_bucket
- extended_stats_bucket
- percentiles_bucket
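A sibling pipeline reads one metric from every bucket of a sibling aggregation and produces a single new value at that level. The plain-Python sketch below mimics `max_bucket`, `sum_bucket` and `avg_bucket` over hypothetical monthly sales buckets; buckets without a value are skipped, like the default `skip` gap policy.

```python
def _present(buckets: list, metric: str) -> list:
    # Skip buckets where the metric is missing (gap_policy "skip")
    return [b for b in buckets if b.get(metric) is not None]

def max_bucket(buckets: list, metric: str) -> dict:
    present = _present(buckets, metric)
    best = max(b[metric] for b in present)
    return {"value": best, "keys": [b["key"] for b in present if b[metric] == best]}

def sum_bucket(buckets: list, metric: str) -> dict:
    return {"value": sum(b[metric] for b in _present(buckets, metric))}

def avg_bucket(buckets: list, metric: str) -> dict:
    present = _present(buckets, metric)
    return {"value": sum(b[metric] for b in present) / len(present)}

months = [
    {"key": "2016-04", "sales": 550},
    {"key": "2016-05", "sales": 60},
    {"key": "2016-06", "sales": 375},
]
print(max_bucket(months, "sales"))
```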
![Page 62: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/62.jpg)
Average Aggregation
![Page 63: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/63.jpg)
Parent Pipeline Aggregation
- moving_avg
- derivative
- cumulative_sum
- bucket_script
- bucket_selector
- serial_diff
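Parent pipelines walk the ordered buckets of a histogram and add a value to each one. The plain-Python sketch below reproduces `derivative`, `cumulative_sum` and `serial_diff` on a list of monthly values (hypothetical data): the first bucket has no derivative, and `serial_diff` leaves the first `lag` buckets empty.

```python
from itertools import accumulate

def derivative(values: list) -> list:
    """Difference to the previous bucket; the first bucket has none (None)."""
    if not values:
        return []
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]

def cumulative_sum(values: list) -> list:
    """Running total over the buckets."""
    return list(accumulate(values))

def serial_diff(values: list, lag: int = 1) -> list:
    """value[i] - value[i - lag]; the first `lag` buckets have no value."""
    head = [None] * min(lag, len(values))
    return head + [values[i] - values[i - lag] for i in range(lag, len(values))]

monthly_sales = [550, 60, 375]
print(derivative(monthly_sales))
print(cumulative_sum(monthly_sales))
```

With a lag of 1, `serial_diff` is the same as the first derivative; larger lags (for example 7 on daily data) compare each bucket with the same point in the previous period.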
![Page 64: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/64.jpg)
Cumulative Sum Aggregation
![Page 65: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/65.jpg)
Derivative Aggregation
![Page 66: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/66.jpg)
Moving Average Aggregation
![Page 67: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/67.jpg)
Moving Average Aggregation
![Page 68: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/68.jpg)
Moving Average Aggregation
Prediction
![Page 69: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/69.jpg)
Bucket Selector Aggregation
![Page 70: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/70.jpg)
Bucket Script Aggregation
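`bucket_script` computes a new value per bucket from that bucket's own metrics, and `bucket_selector` drops the buckets whose script returns false. In Elasticsearch both take a script plus a `buckets_path` mapping; in the plain-Python sketch below, lambdas stand in for the scripts and the monthly buckets are hypothetical.

```python
def bucket_script(buckets: list, name: str, script) -> list:
    """Return copies of the buckets with a new value computed per bucket."""
    return [dict(b, **{name: script(b)}) for b in buckets]

def bucket_selector(buckets: list, predicate) -> list:
    """Keep only the buckets for which the predicate is true."""
    return [b for b in buckets if predicate(b)]

months = [
    {"key": "2016-04", "total_sales": 550, "t_shirt_sales": 200},
    {"key": "2016-05", "total_sales": 60, "t_shirt_sales": 10},
    {"key": "2016-06", "total_sales": 375, "t_shirt_sales": 175},
]

with_pct = bucket_script(
    months, "t_shirt_percentage",
    lambda b: b["t_shirt_sales"] / b["total_sales"] * 100,
)
big_months = bucket_selector(with_pct, lambda b: b["total_sales"] > 200)
print([(b["key"], round(b["t_shirt_percentage"], 1)) for b in big_months])
```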
![Page 71: Elasticsearch Data Analyses](https://reader034.vdocuments.us/reader034/viewer/2022052206/5877cb821a28ab39588b6b23/html5/thumbnails/71.jpg)
The End