![Page 1: Munich, 9th August 2018 - PromCon · Prometheus 2.X Reliable operational model Powerful query language Scraping capabilities beyond the casual usage Local metric storage](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/1.jpg)
Global, durable Prometheus monitoring
Bartek Plotka (bwplotka, Bplotka)
Fabian Reinartz (fabxc)
Munich, 9th August 2018
![Page 2](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/2.jpg)
Prometheus 2.X
● Reliable operational model
● Powerful query language
● Scraping capabilities beyond casual use
● Local metric storage
![Page 3](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/3.jpg)
Prometheus at Scale
[Diagram: a separate Prometheus in each of Cluster 1, Cluster 2, ..., Cluster n, Cluster n+1.]
![Page 4](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/4.jpg)
Problem: Global View
[Diagram: one Prometheus per cluster (Cluster 1, Cluster 2, ..., Cluster n, Cluster n+1), with Grafana and Alertmanager needing data from all of them.]
![Page 5](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/5.jpg)
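Problem: Global View
[Diagram: one Prometheus per cluster (Cluster 1, Cluster 2, ..., Cluster n, Cluster n+1), with Grafana and Alertmanager needing data from all of them.]
How do we evaluate `sum(rate(go_memstats_alloc_bytes_total[1m])) by (env, cluster, job)` across all clusters?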
![Page 6](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/6.jpg)
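Problem: Global View
[Diagram: a global Prometheus scrapes each cluster's Prometheus through its /federate endpoint; Grafana and Alertmanager query the global Prometheus.]
`sum(go_memstats_alloc_bytes_total::rate1m) by (env, cluster, job)` ✓ (works only because the rate is pre-aggregated per cluster by a recording rule)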
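Federation handles this query because the expensive part is precomputed inside each cluster (the `go_memstats_alloc_bytes_total::rate1m` series), and `sum` can safely be applied again on top of partial sums. The minimal Python sketch below illustrates that two-stage decomposition. The label values and rates are invented for illustration, and `sum_by` is a hypothetical helper that emulates PromQL `sum(...) by (...)`.

```python
from collections import defaultdict

# Hypothetical per-instance allocation rates in bytes/s, as
# rate(go_memstats_alloc_bytes_total[1m]) might return them. Values are made up.
raw = [
    ({"env": "prod", "cluster": "c1", "job": "api", "instance": "a"}, 100.0),
    ({"env": "prod", "cluster": "c1", "job": "api", "instance": "b"}, 50.0),
    ({"env": "prod", "cluster": "c2", "job": "api", "instance": "c"}, 25.0),
]
KEYS = ("env", "cluster", "job")

def sum_by(series, keys):
    """Emulate PromQL `sum(...) by (keys)` over (labels, value) pairs."""
    out = defaultdict(float)
    for labels, value in series:
        out[tuple(labels[k] for k in keys)] += value
    return dict(out)

# Stage 1: each cluster's Prometheus evaluates a recording rule over its own series.
per_cluster = {
    cluster: sum_by([s for s in raw if s[0]["cluster"] == cluster], KEYS)
    for cluster in ("c1", "c2")
}

# Stage 2: the global Prometheus federates only the pre-aggregated series and sums again.
federated = [
    (dict(zip(KEYS, key)), value)
    for partial in per_cluster.values()
    for key, value in partial.items()
]
global_view = sum_by(federated, KEYS)

assert global_view == sum_by(raw, KEYS)  # two-stage sum equals the direct global sum
print(global_view)
```

The trade-off is that the global Prometheus only sees what the recording rules chose to export. Any aggregation nobody anticipated still needs the raw series.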
![Page 7](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/7.jpg)
Problem: High Availability
[Diagram: a single Prometheus per cluster (Cluster 1, Cluster 2, ..., Cluster n, Cluster n+1) feeding Grafana and Alertmanager.]
![Page 8](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/8.jpg)
Problem: High Availability
[Diagram: the same per-cluster setup, now with two Prometheus replicas scraping the same targets, feeding Grafana and Alertmanager.]
![Page 9](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/9.jpg)
Problem: High Availability
[Diagram: the same per-cluster setup, now with two Prometheus replicas scraping the same targets, feeding Grafana and Alertmanager.]
“Which replica to use?”
![Page 10](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/10.jpg)
Problem: Metric retention
![Page 11](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/11.jpg)
Problem: Metric retention
[Diagram: Prometheus keeps its data on a local SSD; remote write to another system is the alternative for longer retention.]
![Page 12](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/12.jpg)
Thanos
Goals
● Have a global view
● Have HA in place
● Increase retention
![Page 13](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/13.jpg)
Global View
See everything from a single place!
![Page 14](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/14.jpg)
[Diagram: Prometheus scrapes its targets and stores the samples on a local SSD.]
![Page 15](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/15.jpg)
Sidecar
[Diagram: a Sidecar runs next to Prometheus (which scrapes targets and writes to a local SSD) and exposes its data over gRPC (Store API).]
![Page 16](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/16.jpg)
Store API
service Store {
rpc Series(SeriesRequest) returns (stream SeriesResponse);
rpc LabelNames(LabelNamesRequest) returns (LabelNamesResponse);
rpc LabelValues(LabelValuesRequest) returns (LabelValuesResponse);
}
message SeriesRequest {
int64 min_time = 1;
int64 max_time = 2;
repeated LabelMatcher matchers = 3;
}
[Diagram: the Sidecar reads data from Prometheus through its remote read API and serves it over the Store API.]
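The sketch below makes the `Series` request shape concrete. It is a minimal in-memory Python version that keeps the series matching all label matchers and trims their samples to `[min_time, max_time]`. The matcher operators (`=`, `!=`, `=~`, `!~`) mirror PromQL label matchers. `LabelMatcher`, `InMemoryStore` and `series` are hypothetical names. The real API streams protobuf `SeriesResponse` messages over gRPC rather than yielding Python tuples.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class LabelMatcher:
    name: str
    op: str      # one of "=", "!=", "=~", "!~" (PromQL-style)
    value: str

    def matches(self, labels):
        actual = labels.get(self.name, "")  # a missing label behaves like an empty value
        if self.op == "=":
            return actual == self.value
        if self.op == "!=":
            return actual != self.value
        if self.op == "=~":
            return re.fullmatch(self.value, actual) is not None  # PromQL regexes are fully anchored
        if self.op == "!~":
            return re.fullmatch(self.value, actual) is None
        raise ValueError(f"unknown matcher op {self.op!r}")

class InMemoryStore:
    """Hypothetical Store API backend holding (labels, [(timestamp_ms, value), ...]) pairs."""

    def __init__(self, series):
        self._series = series

    def series(self, min_time, max_time, matchers):
        for labels, samples in self._series:
            if all(m.matches(labels) for m in matchers):
                window = [(t, v) for t, v in samples if min_time <= t <= max_time]
                if window:  # skip series without samples in the requested range
                    yield labels, window

store = InMemoryStore([
    ({"__name__": "up", "job": "api"}, [(1000, 1.0), (2000, 1.0), (3000, 0.0)]),
    ({"__name__": "up", "job": "db"}, [(1000, 1.0)]),
])
result = list(store.series(1500, 3000, [LabelMatcher("__name__", "=", "up"),
                                        LabelMatcher("job", "=~", "a.*")]))
print(result)
```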
![Page 17](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/17.jpg)
Querier
[Diagram: the Querier talks to the Prometheus + Sidecar pair over the Store API and exposes the HTTP Query API to users.]
![Page 18](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/18.jpg)
Global View
[Diagram: one Querier fetches data over the Store API from several Prometheus + Sidecar pairs (each scraping its own targets onto a local SSD) and merges the results.]
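The merge step can be sketched as a k-way merge. This works if each Store API stream returns its series sorted by label set, which is an assumption of the sketch: the Querier then never has to buffer whole responses. The rough Python sketch below uses `heapq.merge`. Series with identical labels coming from different stores are combined by timestamp. `merge_stores` and `label_key` are hypothetical helpers, not the Querier's actual code.

```python
import heapq
from itertools import groupby

def label_key(item):
    labels, _ = item
    return tuple(sorted(labels.items()))  # canonical, comparable form of a label set

def merge_stores(*streams):
    """k-way merge of label-sorted (labels, samples) streams from several stores."""
    merged = heapq.merge(*streams, key=label_key)
    for key, group in groupby(merged, key=label_key):
        samples = {}
        for _, series_samples in group:
            for t, v in series_samples:
                samples.setdefault(t, v)  # keep the first value seen for a timestamp
        yield dict(key), sorted(samples.items())

store_a = [({"cluster": "1", "job": "api"}, [(1, 1.0), (2, 2.0)])]
store_b = [({"cluster": "1", "job": "api"}, [(2, 2.0), (3, 3.0)]),
           ({"cluster": "2", "job": "api"}, [(1, 5.0)])]
print(list(merge_stores(store_a, store_b)))
```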
![Page 19](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/19.jpg)
Global View + Availability
[Diagram: several Prometheus + Sidecar pairs, two of which are replicas of each other labelled “replica”:”1” and “replica”:”2”. The Querier merges the data from all of them over the Store API and deduplicates the replicas.]
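Deduplication works because the replicas differ only in an external label such as `replica`. The naive Python sketch below drops that label and groups the series that are now identical. It keeps the first replica's samples and borrows samples from the other replica only where the first one has nothing within `max_gap` milliseconds, i.e. to fill a gap. The label name comes from the slide. The gap-filling rule and `max_gap` are simplifying assumptions, not Thanos's actual deduplication algorithm.

```python
from collections import defaultdict

def dedup(series, replica_label="replica", max_gap=30_000):
    """Naive replica deduplication (illustrative only).

    series: list of (labels, samples), samples sorted by timestamp in ms.
    """
    groups = defaultdict(list)
    for labels, samples in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        groups[key].append(samples)

    out = []
    for key, replicas in groups.items():
        primary = replicas[0]
        primary_ts = [t for t, _ in primary]
        merged = list(primary)
        for other in replicas[1:]:
            for t, v in other:
                # Borrow a sample only if the primary has nothing within max_gap of it.
                if all(abs(t - pt) > max_gap for pt in primary_ts):
                    merged.append((t, v))
        out.append((dict(key), sorted(merged)))
    return out

r1 = [(0, 1.0), (15_000, 2.0), (30_000, 3.0), (120_000, 9.0)]  # replica 1 has a gap
r2 = [(5_000, 1.0), (20_000, 2.0), (35_000, 3.0), (65_000, 5.0), (80_000, 6.0), (95_000, 7.0)]
print(dedup([({"job": "api", "replica": "1"}, r1), ({"job": "api", "replica": "2"}, r2)]))
```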
![Page 20](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/20.jpg)
Thanos
Goals
● Have a global view ✓
● Have HA in place ✓
[Diagram: Prometheus + Sidecar pairs, each with a local SSD, all queried through one Querier.]
![Page 21](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/21.jpg)
Historical Metrics
What exactly happened X months ago?
![Page 22](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/22.jpg)
TSDB Layout
[Diagram: Block 1, Block 2, Block 3 and Block 4 laid out along a time axis marked T-16h, T-10h, T-4h, T-2h and T.]
![Page 23](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/23.jpg)
TSDB Layout
[Diagram: the same timeline (T-16h to T), with one block opened up to show that each block holds its own chunks and an index.]
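Each block covers its own time range and carries its own index, so a query only needs to open the blocks that overlap its time range. The small Python sketch below shows that selection. The block boundaries are chosen to match the timeline marks on the slide, purely for illustration. `Block` and `blocks_for_range` are hypothetical names. Timestamps are in milliseconds, as in Prometheus, and each block's `max_time` is treated as exclusive.

```python
from dataclasses import dataclass

HOUR = 3600 * 1000  # milliseconds

@dataclass
class Block:
    name: str
    min_time: int  # inclusive
    max_time: int  # exclusive

def blocks_for_range(blocks, start, end):
    """Return the blocks whose [min_time, max_time) overlaps the query range [start, end]."""
    return [b for b in blocks if b.min_time <= end and b.max_time > start]

T = 100 * HOUR  # arbitrary "now"
blocks = [
    Block("block1", T - 16 * HOUR, T - 10 * HOUR),
    Block("block2", T - 10 * HOUR, T - 4 * HOUR),
    Block("block3", T - 4 * HOUR, T - 2 * HOUR),
    Block("block4", T - 2 * HOUR, T),
]
# A query over the last 3 hours only has to touch the two most recent blocks.
print([b.name for b in blocks_for_range(blocks, T - 3 * HOUR, T)])
```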
![Page 24](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/24.jpg)
Data saving
[Diagram: the Sidecar uploads each completed block from Prometheus's local SSD to object storage.]
![Page 25](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/25.jpg)
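Data saving
[Diagram: the Sidecar uploads each completed block from Prometheus's local SSD to object storage.]
Prometheus flags: `--storage.tsdb.max-block-duration=2h --storage.tsdb.retention=12h` (blocks stay at 2h, so Prometheus does not compact them locally before upload, and only 12h of data is kept on the local disk)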
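The Sidecar's upload loop can be pictured as repeatedly comparing Prometheus's local blocks with what is already in the bucket. The Python sketch below shows that comparison in simplified form. Treating any directory that contains a `meta.json` as a finished block is a simplifying assumption. It does skip the `wal` directory, which holds no `meta.json`. `blocks_to_upload` and the `uploaded` set are hypothetical, and the block names are placeholders rather than real ULIDs.

```python
import os
import tempfile

def blocks_to_upload(data_dir, uploaded):
    """Return names of local block directories not yet shipped to object storage.

    A directory is treated as a block if it contains meta.json (simplifying
    assumption). `uploaded` is a hypothetical set of block names already in the bucket.
    """
    pending = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isdir(path) or name in uploaded:
            continue
        if os.path.exists(os.path.join(path, "meta.json")):
            pending.append(name)
    return pending

with tempfile.TemporaryDirectory() as data_dir:
    # Placeholder block names; real blocks are named by ULIDs.
    for name in ("01BLOCKA", "01BLOCKB", "wal"):
        os.makedirs(os.path.join(data_dir, name))
    for name in ("01BLOCKA", "01BLOCKB"):
        open(os.path.join(data_dir, name, "meta.json"), "w").close()
    pending = blocks_to_upload(data_dir, uploaded={"01BLOCKA"})
    print(pending)
```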
![Page 26](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/26.jpg)
Store Gateway
[Diagram: the Store Gateway reads blocks from object storage, keeps a local cache, and serves the data to the Querier over the Store API.]
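Because the Store Gateway serves blocks that live in object storage, it benefits from keeping frequently used data in a local cache instead of going to the bucket on every query. The minimal least-recently-used (LRU) sketch in Python below shows the idea. `fetch_from_bucket` is a hypothetical stand-in for an object-storage read. The cache keys, the size limit and caching index sections specifically are assumptions for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size least-recently-used cache in front of a slow fetch function."""

    def __init__(self, fetch, max_items):
        self._fetch = fetch
        self._max = max_items
        self._items = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

def fetch_from_bucket(key):
    # Hypothetical stand-in for reading part of a block from object storage.
    block, section = key
    return f"{block}/{section}".encode()

cache = LRUCache(fetch_from_bucket, max_items=2)
cache.get(("block1", "index"))
cache.get(("block1", "index"))  # served from the cache
cache.get(("block2", "index"))
cache.get(("block3", "index"))  # evicts block1's entry
cache.get(("block1", "index"))  # fetched from the bucket again
print(cache.misses)
```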
![Page 27](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/27.jpg)
Thanos
Goals
● Have a global view ✓
● Have HA in place ✓
● Increase retention ✓
![Page 28](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/28.jpg)
Prometheus
[Diagram: a single Prometheus server bundles a Querier, a Scrape Engine, a Compactor and a Rule & Alert Engine.]
![Page 29](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/29.jpg)
Thanos
[Diagram: the same components split into separate processes: Scrape Engine, Compactor, Rule & Alert Engine, and several Thanos Queriers.]
![Page 30](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/30.jpg)
Thanos
[Diagram: the Scrape Engine is replaced by several Prometheus + Sidecar pairs, each with a local SSD; the Compactor, Rule & Alert Engine and Thanos Queriers remain.]
![Page 31](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/31.jpg)
Thanos
[Diagram: the Rule & Alert Engine is replaced by Thanos Rulers, alongside the Compactor, the Thanos Queriers and the Prometheus + Sidecar pairs.]
![Page 32](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/32.jpg)
Thanos
[Diagram: the Compactor becomes a Global Compactor, alongside the Thanos Rulers, the Thanos Queriers and the Prometheus + Sidecar pairs.]
![Page 33](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/33.jpg)
Thanos
[Diagram: the complete picture: Prometheus + Sidecar pairs with local SSDs, Store Gateways reading from Object Storage, Thanos Rulers, a Global Compactor and Thanos Queriers.]
![Page 34](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/34.jpg)
Deployment Models
![Page 35](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/35.jpg)
Federation
[Diagram: Cluster A (master), Cluster B and Cluster C each run Queriers and a Store backed by a Bucket, plus Prometheus + Sidecar pairs; the Queriers in Cluster A reach the other clusters through the Store API (federation through Store API).]
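Federation through the Store API works because a Querier can itself expose the same interface it consumes. A Querier in the master cluster can therefore treat remote Queriers exactly like local stores. The Python sketch below shows that composite structure. The class names and the equality-only `series(match)` method are hypothetical simplifications of the Store API, not its real signature.

```python
class StaticStore:
    """Leaf store (e.g. a Sidecar or Store Gateway) holding a fixed list of series."""

    def __init__(self, series):
        self._series = series

    def series(self, match):
        # Equality matchers only, to keep the sketch short.
        return [(labels, samples) for labels, samples in self._series
                if all(labels.get(k) == v for k, v in match.items())]

class Querier:
    """Fans out to child stores and exposes the same series() method, so it can be a child itself."""

    def __init__(self, stores):
        self._stores = stores

    def series(self, match):
        out = []
        for store in self._stores:
            out.extend(store.series(match))
        return sorted(out, key=lambda s: sorted(s[0].items()))

cluster_b = Querier([StaticStore([({"cluster": "b", "job": "api"}, [(1, 1.0)])])])
cluster_c = Querier([StaticStore([({"cluster": "c", "job": "api"}, [(1, 2.0)])])])
master = Querier([cluster_b, cluster_c,
                  StaticStore([({"cluster": "a", "job": "db"}, [(1, 3.0)])])])
print(master.series({"job": "api"}))
```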
![Page 36](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/36.jpg)
Example Deployment
[Diagram: Cluster 1, Cluster 2, ..., Cluster n, Cluster n+1 each run Prometheus + Sidecar (“+”); a Core Cluster runs Grafana, Alertmanager, Queriers, a Ruler, a Store, a Compactor and the Bucket. Annotation: “Statically configured”.]
![Page 37](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/37.jpg)
Example Global Deployment
[Diagram: Testing, Staging and Production environments, each with many Prometheus + Sidecar pairs (“+”) and its own Querier.]
![Page 38](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/38.jpg)
Bonus: Downsampling
![Page 39](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/39.jpg)
Downsampling
Raw: 16 bytes/sample (8-byte timestamp + 8-byte value)
Compressed: 1.07 bytes/sample
![Page 40](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/40.jpg)
Downsampling
BUT…
![Page 41](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/41.jpg)
![Page 42](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/42.jpg)
Downsampling
Decompressing one sample takes 10-40 nanoseconds
● Times 1000 series @ 30s scrape interval
● Times 1 year
● Over 1 billion samples, i.e. 10-40 s spent on decoding alone
● Plus your actual computation over all those samples, e.g. rate()
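The slide's estimate can be checked with a few lines of arithmetic. The figures (10-40 ns per sample, 1000 series, 30s scrape interval, 16 vs. 1.07 bytes/sample) come from the slides. A 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year
series = 1000
scrape_interval_s = 30

samples = series * SECONDS_PER_YEAR // scrape_interval_s
print(f"samples: {samples:,}")  # samples: 1,051,200,000

# Decoding cost alone, at the low and high end of the per-sample estimate.
for ns_per_sample in (10, 40):
    print(f"decode @ {ns_per_sample} ns/sample: {samples * ns_per_sample / 1e9:.1f} s")

# Storage for the same samples, raw vs. compressed.
print(f"raw: {samples * 16 / 1e9:.1f} GB, compressed: {samples * 1.07 / 1e9:.2f} GB")
```

This gives roughly 10.5 s to 42 s of decoding per query over a year of data, before any `rate()` computation. Compression keeps the data at about 1.1 GB instead of 16.8 GB, but that does not reduce the decoding work.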
![Page 43](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/43.jpg)
Downsampling
Raw block → block @ 5m resolution (10x fewer samples at a 30s scrape interval) → block @ 1h resolution (12x fewer samples)
![Page 44](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/44.jpg)
Downsampling
[Diagram: a downsampled series is a sequence of chunks, each holding five aggregates: count, sum, min, max, counter.]
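The aggregation step can be sketched in Python for a single series and 5-minute windows. For each window, the downsampled chunk keeps count, sum, min and max of the raw values, plus a counter value. Computing `counter` as the last raw value adjusted for counter resets is a simplification of what Thanos stores. `downsample` is a hypothetical helper.

```python
WINDOW_MS = 5 * 60 * 1000  # 5m resolution; timestamps in milliseconds

def downsample(samples, window_ms=WINDOW_MS):
    """Aggregate sorted (timestamp_ms, value) samples into per-window count/sum/min/max/counter."""
    windows = {}
    offset = 0.0  # total value lost to counter resets so far
    prev = None
    for t, v in samples:
        if prev is not None and v < prev:
            offset += prev  # counter reset: carry over the value reached before it
        prev = v
        start = t - t % window_ms  # align to the window boundary
        agg = windows.setdefault(
            start, {"count": 0, "sum": 0.0, "min": v, "max": v, "counter": 0.0})
        agg["count"] += 1
        agg["sum"] += v
        agg["min"] = min(agg["min"], v)
        agg["max"] = max(agg["max"], v)
        agg["counter"] = v + offset  # reset-adjusted value; the last sample in the window wins
    return windows

raw = [(0, 1.0), (30_000, 3.0), (60_000, 6.0), (300_000, 2.0), (330_000, 5.0)]  # reset at 300s
for start, agg in downsample(raw).items():
    print(start, agg)
```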
![Page 45](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/45.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`count_over_time(requests_total[1h])` → count
![Page 46](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/46.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`sum_over_time(requests_total[1h])` → sum
![Page 47](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/47.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`min(requests_total)`, `min_over_time(requests_total[1h])` → min
![Page 48](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/48.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`max(requests_total)`, `max_over_time(requests_total[1h])` → max
![Page 49](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/49.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`rate(requests_total[1h])`, `increase(requests_total[1h])` → counter
![Page 50](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/50.jpg)
Downsampling
Aggregates: count | sum | min | max | counter
`requests_total`, `avg(requests_total)`, ... (*) → avg, computed from sum and count
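The mapping from query to aggregate on the slides above can be written down as a Python lookup. Any function not listed falls back to an average computed as sum / count. That fallback is a hedged reading of the `*` / avg slide, and `aggregate_for` and `window_avg` are hypothetical names, not Thanos's code.

```python
# Which downsampled aggregate answers which PromQL function or aggregation, per the slides.
AGGREGATE_FOR_FUNCTION = {
    "count_over_time": "count",
    "sum_over_time": "sum",
    "min": "min",
    "min_over_time": "min",
    "max": "max",
    "max_over_time": "max",
    "rate": "counter",
    "increase": "counter",
}

def aggregate_for(function):
    """Return the aggregate a query reads; anything else falls back to avg = sum / count."""
    return AGGREGATE_FOR_FUNCTION.get(function, "sum/count")

def window_avg(agg):
    """Average raw value within one downsampled window."""
    return agg["sum"] / agg["count"]

print(aggregate_for("rate"), aggregate_for("avg"), window_avg({"sum": 10.0, "count": 4}))
```

Serving `avg` from sum and count, rather than storing a separate average, keeps the result exact when several windows are combined: their sums and counts can simply be added before dividing.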
![Page 51](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/51.jpg)
Thanos
Goals
● Have a global view ✓
● Have HA in place ✓
● Increase retention ✓
![Page 52](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0712867e708231d41b2a1f/html5/thumbnails/52.jpg)
Any questions?
github.com/improbable-eng/thanos
Fabian Reinartz (fabxc)
Bartek Plotka (bwplotka, Bplotka)