Life of a PromQL query
Percona Live, Santa Clara, CA – 2017-04-27
Björn “Beorn” Rabenstein, Production Engineer, SoundCloud Ltd.
The fundamental problem of TSDBs: vertical writes, horizontal(-ish) reads.
[Figure: time (~weeks) on the horizontal axis, time series (~millions) on the vertical axis, with the directions of writes and reads marked.]
External storage with BigTable semantics
cf. https://cloud.google.com/bigtable/docs/schema-design-time-series
...
http_requests_total{status="200",method="GET"}@1434317560938 ⇒ 94355
http_requests_total{status="200",method="GET"}@1434317561287 ⇒ 94934
http_requests_total{status="200",method="GET"}@1434317562344 ⇒ 96483
http_requests_total{status="404",method="GET"}@1434317560938 ⇒ 38473
http_requests_total{status="404",method="GET"}@1434317561249 ⇒ 38544
http_requests_total{status="404",method="GET"}@1434317562588 ⇒ 38663
http_requests_total{status="200",method="POST"}@1434317560885 ⇒ 4748
http_requests_total{status="200",method="POST"}@1434317561483 ⇒ 4795
http_requests_total{status="200",method="POST"}@1434317562589 ⇒ 4833
http_requests_total{status="404",method="POST"}@1434317560939 ⇒ 122
...
KEY: metric name, dimensions (aka labels), timestamp
VALUE: sample value
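The key/value split above can be sketched in Go. This is only an illustration of the data model, not the encoding used by any particular storage backend: the `sampleKey` type and its `String` method are hypothetical names, and labels are rendered sorted by name (an assumption made here so that equal keys always render identically; the slide lists them in a different order).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sampleKey is a hypothetical key: metric name, labels, and timestamp.
// The sample value is stored separately as the value for this key.
type sampleKey struct {
	Metric    string
	Labels    map[string]string
	Timestamp int64 // milliseconds since the Unix epoch
}

// String renders the key in the metric{labels}@timestamp notation,
// with labels sorted by name.
func (k sampleKey) String() string {
	names := make([]string, 0, len(k.Labels))
	for n := range k.Labels {
		names = append(names, n)
	}
	sort.Strings(names)
	pairs := make([]string, 0, len(names))
	for _, n := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", n, k.Labels[n]))
	}
	return fmt.Sprintf("%s{%s}@%d", k.Metric, strings.Join(pairs, ","), k.Timestamp)
}

func main() {
	k := sampleKey{
		Metric:    "http_requests_total",
		Labels:    map[string]string{"status": "200", "method": "GET"},
		Timestamp: 1434317560938,
	}
	fmt.Println(k, "⇒", 94355.0)
}
```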
Labels are the new hierarchies.
api_http_requests_total
  method=POST, method=GET, method=...
  path=/tracks, path=/users, path=...
  status=200, status=404, status=...
  job=api-server, job=node, job=...
  instance=1.2.3.4:80, instance=1.2.3.4:81, instance=...
api-server
  1.2.3.4:80
    /tracks
      GET
        200
        404
        […]
      POST
        […]
    /users
      […]
  1.2.3.4:81
    /tracks
      GET
        200
        […]
      [...]
    /users
      [...]
  [...]
Shamelessly stolen from Julius Volz.
api_http_requests_total{method="post",code=~"2.."}
vs.
api-server.*.*.post.2*
1st generation
Just LevelDB, using the well-known approaches to implementing a TSDB on top of BigTable semantics. With some tweaks…
Indices are LevelDB, too.
(Used in the prototype.)
2nd generation
LevelDB only for indices.
Custom chunked storage for raw sample data, heavily (ab-)using the file system. More details at https://promcon.io/2016-berlin/talks/the-prometheus-time-series-database/
(Used in Prometheus as we know it.)
3rd generation
Completely custom TSDB.
Sophisticated custom indexing.
Fully integrated raw sample storage (no abuse of the file system anymore).
Heavy use of mmap.
Details: https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
(Used in upcoming Prometheus 2.)
PromQL
Even though Borgmon remains internal to Google, the idea of treating time-series data as a data source for generating alerts is now accessible to everyone through those open source tools like Prometheus, Riemann, Heka, and Bosun [...]
Site Reliability Engineering: How Google Runs Production Systems (O'Reilly Media)
Chapter 10: Practical Alerting from Time-Series Data
PromQL is read-only
rate(incoming_http_requests_total[5m])
SELECT job, instance, method, status, path, client, version, […] rate(value, 5m) FROM incoming_http_requests_total
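As a rough illustration of what `rate(...[5m])` computes for each series, the following Go sketch sums the counter increases within a window, treats any decrease as a counter reset, and divides by the time between the first and last sample. It is a simplification: Prometheus's actual `rate()` also extrapolates towards the window boundaries, which is left out here. The `sample` type and the `simpleRate` function are hypothetical names.

```go
package main

import "fmt"

// sample is one (timestamp, value) pair of a counter series.
type sample struct {
	T int64 // milliseconds since the Unix epoch
	V float64
}

// simpleRate returns the per-second increase over the given samples.
// It returns false if fewer than two samples are available or if the
// samples span no time.
func simpleRate(samples []sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].V - samples[i-1].V
		if delta < 0 {
			// Counter reset: assume the counter restarted from zero.
			delta = samples[i].V
		}
		increase += delta
	}
	seconds := float64(samples[len(samples)-1].T-samples[0].T) / 1000
	if seconds <= 0 {
		return 0, false
	}
	return increase / seconds, true
}

func main() {
	window := []sample{{0, 10}, {10000, 30}, {20000, 5}, {30000, 25}}
	if r, ok := simpleRate(window); ok {
		fmt.Printf("%.2f requests/s\n", r)
	}
}
```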
avg by(city) (temperature_celsius{country="germany"})
SELECT city, AVG(value) FROM temperature_celsius WHERE country="germany" GROUP BY city
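The selection and aggregation in this query can be sketched in Go as a filter followed by a group-and-average step. This is a minimal sketch of the semantics only, not how the PromQL engine is implemented; the `series` type and the `selectEq` and `avgBy` functions are hypothetical.

```go
package main

import "fmt"

// series is one time series at a single instant: its labels and value.
type series struct {
	Labels map[string]string
	Value  float64
}

// selectEq keeps the series whose label `name` equals `value`.
func selectEq(in []series, name, value string) []series {
	var out []series
	for _, s := range in {
		if s.Labels[name] == value {
			out = append(out, s)
		}
	}
	return out
}

// avgBy groups series by the value of one label and averages each group.
func avgBy(label string, in []series) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	for _, s := range in {
		key := s.Labels[label]
		sum[key] += s.Value
		count[key]++
	}
	out := make(map[string]float64, len(sum))
	for key, total := range sum {
		out[key] = total / float64(count[key])
	}
	return out
}

func main() {
	all := []series{
		{map[string]string{"country": "germany", "city": "berlin", "instance": "a"}, 10},
		{map[string]string{"country": "germany", "city": "berlin", "instance": "b"}, 20},
		{map[string]string{"country": "germany", "city": "hamburg", "instance": "c"}, 8},
		{map[string]string{"country": "france", "city": "paris", "instance": "d"}, 30},
	}
	fmt.Println(avgBy("city", selectEq(all, "country", "germany")))
}
```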
errors{job="foo"} / total{job="foo"}
SELECT errors.job, errors.instance, […more labels…], errors.value / total.value FROM errors, total WHERE errors.job="foo" AND total.job="foo" JOIN […some more complicated stuff here…]
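What replaces the explicit join is that a binary operator between two vectors pairs up series with identical label sets (the metric name aside). A minimal Go sketch of that one-to-one matching follows; the `series` type and the `signature` and `divide` functions are hypothetical, and the sketch ignores the matching modifiers PromQL offers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is one time series at a single instant. Labels excludes the
// metric name, which does not take part in the matching.
type series struct {
	Labels map[string]string
	Value  float64
}

// signature renders a label set as a canonical string so that equal
// label sets produce equal signatures.
func signature(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for n, v := range labels {
		parts = append(parts, fmt.Sprintf("%s=%q", n, v))
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// divide pairs each left-hand series with the right-hand series that has
// exactly the same labels. Series without a partner are dropped.
func divide(lhs, rhs []series) map[string]float64 {
	right := make(map[string]float64, len(rhs))
	for _, s := range rhs {
		right[signature(s.Labels)] = s.Value
	}
	out := map[string]float64{}
	for _, s := range lhs {
		sig := signature(s.Labels)
		if r, ok := right[sig]; ok {
			out[sig] = s.Value / r
		}
	}
	return out
}

func main() {
	errors := []series{{map[string]string{"job": "foo", "instance": "a"}, 2}}
	total := []series{{map[string]string{"job": "foo", "instance": "a"}, 8}}
	fmt.Println(divide(errors, total))
}
```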
PromQL: some_metric + 1
InfluxDB: SELECT 1 + "value" FROM "some_metric"
Graphite: offset(some_metric.*, 1)

PromQL: log10(some_metric)
InfluxDB: n/a
Graphite: logarithm(some_metric.*)

PromQL: my_a - my_b
InfluxDB: SELECT "a" - "b" FROM "table" (doesn't work for measurements)
Graphite: reduceSeries(my.*, "diffSeries", 1, "a", "b")

PromQL: temperature_celsius > ignoring(instance) group_left 2 * stddev without (instance) (temperature_celsius) + avg without (instance) (temperature_celsius)
InfluxDB: n/a
Graphite: n/a
HTTP API
https://prometheus.io/docs/querying/api/#expression-queries
Used by
● internal expression browser
● Grafana
● custom clients
[Diagram: HTTP API → PromQL engine → storage]
type Querier interface {
    QueryRange(
        ctx context.Context,
        from, through model.Time,
        matchers ...*metric.LabelMatcher,
    ) ([]SeriesIterator, error)
    QueryInstant(
        ctx context.Context,
        ts model.Time,
        stalenessDelta time.Duration,
        matchers ...*metric.LabelMatcher,
    ) ([]SeriesIterator, error)
    // ...
}
{code="404"}
{code!="404"}
{code=~"2.."}
{code!~"2.."}
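The four matcher types shown above (equal, not equal, regexp match, regexp non-match) can be sketched in Go as follows. This is not the `metric.LabelMatcher` type from the Prometheus code base; `labelMatcher`, `newMatcher`, and `matches` are hypothetical names. The one behaviour taken from PromQL is that regular expressions are fully anchored, so `2..` matches `200` but not `2000`.

```go
package main

import (
	"fmt"
	"regexp"
)

type matchType int

const (
	equal        matchType = iota // =
	notEqual                      // !=
	regexMatch                    // =~
	regexNoMatch                  // !~
)

// labelMatcher is a hypothetical, simplified label matcher.
type labelMatcher struct {
	Type  matchType
	Name  string
	Value string
	re    *regexp.Regexp // only set for the two regexp types
}

// newMatcher compiles the regexp (fully anchored) where needed.
func newMatcher(t matchType, name, value string) (*labelMatcher, error) {
	m := &labelMatcher{Type: t, Name: name, Value: value}
	if t == regexMatch || t == regexNoMatch {
		re, err := regexp.Compile("^(?:" + value + ")$")
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// matches reports whether a label value satisfies the matcher.
func (m *labelMatcher) matches(v string) bool {
	switch m.Type {
	case equal:
		return v == m.Value
	case notEqual:
		return v != m.Value
	case regexMatch:
		return m.re.MatchString(v)
	case regexNoMatch:
		return !m.re.MatchString(v)
	}
	return false
}

func main() {
	specs := []struct {
		t matchType
		v string
	}{{equal, "404"}, {notEqual, "404"}, {regexMatch, "2.."}, {regexNoMatch, "2.."}}
	for _, s := range specs {
		m, err := newMatcher(s.t, "code", s.v)
		if err != nil {
			fmt.Println("bad matcher:", err)
			continue
		}
		fmt.Println(s.v, m.matches("200"))
	}
}
```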
disk
Indices (simplified, Prometheus 1.x)
1. Key: Label name → Value: all existing label values for that name (labelname→labelvalues)
2. Key: Label pair ({name="value"}) → Value: all series with that pair (labelpair→series)
(N.B.: Metric name xxx becomes {__name__="xxx"} .)
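The two indices can be pictured as two maps. The Go sketch below is an in-memory illustration only (in Prometheus 1.x the indices live in LevelDB, as noted earlier); the `indices` type, its methods, and the string series IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type labelPair struct{ Name, Value string }

// indices holds both simplified indices as in-memory maps.
type indices struct {
	labelValues map[string]map[string]struct{} // labelname→labelvalues
	pairSeries  map[labelPair][]string         // labelpair→series (as IDs)
}

func newIndices() *indices {
	return &indices{
		labelValues: map[string]map[string]struct{}{},
		pairSeries:  map[labelPair][]string{},
	}
}

// add indexes one series. The metric name is indexed like any other
// label, under the name __name__.
func (ix *indices) add(seriesID, metric string, labels map[string]string) {
	ix.addPair(seriesID, "__name__", metric)
	for name, value := range labels {
		ix.addPair(seriesID, name, value)
	}
}

func (ix *indices) addPair(seriesID, name, value string) {
	if ix.labelValues[name] == nil {
		ix.labelValues[name] = map[string]struct{}{}
	}
	ix.labelValues[name][value] = struct{}{}
	p := labelPair{name, value}
	ix.pairSeries[p] = append(ix.pairSeries[p], seriesID)
}

// values returns all known values of a label name in sorted order.
func (ix *indices) values(name string) []string {
	out := make([]string, 0, len(ix.labelValues[name]))
	for v := range ix.labelValues[name] {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := newIndices()
	ix.add("s1", "errors_total", map[string]string{"job": "foo", "code": "503"})
	ix.add("s2", "errors_total", map[string]string{"job": "foo", "code": "500"})
	fmt.Println(ix.values("code"))
	fmt.Println(ix.pairSeries[labelPair{"__name__", "errors_total"}])
}
```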
Life of a QueryRange / QueryInstant call (simplified)
1. Resolve negative and regexp matchers into a set of possible simple matchers using labelname→labelvalues.
2. Look up possible series for each simple matcher using labelpair→series and intersect/union the result.
3. For each remaining series, find the chunks for the requested time (instant or range), load them from disk if needed, and pin them into memory.
4. Return iterators for the series.
5. PromQL engine does its thing with them and then closes them.
6. Closing the iterators will unpin the chunks from memory (releasing them into an LRU cache).
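Steps 1 and 2 above can be sketched in Go: each matcher is expanded into the existing label values that satisfy it, the series of those label pairs are unioned per matcher, and the per-matcher results are intersected. This is a sketch under simplifying assumptions, not the Prometheus implementation: all names are hypothetical, series are plain integer IDs, and series that lack the matched label entirely are not considered. Chunk loading, pinning, and iterators (steps 3 to 6) are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

type labelPair struct{ Name, Value string }

// matcher is a simplified, hypothetical label matcher: accept reports
// whether a given label value satisfies it.
type matcher struct {
	Name   string
	accept func(value string) bool
}

type index struct {
	labelValues map[string]map[string]bool // labelname→labelvalues
	pairSeries  map[labelPair][]int        // labelpair→series (as IDs)
}

// buildIndex indexes series given as ID → label set.
func buildIndex(series map[int]map[string]string) *index {
	ix := &index{
		labelValues: map[string]map[string]bool{},
		pairSeries:  map[labelPair][]int{},
	}
	for id, labels := range series {
		for name, value := range labels {
			if ix.labelValues[name] == nil {
				ix.labelValues[name] = map[string]bool{}
			}
			ix.labelValues[name][value] = true
			p := labelPair{name, value}
			ix.pairSeries[p] = append(ix.pairSeries[p], id)
		}
	}
	return ix
}

// lookup returns the sorted IDs of all series satisfying every matcher.
func (ix *index) lookup(ms ...matcher) []int {
	var result map[int]bool
	for _, m := range ms {
		// Step 1: resolve the matcher into existing label pairs.
		// Step 2a: union the series of all those pairs.
		union := map[int]bool{}
		for value := range ix.labelValues[m.Name] {
			if !m.accept(value) {
				continue
			}
			for _, id := range ix.pairSeries[labelPair{m.Name, value}] {
				union[id] = true
			}
		}
		// Step 2b: intersect with the result of the previous matchers.
		if result == nil {
			result = union
			continue
		}
		for id := range result {
			if !union[id] {
				delete(result, id)
			}
		}
	}
	ids := make([]int, 0, len(result))
	for id := range result {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	ix := buildIndex(map[int]map[string]string{
		1: {"__name__": "errors_total", "job": "foo", "code": "503"},
		2: {"__name__": "errors_total", "job": "foo", "code": "500"},
		3: {"__name__": "errors_total", "job": "bar", "code": "500"},
		4: {"__name__": "requests_total", "job": "foo", "code": "200"},
	})
	eq := func(name, want string) matcher {
		return matcher{name, func(v string) bool { return v == want }}
	}
	twoXX := regexp.MustCompile(`^(?:2..)$`)
	ids := ix.lookup(
		eq("__name__", "errors_total"),
		eq("job", "foo"),
		matcher{"code", func(v string) bool { return !twoXX.MatchString(v) }},
	)
	fmt.Println(ids)
}
```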
GET /api/v1/query?query=sum(rate(errors_total{job="foo"}[5m]))/sum(rate(requests_total{job="foo"}[5m]))
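On the wire, the expression has to be URL-encoded as the `query` parameter. A minimal Go sketch of building such a request URL with the standard `net/url` package follows; the `instantQueryURL` helper and the base address `http://localhost:9090` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// instantQueryURL builds the URL for an instant query against the
// /api/v1/query endpoint, URL-encoding the PromQL expression.
func instantQueryURL(base, expr string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/query"
	q := url.Values{}
	q.Set("query", expr)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	expr := `sum(rate(errors_total{job="foo"}[5m]))/sum(rate(requests_total{job="foo"}[5m]))`
	// The base address is an assumption; adjust it to the actual server.
	s, err := instantQueryURL("http://localhost:9090", expr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```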
[Diagram: HTTP API → PromQL engine → storage → disk]
QueryRange(ctx, now-5m, now, {__name__="errors_total", job="foo"})
QueryRange(ctx, now-5m, now, {__name__="requests_total", job="foo"})
chunks [now-5m, now] for:
  {__name__="errors_total", job="foo", code="503", instance="1.2.3.4:80"},
  {__name__="errors_total", job="foo", code="500", instance="4.5.3.1:80"},
  …
Credits
PromQL example queries and the comparison to other query languages are taken from:
● Julius Volz’s talk Prometheus Design and Philosophy https://promcon.io/2016-berlin/talks/prometheus-design-and-philosophy
● Brian Brazil’s blog post Translating between monitoring languages https://www.robustperception.io/translating-between-monitoring-languages