Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix
DESCRIPTION
In this presentation, Paul introduces InfluxDB, a distributed time series database that he open sourced based on the backend infrastructure at Errplane. He explains why you'd want a database specifically for time series, then covers the API and some of the key features of InfluxDB, including:
• Stores metrics (like Graphite) and events (like page views, exceptions, deploys)
• No external dependencies (self contained binary)
• Fast. Handles many thousands of writes per second on a single node
• HTTP API for reading and writing data
• SQL-like query language
• Distributed to scale out to many machines
• Built in aggregate and statistics functions
• Built in downsampling
TRANSCRIPT
![Page 1: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/1.jpg)
Introducing InfluxDB, an open source distributed time series database
Paul Dix
@pauldix
![Page 2: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/2.jpg)
About me
● Co-founder, CEO of Errplane (YC W13)
● Organizer of NYC Machine Learning
● Author of “Service Oriented Design with Ruby & Rails”
![Page 3: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/3.jpg)
Series editor for Addison Wesley’s “Data & Analytics”
![Page 4: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/4.jpg)
What is a time series?
![Page 5: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/5.jpg)
Metrics
![Page 6: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/6.jpg)
![Page 7: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/7.jpg)
![Page 8: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/8.jpg)
![Page 9: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/9.jpg)
![Page 10: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/10.jpg)
Events
● Measurements
● Exceptions
● Page Views
● User actions
● Commits
● Deploys
● Things happening in time...
![Page 11: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/11.jpg)
Analytics
operations, developers, users, business
![Page 12: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/12.jpg)
Things you want to ask questions about, visualize, or summarize over time.
![Page 13: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/13.jpg)
Actually a summarization
![Page 14: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/14.jpg)
Also a summarization
![Page 15: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/15.jpg)
What about...
“...order by some_time_col”
![Page 16: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/16.jpg)
Why a database for time series?
![Page 17: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/17.jpg)
Billions of data points. Scale horizontally.
![Page 18: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/18.jpg)
HTTP native. API to build on.
![Page 19: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/19.jpg)
Built in tools for downsampling and summarizing
![Page 20: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/20.jpg)
Automatically clear out old data if we want
![Page 21: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/21.jpg)
Process or monitor data as it comes in, like Storm
![Page 22: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/22.jpg)
Visualize and Summarize
● Graphs & dashboards
● Last 10 minutes
● Last 4 hours
● Last 24 hours
● Past week
● Past month
● YTD
● All Time
![Page 23: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/23.jpg)
Data Collection
● Statsd - https://github.com/etsy/statsd/
● CollectD - http://collectd.org/
● Heka - https://github.com/mozilla-services/heka
● l2met - https://github.com/ryandotsmith/l2met
● Libraries
● Framework integrations
● Cloud integrations (AWS, OpenStack)
● Third-party integrations
![Page 24: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/24.jpg)
Existing Tools
● RRDTool (metrics)
● Graphite (metrics)
● OpenTSDB (metrics + events)
● Kairos (metrics + events)
● and others...
![Page 25: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/25.jpg)
Something missing...
![Page 26: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/26.jpg)
![Page 27: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/27.jpg)
![Page 28: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/28.jpg)
![Page 29: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/29.jpg)
![Page 30: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/30.jpg)
InfluxDB: harness lightning, get 1.21 gigawatts.
![Page 31: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/31.jpg)
InfluxDB
● Written in Go
● Uses LevelDB for storage (may change)
● Self contained binary
● No external dependencies
● Distributed (in December)
![Page 32: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/32.jpg)
HTTP Native
● Read/write data via HTTP
● Manage via HTTP
● Security model to allow access directly from browser
![Page 33: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/33.jpg)
How data is organized
● Databases (like in MySQL, Postgres, etc)
● Time series (kind of like tables)
● Points or events (kind of like rows)
![Page 34: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/34.jpg)
Security
● Cluster admins
● Database admins
● Database users
  ○ read permissions
    ■ only certain series
    ■ only queries with a column having a specific value (e.g. customer_id=32)
  ○ write permissions
    ■ only certain series
    ■ only with columns having a specific value
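The permission rules on this slide can be sketched as a small data model. This is only an illustration of the idea, not InfluxDB's implementation: the `Permission` and `DatabaseUser` classes, their fields, and the `dashboard` user are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Permission:
    """Hypothetical rule: allowed series plus column values that must match."""
    series: Optional[Set[str]] = None  # None means "any series"
    required_columns: Dict[str, object] = field(default_factory=dict)

    def allows(self, series_name: str, column_values: Dict[str, object]) -> bool:
        # Reject series that are not on the allow list.
        if self.series is not None and series_name not in self.series:
            return False
        # Every required column must carry exactly the required value.
        return all(column_values.get(col) == val
                   for col, val in self.required_columns.items())


@dataclass
class DatabaseUser:
    """Hypothetical database user with separate read and write rules."""
    name: str
    read: Permission
    write: Permission


# A user who may only read customer 32's rows from one series and write nothing.
reader = DatabaseUser(
    name="dashboard",
    read=Permission(series={"user_events"}, required_columns={"customer_id": 32}),
    write=Permission(series=set()),  # empty set: no series may be written
)

print(reader.read.allows("user_events", {"customer_id": 32}))
print(reader.read.allows("user_events", {"customer_id": 7}))
print(reader.write.allows("user_events", {"customer_id": 32}))
```

Keeping read and write rules as separate objects mirrors the slide's split between read permissions and write permissions.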
![Page 35: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/35.jpg)
InfluxDB Setup
● http://play.influxdb.org
● OSX
  ○ brew update && brew install influxdb
● http://influxdb.org/download
● Ubuntu
  ○ sudo dpkg -i influxdb_latest_amd64.deb
● RedHat
  ○ sudo rpm -ivh influxdb-latest-1.i686.rpm
![Page 36: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/36.jpg)
Examples, but sadly no R :(
![Page 37: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/37.jpg)
HTTP API docs at http://influxdb.org/docs/api/http
![Page 38: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/38.jpg)
https://github.com/influxdb/influxdb-r
fork, write sweet code, submit PR, be loved and adored FOREVER
![Page 39: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/39.jpg)
Create a database
curl -X POST \
  'http://localhost:8086/db?u=root&p=root' \
  -d '{"name":"mydb", "replicationFactor": 3}'
![Page 40: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/40.jpg)
Add a user
curl -X POST \
  'http://.../db/mydb/users?u=root&p=root' \
  -d '{"name":"paul", "password": "foo", "admin": true}'
![Page 41: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/41.jpg)
Write points
curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
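The same write can be prepared from Python using only the standard library. The sketch below builds the request without sending it; the helper name `build_write_request` is made up for illustration, and the URL shape and JSON body simply follow the curl example on this slide.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, user, password,
                        series_name, columns, rows):
    """Build (but do not send) the POST request that writes points to one series."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each point needs one value per column")
    # Body shape copied from the slide: a list of series objects.
    body = [{
        "name": series_name,
        "columns": list(columns),
        "points": [list(row) for row in rows],
    }]
    url = f"{base_url}/db/{database}/series?" + urlencode({"u": user, "p": password})
    return Request(url, data=json.dumps(body).encode("utf-8"), method="POST")


request = build_write_request(
    "http://localhost:8086", "mydb", "paul", "pass", "foo", ["val"], [[3]]
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it, assuming a server is listening at that address.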
![Page 42: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/42.jpg)
Querying
curl \
  'http://...:8086/db/mydb/series?u=paul&p=pass&q=...'
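Because the query travels in the `q` URL parameter, it has to be URL-encoded. A minimal sketch with the standard library follows; `build_query_url` is a hypothetical helper, and the `localhost` host is an assumption borrowed from the other examples, since this slide elides it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, database, user, password, query):
    """Return the GET URL for a query, with the query text safely encoded."""
    params = urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{database}/series?{params}"


query = "select * from user_events where time > now() - 4h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["q"][0])
```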
![Page 43: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/43.jpg)
SQL(ish) Query Language
select * from user_events where time > now() - 4h
![Page 44: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/44.jpg)
JSON data returned
[
  {
    "name": "foo",
    "columns": ["time", "sequence_number", "val1", "val2"],
    "points": [
      [1384295094, 3, "paul", 23],
      [1384295094, 2, "john", 92],
      [1384295094, 1, "todd", 61]
    ]
  },
  {...}
]
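Since each series comes back as a `columns` list plus positional `points`, a client will usually zip the two together. The sketch below does that for the sample response from the slide (without the elided second series); `rows_as_dicts` is a name chosen here for illustration.

```python
import json

# Sample response copied from the slide, minus the elided second series.
response_text = """
[{"name": "foo",
  "columns": ["time", "sequence_number", "val1", "val2"],
  "points": [[1384295094, 3, "paul", 23],
             [1384295094, 2, "john", 92],
             [1384295094, 1, "todd", 61]]}]
"""


def rows_as_dicts(series_list):
    """Yield (series_name, row_dict) for every point in every series."""
    for series in series_list:
        for point in series["points"]:
            yield series["name"], dict(zip(series["columns"], point))


rows = list(rows_as_dicts(json.loads(response_text)))
for name, row in rows:
    print(name, row)
```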
![Page 45: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/45.jpg)
select count(state) from user_events
group by time(5m), state
where time > now() - 7d
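What `group by time(5m), state` computes can be sketched client-side: each point falls into a five-minute bucket, and points are counted per bucket and state. The sketch assumes buckets are aligned to multiples of the interval since the epoch, which the slides do not confirm, and the sample points are invented.

```python
from collections import Counter


def count_by_time_and_state(points, bucket_seconds=300):
    """Count points per (bucket_start, state).

    points: iterable of (unix_time_seconds, state) pairs.
    """
    counts = Counter()
    for timestamp, state in points:
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        counts[(bucket_start, state)] += 1
    return counts


# Invented sample data: (seconds, state)
sample = [(0, "NY"), (10, "NY"), (299, "CA"), (300, "NY"), (610, "CA")]
counts = count_by_time_and_state(sample)
for key in sorted(counts):
    print(key, counts[key])
```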
![Page 46: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/46.jpg)
select percentile(value, 90) from response_times
group by time(30s)
where time > now() - 1h
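A rough client-side equivalent of this query is shown below. It uses the nearest-rank definition of a percentile, which is an assumption: the slides do not say which definition InfluxDB uses, and other definitions interpolate between values. The function names and sample points are illustrative.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, percentile):
    """Return the nearest-rank percentile (0 < percentile <= 100) of values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_bucket(points, percentile, bucket_seconds):
    """Group (timestamp, value) points into time buckets and summarize each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return {start: nearest_rank_percentile(vals, percentile)
            for start, vals in sorted(buckets.items())}


# Invented response times: (seconds, milliseconds)
sample = [(0, 100), (5, 200), (29, 300), (30, 50)]
result = percentile_by_bucket(sample, 90, 30)
print(result)
```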
![Page 47: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/47.jpg)
Continuous Queries (downsampling)
select percentile(value, 90) from response_times
group by time(5m)
into response_times.percentiles.90
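The idea behind a continuous query, summarizing each time bucket as data arrives and writing the result to another series, can be sketched as a small streaming downsampler. This illustrates the concept only and says nothing about how InfluxDB schedules or executes continuous queries; the class name is hypothetical, it assumes points arrive in time order, and `max` stands in for whatever summary function is wanted.

```python
class StreamingDownsampler:
    """Summarize in-order points per time bucket, emitting each bucket once it closes."""

    def __init__(self, bucket_seconds, summarize):
        self.bucket_seconds = bucket_seconds
        self.summarize = summarize  # function: list of values -> summary
        self.current_start = None
        self.values = []

    def add(self, timestamp, value):
        """Feed one point; return a list of finished (bucket_start, summary) pairs."""
        start = timestamp - timestamp % self.bucket_seconds
        finished = []
        if self.current_start is not None and start != self.current_start:
            # The previous bucket is complete: summarize it and start a new one.
            finished.append((self.current_start, self.summarize(self.values)))
            self.values = []
        self.current_start = start
        self.values.append(value)
        return finished


downsampler = StreamingDownsampler(300, max)
emitted = []
for timestamp, value in [(0, 1), (100, 5), (299, 3), (300, 9), (650, 2)]:
    emitted.extend(downsampler.add(timestamp, value))
print(emitted)
```

The last bucket stays open until a later point arrives, which is why a real system also needs a timer or flush step.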
![Page 48: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/48.jpg)
Continuous queries for real-time processing & monitoring
![Page 49: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/49.jpg)
Regexes
select * from events
where email =~ /.*gmail\.com/
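The effect of a regex `where` clause can be sketched with Python's `re` module. Two assumptions are made here: that the match is unanchored (a search rather than a full match), and that Python's regex syntax is close enough to the database's for this pattern. The events are invented.

```python
import re

# Same pattern as in the query above.
GMAIL = re.compile(r".*gmail\.com")


def filter_events(events, column, pattern):
    """Keep events whose column value matches the pattern anywhere in the string."""
    return [event for event in events
            if pattern.search(str(event.get(column, "")))]


events = [
    {"email": "paul@gmail.com"},
    {"email": "todd@example.org"},
    {"email": "john@gmail.company.test"},
]
matches = filter_events(events, "email", GMAIL)
print(matches)
```

With an unanchored search the third address also matches, so a pattern ending in `$` would be needed to require that the address ends in `gmail.com`.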
![Page 50: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/50.jpg)
select percentile(value, 99)
from /stats\.*/
into :series_name.percentiles.99
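The fan-out this query describes, one output series per matching input series, can be sketched as follows. The helper and the series names are hypothetical, and treating `:series_name` as plain text substitution is an assumption about how the placeholder behaves.

```python
import re


def continuous_query_targets(series_names, pattern, template):
    """Map each series whose name matches the pattern to its output series name."""
    regex = re.compile(pattern)
    return {name: template.replace(":series_name", name)
            for name in series_names if regex.search(name)}


targets = continuous_query_targets(
    ["stats.cpu", "stats.mem", "user_events"],
    r"stats\.*",
    ":series_name.percentiles.99",
)
print(targets)
```

Read literally, `stats\.*` means "stats" followed by zero or more dots, so it matches any name containing "stats".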
![Page 51: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/51.jpg)
select count(value)
from seriesA merge seriesB
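One way to picture `merge` is as interleaving the points of two series into a single time-ordered stream before the aggregate runs. That reading is an assumption, since the slide does not spell out the semantics. A sketch using the standard library's `heapq.merge`, with invented data:

```python
import heapq


def merge_series(*series):
    """Merge series of (timestamp, value) points, each already sorted by timestamp."""
    return list(heapq.merge(*series, key=lambda point: point[0]))


series_a = [(1, 10), (4, 40)]
series_b = [(2, 20), (3, 30), (5, 50)]
merged = merge_series(series_a, series_b)
print(merged)
print(len(merged))  # the count over the merged stream
```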
![Page 52: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/52.jpg)
Querying
● Functions
  ○ count, min, max, mean, distinct, median, mode, percentiles, derivative, stddev
● Where clauses
● Group by clauses (time and other columns)
● Periodically delete old raw data
![Page 53: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/53.jpg)
Built in UI
![Page 54: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/54.jpg)
CLI
![Page 55: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/55.jpg)
Libraries
● Ruby
● Frontend JS
● Node
● Python
● PHP
● Go (soon)
● Java (soon)
![Page 56: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/56.jpg)
Ideas to come...
● Custom functions
  ○ Embedded Lua, YARN-like interface, or both?
● Custom real-time queries
  ○ define custom logic and InfluxDB will feed it data
● Queries triggering web hooks
  ○ pair with custom functions for monitoring/anomaly detection
![Page 57: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/57.jpg)
Project Status
● Based on work at https://errplane.com
  ○ 2 billion points per month
● http://influxdb.org
● Code available at https://github.com/influxdb
● API finalized in the next month
● Clustered version in December
● Production ready by end of year
![Page 58: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/58.jpg)
We’re available for consulting/help
![Page 59: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/59.jpg)
We need your help
● API, what else would you like to see?
● Client libraries
● Visualization tools
● Data collection integrations
● Comments/feedback on the mailing list
● http://influxdb.org/overview/
![Page 60: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/60.jpg)
Share the love
● Star or watch the project on http://github.com/influxdb/influxdb
● Tweet, blog, shout, whisper
● Participate in discussions on mailing list
![Page 61: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/61.jpg)
Come to the hackfest
● Monday, December 2nd at Pivotal
● http://meetup.com/nyc-influxdb-user-group
![Page 62: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/62.jpg)
OSS lives and dies by adoption/popularity
![Page 63: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/63.jpg)
MongoDB has 4,406 stars
![Page 64: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/64.jpg)
MongoDB valued at $1.2B
![Page 65: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/65.jpg)
Each star worth $272,355.00
![Page 66: Introduction to InfluxDB, an Open Source Distributed Time Series Database by Paul Dix](https://reader034.vdocuments.us/reader034/viewer/2022050801/540deab68d7f728d7e8b4b5c/html5/thumbnails/66.jpg)
Help InfluxDB get to 10k stars!
Go forth and build!