warp10: collect, store and manipulate sensor data - breizhcamp - 2016 03-24
@FinistSeb @LostInBrittany #BzhCmp #Warp10
Warp 10: Collect, store and manipulate sensor data
Horacio Gonzalez Sébastien Lambour
Horacio Gonzalez
@LostInBrittany
Cityzen Data
Spaniard lost in Brittany, developer, dreamer and all-around geek
Sébastien Lambour
@FinistSeb
Cityzen Data
Runner, 2 Kids, Geek, Handyman, Polyglot JVM Developer
Warp 10
Introduction: Geo-Time Series™
Image: Spacetime distortions
Time Series
Image: Mike Bostock
Time series storage and analysis
Image: Hamza Fessi and ABC Bourse
Not suited for your vanilla SQL RDBMS
One simple example: moving averages...
Geo-Time Series™
Image: AIS Vessel Tracking
Geo-Time Series
Geo-Time Series and the IoT
Image: LinkedIn
IoT means talking things
How fast are they talking?
Internet of very introverted Things
Long range transmissions
Internet of introverted Things
Personal Area Network
Internet of shy Things
Local Area Network
Cellular Networks
Lots of shy things generate a huge amount of data
Image: Universal Studios
Internet of chatty Things
10 000 Hz
670 000 sensors
20 000 metrics
per second
Internet of garrulous Things
Image: Google
Millions of metrics per second
Warp 10: a software platform for IoT
Warp 10 is a software platform that
● Ingests and stores data
● Manipulates and analyzes data
● Is dedicated to data from sensors, meters, IoT and any real or virtual probe
Warp 10 General Overview
Storage Architecture
Language, Functions, Algorithms
Application access
Visualization
Real Time
#collect
How do you get these metrics?
Image: Games Radar
Using our own Sensision agent
With queue forwarder
Using plugins for other collecting systems
Or simply pushing data directly
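Pushing data directly amounts to an HTTP POST of datapoint lines. The sketch below only builds such a request and does not send it. It assumes the standard Warp 10 ingress path `/api/v0/update` and the `X-Warp10-Token` header; the host, the token and the `fuel.price` datapoint are hypothetical placeholders.

```python
import urllib.request


def build_update_request(base_url, write_token, lines):
    """Build (but do not send) a POST request carrying datapoint lines."""
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/update",  # assumed ingress endpoint
        data=body,
        method="POST",
        headers={"X-Warp10-Token": write_token},  # assumed token header
    )


request = build_update_request(
    "http://localhost:8080",  # hypothetical Warp 10 instance
    "WRITE_TOKEN",  # placeholder, not a real token
    ["1458806400000000// fuel.price{type=diesel} 1.05"],  # hypothetical datapoint
)
# urllib.request.urlopen(request) would actually send the data.
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is pushed.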
Choosing an input format
XML? JSON?
XML: 139 bytes
JSON: 108 bytes
Warp 10 GTS Input Format
57 bytes
But size isn't the most important reason: parsing time is far more important
XML or even JSON parsing is slow and costly
Parsing the Warp 10 GTS input format isn't
Warp 10 GTS Input Format
timestamp (microseconds by default)
latitude:longitude (WGS84)
elevation (millimeters)
classname*
labels (key=value)
value* (long, double, boolean or string)
* mandatory fields
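A minimal sketch of how these fields could be serialized into one line, assuming the `TS/LAT:LON/ELEV classname{labels} value` layout with optional fields left empty, booleans written as `T`/`F`, and strings single-quoted and percent-encoded. The helper names and the sample datapoint are hypothetical.

```python
from urllib.parse import quote


def format_value(value):
    """Render a long, double, boolean or string value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "T" if value else "F"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + quote(str(value), safe="") + "'"


def format_datapoint(classname, value, timestamp_us=None,
                     lat=None, lon=None, elevation_mm=None, labels=None):
    """Serialize one datapoint; only classname and value are mandatory."""
    ts = "" if timestamp_us is None else str(timestamp_us)
    location = "" if lat is None or lon is None else f"{lat}:{lon}"
    elevation = "" if elevation_mm is None else str(elevation_mm)
    label_text = ",".join(
        f"{quote(str(k), safe='')}={quote(str(v), safe='')}"
        for k, v in sorted((labels or {}).items())
    )
    name = quote(classname, safe="")
    return f"{ts}/{location}/{elevation} {name}{{{label_text}}} {format_value(value)}"


line = format_datapoint(
    "fuel.price", 1.05,
    timestamp_us=1458806400000000,
    lat=48.115434, lon=-1.636877,
    labels={"type": "diesel"},
)
print(line)
# 1458806400000000/48.115434:-1.636877/ fuel.price{type=diesel} 1.05
```

One datapoint per line and no nesting means a parser can split on a handful of fixed separators, which is the parsing-cost argument made on the previous slide.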
#store
From tiny to huge
Image: Games Radar
Warp 10 on Raspberry Pi B+
1 000 datapoints per second
Warp 10 on Raspberry Pi 2 B
3 000 datapoints per second
Warp 10 on a modern server
120 000 datapoints per second
Warp 10 on a cluster
3 million datapoints per second (our current record for input traffic)
#analyse
From tiny to huge
Image: Amazon
Many time-series solutions
TSAR
But they are only stores...
Fetching data is only the tip of the iceberg
Analysing the data
High level analysis must be done elsewhere
Algorithms are resource hungry
Your computer is not a datacenter
Manipulating GTS
To be scalable, analysis must be done on the Warp 10 platform, not on the user's computer
Manipulating GTS
A true GTS analysis toolbox
○ Hundreds of functions
○ Manipulation frameworks
○ Analysis workflow
GTS manipulation
Why not a simple REST API?
● One endpoint per function?
● How do you chain an analysis workflow?
A REST API is not suitable for complex manipulations
GTS manipulation
Why not a SQL dialect?
● How do you do a simple moving average in SQL?
● How do you do geo-time fencing in SQL?
SQL is not suited to (G)TS analysis!
GTS manipulation language
Our solution: a GTS manipulation language
WarpScript
A stack based language
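To illustrate what "stack based" means in general, here is a tiny postfix evaluator. It is not WarpScript and shares none of its functions: operands are pushed onto a stack, and each operator pops its arguments and pushes its result. The `evaluate` helper is hypothetical.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(program):
    """Evaluate a whitespace-separated postfix program, return the final stack."""
    stack = []
    for token in program.split():
        if token in OPERATORS:
            right = stack.pop()  # top of the stack is the right-hand operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack


print(evaluate("1 2 + 3 *"))  # [9.0]
```

Because every step consumes what the previous steps left on the stack, operations chain naturally without nested calls or intermediate variables.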
WarpScript
Non-compiled
Optimized functions, fast execution
Basic operations
Five frameworks
More than 500 functions
Time series functions
Geo-Time Series functions
Geo mapping (WKT)
Horizontal & vertical speed
Horizontal & vertical distance
Haversine...
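As a reminder of what a haversine computation involves, the sketch below gives the great-circle distance between two WGS84 points. It assumes a spherical Earth with a mean radius of 6 371 km; the function name is hypothetical and this is not Warp 10's implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
```

Dividing such a distance by the time elapsed between two consecutive datapoints gives a horizontal speed.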
Quantum IDE
Enough teasing...
Fuel prices data
16 297 448 metrics
11 379 fuel stations
42 885 Geo Time Series
Basic analysis
Average diesel fuel prices in France since 2007
Image: LEGO Ideas
First, fetch the data (SQL vs WarpScript)
FETCH gives us a GTS list
● Timestamp (microseconds since epoch)
● Location (latitude, longitude)
● Value
Calculate the average
Using Groovy:
Calculate the average with WarpScript
1- Calculate the mean price by station
BUCKETIZE framework
Put the data of a GTS into regularly spaced buckets
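The bucketing idea can be sketched outside WarpScript as follows. A series is modeled as a list of `(timestamp, value)` pairs, each bucket is labeled by its start and its values are aggregated with a mean. The `bucketize` function and the sample prices are hypothetical, and Warp 10's own convention for bucket ticks may differ.

```python
from collections import defaultdict
from statistics import mean


def bucketize(points, bucket_span, aggregate=mean):
    """Group (timestamp, value) pairs into fixed-width buckets and aggregate each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = (timestamp // bucket_span) * bucket_span
        buckets[bucket_start].append(value)
    # Empty buckets are simply absent from the result.
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


prices = [(0, 1.0), (5, 3.0), (12, 10.0)]  # hypothetical datapoints
print(bucketize(prices, bucket_span=10))  # [(0, 2.0), (10, 10.0)]
```

Once every series uses the same bucket boundaries, their ticks line up and the series can be compared or combined directly.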
2- Reduce to get the global average
REDUCE framework
Apply a function on a set of GTS tick by tick
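A tick-by-tick reduction can be sketched like this, assuming each series is a list of `(tick, value)` pairs with aligned ticks. The `reduce_series` function and the station data are hypothetical illustrations, not Warp 10's implementation.

```python
from collections import defaultdict
from statistics import mean


def reduce_series(series_list, reducer=mean):
    """Combine several series into one by applying `reducer` to the values at each tick."""
    values_by_tick = defaultdict(list)
    for series in series_list:
        for tick, value in series:
            values_by_tick[tick].append(value)
    return [(tick, reducer(values)) for tick, values in sorted(values_by_tick.items())]


station_a = [(0, 1.0), (10, 2.0)]  # hypothetical per-station means
station_b = [(0, 3.0), (10, 4.0), (20, 5.0)]
print(reduce_series([station_a, station_b]))
# [(0, 2.0), (10, 3.0), (20, 5.0)]
```

A tick present in only some of the series is reduced over the values that exist, which is why aligning ticks first matters.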
Too verbose? Write it differently
Even more concise
Basic analysis
Mean of the last available diesel fuel prices in France
Image: LEGO Ideas
Fetching data (SQL vs WarpScript)
FETCH gives us a GTS list
Mean of those last prices
Align ticks with the BUCKETIZE framework
Compute the average with REDUCE
Geo-time analysis
Find the cheapest fuel station near here
48.115434, -1.636877
WKT: Well-known text geometry
…
WKT in WarpScript
Geo-filtering points of GTS
MAPPER framework
Apply a function on values of a GTS that fall into a sliding window
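The sliding-window idea can be sketched as follows, with the window expressed as a number of datapoints before and after the current one. With a mean as the function, this is the simple moving average mentioned earlier. The `map_sliding` function is a hypothetical illustration, not Warp 10's MAPPER framework.

```python
from statistics import mean


def map_sliding(points, pre, post, func):
    """Apply `func` to the values in a window of `pre` points before and `post` after each tick."""
    ordered = sorted(points)
    result = []
    for index, (tick, _) in enumerate(ordered):
        window = [value for _, value in ordered[max(0, index - pre): index + post + 1]]
        result.append((tick, func(window)))
    return result


series = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]  # hypothetical datapoints
print(map_sliding(series, pre=1, post=0, func=mean))
# [(0, 1.0), (1, 1.5), (2, 2.5), (3, 3.5)]
```

Swapping `mean` for `max`, `min` or any other function changes the computation without touching the windowing logic.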
The stations near my position
There can only be one
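The whole "cheapest station near here" search can be sketched in a few lines: parse a WKT polygon, keep the stations inside it, then take the minimum price. The polygon, the station list and both helper functions are hypothetical, and the parser only handles a single-ring `POLYGON`. Note that WKT coordinates are written as longitude then latitude.

```python
import re


def parse_wkt_polygon(wkt):
    """Parse a single-ring WKT POLYGON into a list of (lon, lat) vertices."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only a single-ring POLYGON is supported in this sketch")
    ring = []
    for pair in match.group(1).split(","):
        lon, lat = pair.split()
        ring.append((float(lon), float(lat)))
    return ring


def contains(ring, lon, lat):
    """Ray-casting test: count how many edges a ray going east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):  # the edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical area around 48.115434, -1.636877
area = parse_wkt_polygon(
    "POLYGON ((-1.70 48.08, -1.60 48.08, -1.60 48.15, -1.70 48.15, -1.70 48.08))"
)
# Hypothetical stations: (name, lon, lat, last diesel price)
stations = [
    ("A", -1.64, 48.11, 1.02),
    ("B", -1.65, 48.12, 0.99),
    ("C", -1.50, 48.11, 0.90),  # cheaper, but outside the area
]
nearby = [s for s in stations if contains(area, s[1], s[2])]
cheapest = min(nearby, key=lambda s: s[3])
print(cheapest[0])  # B
```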
And this is only the surface
Possibilities are endless
Think differently
Geo-Time Series are everywhere
Warp 10 platform and tools
Everything is on GitHub
https://github.com/cityzendata/
Thank you!