Do you have an "analytics"? How analytics tools work
DESCRIPTION
Originally presented April 24, 2014 to Orlando's Tech on Tap meetup by Paul Fleetwood, Sr. Software Engineer at SPLYT. Paul gives an overview of how analytics platforms work and how SPLYT's revolutionary process delivers better answers to all your questions.
TRANSCRIPT
Do you have an ‘analytics’?
Paul Fleetwood
Originally presented to Tech on Tap, April 24, 2014
@SplytAnalytics
www.splyt.com
Do you have an ‘analytics’?
• What are analytics / telemetry / instrumentation?
• How do they work?
• How can they be used efficiently?
• What can they do?
Define terms (these are mine)
• analytics - charts, graphs, and reports
• telemetry - collected data
• instrumentation - the code / process of collecting data
How does it work?
So, basically...
Instrument the code…
… to generate the telemetry …
… used to create analytics.
SDKs (Software Development Kits)
• Platform specific: iOS, Android, Windows,...
• Language specific: PHP, Ruby, JavaScript,...
• Domain specific: Mobile, Web, Embedded,...
Usually a library or some source that has a few functions, which are used to report telemetry.
SDK example
Gathering telemetry for purchases:
SDK.init('myuserid');
<some purchase code>
SDK.event('purchase');
… or maybe ...
SDK.event('purchase', {item:'energy', price:5});
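To make the shape of such an SDK concrete, here is a minimal sketch of what a client library with init and event calls might do internally. Everything in it is hypothetical: createSdk, the payload fields, and the injected send function are assumptions for illustration, not the API of SPLYT or any real SDK.

```javascript
// Hypothetical telemetry SDK: not a real library's API.
// `send` is injected so the sketch runs without a network; a real SDK
// would typically transmit the payload to a collection server.
function createSdk(send) {
  let userId = null;
  return {
    // Remember who the telemetry belongs to.
    init(id) {
      userId = id;
    },
    // Report one event, with optional properties describing it.
    event(name, properties = {}) {
      if (userId === null) {
        throw new Error('call init() before event()');
      }
      send({ userId, name, properties, timestamp: Date.now() });
    },
  };
}

// Usage: collect payloads in an array instead of sending them anywhere.
const sent = [];
const SDK = createSdk((payload) => sent.push(payload));
SDK.init('myuserid');
SDK.event('purchase', { item: 'energy', price: 5 });
```

Injecting the transport keeps the sketch testable; the calls at the bottom mirror the slide's example.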
Photo credit Flickr user Philip Wilson
Collect the data
Somewhere in the cloud, a server consumes the data and writes it to persistent storage.
Photo credit Flickr user Mike Shelby
Photo credit Flickr user Team Dalog
Organize the data
The data may exist as:
• Log files - e.g. S3, NFS, etc.
• Row store DB - e.g. RDS, MySQL, Oracle, etc.
• Column store DB - e.g. Redshift, InfiniDB, etc.
• NoSQL - e.g. Mongo, Couchbase, HBase, etc.
• Other - there is a lot of development going on around database technology
Create analytics
An analysis engine will process the collected telemetry, generating reports and graphs.
The kind of reports and tools provided will drive the kind of processing:
• SQL statement
• Hadoop job
• Real-time aggregates
Photo credit Flickr user Science Museum London
OK, but how does it really work?
Two basic approaches
Nostradamus
Agile Ninja
Photo credit Flickr user Chris Christian
Approach #1: Nostradamus
● Decide (predict) which analytics are needed
○ This can be an involved and lengthy process
● Instrument the application
● Profit! (Prophet?)
Approach #1: Nostradamus
How accurate was Nostradamus? How accurate are YOU?
Probably…
1) You forgot something
AND/OR
2) You never knew it in the first place
Approach #2: Agile Ninja
Don’t spend time building what you don’t need. Iterate and improvise as you go.
Question / Hypothesis
Instrument
Gather Telemetry
Deploy
Be self. Iterate until awesome.
But, there’s a problem...
Approach #2: Agile Ninja
Question / Hypothesis
Instrument
Gather Telemetry
Deploy
schedule engineer (so busy, bro)
test instrumentation
submit to 1st party (Apple, Google, Amazon, etc.)
review (2 weeks!?)
reject
how much data is enough? one month?!
generate analytics
answer no longer relevant
development effort? more engineering?
Photo credit Collider.com
Solutions
Learn to predict the future
OR
Eliminate / Reduce the Reinstrumentation / Redeployment / Development
(you should be doing something else)
Question
Answer
Solutions
To enable this cycle, a few things must be true:
● The data must already be present
○ No need to deploy new hooks and wait for data to roll in
● The data must be connected in ways that make it useful
○ Group-by’s, filters, etc.
● The data must be accessible without new development
○ A flexible data model and set of powerful generic “queries”
reinstrumentation & redeployment
development
Predicting needed data ???
Can we predict what data will be needed? No, so collect everything... that’s “meaningful”.
Can we predict what is meaningful? Yes, probably.
Event                              Meaningful?
User moves mouse pointer 1 pixel   no
User levels up                     yes
User navigates to a screen         yes
● There is probably a finite set of events in your application that model a user’s interaction with it.
● Even better, you probably already have code written to handle those interactions.
● Even, even better, a lot of those interactions probably go through the same code.
So, a little instrumentation can capture a lot of activity.
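As a sketch of that last point: if every screen change already goes through one function, a single added line reports all of them. The navigateTo function and the report callback below are hypothetical stand-ins for your own app code and for whatever SDK call you use.

```javascript
// Hypothetical app code: every screen change already goes through navigateTo.
// `report` stands in for an SDK call such as SDK.event (an assumption here).
const reported = [];
const report = (name, properties) => reported.push({ name, properties });

let currentScreen = 'home';

function navigateTo(screen) {
  // The one line of instrumentation: it covers every screen in the app.
  report('navigate', { from: currentScreen, to: screen });
  currentScreen = screen;
  // ... existing code that actually renders the screen would go here ...
}

navigateTo('store');
navigateTo('inventory');
navigateTo('home');
```

Three navigations produce three telemetry events without touching any individual screen's code.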
Other kinds of data
Events are one kind of data, but event metadata holds a lot of power, and there can be a lot of it. Remember our purchase?
SDK.event('purchase', {item:'energy', price:5, level:7, coinbalance:3100});
● 'purchase' - event occurrence
● item, price - event state (some of this)
● level, coinbalance - user state (lots of this)
From this one call you can derive:
● number of purchases
● revenue from purchases
● revenue from purchases for each item
● number of purchases of each type of item by users @ level 7
● purchases by user level
● purchases by user coin balance
… SPLYT calls this associated state “context”.
User state is a problem
• Event state is specific to an event, and is likely to be limited and localized
• User state accumulates over time and across systems
○ There can be tons of it
○ It isn’t localized and might not be “handy” when reporting an event
• SPLYT has updateUserState and updateDeviceState API calls to manage this
○ Allows for reporting state where it is known and when it changes
SDK.updateUserState({ level:7 });
SDK.updateUserState({ coinbalance:3100 });
SDK.event('purchase', {item:'energy', price:5});
…
…
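One way to picture how separately reported state can end up attached to later events is sketched below. This is an assumption about the general idea, not a description of how SPLYT implements it: createTracker and its internals are hypothetical, and in practice the merge may well happen on the server rather than in the client.

```javascript
// Hypothetical sketch: keep the latest known user state and attach a
// snapshot of it to every event as context. Not SPLYT's implementation.
function createTracker() {
  const userState = {};
  const events = [];
  return {
    // Report state where it is known, whenever it changes.
    updateUserState(changes) {
      Object.assign(userState, changes);
    },
    // The event only carries its own state; user state is stitched on.
    event(name, eventState = {}) {
      events.push({ name, eventState, userState: { ...userState } });
    },
    events,
  };
}

const tracker = createTracker();
tracker.updateUserState({ level: 7 });
tracker.updateUserState({ coinbalance: 3100 });
tracker.event('purchase', { item: 'energy', price: 5 });
tracker.updateUserState({ coinbalance: 3095 });
```

Copying the state at event time matters: the purchase keeps the coin balance that was current when it happened, even after the balance changes.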
Even more types of data
This is like a triple store
● Temporal association - one activity occurs while another activity is occurring
● Entity relationships - one entity’s activity involves another entity
○ One user shoots a cannon at another user
■ Event: shoots cannon
■ Association: “targets user B”
● Spatial relationships - where things happen - how near?
Transactions vs. Events
● Events are instantaneous
● Transactions span time
○ Allows for temporal association
● SPLYT provides an API for declaring when a transaction (activity) begins and ends
Events can be modeled as transactions
SDK.beginTransaction('purchase', {item:'energy', price:5});
SDK.endTransaction('purchase', {result:'success'});
Not everything is known at the start of an activity, and the end might never come
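A rough sketch of the bookkeeping a begin/end API implies, including the case where the end never arrives. The names createTransactionLog, begin, end, and stillOpen, and the simplification of one open transaction per name, are assumptions for illustration rather than SPLYT's behavior.

```javascript
// Hypothetical sketch of transaction bookkeeping; timestamps are passed in
// explicitly so the example is deterministic.
function createTransactionLog() {
  const open = new Map(); // name -> { properties, startedAt }
  const finished = [];
  return {
    begin(name, properties, now) {
      open.set(name, { properties: { ...properties }, startedAt: now });
    },
    // Properties learned later are merged in when the transaction ends.
    end(name, properties, now) {
      const tx = open.get(name);
      if (!tx) return; // an end without a begin is ignored in this sketch
      open.delete(name);
      finished.push({
        name,
        properties: { ...tx.properties, ...properties },
        durationMs: now - tx.startedAt,
      });
    },
    // Transactions whose end never came (for example, the app was killed).
    stillOpen() {
      return [...open.keys()];
    },
    finished,
  };
}

const log = createTransactionLog();
log.begin('purchase', { item: 'energy', price: 5 }, 1000);
log.begin('level', { number: 7 }, 1200);
log.end('purchase', { result: 'success' }, 4000);
```

Here the purchase completes with a result that was unknown at the start, while the 'level' transaction stays open, so anything that happened in between can be associated with it.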
Build context outside of the app
● The app is a terrible place to try and compute analytics
○ Designed to be … the app!
● External software can be designed to build context
● SPLYT is an app that lives in a world where it is fed events and is built specifically to stitch all this data into context
Question
Answer
Solutions
To enable this cycle, a few things must be true:
● The data must already be present
○ No need to deploy new hooks and wait for data to roll in
● The data must be connected in ways that make it useful
○ Group-by’s, filters, etc.
● The data must be accessible without new development
○ A flexible data model and set of powerful generic “queries”
reinstrumentation & redeployment
development
A data model and a tool
SPLYT models things with entities (users, devices, etc), their state, and the transactions (activities) they perform.
Some general question forms (all filterable by event state and entity state):
● Count of events (by event state or entity state)
○ count of purchases (by item or by user level)
● Computation of event state or entity state (by event state or entity state)
○ sum of purchase prices (by item or user level)
● All of that by “parent” event state or entity state
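These question forms can be sketched as one generic function over events that carry context. The runQuery function, its option names, and the sample data below are hypothetical; they illustrate the general shape of such queries, not Slicer or SPLYT's query engine.

```javascript
// Hypothetical events with event state and entity (user) state attached.
const events = [
  { name: 'purchase', eventState: { item: 'energy', price: 5 }, userState: { level: 7 } },
  { name: 'purchase', eventState: { item: 'shield', price: 3 }, userState: { level: 7 } },
  { name: 'purchase', eventState: { item: 'energy', price: 5 }, userState: { level: 2 } },
  { name: 'levelup', eventState: {}, userState: { level: 3 } },
];

// Generic query: pick an event, optionally filter, group, and aggregate.
// With no `measure`, each event counts as 1, so the result is a count.
function runQuery(allEvents, { event, where = () => true, groupBy, measure = () => 1 }) {
  const result = {};
  for (const e of allEvents) {
    if (e.name !== event || !where(e)) continue;
    const key = String(groupBy(e));
    result[key] = (result[key] || 0) + measure(e);
  }
  return result;
}

// Count of purchases by item.
const countByItem = runQuery(events, {
  event: 'purchase',
  groupBy: (e) => e.eventState.item,
});

// Sum of purchase prices by user level, filtered to one item.
const energyRevenueByLevel = runQuery(events, {
  event: 'purchase',
  where: (e) => e.eventState.item === 'energy',
  groupBy: (e) => e.userState.level,
  measure: (e) => e.eventState.price,
});
```

Because the filter, grouping, and measure are just parameters, a new question becomes a new set of options rather than new instrumentation or new engineering work.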
We made Slicer, a web tool used to build these standard types of queries.
It can be done!
● Report all meaningful data
● Build lots of connections between the data (everything by everything)
● Have a general, powerful data model
● Have a tool that allows you to manipulate and explore the data model
What can you do?