Do you have an "analytics"? How analytics tools work

Do you have an ‘analytics’?

Paul Fleetwood

Originally presented to Tech on Tap, April 24, 2014

@SplytAnalytics
www.splyt.com


Do you have an ‘analytics’?

• What are analytics / telemetry / instrumentation?
• How do they work?
• How can they be used efficiently?
• What can they do?

Define terms (these are mine)

• analytics - charts, graphs, and reports
• telemetry - collected data
• instrumentation - the code / process of collecting data

How does it work?

So, basically...

Instrument the code…

… to generate the telemetry …

… used to create analytics.

SDKs (Software Development Kits)

• Platform specific: iOS, Android, Windows, ...
• Language specific: PHP, Ruby, JavaScript, ...
• Domain specific: Mobile, Web, Embedded, ...

Usually a library or some source that has a few functions, which are used to report telemetry.

SDK example

Gathering telemetry for purchases:

SDK.init('myuserid');
<some purchase code>
SDK.event('purchase');

… or maybe ...

SDK.event('purchase', {item:'energy', price:5});
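To make those two calls concrete, here is a minimal sketch of what an SDK like this might do internally. Everything in it is an assumption for illustration: the `createSdk` factory, the `transport` callback, and the record fields are hypothetical and are not the SPLYT SDK.

```javascript
// Hypothetical in-memory SDK: init() remembers the user, event() builds a
// telemetry record and hands it to a transport function.
function createSdk(transport) {
  let userId = null;
  return {
    init(id) {
      userId = id;
    },
    event(name, properties = {}) {
      if (userId === null) {
        throw new Error('call init() before event()');
      }
      const record = { userId, name, properties, timestamp: Date.now() };
      transport(record);
      return record;
    },
  };
}

// Usage: collect records in an array instead of sending them over the network.
const sent = [];
const SDK = createSdk((record) => sent.push(record));
SDK.init('myuserid');
SDK.event('purchase');
SDK.event('purchase', { item: 'energy', price: 5 });
```

In a real SDK the transport would most likely batch records and send them over HTTP; an array keeps the sketch self-contained.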

Photo credit Flickr user Philip Wilson

Collect the data

Somewhere in the cloud, a server consumes the data and writes it to persistent storage.
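The collection step could be sketched roughly as follows. This is an assumption-laden illustration, not any particular product's server: `createCollector`, the status codes, and the `receivedAt` field are hypothetical, and an array stands in for the log file or database.

```javascript
// Hypothetical collector: validates an incoming JSON payload and appends it
// to storage as one JSON line per record.
function createCollector(storage) {
  return function ingest(body) {
    let record;
    try {
      record = JSON.parse(body);
    } catch (err) {
      return { status: 400, error: 'invalid JSON' };
    }
    const valid =
      record !== null &&
      typeof record === 'object' &&
      typeof record.userId === 'string' &&
      typeof record.name === 'string';
    if (!valid) {
      return { status: 400, error: 'userId and name are required' };
    }
    // Stamp the arrival time on the server, since client clocks may be wrong.
    storage.push(JSON.stringify({ ...record, receivedAt: Date.now() }));
    return { status: 202 };
  };
}

const log = [];
const ingest = createCollector(log);
const ok = ingest('{"userId":"myuserid","name":"purchase","properties":{"price":5}}');
const bad = ingest('not json');
```

Here `ok.status` is 202 and `bad.status` is 400; only the valid record reaches `log`.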


Photo credit Flickr user Mike Shelby

Photo credit Flickr user Team Dalog

Organize the data

The data may exist as:

• Log files - e.g. S3, NFS, etc.
• Row store DB - e.g. RDS, MySQL, Oracle, etc.
• Column store DB - e.g. Redshift, InfiniDB, etc.
• NoSQL - e.g. MongoDB, Couchbase, HBase, etc.
• Other - there is a lot of development going on around database technology


Create analytics

An analysis engine will process the collected telemetry, generating reports and graphs.

The kind of reports and tools provided will drive the kind of processing:

• SQL statement
• Hadoop job
• Real-time aggregates

Photo credit Flickr user Science Museum London

Photo credit Flickr user Team Dalog

OK, but how does it really work?

Two basic approaches

Nostradamus

Agile Ninja

Photo credit Flickr user Chris Christian

Approach #1: Nostradamus

● Decide (Predict) which analytics are needed
  ○ This can be an involved and lengthy process
● Instrument the application
● Profit! (Prophet?)

Approach #1: Nostradamus

How accurate was Nostradamus? How accurate are YOU?

Probably…

1) You forgot something

AND/OR

2) You never knew it in the first place

Approach #2: Agile Ninja

Don’t spend time building what you don’t need. Iterate and improvise as you go.

Question / Hypothesis

Instrument

Gather Telemetry

Deploy

Be self. Iterate until awesome.

But, there’s a problem...

Approach #2: Agile Ninja

Question / Hypothesis

Instrument

Gather Telemetry

Deploy

schedule engineer (so busy, bro)

test instrumentation

submit to 1st party (Apple, Google, Amazon, etc.)

review (2 weeks!?)

reject

how much data is enough? one month?!

generate analytics

answer no longer relevant

development effort? more engineering?

Photo credit Collider.com

Solutions

Learn to predict the future

OR

Eliminate / Reduce the Reinstrumentation / Redeployment / Development

(you should be doing something else)

Question

Answer

Solutions

In order to enable this cycle, a few things:

● The data must already be present
  ○ No need to deploy new hooks and wait for data to roll in
● The data must be connected in ways that make it useful
  ○ Group-bys, filters, etc.
● The data must be accessible without new development
  ○ A flexible data model and set of powerful generic “queries”

reinstrumentation & redeployment

development

Predicting needed data

Can we predict what data will be needed? No, so collect everything... that’s “meaningful”.

Can we predict what is meaningful? Yes, probably.

Event                              Meaningful?
User moves mouse pointer 1 pixel   no
User levels up                     yes
User navigates to a screen         yes

* There is probably a finite set of events in your application that model a user’s interaction with it.
* Even better, you probably already have code written to handle those interactions.
* Even, even better, a lot of those interactions probably go through the same code.

So, a little instrumentation can capture a lot of activity.
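As a sketch of that idea, suppose every screen change in an app already passes through one function. The `navigate` function, the `screens` map, and the stub `SDK` below are hypothetical names invented for this example.

```javascript
// Stub SDK that records what would be reported.
const reported = [];
const SDK = { event: (name, properties) => reported.push({ name, properties }) };

// Hypothetical app code: all screens are rendered through navigate().
const screens = new Map([
  ['home', () => 'Home'],
  ['shop', () => 'Shop'],
  ['settings', () => 'Settings'],
]);

function navigate(screenName) {
  const render = screens.get(screenName);
  if (!render) {
    throw new Error(`unknown screen: ${screenName}`);
  }
  // One instrumentation call here covers every screen in the app.
  SDK.event('navigate', { screen: screenName });
  return render();
}

navigate('home');
navigate('shop');
navigate('settings');
```

Adding a new screen to `screens` later needs no new instrumentation, because the hook lives in the shared code path.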

Other kinds of data

Events are one kind of data, but event metadata holds a lot of power, and there can be a lot of it. Remember our purchase?

SDK.event('purchase', {item:'energy', price:5, level:7, coinbalance:3100});

● event occurrence: 'purchase'
● event state (some of this): item, price
● user state (lots of this): level, coinbalance

This one event supports:

● number of purchases
● revenue from purchases
● revenue from purchases for each item
● number of purchases of each type of item by users @ level 7
● purchases by user level
● purchases by user coin balance

… SPLYT calls this associated state “context”.
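A rough sketch of how several of the listed numbers fall out of events that carry this context. The sample values and the `aggregateBy` helper are made up for illustration.

```javascript
// Sample telemetry (made-up values) in the shape of the purchase event above.
const events = [
  { name: 'purchase', item: 'energy', price: 5, level: 7, coinbalance: 3100 },
  { name: 'purchase', item: 'energy', price: 5, level: 3, coinbalance: 200 },
  { name: 'purchase', item: 'shield', price: 8, level: 7, coinbalance: 950 },
];

// Sum valueOf(row) per group; pass () => 1 to count occurrences instead.
function aggregateBy(rows, keyOf, valueOf) {
  const totals = {};
  for (const row of rows) {
    const key = keyOf(row);
    totals[key] = (totals[key] || 0) + valueOf(row);
  }
  return totals;
}

const purchases = events.filter((e) => e.name === 'purchase');

const numberOfPurchases = purchases.length; // 3
const revenue = purchases.reduce((sum, e) => sum + e.price, 0); // 18
const revenueByItem = aggregateBy(purchases, (e) => e.item, (e) => e.price); // { energy: 10, shield: 8 }
const purchasesByLevel = aggregateBy(purchases, (e) => e.level, () => 1); // { '3': 1, '7': 2 }
const itemsAtLevel7 = aggregateBy(
  purchases.filter((e) => e.level === 7),
  (e) => e.item,
  () => 1
); // { energy: 1, shield: 1 }
```

None of these numbers required a new event; they are all different cuts of the same purchase records.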

User state is a problem

• Event state is specific to an event, and is likely to be limited and localized
• User state accumulates over time and across systems
  • There can be tons of it
  • It isn’t localized and might not be “handy” when reporting an event
• SPLYT has updateUserState and updateDeviceState API calls to manage this
  • Allows for reporting state where it is known and when it changes

SDK.updateUserState({ level:7 });

SDK.updateUserState({ coinbalance:3100 });

SDK.event('purchase', {item:'energy', price:5});

…

…
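One way such calls could work is sketched below, under the assumption that the latest known user state is attached to each event at the moment it is reported. Whether SPLYT joins state on the client or on the server is not stated here, so treat `createSdk` and the record layout as hypothetical.

```javascript
// Hypothetical client: remembers the latest user state and attaches a
// snapshot of it to every event reported afterwards.
function createSdk() {
  const userState = {};
  const sent = [];
  return {
    sent,
    updateUserState(changes) {
      Object.assign(userState, changes);
    },
    event(name, eventState = {}) {
      // Copy the state so later updates do not rewrite past events.
      sent.push({ name, eventState, userState: { ...userState } });
    },
  };
}

const SDK = createSdk();
SDK.updateUserState({ level: 7 });
SDK.updateUserState({ coinbalance: 3100 });
SDK.event('purchase', { item: 'energy', price: 5 });
SDK.updateUserState({ level: 8 });
```

The purchase code only supplies what it knows locally (item and price), yet the stored record still carries level 7 and a coin balance of 3100.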

Even more types of data

This is like a triple store

● Temporal association - one activity occurs while another activity is occurring
● Entity relationships - one entity’s activity involves another entity
  ○ One user shoots a cannon at another user
    ■ Event: shoots cannon
    ■ Association: “targets user B”
● Spatial relationships - where things happen - how near?

Transactions vs. Events

● Events are instantaneous
● Transactions span time
  ○ Allows for temporal association
● SPLYT provides an API for declaring when a transaction (activity) begins and ends

Events can be modeled as transactions

SDK.beginTransaction('purchase', {item:'energy', price:5});

SDK.endTransaction('purchase', {result:'success'});

Not everything is known at the start of an activity, and the end might never come
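A minimal sketch of a transaction tracker that handles both points: properties learned late are merged in at the end, and transactions whose end never arrives are closed by a timeout. The `createTracker` factory, the `expire` method, and the `'timeout'` result are assumptions for illustration, not the SPLYT API.

```javascript
// Hypothetical transaction tracker. The clock is injectable so the example
// is deterministic; a real client would use Date.now.
function createTracker(now = Date.now) {
  const open = new Map();
  const finished = [];
  return {
    finished,
    beginTransaction(name, properties = {}) {
      open.set(name, { name, properties: { ...properties }, startedAt: now() });
    },
    endTransaction(name, properties = {}) {
      const tx = open.get(name);
      if (!tx) return; // end without begin: nothing to close
      open.delete(name);
      // Properties learned during the activity are merged in at the end.
      Object.assign(tx.properties, properties);
      finished.push({ ...tx, duration: now() - tx.startedAt });
    },
    // Close transactions whose end never arrived (crash, abandoned flow).
    expire(maxAge) {
      for (const [name, tx] of open) {
        if (now() - tx.startedAt >= maxAge) {
          open.delete(name);
          finished.push({
            ...tx,
            properties: { ...tx.properties, result: 'timeout' },
            duration: maxAge,
          });
        }
      }
    },
  };
}

let clock = 0;
const SDK = createTracker(() => clock);
SDK.beginTransaction('purchase', { item: 'energy', price: 5 });
clock = 1500;
SDK.endTransaction('purchase', { result: 'success' });
SDK.beginTransaction('tutorial');
clock = 70000;
SDK.expire(60000);
```

After this runs, `SDK.finished` holds a successful purchase that lasted 1500 ms and a tutorial transaction closed with result 'timeout'.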

Build context outside of the app

● The app is a terrible place to try and compute analytics

○ Designed to be … the app!

● External software can be designed to build context

● SPLYT is an app that lives in a world where it is fed events and is built specifically to stitch all this data into context

Question

Answer

Solutions

In order to enable this cycle, a few things:

● The data must already be present
  ○ No need to deploy new hooks and wait for data to roll in
● The data must be connected in ways that make it useful
  ○ Group-bys, filters, etc.
● The data must be accessible without new development
  ○ A flexible data model and set of powerful generic “queries”

reinstrumentation & redeployment

development

A data model and a tool

SPLYT models things with entities (users, devices, etc.), their state, and the transactions (activities) they perform.

Some general question forms (all filterable by event state and entity state):

● Count of events (by event state or entity state)
  ○ count of purchases (by item or by user level)
● Computation of event state or entity state (by event state or entity state)
  ○ sum of purchase prices (by item or user level)
● All of that by “parent” event state or entity state
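The first two question forms could be sketched as one generic query function. This is an illustration only: the record layout, the dotted-path convention, and the `query` signature are assumptions, and they say nothing about how Slicer or SPLYT actually implement queries.

```javascript
// Made-up records: each event carries its own state plus a user-state snapshot.
const records = [
  { name: 'purchase', event: { item: 'energy', price: 5 }, user: { level: 7 } },
  { name: 'purchase', event: { item: 'shield', price: 8 }, user: { level: 7 } },
  { name: 'purchase', event: { item: 'energy', price: 5 }, user: { level: 3 } },
  { name: 'levelup', event: {}, user: { level: 4 } },
];

// Resolve a dotted path such as 'user.level' against a record.
const get = (record, path) =>
  path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), record);

// Hypothetical generic query: count events, or sum one field, optionally
// filtered by exact-match conditions and grouped by another field.
function query(rows, { name, where = {}, groupBy, sum }) {
  const result = {};
  for (const row of rows) {
    if (row.name !== name) continue;
    const matches = Object.entries(where).every(
      ([path, expected]) => get(row, path) === expected
    );
    if (!matches) continue;
    const key = groupBy ? String(get(row, groupBy)) : 'all';
    result[key] = (result[key] || 0) + (sum ? get(row, sum) : 1);
  }
  return result;
}

const countByItem = query(records, { name: 'purchase', groupBy: 'event.item' });
const revenueByLevel = query(records, { name: 'purchase', groupBy: 'user.level', sum: 'event.price' });
const level7Revenue = query(records, { name: 'purchase', where: { 'user.level': 7 }, sum: 'event.price' });
```

Because the grouping and filtering keys are data rather than code, a new question becomes a new query object instead of new development.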

We made Slicer, a web tool used to build these standard types of queries.

It can be done!

● Report all meaningful data
● Build lots of connections between the data (everything by everything)
● Have a general, powerful data model
● Have a tool that allows you to manipulate and explore the data model


Photo credit Flickr user Science Museum London

Photo credit Flickr user Team Dalog

What can you do?

Questions?

Contact me!

Paul Fleetwood

Splyt.com
@SplytAnalytics
