(Re-)introducing SnowPlow

Upload: yalisassoon

Post on 27-Jan-2015

Category:

Technology


DESCRIPTION

A deck describing what we believe is wrong with web analytics in 2012, and how we have architected SnowPlow to address those problems in a fresh way.

TRANSCRIPT

Page 1: (Re-)introducing SnowPlow

Introducing SnowPlow

A new approach to web analytics

Page 2: (Re-)introducing SnowPlow

A lot is wrong with web analytics today…

Siloed
• Hard to integrate with other sources of customer data, including CRM, email marketing, social marketing, customer service, financial systems and ad serving systems
• Typically separated from other business intelligence systems, with each system used to answer different types of business questions

Narrowly focused
• Focus on marketing-related analytics (visits, click-throughs, conversions)
• Focus on ecommerce sites (limited number of goals, limited set of clearly defined workflows, e.g. sign up to email, purchase product)
• No analytics for SaaS-based businesses, drivers of customer value, product analytics

Inflexible
• Hard to perform analyses on users / customers that span multiple visits
• Hard to examine the ways users actually engage on sites (esp. for SaaS / web apps) or aggregate customer journeys
• Hard to map and segment users based on their behaviour and customer journeys
• Limited tools to pick out the root cause of differences in customer journey

Too high level AND too low level
• Too high level: impractical or impossible to zoom in on individual customers and events
• Too low level: hard to see the wood for the trees in a sea of data / pre-defined views

Page 3: (Re-)introducing SnowPlow

…with bad consequences for businesses

Cannot answer important business questions

• Questions related to the customer base
– Who are our most valuable customers?
– How can I spot them in advance?
– What are the “sliding doors” moments in a customer’s journey that impact their future value?
– How does our customer base break down, by behaviour?
– How well do I serve each segment?
– How well do I monetize each segment?
– Where are the best opportunities for growing the value of my customer base?
• Product development questions
– How successful has each product iteration been at driving user engagement?
– Does our product work better for some customer segments than others? If so, why?
– Does our product work better at some parts of the customer journey than others? Where?
– Where should we focus product development efforts?

Hard to export web analytics data to answer questions in other systems

• Two reasons to export our data:
– So that we can answer business questions using this data in another (more appropriate) system
– So that we can use this data in other value-generating ways, e.g. to drive product / content recommendation, service personalisation
• Sometimes impossible
– Impossible to export granular data out of Google Analytics
• Otherwise expensive
– Enterprise web analytics products charge for export based on data volumes, making export expensive for large data sets
• Hard to house exported data
– Web analytics systems generate large volumes of data, which can be costly to warehouse and query

Page 4: (Re-)introducing SnowPlow

SnowPlow takes a radically new approach to web analytics…

Traditional approach

1. What reports do we want to deliver?

2. What data do we collect to support those reports?

SnowPlow approach

1. What is all the available data that we could ever want?

2. What tools will empower our analysts to answer any possible business question?

Page 5: (Re-)introducing SnowPlow

…one that starts from the principle of having all the data

Capture all data
• All data is captured via easy-to-implement JavaScript tags
• Light-weight event tracking makes it easy to capture any type of online behaviour
• No limits on the number, type or categories of events or variables that can be assigned
• Data is stored in Amazon S3 for scalability
• Data can be enriched from other 1st and 3rd party sources. (Data can be exported and imported)

Powerful analytics toolset
• Latest big data and cloud computing technologies for data storage and querying
• Data is queried using Facebook-developed Apache Hive via Elastic MapReduce, making it easy to run queries against enormous data sets
• Possible to run any big data analytics toolset (e.g. Mahout, Cascalog, MicroStrategy) on SnowPlow data

Complete data ownership
• Data capture is via 1st party cookies
• JavaScript tracking and ETL source code is open source
• All data is stored in SnowPlow users’ own Amazon S3 accounts

Page 6: (Re-)introducing SnowPlow

To date, SnowPlow users can query data using Apache Hive, which is great for analysts but bad for business users

Hive is a data warehousing platform
• Built on top of Hadoop: scalable
• Developed at Facebook, but now widely used at e.g. Netflix, OpenX, The Globe and Mail
• Enables analysts to query data using SQL

SnowPlow data is stored in a single Hive table
• Each line of data represents one event (e.g. page view, add-to-basket, video play, ad view etc.)
• Each line of data includes a user_id and visit_id
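To make the one-row-per-event layout concrete, here is a minimal Python sketch of how event-level rows keyed by user_id and visit_id can be rolled up into per-user figures that span multiple visits. It is an illustration under assumptions, not SnowPlow's actual schema or tooling: only user_id and visit_id come from the slide, while the event_type field, the Event class, the summarise_users helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str     # field named in the slide
    visit_id: int    # field named in the slide
    event_type: str  # hypothetical field name, for illustration only


def summarise_users(events):
    """Roll event rows up to one summary per user, across all their visits."""
    visits = defaultdict(set)   # user_id -> distinct visit_ids seen
    counts = defaultdict(int)   # user_id -> number of events
    for e in events:
        visits[e.user_id].add(e.visit_id)
        counts[e.user_id] += 1
    return {u: {"visits": len(visits[u]), "events": counts[u]} for u in counts}


# Made-up sample data: one row per event.
SAMPLE_EVENTS = [
    Event("u1", 1, "page_view"),
    Event("u1", 1, "add_to_basket"),
    Event("u1", 2, "page_view"),
    Event("u2", 1, "video_play"),
]

summary = summarise_users(SAMPLE_EVENTS)
```

Because every row carries both identifiers, the same data can be grouped by visit, by user, or by event type without changing how it was collected.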

Pros
• Easy for anyone with SQL knowledge to run queries
• Straightforward to aggregate data
• Straightforward to ingest new data sources to enrich the web analytics data (e.g. CRM data, media catalogues)
• Interactive UI allows for ad hoc query development sessions
• Straightforward to export aggregated data sets into other tools
• Possible to schedule jobs to populate e.g. KPI dashboard

Cons
• Command-line interface not suitable for many business people
• No in-built data visualisation capability. (Have to export data to a separate application)
• KPI dashboards can be driven from Hive analysis, but always require the integration of another application

Page 7: (Re-)introducing SnowPlow

Our priority now is to develop the toolset to answer business questions using all this analytics data

SnowPlow web analytics data

KPIs and standard reports
• Enable analysts to easily create and distribute KPI dashboards and reports including on customer lifetime value and cohort analysis
• Reports will vary in scope e.g. for management team, marketing teams, product development team etc.

Ad hoc analytics
• Enable analysts with more limited SQL and programming knowledge to query data e.g. pivot tables, data visualisation tools
• Statistical and machine learning tools to perform e.g. behavioural segmentations of customer base, predict likely customer lifetime value

Operational systems e.g. recommendation engines, marketing
• Use SnowPlow data in live systems e.g. in-store product recommendation…
• …or to send personalised marketing to customers to drive up customer satisfaction
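As a rough illustration of the cohort analysis mentioned among the KPI reports above, the following Python sketch groups users by the month of their first recorded event and counts how many are active in each later month. It is a sketch under assumptions rather than a planned SnowPlow tool: the cohort_retention function, the (user_id, date) input shape and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def cohort_retention(events):
    """events: iterable of (user_id, date) pairs.

    Returns {(cohort "YYYY-MM", months since first event): active users}.
    """
    events = list(events)  # iterated twice below
    first_seen = {}
    for user_id, day in events:
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    active = defaultdict(set)
    for user_id, day in events:
        first = first_seen[user_id]
        cohort = f"{first.year}-{first.month:02d}"
        offset = month_index(day) - month_index(first)
        active[(cohort, offset)].add(user_id)
    return {key: len(users) for key, users in active.items()}


# Made-up sample data.
SAMPLE = [
    ("u1", date(2012, 1, 5)),
    ("u1", date(2012, 2, 10)),
    ("u2", date(2012, 1, 20)),
    ("u3", date(2012, 2, 1)),
    ("u3", date(2012, 2, 15)),
]

retention = cohort_retention(SAMPLE)
```

Each key pairs a cohort with a month offset, so a dashboard could lay the result out as a grid of cohorts against months since first visit.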

Some of the analytics tools we develop will be offered as cloud-based solutions, for a monthly subscription

Page 8: (Re-)introducing SnowPlow

Whilst many of the tools are not yet developed, we recommend installing SnowPlow today

1. Start warehousing your web analytics data using SnowPlow today

2. Start using the already available (free, open source) tools, particularly Apache Hive, to drive insight from your user data today

3. Have a large data set ready for when our more business friendly analytics tools become available

Download SnowPlow from GitHub: github.com/snowplow/snowplow

Contact Keplar LLP for support and consultancy: www.keplarllp.com