Why use big data tools to do web analytics? And how to do it using Snowplow and Qubole


Uploaded by yalisassoon on 27-Jan-2015


DESCRIPTION

There are a number of mature web analytics products that have been on the market for ~20 years. Big data tools have only really taken off in the last 5 years. So why use big data tools to mine web analytics data? In this presentation, I explore the limitations of traditional approaches to web analytics, and explain how big data tools can be used to address those limitations and drive more value from the underlying data. I explain how a combination of Snowplow and Qubole can be used to do this in practice.

TRANSCRIPT

Page 1

Using big data tools to analyse web analytics data

Why use big data tools to analyse web analytics data?
How would you use big data tools to analyse web analytics data (with Snowplow and Qubole)?

Page 2

Web event data is incredibly valuable

• It tells you how your customers actually behave (in lots of detail), and how that varies:
  • Between different customers
  • For the same customers over time (seasonality, progress in the customer journey)
  • How behaviour drives value

• It tells you how customers engage with you via your website / webapp:
  • How that varies by different versions of your product
  • How improvements to your product drive increased customer satisfaction and lifetime value

• It tells you how customers and prospective customers engage with your different marketing campaigns and how that drives subsequent behaviour

Web analytics data should be essential to driving customer development, product development and marketing decisions

Page 3

Deriving value from web analytics data often involves very bespoke analytics
• The web is a rich and varied space! E.g.:
  • Bank
  • Newspaper
  • Social network
  • Analytics application
  • Government organisation (e.g. tax office)
  • Retailer
  • Marketplace

• For each type of business you’d expect different:
  • Types of events, with different types of associated data
  • Ecosystem of customers / partners with different types of relationships
  • Product development cycle (and approach to product development)
  • Types of business questions / priorities to inform how the data is analysed

Page 4

Web analytics tools are good at delivering the standard reports that are common across different business types…
• Where does your traffic come from? E.g.:
  • Sessions by marketing campaign / referrer
  • Sessions by landing page

• Understanding events common across business types (page views, transactions, ‘goals’), e.g.:
  • Page views per session
  • Page views per web page
  • Conversion rate by traffic source
  • Transaction value by traffic source

• Capturing contextual data common to people browsing the web:
  • Timestamps
  • Referrer data
  • Web page data (e.g. page title, URL)
  • Browser data (e.g. type, plugins, language)
  • Operating system (e.g. type, timezone)
  • Hardware (e.g. mobile / tablet / desktop, screen resolution, colour depth)

Page 5

…but not at enabling the high-value bespoke analytics

• What is the impact of different ad campaigns and creative on the way users subsequently behave? What is the return on that ad spend?

• How do visitors use social channels (Facebook / Twitter) to interact around video content? How can we predict which content will “go viral”?

• How do updates to our product change the “stickiness” of our service? ARPU? Does that vary by customer segment?

Page 6

That is because there are significant limitations in the way traditional web analytics programmes handle data collection, data processing and data access:

Data collection
• Sample-based (e.g. Google Analytics)
• Limited set of events, e.g. page views, goals, transactions
• Limited set of ways of describing events (custom dim 1, custom dim 2…)

Data processing
• Data is processed ‘once’: no validation, and no opportunity to reprocess e.g. following an update to business rules
• Data is aggregated prematurely:
  • Only particular combinations of metrics / dimensions can be pivoted together (Google Analytics)
  • Only particular types of analysis are possible on different types of dimension (e.g. sProps, eVars, conversion goals in SiteCatalyst)

Data access
• Data is either aggregated (e.g. Google Analytics), or available as a complete log file for a fee (e.g. Adobe SiteCatalyst)
• As a result, data is siloed: hard to join with other data sets

Page 7

We built Snowplow to address those limitations and enable high value, bespoke analytics on web event data

[Diagram: data pipeline → big data store]

Snowplow is a data pipeline:
• Captures data from your website via Javascript tags
• Validates, cleans, and enriches the incoming data (using Hadoop)
• Loads the cleaned / enriched data into a big data store (e.g. S3), where it can be analysed using big data tools, e.g. Qubole

Page 8

Understanding the technology that powers the Snowplow data pipeline

The Snowplow data pipeline consists of five loosely coupled modules:

Page 9

Trackers generate event data:
• Javascript tracker for collecting data client-side
• No-JS / pixel tracker (e.g. for email marketing)
• Server-side trackers (e.g. Lua tracker); Python / Ruby / Java / Scala on the roadmap
• Mobile trackers (iOS, Android on the roadmap…)
• Internet of things (e.g. Arduino tracker)

Page 10

Collectors receive data and write it to a queue for processing:
• Cloudfront collector writes data to S3
• Clojure collector sets a 3rd-party cookie and writes to S3
• Scala RT collector sets a 3rd-party cookie and writes to S3 AND Kinesis

Page 11

Enrichment validates and enriches the data:
• Validates, e.g. checks expected fields are set for each event type
• Enrichments, e.g. categorising referrers (search / social), inferring location from IP address

• Hadoop-based enrichment module (easy reprocessing of data)
• Kinesis-based enrichment module (real-time processing) in development
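To illustrate what these enrichments make possible downstream, here is a minimal Hive query sketch (not from the deck) that segments visitors by categorised referrer and inferred country, run against the events table described on the following pages. The column names refr_medium and geo_country are assumptions based on Snowplow's enriched event model and may differ between versions.

-- Hypothetical query over the enriched Snowplow events table (see Pages 13-14).
-- refr_medium (e.g. 'search', 'social') and geo_country are assumed to be populated
-- by the referrer-categorisation and IP-location enrichments respectively.
SELECT
  refr_medium,
  geo_country,
  COUNT(DISTINCT domain_userid) AS unique_visitors
FROM events
GROUP BY refr_medium, geo_country
ORDER BY unique_visitors DESC;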

Page 12

Storage – make data available for analysis:
• Store data in Amazon S3 for processing using big data tools, e.g. Qubole
• Also support storage in Amazon Redshift / PostgreSQL for analysis using traditional BI tools

Page 13

So what does Snowplow data look like?

• A single table

• One line of data per event

• Fat table: 98 different fields (and counting)…

Type of field | Example field(s) | Description
User ID | domain_userid, network_userid | Fields to identify the user doing the browsing. 1st- and 3rd-party cookie IDs, browser fingerprints, IP address and a separate field for setting a custom value are all available
Web page | page_urlpath | Fields that describe the web page the event occurred on, including document size, URL and title
Traffic source | mkt_source, refr_source | Fields that indicate the source of traffic. Snowplow includes fields that can be set via utm parameters and others based on the referrer
Event (rather than context) | event, se_action, tr_total | Fields that relate to a specific event (e.g. transaction total)
User tech setup | br_type, os_name, dvce_type, br_viewheight | Fields that describe the user’s browser / OS / device setup
… | … | …
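To make the shape of this table concrete, here is a heavily trimmed Hive DDL sketch; it is not from the deck and covers only a handful of the 98 fields. The column types, partition key, delimiter and S3 location are illustrative assumptions that should be checked against the Snowplow documentation for the version in use.

-- Hypothetical, trimmed-down definition of the Snowplow events table (the real table has 98+ columns).
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  collector_tstamp STRING,   -- timestamp at which the collector received the event
  event            STRING,   -- event type (e.g. page_view, transaction, struct)
  domain_userid    STRING,   -- 1st-party cookie user ID
  network_userid   STRING,   -- 3rd-party cookie user ID
  page_urlpath     STRING,   -- path of the page the event occurred on
  mkt_source       STRING,   -- marketing source (set via utm parameters)
  refr_source      STRING,   -- source derived from the referrer
  refr_medium      STRING,   -- referrer category (assumption: populated by the referrer enrichment)
  geo_country      STRING,   -- country (assumption: populated by the IP-location enrichment)
  se_action        STRING,   -- structured event action
  tr_total         DOUBLE,   -- transaction total
  br_type          STRING,   -- browser type
  os_name          STRING,   -- operating system
  dvce_type        STRING    -- device type (mobile / tablet / desktop)
)
PARTITIONED BY (run STRING)                      -- assumption: one partition per enrichment run
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'   -- assumption: enriched events stored as tab-separated values
LOCATION 's3://your-snowplow-bucket/enriched/';  -- replace with your own S3 path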

Page 14

How do you analyse Snowplow data with Qubole?

• Common approach: use Hive on Qubole (could also use Pig or other Hadoop-based jobs)

• Create the events table (incl. recovering partitions)

• Write highly bespoke queries directly against the complete events table
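As a rough sketch of what those two steps can look like in the Qubole Hive console (assuming the simplified events table above): first recover the partitions already sitting in S3, then run a bespoke query against the full event-level data, in this case unique visitors, transactions and revenue by marketing source. The exact partition-recovery statement depends on the Hive distribution, and the query itself is illustrative rather than taken from the deck.

-- 1. Pick up partitions added to S3 since the table was created.
--    Qubole / EMR Hive supports RECOVER PARTITIONS; stock Hive uses MSCK REPAIR TABLE instead.
ALTER TABLE events RECOVER PARTITIONS;
-- MSCK REPAIR TABLE events;

-- 2. An illustrative bespoke query against the complete event-level table:
--    unique visitors, transaction count and revenue by marketing source.
SELECT
  mkt_source,
  COUNT(DISTINCT domain_userid)                                   AS unique_visitors,
  SUM(CASE WHEN event = 'transaction' THEN 1 ELSE 0 END)          AS transactions,
  SUM(CASE WHEN event = 'transaction' THEN tr_total ELSE 0.0 END) AS revenue
FROM events
GROUP BY mkt_source
ORDER BY revenue DESC;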

Page 15

DEMO!

Page 16

Performing more sophisticated analysis

• Unfortunately there’s not time on this webinar to do a deeper demo…

• …however, there are resources available, in particular, the Snowplow Analytics Cookbook - http://snowplowanalytics.com/analytics/index.html