digital analytics with r - sydney users of r forum - may 2015

45
Sydney Users of R Forum Digital Analytics with R

Upload: johann-de-boer

Post on 28-Jul-2015

233 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Digital analytics with R - Sydney Users of R Forum - May 2015

Sydney Users of R Forum

Digital Analytics with R

Page 2: Digital analytics with R - Sydney Users of R Forum - May 2015
Page 3: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics User Conference 2014 #GAUC2014

Google Analytics

Page 4: Digital analytics with R - Sydney Users of R Forum - May 2015

What is Google Analytics?▪ Used to measure and analyse your

online audience to gain insights and drive continuous improvement

▪ The most widely adopted web analytics platform

▪ Launched in November 2005

▪ Free and premium versions available

Page 5: Digital analytics with R - Sydney Users of R Forum - May 2015

Interfacing with Google Analytics

Data Collection

APIs

Management API

Reporting APIs

Big Query

Page 6: Digital analytics with R - Sydney Users of R Forum - May 2015

Platform overview▪ Collection APIs

▸ Collect and send data to Google Analytics for storage and processing

▪ Management API▸ Configure Google Analytics accounts

▸ Import off-line data

▪ Reporting APIs▸ Query and extract processed data from

Google Analytics for reporting

Page 7: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Tag Manager

Data Collection – overview

Measurement Protocol

iOS SDK

Android SDK

analytics.js

Page 8: Digital analytics with R - Sydney Users of R Forum - May 2015

Management API▪ Import off-line data

▸ Ecommerce refunds, advertising costs

▸ Product, content, campaign, and customer metadata

▪ Get and set configuration details▸ List accounts, properties and views

▸ Goals, filters, and view settings

▸ User permissions, segment definitions

Page 9: Digital analytics with R - Sydney Users of R Forum - May 2015

Reporting APIs▪ Core Reporting API

▸ Reporting over a date range

▪ Multi-Channel Funnels Reporting API▸ The marketing journey of customers

▪ Real-Time Reporting API▸ What is happening right now

▪ Metadata API▸ For building your own reporting tools

Page 10: Digital analytics with R - Sydney Users of R Forum - May 2015

Why use the Reporting APIs?▪ Surpass the limits of the GA user

interface

▪ Automate repetitive tasks

▪ Perform more powerful data analysis

▪ Easily reproduce and reuse analyses

▪ Combine with other data sources

▪ Develop innovative tools

▪ Feed into other tools and applications

Page 11: Digital analytics with R - Sydney Users of R Forum - May 2015

API concepts

Your application

Reporting API

Query(GET)

Response(JSON)

Page 12: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics User Conference 2014 #GAUC2014

ganalytics for R

Page 13: Digital analytics with R - Sydney Users of R Forum - May 2015

Goals of the ‘ganalytics’ R package

▪ Access to all Google Analytics features

▪ Easy to use and familiar to R users

▪ Abstract away from API technicalities

▪ Interactively build and manipulate queries

▪ Detect and correct query definition errors

▪ Integrate with R classes, functions and other packages

Page 14: Digital analytics with R - Sydney Users of R Forum - May 2015

Getting started

Page 15: Digital analytics with R - Sydney Users of R Forum - May 2015

Installation and “Hello World!”

devtools::install_github("jdeboer/ganalytics")

library(ganalytics)

creds <- GoogleApiCreds( "[email protected]", # Your Google username "client_secret.json" # From Google APIs console)

# Default view of first account and propertyview <- NA

query <- GaQuery(view, creds)

GetGaData(query)

Page 16: Digital analytics with R - Sydney Users of R Forum - May 2015

“Hello World!” of ganalytics

Page 17: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics User Conference 2014 #GAUC2014

Understanding Google Analytics

queries and reports

Page 18: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics Data

Hit SessionUser

Metrics:▪ 3 users▪ 14 sessions▪ 53 hits▪ ...

Dimensions▪ City▪ Page title▪ Device type▪ Product SKU▪ ...

Property / View

Page 19: Digital analytics with R - Sydney Users of R Forum - May 2015

Filtering hits

Session segmentation

User segmentation

Page 20: Digital analytics with R - Sydney Users of R Forum - May 2015

Filters and segments

▪ Segmentation selects which users or sessions to include before summarising.

▸ Segments can optionally be defined by a sequence of user interactions.

▪ Filters select which rows to keep in the resulting summary.

Page 21: Digital analytics with R - Sydney Users of R Forum - May 2015

Query and response

▪ From which Google Analytics account / property / view?

▪ Over what date range?

▪ Which metrics to summarise and over what dimensions?

▪ Optional segment to apply before summarising

▪ Optional filter to select rows from resulting summary table

Page 22: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics User Conference 2014 #GAUC2014

Defining your query and getting data into

R with ganalytics

Page 23: Digital analytics with R - Sydney Users of R Forum - May 2015

Example queryquery <- GaQuery( view = 94605596, startDate = "2015-01-01", endDate = "2015-01-31", metrics = c("users", "sessions", "pageviews"), dimensions = c("deviceCategory", "dayOfWeekName"), filter = Expr("deviceCategory", "~=", "mobile|tablet"), segment = Expr("country", "==", "Australia"))

response <- GetGaData(query)

Page 24: Digital analytics with R - Sydney Users of R Forum - May 2015

Example query response deviceCategory dayOfWeekName users sessions pageviews1 mobile Friday 429 500 8162 mobile Monday 273 329 4663 mobile Saturday 339 399 6404 mobile Sunday 256 301 4875 mobile Thursday 420 475 6896 mobile Tuesday 314 358 5427 mobile Wednesday 304 357 5718 tablet Friday 209 235 4249 tablet Monday 160 190 35510 tablet Saturday 195 225 43511 tablet Sunday 128 151 33412 tablet Thursday 217 266 45513 tablet Tuesday 169 194 37414 tablet Wednesday 161 178 356

Page 25: Digital analytics with R - Sydney Users of R Forum - May 2015

Conditional expressionsexpr1 <- Expr(variable, comparator, value)expr2...

expr3 <- expr1 | !expr2expr5 <- expr3 & expr4expr6 <- xor(expr1, expr2)

Page 26: Digital analytics with R - Sydney Users of R Forum - May 2015

From R to an API calle1 <- Expr('keyword', '=@', 'buy')e2 <- Expr('city', '=~', '^(Sydney|Melbourne)$')e3 <- Expr('deviceCategory', '==', 'tablet')

e4 <- e1 & (e2 | e3)

as(e4, 'character')

# ga:keyword=@buy;ga:city=~^(sydney|melbourne)$,ga:deviceCategory==tablet

Page 27: Digital analytics with R - Sydney Users of R Forum - May 2015

How does traffic from desktop, mobile and tablet users change

throughout the day and over the week?

Average number of visits per hour and day – split by desktop, mobile and

tablet

Page 28: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 1: Define the queryquery <- GaQuery(view = '98765432')

DateRange(query) <- c("2014-01-01", "2014-12-31")

Dimensions(query) <- c( "deviceCategory", "dayOfWeekName", "hour", "date")

Metrics(query) <- "sessions"

MaxResults(query) <- 30000

Page 29: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 2: Get data and summarisedata <- GetGaData(query)

library(dplyr)

weekly_data <- tbl_df(data) %>% group_by(deviceCategory, dayOfWeekName, hour) %>% summarise(avg_sessions_per_day = mean(sessions))

# deviceCategory dayOfWeekName hour avg_sessions_per_day#1 desktop Friday 0 83.83673#2 desktop Friday 1 81.79167#3 desktop Friday 2 77.29167#4 desktop Friday 3 80.35417#5 desktop Friday 4 91.60417#.. ... ... ... ...

Page 30: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 3: Plot the summarylibrary(ggplot2)

ggplot(weekly_data) + aes( x = hour, y = avg_sessions_per_day, fill = deviceCategory, group = deviceCategory ) + geom_area(position = "stack") + facet_wrap(~dayOfWeekName)

Page 31: Digital analytics with R - Sydney Users of R Forum - May 2015

ggplot2 + dplyr + ganalytics =

Page 32: Digital analytics with R - Sydney Users of R Forum - May 2015

Which permutation of two different device types are signed-in users most likely to transition between?

Define sequences to segment users by permutations of device type and

compare against total users of each type of device.

Page 33: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 1: Setup query (needs UserID)Dimensions(query) <- NULL

Metrics(query) <- c("users")

DateRange(query) <- c("2015-01-01", "2015-03-31")# Maximum of 90 days for user-based segmentation

Page 34: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 2: Define sequence segmentsdevices <- list( desktop = Expr("deviceCategory", "==", "desktop"), mobile = Expr("deviceCategory", "==", "mobile"), tablet = Expr("deviceCategory", "==", "tablet"))

device_sequences <- lapply(devices, function(from) { lapply(devices, function(to) { GaSegmentCondition( GaSequenceCondition( GaStartsWith(from), GaPrecedes(to) ), scope = "users" ) })})

Page 35: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 3: Get data for each segmentdata <- lapply(seq_along(device_sequences), function(from_index){ from_name <- names(device_sequences)[from_index] from_seq <- device_sequences[[from_index]] lapply(seq_along(from_seq), function(to_index){ to_name <- names(from_seq)[to_index] GaSegment(query) <- from_seq[[to_index]] df <- GetGaData(query) df <- cbind(from = from_name, to = to_name, df) })})

data <- unlist(data, recursive = FALSE)data <- do.call(rbind, data)

Page 36: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 4: Summarise the datalibrary(dplyr)

benchmarks <- data %>% subset(from == to) %>% select(from, benchmark = users)

data <- data %>% subset(from != to) %>% inner_join(benchmarks, by = "from") %>% group_by(from, to) %>% summarise(transitioned = users / benchmark)

# from to transitioned# 1 desktop mobile 0.05565673# 2 desktop tablet 0.02547988# 3 mobile desktop 0.16147748# 4 mobile tablet 0.03634899# 5 tablet desktop 0.15945559# 6 tablet mobile 0.07034457

Page 37: Digital analytics with R - Sydney Users of R Forum - May 2015

Step 5: Plot the resultslibrary(ggplot2, scales)

ggplot(data) + aes(x = from, y = to, fill = transitioned) + geom_tile() + scale_fill_continuous(labels = percent) + theme_minimal(base_size = 18) + theme(axis.title.y = element_text(angle = 0))

Page 38: Digital analytics with R - Sydney Users of R Forum - May 2015

Google Analytics User Conference 2014 #GAUC2014

Using ganalytics with other

Google Analytics APIs

Page 39: Digital analytics with R - Sydney Users of R Forum - May 2015

Reporting APIs

GaQuery() Core Reporting API

Standard reporting over a date range

McfQuery() Multi-channel funnels reporting API

Requires conversion tracking to be set up (goals or ecommerce)

RtQuery() Real-time reporting API

Request developer access first

Page 40: Digital analytics with R - Sydney Users of R Forum - May 2015

Other APIs

MetaData API

Internal package use only

Management API

GaAccounts()GaUserSegments()

Work in progress so please use with caution!

Google Tag Manager API

GtmAccounts()

Page 41: Digital analytics with R - Sydney Users of R Forum - May 2015

Get involved!Open source R package development is

fun!

Page 42: Digital analytics with R - Sydney Users of R Forum - May 2015

Remaining work to complete▪ Package documentation

▪ Adding examples and demos

▪ Further error handling and messages

▪ Testing and finding bugs to fix

▪ Submitting to CRAN

▪ Code optimisation and tidy up

▪ Feature enhancements and improvements

▪ Occasional updates for API changes

▪ Internationalisation

Page 43: Digital analytics with R - Sydney Users of R Forum - May 2015

Enhancement suggested by HadleyInstead of:Expr("source", "!=", "google")

Write:Expr(source != "google")

Instead of:Expr("medium", "[]", c("cpc", "display"))

Write:Expr(medium %in% c("cpc", "display")

Page 44: Digital analytics with R - Sydney Users of R Forum - May 2015

Enhancement suggested by HadleyInstead of:Expr("pageviews", ">", 5)

Write:Expr(pageviews > 5)

Instead of:Expr(dateOfSession, "<>", c("2015-01-01", "2015-01-31"))

Write:Expr(dateOfSession ... ? ...) What ideas

do you have?

Page 45: Digital analytics with R - Sydney Users of R Forum - May 2015

How to get involved ...

▪ Go to the Github site:github.com/jdeboer/ganalytics

▪ Submit feedback via the issues tracker

▪ Get in touch on twitter: @johannux

▪ Download these slides from:http://lovesdata.co/0QRjR