Snowplow at the heart of busuu's data & analytics infrastructure

Snowplow Meetup London, February 2017

Upload: giuseppe-gaviani

Post on 12-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Snowplow Meetup

London, February 2017


busuu is the world’s leading social network for language learning

1. Language courses
• High quality courses in 12 languages
• Beginner to advanced intermediate level

+

2. Social network
• Access to native speakers
• Peer-to-peer text corrections

How does busuu work?

Most important vocabulary

Key grammar

Practice with native speakers

Faster fluency

busuu is a complete self-study and language practice environment


busuu 2016

What sort of data do we use?

● Front end tracking data

● Progress data

● Backend db data

● Third party data

Why did we look at using Snowplow?


Problems

My data says X, why does yours say Y?

CloudWatch Alert!

“Why can’t I find the results of my A/B test until tomorrow, again?”

“Oh my god, do we really have to put yet another tracker in?”


Scalability


Batch vs Real time


Reconciliation


Then we thought...

Can we use the Snowplow framework for more than just analytics?


Too many SDKs and trackers


Snowplow delivery

How do we get Snowplow to deliver the events to everyone and everything that needs them, instead of adding more trackers to the frontend?

Tech Stack


Data Collection Phase

[Diagram: Events, Backend Data and API Calls feeding the Scala Stream Collector; part of the diagram is marked "Yet to be done"]


Processing


Raw Data → Stream Enrich → Enriched Data


Processing

Validation
● Customised busuu event schemas
● Different based on environment

Enrichments
● IP lookup
● Forex conversion
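
The slides do not show how the per-environment validation is wired up. As a rough illustration only, the sketch below checks a self-describing event (a `schema` URI plus a `data` payload, the shape Snowplow uses for custom events) against a small in-memory registry keyed by environment. The schema name `lesson_completed`, the field names and the `SCHEMAS` registry are all hypothetical, and the hand-rolled type check stands in for the JSON Schema validation that Snowplow itself performs during enrichment.

```python
import re

# Iglu schema URIs have the form iglu:vendor/name/format/version
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.\-]+)/(?P<name>[\w.\-]+)/(?P<format>\w+)/"
    r"(?P<version>\d+-\d+-\d+)$"
)

# Hypothetical registry: required fields and their types, per environment
SCHEMAS = {
    "production": {
        ("com.busuu", "lesson_completed"): {
            "user_id": str, "lesson_id": str, "score": int,
        },
    },
    "development": {
        ("com.busuu", "lesson_completed"): {"user_id": str, "lesson_id": str},
    },
}


def validate(event, environment):
    """Return a list of validation errors (an empty list means valid)."""
    match = IGLU_URI.match(event.get("schema", ""))
    if match is None:
        return ["malformed schema URI"]
    key = (match["vendor"], match["name"])
    required = SCHEMAS.get(environment, {}).get(key)
    if required is None:
        return [f"unknown schema {key[0]}/{key[1]} in {environment}"]
    data = event.get("data", {})
    errors = []
    for field, expected_type in required.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors
```

With this registry, an event carrying only `user_id` and `lesson_id` passes in `development` but is rejected in `production`, which is one way "different based on environment" could play out.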


Distribution


Results back to App/Site

Machine Learning Models

Yet to be done


Plug & Play Integrations


● One source of truth
● Scalability
● Third party systems can be added very quickly


Lambda?


● Parse through each field of the enriched data looking for the custom schema name
● One Lambda function per type of data and per integration
● Relay the required data to the third party service through its REST API or a given Python client
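
The three steps above could be sketched as a single Lambda handler. This is a minimal sketch, assuming the enriched events arrive as tab-separated lines on a Kinesis stream that triggers the function (Kinesis payloads reach Lambda base64-encoded under `Records[].kinesis.data`). The schema name `com.busuu/purchase_completed` and the `relay` callable are hypothetical; in a real integration `relay` would wrap the third party's REST API or Python client.

```python
import base64

# Hypothetical custom schema name, for illustration only
SCHEMA_NAME = "com.busuu/purchase_completed"


def find_matching_fields(enriched_line, schema_name):
    """Return every TSV field of an enriched event that mentions schema_name."""
    return [field for field in enriched_line.split("\t") if schema_name in field]


def make_handler(schema_name, relay):
    """Build a Lambda handler for one type of data and one integration.

    relay is any callable that forwards a payload to the third party service.
    """
    def handler(event, context):
        relayed = 0
        for record in event["Records"]:
            # Kinesis record payloads reach Lambda base64-encoded
            line = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
            for field in find_matching_fields(line, schema_name):
                relay(field)
                relayed += 1
        return {"relayed": relayed}
    return handler
```

Passing `relay` in as a parameter keeps each function small and makes "one Lambda per type of data and per integration" a matter of calling `make_handler` with a different schema name and client.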

Problem areas


Main implementation bugbears

1. Strict Multi Platform Schemas

2. Offline mode delay

3. Device vs Collector Timestamps
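
Items 2 and 3 are related: an event created offline is sent late, and the device clock may disagree with the collector's. One common way to reconcile them, similar in spirit to Snowplow's `derived_tstamp`, is to subtract the time the event waited on the device from the collector timestamp. The sketch below assumes all three timestamps are available on the event; the function name is illustrative, not part of any library.

```python
from datetime import datetime, timedelta, timezone


def derived_timestamp(collector_tstamp, dvce_created_tstamp, dvce_sent_tstamp):
    """Estimate when the event happened, expressed on the collector's clock."""
    # Time the event waited on the device (for example while offline).
    # Both values come from the same device clock, so any skew cancels out.
    delay_on_device = dvce_sent_tstamp - dvce_created_tstamp
    return collector_tstamp - delay_on_device


# Device clock runs one hour fast; the event waits 30 minutes offline.
created = datetime(2017, 2, 1, 9, 0, 0, tzinfo=timezone.utc)
sent = created + timedelta(minutes=30)
received = datetime(2017, 2, 1, 8, 30, 5, tzinfo=timezone.utc)
estimate = derived_timestamp(received, created, sent)
```

Here `estimate` is 08:00:05 UTC: the 30-minute offline delay is removed and the one-hour device clock error never enters the result, although network latency between device and collector is still included.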

Future Improvements


Future projects

● Live A/B test trains & results

● Live machine learning results in app

● Automated alerting on complex company metrics.

Thanks!

Bruce Pannaman


Frontend event data

● Track and find issues in user behavior
● Insight into product usage
● A/B testing
● CRM cohorting
● In-app message cohorting


Progress Data

● What has a user learnt?
● How is our content performing?
● What is their language level?
● Vocabulary lists


Backend Data

● What are the user’s attributes?
● Social relationships (friends)
● Writing exercises and comments


Third Party

● Payments
● CRM performance
● App store metadata (reviews etc.)
● PPC data