Mind Candy GIAF 4 Presentation

Uploaded by gamesanalytics on 28 November 2014.

DESCRIPTION

This presentation features all three of Mind Candy's talks from the 4th Games Industry Analytics Forum in February 2014. The presentations included are:

- Capturing Events at Mind Candy – Tim Bennett
- Statistics on a MacBook – Liz Macfie
- When Things Go Wrong – Luis Vicente

TRANSCRIPT

Page 1: Mind Candy GIAF 4 Presentation

Capturing Events at Mind Candy

Tim Bennett – Data Engineer at Mind Candy

Please visit: http://slid.es/bennetimo/capturing-events-at-mind-candy

Statistics on a MacBook

Liz Macfie – Data Scientist at Mind Candy

What is a Data Scientist?

Data Scientist (n.): Person who is better at statistics than a software engineer and better at software engineering than a statistician.

Josh Wills (paraphrased)

What is a Data Scientist at Mind Candy?

Data Engineers | Analysts

maintaining data structures and back-end systems

pulling data from various external sources

communicating the needs of the product managers and analysts

determining which questions need to be answered

carrying out statistically relevant analysis of the data

providing support for internal tools and systems

Recent Highlights: Deciding which events to capture

Moshi Monsters Village

Rescuing Moshlings

Farming Crops

Sending Gifts

Making IAPs

Completing Quests

Building Homes

Inviting Friends

Converting Currencies

Recent Highlights: Deciding which events to capture

trade-off between getting all possible information and bloating the app

receive prioritised questions from analysts and determine the data needed to answer them

design a data structure to meet the analysts' needs

source any required external data and place it in suitable databases

liaise with developers to test all events
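To make "design a data structure" and "test all events" concrete, here is a minimal sketch in Python. The `EventType` values are the event names from the Moshi Monsters Village slide written as identifiers; the `Event` fields, the `REQUIRED_PROPERTIES` map and the `validate` helper are hypothetical illustrations, not Mind Candy's actual schema. A checker like this is the kind of thing developers and data engineers could run together against test events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EventType(Enum):
    # Event names from the Moshi Monsters Village slide
    RESCUING_MOSHLINGS = "RescuingMoshlings"
    FARMING_CROPS = "FarmingCrops"
    SENDING_GIFTS = "SendingGifts"
    MAKING_IAPS = "MakingIAPs"
    COMPLETING_QUESTS = "CompletingQuests"
    BUILDING_HOMES = "BuildingHomes"
    INVITING_FRIENDS = "InvitingFriends"
    CONVERTING_CURRENCIES = "ConvertingCurrencies"


# Hypothetical: properties an event type must carry to answer the analysts' questions
REQUIRED_PROPERTIES = {
    EventType.MAKING_IAPS: {"product_id", "price", "currency"},
    EventType.SENDING_GIFTS: {"gift_id", "recipient_id"},
}


@dataclass
class Event:
    event_type: EventType
    user_id: str
    timestamp: datetime
    properties: dict = field(default_factory=dict)


def validate(event: Event) -> list[str]:
    """Return a list of problems; an empty list means the event looks well-formed."""
    problems = []
    if not event.user_id:
        problems.append("missing user_id")
    if event.timestamp.tzinfo is None:
        problems.append("timestamp must be timezone-aware")
    missing = REQUIRED_PROPERTIES.get(event.event_type, set()) - set(event.properties)
    if missing:
        problems.append(f"missing properties: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    gift = Event(EventType.SENDING_GIFTS, "user-42", datetime.now(timezone.utc), {"gift_id": "g1"})
    print(validate(gift))  # ["missing properties: ['recipient_id']"]
```

Keeping the required properties in one table, rather than scattered through game code, is one way to balance the trade-off on the slide: each event carries only what an analyst question actually needs.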

Recent Highlights: Creating dashboards - real-time

key metrics for display on large screens around the office

allow product managers to make immediate decisions about content delivery and strategies

give immediate feedback to management
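A real-time dashboard metric is often a count over a sliding time window ("events in the last minute"). The sketch below is one way to maintain such a count in memory; the 60-second window and the assumption that timestamps arrive in non-decreasing order are illustrative choices, not details from the talk.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in (now - window_seconds, now].

    Assumes timestamps are recorded in non-decreasing order.
    """

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()

    def record(self, ts: float) -> None:
        self._timestamps.append(ts)
        self._evict(ts)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)


if __name__ == "__main__":
    counter = SlidingWindowCounter(60.0)
    for ts in (0, 10, 30, 65):
        counter.record(ts)
    print(counter.count(65))   # 3
    print(counter.count(100))  # 1
```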

Recent Highlights: Creating dashboards - aggregated

aggregated metrics for display on web-based dashboards

allow analysts to answer more complex queries, building filters and tables

provide longer-term analysis of the game's performance across various segments
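Aggregated metrics "across various segments" usually mean grouping raw events by a time bucket and a segment. In a warehouse this would typically be a `GROUP BY` with `COUNT(DISTINCT user_id)`; the Python sketch below shows the same idea for distinct users per day and segment. The `(day, segment, user_id)` tuple shape and the country-style segment values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def distinct_users_by_day_and_segment(events):
    """Aggregate (day, segment, user_id) events into distinct-user counts per (day, segment)."""
    users = defaultdict(set)
    for day, segment, user_id in events:
        users[(day, segment)].add(user_id)  # the set ignores repeat events from the same user
    return {key: len(ids) for key, ids in users.items()}


if __name__ == "__main__":
    sample = [
        (date(2014, 2, 1), "UK", "a"),
        (date(2014, 2, 1), "UK", "a"),
        (date(2014, 2, 1), "UK", "b"),
        (date(2014, 2, 1), "US", "c"),
    ]
    print(distinct_users_by_day_and_segment(sample))
```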

When Things Go Wrong

Luis Vicente – Data Engineer at Mind Candy

Problems...

Evan and his friends were enjoying their life in Redshift...

... but there were dark clouds approaching.

Problems... Being prepared (or discovering you're not)

We thought we had designed our eventing system very carefully and would not have many problems

To quote our CPO: “If our servers aren't breaking, then we probably aren't growing fast enough”.

Then our servers broke.

Problems... Investigating the reasons why

How were we using Redshift?

BI Engineers

Deep analysis dashboards

Real-time dashboards

The eventing system

All using Redshift heavily.

Problems... BI Engineers

They have to answer questions, and answering them requires data stored in the data warehouse.

How dangerous could a BI Engineer be...?

Problems... Deep Analysis

We use QlikView for deep analysis

Daily incremental updates of QlikView datasets at midnight

Small incremental updates during the day to provide up-to-date metrics

We wanted to stop the daily refreshes... but couldn't at the time
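An incremental update pulls only rows that are newer than what has already been loaded, instead of rebuilding the whole dataset. QlikView has its own load-script syntax for this, which is not reproduced here; the Python sketch below only illustrates the general high-water-mark idea, assuming the source table has a monotonically increasing `id` column (a hypothetical name).

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalDataset:
    """Local copy of a warehouse table, refreshed by a monotonically increasing id."""
    rows: list = field(default_factory=list)
    high_water_mark: int = 0  # largest id already loaded

    def refresh(self, source_rows) -> int:
        """Append only rows newer than the high-water mark; return how many were added."""
        new_rows = sorted(
            (r for r in source_rows if r["id"] > self.high_water_mark),
            key=lambda r: r["id"],
        )
        self.rows.extend(new_rows)
        if new_rows:
            self.high_water_mark = new_rows[-1]["id"]
        return len(new_rows)

    def full_reload(self, source_rows) -> None:
        """The full-refresh alternative: rebuild the whole dataset from scratch."""
        self.rows = sorted(source_rows, key=lambda r: r["id"])
        self.high_water_mark = self.rows[-1]["id"] if self.rows else 0
```

One general limitation of this pattern is that rows changed in place after they were loaded are never picked up, which is a common reason to keep a periodic full reload; the slide does not say whether that was the reason here.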

Problems... Real-time Madness

The real-time dashboard started as a generic tool

But product managers love real-time dashboards...

At first... there was 1 dashboard with 6 metrics

Now... there are 5 dashboards with 15–20 metrics each
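Putting the dashboard growth into numbers, using only the figures on the slide:

```latex
\begin{aligned}
\text{before} &= 1 \times 6 = 6 \text{ metrics} \\
\text{now} &= 5 \times [15, 20] = [75, 100] \text{ metrics} \\
\text{growth} &= \tfrac{75}{6} \text{ to } \tfrac{100}{6} \approx 12.5\times \text{ to } 16.7\times
\end{aligned}
```

If, hypothetically, each metric were backed by its own query refreshed once a minute, the real-time dashboards alone would go from 6 to between 75 and 100 warehouse queries per minute.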

Problems... Eventing System

All these dashboards and users need events

Whirlpool was responsible for storing them in the data warehouse (DWH)

But we were worried about duplicates in the data warehouse...

...so we stored them in a STAGING table, then used an UPSERT job to move them to their final destination
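The staging-table approach is a well-known way to emulate an upsert in Redshift: load new rows into a staging table, then, inside a single transaction, delete the target rows whose keys appear in staging and insert everything from staging. The slides do not show Whirlpool's actual job, so this Python sketch only builds the SQL text for that general pattern; the table and key names (`events`, `events_staging`, `event_id`) are hypothetical.

```python
def build_upsert_sql(target: str, staging: str, key: str) -> list[str]:
    """Return the statements for a staging-table merge, to be run as one transaction.

    Names are interpolated directly, so they must come from trusted configuration.
    """
    return [
        "BEGIN TRANSACTION;",
        # Remove target rows that are about to be replaced by the staging rows
        f"DELETE FROM {target} USING {staging} WHERE {target}.{key} = {staging}.{key};",
        # DISTINCT collapses rows that landed in staging more than once, but only exact copies
        f"INSERT INTO {target} SELECT DISTINCT * FROM {staging};",
        # DELETE rather than TRUNCATE: in Redshift, TRUNCATE commits the open transaction
        f"DELETE FROM {staging};",
        "END TRANSACTION;",
    ]


if __name__ == "__main__":
    for statement in build_upsert_sql("events", "events_staging", "event_id"):
        print(statement)
```

Note that every run of such a job reads and writes both tables in one transaction, so two runs that overlap in time contend for the same tables.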

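Problems... Why this concern over duplicates?

Our previous DWH was highly structured, with enforced primary keys and uniqueness constraints

Redshift speaks SQL, but it is not a conventional transactional database...

...and uniqueness is not enforced: primary key and unique constraints can be declared, but they are informational only

So you can have duplicates!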
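Because a declared key is not enforced, loading the same batch twice simply produces two copies of each row, and checking for duplicates becomes the pipeline's job. A minimal Python sketch of such a check and a keep-first deduplication, assuming each event carries a unique `event_id` (a hypothetical field name):

```python
from collections import Counter


def duplicate_keys(rows, key="event_id"):
    """Return the keys that appear more than once, in sorted order."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def drop_duplicates(rows, key="event_id"):
    """Keep the first row seen for each key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


if __name__ == "__main__":
    batch = [{"event_id": 1}, {"event_id": 2}]
    loaded_twice = batch + batch  # nothing in the table stops this
    print(duplicate_keys(loaded_twice))        # [1, 2]
    print(len(drop_duplicates(loaded_twice)))  # 2
```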

What happened?

After two days under real load, events were no longer being stored in the DWH.

We started seeing error messages like this one:

Serializable isolation violation on table - 111594, transactions forming the cycle are: 2604845, 2604854, 2604912 (pid:2053)

We began running UPSERT jobs every five minutes, but the time each run took kept increasing... until the jobs stopped working altogether.
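A toy model of why growing run times matter: if a job starts every five minutes (from the slide) and a run takes longer than that, the next run starts while the previous one is still open, and concurrent transactions touching the same tables are the situation in which Redshift's serializable isolation can abort one of them. The run durations below are made up; the sketch only finds which runs overlap.

```python
INTERVAL_MINUTES = 5  # from the slide: an UPSERT job every five minutes


def overlapping_runs(durations):
    """Return (i, j) pairs of job runs executing at the same time.

    Run k is assumed to start at k * INTERVAL_MINUTES and last durations[k] minutes.
    """
    runs = [(k * INTERVAL_MINUTES, k * INTERVAL_MINUTES + d) for k, d in enumerate(durations)]
    pairs = []
    for i, (_, end_i) in enumerate(runs):
        for j in range(i + 1, len(runs)):
            start_j = runs[j][0]
            if start_j < end_i:  # run j starts before run i has finished
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical run times (minutes) that keep creeping upwards
    print(overlapping_runs([2, 4, 6, 8, 11]))  # [(2, 3), (3, 4)]
```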

Everything OK now?

We don't have those problems anymore... but we do have other problems!

We are still executing a huge number of queries, and by default Redshift only runs 5 of them in parallel...

...well, you can configure it to handle more, but they start fighting one another for resources!

Almost!
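The effect of a fixed number of concurrent query slots can be seen with a small first-come, first-served simulation: every query beyond the fifth waits for a slot to free up. `slots=5` mirrors the slide's "5 in parallel"; the query durations are hypothetical, and this ignores the resource contention mentioned above, which makes raising the slot count less helpful than it looks.

```python
import heapq


def queue_waits(durations, slots=5):
    """Simulate queries that all arrive at t=0 and run first-come, first-served.

    Returns how long each query waits before it starts.
    """
    free_at = [0.0] * slots  # time at which each slot next becomes free (already a valid heap)
    waits = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        waits.append(start)
        heapq.heappush(free_at, start + duration)
    return waits


if __name__ == "__main__":
    print(queue_waits([10] * 7))  # [0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0]
```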

What are we going to do?

Our real-time dashboards will stop using the DWH as their data source

Since we still need real-time data, we will build a “lambda architecture” around our eventing system
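In a lambda architecture, a batch layer periodically recomputes views from the complete, append-only event log, a speed layer keeps low-latency incremental views covering only the events since the last batch run, and queries merge the two. The toy Python sketch below applies that to per-event-type counts; the class and method names are hypothetical and this is not the design Mind Candy built. It is single-threaded, so it can safely discard the whole speed view after each batch run; a real system must only discard what the batch run actually covered.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaEventCounts:
    """Toy lambda architecture: per-event-type counts served from batch + speed views."""
    master_log: list = field(default_factory=list)        # append-only record of every event
    batch_view: Counter = field(default_factory=Counter)  # recomputed from scratch (e.g. from the DWH)
    speed_view: Counter = field(default_factory=Counter)  # events since the last batch run

    def ingest(self, event_type: str) -> None:
        self.master_log.append(event_type)
        self.speed_view[event_type] += 1  # low-latency update, no warehouse query needed

    def run_batch(self) -> None:
        # Batch layer: recompute over the whole log, then drop the speed view it now covers
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, event_type: str) -> int:
        # Serving layer: merge both views at read time
        return self.batch_view[event_type] + self.speed_view[event_type]


if __name__ == "__main__":
    counts = LambdaEventCounts()
    counts.ingest("FarmingCrops")
    counts.run_batch()
    counts.ingest("FarmingCrops")
    print(counts.query("FarmingCrops"))  # 2
```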

And then everyone is happy!