Transforming Mobile Push Notifications with Big Data

Dennis Waldron, Data Engineering
Pablo Varela, Systems Engineering

Upload: plumbee

Post on 07-Jul-2015


DESCRIPTION

How we at Plumbee collect and process data at scale and how this data is used to send relevant mobile push notifications to our players to keep them engaged. Presented as part of a Tech Talk: http://engineering.plumbee.com/blog/2014/11/07/tech-talk-push-notifications-big-data/

TRANSCRIPT

Page 1: Transforming Mobile Push Notifications with Big Data

Transforming Mobile Push Notifications with Big Data
Dennis Waldron, Data Engineering
Pablo Varela, Systems Engineering

Page 2: Transforming Mobile Push Notifications with Big Data

Who is Plumbee?

● Social Games Studio
● Mirrorball Slots & Bingo
● Facebook Canvas, iOS

● 12.8M Installs
● 209K Daily Active Users
● 818K Monthly Active Users

Page 3: Transforming Mobile Push Notifications with Big Data

Data Providers

In-house data = 99.9% of all data

In Total:

● 98TB (907 days of data)
● All stored in Amazon S3

Daily:

● 78GB compressed
● ~450M events/day
● 4,800 events/second (peak)

Page 4: Transforming Mobile Push Notifications with Big Data

Architecture - Overview

[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Log Aggregators -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Amazon EMR (Elastic MapReduce), Daily Batch Processing via DataPipeline -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) -> Plumbee Employees]

Page 5: Transforming Mobile Push Notifications with Big Data

Game Data

● Collect everything!
● RPC events intercepted by annotated endpoints (Requests)
● All mutating state changes recorded (Blob Updates):
○ DynamoDB, MySQL, Memcache
● Custom Telemetry (Other):
○ Client: click tracking, loading time statistics, GPU data...
○ Server: promotions, transactions, Facebook user data...

[Diagram: End Users (Desktop & Mobile), Application/Game Servers (Amazon Web Services), DynamoDB, MySQL, MemCache; label: GENERATES]
[Chart labels: RPC, OTHER 15%, 77%, 9%]

Page 6: Transforming Mobile Push Notifications with Big Data

Game Data - Example RPC Endpoint Annotation

/**
 * Example annotation
 */
@SQSRequestLog(requestMessage = SpinRequest.class)
@RequestMapping("/spin")
public SpinResponse spin(SpinRequest spinRequest) {

}

Page 7: Transforming Mobile Push Notifications with Big Data

Example Event - userStats

● All events are recorded in JSON.
● Structure:
○ Headers
○ Categorization Data (metadata)
○ Payload (message)
● Important Headers:
○ timestamp
○ testVariant
○ plumbeeUid
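
As a rough illustration of the structure above, the sketch below builds one event and serializes it as JSON. Only `timestamp`, `testVariant` and `plumbeeUid` come from the slide; the `metadata` and `payload` key names, the nesting, and the `build_event` helper are assumptions made for the example, not Plumbee's actual schema.

```python
import json
import time


def build_event(plumbee_uid, test_variant, event_type, sub_type, payload):
    """Assemble one analytics event (hypothetical layout, not Plumbee's schema)."""
    return {
        # Headers named on the slide.
        "timestamp": int(time.time() * 1000),
        "testVariant": test_variant,
        "plumbeeUid": plumbee_uid,
        # Categorization data: key names are assumed for illustration.
        "metadata": {"type": event_type, "subType": sub_type},
        # Payload: the message body, free-form per event type.
        "payload": payload,
    }


event = build_event(466264, "A", "rpc", "rpc-spin", {"bet": 100})
line = json.dumps(event)  # one JSON document per event
print(line)
```

Keeping the headers at the top level means a downstream job can read the user ID or test variant without understanding the payload of each event type.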

Page 8: Transforming Mobile Push Notifications with Big Data

Architecture - Collection

[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Log Aggregators -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Amazon EMR (Elastic MapReduce), Daily Batch Processing via DataPipeline -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) -> Plumbee Employees]

Page 9: Transforming Mobile Push Notifications with Big Data

Data Collection (I) - PUT

[Diagram: Application/Game Servers (Producers) -> Events (JSON) -> SQS Queue -> Log Aggregators (Consumers)]

What is SQS (Simple Queue Service)?

A cloud-based message queue for transmitting messages between producers and consumers

SQS Provides:

● ACK/FAIL semantics
● Unlimited number of messages
● Scales transparently
● Buffer zone
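
The ACK/FAIL behaviour can be pictured with a small in-memory model. This is a toy stand-in, not the SQS API: the `ToyQueue` class and its method names are invented for the example. In SQS itself, acknowledging corresponds to deleting the message, and a message that is not deleted becomes visible again after its visibility timeout.

```python
import collections


class ToyQueue:
    """In-memory stand-in for a queue with ACK/FAIL semantics (not the SQS API)."""

    def __init__(self):
        self._ready = collections.deque()
        self._in_flight = {}
        self._next_id = 0

    def put(self, body):
        self._ready.append(body)

    def get(self):
        """Hand out a message; it stays in flight until acked or failed."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_id += 1
        self._in_flight[self._next_id] = body
        return self._next_id, body

    def ack(self, receipt):
        """Consumer succeeded: the message is removed for good."""
        del self._in_flight[receipt]

    def fail(self, receipt):
        """Consumer failed: the message becomes available again."""
        self._ready.append(self._in_flight.pop(receipt))


queue = ToyQueue()
queue.put('{"plumbeeUid": 466264}')

receipt, body = queue.get()
queue.fail(receipt)      # simulated consumer failure: message is redelivered
receipt, body = queue.get()
queue.ack(receipt)       # processed successfully: message is gone
remaining = queue.get()  # nothing left to deliver
print(remaining)
```

Because a failed message is returned to the queue rather than lost, the queue also acts as the buffer zone between game servers and log aggregators.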

Page 10: Transforming Mobile Push Notifications with Big Data

Data Collection (II) - GET

What is Apache Flume?

A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data

[Diagram: SQS Queue -> Apache Flume (Consumers) -> Amazon S3 (Simple Storage Service)]

S3 Data:

● Partitioned by: date / type / sub_type
● Compressed with: Snappy
● Aggregated in 512MB chunks
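
To make the partitioning concrete, here is a minimal sketch that derives a key prefix from the three partition fields. The slide only names the partition order (date / type / sub_type); the exact path format and the `partition_prefix` helper are assumptions.

```python
from datetime import date


def partition_prefix(event_date, event_type, sub_type):
    """Return a key prefix of the form date/type/sub_type (layout assumed)."""
    return "{}/{}/{}/".format(event_date.isoformat(), event_type, sub_type)


prefix = partition_prefix(date(2014, 11, 18), "rpc", "rpc-spin")
print(prefix)  # 2014-11-18/rpc/rpc-spin/
```

With a layout like this, a job that only needs one day of one event type can read a single prefix instead of scanning the whole bucket.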

Page 11: Transforming Mobile Push Notifications with Big Data

Data Collection (III) - Flume

● Pluggable component architecture
● Durability via transactions
● File channel uses Elastic Block Store (EBS) volumes (network attached storage)
○ Protects against hardware failure
● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source

[Diagram: SQS Queue -> Flume Agent: Source (Custom) -> Channel (File Based) -> Sink (HDFS) -> S3 Bucket; Transactions; A + B + C = Flow]
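
The idea of durability via transactions can be sketched as follows: a batch leaves the channel only once the sink has confirmed delivery, otherwise it is rolled back. This is a toy model in plain Python, not Flume's Java API; `ToyChannel`, `drain` and the sink functions are invented names, and a real file channel would persist events to disk rather than hold them in memory.

```python
class ToyChannel:
    """Buffer between source and sink with take/commit/rollback (toy model)."""

    def __init__(self):
        self._events = []
        self._taken = []

    def put(self, event):
        self._events.append(event)

    def take(self, batch_size):
        self._taken = self._events[:batch_size]
        self._events = self._events[batch_size:]
        return list(self._taken)

    def commit(self):
        self._taken = []

    def rollback(self):
        # Put the uncommitted batch back at the front of the channel.
        self._events = self._taken + self._events
        self._taken = []


def drain(channel, sink, batch_size=2):
    """Move one batch to the sink; roll back if the sink raises."""
    batch = channel.take(batch_size)
    try:
        sink(batch)
        channel.commit()
        return True
    except IOError:
        channel.rollback()
        return False


def failing_sink(batch):
    raise IOError("storage unavailable")


channel = ToyChannel()
for event in ["e1", "e2", "e3"]:
    channel.put(event)

delivered = []
first = drain(channel, failing_sink)       # batch returns to the channel
second = drain(channel, delivered.extend)  # e1 and e2 are delivered
print(first, second, delivered)
```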

Page 12: Transforming Mobile Push Notifications with Big Data

Architecture - Processing

[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Amazon EMR (Elastic MapReduce), Daily Batch Processing via DataPipeline -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) -> Plumbee Employees]

Page 13: Transforming Mobile Push Notifications with Big Data

Extract, Transform, Load

● Daily activity
● Orchestrated by Amazon DataPipeline
● Includes generation of reports
● Configured with JSON

What is DataPipeline?

A cloud-based data workflow service that helps you process and move data between different AWS services

[Diagram labels: SCHEDULE, COMMAND, RESOURCE]

Page 14: Transforming Mobile Push Notifications with Big Data

Extract & Transform (I)

What is Elastic Map Reduce?

Cloud-based MapReduce implementation to process vast amounts of data, built on top of the open-sourced Hadoop framework.

Two phases:

● Map() Procedure -> Filtering & Sorting
● Reduce() -> Summary operation

[Diagram: RAW DATA (Penguin, Horse, Cake, Cake, Penguin, Penguin, Penguin, Horse, Horse) -> MAP() -> SORTED QUEUES (Cake Cake / Horse Horse Horse / Penguin Penguin Penguin Penguin) -> REDUCE() -> RESULT (Cake: 2, Horse: 3, Penguin: 4)]
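
The two phases can be reproduced on a single machine with the slide's sample data. This is only a local sketch of the map, sort and reduce steps, not how EMR distributes the work across nodes.

```python
from itertools import groupby

raw_data = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
            "Penguin", "Penguin", "Horse", "Horse"]

# Map: emit a (key, 1) pair for every record.
mapped = [(word, 1) for word in raw_data]

# Sort: bring identical keys together, as the sorted queues do.
mapped.sort(key=lambda pair: pair[0])

# Reduce: sum the values of each key group.
result = {key: sum(count for _, count in group)
          for key, group in groupby(mapped, key=lambda pair: pair[0])}

print(result)  # {'Cake': 2, 'Horse': 3, 'Penguin': 4}
```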

Page 15: Transforming Mobile Push Notifications with Big Data

Extract & Transform (II)

What is Hive?

An open-source Apache project which provides a SQL-like interface to summarize, query and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure.

● Not really SQL, HQL -> HiveQL
● No transactions, materialized views, limited subquery support, ...

SELECT plumbeeuid, COUNT(*) AS spins
FROM eventlog
-- Partitioned data access
WHERE event_date = '2014-11-18'
  AND event_type = 'rpc'
  AND event_sub_type = 'rpc-spin'
-- Aggregation
GROUP BY plumbeeuid;

Table: Eventlog

● Mounted on top of raw data
● SerDe provides JSON parsing
● Target data via partition filters

Page 16: Transforming Mobile Push Notifications with Big Data

Extract & Transform (III)

● Hive has limitations!
○ Speed, JSON
● Most of our transformations use: Streaming MapReduce Jobs

What is Streaming?

“A Hadoop utility that allows you to create and run MapReduce jobs using any executable script as a mapper or reducer”

JSON rpc-spin Data -> map()

import json
import sys

# Mapper: emit "<plumbeeUid>\t1" for every JSON event read from stdin.
for line in sys.stdin:
    data = json.loads(line)
    print(str(data['plumbeeUid']) + '\t' + '1')

Emits key-value pairs: 466264 => 1, 376166 => 1, 983131 => 1, 466264 => 1

Hadoop sorts and shuffles the data, making sure matching keys are processed by a single reducer!

reduce()

import sys
from collections import defaultdict

# Reducer: sum the counts emitted by the mapper for each user.
results = defaultdict(int)

for line in sys.stdin:
    plumbee_uid, count = line.split('\t')
    results[plumbee_uid] += int(count)

print(dict(results))

Result: { 466264: 2, 376166: 1, 983131: 1 }

Page 17: Transforming Mobile Push Notifications with Big Data

Load (I) - Problem

[Diagram: Raw S3 JSON Data -> EMR Transformation (Hive & Streaming Jobs) -> Aggregated Data; label: 5.4TB]

Results

EMR transformed data:

● Referred to as aggregates
● Stored in S3
● Accessible via EMR cluster

Problem

● We don’t run long-lived EMR clusters.

EMR:

● Requires specialist knowledge
● Is slow: processing and booting happen “offline”

Use Amazon Redshift for fast “online” data access

Page 18: Transforming Mobile Push Notifications with Big Data

Load (II) - Redshift

What is Redshift?

A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets.

Power comes from:

● Query parallelization
● Column-oriented design

Redshift Provides:

● Low latency JDBC and ODBC access
● Fault Tolerance
● Automated Backups

Redshift (x3 nodes): 0.33s
EMR (x20 nodes): 135.46s

Page 19: Transforming Mobile Push Notifications with Big Data

Load (II) - Column-Oriented Databases

Row-oriented Database - MySQL

ID  First Name  Last Name  Country
1   Penguin     Situation  GB
2   Cheese      Labs       US
3   Horse       Barracks   GB

● Easy to add/modify records
● Could read irrelevant data
● Great for fast lookups (OLTP)

Column-oriented Database - Redshift

ID  First Name  Last Name  Country
1   Penguin     Situation  GB
2   Cheese      Labs       US
3   Horse       Barracks   GB

● Only read in relevant data
● Adding rows requires multiple updates to column data
● Great for aggregation queries (OLAP)
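
The trade-off can be shown with the same table held in both layouts. This is a conceptual sketch with plain Python lists, not how MySQL or Redshift store data on disk; the fourth record is made up for the insert example.

```python
from collections import Counter

# Row-oriented: one tuple per record.
rows = [
    (1, "Penguin", "Situation", "GB"),
    (2, "Cheese", "Labs", "US"),
    (3, "Horse", "Barracks", "GB"),
]

# Column-oriented: one list per column.
columns = {
    "id": [1, 2, 3],
    "first_name": ["Penguin", "Cheese", "Horse"],
    "last_name": ["Situation", "Labs", "Barracks"],
    "country": ["GB", "US", "GB"],
}

# Aggregation: users per country.
by_country_rows = Counter(row[3] for row in rows)  # walks every full record
by_country_cols = Counter(columns["country"])      # reads a single column

# Insert: one append for the row layout, one append per column otherwise.
new_record = (4, "Cake", "Factory", "US")  # hypothetical record
rows.append(new_record)
for name, value in zip(columns, new_record):
    columns[name].append(value)

print(by_country_cols)
```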

Page 20: Transforming Mobile Push Notifications with Big Data

Architecture - Revisit

[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Log Aggregators -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Amazon EMR (Elastic MapReduce), Daily Batch Processing via DataPipeline -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) -> Plumbee Employees]

Page 21: Transforming Mobile Push Notifications with Big Data

Q&A

Page 22: Transforming Mobile Push Notifications with Big Data

Targeted Push Notifications

Page 23: Transforming Mobile Push Notifications with Big Data

Mirrorball Slots: Kingdom of Riches

Page 24: Transforming Mobile Push Notifications with Big Data

Mirrorball Slots: Challenges

● recurring timed event
● collect symbols from non-winning spins
● get free coins if enough symbols are collected

Page 25: Transforming Mobile Push Notifications with Big Data

Some players ask for notifications

Page 26: Transforming Mobile Push Notifications with Big Data

Use Cases

Page 27: Transforming Mobile Push Notifications with Big Data

Building blocks

Page 28: Transforming Mobile Push Notifications with Big Data

Data Collection

Page 29: Transforming Mobile Push Notifications with Big Data

Data Collection

[Diagram: Players, Data Collection, Amazon Redshift]

Page 30: Transforming Mobile Push Notifications with Big Data

Architecture - Overview

[Diagram components: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon Redshift, Amazon S3, Amazon SNS, Players; stages: Targeting, Mobile Push]

Page 31: Transforming Mobile Push Notifications with Big Data

User Targeting

Page 32: Transforming Mobile Push Notifications with Big Data

User targeting

Run SQL queries directly against Redshift

[Diagram: SQL Query -> Amazon Redshift -> User Segment]

Page 33: Transforming Mobile Push Notifications with Big Data

User targeting: Query example

-- Target all mobile users
SELECT plumbee_uid, arn
FROM mobile_user

Page 34: Transforming Mobile Push Notifications with Big Data

User targeting: Query example (II)

-- Target lapsed users (1 week lapse)
SELECT plumbee_uid, arn
FROM mobile_user
WHERE last_play_time < DATEADD(day, -7, GETDATE())
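
The lapsed-user segment can be tried out locally. In this sketch SQLite stands in for Redshift, the `mobile_user` rows and endpoint strings are invented sample data, and the cutoff is computed in Python from a fixed "now" so the result is repeatable.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2014, 11, 18, 12, 0, 0)  # fixed "now" for a repeatable example
cutoff = now - timedelta(days=7)

conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
conn.execute(
    "CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)"
)
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        (466264, "example-endpoint-1", "2014-11-17 09:00:00"),  # still active
        (376166, "example-endpoint-2", "2014-11-01 09:00:00"),  # lapsed
        (983131, "example-endpoint-3", "2014-10-20 09:00:00"),  # lapsed
    ],
)

# ISO-formatted timestamps compare correctly as strings.
segment = conn.execute(
    "SELECT plumbee_uid, arn FROM mobile_user "
    "WHERE last_play_time < ? ORDER BY plumbee_uid",
    (cutoff.strftime("%Y-%m-%d %H:%M:%S"),),
).fetchall()

print(segment)
```

Each row of the segment pairs a user with the endpoint that a push would later be sent to.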

Page 36: Transforming Mobile Push Notifications with Big Data

Architecture - Mobile Push

[Diagram components: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon Redshift, Amazon S3, Amazon SNS, Players; stages: Targeting, Mobile Push]

Page 37: Transforming Mobile Push Notifications with Big Data

Amazon Simple Notification Service

Page 38: Transforming Mobile Push Notifications with Big Data

What is SNS?

“Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, fully managed push messaging service”

Page 39: Transforming Mobile Push Notifications with Big Data

Amazon SNS

Page 40: Transforming Mobile Push Notifications with Big Data

Amazon SNS

Page 41: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Device Registration

[Diagram: Players -> (register device) -> Game Servers -> (register) -> Amazon SNS; Game Servers -> (event) -> SQS Analytics Queue -> Amazon Redshift]

Page 42: Transforming Mobile Push Notifications with Big Data

Amazon SNS: ARN Retrieval

private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) {
    CreatePlatformEndpointRequest request =
        new CreatePlatformEndpointRequest()
            .withPlatformApplicationArn(platformApplicationArn)
            .withToken(deviceToken);

    CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request);

    return result.getEndpointArn();
}

Page 43: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Analytics Event

private String registerEndpointForApplicationAndPlatform(
        final long plumbeeUid, String platformARN, String platformToken) {

    final String deviceEndpointARN = getArnForDeviceEndpoint(platformARN, platformToken);

    sqsLogger.queueMessage(new HashMap<String, Object>() {{
        put("notification", "register");
        put("plumbeeUid", plumbeeUid);
        put("provider", platformName);
        put("endpoint", deviceEndpointARN);
    }}, null);

    return deviceEndpointARN;
}

Page 44: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Mobile Push

private void publishMessage(UserData userData, String jsonPayload) {
    amazonSNS.publish(new PublishRequest()
        .withTargetArn(userData.getEndpoint())
        .withMessageStructure("json")
        .withMessage(jsonPayload));
}

Payload example

{"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}
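
When the message structure is "json", SNS expects a JSON object with a "default" key and, optionally, platform keys such as "APNS" whose values are themselves JSON-encoded strings. The sketch below builds such a payload; the slide shows only the "default" key, so the APNS entry and the `build_push_payload` helper are assumptions added for illustration.

```python
import json


def build_push_payload(text):
    """Build an SNS message body for the "json" message structure."""
    # The APNS value must itself be a JSON-encoded string, not a nested object.
    apns = json.dumps({"aps": {"alert": text}})
    return json.dumps({"default": text, "APNS": apns})


payload = build_push_payload("The 5 day Halloween Challenge has started today!")
decoded = json.loads(payload)
print(decoded["default"])
print(json.loads(decoded["APNS"])["aps"]["alert"])
```

The "default" text is what a platform receives when no platform-specific entry is present.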

Page 45: Transforming Mobile Push Notifications with Big Data

Architecture - Orchestration

[Diagram components: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon Redshift, Amazon S3, Amazon SNS, Players; stages: Targeting, Mobile Push]

Page 46: Transforming Mobile Push Notifications with Big Data

Amazon Simple Workflow

Page 47: Transforming Mobile Push Notifications with Big Data

What is Amazon SWF?

“Amazon Simple Workflow (Amazon SWF) is a task coordination and state management service for cloud applications.”

Page 48: Transforming Mobile Push Notifications with Big Data

What Amazon SWF provides

● consistent execution state management

● workflow executions and tasks tracking

● non-duplicated dispatch of tasks

● task routing and queuing

● the AWS Flow Framework

Page 49: Transforming Mobile Push Notifications with Big Data

Architecture - Orchestration

[Diagram components: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon Redshift, Amazon S3, Amazon SNS, Players; stages: Targeting, Mobile Push]

Page 50: Transforming Mobile Push Notifications with Big Data

Mobile Push: Scheduling

[Diagram: Trigger, Publish Service, Amazon Simple Workflow]

Page 51: Transforming Mobile Push Notifications with Big Data

Mobile Push: Targeting

[Diagram: Amazon SWF, Amazon EC2 Worker (Segmentation), Amazon Redshift, Amazon S3; arrow labels: query, query, target users]

Page 52: Transforming Mobile Push Notifications with Big Data

Mobile Push: Processing

[Diagram: Amazon SWF, Workers (Processing), End User; labels: publish, push, batch 1-N, Read data + push]

Page 53: Transforming Mobile Push Notifications with Big Data

Mobile Push: Reporting

[Diagram: Amazon SWF, Amazon EC2 Worker (Reporting), Amazon SES; arrow labels: send, send]
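
Taken together, the scheduling, targeting, processing and reporting slides describe a small workflow. The sketch below strings those stages together as plain function calls. It is only a stand-in for the orchestration: in SWF each step would run as a separate task on a worker, and every function name, the sample users and the report format here are assumptions.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def segment_users():
    """Stand-in for the segmentation worker (would query Redshift)."""
    return [(466264, "endpoint-1"), (376166, "endpoint-2"), (983131, "endpoint-3")]


def process_batch(batch, message, send):
    """Stand-in for a processing worker: push to every user in the batch."""
    for _uid, endpoint in batch:
        send(endpoint, message)
    return len(batch)


def run_campaign(message, send, batch_size=2):
    """Trigger -> targeting -> processing (batch 1-N) -> reporting."""
    users = segment_users()
    batches = chunk(users, batch_size)
    sent = sum(process_batch(batch, message, send) for batch in batches)
    # A reporting worker might email a summary like this one.
    return {"targeted": len(users), "batches": len(batches), "sent": sent}


outbox = []
report = run_campaign("Challenge started!", lambda e, m: outbox.append((e, m)))
print(report)
```

Splitting the segment into batches lets several processing workers send pushes in parallel, with the final report produced once all batches are done.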

Page 54: Transforming Mobile Push Notifications with Big Data

Demo (II)

Page 55: Transforming Mobile Push Notifications with Big Data

Q&A