Transcript
Page 1: Transforming Mobile Push Notifications with Big Data

Transforming Mobile Push Notifications with Big Data
Dennis Waldron, Data Engineering
Pablo Varela, Systems Engineering

Page 2: Transforming Mobile Push Notifications with Big Data

Who is Plumbee?

● 12.8M Installs
● 209K Daily Active Users
● 818K Monthly Active Users

● Social Games Studio
● Mirrorball Slots & Bingo
● Facebook Canvas, iOS

Page 3: Transforming Mobile Push Notifications with Big Data

Data Providers

Inhouse data = 99.9% of all data

In Total:

● 98TB (907 days of data)
● All stored in Amazon S3

Daily:

● 78GB compressed
● ~450M events/day
● 4,800 events/second (peak)

Page 4: Transforming Mobile Push Notifications with Big Data

Aggregates

Application/Game Servers

End Users (Desktop & Mobile)

Log Aggregators

Amazon S3 (Simple Storage Service)

DataPipeline

Amazon EMR (Elastic MapReduce)

Amazon Redshift

Daily Batch Processing

Plumbee Employees

Analytics (SQL Queries)

Architecture - Overview

Events (JSON)

SQS Analytics Queue

Events (JSON)

Page 5: Transforming Mobile Push Notifications with Big Data

Amazon Web Service

End Users (Desktop & Mobile)

Application/Game Servers

● Collect everything!
● RPC events intercepted by annotated endpoints. (Requests)
● All mutating state changes recorded: (Blob Updates)
  ○ DynamoDB, MySQL, Memcache
● Custom Telemetry (Other):
  ○ Client: click tracking, loading time statistics, GPU data...
  ○ Server: promotions, transactions, Facebook user data...

Game Data

[Chart: events generated by source: RPC, blob stores (DynamoDB, MySQL, MemCache) and Other. Labeled shares: 77%, 9%, and 15% (Other).]

Page 6: Transforming Mobile Push Notifications with Big Data

Game Data - Example RPC Endpoint Annotation

/**
 * Example annotation
 */
@SQSRequestLog(requestMessage = SpinRequest.class)
@RequestMapping("/spin")
public SpinResponse spin(SpinRequest spinRequest) {

}

Page 7: Transforming Mobile Push Notifications with Big Data

Example Event - userStats

● All events are recorded in JSON.
● Structure:
  ○ Headers
  ○ Categorization Data (metadata)
  ○ Payload (message)
● Important Headers:
  ○ timestamp
  ○ testVariant
  ○ plumbeeUid
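
As a rough illustration of the structure described above, the sketch below builds one event as a Python dictionary and serializes it to JSON. Only the header names timestamp, testVariant and plumbeeUid come from the slide; the "metadata" and "payload" keys, their nesting and all values are assumptions made for the example.

```python
import json
import time

def build_event(plumbee_uid, test_variant, event_type, sub_type, payload):
    """Assemble one analytics event: headers, categorization metadata, payload."""
    return {
        # Headers named on the slide
        "timestamp": int(time.time() * 1000),
        "testVariant": test_variant,
        "plumbeeUid": plumbee_uid,
        # Categorization data (hypothetical field names)
        "metadata": {"type": event_type, "subType": sub_type},
        # Message body (hypothetical contents)
        "payload": payload,
    }

event = build_event(466264, "A", "rpc", "rpc-spin", {"bet": 100})
line = json.dumps(event)
print(line)
```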

Page 8: Transforming Mobile Push Notifications with Big Data

Analytics (SQL Queries)

Aggregates

Application/Game Servers

End Users (Desktop & Mobile)

Amazon S3 (Simple Storage Service)

DataPipeline

Amazon EMR (Elastic MapReduce)

Amazon Redshift

Daily Batch Processing

Plumbee Employees

Architecture - Collection

Log Aggregators

Events (JSON)

SQS Analytics Queue

Events (JSON)

Page 9: Transforming Mobile Push Notifications with Big Data

Data Collection (I) - PUT

Application/Game Servers

Events (JSON)

SQS Queue

Log Aggregators

Producers Consumers

What is SQS (Simple Queue Service)?

A cloud-based message queue for transmitting messages between producers and consumers

SQS Provides:

● ACK/FAIL semantics
● Unlimited number of messages
● Scales transparently
● Buffer zone
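
The ACK/FAIL idea can be sketched with a small in-memory queue. This is a toy model and not the SQS API: in SQS a consumer acknowledges by deleting the message, and an unacknowledged message becomes visible again after a timeout. The class and method names below are hypothetical.

```python
from collections import deque

class InMemoryQueue:
    """Toy stand-in for a message queue with ACK/FAIL semantics (not the SQS API)."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self._next_id = 0

    def send(self, body):
        self._ready.append(body)

    def receive(self):
        """Hand out one message; it stays in flight until acked or failed."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_id += 1
        self._in_flight[self._next_id] = body
        return self._next_id, body

    def ack(self, receipt):
        """Consumer succeeded: drop the message for good."""
        del self._in_flight[receipt]

    def fail(self, receipt):
        """Consumer failed: make the message available again."""
        self._ready.append(self._in_flight.pop(receipt))

queue = InMemoryQueue()
queue.send('{"plumbeeUid": 466264}')
receipt, body = queue.receive()
queue.fail(receipt)              # first attempt fails, message is requeued
receipt, body = queue.receive()
queue.ack(receipt)               # second attempt succeeds
remaining = queue.receive()      # nothing left to deliver
```

Because a failed message goes back to the queue, the queue also acts as the buffer zone between producers and consumers: nothing is lost while a consumer is down.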

Page 10: Transforming Mobile Push Notifications with Big Data

What is Apache Flume?

A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data

SQS Queue

Apache Flume

Consumers

Data Collection (II) - GET

Amazon S3 (Simple Storage Service)

S3 Data:

● Partitioned by: date / type / sub_type
● Compressed with: Snappy
● Aggregated in 512MB chunks
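
A minimal sketch of how an event could be mapped to a date / type / sub_type prefix. The exact key layout and the date format are assumptions (the slide gives only the partition order), and partition_prefix is a hypothetical helper.

```python
from datetime import datetime, timezone

def partition_prefix(timestamp_ms, event_type, sub_type):
    """Return a date/type/sub_type prefix for an event (layout is assumed)."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return "{}/{}/{}/".format(day.strftime("%Y-%m-%d"), event_type, sub_type)

# 1416268800000 ms is midnight UTC on 18 November 2014
prefix = partition_prefix(1416268800000, "rpc", "rpc-spin")
print(prefix)
```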

Page 11: Transforming Mobile Push Notifications with Big Data

● Pluggable component architecture
● Durability via transactions
● File channel uses Elastic Block Store (EBS) volumes (network-attached storage)
  ○ Protects against hardware failure

● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source

Data Collection (III) - Flume

Flume Agent

Source(Custom)

Sink(HDFS)

SQS Queue

Channel(File Based)

S3 Bucket

Transactions

A + B + C = Flow

A B C

Page 12: Transforming Mobile Push Notifications with Big Data

Aggregates

Application/Game Servers

End Users (Desktop & Mobile)

Amazon S3 (Simple Storage Service)

DataPipeline

Amazon EMR (Elastic MapReduce)

Amazon Redshift

Daily Batch Processing

Plumbee Employees

Analytics (SQL Queries)

Architecture - Processing

Events (JSON)

SQS Analytics Queue

Events (JSON)

Page 13: Transforming Mobile Push Notifications with Big Data

● Daily activity
● Orchestrated by Amazon DataPipeline
● Includes generation of reports
● Configured with JSON

What is DataPipeline?

A cloud-based data workflow service that helps you process and move data between different AWS services

Extract, Transform, Load

Schedule

Command

Resource

Page 14: Transforming Mobile Push Notifications with Big Data

What is Elastic MapReduce?

A cloud-based MapReduce implementation, built on top of the open-source Hadoop framework, for processing vast amounts of data.

Two phases:

● Map() Procedure -> Filtering & Sorting
● Reduce() -> Summary operation

Extract & Transform (I)

[Figure: counting example. RAW DATA (Penguin, Horse, Cake, Cake, Penguin, Penguin, Penguin, Horse, Horse) -> MAP() -> SORTED QUEUES (Cake Cake | Horse Horse Horse | Penguin Penguin Penguin Penguin) -> REDUCE() -> RESULT (Cake: 2, Horse: 3, Penguin: 4)]
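
The counting example on this slide can be reproduced in a few lines of plain Python. This is only a single-process sketch of the two phases, not how EMR runs them; the function names are hypothetical.

```python
from itertools import groupby

raw_data = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
            "Penguin", "Penguin", "Horse", "Horse"]

def map_phase(words):
    """MAP(): emit a (key, 1) pair per input record."""
    return [(word, 1) for word in words]

def shuffle(pairs):
    """Sort by key so equal keys sit next to each other (the sorted queues)."""
    return sorted(pairs, key=lambda pair: pair[0])

def reduce_phase(sorted_pairs):
    """REDUCE(): sum the counts for each key."""
    return {key: sum(count for _, count in group)
            for key, group in groupby(sorted_pairs, key=lambda pair: pair[0])}

result = reduce_phase(shuffle(map_phase(raw_data)))
print(result)
```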

Page 15: Transforming Mobile Push Notifications with Big Data

What is Hive?

An open-source Apache project which provides a SQL-like interface to summarize, query, and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure.

● Not really SQL: HQL -> HiveQL
● No transactions, materialized views, limited subquery support, ...

Extract & Transform (II)

SELECT plumbeeuid, COUNT(*) AS spins
FROM eventlog
-- Partitioned data access
WHERE event_date = '2014-11-18'
  AND event_type = 'rpc'
  AND event_sub_type = 'rpc-spin'
-- Aggregation
GROUP BY plumbeeuid;

Table: Eventlog

● Mounted on top of raw data
● SerDe provides JSON parsing
● Target data via partition filters

Page 16: Transforming Mobile Push Notifications with Big Data

● Hive has limitations!
  ○ Speed, JSON
● Most of our transformations use: Streaming MapReduce Jobs

What is Streaming?

“A Hadoop utility that allows you to create and run MapReduce jobs using any executable script as a mapper or reducer”

Extract & Transform (III)

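# map(): emit one "plumbeeUid<TAB>1" line per JSON event
import json
import sys

for line in sys.stdin:
    data = json.loads(line)
    print('%s\t%d' % (data['plumbeeUid'], 1))

# reduce(): sum the counts emitted for each plumbeeUid
import sys
from collections import defaultdict

results = defaultdict(int)

for line in sys.stdin:
    plumbee_uid, count = line.split('\t')
    results[plumbee_uid] += int(count)

print(dict(results))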

Emits key-value pairs: 466264 => 1, 376166 => 1, 983131 => 1, 466264 => 1

Hadoop sorts and shuffles the data making sure matching keys are processed by a single reducer!

JSON rpc-spin Data

Result:{ 466264: 2, 376166: 1, 983131: 1 }
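
The guarantee that matching keys reach a single reducer can be sketched locally. Hadoop's default partitioner hashes the key; the modulo on the numeric ID below is a simplification, and the two-reducer setup and function names are assumptions for the example.

```python
import json
from collections import defaultdict

def mapper(lines):
    """Emit (plumbeeUid, 1) for every JSON event line."""
    for line in lines:
        yield str(json.loads(line)["plumbeeUid"]), 1

def partition(pairs, num_reducers):
    """Route each key to exactly one reducer, as the shuffle step guarantees."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[int(key) % num_reducers].append((key, value))
    return [sorted(bucket) for bucket in buckets]

def reducer(sorted_pairs):
    """Sum the counts per key within one reducer's input."""
    totals = defaultdict(int)
    for key, value in sorted_pairs:
        totals[key] += value
    return dict(totals)

events = ['{"plumbeeUid": 466264}', '{"plumbeeUid": 376166}',
          '{"plumbeeUid": 983131}', '{"plumbeeUid": 466264}']
outputs = [reducer(bucket) for bucket in partition(mapper(events), 2)]
merged = {key: count for output in outputs for key, count in output.items()}
print(merged)
```

Because each key lands in exactly one bucket, the per-reducer outputs can be merged without adding counts across reducers.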

map()

reduce()

Page 17: Transforming Mobile Push Notifications with Big Data

Results

EMR Transformed data:

● Referred to as aggregates
● Stored in S3
● Accessible via EMR cluster

Load (I) - Problem

Raw S3 JSON Data Aggregated Data

EMR Transformation(Hive & Streaming Jobs)

Problem

● We don’t run long-lived EMR clusters.

EMR:

● Requires specialist knowledge
● Is slow: clusters must boot and process data “offline”

5.4TB

Use Amazon Redshift for fast “online” data access

Page 18: Transforming Mobile Push Notifications with Big Data

What is Redshift?

A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets.

Power comes from:

● Query parallelization
● Column-oriented design

Redshift Provides:

● Low latency JDBC and ODBC access
● Fault Tolerance
● Automated Backups

Load (II) - Redshift

Redshift (x3 nodes): 0.33s
EMR (x20 nodes): 135.46s

Page 19: Transforming Mobile Push Notifications with Big Data

Load (II) - Column-Oriented Databases

Row-oriented Database - MySQL

ID | First Name | Last Name | Country
1  | Penguin    | Situation | GB
2  | Cheese     | Labs      | US
3  | Horse      | Barracks  | GB

● Easy to add/modify records
● Could read irrelevant data
● Great for fast lookups (OLTP)

Column-oriented Database - Redshift

ID | First Name | Last Name | Country
1  | Penguin    | Situation | GB
2  | Cheese     | Labs      | US
3  | Horse      | Barracks  | GB

● Only read in relevant data
● Adding rows requires multiple updates to column data
● Great for aggregation queries (OLAP)
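
The trade-off can be sketched with the slide's three records held in two in-memory layouts. This is an illustration of the idea only, not of how MySQL or Redshift store data on disk; the function names are hypothetical.

```python
rows = [
    {"id": 1, "first_name": "Penguin", "last_name": "Situation", "country": "GB"},
    {"id": 2, "first_name": "Cheese", "last_name": "Labs", "country": "US"},
    {"id": 3, "first_name": "Horse", "last_name": "Barracks", "country": "GB"},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def count_by_country_rows(table):
    """Row store: every full record is touched to read one field."""
    counts = {}
    for record in table:
        counts[record["country"]] = counts.get(record["country"], 0) + 1
    return counts

def count_by_country_columns(table):
    """Column store: only the 'country' column is scanned."""
    counts = {}
    for country in table["country"]:
        counts[country] = counts.get(country, 0) + 1
    return counts

def append_row(table, record):
    """Adding one row to a column store updates every column."""
    for name, value in record.items():
        table[name].append(value)

row_counts = count_by_country_rows(rows)
column_counts = count_by_country_columns(columns)
print(column_counts)
```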

Page 20: Transforming Mobile Push Notifications with Big Data

Aggregates

Application/Game Servers

End Users (Desktop & Mobile)

Amazon S3 (Simple Storage Service)

DataPipeline

Amazon EMR (Elastic MapReduce)

Amazon Redshift

Daily Batch Processing

Plumbee Employees

Analytics (SQL Queries)

Architecture - Revisit

Log Aggregators

Events (JSON)

SQS Analytics Queue

Events (JSON)

Page 21: Transforming Mobile Push Notifications with Big Data

Q&A

Page 22: Transforming Mobile Push Notifications with Big Data

Targeted Push Notifications

Page 23: Transforming Mobile Push Notifications with Big Data

Mirrorball Slots: Kingdom of Riches

Page 24: Transforming Mobile Push Notifications with Big Data

Mirrorball Slots: Challenges

● recurring timed event
● collect symbols from non-winning spins
● get free coins if enough symbols are collected

Page 25: Transforming Mobile Push Notifications with Big Data

Some players ask for notifications

Page 26: Transforming Mobile Push Notifications with Big Data

Use Cases

Page 27: Transforming Mobile Push Notifications with Big Data

Building blocks

Page 28: Transforming Mobile Push Notifications with Big Data

Data Collection

Page 29: Transforming Mobile Push Notifications with Big Data

Players

Data Collection

Amazon Redshift

Page 30: Transforming Mobile Push Notifications with Big Data

Architecture - Overview

Amazon S3

Amazon Redshift

Batch Processors

Amazon SNS

Publisher

Trigger

Segmentation Workers

Players

Targeting

Mobile Push

Page 31: Transforming Mobile Push Notifications with Big Data

User Targeting

Page 32: Transforming Mobile Push Notifications with Big Data

User targeting

Run SQL queries directly against Redshift

Amazon Redshift User Segment

SQL Query

Page 33: Transforming Mobile Push Notifications with Big Data

User targeting: Query example

-- Target all mobile users
SELECT plumbee_uid, arn
FROM mobile_user

Page 34: Transforming Mobile Push Notifications with Big Data

User targeting: Query example (II)

-- Target lapsed users (1 week lapse)
-- Assumes last_play_time is a timestamp column
SELECT plumbee_uid, arn
FROM mobile_user
WHERE last_play_time < DATEADD(day, -7, GETDATE())
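
A minimal sketch of running the lapsed-user segment query from code. It uses an in-memory SQLite database as a stand-in for Redshift so that it runs anywhere; the column types, the sample rows and the lapsed_users helper are assumptions, and the cutoff is computed in Python rather than in SQL.

```python
import sqlite3
from datetime import datetime, timedelta

def lapsed_users(conn, now, lapse_days=7):
    """Return (plumbee_uid, arn) for users who last played before the cutoff."""
    cutoff = (now - timedelta(days=lapse_days)).isoformat(" ")
    return conn.execute(
        "SELECT plumbee_uid, arn FROM mobile_user "
        "WHERE last_play_time < ? ORDER BY plumbee_uid",
        (cutoff,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)"
)
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        (466264, "arn:example:1", "2014-11-01 09:00:00"),  # lapsed
        (376166, "arn:example:2", "2014-11-17 21:30:00"),  # recently active
    ],
)
segment = lapsed_users(conn, datetime(2014, 11, 18))
print(segment)
```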

Page 36: Transforming Mobile Push Notifications with Big Data

Architecture - Mobile Push

Amazon S3

Amazon Redshift

Batch Processors

Amazon SNS

Publisher

Trigger

Segmentation Workers

Players

Targeting

Mobile Push

Page 37: Transforming Mobile Push Notifications with Big Data

Amazon Simple Notification Service

Page 38: Transforming Mobile Push Notifications with Big Data

What is SNS?

“Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, fully managed push messaging service”

Page 39: Transforming Mobile Push Notifications with Big Data

Amazon SNS

Page 40: Transforming Mobile Push Notifications with Big Data

Amazon SNS

Page 41: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Device Registration

Game Servers

SQS Analytics Queue

Amazon Redshift

Players

Amazon SNS

register device

event

register

Page 42: Transforming Mobile Push Notifications with Big Data

Amazon SNS: ARN Retrieval

private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) {
    CreatePlatformEndpointRequest request =
        new CreatePlatformEndpointRequest()
            .withPlatformApplicationArn(platformApplicationArn)
            .withToken(deviceToken);
    CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request);
    return result.getEndpointArn();
}

Page 43: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Analytics Event

private String registerEndpointForApplicationAndPlatform(final long plumbeeUid,
        String platformARN, String platformToken) {
    final String deviceEndpointARN = getArnForDeviceEndpoint(platformARN, platformToken);
    sqsLogger.queueMessage(new HashMap<String, Object>() {{
        put("notification", "register");
        put("plumbeeUid", plumbeeUid);
        put("provider", platformName);
        put("endpoint", deviceEndpointARN);
    }}, null);
    return deviceEndpointARN;
}

Page 44: Transforming Mobile Push Notifications with Big Data

Amazon SNS: Mobile Push

private void publishMessage(UserData userData, String jsonPayload) {
    amazonSNS.publish(new PublishRequest()
        .withTargetArn(userData.getEndpoint())
        .withMessageStructure("json")
        .withMessage(jsonPayload));
}

{"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}

Payload example
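
A sketch of how such a payload string could be assembled. With the "json" message structure, SNS expects a top-level "default" entry and accepts per-platform entries such as "APNS" whose values are themselves JSON-encoded strings. The slide's payload uses only "default"; the APNS entry, the badge option and the build_sns_message helper are assumptions added for illustration.

```python
import json

def build_sns_message(text, badge=None):
    """Build the string passed as the message when the structure is "json"."""
    message = {"default": text}
    # Platform-specific entries must themselves be JSON-encoded strings.
    aps = {"alert": text}
    if badge is not None:
        aps["badge"] = badge
    message["APNS"] = json.dumps({"aps": aps})
    return json.dumps(message)

json_payload = build_sns_message(
    "The 5 day Halloween Challenge has started today! Touch to play NOW!"
)
print(json_payload)
```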

Page 45: Transforming Mobile Push Notifications with Big Data

Architecture - Orchestration

Amazon S3

Amazon Redshift

Batch Processors

Amazon SNS

Publisher

Trigger

Segmentation Workers

Players

Targeting

Mobile Push

Page 46: Transforming Mobile Push Notifications with Big Data

Amazon Simple Workflow

Page 47: Transforming Mobile Push Notifications with Big Data

What is Amazon SWF?

“Amazon Simple Workflow (Amazon SWF) is a task coordination and state management service for cloud applications.”

Page 48: Transforming Mobile Push Notifications with Big Data

What Amazon SWF provides

● consistent execution state management

● workflow executions and tasks tracking

● non-duplicated dispatch of tasks

● task routing and queuing

● the AWS Flow Framework

Page 49: Transforming Mobile Push Notifications with Big Data

Architecture - Orchestration

Amazon S3

Amazon Redshift

Batch Processors

Amazon SNS

Publisher

Trigger

Segmentation Workers

Players

Targeting

Mobile Push

Page 50: Transforming Mobile Push Notifications with Big Data

Mobile Push: Scheduling

Trigger Publish Service Amazon Simple Workflow

Page 51: Transforming Mobile Push Notifications with Big Data

Mobile Push: Targeting

query

query

target users

Amazon SWF Amazon EC2

Worker(Segmentation)

Amazon Redshift

Amazon S3

Page 52: Transforming Mobile Push Notifications with Big Data

Mobile Push: Processing

publish

push

batch 1-N

Read data + push

End User

Amazon SWF

Workers (Processing)
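
The "batch 1-N" step can be sketched as splitting the user segment into fixed-size chunks that processing workers pick up independently. The batch size, the sample segment and the split_into_batches helper are assumptions; the slides do not say how batches are sized.

```python
def split_into_batches(segment, batch_size):
    """Yield consecutive slices of the segment, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(segment), batch_size):
        yield segment[start:start + batch_size]

# Hypothetical segment of (plumbee_uid, endpoint ARN) pairs
segment = [(uid, "arn:example:%d" % uid) for uid in range(1, 8)]
batches = list(split_into_batches(segment, 3))
print([len(batch) for batch in batches])
```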

Page 53: Transforming Mobile Push Notifications with Big Data

Mobile Push: Reporting

send send

Amazon SWF Amazon EC2

Worker(Reporting)

Amazon SES

Page 54: Transforming Mobile Push Notifications with Big Data

Demo (II)

Page 55: Transforming Mobile Push Notifications with Big Data

Q&A

