AWS Game Analytics - GDC 2014
AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect
Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue
Projected Mobile App Revenue
[Chart: projected mobile app revenue by year, 2011 to 2017, split into Ads, IAP, and Paid; y-axis from 0 to 90000]
Source: Gartner
Winning at Free to Play
• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit
Analyze What?
Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment
Behaviors
• Hours played day/week
• Number of sessions/day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent/week
Example: Level Progression (One Metric)
[Chart: Tries / Level; x-axis: levels L1 to L10; y-axis: # of Tries, 0 to 10]
Example: Level Progression (Two Metrics)
[Chart: Tries / Level; x-axis: levels L1 to L10; series: # of Tries (axis 0 to 10) and % Highest Level (axis 0 to 60)]
Key Takeaways
• Multiple data sources
• Correlate variables
• Deltas vs absolutes
• Settle on terminology (game vs level)
• Time matters
Events & Metrics
• Event = Moment in Time
  – Login/quit
  – Game start/end
  – Level up
  – In-app purchase
• Metrics = What to Measure
  – KISS
  – Numbers
  – Booleans
  – Strings (Enums)
• Always Include (ALWAYS)
  – User
  – Action
  – Session (context-dependent)
  – Timestamp in ISO8601: 2014-03-16T16:28:26
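As a rough illustration of the "always include" fields, the sketch below builds one event line in Python. The function name `format_event` is hypothetical, and the comma-separated field order (timestamp, user, session, action) is assumed from the sample event rows shown elsewhere in this deck; it is not a prescribed format.

```python
from datetime import datetime, timezone


def format_event(user, session, action, when=None):
    """Return one CSV event line: timestamp,user,session,action (assumed order)."""
    # Default to "now" in UTC; isoformat() produces an ISO 8601 string.
    when = when or datetime.now(timezone.utc)
    timestamp = when.replace(microsecond=0).isoformat()
    return ",".join([timestamp, user, session, action])


line = format_event(
    "nateware", "e4df", "login",
    datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc),
)
print(line)  # 2014-03-16T16:28:26+00:00,nateware,e4df,login
```

Carrying an explicit UTC offset in the timestamp is a design choice here: it lets events from devices in different time zones be ordered without guessing.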
Off The Shelf Analytics
• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In
Ok, A Real Business Plan
Ingest
• HTTP PUT
• Kafka
• Kinesis
• Scribe
Store
• S3
• DynamoDB
• HDFS
• Redshift
Process
• EMR (Hadoop)
• Spark
• Storm
Analyze
• Tableau
• Pentaho
• Jaspersoft
Start Simple
• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Profit!
Redshift at a Glance
[Diagram: SQL Clients/BI Tools connect over JDBC/ODBC to the Leader Node; the Leader Node and Compute Nodes are linked by 10 GigE (HPC); nodes are shown with 128GB RAM, 16TB disk, and 16 cores each; ingestion, backup, and restore go through Amazon S3/DynamoDB]
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Columnar table storage
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
Tableau + Redshift
Plumbing
① Create S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Set up Tableau to access Redshift
⑥ Go to the Beach
Loading Redshift from S3
-- DELIMITER takes a quoted character with no "=" sign
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Scheduled Redshift Load using Data Pipeline: http://aws.amazon.com/articles/1143507459230804
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
[Architecture diagram; surviving labels: EC2, External Analytics]
Logrotate to S3
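/var/log/apache2/*.log {
    sharedscripts
    postrotate
        # Reopen Apache's log files after rotation
        sudo /usr/sbin/apache2ctl graceful
        # Sync rotated logs to S3 (assumes compression is enabled, so they end in .gz)
        s3cmd sync /var/log/apache2/*.gz s3://mygame-logs/
    endscript
}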
Blog Entry on Log Rotation: http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
[Architecture diagram; surviving label: EC2]
Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales > PBs
• Transient
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
[Architecture diagram; surviving label: EC2]
Loading Redshift from DynamoDB
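-- READRATIO (percent of the table's provisioned read throughput to use) is
-- required when loading from DynamoDB; 50 is an illustrative value
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';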
Funnel Cake
Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Measure Retention: Repeated Plays
create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
First-Pass Retention – Too Noisy
[Chart: # Play Sessions / Month (axis 0 to 40), one series per user: nateware, Lazyd0g, AK187, 3strikes]
Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
  – Month game installed
  – Demographics
• Run analysis by cohort
  – Join with metrics
• Use Redshift as it's SQL
  – Example of where SQL is a good fit
Creating Cohorts with Redshift
create view cohort_by_first_event_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
group by user_id;
http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
Retention by Cohort – Join Events with Cohort
[Chart: # Sessions / Week (axis 0 to 25), Week 1 onward, one series per monthly cohort from 2013-11 to 2014-04]
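The join named in the slide title is not spelled out in the deck. The following is a minimal sketch of one way to do it, run against an in-memory SQLite database so it is self-contained. Assumptions: SQLite's `strftime('%Y-%m', ...)` stands in for Redshift's `date_trunc('month', ...)`, the `session_id` column name is hypothetical, and the sample rows (including the second user) are made up for illustration.

```python
import sqlite3

# Illustrative rows in the deck's CSV shape: date, user, session, action.
rows = [
    ("2014-01-24", "nateware", "e4df", "login"),
    ("2014-01-24", "nateware", "e4df", "gamestart"),
    ("2014-02-02", "nateware", "a88c", "gamestart"),
    ("2014-02-10", "lazyd0g", "77aa", "login"),
    ("2014-02-11", "lazyd0g", "77ab", "gamestart"),
]

db = sqlite3.connect(":memory:")
db.execute(
    "create table events "
    "(event_date text, user_id text, session_id text, action text)"
)
db.executemany("insert into events values (?, ?, ?, ?)", rows)

# Cohort = month of each user's first event.
db.execute("""
    create view cohort_by_first_event_date as
    select user_id, strftime('%Y-%m', min(event_date)) as first_month
    from events
    group by user_id
""")

# Join events with the cohort view, then count distinct sessions
# per cohort and per month of activity.
retention = db.execute("""
    select c.first_month,
           strftime('%Y-%m', e.event_date) as month_active,
           count(distinct e.session_id) as sessions
    from events e
    join cohort_by_first_event_date c on c.user_id = e.user_id
    group by c.first_month, strftime('%Y-%m', e.event_date)
    order by c.first_month, month_active
""").fetchall()

for row in retention:
    print(row)
```

Each output row is (cohort month, active month, session count), which is the shape a charting tool needs to draw one retention line per cohort.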
Moar Cohorts
• Define multiple cohorts
  – By activity, time, demographics
  – As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
  – Retention by date
  – Retention by demographic
  – Retention by average plays/month quartile
Example Event Stream
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
Cohorts by Type of Activity
create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;
Post-Match Heatmaps
Real-Time Analytics
Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?
Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different?
• How many people are playing today? Variance?
Why Real-Time Analytics?
30x in 24 hours
What if you ran a promo?
Real-Time Tools
Spark
• High-Performance Hadoop Alternative
• Berkeley.edu
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in-memory workloads)
• Runs on EMR
Kinesis
• Amazon fully-managed streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data up to 1MB/sec, 1000 TPS
• Data stored for 24 hours
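The per-shard limits above (1MB/sec, 1000 TPS) suggest a simple sizing calculation. The sketch below is a back-of-the-envelope estimate only: the function name `shards_needed` is hypothetical, "1MB" is assumed to mean 10^6 bytes, and it ignores headroom for traffic spikes or uneven partition keys.

```python
import math

# Per-shard ingest limits as quoted on the slide.
SHARD_BYTES_PER_SEC = 1_000_000   # assumption: "1MB/sec" read as 10^6 bytes
SHARD_RECORDS_PER_SEC = 1000      # "1000 TPS"


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate shards so neither the record rate nor the byte rate is exceeded."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)


print(shards_needed(5000, 100))   # 5: the record rate is the binding limit
print(shards_needed(2000, 2048))  # 5: the byte rate is the binding limit
```

Small game events tend to hit the record-count limit first, while larger payloads hit the byte limit first, so both are worth checking.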
Back To Basics [Dubstep Remix]
• Always Batch Due to S3
[Architecture diagram; surviving label: EC2]
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
[Architecture diagram; surviving label: EC2]
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
[Architecture diagram; surviving label: EC2]
Introducing Amazon Kinesis Service for Real-Time Big Data Ingestion
[Diagram: several Data Sources send records to an AWS Endpoint; the stream's shards (Shard 1, Shard 2, ... Shard N) span three Availability Zones; consuming applications are App.1 (Aggregate & De-Duplicate), App.2 (Metric Extraction), App.3 (Sliding Window Analysis), and App.4 (Machine Learning); outputs go to S3, DynamoDB, and Redshift]
Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
[Diagram: multiple Producers writing to Kinesis, spread across Shard 1 through Shard n]
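To make "Partition Key distributes PUTs across Shards" concrete: Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch below imitates that routing locally. The function name `shard_for_key` is hypothetical, and it assumes the shards split the hash space into equal contiguous ranges, which need not hold after shards are split or merged.

```python
import hashlib


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard an equal-range layout would route this key to."""
    # MD5 yields a 128-bit value, used here as the record's hash key.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    # Assumption: shard i owns the i-th equal slice of [0, 2**128).
    range_size = 2**128 // shard_count
    return min(hash_key // range_size, shard_count - 1)


# The same key always lands on the same shard; different keys spread out.
for key in ["e4b5", "30a4", "6ebd"]:
    print(key, "->", shard_for_key(key, 4))
```

A practical consequence: using something high-cardinality such as a user or game ID as the partition key spreads load, while a constant key sends every record to one shard.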
Writing to a Kinesis Stream
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
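The `Data` field in the request body is the record payload, base64-encoded. A minimal sketch of building that JSON body follows; the helper name `put_record_body` is hypothetical, and the Signature Version 4 signing that produces the `Authorization` header is deliberately left out (an AWS SDK would normally handle both).

```python
import base64
import json


def put_record_body(stream_name, data, partition_key):
    """Build the JSON body of a PutRecord request; `data` is raw bytes."""
    return json.dumps({
        "StreamName": stream_name,
        # Kinesis expects the payload base64-encoded inside the JSON.
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })


body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
```

Encoding the bytes `_<data>_1` reproduces the `XzxkYXRhPl8x` value shown in the example request.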
Kinesis + Spark
http://aws.amazon.com/articles/4926593393724923
Death in Real-Time
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
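One way a consumer could turn kill records like these into heatmap data is to bucket each coordinate into a grid cell and count kills per cell. The sketch below is an assumption-laden illustration, not the pipeline used in the talk: it reads `coord` as "x,y,z", ignores z, and uses a hypothetical cell size of 100 map units. In a real deployment the records would arrive from a Kinesis consumer rather than a list.

```python
import json
from collections import Counter

CELL = 100  # hypothetical grid cell size, in map units


def heatmap(records):
    """Count kills per (map, cell_x, cell_y) from JSON kill records."""
    counts = Counter()
    for record in records:
        kill = json.loads(record)
        # Assumption: coord is "x,y,z"; the heatmap is 2D, so z is dropped.
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        counts[(kill["map"], x // CELL, y // CELL)] += 1
    return counts


records = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}',
    '{"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}',
]

cells = heatmap(records)
for cell, kills in sorted(cells.items()):
    print(cell, kills)
```

Because `Counter` objects can be added together, per-batch results from a stream can be merged into a running total as new records arrive.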
Real-Time Heatmaps
Put A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming
Fallback Plan
Cheers – Nate Wiger @nateware