riot games - player focused pipeline - stampedecon 2015

#StampedeCon 2015 - Riot Games

BUILDING A PLAYER FOCUSED DATA PIPELINE

RYAN TABORA @ryantabora

SEAN MALONEY @sean_seannery

Upload: seanseannery

Post on 15-Aug-2015

WHO WE ARE

SEAN MALONEY, ENGINEER

SMALONEY@RIOTGAMES.COM

@SEAN_SEANNERY

WORKING ON RIOT’S ETL TOOLS

FAVORITE ACTIVITY: ATTEMPTING TO GROW FACIAL HAIR BUT FAILING MISERABLY

RYAN TABORA, ENGINEER

RTABORA@RIOTGAMES.COM

@RYANTABORA

WORKING ON RIOT’S INGESTION PIPELINE

FAVORITE ACTIVITY: EATING MAC + CHEESE WHILE LISTENING TO DEATH METAL

AGENDA

RIOT GAMES SCALE

OUR DATA PLATFORM (THEN)

5 THINGS YOU NEED

OUR DATA PLATFORM (NOW)

3 THINGS WE STILL NEED (AND YOU MAY WANT ALSO)

RIOT’S SCALE

LEAGUE OF LEGENDS

2009 LAUNCH

ONLINE MULTI-PLAYER

WINDOWS / OSX

40-50 MIN. GAMES

LEAGUE OF LEGENDS STATS

MORE THAN 7.5 MILLION PEAK CONCURRENT PLAYERS

MORE THAN 27 MILLION DAILY ACTIVE PLAYERS

MORE THAN 67 MILLION MONTHLY ACTIVE PLAYERS

STATS RELEASED JANUARY 2014

OUR DATA PLATFORM

2013

5 THINGS YOU NEED

A.K.A. 5 THINGS WE DIDN’T HAVE

Self-Service Architecture: The big data team is small. We can’t manage all the ETLs ourselves.

Support Multiple Datacenters: One task will execute on different database servers around the world.

Auditing: ETLs can use queries with custom injected data.

Multiple Data Access Patterns: Extend with new connection types and custom ETLs easily.

Ad-Hoc Data Requests: Extend with new connection types and custom ETLs easily.

5 THINGS YOU NEED: SELF-SERVICE ARCHITECTURE

Need Backup!

ANALYSTS

DEVS

TOOLS

VS

ANALYSTS

DEVS

TOOLS

SELF SERVICE: HOW?

User Documentation: No one likes doing it, but it helps a lot.

Onboard Training: Get new coworkers in the know.

Familiar Protocols: Use REST or RPC so developers are on the same page.

Focus on UX: Your tools need to be easy for non-technical people to use.

5 THINGS YOU NEED: A PLAN FOR MULTIPLE DATACENTERS

NA Korea Russia

OUTGROWING OUR TOOLS

Sqoop / Oozie

Pentaho

DMExpress

YOUR ETL TOOL SHOULD...

Templating: ETLs can use queries with custom injected data.

Scale Horizontally: As the data grows, the tool should be able to handle it.

Empower Users: The big data team is small. We can’t manage all the ETLs ourselves.

Support One ETL - Many Sources: One task will execute on different database servers around the world.

Distributed ETL Software written in Ruby.

Candidate for Riot open sourcing

Same ETL applied to multiple regions / datacenters

Self-Service UI with SQL query templating.
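
The two ideas above (SQL query templating, and one ETL applied to many regions) can be sketched in a few lines. This is a minimal illustration, not FuETL's actual design: the slides do not show its template syntax, so the `${...}` placeholders, the region names, the schema names, and the `render_regional_queries` helper are all assumptions made for the example.

```python
from string import Template

# Hypothetical region -> schema mapping; not taken from the talk.
REGIONS = {
    "na": {"schema": "lol_na"},
    "kr": {"schema": "lol_kr"},
    "ru": {"schema": "lol_ru"},
}

# One ETL definition: a single query with injectable values.
QUERY_TEMPLATE = Template(
    "SELECT player_id, game_id, created_at "
    "FROM ${schema}.games "
    "WHERE created_at >= '${start}' AND created_at < '${end}'"
)


def render_regional_queries(template, regions, start, end):
    """Render the same templated query once per region."""
    return {
        region: template.substitute(schema=cfg["schema"], start=start, end=end)
        for region, cfg in regions.items()
    }


queries = render_regional_queries(QUERY_TEMPLATE, REGIONS, "2015-07-01", "2015-07-02")
for region, sql in queries.items():
    print(region, "->", sql)
```

Plain string substitution is only safe for trusted, validated values; a production tool would more likely pass dates as bound query parameters and template only identifiers such as the schema.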

Create an ETL

FUETL CAN CONNECT TO

DynamoDB
S3
SQS
(S)FTP
Hive
Microsoft SQL Server
Vertica
Redshift
MySQL
REST websites

FuETL SCALING

KEEPING INTEGRITY

FuETL STATISTICS

14 TB DATA MOVED DAILY

5,213 ACTIVE REGIONAL ETLS

23,125 DAILY ETL RUNS

5 THINGS YOU NEED: AUDITING TOOLS

HOW ETLS FAIL

WHAT THE @&!#$?

Network blips

Improper shutdowns (deploys / EC2 reboots)

Warehouse Auditing Service Platform (WASP)

REST micro-service built with Java and Docker.

Source and target comparison.

Reports and visualizations we can use to find problems.
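
A source and target comparison can be as simple as comparing row counts per partition on both sides of an ETL. The sketch below shows that idea only; the slides do not describe how the auditing service actually compares data, so the `audit_counts` function, the date-keyed partitions, and the sample numbers are all hypothetical.

```python
def audit_counts(source_counts, target_counts):
    """Compare per-partition row counts and return every mismatch.

    Both arguments map a partition key (here a date string) to a row count.
    A partition missing on one side is treated as having zero rows.
    """
    problems = []
    for key in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(key, 0)
        tgt = target_counts.get(key, 0)
        if src != tgt:
            problems.append(
                {"partition": key, "source": src, "target": tgt, "missing": src - tgt}
            )
    return problems


# Made-up counts: one partition lost rows, one never arrived.
source = {"2015-07-01": 1000, "2015-07-02": 1200, "2015-07-03": 900}
target = {"2015-07-01": 1000, "2015-07-02": 1150}

for problem in audit_counts(source, target):
    print(problem)
```

Row counts catch dropped or duplicated rows cheaply; catching rows that arrived with wrong values would need a stronger check, such as per-partition checksums.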

HOW TO AUDIT

VISUALIZING

5.5 THINGS YOU NEED: STANDARDIZED TIMEZONE!!
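
The slide gives only the title, so the following is one reading of why a standardized timezone matters: regional servers record local times, and daily audit windows only line up if every timestamp is converted to one zone first. A minimal sketch, assuming fixed illustrative UTC offsets per region (real systems would use a timezone database to handle daylight saving), with `REGION_UTC_OFFSETS` and `to_utc` as hypothetical names:

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offsets; assumptions for this example only.
REGION_UTC_OFFSETS = {
    "kr": timedelta(hours=9),   # Korea Standard Time
    "na": timedelta(hours=-7),  # US Pacific during daylight saving
}


def to_utc(local_str, region):
    """Parse a region-local timestamp and convert it to UTC."""
    naive = datetime.strptime(local_str, "%Y-%m-%d %H:%M:%S")
    aware = naive.replace(tzinfo=timezone(REGION_UTC_OFFSETS[region]))
    return aware.astimezone(timezone.utc)


kr_event = to_utc("2015-08-15 09:00:00", "kr")
na_event = to_utc("2015-08-14 17:00:00", "na")

# Different local dates, same instant once standardized.
print(kr_event.isoformat())
print(na_event.isoformat())
print(kr_event == na_event)
```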

AUDITING ENABLES

Single source of truth

Recovering from data loss

Good source of metadata

5 THINGS YOU NEED: MULTIPLE ACCESS PATTERNS

BATCH

BATCH OLAP

?

BATCH OLAP POINT

WASTING RESOURCES

Fast Aggregates, Slow Lookups

PointDataService

Returns one row in less than one second

Java web service

Simple abstraction, backed by DynamoDB
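
The "simple abstraction" idea can be sketched as a lookup interface that hides the storage backend from callers. The real service is described as a Java web service backed by DynamoDB; the version below is a Python illustration with an in-memory stand-in for the backend, and every class and method name in it is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class PointStore(ABC):
    """Backend interface: fetch a single row by table and key."""

    @abstractmethod
    def get(self, table: str, key: str) -> Optional[dict]:
        ...


class InMemoryStore(PointStore):
    """Stand-in backend for the example; a real one would call a key-value store."""

    def __init__(self):
        self._tables = {}

    def put(self, table: str, key: str, row: dict) -> None:
        self._tables.setdefault(table, {})[key] = row

    def get(self, table: str, key: str) -> Optional[dict]:
        return self._tables.get(table, {}).get(key)


class PointLookupService:
    """Callers ask for one row by key and never see the backend."""

    def __init__(self, store: PointStore):
        self._store = store

    def lookup(self, table: str, key: str) -> Optional[dict]:
        return self._store.get(table, key)


store = InMemoryStore()
store.put("games", "game-42", {"game_id": "game-42", "duration_min": 45})
service = PointLookupService(store)

print(service.lookup("games", "game-42"))
print(service.lookup("games", "no-such-game"))
```

Because callers depend only on `lookup`, the backend can be swapped without touching them, which is the practical benefit of keeping the abstraction this small.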

ABSTRACTION

PointDataService: MIRRORING HIVE

Full duplicate of the transactional data copied to DynamoDB

Data load powered by FuETL and an ad-hoc EMR cluster

Audited by WASP

CHOOSE TECHNOLOGY BASED ON NEEDS, AND DON’T FORCE IT

5 THINGS YOU NEED: AD-HOC DATA CRUNCHING

TO THE CLOUD!

Easily scale our resources: Both vertically (metastore) and horizontally (clusters).

Support intensive ad-hoc tasks: We can spin up temporary dedicated clusters for big projects.

We own our infrastructure: Before, the game servers team got all the love.

Can now join our data! One task will execute on different database servers around the world.

OUR DATA PLATFORM

2013

3 THINGS WE STILL NEED (AND YOU MIGHT ALSO WANT)

OUR FUTURE

Data compliance

Real-time access

A data catalog

OPEN SOURCE

RIOTERS WANTED!

SMALONEY@RIOTGAMES.COM

@SEAN_SEANNERY

RTABORA@RIOTGAMES.COM

@RYANTABORA

QUESTIONS?

RIOTGAMES.COM @RIOTGAMES