innovation with aerospike · 2018-05-25 · innovation with aerospike: rob russo platform architect...


AEROSPIKE USER SUMMIT 2018

Innovation with Aerospike:

ROB RUSSO, Platform Architect

AppLovin

How AppLovin Handles Billions of Requests per Day


AEROSPIKE USER SUMMIT | Proprietary & Confidential | All rights reserved. © 2018 Aerospike Inc

A Typical Day at AppLovin


AppLovin at Scale

BACKGROUND

Our production Aerospike setup consists of:

✓ 3+ billion active devices
✓ 35 individual clusters
✓ Over 350 servers running Aerospike
✓ 700TB+ of disk and 70TB+ of RAM
✓ 250B+ total Aerospike records
✓ 4B+ objects expired daily


A Typical Day At AppLovin


Our Aerospike Use Cases

BACKGROUND

At AppLovin we have successfully applied Aerospike to the following use cases:

✓ Device data storage
✓ Realtime XDR ad data
✓ Down-funnel event tracking
✓ Dimension counting
✓ Decision tracking
✓ Atomic state tracking
✓ Spark functionality enhancements


The Problem

Speeding Up Development and BI

BACKGROUND

• Aerospike data is not quickly accessible in bulk
• Business intelligence questions require a scanAll, for example:
    • How many devices have X feature?
    • What is the device distribution of Y?
• It takes up to a day to get feedback
• Questions usually lead to follow-ups
• Slow backup/restore reduces room for error


The Solution

Aerospike Backup/Restore with Spark

BACKGROUND

• Develop a Spark job to export Aerospike to HDFS
• Reduces query time by 750x
• BI becomes interactive and iterative
• Data becomes usable in all Spark jobs
• Capable of a full cluster restore
• Saved in Parquet for query speed
• Reduces backup time by 80%


The Results

Aerospike Backup/Restore with Spark

RESULTS

[Chart: BI Answer Time in hours (axis 0 to 25), comparing "ScanAll Export + SparkJob" with "SparkJob". Callout: 750x faster!]


BI & Debugging with Apache Zeppelin

HOW IT WORKS


Scaling it Up

HOW IT WORKS


Aerospike Export Benchmarks

RESULTS

[Chart: export throughput in records/s (axis 0 to 1,600,000) for asbackup, asbackup parallel, ALBackup V1, ALBackup V2, and ALBackup V3]


What Next?

Combining Spark and Aerospike

BACKGROUND

• We've found a way to augment Aerospike with Spark
• What about the reverse?
• How can we use the unique capabilities of Aerospike to augment Spark?
    • Aggregation
    • Set operations
    • Joins


Down Funnel Event Matching At Scale

BACKGROUND

THE APPLICATION

• Matching down-funnel events to their source event
• Typically involves matching disproportionately sized datasets
• Perfect fit for real-time matching using Aerospike

THE PROBLEM

• The original Vertica query began to fail before the real-time solution was ready
• Current Vertica solution:
    • 384 cores, 2 TB RAM, 1 hour runtime, 2.7M events/s
• We quickly needed to replace it with an Apache Spark query
• We immediately noticed the heavy resource utilization
• Vertica and Spark use the same inefficient joins


Spark to the Rescue

HOW IT WORKS

FIRST APPROACH

▪ Use Spark SQL as a near drop-in replacement
▪ Resulting performance was a choice between:
    ▪ 2000+ cores, 4 TB RAM, 1 hour runtime, 2.7M events/s
    ▪ 384 cores, 1 TB RAM, 5 hour runtime, 500K events/s
▪ At least it runs successfully!

SECOND APPROACH

▪ Use a Spark broadcast join to avoid the large shuffle
▪ Clearly the fastest option: 100x faster than approach 1
▪ Result: failed at scale due to the 2 GB broadcast limit

[Diagrams: Hash Join, Broadcast Join]
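The difference between the two approaches can be sketched in plain Python. This is only an illustration of the general techniques, not Spark's implementation: partitions are modeled as lists of (key, value) pairs, each large partition stands in for one worker, and the `moved` counter is a hypothetical proxy for network traffic.

```python
from collections import defaultdict


def shuffle_hash_join(large_parts, small_parts, num_reducers):
    """Repartition BOTH sides by hash(key), then join inside each reducer."""
    large_buckets = defaultdict(list)
    small_buckets = defaultdict(list)
    moved = 0  # every record on both sides is sent to a reducer
    for part in large_parts:
        for key, value in part:
            large_buckets[hash(key) % num_reducers].append((key, value))
            moved += 1
    for part in small_parts:
        for key, value in part:
            small_buckets[hash(key) % num_reducers].append((key, value))
            moved += 1

    out = []
    for r in range(num_reducers):
        table = defaultdict(list)  # hash table over the small side of this bucket
        for key, value in small_buckets[r]:
            table[key].append(value)
        for key, value in large_buckets[r]:
            for other in table.get(key, []):
                out.append((key, value, other))
    return out, moved


def broadcast_join(large_parts, small_parts):
    """Copy the whole small side to every worker; the large side never moves."""
    small = [kv for part in small_parts for kv in part]
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    moved = len(small) * len(large_parts)  # one full copy per worker

    out = []
    for part in large_parts:
        for key, value in part:
            for other in table.get(key, []):
                out.append((key, value, other))
    return out, moved


large = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
small = [[(1, "X")], [(3, "Y")]]
shuffled, shuffled_moved = shuffle_hash_join(large, small, num_reducers=2)
broadcast, broadcast_moved = broadcast_join(large, small)
```

Both functions return the same matches. The shuffle moves every record of both datasets, while the broadcast moves only copies of the small side, which is why it is fast when the small side is tiny and why it stops working once the small side no longer fits in a single broadcast.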


Avoiding the Shuffle

HOW IT WORKS

THE SOLUTION

• Install an Aerospike in-memory namespace on all Spark servers
• Use Aerospike within the Spark job to avoid large data shuffles
• Aerospike inter-node communication is limited to small hashes

THE METHOD

• Load the smaller dataset into Aerospike:
    • Key off the join key
    • Store the required join data in bins
• Iterate over the large dataset locally and query Aerospike for matches
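The method above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AppLovin's code: `InMemoryStore` is a hypothetical dict-backed stand-in for the Aerospike in-memory namespace (it is not the Aerospike client API), and the row fields are made up for the example.

```python
class InMemoryStore:
    """Hypothetical stand-in for an Aerospike in-memory namespace."""

    def __init__(self):
        self._records = {}

    def put(self, key, bins):
        # A later put with the same key overwrites the earlier record.
        self._records[key] = dict(bins)

    def get(self, key):
        # Returns the stored bins, or None when the key is absent.
        return self._records.get(key)


def load_small_side(store, small_rows, join_key):
    """Step 1: key each small-side row off the join key, keep the rest as bins."""
    for row in small_rows:
        bins = {name: value for name, value in row.items() if name != join_key}
        store.put(row[join_key], bins)


def lookup_join(store, large_partition, join_key):
    """Step 2: stream the large side locally and probe the store per row."""
    for row in large_partition:
        match = store.get(row[join_key])
        if match is not None:
            # Bins from the small side are merged into the large-side row.
            yield {**row, **match}


small = [{"id": 1, "source": "A"}, {"id": 3, "source": "C"}]
large = [{"id": 1, "event": "x"}, {"id": 2, "event": "y"}, {"id": 3, "event": "z"}]

store = InMemoryStore()
load_small_side(store, small, "id")
joined = list(lookup_join(store, large, "id"))
```

The large dataset is never repartitioned: each worker only sends point lookups for the keys it is already holding. Note that duplicate join keys on the small side silently overwrite each other in this sketch.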


The Aerospike-Spark Join

HOW IT WORKS

[Diagram: Step 1 and Step 2]


Proof of Concept

HOW IT WORKS

INITIAL IMPLEMENTATION RESULTS

~10M reads/s

Used standard get/put

How to scale it up?

1. First simplify the test case, isolate to Aerospike
2. Pick consistent benchmark query
3. Tune Aerospike configuration
4. Re-implement with Aerospike batch protocol
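Why moving from single gets to batch reads helps can be shown with a small sketch. Everything here is an assumption for illustration: `CountingStore` and its `batch_get` method are hypothetical stand-ins (not the Aerospike client API), and the request counter is only a rough proxy for round trips.

```python
from itertools import islice


class CountingStore:
    """Hypothetical key-value store that counts requests it receives."""

    def __init__(self, records):
        self._records = dict(records)
        self.requests = 0

    def get(self, key):
        self.requests += 1  # one request per key
        return self._records.get(key)

    def batch_get(self, keys):
        self.requests += 1  # one request per batch of keys
        return [self._records.get(key) for key in keys]


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def join_single(store, keys):
    """Probe the store one key at a time."""
    out = []
    for key in keys:
        value = store.get(key)
        if value is not None:
            out.append((key, value))
    return out


def join_batched(store, keys, batch_size):
    """Probe the store with one request per chunk of keys."""
    out = []
    for chunk in chunked(keys, batch_size):
        for key, value in zip(chunk, store.batch_get(chunk)):
            if value is not None:
                out.append((key, value))
    return out


records = {i: i * 10 for i in range(0, 100, 2)}  # only even keys exist
keys = list(range(100))

single_store = CountingStore(records)
batched_store = CountingStore(records)
single_result = join_single(single_store, keys)
batched_result = join_batched(batched_store, keys, batch_size=25)
```

Both paths return the same matches, but the batched path issues 4 requests instead of 100 in this example. The right batch size in practice depends on record size and cluster configuration, which the slides do not specify.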


Full Scale Implementation

RESULTS

BENCHMARK RESULTS

Final job resources:

384 cores
500 GB RAM (4x less)
5 minute runtime (12x faster)
230M reads/s (7x faster than Spark alone)
6M reads/s per node

[Chart: Join Throughputs in million records/s (axis 0 to 3,500) for Spark Shuffle, Spark + Aerospike, and Spark Broadcast]
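As a quick sanity check, the ratios quoted above can be recomputed from figures stated elsewhere in the deck. The baseline is an assumption: the "4x less" and "12x faster" figures are consistent with the Vertica job (2 TB RAM, 1 hour runtime), and the node count is inferred from the two throughput figures rather than stated in the slides.

```python
# Baseline assumed to be the Vertica job from the problem slide.
baseline_ram_gb = 2000       # 2 TB RAM
baseline_runtime_min = 60    # 1 hour runtime

# Final Spark + Aerospike job, as stated on this slide.
final_ram_gb = 500
final_runtime_min = 5
total_reads_per_s = 230e6
reads_per_node_per_s = 6e6

ram_ratio = baseline_ram_gb / final_ram_gb                 # 4.0
runtime_ratio = baseline_runtime_min / final_runtime_min   # 12.0

# Inferred, not stated in the slides: nodes needed to reach the total rate.
implied_nodes = total_reads_per_s / reads_per_node_per_s   # about 38.3
```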


Peak Reads/s

OUTCOME


The Final Comparison

OUTCOME

[Chart: throughput in million records/s (axis 0 to 4.5) for Vertica, Spark Shuffle, and Spark + Aerospike]


Bringing it to the Masses

OUTCOME

FORMALIZING THE FRAMEWORK

• Needed to provide a simple API for using it
• Abstract away all of the Aerospike-specific logic
• Hide the use of namespaces, sets and bins
• Use TTL records to prevent the cluster from filling
• Provide feature parity with Spark's included set/join functions:

DEVELOPER API

largeRDD.aeroSubtract(broadcastRDD)
largeRDD.aeroIntersect(broadcastRDD)
largeRDD.aeroJoin(broadcastRDD)
largeRDD.aeroLeftJoin(broadcastRDD)
largeDF.aeroJoin(broadcastDF, columns)
largeDF.aeroLeftJoin(broadcastDF, columns)
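The slides do not show how these calls behave internally. The sketch below is one plausible reading of the RDD variants, modeled loosely on Spark's own subtract, intersection, join and left outer join, with plain Python lists in place of RDDs and a set or dict in place of the Aerospike namespace. The snake_case function names are hypothetical and are not the framework's API.

```python
def aero_subtract(large, small):
    """Keep large-side elements that do NOT appear on the small side."""
    seen = set(small)  # stand-in for records written to the store
    return [item for item in large if item not in seen]


def aero_intersect(large, small):
    """Keep large-side elements that DO appear on the small side."""
    seen = set(small)
    return [item for item in large if item in seen]


def aero_join(large, small):
    """Inner join on key: emit (key, (large_value, small_value)) for matches."""
    table = dict(small)  # duplicate small-side keys: the last one wins
    return [(key, (value, table[key])) for key, value in large if key in table]


def aero_left_join(large, small):
    """Left join on key: unmatched large-side rows get None on the right."""
    table = dict(small)
    return [(key, (value, table.get(key))) for key, value in large]


large_pairs = [("a", 1), ("b", 2), ("c", 3)]
small_pairs = [("a", 10), ("c", 30)]

inner = aero_join(large_pairs, small_pairs)
left = aero_left_join(large_pairs, small_pairs)
difference = aero_subtract([1, 2, 3, 4], [2, 4])
common = aero_intersect([1, 2, 3, 4], [2, 4])
```

In every variant the small side is loaded once and the large side is only read, which is what lets a store-backed version avoid the shuffle. The "last one wins" behavior on duplicate keys is a simplification of this sketch.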


Limitations and Future Work

FOLLOW UP

LIMITATIONS

• Aerospike set/bin limits
• How to properly deal with TTL or removing records
• Key conflicts
• Potential for cascading failures
• Need to know where to apply it

FUTURE IMPROVEMENT IDEAS

• Plug in to the cost-based optimizer
• Determine if SQL support is possible
• Launch dedicated job clusters through Mesos?


Key Takeaways

Aerospike and Spark

FOLLOW UP

▪ Quick, efficient access to Aerospike data from Spark can speed up BI and development by 750x
▪ Backup solutions through Spark can perform up to 2x faster than asbackup alone
▪ A hybrid Aerospike-Spark setup can perform 7x faster than Spark alone