Ads Personalization at Spotify - NYC Data Engineering 10/23

Building personalized ad experiences through iterative engineering and product development. Ad Personalization at Spotify, by Kinshuk Mishra and Noel Cody.

Uploaded by: kinshuk-mishra

Posted on 04-Jul-2015


Category: Data & Analytics



DESCRIPTION

Spotify engineers Kinshuk Mishra and Noel Cody share their experience building personalized ad experiences for users through iterative engineering and product development. The slides explain their process of continuous problem discovery, hypothesis generation, product development, and experimentation. They dive deep into the specific ad personalization problems Spotify is solving and explain their data infrastructure technology stack in detail. They also explain how they experimented with various product hypotheses and iteratively evolved their infrastructure to keep up with product requirements.

TRANSCRIPT

Page 1: Ads Personalization at Spotify - NYC Data Engineering 10/23

Building personalized ad experiences through iterative engineering and product development

Ad Personalization at Spotify

Kinshuk Mishra, Noel Cody

Page 5

Music…

...is with you throughout the day.

...fits your mood.

...fits your activity.

Page 6

Music…

...is personal.

Page 9

If your day looks like this:

Wake up → Work out → Commute → Focus at Work → Relax at Home → Sleep

Page 10

Ads should follow.

Wake up → Work out → Commute → Focus at Work → Relax at Home → Sleep

Ads

Page 11

Ads should follow.

Wake up → Work out → Commute → Focus at Work (Classical) → Relax at Home → Sleep

Electronic Music ad

Not bad. WTF?

Page 12

Why Personalization?

Page 13

“...it works well the advertisements are annoying though I am not a fan of mainstream music so hearing about pop bands is also driving me crazy”

“Great way to listen to whatever music you want. The ads can be really annoying though since they don't seem to be targeted. I HATE rap music, yet I seem to get a lot of ads for it.”

Why Personalization?

Page 14

Data confirms anecdotal evidence

Page 17

AD PERSONALIZATION

Page 18

User Stories

Hypotheses + Goals

Product MVPs + Experiments

Page 19

User Stories to Hypotheses & Goals:

● Context-aware ads

● Music ads like music recommendations

● Ads that learn

Page 20

Hypotheses to Products:

● Real-time genre targeting

● Historic genre targeting

● Real-time moment targeting

Page 21

(Product MVPs to Experiments)

Control Variation 1 Variation 2
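Splitting users into a control group and variations needs a stable assignment, so a user sees the same experience on every session. The slides do not say how Spotify assigns users; a common approach is to hash the user ID together with the experiment name. The sketch below assumes SHA-256 bucketing, and the arm names and assign_arm function are hypothetical.

```python
import hashlib
from typing import Sequence

# Hypothetical arm names mirroring the slide's "Control / Variation 1 / Variation 2".
ARMS = ("control", "variation_1", "variation_2")


def assign_arm(user_id: str, experiment: str, arms: Sequence[str] = ARMS) -> str:
    """Deterministically map a user to one arm of an experiment.

    Hashing "experiment:user_id" keeps each user in the same arm across sessions,
    while different experiments get independent splits of the user base.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it to an arm index.
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]
```

Calling assign_arm("user-42", "genre-targeting") always returns the same arm, and across many user IDs each arm receives roughly a third of the users.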

Page 22

INFRASTRUCTURE

Page 23

Ad Targeting Architecture

Feedback Loop

Page 24

Ad Targeting Architecture

OSS Data Infrastructure

Spotify Backend Infrastructure

Page 25

Ad Targeting Architecture V1.0

COTS Data Infrastructure

Spotify Backend Infrastructure

Real-time Targeting

Page 26

Ad Targeting Architecture V2.0

Real-time + Batch Targeting (a.k.a. Lambda Architecture)
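In a Lambda Architecture, a batch layer periodically recomputes a view from the full event history, a speed layer covers only the events that arrived after the last batch run, and queries merge the two. A minimal sketch, assuming per-user genre play counts as the targeting signal and a single cutoff timestamp marking the last batch run; the play-event tuple and all function names are hypothetical, not Spotify's code.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical play event: (user_id, genre, timestamp_seconds).
Play = Tuple[str, str, int]


def batch_view(plays: Iterable[Play], cutoff: int) -> Dict[str, Counter]:
    """Batch layer: recompute per-user genre counts from all plays before `cutoff`."""
    view: Dict[str, Counter] = {}
    for user, genre, ts in plays:
        if ts < cutoff:
            view.setdefault(user, Counter())[genre] += 1
    return view


def speed_view(plays: Iterable[Play], cutoff: int) -> Dict[str, Counter]:
    """Speed layer: count only the plays the last batch run has not seen yet."""
    view: Dict[str, Counter] = {}
    for user, genre, ts in plays:
        if ts >= cutoff:
            view.setdefault(user, Counter())[genre] += 1
    return view


def top_genre(user: str, batch: Dict[str, Counter], speed: Dict[str, Counter]) -> Optional[str]:
    """Serving layer: merge both views at query time and return the dominant genre."""
    merged = batch.get(user, Counter()) + speed.get(user, Counter())
    if not merged:
        return None  # no history at all for this user
    return merged.most_common(1)[0][0]
```

For example, if the batch view holds two rock plays and one jazz play for a user, and the speed layer has seen two more jazz plays since the cutoff, top_genre returns "jazz" even though the batch view alone would say "rock".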

Page 27

Ad Targeting Architecture V2.5

Transition to Persistent User Profile

Page 28

Ad Targeting Architecture V3.0

Richer Profile Schema with Persistence

Page 29

Tech Choices

Page 30

Kafka

● Kafka is a distributed, partitioned, replicated commit log service.

● Guarantees:

  ● Kafka provides a total order over messages within a partition (but not across partitions).

  ● Fault tolerance: for a topic with replication factor N, Kafka tolerates up to N-1 server failures without losing committed messages.
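Because ordering only holds within a partition, producers that need per-user ordering typically key records by user ID so that all of a user's events land in one partition. A toy in-memory model of that idea, assuming CRC32 as the key-to-partition hash (Kafka's Java client uses murmur2 by default) and a hypothetical PartitionedLog class; it ignores replication entirely.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy model of a Kafka topic: one append-only log per partition.

    Records with the same key always land in the same partition, so their
    relative order is preserved; nothing orders records across partitions.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, List[Tuple[int, str, str]]] = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 stands in for Kafka's default murmur2-based key hashing.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        offset = len(self.partitions[p])  # offsets grow monotonically per partition
        self.partitions[p].append((offset, key, value))
        return p, offset

    def read(self, partition: int) -> List[Tuple[int, str, str]]:
        """Read a whole partition from offset 0, in append order."""
        return list(self.partitions[partition])
```

Appending "play:rock", "play:jazz" and "skip" under the key "user-42" puts all three in one partition at offsets 0, 1 and 2, so a consumer of that partition sees them in the order they were produced.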

Page 31

Ad Targeting Architecture V1.0

COTS Data Infrastructure

Spotify Backend Infrastructure

Real-time Targeting

Page 32

Storm

● Real-time stream processing

● Like Hadoop without HDFS

● Like MapReduce with many reducer steps

● Fault tolerant, with guaranteed message processing
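A Storm topology wires a spout (the stream source) to a chain of bolts, which is why the slide compares it to MapReduce with many reducer steps. The sketch below mimics that shape with plain Python generators rather than Storm's Java API, computing a user's dominant genre over their most recent plays (the real-time genre signal from the earlier slides). The tab-separated log format, the window size, and all names are hypothetical, and none of Storm's distribution or guaranteed-processing machinery is modeled.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical raw log line format: "user_id<TAB>track_id<TAB>genre".


def play_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: emit raw play-log lines (a real topology might read them from Kafka)."""
    for line in lines:
        yield line


def parse_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Bolt 1: parse each line into (user_id, genre), dropping malformed lines."""
    for line in stream:
        parts = line.split("\t")
        if len(parts) == 3:
            yield parts[0], parts[2]


class RecentGenreBolt:
    """Bolt 2: keep each user's last `window` genres and emit the dominant one."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        self.recent: Dict[str, Deque[str]] = {}

    def process(self, stream: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
        for user, genre in stream:
            # deque(maxlen=...) silently drops the oldest genre once the window is full.
            history = self.recent.setdefault(user, deque(maxlen=self.window))
            history.append(genre)
            yield user, Counter(history).most_common(1)[0][0]
```

With a window of 3, a user who plays two rock tracks and then three classical tracks is reported as "rock" for the first three plays and "classical" from the fourth play on, as the older rock plays slide out of the window.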

Page 33

Storm: Testing (since 0.8.1)

Page 34

Storm: Visualization (since 0.9.2)

Page 35

Ad Targeting Architecture V2.0

Real-time + Batch Targeting

Page 36

Apache Crunch

● Framework for writing, testing, and running MapReduce pipelines

● Pipelines are composed of user-defined functions and higher-level abstractions of common MR tasks (filter, join, etc.)

Page 37

Apache Crunch

Data structures:

● PCollection<T>

● PTable<K,V>

● PGroupedTable<K,V>

Functions:

● MapFn<T1,T2>: T1 → T2

● CombineFn<K,V>: (K, Iterable<V>) → (K, V)
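To make these data structures and signatures concrete, here is a plain-Python analogue of a typical count-by-key job: a MapFn turns each play event into a key/value pair (a PTable), grouping by key yields a grouped table (a PGroupedTable), and a CombineFn folds each group into a single value. This only mirrors the shapes of the Crunch concepts; it is not Crunch's Java API, and the function names and play-event format are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
T1 = TypeVar("T1")
T2 = TypeVar("T2")


def parallel_do(collection: Iterable[T1], map_fn: Callable[[T1], T2]) -> List[T2]:
    """Like applying a MapFn<T1,T2> to a PCollection<T1>: exactly one output per input."""
    return [map_fn(x) for x in collection]


def group_by_key(table: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Like grouping a PTable<K,V> by key into a PGroupedTable<K,V>."""
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in table:
        grouped[key].append(value)
    return dict(grouped)


def combine_values(grouped: Dict[K, List[V]], combine_fn: Callable[[Iterable[V]], V]) -> Dict[K, V]:
    """Like a CombineFn<K,V>: (K, Iterable<V>) -> (K, V), applied to every group."""
    return {key: combine_fn(values) for key, values in grouped.items()}
```

Mapping the plays [("u1", "rock"), ("u2", "jazz"), ("u1", "rock")] to ("user:genre", 1) pairs, grouping, and combining with sum yields {"u1:rock": 2, "u2:jazz": 1}.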

Page 38

What’s wrong with plain Python Streaming MapReduce?

● Testability

● Optimization

● Performance

● IDE support

● Type Safety

● Lack of higher-level operations (filter/join/aggregate)

Apache Crunch

From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive

Page 39

● About a 5x performance improvement over Python streaming MapReduce

● Readable functional-style API in plain Java

● Great local testing support

● First-class support for Avro records.

Apache Crunch

From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive

Page 40

Apache Crunch

Page 41

Apache Crunch

Page 42

Ad Targeting Architecture V2.5

Transition to Persistent User Profile

Page 43

CASSANDRA

Rich wide-column schema support

Solid persistence and replication

Slower reads

● Rich schema

● Persistence

vs.

MEMCACHED

K/V only

TTL is default (in-memory mgmt)
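The trade-off on this slide comes down to expiring key/value entries versus a persistent, multi-column profile row. A toy in-memory contrast, assuming a hypothetical profile keyed by user ID with columns such as "genre:realtime" and "genre:historic". Neither class reflects the real memcached or Cassandra client APIs, and an injectable clock is used so that expiry can be shown deterministically.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Memcached-style K/V store: every value expires `ttl` seconds after it is set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # lazily evict the expired entry
            return None
        return entry[1]


class WideRowStore:
    """Cassandra-style wide row: row key -> {column name -> value}.

    Values persist until overwritten; this toy omits Cassandra's optional per-column TTL.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def upsert(self, row_key: str, column: str, value: str) -> None:
        self._rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key: str) -> Dict[str, str]:
        return dict(self._rows.get(row_key, {}))  # copy so callers cannot mutate the store
```

With a 60-second TTL, a real-time genre written to TTLStore is gone once the clock passes 60 seconds, while the same user's row in WideRowStore keeps both its real-time and historic genre columns.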

Page 44

Ad Targeting Architecture V3.0

Richer Profile Schema with Persistence

Page 45

DATA INGESTION:

[Diagram: LOGS, KAFKA, STORM, HDFS, CRUNCH, CASSANDRA]

Page 46

TESTING

Page 48

User Stories

Hypotheses + Goals

Product MVPs + Experiments

Page 49

AAR

Vital Signs

Ad-Specific Metrics

Page 50

AAR

Vital Signs

Ad-Specific Metrics

Higher-level metrics are hard to move

Page 51

USER EXPERIENCE

TEST ITERATION

IMPACTS AAR

Page 52

AAR

Vital Signs

Ad-Specific Metrics (our focus)

Page 53

Test evaluation

● Positive Signals: CTR, Downstream Effects

● Avoidance Signals: Volume, Audio Output

● An “Ad Quality Score”
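The slides name the ingredients of the “Ad Quality Score” but not how they are combined. One simple way to fold them into a single number per experiment arm is a weighted difference between a positive-signal rate (CTR) and an avoidance-signal rate. A minimal sketch under that assumption: the weights, the AdStats field names, the reading of volume and audio-output changes as avoidance events, and the omission of downstream effects are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Aggregated counts for one experiment arm (hypothetical fields)."""

    impressions: int
    clicks: int
    avoidance_events: int  # assumed: e.g. volume turned down or audio output changed during an ad


def ctr(stats: AdStats) -> float:
    """Click-through rate: clicks per impression (0.0 when there are no impressions)."""
    return stats.clicks / stats.impressions if stats.impressions else 0.0


def avoidance_rate(stats: AdStats) -> float:
    """Share of impressions that triggered an avoidance signal."""
    return stats.avoidance_events / stats.impressions if stats.impressions else 0.0


def ad_quality_score(stats: AdStats, w_pos: float = 1.0, w_avoid: float = 1.0) -> float:
    """Hypothetical score: reward the positive signal, penalize the avoidance signal."""
    return w_pos * ctr(stats) - w_avoid * avoidance_rate(stats)
```

For a control arm with 10,000 impressions, 50 clicks and 400 avoidance events, this gives 0.005 - 0.04 = -0.035; a variation with 80 clicks and 300 avoidance events on the same traffic scores -0.022, so it would rank higher under equal weights.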

Page 54

Thanks!

(We’re hiring): spotify.com/us/jobs/