Transcript
Page 1: Netflix viewing data architecture evolution - QCon 2014
Page 2: Netflix viewing data architecture evolution - QCon 2014

Please read the notes associated with each slide for the full context of the

presentation

Page 3: Netflix viewing data architecture evolution - QCon 2014

Who am I?

Philip Fisher-Ogden• Director of Engineering @

Netflix

• Playback Services (making “click play” work)

• 6 years @ Netflix, from 10 servers to 10,000s

Page 4: Netflix viewing data architecture evolution - QCon 2014

Story

Netflix streaming – 2007 to present

Page 5: Netflix viewing data architecture evolution - QCon 2014

Device Growth

20071 device

200810s of devices

200910s of devices

2010100s of devices

2011+1000+ devices

Page 6: Netflix viewing data architecture evolution - QCon 2014

Experience Evolution

Page 7: Netflix viewing data architecture evolution - QCon 2014

Subscribers & Viewing

53M global subscribers

50 countries

>2 billion hours viewed per month

Page 8: Netflix viewing data architecture evolution - QCon 2014

Improved Personalization

Better Experience

Viewing

Virtuous Cycle

Page 9: Netflix viewing data architecture evolution - QCon 2014

Viewing Data

Who, What, When, Where, How Long

Page 10: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

What have I watched?

Page 11: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

Where was I at?

Page 12: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

What else am I watching?

Page 13: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

Page 14: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

Page 15: Netflix viewing data architecture evolution - QCon 2014

Active Sessions

Last Position

Viewing History

Data Feed

Generic Architecture

Start Stop

Collect

ProcessStream State

Session Summary

Event Stream

Provide

Page 16: Netflix viewing data architecture evolution - QCon 2014

Architecture Evolution

• Different generations

• Pain points & learnings

• Re-architecture motivations

Page 17: Netflix viewing data architecture evolution - QCon 2014

Real Time Data

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 18: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 19: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1

Start Stop

SessionsLogs / Events

History / Position

SQL

Page 20: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1 pain points

• Scalability

– DB scaled up not out

• Event Data Analytics

– ad hoc

• Fixed schema

Page 21: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 22: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2 motivations

• Scalability

– Scale out not up

• Flexible schema

– Key/value attributes

• Service oriented

Page 23: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2

Start Stop

No

SQL

50 data partitions

Viewing Service

Page 24: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2 pain points

• Scale out

– Resharding was painful

• Performance

– Hot spots

• Disaster Recovery

– SimpleDB had no backups

Page 25: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 26: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 landscape

• Cassandra 0.6

• Before SSDs in AWS

• Netflix in 1 AWS region

Page 27: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 motivations

• Order of magnitude increase in requests

• Scalability

– Actually scale out rather than up

Page 28: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

StatelessTier

(fallback)

Sessions

Viewing History

Mem

cach

ed

Page 29: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Start

Stop

Page 30: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

Page 31: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

update

Page 32: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

snapshot

Sessions

Page 33: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

Viewing History

Mem

cach

ed

Page 34: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

What have I

watched?

Viewing History

Mem

cach

ed

View Summary

Page 35: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

Latest PositionsWhere

was I at?

Viewing History

StatelessTier

(fallback)

Mem

cach

ed

Page 36: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

What else am I

watching?

Active Sessions

Page 37: Netflix viewing data architecture evolution - QCon 2014

gen 3 - Requests ScaleOperation Scale

Create (start streaming) 1,000s per second

Update (heartbeat, close) 100,000s per second

Append (session events/logs) 10,000s per second

Read viewing history 10,000s per second

Read latest position 100,000s per second

Page 38: Netflix viewing data architecture evolution - QCon 2014

gen 3 – Cluster ScaleCluster Scale

Cassandra Viewing History ~100 hi1.4xl nodes~48 TB total space used

Viewing Service Stateful Tier ~1700 r3.2xl nodes50GB heap memory per node

Memcached ~450 r3.2xl/xl nodes~8TB memory used

Page 39: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 pain points

• Stateful tier

– Hot spots

– Multi-region complexity

• Monolithic service

• read-modify-write poorly suited for memcached

Page 40: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 learnings

• Distributed stateful systems are hard

– Go stateless, use C*/memcached/redis…

• Decompose into microservices

Page 41: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

StatelessTier

(fallback)

Viewing History

Sessions

Mem

cach

ed

Page 42: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Stream State/Event Collectors

Stateless Microservices

Data Processors

Data Services Data Feeds

Page 43: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Viewing History

Session State Session Positions

Session Events

Data Tiers

red

isre

dis

Page 44: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

• Summarize detailed event data

• Non-real time, but near real time

• Some shared logic with real time

Page 45: Netflix viewing data architecture evolution - QCon 2014

Session Analytics - Processing

2007 2009 20102008 2011 2012 2013 2014 Future

CustomService

(Java on AWS)

Mantis

Bat

ch

Nea

r R

eal-

Tim

e

Stre

am

Pro

cess

ing

Page 46: Netflix viewing data architecture evolution - QCon 2014

Session Analytics - Storage

2007 2009 20102008 2011 2012 2013 2014 Future

Bat

ch

Nea

r R

eal-

Tim

e

Stre

am

Pro

cess

ing

Page 47: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 1

• Storage • Processing

Sess

ion

sLo

gs

Page 48: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 1 pain points

• MapReduce good for batch

– Not for near real time

• Complexity

– Code in 2 systems / frameworks

– Operational burden of 2 systems

Page 49: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 2

• Storage • Processing

Session Events & Logs

Java

Page 50: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 2 learnings

• Reduced complexity

– shared code and ops

• Batch still available

• New bottleneck

– harder to extend logic

Page 51: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 3 (*)

• Storage • Processing

Mantis

Storm

Samza

Spark Streaming

Stream Processing Frameworks

Page 52: Netflix viewing data architecture evolution - QCon 2014

Takeaways

• Polyglot Persistence– One size fits all doesn’t

fit all

• Strong opinions, loosely held– Design for long term, but

be open to redesigns

Page 53: Netflix viewing data architecture evolution - QCon 2014

Thanks!

@philip_pfo


Top Related