how we scaled push messaging for millions of netflix devices · push registry zuul push servers...

83
How we scaled push messaging for millions of Netflix devices Susheel Aroskar Cloud Gateway

Upload: others

Post on 03-Sep-2020

30 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

How we scaled push messaging for millions of Netflix devices

Susheel AroskarCloud Gateway

Page 2: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Why do we need push?

Page 3: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of
Page 4: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

How I spend my time in Netflix application...

Page 5: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● What is push?

Page 6: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● What is push?● How you can build it

Page 7: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● What is push?● How you can build it● How you can operate it

Page 8: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● What is push?● How you can build it● How you can operate it● What can you do with it

Page 9: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Susheel Aroskar

Senior Software EngineerCloud Gateway

[email protected]

github.com/raksoras @susheelaroskar

Page 10: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

PERSISTUNTILSOMETHINGHAPPENS

Page 11: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

PERSISTUNTILSOMETHINGHAPPENS

Page 12: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Zuul Push Architecture

Page 13: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Zuul Push Servers

Page 14: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Zuul Push Servers

WebSockets / SSE

Page 15: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 16: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 17: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Push Library

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 18: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Push Library

Push Message Queue

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 19: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Message Processor

Push Library

Push Message Queue

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 20: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Message Processor

Push Library

Push Message Queue

Push Registry

Zuul Push Servers

Register User

WebSockets / SSE

Page 21: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Message Processor

Push Library

Push Message Queue

Push Registry

Zuul Push Servers

Register User

Lookup server

WebSockets / SSE

Page 22: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Message Processor

Push Library

Push Message Queue

Push Registry

Zuul Push Servers

Register User

Lookup server

Deliver message

WebSockets / SSE

Page 23: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Handling millions of persistent connections

Zuul Push server

Page 24: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

C10K challenge

Page 25: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Socket Socket

Thread per Connection

Thread-1 Thread-2

Read

WriteWrite

Read

Page 26: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Socket Socket

Thread per Connection

Thread-1 Thread-2

Read

WriteWrite

Read

Async I/O

Socket

read callback

write callback

Socket

Single Threadread

callbackwrite

callback

Page 27: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

SOCKET

ChannelInboundHandler

ChannelInboundHandler

ChannelOutboundHandler

ChannelOutboundHandler

Channel Pipeline

Head Tail

Netty

Page 28: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

protected void addPushHandlers(ChannelPipeline pl) {

pl.addLast(new HttpServerCodec());

pl.addLast(new HttpObjectAggregator());

pl.addLast(getPushAuthHandler());

pl.addLast(new WebSocketServerCompressionHandler());

pl.addLast(new WebSocketServerProtocolHandler());

pl.addLast(getPushRegistrationHandler());

}

Page 29: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Authenticate by Cookies, JWT or any other custom scheme

Plug in your custom authentication policy

Page 30: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Tracking clients’ connectionMetadata in real-time

Push Registry

Page 31: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

public class MyRegistration extends PushRegistrationHandler {

@Override

protected void registerClient(

ChannelHandlerContext ctx,

PushUserAuth auth,

PushConnection conn,

PushConnectionRegistry registry) {

super.registerClient(ctx, authEvent, conn, registry);

ctx.executor().submit(() -> storeInRedis(auth));

}

}

Page 32: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Push registry features checklist

Page 33: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● Low read latency

Push registry features checklist

Page 34: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● Low read latency● Record expiry

Push registry features checklist

Page 35: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● Low read latency● Record expiry● Sharding

Push registry features checklist

Page 36: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

● Low read latency● Record expiry● Sharding● Replication

Push registry features checklist

Page 37: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of
Page 38: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

What we use

https://github.com/Netflix/dynomite

Redis + Auto-sharding+ Read/Write quorum+ Cross-region replication

Dynomite

Page 39: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Message Processing

Queue, RouteDeliver

Page 40: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

We use Kafka message queues to decouple message senders from receivers

Page 41: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Fire and Forget

Page 42: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Cross-region Replication

Page 43: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Different queues for different priorities

Page 44: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

We run multiple message processor instances in parallel to scale our message processing throughput.

Page 45: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Operating Zuul Push Different than REST of them

Page 46: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Persistent connections make Zuul Push server stateful

Long lived stable connections

Page 47: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Persistent connections make Zuul Push server stateful

Long lived stable connections○ Great for client efficiency

Page 48: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Persistent connections make Zuul Push server stateful

Long lived stable connections○ Great for client efficiency○ Terrible for quick deploy/rollback

Page 49: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

If you love your clients set them free...

Tear down connections periodically

Page 50: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Randomize each connection’s lifetime

Page 51: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

#

reco

nnec

ts

Time

Effect of randomizing connection lifetime on reconnect peaks

Page 52: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Ask client to close its connection.

Page 53: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Most connections are idle!

How to optimize push server

Page 54: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

BIG Server, tons of connections

ulimit -n 262144

net.ipv4.tcp_rmem="4096 87380

16777216"

net.ipv4.tcp_wmem="4096 87380

16777216"

Page 55: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of
Page 56: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of
Page 57: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Goldilocks strategy

Page 58: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Optimize for cost, NOT instance count

$$ $$

Page 59: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

How to auto-scale?

Page 60: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

How to auto-scale?

RPS? CPU??

Page 61: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

How to auto-scale?

RPS? CPU??

Open Connections

Page 62: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Amazon Elastic Load Balancers cannot proxy WebSockets.

Page 63: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Solution - Run ELB as a TCP load balancer

7 Application

6 Presentation

5 Session

4 Transport

3 Network

2 Data link

1 Physical

HTTP

TCP

IP

Ethernet

OSI 7 network layers (conceptual)

HTTP over TCP/IP

Layer 7 HTTP (WebSocket Upgrade Request)

Layer 4 TCP

Page 64: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Managing push cluster - a quick recap

● Recycle connections after tens of minutes

Page 65: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Managing push cluster - a quick recap

● Recycle connections after tens of minutes● Randomize each connection’s lifetime

Page 66: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Managing push cluster - a quick recap

● Recycle connections after tens of minutes● Randomize connection’s lifetime● More number of smaller servers >> few BIG servers

Page 67: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Managing push cluster - a quick recap

● Recycle connections after tens of minutes● Randomize connection’s lifetime● More number of smaller servers >> few BIG servers● Auto-scale on number of open connections per box

Page 68: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Managing push cluster - a quick recap

● Recycle connections after tens of minutes● Randomize connection’s lifetime● More number of smaller servers >> few BIG servers● Auto-scale on number of open connections per box● WebSocket aware vs TCP load balancer

Page 69: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

If you build it,They will push

Page 70: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

On-demand diagnostics

Page 71: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Remote recovery

Page 72: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

User messaging

Page 73: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

WHAT WILL YOU

USE IT FOR?

Page 74: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Call to action

Page 75: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

PULL!

Page 76: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

PULL!

https://github.com/Netflix/zuul

Page 77: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

In conclusion, push can make you

Page 78: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

In conclusion, push can make you rich (in functionality),

Page 79: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

In conclusion, push can make you rich (in functionality), thin (by getting rid of polling)

Page 80: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

In conclusion, push can make you rich (in functionality), thin (by getting rid of polling) and happy!

Page 81: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Thank you.

Page 82: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Questions? Susheel Aroskar

Senior Software EngineerCloud Gateway

[email protected]

github.com/raksoras@susheelaroskar

Page 83: How we scaled push messaging for millions of Netflix devices · Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE . Handling millions of

Rich, exciting Apps

More efficient systems

Easy to customize

Easy to operate

Zuul Push

Battle tested