rethinking cloud proxies

Post on 16-Apr-2017

Category:

Engineering


A Cloud Gateway - A Large-Scale Company’s First Line of Defense

Mikey Cohen

Manager - Edge Gateway

Netflix

Today, more than 36% of North America’s internet traffic is controlled by systems in the Amazon Cloud

Global Streaming of TV Shows and Movies

Nearly 70 Million Subscribers

In over 80 Countries

Netflix accounts for over 36% of Downstream Traffic in North America

From the Internet to Services in the Cloud

[Diagram: the Gateway sits between the Internet and the origins (API, Website)]

Our Edge Gateway @ Netflix

● Handles most netflix.com hosts
● Over 20 production Zuul clusters
● ~50 ELBs
● Gateway handles ~10 origin services

Netflix Gateway Scale

● Tens of billions of requests per day
● 3 AWS regions
● Over 1000 device types

Hundreds of permutations of protocols and device versions

Our Journey

● Success
● Evolution
● Scale
● Failure

So What!? - Change your perspective!!

Traditional Cloud Proxy Mission

● Simple static rule-based routing
● API portal
● Request authentication
● Throttling - request caps
● Monitoring
● Caching

The Gateway - a grown-up proxy!

● Dynamic routing
● Deep Insights
● Load balancing
● Availability focused
● Service protection
● Quality assurance tool

Evolving to a Gateway

Netflix’s Public API

● Late 2008
● Mashery
● Datacenter

Streaming Devices using public API

Early Streaming Devices - 2009

● Windows Media Center
● Xbox
● PS3

Migration to AWS

● 2010
● Sonoa / Apigee proxy
● Device traffic, not public
● Controlling DC -> cloud migration
● Running in AWS
● Under Netflix control

Streaming Success

● 2011
● Chaos
● Complexity
● Failure
● Success
● Leveraging Cloud benefits

Anti-patterns of most cloud proxies

● Static configurations
● Service push needed to change behavior
● Limited range of functionality
● Limited to HTTP

Zuul Created

● 2012
● Dynamically injected and compiled filters
● Manipulate requests and responses: headers, body, etc.
● Change routing
● Add metrics and other functions
● Built on Netflix’s OSS stack
● Open sourced
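The filter idea on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and does not use Zuul's actual filter API; `Request`, `FilterChain`, `add_filter`, `tag_request`, and `route_v2` are hypothetical names chosen for the example. The point it shows is that filters are registered at runtime, run in order, and can rewrite headers or change the route without redeploying the proxy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    route: str = "default"  # hypothetical name of the origin cluster to forward to


# Assumed filter shape: a function that mutates the request in place.
Filter = Callable[[Request], None]


class FilterChain:
    """Holds filters that can be added or replaced while the proxy is running."""

    def __init__(self) -> None:
        self._filters: Dict[str, Tuple[int, Filter]] = {}

    def add_filter(self, name: str, order: int, fn: Filter) -> None:
        # Registering an existing name replaces the old filter.
        self._filters[name] = (order, fn)

    def run(self, request: Request) -> Request:
        # Apply filters in ascending order of their `order` value.
        for _, fn in sorted(self._filters.values(), key=lambda item: item[0]):
            fn(request)
        return request


def tag_request(request: Request) -> None:
    # Example of header manipulation.
    request.headers["x-gateway"] = "edge"


def route_v2(request: Request) -> None:
    # Example of a routing change driven by the request path.
    if request.path.startswith("/v2/"):
        request.route = "api-v2"
```

Replacing `route_v2` with a different function under the same name changes routing for the next request, which is the "no service push" property the slide contrasts with static proxy configurations.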

Zuul - A Victim of Success

● Easy and convenient
● Instant results
● High adoption
● Happy customers
● Business logic in proxy
● Affects system resiliency
● Zuul team in critical path

Creating a Gateway Strategy

Principles of Netflix’s Gateway Strategy

● Creative Routing
● Dynamic Routing
● Delivery Focused
● Traffic Shaping
● React Fast
● Insights

Creative Routing - Subclusters with Purpose

[Diagram: the Gateway routes to Origin (API) subclusters: v1, v2, test, debug, canary, baseline, “sticky” canary, “sticky” baseline, FIT, instrumented, squeeze]

Red / Green Deployments


Developer Test Branches


Instrumented Clusters


Squeeze Testing


Targeted Routing

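As an illustration of targeted routing, here is a small sketch in which ordered rules match request attributes and name the subcluster that should serve the request. The rule shape, the attribute names (`x-debug`, `device`), and the device ID are assumptions made for the example; only the subcluster names (`v1`, `test`, `debug`) come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RouteRule:
    name: str
    matches: Callable[[Dict[str, str]], bool]  # predicate over request attributes
    subcluster: str


def pick_subcluster(attrs: Dict[str, str], rules: List[RouteRule], default: str = "v1") -> str:
    """Return the subcluster of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(attrs):
            return rule.subcluster
    return default


# Hypothetical rules: send flagged requests to "debug", one test device to "test".
RULES = [
    RouteRule("debug-header", lambda a: a.get("x-debug") == "true", "debug"),
    RouteRule("test-device", lambda a: a.get("device") == "test-device-001", "test"),
]
```

Because the first match wins, more specific rules belong earlier in the list.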

Service “Canarying”


“Sticky” Canary

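One way to make a canary "sticky" is to derive the assignment from a stable hash of the customer ID, so the same customer keeps landing on the same cluster for the whole test. The sketch below assumes that reading of "sticky"; the function names, the percentages, and the `production` label are hypothetical and are not taken from the talk.

```python
import hashlib


def bucket(customer_id: str, buckets: int = 10_000) -> int:
    """Map a customer ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def sticky_assignment(customer_id: str, canary_pct: float = 1.0, baseline_pct: float = 1.0) -> str:
    """Assign a customer to the sticky canary, the sticky baseline, or neither.

    Percentages are of all customers; with 10,000 buckets, 1% is 100 buckets.
    """
    b = bucket(customer_id)
    canary_cut = int(canary_pct * 100)
    baseline_cut = canary_cut + int(baseline_pct * 100)
    if b < canary_cut:
        return "sticky-canary"
    if b < baseline_cut:
        return "sticky-baseline"
    return "production"
```

Using equally sized canary and baseline groups drawn from the same hash space lets the two be compared on like-for-like traffic.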

Failure Injection Testing


Degraded Experience Testing


Traffic Shaping

A Global Cloud Deployment

[Diagram: three regions, each with a Network Tier, Presentation Tier, Business services Tier, and Persistence Tier; components shown: Proxy, Websites, API, DB]

Global Cloud Routing


A Failing region


Gateway routing to other regions

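The failover decision behind routing to other regions can be sketched as follows: serve from the home region when it is healthy, otherwise send the request to the first healthy region in a configured fallback order. The health map, the fallback table, and the region names used in the note below are illustrative assumptions; the slides only state that there are three AWS regions.

```python
from typing import Dict, List, Optional


def choose_region(
    home: str,
    healthy: Dict[str, bool],
    fallbacks: Dict[str, List[str]],
) -> Optional[str]:
    """Pick the region that should serve a request arriving in `home`."""
    if healthy.get(home, False):
        return home
    # Home region is failing: try the configured fallbacks in order.
    for region in fallbacks.get(home, []):
        if healthy.get(region, False):
            return region
    return None  # no healthy region available
```

For example, with `fallbacks = {"us-east-1": ["us-west-2", "eu-west-1"]}` and `us-east-1` marked unhealthy, requests arriving in `us-east-1` would be routed to `us-west-2`.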

Attack prevention


Smart Load Balancing


Smart Load Balancing - Bad Nodes


Gateway Backoff and Blacklists Bad Nodes

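A minimal sketch of backing off from and blacklisting bad nodes, under assumed policy details: a node is blacklisted after a run of consecutive failures, the blacklist period doubles with each further failure up to a cap, and healthy nodes are picked round-robin. The class name, thresholds, and backoff formula are hypothetical and do not describe Netflix's actual load balancer.

```python
import time
from typing import Callable, Dict, List, Optional


class NodeBalancer:
    """Round-robin over nodes, skipping nodes that are currently blacklisted."""

    def __init__(
        self,
        nodes: List[str],
        max_failures: int = 3,
        base_backoff: float = 1.0,
        max_backoff: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._nodes = list(nodes)
        self._max_failures = max_failures
        self._base_backoff = base_backoff
        self._max_backoff = max_backoff
        self._clock = clock
        self._failures: Dict[str, int] = {n: 0 for n in self._nodes}
        self._blocked_until: Dict[str, float] = {n: float("-inf") for n in self._nodes}
        self._next = 0

    def record_success(self, node: str) -> None:
        # A success resets the consecutive-failure count.
        self._failures[node] = 0

    def record_failure(self, node: str) -> None:
        self._failures[node] += 1
        over = self._failures[node] - self._max_failures
        if over >= 0:
            # Exponential backoff, capped at max_backoff seconds.
            backoff = min(self._base_backoff * (2 ** over), self._max_backoff)
            self._blocked_until[node] = self._clock() + backoff

    def available(self) -> List[str]:
        now = self._clock()
        return [n for n in self._nodes if self._blocked_until[n] <= now]

    def pick(self) -> Optional[str]:
        candidates = self.available()
        if not candidates:
            return None
        node = candidates[self._next % len(candidates)]
        self._next += 1
        return node
```

The injectable `clock` keeps the example testable; the same structure could be keyed by availability zone instead of node to blacklist a whole zone.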

Zone Failure - Blacklist the Zone automatically


React Quickly - Runtime Filter changes


Runtime Policy Injection

A Room with a View - Insights


Insights

What’s Next for Netflix’s Gateway?

Gateway as a service

● Self-service dynamic routing / route validation
● Control APIs for special routing functions

Netty-based Zuul (using RxNetty)

● Handling persistent connections
● Non-blocking, async

Transport-protocol-agnostic routing

● Reactive Socket http://reactivesocket.io/

Top Ten Lessons Learned

Build for handling Failures

Expect the Unexpected

Using Routing Creatively

Shard to Reduce Blast Radius

Devices are Weird

Protocols are Weird

Devices are Forever

Protocols are Forever

It will be built “wrong”

Keep Business Logic out of your Gateway
