serverless in production, an experience report (codemotion milan)


Upload: yan-cui

Post on 23-Jan-2018


Page 1: Serverless in production, an experience report (codemotion milan)

from the

TRENCHES

what you should know before you go to production

AWS LAMBDA

Page 2: Serverless in production, an experience report (codemotion milan)

hi, I’m Yan Cui

Page 3: Serverless in production, an experience report (codemotion milan)

hi, I’m Yan Cui

AWS user since 2009

Page 10: Serverless in production, an experience report (codemotion milan)

apr, 2016

Page 11: Serverless in production, an experience report (codemotion milan)

hidden complexities and dependencies

low utilisation to leave room for traffic spikes

EC2 scaling is slow, so scale earlier

lots of cost for unused resources

up to 30 mins for deployment

deployment required downtime

Page 12: Serverless in production, an experience report (codemotion milan)

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

Page 14: Serverless in production, an experience report (codemotion milan)

“what would good look like for us?”

Page 15: Serverless in production, an experience report (codemotion milan)

DEPLOYMENTS SHOULD...

be small
be fast
have zero downtime
have no lock-step

Page 16: Serverless in production, an experience report (codemotion milan)

FEATURES SHOULD...

be deployable independently
be loosely-coupled

Page 17: Serverless in production, an experience report (codemotion milan)

WE WANT TO...

minimise cost for unused resources
minimise ops effort
reduce tech mess
deliver visible improvements faster

Page 18: Serverless in production, an experience report (codemotion milan)

nov, 2016

Page 19: Serverless in production, an experience report (codemotion milan)

170 Lambda functions in prod

1.2 GB deployment packages in prod

95% cost saving vs EC2

15x no. of prod releases per month

Page 20: Serverless in production, an experience report (codemotion milan)

time

is a good fit

Page 21: Serverless in production, an experience report (codemotion milan)

1st function in prod!

time

is a good fit

Page 22: Serverless in production, an experience report (codemotion milan)

?

time

is a good fit

1st function in prod!

Page 23: Serverless in production, an experience report (codemotion milan)

ALERTING

CI / CD

TESTING

LOGGING

MONITORING

Page 24: Serverless in production, an experience report (codemotion milan)

Principles: what is good?
Practices: how to make it good?
Tools: with what?

Page 25: Serverless in production, an experience report (codemotion milan)

Principles outlast Tools

Page 26: Serverless in production, an experience report (codemotion milan)

170 functions

WOOF!

? ?

time

is a good fit

1st function in prod!

Page 27: Serverless in production, an experience report (codemotion milan)

SECURITY

DISTRIBUTED TRACING

CONFIG MANAGEMENT

Page 28: Serverless in production, an experience report (codemotion milan)

evolving the PLATFORM

Page 29: Serverless in production, an experience report (codemotion milan)

rebuilt search

Page 30: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch

Page 31: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch

Amazon API Gateway → AWS Lambda → Amazon CloudSearch

Page 32: Serverless in production, an experience report (codemotion milan)

new analytics pipeline

Page 33: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Google BigQuery

Page 34: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Google BigQuery

1 developer, 2 days: design → production

(his 1st serverless project)

Page 35: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Google BigQuery

“nothing ever got done this fast at Skype!”

- Chris Twamley

Page 36: Serverless in production, an experience report (codemotion milan)

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

Page 37: Serverless in production, an experience report (codemotion milan)

Rebuilt with Lambda

Page 44: Serverless in production, an experience report (codemotion milan)

Rebuilt with Lambda

Page 45: Serverless in production, an experience report (codemotion milan)

BigQuery

Page 46: Serverless in production, an experience report (codemotion milan)

BigQuery

Page 47: Serverless in production, an experience report (codemotion milan)

grapheneDB

BigQuery

Page 48: Serverless in production, an experience report (codemotion milan)

grapheneDB

BigQuery

Page 49: Serverless in production, an experience report (codemotion milan)

grapheneDB

BigQuery

Page 50: Serverless in production, an experience report (codemotion milan)

getting PRODUCTION READY

Page 51: Serverless in production, an experience report (codemotion milan)

CHOOSE A DEPLOYMENT FRAMEWORK

Page 52: Serverless in production, an experience report (codemotion milan)

http://serverless.com

Page 53: Serverless in production, an experience report (codemotion milan)

https://github.com/awslabs/serverless-application-model

Page 54: Serverless in production, an experience report (codemotion milan)

http://apex.run

Page 55: Serverless in production, an experience report (codemotion milan)

https://apex.github.io/up

Page 56: Serverless in production, an experience report (codemotion milan)

https://github.com/claudiajs/claudia

Page 57: Serverless in production, an experience report (codemotion milan)

https://github.com/Miserlou/Zappa

Page 58: Serverless in production, an experience report (codemotion milan)

http://gosparta.io/

Page 59: Serverless in production, an experience report (codemotion milan)

TESTING

Page 60: Serverless in production, an experience report (codemotion milan)

amzn.to/29Lxuzu

Page 61: Serverless in production, an experience report (codemotion milan)

Level of Testing

1. Unit

do our objects do the right thing?
are they easy to work with?

Page 63: Serverless in production, an experience report (codemotion milan)

Level of Testing

1. Unit
2. Integration

does our code work against code we can’t change?

Page 64: Serverless in production, an experience report (codemotion milan)

handler

Page 65: Serverless in production, an experience report (codemotion milan)

handler

test by invoking the handler

Page 66: Serverless in production, an experience report (codemotion milan)

Level of Testing

1. Unit
2. Integration
3. Acceptance

does the whole system work?

Page 67: Serverless in production, an experience report (codemotion milan)

Level of Testing

unit

integration

acceptance

feedback

confidence

Page 68: Serverless in production, an experience report (codemotion milan)

“…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.

The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…”

Don’t Mock Types You Can’t Change

Page 69: Serverless in production, an experience report (codemotion milan)

“…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…”

Don’t Mock Types You Can’t Change

Page 70: Serverless in production, an experience report (codemotion milan)

Don’t Mock Services You Can’t Change

Page 71: Serverless in production, an experience report (codemotion milan)

Paul Johnston

The serverless approach to testing is different and may actually be easier.

http://bit.ly/2t5viwK

Page 72: Serverless in production, an experience report (codemotion milan)

API Gateway → Lambda → DynamoDB

Page 73: Serverless in production, an experience report (codemotion milan)

API Gateway → Lambda → DynamoDB

Unit Tests

Page 74: Serverless in production, an experience report (codemotion milan)

API Gateway → Lambda → DynamoDB

Unit Tests

Mock/Stub

Page 75: Serverless in production, an experience report (codemotion milan)

what unit tests will not tell you…

API Gateway → Lambda → DynamoDB

is our request correct?
is the request mapping set up correctly?
are the API resources configured correctly?
are we assuming the correct schema?
is Lambda proxy configured correctly?
is the IAM policy set up correctly?
is the table created?

Page 77: Serverless in production, an experience report (codemotion milan)

observation

most Lambda functions are simple and have a single purpose; the risk of shipping broken software has largely shifted to how they integrate with external services

Page 79: Serverless in production, an experience report (codemotion milan)

But it slows down my feedback loop…

IT’S NOT ABOUT YOU!

Page 80: Serverless in production, an experience report (codemotion milan)

…if a service can’t provide you with a relatively easy way to test the interface in reality, then you should consider using another one.

Paul Johnston

Page 81: Serverless in production, an experience report (codemotion milan)

“…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code.

An end-to-end test interacts with the system only from the outside: through its interface…”

Testing End-to-End

Page 82: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch

Amazon API Gateway → AWS Lambda → Amazon CloudSearch

Page 83: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch

Amazon API Gateway → AWS Lambda → Amazon CloudSearch

Test Input

Page 84: Serverless in production, an experience report (codemotion milan)

Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch

Amazon API Gateway → AWS Lambda → Amazon CloudSearch

Test Input

Validate

Page 85: Serverless in production, an experience report (codemotion milan)

integration tests exercise the system’s integration with its external dependencies

Page 86: Serverless in production, an experience report (codemotion milan)

acceptance tests exercise the system end-to-end from the outside

Page 87: Serverless in production, an experience report (codemotion milan)

observation

integration tests differ from acceptance tests only in HOW the Lambda functions are invoked
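One way to act on this observation is to write each test case once and swap only the invocation. The sketch below is illustrative, not the speaker's code: the greeting handler, the `/greeting/{name}` route, the `mode` switch and `rootUrl` are all made-up names, and `fetchFn` defaults to the global `fetch` in Node 18+.

```javascript
// Hypothetical Lambda handler under test (API Gateway proxy-style response).
const handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ message: `hello ${event.pathParameters.name}` }),
});

// Integration mode: call the handler function directly, in-process.
const viaHandler = async (name) => {
  const res = await handler({ pathParameters: { name } });
  return { statusCode: res.statusCode, body: JSON.parse(res.body) };
};

// Acceptance mode: call the deployed endpoint over HTTP.
// `rootUrl` is an assumed setting pointing at the deployed API stage.
const viaHttp = async (name, rootUrl, fetchFn = fetch) => {
  const res = await fetchFn(`${rootUrl}/greeting/${encodeURIComponent(name)}`);
  return { statusCode: res.status, body: await res.json() };
};

// The test case is identical in both modes; only the invocation changes.
const invokeGreeting = (mode, name, rootUrl, fetchFn) =>
  mode === 'handler' ? viaHandler(name) : viaHttp(name, rootUrl, fetchFn);
```

A test suite could then read the mode from its environment, so the same assertions run before deployment (handler mode) and after it (HTTP mode).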

Page 91: Serverless in production, an experience report (codemotion milan)

CI + CD PIPELINE

Page 92: Serverless in production, an experience report (codemotion milan)

“the earlier you consider CI + CD, the more time you save in the long run”

- me

Page 93: Serverless in production, an experience report (codemotion milan)

“…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed…

This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…”

Testing End-to-End

Page 94: Serverless in production, an experience report (codemotion milan)

“deployment scripts that only live on the CI box are a disaster waiting to happen”

- me

Page 95: Serverless in production, an experience report (codemotion milan)

Jenkins build config deploys and tests

unit + integration tests

deploy

acceptance tests

Page 96: Serverless in production, an experience report (codemotion milan)

if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s $STAGE -r $REGION
elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  AWS_PROFILE=$PROFILE npm run int-$STAGE
elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  AWS_PROFILE=$PROFILE npm run acceptance-$STAGE
else
  usage
  exit 1
fi

Page 97: Serverless in production, an experience report (codemotion milan)

build.sh allows repeatable builds on both local & CI

Page 99: Serverless in production, an experience report (codemotion milan)

Auto Auto Manual

Page 100: Serverless in production, an experience report (codemotion milan)

LOGGING

Page 102: Serverless in production, an experience report (codemotion milan)

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

Page 103: Serverless in production, an experience report (codemotion milan)

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

UTC Timestamp: 2016-07-12T12:24:37.571Z
API Gateway Request Id: 994f18f9-482b-11e6-8668-53e4eab441ae
your log message: GOT is off air, what do I do now?

Page 104: Serverless in production, an experience report (codemotion milan)

function name

date

function version

Page 105: Serverless in production, an experience report (codemotion milan)

me

Logs are not easily searchable in CloudWatch Logs.

Page 106: Serverless in production, an experience report (codemotion milan)

LOG OVERLOAD

Page 107: Serverless in production, an experience report (codemotion milan)

CENTRALISE LOGS

Page 108: Serverless in production, an experience report (codemotion milan)

CENTRALISE LOGS

MAKE THEM EASILY SEARCHABLE

Page 109: Serverless in production, an experience report (codemotion milan)

Elasticsearch + Logstash + Kibana: the ELK stack

Page 110: Serverless in production, an experience report (codemotion milan)

CloudWatch Logs

Page 111: Serverless in production, an experience report (codemotion milan)

CloudWatch Logs → AWS Lambda → ELK stack
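A minimal sketch of the Lambda function in the middle of this pipeline. CloudWatch Logs delivers subscription data as base64-encoded, gzipped JSON under `event.awslogs.data`; the `ship` function that forwards entries to the ELK stack is a hypothetical placeholder, since the slides do not show how the logs are sent on.

```javascript
const zlib = require('zlib');

// Decodes a CloudWatch Logs subscription event into flat log entries.
const parseLogEvents = (event) => {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  const payload = JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
  return payload.logEvents.map((e) => ({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: new Date(e.timestamp).toISOString(),
    message: e.message.trim(),
  }));
};

// `ship` is a hypothetical async function that sends entries to the ELK stack.
const makeHandler = (ship) => async (event) => {
  const entries = parseLogEvents(event);
  await ship(entries);
  return entries.length;
};
```

Injecting `ship` keeps the decoding logic testable without a running log cluster.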

Page 112: Serverless in production, an experience report (codemotion milan)

CloudWatch Events

Page 114: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2f3zxQG

Page 115: Serverless in production, an experience report (codemotion milan)

DISTRIBUTED TRACING

Page 117: Serverless in production, an experience report (codemotion milan)

“my followers didn’t receive my new post!”

- a user

Page 118: Serverless in production, an experience report (codemotion milan)

where could the problem be?

Page 119: Serverless in production, an experience report (codemotion milan)

correlation IDs*

* eg. request-id, user-id, yubl-id, etc.

Page 120: Serverless in production, an experience report (codemotion milan)

ROLL YOUR OWN CLIENTS

Page 121: Serverless in production, an experience report (codemotion milan)

kinesis client

http client

sns client
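As a rough illustration of what such a client might do, the sketch below captures correlation IDs from the incoming request and attaches them to every outgoing HTTP call. The `x-correlation-` header prefix and all function names are assumptions made for this example; they are not taken from the slides.

```javascript
// Correlation IDs captured for the current invocation. A Lambda container
// handles one invocation at a time, so module-level state is workable here.
let correlationIds = {};

// Call at the start of each invocation. The "x-correlation-" prefix is an
// assumed naming convention.
const captureFromHeaders = (headers = {}, awsRequestId) => {
  correlationIds = {};
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase().startsWith('x-correlation-')) {
      correlationIds[key.toLowerCase()] = value;
    }
  }
  // Start a new chain when the caller did not supply an ID.
  if (!correlationIds['x-correlation-id']) {
    correlationIds['x-correlation-id'] = awsRequestId;
  }
  return correlationIds;
};

// Wraps any fetch-like function so outgoing requests carry the captured IDs.
const withCorrelationIds = (fetchFn) => (url, options = {}) =>
  fetchFn(url, {
    ...options,
    headers: { ...correlationIds, ...(options.headers || {}) },
  });
```

The same idea would apply to the Kinesis and SNS clients, with the IDs carried in the record payload or message attributes instead of HTTP headers.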

Page 122: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2k93hAj

Page 123: Serverless in production, an experience report (codemotion milan)

ROLL YOUR OWN CLIENTS

X-RAY

Page 124: Serverless in production, an experience report (codemotion milan)

AWS X-Ray

Page 125: Serverless in production, an experience report (codemotion milan)

AWS X-Ray

Page 126: Serverless in production, an experience report (codemotion milan)

traces do not span over API Gateway

Page 127: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2s9yxmA

Page 128: Serverless in production, an experience report (codemotion milan)

MONITORING + ALERTING

Page 129: Serverless in production, an experience report (codemotion milan)

“where do I install monitoring agents?”

Page 130: Serverless in production, an experience report (codemotion milan)

you can’t

Page 131: Serverless in production, an experience report (codemotion milan)

• invocation Count
• error Count
• latency
• throttling
• granular to the minute
• support custom metrics

Page 132: Serverless in production, an experience report (codemotion milan)

• same metrics as CW
• better dashboard
• support custom metrics

https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/

Page 136: Serverless in production, an experience report (codemotion milan)

my code

Page 137: Serverless in production, an experience report (codemotion milan)

my code

Page 138: Serverless in production, an experience report (codemotion milan)

internet → my code → internet

press button → something happens

Page 139: Serverless in production, an experience report (codemotion milan)

“how do I batch up and send logs in the background?”

Page 140: Serverless in production, an experience report (codemotion milan)

you can’t (kinda)

Page 141: Serverless in production, an experience report (codemotion milan)

logs:

console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");

metrics:

console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");

format: MONITORING|timestamp|metric value|metric type|metric name

Page 142: Serverless in production, an experience report (codemotion milan)

CloudWatch Logs → AWS Lambda → ELK stack (logs)

CloudWatch Logs → AWS Lambda → CloudWatch (metrics)
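The routing step in this diagram could look roughly like the sketch below, which separates `MONITORING|…` lines from ordinary log messages. It assumes the timestamp field is in epoch seconds, which is what the sample value 1489795335 appears to be; the function names are made up for illustration.

```javascript
// Parses "MONITORING|timestamp|value|type|name"; returns null for ordinary logs.
const parseMetric = (message) => {
  const parts = message.trim().split('|');
  if (parts.length !== 5 || parts[0] !== 'MONITORING') return null;
  const [, timestamp, value, type, name] = parts;
  return {
    timestamp: new Date(Number(timestamp) * 1000), // assumption: epoch seconds
    value: Number(value),
    type,
    name,
  };
};

// Splits a batch of log messages into metrics (for CloudWatch) and logs (for ELK).
const route = (messages) => {
  const metrics = [];
  const logs = [];
  for (const message of messages) {
    const metric = parseMetric(message);
    if (metric) {
      metrics.push(metric);
    } else {
      logs.push(message);
    }
  }
  return { metrics, logs };
};
```

Because the function under observation only writes to stdout, publishing the metrics happens asynchronously in the processing function and adds no latency to the original invocation.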

Page 143: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2gGredx

Page 144: Serverless in production, an experience report (codemotion milan)

DASHBOARDS

Page 145: Serverless in production, an experience report (codemotion milan)

DASHBOARDS

SET ALARMS

Page 146: Serverless in production, an experience report (codemotion milan)

DASHBOARDS

SET ALARMS

TRACK APP-LEVEL METRICS

Page 147: Serverless in production, an experience report (codemotion milan)

Not Only CloudWatch

Page 149: Serverless in production, an experience report (codemotion milan)

“you really don't want your monitoring system to fail at the same time as the system it monitors”

- me

Page 150: Serverless in production, an experience report (codemotion milan)

CONFIG MANAGEMENT

Page 151: Serverless in production, an experience report (codemotion milan)

easily and quickly propagate config changes

Page 153: Serverless in production, an experience report (codemotion milan)

me

Environment variables make it hard to share configurations across functions.

Page 154: Serverless in production, an experience report (codemotion milan)

me

Environment variables make it hard to implement fine-grained access to sensitive info.

Page 155: Serverless in production, an experience report (codemotion milan)

CENTRALISED CONFIG SERVICE

Page 156: Serverless in production, an experience report (codemotion milan)

config service goes here

Page 160: Serverless in production, an experience report (codemotion milan)

SSM Parameter Store

Page 161: Serverless in production, an experience report (codemotion milan)

sensitive data should be encrypted in-flight, and at rest (credentials, connection string, etc.)

Page 162: Serverless in production, an experience report (codemotion milan)

role-based access

Page 163: Serverless in production, an experience report (codemotion milan)

SSM Parameter Store

HTTPS

role-based access

encrypted in-flight

Page 164: Serverless in production, an experience report (codemotion milan)

SSM Parameter Store

encrypt

role-based access

Page 165: Serverless in production, an experience report (codemotion milan)

SSM Parameter Store

encrypted at-rest

Page 166: Serverless in production, an experience report (codemotion milan)

HTTPS

role-based access

SSM Parameter Store

encrypted in-flight

Page 167: Serverless in production, an experience report (codemotion milan)

CENTRALISED CONFIG SERVICE

CLIENT LIBRARY

Page 168: Serverless in production, an experience report (codemotion milan)

fetch & cache at Cold Start

Page 169: Serverless in production, an experience report (codemotion milan)

invalidate at interval + signal
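A client library along these lines might be sketched as follows. This is an assumption-laden illustration rather than the library from the talk: `loadFn` is a hypothetical async loader (for instance a wrapper around a Parameter Store call), and the three-minute default interval is an arbitrary choice.

```javascript
// Caches config in the container and reloads it on expiry or on demand.
const createConfigClient = (loadFn, { ttlMs = 3 * 60 * 1000, now = Date.now } = {}) => {
  let cache = null;
  let expiresAt = 0;

  // The first call (typically during cold start) loads and caches the config;
  // later calls reuse it until the interval has elapsed.
  const get = async () => {
    if (cache === null || now() >= expiresAt) {
      cache = await loadFn();
      expiresAt = now() + ttlMs;
    }
    return cache;
  };

  // Signal-based invalidation: force a reload on the next get().
  const invalidate = () => {
    cache = null;
  };

  return { get, invalidate };
};
```

A function would create the client once at module level and call `get()` inside the handler, so that warm invocations pay no extra network cost.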

Page 170: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2yLUjwd

Page 171: Serverless in production, an experience report (codemotion milan)

PRO TIPS

Page 172: Serverless in production, an experience report (codemotion milan)

max 75 GB total deployment package size*

* limit is per AWS region

Page 173: Serverless in production, an experience report (codemotion milan)

Janitor Monkey

Page 174: Serverless in production, an experience report (codemotion milan)

Janitor Lambda

http://bit.ly/2xzVu4a

Page 175: Serverless in production, an experience report (codemotion milan)

disable versionFunctions in

Page 176: Serverless in production, an experience report (codemotion milan)

install Serverless framework as dev dependency at project level

dev dependencies are excluded since 1.16.0

Page 177: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2vzBqhC

Page 178: Serverless in production, an experience report (codemotion milan)

http://amzn.to/2vtUkDU

Page 179: Serverless in production, an experience report (codemotion milan)

UNDERSTAND COLD STARTS

Page 180: Serverless in production, an experience report (codemotion milan)

AWS X-Ray

1st invocation

2nd invocation

cold start

Page 181: Serverless in production, an experience report (codemotion milan)

source: http://bit.ly/2oBEbw2

Page 182: Serverless in production, an experience report (codemotion milan)

http://bit.ly/2rtCCBz

Page 183: Serverless in production, an experience report (codemotion milan)

C#

http://bit.ly/2rtCCBz

Page 184: Serverless in production, an experience report (codemotion milan)

Java

http://bit.ly/2rtCCBz

Page 185: Serverless in production, an experience report (codemotion milan)

Node.js, Python

http://bit.ly/2rtCCBz

Page 186: Serverless in production, an experience report (codemotion milan)

AVOID COLD STARTS

Page 187: Serverless in production, an experience report (codemotion milan)

CloudWatch Event AWS Lambda

Page 188: Serverless in production, an experience report (codemotion milan)

CloudWatch Event AWS Lambda

ping

ping

ping

ping

Page 189: Serverless in production, an experience report (codemotion milan)

CloudWatch Event AWS Lambda

ping

ping

ping

ping

Page 190: Serverless in production, an experience report (codemotion milan)

CloudWatch Event AWS Lambda

ping

ping

ping

ping

HEALTH CHECKS?
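A minimal sketch of a handler that recognises these pings and returns early. CloudWatch scheduled events arrive with `source` set to `aws.events` and `detail-type` set to `Scheduled Event`; the response shapes and the idea of returning `{ warm: true }` are assumptions for illustration.

```javascript
// True when the invocation is a scheduled CloudWatch Events ping.
const isWarmUpPing = (event) =>
  Boolean(event) &&
  event.source === 'aws.events' &&
  event['detail-type'] === 'Scheduled Event';

const handler = async (event) => {
  if (isWarmUpPing(event)) {
    // Keep the container warm without running business logic.
    // A cheap dependency check could be placed here to double as a health check.
    return { warm: true };
  }
  // Placeholder for the function's real work.
  return { statusCode: 200, body: 'real work done' };
};
```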

Page 191: Serverless in production, an experience report (codemotion milan)

AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME

Page 192: Serverless in production, an experience report (codemotion milan)

USE STATE FOR OPTIMISATION
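In practice this often means keeping expensive objects in module-level variables and rebuilding them whenever they are missing, so that correctness never depends on the container being reused. The sketch below uses a made-up `createClient` to stand in for any costly initialisation.

```javascript
// Survives between invocations only while the container is reused.
let cachedClient = null;
let initCount = 0;

// Hypothetical expensive initialisation (e.g. opening a connection).
const createClient = () => {
  initCount += 1;
  return { query: (id) => `result-for-${id}` };
};

const handler = async (event) => {
  // Rebuild whenever the state is missing; never assume it is there.
  if (cachedClient === null) {
    cachedClient = createClient();
  }
  return cachedClient.query(event.id);
};
```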

Page 193: Serverless in production, an experience report (codemotion milan)

max 5 mins execution time

Page 194: Serverless in production, an experience report (codemotion milan)

USE RECURSION FOR LONG RUNNING TASKS
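One possible shape for this pattern: process items until the remaining time runs low, then invoke the same function again with the current position. `context.getRemainingTimeInMillis()` is provided by the Lambda runtime; `processItem`, `invokeSelf`, the event shape and the 10-second safety margin are assumptions made for this sketch.

```javascript
// `processItem` does the work for one item; `invokeSelf` re-invokes this
// function asynchronously with the given payload. Both are injected.
const makeHandler = ({ processItem, invokeSelf, safetyMarginMs = 10000 }) =>
  async (event, context) => {
    const items = event.items || [];
    let position = event.position || 0;

    while (position < items.length) {
      // Hand over to a fresh invocation before the time limit is reached.
      if (context.getRemainingTimeInMillis() < safetyMarginMs) {
        await invokeSelf({ items, position });
        return { status: 'continued', position };
      }
      await processItem(items[position]);
      position += 1;
    }
    return { status: 'done', position };
  };
```

Passing the position in the event keeps each invocation stateless, which fits the earlier advice not to assume anything about function lifetime.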

Page 195: Serverless in production, an experience report (codemotion milan)

@theburningmonk
theburningmonk.com
github.com/theburningmonk

Page 196: Serverless in production, an experience report (codemotion milan)

@theburningmonk
theburningmonk.com
github.com/theburningmonk

http://bit.ly/2yQZj1H

all my blog posts on Lambda