Operational challenges behind Serverless architectures
![Page 1: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/1.jpg)
Operational challenges behind Serverless architectures
16 May 2017 - AWS User Group
![Page 2: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/2.jpg)
Who am I?
Laurent Bernaille @d2si
• OPS background
• Cloud enthusiast
• Open source advocate
• Love discovering, building (and breaking…) new things
• Passionate about the ongoing IT transformations
@lbernail
![Page 3: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/3.jpg)
About this talk
![Page 5: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/5.jpg)
Agenda
• Observability
• Challenges with event-based architecture
• Understanding new services
• Security
• Continuous Delivery
![Page 6: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/6.jpg)
Observability
![Page 7: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/7.jpg)
Monitoring: how do I monitor my functions?
• Are my functions behaving well?
• Where is my New Relic?
• Where is my Datadog?
![Page 8: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/8.jpg)
Monitoring: for Lambda, we can use CloudWatch!
Invocations/min
Average duration
• Simple application: < 20 Lambdas
• Is this normal? What about trends? What about scale?
• What about user experience?
![Page 9: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/9.jpg)
Monitoring: What about errors?
Errors
Are these errors "normal"?
What kind of errors?
• Code errors?
• Execution errors (out of memory? out of time?)
• Lambda runtime errors (can they happen?)
Are they related to retries?
![Page 10: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/10.jpg)
Logging: what is the cause of errors / latency?
• Lambda logs console/logger output
• Logs are in CloudWatch Logs
One Log group per function, nice!
One Log stream per?
![Page 11: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/11.jpg)
Logging: needle in a haystack
Crazy amount of logs (only from the Lambda engine here)
> Requires careful configuration
> AND appropriate tools
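One way to make the haystack searchable is to emit one JSON object per log line and tag it with the request ID. A minimal sketch, assuming a hypothetical `log_event` helper and an `order_id` field invented for illustration; `context.aws_request_id` is the attribute exposed by the Lambda Python runtime.

```python
import json
import logging
import time

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)


def log_event(level, message, request_id, **fields):
    """Emit one JSON log line and return it, so each entry can be filtered by field."""
    record = {
        "ts": time.time(),
        "level": logging.getLevelName(level),
        "message": message,
        "request_id": request_id,
    }
    record.update(fields)
    line = json.dumps(record, sort_keys=True)
    logger.log(level, line)
    return line


def handler(event, context):
    # Fall back to "unknown" when run outside Lambda (for example in a unit test).
    request_id = getattr(context, "aws_request_id", "unknown")
    log_event(logging.INFO, "order received", request_id, order_id=event.get("order_id"))
    return {"ok": True}
```

With every line carrying the same `request_id`, all entries for one invocation can be pulled out with a single filter, whatever log stream they landed in.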
![Page 12: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/12.jpg)
Tracing: where is my function taking time?
• No off-the-shelf APM solution (yet)
• Current state of the art: manual tracing
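Manual tracing can be as simple as timing each step by hand. A minimal sketch, assuming a hypothetical `span` helper and made-up step names ("load", "transform"); it is an illustration, not a replacement for an APM tool.

```python
import time
from contextlib import contextmanager


@contextmanager
def span(name, sink):
    """Time the enclosed block and append (name, duration_ms) to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((name, (time.perf_counter() - start) * 1000.0))


def handler(event, context):
    spans = []
    with span("total", spans):
        with span("load", spans):
            items = list(event.get("items", []))
        with span("transform", spans):
            result = [item * 2 for item in items]
    # In a real function the spans would be logged rather than returned.
    return {"result": result, "spans": spans}
```

Inner spans are recorded before the outer one, so the list reads "load", "transform", "total".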
![Page 13: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/13.jpg)
Challenges with event-based architecture
![Page 14: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/14.jpg)
Snowball effects
Let's write a function that
• reacts to writes on S3
• does a transformation
• writes the result to S3
Guess what happens?
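If the result lands in the bucket that triggers the function, each write triggers another invocation. A minimal sketch of one possible guard, assuming results go under a hypothetical `processed/` prefix; the event layout (`Records[].s3.object.key`, URL-encoded keys) follows the S3 notification format.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix reserved for the function's own output


def keys_to_process(event, output_prefix=OUTPUT_PREFIX):
    """Return the object keys from an S3 event, skipping the function's own output."""
    keys = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(output_prefix):
            continue  # written by this function: processing it again would loop forever
        keys.append(key)
    return keys


def output_key(key, output_prefix=OUTPUT_PREFIX):
    """Where the transformed object is written."""
    return output_prefix + key
```

Writing to a separate bucket, or restricting the trigger to an input prefix in the notification configuration, avoids the loop without any code.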
![Page 15: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/15.jpg)
Poison messages
Kinesis stream
DynamoDB
Kinesis guarantees in-order delivery
What will happen now?
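Because delivery is in order, a record that always fails keeps being retried and holds back everything behind it. A minimal sketch of one way out, assuming a hypothetical `process_batch` helper that parks failing records in a dead-letter list instead of raising; the list stands in for whatever queue or table would hold them in practice.

```python
def process_batch(records, process, dead_letters):
    """Process records in order; park the ones that fail so the batch can complete."""
    processed = 0
    for record in records:
        try:
            process(record)
            processed += 1
        except Exception as exc:
            # Keep the record and the reason so it can be inspected and replayed later.
            dead_letters.append({"record": record, "error": repr(exc)})
    return processed
```

The trade-off is that a parked record is no longer handled in order, so this only fits when later records do not depend on it.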
![Page 16: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/16.jpg)
Latency
Lambdas can be very fast
• < 10ms for simple processing
• What happens when we call many Lambdas? Latencies add up
• Is this fast enough?
- Paris-London, one-way: 4-5ms
- Redis local latency? < 100us
- simple operation on CPU? < 10ns
• Being fast is important, but on the other hand, billing is per 100ms
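The arithmetic behind these two points can be sketched as follows. The function names are hypothetical, and the 100 ms granularity is the one quoted on the slide, exposed as a parameter.

```python
import math


def billed_ms(duration_ms, granularity_ms=100):
    """Round a duration up to the billing granularity (at least one unit)."""
    return max(1, math.ceil(duration_ms / granularity_ms)) * granularity_ms


def chain_latency_ms(durations_ms):
    """Latency of functions called one after another: the durations add up."""
    return sum(durations_ms)


# Five chained 8 ms functions: 40 ms of latency, but 500 ms billed.
durations = [8, 8, 8, 8, 8]
total_latency = chain_latency_ms(durations)
total_billed = sum(billed_ms(d) for d in durations)
```

Under this assumption, a very short function is billed for far more time than it actually runs.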
Warm-up times
• The first run of a Lambda is *much* slower (hundreds of ms)
> Even slower in some cases (a Lambda in a VPC, which requires an ENI)
• Lambdas are rescheduled regularly (every few hours) => new cold start
• What about a new version of the code?
![Page 17: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/17.jpg)
Asynchronicity
Event processing is asynchronous, which can have side-effects
• Race conditions
• Inconsistent states
> Applications must take this into account
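One common way to take this into account is optimistic concurrency: a write only succeeds if the item has not changed since it was read. A minimal sketch using a hypothetical in-memory `VersionedStore`; a real data store would need an atomic conditional write to give the same guarantee.

```python
class VersionConflict(Exception):
    """Raised when the item changed between the read and the write."""


class VersionedStore:
    """In-memory stand-in for a table where every item carries a version number."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        """Return (version, value); a missing key has version 0 and value None."""
        return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write value only if the stored version is still expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected {expected_version}, found {current_version}")
        self._items[key] = (current_version + 1, value)
        return current_version + 1
```

Two functions that read the same version cannot both write: the second one gets a `VersionConflict` and has to re-read and retry.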
![Page 18: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/18.jpg)
Understanding new services
![Page 19: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/19.jpg)
Lambda
Warm-up and rescheduling
Limits and throttling
• By default Lambda is limited to 100 concurrent executions (now 1000!)
• For a 100ms function, this means 1000 invocations/s (now 10000/s)
• No metric for concurrent executions
- Look at throttling
- Estimate concurrency based on function duration / number of calls
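The estimate is the invocation rate multiplied by the average duration. A minimal sketch with hypothetical function names, reproducing the figures above:

```python
def estimated_concurrency(invocations_per_second, avg_duration_ms):
    """Average number of executions in flight: rate multiplied by duration."""
    return invocations_per_second * avg_duration_ms / 1000.0


def max_throughput_per_second(concurrency_limit, avg_duration_ms):
    """Highest sustainable invocation rate under a concurrency limit."""
    return concurrency_limit * 1000.0 / avg_duration_ms


# A 100 ms function under the two limits quoted above.
old_limit_rate = max_throughput_per_second(100, 100)   # 1000 invocations/s
new_limit_rate = max_throughput_per_second(1000, 100)  # 10000 invocations/s
```

These are averages: bursts can hit the limit well before the average rate does, so throttling is still worth watching.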
Event source behavior / configuration
• One event at a time or batching
• Retries
• Dead-letter queues
![Page 20: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/20.jpg)
Other managed services
New services
• Serverless applications (usually) don't use RDBMS
• Serverless applications (usually) don't use classic messaging technologies
Scalability
• Scaling up / down needs to be automated
• Not always simple
New services => New expertise
• DynamoDB
- table and index design
- read / write capacity estimation
- optimize performance *and* costs
• Kinesis
- sharding for multiplexing and scalability
- when to reshard / merge shards?
![Page 21: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/21.jpg)
Security
![Page 22: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/22.jpg)
Security
Serverless helps with security
• No operating system to manage
• No application runtime to manage
• Limited attack surface (short functions)
• Short lifespan (< 5 min for a function, up to 6h for a container)
But some things don't change
• Code security
• Frameworks
• 3rd-party dependencies
And others are sometimes trickier
• Many external services to secure (SaaS, managed services)
• AWS permissions
![Page 23: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/23.jpg)
Continuous Delivery
![Page 24: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/24.jpg)
Continuous integration
Testing is not easy
• How do I replicate Lambda in my CI environment?
• Will I use AWS services for unit testing?
• What about mocking?
Local deployment is helpful to iterate fast
• How do I replicate Lambda locally?
• How can I simulate AWS services?
- "Easy" for some (many DynamoDB implementations)
- Much harder for some complex integrations (DynamoDB Streams for instance?)
- Several projects working on this (localstack)
![Page 25: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/25.jpg)
Packaging and versioning
Managing versioning
• Easy for the code
• Lambda can be versioned in AWS
Most frameworks are designed to push from a local machine
• Build the code, get dependencies, push
• Can be duplicated in CI
• But no real artifact that can be shared
Deploying the same version across environments?
Is there a deployment "artifact" I can share
- across environments
- across AWS accounts (Prod / Staging)
- with all the dependencies built in?
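One possible answer is to build the zip once, identify it by its content hash, and promote that exact file everywhere. A minimal sketch, assuming a hypothetical `build_artifact` helper; file order and timestamps are fixed so that the same inputs give the same hash.

```python
import hashlib
import io
import zipfile


def build_artifact(files):
    """Build a zip from {archive_path: bytes} and return (zip_bytes, sha256_hex)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(files):  # stable order
            info = zipfile.ZipInfo(path, date_time=(1980, 1, 1, 0, 0, 0))  # stable timestamp
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, files[path])
    data = buffer.getvalue()
    return data, hashlib.sha256(data).hexdigest()
```

The hash can serve as the version identifier: if staging and production report the same hash, they run the same code and dependencies.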
![Page 26: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/26.jpg)
What is an application?
Is it a single function?
• Deployed independently
• Versioned independently
> What about shared libraries between functions?
Is it all my functions?
• Versioned as a whole
• With bundled shared libraries
• Same artifact with different handlers
• Deployed together or independently?
> Functions and dependencies can add up to a big artifact (megabytes)
The answer is probably somewhere in the middle
• No clear best practice yet
• Trial and error
![Page 27: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/27.jpg)
Conclusion
![Page 28: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/28.jpg)
Conclusion
Serverless is the future (or a big part of it)
• Focus on business logic that matters
• Much simpler applications
• Really pay for what you use
Serverless creates many new challenges
• How can we adapt standard code best practices?
• How do we operate these new applications?
From NoOPS to NewOPS
• No longer sysadmins or netadmins
• Supervision remains similar but requires new tools
• A big focus on new architectures and new backends
• Optimize for performance and costs
![Page 29: Operational challenges behind Serverless architectures](https://reader034.vdocuments.us/reader034/viewer/2022051710/5a64e1297f8b9a88148b5c6f/html5/thumbnails/29.jpg)
Questions?
Thank you
@lbernail