Post on 22-May-2020
www.epsagon.com info@epsagon.com
Five Approaches to Serverless Application Observability
Table of Contents
Introduction
About Serverless Applications
    Simple Serverless App Example
    Key Characteristics of Serverless Apps
    More on Serverless Apps
About Observability
Serverless vs. Traditional Observability
Five Approaches to Serverless Observability
    1. Cloud Provider Monitoring Tools
    2. Log Streaming to an External Service
    3. Function-Level Monitoring
    4. Self-Built Observability Solutions
    5. Automated Distributed Tracing Solutions
How Epsagon Can Help
Introduction
The serverless software development approach went mainstream when
AWS introduced Lambda, its Function-as-a-Service (FaaS) offering, in November 2014.
Although FaaS is only a subset of the serverless ecosystem, the two terms
are often used interchangeably.
Serverless/FaaS is yet another quantum leap in the evolution of computing
abstraction that started in the 1960s when programming languages buffered
developers from machine language. More recently, we have seen the
emergence of virtual machines, which abstract and standardize infrastructure
deployments on commodity servers, as well as containers that abstract apps
from both the operating system and infrastructure layers.
The benefits and challenges of serverless
Event-driven, pay-per-execution serverless applications do, of course, use
both physical and virtual servers. However, developers and the operations
team do not interact with the infrastructure. They rely solely on managed
compute services to execute the code and scale as necessary, leveraging
APIs to consume third-party services. Because pricing is based on the
number of executions rather than on pre-purchased compute capacity, serverless
applications can have lower runtime costs. Serverless also promotes faster
time-to-market and enhances process agility. It achieves this by allowing
developers to focus on the application rather than the infrastructure and by
eliminating the need to set up separate dev/test, staging, and production
environments.
Still, for all the benefits that are driving its growth, serverless computing
is not without its challenges. Serverless applications are complex, distributed
architectures of functions and API calls, encapsulated within ephemeral
containers. Without careful development and extensive testing, performance
issues can degrade the user experience and rack up costs. End-to-end
real-time observability is key to monitoring and troubleshooting
application performance, yet it is not easy to achieve in a serverless
environment.
This white paper explores why it is difficult to achieve observability in
serverless systems and how next-generation platforms offer the required
paradigm shift that closes the serverless-observability gap.
About Serverless Applications
Today the uptake of serverless computing is comparable to what we were
seeing for containers back in 2016. According to a recent Cloud Foundry
survey, 46% of the respondents stated that their companies are both using
and evaluating serverless, with another 35% solely in the evaluation stage.
F5’s 2019 State of Application Services report found that 29% of cloud
architects, 24% of SRE/DevOps, and 24% of executives view serverless as
a strategic trend. Gartner also puts serverless computing at the top of their
list of ten trends that will impact infrastructure and operations in 2019. And
market analysts are predicting dramatic growth rates for the global FaaS
market, which is expected to reach a value of ~$12 billion by 2022, growing at a
~34% compound annual growth rate from 2017.
Simple Serverless Application Example
The following graphic is a high-level description of a typical serverless web
application on AWS:
1. The static website UI is hosted in and served through an Amazon S3 bucket.
2. The app’s login and data access services are built as Lambda functions, with the developer specifying the amount of memory required and the maximum execution time. When invoked by calls from the client app, Lambda runs the functions in stateless compute containers that are ephemeral, sometimes lasting for only one invocation.
3. The functions read from and write to a fully managed NoSQL DynamoDB database (not essential to a serverless app and used here only as an example). They also handle the responses back to the client app.
4. Other external services used by the app are Cognito for authenticating users and STS for generating temporary AWS credentials for users to invoke Lambda.
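As a rough illustration of the data access function described in the list above, the following Python sketch shows a stateless handler that reads one record from a table. It is only a sketch under stated assumptions: the `user_id` field, the response shape, and the `make_handler` and `FakeTable` helpers are hypothetical and do not come from the example application. The only interface assumed from the database client is a `get_item(Key=...)` method that returns a dictionary, which is the shape used by boto3's DynamoDB Table resource.

```python
def make_handler(table):
    """Build a Lambda-style handler around a DynamoDB-like table object.

    `table` only needs a get_item(Key=...) method returning a dict, so a
    real boto3 Table resource or a local stand-in can be passed in.
    """

    def handler(event, context):
        user_id = event.get("user_id")  # hypothetical request field
        if not user_id:
            return {"status": "error", "message": "user_id is required"}

        # The function keeps no state between invocations: everything it
        # needs comes from the event or from the managed database.
        result = table.get_item(Key={"user_id": user_id})
        item = result.get("Item")  # the key is absent when nothing matches
        if item is None:
            return {"status": "not_found", "user_id": user_id}
        return {"status": "ok", "profile": item}

    return handler


class FakeTable:
    """In-memory stand-in for the managed database, for local runs only."""

    def __init__(self, rows):
        self._rows = rows

    def get_item(self, Key):
        row = self._rows.get(Key["user_id"])
        return {"Item": row} if row is not None else {}
```

Passing the table in, rather than creating a client inside the handler, keeps the sketch runnable without cloud credentials and makes the function easy to exercise locally.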
Key Characteristics of Serverless Applications
Serverless apps are usually highly distributed and make very heavy use
of third-party APIs. Although not illustrated in the simple example shown
above, serverless systems will often use an API Gateway to map the Lambda
functions to the API endpoints via RESTful HTTP requests.
Pay-Per-Use
Note that in our example there is no programmed access to a server. One of the
key principles of serverless apps is that, upon invocation, the provider of the
managed function service takes care of all infrastructure requirements. This
ranges from launching and loading the container to automatically provisioning
the compute/storage resources and services required by the executed code.
The FaaS vendor also handles concurrency and ensures virtually unlimited
scalability, with the customer paying only for the time that the code is actually
running. The cost differs depending on the amount of memory
provisioned but, by way of example, on AWS a Lambda function with 512 MB
of memory costs $0.000000834 per 100 ms.
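To make the pay-per-use arithmetic concrete, here is a small Python sketch that estimates compute-time cost from the price quoted above. It assumes that duration is billed in whole 100 ms increments, rounded up, and it ignores any per-request charge or free tier. The function names and the workload figures are hypothetical.

```python
import math

# USD per 100 ms for a 512 MB function, the figure quoted in the text.
PRICE_PER_100MS_512MB = 0.000000834


def invocation_cost(duration_ms, price_per_100ms=PRICE_PER_100MS_512MB):
    """Compute-time cost of one invocation.

    Assumption: duration is rounded up to the next 100 ms increment and
    at least one increment is always billed.
    """
    billed_units = max(1, math.ceil(duration_ms / 100))
    return billed_units * price_per_100ms


def monthly_cost(invocations, avg_duration_ms):
    """Compute-time cost for a number of invocations of similar duration."""
    return invocations * invocation_cost(avg_duration_ms)


# One million invocations averaging 250 ms are billed as 3 units each.
print(f"{monthly_cost(1_000_000, 250):.2f}")  # prints 2.50
```

Under these assumptions, shaving an average of 250 ms down to 200 ms would cut the billed units per call from three to two, which is why measured duration matters as much as invocation count.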
Limited Resources
Another key characteristic of serverless apps is the ephemeral lifespan of
the function container (sometimes lasting for only one invocation) as well as
strict timeout limitations for the functions themselves. In AWS, for example,
the maximum execution time for a Lambda function is fifteen minutes, and
the Amazon API Gateway has a 29-second integration timeout. In order to
avoid timed-out function calls or too-frequent container cold starts, both
of which can degrade app performance, it is important to choose function
timeout values carefully. It is also important to monitor and make every effort
to reduce the time that a function waits for a response to an API call.
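One way to keep a slow API call from consuming the whole function timeout is to derive the call's timeout from the time the invocation has left. The sketch below assumes the Lambda Python runtime's `context.get_remaining_time_in_millis()` method. The safety margin, the cap, and the `downstream_timeout_seconds` and `FakeContext` names are hypothetical values chosen only for illustration.

```python
def downstream_timeout_seconds(context, safety_margin_ms=500, cap_seconds=5.0):
    """Pick a timeout for an outbound API call from the time left to run.

    The margin leaves room to log the failure and return a response
    before the platform terminates the invocation.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    budget_ms = remaining_ms - safety_margin_ms
    if budget_ms <= 0:
        raise TimeoutError("not enough time left to call the downstream API")
    return min(budget_ms / 1000.0, cap_seconds)


class FakeContext:
    """Local stand-in for the Lambda context object, for testing only."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

The returned value would typically be passed as the timeout argument of whatever HTTP client the function uses, so that the call fails fast and the function can report the problem itself.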
Last but not least, the more memory allocated to a function in serverless, the
more CPU resources are available to execute its code. This in turn shortens
its runtime and reduces the frequency of timeouts. However, as noted above,
the amount of memory provisioned will affect the run cost.
More on Serverless Applications
In short, serverless developers need to thoroughly understand and carefully
balance timeout, API, and memory allocation considerations in order to
achieve optimal but cost-effective system performance. The following articles
provide more detailed information on these technical challenges as well as
guidelines on how to overcome them:
• Best Practices for AWS Lambda Timeouts
• The Importance and Impact of APIs in Serverless
• Finding Serverless’ Hidden Costs
About Observability
Today’s application architectures—whether serverless or not—are highly
distributed and highly complex. Throughout the application lifecycle, from
development and testing to staging and production, it is critical that all
aspects of the system’s current status be observable. In her book
“Distributed Systems Observability,” Cindy Sridharan notes that observability is
an inherent property of a system that has been “designed, built, tested,
deployed, operated, monitored, maintained, and evolved.” This is with the
understanding that, among other things, “no complex system is ever fully
healthy” and “distributed systems are pathologically unpredictable.”
An observable system, therefore, is one that makes the internal state of all of
its components externally observable, usually through instrumentation code.
Observability signals include logs, metrics, traces, exception trackers, and
detailed profiles, to name but a few. Although observability is often considered
just a more sophisticated form of monitoring, monitoring is really only the
starting point of observability. Effective observability must provide all of
the following capabilities:
MONITORING SYSTEM PERFORMANCE
Verifying at all points in time that the system is working
as planned, customers are getting the right service at the
right SLA, errors are being captured and handled, logic and
business flows are correct, and so on.
TROUBLESHOOTING FAILURES
In the event of a detected anomaly, it is important for both
developer productivity and, in the case of production
systems, customer satisfaction that the cause can be easily
investigated. Ideally, the root cause analysis of an observable
system will be highly automated so that the problem can be
remediated as quickly as possible.
VISUALIZATION
With so many interdependent moving parts, it is difficult to
get a meaningful understanding of a distributed system’s
current state without visualization. A graphic representation
of the system elements, their interdependencies, and their
historical and current metrics allows for proper troubleshooting
and more effective day-to-day management of the system. It will also raise
the confidence of development and operations teams that
they have a good grasp of system performance.
Serverless vs. Traditional Observability
Yan Cui, AWS Serverless Hero and frequent guest blogger for Epsagon, has
written the definitive article about the new challenges that serverless poses
for current observability practices. With AWS as his frame of reference, the
key points that he raises are the following:
• Because serverless separates the code from the infrastructure running
it, there is no place to install agents to automatically collect and transmit
system data to an observability system. Often, the only way to collect
this data is through manual instrumentation, which is
both tedious and, like everything manual, error-prone.
• Because everything has to be executed within an invoked function, you
can no longer perform your own background processing. You have to rely
on whatever the platform gives you in terms of logs and tracing data.
• Serverless frees you from the need to manage concurrency. But the flip
side of this is that it will be harder to batch observability data, and there
will be a much higher volume of traffic to the observability system. At scale,
this can have both performance and cost implications.
• Defining bigger batch sizes will not solve the data volume problem, as a lot
of valuable data may be lost due to the short lifespan of Lambda functions
and subsequent frequency of garbage collection.
• Functions are often chained together through asynchronous invocations.
Unfortunately, tracing these invocations through so many different event
sources is difficult.
The bottom line is that diligently instrumenting your serverless functions may
improve visibility into your system’s health. But the sheer volume of data that
then gets sent to the observability system may actually complicate debugging
as well as impact client-side latency, which may, in turn, have an impact on
business outcomes.
In addition, traditional observability practices and tools are not equipped
to identify and handle the new technical challenges posed by serverless
architectures, such as timeouts, out-of-memory errors, slow API responses,
and too-frequent container cold starts.
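To give a sense of what manual instrumentation involves, the following Python sketch wraps a handler so that each invocation emits one JSON record with its duration, outcome, and whether it ran on a cold start. It is a minimal illustration, not a recommended design: the record fields and the `instrumented` decorator are hypothetical, and cold starts are detected with the common trick of a module-level flag, which relies on module state surviving while a container is reused.

```python
import functools
import json
import time

# Module-level state lives as long as the container is reused, so the
# first invocation in a fresh container sees cold=True.
_container_state = {"cold": True}


def instrumented(emit=print):
    """Decorator that emits one JSON record per invocation via `emit`."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            record = {
                "function": handler.__name__,
                "cold_start": _container_state["cold"],
            }
            _container_state["cold"] = False
            started = time.perf_counter()
            try:
                result = handler(event, context)
                record["outcome"] = "ok"
                return result
            except Exception as exc:
                record["outcome"] = "error"
                record["error_type"] = type(exc).__name__
                raise  # never swallow the original failure
            finally:
                elapsed_ms = (time.perf_counter() - started) * 1000
                record["duration_ms"] = round(elapsed_ms, 3)
                emit(json.dumps(record))

        return wrapper

    return decorator
```

Every function in the application would need this wrapper applied and kept up to date, and every record adds to the data volume sent downstream, which is the trade-off described above.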
Five Approaches to Serverless Observability
This section describes five approaches that are often used separately or in
combination to address the challenges involved in gaining greater serverless
observability.
Cloud Provider Monitoring Tools
Default cloud vendor consoles, such as Amazon CloudWatch, are monitoring
and management services that aggregate metrics and logs across
distributed stacks and services. However, in more complex distributed
systems with multiple functions, queues, triggers, etc., it is difficult to
understand connections across the different log entries, which are refreshed
asynchronously.
Log Streaming to an External Service
Log aggregation platforms such as Splunk or Loggly provide single-pane
views of log metrics across distributed systems. Some of these platforms
analyze the data to create baseline performance profiles so that they can
then detect and alert to suspected anomalies.
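The idea of a baseline profile can be illustrated with a deliberately simple Python sketch. It is not how any particular log platform works: it merely flags values that lie more than a chosen number of standard deviations from the mean of a baseline sample. The `find_anomalies` name, the threshold, and the sample durations are all hypothetical.

```python
import statistics


def find_anomalies(baseline, recent, threshold=3.0):
    """Return the recent values that deviate strongly from the baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the baseline mean.
    """
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: anything different is unusual.
        return [value for value in recent if value != mean]
    return [value for value in recent if abs(value - mean) / stdev > threshold]


# Hypothetical function durations in milliseconds.
baseline_ms = [100, 110, 90, 105, 95, 100, 102, 98]
print(find_anomalies(baseline_ms, [101, 250, 97]))  # prints [250]
```

Real platforms typically account for seasonality, traffic patterns, and many metrics at once, so this should be read only as the basic intuition behind baseline-driven alerts.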
Source: AWS: Getting Started with CloudWatch
Source: Kibana queries and filters
However, log aggregation platforms still require the development team to
generate the logs. In addition, they do not make it easier to overcome the
often asynchronous nature of serverless systems. Even in a log aggregation
dashboard, it can still be difficult to quickly understand the relationships
between events and triggers, making troubleshooting cumbersome. Last but
not least, these third-party log aggregation platforms can be expensive.
Function-Level Monitoring
More advanced observability layers, such as AWS X-Ray, automatically
integrate and instrument Lambda functions, providing an end-to-end function-
level view of requests as they traverse all components of the application.
AWS X-Ray example for monitoring Lambda showing a synchronous request with one downstream call to Amazon S3. Source: AWS Lambda Developer Guide
AWS X-Ray does make it easier to identify slow spots or slow AWS APIs and
even analyze root causes. But currently you cannot connect asynchronous
events with AWS X-Ray, so it is hard to trace a chain of events such as a
message that is published into SNS which, in turn, triggers another Lambda.
Also, the developer has to insert traces manually, which is not optimal when
monitoring a dynamic environment.
Distributed Tracing
The key drawback, however, of observability layers like AWS X-Ray is that
they essentially only measure the metrics of functions. And they only do so
on an individual basis. They will let you see, for example, that Function X
failed five times in the last hour, that the average duration of Function Y is 10
seconds, or that Function Z has an unusually high number of cold starts that
are affecting its latency. This information can be helpful to detect relatively
simple and straightforward issues related to the system’s health. But it cannot
identify application-level issues, such as a user trying to purchase an item
online and abandoning the cart because it's taking too long.
Thus, in a distributed serverless system with many moving parts, function-level
monitoring cannot provide insight into business flows. If the individual
function metrics look good, you can be lulled into thinking that the application
is performing as planned. To achieve application-level observability, you need
distributed tracing. Distributed tracing tracks what happens
to a request across all the involved components when the user interacts with
the application. This is visualized in logical chunks, or spans.
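The notion of spans can be sketched with a small Python data model. This is an illustrative structure only, not the format of any specific tracing product: each span carries a shared trace ID, its own ID, and the ID of its parent, which is enough to rebuild the path of one request. The `Span` fields, the `render_trace` helper, and the checkout example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: List[Span]) -> List[str]:
    """Return one indented line per span, parents before children."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        siblings = sorted(children.get(parent_id, []), key=lambda s: s.start_ms)
        for span in siblings:
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# A hypothetical checkout request that spends most of its time on payment.
checkout = [
    Span("t1", "a", None, "checkout-api", 0, 900),
    Span("t1", "b", "a", "charge-card", 50, 700),
    Span("t1", "c", "b", "payment-provider-call", 100, 650),
    Span("t1", "d", "a", "save-order", 710, 880),
]
print("\n".join(render_trace(checkout)))
```

Laid out this way, the slow step in a business flow such as a purchase is visible at a glance, even when each function's own metrics look unremarkable.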
Self-Built Observability Solutions
It might be tempting to make use of existing open source libraries and build
a customized tracing solution for your serverless system. Yan Cui, once again
a useful guide here, has written a detailed blog post on why and how
developers can introduce correlation IDs into their serverless functions to
debug transactions that involve multiple functions and event source types.
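The general idea of a correlation ID can be sketched in a few lines of Python. This is not the approach from the blog post referred to above, only a minimal illustration: a function reuses the ID that arrived with its event, or creates one if it is first in the chain, and attaches the same ID to whatever it sends onward. The message envelope, the `x-correlation-id` key, and the function names are hypothetical.

```python
import uuid

CORRELATION_KEY = "x-correlation-id"  # hypothetical metadata field name


def get_correlation_id(event):
    """Reuse the caller's ID if one arrived with the event, else start one."""
    incoming = event.get("metadata", {}).get(CORRELATION_KEY)
    return incoming or str(uuid.uuid4())


def build_outgoing_message(payload, correlation_id):
    """Wrap a payload so the next function in the chain sees the same ID."""
    return {"metadata": {CORRELATION_KEY: correlation_id}, "payload": payload}


def handler(event, context):
    correlation_id = get_correlation_id(event)
    # Prefixing every log line with the ID lets logs from different
    # functions be grouped into one transaction later.
    print(f"[{correlation_id}] validating order")
    return build_outgoing_message({"step": "order-validated"}, correlation_id)
```

In a real system the ID would have to be carried through every event source type in use, such as HTTP headers, queue message attributes, and stream records, each with its own conventions.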
However, a self-built distributed tracing system is not just about correlating
events throughout a call chain. To be an enterprise-grade serverless
observability system, it also has to meet numerous operational and functional
requirements, such as security, being able to work across multiple external
APIs, and having a back end to analyze events, issue alerts, and present
insights visually.
In short, unless building a distributed tracing solution is a core business
activity of the organization, implementing it in-house is most probably not a
good use of resources.
Automated Distributed Tracing Solutions
Given the complexities of building and maintaining your own distributed tracing
solution, another approach is to adopt a third-party platform developed by a
vendor whose core competency and business is distributed tracing. There
are a number of distributed tracing platforms out there. So when choosing
the right solution for your organization, it’s important to assess the following:
• How quickly will that solution help you troubleshoot issues in your particular
system? Will it identify issues automatically and alert you to them, or will
you have to search for events inside the platform?
• Will you have to manually instrument your code, or will the
solution automatically discover your system components and their
interdependencies?
• Does it provide insight beyond your code into the whole system—APIs,
managed services, orchestration services such as AWS Step Functions,
and so on?
• Does it have a comprehensive single-pane console, or will you have
to aggregate information from other sources such as cloud provider
dashboards?
How Epsagon Can Help
Epsagon has developed a next-generation serverless monitoring and
troubleshooting platform based on distributed tracing technology. With no
code changes whatsoever and in less than five minutes, Epsagon can fully
and automatically discover all of your serverless system’s components and
the relationships among them. After onboarding Epsagon, you are no longer
monitoring discrete functions. Rather, you are gaining actionable insight into
logic and business flows.
With Epsagon’s powerful visualization features, troubleshooting and root
cause analyses are fast and efficient. Epsagon also applies advanced artificial
intelligence methods to predict issues before they happen, allowing you to
fix problems before they impact system performance and user experience.
Epsagon’s serverless observability platform also provides valuable insights
into ongoing system performance. Thus, for example, you can discover
system behavior that may be unnecessarily increasing your system’s
runtime costs.
See Epsagon at work by signing up for a free trial, or contact us to discuss
how we can help you with your serverless observability challenges today.
The ABCs of Serverless Observability
1. Embrace the serverless revolution to accelerate your development
cycles, enhance process agility, and optimize runtime costs.
2. Yes, observability is a challenge in highly distributed serverless apps,
with their complex event-driven function call chains and limited
visibility into underlying infrastructures.
3. But today’s highly automated serverless observability platforms use
distributed tracing technology, advanced data analytics, and intuitive
visualization to monitor system health end to end and quickly alert
to anomalies.