performance monitoring and call tracing in microservice environments
Performance Analysis and
Call Tracing in Microservice Environments
Martin Gutenbrunner
Dynatrace Innovation Lab
@MartinGoodwell
Microservice Meetup Berlin – 2016-06-30
About me
Started with Commodore 8-bit (VC-20 and C-64)
Built Null-Modem connections for playing Doom and WarCraft I
Went on to IPX/SPX networks between MS-DOS 6.22 and
WfW 3.11
Did DevOps before it was a thing (mainly Java and Web)
for ~ 10 years
Now at Dynatrace Innovation Lab
Tech Lead for Azure and Microservices
Find me on Twitter: @MartinGoodwell
Passionate about life, technology and the people behind both of them.
Agenda
Traditional monitoring
What‘s wrong with it?
Performance in your code
The dramatic dilemma
Happy end
Questions
Please, ask and interrupt anytime!
What‘s your occupation?
Dev, Ops, BinExec?
What‘s your technology stack?
Java, .NET
Node.js
Who of you knows what APM is/does?
A lil' bit o' history
Traditional monitoring was for Ops only
APM (incl. Call Tracing) is also for devs, debugging, pre-prod
Monitoring
Host performance
CPU-usage
Memory-usage
Disk IO
Network performance
Nagios
What‘s wrong with it?
Nothing is wrong
Some things might just be out of scope
No insight into your application‘s performance
Performance in your code
a.k.a. Application Performance Management
Add monitoring code
Use statsd
statsd real quick
http://www.slideshare.net/DatadogSlides/dev-opsdays-tokyo2013effectivestatsdmonitoring
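A minimal sketch of what "use statsd" means on the wire, assuming a statsd daemon listening on UDP at localhost on the default port 8125. The helper names `format_metric` and `send_metric` and the metric names are made up for illustration; a real project would more likely use an existing client library.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build a statsd datagram in the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; the sender does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# "ms" is the statsd timer type, "c" the counter type.
timing = format_metric("checkout.response_time", 320, "ms")
counter = format_metric("checkout.requests", 1, "c")

if __name__ == "__main__":
    send_metric(timing)
    send_metric(counter)
```

Because the transport is UDP, the application is not slowed down or broken when the statsd daemon is unavailable, which is one reason this approach is popular for in-code metrics.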
Use JMX
Aspect oriented programming
http://veerasundar.com/blog/2010/01/spring-aop-example-profiling-method-execution-time-tutorial/
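The linked tutorial uses Spring AOP in Java. As a rough analog of the same idea (timing a method without touching its body), here is a sketch using a Python decorator. This is not the tutorial's code; `profiled`, `timings` and `load_customer` are hypothetical names.

```python
import functools
import time

# Collected durations in seconds, keyed by function name.
timings = {}


def profiled(func):
    """Wrap func so every call records its execution time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@profiled
def load_customer(customer_id):
    time.sleep(0.01)  # stand-in for real work such as a database call
    return {"id": customer_id}


result = load_customer(42)
```

The business function stays free of monitoring code; the measurement lives in one place and can be attached to or removed from any function.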
Graphite Visualization
Any downsides here?
Basic approaches tend to pollute your code
AOP is the better choice, but requires advanced skills
If you‘re not using something like statsd, it‘s hard to have a central spot for
all your performance data of different components
Great for performance insights of single components
What about 3rd parties?
Or distributed systems?
Like, microservices, maybe
What about components which we
can’t modify?
like databases, message queues, ...
Best case: use readily available APIs or integrations (statsd, JMX, etc.)
For open-source: apply the same techniques as for your own code
Keeping in sync with the original code can become tedious
Try to make your changes part of the original project
Use dedicated monitoring tools
Very common for databases
BUT even the best tool is an additional tool
How long does it take to get a new team member up-to-speed?
Microservices
Microservices vs SOA
Microservices
fit the scope of a single application
Service Oriented Architecture
is scoped to fit enterprises / environments / infrastructures
For a dev, microservices hardly pose any downsides
On the upside, the code size and the scope of the domain become smaller
Any best practices for analyzing performance of a single microservice are still
valid
The real challenge of microservices is proper operation
What’s the challenge about monitoring
microservices?
The big challenge of well-performing microservices is the communication
between the microservices
Not the high performance of a single microservice
Tracing calls between services is very difficult
Source: http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
Call Tracing
Source: http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
In Java
https://taidevcouk.wordpress.com/category/experiments/
C#
http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
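The Java and C# examples on the slides are only linked, so here is a small sketch of the underlying idea: every service reuses the correlation ID it received, or creates one if it is the first hop, and passes it on with every outgoing call. The header name `X-Correlation-Id` and both function names are assumptions for illustration, not taken from the linked posts.

```python
import uuid
from typing import Optional

# Assumed header name; teams pick their own convention.
CORRELATION_HEADER = "X-Correlation-Id"


def ensure_correlation_id(incoming_headers: dict) -> str:
    """Reuse the caller's correlation ID, or start a new one at the edge."""
    # Note: real HTTP headers are case-insensitive; web frameworks
    # usually normalise them before application code sees them.
    return incoming_headers.get(CORRELATION_HEADER) or uuid.uuid4().hex


def outgoing_headers(correlation_id: str, extra: Optional[dict] = None) -> dict:
    """Headers for a downstream call, always carrying the correlation ID."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


# Edge service: request arrives without an ID, so one is generated.
edge_id = ensure_correlation_id({})
# Edge service calls service B and forwards the ID.
headers_to_b = outgoing_headers(edge_id, {"Accept": "application/json"})
# Service B sees the same ID and would forward it again.
id_seen_by_b = ensure_correlation_id(headers_to_b)
```

The scheme only works if every service in the chain follows it, which is why it gets harder with each additional language and framework.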
Leverage existing tools
https://github.com/ordina-jworks/microservices-dashboard
Spring Cloud Sleuth
Sleuth: https://github.com/spring-cloud/spring-cloud-sleuth
Spring Cloud Sleuth implements a distributed tracing
solution for Spring Cloud.
Zipkin
Trace
https://trace.risingstack.com/
So, do we have everything we need now?
Usually, one tracing solution only covers a single technology
Besides visualization, you’ll also want log analysis
ELK stack does this really well, especially in connection with correlation IDs
But ELK stack does no visualization
And your visualization does no log analysis
yet another tool
Don’t get me started about integrating all this with host monitoring...
The trace ends where your code ends
No correlation IDs for database calls
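Log analysis with correlation IDs, as mentioned above, depends on every log line carrying the ID. A minimal sketch with Python's standard `logging` module follows; the logger name `order-service`, the ID `abc123` and the log format are hypothetical, and a real service would set the ID per request rather than once.

```python
import io
import logging


class CorrelationIdFilter(logging.Filter):
    """Attach a correlation ID to every record passing through the logger."""

    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = self.correlation_id
        return True  # never drop the record, only enrich it


# Log to an in-memory stream here; a service would log to stdout or a file
# that a log shipper forwards to the central log store.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s correlation_id=%(correlation_id)s %(message)s")
)

logger = logging.getLogger("order-service")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(CorrelationIdFilter("abc123"))

logger.info("order accepted")
log_line = stream.getvalue().strip()
```

With the ID in a fixed `key=value` position, a log search for one correlation ID returns the lines of all services that handled that request.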
What‘s next?
Considerations for custom
implementations
Multitude of languages
Open-source tools can get expensive
Manual configuration
Often only applicable to a single technology
Keeping pace with new technology
Serverless code (e.g. AWS Lambda, Azure Functions)
http://de.slideshare.net/InfoQ/netflix-built-its-own-monitoring-system-and-why-you-probably-shouldnt
The Ops’ dilemma
how to handle all this in production
how to identify production issues
how to tell the devs what they should look into without tearing everything down
All fine?
While Dev can draw on a huge number of tools, libs and frameworks,
it’s still up to Ops to integrate them into a single, unified
solution that allows drawing the right conclusions
From Dev to Prod
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on single component
„perfect world“
A dev‘s deadline is made of Sprints
A couple of weeks, usually
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real
users and business
Lots of components that might not
be under your control
An Op‘s deadline is made of SLAs
Hours, maybe just minutes
The Dev-Ops-Dev-Ops-Dev-Ops dilemma
Dev: Sprint (days / weeks)
Ops: SLA (hours / minutes)
From Prod to Dev
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on single component
„perfect world“
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real users and
business
Lots of components that might not be under
your control
Which? Which? Time! Reproduce?
Commercial solutions
Dynatrace Ruxit
Dynatrace Ruxit
Set-up in 5 minutes
Install a single monitoring agent per host
Everything is auto-detected
No changes to your source-code
No changes to runtime configuration
Supports a wide array of technologies
http://www.dynatrace.com/en/ruxit/technologies/
Traditional metrics
Service metrics
Does not end at your custom
components
Baselining
Automatically detects and correlates problems without setting thresholds
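To make the general idea of baselining concrete, here is a generic sketch that derives the alert limit from recent measurements instead of a hand-set threshold. This is a textbook mean and standard deviation check, not a description of how Dynatrace Ruxit works internally; the function name, the `sigmas` parameter and the sample data are assumptions.

```python
import statistics


def is_anomalous(history, latest, sigmas=3.0):
    """Flag latest if it lies more than `sigmas` standard deviations
    away from the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) > sigmas * stdev


# Hypothetical recent response times in milliseconds.
response_times_ms = [120, 118, 125, 122, 119, 121, 124, 120]

normal = is_anomalous(response_times_ms, 123)  # within the usual range
spike = is_anomalous(response_times_ms, 480)   # far outside the usual range
```

The limit moves with the data: a service that normally answers in 120 ms and one that normally answers in 2 s each get an alert level that fits them, without anyone configuring a number per service.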
Includes the Client-side
Browser auto-injection
Includes client-side JavaScript in traces and problem-correlation
Visualization
Call Tracing
Solving a dilemma
Include this URL in a
trouble ticket and the Dev
can jump in right away
Supporting most popular technologies
• Java
• .NET
• Node.js
• PHP
• Databases via
• JDBC
• ADO.NET
• PDO
• Message Queues
• Caches
• Cloud Infrastructure Metrics
• See more at
http://www.dynatrace.com/en/ruxit/technologies/
Dynatrace Ruxit
2016 hours for free
http://bit.ly/monitoring-2016
References
https://www.nagios.org
https://github.com/etsy/statsd/wiki
http://veerasundar.com/blog/2010/01/spring-aop-example-profiling-method-execution-time-tutorial/
http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
http://apmblog.dynatrace.com/2014/06/17/software-quality-metrics-for-your-continuous-delivery-pipeline-part-iii-logging/
https://blog.buoyant.io/2016/05/17/distributed-tracing-for-polyglot-microservices/
https://blog.init.ai/distributed-tracing-the-most-wanted-and-missed-tool-in-the-micro-service-world-c2f3d7549c47#.93r1dj6ah